如何在TRIE中找到最长的字符串

Question

如何在TRIE中找到最长的字符串

3

我已经通过两个类Trie和TrieNode实现了一个DS Trie。我需要编写一个函数，在O(h)时间复杂度内返回Trie中最长的字符串。我的TrieNode有一个LinkedList字段，用于存储每个节点的子节点。由于我们还没有学习BFS或DFS，所以我正在尝试想出一些创造性的方法来解决它。

我已经有一个独立的函数，用于插入/创建给定字符的新节点：在构建Trie时，创建一个带有'maxDepth=0'字段的节点，表示我的当前深度。对于我创建的每个新节点，我将迭代直至其父节点（每个节点都已经有指向其父节点的指针），并一直进行，直到达到根节点，并将其父节点的深度增加1。现在我将创建一个函数，通过以下方式返回最长的字符串：对于每个节点，遍历我的子节点，查找最大的整数“maxDepth”，然后向下移动，直到达到'maxDepth==0'。例如，我的算法将对此字符串正常工作："aacgace"。

       root      
       / \
   (2)a   g(0)     
     / 
 (1)c        
   / 
(0)e

=> "ace"实际上是最长的字符串。但对于这个字符串"aacgae"不起作用。

      root      
      /  \
   (2)a   g(0)     
    /  \
 (0)c  (0)e

=> 看起来节点'a'有一个子节点，而这个子节点也有一个子节点，但事实并非如此。

一般来说，我想要利用第一个创建Trie的函数（运行时间：O(h*c)），这样第二个返回最长字符串的函数的运行时间就尽可能少了。O(h)

- user3075653

那是一种相当奇怪的使用字典树的方式... - fge

首先，你的 Trie 中有哪些单词？ - fge

重要吗？整个项目是使用Trie编码文本文件，因此单词可以是任何英文字母。 - user3075653

那么我就不明白你是如何构建你的trie树的。我有一个trie树实现，看看我是如何做的。请注意，你也可以尝试一下基数树。 - fge

你不能使用广度优先遍历来找到最深的节点，然后从那里向上走到根节点来获取字符串吗？ - Matthew Mcveigh

与论坛网站不同，我们不在[so]上使用“感谢”、“任何帮助都会感激”或签名。请参阅“应该从帖子中删除'Hi'、'thanks'、标语和问候语吗？”。顺便说一下，“Thanks in advance”是“提前感谢”，而不是“Thanks in advanced”。 - John Saunders

2个回答

1

好的，你想得很正确 - 如果你想找到最长的字符串而不用迭代整个树，就必须在构建树时存储一些信息。
假设对于节点i，我们存储在max_depth[i]中的最大长度，并记住具有最大长度的子节点是max_child[i]。因此，对于你插入trie的每个新单词，记住你插入的最后一个节点（它也是表示你字符串的最后一个字符的新叶子），然后执行以下操作：

current = last_inserted_leaf
while (current != root):
    if max_depth[parent[current]] < max_depth[current] + 1:
        max_depth[parent[current]] = max_depth[current] + 1
        max_child[parent[current]] = current
    current = parent[current]

现在，要输出最长的字符串，只需执行以下操作：

current = root
while is_not_leaf(current):
    answer += char_of_child[max_child[current]]
    current = max_child[current]
return answer

因此，插入操作需要2*n = O(n)次操作，查找最长字符串需要O(h)，其中h是最长字符串的长度。

然而，上述算法需要额外的O(n)内存，这太多了。最简单的方法是在某个地方存储一个max_string，并且每次将字符串添加到trie中时，只需比较new_string的长度和max_string的长度，如果新长度更大，则将max_string = new_string。它将占用更少的内存，并且可以在O(1)时间内找到最长的字符串。

- Chan Kha Vu

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- fge · Accepted Answer

我不确定你想要做什么，但是你可以在这里找到一个trie的例子。

基本上，我通过一个构建器来创建trie。让我们简单地了解一下如何将一个单词添加到trie中：

// In TrieBuilder
final TrieNodeBuilder nodeBuilder = new TrieNodeBuilder();

// ...

/**
 * Add one word to the trie
 *
 * @param word the word to add
 * @return this
 * @throws IllegalArgumentException word is empty
 */
public TrieBuilder addWord(@Nonnull final String word)
{
    Objects.requireNonNull(word);

    final int length = word.length();

    if (length == 0)
        throw new IllegalArgumentException("a trie cannot have empty "
            + "strings (use EMPTY instead)");
    nrWords++;
    maxLength = Math.max(maxLength, length);
    nodeBuilder.addWord(word);
    return this;
}

这会将单词添加到TrieNodeBuilder中，具体步骤如下：

private boolean fullWord = false;

private final Map<Character, TrieNodeBuilder> subnodes
    = new TreeMap<>();

TrieNodeBuilder addWord(final String word)
{
    doAddWord(CharBuffer.wrap(word));
    return this;
}

/**
 * Add a word
 *
 * <p>Here also, a {@link CharBuffer} is used, which changes position as we
 * progress into building the tree, character by character, node by node.
 * </p>
 *
 * <p>If the buffer is "empty" when entering this method, it means a match
 * must be recorded (see {@link #fullWord}).</p>
 *
 * @param buffer the buffer (never null)
 */
private void doAddWord(final CharBuffer buffer)
{
    if (!buffer.hasRemaining()) {
        fullWord = true;
        return;
    }

    final char c = buffer.get();
    TrieNodeBuilder builder = subnodes.get(c);
    if (builder == null) {
        builder = new TrieNodeBuilder();
        subnodes.put(c, builder);
    }
    builder.doAddWord(buffer);
}

假设我们将“trouble”和“troubling”都添加到trie中，会发生以下情况：

第一次，为“trouble”的每个字符创建节点；
第二次，所有节点直到“l”存在；然后为“ing”创建所有节点。

现在，如果我们添加“troubles”，则在“e”后面的“s”将创建另一个节点。

fullWord变量告诉我们这里是否有潜在的完全匹配；这是搜索函数：

public final class Trie
{
    private final int nrWords;
    private final int maxLength;
    private final TrieNode node;

    // ...

    /**
     * Search for a string into this trie
     *
     * @param needle the string to search
     * @return the length of the match (ie, the string) or -1 if not found
     */
    public int search(final String needle)
    {
        return node.search(needle);
    }
    // ...
}

在TrieNode中，我们有：

public final class TrieNode
{
    private final boolean fullWord;

    private final char[] nextChars;
    private final TrieNode[] nextNodes;

    // ...

    public int search(final String needle)
    {
        return doSearch(CharBuffer.wrap(needle), fullWord ? 0 : -1, 0);
    }

    /**
     * Core search method
     *
     * <p>This method uses a {@link CharBuffer} to perform searches, and changes
     * this buffer's position as the match progresses. The two other arguments
     * are the depth of the current search (ie the number of nodes visited
     * since root) and the index of the last node where a match was found (ie
     * the last node where {@link #fullWord} was true.</p>
     *
     * @param buffer the charbuffer
     * @param matchedLength the last matched length (-1 if no match yet)
     * @param currentLength the current length walked by the trie
     * @return the length of the match found, -1 otherwise
     */
    private int doSearch(final CharBuffer buffer, final int matchedLength,
        final int currentLength)
    {
        /*
         * Try and see if there is a possible match here; there is if "fullword"
         * is true, in this case the next "matchedLength" argument to a possible
         * child call will be the current length.
         */
        final int nextLength = fullWord ? currentLength : matchedLength;


        /*
         * If there is nothing left in the buffer, we have a match.
         */
        if (!buffer.hasRemaining())
            return nextLength;

        /*
         * OK, there is at least one character remaining, so pick it up and see
         * whether it is in the list of our children...
         */
        final int index = Arrays.binarySearch(nextChars, buffer.get());

        /*
         * If not, we return the last good match; if yes, we call this same
         * method on the matching child node with the (possibly new) matched
         * length as an argument and a depth increased by 1.
         */
        return index < 0
            ? nextLength
            : nextNodes[index].doSearch(buffer, nextLength, currentLength + 1);
    }
}

请注意，在第一次调用doSearch()时，“nextLength”参数传递了-1。

假设我们有一个包含上述三个单词的trie树，以下是搜索“tr”（失败）的调用序列：

- doSearch("tr", -1, 0) （节点为根）； - doSearch("tr", -1, 1) （节点为't'）； - doSearch("tr", -1, 2) （节点为'r'）； - 没有下一个字符：返回nextLength；nextLength为-1，没有匹配项。

现在，如果我们有“troubles”：

- doSearch("troubles", -1, 0) （节点为根）； - doSearch("troubles", -1, 1) （节点为't'）； - doSearch("troubles", -1, 2) （节点为'r'）； - doSearch("troubles", -1, 3) （节点为'o'）； - doSearch("troubles", -1, 4) （节点为'u'）； - doSearch("troubles", -1, 5) （节点为'b'）； - doSearch("troubles", -1, 6) （节点为'l'）； - doSearch("troubles", -1, 7) （节点为'e'）； - doSearch("troubles", 7, 8) （fullword为true！节点为's'）； - 没有下一个字符：返回nextLength，即8；我们有一个匹配项。