我在Java中如何找到一个标准的基于Trie的映射实现？

Question

我在Java中如何找到一个标准的基于Trie的映射实现？

javaalgorithmoptimizationtrie

80

我有一个Java程序，它存储了许多从字符串到各种对象的映射。目前，我的选择是依靠哈希（通过HashMap）或二进制搜索（通过TreeMap）。我想知道是否有一种高效且标准的基于trie（字典树）的映射实现在流行和优质的集合库中？

过去我曾经自己编写过，但如果有标准的东西可用，我宁愿使用那个。

快速澄清：虽然我的问题很普遍，但在当前项目中，我处理着许多由完全限定类名或方法签名索引的数据。因此，有许多共享前缀。

- Uri

这些字符串是预先知道的吗？它们只需要通过字符串访问吗？ - sfossen

14个回答

10

Apache Commons Collections v4.0现在支持trie结构。

更多信息请参见org.apache.commons.collections4.trie包的说明文档。特别是要查看PatriciaTrie类：

PATRICIA Trie（Practical Algorithm to Retrieve Information Coded in Alphanumeric）的实现。

PATRICIA Trie是一种压缩Trie。PATRICIA不是在Trie的边缘存储所有数据（且具有空的内部节点），而是在每个节点中存储数据。这允许非常高效地进行遍历、插入、删除、前驱、后继、前缀、范围和select(Object)操作。所有操作最坏情况下的时间复杂度为O(K)，其中K是树中最大项的比特数。在实践中，操作实际上需要O(A(K))的时间，其中A(K)是树中所有项的平均比特数。

- Duncan Jones

10

Java核心库中没有Trie数据结构。

可能是因为Trie通常被设计用于存储字符串，而Java数据结构更普遍，通常保存任何Object（定义相等性和哈希操作），尽管有时限制为Comparable对象（定义顺序）。没有“符号序列”的通用抽象，虽然CharSequence适用于字符串，我想你可以使用Iterable处理其他类型的符号。

还有一个要考虑的问题是，当尝试在Java中实现传统的Trie时，你很快就会面临这样一个事实：Java支持Unicode。要获得任何形式的空间效率，则必须将Trie中的字符串限制为某些符号的子集，或放弃按符号索引的数组存储子节点的传统方法。这可能是为什么Trie不被认为足够通用以包含在核心库中的另一个原因，如果你自己实现或使用第三方库，需要注意这一点。

- erickson

这个答案假设我想要为字符串实现一个trie。Trie是一种通用数据结构，能够容纳任意序列并提供快速的前缀查找。 - Paul Draper

1

@PaulDraper 这个答案并不假设你想要什么，因为你是在问题被提出多年后才出现的。而且由于这个问题特别涉及到字符串，所以这个答案的重点就是它。尽管我花了很多时间指出一个Java trie需要被推广到任何类型的“Comparable”。 - erickson

7

同时也请查看concurrent-trees。它们支持基数树和后缀树，并专为高并发环境设计。

- Alex Beardsley

3

截至2014年，这应该是被接受的答案。看起来是一个维护良好、经过充分测试的trie并发实现。 - knub

3

我在这里写了一个简单快速的实现，这里提供下载。

- Melinda Green

我很想喜欢这个，但是你的每个节点需要1024字节，并且只代表一个字符。此外，由于Java更改了substring()的语义，插入现在需要O(n^2)的时间。这个实现真的不太实用。 - Stefan Reich

@Stefan Reich，那个数组空间仅用于内部节点，考虑到Trie树的快速扩展，这个空间非常小。 - Melinda Green

谢谢你的回答，但我还没有被说服。尝试可能不会快速分支，实际上使用真实数据时可能不会。您的数组扫描内容也很慢。我们应该真正使用 Patricia Tries 来使事情紧凑高效。我已经制作了自己的实现，很快就会在这里发布。没有恶意，只是想优化 :) 许多问候 - Stefan Reich

我的尝试只能快速扩展，因为冗余已被分解并存储在“前缀”成员中。根据您要优化的内容，可以有许多不同的实现方式。在我的情况下，我旨在实现简单而实用的目标。 - Melinda Green

1

啊，我误解了代码的那一部分。有太多的“对象”和强制转换，所以我没有看到它。所以这是一个Patricia Trie。我的错。 - Stefan Reich

没问题。至少我学到了什么是Patricia Trie。尽管我试图保持简单，但它似乎是一个重要的优化。最好的部分是它可以正确处理删除操作，并且所有内容都有很好的文档记录。 - Melinda Green

1

Apache的常用集合：org.apache.commons.collections4.trie.PatriciaTrie

- Ilya Kharlamov

复制此答案：https://dev59.com/-nRB5IYBdhLWcg3wbGxB#26465078 - GlenPeterson

1

你可以尝试使用Completely Java库，它提供了PatriciaTrie的实现。该API简单易用，并且可以在Maven中央仓库中获取。

- Filipe Miguel Fonseca

1

你需要的是 org.apache.commons.collections.FastTreeMap，我认为。

- Andrew Dashin

这似乎不是一个 trie 实现。 - Duncan Jones

1

下面是一个基本HashMap实现的Trie。有些人可能会发现这很有用...

class Trie {

    HashMap<Character, HashMap> root;

    public Trie() {
        root = new HashMap<Character, HashMap>();
    }

    public void addWord(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter) == false) {
                node.put(currentLetter, new HashMap<Character, HashMap>());
            }
            node = node.get(currentLetter);
        }
    }

    public boolean containsPrefix(String word) {
        HashMap<Character, HashMap> node = root;
        for (int i = 0; i < word.length(); i++) {
            Character currentLetter = word.charAt(i);
            if (node.containsKey(currentLetter)) {
                node = node.get(currentLetter);
            } else {
                return false;
            }
        }
        return true;
    }
}

- martynas

0

我刚刚尝试了自己的并发 TRIE 实现，但不是基于字符，而是基于 HashCode。我们仍然可以使用每个 CHAR 的哈希码的 Map of Map。
您可以使用以下代码进行测试：https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapPerformanceTest.java https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapValidationTest.java

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TrieMap {
    public static int SIZEOFEDGE = 4; 
    public static int OSIZE = 5000;
}

abstract class Node {
    public Node getLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }
    public Node removeLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
}

class Vertex extends Node {
    String key;
    volatile String val;
    volatile Vertex next;

    public Vertex(String key, String val) {
        this.key = key;
        this.val = val;
    }

    @Override
    public boolean equals(Object obj) {
        Vertex v = (Vertex) obj;
        return this.key.equals(v.key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public String toString() {
        return key +"@"+key.hashCode();
    }
}


class Edge extends Node {
    volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile

    public Edge(int size) {
        array = new AtomicReferenceArray<Node>(8);
    }


    @Override
    public Node getLink(String key, int hash, int level){
        int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
        Node returnVal = array.get(index);
        for(;;) {
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                Vertex node = (Vertex) returnVal;
                for(;node != null; node = node.next) {
                    if(node.key.equals(key)) {  
                        return node; 
                    }
                } 
                return null;
            } else { //instanceof Edge
                level = level + 1;
                index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
                Edge e = (Edge) returnVal;
                returnVal = e.array.get(index);
            }
        }
    }

    @Override
    public Node createLink(int hash, int level, String key, String val) { //Remove size
        for(;;) { //Repeat the work on the current node, since some other thread modified this node
            int index =  Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node nodeAtIndex = array.get(index);
            if ( nodeAtIndex == null) {  
                Vertex newV = new Vertex(key, val);
                boolean result = array.compareAndSet(index, null, newV);
                if(result == Boolean.TRUE) {
                    return newV;
                }
                //continue; since new node is inserted by other thread, hence repeat it.
            } 
            else if(nodeAtIndex instanceof Vertex) {
                Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
                int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level+1);
                int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level+1);
                Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask()+1);
                if(newIndex != newIndex1) {
                    Vertex newV = new Vertex(key, val);
                    edge.array.set(newIndex, vrtexAtIndex);
                    edge.array.set(newIndex1, newV);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return newV;
                    }
                   //continue; since vrtexAtIndex may be removed or changed to Edge already.
                } else if(vrtexAtIndex.key.hashCode() == hash) {//vrtex.hash == hash) {       HERE newIndex == newIndex1
                    synchronized (vrtexAtIndex) {   
                        boolean result = array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex); //Double check this vertex is not removed.
                        if(result == Boolean.TRUE) {
                            Vertex prevV = vrtexAtIndex;
                            for(;vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
                                prevV = vrtexAtIndex; // prevV is used to handle when vrtexAtIndex reached NULL
                                if(vrtexAtIndex.key.equals(key)){
                                    vrtexAtIndex.val = val;
                                    return vrtexAtIndex;
                                }
                            } 
                            Vertex newV = new Vertex(key, val);
                            prevV.next = newV; // Within SYNCHRONIZATION since prevV.next may be added with some other.
                            return newV;
                        }
                        //Continue; vrtexAtIndex got changed
                    }
                } else {   //HERE newIndex == newIndex1  BUT vrtex.hash != hash
                    edge.array.set(newIndex, vrtexAtIndex);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return edge.createLink(hash, (level + 1), key, val);
                    }
                }
            }               
            else {  //instanceof Edge
                return nodeAtIndex.createLink(hash, (level + 1), key, val);
            }
        }
    }




    @Override
    public Node removeLink(String key, int hash, int level){
        for(;;) {
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node returnVal = array.get(index);
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                synchronized (returnVal) {
                    Vertex node = (Vertex) returnVal;
                    if(node.next == null) {
                        if(node.key.equals(key)) {
                            boolean result = array.compareAndSet(index, node, null); 
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex may be changed to Edge
                        }
                        return null;  //Nothing found; This is not the same vertex we are looking for. Here hashcode is same but key is different. 
                    } else {
                        if(node.key.equals(key)) { //Removing the first node in the link
                            boolean result = array.compareAndSet(index, node, node.next);
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex(node) may be changed to Edge, so try again.
                        }
                        Vertex prevV = node; // prevV is used to handle when vrtexAtIndex is found and to be removed from its previous
                        node = node.next;
                        for(;node != null; prevV = node, node = node.next) {
                            if(node.key.equals(key)) {
                                prevV.next = node.next; //Removing other than first node in the link
                                return node; 
                            }
                        } 
                        return null;  //Nothing found in the linked list.
                    }
                }
            } else { //instanceof Edge
                return returnVal.removeLink(key, hash, (level + 1));
            }
        }
    }

}



class Base10ToBaseX {
    public static enum Base {
        /**
         * Integer is represented in 32 bit in 32 bit machine.
         * There we can split this integer no of bits into multiples of 1,2,4,8,16 bits
         */
        BASE2(1,1,32), BASE4(3,2,16), BASE8(7,3,11)/* OCTAL*/, /*BASE10(3,2),*/ 
        BASE16(15, 4, 8){       
            public String getFormattedValue(int val){
                switch(val) {
                case 10:
                    return "A";
                case 11:
                    return "B";
                case 12:
                    return "C";
                case 13:
                    return "D";
                case 14:
                    return "E";
                case 15:
                    return "F";
                default:
                    return "" + val;
                }

            }
        }, /*BASE32(31,5,1),*/ BASE256(255, 8, 4), /*BASE512(511,9),*/ Base65536(65535, 16, 2);

        private int LEVEL_0_MASK;
        private int LEVEL_1_ROTATION;
        private int MAX_ROTATION;

        Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
            this.LEVEL_0_MASK = levelZeroMask;
            this.LEVEL_1_ROTATION = levelOneRotation;
            this.MAX_ROTATION = maxPossibleRotation;
        }

        int getLevelZeroMask(){
            return LEVEL_0_MASK;
        }
        int getLevelOneRotation(){
            return LEVEL_1_ROTATION;
        }
        int getMaxRotation(){
            return MAX_ROTATION;
        }
        String getFormattedValue(int val){
            return "" + val;
        }
    }

    public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
        if(level > base.getMaxRotation() || level < 1) {
            return 0; //INVALID Input
        }
        int rotation = base.getLevelOneRotation();
        int mask = base.getLevelZeroMask();

        if(level > 1) {
            rotation = (level-1) * rotation;
            mask = mask << rotation;
        } else {
            rotation = 0;
        }
        return (on & mask) >>> rotation;
    }
}

- Kanagavelu Sugumar

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- David Schlosnagle · Accepted Answer

35

你可能想要查看Limewire正在向Google Guava贡献的Trie实现。

- David Schlosnagle

8

看起来Google-Collections已经被Guava取代了 https://code.google.com/p/guava-libraries/，可惜的是我在那里找不到Trie类。现在Patricia Trie似乎有自己的项目页面了：https://code.google.com/p/patricia-trie/。 - Dan J

1

Limewire/Google的链接现在有些混乱。虽然我成功找到了https://code.google.com/archive/p/google-collections/issues/5 上的实际文件，但请注意，Apache Commons Collections带有多种tries（包括patricia trie）。这是我现在推荐的选择。 - Jason C

此外，Apache Commons实现似乎来自于与Limewire贡献相同的地方，因为Commons文档中对于PatriciaTrie的摘要评论与Limewire贡献的实现中的摘要评论完全相同。 - Jason C