Java - Removing duplicates from an ArrayList

Question

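I'm working on a program that uses an ArrayList to store Strings. The program prompts the user to choose an operation, such as adding a string to the list or printing the entries. I want to create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicate values, keeping one instance of each duplicated value in the list. I also want the method to return the total number of duplicates removed.

I've been trying to do this with nested loops, but I keep running into trouble because the ArrayList's indices change when entries are removed, so things don't work as expected. I know conceptually what needs to be done, but I'm struggling to implement it in code.

Here is some pseudocode: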

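Start with the first entry; check each subsequent entry in the list to see whether it matches the first entry; remove each subsequent entry in the list that matches the first entry;

After all entries have been checked, move to the second entry; check each entry in the list to see whether it matches the second entry; remove each entry in the list that matches the second entry;

Repeat this process for every entry in the list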

Here is the code I have so far:

public int removeDuplicates()
{
    int duplicates = 0;

    for ( int i = 0; i < strings.size(); i++ )
    {
        for ( int j = 0; j < strings.size(); j++ )
        {
            if ( i == j )
            {
                // i & j refer to same entry so do nothing
            }
            else if ( strings.get( j ).equals( strings.get( i ) ) )
            {
                strings.remove( j );
                duplicates++;
            }
        }
    }

    return duplicates;
}

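Update: It appears Will is looking for a homework solution, one that involves developing an algorithm to remove the duplicates, rather than a practical solution using Sets. See his comment:

Thanks for the suggestions. This is part of an assignment, and I believe the teacher intends for the solution not to use sets. In other words, I need to come up with a way to search for and remove duplicates without using a HashSet. The teacher suggested nested loops, which is what I'm trying to do, but I keep running into problems with the ArrayList indices after some entries have been removed.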

- Will

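If you can't just run them through a Set (as others have already suggested), it would help to know whether there are other constraints, e.g. O(?). Your current solution is O(n^2), which in a computer science course is usually considered too heavy for this kind of problem. - ponzao

If your teacher has you doing homework in Java, give him a practical solution that uses a Set =) - BorisOkunskiy

19 answers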

Content provided by Stack Overflow; this is a translation of the English original.

- Thirler · Answer 1

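Using a Set is the best option (as others have suggested).

If you want to compare every element of the list with every other element, you should adjust your for loops slightly: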

for(int i = 0; i < max; i++)
    for(int j = i+1; j < max; j++)

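This way each pair of elements is compared only once instead of twice, because the inner loop starts at the element right after the one the outer loop is currently on.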

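Also, when you remove elements while iterating over the list (even with a for loop rather than an iterator), remember that you are shrinking the list. A common solution is to keep a separate list of the items to remove, and only after you have finished deciding which items to remove, remove them from the original list.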
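A minimal sketch of that idea, written against a `List<String>` like the question's (the class and method names are illustrative). It records which positions repeat an earlier entry in a `boolean[]`, which plays the role of the "list of items to remove", and only then removes them. Removing by position rather than by value matters here: `removeAll` would also delete the copy you want to keep. The removal walks from the back so that removing one entry never shifts an index that is still waiting to be removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeferredRemoval {
    // Pass 1 only marks duplicates; pass 2 removes them, so indices stay valid during the scan.
    static int removeDuplicates(List<String> strings) {
        int n = strings.size();
        boolean[] isDuplicate = new boolean[n];
        for (int i = 0; i < n; i++) {
            if (isDuplicate[i]) {
                continue; // already known to repeat an earlier entry
            }
            for (int j = i + 1; j < n; j++) { // start at i + 1: each pair is compared once
                if (!isDuplicate[j] && strings.get(j).equals(strings.get(i))) {
                    isDuplicate[j] = true;
                }
            }
        }
        int removed = 0;
        // Remove from the highest index down so earlier positions are not shifted.
        for (int k = n - 1; k >= 0; k--) {
            if (isDuplicate[k]) {
                strings.remove(k); // k is an int, so this is remove(int index)
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " " + list); // prints: 2 [a, b, c, d]
    }
}
```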

- user232223 · Answer 2

public ArrayList<String> removeDuplicates(ArrayList<String> inArray)
{
    ArrayList<String> outArray = new ArrayList<String>();
    for (int i = 0; i < inArray.size(); i++)
    {
        String testString = inArray.get(i);
        boolean doAdd = true; // assume this is the first occurrence
        // Only look at entries before i; an earlier match means testString is a duplicate
        for (int j = 0; j < i; j++)
        {
            if (inArray.get(j).equals(testString))
            {
                doAdd = false;
                break;
            }
        }
        if (doAdd)
        {
            outArray.add(testString);
        }
    }
    return outArray;
}

- Carl · Answer 3

public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
  List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
  Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
  for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
  return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

Then you need some entryFactory(Integer key, List<Foo> value) method. If you want to mutate the original list instead (probably not a good idea, but anyway):

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
  int original = listWithPossibleDuplicates.size();
  Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
  Set<Foo> found = new HashSet<Foo>();
  while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
  return original - found.size();
}
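The first method above depends on an `entryFactory` that the answer leaves unspecified. Here is a self-contained sketch in which that helper is hypothetical and simply wraps the pair in `java.util.AbstractMap.SimpleEntry`; the type parameter is renamed `T`, and the class name is illustrative. The rest follows the answer's logic: `Set.add` returns `false` for a value already seen, so only first occurrences reach `result`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;

class UniqueElements {
    // Hypothetical stand-in for the answer's entryFactory: pairs the duplicate count with the unique list.
    static <T> Entry<Integer, List<T>> entryFactory(Integer key, List<T> value) {
        return new AbstractMap.SimpleEntry<Integer, List<T>>(key, value);
    }

    static <T> Entry<Integer, List<T>> uniqueElementList(List<T> listWithPossibleDuplicates) {
        List<T> result = new ArrayList<T>();
        Set<T> found = new HashSet<T>();
        for (T f : listWithPossibleDuplicates) {
            if (found.add(f)) { // false when f was already seen
                result.add(f);
            }
        }
        return entryFactory(listWithPossibleDuplicates.size() - found.size(), result);
    }

    public static void main(String[] args) {
        Entry<Integer, List<String>> e = uniqueElementList(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(e.getKey() + " " + e.getValue()); // prints: 2 [a, b, c]
    }
}
```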

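For your particular case of Strings, you may need to handle some additional equality constraints (for example, should upper- and lower-case versions count as the same string or as different ones?).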
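If upper- and lower-case versions should count as the same string, one option (a swapped-in technique, not something the answer prescribes) is to use a `TreeSet` ordered by `String.CASE_INSENSITIVE_ORDER` as the "seen" set in the iterator-based method above. The class and method names are illustrative; the first spelling encountered is the one that is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveDedup {
    // Removes entries that match an earlier one ignoring case; returns how many were removed.
    static int removeDuplicatesIgnoreCase(List<String> strings) {
        // With this comparator, "Apple" and "apple" are the same element of the set.
        Set<String> seen = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        int removed = 0;
        Iterator<String> iter = strings.iterator();
        while (iter.hasNext()) {
            if (!seen.add(iter.next())) {
                iter.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("Apple", "apple", "Banana", "APPLE", "banana"));
        int removed = removeDuplicatesIgnoreCase(list);
        System.out.println(removed + " " + list); // prints: 3 [Apple, Banana]
    }
}
```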

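Edit: Ah, it's homework. Look up Iterator/Iterable and Set in the Java Collections Framework, and see whether you reach the same conclusion I did. The generics part is just icing on the cake.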

- NomeN · Answer 4

The problem with your code is that you remove an entry during the iteration, which invalidates the iteration position.

For example:

{"a", "b", "c", "b", "b", "d"} 
       i         j

Now you remove strings[j].

{"a", "b", "c", "b", "d"} 
       i         j

The inner loop iteration ends and j is incremented.

{"a", "b", "c", "b", "d"} 
       i              j

Only one of the duplicate 'b's was caught... oops.

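The best practice in this situation is to store the positions that need to be removed and remove them (starting from the highest position, so the others don't shift) once you have finished iterating over the ArrayList. (As a bonus, the strings.size() call can then be hoisted out of the loop, by you or by the compiler.)

Hint: you can start iterating j at i+1, since you have already checked 0 through i!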

- user2450176 · Answer 5

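You can add the list to a HashSet and then turn that HashSet back into a list to remove the duplicates.

public static int removeDuplicates(List<String> duplicateList){
    // The HashSet keeps one copy of each value (it does not preserve the original order)
    Set<String> a = new HashSet<String>(duplicateList);
    int duplicates = duplicateList.size() - a.size();
    // Write the unique values back so the caller's list is actually de-duplicated
    duplicateList.clear();
    duplicateList.addAll(a);
    return duplicates;
}

This returns the number of duplicate values, and afterwards duplicateList contains only the unique values.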
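A `HashSet` does not preserve the list's original order. If that order matters, swapping in a `LinkedHashSet`, which iterates in insertion order, keeps the first occurrence of each value in its original position. A minimal sketch, with illustrative class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderPreservingDedup {
    // Same set-based idea, but LinkedHashSet remembers insertion order.
    static int removeDuplicates(List<String> duplicateList) {
        Set<String> unique = new LinkedHashSet<String>(duplicateList);
        int duplicates = duplicateList.size() - unique.size();
        duplicateList.clear();        // replace the contents in place
        duplicateList.addAll(unique); // first occurrences, in original order
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("c", "a", "c", "b", "a"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " " + list); // prints: 2 [c, a, b]
    }
}
```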

- Ashish Dwivedi · Answer 6

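I'm a bit late to this thread, but I have a better solution that uses generic types. All of the solutions posted above are just one kind of solution, and they add to the complexity of the whole runtime thread.

RemoveDuplicacy.java

We can use a technique that performs the required reduction at load time.

For example, suppose you are using an ArrayList of a class type: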

ArrayList<User> usersList = new ArrayList<User>();
usersList.clear();

User user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("AB");
user.setId("2"); // duplicate
usersList.add(user);

user = new User();
user.setName("C");
user.setId("4");
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("2"); // duplicate
usersList.add(user);

This is the User class whose objects are stored in the ArrayList above:

class User {
    private String name;
    private String id;

    /**
     * @param name
     *            the name to set
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return the name
     */
    public String getName() {
        return name;
    }

    /**
     * @param id
     *            the id to set
     */
    public void setId(String id) {
        this.id = id;
    }

    /**
     * @return the id
     */
    public String getId() {
        return id;
    }

}

Now, Java's Object (parent) class has two methods that we can override to serve our purpose. They are:

    @Override
    public int hashCode() {

        final int prime = 31;
        int result = 1;
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;

    }

    @Override
    public boolean equals(Object obj) {

        if (this == obj)
            return true;

        if (obj == null)
            return false;

        if (getClass() != obj.getClass())
            return false;

        User other = (User) obj;

        if (id == null) {
            if (other.id != null)
                return false;

        } else if (!id.equals(other.id))
            return false;

        return true;

    }

You must override these methods in the User class.
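To illustrate why those two overrides matter: once `equals` and `hashCode` compare only `id`, a hash-based set treats two `User` objects with the same id as one element. The sketch below is an assumption about how the overrides might be put to use (it does not reproduce the linked gist). It uses a compact stand-in `User` with a constructor instead of setters, builds the overrides from `java.util.Objects` helpers, and keeps the first `User` for each id via a `LinkedHashSet`. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

class UserDedupDemo {
    // Stand-in for the answer's User class: equality is based on id only.
    static class User {
        private final String name;
        private final String id;

        User(String name, String id) {
            this.name = name;
            this.id = id;
        }

        String getName() { return name; }
        String getId() { return id; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            return Objects.equals(id, ((User) obj).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // must agree with equals: same id, same hash
        }
    }

    // Keeps the first User seen for each id, in list order.
    static List<User> dedupById(List<User> users) {
        return new ArrayList<User>(new LinkedHashSet<User>(users));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("A", "1"), new User("A", "1"), new User("AB", "2"),
                new User("C", "4"), new User("A", "1"), new User("A", "2"));
        for (User u : dedupById(users)) {
            System.out.println(u.getId() + " " + u.getName());
        }
        // prints "1 A", then "2 AB", then "4 C", one per line
    }
}
```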

Here is the complete code:

https://gist.github.com/4584310

Let me know if you have any questions.

- Avinash U · Answer 7

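Below is code that removes duplicate elements from a list without changing the order of the list, and without using a temporary list or any Set variable. This code saves memory and improves performance.

It is a generic method that works with any type of list.

This was one of the questions I was asked in an interview. I searched many forums for a solution but didn't find one, so I thought this was the right forum to post the code.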

public List<?> removeDuplicate(List<?> listWithDuplicates) {
    // Indexes of elements that repeat an earlier element; there are at most size - 1 of them.
    // (Recording one index per matching pair, as before, can overflow this array when a
    // value occurs three or more times.)
    int[] intArray = new int[listWithDuplicates.size()];
    int arrayIndex = 0;

    for (int j = 1; j < listWithDuplicates.size(); j++) {
        for (int i = 0; i < j; i++) {
            if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i))) {
                intArray[arrayIndex] = j; // saving each duplicate index exactly once
                arrayIndex++;
                break;
            }
        }
    }

    // Indexes were recorded in ascending order, so removing from the last one backwards
    // never shifts an index that is still waiting to be removed.
    for (int k = arrayIndex - 1; k >= 0; k--) {
        listWithDuplicates.remove(intArray[k]);
    }
    return listWithDuplicates;
}
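For comparison, here is a different technique (swapped in, not the answer's code) that needs no index array at all: move the first occurrence of each element to the front of the list with `set`, then cut off the leftover tail with `subList(...).clear()`. It is still O(n^2) in comparisons and keeps the original order. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InPlaceCompaction {
    // Compacts first occurrences to the front, then truncates; returns the number removed.
    static <T> int removeDuplicatesInPlace(List<T> list) {
        int write = 0; // positions [0, write) hold the unique elements found so far
        for (int read = 0; read < list.size(); read++) {
            T candidate = list.get(read);
            boolean seen = false;
            for (int k = 0; k < write; k++) {
                if (list.get(k).equals(candidate)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                list.set(write, candidate); // write <= read, so this only overwrites consumed slots
                write++;
            }
        }
        int removed = list.size() - write;
        list.subList(write, list.size()).clear(); // drop the leftover tail in one call
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicatesInPlace(list);
        System.out.println(removed + " " + list); // prints: 2 [a, b, c, d]
    }
}
```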

- Smalltown2k · Answer 8

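You could replace the duplicates with empty strings, which keeps the indices unchanged. Then, once you're done, you can strip out the empty strings.

*But only if an empty string is not a valid value in your implementation.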
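A minimal sketch of the placeholder idea applied to a `List<String>` like the question's, assuming (as the caveat says) that `""` never occurs as a real value. It uses `Collection.removeIf`, so it needs Java 8 or later; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlaceholderDedup {
    // Assumes "" is never a real value, so it can mark "removed" slots.
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            if (strings.get(i).isEmpty()) {
                continue; // this slot was already blanked out as a duplicate
            }
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.set(j, ""); // blank out instead of removing, so indices stay valid
                    duplicates++;
                }
            }
        }
        strings.removeIf(String::isEmpty); // drop all placeholders at the end
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " " + list); // prints: 2 [a, b, c, d]
    }
}
```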

- markusk · Answer 9

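Your inner for loop is wrong. If you remove an element, you must not increment j, because j now points to the element that followed the removed one, and you still need to check it.

In other words, you should use a while loop instead of a for loop, and increment j only when the elements at i and j don't match. If they do match, remove the element at j. size() decreases by 1 and j now points to the next element, so there is no need to increment j.

Also, the inner loop doesn't need to check every element, only the ones after i, because earlier iterations have already removed the duplicates of the elements before i.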
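A sketch of the question's method with the inner loop rewritten as described: `j` starts at `i + 1` and only advances when there is no match. It takes the list as a parameter instead of using the question's `strings` field, an adaptation made only so the example is self-contained; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhileLoopDedup {
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            int j = i + 1; // only entries after i can still duplicate entry i
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j); // the next element slides into position j, so don't advance
                    duplicates++;
                } else {
                    j++; // no match: move on to the next candidate
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " " + list); // prints: 2 [a, b, c, d]
    }
}
```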