Java - Removing duplicates from an ArrayList

Question

20

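I'm working on a program that uses an ArrayList to store Strings. The program prompts the user to choose an operation, such as adding Strings to the list or printing the entries. I want to create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicated values. I want to leave one instance of each duplicated value in the list. I would also like this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this, but I've been running into trouble because when entries get deleted, the indexing of the ArrayList changes and things don't work as expected. I know conceptually what I need to do, but I'm having trouble implementing it in code.

Here is some pseudocode:

Start with the first entry; check each subsequent entry in the list to see if it matches the first entry; remove each subsequent entry in the list that matches the first entry;

After all entries have been examined, move on to the second entry; check each entry in the list to see if it matches the second entry; remove each entry in the list that matches the second entry;

Repeat for each entry in the list

Here is the code I have so far: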

public int removeDuplicates()
{
  int duplicates = 0;

  for ( int i = 0; i < strings.size(); i++ )
  {
     for ( int j = 0; j < strings.size(); j++ )
     {
        if ( i == j )
        {
          // i & j refer to same entry so do nothing
        }

        else if ( strings.get( j ).equals( strings.get( i ) ) )
        {
           strings.remove( j );
           duplicates++;
        }
     }
 }

   return duplicates;
}

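Update: It looks like Will is looking for a homework solution that involves developing the algorithm to remove duplicates, rather than a pragmatic solution using Sets. See his comment:

Thanks for the suggestions. This is part of an assignment, and I believe the teacher intended for the solution not to include sets. In other words, I need to come up with a solution that searches for and removes duplicates without using a HashSet. The teacher suggested using nested loops, which is what I'm trying to do, but I keep running into problems with the indexing of the ArrayList after certain entries are removed.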

- Will

1

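If running them through a Set (as others have already suggested) isn't possible, it would help to know whether there are other restrictions, such as O(?). Your current solution is O(n^2), which in computer science courses is usually considered too heavy for this kind of problem. - ponzao

If your teacher has you doing homework in Java, then give him a pragmatic solution using a Set =) - BorisOkunskiy

19 Answers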

17

You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {

    int size = strings.size();
    int duplicates = 0;

    // caching size() in a local variable avoids a method call on every loop check
    // also, i must be less than size - 1 so that j doesn't
    // throw IndexOutOfBoundsException
    for (int i = 0; i < size - 1; i++) {
        // start from the next item after strings[i]
        // since the ones before are checked
        for (int j = i + 1; j < size; j++) {
            // no need for if ( i == j ) here
            if (!strings.get(j).equals(strings.get(i)))
                continue;
            duplicates++;
            strings.remove(j);
            // decrease j because the array got re-indexed
            j--;
            // decrease the size of the array
            size--;
        } // for j
    } // for i

    return duplicates;

}

- Azder

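Without having tested it, this looks ideal. Note that the inner index starts just after the outer index (you don't need to check for duplicates from the beginning of the list each time, because duplicates of the earlier outer-index values have already been checked). Most importantly, it seems to actually answer the question as asked! - AndyT

@Azder - Will it really throw an IndexOutOfBoundsException? Your condition j < size takes care of that, doesn't it? So there is no need to restrict it to i < size - 1. - Cid

Yes, that may be the case, but it still saves i from looping one extra time. - Azder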

14

You can try this one-liner to take a copy of the strings while preserving order.

List<String> list;
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also amortized O(n) instead of O(n^2).
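As a rough, self-contained sketch of the one-liner above (the class name `OrderedDedup` and the helper `dedupe` are made up for illustration), the same idea can also report how many entries were dropped by comparing sizes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical wrapper class, not part of the original answer.
class OrderedDedup {

    // Returns a new list holding the first occurrence of each element,
    // in the order the elements first appeared in the input.
    static List<String> dedupe(List<String> list) {
        return new ArrayList<String>(new LinkedHashSet<String>(list));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("one", "one", "two", "three", "three", "three");
        List<String> dedupped = dedupe(list);
        // The size difference is the number of duplicates that were dropped.
        int removed = list.size() - dedupped.size();
        System.out.println(dedupped + ", removed " + removed); // prints "[one, two, three], removed 3"
    }
}
```

Note that this builds a new list rather than modifying the original one in place.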

- Peter Lawrey

3

Implemented with a set, the running time should be O(n). - Elbek

8

Just to clarify my comment: if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();

// list gets populated from user input...

Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();

- Peter

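OK, I had considered using a HashSet, but this is part of an assignment and the teacher didn't mention HashSets as a possible solution. I think we're supposed to come up with an implementation that doesn't use them. - Will

OK, so your understanding is that this is an assignment meant to test whether you can develop a proper algorithm for removing duplicates, rather than just "get the job done"? I'll clarify your original question/post. - Peter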

4

List<String> lst = new ArrayList<String>();

lst.add("one");
lst.add("one");
lst.add("two");
lst.add("three");
lst.add("three");
lst.add("three");
// HashSet drops duplicates but does not preserve insertion order
Set<String> se = new HashSet<String>(lst);
lst.clear();
lst = new ArrayList<String>(se);
for (String ls : lst) {
    System.out.println("Resulting output---------" + ls);
}

- Psl

4

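"I've been trying to use nested loops to accomplish this, but I've been running into trouble because when entries get deleted, the indexing of the ArrayList changes and things don't work as expected."

Why don't you just decrease the counter each time you delete an entry? When you delete an entry, the remaining elements shift as well. For example:

String[] a = {"a", "a", "b", "c"};

Positions: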

a[0] = "a";
a[1] = "a";    
a[2] = "b";
a[3] = "c";

After you remove the first "a", the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So you should take this into account and decrease the value of j (j--) to avoid "jumping over" a value.
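To make the skipped-element effect concrete, here is a small sketch, not the answerer's own code. The class name `IndexShiftDemo` and the `adjustIndex` flag are invented for the demonstration; the flag only exists so that both behaviours can be run side by side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, written only to illustrate the index shift.
class IndexShiftDemo {

    // Nested-loop removal. When adjustIndex is false, j is NOT decremented
    // after a removal, so the element that slides into position j is skipped.
    static int removeDuplicates(List<String> strings, boolean adjustIndex) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j); // int argument: removes by index
                    duplicates++;
                    if (adjustIndex) {
                        j--; // re-check the element that just moved into index j
                    }
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<String>(Arrays.asList("a", "a", "a", "b"));
        List<String> fixed = new ArrayList<String>(Arrays.asList("a", "a", "a", "b"));
        System.out.println(removeDuplicates(skipped, false) + " " + skipped); // prints "1 [a, a, b]"
        System.out.println(removeDuplicates(fixed, true) + " " + fixed);      // prints "2 [a, b]"
    }
}
```

With three consecutive "a" entries, the version without the decrement leaves one duplicate behind, because the third "a" moves into the index that the loop has just passed.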

Take a look at this screenshot:

[screenshot: "its working"]

- OscarRyz

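Give it a good try, and let me know if you need to see that missing code snippet. You're almost there! - OscarRyz

@BalusC: I don't know. I'll try asking on SuperUser (although I'm pretty sure it will be closed as not computer related). - OscarRyz

@BalusC: It is Monaco http://superuser.com/questions/121123/whats-the-name-of-this-font - OscarRyz

I see, it is only available on the Mac. There is a Windows clone, though; see http://www.webdevkungfu.com/textmate-envy-aka-monaco-font-for-windows/. Thanks :) - BalusC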

3

public Collection removeDuplicates(Collection c) {
// Returns a new collection with duplicates removed from passed collection.
    Collection result = new ArrayList();

    for(Object o : c) {
        if (!result.contains(o)) {
            result.add(o);
        }
    }

    return result;
}

Or

public void removeDuplicates(List l) {
// Removes duplicates in place from an existing list
    Object last = null;
    Collections.sort(l);

    Iterator i = l.iterator();
    while(i.hasNext()) {
        Object o = i.next();
        if (o.equals(last)) {
            i.remove();
        } else {
            last = o;
        }
    }
}

Both are untested.

- Will Hartung

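I like the first approach; it is easy to understand and takes advantage of whatever optimizations are coded into contains(). - Rob Audenaerde

I think the method declaration should be: public <E> Collection<E> removeDuplicates(Collection<E> c) so that it returns a collection of the same type as the input. In your example, passing in a Collection<String> returns a Collection<Object>. But the basic idea is good! - Mario Reutter

Collections.sort() requires the items to be Comparable. - Alex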
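For illustration, a generic version of the first method using the signature suggested in the comments might look like the sketch below. The wrapper class `GenericDedup` and the `static` modifier are assumptions added so that the example runs on its own:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical wrapper class; only the method signature comes from the comment thread.
class GenericDedup {

    // Returns a new collection with duplicates removed. The element type of
    // the result matches the element type of the input collection.
    static <E> Collection<E> removeDuplicates(Collection<E> c) {
        Collection<E> result = new ArrayList<E>();
        for (E e : c) {
            // contains() relies on equals(), so first occurrences are kept in order
            if (!result.contains(e)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Collection<String> unique = removeDuplicates(Arrays.asList("one", "one", "two"));
        System.out.println(unique); // prints "[one, two]"
    }
}
```

Because contains() scans the result list, this stays O(n^2) in the worst case; the benefit is type safety, not speed.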

1

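You could do something like this. Most of what people answered above is one alternative, but here is another.

for (int i = 0; i < strings.size(); i++) {
    // compare against later elements only
    for (int j = i + 1; j < strings.size(); j++) {
        // compare contents with equals(), not references with ==
        if (strings.get(i).equals(strings.get(j))) {
            strings.remove(j);
            j--; // stay on this index: the next element has shifted into it
        }
    }
}

return strings;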

- Tipper Truong

1

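Assuming you can't use a Set, as you said, the easiest way to solve the problem is to use a temporary list rather than attempting to remove the duplicates in place:

import java.util.ArrayList;
import java.util.List;

public class Duplicates {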

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("one");
        list.add("one");
        list.add("two");
        list.add("three");
        list.add("three");
        list.add("three");

        System.out.println("Prior to removal: " +list);
        System.out.println("There were " + removeDuplicates(list) + " duplicates.");
        System.out.println("After removal: " + list);
    }

    public static int removeDuplicates(List<String> list) {
        int removed = 0;
        List<String> temp = new ArrayList<String>();

        for(String s : list) {
            if(!temp.contains(s)) {
                temp.add(s);
            } else {
                //if the string is already in the list, then ignore it and increment the removed counter
                removed++;
            }
        }

        //put the contents of temp back in the main list
        list.clear();
        list.addAll(temp);

        return removed;
    }

}

- Jared Russell

A temporary list doubles the memory footprint of the list. - Alex

0

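Using a set is the best option for removing duplicates:

If you have an array list, you can remove the duplicates and still keep array list functionality: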

 List<String> strings = new ArrayList<String>();
 //populate the array
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

If you can't use a set, sort the list (Collections.sort()) and then iterate over it, checking whether the current element is equal to the previous one; if it is, remove it.
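A minimal sketch of the sort-then-scan idea follows, under two assumptions: the elements are Comparable, and the list contains no nulls. The class name `SortedDedup` is invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical class name; a sketch of the approach described above.
class SortedDedup {

    // Sorts the list in place, then removes every element equal to its
    // predecessor. Returns the number of elements removed.
    // The original ordering is lost and null elements are not supported.
    static <T extends Comparable<? super T>> int removeDuplicates(List<T> list) {
        Collections.sort(list);
        int removed = 0;
        T previous = null;
        for (Iterator<T> it = list.iterator(); it.hasNext(); ) {
            T current = it.next();
            if (current.equals(previous)) {
                it.remove(); // safe removal while iterating
                removed++;
            } else {
                previous = current;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(
                Arrays.asList("three", "one", "two", "one", "three", "three"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " " + list); // prints "3 [one, three, two]"
    }
}
```

Removing through the Iterator avoids the index bookkeeping of the nested-loop versions, although each removal from an ArrayList still shifts the elements that follow it.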

- Theo

- matt b · Accepted Answer

38

Why not use a collection such as Set (and an implementation like HashSet), which naturally prevents duplicates?

- matt b

2

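+1, a Set is the way to go. If you want to count the number of duplicates removed, store the entries in the List as before, construct a Set by passing the List to its constructor, and then compare the sizes of the two to get the number of duplicates. - Peter

1

+1 for the solution, -1 for not fitting "homework", = 0 score. :( @Will didn't tag it as "homework". - OscarRyz

1

What if preserving order is important? - Carl

3

Use a LinkedHashSet. - Ravi Wallau

2

To use a Set, you must implement the equals method so that the Set works correctly with user-created objects. - monksy

equals must be implemented reliably -> if two objects have different memory allocations but were both retrieved from a database that uses a UUID as the primary key, the two objects may be the same object. - Sandy Simonton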
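To illustrate the last two comments, here is a hedged sketch with an invented `Customer` class whose identity is a UUID key. For a HashSet, overriding equals alone is not enough: hashCode has to be overridden consistently as well, which the sketch does.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical example classes, not taken from the thread.
class EqualsDemo {

    // Entity whose identity is its database key, not its memory address.
    static final class Customer {
        private final UUID id;
        private final String name;

        Customer(UUID id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Customer)) return false;
            return id.equals(((Customer) other).id);
        }

        // Must be consistent with equals, otherwise HashSet may look in the wrong bucket.
        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        UUID key = UUID.randomUUID();
        Set<Customer> set = new HashSet<Customer>();
        set.add(new Customer(key, "first load"));
        set.add(new Customer(key, "second load")); // same key, different instance
        System.out.println(set.size()); // prints "1"
    }
}
```

Without the two overrides, the default identity-based equals and hashCode would treat the two instances as distinct and the set would contain both.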