Iterative Cartesian product of an arbitrary number of sets

21

I want to compute the Cartesian product of an arbitrary number of nonempty sets in Java.

I've written this iterative code...

public static <T> List<Set<T>> cartesianProduct(List<Set<T>> list) {
    List<Iterator<T>> iterators = new ArrayList<Iterator<T>>(list.size());
    List<T> elements = new ArrayList<T>(list.size());
    List<Set<T>> toRet = new ArrayList<Set<T>>();
    for (int i = 0; i < list.size(); i++) {
        iterators.add(list.get(i).iterator());
        elements.add(iterators.get(i).next());
    }
    for (int j = 1; j >= 0;) {
        toRet.add(Sets.newHashSet(elements));
        for (j = iterators.size()-1; j >= 0 && !iterators.get(j).hasNext(); j--) {
            iterators.set(j, list.get(j).iterator());
            elements.set(j, iterators.get(j).next());
        }
        elements.set(Math.abs(j), iterators.get(Math.abs(j)).next());
    }
    return toRet;
}

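...but I find it rather inelegant. Does anyone have a better solution that is still iterative? Perhaps one that uses some functional-style approach? Otherwise... any suggestions on how to improve it? Errors?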

10 Answers

25

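I've written a solution that doesn't require you to fill up a large collection in memory. Unfortunately, the code required is hundreds of lines long. You may have to wait until it appears in the Guava project (https://github.com/google/guava), which I hope will be by the end of the year. Sorry. :(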

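Note that if the number of sets you're taking the Cartesian product of is a fixed number known at compile time, you may not need such a utility: you can simply use that number of nested for loops.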
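As a rough illustration of the fixed-arity case mentioned above, here is a minimal sketch for exactly three sets. The class name FixedArityProduct and the method product3 are made up for this example; they are not part of Guava or any other library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FixedArityProduct {
    /** Cartesian product of exactly three sets: one nested loop per set. */
    static <T> List<List<T>> product3(Set<? extends T> as,
                                      Set<? extends T> bs,
                                      Set<? extends T> cs) {
        List<List<T>> result = new ArrayList<>();
        for (T a : as) {
            for (T b : bs) {
                for (T c : cs) {
                    result.add(Arrays.<T>asList(a, b, c));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order, so the output order is predictable.
        Set<String> who = new LinkedHashSet<>(Arrays.asList("John", "Mary"));
        Set<String> verb = new LinkedHashSet<>(Arrays.asList("Eats", "Works"));
        Set<String> what = new LinkedHashSet<>(Arrays.asList("Food"));
        // 2 * 2 * 1 = 4 triples, produced in nested-loop order.
        System.out.println(product3(who, verb, what));
    }
}
```

The obvious limitation is that the number of loops is baked into the source, which is exactly why a general utility is needed once the number of sets is only known at run time.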

Edit: the code is released now.

Sets.cartesianProduct()

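I think you'll be very happy with it. It only creates the individual lists as you ask for them; it doesn't fill up memory with all MxNxPxQ of them.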

If you want to inspect the source, it's here.

Enjoy!
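For readers curious how a product can be served on demand, here is a minimal sketch of the general idea, assuming list inputs: treat the position of a tuple as a mixed-radix number and decode it inside get(). This is not Guava's actual implementation; LazyProduct and of are hypothetical names used only here.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LazyProduct {
    /** Read-only list view; the tuple at a given index is built only when requested. */
    static <T> List<List<T>> of(List<? extends List<? extends T>> axes) {
        final List<List<? extends T>> copy = new ArrayList<>(axes);
        int total = 1;
        for (List<? extends T> axis : copy) {
            total = Math.multiplyExact(total, axis.size()); // throws on int overflow
        }
        final int size = total;
        return new AbstractList<List<T>>() {
            @Override
            public int size() {
                return size;
            }

            @Override
            public List<T> get(int index) {
                if (index < 0 || index >= size) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                // Decode index as a mixed-radix number; the last axis varies fastest.
                List<T> tuple = new ArrayList<>(Collections.<T>nCopies(copy.size(), null));
                int rest = index;
                for (int k = copy.size() - 1; k >= 0; k--) {
                    List<? extends T> axis = copy.get(k);
                    tuple.set(k, axis.get(rest % axis.size()));
                    rest /= axis.size();
                }
                return tuple;
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> product = of(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("x", "y", "z")));
        System.out.println(product.size()); // 6
        System.out.println(product.get(4)); // [b, y]
    }
}
```

Only the input lists are kept in memory; each tuple is allocated when it is asked for and can be garbage-collected afterwards.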


3
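Why is this implemented only for sets rather than for iterables in general (e.g., given a list of iterables, return an iterable of lists)? Of course, with sets you can do a bit more, such as easily checking containment, but I needed this when I had no sets available (and had to implement it myself). - Paŭlo Ebermann
@PaŭloEbermann Sorry I didn't see this before, but there has been a List version right next to it (in the Lists class) all along. - Kevin Bourrillion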

5
With Google Guava 19 and Java 8 this is very simple:
Say you have a list of all the arrays you want to combine...
public static void main(String[] args) {
  List<String[]> elements = Arrays.asList(
    new String[]{"John", "Mary"}, 
    new String[]{"Eats", "Works", "Plays"},
    new String[]{"Food", "Computer", "Guitar"}
  );

  // Create a list of immutableLists of strings
  List<ImmutableList<String>> immutableElements = makeListofImmutable(elements);

  // Use Guava's Lists.cartesianProduct, since Guava 19
  List<List<String>> cartesianProduct = Lists.cartesianProduct(immutableElements);

  System.out.println(cartesianProduct);
}

The method that builds the immutable lists is as follows:
/**
 * @param values the list of all profiles provided by the client in matrix.json
 * @return the list of ImmutableList to compute the Cartesian product of values
 */
private static List<ImmutableList<String>> makeListofImmutable(List<String[]> values) {
  List<ImmutableList<String>> converted = new LinkedList<>();
  values.forEach(array -> {
    converted.add(ImmutableList.copyOf(array));
  });
  return converted;
}

The output is as follows:
[
  [John, Eats, Food], [John, Eats, Computer], [John, Eats, Guitar],
  [John, Works, Food], [John, Works, Computer], [John, Works, Guitar], 
  [John, Plays, Food], [John, Plays, Computer], [John, Plays, Guitar],
  [Mary, Eats, Food], [Mary, Eats, Computer], [Mary, Eats, Guitar],
  [Mary, Works, Food], [Mary, Works, Computer], [Mary, Works, Guitar],
  [Mary, Plays, Food], [Mary, Plays, Computer], [Mary, Plays, Guitar]
]

1
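Here is an iterative, lazy implementation I wrote. The interface is very similar to Google's Sets.cartesianProduct, but it is a bit more flexible: it deals in Iterables instead of Sets. The code and its unit tests are at https://gist.github.com/1911614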
/* Copyright 2012 LinkedIn Corp.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 */

import com.google.common.base.Function;
import com.google.common.collect.Iterables;
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Implements the Cartesian product of ordered collections.
 * 
 * @author <a href="mailto:jmkristian@gmail.com">John Kristian</a>
 */
public class Cartesian {
  /**
   * Generate the <a href="http://en.wikipedia.org/wiki/Cartesian_product">Cartesian
   * product</a> of the given axes. For axes [[a1, a2 ...], [b1, b2 ...], [c1, c2 ...]
   * ...] the product is [[a1, b1, c1 ...] ... [a1, b1, c2 ...] ... [a1, b2, c1 ...] ...
   * [aN, bN, cN ...]]. In other words, the results are generated in same order as these
   * nested loops:
   * 
   * <pre>
   * for (T a : [a1, a2 ...])
   *   for (T b : [b1, b2 ...])
   *     for (T c : [c1, c2 ...])
   *       ...
   *         result = new T[]{ a, b, c ... };
   * </pre>
   * 
   * Each result is a new array of T, whose elements refer to the elements of the axes. If
   * you prefer a List, you can call asLists(product(axes)).
   * <p>
   * Don't change the axes while iterating over their product, as a rule. Changes to an
   * axis can affect the product or cause iteration to fail (which is usually bad). To
   * prevent this, you can pass clones of your axes to this method.
   * <p>
   * The implementation is lazy. This method iterates over the axes, and returns an
   * Iterable that contains a reference to each axis. Iterating over the product causes
   * iteration over each axis. Methods of each axis are called as late as practical.
   */
  public static <T> Iterable<T[]> product(Class<T> resultType,
                                          Iterable<? extends Iterable<? extends T>> axes) {
    return new Product<T>(resultType, newArray(Iterable.class, axes));
  }

  /** Works like product(resultType, Arrays.asList(axes)), but slightly more efficient. */
  public static <T> Iterable<T[]> product(Class<T> resultType, Iterable<? extends T>... axes) {
    return new Product<T>(resultType, axes.clone());
  }

  /**
   * Wrap the given arrays in fixed-size lists. Changes to the lists write through to the
   * arrays.
   */
  public static <T> Iterable<List<T>> asLists(Iterable<? extends T[]> arrays) {
    return Iterables.transform(arrays, new AsList<T>());
  }

  /**
   * Arrays.asList, represented as a Function (as used in Google collections).
   */
  public static class AsList<T> implements Function<T[], List<T>> {
    @Override
    public List<T> apply(T[] array) {
      return Arrays.asList(array);
    }
  }

  /** Create a generic array containing references to the given objects. */
  private static <T> T[] newArray(Class<? super T> elementType, Iterable<? extends T> from) {
    List<T> list = new ArrayList<T>();
    for (T f : from)
      list.add(f);
    return list.toArray(newArray(elementType, list.size()));
  }

  /** Create a generic array. */
  @SuppressWarnings("unchecked")
  private static <T> T[] newArray(Class<? super T> elementType, int length) {
    return (T[]) Array.newInstance(elementType, length);
  }

  private static class Product<T> implements Iterable<T[]> {
    private final Class<T> _resultType;
    private final Iterable<? extends T>[] _axes;

    /** Caution: the given array of axes is contained by reference, not cloned. */
    Product(Class<T> resultType, Iterable<? extends T>[] axes) {
      _resultType = resultType;
      _axes = axes;
    }

    @Override
    public Iterator<T[]> iterator() {
      if (_axes.length <= 0) // an edge case
        return Collections.singleton(newArray(_resultType, 0)).iterator();
      return new ProductIterator<T>(_resultType, _axes);
    }

    @Override
    public String toString() {
      return "Cartesian.product(" + Arrays.toString(_axes) + ")";
    }

    private static class ProductIterator<T> implements Iterator<T[]> {
      private final Iterable<? extends T>[] _axes;
      private final Iterator<? extends T>[] _iterators; // one per axis
      private final T[] _result; // a copy of the last result
      /**
       * The minimum index such that this.next() will return an array that contains
       * _iterators[index].next(). There are some special sentinel values: NEW means this
       * is a freshly constructed iterator, DONE means all combinations have been
       * exhausted (so this.hasNext() == false) and _iterators.length means the value is
       * unknown (to be determined by this.hasNext).
       */
      private int _nextIndex = NEW;
      private static final int NEW = -2;
      private static final int DONE = -1;

      /** Caution: the given array of axes is contained by reference, not cloned. */
      ProductIterator(Class<T> resultType, Iterable<? extends T>[] axes) {
        _axes = axes;
        _iterators = Cartesian.<Iterator<? extends T>> newArray(Iterator.class, _axes.length);
        for (int a = 0; a < _axes.length; ++a) {
          _iterators[a] = axes[a].iterator();
        }
        _result = newArray(resultType, _iterators.length);
      }

      private void close() {
        _nextIndex = DONE;
        // Release references, to encourage garbage collection:
        Arrays.fill(_iterators, null);
        Arrays.fill(_result, null);
      }

      @Override
      public boolean hasNext() {
        if (_nextIndex == NEW) { // This is the first call to hasNext().
          _nextIndex = 0; // start here
          for (Iterator<? extends T> iter : _iterators) {
            if (!iter.hasNext()) {
              close(); // no combinations
              break;
            }
          }
        } else if (_nextIndex >= _iterators.length) {
          // This is the first call to hasNext() after next() returned a result.
          // Determine the _nextIndex to be used by next():
          for (_nextIndex = _iterators.length - 1; _nextIndex >= 0; --_nextIndex) {
            Iterator<? extends T> iter = _iterators[_nextIndex];
            if (iter.hasNext()) {
              break; // start here
            }
            if (_nextIndex == 0) { // All combinations have been generated.
              close();
              break;
            }
            // Repeat this axis, with the next value from the previous axis.
            iter = _axes[_nextIndex].iterator();
            _iterators[_nextIndex] = iter;
            if (!iter.hasNext()) { // Oops; this axis can't be repeated.
              close(); // no more combinations
              break;
            }
          }
        }
        return _nextIndex >= 0;
      }

      @Override
      public T[] next() {
        if (!hasNext())
          throw new NoSuchElementException("!hasNext");
        for (; _nextIndex < _iterators.length; ++_nextIndex) {
          _result[_nextIndex] = _iterators[_nextIndex].next();
        }
        return _result.clone();
      }

      @Override
      public void remove() {
        for (Iterator<? extends T> iter : _iterators) {
          iter.remove();
        }
      }

      @Override
      public String toString() {
        return "Cartesian.product(" + Arrays.toString(_axes) + ").iterator()";
      }
    }
  }
}

1
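The following answer uses iteration rather than recursion. It uses the same Tuple class as my previous answer.
It is a separate answer because, in my view, both are valid but different approaches.
Here is the new main class: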
public class Example {
    public static <T> List<Tuple<T>> cartesianProduct(List<Set<T>> sets) {
        List<Tuple<T>> tuples = new ArrayList<Tuple<T>>();
        for (Set<T> set : sets) {
            if (tuples.isEmpty()) {
                for (T t : set) {
                    Tuple<T> tuple = new Tuple<T>();
                    tuple.add(t);
                    tuples.add(tuple);
                }
            } else {
                List<Tuple<T>> newTuples = new ArrayList<Tuple<T>>();
                for (Tuple<T> subTuple : tuples) {
                    for (T t : set) {
                        Tuple<T> tuple = new Tuple<T>();
                        tuple.addAll(subTuple);
                        tuple.add(t);
                        newTuples.add(tuple);
                    }
                }
                tuples = newTuples;
            }
        }
        return tuples;
    }
}

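An interesting and clean approach, but I have some doubts about the memory consumed by all the intermediate tuples, which are lost "in time, like tears in rain" :P - akappa
Agreed, the performance is probably poor. I guess you're really asking for an algorithm rather than a coding style? - Michael Easter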

1

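Index-based solution

Working with indices is a simple alternative that is fast, memory-efficient, and able to handle any number of sets. Implementing Iterable makes it easy to use in a for-each loop. See the #main method for a usage example.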

public class CartesianProduct implements Iterable<int[]>, Iterator<int[]> {
    private final int[] _lengths;
    private final int[] _indices;
    private boolean _hasNext = true;

    public CartesianProduct(int[] lengths) {
        _lengths = lengths;
        _indices = new int[lengths.length];
    }

    public boolean hasNext() {
        return _hasNext;
    }

    public int[] next() {
        int[] result = Arrays.copyOf(_indices, _indices.length);
        for (int i = _indices.length - 1; i >= 0; i--) {
            if (_indices[i] == _lengths[i] - 1) {
                _indices[i] = 0;
                if (i == 0) {
                    _hasNext = false;
                }
            } else {
                _indices[i]++;
                break;
            }
        }
        return result;
    }

    public Iterator<int[]> iterator() {
        return this;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    /**
     * Usage example. Prints out
     *
     * <pre>
     * [0, 0, 0] a, NANOSECONDS, 1
     * [0, 0, 1] a, NANOSECONDS, 2
     * [0, 0, 2] a, NANOSECONDS, 3
     * [0, 0, 3] a, NANOSECONDS, 4
     * [0, 1, 0] a, MICROSECONDS, 1
     * [0, 1, 1] a, MICROSECONDS, 2
     * [0, 1, 2] a, MICROSECONDS, 3
     * [0, 1, 3] a, MICROSECONDS, 4
     * [0, 2, 0] a, MILLISECONDS, 1
     * [0, 2, 1] a, MILLISECONDS, 2
     * [0, 2, 2] a, MILLISECONDS, 3
     * [0, 2, 3] a, MILLISECONDS, 4
     * [0, 3, 0] a, SECONDS, 1
     * [0, 3, 1] a, SECONDS, 2
     * [0, 3, 2] a, SECONDS, 3
     * [0, 3, 3] a, SECONDS, 4
     * [0, 4, 0] a, MINUTES, 1
     * [0, 4, 1] a, MINUTES, 2
     * ...
     * </pre>
     */
    public static void main(String[] args) {
        String[] list1 = {"a", "b", "c",};
        TimeUnit[] list2 = TimeUnit.values();
        int[] list3 = new int[]{1, 2, 3, 4};

        int[] lengths = new int[]{list1.length, list2.length, list3.length};
        for (int[] indices : new CartesianProduct(lengths)) {
            System.out.println(Arrays.toString(indices) //
                    + " " + list1[indices[0]] //
                    + ", " + list2[indices[1]] //
                    + ", " + list3[indices[2]]);
        }
    }
}

1
Oops, this breaks if you try to iterate over the object twice. - Paŭlo Ebermann
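One way to avoid the problem raised in this comment, sketched under the assumption that only the index arrays are needed: keep the lengths in the Iterable and give every call to iterator() its own counters. IndexProduct is a hypothetical name; this is a variation on the answer above, not the author's code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IndexProduct implements Iterable<int[]> {
    private final int[] lengths;

    IndexProduct(int... lengths) {
        this.lengths = lengths.clone(); // defensive copy
    }

    /** Each call returns a fresh iterator, so the product can be traversed repeatedly. */
    @Override
    public Iterator<int[]> iterator() {
        return new Iterator<int[]>() {
            private final int[] indices = new int[lengths.length];
            // The product is empty as soon as any axis has length 0.
            private boolean hasNext = Arrays.stream(lengths).allMatch(n -> n > 0);

            @Override
            public boolean hasNext() {
                return hasNext;
            }

            @Override
            public int[] next() {
                if (!hasNext) {
                    throw new NoSuchElementException();
                }
                int[] result = indices.clone();
                // Odometer-style increment: the last position moves fastest.
                int i = indices.length - 1;
                while (i >= 0 && ++indices[i] == lengths[i]) {
                    indices[i--] = 0;
                }
                hasNext = i >= 0; // i < 0 means every position wrapped around
                return result;
            }
        };
    }

    public static void main(String[] args) {
        IndexProduct product = new IndexProduct(2, 2);
        for (int pass = 0; pass < 2; pass++) { // the second pass prints the same four rows
            for (int[] indices : product) {
                System.out.println(Arrays.toString(indices));
            }
        }
    }
}
```

Separating the Iterable from its Iterator costs one small allocation per traversal and also makes the zero-length cases explicit.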

0
Here is a lazy iterator approach that uses a function to produce the appropriate output type.
public static <T> Iterable<T> cartesianProduct(
        final Function<Object[], T> fn, Object[]... options) {
    final Object[][] opts = new Object[options.length][];
    for (int i = opts.length; --i >= 0; ) {
        // NPE on null input collections, and handle the empty output case here
        // since the iterator code below assumes that it is not exhausted the
        // first time through fetch.
        if (options[i].length == 0) {
            return Collections.emptySet();
        }
        opts[i] = options[i].clone();
    }
    return new Iterable<T>() {
        public Iterator<T> iterator() {
            return new Iterator<T>() {
                final int[] pos = new int[opts.length];
                boolean hasPending;
                T pending;
                boolean exhausted;

                public boolean hasNext() {
                    fetch();
                    return hasPending;
                }

                public T next() {
                    fetch();
                    if (!hasPending) {
                        throw new NoSuchElementException();
                    }
                    T out = pending;
                    pending = null;  // release for GC
                    hasPending = false;
                    return out;
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }

                private void fetch() {
                    if (hasPending || exhausted) {
                        return;
                    }
                    // Produce a result.
                    int n = pos.length;
                    Object[] args = new Object[n];
                    for (int j = n; --j >= 0; ) {
                        args[j] = opts[j][pos[j]];
                    }
                    pending = fn.apply(args);
                    hasPending = true;
                    // Increment to next.
                    for (int i = n; --i >= 0; ) {
                        if (++pos[i] < opts[i].length) {
                            for (int j = n; --j > i; ) {
                                pos[j] = 0;
                            }
                            return;
                        }
                    }
                    exhausted = true;
                }
            };
        }
    };
}

0

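I wrote a recursive Cartesian product algorithm for tables of strings. You can modify it to work with sets. Here is the algorithm. It is also explained in my article.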

public class Main {
    public static void main(String[] args) {
        String[] A = new String[]{"a1", "a2", "a3"};
        String[] B = new String[]{"b1", "b2", "b3"};
        String[] C = new String[]{"c1"};

        String[] cp = CartesianProduct(0, A, B, C);

        for (String s : cp) {
            System.out.println(s);
        }
    }

    public static String[] CartesianProduct(int prodLevel, String[] res, String[]... s) {
        if (prodLevel < s.length) {
            int cProdLen = res.length * s[prodLevel].length;
            String[] tmpRes = new String[cProdLen];

            for (int i = 0; i < res.length; i++) {
                for (int j = 0; j < s[prodLevel].length; j++) {
                    // Row-major index: each res[i] owns a block of s[prodLevel].length slots.
                    tmpRes[i * s[prodLevel].length + j] = res[i] + s[prodLevel][j];
                }
            }
            res = Main.CartesianProduct(prodLevel + 1, tmpRes, s);
        }
        return res;
    }
}

0

You can use the Stream.reduce method.

Java 9, no additional libraries required.

public static <U> List<Set<U>> cartesianProduct(List<Set<? extends U>> sets) {
    // incorrect incoming data
    if (sets == null) return Collections.emptyList();
    return sets.stream()
            // non-null and non-empty sets
            .filter(set -> set != null && set.size() > 0)
            // represent each set element as Set<U>
            .map(set -> set.stream().map(Set::<U>of)
                    // Stream<List<Set<U>>>
                    .collect(Collectors.toList()))
            // summation of pairs of inner sets
            .reduce((set1, set2) -> set1.stream()
                    // combinations of inner sets
                    .flatMap(inner1 -> set2.stream()
                            // merge two inner sets into one
                            .map(inner2 -> Stream.of(inner1, inner2)
                                    .flatMap(Set::stream)
                                    .collect(Collectors.toSet())))
                    // list of combinations
                    .collect(Collectors.toList()))
            // List<Set<U>>
            .orElse(Collections.emptyList());
}

public static void main(String[] args) {
    Set<Integer> set1 = Set.of(1, 2);
    Set<Double> set2 = Set.of(3.0, 4.0);
    Set<Long> set3 = Set.of(5L, 6L);

    List<Set<Number>> sets = cartesianProduct(List.of(set1, set2, set3));
    // output
    sets.forEach(System.out::println);
}

Output (the order of elements may differ):

[1, 3.0, 5]
[1, 3.0, 6]
[1, 4.0, 5]
[1, 4.0, 6]
[2, 3.0, 5]
[2, 3.0, 6]
[2, 4.0, 5]
[2, 4.0, 6]

See also: Cartesian product of an arbitrary number of sets
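If the order of elements within each tuple matters, or the inputs may share elements, a variant that reduces over lists rather than sets may be easier to work with. The following is only a sketch; StreamProduct and product are hypothetical names, and the combiner deliberately throws because the reduction is assumed to run on a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class StreamProduct {
    static <T> List<List<T>> product(List<List<T>> axes) {
        // Start from the single empty tuple and extend every tuple by one axis at a time.
        List<List<T>> seed = Collections.singletonList(Collections.<T>emptyList());
        return axes.stream().reduce(
                seed,
                (tuples, axis) -> tuples.stream()
                        .flatMap(tuple -> axis.stream().map(value -> {
                            List<T> extended = new ArrayList<>(tuple);
                            extended.add(value);
                            return extended;
                        }))
                        .collect(Collectors.toList()),
                // Never invoked for a sequential stream; parallel use is not supported here.
                (left, right) -> {
                    throw new UnsupportedOperationException("sequential streams only");
                });
    }

    public static void main(String[] args) {
        List<List<String>> out = product(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("x", "y")));
        System.out.println(out); // [[a, x], [a, y], [b, x], [b, y]]
    }
}
```

Like the set-based version, this materializes every intermediate list, so it suits small inputs rather than very large products.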


-1

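You might be interested in another question about Cartesian products (edit: link removed to conserve hyperlinks; search for the Cartesian product tag). That answer has a nice recursive solution that I would be hard pressed to improve on. Do you specifically want an iterative solution rather than a recursive one?


After looking at another iterative solution in Perl on Stack Overflow, together with a clear explanation, here is another solution: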

public static <T> List<Set<T>> uglyCartesianProduct(List<Set<T>> list) {
    int total = numberOfTuples(list);
    List<Set<T>> toRet = new ArrayList<Set<T>>(total);
    if (total == 0) {
        return toRet; // at least one input set is empty
    }

    for (int i = 0; i < total; i++) {
        toRet.add(new HashSet<T>());
    }

    // Number of consecutive tuples that share the same element of the current set.
    int repeat = total;
    for (Set<T> set : list) {
        List<T> values = new ArrayList<T>(set); // copy once for indexed access
        repeat /= values.size();
        for (int index = 0; index < total; index++) {
            // Earlier sets change slowly, later sets change quickly.
            toRet.get(index).add(values.get((index / repeat) % values.size()));
        }
    }
    return toRet;
}

private static <T> int numberOfTuples(List<Set<T>> list) {
    int product = 1;
    for (Set<T> set : list) {
        product *= set.size();
    }
    return product;
}

-1

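I believe this is correct. It is not striving for efficiency, but for a clean style through recursion and abstraction.

The key abstraction is to introduce a simple Tuple class. This helps with the generics later: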

class Tuple<T> {
    private List<T> list = new ArrayList<T>();

    public void add(T t) { list.add(t); }

    public void addAll(Tuple<T> subT) {
        for (T t : subT.list) {
            list.add(t);
        }
    }

    public String toString() {
        String result = "(";

        for (T t : list) { result += t + ", "; }

        result = result.substring(0, result.length() - 2);
        result += " )";

        return result;
    }
}

With this class, we can write a class like so:

public class Example {
    public static <T> List<Tuple<T>> cartesianProduct(List<Set<T>> sets) {
        List<Tuple<T>> tuples = new ArrayList<Tuple<T>>();

        if (sets.size() == 1) {
            Set<T> set = sets.get(0);
            for (T t : set) {
                Tuple<T> tuple = new Tuple<T>();
                tuple.add(t);
                tuples.add(tuple);
            }
        } else {
            Set<T> set = sets.remove(0);
            List<Tuple<T>> subTuples = cartesianProduct(sets);
            System.out.println("TRACER size = " + tuples.size());
            for (Tuple<T> subTuple : subTuples) {
                for (T t : set) {
                    Tuple<T> tuple = new Tuple<T>();
                    tuple.addAll(subTuple);
                    tuple.add(t);
                    tuples.add(tuple);
                }
            }
        }
        return tuples;
    }
}

I have a nice example that works, but it is omitted for brevity.

