如何获得子集的所有可能组合?

3
考虑这个 List<string>
List<string> data = new List<string>();
data.Add("Text1");
data.Add("Text2");
data.Add("Text3");
data.Add("Text4");

我遇到的问题是:如何获得列表子集的每个组合?就像这样:
#Subset Dimension 4
Text1;Text2;Text3;Text4

#Subset Dimension 3
Text1;Text2;Text3;
Text1;Text2;Text4;
Text1;Text3;Text4;
Text2;Text3;Text4;

#Subset Dimension 2
Text1;Text2;
Text1;Text3;
Text1;Text4;
Text2;Text3;
Text2;Text4;

#Subset Dimension 1
Text1;
Text2;
Text3;
Text4;

我想到了一个不错的解决方案,值得在这里分享。


严谨一点,但是如果不包括子集维度1,你就没有列表的每个子集。 - Esoteric Screen Name
4个回答

4
与Abaco的回答逻辑类似,但实现方式不同....
foreach (var ss in data.SubSets_LB())
{
    Console.WriteLine(String.Join("; ",ss));
}

public static class SO_EXTENSIONS
{
    public static IEnumerable<IEnumerable<T>> SubSets_LB<T>(
      this IEnumerable<T> enumerable)
    {
        List<T> list = enumerable.ToList();
        ulong upper = (ulong)1 << list.Count;

        for (ulong i = 0; i < upper; i++)
        {
            List<T> l = new List<T>(list.Count);
            for (int j = 0; j < sizeof(ulong) * 8; j++)
            {
                if (((ulong)1 << j) >= upper) break;

                if (((i >> j) & 1) == 1)
                {
                    l.Add(list[j]);
                }
            }

            yield return l;
        }
    }
}

+1 是因为我在很大程度上抄袭了你的答案,但稍微修改了一下。 - Jodrell
如果您有兴趣,我进行了另一个团队建设练习。请查看我的扩展回答。 - Jodrell

3

我认为,这个问题的答案需要一些性能测试。我将尝试进行。它是一个社区维基,欢迎随时更新。

void PerfTest()
{
    var list = Enumerable.Range(0, 21).ToList();

    var t1 = GetDurationInMs(list.SubSets_LB);
    var t2 = GetDurationInMs(list.SubSets_Jodrell2);
    var t3 = GetDurationInMs(() => list.CalcCombinations(20));

    Console.WriteLine("{0}\n{1}\n{2}", t1, t2, t3);
}

long GetDurationInMs(Func<IEnumerable<IEnumerable<int>>> fxn)
{
    fxn(); //JIT???
    var count = 0;

    var sw = Stopwatch.StartNew();
    foreach (var ss in fxn())
    {
        count = ss.Sum();
    }
    return sw.ElapsedMilliseconds;
}

输出:

1281
1604 (_Jodrell not _Jodrell2)
6817

Jodrell的更新

我已经使用发布模式进行构建,即开启了优化。 当我通过Visual Studio运行时,我没有得到一个在1或2之间的一致偏差,但是经过多次运行,LB的答案胜出,我得到接近以下内容的答案:

1190
1260
more

但是如果我通过命令行运行测试套件而不是通过Visual Studio,我得到的结果更像这样

987
879
still more

已经点赞了之前的所有帖子。感谢您宝贵的贡献。 当然,我还有一些工作要做 :) - Abaco
1
@Abaco,如我在我的详细回答中所述,我合并了一些内容以获得(在我的测试中)迄今为止最佳的性能。http://stackoverflow.comhttps://dev59.com/questions/ImvXa4cB1Zd3GeqPM89d#13768100 - Jodrell
考虑到所有答案的实用性,我决定接受维基作为一种总结。 - Abaco

2

编辑

我接受了性能挑战,并将所有答案中最好的部分结合起来。在我的测试中,它似乎具有迄今为止最佳的性能。

public static IEnumerable<IEnumerable<T>> SubSets_Jodrell2<T>(
    this IEnumerable<T> source)
{
    var list = source.ToList();
    var limit = (ulong)(1 << list.Count);

    for (var i = limit; i > 0; i--)
    {
        yield return list.SubSet(i);
    }
}

private static IEnumerable<T> SubSet<T>(
    this IList<T> source, ulong bits)
{
    for (var i = 0; i < source.Count; i++)
    {
        if (((bits >> i) & 1) == 1)
        {
            yield return source[i];
        }
    }
}

同样的想法,几乎与L.B的答案相同,但是这是我自己的解释。

我避免使用内部的ListMath.Pow

public static IEnumerable<IEnumerable<T>> SubSets_Jodrell(
    this IEnumerable<T> source)
{
    var count = source.Count();

    if (count > 64)
    {
        throw new OverflowException("Not Supported ...");
    }

    var limit = (ulong)(1 << count) - 2;

    for (var i = limit; i > 0; i--)
    {
        yield return source.SubSet(i);
    }
}

private static IEnumerable<T> SubSet<T>(
    this IEnumerable<T> source,
    ulong bits)
{
    var check = (ulong)1;
    foreach (var t in source)
    {
        if ((bits & check) > 0)
        {
            yield return t;
        }

        check <<= 1;
    }
}

请注意,这些方法在初始集合中的元素超过64个时不起作用,但当数量增加时,它们变得很慢。

Jodrell,好棒的代码。但是就性能而言,我的测试结果与你的不同(我使用了下面(或上面)的PerfTest代码)。 - L.B
@L.B,我的测试代码肯定是错的,我重新测试并修正了维基。 - Jodrell

1

我为列表开发了一个简单的扩展方法:

    /// <summary>
    /// Obtain all the combinations of the elements contained in a list
    /// </summary>
    /// <param name="subsetDimension">Subset Dimension</param>
    /// <returns>IEnumerable containing all the differents subsets</returns>
    public static IEnumerable<List<T>> CalcCombinations<T>(this List<T> list, int subsetDimension)
    {
        //First of all we will create a binary matrix. The dimension of a single row
        //must be the dimension of list 
        //on which we are working (we need a 0 or a 1 for every single element) so row
        //dimension is to obtain a row-length = list.count we have to
        //populate the matrix with the first 2^list.Count binary numbers
        int rowDimension = Convert.ToInt32(Math.Pow(2, list.Count));

        //Now we start counting! We will fill our matrix with every number from 1 
        //(0 is meaningless) to rowDimension
        //we are creating binary mask, hence the name
        List<int[]> combinationMasks = new List<int[]>();
        for (int i = 1; i < rowDimension; i++)
        {
            //I'll grab the binary rapresentation of the number
            string binaryString = Convert.ToString(i, 2);

            //I'll initialize an array of the apropriate dimension
            int[] mask = new int[list.Count];

            //Now, we have to convert our string in a array of 0 and 1, so first we 
            //obtain an array of int then we have to copy it inside our mask 
            //(which have the appropriate dimension), the Reverse()
            //is used because of the behaviour of CopyTo()
            binaryString.Select(x => x == '0' ? 0 : 1).Reverse().ToArray().CopyTo(mask, 0);

            //Why should we keep masks of a dimension which isn't the one of the subset?
            // We have to filter it then!
            if (mask.Sum() == subsetDimension) combinationMasks.Add(mask);
        }

        //And now we apply the matrix to our list
        foreach (int[] mask in combinationMasks)
        {
            List<T> temporaryList = new List<T>(list);

            //Executes the cycle in reverse order to avoid index out of bound
            for (int iter = mask.Length - 1; iter >= 0; iter--)
            {
                //Whenever a 0 is found the correspondent item is removed from the list
                if (mask[iter] == 0)
                    temporaryList.RemoveAt(iter);
            }
            yield return temporaryList;
        }
    }
}

因此,考虑问题中的示例:

# Row Dimension of 4 (list.Count)
Binary Numbers to 2^4

# Binary Matrix
0 0 0 1 => skip
0 0 1 0 => skip
[...]
0 1 1 1 => added // Text2;Text3;Text4
[...]
1 0 1 1 => added // Text1;Text3;Text4
1 1 0 0 => skip
1 1 0 1 => added // Text1;Text2;Text4
1 1 1 0 => added // Text1;Text2;Text3
1 1 1 1 => skip

希望这能帮到某些人 :)
如果您需要澄清或想要贡献,请随意添加答案或评论(哪一个更合适)。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接