当比较两个不同长度的整数数组时,如何提高性能?

4

我在这个比较器中遇到了瓶颈(或者至少是我认为可以改进的地方),它基本上是一个序数字符串比较器,但是针对整数(ushort,不过我认为这并不重要)数组。

数组的长度可能不同,但是如果元素[0..n]匹配,其中n是最短数组的长度,则长度仅会产生影响。在这种情况下,较长的数组被认为是“更大的”。

1 2 3 < 1 2 4

1 2 3 5 < 1 2 4

1 2 5 3 > 1 2 4

1 2 3 4 > 1 2 3

    public int Compare(ushort[] x, ushort[] y)
    {
        int pos = 0;
        int len = Math.Min(x.Length, y.Length);
        while (pos < len && x[pos] == y[pos])
            pos++;

        return pos < len ?
            x[pos].CompareTo(y[pos]) :
            x.Length.CompareTo(y.Length);

    }

有没有关于如何优化这个问题的想法?
更新
回应评论者提出的关于我实际在做什么的问题:我意识到很久以前我曾经问过一个与此相关的问题,它清楚地展示了我在上下文中所做的事情。唯一的主要区别是我现在使用一个ushort数组而不是字符串作为键,因为它更加紧凑。
使用整个路径作为键让我可以使用部分键从排序集合中获取视图,这对于子集查询提供了高性能。我正在尝试提高建立索引时的性能。数据结构用于子集索引搜索 顺便说一句,我对这里迄今为止的回答印象非常深刻,我在SO上问了很多问题,但这是我见过的最周到和有趣的答案集合。我还不确定我的具体问题的正确答案是什么(即数百万短数组的比较),但每一个答案都教给了我一些我不知道的东西。

2
你可以跳过 Math.Minint len = x.Length > y.Length ? y.Length : x.Length; - Dustin Kingen
你是在做很多小数组的比较,还是在做一些非常大数组的比较? - fjardon
如果必须遍历数组,那么你就不能更好了... 你能否将数据表示为其他形式(例如类似于基数树),这样相同部分的比较就不需要遍历所有元素了? - Alexei Levenkov
@Romoku 目前还没有尝试所有的建议,但到目前为止,您的评论已经产生了最大的影响 :-/ .. 我还摆脱了 CompareToreturn 中的使用,并将其编码成长格式,这也有所帮助。有时候最明显的事情是最有效的。 - Jamie Treworgy
其实我忘了很久以前我曾经问过一个与此相关的问题,这正好展示了我正在做的事情,只不过现在我使用的是ushort数组而不是字符串作为键,因为它更加紧凑。使用整个路径作为键让我可以使用部分键从排序集合中获取视图,这对于子集查询提供了高性能。我正在尝试提高构建索引时的性能。https://dev59.com/dVnUa4cB1Zd3GeqPXiBN - Jamie Treworgy
显示剩余3条评论
4个回答

3

以下是我想到的方法,我结合了你的代码和一些并行代码,测试了大约1600万(2^24)。

public int CompareParallel(ushort[]x, ushort[] y, int len, int segLen)
{
    int compareArrLen = ( len / segLen ) + 1;
    int [ ] compareArr = new int [ compareArrLen ];
    Parallel.For ( 0 , compareArrLen , 
                   new Action<int , ParallelLoopState> ( ( i , state ) =>
    {
        if ( state.LowestBreakIteration.HasValue 
                 && state.LowestBreakIteration.Value < i )
            return;
        int segEnd = ( i + 1 ) * segLen;
        int k = len < segEnd ? len : segEnd;
        for ( int j = i * segLen ; j < k ; j++ )
            if ( x [ j ] != y [ j ] )
            {
                compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                state.Break ( );
                return;
            }
    } ) );
    int r = compareArrLen - 1;
    while ( r >= 0 )
    {
        if ( compareArr [ r ] != 0 )
            return compareArr [ r ];
        r--;
    }
    return x.Length.CompareTo ( y.Length );
}

public int CompareSequential ( ushort [ ] x , ushort [ ] y, int len )
{
    int pos = 0;
    while ( pos < len && x [ pos ] == y [ pos ] )
        pos++;

    return pos < len ?
        x [ pos ].CompareTo ( y [ pos ] ) :
        x.Length.CompareTo ( y.Length );

}

public int Compare( ushort [ ] x , ushort [ ] y ) 
{
    //determined through testing to be the best on my machine
    const int cutOff = 4096;
    int len = x.Length < y.Length ? x.Length : y.Length;
    //check if len is above a specific threshold 
    //and if first and a number in the middle are equal
    //chose equal because we know that there is a chance that more
    //then 50% of the list is equal, which would make the overhead
    //worth the effort
    if ( len > cutOff && x [ len - 1 ] == y [ len - 1 ] 
           && x [ len/2 ] == y [ len/2 ] )
    {
        //segment length was determined to be best through testing
        //at around 8% of the size of the array seemed to have the
        //on my machine
        return CompareParallel ( x , y , len , (len / 100)*8 );
    }
    return CompareSequential ( x , y, len );
}

这是我写的测试:

class Program
{
    [Flags]
    private enum InfoLevel:byte
    {
        Detail=0x01, Summary=0x02
    }

    private static InfoLevel logLevel = InfoLevel.Summary;

    private static void LogDetail ( string content ) 
    {
        LogInfo ( InfoLevel.Detail,content );
    }

    private static void LogSummary ( string content ) 
    {
        LogInfo ( InfoLevel.Summary , content );
    }

    private static void LogInfo ( InfoLevel level , string content ) 
    {
        if ( ( level & logLevel ) == level )
            Console.WriteLine ( content );
    }

    private static void LogInfo ( InfoLevel level , string format, 
                                  params object[] arg )
    {
        if ( ( level & logLevel ) == level )
            Console.WriteLine ( format:format, arg:arg  );
    }

    private static void LogDetail ( string format , params object [ ] arg )
    {
        LogInfo ( InfoLevel.Detail , format, arg );
    }

    private static void LogSummary ( string format , params object [ ] arg )
    {
        LogInfo ( InfoLevel.Summary , format, arg );
    }

    const string _randTestResultHeader = "\r\nRandom Array Content\r\n";
    const string _equalArrayResultHeader = "Only Length Different\r\n\r\n";
    const string _summaryTestResultsHeader = 
                                "Size\t\tOrig Elps\tPara Elps\tComp Elps\r\n";
    const string _summaryBodyContent = 
                         "{0}\t\t{1:0.0000}\t\t{2:0.0000}\t\t{3:0.00000}\r\n";

    static void Main ( string [ ] args )
    {
        Console.SetOut(new StreamWriter(File.Create("out.txt")));

        int segLen = 0;
        int segPercent = 7;
        Console.WriteLine ( "Algorithm Test, Time results in milliseconds" );
        for ( ; segPercent < 13; segPercent ++ )
        {
            Console.WriteLine ( 
                      "Test Run with parallel Dynamic segment size at {0}%"
                       +" of Array Size (Comp always at 8%)\r\n" , segPercent);

            StringBuilder _aggrRandResults = new StringBuilder ( );
            StringBuilder _aggrEqualResults = new StringBuilder ( );

            _aggrRandResults.Append ( _randTestResultHeader );
            _aggrEqualResults.Append ( _equalArrayResultHeader );

            _aggrEqualResults.Append ( _summaryTestResultsHeader );
            _aggrRandResults.Append ( _summaryTestResultsHeader );


            for ( int i = 10 ; i < 25 ; i++ )
            {
                int baseLen = ( int ) Math.Pow ( 2 , i );
                segLen = ( baseLen / 100 ) * segPercent;

                var testName = "Equal Length ";
                var equalTestAverage = RandomRunTest ( testName , baseLen , 
                                                       baseLen, segLen );
                testName = "Left Side Larger";
                var lslargerTestAverage=RandomRunTest(testName,baseLen+10, 
                                                      baseLen, segLen );
                testName = "Right Side Larger";
                var rslargerTestAverage = RandomRunTest ( testName , baseLen ,
                                                        baseLen + 10, segLen );

                double [ ] completelyRandomTestAvg = new double [ 3 ];
                for ( int l = 0 ; l < completelyRandomTestAvg.Length ; l++ )
                    completelyRandomTestAvg [ l ] = ( equalTestAverage [ l ] +
                                                 lslargerTestAverage [ l ] +
                                              rslargerTestAverage [ l ] ) / 3;

                LogDetail ( "\r\nRandom Test Results:" );
                LogDetail ("Original Composite Test Average: {0}" ,
                           completelyRandomTestAvg [ 0 ] );
                LogDetail ( "Parallel Composite Test Average: {0}" ,
                            completelyRandomTestAvg [ 1 ]  );

                _aggrRandResults.AppendFormat ( _summaryBodyContent , 
                    baseLen , 
                    completelyRandomTestAvg [ 0 ] , 
                    completelyRandomTestAvg [ 1 ] , 
                    completelyRandomTestAvg [ 2 ]);

                testName = "Equal Len And Values";
                var equalEqualTest = EqualTill ( testName , baseLen , 
                                                 baseLen, segLen );

                testName = "LHS Larger";
                var equalLHSLargerTest = EqualTill ( testName , baseLen + 10 , 
                                                     baseLen, segLen );

                testName = "RHS Larger";
                var equalRHSLargerTest = EqualTill ( testName , baseLen , 
                                                     baseLen + 10, segLen );

                double [ ] mostlyEqualTestAvg = new double [ 3 ];
                for ( int l = 0 ; l < mostlyEqualTestAvg.Length ; l++ )
                    mostlyEqualTestAvg [ l ] = ( ( equalEqualTest [ l ] +
                                            equalLHSLargerTest [ l ] +
                                            equalRHSLargerTest [ l ] ) / 3 );

                LogDetail( "\r\nLength Different Test Results" );
                LogDetail( "Original Composite Test Average: {0}" , 
                           mostlyEqualTestAvg [ 0 ] );
                LogDetail( "Parallel Composite Test Average: {0}" , 
                            mostlyEqualTestAvg [ 1 ] );

                _aggrEqualResults.AppendFormat ( _summaryBodyContent , 
                                                 baseLen , 
                                                 mostlyEqualTestAvg [ 0 ] , 
                                                 mostlyEqualTestAvg [ 1 ] ,
                                                 mostlyEqualTestAvg [ 2 ]);
            }

            LogSummary ( _aggrRandResults.ToString() + "\r\n");
            LogSummary ( _aggrEqualResults.ToString()+ "\r\n");

        }
        Console.Out.Flush ( );
    }


    private const string _testBody = 
                  "\r\n\tOriginal:: Result:{0}, Elapsed:{1}"
                 +"\r\n\tParallel:: Result:{2}, Elapsed:{3}"
                 +"\r\n\tComposite:: Result:{4}, Elapsed:{5}";
    private const string _testHeader = 
                  "\r\nTesting {0}, Array Lengths: {1}, {2}";
    public static double[] RandomRunTest(string testName, int shortArr1Len, 
                                         int shortArr2Len, int parallelSegLen)
    {

        var shortArr1 = new ushort [ shortArr1Len ];
        var shortArr2 = new ushort [ shortArr2Len ];
        double [ ] avgTimes = new double [ 3 ];

        LogDetail ( _testHeader , testName , shortArr1Len , shortArr2Len ) ;
        for ( int i = 0 ; i < 10 ; i++ )
        {
            int arrlen1 = shortArr1.Length , arrlen2 = shortArr2.Length;

            double[] currResults = new double [ 3 ];

            FillCompareArray ( shortArr1 , shortArr1.Length );
            FillCompareArray ( shortArr2 , shortArr2.Length );

            var sw = new Stopwatch ( );

            //Force Garbage Collection 
            //to avoid having it effect 
            //the test results this way 
            //test 2 may have to garbage 
            //collect due to running second
            GC.Collect ( );
            sw.Start ( );
            int origResult = Compare ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currResults[0] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int parallelResult = CompareParallelOnly ( shortArr1 , shortArr2, 
                                                       parallelSegLen );
            sw.Stop ( );
            currResults [ 1 ] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int compositeResults = CompareComposite ( shortArr1 , shortArr2 );
            sw.Stop ( );                
            currResults [ 2 ] = sw.Elapsed.TotalMilliseconds;

            LogDetail ( _testBody, origResult , currResults[0] , 
                        parallelResult , currResults[1], 
                        compositeResults, currResults[2]);

            for ( int l = 0 ; l < currResults.Length ; l++ )
                avgTimes [ l ] = ( ( avgTimes[l]*i)+currResults[l]) 
                                    / ( i + 1 );
        }
        LogDetail ( "\r\nAverage Run Time Original: {0}" , avgTimes[0]);
        LogDetail ( "Average Run Time Parallel: {0}" , avgTimes[1]);
        LogDetail ( "Average Run Time Composite: {0}" , avgTimes [ 2 ] );

        return avgTimes;
    }

    public static double [ ] EqualTill ( string testName, int shortArr1Len , 
                                       int shortArr2Len, int parallelSegLen)
    {

        const string _testHeader = 
               "\r\nTesting When Array Difference is "
               +"Only Length({0}), Array Lengths: {1}, {2}";

        int baseLen = shortArr1Len > shortArr2Len 
                          ? shortArr2Len : shortArr1Len;

        var shortArr1 = new ushort [ shortArr1Len ];
        var shortArr2 = new ushort [ shortArr2Len ];
        double [ ] avgTimes = new double [ 3 ];

        LogDetail( _testHeader , testName , shortArr1Len , shortArr2Len );
        for ( int i = 0 ; i < 10 ; i++ )
        {

            FillCompareArray ( shortArr1 , shortArr1Len);
            Array.Copy ( shortArr1 , shortArr2, baseLen );
            double [ ] currElapsedTime = new double [ 3 ];
            var sw = new Stopwatch ( );
            //See previous explaination 
            GC.Collect ( );
            sw.Start ( );
            int origResult = Compare ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currElapsedTime[0] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int parallelResult = CompareParallelOnly ( shortArr1, shortArr2, 
                                     parallelSegLen );
            sw.Stop ( );
            currElapsedTime[1] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            var compositeResult = CompareComposite ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currElapsedTime [ 2 ] = sw.Elapsed.TotalMilliseconds;

            LogDetail ( _testBody , origResult , currElapsedTime[0] , 
                parallelResult , currElapsedTime[1], 
                compositeResult,currElapsedTime[2]);

            for ( int l = 0 ; l < currElapsedTime.Length ; l++ )
                avgTimes [ l ] = ( ( avgTimes [ l ] * i ) 
                                   + currElapsedTime[l])/(i + 1);
        }
        LogDetail ( "\r\nAverage Run Time Original: {0}" , avgTimes [ 0 ] );
        LogDetail ( "Average Run Time Parallel: {0}" , avgTimes [ 1 ] );
        LogDetail ( "Average Run Time Composite: {0}" , avgTimes [ 2 ] );
        return avgTimes;
    }


    static Random rand = new Random ( );
    public static void FillCompareArray ( ushort[] compareArray, int length ) 
    {
        var retVals = new byte[length];
        ( rand ).NextBytes ( retVals );
        Array.Copy ( retVals , compareArray , length);
    }

    public static int CompareParallelOnly ( ushort [ ] x , ushort[] y, 
                                            int segLen ) 
    {
       int len = x.Length<y.Length ? x.Length:y.Length;
       int compareArrLen = (len/segLen)+1;
       int[] compareArr = new int [ compareArrLen ];
       Parallel.For ( 0 , compareArrLen , 
           new Action<int , ParallelLoopState> ( ( i , state ) =>
       {
           if ( state.LowestBreakIteration.HasValue 
                    && state.LowestBreakIteration.Value < i )
               return;
           int segEnd = ( i + 1 ) * segLen;
           int k = len<segEnd?len:segEnd;

           for ( int j = i * segLen ; j < k ; j++ )
               if ( x [ j ] != y [ j ] )
               {
                   compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                   state.Break ( );
                   return;
               }
       } ) );
       int r=compareArrLen-1;
       while ( r >= 0 ) 
       {
           if ( compareArr [ r ] != 0 )
               return compareArr [ r ];
           r--;
       }
       return x.Length.CompareTo ( y.Length );
    }

    public static int Compare ( ushort [ ] x , ushort [ ] y )
    {
        int pos = 0;
        int len = Math.Min ( x.Length , y.Length );
        while ( pos < len && x [ pos ] == y [ pos ] )
            pos++;

        return pos < len ?
            x [ pos ].CompareTo ( y [ pos ] ) :
            x.Length.CompareTo ( y.Length );

    }

    public static int CompareParallel ( ushort[] x, ushort[] y, int len, 
                                        int segLen )
    {
        int compareArrLen = ( len / segLen ) + 1;
        int [ ] compareArr = new int [ compareArrLen ];
        Parallel.For ( 0 , compareArrLen , 
            new Action<int , ParallelLoopState> ( ( i , state ) =>
        {
            if ( state.LowestBreakIteration.HasValue 
                 && state.LowestBreakIteration.Value < i )
                return;
            int segEnd = ( i + 1 ) * segLen;
            int k = len < segEnd ? len : segEnd;
            for ( int j = i * segLen ; j < k ; j++ )
                if ( x [ j ] != y [ j ] )
                {
                    compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                    state.Break ( );
                    return;
                }
        } ) );
        int r = compareArrLen - 1;
        while ( r >= 0 )
        {
            if ( compareArr [ r ] != 0 )
                return compareArr [ r ];
            r--;
        }
        return x.Length.CompareTo ( y.Length );
    }

    public static int CompareSequential(ushort [ ] x , ushort [ ] y, int len)
    {
        int pos = 0;
        while ( pos < len && x [ pos ] == y [ pos ] )
            pos++;

        return pos < len ?
            x [ pos ].CompareTo ( y [ pos ] ) :
            x.Length.CompareTo ( y.Length );

    }

    public static int CompareComposite ( ushort [ ] x , ushort [ ] y ) 
    {
        const int cutOff = 4096;
        int len = x.Length < y.Length ? x.Length : y.Length;

        if ( len > cutOff && x [ len - 1 ] == y [ len - 1 ]
                 && x [ len/2 ] == y [ len/2 ] )
            return CompareParallel ( x , y , len , (len / 100)*8 );

        return CompareSequential ( x , y, len );
    }
}

注意:

确保使用优化过的代码进行构建,如果没有包含此步骤,则结果会有很大不同,它会使并行代码看起来比实际改进要大得多。

我得到的结果是:在非常长的相等数字集上,执行时间减少了约33%。随着输入的增加,它仍然呈线性增长,但增长速度较慢。对于小数据集(少于4092个),它也开始变慢,但通常所需时间足够短(在我的计算机上为0.001毫秒),所以如果您得到大型几乎相等的数组,使用它将是值得的。


我正在比较许多相对较小的数组。这些数组代表HTML文档中节点的路径;元素数量是节点的深度,因此大多数情况下不太可能超过30或40。无论如何,Charles,你做得很好! - pkuderov
接受这个答案,因为它可能是最普遍适用的,尽管除了@Romoku的评论“摆脱Math.min”之外,它们都没有为我的非常特定的用例做太多事情。但是任何阅读此内容的人都应该建议查看所有答案,这里有很多见解。 - Jamie Treworgy

1

可能不会有太大的差别,但你可以将最后一个元素设置为不同的值,以消除 while 循环中的 pos < len 检查。还有相当琐碎的 pos++ 可以改成 ++pos

public int Compare(ushort[] x, ushort[] y)
{
    int pos = 0;
    int len = Math.Min(x.Length, y.Length);

    // the below is probably not worth it for less than 5 (or so) elements,
    //   so just do the old way
    if (len < 5)
    {
      while (pos < len && x[pos] == y[pos])
        ++pos;

      return pos < len ?
        x[pos].CompareTo(y[pos]) :
        x.Length.CompareTo(y.Length);
    }

    ushort lastX = x[len-1];
    bool lastSame = true;
    if (x[len-1] == y[len-1])
        --x[len-1]; // can be anything else
    else
        lastSame = false;

    while (x[pos] == y[pos])
        ++pos;

    return pos < len-1 ?
        x[pos].CompareTo(y[pos]) :
        lastSame ? x.Length.CompareTo(y.Length)
                 : lastX.CompareTo(y[len-1]);
}

编辑:只有当许多元素从一开始就相同时,您才会真正获得性能提升(并且当存在早期差异时,它会更糟,正如pkuderov所提到的那样)。


看起来不错!虽然在前几个项目不同(早期循环中断)的情况下会有一些性能下降,但我认为这是值得的。 - pkuderov
有趣。因为这些数组代表树中的路径(在此之前我应该提到),事实上当这些数组具有相同的长度时,大多数元素会经常匹配。我要尝试一下这个... - Jamie Treworgy

1
抱歉回答有点长,但这个问题让我很感兴趣,花了几个小时进行调查,并想分享结果。我编写了一些测试用例生成器和粗略的性能测试工具。
以下是内容:
1. 生成完全随机的数组 2. 检查3种比较方法的执行时间 3. 生成高相似度的数组 4. 检查执行时间。
我使用了3种方法:
1. OP的 2. 我的 - 思路 - 将两个索引操作更改为指针增量 3. Dukeling的 - 思路 - 删除不必要的比较
我从短数组(长度为5-15)开始。
在两种测试变化中,方法1是最快的(由pkuderov预测)。
如果我们增加数组的长度,情况就会改变。
当数组长度在500到1500之间时,我得到了以下结果。
Generating test cases ...
Done. (5258 milliseconds)
Compare1 took 18 milliseconds
Compare2 took 18 milliseconds
Compare3 took 33 milliseconds
Generating 'similar' test cases ...
Done. (5081 milliseconds)
Compare1 took 359 milliseconds
Compare2 took 313 milliseconds
Compare3 took 295 milliseconds

因此,与方法1相比,方法2略有收益,而与方法2相比,方法3的收益更小;

解决方案:

1. 如果您的数组足够短和/或有很高的概率差异从小索引值开始 - 您不能做太多事情(使用建议的方法)
2. 否则,您可以尝试一些方法2和3的组合。

代码:

using System;
using System.Diagnostics;

namespace ConsoleExamples
    {
        class ArrayComparePerformance
        {
            static readonly int testArraysNum = 100000;
            static readonly int maxArrayLen = 1500;
            static readonly int minArrayLen = 500;
            static readonly int maxValue = 10;

            public static void RunTest()
            {
                //Generate random arrays;
                ushort[][] a = new ushort[testArraysNum][];
                ushort[][] b = new ushort[testArraysNum][];

                Random rand = new Random();

                Console.WriteLine("Generating test cases ... " );

                Stopwatch sw = new Stopwatch();
                sw.Start();

                for (int i = 0; i < testArraysNum; i++)
                {
                    int len = rand.Next(maxArrayLen) + 1;
                    a[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        a[i][j] = (ushort) rand.Next(maxValue);
                    }



                    len = rand.Next(maxArrayLen) + 1;
                    b[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        b[i][j] = (ushort) rand.Next(maxValue);
                    }



                }

                sw.Stop();
                Console.WriteLine("Done. ({0} milliseconds)", sw.ElapsedMilliseconds);


                //compare1
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare1(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare1 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare2
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare2(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare2 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare3
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare3(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare3 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");


                //Generate "similar" arrays;

                Console.WriteLine("Generating 'similar' test cases ... ");

                sw.Restart();

                for (int i = 0; i < testArraysNum; i++)
                {
                    int len = rand.Next(maxArrayLen - minArrayLen) + minArrayLen -1;
                    a[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        if (j < len/2)
                            a[i][j] = (ushort)j;
                        else
                            a[i][j] = (ushort)(rand.Next(2)  + j);
                    }



                    len = rand.Next(maxArrayLen - minArrayLen) + minArrayLen - 1;
                    b[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        if (j < len/2)
                            b[i][j] = (ushort)j;
                        else
                            b[i][j] = (ushort)(rand.Next(2)  + j);
                    }
                }

                sw.Stop();
                Console.WriteLine("Done. ({0} milliseconds)", sw.ElapsedMilliseconds);


                //compare1
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare1(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare1 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare2
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare2(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare2 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare3
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare3(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare3 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");


                Console.ReadKey();
            }

            public static int Compare1(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);
                while (pos < len && x[pos] == y[pos])
                    pos++;

                return pos < len ?
                    x[pos].CompareTo(y[pos]) :
                    x.Length.CompareTo(y.Length);
            }

            public unsafe static int Compare2(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);
                fixed (ushort* fpx = &x[0], fpy = &y[0])
                {
                    ushort* px = fpx;
                    ushort* py = fpy;
                    while (pos < len && *px == *py)
                    {
                        px++;
                        py++;
                        pos++;
                    }
                }

                return pos < len ?
                    x[pos].CompareTo(y[pos]) :
                    x.Length.CompareTo(y.Length);
            }

            public static int Compare3(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);

                // the below is probably not worth it for less than 5 (or so) elements,
                //   so just do the old way
                if (len < 5)
                {
                    while (pos < len && x[pos] == y[pos])
                        ++pos;

                    return pos < len ?
                      x[pos].CompareTo(y[pos]) :
                      x.Length.CompareTo(y.Length);
                }

                ushort lastX = x[len - 1];
                bool lastSame = true;
                if (x[len - 1] == y[len - 1])
                    --x[len - 1]; // can be anything else
                else
                    lastSame = false;

                while (x[pos] == y[pos])
                    ++pos;

                return pos < len - 1 ?
                    x[pos].CompareTo(y[pos]) :
                    lastSame ? x.Length.CompareTo(y.Length)
                             : lastX.CompareTo(y[len - 1]);
            }
        }
    }

这很棒。方法3的主要问题是删除比较是破坏性的;我必须在之后将其设置回来,这会导致线程安全问题。因此,这可能不是一个选项。有趣的是,到目前为止,产生最大差异的是像第一位评论者建议的去掉Math.Min.. 将尝试您的指针建议。我甚至尝试使用memcmp仅比较相等长度的部分,但实际上这会减慢事情,我认为短数组的封送开销不值得。 - Jamie Treworgy

1
只是一些想法(可能有误,需要测试):

第一点。 较大的数据类型(例如 x32 中的 int 或 x64 中的 long - 让我们将此类型命名为TLong)可能提供更好的性能。如果您在TLong类型的项目中(按big-endian顺序)打包多个ushort项目,则可以同时比较多个项目。但是,如果新数组的最后一个项目未被填满,则需要注意该项目。可能会有一些“棘手的情况”。我现在看不到任何情况,但我不确定。

第二点。 更多!在某些情况下,我们可以将更多原始项目打包到TLong类型的项目中。让我们回到ushort类型的初始数组:假设K是所有数组中存在的最大数字(即要排序的所有路径!)(即每个ushort存储的每个数字t都是真实的:t <= K)。然后想象一下,每个t只是基于K进制数系统中的“数字”。这意味着您图表中的每个路径(即每个ushort数组)仅确定此数字系统中的一个数字。因此,您需要执行类似以下操作而不是操作ushort数组:

  1. 确定最大的 K 的幂可以适应类型 TLong - 假设它是 p:

    int p = 0;
    while (Math.Exp(K, p) <= TLong.MaxValue)
        p++;
    
  2. ushort 数组的第 i 个 p 项,并在基于 K 的数字系统中计算相应的数字,并将其保留为类型 TLong 数组的第 i 项:

    List<TLong> num = new List<TLong>();
    int i = 0;
    while (p * i < a.Length)
    {
        TLong y = 0;
    
        //将十进制数转换为按 p 个元素为一组的基于 K 的数
        for (int j = p * i; j < Math.Min(p * (i + 1), a.Length); j++)
            y = y * K + a[j];
    
        num.Add(sum);
        i++;
    }
    TLong[] result = num.ToArray();
    
  3. 这是一个动态预计算的转换,因此对于不同的 HTML 文档,K 可能是不同的,如果 K 远小于 255,则比第一个想法更快。此外,预计算的转换具有线性复杂度,因此不会对性能产生很大影响。

  4. 通常,您将数组转换为基于 K 数字系统中的大数存储在数组中。就是这样!
  5. 不需要使用其他评论中的改进更改初始排序算法(无论如何都要检查 - 我可能是错误的!)
我希望它能对你有所帮助。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接