Why is there no obvious difference between running with threads and without threads?

6
I want to evaluate Java's threading capability.
I created a demo without threads, as shown below:
```java
import java.util.*;

public class NoThread {
    public static void main(String[] args) {
        NoThread Obj = new NoThread();
        Date BeforeDate = new Date();
        Obj.run();
        Date AfterDate = new Date();
        Double Time_Consume = (AfterDate.getTime() - BeforeDate.getTime()) / 1000.0;
        System.out.println("Time Consume= " + Time_Consume + " Seconds");
    }

    public void run() {
        String tmp = "";
        for (int i = 0; i < 100000; i++) {
            tmp += i;
        }
    }
}
```
Result:
The console shows: Time Consume= 4.771 Seconds
Then I created a demo with threads:
```java
import java.util.Date;

public class ThreadTest extends Thread {
    public static void main(String[] args) {
        ThreadTest[] ObjArray = new ThreadTest[5];
        for (int i = 0; i < 5; i++)
            ObjArray[i] = new ThreadTest();
        Date BeforeDate = new Date();
        for (int i = 0; i < 5; i++) {
            ObjArray[i].start();
        }
        try {
            for (int i = 0; i < 5; i++)
                ObjArray[i].join();
        } catch (InterruptedException e) {
        }
        Date AfterDate = new Date();
        Double Time_Consume = (AfterDate.getTime() - BeforeDate.getTime()) / 1000.0;
        System.out.println("Time Consume= " + Time_Consume + " Seconds");
    }

    public void run() {
        String tmp = "";
        for (int i = 0; i < 100000; i++) {
            tmp += i;
        }
    }
}
```
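The console shows: Time Consume= 18.658 Seconds
In the second demo, I created five threads that run the same function in parallel. I expected the elapsed time to be similar to the first demo, that is, close to five seconds.
However, the actual result was far from what I expected. It is close to 5 (the number of runs) * 4.771 = 23.855 seconds, as if the runs had executed sequentially.
Why is that?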

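@MarcLeBihan: Isn't that just bad indentation?
2
Do you have 5 CPUs? 5 cores? What is your expectation based on?
Reduce the number of loop iterations in the _run_ method of _ThreadTest_ to _20000_, that is, one fifth of _100000_.
I am using a Dell laptop with a 12th Gen Intel® Core™ i5-1245U processor. From the specification I can see: Total Cores: 10, Performance-cores: 2, Efficient-cores: 8.
2
Well, that loop generates a lot of garbage, and the garbage collector is not as fully multithreaded as your code. Why did you choose string building as your test?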
5 Answers

2
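Your code has several problems, some of which bear directly on the answer to your question.
First, in modern Java we rarely need to address the Thread class directly. Instead, we can use the Executors framework added in Java 5. Define your task as a Runnable or Callable object. Submit one or more instances to an executor service. That executor service may be backed by a pool of threads, so you need not manage setting up and tearing down those threads, nor manage scheduling tasks among them.
For clarity, let's move your main method into its own class. Then we can move your task's work into a class that implements the Runnable interface, meaning it implements a run method. There is no need to extend Thread at all.
Your task code does nothing observable. Nothing is passed to other code, written to storage, sent over the network, saved to a database, or reported on the console. Such code may be optimized away by the compiler. So we add a call to System.out.println to avoid that optimization.
Your task code converts integers to text and concatenates them. In real work we would likely use StringBuilder to be more efficient (though in some implementations of Java this may be done automatically behind the scenes) and to make the code more self-documenting.
We want to report the elapsed time of each task run.
For that, do not use java.util.Date. In fact, never use either of the Date classes. They were supplanted years ago by the modern java.time classes, defined in JSR 310 and built into Java 8+. Specifically, java.util.Date was replaced by java.time.Instant.
For micro-benchmarking, the most precise way to track elapsed time is System.nanoTime. That call returns the current value of an ever-incrementing count of nanoseconds.
Use underscores in numeric literals to make them easier to read, such as 100_000.
By convention, variable names in Java start with a lowercase letter.
So here is our task class: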
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        StringBuilder tmp = new StringBuilder ( );
        for ( int i = 0 ; i < 100_000 ; i++ )
        {
            tmp.append ( i );  // Converting integer number to text, and concatenating.
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

Now an application class to run the task.
package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App() ;
        app.demoInThisThread();
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter () ;
        task.run() ;
    }
}

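When run:
Thread ID: 1 at 2023-09-17T06:01:24.691381Z result character length is: 488890. Elapsed: PT0.003877583S
On my machine (MacBook Pro, 16-inch, 2021, Apple M1 Pro, 16 GB, macOS Ventura 13.5.2), your code took PT1.456853583S. So you can see that StringBuilder is far more efficient than String concatenation for this code. But for this threading test we don't really need efficiency, so I'll go back to using String.
Note that concatenating String objects this way produces a lot of garbage for the garbage collector to manage, as user207421 commented. That much garbage may affect the results in unpredictable ways.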
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < 100000 ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

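When run:
Thread ID: 1 at 2023-09-17T06:09:03.218903Z result character length is: 488890. Elapsed: PT1.401553292S
Then you created five threads. In each thread you ran the same task, looping 100,000 times. So you did five times the work, 500,000 iterations in total. More work is more work, and threads do not make the extra work disappear. The CPU cores still have to perform five times as many string concatenations.
To evaluate the benefit of threads more fairly, you should divide the work among the threads so that each one performs a share of the 100,000 iterations. We can do that by adding a constructor to our task class that takes the desired number of iterations.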
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    private final int count;

    public Counter ( final int count ) { this.count = count; }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < this.count ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced result character length of: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

Our app code:
package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoInThisThread ( );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}

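Same result.
Now we are ready to run five threads, each with 20,000 iterations as its share of the 100,000.
We instantiate an executor service backed by five threads. Then we loop to submit five instances of our task to that executor service. We use try-with-resources syntax to automatically close our ExecutorService after the submitted tasks have completed.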
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
//        app.demoInThisThread ( );
        app.demoInBackgroundThreads ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}
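
The try-with-resources form used in demoInBackgroundThreads relies on ExecutorService being AutoCloseable, which is the case from Java 19 onward. On earlier versions, a rough equivalent is to call shutdown and then awaitTermination yourself. The following is a minimal sketch under that assumption; the class name ExecutorShutdownSketch, the helper runAll, and the one-minute timeout are illustrative choices, not part of the original code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorShutdownSketch {
    // Hypothetical helper: submits `task` `countTasks` times to a fixed pool,
    // then waits for the submitted tasks to finish.
    // Returns true if the pool terminated within the (arbitrary) timeout.
    static boolean runAll(Runnable task, int countTasks, int poolSize) {
        ExecutorService executorService = Executors.newFixedThreadPool(poolSize);
        for (int ordinal = 0; ordinal < countTasks; ordinal++) {
            executorService.submit(task);
        }
        // Stop accepting new tasks; tasks already submitted keep running.
        executorService.shutdown();
        try {
            // Block until all submitted tasks are done or the timeout elapses.
            return executorService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        boolean finished = runAll(completed::incrementAndGet, 5, 5);
        System.out.println("finished = " + finished + ", tasks completed = " + completed.get());
    }
}
```

Any elapsed-time measurement should be taken after awaitTermination returns, just as the code above takes it after the try block closes.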

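The result is quite different. Instead of taking more than a second, all 5 * 20,000 concatenations complete in a fraction of a second, about 2/10 of a second. Each task also takes about 2/10 of a second, so we know mathematically that on this 10-core machine our code ran simultaneously, each task on its own core.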
Thread ID: 24 at 2023-09-17T06:27:57.068636Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.1823735S
Thread ID: 23 at 2023-09-17T06:27:57.067782Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18159025S
Thread ID: 25 at 2023-09-17T06:27:57.070629Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18414225S
Thread ID: 22 at 2023-09-17T06:27:57.070073Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.183828875S
Thread ID: 21 at 2023-09-17T06:27:57.062894Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.176261S
Executor Service elapsed = PT0.190679125S at 2023-09-17T06:27:57.075335Z

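So why so much faster? Well, our test is flawed. Repeatedly concatenating String objects yields an ever-growing string. But 20,000 concatenations produce a much smaller string than 100,000 concatenations do, so much smaller that far less work is involved. So this is not a good test. (Benchmarking is hard work.)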
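
To make that point concrete, here is a rough sketch that counts how many characters end up in newly created strings, assuming each tmp += i builds a fresh string whose length is the running total. The class CopyCostSketch and its methods are made up for this illustration, and the model ignores JIT optimizations and garbage collection.

```java
class CopyCostSketch {
    // Length of the final string after appending 0, 1, ..., count - 1.
    static long finalLength(int count) {
        long length = 0;
        for (int i = 0; i < count; i++) {
            length += Integer.toString(i).length();
        }
        return length;
    }

    // Sum of the lengths of every intermediate string produced by `tmp += i`.
    static long copiedChars(int count) {
        long currentLength = 0;
        long total = 0;
        for (int i = 0; i < count; i++) {
            currentLength += Integer.toString(i).length(); // the new string is this long
            total += currentLength;                        // and all of it is written again
        }
        return total;
    }

    public static void main(String[] args) {
        long single = copiedChars(100_000);    // one task of 100_000 iterations
        long split = 5 * copiedChars(20_000);  // five tasks of 20_000 iterations each
        System.out.println("final length, 100_000 iterations: " + finalLength(100_000));
        System.out.println("final length, 20_000 iterations:  " + finalLength(20_000));
        System.out.println("characters written, one task of 100_000:  " + single);
        System.out.println("characters written, five tasks of 20_000: " + split);
        System.out.printf("ratio: %.2f%n", (double) single / split);
    }
}
```

Under this model the single 100_000-iteration task writes more than five times as many characters as the five 20_000-iteration tasks combined, so the two runs are not doing the same amount of work.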
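A better test might be generating random numbers and calculating their average. On a first attempt that work was so fast that I increased the iteration counts tenfold (1,000,000 and 200,000). And I involved String ↔ Integer ↔ int conversions to add more work. Even so, we still get nearly instantaneous results.
The task: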
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class Averaged implements Runnable
{
    private final int count;
    private final List < String > randoms;

    public Averaged ( final int count )
    {
        this.count = count;
        this.randoms = new ArrayList <> ( this.count );
    }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        for ( int i = 0 ; i < this.count ; i++ )
        {
            int x = ThreadLocalRandom.current ( ).nextInt ( 1 , Integer.MAX_VALUE );
            this.randoms.add ( String.valueOf ( x ) );
        }
        double average = this.randoms.stream ( ).map ( Integer :: valueOf ).mapToInt ( Integer :: intValue ).summaryStatistics ( ).getAverage ( );  // Intentionally involved auto-boxing as extra work for our test.
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced average of: " + average + ". Elapsed: " + elapsed );
    }
}

And the app class:

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
//        app.demoInThisThread ( );
//        app.demoInBackgroundThreads ( );
//        app.demoSumsInSameThread ( );
        app.demoSumsInBackgroundThreads ( );
    }

    private void demoSumsInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Averaged ( 200_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoSumsInSameThread ( )
    {
        Runnable task = new Averaged ( 1_000_000 );
        task.run ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}

When run in the same thread, the one million numbers take about 8/100 of a second.
Thread ID: 1 at 2023-09-17T07:05:24.256743Z for a count of 1000000 produced average of: 1.073135902798566E9. Elapsed: PT0.084711792S

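When run across multiple threads, the 200,000 * 5 numbers take about 1/10 of a second overall. Each task on its own thread also takes about 8/100 of a second. So we can conclude mathematically that on this machine the tasks ran simultaneously on separate cores.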
Thread ID: 21 at 2023-09-17T07:06:14.812452Z for a count of 200000 produced average of: 1.073304324790475E9. Elapsed: PT0.080241167S
Thread ID: 23 at 2023-09-17T07:06:14.812469Z for a count of 200000 produced average of: 1.073028061541545E9. Elapsed: PT0.080389542S
Thread ID: 25 at 2023-09-17T07:06:14.813533Z for a count of 200000 produced average of: 1.07282572828644E9. Elapsed: PT0.081019291S
Thread ID: 24 at 2023-09-17T07:06:14.813595Z for a count of 200000 produced average of: 1.076256760047685E9. Elapsed: PT0.081408875S
Thread ID: 22 at 2023-09-17T07:06:14.813600Z for a count of 200000 produced average of: 1.073103576810605E9. Elapsed: PT0.081662959S
Executor Service elapsed = PT0.098596875S at 2023-09-17T07:06:14.828988Z

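Why did we save no time by using threads? I really don't know why this particular task takes about the same time for 200,000 as for 1,000,000. Again, benchmarking is hard.
If we change the threaded test to ten tasks of 100,000 numbers each, on the same 10-core machine, we do see the elapsed time for some tasks drop considerably, to only about 2/100 of a second where we might have expected 4/100. But the overall time for the group is about the same, 9/100 of a second.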
Thread ID: 25 at 2023-09-17T19:43:06.051570Z for a count of 100000 produced average of: 1.07519247943123E9. Elapsed: PT0.057728417S
Thread ID: 24 at 2023-09-17T19:43:06.053705Z for a count of 100000 produced average of: 1.07496720067476E9. Elapsed: PT0.060560125S
Thread ID: 22 at 2023-09-17T19:43:06.053489Z for a count of 100000 produced average of: 1.07711825115815E9. Elapsed: PT0.060462083S
Thread ID: 21 at 2023-09-17T19:43:06.052852Z for a count of 100000 produced average of: 1.07550130293061E9. Elapsed: PT0.059825042S
Thread ID: 23 at 2023-09-17T19:43:06.051057Z for a count of 100000 produced average of: 1.0755424631933E9. Elapsed: PT0.057704334S
Thread ID: 21 at 2023-09-17T19:43:06.080795Z for a count of 100000 produced average of: 1.07392238112309E9. Elapsed: PT0.013798042S
Thread ID: 25 at 2023-09-17T19:43:06.081217Z for a count of 100000 produced average of: 1.07513370104224E9. Elapsed: PT0.014378042S
Thread ID: 24 at 2023-09-17T19:43:06.083975Z for a count of 100000 produced average of: 1.07646007807133E9. Elapsed: PT0.017139583S
Thread ID: 22 at 2023-09-17T19:43:06.084319Z for a count of 100000 produced average of: 1.07482906529202E9. Elapsed: PT0.017476875S
Thread ID: 23 at 2023-09-17T19:43:06.084813Z for a count of 100000 produced average of: 1.07169205436235E9. Elapsed: PT0.017668375S
Executor Service elapsed = PT0.093519875S at 2023-09-17T19:43:06.085102Z

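Keep in mind that on this machine's Apple Silicon M1 Pro chip, 2 cores are tuned for efficiency while the other 8 are tuned for performance, which may affect the results.
By the way, our testing procedure here is still quite crude. We should do some work up front to warm up the JVM, and so on. For real benchmarking, learn to use JMH. As mentioned before, benchmarking is hard.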
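
As a rough illustration of the warm-up idea only (it is not a substitute for JMH), the following sketch runs a task a few times untimed before the timed run, so that the JIT compiler has had a chance to compile the hot loop. The class WarmUpSketch, the method timeAfterWarmUp, and the warm-up count of 5 are assumptions made for this example.

```java
import java.time.Duration;

class WarmUpSketch {
    // Runs `task` `warmUpRuns` times without timing, then times one more run.
    static Duration timeAfterWarmUp(Runnable task, int warmUpRuns) {
        for (int i = 0; i < warmUpRuns; i++) {
            task.run(); // untimed warm-up run
        }
        long startNanos = System.nanoTime();
        task.run(); // the run we actually measure
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        Runnable task = () -> {
            StringBuilder tmp = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                tmp.append(i);
            }
            // Use the result so the loop cannot simply be discarded.
            if (tmp.length() != 488_890) {
                throw new IllegalStateException("unexpected length " + tmp.length());
            }
        };
        System.out.println("Elapsed after warm-up: " + timeAfterWarmUp(task, 5));
    }
}
```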
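Note that your test is CPU-bound. Such tasks are fairly rare in real-world Java work. Typically Java work involves blocking. Blocking comes from activities such as writing to storage, interacting with a database, logging, network calls over sockets or to web services, inter-process communication, and so on. For such blocking work, consider using virtual threads (fibers) in Java 21+.
Note: when System.out.println is called from multiple threads, the output may not appear on the console in chronological order. If you care about verifying order, always include a timestamp, such as Instant.now().
For example, in the last sample results above, note that these lines are out of order: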
Thread ID: 24 at 2023-09-17T19:43:06.053705Z …
Thread ID: 22 at 2023-09-17T19:43:06.053489Z …
Thread ID: 21 at 2023-09-17T19:43:06.052852Z …
Thread ID: 23 at 2023-09-17T19:43:06.051057Z …
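
One way to get a chronological view, sketched below, is to have each task record its own timestamp into a thread-safe collection and sort the entries afterwards, rather than relying on console order. The names OrderedLogSketch, Entry, and collect are hypothetical and exist only for this example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedLogSketch {
    // A timestamped message recorded by a task.
    static final class Entry {
        final Instant when;
        final String message;

        Entry(Instant when, String message) {
            this.when = when;
            this.message = message;
        }
    }

    // Runs `countTasks` tasks on a pool and returns their entries sorted by timestamp.
    static List<Entry> collect(int countTasks) {
        ConcurrentLinkedQueue<Entry> log = new ConcurrentLinkedQueue<>(); // safe for concurrent adds
        ExecutorService pool = Executors.newFixedThreadPool(countTasks);
        for (int n = 0; n < countTasks; n++) {
            final int taskNumber = n;
            pool.execute(() -> log.add(new Entry(Instant.now(), "task " + taskNumber + " done")));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<Entry> sorted = new ArrayList<>(log);
        sorted.sort(Comparator.comparing((Entry e) -> e.when)); // chronological order
        return sorted;
    }

    public static void main(String[] args) {
        for (Entry entry : collect(5)) {
            System.out.println(entry.when + " " + entry.message);
        }
    }
}
```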

1
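Running multiple threads does not necessarily mean they all run "in parallel". You need enough CPU cores to run CPU-bound code truly in parallel. If you have only one core, your CPU-bound code will not run any faster no matter how many threads you have. It may even get slower because of context switching and other overhead that multithreading introduces.
Multiple threads merely make the code run "concurrently". If you put a System.out.println("Done!") at the end of run, you will see them print at almost the same time. The threads do not run sequentially, finishing one after another.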
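
To check how many threads can actually run in parallel for CPU-bound work, you can ask the JVM how many processors it sees. This is a minimal sketch; the class CoreCountSketch and the helper poolSizeFor are illustrative names, and availableProcessors reports logical processors, which may include hyper-threads and may differ from the physical core count.

```java
class CoreCountSketch {
    // Caps the requested thread count at the number of processors visible to the JVM.
    static int poolSizeFor(int requestedThreads) {
        int processors = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requestedThreads, processors));
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Threads to use for 5 CPU-bound tasks: " + poolSizeFor(5));
    }
}
```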
Now, assuming you have enough cores to run 5 threads in parallel, you will see the result you expect if each thread does something like the following:
int x = 0;
for (int j = 0; j < Integer.MAX_VALUE; j++) {
    for (int k = 0; k < 4; k++) {
        x += j + k;
    }
}

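For me, with 4 cores, running 3 threads with the code above (GC uses the last core) takes roughly the same time as running 1 thread. If I go up to 4 threads, the execution time starts to increase.
Your code gets slower because you are concatenating strings, which creates new objects, and memory cannot really be allocated in parallel. For me, it takes about 5.3 seconds with 3 threads and 3.7 seconds with 1 thread. Note that this is not simply "the time taken by one thread times the number of threads". There is some part where parallelism helps, but the memory allocation cannot be parallelized.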
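
The allocation-free experiment described above can be reproduced with a sketch like the following. The class CpuBoundSketch and the methods spin and timeThreads are hypothetical names, and the outer loop count is reduced from Integer.MAX_VALUE so the demo finishes quickly. Each thread's result is stored so the loop is less likely to be optimized away.

```java
import java.time.Duration;

class CpuBoundSketch {
    // The same arithmetic loop as above, with a configurable outer bound.
    static int spin(int outer) {
        int x = 0;
        for (int j = 0; j < outer; j++) {
            for (int k = 0; k < 4; k++) {
                x += j + k;
            }
        }
        return x;
    }

    // Starts `threadCount` threads that each run spin(outer), waits for all of them,
    // and returns the wall-clock time. Each thread writes its result into `results`.
    static Duration timeThreads(int threadCount, int outer, int[] results) {
        Thread[] threads = new Thread[threadCount];
        long startNanos = System.nanoTime();
        for (int t = 0; t < threadCount; t++) {
            final int slot = t;
            threads[t] = new Thread(() -> results[slot] = spin(outer));
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // join also makes the thread's write to `results` visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        final int outer = 200_000_000; // reduced from Integer.MAX_VALUE for a quicker run
        for (int threadCount : new int[] { 1, 2, 4 }) {
            int[] results = new int[threadCount];
            Duration elapsed = timeThreads(threadCount, outer, results);
            System.out.println(threadCount + " thread(s): " + elapsed + " (result " + results[0] + ")");
        }
    }
}
```

If the machine has at least as many free cores as threads, the elapsed times for the different thread counts should be roughly similar. The exact numbers depend on the hardware and the JIT.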

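Thank you very much. I understand your point that "memory allocation is not parallel", and I have changed the run function as you suggested. Now I can see: running without threads, CPU utilization is about 30%, with Time Consume= 3.998, 3.017, 2.046, 2.988 and 3.008 Seconds. Running with 4 threads, CPU utilization is about 80%, with Time Consume= 4.545, 4.679, 4.609, 4.501 and 3.998 Seconds. The elapsed time now matches my expectation, since multithreading also has execution overhead.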

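0
"...In the second demo, I created 5 threads that run the same function in parallel. I expected the elapsed time to be similar to the first demo, that is, close to 5 seconds..."
I think "concurrency" is a better term for describing the process of "multithreading".
Specifically, a program normally executes in a linear fashion. This linear path is called the "data flow" or "control flow".
When a program uses threads, multiple flows of control execute in parallel, or simultaneously. You can now think of the path as branching.
So the 5 threads here will be interleaved, or "concatenated", while each executes its own instructions. CPU time matters a great deal.
In short, the 5 threads will actually take about 5 times as long.
For ThreadTest, divide the number of loop iterations in the run method by 5.
In this example, I reduced the value from 100000 to 100.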
import java.util.Date;
import java.util.concurrent.BrokenBarrierException;

class ThreadTest extends Thread
{
    int i, limit;

    public static void main(String[] args) throws BrokenBarrierException, InterruptedException {
        // TODO Auto-generated method stub

        ThreadTest[ ] ObjArray = new ThreadTest[5];

        int n = 100 / 5;
        for(int i=0;i<5;i++)
            ObjArray[i]= new ThreadTest(n * i, (n * i) + n);

        Date BeforeDate = new Date();

        for(int i=0; i<5;i++ )
        {
            ObjArray[i].start();
        }

        try
        {

            for(int i=0; i<5;i++ )
                ObjArray[i].join();

        }
        catch(InterruptedException e)
        {

        }


        Date AfterDate = new Date();
        Double Time_Consume = (AfterDate.getTime()- BeforeDate.getTime())/1000.0;
        System.out.println("Time Consume= " + Time_Consume + " Seconds"  );



    }

    ThreadTest(int i, int limit) {
        this.i = i;
        this.limit = limit;
    }

    public void run() {
        String tmp = "";
        for (int i = this.i; i < limit; i++) {  // exclusive upper bound, so adjacent ranges do not overlap
            tmp += i;
        }
    }

}

Here are a few comparisons.
NoThread, Time Consume= 0.046 Seconds
ThreadTest, Time Consume= 0.054 Seconds

NoThread, Time Consume= 0.027 Seconds
ThreadTest Time Consume= 0.029 Seconds

NoThread, Time Consume= 0.03 Seconds
ThreadTest, Time Consume= 0.022 Seconds

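One last point: there is also a class named CyclicBarrier, which provides a common synchronization point for threads that work together to produce a value.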
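
For illustration, here is a minimal sketch of CyclicBarrier in which several worker threads each compute a partial sum, wait at the barrier, and a barrier action combines the parts once all have arrived. The class name BarrierSketch, the method sumWithBarrier, and the way the range is split are assumptions for this example, not code from the question.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

class BarrierSketch {
    // Sums the integers 0 .. workers * perWorker - 1, one contiguous slice per worker thread.
    static long sumWithBarrier(int workers, int perWorker) {
        long[] partial = new long[workers];
        AtomicLong total = new AtomicLong();

        // The barrier action runs once, after every worker has called await().
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            long sum = 0;
            for (long p : partial) {
                sum += p;
            }
            total.set(sum);
        });

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int slot = w;
            threads[w] = new Thread(() -> {
                long sum = 0;
                for (int i = slot * perWorker; i < (slot + 1) * perWorker; i++) {
                    sum += i;
                }
                partial[slot] = sum;
                try {
                    barrier.await(); // wait until all workers have stored their partial sum
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[w].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("Sum of 0..99 using 5 workers: " + sumWithBarrier(5, 20));
    }
}
```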

1
Makes sense, but it should be a thread pool, with the threads taking tasks from a blocking queue; that would be faster.

0
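You have to realize something about the performance gains of multithreaded code. Compared to a single thread, multithreading has a significant time advantage when the amount of computation is large. So if you compute only a handful of time durations in one thread and a handful in 5 threads, you will not see much of a difference. I wrote some code for you; to run it, paste it into your IDE and change the package name.

This code takes the approach to testing multithreading performance used in the question, but instead of computing only 5 time durations it computes thousands of them and compares the elapsed time of the multithreaded and single-threaded approaches; that is what this class does. You can configure the number of threads and the number of durations to compute, which are randomly generated. Another way to improve multithreading performance is not to create threads one by one but to have a thread pool in which all threads block on one queue, each taking the next duration to compute; when the queue of durations to compute is empty, they all return and the time is recorded. I also used LocalDateTime, and the durations are computed between two date-times in units of seconds and minutes.

That said, computing 5000 durations with a single thread (yes, my code adds some overhead in building the responses, but it is still relative) took 0.152359333 seconds, while the same number of computations with the multithreaded blocking-queue approach, each thread computing its share of the durations at the same time, took 0.005566667 seconds, about 27 times faster.

This is where you want to use threads for real performance. Another good example is importing a large amount of data into a SQL database: create a blocking queue of records, have a thread that bulk-inserts records from the queue, create 5 such threads, and insert time will improve 40-fold. Hope this gives you some insight.

Here is a comparison of computing 10000 durations with 5 threads and with 1 thread. It illustrates the principle that the multithreading gain depends on the amount of computation.

For the task of computing 10000 durations, the single thread took 0.343375458 seconds, and a thread pool of size 5 took 0.004482167 seconds to compute the 10000 durations; multithreaded is 76.60925128403294 times faster. Here is the code; change the package name. There are no third-party imports.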

package com.dunkware.trade.service.beach.server.entity;

import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
public class Test {

// Play with these variables; it is interesting to see
// that 3 threads are faster than 5 threads.
// That might be the overhead of creating them;
// in real life they should come from a thread pool.

// It tells you that with these values multithreaded is 27 times faster.
public static int COMPUTE_COUNT = 5000;
public static int THREAD_COUNT = 3;

public static void main(String[] args) {
    Test test = new Test();

    test.SingleThreaded5000Requests();
    test.multiThreadedRequests();
    
    
    
}

public static double singleTime; 
public static double multiTime;

private static BlockingQueue<TimeDurationRequest> requestQuueu = new LinkedBlockingDeque<>();
private static BlockingQueue<DuraionProcessThread> doneThreadQueue = new LinkedBlockingDeque<DuraionProcessThread>();

public static final List<DurationResponse> multiThreadResponses = new ArrayList<DurationResponse>();


private void multiThreadedRequests() { 
    List<DuraionProcessThread> threads = new ArrayList<Test.DuraionProcessThread>();
    int i = 0;
    while(i < COMPUTE_COUNT) { 
        requestQuueu.add(randomRequest());
        i++;
    }
    i = 0;
    DStopWatch timer = DStopWatch.create();
    timer.start();
    while(i < THREAD_COUNT) { 
        DuraionProcessThread thread = new DuraionProcessThread();
        thread.start();
        threads.add(thread);
        i++;
    }

    
    timer.stop();
    multiTime = timer.getCompletedSeconds();
    System.out.println("Thread Pool size " + THREAD_COUNT + " took " + timer.getCompletedSeconds() + " seconds to compute " + COMPUTE_COUNT + " durations");
    double  faster = singleTime / multiTime;
    System.out.println("Multi threaded is " + faster + " time faster");
}
private static class DuraionProcessThread extends Thread {

    public void run() {
        while (!interrupted()) {
            try {

                TimeDurationRequest req = Test.requestQuueu.poll(5, TimeUnit.MILLISECONDS);

                if ((req == null)) {
                    // all requests consumed, return from the thread
                    doneThreadQueue.add(this);
                    return;
                }

                Number duration = getDurationBetween(req.getUnit(), req.stat, req.stop);
                DurationResponse res = new DurationResponse();
                res.value = duration;
                StringBuilder builder = new StringBuilder();
                if (req.unit == ChronoUnit.SECONDS) {
                    builder.append("Time Duration in seconds is ").append(duration).append(" fom ")
                            .append(req.getStat().toString()).append(" to ").append(req.getStop().toString());
                    res.messsae = builder.toString();
                    multiThreadResponses.add(res);
                }
                builder = new StringBuilder();
                if (req.getUnit() == ChronoUnit.MINUTES) {
                    
                    builder.append("Time Duration in minutes is ").append(duration).append(" fom ")
                            .append(req.getStat().toString()).append(" from ").append(req.getStop().toString());
                    res.messsae = builder.toString();
                    multiThreadResponses.add(res);
                }

            } catch (Exception e) {
                // TODO: handle exception
            }

        }

    }
}

/***
 * First break your logic into some helper methods. If you have a reason to use
 * java.util.Date, so be it, but LocalDateTime is better: you can easily get the
 * duration of dates and date/time ranges.
 */

/**
 * Also more efficient, one function and pass in a date/time range and specify
 * what duration unit you want, seconds, minutes or whatever.
 * 
 * @param from
 * @param to
 * @return
 */
public static long getDurationBetween(ChronoUnit unit, LocalDateTime from, LocalDateTime to) {
    return unit.between(from, to);
}

// another util for stop watch
public static class DStopWatch {

    public enum Status {
        Started, Stopped, Initialized;
    }

    public static DStopWatch create() {
        return new DStopWatch();
    }

    private volatile DStopWatch.Status _status = DStopWatch.Status.Initialized;
    private TimeUnit _timeUnit = TimeUnit.NANOSECONDS;
    private Long _startTime = null;
    private Long _stopTime = null;

    DStopWatch() {

    }

    public Status getStatus() {
        return _status;
    }

    public void start() {
        if (_timeUnit == TimeUnit.NANOSECONDS) {
            _startTime = System.nanoTime();
            _stopTime = null;
            setStatus(Status.Started);
        }
        if (_timeUnit == TimeUnit.MILLISECONDS) {
            _startTime = System.currentTimeMillis();
            _stopTime = null;
        }

    }

    void setStatus(Status status) {
        _status = status;  // store the requested status instead of always Stopped
    }

    public void stop() {
        if (getStatus() != Status.Stopped) {
            // exception
        }
        if (_timeUnit == TimeUnit.NANOSECONDS) {
            _stopTime = System.nanoTime();
            setStatus(Status.Stopped);
        }
        if (_timeUnit == TimeUnit.MILLISECONDS) {
            _stopTime = System.currentTimeMillis();
        }

    }

    public void reset() {
        setStatus(Status.Initialized);
        _startTime = null;
        _stopTime = null;

    }

    public TimeUnit getTimeUnit() {
        return _timeUnit;
    }

    public void reset(TimeUnit timeUnit) {
        _timeUnit = timeUnit;
        reset();
    }

    public double getRunningSeconds() {
        long dureation = System.nanoTime() - _startTime;
        double seconds = ((double) dureation) / 1E9;
        return seconds;
    }

    public double getCompletedSeconds() {
        long dureation = _stopTime - _startTime;
        double seconds = ((double) dureation) / 1E9;
        return seconds;
    }

}

/**
 * Next i would build a model object for what you are trying to compute if its a
 * model easier to make a REST API out of it.
 */

public static class TimeDurationRequest {
    private ChronoUnit unit;
    private LocalDateTime stat;
    private LocalDateTime stop;

    public ChronoUnit getUnit() {
        return unit;
    }

    public void setUnit(ChronoUnit unit) {
        this.unit = unit;
    }

    public LocalDateTime getStat() {
        return stat;
    }

    public void setStat(LocalDateTime stat) {
        this.stat = stat;
    }

    public LocalDateTime getStop() {
        return stop;
    }

    public void setStop(LocalDateTime stop) {
        this.stop = stop;
    }

}

// helper
public static int getRandomNumber(int min, int max) {
    return (int) ((Math.random() * (max - min)) + min);
}

public static int timeUnitIterator = 0;

// make a random generator for testing performance
private static TimeDurationRequest randomRequest() {

    TimeDurationRequest req = new TimeDurationRequest();
    if (timeUnitIterator == 0) {
        req.setUnit(ChronoUnit.MINUTES);
        timeUnitIterator = 1;
    } else {
        req.setUnit(ChronoUnit.SECONDS);
        timeUnitIterator = 0;

    }

    req.setStat(LocalDateTime.now().minusDays(getRandomNumber(0, 42)));
    req.setStop(LocalDateTime.now().plusSeconds(getRandomNumber(0, 2)));
    return req;
}

public static class DurationResponse {

    public String messsae;
    private Number value;

}

// Okay, so now let's test it. To really make a multi-threaded service you are best off
// using a thread pool and a blocking queue to bulk process and test
// performance.

private void SingleThreaded5000Requests() {
    List<DurationResponse> responses = new ArrayList<DurationResponse>();
    int i = 0;
    DStopWatch timer = DStopWatch.create();
    timer.start();
    while (i < COMPUTE_COUNT) {
        TimeDurationRequest req = randomRequest();
        Number duration = getDurationBetween(req.getUnit(), req.stat, req.stop);
        DurationResponse res = new DurationResponse();
        res.value = duration;
        StringBuilder builder = new StringBuilder();
        if (req.unit == ChronoUnit.SECONDS) {
            builder.append("Time Duration in seconds is ").append(duration).append(" fom ")
                    .append(req.getStat().toString()).append(" to ").append(req.getStop().toString());
            res.messsae = builder.toString();
            responses.add(res);
        }
        builder = new StringBuilder();
        if (req.getUnit() == ChronoUnit.MINUTES) {
            builder.append("Time Duration in minutes is ").append(duration).append(" fom ")
                    .append(req.getStat().toString()).append(" from ").append(req.getStop().toString());
            res.messsae = builder.toString();
            responses.add(res);
        }
        i++;
    }
    timer.stop();
    singleTime = timer.getCompletedSeconds();
    System.out.println(COMPUTE_COUNT + " durations Time Duration Tasks Single Thread took " + timer.getCompletedSeconds());
    //System.out.println("results....");
    try {
        Thread.sleep(1000);
    } catch (

    Exception e) {
        // TODO: handle exception
    }
//  for (DurationResponse durationResponse : responses) {
    //  System.out.println(durationResponse.messsae);
//  }
}

}
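
One thing to watch in multiThreadedRequests above: the stopwatch is stopped right after the worker threads are started, without waiting for them to drain the queue, so the reported time may not cover the actual computation. Below is a minimal sketch of a measurement that waits for all workers, assuming a fixed thread pool and invokeAll, which returns only after every task has completed. QueueTimingSketch and timeDrain are hypothetical names, and the queue holds plain integers instead of TimeDurationRequest objects to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class QueueTimingSketch {
    // Fills a queue with `requestCount` items, lets `threadCount` workers drain it,
    // and returns the seconds elapsed until every worker has finished.
    // `processed` counts the items handled, in a thread-safe way.
    static double timeDrain(int threadCount, int requestCount, AtomicLong processed) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < requestCount; i++) {
            queue.add(i);
        }

        List<Callable<Void>> workers = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            workers.add(() -> {
                // poll() returns null once the queue is empty, which ends this worker.
                while (queue.poll() != null) {
                    processed.incrementAndGet(); // stand-in for the real per-request work
                }
                return null;
            });
        }

        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        long startNanos = System.nanoTime();
        try {
            pool.invokeAll(workers); // blocks until every worker has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - startNanos) / 1E9; // stop the clock only after the work is done
    }

    public static void main(String[] args) {
        AtomicLong processed = new AtomicLong();
        double seconds = timeDrain(3, 5000, processed);
        System.out.println("Processed " + processed.get() + " requests in " + seconds + " seconds");
    }
}
```

The thread-safe counter matters as well: a plain ArrayList shared between worker threads, like multiThreadResponses above, is not safe for concurrent add calls.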


0
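The reason for this behavior is that Java threads do not necessarily run in parallel, especially on a single-core machine. The Java Virtual Machine schedules threads onto the underlying hardware, and if there is only one core, the threads will be scheduled to run one after another. Even on a multi-core machine, there is no guarantee that the JVM will schedule the threads to run in parallel. The actual behavior can depend on many factors, including the specific JVM implementation and the current load on the system.
In addition, the overhead of creating and managing threads adds to the total execution time. Threads in Java are not lightweight entities, and creating a large number of them can be very expensive in both time and memory.
The specific task your threads perform (concatenating strings in a loop) is not well suited to parallelization. In Java, the += operation on a String creates a new object each time, because strings in Java are immutable. This is computationally expensive and, when done in a loop, can put pressure on the garbage collector, especially across multiple threads. You could try replacing += with StringBuilder and see whether it makes any difference on your system.
There are other factors to consider, such as cache friendliness. A CPU has a limited amount of memory at each cache level. If your threads work on different data sets, context switches between threads may require cache invalidation, which slows your program down.
You could try using parallel streams to optimize parallel computation. Here is some sample code that compares the thread solution with parallel streams and other approaches: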
package org.example;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.stream.IntStream;

public class ParallelAndThreadTest {

    public static void main(String[] args) {
        //single thread
        System.out.println("single thread:");
        testTime(() -> runSingleThread(ParallelAndThreadTest::performTask));

        //multiple threads
        System.out.println("multiple threads:");
        testTime(() -> runWithThreads(ParallelAndThreadTest::performTask));

        //parallel streams - uses ForkJoinPool, but is a lot faster than
        //the later example using ForkJoinPool directly. (at least on my system).
        System.out.println("parallel streams:");
        testTime(() -> runInParallel(ParallelAndThreadTest::performTask));

        //executorService
        System.out.println("executorservice:");
        testTime(() -> runWithExecutorService(ParallelAndThreadTest::performTask));

        //ForkJoinPool
        //just to compare with the parallel streams solution.
        System.out.println("ForkJoinPool:");
        testTime(() -> runWithForkJoinPool(ParallelAndThreadTest::performTask));
    }

    public static void testTime(Runnable task) {
        long totalTime = 0;
        int runs = 200;//200 runs to get a good average

        for (int i = 0; i < runs; i++) {
            long before = System.nanoTime();
            task.run();
            long after = System.nanoTime();
            totalTime += (after - before);
        }

        double averageTime = (totalTime / (double) runs) / 1_000_000_000.0;  // Converting to seconds
        String formattedTime = String.format("%.4f", averageTime);
        System.out.println("Average Time Consume = " + formattedTime + " Seconds");
    }

    //your initial code to test, its really slow...
    public static void performTask2() {
        String s = "";
        for (int i = 0; i < 100000; i++) {
            s += "" + i;
        }
    }

    public static void performTask() {
        StringBuilder tmp = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            tmp.append(i);
        }
    }

    public static void runSingleThread(Runnable task) {
        for (int k = 0; k < 5; k++) {
            task.run();
        }
    }

    public static void runWithThreads(Runnable task) {
        Thread[] threads = new Thread[5];
        for (int i = 0; i < 5; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void runInParallel(Runnable task) {
        IntStream.range(0, 5).parallel().forEach(i -> task.run());
    }

    public static void runWithExecutorService(Runnable task) {
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++)
            executorService.execute(task);
        executorService.shutdown();
        while (!executorService.isTerminated()) {}
    }

    public static void runWithForkJoinPool(Runnable task) {
        ForkJoinPool forkJoinPool = new ForkJoinPool(5);
        forkJoinPool.submit(() -> {
            for (int i = 0; i < 5; i++) {
                forkJoinPool.invoke(new RecursiveActionEx(task));
            }
        }).join();
    }

    static class RecursiveActionEx extends RecursiveAction {
        private final Runnable task;

        RecursiveActionEx(Runnable task) {
            this.task = task;
        }

        @Override
        protected void compute() {
            task.run();
        }
    }
}

Using StringBuilder for the string concatenation, the output on my system is as follows:
single thread:
Average Time Consume = 0.0064 Seconds
multiple threads:
Average Time Consume = 0.0023 Seconds
parallel streams:
Average Time Consume = 0.0020 Seconds
executorservice:
Average Time Consume = 0.0024 Seconds
ForkJoinPool:
Average Time Consume = 0.0060 Seconds

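I have 8 cores. The parallel streams solution always wins by a small margin.
On my Mac I can disable individual cores with the cpuctl command. I disabled 7 of the cores, leaving only one, and got the following results: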
single thread:
Average Time Consume = 0.0237 Seconds
multiple threads:
Average Time Consume = 0.0145 Seconds
parallel streams:
Average Time Consume = 0.0212 Seconds
executorservice:
Average Time Consume = 0.0173 Seconds
ForkJoinPool:
Average Time Consume = 0.0111 Seconds

Now ForkJoinPool is the fastest.
