Why is there no obvious difference between running with threads and without threads?

Question


java multithreading


I want to evaluate Java's threading performance.

I created a demo without threads, as follows:

```java
import java.util.*;

public class NoThread {

    public static void main(String[] args) {
        NoThread Obj = new NoThread();
        Date BeforeDate = new Date();

        Obj.run();

        Date AfterDate = new Date();
        Double Time_Consume = (AfterDate.getTime() - BeforeDate.getTime()) / 1000.0;

        System.out.println("Time Consume= " + Time_Consume + " Seconds");
    }

    public void run() {
        String tmp = "";
        for (int i = 0; i < 100000; i++) {
            tmp += i;
        }
    }
}
```

The console shows: Time Consume= 4.771 Seconds

Then I created a demo with threads.

```java
import java.util.Date;

public class ThreadTest extends Thread {
    public static void main(String[] args) {
        ThreadTest[] ObjArray = new ThreadTest[5];

        for (int i = 0; i < 5; i++)
            ObjArray[i] = new ThreadTest();

        Date BeforeDate = new Date();

        for (int i = 0; i < 5; i++) {
            ObjArray[i].start();
        }

        try {
            for (int i = 0; i < 5; i++)
                ObjArray[i].join();
        } catch (InterruptedException e) {
        }

        Date AfterDate = new Date();
        Double Time_Consume = (AfterDate.getTime() - BeforeDate.getTime()) / 1000.0;
        System.out.println("Time Consume= " + Time_Consume + " Seconds");
    }

    public void run() {
        String tmp = "";
        for (int i = 0; i < 100000; i++) {
            tmp += i;
        }
    }
}
```

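The console shows: Time Consume= 18.658 Seconds

In the second demo I created five threads and ran the same function in parallel in all five. I expected the time to be similar to the first program, that is, close to five seconds.

However, the actual result far exceeded my expectation and came close to 5 (number of runs) * 4.771 = 23.85 seconds, as if the threads had executed sequentially.

Why does this happen?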

- Mark Zhou

@MarcLeBihan: Isn't that just bad indentation?


Do you have 5 CPUs? 5 cores? What is your expectation based on?

Reduce the loop count in ThreadTest's run method to 20000, one fifth of 100000.

I'm using a Dell laptop with a 12th Gen Intel® Core™ i5-1245U processor. The spec sheet lists: Total Cores: 10, Performance-cores: 2, Efficient-cores: 8.


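OK, this loop generates a huge amount of garbage, and the garbage collector is not as fully multithreaded as your code. Why did you choose string building as your test?

5 Answers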


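Running multiple threads does not necessarily mean they all run "in parallel". You need enough CPU cores to run CPU-bound code truly in parallel. If you have only one core, your CPU-bound code will not run any faster no matter how many threads you use. It may even get slower because of context switching and the other overheads that multithreading introduces.

Multiple threads only make the code run "concurrently". If you put a System.out.println("Done!") at the end of run, you will see the messages printed at almost the same time. The threads do not run sequentially, each finishing before the next one starts.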
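To check how many threads can actually run in parallel on a given machine, the JDK exposes `Runtime.getRuntime().availableProcessors()`. A minimal sketch, assuming you want to cap a CPU-bound thread count at that value; the class `CoreCheck` and its methods are hypothetical names for illustration:

```java
final class CoreCheck {
    // Number of processors the JVM reports. This is a sizing hint,
    // not a promise that each thread gets a core to itself.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Clamp a requested thread count to [1, cores()] so CPU-bound work
    // does not ask for more simultaneous threads than there are processors.
    static int threadsFor(int tasks) {
        return Math.max(1, Math.min(tasks, cores()));
    }
}
```

For example, `Executors.newFixedThreadPool(CoreCheck.threadsFor(5))` would request at most as many threads as the JVM reports processors. The operating system may still share those processors with other processes and with the garbage collector.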

Now, assuming you have enough cores to run the 5 threads in parallel, you would see the result you expect if each thread did something like this:

int x = 0;
for (int j = 0; j < Integer.MAX_VALUE; j++) {
    for (int k = 0; k < 4; k++) {
        x += j + k;
    }
}
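The loop above can be wrapped in a small timing harness to compare one thread against several. This is a sketch under assumptions: the names `CpuBoundDemo`, `spin` and `timeThreads` are hypothetical, the loop bound is a parameter instead of `Integer.MAX_VALUE`, and each result is returned and printed so the JIT cannot discard the loop as dead code:

```java
import java.util.ArrayList;
import java.util.List;

final class CpuBoundDemo {
    // The arithmetic loop from the answer with a configurable bound.
    // Returning x keeps the work observable (int overflow simply wraps).
    static int spin(int limit) {
        int x = 0;
        for (int j = 0; j < limit; j++) {
            for (int k = 0; k < 4; k++) {
                x += j + k;
            }
        }
        return x;
    }

    // Runs spin(limit) once in each of `threads` threads, waits for all of them,
    // prints a checksum of the results and returns the wall-clock time in ms.
    static long timeThreads(int threads, int limit) throws InterruptedException {
        int[] results = new int[threads];
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            Thread w = new Thread(() -> results[slot] = spin(limit));
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // join() also makes each results[slot] write visible here
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        int checksum = 0;
        for (int r : results) checksum += r;
        System.out.println(threads + " thread(s), checksum " + checksum + ", " + elapsedMs + " ms");
        return elapsedMs;
    }
}
```

Calling `timeThreads(1, n)` and then `timeThreads(k, n)` with the same large `n` would be expected to give similar wall-clock times only while the `k` threads fit on otherwise idle cores.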

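I have 4 cores, so running 3 threads with the code above (the GC uses the last core) takes roughly the same time as running 1 thread. If I go up to 4 threads, the execution time starts to increase.

Your code is slow because you are concatenating strings, which creates new objects, and memory cannot really be allocated in parallel. For me it takes about 5.3 seconds with 3 threads and 3.7 seconds with 1 thread. Note that this is not simply "time for one thread × number of threads". Parallelism does help in some places, but the memory allocation cannot be parallelized.

- Sweeper

Thanks a lot, I understand your point that "memory allocation is not parallel", and I changed the run function the way you suggested. Now I see: without threads, CPU utilization is about 30% and Time Consume= 3.998, 3.017, 2.046, 2.988, 3.008 seconds; with 4 threads, CPU utilization is about 80% and Time Consume= 4.545, 4.679, 4.609, 4.501, 3.998 seconds. The timing now matches my expectation, since multithreading also has some execution overhead.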

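
"...In the second demo I created 5 threads and ran the same function in parallel in all 5; I expected the time to be similar to the first program, that is, close to 5 seconds..."

I think "concurrent" is a better term to describe what "multithreading" does.

Specifically, a program normally executes along a linear path. This linear path is called the "data flow" or "control flow".

When a program uses threads, multiple control flows execute in parallel, or concurrently. You can now think of this path as branching.

So the 5 threads here are interleaved, "threaded" together, each executing its own instructions at the same time. What matters is CPU time.

In short, these 5 threads will actually take about 5 times as long, because together they do 5 times the work.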
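Splitting the same total work across threads, instead of giving every thread the full amount, can be expressed as a range partition. A minimal sketch, assuming half-open ranges `[start, end)`; `Slices.split` is a hypothetical helper, and unlike a fixed `n = total / parts` it also hands out the remainder when the total is not divisible by the thread count:

```java
final class Slices {
    // Splits [0, total) into `parts` contiguous half-open ranges [start, end).
    // The first (total % parts) ranges get one extra element, so every index
    // is covered exactly once and no two ranges overlap.
    static int[][] split(int total, int parts) {
        int[][] ranges = new int[parts][2];
        int base = total / parts;
        int extra = total % parts;
        int start = 0;
        for (int p = 0; p < parts; p++) {
            int size = base + (p < extra ? 1 : 0);
            ranges[p][0] = start;
            ranges[p][1] = start + size;
            start += size;
        }
        return ranges;
    }
}
```

Each thread would then loop `for (int i = r[0]; i < r[1]; i++)` over its own slice.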

For ThreadTest, divide the loop count of the run method by 5.
In this example I reduced the total from 100000 to 100.

import java.util.Date;

class ThreadTest extends Thread
{
    int start, limit;

    public static void main(String[] args) {
        ThreadTest[ ] ObjArray = new ThreadTest[5];

        // Split the 100 iterations into 5 disjoint slices of n each.
        int n = 100 / 5;
        for(int i=0;i<5;i++)
            ObjArray[i]= new ThreadTest(n * i, (n * i) + n);

        Date BeforeDate = new Date();

        for(int i=0; i<5;i++ )
        {
            ObjArray[i].start();
        }

        try
        {
            for(int i=0; i<5;i++ )
                ObjArray[i].join();
        }
        catch(InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }

        Date AfterDate = new Date();
        double Time_Consume = (AfterDate.getTime()- BeforeDate.getTime())/1000.0;
        System.out.println("Time Consume= " + Time_Consume + " Seconds"  );
    }

    // Each thread handles the half-open range [start, limit).
    ThreadTest(int start, int limit) {
        this.start = start;
        this.limit = limit;
    }

    public void run() {
        String tmp = "";
        for (int i = this.start; i < limit; i++) {
            tmp += i;
        }
    }

}

Here are a few comparisons:

NoThread, Time Consume= 0.046 Seconds
ThreadTest, Time Consume= 0.054 Seconds

NoThread, Time Consume= 0.027 Seconds
ThreadTest, Time Consume= 0.029 Seconds

NoThread, Time Consume= 0.03 Seconds
ThreadTest, Time Consume= 0.022 Seconds

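One last point: there is also a class called CyclicBarrier, which lets a group of threads that cooperate to produce a value wait for one another at a common barrier point, where their partial results can be combined.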
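A minimal `CyclicBarrier` sketch, assuming a toy job of summing `0..n-1`: each thread sums a strided slice, waits at the barrier, and the barrier action (run once, by the last thread to arrive) combines the partial sums. The class name `BarrierSum` is hypothetical; `CyclicBarrier(int, Runnable)` and `await()` are the standard `java.util.concurrent` API:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

final class BarrierSum {
    static long sum(int n, int parts) throws InterruptedException {
        long[] partials = new long[parts];
        AtomicLong total = new AtomicLong();
        // Writes made before await() happen-before the barrier action runs.
        CyclicBarrier barrier = new CyclicBarrier(parts, () -> {
            long s = 0;
            for (long p : partials) s += p;
            total.set(s);
        });
        Thread[] workers = new Thread[parts];
        for (int t = 0; t < parts; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = id; i < n; i += parts) s += i; // strided slice of [0, n)
                partials[id] = s;
                try {
                    barrier.await(); // wait until every worker has stored its partial sum
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return total.get();
    }
}
```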

- Reilas


Makes sense, but it should be a thread pool whose threads take tasks from a blocking queue; that would be faster.


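You need to realize something about the performance gains of multithreaded code. Compared with a single thread, multithreading gives a significant time advantage when there is a lot of computation. So if you compute a time duration only a few times in one thread and a few times in 5 threads, you won't see much difference. I wrote some code for you; you can run it by pasting it into your IDE and changing the package name.

This code shows a better way to test multithreaded performance for the question asked: instead of computing a time duration just 5 times, we compute thousands of them and compare the time taken by the multithreaded and single-threaded approaches, which is what this class does. You can configure the number of threads and the number of durations to compute; the durations are randomly generated. Another way to improve multithreaded performance is not to create threads one by one, but to have a thread pool in which all threads block on a queue, each computing the next duration, and all of them return when the queue of pending durations is empty, at which point the time is measured. I also used LocalDateTime and computed the duration between two date-times in seconds and in minutes.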
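The pool-of-workers-blocking-on-a-queue idea can be sketched with `BlockingQueue.take()` and one sentinel value ("poison pill") per worker. This is an illustrative sketch, not the answer's code: the class `QueueWorkers`, the sentinel `-1` and the squaring "job" are assumptions. The final `join()` loop matters for timing, because it waits until every worker has finished rather than only started:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class QueueWorkers {
    private static final int POISON = -1; // sentinel telling one worker to stop

    // Queues `jobs` non-negative ints, lets `threads` workers take and process them
    // (here: square and add), and returns the combined result once all workers exit.
    static long run(int jobs, int threads) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < jobs; i++) queue.put(i);
        for (int t = 0; t < threads; t++) queue.put(POISON); // one pill per worker, after all jobs

        AtomicLong result = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        int job = queue.take(); // blocks until an element is available
                        if (job == POISON) return;
                        result.addAndGet((long) job * job);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) w.join(); // wait for all work, not just thread start-up
        return result.get();
    }
}
```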

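That said, computing 5000 durations with a single thread (yes, my code adds some overhead when building the response, but the comparison is still relative) takes 0.152359333 seconds, while the same number of computations with the multithreaded blocking-queue approach, each thread concurrently computing a share of the durations, takes 0.005566667 seconds, roughly 27 times faster.

This is where you want to use threads for real performance. Another good example is importing a large amount of data into a SQL database: create a blocking queue of records, have a thread take records from the queue and insert them in batches, create 5 such threads, and the insert time will improve 40-fold. I hope this gives you some insight.

Here is a comparison of computing 10000 durations with 5 threads versus 1 thread. It demonstrates the principle that the gain from multithreading depends on the amount of computation.

For a task of 10000 durations, the single thread took 0.343375458 seconds, a thread pool of size 5 took 0.004482167 seconds to compute 10000 durations, and multithreaded was 76.60925128403294 times faster. Here is the code; change the package name. There are no third-party imports.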

package com.dunkware.trade.service.beach.server.entity;

import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
public class Test {

// Play with these variables; it's interesting to see
// that 3 threads are faster than 5 threads.
// That might be the overhead of creating them;
// in real life they should come from a thread pool.

// With these values, multithreaded is reported as 27 times faster.
public static int COMPUTE_COUNT = 5000;
public static int THREAD_COUNT = 3;

public static void main(String[] args) {
    Test test = new Test();

    test.SingleThreaded5000Requests();
    test.multiThreadedRequests();
    
    
    
}

public static double singleTime; 
public static double multiTime;

private static BlockingQueue<TimeDurationRequest> requestQuueu = new LinkedBlockingDeque<>();
private static BlockingQueue<DuraionProcessThread> doneThreadQueue = new LinkedBlockingDeque<DuraionProcessThread>();

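// Shared by all worker threads, so it must be a thread-safe list.
public static final List<DurationResponse> multiThreadResponses = java.util.Collections.synchronizedList(new ArrayList<DurationResponse>());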


private void multiThreadedRequests() { 
    List<DuraionProcessThread> threads = new ArrayList<Test.DuraionProcessThread>();
    int i = 0;
    while(i < COMPUTE_COUNT) { 
        requestQuueu.add(randomRequest());
        i++;
    }
    i = 0;
    DStopWatch timer = DStopWatch.create();
    timer.start();
    while(i < THREAD_COUNT) { 
        DuraionProcessThread thread = new DuraionProcessThread();
        thread.start();
        threads.add(thread);
        i++;
    }

    
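    // Wait for every worker to drain the queue before stopping the timer;
    // otherwise only thread start-up would be measured.
    for (DuraionProcessThread thread : threads) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    timer.stop();
    multiTime = timer.getCompletedSeconds();
    System.out.println("Thread Pool size " + THREAD_COUNT + " took " + timer.getCompletedSeconds() + " seconds to compute  " + COMPUTE_COUNT + " durations");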
    double  faster = singleTime / multiTime;
    System.out.println("Multi threaded is " + faster + " time faster");
}
private static class DuraionProcessThread extends Thread {

    public void run() {
        while (!interrupted()) {
            try {

                TimeDurationRequest req = Test.requestQuueu.poll(5, TimeUnit.MILLISECONDS);

                if ((req == null)) {
                    // all requests consumed; finish this thread
                    doneThreadQueue.add(this);
                    return;
                }

                Number duration = getDurationBetween(req.getUnit(), req.stat, req.stop);
                DurationResponse res = new DurationResponse();
                res.value = duration;
                StringBuilder builder = new StringBuilder();
                if (req.unit == ChronoUnit.SECONDS) {
                    builder.append("Time Duration in seconds is ").append(duration).append(" from ")
                            .append(req.getStat().toString()).append(" to ").append(req.getStop().toString());
                    res.messsae = builder.toString();
                    multiThreadResponses.add(res);
                }
                builder = new StringBuilder();
                if (req.getUnit() == ChronoUnit.MINUTES) {
                    
                    builder.append("Time Duration in minutes is ").append(duration).append(" from ")
                            .append(req.getStat().toString()).append(" to ").append(req.getStop().toString());
                    res.messsae = builder.toString();
                    multiThreadResponses.add(res);
                }

            } catch (Exception e) {
                // TODO: handle exception
            }

        }

    }
}

/***
 * First, break your logic into some helper methods. If you have a reason to use
 * java.util.Date, so be it, but LocalDateTime is better: you can easily get the
 * duration of dates and date/time ranges.
 */

/**
 * Also more efficient, one function and pass in a date/time range and specify
 * what duration unit you want, seconds, minutes or whatever.
 * 
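 * @param unit the unit to express the duration in
 * @param from start of the range
 * @param to end of the range
 * @return the amount of time between from and to, in the given unit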
 */
public static long getDurationBetween(ChronoUnit unit, LocalDateTime from, LocalDateTime to) {
    return unit.between(from, to);
}

// another util for stop watch
public static class DStopWatch {

    public enum Status {
        Started, Stopped, Initialized;
    }

    public static DStopWatch create() {
        return new DStopWatch();
    }

    private volatile DStopWatch.Status _status = DStopWatch.Status.Initialized;
    private TimeUnit _timeUnit = TimeUnit.NANOSECONDS;
    private Long _startTime = null;
    private Long _stopTime = null;

    DStopWatch() {

    }

    public Status getStatus() {
        return _status;
    }

    public void start() {
        if (_timeUnit == TimeUnit.NANOSECONDS) {
            _startTime = System.nanoTime();
            _stopTime = null;
            setStatus(Status.Started);
        }
        if (_timeUnit == TimeUnit.MILLISECONDS) {
            _startTime = System.currentTimeMillis();
            _stopTime = null;
        }

    }

    void setStatus(Status status) {
        _status = status;
    }

    public void stop() {
        if (getStatus() != Status.Stopped) {
            // exception
        }
        if (_timeUnit == TimeUnit.NANOSECONDS) {
            _stopTime = System.nanoTime();
            setStatus(Status.Stopped);
        }
        if (_timeUnit == TimeUnit.MILLISECONDS) {
            _stopTime = System.currentTimeMillis();
        }

    }

    public void reset() {
        setStatus(Status.Initialized);
        _startTime = null;
        _stopTime = null;

    }

    public TimeUnit getTimeUnit() {
        return _timeUnit;
    }

    public void reset(TimeUnit timeUnit) {
        _timeUnit = timeUnit;
        reset();
    }

    public double getRunningSeconds() {
        long dureation = System.nanoTime() - _startTime;
        double seconds = ((double) dureation) / 1E9;
        return seconds;
    }

    public double getCompletedSeconds() {
        long dureation = _stopTime - _startTime;
        double seconds = ((double) dureation) / 1E9;
        return seconds;
    }

}

/**
 * Next I would build a model object for what you are trying to compute; if it's a
 * model, it's easier to make a REST API out of it.
 */

public static class TimeDurationRequest {
    private ChronoUnit unit;
    private LocalDateTime stat;
    private LocalDateTime stop;

    public ChronoUnit getUnit() {
        return unit;
    }

    public void setUnit(ChronoUnit unit) {
        this.unit = unit;
    }

    public LocalDateTime getStat() {
        return stat;
    }

    public void setStat(LocalDateTime stat) {
        this.stat = stat;
    }

    public LocalDateTime getStop() {
        return stop;
    }

    public void setStop(LocalDateTime stop) {
        this.stop = stop;
    }

}

// helper
public static int getRandomNumber(int min, int max) {
    return (int) ((Math.random() * (max - min)) + min);
}

public static int timeUnitIterator = 0;

// make a random generator for testing performance
private static TimeDurationRequest randomRequest() {

    TimeDurationRequest req = new TimeDurationRequest();
    if (timeUnitIterator == 0) {
        req.setUnit(ChronoUnit.MINUTES);
        timeUnitIterator = 1;
    } else {
        req.setUnit(ChronoUnit.SECONDS);
        timeUnitIterator = 0;

    }

    req.setStat(LocalDateTime.now().minusDays(getRandomNumber(0, 42)));
    req.setStop(LocalDateTime.now().plusSeconds(getRandomNumber(0, 2)));
    return req;
}

public static class DurationResponse {

    public String messsae;
    private Number value;

}

// Okay, so now let's test it. To really make a multithreaded service, you're best off
// using a thread pool and a blocking queue to bulk-process, and to test
// performance.

private void SingleThreaded5000Requests() {
    List<DurationResponse> responses = new ArrayList<DurationResponse>();
    int i = 0;
    DStopWatch timer = DStopWatch.create();
    timer.start();
    while (i < COMPUTE_COUNT) {
        TimeDurationRequest req = randomRequest();
        Number duration = getDurationBetween(req.getUnit(), req.stat, req.stop);
        DurationResponse res = new DurationResponse();
        res.value = duration;
        StringBuilder builder = new StringBuilder();
        if (req.unit == ChronoUnit.SECONDS) {
            builder.append("Time Duration in seconds is ").append(duration).append(" fom ")
                    .append(req.getStat().toString()).append(" to ").append(req.getStop().toString());
            res.messsae = builder.toString();
            responses.add(res);
        }
        builder = new StringBuilder();
        if (req.getUnit() == ChronoUnit.MINUTES) {
            builder.append("Time Duration in minutes is ").append(duration).append(" fom ")
                    .append(req.getStat().toString()).append(" from ").append(req.getStop().toString());
            res.messsae = builder.toString();
            responses.add(res);
        }
        i++;
    }
    timer.stop();
    singleTime = timer.getCompletedSeconds();
    System.out.println(COMPUTE_COUNT + " durations Time Duration Tasks Single Thread took " + timer.getCompletedSeconds());
    //System.out.println("results....");
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
//  for (DurationResponse durationResponse : responses) {
    //  System.out.println(durationResponse.messsae);
//  }
}

}

- Duncan Krebs


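The reason for this behavior is that Java threads do not necessarily run in parallel, especially on a single-core machine. Java threads are scheduled onto the underlying hardware (on mainstream JVMs, by the operating system's scheduler); if there is only one core, the threads take turns on it. Even on a multi-core machine there is no guarantee that the threads will be scheduled to run in parallel. The actual behavior can depend on many factors, including the specific JVM implementation and the current load on the system.

In addition, the overhead of creating and managing threads adds to the total execution time. In Java, threads are not lightweight entities, and creating a large number of them can be expensive in both time and memory.

The specific task your threads perform (concatenating strings in a loop) is not well suited to parallelization. In Java, String's += creates a new object every time, because strings in Java are immutable. This is computationally expensive and, inside a loop, can put pressure on the garbage collector, especially across multiple threads. You can try replacing += with StringBuilder and see whether it makes any difference on your system.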
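To see the difference between the two approaches on your own machine, here is a minimal sketch with hypothetical method names `withPlus` and `withBuilder`: both build the same text, but `+=` creates a new `String` on every iteration while `StringBuilder` appends into one growable buffer:

```java
final class Concat {
    // Original approach: each += produces a brand-new String (Strings are immutable),
    // so the total copying grows roughly with the square of the final length.
    static String withPlus(int n) {
        String tmp = "";
        for (int i = 0; i < n; i++) tmp += i;
        return tmp;
    }

    // StringBuilder appends into a buffer that grows as needed,
    // so the total work is roughly linear in the final length.
    static String withBuilder(int n) {
        StringBuilder tmp = new StringBuilder();
        for (int i = 0; i < n; i++) tmp.append(i);
        return tmp.toString();
    }
}
```

Timing `withPlus(100_000)` against `withBuilder(100_000)` with `System.nanoTime()` should make the gap visible, since both produce the same 488,890-character string.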

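There are other factors to consider too, such as cache friendliness. CPUs have a limited amount of memory at each cache level. If your threads work on different data sets, switching between threads may evict cached data, slowing the program down.

You can try parallel streams to optimize parallel computation. Here is sample code that compares the thread solution with parallel streams and other approaches: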

package org.example;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.stream.IntStream;

public class ParallelAndThreadTest {

    public static void main(String[] args) {
        //single thread
        System.out.println("single thread:");
        testTime(() -> runSingleThread(ParallelAndThreadTest::performTask));

        //multiple threads
        System.out.println("multiple threads:");
        testTime(() -> runWithThreads(ParallelAndThreadTest::performTask));

        //parallel streams - uses ForkJoinPool, but is a lot faster than
        //the later example using ForkJoinPool directly. (at least on my system).
        System.out.println("parallel streams:");
        testTime(() -> runInParallel(ParallelAndThreadTest::performTask));

        //executorService
        System.out.println("executorservice:");
        testTime(() -> runWithExecutorService(ParallelAndThreadTest::performTask));

        //ForkJoinPool
        //just to compare with the parallel streams solution.
        System.out.println("ForkJoinPool:");
        testTime(() -> runWithForkJoinPool(ParallelAndThreadTest::performTask));
    }

    public static void testTime(Runnable task) {
        long totalTime = 0;
        int runs = 200;//200 runs to get a good average

        for (int i = 0; i < runs; i++) {
            long before = System.nanoTime();
            task.run();
            long after = System.nanoTime();
            totalTime += (after - before);
        }

        double averageTime = (totalTime / (double) runs) / 1_000_000_000.0;  // Converting to seconds
        String formattedTime = String.format("%.4f", averageTime);
        System.out.println("Average Time Consume = " + formattedTime + " Seconds");
    }

    //your initial code to test, it's really slow...
    public static void performTask2() {
        String s = "";
        for (int i = 0; i < 100000; i++) {
            s += "" + i;
        }
    }

    public static void performTask() {
        StringBuilder tmp = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            tmp.append(i);
        }
    }

    public static void runSingleThread(Runnable task) {
        for (int k = 0; k < 5; k++) {
            task.run();
        }
    }

    public static void runWithThreads(Runnable task) {
        Thread[] threads = new Thread[5];
        for (int i = 0; i < 5; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void runInParallel(Runnable task) {
        IntStream.range(0, 5).parallel().forEach(i -> task.run());
    }

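    public static void runWithExecutorService(Runnable task) {
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++)
            executorService.execute(task);
        executorService.shutdown();
        try {
            // Block until all tasks finish instead of spinning on isTerminated().
            executorService.awaitTermination(1, java.util.concurrent.TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }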

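    public static void runWithForkJoinPool(Runnable task) {
        ForkJoinPool forkJoinPool = new ForkJoinPool(5);
        forkJoinPool.submit(() -> {
            // invoke() waits for each action to complete before the loop continues,
            // so these five tasks run one after another, not in parallel.
            for (int i = 0; i < 5; i++) {
                forkJoinPool.invoke(new RecursiveActionEx(task));
            }
        }).join();
        forkJoinPool.shutdown(); // release the pool's worker threads
    }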

    static class RecursiveActionEx extends RecursiveAction {
        private final Runnable task;

        RecursiveActionEx(Runnable task) {
            this.task = task;
        }

        @Override
        protected void compute() {
            task.run();
        }
    }
}

On my system, with StringBuilder used for the concatenation, the output is:

single thread:
Average Time Consume = 0.0064 Seconds
multiple threads:
Average Time Consume = 0.0023 Seconds
parallel streams:
Average Time Consume = 0.0020 Seconds
executorservice:
Average Time Consume = 0.0024 Seconds
ForkJoinPool:
Average Time Consume = 0.0060 Seconds

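I have 8 cores. The parallel streams solution always wins by a small margin.

On my Mac I can disable cores with the cpuctl command. I disabled 7 of the cores, leaving only one, and got these results: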

single thread:
Average Time Consume = 0.0237 Seconds
multiple threads:
Average Time Consume = 0.0145 Seconds
parallel streams:
Average Time Consume = 0.0212 Seconds
executorservice:
Average Time Consume = 0.0173 Seconds
ForkJoinPool:
Average Time Consume = 0.0111 Seconds

Now ForkJoinPool is the fastest.

- marcinj

Page content provided by Stack Overflow; the English original is available at the source page.

- Basil Bourque · Accepted Answer

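There are several problems with your code, some of which directly affect the answer to your question.

First, in modern Java we rarely need to address the Thread class directly. Instead, use the Executors framework, added in Java 5. Define your task as a Runnable or Callable object, and submit one or more instances to an executor service. That executor service can be backed by a pool of threads, and you don't have to manage setting up and tearing down those threads, nor scheduling the tasks across them.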
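Since `Callable` is mentioned alongside `Runnable`, here is a minimal sketch of the `Callable` variant, assuming each task returns the length of the text it built; `CallableDemo`, `lengthTask` and `runAll` are hypothetical names. `invokeAll` blocks until every task has completed, and returning a result also makes it harder for the JIT to treat the work as dead code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CallableDemo {
    // A task that builds the digits of 0..count-1 and returns the resulting length.
    static Callable<Integer> lengthTask(int count) {
        return () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < count; i++) sb.append(i);
            return sb.length();
        };
    }

    // Submits `tasks` Callables to a fixed pool of the same size and sums their results.
    static int runAll(int tasks, int countPerTask) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Integer>> work = new ArrayList<>();
            for (int t = 0; t < tasks; t++) work.add(lengthTask(countPerTask));
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(work)) total += f.get(); // all are done here
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```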

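For clarity, let's move your main method to its own class. Then we can move your task's work into a class that implements the Runnable interface, meaning it implements a run method. There is no need to extend Thread at all.

Your task code does nothing useful. Nothing is passed to other code, nothing is written to storage, no call is made over the network, nothing is saved to a database, and nothing is reported on the console. Such code might be optimized away by the compiler. So we add a call to System.out.println to prevent that optimization.

Your task code converts integers to text and concatenates them. In real work we would probably use StringBuilder for efficiency (although some Java implementations may do this automatically behind the scenes) and to make the code more self-documenting.

We want to report the elapsed time of each task run.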

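For that, do not use java.util.Date. In fact, never use any of the Date classes again. These legacy classes were supplanted years ago by the modern java.time classes defined in JSR 310 and built into Java 8+. In particular, java.util.Date was replaced by java.time.Instant.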
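Code that still receives `java.util.Date` values can bridge to `java.time` with `Date.toInstant()` and `Date.from(Instant)`, both part of the JDK since Java 8. A minimal sketch; the wrapper class `DateToInstant` is a hypothetical name:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

final class DateToInstant {
    // Legacy Date -> modern Instant (same moment on the timeline).
    static Instant fromLegacy(Date date) {
        return date.toInstant();
    }

    // Modern Instant -> legacy Date, for APIs that still require Date.
    static Date toLegacy(Instant instant) {
        return Date.from(instant);
    }

    // Elapsed time between two moments as a Duration, instead of subtracting getTime() values.
    static Duration between(Instant start, Instant end) {
        return Duration.between(start, end);
    }
}
```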

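For microbenchmarks, the most precise way to track time is System.nanoTime. This call returns the current value of the JVM's high-resolution time source in nanoseconds; only the difference between two readings is meaningful.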
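A small helper built on `System.nanoTime` might look like this. It is a sketch, assuming Java 16+ for the nested `record`; `Stopwatch` and `Timed` are hypothetical names. Because only the difference between two `nanoTime` readings in the same JVM is meaningful, the helper returns a `Duration` rather than a point in time:

```java
import java.time.Duration;
import java.util.function.Supplier;

final class Stopwatch {
    // The value computed by the timed work, plus how long it took.
    record Timed<T>(T value, Duration elapsed) {}

    // Runs `work` once and measures it with the monotonic nanoTime clock.
    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T value = work.get();
        return new Timed<>(value, Duration.ofNanos(System.nanoTime() - start));
    }
}
```

For example, `Stopwatch.time(() -> new Counter(100_000))` would not time the loop; the work to be measured must happen inside the supplier, such as `() -> { task.run(); return null; }`.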

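Use underscores to format numeric literals for easier reading, such as 100_000.

By convention, variable names in Java start with a lowercase letter.

So here is our task class: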

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        StringBuilder tmp = new StringBuilder ( );
        for ( int i = 0 ; i < 100_000 ; i++ )
        {
            tmp.append ( i );  // Converting integer number to text, and concatenating.
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

Now an app class to run this task.

package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App() ;
        app.demoInThisThread();
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter () ;
        task.run() ;
    }
}

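When run:

Thread ID: 1 at 2023-09-17T06:01:24.691381Z result character length is: 488890. Elapsed: PT0.003877583S

On my machine (MacBook Pro, 16-inch, 2021, Apple M1 Pro, 16 GB, macOS Ventura 13.5.2), your code took PT1.456853583S. So you can see that StringBuilder is far more efficient than String for this code. But for this threading test we don't really need efficiency, so I'll revert to using String.

Note that concatenating String objects this way creates a lot of garbage for the garbage collector to manage, as commented by user207421. That much garbage may affect the results in unpredictable ways.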

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < 100000 ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

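When run:

Thread ID: 1 at 2023-09-17T06:09:03.218903Z result character length is: 488890. Elapsed: PT1.401553292S

Then you created five threads. In each thread you performed the same task, looping 100,000 times. So you did five times the work, 500,000 iterations in total. More work is more work; threads do not make the extra work disappear. The CPU cores still have to perform five times as many string concatenations.

To assess the benefit of threads more fairly, you should divide the work among the threads, so that each thread performs a portion of the 100,000 iterations. We can do this by adding a constructor to our task class that takes the desired number of iterations.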

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    private final int count;

    public Counter ( final int count ) { this.count = count; }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < this.count ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced result character length of: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}

Our app code:

package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoInThisThread ( );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}

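The result is the same.

Now we are ready to run five threads with 20,000 iterations each, as their share of the 100,000.

We instantiate an executor service backed by five threads, then loop to submit five instances of our task to it. We use try-with-resources syntax to automatically close our ExecutorService once the submitted tasks are done (ExecutorService is AutoCloseable in Java 19+).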

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
//        app.demoInThisThread ( );
        app.demoInBackgroundThreads ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}
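The try-with-resources form above relies on `ExecutorService` being `AutoCloseable`, which is the case only in Java 19+. On older versions, a roughly equivalent sketch uses `shutdown()` followed by `awaitTermination()`; the class `ShutdownDemo`, the pool size of 5 and the one-minute timeout are assumptions for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ShutdownDemo {
    // Submits `tasks` trivial tasks and waits for them without try-with-resources.
    static int runTasks(int tasks) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        try {
            for (int i = 0; i < tasks; i++) {
                executorService.execute(() -> { done.incrementAndGet(); });
            }
        } finally {
            executorService.shutdown(); // stop accepting new tasks; queued ones still run
        }
        if (!executorService.awaitTermination(1, TimeUnit.MINUTES)) {
            executorService.shutdownNow(); // give up on stragglers after the timeout
        }
        return done.get();
    }
}
```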

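The results are very different. Instead of taking more than a second, all 5 * 20,000 concatenations complete in a fraction of a second, about a fifth of a second. Each task also takes about a fifth of a second, so mathematically we know that on this 10-core machine our code is executing simultaneously, with separate cores working at the same time.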

Thread ID: 24 at 2023-09-17T06:27:57.068636Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.1823735S
Thread ID: 23 at 2023-09-17T06:27:57.067782Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18159025S
Thread ID: 25 at 2023-09-17T06:27:57.070629Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18414225S
Thread ID: 22 at 2023-09-17T06:27:57.070073Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.183828875S
Thread ID: 21 at 2023-09-17T06:27:57.062894Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.176261S
Executor Service elapsed = PT0.190679125S at 2023-09-17T06:27:57.075335Z

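So why is it so fast? Well, our test is flawed. Repeatedly concatenating String objects makes the string keep growing. But 20,000 concatenations produce a much smaller string than 100,000 concatenations, so much smaller that far less work is involved. So this is not a good test. (Benchmarking is hard work.)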
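A rough way to quantify why the two runs are not comparable: assume (as a simplified model, not a measurement) that each `tmp += i` copies the whole new string, so the cost of the loop is the sum of all intermediate lengths. The hypothetical `ConcatCost.copiedChars` computes that sum:

```java
final class ConcatCost {
    // Number of decimal digits in a non-negative int.
    static int digits(int v) {
        return Integer.toString(v).length();
    }

    // Simplified cost model: after appending i, the string has `length` chars,
    // and building it copied all of them, so the loop copies the sum of the lengths.
    static long copiedChars(int n) {
        long length = 0;
        long copied = 0;
        for (int i = 0; i < n; i++) {
            length += digits(i);
            copied += length;
        }
        return copied;
    }
}
```

Under this model, one 100,000-iteration loop copies more than four times as many characters as five 20,000-iteration loops combined, which fits the threaded version finishing so much sooner.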

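A better test might be generating random numbers and computing their average. On a first attempt this work ran very fast, so I scaled the iterations up tenfold (1 million and 200,000). I also involved String ↔ Integer ↔ int conversions to add work. Even so, we still got nearly instantaneous results.

Task: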

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class Averaged implements Runnable
{
    private final int count;
    private final List < String > randoms;

    public Averaged ( final int count )
    {
        this.count = count;
        this.randoms = new ArrayList <> ( this.count );
    }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        for ( int i = 0 ; i < this.count ; i++ )
        {
            int x = ThreadLocalRandom.current ( ).nextInt ( 1 , Integer.MAX_VALUE );
            this.randoms.add ( String.valueOf ( x ) );
        }
        double average = this.randoms.stream ( ).map ( Integer :: valueOf ).mapToInt ( Integer :: intValue ).summaryStatistics ( ).getAverage ( );  // Intentionally involved auto-boxing as extra work for our test.
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced average of: " + average + ". Elapsed: " + elapsed );
    }
}

And the app class:

package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
//        app.demoInThisThread ( );
//        app.demoInBackgroundThreads ( );
//        app.demoSumsInSameThread ( );
        app.demoSumsInBackgroundThreads ( );
    }

    private void demoSumsInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Averaged ( 200_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoSumsInSameThread ( )
    {
        Runnable task = new Averaged ( 1_000_000 );
        task.run ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}

When run in the same thread, the million numbers take about 8/100 of a second.

Thread ID: 1 at 2023-09-17T07:05:24.256743Z for a count of 1000000 produced average of: 1.073135902798566E9. Elapsed: PT0.084711792S

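When run across threads, the 200,000 * 5 numbers take about 8/100 or 9/100 of a second in total, and each thread's task also takes about 8/100 of a second. So we can conclude mathematically that on this machine we are getting separate cores running simultaneously.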

Thread ID: 21 at 2023-09-17T07:06:14.812452Z for a count of 200000 produced average of: 1.073304324790475E9. Elapsed: PT0.080241167S
Thread ID: 23 at 2023-09-17T07:06:14.812469Z for a count of 200000 produced average of: 1.073028061541545E9. Elapsed: PT0.080389542S
Thread ID: 25 at 2023-09-17T07:06:14.813533Z for a count of 200000 produced average of: 1.07282572828644E9. Elapsed: PT0.081019291S
Thread ID: 24 at 2023-09-17T07:06:14.813595Z for a count of 200000 produced average of: 1.076256760047685E9. Elapsed: PT0.081408875S
Thread ID: 22 at 2023-09-17T07:06:14.813600Z for a count of 200000 produced average of: 1.073103576810605E9. Elapsed: PT0.081662959S
Executor Service elapsed = PT0.098596875S at 2023-09-17T07:06:14.828988Z

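So why didn't using threads save time here? I honestly don't know why this particular task takes about the same time for 200,000 as for 1,000,000. Again, benchmarking is hard.

If we change the threaded test to 10 tasks of 100,000 numbers each, on the same 10-core machine, we do see the elapsed time of some tasks drop dramatically, to about 2/100 of a second where we might have expected 4/100. But the overall group time is roughly the same, about 9/100 of a second.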

Thread ID: 25 at 2023-09-17T19:43:06.051570Z for a count of 100000 produced average of: 1.07519247943123E9. Elapsed: PT0.057728417S
Thread ID: 24 at 2023-09-17T19:43:06.053705Z for a count of 100000 produced average of: 1.07496720067476E9. Elapsed: PT0.060560125S
Thread ID: 22 at 2023-09-17T19:43:06.053489Z for a count of 100000 produced average of: 1.07711825115815E9. Elapsed: PT0.060462083S
Thread ID: 21 at 2023-09-17T19:43:06.052852Z for a count of 100000 produced average of: 1.07550130293061E9. Elapsed: PT0.059825042S
Thread ID: 23 at 2023-09-17T19:43:06.051057Z for a count of 100000 produced average of: 1.0755424631933E9. Elapsed: PT0.057704334S
Thread ID: 21 at 2023-09-17T19:43:06.080795Z for a count of 100000 produced average of: 1.07392238112309E9. Elapsed: PT0.013798042S
Thread ID: 25 at 2023-09-17T19:43:06.081217Z for a count of 100000 produced average of: 1.07513370104224E9. Elapsed: PT0.014378042S
Thread ID: 24 at 2023-09-17T19:43:06.083975Z for a count of 100000 produced average of: 1.07646007807133E9. Elapsed: PT0.017139583S
Thread ID: 22 at 2023-09-17T19:43:06.084319Z for a count of 100000 produced average of: 1.07482906529202E9. Elapsed: PT0.017476875S
Thread ID: 23 at 2023-09-17T19:43:06.084813Z for a count of 100000 produced average of: 1.07169205436235E9. Elapsed: PT0.017668375S
Executor Service elapsed = PT0.093519875S at 2023-09-17T19:43:06.085102Z

Keep in mind that the Apple Silicon M1 Pro chip in this machine has 2 cores tuned for efficiency and 8 tuned for performance, which may affect the results.

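By the way, our testing regimen here is quite crude. We should do some work up front to warm up the JVM, and so on. For serious benchmarking, learn to use JMH. As mentioned, benchmarking is hard.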
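Short of JMH, the minimum is to run the code untimed a few times before measuring. A crude sketch under that assumption (`WarmUp` and `averageNanos` are hypothetical names); results are written to a `volatile` field so the measured work is observably used. It is no substitute for JMH, which also forks fresh JVMs, provides a `Blackhole` against dead-code elimination, and reports statistics:

```java
import java.util.function.IntSupplier;

final class WarmUp {
    // Results land here so the measured calls are not treated as unused.
    static volatile int blackhole;

    // Runs `task` `warmups` times untimed (giving the JIT a chance to compile it),
    // then returns the mean time in nanoseconds over `runs` timed calls.
    static long averageNanos(IntSupplier task, int warmups, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be >= 1");
        int sink = 0;
        for (int i = 0; i < warmups; i++) sink += task.getAsInt();
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsInt();
            total += System.nanoTime() - start;
        }
        blackhole = sink;
        return total / runs;
    }
}
```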

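Note that your test is CPU-bound. Such tasks are fairly rare in real-world Java work, which usually involves blocking. Blocking comes from activities such as writing to storage, talking to a database, logging, network calls over sockets or to web services, inter-process communication, and so on. For such blocking work, consider virtual threads (fibers) in Java 21+.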
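A minimal virtual-thread sketch, assuming Java 21+. `Executors.newVirtualThreadPerTaskExecutor()` and `Thread.sleep(Duration)` are standard JDK methods; the class name `VirtualThreadsDemo` and the use of `sleep` to stand in for blocking I/O are assumptions:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualThreadsDemo {
    // Each task blocks for `blockFor` (a stand-in for a database or HTTP call).
    // Virtual threads let many such tasks wait without tying up one OS thread each.
    static int runBlockingTasks(int tasks, Duration blockFor) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(blockFor);
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }
}
```

For comparison, a fixed pool of 5 platform threads would need at least about 2 seconds for 1,000 tasks that each block for 10 ms (200 rounds of 10 ms), whereas one virtual thread per task lets all of them wait concurrently.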

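Note: when System.out.println is called from multiple threads, the output may not appear on the console in chronological order. If you care about verifying order, always include a timestamp, such as Instant.now().

For example, in the last example results above, notice that these lines are out of order: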

Thread ID: 24 at 2023-09-17T19:43:06.053705Z …
Thread ID: 22 at 2023-09-17T19:43:06.053489Z …
Thread ID: 21 at 2023-09-17T19:43:06.052852Z …
Thread ID: 23 at 2023-09-17T19:43:06.051057Z …
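Rather than trusting console order, one option is to record a timestamp with each message and sort afterwards. A minimal sketch, assuming Java 19+ for `Thread.threadId()` (as in the code above); `OrderedLog` and `Entry` are hypothetical names, and the ordering is only as fine-grained as the clock behind `Instant.now()`:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class OrderedLog {
    // One log entry: when it happened, on which thread, and the message.
    record Entry(Instant at, long threadId, String message) {}

    private final ConcurrentLinkedQueue<Entry> entries = new ConcurrentLinkedQueue<>();

    // Safe to call from many threads at once; the timestamp is taken at call time.
    void log(String message) {
        entries.add(new Entry(Instant.now(), Thread.currentThread().threadId(), message));
    }

    // Snapshot of the entries sorted by timestamp rather than by arrival order.
    List<Entry> sorted() {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparing(Entry::at));
        return copy;
    }
}
```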