Your code has several problems, some of which bear directly on the answer to your question.
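First, in modern Java we rarely need to address the Thread class directly. Instead, use the Executors framework added in Java 5. Define your task as a Runnable or Callable object. Submit one or more instances to an executor service. That executor service can be backed by a pool of threads, so you need not manage the setup and teardown of those threads, nor the scheduling of tasks across them.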
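As a minimal sketch of that submit-and-collect pattern (the class name ExecutorSketch, the method squareAll, and the pool size of 3 are made up for illustration and are not part of your code), a Callable returns a value that we later retrieve through a Future:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class, for illustration only.
class ExecutorSketch
{
    // Submits one Callable per input and collects the results in submission order.
    static List<Integer> squareAll(List<Integer> inputs)
    {
        ExecutorService executorService = Executors.newFixedThreadPool(3); // Arbitrary pool size.
        try
        {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer input : inputs)
            {
                Callable<Integer> task = () -> input * input;
                futures.add(executorService.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures)
            {
                results.add(future.get()); // Blocks until that task has finished.
            }
            return results;
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            executorService.shutdown(); // Lets the pool's threads end once the work is done.
        }
    }

    public static void main(String[] args)
    {
        System.out.println(squareAll(List.of(1, 2, 3))); // prints [1, 4, 9]
    }
}
```

Notice that the calling code never creates, starts, or joins a Thread itself.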
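For clarity, let's move your main method into its own class. That lets us move your task work into a class that implements Runnable, meaning it implements a run method. There is no need to extend Thread at all.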
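Your task code does nothing of consequence. Nothing is passed to other code, written to storage, sent over the network, saved to a database, or reported on the console. Such code may be optimized away by the compiler. So we add a call to System.out.println to avoid that optimization.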
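Your task code converts integers to text and concatenates them. In real work we would likely use StringBuilder for efficiency (though some Java implementations may do this automatically behind the scenes) and to make the code more self-documenting.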
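We want to report the elapsed time of each task run.
For that, do not use java.util.Date. In fact, never use either of the Date classes (java.util.Date and java.sql.Date). These legacy classes were supplanted years ago by the modern java.time classes, defined in JSR 310 and built into Java 8+. Specifically, java.util.Date was replaced by java.time.Instant.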
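If old code hands you a java.util.Date, you can convert at the boundary and work in java.time from there. Here is a small sketch; the class and method names (LegacyDateSketch, toModern, toLegacy) are hypothetical, while Date.toInstant and Date.from are the conversion methods added to the legacy class in Java 8:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

// Hypothetical class, for illustration only.
class LegacyDateSketch
{
    // Converts a legacy Date received from old code into a modern Instant.
    static Instant toModern(Date legacy)
    {
        return legacy.toInstant();
    }

    // Converts back, only for calling old APIs not yet updated to java.time.
    static Date toLegacy(Instant instant)
    {
        return Date.from(instant);
    }

    public static void main(String[] args)
    {
        Instant start = Instant.now();
        Instant later = start.plus(Duration.ofMillis(1_500));
        System.out.println(Duration.between(start, later)); // prints PT1.5S
        System.out.println(toModern(toLegacy(start)));       // The same moment, at millisecond resolution.
    }
}
```

Keep in mind that Date resolves only to milliseconds, so a round trip through Date drops any finer fraction of a second held by the Instant.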
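For micro-benchmarking, the most precise way to track elapsed time is System.nanoTime. That call captures the current value of an incrementing count of nanoseconds. The value is meaningful only as the difference between two calls, not as a time of day.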
Use underscores within numeric literals to make them easier to read, for example 100_000.
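A tiny sketch of that literal syntax (the class and constant names here are invented for the example). The compiler ignores the underscores, so the values are unchanged:

```java
// Hypothetical class, for illustration only.
class LiteralSketch
{
    static final int ITERATIONS = 100_000;               // Same value as 100000.
    static final long NANOS_PER_SECOND = 1_000_000_000L; // Same value as 1000000000L.

    public static void main(String[] args)
    {
        System.out.println(ITERATIONS == 100000); // prints true
        System.out.println(NANOS_PER_SECOND);     // prints 1000000000
    }
}
```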
By convention, variable names in Java start with a lowercase letter.
So here is our task class:
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        StringBuilder tmp = new StringBuilder ( );
        for ( int i = 0 ; i < 100_000 ; i++ )
        {
            tmp.append ( i );
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        // Reporting the result keeps the loop from being optimized away.
        // Thread.threadId ( ) requires Java 19+.
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}
And here is an application class to run that task.
package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoInThisThread ( );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( );
        task.run ( );
    }
}
When run:
Thread ID: 1 at 2023-09-17T06:01:24.691381Z result character length is: 488890. Elapsed: PT0.003877583S
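On my machine (MacBook Pro, 16-inch, 2021, Apple M1 Pro, 16 GB, macOS Ventura 13.5.2), your original code took PT1.456853583S. So you can see that StringBuilder is far more efficient than String concatenation here: roughly 4 milliseconds versus about 1.5 seconds. But for this threading test we do not really need efficiency, so I will revert to using String.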
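Note that this approach of String concatenation generates a great deal of garbage for the garbage collector to manage, as commented by user207421. That much garbage may affect the results in unpredictable ways.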
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < 100000 ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " result character length is: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}
When run:
Thread ID: 1 at 2023-09-17T06:09:03.218903Z result character length is: 488890. Elapsed: PT1.401553292S
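Next, you created five threads. In each thread you ran the same task, looping 100,000 times. So you did five times the work, 500,000 iterations in total. More work is more work; threads do not make the extra work disappear. The CPU cores still have to perform five times as many string concatenations.
To evaluate the benefit of threads more fairly, you should divide the work so that each thread performs only a portion of the 100,000 iterations. We can do that by adding a constructor to our task class that takes the desired number of iterations.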
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;

public class Counter implements Runnable
{
    private final int count;

    public Counter ( final int count ) { this.count = count; }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        String tmp = "";
        for ( int i = 0 ; i < this.count ; i++ ) tmp += i;
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced result character length of: " + tmp.codePoints ( ).count ( ) + ". Elapsed: " + elapsed );
    }
}
Our app code:
package work.basil.example.looping;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoInThisThread ( );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}
Same results.
Now we are ready to run five threads, each with 20,000 iterations as its share of the 100,000.
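Here 100,000 divides evenly by five. When the total does not divide evenly, one way to hand out the shares might look like the following sketch, where the class WorkSplitSketch and the method shares are hypothetical helpers, not part of the code in this answer:

```java
import java.util.Arrays;

// Hypothetical class, for illustration only.
class WorkSplitSketch
{
    // Divides `total` iterations among `tasks` tasks.
    // The first (total % tasks) tasks each receive one extra iteration.
    static int[] shares(int total, int tasks)
    {
        if (total < 0 || tasks <= 0)
        {
            throw new IllegalArgumentException("total must be >= 0 and tasks must be > 0");
        }
        int[] result = new int[tasks];
        int base = total / tasks;
        int remainder = total % tasks;
        for (int i = 0; i < tasks; i++)
        {
            result[i] = base + (i < remainder ? 1 : 0);
        }
        return result;
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(shares(100_000, 5))); // prints [20000, 20000, 20000, 20000, 20000]
        System.out.println(Arrays.toString(shares(7, 3)));       // prints [3, 2, 2]
    }
}
```

Each element of the returned array could then be passed to a Counter constructor.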
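We instantiate an executor service backed by five threads. Then we loop to submit five instances of our task to that executor service. We use try-with-resources syntax to automatically close our ExecutorService after the submitted tasks have completed.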
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoInBackgroundThreads ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        // ExecutorService is AutoCloseable in Java 19+. Its close method waits for submitted tasks to finish.
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}
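The try-with-resources form above depends on ExecutorService being AutoCloseable, which is the case only in Java 19 and later. On earlier versions, a roughly equivalent sketch shuts the pool down explicitly and then waits. The class name ShutdownSketch, the method runAll, and the one-minute timeout are assumptions chosen for this example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class, for illustration only.
class ShutdownSketch
{
    // Runs `taskCount` trivial tasks on a fixed pool, waits for them, and returns how many completed.
    static int runAll(int taskCount, int threads)
    {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executorService = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < taskCount; i++)
        {
            Runnable task = completed::incrementAndGet; // Stand-in for real work.
            executorService.submit(task);
        }
        executorService.shutdown(); // Accept no new tasks; already-submitted tasks still run.
        try
        {
            // Arbitrary timeout for this sketch.
            if (!executorService.awaitTermination(1, TimeUnit.MINUTES))
            {
                executorService.shutdownNow(); // Give up and try to cancel whatever remains.
            }
        }
        catch (InterruptedException e)
        {
            executorService.shutdownNow();
            Thread.currentThread().interrupt(); // Preserve the interrupt status for callers.
        }
        return completed.get();
    }

    public static void main(String[] args)
    {
        System.out.println(runAll(5, 5)); // prints 5
    }
}
```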
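The results are very different. Rather than taking over a second, all 5 * 20,000 concatenations complete in a fraction of a second, about 2/10 of a second. Each task also takes about 2/10 of a second, so we know mathematically that on this 10-core machine our tasks ran at the same time, on separate cores.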
Thread ID: 24 at 2023-09-17T06:27:57.068636Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.1823735S
Thread ID: 23 at 2023-09-17T06:27:57.067782Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18159025S
Thread ID: 25 at 2023-09-17T06:27:57.070629Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.18414225S
Thread ID: 22 at 2023-09-17T06:27:57.070073Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.183828875S
Thread ID: 21 at 2023-09-17T06:27:57.062894Z for a count of 20000 produced result character length of: 88890. Elapsed: PT0.176261S
Executor Service elapsed = PT0.190679125S at 2023-09-17T06:27:57.075335Z
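So why so fast? Well, our test is flawed. Repeatedly concatenating String objects produces an ever-growing string. But 20,000 concatenations produce a much smaller string than 100,000 concatenations do, so much smaller that far less work is involved. So this is not a good test. (Benchmarking is hard work.)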
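To put a rough number on that, here is a sketch under a simplified cost model that is my assumption, not a measurement: every `tmp += i` builds a brand-new string, so we charge each iteration the length of the string it produces. The class and method names are hypothetical.

```java
// Hypothetical class, for illustration only.
class ConcatCostSketch
{
    // Number of characters in the decimal text of a non-negative int.
    static int digits(int n)
    {
        return String.valueOf(n).length();
    }

    // Length of the final string after appending 0 through count - 1.
    static long finalLength(int count)
    {
        long length = 0;
        for (int i = 0; i < count; i++)
        {
            length += digits(i);
        }
        return length;
    }

    // Assumed cost model: each concatenation is charged the length of the new string it builds.
    static long charactersBuilt(int count)
    {
        long length = 0;
        long total = 0;
        for (int i = 0; i < count; i++)
        {
            length += digits(i);
            total += length;
        }
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(finalLength(20_000));  // prints 88890, matching the output above
        System.out.println(finalLength(100_000)); // prints 488890
        long oneBigTask = charactersBuilt(100_000);
        long fiveSmallTasks = 5 * charactersBuilt(20_000);
        System.out.println("One task of 100,000: " + oneBigTask);
        System.out.println("Five tasks of 20,000: " + fiveSmallTasks);
    }
}
```

Under this model the cost grows roughly with the square of the count. By my arithmetic, one task of 100,000 builds roughly 29 times as many characters as one task of 20,000, so the five smaller tasks combined do only about a sixth of the work of the single large task, even before any threading.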
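A better test might be to generate random numbers and compute their average. On my first attempt that work ran so fast that I increased the iterations tenfold (to one million in total, or 200,000 per task). I also involved String ↔ Integer ↔ int conversions to add more work. Even so, we get nearly instantaneous results.
The task: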
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class Averaged implements Runnable
{
    private final int count;
    private final List < String > randoms;

    public Averaged ( final int count )
    {
        this.count = count;
        this.randoms = new ArrayList <> ( this.count );
    }

    @Override
    public void run ( )
    {
        long startNanos = System.nanoTime ( );
        for ( int i = 0 ; i < this.count ; i++ )
        {
            int x = ThreadLocalRandom.current ( ).nextInt ( 1 , Integer.MAX_VALUE );
            this.randoms.add ( String.valueOf ( x ) );
        }
        double average = this.randoms.stream ( ).map ( Integer :: valueOf ).mapToInt ( Integer :: intValue ).summaryStatistics ( ).getAverage ( );
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Thread ID: " + Thread.currentThread ( ).threadId ( ) + " at " + Instant.now ( ) + " for a count of " + this.count + " produced average of: " + average + ". Elapsed: " + elapsed );
    }
}
And the app class:
package work.basil.example.looping;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class App
{
    public static void main ( String[] args )
    {
        App app = new App ( );
        app.demoSumsInBackgroundThreads ( );
    }

    private void demoSumsInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Averaged ( 200_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoSumsInSameThread ( )
    {
        Runnable task = new Averaged ( 1_000_000 );
        task.run ( );
    }

    private void demoInBackgroundThreads ( )
    {
        long startNanos = System.nanoTime ( );
        try ( ExecutorService executorService = Executors.newFixedThreadPool ( 5 ) ; )
        {
            final int countTasks = 5;
            for ( int ordinal = 0 ; ordinal < countTasks ; ordinal++ )
            {
                Runnable task = new Counter ( 20_000 );
                executorService.submit ( task );
            }
        }
        Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - startNanos );
        System.out.println ( "Executor Service elapsed = " + elapsed + " at " + Instant.now ( ) );
    }

    private void demoInThisThread ( )
    {
        Runnable task = new Counter ( 100_000 );
        task.run ( );
    }
}
When run in a single thread, the one million numbers take about 8/100 of a second.
Thread ID: 1 at 2023-09-17T07:05:24.256743Z for a count of 1000000 produced average of: 1.073135902798566E9. Elapsed: PT0.084711792S
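When run across multiple threads, the 200,000 * 5 numbers take about 10/100 of a second overall. Each task on its own also takes about 8/100 of a second. Since the overall time is barely more than the time of a single task, we can conclude mathematically that the tasks ran at the same time on separate cores of this machine.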
Thread ID: 21 at 2023-09-17T07:06:14.812452Z for a count of 200000 produced average of: 1.073304324790475E9. Elapsed: PT0.080241167S
Thread ID: 23 at 2023-09-17T07:06:14.812469Z for a count of 200000 produced average of: 1.073028061541545E9. Elapsed: PT0.080389542S
Thread ID: 25 at 2023-09-17T07:06:14.813533Z for a count of 200000 produced average of: 1.07282572828644E9. Elapsed: PT0.081019291S
Thread ID: 24 at 2023-09-17T07:06:14.813595Z for a count of 200000 produced average of: 1.076256760047685E9. Elapsed: PT0.081408875S
Thread ID: 22 at 2023-09-17T07:06:14.813600Z for a count of 200000 produced average of: 1.073103576810605E9. Elapsed: PT0.081662959S
Executor Service elapsed = PT0.098596875S at 2023-09-17T07:06:14.828988Z
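So why no time savings when using threads? I really do not know why this particular task takes about the same time for 200,000 numbers as for one million. Again, benchmarking is hard.
If we change the threaded test to 10 tasks of 100,000 numbers each, on the same 10-core machine, we do see the elapsed time of some tasks drop sharply, to only about 2/100 of a second where we might have expected about 4/100. But the overall time for the group remains about the same, at about 9/100 of a second. Only five thread IDs appear in the output, so the ten tasks evidently ran in two waves; the second wave is the faster one.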
Thread ID: 25 at 2023-09-17T19:43:06.051570Z for a count of 100000 produced average of: 1.07519247943123E9. Elapsed: PT0.057728417S
Thread ID: 24 at 2023-09-17T19:43:06.053705Z for a count of 100000 produced average of: 1.07496720067476E9. Elapsed: PT0.060560125S
Thread ID: 22 at 2023-09-17T19:43:06.053489Z for a count of 100000 produced average of: 1.07711825115815E9. Elapsed: PT0.060462083S
Thread ID: 21 at 2023-09-17T19:43:06.052852Z for a count of 100000 produced average of: 1.07550130293061E9. Elapsed: PT0.059825042S
Thread ID: 23 at 2023-09-17T19:43:06.051057Z for a count of 100000 produced average of: 1.0755424631933E9. Elapsed: PT0.057704334S
Thread ID: 21 at 2023-09-17T19:43:06.080795Z for a count of 100000 produced average of: 1.07392238112309E9. Elapsed: PT0.013798042S
Thread ID: 25 at 2023-09-17T19:43:06.081217Z for a count of 100000 produced average of: 1.07513370104224E9. Elapsed: PT0.014378042S
Thread ID: 24 at 2023-09-17T19:43:06.083975Z for a count of 100000 produced average of: 1.07646007807133E9. Elapsed: PT0.017139583S
Thread ID: 22 at 2023-09-17T19:43:06.084319Z for a count of 100000 produced average of: 1.07482906529202E9. Elapsed: PT0.017476875S
Thread ID: 23 at 2023-09-17T19:43:06.084813Z for a count of 100000 produced average of: 1.07169205436235E9. Elapsed: PT0.017668375S
Executor Service elapsed = PT0.093519875S at 2023-09-17T19:43:06.085102Z
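Bear in mind that on this machine's Apple Silicon M1 Pro chip, 2 of the cores are tuned for efficiency while the other 8 are tuned for performance, which may affect the results.
By the way, our testing process here is quite crude. We should do some work up front to warm up the JVM, and so on. For real benchmarking, learn to use JMH. As mentioned, benchmarking is hard.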
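To give a feel for what warming up means, here is a crude sketch that runs a task several times untimed before timing it. This is not a substitute for JMH, which handles many more pitfalls. The class WarmUpSketch, the method measure, and the run counts are all hypothetical choices:

```java
import java.time.Duration;

// Hypothetical class, for illustration only. For real benchmarks, use JMH.
class WarmUpSketch
{
    // Runs the task `warmUpRuns` times untimed, then returns the mean time of `measuredRuns` timed runs.
    static Duration measure(Runnable task, int warmUpRuns, int measuredRuns)
    {
        if (measuredRuns <= 0)
        {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmUpRuns; i++)
        {
            task.run(); // Untimed, giving the JIT compiler a chance to optimize hot code first.
        }
        long startNanos = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++)
        {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - startNanos;
        return Duration.ofNanos(elapsedNanos / measuredRuns);
    }

    public static void main(String[] args)
    {
        Runnable task = () -> {
            StringBuilder tmp = new StringBuilder();
            for (int i = 0; i < 100_000; i++)
            {
                tmp.append(i);
            }
            if (tmp.length() != 488_890) // Use the result so the loop cannot be discarded.
            {
                throw new IllegalStateException("Unexpected length: " + tmp.length());
            }
        };
        System.out.println("Mean elapsed per run: " + measure(task, 10, 10));
    }
}
```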
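Note that your test is CPU-bound. Such tasks are fairly rare in real-world Java work. Usually Java work involves blocking. Blocking comes from activities such as writing to storage, interacting with a database, logging, network calls over sockets or to web services, inter-process communication, and so on. For such blocking work, consider using virtual threads (fibers) in Java 21+.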
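A minimal sketch of that idea, requiring Java 21 or later. Here Thread.sleep stands in for real blocking work such as a network call, and the class name, method name, task count, and pause length are all made up for the example:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class, for illustration only. Requires Java 21+.
class VirtualThreadSketch
{
    // Runs `taskCount` blocking tasks, each on its own virtual thread, and returns how many completed.
    static int runBlockingTasks(int taskCount, Duration pause)
    {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor())
        {
            for (int i = 0; i < taskCount; i++)
            {
                Runnable task = () -> {
                    try
                    {
                        Thread.sleep(pause.toMillis()); // Stand-in for blocking I/O.
                        completed.incrementAndGet();
                    }
                    catch (InterruptedException e)
                    {
                        Thread.currentThread().interrupt();
                    }
                };
                executorService.submit(task);
            }
        } // Leaving the try block waits for all submitted tasks to finish.
        return completed.get();
    }

    public static void main(String[] args)
    {
        long startNanos = System.nanoTime();
        int done = runBlockingTasks(1_000, Duration.ofMillis(100));
        Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);
        System.out.println(done + " blocking tasks completed. Elapsed: " + elapsed);
    }
}
```

Because a blocked virtual thread does not tie up a platform thread, the thousand pauses overlap rather than queueing behind a small pool.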
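Caution: When System.out.println is called from multiple threads, the output may not appear on the console in chronological order. If you care about verifying the order, always include a timestamp such as Instant.now().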
For example, in the last example results above, notice that these lines are out of order:
Thread ID: 24 at 2023-09-17T19:43:06.053705Z …
Thread ID: 22 at 2023-09-17T19:43:06.053489Z …
Thread ID: 21 at 2023-09-17T19:43:06.052852Z …
Thread ID: 23 at 2023-09-17T19:43:06.051057Z …
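If you capture such output, you can restore chronological order afterwards by sorting on the embedded timestamp. The following sketch assumes each line carries an ISO 8601 instant right after the text " at ", as in the output format used in this answer; the class and method names are hypothetical:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class, for illustration only.
class OutputOrderSketch
{
    // Extracts the ISO 8601 timestamp that follows " at " in a line of output.
    static Instant timestampOf(String line)
    {
        int marker = line.indexOf(" at ");
        if (marker < 0)
        {
            throw new IllegalArgumentException("No timestamp marker found in: " + line);
        }
        int start = marker + " at ".length();
        int end = line.indexOf(' ', start);
        String text = (end < 0) ? line.substring(start) : line.substring(start, end);
        return Instant.parse(text);
    }

    // Returns a copy of the lines sorted by their embedded timestamps, earliest first.
    static List<String> inChronologicalOrder(List<String> lines)
    {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(OutputOrderSketch::timestampOf));
        return sorted;
    }

    public static void main(String[] args)
    {
        List<String> lines = List.of(
                "Thread ID: 24 at 2023-09-17T19:43:06.053705Z ...",
                "Thread ID: 22 at 2023-09-17T19:43:06.053489Z ...",
                "Thread ID: 21 at 2023-09-17T19:43:06.052852Z ...",
                "Thread ID: 23 at 2023-09-17T19:43:06.051057Z ...");
        // Prints the lines for threads 23, 21, 22, 24, in that order.
        inChronologicalOrder(lines).forEach(System.out::println);
    }
}
```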