For the purpose of merging two streams of data into a Set:
Stream.concat(stream1, stream2).collect(Collectors.toSet());
or
stream1.collect(Collectors.toSet())
       .addAll(stream2.collect(Collectors.toSet()));
Which is more efficient, and why?
For readability and clarity of intent, Stream.concat(a, b).collect(toSet()) is much clearer than the second alternative.
As for the question "which is the most efficient", here is a JMH test (I have to say I don't use JMH very often, so there is probably room to improve my benchmark):
package stackoverflow;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@Warmup(iterations = 2)
@Fork(1)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode({ Mode.AverageTime })
public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;

    @Setup
    public void setUp() {
        final Set<String> valuesForA = new HashSet<>();
        final Set<String> valuesForB = new HashSet<>();
        for (int i = 0; i < 1000; ++i) {
            valuesForA.add(Integer.toString(i));
            valuesForB.add(Integer.toString(1000 + i));
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    @Benchmark
    public void stream_concat_then_collect_using_toSet(final Blackhole blackhole) {
        final Set<String> set = Stream.concat(s1.stream(), s2.stream()).collect(Collectors.toSet());
        blackhole.consume(set);
    }

    @Benchmark
    public void s1_collect_using_toSet_then_addAll_using_toSet(final Blackhole blackhole) {
        final Set<String> set = s1.stream().collect(Collectors.toSet());
        set.addAll(s2.stream().collect(Collectors.toSet()));
        blackhole.consume(set);
    }
}
You get these results (I omitted some parts of the output for readability):
Result "s1_collect_using_toSet_then_addAll_using_toSet":
156969,172 ±(99.9%) 4463,129 ns/op [Average]
(min, avg, max) = (152842,561, 156969,172, 161444,532), stdev = 2952,084
CI (99.9%): [152506,043, 161432,301] (assumes normal distribution)
Result "stream_concat_then_collect_using_toSet":
104254,566 ±(99.9%) 4318,123 ns/op [Average]
(min, avg, max) = (102086,234, 104254,566, 111731,085), stdev = 2856,171
CI (99.9%): [99936,443, 108572,689] (assumes normal distribution)
# Run complete. Total time: 00:00:25
Benchmark Mode Cnt Score Error Units
StreamBenchmark.s1_collect_using_toSet_then_addAll_using_toSet avgt 10 156969,172 ± 4463,129 ns/op
StreamBenchmark.stream_concat_then_collect_using_toSet avgt 10 104254,566 ± 4318,123 ns/op
If I read the JMH numbers correctly, the Stream.concat(a, b).collect(toSet()) version performs faster.
On the other hand, I think this result is normal: you don't create an intermediate set (even a HashSet has some cost), and as said in a comment on the first answer, the Stream is lazily concatenated.
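As a small illustration of that laziness (the class and variable names below are mine, not from the answer): Stream.concat only builds a lazy view over both sources, and nothing is traversed until a terminal operation runs, so even an infinite first stream is fine when the result is limited.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatLaziness {
    public static void main(String[] args) {
        // Stream.concat builds a lazy view; no element is produced
        // until the terminal collect() runs, so concatenating an
        // infinite stream works as long as we limit the result.
        List<Integer> first3 = Stream.concat(
                Stream.iterate(0, i -> i + 1), // infinite stream
                Stream.of(-1))                 // never reached here
            .limit(3)
            .collect(Collectors.toList());
        System.out.println(first3); // [0, 1, 2]
    }
}
```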
Using a profiler you might see in which part it is slower. You might also want to use toCollection(() -> new HashSet<>(1000)) instead of toSet() to see if the problem lies in growing the HashSet's internal hash array.
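A sketch of that pre-sizing suggestion (the data and the class name are made up for illustration). Note the exact capacity is a guess: with HashSet's default load factor of 0.75, a capacity of roughly expectedSize / 0.75 + 1 avoids any rehashing.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PresizedCollect {
    public static void main(String[] args) {
        Stream<String> s1 = Stream.of("a", "b");
        Stream<String> s2 = Stream.of("b", "c");
        // Pre-size the HashSet so it never has to grow its internal table.
        int expectedSize = 4; // upper bound: both sources fully distinct
        Set<String> merged = Stream.concat(s1, s2)
            .collect(Collectors.toCollection(
                () -> new HashSet<>((int) (expectedSize / 0.75f) + 1)));
        System.out.println(merged.size()); // 3 ("b" occurs in both sources)
    }
}
```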
You don't know the task I'm working on. – alan7678
That's true. But I don't need to either.
There are generally two cases:
Either you develop an OLTP application. In that case the application should respond within a second or less, and the user will not notice the performance difference between the variants you presented.
Or you develop some kind of batch which runs unattended for a while. In that case the performance difference "could" matter, but only if you are charged for how long the batch runs.
Either way: real performance problems (where you speed up your application by multiples, not by fractions) are usually caused by the logic you implemented (e.g. excessive communication, "hidden loops" or excessive object creation).
These problems usually cannot be solved or prevented by choosing a certain syntax.
Will you still be responsible for this application in 5 years? Or are you a consultant who is paid to start the project and then moves on to the next one?
I never had a project where I could solve a performance problem at the syntax level.
But I constantly work with legacy code that has existed for 10+ years and is hard to maintain because somebody did not honor readability.
So your non-answer does not apply to me. – alan7678
It's a free world, choose for yourself.
The toSet() collector returns a Set with "no guarantees on the type, mutability, serializability, or thread-safety". Since mutability is not guaranteed, it is not correct to invoke addAll on the resulting Set.
It happens to work in the current version of the reference implementation, where a HashSet will be created, but might stop working in a future version or with alternative implementations. To fix this, you have to replace toSet() with toCollection(HashSet::new) for the collect operation of the first Stream.
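A minimal sketch of that mutability-safe variant (the example data is made up): only the first collect needs the guaranteed-mutable HashSet, since addAll is only invoked on it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MutableCollect {
    public static void main(String[] args) {
        Stream<String> s1 = Stream.of("a", "b");
        Stream<String> s2 = Stream.of("b", "c");
        // toCollection(HashSet::new) guarantees a mutable HashSet,
        // so the subsequent addAll is well-defined; plain toSet()
        // makes no such promise about its result.
        Set<String> set = s1.collect(Collectors.toCollection(HashSet::new));
        set.addAll(s2.collect(Collectors.toSet()));
        System.out.println(set.size()); // 3
    }
}
```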
This leads to the situation that the second variant is not only less efficient, as shown in this answer, but might also prevent future optimizations made to the toSet() collector, by insisting on the result being of the exact type HashSet. Also, unlike the toSet() collector, the toCollection(…) collector has no way of detecting that the target collection is unordered, which might be performance-relevant in future implementations.
Use either one.
If you profile your application and find this code section to be a bottleneck, profile it again with the different implementations and pick the best performing one.
I faced a choice between using Stream.of() with flatMap(), Stream.concat(), addAll(), or add() to merge several lists into a single list. I ran a quick test on my code with 10 iterations each and got some surprising results.
------------------------------------------------------------------
1. Using getByAddAll()
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.414 ± 0.304 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.291 ± 0.332 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.571 ± 0.622 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.520 ± 0.818 ms/op
Average = 4.449ms
------------------------------------------------------------------
------------------------------------------------------------------
2. Using getByAdd()
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.280 ± 0.499 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.494 ± 0.374 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.575 ± 0.539 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.552 ± 0.272 ms/op
Average = 4.475ms
------------------------------------------------------------------
------------------------------------------------------------------
3. using getByStreamOf()
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.502 ± 0.529 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.494 ± 0.754 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.676 ± 0.347 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.511 ± 0.950 ms/op
Average = 4.545ms
------------------------------------------------------------------
------------------------------------------------------------------
4. Using getByStreamConcat()
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.342 ± 0.372 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.218 ± 0.400 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.892 ± 0.562 ms/op
Benchmark Mode Cnt Score Error Units
PerformaceTest.test avgt 10 4.818 ± 0.608 ms/op
Average = 4.567ms
------------------------------------------------------------------
private List<ItemDTO> getByStreamOf(OfferResponseDTO catalogOfferDTO) {
    return Stream.of(
            catalogOfferDTO.getCharges()
                .stream()
                .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),
            catalogOfferDTO.getUsages()
                .stream()
                .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())),
            catalogOfferDTO.getNetworkElements()
                .stream()
                .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),
            catalogOfferDTO.getEquipment()
                .stream()
                .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())))
        .flatMap(Function.identity())
        .collect(Collectors.toList());
}

private List<ItemDTO> getByStreamConcat(OfferResponseDTO catalogOfferDTO) {
    return Stream.concat(
            Stream.concat(
                catalogOfferDTO.getCharges()
                    .stream()
                    .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())),
                catalogOfferDTO.getUsages()
                    .stream()
                    .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder()))),
            Stream.concat(
                catalogOfferDTO.getEquipment()
                    .stream()
                    .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())),
                catalogOfferDTO.getNetworkElements()
                    .stream()
                    .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))))
        .collect(Collectors.toList());
}

private List<ItemDTO> getByAddAll(OfferResponseDTO catalogOfferDTO) {
    List<ItemDTO> items = new ArrayList<>();
    items.addAll(catalogOfferDTO.getCharges()
        .stream()
        .map(chargeWithPricePlanResponseDTO -> new ItemDTO(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
        .collect(Collectors.toList()));
    items.addAll(catalogOfferDTO.getUsages()
        .stream()
        .map(usageResponseDTO -> new ItemDTO(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder()))
        .collect(Collectors.toList()));
    items.addAll(catalogOfferDTO.getNetworkElements()
        .stream()
        .map(networkElementResponseDTO -> new ItemDTO(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder()))
        .collect(Collectors.toList()));
    items.addAll(catalogOfferDTO.getEquipment()
        .stream()
        .map(equipmentResponseDTO -> new ItemDTO(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder()))
        .collect(Collectors.toList()));
    return items;
}
private List<ItemDTO> getByAdd(OfferResponseDTO catalogOfferDTO) {
    List<ItemDTO> items = new ArrayList<>();
    // Note: the original used .map(...) with no terminal operation, so the
    // streams were never consumed and nothing was actually added;
    // forEach is needed to execute the side effect.
    catalogOfferDTO.getCharges()
        .stream()
        .forEach(chargeWithPricePlanResponseDTO -> items.add(this.addItem(chargeWithPricePlanResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));
    catalogOfferDTO.getUsages()
        .stream()
        .forEach(usageResponseDTO -> items.add(this.addItem(usageResponseDTO.getDescription(), catalogOfferDTO.getDisplayOrder())));
    catalogOfferDTO.getEquipment()
        .stream()
        .forEach(equipmentResponseDTO -> items.add(this.addItem(equipmentResponseDTO.getInvoiceDescription(), catalogOfferDTO.getDisplayOrder())));
    catalogOfferDTO.getNetworkElements()
        .stream()
        .forEach(networkElementResponseDTO -> items.add(this.addItem(networkElementResponseDTO.getName(), catalogOfferDTO.getDisplayOrder())));
    return items;
}
Stream.concat(stream1, stream2) has to build one big object, because you call .collect(). Then .toSet() must compare every incoming element with every element seen before, possibly with a fast hash function, but still over potentially many elements. On the other hand, stream1.collect(Collectors.toSet()).addAll(stream2.collect(Collectors.toSet())) creates two smaller sets and then merges them. The memory footprint of this second option may be lower than the first one.
Edit: I revisited this after reading @NoDataFound's benchmark. On a more sophisticated version of the test, Stream.concat does seem to run consistently faster than Collection.addAll. I tried to take into account how many distinct elements there are and how big the initial streams are. I also excluded from the measurement the time needed to create the input streams (which is negligible anyway). Here is a sample of the timings I get with the code below.
Concat-collect 10000 elements, all distinct: 7205462 nanos
Collect-addAll 10000 elements, all distinct: 12130107 nanos
Concat-collect 100000 elements, all distinct: 78184055 nanos
Collect-addAll 100000 elements, all distinct: 115191392 nanos
Concat-collect 1000000 elements, all distinct: 555265307 nanos
Collect-addAll 1000000 elements, all distinct: 1370210449 nanos
Concat-collect 5000000 elements, all distinct: 9905958478 nanos
Collect-addAll 5000000 elements, all distinct: 27658964935 nanos
Concat-collect 10000 elements, 50% distinct: 3242675 nanos
Collect-addAll 10000 elements, 50% distinct: 5088973 nanos
Concat-collect 100000 elements, 50% distinct: 389537724 nanos
Collect-addAll 100000 elements, 50% distinct: 48777589 nanos
Concat-collect 1000000 elements, 50% distinct: 427842288 nanos
Collect-addAll 1000000 elements, 50% distinct: 1009179744 nanos
Concat-collect 5000000 elements, 50% distinct: 3317183292 nanos
Collect-addAll 5000000 elements, 50% distinct: 4306235069 nanos
Concat-collect 10000 elements, 10% distinct: 2310440 nanos
Collect-addAll 10000 elements, 10% distinct: 2915999 nanos
Concat-collect 100000 elements, 10% distinct: 68601002 nanos
Collect-addAll 100000 elements, 10% distinct: 40163898 nanos
Concat-collect 1000000 elements, 10% distinct: 315481571 nanos
Collect-addAll 1000000 elements, 10% distinct: 494875870 nanos
Concat-collect 5000000 elements, 10% distinct: 1766480800 nanos
Collect-addAll 5000000 elements, 10% distinct: 2721430964 nanos
Concat-collect 10000 elements, 1% distinct: 2097922 nanos
Collect-addAll 10000 elements, 1% distinct: 2086072 nanos
Concat-collect 100000 elements, 1% distinct: 32300739 nanos
Collect-addAll 100000 elements, 1% distinct: 32773570 nanos
Concat-collect 1000000 elements, 1% distinct: 382380451 nanos
Collect-addAll 1000000 elements, 1% distinct: 514534562 nanos
Concat-collect 5000000 elements, 1% distinct: 2468393302 nanos
Collect-addAll 5000000 elements, 1% distinct: 6619280189 nanos
Code:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamBenchmark {
    private Set<String> s1;
    private Set<String> s2;
    private long createStreamsTime;
    private long concatCollectTime;
    private long collectAddAllTime;

    public void setUp(final int howMany, final int distinct) {
        final Set<String> valuesForA = new HashSet<>(howMany);
        final Set<String> valuesForB = new HashSet<>(howMany);
        if (-1 == distinct) {
            for (int i = 0; i < howMany; ++i) {
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(howMany + i));
            }
        } else {
            // Note: valuesForA is always fully distinct (it uses i);
            // only valuesForB is limited to `distinct` different values.
            Random r = new Random();
            for (int i = 0; i < howMany; ++i) {
                int j = r.nextInt(distinct);
                valuesForA.add(Integer.toString(i));
                valuesForB.add(Integer.toString(distinct + j));
            }
        }
        s1 = valuesForA;
        s2 = valuesForB;
    }

    public void run(final int streamLength, final int distinctElements, final int times, boolean discard) {
        long startTime;
        setUp(streamLength, distinctElements);
        createStreamsTime = 0l;
        concatCollectTime = 0l;
        collectAddAllTime = 0l;
        for (int r = 0; r < times; r++) {
            startTime = System.nanoTime();
            Stream<String> st1 = s1.stream();
            Stream<String> st2 = s2.stream();
            createStreamsTime += System.nanoTime() - startTime;
            startTime = System.nanoTime();
            Set<String> set1 = Stream.concat(st1, st2).collect(Collectors.toSet());
            concatCollectTime += System.nanoTime() - startTime;
            st1 = s1.stream();
            st2 = s2.stream();
            startTime = System.nanoTime();
            Set<String> set2 = st1.collect(Collectors.toSet());
            set2.addAll(st2.collect(Collectors.toSet()));
            collectAddAllTime += System.nanoTime() - startTime;
        }
        if (!discard) {
            // System.out.println("Create streams " + streamLength + " elements, "
            //     + distinctElements + " distinct: " + createStreamsTime + " nanos");
            System.out.println("Concat-collect " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + concatCollectTime + " nanos");
            System.out.println("Collect-addAll " + streamLength + " elements, " + (distinctElements == -1 ? "all" : String.valueOf(100 * distinctElements / streamLength) + "%") + " distinct: " + collectAddAllTime + " nanos");
            System.out.println("");
        }
    }

    public static void main(String[] args) {
        StreamBenchmark test = new StreamBenchmark();
        final int times = 5;
        test.run(100000, -1, 1, true); // warm-up run, result discarded
        test.run(10000, -1, times, false);
        test.run(100000, -1, times, false);
        test.run(1000000, -1, times, false);
        test.run(5000000, -1, times, false);
        test.run(10000, 5000, times, false);
        test.run(100000, 50000, times, false);
        test.run(1000000, 500000, times, false);
        test.run(5000000, 2500000, times, false);
        test.run(10000, 1000, times, false);
        test.run(100000, 10000, times, false);
        test.run(1000000, 100000, times, false);
        test.run(5000000, 500000, times, false);
        test.run(10000, 100, times, false);
        test.run(100000, 1000, times, false);
        test.run(1000000, 10000, times, false);
        test.run(5000000, 50000, times, false);
    }
}
Stream.concat(stream1, stream2) is lazily concatenated. – NoDataFound
What do you mean by "Stream.concat(stream1, stream2) has to create a big object"? Both operations produce the same Set; beyond that, there is no "big object" involved in Stream.concat. – Holger
The intermediate set for addAll, on the other hand. – Holger