当前位置:网站首页>Four common ways and performance comparison of ArrayList de duplication (jmh performance analysis)
Four common ways and performance comparison of ArrayList de duplication (jmh performance analysis)
2022-07-06 21:16:00 【I hope to wake up naturally every day】
List of articles
Here are four practical ArrayList The way to remove weight , At the same time, I will be practical Oracle Performance testing tools provided JMH(Java Microbenchmark Harness,Java Micro benchmark test suite ) Let's test the performance of these methods . Of course, the test here is carried out when the system performance is sufficient , See who can squeeze the performance of the machine better .
Of course , That uncommon way of weight removal is not involved here .
ArrayList Common methods of weight removal
ArrayList There are four common methods of weight removal , Namely HashSet duplicate removal 、LinkedHashSet duplicate removal 、stream Stream single thread de duplication 、stream Stream multithreading de duplication . The data structures involved here are reviewed first :
ArrayList( Order is not unique ):Object[], Thread unsafe , Insert delete slow ( Array move ), Quick query ( Random access ).
HashSet( Disorder is unique ): Hashtable , be based on HashMap Realization ,hashmap Of key Store its value . Thread unsafe .
LinkedHashSet( Orderly and unique ): Hashtable + Linked list ( The linked list stores the insertion order ), It is HashSet Subclass , There is only constructor inside . Thread unsafe .
HashSet & LinkedHashSet
// data
List<Integer> arrayList = new ArrayList<Integer>(){
{
add(3);add(2);add(4);add(2);add(3);add(5);add(4);add(5);}};
// HashSet duplicate removal
ArrayList<Integer> hashSetList = new ArrayList<>(new HashSet<>(arrayList));
// LinkedHashSet duplicate removal
ArrayList<Integer> linkedHashSetList = new ArrayList<>(new LinkedHashSet<>(arrayList));
Hit the breakpoint to check the data
stream & parallelStream
List<Integer> arrayList = new ArrayList<Integer>(){
{
add(3);add(2);add(4);add(2);add(3);add(5);add(4);add(5);}};
// stream duplicate removal
List<Integer> streamList = arrayList.stream().distinct().collect(Collectors.toList());
// parallelStream duplicate removal
List<Integer> parallelStreamList = arrayList.parallelStream().distinct().collect(Collectors.toList());
Set a breakpoint to view the data
The above is the simple data De duplication operation method .
JMH Performance analysis
Use Oracle Official performance testing tools JMH(Java Microbenchmark Harness,JAVA Micro benchmark test suite ) Let's test this 4 Kind of The performance of the weight removal method .
First , Let's introduce JMH frame , stay pom.xml
Add the following configuration to the file :
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-core -->
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.23</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-generator-annprocess -->
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.23</version>
<scope>provided</scope>
</dependency>
I have written the version number , The latest version is recommended .
Now write the test code , as follows
@BenchmarkMode(Mode.AverageTime) // Test completion time
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // preheating 2 round , Every time 1s
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)// test 5 round , Every time 1s
@Fork(1) // fork 1 Threads
@State(Scope.Thread) // One instance per test thread
public class ArrayListDistinctTest {
static List<Integer> arrayList = new ArrayList<Integer>(){
{
// altogether 2000000 Data
for (int i = 0; i < 1000000; i++) add(i);
for (int i = 0; i < 1000000; i++) add(i);
}};
public static void main(String[] args) throws RunnerException {
// Start benchmarking
Options opt = new OptionsBuilder()
.include(ArrayListDistinctTest.class.getSimpleName()) // Test class to import
.output("C:/Users/86177/Desktop/jmh-map.log") // Output test results file
.build();
new Runner(opt).run(); // Perform the test
}
@Benchmark
public void hashSet(){
new ArrayList<>(new HashSet<>(arrayList));
}
@Benchmark
public void linkedHashSet(){
new ArrayList<>(new LinkedHashSet<>(arrayList));
}
@Benchmark
public void stream(){
arrayList.stream().distinct().collect(Collectors.toList());
}
@Benchmark
public void parallelStream(){
arrayList.parallelStream().distinct().collect(Collectors.toList());
}
}
All have been added @Benchmark
The annotated method will be tested , perform main The method can . And as the code shows , I tested a total of 2000000( Two million ) Data , Half is repeated . The test file has been printed to the desktop , I will post the performance comparison data here :
Benchmark Mode Cnt Score Error Units
ArrayListDistinctTest.hashSet avgt 5 37251270.300 ± 10458771.326 ns/op
ArrayListDistinctTest.linkedHashSet avgt 5 40382889.634 ± 11785155.916 ns/op
ArrayListDistinctTest.parallelStream avgt 5 157258934.286 ± 34670245.903 ns/op
ArrayListDistinctTest.stream avgt 5 38513055.914 ± 3839780.617 ns/op
among Units by ns/op It means execution completion time ( It's in nanoseconds ), and Score List as average execution time , ±
Symbols indicate errors . It can be seen from the above results , Two Set
It's similar in performance , And the execution speed is very fast , Next is stream
.
notes : The above results are based on the test environment :JDK 1.8 / Win10 16GB (2021H) / Idea 2022.3
Conclusion
Except for the large amount of data above , I also tested into 1000 Small amount of data .HashSet and LinkedHashSet The performance of is still very satisfactory , And for stream In terms of performance, when the amount of data is relatively small Set high (parallelStream Performance is not as good as stream, Here is a simple analysis ). It is recommended to use HashSet and LinkedHashSet duplicate removal .
De duplication selection
Based on the above , Yes ArrayList During the weight removal operation , Recommended LinkedHashSet.
In the final analysis, the de duplication operation is based on the data of the set , You see Set Sets are not allowed to repeat , Of course, first choose it . Then let's take a look at the requirements , Such as HashSet Data disorder after de duplication ,LinkedHashSet After de duplication, the data is orderly, but the performance is not HashSet good .
Stream There are single threaded and multi-threaded flow operations for flow selection , The performance of multithreaded operation flow here is better than that of single thread operation , It is necessary to judge whether the current operation may have concurrency risks .
Be careful. : In the computer system , Multithreading is not necessarily better than multithreading , The reason why single threading may be better than multithreading is usually CPU Performance waste caused by switching operation threads , There will be a thread sleep and thread wake operation . For example, common Redis It is designed as a single thread .
边栏推荐
- js中,字符串和数组互转(一)——字符串转为数组的方法
- Swagger UI教程 API 文档神器
- 通过数字电视通过宽带网络取代互联网电视机顶盒应用
- Redis insert data garbled solution
- Mtcnn face detection
- KDD 2022 | realize unified conversational recommendation through knowledge enhanced prompt learning
- Chris LATTNER, the father of llvm: why should we rebuild AI infrastructure software
- 对话阿里巴巴副总裁贾扬清:追求大模型,并不是一件坏事
- 监控界的最强王者,没有之一!
- [200 opencv routines] 220 Mosaic the image
猜你喜欢
[MySQL] trigger
ICML 2022 | flowformer: task generic linear complexity transformer
HMS core machine learning service creates a new "sound" state of simultaneous interpreting translation, and AI makes international exchanges smoother
[MySQL] basic use of cursor
面试官:Redis中有序集合的内部实现方式是什么?
15 millions d'employés sont faciles à gérer et la base de données native du cloud gaussdb rend le Bureau des RH plus efficace
2022菲尔兹奖揭晓!首位韩裔许埈珥上榜,四位80后得奖,乌克兰女数学家成史上唯二获奖女性
967- letter combination of telephone number
请问sql group by 语句问题
No Yum source to install SPuG monitoring
随机推荐
Reflection operation exercise
Nodejs tutorial expressjs article quick start
[MySQL] basic use of cursor
通过数字电视通过宽带网络取代互联网电视机顶盒应用
启动嵌入式间:资源有限的系统启动
js通过数组内容来获取数组下标
967- letter combination of telephone number
JS get array subscript through array content
SAP Fiori应用索引大全工具和 SAP Fiori Tools 的使用介绍
OneNote in-depth evaluation: using resources, plug-ins, templates
c#使用oracle存储过程获取结果集实例
2017 8th Blue Bridge Cup group a provincial tournament
JS traversal array and string
Absolute primes (C language)
What is the difference between procedural SQL and C language in defining variables
愛可可AI前沿推介(7.6)
The most comprehensive new database in the whole network, multidimensional table platform inventory note, flowus, airtable, seatable, Vig table Vika, flying Book Multidimensional table, heipayun, Zhix
Hardware development notes (10): basic process of hardware development, making a USB to RS232 module (9): create ch340g/max232 package library sop-16 and associate principle primitive devices
js之遍历数组、字符串
监控界的最强王者,没有之一!