当前位置:网站首页>Four common ways and performance comparison of ArrayList de duplication (jmh performance analysis)
Four common ways and performance comparison of ArrayList de duplication (jmh performance analysis)
2022-07-06 21:16:00 【I hope to wake up naturally every day】
List of articles
Here are four practical ArrayList The way to remove weight , At the same time, I will be practical Oracle Performance testing tools provided JMH(Java Microbenchmark Harness,Java Micro benchmark test suite ) Let's test the performance of these methods . Of course, the test here is carried out when the system performance is sufficient , See who can squeeze the performance of the machine better .
Of course , That uncommon way of weight removal is not involved here .
ArrayList Common methods of weight removal
ArrayList There are four common methods of weight removal , Namely HashSet duplicate removal 、LinkedHashSet duplicate removal 、stream Stream single thread de duplication 、stream Stream multithreading de duplication . The data structures involved here are reviewed first :
ArrayList( Order is not unique ):Object[], Thread unsafe , Insert delete slow ( Array move ), Quick query ( Random access ).
HashSet( Disorder is unique ): Hashtable , be based on HashMap Realization ,hashmap Of key Store its value . Thread unsafe .
LinkedHashSet( Orderly and unique ): Hashtable + Linked list ( The linked list stores the insertion order ), It is HashSet Subclass , There is only constructor inside . Thread unsafe .
HashSet & LinkedHashSet
// data
List<Integer> arrayList = new ArrayList<Integer>(){
{
add(3);add(2);add(4);add(2);add(3);add(5);add(4);add(5);}};
// HashSet duplicate removal
ArrayList<Integer> hashSetList = new ArrayList<>(new HashSet<>(arrayList));
// LinkedHashSet duplicate removal
ArrayList<Integer> linkedHashSetList = new ArrayList<>(new LinkedHashSet<>(arrayList));
Hit the breakpoint to check the data
stream & parallelStream
List<Integer> arrayList = new ArrayList<Integer>(){
{
add(3);add(2);add(4);add(2);add(3);add(5);add(4);add(5);}};
// stream duplicate removal
List<Integer> streamList = arrayList.stream().distinct().collect(Collectors.toList());
// parallelStream duplicate removal
List<Integer> parallelStreamList = arrayList.parallelStream().distinct().collect(Collectors.toList());
Set a breakpoint to view the data
The above is the simple data De duplication operation method .
JMH Performance analysis
Use Oracle Official performance testing tools JMH(Java Microbenchmark Harness,JAVA Micro benchmark test suite ) Let's test this 4 Kind of The performance of the weight removal method .
First , Let's introduce JMH frame , stay pom.xml
Add the following configuration to the file :
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-core -->
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.23</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-generator-annprocess -->
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.23</version>
<scope>provided</scope>
</dependency>
I have written the version number , The latest version is recommended .
Now write the test code , as follows
@BenchmarkMode(Mode.AverageTime) // Test completion time
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // preheating 2 round , Every time 1s
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)// test 5 round , Every time 1s
@Fork(1) // fork 1 Threads
@State(Scope.Thread) // One instance per test thread
public class ArrayListDistinctTest {
static List<Integer> arrayList = new ArrayList<Integer>(){
{
// altogether 2000000 Data
for (int i = 0; i < 1000000; i++) add(i);
for (int i = 0; i < 1000000; i++) add(i);
}};
public static void main(String[] args) throws RunnerException {
// Start benchmarking
Options opt = new OptionsBuilder()
.include(ArrayListDistinctTest.class.getSimpleName()) // Test class to import
.output("C:/Users/86177/Desktop/jmh-map.log") // Output test results file
.build();
new Runner(opt).run(); // Perform the test
}
@Benchmark
public void hashSet(){
new ArrayList<>(new HashSet<>(arrayList));
}
@Benchmark
public void linkedHashSet(){
new ArrayList<>(new LinkedHashSet<>(arrayList));
}
@Benchmark
public void stream(){
arrayList.stream().distinct().collect(Collectors.toList());
}
@Benchmark
public void parallelStream(){
arrayList.parallelStream().distinct().collect(Collectors.toList());
}
}
All have been added @Benchmark
The annotated method will be tested , perform main The method can . And as the code shows , I tested a total of 2000000( Two million ) Data , Half is repeated . The test file has been printed to the desktop , I will post the performance comparison data here :
Benchmark Mode Cnt Score Error Units
ArrayListDistinctTest.hashSet avgt 5 37251270.300 ± 10458771.326 ns/op
ArrayListDistinctTest.linkedHashSet avgt 5 40382889.634 ± 11785155.916 ns/op
ArrayListDistinctTest.parallelStream avgt 5 157258934.286 ± 34670245.903 ns/op
ArrayListDistinctTest.stream avgt 5 38513055.914 ± 3839780.617 ns/op
among Units by ns/op It means execution completion time ( It's in nanoseconds ), and Score List as average execution time , ±
Symbols indicate errors . It can be seen from the above results , Two Set
It's similar in performance , And the execution speed is very fast , Next is stream
.
notes : The above results are based on the test environment :JDK 1.8 / Win10 16GB (2021H) / Idea 2022.3
Conclusion
Except for the large amount of data above , I also tested into 1000 Small amount of data .HashSet and LinkedHashSet The performance of is still very satisfactory , And for stream In terms of performance, when the amount of data is relatively small Set high (parallelStream Performance is not as good as stream, Here is a simple analysis ). It is recommended to use HashSet and LinkedHashSet duplicate removal .
De duplication selection
Based on the above , Yes ArrayList During the weight removal operation , Recommended LinkedHashSet.
In the final analysis, the de duplication operation is based on the data of the set , You see Set Sets are not allowed to repeat , Of course, first choose it . Then let's take a look at the requirements , Such as HashSet Data disorder after de duplication ,LinkedHashSet After de duplication, the data is orderly, but the performance is not HashSet good .
Stream There are single threaded and multi-threaded flow operations for flow selection , The performance of multithreaded operation flow here is better than that of single thread operation , It is necessary to judge whether the current operation may have concurrency risks .
Be careful. : In the computer system , Multithreading is not necessarily better than multithreading , The reason why single threading may be better than multithreading is usually CPU Performance waste caused by switching operation threads , There will be a thread sleep and thread wake operation . For example, common Redis It is designed as a single thread .
边栏推荐
- 每个程序员必须掌握的常用英语词汇(建议收藏)
- JS get array subscript through array content
- 【mysql】触发器
- Yyds dry goods count re comb this of arrow function
- Nodejs tutorial let's create your first expressjs application with typescript
- FZU 1686 龙之谜 重复覆盖
- No Yum source to install SPuG monitoring
- 038. (2.7) less anxiety
- Chris LATTNER, the father of llvm: why should we rebuild AI infrastructure software
- Set up a time server
猜你喜欢
968 edit distance
Why do job hopping take more than promotion?
每个程序员必须掌握的常用英语词汇(建议收藏)
【力扣刷题】一维动态规划记录(53零钱兑换、300最长递增子序列、53最大子数组和)
2022菲尔兹奖揭晓!首位韩裔许埈珥上榜,四位80后得奖,乌克兰女数学家成史上唯二获奖女性
Swagger UI教程 API 文档神器
Reinforcement learning - learning notes 5 | alphago
[MySQL] basic use of cursor
Hardware development notes (10): basic process of hardware development, making a USB to RS232 module (9): create ch340g/max232 package library sop-16 and associate principle primitive devices
[interpretation of the paper] machine learning technology for Cataract Classification / classification
随机推荐
Regular expression collection
ACdreamoj1110(多重背包)
OAI 5g nr+usrp b210 installation and construction
【mysql】触发器
[MySQL] basic use of cursor
快过年了,心也懒了
The most comprehensive new database in the whole network, multidimensional table platform inventory note, flowus, airtable, seatable, Vig table Vika, flying Book Multidimensional table, heipayun, Zhix
【滑动窗口】第九届蓝桥杯省赛B组:日志统计
爱可可AI前沿推介(7.6)
审稿人dis整个研究方向已经不仅仅是在审我的稿子了怎么办?
js中,字符串和数组互转(一)——字符串转为数组的方法
全网最全的新型数据库、多维表格平台盘点 Notion、FlowUs、Airtable、SeaTable、维格表 Vika、飞书多维表格、黑帕云、织信 Informat、语雀
如何实现常见框架
[200 opencv routines] 220 Mosaic the image
[sliding window] group B of the 9th Landbridge cup provincial tournament: log statistics
Data Lake (VIII): Iceberg data storage format
Start the embedded room: system startup with limited resources
Notes - detailed steps of training, testing and verification of yolo-v4-tiny source code
js之遍历数组、字符串
The biggest pain point of traffic management - the resource utilization rate cannot go up