Four common ways to deduplicate an ArrayList and a performance comparison (JMH performance analysis)
2022-07-06 21:16:00 【I hope to wake up naturally every day】
Here are four practical ways to deduplicate an ArrayList, together with a performance comparison done with JMH (Java Microbenchmark Harness), the official microbenchmark suite maintained under the OpenJDK project. The tests were run with plenty of system headroom, so the comparison is really about which approach squeezes the most out of the machine.
Less common deduplication approaches are not covered here.
Common ArrayList deduplication methods
There are four common ways to deduplicate an ArrayList: HashSet, LinkedHashSet, single-threaded stream, and parallel (multithreaded) stream. First, a quick review of the data structures involved:
ArrayList (ordered, not unique): backed by an Object[]; not thread-safe; insertion and deletion are slow (array elements must be shifted), while lookups are fast (random access).
HashSet (unordered, unique): a hash table implemented on top of HashMap, with each element stored as a key; not thread-safe.
LinkedHashSet (ordered, unique): a hash table plus a linked list (the linked list records insertion order); it is a subclass of HashSet and only adds constructors; not thread-safe.
HashSet & LinkedHashSet
// sample data
List<Integer> arrayList = new ArrayList<Integer>() {{
    add(3); add(2); add(4); add(2); add(3); add(5); add(4); add(5);
}};
// deduplicate with HashSet
ArrayList<Integer> hashSetList = new ArrayList<>(new HashSet<>(arrayList));
// deduplicate with LinkedHashSet
ArrayList<Integer> linkedHashSetList = new ArrayList<>(new LinkedHashSet<>(arrayList));
Set a breakpoint and inspect the data:
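Alternatively (a minimal sketch), printing the two lists shows the difference in ordering. Note that HashSet makes no ordering guarantee, although for these small Integer values it typically comes out ascending, while LinkedHashSet preserves first-occurrence order:
System.out.println(hashSetList);        // e.g. [2, 3, 4, 5], order comes from hashing, not insertion
System.out.println(linkedHashSetList);  // [3, 2, 4, 5], insertion (first-occurrence) order preserved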

stream & parallelStream
List<Integer> arrayList = new ArrayList<Integer>() {{
    add(3); add(2); add(4); add(2); add(3); add(5); add(4); add(5);
}};
// deduplicate with a sequential stream
List<Integer> streamList = arrayList.stream().distinct().collect(Collectors.toList());
// deduplicate with a parallel stream
List<Integer> parallelStreamList = arrayList.parallelStream().distinct().collect(Collectors.toList());
Set a breakpoint to view the data
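Again as a quick sketch, printing the results shows that distinct() keeps the first occurrence of each element for ordered streams, so both variants preserve the encounter order here:
System.out.println(streamList);          // [3, 2, 4, 5]
System.out.println(parallelStreamList);  // [3, 2, 4, 5], distinct() stays stable for ordered streams even in parallel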

Those are the basic deduplication operations on the data.
JMH Performance analysis
Now let's use JMH (Java Microbenchmark Harness), the official OpenJDK microbenchmark suite, to measure the performance of these four deduplication methods.
First, add the JMH dependencies to the pom.xml file:
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-core -->
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.23</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openjdk.jmh/jmh-generator-annprocess -->
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.23</version>
    <scope>provided</scope>
</dependency>
The version number is pinned here; using the latest release is recommended.
Now write the test code as follows:
@BenchmarkMode(Mode.AverageTime) // measure the average time per operation
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // 2 warm-up rounds, 1 s each
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) // 5 measurement rounds, 1 s each
@Fork(1) // fork one JVM process
@State(Scope.Thread) // one instance per test thread
public class ArrayListDistinctTest {

    static List<Integer> arrayList = new ArrayList<Integer>() {{
        // 2,000,000 elements in total; every value appears twice
        for (int i = 0; i < 1000000; i++) add(i);
        for (int i = 0; i < 1000000; i++) add(i);
    }};

    public static void main(String[] args) throws RunnerException {
        // start the benchmark
        Options opt = new OptionsBuilder()
                .include(ArrayListDistinctTest.class.getSimpleName()) // test class to run
                .output("C:/Users/86177/Desktop/jmh-map.log") // file for the test results
                .build();
        new Runner(opt).run(); // run the benchmark
    }

    @Benchmark
    public void hashSet() {
        new ArrayList<>(new HashSet<>(arrayList));
    }

    @Benchmark
    public void linkedHashSet() {
        new ArrayList<>(new LinkedHashSet<>(arrayList));
    }

    @Benchmark
    public void stream() {
        arrayList.stream().distinct().collect(Collectors.toList());
    }

    @Benchmark
    public void parallelStream() {
        arrayList.parallelStream().distinct().collect(Collectors.toList());
    }
}
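One caveat worth noting (an addition of mine, not part of the original benchmark): @Benchmark methods that discard their result are in principle exposed to dead-code elimination by the JIT. A more defensive variant, assuming the methods below are added to the same class with an extra import of org.openjdk.jmh.infra.Blackhole, either returns the result or sinks it into a Blackhole; the method names are illustrative:
@Benchmark
public List<Integer> hashSetReturning() {
    // returning the list lets JMH consume it, so the work cannot be optimized away
    return new ArrayList<>(new HashSet<>(arrayList));
}
@Benchmark
public void streamWithBlackhole(Blackhole bh) {
    // Blackhole.consume serves the same purpose for void-style benchmarks
    bh.consume(arrayList.stream().distinct().collect(Collectors.toList()));
}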
Every method annotated with @Benchmark is measured; simply run the main method. As the code shows, the test list contains 2,000,000 (two million) elements, half of which are duplicates. The results were written to the log file on the desktop; the performance comparison is reproduced here:
Benchmark Mode Cnt Score Error Units
ArrayListDistinctTest.hashSet avgt 5 37251270.300 ± 10458771.326 ns/op
ArrayListDistinctTest.linkedHashSet avgt 5 40382889.634 ± 11785155.916 ns/op
ArrayListDistinctTest.parallelStream avgt 5 157258934.286 ± 34670245.903 ns/op
ArrayListDistinctTest.stream avgt 5 38513055.914 ± 3839780.617 ns/op
The Units value ns/op means nanoseconds per operation, Score is the average execution time, and ± gives the measurement error. From these results, HashSet, stream, and LinkedHashSet all finish in roughly 37 to 40 ms per operation and are close to one another, while parallelStream is about four times slower.
Note: the results above were obtained in this test environment: JDK 1.8 / Windows 10, 16 GB RAM (2021H) / IntelliJ IDEA 2022.3.
Conclusion
Besides the large data set above, I also tested a small one of 1,000 elements. HashSet and LinkedHashSet still perform very well, and on small data sets the Set-based approaches are faster than the stream-based ones (parallelStream performs worse than stream; a brief analysis follows below). HashSet and LinkedHashSet are therefore the recommended way to deduplicate.
Choosing a deduplication method
Based on the above, LinkedHashSet is the recommended choice when deduplicating an ArrayList.
Deduplication boils down to the uniqueness of the data, and since Set collections do not allow duplicates they are the natural first choice. Then look at the requirements: HashSet loses the original order after deduplication, while LinkedHashSet keeps the insertion order at a slight performance cost compared with HashSet.
For streams there is a choice between the sequential and the parallel variant. In this test the parallel stream did not outperform the sequential one, and you also need to consider whether the operation carries any concurrency risk.
Note: multithreading is not necessarily faster than single threading. A single thread can beat multiple threads mainly because of the CPU overhead of switching between threads, which involves putting threads to sleep and waking them up again. Redis, for example, is deliberately designed around a single thread.
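As a rough illustration of this point (a minimal sketch using wall-clock timing, not a substitute for JMH; it assumes the same two-million-element arrayList and the java.util.stream.Collectors import from the benchmark class), the parallel distinct() typically loses to the sequential one here, because splitting the work and then merging the per-thread results while preserving encounter order costs more than it saves:
long t1 = System.nanoTime();
List<Integer> sequential = arrayList.stream().distinct().collect(Collectors.toList());
long t2 = System.nanoTime();
List<Integer> parallel = arrayList.parallelStream().distinct().collect(Collectors.toList());
long t3 = System.nanoTime();
System.out.println("sequential distinct: " + (t2 - t1) / 1_000_000 + " ms");
System.out.println("parallel   distinct: " + (t3 - t2) / 1_000_000 + " ms");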