Which version of JVM is the fastest?
2022-06-29 16:13:00 【51CTO】

This article uses the open-source Chronicle Queue, with two threads exchanging 256-byte messages, to find out which JVM version is the fastest.
Chronicle Queue is a persisted, low-latency Java messaging framework for high-performance and critical applications. Because Chronicle Queue operates on memory-mapped native memory, it avoids the need for garbage collection and gives developers deterministic, high performance.
In the benchmark, two threads exchange 256-byte messages through an open-source Chronicle Queue. To minimize the impact of the disk subsystem, all messages are stored in shared memory, under /dev/shm.
As is usual for this kind of benchmark, a single producer thread writes messages carrying a nanosecond timestamp to the queue. A second, consumer thread reads the messages from the queue and records the time deltas in a histogram. The producer sustains a rate of 100,000 messages per second, and each message carries a 256-byte payload. Because data is measured over a span of 100 seconds, most jitter is captured by the measurement, and the higher percentiles fall within a reasonable confidence interval.
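To make the measurement scheme concrete, here is a minimal, Chronicle-free sketch in plain Java. It is a stand-in built on stated assumptions, not the article's harness: a single `AtomicLong` slot (in the hypothetical class `TimestampPingSketch`) replaces the queue, the producer stamps each message with `System.nanoTime()` and paces itself with a busy-spin, and the consumer records `now - sent` for every message it takes.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, Chronicle-free sketch of the producer/consumer timestamp scheme.
 * A single AtomicLong slot stands in for the queue; this is NOT how Chronicle Queue works.
 */
public class TimestampPingSketch {
    private static final long EMPTY = Long.MIN_VALUE; // sentinel: no message in the slot

    /** Sends count timestamps, one every periodNanos, and returns the one-way deltas in ns. */
    public static long[] run(int count, long periodNanos) {
        AtomicLong slot = new AtomicLong(EMPTY);
        long[] deltas = new long[count];

        // Consumer: busy-spin until a timestamp arrives, then record "now - sent".
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                long sent;
                while ((sent = slot.getAndSet(EMPTY)) == EMPTY) {
                    Thread.onSpinWait();
                }
                deltas[i] = System.nanoTime() - sent;
            }
        });
        consumer.start();

        // Producer: stamp, publish, then spin until the pacing period has elapsed.
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            while (!slot.compareAndSet(EMPTY, start)) {
                Thread.onSpinWait(); // previous message not consumed yet
            }
            while (System.nanoTime() - start < periodNanos) {
                Thread.onSpinWait();
            }
        }

        try {
            consumer.join(); // join() also makes the consumer's writes to deltas visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deltas;
    }

    public static void main(String[] args) {
        // 100,000 msg/s means one message every 1,000,000,000 / 100,000 = 10,000 ns.
        long[] deltas = run(100_000, 1_000_000_000L / 100_000);
        System.out.println("samples: " + deltas.length);
    }
}
```

With a single slot the producer can never run ahead of the consumer, which a real queue allows, so the sketch only illustrates how one-way latency is stamped and recorded.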
The target host has a 16-core AMD Ryzen 9 5950X processor running at 3.4 GHz under Linux 5.11.0-49-generic #55-Ubuntu SMP. CPU cores 2-8 are isolated, so the operating system does not automatically schedule user processes on them, and most interrupts are kept off these cores.
PART 01
Java Code
The following excerpt shows the producer's inner loop:
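```java
// Excerpt: tmp, blocksize, ROLL_CYCLE, MSGS_PER_SECOND, WARMUP, COUNT,
// MSGSIZE, data and spin_wait are declared elsewhere and not shown.
// Pin the producer thread to CPU 2
Affinity.setAffinity(2);
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(tmp)
        .blockSize(blocksize)
        .rollCycle(ROLL_CYCLE)
        .build()) {
    ExcerptAppender appender = cq.acquireAppender();
    // Pacing period between messages (10,000 ns at 100,000 msg/s)
    final long nano_delay = 1_000_000_000L / MSGS_PER_SECOND;
    // Negative values of i are warm-up iterations
    for (int i = -WARMUP; i < COUNT; ++i) {
        long startTime = System.nanoTime();
        try (DocumentContext dc = appender.writingDocument()) {
            Bytes<?> bytes = dc.wire().bytes();
            // Stamp the send time into the first 8 bytes of the payload
            data.writeLong(0, startTime);
            bytes.write(data, 0, MSGSIZE);
        } // closing the DocumentContext completes the message
        // Busy-wait for the rest of the period to hold the target rate
        long delay = nano_delay - (System.nanoTime() - startTime);
        spin_wait(delay);
    }
}
```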
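The loop above calls `spin_wait(delay)`, whose body the excerpt does not show. A plausible busy-spin version might look like the following; the class name `SpinWait`, the method name `spinWait` and its exact behavior are assumptions, not the article's code. `Thread.onSpinWait()` is a Java 9+ hint, so on Java 8 the loop body would simply be left empty.

```java
/** Hypothetical stand-in for the spin_wait(delay) helper used by the producer loop. */
public final class SpinWait {
    private SpinWait() {}

    /** Busy-spins for at least nanos ns; returns at once if nanos <= 0 (the send overran its period). */
    public static void spinWait(long nanos) {
        if (nanos <= 0) {
            return;
        }
        final long deadline = System.nanoTime() + nanos;
        // Compare via subtraction so the check stays correct if nanoTime() wraps around.
        while (System.nanoTime() - deadline < 0) {
            Thread.onSpinWait(); // CPU hint; the thread still keeps its core fully busy
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        spinWait(10_000); // one pacing period at 100,000 msg/s
        System.out.println("waited " + (System.nanoTime() - t0) + " ns");
    }
}
```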
In another thread, the consumer runs the following inner loop (abridged):
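```java
// Excerpt: tmp, blocksize, ROLL_CYCLE, APPENDERS, WARMUP, COUNT, MSGSIZE,
// data and deltas are declared elsewhere and not shown.
// Pin the consumer thread to CPU 4
Affinity.setAffinity(4);
try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(tmp)
        .blockSize(blocksize)
        .rollCycle(ROLL_CYCLE)
        .build()) {
    ExcerptTailer tailer = cq.createTailer();
    // Negative indices are warm-up messages: read, but not recorded
    int idx = -APPENDERS * WARMUP;
    while (idx < APPENDERS * COUNT) {
        try (DocumentContext dc = tailer.readingDocument()) {
            // No new message yet: poll again (busy-spin)
            if (!dc.isPresent())
                continue;
            Bytes<?> bytes = dc.wire().bytes();
            data.clear();
            bytes.read(data, (int) MSGSIZE);
            // Recover the producer's send timestamp and record the one-way latency
            long startTime = data.readLong(0);
            if (idx >= 0)
                deltas[idx] = System.nanoTime() - startTime;
            ++idx;
        }
    }
}
```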
As can be seen, the consumer thread reads each nanosecond timestamp and records the corresponding latency in an array. Once the benchmark has completed, these values are put into a histogram and printed. Measurement only starts after the JVM has been properly "warmed up" and the C2 compiler has JIT (Just-In-Time) compiled the hot execution paths.
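The post-processing step can be sketched as: drop the warm-up samples, sort what remains, and read off nearest-rank percentiles. This is an illustrative assumption; it uses a plain sorted array instead of the histogram the article uses, and the class name `LatencyPercentiles` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical post-processing: discard warm-up samples, then read nearest-rank percentiles. */
public class LatencyPercentiles {

    /** Returns the samples after the first warmup entries, sorted ascending. */
    public static long[] measuredSorted(long[] raw, int warmup) {
        long[] measured = Arrays.copyOfRange(raw, warmup, raw.length);
        Arrays.sort(measured);
        return measured;
    }

    /** Nearest-rank percentile (0 < pct <= 100) of an ascending-sorted, non-empty array. */
    public static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data: 100 slow warm-up samples, then latencies of 1..1000 ns.
        long[] raw = new long[1_100];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = i < 100 ? 1_000_000 : i - 99;
        }
        long[] sorted = measuredSorted(raw, 100);
        for (double p : new double[] {50, 90, 99, 99.9}) {
            System.out.println(p + "%: " + percentile(sorted, p) + " ns");
        }
    }
}
```

Discarding warm-up samples mirrors the `idx >= 0` check in the consumer loop, which keeps pre-JIT iterations out of the statistics.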
PART 02
JVM variants
Chronicle Queue currently supports all recent LTS (Long-Term Support) Java versions, including Java 8, Java 11, and Java 17, so all of them are used in the benchmark. We also include the GraalVM Community and Enterprise editions. The specific JVM variants are listed below:

Table 1: JVM variants tested
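When comparing several JVMs, it helps to record which one actually ran each benchmark. A minimal sketch using only standard system properties, which are available on Java 8, 11, and 17 alike; the class name `JvmInfo` is hypothetical, and the exact strings printed differ between vendors and builds.

```java
/** Hypothetical helper: label benchmark output with the JVM that produced it. */
public class JvmInfo {

    /** Returns "VM name / vendor / version" from standard system properties. */
    public static String describe() {
        return System.getProperty("java.vm.name") + " / "
                + System.getProperty("java.vendor") + " / "
                + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```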
PART 03
Measurements
Because each benchmark runs for 100 seconds at 100,000 messages per second, each run samples 100,000 * 100 = 10,000,000 (10 million) messages. From the histogram we read specific percentiles such as 50% (the median), 90%, 99%, and 99.9%. The table below shows the total number of messages received for some of these percentiles:

Table 2: Number of messages per percentile
Assuming relatively little variance in the measured values, the confidence interval is probably reasonable for percentiles up to 99.99%. The 99.999% percentile would likely require collecting data for at least half an hour, rather than just 100 seconds, to produce figures with a reasonable confidence interval.
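The arithmetic behind Table 2 and the run-length remark can be sketched as follows, assuming Table 2 counts the samples that lie beyond each percentile. The class `TailSamples` and the target of 1,000 tail samples are hypothetical. At 100,000 messages per second, only about 100 samples fall beyond the 99.999th percentile in 100 seconds, which is why longer runs are needed there.

```java
/** Hypothetical arithmetic: samples beyond a percentile, and run time for a target tail size. */
public class TailSamples {

    /** Samples expected beyond percentile pct out of total samples. */
    public static long tailCount(long total, double pct) {
        return Math.round(total * (100.0 - pct) / 100.0);
    }

    /** Seconds needed at ratePerSecond to collect targetTail samples beyond percentile pct. */
    public static double secondsFor(long targetTail, double pct, long ratePerSecond) {
        return targetTail / (ratePerSecond * (100.0 - pct) / 100.0);
    }

    public static void main(String[] args) {
        long total = 100_000L * 100; // 100,000 msg/s for 100 s = 10,000,000 samples
        for (double p : new double[] {50, 90, 99, 99.9, 99.99, 99.999}) {
            System.out.println(p + "%: " + tailCount(total, p) + " samples beyond");
        }
        // Hypothetical target: 1,000 samples beyond the 99.999th percentile.
        System.out.printf("99.999%% with 1,000 tail samples: ~%.0f s%n",
                secondsFor(1_000, 99.999, 100_000));
    }
}
```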
PART 04
Benchmark results
We ran the same benchmark for each Java variant.
Note that the producer and consumer threads are pinned to CPU cores 2 and 4 respectively, so they run on separate cores. Below is a typical process profile after they have been running for a while:
As can be seen, the producer and consumer threads spin-wait between messages, so each consumes an entire CPU core. If CPU consumption is a concern, latency and determinism can be traded for lower power consumption by parking the threads for a short while (for example, LockSupport.parkNanos(1000)).
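The trade-off can be sketched with two hypothetical polling helpers. One busy-spins, keeping a core at 100% for the lowest wake-up latency. The other parks for about 1 µs between polls via `LockSupport.parkNanos(1_000)`, as suggested above. `PollWait` and its method names are illustrative, not Chronicle API, and the operating system may park the thread noticeably longer than the requested 1,000 ns.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical helpers contrasting busy-spinning with brief parking while polling. */
public final class PollWait {
    private PollWait() {}

    /** Lowest latency: re-checks continuously and keeps the core fully busy. */
    public static void awaitSpinning(BooleanSupplier ready) {
        while (!ready.getAsBoolean()) {
            Thread.onSpinWait();
        }
    }

    /** Lower CPU use: parks for about 1 µs between checks, adding latency and jitter. */
    public static void awaitParking(BooleanSupplier ready) {
        while (!ready.getAsBoolean()) {
            LockSupport.parkNanos(1_000);
        }
    }

    public static void main(String[] args) {
        long deadline = System.nanoTime() + 1_000_000; // becomes "ready" after ~1 ms
        awaitParking(() -> System.nanoTime() >= deadline);
        System.out.println("ready");
    }
}
```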
We measure the results in nanoseconds (ns). Many other kinds of latency measurements are expressed in microseconds (= 1,000 ns) or even milliseconds (= 1,000,000 ns). One nanosecond roughly corresponds to the access time of a CPU L1 cache.
All benchmark results below are given in nanoseconds:

Table 3: Latency results for each JDK variant (ns)
(*) Not officially supported by Chronicle Queue
PART 05
Typical latency (median)
As the table shows, there is no significant difference between the JDKs for the typical (median) value, except that OpenJDK 11 is about 30% slower than the others. The fastest is GraalVM EE 17, but its lead over OpenJDK 8 and OpenJDK 17 is small.
The chart below shows the typical latency for 256-byte messages with each JDK variant (lower is better):

Chart 1: Median (typical) latency for each JDK variant, in ns
The typical (median) latency varied slightly from run to run, by about 5%.
PART 06
Higher percentiles
The next chart shows the 99.99% percentile latency for each JDK variant (lower is better). At this higher percentile there is little difference between the supported JDK variants. GraalVM EE is again slightly faster, but the relative difference is smaller here. OpenJDK 11 appears marginally worse than the other variants (about 5%), but the difference is within the estimated margin of error.

Chart 2: 99.99% percentile latency for each JDK variant, in ns
PART 07
Summary
For perspective, accessing a 64-bit word from main memory takes about 100 cycles (roughly 30 ns on current hardware). In the code above, Chronicle Queue takes the data from the producer, persists it by writing to a memory-mapped file, applies the appropriate memory fences for inter-thread communication and happens-before guarantees, and then makes the data available to the consumer. All of this typically happens in around 600 ns for a 256-byte message, compared with about 30 ns for a single 64-bit memory access. These latency figures for Chronicle Queue are impressive.
OpenJDK 17 and GraalVM EE 17 therefore appear to be the best choices for this kind of application, as both deliver the best latency results. GraalVM EE 17 may be preferable if outliers need to be suppressed or the lowest possible overall latency is desired.
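The comparison above is simple arithmetic; a small sketch makes the numbers explicit. It assumes the test machine's 3.4 GHz clock and the article's rough figure of 100 cycles per main-memory access; the class `CycleMath` is hypothetical.

```java
/** Hypothetical arithmetic: cycles to nanoseconds, and message latency vs one memory access. */
public class CycleMath {

    /** Converts a cycle count to nanoseconds at a clock of ghz (1 GHz = 1 cycle per ns). */
    public static double cyclesToNanos(long cycles, double ghz) {
        return cycles / ghz;
    }

    public static void main(String[] args) {
        double dram = cyclesToNanos(100, 3.4); // about 29.4 ns, i.e. "about 30 ns"
        double message = 600;                   // typical 256-byte message latency quoted above
        System.out.printf("main-memory access: %.1f ns%n", dram);
        System.out.printf("message latency / memory access: ~%.0fx%n", message / dram);
    }
}
```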