
Performance evaluation report of YoMo codec - Y3

2020-11-09 19:12:00 Cella

yomo-y3-stress-testing

Introduction to YoMo

YoMo is an open-source real-time edge computing gateway, development framework and microservice platform. Its communication layer is based on the QUIC protocol (updated to Draft-31 on 2020-09-25), to better unlock the value of 5G and other next-generation low-latency networks. yomo-codec, a codec designed for streaming computing, can greatly improve the throughput of computing services. With its plugin-based development model, you can bring your IoT real-time edge computing system online in 5 minutes. YoMo is already deployed in the industrial Internet field.

Official website: https://yomo.run

Introduction to YoMo Codec

yomo-codec-golang implements the YoMo Codec SPEC in Go. It can encode and decode TLV structures and basic data types, and provides the codec tooling YoMo uses to process messages. You can extend it to handle more data types, or even apply it to other frameworks that need encoding and decoding.

Project introduction: README.md

Why YoMo-Codec?

JSON is widely used as the message codec over HTTP: its format is simple, it is easy to read and write, and it is supported by many languages, which makes it very popular in Internet applications. So why did we develop our own YoMo Codec for YoMo applications?

  • YoMo processes streaming messages and extracts the monitored key-value pairs to run business logic. With a JSON codec, you must wait until the complete packet has been received, deserialize it into an object, and only then extract the corresponding key-value. YoMo Codec instead describes object data as a set of TLV structures, so while decoding a packet it can tell early, from the T (tag), whether the current key is a monitored one, and if not, jump directly to the next TLV structure. Data that is not monitored never has to be decoded, which makes decoding more efficient.
  • JSON decoding usually relies heavily on reflection, which hurts its performance. YoMo Codec decodes only the key-values that are actually monitored, so it uses far less reflection.
  • In the industrial Internet, and in other network applications where computing resources are tight, the same codec work should consume as little CPU as possible, so that the limited computing resources can be used more fully.
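To make the skip-ahead idea concrete, here is a minimal Go sketch that scans a flat sequence of TLV records and returns only the monitored value. It assumes a simplified, hypothetical layout (one-byte tag, one-byte length, then the value) rather than the exact Y3 SPEC encoding, and encodeTLV and takeValue are illustrative names, not yomo-codec APIs.

```go
package main

import "fmt"

// encodeTLV appends one record in a hypothetical simplified layout:
// 1-byte tag, 1-byte length (so values are limited to 255 bytes), value.
// This is NOT the exact Y3 SPEC byte format.
func encodeTLV(dst []byte, tag byte, value []byte) []byte {
	dst = append(dst, tag, byte(len(value)))
	return append(dst, value...)
}

// takeValue walks the records and returns the value of the first record whose
// tag matches. Non-matching records are skipped by jumping over their length,
// so their values are never decoded.
func takeValue(tag byte, buf []byte) []byte {
	for i := 0; i+2 <= len(buf); {
		t, l := buf[i], int(buf[i+1])
		start, end := i+2, i+2+l
		if end > len(buf) {
			return nil // truncated record
		}
		if t == tag {
			return buf[start:end]
		}
		i = end // skip the whole record without looking at its value
	}
	return nil
}

func main() {
	var packet []byte
	for k := 1; k <= 3; k++ {
		packet = encodeTLV(packet, byte(k), []byte(fmt.Sprintf("v%d", k)))
	}
	fmt.Println(string(takeValue(0x02, packet))) // v2
	fmt.Println(takeValue(0x09, packet) == nil)  // true
}
```

Because every record carries its own length, a non-matching record costs one tag comparison and one index jump, whatever its value contains.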

This performance test aims to verify that YoMo Codec decodes data faster and with lower resource consumption than JSON, and can therefore give YoMo more real-time, efficient and low-overhead message processing.

Test description

1. Test method

  • Performance is measured with Go benchmarks in two modes, serial and parallel; the parallel mode shows performance when all CPU resources are fully used.

  • The test packets are generated by a program, which guarantees that the key-value pairs in the Codec and JSON test data are exactly the same.

  • The test data comes in 4 groups containing 3, 16, 32 and 63 key-value pairs, so that the effect of the number of key-values on decoding performance can be observed separately. The monitored key is the one in the middle of each group; for example, K08 means the 8th key is monitored. This yields the following dimensions, which are used in the charts of the test results.

    Label      Key-value pairs   Monitored key (whose value is extracted)
    C63-K32    63                32nd key
    C32-K16    32                16th key
    C16-K08    16                8th key
    C03-K02    3                 2nd key
  • The test results include:

    • A performance comparison of decoding a packet and extracting the value of the monitored key.
    • A comparison of the CPU time used in the same decode-and-extract scenario.

2. Data structure

  • Y3 test data (tags 0x01 to 0x3f correspond to keys 1 to 63)

    0x80
        0x01 value
        ....
        0x3f value 
    
  • JSON test data structure

    {
        "k1": value,
        ...
        "k63": value
    }
    

3. Data processing logic

(Figure: processing logic of the TakeValueFromXXX functions, i.e. TakeValueFromCodec and TakeValueFromJson)
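For comparison, the JSON side of the extraction can be sketched with the standard encoding/json package. This is an assumption about what a TakeValueFromJson-style function has to do, not the project's implementation: json.Unmarshal works on the complete document and fills in an entry for every key before the monitored one can be read. The name takeValueFromJSON is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// takeValueFromJSON is a hypothetical JSON-side extractor: it unmarshals the
// whole object into a map and only then looks up the monitored key.
// It returns nil if the input is invalid or the key is absent.
func takeValueFromJSON(key string, data []byte) json.RawMessage {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil
	}
	return obj[key]
}

func main() {
	data := []byte(`{"k1":10,"k2":20,"k3":30}`)
	fmt.Println(string(takeValueFromJSON("k2", data))) // 20
}
```

Unlike the TLV scan, there is no way to stop early here: every key in the object is parsed even though only one value is wanted.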

4. Test project

  • The code for this test report can be obtained from the yomo-y3-stress-testing project.

  • Main code structure (only the files directly related to this test are listed):

(Figure: directory structure of the test project)

5. Test environment

  • Hardware environment:
    • CPU: 2.6 GHz 6-core Intel Core i7, GOMAXPROCS=12
    • Memory: 32 GB
    • Disk: SSD
  • Software environment:

Benchmark test

1. Serial test process

  • Code under test: ./internal/decoder/report_serial/report_benchmark_test.go, for example:

    // Benchmark for YoMo Codec Y3
    func Benchmark_Codec_C63_K32(b *testing.B) {
    	var key byte = 0x20
    	data := generator.NewCodecTestData().GenDataBy(63)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		if decoder.TakeValueFromCodec(key, data) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    // Benchmark for JSON
    func Benchmark_Json_C63_K32(b *testing.B) {
    	key := "k32"
    	data := generator.NewJsonTestData().GenDataBy(63)
    	data = append(data, decoder.TokenEnd)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		if decoder.TakeValueFromJson(key, data) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    • Benchmark_Codec_C63_K32: serial benchmark that extracts the value of the 32nd key from a data set of 63 key-value pairs.
    • Default: GOMAXPROCS=12
  • Test script: ./internal/decoder/report_serial/report_benchmark_test.sh

    temp_file="../../../docs/temp.out"
    report_file="../../../docs/report.out"
    go test -bench=. -benchtime=3s -benchmem -run=none | grep Benchmark > ${temp_file} \
      && echo 'finished bench' \
      && cat ${temp_file} \
      && cat ${temp_file} | awk '{print $1,$3}' | awk -F "_" '{print $2,$3"-"substr($4,1,3),substr($4,7)}' | awk -v OFS=, '{print $1,$2,$3}' > ${report_file} \
      && echo 'finished analyse' \
      && cat ${report_file}
    

    The script runs the benchmarks in report_benchmark_test.go and saves the result set to ./docs/report.out.

  • Generate the result charts: ./docs/report_graphics.ipynb

    python --version # Python version > 3.2.x
    pip install runipy
    bar_ylim=70000 barh_xlim=20 runipy ./report_graphics.ipynb
    

2. Parallel test process

To maximize CPU utilization and observe decoder performance in a multi-core environment, parallel test items were added.

  • Code under test: ./internal/decoder/report_parallel/report_benchmark_test.go, for example:

    func Benchmark_Codec_C63_K32(b *testing.B) {
    	var key byte = 0x20
    	data := generator.NewCodecTestData().GenDataBy(63)
    	b.ResetTimer()
    	b.RunParallel(func(pb *testing.PB) {
    		for pb.Next() {
    			if decoder.TakeValueFromCodec(key, data) == nil {
    				panic(errors.New("take is failure"))
    			}
    		}
    	})
    }
    
    • The body is the same as in the serial version; the difference is that RunParallel is used to run it in parallel.
    • Default: GOMAXPROCS=12
  • Test script: ./internal/decoder/report_parallel/report_benchmark_test.sh, which generates the result set and saves it to ./docs/report.out.

  • Generate a result chart :

    bar_ylim=18000 barh_xlim=25 runipy ./report_graphics.ipynb
    

3. Test results

  • Serial benchmark results:

    • Time per decode-and-extract operation: Chart 3.1

    (Chart 3.1: serial time per operation, Y3 vs JSON)

    • Ratio of JSON decoding time to Y3 decoding time: Chart 3.2

      (Chart 3.2: serial JSON/Y3 time ratio)

    • Chart notes:
      • X axis of Chart 3.1: C63-K32 means the packet contains 63 key-value pairs and the value of the 32nd key is extracted.
      • Y axis of Chart 3.1: nanoseconds per operation.
      • X axis of Chart 3.2: how many times longer JSON decoding takes than Y3 decoding (JSON time / Y3 time), e.g. 43010/2077 = 20.71.
  • Parallel benchmark results:

    • Time per decode-and-extract operation: Chart 3.3

      (Chart 3.3: parallel time per operation, Y3 vs JSON)

    • Ratio of JSON decoding time to Y3 decoding time: Chart 3.4

      (Chart 3.4: parallel JSON/Y3 time ratio)

4. Test analysis

The test results above show that:

  • Y3 decodes much faster than JSON, and the more key-value pairs a packet contains, the larger the improvement. On average Y3 is more than 10 times faster: (20.7 + 15.8 + 6.2 + 3.3) / 4 = 11.5.

  • With multi-core parallel decoding, ns/op also improves substantially; parallel runs are roughly 3 times faster than serial runs:

    ns/op                       C63-K32   C32-K16   C16-K08   C03-K02
    Serial test                 2077      1361      1667      610
    Parallel test               706       505       515       175
    Speedup (serial/parallel)   2.9x      2.7x      3.2x      3.5x

CPU resource analysis

1. Test process

  • Code under test: ./cpu/cpu_pprof.go

    func main() {
    	dataCodec := generator.NewCodecTestData().GenDataBy(63)
    	dataJson := generator.NewJsonTestData().GenDataBy(63)
    	dataJson = append(dataJson, decoder.TokenEnd)
    
    	// pprof
    	fmt.Printf("start pprof\n")
    	go pprof.Run()
    	time.Sleep(5 * time.Second)
    
    	fmt.Printf("start testing...\n")
    	for {
    		if decoder.TakeValueFromCodec(0x20, dataCodec) == nil {
    			panic(errors.New("take is failure"))
    		}
    		if decoder.TakeValueFromJson("k32", dataJson) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    • pprof.Run(): starts pprof
  • The program runs Y3 and JSON decoding over and over in a loop, and the sampling graph of the CPU profile shows what share of CPU resources each one consumes.

  • Run the test:

    # Run the observed code; pprof listens on port 6060 by default
    go run ./cpu_pprof.go
    # Collect samples and view the analysis graph on port 8081
    go tool pprof -http=":8081" http://localhost:6060/debug/cpu/profile
    

2. Test results

(Figure: CPU profile sampling graph)

3. Test analysis

As the figure above shows, YoMo Codec Y3 uses far less CPU for decoding than JSON, by a factor of more than 10 (0.73/0.07 = 10.4). This matches the benchmark results: Y3 uses fewer CPU resources while also decoding faster.

Test conclusion

Y3 improves decoding performance over JSON by an order of magnitude, and the more keys a packet contains, the larger the improvement; Y3 also reduces CPU usage by an order of magnitude. This performance test verifies that YoMo Codec Y3 can provide real-time, efficient and low-overhead message processing for YoMo and for other scenarios that require high-performance decoding.

Copyright notice
This article was written by [Cella]. Please include a link to the original article when reposting. Thank you.