当前位置:网站首页>Techniques for visualizing large time series.
Techniques for visualizing large time series.
2022-07-28 11:44:00 【The way of Python data】
source :kaggle Competition book
dried food
author : Devo
MidiMax Compression algorithm
brief introduction

In many time series problems , For example, financial time series data , We often need to visualize it so that we can understand the data , But we all know that the financial data is very huge , So if visualization is needed, it will cost more RAM, Disk and other computing storage resources , In this article, we introduce a compression algorithm “Midimax”, This algorithm will improve the effect of time series diagram by compressing the data size . The design of this algorithm has the following goals :
Do not introduce non actual data . Only a subset of the original data is returned , So there is no average 、 Median interpolation 、 Regression and statistical aggregation ;
Fast and less computation ;
It should maximize information gain . This means that it should capture as many changes in the original data as possible ;
Taking the minimum and maximum points may give the wrong view of exaggerating variance , Therefore, the median point is taken to retain information about signal stability .
Midimax Compression algorithm

01
Algorithm pseudocode
Input time series data and compression coefficient to the algorithm ( Floating point numbers ).
Split the time series data into non overlapping windows of equal size , Where the size is calculated as :( Compressibility factor ).3 Represents the minimum obtained from each window 、 Median and maximum . therefore , To achieve 2 Compression factor of , The window size must be 6. A larger compression ratio requires a wider window .
Sort the values in each window in ascending order .
Select the first and last values of the minimum and maximum points . This will ensure that we maximize differences and retain information .
Choose an intermediate value for the intermediate value , The middle position is defined as (). therefore , Even if the window size is uniform , No interpolation .
According to the original index ( Timestamp ) Reorder the selected points .
02
The case shows

Blue is the original picture ;
The green dot is Midimax The graph given by the algorithm .
Code

'''
Code from :https://medium.com/towards-data-science/midimax-data-compression-for-large-time-series-data-daf744c89310
'''
import pandas as pd
def compress_series(inputser: pd.Series, compfactor=2):
"""
Split into segments and pick 3 points from each segment, the minimum,
median, and maximum. Segment length = int(compfactor x 3). So, to achieve a
compression factor of 2, a segment length of 6 is needed.
Parameters
----------
inputser : pd.Series
Input data to be compressed.
compfactor : float
Compression factor. The default is 2.
Returns
-------
pd.Series
Compressed output series.
"""
# If comp factor is too low, return original data
if (compfactor < 1.4):
return inputser
win_size = int(3 * compfactor) # window size
# Create a column ofsegment numbers
ser = inputser.rename('value')
ser = ser.round(3)
wdf = ser.to_frame()
del ser
start_idxs = wdf.index[range(0, len(wdf), win_size)]
wdf['win_start'] = 0
wdf.loc[start_idxs, 'win_start'] = 1
wdf['win_num'] = wdf['win_start'].cumsum()
wdf.drop(columns='win_start', inplace=True)
del win_size, start_idxs
# For each window, get the indices of min, median, and max
def get_midimax_idxs(gdf):
if len(gdf) == 1:
return [gdf.index[0]]
elif gdf['value'].iloc[0] == gdf['value'].iloc[-1]:
return [gdf.index[0]]
elif len(gdf) == 2:
return [gdf.index[0], gdf.index[1]]
else:
return [gdf.index[0], gdf.index[len(gdf) // 2], gdf.index[-1]]
wdf = wdf.dropna()
wdf_sorted = wdf.sort_values(['win_num', 'value'])
midimax_idxs = wdf_sorted.groupby('win_num').apply(get_midimax_idxs)
# Convert into a list
midimax_idxs = [idx for sublist in midimax_idxs for idx in sublist]
midimax_idxs.sort()
return inputser.loc[midimax_idxs]Summary

Midimax It is a simple and lightweight Algorithm , It can reduce the size of data , And carry out fast graphic drawing , We found that :
Midimax When drawing a large sequence diagram, the trend of the original sequence can be preserved ; Fewer points can be used to capture changes in the original data , And process a large amount of data in a few seconds .
Midimax Some details will be lost ; If the compression is too large, more information may be lost .
reference

1. https://github.com/edwinsutrisno/midimax_compression
2. Midimax Compression for Large Time-Series Data
-------- End --------

Selected content
The illustration Pandas- Image & Text 01- Data structure introduction
The illustration Pandas- Image & Text 02- Create data objects
The illustration Pandas- Image & Text 03- Read and store Excel file
The illustration Pandas- Image & Text 04- Common data access
The illustration Pandas- Image & Text 05- Common data operations
The illustration Pandas- Image & Text 06- Common mathematical calculations
The illustration Pandas- Image & Text 08- Common data filtering


边栏推荐
- AlexNet—论文分析及复现
- 【cesium】entity属性和时许绑定:SampledProperty方法简单使用
- Four advantages of verification code to ensure mailbox security
- Server online speed measurement system source code
- Install SSL Certificate in Litespeed web server
- Good use explosion! The idea version of postman has been released, and its functions are really powerful
- Zotero document manager and its use posture (updated from time to time)
- What's the secret of creating a popular short video?
- Digital twin rail transit: "intelligent" monitoring to clear the pain points of urban operation
- Database advanced learning notes -- object type
猜你喜欢

What functions does MySQL have? Don't look everywhere. Just look at this.

移动端人脸风格化技术的应用

Design a system that supports millions of users

A lock faster than read-write lock. Don't get to know it quickly

DHCP experiment demonstration (Huawei switch device configuration)

Six papers that must be seen in the field of target detection

数字孪生轨道交通:“智慧化”监控疏通城市运行痛点
![[general database integrated development environment] Shanghai daoning provides you with Aqua Data Studio downloads, tutorials, and trials](/img/46/830b7703ae7cbfa6051137061560c2.png)
[general database integrated development environment] Shanghai daoning provides you with Aqua Data Studio downloads, tutorials, and trials

Byte side: how to realize reliable transmission with UDP?

Are interviews all about memorizing answers?
随机推荐
WinForm generates random verification code
CVPR2020 best paper:对称可变形三维物体的无监督学习
Autumn recruit offer harvesters, and take the offers of major manufacturers at will
postgres概述
Solutions to slow start of MATLAB
Four advantages of verification code to ensure mailbox security
MySQL (version 8.0.16) command and description
迭代法求行列式(线性代数公式)
什么是WordPress
mysql的左连接和右连接(内连接和自然连接的区别)
Software testing and quality learning notes 1 --- black box testing
[MySQL] MySQL error "error 2006 (HY000): MySQL server has gone away"
Design a system that supports millions of users
Five Ali technical experts have been offered. How many interview questions can you answer
Sirius network verification source code / official genuine / included building tutorial
Byte side: how to realize reliable transmission with UDP?
What is WordPress
vim命令下显示行号[通俗易懂]
Why does MySQL sometimes choose the wrong index?
【cesium】entity属性和时许绑定:SampledProperty方法简单使用