Tips for Visualizing Large Time Series
2022-07-28 11:03:00 【Python数据之道】
Source: kaggle竞赛宝典
Author: 杰少
The Midimax Compression Algorithm
Introduction

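In many time-series problems, for example with financial data, we often need to visualize the series in order to understand it. Financial datasets can be very large, however, so plotting them consumes considerable RAM, disk, and other compute and storage resources. This article introduces a compression algorithm, "Midimax", which reduces the size of the data in order to improve time-series plots. The algorithm was designed with the following goals:
- Do not introduce non-actual data. Only a subset of the original data is returned, so there is no averaging, median interpolation, regression, or statistical aggregation.
- Be fast and computationally cheap.
- Maximize information gain, that is, capture as much of the variation in the original data as possible.
- Because taking only the minimum and maximum points can give a misleading, exaggerated impression of the variance, also take the median point to retain information about the stability of the signal.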
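The last goal can be illustrated with a small sketch. The window below is a made-up example (a stable signal with one spike), and plain Python lists are used instead of pandas to keep the sketch dependency-free. It compares the variance of the original window with the variance of a min/max-only selection and a min/median/max selection.

```python
from statistics import pvariance

# Hypothetical window: a stable signal at 10 with a single spike to 50.
window = [10.0, 10.0, 10.0, 50.0, 10.0, 10.0]
ordered = sorted(window)

# Min/max only: the spike makes up half of the retained points.
min_max = [ordered[0], ordered[-1]]

# Min, median, max: the middle point (position len // 2 in sorted order)
# brings back one more point from the stable level.
min_mid_max = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]

var_original = pvariance(window)
var_min_max = pvariance(min_max)
var_min_mid_max = pvariance(min_mid_max)

print(f"original:       {var_original:.1f}")
print(f"min/max:        {var_min_max:.1f}")
print(f"min/median/max: {var_min_mid_max:.1f}")
```

In this example both selections overstate the variance of the original window, but the selection that includes the middle point overstates it less.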
The Midimax Compression Algorithm

01
Algorithm Pseudocode
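1. Feed the algorithm the time-series data and a compression factor (a float).
2. Split the time series into non-overlapping windows of equal size, where the size is window_size = floor(3 × compression_factor). The 3 stands for the minimum, median, and maximum points taken from each window. A compression factor of 2 therefore requires a window size of 6, and a larger compression factor requires wider windows.
3. Sort the values in each window in ascending order.
4. Take the first and last sorted values as the minimum and maximum points. This ensures that we capture the largest spread and retain information.
5. Pick a point for the middle value, where the middle position is defined as floor(window_size / 2). No interpolation is performed, even when the window size is an even number.
6. Re-sort the selected points by their original index (i.e., timestamp).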
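The steps above can be traced by hand on a single window. The following is a minimal sketch, not the author's implementation: the six (timestamp, value) pairs are invented for illustration, and plain Python tuples stand in for a pandas Series.

```python
import math

compression_factor = 2
window_size = math.floor(3 * compression_factor)  # step 2: 6 points per window

# Hypothetical window of (timestamp, value) pairs; timestamps are simply 0..5.
window = [(0, 4.0), (1, 9.0), (2, 1.0), (3, 7.0), (4, 3.0), (5, 5.0)]
assert len(window) == window_size

# Step 3: sort the window by value, ascending.
by_value = sorted(window, key=lambda point: point[1])

# Step 4: the first and last sorted points are the minimum and maximum.
minimum, maximum = by_value[0], by_value[-1]

# Step 5: the middle point sits at floor(window_size / 2); no interpolation.
middle = by_value[window_size // 2]

# Step 6: restore the original time order.
selected = sorted([minimum, middle, maximum], key=lambda point: point[0])
print(selected)  # [(1, 9.0), (2, 1.0), (5, 5.0)]
```

Note that the interpolated median of these six values would be 4.5, a value that does not occur in the data. The sketch returns 5.0 instead, which is an actual observation, so the output remains a subset of the input.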
02
Case Study

The blue line is the original series;
the green points are the output of the Midimax algorithm.
Code

'''
Code taken from: https://medium.com/towards-data-science/midimax-data-compression-for-large-time-series-data-daf744c89310
'''
import pandas as pd


def compress_series(inputser: pd.Series, compfactor=2):
    """
    Split into segments and pick 3 points from each segment, the minimum,
    median, and maximum. Segment length = int(compfactor x 3). So, to achieve a
    compression factor of 2, a segment length of 6 is needed.

    Parameters
    ----------
    inputser : pd.Series
        Input data to be compressed.
    compfactor : float
        Compression factor. The default is 2.

    Returns
    -------
    pd.Series
        Compressed output series.
    """
    # If comp factor is too low, return original data
    if (compfactor < 1.4):
        return inputser

    win_size = int(3 * compfactor)  # window size

    # Create a column of segment numbers
    ser = inputser.rename('value')
    ser = ser.round(3)
    wdf = ser.to_frame()
    del ser
    start_idxs = wdf.index[range(0, len(wdf), win_size)]
    wdf['win_start'] = 0
    wdf.loc[start_idxs, 'win_start'] = 1
    wdf['win_num'] = wdf['win_start'].cumsum()
    wdf.drop(columns='win_start', inplace=True)
    del win_size, start_idxs

    # For each window, get the indices of min, median, and max
    def get_midimax_idxs(gdf):
        if len(gdf) == 1:
            return [gdf.index[0]]
        elif gdf['value'].iloc[0] == gdf['value'].iloc[-1]:
            # Window is constant (min == max after sorting): keep one point
            return [gdf.index[0]]
        elif len(gdf) == 2:
            return [gdf.index[0], gdf.index[1]]
        else:
            return [gdf.index[0], gdf.index[len(gdf) // 2], gdf.index[-1]]

    wdf = wdf.dropna()
    wdf_sorted = wdf.sort_values(['win_num', 'value'])
    midimax_idxs = wdf_sorted.groupby('win_num').apply(get_midimax_idxs)

    # Convert into a list
    midimax_idxs = [idx for sublist in midimax_idxs for idx in sublist]
    midimax_idxs.sort()
    return inputser.loc[midimax_idxs]

Summary

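Midimax is a simple, lightweight algorithm that reduces the size of the data so that plots can be drawn quickly. We found that:
- Midimax preserves the trend of the original series when plotting large time series. It captures the variation in the original data with fewer points and processes large amounts of data within seconds.
- Midimax loses some detail, and an overly large compression factor may discard much of the information.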
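The trade-off between compression factor and retained detail can be explored with a short sketch. This is a simplified, dependency-free approximation of the selection logic and not the author's code: it works on a plain list, skips the rounding and NaN handling, and keeps three points even for constant windows. The helper name `midimax_indices` and the test signal (a sine wave with one spike) are assumptions made for illustration.

```python
import math


def midimax_indices(values, compression_factor):
    """Return the sorted positions kept by a Midimax-style selection."""
    window_size = int(3 * compression_factor)
    kept = set()
    for start in range(0, len(values), window_size):
        stop = min(start + window_size, len(values))
        # Positions of this window, ordered by the value they hold.
        positions = sorted(range(start, stop), key=lambda i: values[i])
        # Minimum, middle, and maximum; the set removes duplicates
        # for windows shorter than three points.
        kept.update((positions[0], positions[len(positions) // 2], positions[-1]))
    return sorted(kept)


# Hypothetical signal: a slow sine wave with one sharp spike at position 123.
signal = [math.sin(2 * math.pi * i / 200) for i in range(600)]
signal[123] = 5.0

summary = {}
for factor in (2, 10, 50):
    kept = midimax_indices(signal, factor)
    summary[factor] = (len(kept), 123 in kept)
    print(f"factor {factor:>2}: kept {len(kept):>3} of {len(signal)} points, "
          f"spike kept: {123 in kept}")
```

In this sketch the spike survives at every factor because it is the maximum of its window, while the number of points describing the rest of the curve drops from 300 to 12 as the factor grows from 2 to 50.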
References

1. https://github.com/edwinsutrisno/midimax_compression
2. Midimax Compression for Large Time-Series Data