Easily handling ten years of futures and stock market data: TDengine at Tongxinyuan Fund
2022-07-05 03:16:00 【songroom】
Tongxinyuan (Sanya) Fund, Liu Jian, Dec 08, 2021 / Categories: Chinese, User stories, Top recommendation
Xiao T's introduction: Tongxinyuan (Sanya) Fund Management Co., Ltd. is a private equity firm committed to investing in the secondary market using scientific methods. Its team members come from leading universities in China and abroad. The founder holds a PhD in computer science and has many years of experience in algorithm research and software system development.
Given our business model, our staff mainly discover trading patterns in the market through data mining and automatic pattern recognition. Our work therefore rests on large volumes of financial data, which fall mainly into the following categories:
1) Real-time high-frequency data from the domestic futures market, including tick-by-tick data
2) Historical high-frequency data from the domestic futures market, including tick-by-tick data
3) High-frequency data from the domestic stock market, including tick-by-tick data
4) Historical high-frequency data from the domestic stock market, including tick-by-tick data
5) Derived data, at an even larger order of magnitude, generated from the data above
After years of development, the stock market has accumulated a huge amount of data, and the total keeps growing as new data is cleaned and written every day. At a volume of more than a dozen TB, storage alone is not easy, and querying and downloading the data is harder still. Faced with these problems, we lost confidence in the mainstream databases on the market.
Later, on the recommendation of industry professionals, we tried TDengine. We did not expect it to fit our current business so easily.
Implementation and results
After selecting the database we started building right away, deploying version 2.1.3.2, the latest at the time. The databases for the different data types are as follows:
1) Stock high-frequency database, containing historical stock market data plus daily new data:
This data is batch-imported through the Python connector every day after the market closes and is then analyzed. Each table represents one stock and has 85 columns, mostly of Float type. There are 32,311 tables in total.
Based on this table structure, each row is currently about 408 bytes long. We then used a script to query the row count of every table, which came to roughly 32 billion rows.
From these figures, the total raw data volume is estimated at 408 bytes × 32 billion rows, or about 12 TB. The actual disk space occupied turned out to be only about 2 TB, which shocked us: the data compresses to 16.7% of its raw size.
As is well known, compressing Float data has long been a hard problem for databases, especially row-oriented ones. We are very grateful for TDengine's columnar storage, which solved this thorny problem for us.
We later learned from the TDengine team that later versions further optimize the compression algorithm for floating-point data, which can improve the compression ratio considerably. For now this has to be compiled in manually; contact the TDengine team for the specific steps.
2) Futures databases:
The futures databases are deployed on a separate server. There are three: the futures high-frequency database, the futures X-frequency database, and the futures Y-frequency database. They hold the high-frequency data of all domestic futures and its aggregates at different time frequencies:
Futures high-frequency database: records the tick data sent by the exchange in real time
Futures X-frequency database: built on time period X, records the aggregated data
Futures Y-frequency database: built on time period Y, records the aggregated data
These three databases contain 3,351, 5,315, and 5,208 sub-tables respectively. Like the stock database, they hold both long-term historical data and real-time data.
The specific table structure is as follows :
As for queries, we currently query only single tables, so the logic is simple. The code is as follows:
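As an illustration only (this is not the authors' code), a single-table time-range query through the Python connector might look like the sketch below. The database name `futures_hf`, the table name `rm2201`, and the timestamp column name `ts` are all hypothetical, and the connection uses the connector's default credentials.

```python
from datetime import datetime


def build_range_query(database: str, table: str, start: datetime, end: datetime) -> str:
    """Build a time-range query for one table (all names are hypothetical)."""
    for name in (database, table):
        # Reject anything that is not a plain identifier before formatting it into SQL.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"SELECT * FROM {database}.{table} "
        f"WHERE ts >= '{start.strftime(fmt)}' AND ts < '{end.strftime(fmt)}'"
    )


def fetch_rows(sql: str, host: str = "localhost"):
    """Run the query through the TDengine Python connector, if it is installed."""
    import taos  # imported lazily so the query builder works without the connector

    conn = taos.connect(host=host)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


sql = build_range_query("futures_hf", "rm2201", datetime(2021, 9, 1), datetime(2021, 11, 1))
print(sql)
```

Calling `fetch_rows(sql)` on a machine with the connector installed and a reachable server would return the matching rows. The half-open range (`>=` start, `<` end) avoids counting a boundary row twice when adjacent ranges are queried.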
In addition, because a futures contract does not have a continuous multi-year price history, for long-term data display we splice together multiple segments of X months each. Queries are very fast. For example, using Python on the TDengine client machine to pull two consecutive months of futures market data from the server takes 0.16 seconds.
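The splicing step can be sketched as follows. The article does not describe the exact splicing rule, so this is only one plausible approach, under the assumption that each segment is a list of (timestamp, price) rows and that overlaps between neighbouring segments are resolved in favour of the earlier segment. The row shape and the function name are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, float]  # (timestamp in ms, price); a simplified, hypothetical row shape


def splice_segments(segments: List[List[Row]]) -> List[Row]:
    """Concatenate per-contract segments into one chronological series.

    Segments are ordered by their first timestamp. A row is kept only if it
    is strictly later than the last row already kept, so overlapping rows
    come from the earlier segment.
    """
    spliced: List[Row] = []
    non_empty = [sorted(seg) for seg in segments if seg]
    for seg in sorted(non_empty, key=lambda s: s[0][0]):
        for ts, price in seg:
            if not spliced or ts > spliced[-1][0]:
                spliced.append((ts, price))
    return spliced


seg_a = [(1, 10.0), (2, 10.5), (3, 10.2)]
seg_b = [(3, 11.0), (4, 11.3)]
series = splice_segments([seg_b, seg_a])
print(series)
```

In the example, both segments contain timestamp 3; the row from `seg_a` is kept and the one from `seg_b` is dropped, so the spliced series has four rows.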
The following figure shows the return curve of factor 1 on rapeseed meal futures. As the figure also suggests, other commonly used functions such as max and last return data within milliseconds as well, thanks to TDengine's caching and other techniques.
From "two questions" to in-depth cooperation
Careful readers may have noticed two small questions raised by the article:
Why did we estimate the raw data volume by counting the rows of all sub-tables with a script and multiplying by the byte length of a single row, rather than going directly through TDengine's super table?
Why, among the data categories listed at the beginning of the article, do items 1-4 have corresponding databases in the text that follows, while item 5 (the larger volume of derived data generated from the data above) does not?
The fact is that at the start of the project we had no need for multi-table aggregate queries. To also reduce the complexity of data migration, we did not adopt super tables when first building the environment.
As the business matures, however, we will need more data for more complex analysis, which brings us to item 5: the larger volume of derived data generated from the data above. This data will come from business lines we plan to launch later.
At that point we will make deeper use of TDengine's other core features, such as super tables and its many computation functions. For now, TDengine's strong storage capability and fast queries have already pleasantly surprised us, and we look forward to further cooperation in the future.
About the author:
Liu Jian holds a master's degree in pattern recognition from Beijing University of Aeronautics and Astronautics and previously worked in software research and development at China Aerospace Science and Technology Group. In 2014 he started a business with friends and has since worked on automated trading of foreign exchange, futures, and stock ETFs. He focuses on automated quantitative trading in the domestic secondary market by means of data mining and automatic pattern recognition.