当前位置:网站首页>I'm afraid that the spread sequence calculation of arbitrage strategy is not as simple as you think
I'm afraid that the spread sequence calculation of arbitrage strategy is not as simple as you think
2020-11-06 01:18:00 【itread01】
More exciting content , Welcome to the public account : Quantity technology house . Want to get the complete policy code shared in this issue , Please add wechat technology house :sljsz01
The price difference is calculated “ Misunderstandings ”
When we test the strategic signals generated by the mutual operation of two or more financial assets , Inevitably need to involve different price time series , Align with the timeline , Arbitrage is one of them . However , Most of them introduce arbitrage strategies 、 Statistical arbitrage articles , For the generating calculation of price difference series , It's very simple to handle , It's basically the subtraction of two time series . For lower frequency signals , This is not a big deal , But in the field of medium and high frequency signals , Direct subtraction , There will be some problems .
This is because , For different asset price series , There is an exchange push time 、 And the difference in arrival time . Even if we see two back testing Tick The timestamp of is exactly the same , When the real offer server receives the push quotation , Also according to the first 、 In the latter order . We found in the actual transaction that , For example, Shanghai futures exchange for a variety of different delivery month contracts , The push of stock exchange in slicing data is not simultaneous , It's pushed in the order of delivery months , For example, according to RB2010、RB2101、RB2015, Push in this order , The same is true for other varieties , And for the same 500ms Within the slicing time of , received RB2010、RB2101、RB2015 Of Tick The timestamp of the data , It's the same .
Another example is the cross exchange arbitrage of digital currency , Even if the two exchanges transmit at the same time Tick Information , Due to the physical location of the exchange server, the transmission time is different , The probability of arriving at our strategy signal computing server will be different .
A typical example of different frequency of price arrival
If the market information arrival time has priority , There will be a certain amount of price difference calculated by direct subtraction “ Lag ” or “ The future function ” The question is , The frequency of price arrival is different , Then we can't directly subtract the price difference . All in all , We need a more realistic spread calculation method .
Let's look at an example of different frequency of price arrival , In other words, the push frequency of the two varieties is different . If we need stock index futures 、 Stocks ETF Carry out the design of arbitrage strategy , With IC And Zhongzheng 500ETF For example , Calculate the spread of cash arbitrage .
IC Stock index futures Tick Information , Our sources are Wind,IC The corresponding CICC , Its market push frequency is every 1 second 2 Pen data ,Level1 Free market push is 1 The mouth of the gear plate , That is to say, only buy 1、 Sell 1 Information about , Data time is the trading time of stock index futures :9:29-15:00. Let's take a look at IC Of Tick Sample data .
Let's take a look at the Chinese securities certificate 500ETF Information about , It also comes from Wind,500ETF Market information push frequency comparison IC Much lower , Every time 3 There will be 1 Pen data ,Level1 Free market has 5 Plate mouth of gear , Buy it now 1 To buy 5、 Sell 1 To sell 5, Data push time :9:15-15:00, Include the call auction period of the stock . Let's take a look at 500ETF Of Tick Sample data .
Skillfully use Pandas Of Merge Function
For this push frequency is different 、 There are also different data on the timeline , Calculate the spread , We need to synthesize according to the time axis .Python Pandas Ku's Merge Function , Exactly what we need . Let's briefly introduce Merge Function .
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,left_index=False, right_index=False, sort=True,suffixes=('x', 'y'), copy=True, indicator=False,validate=None)
When we do data synthesis , The most commonly used one is before 4 Group arguments :
left: The left side of the splice DataFrame thing
right: Right side of the splice DataFrame thing
on: The name of the column or index level to be added . It has to be on the left and right DataFrame Found in the object , For financial time series , Generally speaking, it's a timeline
how: One of ‘left’, ‘right’, ‘outer’, ‘inner’, Presupposition inner.inner It's the intersection ,outer Take Union . such as left:[‘A’,‘B’,‘C’];right[’'A,‘C’,‘D’];inner If you take the intersection ,left That's what happened A Will be with right Buy one A Match and splice , If not, it's B, stay right There is no match in , You lose .'outer’ Take Union , What appears is A It will match one by one , If not, the missing value will be added to the missing part .
And this 4 Group arguments , The preprocessing of arbitrage spread calculation ,how Fields are the most important . We use actual data , Look at the difference how The value of the field , Will calculate the final price difference , What kind of impact .
First ,how = “inner”, Take the intersection of the timeline , There are only two tables DATETIME All of us have time , Will appear in the final summary table . Let's show you the calculated summary table , And calculate the price difference sequence and draw .
secondly ,how = “outer”, Take the union of time axes , Just two watches DATETIME Make any list. Some time , It's going to show up in the final matrix , If the other table has no data , Press nan Value padding .
Because of outer The way data is processed , There is a lot of nana value , We can't calculate the spread directly , The usual processing method is to forward fill null data , About to nan Values are filled with the nearest non null value , Recalculate the future ( Middle price ) The price difference , And draw .
Again ,how = 'left', Merge according to the left time axis . Click on the left table (IC) The timeline matches the right table one by one , The time axis of the left table is reserved , The right table has the time of , It is incorporated into the general table , There is no such time in the right table , With nan Instead of .
Also need to forward fill null data , Then we can calculate the future ( Middle price ) The price difference .
Finally ,how = 'right', Merge according to the time axis in the right table . Click on the right table (500ETF) The timeline of is matched with the left table one by one , The timeline in the right table is reserved , The left watch has the time of , It is incorporated into the general table , The left watch doesn't change the time , With nan Instead of .
Due to the frequency of futures information compared to stocks ETF Higher ,nan It mainly appears in the earlier stage of stock call auction than futures , This part nan Data may be deleted as appropriate .
We combine the charts drawn by different price spread calculation methods , You can see , Top left how="inner" The picture of , Points are the most sparse , Because both prices need to be available at the same time , To calculate the price difference ; And the top right how="outer" The picture of , Price differentials are the most intensive , Just one set of price changes , It will calculate 1 Second price difference , And the two pictures below how="left"、how="right", The intensity is in between .
The price difference is calculated in different ways , Bring about the difference in the way of strategy driven
The price difference is calculated in different ways , On the face of it Merge Function selection how The arguments to are different , The resulting price difference series results are different . But it's different how The choice of arguments , In fact, there are different strategies behind it 、 Strategic logic .
No matter in the backtesting of strategy , Treat market information , All need to adopt one kind of “ Event driven ” The way to test , This is the most close to the back test of real trading . Let's assume that historical data is also like a firm offer , Every time a new data is generated , Give it to us once , And every time we get a new piece of information , It's a new event , This event drives the subsequent strategy signal calculation , And the judgment of the opening and closing conditions corresponding to the signal .
Let's go back to different ways of calculating the spread , Its corresponding , In fact, it's the different driving strategies .
how=‘outer’: The corresponding is futures 、 The concurrent driving of stock market , As long as there are stocks 、 Any update of futures information , Our program updates the spread , To determine whether a trade signal is triggered , Now the signal is calculated and triggered , Most frequently .
how = 'left': The corresponding is the futures market single drive , That is, we don't care whether the stock market reaches , As long as the futures information is updated , The stock price spread is calculated by combining the latest information stored , And determine whether to trigger a trading signal .
how = 'right': A single drive corresponding to the stock market , That is, we don't care whether the futures market reaches or not , As long as the stock information is updated , Futures combine the latest information stored to calculate the spread , And determine whether to trigger a trading signal ,left and right How to trigger , The signal is not as good as outer Frequently .
how=‘inner’: The corresponding is futures 、 The stock market is driven both ways , We are generally back testing 、 We don't use this way in any firm offer , In the first section of this article , I introduced to you , It is basically impossible for the market to arrive at the same time , This drive is too idealistic , It will also reduce a lot of trading opportunities .
The driving mode that the firm offer should choose
To sum up , We're back testing 、 The alternative way to trade , It can be divided into two categories : Concurrent driver of two-way market 、 One way market driving . So , These two kinds of different driving methods , How to choose ?
The author according to the statistical arbitrage strategy of real trading experience , Here are some suggestions :
-
Two types of assets that calculate the spread , There is a clear distinction of activity 、 Subordination : For example, the near and far months of Futures ( The trading activity of contracts in recent months is usually greater than that in distant months )、 Futures arbitrage of stocks and stock index futures ( Stock index futures have a price discovery effect on stock spot ) etc. , This should be a time of active trading 、 Varieties with leading role , As the main driver , Driven by a single market .
-
Two types of assets that calculate the spread , There is no clear distinction 、 Subordination : For example, cross exchange arbitrage of digital currency (OKEX、 Arbitrage between fire currency exchanges , It's quite active , The relationship is equal ), Can adopt the concurrent drive of two-way market , To capture more trading opportunities .
-
Once the drive mode is determined , In data merging 、 Back testing 、 And the development of the real offer trading system , All need to be driven in the same way , In order to ensure the consistency between the back test results and the real offer transaction to the maximum extent .
If you share this Python Code is interested in , Welcome to add wechat :sljsz01, Communicate with me
Previous dry goods sharing recommended reading
【 Quantity technology house | Quantitative investment strategy series share 】 Futures position following strategy of mature traders
How to get free digital currency history information
【 Quantity technology house | Quantitative investment strategy series share 】 Multi period resonant trading strategy
【 Quantity technology house | Financial data analysis series sharing 】 Why is it that the evidence is in evidence 500(IC) Is the most suitable index for long-term long
It's not easy to get the spot information ? The seasonality of goods is hard to track ? One click solution to the trouble free Python Crawler sharing
【 Quantity technology house | Financial data analysis series sharing 】 How to copy bottom commodity futures correctly 、 Commodities
【 Quantity technology house | Quantitative investment strategy series share 】 Stock index futures IF Minute volatility statistics strategy
【 Quantity technology house | Python Crawler series sharing 】 Real time monitoring of major stock market announcements Python
版权声明
本文为[itread01]所创,转载请带上原文链接,感谢
边栏推荐
- htmlcss
- 怎么理解Python迭代器与生成器?
- Skywalking series blog 2-skywalking using
- Just now, I popularized two unique skills of login to Xuemei
- Process analysis of Python authentication mechanism based on JWT
- Programmer introspection checklist
- C++和C++程序员快要被市场淘汰了
- PLC模拟量输入和数字量输入是什么
- Query意图识别分析
- Microservices: how to solve the problem of link tracing
猜你喜欢
容联完成1.25亿美元F轮融资
Using Es5 to realize the class of ES6
连肝三个通宵,JVM77道高频面试题详细分析,就这?
It's so embarrassing, fans broke ten thousand, used for a year!
PN8162 20W PD快充芯片,PD快充充电器方案
How long does it take you to work out an object-oriented programming interview question from Ali school?
DevOps是什么
Network security engineer Demo: the original * * is to get your computer administrator rights! 【***】
2018中国云厂商TOP5:阿里云、腾讯云、AWS、电信、联通 ...
In order to save money, I learned PHP in one day!
随机推荐
你的财务报告该换个高级的套路了——财务分析驾驶舱
Examples of unconventional aggregation
数字城市响应相关国家政策大力发展数字孪生平台的建设
(1) ASP.NET Introduction to core3.1 Ocelot
OPTIMIZER_ Trace details
向北京集结!OpenI/O 2020启智开发者大会进入倒计时
Chainlink将美国选举结果带入区块链 - Everipedia
Polkadot series (2) -- detailed explanation of mixed consensus
GUI 引擎评价指标
2018中国云厂商TOP5:阿里云、腾讯云、AWS、电信、联通 ...
Save the file directly to Google drive and download it back ten times faster
Deep understanding of common methods of JS array
“颜值经济”的野望:华熙生物净利率六连降,收购案遭上交所问询
Asp.Net Core學習筆記:入門篇
Programmer introspection checklist
PHP应用对接Justswap专用开发包【JustSwap.PHP】
After reading this article, I understand a lot of webpack scaffolding
制造和新的自动化技术是什么?
Introduction to Google software testing
Flink on paasta: yelp's new stream processing platform running on kubernetes