Sharing a powerful tool for factor mining: genetic programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited and eventually runs out. From public material such as research reports or papers? Factors found this way inevitably become crowded, because an effective factor that anyone can read about is one other people can use too.
So is there another way? Yes.
Today, drawing on the research report 《 Stock selection factor mining based on genetic programming in artificial intelligence series 》, we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming ?
Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a random population of formulas and, by simulating genetic evolution in nature, gradually produces a population of formulas that fits a specific objective. As a supervised learning method, genetic programming can discover, for a given objective, hidden mathematical formulas that are hard for a human to construct. Whereas traditional supervised learning algorithms are mainly used to fit the relationship between features and labels, genetic programming is applied more to feature mining (feature engineering).
——《 Stock selection factor mining based on genetic programming in artificial intelligence series analysis report 》
Traditional factor research works "logic first, formula second", which is a deductive approach. Genetic programming works "formula first, logic second", which is an inductive approach. Its advantage is that it puts the computer's computing power to work on a heuristic search and is not bound by the limits of human thinking, so it can dig out hidden factors that are hard for a person to construct and open up more possibilities for factor research.
Genetic evolution in organisms involves the inheritance of genes, mutation, and adaptation to the environment. Genetic programming has direct counterparts: crossover, subtree mutation, point mutation, hoist mutation, and fitness. For the details, see the research report or the underlying papers.
We use gplearn, a Python genetic programming package, for factor mining. The main parameters of the model are as follows:
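To make these operators concrete, here is a minimal sketch on toy formula trees. The tuple representation, the helper names (`subtrees`, `replace_at`, `random_tree`) and the small function set are assumptions made for illustration; this is not gplearn's internal implementation.

```python
import random

# Assumed toy representation: a formula is either a terminal name (str)
# or a tuple (function_name, child_1, ..., child_k).
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2, "div": 2, "log": 1, "sqrt": 1}
TERMINALS = ["open", "close", "high", "low", "volume"]


def subtrees(tree, path=()):
    """Yield (path, node) for every node; path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from subtrees(tree[i], path + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def random_tree(rng, depth=2):
    """Grow a random formula of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name]))


def crossover(parent, donor, rng):
    """Replace a random subtree of parent with a random subtree of donor."""
    path, _ = rng.choice(list(subtrees(parent)))
    _, graft = rng.choice(list(subtrees(donor)))
    return replace_at(parent, path, graft)


def subtree_mutation(parent, rng):
    """Crossover against a freshly generated random formula."""
    return crossover(parent, random_tree(rng), rng)


def point_mutation(parent, rng):
    """Swap one node for another node of the same arity."""
    path, node = rng.choice(list(subtrees(parent)))
    if isinstance(node, tuple):
        arity = FUNCTIONS[node[0]]
        name = rng.choice(sorted(f for f, a in FUNCTIONS.items() if a == arity))
        return replace_at(parent, path, (name,) + node[1:])
    return replace_at(parent, path, rng.choice(TERMINALS))


def hoist_mutation(parent, rng):
    """Pick a subtree, then lift one of its own subtrees into its place."""
    path, sub = rng.choice(list(subtrees(parent)))
    _, inner = rng.choice(list(subtrees(sub)))
    return replace_at(parent, path, inner)


rng = random.Random(0)
a = ("div", ("log", "close"), ("log", "volume"))
b = ("add", "open", ("sqrt", "high"))
print(crossover(a, b, rng))
print(subtree_mutation(a, rng))
print(point_mutation(a, rng))
print(hoist_mutation(a, rng))
```

Point mutation keeps the shape of the formula, hoist mutation can only shrink it, and crossover and subtree mutation can grow or shrink it.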

The data used in the model are as follows:
- Test instrument: the Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: open, close, high, low, volume, return, volume-weighted average price (VWAP)
- Prediction target: the return over the next 5 days
- Function set: all gplearn built-in functions
Once the data is ready, we can start training the model:
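As a rough illustration of the prediction target, the sketch below builds a forward 5-day return from a list of closing prices. It assumes the return is measured close to close, which the article does not state, and the function names and sample prices are made up for the example.

```python
def forward_return(close, horizon=5):
    """close[t + horizon] / close[t] - 1, or None where the future price is unknown."""
    n = len(close)
    return [
        close[t + horizon] / close[t] - 1 if t + horizon < n else None
        for t in range(n)
    ]


def daily_return(close):
    """Close-to-close return of each day; the first day has no previous close."""
    return [None] + [close[t] / close[t - 1] - 1 for t in range(1, len(close))]


# Made-up closing prices, for illustration only.
close = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 111.0]
label = forward_return(close, horizon=5)
print(label)
print(daily_return(close))
```

Rows whose label is None (the last 5 days) have no known future return and would need to be dropped before training.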
from gplearn.genetic import SymbolicTransformer
# function_set, train and label are assumed to have been prepared beforehand
gp1 = SymbolicTransformer(generations=10, population_size=1000, function_set=function_set,
    init_depth=(1, 4), tournament_size=20, metric='spearman', p_crossover=0.4,
    p_subtree_mutation=0.01, p_hoist_mutation=0, p_point_mutation=0.01, p_point_replace=0.40,
    warm_start=False, verbose=1, random_state=0, n_jobs=-1,
    feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'])
...
gp1.fit(train, label)  # train the model
During training the model prints a progress log. The Fitness column is the fitness; here we chose the Spearman rank correlation coefficient, so a higher value means the factor is more strongly rank-correlated with the return over the next 5 days.
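For reference, the Spearman rank correlation is the Pearson correlation computed on ranks rather than raw values. A minimal pure-Python sketch follows; the helper names `ranks`, `pearson` and `spearman` are illustrative, and gplearn's own implementation may differ in details such as sample weighting.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up factor values and future returns, for illustration only.
factor = [0.42, 0.41, 0.43, 0.40, 0.44]
future_return = [0.010, -0.020, 0.015, -0.030, 0.050]
print(spearman(factor, future_return))
```

Because only the ordering matters, any monotonic transformation of the factor leaves the coefficient unchanged.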

We can also plot how the best factor evolves over the generations:

As the figure above shows, by roughly the fourth generation (on the x-axis, 0 is the first generation) the rank correlation of the best factor has already reached a high level, and later generations improve it only slightly.
Finally, a tree diagram shows the best factor the model arrived at:

Written as a formula, it is log(close) / log(volume). Looking at the model's top ten factors:

We find many duplicates in the model's output. After removing them, only two factors remain: log(close) / log(volume) and log(volume) / log(close).
In fact these two are the same factor, since one is just the reciprocal of the other. Take log(close) / log(volume): it takes the logarithm of the closing price and of the volume separately and then divides one by the other, which can be read as the closing price weighted by the reciprocal of the volume. Interested readers can test the performance of this factor further, or mine factors for other indexes or for commodity futures.
This article is only a first look at genetic programming, and a good deal remains unsolved. The functions used here are all gplearn built-ins: how can the function set be extended, especially with time-series functions such as a 5-day historical mean? The test covers a single instrument: how can it be extended to multiple instruments, and how should the resulting 3D data be handled? All of these still need to be solved.
A more advanced installment on genetic programming will follow, taking factor mining further!
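A quick check of why the two formulas carry the same information: when both logarithms are positive, taking the reciprocal reverses the ordering of the values exactly, so the rank correlation with any target has the same magnitude and the opposite sign. The prices and volumes below are made-up numbers, and `order` is a helper introduced only for this illustration.

```python
import math

# Made-up closing prices and volumes, for illustration only.
close = [3100.0, 3150.5, 3080.2, 3200.0, 3175.3]
volume = [2.1e8, 2.6e8, 1.9e8, 3.0e8, 2.4e8]

factor = [math.log(c) / math.log(v) for c, v in zip(close, volume)]
inverse = [math.log(v) / math.log(c) for c, v in zip(close, volume)]


def order(values):
    """Indices of the observations sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


print(order(factor))
print(order(inverse))  # the same indices in reverse
```

If the fitness is taken as the absolute value of the rank correlation, which is an assumption about the metric here, both formulas receive the same score, which would explain why both appear among the top factors.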