Sharing a powerful tool for factor mining: genetic programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited and will eventually run out. From public material such as research reports or papers? Factors found this way inevitably become crowded: an effective factor that is publicly known can be used by everyone else too.
So is there another way? Yes.
Today, drawing on the research report "Artificial Intelligence Series: Stock Selection Factor Mining Based on Genetic Programming", we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming?
Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a randomly generated population of formulas and, by simulating the process of genetic evolution in nature, gradually produces populations of formulas that fit a specific objective. As a supervised learning method, genetic programming can, given a specific objective, discover hidden mathematical formulas that are hard for a human to construct. Traditional supervised learning algorithms are mainly used to fit the relationship between features and labels, whereas genetic programming is applied more to feature mining (feature engineering).
(Quoted from the research report "Artificial Intelligence Series: Stock Selection Factor Mining Based on Genetic Programming")
Traditional factor research follows "logic first, formula second", which is a deductive approach. Genetic programming instead works "formula first, logic second", which is an inductive approach. Its advantage is that it uses the computer's computing power to run a heuristic search and is not bound by the limits of human thinking, so it can uncover hidden factors that are hard for a human to construct and open up more possibilities for factor research.
Genetic evolution in organisms involves the inheritance of genes, mutation, and adaptation to the environment. Genetic programming has the corresponding notions: crossover, subtree mutation, point mutation, hoist mutation, and fitness. For details, refer to the research report or the relevant papers.
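To make these operators concrete, here is a minimal pure-Python sketch. It is not gplearn's implementation: it assumes a formula is stored as a nested tuple, and every name in it (`FUNCTIONS`, `evaluate`, `random_tree`, `crossover`, `subtree_mutation`) is hypothetical. The protected `div` and `log` follow the common convention of returning a constant when the input is close to zero.

```python
import math
import random

# A formula is a nested tuple (function_name, child, ...) or a feature name (leaf).
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    # Protected division and log avoid errors on zero or negative inputs.
    "div": (2, lambda a, b: a / b if abs(b) > 1e-3 else 1.0),
    "log": (1, lambda a: math.log(abs(a)) if abs(a) > 1e-3 else 0.0),
}
FEATURES = ["close", "volume"]


def evaluate(tree, row):
    """Evaluate a formula tree on one row of feature values."""
    if isinstance(tree, str):
        return row[tree]
    name, *children = tree
    _, func = FUNCTIONS[name]
    return func(*(evaluate(child, row) for child in children))


def random_tree(rng, depth):
    """Grow a random formula with at most `depth` nested function calls."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(FEATURES)
    name = rng.choice(sorted(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name, *(random_tree(rng, depth - 1) for _ in range(arity)))


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def crossover(parent, donor, rng):
    """Replace a random subtree of `parent` with a random subtree of `donor`."""
    path, _ = rng.choice(list(subtrees(parent)))
    _, graft = rng.choice(list(subtrees(donor)))
    return replace_at(parent, path, graft)


def subtree_mutation(parent, rng, depth=2):
    """Crossover against a freshly generated random formula."""
    return crossover(parent, random_tree(rng, depth), rng)


rng = random.Random(0)
factor = ("div", ("log", "close"), ("log", "volume"))
row = {"close": 100.0, "volume": 1_000_000.0}  # made-up values
print(evaluate(factor, row))          # log(100) / log(1e6), i.e. about 1/3
print(subtree_mutation(factor, rng))  # a mutated copy; the parent is unchanged
```

Using immutable tuples means every operator returns a new formula, so parents can be reused safely across a generation.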
We use gplearn, a Python genetic programming package, for factor mining. The main parameters of the model are as follows:

The data used in the model are as follows:
- Test instrument: Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: open, close, high, low, volume, return, volume-weighted average price (VWAP)
- Prediction target: return over the next 5 days
- Function set: all gplearn built-in functions
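The article does not show how the prediction target was built. As a sketch under that caveat, the label for each day could be computed from closing prices as below, taking "5 days" to mean 5 rows of daily data. The helper names `forward_return` and `align` are hypothetical, and the prices are made up for illustration.

```python
def forward_return(close, horizon=5):
    """Return close[t + horizon] / close[t] - 1 per row, or None when the future price is unknown."""
    n = len(close)
    return [
        close[t + horizon] / close[t] - 1.0 if t + horizon < n else None
        for t in range(n)
    ]


def align(features, labels):
    """Keep only the rows that have a label, so no row is trained on a missing target."""
    kept = [(f, y) for f, y in zip(features, labels) if y is not None]
    return [f for f, _ in kept], [y for _, y in kept]


# Made-up closing prices, for illustration only.
close = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 101.0]
labels = forward_return(close, horizon=5)
features = [[c] for c in close]  # one feature per row: the close itself
train, label = align(features, labels)
print(len(train), label)
```

The last `horizon` rows have no label and are dropped. With pandas, the same label would usually be written as `close.shift(-5) / close - 1`.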
Once the data is ready, you can start training the model :
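from gplearn.genetic import SymbolicTransformer

# Assumed from the data list above: the names of gplearn's built-in functions.
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min', 'sin', 'cos', 'tan']
gp1 = SymbolicTransformer(
    generations=10, population_size=1000, function_set=function_set,
    init_depth=(1, 4), tournament_size=20, metric='spearman',
    p_crossover=0.4, p_subtree_mutation=0.01, p_hoist_mutation=0,
    p_point_mutation=0.01, p_point_replace=0.40,
    warm_start=False, verbose=1, random_state=0, n_jobs=-1,
    feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'])
...
gp1.fit(train, label)  # train the model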
The model prints a progress log automatically. The Fitness column is the fitness, for which we chose the Spearman rank correlation coefficient. The higher the coefficient, the stronger the correlation between the factor and the return over the next 5 days.

We then plot how the fitness of the best factor evolves across generations:

As the figure above shows, by roughly the fourth generation (on the x-axis, 0 is the first generation) the rank correlation coefficient of the best factor has already reached a fairly high level, and later generations bring little improvement.
Finally, the tree diagram shows the best factor found by the model:
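The same reading can be done without a chart. The sketch below assumes the per-generation best fitness is available as a plain list (recent gplearn versions expose it as `gp1.run_details_['best_fitness']`, which is worth checking against your installed version). The function name `plateau_generation`, the tolerance `min_gain`, and the numbers are all hypothetical and are not taken from the article's log.

```python
def plateau_generation(best_fitness, min_gain=0.005):
    """Return the earliest generation whose fitness is within `min_gain` of the overall best."""
    if not best_fitness:
        raise ValueError("best_fitness is empty")
    best = max(best_fitness)
    for generation, fitness in enumerate(best_fitness):
        if best - fitness <= min_gain:
            return generation


# Made-up fitness history for illustration only; index 0 is the first generation.
history = [0.021, 0.034, 0.052, 0.078, 0.080, 0.080, 0.081, 0.081, 0.081, 0.081]
print(plateau_generation(history))
```

A result well below `generations` suggests that later generations add little, which can justify fewer generations or a larger population in the next run.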

Written as a formula, it is log(close) / log(volume). The top ten factors found by the model are:

As you can see, the model output contains many duplicates. After removing them, only two factors remain: log(close) / log(volume) and log(volume) / log(close).
In fact, these two are the same factor, one being the reciprocal of the other. Take log(close) / log(volume): it computes the logarithm of the closing price and of the volume separately and then divides one by the other, which can be viewed as the closing price weighted by the reciprocal of volume. Interested readers can test this factor's performance further, or run factor mining on other indexes or commodity futures.
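Why both forms score equally can be checked with a small pure-Python sketch of the Spearman rank correlation. For a factor whose values are all positive, taking the reciprocal reverses the ranking, so the coefficient only flips sign. If the fitness uses the absolute value of the correlation (which gplearn's built-in correlation metrics appear to do), both formulas get the same fitness. The helper names and the data below are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up factor values and future returns, for illustration only.
factor = [0.31, 0.35, 0.29, 0.40, 0.33]
future_return = [0.01, 0.03, -0.02, 0.02, 0.00]
reciprocal = [1.0 / f for f in factor]

rho = spearman(factor, future_return)
rho_reciprocal = spearman(reciprocal, future_return)
print(rho, rho_reciprocal)  # same magnitude, opposite sign
```

This is also a reason to deduplicate mined factors by their ranking of the data rather than by formula text alone.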
This article is only a first look at genetic programming, and much remains unresolved. For example, the functions used here are all gplearn built-ins. How can the function set be extended, especially with time-series functions such as a trailing 5-day mean? The test here covers a single instrument. How can it be extended to multiple instruments, and how should the resulting 3D data be handled? All of these remain to be solved.
A more advanced article on genetic programming will follow, taking you further into factor mining!
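As one possible starting point for the time-series question, and not the approach of the follow-up article, a trailing 5-row mean can be written as a NumPy function that returns an array of the same length as its input. The name `rolling_mean_5` is hypothetical. The sketch assumes rows are sorted by date for a single instrument, and it fills the first rows with the mean of whatever history exists so the output stays finite.

```python
import numpy as np


def rolling_mean_5(x):
    """Trailing 5-row mean; the first rows average however many rows are available."""
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[i] is the sum of x[:i]
    end = np.arange(1, len(x) + 1)
    start = np.maximum(end - 5, 0)
    return (csum[end] - csum[start]) / (end - start)


print(rolling_mean_5([1, 2, 3, 4, 5, 6]))

# Assumption: gplearn's make_function can wrap such a function so it can be
# added to function_set, for example:
#   from gplearn.functions import make_function
#   ts_mean_5 = make_function(function=rolling_mean_5, name='ts_mean_5', arity=1)
#   function_set = function_set + [ts_mean_5]
```

Because the function depends on row order, it only makes sense while the training rows stay in time order. With several instruments stacked in one table the window would run across instrument boundaries, which is exactly the 3D-data problem raised above.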