Share a powerful tool for factor mining: genetic programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited, and sooner or later it runs out. From public sources such as broker research reports and academic papers? Factors taken from these sources inevitably suffer from crowding: an effective factor is available to everyone else, too.
So is there another way? The answer is yes.
Based on the research report "Artificial Intelligence Series: Stock-Selection Factor Mining Based on Genetic Programming", today we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming?
"Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a random population of formulas and, by simulating genetic evolution in nature, gradually produces formulas that fit a specific objective. As a supervised learning method, genetic programming can search, for a given objective, for hidden mathematical formulas that are difficult for the human mind to construct. Whereas traditional supervised learning algorithms mainly fit the relationship between features and labels, genetic programming is used more for feature mining (feature engineering)."
(Quoted from the research report "Artificial Intelligence Series: Stock-Selection Factor Mining Based on Genetic Programming")
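To make "a random population of formulas" concrete: a formula can be represented as a tree whose internal nodes are functions and whose leaves are input variables. The sketch below is a simplified illustration, not gplearn's internals; the tuple representation and all helper names are our own. It restricts the function set to four arithmetic operators, and its division is protected (it returns 1 when the denominator is near zero), which mirrors gplearn's convention as we understand it, so that randomly generated formulas never produce NaN or infinity.

```python
import random

import numpy as np


def protected_div(a, b):
    """Division that returns 1 where |b| <= 0.001, so every formula stays finite."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(b) > 0.001, np.divide(a, b), 1.0)


# Function name -> (callable, arity); here just four arithmetic operators.
FUNCTIONS = {"add": (np.add, 2), "sub": (np.subtract, 2),
             "mul": (np.multiply, 2), "div": (protected_div, 2)}
TERMINALS = ["open", "close", "high", "low", "volume"]


def evaluate(tree, data):
    """A tree is a terminal name (str) or a tuple (function_name, *children)."""
    if isinstance(tree, str):
        return np.asarray(data[tree], dtype=float)
    func, _ = FUNCTIONS[tree[0]]
    return func(*(evaluate(child, data) for child in tree[1:]))


def random_tree(depth, rng):
    """'Grow'-style initialisation: stop at a terminal at random or when depth runs out."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name][1])))


def to_formula(tree):
    """Render a tree in prefix notation, e.g. div(close, volume)."""
    if isinstance(tree, str):
        return tree
    return f"{tree[0]}({', '.join(to_formula(c) for c in tree[1:])})"
```

For example, `evaluate(("div", ("sub", "close", "open"), ("sub", "high", "low")), data)` computes (close - open)/(high - low) for a dict `data` of NumPy arrays, and with `rng = random.Random(0)`, the list `[random_tree(3, rng) for _ in range(1000)]` is a random initial population of 1,000 formulas.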
Traditional factor research follows "logic first, formula second", which is a deductive approach. Genetic programming reverses this into "formula first, logic second", which is an inductive approach. Its advantage is that it can use the computer's computing power to run a heuristic search and, free of the limits of human intuition, uncover hidden factors that are difficult to construct by hand, opening up more possibilities for factor research.
Biological evolution involves the inheritance of genes, mutation, adaptation to the environment, and so on. Genetic programming has direct counterparts: crossover, subtree mutation, point mutation, hoist mutation, and a fitness measure. For details, please refer to the research report or the related papers.
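The operators named above act on formula trees. The sketch below is our own simplified illustration (helper names are hypothetical, not gplearn's internals): a tree is a terminal string or a tuple `(function_name, *children)`. Subtree crossover replaces a random subtree with a random subtree from a donor; subtree mutation is the same operation with a freshly generated random tree as the donor; hoist mutation replaces a subtree with one of its own descendants, which shrinks the formula; point mutation swaps individual nodes for others of the same arity, with a per-node probability analogous to gplearn's `p_point_replace`; tournament selection picks the fittest of `k` randomly drawn programs, which is the role of `tournament_size`.

```python
import random

# A terminal is a str; a function node is a tuple (name, *children).
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2, "log": 1}
TERMINALS = ["open", "close", "high", "low", "volume"]


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent, donor, rng):
    """Subtree crossover: a random subtree of `parent` becomes a random subtree of `donor`."""
    target = rng.choice(list(paths(parent)))
    return replace(parent, target, get(donor, rng.choice(list(paths(donor)))))


def hoist_mutation(tree, rng):
    """Hoist mutation: replace a subtree with one of its own descendants."""
    sub_path = rng.choice(list(paths(tree)))
    sub = get(tree, sub_path)
    return replace(tree, sub_path, get(sub, rng.choice(list(paths(sub)))))


def point_mutation(tree, rng, p_replace=0.4):
    """Point mutation: each node is swapped, with probability p_replace, for one of equal arity."""
    if isinstance(tree, str):
        return rng.choice(TERMINALS) if rng.random() < p_replace else tree
    name = tree[0]
    if rng.random() < p_replace:
        name = rng.choice([f for f, a in ARITY.items() if a == ARITY[tree[0]]])
    return (name, *(point_mutation(c, rng, p_replace) for c in tree[1:]))


def tournament(population, fitness, k, rng):
    """Tournament selection: the fittest of k randomly drawn programs wins."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]
```

Every operator returns a new tree and leaves its inputs untouched, so parents selected by `tournament` can be reused across several offspring in the same generation.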
We use gplearn, a Python genetic-programming package, for factor mining. The main model parameters are as follows:

The data used in the model are as follows:
- Instrument: Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: open, close, high, low, volume, return, and volume-weighted average price (VWAP)
- Prediction target: 5-day forward return
- Function set: all of gplearn's built-in functions
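A minimal sketch of turning the data above into the `train` and `label` arrays passed to `fit`, assuming a pandas DataFrame of daily bars indexed by date that already contains `open`, `close`, `high`, `low`, `volume` and `vwap` columns (the original does not say how VWAP was obtained). "Return" is taken here to be the daily close-to-close percentage change, and the label is the close-to-close return over the next 5 trading days; both definitions, and the helper name `build_dataset`, are our assumptions.

```python
import numpy as np
import pandas as pd

FEATURES = ["open", "close", "high", "low", "volume", "return_rate", "vwap"]


def build_dataset(df, horizon=5):
    """Hypothetical helper: daily bars -> (feature matrix, forward-return label, dates).

    Assumes `df` is indexed by date and has open/close/high/low/volume/vwap columns.
    """
    df = df.sort_index().copy()
    df["return_rate"] = df["close"].pct_change()            # daily return
    label = df["close"].shift(-horizon) / df["close"] - 1   # return over the next `horizon` days
    # The first row (no daily return) and the last `horizon` rows (no label) are dropped.
    data = df[FEATURES].assign(label=label).dropna()
    return data[FEATURES].to_numpy(), data["label"].to_numpy(), data.index
```

For example, `train, label, dates = build_dataset(df.loc["2010-01-01":"2022-05-31"])` restricts the sample to the backtest period; keeping `dates` makes it possible to split by time later.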
Once the data is ready, you can start training the model :
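```python
from gplearn.genetic import SymbolicTransformer
# all of gplearn's built-in functions, as listed in the data setup
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min', 'sin', 'cos', 'tan']
gp1 = SymbolicTransformer(generations=10, population_size=1000, function_set=function_set, init_depth=(1, 4), tournament_size=20, metric='spearman', p_crossover=0.4,
                          p_subtree_mutation=0.01, p_hoist_mutation=0, p_point_mutation=0.01, p_point_replace=0.40,
                          warm_start=False, verbose=1, random_state=0, n_jobs=-1,
                          feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'])
...  # data preparation (train, label) is omitted in the original
gp1.fit(train, label)  # train the model
```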
With verbose=1 the model prints a training log as it runs. The Fitness column is the fitness score; here we chose the Spearman rank correlation coefficient, so a higher value means a stronger correlation between the factor and the 5-day forward return.
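The Fitness column can be reproduced by hand: Spearman's coefficient is the Pearson correlation of the two series' ranks. The sketch below is our own helper (not gplearn code) and computes it with pandas average ranks. As far as we understand gplearn's implementation, its `'spearman'` metric scores a program by the absolute value of this correlation, so a strongly negative relationship is rewarded as much as a strongly positive one.

```python
import numpy as np
import pandas as pd


def spearman_fitness(factor, target):
    """Spearman rank correlation: the Pearson correlation of average ranks.

    Returns 0.0 when either input is constant (its ranks have zero variance),
    so a degenerate formula scores zero instead of producing NaN.
    """
    rx = pd.Series(factor, dtype=float).rank().to_numpy()
    ry = pd.Series(target, dtype=float).rank().to_numpy()
    if rx.std() == 0 or ry.std() == 0:
        return 0.0
    return float(np.corrcoef(rx, ry)[0, 1])
```

Because only ranks enter the calculation, any strictly increasing transformation of a factor (for example taking its logarithm when it is positive) leaves its Spearman fitness unchanged.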

We also plot how the best factor's fitness evolves across generations:

As the figure shows, the best factor's rank correlation reaches a fairly high level by roughly the fourth generation (on the x-axis, 0 denotes the first generation), and later generations bring little further improvement.
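The curve can be drawn from gplearn's `run_details_` attribute, a dict of per-generation statistics that includes `'generation'` and `'best_fitness'` lists (for example `plt.plot(gp1.run_details_['generation'], gp1.run_details_['best_fitness'])` with matplotlib). To put a number on "later generations bring little improvement", the hypothetical helpers below list each generation's gain and find the earliest generation after which the best fitness never improves by more than `tol`; the default tolerance is arbitrary.

```python
def fitness_gains(run_details):
    """Pair each generation with its best fitness and the change from the previous generation.

    `run_details` is a dict such as gplearn's `run_details_`, with equal-length
    'generation' and 'best_fitness' lists. A gain can be negative if a generation's
    best program scores lower than the previous generation's.
    """
    best = list(run_details["best_fitness"])
    gains = [None] + [b - a for a, b in zip(best, best[1:])]
    return list(zip(run_details["generation"], best, gains))


def plateau_generation(run_details, tol=0.005):
    """Earliest generation after which no later generation improves best fitness by more than `tol`."""
    rows = fitness_gains(run_details)
    for i, (generation, _, _) in enumerate(rows):
        if all(gain <= tol for _, _, gain in rows[i + 1:]):
            return generation
    return None  # only reached when run_details is empty
```

With generations numbered from 0, the "fourth generation" in the figure corresponds to a returned value of 3.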
Finally, the tree diagram shows the best factor evolved by the model:

Written as a formula, it is log(close)/log(volume). The top ten factors output by the model are:

As the list shows, the model's output contains many duplicate factors. After removing duplicates, only two remain: log(close)/log(volume) and log(volume)/log(close).
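Deduplicating by hand is easy for ten factors; for a larger output the step can be automated. The hypothetical helper below (not part of gplearn) drops a factor if its formula string repeats one already kept, or if its values have an absolute rank correlation above a threshold with a kept factor; the second rule also catches pairs that rank the data in reverse order. With gplearn, the factor values could come from `gp1.transform(train)`, and in recent versions iterating over the fitted transformer yields its best programs, so `[str(p) for p in gp1]` should give the matching formula strings.

```python
import numpy as np
import pandas as pd


def deduplicate_factors(formulas, values, threshold=0.99):
    """Keep one representative of each group of duplicate or near-duplicate factors.

    formulas:  list of formula strings, one per factor
    values:    2-D array with one column of factor values per formula
    threshold: absolute Spearman correlation above which two factors count as duplicates
    """
    ranks = pd.DataFrame(values).rank()  # column-wise average ranks
    kept = []
    for j, formula in enumerate(formulas):
        if any(formula == formulas[k] for k in kept):
            continue  # identical formula already kept
        if any(abs(np.corrcoef(ranks[j], ranks[k])[0, 1]) > threshold for k in kept):
            continue  # values rank (almost) identically or in reverse order
        kept.append(j)
    return [formulas[k] for k in kept]
```

The threshold trades off strictness: at 0.99 only near-identical orderings are merged, while lowering it also removes factors that are merely highly correlated.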
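In fact, these two are the same factor, one being the reciprocal of the other. For the Shanghai Composite Index both logarithms are positive (the index level and the volume are far above 1), and 1/x is strictly decreasing for positive x, so the two factors rank the days in exactly opposite order: their Spearman correlations with the target have equal magnitude and opposite sign. Take log(close)/log(volume): it takes the logarithms of the closing price and of volume and then divides the first by the second, so it can be read as the log closing price scaled by the reciprocal of log volume. Interested readers can test this factor's performance further, or run the same factor mining on other indices or on commodity futures.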
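One simple further test is to recompute the factor outside gplearn and check whether its rank correlation with the 5-day forward return is stable from year to year. The sketch below assumes pandas Series sharing a DatetimeIndex; the protected log and division follow gplearn's conventions as we understand them (log of the absolute value, 0 near zero; division returns 1 when the denominator is near zero), and all function names are hypothetical.

```python
import numpy as np
import pandas as pd


def protected_log(x):
    """Protected log: log|x|, and 0 where |x| <= 0.001."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(x) > 0.001, np.log(np.abs(x)), 0.0)


def protected_div(a, b):
    """Protected division: a / b, and 1 where |b| <= 0.001."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(np.abs(b) > 0.001, np.divide(a, b), 1.0)


def log_close_over_log_volume(close, volume):
    """The mined factor log(close) / log(volume); inputs are Series with the same index."""
    values = protected_div(protected_log(close), protected_log(volume))
    return pd.Series(values, index=close.index)


def yearly_rank_ic(factor, forward_return):
    """Rank IC (Pearson correlation of ranks) of factor vs. forward return, per calendar year."""
    df = pd.DataFrame({"factor": factor, "ret": forward_return}).dropna()
    ic = {year: g["factor"].rank().corr(g["ret"].rank())
          for year, g in df.groupby(df.index.year)}
    return pd.Series(ic, name="rank_ic")
```

Usage would look like `yearly_rank_ic(log_close_over_log_volume(df["close"], df["volume"]), df["close"].shift(-5) / df["close"] - 1)`. A factor whose yearly IC keeps the same sign across most years is more convincing than one whose full-sample IC is driven by a few periods.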
This article is only a first exploration of genetic programming, and a large part remains unresolved. For example, this time we used only gplearn's built-in functions. How can the function set be extended, especially with time-series functions such as the 5-day historical mean? The test covered a single instrument; how can it be extended to multiple instruments, and how should the resulting 3-D data (date × instrument × feature) be handled? All of these questions remain open.
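A possible starting point for the time-series question, sketched under assumptions: gplearn accepts user-defined functions through `gplearn.functions.make_function(function=..., name=..., arity=...)`, which checks that the function returns a NumPy array of the same shape as its input and stays finite on all-zero and all-negative inputs. The hypothetical `ts_mean_5` below computes a trailing 5-observation mean with cumulative sums; the first four rows use a shorter expanding window so that no NaN is produced. It is only meaningful if the rows passed to gplearn belong to a single instrument in chronological order, which is exactly the limitation behind the multi-instrument question.

```python
import numpy as np


def ts_mean_5(x):
    """Trailing 5-observation mean of a 1-D array, with the same length as the input.

    The first 4 positions use an expanding mean (1 to 4 observations) so the output
    stays finite everywhere. Assumes rows are one instrument in chronological order.
    """
    x = np.asarray(x, dtype=float)
    window = 5
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[k] = sum of the first k values
    end = np.arange(1, len(x) + 1)                # exclusive end index of each window
    start = np.maximum(end - window, 0)           # inclusive start index of each window
    return (csum[end] - csum[start]) / (end - start)
```

Registering it would look like `ts_mean = make_function(function=ts_mean_5, name="ts_mean_5", arity=1)`, after which `ts_mean` can be placed in `function_set` alongside the built-in function names.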
An advanced version of genetic programming will follow, taking you further into factor mining!