Sharing a powerful tool for factor mining: genetic programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited, and sooner or later it runs out. From public material such as research reports or papers? Factors found this way inevitably become crowded: an effective factor that you can read about is one that everyone else can use too.
So is there another way? Yes, there is.
Today, drawing on the research report 《Stock selection factor mining based on genetic programming in artificial intelligence series》, we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming?
Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a randomly generated population of formulas and, by simulating the process of genetic evolution in nature, gradually produces a population of formulas that fits a specific objective. As a supervised learning method, genetic programming can find, for a given objective, hidden mathematical formulas that would be hard for a human to construct. Traditional supervised learning algorithms are mainly used to fit the relationship between features and labels, whereas genetic programming is more often applied to feature mining (feature engineering).
——《 Stock selection factor mining based on genetic programming in artificial intelligence series analysis report 》
Previous factor research has followed “logic first, formula second”, which is a deductive approach. Genetic programming instead works “formula first, logic second”, which is an inductive approach. Its advantage is that it can use the computer's computing power to run a heuristic search while also getting past the limits of human thinking, digging out hidden factors that would be hard for a human to construct and opening up more possibilities for factor research.
Genetic evolution in living organisms involves the inheritance of genes, mutation, and adaptation to the environment. Genetic programming has a counterpart for each: crossover, subtree mutation, point mutation, hoist mutation, and fitness. For the details, please refer to the research report or the original papers.
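To make these operators concrete, here is a minimal sketch in plain Python. It assumes a formula is stored as a nested tuple such as ("div", ("log", "close"), ("log", "volume")); that representation, the small ARITY table, and all function names are illustrative choices for this sketch, not gplearn's internal implementation. Subtree mutation is not shown separately, since it can be thought of as crossover with a freshly generated random formula as the donor.

```python
import random

# A formula is a nested tuple (function_name, child, ...) or a terminal string.
# The function and terminal lists below are illustrative, not exhaustive.
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2, "log": 1, "sqrt": 1, "abs": 1}
TERMINALS = ["open", "close", "high", "low", "volume", "return_rate", "vwap"]


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found at the given path."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]


def size(tree):
    """Count the nodes (functions and terminals) in the tree."""
    return sum(1 for _ in paths(tree))


def crossover(parent, donor, rng):
    """Replace a random subtree of parent with a random subtree of donor."""
    target = rng.choice(list(paths(parent)))
    graft = get(donor, rng.choice(list(paths(donor))))
    return put(parent, target, graft)


def hoist_mutation(tree, rng):
    """Pick a random subtree, then lift one of its own subtrees into its place."""
    target = rng.choice(list(paths(tree)))
    sub = get(tree, target)
    hoisted = get(sub, rng.choice(list(paths(sub))))
    return put(tree, target, hoisted)


def point_mutation(tree, rng, p_replace=0.4):
    """With probability p_replace, swap each node for another of the same arity."""
    if isinstance(tree, tuple):
        name = tree[0]
        if rng.random() < p_replace:
            name = rng.choice([f for f, a in ARITY.items() if a == ARITY[tree[0]]])
        return (name,) + tuple(point_mutation(c, rng, p_replace) for c in tree[1:])
    return rng.choice(TERMINALS) if rng.random() < p_replace else tree


rng = random.Random(0)
a = ("div", ("log", "close"), ("log", "volume"))
b = ("sub", "high", ("mul", "low", "vwap"))
print(crossover(a, b, rng))
print(hoist_mutation(a, rng))
print(point_mutation(a, rng))
```

Hoist mutation can only shrink a formula or leave it unchanged, which is why it is often used to keep formulas from growing too large; point mutation keeps the shape of the tree and changes only what sits at each node.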
We use gplearn, a genetic programming package for Python, for factor mining. The main model parameters are as follows:

The data used by the model are as follows:
- Test instrument: the Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: opening price, closing price, highest price, lowest price, volume, return, and volume-weighted average price (VWAP)
- Prediction target: the return over the next 5 days
- Function set: all of gplearn's built-in functions
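As an illustration of the prediction target, the sketch below computes a 5-day forward return from a list of closing prices. The article does not say exactly how the return is defined, so the close-to-close definition, the function name forward_returns, and the sample prices are all assumptions made for this example.

```python
def forward_returns(closes, horizon=5):
    """Return close[t + horizon] / close[t] - 1 for each day t.

    Days whose future price is not yet known get None; those rows would
    need to be dropped before training.
    """
    out = []
    for t in range(len(closes)):
        if t + horizon < len(closes):
            out.append(closes[t + horizon] / closes[t] - 1.0)
        else:
            out.append(None)
    return out


# Made-up closing prices, for illustration only.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 110.0]
print(forward_returns(closes))
```

Because the label for day t uses the price on day t + 5, the last five rows of any sample have no label, and the features on day t must use only information available up to day t.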
Once the data is ready, you can start training the model:
from gplearn.genetic import SymbolicTransformer

# function_set is assumed to hold the names of gplearn's built-in functions (see the list above)
gp1 = SymbolicTransformer(
    generations=10, population_size=1000, function_set=function_set,
    init_depth=(1, 4), tournament_size=20, metric='spearman',
    p_crossover=0.4, p_subtree_mutation=0.01, p_hoist_mutation=0,
    p_point_mutation=0.01, p_point_replace=0.40,
    warm_start=False, verbose=1, random_state=0, n_jobs=-1,
    feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'],
)
...
gp1.fit(train, label)  # train the model
The model prints a progress log automatically. The Fitness column is the fitness, which here is the Spearman rank correlation coefficient: the higher the coefficient, the more strongly the factor is correlated with the return over the next 5 days.
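For readers who want to see what the fitness measures, here is a small self-contained sketch of the Spearman rank correlation: rank both series (ties share their average rank), then take the ordinary Pearson correlation of the ranks. The function names are made up for this example, and it is an unweighted textbook version, not gplearn's own code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up factor values and forward returns, for illustration only.
factor = [0.31, 0.35, 0.33, 0.38, 0.36]
future_return = [-0.02, 0.01, 0.00, 0.04, 0.03]
print(spearman(factor, future_return))
```

Because only the ordering matters, any strictly increasing transformation of a factor (taking a logarithm of positive values, for instance) leaves its Spearman coefficient unchanged.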

We then plot how the best factor evolves across generations:

As the figure above shows, the best factor reaches a relatively high rank correlation by about the fourth generation (on the x-axis, 0 is the first generation), and later generations bring little further improvement.
Finally, the tree diagram shows the best factor found by the model:

Written as a formula, it is log(closing price)/log(volume). The model's top ten factors are:

As you can see, the model's output contains many duplicates. After removing them, only two factors remain: log(closing price)/log(volume) and log(volume)/log(closing price).
In fact, these two are the same factor, one being the reciprocal of the other. Take log(closing price)/log(volume): it takes the logarithm of the closing price and of the volume separately and then divides one by the other, so it can be read as the closing price weighted by the reciprocal of volume. Interested readers can test this factor's performance further, or run factor mining on other indices or commodity futures.
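The claim that the two factors are equivalent can be checked on a toy example. When every value of a factor is positive, taking the reciprocal exactly reverses the ranking, so the Spearman coefficient of the reciprocal has the same magnitude and the opposite sign. If the fitness is scored as the absolute value of the correlation (an assumption about the metric that is worth confirming in gplearn's documentation), both forms would receive the same fitness, which would explain why both appear in the output. The prices and volumes below are made up.

```python
import math


def ranks(values):
    """Rank values from 1..n (this toy example has no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


# Made-up closing prices and volumes, for illustration only.
close = [3200.0, 3150.0, 3300.0, 3250.0, 3400.0]
volume = [2.1e10, 2.6e10, 1.8e10, 2.3e10, 1.5e10]

factor = [math.log(c) / math.log(v) for c, v in zip(close, volume)]
inverse = [1.0 / f for f in factor]  # log(volume) / log(close)

n = len(factor)
print(ranks(factor))
print(ranks(inverse))
print(ranks(inverse) == [n + 1 - r for r in ranks(factor)])
```

The reversal relies on all factor values having the same sign, which holds here because both logarithms are positive for index-level prices and volumes.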
This article is only a first look at genetic programming, and a good deal remains unsolved. For example, this run used only gplearn's built-in functions. How can the function set be extended, especially with time-series functions such as a trailing 5-day mean? The current test covers a single instrument; how can it be extended to multiple instruments, and how should the resulting 3D data be handled? All of these still need to be solved.
An advanced installment on genetic programming will follow, taking factor mining further!