Paper reading: Deep Forest / gcForest
2022-07-28 22:45:00 【Claire_ Shang】
I recently presented this paper at a group meeting, so here is a quick summary of my notes. Incidentally, a search for deep forest may also turn up "Deep Forest: Towards an Alternative to Deep Neural Networks". The two papers have essentially the same content, with only a few small differences in the presentation.
Below is the content of the slides I made. Reference article: http://t.csdn.cn/iSKfj
Main content
Deep learning models are mainly built on neural networks, that is, multiple layers of parameterized, differentiable nonlinear modules that can be trained by backpropagation. The paper explores the possibility of building deep models from non-differentiable modules and proposes a deep learning model, gcForest (multi-Grained Cascade Forest).
Characteristics:
(1) Few hyperparameters
(2) The model complexity can be determined automatically in a data-dependent way
(3) The deep model is built without backpropagation
Questions raised:
(1) Does deep model = DNN? Must a deep model be built from differentiable modules?
(2) Is it possible to train a deep model without backpropagation?
(3) Is it possible for deep models to win on tasks where other models, such as random forests, are currently stronger?
Inspiration 1: Ensemble learning
To build a good ensemble, the individual learners should be accurate and diverse.
Ways to increase model diversity:
(1) Data samples: generate different data samples for different individual learners
(2) Input features: selecting different feature subsets produces different tree models
(3) Learning parameters: use different parameter settings for different individual learners
(4) Output representation: use different output representations for different individual learners
Inspiration 2: DNN

Advantages of deep models: layer-by-layer processing (Figure 1), feature transformation inside the model, and high model complexity.
Shortcomings: many hyperparameters, a large amount of training data is needed, and the network architecture must be fixed before training.
The authors believe that layer-by-layer processing is the key to the success of DNNs. As Figure 1 shows, higher-level abstract features gradually emerge as the network gets deeper.
Cascade forest structure
Use different kinds of trees to improve the diversity of the model

Each level of the cascade forest in the figure consists of two random forests (black) and two extremely randomized forests (blue); each forest contains 500 trees.
The main differences between the two kinds of forest:
Sample space: a random feature subspace (random forest) / all of the sample data (extremely randomized forest)
Node splitting: the split with the smallest Gini index (random forest) / a randomly picked split (extremely randomized forest)
Random forests and extremely randomized forests: http://t.csdn.cn/c5BZw
The class distributions estimated by the forests in a cascade level form class vectors, which are concatenated with the original feature vector and passed on to the next level.

Suppose there are three classes. Each of the four forests produces a three-dimensional class vector, so the next level receives 12 (= 3 × 4) augmented features.
My understanding is that the class-vector generation figure shows what happens when a feature vector is fed into one of the forests in the cascade, which is why a three-dimensional vector comes out on the right side of that figure.
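To make the 3 × 4 = 12 arithmetic concrete, here is a minimal sketch of one cascade level. It is only an illustration under several assumptions, not the authors' implementation: it uses scikit-learn, a made-up toy data set, 20 trees per forest instead of 500, and `ExtraTreesClassifier` with `max_features=1` as a stand-in for the extremely randomized forests. For brevity it also predicts on the training data itself, whereas the paper generates class vectors with k-fold cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Toy 3-class data set (hypothetical, not from the paper).
X, y = make_classification(n_samples=150, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Two random forests and two extremely randomized forests per level.
# 20 trees keeps the sketch fast; the paper uses 500 trees per forest.
forests = [
    RandomForestClassifier(n_estimators=20, random_state=0),
    RandomForestClassifier(n_estimators=20, random_state=1),
    ExtraTreesClassifier(n_estimators=20, max_features=1, random_state=2),
    ExtraTreesClassifier(n_estimators=20, max_features=1, random_state=3),
]

# Each forest outputs one class vector (estimated class distribution) per instance.
class_vectors = [forest.fit(X, y).predict_proba(X) for forest in forests]

# 4 forests x 3 classes = 12 augmented features per instance.
augmented = np.hstack(class_vectors)

# The next level sees the original features plus the augmented features.
next_level_input = np.hstack([X, augmented])
print(augmented.shape, next_level_input.shape)
```

With 10 original features, the input to the next level has 10 + 12 = 22 columns.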


To reduce the risk of overfitting, the class vector produced by each forest is generated by k-fold cross-validation.
Each instance is used as training data k−1 times, which yields k−1 class vectors. These are averaged to give the final class vector, which serves as the augmented feature for the next cascade level.
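A rough sketch of cross-validated class vectors, assuming scikit-learn and a made-up toy data set. Note that this is a simplification I am assuming for illustration: `cross_val_predict` returns out-of-fold predictions (each instance is predicted once, by a forest that did not train on it) rather than the average of k−1 class vectors described above. The goal is the same: the class vector of an instance should not come from a forest that has memorized it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Toy 3-class data set (hypothetical, not from the paper).
X, y = make_classification(n_samples=150, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

forest = RandomForestClassifier(n_estimators=20, random_state=0)

# 3-fold cross-validation: every row is predicted by a forest
# trained on the other two folds.
oof_class_vectors = cross_val_predict(forest, X, y, cv=3, method="predict_proba")

print(oof_class_vectors.shape)  # one 3-dimensional class vector per instance
```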
After a new cascade level is added, the performance of the whole cascade is estimated on a validation set; if there is no significant performance gain, training stops. The number of cascade levels is therefore determined automatically. In other words, gcForest decides its model complexity adaptively by terminating training at the right time, which makes it applicable to training data of different scales, not only large-scale data.
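The stopping rule can be sketched as a simple loop. The function name `grow_cascade`, the callback `evaluate_level`, the `tolerance` parameter and the accuracy values are all hypothetical, chosen only to illustrate the idea; the paper does not prescribe this exact interface.

```python
def grow_cascade(evaluate_level, max_levels=20, tolerance=0.0):
    """Add cascade levels until validation accuracy stops improving.

    evaluate_level(i) is a hypothetical callback that trains level i and
    returns the validation accuracy of the cascade made of levels 0..i.
    """
    best_acc = float("-inf")
    n_levels = 0
    for level in range(max_levels):
        acc = evaluate_level(level)
        if acc <= best_acc + tolerance:
            break  # no significant gain: drop this level and stop growing
        best_acc = acc
        n_levels = level + 1
    return n_levels, best_acc


# Made-up validation accuracies, one per candidate level.
accs = [0.80, 0.85, 0.87, 0.87, 0.86]
n_levels, best_acc = grow_cascade(lambda i: accs[i], max_levels=len(accs))
print(n_levels, best_acc)  # 3 0.87
```

Here the fourth level brings no gain over the third, so the cascade keeps three levels.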
Multi-grained scanning
Sliding windows are used to scan the raw features.

Combining the two steps above gives the overall gcForest procedure.
In the figure below, suppose there are 3 classes, and windows of 100, 200 and 300 dimensions slide over the original 400-dimensional features.
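As a quick check of how many windows each granularity produces, here is a small sketch. The helper `sliding_windows` and the stride of 1 are my own assumptions for illustration, and the feature values are placeholders.

```python
import numpy as np


def sliding_windows(x, window, stride=1):
    """Return every contiguous window of length `window` from a 1-D feature vector."""
    starts = range(0, len(x) - window + 1, stride)
    return np.stack([x[i:i + window] for i in starts])


x = np.arange(400, dtype=float)  # one raw 400-dimensional instance (placeholder values)
n_classes = 3

counts = {}
for window in (100, 200, 300):
    patches = sliding_windows(x, window)
    counts[window] = patches.shape[0]
    # Each patch goes through a forest that returns an n_classes-dimensional
    # class vector, so one forest contributes patches.shape[0] * n_classes features.
    print(window, patches.shape, patches.shape[0] * n_classes)
# 100 (301, 100) 903
# 200 (201, 200) 603
# 300 (101, 300) 303
```

A window of size w over d features with stride 1 yields d − w + 1 patches, so smaller windows produce more, finer-grained patches.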

Cascade of cascades: each cascade consists of multiple levels, and each level corresponds to one scanning granularity.

Hyperparameters and default settings
Boldface marks the hyperparameters with a relatively large influence; "?" means the value has to be tuned for each task.

Table 1 shows that gcForest requires fewer hyperparameters and has a simpler structure.
Performance

Figure 6 shows how gcForest's performance changes as the number of cascade levels increases.

Figure 8 shows that larger models tend to give better performance.
Experimental results
Only two experiments are shown here; the paper contains many more.

About the code
I have not managed to run the code properly yet, but I have downloaded it, and I came across a very detailed article on how the code is organized.
The link is here: http://t.csdn.cn/lo3jX
I will come back to this when I am ready.