ML4 self-study notes
2022-07-29 06:16:00 【19-year-old flower girl】
Decision tree algorithm
- Place the conditions with the strongest discriminating power at the nodes closest to the root. A tree model can handle both classification and regression.

- Tree composition

- Decision tree training and testing

- How to split on features: which feature is selected at each node and where to cut. This is decided by a measure.

- The measure: entropy. Look at the entropy that results from a candidate split; ideally the entropy of each branch drops substantially after the split.

- Information gain: used to select features. The feature whose split reduces the entropy the most has the largest information gain and is the better choice.
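A minimal sketch of these two measures in Python (the label counts and the categorical feature below are made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the branches
    produced by splitting on a categorical feature."""
    n = len(labels)
    branches = {}
    for value, label in zip(feature_values, labels):
        branches.setdefault(value, []).append(label)
    weighted = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - weighted

# toy data: 9 "yes" / 5 "no" labels and one hypothetical categorical feature
labels  = ["yes"] * 9 + ["no"] * 5
feature = ["sunny"] * 5 + ["overcast"] * 4 + ["rain"] * 5
print(entropy(labels))                    # ~0.94 bits before any split
print(information_gain(labels, feature))  # the larger, the better the feature
```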

Decision tree construction example
- Data, features, and the target (label).

- First select the root node, using the information gain calculation.

- Compute the entropy of the raw data (before any split).

- For each of the 4 features, compute the entropy after splitting on it.

The entropy after a split is computed by weighting each branch's entropy by the proportion of samples that fall into it.
The entropy for the remaining features is computed the same way; select the feature with the highest information gain as the root node, then select each subsequent node in the same way.
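A sketch of this root-node selection procedure on a made-up table with 4 categorical features (the feature names and values are hypothetical, not the dataset from the notes):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, labels, feature):
    """Weighted entropy after splitting on one categorical feature."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# hypothetical toy data: 4 categorical features and a binary target
rows = [
    {"outlook": "sunny", "temp": "hot",  "humidity": "high",   "windy": "no"},
    {"outlook": "sunny", "temp": "hot",  "humidity": "high",   "windy": "yes"},
    {"outlook": "rain",  "temp": "mild", "humidity": "high",   "windy": "no"},
    {"outlook": "rain",  "temp": "cool", "humidity": "normal", "windy": "no"},
    {"outlook": "rain",  "temp": "cool", "humidity": "normal", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "no"]

base = entropy(labels)
gains = {f: base - split_entropy(rows, labels, f) for f in rows[0]}
root = max(gains, key=gains.get)   # feature with the highest information gain
print(gains, "->", root)
```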
ID3, C4.5, CART, and the Gini coefficient

In ID3, suppose there is an id column (1, 2, 3, ..., n). Splitting on id puts each record into its own branch, so every branch is fully determined and the entropy is zero; by the information-gain criterion this looks like the best possible split. But splitting on id is meaningless: the id number cannot be used to decide the branch for new data.
C4.5 therefore proposes the information gain ratio, which also takes the feature's own entropy into account. The id column has so many categories that its own entropy is very large; dividing the information gain by this intrinsic entropy makes the gain ratio very small, which fixes ID3's problem. C4.5 is commonly used today.
CART uses the Gini coefficient as the measure; the formula is Gini = 1 − Σ pₖ², where pₖ is the proportion of class k at the node.
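A sketch contrasting plain information gain with C4.5's gain ratio on an id-like column, plus the CART Gini measure (the helper names and toy labels are mine):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """CART impurity: Gini = 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain_and_ratio(labels, feature_values):
    """C4.5: gain ratio = information gain / intrinsic entropy of the split."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    intrinsic = entropy(list(feature_values))   # entropy of the feature's own value distribution
    return gain, gain / intrinsic if intrinsic else 0.0

labels = ["yes", "yes", "no", "no", "yes", "no"]
id_col = [1, 2, 3, 4, 5, 6]            # one sample per value: information gain is maximal ...
print(gain_and_ratio(labels, id_col))  # ... but the large intrinsic entropy shrinks the ratio
print(gini(labels))
```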
Continuous attribute values
For a binary split on a continuous attribute, every point between adjacent data values is a possible threshold. Compute the entropy for each candidate and choose the split with the lowest entropy; this process discretizes the data.
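A sketch of that threshold search: sort the values, try the midpoint between each adjacent pair, and keep the one with the lowest weighted entropy (the ages and labels below are invented):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted entropy) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                 # no cut between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2    # candidate midpoint
        left  = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        w = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if w < best[1]:
            best = (thr, w)
    return best

ages   = [22, 25, 28, 33, 41, 47, 52]
labels = ["no", "no", "yes", "yes", "yes", "no", "no"]
print(best_threshold(ages, labels))
```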
Decision tree pruning strategy
Reason: to prevent overfitting (taken to the extreme, every leaf holds a single sample and training accuracy reaches 100%).
Pruning strategies: pre-pruning and post-pruning.
In a node labelled X[2] <= 2.45, the split tests whether this feature is <= 2.45; gini is the Gini coefficient computed by the formula above; samples is the total number of samples at the node; value is the count of samples in each class.
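That node description matches how scikit-learn renders a fitted tree; a sketch that produces a comparable tree (assuming the iris dataset, which is a guess at the figure's source rather than something stated in the notes):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text prints each node's split rule; plot_tree would additionally show
# gini, samples and value for every node, matching the description above.
print(export_text(clf, feature_names=list(iris.feature_names)))
```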
- Pre-pruning: prune while building the tree (limit the maximum depth, the number of leaf nodes, the minimum number of samples per leaf, or the minimum information gain). These thresholds, such as how deep the tree may grow or how many leaves it may have, are chosen by experiment.
- Post-pruning: prune after the decision tree has been fully built.
In the formula Cα(T) = C(T) + α·|Tleaf|, C(T) is the current node's Gini coefficient multiplied by its number of samples, and |Tleaf| is the number of leaf nodes below the current node. Taking the green-circled node (from the original figure) as an example: first compute Cα(T) as if the node were not split any further, then compute Cα(T) for its two leaf nodes, add them, and compare with the unsplit value; the larger one is the worse choice. The meaning of α is the trade-off between impurity and the number of leaf nodes: a larger α puts more weight on the number of leaves, i.e. cares more about overfitting, while a smaller α puts more weight on accuracy and worries less about overfitting.
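A sketch of that comparison, with made-up Gini values and sample counts standing in for the green-circled node:

```python
def c_alpha(gini_samples, n_leaves, alpha):
    """C_alpha(T) = C(T) + alpha * |T_leaf|, where C(T) sums gini * samples over leaves."""
    return sum(g * s for g, s in gini_samples) + alpha * n_leaves

alpha = 0.5  # assumed trade-off value; a larger alpha penalizes extra leaves more

# Unsplit: the node itself is one leaf (hypothetical gini 0.44 over 32 samples).
unsplit = c_alpha([(0.44, 32)], n_leaves=1, alpha=alpha)

# Split: two child leaves (hypothetical gini / sample values).
split = c_alpha([(0.30, 20), (0.10, 12)], n_leaves=2, alpha=alpha)

# Post-pruning keeps the cheaper option; the larger C_alpha is the worse choice.
print(unsplit, split, "-> prune" if unsplit <= split else "-> keep the split")
```

scikit-learn exposes a closely related mechanism through the ccp_alpha parameter of its tree estimators (minimal cost-complexity pruning), although its cost term is defined slightly differently.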
How a decision tree solves problems
- Classification: simple; follow the branches from the root down, and the leaf reached gives the class.
- Regression: the measure is variance; we want the variance within each branch to be as small as possible, and the prediction is the mean of the sample labels in the current node (e.g. age).
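A brief sketch of both uses with scikit-learn (the features and labels are invented; the regressor splits to reduce variance and each leaf predicts the mean of its samples):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2], [6.7, 3.1], [5.0, 3.4]])

# Classification: follow the branches, the leaf gives a class.
clf = DecisionTreeClassifier(max_depth=2).fit(X, ["A", "A", "B", "B", "B", "A"])
print(clf.predict([[6.0, 3.0]]))

# Regression: splits minimize within-branch variance; a leaf predicts the mean label (e.g. age).
ages = np.array([21.0, 24.0, 35.0, 33.0, 41.0, 22.0])
reg = DecisionTreeRegressor(max_depth=2).fit(X, ages)
print(reg.predict([[6.0, 3.0]]))
```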