ML4 Self-Study Notes
2022-07-29 06:16:00 【19-year-old flower girl】
Decision tree algorithm
- Put the most discriminative conditions (features) at the nodes closest to the root. A tree model can do both classification and regression.

- Tree composition: a root node, internal (non-leaf) nodes, and leaf nodes; the leaves hold the final decisions.

- Decision tree training and testing: training builds the tree from labeled data by choosing a split at each node; testing simply routes a sample from the root down to a leaf.

- How to split on features: which feature to select at each node, and where to cut. This is decided by a measurement criterion.

- The measure: entropy. Look at the entropy that results after a split; ideally the entropy of each branch drops substantially after the split.

- Information gain: used to select features. A feature whose split greatly reduces entropy has a large information gain; the larger the gain, the better the feature (see the sketch below).
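As a minimal sketch of these two definitions (NumPy assumed; the split below is a hypothetical example), entropy and information gain can be computed like this:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: H = -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, branches):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(labels) - after

# Toy example: splitting 10 labels into two branches.
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left, right = y[:5], y[5:]                 # a perfect split
print(information_gain(y, [left, right]))  # 1.0 bit
```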

Decision tree construction example
- Data: feature columns plus a target label.

- First select the root node, using the information gain computed for each feature.

- Calculate the entropy of the raw data (before any split).

- For each of the 4 features, calculate the entropy after splitting on it.

The entropy after a split is the weighted sum of each branch's entropy, weighted by the proportion of samples that fall into that branch.
The entropies for the remaining features are calculated the same way; the feature with the highest information gain becomes the root node, and each subsequent node is selected in the same manner.
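A sketch of this root-node selection loop; the 4-feature dataset below is hypothetical, just to exercise the computation:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def split_entropy(feature, labels):
    """Weighted entropy after splitting on every distinct value of `feature`."""
    n = len(labels)
    return sum(
        (feature == v).sum() / n * entropy(labels[feature == v])
        for v in np.unique(feature)
    )

# X: rows are samples, columns are the 4 candidate features; y: class labels.
X = np.array([[0, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0]])
y = np.array([0, 0, 1, 1, 1])

base = entropy(y)
gains = [base - split_entropy(X[:, j], y) for j in range(X.shape[1])]
root = int(np.argmax(gains))   # feature with the highest information gain
print(gains, "-> root feature:", root)
```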
ID3, C4.5, CART, and the Gini coefficient

In ID3, suppose there is an id column (1, 2, 3, …, n). Splitting so that each id forms its own group gives a fully determined result: the entropy is zero, which looks like the best possible split. But classifying by id is meaningless; an id number cannot determine the branch for new data.
To fix this, C4.5 proposes the information gain ratio, which also considers the feature's own entropy. The id column has so many distinct values that its intrinsic entropy is very large; dividing the information gain by this intrinsic entropy makes the gain ratio very small. This solves ID3's problem, and C4.5 is commonly used today.
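A sketch of the gain-ratio idea, with a hypothetical id column next to a genuinely informative feature; the id column's large intrinsic entropy shrinks its ratio below the useful feature's:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """C4.5: information gain divided by the feature's own (split) entropy."""
    n = len(labels)
    after = sum((feature == v).sum() / n * entropy(labels[feature == v])
                for v in np.unique(feature))
    gain = entropy(labels) - after
    intrinsic = entropy(feature)    # entropy of the feature's own value counts
    return gain / intrinsic if intrinsic > 0 else 0.0

y      = np.array([0, 0, 1, 1, 1, 0])
id_col = np.arange(len(y))              # unique id per row: gain is maximal...
useful = np.array([0, 0, 1, 1, 1, 1])   # a genuinely informative feature
print(gain_ratio(id_col, y))            # ...but the large intrinsic entropy penalizes it
print(gain_ratio(useful, y))            # the useful feature now scores higher
```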
CART uses the Gini coefficient as its measure: Gini(t) = 1 − Σ_k p_k^2, where p_k is the proportion of class k at node t; like entropy, it is zero for a pure node.
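The Gini formula in code (a minimal sketch):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 1, 1]))   # 0.5, the maximum for two balanced classes
print(gini([1, 1, 1, 1]))   # 0.0, a pure node
```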
Continuous attribute values
For a binary split on a continuous attribute, a candidate threshold lies between every pair of adjacent sorted values. Compute the entropy for each candidate and choose the cut with the lowest weighted entropy; this process is data discretization.
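A discretization sketch: try the midpoint between every pair of adjacent sorted values and keep the cut with the lowest weighted entropy (the toy values below are hypothetical):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    """Try a cut between every pair of adjacent sorted values; return the
    threshold whose two branches have the lowest weighted entropy."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    n = len(y)
    best = (np.inf, None)
    for i in range(1, n):
        if v[i] == v[i - 1]:
            continue                   # no cut between equal values
        t = (v[i] + v[i - 1]) / 2      # midpoint candidate
        h = i / n * entropy(y[:i]) + (n - i) / n * entropy(y[i:])
        if h < best[0]:
            best = (h, t)
    return best                        # (weighted entropy, threshold)

print(best_threshold([1.0, 1.4, 2.1, 2.8, 3.3], [0, 0, 0, 1, 1]))
# -> (0.0, 2.45): a perfect cut at 2.45
```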
Decision tree pruning strategy
Reason: to prevent overfitting (in the extreme, every leaf holds a single sample and training accuracy reaches 100%).
Pruning strategies: pre-pruning and post-pruning.
(Reading a plotted tree node such as X[2] <= 2.45: the node splits on feature X[2] at threshold 2.45; gini is computed by the formula above; samples is the total number of samples at the node; value is the count of each class.)
- Pre-pruning: prune while building the tree (limit the maximum depth, the number of leaf nodes, the minimum samples per leaf, or the minimum information gain). These thresholds (how deep, how many leaves) are determined by experiment; see the sketch after this list.
- Post-pruning: prune after the decision tree has been fully built.
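A pre-pruning sketch, assuming scikit-learn is available; the threshold values below are illustrative and should be tuned by experiment:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    max_depth=3,                 # limit the maximum depth of the tree
    max_leaf_nodes=8,            # limit the number of leaf nodes
    min_samples_leaf=5,          # limit the minimum samples per leaf
    min_impurity_decrease=0.01,  # require a minimum impurity (gain) improvement
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```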
Post-pruning follows the formula C_α(T) = C(T) + α·|T_leaf|, where C(T) is the current node's Gini coefficient times its sample count, and |T_leaf| is the number of leaves under the current node. Taking the green-circled node in the figure as an example: first compute C_α(T) as if the node were not split, then compute C_α(T) for each of its two leaf nodes, sum them, and compare with the unsplit value; the larger value is worse. The meaning of α is how much weight is placed on purity versus the number of leaves: a larger α emphasizes the leaf count, i.e. cares more about overfitting, while a smaller α emphasizes the accuracy of the result and worries less about overfitting.
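A post-pruning sketch, assuming scikit-learn: its cost-complexity pruning implements the same C_α(T) = C(T) + α·|T_leaf| trade-off, exposed through the ccp_alpha parameter:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Effective alphas at which subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

# Larger alpha -> fewer leaves (more weight on tree size than on purity).
for alpha in (0.0, 0.01, 0.05):
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(alpha, clf.get_n_leaves())
```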
How the decision tree solves classification and regression
- Classification is simple: follow the branches from the root, and the leaf you end up in gives the class.
- Regression: the measure is variance; we want the variance within each branch to be as small as possible, and the prediction is the mean of the sample labels in the leaf node (e.g., predicting age).
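A regression sketch, assuming scikit-learn: splits minimize within-branch variance (squared error), and each leaf predicts the mean label; the data below is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)   # noisy linear target

reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3).fit(X, y)
print(reg.predict([[2.0], [8.0]]))   # each prediction is a leaf mean
```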