ML4 Self-Study Notes
2022-07-29 06:16:00 【19-year-old flower girl】
Decision tree algorithm
- Place the conditions with the best discriminative power at the nodes nearest the root. A tree model can do both classification and regression.
- Tree structure: root node, internal nodes, and leaf nodes.
- Decision tree training and testing
- How to split on features: which feature is selected at each node, and where to cut. This is decided by a measure.
- The measure: entropy. Look at the entropy that results from a candidate split; a good split makes the entropy of each branch drop substantially.
- Information gain: used to select features. A feature whose split greatly reduces entropy has a large information gain, and is therefore a better choice (see the sketch below).
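A minimal sketch of these two measures in Python (the function names and the plain-list data format are my own choices, not from the notes):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy before the split, minus the size-weighted entropy of the
    groups produced by splitting on each distinct feature value."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted
```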
Decision tree construction example
- The data consists of features and a target label.
- First select the root node, using an information-gain calculation:
- Calculate the entropy of the raw data (the whole dataset).
- For each of the 4 features, calculate the entropy after splitting on it.
Weight each branch's entropy by the proportion of samples that fall into it.
Compute the entropy for the remaining features in the same way, and select the feature with the highest information gain as the root node; then repeat the procedure to select each subsequent node. A worked sketch follows.
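As an illustration, here is a hypothetical toy dataset (not the one from the original notes, which is not reproduced here) run through the helpers above; the feature with the highest information gain becomes the root:

```python
# Hypothetical toy data: two candidate features and a binary label.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast", "sunny"]
windy   = [False, True, False, False, False, True, True, False]
play    = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]

features = {"outlook": outlook, "windy": windy}
gains = {name: information_gain(vals, play) for name, vals in features.items()}
print(gains)                        # outlook has the larger gain here
print("root:", max(gains, key=gains.get))
```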
ID3, C4.5, CART, and the Gini coefficient
In ID3, if there is an ID column (1, 2, 3, …, n) and each record is split into its own group by ID, the outcome at every branch is fully determined and the entropy is zero, which looks like the best possible split. But splitting on ID is meaningless: an ID number cannot determine the branch for new data.
C4.5 therefore proposes the information gain ratio, which also considers the feature's own entropy. The ID column has so many distinct values that its own (intrinsic) entropy is very large; dividing the information gain by this intrinsic entropy makes the gain ratio very small. This solves ID3's problem, and C4.5 is commonly used today.
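A sketch of the gain ratio, reusing the helpers above (the guard against a zero intrinsic value is my own addition):

```python
def gain_ratio(feature_values, labels):
    """C4.5: information gain divided by the feature's own (intrinsic) entropy.
    A many-valued feature such as an ID column has a huge intrinsic entropy,
    which drives its gain ratio toward zero."""
    intrinsic = entropy(feature_values)  # the feature's own entropy
    if intrinsic == 0:
        return 0.0  # feature has a single value; it cannot split anything
    return information_gain(feature_values, labels) / intrinsic
```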
CART uses the Gini coefficient as its measure: Gini(t) = 1 − Σ_k p_k², where p_k is the proportion of class k at the node.
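A direct translation of that formula into code (a sketch, same data format as above):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```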
Continuous attribute values
For a binary split, every point between two adjacent (sorted) data values is a candidate threshold. Compute the entropy for each candidate and choose the split with the lowest entropy; this process is data discretization.
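A sketch of that search, reusing `entropy` from above (the function name and return format are my own):

```python
def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent sorted values and
    keep the binary split whose weighted entropy is lowest."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_t, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if e < best_e:
            best_t, best_e = t, e
    return best_t, best_e
```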
Decision tree pruning strategy
Reason: to prevent overfitting (otherwise each leaf could hold a single sample and reach 100% training accuracy).
Pruning strategies: pre-pruning and post-pruning.
(In a plotted tree node such as `X[2] <= 2.45`: the split tests whether feature X[2] is at most 2.45; the Gini coefficient is computed from the formula; `samples` is the total number of samples at the node; `value` gives the count of each class.)
- Pre-pruning: prune while building the tree (limit the maximum depth, the number of leaf nodes, the minimum samples per leaf, and the minimum information gain). These thresholds, such as how deep the tree may grow and how many leaves it may have, are determined by experiment; see the scikit-learn sketch after this list.
- Post-pruning: prune after the decision tree has been built.
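The notes do not name a library, but in scikit-learn these pre-pruning limits map directly onto constructor parameters; a minimal sketch (the specific values are placeholders to be tuned by experiment, as the notes say):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    max_depth=3,                 # limit the maximum depth of the tree
    max_leaf_nodes=8,            # limit the number of leaf nodes
    min_samples_leaf=5,          # limit the number of samples per leaf
    min_impurity_decrease=0.01,  # require a minimum impurity drop per split
).fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```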
According to the formula Cα(T) = C(T) + α·|Tleaf|: C(T) is the current node's Gini coefficient multiplied by its sample count, and |Tleaf| is the number of leaves under the current node. Taking the green-circled node in the figure from the original notes as an example: first compute Cα(T) as if this node were not split further, then compute Cα(T) for each of the two leaf nodes below it, add them, and compare with the unsplit value; the larger value is the worse choice. The significance of α is how much emphasis is placed on impurity versus the number of leaf nodes: a larger α puts more weight on the leaf count, that is, on guarding against overfitting, while a smaller α puts more weight on the accuracy of the result and less on overfitting.
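scikit-learn implements this cost-complexity idea as post-pruning via `ccp_alpha`; a minimal sketch (the alpha value 0.02 is an arbitrary placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Enumerate the effective alphas at which the fully grown tree gets pruned.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

# Larger alpha -> fewer leaves (more pruning); smaller alpha -> a larger tree
# that fits the training data more closely.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(pruned.get_n_leaves())
```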
How a decision tree produces predictions
- Classification is simple: follow the branches from the root, and the leaf you reach gives the class.
- Regression. The measure: we want the variance within each branch to be as small as possible. The final prediction is the mean of the sample labels in the leaf node (e.g., mean age). A sketch follows.
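A minimal sketch of that variance measure for a single continuous feature, with hypothetical data (NumPy only):

```python
import numpy as np

def weighted_variance(x, y, threshold):
    """Score a binary split by the size-weighted variance of its branches."""
    left, right = y[x <= threshold], y[x > threshold]
    return (len(left) * left.var() + len(right) * right.var()) / len(y)

x = np.array([18, 22, 25, 31, 40, 52])        # hypothetical feature values
y = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 3.1])  # hypothetical labels

candidates = (x[:-1] + x[1:]) / 2             # midpoints between sorted values
best = min(candidates, key=lambda t: weighted_variance(x, y, t))
print("best split:", best)
print("leaf predictions (means):", y[x <= best].mean(), y[x > best].mean())
```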