ML4 Self-Study Notes
2022-07-29 06:16:00 【19-year-old flower girl】
Decision tree algorithm
- Place the conditions with the strongest discriminating power at the nodes nearest the root. A tree model can do both classification and regression.

- Composition of a tree: root node, internal (non-leaf) nodes, and leaf nodes.

- Decision tree training (building the tree from data) and testing (running a sample down the built tree).

- How to split on features: which feature is selected at each node, and where to cut. This is decided by a measure.

- The measure: entropy. Look at the entropy after a candidate split; we want the entropy of each branch to drop as much as possible after the split.

- Information gain: used to select features. A feature whose split lowers the entropy the most has the largest information gain and is the better choice, as sketched below.
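A minimal Python sketch of the entropy computation described above (the `entropy` helper is my own naming, assuming numpy is available):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: H = -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A 50/50 mix has maximal entropy (1 bit); a pure node has entropy 0.
print(entropy(np.array(["yes", "yes", "no", "no"])))  # 1.0
print(entropy(np.array(["yes", "yes", "yes"])))       # -0.0, i.e. zero
```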

Decision tree construction example
- A dataset consists of samples with their features and a target label.

- First select the root node, using the information-gain calculation.

- Compute the entropy of the raw data (before any split).

- For each of the 4 features, compute the entropy after splitting on it.

The entropy after a split is the weighted sum of the branch entropies, weighted by the fraction of samples that fall into each branch.
The remaining features' entropies are computed the same way; the feature with the highest information gain becomes the root node, and each subsequent node is then selected in the same way, as in the sketch below.
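A sketch of this feature-selection step, reusing the entropy helper from above (the toy data and function names are illustrative, not from the original notes):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy before the split minus the weighted entropy after
    splitting on each distinct value of the feature."""
    h_after = 0.0
    for v in np.unique(feature):
        mask = feature == v
        h_after += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_after

# Pick the column with the largest gain as the root node.
X = np.array([["sunny", "hot"], ["sunny", "mild"],
              ["rain", "mild"], ["rain", "hot"]])
y = np.array(["no", "no", "yes", "yes"])
print([information_gain(X[:, j], y) for j in range(X.shape[1])])
# -> [1.0, 0.0]: column 0 separates the labels perfectly.
```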
ID3, C4.5, CART, and the Gini coefficient

In ID3, if the data has an id column (1, 2, 3, …, n) and we split on id, every sample lands in its own group: each branch is fully determined, entropy is zero, and by information gain this looks like the best possible split. But splitting on id is meaningless; an id number cannot decide the branch for new data.
C4.5 therefore proposes the information gain ratio, which takes the feature's own entropy into account. An id column has so many distinct values that its own (intrinsic) entropy is very large; dividing the information gain by this intrinsic entropy makes the gain ratio very small, which fixes ID3's problem. C4.5 is commonly used today.
CART uses the Gini coefficient as its measure; see the formula and sketch below.
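The formula in question is standard: for class proportions $p_k$ in a node, $\text{Gini} = 1 - \sum_k p_k^2$. A minimal sketch:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2); 0 for a pure node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array([0, 0, 1, 1])))  # 0.5, the two-class maximum
print(gini(np.array([0, 0, 0])))     # 0.0, pure
```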
Continuous attribute values
For a binary split, the cut point can lie between any two adjacent sorted values. Compute the entropy for each candidate cut and choose the split with the lowest entropy; this process is a form of data discretization. A sketch follows.
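A sketch of this search over candidate cut points (midpoints of adjacent sorted values; the helper names are mine):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    """Try the midpoint between each adjacent pair of sorted values
    as a binary cut and keep the one with the lowest weighted entropy."""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_t, best_h = None, np.inf
    for t in np.unique((v[:-1] + v[1:]) / 2.0):
        left, right = y[v <= t], y[v > t]
        if len(left) == 0 or len(right) == 0:
            continue
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h

x = np.array([1.4, 1.3, 4.7, 4.5, 5.1])
y = np.array([0, 0, 1, 1, 1])
print(best_threshold(x, y))  # (2.95, 0.0): a perfect cut between the classes
```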
Decision tree pruning strategy
Reason: to prevent overfitting (in the extreme, every leaf holds a single sample and training accuracy reaches 100%).
Pruning strategies: pre-pruning and post-pruning.
Reading a plotted tree node such as `X[2] <= 2.45`: the node splits on feature 2 at threshold 2.45; its Gini coefficient is computed by the formula above; `samples` is the total number of samples reaching the node, and `value` gives the count of each class among them.
- Pre-pruning: prune while building (limit the maximum depth of the tree, the number of leaf nodes, the minimum number of samples per leaf, and the minimum information gain). These thresholds are found experimentally: how deep the tree should be, how many leaves it should have. See the sketch after this list.
- Post-pruning: prune after the decision tree has been fully built.
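As an illustration of the pre-pruning knobs (a sketch assuming scikit-learn and its bundled iris dataset; the specific hyperparameter values are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Pre-pruning: constrain the tree while it is being built.
clf = DecisionTreeClassifier(
    max_depth=3,          # limit the maximum depth of the tree
    max_leaf_nodes=5,     # limit the number of leaf nodes
    min_samples_leaf=10,  # limit the number of samples per leaf
    random_state=0,
)
clf.fit(X, y)
print(export_text(clf))  # each node prints as "feature_j <= threshold"
```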
In the formula, C(T) is the current node's Gini coefficient multiplied by its sample count, and |T_leaf| is the number of leaves under the current node. Taking the green-circled node in the figure as an example: first compute C_a(T) as if this node did not branch further, then compute C_a(T) for the subtree by summing over its two leaf nodes, and compare with the unbranched value; the larger, the worse. The meaning of α is whether more weight goes to impurity or to the number of leaves: the larger α is, the more the leaf count is penalized, i.e. the more we guard against overfitting; the smaller α is, the more weight goes to accuracy on the training data and the less we worry about overfitting.
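Written out (my notation, but this matches the standard cost-complexity criterion the paragraph describes):

$$C_\alpha(T) = C(T) + \alpha\,|T_{\text{leaf}}|, \qquad C(T) = \sum_{t\,\in\,\text{leaves}(T)} \text{Gini}(t)\cdot n_t$$

For each internal node, compare $C_\alpha$ of the node collapsed into a single leaf with $C_\alpha$ of the subtree it roots; if collapsing is no worse, prune the subtree.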
How does a decision tree solve classification and regression problems?
- Classification is simple: follow the branches from the root down, and the leaf you land in gives the class.
- Regression: the measure is variance; we want the variance within each branch to be as small as possible, and the prediction is the mean of the sample labels in the leaf (e.g. age). A sketch follows this list.
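A sketch of the variance criterion for regression trees (toy data; the helper names are mine):

```python
import numpy as np

def variance_reduction(values, targets, threshold):
    """Score a binary cut for a regression tree: the drop in
    sample-weighted variance of the target after the split."""
    left = targets[values <= threshold]
    right = targets[values > threshold]
    after = (len(left) * left.var() + len(right) * right.var()) / len(targets)
    return targets.var() - after

# Predicting age: the leaf's prediction is the mean target value there.
x = np.array([1.0, 2.0, 8.0, 9.0])
age = np.array([10.0, 12.0, 40.0, 42.0])
print(variance_reduction(x, age, threshold=5.0))  # 225.0: a large reduction
print(age[x <= 5.0].mean())                       # leaf prediction: 11.0
```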