Common feature engineering operations
2022-07-29 05:38:00 【Harmful Poems】
Common feature engineering operations include:
Outlier handling:
1. Identify outliers with a box plot (or the 3-sigma rule) and delete them;
2. Box-Cox transformation (for skewed distributions);
3. Long-tail truncation;
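The three outlier steps above can be sketched with the Python standard library alone. This is a minimal illustration rather than a reference implementation: the function names, the fence multiplier `k=1.5`, and the sample data are assumptions made for the example, and the Box-Cox parameter `lam` is passed in by hand here rather than estimated from the data.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Box-plot rule: fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def sigma_bounds(values, k=3.0):
    """3-sigma rule: mean plus or minus k population standard deviations."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mu - k * sd, mu + k * sd

def drop_outliers(values, bounds):
    """Delete every value that falls outside the (low, high) bounds."""
    lo, hi = bounds
    return [v for v in values if lo <= v <= hi]

def clip(values, bounds):
    """Long-tail truncation: cap values at the bounds instead of deleting them."""
    lo, hi = bounds
    return [min(max(v, lo), hi) for v in values]

def box_cox(values, lam):
    """Box-Cox transform for strictly positive data with a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical sample with one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
cleaned = drop_outliers(data, iqr_bounds(data))
capped = clip(data, iqr_bounds(data))
```

On a sample this small, the single extreme value inflates the standard deviation so much that `sigma_bounds` does not flag it, while the box-plot fences do. That masking effect is one reason to compare both rules before deleting anything.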
Feature normalization / standardization:
1. Standardization (rescale to zero mean and unit variance);
2. Normalization (rescale to the [0, 1] interval);
3. For power-law distributions, apply a transformation such as a logarithm to compress the long tail;
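A small sketch of the three rescaling options, using only the standard library. The function names are illustrative, and `log1p` is used for the power-law case as one common choice, not necessarily the exact transformation the original had in mind.

```python
import math
import statistics

def standardize(values):
    """Z-score: subtract the mean and divide by the population std deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sd for v in values]

def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """log(1 + x): compresses a long right tail; requires x > -1."""
    return [math.log1p(v) for v in values]

z = standardize([1.0, 2.0, 3.0])
scaled = min_max_scale([2.0, 4.0, 6.0])
```

In a real pipeline the mean, standard deviation, minimum, and maximum should be computed on the training split only and then reused on validation and test data, so that no information leaks across splits.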
Data binning:
Equal-frequency binning;
Equal-width binning;
Best-KS binning (similar to binary splitting with the Gini index);
Chi-square binning;
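The first two binning schemes can be sketched as follows. The function names and sample values are assumptions for illustration. Best-KS and chi-square binning need the target label and are not covered by this sketch.

```python
def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of rows."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical skewed sample.
values = [1, 2, 3, 4, 100, 200]
by_width = equal_width_bins(values, 3)
by_frequency = equal_frequency_bins(values, 3)
```

On skewed data, equal-width bins put almost every row into the first bin, whereas equal-frequency bins stay balanced. Note that this rank-based version can place tied values in different bins; a quantile-edge version avoids that at the cost of uneven bin sizes.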
Missing value handling:
Leave as-is (for tree models such as XGBoost that handle missing values natively);
Delete (when too much of the data is missing);
Imputation, including mean / median / mode / model-based prediction / multiple imputation / compressed-sensing completion / matrix completion;
Binning, with missing values assigned to their own bin;
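A minimal sketch of the simplest imputation strategies (mean, median, mode) together with a missing-value indicator. Missing entries are represented as `None` here, and the function names are illustrative assumptions. Model-based, multiple, and matrix-completion imputation are beyond this sketch.

```python
import statistics

def impute(values, strategy="mean"):
    """Fill None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def missing_indicator(values):
    """1 where the value was missing, 0 otherwise; usable as an extra feature."""
    return [1 if v is None else 0 for v in values]

column = [1, None, 3]
filled = impute(column, "mean")
flags = missing_indicator(column)
```

Keeping the indicator column alongside the imputed one preserves the information that a value was missing, which plays a similar role to giving missing values their own bin.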
Feature construction:
Statistical features, including counts, sums, proportions, and standard deviations;
Time features, including relative and absolute time, holidays, weekends, etc.;
Geographic information, including binning, distribution encoding, and other methods;
Nonlinear transformations, including log / square / square root, etc.;
Feature combination and feature crossing;
Beyond these, the choice is a matter of judgment; different practitioners will see different things.
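Two of the construction ideas above, time features and per-group statistics, can be sketched like this. The field names (`user`, `amount`), the reference date, and the function names are hypothetical and chosen only for the example.

```python
import statistics
from collections import defaultdict
from datetime import datetime

def time_features(ts, reference):
    """Derive absolute (hour, weekday) and relative (days since) time features."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),              # Monday is 0, Sunday is 6
        "is_weekend": int(ts.weekday() >= 5),
        "days_since_reference": (ts - reference).days,
    }

def group_stats(rows, key, value):
    """Per-group count, sum, share of the grand total, and std deviation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    total = sum(sum(vals) for vals in groups.values())
    return {
        k: {
            "count": len(vals),
            "sum": sum(vals),
            "share": sum(vals) / total if total else 0.0,
            "std": statistics.pstdev(vals),
        }
        for k, vals in groups.items()
    }

feats = time_features(datetime(2022, 7, 30, 14, 0), datetime(2022, 7, 1))
rows = [
    {"user": "a", "amount": 10},
    {"user": "a", "amount": 30},
    {"user": "b", "amount": 60},
]
stats = group_stats(rows, "user", "amount")
```

Holiday flags are left out because they depend on a calendar for a specific country; a lookup against such a calendar would slot into `time_features` in the same way as `is_weekend`.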
Feature selection
Filter: select features from the data first, then train the learner. Common methods are Relief / variance threshold / correlation coefficient / chi-square test / mutual information;
Wrapper: use the performance of the target learner directly as the evaluation criterion for a feature subset. A common method is LVW (Las Vegas Wrapper);
Embedded: a combination of filter and wrapper, in which feature selection happens automatically during learner training. A common example is lasso regression;
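As an illustration of the filter approach only, the sketch below combines a variance threshold with a Pearson correlation threshold. The thresholds, feature names, and function names are assumptions for the example; wrapper and embedded methods need a trained learner and are not shown.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def filter_select(features, target, min_variance=0.0, min_abs_corr=0.3):
    """Keep features whose variance and |correlation with target| pass thresholds."""
    selected = []
    for name, column in features.items():
        if statistics.pvariance(column) <= min_variance:
            continue  # (near-)constant feature carries no signal
        if abs(pearson(column, target)) < min_abs_corr:
            continue  # weak linear relationship with the target
        selected.append(name)
    return selected

# Hypothetical toy features and target.
features = {
    "constant": [1, 1, 1, 1],
    "signal": [1, 2, 3, 4],
    "noise": [1, -1, -1, 1],
}
target = [2, 4, 6, 8]
kept = filter_select(features, target)
```

Correlation only captures linear relationships, so a feature dropped here may still be useful to a nonlinear model; mutual information is the usual filter-style alternative in that case.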
Dimensionality reduction
PCA / LDA / ICA;
Feature selection is itself a form of dimensionality reduction.
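To make the PCA idea concrete, here is a dependency-free sketch that finds only the first principal component by power iteration on the sample covariance matrix. This is an illustrative simplification: the function name and iteration count are assumptions, and a library implementation would normally be used in practice to obtain all components.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # start vector lies in the null space of cov
        v = [x / norm for x in w]
    # Project the centered rows onto the direction to get 1-D scores.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data lying exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
```

Power iteration converges to the leading eigenvector only if the start vector is not orthogonal to it, and the sign of the returned direction is arbitrary in general.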