Common feature engineering operations
2022-07-29 05:38:00 [Harmful Poems]
Common feature engineering operations include:
Outlier handling:
1. Use a box plot (or the 3-sigma rule) to identify and remove outliers;
2. Box-Cox transformation (to handle skewed distributions);
3. Long-tail truncation (capping extreme values);
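The outlier steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not the author's code: the function names are hypothetical, and the 1.5 * IQR box-plot fence, the 3-sigma cutoff, the fixed Box-Cox lambda, and the percentile cap are assumed defaults. In practice the Box-Cox lambda is usually estimated from the data, for example with `scipy.stats.boxcox`.

```python
import math
import statistics


def drop_outliers_iqr(values, k=1.5):
    """Box-plot rule: keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def drop_outliers_3sigma(values, n_sigma=3.0):
    """3-sigma rule: keep values within n_sigma population std devs of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


def box_cox(values, lam):
    """Box-Cox transform with a fixed (assumed) lambda; inputs must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def clip_long_tail(values, percentile=99):
    """Long-tail truncation: cap every value at the given empirical percentile."""
    cap = statistics.quantiles(values, n=100, method="inclusive")[percentile - 1]
    return [min(v, cap) for v in values]
```

One caveat on the 3-sigma rule: with the population standard deviation, no point can lie more than sqrt(n - 1) standard deviations from the mean, so in a sample of 10 or fewer values nothing is ever strictly beyond 3 sigma. The box-plot rule does not have this limitation, which is why it catches the single extreme value in a small sample.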
Feature normalization / standardization:
1. Standardization (rescale to zero mean and unit variance; this does not make the distribution normal);
2. Normalization (rescale into the [0, 1] interval);
3. For power-law distributions, apply a transformation that compresses the long tail;
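A minimal sketch of the three scaling operations, assuming plain lists of numbers. The original does not spell out its power-law formula, so `log(1 + x)` here is an assumed stand-in, and the function names are hypothetical. scikit-learn's `StandardScaler` and `MinMaxScaler` implement the first two for full feature matrices.

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Linear rescale into [0, 1]: the minimum maps to 0, the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log1p_transform(values):
    """Compress a heavy right tail with log(1 + x); assumes every x > -1."""
    return [math.log1p(v) for v in values]
```

Min-max scaling is sensitive to outliers because a single extreme value sets `hi`, which is one reason outlier handling usually comes first.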
Data binning (bucketing):
Equal-frequency binning;
Equal-width binning;
Best-KS binning (similar to using the Gini index to make binary splits);
Chi-square binning;
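The two unsupervised schemes above can be sketched as follows; Best-KS and chi-square binning need the target labels to choose cut points and are not shown. The function names are hypothetical; pandas offers `pd.cut` (equal width) and `pd.qcut` (equal frequency) for the same purpose.

```python
import bisect
import statistics


def equal_width_edges(values, n_bins):
    """Inner cut points that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_freq_edges(values, n_bins):
    """Inner cut points at quantiles, so each bin holds about the same number of rows."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bins(values, edges):
    """Bin index for each value: the number of cut points less than or equal to it."""
    return [bisect.bisect_right(edges, v) for v in values]


skewed = [1, 2, 3, 4, 5, 6, 7, 1000]
print(assign_bins(skewed, equal_width_edges(skewed, 2)))  # [0, 0, 0, 0, 0, 0, 0, 1]
print(assign_bins(skewed, equal_freq_edges(skewed, 2)))   # [0, 0, 0, 0, 1, 1, 1, 1]
```

On skewed data, equal-width bins put almost everything into one bin, while equal-frequency bins keep the bins balanced.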
Missing value handling:
Leave as-is (for tree models such as XGBoost, which handle missing values natively);
Delete (when too much data is missing);
Imputation, including mean / median / mode / model-based prediction / multiple imputation / compressed-sensing completion / matrix completion;
Binning, with missing values placed in their own bin;
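A minimal sketch of three of these options, assuming missing entries are represented as `None` and that a column counts as "too sparse" above an arbitrary 50% missing ratio. All names are hypothetical; scikit-learn's `SimpleImputer` (strategies `"mean"`, `"median"`, `"most_frequent"`) covers the simple imputation cases.

```python
import bisect
import statistics


def drop_sparse_columns(columns, max_missing_ratio=0.5):
    """Delete columns whose fraction of missing (None) entries exceeds the threshold."""
    return {
        name: col
        for name, col in columns.items()
        if sum(v is None for v in col) / len(col) <= max_missing_ratio
    }


def impute(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fillers = {"mean": statistics.fmean, "median": statistics.median, "mode": statistics.mode}
    fill = fillers[strategy](observed)
    return [fill if v is None else v for v in values]


def bin_with_missing(values, edges, missing_bin=-1):
    """Bin observed values by cut points; every missing value gets its own bin id."""
    return [missing_bin if v is None else bisect.bisect_right(edges, v) for v in values]
```

Giving missing values a dedicated bin keeps the fact that a value was missing visible to the model, which can itself be informative.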
Feature construction:
Construct statistical features such as counts, sums, proportions, and standard deviations;
Time features, including relative and absolute time, holidays, weekends, etc.;
Geographic information, using binning, distribution encoding, and similar methods;
Nonlinear transformations, such as log / square / square root;
Feature combinations and feature crosses;
Beyond these, feature construction is largely a matter of judgment, and practitioners will approach it differently.
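A few of these constructions, sketched with the standard library. The names are hypothetical, the holiday calendar is assumed to be supplied by the caller, and "proportion" is interpreted here as a row's share of its group total; with pandas, `groupby(...).transform(...)` is the usual way to broadcast group statistics back to rows.

```python
import statistics
from collections import defaultdict
from datetime import date


def group_stats(keys, values):
    """Attach per-group count, sum, and std to every row, plus the row's share of its group sum."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    stats = {
        k: {"count": len(vs), "sum": sum(vs), "std": statistics.pstdev(vs)}
        for k, vs in groups.items()
    }
    rows = []
    for k, v in zip(keys, values):
        s = stats[k]["sum"]
        rows.append({**stats[k], "share": v / s if s else 0.0})
    return rows


def time_features(d, holidays=frozenset()):
    """Absolute calendar features for a date; `holidays` is a caller-supplied set of dates."""
    return {
        "year": d.year,
        "month": d.month,
        "weekday": d.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "is_holiday": d in holidays,
    }


def days_between(start, end):
    """Relative time feature: whole days elapsed from start to end."""
    return (end - start).days


def cross(a, b):
    """Feature cross: combine two categorical values into a single new category."""
    return f"{a}_x_{b}"
```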
Feature selection:
Filter methods: select features from the data first, then train the learner. Common methods include Relief / variance threshold / correlation coefficient / chi-square test / mutual information;
Wrapper methods: use the performance of the learner that will actually be used as the criterion for evaluating a feature subset. A common method is LVW (Las Vegas Wrapper);
Embedded methods: combine aspects of the filter and wrapper approaches, performing feature selection automatically during learner training. A common example is lasso regression;
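Two of the filter criteria listed above, variance thresholding and correlation ranking, can be sketched like this. It is a minimal illustration assuming numeric columns stored in a dict; the function names are hypothetical. scikit-learn provides `VarianceThreshold` and `SelectKBest` for filters, and `Lasso` for the embedded approach.

```python
import math
import statistics


def variance_filter(columns, threshold=0.0):
    """Keep feature names whose population variance is above the threshold."""
    return [name for name, col in columns.items() if statistics.pvariance(col) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_filter(columns, target, k):
    """Rank features by |Pearson r| with the target and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Pearson correlation only captures linear relationships; mutual information is the usual filter when the dependence may be nonlinear.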
Dimensionality reduction:
PCA / LDA / ICA;
Feature selection is itself a form of dimensionality reduction.
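To illustrate what PCA computes, here is a minimal sketch that finds only the first principal component by power iteration on the sample covariance matrix and projects the data onto it. It assumes the top eigenvalue is distinct and that the all-ones start vector is not orthogonal to the answer; the names are hypothetical. For real work, `sklearn.decomposition.PCA` is the standard tool, and LDA/ICA are not shown.

```python
import math


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows, n_iter=100):
    """Top eigenvector of the sample covariance matrix, found by power iteration."""
    centered = center(rows)
    n, d = len(centered), len(centered[0])
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # assumed start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # all rows identical: no variance, return a zero vector
            return [0.0] * d
        v = [x / norm for x in w]
    return v


def project(rows, component):
    """Reduce to one dimension: each centered row's coordinate along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in center(rows)]
```

For points lying on the line y = 2x, the component is the unit vector along (1, 2), and the one-dimensional projection loses no information.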