Common feature engineering operations
2022-07-29 05:38:00 【Harmful Poems】
Common feature engineering operations include:
Outlier handling:
1. Use box plots (or the 3-sigma rule) to identify and remove outliers;
2. Box-Cox transformation (for skewed distributions);
3. Long-tail truncation;
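The outlier steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the fence multiplier `k=1.5`, and the fixed Box-Cox `lam` are assumptions made for the example (in practice lambda is usually estimated from the data).

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Box-plot rule: return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers_iqr(values, k=1.5):
    """Keep only the values that fall inside the box-plot fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def drop_outliers_3sigma(values):
    """Keep only the values within three standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= 3 * std]


def clip_tail(values, upper):
    """Long-tail truncation: cap values at `upper` instead of deleting them."""
    return [min(v, upper) for v in values]


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; values must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

The 3-sigma rule needs a reasonably large sample. With ten or fewer points, a single extreme value inflates the standard deviation enough that it can never sit more than three standard deviations from the mean, so the box-plot rule is the safer choice for small data.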
Feature normalization / standardization:
1. Standardization (rescale to zero mean and unit variance, matching the scale of the standard normal distribution);
2. Normalization (rescale to the interval [0, 1]);
3. For power-law distributions, apply a transformation (a logarithmic transform is a common choice);
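The three rescaling options above might look like this in plain Python. The function names are illustrative, and `log1p` is only one common choice for heavy-tailed data, used here as an assumption because the source does not give a specific formula.

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation.

    Assumes the values are not all identical (the deviation would be zero).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all identical (the range would be zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Compress a heavy right tail with log(1 + x); values must be > -1."""
    return [math.log1p(v) for v in values]
```

Standardization changes the location and scale of a feature but not the shape of its distribution, which is why heavily skewed features usually get a log-type transform first.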
Data binning:
Equal-frequency binning;
Equal-width binning;
Best-KS binning (similar to binary splitting on the Gini index);
Chi-square binning;
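As a rough sketch, the two unsupervised schemes in the list above can be written as follows. The function names are hypothetical. Best-KS and chi-square binning also need the target labels and are not shown here.

```python
def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width.

    Assumes the values are not all identical (the width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so that each bin holds about the same number of rows.

    Tied values may end up in different bins with this simple rank-based rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins
```

On skewed data such as `[1, 2, 3, 4, 100, 200]` with two bins, equal-width binning puts five of the six values in the first bin, while equal-frequency binning puts three in each.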
Missing value handling:
Leave as-is (for tree models such as XGBoost that handle missing values natively);
Delete (when too much data is missing);
Imputation, including mean / median / mode / model-based prediction / multiple imputation / compressed-sensing completion / matrix completion;
Binning, with missing values assigned to a bin of their own;
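The simplest imputation strategies from the list above can be sketched as below. This assumes missing entries are represented as `None`. The function names and the `strategy` parameter are illustrative, and the model-based and matrix-completion approaches are out of scope for the sketch.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def missing_indicator(values):
    """Flag missing entries so a model can still see where values were absent."""
    return [1 if v is None else 0 for v in values]
```

Keeping the indicator column alongside the imputed feature plays a similar role to giving missing values a bin of their own.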
Feature construction:
Statistical features, including count, sum, proportion, and standard deviation;
Time features, including relative and absolute time, holidays, weekends, and so on;
Geographic information, including binning, distribution encoding, and similar methods;
Nonlinear transformations, including log / square / square root;
Feature combination and feature crossing;
Beyond these, the choice is a matter of judgment and varies between practitioners.
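Two of the construction ideas above, time features and per-group statistics, could be sketched like this. The field names, the `reference` timestamp, and the `(key, value)` record layout are assumptions made for the example.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def time_features(timestamp, reference):
    """Derive absolute (hour, weekday) and relative (days since reference) features."""
    return {
        "hour": timestamp.hour,
        "weekday": timestamp.weekday(),          # Monday is 0, Sunday is 6
        "is_weekend": timestamp.weekday() >= 5,
        "days_since_reference": (timestamp - reference).days,
    }


def group_stats(records):
    """Compute count, sum, share of the grand total, and std for each key.

    `records` is an iterable of (key, value) pairs with a nonzero grand total.
    """
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    total = sum(sum(vs) for vs in groups.values())
    return {
        key: {
            "count": len(vs),
            "sum": sum(vs),
            "share": sum(vs) / total,
            "std": statistics.pstdev(vs),
        }
        for key, vs in groups.items()
    }
```

Each resulting statistic can be joined back onto the rows that share the key, giving every row features that describe its group.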
Feature selection:
Filter: select features first, then train the learner. Common methods are Relief / variance threshold / correlation coefficient / chi-square test / mutual information;
Wrapper: use the performance of the final learner directly as the evaluation criterion for a feature subset. A common method is LVW (Las Vegas Wrapper);
Embedded: combines the filter and wrapper approaches, with feature selection carried out automatically during learner training. A common example is Lasso regression;
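To make the filter approach concrete, here is a small sketch that combines a variance threshold with the correlation coefficient method. The function names, the dict-of-columns layout, and the thresholds are assumptions for illustration only.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def filter_select(features, target, min_variance=0.0, min_abs_corr=0.5):
    """Keep features that vary and that correlate with the target.

    `features` maps a feature name to its column of values.
    """
    selected = []
    for name, column in features.items():
        # Variance filter: a (near-)constant column carries no information.
        if statistics.pvariance(column) <= min_variance:
            continue
        # Correlation filter: keep columns with a strong linear link to the target.
        if abs(pearson(column, target)) >= min_abs_corr:
            selected.append(name)
    return selected
```

Because the score of each feature is computed without training any learner, this is a filter method. A wrapper would instead retrain the learner on each candidate subset and compare its performance.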
Dimensionality reduction:
PCA / LDA / ICA;
Feature selection is itself a form of dimensionality reduction.
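Of the three methods named above, PCA is the easiest to sketch. The version below assumes NumPy is available and computes the components from the singular value decomposition of the centered data. The function name and return layout are illustrative.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal directions.

    Returns the projected data, the principal directions (one per row),
    and the variance explained by each kept direction.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]
```

For two perfectly collinear columns, a single component captures all of the variance, so the second dimension can be dropped without losing information.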