Machine learning process and method
2022-07-03 02:12:00 【Jieyou tree hole network】
| Problem modeling |
|---|
| Feature Engineering |
| Model selection |
| Model ensembling (fusion) |
| Deploying the model online |
| Problem modeling |
|---|
| 1. Define the evaluation metrics |
| 2. Select the sample set |
| 3. Cross-validation |
- What are the evaluation metrics?
  - Precision (P), recall (R), PR curve (PRC), ROC and AUC, loss, mAP
- How to select a sample subset?
  - Stratified sampling
- How to cross-validate?
  - k-fold
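The three questions above can be sketched in a few lines. This is a minimal pure-Python illustration, not a reference implementation: the function names `precision_recall` and `stratified_k_fold` and the toy labels are assumptions made for the example, and the round-robin fold assignment is just one simple way to keep class ratios similar across folds.

```python
from collections import defaultdict

def precision_recall(y_true, y_pred, positive=1):
    """Precision (P) and recall (R) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) for k folds.

    Samples of each class are dealt round-robin into the folds,
    so every fold keeps roughly the overall class ratio.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Toy data (hypothetical): six negatives, three positives.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
splits = list(stratified_k_fold(labels, k=3))

# Toy predictions (hypothetical) to show the metric.
p, r = precision_recall([1, 0, 1, 1], [1, 1, 0, 1])
```

With these toy labels every test fold holds two negatives and one positive, which is the point of stratified sampling: the evaluation metric is computed on folds that resemble the full sample set.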
| Feature Engineering |
|---|
| 1. EDA |
| 2. Data cleaning (missing values, outliers), normalization, and discretization of continuous data (binning) |
| 3. Feature selection |
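Step 2 in the table mentions normalization and binning of continuous data. The sketch below assumes min-max scaling and equal-width bins; the source does not say which variants it has in mind, and the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] (min-max normalization)."""
    low, high = min(values), max(values)
    if high == low:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def equal_width_bins(values, n_bins):
    """Discretize continuous values into n_bins equal-width buckets.

    Returns a bucket index in 0..n_bins-1 for each value.
    """
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would land in bucket n_bins, so clamp it.
    return [min(int((v - low) / width), n_bins - 1) for v in values]

# Hypothetical continuous feature.
ages = [18, 25, 40, 62, 80]
scaled = min_max_normalize(ages)
buckets = equal_width_bins(ages, n_bins=4)  # [0, 0, 1, 2, 3]
```

Equal-width bins are easy to reason about but sensitive to outliers; equal-frequency (quantile) bins are a common alternative when the distribution is skewed.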
- Feature engineering starts with EDA (exploratory data analysis) of the data
  - Get an overview of the dataset: general information, ratio of missing values, feature types
  - Box plots, histograms, stem-and-leaf plots, correlation-matrix heat maps, PCA dimensionality reduction, and so on
- After EDA, once the data is understood, clean and filter it
  - Missing values
  - Outliers
- Distinguish the types of features:
  - Numeric features
    - Continuous features
      - Normalization
      - Discretization
        - Binning
    - Discrete features
  - Categorical features
    - Encoding
      - One-hot encoding
      - Count-rank encoding
        - Works for both linear and nonlinear models
        - Not sensitive to outliers
        - Feature values do not collide, because ranks do not collide
      - Natural-number encoding
      - Hierarchical encoding
        - Postal codes, ID numbers
  - Time features
    - Specific date
    - Hour, minute, second
    - Day of the week
    - Whether it is a weekend, month end, quarter end, business hours, a holiday, etc.
    - Time interval since the last …
  - Spatial features
    - GPS coordinates
    - Country ID, city ID, administrative-region ID, street ID, etc.
    - Spatial distance
  - Text features
    - Normalization
      - Unify letters and digits into the alphanumerics of a single language
    - Corpus construction
      - Document: the description of the item
    - Text cleaning
      - Remove spaces, punctuation, etc.
    - Word segmentation
      - Part-of-speech tagging
        - Verb
        - Noun
        - Adjective
      - Semantic restoration
        - So that the tokens can express their meaning
      - 3-gram model
        - Convert the text into a sequence in which every three consecutive words form one unit:
          - ABCDE => (ABC, BCD, CDE)
      - Skip-gram model
      - Bag-of-words model
        - First form a vector
        - Each component of the vector is the frequency (TF) with which a word appears in the document
      - TF-IDF
        - First build a vocabulary (dictionary) from all the documents
        - For each document, form a vector over the dictionary keys; each value is tf*idf
          - tf is a local parameter: the frequency of the word in document d
          - idf is a global parameter: log(total number of documents / number of documents containing the word)
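A few of the encodings listed above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a definitive implementation: all function names and sample data are hypothetical, ties in count-rank encoding are broken by first appearance (the source does not specify a tie-break), and tf is taken as the raw count of a word in a document.

```python
import math
from collections import Counter

def one_hot(values):
    """One-hot encode categories; columns follow sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def count_rank_encode(values):
    """Replace each category by the rank of its frequency (1 = most frequent).

    Assumption: ties are broken by first appearance, so every
    category still receives a distinct rank.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: -counts[c])  # stable sort
    rank = {c: r for r, c in enumerate(ordered, start=1)}
    return [rank[v] for v in values]

def ngrams(tokens, n=3):
    """Every run of n consecutive tokens, e.g. ABCDE -> ABC, BCD, CDE."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(documents):
    """TF-IDF vectors over the vocabulary built from all documents.

    tf  = count of the word in the document (local)
    idf = log(total documents / documents containing the word) (global)
    """
    vocabulary = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    doc_freq = {w: sum(1 for doc in documents if w in doc) for w in vocabulary}
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append([tf[w] * math.log(n_docs / doc_freq[w]) for w in vocabulary])
    return vocabulary, vectors

colors = one_hot(["red", "blue", "red"])
ranks = count_rank_encode(["a", "b", "a", "c", "a", "b"])  # [1, 2, 1, 3, 1, 2]
trigrams = ngrams(list("ABCDE"))
vocabulary, vectors = tf_idf([["cat", "sat"], ["cat", "ran"]])
```

Note how "cat" appears in every toy document, so its idf is log(2/2) = 0 and it contributes nothing to either vector; words that separate documents get the weight.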
- Feature selection: choose a suitable combination of features
  - How to measure the quality of a feature?
    - By the distance or similarity between the target label and the feature
    - Distance and similarity measures
      - L-p norm distance (length)
      - Cosine similarity (angle)
      - Pearson correlation coefficient (combined)
      - Jaccard similarity (size of the intersection / size of the union)
        - Jaccard distance = 1 - Jaccard similarity
      - Fisher score
      - Mutual information: KL(p(x,y) || p(x)p(y))
      - Hypothesis testing
      - CFS (correlation-based feature selection)
  - Three families of feature selection methods (according to whether feature selection interacts with the machine learning algorithm)
    - Filter methods
      - Full feature set => feature selection => machine learning algorithm => model performance
      - Feature selection and the machine learning algorithm do not interact; they are independent, which makes this approach simple and effective.
      - Univariate filtering
        - Considers relevance only: rank the features by relevance and filter out the least relevant ones
      - Multivariate filtering
        - Considers not only relevance but also consistency?
        - CFS (correlation-based feature selection), including the correlation of cross features
        - MBF
        - FCBF
    - Wrapper methods
      - Full feature set => | feature selection <=> machine learning algorithm | => model performance
      - For each candidate feature subset, take into account how well it suits the machine learning algorithm: use the algorithm and a validation set to pick the best subset.
    - Embedded methods
      - Full feature set => | feature selection <=> machine learning algorithm + model performance |
      - Feature selection is fused directly with the machine learning algorithm, and performance is evaluated at the same time
      - Cross-validation is required
      - Decision trees, random forests, gradient-boosted trees, SVM, Lasso
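Univariate filtering, the simplest of the three families, can be sketched as follows. The example scores each feature by the absolute Pearson correlation with the target and keeps the top-ranked ones; the choice of Pearson as the score, the function names, and the toy columns are assumptions for illustration. A Jaccard similarity helper is included because the outline defines it explicitly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def univariate_filter(features, target, keep):
    """Rank features by |Pearson r| with the target; keep the top ones.

    The learning algorithm is never consulted, which is what makes
    this a filter method rather than a wrapper or embedded method.
    """
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# Hypothetical toy data.
target = [1, 2, 3, 4, 5]
features = {
    "strong": [2, 4, 6, 8, 10],   # perfectly correlated with the target
    "weak": [1, 3, 2, 5, 4],      # partially correlated
    "constant": [7, 7, 7, 7, 7],  # carries no information
}
selected, scores = univariate_filter(features, target, keep=2)
similarity = jaccard_similarity({1, 2, 3}, {2, 3, 4})  # 0.5
```

Because each feature is scored in isolation, two highly redundant features can both be kept; that gap is what the multivariate filters in the outline are meant to address.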