Machine learning process and method
2022-07-03 02:12:00 【Jieyou tree hole network】
| Machine learning process |
|---|
| 1. Problem modeling |
| 2. Feature engineering |
| 3. Model selection |
| 4. Model fusion |
| 5. Deploying the model online |
| Problem modeling |
|---|
| 1. Define the evaluation metrics clearly |
| 2. Select the sample set |
| 3. Cross-validation |
- What are the evaluation metrics?
  - Precision (P), recall (R), the precision-recall curve (PRC), the ROC curve and AUC, loss, mAP
- How do we select a sample subset?
  - Stratified sampling
- How do we cross-validate?
  - k-fold cross-validation
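Here is a plain-Python sketch of precision, recall and AUC, together with a stratified k-fold split. It assumes binary labels coded 0/1 and numeric prediction scores. The function names are hypothetical illustrations, not a library API. In practice a library such as scikit-learn provides equivalents.

```python
import random
from collections import defaultdict


def precision_recall(y_true, y_pred, positive=1):
    """P = TP / (TP + FP), R = TP / (TP + FN) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC = probability that a random positive scores above a random negative (ties count 1/2).

    Assumes labels are 0/1.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios follow `labels` as closely as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share of that class.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(precision_recall(y_true, y_pred))  # (0.6666666666666666, 0.6666666666666666)
for train, test in stratified_k_fold(y_true, k=3):
    print(train, test)
```

Each class is dealt out separately. As a result, every test fold keeps roughly the same positive/negative ratio as the full sample set, which is the point of stratification.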
| Feature engineering |
|---|
| 1. EDA |
| 2. Data cleaning (missing values, outliers), normalization, and discretization of continuous data (binning) |
| 3. Feature selection |
- Feature engineering starts with EDA (exploratory data analysis) of the data
  - Get an overview of the dataset: the ratio of missing values and the type of each feature
  - Tools: box plots, histograms, stem-and-leaf plots, correlation-matrix heat maps, PCA dimensionality reduction, etc.
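The sketch below covers the numbers behind the EDA overview: the missing-value ratio per column, an inferred feature type, and the five numbers a box plot draws. It assumes rows arrive as a list of dicts with `None` marking a missing value. `eda_summary` is a hypothetical helper; a real project would more likely use pandas.

```python
import statistics


def eda_summary(rows):
    """Per-column missing ratio, inferred type, and box-plot five-number summary.

    Assumption of this sketch: `rows` is a list of dicts and None marks a missing value.
    """
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {"missing_ratio": 1 - len(present) / len(values)}
        numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if numeric:
            info["type"] = "numeric"
            if len(present) >= 2:
                q1, median, q3 = statistics.quantiles(present, n=4)
                # min, Q1, median, Q3, max: the numbers a box plot is drawn from.
                info["five_numbers"] = (min(present), q1, median, q3, max(present))
        else:
            info["type"] = "categorical"
            info["distinct"] = len(set(present))
        summary[col] = info
    return summary


rows = [{"age": 23, "city": "a"}, {"age": None, "city": "b"}, {"age": 41, "city": "a"}]
print(eda_summary(rows))
```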
- After EDA, once the data is understood, clean and filter it:
  - Missing values
  - Outliers
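One common way to handle the two cleaning targets is shown here. Missing values are filled with the median of the observed values. Outliers are clipped to Tukey's box-plot fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR. Both strategies are assumptions chosen for illustration, not the only options: mean/mode filling and outright removal are also common.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR], the box-plot whisker fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


print(impute_median([1, None, 3]))
print(clip_outliers_iqr([10, 11, 12, 13, 14, 15, 16, 17, 1000]))
```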
- Distinguish the feature types:
- Numeric features
  - Continuous features
    - Normalization
    - Discretization
      - Bucketing (binning)
  - Discrete features
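The three continuous-feature transforms listed above can be sketched as follows: min-max normalization, z-score standardization, and equal-width bucketing. Equal-width bins are one assumed bucketing scheme; equal-frequency (quantile) bins are a common alternative.

```python
import statistics


def min_max_normalize(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def equal_width_buckets(values, n_buckets):
    """Discretize into n_buckets equal-width bins; returns bucket ids 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0
    # The maximum value falls exactly on the upper edge, so clamp it into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


print(min_max_normalize([0, 5, 10]))  # [0.0, 0.5, 1.0]
print(equal_width_buckets([0, 2.5, 5, 7.5, 10], 4))  # [0, 1, 2, 3, 3]
```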
- Categorical features
  - Encoding
    - One-hot encoding
    - Count-rank encoding
      - Works for both linear and nonlinear models
      - Insensitive to outliers
      - Feature values do not collide, and ranks do not collide
    - Natural (ordinal) encoding
    - Hierarchical encoding
      - e.g. postal codes, ID card numbers
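Here is a small sketch of one-hot encoding and count-rank encoding. In the count-rank version, each category is replaced by the rank of its frequency, with 1 meaning most frequent. Breaking ties alphabetically, so that tied categories still get distinct ranks, is an assumption of this sketch.

```python
from collections import Counter


def one_hot(values):
    """One-hot encode a categorical column; returns (categories, rows of 0/1)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def count_rank_encode(values):
    """Replace each category by the rank of its frequency (1 = most frequent).

    Assumption: ties in frequency are broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    rank = {c: r for r, c in enumerate(ordered, start=1)}
    return [rank[v] for v in values]


print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(count_rank_encode(["a", "b", "b", "c", "b", "a"]))  # [2, 1, 1, 3, 1, 2]
```

Count-rank encoding depends only on how often a category appears. A rare, unusual category therefore maps to a large rank instead of an arbitrary extreme value, which is one reading of the outlier-insensitivity noted above.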
- Time features
  - Specific date
  - Minutes and seconds
  - Day of the week
  - Whether it is a weekend, month end, quarter end, business hours, a holiday, etc.
  - Time interval since the last …
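The time features above can be derived from a single timestamp with the standard library. Several parts are assumptions of this sketch: business hours are taken as 09:00-18:00 on weekdays, and holidays come from a caller-supplied set, since no holiday calendar is specified. "Time since the last event" is measured in seconds.

```python
import calendar
from datetime import datetime


def time_features(ts, last_event=None, holidays=frozenset()):
    """Derive calendar features from a datetime; `holidays` is a caller-supplied set of dates."""
    last_day = calendar.monthrange(ts.year, ts.month)[1]  # number of days in the month
    feats = {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "hour": ts.hour, "minute": ts.minute, "second": ts.second,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "is_month_end": ts.day == last_day,
        "is_quarter_end": ts.month in (3, 6, 9, 12) and ts.day == last_day,
        # Assumed business hours: 09:00-18:00, Monday to Friday.
        "is_business_hours": ts.weekday() < 5 and 9 <= ts.hour < 18,
        "is_holiday": ts.date() in holidays,
    }
    if last_event is not None:
        feats["seconds_since_last"] = (ts - last_event).total_seconds()
    return feats


print(time_features(datetime(2022, 7, 3, 2, 12), last_event=datetime(2022, 7, 2, 2, 12)))
```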
- Spatial features
  - GPS coordinates
  - Country ID, city ID, administrative-region ID, street ID, etc.
  - Spatial distance
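A spatial distance feature can be computed from two GPS coordinates with the haversine (great-circle) formula. The formula and the mean Earth radius of 6371 km are an assumed, commonly used choice; the text above does not prescribe a particular distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


print(haversine_km(0, 0, 1, 0))  # about 111.19 km per degree of latitude
```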
- Text features
  - Normalization
    - Unify letters and digits into the letters and digits of a single language
  - Corpus construction
    - Document: the description of an item
  - Text cleaning
    - Remove whitespace, punctuation, etc.
  - Word segmentation
  - Part-of-speech tagging
    - Verbs
    - Nouns
    - Adjectives
  - Semantic representation: models that can express semantics
    - 3-gram model
      - Convert the text into contiguous sequences in which every three consecutive words form one unit:
        - ABCDE => (ABC, BCD, CDE)
    - Skip-gram model
    - Bag-of-words model
      - First build a vector
      - Each component of the vector is the frequency (TF) with which a word appears in the document
    - TF-IDF
      - First build a vocabulary (dictionary) from all documents
      - For each document, form a vector indexed by the dictionary keys; each value is tf*idf
      - tf is a local parameter: the frequency of the word in document d
      - idf is a global parameter: log(total number of documents / number of documents containing the word)
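The text pipeline above can be sketched in a few functions: cleaning, 3-grams, a bag-of-words TF vector, and TF-IDF. The sketch takes tf as the raw count of a word in the document and uses idf = log(N / df) as defined above; some variants normalize tf by document length or smooth idf. The regex-plus-whitespace tokenizer is a stand-in assumption for real word segmentation, which Chinese text in particular needs.

```python
import math
import re
from collections import Counter


def clean_and_tokenize(text):
    """Lower-case, replace punctuation with spaces, split on whitespace (stand-in for segmentation)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return text.split()


def ngrams(tokens, n=3):
    """Contiguous n-grams: ABCDE with n=3 gives ABC, BCD, CDE."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens, vocabulary):
    """Term-frequency (raw count) vector indexed by `vocabulary`."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def tf_idf(documents):
    """tf * idf per word, with idf = log(total documents / documents containing the word)."""
    vocabulary = sorted({w for doc in documents for w in doc})
    n_docs = len(documents)
    df = {w: sum(1 for doc in documents if w in doc) for w in vocabulary}
    vectors = []
    for doc in documents:
        tf = bag_of_words(doc, vocabulary)
        vectors.append([t * math.log(n_docs / df[w]) for t, w in zip(tf, vocabulary)])
    return vocabulary, vectors


print(ngrams(list("ABCDE")))
print(tf_idf([clean_and_tokenize("a b, a!"), clean_and_tokenize("b c")]))
```

A word that appears in every document gets idf = log(1) = 0. This is why TF-IDF down-weights words that carry little information for telling documents apart.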
- Feature selection: choose a well-matched combination of features
- How do we measure the quality of a feature?
  - By the distance/similarity between the target label and the feature
  - Distance and similarity measures:
    - L_p-norm distance (magnitude)
    - Cosine similarity (angle)
    - Pearson correlation coefficient (a combined measure)
    - Jaccard similarity (size of the intersection / size of the union)
      - Jaccard distance = 1 - Jaccard similarity
  - Fisher score
  - Mutual information KL(p(x,y) || p(x)p(y))
  - Hypothesis testing
  - CFS (correlation-based feature selection)
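Most of the feature-quality measures listed above fit in a few lines each. The sketch covers L_p distance, cosine similarity, Pearson correlation (written as the cosine of the mean-centred vectors), Jaccard similarity and distance, discrete mutual information as KL(p(x,y) || p(x)p(y)) estimated from counts, and the Fisher score (between-class over within-class scatter). The helper names are hypothetical, and hypothesis testing and CFS are not covered.

```python
import math
from collections import Counter


def lp_distance(x, y, p=2):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def cosine_similarity(x, y):
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0


def pearson(x, y):
    """Pearson r = cosine similarity of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


def jaccard_similarity(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    return 1 - jaccard_similarity(a, b)


def mutual_information(xs, ys):
    """Discrete MI = KL(p(x, y) || p(x) p(y)) from co-occurrence counts, in nats."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def fisher_score(feature, labels):
    """Sum_k n_k (mu_k - mu)^2 / Sum_k n_k sigma_k^2 for one feature."""
    mu = sum(feature) / len(feature)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(y, []).append(v)
    between = within = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        between += len(vals) * (m - mu) ** 2
        within += sum((v - m) ** 2 for v in vals)
    return between / within if within else math.inf


print(lp_distance([0, 0], [3, 4]))  # 5.0
print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```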
- Three families of feature-selection methods (classified by whether feature selection interacts with the machine learning algorithm):
  - Filter methods
    - Full feature set => feature selection => learning algorithm => model performance
    - Feature selection does not interact with the learning algorithm; the two are independent, so the method is simple and efficient.
    - Univariate filtering
      - Considers only relevance: rank features by relevance and filter out the least relevant ones
    - Multivariate filtering
      - Considers not only relevance but also consistency (?)
        - CFS (correlation-based feature selection), which also accounts for correlations among features
        - MBF (Markov blanket filter)
        - FCBF (fast correlation-based filter)
  - Wrapper methods
    - Full feature set => | feature selection <=> learning algorithm | => model performance
    - For each candidate feature subset, judge how well it suits the learning algorithm: run the algorithm on the subset and use a validation set to pick the best subset.
  - Embedded methods
    - Full feature set => | feature selection <=> learning algorithm + model performance |
    - Feature selection is fused directly into the learning algorithm, and performance is evaluated at the same time
    - Requires cross-validation
    - Examples: decision trees, random forests, gradient-boosted trees, SVM, Lasso
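The difference between the filter and wrapper families shows up in a short sketch. The filter ranks features by |Pearson r| with the target without ever training a model. The wrapper greedily adds the feature that most improves validation accuracy of an actual learner. A nearest-centroid classifier is used here as a hypothetical stand-in learner, and forward selection is one assumed search strategy among several (backward elimination, recursive elimination). Embedded methods such as Lasso are not sketched.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy) if sx and sy else 0.0


def filter_select(columns, target, k):
    """Univariate filter: keep the k features (dict name -> values) with the largest |r|."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:k]


def nearest_centroid_accuracy(features, X_train, y_train, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in zip(X_val, y_val):
        point = [x[f] for f in features]
        pred = min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == y
    return correct / len(y_val)


def wrapper_forward_select(n_features, X_train, y_train, X_val, y_val):
    """Wrapper: greedily add the feature that most improves validation accuracy; stop when none does."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        score, f = max((nearest_centroid_accuracy(selected + [f], X_train, y_train, X_val, y_val), f)
                       for f in candidates)
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best


X_train, y_train = [[0, 5], [1, 3], [10, 4], [11, 6]], [0, 0, 1, 1]
X_val, y_val = [[0.5, 6], [10.5, 3]], [0, 1]
print(wrapper_forward_select(2, X_train, y_train, X_val, y_val))  # ([0], 1.0)
```

The filter costs one correlation per feature. The wrapper retrains the learner for every candidate subset it tries, which is why it is more expensive but tailored to the chosen algorithm.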