Machine learning process and method
2022-07-03 02:12:00 【Jieyou tree hole network】
| Problem modeling |
|---|
| Feature engineering |
| Model selection |
| Model fusion (ensembling) |
| Online deployment of the model |
| Problem modeling |
|---|
| 1. Be clear about the evaluation metrics |
| 2. Select the sample set |
| 3. Cross-validation |
- What are the evaluation metrics?
  - Precision (P), recall (R), PR curve (PRC), ROC and AUC, loss, mAP
- How to select a sample subset?
  - Stratified sampling
- How to cross-validate?
  - k-fold
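The metrics and sampling steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `precision_recall` and `stratified_kfold` are hypothetical, and the round-robin assignment is just one simple way to keep class ratios similar across folds.

```python
import random
from collections import defaultdict


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def stratified_kfold(labels, k, seed=0):
    """Yield (train_indices, test_indices) with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Example: 8 negatives and 4 positives split into 4 folds.
labels = [0] * 8 + [1] * 4
for train, test in stratified_kfold(labels, k=4):
    print(len(train), [labels[i] for i in test])
```

Each test fold in this example holds two negatives and one positive, so the 2:1 class ratio of the full set is preserved in every fold.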
| Feature engineering |
|---|
| 1. EDA |
| 2. Data cleaning (missing values, outliers), normalization, and discretization of continuous data (bucketing) |
| 3. Feature selection |
- Feature engineering starts with EDA (exploratory data analysis) of the data
  - Learn the general information about the dataset: the ratio of missing values and the types of the features
  - Box plots, histograms, stem-and-leaf plots, correlation-matrix heat maps, PCA dimensionality reduction, and so on
- After EDA, with an understanding of the data, clean and filter the data
  - Missing values
  - Outliers
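As a rough sketch of the two cleaning checks above, the snippet below computes a missing-value ratio and flags outliers with the box-plot (IQR) rule. The helper names, the use of `None` to mark a missing value, and the factor `k=1.5` are assumptions made for illustration.

```python
import statistics


def missing_ratio(rows, column):
    """Fraction of rows whose value in `column` is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


rows = [{"age": 30}, {"age": None}, {}, {"age": 41}]
print(missing_ratio(rows, "age"))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

Whether a flagged value is dropped, clipped, or kept depends on the dataset; the rule only points out candidates.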
- Distinguish the types of features:
  - Numeric features
    - Continuous features
      - Normalization
      - Discretization
        - Bucketing
    - Discrete features
  - Categorical features
    - Encoding
      - One-hot encoding
      - Count-rank encoding
        - Works for both linear and nonlinear models
        - Not sensitive to outliers
        - Distinct feature values do not collide, because their ranks do not collide
      - Natural-number encoding
    - Hierarchical encoding
      - Postal codes, ID numbers
  - Time features
    - Specific date
    - Time of day (minutes, seconds)
    - Day of the week
    - Whether it is a weekend, a month end, a quarter end, business hours, a holiday, etc.
    - Time interval since the last …
  - Spatial features
    - GPS coordinates
    - Country ID, city ID, administrative-region ID, street ID, etc.
    - Spatial distance
  - Text features
    - Normalization
      - Unify letters and digits into the character set of a single language
    - Corpus construction
      - Document: the description of the item
    - Text cleaning
      - Remove spaces, punctuation, etc.
    - Word segmentation
      - Part-of-speech tagging
        - Verbs
        - Nouns
        - Adjectives
      - Semantic restoration
        - Able to express semantics
      - 3-gram model
        - Convert the text into a continuous sequence in which every three consecutive words form one unit:
          - ABCDE => (ABC, BCD, CDE)
      - Skip-gram model
      - Bag-of-words model
        - First form a vector
        - Each component of the vector is the frequency (TF) with which a word appears in the document
      - TF-IDF
        - First build a vocabulary (dictionary) from all the documents
        - For each document, the keys of the dictionary form a vector whose values are tf*idf
        - tf is a local parameter: the frequency of the word in document d
        - idf is a global parameter: log(total number of documents / number of documents containing the word)
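Several of the transformations listed above (normalization, bucketing, one-hot and count-rank encoding, 3-grams, TF-IDF) can be sketched with the standard library alone. All function names are hypothetical. Two choices here are assumptions rather than something the outline specifies: count-rank ties are broken by category name, and tf is taken as the relative frequency of a term within its document.

```python
import math
from collections import Counter


def min_max_normalize(values):
    """Scale numeric values linearly into [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def bucketize(values, boundaries):
    """Discretize: the bucket index is the number of boundaries <= the value."""
    return [sum(1 for b in boundaries if v >= b) for v in values]


def one_hot(categories):
    """One-hot encode; columns follow the sorted distinct categories."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


def count_rank_encode(categories):
    """Replace each category by the rank of its frequency (1 = most frequent)."""
    counts = Counter(categories)
    # Assumption: ties are broken by name so distinct categories never share a rank.
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    rank = {c: i + 1 for i, c in enumerate(ordered)}
    return [rank[c] for c in categories]


def ngrams(tokens, n=3):
    """Slide a window of n consecutive tokens over the sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Return one {term: tf * idf} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


print(ngrams(list("ABCDE")))  # [('A', 'B', 'C'), ('B', 'C', 'D'), ('C', 'D', 'E')]
print(tf_idf([["red", "apple"], ["red", "car"]]))
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to any TF-IDF vector; this is the sense in which idf is a global parameter.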
- Feature selection: select a suitable combination of features
  - How to measure the quality of features?
    - The distance / similarity between the target label and the feature
    - Distance and similarity measures
      - L-p norm distance (length)
      - Cosine similarity (angle)
      - Pearson correlation coefficient (combined)
      - Jaccard similarity (size of the intersection / size of the union)
        - Jaccard distance = 1 - Jaccard similarity
    - Fisher score
    - Mutual information: KL(p(x,y) || p(x)p(y))
    - Hypothesis testing
    - CFS (correlation-based feature selection)
  - Three kinds of feature selection methods (classified by whether feature selection interacts with the machine learning algorithm)
    - Filter methods
      - Full feature set => feature selection => machine learning algorithm => model performance
      - Feature selection and the machine learning algorithm do not interact; they are independent, so the approach is simple and efficient.
      - Univariate filtering
        - Considers only relevance: rank the features by relevance and filter out the least relevant ones
      - Multivariate filtering
        - Considers not only relevance but also consistency?
        - CFS (correlation-based feature selection), which also takes the correlation between features into account
        - MBF
        - FCBF
    - Wrapper methods
      - Full feature set => | feature selection <=> machine learning algorithm | => model performance
      - For each candidate feature subset, consider how well it suits the machine learning algorithm; use the algorithm and a validation set to choose the best subset.
    - Embedded methods
      - Full feature set => | feature selection <=> machine learning algorithm + model performance |
      - Fuse feature selection directly into the machine learning algorithm and evaluate the effect at the same time
      - Cross-validation is required
      - Decision trees, random forests, gradient boosted trees, SVM, Lasso
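A minimal sketch of three of the similarity measures above, followed by a univariate filter that ranks features by their absolute Pearson correlation with the target. The function names and the toy data are hypothetical. Ranking by absolute correlation is one common choice for a filter method, not necessarily the one the author had in mind.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def univariate_filter(features, target, k):
    """Keep the k features with the largest |Pearson correlation| with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


features = {
    "f1": [1, 2, 3, 4],
    "f2": [1, -1, 1, -1],
    "f3": [4, 3, 2, 1.5],
}
target = [2, 4, 6, 8]
print(univariate_filter(features, target, k=2))
```

The filter never trains a model, which is what separates it from the wrapper and embedded methods: the score depends only on each feature and the target.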