What is machine learning? (Fundamentals)
2022-06-25 20:44:00 【Chengshaoting】
Fundamentals of machine learning
Feature: a column in the dataset (x)
Target: the column to be predicted (y); it may be a continuous value (a number, e.g. 0, 1, 2, 3, 4, 5…) or a discrete value (a category)
Sample: one row of data; the number of rows in the dataset is the number of samples
Vector: a one-dimensional array of values, e.g. [0,1,2,3]
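To make these terms concrete, here is a small sketch with made-up numbers. The values and the meaning given to each column are assumptions for illustration only, and NumPy is used simply as a convenient container.

```python
import numpy as np

# Hypothetical toy dataset: each row is one sample, each column is one feature.
X = np.array([
    [50.0, 1.0],    # sample 0 (made-up values, e.g. area and number of rooms)
    [80.0, 2.0],    # sample 1
    [120.0, 3.0],   # sample 2
])
# Target column y: one value per sample (continuous here, so a regression target).
y = np.array([100.0, 160.0, 240.0])

n_samples, n_features = X.shape
first_sample = X[0]       # one row of the dataset: the feature vector of a sample
first_feature = X[:, 0]   # one column of the dataset: a single feature
print(n_samples, n_features)
```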
Feature engineering: the process of preparing the data for the model; it largely determines how well the model predicts
- Feature extraction
- Feature transformation
- Dimensionality reduction
Splitting the dataset (historical data with a known y); common train:test ratios are 7:3, 8:2, and 9:1
- Training set (used to train the model)
- Test set (used to test how well the trained model performs)
  - Actual y values: y_true
  - Values predicted by the model: y_pred
  - Comparing y_true with y_pred shows how well the model performs
  - Accuracy: aim for 70% or above
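The comparison of y_true and y_pred can be sketched by hand. The labels below are made up for illustration; accuracy is simply the fraction of test samples where the two agree.

```python
import numpy as np

# Made-up labels for illustration: true values and model predictions on a test set.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

# Accuracy = number of correct predictions / number of test samples.
accuracy = np.mean(y_true == y_pred)
print(f"accuracy = {accuracy:.2f}")
```

Here two of the ten predictions differ from the true labels, so the accuracy is 0.8.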
Types of machine learning
Supervised learning (there is a target value y)
- Regression problems (the target y is a continuous value)
  - Predicting house prices
  - Predicting stock trends
  - Predicting a company's sales
- Classification problems (the target y is a category)
  - Binary and multiclass classification
  - Whether an email is spam
  - Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering ("birds of a feather flock together")
  - User segmentation (user profiling)
- Dimensionality reduction
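As a rough illustration of unsupervised learning, the sketch below groups points without ever seeing a target y. The data is made up, and KMeans is only one possible clustering algorithm; it is not named in the notes above and is chosen here as an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two obvious groups; note that no target y is given.
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # group near (1, 1)
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],   # group near (8, 8)
])

# KMeans is used here purely for illustration.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that points in the same group receive the same label.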
Machine learning workflow [key point]
- Get the data
- Basic data processing (time-consuming)
- Feature engineering (time-consuming)
  - Normalization and standardization
  - Dimensionality reduction
  - Dataset splitting
  - Feature derivation
  - Feature crossing
- Apply a machine learning algorithm (train the model)
- Evaluate the model
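The workflow above can be sketched end to end with scikit-learn. This is a minimal sketch, not a full project: the iris dataset, the 7:3 split, and the choice of StandardScaler with KNN are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# 1. Get the data (iris is an assumption; the notes name no dataset).
X, y = load_iris(return_X_y=True)

# 2. Split the dataset (7:3), keeping class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 3. Feature engineering: standardize, fitting the scaler on the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Train the model.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_scaled, y_train)

# 5. Evaluate the model on the held-out test set.
y_pred = model.predict(X_test_scaled)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.3f}")
```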
Overfitting and underfitting
- Overfitting: the model performs well on the training set but poorly on the test set or on unseen data. Remedies:
  - Clean the data again
  - Increase the amount of training data
  - Use regularization to penalize the parameters
- Underfitting: the model performs poorly on the training set as well as on the test set and unseen data. Remedies:
  - Add more features
  - Add polynomial features
  - Reduce the regularization strength
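One way to see both effects is to fit polynomials of different degrees to the same noisy data and compare training and test scores. The sketch below assumes synthetic data (a noisy sine curve) and arbitrarily chosen degrees; a straight line will typically underfit, while a very high degree tends to chase noise in the training set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=60)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for degree in (1, 4, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # R^2 on the training data versus R^2 on unseen test data
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(degree, scores[degree])
```

A low score on both sets points to underfitting; a high training score paired with a clearly lower test score points to overfitting.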
KNN algorithm
KNN (k-nearest neighbors) is suited to classification problems, such as binary classification
It can also be used for regression problems
Idea: take the k nearest points and use them to predict the unknown data point (for classification, by majority vote). k defaults to 5 in scikit-learn; odd values such as 1, 3, 5, 7 are common
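The idea can be written out in a few lines of NumPy. This is a minimal sketch for classification, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the data are made up for illustration and are not scikit-learn's implementation.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=5):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every training sample
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Made-up training data: two features, two classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.8, 1.6]), k=3))
print(knn_predict(X_train, y_train, np.array([6.4, 6.6]), k=3))
```

An odd k avoids tied votes in binary classification, which is one reason values like 1, 3, 5, 7 are common.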
- KNN API usage
from sklearn.neighbors import KNeighborsClassifier
# Create the classifier
knn_clf = KNeighborsClassifier(n_neighbors=6)
# Train the model with fit()
knn_clf.fit(x,y)
# Predict with predict()
knn_clf.predict(x1)
- Splitting the dataset
from sklearn.model_selection import train_test_split
# test_size and random_state below are illustrative values (0.3 gives a 7:3 split)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
- Evaluating classification models
# Compute the accuracy
from sklearn.metrics import accuracy_score
# Method 1: compare the true test labels with the predictions
accuracy_score(y_test, y_predict)
- Normalization and standardization
Purpose: map all data to the same scale so that features with different units or orders of magnitude can be compared and weighted together.
Algorithms that compute distances require the data to be normalized or standardized [key point]
Normalization: (X - Xmin) / (Xmax - Xmin), which maps the data to [0, 1]
Standardization: (X - Xmean) / Xstd, which gives the data a mean of 0 and a standard deviation of 1
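from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Create an instance
minmax = MinMaxScaler()
# Fit on the training set and transform it
minmax.fit_transform(X_train)
# Transform the test set with the min and max learned from the training set
minmax.transform(X_test)
Worked examples of the normalization formula:
(8-1)/(11-1) = 0.7
(61-1)/(101-1) = 0.6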
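The two formulas can be checked numerically. The sketch below uses a made-up array whose columns have the same minimum and maximum as the worked examples (1 to 11 and 1 to 101), and compares a hand-written normalization with MinMaxScaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two made-up feature columns matching the worked examples:
# column 0 has min 1 and max 11, column 1 has min 1 and max 101.
X = np.array([[1.0, 1.0],
              [8.0, 61.0],
              [11.0, 101.0]])

# Min-max normalization by hand and with MinMaxScaler
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled = MinMaxScaler().fit_transform(X)

# Standardization: each column ends up with mean 0 and standard deviation 1
standardized = StandardScaler().fit_transform(X)

print(scaled)
print(standardized.mean(axis=0), standardized.std(axis=0))
```

The middle row maps to 0.7 and 0.6, which agrees with the hand calculation.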
- Grid search and cross-validation
Purpose: make the model more accurate and reliable (model tuning)
sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
- Description: exhaustive search over the specified parameter values of an estimator
- Parameters:
  - estimator: the estimator object
  - param_grid: the estimator parameters to search, as a dict, e.g. {"n_neighbors": [1, 3, 5]}
  - cv: the number of cross-validation folds
- Methods:
  - fit: fit on the training data
  - score: the score on the given data (accuracy by default for a classifier)
- Result attributes:
  - best_score_: the best score obtained in cross-validation
  - best_estimator_: the model with the best parameters
  - cv_results_: the validation-set scores from each cross-validation run (training-set scores are included only when return_train_score=True)
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Hyperparameters to tune
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Fit on the training data
estimator.fit(X_train, y_train)
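After fitting, the result attributes can be inspected. The following is a self-contained sketch of the same grid search; the iris dataset and the 7:3 split are assumptions, since the notes do not say which data was used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# The iris dataset is an assumption; any labeled dataset would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_dict, cv=3)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best cross-validation score:", search.best_score_)
print("test accuracy:", search.score(X_test, y_test))
# One entry per parameter combination (3 values of k x 2 weightings = 6)
print(len(search.cv_results_["params"]))
```

The cross-validation score is computed on the training data only, so the test set remains an independent check of the tuned model.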