What is machine learning? (Fundamentals)
2022-06-25 20:44:00 【Chengshaoting】
Fundamentals of machine learning
Feature: a column of the dataset (x)
Target: the column to be predicted (y); it holds either continuous values (e.g. prices) or discrete values (categories)
Sample: one row of data; the number of rows in the dataset is the number of samples
Vector: one sample's values written as a list, e.g. [0,1,2,3]
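To make the terms concrete, here is a minimal sketch with NumPy. The numbers are made up for illustration: each row is a sample (a vector), the columns of `X` are features, and `y` is the target column.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows) x 4 features (columns)
X = np.arange(12).reshape(3, 4)   # [[0,1,2,3], [4,5,6,7], [8,9,10,11]]
y = np.array([0, 1, 0])           # one target value per sample

n_samples, n_features = X.shape   # rows = samples, columns = features
first_sample = X[0]               # one sample as a vector: [0, 1, 2, 3]

print(n_samples, n_features)      # 3 4
print(first_sample)
```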
Feature engineering: the process of preparing the data; it largely determines how well the model can predict
- Feature extraction
- Feature transformation
- Dimensionality reduction
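As one example of dimensionality reduction, the sketch below uses PCA from scikit-learn. PCA is only one common choice, assumed here for illustration; the iris dataset bundled with scikit-learn stands in for real data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Example data (assumption): 150 samples with 4 features each
X, y = load_iris(return_X_y=True)

# Project the 4 features onto 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # (150, 4) -> (150, 2)
# Share of the original variance kept by the 2 components
print(pca.explained_variance_ratio_.sum())
```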
Splitting the dataset (historical data => y) (common train:test ratios are 7:3, 8:2 and 9:1)
- Training set (used to train the model)
- Test set (used to evaluate the trained model)
  - The actual target values are y_true
  - The model produces predicted values y_pred
  - Comparing y_true with y_pred shows how well the model performs
  - Accuracy: as a rough rule of thumb, aim for 70% or above
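Comparing y_true with y_pred can be sketched in plain Python. The labels below are hypothetical, and `accuracy` is an illustrative helper, not a library function: accuracy is simply the fraction of positions where the prediction matches the true label.

```python
# Hypothetical true and predicted labels for a test set of 10 samples
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.0%}")   # 8 of 10 correct -> accuracy = 80%
print(acc >= 0.7)                # True: above the 70% rule of thumb
```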
Types of machine learning
Supervised learning (there is a target value y)
- Regression (the target y is a continuous value)
  - Predicting house prices
  - Predicting stock trends
  - Predicting a company's sales
- Classification (the target y is a category)
  - Binary and multi-class classification
    - Whether an email is spam
    - Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering ("birds of a feather flock together")
  - User segmentation (user profiling)
- Dimensionality reduction
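Clustering needs no target y: the algorithm groups similar samples on its own. A minimal sketch with scikit-learn's KMeans, assuming hypothetical user data of [monthly visits, average spend]:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical users: [monthly visits, average spend]
X = np.array([
    [1, 10], [2, 12], [1, 11],         # low-activity users
    [20, 200], [22, 210], [21, 190],   # high-activity users
])

# No y is passed: clustering is unsupervised
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)   # users in the same group share a cluster label
```

The cluster numbers themselves (0 or 1) are arbitrary; only which users share a label is meaningful.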
Machine learning workflow [key point]
- Get the data
- Basic data processing (time-consuming)
- Feature engineering (time-consuming)
  - Normalization and standardization
  - Dimensionality reduction
  - Dataset splitting
  - Feature derivation
  - Feature crossing
- Train a model with a machine learning algorithm
- Model evaluation
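The workflow steps can be sketched end to end in scikit-learn. This is one possible arrangement, assuming the bundled iris dataset as "the data", an 8:2 split, standardization as the feature-engineering step and KNN as the algorithm; `make_pipeline` chains scaling and training so the scaler is fitted on the training set only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Get the data (iris is only an example dataset)
X, y = load_iris(return_X_y=True)

# 2. Split the dataset 8:2 (stratify keeps class proportions equal in both parts)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22, stratify=y)

# 3. Feature engineering (standardization) + 4. model training, chained together
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# 5. Model evaluation on the held-out test set
test_acc = model.score(X_test, y_test)
print(f"test accuracy: {test_acc:.2f}")
```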
Overfitting and underfitting
- Overfitting: the model performs well on the training set but poorly on the test set or on unseen data. Remedies:
  - Clean the data again
  - Increase the amount of training data
  - Use regularization to penalize large parameter values
- Underfitting: the model performs poorly on the training set as well as on the test set and unseen data. Remedies:
  - Add more features
  - Add polynomial features
  - Reduce the regularization strength
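Two of the remedies above can be illustrated with synthetic data (an assumption: y = x² plus noise). A straight line underfits even on the training set, and adding polynomial features fixes it. Ridge regression's `alpha` sets the regularization strength: a larger `alpha` shrinks the coefficients (the overfitting remedy), and a smaller one penalizes less (the underfitting remedy).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): a quadratic relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.3, size=50)

# Underfitting: a straight line cannot follow a parabola, even on training data
r2_linear = LinearRegression().fit(x, y).score(x, y)

# Remedy: add a polynomial feature (x^2)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
r2_poly = poly.score(x, y)

# Regularization: compare coefficient sizes under a weak and a strong penalty
features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
norm_weak = np.linalg.norm(Ridge(alpha=0.01).fit(features, y).coef_)
norm_strong = np.linalg.norm(Ridge(alpha=100.0).fit(features, y).coef_)

print(f"training R^2, linear: {r2_linear:.2f}, with x^2: {r2_poly:.2f}")
print(f"coefficient norm, alpha=0.01: {norm_weak:.3f}, alpha=100: {norm_strong:.3f}")
```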
KNN algorithm
KNN is well suited to classification problems, including binary classification.
It can also be used for regression problems.
Idea: find the k points nearest to an unknown sample and use their labels to predict it (for classification, by majority vote). In sklearn, k (n_neighbors) defaults to 5; common choices are 1, 3, 5 and 7.
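The idea can be sketched from scratch in a few lines. This is a simplified illustration, not sklearn's implementation: `knn_predict` is a hypothetical helper that uses Euclidean distance and a plain majority vote, and the training points are made up.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Predict the class of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical training data: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0]), k=3))  # 1
```

An odd k (1, 3, 5, 7) avoids ties in a two-class vote, which is one reason those values are common.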
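- KNN API usage
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
# Example data (an assumption; any feature matrix x and label vector y will do)
x, y = load_iris(return_X_y=True)
# Create the classifier (k = 6 neighbours)
knn_clf = KNeighborsClassifier(n_neighbors=6)
# Train the model with fit()
knn_clf.fit(x, y)
# Predict with predict(); x1 must have the same number of columns as x
x1 = x[:3]
knn_clf.predict(x1)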
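For regression, scikit-learn provides `KNeighborsRegressor`, which predicts the mean target of the k nearest neighbours (with the default `weights="uniform"`). A minimal sketch on hypothetical house-area/price data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data: house area (m^2) -> price
X = np.array([[50], [60], [70], [80], [90], [100]])
y = np.array([100, 120, 140, 160, 180, 200])

knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X, y)

# The 3 nearest areas to 72 are 70, 80 and 60, so the prediction is
# the mean of 140, 160 and 120
pred = knn_reg.predict([[72]])
print(pred)   # [140.]
```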
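- Splitting the dataset
from sklearn.model_selection import train_test_split
# test_size=0.2 gives an 8:2 split; random_state (any integer) fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=22)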
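- Evaluating a classification model
# Compute accuracy
from sklearn.metrics import accuracy_score
# Method 1: train on the training set, predict the test set, compare with the true labels
knn_clf.fit(X_train, y_train)
y_predict = knn_clf.predict(X_test)
accuracy_score(y_test, y_predict)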
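Besides `accuracy_score`, every scikit-learn classifier has a `score(X, y)` method that predicts internally and returns the same mean accuracy. A self-contained sketch, assuming the iris data and the same 8:2 split as above:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)

knn_clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_predict = knn_clf.predict(X_test)

acc_1 = accuracy_score(y_test, y_predict)   # Method 1: compare y_true with y_pred
acc_2 = knn_clf.score(X_test, y_test)       # Method 2: predict + accuracy in one call
print(acc_1, acc_2)                         # both calls give the same value
```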
- Normalization and standardization
Purpose: map all features onto the same scale, so that features with different units or orders of magnitude can be compared and weighted fairly.
Any algorithm that computes distances (such as KNN) should be given normalized or standardized data [key point]
Normalization: (X - Xmin)/(Xmax - Xmin), which maps the data to [0, 1]
Standardization: (X - Xmean)/Xstd, which gives data with mean 0 and standard deviation 1
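from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Create an instance (StandardScaler() is used the same way)
minmax = MinMaxScaler()
# Learn min/max from the training set only, then scale it
X_train_scaled = minmax.fit_transform(X_train)
# Reuse the training min/max to scale the test set (transform, not fit_transform)
X_test_scaled = minmax.transform(X_test)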
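Example: for a feature whose minimum is 1 and maximum is 11, the value 8 becomes (8-1)/(11-1) = 0.7;
for a feature whose minimum is 1 and maximum is 101, the value 61 becomes (61-1)/(101-1) = 0.6.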
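The two formulas can be checked against scikit-learn's scalers. The toy training data below is hypothetical, chosen so the middle row reproduces the worked example (8 -> 0.7, 61 -> 0.6). It also shows that a test value outside the training range falls outside [0, 1], because the test set is scaled with the training min/max.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: feature 1 ranges over [1, 11], feature 2 over [1, 101]
X_train = np.array([[1.0, 1.0], [8.0, 61.0], [11.0, 101.0]])
X_test = np.array([[12.0, 50.0]])

# Normalization by hand vs MinMaxScaler
col_min, col_max = X_train.min(axis=0), X_train.max(axis=0)
manual_minmax = (X_train - col_min) / (col_max - col_min)
minmax = MinMaxScaler().fit(X_train)
print(manual_minmax[1])            # middle row: 0.7 and 0.6

# The test set uses the training min/max, so 12 maps to (12-1)/10 = 1.1
test_scaled = minmax.transform(X_test)

# Standardization by hand (population std, ddof=0) vs StandardScaler
manual_std = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
sk_std = StandardScaler().fit_transform(X_train)
```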
- Grid search and cross-validation
Purpose: make the model more accurate and reliable (model tuning)
sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
- Description: exhaustively searches over the specified parameter values of an estimator
- Parameters:
  - estimator: the estimator object
  - param_grid: the parameter values to try (dict), e.g. {"n_neighbors": [1, 3, 5]}
  - cv: the number of folds for cross-validation
- Methods:
  - fit: train on the input training data
  - score: evaluate the best model on the given data (accuracy, for a classifier, by default)
- Results:
  - best_score_: the best mean score across the cross-validation folds
  - best_estimator_: the model with the best parameters
  - cv_results_: detailed results for every parameter combination and fold (validation scores; training scores are included only when return_train_score=True)
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Prepare the hyperparameters to tune
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Train with fit() (X_train, y_train come from train_test_split above)
estimator.fit(X_train, y_train)
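After `fit`, the results listed above can be read off the fitted search object. A self-contained sketch, assuming the iris data and the same parameter grid; 3 values of `n_neighbors` times 2 values of `weights` gives 6 candidates, each evaluated with 3-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)

param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_dict, cv=3)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)
print("best model:", search.best_estimator_)
# score() uses best_estimator_ on the held-out test set
print("test accuracy:", search.score(X_test, y_test))
# One mean validation score per parameter combination
print(search.cv_results_["mean_test_score"])
```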