What is machine learning? (Fundamentals)
2022-06-25 20:44:00 【Chengshaoting】
Fundamentals of machine learning
Feature: a column in the dataset that serves as model input (x)
Target: the column to be predicted (y); it can be continuous (numeric values such as a price) or discrete (category labels)
Sample: one row of data; the number of rows in the dataset is the number of samples
Vector: the features of one sample form a vector, e.g. [0,1,2,3]
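Here is a minimal sketch of these terms. It assumes NumPy and a made-up toy dataset, so the numbers are illustrative only. The feature matrix X holds one row per sample and one column per feature, and y holds one target value per sample.

```python
import numpy as np

# Hypothetical toy dataset: 3 samples (rows), 4 features (columns)
X = np.array([
    [0, 1, 2, 3],    # sample 1: its features form the vector [0, 1, 2, 3]
    [4, 5, 6, 7],    # sample 2
    [8, 9, 10, 11],  # sample 3
])
# One target value per sample (here, discrete class labels)
y = np.array([0, 1, 1])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 3 4
print(X[:, 0])                # the first feature column
```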
Feature engineering: the process of preparing the data for the model; it largely determines how well the model predicts
- Feature extraction
- Feature transformation
- Dimensionality reduction
Splitting the dataset (historical data with known y), typically train:test = 7:3, 8:2 or 9:1
- Training set (used to train the model)
- Test set (used to evaluate the trained model)
- The actual target values: y_true
- The values the model predicts: y_pred
- Comparing y_true with y_pred shows how well the model performs
- Accuracy: a rough target is above 70%
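The sketch below makes the y_true/y_pred comparison concrete in plain Python. The labels are hypothetical. `accuracy` is a helper written only for illustration; for plain label lists it computes the same quantity as scikit-learn's `accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels for a 10-sample test set (2 predictions are wrong)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 0.8
```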
Machine learning classification
Supervised learning (there is a target value y)
- Regression (the target y is a continuous value)
- Predicting house prices
- Predicting stock trends
- Predicting a company's sales
- Classification (the target y is a category)
- Binary and multiclass classification
- Whether an email is spam
- Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering ("birds of a feather flock together")
- User segmentation (user profiling)
- Dimensionality reduction
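Here is a sketch of the supervised/unsupervised distinction in scikit-learn, using a tiny made-up dataset. A supervised model is fit on X and y. An unsupervised one is fit on X alone. The notes do not name a clustering algorithm, so KMeans is used here purely as an illustration.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of points
X = [[0, 0], [0, 1], [10, 10], [10, 11]]

# Supervised: a target y is available, so fit(X, y)
y = [0, 0, 1, 1]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[9, 9]]))  # [1]

# Unsupervised: no y, so fit(X) and let the algorithm find the groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster ids; which group is called 0 or 1 is arbitrary
```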
Machine learning workflow 【key point】
- Acquire data
- Basic data processing (time-consuming)
- Feature engineering (time-consuming)
- Normalization and standardization
- Dimensionality reduction
- Dataset splitting
- Feature derivation
- Feature crossing
- Train a model with a machine learning algorithm
- Evaluate the model
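The workflow steps above can be sketched end to end in scikit-learn. This is an illustration, not the author's code. It assumes the bundled iris dataset, an 8:2 split with an arbitrary `random_state=22`, standardization as the feature-engineering step, and KNN as the algorithm.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# 1. Acquire data (the iris dataset bundled with scikit-learn)
X, y = load_iris(return_X_y=True)

# 2-3. Basic processing and feature engineering: split, then standardize
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test = scaler.transform(X_test)        # reuse them on the test data

# 4. Train a model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 5. Evaluate the model
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.2f}")
```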
Overfitting and underfitting
- Overfitting: the model performs well on the training set but poorly on the test set or unseen data
- Clean the data again
- Increase the amount of training data
- Use regularization to penalize large parameters
- Underfitting: the model performs poorly on the training set as well as on the test set and unseen data
- Add more features
- Add polynomial features
- Reduce the regularization strength
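This sketch illustrates underfitting and the "add polynomial features" remedy. It uses NumPy's `polyfit` as a stand-in for a linear model, on made-up data where y = x². A straight line cannot follow the curve, so its training error stays high. Allowing an x² term removes that error. Overfitting is not shown here.

```python
import numpy as np

# Hypothetical data with a quadratic relationship: y = x^2
x = np.linspace(-3, 3, 50)
y = x ** 2

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coefs = np.polyfit(x, y, deg=degree)
    y_hat = np.polyval(coefs, x)
    return float(np.mean((y - y_hat) ** 2))

mse_linear = train_mse(1)     # a straight line cannot follow the curve: underfits
mse_quadratic = train_mse(2)  # adding the x^2 feature removes the underfitting
print(mse_linear, mse_quadratic)
```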
KNN Algorithm
KNN is mainly used for classification problems, e.g. binary classification
It can also be used for regression problems
Idea: find the k nearest points and use them to predict the unknown data (for classification, by majority vote). In scikit-learn the default is k=5; common values are 1, 3, 5, 7
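The KNN idea can be sketched from scratch in plain Python. The sketch assumes Euclidean distance and an unweighted majority vote, which match scikit-learn's defaults for `KNeighborsClassifier`. `knn_predict` and the toy data are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample
    distances = [
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    ]
    # Keep the k closest samples
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote among their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two classes
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, [2, 2], k=3))  # A
print(knn_predict(X_train, y_train, [7, 8], k=3))  # B
```

With an even k, a binary vote can tie. In that case `Counter.most_common` returns whichever label it counted first, which is one reason odd values such as 3, 5 and 7 are common.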
- KNN API usage
from sklearn.neighbors import KNeighborsClassifier
# Create the classifier
knn_clf = KNeighborsClassifier(n_neighbors=6)
# Train the model with fit(); x is the feature matrix, y the target values
knn_clf.fit(x,y)
# Predict with predict(); x1 holds the new samples to classify
knn_clf.predict(x1)
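KNN can also do regression. Here is a sketch with scikit-learn's `KNeighborsRegressor` on made-up one-dimensional data. With the default uniform weights, the prediction is the mean target of the k nearest neighbours.

```python
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical 1-D data where the target equals the feature value
x = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# For regression, the prediction is the mean target of the k nearest neighbours
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(x, y)
pred = knn_reg.predict([[2], [11]])
print(pred)  # means of the neighbours: 2.0 and 11.0
```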
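- Split the dataset
from sklearn.model_selection import train_test_split
# test_size is the test fraction (e.g. 0.2 for an 8:2 split); random_state fixes the shuffle so the split is reproducible (22 is an arbitrary example seed)
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=22)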
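This small sketch shows what `train_test_split` returns, using a hypothetical 10-sample dataset. `stratify=y` is an extra option that the notes do not use. It keeps the class proportions equal in the training and test parts.

```python
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples, 1 feature, balanced binary target
x = [[i] for i in range(10)]
y = [0, 1] * 5

# test_size=0.2 gives an 8:2 split; random_state makes the shuffle repeatable;
# stratify=y keeps the class proportions the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=22, stratify=y
)
print(len(X_train), len(X_test))  # 8 2
```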
- Evaluating a classification model
# Compute accuracy
from sklearn.metrics import accuracy_score
# Method 1: compare the true test labels with the predictions, e.g. y_predict = knn_clf.predict(X_test)
accuracy_score(y_test,y_predict)
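The notes list only "method 1". A common second way is the classifier's own `score()` method. For scikit-learn classifiers it returns mean accuracy, so it matches `accuracy_score`. This sketch uses the iris dataset with hypothetical split settings.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22
)
knn_clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Method 1: compare true and predicted labels
y_predict = knn_clf.predict(X_test)
acc1 = accuracy_score(y_test, y_predict)

# Method 2: the classifier's score() method (mean accuracy on the given data)
acc2 = knn_clf.score(X_test, y_test)
print(acc1, acc2)  # the two values are identical
```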
- Normalization and standardization
Purpose: map all features to the same scale so that features with different units or orders of magnitude can be compared and weighted fairly.
Any algorithm that computes distances (such as KNN) should have its features normalized or standardized 【key point】
Normalization: (X-Xmin)/(Xmax-Xmin) maps the data to [0,1]
Standardization: (X-Xmean)/Xstd gives data with mean 0 and standard deviation 1
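The two formulas can be checked directly with NumPy. The feature values below are made up for illustration. `np.std` defaults to the population standard deviation (ddof=0), which is also what `StandardScaler` uses.

```python
import numpy as np

# Hypothetical feature column
x = np.array([1.0, 8.0, 11.0])

# Normalization (min-max scaling): maps values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): mean 0, standard deviation 1
x_std = (x - x.mean()) / x.std()

print(x_norm)                     # 0.0, 0.7 and 1.0
print(x_std.mean(), x_std.std())  # approximately 0.0 and 1.0
```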
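from sklearn.preprocessing import StandardScaler,MinMaxScaler
# Create an instance (StandardScaler is used the same way)
minmax = MinMaxScaler()
# Learn min and max from the training set and transform it
X_train = minmax.fit_transform(X_train)
# Apply the same min and max to the test set (transform only, no refitting)
X_test = minmax.transform(X_test)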
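Example: for a feature whose minimum is 1 and maximum is 11, the value 8 becomes (8-1)/(11-1)=0.7; for a feature ranging from 1 to 101, the value 61 becomes (61-1)/(101-1)=0.6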
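The worked example can be reproduced with `MinMaxScaler`. This sketch assumes hypothetical training data whose columns span 1 to 11 and 1 to 101. The scalers learn their statistics from the training set only and reuse them on the test set. `StandardScaler` is shown alongside for comparison.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical training data: column 1 spans 1..11, column 2 spans 1..101
X_train = np.array([[1.0, 1.0], [11.0, 101.0], [6.0, 51.0]])
# Hypothetical new sample matching the worked example: values 8 and 61
X_test = np.array([[8.0, 61.0]])

minmax = MinMaxScaler()
X_train_minmax = minmax.fit_transform(X_train)  # learns min/max from training data
X_test_minmax = minmax.transform(X_test)        # reuses them: about [[0.7, 0.6]]
print(X_test_minmax)

std = StandardScaler()
X_train_std = std.fit_transform(X_train)        # learns mean/std per column
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))  # about 0 and 1
```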
- Grid search and cross-validation
Purpose: make the model more accurate and reliable (model tuning)
sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
- Description: exhaustively searches over the specified parameter values for an estimator
- Parameters:
- estimator: the estimator object
- param_grid: the estimator parameters to try (dict), e.g. {"n_neighbors":[1,3,5]}
- cv: the number of folds for cross-validation
- Methods:
- fit: fit on the training data
- score: evaluate on the given data (accuracy for classifiers)
- Result analysis:
- best_score_: the best mean score achieved in cross-validation
- best_estimator_: the model with the best parameters
- cv_results_: the validation-set scores for every parameter combination in each cross-validation split (training-set scores are included only when return_train_score=True)
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Hyperparameters to search over
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Fit on the training data (X_train, y_train from train_test_split)
estimator.fit(X_train, y_train)
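Here is a self-contained version of the grid search above. It assumes the bundled iris dataset and an arbitrary 8:2 split. Feature scaling is omitted for brevity. For distance-based models, scaling is usually placed inside a `Pipeline` so that each cross-validation fold is scaled separately.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22, stratify=y
)

# 3 values of n_neighbors x 2 weighting schemes = 6 combinations, each with 3-fold CV
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(KNeighborsClassifier(), param_grid=param_dict, cv=3)
estimator.fit(X_train, y_train)

print("best params:", estimator.best_params_)
print("best CV score:", estimator.best_score_)
print("best model:", estimator.best_estimator_)
print("test accuracy:", estimator.score(X_test, y_test))
# Mean validation score for each of the 6 parameter combinations
print(estimator.cv_results_["mean_test_score"])
```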