What is machine learning? (Fundamentals)

2022-06-25 20:44:00 【Chengshaoting】

Fundamentals of machine learning

Feature: a column in a dataset (x)
Target: the column to be predicted (y); it may hold continuous values (0, 1, 2, 3, 4, 5, …) or discrete values (categories)
Sample: a row of data; the number of rows in the dataset is the number of samples

Vector: for example, [0, 1, 2, 3]

Feature engineering: the process of processing the data; it determines how well the model predicts

Feature extraction
Feature transformation
Dimensionality reduction

Splitting the dataset (historical data => y), typically 7:3, 8:2, or 9:1 (training:test)

Training set (used to train the model)
Test set (used to test how well the trained model performs)
- The actual y values: y_true
- The values predicted by the model: y_pred
- Comparing y_true with y_pred shows how well the model performs
- Accuracy: 70% or above

Types of machine learning

Supervised learning (there is a target value y)
- Regression problems (the target y is a continuous value)
  - Forecasting house prices
  - Forecasting stock trends
  - Forecasting a company's sales
- Classification problems (the target y is a category)
  - Binary and multiclass classification
    - Whether an email is spam
    - Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering (birds of a feather flock together)
  - User segmentation (user profiling)
- Dimensionality reduction
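
To make the "no target value y" point concrete, here is a minimal clustering sketch. It assumes scikit-learn's KMeans as the clustering algorithm (the notes above do not name one), and the customer data is made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [monthly visits, monthly spend]. There is no y column.
X = np.array([
    [2.0, 10.0], [3.0, 12.0], [2.5, 11.0],      # low-activity customers
    [20.0, 95.0], [22.0, 100.0], [21.0, 98.0],  # high-activity customers
])

# Ask for two groups; the algorithm finds them from the features alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)

print(segments)  # one cluster id per customer
```

The cluster ids themselves are arbitrary; what matters is which customers end up sharing an id.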

Machine learning workflow [key point]

Get the data
Basic data processing (time-consuming)
Feature engineering (time-consuming)
- Normalization and standardization
- Dimensionality reduction
- Dataset splitting
- Feature derivation
- Feature crossing
Apply a machine learning algorithm (train the model)
Evaluate the model

Overfitting and underfitting

Overfitting: the model performs well on the training set but poorly on the test set or on unseen data
- Clean the data again
- Increase the amount of training data
- Use regularization to penalize the parameters
Underfitting: the model performs poorly on the training set, the test set, and unseen data alike
- Add more features
- Add polynomial features
- Reduce the regularization parameter
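
Two of the remedies above, polynomial features and regularization, can be sketched as follows. This is an illustration under assumptions, not the author's method: it uses scikit-learn's PolynomialFeatures and Ridge (L2 regularization), and the noisy sine data, the degree, and the alpha values are all made up.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noisy sine curve with a single input feature
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(scale=0.1, size=30)

# Adding polynomial features gives the model more capacity (a remedy for underfitting)
X_poly = PolynomialFeatures(degree=8, include_bias=False).fit_transform(X)

# alpha is the regularization strength: a larger alpha penalizes large coefficients more
weak = Ridge(alpha=1e-3).fit(X_poly, y)
strong = Ridge(alpha=10.0).fit(X_poly, y)

weak_norm = np.linalg.norm(weak.coef_)
strong_norm = np.linalg.norm(strong.coef_)
print(weak_norm, strong_norm)  # the stronger penalty yields smaller coefficients
```

Raising alpha pushes toward a simpler model (useful against overfitting); lowering it does the opposite (useful against underfitting).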

KNN Algorithm

The KNN algorithm is suited to classification problems, such as binary classification.

It can also be used for regression problems.

Idea: take the k nearest points and use them to predict the unknown sample (classification). k defaults to 5; common choices are 1, 3, 5, and 7.
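
The idea can be sketched from scratch in a few lines. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical, not part of any library.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # The most common label among those neighbours wins
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups labelled 0 and 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3)
label_b = knn_predict(X_train, y_train, np.array([5.1, 5.0]), k=3)
print(label_a, label_b)
```

An odd k, as in the common choices listed above, avoids tied votes in binary classification.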

KNN api usage

from sklearn.neighbors import KNeighborsClassifier

# Create the classifier
knn_clf = KNeighborsClassifier(n_neighbors=6)

# Train the model with fit(); x holds the features, y the target
knn_clf.fit(x, y)

# Predict labels for new samples x1 with predict()
knn_clf.predict(x1)

Splitting the dataset

from sklearn.model_selection import train_test_split

# test_size=0.3 gives a 7:3 split; random_state=42 is an arbitrary example seed
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

Evaluation of classification models

# Compute accuracy
from sklearn.metrics import accuracy_score
# Method 1: compare the true test labels with the model's predictions
y_predict = knn_clf.predict(X_test)
accuracy_score(y_test, y_predict)
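
The three snippets above (classifier, split, accuracy) can be combined into one runnable sketch. The Iris dataset, the 7:3 split, the seed, and k=5 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example data (an assumption): the Iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# 7:3 split between training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train on the training set only
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_train, y_train)

# Predict on the held-out test set and compare with the true labels
y_predict = knn_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_predict)

# Accuracy is simply the fraction of predictions that match the true labels
manual_accuracy = np.mean(y_test == y_predict)
print(accuracy)
```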

Normalization and standardization

Purpose: map all data onto the same scale so that features with different units or orders of magnitude can be compared and weighted.

Any algorithm that computes distances must be given normalized or standardized data [key point]

Normalization: (X - Xmin) / (Xmax - Xmin) maps the data to [0, 1]

Standardization: (X - Xmean) / Xstd gives the data a mean of 0 and a standard deviation of 1

from sklearn.preprocessing import StandardScaler,MinMaxScaler

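# Create an instance
minmax = MinMaxScaler()
# Fit on the training set only, then reuse the same min and max on the test set
X_train_scaled = minmax.fit_transform(X_train)
X_test_scaled = minmax.transform(X_test)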

Normalization examples:

(8 - 1) / (11 - 1) = 0.7

(61 - 1) / (101 - 1) = 0.6
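
Both formulas can be checked by hand against scikit-learn. In this sketch the sample matrix is hypothetical, chosen so that its middle row reproduces the two calculations above (one column ranges over 1..11, the other over 1..101).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: column 0 ranges over 1..11, column 1 over 1..101
X = np.array([[1.0, 1.0],
              [8.0, 61.0],
              [11.0, 101.0]])

# Normalization by hand: (X - Xmin) / (Xmax - Xmin), computed per column
manual_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
sk_minmax = MinMaxScaler().fit_transform(X)

# Standardization by hand: (X - Xmean) / Xstd, computed per column
manual_std = (X - X.mean(axis=0)) / X.std(axis=0)
sk_std = StandardScaler().fit_transform(X)

print(manual_minmax[1])  # middle row after normalization
```

Both scalers work column by column, so each feature is rescaled independently of the others.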

Grid search and cross validation

Purpose: make the model more accurate and reliable (model tuning)

sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)

Description: exhaustive search over the specified parameter values for an estimator
Parameters:
- estimator: the estimator object
- param_grid: the estimator parameters to search, as a dict, e.g. {"n_neighbors": [1, 3, 5]}
- cv: the number of folds for cross-validation
Methods:
- fit: takes the training data
- score: returns the accuracy
Result attributes:
- best_score_: the best score obtained in cross-validation
- best_estimator_: the model with the best parameters
- cv_results_: the validation-set scores for each parameter combination (training-set scores are included only when return_train_score=True)

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Prepare the hyperparameters to tune
param_dict = {
    "n_neighbors": [1, 3, 5],
    "weights": ["uniform", "distance"],
}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Fit on the training data
estimator.fit(X_train, y_train)
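
A self-contained version that also reads back the result attributes might look like the sketch below. The Iris dataset, the split, and the seed are assumptions for illustration; the parameter grid is the one used above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example data (an assumption): the Iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3 values of n_neighbors x 2 values of weights = 6 candidate models
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_dict, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)     # the winning parameter combination
print(search.best_score_)      # its mean cross-validation accuracy
print(search.best_estimator_)  # the model refitted with those parameters

# score() evaluates the refitted best model on data held out from the search
test_accuracy = search.score(X_test, y_test)
n_candidates = len(search.cv_results_["params"])
```

Keeping the test set out of the search means `test_accuracy` is measured on data that played no part in choosing the hyperparameters.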

Copyright notice
This article was written by [Chengshaoting]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202181341042921.html
