
scikit-learn: Steps and Understanding of Machine Learning Application Development

2022-07-29 05:14:00 【m0_ sixty-five million one hundred and eighty-seven thousand fo】

scikit-learn is an open-source machine learning toolkit for Python. It implements almost all mainstream machine learning algorithms and exposes them through a consistent interface. It is built on Python numerical libraries such as NumPy and SciPy, which gives it efficient algorithm implementations.
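The following minimal sketch shows what the "consistent interface" looks like in practice. It uses the bundled digits dataset. StandardScaler and KNeighborsClassifier are illustrative choices and do not come from the original article. Transformers share fit/transform, and predictors share fit/predict/score. Because the interfaces match, estimators can be chained with make_pipeline, and the resulting pipeline behaves like any other estimator.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Transformers share fit / transform
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Predictors share fit / predict / score
knn = KNeighborsClassifier().fit(X_train_scaled, y_train)
manual_score = knn.score(scaler.transform(X_test), y_test)
print("k-NN accuracy:", manual_score)

# Because the interfaces match, steps can be chained; the pipeline is itself an estimator
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)
pipe_score = pipe.score(X_test, y_test)
print("pipeline accuracy:", pipe_score)
```

Both scores should agree, because the pipeline runs exactly the same two steps.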

Contents

1. Data collection and labeling
2. Feature selection
3. Data cleaning
4. Model selection
5. Model training
6. Model testing
7. Model saving and loading
8. Example
   Data collection and labeling
   Feature selection
   Model training
   Model testing
   Model saving and loading

1. Data collection and labeling

First collect the data, then label it. The collected data should be representative of the cases the model will face; otherwise the trained model will not be accurate.
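A quick way to sanity-check representativeness is to count how many samples each label has. This is a rough sketch that assumes the labels are integer class ids. The bundled digits dataset stands in for your own collected data, and the min/max ratio is an illustrative heuristic, not a standard threshold.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
labels = digits.target                 # one label per collected sample
counts = np.bincount(labels)           # number of samples for each class 0..9
proportions = counts / counts.sum()

for digit, (n, p) in enumerate(zip(counts, proportions)):
    print(f"digit {digit}: {n} samples ({p:.1%})")

# A class with far fewer samples than the others hints that the data is not representative
min_ratio = counts.min() / counts.max()
print("smallest/largest class ratio:", round(min_ratio, 2))
```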

2. Feature selection

An intuitive way to choose features is to use each pixel of the image directly as a feature.

The data is stored as an array of shape (number of samples × number of features). scikit-learn represents data with NumPy arrays. In the digits dataset, all image data is stored in digits.images, where each element is an 8×8 grayscale image.
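Using "each pixel as a feature" means flattening every 8×8 image into a row of 64 values. The sketch below does this with reshape and checks that the result matches digits.data, which already holds the flattened array.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
images = digits.images                   # shape (n_samples, 8, 8)
n_samples = len(images)

# Flatten each 8x8 image into 64 pixel features: (n_samples, 8, 8) -> (n_samples, 64)
X = images.reshape((n_samples, -1))
print(images.shape, "->", X.shape)       # (1797, 8, 8) -> (1797, 64)

# digits.data already holds the same flattened array
print(np.array_equal(X, digits.data))
```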

3. Data cleaning

Raw collected data is often unsuitable for machine learning training as-is, so it must be preprocessed into a form that a learning algorithm can use.

Purpose: reduce computation and keep the model stable.
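The article does not name specific cleaning steps. The sketch below shows two common ones as an illustration. VarianceThreshold drops features that never change, which reduces computation. MinMaxScaler rescales the remaining features to [0, 1], so no feature dominates purely because of its scale. Which steps your own data needs is an assumption to verify.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

# Drop features with zero variance (e.g. pixels that are 0 in every image):
# they carry no information and only add computation.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)

# Rescale every remaining feature to the range [0, 1].
scaler = MinMaxScaler()
X_clean = scaler.fit_transform(X_reduced)

print("features before:", X.shape[1], "after:", X_reduced.shape[1])
print("value range:", X_clean.min(), "to", X_clean.max())
```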

4. Model selection

Different models perform differently on different datasets. Choose a model by weighing several factors, such as accuracy, training cost, and the size and nature of the data, so that the final model fits the data well.
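One common, data-driven way to compare candidate models is k-fold cross-validation. The sketch below scores three classifiers on the digits data with cross_val_score. The candidate list and the 5-fold setting are illustrative choices, not recommendations from the article.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

candidates = {
    "SVC": SVC(gamma=0.001, C=100.0),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    # 5-fold cross-validation: train on 4/5 of the data, score on the other 1/5, five times
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(mean_scores, key=mean_scores.get)
print("best candidate:", best)
```

Accuracy alone may not settle the choice. Training time and interpretability are also worth weighing.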

5. Model training

Before training, split the dataset into a training set and a test set. Then train the model on the training set to obtain the learned model parameters.
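A minimal sketch of the split-then-train step follows. Passing stratify=y, which the article does not use, keeps the class proportions similar in both parts. After fit, scikit-learn exposes the learned parameters as attributes whose names end in an underscore; for SVC these include support_vectors_ and n_support_.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hold out 20% of the data as the test set; stratify=y preserves class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

clf = SVC(gamma=0.001, C=100.0)
clf.fit(X_train, y_train)   # parameters are learned from the training set only

# Learned parameters are exposed as attributes ending in "_"
print("training samples:", len(X_train), "test samples:", len(X_test))
print("support vectors per class:", clf.n_support_)
print("support vector matrix shape:", clf.support_vectors_.shape)
```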

6. Model testing

An intuitive way to test a model is to use the trained model to predict the test set and compare the predictions with the true labels. The fraction of predictions that match is the model's accuracy.

scikit-learn provides a method that does this:

clf.score(Xtest, Ytest)

You can also display some images from the test set, with the predicted value shown in the lower-left corner of each image and the true value in the lower-right corner.
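To confirm that clf.score is simply "the fraction of correct predictions", the sketch below computes accuracy three ways and checks that they agree. It reuses the article's split settings (test_size=0.2, random_state=2), but the SVC is trained without probability=True to keep it fast.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=2)
clf = SVC(gamma=0.001, C=100.0).fit(Xtrain, Ytrain)

Ypred = clf.predict(Xtest)
manual = np.mean(Ypred == Ytest)          # fraction of predictions equal to the true label
via_score = clf.score(Xtest, Ytest)       # estimator's built-in mean accuracy
via_metric = accuracy_score(Ytest, Ypred) # sklearn.metrics function
print(manual, via_score, via_metric)
```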

7. Model saving and loading

Once you have trained a satisfactory model, save it. The next time you need predictions, load the saved model and use it directly, without training it again.
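The sketch below saves a trained model with joblib, which is installed as a scikit-learn dependency, and loads it back. It writes to a temporary directory so it leaves no files behind. The restored model should make exactly the same predictions as the original.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
clf = SVC(gamma=0.001, C=100.0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "digits_svm.pkl")
    joblib.dump(clf, path)          # write the trained model to disk
    restored = joblib.load(path)    # later (e.g. in another process): load it back

same = np.array_equal(clf.predict(X[:20]), restored.predict(X[:20]))
print("restored model gives identical predictions:", same)
```

Pickled models are generally only guaranteed to load under the same scikit-learn version that saved them, so record the version alongside the file.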

8. Example

Data collection and labeling

# Import libraries (%matplotlib inline is a Jupyter magic; drop it in a plain .py script)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
"""
scikit-learn ships with several built-in datasets.
Here we use images of handwritten digits.
"""
# Import the datasets module from sklearn
from sklearn import datasets
# Load the data with load_digits() from the datasets module
digits = datasets.load_digits()
# Display the first few images together with their labels
images_and_labels = list(zip(digits.images, digits.target))
plt.figure(figsize=(8, 6))
for index, (image, label) in enumerate(images_and_labels[:8]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Digit: %i' % label, fontsize=20);

Feature selection

# The data is stored as an array of shape (number of samples x number of features).
# The flattened pixel features are already available in digits.data.
print("shape of raw image data: {0}".format(digits.images.shape))
print("shape of data: {0}".format(digits.data.shape))

Model training

# Split the data into a training set and a test set (20% of the data is used as the test set)
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(digits.data, digits.target, test_size=0.20, random_state=2);

# Create a support vector machine (SVM) classifier
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100., probability=True)

# Train the model on the training set (Xtrain, Ytrain)
clf.fit(Xtrain, Ytrain);

Model testing

"""
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
normalize: defaults to True, which returns the fraction of correctly classified samples;
if False, returns the number of correctly classified samples.
"""

# Evaluate the model's accuracy (normalize defaults to True, so this returns the fraction correct)
from sklearn.metrics import accuracy_score

# predict returns the predicted labels for the test data
Ypred = clf.predict(Xtest);
accuracy_score(Ytest, Ypred)
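The effect of the normalize parameter described above is easiest to see on a tiny hand-made example with one mistake out of four predictions. The labels here are made up purely for illustration.

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 3]
y_pred = [0, 2, 2, 3]   # the second prediction is wrong

fraction = accuracy_score(y_true, y_pred)                 # fraction correct: 0.75
count = accuracy_score(y_true, y_pred, normalize=False)   # number correct: 3 samples
print(fraction, count)
```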

"""
Display some images from the test set.
The lower-left corner of each image shows the predicted value (green if correct, red if wrong);
the lower-right corner shows the true value.
"""

# Inspect the predictions
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
fig.subplots_adjust(hspace=0.1, wspace=0.1)

for i, ax in enumerate(axes.flat):
    ax.imshow(Xtest[i].reshape(8, 8), cmap=plt.cm.gray_r, interpolation='nearest')
    ax.text(0.05, 0.05, str(Ypred[i]), fontsize=32,
            transform=ax.transAxes,
            color='green' if Ypred[i] == Ytest[i] else 'red')
    ax.text(0.8, 0.05, str(Ytest[i]), fontsize=32,
            transform=ax.transAxes,
            color='black')
    ax.set_xticks([])
    ax.set_yticks([])

Model saving and loading

# Save the trained model to disk
import joblib
joblib.dump(clf, 'digits_svm.pkl');

Saving the model with the older import "from sklearn.externals import joblib" raised an error.

Cause: sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in later versions, so this import only works in old releases.

Solution: replace "from sklearn.externals import joblib" with "import joblib".
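If the same script must also run in a very old environment, you can use a fallback import. The sketch below tries the standalone joblib package first and falls back to the vendored copy only if that import fails. The fallback branch is only relevant for old scikit-learn releases that predate joblib as a separate dependency.

```python
# Prefer the standalone joblib package (a scikit-learn dependency in current releases);
# fall back to the old vendored copy only on very old scikit-learn installs.
try:
    import joblib
except ImportError:
    from sklearn.externals import joblib  # old scikit-learn only; removed in later versions

print("joblib version:", joblib.__version__)
```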

# Load the saved model and use it to predict directly
clf = joblib.load('digits_svm.pkl')
Ypred = clf.predict(Xtest);
clf.score(Xtest, Ytest)

Copyright notice
This article was written by [m0_ sixty-five million one hundred and eighty-seven thousand fo]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/210/202207290506333892.html
