scikit-learn: steps and understanding of machine learning application development
2022-07-29 05:14:00 【m0_ sixty-five million one hundred and eighty-seven thousand fo】
scikit-learn is an open-source machine learning toolkit for Python. It implements almost all mainstream machine learning algorithms behind a consistent calling interface. It is built on Python numerical libraries such as NumPy and SciPy, which gives it efficient algorithm implementations.
1. Data collection and labeling
First collect the data, then label it. The collected data should be representative of the problem, because the accuracy of the final trained model depends on it.
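One rough way to check whether labeled data is representative is to count the samples per label. The sketch below does this for the handwritten digits data set bundled with scikit-learn, which is the data set used in the example later in this post. Treating the min/max class ratio as a balance indicator is an assumption made here for illustration, not something the original post prescribes.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Count how many samples carry each label (0-9).
counts = np.bincount(digits.target)
for label, count in enumerate(counts):
    print(f"digit {label}: {count} samples")

# Ratio between the rarest and the most common class; a value close to 1
# suggests the classes are roughly balanced (illustrative heuristic only).
balance = counts.min() / counts.max()
print(f"min/max class ratio: {balance:.2f}")
```

Balanced class counts do not prove the data is representative, but a badly skewed count is an early warning sign.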
2. Feature selection
An intuitive way to select features is to use each pixel of the image directly as a feature.
The data is stored as an array of shape (number of samples × number of features). scikit-learn uses NumPy array objects to represent data. All image data is stored in digits.images, where each element is an 8×8 grayscale image.
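The relationship between the two layouts can be checked directly. The following sketch flattens each 8×8 image into a row of 64 pixel features and compares the result with digits.data. The variable names flat and same are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# digits.images has shape (n_samples, 8, 8): one 8x8 grayscale image per sample.
n_samples = digits.images.shape[0]

# Flatten each image into a single row of 64 pixel values (one feature per pixel).
flat = digits.images.reshape(n_samples, -1)

print(digits.images.shape)  # image layout
print(flat.shape)           # samples x features layout

# digits.data holds the same values, already in samples x features form.
same = np.array_equal(flat, digits.data)
print(same)
```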
3. Data cleaning
Preprocess collected data that is not suitable for machine learning training so that it is converted into data that is.
Purpose: reduce the amount of computation and keep the model stable.
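As one possible illustration of such preprocessing, the sketch below rescales the digit pixel intensities into the range 0 to 1 with MinMaxScaler. The choice of scaler is an assumption made for this sketch; the original post does not name a specific cleaning technique, and the worked example later in this post uses the raw pixel values.

```python
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler

digits = datasets.load_digits()
X = digits.data  # raw pixel intensities

# Rescale every feature (pixel position) into the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print("raw range:   ", X.min(), X.max())
print("scaled range:", X_scaled.min(), X_scaled.max())
```

In a real project the scaler would normally be fitted on the training data only and then applied to the test data, so that no information from the test set leaks into training.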
4. Model selection
Different models perform differently on different data sets, so several factors should be weighed when choosing a model in order to improve how well the chosen model fits the data.
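One common way to compare candidate models is cross-validation. The sketch below scores three classifiers on the digits data with cross_val_score. The list of candidates and the 5-fold setting are assumptions chosen for illustration; only the SVC parameters come from the example later in this post.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Hypothetical shortlist of candidate models.
candidates = {
    "SVC": svm.SVC(gamma=0.001, C=100.0),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "GaussianNB": GaussianNB(),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")
```

Accuracy is only one factor. Training time, prediction time, and how easy the model is to interpret may matter just as much for a given application.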
5. Model training
Before training, split the data set into a training data set and a test data set. Then train the model on the training data set to obtain the trained model parameters.
6. Model test
An intuitive way to test a model is to use the trained model to predict the test data set and then compare the predictions with the true labels. The proportion of matching results is the accuracy of the model.
scikit-learn provides a method that does this:
clf.score(Xtest, Ytest)
In addition, you can display some images from the test data set directly, with the predicted value shown in the lower-left corner of each image and the true value in the lower-right corner.
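Beyond a single accuracy number, a confusion matrix shows which digits get mistaken for which. The sketch below is an addition to the original post: it reuses the split and SVC parameters from the example in section 8 and checks that the accuracy derived from the matrix agrees with clf.score.

```python
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    digits.data, digits.target, test_size=0.20, random_state=2
)

clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(Xtrain, Ytrain)
Ypred = clf.predict(Xtest)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(Ytest, Ypred)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy = cm.trace() / cm.sum()
print(f"accuracy from confusion matrix: {accuracy:.3f}")
print(f"clf.score:                      {clf.score(Xtest, Ytest):.3f}")
```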
7. Model saving and loading
Once a satisfactory model has been trained, it can be saved. The next time a prediction is needed, the saved model can be loaded and used directly, with no need to train it again.
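A quick way to gain confidence in a saved model is to reload it and confirm that it predicts exactly what the original object predicts. The sketch below does this in a temporary directory. The slicing of the data into the first 1000 samples for training and the rest for checking is an arbitrary choice made for this illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()

# Train on the first 1000 samples (arbitrary split for this sketch).
clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(digits.data[:1000], digits.target[:1000])

# Save the model to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "digits_svm.pkl")
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
held_out = digits.data[1000:]
identical = np.array_equal(clf.predict(held_out), restored.predict(held_out))
print(identical)
```

A model file should only be loaded from a trusted source, since joblib files are pickle-based and loading one can execute arbitrary code.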
8. Example
Data collection and labeling
# Import libraries
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
"""
scikit-learn ships with several built-in data sets.
This example uses the handwritten digit recognition images.
"""
# Import the datasets module from sklearn
from sklearn import datasets
# Load the data with the load_digits() function of the datasets module
digits = datasets.load_digits()
# Display the first eight images together with their labels
images_and_labels = list(zip(digits.images, digits.target))
plt.figure(figsize=(8, 6))
for index, (image, label) in enumerate(images_and_labels[:8]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Digit: %i' % label, fontsize=20)
Feature selection
# The data is stored as an array of shape (number of samples x number of features)
# This flattened form is already available in digits.data
print("shape of raw image data: {0}".format(digits.images.shape))
print("shape of data: {0}".format(digits.data.shape))
Model training
# Split the data into a training data set and a test data set (20% of the data is held out for testing)
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(digits.data, digits.target, test_size=0.20, random_state=2)
# Use a support vector machine as the model
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100., probability=True)
# Train the model on the training data Xtrain and Ytrain
clf.fit(Xtrain, Ytrain);
Model test
"""
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
normalize: defaults to True, which returns the proportion of correctly classified samples; if False, returns the number of correctly classified samples
"""
# Evaluate the accuracy of the model (normalize defaults to True, so the proportion of correct predictions, i.e. the accuracy, is returned)
from sklearn.metrics import accuracy_score
# predict returns the predicted label for each test sample
Ypred = clf.predict(Xtest)
accuracy_score(Ytest, Ypred)
"""
Display some images from the test data set.
The lower-left corner of each image shows the predicted value; the lower-right corner shows the true value.
"""
# Inspect the predictions
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for i, ax in enumerate(axes.flat):
    ax.imshow(Xtest[i].reshape(8, 8), cmap=plt.cm.gray_r, interpolation='nearest')
    # Predicted value: green if correct, red if wrong
    ax.text(0.05, 0.05, str(Ypred[i]), fontsize=32,
            transform=ax.transAxes,
            color='green' if Ypred[i] == Ytest[i] else 'red')
    # True value
    ax.text(0.8, 0.05, str(Ytest[i]), fontsize=32,
            transform=ax.transAxes,
            color='black')
    ax.set_xticks([])
    ax.set_yticks([])
Model saving and loading
# Save the trained model
import joblib
joblib.dump(clf, 'digits_svm.pkl');
An error can occur when saving the model if joblib is imported with from sklearn.externals import joblib.
Reason: sklearn.externals.joblib was deprecated in scikit-learn 0.21 and has been removed in later versions.
Solution: change from sklearn.externals import joblib to import joblib.
# Load the saved model and predict directly, without retraining
clf = joblib.load('digits_svm.pkl')
Ypred = clf.predict(Xtest)
clf.score(Xtest, Ytest)