scikit-learn: steps and understanding of machine learning application development
2022-07-29 05:14:00 【m0_ sixty-five million one hundred and eighty-seven thousand fo】
scikit-learn is an open-source machine learning toolkit for Python. It implements almost all mainstream machine learning algorithms behind a consistent calling interface, and it builds on Python numerical libraries such as NumPy and SciPy to provide efficient implementations.
1. Data collection and labeling
First collect the data, then label it. The collected data should be representative, so that the model trained on it is accurate.
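One rough way to check representativeness is to look at how many samples each label contributes. The sketch below assumes the handwritten digits data set bundled with scikit-learn; the `balance` ratio is an illustrative measure introduced here, not part of any library.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Count how many samples each digit class (0-9) contributes.
counts = np.bincount(digits.target)
for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} samples")

# Illustrative measure: ratio between the rarest and the most common class.
# A value close to 1 suggests the classes are roughly balanced.
balance = counts.min() / counts.max()
print(f"balance ratio: {balance:.2f}")
```

A balanced label distribution does not prove the data is representative, but a heavily skewed one is an early warning sign.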
2. Feature selection
An intuitive way to select features is to use each pixel of the image directly as a feature.
The data is stored as an array object of shape (number of samples × number of features). scikit-learn uses NumPy array objects to represent data. All image data is stored in digits.images, where each element is an 8×8 grayscale image.
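The relationship between the 8×8 images and the per-pixel feature vectors can be sketched as follows. This assumes the bundled digits data set; `flattened` and `same` are names introduced only for this illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

print(digits.images.shape)  # (number of samples, 8, 8)
print(digits.data.shape)    # (number of samples, 64)

# Flatten each 8x8 image into a row of 64 pixel features.
flattened = digits.images.reshape(len(digits.images), -1)

# digits.data should hold exactly these flattened rows.
same = np.array_equal(flattened, digits.data)
print(same)
```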
3. Data cleaning
Preprocess collected data that is not suitable for machine learning training, converting it into data that is.
Purpose: reduce computation and keep the model stable.
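As one possible preprocessing step, pixel values can be rescaled to a common range. This is only a sketch: the worked example in this article trains on the raw pixel values, and the choice of `MinMaxScaler` here is an assumption made for illustration.

```python
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler

digits = datasets.load_digits()

# Raw pixel intensities before scaling.
print(digits.data.min(), digits.data.max())

# Rescale every feature (pixel position) to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(digits.data)
print(X_scaled.min(), X_scaled.max())
```

In a real project the scaler would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.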
4. Model selection
Different models perform differently on different data sets, so several factors should be weighed when choosing a model in order to improve how well the chosen model fits the data.
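One common way to compare candidate models is cross-validation. The sketch below is an assumption about how such a comparison might look: the three candidates and the 5-fold setting are picked for illustration, and only the SVC settings come from the example later in this article.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()

# Hypothetical shortlist of candidate models.
candidates = {
    "SVC": svm.SVC(gamma=0.001, C=100.),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=3),
    "Gaussian naive Bayes": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each candidate.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, digits.data, digits.target, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```

Accuracy is only one factor. Training time, prediction speed, and interpretability may matter just as much for a given application.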
5. Model training
Before training, split the data set into a training data set and a test data set. Then train the model on the training data set to obtain the trained model parameters.
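The split-then-train step can be sketched as below, using the same split settings as the example later in this article. Printing `support_vectors_` is just one way to inspect what an SVC has learned and is added here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 20% of the samples as the test data set.
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    digits.data, digits.target, test_size=0.20, random_state=2)
print(Xtrain.shape, Xtest.shape)

# Train only on the training data set.
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(Xtrain, Ytrain)

# For an SVC, the learned parameters include the support vectors.
print(clf.support_vectors_.shape)
```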
6. Model testing
An intuitive way to test the model: use the trained model to predict the test data set, then compare the predictions with the true labels. The fraction of predictions that match is the accuracy of the model.
scikit-learn provides a method for this:
clf.score(Xtest, Ytest)
In addition, you can display some images from the test data set directly, with the predicted value in the lower-left corner of each image and the true value in the lower-right corner.
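The comparison described above can be written out by hand and checked against the library helpers. This is a minimal sketch; `manual`, `via_metric`, and `via_score` are names introduced for this illustration.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    digits.data, digits.target, test_size=0.20, random_state=2)
clf = svm.SVC(gamma=0.001, C=100.).fit(Xtrain, Ytrain)

Ypred = clf.predict(Xtest)

# Accuracy by hand: the fraction of predictions equal to the true labels.
manual = np.mean(Ypred == Ytest)

# The same quantity through the two library routes.
via_metric = accuracy_score(Ytest, Ypred)
via_score = clf.score(Xtest, Ytest)

print(manual, via_metric, via_score)
```

For a classifier, `score` reports mean accuracy, so all three values should agree.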
7. Model saving and loading
Once a satisfactory model has been trained, it can be saved. The next time a prediction is needed, the saved model can be loaded and used directly, with no need to train it again.
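A save-and-reload round trip might look like the sketch below. The temporary directory, the file name `digits_svm_demo.pkl`, and the 1000-sample cut-off are assumptions chosen to keep the illustration self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()

# Train on the first 1000 samples, keep the rest for checking predictions.
clf = svm.SVC(gamma=0.001, C=100.).fit(digits.data[:1000], digits.target[:1000])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "digits_svm_demo.pkl")
    joblib.dump(clf, path)         # save the trained model
    restored = joblib.load(path)   # load it back without retraining

# The restored model should predict exactly like the original.
same_predictions = np.array_equal(
    clf.predict(digits.data[1000:]), restored.predict(digits.data[1000:]))
print(same_predictions)
```

Saved model files are generally meant to be loaded with the same scikit-learn version that produced them, and only from sources you trust.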
8. Example
Data collection and labeling
# Import libraries (%matplotlib inline is a Jupyter notebook magic command)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# scikit-learn ships with several built-in data sets.
# This example uses the handwritten digit recognition images.
# Import the datasets module from sklearn
from sklearn import datasets
# Load the data with the load_digits() function of the datasets module
digits = datasets.load_digits()
# Display the images represented by the data
images_and_labels = list(zip(digits.images, digits.target))
plt.figure(figsize=(8, 6))
for index, (image, label) in enumerate(images_and_labels[:8]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Digit: %i' % label, fontsize=20)
Feature selection
# The data is stored as an array of shape (number of samples, number of features)
# digits.data already holds the data in this format
print("shape of raw image data: {0}".format(digits.images.shape))
print("shape of data: {0}".format(digits.data.shape))
Model training
# Split the data into a training data set and a test data set (20% of the data is used as the test data set)
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(digits.data, digits.target, test_size=0.20, random_state=2)
# Use a support vector machine as the model
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100., probability=True)
# Train the model on the training data set Xtrain and Ytrain
clf.fit(Xtrain, Ytrain)
Model testing
# sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
# normalize: defaults to True, which returns the fraction of correctly classified samples;
# if False, returns the number of correctly classified samples
# Evaluate the accuracy of the model (normalize defaults to True here, so the fraction correct, i.e. the accuracy, is returned)
from sklearn.metrics import accuracy_score
# predict returns the predicted labels from the trained model
Ypred = clf.predict(Xtest)
accuracy_score(Ytest, Ypred)
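A single accuracy number hides which digits get confused with which. As a possible follow-up, the sketch below rebuilds the same split and model and prints a confusion matrix and a per-class report; this goes beyond the original example and is offered only as an illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    digits.data, digits.target, test_size=0.20, random_state=2)
clf = svm.SVC(gamma=0.001, C=100.).fit(Xtrain, Ytrain)
Ypred = clf.predict(Xtest)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(Ytest, Ypred, labels=list(range(10)))
print(cm)

# Per-class precision, recall and F1 score.
print(classification_report(Ytest, Ypred))

# The diagonal holds the correct predictions, so it reproduces the accuracy.
accuracy_from_cm = cm.trace() / cm.sum()
print(accuracy_from_cm, accuracy_score(Ytest, Ypred))
```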
# Display some images from the test data set.
# The lower-left corner of each image shows the predicted value,
# the lower-right corner shows the true value.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for i, ax in enumerate(axes.flat):
    ax.imshow(Xtest[i].reshape(8, 8), cmap=plt.cm.gray_r, interpolation='nearest')
    # Predicted value: green if correct, red if wrong
    ax.text(0.05, 0.05, str(Ypred[i]), fontsize=32,
            transform=ax.transAxes,
            color='green' if Ypred[i] == Ytest[i] else 'red')
    # True value
    ax.text(0.8, 0.05, str(Ytest[i]), fontsize=32,
            transform=ax.transAxes,
            color='black')
    ax.set_xticks([])
    ax.set_yticks([])
Model saving and loading
# Save the trained model
import joblib
joblib.dump(clf, 'digits_svm.pkl')
An error occurred while saving the model parameters with the old import, from sklearn.externals import joblib.
Cause: sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in later releases, so this import no longer works in current versions.
Solution: change from sklearn.externals import joblib to import joblib.
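For code that has to run on both old and new installations, a guarded import is one option. This is a sketch under the assumption that the fallback branch is only ever reached on very old scikit-learn releases that still bundle joblib.

```python
try:
    # Standalone package: the import to use with current scikit-learn versions.
    import joblib
except ImportError:
    # Assumption: only very old scikit-learn releases still provide this module.
    from sklearn.externals import joblib

print(joblib.__version__)
```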
# Load the saved model and predict directly
clf = joblib.load('digits_svm.pkl')
Ypred = clf.predict(Xtest)
clf.score(Xtest, Ytest)