Machine learning plant leaf recognition
2022-07-06 06:39:00, by Nothing (sybh)
Plant leaf identification: the dataset "Leaf shape.csv" describes each leaf with three feature groups (margin, shape, and texture), each containing 64 numeric variables (64 × 3 = 192 variables in total). In addition, one categorical variable records the plant species each leaf belongs to, giving 193 variables overall. Apply feature selection methods, compare the similarities and differences of their results (20 points), and build a model that recognizes the leaf species (30 points).
Contents

1 Import packages
2 Draw the correlation matrix
3 PCA dimensionality reduction
4 KNN with grid-search tuning, before and after PCA
5 SVC
6 Logistic regression
Approach

1. Exploratory data analysis and visualization
2. Feature engineering (select features based on the correlation matrix; includes data preprocessing, filling missing values, and normalization)
3. Build and validate machine learning models
1 Import packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the correlation heatmap below
import seaborn as sns            # needed for sns.heatmap below
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
2 Draw the correlation matrix (select features for feature engineering based on the correlations)
Train = pd.read_csv("Leaf shape.csv")
Train.drop(['id'], inplace=True, axis=1)

# map_dic was not defined in the original post; build a name-to-code
# mapping from the species column itself
map_dic = {name: code for code, name in enumerate(Train['species'].unique())}
Train['species'] = Train['species'].map(map_dic)
Train_true = Train['species']

X = Train.drop(['species'], axis=1)
Y = Train['species']
# Draw the correlation matrix
corr = Train.corr()
f, ax = plt.subplots(figsize=(25, 25))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr, cmap=cmap, vmax=.3, center=0,
square=True, linewidths=.5)
plt.show()
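Beyond visual inspection, the correlation matrix can drive an actual selection step. A minimal sketch, assuming we drop one member of each highly correlated pair (the 0.95 threshold is a common but arbitrary choice, and synthetic data stands in for the leaf features since the CSV is not available here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
# "e" is a near-duplicate of "a", so it should be flagged
df["e"] = df["a"] * 0.99 + rng.normal(scale=0.01, size=100)

corr = df.corr().abs()
# keep only the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # ['e']
```

Comparing this result with what PCA keeps (section 3) is one way to discuss the "similarities and differences" the assignment asks for.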
Check for missing values

Train.isnull().any().any()
# False — the dataset has no missing values to fill
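No missing values are found here, but if there were, a simple column-mean fill is one option. A toy DataFrame (the leaf data itself is complete):

```python
import pandas as pd

df = pd.DataFrame({"margin1": [0.1, None, 0.3],
                   "shape1": [1.0, 2.0, None]})
# fill each column's gaps with that column's mean
filled = df.fillna(df.mean(numeric_only=True))
print(filled.isnull().any().any())  # False
```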
Split into training (80%) and test (20%) sets
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.2,random_state=123)
Standardize the data

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)  # reuse the training-set statistics; do not refit on test data
3 PCA dimensionality reduction
pca = PCA(n_components=0.9)
x_train_1 = pca.fit_transform(x_train)
x_test_1 = pca.transform(x_test)
# 44 components are retained
4 KNN with grid-search tuning, before and after PCA
from sklearn.neighbors import KNeighborsClassifier
knn_clf0 = KNeighborsClassifier()
knn_clf0.fit(x_train, y_train)
print('KNeighborsClassifier')
y_predict = knn_clf0.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
print("After PCA")
knn_clf1 = KNeighborsClassifier()
knn_clf1.fit(x_train_1, y_train)
print('KNeighborsClassifier')
y_predict = knn_clf1.predict(x_test_1)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
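The two classifiers above use default hyperparameters; the "grid-search tuning" the section title refers to can be done with GridSearchCV. A sketch on synthetic data (the parameter grid is an illustrative choice, not taken from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=123)

param_grid = {"n_neighbors": [3, 5, 7, 9],
              "weights": ["uniform", "distance"]}
# 5-fold cross-validation over every parameter combination
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(x_tr, y_tr)
print(search.best_params_)
print("Accuracy: {:.4%}".format(search.score(x_te, y_te)))
```

search.best_estimator_ can then be used in place of the untuned knn_clf0/knn_clf1 above.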
5 SVC
svc_clf = SVC(probability=True)
svc_clf.fit(x_train, y_train)
print("*"*30)
print('SVC, before PCA')
y_predict = svc_clf.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
svc_clf1 = SVC(probability=True)
svc_clf1.fit(x_train_1, y_train)
print("*"*30)
print('SVC, after PCA')
y_predict1 = svc_clf1.predict(x_test_1)
score = accuracy_score(y_test, y_predict1)
print("Accuracy: {:.4%}".format(score))
6 Logistic regression
from sklearn.linear_model import LogisticRegressionCV
lr = LogisticRegressionCV(multi_class="ovr",
fit_intercept=True,
Cs=np.logspace(-2,2,20),
cv=2,
penalty="l2",
solver="lbfgs",
tol=0.01)
lr.fit(x_train,y_train)
print('Logistic regression')
y_predict = lr.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
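A single 80/20 split can be noisy, so before declaring a winner it is worth comparing the three models with cross-validation. A sketch (synthetic data; the model settings are simplified versions of those used above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
models = {"KNN": KNeighborsClassifier(),
          "SVC": SVC(),
          "LogisticRegression": LogisticRegression(max_iter=1000)}
for name, model in models.items():
    # scale inside the pipeline so each CV fold fits its own scaler
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print("{}: {:.4f} (+/- {:.4f})".format(name, scores.mean(), scores.std()))
```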
Logistic regression achieved the highest accuracy, 98.65%. Note that feature selection and principal component analysis do not necessarily improve accuracy.