Plant Leaf Recognition with Machine Learning
2022-07-06 06:29:00 【啥也不会(sybh)】
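Plant leaf recognition: the leaf dataset "叶子形状.csv" (leaf shape) contains 64 numeric variables for each of three leaf characteristics, margin, shape, and texture (64 × 3 = 192 variables), plus one categorical variable recording the plant species each leaf belongs to, for 193 variables in total. Apply feature selection methods and compare the similarities and differences among the resulting feature sets (20 points). Then build a model that recognizes leaves from these features (30 points).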
Approach
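1. Data analysis and visualization
2. Feature engineering: select features based on the correlation matrix, and preprocess the data (fill missing values, normalize, etc.)
3. Train and evaluate machine learning models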
1 Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
2 Plot the correlation matrix (features for feature engineering are chosen based on it)
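Train = pd.read_csv("叶子形状.csv")
# The id column is only a row identifier, so drop it before building the feature matrix
Train.drop(['id'], axis=1, inplace=True)
X = Train.drop(['species'], axis=1)
Y = Train['species'].copy()
# Encode each species name as an integer so the label column can enter the correlation matrix
map_dic = {name: code for code, name in enumerate(Train['species'].unique())}
Train['species'] = Train['species'].map(map_dic)
Train_true = Train['species']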
# Plot the correlation matrix
corr = Train.corr()
f, ax = plt.subplots(figsize=(25, 25))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5)
plt.show()
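The post says features should be chosen from the correlation matrix but does not show the selection step. One common rule (an assumption here, not necessarily the author's method) is to drop one feature from every pair whose absolute Pearson correlation exceeds a threshold. The sketch below uses a small synthetic frame; the column names `margin1`, `margin2`, `shape1` and the 0.9 threshold are hypothetical. On the leaf data it would be applied to the feature columns only (X), not to the label.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return the columns to drop so that no remaining pair has |r| > threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is highly correlated with any column before it
    return [col for col in upper.columns if (upper[col] > threshold).any()]

rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "margin1": base,
    "margin2": base + rng.normal(scale=0.01, size=200),  # near-duplicate of margin1
    "shape1": rng.normal(size=200),                       # independent feature
})
to_drop = drop_highly_correlated(demo, threshold=0.9)
reduced = demo.drop(columns=to_drop)
print(to_drop)  # ['margin2']
```

Which member of a correlated pair survives depends on column order; a variant could instead keep the feature more strongly correlated with the label.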
Check for missing values
Train.isnull().values.any()
# False: there are no missing values, so no imputation is needed
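This dataset has no missing values, but the approach lists filling them as a preprocessing step. If a column did contain gaps, a minimal sketch with scikit-learn's `SimpleImputer` could look like this; the tiny matrices are hypothetical, and mean imputation is just one choice. The key point is that the fill values are learned from the training rows and reused on the test rows.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training/test matrices, each with one missing value
X_train_demo = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
X_test_demo = np.array([[np.nan, 4.0]])

imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train_demo)  # learns column means [3, 4], ignoring NaN
X_test_filled = imputer.transform(X_test_demo)        # reuses the training means
print(imputer.statistics_)  # [3. 4.]
print(X_test_filled)        # [[3. 4.]]
```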
Train/test split (80% training, 20% test)
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.2,random_state=123)
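When there are many classes and only a few samples per class, a purely random 20% split can leave some classes under-represented in the test set. Passing `stratify=` to `train_test_split` keeps class proportions the same in both parts. This is a suggested variant, not what the code above does; the sketch uses hypothetical labels (10 classes with 10 samples each).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 10 classes, 10 samples each
y = np.repeat(np.arange(10), 10)
X = np.arange(100).reshape(-1, 1)

_, _, _, y_te = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)
print(np.bincount(y_te, minlength=10))  # every class appears exactly twice
```

For the leaf data the equivalent call would pass `stratify=Y`.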
Standardize the data (zero mean, unit variance per feature)
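scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only
x_train = scaler.fit_transform(x_train)
# Reuse the training statistics on the test set; refitting here would leak test information
x_test = scaler.transform(x_test)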
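`StandardScaler` should learn its mean and standard deviation from the training split and apply those same statistics to the test split. Calling `fit_transform` on the test set instead rescales it with its own statistics, so the same raw value maps to a different scaled value than it would during training. The one-feature arrays below are hypothetical and chosen so the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature data: training mean 2, population std sqrt(8/3)
x_train_demo = np.array([[0.0], [2.0], [4.0]])
x_test_demo = np.array([[2.0], [6.0]])

scaler = StandardScaler().fit(x_train_demo)  # statistics come from the training rows only
right = scaler.transform(x_test_demo)        # (x - 2) / sqrt(8/3) -> [0, sqrt(6)]

# Refitting on the test rows silently switches to mean 4 and std 2
wrong = StandardScaler().fit_transform(x_test_demo)  # -> [-1, 1]
print(right.ravel(), wrong.ravel())
```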
3 PCA dimensionality reduction
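# A float n_components keeps the fewest components that explain more than 90% of the variance
pca = PCA(n_components=0.9)
x_train_1 = pca.fit_transform(x_train)
x_test_1 = pca.transform(x_test)
# 44 principal components are retained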
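The number of retained components (44 in the author's run) can be read from the fitted model rather than counted by hand: `n_components_` gives the count and `explained_variance_ratio_` the share of variance each component explains. A sketch on synthetic data standing in for the 192 standardized leaf features; the `make_classification` settings are arbitrary, so the printed count will not match 44.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 192 standardized features
X, _ = make_classification(n_samples=300, n_features=192, n_informative=20,
                           n_redundant=40, random_state=0)
X = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.9).fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("components kept:", pca.n_components_)
print("variance explained: {:.3f}".format(cumulative[-1]))  # just above 0.9
```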
4 KNN (default hyperparameters), before and after PCA
from sklearn.neighbors import KNeighborsClassifier
knn_clf0 = KNeighborsClassifier()
knn_clf0.fit(x_train, y_train)
print('KNeighborsClassifier')
y_predict = knn_clf0.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
print("After PCA")
knn_clf1 = KNeighborsClassifier()
knn_clf1.fit(x_train_1, y_train)
print('KNeighborsClassifier')
y_predict = knn_clf1.predict(x_test_1)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
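The KNN code above uses `KNeighborsClassifier` with its default settings (`n_neighbors=5`). A grid search over its hyperparameters could be sketched with `GridSearchCV`; putting the scaler inside a `Pipeline` means it is refit on each cross-validation training fold. The bundled iris dataset stands in for the leaf data, and the grid values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in dataset
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)

# The scaler is refit inside every CV training fold, so no fold sees another's statistics
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7],
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)
print(search.best_params_)
print("Test accuracy: {:.4%}".format(search.score(x_test, y_test)))
```

A `("pca", PCA(n_components=0.9))` step could be added between the scaler and the classifier to tune the post-PCA variant the same way.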
5 SVC
svc_clf = SVC(probability=True)
svc_clf.fit(x_train, y_train)
print("*"*30)
print('SVC')
y_predict = svc_clf.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
svc_clf1 = SVC(probability=True)
svc_clf1.fit(x_train_1, y_train)
print("*"*30)
print('SVC (after PCA)')
y_predict1 = svc_clf1.predict(x_test_1)
score = accuracy_score(y_test, y_predict1)
print("Accuracy: {:.4%}".format(score))
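The comparisons before and after PCA rest on a single 80/20 split, and one split's accuracy can move by a few points depending on which samples land in the test set. A more stable check (a suggestion, not the author's procedure) is k-fold cross-validation with the scaler and PCA inside a pipeline. The bundled digits dataset is only a stand-in, so its scores say nothing about the leaf data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in: 64 numeric features, 10 classes
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

# Scaling and PCA are refit inside every training fold
without_pca = make_pipeline(StandardScaler(), SVC())
with_pca = make_pipeline(StandardScaler(), PCA(n_components=0.9), SVC())

scores_plain = cross_val_score(without_pca, X, y, cv=cv)
scores_pca = cross_val_score(with_pca, X, y, cv=cv)
print("SVC:       {:.4f} +/- {:.4f}".format(scores_plain.mean(), scores_plain.std()))
print("PCA + SVC: {:.4f} +/- {:.4f}".format(scores_pca.mean(), scores_pca.std()))
```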
6 Logistic regression
from sklearn.linear_model import LogisticRegressionCV
lr = LogisticRegressionCV(multi_class="ovr",
fit_intercept=True,
Cs=np.logspace(-2,2,20),
cv=2,
penalty="l2",
solver="lbfgs",
tol=0.01)
lr.fit(x_train,y_train)
print('Logistic regression')
y_predict = lr.predict(x_test)
score = accuracy_score(y_test, y_predict)
print("Accuracy: {:.4%}".format(score))
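Logistic regression can also act as a feature selector: with an L1 penalty many coefficients become exactly zero, and `SelectFromModel` keeps only the columns with non-zero weight. This embedded method is offered as one option for the assignment's feature-selection part, not as what the post does. The synthetic data, `C=0.1`, and `solver="liblinear"` are illustrative choices; a smaller `C` keeps fewer features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 30 features, only the first 5 informative (shuffle=False keeps them first)
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
X = StandardScaler().fit_transform(X)

# The L1 penalty drives the weights of unhelpful features to exactly zero
l1_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)
selector = SelectFromModel(l1_lr).fit(X, y)  # L1 estimator -> default threshold 1e-5
kept = np.flatnonzero(selector.get_support())
X_reduced = selector.transform(X)
print("kept feature indices:", kept)
```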
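Logistic regression gives the highest accuracy: 98.65%.
Feature selection and principal component analysis do not necessarily improve accuracy.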
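The assignment asks for feature selection methods and a comparison of their results, while PCA is feature extraction: it builds new components instead of keeping a subset of the original columns. A minimal sketch comparing two filter methods, the ANOVA F-test (`f_classif`) and mutual information (`mutual_info_classif`), by the overlap of their top-k feature sets. The synthetic data and `k=10` are assumptions; on the leaf data one would pass the 192 feature columns and the species labels.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic stand-in: 40 features, first 6 informative and next 4 redundant
X, y = make_classification(n_samples=500, n_features=40, n_informative=6,
                           n_redundant=4, n_classes=3, shuffle=False, random_state=0)

def top_k(score_func, k=10):
    """Return the set of column indices that SelectKBest keeps under score_func."""
    selector = SelectKBest(score_func=score_func, k=k).fit(X, y)
    return set(np.flatnonzero(selector.get_support()).tolist())

anova = top_k(f_classif)  # linear, per-feature ANOVA F-test
# Mutual information also captures non-linear dependence; fix its seed for repeatability
mi = top_k(partial(mutual_info_classif, random_state=0))

shared = anova & mi
jaccard = len(shared) / len(anova | mi)
print("Selected by both:", sorted(shared))
print("ANOVA only:      ", sorted(anova - mi))
print("MI only:         ", sorted(mi - anova))
print("Jaccard overlap: {:.2f}".format(jaccard))
```

Features chosen by both methods are the most robust candidates; features chosen by only one method show where the linear and non-linear criteria disagree.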