2022-07-05 18:39:00 【Bayesian小孙】
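3. K-Nearest Neighbors (KNN)
(1) KNN concept: "k nearest neighbors" means that each sample can be represented by the k samples closest to it. (K-Nearest Neighbors)
(2) Algorithm idea: find the k samples in the dataset that are most similar to a given sample; if the majority of those k samples belong to one class, the sample is assigned to that class as well.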
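The voting idea described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the scikit-learn implementation used later in this post: the helper name `knn_predict`, the Euclidean distance, and the toy points are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two clusters in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # A
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # B
```

Because every prediction scans all training points, the cost grows with the size of the training set, which is one reason the post shrinks the data before fitting.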
3.1 Reading the data
import pandas as pd
# Read the data
data = pd.read_csv("./KNN_al/train.csv")
(index) | row_id | x | y | accuracy | time | place_id |
0 | 0 | 0.7941 | 9.0809 | 54 | 470702 | 8523065625 |
1 | 1 | 5.9567 | 4.7968 | 13 | 186555 | 1757726713 |
2 | 2 | 8.3078 | 7.0407 | 74 | 322648 | 1137537235 |
3 | 3 | 7.3665 | 2.5165 | 65 | 704587 | 6567393236 |
4 | 4 | 4.0961 | 1.1307 | 31 | 472130 | 7440663949 |
5 | 5 | 3.8099 | 1.9586 | 75 | 178065 | 6289802927 |
6 | 6 | 6.3336 | 4.3720 | 13 | 666829 | 9931249544 |
7 | 7 | 5.7409 | 6.7697 | 85 | 369002 | 5662813655 |
8 | 8 | 4.3114 | 6.9410 | 3 | 166384 | 8471780938 |
9 | 9 | 6.3414 | 0.0758 | 65 | 400060 | 1253803156 |
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29118021 entries, 0 to 29118020
Data columns (total 6 columns):
# Column Dtype
--- ------ -----
0 row_id int64
1 x float64
2 y float64
3 accuracy int64
4 time int64
5 place_id int64
dtypes: float64(2), int64(4)
memory usage: 1.3 GB
3.2 Processing the data
3.2.1 Shrinking the data with a query filter
data = data.query("x > 1.0 & x < 1.25 & y > 2.5 & y < 2.75")
(index) | row_id | x | y | accuracy | time | place_id |
600 | 600 | 1.2214 | 2.7023 | 17 | 65380 | 6683426742 |
957 | 957 | 1.1832 | 2.6891 | 58 | 785470 | 6683426742 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 400082 | 6889790653 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 514983 | 6822359752 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 732410 | 1527921905 |
6090 | 6090 | 1.1140 | 2.6262 | 11 | 145507 | 4000153867 |
6234 | 6234 | 1.1449 | 2.5003 | 34 | 316377 | 3741484405 |
6350 | 6350 | 1.0844 | 2.7436 | 65 | 36816 | 5963693798 |
7468 | 7468 | 1.0058 | 2.5096 | 66 | 746766 | 9076695703 |
8478 | 8478 | 1.2015 | 2.5187 | 72 | 690722 | 3992589015 |
3.2.2 Processing the time data
time_value = pd.to_datetime(data['time'], unit='s')
600 1970-01-01 18:09:40
957 1970-01-10 02:11:10
4345 1970-01-05 15:08:02
4735 1970-01-06 23:03:03
5580 1970-01-09 11:26:50
Name: time, dtype: datetime64[ns]
# Convert the datetime Series to a DatetimeIndex so that day, hour, and weekday can be read as attributes
time_value = pd.DatetimeIndex(time_value)
DatetimeIndex(['1970-01-01 18:09:40', '1970-01-10 02:11:10',
'1970-01-05 15:08:02', '1970-01-06 23:03:03',
'1970-01-09 11:26:50', '1970-01-02 16:25:07',
'1970-01-04 15:52:57', '1970-01-01 10:13:36',
'1970-01-09 15:26:06', '1970-01-08 23:52:02',
'1970-01-07 10:03:36', '1970-01-09 11:44:34',
'1970-01-04 08:07:44', '1970-01-04 15:47:47',
'1970-01-08 01:24:11', '1970-01-01 10:33:56',
'1970-01-07 23:22:04', '1970-01-08 15:03:14',
'1970-01-04 00:53:41', '1970-01-08 23:01:07'],
dtype='datetime64[ns]', name='time', length=17710, freq=None)
# Construct some features
data['day'] = time_value.day
data['hour'] = time_value.hour
data['weekday'] = time_value.weekday
(index) | row_id | x | y | accuracy | time | place_id | day | hour | weekday |
600 | 600 | 1.2214 | 2.7023 | 17 | 65380 | 6683426742 | 1 | 18 | 3 |
957 | 957 | 1.1832 | 2.6891 | 58 | 785470 | 6683426742 | 10 | 2 | 5 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 400082 | 6889790653 | 5 | 15 | 0 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 514983 | 6822359752 | 6 | 23 | 1 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 732410 | 1527921905 | 9 | 11 | 4 |
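To see where the day, hour, and weekday columns come from, the same conversion can be reproduced for single values with the standard library. This is a sketch under the assumption that the `time` column counts seconds from the Unix epoch in UTC, which is how `pd.to_datetime(..., unit='s')` is used above; the helper name `time_features` is hypothetical.

```python
from datetime import datetime, timezone


def time_features(seconds):
    """Return (day, hour, weekday) for a timestamp given in seconds since the epoch."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # weekday(): Monday is 0 and Sunday is 6, the same convention pandas uses.
    return moment.day, moment.hour, moment.weekday()


print(time_features(65380))   # (1, 18, 3), matching row 600 in the table above
print(time_features(785470))  # (10, 2, 5), matching row 957
```

Since the timestamps are relative rather than real calendar dates, these features are best read as "position within a week or day" cycles rather than actual dates.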
# Drop the raw timestamp feature
data = data.drop(['time'], axis=1)
(index) | row_id | x | y | accuracy | place_id | day | hour | weekday |
600 | 600 | 1.2214 | 2.7023 | 17 | 6683426742 | 1 | 18 | 3 |
957 | 957 | 1.1832 | 2.6891 | 58 | 6683426742 | 10 | 2 | 5 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 6889790653 | 5 | 15 | 0 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 6822359752 | 6 | 23 | 1 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 1527921905 | 9 | 11 | 4 |
# Remove target places that have fewer than n check-ins
place_count = data.groupby('place_id').count()
# Grouping by a column turns that column into the index
place_id | row_id | x | y | accuracy | day | hour | weekday |
1012023972 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1057182134 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1059958036 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
1085266789 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1097200869 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 |
... | ... | ... | ... | ... | ... | ... | ... |
9904182060 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
9915093501 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
9946198589 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
9950190890 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
9980711012 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
805 rows × 7 columns
# tf keeps only the places whose check-in count (the row_id column of place_count) is greater than 3
tf = place_count[place_count.row_id > 3]
place_id | row_id | x | y | accuracy | day | hour | weekday |
1097200869 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 |
1228935308 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
1267801529 | 58 | 58 | 58 | 58 | 58 | 58 | 58 |
1278040507 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
1285051622 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
... | ... | ... | ... | ... | ... | ... | ... |
9741307878 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
9753855529 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
9806043737 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
9809476069 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
9980711012 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
239 rows × 7 columns
# Reset the index so that place_id becomes an ordinary column again
tf = tf.reset_index()
(index) | place_id | row_id | x | y | accuracy | day | hour | weekday |
0 | 1097200869 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 | 1044 |
1 | 1228935308 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
2 | 1267801529 | 58 | 58 | 58 | 58 | 58 | 58 | 58 |
3 | 1278040507 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
4 | 1285051622 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
234 | 9741307878 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
235 | 9753855529 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
236 | 9806043737 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
237 | 9809476069 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
238 | 9980711012 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
239 rows × 8 columns
# Keep only the rows of data whose place_id appears in tf.place_id
data = data[data['place_id'].isin(tf.place_id)]
(index) | row_id | x | y | accuracy | place_id | day | hour | weekday |
600 | 600 | 1.2214 | 2.7023 | 17 | 6683426742 | 1 | 18 | 3 |
957 | 957 | 1.1832 | 2.6891 | 58 | 6683426742 | 10 | 2 | 5 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 6889790653 | 5 | 15 | 0 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 6822359752 | 6 | 23 | 1 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 1527921905 | 9 | 11 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
29100203 | 29100203 | 1.0129 | 2.6775 | 12 | 3312463746 | 1 | 10 | 3 |
29108443 | 29108443 | 1.1474 | 2.6840 | 36 | 3533177779 | 7 | 23 | 2 |
29109993 | 29109993 | 1.0240 | 2.7238 | 62 | 6424972551 | 8 | 15 | 3 |
29111539 | 29111539 | 1.2032 | 2.6796 | 87 | 3533177779 | 4 | 0 | 6 |
29112154 | 29112154 | 1.1070 | 2.5419 | 178 | 4932578245 | 8 | 23 | 3 |
16918 rows × 8 columns
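The count, threshold, and membership steps above can be mirrored without pandas, which may make the logic easier to follow. This is a minimal sketch on made-up rows; `filter_rare_places` and the dictionary layout are assumptions for illustration, not part of the original notebook.

```python
from collections import Counter


def filter_rare_places(rows, min_count):
    """Keep rows whose place_id occurs more than `min_count` times."""
    # Count check-ins per place (the role of groupby('place_id').count()).
    counts = Counter(row["place_id"] for row in rows)
    # Places that pass the threshold (the role of place_count.row_id > 3).
    frequent = {place for place, n in counts.items() if n > min_count}
    # Membership filter (the role of isin(tf.place_id)).
    return [row for row in rows if row["place_id"] in frequent]


# Made-up rows for illustration: place 1 has 4 check-ins, place 2 has 1.
rows = [{"row_id": i, "place_id": 1} for i in range(4)] + [{"row_id": 4, "place_id": 2}]
kept = filter_rare_places(rows, min_count=3)
print(len(kept))  # 4
```

Dropping rarely visited places removes classes that a nearest-neighbour vote has almost no chance of predicting correctly.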
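3.2.3 Extracting the target and the features
y = data["place_id"]
x = data.drop(["place_id"], axis=1)  # drop the target along the column axis
3.3 Splitting into training and test sets
# load_boston is not imported: it is unused here and was removed in scikit-learn 1.2
from sklearn.datasets import load_iris, fetch_20newsgroups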
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.25)
(index) | row_id | x | y | accuracy | place_id | day | hour | weekday |
600 | 600 | 1.2214 | 2.7023 | 17 | 6683426742 | 1 | 18 | 3 |
957 | 957 | 1.1832 | 2.6891 | 58 | 6683426742 | 10 | 2 | 5 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 6889790653 | 5 | 15 | 0 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 6822359752 | 6 | 23 | 1 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 1527921905 | 9 | 11 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
29100203 | 29100203 | 1.0129 | 2.6775 | 12 | 3312463746 | 1 | 10 | 3 |
29108443 | 29108443 | 1.1474 | 2.6840 | 36 | 3533177779 | 7 | 23 | 2 |
29109993 | 29109993 | 1.0240 | 2.7238 | 62 | 6424972551 | 8 | 15 | 3 |
29111539 | 29111539 | 1.2032 | 2.6796 | 87 | 3533177779 | 4 | 0 | 6 |
29112154 | 29112154 | 1.1070 | 2.5419 | 178 | 4932578245 | 8 | 23 | 3 |
16918 rows × 8 columns
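# For now, skip standardization and call KNN directly to see how well it predicts.
def knn_al():
    knn = KNeighborsClassifier(n_neighbors=5)
    # fit, predict, score
    knn.fit(x_train, y_train)
    # Get the predictions
    y_predict = knn.predict(x_test)
    print("Predicted check-in locations:", y_predict)
    # Get the accuracy
    print("Prediction accuracy:", knn.score(x_test, y_test))
if __name__ == "__main__":
    knn_al()
Predicted check-in locations: [1479000473 2584530303 2946102544 ... 5606572086 1602053545 1097200869]
Prediction accuracy: 0.029787234042553193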
# Try to improve the accuracy: first drop the row_id feature from data.
data_del_row_id = data.drop(['row_id'], axis=1)
(index) | x | y | accuracy | place_id | day | hour | weekday |
600 | 1.2214 | 2.7023 | 17 | 6683426742 | 1 | 18 | 3 |
957 | 1.1832 | 2.6891 | 58 | 6683426742 | 10 | 2 | 5 |
4345 | 1.1935 | 2.6550 | 11 | 6889790653 | 5 | 15 | 0 |
4735 | 1.1452 | 2.6074 | 49 | 6822359752 | 6 | 23 | 1 |
5580 | 1.0089 | 2.7287 | 19 | 1527921905 | 9 | 11 | 4 |
... | ... | ... | ... | ... | ... | ... | ... |
29100203 | 1.0129 | 2.6775 | 12 | 3312463746 | 1 | 10 | 3 |
29108443 | 1.1474 | 2.6840 | 36 | 3533177779 | 7 | 23 | 2 |
29109993 | 1.0240 | 2.7238 | 62 | 6424972551 | 8 | 15 | 3 |
29111539 | 1.2032 | 2.6796 | 87 | 3533177779 | 4 | 0 | 6 |
29112154 | 1.1070 | 2.5419 | 178 | 4932578245 | 8 | 23 | 3 |
16918 rows × 7 columns
y = data_del_row_id["place_id"]
x = data_del_row_id.drop(["place_id"], axis=1)  # drop the target along the column axis
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
if __name__ == "__main__":
    knn_al()
Predicted check-in locations: [1097200869 3312463746 9632980559 ... 3533177779 4932578245 1913341282]
Prediction accuracy: 0.0806146572104019
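# Next, the intent was to try dropping day. Note that the code below does something different:
# it makes day the target (y) and leaves day among the features, so the model predicts day, not place_id.
y = data_del_row_id["day"]
x = data_del_row_id.drop(["place_id"], axis=1)  # drops place_id only; day is still a feature
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
if __name__ == "__main__":
    knn_al()
Predicted check-in locations: [2 9 4 ... 6 5 9]
Prediction accuracy: 0.810401891252955
# The values above are predicted days, and the target is also present among the features, so this 0.81 is not comparable with the place_id accuracies elsewhere in this section.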
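3.4 Feature engineering (standardization)
3.5 Computing predict and score
# Extract the features and the target from the data
y = data['place_id']
x = data.drop(['place_id'], axis=1)
# Split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
# Feature engineering (standardization)
std = StandardScaler()
# Standardize the training and test features; the scaler is fitted on the training set only
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)
if __name__ == "__main__":
    knn_al()
Predicted check-in locations: [6683426742 1435128522 2327054745 ... 2460093296 1435128522 1097200869]
Prediction accuracy: 0.41631205673758864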
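The jump in accuracy after scaling is plausible because KNN relies on distances: a column such as row_id (values in the millions) or accuracy (values up to 178 in the table above) would otherwise swamp x and y, which only vary within about 0.25. The sketch below shows the per-column arithmetic of standardization in plain Python; the helper names and the sample values are assumptions, and it uses the population standard deviation, which is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(value - mean) / std for value in column]


# Made-up values for illustration.
train_accuracy = [10.0, 20.0, 30.0]
test_accuracy = [40.0]

# The statistics come from the training set only ...
mean, std = fit_scaler(train_accuracy)
train_scaled = transform(train_accuracy, mean, std)
# ... and are reused unchanged on the test set.
test_scaled = transform(test_accuracy, mean, std)

print(mean)  # 20.0
```

Reusing the training statistics on the test set is the plain-Python counterpart of calling `fit_transform` on `x_train` and only `transform` on `x_test`.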
# Extract the features from the data (y is still place_id from the previous step)
x = data.drop("place_id", axis=1)
(index) | row_id | x | y | accuracy | day | hour | weekday |
600 | 600 | 1.2214 | 2.7023 | 17 | 1 | 18 | 3 |
957 | 957 | 1.1832 | 2.6891 | 58 | 10 | 2 | 5 |
4345 | 4345 | 1.1935 | 2.6550 | 11 | 5 | 15 | 0 |
4735 | 4735 | 1.1452 | 2.6074 | 49 | 6 | 23 | 1 |
5580 | 5580 | 1.0089 | 2.7287 | 19 | 9 | 11 | 4 |
... | ... | ... | ... | ... | ... | ... | ... |
29100203 | 29100203 | 1.0129 | 2.6775 | 12 | 1 | 10 | 3 |
29108443 | 29108443 | 1.1474 | 2.6840 | 36 | 7 | 23 | 2 |
29109993 | 29109993 | 1.0240 | 2.7238 | 62 | 8 | 15 | 3 |
29111539 | 29111539 | 1.2032 | 2.6796 | 87 | 4 | 0 | 6 |
29112154 | 29112154 | 1.1070 | 2.5419 | 178 | 8 | 23 | 3 |
16918 rows × 7 columns
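# Split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
# Feature engineering (standardization)
std = StandardScaler()
# Standardize the training and test features; the scaler is fitted on the training set only
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)
if __name__ == "__main__":
    knn_al()
Predicted check-in locations: [5270522918 1097200869 3312463746 ... 1097200869 5606572086 1097200869]
Prediction accuracy: 0.40803782505910163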
# Next, drop the row_id and day features
x_no_row_id = x.drop(["row_id"], axis=1)
x_no_row_id_and_no_day = x_no_row_id.drop(["day"], axis=1)
(index) | x | y | accuracy | hour | weekday |
600 | 1.2214 | 2.7023 | 17 | 18 | 3 |
957 | 1.1832 | 2.6891 | 58 | 2 | 5 |
4345 | 1.1935 | 2.6550 | 11 | 15 | 0 |
4735 | 1.1452 | 2.6074 | 49 | 23 | 1 |
5580 | 1.0089 | 2.7287 | 19 | 11 | 4 |
... | ... | ... | ... | ... | ... |
29100203 | 1.0129 | 2.6775 | 12 | 10 | 3 |
29108443 | 1.1474 | 2.6840 | 36 | 23 | 2 |
29109993 | 1.0240 | 2.7238 | 62 | 15 | 3 |
29111539 | 1.2032 | 2.6796 | 87 | 0 | 6 |
29112154 | 1.1070 | 2.5419 | 178 | 23 | 3 |
16918 rows × 5 columns
600 6683426742
957 6683426742
4345 6889790653
4735 6822359752
5580 1527921905
29100203 3312463746
29108443 3533177779
29109993 6424972551
29111539 3533177779
29112154 4932578245
Name: place_id, Length: 16918, dtype: int64
## 3.5
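# Split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x_no_row_id_and_no_day, y, test_size=0.25)
# Feature engineering (standardization)
std = StandardScaler()
# Standardize the training and test features; the scaler is fitted on the training set only
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)
knn = KNeighborsClassifier(n_neighbors=5)
# fit, predict, score
knn.fit(x_train, y_train)
# Get the predictions
y_predict = knn.predict(x_test)
print("Predicted check-in locations:", y_predict)
# Get the accuracy
print("Prediction accuracy:", knn.score(x_test, y_test))
Predicted check-in locations: [6399991653 3533177779 1097200869 ... 2327054745 3992589015 6683426742]
Prediction accuracy: 0.48699763593380613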
3.6 KNN summary
4. Evaluating classification models (precision and recall)
$$\text{Accuracy} = \frac{TP+TN}{TP+FP+FN+TN}$$
Confusion matrix: in a classification task, the predicted condition and the true condition can combine in four ways, giving true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Together these form the confusion matrix (the same idea extends to multi-class problems).
$$\text{Precision} = \frac{TP}{TP+FP}$$
$$\text{Recall} = \frac{TP}{TP+FN}$$
sklearn.metrics.classification_report(y_true, y_pred, target_names=None)
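The three formulas can be checked by counting the four outcomes directly. The following is a small sketch for the binary case, with one class treated as positive; `binary_metrics` and the label lists are made up for illustration and are not part of scikit-learn.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration: TP = 2, FP = 1, FN = 1, TN = 4.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy)  # 0.75
```

For a multi-class problem, the same counts are taken once per class with that class as the positive one, which is how a per-class report with precision and recall columns can be read.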
5. Cross-validation and grid search
Hyperparameter search, grid search API: sklearn.model_selection.GridSearchCV
sklearn.model_selection.GridSearchCV(estimator, param_grid=None,cv=None)
from sklearn.model_selection import train_test_split, GridSearchCV
# Build the candidate parameter values to search over
param = {
    "n_neighbors": [1, 3, 5, 7, 10]}
# Run the grid search
gc = GridSearchCV(knn, param_grid=param, cv=2)
gc.fit(x_train, y_train)
# Report the accuracy
print("Accuracy on the test set:", gc.score(x_test, y_test))
print("Best cross-validation score:", gc.best_score_)
print("Best estimator:", gc.best_estimator_)
print("Cross-validation results for each hyperparameter setting:", gc.cv_results_)
Accuracy on the test set: 0.4955082742316785
Best cross-validation score: 0.45917402269861285
Best estimator: KNeighborsClassifier(n_neighbors=10)
Cross-validation results for each hyperparameter setting: {'mean_fit_time': array([0.00385594, 0.00366092, 0.00310779, 0.00316703, 0.003443 ]), 'std_fit_time': array([4.26769257e-04, 5.06877899e-04, 7.70092010e-05, 4.99486923e-05,
2.91109085e-04]), 'mean_score_time': array([0.19389665, 0.20236516, 0.21587265, 0.22173393, 0.23718596]), 'std_score_time': array([0.00897849, 0.00262308, 0.00137246, 0.00043309, 0.00201011]), 'param_n_neighbors': masked_array(data=[1, 3, 5, 7, 10],
mask=[False, False, False, False, False],
dtype=object), 'params': [{'n_neighbors': 1}, {'n_neighbors': 3}, {'n_neighbors': 5}, {'n_neighbors': 7}, {'n_neighbors': 10}], 'split0_test_score': array([0.41456494, 0.42307692, 0.44435687, 0.44656368, 0.45176545]), 'split1_test_score': array([0.4186633 , 0.43332282, 0.45412989, 0.4612232 , 0.4665826 ]), 'mean_test_score': array([0.41661412, 0.42819987, 0.44924338, 0.45389344, 0.45917402]), 'std_test_score': array([0.00204918, 0.00512295, 0.00488651, 0.00732976, 0.00740858]), 'rank_test_score': array([5, 4, 3, 2, 1], dtype=int32)}
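What the grid search does with `cv=2` can be sketched by hand: for each candidate k, train on one part of the data, validate on the held-out part, average the fold scores, and keep the best k. The code below is a simplified illustration on made-up one-dimensional data. It uses interleaved folds, whereas scikit-learn uses stratified folds for classifiers by default, and all function names are hypothetical.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training values closest to `query` (1-D features)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, k, n_folds):
    """Mean validation accuracy of k-NN over `n_folds` interleaved folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample is held out for validation in this round.
        val_idx = set(range(fold, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / n_folds


def grid_search(xs, ys, candidates, n_folds=2):
    """Return (best_k, best_score) over the candidate values of k."""
    results = {k: cross_val_accuracy(xs, ys, k, n_folds) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Made-up 1-D data for illustration: small values are class 0, large values class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6]
ys = [0] * 6 + [1] * 6
best_k, best_score = grid_search(xs, ys, candidates=[1, 3])
print(best_k, best_score)
```

The averaged fold score plays the role of `best_score_` in the output above, which is why it can differ from the accuracy measured on the separate test set.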
6. Naive Bayes
$$P(C \mid W) = \frac{P(W \mid C)\,P(C)}{P(W)}$$
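Estimation: $P(F_1 \mid C) = N_i / N$ (computed from the training documents), where $N_i$ is the number of times the word $F_1$ occurs in the documents of class $C$ and $N$ is the total number of word occurrences in that class.
6.1 Laplace smoothing
$$P(F_1 \mid C) = \frac{N_i + \alpha}{N + \alpha m}$$
Here $\alpha$ is the smoothing coefficient and $m$ is the number of distinct feature words counted in the training documents.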
6.2 scikit-learn naive Bayes API
sklearn.naive_bayes.MultinomialNB(alpha = 1.0)
alpha ($\alpha$): the Laplace smoothing coefficient
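A short numeric example may make the effect of the smoothing formula in 6.1 concrete. Without smoothing, a word that never occurs in a class gets probability zero, which wipes out the whole product of word probabilities for that class. The function name and the counts below are made up for illustration.

```python
def smoothed_probability(word_count, total_count, vocab_size, alpha=1.0):
    """P(F1|C) = (N_i + alpha) / (N + alpha * m)."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)


# Made-up counts for illustration: four words in class C, one of them unseen.
counts = [0, 50, 30, 20]
n = sum(counts)   # N: total word occurrences in the class
m = len(counts)   # m: number of distinct feature words

unsmoothed = counts[0] / n
smoothed = [smoothed_probability(c, n, m) for c in counts]

print(unsmoothed)      # 0.0
print(smoothed[0])     # (0 + 1) / (100 + 4), a small but nonzero probability
```

With alpha = 1 the smoothed probabilities over the whole vocabulary still sum to one, so the estimate remains a valid distribution.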
6.3 Naive Bayes example
def naviebayes():
    """ Text classification with naive Bayes :return: None """
    news = fetch_20newsgroups(subset='all')
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(news.data, news.target, test_size=0.25)
    # Extract features from the dataset
    tf = TfidfVectorizer()
    # Compute each article's term weights over the vocabulary learned from the training set, e.g. ['a','b','c','d']
    x_train = tf.fit_transform(x_train)
    x_test = tf.transform(x_test)
    # Predict with the naive Bayes algorithm
    mlt = MultinomialNB(alpha=1.0)
    mlt.fit(x_train, y_train)
    y_predict = mlt.predict(x_test)
    print("Predicted article categories:", y_predict)
    # Get the accuracy
    print("Accuracy:", mlt.score(x_test, y_test))
    print("Precision and recall for each category:", classification_report(y_test, y_predict, target_names=news.target_names))
    return None
if __name__ == "__main__":
    naviebayes()
['00' '000' '0000' ... 'óáíïìåô' 'ýé' 'ÿhooked']
[[0. 0.02654538 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
Predicted article categories: [ 5 2 17 ... 1 13 7]
Accuracy: 0.8612054329371817
Precision and recall for each category: precision recall f1-score support
alt.atheism 0.88 0.80 0.84 200
comp.graphics 0.88 0.79 0.83 241
comp.os.ms-windows.misc 0.89 0.78 0.83 254
comp.sys.ibm.pc.hardware 0.76 0.87 0.81 245
comp.sys.mac.hardware 0.84 0.90 0.86 229
comp.windows.x 0.90 0.85 0.88 245
misc.forsale 0.93 0.67 0.78 241
rec.autos 0.91 0.92 0.92 263
rec.motorcycles 0.94 0.95 0.94 265
rec.sport.baseball 0.94 0.95 0.95 237
rec.sport.hockey 0.91 0.98 0.94 238
sci.crypt 0.79 0.98 0.88 259
sci.electronics 0.91 0.82 0.86 238
sci.med 0.98 0.90 0.94 239
sci.space 0.87 0.97 0.92 249
soc.religion.christian 0.62 0.98 0.76 260
talk.politics.guns 0.80 0.95 0.87 230
talk.politics.mideast 0.92 0.98 0.95 230
talk.politics.misc 1.00 0.65 0.79 196
talk.religion.misc 0.97 0.23 0.37 153
accuracy 0.86 4712
macro avg 0.88 0.85 0.85 4712
weighted avg 0.88 0.86 0.86 4712