Predicting pregnancy with KNN: the KNN principle in simple code
2022-07-29 03:23:00 【Order anything】
kNN stands for the k-nearest neighbors algorithm. In the spirit of "birds of a feather flock together", it classifies a sample according to its neighbors.
kNN is a supervised learning method. The principle is:
If most of the K samples most similar to a given sample (that is, its K nearest neighbors in the feature space) belong to a certain category, then the sample is assigned to that category as well. K is a hyperparameter that is set by hand.
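A small sketch of this rule on made-up toy data (the points, labels, and variable names are assumptions for illustration, not the article's dataset):

```python
import numpy as np

# Hypothetical toy data: six labelled points in a 2-D feature space
train_points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                         [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
query = np.array([1.1, 1.0])
k = 3

# Distance from the query to every training point
distances = np.sqrt(((train_points - query) ** 2).sum(axis=1))
# Indices of the k closest training points
nearest = np.argsort(distances)[:k]
# Majority vote among the labels of those k points
values, counts = np.unique(train_labels[nearest], return_counts=True)
predicted = int(values[np.argmax(counts)])
print(predicted)  # 0
```

The three nearest points all carry label 0, so the query is assigned to category 0.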
Because the method relies on distance, it is worth reviewing the Minkowski distance before studying kNN. For the math, see the "watermelon book" (Zhou Zhihua's Machine Learning) or Dr. Li Hang's Statistical Learning Methods.

This example uses the Euclidean distance, which is the Minkowski distance with p = 2 (it can be understood as the straight-line distance between two points).
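As a minimal sketch of that relationship: the Minkowski distance of order p between vectors a and b is (sum_i |a_i - b_i|^p)^(1/p). The helper name minkowski_distance below is made up for illustration. With p = 2 it gives the Euclidean distance, and with p = 1 the Manhattan distance.

```python
import numpy as np

def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two 1-D vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float((np.abs(a - b) ** p).sum() ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski_distance(a, b, p=1))  # Manhattan distance: 7.0
print(minkowski_distance(a, b, p=2))  # Euclidean distance: 5.0
# np.linalg.norm of the difference gives the same Euclidean distance
print(float(np.linalg.norm(np.array(a) - np.array(b))))  # 5.0
```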
The packages needed for this example are numpy, pandas, scikit-learn, and Counter (from the standard-library collections module).
# Import the required packages
import pandas
import numpy as np
from collections import Counter

# Read the feature part of the training data (look, smell, pulse, temperature and other features of 112 customers, collected by robot sensors)
# .values returns the data as a numpy array
x = pandas.read_csv('train_X.csv').values
# print(f'train x: {x}')
# print(np.mean(x), np.std(x))
# Read the label part of the training data (the real pregnancy status of the 112 customers above: 0 means girl, 1 means boy, 2 means not pregnant)
y = pandas.read_csv('train_y.csv')
# np.array() also converts a DataFrame into a numpy array
y = np.array(y)
"""
This part is the kNN principle code:
1. Compute the distance from one point to be predicted to every known point, wrapped in a method.
2. Call that method for every point to be predicted.
3. Sort the distances and vote among the k nearest points.
"""
class knn_qingdaifu(object):
    def fit(self, x, y, k=5):
        self.k = k
        self.x = x
        self.y = y

    # From several prediction points to all training points
    def predict(self, X):
        """
        Call the single-point method for every row of X and collect the results in a list
        :param X: 2-D array of points to predict, one row per point
        :return: list of predicted labels
        """
        self.X = X
        result = [self.oneResult(self.X[i]) for i in range(self.X.shape[0])]
        return result

    # From one prediction point to all training points
    def oneResult(self, x_test):
        """
        Compute the distance from one point to all training points:
        1. The norm function np.linalg.norm is used;
        2. The counter class Counter is used;
        :param x_test: 1-D array with the features of one point to predict
        :return: the most common label among the k nearest training points
        """
        # Build a list of (Euclidean distance, label) tuples, one per known point.
        # Why self.y[i][0]: printing y shows that after conversion to numpy it is a
        # two-dimensional array, so self.y[i] is itself an ndarray (for example array([2]))
        # and self.y[i][0] extracts the actual label from it.
        dist_list = [(np.linalg.norm(x_test - self.x[i]), self.y[i][0]) for i in range(len(self.x))]
        # print(f'dist_list: {dist_list}')
        # Sort all distances; list.sort() sorts in ascending order by default.
        # Each element is a tuple, and key=lambda x: x[0] sorts by its first item, the distance.
        dist_list.sort(key=lambda x: x[0])
        # print(f'sorted dist_list: {dist_list}')
        # Take the categories (y values) of the k smallest distances.
        # Index -1 selects the last item of each tuple, which is the label.
        y_list = [dist_list[i][-1] for i in range(self.k)]
        # print(y_list)
        # Count how often each category occurs among those k points
        y_count = Counter(y_list).most_common()
        # print(y_count)
        # The first entry is the (label, count) pair with the highest count
        return y_count[0][0]
# Create a robot
doctor = knn_qingdaifu()
# Train the robot
doctor.fit(x, y)
# Test the robot's diagnoses with the data of 38 customers
# Read the look, smell, pulse, temperature and other feature data of the 38 customers
test_x = pandas.read_csv('test_X.csv').values
# print(f'test_x: {test_x}')
# Diagnose! The results are stored in the result list
result = doctor.predict(test_x)
# print(f'result: {result}')
# Print the diagnoses and compare them with the actual results
# Read the actual pregnancy status of the 38 customers (0 means girl, 1 means boy, 2 means not pregnant)
test_y = pandas.read_csv('test_y.csv')
# print(f'test_y:\n{test_y}')
labels = ['Girl', 'Boy', 'Not pregnant']
i = 0
# Number of correct diagnoses
predictOKNum = 0
print("Number,Diagnosis,Actual,Result")
while i < test_y.shape[0]:
    # Compare the i-th diagnosis with the i-th actual result; equal means a correct diagnosis
    if result[i] == (test_y.values[i, 0]):
        predictOKNum = predictOKNum + 1
        okOrNo = 'correct'
    else:
        okOrNo = 'wrong'
    print("%s,%s,%s,%s" % (i + 1, labels[result[i]], labels[test_y.values[i, 0]], okOrNo))
    i = i + 1
print("Diagnostic accuracy: %s" % (predictOKNum / i))

Once the code has been tested, you can call sklearn's built-in kNN API, KNeighborsClassifier(n_neighbors=5), to check the accuracy of your own code.
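One way such a check might look is sketched below. The article's CSV files are not available here, so the data is a made-up stand-in (three well-separated synthetic clusters), and brute_force_predict is a hypothetical compact version of the same sort-and-vote idea, not the class above.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: three well-separated clusters with labels 0, 1, 2
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [30.0, 30.0], [60.0, 0.0]])
train_x = np.vstack([c + rng.normal(size=(20, 2)) for c in centers])
train_y = np.repeat([0, 1, 2], 20)
test_x = np.vstack([c + rng.normal(size=(5, 2)) for c in centers])

def brute_force_predict(x_test, k=5):
    """Sort training points by distance and vote among the k nearest."""
    dists = np.linalg.norm(train_x - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]

own_pred = np.array([brute_force_predict(p) for p in test_x])
sk_pred = KNeighborsClassifier(n_neighbors=5).fit(train_x, train_y).predict(test_x)
# Fraction of test points on which both implementations agree
print((own_pred == sk_pred).mean())
```

On real data the two can disagree on individual points when neighbors are tied in distance or votes, so a small mismatch does not necessarily indicate a bug.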
The code uses a higher-level function to compute the distance: the norm function np.linalg.norm().
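As a sketch of how this function can be used (toy values assumed for illustration), np.linalg.norm also accepts an axis argument, so the distances from one point to all training points can be computed in a single call instead of one call per row:

```python
import numpy as np

# Hypothetical toy data: 3 training points with 2 features each
train_x = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
x_test = np.array([0.0, 0.0])

# One call per training point, as in the list comprehension of oneResult
looped = np.array([np.linalg.norm(x_test - row) for row in train_x])
# One call for all training points: axis=1 takes the norm of each row
vectorized = np.linalg.norm(train_x - x_test, axis=1)

print(looped.tolist())      # [0.0, 5.0, 10.0]
print(vectorized.tolist())  # [0.0, 5.0, 10.0]
```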
The code above is the most basic implementation of the kNN principle. Library implementations used in real applications can build a kd-tree over the positions of the training points to reduce the amount of computation.
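A minimal sketch of a kd-tree lookup, using sklearn.neighbors.KDTree on made-up toy points (the data and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical toy data: 5 training points in 2-D with two categories
train_x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array([0, 0, 0, 1, 1])

# Build the tree once, then query it for each point to predict
tree = KDTree(train_x)
dist, idx = tree.query(np.array([[5.2, 5.0]]), k=2)

print(idx[0].tolist())           # indices of the two nearest points: [3, 4]
print(train_y[idx[0]].tolist())  # their labels: [1, 1]
```

KNeighborsClassifier exposes the same choice through its algorithm parameter, for example algorithm='kd_tree'.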
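Note: Counter comes from Python's standard-library collections module. It counts hashable elements, so it works on a list of labels but fails on a two-dimensional numpy array, whose rows are unhashable ndarrays.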
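The difference can be seen in a short sketch (toy labels assumed for illustration):

```python
from collections import Counter
import numpy as np

labels_1d = np.array([2, 0, 2, 1, 2])
# Shape (5, 1), like labels read from a one-column CSV and converted with np.array()
labels_2d = labels_1d.reshape(-1, 1)

# Iterating a 1-D array yields NumPy scalars, which are hashable
label, votes = Counter(labels_1d).most_common(1)[0]
print(int(label), votes)  # 2 3

# Iterating a 2-D array yields rows, and ndarray rows are not hashable
try:
    Counter(labels_2d)
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# Flattening to a plain list of labels avoids the problem
print(Counter(labels_2d.ravel().tolist()).most_common(1))  # [(2, 3)]
```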