[Machine learning] Getting started, day one
2022-06-12 05:32:00 【cbdgz】
K-NN (k-nearest neighbor algorithm)
Overview
Algorithm principle
Steps of the k-nearest neighbor algorithm
- Compute the distance between the current point and every point in the dataset whose categories are known
- Sort the points in order of increasing distance
- Select the k points with the smallest distances
- Count how often each category occurs among these k points
- Return the most frequent category as the predicted category of the test point
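The distance in the first step is the Euclidean distance, which is what the complete code computes (subtract, square, sum each row, take the square root). For two samples with n features it is shown below, followed by a worked example on the dataset from createDataSet() in the complete code, using the test point (101, 20):

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}

\begin{aligned}
d\big((101,20),(1,101)\big) &= \sqrt{(101-1)^2 + (20-101)^2} = \sqrt{16561} \approx 128.69 \\
d\big((101,20),(5,89)\big)  &= \sqrt{(101-5)^2 + (20-89)^2}  = \sqrt{13977} \approx 118.22 \\
d\big((101,20),(108,5)\big) &= \sqrt{(101-108)^2 + (20-5)^2} = \sqrt{274} \approx 16.55 \\
d\big((101,20),(115,8)\big) &= \sqrt{(101-115)^2 + (20-8)^2} = \sqrt{340} \approx 18.44
\end{aligned}
```

With k = 3 the nearest samples are (108, 5), (115, 8) and (5, 89). Two of them are action movies and one is a love story, so the majority vote predicts an action movie.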
Python basics used in the code
1. np.tile(inX,(dataSetSize,1)) repeats inX dataSetSize times along the rows (vertically) and once along the columns (horizontally). Here this turns inX into a matrix with the same shape as dataSet, so the two can be subtracted element-wise.
2. array.argsort() returns the indices that would sort array in ascending order.
3. dict.get(key,default=None) is the dictionary get() method: it returns the value for key, or default if the key is not in the dictionary (None unless another default is given). In this program the default is set to 0.
4. key=operator.itemgetter(1) sorts the dictionary's (key, value) items by value.
5. key=operator.itemgetter(0) sorts the dictionary's (key, value) items by key.
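A short sketch that exercises the five helpers above on the toy data from the complete code. The variable names byValue and byKey are illustrative additions, not from the original post, and the values in the comments follow from that data.

```python
import operator

import numpy as np

# Training samples, labels and test point copied from the complete code in this post
dataSet = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
labels = ['Love story', 'Love story', 'Action movie', 'Action movie']
inX = [101, 20]

# 1. np.tile: repeat inX 4 times along the rows and once along the columns -> shape (4, 2)
tiled = np.tile(inX, (dataSet.shape[0], 1))
print(tiled.shape)  # (4, 2)
# NumPy broadcasting yields the same differences without building the tiled matrix
assert np.array_equal(tiled - dataSet, np.asarray(inX) - dataSet)

# 2. argsort: indices that sort the Euclidean distances in ascending order
distances = (((tiled - dataSet) ** 2).sum(axis=1)) ** 0.5
sortedDistIndices = distances.argsort()
print(sortedDistIndices)  # [2 3 1 0]

# 3. dict.get with a default of 0 to count the labels of the 3 nearest samples
classCount = {}
for i in sortedDistIndices[:3]:
    classCount[labels[i]] = classCount.get(labels[i], 0) + 1
print(classCount)  # {'Action movie': 2, 'Love story': 1}

# 4. itemgetter(1): sort the (label, count) pairs by count, largest first
byValue = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
print(byValue)  # [('Action movie', 2), ('Love story', 1)]

# 5. itemgetter(0): sort the pairs by label instead (reverse alphabetical order here)
byKey = sorted(classCount.items(), key=operator.itemgetter(0), reverse=True)
print(byKey)  # [('Love story', 1), ('Action movie', 2)]
```

The broadcasting check in step 1 is a side note: subtracting a 1-D array from a 2-D array already repeats it along the rows, so np.tile is not strictly required.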
Complete code
import numpy as np
import operator

# Create a toy dataset
def createDataSet():
    # Four samples with two features each
    group = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
    # Category label of each sample
    labels = ['Love story', 'Love story', 'Action movie', 'Action movie']
    return group, labels

def K_NN(inX, dataSet, labels, k):
    '''
    K-NN algorithm
    Parameters:
        inX - the sample to classify (test data)
        dataSet - the training samples (training set)
        labels - category labels of the training samples
        k - number of nearest neighbors to use (the k points with the smallest distances)
    Returns:
        sortedClassCount[0][0] - the classification result
    '''
    # Number of rows (samples) in dataSet
    dataSetSize = dataSet.shape[0]
    # Repeat inX once along the columns (horizontally) and dataSetSize times along the rows
    # (vertically), i.e. turn inX into a matrix with the same shape as dataSet
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    # Square the element-wise differences
    sqDiffMat = diffMat**2
    # sum() adds all elements, sum(axis=0) sums each column, sum(axis=1) sums each row
    sqDistances = sqDiffMat.sum(axis=1)
    # Take the square root to get the Euclidean distances
    distances = sqDistances**0.5
    # Indices that sort distances in ascending order (original subscripts of the sorted data)
    sortedDistIndices = distances.argsort()
    # Dictionary recording how many times each category occurs
    classCount = {}
    for i in range(k):
        # Take the label of the i-th nearest sample
        voteLabel = labels[sortedDistIndices[i]]
        # dict.get(key, default=None) returns the value for key, or the default if the key
        # is absent; here the default is set to 0
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Sort the (label, count) pairs by count, in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    # Return the most frequent label
    return sortedClassCount[0][0]

if __name__ == '__main__':
    group, labels = createDataSet()
    print(group)
    print(labels)
    test = [101, 20]
    test_class = K_NN(test, group, labels, 3)
    print(test_class)
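For comparison, a hedged sketch of the same classifier that swaps in NumPy broadcasting for np.tile and collections.Counter for the hand-built dictionary plus operator.itemgetter sort. The function name knn_classify is hypothetical, not from the original post; on the toy data it should give the same prediction as K_NN.

```python
from collections import Counter

import numpy as np


def knn_classify(inX, dataSet, labels, k):
    """Classify inX by majority vote among its k nearest training samples (Euclidean distance)."""
    # Broadcasting subtracts inX from every row of dataSet, like np.tile(...) - dataSet
    diff = np.asarray(inX) - dataSet
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k smallest distances, in ascending order of distance
    nearest = distances.argsort()[:k]
    # Count the labels of the k nearest samples and return the most common one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == '__main__':
    group = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
    labels = ['Love story', 'Love story', 'Action movie', 'Action movie']
    print(knn_classify([101, 20], group, labels, 3))  # Action movie
```

Both versions break ties the same way: Counter.most_common orders equal counts by first occurrence, and sorted() stays stable even with reverse=True, so the label of the nearer neighbor wins.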
[1] Reference: Jack-Cui