[basic data mining technology] KNN simple clustering
2022-07-24 20:28:00 【Sunny qt01】
KNN clustering technique

The figure plots customers by age and income, marked by whether they will buy the magazine.
KNN chooses a number K and, taking the sample as the center, draws a circle just large enough to contain its K nearest neighbors. Whichever category is most common inside the circle is the category assigned to the sample. K is a hyperparameter, because we set it ourselves.
Theoretical basis of KNN: customers in the same cluster will show the same behavior.
So a sample is assigned to the same category as its neighboring customers. Nothing is fitted or trained in advance, so in that sense it is not a machine learning method.
Disadvantages: it is inefficient, because the right K is not known in advance and several values have to be tried.
It is also hard to explain why KNN predicts better than naïve prediction.
Comparison of the probability of a correct result for KNN and naïve prediction:

We find that the probability of a correct prediction is indeed much higher with KNN.
The 3 steps of applying KNN in practice
Data preprocessing (Data Preprocessing): make sure the attributes (age vs. income) are measured on comparable scales. Remember to normalize.
Distance calculation (Distance Calculation): choose which distance formula to use.
Predicted probability calculation (Predicted Probability)
Step 1: Normalization
Here we use min-max normalization (Min-max Normalization), which rescales each attribute to [0, 1].

The result of the transformation is as follows:
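As a rough illustration of this transformation, here is a minimal sketch of min-max normalization in plain Python. The helper name `min_max_normalize` and the age and income values are hypothetical, made up for this example; they are not the data from the original figure.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages and incomes, for illustration only.
ages = [25, 35, 45, 55, 65]
incomes = [30000, 50000, 40000, 90000, 60000]

norm_ages = min_max_normalize(ages)
norm_incomes = min_max_normalize(incomes)

print(norm_ages)
print(norm_incomes)
```

After this step both attributes lie in [0, 1], so a large income difference no longer swamps an age difference when distances are computed.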

Step 2: Several ways to calculate distance
Manhattan distance (first power): this is the street-block distance, not the straight-line distance.

where R is given by the following formula:

Euclidean distance (second power)
This is the classic distance formula: the straight-line distance.

The two differ only in the exponent p. The general formula, controlled by p, is

When p equals 1, it is the Manhattan distance.
When p equals 2, it is the Euclidean distance.
In the Python packages, you change the distance formula by changing p.
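The role of p can be sketched with a small function. This is a minimal illustration, assuming the general formula is the usual Minkowski distance, the p-th root of the sum of |x_i - y_i|^p; the function name and the two points are hypothetical. In scikit-learn, for instance, `KNeighborsClassifier` exposes the same exponent as its `p` parameter.

```python
def minkowski_distance(a, b, p=2):
    """Distance between two equal-length numeric vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Two hypothetical, already normalized points (age, income).
point_a = (0.0, 0.0)
point_b = (0.3, 0.4)

manhattan = minkowski_distance(point_a, point_b, p=1)  # |0.3| + |0.4|
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(0.3^2 + 0.4^2)

print(manhattan, euclidean)
```

For these points the Manhattan distance is about 0.7 and the Euclidean distance about 0.5, up to floating-point rounding.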
Step 3: Calculating the predicted probability
For example, suppose there are 3 classes, the test sample is T, and k=5. The results are as follows:
The nearest neighbor's target attribute is class A
The second nearest neighbor's target attribute is class B
The third nearest neighbor's target attribute is class A
The fourth nearest neighbor's target attribute is class C
The fifth nearest neighbor's target attribute is class A
We therefore predict that the target attribute value is A, with a predicted probability of 3/5.
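The vote in this example can be written out in a few lines. This is a minimal sketch; the function name `predict_from_neighbors` is hypothetical, and the labels are the five neighbors listed above, ordered from nearest to farthest.

```python
from collections import Counter


def predict_from_neighbors(neighbor_labels):
    """Return the most common label and its share of the votes."""
    counts = Counter(neighbor_labels)
    # most_common(1) returns the label with the highest count;
    # if counts tie, the label seen first (the nearer neighbor) wins.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbor_labels)


neighbors = ["A", "B", "A", "C", "A"]
predicted, probability = predict_from_neighbors(neighbors)

print(predicted, probability)  # A 0.6
```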
Case study 1: The diagnostic data for a set of patients is given below. The first field is the patient ID, followed by the input fields (sore throat, fever, swollen glands, congestion, headache) and the target field (diagnosis).

Use KNN to predict the diagnosis for the following patients (K=3).
Distance(Yes, No) = 1
Distance(Yes, Yes) = 0
Distance(No, No) = 0
The distance between two records is calculated using the interception distance.
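The case study can be sketched as follows. The patient table is not reproduced here, so every record and diagnosis label below is hypothetical and only illustrates the mechanics. The sketch also assumes that the distance between two records is the sum of the per-field distances defined above, that is, the number of fields on which the two records disagree.

```python
from collections import Counter


def record_distance(a, b):
    """Sum the per-field distances: 0 when two answers match, 1 when they differ."""
    return sum(0 if x == y else 1 for x, y in zip(a, b))


def knn_predict(training, query, k=3):
    """Predict a label for `query` from its k nearest training records."""
    # sorted() is stable, so records at equal distance keep their original order.
    ranked = sorted(training, key=lambda rec: record_distance(rec[0], query))
    labels = [label for _, label in ranked[:k]]
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / k


# Hypothetical records:
# (sore throat, fever, swollen glands, congestion, headache) -> diagnosis
training = [
    (("Yes", "Yes", "Yes", "Yes", "Yes"), "Strep throat"),
    (("No", "No", "No", "Yes", "Yes"), "Allergy"),
    (("Yes", "Yes", "No", "Yes", "No"), "Cold"),
    (("Yes", "No", "Yes", "No", "No"), "Strep throat"),
    (("No", "Yes", "No", "Yes", "No"), "Cold"),
    (("No", "No", "No", "Yes", "No"), "Allergy"),
]

# Hypothetical new patient to diagnose.
query = ("No", "No", "Yes", "Yes", "Yes")

diagnosis, probability = knn_predict(training, query, k=3)
print(diagnosis, probability)
```

With these made-up records, the three nearest neighbors are at distances 1, 2 and 2, and two of them are labeled "Allergy", so the sketch predicts "Allergy" with a predicted probability of 2/3.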