Simple recommendation based on Euclidean distance
2022-06-11 09:39:00 【Mud smoke】
Notes
Minkowski distance:
$$d(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^r \right)^{1/r}$$
where:
- r = 1 gives the Manhattan distance
- r = 2 gives the Euclidean distance
- r = ∞ gives the supremum (maximum) distance
* The larger r is, the more a large difference in a single dimension dominates the overall distance.
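The effect of r can be checked with a small sketch. The function name `minkowski` and the two sample rating dicts are assumptions made for illustration; like the recommendation code in this post, it only compares the keys that both dicts share.

```python
from math import inf


def minkowski(rating1, rating2, r):
    """Minkowski distance over the keys that both rating dicts share.

    Returns None when the two dicts have no key in common.
    """
    diffs = [abs(rating1[k] - rating2[k]) for k in rating1 if k in rating2]
    if not diffs:
        return None
    if r == inf:
        # Limit case: only the largest single-dimension difference counts
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


# Hypothetical two-dimensional example: the differences are 3 and 4
a = {"x": 1.0, "y": 5.0}
b = {"x": 4.0, "y": 1.0}
print(minkowski(a, b, 1))    # Manhattan: 3 + 4 = 7.0
print(minkowski(a, b, 2))    # Euclidean: sqrt(9 + 16) = 5.0
print(minkowski(a, b, inf))  # supremum: max(3, 4) = 4.0
```

As r grows from 1 to ∞ the result moves from 7.0 toward 4.0, the largest single difference, which is the behaviour the note above describes.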
The Pearson correlation coefficient measures the correlation between two variables.
Its value lies between -1 and 1: 1 means perfect agreement and -1 means perfect disagreement.
There is another formula that computes an approximation of the Pearson correlation coefficient:
$$r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \, \sqrt{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}}$$
Although this formula looks more complicated, and its result is numerically less stable and can carry a small error, its biggest advantage is that the code only needs to traverse the data once.
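The single-pass idea can be sketched next to the textbook definition for comparison. Both function names and the sample ratings are assumptions made for illustration. The one-pass version keeps five running sums, which is what allows it to read the data only once.

```python
from math import sqrt


def pearson_two_pass(xs, ys):
    """Textbook definition: compute the means first, then the deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x) * sqrt(var_y)
    return cov / denom if denom else 0.0


def pearson_one_pass(xs, ys):
    """Single traversal: accumulate five running sums, then combine them."""
    n = 0
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0.0
    for x, y in zip(xs, ys):
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    if n == 0:
        return 0.0
    # Rounding can push these terms slightly below zero, so clamp them
    term_x = max(0.0, sum_x2 - sum_x ** 2 / n)
    term_y = max(0.0, sum_y2 - sum_y ** 2 / n)
    denom = sqrt(term_x) * sqrt(term_y)
    return (sum_xy - sum_x * sum_y / n) / denom if denom else 0.0


# Hypothetical ratings from two users on five items; ys = 4 * xs - 15,
# so the two users agree perfectly despite using different scales
xs = [4.75, 4.5, 5.0, 4.25, 4.0]
ys = [4.0, 3.0, 5.0, 2.0, 1.0]
print(pearson_two_pass(xs, ys))  # approximately 1.0
print(pearson_one_pass(xs, ys))  # approximately 1.0
```

The clamp with `max(0.0, ...)` is a defensive choice for this sketch: subtracting two large, nearly equal sums is where the one-pass formula loses precision.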
Which similarity measure should be used?
- If the data suffers from "grade inflation" (different users rate on different scales), use the Pearson correlation coefficient.
- If the data is dense (almost every attribute has a value for every user) and the magnitude of the differences matters, use Euclidean or Manhattan distance.
- If the data is sparse, use cosine similarity.
Cosine similarity is defined as:
$$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$$
where "·" denotes the dot product and "||x||" is the length (norm) of vector x, given by:
$$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$$
Manhattan and Euclidean distance work very well when the data is complete; if the data is sparse, consider cosine similarity instead.
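A minimal sketch of cosine similarity on plain lists follows. The function name `cosine_similarity` and the sample vectors are assumptions made for illustration, and returning 0.0 for an all-zero vector is a convention chosen here, not something the notes specify.

```python
from math import sqrt


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle is undefined for a zero vector; report no similarity
        return 0.0
    return dot / (norm_x * norm_y)


# Sparse vectors: a 0 stands for "not rated"
u = [0, 0, 3, 4, 0]
v = [0, 0, 6, 8, 0]
w = [5, 0, 0, 0, 0]
print(cosine_similarity(u, v))  # same direction: 1.0
print(cosine_similarity(u, w))  # no shared non-zero entry: 0.0
```

Entries that are zero in either vector add nothing to the dot product, so the many unrated items in sparse data do not pull two users apart the way they would with a distance measure.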
Using Euclidean distance to implement a simple recommendation function:
# -*- coding: utf-8 -*-
"""
Created on Mon Feb 28 09:52:22 2022
@author: Knight
"""
from math import sqrt
users = {
    "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
    "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
    "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "Slightly Stoopid": 1.0},
    "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
    "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "Slightly Stoopid": 2.5, "The Strokes": 3.0},
}
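# Compute the Euclidean distance over the items both users have rated
def ouji(rating1, rating2):
    commonRating = False
    total = 0  # named total to avoid shadowing the built-in sum()
    for r1 in rating1:
        if r1 in rating2:
            total += (rating1[r1] - rating2[r1]) ** 2
            commonRating = True
    if commonRating:
        return sqrt(total)
    # -1 signals that the two users have no rated item in common
    return -1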
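# Return the user with the smallest Euclidean distance to username
def getNearestUser(username, users):
    distances = []
    for user in users:
        if user != username:
            distance = ouji(users[user], users[username])
            # ouji returns -1 when there are no common ratings; skip those
            # users so the sentinel does not sort ahead of real distances
            if distance >= 0:
                distances.append((distance, user))
    distances.sort()
    return distances[0][1]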
# Recommend the items the nearest user has rated but username has not
def recommend(username, users):
    nearest = getNearestUser(username, users)
    recommendations = []
    waitingList = users[nearest]
    doneList = users[username]
    for value in waitingList:
        if value not in doneList:
            recommendations.append((value, waitingList[value]))
    return recommendations

print(recommend("Chan", users))
# >>> [('The Strokes', 2.5), ('Vampire Weekend', 2.0)]
print(recommend("Jordyn", users))
# >>> [('Blues Traveler', 3.0)]