
Recommendation System Infrastructure and Project Introduction

2022-07-23 06:05:00 【Diesel】

Systematic study of recommendation systems: infrastructure

I. General recommendation system framework

Data collection

ETL, MapReduce, Spark, Flink

Data storage

Hive, HBase, MySQL, Redis
Persistently store the collected data.
Data is usually spread across distributed stores according to whether it is hot or cold and whether it is structured or unstructured.

Algorithm recall

Popularity, collaborative filtering, content, user profile, fallback
Apply specific algorithms to the massive item pool for a preliminary screening,
narrowing it from hundreds of thousands of items down to hundreds or thousands.
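To make the funnel concrete, here is a minimal sketch of a single popularity ("hot") recall channel in Python, the project's language. It assumes interactions arrive as (user_id, item_id) pairs; the function name `hot_recall` and its parameters are hypothetical. Counting interactions and keeping only the top k is the simplest way a recall stage shrinks a huge pool to a short candidate list.

```python
import heapq
from collections import Counter


def hot_recall(interactions, k, exclude=()):
    """Popularity recall: return the k items with the most interactions.

    interactions: iterable of (user_id, item_id) pairs (assumed format).
    exclude: items to skip, e.g. ones the user has already seen.
    """
    counts = Counter(item for _, item in interactions)
    excluded = set(exclude)
    candidates = ((n, item) for item, n in counts.items() if item not in excluded)
    # nlargest avoids sorting the whole pool when k is much smaller than it
    return [item for _, item in heapq.nlargest(k, candidates)]


pairs = [(1, "a"), (2, "a"), (3, "b"), (1, "c"), (2, "c"), (3, "c")]
print(hot_recall(pairs, 2))  # ['c', 'a']
```

`heapq.nlargest` is used instead of a full sort because k is typically tiny compared with the catalog.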

Ranking

LR, SVD, DNN, GBDT
Rank the recalled candidates precisely
Optimize for multiple objectives
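As an illustration of the ranking step, the sketch below scores recall candidates with a logistic regression (LR) model and sorts them by score. It assumes the model is already trained; the feature names, weights, and bias are made-up placeholders, not values from any real system, and the helper names are hypothetical.

```python
import math


def lr_score(features, weights, bias):
    """Logistic regression: sigmoid(w . x + b), read e.g. as a predicted click probability."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates, weights, bias):
    """Sort candidates (item_id -> feature dict) by descending LR score."""
    scored = [(lr_score(f, weights, bias), item) for item, f in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


# Hypothetical features and weights, for illustration only
weights = {"popularity": 0.8, "genre_match": 1.5}
candidates = {
    "item_a": {"popularity": 0.9, "genre_match": 0.0},
    "item_b": {"popularity": 0.2, "genre_match": 1.0},
}
print(rank_candidates(candidates, weights, bias=-1.0))  # ['item_b', 'item_a']
```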

Result application

Guess You Like, Similar Items, "Take a Look" feeds
Present the final recommendations to users in a form suited to each scenario.

Common recommendation features

User features

Natural attributes
Profile features: interests, behavior
Relationship features: group attributes, follow relationships, intimacy

Item features

Static features: category tags
Dynamic features
Correlation features
Contextual features
Example: Toutiao (Today's Headlines)

II. Common recommendation algorithms

Popularity-based
Hottest, newest, liked by the most people
Content-based
Same tags, same keywords, similar topics
Association-rule-based
People who viewed A also viewed B
Neighborhood-based
Collaborative filtering: user-based, item-based, model-based
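The item-based variant of collaborative filtering can be sketched in a few lines. This version assumes explicit, positive ratings stored as `{user: {item: rating}}` and uses cosine similarity between item rating vectors, which is one common choice rather than the only one; the function names are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity between every pair of items co-rated by at least one user.

    ratings: {user: {item: rating}}, ratings assumed positive (e.g. 1-10).
    """
    dot = defaultdict(float)   # (i, j) -> sum over users of r_ui * r_uj
    norm = defaultdict(float)  # i -> sum over users of r_ui ** 2
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            norm[i] += r_i * r_i
            for j, r_j in user_ratings.items():
                if i != j:
                    dot[(i, j)] += r_i * r_j
    sims = defaultdict(dict)
    for (i, j), d in dot.items():
        sims[i][j] = d / math.sqrt(norm[i] * norm[j])
    return sims


def recommend(user, ratings, sims, k=10):
    """Score unseen items by similarity-weighted ratings of the items the user rated."""
    seen = ratings[user]
    scores = defaultdict(float)
    for i, r_i in seen.items():
        for j, s in sims.get(i, {}).items():
            if j not in seen:
                scores[j] += s * r_i
    return sorted(scores, key=scores.get, reverse=True)[:k]


ratings = {
    "u1": {"A": 5, "B": 1},
    "u2": {"A": 5, "C": 5},
    "u3": {"B": 5, "D": 5},
}
sims = item_similarities(ratings)
print(recommend("u1", ratings, sims))  # ['C', 'D']
```

Computing all item pairs this way is quadratic in the number of items each user has rated, so at scale the similarity table would typically be precomputed offline rather than per request.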

III. Evaluation metrics

Accuracy
Number of correctly predicted samples / total number of samples
AUC (area under the ROC curve)
As the decision threshold varies, the true positive rate is plotted against the false positive rate; AUC is the area under that curve, which equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one.
AUC = 1: perfect classifier
0.5 < AUC < 1: better than random guessing
AUC = 0.5: baseline classifier, no better than random guessing (tossing a coin)
AUC < 0.5: worse than random guessing; inverting its predictions gives a classifier with AUC = 1 − AUC > 0.5
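Both metrics can be computed directly in a few lines. This sketch assumes binary labels (1 = positive, 0 = negative) and real-valued scores; the AUC uses the pairwise interpretation, counting the share of (positive, negative) pairs in which the positive sample scores higher, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Correctly predicted samples / total samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties = 0.5).

    O(P * N) pairwise version: fine for illustration, too slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, [1 if s >= 0.5 else 0 for s in scores]))  # 0.5
print(auc(labels, scores))  # 0.75
```

Negating the scores in this example gives `auc(labels, [-s for s in scores]) == 0.25`, i.e. 1 − 0.75, which is the "invert the predictions" case for AUC < 0.5.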

ROC curve
For classifiers with continuous outputs (such as predicted probabilities), each threshold yields one point on the curve: the false positive rate FPR = FP / (FP + TN) on the x-axis and the true positive rate TPR = TP / (TP + FN) on the y-axis.
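The ROC curve can be traced by sweeping the threshold over every distinct score. A minimal sketch, assuming binary labels (1 = positive, 0 = negative), both classes present, and that a sample counts as predicted positive when its score is greater than or equal to the threshold; integrating the points with the trapezoidal rule gives the AUC.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), one per distinct threshold, from (0, 0) to (1, 1)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))  # (FP/(FP+TN), TP/(TP+FN))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
pts = roc_points(labels, scores)
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```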

Evaluation criteria

Satisfaction: accuracy, dwell time, conversion rate
Coverage: whether long-tail items get recommended
Diversity: whether the recommended items differ from one another and cover as many of the user's interests as possible
Novelty: whether the system recommends items the user has not seen before
Serendipity (hard to achieve): the recommended items are unlike the user's historical behavior, yet the user likes them very much
Real-time: whether recommendations update in real time as the user's latest preferences change
Business objectives: whether the system meets business goals such as GMV
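Coverage and diversity are straightforward to measure offline. The sketch below assumes each user's recommendation list and each item's genre set are known; it computes catalog coverage (the share of the catalog recommended to at least one user) and intra-list diversity (the average pairwise Jaccard distance between genre sets in one list). Jaccard distance over genres is an illustrative choice of dissimilarity, and all names are hypothetical.

```python
from itertools import combinations


def catalog_coverage(rec_lists, catalog):
    """Fraction of catalog items that appear in at least one user's recommendations."""
    recommended = set()
    for items in rec_lists.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def intra_list_diversity(items, genres):
    """Average pairwise Jaccard distance between the genre sets of one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(genres[i], genres[j]) for i, j in pairs) / len(pairs)


catalog = ["a", "b", "c", "d"]
rec_lists = {"u1": ["a", "b"], "u2": ["b", "c"]}
genres = {"a": {"Action"}, "b": {"Action", "Comedy"}, "c": {"Drama"}}
print(catalog_coverage(rec_lists, catalog))  # 0.75
print(intra_list_diversity(["a", "b", "c"], genres))
```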

IV. Project build (Concrec)

Data source: Kaggle Anime Recommendations Dataset (anime data originally from myanimelist.net)

1. Data preprocessing

Consolidate all data sources, explore the data visually, then clean and transform it.
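The cleaning part of this step might look like the following standard-library sketch (the project lists pandas for data analysis; plain `csv` keeps the example dependency-free). It assumes the dataset's ratings file has the columns `user_id`, `anime_id`, `rating`, with `-1` meaning the user watched the title without rating it; verify this against the downloaded files. The function name is hypothetical.

```python
import csv
import io


def clean_ratings(lines):
    """Parse rating rows, drop unrated (-1) and duplicate (user, anime) rows, cast to int.

    Assumed columns: user_id, anime_id, rating (rating == -1 means watched, not rated).
    """
    seen = set()
    cleaned = []
    for row in csv.DictReader(lines):
        user, anime, rating = int(row["user_id"]), int(row["anime_id"]), int(row["rating"])
        if rating == -1 or (user, anime) in seen:
            continue  # skip implicit-only rows and repeated (user, anime) pairs
        seen.add((user, anime))
        cleaned.append((user, anime, rating))
    return cleaned


sample = io.StringIO("user_id,anime_id,rating\n1,20,-1\n1,24,9\n1,24,9\n2,20,7\n")
print(clean_ratings(sample))  # [(1, 24, 9), (2, 20, 7)]
```

Dropping the `-1` rows is a design choice: they could instead be kept as implicit "watched" signals for recall.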

2. Recall

Run several recall strategies to produce a preliminary candidate set.
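One simple way to combine several recall strategies is to interleave their candidate lists and drop duplicates. The sketch below assumes each channel returns an already-ordered list of item ids; the round-robin interleaving, the channel names, and the `limit` cap are illustrative assumptions, since a real system might instead weight channels or assign fixed quotas.

```python
from itertools import zip_longest


def merge_recall(channels, limit):
    """Interleave candidate lists from several recall strategies, dropping duplicates.

    channels: {channel_name: [item_id, ...]}, each list already ordered by that channel.
    Returns up to `limit` (item_id, channel_name) pairs; an item keeps the first
    channel that produced it.
    """
    merged, seen = [], set()
    names = list(channels)
    for row in zip_longest(*(channels[n] for n in names)):
        for name, item in zip(names, row):
            if item is None or item in seen:
                continue  # None pads exhausted channels; duplicates are skipped
            seen.add(item)
            merged.append((item, name))
            if len(merged) == limit:
                return merged
    return merged


channels = {"hot": [1, 2, 3], "item_cf": [2, 4], "content": [5]}
print(merge_recall(channels, 10))
# [(1, 'hot'), (2, 'item_cf'), (5, 'content'), (4, 'item_cf'), (3, 'hot')]
```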

3. Ranking

Rank the candidates precisely against the optimization objectives
Re-rank the results according to specific rules
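A rule-based re-ranking pass runs after the model has ordered the candidates. This sketch assumes two example rules, both hypothetical: drop items the user has already watched, and keep at most `max_per_genre` items per primary genre so that one genre does not dominate the list. Items that survive keep their model order.

```python
from collections import Counter


def rerank(ranked, watched, genre_of, max_per_genre=2):
    """Apply business rules to a model-ranked list while preserving the remaining order.

    ranked: item ids sorted by model score (best first)
    watched: set of item ids the user has already seen
    genre_of: item id -> primary genre
    """
    per_genre = Counter()
    result = []
    for item in ranked:
        if item in watched:
            continue  # rule 1: never re-recommend watched items
        genre = genre_of[item]
        if per_genre[genre] >= max_per_genre:
            continue  # rule 2: cap items per genre for diversity
        per_genre[genre] += 1
        result.append(item)
    return result


genre_of = {"a": "Action", "b": "Action", "c": "Action", "d": "Action", "e": "Drama"}
print(rerank(["a", "b", "c", "d", "e"], {"b"}, genre_of))  # ['a', 'c', 'e']
```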

4. API service

Assemble the ranked results and expose an API for the front end to consume.
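The assembly half of this step can be sketched as a function that turns ranked item ids into the JSON payload the front end consumes. The field names (`user_id`, `items`, `rank`, `id`) and the catalog format are hypothetical; in the project this function would sit behind a Flask route, which is left out here so the example runs with the standard library alone.

```python
import json


def build_response(user_id, ranked_ids, catalog, size=10):
    """Assemble the top `size` ranked items into a JSON string for the front end.

    catalog: item id -> display metadata (e.g. {"name": ...}); unknown ids are skipped.
    """
    items = []
    for item_id in ranked_ids:
        if item_id not in catalog:
            continue  # skip ids without metadata rather than sending broken entries
        items.append({"rank": len(items) + 1, "id": item_id, **catalog[item_id]})
        if len(items) == size:
            break
    return json.dumps({"user_id": user_id, "items": items}, ensure_ascii=False)


catalog = {20: {"name": "Title A"}, 24: {"name": "Title B"}}
print(build_response(1, [24, 99, 20], catalog))
```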

5. Front end

Result display and user interaction

V. Concrec technology stack

Programming language: Python
Microservice framework: Flask
Front end: Vue
Data analysis: pandas
Big data processing: Spark + Flink (mainly Spark)
Machine learning framework: TensorFlow (developed by Google)

Spark: a distributed big-data processing platform that distributes both computing power and storage capacity. Unlike Hadoop, Spark computes in memory, which makes it faster. It provides a variety of programming interfaces, such as Spark SQL and MLlib.
Flink: a stream-processing platform with streams as its core abstraction, offering high throughput, low latency, and good fault tolerance.
TensorFlow: a machine learning framework focused on neural networks and deep learning, strong in distributed training and model visualization.


Copyright notice
This article was written by [Diesel]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/204/202207221757225076.html