当前位置:网站首页>Recommended system infrastructure and project introduction
Recommended system infrastructure and project introduction
2022-07-23 06:05:00 【Diesel】
System learning 《 Recommendation system 》-- Infrastructure
One 、 General recommendation system framework
- data collection
ETL MapReduce Spark Flink
- data storage
Hive HBase MySQL Redis
Persistent storage collects data
Usually according to the cold and hot data , Structured and unstructured distributed storage
- Algorithm recall
hot Collaborative filtering Content portrait As a substitute
Put massive data sets , Preliminary screening according to specific algorithm
From hundreds of thousands to hundreds and thousands
- Sorting results
LR SVD DNN GBDT
Sort accurately
Optimize for multiple goals
- Results application
Guess you like Similar recommendation Look and look
Show the final recommendation results to users according to different scenarios
Recommend common features
User characteristics
- Nature
- Portrait features : Interest in , Behavior
- Relationship characteristics : Crowd attributes , Focus on relationships , Intimacy
Item characteristics
- Static characteristics : Category labels
- Dynamic characteristics
- Correlation characteristics
- Contextual features
Today's headline
Two 、 Recommend common algorithms
Based on popularity
The hottest newest Most people likeBased on content
Same label Same key words Similar topicsBased on association rules
I saw A People also saw BNeighbor recommendation
Collaborative filtering : Based on users Item based Model-based
3、 ... and 、 Result evaluation index
- Accuracy rate Accuracy
The number of samples correctly predicted / Total number of samples - Area under curve AUC
Under different thresholds , The ratio of true positive to false positive in the prediction results
AUG = 1 : Perfect classifier
AUG > 0.5 : Most of them are really classifier intervals
AUG = 0.5 : Baseline classifier ( Toss a coin )
AUG < 0.5 : It is more accurate for negative samples , It can be transformed into a positive classifier
*ROC indicators *
For classifiers with continuous output values ( Such as probability prediction ), True positive at a certain threshold (TP) Probability / False positive (FP) Probability
Evaluation criteria
- Satisfaction degree : Accuracy rate 、 The length of stay 、 Conversion rate
- coverage : Can long tail items be recommended
- diversity : Are the recommended items different from each other 、 Cover as many points of interest as possible
- Novelty : Can you recommend something users haven't seen before
- Surprise degree ( It is difficult to ): The recommended things are not similar to the user's historical behavior records , But users like it very much
- The real time : Update the recommendation results in real time according to the latest preferences of users
- Business objectives : Whether it can achieve business goals such as GMV
Four 、 The project build (Concrec)
data source :Kaggle Anime Recommenations Dataset( Animation data source :myanimelist.net)
1. Data preprocessing
Summarize all data sources View data visually Clean and convert data
2. Recall
Carry out a preliminary recall of the candidate set according to a variety of strategies
3. Sort
Accurately sort the optimization objectives
Realize the reordering of specific rules
4. Interface services
Assemble sorting results , And expose the interface for front-end consumption
5. Front page
Result display & User interaction
5、 ... and 、Concrec Technology selection
programing language :python
Microservice framework :Flask
Front page :Vue
Data analysis :pandas
Big data processing :spark + Flink(spark Mainly )
Machine learning framework :TensorFlow( Google developed )
Spark: Distributed big data processing platform It solves the problem of computing power and storage capacity distribution differ Hadoop,Spark Based on memory computing , Faster Provide a variety of programming interfaces Such as SparkSQL,Mllib etc.
Flink: Streaming data (stream) Processing platform With flow as the core , High throughput , Low latency Good fault tolerance
TensorFlow: Machine learning framework Focus on Neural Networks 、 Deep learning In distributed training 、 Model visualization and other aspects are excellent
边栏推荐
猜你喜欢
随机推荐
mysql数据库基本知识
zstuAcm登记成绩(用STL链表list完成)
堆基础练习题 —— 1
Embedded system transplantation [1] - Guidance
IA笔记1
【基础6】——类、对象、类的封装、继承
LIinux下的基本C编程的三类高频函数操作详解第一类——文件操作函数(f)
Introduction to programming 3 - seeking the maximum value
UNIX实现IO多路复用之使用epoll函数实现网络socket服务端
栈溢出基础练习题——6(字符串漏洞64位下)
第四次作业:关于cat,grep,cut,sort,uniq,vim,tr等命令的用法
Transplantation de systèmes embarqués
记一个无显示屏无线连接树莓派,查找不到ip的办法
Zstuacm summer camp flag bearer
【基础3】——结构与函数
迷茫的五月
【基础7】——异常,捕获、自定义异常
洛谷回文质数 Prime Palindromes
UNIX Programming - network socket
Operation of numerical variables and special variables








