Understanding Recommender Systems in One Article: Overview 02: The recommender system pipeline, from recall to coarse ranking, fine ranking, and re-ranking, and finally display to the user
2022-08-03 16:35:00 【Ice Dew Coke】
Note: I have recently been working through a course that covers recommender systems systematically. We use the Xiaohongshu scenario as the running example of a recommender system in industry.
I only talk about techniques that are actually useful in industry. Frankly, industrial practice is far ahead of academia, and there is a large gap between industrial practice and what you find in publicly available books and papers.
You cannot learn the key techniques of recommender systems just by reading books.
Lectures by Shusen Wang: 《Xiaohongshu's Recommender System》
GitHub materials: http://wangshusen.github.io/
Bilibili video collection: https://space.bilibili.com/1369507485/channel/seriesdetail?sid=2249610
The recommender system pipeline
We continue with the basic concepts of recommender systems.
This section covers the recommender system pipeline.
The pipeline is divided into recall, coarse ranking, fine ranking, and re-ranking.
This lesson gives only a brief introduction; later lessons explain each stage in detail.
The first step is recall: quickly retrieving a set of items from the item database.
For example, Xiaohongshu has hundreds of millions of notes. When a user refreshes Xiaohongshu,
the system calls dozens of recall channels at the same time. Each channel returns dozens to hundreds of notes, for a total of several thousand notes.
Once recall is done, the next step is to pick, from these thousands of notes, the ones the user is most interested in.
The next step is coarse ranking: a small machine learning model scores the thousands of notes one by one,
then the notes are sorted and truncated by score, keeping the few hundred with the highest scores.
After that comes fine ranking. Here a large deep neural network scores the few hundred notes one by one.
The fine-ranking score reflects the user's interest in a note. After fine ranking you can truncate, or you can skip truncation.
At Xiaohongshu we do not truncate after fine ranking: all of the few hundred notes go into re-ranking together with their fine-ranking scores.
Re-ranking is the last step. Here random sampling is done based on the fine-ranking score and a diversity score, yielding a few dozen notes.
Then similar content is scattered, ads and operational content are inserted, and the result is shown to the user.
That is the general idea of the recommender system; the stages are explained in more detail below.
The goal of the recommender system is to select a few dozen items from the item database and show them to the user.
In the Xiaohongshu scenario, the items are notes.
We have hundreds of millions of notes. The first stage of the pipeline is recall, which quickly retrieves a set of notes from the note database.
In practice, a recommender system has many recall channels.
Common ones include collaborative filtering, two-tower models, followed authors, and so on.
For example, Xiaohongshu's recommender system has dozens of recall channels. Each channel returns dozens to hundreds of notes, and together these channels return several thousand notes.
The recommender system then merges these notes and does deduplication and filtering.
Filtering means excluding notes the user dislikes, notes by authors the user dislikes, and notes on topics the user dislikes. After recall has returned several thousand notes, the next step is ranking.
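The merge, deduplication, and filtering step described above can be sketched roughly as follows. This is a minimal illustration, not Xiaohongshu's implementation: the function name, the channel names, and the idea of passing dislike sets and an author lookup as plain Python collections are all assumptions made for the example.

```python
from typing import Dict, List, Set


def merge_recall_results(
    channel_results: Dict[str, List[str]],
    disliked_notes: Set[str],
    disliked_authors: Set[str],
    author_of: Dict[str, str],
) -> List[str]:
    """Merge note ids from all recall channels, then deduplicate and filter.

    All parameter names are hypothetical; a real system would read them
    from user-profile and note-metadata services.
    """
    seen: Set[str] = set()
    merged: List[str] = []
    for note_ids in channel_results.values():
        for note_id in note_ids:
            if note_id in seen:
                continue  # the same note can come back from several channels
            seen.add(note_id)
            if note_id in disliked_notes:
                continue  # the user has marked this note as disliked
            if author_of.get(note_id) in disliked_authors:
                continue  # the user has marked this author as disliked
            merged.append(note_id)
    return merged


# Toy example with three hypothetical channels.
channels = {
    "collaborative_filtering": ["n1", "n2", "n3"],
    "two_tower": ["n2", "n4"],
    "followed_authors": ["n5"],
}
authors = {"n1": "a1", "n2": "a2", "n3": "a3", "n4": "a2", "n5": "a5"}
candidates = merge_recall_results(channels, {"n3"}, {"a5"}, authors)
```

Topic-level filtering would follow the same pattern with one more lookup; it is left out here to keep the sketch short.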
Ranking uses machine learning models to estimate the user's interest in each note and keeps the notes with the highest scores.
Scoring thousands of notes one by one directly with a large neural network would be very expensive.
To address the computation cost, ranking is usually split into two steps: coarse ranking and fine ranking.
Coarse ranking quickly scores the thousands of notes with a simpler model and keeps the few hundred with the highest scores.
Fine ranking scores those few hundred notes with a large neural network. The fine-ranking model is much larger than the coarse-ranking model and uses more features,
so its scores are more reliable, but its computation cost is also much higher.
That is why we first filter with coarse ranking and then apply fine ranking; this gives a better balance between computation and accuracy.
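The two-step idea can be sketched as a small funnel: a cheap scorer runs on every candidate, and the expensive scorer runs only on the survivors. The scoring functions below are stand-ins (plain Python callables), and the names `two_stage_rank` and `coarse_keep` are assumptions for illustration only.

```python
import heapq
from typing import Callable, List, Tuple


def two_stage_rank(
    candidates: List[int],
    coarse_score: Callable[[int], float],
    fine_score: Callable[[int], float],
    coarse_keep: int,
) -> List[Tuple[float, int]]:
    """Score everything with the cheap model, keep the top `coarse_keep`,
    then score only those survivors with the expensive model."""
    coarse_scored = [(coarse_score(note), note) for note in candidates]
    survivors = heapq.nlargest(coarse_keep, coarse_scored, key=lambda p: p[0])
    fine_scored = [(fine_score(note), note) for _, note in survivors]
    fine_scored.sort(key=lambda p: p[0], reverse=True)
    return fine_scored


# Count how often the "expensive" model is called in a toy run.
fine_calls = 0


def toy_fine_score(note: int) -> float:
    global fine_calls
    fine_calls += 1
    return -float(note)  # stand-in for a large neural network


ranked = two_stage_rank(
    candidates=list(range(1000)),
    coarse_score=lambda note: float(note),  # stand-in for a small model
    fine_score=toy_fine_score,
    coarse_keep=10,
)
```

In this toy run the expensive scorer is called 10 times instead of 1000, which is the whole point of putting coarse ranking in front of fine ranking.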
After coarse and fine ranking we have a few hundred notes, each with a score indicating how interested the user is in that note.
You could sort the notes directly by model score and show them to the user.
However, the results at this point still have some shortcomings and need some adjustment.
This step is called re-ranking. Re-ranking mainly takes diversity into account:
it does random sampling based on diversity, selecting a few dozen notes from the few hundred,
and then uses rules to scatter notes with similar content.
I will explain re-ranking later. The result of re-ranking is what is finally shown to the user,
for example the top 80 items, which include both notes and ads.
Let me note that the numbers here are all made up; it is not convenient for me to share Xiaohongshu's real numbers.
Next I will briefly introduce the coarse-ranking and fine-ranking models. The two are very similar;
the only difference is that the fine-ranking model is larger and uses more features.
The model's input includes user features, features of the candidate item, and statistical features.
Suppose we want to judge whether Xiao Wang is interested in a certain note. We feed the note's features, Xiao Wang's features, and many statistical features into the neural network.
Neural networks come in many architectures. I will not go into them here; that is saved for a later lesson.
The neural network outputs several values, such as click-through rate, like rate, collect rate, and share rate. These values are the network's estimates of user behavior.
The larger a value, the more interested the user is in the note.
Finally, the multiple estimates are fused into a final score,
for example by taking a weighted sum. This score determines whether the note is shown to the user and whether it appears near the front or the back.
Note that this is the scoring of a single note. Coarse ranking has to score thousands of notes, and fine ranking has to score hundreds.
Each note gets several estimated values, which are fused into one score that serves as the basis for ranking that note.
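As a rough illustration of fusing several estimates into one score with a weighted sum, here is a minimal sketch. The weights and the key names are hypothetical; the text does not give real values, and production systems may use other fusion formulas.

```python
from typing import Dict

# Hypothetical weights, chosen only for this example.
FUSION_WEIGHTS: Dict[str, float] = {
    "click": 1.0,
    "like": 2.0,
    "collect": 3.0,
    "share": 4.0,
}


def fuse_scores(
    predictions: Dict[str, float],
    weights: Dict[str, float] = FUSION_WEIGHTS,
) -> float:
    """Combine the model's estimated rates into one ranking score.

    A missing estimate is treated as 0.0.
    """
    return sum(w * predictions.get(name, 0.0) for name, w in weights.items())


# Estimated rates for one (user, note) pair, as output by the ranking model.
note_predictions = {"click": 0.1, "like": 0.02, "collect": 0.01, "share": 0.005}
final_score = fuse_scores(note_predictions)
```

Sorting notes by `final_score` in descending order then gives the ranking that coarse or fine ranking hands to the next stage.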
The last stage of the pipeline is re-ranking. Its most important function is diversity sampling:
a few dozen notes must be selected from the few hundred. Common methods include MMR and DPP. Sampling is based on two criteria:
one is the fine-ranking score, and the other is diversity.
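To make the two criteria concrete, here is a minimal greedy sketch in the spirit of MMR (maximal marginal relevance): each step picks the note with the best trade-off between its score and its similarity to notes already picked. The cosine similarity over note embeddings, the `trade_off` parameter, and all names are assumptions for illustration; a production re-ranker would differ in many details.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mmr_select(
    scores: Dict[str, float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick k notes, balancing ranking score against redundancy."""
    selected: List[str] = []
    remaining = set(scores)
    while remaining and len(selected) < k:

        def marginal(note_id: str) -> float:
            # Redundancy is the similarity to the closest already-picked note.
            max_sim = max(
                (cosine(embeddings[note_id], embeddings[s]) for s in selected),
                default=0.0,
            )
            return trade_off * scores[note_id] - (1 - trade_off) * max_sim

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


toy_scores = {"nba1": 0.9, "nba2": 0.85, "food1": 0.6}
toy_embeddings = {"nba1": [1.0, 0.0], "nba2": [1.0, 0.0], "food1": [0.0, 1.0]}
diverse = mmr_select(toy_scores, toy_embeddings, k=2, trade_off=0.5)
by_score_only = mmr_select(toy_scores, toy_embeddings, k=2, trade_off=1.0)
```

With `trade_off=0.5` the second pick is the food note rather than the second NBA note, even though the NBA note has the higher score; with `trade_off=1.0` diversity is ignored and the pick follows score alone.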
After sampling, rules are used to scatter similar content.
We cannot place notes that are too similar in adjacent positions.
For example, suppose that according to the fine-ranking score the top five notes are all NBA content. That would not be appropriate.
Even if the user is a basketball fan, they do not necessarily want to see such homogeneous content.
If the note in first place is about the NBA, then the next few positions cannot hold NBA content, and similar notes are moved further back.
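One simple way to express such a scatter rule is a sliding window over categories: a note is placed only if its category has not appeared in the last few positions, otherwise it is pushed back. The sketch below assumes each note carries a single category label and uses a hypothetical `window` parameter; real rule sets are far more elaborate.

```python
from typing import List, Tuple

Note = Tuple[str, str]  # (note_id, category), a hypothetical representation


def scatter_by_category(ranked: List[Note], window: int) -> List[Note]:
    """Reorder notes so a category does not repeat within `window` positions.

    Notes are taken in ranked order whenever possible. If every remaining
    note would break the rule, the highest-ranked remaining note is used.
    """
    remaining = list(ranked)
    result: List[Note] = []
    while remaining:
        recent = {cat for _, cat in result[-window:]} if window > 0 else set()
        pick = next(
            (i for i, (_, cat) in enumerate(remaining) if cat not in recent),
            0,  # fallback: nothing satisfies the rule
        )
        result.append(remaining.pop(pick))
    return result


ranked_notes = [
    ("n1", "nba"),
    ("n2", "nba"),
    ("n3", "nba"),
    ("n4", "food"),
    ("n5", "travel"),
]
scattered = scatter_by_category(ranked_notes, window=2)
scattered_ids = [note_id for note_id, _ in scattered]
```

In the toy input the three NBA notes start in the top three positions; after scattering, the second and third NBA notes are moved back behind the food and travel notes.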
Another purpose of re-ranking is to insert ads and operational promotion content. The order is also adjusted according to ecosystem requirements; for example, many pictures of beautiful women cannot be shown in a row.
To summarize this section: this lesson briefly introduced the recommender system pipeline.
The first stage of the pipeline is recall. We have many recall channels, which quickly retrieve several thousand notes from hundreds of millions of notes as the candidate set.
Ranking then decides which notes are exposed to the user and in what order. Ranking is done in steps.
First comes coarse ranking: a small neural network scores the thousands of notes, and the few hundred with the highest scores are sent to fine ranking.
Of course, some rules are also applied to ensure that the notes entering fine ranking are diverse.
Next comes fine ranking: a large neural network scores the few hundred notes chosen by coarse ranking. After scoring, no sorting or truncation is needed.
All of these few hundred notes go into re-ranking together with their fine-ranking scores. Re-ranking does diversity sampling, selecting a few dozen notes from the few hundred,
then scatters similar notes with rules and inserts ads and operational content.
The rules for re-ranking are very complex and run to thousands of lines of code.
Across the whole pipeline, recall and coarse ranking are the biggest funnels. They reduce the number of candidate notes from hundreds of millions to a few thousand, and then to a few hundred.
Only when there are just a few hundred candidate notes can a large neural network be used for fine ranking and a method like DPP be used for diversity sampling.
If the number of notes is too large, it is not feasible to use a large neural network or DPP.
Summary
Note: this series of articles can help you learn recommender systems systematically.
(1) When you apply for a job, your résumé needs to match the employer's requirements with your research direction and work experience, so that it meets the company's hiring needs. Otherwise your résumé will simply be rejected.
(2) Are you joining the company to work on recommender systems? Or pure CV? NLP? Speech? Or deep learning and machine learning more generally? Or hardware? Front-end development? Back-end development? Test development? Product? HR? Administration? You cannot do everything. You need to pick a direction and build up your own experience before sending applications. Otherwise, what will the interviewer talk to you about?
(3) Today's takeaways on recommender systems: the goal of the recommender system is to select a few dozen items from the item database to show to the user. The pipeline is divided into recall, coarse ranking, fine ranking, and re-ranking. To address the computation cost, ranking is usually split into two steps: coarse ranking and fine ranking.