Introduction to Tianchi News Recommendation: 4. Feature Engineering
2022-07-04 01:20:00 【Programmer base camp】
Preface
Feature engineering here means constructing features and labels so that the problem becomes a supervised learning task.
- Features that can be used directly:
- The article's own features: category_id is the article's category; created_at_ts is the time the article was created, which relates to its timeliness; words_count is the article's word count. Readers are generally less inclined to click on very long articles, although some people do prefer long reads.
- The article content embedding features, which were already used in the recall stage. They are optional here, and you can also try other types of embedding features, such as W2V.
- The user's device features.
- How to construct the supervised dataset: the recall stage gives us a dictionary of the form {user_id: [list of articles the user may click]}. Every recalled article for a user then becomes one candidate row. For example, if the recall list for user1 is {user1: [item1, item2, item3]}, we get three rows of the form (user1, item1), (user1, item2), (user1, item3). These are the first two feature columns of the supervised test set.
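The flattening step described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: the function names `recall_to_rows` and `add_labels` are hypothetical, and the labeling rule (1 if the candidate is the user's held-out last click, 0 otherwise) is an assumption based on the stated goal of predicting the last click.

```python
def recall_to_rows(recall_dict):
    """Flatten {user_id: [item_id, ...]} into a list of (user_id, item_id) rows."""
    rows = []
    for user_id, items in recall_dict.items():
        for item_id in items:
            rows.append((user_id, item_id))
    return rows


def add_labels(rows, last_click):
    """Append a label column (assumed rule): 1 if the candidate is the user's
    held-out last click, else 0. Users missing from last_click get 0 everywhere."""
    return [(u, i, int(last_click.get(u) == i)) for u, i in rows]


recall = {"user1": ["item1", "item2", "item3"]}
rows = recall_to_rows(recall)
labeled = add_labels(rows, {"user1": "item2"})
print(rows)
print(labeled)
```

For a real test set the last click is unknown, so only `recall_to_rows` would apply there; the label column is meaningful only for training and validation users.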
The idea behind constructing the features is as follows. Whether a user clicks an article is closely related to the articles that user clicked in the past, for example articles on the same topic or with similar content. An important family of features therefore combines each candidate with the user's click history. We already have a two-column dataset of users and their candidate articles, and our goal is to predict the user's last click. A natural approach is to relate each candidate to the user's most recent clicks: this uses the click history while staying close in time to the click we want to predict, which matters because timeliness is one of the most important properties of news. A user's last click often has a lot to do with the few clicks just before it. So, for each candidate article, we can build the following features from the last few clicks:
- The similarity between the candidate item and each of the last few clicked articles (inner product of their embeddings). This relates the candidate directly to the user's historical behavior.
- Statistics over those similarity values. Statistical features smooth out some fluctuations and outliers.
- The difference in word count between the candidate item and each of the last few clicked articles. This reflects the user's preference for article length.
- The difference in creation time between the candidate item and each of the last few clicked articles. This reflects the user's preference for how recent an article is.
- Also worth considering: if YouTube recall was used, we can additionally build similarity features between the user and the candidate item.
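The history-based features listed above can be sketched as follows. This is an illustrative sketch under assumptions: the function name `history_features`, the feature names, the use of absolute differences, and the particular statistics (max, min, mean, population standard deviation) are all hypothetical choices, not taken from the tutorial's code. Article metadata and embeddings are passed in as plain dictionaries.

```python
from statistics import mean, pstdev


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def history_features(candidate, history, articles, emb, last_n=3):
    """Features relating one candidate article to the user's last_n clicks.

    history  : clicked article ids, oldest first
    articles : {item_id: {"created_at_ts": int, "words_count": int}}
    emb      : {item_id: [float, ...]} content embeddings
    """
    recent = history[-last_n:][::-1]  # most recent click first
    if not recent:
        return {}
    sims = [inner_product(emb[candidate], emb[h]) for h in recent]
    feats = {}
    for k, h in enumerate(recent, start=1):
        feats[f"sim_{k}"] = sims[k - 1]
        feats[f"words_diff_{k}"] = abs(
            articles[candidate]["words_count"] - articles[h]["words_count"])
        feats[f"created_diff_{k}"] = abs(
            articles[candidate]["created_at_ts"] - articles[h]["created_at_ts"])
    # Summary statistics over the similarity values
    feats["sim_max"] = max(sims)
    feats["sim_min"] = min(sims)
    feats["sim_mean"] = mean(sims)
    feats["sim_std"] = pstdev(sims)
    return feats


emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 1.0]}
articles = {
    "a": {"created_at_ts": 100, "words_count": 200},
    "b": {"created_at_ts": 160, "words_count": 150},
    "c": {"created_at_ts": 180, "words_count": 230},
}
feats = history_features("c", ["a", "b"], articles, emb, last_n=2)
print(feats)
```

Indexing the features by recency (`sim_1` is the most recent click) keeps the columns aligned across users, so a model can learn that the latest click matters most.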
- The main idea of Word2Vec is that the context of a word expresses its meaning well. It is a way of generating word vectors through unsupervised learning. word2vec has two classic models: skip-gram and cbow.
skip-gram: given the center word, predict the surrounding words.
cbow: given the surrounding words, predict the center word.
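To make the difference between the two models concrete, the sketch below generates the (input, target) training examples each one would see for a short token sequence. It only illustrates the direction of prediction; real word2vec implementations add details such as negative sampling and window shrinking that are left out here. The function names are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs: the center word predicts each surrounding word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window):
    """(context_words, center) pairs: the surrounding words predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            pairs.append((context, center))
    return pairs


tokens = ["a", "b", "c"]
sg_examples = skipgram_pairs(tokens, window=1)
cbow_examples = cbow_pairs(tokens, window=1)
print(sg_examples)
print(cbow_examples)
```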
When training word2vec with gensim, several parameters are important:
- size: the dimension of the word vectors (named vector_size since gensim 4.0).
- window: the maximum distance between the target word and the context words that are taken into account.
- sg: 0 selects the CBOW model, 1 selects the Skip-Gram model.
- workers: the number of worker threads used during training.
- min_count: the minimum word frequency; words that occur fewer times are ignored.
- iter: the number of passes over the entire dataset during training (named epochs since gensim 4.0).
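A minimal sketch of how these parameters might be passed to gensim, assuming gensim 4.x (where `size` and `iter` are spelled `vector_size` and `epochs`). Treating each user's time-ordered click sequence as a "sentence" of article ids is an assumption about how W2V item embeddings would be trained for this task; the helper names and the parameter values are hypothetical.

```python
def build_click_sentences(clicks):
    """clicks: iterable of (user_id, article_id, click_ts).
    Returns one 'sentence' per user: article ids as strings, in click order."""
    by_user = {}
    for user_id, article_id, _ts in sorted(clicks, key=lambda c: c[2]):
        by_user.setdefault(user_id, []).append(str(article_id))
    return list(by_user.values())


def train_item_w2v(sentences, dim=16):
    """Train Word2Vec on click sequences and return {article_id: vector}."""
    # Imported here so the helper above can be used without gensim installed.
    from gensim.models import Word2Vec

    model = Word2Vec(
        sentences=sentences,
        vector_size=dim,  # dimension of the vectors ("size" before gensim 4.0)
        window=5,         # how far the context reaches around the target
        sg=1,             # 1 = Skip-Gram, 0 = CBOW
        workers=4,        # training threads
        min_count=1,      # keep every article, even if clicked only once
        epochs=5,         # passes over the data ("iter" before gensim 4.0)
    )
    return {key: model.wv[key] for key in model.wv.index_to_key}


clicks = [(1, 30, 3), (1, 10, 1), (2, 20, 2), (1, 20, 2)]
sentences = build_click_sentences(clicks)
print(sentences)
# item_vectors = train_item_w2v(sentences)  # requires gensim to be installed
```

Setting `min_count=1` is a choice made for this sketch so that rarely clicked articles still receive a vector; a larger value trades coverage for more reliable embeddings.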
Detailed tutorial and code