Introduction to Tianchi News Recommendation: 4. Feature Engineering
2022-07-04 01:20:00 【Programmer base camp】
Preface
Feature engineering means constructing features and labels, which turns the problem into a supervised learning problem.
- Features that can be used directly:
  - The article's own attributes: category_id is the article's category; created_at_ts is its creation time, which relates to the article's timeliness; words_count is its word count. Users generally tend not to click articles that are too long, although some users do like long reads.
  - The article content embedding features. These were already used during recall; here they are optional, and you could also try other kinds of embeddings, such as Word2Vec (W2V).
  - The user's device information.
- Constructing the supervised dataset: the recall stage gives us a dictionary of the form {user_id: [list of articles the user may click]}. For each user we can therefore pair the user with every recalled article to build a candidate set. For example, if user1's recall list is {user1: [item1, item2, item3]}, we get three rows: (user1, item1), (user1, item2) and (user1, item3). These form the first two columns of the supervised (training and test) dataset.
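The dataset construction above can be sketched in plain Python. This is a minimal sketch under assumptions: the recall output is a plain dict, article attributes come from a hypothetical `articles` dict mapping article id to (category_id, created_at_ts, words_count), and the label is assumed to be 1 when the candidate equals the user's held-out last click. The name `build_candidate_rows` is illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical article table: article_id -> (category_id, created_at_ts, words_count)
Articles = Dict[int, Tuple[int, int, int]]


def build_candidate_rows(
    recall: Dict[int, List[int]],
    last_click: Dict[int, int],
    articles: Articles,
) -> List[dict]:
    """Expand {user_id: [candidate ids]} into one row per (user, candidate) pair."""
    rows = []
    for user_id, candidates in recall.items():
        for item_id in candidates:
            category_id, created_at_ts, words_count = articles[item_id]
            rows.append({
                "user_id": user_id,
                "item_id": item_id,
                # Article features that can be used directly
                "category_id": category_id,
                "created_at_ts": created_at_ts,
                "words_count": words_count,
                # Assumed label: 1 if this candidate is the user's held-out last click
                "label": int(last_click.get(user_id) == item_id),
            })
    return rows


# Toy example: user1 -> [item1, item2, item3] with ids 10, 11, 12
recall = {1: [10, 11, 12]}
articles = {10: (3, 1000, 200), 11: (3, 2000, 500), 12: (7, 1500, 150)}
rows = build_candidate_rows(recall, last_click={1: 11}, articles=articles)
print([(r["user_id"], r["item_id"], r["label"]) for r in rows])
# [(1, 10, 0), (1, 11, 1), (1, 12, 0)]
```

Each recalled pair becomes one row, so the ranking model later scores every candidate independently.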
The idea behind feature construction is as follows. Whether a user clicks an article is closely related to the articles they clicked before, for example through a shared topic or similar content. So an important family of features combines each candidate with the user's click history. We already have a two-column dataset pairing each user with candidate articles, and our goal is to predict the user's last click. A natural approach is to relate each candidate to the user's last few clicks: this still uses the click history, but stays close to the final click, because timeliness is one of the most important properties of news, and a user's last click is often strongly related to the few clicks just before it. So for each candidate article we build the following features against the last few clicked articles:
- Similarity between the candidate item and each of the last few clicked articles (embedding inner product). This directly reflects the user's historical behavior.
- Statistics of these similarity features. Aggregating them reduces the effect of fluctuations and outliers.
- Word-count difference between the candidate item and each of the last few clicked articles. Word count reveals the user's length preference.
- Time difference between the candidate item and each of the last few clicked articles. This shows how strongly the user prefers fresh articles.
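A minimal plain-Python sketch of these four feature groups for a single candidate. The embedding, word-count and creation-time lookups are hypothetical dicts, the feature names are invented for illustration, and using max/min/sum/mean as the statistics and absolute differences is an assumption rather than a prescribed recipe.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Embedding inner product."""
    return sum(x * y for x, y in zip(a, b))


def history_features(
    cand: int,
    last_clicks: List[int],
    emb: Dict[int, Sequence[float]],
    words: Dict[int, int],
    created_ts: Dict[int, int],
) -> Dict[str, float]:
    """Features relating one candidate to the user's last few clicked articles."""
    feats: Dict[str, float] = {}
    sims = [dot(emb[cand], emb[h]) for h in last_clicks]
    for i, h in enumerate(last_clicks):
        # Similarity to each recent click
        feats[f"sim_{i}"] = sims[i]
        # Word-count difference (length preference)
        feats[f"words_diff_{i}"] = abs(words[cand] - words[h])
        # Creation-time difference (preference for fresh articles)
        feats[f"time_diff_{i}"] = abs(created_ts[cand] - created_ts[h])
    # Statistics over the similarities damp the effect of a single noisy click
    feats["sim_max"] = max(sims)
    feats["sim_min"] = min(sims)
    feats["sim_sum"] = sum(sims)
    feats["sim_mean"] = sum(sims) / len(sims)
    return feats


emb = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.6, 0.8]}
words = {1: 100, 2: 300, 3: 250}
created_ts = {1: 1000, 2: 2000, 3: 2600}
f = history_features(3, last_clicks=[1, 2], emb=emb, words=words, created_ts=created_ts)
```

In practice this would be applied to every (user_id, item) row of the candidate dataset, with last_clicks taken from the user's click log sorted by time.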
Something to think about: if YouTube DNN recall is used, we can also build similarity features between the user and the candidate item.
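A YouTube DNN style recall model yields a user vector and item vectors, so a user-candidate similarity feature can be sketched as their inner product. The `user_emb` and `item_emb` dicts below are hypothetical stand-ins for the trained model's outputs, and `user_item_sim` is an illustrative name.

```python
from typing import Dict, List, Sequence, Tuple


def user_item_sim(
    user_emb: Dict[int, Sequence[float]],
    item_emb: Dict[int, Sequence[float]],
    pairs: List[Tuple[int, int]],
) -> List[float]:
    """Inner product of user and candidate embeddings for each (user_id, item_id) row."""
    return [
        sum(u * v for u, v in zip(user_emb[user_id], item_emb[item_id]))
        for user_id, item_id in pairs
    ]


# Hypothetical vectors standing in for a trained model's user and item embeddings
user_emb = {1: [1.0, 2.0]}
item_emb = {10: [1.0, 0.0], 11: [0.5, 0.5]}
print(user_item_sim(user_emb, item_emb, [(1, 10), (1, 11)]))  # [1.0, 1.5]
```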
- Word2Vec: the main idea is that a word's context expresses its meaning well. It learns word vectors through unsupervised learning, and it has two classic models, skip-gram and CBOW:
  - skip-gram: given the center word, predict the surrounding words.
  - CBOW: given the surrounding words, predict the center word.
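To make the difference concrete, the sketch below lists the (input, target) training pairs each model would derive from one sequence, treating a user's clicked articles in time order as a "sentence" (an assumption that matches training W2V on click data). It only illustrates the prediction direction; real Word2Vec training adds negative sampling or hierarchical softmax and frequent-word subsampling on top of this.

```python
from typing import List, Tuple


def skipgram_pairs(seq: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: the center token predicts each token within `window` of it."""
    pairs = []
    for i, center in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, seq[j]))  # (input, target)
    return pairs


def cbow_pairs(seq: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding tokens together predict the center token."""
    pairs = []
    for i, center in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        context = [seq[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))  # (inputs, target)
    return pairs


clicks = ["a1", "a2", "a3"]  # one user's clicked articles in time order
print(skipgram_pairs(clicks, window=1))
# [('a1', 'a2'), ('a2', 'a1'), ('a2', 'a3'), ('a3', 'a2')]
print(cbow_pairs(clicks, window=1))
# [(['a2'], 'a1'), (['a1', 'a3'], 'a2'), (['a2'], 'a3')]
```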
When training Word2Vec with gensim, several parameters are important:
- size: the dimension of the word vectors (renamed vector_size in gensim 4.0).
- window: the maximum distance between the target word and the context words it is related to.
- sg: 0 selects the CBOW model, 1 selects the Skip-Gram model.
- workers: the number of worker threads used during training.
- min_count: the minimum frequency; words that occur fewer times than this are ignored.
- iter: the number of passes over the whole dataset during training (renamed epochs in gensim 4.0).