Introduction to Tianchi News Recommendation: 4. Feature Engineering
2022-07-04 01:20:00 【Programmer base camp】
Preface
Feature engineering means constructing features and labels so that the problem becomes a supervised learning task.
- Features that can be used directly:
- Article attributes: category_id is the article's category; created_at_ts is the time the article was created, which relates to its timeliness; words_count is the article's word count. Users generally tend not to click on very long articles, although some readers do prefer them.
- Article content embedding features: these were already used during recall. Using them here is optional, and other kinds of embeddings, such as W2V, can be tried as well.
- The user's device information.
- Constructing the supervised dataset: the recall stage yields a dictionary of the form {user_id: [list of articles the user might click]}. For each user, every recalled article becomes one candidate row. For example, if the recall list for user1 is {user1: [item1, item2, item3]}, we get three rows: (user1, item1), (user1, item2), (user1, item3). These form the first two columns of the supervised test set.
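The expansion from a recall dictionary into (user, item) rows can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the function name build_samples, the row keys, and the labeling rule (label 1 when the candidate is the user's actual last click) are assumptions made for the example.

```python
def build_samples(recall_dict, last_click=None):
    """Expand {user_id: [candidate items]} into one row per (user, item) pair.

    If last_click ({user_id: item_id}) is given, each row also gets a 0/1
    label marking whether the candidate is the article the user clicked last.
    This labeling rule is an assumption made for illustration.
    """
    rows = []
    for user_id, candidates in recall_dict.items():
        for item_id in candidates:
            row = {"user_id": user_id, "item_id": item_id}
            if last_click is not None:
                row["label"] = int(last_click.get(user_id) == item_id)
            rows.append(row)
    return rows


recall = {"user1": ["item1", "item2", "item3"]}

# Without labels: the two-column rows described above.
test_rows = build_samples(recall)

# With a known last click: the same rows plus a label column.
train_rows = build_samples(recall, last_click={"user1": "item2"})
```

Feature columns are then appended to each row, keyed by the user_id and item_id pair.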
The idea behind feature construction is as follows. Whether a user clicks an article is closely tied to the articles they clicked before, for example articles on the same topic or with similar content. An important family of features therefore combines the candidate article with the user's click history. At this point we have a two-column dataset of users and candidate articles, and the goal is to predict each user's last click. A natural approach is to relate each candidate to the user's last few clicks: this takes the click history into account while staying close to the final click, which matters because timeliness is one of the defining properties of news, and a user's last click is often strongly related to the few clicks just before it. For each candidate article we can therefore build the following features from the last few clicks:
- Similarity between the candidate item and each of the last few clicked articles (inner product of embeddings). This ties the candidate directly to the user's historical behavior.
- Summary statistics of those similarity values. Statistical features damp fluctuations and outliers.
- Word-count difference between the candidate item and each of the last few clicked articles. This reflects the user's preference for article length.
- Creation-time difference between the candidate item and each of the last few clicked articles. This reflects the user's preference for fresh articles.
- Also worth considering: if YouTube recall is used, we can additionally build a similarity feature between the user and the candidate item.
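The features relating a candidate to the last few clicks can be sketched in plain Python as below. The dictionary keys ("emb", "words_count", "created_at_ts"), the function name history_features, the choice of max/min/mean as the statistics, and the use of absolute differences are all assumptions for illustration; the source only names the feature families.

```python
from statistics import mean


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def history_features(candidate, history, last_n=3):
    """Compare one candidate article with the user's last_n clicked articles.

    candidate and every history entry are dicts with the hypothetical keys
    "emb" (embedding vector), "words_count" and "created_at_ts".
    history is ordered from oldest to newest click.
    """
    recent = history[-last_n:][::-1]  # most recent click first
    feats = {}
    sims = []
    for i, clicked in enumerate(recent, start=1):
        sim = dot(candidate["emb"], clicked["emb"])
        sims.append(sim)
        feats[f"sim_{i}"] = sim
        feats[f"words_diff_{i}"] = abs(candidate["words_count"] - clicked["words_count"])
        feats[f"time_diff_{i}"] = abs(candidate["created_at_ts"] - clicked["created_at_ts"])
    if sims:  # summary statistics only exist when there is some history
        feats["sim_max"] = max(sims)
        feats["sim_min"] = min(sims)
        feats["sim_mean"] = mean(sims)
    return feats


# Toy data: two historical clicks and one candidate.
history = [
    {"emb": [1.0, 0.0], "words_count": 200, "created_at_ts": 1000},
    {"emb": [0.0, 1.0], "words_count": 300, "created_at_ts": 2000},
]
candidate = {"emb": [0.5, 0.25], "words_count": 250, "created_at_ts": 2600}

features = history_features(candidate, history)
```

Indexing the features by recency (sim_1 is the most recent click) keeps the column layout identical across users, so users with shorter histories simply have fewer filled columns.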
Word2Vec: the main idea is that a word's context expresses its meaning well. It is a method for learning word vectors through unsupervised learning. word2vec has two classic models, skip-gram and cbow.
skip-gram: given the center word, predict the surrounding words.
cbow: given the surrounding words, predict the center word.
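The difference between the two models shows up in how training examples are formed from a sentence. The sketch below only generates the (input, target) pairs; it does not train anything, and the function names and the toy sentence are made up for the example.

```python
def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: the center word is the input, each neighbor a target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """(context, center) pairs: the neighbors are the input, the center word the target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


sentence = ["news", "about", "sports"]
sg_examples = skipgram_pairs(sentence)
cbow_examples = cbow_pairs(sentence)
```

With window=1, skip-gram yields one example per neighbor of each word, while cbow yields one example per word with all of its neighbors grouped as the input.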
When training word2vec with gensim, several parameters matter:
- size: the dimensionality of the word vectors (renamed vector_size in gensim 4).
- window: the maximum distance between the target word and the context words considered for it.
- sg: 0 selects the CBOW model, 1 selects the Skip-Gram model.
- workers: the number of worker threads used during training.
- min_count: the minimum word frequency; words that occur fewer times are ignored.
- iter: the number of passes over the entire dataset during training (renamed epochs in gensim 4).
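A hedged sketch of how these parameters might be passed to gensim follows. Treating each user's click sequence as a "sentence" and each article id as a "word" is an assumption made for this example, as are the helper name w2v_params, the toy click_sequences, and the chosen parameter values. The helper maps the names listed above to either the gensim 3 keywords (size, iter) or the gensim 4 keywords (vector_size, epochs); training only runs if gensim is installed.

```python
try:
    import gensim
    from gensim.models import Word2Vec
except ImportError:  # gensim not installed; the helper below still works
    gensim = None
    Word2Vec = None


def w2v_params(dim=16, window=5, sg=1, workers=1, min_count=1, epochs=5, gensim_major=4):
    """Build Word2Vec keyword arguments using the names of the given gensim major version."""
    params = {"window": window, "sg": sg, "workers": workers, "min_count": min_count}
    if gensim_major >= 4:
        params["vector_size"] = dim
        params["epochs"] = epochs
    else:
        params["size"] = dim
        params["iter"] = epochs
    return params


# Hypothetical click sequences: one list of article ids (as strings) per user.
click_sequences = [["a1", "a2", "a3"], ["a2", "a3", "a4"], ["a1", "a3", "a4"]]

if Word2Vec is not None:
    major = int(gensim.__version__.split(".")[0])
    model = Word2Vec(sentences=click_sequences, seed=42, **w2v_params(gensim_major=major))
    vector_a3 = model.wv["a3"]  # learned embedding for article "a3"
```

min_count is set to 1 here only because the toy corpus is tiny; on real data a higher threshold drops rare ids.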
Detailed tutorial and code