CTR (click-through rate) prediction: a hands-on advertising recommendation project
2022-07-28 18:52:00 【Datawhale】
Datawhale hands-on content
Author: Fishman, master's degree, Wuhan University; Datawhale member
Compared with traditional media advertising in newspapers, magazines, and television, Internet advertising has a natural advantage: it can track and learn user preferences, and use them to deliver precisely targeted ad recommendations and marketing.
CTR (click-through rate) is an important metric for measuring the effectiveness of Internet advertising, and predicting it has become a hot research topic at major platforms in recent years. In this article we study the problem through a task from the Huawei Global Campus AI Algorithm Elite Competition: cross-domain CTR prediction for ads and information feeds.
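As a formula, CTR is the number of clicks divided by the number of impressions (exposures). A minimal pandas sketch, assuming an impression log where each row is one ad exposure and `label` is 1 if that exposure was clicked (the same convention as the `label` column in the competition's training file). The helper name `ctr_by_group` and the toy data are made up for illustration:

```python
import pandas as pd


def ctr_by_group(log: pd.DataFrame, key: str) -> pd.Series:
    """Return clicks / impressions for each value of `key`.

    Assumes each row of `log` is one impression and `label` is 1 for a click, 0 otherwise.
    """
    # The mean of a 0/1 column equals (number of clicks) / (number of impressions)
    return log.groupby(key)["label"].mean()


# Toy impression log (made-up data, not from the competition)
log = pd.DataFrame({
    "adv_id": [1, 1, 1, 1, 2, 2],
    "label": [1, 0, 0, 0, 1, 1],
})

# Overall CTR: total clicks divided by total impressions
overall_ctr = log["label"].sum() / len(log)
print(overall_ctr)
print(ctr_by_group(log, "adv_id"))
```

Here ad 1 has one click in four impressions and ad 2 has two clicks in two impressions, so per-ad CTR can differ widely from the overall rate.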

Background
Competition background
Ad recommendation models are mainly built on users' historical behavior toward ads, such as exposures (impressions) and clicks. With ad-domain data alone, user behavior is sparse and covers only a few behavior types. Introducing cross-domain data from the same media platform provides the same users' behavior in other domains, which helps uncover user interests and enriches user behavior features. Bringing in ad-user behavior data from other media can also enrich both user and ad features.
Competition task
This competition aims to improve the accuracy of ad CTR prediction using ad log data, basic user information, and cross-domain data. The target domain is the advertising domain and the source domain is the information-feed recommendation domain: users' behavior in the feed domain, such as exposures and clicks on feed items, is used to model their interests and help estimate CTR accurately in the ad domain.
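A minimal sketch of one way source-domain behavior could be turned into user features for the ad domain: aggregate each user's feed exposures and clicks, then left-join the result onto the ad log. The table layout (`user_id`, `clicked`) and the function name `add_feed_features` are hypothetical simplifications; the real feed file in the competition data has its own column names, so this is an illustration of the idea rather than a ready-to-use pipeline.

```python
import pandas as pd


def add_feed_features(ads: pd.DataFrame, feeds: pd.DataFrame) -> pd.DataFrame:
    """Attach per-user feed-domain statistics to the ad log.

    Hypothetical layout: both tables share a `user_id` column, and each row of
    `feeds` is one feed exposure with a 0/1 `clicked` column.
    """
    stats = (
        feeds.groupby("user_id")["clicked"]
        .agg(feed_exposures="count", feed_clicks="sum")
        .reset_index()
    )
    # Per-user CTR in the source (feed) domain, a rough signal of user engagement
    stats["feed_ctr"] = stats["feed_clicks"] / stats["feed_exposures"]
    # Left join keeps every ad-log row, even for users with no feed history
    merged = ads.merge(stats, on="user_id", how="left")
    cols = ["feed_exposures", "feed_clicks", "feed_ctr"]
    # Users without feed history get 0 instead of NaN
    merged[cols] = merged[cols].fillna(0)
    return merged


# Toy example (made-up data): user 12 never appears in the feed domain
ads = pd.DataFrame({"log_id": [1, 2, 3], "user_id": [10, 11, 12]})
feeds = pd.DataFrame({"user_id": [10, 10, 10, 11], "clicked": [1, 0, 0, 1]})
result = add_feed_features(ads, feeds)
print(result)
```

Filling missing users with 0 is only one choice; a separate "has feed history" flag is another option when cold-start users behave differently.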
Registration and data download
Registration:
https://developer.huawei.com/consumer/cn/activity/digixActivity/digixdetail/101655281685926449?ha_source=dw&ha_sourceId=89000243
Data download (for readers who did not take part in the competition):
https://xj15uxcopw.feishu.cn/docx/doxcnufyNTvUfpU57sRyydgyK6c
Approach
This is a classic click-through rate (CTR) prediction data-mining competition. The task is to build a model that predicts, from the test data, whether a user will click an ad. It is a typical binary classification problem: the target label is 0 or 1 (clicked: 1, not clicked: 0).
For classification tasks in machine learning, the usual first choices are algorithms such as logistic regression and decision trees; in this practice code we build our model with logistic regression. Machine learning problems are generally solved with a standard workflow, and the practice code follows the same stages: data exploration, dataset sampling, model training, and result output.
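Logistic regression scores a sample with a linear function z = w·x + b and maps that score to a click probability with the sigmoid σ(z) = 1 / (1 + e^(-z)). The sketch below checks this on synthetic data: for a binary scikit-learn `LogisticRegression`, `predict_proba(X)[:, 1]` should match the sigmoid applied to `coef_` and `intercept_`. The data and the hand-written `sigmoid` helper are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic features and 0/1 labels generated from a known linear rule plus noise
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Linear score z = w.x + b for every sample, using the learned weights
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = clf.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```

Because the output is a monotone function of a linear score, each feature shifts the log-odds of a click by a fixed amount per unit, which is what makes the model easy to interpret but also limits it to roughly linear effects.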
Practice code
Memory required: 1 GB
Run time: about 5 minutes
# Install dependencies (on Windows, run the pip install commands in a cmd window)
#!pip install scikit-learn
#!pip install pandas
#---------------------------------------------------
# Import libraries
import pandas as pd
#---------------- Data exploration ----------------
# Use only the target-domain (ads) data; the source-domain feed data is not used in this baseline
train_ads = pd.read_csv('./train/train_data_ads.csv',
usecols=['log_id', 'label', 'user_id', 'age', 'gender', 'residence', 'device_name',
'device_size', 'net_type', 'task_id', 'adv_id', 'creat_type_cd'])
test_ads = pd.read_csv('./test/test_data_ads.csv',
usecols=['log_id', 'user_id', 'age', 'gender', 'residence', 'device_name',
'device_size', 'net_type', 'task_id', 'adv_id', 'creat_type_cd'])
#---------------- Dataset sampling ----------------
# Randomly sample 70,000 non-clicks (label 0) and 10,000 clicks (label 1) for training
train_ads = pd.concat([
train_ads[train_ads['label'] == 0].sample(70000),
train_ads[train_ads['label'] == 1].sample(10000),
])
#---------------- Model training ----------------
# Train a logistic regression model; ID columns and the label are excluded from the features
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(
train_ads.drop(['log_id', 'label', 'user_id'], axis=1),
train_ads['label']
)
#---------------- Result output ----------------
# Predict click probabilities (pctr) on the test set and write the submission file
test_ads['pctr'] = clf.predict_proba(
test_ads.drop(['log_id', 'user_id'], axis=1),
)[:, 1]
test_ads[['log_id', 'pctr']].to_csv('submission.csv', index=False)
Further improvements
We have completed the baseline for cross-domain ad and information-feed CTR prediction. Next, consider the following directions:
Try different prediction models or feature engineering to improve prediction accuracy
Try strategies such as model fusion
Read related material on cross-domain ad and feed CTR prediction to learn other modeling approaches
Join the internal test
This article is part of Datawhale's Project Practice 2.0 course. If you are a student still at the introductory stage, you can join the internal-test study group and help us improve the tutorial through your learning feedback.