A hands-on CTR (click-through rate) prediction project for advertising recommendation
2022-07-28 18:52:00 【Datawhale】
Datawhale practical content
Author: Fishman, master's graduate of Wuhan University, Datawhale member
Compared with traditional media advertising in newspapers, magazines, and on television, Internet advertising has a natural advantage: it can track and study user preferences, and use them to deliver precisely targeted ad recommendations and marketing.
CTR (click-through rate) is an important metric for measuring the effectiveness of Internet advertising, and CTR prediction has become a hot research topic for major platforms in recent years. This article studies the problem using a task from the Huawei Global Campus AI Algorithm Elite Competition: cross-domain CTR prediction for advertising and information feeds.

Background
Competition background
Ad recommendation is mainly modeled on users' historical behavior toward ads, such as exposures and clicks. If only advertising-domain data is used, user behavior data is sparse and the behavior types are limited. Introducing cross-domain data from the same media makes it possible to obtain the behavior of the same advertising users in other domains, mine user interests more deeply, and enrich user behavior features. Introducing behavior data for advertising users from other media can likewise enrich user and ad features.
Competition task
The task is to improve ad CTR prediction accuracy using ad log data, basic user information, and cross-domain data. The target domain is the advertising domain and the source domain is the information feed recommendation domain. Behavior data such as feed exposures and clicks from the source domain is used to model user interests and help estimate CTR accurately in the advertising domain.
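To make the cross-domain idea concrete, here is a minimal sketch of one possible way to carry source-domain behavior into the target domain: summarize each user's feed activity and join it onto the ad logs. The two small tables and the `feed_click`, `feed_exposures`, and `feed_ctr` column names are made up for illustration; the real competition files may use different columns.

```python
import pandas as pd

# Toy stand-ins for the two domains; all column names and values are made up.
ads = pd.DataFrame({
    "log_id": [1, 2, 3, 4],
    "user_id": [10, 10, 11, 12],
    "label": [0, 1, 0, 0],
})
feeds = pd.DataFrame({
    "user_id": [10, 10, 10, 11],
    "feed_click": [1, 0, 1, 0],   # 1 = the user clicked the feed item
})

# Summarize each user's source-domain behavior: exposure count and click rate.
feed_stats = (
    feeds.groupby("user_id")["feed_click"]
    .agg(feed_exposures="count", feed_ctr="mean")
    .reset_index()
)

# A left join keeps every ad log row; users with no feed history get zeros.
ads_enriched = ads.merge(feed_stats, on="user_id", how="left")
ads_enriched[["feed_exposures", "feed_ctr"]] = (
    ads_enriched[["feed_exposures", "feed_ctr"]].fillna(0)
)
print(ads_enriched)
```

Per-user aggregates like these are only one option; the same join pattern would apply to any other feature derived from the feed logs.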
Registration and data download
Registration link:
https://developer.huawei.com/consumer/cn/activity/digixActivity/digixdetail/101655281685926449?ha_source=dw&ha_sourceId=89000243
Data download (for reference by readers who have not registered for the competition):
https://xj15uxcopw.feishu.cn/docx/doxcnufyNTvUfpU57sRyydgyK6c
Approach
This is a classic click-through rate (CTR) prediction data mining competition. The task is to build a model that predicts, for each record in the test data, whether the user will click on the ad. This is a typical binary classification problem with labels 0 and 1 (clicked: 1, not clicked: 0); the practice code below submits the predicted click probability (pctr) for each record.
For classification tasks in machine learning, logistic regression, decision trees, and similar algorithms usually come to mind. In this practice code we use logistic regression to build the model. When solving a machine learning problem we generally follow a standard process, and the practice code is organized into the same kind of stages: data exploration, dataset sampling, model training, and result output.
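As a small aside before the full script, the sketch below shows what logistic regression computes. It uses synthetic data (not the competition data): the model scores each row with a linear function, passes the score through the sigmoid to get a click probability, and `predict` turns that probability into a 0/1 label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: 200 samples, 3 numeric features, clicks driven mostly by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Linear score per row, then the sigmoid squashes it into (0, 1).
scores = X @ clf.coef_[0] + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-scores))

proba = clf.predict_proba(X)[:, 1]   # probability of class 1 (click)
hard = clf.predict(X)                # hard 0/1 labels

# The hand-computed probabilities agree with predict_proba.
print(np.allclose(manual_proba, proba))
print(proba[:3], hard[:3])
```

For a CTR submission the probability column is the useful output, which is why the script takes column 1 of `predict_proba` rather than calling `predict`.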
Practice code
Memory required: 1 GB
Running time: 5 minutes
# Install the dependencies. On Windows, run pip install in the cmd command prompt; see the environment setup above
#!pip install scikit-learn
#!pip install pandas
#---------------------------------------------------
# Import library
import pandas as pd
#---------------- Data exploration ----------------
# Use only target domain user behavior data
train_ads = pd.read_csv('./train/train_data_ads.csv',
    usecols=['log_id', 'label', 'user_id', 'age', 'gender', 'residence', 'device_name',
             'device_size', 'net_type', 'task_id', 'adv_id', 'creat_type_cd'])
test_ads = pd.read_csv('./test/test_data_ads.csv',
    usecols=['log_id', 'user_id', 'age', 'gender', 'residence', 'device_name',
             'device_size', 'net_type', 'task_id', 'adv_id', 'creat_type_cd'])
#---------------- Data set sampling ----------------
train_ads = pd.concat([
    train_ads[train_ads['label'] == 0].sample(70000),
    train_ads[train_ads['label'] == 1].sample(10000),
])
#---------------- model training ----------------
# Train a logistic regression model
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(
    train_ads.drop(['log_id', 'label', 'user_id'], axis=1),
    train_ads['label']
)
#---------------- Results output ----------------
# Predict with the model and write the result file
test_ads['pctr'] = clf.predict_proba(
    test_ads.drop(['log_id', 'user_id'], axis=1),
)[:, 1]
test_ads[['log_id', 'pctr']].to_csv('submission.csv', index=False)
Further improvements
We have now completed a practical baseline for cross-domain ad and information feed CTR prediction. Next, consider the following directions:
Keep trying different prediction models or feature engineering to improve prediction accuracy
Try strategies such as model fusion
Read up on material related to cross-domain ad and information feed CTR prediction to find other ways of building the model
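The directions above can be tried out with a sketch like the following. It runs on synthetic data rather than the competition files, and every choice in it is an assumption made for illustration: the made-up click rule, AUC on a hold-out split as the yardstick (the official metric is not stated here), one-hot encoding as the feature engineering step, gradient boosting as the second model, and a plain average of the two probabilities as the fusion strategy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the ad logs: three ID-like categorical columns.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(1, 8, size=n),
    "gender": rng.integers(1, 4, size=n),
    "net_type": rng.integers(1, 6, size=n),
})
# Made-up click rule so that there is some signal to learn.
click_prob = (
    0.05
    + 0.25 * (df["net_type"].to_numpy() == 3)
    + 0.15 * (df["age"].to_numpy() == 5)
)
df["label"] = (rng.random(n) < click_prob).astype(int)

features = ["age", "gender", "net_type"]
X_train, X_valid, y_train, y_valid = train_test_split(
    df[features], df["label"], test_size=0.25,
    stratify=df["label"], random_state=0,
)

# One-hot encode the IDs so the linear model does not treat them as ordered numbers.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), features)]
)
lr = make_pipeline(encoder, LogisticRegression(max_iter=1000))
gbdt = GradientBoostingClassifier(random_state=0)

lr.fit(X_train, y_train)
gbdt.fit(X_train, y_train)

p_lr = lr.predict_proba(X_valid)[:, 1]
p_gbdt = gbdt.predict_proba(X_valid)[:, 1]
p_blend = (p_lr + p_gbdt) / 2  # simplest model fusion: average the probabilities

# Compare the models on the hold-out split before spending a submission.
for name, p in [("lr", p_lr), ("gbdt", p_gbdt), ("blend", p_blend)]:
    print(name, round(roc_auc_score(y_valid, p), 4))
```

Scoring each change on a hold-out split first makes it easier to tell whether a new feature or model actually helps.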
Join the beta test
This article is part of the Datawhale Project Practice 2.0 course. If you are also a student and still at the introductory stage, you can join the beta-test study group, where we improve the tutorial together based on learner feedback.

