Data Warehouse Event Tracking System and Attribution Practice
2022-08-03 07:33:00 【Zhuojiu South Street】
Introduction: Traffic is king today, and for an e-commerce business like Yanxuan, building out traffic data is especially important. Traffic data is harder to build than business data: the source is semi-structured, it has no built-in notion of analytical dimensions, and it is mixed, dirty, and messy, which makes testing, integration, and governance more difficult. This article walks through the whole traffic data pipeline.
1. Building the event tracking system
Building the event tracking system is the core step in building a traffic data warehouse. Traffic data comes mainly from tracking events, so the quality of the tracking system directly determines the quality of the traffic data. That in turn determines the data quality of the applications built on it and how much the business trusts the data.
1.1 Types of event tracking
Technically, event tracking is generally divided into front-end tracking and back-end tracking:
Front-end tracking
The collection SDK is integrated into the client. There are three main kinds: code-based tracking, visual tracking, and codeless tracking.
Pros: convenient and flexible. It can easily collect the user's behavior on the interface, such as which resource slot a user clicked.
Cons: it depends on the client environment. Collected data is generally compressed and buffered to reduce mobile data usage. Apart from a few important events that are reported in real time regardless of the network, other events are usually reported only over Wi-Fi, so data can be delayed or lost.
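The reporting policy described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual logic: the class `EventBuffer`, the event names in `REALTIME_EVENTS`, and the `on_wifi` flag are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical names for "important" events; the article does not list them.
REALTIME_EVENTS = {"pay_success", "order_submit"}


@dataclass
class EventBuffer:
    """Buffers tracking events on the client and decides when to upload."""

    pending: list = field(default_factory=list)

    def track(self, name: str, on_wifi: bool) -> list:
        """Record one event and return the batch to upload now (may be empty)."""
        event = {"name": name}
        if name in REALTIME_EVENTS:
            # Important events bypass the buffer regardless of network type.
            return [event]
        self.pending.append(event)
        if on_wifi:
            # On Wi-Fi, flush everything buffered so far as one batch.
            batch, self.pending = self.pending, []
            return batch
        # On mobile data, keep buffering: these events arrive late, and
        # are lost if the buffer is never flushed.
        return []
```

The last branch is where the delay and loss mentioned above come from: anything still sitting in `pending` has not reached the server yet.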
Aspect | Code-based tracking | Visual tracking |
---|---|---|
Concept | Tracking developers must write the tracking code into the application | Works on much the same principle as codeless tracking. Product or operations staff can configure tracking points on a management platform as needed; the SDK then periodically identifies the tracked controls and collects the tracking data, with no involvement from tracking developers. |
Pros | Highly customizable, precisely controlled, rich and accurate data | Low implementation cost |
Cons | High implementation cost | Limited to interaction scenarios; narrow coverage |
Back-end tracking
The collection SDK is integrated on the server. This is what we usually call back-end logs, such as login logs.
Pros: data is sent over the network directly from the server, so transmission is close to real time and the risk of data loss is small.
Cons: less data is collected, user interface behavior cannot be captured, and the data contains more crawler traffic.
Yanxuan's e-commerce business is complex and faces many analysis scenarios that call for fine-grained operations, so it combines code-based tracking with xpath-based codeless tracking. A tracking event is defined as the entity (SPM + SCM + ACTION): SPM is the page position information, SCM carries the business parameters (for example creative material or A/B grouping data), and ACTION refers to the set of user actions.
SPM: semantic naming
Page position information uses unified English definitions, implemented through association in the tracking platform's backend.
SCM: unification
The back end passes the business parameters through uniformly as JSON:
{
"extra": {
"k1": v1,
"k2": v2,
"k3": v3
}
}
ACTION: standardization
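To make the (SPM + SCM + ACTION) entity concrete, here is a small sketch of how one event might be assembled. The `Action` values, the field names `spm`/`scm`/`action`, and the function `build_event` are hypothetical; the article only specifies that SCM business parameters travel under an `extra` JSON object and that actions are standardized.

```python
import json
from enum import Enum


class Action(Enum):
    # Hypothetical action vocabulary; the article does not enumerate it.
    VIEW = "view"
    CLICK = "click"
    EXPOSE = "expose"


def build_event(spm: str, scm: dict, action: Action) -> str:
    """Serialize one tracking event as (SPM + SCM + ACTION)."""
    if not isinstance(action, Action):
        # Reject free-form action strings so the vocabulary stays standard.
        raise ValueError(f"non-standard action: {action!r}")
    event = {
        "spm": spm,              # page position, semantic English name
        "scm": {"extra": scm},   # business parameters passed through as JSON
        "action": action.value,  # one value from the standard vocabulary
    }
    return json.dumps(event, sort_keys=True)
```

Restricting `action` to an enum is one simple way to enforce a standard vocabulary at the point where events are created, rather than cleaning up free-form strings later in the warehouse.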
1.2 Development process and safeguards
The rough development process for tracking events:
2. Data warehouse construction
2.1 Business architecture diagram
2.2 Data warehouse architecture diagram
2.3 Fact table construction
The general approach to building a fact table is still: choose the business process -> choose the grain -> determine the dimensions -> determine the facts -> add redundant dimensions.
When building traffic fact tables, many people believe there is no business process. That is not the case. It is just less obvious and harder to split than in transactions, which run from order placement -> payment -> completion. The business process here is the series of actions the user takes, and that data is all mixed together in the tracking events.
Building the fact tables themselves is relatively simple: split them by subject domain according to the business analysis scenarios, such as campaigns and search. Each subject-domain fact table gathers the tracking data related to that domain and carries some redundant dimensions.
2.4 Dimension table construction
Dimension tables are a very important part of the traffic data warehouse. Their data comes from two main sources: business tables and tracking detail data. The former is simple to design: the data is basically synced directly from the business tables, and since business-table volumes are small, at this stage they are built as periodic snapshot tables to keep implementation and maintenance costs low. Visit attributes of a user or device are usually derived from the fact tables, such as a device's first and last visit times.
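Deriving visit attributes from a fact table can be sketched as a single pass over the rows. The function name `device_visit_dim` and the `(device_id, visit_time)` row shape are assumptions for illustration; a real warehouse would do this in SQL over the detail table.

```python
def device_visit_dim(fact_rows):
    """Derive first and last visit time per device from fact-table rows.

    Each row is a (device_id, visit_time) tuple. visit_time can be any
    comparable value, for example a 'YYYY-MM-DD HH:MM:SS' string.
    """
    dim = {}
    for device_id, visit_time in fact_rows:
        # Start from this row's time if the device has not been seen yet.
        first, last = dim.get(device_id, (visit_time, visit_time))
        dim[device_id] = (min(first, visit_time), max(last, visit_time))
    return dim
```

Because the result depends only on the minimum and maximum, rows can arrive in any order, which matters for tracking data that is reported late.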
2.5 dws table construction
The dws layer mainly pre-computes common grains and metrics so they are pushed down once. Some readers may ask: why build a separate uuid-grain dws model when finer-grained models already exist? Can't we just aggregate from the models that are finer than uuid?
There are two main reasons. One is to reduce computation cost; the other is to keep the dws tables extensible and cheap to use. Models at different cross grains contain metrics that are unique to them: for example, the uuid × item cross grain has metrics that exist only there, such as item visits. The uuid-grain model is itself aggregated from several finer-grained models, so metric definitions stay consistent.
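The relationship between the two grains can be illustrated with a small roll-up. The row shape `(uuid, item_id, pv)` and the metric names `item_pv` and `items_visited` are hypothetical; the point is only that the uuid-grain numbers are computed from the finer uuid × item rows, so both grains agree by construction.

```python
from collections import defaultdict


def rollup_to_uuid(uuid_item_rows):
    """Aggregate uuid x item rows (uuid, item_id, pv) up to uuid grain."""
    totals = defaultdict(lambda: {"item_pv": 0, "items_visited": 0})
    seen = set()
    for uuid, item_id, pv in uuid_item_rows:
        totals[uuid]["item_pv"] += pv
        if (uuid, item_id) not in seen:
            # Count each item once per uuid, even if it appears in many rows.
            seen.add((uuid, item_id))
            totals[uuid]["items_visited"] += 1
    return dict(totals)
```

Persisting the uuid-grain result means downstream queries do not have to repeat this aggregation, which is the computation-cost argument made above.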
In model design, the main principles are: high cohesion and low coupling, a balance between cost and performance, and data consistency.
3. uuid and attribution construction
3.1 uuid construction
To solve the many-to-many relationship between accounts and devices, and the resulting problem that a user's identity cannot be uniquely determined, we built the i_code scheme (the resulting dimension is collectively called uuid).
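The article does not describe how i_code works internally. As an illustration of the underlying problem only, one common way to collapse a many-to-many account/device graph into single identifiers is union-find over login observations. The sketch below uses that swapped-in technique; the function `assign_uuids` and the id format are assumptions, not the i_code scheme.

```python
def assign_uuids(pairs):
    """Group linked accounts and devices, one identifier per group.

    pairs: iterable of (account_id, device_id) observations, ids as strings.
    Returns {node: uuid}, where nodes are ("acct", id) or ("dev", id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for account_id, device_id in pairs:
        a, d = find(("acct", account_id)), find(("dev", device_id))
        if a != d:
            # Always keep the smaller node as root so results are stable.
            parent[max(a, d)] = min(a, d)

    return {node: "{}:{}".format(*find(node)) for node in list(parent)}
```

A known weakness of plain union-find here is over-merging: one shared device can chain unrelated accounts together, so a production scheme would need extra rules on top of it.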
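3.2 Attribution construction
Attribution is a very important part of building traffic assets. It currently falls into three main parts: the user-reach attribution system, the inbound channel attribution system, and the on-site shopping-guide attribution system.
I am responsible for the on-site shopping-guide attribution system, so this section focuses on how that system has been built and iterated.
User paths are often messy and disordered. How do we trace a user's behavior chain through that disorder and attribute their transactions? This is where the page shopping-guide system comes in. By tracking user behavior, it attributes each transaction to its source, taking our understanding of users from "blind men feeling an elephant" to transactions and paths that can be traced.
The industry uses many attribution methods, such as weighted, even-split, last-touch, and time-decay attribution; they are not covered in detail here.
Given the characteristics of Yanxuan's e-commerce business, Yanxuan ultimately uses two attribution methods: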
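Last-touch single-point attribution
Last-touch multi-point attribution
The reasoning is as follows. Some pages in e-commerce are naturally pages that every user passes through, such as the home page or the search page. These pages act as entry pages that direct users around the site. The entry-page concept can be pictured as the doors of a shopping mall: a mall usually has several doors to bring customers in, and entry pages take on the job of distributing traffic within the site. If multi-point attribution were used here, then because most users on the site have to pass through these "doors", over-counting would attribute too many sales to the entry pages. So for entry pages we use last-touch single-point attribution. This both gives entry pages a reasonable share of attribution and draws a clean line between the transactions of each entry, solving two problems at once.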
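The over-counting argument can be seen in a small comparison. This is a simplified sketch under stated assumptions: the page names in `ENTRY_PAGES` are hypothetical, `multi_point` is taken to mean "credit every distinct entry page on the path", and neither function is Yanxuan's actual attribution logic.

```python
# Hypothetical page names; which pages count as entry pages is an assumption.
ENTRY_PAGES = {"home", "search"}


def last_single_point(path, entry_pages=ENTRY_PAGES):
    """Credit only the most recent entry page visited before the order."""
    for page in reversed(path):
        if page in entry_pages:
            return [page]
    return []


def multi_point(path, entry_pages=ENTRY_PAGES):
    """Credit every distinct entry page on the path, in first-seen order."""
    credited = []
    for page in path:
        if page in entry_pages and page not in credited:
            credited.append(page)
    return credited
```

For a path such as home, search, item detail, home, cart, the multi-point version credits both home and search, while the single-point version credits only the last entry page, so each order counts toward exactly one entry.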
To achieve accurate attribution, the shopping-guide attribution build went through three main stages:
The solution at each stage builds on the event tracking infrastructure:
The early solution required a lot of manual intervention. For every new page, the page level had to be defined by hand, so maintenance cost was high. It also relied on time-ordered behavior links, so accuracy depended entirely on the client's clock;
The mid-stage solution fixed the problems left over from the early one by relying on a step sequence reported by client-side tracking, but it could only be used on offline data;
The current solution can be applied to real-time data. Each tracking event passes through the behavior link of the previous 10 steps, so obtaining the link no longer requires joins.
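The mid-stage and current-stage ideas can be sketched together on the client side. The class `StepTracker` and the field names `step_sequence` and `prev_steps` are hypothetical; the article only states that the client reports a step sequence and that each event carries the previous 10 steps.

```python
from collections import deque


class StepTracker:
    """Client-side tracker that attaches recent steps to each event."""

    def __init__(self, max_steps: int = 10):
        # A bounded deque drops the oldest step once max_steps is reached.
        self._recent = deque(maxlen=max_steps)
        self._sequence = 0

    def track(self, spm: str) -> dict:
        self._sequence += 1
        event = {
            "spm": spm,
            # Mid-stage idea: ordering comes from a counter, not the clock.
            "step_sequence": self._sequence,
            # Current-stage idea: the behavior link travels with the event.
            "prev_steps": list(self._recent),
        }
        self._recent.append(spm)
        return event
```

Because every event already contains its own recent path, a real-time consumer can attribute an order event by reading `prev_steps` directly instead of joining it against earlier events.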
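4. Data applications
At the general tooling level, traffic data is mainly used in: DSP (ad delivery platform), DMP (user tags, user profiles), the A/B experiment platform, the user bus service, BI reports, and data products (including the user behavior analysis system, the marketing data operations platform, and so on);
At the business level, application scenarios include: ad delivery, user acquisition and activation, intelligent marketing, traffic horse racing, search and recommendation, and more.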
5. Future outlook
Yanxuan's traffic data warehouse is becoming mature and stable. Future work lies mainly in three directions: automated warehouse building, completing the dws layer, and upgrading the mart-layer models.
Automated warehouse building is in progress: the ods layer is done and will replace the original mid layer. For ods, simple operator configuration on the platform is enough to land data in the warehouse automatically. Automated building of the mart layer is under way;
The dws layer is not yet rich enough. The mart layer currently contains some duplicated metric-processing logic, which is being gradually pushed down into dws;
The mart-layer model upgrade relies more on the computing power of the OLAP engine. Doris will be introduced later, and its materialized view capability will be used to reduce the number of mart-layer models.