Data warehouse data processing and data flow
2022-06-13 07:38:00 [_seven seven]
1. Data flow
Operational data layer (ODS): data collection
ODS (Operational Data Store) is the data staging area, also known as the source-aligned layer.
The data tables of the warehouse's source systems are usually stored here unchanged. This is called the ODS layer, and it is the source from which all subsequent warehouse processing reads.
The purpose of ODS is to preserve the most complete record of the original data, which makes troubleshooting easier in special situations.
Typical ODS data includes click, impression, order, and viewing records.
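To make the "store it unchanged" idea concrete, here is a minimal Python sketch, assuming an in-memory list stands in for an ODS table partitioned by a `dt` date column. The function `ingest_to_ods` and all field names are hypothetical. Records are appended without any parsing, so even a malformed line survives for later troubleshooting.

```python
from datetime import datetime, timezone


def ingest_to_ods(ods_table, raw_line, source, dt):
    """Append one raw record to an ODS 'table' (a list here) without parsing it."""
    ods_table.append({
        "raw": raw_line,      # original record, stored unchanged
        "source": source,     # hypothetical: name of the source system
        "dt": dt,             # hypothetical partition key (business date)
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # load metadata only
    })


ods = []
ingest_to_ods(ods, '{"event": "click", "user": "u1", "ts": 1655106000}', "app_log", "2022-06-13")
ingest_to_ods(ods, "malformed,,line", "app_log", "2022-06-13")  # kept as-is for troubleshooting
```

Parsing and validation are deliberately deferred to the next layer, so nothing that arrived from the source is lost at this stage.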
Data warehouse (DW)
A data warehouse (abbreviated DW or DWH) is built when an organization has many databases and wants to mine its data resources further to support decision-making. It is a complete body of theory and practice covering ETL, scheduling, and modeling.
A data warehouse is a strategic collection of all types of data that supports decision-making at every level of the enterprise. It is a single data store created for analytical reporting and decision support.
The DW is divided into layers: from bottom to top, DWD, DWB, and DWS.
DWD: data warehouse details (format preprocessing)
DWD (data warehouse details) is the detail data layer and serves as the isolation layer between the business systems and the data warehouse.
It mainly performs data cleaning and standardization on the ODS layer data.
Preprocessing means turning the data into semi-structured or structured data, for example a standard format on HDFS in which fields are stored as strings.
It also handles dirty data such as missing fields, format errors, malformed statements, and null values.
- Data cleaning: remove null values, dirty data, and values outside the allowed range
- Unified preprocessing
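The DWD cleaning rules above can be sketched in Python as follows. This is an illustration only, assuming raw events arrive as JSON lines with hypothetical `user`, `event`, and `ts` fields and that valid timestamps fall within calendar year 2022; the function name and range bounds are also hypothetical.

```python
import json


def clean_to_dwd(raw_lines, ts_min, ts_max):
    """Parse raw JSON event lines into uniform rows, dropping dirty records."""
    rows = []
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # format error: line is not valid JSON
        if not isinstance(rec, dict):
            continue  # valid JSON but not a record object
        user, event, ts = rec.get("user"), rec.get("event"), rec.get("ts")
        if user is None or event is None or ts is None:
            continue  # missing field or null value
        try:
            ts = int(ts)
        except (TypeError, ValueError):
            continue  # timestamp has the wrong type
        if not ts_min <= ts <= ts_max:
            continue  # outside the allowed range
        # unified format: trimmed strings, lowercase event names, integer timestamps
        rows.append({"user_id": str(user).strip(), "event": str(event).strip().lower(), "ts": ts})
    return rows


raw = [
    '{"user": "u1", "event": " Click ", "ts": "1655106000"}',
    '{"user": null, "event": "click", "ts": 1655106000}',
    "not json",
    '{"user": "u2", "event": "order", "ts": 99}',
]
dwd = clean_to_dwd(raw, ts_min=1640995200, ts_max=1672531199)  # 2022 in Unix seconds
# dwd == [{"user_id": "u1", "event": "click", "ts": 1655106000}]
```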
DWB: data warehouse base
DWB (data warehouse base) is the base data layer. It stores objective data, is generally used as an intermediate layer, and can be regarded as the layer that holds a large number of metrics.
DWS: data warehouse service
DWS (data warehouse service) is the data service layer. Building on the base data in DWB, it integrates and analyzes the data of a particular subject area into a service data layer, usually in the form of wide tables, which serve downstream business queries, OLAP analysis, data distribution, and so on.
- User behavior, lightly aggregated
- Mainly performs light summarization of ODS/DWD layer data.
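A sketch of DWS-style light aggregation: detail events are rolled up into one wide row per user per day, with a dimension attribute (`city`) copied in redundantly. The event names, the `user_dim` lookup, and the column names are all hypothetical.

```python
from collections import defaultdict


def aggregate_user_daily(dwd_rows, user_dim):
    """Roll detail events up into one wide row per (user, day)."""
    counts = defaultdict(lambda: defaultdict(int))
    for r in dwd_rows:
        counts[(r["user_id"], r["dt"])][r["event"]] += 1

    wide = []
    for (user_id, dt), ev in sorted(counts.items(), key=lambda kv: kv[0]):
        wide.append({
            "user_id": user_id,
            "dt": dt,
            # redundant dimension attribute copied in, so queries need no join
            "city": user_dim.get(user_id, {}).get("city", "unknown"),
            "clicks": ev["click"],            # missing event types count as 0
            "impressions": ev["impression"],
            "orders": ev["order"],
        })
    return wide


events = [
    {"user_id": "u1", "dt": "2022-06-13", "event": "click"},
    {"user_id": "u1", "dt": "2022-06-13", "event": "impression"},
    {"user_id": "u1", "dt": "2022-06-13", "event": "click"},
    {"user_id": "u2", "dt": "2022-06-13", "event": "order"},
]
wide = aggregate_user_daily(events, {"u1": {"city": "Beijing"}})
```

Each output row answers common questions (how many clicks did this user make that day, and in which city) without revisiting the detail layer.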
Data service layer / application layer (ADS)
- ADS (Application Data Service): this layer provides data for data products and data analysis. It is usually stored in systems such as ES (Elasticsearch) and MySQL for use by online systems.
- Goals: ensure data consistency and fast response to business requirements.
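One way to serve ADS results while keeping them consistent is to overwrite a whole day's partition inside one transaction, so a rerun never duplicates rows and a failed load does not leave a half-written day. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for MySQL; the table `ads_user_daily`, its columns, and `publish_to_ads` are hypothetical.

```python
import sqlite3


def publish_to_ads(conn, dt, rows):
    """Replace one day's results in the serving table inside a single transaction."""
    with conn:  # commits on success, rolls back on exception
        # delete-then-insert makes reruns of the same day idempotent
        conn.execute("DELETE FROM ads_user_daily WHERE dt = ?", (dt,))
        conn.executemany(
            "INSERT INTO ads_user_daily (dt, user_id, clicks, orders) VALUES (?, ?, ?, ?)",
            [(dt, r["user_id"], r["clicks"], r["orders"]) for r in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads_user_daily (dt TEXT, user_id TEXT, clicks INTEGER, orders INTEGER)")
rows = [{"user_id": "u1", "clicks": 2, "orders": 0}, {"user_id": "u2", "clicks": 0, "orders": 1}]
publish_to_ads(conn, "2022-06-13", rows)
publish_to_ads(conn, "2022-06-13", rows)  # rerun: still 2 rows, no duplicates
```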
2. Data warehouse data processing flow
Data collection
Log file
The log collection process is illustrated with the collection of ZTE CDN TV billing records as an example.
DB data source collection
Common DB data sources include MySQL and MongoDB, and DataX is the main data extraction tool. The DB data collection process is illustrated with the extraction of CMS content data as an example.
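For the DB collection step, a DataX job is described by a JSON file containing a reader plugin and a writer plugin. The Python sketch below generates such a job for copying a MySQL table to HDFS. The host names, database, table, columns, credentials, and HDFS path are hypothetical, and parameter names should be checked against the `mysqlreader` and `hdfswriter` documentation for your DataX version.

```python
import json


def build_datax_job(jdbc_url, table, columns, hdfs_path, default_fs):
    """Build a DataX job dict that copies a MySQL table to text files on HDFS."""
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},  # one concurrent channel
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "etl_user",    # hypothetical credentials
                        "password": "******",
                        "column": columns,
                        "connection": [{"table": [table], "jdbcUrl": [jdbc_url]}],
                    },
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": default_fs,
                        "fileType": "text",
                        "path": hdfs_path,
                        "fileName": table,
                        # every column written as a string, as in the DWD notes above
                        "column": [{"name": c, "type": "string"} for c in columns],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                    },
                },
            }],
        }
    }


job = build_datax_job(
    "jdbc:mysql://cms-db:3306/cms",                 # hypothetical CMS database
    "content",                                      # hypothetical table
    ["id", "title", "updated_at"],                  # hypothetical columns
    "/warehouse/ods/cms_content/dt=2022-06-13",     # hypothetical target path
    "hdfs://namenode:8020",                         # hypothetical NameNode
)
print(json.dumps(job, indent=2))
```

The printed JSON would be saved to a file and passed to DataX's launcher script, for example `python bin/datax.py job.json`.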
Data loading
In the loading process, the Flume agent on the data collection node collects the data and sends it to the Flume agent on the data warehouse node. The warehouse-node Flume writes the received data to the local file system, and each file is written to HDFS once it is complete. When the data for an accounting period reaches HDFS, the scheduling system is notified; on receiving this signal, the scheduler knows that the data for that accounting period has been loaded.
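The "accounting period has arrived" signal can be modeled in several ways. The sketch below swaps in a common marker-file convention (a `_SUCCESS` file in the period directory, similar to what Hadoop jobs write on completion) instead of an explicit push notification, and uses the local file system in place of HDFS. The directory layout, the hourly period format, and all function names are hypothetical.

```python
import os
import tempfile


def land_file(root, period, name, lines):
    """Write a finished file into the period's directory (stand-in for an HDFS put)."""
    d = os.path.join(root, f"period={period}")
    os.makedirs(d, exist_ok=True)
    tmp = os.path.join(d, f".{name}.tmp")
    with open(tmp, "w") as f:
        f.write("\n".join(lines))
    os.replace(tmp, os.path.join(d, name))  # publish only after the file is complete


def mark_period_complete(root, period):
    """Drop a marker file signalling that the whole period has arrived."""
    open(os.path.join(root, f"period={period}", "_SUCCESS"), "w").close()


def period_ready(root, period):
    """What a scheduler would poll before triggering downstream jobs."""
    return os.path.exists(os.path.join(root, f"period={period}", "_SUCCESS"))


root = tempfile.mkdtemp()
land_file(root, "2022061307", "cdn_00.log", ["a", "b"])
print(period_ready(root, "2022061307"))  # False: data may still be arriving
mark_period_complete(root, "2022061307")
print(period_ready(root, "2022061307"))  # True: downstream jobs can start
```

Writing to a temporary name and renaming mirrors the "write to HDFS only after the file is complete" step, so readers never pick up a partial file.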
Data cleaning (ODS)
The data cleaning step loads the HDFS files into the ODS layer and performs ID unification of the raw data, outlier cleaning, data format unification, and joins with dimension tables to supplement dimension attributes.
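The four cleaning operations above can be sketched as a single pass over the records. Everything here is hypothetical: the alias table `id_map` used for ID unification, the 0 to 86400 second range used as the outlier rule, the date format fix, and the `device_dim` dimension table.

```python
def clean_ods(records, id_map, device_dim):
    """Unify IDs, drop outliers, normalize formats, and join a dimension table."""
    cleaned = []
    for r in records:
        dur = r.get("duration")
        # outlier cleaning (hypothetical rule): duration must be a number in 0..86400 seconds
        if not isinstance(dur, (int, float)) or not 0 <= dur <= 86400:
            continue
        cleaned.append({
            # ID unification: map any known alias (device ID, account ID, ...) to one user ID
            "user_id": id_map.get(r["id"], r["id"]),
            # format unification: "2022/06/13" -> "2022-06-13"
            "date": r["date"].replace("/", "-"),
            "duration_s": int(dur),
            # dimension join: look up the device type, defaulting for unknown devices
            "device_type": device_dim.get(r["device"], {}).get("type", "unknown"),
        })
    return cleaned


id_map = {"dev-123": "u1", "acct-9": "u1"}
device_dim = {"stb-a": {"type": "set-top box"}}
records = [
    {"id": "dev-123", "duration": 1200, "device": "stb-a", "date": "2022/06/13"},
    {"id": "acct-9", "duration": -5, "device": "stb-a", "date": "2022-06-13"},
    {"id": "u7", "duration": 300, "device": "phone-x", "date": "2022-06-13"},
]
ods_rows = clean_ods(records, id_map, device_dim)
```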
Light data aggregation (DWS)
The DWS layer lightly aggregates the data, divides it into subject areas, and stores dimension information redundantly wherever possible. This greatly helps to speed up downstream computation, reduce the volume of data processed, simplify business logic, and merge computing units. During aggregation, fact tables are often joined with one or more dimension tables, and the computation produces a lot of intermediate data; after the computation, the intermediate data is cleaned up and the results are written to the target table.
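The aggregation pattern just described (join facts with dimension tables, materialize intermediate data, write the aggregate, then drop the intermediate data) can be sketched with Python's built-in `sqlite3`. The tables `fact_play`, `dim_user`, `dim_content`, and `dws_city_category_daily` and their sample rows are hypothetical; a production warehouse would run the same SQL pattern on its own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_play (user_id TEXT, content_id TEXT, duration_s INTEGER, dt TEXT);
CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, city TEXT);
CREATE TABLE dim_content (content_id TEXT PRIMARY KEY, category TEXT);
CREATE TABLE dws_city_category_daily (dt TEXT, city TEXT, category TEXT,
                                      plays INTEGER, total_duration_s INTEGER);
INSERT INTO fact_play VALUES ('u1','c1',1200,'2022-06-13'), ('u2','c1',600,'2022-06-13'),
                             ('u1','c2',300,'2022-06-13');
INSERT INTO dim_user VALUES ('u1','Beijing'), ('u2','Beijing');
INSERT INTO dim_content VALUES ('c1','movie'), ('c2','news');
""")


def aggregate_dws(conn, dt):
    """Join facts with dimensions, aggregate into the DWS table, then drop intermediate data."""
    conn.execute("CREATE TEMP TABLE tmp_play_enriched "
                 "(dt TEXT, city TEXT, category TEXT, duration_s INTEGER)")
    try:
        with conn:  # the writes below commit together, or roll back on error
            # Step 1: join the fact table with two dimension tables (intermediate data)
            conn.execute("""
                INSERT INTO tmp_play_enriched
                SELECT f.dt, COALESCE(u.city, 'unknown'), COALESCE(c.category, 'unknown'),
                       f.duration_s
                FROM fact_play f
                LEFT JOIN dim_user u ON f.user_id = u.user_id
                LEFT JOIN dim_content c ON f.content_id = c.content_id
                WHERE f.dt = ?""", (dt,))
            # Step 2: overwrite this day's rows in the target table with the aggregate
            conn.execute("DELETE FROM dws_city_category_daily WHERE dt = ?", (dt,))
            conn.execute("""
                INSERT INTO dws_city_category_daily
                SELECT dt, city, category, COUNT(*), SUM(duration_s)
                FROM tmp_play_enriched
                GROUP BY dt, city, category""")
    finally:
        # Step 3: clean up the intermediate data once the computation is done
        conn.execute("DROP TABLE tmp_play_enriched")


aggregate_dws(conn, "2022-06-13")
aggregate_dws(conn, "2022-06-13")  # rerun overwrites the day instead of duplicating it
result = conn.execute("SELECT city, category, plays, total_duration_s "
                      "FROM dws_city_category_daily ORDER BY category").fetchall()
# result == [("Beijing", "movie", 2, 1800), ("Beijing", "news", 1, 300)]
```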