Data warehouse data processing and data flow
2022-06-13 07:38:00 【_ seven seven】
1. Data flow
Operational data store (ODS): data collection
ODS (Operational Data Store) is the data staging area, also known as the source-aligned layer.
Tables from the data warehouse's source systems are usually stored here unchanged. This is the ODS layer, and it is the source for all subsequent warehouse processing.
The purpose of ODS is to keep the most complete copy of the original data, which makes troubleshooting easier in unusual scenarios.
Typical collected data: clicks, impressions, orders, views.
Data warehouse (DW)
A data warehouse (abbreviated DW or DWH) is built once many databases already exist, in order to mine data resources further and to support decision-making. It is a complete theoretical system covering ETL, scheduling, and modeling.
A data warehouse is a strategic collection that supports decision-making at every level of the enterprise with all types of data. It is a single data store, created for analytical reporting and decision support.
The DW is tiered, from bottom to top, into DWD, DWB, and DWS.
DWD (data warehouse details): format preprocessing
DWD is the detail data layer, the isolation layer between the business layer and the data warehouse.
It mainly cleans and standardizes the data from the ODS layer.
Preprocessing means converting data into semi-structured or structured form. For example, for the standard format on HDFS, we store values as strings.
It also handles dirty data, such as missing fields, format errors, null values, and so on.
- Data cleaning: remove null values, dirty data, and out-of-range values
Unified preprocessing
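As a rough illustration of the cleaning rules above (drop null values, dirty data, and out-of-range values, then store everything as strings), here is a minimal Python sketch. The field names `user_id`, `event`, and `duration_s` and the 24-hour upper bound are assumptions made for the example; they do not come from any real schema.

```python
from typing import Optional

# Hypothetical schema: field names and the valid range are assumptions.
REQUIRED_FIELDS = ("user_id", "event", "duration_s")
MAX_DURATION_S = 24 * 3600  # assumed upper bound for one viewing session


def clean_record(raw: dict) -> Optional[dict]:
    """Return a standardized record, or None if the record is dirty."""
    # Missing field, null value, or empty string
    for field in REQUIRED_FIELDS:
        value = raw.get(field)
        if value is None or str(value).strip() == "":
            return None
    # Format error: the duration must parse as an integer
    try:
        duration = int(str(raw["duration_s"]).strip())
    except ValueError:
        return None
    # Out-of-range value
    if not 0 <= duration <= MAX_DURATION_S:
        return None
    # Standardize: every value is stored as a trimmed string
    return {
        "user_id": str(raw["user_id"]).strip(),
        "event": str(raw["event"]).strip().lower(),
        "duration_s": str(duration),
    }


def clean_all(rows):
    """Clean a batch of raw records and keep only the valid ones."""
    cleaned = (clean_record(r) for r in rows)
    return [r for r in cleaned if r is not None]
```

In a real warehouse this logic would more likely live in a Hive or Spark job. The plain-Python version only shows the order of the checks.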
DWB (data warehouse base)
DWB is the base data layer. It stores objective data and is generally used as an intermediate layer. It can be thought of as the layer that holds a large number of metrics.
DWS (data warehouse service)
DWS is the data service layer. Building on the base data in DWB, it integrates and summarizes the data of a particular subject area, usually as wide tables. It serves downstream business queries, OLAP analysis, data distribution, and so on.
- User behavior, lightly aggregated
- Mainly produces light summaries of ODS/DWD layer data.
Data service layer / application layer (ADS)
- ADS (application data service): this layer mainly provides the data used by data products and data analysis. It is usually stored in systems such as ES and MySQL for use by online systems.
Goals: ensure data consistency and ensure fast response to requirements.
2. Data warehouse data processing flow
Data collection
Log file
Data collection process, taking the collection of ZTE CDN TV billing records as an example
DB data source collection
Common DB data sources are MySQL and MongoDB. DataX is the main data extraction tool. CMS content data extraction serves as the example for the DB data collection process.
Data warehousing
In the data warehousing process, Flume on the data collection node collects the data and sends it to Flume on the data warehouse node. The warehouse-node Flume writes the received data to the file system and, once a file is complete, writes it to HDFS. When the data for an accounting period arrives in HDFS, the scheduling system is notified. On receiving this signal, the scheduling system knows that the data for that accounting period has been loaded.
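The "notify the scheduling system when an accounting period has arrived" step can be sketched as follows. This is only an illustration: the article does not say how arrival is detected, so the rule used here (a period is complete once an expected number of distinct files has landed) is an assumption, and `PeriodTracker` and its `notify` callback are hypothetical names standing in for the real signal to the scheduler.

```python
from collections import defaultdict
from typing import Callable


class PeriodTracker:
    """Tracks files landed per accounting period and signals the scheduler once."""

    def __init__(self, expected_files: int, notify: Callable[[str], None]):
        self.expected_files = expected_files  # assumed completeness rule
        self.notify = notify                  # stand-in for the scheduler signal
        self._landed = defaultdict(set)       # period -> distinct file names
        self._notified = set()                # periods already signalled

    def file_landed(self, period: str, filename: str) -> None:
        """Record one file written to HDFS for the given accounting period."""
        self._landed[period].add(filename)
        complete = len(self._landed[period]) >= self.expected_files
        if complete and period not in self._notified:
            self._notified.add(period)
            self.notify(period)
```

Keeping the set of notified periods makes the signal idempotent, so a late or duplicate file does not trigger downstream jobs twice.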
Data cleaning (ODS)
The data cleaning step loads the HDFS files into the ODS layer and, on the raw data, unifies IDs, cleans outliers, unifies data formats, and joins dimension tables to fill in dimension attributes.
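Three of these steps (ID unification, format unification, and the dimension-table join) can be sketched in plain Python as below. The content dimension table, the field names, and the list of timestamp formats are all assumptions made for illustration. Outlier cleaning is left out to keep the sketch short.

```python
from datetime import datetime

# Hypothetical dimension table keyed by content ID.
CONTENT_DIM = {
    "c001": {"title": "News at Nine", "category": "news"},
}

# Assumed set of timestamp formats that appear in the raw files.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S", "%Y%m%d%H%M%S")


def unify_id(raw_id: str) -> str:
    """Normalize an ID: trim whitespace and lowercase it."""
    return raw_id.strip().lower()


def unify_time(raw_time: str) -> str:
    """Convert any known timestamp format to 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in TIME_FORMATS:
        try:
            parsed = datetime.strptime(raw_time.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized timestamp: {raw_time!r}")


def to_ods_row(raw: dict, dim: dict = CONTENT_DIM) -> dict:
    """Build one cleaned row, enriched with dimension attributes."""
    content_id = unify_id(raw["content_id"])
    attrs = dim.get(content_id, {})  # unknown IDs get placeholder attributes
    return {
        "content_id": content_id,
        "event_time": unify_time(raw["event_time"]),
        "title": attrs.get("title", "unknown"),
        "category": attrs.get("category", "unknown"),
    }
```

Filling unmatched IDs with "unknown" instead of dropping the row keeps the fact data complete. Whether that suits a given pipeline is a design decision.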
Light aggregation (DWS)
DWS layer data is lightly aggregated and divided by subject area, with dimension information kept redundant wherever possible. This helps a great deal with speeding up downstream computation, reducing the volume of data processed, simplifying business logic, and merging computing units. During aggregation, fact tables are often joined with one or more dimension tables, and the computation generates a lot of intermediate data. After the computation, clean up the intermediate data and write the results to the target table.
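A minimal sketch of this aggregation pattern: join a fact table to a dimension table, keep the dimension attribute redundantly in the result, aggregate, then discard the intermediate data. The table contents and names (`facts`, `user_dim`, `region`, `duration_s`) are hypothetical, and in-memory lists stand in for warehouse tables.

```python
from collections import defaultdict


def aggregate_views(facts, user_dim):
    """Lightly aggregate view facts per user and day into wide-table rows."""
    # Intermediate data: facts enriched with a redundant dimension attribute.
    enriched = [
        {**f, "region": user_dim.get(f["user_id"], {}).get("region", "unknown")}
        for f in facts
    ]
    totals = defaultdict(lambda: {"views": 0, "duration_s": 0})
    for row in enriched:
        key = (row["user_id"], row["region"], row["day"])
        totals[key]["views"] += 1
        totals[key]["duration_s"] += row["duration_s"]
    # Clean up the intermediate data once the totals exist.
    del enriched
    # Result rows for the target table, in a stable order.
    return [
        {"user_id": u, "region": r, "day": d, **metrics}
        for (u, r, d), metrics in sorted(totals.items())
    ]
```

Because `region` is carried in every result row, downstream queries can group by region without joining the dimension table again.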