On Data Management in the Data Warehouse
2022-07-27 22:55:00 【Software testing network】
Most companies do not think about data governance when they first build a data warehouse. A newly formed data department has to deliver some visible "data-driven" results, and data governance does little to showcase them. So it is usually only as the business grows and data problems gradually surface that teams start to consider it.

1. How to start data governance?
Start with data management. Before beginning data governance, first take stock of the warehouse's core assets, from data acquisition through data processing to data applications (including warehouse report data and indicator data).
For business data sources, we need to know which business systems are the main sources feeding the warehouse and what the key processes are, identify the data owner of each key source dataset, and draw up data management standards that fit the business.
2. Approaches to data warehouse management
Data management in the warehouse can start from four key points: data directory division, model reuse, ETL task optimization, and data quality monitoring.
Data directory division:
Many warehouses are designed without complete, clear planning at the start, so the data directory slowly becomes chaotic and finding a model becomes cumbersome. A good directory design helps clarify the structure of the warehouse and lets us find and locate a model quickly, for example which layer it sits in and which business domain it belongs to. Once these are clearly visible, data development becomes much more efficient.
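As a rough illustration of locating a model by layer and business domain, the sketch below assumes a hypothetical naming convention of the form <layer>_<domain>_<subject>. The layer prefixes (ods, dwd, dws, ads) and the table names are assumptions chosen for the example; the article does not prescribe any particular convention.

```python
from collections import defaultdict

# Hypothetical layer prefixes; the article does not name specific layers.
LAYERS = ("ods", "dwd", "dws", "ads")


def parse_table_name(name):
    """Split '<layer>_<domain>_<subject>' into (layer, domain, subject)."""
    parts = name.lower().split("_", 2)
    if len(parts) != 3 or parts[0] not in LAYERS:
        raise ValueError(f"table name does not follow the convention: {name}")
    layer, domain, subject = parts
    return layer, domain, subject


def build_catalog(table_names):
    """Group table names by layer, then by business domain."""
    catalog = defaultdict(lambda: defaultdict(list))
    for name in table_names:
        layer, domain, _ = parse_table_name(name)
        catalog[layer][domain].append(name)
    return catalog


# Illustrative table names only.
tables = [
    "ods_trade_order",
    "dwd_trade_order_detail",
    "dws_trade_user_day",
    "dwd_user_profile",
]
catalog = build_catalog(tables)
```

With a catalog like this, a question such as "which models exist in the dwd layer for the trade domain" becomes a simple lookup, and a table name that breaks the convention is rejected when it is registered.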
Model reuse:
Offline warehouse teams are generally large. A friend at Kwai told me recently that their offline warehouse team has several hundred people. With that many developers, model reuse deserves attention. For example, fields that are reused heavily can be moved into the middle layer for unified processing, that is, a large wide table is provided for reuse. Likewise, functions or logic that are reused often can be developed as a unified UDF, which improves data processing performance.
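The idea of a unified function can be sketched in plain Python. The function name normalize_channel, the alias table, and the two jobs that call it are all hypothetical. In a real warehouse the shared logic would be registered as a UDF in whichever engine is in use, and that step is omitted here.

```python
# Hypothetical shared module that every ETL job imports.

_CHANNEL_ALIASES = {
    "ios": "app",
    "android": "app",
    "app": "app",
    "h5": "web",
    "web": "web",
    "pc": "web",
}


def normalize_channel(raw):
    """Map a raw channel string to a canonical value, 'unknown' if unmapped."""
    if raw is None:
        return "unknown"
    return _CHANNEL_ALIASES.get(raw.strip().lower(), "unknown")


# Two different jobs reuse the same logic instead of re-implementing it.
def build_order_rows(rows):
    """Return copies of the rows with the channel column normalized."""
    return [{**r, "channel": normalize_channel(r.get("channel"))} for r in rows]


def count_visits_by_channel(events):
    """Count events per canonical channel."""
    counts = {}
    for e in events:
        key = normalize_channel(e.get("channel"))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Because both jobs call the same function, a change to the channel mapping is made once and every downstream model picks it up consistently.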
Task optimization:
Whenever you apply for resources, leadership asks about value and purpose. Besides applying for more resources, we can also optimize the ones we already have. In warehouse development, technical skill is uneven and business understanding varies widely, so the quality of the ETL tasks people write inevitably differs. We therefore need to monitor task execution time and resource consumption regularly and carry out targeted optimization, for example reducing the volume of input data or replacing heavy DISTINCT operations with GROUP BY. On the management side, task execution efficiency can also be made an assessment item, with tasks that miss the standard publicly listed.
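The sketch below illustrates both points under stated assumptions. find_slow_tasks is a hypothetical helper that flags tasks whose latest run took more than twice the median of earlier runs; the threshold, task names, and timings are invented for the example. The second part uses an in-memory SQLite table only to show that a COUNT(DISTINCT ...) query and its GROUP BY rewrite return the same result. Whether the rewrite is actually faster depends on the engine and the data, so it should be measured rather than assumed.

```python
import sqlite3
from statistics import median


def find_slow_tasks(run_history, latest, factor=2.0):
    """Return names of tasks whose latest run exceeded factor x median of past runs."""
    slow = []
    for task, seconds in latest.items():
        history = run_history.get(task, [])
        if history and seconds > factor * median(history):
            slow.append(task)
    return sorted(slow)


# Illustrative run times in seconds.
run_history = {"dwd_order": [300, 320, 310], "dws_user_day": [600, 620, 610]}
latest = {"dwd_order": 305, "dws_user_day": 1500}
slow_tasks = find_slow_tasks(run_history, latest)

# Equivalence of COUNT(DISTINCT ...) and its GROUP BY rewrite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (3, "c")],
)
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM visits"
).fetchone()[0]
group_by_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT user_id FROM visits GROUP BY user_id)"
).fetchone()[0]
conn.close()
```

Comparing against each task's own history rather than a fixed limit keeps naturally long jobs from being flagged every day, while a sudden regression in a short job still stands out.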
Data quality:
This mainly means monitoring for duplicate data, null values, and abnormal data; be sure to configure rule checks. As I said in the last live broadcast, a task that runs successfully is not necessarily done, and sometimes a successful run costs even more than a failure. In a previous project, for example, business indicators were pushed to the boss by SMS, so failure alerting was added to that task, but the content itself was not validated. Abnormal data from the business side produced a wrong final indicator, the boss was very angry, and the consequences were serious. So we also need to monitor the data indicators of key businesses and, when an anomaly is found, stop downstream tasks promptly and raise an alert. Of course, there is much more to data quality; a previous article, "Talking About Data Quality in ETL", covers it.
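A minimal sketch of such rule checks follows, assuming rows arrive as dictionaries. The column names (order_id, amount), the allowed range, and the send_report callback are hypothetical stand-ins for the SMS push described above. The point is that the report step is only reached when the content passes validation, not merely when the task finishes.

```python
class DataQualityError(Exception):
    """Raised when a check fails, so downstream steps are not started."""


def check_quality(rows, key, required, metric, low, high):
    """Return a list of problems: duplicate keys, nulls, out-of-range values."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        k = row.get(key)
        if k in seen:
            problems.append(f"row {i}: duplicate {key}={k!r}")
        seen.add(k)
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: null {col}")
        value = row.get(metric)
        if value is not None and not (low <= value <= high):
            problems.append(f"row {i}: {metric}={value} outside [{low}, {high}]")
    return problems


def run_pipeline(rows, send_report):
    """Validate first; only send the indicator when every check passes."""
    problems = check_quality(
        rows,
        key="order_id",
        required=["order_id", "amount"],
        metric="amount",
        low=0,
        high=100000,
    )
    if problems:
        # Stop here: the caller should alert and skip downstream tasks.
        raise DataQualityError("; ".join(problems))
    send_report(sum(r["amount"] for r in rows))
```

Raising an exception instead of logging a warning is a deliberate choice in this sketch: it makes the scheduler treat bad content the same way as a failed run, which is what triggers the existing failure alert and blocks downstream tasks.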
3. Summary
In short, the value of data warehouse management is hard to quantify, so many data teams are reluctant to do it; yet leaving it undone becomes a nagging "itch".
If you are starting on warehouse management, be prepared for a long campaign: for example, hold monthly asset management meetings and reviews, and optimize inefficient tasks on a regular basis. This requires a management mechanism, and the best way to make that mechanism stick is to tie it to performance appraisal.