
On data management in the data warehouse

2022-07-27 22:55:00 [Software testing network]

Most companies do not think about data governance when they first build a data warehouse. A newly formed data team is expected to deliver visible "data-driven" results quickly, and governance work does not showcase that kind of result well. As a result, governance usually gets attention only later, once data problems start to surface as the business grows.

1. How to start data governance?

Data governance starts with data management. Before any governance work begins, we first need to take stock of the warehouse's core assets along the whole chain: from data acquisition, through data processing, to data applications (including warehouse report data and indicator data).

For business data sources, we need to know which business systems are the main sources feeding the warehouse, what their key processes are, and who owns the key source data. With that in place, we can formulate data management standards together with the business side.
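A lightweight way to start this inventory is to record each source system with its key process and owner, then list the key sources that still have no owner. The sketch below is a minimal illustration under that assumption; the `SourceAsset` fields, the helper `unowned_key_sources` and the sample systems are all hypothetical, not taken from any particular warehouse or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceAsset:
    """One business data source feeding the warehouse (hypothetical fields)."""
    system: str           # business system name, e.g. "order_service"
    key_process: str      # the key business process it supports
    is_key: bool          # whether it is a core data source
    owner: Optional[str]  # accountable data owner; None or "" if unassigned


def unowned_key_sources(assets):
    """Return, sorted, the systems marked as key sources that have no owner yet."""
    return sorted(a.system for a in assets if a.is_key and not a.owner)


inventory = [
    SourceAsset("order_service", "order placement", True, "alice"),
    SourceAsset("payment_service", "payment settlement", True, None),
    SourceAsset("crm", "customer follow-up", False, None),
]
print(unowned_key_sources(inventory))  # ['payment_service']
```

Non-key sources without an owner (like `crm` here) are deliberately ignored, so the list stays focused on the data that matters most.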

2. Approaches to data warehouse governance

Data warehouse governance can start from the following key points: a finer-grained data directory, better model reuse, ETL task optimization, and data quality monitoring.

Data directory division:

Many warehouses start without a complete, clear plan, so over time the data directory becomes messy and finding a model becomes cumbersome. A well-designed directory clarifies the structure of the warehouse and lets developers quickly find and locate a model, for example which layer it sits on and which business domain it belongs to. Once this information is clearly visible, data development efficiency improves quickly.
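If table names follow a convention that encodes the layer and business domain, the directory can be generated instead of maintained by hand. The sketch below assumes a hypothetical `<layer>_<domain>_<rest>` naming convention and a common set of layer prefixes (`ods`, `dwd`, `dws`, `dim`, `ads`); both are assumptions, so substitute your own warehouse's layering. Tables that do not match land in an `unclassified` bucket, which doubles as a cleanup list.

```python
from collections import defaultdict

# Assumed layer prefixes; replace with your warehouse's own layers.
KNOWN_LAYERS = {"ods", "dwd", "dws", "dim", "ads"}


def build_directory(table_names):
    """Group tables into {layer: {domain: [tables]}} using the assumed
    <layer>_<domain>_<rest> convention. Non-conforming names go under
    "unclassified" so they can be renamed or cleaned up."""
    directory = defaultdict(lambda: defaultdict(list))
    for name in table_names:
        parts = name.lower().split("_")
        if len(parts) >= 3 and parts[0] in KNOWN_LAYERS:
            directory[parts[0]][parts[1]].append(name)
        else:
            directory["unclassified"]["-"].append(name)
    # Convert the nested defaultdicts to plain dicts with sorted table lists
    return {
        layer: {domain: sorted(tables) for domain, tables in domains.items()}
        for layer, domains in directory.items()
    }


tables = ["ods_trade_order_di", "dwd_trade_order_detail_di",
          "dws_user_login_1d", "tmp_zhangsan_test"]
for layer, domains in build_directory(tables).items():
    print(layer, domains)
```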

Model reuse:

Offline data warehouse teams are generally large; a friend at Kwai recently told me their offline warehouse team has hundreds of people. Model reuse therefore deserves real attention. For example, fields that are reused heavily can be processed once in the middle tier, so that one large wide table serves all downstream consumers; functions or logic that are reused heavily can be packaged as a single shared UDF, which avoids re-implementing the same logic and improves data processing performance.
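One way to decide which fields belong in the shared middle-tier wide table is to count how many downstream tasks read each field. The sketch assumes you already have a mapping from task name to the fields it reads (for example, extracted from lineage metadata); the function `reuse_candidates`, the task names, the fields and the `min_tasks` threshold are hypothetical.

```python
from collections import Counter


def reuse_candidates(task_fields, min_tasks):
    """Return fields read by at least `min_tasks` distinct downstream tasks,
    most-reused first (ties broken alphabetically). These are candidates
    for processing once in the middle-tier wide table."""
    counts = Counter()
    for fields in task_fields.values():
        counts.update(set(fields))  # count a field once per task, even if repeated
    shared = [field for field, n in counts.items() if n >= min_tasks]
    return sorted(shared, key=lambda field: (-counts[field], field))


usage = {
    "report_gmv": ["user_id", "order_amount", "city"],
    "report_retention": ["user_id", "register_date"],
    "report_city_sales": ["city", "order_amount", "user_id"],
}
print(reuse_candidates(usage, min_tasks=2))  # ['user_id', 'city', 'order_amount']
```

Counting per task rather than per reference keeps a single task that mentions a field many times from inflating its apparent reuse.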

Task optimization:

Whenever you apply for resources, leadership will ask about the value and purpose. Besides requesting additional resources, we can also optimize the resources we already have. During warehouse development, developers' technical skills and business understanding vary widely, so the quality of the ETL tasks they build varies too. We therefore need to regularly monitor each task's execution time and the resources it consumes, and carry out targeted optimization, for example reducing the amount of input data, or replacing heavy DISTINCT operations with GROUP BY. On the management side, task execution efficiency can also be made an assessment item, and tasks that fail to meet the standard can be publicly listed.
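The monitoring step can start as simply as averaging each task's recent runs and flagging those that exceed a runtime or memory budget. The sketch assumes run records (wall-clock minutes and peak memory in GB) have already been exported from your scheduler; `TaskRun`, `optimization_candidates`, the budgets and the sample tasks are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    task: str
    minutes: float    # wall-clock execution time of one run
    memory_gb: float  # peak memory used by that run


def optimization_candidates(runs, max_minutes, max_memory_gb):
    """Average each task's runs and return (task, avg_minutes, avg_memory_gb)
    for tasks over either budget, slowest first."""
    grouped = {}
    for run in runs:
        grouped.setdefault(run.task, []).append(run)
    flagged = []
    for task, task_runs in grouped.items():
        avg_minutes = mean(r.minutes for r in task_runs)
        avg_memory = mean(r.memory_gb for r in task_runs)
        if avg_minutes > max_minutes or avg_memory > max_memory_gb:
            flagged.append((task, round(avg_minutes, 1), round(avg_memory, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


runs = [
    TaskRun("dwd_trade_order_detail_di", 95, 40),
    TaskRun("dwd_trade_order_detail_di", 105, 44),
    TaskRun("dws_user_login_1d", 12, 8),
    TaskRun("ads_gmv_daily", 20, 64),
]
for task, minutes, memory in optimization_candidates(runs, max_minutes=60, max_memory_gb=32):
    print(f"{task}: avg {minutes} min, {memory} GB")
```

Averaging over several runs smooths out one-off slow runs caused by cluster contention, so the list points at tasks that are consistently expensive.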

Data quality:

This mainly means monitoring duplicate data, null values and abnormal data, and rule-based validation must be configured. As I said in my last live stream, a task that finishes successfully is not necessarily done; sometimes a task that "succeeds" on bad data costs more than one that fails. For example, in an earlier project a task pushed business indicators to the boss by SMS. Failure alerting was configured for the task, but its content was not validated. Abnormal data from the business side produced a wrong final indicator, the boss was very angry, and the consequences were serious. We therefore also need to monitor the indicators of key businesses, and when an anomaly is found, stop downstream tasks promptly and raise an alarm. Data quality involves much more work than this; I covered it in an earlier article, "Talking About Data Quality in ETL".
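The checks described above can be sketched in two parts: row-level rules (duplicates, nulls, out-of-range values) and a key-indicator check against a recent baseline, both feeding a gate that raises an exception. In a typical scheduler, an exception marks the task as failed so its downstream tasks (such as an SMS push) do not run, but how your scheduler reacts is an assumption to verify. This is a minimal sketch: the function names, field names, bounds and the 50% threshold are hypothetical, and relative change against a trailing mean is only one of many possible anomaly rules.

```python
from statistics import mean


class QualityCheckFailed(Exception):
    """Raised to fail the task so its downstream tasks do not run."""


def row_problems(rows, key, required, bounds):
    """Rule checks on a batch of dict rows: duplicate values of `key`,
    nulls in `required` fields, and values outside `bounds`
    ({field: (low, high)}). Returns a list of human-readable problems."""
    problems, seen = [], set()
    for i, row in enumerate(rows):
        if row.get(key) in seen:
            problems.append(f"row {i}: duplicate {key}={row.get(key)}")
        seen.add(row.get(key))
        for field in required:
            if row.get(field) is None:
                problems.append(f"row {i}: null {field}")
        for field, (low, high) in bounds.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {field}={value} out of range")
    return problems


def indicator_problem(name, today, history, max_change):
    """Flag a key indicator whose relative change versus the mean of
    `history` exceeds `max_change` (0.5 means 50%). Skipped when the
    baseline is 0, since relative change is undefined there."""
    baseline = mean(history)
    if baseline and abs(today - baseline) / abs(baseline) > max_change:
        return f"{name}={today} deviates from baseline {baseline:.1f}"
    return None


def quality_gate(problems):
    """Raise if any check failed; otherwise return None and let the flow continue."""
    if problems:
        raise QualityCheckFailed("; ".join(problems))


rows = [
    {"order_id": 1, "user_id": "u1", "amount": 120.0},
    {"order_id": 1, "user_id": "u2", "amount": 80.0},
    {"order_id": 2, "user_id": None, "amount": -5.0},
]
problems = row_problems(rows, "order_id", ["order_id", "user_id"],
                        {"amount": (0, 100_000)})
msg = indicator_problem("daily_gmv", 20_000, [100_000, 98_000, 102_000], 0.5)
if msg:
    problems.append(msg)

try:
    quality_gate(problems)
    print("checks passed; downstream tasks may run")
except QualityCheckFailed as exc:
    print("blocking downstream tasks:", exc)
```

The gate runs after the data is produced but before anything consumes it, which is exactly the gap in the SMS example: the task itself succeeded, but nothing checked what it produced.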

3. Summary

In short, the value of warehouse governance is hard to quantify, so many data teams are reluctant to do it. But without it, the team will keep feeling the "itch" of unresolved data problems.

If you are starting warehouse governance, be prepared for a long battle: for example, hold monthly asset management review meetings and regularly optimize inefficient tasks. This requires a governance mechanism, and the best way to make that mechanism stick is to tie it to performance appraisal.


Copyright notice
This article was written by [Software testing network]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/208/202207272000217619.html