On Data Management in the Data Warehouse
2022-07-27 22:55:00 【Software testing network】
Most companies do not think about data governance when they first build a data warehouse. A newly formed data department is under pressure to deliver visible "data-driven" results, and governance work does not showcase such results well. As a result, teams usually start considering governance only after they run into data problems as the business grows.

1. How to start data governance?
Start with data management. Before starting governance, first take inventory of the warehouse's core assets along the whole pipeline: from data acquisition, through data processing, to data applications (including report data and metric data).
For business data sources, we need to know which business systems are the main sources feeding the warehouse and what their key processes are, identify the data owner of each key source dataset, and work with the business side to formulate data management specifications.
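One lightweight way to start this inventory is a plain registry of source systems, their key processes, their owners, and the downstream reports or metrics built on them, checked for sources that still lack an owner. The sketch below is an illustrative assumption, not a specific tool: the class, field names, and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceSystem:
    """A business system feeding the warehouse (hypothetical schema)."""
    name: str                                             # e.g. "order_service"
    key_processes: List[str]                              # business processes producing key data
    owner: Optional[str] = None                           # accountable data owner, if identified
    downstream: List[str] = field(default_factory=list)  # reports / metrics built on this source


def unowned_sources(sources: List[SourceSystem]) -> Dict[str, List[str]]:
    """Map each source without a data owner to the downstream assets it puts at risk."""
    return {s.name: s.downstream for s in sources if not s.owner}


if __name__ == "__main__":
    inventory = [
        SourceSystem("order_service", ["place order", "refund"], owner="alice",
                     downstream=["gmv_report"]),
        SourceSystem("crm", ["customer signup"], downstream=["new_user_metric"]),
    ]
    print(unowned_sources(inventory))
```

Listing the affected downstream assets next to each unowned source makes it easier to decide which owners to chase first.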
2. Approaches to data warehouse governance
Data warehouse governance can start from the following key points: data directory division, model reuse, ETL task optimization, and data quality monitoring.
Data directory division:
Many warehouses were not planned completely and clearly when they were first designed, so the data directory gradually becomes chaotic and finding a model becomes tedious. A good directory design helps clarify the structure of the warehouse and lets us quickly find and locate a model, for example which layer it sits in and which business domain it belongs to. Once this is clearly laid out, data development efficiency improves quickly.
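A directory plan is easier to enforce when the layer and business domain are encoded in table names. The sketch below assumes a hypothetical naming convention, layer_domain_entity, with the commonly used ODS/DWD/DWS/ADS/DIM layer codes; the convention and table names are illustrative, so adapt them to your own standards.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layer codes; many warehouses use a similar ODS/DWD/DWS/ADS split.
LAYERS = {"ods", "dwd", "dws", "ads", "dim"}


def parse_table_name(name: str) -> Tuple[str, str, str]:
    """Split '<layer>_<domain>_<entity>' into parts, rejecting non-conforming names."""
    parts = name.lower().split("_", 2)
    if len(parts) != 3 or parts[0] not in LAYERS or not parts[1] or not parts[2]:
        raise ValueError(f"not <layer>_<domain>_<entity>: {name}")
    return parts[0], parts[1], parts[2]


def build_directory(tables: List[str]) -> Tuple[Dict[str, Dict[str, List[str]]], List[str]]:
    """Group tables as layer -> domain -> tables and collect names that break the convention."""
    directory: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    invalid: List[str] = []
    for table in tables:
        try:
            layer, domain, _ = parse_table_name(table)
        except ValueError:
            invalid.append(table)
            continue
        directory[layer][domain].append(table)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {layer: dict(domains) for layer, domains in directory.items()}, invalid
```

The list of invalid names doubles as a to-do list for renaming or relocating models that do not fit the plan.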
Model reuse:
Offline warehouse teams are generally large; a friend at Kwai (Kuaishou) told me recently that their offline warehouse team has several hundred people. Model reuse must therefore be taken seriously. For example, fields with high reuse can be processed once in the middle layer and exposed through a large wide table for reuse; and for frequently reused functions or logic, we can develop a shared UDF, which improves data processing efficiency.
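The two reuse ideas above can be sketched in plain Python: one shared function holding a piece of frequently reused logic, and a middle-layer wide table that joins order facts with user attributes once. The channel-normalization rule, column names, and sample data are hypothetical. In PySpark, a function like normalize_channel could be registered with spark.udf.register so SQL tasks call the same logic.

```python
from typing import Dict, List, Optional

# Hypothetical mapping maintained in one place instead of in every ETL script.
_CHANNEL_ALIASES = {
    "ios": "app", "android": "app", "app": "app",
    "pc": "web", "h5": "web", "web": "web",
}


def normalize_channel(raw: Optional[str]) -> str:
    """Shared UDF-style function: map raw channel codes to one standard value."""
    if raw is None:
        return "unknown"
    return _CHANNEL_ALIASES.get(raw.strip().lower(), "other")


def build_wide_orders(orders: List[Dict], users: Dict[str, Dict]) -> List[Dict]:
    """Middle-layer wide table: join order facts with user attributes once,
    so downstream reports reuse these columns instead of re-joining."""
    wide = []
    for order in orders:
        user = users.get(order["user_id"], {})
        wide.append({
            **order,
            "channel": normalize_channel(order.get("channel")),
            "user_city": user.get("city"),
            "user_level": user.get("level"),
        })
    return wide
```

Keeping the mapping in a single function means a rule change is made once rather than hunted down across many tasks.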
Task optimization:
Whenever you request resources, leadership will ask about their value and purpose. Besides requesting additional resources, we can also optimize the ones we already have. During warehouse development, engineers' technical skills and business understanding vary widely, so the quality of the ETL tasks they write varies as well. We therefore need to regularly monitor each task's execution time and resource consumption and carry out targeted optimization, for example reducing the volume of input data, or replacing heavy use of DISTINCT with GROUP BY. On the management side, task execution efficiency can also be made an assessment item, with tasks that fail to meet the standard publicly called out.
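The monitoring step can start as a simple pass over task run records that flags tasks exceeding runtime or resource thresholds and ranks them by cost. This is a minimal sketch: the record fields, the core-hours metric, and the threshold values are assumptions, and in practice the records would come from your scheduler's run history.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRun:
    task: str
    duration_min: float     # wall-clock runtime of the run, in minutes
    cpu_core_hours: float   # resources consumed by the run (hypothetical metric)


def flag_inefficient(runs: List[TaskRun], max_minutes: float, max_core_hours: float) -> List[str]:
    """Return tasks exceeding either threshold, most expensive (by core-hours) first."""
    offenders = [
        r for r in runs
        if r.duration_min > max_minutes or r.cpu_core_hours > max_core_hours
    ]
    offenders.sort(key=lambda r: r.cpu_core_hours, reverse=True)
    return [r.task for r in offenders]
```

Sorting by resource cost puts the tasks with the largest potential savings at the top of the optimization list.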
Data quality:
This mainly means monitoring duplicate data, null values, and abnormal data, and rule-based validation must be configured. As I said in my last live stream, a task that runs successfully is not necessarily done; sometimes a "successful" run costs even more than a failed one. For example, in a previous project a task pushed business metrics to the boss by SMS. We had added failure alerts to that task but did not validate its content. Abnormal data originating on the business side produced a wrong final metric, the boss was very angry, and the consequences were serious. So we also need to monitor the metrics of key business data: when an anomaly is detected, stop the downstream tasks promptly and raise an alert. Of course, there is much more to data quality; I covered it in an earlier article, "Talking About Data Quality in ETL".
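The checks described above can be sketched as a quality gate that runs before downstream tasks such as the SMS push: duplicate-key and null checks on the batch, plus a day-over-day change check on a key business metric, raising an exception so the scheduler stops the downstream chain. The rule set, the 50% change threshold, and the function names are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional


class DataQualityError(Exception):
    """Raised to stop downstream tasks when a quality rule fails."""


def check_rows(rows: List[Dict], key: str, required: List[str]) -> List[str]:
    """Rule checks on a batch: duplicate primary keys and nulls in required columns."""
    problems = []
    seen = set()
    for row in rows:
        k = row.get(key)
        if k in seen:
            problems.append(f"duplicate {key}={k}")
        seen.add(k)
        for col in required:
            if row.get(col) is None:
                problems.append(f"null {col} for {key}={k}")
    return problems


def check_metric(today: float, yesterday: float, max_change: float = 0.5) -> Optional[str]:
    """Flag a key metric whose relative day-over-day change exceeds max_change (hypothetical 50%)."""
    if yesterday == 0:
        return None if today == 0 else "metric jumped from 0"
    change = abs(today - yesterday) / abs(yesterday)
    if change > max_change:
        return f"metric changed by {change:.0%} (limit {max_change:.0%})"
    return None


def quality_gate(rows: List[Dict], key: str, required: List[str],
                 today: float, yesterday: float) -> None:
    """Run all rules; raise instead of letting downstream tasks consume bad data."""
    problems = check_rows(rows, key, required)
    metric_issue = check_metric(today, yesterday)
    if metric_issue:
        problems.append(metric_issue)
    if problems:
        # A real pipeline would also send an alert here before failing the task.
        raise DataQualityError("; ".join(problems))
```

Failing loudly in the gate turns a silently wrong metric into a visible task failure, which is what the SMS incident lacked.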
3. Summary
In short, the value of data warehouse governance is hard to quantify, so many data teams are reluctant to do it; but if you skip it, the problems will keep nagging at you.
If you are starting warehouse governance, be prepared for a long-term battle: for example, hold monthly asset management review meetings and regularly optimize inefficient tasks. This requires a governance mechanism, and the best way to make that mechanism stick is to tie it to performance appraisal.