A data warehouse project is never a technical project
2022-07-27 15:06:00 【ThoughtWorks China】
What is a data warehouse?
Let us start with the definition: a data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), time-variant (reflecting historical changes) collection of data used to support management decisions.
“Decision support” here is usually analysis-oriented: the warehouse must be able to process data from business systems in bulk and support multidimensional exploration and analysis, so that it can inform the final business decisions.
This article presents my own modest understanding of data warehouse projects; it does not cover specific technical implementation.
But it has never been a (purely) technical project
A data warehouse project uses many technical components, and I believe many people can reel off their names like items on a menu, which makes it sound like a project that is all about components and technology. In terms of weight, however, I do not think technology is the most important part. What a data warehouse project needs more is a strategy and a coordinated combination of moves: not only technical excellence and business understanding, but also the cooperation of the demand side and the business side on the overall architecture and process.
Building a data warehouse should include these main steps:
- Business requirements interviews and business architecture design;
- Technology selection and technical architecture design;
- Top-level strategic support from the customer and the various business parties, and cooperation from the client side;
- Analysis of specific business requirements and data modeling;
- ETL data import;
- Report development, data services, data marts, and so on.
A data warehouse project does not start ingesting data right away. First, several rounds of business interviews in the early stage are needed to determine the overall business requirements and complete the overall business architecture design. The top-level technology selection and technical architecture design are then determined from the business architecture and the customer's specific technical situation. Only after the business parties, the demand side, and the technical side involved in the data warehouse have aligned on these decisions and secured support from all parties is the project ready to actually start ingesting data, that is, steps 4 to 6 (the last three items in the list above).
Steps 4 to 6 form a continuous, iterative process; the ETL work does not wait until all business analysis is finished. The purpose is to onboard data quickly, produce output quickly, and show results quickly, so that problems can be corrected quickly when they arise. The more important purpose is to earn the customer's trust: if it takes too long, customers are likely to lose confidence because they cannot see any results. The logic is similar to the “vertical slicing” of story cards.
“Where is the customer's yacht?” Or rather: “Where is the customer's gold?”
When talking about data warehouses, the metaphors used most often are “the iceberg below sea level” and “the gold sleeping in the mine”. If we merely move the ore (data) from multiple business systems into the warehouse and tidy it up, we will not find gold. If a lot of manpower and resources are spent and we end up working only as porters, the whole project is a “money-losing” project, because it generates no business value (gold). At that point I cannot help asking myself: “Where is the customer's gold?”
As I understand it, business value is mainly found in these areas:
- Operations support and decision aid: campaigns and business growth all rely on data for decisions. Here, computing core metrics, aligning the business definitions of those metrics, and multidimensional analysis are very important; accurate and timely results help customers make operational decisions.
- Data analysis: for scenarios such as user conversion and user behavior analysis, support for data exploration, interactive analysis, and data visualization is very important.
- Business support: machine learning, risk control, data services, and recommendation systems place higher demands on the data warehouse.
Value is not limited to these areas. I think the main criterion is whether the output of the data warehouse provides value to the business systems, or even plays a supporting role that can be monetized directly. For example, the value of the “gold” from risk control and recommendation systems can be measured along two dimensions: how much potential financial loss was prevented and how much the order conversion rate was increased.
If we only say that in one month we added xxx reports, onboarded xxx business systems, and held xxx business interviews, it sounds very busy, but on reflection it does not produce much “gold”. Some people say that ROI (return on investment = annual profit / total investment) can be used to quantify the “gold content”. In practice, “annual profit” may be feasible to calculate, but “total investment” is often hard to measure: the underlying data, clusters, and operations are often shared with other business data, and the data flows of many processes are very long. This greatly increases the difficulty of measuring “total investment”, so most data warehouses cannot measure their investment accurately. That does not mean ROI can be ignored altogether. I think that even without accurate figures, you can use estimates to judge roughly what the ROI is, rather than charging ahead with your head down.
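As a rough illustration of estimating ROI when costs are shared, here is a minimal Python sketch. It assumes that each shared cost (cluster, operations, and so on) can be apportioned to the data warehouse with a plausible low and high share, which gives a range for “total investment” and therefore a range for ROI. The `SharedCost` class, the `estimate_roi` function, and every figure below are hypothetical and do not come from any real project.

```python
from dataclasses import dataclass


@dataclass
class SharedCost:
    """A cost shared with other workloads (hypothetical structure)."""
    name: str
    annual_cost: float  # total annual cost of the shared resource
    share_low: float    # lowest plausible fraction attributable to the warehouse
    share_high: float   # highest plausible fraction attributable to the warehouse


def estimate_roi(annual_profit, direct_investment, shared_costs):
    """Return (pessimistic, optimistic) ROI, where ROI = annual profit / total investment."""
    low_total = direct_investment + sum(c.annual_cost * c.share_low for c in shared_costs)
    high_total = direct_investment + sum(c.annual_cost * c.share_high for c in shared_costs)
    if low_total <= 0:
        raise ValueError("total investment must be positive")
    # The largest plausible investment gives the pessimistic ROI, and vice versa.
    return annual_profit / high_total, annual_profit / low_total


# Made-up figures, for illustration only.
costs = [
    SharedCost("cluster", 1_000_000, 0.2, 0.4),
    SharedCost("operations", 500_000, 0.1, 0.3),
]
roi_low, roi_high = estimate_roi(
    annual_profit=1_200_000,
    direct_investment=300_000,
    shared_costs=costs,
)
print(f"Estimated ROI between {roi_low:.2f} and {roi_high:.2f}")
```

A range rather than a single number keeps the uncertainty in the shared costs visible: if even the pessimistic end looks acceptable, the rough estimate is already enough to decide on.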
At the same time, the early steps are also very important. Without early data movement, modeling, and similar steps, the “gold” becomes water without a source; early data acquisition, data cleaning, and data modeling determine whether we can find high-quality gold. This is more like the “barrel theory” (a barrel holds only as much as its shortest plank allows): from business analysis, data modeling, data loading, data cleaning, and data transformation to metric calculation, report development, data analysis, and machine learning, a weak point in any step lowers the “gold content” of the output of the other steps. This is especially true of the early steps: if they are not done well, everything that follows is almost “garbage in, garbage out”.
Written by Zhang Zhihao, Thoughtworks
Original article: What is a data warehouse - Thoughtworks Insight