Will your company choose to build a data middle platform?
2022-06-10 21:15:00 【Young promising 2025】
Recently I have been writing articles about the data middle platform (数据中台, "Zhongtai"), and I find that everyone is very interested. Today we will address your questions from the perspective of how such a platform is built.
Horizontal planning is the initial stage of data middle platform planning: it is necessary to connect all business departments of the enterprise and break down data silos. In practice, this is the stage of building the data warehouse.
Building the data middle platform involves several projects: big data platform construction, data warehouse construction, model algorithms, data governance, and data services. It cannot be done overnight. We need to sort out the business scenarios, pick one scenario and see what services it needs, build up the platform's service capability for it, and then iterate scenario by scenario, tackling them one at a time.

1. Master plan

Data integration
First, we need to confirm what data the platform will ingest, and whether each source is ingested in real time or extracted offline. For offline extraction, decide whether it is full or incremental, and how often it runs: daily or hourly.
For real-time ingestion, Kafka can be used to write data to the HDFS cluster in real time.

For offline data, Sqoop can be used to extract relational databases into HDFS.
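The difference between full and incremental extraction can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for the source relational database; the `orders` table, its columns, and the watermark on `updated_at` are assumptions made for illustration, not part of any particular Sqoop or Kafka setup.

```python
import sqlite3

def extract_full(conn, table):
    """Full extraction: read every row of the source table."""
    query = f"SELECT id, amount, updated_at FROM {table} ORDER BY id"
    return conn.execute(query).fetchall()

def extract_incremental(conn, table, watermark):
    """Incremental extraction: read only rows changed after the last watermark."""
    query = (
        f"SELECT id, amount, updated_at FROM {table} "
        "WHERE updated_at > ? ORDER BY updated_at, id"
    )
    rows = conn.execute(query, (watermark,)).fetchall()
    # The newest timestamp seen becomes the watermark for the next run.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

def demo():
    """Build a hypothetical source table and run both extraction modes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2022-06-09 08:00:00"),
            (2, 25.5, "2022-06-09 23:30:00"),
            (3, 7.0, "2022-06-10 09:15:00"),
        ],
    )
    full = extract_full(conn, "orders")
    delta, watermark = extract_incremental(conn, "orders", "2022-06-09 23:59:59")
    conn.close()
    return full, delta, watermark
```

A daily job would store the returned watermark and pass it to the next run, so each run reads only the rows changed since the previous one.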

Model building
Model construction is an important part of the data middle platform; it is fair to say that the platform succeeds or fails on the quality of its models. Models fall into two groups: the analysis models of the data warehouse and general-purpose algorithm models.
Analysis model
Once data has been ingested into the data warehouse, we need to process it: following the business domains we have planned, we summarize and aggregate each business's data to form our data models.
This involves building the data warehouse, which is outlined briefly here.

A simple data layering works as follows. Raw data lands in ODS. After cleaning, it becomes the detailed data of the data warehouse, DWS (many teams name this detail layer DWD), and the dimension data, DIM. Each business's detailed data is joined with the business domain and the dimension data to form our data model, DW. Different DW models are then aggregated into the business metrics of the APP layer.

When building the data warehouse, we declare the business grain; the grain states precisely what each row means in business terms. We also determine the dimensions, such as the user dimension or the product dimension, and finally our master data, which is the foundation of the model data.
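The layering and grain described above can be sketched in a few lines of plain Python. The order records, the user dimension, and the "total amount per city" metric are all hypothetical; a real warehouse would express these steps as SQL jobs over tables.

```python
from collections import defaultdict

# ODS: raw records as ingested, including one dirty row (missing amount).
ODS_ORDERS = [
    {"order_id": 1, "user_id": "u1", "amount": "10.0"},
    {"order_id": 2, "user_id": "u2", "amount": "25.5"},
    {"order_id": 3, "user_id": "u1", "amount": None},
    {"order_id": 4, "user_id": "u1", "amount": "4.5"},
]

# DIM: the user dimension.
DIM_USER = {"u1": {"city": "Beijing"}, "u2": {"city": "Shanghai"}}

def clean(ods_rows):
    """ODS -> detail layer. Declared grain: one row per valid order."""
    return [
        {"order_id": r["order_id"], "user_id": r["user_id"], "amount": float(r["amount"])}
        for r in ods_rows
        if r["amount"] is not None
    ]

def build_dw(detail_rows, dim_user):
    """Detail + DIM -> DW: attach dimension attributes to each order."""
    return [{**r, "city": dim_user[r["user_id"]]["city"]} for r in detail_rows]

def build_app(dw_rows):
    """DW -> APP: aggregate into a business metric (total amount per city)."""
    totals = defaultdict(float)
    for r in dw_rows:
        totals[r["city"]] += r["amount"]
    return dict(totals)

def run_pipeline():
    return build_app(build_dw(clean(ODS_ORDERS), DIM_USER))
```

Calling `run_pipeline()` returns `{"Beijing": 14.5, "Shanghai": 25.5}`; the dirty order is dropped at the cleaning step and never reaches the metric.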
Algorithm model
In the course of business development we accumulate some general-purpose algorithms. These can be packaged general algorithms such as random forest or regression, or our own business algorithms, such as a user product recommendation algorithm. By consolidating these algorithms we form our algorithm models, which every business can call directly.
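One way to make such algorithms "directly callable by various businesses" is a shared registry. The sketch below is only an illustration: the registry, the names `linear_regression` and `top_products`, and the toy implementations are assumptions, standing in for whatever packaged models a team actually maintains.

```python
# Shared registry: algorithm name -> callable.
ALGORITHMS = {}

def register(name):
    """Decorator that publishes a function under a stable name."""
    def decorator(func):
        ALGORITHMS[name] = func
        return func
    return decorator

@register("linear_regression")
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

@register("top_products")
def top_products(purchases, k):
    """Toy business algorithm: the k most frequently purchased products."""
    counts = {}
    for product in purchases:
        counts[product] = counts.get(product, 0) + 1
    # Sort by descending count, then by name for a stable result.
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]

def call(name, *args):
    """Entry point used by business teams: look up an algorithm and run it."""
    return ALGORITHMS[name](*args)
```

A business team then only needs the algorithm's name, for example `call("top_products", ["a", "b", "a"], 1)`, and does not depend on where the implementation lives.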

ETL platform
When developing data models we need a unified platform that, like an assembly line, processes data step by step into data models. This involves data extraction, data aggregation, job scheduling, and so on.
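The assembly-line idea comes down to running jobs in dependency order. Here is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the job names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
JOB_DEPENDENCIES = {
    "ods_orders": set(),
    "dim_user": set(),
    "dws_orders": {"ods_orders"},
    "dw_user_orders": {"dws_orders", "dim_user"},
    "app_city_sales": {"dw_user_orders"},
}

def schedule(dependencies):
    """Return one valid execution order in which every job follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())

def run_all(dependencies, runners):
    """Run each job's callable in dependency order; return the order used."""
    executed = []
    for job in schedule(dependencies):
        runners[job]()
        executed.append(job)
    return executed
```

`TopologicalSorter` raises `graphlib.CycleError` if the jobs depend on each other in a loop, which is a useful check before a schedule goes live.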

Unlike business development, data development seldom comes with detailed written requirements and documentation; usually there is only a brief conversation with business staff. Over time, though, you will find that tasks you have already completed keep changing. To avoid this, we can draw up a requirements template based on our actual business. It covers the data source fields, the metric definition (data caliber), the task scheduling cycle, and the field mapping.
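Such a template can also be kept as a small structured record so that incomplete requirements are caught early. The sketch below is one possible shape; the class name, field names, and validation rules are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataRequirement:
    """One data development request, following the template items above."""
    name: str
    source_fields: List[str]       # data source fields
    metric_definition: str         # data caliber: how the metric is computed
    schedule: str                  # task scheduling cycle: "daily" or "hourly"
    field_mapping: Dict[str, str] = field(default_factory=dict)  # source -> target

    def problems(self):
        """Return a list of human-readable issues; empty means the request is complete."""
        issues = []
        if not self.metric_definition.strip():
            issues.append("metric definition is empty")
        if self.schedule not in ("daily", "hourly"):
            issues.append(f"unknown schedule: {self.schedule}")
        for source in self.field_mapping:
            if source not in self.source_fields:
                issues.append(f"mapped field not in source: {source}")
        return issues
```

A request with a non-empty `problems()` list goes back to the business side before any development starts, which keeps later changes to a minimum.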
Data assets
In plain terms, the models we develop in the data warehouse are the data assets, and data assets need standardized control and governance.
The most basic work of asset management is managing metadata. Metadata includes the metric definitions (caliber) of the data, the definitions of the data models, the lineage between models, and so on; see the earlier metadata article, 《Data Warehouse Metadata》, for details. Managing metadata and data models in a unified, orderly way forms the enterprise's data assets.
Data asset governance is not something applied after the fact. While building the models, we need to establish our own set of data warehouse development standards to manage them.
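Lineage metadata is easiest to picture as a graph from each model to the models it is built from. The following sketch, with hypothetical model names, shows the two questions lineage usually answers: where a model's data comes from, and what is affected when a model changes.

```python
from collections import deque

# Lineage metadata: model -> the models it is built from (its direct upstream).
LINEAGE = {
    "app_city_sales": ["dw_user_orders"],
    "dw_user_orders": ["dws_orders", "dim_user"],
    "dws_orders": ["ods_orders"],
    "dim_user": ["ods_users"],
}

def upstream(model, lineage):
    """All direct and indirect sources of a model, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(model, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen

def downstream(model, lineage):
    """All models that would be affected if the given model changed."""
    affected = set()
    frontier = [model]
    while frontier:
        current = frontier.pop()
        for child, parents in lineage.items():
            if current in parents and child not in affected:
                affected.add(child)
                frontier.append(child)
    return affected
```

For example, `downstream("ods_orders", LINEAGE)` returns the three models that must be rechecked when the raw orders table changes.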
Data services
As the saying goes, even good wine fears a deep alley. Once the data assets are built, we need to market them so that more departments can use them; this is also the original purpose of building the data middle platform. Providing a set of data service capabilities that presents a unified interface to the outside is therefore a very important task.

Data service standards: standardized data structures, real-time online queries, and visualized data development.
Data structure standardization
For data exchange, we need to provide a unified interface view that supports data queries and access control.
Real-time online queries
For calls from each business, we need to return real-time results whose metrics follow a unified definition.
Visualized data development
Provide a unified visual management page for data interfaces. Developers manage APIs visually, which makes the interfaces easier to understand and to maintain.
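A unified interface with access control can be sketched as a single query function that every caller goes through and that always returns the same response shape. The metric store, the permission table, the caller names, and the response fields below are all assumptions for illustration.

```python
# Hypothetical metric store (the APP layer) and per-department permissions.
METRICS = {"city_sales": {"Beijing": 14.5, "Shanghai": 25.5}}
PERMISSIONS = {"marketing": {"city_sales"}, "hr": set()}

def query_metric(caller, metric, key):
    """Single entry point: check permission, then return a standard response."""
    if metric not in PERMISSIONS.get(caller, set()):
        return {"code": 403, "message": "permission denied", "data": None}
    table = METRICS.get(metric)
    if table is None or key not in table:
        return {"code": 404, "message": "not found", "data": None}
    return {
        "code": 200,
        "message": "ok",
        "data": {"metric": metric, "key": key, "value": table[key]},
    }
```

Because every business reads the metric through the same function, they all see the same definition of the number, and permissions are enforced in one place.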

2. Data lake engine
Any discussion of the data middle platform inevitably touches the data architecture of the whole enterprise. There is too much to cover, so I can only pick and choose. Today, let's talk about a relatively new and important concept: the data lake engine.
The data lake engine sits between data management systems on one side and analysis, visualization, and data processing tools on the other. It does not move data from the data sources into a single repository; instead, it is deployed on top of the existing data sources and works with the tools that data consumers already use, such as BI tools and data science platforms.

The tools used by millions of data consumers, such as BI tools, data science platforms, and dashboard tools, assume that all data lives in a single high-performance relational database. When data is spread across multiple systems, or sits in non-relational storage (such as ADLS, Amazon S3, Hadoop, and NoSQL databases), these tools become less capable.
BI and analysis tools such as FineBI and Tableau, as well as Python and machine learning models, are designed for data that lives in a single high-performance relational database environment.
However, most organizations manage their data across a variety of solutions, using different data formats and different technologies. Most organizations now use one or more non-relational data stores, such as cloud storage (S3, ADLS), Hadoop, and NoSQL databases (Elasticsearch, Cassandra).
When data is stored in a single high-performance relational database, BI tools, data science systems, and machine learning models can make good use of it. As noted above, however, the data does not live in one place.
As a result, the task that typically falls to IT is to move this data into a relational environment, create cubes, and generate dedicated views for different analysis tools. The data lake engine simplifies these challenges by allowing companies to keep data wherever it is stored.
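The core idea, querying data where it lives instead of first copying it into one relational store, can be shown with a toy example. This is only an illustration of the concept, not how any real data lake engine is implemented; the two in-memory "sources" (a CSV text and a JSON text) and all function names are assumptions.

```python
import csv
import io
import json

# Two hypothetical sources in different formats, left where they are.
CSV_ORDERS = "order_id,user_id,amount\n1,u1,10.0\n2,u2,25.5\n3,u1,4.5\n"
JSON_USERS = '[{"user_id": "u1", "city": "Beijing"}, {"user_id": "u2", "city": "Shanghai"}]'

def scan_csv(text):
    """Read rows from a CSV source as dicts, without loading them elsewhere."""
    yield from csv.DictReader(io.StringIO(text))

def scan_json(text):
    """Read rows from a JSON document source as dicts."""
    yield from json.loads(text)

def join_and_sum(orders, users, key, group_by, value):
    """Join two row streams on `key` and sum `value` per `group_by` attribute."""
    lookup = {user[key]: user for user in users}
    totals = {}
    for order in orders:
        group = lookup[order[key]][group_by]
        totals[group] = totals.get(group, 0.0) + float(order[value])
    return totals

def city_sales():
    """One query over both sources, presented to the consumer as a single result."""
    return join_and_sum(
        scan_csv(CSV_ORDERS), scan_json(JSON_USERS), "user_id", "city", "amount"
    )
```

The consumer calls `city_sales()` and gets `{"Beijing": 14.5, "Shanghai": 25.5}` without knowing that one input was CSV and the other JSON.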
3. Summary
For large corporate groups, the middle platform ("Zhongtai") methodology is very practical: it breaks down the data silos between the group's business units and forms a unified data service capability.
Gradually, though, many people have asked whether the middle platform methodology is too cumbersome for small and medium-sized enterprises and a burden on them; what SMEs need may be data services that iterate faster.
So what do you think about building a middle platform? Will your company choose one? Finally, I'd like to recommend an open-source project for a commercial-grade SaaS system; interested readers can take a look!