Will your company choose to build a data middle platform?
2022-06-11 02:42:00 【Young Youwei 2022】
Recently I have been writing articles about the data middle platform, and I find that everyone is very interested. Today, let's address your questions from the perspective of how a data middle platform is built.
Horizontal planning refers to the initial stage of data middle platform planning. It is necessary to connect all business departments of the enterprise and break down data silos. In practice, this is the stage of building the data warehouse.
Building a data middle platform involves several projects: big data platform construction, data warehouse construction, model algorithms, data governance, data services, and more. It cannot be done overnight. We need to sort out the business scenarios, pick one scenario and see what services it needs, build up the service capability of the data middle platform for it, and then iterate, tackling the scenarios one by one.

1. Master plan

Data integration
First, we need to confirm what data the platform will ingest, and whether each source is accessed in real time or extracted offline. For offline extraction, decide whether it is a full extraction or an incremental extraction, and how often it runs: daily or hourly.
For real-time access, Kafka can be used to write data to the HDFS cluster in real time.

For offline data, Sqoop can be used to extract relational database tables to HDFS.
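The difference between full and incremental extraction can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the source relational database, and the `orders` table, its `updated_at` column, and the watermark approach are hypothetical choices, not something prescribed by Sqoop or by this article.

```python
import sqlite3


def extract_full(conn):
    """Full extraction: read every row of the source table on each run."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def extract_incremental(conn, last_watermark):
    """Incremental extraction: read only rows changed after the last watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # The newest timestamp seen becomes the watermark for the next run.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


def demo():
    """Build a small hypothetical source table and run both extraction modes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2022-06-09 08:00:00"),
            (2, 25.5, "2022-06-10 09:30:00"),
            (3, 7.2, "2022-06-11 01:15:00"),
        ],
    )
    full = extract_full(conn)
    delta, watermark = extract_incremental(conn, "2022-06-10 00:00:00")
    conn.close()
    return full, delta, watermark
```

A daily job would store the returned watermark and pass it to the next run, so each run only moves the rows that changed since the previous one.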

Model building
Model construction is an important part of the data middle platform. It is fair to say that the success or failure of the platform depends on the quality of its models. Models fall into two groups: the analysis models of the data warehouse and general algorithm models.
Analysis model
Once data has landed in the data warehouse, we need to process it. Following the business domains we have planned, we summarize and aggregate the data of each business to form our data models.
This involves data warehouse construction, which is only outlined briefly here.

This is a simple data hierarchy. Raw data lands in ODS. After cleaning, it becomes the detailed data of the warehouse (DWS) and the dimension data (DIM). The detailed data of each business is joined with the business domain and the dimension data to form our data models (DW). Aggregating the different DW models produces the business indicator data of the APP layer.
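The flow through these layers can be sketched in a few lines. The sample records, field names, and cleaning rules below are hypothetical and chosen only to make the layering visible; a real warehouse would do this in SQL on the big data platform.

```python
from collections import defaultdict

# ODS: raw records exactly as they arrive from the source system (hypothetical sample).
ods_orders = [
    {"order_id": 1, "user_id": "u1", "amount": "30.0"},
    {"order_id": 2, "user_id": "u2", "amount": "12.5"},
    {"order_id": 2, "user_id": "u2", "amount": "12.5"},  # duplicate record
    {"order_id": 3, "user_id": "u1", "amount": None},    # dirty record
    {"order_id": 4, "user_id": "u1", "amount": "7.5"},
]

# DIM: dimension data, here a user dimension keyed by user_id.
dim_user = {"u1": {"city": "Beijing"}, "u2": {"city": "Shanghai"}}


def clean(ods_rows):
    """ODS -> detail layer: drop dirty and duplicate rows, fix types."""
    seen, detail = set(), []
    for row in ods_rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        detail.append({**row, "amount": float(row["amount"])})
    return detail


def build_dw(detail_rows, dim):
    """Detail + DIM -> DW: attach dimension attributes to each detail row."""
    return [{**row, "city": dim[row["user_id"]]["city"]} for row in detail_rows]


def build_app(dw_rows):
    """DW -> APP: aggregate into a business indicator (sales amount per city)."""
    totals = defaultdict(float)
    for row in dw_rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


detail = clean(ods_orders)
dw = build_dw(detail, dim_user)
app = build_app(dw)
```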

When building the data warehouse, we must declare the business granularity, because the granularity states precisely what each record means in business terms. We must also determine the dimensions, such as the user dimension or the commodity dimension. Finally, we define our master data, which is the foundation of the model data.
Algorithm model
In the course of business development we accumulate some general algorithms. These can be packaged general-purpose algorithms such as random forests and regression, or business-specific algorithms such as user product recommendation. By collecting these algorithms we form our algorithm models, which every business can call directly.
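One way to make algorithms "directly callable by various businesses" is a small registry that maps a name to an implementation. The sketch below is an assumption about how this could look, not the author's implementation; the registry, the `call` helper, and the two sample algorithms (a least-squares line fit and a naive "most purchased" recommender) are all hypothetical.

```python
from collections import Counter

# Hypothetical registry: algorithm name -> callable.
ALGORITHMS = {}


def register(name):
    """Decorator that publishes a function in the shared registry."""
    def decorator(func):
        ALGORITHMS[name] = func
        return func
    return decorator


@register("linear_regression")
def linear_regression(xs, ys):
    """General algorithm: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


@register("top_n_products")
def top_n_products(purchases, n=2):
    """Business algorithm: recommend the n most frequently purchased products."""
    return [product for product, _ in Counter(purchases).most_common(n)]


def call(name, *args, **kwargs):
    """Single entry point that any business line can use."""
    return ALGORITHMS[name](*args, **kwargs)
```

A business team would then write `call("top_n_products", purchases, n=3)` without needing to know where the algorithm lives.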

ETL platform
When developing data models, we need a unified platform that works like an assembly line, processing the data step by step into a data model. This involves data extraction, data aggregation, job scheduling, and so on.
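The assembly-line idea, with job scheduling driven by dependencies, can be sketched with the standard library's `graphlib` (Python 3.9+). The job names and the `run_pipeline` helper are hypothetical; a real ETL platform would add retries, time-based triggers, and monitoring.

```python
from graphlib import TopologicalSorter


def run_pipeline(jobs, dependencies):
    """Run each job after all of its upstream jobs have finished.

    jobs: job name -> function taking the shared context dict.
    dependencies: job name -> set of job names it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    context = {}
    for name in order:
        context[name] = jobs[name](context)
    return order, context


# Hypothetical three-step line: extraction -> cleaning -> aggregation.
jobs = {
    "extract": lambda ctx: [3, None, 5, 8],
    "clean": lambda ctx: [v for v in ctx["extract"] if v is not None],
    "aggregate": lambda ctx: sum(ctx["clean"]),
}
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
}

order, context = run_pipeline(jobs, dependencies)
```

Declaring dependencies instead of a fixed order means a new job can be added by stating only what it needs upstream.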

Unlike business development, data development rarely comes with detailed requirements and documents. Usually there is only a brief conversation with business staff, and over time you find that tasks you have already completed keep changing. To avoid this, we can prepare a requirement template based on our actual business. It should include the data source fields, the data definition (how each metric is calculated), the task scheduling cycle, and the field mapping.
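Such a template can also be kept in a machine-checkable form. The following is a minimal sketch, assuming the four items listed above; the class name `DataRequirement`, its field names, and the allowed schedule values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataRequirement:
    """Hypothetical requirement template for one data development task."""
    name: str
    source_fields: list          # fields available in the data source
    metric_definition: str       # how the metric is calculated (the "caliber")
    schedule: str                # task scheduling cycle
    field_mapping: dict = field(default_factory=dict)  # source field -> target field

    def problems(self):
        """Return a list of issues to settle with the business side before coding."""
        issues = []
        if not self.metric_definition.strip():
            issues.append("missing metric definition")
        if self.schedule not in {"hourly", "daily"}:
            issues.append(f"unsupported schedule: {self.schedule}")
        for source in self.field_mapping:
            if source not in self.source_fields:
                issues.append(f"unknown source field: {source}")
        return issues


requirement = DataRequirement(
    name="daily_sales",
    source_fields=["order_id", "amount", "created_at"],
    metric_definition="sum of amount for paid orders, per calendar day",
    schedule="daily",
    field_mapping={"amount": "sales_amount", "created_at": "order_date"},
)
```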
Data assets
Put simply, the models we develop in the data warehouse are the data assets, and data assets need standardized control and governance.
The most basic work of asset management is managing metadata. Metadata includes the data definitions, the definitions of the data models, the lineage between models, and so on. See the earlier metadata article 《Data Warehouse Metadata》 for details. Managing metadata and data models in a unified, orderly way is what forms the data assets of the enterprise.
Data asset governance is not something applied after the fact. We need to establish our own data warehouse development specifications and enforce them while the models are being built.
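Lineage between models, mentioned above as part of metadata, is essentially a directed graph. A minimal sketch of how it can be stored and queried follows; the model names and the `upstream` and `downstream` helpers are hypothetical.

```python
# Hypothetical lineage metadata: model name -> models it is built from.
lineage = {
    "app_sales_by_city": ["dw_orders"],
    "dw_orders": ["dws_order_detail", "dim_user"],
    "dws_order_detail": ["ods_orders"],
    "dim_user": ["ods_users"],
    "ods_orders": [],
    "ods_users": [],
}


def upstream(model, graph):
    """All models that the given model depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(model, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def downstream(model, graph):
    """All models affected if the given model changes (impact analysis)."""
    return {name for name in graph if model in upstream(name, graph)}
```

With this in place, `downstream("ods_users", lineage)` answers the governance question "which models must be rechecked if the user source table changes?".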
Data services
As the saying goes, even good wine fears a deep alley. Once the data assets are in place, we need to market them so that more departments can use them, which is also the original purpose of building a data middle platform. Providing a set of data service capabilities that presents one unified interface to the outside is therefore a very important task.

Data service standards: standardized data structures, real-time online queries, and visual data development.
Standardized data structures
For data interaction, we need to provide a unified interface view through which data can be queried and permissions controlled.
Real-time online queries
For calls from each business, we need to return real-time results at the indicator level, computed with a unified data definition.
Visual data development
Provide a unified visual management page for data interfaces. Developers manage APIs visually, which makes the interfaces easier to understand and maintain.
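The first standard, a unified interface with query and permission control, can be sketched as a single service class. This is a toy illustration; the `DataService` class, its method names, and the response shape are assumptions rather than a description of any particular product.

```python
class DataService:
    """Toy unified query interface with per-dataset permission control."""

    def __init__(self):
        self._datasets = {}  # dataset name -> list of row dicts
        self._grants = {}    # dataset name -> set of callers allowed to read it

    def publish(self, name, rows, allowed_callers):
        """Expose a data asset and record who may read it."""
        self._datasets[name] = list(rows)
        self._grants[name] = set(allowed_callers)

    def query(self, caller, name, **filters):
        """Return rows matching the equality filters, in one standard shape."""
        if name not in self._datasets:
            raise KeyError(f"unknown dataset: {name}")
        if caller not in self._grants[name]:
            raise PermissionError(f"{caller} may not read {name}")
        rows = [
            row for row in self._datasets[name]
            if all(row.get(key) == value for key, value in filters.items())
        ]
        return {"dataset": name, "count": len(rows), "rows": rows}


service = DataService()
service.publish(
    "sales_by_city",
    [{"city": "Beijing", "total": 37.5}, {"city": "Shanghai", "total": 12.5}],
    allowed_callers={"marketing"},
)
result = service.query("marketing", "sales_by_city", city="Beijing")
```

Because every dataset is returned in the same shape, each business integrates once and can then read any asset it has been granted.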

2. Data lake engine
Any discussion of the data middle platform inevitably touches the data architecture of the whole enterprise. Because there is too much to cover, I can only pick a few topics. Today, let's talk about a relatively new and important concept: the data lake engine.
The data lake engine sits between the data management systems on one side and the analysis, visualization, and data processing tools on the other. A data lake engine does not move data from the data sources into a single repository. Instead, it is deployed between the existing data sources and the data consumers (such as BI tools and data science platforms).

The tools used by millions of data consumers, such as BI tools, data science platforms, and dashboard tools, assume that all data lives in one high-performance relational database. When the data is spread across multiple systems, or sits in non-relational storage (such as ADLS, Amazon S3, Hadoop, and NoSQL databases), the capabilities of these tools are limited.
BI and analysis tools such as FineBI, Tableau, and Python, as well as machine learning models, are designed for data that lives in a single high-performance relational database environment.
However, most organizations manage their data in a variety of solutions, with different data formats and different technologies. Most organizations now use one or more non-relational data stores, such as cloud storage (S3, ADLS), Hadoop, and NoSQL databases (Elasticsearch, Cassandra).
When data is stored in one standalone high-performance relational database, BI tools, data science systems, and machine learning models can make good use of it. As noted above, however, the data does not live in one place.
As a result, the usual task is to move this data into a relational environment, build cubes, and generate dedicated views for the different analysis tools. The data lake engine simplifies these challenges by allowing companies to keep data wherever it is stored.
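The idea of querying data where it already lives, rather than copying it into one repository first, can be illustrated with a toy example. This is not how any real data lake engine is implemented. The `ToyEngine`, `SqliteSource`, and `CsvSource` classes are hypothetical, an in-memory SQLite table stands in for a relational source, and a CSV string stands in for files in object storage.

```python
import csv
import io
import sqlite3


class SqliteSource:
    """Relational source: rows are read from a table at query time."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def scan(self):
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]


class CsvSource:
    """Non-relational source: a CSV document, as it might sit in object storage."""

    def __init__(self, text):
        self.text = text

    def scan(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class ToyEngine:
    """One query interface over several sources; nothing is copied in advance."""

    def __init__(self):
        self.sources = {}

    def register(self, name, source):
        self.sources[name] = source

    def join(self, left, right, key):
        """Hash join of two registered sources on a shared key column."""
        index = {}
        for row in self.sources[right].scan():
            index.setdefault(str(row[key]), []).append(row)
        joined = []
        for left_row in self.sources[left].scan():
            for right_row in index.get(str(left_row[key]), []):
                joined.append({**right_row, **left_row})
        return joined


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("u1", "Beijing"), ("u2", "Shanghai")]
)

engine = ToyEngine()
engine.register("users", SqliteSource(conn, "users"))
engine.register("clicks", CsvSource("user_id,page\nu1,home\nu2,cart\nu1,pay\n"))
clicks_with_city = engine.join("clicks", "users", "user_id")
```

A consumer such as a BI tool would talk only to the engine, while the user table stays in the database and the click log stays in its file.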
3. Summary
For large group enterprises, the middle platform (Zhongtai) methodology is very practical: it breaks down the data silos between the group's business segments and forms a unified data service capability.
Over time, however, many people have asked whether the middle platform methodology is too cumbersome for small and medium-sized enterprises. For them it can be a burden, and what they need may instead be data services that iterate faster.
So what do you think about building a middle platform? Will your company choose one? Finally, I would like to recommend an open-source SaaS project; interested readers are welcome to study it.