Introduction to data platform
2022-06-24 08:40:00 【An unreliable programmer】
Goals
- Provide stable and reliable data for the various business platforms
- Provide a general-purpose data processing workflow
- Produce data sets that are subject-oriented, integrated, and time-variant, yet relatively stable in content
- Integrate historical data from multiple data sources for fine-grained processing and multidimensional analysis
- Put simply, it is a process of reading data -> producing data -> delivering data
Key concepts
ETL
ETL stands for Extraction-Transformation-Loading, that is, data extraction, transformation, and loading. ETL extracts data from distributed, heterogeneous sources such as relational databases and flat files into a temporary staging layer, where it is cleaned, transformed, and integrated. The result is then loaded into a data warehouse or data mart, where it becomes the foundation for online analytical processing (OLAP) and data mining. ETL is the most important part of a BI project: it usually takes up 1/3 of the whole project's time, and the quality of the ETL design directly determines whether the BI project succeeds or fails. ETL is also a long-term process; only by continually finding and fixing problems can it run more efficiently and supply accurate data for the later stages of the project.
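The three stages described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the platform's actual scripts: the CSV content, the `fact_orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (a flat file in ETL terms).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from the flat file into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with a non-numeric amount."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleaning step: skip malformed records
        cleaned.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run extract -> transform -> load and return (row count, total amount)."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid rows and skips the malformed one. In a real deployment each stage would typically be a separate scheduled script rather than one function call.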
Data warehouse
A data warehouse (Data Warehouse, abbreviated DW or DWH) is a strategic collection of data that supports decision-making at every level of the enterprise with all types of data. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for improving business processes and for monitoring and controlling time, cost, and quality.
Problems to be solved at present
- A task scheduling and monitoring platform is needed to manage, schedule, and monitor the series of scripts that read, produce, and deliver data.
- An API platform is needed to serve ad hoc queries on some of the data.
- A data synchronization platform is needed to synchronize the produced data to each business system.
- A data inspection platform is needed to control the quality of the delivered data.
- A BI data display platform is needed to clearly present the data dimensions that different roles care about.
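To make the data inspection requirement above more concrete, here is a rough sketch of the kind of check such a platform might run before delivery. The function name, the rule set (required fields and a unique key), and the row format are all assumptions for illustration; the source does not specify which checks are needed.

```python
def check_quality(rows, required, key):
    """Return a list of human-readable problems found in a batch of rows.

    rows     -- list of dicts, one per record (hypothetical format)
    required -- field names that must be present and non-empty
    key      -- field name whose values must be unique within the batch
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Completeness check: every required field must have a value.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        # Uniqueness check: the key must not repeat within the batch.
        k = row.get(key)
        if k is not None:
            if k in seen:
                problems.append(f"row {i}: duplicate {key}={k}")
            seen.add(k)
    return problems
```

An empty result means the batch passed. A delivery script could refuse to synchronize a batch whose problem list is non-empty.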
Solution
- Use Airflow to build the ETL system, that is, to orchestrate and schedule the series of scripts for data collection, cleaning, summarization, aggregation, and pre-computation of multi-dimensional metrics. Airflow also provides task monitoring and a web UI that visualizes task dependencies.
- Use DataX for data synchronization.
- Use Lumen to build the API platform.
- The data inspection platform and the BI display are out of scope for the first phase.
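The core idea behind scheduling the scripts listed above is that each task runs only after the tasks it depends on. The sketch below illustrates that ordering with the standard library's `graphlib` (Python 3.9+). It is not Airflow's API, and the task names and dependency graph are hypothetical; in Airflow the same structure would be declared as tasks in a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "collect": set(),
    "clean": {"collect"},
    "summarize": {"clean"},
    "aggregate": {"clean"},
    "precompute_metrics": {"summarize", "aggregate"},
}


def execution_order(graph):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def run(graph, actions):
    """Run each task's callable in dependency order; return the names executed."""
    done = []
    for name in execution_order(graph):
        actions[name]()  # a real scheduler would also retry, log, and alert here
        done.append(name)
    return done
```

`summarize` and `aggregate` do not depend on each other, so either may come first; a scheduler is free to run such tasks in parallel.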
Technology stack
Airflow (Python), Lumen, PostgreSQL, DataX, Elasticsearch
Later, depending on data volume, we will add offline computation on a distributed Spark cluster, HDFS storage, stream processing, Hive, and so on.
Ideal state
Later, log analysis can be plugged into the ETL system to analyze user behavior, build user profiles, and improve system security.
Daily, weekly, and annual performance reports and similar data displays and summaries are served with lower latency, which reduces the load on the business systems.
ERP data is collected and analyzed to provide a reference for leadership decisions.
App logs are summarized and analyzed to give product design and operations some facts to work from.
At the same time, the platform can comfortably handle big data analysis as data volume grows rapidly.
“Rome was not built in a day.”