Introduction to the Data Platform
2022-06-24 08:40:00 【An unreliable programmer】
Goals
- Provide stable, reliable data to the various business platforms.
- Provide a general-purpose data processing workflow.
- Produce data sets that are subject-oriented, integrated, time-variant, and relatively stable.
- Integrate historical data from multiple data sources for fine-grained processing and multidimensional analysis.
- Put simply, the process is: read data -> produce data -> deliver data.
Some concepts
ETL
ETL stands for Extraction-Transformation-Loading, that is, data extraction, transformation, and loading. ETL extracts data from distributed, heterogeneous sources, such as relational databases and flat files, into a temporary staging layer, where it is cleaned, transformed, and integrated. The result is then loaded into a data warehouse or data mart, where it becomes the foundation for online analytical processing and data mining. ETL is the most important part of a BI project and usually takes about one third of the whole project's time; the quality of the ETL design directly determines whether the BI project succeeds or fails. ETL is also a long-term process: only by continually finding and fixing problems can it run more efficiently and supply accurate data for the project's later development.
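The three stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the CSV content, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the columns are assumptions for illustration.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from a flat file (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, normalize names and types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning step: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run extract -> transform -> load and return what landed in the table."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads only the two complete rows; the record with the empty amount is filtered out during the transform step, which is where cleaning rules would normally live.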
Data warehouse
A data warehouse (Data Warehouse, abbreviated DW or DWH) is a strategic collection of data that supports decision-making at every level of the enterprise, covering all types of data. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance on improving business processes and on monitoring and controlling time, cost, and quality.
Problems to solve now
- A task scheduling and monitoring platform is needed to manage, schedule, and monitor the scripts that read, produce, and deliver data.
- An API platform is needed to serve ad hoc queries on some of the data.
- A data synchronization platform is needed to sync the produced data to each business system.
- A data inspection platform is needed to control the quality of the delivered data.
- A BI display platform is needed to clearly present the data dimensions that different roles care about.
Solution
- Use Airflow to build the ETL system: write and schedule the data collection scripts, the cleaning scripts, and the summarization, aggregation, and precomputation of multidimensional metrics. Airflow also provides task monitoring and a web UI that visualizes task dependencies.
- Use DataX for data synchronization.
- Use Lumen to build the API platform.
- The data inspection platform and the BI display are out of scope for the first phase.
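The core idea behind scheduling scripts with dependencies is that each task runs only after its upstream tasks have finished. The sketch below shows that idea using only Python's standard library (`graphlib`, Python 3.9+); it is not Airflow's API, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "collect": set(),
    "clean": {"collect"},
    "summarize": {"clean"},
    "aggregate": {"summarize"},
    "precompute_metrics": {"aggregate"},
    "sync_to_business": {"precompute_metrics"},
}


def run_pipeline(dependencies, actions=None):
    """Run every task after its upstream tasks and return the execution order."""
    actions = actions or {}
    order = []
    for task in TopologicalSorter(dependencies).static_order():
        # Placeholder for invoking the real script; defaults to a no-op.
        actions.get(task, lambda: None)()
        order.append(task)
    return order
```

A scheduler such as Airflow adds what this sketch leaves out: time-based triggers, retries, monitoring, and a UI for the dependency graph.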
Technology stack
Airflow (Python), Lumen, PostgreSQL, DataX, Elasticsearch
Later, depending on data volume, we will add a Spark distributed cluster for offline computation, HDFS for storage, stream processing, Hive, and so on.
Ideal state
Later, log analysis can be connected to the ETL system to analyze user behavior, build user profiles, and improve system security.
Performance data for daily, weekly, and annual reports can be displayed and summarized with lower latency, reducing the load on the business systems.
ERP data can be collected and analyzed to inform leadership decisions.
App logs can be summarized and analyzed to give product design and operations some factual data.
At the same time, big data analysis lets us handle rapid data growth with ease.
"Rome was not built in a day."