This one article is all you need on ETL: understand what ETL is in three minutes
2022-06-24 08:44:00 [Parker Data BI Visualization]
Today, let's talk about a technical topic: ETL development. Anyone who has done business intelligence (BI) development will know ETL well. Any work that involves extracting data from data sources and building the processes that compute and process that data counts as ETL.
What is ETL?
ETL has three stages: Extraction, Transformation, and Loading. Data is extracted from the various data sources (EXTRACTION); it is then processed and converted into the required format according to defined processing rules (TRANSFORMATION); finally the processed result is written to the target, usually a data table but possibly a file or something else (LOADING).
Put more simply, ETL works just like everyday cooking. You buy ingredients at the stalls in the market, sort through the vegetables, wash them, chop everything, and finally stir-fry it in the pan and bring it to the table. Each market stall is a data source, the finished dish is the final output, and everything in between (sorting, washing, chopping, and cooking) is the transformation.
How is ETL implemented?
In practice, ETL is most often implemented with ETL tools, such as the common KETTLE, PENTAHO, IBM DATASTAGE, INFORMATICA, and SSIS in Microsoft SQL SERVER, combined with basic SQL to build the whole ETL process.
Some teams instead write their own programs that run data-processing scripts in batches; this is essentially program code plus SQL.
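A minimal sketch of the "program plus SQL" approach: a small Python driver that runs an ordered batch of SQL scripts and stops at the first failure. The step names, tables, and inline SQL are hypothetical; in a real project the scripts would more likely be `.sql` files, and SQLite stands in for the actual database.

```python
import sqlite3

# Hypothetical batch: ordered (step name, SQL script) pairs.
BATCH = [
    ("create_staging", """
        CREATE TABLE stg_sales (region TEXT, amount REAL);
        INSERT INTO stg_sales VALUES ('north', 10), ('north', 15), ('south', 7);
    """),
    ("build_summary", """
        CREATE TABLE dm_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM stg_sales GROUP BY region;
    """),
]

def run_batch(conn, steps):
    """Run each SQL script in order; stop at the first failing step."""
    completed = []
    for name, sql in steps:
        try:
            conn.executescript(sql)
        except sqlite3.Error as exc:
            raise RuntimeError(f"step {name!r} failed: {exc}") from exc
        completed.append(name)
    return completed

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(run_batch(db, BATCH))
    print(db.execute("SELECT * FROM dm_sales_by_region ORDER BY region").fetchall())
```

The program owns ordering and error handling while SQL does the data work, which is where both the flexibility and the maintenance burden described below come from.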
Which approach is better depends on the scenario and on which approach the developers know better. In my experience, developers with a software-engineering background, when they take on a data project, prefer to control batch runs from a program; this is a natural extension of procedural thinking. Developers who come purely from business intelligence (BI) mostly choose mature ETL tools, although some start writing program scripts right away; BI experts of this kind have mostly moved over from programming.
The advantage of the programmatic approach is flexibility and high extensibility: it can be integrated into, or split out of, any processing pipeline, and development is sometimes faster. The drawback is that it places real technical demands on maintainers, and experience is harder to hand over and replicate.
Using an ETL tool has several benefits. First, the whole ETL development process is visual, which makes the layered design of the data processing flow especially easy to manage. Second, when connecting to different data sources, connection protocols for the various sources and databases are built in and only need to be configured, with no code to write. Third, the various transformation components can be used by drag and drop, replacing part of the SQL development without writing code. Fourth, all kinds of ETL scheduling rules can be designed and are highly configurable, again without writing code.
So in most ordinary projects, using an ETL tool leads to more standardized, component-based development.
What are the design concepts behind ETL?
Logically, ETL can generally be divided into two layers: control flow and data flow. This is the design idea behind many ETL tools, although different tools may use different names for these layers.
The control flow controls the order in which the data flows run, and one control flow can contain multiple data flows. For example, in data warehouse development the first processing layer is the ODS or staging layer, the second is the DIMENSION layer, and the following layers are the DW fact layer and the DM data mart layer. ETL scheduling management links these layers together into one complete data processing pipeline.
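A control flow is essentially a dependency graph over data flows. Below is a minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module; the data flow names are hypothetical and follow the ODS, DIMENSION, DW, and DM layering above, and `run_data_flow` is a placeholder rather than a real transformation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data flows, each mapped to the set of flows it depends on.
DEPENDENCIES = {
    "ods_orders": set(),
    "ods_customers": set(),
    "dim_customer": {"ods_customers"},
    "dim_date": set(),
    "dw_fact_orders": {"ods_orders", "dim_customer", "dim_date"},
    "dm_sales_monthly": {"dw_fact_orders"},
}

def run_data_flow(name, log):
    """Placeholder for one data flow (source -> transformation -> target)."""
    log.append(name)

def run_control_flow(dependencies):
    """Control flow: start each data flow only after all of its dependencies finish."""
    log = []
    for flow in TopologicalSorter(dependencies).static_order():
        run_data_flow(flow, log)
    return log

if __name__ == "__main__":
    print(run_control_flow(DEPENDENCIES))
```

`static_order()` runs flows one at a time; a scheduler that wants to run independent flows in parallel (for example the two ODS loads) could use the same class's `prepare()`, `get_ready()`, and `done()` methods instead.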
A data flow is the concrete conversion process from the source data to a target data table, which is why some ETL tools call data flows transformations. Designing a data flow involves three main steps: connecting to the source, connecting to the target data table (both can be configured directly with ETL components), and the transformation in between. For the transformation step there are several options: calling SQL statements or stored procedures, or using ETL components.
Some projects habitually implement the transformations in a data flow with ETL components, while others require calling stored procedures instead of the standard transformation components. There are also data warehouses that do not support stored procedures, so the transformations can only be written in standard SQL.
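Here is a minimal sketch of one data flow whose transformation is written in plain SQL (`INSERT ... SELECT` with `TRIM`, `UPPER`, and `CASE`) rather than a stored procedure. Two in-memory SQLite connections stand in for the source system and the target warehouse; the `raw_students`, `stg_students`, and `dw_students` tables and the pass mark are hypothetical.

```python
import sqlite3

def data_flow(source, target):
    """One data flow: source link -> SQL transformation -> target link."""
    # 1. Source link: read the raw rows from the source system.
    rows = source.execute("SELECT id, name, score FROM raw_students").fetchall()

    # 2. Land the rows unchanged in a staging table on the target side.
    target.execute("CREATE TABLE IF NOT EXISTS stg_students (id INTEGER, name TEXT, score INTEGER)")
    target.execute("DELETE FROM stg_students")
    target.executemany("INSERT INTO stg_students VALUES (?, ?, ?)", rows)

    # 3. Transformation in standard SQL, no stored procedure needed.
    target.execute("CREATE TABLE IF NOT EXISTS dw_students (id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")
    target.execute("DELETE FROM dw_students")
    target.execute("""
        INSERT INTO dw_students (id, name, grade)
        SELECT id,
               UPPER(TRIM(name)),
               CASE WHEN score >= 60 THEN 'PASS' ELSE 'FAIL' END
        FROM stg_students
        WHERE score IS NOT NULL
    """)
    target.commit()

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE raw_students (id INTEGER, name TEXT, score INTEGER)")
    src.executemany("INSERT INTO raw_students VALUES (?, ?, ?)",
                    [(1, " ann ", 75), (2, "ben", 40), (3, "cy", None)])
    dst = sqlite3.connect(":memory:")
    data_flow(src, dst)
    print(dst.execute("SELECT * FROM dw_students ORDER BY id").fetchall())
```

The same step 3 could be replaced by a stored procedure call or by an ETL tool's transformation components; only the middle of the flow changes, while the source and target links stay the same.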
What is ETL architecture?
When people talk about a business intelligence (BI) data architect, what they usually mean is ETL architecture design. ETL is the core technical layer of the whole BI project: data processing, data cleansing, and modeling are all implemented in ETL.
A good ETL architecture can support hundreds of packages (that is, control flows) at the same time, and each control flow may contain hundreds of data flows. I wrote a technical article on this earlier; you can find it online by searching for the keyword BIWORK ETL.
This framework design is about more than the ETL framework itself. It also embodies deep thinking about ETL project management and standards control, including later operations and maintenance, BI analysis built on the BI platform, and ETL performance tuning, all of which are reflected in these frameworks. Because a large BI project may need dozens of people developing ETL at the same time, the top-level design of the framework is very important.
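The article does not describe how its framework supports operations and performance tuning. One common building block is an audit log that records every data flow run; the following is a minimal sketch of that idea only. The `etl_run_log` table, its columns, and the `logged_flow` helper are hypothetical, and the log is kept on its own connection so that logging a failure never commits a failed flow's partial work.

```python
import sqlite3
import time
from contextlib import contextmanager

# Hypothetical audit table: one row per data flow run.
LOG_DDL = """
CREATE TABLE IF NOT EXISTS etl_run_log (
    package TEXT, data_flow TEXT, status TEXT,
    rows_written INTEGER, seconds REAL
)"""

@contextmanager
def logged_flow(log_conn, package, data_flow):
    """Record status, row count, and duration for one data flow run.

    log_conn should be a separate connection from the one doing the data work.
    """
    log_conn.execute(LOG_DDL)
    stats = {"rows": 0}  # the data flow fills this in
    start = time.perf_counter()
    status = "FAILED"
    try:
        yield stats
        status = "OK"
    finally:
        log_conn.execute(
            "INSERT INTO etl_run_log VALUES (?, ?, ?, ?, ?)",
            (package, data_flow, status, stats["rows"], time.perf_counter() - start),
        )
        log_conn.commit()

if __name__ == "__main__":
    log = sqlite3.connect(":memory:")
    with logged_flow(log, "pkg_sales", "dw_fact_orders") as s:
        s["rows"] = 42  # a real data flow would report its actual row count
    print(log.execute("SELECT package, data_flow, status, rows_written FROM etl_run_log").fetchall())
```

Querying such a table (slowest flows, failure counts per package) is one way a framework can feed operations and ETL performance tuning.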