[Azure Data Platform] ETL Tools (1): Introduction to Azure Data Factory
2022-06-13 03:23:00 【Hair dung coating wall】
This article is part of the [Azure Data Platform] series.
I am writing up what I learn and practice at work as a series of articles, in the hope that it helps readers who need it.
Working with data involves more than storage (databases); it also involves collection (ETL/ELT and so on) and analysis. This part of the series focuses on Azure's ETL tool, Azure Data Factory (ADF).
Preface
Data collection is an essential part of any data platform solution. When selecting a data collection tool, there are several points to consider (assuming your environment is cloud based, here Azure):
- Whether to use the tool provided by the platform. Many vendors offer both on-premises and cloud-based tools. The choice depends on many factors, especially the enterprise's established strategy.
- Timeliness: traditional ETL tools usually run on a T+1 basis (data becomes available the day after it is produced), a delay that is often too long for today's environments. The actual requirement still depends on the project.
- Cost: free tools are, of course, not recommended for commercial use.
A brief introduction to ADF
In line with my company's strategy, my project uses ADF as its ETL tool, so the following articles in this series will mainly introduce and demonstrate it. First, we need to be clear about what ADF can do as an ETL tool.
I will not paste the official introduction here. The most important highlight is that ADF copies and moves data across on-premises and cloud environments, which addresses one of the pain points of data synchronization in many hybrid cloud setups.
SQL Server users will have heard of SSIS (SQL Server Integration Services). ADF is not a cloud version of SSIS: it is not tightly bound to SQL Server, and it supports a wider range of data movement. ADF can run SSIS packages in the cloud, but that simply combines ADF's scalability with the advanced ETL features of SSIS.
Differences between ADF and other ETL tools
For the following reasons, ADF is a relatively good candidate ETL tool when you are working on the Azure platform:
- It can run SSIS packages, which reduces migration costs for projects that already use SSIS maturely on premises.
- It scales automatically based on load and is fully managed as a PaaS offering.
- The run interval can be reduced to as little as one minute.
- It connects your on-premises environment to the Azure cloud platform seamlessly through a gateway.
- It can handle large-scale data.
- It can connect to and work with other compute services, such as HDI (HDInsight), for true big data processing.
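The one-minute run interval mentioned in the list above can be illustrated with a schedule trigger. The following is a minimal sketch that builds a trigger definition as a Python dictionary, modeled on the JSON layout ADF uses for schedule triggers as I understand it. The names `EveryMinuteTrigger` and `DemoPipeline` and the helper function are hypothetical, and the exact property set should be checked against the ADF documentation before use.

```python
import json
from datetime import datetime, timezone


def minute_schedule_trigger(trigger_name: str, pipeline_name: str, start: datetime) -> dict:
    """Build a schedule-trigger definition that fires once per minute.

    The structure is an assumption modeled on ADF's trigger JSON;
    the trigger and pipeline names are illustrative only.
    """
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Minute",  # unit of the interval
                    "interval": 1,          # run every 1 minute
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                }
            },
            # The pipeline(s) this trigger starts.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


trigger = minute_schedule_trigger(
    "EveryMinuteTrigger",
    "DemoPipeline",
    datetime(2022, 6, 13, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(trigger, indent=2))
```

Whether a one-minute schedule is sensible depends on how long each pipeline run takes and on cost, so treat the interval as a lower bound rather than a recommended default.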
Of course, if ADF were just an ordinary ETL tool, it would struggle to compete with the specialized ETL vendors on the market, so it needs other advantages. Besides common features such as support for mainstream data sources, customizable code logic, and monitoring, ADF also integrates with the CI/CD capabilities of Azure DevOps and GitHub.
In Microsoft's official words, ADF simplifies hybrid data integration: Azure Data Factory
Here is a brief introduction to the components of ADF, drawn mainly from the ADF documentation:
- Pipeline: mainly encapsulates functionality as a collection of activities (tasks), and passes parameter values on to the next task.
- Mapping Data Flow: visually designed transformation logic (the T in ETL). It runs on a fully managed Spark cluster and scales out to meet load demand.
- Activity: a single execution step in a pipeline. ADF currently supports three types of activity: data movement activities, data transformation activities, and control activities.
- Dataset: the data that a pipeline reads as input or writes as output.
- Linked Service: similar to a connection string; it connects to data stores and compute resources (such as HDI).
- Trigger: a processing unit that determines when a pipeline run is started.
- Control flow: orchestrates the pipeline's processing logic, for example branches and loops.
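To make the relationships between these components more concrete, here is a minimal sketch that describes a linked service, two datasets, and a pipeline with one copy activity as Python dictionaries. The layout is modeled on ADF's JSON definitions as I understand them and is simplified (dataset locations and the real connection string are omitted). All names, such as `DemoBlobStorage` and `DemoCopyPipeline`, and the helper `unresolved_references` are hypothetical.

```python
def make_dataset(name: str, linked_service_name: str) -> dict:
    """A dataset points at data reachable through a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
        },
    }


# Linked service: plays the role of a connection string (placeholder value).
linked_services = [
    {
        "name": "DemoBlobStorage",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": "<placeholder>"},
        },
    }
]

datasets = [
    make_dataset("InputDataset", "DemoBlobStorage"),
    make_dataset("OutputDataset", "DemoBlobStorage"),
]

# Pipeline: a collection of activities; here a single data movement activity.
pipeline = {
    "name": "DemoCopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyInputToOutput",
                "type": "Copy",
                "inputs": [{"referenceName": "InputDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "OutputDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}


def unresolved_references(pipeline: dict, datasets: list, linked_services: list) -> list:
    """Return names that are referenced but not defined anywhere."""
    dataset_names = {d["name"] for d in datasets}
    service_names = {s["name"] for s in linked_services}
    missing = []
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            if ref["referenceName"] not in dataset_names:
                missing.append(ref["referenceName"])
    for d in datasets:
        service = d["properties"]["linkedServiceName"]["referenceName"]
        if service not in service_names:
            missing.append(service)
    return missing


print(unresolved_references(pipeline, datasets, linked_services))
```

The chain of references is the point of the sketch: an activity names its datasets, each dataset names its linked service, and a trigger (not shown here) names the pipeline it starts.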
More details will come in the later demonstrations. Let's prepare the environment first.
Hands-on demonstration
Open the Azure Portal and search for the "Data Factory" service (or its localized name, if your portal is not in English).
"Basics" tab: fill in the required information.
Choosing V2 is recommended, because V1 is being deprecated.
"Git configuration" tab: this step is not needed for now, so choose to configure Git later.
"Networking" tab: if you are not yet sure what the options are for, keep the defaults and click Next.
"Advanced" tab: since this is an introduction, do not select the key encryption option.
"Tags" are better suited to managing large numbers of resources in enterprise settings; keep the defaults here as well.
Note the "Download a template for automation" link at the bottom. In an enterprise, using a predefined template saves configuration time and reduces the risk of omissions caused by manual steps.
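As a rough idea of what such an automation template contains, the sketch below generates a minimal ARM template for a V2 data factory and prints it as JSON. It is not the file the portal produces, which is more detailed. The factory name `adf-demo-factory`, the region `eastasia`, and the helper function are hypothetical. The resource type and API version are the ones I believe apply to V2 factories, so verify them against the template you download.

```python
import json


def data_factory_template(factory_name: str, location: str) -> dict:
    """Build a minimal ARM template that declares one data factory.

    Default values are illustrative; the downloaded portal template
    is the authoritative version for a real deployment.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"type": "string", "defaultValue": factory_name},
            "location": {"type": "string", "defaultValue": location},
        },
        "resources": [
            {
                "type": "Microsoft.DataFactory/factories",
                "apiVersion": "2018-06-01",
                "name": "[parameters('factoryName')]",
                "location": "[parameters('location')]",
                # A system-assigned identity lets the factory authenticate to other Azure services.
                "identity": {"type": "SystemAssigned"},
                "properties": {},
            }
        ],
    }


template = data_factory_template("adf-demo-factory", "eastasia")
print(json.dumps(template, indent=2))
```

Saved to a file, a template like this could be deployed with the Azure CLI, for example `az deployment group create --resource-group <your-resource-group> --template-file template.json`, where the resource group is a placeholder you supply.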
After you click the "Create" button, deployment completes in about one minute.
We now have an ADF service. The next article will demonstrate the simplest use of ADF.
[Azure Data Platform] ETL Tools (2): Azure Data Factory "Copy Data" tool (copying within the cloud)