[Azure Data Platform] ETL Tools (2) -- Azure Data Factory "Copy Data" Tool (Cloud Copy)
2022-06-13 03:22:00 【Hair dung coating wall】
This article is part of the 【Azure Data Platform】 series.
Previous article: 【Azure Data Platform】ETL Tools (1) -- An introduction to Azure Data Factory
This article demonstrates how to use ADF to copy data from Azure Blob Storage to Azure SQL DB.
In the previous article we created the ADF service. Here we demonstrate the simplest possible ADF operation. Besides the ADF service, this article also creates an Azure Blob Storage account and an Azure SQL Database to demonstrate the data transfer.
ADF includes a "Copy data" tool. With it, you can move data between data sources in different locations (on-premises or in the cloud). It supports essentially every common data source you can think of; for the definitive list, see: Supported data stores and formats.
Before starting, let's introduce a concept: the Integration Runtime (IR).
ADF currently supports three types of IR:
- Azure Integration Runtime: mainly for data sources reachable over the public network.
- Self-Hosted Integration Runtime: used when the source or the target is an on-premises (private network) data source.
- Azure-SSIS Integration Runtime: used to run SSIS packages.
ADF uses the IR to run copy activities securely across different network environments, and picks the closest available region to the data source. The IR can be understood as the bridge between a copy activity (copy activity) and its linked services (Linked services); the sketch below shows how that link is expressed.
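To make the "bridge" idea concrete, here is a rough sketch (not taken from the article) of how a linked service can be pinned to a specific IR through the "connectVia" property of its JSON definition, rendered as a Python dict. All names are placeholders; if "connectVia" is omitted, ADF resolves a suitable Azure IR automatically.

```python
# Sketch only: an ADF linked-service definition (JSON shown as a Python dict).
# "connectVia" binds the linked service to a specific Integration Runtime;
# that IR then executes any copy activity which uses this linked service.
on_prem_sql_linked_service = {
    "name": "OnPremSqlServerLS",                # hypothetical name
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "<on-premises connection string>"
        },
        "connectVia": {                         # omit this to let ADF pick an Azure IR
            "referenceName": "MySelfHostedIR",  # hypothetical self-hosted IR
            "type": "IntegrationRuntimeReference"
        }
    }
}
```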
Environment preparation
The demo covers a very common requirement: copying data from Azure Blob Storage to SQL DB. This is a data copy that stays entirely inside the cloud (Azure). To prepare, let's quickly create an Azure Blob Storage account and a SQL DB, accepting the defaults wherever possible.
Create Blob Storage
This series assumes you already know how to create basic Azure services, and because the budget is limited it always picks low-spec tiers. Create the Blob Storage account as shown below:
Create SQL DB
Again, the cheapest configuration is selected here:

Note that on the "Networking" page the default connectivity method is "No access". So that Blob Storage (via ADF) can reach the DB, choose "Public endpoint" here, and under "Firewall rules" set "Allow Azure services and resources to access this server" to "Yes".
Once the resources are created, we can start.
Demo
First, connect to the DB and create a test table "ADFLab", as shown below:
Check whether there is any data in the table (a sketch of both steps follows):
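The table definition and the check only appear as screenshots in the original, so here is a minimal sketch of the equivalent steps in Python with pyodbc. The column layout of ADFLab (ID plus Name) is an assumption chosen to match the test file used later; adjust the connection string and schema to your own environment.

```python
import pyodbc

# Assumed connection-string format for Azure SQL DB; replace the placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.windows.net,1433;"
    "DATABASE=<your-db>;UID=<user>;PWD=<password>;Encrypt=yes;"
)
cur = conn.cursor()

# Create the test table (columns are assumed; the real ones are only in the screenshot).
cur.execute("CREATE TABLE dbo.ADFLab (ID INT, [Name] NVARCHAR(100));")
conn.commit()

# Check that the table is still empty before the copy runs.
cur.execute("SELECT COUNT(*) FROM dbo.ADFLab;")
print(cur.fetchone()[0])   # expected: 0
```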
Then prepare a test file and upload it to Blob Storage. It is a txt file that uses a comma "," as the separator. As you will see later, the comma matters because the Copy Data tool assumes a comma delimiter by default; using any other separator requires extra configuration. The file contents are as follows:
Upload it to the Blob Storage container (adflab); a scripted alternative to the portal upload is sketched below:
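The article uploads the file through the portal; purely as an alternative sketch, the same thing can be done with the azure-storage-blob Python package. The file name and sample rows here are placeholders, not the article's exact file.

```python
from azure.storage.blob import BlobServiceClient

# Connect with the storage account's connection string (placeholder).
service = BlobServiceClient.from_connection_string("<storage connection string>")
blob = service.get_blob_client(container="adflab", blob="adflab-test.txt")

# Comma-delimited content, matching the Copy Data tool's default delimiter.
sample = "ID,Name\n1,Alice\n2,Bob\n"
blob.upload_blob(sample, overwrite=True)
```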
With everything ready, we can start the ADF development work.
ADF Development
To use the Copy Data tool, first create a pipeline that copies the file from the blob container into Azure SQL DB.
Step 1: Open the data factory's studio:
Step 2: Click 【New】 and select 【Pipeline】:
Step 3: Open the new pipeline (pipeline1) and drag 【Copy data】 onto the canvas on the right, as in the figure below. Note that to save time, the steps here are not given proper names; in a formal environment every component should be named with a meaningful identifier. (A sketch of the JSON behind this activity follows.)
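For reference, this is roughly what the studio generates behind the dragged 【Copy data】 activity; you can inspect the real thing in the pipeline's JSON ("Code") view. The activity and dataset names below are placeholders, since the article leaves everything at its default name.

```python
# Sketch of a Copy activity definition (ADF JSON rendered as a Python dict).
copy_activity = {
    "name": "CopyBlobToSqlDb",                  # placeholder; the article keeps the default name
    "type": "Copy",
    "inputs":  [{"referenceName": "SourceDelimitedText", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkAzureSqlTable",   "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},   # reads the delimited blob file
        "sink":   {"type": "AzureSqlSink"}           # writes to Azure SQL DB
    }
}
```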
Step 4: Configure the source. Click in the order shown and choose Blob Storage as the 【Source】:
Step 5: Select the data's format type. Because we are using a txt file this time, choose DelimitedText, then click 【Continue】 at the bottom:
Step 6: Select or create a linked service. Because this is a brand-new environment, create a new one here:

Step 7: Create the new linked service and test the connection:
Once the test passes, you can click the area highlighted in red in the figure below to browse the files inside the container graphically:
We can see the file without any problem, which also means the connection works. Select this file as the data source:
After configuring the source, click 【Preview data】 to inspect the contents; this also serves as a verification step. (A sketch of the resulting source dataset follows.)
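Behind steps 4 to 7, the wizard produces a DelimitedText dataset that points at the blob file through the new linked service. A sketch of that JSON (as a Python dict) is below; the linked-service and file names are placeholders, and the delimiter and header settings are the tool's defaults.

```python
# Sketch of the DelimitedText source dataset created by the wizard.
source_dataset = {
    "name": "SourceDelimitedText",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",   # the linked service from step 7 (placeholder name)
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "adflab",
                "fileName": "adflab-test.txt"        # placeholder file name
            },
            "columnDelimiter": ",",                  # the default; other delimiters need extra config
            "firstRowAsHeader": True
        }
    }
}
```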
Step 8: Configure the 【Sink】, i.e. the target. As shown in the figure below, select New and then choose Azure SQL Database:
Step 9: Just as with the source, fill in the necessary information and click Test connection:

The following figure shows the finished configuration (a JSON sketch of the sink side follows):
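The sink side mirrors the source: an Azure SQL Database linked service plus an AzureSqlTable dataset pointing at dbo.ADFLab. A rough sketch follows; server, database and credential values are placeholders.

```python
# Sketch of the Azure SQL Database linked service and the sink dataset.
sql_linked_service = {
    "name": "AzureSqlDatabaseLS",                # placeholder name
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Server=tcp:<your-server>.database.windows.net,1433;"
                "Database=<your-db>;User ID=<user>;Password=<password>;"
            )
        }
    }
}

sink_dataset = {
    "name": "SinkAzureSqlTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {"schema": "dbo", "table": "ADFLab"}
    }
}
```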
Step 10: Configure the mapping. Mapping describes how the structure of the source file (or other source format) is parsed and matched to the target; in this case, it is the process of having ADF parse the Blob file's structure and map it onto the database table's columns (see the sketch below):
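What the Mapping tab produces ends up as a "translator" section inside the copy activity's JSON. A sketch is below; the column names follow the assumed two-column test file and table used throughout these examples.

```python
# Sketch of the column mapping ("translator") generated by the Mapping tab.
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "ID"},   "sink": {"name": "ID"}},
        {"source": {"name": "Name"}, "sink": {"name": "Name"}}
    ]
}
```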
Step 11: Configure 【Settings】; for now just keep the defaults:
Step 12: Configure 【User properties】. Clicking 【Auto generate】 loads some property information, and you can also add anything you need yourself.
Step 13: Validate and debug:
Debugging reports an error; looking at the details shows that the data types do not match up:
The error message is as follows:
Looking back at the configuration, the option shown in the figure below had not been checked; check it this time:
Click 【Schema】 and then 【Import schema】 to refresh how the data source is parsed; you can see that 【Name】 is now read correctly:
Then go back to 【Mapping】 and click 【Import schemas】 again to refresh the structure; you can see that the source side has also been updated:
Debug again; this time it succeeds.
The last step is to publish the pipeline:
Then go back to the database and rerun the query; you can see that the data has been imported successfully:
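If you prefer to verify from code rather than the query editor, the same check can be run with pyodbc, reusing the placeholder connection string from the preparation step.

```python
import pyodbc

# Placeholder connection string, same as in the preparation step.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.windows.net,1433;"
    "DATABASE=<your-db>;UID=<user>;PWD=<password>;Encrypt=yes;"
)
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.ADFLab;")
for row in cur.fetchall():
    print(row)   # should show the rows copied from the blob file
```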
So far, we have loaded a simple file from Blob Storage into SQL DB. The process is very simple and rather idealized; in enterprise use, requirements around things like authentication and network connectivity bring plenty of extra configuration.
But as a first hands-on exercise, this level should be enough. As an ETL newcomer there are still many details to study and practice, but don't be discouraged just because you run into problems the first time.
The next article will try copying data from an on-premises environment to the cloud: 【Azure Data Platform】ETL Tools (3) -- Azure Data Factory: Copy from a local data source to Azure