
[Azure Data Platform] ETL Tools (2) -- Azure Data Factory "Copy Data" Tool (Cloud Copy)

2022-06-13 03:22:00 Hair dung coating wall

This article is part of the 【Azure Data Platform】 series.
Previous article: 【Azure Data Platform】ETL Tools (1) -- A Brief Introduction to Azure Data Factory
This article demonstrates how to use ADF to copy data from Azure Blob Storage to Azure SQL DB.

In the previous article we created the ADF service. Here we demonstrate the simplest possible ADF operation. Besides the ADF service itself, this article creates an Azure Blob Storage account and an Azure SQL Database to use for the data transfer demo.

ADF provides a "Copy data" tool. With it, you can move data between data sources in different locations (on-premises or cloud). It supports essentially every common data source you can think of; for the definitive list, see: Supported data stores and formats.

Here we need to introduce a concept: the Integration Runtime (IR).

ADF currently supports 3 types of IR:

  1. Azure Integration Runtime: mainly used for access over the public network.
  2. Self-Hosted Integration Runtime: used when the source or target data store is on-premises.
  3. Azure-SSIS Integration Runtime: used to run SSIS packages.

ADF uses the IR to run copy activities securely across different network environments, and it picks the available region closest to the data source. You can think of the IR as the bridge between copy activities (copy activity) and linked services (Linked services).

Environment preparation

Here is a very common requirement: copy data from Azure Blob to SQL DB. This is a data copy operation that happens entirely inside the cloud (Azure). To set it up, let's quickly create an Azure Blob Storage account and a SQL DB, using the defaults wherever possible.

Create Blob Storage

This series assumes you already know how to create basic Azure services, and because the budget is limited, it always tries to pick low-cost configurations. Create Blob Storage as shown below:
(screenshot)
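If you prefer to script the environment preparation instead of clicking through the portal, a rough sketch with the azure-mgmt-storage Python package might look like the following; the subscription, resource group, account name and region are placeholders, not values from this article.

```python
# Rough sketch: create a low-cost StorageV2 account for the demo.
# Subscription, resource group, account name and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.storage_accounts.begin_create(
    resource_group_name="<resource-group>",
    account_name="<storageaccountname>",   # must be globally unique, lowercase
    parameters={
        "location": "eastasia",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},   # cheapest, locally redundant SKU
    },
)
account = poller.result()
print("created:", account.name)
```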

Create SQL DB

Again, the cheapest configuration is chosen here:
(screenshot)
(screenshot)
Note that on the "Networking" page the default is "No access". To allow Blob Storage to reach the DB, choose "Public endpoint" here, and under "Firewall rules" set "Allow Azure services and resources to access this server" to "Yes".
(screenshot)
Once the resources are created, we can start.

Hands-on demo

First, connect to the DB and create a test table "ADFLab", as shown below:
(screenshot)

Check whether the table contains any data:
(screenshot)
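For reference, the same two steps can also be done from Python with pyodbc. This is only a minimal sketch: the server, credentials and the table schema (an ID column plus a Name column) are assumptions for illustration, not the exact schema used in the screenshots.

```python
# Minimal sketch: create the test table and confirm it is empty.
# Server, credentials and the column layout are assumed for illustration.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cur = conn.cursor()

# Hypothetical schema matching a small comma-delimited file with a Name column.
cur.execute("""
IF OBJECT_ID('dbo.ADFLab') IS NULL
    CREATE TABLE dbo.ADFLab (ID INT, Name NVARCHAR(100));
""")
conn.commit()

# Verify the table is currently empty before running the copy.
cur.execute("SELECT COUNT(*) FROM dbo.ADFLab")
print("rows:", cur.fetchone()[0])
conn.close()
```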

Then prepare a test file and upload it to Blob Storage. This is a txt file that uses a comma "," as the separator. As you will see later, the comma matters: the Copy Data tool uses a comma as the default delimiter, and any other delimiter needs extra configuration. The file content looks like this:
(screenshot)
Upload it to the container (adflab) in Blob Storage:
(screenshot)
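Uploading through the portal is enough for this demo; if you want to script it, a minimal sketch with the azure-storage-blob Python package might look like this (the connection string and file name are placeholders):

```python
# Minimal sketch: upload a comma-delimited text file to the adflab container.
# The connection string and file name are placeholders, not values from the article.
from azure.storage.blob import BlobServiceClient

conn_str = "<storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("adflab")

with open("adflab_test.txt", "rb") as f:
    container.upload_blob(name="adflab_test.txt", data=f, overwrite=True)

print([b.name for b in container.list_blobs()])  # confirm the upload
```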

Once everything is ready, we can begin the ADF development work.

ADF Development

To use the Copy Data tool, first create a Pipeline, which will copy the file from the blob container into Azure SQL DB.

Step 1: Open the Data Factory studio:
(screenshot)

Step 2: Click "New" and select "Pipeline":
(screenshot)
Step 3: In the new pipeline (pipeline1), drag "Copy data" from the activity list onto the canvas on the right, as shown below. Note that to save time, the steps here are not given proper names; in a real project, every component should be named with a meaningful identifier.
(screenshot)

Step 4: Configure the source. Click through in the order shown and choose Blob Storage as the "Source":
(screenshot)

Step 5: Select the format of the data. Since we are using a txt file this time, choose DelimitedText, then click "Continue" at the bottom:
(screenshot)

Step 6: Select or create a linked service (linked service). Because this is a brand-new environment, create a new one here:

(screenshot)

Step 7: Create the new linked service and test the connection:
(screenshot)
After the test passes, you can click the area in the red box below to browse the files inside the container graphically:
(screenshot)
We can see the file here, which also shows that the connection is working. Select this file as the data source:
(screenshot)
After configuring the source, click "Preview data" to check the contents; this also serves as a validation step:
(screenshot)
Step 8: Configure the "Sink", that is, the destination. As shown below, select New and then choose Azure SQL Database:
(screenshot)
Step 9: Similar to configuring the source, fill in the required information and click "Test connection":

(screenshot)
The finished configuration looks like this:
(screenshot)

Step 10: Configure the mapping. The mapping describes how the structure of the source file (or other format) is parsed; in this case, it is the process of letting ADF parse the structure of the Blob file and match it to the columns of the database table:
(screenshot)

Step 11: Configure the "Settings" tab. Just keep the defaults for now:
(screenshot)
Step 12: Configure "User properties". Clicking "Auto generate" under user properties loads some property information automatically; you can also add whatever properties you need yourself.
(screenshot)
Step 13: Now validate and debug:
(screenshot)
The debug run reports an error; looking at the details, we can see that the data types do not line up:
(screenshot)
The error message is as follows:
(screenshot)
Looking back at the configuration: the option shown in the figure below had not been checked; check it this time:
(screenshot)

Click "Schema", then click "Import schema" to refresh the parsed structure of the data source; you can see that the "Name" column is now read in:
(screenshot)
Then go back to "Mapping" and run "Import schema" again to refresh the structure; you can see that the source side has been updated as well:
(screenshot)
Debug again; this time it succeeds.
(screenshot)

The last step is to publish the pipeline:
(screenshot)
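The debug run above already executed the copy from within the studio; once the pipeline is published, a run can also be triggered and monitored programmatically. Below is a minimal sketch with the azure-mgmt-datafactory Python package; the subscription, resource group and factory names are placeholders, and the pipeline keeps its default name pipeline1.

```python
# Minimal sketch: trigger a run of the published pipeline and poll its status.
# Subscription, resource group and data factory name are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory-name>",
    pipeline_name="pipeline1",
)

# Poll the run until it reaches a terminal state.
while True:
    status = adf_client.pipeline_runs.get(
        "<resource-group>", "<data-factory-name>", run.run_id
    ).status
    print("pipeline run status:", status)
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
```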

Then go back to the database and run the query again; you can see that the data has been loaded successfully:
(screenshot)
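If you would rather check from a script than from a query window, the same kind of pyodbc connection used in the earlier sketch can count the loaded rows (connection details remain placeholders):

```python
# Quick check after the copy: count the rows that landed in dbo.ADFLab.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
print("rows after copy:", conn.execute("SELECT COUNT(*) FROM dbo.ADFLab").fetchone()[0])
conn.close()
```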

So far, we have loaded a simple file from Blob Storage into SQL DB. The process is very simple and very idealized; in real enterprise use, all sorts of environment requirements bring plenty of extra configuration, such as authentication, network connectivity, and so on.

But as a first hands-on exercise, I think this level is enough. As this practice shows, an ETL beginner still has many details to study and practice, but you should not be discouraged just because you run into all kinds of problems the first time.

The next article will try to copy data from the local environment to the cloud: 【Azure Data Platform】ETL Tools (3) -- Azure Data Factory: Copy from a Local Data Source to Azure


Copyright notice
This article was created by [Hair dung coating wall]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202280531217555.html