[Azure Data Platform] ETL Tools (2) -- Azure Data Factory "Copy Data" Tool (Cloud Copy)
2022-06-13 03:22:00 【Hair dung coating wall】
This article is part of the 【Azure Data Platform】 series.
Previous article: 【Azure Data Platform】ETL Tools (1) -- A Brief Introduction to Azure Data Factory
This article demonstrates how to use ADF to copy data from Azure Blob Storage to Azure SQL DB.
In the last article we created the ADF service. Here we demonstrate the simplest possible ADF operation. Besides the ADF service itself, this article will create an Azure Blob Storage account and an Azure SQL Database to demonstrate the data transfer.
ADF includes a "Copy data" tool. With it, you can move data between data sources in different locations (on-premises or in the cloud). It supports essentially every mainstream data source you can think of; for the definitive list, see: Supported data stores and formats.
Here we introduce a concept: Integration Runtime (IR).
ADF currently supports 3 types of IR:
- Azure Integration Runtime: mainly for access over the public network.
- Self-Hosted Integration Runtime: used to access on-premises data sources on either the source or the target side.
- Azure-SSIS Integration Runtime: used to run SSIS packages.
ADF uses the IR to run copy activities securely across different network environments, and it picks the available region closest to the data source. The IR can be understood as the bridge between copy activities (copy activity) and linked services (Linked services).
Environment preparation
Here is a very common requirement: copy data from Azure Blob to SQL DB. This is a data replication operation that stays inside the cloud environment (Azure). To do this, let's quickly create an Azure Blob Storage account and a SQL DB; wherever the defaults work, just keep them.
Creating the Blob Storage
This series assumes you already know how to create basic Azure services, and because the budget is limited, it will try to choose low-spec services. Create the Blob Storage as shown in the figure below:
Creating the SQL DB
Again, the cheapest configuration is selected here:
Note that on the "Networking" options page, the default is "No access". In order for the copy from Blob Storage to be able to reach the DB, choose "Public endpoint" here, and under "Firewall rules" set "Allow Azure services and resources to access this server" to "Yes".
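If you prefer to script this networking step instead of using the portal, the sketch below shows roughly how the same firewall setting could be applied with the Python management SDK. It is only a sketch under assumptions: the subscription ID, resource group (adf-demo-rg) and server name (adf-demo-sqlserver) are placeholders I made up, and exact parameter shapes can vary between azure-mgmt-sql versions.

```python
# Hedged sketch: enable "Allow Azure services and resources to access this server".
# All resource names below are placeholders, not taken from the article.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "adf-demo-rg"               # placeholder
server_name = "adf-demo-sqlserver"           # placeholder

sql_client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

# The special 0.0.0.0 - 0.0.0.0 range is how "Allow Azure services and resources
# to access this server" is represented as a firewall rule.
sql_client.firewall_rules.create_or_update(
    resource_group,
    server_name,
    "AllowAllWindowsAzureIps",
    {"start_ip_address": "0.0.0.0", "end_ip_address": "0.0.0.0"},
)
```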
Once the resources are created, we can start the actual work.
Operation demo
First, connect to the DB and create a test table "ADFLab", as shown in the figure below:
Check whether there is any data in the table:
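The article does not list the table's column definitions, so the sketch below assumes a minimal two-column layout (ID, Name) purely for illustration; adjust it to whatever your test file contains. It uses pyodbc against the SQL DB created above, with placeholder connection values.

```python
# Hedged sketch: create the ADFLab test table and confirm it is empty.
# The column layout (ID, Name) and all connection values are assumptions for illustration.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=<your-db>;"
    "UID=<your-login>;PWD=<your-password>"
)

with pyodbc.connect(conn_str) as conn:
    cur = conn.cursor()
    cur.execute("""
        IF OBJECT_ID('dbo.ADFLab') IS NULL
            CREATE TABLE dbo.ADFLab (
                ID   INT          NULL,
                Name NVARCHAR(50) NULL
            );
    """)
    conn.commit()
    # Verify the table is currently empty before the copy runs.
    cur.execute("SELECT COUNT(*) FROM dbo.ADFLab;")
    print("Rows in ADFLab:", cur.fetchone()[0])
```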
Then prepare a test file and upload it to Blob Storage. It is a txt file that uses a comma "," as the separator. As you will see later, the reason for using commas is that the copy data tool assumes a comma separator by default; if you use anything else as the separator, extra configuration is needed. The contents of the file are as follows:
Upload it to the Blob Storage container (adflab):
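If you want to script the upload instead of using the portal, a minimal sketch with the azure-storage-blob package is shown below. The file name adflab.txt and the sample rows are placeholders I made up; only the container name (adflab) comes from the article.

```python
# Hedged sketch: upload the comma-delimited test file to the "adflab" container.
# The file name and sample contents are illustrative; the container name is from the article.
from azure.storage.blob import BlobServiceClient

connection_string = "<your-storage-connection-string>"  # placeholder
sample_rows = "1,Alice\n2,Bob\n3,Carol\n"                # made-up test data

service = BlobServiceClient.from_connection_string(connection_string)
blob = service.get_blob_client(container="adflab", blob="adflab.txt")
blob.upload_blob(sample_rows, overwrite=True)
print("Uploaded", blob.url)
```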
Once everything is ready, we can begin the ADF development work.
ADF Development
To use the copy data tool, first create a Pipeline, which will be used to copy the file in the blob container into Azure SQL DB.
Step 1: Open the Data Factory studio:
Step 2: Click 【New】 and select 【Pipeline】:
Step 3: In the new pipeline (pipeline1), drag the 【Copy data】 activity shown in the figure below onto the canvas on the right. Note that to save time, the steps here are not given proper names; in a formal environment, every component should be named with a meaningful identifier.
Step 4: Configure the source. Click in the order shown below and choose Blob Storage as the 【Source】:
Step 5: Select the format type of the data. Because we are using a txt file this time, choose DelimitedText, then click 【Continue】 at the bottom:
Step 6: Select or create a new linked service (linked service). Because this is a new environment, create a new one here:
Step 7: Create the new linked service and test the connection:
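The same linked service can also be registered through the Python management SDK; a minimal sketch is shown below, assuming a data factory named adf-demo-factory in resource group adf-demo-rg (both placeholders). Depending on the azure-mgmt-datafactory version, some reference models may additionally require a type argument.

```python
# Hedged sketch: register the Blob Storage linked service with the data factory.
# Factory / resource group names and the connection string are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureBlobStorageLinkedService,
    SecureString,
)

subscription_id = "<your-subscription-id>"  # placeholder
rg_name = "adf-demo-rg"                     # placeholder
df_name = "adf-demo-factory"                # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

storage_conn = SecureString(
    value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
)
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string=storage_conn)
)
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobStorageLS", blob_ls)
```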
After the connection test passes, you can click the red box in the figure below to browse the files inside the container graphically:
We can see the file we uploaded, which also confirms that the connection is working. Select this file as the data source:
After configuring the source, click 【Preview data】 to view the contents of the data; this also serves as a verification step:
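Under the covers, this source configuration corresponds to a DelimitedText dataset that points at the uploaded file and uses a comma as the column delimiter. The sketch below shows roughly how it could be defined with the SDK, continuing the placeholder names (adf_client, rg_name, df_name, adflab.txt) from the previous sketches.

```python
# Hedged sketch: a DelimitedText dataset over the uploaded blob (comma-delimited).
# Continues the placeholder names used in the earlier sketches.
from azure.mgmt.datafactory.models import (
    DatasetResource,
    DelimitedTextDataset,
    AzureBlobStorageLocation,
    LinkedServiceReference,
)

source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(reference_name="BlobStorageLS"),
        location=AzureBlobStorageLocation(container="adflab", file_name="adflab.txt"),
        column_delimiter=",",         # matches the comma-separated test file
        first_row_as_header=False,    # flip to True if the first row holds column names
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "BlobSourceDS", source_ds)
```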
Step 8: Configure the 【Sink】, i.e. the target. As shown in the figure below, select New and then choose Azure SQL Database:
Step 9: Similar to configuring the source, fill in the necessary information and click Test connection:
The completed configuration is shown in the figure below:
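The sink side follows the same pattern: an Azure SQL Database linked service plus a dataset that points at the ADFLab table. A hedged SDK sketch, again reusing the placeholder names from the earlier sketches:

```python
# Hedged sketch: sink linked service and dataset for the ADFLab table.
# Connection values are placeholders; only the table name comes from the article.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDatabaseLinkedService,
    DatasetResource,
    AzureSqlTableDataset,
    LinkedServiceReference,
    SecureString,
)

sql_conn = SecureString(
    value="Server=tcp:<your-server>.database.windows.net,1433;"
          "Database=<your-db>;User ID=<login>;Password=<password>;"
)
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=sql_conn)
)
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureSqlLS", sql_ls)

sink_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(reference_name="AzureSqlLS"),
        table_name="dbo.ADFLab",
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "SqlSinkDS", sink_ds)
```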
Step 10: Configure the mapping. Mapping describes how the structure of the source file (or other format) is parsed; in this case, it is the process of having ADF parse the Blob file structure and map it onto the database table:
Step 11: Configure 【Settings】. Just keep the defaults here for now:
Step 12: Configure 【User properties】. Clicking 【Auto generate】 under user properties loads some property information; you can also add whatever you need yourself.
Step 13: Then validate and debug:
An error is reported during debugging; looking at the details, you can see that the data types do not match up:
The error message is as follows:
Looking back at the configuration: the option shown in the figure below was not ticked, so tick it this time:
Click 【Schema】, then click 【Import schema】 to refresh the parsed result of the data source; you can see that 【Name】 has now been read:
Then go back to 【Mapping】 and 【Import schema】 again to refresh the structure; you can see that the source has been updated as well:
Debug again, and this time it succeeds.
The last step is to publish:
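For completeness, everything these UI steps assembled boils down to a pipeline with a single copy activity from the DelimitedText dataset to the SQL dataset. Below is a hedged SDK sketch of the equivalent definition plus a manual run, continuing the placeholder names from the earlier sketches; it is not the article's own method, just a scripted counterpart.

```python
# Hedged sketch: a pipeline with a single copy activity, then trigger a run.
# Continues the placeholder names (adf_client, rg_name, df_name) from above.
from azure.mgmt.datafactory.models import (
    PipelineResource,
    CopyActivity,
    DatasetReference,
    DelimitedTextSource,
    AzureSqlSink,
)

copy_activity = CopyActivity(
    name="CopyBlobToSql",
    inputs=[DatasetReference(reference_name="BlobSourceDS")],
    outputs=[DatasetReference(reference_name="SqlSinkDS")],
    source=DelimitedTextSource(),
    sink=AzureSqlSink(),
    # Column mapping (the "Mapping" tab in the UI) can be supplied via the
    # translator property; omitted here so ADF maps columns by position/name.
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyBlobToSqlPipeline", pipeline)

# Rough equivalent of triggering the pipeline after publishing.
run = adf_client.pipelines.create_run(rg_name, df_name, "CopyBlobToSqlPipeline", parameters={})
print("Pipeline run id:", run.run_id)
```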
Then query the table in the database again; you can see that the data has been imported successfully:
So far, we have loaded a simple file from Blob Storage into SQL DB. The process is very simple and quite idealized; in enterprise use, however, various environmental requirements bring a lot of extra configuration, such as authentication, network connectivity, and so on.
But as a first hands-on exercise for getting started, I think this level is enough. As this practice shows, an ETL newcomer still has many details to study and practice, but you should not be discouraged just because you run into all kinds of problems the first time.
The next article will try to copy data from a local environment to the cloud: 【Azure Data Platform】ETL Tools (3) -- Azure Data Factory Copy from a Local Data Source to Azure