Data Engineering with Azure (Part 2): Build Data Ingestion Pipeline with Azure Data Factory

Ahmad Maulana Malik Fattah
Data Engineering Indonesia
5 min read · Sep 3, 2022


Photo by Christopher Burns on Unsplash

In Part 1 of this series, we prepared all the resources we need to build our pipeline. In this part, we are going to tackle the main objective: build a pipeline that ingests data from an Azure SQL Database and moves it into a SQL Pool.

Let’s go tackle it!

Configure Connection to the Data Source

Go to the Data Factory Studio we already visited in the previous part. There, select the Data Factory name that matches the resource we’ve deployed. After that, click Continue.

Select the Data Factory resource

You should now see the Data Factory Studio dashboard. Since we want to ingest data, let’s choose the Ingest menu.

Choose Ingest menu on Data Factory Studio

Our task is simply to copy data from a data source, so choose the Built-in copy task. For now, let’s not schedule the pipeline; make sure to choose Run once now in the schedule configuration. Click Next after you have completed the configuration.

Configure the pipeline’s properties

A sidebar will appear to set the source connection. We want to ingest data from an Azure SQL Database resource, so search for ‘SQL’ and choose Azure SQL Database.

Set a new connection to Azure SQL Database

Next, configure how we want to connect to the resource. Here, let’s pick the resource from our Azure subscription instead of entering the details manually. Make sure that the server, database, and account credentials match the Azure SQL Database resource we deployed in Part 1: the server name is ‘nawasena’, the database name is ‘nawasenadb’, and the username and password are those of the SQL Server admin account.

Configure data source connection credential
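
Under the hood, the copy wizard saves this connection as a linked service in the Data Factory. For reference, here is a minimal sketch of the equivalent step using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory name, and credentials are placeholders you would replace with your own values.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import LinkedServiceResource, AzureSqlDatabaseLinkedService

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder
factory_name = "<data-factory-name>"    # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Connection string matching the resource from Part 1:
# server 'nawasena', database 'nawasenadb', SQL Server admin credentials
conn_str = (
    "Server=tcp:nawasena.database.windows.net,1433;"
    "Database=nawasenadb;User ID=<sql-admin-user>;Password=<sql-admin-password>;"
)

source_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=conn_str)
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureSqlDatabaseSource", source_ls
)
```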

When you’ve done with the configuration, do a Test connection to check did we configure it correctly. If the test result is a success, hit the Create button to create the connection to the Azure SQL Database resource.

Test and create a connection to the data source
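
If you want to sanity-check the same credentials outside the portal before hitting Test connection, a short Python script with pyodbc works too. This is purely an optional check, not part of the wizard; the ODBC driver version and the credentials below are assumptions and placeholders.

```python
# pip install pyodbc  (also requires the Microsoft ODBC Driver for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:nawasena.database.windows.net,1433;"
    "DATABASE=nawasenadb;"
    "UID=<sql-admin-user>;PWD=<sql-admin-password>;"  # placeholders
    "Encrypt=yes;TrustServerCertificate=no;"
)
print(conn.cursor().execute("SELECT 1").fetchone())  # (1, ) means the login works
conn.close()
```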

Next, in the Source data store section, set the data source configuration so it matches the connection we’ve just made. We could also choose which columns of the SQL table we want to ingest. When you’ve completed it, hit the Next button.

Choose the columns from the data source
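
Behind the scenes, picking a subset of columns simply turns the copy activity’s source into a query. As a rough illustration with the SDK models (the table and column names here are hypothetical):

```python
from azure.mgmt.datafactory.models import AzureSqlSource

# Ingest only the columns we need (hypothetical table and columns)
source = AzureSqlSource(
    sql_reader_query="SELECT Id, Name, CreatedAt FROM dbo.Customers"
)
```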

Great! Next, let’s…

Configure Connection to the Destination Data Store

Now you will see a new pop-up to set the destination data store. Create a New connection and choose Azure Synapse Analytics as the destination.

Configure the connection to the SQL Pool resource we created inside Synapse Analytics earlier. As with the source data store, run Test connection for the destination too. If everything is okay, hit the Create button.

Configure data destination connection credential
Test and create a connection to the data destination
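
For reference, the equivalent destination connection in the Python SDK uses the Azure Synapse Analytics (formerly SQL Data Warehouse) linked service type. Again a minimal sketch with placeholder names rather than the exact values of our workspace:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import LinkedServiceResource, AzureSqlDWLinkedService

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Dedicated SQL pool inside the Synapse workspace from Part 1 (placeholders)
sink_conn_str = (
    "Server=tcp:<synapse-workspace>.sql.azuresynapse.net,1433;"
    "Database=<sql-pool-name>;User ID=<sql-admin-user>;Password=<sql-admin-password>;"
)
sink_ls = LinkedServiceResource(
    properties=AzureSqlDWLinkedService(connection_string=sink_conn_str)
)
adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory-name>", "SynapseSqlPoolSink", sink_ls
)
```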

You will be directed to map the columns from the source to the destination data store. Ensure every column in the data source has a matching column in the destination. Next, under Column mapping, we can choose whether or not to convert the data types. In this article, let’s uncheck Type conversion so the data types in the destination remain the same as in the source. Then hit the Next button.

Map columns between the data source and destination
Disable Type conversion
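
Behind the scenes, this mapping step becomes a ‘translator’ on the copy activity. Here is a sketch of explicit mappings with type conversion disabled, assuming a recent version of azure-mgmt-datafactory and hypothetical column names:

```python
from azure.mgmt.datafactory.models import TabularTranslator

# Explicit source-to-sink column mapping with Type conversion unchecked
translator = TabularTranslator(
    mappings=[
        {"source": {"name": "Id"}, "sink": {"name": "Id"}},
        {"source": {"name": "Name"}, "sink": {"name": "Name"}},
    ],
    type_conversion=False,
)
```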

Nice, we’re almost done! Let’s make the final settings.

Configure Staging Area

Now we are in the Settings section. Set the task name and description of the data ingestion task.

We can also configure a staging area between the source and destination resources. A staging area is an intermediate store used to hold and process data during ETL (Extract, Transform, Load) tasks. For our data ingestion pipeline, a staging area is not strictly required. However, let’s keep Enable staging checked just to get familiar with the feature.

In the Staging account linked service, click New.

Set the task, description, and staging area

Create the staging connection to an Azure Blob Storage resource. Next, expand the Advanced section and set the Copy method to PolyBase. Uncheck the Use type default setting and hit the Next button.

Create a connection to the staging area
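
Putting the staging and PolyBase choices together, the copy activity generated by the wizard roughly corresponds to the sketch below. The dataset and linked service names are hypothetical, and the blob container path is a placeholder.

```python
from azure.mgmt.datafactory.models import (
    CopyActivity,
    AzureSqlSource,
    SqlDWSink,
    PolybaseSettings,
    StagingSettings,
    LinkedServiceReference,
    DatasetReference,
)

copy_activity = CopyActivity(
    name="CopyFromAzureSqlToSqlPool",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkDataset")],
    source=AzureSqlSource(),
    # Copy method set to PolyBase with 'Use type default' unchecked
    sink=SqlDWSink(
        allow_poly_base=True,
        poly_base_settings=PolybaseSettings(use_type_default=False),
    ),
    # Staging enabled on an Azure Blob Storage linked service
    enable_staging=True,
    staging_settings=StagingSettings(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="StagingBlobStorage"
        ),
        path="staging-container",  # placeholder container name
    ),
)
# The activity would then be wrapped in a PipelineResource and deployed with
# adf_client.pipelines.create_or_update(...)
```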

Now that all the configuration is done, let’s review the Summary and make sure everything is well configured. Finally, hit Next to deploy our pipeline.

Review the pipeline configuration

Monitor Deployed Pipeline

After the deployment finishes successfully, go to the Monitor section to get insight into the pipeline status. On the Pipeline runs page, choose our deployed pipeline. We can monitor the activities by clicking the glasses icon to the right of the activity name.

Monitor pipeline activities
Pipeline activity details
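
The same monitoring information is available programmatically. Here is a rough sketch that queries the pipeline runs of the last 24 hours and drills into their activities (placeholders as before):

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
resource_group, factory_name = "<resource-group>", "<data-factory-name>"  # placeholders

# Pipeline runs from the last 24 hours (what the Pipeline runs page shows)
now = datetime.now(timezone.utc)
filters = RunFilterParameters(last_updated_after=now - timedelta(days=1), last_updated_before=now)
runs = adf_client.pipeline_runs.query_by_factory(resource_group, factory_name, filters)

for run in runs.value:
    print(run.pipeline_name, run.status)
    # Activity-level details (the view behind the glasses icon)
    activities = adf_client.activity_runs.query_by_pipeline_run(
        resource_group, factory_name, run.run_id, filters
    )
    for activity in activities.value:
        print("  ", activity.activity_name, activity.status)
```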

Nice, that’s a wrap!

We’ve explored the data ingestion feature of Azure Data Factory. It has many other features we could explore and use in the industry. Besides the Data Factory Studio, Azure also provides SDKs to interact with Data Factory through code.
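
As a small taste of that SDK, here is a hedged sketch that re-runs the deployed pipeline by name and polls its status; the pipeline name is whatever you entered as the task name in the Settings step, and the other values are placeholders.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
resource_group, factory_name = "<resource-group>", "<data-factory-name>"  # placeholders

# Trigger the pipeline we deployed through the copy wizard (name is a placeholder)
run = adf_client.pipelines.create_run(resource_group, factory_name, "<pipeline-name>")

# Poll until the run reaches a terminal state (Succeeded, Failed, or Cancelled)
while True:
    status = adf_client.pipeline_runs.get(resource_group, factory_name, run.run_id).status
    print("Pipeline run status:", status)
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
```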

Stay tuned for the next Data Engineering exploration articles and let me know if you have any thoughts about the article!
