This blog post covers the Hands-On Labs you need to perform to learn Data Engineering on Microsoft Azure and to clear the Data Engineering on Microsoft Azure [DP-203] certification.
This post supports your Data Engineering on Microsoft Azure journey, whether you are learning at your own pace or as part of a team. There are 17 Hands-On Labs in this course:
- Explore compute and storage options for data engineering workloads
- Designing and Implementing the Serving Layer
- Data engineering considerations
- Run interactive queries using serverless SQL pools
- Explore, transform, and load data into the Data Warehouse using Apache Spark
- Data Exploration and Transformation in Azure Databricks
- Ingest and load Data into the Data Warehouse
- Transform Data with Azure Data Factory or Azure Synapse Pipelines
- Orchestrate data movement and transformation in Azure Synapse Pipelines
- Optimize Query Performance with Dedicated SQL Pools in Azure Synapse
- Analyze and Optimize Data Warehouse Storage
- Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link
- End-to-end security with Azure Synapse Analytics
- Real-time Stream Processing with Stream Analytics
- Create a Stream Processing Solution with Event Hubs and Azure Databricks
- Build reports using Power BI integration with Azure Synapse Analytics
- Perform Integrated Machine Learning Processes in Azure Synapse Analytics
LAB 1: Explore Compute And Storage Options For Data Engineering Workloads
In this lab, you’ll interact with two compute technologies, Azure Databricks and Azure Synapse Analytics Spark pools, but deliberately not in any real depth. The objective of the lab is to show how these compute technologies interact with the primary data storage option for analytical workloads in Azure: Azure Data Lake Storage. Azure Data Lake Storage Gen2 is enabled by turning on the hierarchical namespace when you create the Azure Storage account. The lab does, however, provide an in-depth focus on two primary considerations when working with Azure Data Lake Storage Gen2 from data engineering compute technologies:
- Organizing the data lake folders to be optimized for data exploration, loading, and querying
- Using compute libraries to optimize the querying of files in a data lake
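The lab itself does this exploration with Spark, but the benefit of a well-organized folder layout is easy to see even from a serverless SQL pool. The sketch below is illustrative only: the storage URL, the year=/month= partition folders, and the column names are hypothetical, and the filepath() function is used so the query prunes folders instead of scanning the whole lake.

```sql
-- Hypothetical date-partitioned layout: .../sales/year=2021/month=06/*.parquet
-- filepath(n) returns the value matched by the n-th wildcard, which lets the engine skip folders.
select
    rows.filepath(1) as sale_year,
    rows.filepath(2) as sale_month,
    count(*)         as file_rows
from openrowset(
    bulk 'https://yourdatalake.dfs.core.windows.net/data/sales/year=*/month=*/*.parquet',
    format = 'parquet'
) as rows
where rows.filepath(1) = '2021'
group by rows.filepath(1), rows.filepath(2);
```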
LAB 2: Designing And Implementing The Serving Layer
In this lab, you’ll create a star schema in a SQL database using foreign key constraints. You’ll also create a snowflake schema in a SQL database and then explore a common way of managing a shared dimension: the time dimension. Finally, you’ll create a star schema in Azure Synapse Analytics and learn how to update a dimension table by loading data into it with Azure Synapse pipelines.
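As a point of reference, a star schema in an Azure SQL database is simply dimension tables plus a fact table whose foreign keys point at them. The sketch below is a minimal, hypothetical schema (table and column names are not the lab’s exact ones); note that Azure Synapse dedicated SQL pools do not enforce foreign key constraints, so the Synapse part of the lab relies on the load process to keep keys consistent.

```sql
-- Minimal illustrative star schema: two dimensions and one fact table.
create table dbo.DimCustomer (
    CustomerKey  int identity(1,1) primary key,
    CustomerName nvarchar(100) not null
);

create table dbo.DimDate (
    DateKey  int primary key,   -- e.g. 20210630
    FullDate date not null
);

create table dbo.FactSale (
    SaleKey     bigint identity(1,1) primary key,
    CustomerKey int not null,
    DateKey     int not null,
    SaleAmount  decimal(18,2) not null,
    constraint FK_FactSale_DimCustomer foreign key (CustomerKey) references dbo.DimCustomer (CustomerKey),
    constraint FK_FactSale_DimDate     foreign key (DateKey)     references dbo.DimDate (DateKey)
);
```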
LAB 3: Data Engineering Considerations
In this lab, you’ll establish a basis for discussion around modern data warehouse patterns, file formats and folder structures, and security. The process of building a modern data warehouse typically consists of:
- Data Ingestion and Preparation.
- Making the data ready for consumption by analytical tools.
- Providing access to the data, in a shaped format so that it can easily be consumed by data visualization tools.
LAB 4: Run Interactive Queries Using Serverless SQL Pools
In this lab, you’ll learn how to work with files stored in the data lake and external file sources, through T-SQL statements executed by a serverless SQL pool in Azure Synapse Analytics. You’ll query Parquet files stored in a data lake, as well as CSV files stored in an external data store. Next, you’ll create Azure Active Directory security groups and enforce access to files in the data lake through Role-Based Access Control (RBAC) and Access Control Lists (ACLs).
```sql
-- Read a parquet file directly from the data lake
select top 10 *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet',
    format = 'parquet'
) as rows;

-- Read a parquet file through an external data source
create external data source covid
with (
    location = 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases'
);
go

select top 10 *
from openrowset(
    bulk 'latest/ecdc_cases.parquet',
    data_source = 'covid',
    format = 'parquet'
) as rows;

-- Explicitly specify a schema
select top 10 *
from openrowset(
    bulk 'latest/ecdc_cases.parquet',
    data_source = 'covid',
    format = 'parquet'
) with (
    date_rep date,
    cases int,
    geo_id varchar(6)
) as rows;
```
LAB 5: Explore, Transform, And Load Data Into The Data Warehouse Using Apache Spark
In this lab, you’ll explore data stored in a data lake, transform the data, and load data into a relational data store. You’ll explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. Then you will use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data in the dedicated SQL pool. In this lab you will:
- Perform Data Exploration in Synapse Studio
- Ingest data with Spark notebooks in Azure Synapse Analytics
- Transform data with Data Frames in Spark pools in Azure Synapse Analytics
- Integrate SQL and Spark pools in Azure Synapse Analytics
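The exploration and loading in this lab happen in Spark notebooks, but the SQL side of the integration can be sketched in T-SQL as well: an external table over the lake’s Parquet files can be joined directly to a dedicated SQL pool table. Everything below (data source, file format, paths, table and column names) is hypothetical, and the database scoped credential needed for secured storage is omitted for brevity.

```sql
-- Hypothetical external table over Parquet files in the data lake (dedicated SQL pool, PolyBase).
create external file format ParquetFormat
with (format_type = parquet);

create external data source DataLake
with (
    location = 'abfss://data@yourdatalake.dfs.core.windows.net',
    type     = hadoop     -- credential omitted; assumes storage access is already granted
);

create external table dbo.SaleExternal (
    CustomerId int,
    SaleAmount decimal(18,2)
)
with (
    location    = '/sales/2021/',
    data_source = DataLake,
    file_format = ParquetFormat
);

-- Join lake data with a table that already lives in the dedicated SQL pool.
select c.CustomerName, sum(s.SaleAmount) as TotalSales
from dbo.SaleExternal as s
join dbo.DimCustomer  as c on c.CustomerId = s.CustomerId
group by c.CustomerName;
```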
LAB 6: Data Exploration And Transformation In Azure Databricks
In this lab, you’ll take the opportunity to explore how various Apache Spark DataFrame methods can be used to explore and transform data in Azure Databricks. You’ll learn how to apply standard DataFrame methods to explore and transform data, and then move on to more advanced tasks such as removing duplicate data, manipulating date/time values, renaming columns, and aggregating data.
The following exercises will be performed during the lab:
- Use DataFrames in Azure Databricks to explore and filter data
- Cache a DataFrame for faster subsequent queries
- Remove duplicate data
- Manipulate date/time values
- Remove and rename DataFrame columns
- Aggregate data stored in a DataFrame
LAB 7: Ingest And Load Data Into The Data Warehouse
In this lab, you’ll learn how to ingest data into the data warehouse through T-SQL scripts and Synapse Analytics integration pipelines. You’ll learn how to load data into Synapse dedicated SQL pools with PolyBase and COPY using T-SQL, and how to use workload management along with a Copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.
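For the COPY path specifically, a minimal sketch looks like the statement below. The target table, storage URL, and authentication method are placeholders; the lab’s own scripts define the real ones.

```sql
-- Load Parquet files from the data lake into an existing dedicated SQL pool table.
-- Assumes dbo.StagingSale exists and the pool's managed identity can read the storage account.
copy into dbo.StagingSale
from 'https://yourdatalake.dfs.core.windows.net/data/sales/2021/*.parquet'
with (
    file_type  = 'PARQUET',
    credential = (identity = 'Managed Identity')
);
```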
LAB 8: Transform Data With Azure Data Factory Or Azure Synapse Pipelines
In this lab, you’ll learn how to build data integration pipelines to ingest data from multiple data sources, transform data using mapping data flows and notebooks, and perform data movement into one or more data sinks.
LAB 9: Orchestrate Data Movement And Transformation In Azure Synapse Pipelines
In this lab, you’ll create a notebook to query user activity, then add the notebook to a pipeline using the new Notebook activity and execute it after the Mapping Data Flow as part of the orchestration process. While configuring this, you’ll implement parameters to add dynamic content in the control flow and validate how the parameters can be used.
LAB 10: Optimize Query Performance With Dedicated SQL Pools In Azure Synapse
In this lab, you’ll use window functions to perform calculations over a set of rows. You will explore the OVER clause, aggregate functions, and analytic functions, and use the ROWS clause to see the different ways you can apply windowing functions in your data warehouse. You will also see an example of how the APPROX_COUNT_DISTINCT function works. You will then explore optimizing the data warehouse in Azure Synapse Analytics using a range of features, including table distribution, indexing, and partitioning, and look at how to further improve query performance using materialized views, result set caching, and up-to-date indexes and statistics.
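As a quick orientation, the sketch below shows an OVER clause with a ROWS frame for a running total, plus APPROX_COUNT_DISTINCT; the table and column names are illustrative, not the lab’s exact schema.

```sql
-- Running total per customer using OVER with an explicit ROWS frame (illustrative schema).
select
    CustomerKey,
    OrderDateKey,
    SaleAmount,
    sum(SaleAmount) over (
        partition by CustomerKey
        order by OrderDateKey
        rows between unbounded preceding and current row
    ) as RunningTotal
from dbo.FactSale;

-- Approximate distinct count: faster than COUNT(DISTINCT ...) on very large tables.
select approx_count_distinct(CustomerKey) as ApproxCustomers
from dbo.FactSale;
```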
LAB 11: Analyze And Optimize Data Warehouse Storage
This lab explains how to analyze and optimize the data storage of Azure Synapse dedicated SQL pools. You will learn the right approach to understanding table space usage and columnstore storage details. Next, you will compare storage requirements between identical tables that use different data types. Finally, you will monitor the impact materialized views have when executed in place of complex queries and learn how to avoid extensive logging by optimizing delete operations.
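Two of the patterns this lab relies on can be sketched briefly: checking table space usage per distribution, and replacing a heavily logged DELETE with a CTAS-and-rename. The table names and filter predicate below are placeholders.

```sql
-- Space used by a table across the distributions of a dedicated SQL pool.
dbcc pdw_showspaceused("dbo.FactSale");

-- Minimally logged alternative to a large DELETE: keep the rows you want with CTAS, then swap.
create table dbo.FactSale_New
with (distribution = hash(CustomerKey), clustered columnstore index)
as
select * from dbo.FactSale
where OrderDateKey >= 20200101;

rename object dbo.FactSale     to FactSale_Old;
rename object dbo.FactSale_New to FactSale;
```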
LAB 12: Support Hybrid Transactional Analytical Processing (HTAP) With Azure Synapse Link
In this lab, you’ll learn how Azure Synapse Link enables seamless connectivity between an Azure Cosmos DB account and an Azure Synapse workspace. You will understand how to enable and configure Azure Synapse Link, and then how to query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL.
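From the serverless SQL side, querying the analytical store comes down to a single OPENROWSET call; the sketch below uses placeholder account, database, key, and container names.

```sql
-- Query the Cosmos DB analytical store from a serverless SQL pool (placeholder names and key).
select top 10 *
from openrowset(
    'CosmosDB',
    'Account=yourcosmosaccount;Database=yourdatabase;Key=youraccountkey',
    YourContainer
) as documents;
```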
LAB 13: End-To-End Security With Azure Synapse Analytics
In this lab, you’ll learn how to secure a Synapse Analytics workspace and its supporting infrastructure. You’ll examine the SQL Active Directory Admin, manage IP firewall rules, manage secrets with Azure Key Vault and access those secrets through a Key Vault linked service and pipeline activities. Then you’ll understand how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.
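The three data-protection features map to short T-SQL statements; the sketch below uses placeholder principals, tables, and columns rather than the lab’s exact objects.

```sql
-- Column-level security: grant access to specific columns only.
grant select on dbo.Customer (CustomerId, CustomerName) to DataAnalyst;

-- Dynamic data masking on a sensitive column.
alter table dbo.Customer
alter column Email add masked with (function = 'email()');

-- Row-level security: filter rows to the logged-in user via a predicate function and policy.
create function dbo.fn_SecurityPredicate(@SalesRep as nvarchar(128))
returns table
with schemabinding
as
return select 1 as fn_result where @SalesRep = user_name();
go

create security policy SalesFilter
add filter predicate dbo.fn_SecurityPredicate(SalesRep) on dbo.Sale
with (state = on);
```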
LAB 14: Real-time Stream Processing With Stream Analytics
This lab covers how to process streaming data with Azure Stream Analytics. You’ll ingest data into Event Hubs, then process that data in real-time, using various windowing functions in Azure Stream Analytics. You’ll output the data to Azure Synapse Analytics. Finally, you will learn how to scale the Stream Analytics job to increase throughput.
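A typical job query, written in the Stream Analytics query language (a SQL-like dialect), looks like the sketch below; the input and output names are the aliases you define on the job, and the column names are placeholders.

```sql
-- Count events per sensor in 30-second tumbling windows (input/output names are job aliases).
select
    SensorId,
    count(*) as EventCount,
    System.Timestamp() as WindowEnd
into SynapseOutput
from EventHubInput timestamp by EventEnqueuedUtcTime
group by SensorId, TumblingWindow(second, 30)
```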
LAB 15: Create A Stream Processing Solution With Event Hubs And Azure Databricks
In this lab, you’ll learn how to ingest and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. You will learn the key features and uses of Structured Streaming. You will implement sliding windows to aggregate over chunks of data and apply watermarking to remove stale data. Finally, you will connect to Event Hubs to read and write streams.
LAB 16: Build Reports Using Power BI Integration With Azure Synapse Analytics
In this lab, you’ll learn how to integrate Power BI with your Azure Synapse workspace to build reports in Power BI. You’ll create a new data source and Power BI report in Azure Synapse Studio. Then you’ll learn how to improve query performance with materialized views and result-set caching. Finally, you’ll explore the data lake with serverless SQL pools and create visualizations against that data in Power BI.
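The two performance features can be sketched in a few lines of T-SQL; the view definition and database name below are placeholders.

```sql
-- Materialized view that pre-aggregates a common report query (illustrative schema).
create materialized view dbo.SalesByRegion
with (distribution = hash(RegionKey))
as
select RegionKey, count_big(*) as RowCnt, sum(SaleAmount) as TotalSales
from dbo.FactSale
group by RegionKey;

-- Result set caching is switched on per database (run from the master database).
alter database YourDedicatedPool set result_set_caching on;
```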
LAB 17: Perform Integrated Machine Learning Processes In Azure Synapse Analytics
In this lab, you’ll explore the integrated, end-to-end Azure Machine Learning and Azure Cognitive Services experience in Azure Synapse Analytics. You will learn how to connect an Azure Synapse Analytics workspace to an Azure Machine Learning workspace using a linked service and then trigger an Automated ML experiment that uses data from a Spark table. You’ll also learn how to use trained models from Azure Machine Learning or Azure Cognitive Services to enrich data in a SQL pool table and then serve prediction results using Power BI.
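The "enrich data in a SQL pool table" step typically comes down to the T-SQL PREDICT function scoring a model exported from Azure Machine Learning in ONNX format; the sketch below is only an outline, with placeholder model, input, and output names.

```sql
-- Score rows in the dedicated SQL pool with a stored ONNX model (placeholder names).
select d.*, p.PredictedLabel
from predict(
    model   = (select Model from dbo.Models where ModelName = 'CustomerChurn'),
    data    = dbo.CustomerFeatures as d,
    runtime = onnx
) with (PredictedLabel varchar(10)) as p;
```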
Related/References
- Exam DP-203: Data Engineering on Microsoft Azure
- Azure Data Lake For Beginners: All you Need To Know
- Batch Processing Vs Stream Processing: All you Need To Know
- Introduction to Big Data and Big Data Architectures
- Designing And Automate An Enterprise BI solution In Azure
- Azure Data Science And Data Engineering Certifications: DP-900 vs DP-100 vs DP-200/DP-201
Next Task For You
In our Azure Data Engineer training program, we cover these 17 Hands-On Labs. If you want to begin your journey towards becoming a Microsoft Certified: Azure Data Engineer Associate, check out our FREE CLASS.