This blog covers the Step-By-Step Activity Guides of the Microsoft Azure Data Engineer Associate [DP-200 & DP-201] Hands-On Labs Training program that you must perform to learn this course.
An Azure Data Engineer designs and implements the management, security, monitoring, and privacy of data using the full stack of Azure data services to satisfy business needs.
The walkthrough of the Step-By-Step Activity Guides of the Microsoft Azure Data Engineer Associate [DP-200 & DP-201] Training program will prepare you thoroughly for the DP-200 & DP-201 certifications.
DP-200 | Implementing an Azure Data Solution
- Azure for the Data Engineer.
- Working with Data Storage.
- Enabling Team-Based Data Science with Azure Databricks.
- Building a Globally Distributed Database with Cosmos DB.
- Working with Relational Data Stores in the Cloud.
- Performing Real-Time Analytics with Stream Analytics.
- Orchestrating Data Movement with Azure Data Factory.
- Securing Azure Data Platforms.
- Monitoring and Troubleshooting Data Storage and Processing.
DP-201 | Designing an Azure Data Solution
- Case Study
- Architect an Enterprise-grade Conversational Bot in Azure
- Azure Real-Time Reference Architectures
- Data Platform Security Design Considerations
- Designing for Resiliency and Scale
- Design for Efficiency and Operations
Here’s a quick guide on how to start learning Data Engineering on Azure and clear the Azure Data Engineer Associate certification by doing hands-on labs.
To know more about the Azure Data Engineer Associate certification, click here.
Skills Measured in Exam DP-200
- Implement Data Storage Solutions (40-45%)
- Manage and Develop Data Processing (25-30%)
- Monitor and Optimize Data Solutions (30-35%)
Lab 1: Azure for the Data Engineer
Exercise 1: Identify the evolving world of data
- In this exercise, we’ll identify the data requirements from the case study and determine whether the data for each requirement is structured, semi-structured, or unstructured.
- Non-relational: document data, graph data, column-family data, etc.
- Relational: data stored in tables, e.g. customer and employee tables.
Exercise 2: Determine the Azure Data Platform services
- In this exercise, we’ll determine the data platform technology that delivers the identified data requirements.
Exercise 3: Identify the tasks to be performed by the Data Engineer
- In this exercise, we’ll select one of the requirements and determine the high-level tasks that a data engineer will perform to meet it, such as:
- Provisioning data storage services.
- Ingesting streaming and batch data.
- Transforming data.
Exercise 4: Finalize the data engineering deliverables
- In this exercise, we’ll finalize the data engineering deliverables for AdventureWorks.
Lab 2: Working with Data Storage
Exercise 1: Choose a data storage approach in Azure
- In this exercise, we’ll identify the data storage requirements for the static images for the website, and for the predictive analytics solution from the case study.
- Each data set has different requirements, and it’s our job to figure out which storage solution is best.
Exercise 2: Create an Azure Storage Account
- In this exercise, we’ll create an Azure resource group in the region closest to the lab location.
- Create containers named images, phonecalls, and tweets within the storage account.
- Upload some graphics to the images container of the storage account (a Python sketch of these steps follows below).
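The lab performs these steps in the Azure portal; the following is a minimal Python sketch of the same container creation and upload using the azure-storage-blob SDK. The connection string, container names, and local file path are placeholders.

```python
from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's "Access keys" blade (placeholder).
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")

# Create the three containers used in the lab.
for name in ("images", "phonecalls", "tweets"):
    service.create_container(name)

# Upload a sample graphic to the images container.
images = service.get_container_client("images")
with open("logo.png", "rb") as data:  # hypothetical local graphic file
    images.upload_blob(name="logo.png", data=data, overwrite=True)
```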
Exercise 3: Explain Azure Data Lake Storage
- In this exercise, we’ll create and configure a storage account as a Data Lake Storage Gen2 storage type in the region closest to the lab location, within the resource group.
Exercise 4: Upload data into Azure Data Lake
- In this exercise, we’ll install and start Microsoft Azure Storage Explorer and upload some data files to the containers of the Data Lake Storage Gen2 account; an SDK-based sketch follows below.
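Storage Explorer handles the upload through its UI; a roughly equivalent sketch with the azure-storage-file-datalake SDK is shown below. The account name, key, file system, and paths are placeholders.

```python
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<datalake-account>.dfs.core.windows.net",
    credential="<account-key>",  # placeholder; prefer Azure AD credentials in practice
)

fs = service.get_file_system_client("data")   # the container (file system) used in the lab
fs.create_directory("raw")                    # target folder for the sample files

with open("sales.csv", "rb") as data:         # hypothetical local data file
    fs.get_file_client("raw/sales.csv").upload_data(data, overwrite=True)
```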
Lab 3: Enabling Team-Based Data Science with Azure Databricks
Exercise 1: Explain Azure Databricks
- Azure Databricks is an easy-to-set-up data analytics platform based on the Apache Spark “big data” engine.
Exercise 2: Work with Azure Databricks
- In this exercise, we’ll create an Azure Databricks Premium tier instance in a resource group, open Azure Databricks, launch the Databricks workspace, and create a Spark cluster.
Exercise 3: Read data with Azure Databricks
- In this exercise, we’ll confirm that the Databricks cluster has been created, collect the Azure Data Lake Storage Gen2 account name, and enable the Databricks instance to access the Data Lake Storage Gen2 account.
- We’ll create a Databricks notebook, connect to the Data Lake Store, and then read data in Azure Databricks (see the notebook sketch below).
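A minimal sketch of the notebook cell that does this, assuming access-key authentication to the Data Lake Storage Gen2 account (account, container, and file names are placeholders):

```python
# Databricks notebook cell: configure access to ADLS Gen2 with the storage account key.
account = "<datalake-account>"
spark.conf.set(
    f"fs.azure.account.key.{account}.dfs.core.windows.net",
    "<account-key>",  # in practice, store this in a Databricks secret scope
)

# Read a CSV file from the Data Lake into a Spark DataFrame.
path = f"abfss://data@{account}.dfs.core.windows.net/raw/sales.csv"
df = spark.read.csv(path, header=True, inferSchema=True)

display(df)  # Databricks notebook helper; use df.show() elsewhere
```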
Exercise 4: Perform basic transformations with Azure Databricks
- In this exercise, we’ll retrieve specific columns of a dataset, rename a column, and add an annotation. If time permits, we’ll perform additional transformations (a sample cell follows below).
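Continuing the notebook sketch from Exercise 3, these transformations might look like the following; the column names are assumptions for illustration.

```python
from pyspark.sql import functions as F

transformed = (
    df.select("OrderId", "CustomerName", "Amount")        # retrieve specific columns
      .withColumnRenamed("Amount", "OrderAmount")         # rename a column
      .withColumn("Source", F.lit("ADLS Gen2 raw zone"))  # add an annotation column
)

display(transformed)
```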
Lab 4: Building Globally Distributed Databases with Cosmos DB
Exercise 1: Create an Azure Cosmos DB database built to scale
- In this exercise, we’ll create an Azure Cosmos DB instance.
Exercise 2: Insert and query data in your Azure Cosmos DB database
- In this exercise, we’ll set up an Azure Cosmos DB database and container and then add data using the portal.
- We’ll run queries in the Azure portal and run complex operations on our data; an SDK-based sketch follows below.
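The portal’s Data Explorer is used in the lab; the following is a minimal sketch of the same insert-and-query flow with the azure-cosmos Python SDK. The endpoint, key, database, container, and item shape are placeholders.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://<cosmos-account>.documents.azure.com:443/",
    credential="<primary-key>",
)
database = client.create_database_if_not_exists("RetailDemo")
container = database.create_container_if_not_exists(
    id="Products",
    partition_key=PartitionKey(path="/category"),
)

# Add (upsert) a document.
container.upsert_item({"id": "1", "category": "bikes", "name": "Road Bike", "price": 799})

# Query it back with the same SQL-like syntax used in the portal.
for item in container.query_items(
    query="SELECT c.name, c.price FROM c WHERE c.category = 'bikes'",
    enable_cross_partition_query=True,
):
    print(item)
```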
Exercise 3: Distribute your data globally with Azure Cosmos DB
- In this exercise, we’ll replicate data to multiple regions and manage failover.
Lab 5: Working with Relational Data Stores in the Cloud
Exercise 1: Use Azure SQL Database
- Azure SQL Database is a “database as a service” offering that runs the SQL Server database engine under the hood. It is not 100% compatible with on-premises SQL Server: some SQL Server features are not supported, so minor changes to our code might be required.
- In this exercise, we’ll create and configure a SQL Database instance and connect to it, as sketched below.
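A minimal sketch of connecting to the new database from Python with pyodbc; server, database, and login are placeholders, and the client IP must first be allowed through the server firewall.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<server-name>.database.windows.net,1433;"
    "DATABASE=<database-name>;"
    "UID=<sql-admin-user>;PWD=<password>;"
    "Encrypt=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")   # quick sanity check that the connection works
print(cursor.fetchone()[0])
conn.close()
```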
Exercise 2: Describe Azure Synapse Analytics
- In this exercise, we’ll create and configure an Azure Synapse Analytics instance, configure the server firewall, and then pause the warehouse database.
Exercise 3: Creating an Azure Synapse Analytics database and tables
- In this exercise, we’ll install SQL Server Management Studio, connect to the data warehouse instance, create a SQL Data Warehouse database, and create SQL Data Warehouse tables (see the DDL sketch below).
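The lab runs the DDL from SSMS; the sketch below sends an equivalent CREATE TABLE to the data warehouse through pyodbc. The distribution choice, table, and column names are assumptions for illustration.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<synapse-server>.database.windows.net,1433;"
    "DATABASE=<sql-pool-name>;UID=<admin-user>;PWD=<password>;Encrypt=yes;"
)
conn.autocommit = True

conn.execute("""
CREATE TABLE dbo.FactSales
(
    SaleId     INT           NOT NULL,
    CustomerId INT           NOT NULL,
    SaleAmount DECIMAL(18,2) NOT NULL,
    SaleDate   DATE          NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),   -- spread rows across the distributions
    CLUSTERED COLUMNSTORE INDEX        -- default, analytics-friendly storage
);
""")
```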
Exercise 4: Using PolyBase to Load Data into Azure Synapse Analytics
- PolyBase allows us to query external data sources such as SQL Server, Oracle, Teradata, MongoDB, and Azure Blob Storage.
- In this exercise, we’ll collect the Data Lake Storage container and key details and then create a dbo.Dates table from Azure Data Lake Storage using PolyBase, as sketched below.
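A compressed sketch of the PolyBase pattern the exercise follows (scoped credential, external data source, file format, external table, then CTAS into dbo.Dates), sent as T-SQL through pyodbc; the lab runs the same statements in SSMS. The storage account, container, key, and column names are placeholders.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<synapse-server>.database.windows.net,1433;"
    "DATABASE=<sql-pool-name>;UID=<admin-user>;PWD=<password>;Encrypt=yes;"
)
conn.autocommit = True

statements = [
    "CREATE MASTER KEY;",  # once per database, required before the scoped credential

    """CREATE DATABASE SCOPED CREDENTIAL DataLakeCredential
       WITH IDENTITY = 'user', SECRET = '<storage-account-key>';""",

    """CREATE EXTERNAL DATA SOURCE DataLakeStorage
       WITH (TYPE = HADOOP,
             LOCATION = 'abfss://data@<datalake-account>.dfs.core.windows.net',
             CREDENTIAL = DataLakeCredential);""",

    """CREATE EXTERNAL FILE FORMAT CsvFormat
       WITH (FORMAT_TYPE = DELIMITEDTEXT,
             FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));""",

    """CREATE EXTERNAL TABLE dbo.DatesExternal (DateKey INT, CalendarDate DATE)
       WITH (LOCATION = '/Dates/', DATA_SOURCE = DataLakeStorage, FILE_FORMAT = CsvFormat);""",

    """CREATE TABLE dbo.Dates
       WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
       AS SELECT * FROM dbo.DatesExternal;""",
]

for sql in statements:
    conn.execute(sql)
```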
Lab 6: Performing Real-Time Analytics with Stream Analytics
Exercise 1: Explain data streams and event processing
- In this exercise, we’ll identify the data stream ingestion technology for AdventureWorks and the high-level tasks that a data engineer will conduct to complete the social media analysis requirements from the case study and the scenario.
Exercise 2: Data Ingestion with Event Hubs
- In this exercise, we’ll create and configure an Event Hubs namespace, create an event hub, and configure Event Hubs security; a small producer sketch follows below.
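The lab’s telecom event generator application does the sending; the following is a minimal Python sketch of pushing a test event to the same event hub with the azure-eventhub SDK. The connection string, hub name, and payload are placeholders.

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",
    eventhub_name="<event-hub-name>",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"callId": 1, "durationSeconds": 42})))
    producer.send_batch(batch)
```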
Exercise 3: Starting the telecom event generator application
- In this exercise, we’ll update the application connection string and run the application.
Exercise 4: Processing Data with Stream Analytics Jobs
In this exercise, we’ll do the following tasks:
- Provision a Stream Analytics job and Specify the Stream Analytics job input.
- Specify the Stream Analytics job output and define a Stream Analytics query (a sample query follows this list).
- Start the Stream Analytics job and Validate streaming data is collected.
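For reference, here is a sketch of the kind of windowed query defined in this exercise, kept as a Python string (it is pasted into the job’s Query blade in the portal). The input/output aliases and column names are assumptions for illustration.

```python
STREAM_ANALYTICS_QUERY = """
SELECT
    System.Timestamp AS WindowEnd,
    SwitchNum,
    COUNT(*) AS CallCount
INTO
    [call-output]
FROM
    [call-input] TIMESTAMP BY CallRecTime
GROUP BY
    SwitchNum,
    TumblingWindow(second, 5)
"""
# [call-input] is the event hub input alias and [call-output] the job output alias;
# the query counts calls per switch over 5-second tumbling windows.
```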
Lab 7: Orchestrating Data Movement with Azure Data Factory
Exercise 1: Setup Azure Data Factory
- In this exercise, we’ll set up Azure Data Factory.
Exercise 2: Ingest data using the Copy Activity
- In this exercise, we’ll add the Copy Activity to the designer, create a new HTTP dataset to use as a source, create a new ADLS Gen2 dataset as a sink, and test the Copy Activity.
Exercise 3: Transforming Data with Mapping Data Flow
- In this exercise, we’ll prepare the environment, add a data source, use Mapping Data Flow transformations to write to a data sink, and then run the pipeline.
Exercise 4: Azure Data Factory and Databricks
- In this exercise, we’ll generate a Databricks access token, create a Databricks notebook, create linked services, create a pipeline that uses the Databricks Notebook activity, and then trigger a pipeline run.
Lab 8: Securing Azure Data Platforms
Exercise 1: An introduction to security
- We will find accurate and timely information about Azure security. In this exercise, we’ll look at security as a layered (defense-in-depth) approach.
Exercise 2: Key security components
- In this exercise, we’ll assess data and storage security hygiene.
Exercise 3: Securing Storage Accounts and Data Lake Storage
- In this exercise, we’ll determine the appropriate security approach for Azure Blob storage, such as the shared access signature sketch below.
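One concrete approach from this exercise is granting time-limited, read-only access to a container with a shared access signature instead of sharing account keys. A minimal sketch with azure-storage-blob; the account name, key, and container are placeholders.

```python
from datetime import datetime, timedelta
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="images",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),  # read-only access
    expiry=datetime.utcnow() + timedelta(hours=1),              # token expires after an hour
)

print(f"https://<storage-account>.blob.core.windows.net/images?{sas_token}")
```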
Exercise 4: Securing Data Stores
- In this exercise, we’ll enable auditing, query the database, and view the audit log (see the sketch below).
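Auditing itself is switched on in the portal; once audit records land in the storage account, they can be read back with sys.fn_get_audit_file. A minimal sketch via pyodbc, with the server, database, and audit log path as placeholders.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<server-name>.database.windows.net,1433;"
    "DATABASE=<database-name>;UID=<sql-admin-user>;PWD=<password>;Encrypt=yes;"
)

rows = conn.execute("""
    SELECT TOP 10 event_time, action_id, statement
    FROM sys.fn_get_audit_file(
        'https://<audit-storage-account>.blob.core.windows.net/sqldbauditlogs/', DEFAULT, DEFAULT)
    ORDER BY event_time DESC;
""").fetchall()

for row in rows:
    print(row.event_time, row.action_id, row.statement)
```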
Exercise 5: Securing Streaming Data
- In this exercise, we’ll change Event Hubs permissions.
Lab 9: Monitoring and Troubleshooting Data Storage and Processing
Exercise 1: Explain the monitoring capabilities that are available
- In this exercise, we’ll define a corporate monitoring approach, looking at services such as:
- Network Performance Monitor.
- Application Gateway Analytics.
Exercise 2: Troubleshoot common data storage issues
- In this exercise, we’ll identify common issues related to data storage, such as:
- Consistency
- Corruption
Exercise 3: Troubleshoot common data processing issues
- In this exercise, we’ll determine issues that are related to data processing.
Exercise 4: Manage disaster recovery
- In this exercise, we’ll manage disaster recovery.
- In Azure, there are two core services that we’ll take advantage of: Azure Site Recovery (ASR) and Azure Backup. The two complement each other to provide an end-to-end business continuity and disaster recovery solution with unlimited scale.
Skills Measured in Exam DP-201
- Design Azure data storage solutions (40-45%)
- Design data processing solutions (25-30%)
- Design for data security and compliance (25-30%)
Lab 1 – Data Platform Architecture Considerations
Exercise 1: Design with Security in Mind
- In this exercise, we’ll identify the security requirements of AdventureWorks from the case study. Microsoft treats security and compliance for your data as a layered (defense-in-depth) model, dividing the security of your application into seven different layers.
Exercise 2: Design for Performance and Scalability
- In this exercise, we’ll determine the scalability and performance requirements as identified from the case study.
- Scale up: adding more resources (such as CPU and memory) to the same instance.
- Scale out: adding more instances to a cluster and distributing the load across them.
Exercise 3: Design for Availability and Recoverability
- In this exercise, we’ll determine the recoverability and availability requirements as identified from the case study.
Exercise 4: Design for Efficiency and Operations
- In this exercise, we’ll determine the operations and efficiency requirements for AdventureWorks.
Lab 2 – Azure Batch Processing Reference Architectures
Exercise 1: Design an Enterprise BI solution in Azure
- From the case study, we’ll identify the requirements that would form part of the Batch mode processing of data in an Enterprise BI solution in AdventureWorks.
- We’ll also build a high-level Architecture that reflects the Enterprise BI solution in AdventureWorks.
Exercise 2: Automate enterprise BI solutions in Azure
- In this exercise, we’ll enhance a high-level Architecture to include automation of an Enterprise BI solution in AdventureWorks.
Exercise 3: Conversational bot solutions in Azure
- In this exercise, we’ll enhance a high-level Architecture to include a conversational bot solution in AdventureWorks.
Lab 3 – Azure Real-Time Reference Architectures
Exercise 1: Architect a stream processing pipeline with Azure Stream Analytics
- In this exercise, we’ll identify the requirements that would form part of the real-time processing of data in AdventureWorks from the case study.
- We’ll build a high-level Architecture that reflects a stream processing pipeline with Azure Stream Analytics.
Exercise 2: Design a stream processing pipeline with Azure Databricks
- In this exercise, we’ll create a high-level Architecture to include a stream processing pipeline with Azure Databricks solution in AdventureWorks.
Exercise 3: Create an Azure IoT reference architecture
- In this exercise, we’ll confirm which components would form part of an Azure IoT reference architecture.
Lab 4 – Azure Data Platform Security Considerations
Exercise 1: Defense in Depth Security Approach
- We’ll identify the security requirements for AdventureWorks from the case study.
Exercise 2: Identity Management
- We’ll define the primary authentication mechanism for each technology used to meet AdventureWorks requirements.
Lab 5 – Designing for Scale and Resiliency
Exercise 1: Adjust Workload Capacity by Scaling
- In this exercise, we’ll list the services from the case study that would benefit from scaling and how the scale units are measured per service.
Exercise 2: Design for Optimized Storage and Database Performance
- In this exercise, we’ll define a service feature that can be used to optimize storage and database performance.
Exercise 3: Design a Highly Available Solution
- In this exercise, we’ll define a service feature that provides high availability where possible.
Exercise 4: Incorporate Disaster Recovery into Architectures
- In this exercise, we’ll outline the disaster recovery approach for the data services used by AdventureWorks.
Lab 6 – Designing for Efficiency and Operations
Exercise 1: Maximize the Efficiency of your Cloud Environment
- In this exercise, we’ll provide a link to the Azure Pricing Calculator and a list of best practices that the IS department should follow to minimize costs.
Exercise 2: Use Monitoring and Analytics to Gain Operational Insights
- In this exercise, we’ll draft a monitoring and analytics strategy that should be adopted by AdventureWorks.
Exercise 3: Use Automation to Reduce Effort and Error
- In this exercise, we’ll list the options for automation languages and approaches.
Next Task For You
To know more about data engineering for beginners, why you should learn it, job opportunities, and what to study, work through the hands-on labs above to clear [DP-200] Implementing an Azure Data Solution and [DP-201] Designing an Azure Data Solution, and join the waitlist for the Microsoft Azure Data Engineer Associate certification.