Preparing for the Microsoft Azure Data Scientist [DP-100] certification but confused about where to start? Don't worry, we have got you covered!
This blog will take you through all the domains, modules, and topics you need to know to clear this certification.
Exam DP-100: Designing and Implementing a Data Science Solution on Azure
A candidate for this certification should have knowledge and experience in data science and using Azure Machine Learning and Azure Databricks.
The Microsoft Azure Data Scientist DP-100 certification is aimed at those who apply their knowledge of data science and machine learning to implement and run machine learning workloads on Azure using the Azure Machine Learning service. This involves planning and creating a suitable working environment for data science workloads on Azure, running data experiments, and training predictive ML models.
Exam Pattern
The Microsoft DP-100 exam has 40-60 questions in formats such as multiple-choice questions, arrange-in-the-correct-sequence questions, scenario-based single-answer questions, and drag-and-drop questions.
There is a time limit of 180 minutes to complete the exam, and the passing score is a minimum of 700 on Microsoft's 1,000-point scale. The exam costs $165 USD and can be taken only in the English language.
Types of Questions
Below are the types of questions:
- Case Study with 4-6 Questions
- Multiple Choice Single Answer
- Multiple Choice Multiple Answers
- Arrange in Correct Order
- Complete the Code
Study Guide for Microsoft Azure Data Scientist [DP-100]
Here is a comprehensive list of study material covering the DP-100 scope and questions.
1. Official Microsoft labs on DP-100 for anyone to learn from:
MicrosoftLearning/mslearn-dp100 (Lab Setup)
MicrosoftLearning/mslearn-dp100 (github.com)
Complete Documentation-Azure Machine Learning
2. Azure free account:
Create Your Azure Free Account Today | Microsoft Azure
3. Microsoft Learn:
Browse all – Learn | Microsoft Docs
Module 1: Getting Started With Azure Machine Learning
In this module, you will learn how to provision an Azure Machine Learning workspace and use it to manage machine learning assets such as data, compute, model training code, logged metrics, and trained models. You will cover the web-based Azure Machine Learning studio interface as well as the Azure Machine Learning SDK and developer tools like Visual Studio Code and Jupyter Notebooks to work with the assets in your workspace.
1. Azure Machine Learning
Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle. Data professionals can use it in their day-to-day workflows to train and deploy models, and manage MLOps.
Azure Machine Learning Overview| Microsoft Docs
Azure Machine Learning Service Workflow
2. Azure Machine Learning Studio
Azure ML Studio is the web portal for data scientists and developers in Azure Machine Learning. It combines no-code and code-first experiences in an inclusive data science platform.
Azure Machine Learning Studio| Microsoft Docs
Azure Machine Learning Studio & Its Features
3. Azure ML Workspace
It is the top-level resource for Azure ML and stores the assets you create when you use Azure Machine Learning, including environments, experiments, pipelines, datasets, models, and endpoints.
Azure Machine Learning Architecture | Microsoft Docs
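If you prefer the SDK over the portal, here is a minimal sketch of provisioning a workspace with the Python SDK (assuming the v1 azureml-core package used in the DP-100 labs; the workspace name, resource group, and region below are hypothetical placeholders):

```python
from azureml.core import Workspace

# Hypothetical names -- substitute your own subscription ID, resource group, and region.
ws = Workspace.create(name='aml-workspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='aml-resources',
                      create_resource_group=True,
                      location='eastus')

# Save config.json locally so later code can reconnect with Workspace.from_config().
ws.write_config()
```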
4. Azure Databricks
Azure Databricks enables you to build highly scalable data processing and machine learning solutions. It offers a fast, easy, and collaborative Spark-based analytics service. It is used to accelerate big data analytics, artificial intelligence, performant data lakes, interactive data science, machine learning, and collaboration.
Azure Databricks Workspace – Learn| Microsoft Docs
Azure Databricks- Beginners Guide
Module 2: Visual Tools for Machine Learning
This module introduces the Automated Machine Learning and Designer visual tools, which you can use to train, evaluate, and deploy machine learning models without writing any code.
1. Automated ML
Automated ML is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
Automated Machine Learning Overview| Microsoft Docs
2. Feature Engineering
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
Feature Engineering in Automated ML| Microsoft Docs
3. Azure ML Designer
Machine Learning designer is a drag-and-drop interface used to train and deploy models in Azure Machine Learning.
Azure Machine Learning Designer|Microsoft Docs
Azure Machine Learning Model in Production- ML Designer
Module 3: Running Experiments and Training Models
In this Microsoft Azure Data Scientist certification module, you will get started with experiments that encapsulate data processing and model training code, and use them to train machine learning models.
1. Azure ML SDK
Data scientists and AI developers use the Azure Machine Learning SDK for Python to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks, Visual Studio Code, or your favorite Python IDE.
Azure ML SDK Setup-Learn| Microsoft Docs
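As a minimal getting-started sketch, assuming you have installed the SDK (`pip install azureml-sdk`) and downloaded your workspace's config.json file:

```python
import azureml.core
from azureml.core import Workspace

print('SDK version:', azureml.core.VERSION)   # confirm the installation

# Reads config.json from the current directory (or a parent directory).
ws = Workspace.from_config()
print('Connected to workspace:', ws.name)
```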
2. Azure ML Experiments
In Azure Machine Learning, an experiment is a named process, usually the running of a script or a pipeline, that can generate metrics and outputs and be tracked in the Azure Machine Learning workspace.
Machine Learning Experiments-Learn| Microsoft Docs
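A minimal inline-experiment sketch, assuming the workspace connection shown earlier; the experiment name and logged metric are illustrative:

```python
from azureml.core import Experiment, Workspace

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='my-first-experiment')

# Start an inline run, log a metric, and complete the run.
run = experiment.start_logging()
run.log('observation_count', 5000)   # logged metrics appear in Azure ML studio
run.complete()
```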
Module 4: Working with Data
Data is a fundamental element in any machine learning workload, so in this module, you will learn how to create and manage datastores and datasets in an Azure Machine Learning workspace, and how to use them in model training experiments.
1. Datastores
Datastores are abstractions for cloud data sources that encapsulate the information required to connect to them. They can be accessed directly in code through the Azure ML SDK, which you can also use to upload or download data.
Introduction to Datastores-Learn| Microsoft Docs
Supported Datastores in Azure ML
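A minimal sketch of accessing a datastore with the SDK; the local folder and target path are hypothetical:

```python
from azureml.core import Workspace

ws = Workspace.from_config()

# Every workspace has a default datastore (an Azure Blob container).
default_ds = ws.get_default_datastore()

# Upload local files so they are available to cloud compute (hypothetical paths).
default_ds.upload(src_dir='./data', target_path='diabetes-data', overwrite=True)
```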
2. Datasets
Datasets are versioned, packaged data objects that can be easily consumed in experiments and pipelines. They are the recommended way to work with data and the primary mechanism for advanced Azure ML capabilities like data labeling and data drift monitoring.
Introduction to Datasets-Learn|Microsoft Docs
Working with Datasets & Datastores in Azure
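A minimal sketch of creating and registering a tabular dataset from files in the default datastore; the path and dataset name are hypothetical:

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
default_ds = ws.get_default_datastore()

# Create a tabular dataset from CSV files previously uploaded to the datastore.
tab_ds = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

# Register it so experiments and pipelines can consume it by name, with versioning.
tab_ds = tab_ds.register(workspace=ws, name='diabetes dataset', create_new_version=True)
```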
Module 5: Working with Compute
One of the key benefits of the cloud is the ability to leverage compute resources on demand and use them to scale machine learning processes to an extent that would be infeasible on your own hardware. In this module, you'll learn how to manage experiment environments that ensure runtime consistency for experiments, and how to create and use compute targets for experiment runs.
1. Environment
Azure Machine Learning handles environment creation and package installation for you – usually through the creation of Docker containers. You can specify the Conda or pip packages you need, and have Azure Machine Learning create an environment for the experiment.
Introduction to Environment-Learn| Microsoft Docs
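A minimal sketch of defining and registering a reusable environment; the environment name and package list are illustrative:

```python
from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Declare the Conda/pip packages the experiment needs; Azure ML builds the
# matching Docker container for you.
env = Environment('experiment-env')
env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['scikit-learn', 'pandas'],
    pip_packages=['azureml-defaults'])

# Register it so it can be reused (and its image cached) across experiments.
env.register(workspace=ws)
```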
2. Compute Targets
Compute Targets are physical or virtual computers on which experiments are run.
3. Types of Compute Targets
Azure Machine Learning supports multiple types of compute for experimentation and training, so you can select the most appropriate type of compute target for your particular needs. A sketch of provisioning a compute cluster follows the links below.
- Local Compute
- Compute Clusters
- Attached Compute
Types of Compute Targets-Learn|Microsoft Docs
Working with Compute in Azure ML
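A minimal sketch of provisioning an Azure ML compute cluster; the cluster name and VM size are hypothetical choices:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# A training cluster that autoscales between 0 and 2 nodes (hypothetical sizing).
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                       min_nodes=0, max_nodes=2)
cluster = ComputeTarget.create(ws, 'cpu-cluster', compute_config)
cluster.wait_for_completion(show_output=True)
```

With min_nodes=0 the cluster scales down to zero when idle, so you only pay while runs are active.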
Module 6: Orchestrating Operations with Pipelines
Now that you understand the basics of running workloads as experiments that leverage data assets and compute resources, it’s time to learn how to orchestrate these workloads as pipelines of connected steps. Pipelines are key to implementing an effective Machine Learning Operationalization (ML Ops) solution in Azure, so you’ll explore how to define and run them in this module.
1. Azure Machine Learning Pipelines
A pipeline is a workflow of machine learning tasks in which each task is implemented as a step. Steps can be arranged sequentially or in parallel, enabling you to build sophisticated flow logic to orchestrate machine learning operations.
Here you will have to focus on the following; a minimal pipeline sketch follows the links below:
- Creating a pipeline
- Passing data between pipeline steps
- Reusing pipeline steps
- Publishing a pipeline
- Scheduling a pipeline
Azure ML Pipeline – Learn| Microsoft Docs
Azure Machine Learning Pipeline- Overview
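A minimal two-step pipeline sketch; the script folder, script names, and compute name are hypothetical, and each script would contain your own data-prep and training code:

```python
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Two script steps running on a previously created compute cluster.
prep_step = PythonScriptStep(name='prepare data', source_directory='pipeline_scripts',
                             script_name='prep.py', compute_target='cpu-cluster')
train_step = PythonScriptStep(name='train model', source_directory='pipeline_scripts',
                              script_name='train.py', compute_target='cpu-cluster')
train_step.run_after(prep_step)   # enforce sequential ordering

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])

# Publishing exposes a REST endpoint the pipeline can be triggered or scheduled from.
published = pipeline.publish(name='training-pipeline', version='1.0')
print(published.endpoint)
```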
Module 7: Deploying and Consuming Models
Models are designed to help decision-making through predictions, so they're only useful when deployed and available for an application to consume. In this module, you will learn how to deploy models for real-time inferencing and for batch inferencing.
1. Real-time inferencing service
Inferencing refers to the use of a trained model to predict labels for new data on which the model has not been trained. Often, the model is deployed as part of a service that enables applications to request immediate, or real-time, predictions for individual or small numbers of data observations.
You can create real-time inferencing solutions by deploying a model as a service hosted on a containerized platform such as Azure Kubernetes Service (AKS).
Deploy real-time Azure ML service-Learn|Microsoft Docs
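A minimal real-time deployment sketch, assuming a model registered under the hypothetical name 'diabetes_model' and an entry script score.py; ACI is handy for dev/test, while AksWebservice suits production scale:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = ws.models['diabetes_model']   # hypothetical registered model name

# score.py must define init() (load the model) and run() (score a request).
inference_config = InferenceConfig(entry_script='score.py',
                                   source_directory='service_scripts',
                                   environment=Environment.get(ws, 'experiment-env'))

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, 'diabetes-service', [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)   # applications POST JSON here for predictions
```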
2. Batch inferencing service
In many production scenarios, long-running tasks that operate on large volumes of data are performed as batch operations. In machine learning, batch inferencing is used to apply a predictive model to multiple cases asynchronously – usually writing the results to a file or database.
Batch inferencing solutions can be implemented by creating a pipeline that includes a step to read the input data, load a registered model, predict labels, and write the results as its output.
Deploy Batch Inference Pipeline services-Learn|Microsoft Docs
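A minimal batch-scoring sketch built on ParallelRunStep, assuming a registered file dataset named 'batch-data' and an entry script batch_score.py (both hypothetical):

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()

parallel_run_config = ParallelRunConfig(
    source_directory='batch_scripts',
    entry_script='batch_score.py',     # defines init() and run(mini_batch)
    mini_batch_size='5',               # files per mini-batch
    error_threshold=10,
    output_action='append_row',        # collect all results into one output file
    environment=Environment.get(ws, 'experiment-env'),
    compute_target='cpu-cluster',
    node_count=2)

batch_ds = Dataset.get_by_name(ws, 'batch-data')   # hypothetical file dataset
output_dir = PipelineData(name='inferences', datastore=ws.get_default_datastore())

step = ParallelRunStep(name='batch-score',
                       parallel_run_config=parallel_run_config,
                       inputs=[batch_ds.as_named_input('batch_data')],
                       output=output_dir)
pipeline = Pipeline(workspace=ws, steps=[step])
```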
Module 8: Training Optimal Models
By this stage of the course, you’ve learned the end-to-end process for training, deploying, and consuming machine learning models; but how do you ensure your model produces the best predictive outputs for your data? In this module, you’ll explore how you can use hyperparameter tuning and automated machine learning to take advantage of cloud-scale compute and find the best model for your data.
1. Hyperparameters
In machine learning, models are trained to predict unknown labels for new data based on correlations between known labels and features found in the training data. Depending on the algorithm used, you may need to specify hyperparameters to configure how the model is trained.
Tune Hyperparameters with Azure ML-Learn|Microsoft Docs
2. Hyperparameter Tuning
Hyperparameter tuning is the process of finding the configuration of hyperparameters that will result in the best performance.
Hyperparameter Tuning in Azure
3. Search Space
The set of hyperparameter values tried during hyperparameter tuning is known as the search space. The definition of the range of possible values that can be chosen depends on the type of hyperparameter.
Define Search Space-Learn|Microsoft Docs
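Pulling these ideas together, a minimal hyperparameter-tuning sketch with HyperDrive; the script arguments and metric name are hypothetical, and script_config is assumed to be a ScriptRunConfig for a training script that logs the primary metric (e.g. run.log('Accuracy', acc)):

```python
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveConfig,
                                      PrimaryMetricGoal, choice)

# A discrete search space over two hypothetical script arguments.
param_sampling = GridParameterSampling({
    '--learning_rate': choice(0.01, 0.1, 1.0),
    '--n_estimators': choice(10, 100)
})

hyperdrive = HyperDriveConfig(run_config=script_config,   # assumed ScriptRunConfig
                              hyperparameter_sampling=param_sampling,
                              policy=None,                # optional early-termination policy
                              primary_metric_name='Accuracy',
                              primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                              max_total_runs=6,
                              max_concurrent_runs=2)
```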
4. Automated Machine Learning model selection
Automated Machine Learning enables you to try multiple algorithms and preprocessing transformations with your data. This, combined with scalable cloud-based compute, makes it possible to find the best-performing model for your data without the huge amount of time-consuming manual trial and error that would otherwise be required.
Automated ML with SDK-Learn|Microsoft Docs
Machine Learning Model Performance Evaluation
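A minimal AutoML sketch via the SDK; the registered dataset name and label column used here are hypothetical:

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, 'diabetes dataset')   # hypothetical dataset

automl_config = AutoMLConfig(task='classification',
                             training_data=train_ds,
                             label_column_name='Diabetic',   # hypothetical label column
                             primary_metric='AUC_weighted',
                             compute_target='cpu-cluster',
                             iterations=6,
                             max_concurrent_iterations=2)

run = Experiment(ws, 'automl-experiment').submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()   # the best run and its trained model
```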
Module 9: Responsible Machine Learning
Data scientists have a duty to ensure they analyze data and train machine learning models responsibly; respecting individual privacy, mitigating bias, and ensuring transparency. This module explores some considerations and techniques for applying responsible machine learning principles.
1. Differential Privacy
When data is used for analysis, it’s important that the data remains private and confidential throughout its use. Differential privacy is a set of systems and practices that help keep the data of individuals safe and private.
Explore Differential Privacy-Learn|Microsoft Docs
2. Explain ML Models
To build interpretable AI systems, use InterpretML, an open-source package built by Microsoft. The InterpretML package supports a wide variety of interpretability techniques such as SHapley Additive exPlanations (SHAP), mimic explainer, and permutation feature importance (PFI).
Explain Machine Learning Models with Azure Machine Learning-Learn| Microsoft Docs
3. Feature Importance
Model explainers use statistical techniques to calculate feature importance. This enables you to quantify the relative influence each feature in the training dataset has on label prediction. Explainers work by evaluating a test data set of feature cases and the labels the model predicts for them.
Explain Feature Importance-Learn|Microsoft Docs
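A minimal feature-importance sketch using the TabularExplainer from the interpret-community package (installed via `pip install azureml-interpret`); scikit-learn's built-in breast-cancer dataset keeps the example self-contained:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from interpret.ext.blackbox import TabularExplainer

data = load_breast_cancer()
model = DecisionTreeClassifier().fit(data.data, data.target)

explainer = TabularExplainer(model, data.data,
                             features=data.feature_names,
                             classes=['malignant', 'benign'])

# Global feature importance: the relative influence of each feature overall.
global_explanation = explainer.explain_global(data.data)
print(global_explanation.get_feature_importance_dict())
```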
4. Detect & Mitigate Unfairness
Machine learning models can often encapsulate unintentional bias that results in unfairness. With Fairlearn and Azure Machine Learning, you can detect and mitigate unfairness in your models.
Detect & mitigate unfairness in models with Azure ML-Learn|Microsoft Docs
5. Fairlearn
Fairlearn is a Python package that you can use to analyze models and evaluate disparity between predictions and prediction performance for one or more sensitive features.
Analyze model fairness with fairlearn-Learn|Microsoft Docs
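A minimal disparity-analysis sketch with Fairlearn's MetricFrame; y_test, y_pred, and the sensitive-feature column are assumed to come from a model you have already trained:

```python
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# y_test, y_pred, and sensitive (e.g. an age-group column) are assumed inputs.
mf = MetricFrame(metrics={'accuracy': accuracy_score,
                          'selection_rate': selection_rate},
                 y_true=y_test, y_pred=y_pred,
                 sensitive_features=sensitive)

print(mf.overall)    # metrics for the whole test set
print(mf.by_group)   # the same metrics per group -- the disparity view
```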
Module 10: Monitoring Models
After a model has been deployed, it’s important to understand how the model is being used in production and to detect any degradation in its effectiveness due to data drift. This module describes techniques for monitoring models and their data.
1. Application Insights
Application Insights is an application performance management service in Microsoft Azure that enables the capture, storage, and analysis of telemetry data from applications.
Monitor Models with Azure ML-Learn|Microsoft Docs
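A minimal sketch of switching telemetry on for a deployed service; the service name is hypothetical:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, 'diabetes-service')   # hypothetical deployed service

# Start sending request/response telemetry to the workspace's Application Insights.
service.update(enable_app_insights=True)
```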
2. Data Drift
Change in data profiles between training and inferencing is known as data drift, and it can be a significant issue for predictive models used in production. It is therefore important to be able to monitor data drift over time, and retrain models as required to maintain predictive accuracy.
Monitor Data Drift with Azure ML-Learn|Microsoft Docs
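A minimal data-drift monitor sketch using the azureml-datadrift package; the dataset names, feature list, and threshold are hypothetical, and the target dataset needs a timestamp column:

```python
import datetime as dt
from azureml.core import Dataset, Workspace
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, 'diabetes dataset')        # training data
target = Dataset.get_by_name(ws, 'diabetes-inference-data')   # hypothetical, time-stamped

monitor = DataDriftDetector.create_from_datasets(
    ws, 'diabetes-drift-monitor', baseline, target,
    compute_target='cpu-cluster',
    frequency='Week',
    feature_list=['Age', 'BMI'],   # hypothetical features to watch
    drift_threshold=0.3,
    latency=24)

# Analyze drift retrospectively over the past six weeks.
backfill = monitor.backfill(dt.datetime.now() - dt.timedelta(weeks=6), dt.datetime.now())
```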
3. RBAC
Azure Role-Based Access Control (RBAC) is an authorization system that allows fine-grained access management of Azure Machine Learning resources. It enables users to manage team members’ access to Azure cloud resources by assigning roles.
Azure RBAC-Explore Security Concepts in Azure ML|Microsoft Docs
4. Azure Key Vault
Azure Key Vault provides secure storage of generic secrets for applications in Azure-hosted environments.
Keys and secrets with Azure Key Vault|Microsoft Docs
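A minimal sketch of reading and writing secrets through the workspace's default key vault; the secret name and value are hypothetical:

```python
from azureml.core import Workspace

ws = Workspace.from_config()

# Every workspace has an associated Azure Key Vault for secrets.
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name='storage-key', value='<secret-value>')   # hypothetical secret

# Retrieve it from training code instead of hard-coding credentials.
print(keyvault.get_secret(name='storage-key'))
```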
Additional Resources
I hope this Microsoft Azure Data Scientist DP-100 Certification Exam Study Guide helps you pass the exam. I also highly recommend that you open a free Azure account if you don’t have one yet. You can create your free Azure account here. Also, check out my blog posts about Microsoft Azure Data Scientist Certification:
- [DP-100] Microsoft Certified Azure Data Scientist Associate: Everything you must know
- Microsoft Certified Azure Data Scientist Associate: Step By Step Activity Guides (Hands-On Labs)
- Microsoft Azure Data Scientist DP-100 FAQ
- Sample Exam Questions: DP-100
- MLOps on Azure
- Machine Learning Service Workflow
- Data Preparation With Azure Databricks for Machine Learning
Next Task For You
To know more about the course, AI, ML, and Data Science for beginners, why you should learn them, job opportunities, and what to study, including the hands-on labs you must perform to clear the [DP-100] Microsoft Azure Data Scientist Associate certification, register for our FREE CLASS.