Preparing for the Microsoft Azure Data Scientist [DP-100] certification but confused about where to start? Don't worry, we have got you covered!
This blog will take you through all the domains, modules, and topics you need to know to clear this certification.
Exam DP-100: Designing and Implementing a Data Science Solution on Azure
A candidate for this certification should have knowledge and experience in data science and using Azure Machine Learning and Azure Databricks.
The Microsoft Azure Data Scientist DP-100 certification is aimed at those who apply their knowledge of data science and machine learning to implement and run machine learning workloads on Azure using the Azure Machine Learning service. This involves planning and creating a suitable working environment for data science workloads on Azure, running data experiments, and training predictive ML models.
Exam Pattern
The Microsoft DP-100 exam has 40-60 questions in formats such as multiple-choice questions, arrange-in-the-correct-sequence questions, scenario-based single-answer questions, and drag-and-drop questions.
There is a time limit of 180 minutes to complete the exam, and the passing score is a minimum of 700 on Microsoft's 1,000-point scale. The exam costs $165 USD and can be taken only in the English language.
Types of Questions
Below are the types of questions:
- Case Study with 4-6 Questions
- Multiple Choice Single Answer
- Multiple Choice Multiple Answers
- Arrange in Correct Order
- Complete the Code
Study Guide for Microsoft Azure Data Scientist [DP-100]
Here is a comprehensive list of study material covering the DP-100 scope and questions.
1. Official Microsoft labs on DP-100 for anyone to learn from:
MicrosoftLearning/mslearn-dp100 (Lab Setup)
MicrosoftLearning/mslearn-dp100 (github.com)
Complete Documentation-Azure Machine Learning
2. Azure free account:
Create Your Azure Free Account Today | Microsoft Azure
3. Microsoft Learn:
Browse all – Learn | Microsoft Docs
Module 1: Getting Started With Azure Machine Learning
In this module, you will learn how to provision an Azure Machine Learning workspace and use it to manage machine learning assets such as data, compute, model training code, logged metrics, and trained models. You will cover the web-based Azure Machine Learning studio interface as well as the Azure Machine Learning SDK and developer tools like Visual Studio Code and Jupyter Notebooks to work with the assets in your workspace.
1. Azure Machine Learning
Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle. Data professionals can use it in their day-to-day workflows to train and deploy models, and manage MLOps.
Azure Machine Learning Overview| Microsoft Docs
Azure Machine Learning Service Workflow
2. Azure Machine Learning Studio
Azure ML Studio is the web portal for data scientists and developers in Azure Machine Learning. It combines no-code and code-first experiences in an inclusive data science platform.
Azure Machine Learning Studio| Microsoft Docs
Azure Machine Learning Studio & Its Features
3. Azure ML Workspace
It is the top-level resource for Azure ML and stores the assets you create when you use Azure Machine Learning, including environments, experiments, pipelines, datasets, models, and endpoints.
Azure Machine Learning Architecture | Microsoft Docs
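If you prefer the SDK over the portal, here is a minimal sketch of provisioning a workspace with the Python SDK (assuming the v1 azureml-core package used in the DP-100 labs; the workspace name, resource group, and region below are hypothetical placeholders):

```python
from azureml.core import Workspace

# Hypothetical names -- substitute your own subscription ID, resource group, and region.
ws = Workspace.create(name='aml-workspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='aml-resources',
                      create_resource_group=True,
                      location='eastus')

# Save config.json locally so later code can reconnect with Workspace.from_config().
ws.write_config()
```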
4. Azure Databricks
Azure Databricks enables you to build highly scalable data processing and machine learning solutions. It offers a fast, easy, and collaborative Spark-based analytics service. It is used to accelerate big data analytics, artificial intelligence, performant data lakes, interactive data science, machine learning, and collaboration.
Azure Databricks Workspace – Learn| Microsoft Docs
Azure Databricks- Beginners Guide
Module 2: Visual Tools for Machine Learning
This module introduces the Automated Machine Learning and Designer visual tools, which you can use to train, evaluate, and deploy machine learning models without writing any code.
1. Automated ML
Automated ML is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
Automated Machine Learning Overview| Microsoft Docs
2. Feature Engineering
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
Feature Engineering in Automated ML| Microsoft Docs
3. Azure ML Designer
Machine Learning designer is a drag-and-drop interface used to train and deploy models in Azure Machine Learning.
Azure Machine Learning Designer|Microsoft Docs
Azure Machine Learning Model in Production- ML Designer
Module 3: Running Experiments and Training Models
In this Microsoft Azure Data Scientist certification module, you will get started with experiments that encapsulate data processing and model training code, and use them to train machine learning models.
1. Azure ML SDK
Data scientists and AI developers use the Azure Machine Learning SDK for Python to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks, Visual Studio Code, or your favorite Python IDE.
Azure ML SDK Setup-Learn| Microsoft Docs
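As a minimal getting-started sketch, assuming you have installed the SDK (`pip install azureml-sdk`) and downloaded your workspace's config.json file:

```python
import azureml.core
from azureml.core import Workspace

print('SDK version:', azureml.core.VERSION)   # confirm the installation

# Reads config.json from the current directory (or a parent directory).
ws = Workspace.from_config()
print('Connected to workspace:', ws.name)
```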
2. Azure ML Experiments
In Azure Machine Learning, an experiment is a named process, usually the running of a script or a pipeline, that can generate metrics and outputs and be tracked in the Azure Machine Learning workspace.
Machine Learning Experiments-Learn| Microsoft Docs
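A minimal inline-experiment sketch, assuming the workspace connection shown earlier; the experiment name and logged metric are illustrative:

```python
from azureml.core import Experiment, Workspace

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='my-first-experiment')

# Start an inline run, log a metric, and complete the run.
run = experiment.start_logging()
run.log('observation_count', 5000)   # logged metrics appear in Azure ML studio
run.complete()
```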
Module 4: Working with Data
Data is a fundamental element in any machine learning workload, so in this module, you will learn how to create and manage datastores and datasets in an Azure Machine Learning workspace, and how to use them in model training experiments.
1. Datastores
Datastores are abstractions for cloud data sources that encapsulate the information required to connect to them. They can be accessed directly in code through the Azure ML SDK, which you can also use to upload or download data.
Introduction to Datastores-Learn| Microsoft Docs
Supported Datastores in Azure ML
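A minimal sketch of accessing a datastore with the SDK; the local folder and target path are hypothetical:

```python
from azureml.core import Workspace

ws = Workspace.from_config()

# Every workspace has a default datastore (an Azure Blob container).
default_ds = ws.get_default_datastore()

# Upload local files so they are available to cloud compute (hypothetical paths).
default_ds.upload(src_dir='./data', target_path='diabetes-data', overwrite=True)
```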
2. Datasets
Datasets are versioned, packaged data objects that can be easily consumed in experiments and pipelines. They are the recommended way to work with data and the primary mechanism for advanced Azure ML capabilities like data labeling and data drift monitoring.
Introduction to Datasets-Learn|Microsoft Docs
Working with Datasets & Datastores in Azure
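A minimal sketch of creating and registering a tabular dataset from files in the default datastore; the path and dataset name are hypothetical:

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
default_ds = ws.get_default_datastore()

# Create a tabular dataset from CSV files previously uploaded to the datastore.
tab_ds = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

# Register it so experiments and pipelines can consume it by name, with versioning.
tab_ds = tab_ds.register(workspace=ws, name='diabetes dataset', create_new_version=True)
```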
Module 5: Working with Compute
One of the key benefits of the cloud is the ability to leverage compute resources on demand and use them to scale machine learning processes to an extent that would be infeasible on your own hardware. In this module, you'll learn how to manage experiment environments that ensure runtime consistency for experiments, and how to create and use compute targets for experiment runs.
1. Environment
Azure Machine Learning handles environment creation and package installation for you – usually through the creation of Docker containers. You can specify the Conda or pip packages you need, and have Azure Machine Learning create an environment for the experiment.
Introduction to Environment-Learn| Microsoft Docs
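A minimal sketch of defining and registering a reusable environment; the environment name and package list are illustrative:

```python
from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Declare the Conda/pip packages the experiment needs; Azure ML builds the
# matching Docker container for you.
env = Environment('experiment-env')
env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['scikit-learn', 'pandas'],
    pip_packages=['azureml-defaults'])

# Register it so it can be reused (and its image cached) across experiments.
env.register(workspace=ws)
```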
2. Compute Targets
Compute Targets are physical or virtual computers on which experiments are run.
3. Types of Compute Targets
Azure Machine Learning supports multiple types of compute for experimentation and training, so you can select the most appropriate type of compute target for your particular needs. A sketch of provisioning a compute cluster follows the links below.
- Local Compute
- Compute Clusters
- Attached Compute
Types of Compute Targets-Learn|Microsoft Docs
Working with Compute in Azure ML
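A minimal sketch of provisioning an Azure ML compute cluster; the cluster name and VM size are hypothetical choices:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# A training cluster that autoscales between 0 and 2 nodes (hypothetical sizing).
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                       min_nodes=0, max_nodes=2)
cluster = ComputeTarget.create(ws, 'cpu-cluster', compute_config)
cluster.wait_for_completion(show_output=True)
```

With min_nodes=0 the cluster scales down to zero when idle, so you only pay while runs are active.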
Module 6: Orchestrating Operations with Pipelines
Now that you understand the basics of running workloads as experiments that leverage data assets and compute resources, it’s time to learn how to orchestrate these workloads as pipelines of connected steps. Pipelines are key to implementing an effective Machine Learning Operationalization (ML Ops) solution in Azure, so you’ll explore how to define and run them in this module.
1. Azure Machine Learning Pipelines
A pipeline is a workflow of machine learning tasks in which each task is implemented as a step. Steps can be arranged sequentially or in parallel, enabling you to build sophisticated flow logic to orchestrate machine learning operations.
Here you will have to focus on the following; a minimal pipeline sketch follows the links below:
- Creating a pipeline
- Passing data between pipeline steps
- Reusing pipeline steps
- Publishing a pipeline
- Scheduling a pipeline
Azure ML Pipeline – Learn| Microsoft Docs
Azure Machine Learning Pipeline- Overview
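A minimal two-step pipeline sketch; the script folder, script names, and compute name are hypothetical, and each script would contain your own data-prep and training code:

```python
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Two script steps running on a previously created compute cluster.
prep_step = PythonScriptStep(name='prepare data', source_directory='pipeline_scripts',
                             script_name='prep.py', compute_target='cpu-cluster')
train_step = PythonScriptStep(name='train model', source_directory='pipeline_scripts',
                              script_name='train.py', compute_target='cpu-cluster')
train_step.run_after(prep_step)   # enforce sequential ordering

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])

# Publishing exposes a REST endpoint the pipeline can be triggered or scheduled from.
published = pipeline.publish(name='training-pipeline', version='1.0')
print(published.endpoint)
```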
Module 7: Deploying and Consuming Models
Models are designed to help decision-making through predictions, so they're only useful when deployed and available for an application to consume. In this module, you will learn how to deploy models for real-time inferencing and for batch inferencing.
1. Real-time inferencing service
Inferencing refers to the use of a trained model to predict labels for new data on which the model has not been trained. Often, the model is deployed as part of a service that enables applications to request immediate, or real-time, predictions for individual or small numbers of data observations.
You can create real-time inferencing solutions by deploying a model as a service hosted on a containerized platform such as Azure Kubernetes Service (AKS).
Deploy real-time Azure ML service-Learn|Microsoft Docs
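A minimal real-time deployment sketch, assuming a model registered under the hypothetical name 'diabetes_model' and an entry script score.py; ACI is handy for dev/test, while AksWebservice suits production scale:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = ws.models['diabetes_model']   # hypothetical registered model name

# score.py must define init() (load the model) and run() (score a request).
inference_config = InferenceConfig(entry_script='score.py',
                                   source_directory='service_scripts',
                                   environment=Environment.get(ws, 'experiment-env'))

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, 'diabetes-service', [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)   # applications POST JSON here for predictions
```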
2. Batch inferencing service
In many production scenarios, long-running tasks that operate on large volumes of data are performed as batch operations. In machine learning, batch inferencing is used to apply a predictive model to multiple cases asynchronously – usually writing the results to a file or database.
Batch inferencing solutions can be implemented by creating a pipeline that includes a step to read the input data, load a registered model, predict labels, and write the results as its output.
Deploy Batch Inference Pipeline services-Learn|Microsoft Docs
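A minimal batch-scoring sketch built on ParallelRunStep, assuming a registered file dataset named 'batch-data' and an entry script batch_score.py (both hypothetical):

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()

parallel_run_config = ParallelRunConfig(
    source_directory='batch_scripts',
    entry_script='batch_score.py',     # defines init() and run(mini_batch)
    mini_batch_size='5',               # files per mini-batch
    error_threshold=10,
    output_action='append_row',        # collect all results into one output file
    environment=Environment.get(ws, 'experiment-env'),
    compute_target='cpu-cluster',
    node_count=2)

batch_ds = Dataset.get_by_name(ws, 'batch-data')   # hypothetical file dataset
output_dir = PipelineData(name='inferences', datastore=ws.get_default_datastore())

step = ParallelRunStep(name='batch-score',
                       parallel_run_config=parallel_run_config,
                       inputs=[batch_ds.as_named_input('batch_data')],
                       output=output_dir)
pipeline = Pipeline(workspace=ws, steps=[step])
```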
Module 8: Training Optimal Models
By this stage of the course, you’ve learned the end-to-end process for training, deploying, and consuming machine learning models; but how do you ensure your model produces the best predictive outputs for your data? In this module, you’ll explore how you can use hyperparameter tuning and automated machine learning to take advantage of cloud-scale compute and find the best model for your data.
1. Hyperparameters
In machine learning, models are trained to predict unknown labels for new data based on correlations between known labels and features found in the training data. Depending on the algorithm used, you may need to specify hyperparameters to configure how the model is trained.
Tune Hyperparameters with Azure ML-Learn|Microsoft Docs
2. Hyperparameter Tuning
Hyperparameter tuning is the process of finding the configuration of hyperparameters that will result in the best performance.
Hyperparameter Tuning in Azure
3. Search Space
The set of hyperparameter values tried during hyperparameter tuning is known as the search space. The definition of the range of possible values that can be chosen depends on the type of hyperparameter.
Define Search Space-Learn|Microsoft Docs
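Pulling these ideas together, a minimal hyperparameter-tuning sketch with HyperDrive; the script arguments and metric name are hypothetical, and script_config is assumed to be a ScriptRunConfig for a training script that logs the primary metric (e.g. run.log('Accuracy', acc)):

```python
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveConfig,
                                      PrimaryMetricGoal, choice)

# A discrete search space over two hypothetical script arguments.
param_sampling = GridParameterSampling({
    '--learning_rate': choice(0.01, 0.1, 1.0),
    '--n_estimators': choice(10, 100)
})

hyperdrive = HyperDriveConfig(run_config=script_config,   # assumed ScriptRunConfig
                              hyperparameter_sampling=param_sampling,
                              policy=None,                # optional early-termination policy
                              primary_metric_name='Accuracy',
                              primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                              max_total_runs=6,
                              max_concurrent_runs=2)
```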
4. Automated Machine Learning model selection
Automated Machine Learning enables you to try multiple algorithms and preprocessing transformations with your data. This, combined with scalable cloud-based compute, makes it possible to find the best-performing model for your data without the huge amount of time-consuming manual trial and error that would otherwise be required.
Automated ML with SDK-Learn|Microsoft Docs
Machine Learning Model Performance Evaluation
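A minimal AutoML sketch via the SDK; the registered dataset name and label column used here are hypothetical:

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, 'diabetes dataset')   # hypothetical dataset

automl_config = AutoMLConfig(task='classification',
                             training_data=train_ds,
                             label_column_name='Diabetic',   # hypothetical label column
                             primary_metric='AUC_weighted',
                             compute_target='cpu-cluster',
                             iterations=6,
                             max_concurrent_iterations=2)

run = Experiment(ws, 'automl-experiment').submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()   # the best run and its trained model
```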
Module 9: Responsible Machine Learning
Data scientists have a duty to ensure they analyze data and train machine learning models responsibly; respecting individual privacy, mitigating bias, and ensuring transparency. This module explores some considerations and techniques for applying responsible machine learning principles.
1. Differential Privacy
When data is used for analysis, it’s important that the data remains private and confidential throughout its use. Differential privacy is a set of systems and practices that help keep the data of individuals safe and private.
Explore Differential Privacy-Learn|Microsoft Docs
2. Explain ML Models
To build interpretable AI systems, use InterpretML, an open-source package built by Microsoft. The InterpretML package supports a wide variety of interpretability techniques such as SHapley Additive exPlanations (SHAP), mimic explainer, and permutation feature importance (PFI).
Explain Machine Learning Models with Azure Machine Learning-Learn| Microsoft Docs
3. Feature Importance
Model explainers use statistical techniques to calculate feature importance. This enables you to quantify the relative influence each feature in the training dataset has on label prediction. Explainers work by evaluating a test data set of feature cases and the labels the model predicts for them.
Explain Feature Importance-Learn|Microsoft Docs
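A minimal feature-importance sketch using the TabularExplainer from the interpret-community package (installed via `pip install azureml-interpret`); scikit-learn's built-in breast-cancer dataset keeps the example self-contained:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from interpret.ext.blackbox import TabularExplainer

data = load_breast_cancer()
model = DecisionTreeClassifier().fit(data.data, data.target)

explainer = TabularExplainer(model, data.data,
                             features=data.feature_names,
                             classes=['malignant', 'benign'])

# Global feature importance: the relative influence of each feature overall.
global_explanation = explainer.explain_global(data.data)
print(global_explanation.get_feature_importance_dict())
```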
4. Detect & Mitigate Unfairness
Machine learning models can often encapsulate unintentional bias that results in unfairness. With Fairlearn and Azure Machine Learning, you can detect and mitigate unfairness in your models.
Detect & mitigate unfairness in models with Azure ML-Learn|Microsoft Docs
5. Fairlearn
Fairlearn is a Python package that you can use to analyze models and evaluate disparity between predictions and prediction performance for one or more sensitive features.
Analyze model fairness with fairlearn-Learn|Microsoft Docs
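A minimal disparity-analysis sketch with Fairlearn's MetricFrame; y_test, y_pred, and the sensitive-feature column are assumed to come from a model you have already trained:

```python
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# y_test, y_pred, and sensitive (e.g. an age-group column) are assumed inputs.
mf = MetricFrame(metrics={'accuracy': accuracy_score,
                          'selection_rate': selection_rate},
                 y_true=y_test, y_pred=y_pred,
                 sensitive_features=sensitive)

print(mf.overall)    # metrics for the whole test set
print(mf.by_group)   # the same metrics per group -- the disparity view
```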
Module 10: Monitoring Models
After a model has been deployed, it’s important to understand how the model is being used in production and to detect any degradation in its effectiveness due to data drift. This module describes techniques for monitoring models and their data.
1. Application Insights
Application Insights is an application performance management service in Microsoft Azure that enables the capture, storage, and analysis of telemetry data from applications.
Monitor Models with Azure ML-Learn|Microsoft Docs
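A minimal sketch of switching telemetry on for a deployed service; the service name is hypothetical:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, 'diabetes-service')   # hypothetical deployed service

# Start sending request/response telemetry to the workspace's Application Insights.
service.update(enable_app_insights=True)
```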
2. Data Drift
Change in data profiles between training and inferencing is known as data drift, and it can be a significant issue for predictive models used in production. It is therefore important to be able to monitor data drift over time, and retrain models as required to maintain predictive accuracy.
Monitor Data Drift with Azure ML-Learn|Microsoft Docs
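A minimal data-drift monitor sketch using the azureml-datadrift package; the dataset names, feature list, and threshold are hypothetical, and the target dataset needs a timestamp column:

```python
import datetime as dt
from azureml.core import Dataset, Workspace
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, 'diabetes dataset')        # training data
target = Dataset.get_by_name(ws, 'diabetes-inference-data')   # hypothetical, time-stamped

monitor = DataDriftDetector.create_from_datasets(
    ws, 'diabetes-drift-monitor', baseline, target,
    compute_target='cpu-cluster',
    frequency='Week',
    feature_list=['Age', 'BMI'],   # hypothetical features to watch
    drift_threshold=0.3,
    latency=24)

# Analyze drift retrospectively over the past six weeks.
backfill = monitor.backfill(dt.datetime.now() - dt.timedelta(weeks=6), dt.datetime.now())
```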
3. RBAC
Azure Role-Based Access Control (RBAC) is an authorization system that allows fine-grained access management of Azure Machine Learning resources. It enables users to manage team members’ access to Azure cloud resources by assigning roles.
Azure RBAC-Explore Security Concepts in Azure ML|Microsoft Docs
4. Azure Key Vault
Azure Key Vault provides secure storage of generic secrets for applications in Azure-hosted environments.
Keys and secrets with Azure Key Vault|Microsoft Docs
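A minimal sketch of reading and writing secrets through the workspace's default key vault; the secret name and value are hypothetical:

```python
from azureml.core import Workspace

ws = Workspace.from_config()

# Every workspace has an associated Azure Key Vault for secrets.
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name='storage-key', value='<secret-value>')   # hypothetical secret

# Retrieve it from training code instead of hard-coding credentials.
print(keyvault.get_secret(name='storage-key'))
```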
Additional Resources
I hope this Microsoft Azure Data Scientist DP-100 Certification Exam Study Guide helps you pass the exam. I also highly recommend that you open a free Azure account if you don’t have one yet. You can create your free Azure account here. Also, check out my blog posts about Microsoft Azure Data Scientist Certification:
- [DP-100] Microsoft Certified Azure Data Scientist Associate: Everything you must know
- Microsoft Certified Azure Data Scientist Associate: Step By Step Activity Guides (Hands-On Labs)
- Microsoft Azure Data Scientist DP-100 FAQ
- Sample Exam Questions: DP-100
- MLOps on Azure
- Machine Learning Service Workflow
- Data Preparation With Azure Databricks for Machine Learning
Next Task For You
To know more about the course, AI, ML, and Data Science for beginners, why you should learn them, job opportunities, and what to study, including the hands-on labs you must perform to clear the [DP-100] Microsoft Azure Data Scientist Associate certification, register for our FREE CLASS.