In this post, we cover sample questions and answers for the Design & Implement a Data Science Solution on Azure [DP-100] exam, to give you a first-hand idea of the type of questions that may appear in the final certification exam.
The DP-100 Microsoft Azure Data Scientist Certification is aimed at those who apply their knowledge of data science and machine learning to implement and run machine learning workloads on Azure, using Azure Machine Learning Service. This involves planning and creating a suitable working environment for data science workloads on Azure, running data experiments, and training predictive ML models.
Most IT organizations generate large volumes of raw data every day, so they need dedicated teams who can evaluate this data, plot it to draw inferences, and apply machine learning algorithms to make predictions. As a result, the demand for Data Scientists is well ahead of the supply.
If you are preparing for the Microsoft Azure Data Scientist Certification [DP-100] Exam, check your readiness by working through these questions & answers.
Design & Implement a Data Science Solution Exam Questions & Answers
Question 1: You are designing your ML work environment. Your data resides in an Azure storage account, in a blob storage container. You want to prevent unauthorized access to your source data and don’t want to risk exposing access credentials. Which option should you use to meet these requirements?
A. Describe the connection data in the training script
B. Register the blob storage as a Datastore
C. Register the blob storage as a Dataset
D. Register the blob storage using an Estimator
Correct Answer: B
Explanation: Option B is CORRECT because, in the Azure ML environment, datastores are designed to store connection information such as subscription IDs and access keys. With a datastore, this information is stored securely and is used by referencing the datastore, which keeps the sensitive data hidden from scripts and applications.
Question 2: You have a real-time inference web service that you have just deployed to Azure Kubernetes Service. During its run, some unexpected errors occur. You need to troubleshoot it quickly and cost-effectively. Which is the quickest and cheapest option you can use?
A. Deploy it as a local web service and debug locally
B. Deploy it to ACI
C. Use a compute instance as a deployment target for debugging
D. Deploy it to AKS and set the maximum number of replicas to one; debug it in the production environment
Correct Answer: A
Explanation: Option A is CORRECT because using a local web service makes it easier to troubleshoot and debug problems. If you have problems with your model deployed to ACI or AKS, try deploying it as a local web service. You can then troubleshoot runtime issues by making changes to the scoring file referenced in the inference configuration, and reloading the service without redeploying it. This can be done only with local services.
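To make the "scoring file" mentioned above more concrete, here is a minimal sketch of an entry script built around the usual `init()` / `run()` pair. The stand-in model and the `"data"` key in the request body are assumptions made for illustration; a real script would load your registered model instead. Because the script is plain Python, you can call `run()` directly on your own machine to reproduce an error before touching the deployed service.

```python
import json

model = None  # populated once by init()


def init():
    """Called once when the service starts; load the model here."""
    global model
    # Hypothetical stand-in model: a real script would load a registered model file.
    model = lambda row: sum(row) > 1.0


def run(raw_data):
    """Called for each request; raw_data is the JSON request body as a string."""
    try:
        rows = json.loads(raw_data)["data"]  # the "data" key is an assumed request format
        return json.dumps({"predictions": [bool(model(r)) for r in rows]})
    except Exception as exc:
        # Returning the error text makes runtime problems visible to the caller.
        return json.dumps({"error": str(exc)})


# Local smoke test: no deployment needed to exercise the scoring logic.
init()
print(run(json.dumps({"data": [[0.2, 0.3], [0.9, 0.8]]})))
print(run("this is not valid JSON"))
```

The second call shows the kind of runtime error you would otherwise have to dig out of the service logs.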
Question 3: You are using classification algorithms in AutoML to train a model to predict whether your customers are likely to take a loan or not. The model’s predictors include marital status, job and education. After running ML experiments, you want to find which predictor is most relevant in predicting the target variable. Which action should you take?
A. Select the feature with the highest local importance
B. Enable auto-featurization
C. Select the feature with the lowest global importance
D. Select the feature with the highest global importance
Correct Answer: D
Explanation: Option D is CORRECT because global feature importance can be used to understand the relative importance of features. You should look for features with the highest global importance as the strongest contributors to the predictions.
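As a rough illustration of the difference between local and global importance, the sketch below aggregates per-customer (local) importances into one global score per feature by taking the mean absolute value. This is one common way to aggregate and is an assumption here, not necessarily how Azure ML computes it; the numbers are made up for the example.

```python
# Hypothetical local importances: one dict per customer prediction.
local_importances = [
    {"marital_status": 0.10, "job": -0.40, "education": 0.05},
    {"marital_status": -0.20, "job": 0.30, "education": 0.10},
    {"marital_status": 0.05, "job": 0.50, "education": -0.15},
]


def global_importance(rows):
    """Mean absolute local importance per feature (an assumed aggregation)."""
    features = rows[0].keys()
    return {f: sum(abs(r[f]) for r in rows) / len(rows) for f in features}


scores = global_importance(local_importances)
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
top_feature = ranking[0][0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
print("Most relevant predictor:", top_feature)
```

Note that a feature can have a large local importance for a single customer and still rank low globally, which is why option A is not the right choice.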
Question 4: You have just completed an ML experiment in Azure ML. You have trained models with several regression algorithms, and the chosen model is to be used to predict the effectiveness of a newly developed medicine. Which two evaluation tools/metrics would help you decide how well your model performs?
A. Normalized root mean squared error (RMSE)
B. Predicted vs. True chart
C. ROC Chart
D. Recall
E. AUC
Correct Answer: A & B
Explanation: Option A is CORRECT because the root mean squared error is a single value that summarizes the errors in the model. Its normalized version (RMSE divided by the range of the data) is one of the metrics typically used for regression problems. The closer its value is to 0.0, the better.
Option B is CORRECT because one of the visualizations Azure ML provides for evaluating regression models is the Predicted vs. True chart. It shows the relationship between each predicted value and its corresponding true value. Predicted values that lie close to the y=x line indicate good model performance.
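The normalized RMSE described above can be worked through by hand. The sketch below follows the definition given in the explanation (RMSE divided by the range of the true values) and also lists each point's distance from the y=x line, which is what the Predicted vs. True chart shows visually. The sample values are invented for illustration.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


# Hypothetical true and predicted values.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print(f"Normalized RMSE: {normalized_rmse(y_true, y_pred):.4f}")

# Distance of each prediction from the ideal y = x line.
for t, p in zip(y_true, y_pred):
    print(f"true={t:5.1f}  predicted={p:5.1f}  offset from y=x: {p - t:+.1f}")
```

A perfect model would give a normalized RMSE of 0.0 and place every point exactly on the y=x line.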
Question 5: You are about to set up a machine learning environment. You already have a workspace in which you need to configure the compute resources for your experiments. You are going to use Azure’s AutoML feature, and you want to use ML pipelines to organize your workflow, which you plan to build with the ML Designer. Which compute resource should you choose?
A. Azure ML compute instance
B. Azure HDInsight
C. Remote VM
D. Azure ML compute cluster
Correct Answer: D
Explanation: Option D is CORRECT because, of the options listed, an Azure ML compute cluster is the only one suitable for AutoML, for running pipelines, and for exploiting the capabilities of the graphical ML Designer.
Question 6: You have 100 CSV files in Azure Blob Storage which you have to use to train your ML model. The files contain measurement data collected from manufacturing machines in order to analyze the causes of malfunctions. Each row in the files is a snapshot of machine parameters at a given time. Using the ML Designer, you have to use the data in the CSV files as input for your machine learning pipeline, ensuring reusability and versioning of the data and minimizing load time when running experiments. What should you do?
A. Register the files as a File Dataset in your ML workspace; add the Dataset module to your pipeline
B. Add an Import Data module to your pipeline and configure it for accessing the files; set the Regenerate output = Yes
C. Register the files as a Tabular Dataset in your ML workspace; add the Dataset module to your pipeline
D. Add an Import Data module to your pipeline and configure it for accessing the files; set the Regenerate output = No
Correct Answer: C
Explanation: Option C is CORRECT because the recommended practice for getting data into the ML pipeline without repeating the input operation for each run is to register the data as a Dataset. Structured files (like CSVs) have to be registered as Tabular datasets. The registered datasets can be found in the module palette, under Datasets, and can be used like any other module. Once a dataset is registered, additional features such as versioning and data monitoring become available.
Question 7: You have a large dataset of observations with a high number of features. You need to train a multiclass classification model with hyperparameter tuning in a time- and cost-effective way.
Which of the following decisions helps you to reduce training time and save cost?
A. Use Grid sampling
B. Use the Default Termination Policy
C. Use Filter Based Feature Selection
D. Disable overfitting
Correct Answer: C
Explanation: Option C is CORRECT because feature selection applies statistical tests to the input columns in order to find those that are most predictive of the output. The Filter Based Feature Selection module provides several feature selection algorithms you can choose from. Reducing the number of features can have a remarkable effect on the training time of the model.
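The idea behind filter-based feature selection can be sketched in a few lines: score every input column against the target with a statistical measure, then keep only the top-scoring columns. The sketch below uses the absolute Pearson correlation as the score, which is one possible choice and assumes a numeric target for simplicity; the column names and values are hypothetical, and this is not the Designer module's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_top_k(columns, target, k):
    """Rank columns by |correlation with target| and keep the best k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature columns and target values.
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "feature_c": [3.0, 1.0, 2.0, 1.0, 3.0],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]

selected = select_top_k(columns, target, k=2)
print("Selected features:", selected)
```

Training on the two selected columns instead of all three is a small saving here, but with hundreds of features the reduction in training time and compute cost becomes significant.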
Download The Complete DP-100 Exam Questions & Answers
Now that you have tested your knowledge with these DP-100 questions & answers, I hope you have a clearer picture of where you stand in your Design & Implement a Data Science Solution on Azure exam preparation.
Note: K21Academy also offers a Design & Implement a Data Science Solution on Azure Exam Questions & Answers Prep Guide where learners get to practice questions to test their DP-100 exam preparation before the actual exam.