
Microsoft Data Analyst Associate [DA-100] Module 6: Optimize a Model for Performance in Power BI


In this blog, I am going to share some quick tips, including Q/A and important topics from Day 6 of the Microsoft Data Analyst live session, covering Module 6: Optimize a model for performance in Power BI. We covered topics such as Performance Analyzer, metadata, reducing cardinality levels, DirectQuery, and performance optimization. These notes will help you gain a better understanding of the material, clear the [DA-100] certification, and get a better-paid job.

In the Day 5 session, we covered an overview of creating reports in Power BI.

A week earlier, in the Day 4 session, we covered an overview of creating model calculations using DAX in Power BI.

> Performance Optimization

  • Performance optimization, also known as performance tuning, involves making changes to the current state of the data model so that it runs more efficiently.
  • Essentially, when your data model is optimized, it performs better; optimization often also reduces the size of the data model.
  • Before starting with the optimization process, first, we need to identify the rows or columns in the data model where there are performance issues.
  • After that, we can proceed and take action to resolve those issues to improve performance.

Performance optimization typically involves:

  • Deleting unnecessary columns and rows
  • Avoiding repeated values
  • Reducing the cardinalities
  • Analyzing the model

> Performance Analyzer

Use Performance Analyzer in Power BI Desktop to find out how each of your report elements performs when users interact with them. Performance Analyzer will help you identify the elements that are contributing to your performance issues, which can be useful during troubleshooting.

Before you run a Performance analyzer, to ensure you get the most accurate results in your analysis (test), make sure that you start with a clear visual cache and a clear data engine cache.

  • Visual cache – When you load a visual, its results are cached, and you can’t clear this cache without closing Power BI Desktop and opening it again. To avoid any caching in play, you need to start your analysis with a clean visual cache. To ensure that you have a clear visual cache, add a blank page to your Power BI Desktop (.pbix) file and then, with that page selected, save and close the file. Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open on the blank page.
  • Data engine cache – When a query runs, its results are cached, so rerunning the same visual returns cached results and makes your analysis misleading. You need to clear the data cache before rerunning the visual. To clear the data cache, you can either restart Power BI Desktop or connect DAX Studio to the data model and then call Clear Cache.

  • To begin the analysis process, select Start recording, select the page of the report that you want to analyze and interact with the elements of the report that you want to measure.
  • You will see the results of your interactions display in the Performance analyzer pane as you work. When you are finished, select the Stop button.

> Metadata

  • Metadata is information about other data. Power BI metadata contains information on your data model, such as the name, data type, and format of each of the columns, the schema of the database, the report design, when the file was last modified, the data refresh rates, and much more.
  • When you load data into Power BI Desktop, it is good practice to analyze the corresponding metadata so you can identify any inconsistencies with your dataset and normalize the data before you start to build reports.
  • Running analysis on your metadata will improve data model performance because, while analyzing your metadata, you will identify unnecessary columns, errors within your data, incorrect data types, and much more.

> Use Variables to Improve Performance

  • Measures that use variables perform more efficiently because variables remove the need for Power BI to evaluate the same expression multiple times.
  • If your data model has multiple queries with multiple measures, the use of variables could cut the overall query processing time in half and improve the overall performance of the data model.
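As an illustration, here is a hedged DAX sketch of the same measure written both ways, assuming a hypothetical Sales table with Quantity and UnitPrice columns and a marked date table named 'Date':

```dax
-- Without variables: the prior-year expression is evaluated twice.
Sales YoY Growth (no vars) =
DIVIDE (
    SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
        - CALCULATE (
            SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] ),
            SAMEPERIODLASTYEAR ( 'Date'[Date] )
        ),
    CALCULATE (
        SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] ),
        SAMEPERIODLASTYEAR ( 'Date'[Date] )
    )
)

-- With variables: each expression is evaluated once and reused.
Sales YoY Growth =
VAR CurrentSales = SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
VAR PriorSales =
    CALCULATE (
        SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] ),
        SAMEPERIODLASTYEAR ( 'Date'[Date] )
    )
RETURN
    DIVIDE ( CurrentSales - PriorSales, PriorSales )
```

The two measures return the same result, but the variable version asks the engine to compute the prior-year sales expression only once.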

> Improve Performance by Reducing Cardinality Levels

Power BI Desktop offers different techniques that you can use to help reduce the data that is loaded into data models, such as summarization.

  • Reducing the data that is loaded into your model lowers column cardinality, which reduces model size and improves report performance.
  • For this reason, it is important that you strive to minimize the data that will be loaded into your models. This is especially true for large models, or models that you anticipate will grow large over time.
  • Perhaps the most effective technique to reduce model size is to use a summary table from the data source.
  • Where a detailed table might contain every transaction, a summary table would contain one record per day, per week, or per month. It might be an average of all of the transactions per day, for instance.
  • In Power BI Desktop, a Mixed mode design produces a composite model. Essentially, it allows you to determine a storage mode for each table, so each table can have its Storage Mode property set to DirectQuery or Import.
  • Another effective technique to reduce model size is to set the Storage Mode property for larger fact-type tables to DirectQuery.
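As a sketch of the summary-table idea, the following DAX calculated table (assuming a hypothetical Sales table with OrderDate, ProductKey, Quantity, and UnitPrice columns) aggregates the transactions to one row per product per day. In practice, summarizing at the data source is even better, because the detailed rows then never reach the model at all:

```dax
-- One row per product per day instead of one row per transaction.
Sales Summary =
SUMMARIZECOLUMNS (
    Sales[OrderDate],
    Sales[ProductKey],
    "Total Quantity", SUM ( Sales[Quantity] ),
    "Total Amount", SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
)
```

Reports that only need daily totals can then be built on Sales Summary, and the detailed table can be removed or switched to DirectQuery.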

> DirectQuery

  • DirectQuery is one way to get data into Power BI Desktop. The DirectQuery method involves connecting directly to data in its source repository from within Power BI Desktop. It is an alternative to importing data into Power BI Desktop.
  • When you use the DirectQuery method, the overall user experience depends heavily on the performance of the underlying data source.
  • Slow query response times will lead to a negative user experience and, in the worst-case scenarios, queries might time out.
  • The performance of your Power BI model will be impacted not only by the performance of the underlying data source, but also by other factors that you can’t control, such as network latency and the load on the server that hosts the source.

> Limitations of DirectQuery Connections

The use of DirectQuery can have negative implications. The limitations vary, depending on the specific data source that is being used. You should take the following points into consideration:

  • Performance – As previously discussed, your overall user experience depends heavily on the performance of the underlying data source.
  • Security – If you use multiple data sources in a DirectQuery model, it is important to understand how data moves between the underlying data sources and the associated security implications.
  • Data transformation – Compared to imported data, data that is sourced from DirectQuery has limitations when it comes to applying data transformation techniques within Power Query Editor.
  • Modeling – Some of the modeling capabilities that you have with imported data aren’t available, or are limited when you use DirectQuery.
  • Reporting – Almost all the reporting capabilities that you have with imported data are also supported for DirectQuery models, provided that the underlying source offers a suitable level of performance.

> Optimize Data in Power BI Desktop

  • When you have optimized the data source as much as possible, you can take further action within Power BI Desktop by using a Performance analyzer, where you can isolate queries to validate query plans.
  • You can analyze the duration of the queries that are being sent to the underlying source to identify the queries that are taking a long time to load.
  • You don’t need to use a special approach when optimizing a DirectQuery model; you can apply the same optimization techniques that you used on the imported data to tune the data from the DirectQuery source.

> Optimize the Underlying Data Source

Your first stop is the data source. You need to tune the source database as much as possible because anything you do to improve the performance of that source database will, in turn, improve Power BI DirectQuery.
Standard database practices that apply to most situations:

  • Avoid the use of complex calculated columns in a DirectQuery model because the calculation expression will be embedded into the source queries. It is more efficient to materialize the column in the source database so that the expression does not have to be evaluated with every query. You could also consider adding surrogate key columns to dimension-type tables.
  • Review the indexes and verify that the current indexing is correct. If you need to create new indexes, ensure that they are appropriate.
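For example, a calculated column like the following (the table and column names are hypothetical) is injected into every source query that a DirectQuery model generates; materializing the same value as a physical column in the source database avoids that repeated work:

```dax
-- Avoid in DirectQuery: this expression is embedded into each source query.
-- Better: compute it once as a physical column in the source database.
Margin % =
DIVIDE (
    Sales[SalesAmount] - Sales[TotalCost],
    Sales[SalesAmount]
)
```

The simpler the queries that Power BI sends to the source, the easier they are for the source database to optimize.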

> Query Reduction

Power BI Desktop gives you the option to send fewer queries and to disable certain interactions that will result in a poor experience if the resulting queries take a long time to run. Applying these options prevents queries from continuously hitting the data source, which should improve performance.

You access the settings by selecting File > Options and settings > Options, scrolling down the page, and then selecting the Query reduction option.

Query reduction options are:

  • Reduce the number of queries sent by – By default, every visual interacts with every other visual. Selecting this check box disables that default interaction. You can then optionally choose which visuals interact with each other by using the Edit interactions feature.
  • Slicers – By default, the Instantly apply slicer changes option is selected. To force report users to apply slicer changes manually, select the Add an apply button to each slicer to apply changes when you’re ready option.
  • Filters – By default, the Instantly apply basic filter changes option is selected.

Q/A asked during the session

Q1: What is a Performance Analyzer?

A. Performance Analyzer helps you find out how each of your report elements performs when users interact with them. It also helps you identify the elements that are contributing to your performance issues, which can be useful during troubleshooting.

Q2: Explain Query folding in Power Query

A. Query folding is the process of translating the transformations defined in Power Query Editor into the native query language of the source (for example, SQL), so that the work is executed by the source database instead of the client machine. This is especially useful when resources on the client machine are limited, and it helps with scaling and performance optimization.

Q3: Advantages of using variables in the data?

A. The use of variables in your data model provides the following advantages:

  • Improved performance – Variables can make measures more efficient because they remove the need for Power BI to evaluate the same expression multiple times.
  • Improved readability – Variables have short, self-describing names and are used in place of an ambiguous, multi-worded expression. You might find it easier to read and understand the formulas when variables are used.
  • Simplified debugging – You can use variables to debug a formula and test expressions, which can be helpful during troubleshooting.
  • Reduced complexity – Variables do not require the use of EARLIER or EARLIEST DAX functions, which are difficult to understand. These functions were required before variables were introduced, and were written in complex expressions that introduced new filter contexts
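To illustrate the last point, here is a hedged sketch of a ranking calculated column written both ways, assuming a hypothetical Product table with a ListPrice column. The variable version captures the outer row's value explicitly instead of relying on EARLIER:

```dax
-- Before variables: EARLIER reaches back to the outer row context.
Price Rank (EARLIER) =
COUNTROWS (
    FILTER ( Product, Product[ListPrice] > EARLIER ( Product[ListPrice] ) )
) + 1

-- With a variable: the outer row's value is captured once, then reused.
Price Rank =
VAR CurrentPrice = Product[ListPrice]
RETURN
    COUNTROWS (
        FILTER ( Product, Product[ListPrice] > CurrentPrice )
    ) + 1
```

Both columns rank products by list price, but the variable makes it explicit which row context the value comes from.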

Q4: Why visual cache is important?

A. In Power BI, visuals cache their results after they load, so repeated interactions can return cached data instead of querying the model. This speeds up the report experience, but it also means that you must clear the visual cache before running Performance Analyzer; otherwise, the measured durations will be misleading.

Q5: What is Metadata?

A. Metadata is information about other data. Power BI metadata contains information on your data model, such as the name, data type, and format of each of the columns, the schema of the database, the report design, when the file was last modified, the data refresh rates, and much more.

Q6: Difference Between Direct Query and Import?

A. Import: The selected tables and columns are imported into Power BI Desktop. As you create or interact with a visualization, Power BI Desktop uses the imported data. To see underlying data changes since the initial import or the most recent refresh, you must refresh the data, which imports the full dataset again.

DirectQuery: No data is copied or imported into Power BI Desktop. For relational sources, the selected tables and columns appear in the Fields list. For multidimensional sources like SAP Business Warehouse, the measures and dimensions of the selected cube appear in the Fields list. As you create or interact with a visualization, Power BI Desktop queries the underlying data source, so you’re always viewing current data.

Q7: What benefit do you get from analyzing the metadata?

A. The benefit of analyzing the metadata is that you can clearly identify inconsistencies in your dataset.

Q8: Why do we have to remove rows and columns?

A. Unnecessary columns – Evaluate the need for each column. If a column will not be used in a report, it is unnecessary and you should remove it.

Unnecessary rows – Check the first few rows of the dataset to see whether they are empty or contain data that you do not need in your reports; if so, remove those rows.

Q9: Is it possible to create a relationship between two columns if they have different data types?

A. No, both columns in a relationship must share the same data type.

Quiz Time (Sample Exam Questions)

With our Microsoft Data Analyst Associate program, we cover over 100 sample questions to help you prepare for the [DA-100] certification.

Check out these Questions:

Ques: The data model has an impact on the time consumed in the refreshing of a model but once the model is loaded in the memory, it has no impact on the performance of the reports. Select whether the statement is True/False.

A. True

B. False

Comment your answer in the comment box.


Next Steps to begin with DA100 Certification:

In our Microsoft Data Analyst Associate Training Program, we cover all the exam objectives, 11 hands-on labs, and practice tests. If you want to begin your journey towards becoming a Microsoft Certified: Data Analyst Associate [DA-100], check out our FREE CLASS.

The post Microsoft Data Analyst Associate [DA-100] Module 6: Optimize a Model for Performance in Power BI appeared first on Cloud Training Program.

