Querying Data with Power BI

Microsoft introduced Self Service BI back in 2009, just around when jEyLaBs was founded, announcing Power Pivot for Microsoft Excel 2010.For several years SharePoint was the answers for the demand to share reports with a team.

Fast forward to 2014 and Power BI was made avaialble in Office365. The first release of Power BI was based on the Microsoft Excel-based add-ins: Power Query, Power Pivot and Power View. With time, Microsoft also added many additional features like Question and Answers, enterprise level data connectivity and security options via Power BI Gateways and many more. Power BI was first released to the general public on July 24, 2015

PowerBI gives you a single view of your most critical business data. Monitor the health of your business using a live dashboard, create rich interactive reports with Power BI Desktop and access your data on the go with native Power BI Mobile apps.

Power BI allows user to connect to the data using two different methods, Direct query and Import. Direct Query, or a direct connection to source data, is a technique that accesses the data source each time that a dashboard element (or visual) is created or modified. This approach stands in stark contrast to the more “classic” approach that is used with Power BI of creating a connection to a data source, using a small subset of the data to transform, filter and model the data and finally loading all the required data into the Power BI compressed in-memory data model.

When using In-Memory mode in Power BI can take a while to load and compress large data sets from multiple relational database tables or data warehouse fact and dimension tables. Although speeding up this process by using a powerful workstation with lots of memory and an industrial-strength server and fast network, it can, nonetheless, become a brake on creative analysis of the data. Moreover, once the data has been loaded into the local data model it is essentially static. So either reload or refresh all or part of data model every time that it is modified – or you suspect that it has been altered – to ensure that you are working on the latest version. Once again this can take longer than expected.

So, the idea is that using DirectQuery, access to source data is faster by bypassing the creation of a local data model. However only a handful of sources support DirectQuery. These are:

  • SQL Server relational databases
  • SSAS “Classic” OLAP cubes
  • SSAS Tabular data warehouses
  • Oracle relational databases
  • Teradata
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • SAP Hana

There are 3 key benefits to using DirectQuery as a source of data:

  • DirectQuery lets you build reports based on extremely large datasets, this is particularly true in the case of huge tables where it would simply not be practical to import all of the data
  • Underlying data changes can necessitate data being refreshed frequently. It follows that for some reports, a requirement for up to the minute data can require massive data transfers, making re-importing data not a practical solution. By contrast, DirectQuery reports always use current data, and the hard work is carried out where it should be done – on the server.
  • When user re-queries data, Power BI the application will re-use existing data to avoid overloading the server with multiple requests for data.

Nonetheless, there are a few drawbacks to using DirectQuery. They include the following:

  • All tables must come from a single database. There is simply no way of combining data from multiple sources yet. Of course, you can always see if it is possible to create a single database that contains views over linked databases so that Power BI Desktop can obtain the source data from a single data connection that masks the underlying cross-database links.
  • If the Query Editor query is overly complex an error will occur. To remedy any error, you might have to import the data instead of using DirectQuery.
  • Relationship filtering is limited to a single direction, rather than both directions.
  • By default, a few limitations are placed on the DAX expressions that are allowed in measures. Unfortunately, these can only be discovered the hard way, as you encounter error messages when Power BI Desktop queries the data source.
  • There is no easy way to convert a Power BI Desktop file from DirectQuery to a connection that downloads the data to the local data model. You simply have to create a duplicate file with a “classic” connection. This will mean exporting and/or recreating any measures and calculated columns that you have added to the DirectQuery dashboard.

Certainly, establishing a direct connection to a data source will not resolve every challenge. Above all it will not remove the requirement to apply careful thought to the way that you handle your analytic data. So you will probably still need to:

  • Only select the source data that you really need, this can mean
    • Filter the data (where possible) using custom SQL, MDX or DAX queries to ensure that only essential data is used by Power BI Desktop.
    • Select only the objects (again, where this is possible) that you really need to use
  • Give careful consideration to the structure of the data that you are accessing. Complex data structures will, inevitably, lead to slower output.
  • Where possible, think of end-users when developing Power BI solutions for less technical people. This can mean that, when using relational sources, you might need to hide tables and fields and add any necessary hierarchies and metrics – so that a more operational (not to say “raw”) data source becomes comprehensible and useable by non-technical staff.

Contributed by Arun Kumar, Senior BI Consultant at jEyLaBs

Published by jeylabs

Are you ready to Transform your Enterprise? www.jeylabs.com

%d bloggers like this: