In Power BI, Data Sources are different from connection types and should not be mistaken. Over 160 connectors (Data sources) are available in Power BI Desktop. A single data source can support several connection types e.g., SQL while a data source like Excel only supports one connection type. Connection Types in Power BI refer to how Power BI interacts with data sources that you want to connect to.
In Power BI there are 3 main data Connection Types which are:
- Import
- DirectQuery
- Live
Import Data Connection Type
The Import Data connection type pulls the whole dataset from the data source and stores it in the memory of the PBI file either on-prem or in the cloud. In this case, the loaded data needs to be refreshed at a scheduled time to avoid being obsolete.
One major advantage of the Import Data Connection type is that it supports all forms of data transformation carried out with Power BI Desktop (Power Query). It also supports DAX, and all data sources leverage this connection type.
Excel as a data source only supports the Import Data Connection type. If the data source supports multiple connection types, you can choose it when you connect to that source. For example, SQL supports both Import Data and DirectQuery.
Get Data from SQL Server, and choose import connectivity mode.
Let’s talk about the pros and cons of the Import Data connection type.
Advantages
- Supports all transformation functionalities using Power Query
- Data Modelling
- DAX Functions
- Report performance is great since the model resides within Power BI
- Supports multiple data sources.
Disadvantages
- Size limitation
- A limited number of refreshes (Maximum of 8 times per day)
DirectQuery
When using the DirectQuery connection type, the Data schema is pulled directly from the data source in real time and stored in Power BI Desktop Model. While the Dataset remains in the data source. All data transformations and data modeling are done at the source i.e. outside of Power BI. It is usually preferred when the data source is really large.
Unlike Import Data Connection Type, DirectQuery does not support all data sources but mostly supports Data sources with Relational Database Models (RDMs). Below is a list of data sources DirectQuery Supports.
- SQL Server
- Azure SQL Database
- Azure SQL Data Warehouse
- Amazon RedShift
- Oracle
- Snowflake
- Spark
- Google Big Query
All but not limited to the list above. You can check out the complete up-to-date list here. It is advisable to be very cautious when using DirectQuery because any form of messy data structure may likely affect the performance of your report.
DirectQuery Demo
Get data from SQL Server, and choose the DirectQuery Connection mode.
Once the data is loaded, you’ll see the storage mode at the bottom right corner of the screen.
Since data is not loaded into Power BI using this connection method, the data view would be absent.
Pros of using Power BI DirectQuery
- Users do not have to deal with obsolete data as data is queried from the data source in real-time and refreshes occur every 15 minutes
- Less storage(memory) capacity is needed since the datasets are not stored
- Excellent for large data sets
Cons
- Transformation using Power Query is Limited
- Limited Data Modelling capabilities.
- DAX functionalities are also limited.
- Slower report performance since the data source is queried at run time, there could be delays due to the user’s bandwidth.
Live Connection
Live connection type is a method of connecting to a Dataset that has already been published to Power BI Service. Here, Power BI connects to a Power BI Dataset or Analysis Services such as SQL Server Analysis Services (SSAS) or Azure Analysis Services (AAS). Note that these are the only data sources that are supported by the Live connection method.
When a report is published from Power BI Desktop to Power BI service, its elements are usually separated into a report which is visualization, and a dataset(s). This method of connection connects to the dataset (including all the data models and measures).
Live Connection demo
First, You’ll need to get data from either Power BI datasets or Analysis Services.
When creating a Live connection to Power BI, choose the dataset from the data hub as seen below.
In the case of connecting to ASS or SAS select “Connect Live”
After creating a live connection you should see something similar to this at the bottom right of the Power BI window.
When you publish a report using a Live connection to an already published Power BI dataset, what happens is that the data set remains the same but a new report will be created. See the image below.
On the other hand, when you publish a report using a Live connection to AAS or SSAS a new dataset will appear in the Power BI Service, but this isn’t really a dataset, instead, it’s just a pointer to the AAS or SSAS database where the dataset resides.
Let’s state the Pros and Cons of Live Query
- Excellent for huge data models
- Supports DAX
Cons
- No Power Query – You have to depend on already created transformations.
Composite Connection Type
Recall that when using DirectQuery, users cannot connect to multiple data sources at a time. The composite model introduced in 2018, allows a report to have two or more connections to different data sources. These data source combinations can be one or more DirectQuery Connections and an import connection or two DirectQuery connections. You can learn more about composite connection types in this Microsoft Documentation.
With Composite Model, you get the best of both worlds. Report performance is greatly improved, and we can work with large datasets also. Some datasets can be in Import Mode and others DirectQuery mode.
Composite Model – How it works.
- We load our data from SQL Server using DirectQuery Mode
- Load another dataset from Excel. A dialogue Box appears to warn you about the potential risks, click on OK to continue.
- The data will be loaded, and you will notice an update to the storage mode.
The mixed-mode of the composite model implies that the tables loaded from the SQL Server in our illustration are NOT LOADED into Power BI. They will be queried each time from the SQL Server database table.
Additionally, in the data view, we are unable to view the tables loaded using DirectQue
Gateway
The whole essence of Gateway software in Power BI is to access data located in an on-premises database or file i.e., your local machine.
Why do you need a gateway?
It is required if you want to refresh a published report on Power BI Service. Gateway serves as an intermediary between your on-prem data and the cloud. It helps to securely transfer data from the desktop (internal system) to the cloud. Note that you do not need a paid license to use the Power BI gateway. Without a gateway, you would not be able to activate schedule refresh if your dataset uses a data source that Power BI can’t access over a direct network connection.
Working with Power BI Gateway.
Now that we understand the concept of Gateway in Power BI let’s learn how to work with it.
- Download from Power BI service at the top right corner of the window, you’ll see the download icon as shown below.
- There are two modes available – Standard mode and Personal Mode. Standard mode is appropriate for enterprise users and is usually set up by IT. On the other hand, personal for individual users.
- For the purpose of this demo, we will download and install the personal mode using my work/Microsoft email.
- Now let’s head over to Power BI service to see the gateway installed.
- Now the gateway is available and running on my local machine.
- Click on Apply and then you have to edit the data source credential to Public or None
- You can now use the schedule refresh feature.
Schedule Refresh
The Schedule Refresh feature in Power BI Service allows datasets of published reports to get updated in the most current version up to 8 times a day. When working with an on-prem data source and the gateway has not been set up, the Schedule Refresh feature may be greyed out. Note that there is also an option to refresh now, this method is manual.
Setting up Schedule Refresh
- Navigate to the workspace the report and dataset you intend to refresh reside.
- Hover over the dataset and click on the schedule refresh icon which is right beside the refresh now icon.
- The settings window opens, navigate to Schedule Refresh and toggle it on.
- Choose the refresh frequency.
- Select the appropriate Time Zone that applies.
- Pick your preferred refresh time – up to 8 times when using a pro license and up to 48 while using a premium per-user license. Use the “Add Time” option to schedule the times as you go.
- Type in the email address of the users you would like to notify if the refresh fails.
- Select Apply and your datasets will automatically refresh at the scheduled time.
Summary
In this post, you’ve learned about the 3 connection types, their pros and cons, and the cases they can be used. Import Data is the fastest and most functional type when working with Power BI, DirectQuery supports huge datasets since the data is not stored in Power BI, and Live connection is a direct connection to the analysis services model. With the Composite model, you get the best of both worlds, it allows you to connect to multiple data sources using a combination(s) of DirectQuery and Import Mode. If you have questions after reading this, please don’t hesitate to drop your questions in the comment section.
Well written Tolu. Well done.