Combine data from multiple data sources (Power Query)
Note: Power Query is known as Get & Transform in Excel 2016. Information provided here applies to both. To learn more, see Get & Transform in Excel 2016.
In this tutorial, you'll use Power Query's Query Editor to import data from a local Excel file that contains product information, and from an OData feed that contains product order information. You perform transformation and aggregation steps, and combine data from both sources to produce a Total Sales per Product and Year report.
In order to perform this tutorial, you need the Products and Orders workbook. In the Save As dialog box, name the file Products and Orders.xlsx.
In this tutorial
Task 1: Import products into an Excel workbook
In this task, you import products from the Products and Orders.xlsx file into an Excel workbook.
Step 1: Connect to an Excel workbook
-
Create an Excel workbook.
-
In the POWER QUERY ribbon tab, click From File > From Excel.
-
In the Excel browse dialog box, browse for or type the Products and Orders.xlsx path to import or link to a file.
-
In the Navigator pane, double click the Products worksheet or click Products and click Edit Query. When you edit a query, or connect to a new data source, the Query Editor window appears.
Note: For a very quick video on how to display Query Editor, see the end of this article.
Step 2: Promote the first row to table column headers
In the Query Preview grid, the first row of the table does not contain the table column names. To promote the first row to table column headers:
-
Click the table icon ( ) in the top-left corner of the data preview.
-
Click Use First Row as Headers.
Step 3: Remove other columns to only display columns of interest
In this step you remove all columns except ProductID, ProductName, CategoryID, and QuantityPerUnit.
-
In the Query Preview grid, select the ProductID, ProductName, CategoryID, and QuantityPerUnit columns (use Ctrl+Click or Shift+Click).
-
In the Query Editor ribbon, click Remove Columns > Remove Other Columns or right-click on a column header, and click Remove Other Columns.
Power Query steps created
As you perform query activities in Power Query, query steps are created and listed in the Query Settings pane, in the APPLIED STEPS list. Each query step has a corresponding Power Query formula, also known as the "M" language. For more information about the Power Query formula language, see Learn about Power Query formulas.
Task | Query step | Formula |
Connect to an Excel workbook | Source | Source{[Name="Products"]}[Data] |
Promote the first row to table column headers | FirstRowAsHeader | (Products) |
Remove other columns to only display columns of interest | RemovedOtherColumns | (FirstRowAsHeader,{"ProductID", "ProductName", "CategoryID", "QuantityPerUnit"}) |
Step 4: Import a products query
In this step, you import the Products query into your Excel workbook.
-
In the Query Editor ribbon, click Apply & Close. The results appear in a new Excel worksheet.
Task 2: Import order data from an OData feed
In this task, you import data into your Excel workbook from the sample Northwind OData feed at http://services.odata.org/Northwind/Northwind.svc.
Step 1: Connect to an OData feed
-
In the POWER QUERY ribbon tab, click From Other Sources > From OData Feed.
-
In the OData Feed dialog box, enter the URL for the Northwind OData feed.
-
Click OK.
-
In the Navigator pane, double click the Orders table or click Orders and click Edit.
Note: When you hover your mouse over a table, you will see a table preview fly out.
Step 2: Expand an Order_Details table
In this step, you expand the Order_Details table that is related to the Orders table, to combine the ProductID, UnitPrice, and Quantity columns from Order_Details into the Orders table. The Expand operation combines columns from a related table into a subject table. When the query runs, rows from the related table (Order_Details) are combined into rows from the subject table (Orders).
In Power Query, a column containing a link to a related table has an Entry link or Table link. An Entry link navigates to a single related record, and represents aone to one relationship with a subject table. A Table link navigates to a related table, and represents a one-to-many relationship with a subject table. A link represents navigation properties in a data source within a relational model. For an OData feed, navigation properties represent an entity with a foreign key association. In a database, such as SQL Server, navigation properties represent foreign key relationships in the database.
Expand the Order_Details Table link
After you expand the Order_Details table, three new columns and additional rows are added to the Orders table, one for each row in the nested or related table.
-
In the Query Preview pane, scroll to the Order_Details column.
-
In the Order_Details column, click the expand icon ( ).
-
In the Expand drop-down:
-
Click (Select All Columns) to clear all columns.
-
Click ProductID, UnitPrice, and Quantity.
-
Click OK.
Note: In Power Query, you can expand tables linked from to a column, and also the ability to perform aggregate operations on the columns of the linked table before expanding the data in the subject table. For more information about how to perform aggregate operations, see Aggregate data from a column.
-
Step 3: Remove other columns to only display columns of interest
In this step you remove all columns except OrderDate, ProductID, UnitPrice, and Quantity columns. In the previous task, you used Remove Other Columns. For this task, you remove selected columns.
Remove selected columns
-
In the Query Preview pane, select all columns:
-
Click the first column (OrderID).
-
Shift+Click the last column (Shipper).
-
Ctrl+Click the OrderDate, Order_Details.ProductID, Order_Details.UnitPrice, and Order_Details.Quantity columns.
-
-
Right-click on a selected column header, and click Remove Columns.
Step 4: Calculate the line total for each Order_Details row
In this step, you create a Custom Column to calculate the line total for each Order_Details row.
Calculate the line total for each Order_Details row
-
In the Query Preview pane, click the table icon ( ) at the top-left corner of the preview.
-
Click Insert Column > Custom.
-
In the Insert Custom Column dialog box, in the Custom Column Formula textbox, enter [Order_Details.UnitPrice] * [Order_Details.Quantity].
-
In the New column name textbox, enter Line Total.
-
Click OK.
Step 5: Transform an OrderDate year column
In this step, you transform the OrderDate column to render the order date year.
-
In the Preview grid, right-click the OrderDate column, and click Transform > Year.
-
Rename the OrderDate column to Year:
-
Double-Click the OrderDate column, and enter Year or
-
Right-Click on the OrderDate column, click Rename, and enter Year.
-
Step 6: Group rows by ProductID and Year
-
In the Query Preview grid, select Year and Order_Details.ProductID.
-
Right-Click one of the headers, and click Group By.
-
In the Group By dialog box:
-
In the New column name textbox, enter Total Sales.
-
In the Operation drop-down, select Sum.
-
In the Column drop-down, select Line Total.
-
-
Click OK.
Step 7: Rename a query
Before you import the sales data into Excel, name the query Total Sales:
-
In the Query Settings pane, in the Name text box enter Total Sales.
Final query results
After you perform each step, you will have a Total Sales query over the Northwind OData feed.
Power Query steps created
As you perform query activities in Power Query, query steps are created and listed in the Query Settings pane, in the APPLIED STEPS list. Each query step has a corresponding Power Query formula, also known as the "M" language. For more information about the Power Query formula language, see Learn about Power Query formulas.
Task | Query step | Formula |
Connect to an OData feed | Source | Source{[Name="Orders"]}[Data] |
Expand the Order_Details table | Expand Order_Details | (Orders, "Order_Details", {"ProductID", "UnitPrice", "Quantity"}, {"Order_Details.ProductID", "Order_Details.UnitPrice", "Order_Details.Quantity"}) |
Remove other columns to only display columns of interest | RemovedColumns | (#"Expand Order_Details",{"OrderID", "CustomerID", "EmployeeID", "RequiredDate", "ShippedDate", "ShipVia", "Freight", "ShipName", "ShipAddress", "ShipCity", "ShipRegion", "ShipPostalCode", "ShipCountry", "Customer", "Employee", "Shipper"}) |
Calculate the line total for each Order_Details row | InsertedColumns | (RemovedColumns, "Custom", each [Order_Details.UnitPrice] * [Order_Details.Quantity]) |
Transform the OrderDate column to render the year | RenamedColumns | (InsertedCustom,{{"Custom", "Line Total"}}) |
TransformedColumn | (RenamedColumns,{{"OrderDate", Date.Year}}) | |
RenamedColumns1 | (TransformedColumn,{{"OrderDate", "Year"}}) | |
Group rows by ProductID and Year | GroupedRows | Table.Group |
Step 8: Disable query download into an Excel workbook
Since the Total Sales query does not represent the final Total Sales per Product and Year report, you disable query download into the Excel workbook. When the Load to Worksheet option is Off in the Queries settings pane, the data result of this query is not downloaded, but the query can still be combined with other queries in order to build the desired result. You learn how to combine this query with the Products query in the next task.
Disable a query download
-
In the Query Settings pane, uncheck Load to worksheet.
-
In the Query Editor ribbon, click Apply & Close. In the Workbook Queries pane, the Total Sales query displays Load is disabled.
Task 3: Combine the Products and Total Sales queries
Power Query enables you to combine multiple queries, by merging or appending them. The Merge operation is performed on any Power Query query with a tabular shape, independent of the data source that the data comes from. For more information about combining data sources, see Combine multiple queries.
In this task, you combine the Products and Total Sales queries by using a Merge and Expand query step.
Step 1: Merge ProductID into a Total Sales query
-
In the Excel workbook, navigate to the Products query on Sheet2.
-
In the QUERY ribbon tab, click Merge.
-
In the Merge dialog box, select Products as the primary table, and select Total Sales as the second or related query to merge. Total Sales will become a new expandable column.
-
To match Total Sales to Products by ProductID, select the ProductID column from the Products table, and the Order_Details.ProductID column from the Total Sales table.
-
In the Privacy Levels dialog box:
-
Select Organizational for your privacy isolation level for both data sources.
-
Click Save.
-
-
Click OK.
Security Note: Privacy Levels prevent a user from inadvertently combining data from multiple data sources, which might be private or organizational. Depending on the query, a user could inadvertently send data from the private data source to another data source that might be malicious. Power Query analyzes each data source and classifies it into the defined level of privacy: Public, Organizational, and Private. For more information about Privacy Levels, see Privacy Levels.
After you click OK, the Merge operation creates a query. The query result contains all columns from the primary table (Products), and a single column containing a navigation link to the related table (Total Sales). An Expand operation adds new columns into the primary or subject table from the related table.
Step 2: Expand a merge column
In this step, you expand the merge column with the name NewColumn to create two new columns in the Products query: Year and Total Sales.
Expand the NewColumn table link
-
In the Query Preview grid, click the NewColumn expand icon ( ).
-
In the Expand drop-down:
-
Click (Select All Columns) to clear all columns.
-
Click Year and Total Sales.
-
Click OK.
-
-
Rename these two columns to Year and Total Sales.
-
Sort Descending by Total Sales to find out which products and in which years the products got the highest volume of sales.
-
Rename the query to Total Sales per Product.
Power Query steps created
As you perform Merge query activities in Power Query, query steps are created and listed in the Query Settings pane, in the APPLIED STEPS list. Each query step has a corresponding Power Query formula, also known as the "M" language. For more information about the Power Query formula language, see Learn about Power Query formulas.
Task | Query step | Formula |
Merge ProductID into the Total Sales query | Source (data source for Merge operation) | (Products,{"ProductID"},#"Total Sales",{"Order_Details.ProductID"},"NewColumn") |
Expand a merge column | ExpandNewColumn | (Source, "NewColumn", {"Year", "Total Sales"}, {"NewColumn.Year", "NewColumn.Total Sales"}) |
RenamedColumns | (#"Expand NewColumn",{{"NewColumn.Year", "Year"}, {"NewColumn.Total Sales", "Total Sales"}}) | |
SortedRows | (RenamedColumns,{{"Total Sales", Order.Descending}}) |
Step 3: Load a Total Sales per Product query into an Excel Data Model
In this step, you disable the Load to Worksheet option and load a query into the Excel Data Model, in order to build a report connected to the query result. In addition to loading query results into an Excel worksheet, Power Query enables you to load a query result into an Excel Data Model. After you load data into the Excel Data Model, you can use Power Pivot and Power View to further data analysis.
Load the Total Sales per Product query into the Excel Data Model
-
In the Query Settings pane, uncheck Load to worksheet and check Load to data model.
-
To load the query into the Excel Data Model, click Apply & Close.
Final Total Sales per Product query
After you perform each step, you will have a Total Sales per Product query that combines data from the Products and Orders.xlsx file and Northwind OData feed. This query can be applied to a Power Pivot model. In addition, changes to the query in Power Query modify and refresh the resulting table in the Power Pivot model.
Note: The Query Editor only appears when you load, edit, or create a new query using Power Query. The following video shows the Query Editor window appearing after editing a query from an Excel workbook. To view the Query Editor without loading or editing an existing workbook query, from the Get External Data section in the Power Query ribbon tab, select From Other Sources > Blank Query. The following video shows one way to display the Query Editor.
No comments:
Post a Comment