Friday, May 22, 2020

Power query 101

In this tutorial, you'll use Excel's Power Query experience to create a query that imports data from a web page. As part of this process, you'll navigate across tables available on a web page, and apply data transformation steps to display just the data you want to see. Power Query records all of your steps, and they will be repeated whenever you refresh your data.

Step 1: Connect to a Wikipedia page

Excel 2016: On the Data tab, click New Query > From Other Sources > From Web. If you don't see the New Query button, click Data > From Web.

Excel 2010-2013: On the Power Query tab, click From Web. If you don't see the Power Query tab, make sure you have downloaded and installed the Power Query add-in.

  1. In the From Web dialog box, paste the Wikipedia URL (http://en.wikipedia.org/wiki/UEFA_European_Football_Championship) in the URL text box.

    Power Query > From Web > Input URL dialog
  2. Click OK.

    After establishing a connection to the web page, you see a list of tables available on this Wikipedia page in the Navigator pane. You can click on each table to see a quick preview displayed in the pane on the right.

    Power Query > From Web > Navigator Table View

  3. Double-click the Results[edit] table, and the Query Editor will open with the tournament data.

Top of Page

Step 2: Shape the data

Now that you have the table open in the Query Editor, you can clean and shape the data so that it fits your needs. In this step, you'll reshape the data by removing all columns except Year and Final Winners.

  1. In the Query Preview grid, use Ctrl+Click to select the Year and Final Winners columns.

  2. Click Remove Columns > Remove Other Columns.

    Power Query > Query Editor > Remove Columns

Step 3: Clean the data

In this step, you'll clean the data by replacing values and filtering the data.

  1. Select the Year column.

  2. In the Query Editor, click Replace Values.

  3. In the Replace Values dialog box, type "Details" in the Value to Find text box and leave the Replace With text box empty.

  4. Click OK.

    Power Query > Query Editor > Replace Values

Step 4: Filter values in a column

Now, you'll filter the Year column to display rows that do not contain "Year".

  1. Click the filter drop down arrow on the Year column.

  2. In the Filter drop-down, uncheck Year.

  3. Click OK.

Power Query - Query Editor > Filter to clean data

Step 5: Name a query

Now it's time to name the query you've been creating.

In the Query Settings pane, in the Name text box, enter Euro Cup Winners. If you click the All Properties link, you can also enter a query description in the Description text box.

Power Query > Query Editor > Query Settings

Step 6: Load the query to a worksheet

Finally, you can now load the Euro Cup Winners query to a worksheet.

  1. In the upper-left corner, click Close & Load.

    Power Query will return the query results to a worksheet.

    If you need to update the data at a later time, right-click anywhere in the data range and then click Refresh. Any new information from the web page will be automatically updated.

Step 7: Discover what happens behind the scenes

  1. Click anywhere within the data, and on the Query tab, click Edit.

  2. On the right, notice the list of Applied Steps. As you performed query activities, query steps were created. Each query step has a corresponding formula written in the "M" language.

  3. Here is an explanation of each step:

Query step

Task

Formula

Source

Connect to a web data source

= Web.Page(Web.Contents("http://en.wikipedia.org/wiki/UEFA_European_Football_Championship"))

Navigation

Select the table to connect

= Source{2}[Data]

Changed Type

Change types - Power Query does this automatically

= Table.TransformColumnTypes(Data2,{{"Year", type text}, {"Host", type text}, {"", type text}, {"Final Winner", type text}, {"Final Score", type text}, {"Final Runner-up", type text}, {"2", type text}, {"Third place match Third place", type text}, {"Third place match Score", type text}, {"Third place match Fourth place", type text}, {"3", type text}, {"Number of teams", type text}})

RemovedOtherColumns

Remove Other Columns to only display columns of interest

Table.SelectColumns

= Table.SelectColumns(#"Changed Type",{"Final Winner", "Year"})

ReplacedValue

Replace Values to clean up values in a selected column

Table.ReplaceValue

= Table.ReplaceValue(#"Removed Other Columns","Details","",Replacer.ReplaceText,{"Year"})

FilteredRows

Filter values in a column

Table.SelectRows

= Table.SelectRows(#"Replaced Value", each ([Year] <> "Year"))

You can see the M code in its entirety by clicking on the Advanced Editor option in the Power Query ribbon. For more information about the Power Query formula language, see Learn about Power Query formulas.

Need more help?

You can always ask an expert in the Excel Tech Community, get support in the Answers community, or suggest a new feature or improvement on Excel User Voice.

See Also

Power Query is known as Get & Transform in Excel 2016

Import data from external data sources

Shape data with Power Query

No comments:

Post a Comment