Aggregations are a way of collapsing, summarizing, or grouping data. When you start with raw data from tables or other data sources, the data is often flat, meaning there is lots of detail, but it has not been organized or grouped in any way. This lack of summaries or structure might make it difficult to discover patterns in the data. An important part of data modeling is to define aggregations that simplify, abstract, or summarize patterns in answer to a specific business question.
Most common aggregations, such as those using AVERAGE, COUNT, DISTINCTCOUNT, MAX, MIN, or SUM can be created in a measure automatically by using AutoSum. Other types of aggregations, such as AVERAGEX, COUNTX, COUNTROWS, or SUMX return a table, and require a formula created using Data Analysis Expressions (DAX).
Understanding Aggregations in Power Pivot
Choosing Groups for Aggregation
When you aggregate data, you group data by attributes such as product, price, region, or date and then define a formula that works on all the data in the group. For example, when you create a total for a year, you are creating an aggregation. If you then create a ratio of this year over the previous year and present those as percentages, it is a different type of aggregation.
The decision of how to group the data is driven by the business question. For example, aggregations can answer the following questions:
Counts How many transactions were there in a month?
Averages What were the mean sales in this month, by salesperson?
Minimum and maximum values Which sales districts were the top five in terms of units sold?
To create a calculation that answers these questions, you must have detailed data that contains the numbers to count or sum, and that numeric data must be related in some way to the groups that you will use to organize the results.
If the data does not already contain values that you can use for grouping, such as a product category or the name of the geographical region where the store is located, you might want to introduce groups to your data by adding categories. When you build groups in Excel, you must manually type or select the groups you want to use from among the columns in your worksheet. However, in a relational system, hierarchies such as categories for products are often stored in a different table than the fact or value table. Usually the category table is linked to the fact data by some kind of key. For example, suppose you find that your data contains product IDs, but not the names of products or their categories. To add the category to a flat Excel worksheet, you would have to copy in the column that contained the category names. With Power Pivot, you can import the product category table to your data model, create a relationship between the table with the number data and the product category list, and then use the categories to group data. For more information, see Create a relationship between tables.
Choosing a Function for Aggregation
After you have identified and added the groupings to use, you must decide which mathematical functions to use for aggregation. Often the word aggregation is used as a synonym for the mathematical or statistical operations that are used in aggregations, such as sums, averages, minimum, or counts. However, Power Pivot enables you to create custom formulas for aggregation, in addition to the standard aggregations found in both Power Pivot and Excel.
For example, given the same set of values and groupings that were used in the preceding examples, you could create custom aggregations that answer the following questions:
Filtered counts How many transactions were there in a month, excluding the end-of-month maintenance window?
Ratios using averages over time What was the percentage growth or decline in sales compared to the same period last year?
Grouped minimum and maximum values Which sales districts were ranked top for each product category, or for each sales promotion?
Adding Aggregations to Formulas and PivotTables
When you have a general idea of how your data should be grouped to be meaningful, and the values that you want to work with, you can decide whether to build a PivotTable or create calculations within a table. Power Pivot extends and improves the native ability of Excel to create aggregations such as sums, counts, or averages. You can create custom aggregations in Power Pivot either within the Power Pivot window, or within the Excel PivotTable area.
-
In a calculated column, you can create aggregations that take into account the current row context to retrieve related rows from another table, and then sum, count, or average those values in the related rows.
-
In a measure, you can create dynamic aggregations that use both filters defined within the formula, and filters imposed by the design of the PivotTable and the selection of Slicers, column headings, and row headings. Measures using standard aggregations can be created in Power Pivot by using AutoSum or by creating a formula. You can also create implicit measures using standard aggregations in a PivotTable in Excel.
Adding Groupings to a PivotTable
When you design a PivotTable, you drag fields that represent groupings, categories, or hierarchies, to the columns and rows section of the PivotTable to group the data. You then drag fields that contain numeric values into the values area so that they can be counted, averaged, or summed.
If you add categories to a PivotTable but the category data is not related to the fact data, you might get an error or peculiar results. Usually Power Pivot will try to correct the problem, by automatically detecting and suggesting relationships. For more information, see Work with Relationships in PivotTables.
You can also drag fields into Slicers, to select certain groups of data for viewing. Slicers let you interactively group, sort, and filter the results in a PivotTable.
Working with Groupings in a Formula
You can also use groupings and categories to aggregate data that is stored in tables by creating relationships between tables, then creating formulas that leverage those relationships to look up related values.
In other words, if you want to create a formula that groups values by a category, you would first use a relationship to connect the table containing the detail data and the tables containing the categories, and then build the formula.
For more information about how to create formulas that use lookups, see Lookups in Power Pivot Formulas.
Using Filters in Aggregations
A new feature in Power Pivot is the ability to apply filters to columns and tables of data, not only in the user interface and within a PivotTable or chart, but also in the very formulas that you use to calculate aggregations. Filters can be used in formulas both in calculated columns and in s.
For example, in the new DAX aggregation functions, instead of specifying values over which to sum or count, you can specify an entire table as the argument. If you did not apply any filters to that table, the aggregation function would work against all the values in the specified column of the table. However, in DAX you can create either a dynamic or static filter on the table, so that the aggregation operates against a different subset of data depending on the filter condition and the current context.
By combining conditions and filters in formulas you can create aggregations that change depending on the values supplied in formulas, or that change depending on the selection of rows headings and column headings in a PivotTable.
For more information, see Filter Data in Formulas.
Comparison of Excel Aggregation Functions and DAX Aggregation Functions
The following table lists some of the standard aggregation functions provided by Excel, and provide links to the implementation of these functions in Power Pivot. The DAX version of these functions behaves much the same as the Excel version, with some minor differences in syntax and handling of certain data types.
Standard Aggregation Functions
Function | Use |
Returns the average (arithmetic mean) of all the numbers in a column. | |
Returns the average (arithmetic mean) of all the values in a column. Handles text and non-numeric values. | |
Counts the number of numeric values in a column. | |
Counts the number of values in a column that are not empty. | |
Returns the largest numeric value in a column. | |
Returns the largest value from a set of expressions evaluated over a table. | |
Returns the smallest numeric value in a column. | |
Returns the smallest value from a set of expressions evaluated over a table. | |
Adds all the numbers in a column. |
DAX Aggregation Functions
DAX includes aggregation functions that let you specify a table over which the aggregation is to be performed. Therefore, instead of just adding or averaging the values in a column, these functions let you create an expression that dynamically defines the data to aggregate.
The following table lists the aggregations functions that are available in DAX.
Function | Use |
Averages a set of expressions evaluated over a table. | |
Counts a set of expressions evaluated over a table. | |
Counts the number of blank values in a column. | |
Counts the total number of rows in a table. | |
Counts the number of rows returned from a nested table function, such as filter function. | |
Returns the sum of a set of expressions evaluated over a table. |
Differences between DAX and Excel Aggregation Functions
Although these functions have the same names as their Excel counterparts, they utilize Power Pivot's in-memory analytics engine and have been rewritten to work with tables and columns. You cannot use a DAX formula in an Excel workbook, and vice versa. They can only be used in the Power Pivot window and in PivotTables that are based on Power Pivot data. Also, although the functions have identical names, the behavior may be slightly different. For more information, see the individual function reference topics.
The way that columns are evaluated in an aggregation is also different from the way that Excel handles aggregations. An example may help illustrate.
Suppose you want to get a sum of the values in the Amount column in the table Sales, so you create the following formula:
=SUM('Sales'[Amount])
In the simplest case, the function gets the values from a single unfiltered column, and the result is the same as in Excel, which always just adds up the values in the column, Amount. However, in Power Pivot, the formula is interpreted as "Get the value in Amount for each row of the Sales table, and then add up those individual values. Power Pivot evaluates each row over which the aggregation is performed and calculates a single scalar value for each row, and then performs an aggregation on those values. Therefore, the result of a formula can be different if filters have been applied to a table, or if the values are calculated based on other aggregations that might be filtered. For more information, see Context in DAX Formulas.
DAX Time Intelligence Functions
In addition to the table aggregation functions described in the previous section, DAX has aggregation functions that work with dates and times you specify, to provide built-in time intelligence. These functions use ranges of dates to get related values and aggregate the values. You can also compare values across date ranges.
The following table lists the time intelligence functions that can be used for aggregation.
Function | Use |
Calculates a value at the calendar end of the given period. | |
Calculates a value at the calendar end of the period prior to the given period. | |
Calculates a value over the interval that starts at the first day of the period and ends at the latest date in the specified date column. |
The other functions in the Time Intelligence function section (Time Intelligence Functions) are functions that can be used to retrieve dates or custom ranges of dates to use in aggregation. For example, you can use the DATESINPERIOD function to return a range of dates, and use that set of dates as an argument to another function to calculate a custom aggregation for just those dates.
No comments:
Post a Comment