Saturday, September 22, 2018

Data Analysis Expressions (DAX) in Power Pivot

Data Analysis Expressions (DAX) in Power Pivot

Data Analysis Expressions (DAX) sounds a little intimidating at first, but don't let the name fool you. DAX basics are really quite easy to understand. First things first - DAX is NOT a programming language. DAX is a formula language. You can use DAX to define custom calculations for Calculated Columns and for Measures (also known as calculated fields). DAX includes some of the functions used in Excel formulas, and additional functions designed to work with relational data and perform dynamic aggregation.

Understanding DAX Formulas

DAX formulas are very similar to Excel formulas. To create one, you type an equal sign, followed by a function name or expression, and any required values or arguments. Like Excel, DAX provides a variety of functions that you can use to work with strings, perform calculations using dates and times, or create conditional values.

However, DAX formulas are different in the following important ways:

  • If you want to customize calculations on a row-by-row basis, DAX includes functions that let you use the current row value or a related value to perform calculations that vary by context.

  • DAX includes a type of function that returns a table as its result, rather than a single value. These functions can be used to provide input to other functions.

  • Time Intelligence Functionsin DAX allow calculations using ranges of dates, and compare the results across parallel periods.

Where to Use DAX Formulas

You can create formulas in Power Pivot either in calculated columns or in calculated fields.

Calculated Columns

A calculated column is a column that you add to an existing Power Pivot table. Instead of pasting or importing values in the column, you create a DAX formula that defines the column values. If you include the Power Pivot table in a PivotTable (or PivotChart), the calculated column can be used as you would any other data column.

The formulas in calculated columns are much like the formulas that you create in Excel. Unlike in Excel, however, you cannot create a different formula for different rows in a table; instead, the DAX formula is automatically applied to the entire column.

When a column contains a formula, the value is computed for each row. The results are calculated for the column as soon as you create the formula. Column values are only recalculated if the underlying data is refreshed or if manual recalculation is used.

You can create calculated columns that are based on measures and other calculated columns. However, avoid using the same name for a calculated column and a measure, as this can lead to confusing results. When referring to a column, it is best to use a fully qualified column reference, to avoid accidentally invoking a measure.

For more detailed information, see Calculated Columns in Power Pivot.

Measures

A measure is a formula that is created specifically for use in a PivotTable (or PivotChart) that uses Power Pivot data. Measures can be based on standard aggregation functions, such as COUNT or SUM, or you can define your own formula by using DAX. A measure is used in the Values area of a PivotTable. If you want to place calculated results in a different area of a PivotTable, use a calculated column instead.

When you define a formula for an explicit measure, nothing happens until you add the measure into a PivotTable. When you add the measure, the formula is evaluated for each cell in the Values area of the PivotTable. Because a result is created for each combination of row and column headers, the result for the measure can be different in each cell.

The definition of the measure that you create is saved with its source data table. It appears in the PivotTable Fields list and is available to all users of the workbook.

For more detailed information, see Measures in Power Pivot.

Creating Formulas by Using the Formula Bar

Power Pivot, like Excel, provides a formula bar to make it easier to create and edit formulas, and AutoComplete functionality, to minimize typing and syntax errors.

To enter the name of a table   Begin typing the name of the table. Formula AutoComplete provides a dropdown list containing valid names that begin with those letters.

To enter the name of a column   Type a bracket, and then choose the column from the list of columns in the current table. For a column from another table, begin typing the first letters of the table name, and then choose the column from the AutoComplete dropdown list.

For more details and a walkthrough of how to build formulas, see Create Formulas for Calculations in Power Pivot.

Tips for Using AutoComplete

You can use Formula AutoComplete in the middle of an existing formula with nested functions. The text immediately before the insertion point is used to display values in the drop-down list, and all of the text after the insertion point remains unchanged.

Defined names that you create for constants do not display in the AutoComplete drop-down list, but you can still type them.

Power Pivot does not add the closing parenthesis of functions or automatically match parentheses. You should make sure that each function is syntactically correct or you cannot save or use the formula. 

Using Multiple Functions in a Formula

You can nest functions, meaning that you use the results from one function as an argument of another function. You can nest up to 64 levels of functions in calculated columns. However, nesting can make it difficult to create or troubleshoot formulas.

Many DAX functions are designed to be used solely as nested functions. These functions return a table, which cannot be directly saved as a result; it should be provided as input to a table function. For example, the functions SUMX, AVERAGEX, and MINX all require a table as the first argument.

Note: Some limits on nesting of functions exist within measures, to ensure that performance is not affected by the many calculations required by dependencies among columns.

Comparing DAX Functions and Excel Functions

The DAX function library is based on the Excel function library, but the libraries have many differences. This section summarizes the differences and similarities between Excel functions and DAX functions.

  • Many DAX functions have the same name and the same general behavior as Excel functions but have been modified to take different types of inputs, and in some cases, might return a different data type. Generally, you cannot use DAX functions in an Excel formula or use Excel formulas in Power Pivot without some modification.

  • DAX functions never take a cell reference or a range as reference, but instead DAX functions take a column or table as reference.

  • DAX date and time functions return a datetime data type. In contrast, Excel date and time functions return an integer that represents a date as a serial number.

  • Many of the new DAX functions either return a table of values or make calculations based on a table of values as input. In contrast, Excel has no functions that return a table, but some functions can work with arrays. The ability to easily reference complete tables and columns is a new feature in Power Pivot.

  • DAX provides new lookup functions that are similar to the array and vector lookup functions in Excel. However, the DAX functions require that a relationship is established between the tables.

  • The data in a column is expected to always be of the same data type. If the data is not the same type, DAX changes the entire column to the data type that best accommodates all values.

DAX Data Types

You can import data into a Power Pivot data model from many different data sources that might support different data types. When you import or load the data, and then use the data in calculations or in PivotTables, the data is converted to one of the Power Pivot data types. For a list of the data types, see Data types in Data Models.

The table data type is a new data type in DAX that is used as the input or output to many new functions. For example, the FILTER function takes a table as input and outputs another table that contains only the rows that meet the filter conditions. By combining table functions with aggregation functions, you can perform complex calculations over dynamically defined data sets. For more information, see Aggregations in Power Pivot.

Formulas and the Relational Model

The Power Pivot window is an area where you can work with multiple tables of data and connect the tables in a relational model. Within this data model, tables are connected to each other by relationships, which let you create correlations with columns in other tables and create more interesting calculations. For example, you can create formulas that sum values for a related table and then save that value in a single cell. Or, to control the rows from the related table, you can apply filters to tables and columns. For more information, see Relationships between tables in a Data Model.

Because you can link tables by using relationships, your PivotTables can also include data from multiple columns that are from different tables.

However, because formulas can work with entire tables and columns, you need to design calculations differently than you do in Excel.

  • In general, a DAX formula in a column is always applied to the entire set of values in the column (never to only a few rows or cells).

  • Tables in Power Pivot must always have the same number of columns in each row, and all rows in a column must contain the same data type.

  • When tables are connected by a relationship, you are expected to make sure that the two columns used as keys have values that match, for the most part. Because Power Pivot does not enforce referential integrity, it is possible to have non-matching values in a key column and still create a relationship. However, the presence of blank or non-matching values might affect the results of formulas and the appearance of PivotTables. For more information, see Lookups in Power Pivot Formulas.

  • When you link tables by using relationships, you enlarge the scope, or context in which your formulas are evaluated. For example, formulas in a PivotTable can be affected by any filters or column and row headings in the PivotTable. You can write formulas that manipulate context, but context can also cause your results to change in ways that you might not anticipate. For more information, see Context in DAX Formulas.

Updating the Results of Formulas

Data r efresh and recalculation are two separate but related operations that you should understand when designing a data model that contains complex formulas, large amounts of data, or data that is obtained from external data sources.

Refreshing data is the process of updating the data in your workbook with new data from an external data source. You can refresh data manually at intervals that you specify. Or, if you have published the workbook to a SharePoint site, you can schedule an automatic refresh from external sources.

Recalculation is the process of updating the results of formulas to reflect any changes to the formulas themselves and to reflect those changes in the underlying data. Recalculation can affect performance in the following ways:

  • For a calculated column, the result of the formula should always be recalculated for the entire column, whenever you change the formula.

  • For a measure, the results of a formula are not calculated until the measure is placed in the context of the PivotTable or PivotChart. The formula will also be recalculated when you change any row or column heading that affects filters on the data or when you manually refresh the PivotTable.

Troubleshooting Formulas

Errors when writing formulas

If you get an error when defining a formula, the formula might contain either a syntactic error, semantic error, or calculation error.

Syntactic errors are the easiest to resolve. They typically involve a missing parenthesis or comma. For help with the syntax of individual functions, see the DAX Function Reference.

The other type of error occurs when the syntax is correct, but the value or the column referenced does not make sense in the context of the formula. Such semantic and calculation errors might be caused by any of the following problems:

  • The formula refers to a non-existing column, table, or function.

  • The formula appears to be correct, but when the data engine fetches the data it finds a type mismatch, and raises an error.

  • The formula passes an incorrect number or type of parameters to a function.

  • The formula refers to a different column that has an error, and therefore its values are invalid.

  • The formula refers to a column that has not been processed, meaning it has metadata but no actual data to use for calculations.

In the first four cases, DAX flags the entire column that contains the invalid formula. In the last case, DAX grays out the column to indicate that the column is in an unprocessed state.

Incorrect or unusual results when ranking or ordering column values

When ranking or ordering a column that contains value NaN (Not a Number), you might get wrong or unexpected results. For example, when a calculation divides 0 by 0, an NaN result is returned.

This is because the formula engine performs ordering and ranking by comparing the numeric values; however, NaN cannot be compared to other numbers in the column.

To assure correct results, you can use conditional statements using IF function to test for NaN values and return a numeric 0 value.

Compatibility with Analysis Services Tabular Models and DirectQuery Mode

In general, DAX formulas that you build in Power Pivot are completely compatible with Analysis Services tabular models. However, if you migrate your Power Pivot model to an Analysis Services instance, and then deploy the model in DirectQuery mode, there are some limitations.

  • Some DAX formulas may return different results if you deploy the model in DirectQuery mode.

  • Some formulas might cause validation errors when you deploy the model to DirectQuery mode, because the formula contains a DAX function that is not supported against a relational data source.

For more information, see Analysis Services tabular modeling documentation in SQL Server 2012 BooksOnline.

No comments:

Post a Comment