Wednesday, September 13, 2017

Relationships between tables in a Data Model

Relationships between tables in a Data Model

Your browser does not support video. Install Microsoft Silverlight, Adobe Flash Player, or Internet Explorer 9.
Video: Relationships in Power View and Power Pivot

Add more power to your data analysis by creating relationships between data in different tables. A relationship is a connection between two tables of data, based on one column in each. To see why relationships are useful, imagine that you track data for customer orders in your business. You could track all the data in a single table that has a structure like the following:

CustomerID

Name

EMail

DiscountRate

OrderID

OrderDate

Product

Quantity

1

Ashton

chris.ashton@contoso.com

.05

256

2010-01-07

Compact Digital

11

1

Ashton

chris.ashton@contoso.com

.05

255

2010-01-03

SLR Camera

15

2

Jaworski

michal.jaworski@contoso.com

.10

254

2010-01-03

Budget Movie-Maker

27

This approach can work, but it involves storing a lot of redundant data, such as the customer e-mail address for every order. Storage is cheap, but if the e-mail address changes you have to make sure you update every row for that customer. One solution to this problem is to split the data into multiple tables and define relationships between those tables. This is the approach used in relational databases like SQL Server. For example, a database that you import might represent order data by using three related tables:

Customers

[CustomerID]

Name

Email

1

Ashton

chris.ashton@contoso.com

2

Jaworski

michal.jaworski@contoso.com

CustomerDiscounts

[CustomerID]

DiscountRate

1

.05

2

.10

Orders

[CustomerID]

OrderID

OrderDate

Product

Quantity

1

256

2010-01-07

Compact Digital

11

1

255

2010-01-03

SLR Camera

15

2

254

2010-01-03

Budget Movie-Maker

27

Relationships exist within a Data Model that you explicitly create, or that Excel creates on your behalf when you import multiple tables simultaneously. You can also use the Power Pivot add-in to create or manage the model. See Create a Data Model in Excel for details.

If you use the Power Pivot add-in to import tables from the same database, Power Pivot can detect the relationships between the tables based on the columns that are in [brackets], and can reproduce these relationships in a Data Model that it builds behind the scenes. For more information, see Automatic Detection and Inference of Relationships in this article. If you import tables from multiple sources, you can manually create relationships as described in Create a relationship between two tables.

Top of Page

In this article

Columns and keys

Types of relationships

Relationships and performance

Multiple relationships between tables

Requirements for a table relationship

Unsupported in a table relationship

Composite Keys and Lookup Columns

Many-to-Many Relationships

Self-Joins and Loops

Automatic detection and inference of relationships in Power Pivot

Automatic Detection for Named Sets

Inference of Relationships

Columns and keys

Relationships are based on columns in each table that contain the same data. For example, the Customers and Orders tables can be related to each other because they both contain a column that stores a customer ID. In the example, the column names are the same, but this is not a requirement. One could be CustomerID and another CustomerNumber, as long as all of the rows in the Orders table contain an ID that is also stored in the Customers table.

In a relational database, there are several types of keys, which are typically just columns with special properties. Understanding the purpose of each key can help you manage a multi-table Data Model that provides data to a PivotTable, PivotChart, or Power View report.

The following keys are the most interesting for our purposes:

  • Primary key: uniquely identifies a row in a table, such as CustomerID in the Customers table.

  • Alternate key (or candidate key): a column other than the primary key that is unique. For example, an Employees table might store an employee ID and a social security number, both of which are unique.

  • Foreign key: a column that refers to a unique column in another table, such as CustomerID in the Orders table, which refers to CustomerID in the Customers table.

In a Data Model, the primary key or alternate key is referred to as the related column. If a table has both a primary and alternate key, you can use either one as the basis of a table relationship. The foreign key is referred to as the source column or just column. In our example, a relationship would be defined between CustomerID in the Orders table (the column) and CustomerID in the Customers table (the lookup column). If you import data from a relational database, by default Excel chooses the foreign key from one table and the corresponding primary key from the other table. However, you can use any column that has unique values for the lookup column.

Types of relationships

The relationship between Customers and Orders is a one-to-many relationship. Every customer can have multiple orders, but an order can't have multiple customers. The other types of relationships are one-to-one and many-to-many. The CustomerDiscounts table, which defines a single discount rate for each customer, is in a one-to-one relationship with the Customers table.

The following table shows the relationships between the three tables:

Relationship

Type

Lookup Column

Column

Customers-CustomerDiscounts

one-to-one

Customers.CustomerID

CustomerDiscounts.CustomerID

Customers-Orders

one-to-many

Customers.CustomerID

Orders.CustomerID

Note:  Many-to-many relationships are not supported in a Data Model. An example of a many-to-many relationship is a direct relationship between Products and Customers, in which a customer can buy many products and the same product can be bought by many customers.

Relationships and performance

After any relationship has been created, Excel typically must recalculate any formulas that use columns from tables in the newly created relationship. Processing can take some time, depending on the amount of data and the complexity of the relationships. See Recalculate Formulas for details.

Multiple relationships between tables

A Data Model can have multiple relationships between two tables. To create accurate calculations, Excel needs a single path from one table to the next. Therefore, only one relationship between each pair of tables is active at a time. The others are inactive, but you can specify an inactive relationship in formulas and queries. In Diagram View the active relationship is a solid line and the inactive ones are dashed lines. For example, in AdventureWorksDW2012, the table DimDate contains a column, DateKey, that is related to three different columns in the table FactInternetSales: OrderDate, DueDate, and ShipDate. If the active relationship is between DateKey and OrderDate, that is the default relationship in formulas unless you specify otherwise.

Top of Page

Requirements for a table relationship

A relationship can be created when the following requirements are met:

Criteria

Description

Unique Identifier for Each Table

Each table must have a single column that uniquely identifies each row in that table. This column is often referred to as the primary key.

Unique Lookup Columns

The data values in the lookup column must be unique. In other words, the column can't contain duplicates. In a Data Model, nulls and empty strings are equivalent to a blank, which is a distinct data value. This means that you can't have multiple nulls in the lookup column.

Compatible Data Types

The data types in the source column and lookup column must be compatible. For more information about data types, see Data types supported in Data Models.

Unsupported in a table relationship

In a Data Model, you cannot create a table relationship if the key is a composite key. You're also restricted to creating one-to-one and one-to-many relationships. Other relationship types are not supported.

Composite Keys and Lookup Columns

A composite key is composed of more than one column. Data Models can't use composite keys; a table must always have exactly one column that uniquely identifies each row in the table. If you import tables that have an existing relationship based on a composite key, the Table Import Wizard in Power Pivot will ignore that relationship because it can't be created in the model.

To create a relationship between two tables that have multiple columns defining the primary and foreign keys, first combine the values to create a single key column before creating the relationship. You can do this before you import the data, or by creating a calculated column in the Data Model using the Power Pivot add-in.

Many-to-Many Relationships

A Data Model cannot have many-to-many relationships. You can't simply add junction tables in the model. However, you can use DAX functions to model many-to-many relationships.

Self-Joins and Loops

Self-joins are not permitted in a Data Model. A self-join is a recursive relationship between a table and itself. Self-joins are often used to define parent-child hierarchies. For example, you could join an Employees table to itself to produce a hierarchy that shows the management chain at a business.

Excel does not allow loops to be created among relationships in a workbook. In other words, the following set of relationships is prohibited.

  • Table 1, column a   to   Table 2, column f

  • Table 2, column f   to   Table 3, column n

  • Table 3, column n   to   Table 1, column a

If you try to create a relationship that would result in a loop being created, an error is generated.

Top of Page

Automatic detection and inference of relationships in Power Pivot

One of the advantages to importing data using the Power Pivot add-in is that Power Pivot can sometimes detect relationships and create new relationships in the Data Model it creates in Excel.

When you import multiple tables, Power Pivot automatically detects any existing relationships among the tables. Also, when you create a PivotTable, Power Pivot analyzes the data in the tables. It detects possible relationships that have not been defined, and suggests appropriate columns to include in those relationships.

The detection algorithm uses statistical data about the values and metadata of columns to make inferences about the probability of relationships.

  • Data types in all related columns should be compatible. For automatic detection, only whole number and text data types are supported. For more information about data types, see Data types supported in Data Models.

  • For the relationship to be successfully detected, the number of unique keys in the lookup column must be greater than the values in the table on the many side. In other words, the key column on the many side of the relationship must not contain any values that are not in the key column of the lookup table. For example, suppose you have a table that lists products with their IDs (the lookup table) and a sales table that lists sales for each product (the many side of the relationship). If your sales records contain the ID of a product that does not have a corresponding ID in the Products table, the relationship can't be automatically created, but you might be able to create it manually. To have Excel detect the relationship, you need to first update the Product lookup table with the IDs of the missing products.

  • Make sure the name of the key column on the many side is similar to the name of the key column in the lookup table. The names do not need to be exactly the same. For example, in a business setting, you often have variations on the names of columns that contain essentially the same data: Emp ID, EmployeeID, Employee ID, EMP_ID, and so forth. The algorithm detects similar names and assigns a higher probability to those columns that have similar or exactly matching names. Therefore, to increase the probability of creating a relationship, you can try renaming the columns in the data that you import to something similar to columns in your existing tables. If Excel finds multiple possible relationships, then it does not create a relationship.

This information might help you understand why not all relationships are detected, or how changes in metadata--such as field name and the data types--could improve the results of automatic relationship detection. For more information, see Troubleshoot Relationships.

Automatic Detection for Named Sets

Relationships are not automatically detected between Named Sets and related fields in a PivotTable. You can create these relationships manually. If you want to use automatic relationship detection, remove each Named Set and add the individual fields from the Named Set directly to the PivotTable.

Inference of Relationships

In some cases, relationships between tables are automatically chained. For example, if you create a relationship between the first two sets of tables below, a relationship is inferred to exist between the other two tables, and a relationship is automatically established.

  • Products and Category -- created manually

  • Category and SubCategory -- created manually

  • Products and SubCategory -- relationship is inferred

In order for relationships to be automatically chained, the relationships must go in one direction, as shown above. If the initial relationships were between, for example, Sales and Products, and Sales and Customers, a relationship is not inferred. This is because the relationship between Products and Customers is a many-to-many relationship.

Top of Page

1 comment: