Add more power to your data analysis by creating relationships amogn different tables. A relationship is a connection between two tables that contain data: one column in each table is the basis for the relationship. To see why relationships are useful, imagine that you track data for customer orders in your business. You could track all the data in a single table having a structure like this:
CustomerID | Name | | DiscountRate | OrderID | OrderDate | Product | Quantity |
---|---|---|---|---|---|---|---|
1 | Ashton | chris.ashton@contoso.com | .05 | 256 | 2010-01-07 | Compact Digital | 11 |
1 | Ashton | chris.ashton@contoso.com | .05 | 255 | 2010-01-03 | SLR Camera | 15 |
2 | Jaworski | michal.jaworski@contoso.com | .10 | 254 | 2010-01-03 | Budget Movie-Maker | 27 |
This approach can work, but it involves storing a lot of redundant data, such as the customer e-mail address for every order. Storage is cheap, but if the e-mail address changes you have to make sure you update every row for that customer. One solution to this problem is to split the data into multiple tables and define relationships between those tables. This is the approach used in relational databases like SQL Server. For example, a database that you import might represent order data by using three related tables:
Customers
[CustomerID] | Name | |
---|---|---|
1 | Ashton | chris.ashton@contoso.com |
2 | Jaworski | michal.jaworski@contoso.com |
CustomerDiscounts
[CustomerID] | DiscountRate |
---|---|
1 | .05 |
2 | .10 |
Orders
[CustomerID] | OrderID | OrderDate | Product | Quantity |
---|---|---|---|---|
1 | 256 | 2010-01-07 | Compact Digital | 11 |
1 | 255 | 2010-01-03 | SLR Camera | 15 |
2 | 254 | 2010-01-03 | Budget Movie-Maker | 27 |
Relationships exist within a Data Model—one that you explicitly create, or one that Excel automatically creates on your behalf when you simultaneously import multiple tables. You can also use the Power Pivot add-in to create or manage the model. See Create a Data Model in Excel for details.
If you use the Power Pivot add-in to import tables from the same database, Power Pivot can detect the relationships between the tables based on the columns that are in [brackets], and can reproduce these relationships in a Data Model that it builds behind the scenes. For more information, see Automatic Detection and Inference of Relationships in this article. If you import tables from multiple sources, you can manually create relationships as described in Create a relationship between two tables.
Columns and keys
Relationships are based on columns in each table that contain the same data. For example, you could relate a Customers table with an Orders table if each contains a column that stores a Customer ID. In the example, the column names are the same, but this is not a requirement. One could be CustomerID and another CustomerNumber, as long as all of the rows in the Orders table contain an ID that is also stored in the Customers table.
In a relational database, there are several types of keys. A key is typically column with special properties. Understanding the purpose of each key can help you manage a multi-table Data Model that provides data to a PivotTable, PivotChart, or Power View report.
Though there are many types of keys, these are the most important for our purpose here:
-
Primary key: uniquely identifies a row in a table, such as CustomerID in the Customers table.
-
Alternate key (or candidate key): a column other than the primary key that is unique. For example, an Employees table might store an employee ID and a social security number, both of which are unique.
-
Foreign key: a column that refers to a unique column in another table, such as CustomerID in the Orders table, which refers to CustomerID in the Customers table.
In a Data Model, the primary key or alternate key is referred to as the related column. If a table has both a primary and alternate key, you can use either one as the basis of a table relationship. The foreign key is referred to as the source column or just column. In our example, a relationship would be defined between CustomerID in the Orders table (the column) and CustomerID in the Customers table (the lookup column). If you import data from a relational database, by default Excel chooses the foreign key from one table and the corresponding primary key from the other table. However, you can use any column that has unique values for the lookup column.
Types of relationships
The relationship between a customer and an order is a one-to-many relationship. Every customer can have multiple orders, but an order can't have multiple customers. Another important table relationship is one-to-one. In our example here, the CustomerDiscounts table, which defines a single discount rate for each customer, has a one-to-one relationship with the Customers table.
This table shows the relationships between the three tables (Customers, CustomerDiscounts, and Orders):
Relationship | Type | Lookup Column | Column |
---|---|---|---|
Customers-CustomerDiscounts | one-to-one | Customers.CustomerID | CustomerDiscounts.CustomerID |
Customers-Orders | one-to-many | Customers.CustomerID | Orders.CustomerID |
Note: Many-to-many relationships are not supported in a Data Model. An example of a many-to-many relationship is a direct relationship between Products and Customers, in which a customer can buy many products and the same product can be bought by many customers.
Relationships and performance
After any relationship has been created, Excel must typically recalculate any formulas that use columns from tables in the newly created relationship. Processing can take some time, depending on the amount of data and the complexity of the relationships. For more details, see Recalculate Formulas.
Multiple relationships between tables
A Data Model can have multiple relationships between two tables. To build accurate calculations, Excel needs a single path from one table to the next. Therefore, only one relationship between each pair of tables is active at a time. Though the others are inactive, you can specify an inactive relationship in formulas and queries.
In Diagram View, the active relationship is a solid line and the inactive ones are dashed lines. For example, in AdventureWorksDW2012, the table DimDate contains a column, DateKey, that is related to three different columns in the table FactInternetSales: OrderDate, DueDate, and ShipDate. If the active relationship is between DateKey and OrderDate, that is the default relationship in formulas unless you specify otherwise.
Requirements for a table relationship
A relationship can be created when the following requirements are met:
Criteria | Description |
---|---|
Unique Identifier for Each Table | Each table must have a single column that uniquely identifies each row in that table. This column is often referred to as the primary key. |
Unique Lookup Columns | The data values in the lookup column must be unique. In other words, the column can't contain duplicates. In a Data Model, nulls and empty strings are equivalent to a blank, which is a distinct data value. This means that you can't have multiple nulls in the lookup column. |
Compatible Data Types | The data types in the source column and lookup column must be compatible. For more information about data types, see Data types supported in Data Models. |
Unsupported database features in the Excel Data Model
In a Data Model, you cannot create a table relationship if the key is a composite key. You're also restricted to creating one-to-one and one-to-many relationships. Other relationship types are not supported.
Composite Keys and Lookup Columns
A composite key is composed of more than one column. Data Models can't use composite keys: a table must always have exactly one column that uniquely identifies each row in the table. If you import tables that have an existing relationship based on a composite key, the Table Import Wizard in Power Pivot will ignore that relationship because it can't be created in the model.
To create a relationship between two tables that have multiple columns defining the primary and foreign keys, first combine the values to create a single key column before creating the relationship. You can do this before you import the data, or by creating a calculated column in the Data Model using the Power Pivot add-in.
Many-to-Many Relationships
A Data Model cannot have many-to-many relationships. You can't simply add junction tables in the model. However, you can use DAX functions to model many-to-many relationships.
Self-Joins and Loops
Self-joins are not permitted in a Data Model. A self-join is a recursive relationship between a table and itself. Self-joins are often used to define parent-child hierarchies. For example, you could join an Employees table to itself to produce a hierarchy that shows the management chain at a business.
Excel does not allow loops to be created among relationships in a workbook. In other words, the following set of relationships is prohibited.
Table 1, column a to Table 2, column f
Table 2, column f to Table 3, column n
Table 3, column n to Table 1, column a
If you try to create a relationship that would result in a loop being created, an error is generated.
Automatic detection and inference of relationships in Power Pivot
One of the advantages to importing data using the Power Pivot add-in is that Power Pivot can sometimes detect relationships and create new relationships in the Data Model it creates in Excel.
When you import multiple tables, Power Pivot automatically detects any existing relationships among the tables. Also, when you create a PivotTable, Power Pivot analyzes the data in the tables. It detects possible relationships that have not been defined, and suggests appropriate columns to include in those relationships.
The detection algorithm uses statistical data about the values and metadata of columns to make inferences about the probability of relationships.
-
Data types in all related columns should be compatible. For automatic detection, only whole number and text data types are supported. For more information about data types, see Data types supported in Data Models.
-
For the relationship to be successfully detected, the number of unique keys in the lookup column must be greater than the values in the table on the many side. In other words, the key column on the many side of the relationship must not contain any values that are not in the key column of the lookup table. For example, suppose you have a table that lists products with their IDs (the lookup table) and a sales table that lists sales for each product (the many side of the relationship). If your sales records contain the ID of a product that does not have a corresponding ID in the Products table, the relationship can't be automatically created, but you might be able to create it manually. To have Excel detect the relationship, you need to first update the Product lookup table with the IDs of the missing products.
-
Make sure the name of the key column on the many side is similar to the name of the key column in the lookup table. The names do not need to be exactly the same. For example, in a business setting, you often have variations on the names of columns that contain essentially the same data: Emp ID, EmployeeID, Employee ID, EMP_ID, and so on. The algorithm detects similar names and assigns a higher probability to those columns that have similar or exactly matching names. Therefore, to increase the probability of creating a relationship, you can try renaming the columns in the data that you import to something similar to columns in your existing tables. If Excel finds multiple possible relationships, then it does not create a relationship.
This information might help you understand why not all relationships are detected, or how changes in metadata--such as field name and the data types--could improve the results of automatic relationship detection. For more information, see Troubleshoot Relationships.
Automatic Detection for Named Sets
Relationships are not automatically detected between Named Sets and related fields in a PivotTable. You can create these relationships manually. If you want to use automatic relationship detection, remove each Named Set and add the individual fields from the Named Set directly to the PivotTable.
Inference of Relationships
In some cases, relationships between tables are automatically chained. For example, if you create a relationship between the first two sets of tables below, a relationship is inferred to exist between the other two tables, and a relationship is automatically established.
Products and Category -- created manually
Category and SubCategory -- created manually
Products and SubCategory -- relationship is inferred
In order for relationships to be automatically chained, the relationships must go in one direction, as shown above. If the initial relationships were between, for example, Sales and Products, and Sales and Customers, a relationship is not inferred. This is because the relationship between Products and Customers is a many-to-many relationship.
No comments:
Post a Comment