Thursday, February 15, 2018

Data Analysis

Data Analysis

  • If the Data Analysis dialog box is displayed, click the tool that you want to use under Analysis Tools, and then click OK.

  • Enter the appropriate data and click the options in the dialog box for the tool that you chose, and then click OK.

    For a description of each tool and information about how to use the dialog box for each tool, click a tool name in the following list:

    Anova

    The Anova analysis tools provide different types of variance analysis. The tool that you should use depends on the number of factors and the number of samples that you have from the populations that you want to test.

    Anova: Single Factor

    This tool performs a simple analysis of variance on data for two or more samples. The analysis provides a test of the hypothesis that each sample is drawn from the same underlying probability distribution against the alternative hypothesis that underlying probability distributions are not the same for all samples. If there are only two samples, you can use the worksheet function TTEST. With more than two samples, there is no convenient generalization of TTEST, and the Single Factor Anova model can be called upon instead.

    Anova: Single Factor dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

    Grouped by      To indicate whether the data in the input range is arranged in rows or in columns, click Rows or Columns.

    Labels in First Row/Labels in First Column     If the first row of your input range contains labels, select the Labels in First Row check box. If the labels are in the first column of your input range, select the Labels in First Column check box. This check box is clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the level at which you want to evaluate the critical values for the F statistic. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data or extend beyond the bounds of the worksheet.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook, and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Anova: Two-Factor with Replication

    This analysis tool is useful when data can be classified along two different dimensions. For example, in an experiment to measure the height of plants, the plants may be given different brands of fertilizer (for example, A, B, C) and might also be kept at different temperatures (for example, low, high). For each of the six possible pairs of {fertilizer, temperature}, we have an equal number of observations of plant height. Using this Anova tool, we can test:

    • Whether the heights of plants for the different fertilizer brands are drawn from the same underlying population. Temperatures are ignored for this analysis.

    • Whether the heights of plants for the different temperature levels are drawn from the same underlying population. Fertilizer brands are ignored for this analysis.

      Whether having accounted for the effects of differences between fertilizer brands found in the first bulleted point and differences in temperatures found in the second bulleted point, the six samples representing all pairs of {fertilizer, temperature} values are drawn from the same population. The alternative hypothesis is that there are effects due to specific {fertilizer, temperature} pairs over and above the differences that are based on fertilizer alone or on temperature alone.

      input range setup for anova tool

      Anova: Two-Factor With Replication dialog box

      Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

      Rows per sample      Enter the number of rows contained in each sample. Each sample must contain the same number of rows, because each row represents a replication of the data.

      Alpha      Enter the level at which you want to evaluate critical values for the F statistic. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

      Output Range      Enter the reference for the upper-left cell of the output table. Microsoft Office Excel automatically determines the size of the output area and displays a message if the output table will replace existing data or extend beyond the boundaries of the worksheet.

      New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

      New Workbook      Click to create a new workbook in which results are added to a new worksheet.

      Anova: Two-Factor Without Replication

      This analysis tool is useful when data is classified on two different dimensions as in the Two-Factor case With Replication. However, for this tool it is assumed that there is only a single observation for each pair (for example, each {fertilizer, temperature} pair in the preceding example).

      Anova: Two-Factor Without Replication dialog box

      Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

      Labels      This check box is clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

      Alpha      Enter the level at which you want to evaluate critical values for the F statistic. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

      Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data or extend beyond the boundaries of the worksheet.

      New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

      New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Correlation

    The CORREL and PEARSON worksheet functions both calculate the correlation coefficient between two measurement variables when measurements on each variable are observed for each of N subjects. (Any missing observation for any subject causes that subject to be ignored in the analysis.) The Correlation analysis tool is particularly useful when there are more than two measurement variables for each of N subjects. It provides an output table, a correlation matrix, that shows the value of CORREL (or PEARSON) applied to each possible pair of measurement variables.

    The correlation coefficient, like the covariance, is a measure of the extent to which two measurement variables "vary together." Unlike the covariance, the correlation coefficient is scaled so that its value is independent of the units in which the two measurement variables are expressed. (For example, if the two measurement variables are weight and height, the value of the correlation coefficient is unchanged if weight is converted from pounds to kilograms.) The value of any correlation coefficient must be between -1 and +1 inclusive.

    You can use the correlation analysis tool to examine each pair of measurement variables to determine whether the two measurement variables tend to move together — that is, whether large values of one variable tend to be associated with large values of the other (positive correlation), whether small values of one variable tend to be associated with large values of the other (negative correlation), or whether values of both variables tend to be unrelated (correlation near 0 (zero)).

    Correlation dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

    Grouped By      To indicate whether the data in the input range is arranged in rows or in columns, click Rows or Columns.

    Labels in First Row/Labels in First Column     If the first row of your input range contains labels, select the Labels in First Row check box. If the labels are in the first column of your input range, select the Labels in First Column check box. This check box is clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel populates only half of the table, because correlation between two ranges of data is independent of the order in which the ranges are processed. Cells in the output table with matching row and column coordinates contain the value 1, because each data set correlates exactly with itself.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Covariance

    The Correlation and Covariance tools can both be used in the same setting, when you have N different measurement variables observed on a set of individuals. The Correlation and Covariance tools each give an output table, a matrix, that shows the correlation coefficient or covariance, respectively, between each pair of measurement variables. The difference is that correlation coefficients are scaled to lie between -1 and +1 inclusive. Corresponding covariances are not scaled. Both the correlation coefficient and the covariance are measures of the extent to which two variables "vary together."

    The Covariance tool computes the value of the worksheet function COVAR for each pair of measurement variables. (Direct use of COVAR rather than the Covariance tool is a reasonable alternative when there are only two measurement variables, that is, N=2.) The entry on the diagonal of the Covariance tool's output table in row i, column i is the covariance of the i-th measurement variable with itself. This is just the population variance for that variable, as calculated by the worksheet function VARP.

    You can use the covariance tool to examine each pair of measurement variables to determine whether the two measurement variables tend to move together — that is, whether large values of one variable tend to be associated with large values of the other (positive covariance), whether small values of one variable tend to be associated with large values of the other (negative covariance), or whether values of both variables tend to be unrelated (covariance near 0 (zero)).

    Covariance dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

    Grouped By      To indicate whether the data in the input range is arranged in rows or columns, click Rows or Columns.

    Labels in First Row/Labels in First Column     If the first row of your input range contains labels, select the Labels in First Row check box. If the labels are in the first column of your input range, select the Labels in First Column check box. This check box is clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel populates only half of the table, because covariance between two ranges of data is independent of the order in which the ranges are processed. The table's diagonal contains the variance of each range.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Descriptive Statistics

    The Descriptive Statistics analysis tool generates a report of univariate statistics for data in the input range, providing information about the central tendency and variability of your data.

    Descriptive Statistics dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze. The reference must consist of two or more adjacent ranges of data arranged in columns or rows.

    Grouped By      To indicate whether the data in the input range is arranged in rows or in columns, click Rows or Columns.

    Labels in First Row/Labels in First Column     If the first row of your input range contains labels, select the Labels in First Row check box. If the labels are in the first column of your input range, select the Labels in First Column check box. This check box is clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Confidence Level for Mean      Select if you want to include a row in the output table for the confidence level of the mean. In the box, enter the confidence level that you want to use. For example, a value of 95 percent calculates the confidence level of the mean at a significance of 5 percent.

    Kth Largest      Select if you want to include a row in the output table for the kth largest value for each range of data. In the box, enter the number to use for k. If you enter 1, this row contains the maximum of the data set.

    Kth Smallest      Select if you want to include a row in the output table for the kth smallest value for each range of data. In the box, enter the number to use for k. If you enter 1, this row contains the minimum of the data set.

    Output Range      Enter the reference for the upper-left cell of the output table. This tool produces two columns of information for each data set. The left column contains statistics labels, and the right column contains the statistics. Excel writes a two-column table of statistics for each column or row in the input range, depending on the Grouped By option that you select.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Summary statistics      Select if you want Excel to produce one field for each of the following statistics in the output table: Mean, Standard Error (of the mean), Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count, Largest (#), Smallest (#), and Confidence Level.

    Exponential Smoothing

    The Exponential Smoothing analysis tool predicts a value that is based on the forecast for the prior period, adjusted for the error in that prior forecast. The tool uses the smoothing constant a, the magnitude of which determines how strongly the forecasts respond to errors in the prior forecast.

    Note: Values of 0.2 to 0.3 are reasonable smoothing constants. These values indicate that the current forecast should be adjusted 20 percent to 30 percent for error in the prior forecast. Larger constants yield a faster response but can produce erratic projections. Smaller constants can result in long lags for forecast values.

    Exponential Smoothing dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze. The range must contain a single column or row with four or more cells of data.

    Damping factor      Enter the damping factor that you want to use as the exponential smoothing constant. The damping factor is a corrective factor that minimizes the instability of data that is collected across a population. The default damping factor is 0.3.

    Note: Values of 0.2 to 0.3 are reasonable smoothing constants. These values indicate that the current forecast should be adjusted 20 percent to 30 percent for error in the prior forecast. Larger constants yield a faster response but can produce erratic projections. Smaller constants can result in long lags for forecast values.

    Labels      Select if the first row and column of your input range contain labels. Clear this check box if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. If you select the Standard Errors check box, Excel generates a two-column output table with standard error values in the right column. If there are insufficient historical values to project a forecast or calculate a standard error, Excel returns the #N/A error value.

    Note: The output range must be on the same worksheet as the data that is used in the input range. For this reason, the New Worksheet Ply and New Workbook options are unavailable.

    Chart Output      Select to generate an embedded chart for the actual and forecast values in the output table.

    Standard Errors      Select if you want to include a column that contains standard error values in the output table. Clear if you want a single-column output table without standard error values.

    F-Test Two-Sample for Variances

    The F-Test Two-Sample for Variances analysis tool performs a two-sample F-test to compare two population variances.

    For example, you can use the F-Test tool on samples of times in a swim meet for each of two teams. The tool provides the result of a test of the null hypothesis that these two samples come from distributions with equal variances, against the alternative that the variances are not equal in the underlying distributions.

    The tool calculates the value f of an F-statistic (or F-ratio). A value of f close to 1 provides evidence that the underlying population variances are equal. In the output table, if f < 1 "P(F <= f) one-tail" gives the probability of observing a value of the F-statistic less than f when population variances are equal, and "F Critical one-tail" gives the critical value less than 1 for the chosen significance level, Alpha. If f > 1, "P(F <= f) one-tail" gives the probability of observing a value of the F-statistic greater than f when population variances are equal, and "F Critical one-tail" gives the critical value greater than 1 for Alpha.

    F-Test Two-Sample for Variances dialog box

    Variable 1 Range      Enter the reference for the first column or row of data that you want to analyze.

    Variable 2 Range      Enter the reference for the second column or row of data that you want to analyze.

    Labels      Select if the first row or column of your input range contains labels. Clear this check box if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the confidence level for the test. This value must be in the range 0...1. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Fourier Analysis

    The Fourier Analysis tool solves problems in linear systems and analyzes periodic data by using the Fast Fourier Transform (FFT) method to transform data. This tool also supports inverse transformations, in which the inverse of transformed data returns the original data.

    Input and output ranges for Fourier analysis

    Fourier Analysis dialog box

    Input Range      Enter the reference for the range of real or complex data that you want to transform. Complex data must be in either x+yi or x+yj format. The number of input range values must be an even power of 2. If x is a negative number, precede it with an apostrophe ( ' ). The maximum number of values is 4096.

    Labels in First Row      Select if the first row of your input range contains labels. Clear if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Inverse      If this check box is selected, the data in the input range is considered transformed, and an inverse transformation is performed, returning the original inputs. If this check box is cleared, the data in the input range is transformed in the output table.

    Histogram

    The Histogram analysis tool calculates individual and cumulative frequencies for a cell range of data and data bins. This tool generates data for the number of occurrences of a value in a data set.

    For example, in a class of 20 students, you can determine the distribution of scores in letter-grade categories. A histogram table presents the letter-grade boundaries and the number of scores between the lowest bound and the current bound. The single most-frequent score is the mode of the data.

    Histogram dialog box

    Input Range      Enter the cell reference for the range of data that you want to analyze.

    Bin Range (optional)      Enter the cell reference to a range that contains an optional set of boundary values that define bin ranges. These values should be in ascending order. Microsoft Office Excel counts the number of data points between the current bin number and the adjoining higher bin, if any. A number is counted in a particular bin if it is equal to or less than the bin number down to the last bin. All values below the first bin value are counted together, as are the values above the last bin value.

    If you omit the bin range, Excel creates a set of evenly distributed bins between the data's minimum and maximum values.

    Labels      Select if the first row or column of your input range contains labels. Clear this check box if your input range has no labels. Excel generates the appropriate data labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Pareto (sorted histogram)      Select to present data in the output table in descending order of frequency. If this check box is cleared, Excel presents the data in ascending order and omits the three rightmost columns that contain the sorted data.

    Cumulative Percentage      Select to generate an output table column for cumulative percentages and to include a cumulative percentage line in the histogram chart. Clear to omit the cumulative percentages.

    Chart Output      Select to generate an embedded histogram chart with the output table.

    Moving Average

    The Moving Average analysis tool projects values in the forecast period, based on the average value of the variable over a specific number of preceding periods. A moving average provides trend information that a simple average of all historical data would mask. Use this tool to forecast sales, inventory, or other trends. Each forecast value is based on the following formula.

    Formula to calculate moving averages

    where:

    • N is the number of prior periods to include in the moving average

    • Aj is the actual value at time j

    • Fj is the forecasted value at time j

      Moving Average dialog box

      Input Range      Enter the cell reference for the range of data that you want to analyze. The range must consist of a single column that contains four or more cells of data.

      Labels in First Row      Select if the first row of your input range contains labels. Clear this check box if your input range has no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

      Interval      Enter the number of values that you want to include in the moving average. The default interval is 3.

      Output Range      Enter the reference for the upper-left cell of the output table. If you select the Standard Errors check box, Excel generates a two-column output table with standard error values in the right column. If there are insufficient historical values to project a forecast or calculate a standard error, Excel returns the #N/A error value.

      The output range must be on the same worksheet as the data that is used in the input range. For this reason, the New Worksheet Ply and New Workbook options are unavailable.

      Chart Output      Select to generate an embedded histogram chart with the output table.

      Standard Errors      Select if you want to include a column that contains standard error values in the output table. Clear if you want a single-column output table without standard error values.

    Random Number Generation

    The Random Number Generation analysis tool fills a range with independent random numbers that are drawn from one of several distributions. You can characterize the subjects in a population with a probability distribution. For example, you can use a normal distribution to characterize the population of individuals' heights, or you can use a Bernoulli distribution of two possible outcomes to characterize the population of coin-flip results.

    Random Number Generation dialog box

    Number of Variables      Enter the number of columns of values that you want in the output table. If you do not enter a number, Microsoft Office Excel fills all columns in the output range that you specify.

    Number of Random Numbers      Enter the number of data points that you want to see. Each data point appears in a row of the output table. If you do not enter a number, Excel fills all rows in the output range that you specify.

    Distribution      Click the distribution method that you want to use to create random values.

    Uniform      This is characterized by lower and upper bounds. Variables are drawn with equal probability from all values in the range. A common application uses a uniform distribution in the range 0...1.

    Normal      This is characterized by a mean and a standard deviation. A common application uses a mean of 0 and a standard deviation of 1 for the standard normal distribution.

    Bernoulli      This is characterized by a probability of success (p value) on a given trial. Bernoulli random variables have the value 0 or 1. For example, you can draw a uniform random variable in the range 0...1. If the variable is less than or equal to the probability of success, the Bernoulli random variable is assigned the value 1; otherwise, it is assigned the value 0.

    Binomial      This is characterized by a probability of success (p value) for a number of trials. For example, you can generate number-of-trials Bernoulli random variables, the sum of which is a binomial random variable.

    Poisson      This is characterized by a value lambda, equal to 1/mean. Poisson distribution is often used to characterize the number of events that occur per unit of time — for example, the average rate at which cars arrive at a toll plaza.

    Patterned      This is characterized by a lower bound and an upper bound, a step, repetition rate for values, and a repetition rate for the sequence.

    Discrete      This is characterized by a value and the associated probability range. The range must contain two columns: The left column contains values, and the right column contains probabilities that are associated with the value in that row. The sum of the probabilities must be 1.

    Parameters      Enter values to characterize the distribution that is selected.

    Random Seed      Enter an optional value from which to generate random numbers. You can reuse this value later to produce the same random numbers.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Rank and Percentile

    The Rank and Percentile analysis tool produces a table that contains the ordinal and percentage rank of each value in a data set. You can analyze the relative standing of values in a data set. This tool uses the worksheet functions RANK and PERCENTRANK. RANK does not account for tied values. If you want to account for tied values, use the worksheet function RANK together with the correction factor that is suggested in the Help file for RANK.

    Rank and Percentile dialog box

    Input Range      Enter the cell reference for the range of worksheet data that you want to analyze.

    Grouped By      To indicate whether the data in the input range is arranged in rows or columns, click Rows or Columns.

    Labels in First Row/Labels in First Column     If you click Columns and the first row of your input range contains labels, select the Labels in First Row check box. If you click Rows and the first column of your input range contains labels, select the Labels in First Column check box. Clear the check box if your input range has no labels. Microsoft Office Excel generates the appropriate row and column labels for the output table.

    Output Range      Enter the reference for the upper-left cell of the output table. Excel generates one output table for each data set in the input range. Each output table contains four columns: the data point number, the data point value, the data point rank, and the data point percentage rank, sorted in ascending rank order.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Regression

    The Regression analysis tool performs linear regression analysis by using the "least squares" method to fit a line through a set of observations. You can analyze how a single dependent variable is affected by the values of one or more independent variables. For example, you can analyze how an athlete's performance is affected by such factors as age, height, and weight. You can apportion shares in the performance measure to each of these three factors, based on a set of performance data, and then use the results to predict the performance of a new, untested athlete.

    The Regression tool uses the worksheet function LINEST.

    Regression dialog box

    Input Y Range      Enter the reference for the range of dependent data. The range must consist of a single column of data.

    Input X Range      Enter the reference for the range of independent data. Microsoft Office Excel orders independent variables from this range in ascending order from left to right. The maximum number of independent variables is 16.

    Labels      Select if the first row or column of your input range or ranges contains labels. Clear if your input has no labels. Excel generates the appropriate data labels for the output table.

    Confidence Level      Select to include an additional level in the summary output table. In the box, enter the confidence level that you want to be applied, in addition to the default 95 percent level.

    Constant is Zero      Select to force the regression line to pass through the origin.

    Output Range      Enter the reference for the upper-left cell of the output table. Allow at least seven columns for the summary output table, which includes an Anova table, coefficients, standard error of y estimate, r2 values, number of observations, and standard error of coefficients.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Residuals      Select to include residuals in the residuals output table.

    Standardized Residuals      Select to include standardized residuals in the residuals output table.

    Residual Plots      Select to generate a chart for each independent variable versus the residual.

    Line Fit Plots      Select to generate a chart for predicted values versus the observed values.

    Normal Probability Plots      Select to generate a chart that plots normal probability.

    Sampling

    The Sampling analysis tool creates a sample from a population by treating the input range as a population. When the population is too large to process or chart, you can use a representative sample. You can also create a sample that contains only the values from a particular part of a cycle if you believe that the input data is periodic. For example, if the input range contains quarterly sales figures, sampling with a periodic rate of four places the values from the same quarter in the output range.

    Sampling dialog box

    Input Range      Enter the references for the range of data that contains the population of values that you want to sample. Microsoft Office Excel draws samples from the first column, then the second column, and so on.

    Labels      Select if the first row or column of your input range contains labels. Clear if your input range has no labels. Excel generates the appropriate data labels for the output table.

    Sampling Method      Click Periodic or Random to indicate the sampling interval that you want.

    Period      Enter the periodic interval at which you want the sampling to take place. The period-th value in the input range and every period-th value thereafter are copied to the output column. Sampling stops when the end of the input range is reached.

    Number of Samples      Enter the number of random values that you want in the output column. Each value is drawn from a random position in the input range, and any number can be selected more than once.

    Output Range      Enter the reference for the upper-left cell of the output table. Data is written in a single column below the cell. If you select Periodic, the number of values in the output table is equal to the number of values in the input range, divided by the sampling rate. If you select Random, the number of values in the output table is equal to the number of samples.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    t-Test

    The Two-Sample t-Test analysis tools test for equality of the population means that underlie each sample. The three tools employ different assumptions: that the population variances are equal, that the population variances are not equal, and that the two samples represent before-treatment and after-treatment observations on the same subjects.

    For all three tools below, a t-Statistic value, t, is computed and shown as "t Stat" in the output tables. Depending on the data, this value, t, can be negative or nonnegative. Under the assumption of equal underlying population means, if t < 0, "P(T <= t) one-tail" gives the probability that a value of the t-Statistic would be observed that is more negative than t. If t >=0, "P(T <= t) one-tail" gives the probability that a value of the t-Statistic would be observed that is more positive than t. "t Critical one-tail" gives the cutoff value, so that the probability of observing a value of the t-Statistic greater than or equal to "t Critical one-tail" is Alpha.

    "P(T <= t) two-tail" gives the probability that a value of the t-Statistic would be observed that is larger in absolute value than t. "P Critical two-tail" gives the cutoff value, so that the probability of an observed t-Statistic larger in absolute value than "P Critical two-tail" is Alpha.

    t-Test: Paired Two Sample For Means

    You can use a paired test when there is a natural pairing of observations in the samples, such as when a sample group is tested twice — before and after an experiment. This analysis tool and its formula perform a paired two-sample Student's t-Test to determine whether observations that are taken before a treatment and observations taken after a treatment are likely to have come from distributions with equal population means. This t-test form does not assume that the variances of both populations are equal.

    Note: Among the results that are generated by this tool is pooled variance, an accumulated measure of the spread of data about the mean, which is derived from the following formula.

    Formula to calculate pooled variance

    t-Test: Paired Two Sample for Means dialog box

    Variable 1 Range      Enter the cell reference for the first range of data that you want to analyze. The range must consist of a single column or row and contain the same number of data points as the second range.

    Variable 2 Range      Enter the cell reference for the second range of data that you want to analyze. The range must consist of a single column or row and contain the same number of data points as the first range.

    Hypothesized Mean Difference      Enter the number that you want for the shift in sample means. A value of 0 (zero) indicates that the sample means are hypothesized to be equal.

    Labels      Select if the first row or column of your input ranges contains labels. Clear this check box if your input ranges have no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the confidence level for the test. This value must be in the range 0...1. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    t-Test: Two-Sample Assuming Equal Variances

    This analysis tool performs a two-sample student's t-test. This t-test form assumes that the two data sets came from distributions with the same variances. It is referred to as a homoscedastic t-test. You can use this t-test to determine whether the two samples are likely to have come from distributions with equal population means.

    t-Test: Two-Sample Assuming Equal Variances dialog box

    Variable 1 Range      Enter the cell reference for the first range of data that you want to analyze. The range must consist of a single column or row of data.

    Variable 2 Range      Enter the cell reference for the second range of data that you want to analyze. The range must consist of a single column or row of data.

    Hypothesized Mean Difference      Enter the number that that you want for the shift in sample means. A value of 0 (zero) indicates that the sample means are hypothesized to be equal.

    Labels      Select if the first row or column of your input ranges contains labels. Clear this check box if your input ranges have no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the confidence level for the test. This value must be in the range 0...1. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    t-Test: Two-Sample Assuming Unequal Variances

    This analysis tool performs a two-sample student's t-test. This t-test form assumes that the two data sets came from distributions with unequal variances. It is referred to as a heteroscedastic t-test. As with the preceding Equal Variances case, you can use this t-test to determine whether the two samples are likely to have come from distributions with equal population means. Use this test when there are distinct subjects in the two samples. Use the Paired test, described in the follow example, when there is a single set of subjects and the two samples represent measurements for each subject before and after a treatment.

    The following formula is used to determine the statistic value t.

    Formula to calculate value t

    The following formula is used to calculate the degrees of freedom, df. Because the result of the calculation is usually not an integer, the value of df is rounded to the nearest integer to obtain a critical value from the t table. The Excel worksheet function TTEST uses the calculated df value without rounding, because it is possible to compute a value for TTEST with a noninteger df. Because of these different approaches to determining the degrees of freedom, the results of TTEST and this t-Test tool will differ in the Unequal Variances case.

    Formula to approximate degrees of freedom

    t-Test: Two-Sample Assuming Unequal Variances dialog box

    Variable 1 Range      Enter the cell reference for the first range of data that you want to analyze. The range must consist of a single column or row of data.

    Variable 2 Range      Enter the cell reference for the second range of data that you want to analyze. The range must consist of a single column or row of data.

    Hypothesized Mean Difference      Enter the number that you want for the shift in sample means. A value of 0 (zero) indicates that the sample means are hypothesized to be equal.

    Labels      Select if the first row or column of your input ranges contains labels. Clear this check box if your input ranges have no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the confidence level for the test. This value must be in the range 0...1. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    z-Test

    The z-Test: Two Sample for Means analysis tool performs a two sample z-Test for means with known variances. This tool is used to test the null hypothesis that there is no difference between two population means against either one-sided or two-sided alternative hypotheses. If variances are not known, the worksheet function ZTEST should be used instead.

    When you use the z-Test tool, be careful to understand the output. "P(Z <= z) one-tail" is really P(Z >= ABS(z)), the probability of a z-value further from 0 in the same direction as the observed z value when there is no difference between the population means. "P(Z <= z) two-tail" is really P(Z >= ABS(z) or Z <= -ABS(z)), the probability of a z-value further from 0 in either direction than the observed z-value when there is no difference between the population means. The two-tailed result is just the one-tailed result multiplied by 2. The z-Test tool can also be used for the case where the null hypothesis is that there is a specific nonzero value for the difference between the two population means. For example, you can use this test to determine differences between the performances of two car models.

    z-Test: Two Sample for Means dialog box

    Variable 1 Range      Enter the cell reference for the first range of data that you want to analyze. The range must consist of a single column or row of data.

    Variable 2 Range      Enter the cell reference for the second range of data that you want to analyze. The range must consist of a single column or row of data.

    Hypothesized Mean Difference      Enter the number that you want for the shift in sample means. A value of 0 (zero) indicates that the sample means are hypothesized to be equal.

    Variable 1 Variance (known)      Enter the known population variance for the Variable 1 input range.

    Variable 2 Variance (known)      Enter the known population variance for the Variable 2 input range.

    Labels      Select if the first row or column of your input ranges contains labels. Clear this check box if your input ranges have no labels. Microsoft Office Excel generates the appropriate data labels for the output table.

    Alpha      Enter the confidence level for the test. This value must be in the range 0...1. The alpha level is a significance level that is related to the probability of having a type I error (rejecting a true hypothesis).

    Output Range      Enter the reference for the upper-left cell of the output table. Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

    New Worksheet Ply      Click to insert a new worksheet in the current workbook and paste the results starting at cell A1 of the new worksheet. To name the new worksheet, type a name in the box.

    New Workbook      Click to create a new workbook in which results are added to a new worksheet.

    Notes: 

    • The data analysis functions can be used on only one worksheet at a time. When you perform data analysis on grouped worksheets, results will appear on the first worksheet, and empty formatted tables will appear on the remaining worksheets. To perform data analysis on the remainder of the worksheets, recalculate the analysis tool for each worksheet.

1 comment:

  1. A data quality assurance service https://provectus.com/ can help you maintain the accuracy and consistency of your data. This involves detecting and eliminating errors and anomalies. Some data cleansing services can be performed manually or automatically using data quality tools and batch processing techniques. Others can be done through data migration or scripting. Some services also include parsing and standardizing values.

    ReplyDelete