While the standard Excel package includes some basic statistical functions, its support for probability distributions is very limited and almost useless for real world probabilistic modelling applications. This article discusses the worksheet functions provided by EasyFitXL, the distribution fitting add-in for Excel, that can be applied to perform a range of calculations using a variety of probability distributions.
Contents
Introduction
Which Distribution Should You Use?
How The Distributions Are Presented in Excel / EasyFitXL
How To Specify The Distribution Functions
The Two Most Popular Distribution Functions - CDF and Inverse CDF
Other Useful Distribution Functions
If you work with random data of any kind, you can use the probability distributions to measure the uncertainty and deal with risk affecting your business. There are many different probability distributions developed by statisticians to model random data of different kinds, ranging from business data, finance data (stock prices) to engineering data (system failures) and environmental data (maximum wind speeds, flood flows).
EasyFitXL includes over 55 distributions covering most real world applications. For each distribution, EasyFitXL provides several functions to be used in your Excel sheets. These functions allow you to obtain comprehensive information on any distribution, and thus on the underlying random process you are modelling.
When performing the analysis, it is essential that you select a probability distribution that best fits to your data, since if you use an inappropriate distribution, you will get incorrect results leading to bad decisions. We have already covered the topic of fitting distributions to data in Excel using EasyFitXL, and in this article we will focus on applying the resulting distributions to derive useful information about your data.
Sometimes you already know which distribution to use, for example, based on expert judgement or some apriori information about your data. In this case, you don't need to perform distribution fitting, and can proceed directly to applying the distribution model, i.e. to calculating various properties of that distribution.
Any probability distribution can be defined as a function that takes on a number of parameters. Since all the required functions are already included in EasyFitXL, you only need to specify the distribution name and its actual parameters. For example, to define the Normal distribution with parameters =1, =0, you should enter the following string directly into an empty worksheet cell:
A | |
1 | Normal(1;0) |
2 |
Note that the distribution parameters should be separated by a semicolon, and the particular order of parameters should be exactly the same as specified in the documentation for EasyFitXL - see continuous and discrete distributions for details.
When using the distribution fitting feature, you don't really need to remember the distribution names and parameters: once the distributions are fitted, select one or several models from the list, right-click on the list, and choose the Copy menu option as indicated on the image below:
This will copy the selected models to the Clipboard in the required format, so you can paste them directly into a worksheet using the standard Excel Paste option.
Now that you have learned how to specify a distribution, we can proceed to discussing the
functions that do the actual calculations. All these functions have two things
in common: they accept a probability distribution in the format we
just mentioned, and their names start with "Dist
". For instance,
the =DistMean
function calculates the mean, or the expected
value, of a specified distribution:
A | |
1 | Normal(1;0) |
2 | =DistMean(A1) |
Similarly, the Probability Density Function, PDF(x), can be defined as
A | |
1 | Normal(1;0) |
2 | =DistPdf(A1;1.5) |
where 1.5 is the x-value at which you want to calculate the PDF(x). The table below lists some other supported distribution functions:
Description | Function Name & Arguments |
Probability Density Function | =DistPdf(Distribution;x) |
Cumulative Distribution Function | =DistCdf(Distribution;x) |
Inverse Cumulative Distribution Function | =DistInv(Distribution;P) |
Survival Function | =DistSurv(Distribution;x) |
Hazard Function | =DistHaz(Distribution;x) |
Mode (most likely value) | =DistMode(Distribution) |
Mean (expected value) | =DistMean(Distribution) |
Variance | =DistVar(Distribution) |
Some functions such as the mean, variance, PDF, and CDF are available in a simplified
form for your convenience. For example, rather than having to specify the Normal(1;0)
distribution in one cell and
the sheet function =DistMean
in another, you can just enter =NormalMean(1;0)
in a single cell, and that will return exactly the same value.
While there are many different functions associated with each distribution, the Cumulative Distribution Function (CDF) and the Inverse Cumulative Distribution Function (ICDF) are used much more frequently than the others combined. The applications of these functions are covered in separate tutorials - see calculating probabilities and making estimates & projections for details.
The CDF of any distribution at a given x-value can be calculated using the
=DistCdf
function that accepts a distribution
and an actual x-value. The return value is the probability
a random variate takes on a value less than or equal to x, for example:
A | |
1 | Normal(1;0) |
2 | =DistCdf(A1;1.5) |
will return the value of 0.9332 in cell A2, indicating the probability a random variate X (following the specified Normal distribution) is less than 1.5: P(X≤x) = P(X≤1.5) = 0.9332. There is also a simplified version of this function:
A | |
1 | =NormalCdf(1.5;1;0) |
2 |
Note that the x-value has to be specified before the distribution parameters in this case. For details on using the Cumulative Distribution Function. see calculating probabilities.
The ICDF of a distribution at a given P-value (probability) can be calculated using the
=DistInv
function:
A | |
1 | Normal(1;0) |
2 | =DistInv(A1;0.7) |
This will return the value of 0.5244 in cell A2, indicating the fixed x-value (x=0.5244) such that P(X≤x)=0.7, where X is a normally distributed random variate, and P=0.7 is a predefined probability value. In other words, given the P-value, this function returns the corresponding x-value - that is why it is called the Inverse CDF.
The Inverse CDF has a simplified version as well:
A | |
1 | =NormalInv(0.7;1;0) |
2 |
When using the simplified version of the ICDF worksheet function, the P-value needs to go first in the argument list. See also making estimates & projections for a more detailed description and real world applications of this function.
Besides the CDF and the Inverse CDF, there are some additional functions useful for real world data analysis. The basic statistics can be used when analyzing any kind of data, while the Survival and Hazard functions are primarily applied in specific areas.
The particular benefit of using EasyFitXL is that it allows you to calculate basic statistics of most distributions. The sample worksheet below shows how to use the worksheet functions of EasyFitXL to calculate various statistics of a specifed distribution - in this example we use the Lognormal distribution with =1, =1:
A | B | C | |
1 | Lognormal(1;1) | ||
2 | =DistMode(A1) | most likely value | |
3 | =DistMean(A1) | =LognormalMean(1;1) | average value |
4 | =DistVar(A1) | =LognormalVar(1;1) | variance |
5 | =DistStdev(A1) | =LognormalStdev(1;1) | spread, dispersion |
6 | =DistSkew(A1) | skewness | |
7 | =DistKurt(A1) | kurtosis |
The distribution mode and mean are perhaps the most useful statistics indicating the most likely value and the average value of your data respectively.
While both the distribution variance and the standard deviation indicate the spread of your data around the average value, the standard deviation is more useful, since it has the same units as the original variable and can be easier to interpret for this reason.
The following three functions are primarily used in specific areas including survival analysis and reliability engineering applications:
A | B | C | |
1 | Lognormal(1;1) | ||
2 | =DistSurv(A1;4) | survival function at x=4 | |
3 | =DistHaz(A1;4) | =LognormalHaz(4;1;1) | hazard function at x=4 |
4 | =DistCumHaz(A1;4) | cumulative hazard function at x=4 |
See EasyFitXL online help on worksheet functions for a complete list of supported functions and additional examples.