Calculating probabilities is perhaps the most common and intuitive application of distributions. This article explains how you can use the distribution you selected to perform probability calculations and apply the results to make informed decisions.
The ultimate goal of your analysis is to deal with uncertainty affecting your business, and calculating probabilities is the way to measure that uncertainty. In a typical scenario, you would define two or more possible outcomes, calculate the probability of each outcome, and make your decisions based on the results you get.
The best-fitting probability distribution you selected by analyzing your data serves as a model of the random process you are dealing with. Each probability distribution has a number of useful functions associated with it, and one of them is the Cumulative Distribution Function (CDF). This function enables you to calculate the probability of various outcomes, or events.
By definition, the Cumulative Distribution Function indicates the probability that the random variate X takes on a value less than or equal to x:
F(x) = P(X≤x)
The definition of the CDF is very simple yet compelling: in most cases, calculating the probability that the random variable is less than (or more than) some fixed value is exactly what you need. The CDF takes on values in the interval [0,1].
To calculate the probability that the variate X takes on a value greater than x, you can use the following formula:
P(X>x) = 1 − P(X≤x) = 1 − F(x)
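As a minimal Python sketch (not part of EasyFit), the CDF of the standard Normal distribution can be computed from the error function math.erf, and the probability of the complementary event X>x follows directly from it. The helper names normal_cdf and prob_greater are hypothetical, chosen only for this illustration.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) = P(X <= x) for a Normal(mu, sigma) variate, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_greater(cdf, x: float) -> float:
    """P(X > x) = 1 - F(x): the complement of the CDF at x."""
    return 1.0 - cdf(x)


if __name__ == "__main__":
    x = 1.0
    print(f"P(X <= {x}) = {normal_cdf(x):.4f}")                 # 0.8413
    print(f"P(X > {x})  = {prob_greater(normal_cdf, x):.4f}")   # 0.1587
```

Passing the CDF as an argument keeps prob_greater independent of any particular distribution.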
Similarly, you can calculate the probability that the random variable falls into some interval (x1,x2):
P(x1<X<x2) = P(X<x2) − P(X≤x1)
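The distinction between P(X<x2) and P(X≤x2) matters for discrete distributions. The following Python sketch applies the formula above to a hypothetical fair six-sided die (an assumption made only for illustration) and shows what goes wrong if F(x2) is used in place of P(X<x2):

```python
from fractions import Fraction

# Hypothetical example: X is the result of one roll of a fair six-sided die.
PMF = {k: Fraction(1, 6) for k in range(1, 7)}


def cdf(x: float) -> Fraction:
    """F(x) = P(X <= x)."""
    return sum((p for k, p in PMF.items() if k <= x), Fraction(0))


def prob_less(x: float) -> Fraction:
    """P(X < x); for a discrete variable this can differ from F(x)."""
    return sum((p for k, p in PMF.items() if k < x), Fraction(0))


def prob_open_interval(x1: float, x2: float) -> Fraction:
    """P(x1 < X < x2) = P(X < x2) - P(X <= x1)."""
    return prob_less(x2) - cdf(x1)


if __name__ == "__main__":
    print(prob_open_interval(2, 5))  # 1/3: only the outcomes 3 and 4
    print(cdf(5) - cdf(2))           # 1/2: also counts X = 5, so it is not P(2 < X < 5)
```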
For continuous distributions, P(X=x) always equals zero, so the expressions "P(X≤x)" and "P(X<x)" are equivalent, and you can use the following formula:
P(x1<X<x2) = P(x1≤X≤x2) = F(x2) − F(x1)
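For a continuous distribution, an interval probability needs only two CDF evaluations. A minimal sketch using the Python standard library's statistics.NormalDist (available since Python 3.8); the helper prob_between is a hypothetical name:

```python
from statistics import NormalDist


def prob_between(dist: NormalDist, x1: float, x2: float) -> float:
    """P(x1 < X < x2) = F(x2) - F(x1); valid because P(X = x) = 0 for a continuous X."""
    if x1 > x2:
        raise ValueError("x1 must not exceed x2")
    return dist.cdf(x2) - dist.cdf(x1)


standard_normal = NormalDist(mu=0.0, sigma=1.0)
print(f"P(-1 < X < 1) = {prob_between(standard_normal, -1.0, 1.0):.4f}")  # 0.6827
```

Because the endpoints carry zero probability, the same value is returned whether the interval is open or closed.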
The graph below shows the CDF of the standard Normal distribution (the red curve) and various probabilities we have discussed:
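John is the head of the customer support team at a large company and needs to measure how long customers typically wait on hold, in order to increase customer satisfaction. John defines two alternative outcomes: the good outcome, in which a customer waits on hold for 60 seconds or less, and the bad outcome, in which a customer waits for more than 60 seconds.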
If the probability of the bad outcome is too large (for example, more than 5%), John will need to consider hiring additional staff or taking other steps to reduce the customer waiting time.
Based on historical records of the waiting time, John determines that the data is best described by the Exponential distribution with rate parameter λ=0.04 per second (a mean waiting time of 1/λ = 25 seconds). The CDF of this distribution can be used to calculate the probability of both outcomes. The Exponential CDF is defined as:
F(x) = 1 − exp(−λ * x), for x ≥ 0
The probability of the first outcome can be calculated as:
P(X≤60) = F(60) = 1 − exp(−0.04 * 60) ≈ 0.91 (91%),
and the probability of the second outcome is as follows:
P(X>60) = 1 − F(60) = exp(−0.04 * 60) ≈ 0.09 (9%),
meaning that about 9% of customers wait on hold for more than 60 seconds. Since this exceeds John's 5% threshold, the probability of the bad outcome is too high, and John decides to hire extra support staff.
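John's calculation can be reproduced in a few lines of Python. This is a sketch, assuming the fitted rate λ=0.04 is expressed per second and using the 5% tolerance from the example as the decision rule; the constant and function names are illustrative, not part of EasyFit.

```python
import math

LAMBDA = 0.04        # assumed: fitted Exponential rate, per second
THRESHOLD = 60.0     # seconds on hold separating the good and bad outcomes
MAX_BAD_PROB = 0.05  # John's tolerance for the bad outcome


def exponential_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


p_good = exponential_cdf(THRESHOLD, LAMBDA)  # P(X <= 60)
p_bad = 1.0 - p_good                         # P(X > 60) = 1 - F(60)

print(f"P(X <= 60) = {p_good:.4f}")  # 0.9093
print(f"P(X > 60)  = {p_bad:.4f}")   # 0.0907
if p_bad > MAX_BAD_PROB:
    print("Bad-outcome probability exceeds the tolerance: consider adding staff.")
```

The rounded values 0.91 and 0.09 in the text correspond to 0.9093 and 0.0907 here.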
EasyFit allows you to easily calculate probabilities of various events using StatAssist, the integrated distribution viewer and probability calculator. To open the StatAssist window, select Tools|StatAssist from the main menu.
If you have already fitted distributions to your data, you can highlight and right-click one of the fitted models and select StatAssist:
This will display the fitted distribution in StatAssist (you don't need to manually specify the distribution parameters in this case). If you launch StatAssist from the main menu, it will display the default distribution.
In StatAssist, open the Delimiters pane, click the "None" button, select "One Delimiter" from the popup menu, and specify the x1 value:
StatAssist will display the following values on the Probabilities tab:
If you need to calculate the probability P(x1<X<x2), you can select "Two Delimiters" from the popup menu, and specify x2. The Probabilities tab will update accordingly:
You can apply the Cumulative Distribution Function (CDF) to calculate the probability of various outcomes and use this information to make appropriate business decisions. EasyFit allows you to easily calculate probabilities for more than 50 distributions using StatAssist, the built-in distribution viewer and calculator.
See also: Help on StatAssist