This article discusses the probabilistic approach to making estimates and projections from historical data, an approach widely used in industries that deal with random data.
An estimate is a measure of a random process that usually applies to the current or a previous time period (a stricter definition is given below). In contrast, a projection refers to a future time period, and can be considered "an estimate of the future" based on past data.
The idea behind making future projections is based on the fact that many real-world random processes are considered to be stationary, meaning that the statistical characteristics (such as mean, variance) of these processes do not change over time. This enables you to use the theoretical distribution based on historical data to predict future events.
Note that some real-world processes are nonstationary, and using past data generated by those processes to make future projections can give incorrect results. However, in many industries dealing with nonstationary random processes, such as investment, it is assumed that the statistical characteristics of the processes change over time, but not significantly, so the probabilistic approach can still be used to make predictions for a limited future time horizon. For example, if you determine the stock price distribution based on ten years of historical data, it is usually considered reasonable to use the same distribution for a relatively short future time period such as one month.
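As a rough illustration of what "statistical characteristics do not change over time" means in practice, the following Python sketch compares the mean and variance of the first and second halves of a data series. The function name and the sample values are hypothetical and chosen only for illustration; this is an informal sanity check, not a formal stationarity test.

```python
import statistics

def split_half_summary(series):
    """Return (mean, variance) for the first and second halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.mean(first), statistics.variance(first)),
        (statistics.mean(second), statistics.variance(second)),
    )

# Hypothetical observations, made up purely for illustration.
observations = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0, 10.3, 9.6]

(first_mean, first_var), (second_mean, second_var) = split_half_summary(observations)
print(f"first half:  mean={first_mean:.3f}, variance={first_var:.3f}")
print(f"second half: mean={second_mean:.3f}, variance={second_var:.3f}")
```

If the two halves show similar means and variances, treating the process as approximately stationary may be a reasonable working assumption; large differences suggest that projections based on the whole history should be treated with caution.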
In the context of using probability distributions, projections and estimates are performed in the same way, and these terms are used interchangeably throughout the article.
In order to make estimates or future projections based on your data, you first need to perform distribution fitting and select the model describing the random process you are dealing with. The problem of fitting probability distributions to data is discussed separately.
While calculating probabilities can be performed using the Cumulative Distribution Function (CDF) of the best fitting distribution, the estimates are made using the Quantile Function. The CDF indicates the probability that the random variate X takes on a value less than or equal to x:
F(x) = P(X≤x),
and the Quantile Function, also known as the Inverse CDF, is defined for continuous distributions in the following way:
x = F⁻¹(p),
where p=F(x), and F(x) is the CDF of the same distribution. To make an estimate or a projection means to calculate x for a given probability value p.
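The relationship between the CDF and the Quantile Function can be checked numerically. The sketch below uses the `NormalDist` class from Python's standard `statistics` module; the Normal model and its parameters (mean 100, standard deviation 15) are assumptions chosen only for illustration and do not come from any fitted data.

```python
from statistics import NormalDist

# Hypothetical fitted model: Normal distribution with assumed parameters.
model = NormalDist(mu=100.0, sigma=15.0)

p = 0.95
x = model.inv_cdf(p)    # Quantile Function: x = F⁻¹(p)
p_back = model.cdf(x)   # CDF: F(x) = P(X ≤ x)

print(f"x = {x:.2f}")
print(f"F(x) = {p_back:.4f}")
```

Evaluating the CDF at the computed x returns the original probability p, which is what "x = F⁻¹(p), where p = F(x)" expresses.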
For some continuous distributions, the analytical expression for the Inverse CDF can be easily derived from the CDF. However, for many models, including the Normal, Lognormal, Beta, and Gamma distributions, the Inverse CDF is not available in closed form and must be evaluated using either iterative numerical methods or approximation formulas.
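One simple iterative method is bisection: because a CDF is non-decreasing, you can repeatedly halve an interval known to contain the answer until F(x) is close enough to p. The following Python sketch inverts the standard Normal CDF this way. The function names, the search interval, and the tolerance are illustrative assumptions; production libraries typically use faster, more refined algorithms.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Normal distribution, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def inverse_cdf_bisection(cdf, p, lo, hi, tol=1e-10, max_iter=200):
    """Find x in [lo, hi] such that cdf(x) is approximately p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not cdf(lo) <= p <= cdf(hi):
        raise ValueError("the interval [lo, hi] does not bracket the quantile")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid   # the quantile lies to the right of mid
        else:
            hi = mid   # the quantile lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# 95th percentile of the standard Normal distribution.
x95 = inverse_cdf_bisection(normal_cdf, 0.95, -10.0, 10.0)
print(round(x95, 4))
```

Bisection is slow compared with Newton-type methods, but it only requires the CDF itself and always converges when the starting interval brackets the quantile.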
One of the popular applications of probability distributions is in queueing analysis, where they are used to model service times. For example, the probabilistic model can help you answer a typical question such as what service time will not be exceeded with a given probability.
The service time is frequently described by the Exponential distribution, which has the following CDF:
p = F(x) = 1 − exp(−λ * x),
and the Inverse CDF has the form:
x = F⁻¹(p) = −ln(1−p) / λ,
where p is the probability (0≤p<1), and λ is the distribution parameter (λ>0) which is usually determined from historical data. Assuming that λ is known, you can easily evaluate the Inverse CDF at any p. For example, for λ=0.04 and p=0.95:
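x = −ln(1−0.95) / 0.04 ≈ 74.9 ≈ 75,
which means that, with probability 0.95, the service time does not exceed approximately 75 time units. This value can be read as an estimate when it refers to the current or a past period, or as a projection when it refers to a future period.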
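The worked example above can be reproduced with a short Python sketch. The function names are hypothetical, and the sample of service times is made up for illustration. The rate estimate uses the standard maximum likelihood result for the Exponential distribution (λ equals one divided by the sample mean), which is one common way to determine λ from historical data rather than necessarily the method used by any particular tool.

```python
import math

def exponential_quantile(p, lam):
    """Inverse CDF of the Exponential distribution: x = -ln(1 - p) / lam."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    return -math.log(1.0 - p) / lam

def estimate_rate(samples):
    """Maximum likelihood estimate of lam: 1 / (sample mean)."""
    return len(samples) / sum(samples)

# Values from the worked example: lam = 0.04, p = 0.95.
x = exponential_quantile(0.95, 0.04)
print(f"{x:.2f}")  # 74.89

# Hypothetical service times (illustration only); their mean is 25, so lam = 0.04.
lam_hat = estimate_rate([20.0, 30.0, 25.0])
print(f"{exponential_quantile(0.95, lam_hat):.2f}")
```

The exact value is about 74.89, which rounds to the 75 quoted in the example.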
EasyFit provides the integrated StatAssist tool (available from the Tools menu), which lets you make estimates and projections for all the commonly used distributions. If you have already fitted some distributions to your data, you can highlight and right-click one of the fitted models, and select StatAssist.
In StatAssist, open the Calculations tab, and specify the probability value. StatAssist will automatically calculate the Inverse CDF value indicating an estimate or a future projection.
Probabilistic estimates and projections can be made using the Inverse CDF of the distribution that best fits your historical data. Depending on the distribution, the Inverse CDF can be either derived from the CDF, or evaluated using numerical methods or approximation formulas. EasyFit lets you easily evaluate the Inverse CDF for more than 30 popular distributions, enabling you to apply the probabilistic model to obtain useful information about the random process you are dealing with.