Distribution Fitting - Which Distributions To Consider

Should you just test as many distributions as possible, or consider some specific models? If you have some additional information about your data, chances are you can narrow your choice to those distributions which have the same properties as the random process you are dealing with.

Continuous Data Analysis

A large number of continuous distributions have been developed over the past several centuries, and sometimes it can be difficult to understand the relationship between all these models: some distributions can "mimic" others, some are available in several "versions" etc.

Distribution Types

One way to classify distributions is by their range of definition: each distribution is defined on a specific interval (also called the domain, support, or X-range), so distributions with a similar domain fall into the same class. Based on the nature of your data, you can narrow your choice to one of the following distribution types:

the bounded distributions are defined for a ≤ x ≤ b, and include the Beta, Pert (also called Beta-PERT), Power Function, Triangular, Uniform, and Johnson SB models;
the unbounded distributions are defined on the entire real line: Normal, Cauchy, Gumbel (Extreme Value), Laplace (Double Exponential), Logistic, Student's t, and Johnson SU distributions;
the non-negative models are defined for x ≥ 0 (or x ≥ γ, where γ is a constant); EasyFit supports the Exponential, Gamma, Lognormal, Pareto, Rayleigh, Weibull, Chi-Squared, Inverse Gaussian, Frechet (Extreme Value Type II), and several additional distributions of this type.

For example, if you are analyzing customer service times, your data cannot contain negative values, so it is worth fitting the non-negative distributions first.
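The service-time example can be illustrated with a small sketch. It uses only the Python standard library rather than EasyFit, and the sample values, variable names, and the `exponential_cdf` helper are made up for illustration. It fits a Normal model (unbounded) and an Exponential model (non-negative) to the same data, then checks how much probability each model places below zero.

```python
import math
from statistics import NormalDist, mean

# Hypothetical service times in minutes (illustrative data, not from the article).
service_times = [0.4, 0.9, 1.3, 2.1, 0.2, 3.8, 0.7, 1.6, 5.2, 0.5]

# Unbounded candidate: Normal fitted from the sample mean and standard deviation.
normal_fit = NormalDist.from_samples(service_times)
normal_mass_below_zero = normal_fit.cdf(0.0)

# Non-negative candidate: Exponential, whose maximum likelihood rate is 1 / mean.
rate = 1.0 / mean(service_times)

def exponential_cdf(x, rate):
    """CDF of the Exponential distribution; zero for x below the lower bound 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-rate * x)

print(f"Normal model:      P(X < 0) = {normal_mass_below_zero:.3f}")
print(f"Exponential model: P(X < 0) = {exponential_cdf(0.0, rate):.3f}")
```

The Normal model assigns a positive probability to negative service times, which is impossible for this process, while the Exponential model assigns none. This is the reason for restricting the candidates to the class whose domain matches the data.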

Advanced Distributions

Even though most continuous distributions fall into one of the three categories, some models are a bit more complex in this regard. These models were developed relatively recently compared to the "classical" distributions, and are just starting to gain traction. A good example of such a model is the Generalized Extreme Value (GEV) distribution which has three parameters (k, σ, μ) and the following range of definition:

1 + k(x − μ)/σ > 0 for k ≠ 0
−∞ < x < +∞ for k = 0

As you can see, the range depends on the value of the shape parameter k, so this distribution cannot be easily classified into one of the categories we have discussed. In EasyFit, this distribution and some additional models (Generalized Pareto, Wakeby etc.) are called advanced, partly because they are more flexible than most classical distributions.
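To make the dependence on k concrete, here is a minimal sketch that computes the GEV domain from its parameters. It assumes the commonly used support condition 1 + k(x − μ)/σ > 0 for k ≠ 0 (the whole real line for k = 0). The function name `gev_support` is hypothetical and is not part of EasyFit.

```python
import math

def gev_support(k, sigma, mu):
    """Return (lower, upper) for the GEV domain, assuming the condition
    1 + k * (x - mu) / sigma > 0 when k != 0 (an assumed convention)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if k == 0:
        return (-math.inf, math.inf)      # unbounded on both sides
    endpoint = mu - sigma / k             # the value of x where 1 + k(x - mu)/sigma = 0
    if k > 0:
        return (endpoint, math.inf)       # bounded from below only
    return (-math.inf, endpoint)          # k < 0: bounded from above only

for k in (0.5, 0.0, -0.5):
    print(k, gev_support(k, sigma=2.0, mu=1.0))
```

Under this convention the same family is bounded below for k > 0, unbounded for k = 0, and bounded above for k < 0, which is why it does not fit into a single class. Some software uses the opposite sign for the shape parameter, so check the convention before reusing parameter values elsewhere.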

Distribution Fitting Options

When fitting distributions to your data using EasyFit, you can specify the lower/upper domain bounds in the Distribution Fitting Options dialog (available from the Options menu).

The following options are available:

Open means that the distribution does not have a finite bound;
Closed indicates that the distribution has a finite bound:
- Estimate: the bound is unknown (EasyFit will try to estimate it from your data);
- Fixed: the bound is known (you should specify its value);
Unknown: select this option if you have no information on distribution bounds.

Based on your selection, EasyFit will determine one or several distribution types (bounded, unbounded, non-negative, advanced) which should be automatically fitted to your data. For example, if you set the Lower Bound to Closed and the Upper Bound to Open, EasyFit will fit the non-negative and advanced distributions.

By default, the advanced distributions are fitted regardless of your choice, but you can prevent EasyFit from fitting these models by deselecting them on the Distributions tab.
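The selection logic can be sketched as a small lookup. Only one case is stated in the text: Lower Bound Closed with Upper Bound Open gives the non-negative and advanced types. The other cases, the treatment of Unknown as "either Open or Closed", and all names (`Bound`, `distribution_types`, `include_advanced`) are assumptions made for illustration. This is not EasyFit's actual implementation.

```python
from enum import Enum

class Bound(Enum):
    OPEN = "open"
    CLOSED = "closed"
    UNKNOWN = "unknown"

def distribution_types(lower, upper, include_advanced=True):
    """Map bound settings to distribution types (illustrative assumption)."""
    either = {Bound.OPEN, Bound.CLOSED}
    # Assumption: an Unknown bound is compatible with both Open and Closed.
    lowers = either if lower is Bound.UNKNOWN else {lower}
    uppers = either if upper is Bound.UNKNOWN else {upper}

    types = []
    if Bound.CLOSED in lowers and Bound.CLOSED in uppers:
        types.append("bounded")
    if Bound.OPEN in lowers and Bound.OPEN in uppers:
        types.append("unbounded")
    if Bound.CLOSED in lowers and Bound.OPEN in uppers:
        types.append("non-negative")
    # The advanced models are fitted by default unless deselected.
    if include_advanced:
        types.append("advanced")
    return types

print(distribution_types(Bound.CLOSED, Bound.OPEN))
print(distribution_types(Bound.UNKNOWN, Bound.UNKNOWN))
```

The first call reproduces the example from the text. The second shows that, under the stated assumption, leaving both bounds Unknown keeps every type in play.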

Discrete Data Analysis

In contrast to continuous models, only a handful of discrete distributions (about 5-8) are in common use, including the Binomial, Geometric, Hypergeometric, Negative Binomial, and Poisson models. If you are dealing with discrete (integer) data, selecting an appropriate model should not be difficult. Continuous distributions can also be used to approximate discrete data, so you can apply the techniques and features of EasyFit described above to analyze your data as if it were continuous.
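One quick way to shortlist a discrete model, shown here as a rough sketch rather than an EasyFit feature, is to compare the sample variance with the sample mean. The Poisson distribution has variance equal to its mean, the Binomial has variance below its mean, and the Negative Binomial has variance above its mean. The function name, the tolerance value, and the sample data are illustrative assumptions.

```python
from statistics import mean, variance

def suggest_discrete_model(counts, tolerance=0.2):
    """Suggest a starting model from the variance-to-mean ratio.
    The tolerance band around 1 is an arbitrary illustrative choice."""
    m = mean(counts)
    if m <= 0:
        raise ValueError("the sample mean must be positive")
    ratio = variance(counts) / m
    if ratio < 1 - tolerance:
        return "Binomial"            # under-dispersed: variance < mean
    if ratio > 1 + tolerance:
        return "Negative Binomial"   # over-dispersed: variance > mean
    return "Poisson"                 # variance roughly equal to the mean

# Hypothetical count data (e.g. arrivals per interval).
print(suggest_discrete_model([0, 1, 1, 2, 2, 2, 3, 3, 4, 5]))
print(suggest_discrete_model([2, 3, 2, 3, 2, 3, 2, 3]))
print(suggest_discrete_model([0, 0, 0, 1, 0, 12, 0, 9]))
```

The result is only a starting point: the suggested model should still be fitted and checked with a goodness-of-fit test, and small samples can easily fall on the wrong side of the tolerance band.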

Conclusion

Based on the nature of your data (e.g. where it comes from), you can sometimes judge whether the distribution is bounded, unbounded, or non-negative. When fitting distributions to your data, you can try to narrow your choice to those models which have the same range of definition as the underlying distribution. EasyFit enables you to specify whether the domain bounds are open, closed, or unknown, and depending on your choice, determines the appropriate distribution types.
