Should you just test as many distributions as possible, or consider some specific models? If you have some additional information about your data, chances are you can narrow your choice to those distributions which have the same properties as the random process you are dealing with.
A large number of continuous distributions have been developed over the past several centuries, and it can be difficult to understand the relationships between all these models: some distributions can "mimic" others, some are available in several "versions", and so on.
One way to classify the distributions is by their range of definition: each distribution is defined on a specific interval (also called the domain, support, or X-range), so distributions with a similar domain fall into the same class. Based on the nature of your data, you can narrow your choice to one of the following distribution types: bounded, unbounded, or non-negative.
For example, if you are analyzing the distribution of customer service times, your data cannot contain negative values, so it is worth trying to fit the non-negative distributions first.
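The idea of restricting the candidates to non-negative models can be sketched outside EasyFit as well. The example below is only an illustration: it assumes SciPy is available, uses synthetic "service times" rather than real data, and ranks three arbitrarily chosen non-negative distributions by the Kolmogorov-Smirnov statistic. It is not meant to reproduce how EasyFit fits or ranks models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic "service times" in minutes (hypothetical data, strictly positive).
service_times = rng.gamma(shape=2.0, scale=3.0, size=500)

# Candidate non-negative models (an arbitrary selection for illustration).
candidates = {
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
}

results = {}
for name, dist in candidates.items():
    # floc=0 fixes the location parameter, so the fitted domain starts at zero.
    params = dist.fit(service_times, floc=0)
    # Kolmogorov-Smirnov statistic: smaller values mean closer agreement.
    ks_stat = stats.kstest(service_times, dist.cdf, args=params).statistic
    results[name] = (params, ks_stat)

# Print the candidates from best to worst according to the KS statistic.
for name, (params, ks_stat) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name:10s} KS = {ks_stat:.4f}")
```

Fixing the location at zero is what encodes the "non-negative" assumption here; leaving it free would let the fitted domain start at an arbitrary point.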
Even though most continuous distributions fall into one of the three categories, some models are more complex in this regard. These models were developed relatively recently compared to the "classical" distributions and are only starting to gain traction. A good example is the Generalized Extreme Value (GEV) distribution, which has three parameters (k, σ, μ) and the following range of definition:
As you can see, the range depends on the value of the shape parameter k, so this distribution cannot be easily classified into one of the categories we have discussed. In EasyFit, this distribution and some additional models (Generalized Pareto, Wakeby, etc.) are called advanced, partly because they are more flexible than most classical distributions.
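To make the dependence on k concrete, here is a small sketch that computes the GEV domain for given parameter values. It assumes the commonly used parameterization in which the distribution is defined wherever 1 + k(x - μ)/σ > 0, with the whole real line as the domain when k = 0. The function name gev_support is hypothetical, and other software may use the opposite sign for the shape parameter (SciPy's genextreme does, for instance).

```python
import math
from typing import Tuple


def gev_support(k: float, sigma: float, mu: float) -> Tuple[float, float]:
    """Return (lower, upper) of the set where 1 + k*(x - mu)/sigma > 0.

    Assumes the parameterization described in the lead-in; sigma must be positive.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if k == 0:
        # No restriction on x: the domain is the whole real line.
        return (-math.inf, math.inf)
    bound = mu - sigma / k
    # k > 0 gives a lower bound, k < 0 gives an upper bound.
    return (bound, math.inf) if k > 0 else (-math.inf, bound)


print(gev_support(0.5, 2.0, 1.0))   # (-3.0, inf)
print(gev_support(-0.5, 2.0, 1.0))  # (-inf, 5.0)
print(gev_support(0.0, 2.0, 1.0))   # (-inf, inf)
```

Under this assumption the same model is bounded below for positive k, bounded above for negative k, and unbounded for k = 0, which is why it does not fit a single category.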
When fitting distributions to your data using EasyFit, you can specify the lower/upper domain bounds in the Distribution Fitting Options dialog (available from the Options menu):
The following options are available for each bound: Open, Closed, and Unknown.
Based on your selection, EasyFit will determine one or more distribution types (bounded, unbounded, non-negative, advanced) which should be automatically fitted to your data. For example, if you set the Lower Bound to Closed and the Upper Bound to Open, EasyFit will fit the non-negative and advanced distributions.
By default, the advanced distributions are fitted regardless of your choice, but you can prevent EasyFit from fitting these models by deselecting them on the Distributions tab.
In contrast to continuous models, there are only about five to eight commonly used discrete distributions, including the Binomial, Geometric, Hypergeometric, Negative Binomial, and Poisson models. If you are dealing with discrete (integer) data, selecting an appropriate model should not be difficult. Also, there is nothing wrong with using continuous distributions for discrete data analysis, so you can apply the techniques and features of EasyFit described above to analyze your data as if it were continuous.
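As a rough illustration of treating integer data both ways, the sketch below fits a discrete Poisson model and a continuous normal model to the same synthetic counts. It assumes SciPy is available; the data, the choice of the normal distribution, and the log-likelihood comparison are illustrative assumptions and do not describe EasyFit's own procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic integer data (hypothetical counts, e.g. arrivals per hour).
counts = rng.poisson(lam=4.0, size=1000)

# Discrete model: the maximum-likelihood estimate of the Poisson rate
# is simply the sample mean.
lam_hat = counts.mean()
poisson_loglik = stats.poisson.logpmf(counts, lam_hat).sum()

# Continuous stand-in: a normal distribution fitted to the same integers.
mu_hat, sd_hat = stats.norm.fit(counts)
# Probability the normal model assigns to each observed integer n,
# taken as P(n - 0.5 < X <= n + 0.5) so the two models are comparable.
cell_prob = (stats.norm.cdf(counts + 0.5, mu_hat, sd_hat)
             - stats.norm.cdf(counts - 0.5, mu_hat, sd_hat))
normal_loglik = np.log(cell_prob).sum()

print(f"Poisson rate estimate: {lam_hat:.3f}")
print(f"Poisson log-likelihood: {poisson_loglik:.1f}")
print(f"Normal  log-likelihood: {normal_loglik:.1f}")
```

Integrating the continuous density over a unit-width cell around each integer is one simple way to put a continuous model on the same footing as a discrete one.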
Based on the nature of your data (e.g. where it comes from), you can sometimes judge whether the distribution is bounded, unbounded, or non-negative. When fitting distributions to your data, you can try to narrow your choice to those models which have the same range of definition as the underlying distribution. EasyFit enables you to specify whether the domain bounds are open, closed, or unknown, and depending on your choice, determines the appropriate distribution types.