How To Select The Best Fitting Distribution Using The Goodness of Fit Tests

You have fitted several distributions to your data and now need to determine the most valid model. How do you compare the fitted distributions? How do you know whether a certain distribution is a good fit? Goodness of fit (GOF) tests can help you answer these and related questions.

How The Goodness of Fit Tests Work

The idea behind the goodness of fit tests is to measure the "distance" between the data and the distribution you are testing, and compare that distance to some threshold value. If the distance (called the test statistic) is less than the threshold value (the critical value), the test does not reject the distribution, and the fit is considered good.

The logic of applying the various goodness of fit tests is the same; they differ in how the test statistics and critical values are calculated. The test statistics are usually defined as some function of the sample data and the theoretical (fitted) cumulative distribution function.
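As an illustration of a test statistic built from the sample data and a theoretical cumulative distribution function, here is a minimal sketch of the Kolmogorov-Smirnov statistic in plain Python. The function name `ks_statistic` and the three-point sample are assumptions made for this example; this is not EasyFit's implementation.

```python
def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Illustrative sample tested against the Uniform(0, 1) CDF, F(x) = x.
d = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
print(round(d, 4))  # 0.3
```

The smaller the returned value, the closer the empirical distribution of the data is to the theoretical one.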

The critical values depend on the sample size and the chosen significance level. The significance level is the probability of rejecting a fitted distribution (as if it were a bad fit) when it is actually a good fit. It is denoted by the Greek letter α (alpha), and the most commonly used levels are 0.05 and 0.01. For example, if you choose the 0.05 level when performing the goodness of fit tests, the probability of rejecting a good fit in error is 5%.
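To show how a critical value moves with the sample size and the significance level, the sketch below uses the common large-sample approximation for the Kolmogorov-Smirnov test, sqrt(-0.5 * ln(α / 2)) / sqrt(n). This formula is an assumption chosen for illustration: it applies to a distribution whose parameters are fixed in advance, it is only approximate for small samples, and EasyFit may compute its critical values differently. The function names are hypothetical.

```python
import math


def ks_critical_value(n, alpha):
    """Approximate Kolmogorov-Smirnov critical value for sample size n (large n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def fit_is_rejected(statistic, n, alpha):
    """Reject the fitted distribution when the statistic reaches the critical value."""
    return statistic >= ks_critical_value(n, alpha)


# The critical value shrinks as n grows and rises as alpha gets smaller.
for n in (50, 200):
    for alpha in (0.05, 0.01):
        print(n, alpha, round(ks_critical_value(n, alpha), 4))
```

With this approximation, a statistic of 0.10 computed from 100 observations would not be rejected at the 0.05 level, because the critical value is about 0.136.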

EasyFit supports the most popular goodness of fit tests, including the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared tests. Once the distributions are fitted, EasyFit displays goodness of fit reports that include the test statistics and the critical values calculated for various significance levels:

[Screenshot: Goodness of Fit - Details]

The goodness of fit reports can be used to determine whether a certain probability distribution is a good fit.

How To Compare The Fit of Several Distributions

Since the goodness of fit test statistics indicate the distance between the data and the fitted distributions, the distribution with the lowest statistic value for a given test is the best fitting model according to that test. Based on this, EasyFit assigns each distribution a rank (1 = the best model, 2 = the next best model, etc.), allowing you to easily compare the fitted models and select the most valid one:

[Screenshot: Goodness of Fit - Summary]

To order the fitted distributions by one of the goodness of fit test statistics, click the appropriate test name. A similar option is available on the Graphs page: right-click the distribution list to show the menu:

[Screenshot: Ordering The Distribution List]

In this example, the distributions are ordered by the Kolmogorov-Smirnov test statistic, and the best fitting distribution (Gamma) is displayed at the top of the list.
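The ranking described in this section can be sketched outside EasyFit as follows. The sample is synthetic (evenly spaced quantiles of an exponential distribution), and the three candidate distributions and their parameters are hypothetical choices made for this example rather than fitted models. Ranks are assigned by ascending Kolmogorov-Smirnov statistic, so rank 1 is the closest candidate under that one test; a different test may produce a different order.

```python
import math


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


# Synthetic sample: 50 evenly spaced quantiles of Exponential(rate=1).
n = 50
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

# Hypothetical candidates with fixed parameters, given as CDFs.
candidates = {
    "Exponential": lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0,
    "Normal": lambda x: 0.5 * (1.0 + math.erf((x - 1.0) / math.sqrt(2.0))),
    "Uniform": lambda x: min(max(x / 5.0, 0.0), 1.0),
}

# Lower statistic = smaller distance between data and model = better rank.
statistics = {name: ks_statistic(sample, cdf) for name, cdf in candidates.items()}
ranking = sorted(statistics.items(), key=lambda item: item[1])

for rank, (name, stat) in enumerate(ranking, start=1):
    print(rank, name, round(stat, 4))
```

Because the sample was built from exponential quantiles, the exponential candidate receives rank 1 here.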

Conclusion

The goodness of fit tests can be used to compare the fitted distributions, select one of the models, and determine how well it fits your data. EasyFit displays interactive reports that let you take a quick look at the fitted distributions and evaluate the goodness of fit of particular models at various significance levels.