
# Goodness of Fit Tests

## Introduction


Goodness of fit (GOF) tests measure the compatibility of a random sample with a theoretical probability distribution. In other words, these tests show how well the distribution you selected fits your data.

The general procedure consists of defining a test statistic, which is some function of the data measuring the distance between the hypothesis and the data, and then calculating the probability of obtaining data with a still larger value of this test statistic than the value observed, assuming the hypothesis is true. This probability is known as the p-value (some texts call it the confidence level of the fit).

Small probabilities (say, less than one percent) indicate a poor fit. Especially high probabilities (close to one) correspond to a fit which is too good to happen very often, and may indicate a mistake in the way the test was applied.
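The general procedure can be sketched numerically. Below is a minimal Monte Carlo estimate of this probability, assuming (purely for illustration) a standard normal hypothesis and a toy test statistic; none of these choices come from the article, and real GOF tests use the specialized statistics defined in the following sections.

```python
import random

def monte_carlo_p_value(observed_stat, statistic, sampler, n, trials=2000, seed=1):
    """Estimate P(T >= observed) under H0 by simulating samples from the
    hypothesized distribution and recomputing the test statistic."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        simulated = [sampler(rng) for _ in range(n)]
        if statistic(simulated) >= observed_stat:
            exceed += 1
    return exceed / trials

# Toy test statistic (chosen only for illustration): absolute sample mean.
def abs_mean(xs):
    return abs(sum(xs) / len(xs))

# Made-up observations; the hypothesized distribution is standard normal.
data = [0.1, -0.3, 0.25, 0.05, -0.15, 0.2]
p = monte_carlo_p_value(abs_mean(data), abs_mean,
                        lambda rng: rng.gauss(0.0, 1.0), len(data))
```

Here the large p-value simply reflects that the made-up data are entirely consistent with the hypothesized distribution.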

## Kolmogorov-Smirnov Test

### Definition

This test is used to decide whether a sample comes from a hypothesized continuous distribution. It is based on the empirical cumulative distribution function (ECDF). Assume that we have a random sample *x*_{1}, ... , *x*_{n} from some continuous distribution with CDF *F(x)*. The empirical CDF is denoted by

F_{n}(x) = (1/n) · [number of observations ≤ x].

The Kolmogorov-Smirnov statistic (D) is based on the largest vertical difference between *F(x)* and *F_{n}(x)*. It is defined as

D = max_{1 ≤ i ≤ n} ( F(x_{i}) − (i − 1)/n , i/n − F(x_{i}) ).

H_{0}: The data follow the specified distribution.

H_{A}: The data do not follow the specified distribution.

The hypothesis regarding the distributional form is rejected at the chosen significance level (*alpha*) if the test statistic, D, is greater than the critical value obtained from a table.
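To make the definition concrete, here is a minimal Python sketch (not part of the original article) that computes D for a made-up sample against a standard normal hypothesis; the sample values and the choice of `normal_cdf` are illustrative assumptions.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of the normal distribution, written with math.erf.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    # D = max over i of max( F(x_(i)) - (i-1)/n , i/n - F(x_(i)) ),
    # where x_(1) <= ... <= x_(n) is the sorted sample.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, f - (i - 1) / n, i / n - f)
    return d

# Made-up sample, tested against a standard normal hypothesis.
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
d = ks_statistic(sample, normal_cdf)
```

In practice D would then be compared against the tabulated critical value for the chosen significance level and sample size.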

## Anderson-Darling Test

### Definition

The Anderson-Darling procedure is a general test to compare the fit of an observed cumulative distribution function to an expected cumulative distribution function. This test gives more weight to the tails than the Kolmogorov-Smirnov test.

The Anderson-Darling statistic (A^{2}) is defined as

A^{2} = −n − (1/n) · Σ_{i=1}^{n} (2i − 1) · [ ln F(x_{i}) + ln(1 − F(x_{n−i+1})) ].

H_{0}: The data follow the specified distribution.

H_{A}: The data do not follow the specified distribution.

The hypothesis regarding the distributional form is rejected at the chosen significance level (*alpha*) if the test statistic, A^{2}, is greater than the critical value obtained from a table.
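A direct transcription of this formula into Python might look as follows; the made-up sample and the standard normal `normal_cdf` are illustrative assumptions, and the critical value would still come from a table for the distribution under test.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of the normal distribution, written with math.erf.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ad_statistic(sample, cdf):
    # A^2 = -n - (1/n) * sum_{i=1}^{n} (2i - 1) *
    #       [ ln F(x_(i)) + ln(1 - F(x_(n-i+1))) ]
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(cdf(xs[i - 1]))
                                + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

# Made-up sample, tested against a standard normal hypothesis.
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
a2 = ad_statistic(sample, normal_cdf)
```

Note how the logarithms of F near 0 and of 1 − F near 1 are what give the tails their extra weight.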

## Chi-Squared Test

### Definition

The Chi-Squared test is used to determine whether a sample comes from a population with a specific distribution. This test is applied to binned data, so the value of the test statistic depends on how the data are binned.

Although there is no optimal choice for the number of bins (k), there are several formulas which can be used to calculate this number based on the sample size (N). For example, EasyFit employs the empirical formula

k = 1 + log_{2}N.

The data can be grouped into intervals of *equal probability* or *equal width*. The first approach is generally preferable, since it handles peaked data much better. Each bin should contain at least 5 data points, so certain adjacent bins sometimes need to be joined together for this condition to be satisfied.

The Chi-Squared statistic is defined as

χ^{2} = Σ_{i=1}^{k} (O_{i} − E_{i})^{2} / E_{i},

where O_{i} is the observed frequency for bin i, and E_{i} is the expected frequency for bin i, calculated by

E_{i} = N · (F(x_{2}) − F(x_{1})),

where F is the CDF of the probability distribution being tested, and x_{1}, x_{2} are the limits for bin i.

H_{0}: The data follow the specified distribution.

H_{A}: The data do not follow the specified distribution.

The hypothesis regarding the distributional form is rejected at the chosen significance level (*alpha*) if the test statistic is greater than the critical value

χ^{2}_{1−alpha, k−1},

meaning the Chi-Squared inverse CDF evaluated at 1 − alpha with k − 1 degrees of freedom. (If distribution parameters are estimated from the data, the degrees of freedom are reduced further by the number of estimated parameters.)
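Putting the pieces together, here is a minimal sketch assuming a uniform(0, 1) hypothesis and a made-up 20-point sample (for a uniform model, equal-probability bins are simply equal-width bins, and each bin's expected count is 5, satisfying the rule above). The critical value 7.815 is the standard table entry for alpha = 0.05 with 3 degrees of freedom.

```python
import bisect

def chi_squared_statistic(sample, cdf, edges):
    """Chi-Squared = sum_i (O_i - E_i)^2 / E_i, with expected counts
    E_i = N * (F(upper_i) - F(lower_i)) taken from the hypothesized CDF."""
    n = len(sample)
    k = len(edges) - 1
    observed = [0] * k
    for x in sample:
        i = min(bisect.bisect_right(edges, x) - 1, k - 1)
        observed[i] += 1
    stat = 0.0
    for i in range(k):
        expected = n * (cdf(edges[i + 1]) - cdf(edges[i]))
        stat += (observed[i] - expected) ** 2 / expected
    return stat

# Made-up sample, tested against a uniform(0, 1) hypothesis.
sample = [0.02, 0.05, 0.10, 0.15, 0.20, 0.24,
          0.30, 0.35, 0.40, 0.45,
          0.52, 0.58, 0.60, 0.65, 0.70,
          0.76, 0.80, 0.85, 0.90, 0.95]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]        # k = 4 equal-probability bins
stat = chi_squared_statistic(sample, lambda x: x, edges)

CRITICAL_95_DF3 = 7.815   # table value: alpha = 0.05, k - 1 = 3 d.o.f.
reject = stat > CRITICAL_95_DF3
```

Since the statistic falls well below the critical value, the uniform hypothesis would not be rejected for this sample.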

**See also:** EasyFit Help on goodness of fit tests