# How To Create And Use Histograms

## What Is a Histogram?

## Creating a Histogram

$$k = 1 + \log_2 N,$$

## Displaying Fitted Distributions On Top Of a Histogram

### How To Properly Display The Probability Density Function

$$\text{Area} = \sum_{i=1}^{k} W \cdot \frac{n_i}{N} = \frac{W}{N} \sum_{i=1}^{k} n_i = \frac{W}{N} \cdot N = W$$

## Conclusions

Histograms are perhaps the most popular and intuitive way
to display random data. They help you visualize the shape
of your data *before* any distributions are fitted, and afterwards
show how well a particular distribution fits your data.

**Contents**

What Is a Histogram?

Creating a Histogram

Displaying Fitted Distributions On Top Of a Histogram

A histogram is a graph that consists of a number of "bins", or vertical bars, into which the sample values are sorted.

The height of each histogram bar indicates how many of your data points
fall into that bin, relative to the total number of data values, so
this kind of chart is also called a *relative frequency histogram*.

Even though EasyFit automatically creates histograms from sample data, it is useful to understand how they are constructed.

The first step is to choose the number of bins, or classes, into which your data will be sorted. There are several ways to do this; one of the most commonly used methods (known as Sturges' rule) defines the number of bins based on the total number of observations:

$$k = 1 + \log_2 N,$$

where $N$ is the total number of data values, and $k$ is the resulting number of bins. If this formula gives a non-integer $k$, round it to the nearest integer. When using EasyFit, you can either have the number of bins calculated automatically, or specify it manually through the Options|Graph menu.
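As a quick illustration of the formula, here is a minimal Python sketch. The function name `sturges_bins` is made up for this example and is not part of EasyFit; rounding to the nearest integer follows the advice above.

```python
import math


def sturges_bins(n):
    """Number of bins for n observations: k = 1 + log2(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("need at least one observation")
    return round(1 + math.log2(n))


# Bin counts for a few sample sizes.
for n in (50, 100, 1000):
    print(n, sturges_bins(n))
```

Because the bin count grows with the logarithm of the sample size, going from 100 to 1000 observations adds only a few bins.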

The next step is to divide the entire range of your data from $x_{\min}$ to
$x_{\max}$ into $k$ intervals of equal width, and count how many
values fall into each interval. Finally, the height of each bar is
calculated as the number of data points falling into that interval, divided
by the total number of observations.
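The binning procedure just described can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not EasyFit's actual code: the function name and the small data set are hypothetical, and the last bin is treated as closed on the right so that the maximum value is counted.

```python
def relative_frequency_histogram(data, k):
    """Return (edges, heights) for k equal-width bins spanning [min(data), max(data)]."""
    x_min, x_max = min(data), max(data)
    width = (x_max - x_min) / k
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * k
    for x in data:
        # Clamp the index so that x_max lands in the last bin.
        i = min(int((x - x_min) / width), k - 1)
        counts[i] += 1
    n = len(data)
    edges = [x_min + i * width for i in range(k + 1)]
    heights = [c / n for c in counts]  # each count divided by the total
    return edges, heights


# Hypothetical sample, chosen only to keep the arithmetic easy to follow.
data = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
edges, heights = relative_frequency_histogram(data, 4)
print(edges)
print(heights)
```

Since every height is a share of the total, the heights always add up to 1 regardless of the number of bins.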

Note that the resulting bars *must be adjacent*: there must be no space
between neighboring bars. Histograms are frequently confused with "bar charts",
which are used to display categorical data. A bar chart can have non-numerical
values on the x-axis, so the distance between its bars, as well as their
particular order, is not really important. Neither is true of a histogram.

Aside from using histograms for initial visual analysis, you can use them to compare the fitted distributions to your sample data, and possibly select the best fitting model, or at least reject the distributions that don't fit your data very well. To do this, you should plot the fitted Probability Density Functions, or PDFs, on top of your histogram.

For instance, the graph on the right indicates that the Gumbel distribution fits the data set much better than the Normal distribution.

One source of confusion is the fact that the PDF, like any regular function with fixed parameters, has a constant shape, while the appearance of a histogram can change depending on the number of bins. Using a larger number of bins can make your histogram more detailed, but it also lowers the bars, because each bin then holds a smaller share of the data (note the y-axis values):
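The same effect can be checked numerically. The sketch below uses a hypothetical, evenly spread sample and a made-up helper `bar_heights`; it compares the tallest bar for 5 bins and for 20 bins.

```python
def bar_heights(data, k):
    """Relative-frequency bar heights for k equal-width bins over [min(data), max(data)]."""
    x_min, x_max = min(data), max(data)
    width = (x_max - x_min) / k
    counts = [0] * k
    for x in data:
        # Clamp the index so that the maximum value lands in the last bin.
        counts[min(int((x - x_min) / width), k - 1)] += 1
    return [c / len(data) for c in counts]


data = list(range(100))        # hypothetical sample: the integers 0..99
coarse = bar_heights(data, 5)  # few, wide bins
fine = bar_heights(data, 20)   # many, narrow bins
print(max(coarse), max(fine))
```

For this sample the tallest bar drops from 0.2 to 0.05 when the number of bins is quadrupled, while the heights still sum to 1 in both cases.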

To correctly display the PDF on top of a histogram, it must be
scaled depending on the number of bins. Assuming that $W$ is the
bin width, and $n_i$ ($i = 1, \dots, k$) is the number of data values
falling into each bin, we can calculate the total area of the
histogram:

$$\text{Area} = \sum_{i=1}^{k} W \cdot \frac{n_i}{N} = \frac{W}{N} \sum_{i=1}^{k} n_i = \frac{W}{N} \cdot N = W$$

However, by definition the area under the Probability Density Function graph must equal 1, whereas the area of the histogram equals the bin width $W$. The theoretical $\mathrm{PDF}(x)$ values therefore have to be multiplied by $W$ to match the histogram. Of course, this new $W \cdot \mathrm{PDF}(x)$ function is no longer the "real" PDF, but it has the same shape, which is what matters when comparing it against the histogram.
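A minimal Python sketch of this scaling follows. It assumes a synthetic Normal sample and uses the standard library's `statistics.NormalDist` as the fitted model; the sample, the seed, and the choice of 11 bins are all illustrative and have nothing to do with how EasyFit works internally.

```python
import random
from statistics import NormalDist

random.seed(0)
# Hypothetical sample: 1000 draws from a Normal(10, 2) distribution.
sample = [random.gauss(10.0, 2.0) for _ in range(1000)]

k = 11  # number of bins, chosen for illustration
x_min, x_max = min(sample), max(sample)
w = (x_max - x_min) / k  # bin width W

# Relative-frequency bar heights.
counts = [0] * k
for x in sample:
    counts[min(int((x - x_min) / w), k - 1)] += 1
heights = [c / len(sample) for c in counts]

# Fit a Normal distribution and evaluate W * PDF(x) at each bin center.
fitted = NormalDist.from_samples(sample)
centers = [x_min + (i + 0.5) * w for i in range(k)]
scaled_pdf = [w * fitted.pdf(c) for c in centers]

# The histogram's total area equals the bin width, not 1.
hist_area = sum(w * h for h in heights)
print(f"bin width = {w:.4f}, histogram area = {hist_area:.4f}")

# Columns: bin center, bar height, scaled PDF value.
for c, h, p in zip(centers, heights, scaled_pdf):
    print(f"{c:7.2f}  {h:6.3f}  {p:6.3f}")
```

The last two columns should track each other closely, whereas the unscaled `fitted.pdf(c)` values would sit on a different vertical scale whenever the bin width differs from 1.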

EasyFit automatically scales the density curve based on the number of histogram bins, allowing you to visually identify the distributions that fit your data well. If you need to see the original unscaled PDF graph of a fitted distribution, you can use StatAssist, the built-in distribution viewer tool.

Histograms are widely used for random data visualization and analysis. However, extra care must be taken not to confuse histograms with bar charts, and to properly scale the Probability Density Function graphs when displaying them on top of histograms. With EasyFit, you can easily create histograms with a variable number of bins, and overlay one or several fitted distributions to compare them against your sample data.