Distribution Fitting - How To Specify Your Input Data


Correctly specifying your input data is essential for successful analysis and decision making. This article discusses several important aspects to consider when specifying your data in the distribution fitting software EasyFit.

How the Data Is Stored in EasyFit

EasyFit provides an Excel-like data table that lets you enter your data manually and perform common operations such as copying and pasting cells and inserting and deleting rows.

If your project requires you to analyze several data sets, you can store them in separate tables. All data tables, along with the analysis results, can be saved in a single project file by selecting File|Save from the main menu.

Importing Your Data into EasyFit

There are several ways to bring your data into EasyFit. If your data is stored in a file, you can use the File|Open menu option to import it in one of the following formats:

Text Files (TXT)
Comma Separated Values (CSV)
Excel Workbooks (XLS)

Sometimes you need to import only a subset of your data. After you select the file containing your data, EasyFit displays the Import Options dialog, where you can specify the range of rows to import.

The Import Options dialog also accommodates variations in file format. Although CSV files are supposed to contain comma-separated values, some files use other separators, such as spaces or semicolons; you can specify these on the Field Delimiters tab. Click the Update button to preview how the imported data will look, and click OK when you are done.
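The row-range and delimiter options described above can be mimicked outside EasyFit, for example to check what a file contains before importing it. The following is a minimal Python sketch using the standard `csv` module. The helper `read_column` and its parameters (`first_row`, `last_row`, `column`) are hypothetical names for illustration, not part of EasyFit, and the sketch assumes the selected column holds plain numbers.

```python
import csv
import io


def read_column(text, delimiter=",", first_row=1, last_row=None, column=0):
    """Return one numeric column from delimited text, keeping only rows
    first_row..last_row (1-based, inclusive).

    Hypothetical helper that loosely mirrors a row-range import option;
    it is not an EasyFit API.
    """
    # skipinitialspace=True ignores spaces that directly follow a delimiter.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        skipinitialspace=True)
    values = []
    for row_number, row in enumerate(reader, start=1):
        if row_number < first_row:
            continue  # rows before the selected range, e.g. a header
        if last_row is not None and row_number > last_row:
            break  # past the end of the selected range
        if not row or not row[column].strip():
            continue  # skip blank lines and empty cells
        values.append(float(row[column]))
    return values


# A "CSV" file that actually uses semicolons and starts with a header row.
sample = "value;weight\n1.5;10\n2.25;11\n3.0;12\n"
print(read_column(sample, delimiter=";", first_row=2))  # [1.5, 2.25, 3.0]
```

Setting `first_row=2` skips the header line, much as restricting the imported row range would.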


Specifying the Input Data

Selecting Analyze|Fit Distributions from the main menu opens the Input Data dialog, where you specify the input data columns.

The Input Data dialog can also be used to indicate whether your data is continuous or discrete, and whether you want to analyze the entire data set or a selected subset of your data.

Data Format

The format of your data usually depends on how the data was collected. The most commonly used format is an unordered set of values obtained by observing some random process. The term unordered means that the order of values in your data set is not important. If your continuous or discrete data comes in this format, which is usually the case, you should specify a single data column when analyzing your data using EasyFit.

Another continuous data format supported by EasyFit is a set of X, DENSITY(X) data pairs, where DENSITY(X) is the value of the Probability Density Function calculated at point X. In this case, you need to specify two input data columns.
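To make the two-column format concrete, the sketch below computes X, DENSITY(X) pairs for an exponential distribution, whose PDF is rate * exp(-rate * x) for x >= 0. The choice of the exponential distribution, the rate of 0.5, and the helper name `exponential_density_pairs` are illustrative assumptions; the pairs are printed as comma-separated lines, one pair per row.

```python
import math


def exponential_density_pairs(rate, xs):
    """Build (X, DENSITY(X)) pairs, where DENSITY(X) is the exponential
    PDF rate * exp(-rate * x) evaluated at each point x >= 0."""
    return [(x, rate * math.exp(-rate * x)) for x in xs]


# One X,DENSITY(X) pair per line, comma-separated.
for x, density in exponential_density_pairs(rate=0.5, xs=[0.0, 1.0, 2.0, 4.0]):
    print(f"{x},{density:.4f}")
```

Each printed line holds a point X and the PDF value at that point, i.e. the two columns you would specify as input data in this format.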

If you are dealing with discrete data and your sample contains repeated observations, for example:

5, 5, 5, 3, 3, 3, 3, 2

you can specify your data in the form of X, COUNT(X) pairs:

(X=5, COUNT=3), (X=3, COUNT=4), (X=2, COUNT=1)

This means that each X value was observed COUNT(X) times. This is also a two-column input data format.
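Converting a raw discrete sample into X, COUNT(X) pairs amounts to counting how many times each distinct value occurs. A minimal Python sketch using `collections.Counter` (the helper names `to_count_pairs` and `from_count_pairs` are hypothetical):

```python
from collections import Counter


def to_count_pairs(sample):
    """Collapse a discrete sample into (X, COUNT(X)) pairs, one per distinct value."""
    counts = Counter(sample)
    # Sort by X only so the two-column table is easy to read.
    return sorted(counts.items())


def from_count_pairs(pairs):
    """Expand (X, COUNT(X)) pairs back into the full list of observations."""
    return [x for x, count in pairs for _ in range(count)]


sample = [5, 5, 5, 3, 3, 3, 3, 2]
pairs = to_count_pairs(sample)
print(pairs)  # [(2, 1), (3, 4), (5, 3)]
```

`from_count_pairs` goes the other way, which is a quick way to confirm that no observations were lost in the conversion.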

Sample Size

It is recommended that you have at least 75-100 data points available to perform distribution fitting and get reliable results. Although the more data you have, the better, analyzing samples of 10,000 data points or more can take a significant amount of time.
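When data is entered as X, COUNT(X) pairs, the number of rows is not the sample size: each row stands for COUNT(X) observations. The sketch below, with hypothetical helper names, totals the observations for either format and compares the result with the lower end of the 75-100 guideline. Treating that guideline as a count of observations rather than of rows is an assumption.

```python
MIN_RECOMMENDED = 75  # lower end of the 75-100 guideline (assumed to count observations)


def observation_count(values=None, count_pairs=None):
    """Total number of observations in a single-column sample
    or in a list of (X, COUNT(X)) pairs."""
    if count_pairs is not None:
        # Each (X, COUNT(X)) row represents COUNT(X) observations.
        return sum(count for _, count in count_pairs)
    if values is None:
        raise ValueError("provide either values or count_pairs")
    return len(values)


def is_large_enough(n, minimum=MIN_RECOMMENDED):
    """True if n meets the assumed minimum sample size."""
    return n >= minimum


pairs = [(2, 40), (3, 25), (5, 15)]  # only three rows...
n = observation_count(count_pairs=pairs)
print(n, is_large_enough(n))  # 80 True
```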

Click OK to start the distribution fitting process.
