Correctly specifying your input data is essential for successful analysis and decision making. This article discusses several important aspects to consider when specifying your data in the distribution fitting software EasyFit.
EasyFit provides an Excel-like data table that lets you enter your data manually and perform common manipulations such as copying and pasting cells and inserting and deleting rows.
If your project requires you to analyze several data sets, you can store them in separate tables. All data tables along with the analysis results can be saved in a single project file by selecting File|Save from the main menu.
There are several ways to bring your data into EasyFit. If the data is stored in a file, you can use the File|Open menu option, which lets you import your data in one of the following formats:
Sometimes you need to select a subset of your data when importing it into EasyFit. After you specify the file containing your data, EasyFit displays a dialog in which you can indicate the range of rows to import.
The Import Options dialog also allows for some variations in file formats: even though CSV files are supposed to contain comma-separated values, some files use other separators such as spaces or semicolons. These options can be specified on the Field Delimiters tab. Click the Update button to preview what the imported data will look like. Once you are done, click OK.
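To make the two import options concrete (a row range and a field delimiter), here is a minimal Python sketch of the same idea. It is not EasyFit code: the function name import_rows and its parameters are hypothetical, and the sample text is made up for illustration.

```python
import csv
import io


def import_rows(text, delimiter=",", first_row=1, last_row=None):
    """Parse delimited text and keep rows first_row..last_row.

    Row numbers are 1-based and inclusive, as they would appear in a
    spreadsheet-style dialog. All names here are illustrative only.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        skipinitialspace=True)
    rows = [row for row in reader if row]      # skip blank lines
    selected = rows[first_row - 1:last_row]    # convert to a 0-based slice
    return [[float(cell) for cell in row] for row in selected]


# A "CSV" file that actually uses semicolons as separators.
sample = "1.5;2.0\n3.1;4.2\n5.0;6.3\n7.7;8.8\n"

# Import only rows 2 through 3.
imported = import_rows(sample, delimiter=";", first_row=2, last_row=3)
print(imported)
```

Reading the same text with the default comma delimiter would fail to split the fields, which is why previewing the result before confirming the import is worthwhile.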
Selecting Analyze|Fit Distributions from the main menu causes EasyFit to display a dialog in which you can specify the input data columns.
The Input Data dialog can also be used to indicate whether your data is continuous or discrete, and whether you want to analyze the entire data set or a selected subset of your data.
The format of your data usually depends on how the data was collected. The most commonly used format is an unordered set of values obtained by observing some random process. The term unordered means that the order of values in your data set is not important. If your continuous or discrete data comes in this format, which is usually the case, you should specify a single data column when analyzing your data using EasyFit.
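As a small illustration of what "unordered" means in practice, the following Python sketch shows that shuffling a single column of observations leaves its summary statistics and its sorted values unchanged. The sample values are invented for the example, and the sketch is independent of EasyFit.

```python
import math
import random
import statistics

# A single column of observations (made-up values).
observations = [4.1, 2.7, 3.3, 5.0, 2.7, 3.9]

# The same values in a different order (fixed seed for repeatability).
shuffled = observations[:]
random.Random(0).shuffle(shuffled)


def summary(sample):
    """Return the mean, sample standard deviation, and sorted values."""
    return statistics.mean(sample), statistics.stdev(sample), sorted(sample)


mean_a, sd_a, sorted_a = summary(observations)
mean_b, sd_b, sorted_b = summary(shuffled)

# Reordering changes none of the quantities a fit would be based on.
same = (math.isclose(mean_a, mean_b)
        and math.isclose(sd_a, sd_b)
        and sorted_a == sorted_b)
print(same)
```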
Another continuous data format supported by EasyFit is a set of X, DENSITY(X) data pairs, where DENSITY(X) is the value of the Probability Density Function calculated at point X. In this case, you need to specify two input data columns.
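The following Python sketch shows what a two-column X, DENSITY(X) data set might look like. It assumes, purely for illustration, that the density values come from a standard normal distribution evaluated on an evenly spaced grid; the helper names are hypothetical. As a sanity check, the area under the tabulated density (trapezoid rule) should be close to 1 when the grid covers most of the distribution.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density function of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Column 1: X values from -4 to 4 in steps of 0.1.
xs = [-4.0 + 0.1 * i for i in range(81)]

# Column 2: DENSITY(X) evaluated at each X.
pairs = [(x, normal_pdf(x)) for x in xs]


def trapezoid_area(points):
    """Approximate the area under tabulated (x, density) points."""
    return sum(0.5 * (d0 + d1) * (x1 - x0)
               for (x0, d0), (x1, d1) in zip(points, points[1:]))


area = trapezoid_area(pairs)
print(len(pairs), round(area, 3))
```

An area far from 1 would suggest that the second column does not hold density values, or that the X range is too narrow to describe the distribution.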
If you are dealing with discrete data and there are some equal observations in your sample, for example:
For reliable distribution fitting results, it is recommended that you have at least 75-100 data points. More data is generally better, but analyzing samples of 10,000 data points or more can take a significant amount of time.
Click OK to start the distribution fitting process.