Distribution fitting is the procedure of selecting the statistical distribution that best fits a data set generated by some random process. In other words, if you have some random data and would like to know which particular distribution can be used to describe it, then distribution fitting is what you are looking for.
Random factors affect all areas of our life, and businesses striving to succeed in today's highly competitive environment need a tool to deal with the risk and uncertainty involved. Using probability distributions is a scientific way of dealing with uncertainty and making informed business decisions.
In practice, probability distributions are applied in such diverse fields as actuarial science and insurance, risk analysis, investment, market research, business and economic research, customer support, mining, reliability engineering, chemical engineering, hydrology, image processing, physics, medicine, sociology, demography etc.
Probability distributions can be viewed as a tool for dealing with uncertainty: you use distributions to perform specific calculations, and apply the results to make well-grounded business decisions. However, if you use the wrong tool, you will get wrong results. If you select and apply an inappropriate distribution (one that does not fit your data well), your subsequent calculations will be incorrect, and that is likely to lead to wrong decisions.
In many industries, the use of incorrect models can have serious consequences, such as failure to complete tasks or projects on time, leading to substantial loss of time and money, or a wrong engineering design resulting in damage to expensive equipment. In some specific areas such as hydrology, using appropriate distributions can be even more critical.
Distribution fitting allows you to develop valid models of random processes you deal with, protecting you from potential time and money loss which can arise due to invalid model selection, and enabling you to make better business decisions.
The Normal distribution was developed more than 250 years ago, and is probably one of the oldest and most frequently used distributions. So why not just use it?
The probability density function of the Normal distribution is symmetric about its mean value, so this distribution cannot be used to model right-skewed or left-skewed data.
The Normal distribution is defined on the entire real axis (-Infinity, +Infinity). If the nature of your data is such that it is bounded or non-negative (for example, it cannot take negative values), then this distribution is almost certainly not a good fit.
The shape of the Normal distribution does not depend on the distribution parameters, which only shift and scale the curve. Even if your data is symmetric by nature, it may be best described by one of the heavy-tailed models such as the Cauchy distribution.
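Two of these limitations can be checked with a few lines of arithmetic. The sketch below uses only the Python standard library. The parameter values (a mean and standard deviation of 500 hours, a tail threshold of 5 scale units) are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical Normal fit to failure times (hours). Failure times cannot be
# negative, yet the fitted Normal model still assigns probability below zero.
fitted_normal = NormalDist(mu=500.0, sigma=500.0)
p_negative = fitted_normal.cdf(0.0)

def cauchy_cdf(x, location=0.0, scale=1.0):
    """CDF of the Cauchy distribution."""
    return 0.5 + math.atan((x - location) / scale) / math.pi

# Probability of landing more than 5 scale units away from the center.
normal_tail = 2.0 * (1.0 - NormalDist().cdf(5.0))
cauchy_tail = 2.0 * (1.0 - cauchy_cdf(5.0))

print(f"P(X < 0) under the Normal fit: {p_negative:.3f}")
print(f"P(|X| > 5), standard Normal:   {normal_tail:.2e}")
print(f"P(|X| > 5), standard Cauchy:   {cauchy_tail:.3f}")
```

Both models are symmetric, but the Cauchy model treats extreme values as far more likely than the Normal model does, which matters whenever rare, large deviations drive the decision.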
Similarly, you cannot "just guess" and use any other particular distribution without testing several alternative models as this can result in analysis errors.
Over the last several centuries, numerous probability distributions have been developed to address the data analysis needs in various industries, and a number of statistical methods exist to assist you in selecting the best fitting distribution.
In most cases, you need to fit two or more distributions, compare the results, and select the most valid model. The "candidate" distributions you fit should be chosen depending on the nature of your data. For example, if you need to analyze the time between failures of technical devices, you should fit non-negative distributions such as Exponential or Weibull, since the failure time cannot be negative.
You can also apply some other identification methods based on properties of your data. For example, you can build a histogram and determine whether the data are symmetric, left-skewed, or right-skewed, and use the distributions which have the same shape.
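The histogram check described above can be sketched in plain Python. The data sets here are simulated and hypothetical, and the helper names `sample_skewness` and `text_histogram` are made up for this illustration. Sample skewness near zero suggests symmetric data, a clearly positive value suggests right-skewed data, and a clearly negative value suggests left-skewed data.

```python
import random

def sample_skewness(data):
    """Biased sample skewness: third central moment divided by sd cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def text_histogram(data, bins=8, width=40):
    """Return bin counts and a list of text lines drawing the histogram."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value falls on the right edge; keep it in the last bin.
        idx = min(int((x - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = [
        f"{lo + i * step:9.2f} | " + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    ]
    return counts, lines

# Simulated, hypothetical samples: one symmetric, one right-skewed.
rng = random.Random(0)
symmetric = [rng.gauss(10.0, 2.0) for _ in range(2000)]
right_skewed = [rng.expovariate(0.5) for _ in range(2000)]

for name, sample in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    counts, lines = text_histogram(sample)
    print(f"{name}: skewness = {sample_skewness(sample):.2f}")
    print("\n".join(lines))
```

A right-skewed result would point toward candidates such as Exponential or Weibull rather than a symmetric model.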
To actually fit the "candidate" distributions you selected, you need to employ statistical methods that estimate the distribution parameters from your sample data. Solving this problem involves the use of certain algorithms implemented in specialized software.
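As a minimal sketch of parameter estimation, the example below uses maximum likelihood, one common estimation method (the text does not say which method any particular software uses). For the Exponential and Normal distributions the maximum likelihood estimates have closed forms; many other distributions, Weibull included, require numerical optimization. The sample is simulated and the function names are hypothetical.

```python
import math
import random

def fit_exponential(data):
    """Maximum likelihood estimate of the Exponential rate: n / sum(x)."""
    return len(data) / sum(data)

def fit_normal(data):
    """Maximum likelihood estimates of the Normal mean and standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

# Simulated times between failures (hours) with a true mean of 500.
rng = random.Random(42)
times = [rng.expovariate(1.0 / 500.0) for _ in range(5000)]

rate = fit_exponential(times)
mu, sigma = fit_normal(times)

print(f"Exponential fit: rate = {rate:.5f} per hour (mean {1.0 / rate:.1f} h)")
print(f"Normal fit:      mu = {mu:.1f} h, sigma = {sigma:.1f} h")
```

Both fits return parameters without complaint, which is why a separate step is needed to judge which fitted model actually describes the data.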
After the distributions are fitted, it is necessary to determine how well each of them fits your data. This can be done using goodness-of-fit tests, or visually by comparing the empirical (based on sample data) and theoretical (fitted) distribution graphs. As a result, you will select the most valid model describing your data.
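One widely used goodness-of-fit measure is the Kolmogorov-Smirnov statistic: the largest vertical distance between the empirical distribution function and the fitted one. The sketch below computes it by hand for two candidates fitted to the same simulated sample. The data and the function names are hypothetical. Note that standard Kolmogorov-Smirnov critical values assume the parameters were not estimated from the same sample, so the statistic is used here only to rank the candidates.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Simulated times between failures (hours).
rng = random.Random(7)
times = [rng.expovariate(1.0 / 500.0) for _ in range(1000)]
n = len(times)

# Candidate 1: Exponential, rate estimated by maximum likelihood.
rate = n / sum(times)

def expon_cdf(x):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# Candidate 2: Normal, mean and standard deviation estimated from the sample.
mu = sum(times) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in times) / n)
normal = NormalDist(mu, sigma)

d_expon = ks_statistic(times, expon_cdf)
d_normal = ks_statistic(times, normal.cdf)

print(f"KS statistic, Exponential: {d_expon:.3f}")
print(f"KS statistic, Normal:      {d_normal:.3f}")
print("Better fit:", "Exponential" if d_expon < d_normal else "Normal")
```

The smaller statistic indicates the fitted curve that stays closer to the data over its whole range.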
What kind of information can you obtain using the distribution you selected, and how can you apply that information to make business decisions?
Calculating probabilities is the most common way of using distributions. Once you calculate the probability, you can use it to make informed decisions: for instance, if the probability of a good outcome is high enough, then the decision you are about to make is probably correct.
Probabilities of this kind provide typical answers to business questions in various industries.
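As one illustration, once a distribution is fitted, a probability is just a difference of values of its cumulative distribution function (CDF). The sketch below assumes a Weibull model of device lifetime with hypothetical parameters (shape 1.5, scale 1000 hours); the numbers are not taken from any real data set.

```python
import math

def weibull_cdf(x, shape, scale):
    """P(X <= x) for a Weibull distribution; zero for non-positive x."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

# Hypothetical fitted parameters for device lifetime, in hours.
SHAPE, SCALE_HOURS = 1.5, 1000.0

# Probability that a device fails within the first 500 hours.
p_early = weibull_cdf(500.0, SHAPE, SCALE_HOURS)

# Probability that a device fails between 500 and 1500 hours.
p_window = weibull_cdf(1500.0, SHAPE, SCALE_HOURS) - p_early

# Probability that a device is still working after 1500 hours.
p_survive = 1.0 - weibull_cdf(1500.0, SHAPE, SCALE_HOURS)

print(f"P(failure within 500 h)          = {p_early:.3f}")
print(f"P(failure between 500 and 1500 h) = {p_window:.3f}")
print(f"P(still working after 1500 h)     = {p_survive:.3f}")
```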
Making estimates or projections is an inverse problem requiring you to set a fixed, desired probability value.
For example, in project management, assuming that a project should be finished on time and on budget with 95% probability, you can obtain a realistic time estimate which takes into account good or bad weather, timely supply of materials, oil prices, strikes, and other factors affecting your business.
Another example: you are an engineer, and need to determine an appropriate warranty term for the device you are designing. You would like to ensure that the device will not fail during the warranty term with 99% probability (i.e. on average, no more than 1 out of 100 devices will fail within the term). Based on the fixed probability, you can calculate how many hours the device can work properly, and make your design decisions using this estimate.
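Both examples amount to evaluating the inverse of the cumulative distribution function (the quantile function) at the chosen probability. The sketch below assumes a Weibull lifetime model and a Normal model of project duration. All parameter values are hypothetical and serve only to show the calculation.

```python
import math
from statistics import NormalDist

def weibull_survival(t, shape, scale):
    """Probability that a Weibull-distributed lifetime exceeds t."""
    return math.exp(-((t / scale) ** shape))

def weibull_time_for_survival(p, shape, scale):
    """Time t at which the survival probability equals p (inverse of the above)."""
    return scale * (-math.log(p)) ** (1.0 / shape)

# Warranty term: 99% of devices should still be working when it ends.
SHAPE, SCALE_HOURS = 1.5, 1000.0  # hypothetical fitted parameters
warranty_hours = weibull_time_for_survival(0.99, SHAPE, SCALE_HOURS)

# Project estimate: duration that will not be exceeded with 95% probability.
duration = NormalDist(mu=120.0, sigma=15.0)  # hypothetical, in days
deadline_days = duration.inv_cdf(0.95)

print(f"Warranty term with 99% survival: {warranty_hours:.1f} hours")
print(f"Duration met with 95% probability: {deadline_days:.1f} days")
```

Plugging the result back into the survival function is a quick sanity check: `weibull_survival(warranty_hours, SHAPE, SCALE_HOURS)` should return the 0.99 that was requested.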
The most frequently used statistic is the distribution mean (the expected value), which represents the average outcome you can expect when a large number of observations is considered. You can use the mean value to take a quick look at your data, but you should not base your decisions on this statistic alone.
The standard deviation indicates the spread of your data about the mean, and one of the most obvious applications of this statistic is in finance and investment, where it is used to determine the volatility, as well as to quantify the risk associated with a given security or a portfolio of securities.
Another useful statistic is the mode value which indicates the most likely outcome. For instance, in project management, this statistic is quite frequently used to determine the most likely amount of time required for successful project completion.
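For a skewed distribution these statistics can differ noticeably, which is one reason not to rely on the mean alone. The sketch below uses the closed-form expressions for a Lognormal distribution with hypothetical parameters, interpreted here as a project duration in days.

```python
import math

# Hypothetical Lognormal model of project duration (days):
# the logarithm of the duration is Normal with these parameters.
MU, SIGMA = 3.0, 0.5

mean = math.exp(MU + SIGMA ** 2 / 2.0)            # expected duration
median = math.exp(MU)                             # 50% of projects finish sooner
mode = math.exp(MU - SIGMA ** 2)                  # most likely duration
std_dev = mean * math.sqrt(math.exp(SIGMA ** 2) - 1.0)

print(f"mode   = {mode:.1f} days")
print(f"median = {median:.1f} days")
print(f"mean   = {mean:.1f} days")
print(f"standard deviation = {std_dev:.1f} days")
```

Because the distribution is right-skewed, the most likely duration is shorter than the average one, so a plan built on the mode alone would tend to be optimistic.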
Probability distributions have a large number of further applications in specific industries.
The use of probability distributions involves complex calculations which are very hard and time-consuming, if not practically impossible, to do by hand. Distribution fitting software helps you automate the data analysis and decision-making process, and enables you to focus on your core business goals rather than technical issues.
You can spend literally hours trying to fit a single distribution (not to mention several alternative models) to a data set using manual methods. Distribution fitting software enables you to fit a large number of distributions in seconds and compare the fitted distributions to select the best model.
Distribution fitting software helps prevent analysis errors and helps you make informed decisions based on the data available, protecting you from potential money loss.
Specialized distribution fitting software not only performs all calculations for you, but also provides an integrated environment making you more productive.
EasyFit is a software product that allows you to easily and quickly select the distribution which best fits your data, and apply the results to make the right business decisions. You can use EasyFit even if you have only a basic knowledge of statistics.