Distribution Fitting — Chemical Yield Analysis

Introduction

A balanced chemical equation gives the theoretical yield of a reaction. In practice, the actual yield is often lower than the theoretical yield for many reasons (incomplete reactions, etc.), so the percentage yield, which measures how successful a reaction has been, is normally used instead.
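For reference, the standard definition of percentage yield relates the two quantities as follows:

```latex
\text{percentage yield} = \frac{\text{actual yield}}{\text{theoretical yield}} \times 100\%
```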

Problem

A chemical process is producing batches of a desired material. The goal is to calculate the probability of a batch having yield greater than 70%, based on a data set of 200 observed values of the percentage yield.

Solution

To solve the problem, we will use the given data set to develop a model of the underlying random process. Specifically, we will first visualize the data set and choose some probability distributions that could serve as the probabilistic model for the percentage yield X. Then, we will fit the chosen distributions and select the one that best fits our data set. Finally, we will calculate the desired probability using the model we develop.

To get an impression of the properties of our data set, we use EasyFit to visualize the data as a histogram.
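Outside EasyFit, a comparable view can be sketched with a few lines of Python. This is only an illustration: the 200 observed yields are not reproduced in this article, so the sketch draws a synthetic stand-in sample (an assumption), and the helper name relative_frequencies and the choice of 10 bins are hypothetical.

```python
import random

# Synthetic stand-in for the 200 observed yields (the real data set is not
# reproduced here); drawn from the Lognormal model reported later in the text.
rng = random.Random(0)
yields = [rng.lognormvariate(4.27751, 0.02996) for _ in range(200)]

def relative_frequencies(data, n_bins):
    """Return (bin_edges, relative_frequencies) for equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value goes into the last bin rather than a new one.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [c / len(data) for c in counts]

edges, freqs = relative_frequencies(yields, n_bins=10)

# Print a rough text histogram: one '#' per percentage point of frequency.
for left, right, f in zip(edges, edges[1:], freqs):
    print(f"{left:6.2f} to {right:6.2f}  {f:.3f}  {'#' * round(f * 100)}")
```

The bar lengths give the relative frequency of each yield interval, which is the same information the histogram conveys.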

The histogram shows the range, relative frequency, and scatter of the data points. Since random variations of the percentage yield X result from a number of independent random sources in the chemical manufacturing process, and the combined effect of many independent sources tends toward a Normal distribution (the central limit theorem), we will use the Normal distribution as the probabilistic model. We will also fit the Lognormal and Gamma distributions, which can approximate the Normal distribution but are defined for positive values of X only.

Now that we have chosen three probability distributions that are likely to fit our data set well, we can estimate their parameters and select the most adequate model. EasyFit estimates distribution parameters automatically, so you only need to specify which distributions should be fitted. The parameter values estimated by EasyFit are shown in the following table:
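To make the estimation step more concrete, here is a minimal sketch of how such parameters could be estimated by hand. The article does not say which estimation method EasyFit uses, so the choices below are assumptions: maximum likelihood for the Normal and Lognormal distributions, and the method of moments for the Gamma distribution. The data is a synthetic stand-in, and the function names are hypothetical.

```python
import math
import random
import statistics

# Synthetic stand-in for the 200 observed yields (the real data set is not
# reproduced here); drawn from the Lognormal model reported later in the text.
rng = random.Random(0)
yields = [rng.lognormvariate(4.27751, 0.02996) for _ in range(200)]

def fit_normal(data):
    """Maximum-likelihood estimates (mu, sigma) of a Normal distribution."""
    return statistics.fmean(data), statistics.pstdev(data)

def fit_lognormal(data):
    """Maximum-likelihood estimates (mu, sigma) of a Lognormal distribution:
    the mean and standard deviation of the log-transformed data."""
    logs = [math.log(x) for x in data]
    return statistics.fmean(logs), statistics.pstdev(logs)

def fit_gamma_moments(data):
    """Method-of-moments estimates (shape, scale) of a Gamma distribution,
    using mean = shape * scale and variance = shape * scale**2."""
    mean = statistics.fmean(data)
    var = statistics.pvariance(data)
    return mean * mean / var, var / mean

n_mu, n_sigma = fit_normal(yields)
ln_mu, ln_sigma = fit_lognormal(yields)
g_shape, g_scale = fit_gamma_moments(yields)

print(f"Normal:    mu={n_mu:.5f}, sigma={n_sigma:.5f}")
print(f"Lognormal: mu={ln_mu:.5f}, sigma={ln_sigma:.5f}")
print(f"Gamma:     shape={g_shape:.3f}, scale={g_scale:.5f}")
```

Because the sample here is synthetic, the printed values will not match the EasyFit table exactly; the sketch only shows the form of the calculation.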

Estimated parameters (Gamma, Lognormal, Normal)

After the parameters are estimated, we need to measure how well the fitted probability distributions agree with the sample. First, we display the theoretical CDFs (cumulative distribution functions) of the fitted distributions on the same chart:

We can see that the graphs of the fitted distributions nearly coincide, meaning that the Lognormal and Gamma distributions approximate the Normal distribution very well here, and thus the three are practically interchangeable. However, we would like as much accuracy as possible, so we employ analytic goodness of fit (GOF) tests to determine the best fitting distribution. EasyFit automatically calculates the goodness of fit statistics for each fitted distribution and presents them in a table:

Fitted distributions are ordered according to goodness of fit statistics. Both Kolmogorov-Smirnov and Anderson-Darling tests show that the Lognormal distribution with parameters sigma=0.02996 and mu=4.27751 represents the best fit. Thus, we select this distribution as the probabilistic model for percentage yield X.
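As an illustration of what a goodness of fit statistic measures, the Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical CDF of the sample and the fitted theoretical CDF; a smaller value indicates a closer fit. The sketch below computes it for Normal and Lognormal fits. It rests on assumptions: the data is a synthetic stand-in for the real sample, the parameters are maximum-likelihood estimates, ks_statistic is a hypothetical helper, and the Gamma fit and the Anderson-Darling test are left out for brevity.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

# Synthetic stand-in for the 200 observed yields (the real data set is not
# reproduced here); drawn from the Lognormal model reported in the text.
rng = random.Random(0)
yields = [rng.lognormvariate(4.27751, 0.02996) for _ in range(200)]

def ks_statistic(data, cdf):
    """Largest vertical gap between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Fit a Normal model to the raw data and to the log-transformed data
# (the latter is the Lognormal fit).
normal = NormalDist(fmean(yields), pstdev(yields))
logs = [math.log(x) for x in yields]
log_normal = NormalDist(fmean(logs), pstdev(logs))

d_normal = ks_statistic(yields, normal.cdf)
d_lognormal = ks_statistic(yields, lambda x: log_normal.cdf(math.log(x)))

print(f"KS statistic, Normal:    {d_normal:.5f}")
print(f"KS statistic, Lognormal: {d_lognormal:.5f}")
```

With synthetic data the ranking of the two models need not match the EasyFit result; the point is only how the statistic is formed.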

Finally, we can use the developed model to estimate the probability of a batch having yield greater than 70%:

P{X>70} = 1 - P{X≤70} = 1 - F(70) = 1 - 0.166 = 0.834, or 83.4%.
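The calculation above can be checked independently of EasyFit. For a Lognormal distribution, F(x) equals the standard Normal CDF evaluated at (ln x - mu) / sigma, so the sketch below uses the parameters reported in the text with Python's standard library. The function name lognormal_cdf is a hypothetical helper introduced for illustration.

```python
import math
from statistics import NormalDist

# Lognormal parameters reported in the text for the best fitting model.
MU = 4.27751
SIGMA = 0.02996

def lognormal_cdf(x, mu, sigma):
    """CDF of a Lognormal distribution: ln(X) is Normal(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(math.log(x))

p_at_most_70 = lognormal_cdf(70.0, MU, SIGMA)   # F(70)
p_above_70 = 1.0 - p_at_most_70                 # P{X > 70}

print(f"F(70)     = {p_at_most_70:.3f}")
print(f"P(X > 70) = {p_above_70:.3f}")
```

The result agrees with the figures quoted above to three decimal places.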

Summary

We used EasyFit to carry out probabilistic modelling of the percentage yield of a chemical process. We fitted three probability distributions to our data and found that the Lognormal distribution fits best. Then, we used its theoretical CDF to estimate the probability of a batch having a yield greater than a given value.
