Distribution Fitting — Flood Frequency Analysis

Introduction

One of the major problems in hydrologic design is the estimation of maximum floods. These estimates are used to size bridges, sewers, dams, protective embankments, diversion canals, detention ponds, and similar structures. Accurate estimation of flood discharge frequency improves the safety of these structures.

Problem

Build a probabilistic model of the flood flow records of the Feather River at Oroville, CA. The source data consists of 59 records of maximum annual flood flows, in units of 1000 cfs.

Solution

To carry out the analysis, we need to define some probability distributions that can potentially serve as flood frequency models. The Log-Pearson Type III (LP3) distribution is recommended for flood frequency analysis by the U.S. Interagency Advisory Committee on Water Data (1982). However, numerous studies have shown that this distribution performs poorly in many cases, and several alternative distributions have been suggested as improvements over it.

In this example, we will employ three advanced distributions which often provide the best approximation to flood flow data: Generalized Extreme Value, Generalized Pareto, and Wakeby distributions. In addition, we will use two common distributions: Lognormal and Gumbel (Type I Extreme Value distribution).
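As a rough illustration of what fitting involves, the sketch below estimates parameters for the two common distributions with simple closed-form estimators: method of moments for Gumbel Max, and the mean and standard deviation of the logarithms for Lognormal. These estimators are an assumption made for illustration and are not necessarily the methods EasyFit applies. The `flows` sample is synthetic, not the Feather River record, and the helper names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_max(sample):
    """Method-of-moments estimates (location mu, scale sigma) for Gumbel Max."""
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)
    sigma = std * math.sqrt(6.0) / math.pi   # from Var = (pi^2 / 6) * sigma^2
    mu = mean - EULER_GAMMA * sigma          # from Mean = mu + gamma * sigma
    return mu, sigma

def fit_lognormal(sample):
    """Estimates (mu, sigma) of ln(X) for a two-parameter lognormal."""
    logs = [math.log(x) for x in sample]
    return statistics.fmean(logs), statistics.stdev(logs)

# Synthetic stand-in for 59 annual maxima in 1000 cfs (NOT the real records).
rng = random.Random(42)
flows = [rng.lognormvariate(4.2, 0.7) for _ in range(59)]

gumbel_mu, gumbel_sigma = fit_gumbel_max(flows)
logn_mu, logn_sigma = fit_lognormal(flows)
print(f"Gumbel Max: mu={gumbel_mu:.2f}, sigma={gumbel_sigma:.2f}")
print(f"Lognormal:  mu={logn_mu:.3f}, sigma={logn_sigma:.3f}")
```

The three-parameter and five-parameter models (Generalized Extreme Value, Generalized Pareto, Wakeby) generally need iterative or L-moment based estimation, which is why they are left out of this sketch.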

EasyFit automatically fits selected distributions, performs goodness of fit tests, and displays graphs of fitted distributions. Estimated parameters are shown in the table below:

Generally, EasyFit supports two versions of the Gumbel distribution:

Gumbel Max (Maximum Extreme Value Type I), and
Gumbel Min (Minimum Extreme Value Type I).

In practice, the Maximum Extreme Value distribution (Type I) is commonly used, and is usually referred to as the Gumbel distribution. However, the Minimum Extreme Value distribution (Type I) is also sometimes utilized. To prevent possible confusion, these two distributions are given different names in EasyFit.
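The difference between the two variants can be seen directly from their cumulative distribution functions. The sketch below uses the standard Type I formulas with hypothetical parameter values (`mu`, `sigma` are not fitted to any data) and checks that the two distributions are mirror images of each other about the location parameter.

```python
import math

def gumbel_max_cdf(x, mu, sigma):
    """CDF of the Maximum Extreme Value Type I distribution."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def gumbel_min_cdf(x, mu, sigma):
    """CDF of the Minimum Extreme Value Type I distribution."""
    return 1.0 - math.exp(-math.exp((x - mu) / sigma))

mu, sigma = 60.0, 40.0  # hypothetical location and scale, in 1000 cfs

for x in (20.0, 60.0, 100.0, 200.0):
    f_max = gumbel_max_cdf(x, mu, sigma)
    f_min = gumbel_min_cdf(x, mu, sigma)
    # Mirror identity: F_min(x) = 1 - F_max(2*mu - x)
    mirrored = 1.0 - gumbel_max_cdf(2.0 * mu - x, mu, sigma)
    print(f"x={x:6.1f}  F_max={f_max:.4f}  F_min={f_min:.4f}  mirror={mirrored:.4f}")
```

With the same parameters, Gumbel Max keeps noticeably more probability in the upper tail, which is the region of interest when modelling annual maxima. In SciPy these two variants correspond to `scipy.stats.gumbel_r` and `scipy.stats.gumbel_l`.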

After the distributions are fitted, we can visualize them:

Based on the probability density graphs, we can assume that the Generalized Pareto and Wakeby distributions are most likely to fit the best, while the Lognormal distribution fits poorly. Both the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests confirm this assumption:
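For readers who want to see how such statistics are obtained, here is a minimal sketch that computes the Kolmogorov-Smirnov statistic D and the Anderson-Darling statistic A² from their textbook definitions. It assumes a synthetic lognormal sample and two hypothetical candidate models (one matching the sample, one deliberately shifted). None of the numbers correspond to the Feather River results.

```python
import math
import random

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal whose logarithm has mean mu and std sigma."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def _clamp(p, eps=1e-12):
    """Keep probabilities strictly inside (0, 1) so logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)

def ks_statistic(sample, cdf):
    """Largest gap between the empirical step CDF and the model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ad_statistic(sample, cdf):
    """A^2 = -n - (1/n) * sum (2i-1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = _clamp(cdf(xs[i - 1]))
        upper = _clamp(cdf(xs[n - i]))
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n

# Synthetic sample of 59 values (NOT the real records).
rng = random.Random(7)
flows = [rng.lognormvariate(4.2, 0.7) for _ in range(59)]

# Two hypothetical candidate models.
models = {
    "matched": lambda x: lognormal_cdf(x, 4.2, 0.7),
    "mismatched": lambda x: lognormal_cdf(x, 5.5, 0.7),
}
results = {name: (ks_statistic(flows, cdf), ad_statistic(flows, cdf))
           for name, cdf in models.items()}
for name, (d, a2) in results.items():
    print(f"{name:11s} KS D = {d:.4f}   AD A^2 = {a2:.3f}")
```

For both statistics, smaller values indicate a closer fit, so candidate distributions can be ranked by sorting on either column.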

It is clear from the table that the Generalized Pareto distribution fits the observed data better than the Wakeby distribution does. However, the goodness of fit statistics calculated for these two distributions are close, which indicates that the probabilistic models they represent are quite similar. To compare the models and learn how they differ, we can use the probability difference graph:
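The data behind such a graph can be sketched in a few lines, assuming (as the name suggests) that it plots the empirical CDF minus the fitted CDF at each observation. The example uses a synthetic sample and two hypothetical Generalized Pareto models; the parameterization `F(x) = 1 - (1 + k(x - mu)/sigma)^(-1/k)` and all parameter values are assumptions. The Wakeby distribution is defined through its quantile function rather than a closed-form CDF, so it is left out here.

```python
import math
import random

def gpd_cdf(x, k, sigma, mu):
    """Generalized Pareto CDF with shape k, scale sigma, location mu."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    if k == 0.0:
        return 1.0 - math.exp(-z)
    base = 1.0 + k * z
    if base <= 0.0:          # beyond the upper bound when k < 0
        return 1.0
    return 1.0 - base ** (-1.0 / k)

def gpd_quantile(u, k, sigma, mu):
    """Inverse of gpd_cdf for k != 0, used only to draw synthetic data."""
    return mu + sigma * ((1.0 - u) ** (-k) - 1.0) / k

def probability_difference(sample, cdf):
    """Pairs (x, empirical CDF - model CDF), with empirical CDF = i/n."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, i / n - cdf(x)) for i, x in enumerate(xs, start=1)]

# Synthetic sample of 59 values (NOT the real records).
rng = random.Random(3)
flows = [gpd_quantile(rng.random(), 0.1, 50.0, 10.0) for _ in range(59)]

# Two hypothetical models: the generating one and one with an inflated scale.
diff_a = probability_difference(flows, lambda x: gpd_cdf(x, 0.1, 50.0, 10.0))
diff_b = probability_difference(flows, lambda x: gpd_cdf(x, 0.1, 500.0, 10.0))

max_a = max(abs(d) for _, d in diff_a)
max_b = max(abs(d) for _, d in diff_b)
print(f"largest |difference|, model A: {max_a:.3f}")
print(f"largest |difference|, model B: {max_b:.3f}")
```

Plotting the second element of each pair against the first shows where along the flow range each model over- or under-estimates the observed probabilities.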

Now that we have selected the Generalized Pareto distribution as the best approximation to historical flood flow data, we can apply this probabilistic model to obtain the needed inferences.

Specifically, we will use the cumulative distribution function (CDF) of the fitted distribution to calculate the annual exceedance probability (AEP), that is, the probability that the event is equaled or exceeded in any single year. For example, for a flow of 200,000 cfs (X = 200 in the data's units of 1000 cfs), the exceedance probability is calculated as follows:

Exceedance_P = P{X≥200} = 1 - P{X<200} = 1 - F(200) = 1 - 0.975 = 0.025

To obtain the return period (also known as the recurrence interval) of the event, we should calculate the reciprocal of the exceedance probability:

Return_Period = 1 / 0.025 = 40 years.
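The same two steps can be written as a short sketch. The CDF value 0.975 is taken from the calculation above; the function names are hypothetical.

```python
def exceedance_probability(cdf_value):
    """Annual exceedance probability from the fitted CDF value F(x)."""
    return 1.0 - cdf_value

def return_period(aep):
    """Return period in years: the reciprocal of the AEP."""
    if aep <= 0.0:
        raise ValueError("exceedance probability must be positive")
    return 1.0 / aep

cdf_at_200 = 0.975  # F(200) from the fitted Generalized Pareto model
aep = exceedance_probability(cdf_at_200)
period = return_period(aep)
print(f"AEP = {aep:.3f}")                     # AEP = 0.025
print(f"Return period = {period:.1f} years")  # Return period = 40.0 years
```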

The interpretation is that, in a very long series, the 40-year flood value would be equaled or exceeded once every 40 years on average. For example, about twenty-five 40-year floods can be expected during a 1000-year period.
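The "on average" qualifier matters: a 40-year flood is not guaranteed to occur in any particular 40-year window. The sketch below computes the expected count over a long period and, under the added assumption that years are independent with a constant exceedance probability, the chance of at least one such flood within a given number of years. The function names are hypothetical.

```python
def expected_events(years, return_period_years):
    """Average number of exceedances over a period of the given length."""
    return years / return_period_years

def prob_at_least_one(years, aep):
    """Chance of one or more exceedances, assuming independent years."""
    return 1.0 - (1.0 - aep) ** years

count = expected_events(1000, 40)
risk_40 = prob_at_least_one(40, 0.025)
print(f"Expected 40-year floods in 1000 years: {count:.0f}")
print(f"Chance of at least one in 40 years: {risk_40:.1%}")
```

Under these assumptions the chance of seeing at least one 40-year flood within 40 years is roughly 64%, not 100%.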

Summary

We applied EasyFit to build a probabilistic model of flood flows from historical records. To construct the model, we fitted three advanced and two common probability distributions, and determined that the Generalized Pareto distribution provides the best fit. Finally, we applied this model to calculate the exceedance probability and the return period.
