Probability Models for Estimation of Number and Costs of Landslides

 

by Robert A. Crovelli¹

 

 

 

 

Open-File Report 00-249

 

 

2000

 

 

 

 

This report is preliminary and has not been reviewed for conformity with U.S. Geological Survey editorial standards or with the North American Stratigraphic Code. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

 

U.S. DEPARTMENT OF THE INTERIOR

U.S. GEOLOGICAL SURVEY

 

¹ U.S. Geological Survey, MS 939, Box 25046, Denver Federal Center, Denver, Colorado 80225


 

Abstract

The objective of this report is to describe the development of probability models for estimation of the number and costs of landslides during a specified time. Important philosophical ideas about natural processes and probability models are presented first. Then two probability models for the number of landslides that occur during a specified time are investigated: a continuous-time model (Poisson model) and a discrete-time model (binomial model). Estimation theory is developed for the estimation of the parameters of both of the models. The exceedance probability of one or more landslides during a specified time is formulated for both models. The estimation theory and probability formulation of the Poisson model are applied to the future occurrence of landslides in Seattle, Washington, using historical data from 1909 to 1997. Theoretical and numerical comparisons between the Poisson and binomial models are conducted that show the binomial model is an approximation to the Poisson model. An economic probability model is developed as an addition to the Poisson model for the estimation of the total damage from future landslides in terms of economic loss as costs in dollars. For illustrative purposes the economic probability model is applied to damaging landslides caused by El Niño rainstorms within the winter season 1997-98 in the San Francisco Bay region, California.

Philosophy of Probability Models

Natural Processes

Important philosophical ideas about natural processes:

Probability Models

Important philosophical ideas about probability models:

In summary, hazard processes are deterministic, but because of our limitations when studying hazards, we resort to probability models that incorporate our uncertainty.

Probability Models for Landslides

Consider the occurrence of landslides during a specified time in a particular area.

Denote

N(t): Number of landslides that occur during time t in a particular area

We are interested in deriving a formula for calculating the probability of one or more landslides during a specified time t. That is,

P{N(t) ≥ 1}

Two probability models for N(t) will be investigated: first, a continuous-time model and second, a discrete-time model.

Poisson Model for Number of Landslides

The Poisson model is a continuous-time model consisting of the occurrence of random point-events (landslides) in ordinary time, which is naturally continuous. The Poisson model is the most commonly used model for the occurrence of random point-events in time and has been used in modeling the occurrence of earthquakes.

Assumptions of the Poisson model:

It is important to acknowledge that these assumptions may not completely hold for the occurrence of landslides, especially the independence assumption. However, given a certain lack of understanding of the physical processes that control landslides, the Poisson model represents the best first-approximation model in attempting to model their occurrence. A first-approximation model is often applied in mathematical modeling when the assumptions are not completely satisfied by the physical process. Usually the first-approximation model is relatively easy to work with and is mathematically tractable. A more accurate model might be extremely complex and not mathematically tractable.

Poisson Distribution -- Probability of n landslides during time t:

P{N(t) = n} = e^(-λt) (λt)^n / n!,  n = 0, 1, 2, …

where

λ: Rate of occurrence of landslides

Note that time t is specified, whereas rate λ is estimated.

Definition of recurrence intervals {Ti, i = 1, 2, …, n}:

T1: Time until the first landslide

Ti: Time between the (i – 1)st and the ith landslide for i > 1

Note that n landslides will have n recurrence intervals.

Theorem – Recurrence intervals {Ti, i = 1, 2, …, n} are independent identically distributed exponential random variables having mean recurrence interval (μ) equal to the reciprocal of the rate of occurrence, i.e., μ = 1/λ.

For landslides, the mean recurrence interval (μ) is the average time interval between landslides.

Note

λ = 1/μ

Variance of Ti is

V[Ti] = 1/λ² = μ²

Probability of a recurrence interval being greater than time t

P{T > t} = e^(-λt) = e^(-t/μ)

Probability of one or more landslides during time t (exceedance probability)

P{N(t) ≥ 1} = 1 – P{N(t) = 0} = 1 – e^(-λt) = 1 – e^(-t/μ)

Note

If t is fixed and μ → ∞, then P{N(t) ≥ 1} → 0.

If μ is fixed and t → ∞, then P{N(t) ≥ 1} → 1.

Mean or expected value of N(t) is

E[N(t)] = λt = t/μ

Note that the smaller the μ, the larger the E[N(t)].

Variance of N(t) is

V[N(t)] = λt = t/μ
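The Poisson quantities above can be sketched numerically. The following is a minimal illustration, not part of the original report: the function names are hypothetical, and the mean recurrence interval of 20 years and the time horizons are illustrative values only. It evaluates the Poisson probability of n landslides, e^(-λt)(λt)^n/n!, and the exceedance probability 1 – e^(-t/μ).

```python
import math


def poisson_pmf(n, rate, t):
    """Return P{N(t) = n} for a Poisson model with the given rate."""
    mean = rate * t  # E[N(t)] = rate * t
    return math.exp(-mean) * mean**n / math.factorial(n)


def exceedance_probability(t, mean_recurrence):
    """Return P{N(t) >= 1} = 1 - exp(-t / mean recurrence interval)."""
    return 1.0 - math.exp(-t / mean_recurrence)


mu = 20.0        # illustrative mean recurrence interval, in years (assumption)
rate = 1.0 / mu  # rate of occurrence, landslides per year

for t in (1, 10, 50, 100):  # illustrative time horizons, in years
    print(f"t = {t:>3} yr: E[N(t)] = {rate * t:.2f}, "
          f"P{{N(t) = 0}} = {poisson_pmf(0, rate, t):.3f}, "
          f"P{{N(t) >= 1}} = {exceedance_probability(t, mu):.3f}")
```

The exceedance probability is simply one minus the probability of zero landslides, which is why the last two columns of each printed row sum to one.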

Estimation of the Parameters λ and μ in the Poisson Model

A Poisson model having an unknown rate λ is to be observed for a fixed time t*.

We want to determine a statistic that is a good estimator of the parameter λ.

It can be shown (Ross, 1972) that the maximum likelihood estimator of λ, denoted by R, is given by

R = N(t*)/t*

Mean or expected value of R is

E[R] = E[N(t*)/t*] = λt*/t* = λ

(R is an unbiased estimator of λ)

Variance of R is

V[R] = V[N(t*)/t*] = λt*/(t*)² = λ/t*

Theorem – The statistic R is the unique minimum variance unbiased estimator of λ.

Since μ = 1/λ, a statistic that is a good estimator of the parameter μ is

M = 1/R = t*/N(t*)

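Consider another statistic as an estimator of the parameter μ

M′ = (Σ Ti)/N(t*), with the sum taken over i = 1, 2, …, N(t*)

where

Ti: The ith observed recurrence interval (i = 1, 2, …, N(t*))

Because

Σ Ti ≤ t*

the statistic M′ can never exceed t*/N(t*), so the estimator M′ will tend to be biased low and underestimate μ.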
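The relation between the two estimators of the mean recurrence interval can be checked by simulation. The sketch below is an illustration under stated assumptions rather than anything from the original report: the function name, the true rate of 0.05 per year, the 100-year record, the number of simulated records, and the random seed are all hypothetical. Each simulated record yields t*/N(t*) and the average of the observed recurrence intervals, whose sum is the time of the last landslide in the record.

```python
import random


def simulate_estimators(rate, t_star, rng):
    """Simulate one record of length t_star.

    Returns (t*/N, mean of observed recurrence intervals),
    or None when no landslide occurs during the record.
    """
    times = []
    clock = rng.expovariate(rate)  # exponential recurrence intervals
    while clock <= t_star:
        times.append(clock)
        clock += rng.expovariate(rate)
    n = len(times)
    if n == 0:
        return None
    m = t_star / n           # estimator based on the full record length
    m_prime = times[-1] / n  # observed intervals sum to the last event time
    return m, m_prime


rng = random.Random(0)  # fixed seed so the sketch is repeatable
records = (simulate_estimators(0.05, 100.0, rng) for _ in range(10000))
results = [r for r in records if r is not None]

mean_m = sum(m for m, _ in results) / len(results)
mean_m_prime = sum(mp for _, mp in results) / len(results)
print(f"average of t*/N(t*):              {mean_m:.2f}")
print(f"average of mean observed interval: {mean_m_prime:.2f}")
```

In every simulated record the second estimator is no larger than the first, because the last landslide cannot occur after the end of the record.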

Example

Suppose a record of the occurrence of landslides for the past 100 years (t*) showed that 5 landslides (n) occurred.

An estimate of λ would be

r = n/t* = 5/100 = 1/20 = 0.05

Hence, we expect landslides at a rate of 0.05 per year.

An estimate of μ would be

m = 1/r = t*/n = 100/5 = 20

Therefore, we expect the mean recurrence interval to be 20 years.

From either of these estimates of the parameters λ and μ, we could calculate the probability of one or more landslides during a future time t

P{N(t) ≥ 1} = 1 – e^(-rt) = 1 – e^(-t/m)

Using m = 20 and specifying t = 50, we get

P{N(50) ≥ 1} = 1 – e^(-50/20) = 1 – e^(-2.5) = 0.918

There is a 91.8% chance of one or more landslides occurring during the next 50 years.
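The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch with a hypothetical function name; the standard error shown is a plug-in value obtained by substituting the estimate r for the unknown rate in V[R], which is an added assumption rather than a step taken in the report.

```python
import math


def estimate_poisson_parameters(n, t_star):
    """Estimate the rate and mean recurrence interval from n landslides in t_star years."""
    r = n / t_star                # estimate of the rate of occurrence
    m = t_star / n                # estimate of the mean recurrence interval (needs n >= 1)
    se_r = math.sqrt(r / t_star)  # plug-in standard error of the rate (assumption)
    return r, m, se_r


r, m, se_r = estimate_poisson_parameters(5, 100.0)
prob = 1.0 - math.exp(-50.0 / m)  # exceedance probability for t = 50 years

print(f"rate estimate r = {r} per year (plug-in standard error {se_r:.4f})")
print(f"mean recurrence interval estimate m = {m} years")
print(f"P{{N(50) >= 1}} = {prob:.3f}")
```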

Application

Seattle, Washington, has kept records of landslide occurrence from 1909 to present. Records from 1909 to 1997 (t* = 88.4 years) were analyzed to determine landslide density using a moving count circle approach (Coe and others, 2000). This analysis showed that n landslides occurred within each count circle, where n ranged from 0 to 30. The Poisson model was applied to these data, and the results are given in table 1.
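A minimal sketch of how the Poisson model could be applied to a single count circle follows. The record length of 88.4 years and the range of counts come from the text; the function name and the time horizons are hypothetical choices and are not taken from table 1. A count of zero gives a rate of zero, an infinite mean recurrence interval, and an exceedance probability of zero.

```python
import math

T_STAR = 88.4                       # years of record, 1909 to 1997 (from the text)
HORIZONS = (1, 5, 10, 25, 50, 100)  # future times in years (hypothetical choices)


def count_circle_row(n, t_star=T_STAR, horizons=HORIZONS):
    """Return (rate, mean recurrence interval, exceedance probabilities) for a count n."""
    rate = n / t_star
    mean_recurrence = math.inf if n == 0 else t_star / n
    probs = [1.0 - math.exp(-rate * t) for t in horizons]
    return rate, mean_recurrence, probs


print("  n   rate   mean rec.  " + "  ".join(f"t={t:<4}" for t in HORIZONS))
for n in (0, 1, 5, 10, 30):  # a few counts from the observed range 0 to 30
    rate, mean_recurrence, probs = count_circle_row(n)
    cells = "  ".join(f"{p:6.3f}" for p in probs)
    print(f"{n:>3}  {rate:5.3f}  {mean_recurrence:9.1f}  {cells}")
```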

Binomial Model for Number of Landslides

Costa and Baker (1981) give a probability model that they used in flood hazard analyses for modeling the occurrence of floods. The Costa-Baker model was also used by Keaton and others (1988) and Lips and Wieczorek (1990) in modeling the occurrence of debris flows. The Costa-Baker model was given without any derivation as follows, written in the notation of this paper:

P{N(t) ≥ 1} = 1 – (1 – 1/μ)^t

with the mean recurrence interval μ = 1/p and t the number of years,

where

p: Probability of a flood in any one year

The Costa-Baker model is a crude model in that it divides time into fixed discrete increments (one-year increments). It is designed for large values of t and μ.

The Costa-Baker model is actually an example of the binomial model.

The binomial model is a discrete-time model consisting of the occurrence of random point-events (landslides) in discrete time; that is, time is partitioned into a series of discrete increments of the same length and within each increment a single point-event (landslide) may or may not occur.

Assumptions of the binomial model:

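Binomial Distribution -- Probability of n landslides during discrete time t:

P{N(t) = n} = C(t, n) p^n (1 – p)^(t – n),  n = 0, 1, …, t

where C(t, n) = t!/[n!(t – n)!] is the binomial coefficient.

Note that time t is specified, whereas probability p is estimated.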

Definition of recurrence intervals {Ti, i = 1, 2, …, n}:

T1: Number of time increments until the first landslide

Ti: Number of time increments from the (i – 1)st landslide until the ith landslide for i > 1

Note that n landslides will have n recurrence intervals.

Theorem – Recurrence intervals {Ti, i = 1, 2, …, n} are independent identically distributed geometric random variables having mean recurrence interval (μ) equal to the reciprocal of the probability of success, i.e., μ = 1/p.

For landslides, the mean recurrence interval (μ) is the average time interval between landslides.

Since

p = 1/μ

the larger the mean recurrence interval (μ), the smaller the probability of a landslide in any one year (p). Also, if μ < 1, then p > 1, which is not allowed. Therefore, the binomial model has the restriction μ ≥ 1.

Variance of Ti is

V[Ti] = (1 – p)/p² = μ(μ – 1)

Probability of a recurrence interval being greater than time t

P{T > t} = (1 – p)^t = (1 – 1/μ)^t

Probability of one or more landslides during time t (exceedance probability)

P{N(t) ≥ 1} = 1 – P{N(t) = 0} = 1 – (1 – p)^t = 1 – (1 – 1/μ)^t

where the last expression is the Costa-Baker model.

Note that 1 – p is raised to the t power under the assumption of independence.

Mean or expected value of N(t) is

E[N(t)] = tp = t/μ

Variance of N(t) is

V[N(t)] = tp(1 – p) = (t/μ)(1 – 1/μ)

Estimation of the Parameters p and μ in the Binomial Model

A binomial model having an unknown probability p is to be observed for a fixed time t*.

We want to determine a statistic that is a good estimator of the parameter p.

It can be shown that the maximum likelihood estimator of p, denoted by F, is given by the relative frequency of occurrence

F = N(t*)/t*

Recall

t*: Number of one-year increments

N(t*): Number of one-year increments in which a landslide occurred

Mean or expected value of F is

E[F] = E[N(t*)/t*] = t*p/t* = p

(F is an unbiased estimator of p)

Theorem – The statistic F is the unique minimum variance unbiased estimator of p.

Since μ = 1/p, a statistic that is a good estimator of the parameter μ is

M = 1/F = t*/N(t*)

Example

Suppose a record of the occurrence of landslides for the past 100 years (t*) showed that 5 landslides (n) occurred. Assume that the 5 landslides occur in 5 individual one-year increments.

An estimate of p would be

f = n/t* = 5/100 = 1/20 = 0.05

Hence, we expect the probability of a landslide in any one-year increment to be 0.05.

An estimate of μ would be

m = 1/f = t*/n = 100/5 = 20

Therefore, we expect the mean recurrence interval to be 20 years.

From either of these estimates of the parameters p and μ, we could calculate the probability of one or more landslides during a future time t

P{N(t) ≥ 1} = 1 – (1 – f)^t = 1 – (1 – 1/m)^t

Using m = 20 and specifying t = 50, we get

P{N(50) ≥ 1} = 1 – (1 – 1/20)^50 = 1 – (0.95)^50 = 0.923

There is a 92.3% chance of one or more landslides during the next 50 years.
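The binomial example can be checked with a short sketch. The function name is hypothetical; the check that the mean recurrence interval is at least one time increment mirrors the restriction on the binomial model stated earlier.

```python
def binomial_exceedance(t, mean_recurrence):
    """Return P{N(t) >= 1} = 1 - (1 - 1/mean_recurrence)**t for t time increments."""
    if mean_recurrence < 1:
        raise ValueError("the binomial model requires a mean recurrence interval of at least 1")
    p = 1.0 / mean_recurrence  # probability of a landslide in any one increment
    return 1.0 - (1.0 - p) ** t


n, t_star = 5, 100   # 5 landslides in 100 one-year increments (from the example)
f = n / t_star       # estimate of p
m = t_star / n       # estimate of the mean recurrence interval

print(f"f = {f}, m = {m}")
print(f"P{{N(50) >= 1}} = {binomial_exceedance(50, m):.3f}")
```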

The Binomial Model is an Approximation to the Poisson Model

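Poisson model

P{N(t) ≥ 1} = 1 – e^(-t/μ) = 1 – [e^(-1/μ)]^t

Binomial model

P{N(t) ≥ 1} = 1 – (1 – 1/μ)^t

Now compare e^(-1/μ) and 1 – 1/μ

Recall the exponential series

e^x = 1 + x + x²/2! + x³/3! + …

Then

e^x ≈ 1 + x when x is close to 0

Let

x = -1/μ

Thus

1 – 1/μ is equal to the first two terms of the exponential series for e^(-1/μ).

Hence

e^(-1/μ) ≈ 1 – 1/μ when μ is large

Therefore

1 – e^(-t/μ) ≈ 1 – (1 – 1/μ)^t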

For a numerical comparison between the Poisson model and the binomial model, see tables 2 (Poisson model) and 3 (binomial model), which give the results from the application of each model to a generic data set. The binomial model significantly overestimates the exceedance probabilities for relatively short mean recurrence intervals (a few years) and short periods of time. For example, when the mean recurrence interval is two years and the specified time is one year, the exceedance probability is 50% using the binomial model, whereas it is 39.3% using the Poisson model. The difference between the two models becomes negligible for longer mean recurrence intervals and longer time periods. This is significant for hazard assessment because events with short mean recurrence intervals and short time periods (less than 25 years) play a major role in determining the degree of hazard.
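The comparison can be illustrated with a small grid of exceedance probabilities under both models. This is a sketch only: the function names are hypothetical, and the mean recurrence intervals and time periods are illustrative choices, not the values used in tables 2 and 3.

```python
import math


def poisson_exceedance(t, mu):
    """Poisson model: P{N(t) >= 1} = 1 - exp(-t/mu)."""
    return 1.0 - math.exp(-t / mu)


def binomial_exceedance(t, mu):
    """Binomial model: P{N(t) >= 1} = 1 - (1 - 1/mu)**t."""
    return 1.0 - (1.0 - 1.0 / mu) ** t


MEAN_RECURRENCE = (2, 5, 10, 25, 100)  # years (illustrative choices)
TIMES = (1, 5, 10, 25, 50)             # years (illustrative choices)

rows = [(mu, t, poisson_exceedance(t, mu), binomial_exceedance(t, mu))
        for mu in MEAN_RECURRENCE for t in TIMES]

print(" mu    t  Poisson  binomial  difference")
for mu, t, p_pois, p_bin in rows:
    print(f"{mu:>3}  {t:>3}  {p_pois:7.3f}  {p_bin:8.3f}  {p_bin - p_pois:10.3f}")
```

The first row reproduces the case quoted in the text (mean recurrence interval of two years, specified time of one year), and the difference column shrinks as the mean recurrence interval grows.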

Theorem – The binomial distribution is an approximation to the Poisson distribution.

Given

Poisson distribution with parameters

t: Specified time (number of years)

λ: Rate of occurrence of events (landslides)

Binomial distribution with parameters

ν: Number of "trials" (number of time increments)

p: Probability of occurrence of a "success" (landslide) in any trial

When ν tends to infinity and p tends to zero while the mean λt = νp remains constant,

then the binomial distribution approaches the Poisson distribution.

For a proof of this theorem see Walpole and Myers (1989).
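The limiting behavior stated in the theorem can be observed numerically. The sketch below is an illustration, not a proof: the function names, the fixed mean of 2.5, and the sequence of trial counts are hypothetical choices. It holds the mean fixed, increases the number of trials, and reports the largest gap between the binomial and Poisson probabilities for 0 through 10 events.

```python
import math


def binomial_pmf(n, trials, p):
    """Probability of n successes in the given number of independent trials."""
    return math.comb(trials, n) * p**n * (1.0 - p) ** (trials - n)


def poisson_pmf(n, mean):
    """Probability of n events for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


MEAN = 2.5  # fixed mean, equal to (number of trials) * p (illustrative value)


def max_abs_difference(trials, mean=MEAN, n_max=10):
    """Largest gap between the two probability functions for n = 0, ..., n_max."""
    p = mean / trials  # p shrinks as the number of trials grows
    return max(abs(binomial_pmf(n, trials, p) - poisson_pmf(n, mean))
               for n in range(n_max + 1))


TRIALS = (10, 100, 1000, 10000)
gaps = [max_abs_difference(trials) for trials in TRIALS]
for trials, gap in zip(TRIALS, gaps):
    print(f"trials = {trials:>5}: largest difference = {gap:.6f}")
```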

Probability Model for Costs of Landslides

Damage due to landslides will be taken in the form of economic loss as costs in dollars. However, the theory below would also apply to other types of damage, such as human loss measured in deaths.

Given

Y(t) = Σ Xi, with the sum taken over i = 1, 2, …, N(t)

where

N(t): Number of landslides that occur during time t in a particular area

N(t) has a Poisson distribution with rate λ.

Xi: The amount of damage (cost) from the ith landslide

The Xi (i = 1, 2, …) are independent and identically distributed random variables which are also independent of N(t).

Y(t): The total amount of damage (costs) from all of the landslides during time t

Then

Mean or expected value of Y(t) is

μY = E[Y(t)] = λtE[X]

Variance of Y(t) is

σY² = V[Y(t)] = λt{V[X] + (E[X])²}

From an observed number of landslides n, the sample mean cost MX is an estimator of E[X] where

MX = (X1 + X2 + … + Xn)/n

The sample variance SX² is an estimator of V[X] where

SX² = [(X1 – MX)² + (X2 – MX)² + … + (Xn – MX)²]/(n – 1)

In the case that only observed estimates of the minimum value of X, Min(X), and maximum value of X, Max(X), are available, then an estimator of the standard deviation of X would be

SX = [Max(X) – Min(X)]/6

The divisor of 6 is based on plus and minus three standard deviations from the mean for a range of 6 standard deviations.

The Pareto probability distribution is possibly a good approximate distribution for the random variable X.

Crovelli (1992) showed that the lognormal probability distribution is a good approximate distribution for the type of random variable Y(t). Hence, the fractiles of Y(t) can be approximated by using the lognormal distribution. As derived in Crovelli (1992), the characterizing parameters of the lognormal distribution, namely mu (μ*) and sigma (σ*), can be calculated from the mean μY and standard deviation σY of a lognormal random variable Y as follows

μ* = ln[μY²/(μY² + σY²)^(1/2)]

σ* = {ln[1 + (σY/μY)²]}^(1/2)

Knowing the lognormal characterizing parameters, the lognormal fractiles can be calculated from the formula

F100α = exp(μ* + zα σ*)

where Z is a standard normal random variable and P{Z > zα} = α.

For example, two fractiles of interest in this report are

F95 = exp(μ* – 1.645σ*)

F5 = exp(μ* + 1.645σ*)

There is a 95% chance of exceeding F95, and a 5% chance of exceeding F5. Together, the low value of F95 and the high value of F5 form a range of values that is a 90% prediction interval for Y(t), the total costs from landslides during a specified time.

The reverse problem would be to find the probability of exceeding a specified amount in economic loss due to landslides in a particular area during a specified time. That is, given yα, find α such that

P{Y(t) > yα} = α

Normalizing

zα = [ln(yα) – μ*]/σ*

Now, from zα, find α such that P{Z > zα} = α.
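The forward and reverse problems can be sketched together as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the mean of 50 and standard deviation of 20 are invented illustrative values, and the lognormal parameters are obtained by matching the mean and standard deviation of a lognormal random variable, which is the standard moment relation and is assumed here to agree with Crovelli (1992).

```python
import math
from statistics import NormalDist


def lognormal_parameters(mean, sd):
    """Return the lognormal parameters (mu*, sigma*) matching a given mean and sd."""
    mu_star = math.log(mean**2 / math.sqrt(mean**2 + sd**2))
    sigma_star = math.sqrt(math.log(1.0 + (sd / mean) ** 2))
    return mu_star, sigma_star


def lognormal_fractile(mean, sd, alpha):
    """Forward problem: the amount that is exceeded with probability alpha."""
    mu_star, sigma_star = lognormal_parameters(mean, sd)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # P{Z > z_alpha} = alpha
    return math.exp(mu_star + z_alpha * sigma_star)


def probability_of_exceeding(amount, mean, sd):
    """Reverse problem: the probability that total costs exceed a given amount."""
    mu_star, sigma_star = lognormal_parameters(mean, sd)
    z = (math.log(amount) - mu_star) / sigma_star  # normalize the log of the amount
    return 1.0 - NormalDist().cdf(z)


mean_y, sd_y = 50.0, 20.0  # hypothetical mean and sd of total costs
for amount in (25.0, 50.0, 75.0, 100.0):
    print(f"P{{Y > {amount:5.1f}}} = {probability_of_exceeding(amount, mean_y, sd_y):.3f}")
```

Applying `probability_of_exceeding` to the output of `lognormal_fractile` returns the original probability, which is a convenient consistency check on the two directions.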

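The aggregation of the total amounts of damage (costs) from landslides in k areas:

W = Y1 + Y2 + … + Yk

where Yi: The total amount of damage (costs) from landslides in the ith area.

Mean or expected value of W

E[W] = E[Y1] + E[Y2] + … + E[Yk]

Variance of W under the assumption of independence of the Yi

V[W] = V[Y1] + V[Y2] + … + V[Yk]

Variance of W under the assumption of perfect positive correlation of the Yi

V[W] = (S[Y1] + S[Y2] + … + S[Yk])²

where S[Yi] is the standard deviation of Yi.

Also, under the assumption of perfect positive correlation, the fractiles are additive. That is,

F(W) = F(Y1) + F(Y2) + … + F(Yk) for any given fractile F (for example, F95 or F5)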

Rough rather than rigorous mathematical definitions of independence and perfect positive correlation are the following:

The normal probability distribution is a good approximate distribution for this type of random variable W because of the well-known Central Limit Theorem of probability theory.
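The two aggregation assumptions can be compared with a short sketch. The function name and the per-area means and standard deviations below are hypothetical values chosen only for illustration. Means add in both cases; variances add under independence, whereas standard deviations add under perfect positive correlation.

```python
import math


def aggregate(means, sds):
    """Return (mean of W, sd of W if independent, sd of W if perfectly correlated)."""
    mean_w = sum(means)
    sd_independent = math.sqrt(sum(s**2 for s in sds))  # variances add
    sd_correlated = sum(sds)                            # standard deviations add
    return mean_w, sd_independent, sd_correlated


# Hypothetical total-cost estimates for three areas, in millions of dollars.
area_means = [100.0, 40.0, 25.0]
area_sds = [7.2, 5.0, 3.0]

mean_w, sd_ind, sd_cor = aggregate(area_means, area_sds)
print(f"E[W] = {mean_w:.1f}")
print(f"S[W] under independence:                {sd_ind:.2f}")
print(f"S[W] under perfect positive correlation: {sd_cor:.2f}")
```

The independent case always gives the smaller standard deviation, so the two assumptions bracket the spread of the aggregate.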

Example

Suppose a record of the occurrence of landslides for the past year (t*) showed that 40 landslides (n) occurred.

An estimate of λ would be

r = n/t* = 40/1 = 40

We expect landslides at a rate of 40 per year.

An estimate of μ would be

m = 1/r = t*/n = 1/40 = 0.025

We expect the mean recurrence interval to be 0.025 years.

Suppose that from the 40 observed landslides, the sample mean cost mX = 0.5 million dollars and the sample standard deviation sX = 0.1 million dollars.

An estimate of the mean or expected value of Y(t) during the specified time t = 5 years, that is, E[Y(5)], is

mY = rt(mX) = (40)(5)(0.5) = 100

Hence, a mean estimate of the total costs from landslides during the next 5 years is 100 million dollars.

An estimate of the variance of Y(5), V[Y(5)], is

sY² = rt(mX² + sX²) = (40)(5)[(0.5)² + (0.1)²] = 52

An estimate of the standard deviation of Y(5) is

sY = 7.21

Estimates of the characterizing parameters of the lognormal distribution, namely mu (μ*) and sigma (σ*), are

Estimates of the 95th fractile and the 5th fractile of Y(5), namely F95 and F5, are

Therefore, a low estimate of the total costs from landslides during the next 5 years is 88 million dollars. There is a 95% chance of exceeding 88 million dollars. A high estimate of the total costs from landslides during the next 5 years is 112 million dollars. There is a 5% chance of exceeding 112 million dollars.
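The steps of this example can be traced in a short script. This is a sketch under stated assumptions: the variable names are hypothetical, and the lognormal parameters are obtained by matching the estimated mean and standard deviation of the total costs, which is assumed here to agree with Crovelli (1992). Under those assumptions the computed fractiles come out near 88.6 and 112.3 million dollars, close to the low and high estimates quoted in the example.

```python
import math
from statistics import NormalDist

# Inputs taken from the example.
rate = 40 / 1.0   # landslides per year (40 in a 1-year record)
t = 5             # specified time, in years
mean_cost = 0.5   # sample mean cost per landslide, millions of dollars
sd_cost = 0.1     # sample standard deviation of cost, millions of dollars

# Mean, variance, and standard deviation of the total costs over t years.
mean_total = rate * t * mean_cost
var_total = rate * t * (mean_cost**2 + sd_cost**2)
sd_total = math.sqrt(var_total)

# Lognormal parameters matched to the mean and standard deviation (assumption).
mu_star = math.log(mean_total**2 / math.sqrt(mean_total**2 + var_total))
sigma_star = math.sqrt(math.log(1.0 + var_total / mean_total**2))

# Fractiles: 95% chance of exceeding f95, 5% chance of exceeding f5.
z = NormalDist().inv_cdf(0.95)  # about 1.645
f95 = math.exp(mu_star - z * sigma_star)
f5 = math.exp(mu_star + z * sigma_star)

print(f"mean of total costs = {mean_total:.1f}, standard deviation = {sd_total:.2f}")
print(f"mu* = {mu_star:.4f}, sigma* = {sigma_star:.4f}")
print(f"F95 = {f95:.1f}, F5 = {f5:.1f}  (millions of dollars)")
```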

Application

The direct costs assessed to landslides for each county in the 10-county San Francisco Bay region, California, listed in Godt (1999) will be used to illustrate the probabilistic methodology developed above (see table 4). The damaging landslides were caused by El Niño rainstorms within the winter season 1997-98. An economic landslide hazard assessment of each county is performed as in the case of the previous example. Then the ten counties are aggregated for comparison under the two assumptions: independence and perfect positive correlation.

It is very important to realize that the winter season 1997-98 was an anomalous year and not representative of conditions in a typical year because the occurrence and costs of the landslides were considerably higher than normal. In actual practice a longer period of record of landslides covering multiple years and storms should be used to determine estimates of future landslide occurrence and costs. The main purpose of this application is to illustrate what could be done with the economic probability model that has been developed by using available data required by the model; the future estimates themselves are not meaningful. On the other hand, since scenario planning is becoming widespread among planners, it could be argued that this application might be used to represent a "worst-case scenario," and the future estimates themselves would be meaningful.

Summary

Conclusions

Acknowledgments

The author wishes to gratefully acknowledge the helpful reviews of Jeffrey A. Coe and Rex L. Baum, both in the Landslide Group of the Geologic Hazards Team at the U.S. Geological Survey.

References

Coe, J.A., Michael, J.A., Crovelli, R.A., and Savage, W.Z., 2000, Preliminary map showing landslide densities, mean recurrence intervals, and exceedance probabilities as determined from historic records, Seattle, Washington: U.S. Geological Survey Open-File Report 00-xxx, in review.

Costa, J.E. and Baker, V.R., 1981, Surficial geology – building with the earth: New York, Wiley & Sons, 498 p.

Crovelli, R.A., 1992, Probabilistic methodology for estimation of undiscovered petroleum resources in play analysis of the United States: Nonrenewable Resources, v. 1, no. 2, p. 153-162.

Godt, J.W., 1999, Maps showing locations of damaging landslides caused by El Niño rainstorms, winter season 1997-98, San Francisco Bay region, California: U.S. Geological Survey pamphlet to accompany miscellaneous field studies maps MF-2325-A-J, 13 p.

Keaton, J.R., Anderson, L.R., and Mathewson, C.C., 1988, Assessing debris flow hazards on alluvial fans in Davis County, Utah, in Fragaszy, R.J., ed., Twenty-fourth Annual Symposium on Engineering Geology and Soils Engineering: Pullman, Washington, Publications and Printing, Washington State University, p. 89-108.

Lips, E.W. and Wieczorek, G.F., 1990, Recurrence of debris flows on an alluvial fan in central Utah, in French, R.H., ed., Hydraulics/Hydrology of Arid Lands (H2AL); Proceedings of the International Symposium: American Society of Civil Engineers, p. 555-560.

Ross, S.M., 1972, Introduction to probability models: New York, Academic Press, Inc., 273 p.

Walpole, R.E. and Myers, R.H., 1989, Probability and statistics for engineers and scientists: New York, Macmillan Publishing Company, 4th ed., 765 p.

 

 

Table 5. Summary of probability models for estimation of number and costs of landslides

Poisson Model for Number of Landslides (continuous-time model)

Random variables | N(t): Number of landslides | T: Recurrence interval
Probability distributions | Poisson | Exponential
Parameters | λ: Rate of occurrence | μ: Mean recurrence interval
Means or expected values | E[N(t)] = λt | E[T] = μ = 1/λ
Standard deviations | S[N(t)] = (λt)^(1/2) | S[T] = μ
Exceedance probabilities | P{N(t) ≥ 1} = 1 – e^(-λt) | P{T > t} = e^(-t/μ)
Estimators of parameters | R = N(t*)/t* | M = 1/R = t*/N(t*)

Binomial Model for Number of Landslides (discrete-time model)

Random variables | N(t): Number of landslides | T: Recurrence interval
Probability distributions | Binomial | Geometric
Parameters | p: Probability of landslide | μ: Mean recurrence interval
Means or expected values | E[N(t)] = pt | E[T] = μ = 1/p
Standard deviations | S[N(t)] = [p(1 – p)t]^(1/2) | S[T] = [μ(μ – 1)]^(1/2)
Exceedance probabilities | P{N(t) ≥ 1} = 1 – (1 – p)^t | P{T > t} = (1 – 1/μ)^t
Estimators of parameters | F = N(t*)/t* | M = 1/F = t*/N(t*)

Probability Model for Costs of Landslides

Random variables | X: Cost of landslide | Y(t): Total costs
Probability distributions | Pareto | Lognormal
Means or expected values | E[X] | E[Y(t)] = E[N]E[X]
Standard deviations | S[X] | S[Y(t)] = {E[N](S[X])² + (E[X])²(S[N])²}^(1/2)

Random variable | W: Aggregation of total costs
Probability distribution | Normal
Mean or expected value | E[W] = E[Y1] + E[Y2] + … + E[Yk]
Standard deviation | S[W] = {(S[Y1])² + (S[Y2])² + … + (S[Yk])²}^(1/2) under independence; S[W] = S[Y1] + S[Y2] + … + S[Yk] under perfect positive correlation
