Probability Theory for the Number of Landslides

Probability Models for Estimation of Number and Costs of Landslides

by Robert A. Crovelli¹

Open-File Report 00-249

2000

This report is preliminary and has not been reviewed for conformity with U.S. Geological Survey editorial standards or with the North American Stratigraphic Code. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

U.S. DEPARTMENT OF THE INTERIOR

U.S. GEOLOGICAL SURVEY

¹U.S. Geological Survey, MS 939, Box 25046, Denver Federal Center, Denver, Colorado 80225

Abstract

The objective of this report is to describe the development of probability models for estimation of the number and costs of landslides during a specified time. Important philosophical ideas about natural processes and probability models are presented first. Then two probability models for the number of landslides that occur during a specified time are investigated: a continuous-time model (Poisson model) and a discrete-time model (binomial model). Estimation theory is developed for the estimation of the parameters of both of the models. The exceedance probability of one or more landslides during a specified time is formulated for both models. The estimation theory and probability formulation of the Poisson model are applied to the future occurrence of landslides in Seattle, Washington, using historical data from 1909 to 1997. Theoretical and numerical comparisons between the Poisson and binomial models are conducted that show the binomial model is an approximation to the Poisson model. An economic probability model is developed as an addition to the Poisson model for the estimation of the total damage from future landslides in terms of economic loss as costs in dollars. For illustrative purposes the economic probability model is applied to damaging landslides caused by El Nino rainstorms within the winter season 1997-98 in the San Francisco Bay region, California.

Philosophy of Probability Models

Natural Processes

Important philosophical ideas about natural processes:

Determinism or the law of cause and effect is the doctrine that all events in the universe are deterministic: every event has a cause.

At the scale of geologic and atmospheric hazards (e.g., landslides, earthquakes, floods, tsunamis, volcanoes, and storms), nature is deterministic: every hazardous event has a cause.

A hazard process is a physical process involving the occurrence of point (hazardous) events in time.

Beginning at some point in time, after a certain amount of time, the first hazardous event occurs. Then, after a certain amount of time, the second hazardous event occurs. And so forth. The time between hazardous events is certainly irregular.

We cannot predict exactly when a hazard event will occur because of the limitations to our knowledge of nature.

The limitations to our knowledge of nature are explained by the following: Heisenberg Uncertainty Principle and Gödel’s Theorem; chaos theory and fractal geometry; algorithmic and computational complexity; physical and financial constraints.

Chaos is the apparent randomness from extremely complex behavior occurring in a deterministic process due to excessive sensitivity of an event to small changes in initial conditions.

Probability Models

Important philosophical ideas about probability models:

Probability is a numerical measure of our uncertainty regarding nature.

A probability model is a mathematical model that incorporates our uncertainty.

Probability models are an approach to deal with the limitations to our knowledge of natural processes.

Probability models are used for purposes of description and prediction of physical processes in nature.

Randomness is an assumption of probability models, not natural processes. Hazards do not occur at random in nature, but they do occur at random in the models.

It is not correct to say that a natural process follows a particular probability model. (This would be putting the cart before the horse.)

We will always be uncertain of nature because of our limitations in understanding.

In summary, hazard processes are deterministic, but because of our limitations when studying hazards, we resort to probability models that incorporate our uncertainty.

Probability Models for Landslides

Consider the occurrence of landslides during a specified time in a particular area.

Denote

N(t): Number of landslides that occur during time t in a particular area

We are interested in deriving a formula for calculating the probability of one or more landslides during a specified time t. That is,

P{N(t) ≥ 1}

Two probability models for N(t) will be investigated: first, a continuous-time model and second, a discrete-time model.

Poisson Model for Number of Landslides

The Poisson model is a continuous-time model consisting of the occurrence of random point-events (landslides) in ordinary time, which is naturally continuous. The Poisson model is the most commonly used model for the occurrence of random point-events in time and has been used in modeling the occurrence of earthquakes.

Assumptions of the Poisson model:

The numbers of events (landslides) which occur in disjoint time intervals are independent.

The probability of an event occurring in a very short time interval is proportional to the length of the time interval. The probability of more than one event in such a short time interval is negligible.

The probability distribution of the number of events remains the same for all time intervals of a fixed length.

It is important to acknowledge that these assumptions may not completely hold for the occurrence of landslides, especially the independence assumption. However, given a certain lack of understanding of the physical processes that control landslides, the Poisson model represents the best first-approximation model in attempting to model their occurrence. A first-approximation model is often applied in mathematical modeling when the assumptions are not completely satisfied by the physical process. Usually the first-approximation model is relatively easy to work with and is mathematically tractable. A more accurate model might be extremely complex and not mathematically tractable.

Poisson Distribution -- Probability of n landslides during time t:

P{N(t) = n} = e^(-λt) (λt)^n / n!,  n = 0, 1, 2, …

where

λ: Rate of occurrence of landslides

Note that time t is specified, whereas rate λ is estimated.

Definition of recurrence intervals {T_i, i = 1, 2, …, n}:

T₁: Time until the first landslide

T_i: Time between the (i – 1)st and the ith landslide for i > 1

Note that n landslides will have n recurrence intervals.

Theorem – Recurrence intervals {T_i, i = 1, 2, …, n} are independent identically distributed exponential random variables having mean recurrence interval (μ) equal to the reciprocal of the rate of occurrence, i.e., μ = 1/λ.

For landslides, the mean recurrence interval (μ) is the average time interval between landslides.

Note

λ = 1/μ

Variance of T_i is

V[T_i] = 1/λ² = μ²

Probability of a recurrence interval being greater than time t

P{T_i > t} = e^(-λt) = e^(-t/μ)

Probability of one or more landslides during time t (exceedance probability)

P{N(t) ≥ 1} = 1 – P{N(t) = 0} = 1 – e^(-λt) = 1 – e^(-t/μ)

Note

If t is fixed and μ → ∞, then P{N(t) ≥ 1} → 0.

If μ is fixed and t → ∞, then P{N(t) ≥ 1} → 1.

Mean or expected value of N(t) is

E[N(t)] = λt = t/μ

Note that the smaller μ is, the larger E[N(t)] is.

Variance of N(t) is

V[N(t)] = λt = t/μ
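The Poisson quantities above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Poisson probability mass function with mean equal to the rate times t; the function names and the sample values (a 20-year mean recurrence interval and a 10-year window) are assumptions made for the example, not values from this report.

```python
import math

def poisson_pmf(n: int, rate: float, t: float) -> float:
    """P{N(t) = n} for a Poisson model with the given rate of occurrence."""
    mean = rate * t
    return math.exp(-mean) * mean ** n / math.factorial(n)

def poisson_exceedance(mean_recurrence: float, t: float) -> float:
    """P{N(t) >= 1} = 1 - e^(-t/mu), with mu the mean recurrence interval."""
    return 1.0 - math.exp(-t / mean_recurrence)

# Illustrative values only: mean recurrence interval 20 years, window 10 years.
mu, t = 20.0, 10.0
rate = 1.0 / mu
print(poisson_pmf(0, rate, t))        # probability of no landslides
print(poisson_exceedance(mu, t))      # probability of one or more landslides
```

The two printed values sum to one, since the exceedance probability is the complement of the probability of no landslides.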

Estimation of the Parameters λ and μ in the Poisson Model

A Poisson model having an unknown rate λ is to be observed for a fixed time t*.

We want to determine a statistic that is a good estimator of the parameter λ.

It can be shown (Ross, 1972) that the maximum likelihood estimator of λ, denoted by R, is given by

R = N(t*)/t*

Mean or expected value of R is

E[R] = E[N(t*)/t*] = λt*/t* = λ

(R is an unbiased estimator of λ)

Variance of R is

V[R] = V[N(t*)/t*] = λt*/(t*)² = λ/t*

Theorem – The statistic R is the unique minimum variance unbiased estimator of λ.

Since μ = 1/λ, a statistic that is a good estimator of the parameter μ is

M = 1/R = t*/N(t*)

Consider another statistic as an estimator of the parameter μ

M′ = (1/N(t*)) Σ T_i, summed over i = 1, 2, …, N(t*)

where

T_i: The ith observed recurrence interval (i = 1, 2, …, N(t*))

Because

Σ T_i ≤ t*, summed over i = 1, 2, …, N(t*)

the estimator M′ never exceeds M and will tend to be biased low and underestimate μ.
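The relation between the two estimators of the mean recurrence interval can be checked with a small simulation. The sketch below assumes that the second estimator is the average of the recurrence intervals observed within the record, and it uses hypothetical values (a rate of 0.05 per year and a 100-year record); the function name simulate_record and the seed are illustrative choices.

```python
import random

def simulate_record(rate: float, t_star: float, rng: random.Random) -> list[float]:
    """Return the recurrence intervals observed within a record of length t_star."""
    intervals, elapsed = [], 0.0
    while True:
        gap = rng.expovariate(rate)       # exponential recurrence interval
        if elapsed + gap > t_star:
            return intervals              # next landslide falls outside the record
        elapsed += gap
        intervals.append(gap)

rng = random.Random(0)                    # fixed seed, chosen arbitrarily
rate, t_star = 0.05, 100.0                # hypothetical: 20-year mean interval
pairs = []
for _ in range(2000):
    intervals = simulate_record(rate, t_star, rng)
    if intervals:                         # both estimators need at least one landslide
        m = t_star / len(intervals)                   # record length over count
        m_prime = sum(intervals) / len(intervals)     # mean observed interval
        pairs.append((m, m_prime))

print(sum(m for m, _ in pairs) / len(pairs))      # average of t*/N(t*)
print(sum(mp for _, mp in pairs) / len(pairs))    # average of the interval mean
```

In every simulated record the mean of the observed intervals is no larger than the record length divided by the count, because the observed intervals cannot add up to more than the record length.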

Example

Suppose a record of the occurrence of landslides for the past 100 years (t*) showed that 5 landslides (n) occurred.

An estimate of λ would be

r = n/t* = 5/100 = 1/20 = 0.05

Hence, we expect landslides at a rate of 0.05 per year.

An estimate of μ would be

m = 1/r = t*/n = 100/5 = 20

Therefore, we expect the mean recurrence interval to be 20 years.

From either of these estimates of the parameters λ and μ, we could calculate the probability of one or more landslides during a future time t

P{N(t) ≥ 1} = 1 – e^(-rt) = 1 – e^(-t/m)

Using m = 20 and specifying t = 50, we get

P{N(50) ≥ 1} = 1 – e^(-50/20) = 1 – e^(-2.5) = 0.918

There is a 91.8% chance of one or more landslides occurring during the next 50 years.
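The arithmetic of this example can be reproduced as follows. The variable names are illustrative; the numbers are those of the example (5 landslides in 100 years, and a future time of 50 years).

```python
import math

t_star, n = 100.0, 5        # record length (years) and observed landslides
r = n / t_star              # estimate of the rate of occurrence
m = 1.0 / r                 # estimate of the mean recurrence interval
t = 50.0                    # future time of interest (years)
p_exceed = 1.0 - math.exp(-t / m)
print(f"r = {r:.2f} per year, m = {m:.0f} years, P = {p_exceed:.3f}")
```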

Application

Seattle, Washington, has kept records of landslide occurrence from 1909 to present. Records from 1909 to 1997 (t* = 88.4 years) were analyzed to determine landslide density using a moving count circle approach (Coe and others, 2000). This analysis showed that n landslides occurred within each count circle, where n ranged from 0 to 30. The Poisson model was applied to these data, and the results are given in table 1.
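A sketch of how the Poisson model might be applied to each count circle is given below. It uses the record length from the text (88.4 years) and counts within the reported range of 0 to 30, but the future time windows are hypothetical and the output is not a reproduction of table 1. The function name circle_estimates is an assumption made for the example.

```python
import math

T_STAR = 88.4   # years of record, 1909 to 1997 (from the text)

def circle_estimates(n: int, t: float):
    """Mean recurrence interval and exceedance probability for one count circle."""
    if n == 0:
        return math.inf, 0.0          # no recorded landslides: estimated rate is zero
    m = T_STAR / n                    # estimated mean recurrence interval (years)
    return m, 1.0 - math.exp(-t / m)  # Poisson exceedance probability

for n in (0, 1, 5, 30):               # counts spanning the reported range
    for t in (1, 10, 50):             # hypothetical future windows, in years
        m, p = circle_estimates(n, t)
        print(f"n={n:2d} t={t:2d} m={m:6.2f} P={p:.3f}")
```

A circle with no recorded landslides yields an estimated rate of zero, so the sketch reports an infinite mean recurrence interval and a zero exceedance probability for it; that treatment is a convention of this sketch.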

Binomial Model for Number of Landslides

Costa and Baker (1981) give a probability model that they used in flood hazard analyses for modeling the occurrence of floods. The Costa-Baker model was also used by Keaton and others (1988) and Lips and Wieczorek (1990) in modeling the occurrence of debris flows. The Costa-Baker model was given without any derivation as follows, written in the notation of this paper:

P{N(t) ≥ 1} = 1 – (1 – 1/μ)^t

with the mean recurrence interval μ = 1/p and t the number of years

where

p: Probability of a flood in any one year

The Costa-Baker model is a crude model in that it divides time into fixed discrete increments (one-year increments). It is designed for large values of t and μ.

The Costa-Baker model is actually an example of the binomial model.

The binomial model is a discrete-time model consisting of the occurrence of random point-events (landslides) in discrete time; that is, time is partitioned into a series of discrete increments of the same length and within each increment a single point-event (landslide) may or may not occur.

Assumptions of the binomial model:

There are t independent "trials" (relatively small time-increments of fixed length).

Each trial results in a "success" (landslide) or a "failure" (no landslide).

The probability of success, p, remains the same from trial to trial.

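Binomial Distribution -- Probability of n landslides during discrete time t:

P{N(t) = n} = C(t, n) p^n (1 – p)^(t – n),  n = 0, 1, …, t

where C(t, n) = t!/[n!(t – n)!] is the binomial coefficient.

Note that time t is specified, whereas probability p is estimated.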

Definition of recurrence intervals {T_i, i = 1, 2, …, n}:

T₁: Number of time increments until the first landslide

T_i: Number of time increments between the (i – 1)st and the ith landslide for i > 1

Note that n landslides will have n recurrence intervals.

Theorem – Recurrence intervals {T_i, i = 1, 2, …, n} are independent identically distributed geometric random variables having mean recurrence interval (μ) equal to the reciprocal of the probability of success, i.e., μ = 1/p.

For landslides, the mean recurrence interval (μ) is the average time interval between landslides.

Since

p = 1/μ

the larger the mean recurrence interval (μ), the smaller the probability of a landslide in any one year (p). Also, if μ < 1, then p > 1, which is not allowed. Therefore, the binomial model has the restriction μ ≥ 1.

Variance of T_i is

V[T_i] = (1 – p)/p² = μ(μ – 1)

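Probability of a recurrence interval being greater than time t

P{T_i > t} = (1 – p)^t = (1 – 1/μ)^t

Probability of one or more landslides during time t (exceedance probability)

P{N(t) ≥ 1} = 1 – P{N(t) = 0} = 1 – (1 – p)^t = 1 – (1 – 1/μ)^t

where the last expression is the Costa-Baker model.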

Note that 1 – p is raised to the t power under the assumption of independence.

Mean or expected value of N(t) is

E[N(t)] = tp = t/μ

Variance of N(t) is

V[N(t)] = tp(1 – p) = (t/μ)(1 – 1/μ)
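The binomial quantities above can be sketched in the same way as for the Poisson model. The code assumes the standard binomial probability mass function with t trials and success probability p; the function names and the sample values are illustrative assumptions.

```python
import math

def binomial_pmf(n: int, t: int, p: float) -> float:
    """P{N(t) = n}: n landslide increments out of t independent increments."""
    return math.comb(t, n) * p ** n * (1.0 - p) ** (t - n)

def binomial_exceedance(mean_recurrence: float, t: int) -> float:
    """P{N(t) >= 1} = 1 - (1 - 1/mu)^t; requires mu >= 1."""
    if mean_recurrence < 1:
        raise ValueError("the binomial model requires a mean recurrence interval >= 1")
    return 1.0 - (1.0 - 1.0 / mean_recurrence) ** t

# Illustrative values only: mean recurrence interval 20 years, 10 one-year increments.
mu, t = 20.0, 10
print(binomial_pmf(0, t, 1.0 / mu))     # probability of no landslides
print(binomial_exceedance(mu, t))       # probability of one or more landslides
```

The explicit check on the mean recurrence interval reflects the restriction of the binomial model noted above.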

Estimation of the Parameters p and μ in the Binomial Model

A binomial model having an unknown probability p is to be observed for a fixed time t*.

We want to determine a statistic that is a good estimator of the parameter p.

It can be shown that the maximum likelihood estimator of p, denoted by F, is given by the relative frequency of occurrence

F = N(t*)/t*

Recall

t*: Number of one-year increments

N(t*): Number of one-year increments in which a landslide occurred

Mean or expected value of F is

E[F] = E[N(t*)/t*] = t*p/t* = p

(F is an unbiased estimator of p)

Theorem – The statistic F is the unique minimum variance unbiased estimator of p.

Since μ = 1/p, a statistic that is a good estimator of the parameter μ is

M = 1/F = t*/N(t*)

Example

Suppose a record of the occurrence of landslides for the past 100 years (t*) showed that 5 landslides (n) occurred. Assume that the 5 landslides occur in 5 individual one-year increments.

An estimate of p would be

f = n/t* = 5/100 = 1/20 = 0.05

Hence, we expect the probability of a landslide in any one-year increment to be 0.05.

An estimate of μ would be

m* = 1/f = t*/n = 100/5 = 20

Therefore, we expect the mean recurrence interval to be 20 years.

From either of these estimates of the parameters p and μ, we could calculate the probability of one or more landslides during a future time t

P{N(t) ≥ 1} = 1 – (1 – f)^t = 1 – (1 – 1/m)^t

Using m = 20 and specifying t = 50, we get

P{N(50) ≥ 1} = 1 – (1 – 1/20)^50 = 1 – (0.95)^50 = 0.923

There is a 92.3% chance of one or more landslides during the next 50 years.
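The binomial version of the example can be checked with the short sketch below. The variable names are illustrative; the numbers are those of the example.

```python
t_star, n = 100, 5            # one-year increments observed, increments with a landslide
f = n / t_star                # estimate of the probability p
m = 1.0 / f                   # estimate of the mean recurrence interval
t = 50                        # future number of one-year increments
p_exceed = 1.0 - (1.0 - f) ** t
print(f"f = {f:.2f}, m = {m:.0f} years, P = {p_exceed:.3f}")
```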

The Binomial Model is an Approximation to the Poisson Model

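Poisson model

P{N(t) ≥ 1} = 1 – e^(-t/μ) = 1 – (e^(-1/μ))^t

Binomial model

P{N(t) ≥ 1} = 1 – (1 – 1/μ)^t

Now compare e^(-1/μ) and 1 – 1/μ

Recall the exponential series

e^x = 1 + x + x²/2! + x³/3! + …

Let

x = -1/μ

Thus

e^(-1/μ) = 1 – 1/μ + 1/(2!μ²) – 1/(3!μ³) + …

and 1 – 1/μ is equal to the first two terms of the exponential series for e^(-1/μ).

Hence, for large μ,

e^(-1/μ) ≈ 1 – 1/μ

Therefore

1 – e^(-t/μ) ≈ 1 – (1 – 1/μ)^t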

For a numerical comparison between the Poisson model and the binomial model, see tables 2 (Poisson model) and 3 (binomial model), which give the results from the application of each model to a generic data set. The binomial model significantly overestimates the exceedance probabilities for relatively short mean recurrence intervals (a few years) and short periods of time. For example, when the mean recurrence interval is two years and the specified time is one year, the exceedance probability is 50% using the binomial model, whereas it is 39.3% using the Poisson model. The difference between the two models becomes negligible for longer mean recurrence intervals and longer time periods. This difference is significant for hazard assessment because events with short mean recurrence intervals and short time periods (less than 25 years) play a major role in determining the degree of hazard.
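The numerical comparison can be illustrated with a short sketch that evaluates both exceedance probabilities over a small grid. The grid values are hypothetical and are not the entries of tables 2 and 3; only the case of a two-year mean recurrence interval and a one-year period is taken from the text.

```python
import math

def poisson_exceedance(mu: float, t: float) -> float:
    """Poisson model: 1 - e^(-t/mu)."""
    return 1.0 - math.exp(-t / mu)

def binomial_exceedance(mu: float, t: int) -> float:
    """Binomial (Costa-Baker) model: 1 - (1 - 1/mu)^t."""
    return 1.0 - (1.0 - 1.0 / mu) ** t

grid = [(mu, t) for mu in (2, 5, 10, 25, 100) for t in (1, 5, 25)]
for mu, t in grid:
    pp, pb = poisson_exceedance(mu, t), binomial_exceedance(mu, t)
    print(f"mu={mu:3d} t={t:2d} Poisson={pp:.3f} binomial={pb:.3f} diff={pb - pp:.3f}")
```

For a mean recurrence interval of two years and a period of one year, the sketch gives 0.393 for the Poisson model and 0.500 for the binomial model, matching the figures quoted in the text.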

Theorem – The binomial distribution is an approximation to the Poisson distribution.

Given

Poisson distribution with parameters

t: Specified time (number of years)

λ: Rate of occurrence of events (landslides)

Binomial distribution with parameters

ν: Number of "trials" (number of time increments)

p: Probability of occurrence of a "success" (landslide) in any trial

When ν tends to infinity and p tends to zero, while the mean λt = νp remains constant, the binomial distribution approaches the Poisson distribution.

For a proof of this theorem see Walpole and Myers (1989).
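The limiting behavior stated in the theorem can be observed numerically. In the sketch below the mean is held at a hypothetical value of 2.5 while the number of trials grows; the function names and the choice to compare the first eleven probabilities are assumptions made for the illustration, and the sketch is not a proof.

```python
import math

def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability of n events when the expected number is mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

def binomial_pmf(n: int, trials: int, p: float) -> float:
    """Binomial probability of n successes in the given number of trials."""
    return math.comb(trials, n) * p ** n * (1.0 - p) ** (trials - n)

MEAN = 2.5   # expected number of events, held constant (illustrative value)

def max_gap(trials: int) -> float:
    """Largest difference between the two distributions for n = 0, 1, ..., 10."""
    p = MEAN / trials
    return max(abs(binomial_pmf(n, trials, p) - poisson_pmf(n, MEAN))
               for n in range(11))

for trials in (10, 100, 1000, 10000):
    print(trials, max_gap(trials))
```

The printed gap shrinks as the number of trials increases and the probability of success per trial decreases.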

Probability Model for Costs of Landslides

Damage due to landslides will be taken in the form of economic loss as costs in dollars. However, the theory below would also apply to other types of damage as in the case of human loss in deaths.

Given

Y(t) = Σ X_i, summed over i = 1, 2, …, N(t)

where

N(t): Number of landslides that occur during time t in a particular area

N(t) has a Poisson distribution with rate λ.

X_i: The amount of damage (cost) from the ith landslide

The X_i (i = 1, 2, …) are independent and identically distributed random variables which are also independent of N(t).

Y(t): The total amount of damage (costs) from all of the landslides during time t

Then

Mean or expected value of Y(t) is

μ_Y = E[Y(t)] = λtE[X]

Variance of Y(t) is

σ_Y² = V[Y(t)] = λt{V[X] + (E[X])²}
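The mean and variance formulas for the total cost can be checked against a simulation. The sketch below assumes hypothetical inputs: a rate of 2 landslides per unit time, a period of 5 time units, and individual costs drawn uniformly between 0 and 1 (chosen only because the mean and variance of that distribution are known exactly). The function name simulate_total_cost and the seed are illustrative.

```python
import random
import statistics

def simulate_total_cost(rate, t, draw_cost, rng):
    """One realization of the total cost: sum of costs of landslides in [0, t]."""
    total, elapsed = 0.0, rng.expovariate(rate)
    while elapsed <= t:
        total += draw_cost(rng)             # cost of this landslide
        elapsed += rng.expovariate(rate)    # time of the next landslide
    return total

rng = random.Random(1)                      # fixed seed, chosen arbitrarily
rate, t = 2.0, 5.0                          # hypothetical: 10 landslides expected
draw_cost = lambda r: r.random()            # hypothetical costs, uniform on [0, 1)
mean_x, var_x = 0.5, 1.0 / 12.0             # exact moments of that uniform distribution

totals = [simulate_total_cost(rate, t, draw_cost, rng) for _ in range(20000)]
theory_mean = rate * t * mean_x                    # rate * t * E[X]
theory_var = rate * t * (var_x + mean_x ** 2)      # rate * t * {V[X] + (E[X])^2}
print(theory_mean, statistics.fmean(totals))
print(theory_var, statistics.variance(totals))
```

The simulated mean and variance should lie close to the theoretical values, with the remaining difference due to sampling error.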

From an observed number of landslides n, the sample mean cost M_X is an estimator of E[X] where

M_X = (1/n) Σ X_i, summed over i = 1, 2, …, n

The sample variance S_X² is an estimator of V[X] where

S_X² = [1/(n – 1)] Σ (X_i – M_X)², summed over i = 1, 2, …, n

In the case that only observed estimates of the minimum value of X, Min(X), and maximum value of X, Max(X), are available, then an estimator of the standard deviation of X would be

S_X = [Max(X) – Min(X)]/6

The divisor of 6 is based on plus and minus three standard deviations from the mean for a range of 6 standard deviations.
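A minimal sketch of these estimators follows. The cost values are hypothetical, and the sample variance is computed with the usual n – 1 divisor, which is an assumption about the definition intended in the report.

```python
import statistics

costs = [0.42, 0.55, 0.61, 0.38, 0.54]      # hypothetical costs, million dollars

m_x = statistics.fmean(costs)               # sample mean cost
s2_x = statistics.variance(costs)           # sample variance, n - 1 divisor (assumed)
s_x = s2_x ** 0.5                           # sample standard deviation

# Range-based estimate, for use when only the minimum and maximum are available.
s_x_range = (max(costs) - min(costs)) / 6.0

print(m_x, s_x, s_x_range)
```

With only five hypothetical observations the two standard deviation estimates differ noticeably, since a small sample rarely spans the full range of six standard deviations.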

The Pareto probability distribution is possibly a good approximate distribution for the random variable X.

Crovelli (1992) showed that the lognormal probability distribution is a good approximate distribution for the type of random variable Y(t). Hence, the fractiles of Y(t) can be approximated by using the lognormal distribution. As derived in Crovelli (1992), the characterizing parameters of the lognormal distribution, namely mu (μ*) and sigma (σ*), can be calculated from the mean μ_Y and standard deviation σ_Y of a lognormal random variable Y as follows

σ* = {ln[1 + (σ_Y/μ_Y)²]}^1/2

μ* = ln(μ_Y) – (σ*)²/2

Knowing the lognormal characterizing parameters, the lognormal fractiles can be calculated from the formula

F_100α = e^(μ* + z_α σ*)

where Z is a standard normal random variable and P{Z > z_α} = α.

For example, two fractiles of interest in this report are

F95 = e^(μ* – 1.645σ*)

F5 = e^(μ* + 1.645σ*)

There is a 95% chance of exceeding F95, and a 5% chance of exceeding F5. Together, the low value F95 and the high value F5 bound a 90% prediction interval for Y(t), the total costs from landslides during a specified time.
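The fractile calculation can be sketched as follows. The code assumes the standard moment-matching relations for a lognormal distribution (the squared sigma parameter equals the natural log of one plus the squared coefficient of variation, and the mu parameter equals the log of the mean minus half the squared sigma parameter); whether this is the exact form printed in Crovelli (1992) is an assumption. The function names and the sample mean and standard deviation are illustrative.

```python
import math
from statistics import NormalDist

def lognormal_parameters(mean_y: float, sd_y: float):
    """Lognormal mu and sigma matching a given mean and standard deviation."""
    sigma_sq = math.log(1.0 + (sd_y / mean_y) ** 2)
    return math.log(mean_y) - sigma_sq / 2.0, math.sqrt(sigma_sq)

def fractile(mean_y: float, sd_y: float, alpha: float) -> float:
    """Value that is exceeded with probability alpha."""
    mu_star, sigma_star = lognormal_parameters(mean_y, sd_y)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # P{Z > z_alpha} = alpha
    return math.exp(mu_star + z_alpha * sigma_star)

# Hypothetical mean and standard deviation of total cost, for illustration only.
mean_y, sd_y = 10.0, 3.0
print(fractile(mean_y, sd_y, 0.95))   # low value, exceeded with 95% probability
print(fractile(mean_y, sd_y, 0.05))   # high value, exceeded with 5% probability
```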

The reverse problem would be to find the probability of exceeding a specified amount in economic loss due to landslides in a particular area during a specified time. That is, given y_α, find α such that

P{Y(t) > y_α} = α

Normalizing

z_α = [ln(y_α) – μ*]/σ*

Now, from z_α, find α such that P{Z > z_α} = α.
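The reverse problem can be sketched under the same lognormal approximation: the specified amount is normalized by taking its natural log, subtracting the lognormal mu parameter, and dividing by the sigma parameter, and the exceedance probability is then read from the standard normal distribution. The function name and the parameter values below are hypothetical.

```python
import math
from statistics import NormalDist

def probability_of_exceeding(y: float, mu_star: float, sigma_star: float) -> float:
    """P{Y(t) > y} when ln Y(t) is normal with mean mu_star and sd sigma_star."""
    z = (math.log(y) - mu_star) / sigma_star      # normalizing step
    return 1.0 - NormalDist().cdf(z)              # upper-tail probability

# Hypothetical lognormal parameters, for illustration only.
mu_star, sigma_star = 2.0, 0.5
median = math.exp(mu_star)
print(probability_of_exceeding(median, mu_star, sigma_star))         # approximately 0.5
print(probability_of_exceeding(2.0 * median, mu_star, sigma_star))   # smaller than 0.5
```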

The aggregation of the total amounts of damage (costs) from landslides in k areas:

W = Y_1 + Y_2 + … + Y_k

where Y_i: The total amount of damage (costs) from landslides in the ith area.

Mean or expected value of W

E[W] = E[Y_1] + E[Y_2] + … + E[Y_k]

Variance of W under the assumption of independence of the Y_i

V[W] = V[Y_1] + V[Y_2] + … + V[Y_k]

Variance of W under the assumption of perfect positive correlation of the Y_i

V[W] = (S[Y_1] + S[Y_2] + … + S[Y_k])²

Also, under the assumption of perfect positive correlation, the fractiles are additive. That is,

F_100α(W) = F_100α(Y_1) + F_100α(Y_2) + … + F_100α(Y_k)

Rough rather than rigorous mathematical definitions of independence and perfect positive correlation are the following:

Two random variables are independent if they are not related in that knowing the value of one variable does not help in predicting the value of the other variable.

Two random variables are perfectly positively correlated if they are positively related in that a large value of one variable is associated with a large value of the other variable. Also, a small value of one variable is associated with a small value of the other variable.

The normal probability distribution is a good approximate distribution for this type of random variable W because of the well-known Central Limit Theorem of probability theory.
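The two aggregation assumptions can be compared with a short sketch. It relies on two standard facts: variances add for independent variables, and standard deviations add for perfectly positively correlated variables. The function name and the input values for three areas are hypothetical.

```python
import math

def aggregate(means, sds):
    """Mean of the aggregate and its standard deviation under both assumptions."""
    mean_w = sum(means)                                     # means always add
    sd_independent = math.sqrt(sum(s ** 2 for s in sds))    # variances add
    sd_correlated = sum(sds)                                # standard deviations add
    return mean_w, sd_independent, sd_correlated

# Hypothetical means and standard deviations for three areas (million dollars).
print(aggregate([100.0, 40.0, 10.0], [7.0, 3.0, 4.0]))
```

The standard deviation under perfect positive correlation is never smaller than under independence, so the correlated case gives the wider range of aggregate costs.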

Example

Suppose a record of the occurrence of landslides for the past year (t*) showed that 40 landslides (n) occurred.

An estimate of λ would be

r = n/t* = 40/1 = 40

We expect landslides at a rate of 40 per year.

An estimate of μ would be

m = 1/r = t*/n = 1/40 = 0.025

We expect the mean recurrence interval to be 0.025 years.

Suppose that from the 40 observed landslides, the sample mean cost m_X = 0.5 million dollars and the sample standard deviation s_X = 0.1 million dollars.

An estimate of the mean or expected value of Y(t) during the specified time t = 5 years, that is, E[Y(5)], is

m_Y = rt(m_X) = (40)(5)(0.5) = 100

Hence, a mean estimate of the total costs from landslides during the next 5 years is 100 million dollars.

An estimate of the variance of Y(5), V[Y(5)], is

s_Y² = rt(m_X² + s_X²) = (40)(5)[(0.5)² + (0.1)²] = 52

An estimate of the standard deviation of Y(5) is

s_Y = 7.21

Estimates of the characterizing parameters of the lognormal distribution, namely mu (μ*) and sigma (σ*), are

Estimates of the 95th fractile and the 5th fractile of Y(5), namely F95 and F5, are

Therefore, a low estimate of the total costs from landslides during the next 5 years is 88 million dollars. There is a 95% chance of exceeding 88 million dollars. A high estimate of the total costs from landslides during the next 5 years is 112 million dollars. There is a 5% chance of exceeding 112 million dollars.
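The steps of this example can be retraced with the sketch below. The inputs are those of the example; the lognormal parameters are obtained by the standard moment-matching relations, which is an assumption about the exact form used in the report. Under that assumption the fractiles fall within about a million dollars of the 88 and 112 million dollars quoted above, and small differences may reflect rounding.

```python
import math
from statistics import NormalDist

r, t = 40.0, 5.0            # estimated rate (per year) and specified time (years)
m_x, s_x = 0.5, 0.1         # sample mean and standard deviation of cost (million dollars)

m_y = r * t * m_x                                  # estimate of E[Y(5)]
s_y = math.sqrt(r * t * (m_x ** 2 + s_x ** 2))     # estimate of S[Y(5)]

# Lognormal parameters by moment matching (assumed form).
sigma_star = math.sqrt(math.log(1.0 + (s_y / m_y) ** 2))
mu_star = math.log(m_y) - sigma_star ** 2 / 2.0

z = NormalDist().inv_cdf(0.95)                     # about 1.645
f95 = math.exp(mu_star - z * sigma_star)           # exceeded with 95% probability
f5 = math.exp(mu_star + z * sigma_star)            # exceeded with 5% probability
print(f"m_Y = {m_y:.0f}, s_Y = {s_y:.2f}, F95 = {f95:.1f}, F5 = {f5:.1f}")
```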

Application

The direct costs assessed to landslides for each county in the 10-county San Francisco Bay region, California, listed in Godt (1999) will be used to illustrate the probabilistic methodology developed above (see table 4). The damaging landslides were caused by El Nino rainstorms within the winter season 1997-98. An economic landslide hazard assessment of each county is performed as in the case of the previous example. Then the ten counties are aggregated for comparison under the two assumptions: independence and perfect positive correlation.

It is very important to realize that the winter season 1997-98 was an anomalous year and not representative of conditions in a typical year because the occurrence and costs of the landslides were considerably higher than normal. In actual practice a longer period of record of landslides covering multiple years and storms should be used to determine estimates of future landslide occurrence and costs. The main purpose of this application is to illustrate what could be done with the economic probability model that has been developed by using available data required by the model; the future estimates themselves are not meaningful. On the other hand, since scenario planning is becoming widespread among planners, it could be argued that this application might be used to represent a "worst-case scenario," and the future estimates themselves would be meaningful.

Summary

The Poisson model is the most commonly used model for the occurrence of random point-events in time.

The Costa-Baker model is an example of the general binomial model.

The binomial model has the restriction μ ≥ 1.

The binomial model is an approximation to the Poisson model.

Estimation theory is developed for the estimation of the parameters of both of the models.

The exceedance probability of one or more landslides during a specified time is formulated for both models.

An economic probability model is developed for the estimation of the total damage from future landslides in terms of economic loss as costs in dollars.

A summary of probability models for estimation of number and costs of landslides is given in table 5.

Conclusions

The Poisson model is preferred over the binomial model (Costa-Baker model) because the Poisson model is a first-approximation model, and the binomial model is here an approximation of an approximation.

The Poisson model has many useful properties and results that are mathematically tractable.

The theory herein is applicable to many other hazard processes besides landslides, for example, earthquakes, floods, tsunamis, volcanoes, and storms.

Not only are the probability models applicable for many types of natural hazards, but also for many types of damage in addition to economic loss, for example, human loss in deaths.

Acknowledgments

The author wishes to gratefully acknowledge the helpful reviews of Jeffrey A. Coe and Rex L. Baum, both in the Landslide Group of the Geologic Hazards Team at the U.S. Geological Survey.

References

Coe, J.A., Michael, J.A., Crovelli, R.A., and Savage, W.Z., 2000, Preliminary map showing landslide densities, mean recurrence intervals, and exceedance probabilities as determined from historic records, Seattle, Washington: U.S. Geological Survey Open-File Report 00-xxx, in review.

Costa, J.E. and Baker, V.R., 1981, Surficial geology – building with the earth: New York, Wiley & Sons, 498 p.

Crovelli, R.A., 1992, Probabilistic methodology for estimation of undiscovered petroleum resources in play analysis of the United States: Nonrenewable Resources, v. 1, no. 2, p. 153-162.

Godt, J.W., 1999, Maps showing locations of damaging landslides caused by El Nino rainstorms, winter season 1997-98, San Francisco Bay region, California: U.S. Geological Survey pamphlet to accompany miscellaneous field studies maps MF-2325-A-J, 13 p.

Keaton, J.R., Anderson, L.R., and Mathewson, C.C., 1988, Assessing debris flow hazards on alluvial fans in Davis County, Utah, in Fragaszy, R.J., ed., Twenty-fourth Annual Symposium on Engineering Geology and Soils Engineering: Pullman, Washington, Publications and Printing, Washington State University, p. 89-108.

Lips, E.W. and Wieczorek, G.F., 1990, Recurrence of debris flows on an alluvial fan in central Utah, in French, R.H., ed., Hydraulics/Hydrology of Arid Lands (H²AL); Proceedings of the International Symposium: American Society of Civil Engineers, p. 555-560.

Ross, S.M., 1972, Introduction to probability models: New York, Academic Press, Inc., 273 p.

Walpole, R.E. and Myers, R.H., 1989, Probability and statistics for engineers and scientists: New York, Macmillan Publishing Company, 4th ed., 765 p.

Table 5. Summary of probability models for estimation of number and costs of landslides

Poisson Model for Number of Landslides (continuous-time model)

| Random variables | N(t): Number of landslides | T: Recurrence interval |
| --- | --- | --- |
| Probability distributions | Poisson | Exponential |
| Parameters | λ: Rate of occurrence | μ: Mean recurrence interval |
| Means or expected values | E[N(t)] = λt | E[T] = μ = 1/λ |
| Standard deviations | S[N(t)] = (λt)^1/2 | S[T] = μ |
| Exceedance probabilities | P{N(t) ≥ 1} = 1 – e^(-λt) | P{T > t} = e^(-t/μ) |
| Estimators of parameters | R = N(t*)/t* | M = 1/R = t*/N(t*) |

Binomial Model for Number of Landslides (discrete-time model)

| Random variables | N(t): Number of landslides | T: Recurrence interval |
| --- | --- | --- |
| Probability distributions | Binomial | Geometric |
| Parameters | p: Probability of landslide | μ: Mean recurrence interval |
| Means or expected values | E[N(t)] = pt | E[T] = μ = 1/p |
| Standard deviations | S[N(t)] = [p(1 – p)t]^1/2 | S[T] = [μ(μ – 1)]^1/2 |
| Exceedance probabilities | P{N(t) ≥ 1} = 1 – (1 – p)^t | P{T > t} = (1 – 1/μ)^t |
| Estimators of parameters | F = N(t*)/t* | M = 1/F = t*/N(t*) |

Probability Model for Costs of Landslides

| Random variables | X: Cost of landslide | Y(t): Total costs |
| --- | --- | --- |
| Probability distributions | Pareto | Lognormal |
| Means or expected values | E[X] | E[Y(t)] = E[N]E[X] |
| Standard deviations | S[X] | S[Y(t)] = {E[N](S[X])² + (E[X])²(S[N])²}^1/2 |

| Random variable | W: Aggregation of total costs |
| --- | --- |
| Probability distribution | Normal |
| Mean or expected value | E[W] = E[Y_1] + E[Y_2] + … + E[Y_k] |
| Standard deviation | S[W] = {V[Y_1] + V[Y_2] + … + V[Y_k]}^1/2 under independence; S[W] = S[Y_1] + S[Y_2] + … + S[Y_k] under perfect positive correlation |

U.S. Department of the Interior \| U.S. Geological Survey URL: https://pubsdata.usgs.gov/pubs/of/2000/ofr-00-0249/ProbModels.html Last Modified: Fri Jan 11 01:38 EST 2013