Western Geographic Science Center

U.S. Geological Survey
Open-File Report 2007–1398

Cost-Benefit Analysis of Computer Resources for Machine Learning

By Richard A. Champion, Jr.


[Figure: two maps]
Normalized road and dasymetric density across the San Francisco Bay Area. Road density is a measure of distance to the nearest road for each pixel. Dasymetric density uses census-block and land-cover information to estimate population density per 30-m pixel. Normalization scales values to the range 0 to 1. Population density values near the high end of the scale (near 1 person per 30-m pixel) suggest artifacts generated from inaccuracies in the land-cover information (from figure 1 of the report).
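The caption says only that normalization scales values to the range 0 to 1. One common way to do that is linear min-max rescaling, sketched below. The choice of min-max rescaling and the function name `min_max_normalize` are assumptions for illustration; this page does not say which method the report uses.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumption: plain min-max rescaling; the report may normalize differently.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant layer has no spread to rescale; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical per-pixel distances to the nearest road, in meters.
distances = [0.0, 150.0, 600.0, 1200.0]
print(min_max_normalize(distances))
```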

Machine learning describes pattern-recognition algorithms—in this case, probabilistic neural networks (PNNs). These can be computationally intensive, in part because of the nonlinear optimizer, a numerical process that calibrates the PNN by minimizing a sum of squared errors. This report suggests efficiencies that are expressed as cost and benefit. The cost is computer time needed to calibrate the PNN, and the benefit is goodness-of-fit, how well the PNN learns the pattern in the data. There may be a point of diminishing returns where a further expenditure of computer resources does not produce additional benefits. Sampling is suggested as a cost-reduction strategy. One consideration is how many points to select for calibration and another is the geometric distribution of the points. The data points may be nonuniformly distributed across space, so that sampling at some locations provides additional benefit while sampling at other locations does not. A stratified sampling strategy can be designed to select more points in regions where they reduce the calibration error and fewer points in regions where they do not. Goodness-of-fit tests ensure that the sampling does not introduce bias. This approach is illustrated by statistical experiments for computing correlations between measures of roadless area and population density for the San Francisco Bay Area. The alternative to training efficiencies is to rely on high-performance computer systems. These may require specialized programming and algorithms that are optimized for parallel performance.
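The cost-benefit idea in the abstract can be sketched with a small experiment: calibrate on samples of increasing size, record the computer time (cost), and record the error on held-out data (benefit). The sketch below is not the report's implementation. It assumes a one-dimensional Gaussian-kernel estimator in the style of a PNN, a grid search standing in for the nonlinear optimizer, synthetic data, and equal-width strata. All function names (`make_data`, `predict`, `loo_sse`, `calibrate`, `cost_benefit`, `stratified_sample`) are hypothetical.

```python
import math
import random
import time


def make_data(n, seed):
    """Synthetic (x, y) pairs: a noisy sine on [0, 1). Illustrative only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.2)))
    return data


def predict(x, train, sigma):
    """Gaussian-kernel weighted average of the training targets."""
    num = den = 0.0
    for xi, yi in train:
        w = math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2))
        num += w * yi
        den += w
    if den == 0.0:
        # All weights underflowed; fall back to the plain mean.
        return sum(y for _, y in train) / len(train)
    return num / den


def loo_sse(train, sigma):
    """Leave-one-out sum of squared errors for one smoothing parameter."""
    sse = 0.0
    for i, (xi, yi) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        sse += (yi - predict(xi, rest, sigma)) ** 2
    return sse


def calibrate(train, sigmas):
    """Pick the sigma with the smallest leave-one-out SSE (grid search)."""
    return min(sigmas, key=lambda s: loo_sse(train, s))


def cost_benefit(data, holdout, sizes, sigmas, seed=0):
    """Return (sample size, calibration seconds, holdout MSE) per size."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        sample = rng.sample(data, n)
        start = time.perf_counter()
        sigma = calibrate(sample, sigmas)
        seconds = time.perf_counter() - start
        mse = sum((y - predict(x, sample, sigma)) ** 2
                  for x, y in holdout) / len(holdout)
        rows.append((n, seconds, mse))
    return rows


def stratified_sample(data, n_strata, allocation, seed=0):
    """Draw allocation[k] points from the k-th equal-width stratum of x."""
    rng = random.Random(seed)
    strata = [[] for _ in range(n_strata)]
    for x, y in data:
        k = min(int(x * n_strata), n_strata - 1)
        strata[k].append((x, y))
    sample = []
    for points, count in zip(strata, allocation):
        sample.extend(rng.sample(points, min(count, len(points))))
    return sample


if __name__ == "__main__":
    data = make_data(400, seed=1)
    holdout = make_data(100, seed=2)
    sigmas = [0.05, 0.1, 0.2, 0.4]
    for n, seconds, mse in cost_benefit(data, holdout, [25, 50, 100, 200], sigmas):
        print(f"n={n:4d}  time={seconds:.4f}s  holdout MSE={mse:.4f}")
```

In a run like this, calibration time grows roughly with the square of the sample size because of the leave-one-out loop, so a flattening holdout error marks the point of diminishing returns. A sample from `stratified_sample` can be passed to `calibrate` in place of a simple random sample to compare allocations, with the holdout error serving as a check that the sampling has not introduced bias.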

Download this report as a 14-page PDF file (of2007-1398.pdf; 488 kB)

For questions about the content of this report, contact Richard Champion.

This report is available only on the Web

URL: https://pubs.usgs.gov/of/2007/1398/
Page Contact Information: Michael Diggles
Page Last Modified: January 11, 2008