
ESRI ArcView Extension: Point Stat Calc

U.S. Geological Survey Open-File Report 00-302

Online Release 1.0
Online Only

By Matthew Dombroski

Introduction

This discussion assumes that the reader is familiar with the ArcView 3.x commands and usage. Point Stat Calc is an ArcView extension that can be used to calculate summary statistics of data points based on their distribution within a set of polygons. Point Stat Calc can calculate the mean, median, maximum, minimum, count, or nth percentile for any numeric attribute(s) of the points inside each polygon. The results of these calculations are stored in a new table that is linked to the polygon shapefile's attribute table using the ArcView join command.
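
The idea of summarizing point attributes by polygon can be illustrated outside ArcView. The following Python sketch is not the extension's Avenue code; it assumes simple single-ring polygons given as vertex lists and points given as (x, y, value) tuples, and the function names and data structures are hypothetical. It uses a standard ray-casting test to decide which points fall in each polygon, then computes the count, mean, median, minimum, and maximum.

```python
from statistics import mean, median


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def summarize(polygons, points):
    """Return {polygon_id: statistics} for the point values inside each polygon.

    polygons: {polygon_id: [(x, y), ...]}  (hypothetical structure)
    points:   [(x, y, value), ...]         (hypothetical structure)
    """
    results = {}
    for poly_id, ring in polygons.items():
        values = [v for (x, y, v) in points if point_in_polygon(x, y, ring)]
        if values:
            results[poly_id] = {
                "count": len(values),
                "mean": mean(values),
                "median": median(values),
                "min": min(values),
                "max": max(values),
            }
        else:
            # No points fell inside this polygon.
            results[poly_id] = {"count": 0}
    return results
```

Points that lie exactly on a polygon boundary may be classified either way by this test; how ArcView treats such points is not covered here.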

Compatibility

Point Stat Calc was developed and tested using Esri's ArcView versions 3.1 and 3.2 on Windows NT 4.0, Windows 98, and Windows 95 systems. It has not been tested on the Mac or Unix versions of ArcView. The extension is written in ArcView's Avenue scripting language.

Installation

Once downloaded, place the pntstatcalc.avx file into the \Av_gis30\ArcView\Ext32 subdirectory of your ArcView installation's root directory (usually c:\Esri). Start ArcView, then load the extension by choosing File -> Extensions and then checking the box beside "Point Stat Calc." This places a new button on the button bar for each view.

Use

  1. Open ArcView, open or create a project, and open or create a view with at least one point theme and one polygon theme. Load the Point Stat Calc extension by clicking File -> Extensions... -> Point Stat Calc.
  2. In a view, select one point theme and one polygon theme by highlighting their legends (the second one must be shift-selected). All points in the point theme are used in the calculations, regardless of any active selections (moreover, selections are lost after the calculations are complete). However, statistics are gathered only for the currently selected polygons in the polygon theme, or for all polygons if none are selected.
  3. Click the button for Point Stat Calc on the button bar.
  4. Choose one or more variables from the Values field. These are the numeric attributes of the point shapefile. All statistics selected in step 5 will be separately calculated for each variable selected here.
  5. Choose one or more selections from the Calculations field. Each selection here will result in one output variable being calculated for each variable selected in step 4. The "random" selection simply chooses one value at random from the set of points falling within each polygon.
  6. Check the box to include zeros if desired. Be sure that this is not selected for datasets having zeros in place of missing values.
  7. Check the box to include negative values if desired, choosing whether they will be included as negatives or positives. This allows one to use datasets in which negative values signify the symbol "<". If negative values are to be used, several options become available for processing them, including using the values unchanged, or replacing them with an arbitrary multiple of their absolute values.
  8. Check the box to include dummy values if your data set includes a value for "no data." Many types of geophysical datasets and data derived from certain types of grids contain a large, negative number to signify that data are absent.
  9. Click OK.
  10. If you chose to have an Nth Percentile calculated, enter up to five numeric values into the available fields or, alternatively, select a number using the slider. All numbers must be between 0 and 100 and will be rounded to the nearest 0.5.
  11. Provide a unique name for the new table that will contain the results of the calculations. Point Stat Calc provides a default name that is the polygon shapefile's name with "-dat" appended to the end. This table will be joined to the polygon shapefile's attribute table after the calculations have been completed.
  12. Point Stat Calc must join its results table to the attribute table of the polygon shapefile based on a common field. This requires that there exists a field in the polygon attribute table containing a unique value (an index) for each polygon. Click one of the two radio buttons to either (a) use an existing field within your polygon attribute table or (b) create a new index field in the original polygon attribute table. Choice (b) results in the only modification that Point Stat Calc may make to the original shapefiles used in the calculations.
  13. Click OK.
  14. Click Yes to proceed, No to cancel.
  15. Once the calculations have completed, open the polygon shapefile's attribute table to view the results. The calculated fields are furthest to the right. They have been named using a combination of the source field (from the point shapefile) and the calculation performed. Reminder: the results are not permanently attached to the polygon shapefile's attribute table. They are temporarily joined. This join may be broken by examining the polygon shapefile's attribute table, and clicking Table -> Remove all joins. To permanently join the calculated data to the polygon theme's attribute table, select the polygon theme while it is still joined to the results table, and click Theme -> Convert To Shapefile to create a new shapefile.
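
Two of the steps above lend themselves to a short illustration: the rounding of percentile entries to the nearest 0.5 (step 10) and the join of the results table to the polygon attribute table through a unique index field (steps 11, 12, and 15). The Python sketch below is not the extension's Avenue code. The function names and the table representation (lists of dictionaries) are hypothetical, and rounding exact ties upward is an assumption, since the behavior for ties is not stated.

```python
import math


def normalize_percentile(p):
    """Validate a percentile entry and round it to the nearest 0.5.

    Ties (for example 95.25) are rounded upward here; that choice is an
    assumption, not documented behavior of the extension.
    """
    if not 0 <= p <= 100:
        raise ValueError("percentile must be between 0 and 100")
    return math.floor(p * 2 + 0.5) / 2


def has_unique_index(rows, index_field):
    """True if every row carries a distinct value in index_field."""
    seen = [row[index_field] for row in rows]
    return len(seen) == len(set(seen))


def join_results(polygon_rows, result_rows, index_field):
    """Return new rows: each polygon row plus the result fields that share
    its index value. The input tables are left unmodified, mirroring the
    temporary nature of the join."""
    lookup = {row[index_field]: row for row in result_rows}
    joined = []
    for row in polygon_rows:
        merged = dict(row)  # copy, so the original row is not changed
        for name, value in lookup.get(row[index_field], {}).items():
            if name != index_field:
                merged[name] = value  # result fields are appended on the right
        joined.append(merged)
    return joined
```

Because the function returns copies, discarding the joined rows corresponds to removing the join, while saving them to a new file corresponds to converting the joined theme to a new shapefile.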

Example

You have one polygon shapefile (Midwest.shp) of four Midwest states (Illinois, Wisconsin, Iowa, and Minnesota), and one point shapefile (Geochem.shp) containing arsenic, mercury, and lead values in rock samples within these states (data are randomly generated for this example only). You would like to know the average, median, and 95th percentile values as well as the counts for all three elements by state. To do this, complete the following steps:

 

  1. In the active view, select the Midwest.shp and Geochem.shp themes.
  2. Click the Point Stat Calc button on the button bar.
  3. Select Pb_ppm_, As_ppm_, and Hg_ppm_ from the Values field.
  4. Select Average, Median, Count and Nth Percentile from the Calculations field.
  5. Zeroes are not valid values in this data set, so make sure the check-box excludes them.
  6. In this data set, negative values mean "less than." For example, a value of -5 means <5. Check the box to include negative values, and then select the radio button to treat negative values as n multiplied by the absolute value. Enter 0.5 for n, so a value of <5 would be treated as 2.5 for the calculations.
  7. Check the box to ignore dummy values. In this data set, -9999 means "no data." Enter -9999 into the Dummy Value field.
  8. In the Choose Percentiles window, click OK to calculate the 95th percentile.
  9. Accept the default table name of Midwest-dat.shp.
  10. Click on the "Create new index item in tables" radio button. Click OK.
  11. Click Yes.
  12. To view the data, open up Midwest.shp's attribute table. The calculations will be in the fields beginning with Hg, As, and Pb. Your results should match those displayed in the table below.
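
The value handling chosen in steps 5 through 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the extension's Avenue code: the function names are hypothetical, the sample values are made up for this sketch and are not taken from Geochem.shp, and the nearest-rank percentile definition is an assumption because the method Point Stat Calc uses is not described here.

```python
import math
from statistics import mean, median


def clean_values(values, include_zeros=False, negative_factor=0.5, dummy=-9999):
    """Apply the example's settings to a list of raw attribute values."""
    cleaned = []
    for v in values:
        if v == dummy:
            continue  # -9999 marks "no data" and is ignored
        if v == 0 and not include_zeros:
            continue  # zeros are not valid values in this data set
        if v < 0:
            # A negative value means "less than": -5 stands for <5 and is
            # replaced by n * |value|, i.e. 0.5 * 5 = 2.5.
            v = negative_factor * abs(v)
        cleaned.append(v)
    return cleaned


def percentile(values, p):
    """Nearest-rank percentile (an assumed definition, see the text above)."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up values for one polygon, for illustration only.
raw = [-5, 0, -9999, 10, 20, 40]
values = clean_values(raw)      # [2.5, 10, 20, 40]
print(len(values))              # count: 4
print(mean(values))             # average: 18.125
print(median(values))           # median: 15.0
print(percentile(values, 95))   # 95th percentile: 40
```

The dummy value is tested before the negative-value rule; otherwise -9999 would be converted to a large positive number and distort every statistic.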

image of map with data point locations

Results:

Image of sample table with results. Each record has the following columns:  shape, area, state name, PSC index, and four columns for each element. The four columns for each element in this sample table represent the 95th percentile, median, average, and count.

Known Issues

Several factors have been shown to lengthen the processing time of Point Stat Calc on large data sets. Ensure that the point theme is a shapefile rather than an event theme. If it is an event theme, convert it to a shapefile. Also, be sure to place both themes' associated files on the hard disk rather than running them from removable media such as a Jaz or Zip disk.

Exceptionally large data sets (point themes with greater than 100,000 records) may take a considerable amount of time to process. A practical way of shortening the processing time is to remove unnecessary records from [copies of] each shapefile. For example, if you are using a U.S. counties theme for your polygon theme and you do not need calculations performed for any counties with areas greater than 1,800 square miles, do a query on a copy of the data set to select these counties and any other counties that may be removed. Then, use select-by-theme within the theme menu to select all points from a copy of your points theme that fall within these counties. Finally, delete all selected records from both the polygon and point shapefile copies. An alternate means of excluding unneeded polygons from a data set is to select only those polygons that are needed. Point Stat Calc will ignore all unselected polygons if any are selected, but will include all polygons if none are selected. Point Stat Calc will use all points regardless of whether or not they are selected. The only way to exclude points from the calculations is to delete them.
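
The selection rules described in the preceding paragraph can be summarized in a few lines of Python. This is only a sketch of the stated behavior with hypothetical function names and identifiers, not code from the extension.

```python
def polygons_to_process(polygon_ids, selected_ids):
    """If any polygons are selected, use only those; otherwise use all."""
    selected = set(selected_ids)
    if selected:
        return [pid for pid in polygon_ids if pid in selected]
    return list(polygon_ids)


def points_to_process(point_ids, selected_ids):
    """All points are used; any point selection is ignored."""
    return list(point_ids)
```

For example, with polygons 1 through 4 and polygons 2 and 3 selected, only 2 and 3 are processed; with nothing selected, all four are processed. Points are always processed in full, which is why unwanted points must be deleted from a copy of the point shapefile.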

Contact

Please send any questions or comments to:
Jeffrey Grossman
US Geological Survey
954 National Center
Reston, VA 20192-0001
or
jgrossman@usgs.gov

Acknowledgments

Thanks to Yew Yuan, Jeffrey Grossman, Andrew Grosz, and Joseph Duval for help with developing and testing Point Stat Calc.

Download

Download Point Stat Calc
Download Sample Data

Disclaimers

This report is preliminary and has not been reviewed for conformity with U.S. Geological Survey editorial standards or with the North American Stratigraphic Code.

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Although all data and software released with this open file have been used by the USGS, no warranty, expressed or implied, is made by the USGS as to the accuracy of the data and related materials and (or) the functioning of the software.


Eastern Mineral Resources Team
USGS Geologic Information
This page is https://pubs.usgs.gov/openfile/of00-302/
Maintained by Eastern Publications Group
Last modified July 26, 2000 (jmw)