User Guide for R Package RepeatedMeasurements

Karl J. Ellefsen

2020-04-24

Introduction

Statistical modeling of earth science data is often based on the assumption that the measurement error is normally distributed with mean 0. Furthermore, the modeling itself often requires an estimate of the standard deviation of this normal distribution. Both tasks, checking the normality assumption and estimating the standard deviation, can be carried out with repeated measurements of a sample. This R-language package performs these calculations.

Example Data Set

This R package includes a data set that is used to demonstrate the package functions. To view the data, type

ExampleData

The data comprise 62 measurements, which are stored in an R-language vector.

Calculations

An S3 class stores both the repeated measurements and various statistics that summarize them. To construct an object of this class, execute the following script:

oEvaluation <- RM_Evaluation(ExampleData, "Example data (no units)")

The second argument to the function is the axis label for the graphics, which are described next. Variable oEvaluation is the object that instantiates the class. To graphically analyze the repeated measurements, execute the following script:

plot(oEvaluation)

Figure 1. Graphs showing the analysis of 62 repeated measurements. A, Dot plot of the measurements. B, Quantile-quantile plot of measurements. The diagonal red line indicates equality between the quantiles of the measurements and the quantiles of the normal distribution.
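
The S3 mechanism behind the constructor and the generic calls can be illustrated with a minimal base-R sketch. The constructor new_rm_sketch, the class name "rm_sketch", and the simulated vector x_sim are hypothetical names introduced only for this illustration; they are not part of the package, and the package's RM_Evaluation function may store different fields.

```r
# Hypothetical constructor for illustration only; this is NOT the
# package's RM_Evaluation() and may differ from it in every detail.
new_rm_sketch <- function(x, axis_label) {
  stopifnot(is.numeric(x), length(x) >= 3, is.character(axis_label))
  obj <- list(
    measurements = x,           # the repeated measurements
    axis_label   = axis_label,  # label reused by any plotting method
    n            = length(x),
    mean         = mean(x),
    sd           = sd(x)
  )
  class(obj) <- "rm_sketch"     # the class attribute drives S3 dispatch
  obj
}

# summary() dispatches to this method for objects of class "rm_sketch".
summary.rm_sketch <- function(object, ...) {
  cat("Number of measurements:", object$n, "\n")
  cat("Mean:", signif(object$mean, 6), "\n")
  cat("Standard deviation:", signif(object$sd, 6), "\n")
  invisible(object)
}

set.seed(1)
# Simulated stand-in for a set of repeated measurements (not ExampleData).
x_sim <- rnorm(62, mean = -4.14, sd = 0.03)

o_sketch <- new_rm_sketch(x_sim, "Simulated data (no units)")
summary(o_sketch)
```

Because the statistics are computed once in the constructor, every method that receives the object can reuse them without recalculation.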


Figure 1A shows the distribution of the measurements. By default, the distribution is shown as a dot plot, which usually is appropriate for small data sets (for example, data sets with fewer than 150 measurements). The distribution also may be shown as a histogram by changing function argument plot_type. A histogram usually is appropriate for moderate-size to large data sets (for example, data sets with more than 150 measurements).
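
The choice between the two displays can be sketched in base R as follows. The helpers choose_plot_type and plot_distribution, the binning rule, and the simulated data are assumptions made for this illustration; the package's own plotting code and the values accepted by plot_type may differ.

```r
set.seed(1)
# Simulated stand-in for a set of repeated measurements (not ExampleData).
x_sim <- rnorm(62, mean = -4.14, sd = 0.03)

# Hypothetical rule of thumb taken from the text: dot plot for data sets
# with fewer than about 150 measurements, histogram otherwise.
choose_plot_type <- function(x, threshold = 150) {
  if (length(x) < threshold) "dotplot" else "histogram"
}

# Hypothetical plotting helper built on base graphics.
plot_distribution <- function(x, axis_label, type = choose_plot_type(x)) {
  if (type == "dotplot") {
    # Round to a common bin width so that nearby values stack vertically.
    bin_width <- diff(range(x)) / 30
    binned <- round(x / bin_width) * bin_width
    stripchart(binned, method = "stack", offset = 0.5, pch = 16,
               xlab = axis_label)
  } else {
    hist(x, main = "", xlab = axis_label)
  }
  invisible(type)
}

used_type <- plot_distribution(x_sim, "Simulated data (no units)")
```

With 62 measurements the helper selects the dot plot; passing type = "histogram" overrides that choice.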

Figure 1B shows the repeated measurements as a quantile-quantile plot, for which the theoretical distribution is a normal distribution. The mean and the standard deviation of this normal distribution are, respectively, the sample mean and the sample standard deviation of the repeated measurements. If all (or nearly all) plot symbols are near the red line, then a normal distribution might be an appropriate representation of the repeated measurements. The author would interpret the quantile-quantile plot in Figure 1B in this way.
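
A quantile-quantile plot of this kind can be approximated in base R with the sketch below. It assumes plotting positions from ppoints and uses simulated data; the package may compute the quantiles differently.

```r
set.seed(1)
# Simulated stand-in for a set of repeated measurements (not ExampleData).
x_sim <- rnorm(62, mean = -4.14, sd = 0.03)
n <- length(x_sim)

# Quantiles of a normal distribution whose mean and standard deviation
# equal the sample mean and the sample standard deviation.
theoretical <- qnorm(ppoints(n), mean = mean(x_sim), sd = sd(x_sim))

# Sorted measurements are the sample quantiles.
observed <- sort(x_sim)

plot(theoretical, observed, pch = 16,
     xlab = "Quantiles of the normal distribution",
     ylab = "Quantiles of the measurements")

# Line of equality: points near it are consistent with normality.
abline(a = 0, b = 1, col = "red")
```

Because the theoretical quantiles are already scaled by the sample mean and standard deviation, the reference line has intercept 0 and slope 1.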

To print the summary statistics, execute the following script:

summary(oEvaluation)

The following information is printed on the R console:

###############################################################
Summary
Number of measurements: 62
Range: -4.25497 -4.05757
Mean: -4.14005
Standard deviation: 0.0323435
Median: -4.13451
IQR: 0.0339671
P-value for the Shapiro-Wilks test of normality: 0.0158376
###############################################################

The p-value from the Shapiro-Wilk test (about 0.016) is below the conventional significance level of 0.05, which suggests that the repeated measurements are not normally distributed. In the author’s experience with different sets of repeated measurements, such low p-values are common. In this case, the inference based on the p-value is inconsistent with the interpretation of the quantile-quantile plot. Thus, the user must decide whether a normal distribution is an adequate representation of the repeated measurements. If so, then the standard deviation of the repeated measurements is assumed to be the standard deviation of the measurement error.
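
For comparison, the same kinds of statistics can be computed directly with base-R functions, as in the sketch below. The names stats_sketch and sigma_error are hypothetical, the data are simulated, and the package may use different definitions (for example, for the interquartile range), so the numbers will not match the output above.

```r
set.seed(1)
# Simulated stand-in for a set of repeated measurements (not ExampleData).
x_sim <- rnorm(62, mean = -4.14, sd = 0.03)

# Summary statistics from base R and the stats package.
stats_sketch <- list(
  n       = length(x_sim),
  range   = range(x_sim),
  mean    = mean(x_sim),
  sd      = sd(x_sim),
  median  = median(x_sim),
  iqr     = IQR(x_sim),                  # default quantile definition
  p_value = shapiro.test(x_sim)$p.value  # Shapiro-Wilk test of normality
)
str(stats_sketch)

# If the user judges the normal distribution adequate, the sample standard
# deviation serves as the estimate of the measurement-error standard deviation.
sigma_error <- stats_sketch$sd
```

The decision to accept sigma_error rests on both the p-value and the quantile-quantile plot, not on the p-value alone.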