Arizona Geological Survey
416 W. Congress, #100
Tucson, AZ 85701
Telephone: (520) 770-3500
Fax: (520) 770-3505
Rock hand samples are small relative to our bodies (1-20 cm), and contain a very large number of constituent parts (grains). These parts can be classified into different types characterized by their size, composition, and shape. Hand-sample lithology is described in terms of a set of constituent parts defined by generalized characteristics, and the typical relationships between the types of constituents. A complete lithology description defines the set of constituent part types, the fraction of each type in the whole (modal petrography), and the typical relationships between grains of each type. The set of constituent types may be defined based on one or more criteria involving size, shape and composition. The composition of a constituent type may be a single mineral or another lithologic entity (e.g. clasts in conglomerate, leucosome in migmatite). Description of the relationships between grains includes information about the texture and fabric of the aggregate. Common lithologic terms may specify characteristics of only one aspect of the aggregate: e.g. grain size (sandstone), grain shape (breccia), fabric (schist), or grain composition (hornblendite).
On a "map scale" (1-100 km), geologic objects are large compared to our bodies, and are internally variable. Earth features at map-scale are described in terms of a set of rock bodies defined by generalized characteristics, and the surfaces that bound the rock bodies. A set of necessary and sufficient conditions is required to define the identity of each rock body. Each rock body is bounded by a set of intersecting surfaces that define a volume at a particular geographic location. In many cases, one of the bounding surfaces is the Earth's surface. The other bounding surfaces are referred to as geologic surfaces. These may be depositional, intrusive, gradational, faulted, or polygenetic. Geologic surfaces are typically directed. A depositional or intrusive contact is directional if rock on one side is known to be older than rock on the other side. A fault is directed if the orientation of the slip vector is known. A gradational contact can have a sense defined by the gradient of some physical property. Polygenetic contacts may not have an inherent directionality, or they may have one or more directions inherited from stages of their genesis. Geologic surfaces commonly have properties, for example boundary layer thickness, rock types unique to the boundary (fault rocks, soil profile, basal conglomerate...), and small-scale geometry of the surface (rough, smooth, interleaved, etc.).
This system will need to serve land managers and planners who need information pertinent to regulatory, planning, and development functions; mineral exploration geologists; researchers in search of detailed technical information; and curiosity-driven users from the general public. Many of these users may not be expert geologists, but still need to be able to query the system to obtain information. The underlying data model must be flexible enough to encompass a wide range of earth science information, storing it in such a fashion that it does not become obsolete with advances in geologic science.
As a starting point, it is useful to consider a fantasy view of the product. The ultimate geologic information system would contain a detailed 3-dimensional model of the Earth. Such a description would include:
The ultimate interface to such a system would allow natural language queries couched in technical or nontechnical terms, and respond by providing appropriate natural language answers, data tables, standard format output files, or visualizations (maps, cross sections, 3-D views, etc.). Knowledge of the physical location and structure of the stored data would be unnecessary to the user. Multiple working hypotheses would be stored for regions in which knowledge is incomplete or inconsistent. The origin of any particular fact or interpretation could be traced to its source. The system will play the role presently filled by geologic maps -- providing information in response to queries, checking consistency of data to be introduced into the system, and archiving data for future use. The flexibility of a computer-based system provides for a much more complete description of the Earth in a form that permits more sophisticated analysis, and can be user-customized to meet particular needs.
Meanwhile, back in the real world of incomplete knowledge, limited budgets, and existing computer software/hardware environments, the geologists of today need to lay the groundwork for the data archive and analysis system of the future. Standard engineering procedures for system design begin with a requirements analysis. Without going into detail, the desired geologic information system must: 1) allow geologists to archive earth science data and interpretations in a logically consistent form; 2) track data sources and updates to the data archive, and 3) provide applications that query, retrieve, and display existing geologic information to meet the needs of environmental, engineering, exploration, and research geoscientists. The full information system thus consists of a data repository and a set of applications (interfaces) specific to particular fields of enquiry. The data repository subsystem consists of a data model to organize information, applications (methods) to track updates and maintain data integrity, and the physical data storage implementation. This paper is concerned with the design of the data repository data model for a geologic information system.
The design of the data model begins with development of a conceptual model that describes earth science (the universe of discourse) at a fundamental level using earth science terms and concepts. The purpose of developing a conceptual model is to provide a consistent framework for developing logical and physical models couched in database-system terms and concepts. These guide implementation of the information system in a particular computer environment. It is difficult or impossible to transfer data between information systems based on models with inconsistent underlying concepts. For example, if rocks were thought of only in terms of their whole-rock chemistry in one model, there would be no unique way to convert data stored in that model into one in which rocks were modeled as aggregates of minerals.
The following sections analyze two key sub-domains of earth science, geologic maps and lithology, and propose conceptual models for some aspects of these domains. The appendices include a glossary of terms used, a discussion of the philosophy of conceptual modeling, and a review of some existing data modeling efforts relevant to the development of a geologic information system.
In this discussion, the smallest scale of observation considered is the hand sample -- that is, pieces of rock the average person can hold in their hand and examine with a magnifying glass. Generally this means that aggregates of particles smaller than about 0.1 mm are represented by average properties, and particles larger than about 10 cm are entities. The next scale considered is the outcrop -- the average size of exposed rock that can be examined and described at one time, say 10 cm to 100 m. Map scale is taken to encompass 100 m to 100 km, and is the largest scale of observation considered here. The next larger "spatial domain" might be called province scale, encompassing description of bodies with dimension 100-10,000 km.
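The size ranges above can be written as a small lookup, shown here as a minimal sketch. The function name `observation_scale`, the use of metres, and the treatment of the boundary values (each range includes its lower limit) are assumptions made for illustration; the text does not say which scale a feature of exactly 10 cm or 100 m belongs to.

```python
def observation_scale(size_m: float) -> str:
    """Return the scale of observation for a feature of the given size in metres."""
    if size_m <= 0:
        raise ValueError("size must be positive")
    if size_m < 0.1:              # up to about 10 cm
        return "hand sample"
    if size_m < 100.0:            # 10 cm to 100 m
        return "outcrop"
    if size_m < 100_000.0:        # 100 m to 100 km
        return "map"
    if size_m <= 10_000_000.0:    # 100 km to 10,000 km
        return "province"
    # The text names no scale beyond province scale.
    raise ValueError("larger than any scale named in the text")


print(observation_scale(0.05))     # a 5 cm specimen
print(observation_scale(2_000.0))  # a 2 km rock body
```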
Geologic descriptions can be associated with a point of observation, or can be a generalized description applied to a surface (fault, contact...) or rock volume (rock unit, region on a map, Formation...). For most of the commonly described geologic features, there are systems of classification based on certain characteristic features. Rock and fossil names are examples of such classification systems. A contact between rock bodies may be described as intrusive or depositional, connoting certain features of the contact. Many observations made by geologists in the field are summarized by identifying an observed object as a member of a predefined class. Such descriptive "data" is subjective because assignment to a class depends on the classification system chosen and the experience of the geologist making the observations.
Part of the geologist's work in mapping an area is developing a system of classification for rocks, contacts, and structures that summarizes observations and simplifies the task of describing the variety of things observed. Some of the observed things will match criteria for standard geologic classifications or classifications developed in nearby areas. When something cannot be matched satisfactorily with a known type, a new classification is defined based on observations in the map area. The definition must include a set of necessary and sufficient conditions to assign membership to the class, and place the class within the existing hierarchy of known things. Recording a full description of the classification system used to pigeonhole observations is an essential part of making a geologic map.
The Glossary of Geology (Bates and Jackson, 1987) defines "rock" as an aggregate of one or more minerals, or a body of undifferentiated mineral matter (e.g. glass), or of solid organic matter (e.g. coal). This definition is modified slightly here to define a rock as a consolidated aggregate of constituent parts, or a solid body of undifferentiated mineral or solid organic matter. The constituent parts of a rock may be minerals or other rocks (as in a conglomerate). A simple rock consists of an aggregate of distinct mineral grains (granite). More complex rocks may consist of aggregates of rock fragments (conglomerate, breccia), or mixtures of distinct lithologic components (migmatite, gneiss).
The system of hand-sample description proposed here for use in computer databases is based strictly on features visible in the hand sample that are descriptive and non-genetic. Bates and Jackson (1987) define lithology as the physical character of a rock. In this report the term lithology will be modified by terms indicating the scale of description, and will refer only to rock-volume characteristics. The hand-sample scale classification scheme for Earth materials makes a first order distinction between consolidated and non-consolidated materials to separate rocks from other materials. For rocks, the highest level distinctions are made between aphanitic and phaneritic rocks and between rocks for which the origin can be determined based on features visible in hand sample and rocks of indeterminate origin. Further refinement of the lithologic nomenclature is based on five independent (orthogonal, unrelated) characteristics of the constituent parts, either individually or in combinations. These characteristics include grain size, grain shape, grain sorting, grain composition, and fabric (the relationship between the grains). Examples of a lithologic type based on each of these characteristics are sandstone, breccia, porphyry, hornblendite, and schist. The description of a complex hand sample might include descriptions of several lithologic components, each consisting of distinct assemblages of minerals with distinct fabrics, and a set of relationships between these lithologic components.
Standard lithologic nomenclature is based first on determination of origin -- igneous, sedimentary, or metamorphic. In some cases, particularly for very fine-grained or altered rocks, this determination requires some information about the context of the rock. The restriction to features visible in a hand sample requires descriptive terminology to allow for situations where the rock origin is indeterminate.
In more detail, the first order classifications in the descriptive lithology system are:
1. Degree of consolidation (consolidated or non-consolidated).
Consolidated is taken to mean that a piece of the material can be held in the hand as a single mass, and does not disintegrate into its constituent parts if struck with a hammer. The importance of this distinction is to separate materials for which necessary and sufficient definition may include criteria based on intergranular aggregate properties (e.g. sandstone) from those for which the definition must be based solely on the nature of the constituent particles (e.g. sand). Clearly there is a continuum from non-consolidated to consolidated (e.g. gravel in an active stream to conglomerate, or laterite to underlying granite), but to be useful for a computer database, a material must be one or the other. A non-consolidated deposit may form an outcrop (e.g. a trench to expose a soil profile), and can then be described in terms of outcrop-scale lithology. For hand sample lithologic description, a material is either a rock (consolidated) or a non-consolidated material.
2. Degree to which constituents can be discerned (aphanitic or phaneritic).
Rocks that have discernible constituents can be described in terms of those constituents. If the material is aphanitic (constituents not discernible, i.e. diameter <~0.2 mm), it is homogeneous for our purposes, and belongs to the undifferentiated mineral matter and solid organic matter types mentioned in the definition of rock. Phaneritic rocks are classified based on the nature of the parts that make the whole, and the relationship between the parts.
3. Genetic origin (sedimentary, igneous, metamorphic, composite, anthropogenic, or indeterminate)
The third criterion separates material for which generic names must be used from those for which the standard igneous, sedimentary, and metamorphic rock terminology can be used. Rocks with features that demonstrate crystallization or cooling from a melt are igneous. Rocks with features that demonstrate formation of the rock at low temperatures and pressures at or near the earth's surface are sedimentary. Conceptually, a metamorphic rock has a composite origin, but for consistency with standard practice, metamorphic is considered to have the same rank as igneous, sedimentary, and anthropogenic. The composite origin type includes rocks containing features indicative of a polygenetic origin that are not metamorphic rocks. The composite origin type includes various problematic rock types. Volcaniclastic rocks are commonly not clearly of uniquely sedimentary or igneous origin. Composite rocks also include metasomatized (hydrothermally altered) and weathered rocks that are different enough from their original character to merit a new classification, but are not typically thought of as metamorphic. Anthropogenic Earth materials are those that are the product of human activity. Since this classification scheme is designed to be descriptive, the criteria to distinguish these various types must be based on features observable in a hand specimen. If no such features can be observed, the rock origin is indeterminate.
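The three first-order distinctions listed above can be recorded as three independent fields. The following is a minimal sketch only: the names `Origin` and `first_order_class` are hypothetical, and defaulting the origin to indeterminate when no diagnostic feature is recorded is one reading of the last sentence above.

```python
from enum import Enum
from typing import Optional, Tuple


class Origin(Enum):
    SEDIMENTARY = "sedimentary"
    IGNEOUS = "igneous"
    METAMORPHIC = "metamorphic"
    COMPOSITE = "composite"
    ANTHROPOGENIC = "anthropogenic"
    INDETERMINATE = "indeterminate"


def first_order_class(
    consolidated: bool,
    constituents_discernible: bool,
    origin: Optional[Origin] = None,
) -> Tuple[str, str, str]:
    """Return (material, texture, origin) for a hand sample."""
    # 1. Degree of consolidation: a material must be one or the other.
    material = "rock" if consolidated else "non-consolidated material"
    # 2. Degree to which constituents can be discerned.
    texture = "phaneritic" if constituents_discernible else "aphanitic"
    # 3. Origin: indeterminate unless hand-sample features demonstrate one.
    resolved = origin if origin is not None else Origin.INDETERMINATE
    return material, texture, resolved.value


print(first_order_class(True, True, Origin.IGNEOUS))
print(first_order_class(True, False))
```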
This schema for defining lithology can be extended to describe the Earth at larger scales. An outcrop description consists of a collection of lithologic components (hand-sample scale) and descriptions of the relationships between the components. At the outcrop scale, the orientation of geometric elements of the fabric (in the Earth reference frame) is introduced as a feature of a complete description. The terminology for this scale of observation probably corresponds more closely to familiar geologic usage, because it allows for more genetic nomenclature based on context. A map-scale description consists of a collection of map units (rock bodies) defined based on generalization of outcrop-scale description, and descriptions of the nature, location, and orientation of boundaries between the units (analogous to fabric on smaller scales). This description schema can be applied to stratigraphic sections, formations, groups, supergroups, terranes, etc.
The boundaries that separate rock bodies from other rock bodies are geological surfaces. A geologist studies the characteristics of these surfaces to infer the relationships between rock bodies. In detail, geologic surfaces are normally boundary layers of finite thickness. For example, a fault might have a breccia zone of a particular thickness, and a gradational contact has a characteristic thickness between what is clearly rock A and clearly rock B. The boundary layer thickness limits the precision at which that surface can be located. At contacts between rock bodies, one rock will in general be older than the other. Exceptions include boundaries between metamorphic or alteration zones and facies boundaries. A fault surface has an associated vector field that represents the slip on the fault at each point on the surface. Faults also have bracketing ages for time of movement. Geologic surfaces thus have intrinsic properties of direction, thickness, and time.
A geologic map is a complex knowledge representation. A standard geologic map depicts the location of rock body boundaries as lines that bound closed polygons. The polygons are colored or labeled to indicate the geologic map unit that is found at (or below!) the Earth's surface at points within the polygon. The system used to classify earth materials into the map units is described in accompanying text. The map units are often defined in terms of geologic formations that are only described in a cursory fashion with reference to technical literature that provides more complete information. On many maps the description of the classification system is such that a user who does not have a strong background in geology would have difficulty answering simple questions about the materials that might be expected to be associated with the outcrop of the map unit.
An important concept in analyzing the information contained in geologic maps is the distinction between a description of a rock volume, and a description of a surface. Rock units describe the characteristics of a volume of rock. Surficial geologic units describe the characteristics of the boundary layer between solid earth and atmosphere or hydrosphere. Surficial units may describe the lithology of deposits to a depth that is small relative to the horizontal extent of the map, or may relate to surface morphology, age (as opposed to deposit age), or depositional environment. The colored polygons on a geologic map either represent the intersection of rock volumes with the earth's surface (or map horizon), or represent characteristics of particular regions of the earth's surface (or map horizon). To a geologist interested in the processes and characteristics of the earth surface, the lines on the map represent boundaries of closed regions in the surface. Faults and rock-body contacts on a geologic map represent the intersection of two surfaces (rock body boundary and earth surface), projected onto a 2-D map surface. A geologist interested in the rock bodies that compose the earth beneath the map surface uses the 3-D geometry of these intersection lines, along with measurements of surface orientations noted on the map, to understand the 3-D model of the earth depicted by the map.
This 3-D analysis requires significant prior knowledge of geologic map interpretation. Classification of contact types on geologic maps is generally binary: "fault" or "contact". Geologic reasoning must be used to determine the relationships between rock bodies. For example, consideration of rock type (e.g. granite), and rock age for adjacent rocks allows geologists to identify a contact as a conformable depositional contact, nonconformity, angular unconformity, or intrusive contact. Accompanying text sometimes describes the nature of contacts that do not conform to any simple classification. The geometry of faults and contacts is either encoded on the map using a symbol (dip-direction arrow and dip magnitude), or it must be extracted by analyzing the location of intersection points between the contact trace and topographic contour lines. The geometry of faults and contacts can then be used to determine underlying and overlying relationships not described in the map legend. Once the geometrical relationships of depositional and intrusive contacts are understood, the offset of intersection points of contacts at faults on the two sides of the fault for a series of contacts can be used to constrain possible magnitude and sense of displacement on a fault.
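One common way to extract orientation from intersections of a contact trace with contour lines is the three-point construction, sketched below. It assumes the contact is planar over the area spanned by the three points, that coordinates are (east, north, elevation) in the same length unit, and that dip direction is reported as an azimuth clockwise from north. The function name `plane_attitude` is hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float, float]  # (east, north, elevation), same units


def plane_attitude(p1: Point, p2: Point, p3: Point) -> Tuple[float, Optional[float]]:
    """Return (dip, dip_direction) in degrees for the plane through three points.

    dip_direction is None for a horizontal plane, where it is undefined.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal to the plane is the cross product of two in-plane vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if math.sqrt(nx * nx + ny * ny + nz * nz) == 0.0:
        raise ValueError("points are collinear; no unique plane")
    # Use the upward-pointing normal; its horizontal part points down-dip.
    if nz < 0:
        nx, ny, nz = -nx, -ny, -nz
    horizontal = math.hypot(nx, ny)
    dip = math.degrees(math.atan2(horizontal, nz))
    if horizontal == 0.0:
        return dip, None
    dip_direction = math.degrees(math.atan2(nx, ny)) % 360.0
    return dip, dip_direction


# Three hypothetical points where a contact trace crosses contour lines.
dip, direction = plane_attitude((0, 0, 100), (0, 100, 100), (100, 0, 0))
print(round(dip, 1), round(direction, 1))
```

For a vertical plane the two opposite dip directions are equivalent, and the sketch returns one of them.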
The classification of measurements of orientation of features (bedding, foliation, contacts...) shown on the map is generally indicated by a set of symbols representing standard feature types, with little or no explanation of the criteria used to assign membership to a particular class. In areas of complex structure, much of the orientation data collected in the field is not shown on the map because of space limitations. The printed geologic map thus contains a great deal of information that is not explicitly stated, and probably contains only a subset of the data and observations made by the geologist in the field. This data can be recorded in a usable form in a computerized map database.
A geologic map is a visualization of the geologist's knowledge about the spatial distribution of rocks and the relationships between them in a particular area. The steps to produce this visualization are:
1. Select a map horizon.
This is the surface that contains the physical features to depict. It may be any arbitrary surface within the earth, but most commonly is the actual earth's surface on which geologic observations are made.
2. Define the extent of the region in the map horizon to depict.
3. Select a set of geologic entities to depict. These may be rock volume or surface objects.
4. Determine the intersection of these volumes and surfaces with the map horizon.
If the map horizon is non-planar (e.g. the Earth's surface), this procedure will produce bounded 3-D surfaces and 3-D lines. A surficial geologic map is a special case in which the geologic surfaces to depict are identical with the map horizon, in this case the Earth's surface. Other sorts of surface maps might show the distribution of fault rocks in a fault surface, or the distribution of alteration and mineralization along a vein.
5. Project the 3-D surfaces and lines from earth coordinates on the map horizon to map coordinates in a planar surface.
If the map horizon is planar (e.g. a cross section or mine-level map) this step only requires scaling. This results in a map consisting of lines and polygons that represent the intersection of the geologic entities of interest with the map horizon, or the extent of geologic surface entities on the map horizon. These lines and polygons are cartographic entities.
6. Symbolize each cartographic entity.
Assignment of graphical elements is based on the classification of the geologic volume or surface entity represented by the cartographic entity.
7. Superimpose the geologic map on a base map.
The base map supplies the geographical context necessary to relate the geologic relationships to the real world. If topographic information is included on the base map, the 3-D geometry of geologic lines on the map can be deduced to provide a basis for interpretation of the 3-D geometry of rock bodies.
One of the attractions of a digital geologic database is the possibility of including a record of as much information as there is time or need to record about an area. The geologic data model must facilitate the storage, retrieval, and analysis of information at whatever level of detail it is available -- whether the information is from detailed field notes or a 50-year old reconnaissance map. System-specific software tools based on the model would allow users to produce derivative maps tailored to their needs, showing things like distribution of rock containing more than 50% quartz, distribution of white rocks, distribution of fine-grained rocks, or all steeply dipping faults.
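A derivative-map query of the kind mentioned above, such as "rock containing more than 50% quartz", reduces to a filter over stored modal fractions. The sketch below assumes each map unit carries a dictionary of constituent fractions; the unit names, the fractions, and the function name `units_with_more_than` are all hypothetical illustration values, not data from any map.

```python
from typing import Dict, List


def units_with_more_than(
    units: Dict[str, Dict[str, float]], constituent: str, threshold: float
) -> List[str]:
    """Return names of map units whose fraction of a constituent exceeds threshold."""
    return sorted(
        name
        for name, modes in units.items()
        if modes.get(constituent, 0.0) > threshold
    )


# Hypothetical modal fractions for three hypothetical map units.
example_units = {
    "unit A": {"quartz": 0.65, "feldspar": 0.30},
    "unit B": {"quartz": 0.20, "feldspar": 0.55, "biotite": 0.10},
    "unit C": {"calcite": 0.95},
}

print(units_with_more_than(example_units, "quartz", 0.5))  # ['unit A']
```

The polygons belonging to the selected units would then be symbolized to produce the derivative map.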
Figure 1. Overview of Object-Role Model (ORM) graphical notation and conventions.
Scalar Quantities (Figure 2) are used throughout the model to specify numerical values. The Scalar Quantity concept modeled here allows several representations. The simplest is "quantity has a single numeric value and a measurement unit attribute". Possible more complex representations include a single value with an uncertainty, or a minimum and maximum value with a specified default value to use if a single value representation is required. In all cases the Scalar Quantity has a specified unit of measurement attribute. Subtypes of Scalar Quantity in other Figures are identified by the use of QuantityID as the reference mode for a subtype entity.
Figure 2. Schema for Scalar Quantity.
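The representations of Scalar Quantity described above might be held in a single record type, as in the following minimal sketch. The class name matches the concept, but the field names and the validation rules are assumptions made for illustration, not the schema of Figure 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalarQuantity:
    unit: str                             # every representation carries a unit
    value: Optional[float] = None         # single-value form
    uncertainty: Optional[float] = None   # optional, single-value form only
    minimum: Optional[float] = None       # range form
    maximum: Optional[float] = None       # range form
    default: Optional[float] = None       # range form: value to use when one is needed

    def __post_init__(self) -> None:
        range_fields = (self.minimum, self.maximum, self.default)
        if self.value is not None:
            if any(f is not None for f in range_fields):
                raise ValueError("single-value form must not carry range fields")
        else:
            if any(f is None for f in range_fields):
                raise ValueError("range form needs minimum, maximum and default")
            if not self.minimum <= self.default <= self.maximum:
                raise ValueError("default must lie within [minimum, maximum]")

    def single_value(self) -> float:
        """Return the one number to use when a single-value form is required."""
        return self.value if self.value is not None else self.default


grain_size = ScalarQuantity(unit="mm", value=2.0, uncertainty=0.5)
thickness = ScalarQuantity(unit="m", minimum=1.0, maximum=3.0, default=1.5)
print(grain_size.single_value(), thickness.single_value())
```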
The Fractional Analysis (Figure 3) concept is used to model any situation in which an entity is composed of other entities in certain proportions. Common examples are chemical analyses, grain size-distributions, and modal mineralogy. Constituent types used in Fractional Analyses are specified in a system classification table. Constituents that are used as fractional parts of a described entity must be allowed as components of a Fractional Analysis of the Described Entity. If the quantity that specifies a fraction of some constituent is specified as a range Scalar Quantity, an algorithm is necessary to adjust default values so that the constraint "sum of fractions = 1" can be met. When a constituent is added in the analysis, a check must test that the sum of the minimum value for all the fractions is <= 1, and that the sum of the maximum value is >= 1. These checks ensure that "sum of fractions = 1" is possible. If the sum of the fractions for constituent parts is <1, the difference is automatically classified as an unspecified constituent. Fractional Analysis entities appear in Particular Lithology schema (Figure 5) for Chemical Composition, and in the Constituent Geometry schema for Grain Size Distribution.
Figure 3. Schema for Fractional Analysis.
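The checks described for Fractional Analysis can be sketched as follows. Fractions are assumed to lie between 0 and 1, and ranges are given as (minimum, maximum) pairs. The function names are hypothetical, a small tolerance `EPS` is added to absorb floating-point rounding, and `fit_to_one` is only one simple way to pick values that sum to 1 (it spreads the remainder in proportion to each range width and ignores any stored defaults); the text does not prescribe an algorithm.

```python
import math
from typing import Dict, Tuple

Range = Tuple[float, float]  # (minimum, maximum) fraction
EPS = 1e-9                   # tolerance for floating-point rounding


def sum_to_one_possible(ranges: Dict[str, Range]) -> bool:
    """Check that the sum of minima is <= 1 and the sum of maxima is >= 1."""
    total_min = sum(lo for lo, _ in ranges.values())
    total_max = sum(hi for _, hi in ranges.values())
    return total_min <= 1 + EPS and total_max >= 1 - EPS


def fit_to_one(ranges: Dict[str, Range]) -> Dict[str, float]:
    """Pick one value per constituent, inside its range, so the values sum to 1."""
    if not sum_to_one_possible(ranges):
        raise ValueError("ranges cannot sum to 1")
    total_min = sum(lo for lo, _ in ranges.values())
    total_max = sum(hi for _, hi in ranges.values())
    spread = total_max - total_min
    share = 0.0 if spread < EPS else (1 - total_min) / spread
    return {name: lo + share * (hi - lo) for name, (lo, hi) in ranges.items()}


def add_unspecified(fractions: Dict[str, float]) -> Dict[str, float]:
    """Assign any shortfall below 1 to an 'unspecified' constituent."""
    total = sum(fractions.values())
    if total > 1 + EPS:
        raise ValueError("fractions sum to more than 1")
    result = dict(fractions)
    if total < 1 - EPS:
        result["unspecified"] = 1 - total
    return result


# Hypothetical modal ranges for illustration only.
modes = {"quartz": (0.2, 0.4), "feldspar": (0.5, 0.7), "biotite": (0.0, 0.1)}
print(sum_to_one_possible(modes))
print(fit_to_one(modes))
print(add_unspecified({"quartz": 0.3, "feldspar": 0.6}))
```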
Figure 4. Schema for General Lithology.
The Particular Lithology schema (Figure 5) models the description of a particular rock. The information system should include a set of idealized "typical" Particular Lithology entities associated with each General Lithology to provide default values for characteristics not specified in the General Lithology definition. The Particular Lithology entity will also be used for the description of rock hand samples. A Particular Lithology is modeled as an aggregate of Constituents. The Constituents may be Mineral or other Particular Lithology entities, allowing for recursive description. Each Constituent may be involved in one or more Relationship_within_Whole associations (see Table 1 for examples) with the Particular Lithology. For each role a Constituent plays in the aggregate, a Constituent Geometry can be described (Figure 6). This includes aspects such as grain size, grain shape, and sorting for a particular Rock Constituent playing different roles, e.g. as groundmass or as phenocrysts. Relationships between Constituents in particular roles within the whole aggregate are described by Relationship_between_Constituent associations (Table 2). These associations define the hand-sample scale fabric and structure of the rock. A Mineral or a Particular Lithology may have physical property attributes such as color, density, magnetic susceptibility, sound velocity, and refractive index. Chemical Composition is a subclass of Fractional Analysis (Figure 3), and may be associated with a Mineral or a Particular Lithology.
Figure 5. Schema for Particular Lithology.
Table 1. Examples of Relationship_within_Whole associations:
Forms matrix in
Defines graded beds
Forms phenocrysts in
Forms porphyroblasts in
Table 2. Examples of Relationship_between_Constituent associations:
Forms a rim on
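The recursive aggregate described for Particular Lithology, in which a Constituent may be a Mineral or another Particular Lithology, can be sketched with three small record types. The field names, the `fraction` attribute on each constituent, the role strings other than "forms matrix in", and all numerical values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Mineral:
    name: str


@dataclass
class Constituent:
    material: Union[Mineral, "ParticularLithology"]
    role: str        # a Relationship_within_Whole, e.g. "forms matrix in"
    fraction: float  # fraction of the whole, between 0 and 1


@dataclass
class ParticularLithology:
    name: str
    constituents: List[Constituent] = field(default_factory=list)


def mineral_fractions(rock: ParticularLithology, scale: float = 1.0) -> Dict[str, float]:
    """Flatten nested lithologies into total mineral fractions of the whole."""
    totals: Dict[str, float] = {}
    for part in rock.constituents:
        share = scale * part.fraction
        if isinstance(part.material, Mineral):
            totals[part.material.name] = totals.get(part.material.name, 0.0) + share
        else:
            # Recurse into a constituent that is itself a lithology.
            for name, value in mineral_fractions(part.material, share).items():
                totals[name] = totals.get(name, 0.0) + value
    return totals


# Hypothetical conglomerate: granite clasts in a quartz matrix.
granite = ParticularLithology("granite", [
    Constituent(Mineral("quartz"), "forms grains in", 0.3),
    Constituent(Mineral("feldspar"), "forms grains in", 0.7),
])
conglomerate = ParticularLithology("conglomerate", [
    Constituent(granite, "forms clasts in", 0.6),
    Constituent(Mineral("quartz"), "forms matrix in", 0.4),
])
print(mineral_fractions(conglomerate))
```

The same mineral can appear in more than one role, as quartz does here, which is why the description keeps one Constituent record per role rather than one per mineral.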
Constituent Geometry (Figure 6) models a data structure for describing the geometry of individual constituents in a Particular Lithology. These descriptions are based on terms that may have qualitative or quantitative definitions (e.g. round, subround, euhedral, subhedral, fine-grained) or on free text descriptions referenced by "descID". The Constituent Geometry description has two sub-classes, Grain Geometry and Body Geometry, because different terminology is used when describing individual grains or lithologic components in a mixed rock.
Figure 6. Schema for Constituent Geometry.
The Geologic Surface (Figure 7) schema requires that a Geologic Surface have a type attribute that determines the subclass membership for the surface, and an attribute that indicates if the surface is directed. A characteristic Thickness of the surface may also be defined. Fault surfaces have associated Slip Vectors that record displacements across the fault. A Fault may have several associated slip vectors if displacement is known to have changed over time. Faults and Contacts both have associated Age Ranges. For a Fault, the age range is the time interval during which the fault was active. For active faults, the lower bound of this range is 0. Faults may have more than one associated age range to allow for more than one period of activity. Distinct age ranges may be related to distinct slip vectors to record the full displacement history of a Fault. The age range associated with Contacts represents the time interval that separates rocks on either side of the Contact. Only one age range may be associated with a Contact. Degenerate Volumes are entities used to represent descriptions of a discrete layer or a boundary layer that is too thin at the scale of representation to depict on a map. Examples include soil profiles, dikes, veins, fault rocks, and marker beds. A Degenerate Volume surface is associated with a Rock Unit classification object that describes the lithologic characteristics of the material in the boundary layer. A thickness must be defined for a Degenerate Volume surface.
Figure 7. Schema for Geologic Surface.
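Some of the rules stated for Geologic Surface can be expressed as checks on a record, as in the sketch below. The class and field names are hypothetical, age units are left unspecified because the text does not give them, and only three rules are checked: a Contact has at most one age range, a Degenerate Volume needs a thickness, and a Degenerate Volume needs an associated rock unit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class SurfaceType(Enum):
    FAULT = "fault"
    CONTACT = "contact"
    DEGENERATE_VOLUME = "degenerate volume"


@dataclass
class AgeRange:
    lower: float  # younger bound; 0 for a fault that is still active
    upper: float  # older bound


@dataclass
class GeologicSurface:
    surface_type: SurfaceType
    directed: bool
    thickness: Optional[float] = None
    slip_vectors: List[Tuple[float, float, float]] = field(default_factory=list)
    age_ranges: List[AgeRange] = field(default_factory=list)
    rock_unit: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means none were found."""
        found: List[str] = []
        if self.surface_type is SurfaceType.CONTACT and len(self.age_ranges) > 1:
            found.append("a contact may have only one age range")
        if self.surface_type is SurfaceType.DEGENERATE_VOLUME:
            if self.thickness is None:
                found.append("a degenerate volume needs a thickness")
            if self.rock_unit is None:
                found.append("a degenerate volume needs a rock unit")
        return found


# Hypothetical active fault with two periods of movement.
fault = GeologicSurface(
    SurfaceType.FAULT,
    directed=True,
    slip_vectors=[(10.0, 0.0, -5.0), (0.0, 3.0, 0.0)],
    age_ranges=[AgeRange(0.0, 2.0), AgeRange(10.0, 15.0)],
)
dike = GeologicSurface(SurfaceType.DEGENERATE_VOLUME, directed=False)
print(fault.problems())
print(dike.problems())
```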
Association: A concept that defines what a thing has to do with one or more other things. An association has no independent existence (Angus and Dziulka, 1998). Associations correspond to logical predicates (Halpin, 1995).
Attribute: An inherent characteristic (Merriam-Webster Dictionary, 1999)
Class: A set of things sharing common attributes, defined by a set of criteria (a type). (Angus and Dziulka, 1998).
Collection: A contingent aggregate of things, interpreted as an individual (Allgayer and Franconi, 1994).
Concept: A general idea, derived from specific instances or occurrences (American Heritage Dictionary, 1982). An abstract or generic idea generalized from particular instances (Merriam-Webster Dictionary, 1999)
Domain: The collection of things that may exist in a particular model.
Entity: A thing that has independent, separate, or self-contained existence (Merriam-Webster Dictionary, 1999). An instance of a type that has its own identity and can be distinguished from another instance that meets the definition of the same type. This implies that an entity has an additional property that is its unique identifier.
Instance: A thing that is representative of a type (American Heritage Dictionary, 1982). An instance does not have a unique identity (compare to entity); two instances representative of the same type cannot be distinguished (Angus and Dziulka, 1998).
Individual: A single thing in a domain (Allgayer and Franconi, 1994)
Primitive Concept: A concept that has no internal structure or cannot be defined in terms of necessary and sufficient properties (MacRandal, 1988).
Relationship: A connection between an association and a thing. The relationship is identified (with respect to the association) by the role played by the related thing in the association (Angus and Dziulka, 1998).
Role: The thing that receives or is affected by the action of an association. A concept that indicates in what capacity some thing is involved with some other thing. Roles are used in naming relationships with respect to an association (Angus and Dziulka, 1998). Roles hold information about function.
Specific Thing: An individual occurrence of some thing; an entity (Angus and Dziulka, 1998).
Taxonomy: A superclass/class/subclass hierarchy in which every instance of a child class is a member of the class(es) that are parent of the child.
Thing: An entity, situation, association, or event that may be perceived, known, or imagined.
Type: The semantic distinctions between "type", "class", and "domain" are ambiguous and inconsistently applied in the literature. Type here will be taken to apply to the set of attributes that determine, for any thing, whether that thing is or is not a member of a class. A type defines a domain; a class is a set of things that belong to the domain. The domain is the abstract collection of all things that meet the definition of the type. Instances of types do not have an identity, i.e., two instances of the same type cannot be distinguished (definition distilled from many sources).
Typical Thing: A type based on a particular set of entities, characterized by a subset of the attributes of the real world entity(ies), and/or idealizations of those attributes. Attribute values are assigned, not measured, and are normally specified as a possible range of values (Angus and Dziulka, 1998).
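The distinction drawn above between type, class, instance, and entity can be illustrated with a minimal Python sketch. All names here (Instance, Entity, is_sand_grain, the grain attributes) are hypothetical and chosen only for illustration; the 0.0625-2 mm sand-size range is an assumed membership criterion, not part of the definitions above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Instance:
    """A thing representative of a type: attributes only, no identity."""
    grain_size_mm: float
    composition: str


_next_id = count(1)  # source of unique identifiers (illustrative)


@dataclass(frozen=True)
class Entity:
    """An instance plus a unique identifier, so it can be distinguished."""
    instance: Instance
    uid: int = field(default_factory=lambda: next(_next_id))


def is_sand_grain(thing: Instance) -> bool:
    """A type: the attribute criteria that decide class membership.
    The size limits are an assumption made for this example."""
    return 0.0625 <= thing.grain_size_mm <= 2.0


# The domain is the collection of things that may exist in the model.
domain = [
    Instance(0.5, "quartz"),
    Instance(0.5, "quartz"),
    Instance(30.0, "granite"),
]

# The class is the set of things in the domain that satisfy the type.
sand_class = [thing for thing in domain if is_sand_grain(thing)]

# Two instances with the same attributes cannot be told apart ...
same_as_instances = domain[0] == domain[1]

# ... but wrapping them as entities gives each its own identity.
a, b = Entity(domain[0]), Entity(domain[1])
distinct_as_entities = a != b
```

In this sketch, identity comes entirely from the added identifier: the two quartz grains compare equal as instances and unequal as entities, which is the contrast the Entity and Instance definitions describe.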
A semantic network is formed by defining a set of primitive concepts, or abstractions of fundamental things in the real world. Concepts are the things we can talk about, the epistemological primitives of our thought process. A primitive concept commonly corresponds to a "natural kind" -- something abstracted from the real world, that cannot be defined logically in terms of necessary and sufficient conditions. A partial definition of a primitive concept can include necessary conditions. Derived concepts are defined by logical combination of concepts, or association of a concept with attributes and constraints to identify a subset of the concept. An association is an ordered pair of concepts and a predicate that defines the role of the association. The associations between the parts of a derived concept define the structure of the thing being modeled. A constraint is an association between parts of a derived concept that has a boolean value property whose value depends on properties of the instances of things it involves. For an instance of a concept to be valid, all its constraints must evaluate to true. In a semantic network, the concepts are represented as nodes, and the associations as connections between the nodes.
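A semantic network of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the model developed in this paper: the class names, the "is_a" and "has_part" predicates, and the sandstone grain-size rule are all hypothetical, and constraints are reduced to boolean functions over a dictionary of instance properties.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Association:
    """An ordered pair of concepts plus the predicate naming the role."""
    subject: str
    predicate: str
    obj: str


# A constraint is modeled here as a boolean function of instance properties.
Constraint = Callable[[Dict[str, float]], bool]


@dataclass
class SemanticNetwork:
    concepts: Dict[str, bool] = field(default_factory=dict)  # name -> is primitive
    associations: List[Association] = field(default_factory=list)
    constraints: Dict[str, List[Constraint]] = field(default_factory=dict)

    def add_concept(self, name: str, primitive: bool = True) -> None:
        """Add a node to the network."""
        self.concepts[name] = primitive

    def associate(self, subject: str, predicate: str, obj: str) -> None:
        """Add a directed connection between two existing nodes."""
        if subject not in self.concepts or obj not in self.concepts:
            raise KeyError("both ends of an association must be known concepts")
        self.associations.append(Association(subject, predicate, obj))

    def constrain(self, concept: str, rule: Constraint) -> None:
        """Attach a constraint to a derived concept."""
        self.constraints.setdefault(concept, []).append(rule)

    def related(self, subject: str, predicate: str) -> List[str]:
        """Concepts reached from subject along associations with this predicate."""
        return [a.obj for a in self.associations
                if a.subject == subject and a.predicate == predicate]

    def is_valid(self, concept: str, properties: Dict[str, float]) -> bool:
        """An instance is valid only if every constraint evaluates to true."""
        return all(rule(properties) for rule in self.constraints.get(concept, []))


net = SemanticNetwork()
net.add_concept("rock")                        # primitive concept
net.add_concept("grain")                       # primitive concept
net.add_concept("sandstone", primitive=False)  # derived concept
net.associate("sandstone", "is_a", "rock")
net.associate("sandstone", "has_part", "grain")
# Assumed rule for illustration: sand-size grains, 0.0625 to 2 mm.
net.constrain("sandstone", lambda p: 0.0625 <= p["grain_size_mm"] <= 2.0)

ok = net.is_valid("sandstone", {"grain_size_mm": 0.5})
too_coarse = net.is_valid("sandstone", {"grain_size_mm": 30.0})
```

Here the primitive concepts carry no constraints, so any instance of them passes validation, while the derived concept "sandstone" is defined by its associations and is valid only when its grain-size constraint holds.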
There is widespread interest in methods for developing and expressing data models, driven by efforts to automate various business operations. Recent contributions include XML-Data (http://www.w3.org/TR/1998/NOTE-XML-data/), which is an extension to the Extensible Markup Language (XML) (http://www.w3.org/TR/REC-xml). XML-Data was designed to communicate information about the structure of data over the World Wide Web. This will allow development of applications that can acquire and analyze data sets from a variety of sources without prior knowledge of the data structure underlying each data set. The Unified Modeling Language (UML) (http://www.rational.com/um/) is a system for modeling the structure of data and the operations necessary to utilize the data for a particular purpose. This language was motivated by efforts to provide a standard mechanism for describing complex data management operations, thereby facilitating the development of computer-aided software engineering (CASE) tools. UML is strongly rooted in object-oriented thinking. For a comparison of UML with the Object-Role Modeling approach used here, see Halpin (1998).
Another important effort to develop tools for data modeling has been driven by the International Organization for Standardization (ISO) project 10303, Industrial automation systems and integration, Product data representation and exchange (referred to as the International STandard for the Exchange of Product Data or STEP). As part of this project, a formal information requirements specification language, named EXPRESS, has been developed (Spiby, 1998). EXPRESS and UML appear to provide similar modeling capabilities.
In connection with the ISO 10303 project, a group named EPISTLE (European Process Industries STEP Technical Liaison Executive) has published a document titled "Developing High Quality Data Models" (West, 1996). This document discusses the rationale for a systematic approach to data models, and lays out a set of principles for developing data models that are stable, flexible to changing practices, and extensible to changing needs. These principles include:
The EPISTLE group has also published a Framework document (Angus and Dziulka, 1998) that provides a high-level (meta) model and a set of core constructs that serve as a basis for deriving "high-quality" conceptual models. I have attempted to incorporate the EPISTLE framework into the models presented here.
Significant development of spatial data models in the specific context of earth science has also been necessary in the development of software applications for visualization of 3-D geologic models. The GOCad Research Program was initiated in 1989 by the Computer Science group of the National School of Geology (ENSG) in Nancy, France. The goal of this project is to develop a new computer-aided approach for the modeling of geological objects specifically adapted to Geophysical, Geological and Reservoir Engineering applications (cf. Royer et al., 1996). The project is supported by an international consortium of oil companies and academic institutions. [For more information see the web pages at http://www.ensg.u-nancy.fr/GOCAD/Welcome.html and http://www.ensg.u-nancy.fr/GOCAD/doc/ref_man.html.] The documentation on the web site reports compliance with the POSC standard (see below). 3DMove (http://www.mve.com/p-3dmove.html) is a similar commercial software package developed for visualization of 3-D geologic structure, but no information is available on the underlying data model.
Some problems with the POSC model have been identified. First, the model introduces "named data types" that are outside of the data types defined in the EXPRESS language specification (Spiby, 1998). Thus, public domain software designed to interpret and edit conceptual models written in EXPRESS does not work with the POSC model. Second, the scope of earth science included in the model is limited to aspects relevant to petroleum exploration and production. There are no provisions for the complexities of igneous and metamorphic rocks, or for describing surficial geology, beyond those that overlap with the geology of sedimentary rocks.
The Public Petroleum Data Model (PPDM) is the second data model developed by the petroleum industry to standardize data modeling. Details of this model are available only to members of the PPDM association, and have not been studied by the author. More information is available at http://www.ppdm.org/.
A number of government geological surveys have developed models for internal use. These are documented to varying degrees. Workers from the British Geological Survey (BGS) have published a series of papers on geologic map database models, data dictionaries, and standardization of mapping practices (Laxton and Becken, 1996; Bain and Giles, 1997; Giles, 1997; Allen, 1997). The paper by Bain and Giles (1997) outlines the framework of the data model used by BGS using entity-relationship diagrams, but does not give detailed definitions of the entities used. This paper points out the importance of defining the mapping horizon (GEOLOGICAL LEVEL in their terms) as a proxy for elevation data associated with x,y points on a geologic map.
The Australian Geological Survey Organisation (AGSO) has been involved with geological database development since the 1970s, and is a participant in the POSC and PPDM efforts. Their system has apparently developed from the ground up, evolving into an integrated database system with the OZROX Field Geology Database at its center (Ryburn et al., 1995). OZROX contains information on field locations, outcrop data, measured section and drill hole logs, lithology and sample data, and structural observations. It is linked with databases for geochronology (OZCHRON), whole-rock geochemistry (ROCKCHEM), petrography (PETROGRAPHY), biostratigraphy (STRATDATA), the Australian national petroleum database (PEDIN), a bibliographic database for source citations (AGSOREFS), and standard reference databases for Stratigraphic Names, Geological Provinces and the Geological Time Scale. A published description of the conceptual model underlying OZROX has not been found.
AGSO arranged for the Australian Mineral Industries Research Association (AMIRA) (http://www.amira.com.au/) to contract the development of a standard geoscience data model across the mineral exploration industry. The AMIRA project P431, "Geoscience Data Model," was delivered in April 1998 (Ryburn and O'Donnell, 1998), consisting of entity-relationship diagrams and a data dictionary. The model is described in Miller et al. (1998). It models the domains of geology, geochemistry, drilling, and mineral resource information. The model is designed to provide a standard database framework for mineral exploration companies to file required reports on their exploration activities with government agencies, and to facilitate data exchange between companies. It provides elegant data structures for storing information on boreholes, rock samples, mineral deposits (orebody geometry, commodities, production), geochemical analyses, and relationships between companies, exploration projects, and tenements (mineral leases). The model includes a "profile" construct to provide metadata defining customized implementations of the full model. A conceptually similar "template" construct provides metadata to describe sets of chemical analyses. Detailed geologic descriptions are included through fields for stratigraphic unit, rock type, lithological name, lithological descriptor, and text comments that may be applied to any "fraction" of a mineral deposit, borehole (borehole interval), rock outcrop, or individual sample. Fabric elements may be related to any of these rock entities, and may have relationships defined with other fabric elements. Almost any entity in the model may have one or more free text "observations" associated with it. The model relegates development of systems of terminology to individual implementations of the model. The model does not attempt to describe geologic maps.
The Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia has also developed a model for Geoscientific Spatial Information systems (Lamb et al., 1996; Lamb et al., 1994; Power et al., 1995; http://www.ned.dem.csiro.au/research/visualisation/DMGE/). This is presented as an object-oriented data model that contains classes describing: geometry and topology of 3D geoscience entities, an audit trail for tracking sources of information, an information quality hierarchy, a spatiotemporal database record hierarchy, and a few geoscientific entities. This is a physical model for implementation in a C++ programming environment. CSIRO is a member of the GOCad consortium and their model apparently uses a framework similar to that used by GOCad for spatial and topological data.
The Geological Survey of Canada (GSC) has supported development of the FieldLog software system (http://gis.nrcan.gc.ca/fieldlog/Fieldlog.html) to help geologists manage geologic field data. The data model underlying FieldLog is described in general terms in Brodaric (1997). This model has been used for a number of years by geologists for collection and compilation of field data, and appears to provide a flexible, robust, and expressive structure for collecting, archiving and analyzing the geologic data that support spatial data presented on geologic maps. The GSC has also published a cartographic database standard that is a physical model for geologic maps in an Arc/Info environment (http://www.nrcan.gc.ca/ess/carto/english/reference/GSCCDBS.pdf).
The Geologic Survey of Colombia (INGEOMINAS) has published some information on their model for geoscience data (Murillo, 1995). This data model was designed for INGEOMINAS as a means to integrate database operations over the domains of geology, geophysics, mining, geoenvironmental engineering, samples and wells. The fundamental entities in this data model are Observation Point (any location), the Spatial Reference Plane (spatial coordinate frame for Colombia), and the Mapping Terrain Unit (a closed surface object having characteristics different from surrounding units). No underlying conceptual model is elucidated in the description of this model.
Most participants at this conference are probably familiar with the standard data model proposed by the USGS, AASG and GSC (Johnson et al., 1998). Other efforts in the USGS have produced implicit or explicit data models. Implicit models underlie two significant software programs: AlaCarte, developed for geologic map data entry using Arc/Info (Fitzgibbon, 1991; Wentworth, 1991), and GSMCAD (Williams et al., 1996), developed for PC-based geologic mapping. Descriptions of the conceptual models underlying these packages are buried in the details of the physical implementations. Matti et al. (1997a, 1997b, 1997c) describe a unique physical data model using a linguistic root-suffix coding scheme. This model provides extensive hierarchical word lists that serve as an excellent guide to the concepts that need to be represented in a successful geologic knowledge base.
Acknowledgements.--Thanks to Claire Zucker (Pima Association of Governments) for a careful editorial review. My thinking has been shaped by instructive and stimulating conversations with Boyan Brodaric (GSC), Gary Raines, Bruce Johnson, Ron Wahl, Jon Matti (all USGS), and Tim Orr (AZGS). John Tsotsos (University of Toronto) kindly provided a copy of an unpublished manuscript that introduced me to the engineering/artificial intelligence approach to geoscience knowledge representation.
Allen, P.M., 1997, Standardization of mapping practices in the British Geological Survey: Computers & Geosciences, v. 23, no. 6, p. 609-612.
Allgayer, Jurgen, and Franconi, Enrico, 1994, Collective entities and relations in concept languages, in Lakemeyer, Gerhard, and Nebel, Bernhard, eds., Foundations of Knowledge Representation and Reasoning: Berlin, Springer-Verlag, p. 13-29.
American Heritage Dictionary, 1982, Second College Edition: Boston, Houghton Mifflin Co., 1568 p.
Angus, Chris, and Dziulka, Peter, 1998, EPISTLE Framework V2.0, Issue 1.22. Available at http://www.stepcom.ncl.ac.uk/epistle/data/framedoc.htm; or contact current editor: Peter Dziulka, Keyworth Institute, Department of Mechanical Engineering, University of Leeds, Woodhouse Lane, Leeds LS2 9JT.
Bain, K.A., and Giles, J.R.A., 1997, A standard model for storage of geological map data: Computers & Geosciences, v. 23, no. 6, p. 613-620.
Bates, R.L., and Jackson, J.A., eds., 1987, Glossary of Geology, Third Edition: Alexandria, VA, American Geological Institute, 788 p.
Bernknopf, R.L., Brookshire, D.S., Soller, D.R., McKee, M.J., Sutter, J.F., Matti, J.C., and Campbell, R.H., 1993, Societal Value of Geologic Maps: Washington, D.C., U.S. Geological Survey Circular 1111, 53 p.
Brachman, R.J., and Schmolze, J.G., 1985, An overview of the KL-ONE Knowledge representation system: Cognitive Science, v. 9, p. 171-216.
Brodaric, Boyan, 1997, Field data capture and manipulation using GSC Fieldlog 3.0, in Soller, D.R., ed., Proceedings of a workshop on digital mapping techniques: Methods for geologic map data capture, management, and publication: U.S. Geological Survey Open-File Report 97-269, p. 77-82.
Fitzgibbon, T.T., 1991, ALACARTE Installation and System Manual, Version 1.0: U.S. Geological Survey Open-File Report 91-587B. Available at http://wrgis.wr.usgs.gov/docs/software/software.html.
Giles, J.R.A., Lowe, D.J., and Bain, K.A., 1997, Geological Dictionaries -- Critical elements of every geological database: Computers & Geosciences, v. 23, no. 6, p. 621-626.
Halpin, T.A., 1995, Conceptual Schema and Relational Database Design, Second Edition, Prentice Hall Australia, 547 p.
Halpin, T.A., 1998, UML data models from an ORM Perspective: Part One: Journal of Conceptual Modeling, http://www.inconcept.com/JCM/April1998/print/halpin.html.
Johnson, B.R., Brodaric, Boyan, and Raines, G.L., 1998, Digital Geologic Maps Data Model, v. 4.2: AASG/USGS Data Model Working Group Report. Available at http://ncgmp.usgs.gov/ngmdbproject/standards/datamodel/model42.pdf.
Lamb, Peter, Horowitz, Frank, and Schmidt, H.W., 1996, A Data Model for Geoscientific Spatial Information Systems, version 1.2, Draft Nov. 27, 1996: CSIRO Division of Information Technology. Available at http://www.ned.dem.csiro.au/research/visualisation/DMGE/specs/paper.htm.
Lamb, Peter, Horowitz, Frank, and Schmidt, H.W., 1994, A data model for geoscientific spatial information systems: Version 1.0: Canberra, Australia, CSIRO Division of Information Technology, Technical Report TR-HJ-94-05.
Laxton, J.L., and Becken, K., 1996, The design and implementation of a spatial database for the production of geological maps: Computers & Geosciences, v. 22, p. 723-733.
MacRandal, Damian, 1988, Semantic Networks, in Ringland, G.A., and Duce, D.A., eds., Approaches to Knowledge Representation: an Introduction: New York, John Wiley and Sons, Inc, p. 45-79.
Matti, J.C., Miller, F.K., Powell, R.E., Kennedy, S.A., Bunyapanasarn, T.P., Koukladas, Catherine, Hauser, R.M., and Cosette, P.M., 1997a, Geologic-point attributes for digital geologic-map databases produced by the Southern California Areal Mapping Project (SCAMP), Version 1.0: U.S. Geological Survey Open-File Report 97-859, 37 p.
Matti, J.C., Miller, F.K., Powell, R.E., Kennedy, S.A., and Cosette, P.M., 1997b, Geologic-polygon attributes for digital geologic-map data bases produced by the Southern California Areal Mapping Project (SCAMP), Version 1.0: U.S. Geological Survey Open-File Report 97-860, 42 p.
Matti, J.C., Powell, R.E., Miller, F.K., Kennedy, S.A., Ruppert, K.R., Morton, G.L., and Cosette, P.M., 1997c, Geologic-line attributes for digital geologic-map databases produced by the Southern California Areal Mapping Project (SCAMP), Version 1.0: U.S. Geological Survey Open-File Report 97-861, 103 p.
Merriam-Webster Dictionary, 1999, http://www.m-w.com/dictionary.htm.
Miller, D.R., Hume, R.G., and Parker, A.J., 1998, The Geoscience Data model, AMIRA Project P431 Final Report to Sponsors: Australasian Spatial Data Exchange Centre, ref.#12587.
Murillo, Alvaro, 1995, A GIS Data Model Prototype: ESRI Annual User's Conference Proceedings. Available at: http://www.esri.com/library/userconf/proc95/to050/p017.html.
Open GIS Consortium Technical Committee, 1998, The OpenGIS Guide, Third Edition, edited by Kurt Buehler and Lance McKee: Open GIS Consortium, Wayland, Massachusetts, USA. Available at http://www.opengis.org/techno/guide.htm.
POSC, 1997, POSC Specifications, Version 2.2: Houston, Texas, Petroleum Open Software Corporation, 1 CD ROM. Available at http://www.posc.org/.
Power, W.L., Lamb, Peter, and Horowitz, F.G., 1995, Data transfer standards and data structures for 3D Geological Modelling, in Application of Computers and Operations Research in the Minerals Industries: Carlton, Victoria, Australia, Australasian Institute of Mining and Metallurgy Publications Series No. 4/95, ISBN 1 875776 26 5. Available at http://www.ned.dem.csiro.au/research/visualisation/papers/power.html.
Ringland, G.A., and Duce, D.A., editors, 1988, Approaches to Knowledge Representation: An Introduction: New York, John Wiley and Sons, Inc., 260 p.
Royer, Jean-Jacques, Gerard, Benoit, LeCarlier de Veslud, Christian, and Shtuka, Arben, 1996, 3D Modeling of Complex Natural Objects, in Dubois, J.-E., and Gershon, N., eds., Modeling Complex Data for Creating Information: Berlin, Springer, ISBN 3-540-61069-3, p. 155-168.
Ryburn, Rod, Bond, Lynton, and Hazell, Murray, 1995, Guide to OZROX - AGSO's field geology database, Australian Geological Survey Organisation Record 1995/79. See also http://www.agso.gov.au/information/structure/isd/database/ozrox.html.
Ryburn, Rod, and O'Donnell, Ian, 1998, Towards national geoscience data standards: AGSO Research Newsletter v. 29, Nov. 1998.
Soller, D.R., and Berg, T.M., 1997, The National Geologic Map Database - A Progress Report: Geotimes, v. 42, no. 2. Available at http://ncgmp.usgs.gov/ngmdbproject/reports/geotimes97.html.
Spiby, P., ed., 1998, Product data representation and exchange, Description methods, The EXPRESS language reference manual: ISO Document TC184/SC4/WG11 N48, 296 p., http://www.nist.gov/sc4/wg_qc/wg11/n047/wg11n047.zip
Wentworth, C.M., 1991, ALACARTE User Manual, Version 1.0: U.S. Geological Survey Open-File Report 91-587C. Available at http://wrgis.wr.usgs.gov/docs/software/software.html.
West, M., 1996, Developing High Quality Data Models, Issue 2.1, EPISTLE. Available at http://www.stepcom.ncl.ac.uk/epistle/data/mdlgdocs.htm; or from Matthew West, Shell International Ltd., ISCL/4 Shell Centre, London, SE1 7NA, UK.
Williams, V.S., Selner, G.I., and Taylor, R.B., 1996, GSMCAD, a new computer program that combines the functions of the GSMAP and GSMEDIT programs and is compatible with Microsoft Windows and Arc/Info: U.S. Geological Survey Open-File Report 96-0007, 18 p., one 3.5-inch disk. Available at http://greenwood.cr.usgs.gov/pub/open-file-reports/ofr-96-0007/.
This site is http://pubs.usgs.gov/openfile/of99-386/richard.html