Grammar to Graph—An Approach for Semantic Transformation of Annotations to Triples
Links
- Document: Report (3.49 MB PDF), HTML, XML
- Data Release: USGS data release - Grammar transformations of topographic feature type annotations of the U.S. to structured graph data
Abstract
Data annotation is the process of labeling data to show the outcome that a related data model should predict. In this study, annotation data were transformed into semantic graph triples, mainly for use with the Resource Description Framework (RDF), a type of entity-relationship-attribute data model for graph databases. The transformation of annotation data to semantic graph triples provides complex linguistic meaning with data handling advantages such as reduced data storage needs, improved logical specification of relations between objects, and reusable classes and properties that support logic and inference. A grammar-based framework in graph form supports user questions and queries.
The words defining approximately 334 topographic feature types compiled by the U.S. Geological Survey were tokenized as units of analysis and grouped by part of speech. Their dependency relations were identified for this study using natural language processing libraries. Dependency concepts are used as structured semantic relations among part-of-speech classes. Tokens, units equivalent to words, form instances of classes and were quantified within a tabular output format using PostgreSQL data storage software. Table data were logically aligned as triples following a mapping file and stored with an ontology file using Ontop virtual triplestore software. A grammar ontology schema for the data was synchronized to match queries whose results validated the graph’s structure. The text analysis produced 8 part-of-speech classes of content words for object representations and 4 classes of function words for operational applications. Dependency relations formed 27 ontology properties for topographic subgraph structures. Token occurrences shaped overall ontology salience and formed a lexicon of syntactic terms for subgraph objects and properties. The schema ontology of class and property population shapes formed the lexicon of English terms. SPARQL Protocol and RDF Query Language (SPARQL) was used with the lexicon to conform data to RDF guidelines.
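The step from tagged tokens to triples can be illustrated with a minimal sketch. This is not the report's implementation: the study used natural language processing libraries, PostgreSQL, and Ontop. Here the tokens are hand-annotated, the namespace `http://example.org/grammar#` is hypothetical, the sample phrase is invented for illustration, and the choice to point each dependency property from head token to dependent token is an assumption.

```python
from dataclasses import dataclass

# Hypothetical namespace; the report's actual ontology IRIs are not given here.
NS = "http://example.org/grammar#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Token:
    index: int  # position of the token in the definition
    text: str   # surface form of the word
    pos: str    # part-of-speech class, e.g. NOUN
    dep: str    # dependency relation linking this token to its head
    head: int   # index of the head token (own index for the root)


def token_iri(feature: str, tok: Token) -> str:
    """Build an illustrative IRI for one token of a feature definition."""
    return f"{NS}{feature}_{tok.index}_{tok.text.lower()}"


def to_triples(feature: str, tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Map tokens to (subject, predicate, object) triples.

    Each token becomes an instance of its part-of-speech class, and each
    dependency relation becomes a property from head to dependent
    (an assumed direction, chosen only for this sketch).
    """
    by_index = {t.index: t for t in tokens}
    triples = []
    for t in tokens:
        subj = token_iri(feature, t)
        triples.append((subj, RDF_TYPE, NS + t.pos))
        if t.head != t.index:  # the root token has no incoming relation
            head_iri = token_iri(feature, by_index[t.head])
            triples.append((head_iri, NS + t.dep, subj))
    return triples


# Hand-annotated example phrase (illustrative, not taken from the data release).
EXAMPLE = [
    Token(0, "body", "NOUN", "ROOT", 0),
    Token(1, "of", "ADP", "prep", 0),
    Token(2, "flowing", "VERB", "amod", 3),
    Token(3, "water", "NOUN", "pobj", 1),
]

triples = to_triples("stream", EXAMPLE)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")  # N-Triples style serialization
```

In this sketch the part-of-speech classes play the role of ontology classes and the dependency labels play the role of ontology properties, which mirrors the structure the abstract describes without reproducing the study's actual schema.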
This study confirms the hypothesis that although linguistic logic varies from description logic, its approximation applies to ontology design. Property and query use case patterns extracted from the analysis support queries concerning complex topographic relations and patterns normally embedded within text definitions. The method used in this study could be applied to text forms in other domains, such as survey notes.
Suggested Citation
Varanka, D.E., and Abbott, E., 2025, Grammar to graph—An approach for semantic transformation of annotations to triples: U.S. Geological Survey Scientific Investigations Report 2025–5064, 20 p., https://doi.org/10.3133/sir20255064.
ISSN: 2328-0328 (online)
Table of Contents
- Abstract
- Introduction
- Background
- Methods
- Results
- Discussion
- Summary
- Acknowledgments
- References Cited
- Glossary
- Appendix 1. Topographic Property Patterns
| Publication type | Report |
|---|---|
| Publication Subtype | USGS Numbered Series |
| Title | Grammar to graph—An approach for semantic transformation of annotations to triples |
| Series title | Scientific Investigations Report |
| Series number | 2025-5064 |
| DOI | 10.3133/sir20255064 |
| Publication Date | September 02, 2025 |
| Year Published | 2025 |
| Language | English |
| Publisher | U.S. Geological Survey |
| Publisher location | Reston, VA |
| Contributing office(s) | Center for Geospatial Information Science (CEGIS) |
| Description | Report: vi, 20 p.; Data Release |
| Online Only (Y/N) | Y |