North American Data Model Steering Committee
Notes of meeting May 24, 2001, Tuscaloosa, Alabama
Attending:
- Tom Berg (OH)
- Boyan Brodaric (GSC)
- Murray Journeay (GSC)
- Rob Krumm (IL)
- Scott McColloch (WV)
- Steve Richard (AZ) [for Jon Matti, USGS]
- Peter Schweitzer (USGS)
- Dave Soller (USGS)
- Loudon Stanford (ID)
- Jerry Weisenfluh (KY)
Visitors:
- Jim McDonald (OH)
- Peter Davenport (GSC)
- Eric Boisvert (GSC)
- Terry Houlahan (GSC)
Notes:
Introductions, and welcome to new Committee member
Jerry Weisenfluh has replaced Jim Cobb as Committee representative from the Kentucky Geological Survey. We thank both Jim and Jerry for their Committee participation.
Presentation by Peter Davenport (GSC)
Peter provided an update on activities related to building the Canadian Geoscience Knowledge Network (CGKN). The initiative intends for CGKN to become a distributed network of information that assures local ownership of data and agency autonomy. To enable some level of standardization across agencies, data models are under development for metadata, bedrock geology, surficial geology, geochemistry, geophysics, mineral deposits, and XML.
Newly funded efforts include development of a bedrock geologic map database, a surficial geologic map database, and a metadata catalog. For at least the surficial database, a draft science language will be developed and submitted to the SLTT. They intend to iterate toward a common language acceptable to the provinces and GSC. The metadata project is a relatively high-priority effort to put a metadata catalog in place by Spring, 2002.
The Committee expressed enthusiasm for increased GSC and Provincial participation on the Steering Committee. Canadian attendees will address this issue in the coming weeks.
Proposed MOU with XMML project:
Simon Cox has approached Dave with a proposal to establish an MOU between the Committee and his XMML project. [This is an effort within CSIRO (Australia) to develop an XML encoding of exploration and mining data (see item #1 in previous meeting notes).] Dave will investigate the administrative possibility for the Committee to establish an MOU, and will send to the Committee his correspondence with Simon.
Progress reports from the Technical Teams
Tool Development Technical Team (TDTT) -- at this time, tool development is focusing exclusively on refinement of Geomatter II, for use with data model version 4.3 and the Canadian variant commonly referred to as "5.x". Eric reported that a (Win2000) bug in Geomatter II has been fixed. Peter reported that he is now a member of Bruce Johnson's project, and is tasked with identifying and fixing software problems with the (v.4.3) data model implementation in that project. Eric and Peter will be coordinating the software support (including bug fixes) for Geomatter II users of these two data model versions. In the future, the Team may focus on development of other tools (e.g., for data output).
Documentation Technical Team (DTT) -- Rob and Loudon reported that they would like to produce a short document for general, non-technical readers. The document would contain basic information on the value and utility of the data model. They have done some background work on this document, and wanted to get Committee approval before proceeding further. The Committee affirmed that this document would be an important and useful tool for conveying concepts and objectives to colleagues and the public. Rob and Loudon will put together a short PowerPoint file on the subject and send it to Dave, who will ask an editor for a review. The file will then be sent to the Committee for comment. When completed, the file will be available for general use in presentations.
Data Model Design Technical Team (DMDT) -- The majority of DMDT members met the previous day (5/23). Boyan summarized for the Committee the status of their work and then led a discussion to explore what the Team should address in the coming months.
In the past few months, the DMDT has evaluated the "20 queries" exercise initiated by the SLTT, in order to identify common threads among the responses so that a common query language might be constructed. Starting with the many responses to the SLTT exercise, Bruce Johnson, Steve Richard, and Jerry have nearly finished distilling the information into a representative set of questions that users might typically ask a geologic database. Another subset of the DMDT has nearly finished identifying the geologic information commonly requested in the above queries (i.e., the representative set of elements). In mid-July, these representative sets of questions and elements will be submitted to the Committee, to be approved and posted to the Web site. This information is intended to help guide data model development. Regarding development of a geologic query language based on the above work, Jordan Hastings has constructed a formal computer-science representation of the query language; this work will be provided to the TDTT for further consideration.
Since v.4.3 was released more than 30 months ago, various NADM (North American Data Model) variants have arisen. These variants became necessary during the numerous attempts in the various agencies to implement the conceptual data model. How should the Committee and the DMDT address the situation, and assess the utility of v.4.3 and of these variants? A productive discussion ensued.
The Committee noted the long period of time during which NADM variants have been informally developed and recognized. The Committee decided to formally recognize the existence of NADM variants, and to develop a plan to evaluate each of them. It was decided that mechanisms are needed to:
1) inform users that variants exist, that they exist for valid reasons, and that they are a natural part of the evolutionary process of data model development.
2) evaluate the variants and, as needed, formally revise or replace the existing version of NADM.
Item #1 will be done immediately. Boyan will draft a paragraph that addresses these issues. After Committee comment and Boyan's revisions, Peter will post the paragraph to the Committee Web site.
Item #2 will involve the documentation of existing variants, followed by evaluation of each variant. The variants are of two types -- conceptual data models and implementations. Examples of conceptual data models are: Cordlink ("v.5.x"), Steve Richard's work, Loudon Stanford's work, and the National Geologic Map Database (NGMDB) Object-Oriented "Kentucky" prototype. Examples of implementations are: Hydrolink, GASL, the Kentucky and Yellowstone NGMDB prototypes, and Bruce's implementation of v.4.3.
For a variant to be evaluated under this process, the persons responsible for it must contact the DMDT for advice on preparing the documentation. That documentation must address the following:
- statement of intention (e.g., why was the variant developed? Was it intended to be NADM-compliant? If not, what was the source data model?)
- whether it is a conceptual data model or an implementation
- concise summary of concepts and data model, appropriate for a non-technical audience
- list of specific issues, concepts, and details that differ from NADM v.4.3
Each of the variants listed above should be considered for evaluation. After documentation for at least one of these variants has been posted to the Committee Web site, the Committee will issue to the Digital Mapping Techniques '01 attendees a request for similar information on other variants.
By the end of Summer, the documentation and other files and information necessary to evaluate each variant must be submitted to the DMDT. The DMDT will then evaluate this information and, at the Fall, 2001, Committee meeting, summarize each variant and make recommendations for how to proceed. At that meeting, the Committee will decide whether, and how, the DMDT will develop a revised or new data model. The DMDT will present their results at the Spring, 2002, Committee meeting.
Science Language Technical Team (SLTT) -- Steve Richard summarized the Team's activities in Jon Matti's absence. The Team is first developing a standard, hierarchical rock lithology terminology, and will then work on terminology for structures, contacts, etc. The Team has divided itself into four groups to address terminology for igneous, sedimentary, and metamorphic rocks and Quaternary/surficial deposits. Each subgroup will operate semi-autonomously:
- Sedimentary Rocks--Dave Houseknecht and Jon Matti, Subgroup Leaders
- Igneous Rocks:
  - Plutonic Subgroup--Ron Kistler and Doug Morton, Subgroup Leaders
  - Volcanic Subgroup--Steve Ludington and Bob Christiansen, Subgroup Leaders
- Metamorphic Rocks--Wright Horton and Steve Richard, Subgroup Leaders
- Surficial Materials--Ron DiLabio, Dave Miller, Carolyn Olson, and Dave Soller, Subgroup Leaders
As of the NADMSC meeting, draft classification schemes were being circulated in the subgroups for igneous rocks, sedimentary rocks, and metamorphic rocks. The five leaders of the Surficial group (Ron DiLabio, Jon Matti, Dave Miller, Carolyn Olson, and Dave Soller) have held numerous teleconferences in an attempt to set a schedule for a series of subdiscipline-specific workshops (e.g., glacial and taiga deposits, coastal deposits). The leaders have rethought the plan for workshops and now intend to first hold a meeting of the group leaders. At that meeting, the framework for a draft classification will be developed. This framework will then be circulated to the Surficial group for comment. A series of workshops is unlikely unless consensus proves impossible under this revised plan. There is not yet a revised date for submitting the draft classification to the Committee.
The general plan for each subgroup is to develop lists of control-words for the description and naming of geologic materials and geologic structures. Control-words are rigidly defined words whose definitions cannot be violated (sandstone has exactly one definition; monzogranite has exactly one definition; thick-bedded has exactly one definition). Specifically, the SLTT plans to:
- provide formal definition of each control-word (sources: AGI dictionary of geoscience, IUGS plutonic-rock classification, widely-cited geoscience textbooks, etc.)
- develop hierarchical classification of control-words (parent-child relationships, using software to be announced, e.g., Visio2000pro)
- provide all documentation to the Committee, including:
  - definitions of control-terms
  - diagrams of parent-child relations
  - minimal boiler-plate that describes results and places them in the context of NADM
The SLTT will consider developing a thesaurus approach to control terms and their non-controlled equivalents (synonyms, related terms, proxies for control-terms).
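As an illustration only, the control-word plan above (exactly one definition per word, parent-child relations, and a thesaurus that maps non-controlled terms to control-words) could be sketched in Python roughly as follows. The class and method names, the sample terms, the placeholder definitions, and the sample synonym are all hypothetical; none of them are SLTT results.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ControlWord:
    term: str
    definition: str                 # placeholder text, not an SLTT definition
    parent: Optional[str] = None    # None marks the top of the hierarchy


class Vocabulary:
    """Hypothetical container for control-words and their synonyms."""

    def __init__(self) -> None:
        self._words: Dict[str, ControlWord] = {}
        self._thesaurus: Dict[str, str] = {}

    def add(self, term: str, definition: str, parent: Optional[str] = None) -> None:
        # A control-word may be defined only once.
        if term in self._words:
            raise ValueError(f"{term!r} already has a definition")
        if parent is not None and parent not in self._words:
            raise KeyError(f"unknown parent {parent!r}")
        self._words[term] = ControlWord(term, definition, parent)

    def add_synonym(self, synonym: str, term: str) -> None:
        # Map a non-controlled term onto an existing control-word.
        if term not in self._words:
            raise KeyError(f"unknown control-word {term!r}")
        self._thesaurus[synonym.lower()] = term

    def resolve(self, text: str) -> ControlWord:
        # Accept either a control-word or one of its registered synonyms.
        key = text if text in self._words else self._thesaurus.get(text.lower())
        if key is None:
            raise KeyError(f"{text!r} is not a known term")
        return self._words[key]

    def lineage(self, term: str) -> List[str]:
        # Walk parent links up to the root; return the path root-first.
        path: List[str] = []
        current: Optional[str] = term
        while current is not None:
            path.append(current)
            current = self._words[current].parent
        return list(reversed(path))


# Sample entries, for illustration only.
vocab = Vocabulary()
vocab.add("rock", "placeholder definition")
vocab.add("sedimentary rock", "placeholder definition", parent="rock")
vocab.add("sandstone", "placeholder definition", parent="sedimentary rock")
vocab.add_synonym("arenite", "sandstone")

print(vocab.lineage(vocab.resolve("Arenite").term))
```

Keeping synonyms in a separate mapping, rather than as extra control-words, is one way to preserve the single-definition rule while still accepting familiar local terms.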
The subgroups will review draft classifications and develop documents for SLTT review and approval, followed by submission to the NADMSC. Two issues have arisen that require consideration:
- Lithologic classification of some rocks is non-unique. Some provision needs to be considered in the data model to allow for this. No solution was developed for this issue, and further discussion is necessary.
- The evolving rock classifications for metamorphic rocks and igneous rocks suggest that the classification system be constructed as a collection of classifications for the aspects of lithology used to classify different kinds of rocks. The task facing the SLTT subteams for lithologic nomenclature would then include:
  - Determine the aspects of lithology used to classify rocks within their broad class (igneous, sedimentary, metamorphic, surficial)
  - Develop hierarchical classification based on descriptive criteria for each classification aspect
  - Develop a set of standard lithologic terms, along with classification of each term for each lithologic aspect that is part of the definition of that term. A standard lithology would be defined as a combination of specific lithologic properties (aspects).
This scheme impacts the modeling of lithologic classification.
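To make the idea concrete, the following is a minimal Python sketch in which each standard lithologic term is defined as a combination of values on several classification aspects. The aspect names, aspect values, and term definitions are invented for illustration and are not proposed SLTT classifications.

```python
from typing import Dict, List

# Each standard term lists the aspect values it requires.
# All aspect names and values below are hypothetical examples.
STANDARD_TERMS: Dict[str, Dict[str, str]] = {
    "sandstone": {"genesis": "sedimentary", "grain_size": "sand"},
    "mudstone": {"genesis": "sedimentary", "grain_size": "mud"},
    "sedimentary rock": {"genesis": "sedimentary"},
}


def matching_terms(description: Dict[str, str]) -> List[str]:
    """Return every standard term whose required aspect values all appear
    in the description, most specific (most aspects) first."""
    hits = [
        term
        for term, required in STANDARD_TERMS.items()
        if all(description.get(aspect) == value for aspect, value in required.items())
    ]
    return sorted(hits, key=lambda term: (-len(STANDARD_TERMS[term]), term))


# A stored description may carry aspects that no term definition uses.
sample = {"genesis": "sedimentary", "grain_size": "sand", "bedding": "thick-bedded"}
print(matching_terms(sample))  # ['sandstone', 'sedimentary rock']
```

In this sketch a single description can match more than one term, which is one simple way to picture the non-unique classification issue raised above; it is not a proposed solution to it.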
The storage of descriptions in the database independent of the terminology applied will allow thesauri to be developed for local use. Geologists will thus be able to access the database using familiar terminology, in pick lists that are customized to the rocks present in their area of interest. At the same time, confusion about rock classification due to variations in terminology should be minimized. The standard lithologic terms may need to be limited to general terms in order to achieve consensus on their definition.
GSC/USGS Memorandum of Understanding (MOU)
Dave reported that the MOU Annex requested by the USGS and GSC Directors at their meeting in late 1999 was signed in early 2001. Its title is "Development of standardized national geologic map databases." Boyan and Dave are the Annex Correspondents. Later this year, they will prepare for GSC and USGS a progress report that includes Committee activities; before submittal, this report will be circulated to the Committee.
Discussion of Object-Oriented data model developed for NGMDB "Kentucky" prototype
In a brief discussion, Boyan noted that this prototype offered the opportunity to: 1) evaluate data model concepts in an object-oriented environment (because the O-O environment may help overcome perceived deficiencies in the current, relational data model), and 2) implement the O-O data model in a commercial system. The conceptual model developed for this prototype shows clear similarities to the model developed independently by Steve Richard. Differences with the existing NADM model include:
- metadata -- streamlined, feature-level metadata is possible in O-O model
- the concepts used in the O-O model are less legend- and map-centric
- science vocabulary (i.e., the pick lists) is more elegantly handled, in a central location (under "concepts")
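The differences listed above can be pictured with a small Python sketch. It is not the Kentucky prototype's design: the class names, fields, and sample values are assumptions made only to show features that carry their own metadata and refer to a single, central vocabulary of concepts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Concept:
    """One entry in the central vocabulary (hypothetical structure)."""
    concept_id: str
    label: str


@dataclass
class Feature:
    """A mapped feature that carries its own metadata rather than
    inheriting it from a map or legend."""
    feature_id: str
    concept_id: str              # reference into the central vocabulary
    metadata: Dict[str, str]


class GeologicDatabase:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.features: List[Feature] = []

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def add_feature(self, feature: Feature) -> None:
        # Features may only reference concepts that already exist centrally.
        if feature.concept_id not in self.concepts:
            raise KeyError(f"unknown concept: {feature.concept_id}")
        self.features.append(feature)

    def pick_list(self) -> List[str]:
        """Labels offered to users, drawn from the one central vocabulary."""
        return sorted(c.label for c in self.concepts.values())


# Sample data, for illustration only.
db = GeologicDatabase()
db.add_concept(Concept("c1", "sandstone"))
db.add_concept(Concept("c2", "fault"))
db.add_feature(Feature("f1", "c1", {"source": "field mapping", "scale": "1:24,000"}))

print(db.pick_list())  # ['fault', 'sandstone']
```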
Murray Journeay's presentation on Cordlink
Murray described development of the architecture that supports Cordlink's distributed library concept, in particular the development of the NADM variant commonly known as "v.5.x". In the near future, Cordlink will be upgraded from v.1 to v.2. Although the Cordlink prototype has now ended, the concepts and system will continue to be used -- for example, as a regional node for Cordilleran geologic information, and in the above-described Canadian bedrock geology project (the Cordlink system will be further populated with map data produced by that project).
Miscellaneous
FGDC Data Model Proposal -- this proposal, discussed at the previous meeting, has been reviewed by the Committee, and by the FGDC Geologic Data Subcommittee (which coordinates this work at the U.S. Federal level). The proposal has been formally submitted to the FGDC for approval.
Steve expressed some concern regarding potential ambiguities in the Committee and Team missions. In a subsequent email, he will elaborate on these concerns.
Rob noted the importance of periodically assessing our progress and mechanisms for communicating with the public and our colleagues. Certainly, the informational material under development by the Documentation TT will assist in this communication. The Committee may want to consider issuing articles on its work to geoscience journals, and perhaps staffing a booth at the Fall GSA meeting.
Next meeting:
The next meeting will be held during the week of the 2001 GSA Annual Meeting (Nov. 5-8, in Boston). Committee meeting date and time will be determined in early Fall, 2001.