Data Model Design Team meeting 23 May 2001, Tuscaloosa AL

NADM Data Model Design Team (DMDT) Meeting Notes

May 23, 2001, Tuscaloosa, Alabama

Executive Summary

Attendees:

Éric Boisvert (GSC)
Boyan Brodaric (GSC)
Peter Davenport (GSC)
Jordan Hastings (UC-Santa Barbara)
Terry Houlahan (GSC)
Murray Journeay (GSC)
Jim McDonald (Ohio GS)
Steve Richard (Arizona GS)
Peter Schweitzer (USGS)
Ron Wahl (USGS)
Jerry Weisenfluh (Kentucky GS)

Introductions and confirmation of agenda
Revisions to Reno'00 meeting notes
Discussion: data model variants and implementations
A mechanism to post NADM-related data models and implementations on the NADM web site will be established in order to record NADM-related activities and to provide a basis for NADM evolution. All contributions will be welcome, provided they (1) supply brief summary documentation and diagrams, and (2) develop, in conjunction with DMDT, a short comparison of the contribution with NADM 4.3. An initial list of contributions will be developed and posted, followed by a public solicitation for more contributions.
DMDT Progress Reports
1. Use-cases-Steve Richard, Jerry Weisenfluh
  A requirements document is well advanced and will be completed by July 15.
2. Geologic syntax-Jordan Hastings
  A draft geologic map query language was designed for possible use in tools.
3. Science Language Team-Steve Richard
  Development of a common lithologic vocabulary is well under way. Common rock names and their associated descriptions are being identified. Some unresolved data model issues have arisen from this work.
Other Reports
NADM-related activities in Canada-Peter Davenport
Several existing and new Canadian initiatives are utilizing NADM.
Geolibraries and NADM-Murray Journeay
NADM facilitates knowledge construction and representation in geolibraries.
Web-based NADM database interoperability-Eric Boisvert
HTML-based interoperability of NADM databases has been successfully prototyped.
Object-oriented NADM for the US Nat'l Map Db-Boyan Brodaric
An object-oriented NADM variant has been prototyped in GE SmallWorld.
XMML-Boyan Brodaric for Simon Cox (CSIRO)
An Australian-based XML encoding for geoscience data is well underway.
Discussion: DMDT tasks and future activities
1. Complete the requirements report by July 15 and submit to NADMSC.
2. Post initial data model variants and solicit other contributions by GSA'01.
3. Develop an updated NADM design by DMT'02, using (1) and (2) above.

Detailed Notes

Introductions and confirmation of agenda
Revisions to Reno meeting notes
Discussion: data model variants and implementations
A mechanism to post NADM-related data models and implementations on the NADM web site will be established in order to record NADM-related activities and to provide a basis for NADM evolution. All contributions will be welcome, provided they (1) supply brief summary documentation and diagrams, and (2) develop, in conjunction with DMDT, a short comparison of the contribution with NADM 4.3. DMDT will develop a template for the comparison document. An initial list of contributions will be developed and posted, followed by a public solicitation for more contributions. Each contribution will generate a discussion thread on the NADM web site; Peter Schweitzer will help post contributions and adapt the current web site. Possible initial contributions include:
- NADM 4.3 implementations,
- the CordLink 5.2 variant and its implementations (CordLink, HydroLink, GASL),
- the NGMDB object-oriented variant and its SmallWorld implementation,
- the Yellowstone implementation,
- an Oracle implementation,
- the Idaho variant and implementation,
- the Arizona variant and implementation.

DMDT Progress Reports

Use-case group-Steve Richard, Jerry Weisenfluh

In order to collect requirements for a geologic spatial database, geologists from the USGS and from state geological surveys around the country were asked to submit lists of 20 questions they would like to be able to answer using such a system.

Activity

The questions were compiled by Jonathan Matti and organized into a hierarchy of 84 categories. Bruce Johnson reviewed the queries and classified them into groups according to the degree to which they could be resolved using the data model 4.3 architecture of Johnson et al. [1998]. These groups were:

1	Queries that can be answered with data in the current model
2	Queries that can be answered with current model, if the user re-classifies rock units
3	Queries that could be answered by data in the current model in conjunction with additional data themes
4	Queries best answered with additional attributes or minor modifications to the current data model
5	Queries that require a separate application +/- additional data (3)
6	Queries that require data that is not normally available (10)
7	Queries that are not within the purview of the NADM (14)
8	Unclear queries (4)

Within groups 1-5, the queries were classified into one of 48 topical categories, with some hierarchical arrangement of the categories.

Jerry Weisenfluh (JAW) developed an Excel spreadsheet to categorize the queries. Lists of the concepts, query forms, classifications, descriptions and spatial concepts needed to answer the queries were compiled from the query list. The spreadsheet includes 566 queries, some of which are compound. Each query carries Bruce Johnson's classification and a classification by JAW that identifies the kind of query by the type of activity required. Table 1 lists the classes used.

Table 1. Jerry Weisenfluh's query classes

`SQ`	Simple query: Find spatial objects according to criteria from a single property
`CQ`	Compound query: Find spatial objects according to criteria from more than one property
`CALC`	Calculation: Perform a calculation on a set of spatial objects
`MD`	Metadata: Return metadata information pertaining to a set of objects
`SA`	Spatial analysis: Evaluate a question by performing a spatial comparison
`MC`	Map classification: Reclassify map objects in order to generalize or differentiate
`CF`	Complex function
`AM`	Ambiguous query
`??`	Not quite sure what the user wants
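To illustrate how each spreadsheet row pairs Bruce Johnson's resolvability group with a JAW class from Table 1, the following minimal Python sketch (not the actual spreadsheet) models a query record and tallies queries by (group, class). The sample query texts are hypothetical, and the validation rules are assumptions added for the sketch.

```python
from collections import Counter
from dataclasses import dataclass

# JAW query classes, from Table 1.
JAW_CLASSES = {
    "SQ": "Simple query",
    "CQ": "Compound query",
    "CALC": "Calculation",
    "MD": "Metadata",
    "SA": "Spatial analysis",
    "MC": "Map classification",
    "CF": "Complex function",
    "AM": "Ambiguous query",
    "??": "Not quite sure what the user wants",
}


@dataclass(frozen=True)
class Query:
    text: str
    johnson_group: int  # Bruce Johnson's groups 1-8
    jaw_class: str      # a key of JAW_CLASSES

    def __post_init__(self):
        # Reject codes outside the two classification schemes.
        if not 1 <= self.johnson_group <= 8:
            raise ValueError(f"unknown Johnson group {self.johnson_group}")
        if self.jaw_class not in JAW_CLASSES:
            raise ValueError(f"unknown JAW class {self.jaw_class!r}")


def tally(queries):
    """Count queries per (Johnson group, JAW class) pair."""
    return Counter((q.johnson_group, q.jaw_class) for q in queries)


# Hypothetical rows, not taken from the actual spreadsheet.
sample = [
    Query("Which map units contain limestone?", 1, "SQ"),
    Query("Where are Cretaceous granites exposed?", 1, "CQ"),
    Query("What is the total area of mapped volcanic rocks?", 1, "CALC"),
]
for (group, cls), n in sorted(tally(sample).items()):
    print(f"group {group}, {JAW_CLASSES[cls]}: {n}")
```

Keeping both schemes on one record makes cross-tabulation (for example, how many group 1 queries are compound) a one-line count.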

Stephen Richard (SMR) transferred Jonathan Matti's compilation of queries (20_queries_master_1.pdf) into a Microsoft Access 2000 database constructed to allow multiple categories to be related to each query. The queries were then reviewed one by one and analyzed into their component categories, which classify each query by the sort of information required to answer it. After this analysis, the categorized queries were reviewed to produce a distilled list of queries that typify the kinds of information requests. At the same time, a list of the classifications, descriptions, relationships and operations required to address the queries was compiled. These lists were presented to the DMDT at the meeting.
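The Access database is not described in detail here. A minimal sketch of the many-to-many design it implies, with one query related to any number of categories through a junction table, is shown below in SQLite via Python's standard `sqlite3` module; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query (
    query_id INTEGER PRIMARY KEY,
    text     TEXT NOT NULL
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (query, category) pairing, so a query can
-- carry any number of categories and a category can span many queries.
CREATE TABLE query_category (
    query_id    INTEGER NOT NULL REFERENCES query(query_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (query_id, category_id)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO query VALUES (1, 'Show all Jurassic sandstone units')")
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "geologic age"), (2, "lithology")])
conn.executemany("INSERT INTO query_category VALUES (?, ?)", [(1, 1), (1, 2)])

# All categories attached to query 1.
rows = conn.execute("""
    SELECT c.name FROM category c
    JOIN query_category qc ON qc.category_id = c.category_id
    WHERE qc.query_id = 1
    ORDER BY c.name
""").fetchall()
print([name for (name,) in rows])  # ['geologic age', 'lithology']
```

The composite primary key on the junction table prevents the same category from being attached to a query twice.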

Summary of SMR analysis: 848 queries in total, 16 of which SMR generated during the analysis. 209 of the queries in the JAW table are not identical to queries in the SMR table because one or the other analyst edited the query text. 496 queries were analyzed into categories based on the sort of information required to answer them; in 29 of these cases the query was ambiguous, and its intent had to be substantially interpreted before it could be analyzed. The remaining 352 queries were treated as duplicates for the purposes of this analysis. By the end of the analysis, 92 categories (of varying consistency) had been identified.

Summary

At this point we have four semi-independent analyses of the queries, distilled to varying degrees. The list distributed by SMR at the technical team meeting will be circulated sequentially (in the order JAW, Canadians, Jim McDonald, Ron Wahl/Jordan Hastings, Peter Schweitzer, Bruce Johnson, BMB/SMR) to all committee members for review, and should be compared against the analyses made by Weisenfluh, Johnson, and Matti. The review will remove duplication and add any omitted query types, information elements or operations, with the goal of keeping the list as brief as possible while remaining complete. The final list will include:

- Typical queries
- Information elements required to address queries:
  1. Descriptions
  2. Classifications
  3. Relationships
- Operations required of the database

Time line

Circulation of the document to be completed by June 30; SMR will coordinate the circulation. July 15: submit to NADMSC with the completed summary attached.

Plan

These lists will serve as requirements for a NADM database. JAW and SMR (and anyone else who wants to contribute) will prepare a few pages of explanatory material as an introduction to the lists, and the text, the lists, and the complete list of queries will be presented to the NADM steering committee by July 15'01 as a recommended requirements document, providing criteria for evaluating present and future data models.

Geologic syntax-Jordan Hastings
A preliminary syntax for the SLTT queries was developed and described in BNF notation. The syntax could be applied well in user interfaces for querying geologic maps.
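The syntax itself is not reproduced in these notes. Purely as a hypothetical illustration of how a BNF-described syntax can back a query interface, the sketch below defines a toy grammar (given in the comments; it is not the DMDT syntax) and a small hand-written parser that turns a query string into attribute/value conditions.

```python
import re

# Toy grammar (hypothetical, NOT the syntax presented at the meeting):
#   query     ::= condition { "and" condition }
#   condition ::= ATTRIBUTE "=" VALUE
#   ATTRIBUTE ::= identifier, e.g. lithology, age
#   VALUE     ::= double-quoted string
TOKEN_RE = re.compile(r'\s*(?:(?P<str>"[^"]*")|(?P<op>=)|(?P<word>[A-Za-z_]\w*))')


def tokenize(text):
    """Split a query string into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "word" and value == "and":
            kind = "and"
        elif kind == "str":
            value = value[1:-1]  # drop the surrounding quotes
        tokens.append((kind, value))
        pos = m.end()
    return tokens


def parse(text):
    """Parse a query into a list of (attribute, value) conditions."""
    tokens = tokenize(text)
    conditions, i = [], 0

    def expect(kind):
        nonlocal i
        if i >= len(tokens) or tokens[i][0] != kind:
            raise SyntaxError(f"expected {kind} at token {i}")
        value = tokens[i][1]
        i += 1
        return value

    while True:
        attribute = expect("word")   # ATTRIBUTE
        expect("op")                 # "="
        value = expect("str")        # VALUE
        conditions.append((attribute, value))
        if i == len(tokens):
            return conditions
        expect("and")                # { "and" condition }


print(parse('lithology = "granite" and age = "Cretaceous"'))
```

A user interface could build such strings from menus, or validate typed queries, before translating the conditions into database queries.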
Science Language Team-Steve Richard
Two unresolved data model issues have arisen from SLTT work: (1) the lithologic description of some rocks is non-unique; and (2) multiple rock names may be assigned to the same rock description.

The study of classifications for metamorphic and igneous rocks by SLTT suggests that the rock name vocabulary requires two components: (1) one or more hierarchical arrangements of rock names, and (2) a catalog of common rock descriptions that the names refer to. The rock descriptions would consist of salient attributes (e.g. fabric, texture, composition), with each attribute drawing its values from a (hierarchical) list of appropriate terms. The SLTT working groups would then be tasked to:
1. Determine the attributes (such as fabric and texture) used to classify the various types of rock (such as igneous, sedimentary and metamorphic rocks).
2. Develop a hierarchical list of terms for each attribute.
3. Determine common rock descriptions, each consisting of some combination of attribute values.
4. Develop a thesaurus that relates commonly used rock names to the common rock descriptions.
This scheme is beneficial because normalizing rock descriptions, and segregating them from rock names, permits specific naming conventions to be accommodated while minimizing confusion about the meaning of rock names, since their definitions are clearly stated. The approach is challenged by the possibility that agreement on many common rock descriptions may not be reached (i.e. the non-uniqueness of rock descriptions).
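The two-component scheme can be sketched as data structures. In the minimal Python sketch below, attribute terms form small hierarchies, common rock descriptions are sets of attribute values, and a thesaurus maps rock names to descriptions, so several names can point at one description (issue (2) above). All terms, description IDs and name-to-description assignments are simplified, hypothetical examples, not SLTT vocabulary.

```python
from dataclasses import dataclass

# Attribute term hierarchies: term -> broader term (None at the top).
# Hypothetical, simplified terms; not SLTT vocabulary.
ATTRIBUTE_TERMS = {
    "texture": {"phaneritic": None, "coarse-grained": "phaneritic",
                "aphanitic": None},
    "composition": {"felsic": None, "granitic": "felsic", "mafic": None},
}


def ancestors(attribute, term):
    """Walk up an attribute's term hierarchy, nearest broader term first."""
    chain, parent = [], ATTRIBUTE_TERMS[attribute][term]
    while parent is not None:
        chain.append(parent)
        parent = ATTRIBUTE_TERMS[attribute][parent]
    return chain


@dataclass(frozen=True)
class RockDescription:
    """A common rock description: a set of (attribute, term) values."""
    desc_id: str
    values: frozenset

    def __post_init__(self):
        # Every value must come from that attribute's controlled term list.
        for attribute, term in self.values:
            if term not in ATTRIBUTE_TERMS.get(attribute, {}):
                raise ValueError(f"{term!r} is not a {attribute} term")


DESCRIPTIONS = {
    d.desc_id: d
    for d in [
        RockDescription("D1", frozenset({("texture", "phaneritic"),
                                         ("composition", "felsic")})),
        RockDescription("D2", frozenset({("texture", "aphanitic"),
                                         ("composition", "felsic")})),
    ]
}

# Thesaurus: rock name -> description id. Two names share D2 here,
# mirroring the case where several names refer to one description.
THESAURUS = {"granite": "D1", "rhyolite": "D2", "felsite": "D2"}


def describe(rock_name):
    return DESCRIPTIONS[THESAURUS[rock_name]]


def names_for(desc_id):
    return sorted(name for name, d in THESAURUS.items() if d == desc_id)


print(names_for("D2"))
```

Because names only point at descriptions, a local naming convention can be added as new thesaurus entries without touching the descriptions themselves.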

Other Reports
1. NADM-related activities in Canada-Peter Davenport
  Background
  
  The Federal (GSC), Provincial and Territorial surveys all want to capitalize on the Internet as a vehicle for raising public awareness of geoscience and for disseminating geoscience data and information to both traditional and new audiences. Their association, the National Geological Surveys Committee (NGSC), has endorsed a collaborative initiative, the Canadian Geoscience Knowledge Network (CGKN), to share the development effort this requires and to provide a consistent interface to the surveys' individual information holdings.
  
  Guidelines for development of CGKN
  - Each agency manages its own data holdings
  - CGKN will build on the existing data infrastructure of each agency
  - Each agency will determine its own participation rate
  - Discipline specific data models will be developed for CGKN to facilitate interoperability between agencies
  - Data models, including science language, will be developed to international standards where available
  Priority data types
  - Metadata for digital data sets (FGDC subset)
  - Bedrock geology (NADM)
  - Surficial Geology (NADM)
  - Geochemistry
  - Geophysics (potential field initially)
  - Mineral Deposits
  Current Projects
  - Metadata; Federal co-leader James Rupert
  - Bedrock Geology; Federal co-leader Peter Davenport
  - Surficial Geology; Federal co-leader Ron DiLabio
  - Geochemistry; Federal co-leader Andy Moore
  - Geophysics; Federal co-leader Joan Tod
  - Mineral Deposits; leader tba
  - ESS (GSC) Data Warehouse; leader Jodie Francis
  Metadata
  - Funding in place from Federal Targeted Geoscience Initiative
  - M3CAT software developed and distributed to provincial/territorial surveys and GSC offices to enter FGDC compliant metadata
  - A minimal subset of fields defined as compulsory; agencies may opt to include additional information
  - Population of databases (dbms software up to individual agencies) underway
  - Target for completion of digital data catalog: Spring 2002
  Bedrock Geology
  - Increasing interest in digital geology maps and databases, but many mappers are skeptical of the whole process
  - Need for some convincing examples to demonstrate the benefits, and tools and clear instructions to allow agencies to adopt the idea of geological map databases
  - Project funded to simultaneously develop tools and databases using NADM in five geologically diverse regions (BC, Yukon, Nunavut, Labrador and Newfoundland); additional funding is also requested.
  - Tools will include loaders for population of NADM databases from existing flat files and databases, Geomatter for organizing data and information within the NADM databases, and Internet viewer(s) to allow access to the databases
  - The databases would be distributed with agencies owning/managing their own data
  - Modifications to NADM will be required to support this approach
  - The results of the SLTT will be used as standards for lithology (assuming availability)
  - For each region a comprehensive lithostratigraphic/lithodemic schema will be required; it is envisaged that this will extend from the fundamental lithostratigraphic units (i.e. group, formation, member) to high-level regional units such as geological province, sub-province, etc.
  Surficial Geology
  - Situation similar to bedrock geology
  - Project funded to develop standards, infrastructure and tools for an on-line surficial geology database; additional funds have been requested. Aspects of this project include:
    1. A review of existing science language and preparation of a hierarchical taxonomy as a contribution to SLTT
    2. Development of standard template for digital compilation of surficial geology maps
    3. Definition of a stewardship policy for taxonomy
    4. Adoption, modification and enhancement of tools for data management, data input and web distribution for surficial geological data
    5. Development of test database(s) using representative maps to test both data model and tools
  Geochemistry
  
  This project has been underway for two years to develop a "standard" data model for geochemical data (see http://geochem.gsc.nrcan.gc.ca/). It continues this year to develop web-accessible tools to facilitate the input of Canadian geochemical data into databases based on the data model. The ultimate goal is to make the geochemical data holdings of NGSC agencies available on line.
  
  Geophysics and Mineral deposits
  
  Preliminary work is underway to assemble inter-agency project teams to develop requirements for data models.
  
  XML
  
  A proposal has been submitted for funding to develop XML transfer standards for mineral occurrence, geochemical and geophysical data. If approved, a contractor will start by developing UML models for the mineral occurrence and geochemical data sets of provincial/territorial agencies, and potential field geophysical data in GeoSoft grid format. From the UML models the contractor will develop XML schemas, and test them against real data. Finally, tools will be developed to generate XML encoded files for data transfer.
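No schema exists yet for the proposed transfer format, so the following Python sketch uses the standard `xml.etree.ElementTree` module only to show the final step, encoding a record as XML. Every element name and the record layout are hypothetical placeholders for whatever the UML-derived schema would define.

```python
import xml.etree.ElementTree as ET


def occurrence_to_xml(record):
    """Encode one mineral-occurrence record as an XML string.

    Element names are hypothetical placeholders; a real encoder would follow
    the XML schema generated from the agreed UML model.
    """
    root = ET.Element("MineralOccurrence", id=record["id"])
    ET.SubElement(root, "Name").text = record["name"]
    commodities = ET.SubElement(root, "Commodities")
    for commodity in record["commodities"]:
        ET.SubElement(commodities, "Commodity").text = commodity
    location = ET.SubElement(root, "Location")
    ET.SubElement(location, "Latitude").text = str(record["lat"])
    ET.SubElement(location, "Longitude").text = str(record["lon"])
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, as it might come from a provincial database.
record = {"id": "MO-0001", "name": "Example showing",
          "commodities": ["Cu", "Au"], "lat": 49.25, "lon": -123.1}
print(occurrence_to_xml(record))
```

Testing the generated files against the schema would need a schema-aware validator; `ElementTree` itself does not validate.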
  
  Coordination
  
  The overall approach is to develop discipline-specific models that can be linked rather than a single monolithic model. To ensure that individual data models are compatible, and there is minimal duplication between them, oversight for the whole process is essential. The way this is to be done is evolving, and includes:
  - Data Infrastructure working group, and coordination sub-group
  - CGKN secretariat
  - Data Warehouse project
  Other data modelling activities
  
  The Public Petroleum Data Model (PPDM) is used by some agencies for sub-surface (well) information. It overlaps with NADM in some types of information, such as stratigraphy and lithology. For the latter, NADM's SLTT is well ahead, but the latest version of PPDM (3.5) has a comprehensive model for stratigraphic information that might benefit NADM. PPDM has also started work on the 3D spatial enabling of well data.
  
  Conclusion
  
  Significant in-kind and actual funding has been committed to developing the CGKN data infrastructure for 2001/02. Further funding has been requested from Geoconnections to accelerate the building of the geoscience component of the Canadian Geospatial Data Infrastructure (CGDI). Progress is accelerating on several fronts. Coordination of the many components is recognized as a major challenge that remains to be fully addressed.
2. Geolibraries and NADM-Murray Journeay
  The conceptual foundations and rationale behind the CordLink digital library were presented.
3. Web-based NADM database interoperability-Eric Boisvert
  GSC Québec had the opportunity to test an interoperable implementation of two NADM 5.2 databases. The implementation is very preliminary and has limited functionality, but offers a quick-win solution to an immediate problem. The requirement was to allow relatively autonomous databases to be queried using a central concept tree (COA). The system works with a central database of COA items tagged with globally unique IDs. Each local database instance maps its own local COA tree to the global tree, correlating its local concepts with globally defined concepts. Since most NADM attributes can be related in some way to a COA, pieces of information can be extracted by referencing a commonly known COA. This mechanism also gives local database instances some flexibility to extend the global tree to fit their own needs.
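A minimal Python sketch of the concept-tree mechanism, under stated assumptions: the global tree is a parent map keyed by global ID, each local database keeps its own local-to-global mapping, and a query on a global concept also matches narrower concepts. All IDs, terms and records are hypothetical; the structures actually used in the GSC Québec prototype are not described in these notes.

```python
# Central (global) concept tree: global id -> (term, parent global id).
GLOBAL_COA = {
    "G1": ("rock", None),
    "G2": ("igneous rock", "G1"),
    "G3": ("granite", "G2"),
    "G4": ("sedimentary rock", "G1"),
}

# Each local database maps its own concept codes onto global ids.
LOCAL_TO_GLOBAL = {
    "db_a": {"granit": "G3", "roche ignee": "G2"},
    "db_b": {"GR": "G3", "SED": "G4"},
}

# Local records, each tagged with a local concept code.
LOCAL_RECORDS = {
    "db_a": [("unit-17", "granit"), ("unit-22", "roche ignee")],
    "db_b": [("poly-3", "GR"), ("poly-9", "SED")],
}


def descendants(global_id):
    """The global id plus every concept below it in the central tree."""
    result, changed = {global_id}, True
    while changed:
        changed = False
        for gid, (_, parent) in GLOBAL_COA.items():
            if parent in result and gid not in result:
                result.add(gid)
                changed = True
    return result


def federated_query(global_id):
    """Records in every local database mapped to global_id or a narrower concept."""
    wanted = descendants(global_id)
    hits = []
    for db, records in LOCAL_RECORDS.items():
        mapping = LOCAL_TO_GLOBAL[db]
        for record_id, local_concept in records:
            # Unmapped local concepts give None and are simply not matched.
            if mapping.get(local_concept) in wanted:
                hits.append((db, record_id))
    return hits


print(federated_query("G2"))  # all igneous records, from both databases
```

Neither local database has to adopt the other's codes; each only maintains its own mapping to the shared tree.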
4. NGMDB Project Object-oriented data model-Boyan Brodaric, Jordan Hastings
  An object-oriented (OO) variant of the NADM 4.3 data model was developed and implemented using the GE SmallWorld GIS and map information from the Kentucky GS. This work represents a prototype approach to developing the US National Geologic Map Database. Benefits were observed both from the object-oriented design and from its implementation. Benefits realized from the object-oriented design include:
  - Feature level metadata: any object may possess metadata such as originating authorship, date and time of creation and change, error, spatial resolution, etc.
  - Centralized vocabulary: all scientific vocabulary is centralized into a Concept object, providing a uniform mechanism for managing scientific terminology such as formation names, rock names, geologic times and geologic feature types (such as faults).
  - Not legend-centric: previous NADM versions were very legend-centric, requiring much of the data manipulation to pass through the map legend because geologic concepts and their symbolization were closely integrated. In this OO variant, concepts and their cartographic appearance are segregated, permitting a geologic feature to be related to various concepts and to various symbols. This results in greater flexibility in classifying features (as specific concepts) and in symbolizing them.
  Benefits related to the GE SmallWorld implementation include:
  - Web mapping: the web mapping feature permits spatial databases to be constructed, analyzed and edited live over the Internet.
  - Scale-sensitive display: the presence/absence of features and their symbolization may vary according to the scale of visualization.
  - Dynamic symbolization: spatial features may be symbolized 'on-the-fly' permitting derivative maps to be generated from the database.
  - Dynamic topology: topology is dynamically maintained when features are clipped from the database.
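The separation of concepts, symbols and feature-level metadata can be sketched in a few Python classes. This is a hypothetical illustration, not the NGMDB SmallWorld schema: the class names, fields and the scale-rule mechanism in `Symbolizer` are assumptions, used only to show one feature related to several concepts and symbolized differently depending on display scale.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """One entry in the centralized vocabulary (formation, rock name, age, feature type...)."""
    kind: str
    name: str


@dataclass(frozen=True)
class Symbol:
    """Cartographic appearance only; carries no geologic meaning."""
    color: str
    line_style: str = "solid"


@dataclass
class Metadata:
    """Feature-level metadata attached to an individual object."""
    author: str
    created: datetime
    resolution_m: Optional[float] = None


@dataclass
class Feature:
    geometry: object        # placeholder; a real system would use a geometry type
    concepts: list          # a feature may be classified by several concepts
    metadata: Metadata


@dataclass
class Symbolizer:
    """Concept-to-symbol rules kept outside the features themselves."""
    rules: list = field(default_factory=list)

    def add(self, concept, max_scale_denominator, symbol):
        self.rules.append((concept, max_scale_denominator, symbol))

    def symbol_for(self, feature, scale_denominator):
        # The first rule whose concept the feature carries, and whose scale
        # limit is not exceeded, wins; None means "not drawn at this scale".
        for concept, max_scale, symbol in self.rules:
            if concept in feature.concepts and scale_denominator <= max_scale:
                return symbol
        return None


# Hypothetical usage: one fault feature classified by two concepts.
fault = Concept("feature type", "fault")
thrust = Concept("feature type", "thrust fault")
f = Feature(geometry=None, concepts=[fault, thrust],
            metadata=Metadata("hypothetical author", datetime(2001, 5, 23)))

sym = Symbolizer()
sym.add(thrust, 100_000, Symbol("black", "toothed"))   # detailed scales only
sym.add(fault, 1_000_000, Symbol("black"))             # generalized display

print(sym.symbol_for(f, 24_000))     # thrust symbol
print(sym.symbol_for(f, 500_000))    # plain fault symbol
print(sym.symbol_for(f, 5_000_000))  # None: omitted at this scale
```

Rule order decides which symbol wins here; a production symbolizer would need an explicit precedence, but the point is that restyling a map changes only the rules, never the features.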
5. XMML-Simon Cox
  The XMML project is developing a general-purpose XML encoding for geoscience data. XMML is an extension/specialisation of GML, the Geography Markup Language developed by the OpenGIS Consortium. It is based on an object model for geospatial features that is compatible with modern GIS concepts and ISO standards. Because the encoding uses XML, it is highly compatible with the web infrastructure and generic B2B systems. Because it is based on OGC standards, it will be compatible with WFS- and GML-conformant servers and clients.
  
  The project is being coordinated by CSIRO, and is sponsored by several mining-industry organisations, and also by the geological surveys in Australia. We are pursuing active collaboration with additional organisations from the geoscience sector. This will be particularly important for some vocabularies. The current sponsors have determined the priority order for the development of specific features, starting with samples and drillholes. However, the patterns used are being designed to support general applications in the geosciences, and implementation of additional specialised feature types is straightforward. The model is designed to be application neutral, rather than directly mapping onto the internal data model of any specific processing system. It is likely to be best used as a transfer or archive format, reflecting a "snapshot" of an extract from a datastore.
  
  For more information, see http://www.ned.dem.csiro.au/XMML/.
Discussion: DMDT tasks and future activities
1. Complete the requirements report by July 15 and submit to NADMSC.
2. Post initial data model variants and solicit other contributions by GSA'01.
3. Develop an updated NADM design by DMT'02, using (1) and (2) above.

This page is <http://geology.usgs.gov/dm/steering/teams/design/DMDT_20010523.html>
Maintained by Peter Schweitzer
Last updated 13-May-2002