Search and discovery metadata, which an ESG Gateway stores as RDF triples in a Sesame Triple Store, is exchanged among Gateways and with partner Data Centers using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Under the OAI-PMH specification, sites exchange metadata records as XML documents transmitted over HTTP, through a request/response protocol that defines six verbs: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord.
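
As an illustration of the request side of the protocol, the following minimal Python sketch builds the HTTP GET URL for any of the six verbs. The helper name oai_request_url and the from_ keyword convention are assumptions made for this example; the base URL is the repository endpoint used in the record examples on this page.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH specification.
OAI_VERBS = (
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
)

def oai_request_url(base_url, verb, **params):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        # 'from' is a reserved word in Python, so callers pass 'from_' instead.
        query["from" if key == "from_" else key] = value
    return f"{base_url}?{urlencode(query)}"

url = oai_request_url(
    "http://esg.prototype.ucar.edu/oai/repository.htm",
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="narccap_crcm_ncep_table3_psl_files",
)
print(url)
```

The verb is validated before the URL is built, so a misspelled verb fails locally instead of producing a badVerb error from the remote repository.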

An ESG Gateway provides functionality that implements both sides of the OAI-PMH communication protocol:

  • OAI Repository: an ESG Gateway makes available its search and discovery metadata in several possible XML formats so other OAI-enabled sites can harvest them and return them as output of their services.
  • OAI Harvester: an ESG Gateway includes the services and user interface to ingest metadata records from other OAI repositories, persisting them in its RDF Triple Store and making them available to the Gateway search and discovery process.
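
The harvester side can be pictured with the sketch below: a ListRecords loop that follows OAI-PMH resumption tokens until the repository signals the last page. This is not the Gateway's actual implementation. The function name harvest_records, the injected fetch callable, and the example.org endpoint are assumptions chosen so the loop can run without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_records(fetch, base_url, metadata_prefix):
    """Yield every <record> element of a ListRecords harvest.

    'fetch' is any callable that takes a URL and returns the response body.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        listing = root.find(OAI + "ListRecords")
        if listing is None:  # e.g. an <error> response
            return
        yield from listing.findall(OAI + "record")
        token = listing.findtext(OAI + "resumptionToken", default="").strip()
        if not token:  # an absent or empty token marks the last page
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}

def page(record_id, token):
    """Build a tiny, made-up ListRecords response for the demonstration."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"<record><header><identifier>{record_id}</identifier></header></record>"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )

PAGES = {
    "verb=ListRecords&metadataPrefix=rdf": page("dataset_a", "page2"),
    "verb=ListRecords&resumptionToken=page2": page("dataset_b", ""),
}

def fake_fetch(url):
    return PAGES[url.split("?", 1)[1]]

harvested = [
    record.findtext(f"{OAI}header/{OAI}identifier")
    for record in harvest_records(fake_fetch, "http://example.org/oai", "rdf")
]
print(harvested)
```

In a real harvester the fetch callable would perform the HTTP GET, and each yielded record would be converted to RDF triples before being stored.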

The ESG architecture for metadata exchange is represented in the Figure below. Here ESG Gateway 1 is acting as an OAI Harvester, requesting metadata records from ESG Gateway 2 in XML/RDF format, and from a partner Data Center in XML/DIF format; all incoming metadata records are converted to RDF triples and ingested in the local RDF Triple Store. On the other hand, ESG Gateway 2 is acting as an OAI Repository: it extracts RDF triples from its own RDF Triple Store, and serializes them into different metadata formats (XML/RDF, XML/DIF, or XML/DC) in response to different OAI harvesting requests.

Note that an ESG Gateway does not act as an OAI Aggregator, i.e. it does not redistribute the metadata records it has harvested from other OAI repositories: the only records served by an ESG Gateway acting as an OAI Repository are those owned by the Gateway itself.

OAI record header

When importing or exporting records, the following rules are implemented for processing or building the header of the OAI record:

  • The OAI identifier is matched to the RDF local identifier (i.e. the RDF URL without the namespace), after the OAI identifier is "sanitized" of those characters that would generate a malformed RDF identifier
  • The OAI datestamp is matched to the RDF property esg:hasLastUpdate, and used to determine whether a record needs to be inserted, updated or deleted
  • The RDF property esg:hasContext is used to create the OAI setSpec tag

For example, an incoming OAI record with the following header:

<header>
   <identifier>oai:nsidc.org:mod10a1</identifier>
   <datestamp>2009-05-04T23:00:23Z</datestamp>
</header>

would result in the following RDF triple ingested in the store:

(esg:oai_nsidc_org_mod10a1, esg:hasLastUpdate, "2009-05-04T23:00:23Z")

Conversely, the following two RDF triples in the store:

(esg:oai_nsidc_org_mod10a1, esg:hasLastUpdate, "2009-05-04T23:00:23Z")
(esg:oai_nsidc_org_mod10a1, esg:hasContext, esg:context_ESG)

would result in an OAI exported record with the following header:

<header>
   <identifier>oai:nsidc.org:mod10a1</identifier>
   <datestamp>2009-05-04T23:00:23Z</datestamp>
   <setSpec>hasContext:context_ESG</setSpec>
</header>
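
The header rules above can be sketched in Python as follows. Several points are assumptions rather than documented behavior: the exact set of characters replaced during sanitization (the example only shows ':' and '.' becoming '_'), the insert/update/skip policy in harvest_action, and the fact that the export function receives the original OAI identifier as an argument, since this page does not say how it is recovered from the sanitized RDF name. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def sanitize_identifier(oai_identifier):
    """Replace characters unsafe in an RDF local name with '_' (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", oai_identifier)

def header_to_triples(header_xml):
    """Import direction: OAI header -> (subject, predicate, object) triples."""
    header = ET.fromstring(header_xml)
    subject = "esg:" + sanitize_identifier(header.findtext("identifier"))
    return [(subject, "esg:hasLastUpdate", header.findtext("datestamp"))]

def harvest_action(incoming_datestamp, stored_datestamp, deleted=False):
    """Assumed policy; 'deleted' mirrors the OAI-PMH header status attribute."""
    if deleted:
        return "delete"
    if stored_datestamp is None:
        return "insert"
    # UTC timestamps of equal granularity compare correctly as strings.
    return "update" if incoming_datestamp > stored_datestamp else "skip"

def triples_to_header(oai_identifier, triples):
    """Export direction: triples of one subject -> OAI <header> element."""
    header = ET.Element("header")
    ET.SubElement(header, "identifier").text = oai_identifier
    for _, predicate, obj in triples:
        if predicate == "esg:hasLastUpdate":
            ET.SubElement(header, "datestamp").text = obj
    for _, predicate, obj in triples:
        if predicate == "esg:hasContext":
            # esg:hasContext esg:context_ESG -> hasContext:context_ESG
            spec = predicate.split(":", 1)[1] + ":" + obj.split(":", 1)[1]
            ET.SubElement(header, "setSpec").text = spec
    return header

incoming = """<header>
   <identifier>oai:nsidc.org:mod10a1</identifier>
   <datestamp>2009-05-04T23:00:23Z</datestamp>
</header>"""
stored = header_to_triples(incoming) + [
    ("esg:oai_nsidc_org_mod10a1", "esg:hasContext", "esg:context_ESG")
]
exported = triples_to_header("oai:nsidc.org:mod10a1", stored)
print(ET.tostring(exported, encoding="unicode"))
```

Run on the sample header, the import function reproduces the triple shown above, and the export function rebuilds the three header elements in the order identifier, datestamp, setSpec.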

Metadata Formats

The following XML metadata formats are supported by the ESG Gateway implementation of the OAI-PMH specification:

RDF

The ESG software uses RDF as the XML format of choice for exchanging search and discovery metadata between Gateways. This is a natural choice since, as discussed earlier, RDF is the "native" metadata format against which an ESG Gateway resolves data queries. Gateways exchange, as RDF records, all kinds of objects that are relevant to the search and discovery process: Datasets, ModelComponents, Projects, etc.

  • Note that the RDF property esg:hasUUID is NOT exported as part of the outgoing RDF metadata, since this property is used to flag RDF records in the triple store that have been harvested from the local relational database (as opposed to harvested from a remote OAI repository).
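
One way to picture this export rule is the sketch below, which selects the triples a repository would serialize and drops esg:hasUUID from the output. It rests on an assumption not stated on this page: that a record counts as locally owned exactly when it carries an esg:hasUUID triple. The function name, subjects, and UUID value are made up for the example.

```python
def exportable_triples(triples):
    """Return the triples to serialize, given all triples in the store.

    Assumption: a subject is locally owned when it has an esg:hasUUID triple.
    """
    local_subjects = {s for s, p, _ in triples if p == "esg:hasUUID"}
    return [
        (s, p, o)
        for s, p, o in triples
        # Keep local records only, and never export the flag itself.
        if s in local_subjects and p != "esg:hasUUID"
    ]

store = [
    ("esg:local_dataset", "esg:hasUUID", "hypothetical-uuid-0001"),
    ("esg:local_dataset", "rdfs:label", "A locally published dataset"),
    ("esg:oai_nsidc_org_mod10a1", "rdfs:label", "A harvested dataset"),
]
print(exportable_triples(store))
```

Filtering at serialization time keeps the flag available inside the store while ensuring that harvested records are not redistributed.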

Dublin Core (DC)

Dublin Core is a very popular XML schema that contains mostly high-level descriptive metadata about generic resources. ESG supports generation of metadata records in DC format for dissemination to Digital Libraries. DC records created by an ESG Gateway acting as an OAI Repository contain only a limited number of fields, which are generated from the RDF triples according to the crosswalk in the enclosed table. Please note that:

  • Only RDF Dataset objects are exported into DC metadata.
  • Harvesting of external DC records by an ESG Gateway is disabled (since DC records are too generic and there is no guarantee that they represent geophysical datasets).
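
The RDF to Dublin Core direction of the crosswalk can be sketched as below. This is a minimal illustration, not the Gateway's serializer: the function name dataset_to_dc and the flat dictionary used to represent a Dataset are assumptions. The four mapped properties are the ones the crosswalk table lists with a Dublin Core equivalent, and the sample values come from the record examples on this page.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# RDF property -> Dublin Core element, as listed in the crosswalk table.
RDF_TO_DC = {
    "esg:hasUri": "identifier",
    "rdfs:label": "title",
    "rdfs:comment": "description",
    "esg:hasLastUpdate": "date",
}

def dataset_to_dc(properties):
    """Build an <oai_dc:dc> element from a dict of RDF property -> value."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for rdf_property, dc_element in RDF_TO_DC.items():
        value = properties.get(rdf_property)
        if value:  # properties without a value are omitted from the record
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = value
    return root

dc = dataset_to_dc({
    "esg:hasUri": "resource://ESG-NCAR/ID#narccap.crcm.ncep.table3.psl.files",
    "rdfs:label": "NARCCAP CRCM, NCEP Boundary Conditions, Sea Level Pressure Files",
    "esg:hasLastUpdate": "2009-09-18T17:30:13Z",
})
print(ET.tostring(dc, encoding="unicode"))
```

Because the sample Dataset has no rdfs:comment, the resulting record contains only the identifier, title, and date elements.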

Directory Interchange Format (DIF)

DIF is an XML metadata format created and supported by NASA GCMD (Global Change Master Directory), which has widespread use within the geoscientific community. DIF support within the ESG infrastructure is mostly meant for interoperability with other partner data centers, so that dataset information can be exchanged and made searchable at both ends. Conversion of DIF records into/from domain model objects is based on the crosswalk documented in the enclosed table. In particular, the following considerations apply:

  • Only RDF Dataset objects are exported as DIF records, although each Dataset DIF document may contain information from ancillary RDF objects (Gateway, Topics, etc.). Conversely, any incoming DIF record triggers the generation of a corresponding RDF Dataset object, and also of other associated RDF objects if not existing already (GcmdTopic, IsoTopic, Project etc.).
  • The Dataset Persistent Identifier of type ID (part of the esg:hasUri property) is mapped to the DIF record unique identifier (Entry_ID). Conversely, the DIF record unique identifier becomes the value of the esg:hasUri property.
  • As a consequence, the OAI identifier and DIF identifier are NOT equal (for either import or export of records).
  • The DIF Data Center element is built from the associated RDF Gateway object, leaving detailed contact information blank.
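
For the import direction, the considerations above might translate into something like the following sketch. It is an illustration under assumptions: the function name dif_to_triples, the "esg:project_" naming of the ancillary Project resource, and the sample DIF values are all hypothetical, and only a few crosswalk rows are handled. The subject is derived from the OAI identifier, which is passed in separately because it differs from the DIF Entry_ID.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample DIF record on this page.
DIF = "{http://go-essp.org/dif_v9.4}"

# DIF element -> RDF property, for single-valued crosswalk rows.
DIF_TO_RDF = {
    "Entry_ID": "esg:hasUri",
    "Entry_Title": "rdfs:label",
    "Summary": "rdfs:comment",
    "Last_DIF_Revision_Date": "esg:hasLastUpdate",
}

def dif_to_triples(dif_xml, sanitized_oai_identifier, existing=frozenset()):
    """Convert one DIF document into (subject, predicate, object) triples."""
    root = ET.fromstring(dif_xml)
    subject = "esg:" + sanitized_oai_identifier
    triples = [(subject, "rdf:type", "esg:Dataset")]
    for dif_element, rdf_property in DIF_TO_RDF.items():
        text = (root.findtext(DIF + dif_element) or "").strip()
        if text:
            triples.append((subject, rdf_property, text))
    short_name = (root.findtext(f"{DIF}Project/{DIF}Short_Name") or "").strip()
    if short_name:
        project = "esg:project_" + short_name.lower()  # hypothetical naming
        triples.append((subject, "esg:hasProject", project))
        if project not in existing:  # create the ancillary object only once
            triples.append((project, "rdfs:label", short_name))
    return triples

sample = (
    '<dif:DIF xmlns:dif="http://go-essp.org/dif_v9.4">'
    "<dif:Entry_ID>EXAMPLE_ENTRY</dif:Entry_ID>"
    "<dif:Entry_Title>Example dataset title</dif:Entry_Title>"
    "<dif:Project><dif:Short_Name>DEMO</dif:Short_Name></dif:Project>"
    "</dif:DIF>"
)
for triple in dif_to_triples(sample, "oai_example_org_entry"):
    print(triple)
```

Passing the set of subjects already present in the store as the existing argument suppresses the creation of a duplicate Project object.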

ISO 19115

Currently the ESG Gateway software includes the infrastructure to exchange records in ISO format, although the functionality is not yet implemented.

Metadata Formats Crosswalk Table

| Object property | RDF | Dublin Core | DIF | Example Value |
|---|---|---|---|---|
| Dataset.PersistentIdentifier(ID).resourceURI | esg:hasUri | identifier | Entry_ID | (resource://ESG-NCAR/ID#)pcm.testsim_1845955462.data.ocean.SingleVariableTimeSeries.yearly |
| Dataset.PersistentIdentifier(ID).resourceURI; Gateway endpoint | rdfs:seeAlso | | Related_URL | http://esg.prototype.ucar.edu//browse/viewResource.htm?resourceURI=resource%3A%2F%2FESG-NCAR%2FID%23narccap.crcm.ncep.table3.psl.files |
| Dataset.name | rdfs:label | title | Entry_Title | PCM testsim_1845955462 Ocean Single Variable Time Series Data (Yearly) |
| Dataset.description | rdfs:comment | description | Summary | PCM testsim_1845955462 Ocean Single Variable Time Series Data (Yearly) for simulation..... |
| Dataset.Gateway | esg:hasGateway.rdfs:label; esg:hasGateway.rdfs:comment | | Data_Center.Short_Name; Data_Center.Long_Name | ESG-NCAR; Earth System Grid gateway at the National Center for Atmospheric Research |
| Dataset.Topics(GCMD) | esg:hasGcmdTopic.rdfs:label | | Parameters | EARTH SCIENCE > Oceans > Coastal Processes > Rocky Coasts |
| Dataset.Topics(ISO) | esg:hasIsoTopic.rdfs:label | | ISO_Topic_Category | Environment |
| Dataset.Topics(default) | esg:hasTopic.rdfs:label | | keyword | Climate |
| Dataset.lastUpdated | esg:hasLastUpdate | date | Last_DIF_Revision_Date | 2009-01-27T13:04:28Z |
| Dataset.Project | esg:hasProject | | Project.Short_Name; Project.Long_Name | NARCCAP; North American Regional Climate Change Assessment Program |
| Dataset.Contact(PI) | esg:hasPI | | Personnel(Role=Investigator) | John Smith |
| Dataset.Contact(MetadataContact) | esg:hasMetadataContact | | Personnel(Role=Technical Contact) | |
| Dataset.Contact(DataCenterContact) | esg:hasDataCenterContact | | Personnel(Role=Data Center Contact) | |
| Dataset.SpatialCoverage | esg:hasWest/East/North/SouthLimit | | SpatialCoverage | |
| Dataset.TemporalCoverage | esg:hasStart/StopTime | | TemporalCoverage | |
| Dataset.Location | esg:hasLocation | | Location | |
| | | | Metadata_Name | ESG DIF |
| | | | Metadata_Version | 1.0 |

Metadata Records Examples

Following are examples of metadata records served by an ESG-CET Gateway acting as an OAI-PMH repository. All examples refer to the same OAI item (i.e. object), but for different metadata formats.

  • RDF
    http://esg.prototype.ucar.edu/oai/repository.htm?verb=GetRecord&metadataPrefix=rdf&identifier=narccap_crcm_ncep_table3_psl_files
    
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2009-09-25T10:15:32Z</responseDate>
      <request verb="GetRecord" identifier="narccap_crcm_ncep_table3_psl_files" metadataPrefix="rdf">http://esg.prototype.ucar.edu/oai/repository.htm</request>
      <GetRecord>
        <record>
          <header>
            <identifier>narccap_crcm_ncep_table3_psl_files</identifier>
            <datestamp>2009-09-18T17:30:13Z</datestamp>
            <setSpec>hasContext:context_ESG</setSpec>
          </header>
          <metadata>
            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:sesame="http://www.openrdf.org/schema/sesame#" xmlns:esg="http://dataportal.ucar.edu/schemas/esg.owl#">
              <rdf:Description rdf:about="http://dataportal.ucar.edu/schemas/esg.owl#narccap_crcm_ncep_table3_psl_files">
                <esg:hasContext rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#context_ESG" />
                <esg:hasGateway rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#gateway_esg_ncar" />
                <esg:hasLastUpdate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-09-18T17:30:13Z</esg:hasLastUpdate>
                <esg:hasModelComponent rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#model_crcm" />
                <esg:hasParameter rdf:resource="http://dataportal.ucar.edu/schemas/cf.owl#air_pressure_at_sea_level" />
                <esg:hasProject rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#projectimpl_narccap" />
                <esg:hasTopic rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#topic_climate" />
                <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">resource://ESG-NCAR/ID#narccap.crcm.ncep.table3.psl.files</esg:hasUri>
                <sesame:directType rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#ModelDataset" />
                <rdf:type rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#Dataset" />
                <rdf:type rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#GeophysicalDataset" />
                <rdf:type rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#ModelDataset" />
                <rdf:type rdf:resource="http://dataportal.ucar.edu/schemas/esg.owl#Resource" />
                <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource" />
                <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NARCCAP CRCM, NCEP Boundary Conditions, Sea Level Pressure Files</rdfs:label>
                <rdfs:seeAlso rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://esg.prototype.ucar.edu//browse/viewResource.htm?resourceURI=resource%3A%2F%2FESG-NCAR%2FID%23narccap.crcm.ncep.table3.psl.files</rdfs:seeAlso>
              </rdf:Description>
            </rdf:RDF>
          </metadata>
        </record>
      </GetRecord>
    </OAI-PMH>
    
    
  • DIF
    http://esg.prototype.ucar.edu/oai/repository.htm?verb=GetRecord&metadataPrefix=dif&identifier=narccap_crcm_ncep_table3_psl_files
    
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2009-09-25T10:17:40Z</responseDate>
      <request verb="GetRecord" identifier="narccap_crcm_ncep_table3_psl_files" metadataPrefix="dif">http://esg.prototype.ucar.edu/oai/repository.htm</request>
      <GetRecord>
        <record>
          <header>
            <identifier>narccap_crcm_ncep_table3_psl_files</identifier>
            <datestamp>1970-01-01T00:00:00Z</datestamp>
            <setSpec>hasContext:context_ESG</setSpec>
          </header>
          <metadata>
            <dif:DIF xmlns:dif="http://go-essp.org/dif_v9.4" xsi:schemaLocation="http://go-essp.org/dif_v9.4 http://ndg.nerc.ac.uk/schemas/dif_v9.4.xsd">
              <dif:Entry_ID>narccap_crcm_ncep_table3_psl_files</dif:Entry_ID>
              <dif:Entry_Title>NARCCAP CRCM, NCEP Boundary Conditions, Sea Level Pressure Files</dif:Entry_Title>
              <dif:Temporal_Coverage />
              <dif:Spatial_Coverage />
              <dif:Project>
                <dif:Short_Name>NARCCAP</dif:Short_Name>
                <dif:Long_Name>North American Regional Climate Change Assessment Program</dif:Long_Name>
              </dif:Project>
              <dif:Data_Center>
                <dif:Data_Center_Name>
                  <dif:Short_Name>ESG-NCAR</dif:Short_Name>
                  <dif:Long_Name>Earth System Grid gateway at the National Center for Atmospheric Research</dif:Long_Name>
                </dif:Data_Center_Name>
                <dif:Personnel>
                  <dif:Role />
                  <dif:First_Name />
                  <dif:Last_Name />
                  <dif:Email />
                  <dif:Phone />
                  <dif:Contact_Address>
                    <dif:Address />
                    <dif:City />
                    <dif:Province_or_State />
                    <dif:Postal_Code />
                    <dif:Country />
                  </dif:Contact_Address>
                </dif:Personnel>
              </dif:Data_Center>
              <dif:Related_URL>
                <dif:URL_Content_Type>GET DATA</dif:URL_Content_Type>
                <dif:URL>http://esg.prototype.ucar.edu//browse/viewResource.htm?resourceURI=resource%3A%2F%2FESG-NCAR%2FID%23narccap.crcm.ncep.table3.psl.files</dif:URL>
                <dif:Description>Data Center top-level access page for this resource</dif:Description>
              </dif:Related_URL>
              <dif:Metadata_Name>ESG DIF</dif:Metadata_Name>
              <dif:Metadata_Version>1.0</dif:Metadata_Version>
              <dif:Last_DIF_Revision_Date>2009-09-18T17:30:13Z</dif:Last_DIF_Revision_Date>
            </dif:DIF>
          </metadata>
        </record>
      </GetRecord>
    </OAI-PMH>
    
    
  • OAI DC
    http://esg.prototype.ucar.edu/oai/repository.htm?verb=GetRecord&metadataPrefix=oai_dc&identifier=narccap_crcm_ncep_table3_psl_files
    
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2009-09-25T10:18:31Z</responseDate>
      <request verb="GetRecord" identifier="narccap_crcm_ncep_table3_psl_files" metadataPrefix="oai_dc">http://esg.prototype.ucar.edu/oai/repository.htm</request>
      <GetRecord>
        <record>
          <header>
            <identifier>narccap_crcm_ncep_table3_psl_files</identifier>
            <datestamp>2009-09-18T17:30:13Z</datestamp>
            <setSpec>hasContext:context_ESG</setSpec>
          </header>
          <metadata>
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
              <dc:identifier>resource://ESG-NCAR/ID#narccap.crcm.ncep.table3.psl.files</dc:identifier>
              <dc:title>NARCCAP CRCM, NCEP Boundary Conditions, Sea Level Pressure Files</dc:title>
              <dc:date>2009-09-18T17:30:13Z</dc:date>
            </oai_dc:dc>
          </metadata>
        </record>
      </GetRecord>
    </OAI-PMH>
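
On the consuming side, a GetRecord response in oai_dc format such as the one above can be read with a few lines of Python. The sketch below is only an illustration: the function name parse_get_record and the shape of the returned dictionary are assumptions, and the embedded response is an abbreviated copy of the OAI DC example with the schema location attributes removed.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

def parse_get_record(response_xml):
    """Extract the header fields and Dublin Core elements of one record."""
    root = ET.fromstring(response_xml)
    record = root.find("oai:GetRecord/oai:record", NS)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "setSpecs": [e.text for e in header.findall("oai:setSpec", NS)],
        # Strip the '{namespace}' prefix from each Dublin Core tag name.
        "dc": {child.tag.split("}", 1)[1]: child.text for child in dc},
    }

response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>narccap_crcm_ncep_table3_psl_files</identifier>
        <datestamp>2009-09-18T17:30:13Z</datestamp>
        <setSpec>hasContext:context_ESG</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>resource://ESG-NCAR/ID#narccap.crcm.ncep.table3.psl.files</dc:identifier>
          <dc:title>NARCCAP CRCM, NCEP Boundary Conditions, Sea Level Pressure Files</dc:title>
          <dc:date>2009-09-18T17:30:13Z</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

parsed = parse_get_record(response)
print(parsed["identifier"], parsed["datestamp"], parsed["dc"]["title"])
```

The same header parsing applies to the RDF and DIF responses, since only the content of the metadata element changes between formats.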
    
    