Search and discovery metadata is exchanged with partner Data Centers using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Under the OAI-PMH specification, sites exchange metadata records as XML documents transmitted over HTTP, through a request/response exchange built on six verbs: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord.
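As a rough illustration of the request side of this exchange, the following sketch builds the HTTP GET URL for an OAI-PMH request. The base URL and the helper name `build_oai_request` are hypothetical and not part of the ESG software; the verb names and the `metadataPrefix` and `from` arguments come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH specification.
OAI_VERBS = (
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
)


def build_oai_request(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass 'from_' instead.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; a real Gateway URL would differ.
BASE_URL = "http://gateway.example.org/oai/repository"

print(build_oai_request(BASE_URL, "Identify"))
print(build_oai_request(BASE_URL, "ListRecords",
                        metadataPrefix="oai_dc", from_="2010-01-01"))
```

The response to each such request is an XML document; only the request construction is shown here.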
An ESG Gateway provides functionality that implements both sides of the OAI-PMH communication protocol:
- OAI Repository: an ESG Gateway makes available its search and discovery metadata in several possible XML formats so other OAI-enabled sites can harvest them and return them as output of their services.
- OAI Harvester: an ESG Gateway includes the services and user interface to ingest metadata records from other OAI repositories, persisting them in its RDF Triple Store and making them available to the Gateway search and discovery process.
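The harvester side can be pictured with a minimal sketch that reads the record headers out of a ListRecords response. This is not the Gateway's own code: the function name, the sample identifiers, and the trimmed sample response are all made up for illustration, and only the OAI-PMH namespace and element names are taken from the specification.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_record_headers(response_xml):
    """Extract the header fields of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "set_specs": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
            # OAI-PMH marks deleted records with status="deleted" on the header.
            "deleted": header.get("status") == "deleted",
        })
    return headers


# Trimmed, made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-06-01T12:00:00Z</responseDate>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2010-05-30T08:15:00Z</datestamp>
        <setSpec>example-set</setSpec>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:dataset-2</identifier>
        <datestamp>2010-05-31T09:00:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

for parsed in parse_record_headers(SAMPLE_RESPONSE):
    print(parsed)
```

A real harvester would also convert the metadata payload of each record into RDF triples before storing it; that step depends on the metadata format and is omitted here.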
The ESG architecture for metadata exchange is represented in the Figure below. Here ESG Gateway 1 is acting as an OAI Harvester, requesting metadata records from ESG Gateway 2 in XML/RDF format, and from a partner Data Center in XML/DIF format; all incoming metadata records are converted to RDF triples and ingested in the local RDF Triple Store. On the other hand, ESG Gateway 2 is acting as an OAI Repository: it extracts RDF triples from its own RDF Triple Store, and serializes them into different metadata formats (XML/RDF, XML/DIF, or XML/DC) in response to different OAI harvesting requests.
Note that an ESG Gateway does not act as an OAI Aggregator: it does not redistribute the metadata records it has harvested from other OAI repositories. The only records served by an ESG Gateway acting as an OAI Repository are those owned by the Gateway itself.
OAI record header
When importing or exporting records, the following rules are implemented for processing or building the header of the OAI record:
- The OAI identifier is matched to the RDF local identifier (i.e. the RDF URL without the namespace), after the OAI identifier is "sanitized" of any characters that would generate a malformed RDF identifier.
- The OAI datestamp is matched to the RDF property esg:hasLastUpdate, and used to determine whether a record needs to be inserted, updated or deleted.
- The RDF property esg:hasContext is used to create the OAI setSpec tag
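The three rules above can be sketched as follows. This is an illustration under stated assumptions, not the Gateway's implementation: the exact set of characters removed during sanitization, the use of the OAI `deleted` status to trigger deletion, the reuse of the RDF local identifier as the exported OAI identifier, and all function names are assumptions.

```python
import re


def sanitize_oai_identifier(oai_identifier):
    """Turn an OAI identifier into a candidate RDF local identifier."""
    # Assumption: anything other than letters, digits, '.', '_' and '-'
    # is replaced by '_'. The real sanitization rule is not documented here.
    return re.sub(r"[^A-Za-z0-9._-]", "_", oai_identifier)


def import_header(header):
    """Map an incoming OAI header to (RDF local id, RDF properties)."""
    local_id = sanitize_oai_identifier(header["identifier"])
    return local_id, {"esg:hasLastUpdate": header["datestamp"]}


def decide_action(header, stored_last_update):
    """Return 'insert', 'update', 'delete' or 'skip' for an incoming header.

    stored_last_update is the esg:hasLastUpdate value already in the store,
    or None if the record is unknown locally.
    """
    if header.get("deleted"):
        return "delete" if stored_last_update is not None else "skip"
    if stored_last_update is None:
        return "insert"
    # UTC datestamps of equal granularity compare correctly as strings.
    return "update" if header["datestamp"] > stored_last_update else "skip"


def export_header(local_id, properties):
    """Build an outgoing OAI header from a stored RDF object."""
    return {
        "identifier": local_id,  # assumption: exported unchanged
        "datestamp": properties["esg:hasLastUpdate"],
        "setSpec": properties["esg:hasContext"],
    }


incoming = {"identifier": "oai:example.org:dataset/1",
            "datestamp": "2010-05-30T08:15:00Z"}
print(import_header(incoming))
print(decide_action(incoming, "2010-01-01T00:00:00Z"))
```

Comparing datestamps as strings keeps the sketch short; a production harvester would more likely parse them into datetime values first.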
For example, an incoming OAI record with the following header:
would result in the following RDF triple ingested in the store:
Conversely, the following two RDF triples in the store:
would result in an OAI exported record with the following header:
The following XML metadata formats are supported by the ESG Gateway implementation of the OAI-PMH specification:
The ESG software uses RDF as the XML format of choice for exchanging search and discovery metadata between Gateways. This is a natural choice since, as discussed earlier, RDF is the "native" metadata format against which an ESG Gateway resolves data queries. Gateways exchange, as RDF records, all kinds of objects that are relevant to the search and discovery process: Datasets, ModelComponents, Projects, etc.
- Note that the RDF property esg:hasUUID is NOT exported as part of the outgoing RDF metadata, since this property is used to flag RDF records in the triple store that have been harvested from the local relational database (as opposed to harvested from a remote OAI repository).
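One way to picture this export rule, together with the restriction that a Gateway serves only its own records, is the sketch below. It treats the presence of esg:hasUUID as the test for local ownership, which is an assumption made for illustration; the triple representation, function name, and sample values are likewise hypothetical.

```python
UUID_PREDICATE = "esg:hasUUID"


def select_export_triples(triples):
    """Keep triples of locally owned subjects, minus the esg:hasUUID flag."""
    # Assumption: a subject counts as locally owned if it carries esg:hasUUID.
    local_subjects = {s for s, p, _ in triples if p == UUID_PREDICATE}
    return [
        (s, p, o) for s, p, o in triples
        if s in local_subjects and p != UUID_PREDICATE
    ]


# Made-up store content: one local record and one harvested record.
store = [
    ("dataset-local", "esg:hasUUID", "hypothetical-uuid"),
    ("dataset-local", "esg:hasLastUpdate", "2010-05-30T08:15:00Z"),
    ("dataset-remote", "esg:hasLastUpdate", "2010-05-31T09:00:00Z"),
]
exported = select_export_triples(store)
print(exported)
```

Only the local record's esg:hasLastUpdate triple survives: the harvested record is not redistributed, and the esg:hasUUID flag itself is dropped.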
Dublin Core (DC) is a widely used XML schema that contains mostly high-level descriptive metadata about generic resources. ESG supports generation of metadata records in DC format for the purpose of disseminating metadata to Digital Libraries. DC records created by an ESG Gateway acting as an OAI Repository contain only a limited number of fields, which are generated from the RDF triples according to the crosswalk in the enclosed table. Please note that:
- Only RDF Dataset objects are exported into DC metadata.
- Harvesting of external DC records by an ESG Gateway is disabled (since DC records are too generic and there is no guarantee that they represent geophysical datasets).
Directory Interchange Format (DIF)
DIF is an XML metadata format created and supported by NASA GCMD (Global Change Master Directory) and widely used within the geoscientific community. DIF support within the ESG infrastructure is mostly meant for interoperability with partner data centers, so that dataset information can be exchanged and made searchable at both ends. Conversion of DIF records into/from domain model objects is based on the crosswalk documented in the enclosed table. In particular, the following considerations apply:
- Only RDF Dataset objects are exported as DIF records, although each Dataset DIF document may contain information from ancillary RDF objects (Gateway, Topics, etc.). Conversely, any incoming DIF record triggers the generation of a corresponding RDF Dataset object, and also of other associated RDF objects if not existing already (GcmdTopic, IsoTopic, Project etc.).
- The Dataset Persistent Identifier of type ID (part of the esg:hasURI property) is mapped to the DIF record unique identifier (Entry_ID). Conversely, the DIF record unique identifier becomes the value of the esg:hasURI property.
- As a consequence, the OAI identifier and DIF identifier are NOT equal (for either import or export of records).
- The DIF Data Center element is built from the associated RDF Gateway object, leaving detailed contact information blank.
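The identifier and Data Center rules above can be sketched as follows. This is a partial illustration only: the dictionary keys and function names are hypothetical, the child elements of Data_Center are written from memory of the DIF schema and should be checked against it, and namespaces and every other DIF field are omitted.

```python
import xml.etree.ElementTree as ET


def dataset_to_dif(dataset, gateway_name):
    """Build a partial DIF document for a Dataset (illustrative only)."""
    dif = ET.Element("DIF")
    # The persistent identifier (part of esg:hasURI) becomes Entry_ID.
    ET.SubElement(dif, "Entry_ID").text = dataset["persistent_id"]
    # The Data Center is derived from the Gateway; contact details stay blank.
    data_center = ET.SubElement(dif, "Data_Center")
    name = ET.SubElement(data_center, "Data_Center_Name")
    ET.SubElement(name, "Short_Name").text = gateway_name
    return dif


def dif_to_properties(dif):
    """On import, the DIF Entry_ID becomes the esg:hasURI value."""
    return {"esg:hasURI": dif.findtext("Entry_ID")}


# Made-up dataset and gateway names.
dif = dataset_to_dif({"persistent_id": "example.dataset.v1"}, "Example Gateway")
print(ET.tostring(dif, encoding="unicode"))
print(dif_to_properties(dif))
```

Note that the OAI identifier plays no part here, consistent with the OAI and DIF identifiers being different values.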
Currently the ESG Gateway software includes the infrastructure to exchange records in ISO format, although the functionality is not yet implemented.
Metadata Formats Crosswalk Table
PCM testsim_1845955462 Ocean Single Variable Time Series Data (Yearly)
PCM testsim_1845955462 Ocean Single Variable Time Series Data (Yearly) for simulation.....
EARTH SCIENCE > Oceans > Coastal Processes > Rocky Coasts
Personnel(Role=Data Center Contact)
Metadata Records Examples
The following are examples of metadata records served by an ESG-CET Gateway acting as an OAI-PMH repository. All examples refer to the same OAI item (i.e. object), but in different metadata formats.