Metadata is needed to aid data discovery. There are two approaches I've considered so far:
- Per-variable metadata that marks a variable as "this is the one" you are looking for. This requires scanning the list of variables first.
- Global metadata that points you to the relevant variables.
This should be a standard that is independent of DataSource type (e.g. ICARTT, netCDF, HDF5), usable with any file format that supports global attributes or has any level of sophistication in its header.
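The two approaches can be contrasted in a short sketch, assuming a format-neutral reader that exposes only global attributes and per-variable attributes. `DataSource`, `InMemorySource`, the per-variable `role` attribute and the global `temperature` key are hypothetical names chosen for illustration; they are not part of netCDF, HDF5 or ICARTT. A real reader would wrap the appropriate file library behind the same two methods.

```python
from typing import Dict, List, Mapping, Protocol


class DataSource(Protocol):
    """Hypothetical format-neutral view of a file header (netCDF, HDF5, ICARTT, ...)."""

    def global_attributes(self) -> Mapping[str, str]: ...

    def variable_attributes(self) -> Mapping[str, Mapping[str, str]]: ...


class InMemorySource:
    """Toy DataSource backed by plain dicts, standing in for a real file reader."""

    def __init__(self, global_attrs: Dict[str, str],
                 var_attrs: Dict[str, Dict[str, str]]) -> None:
        self._global = global_attrs
        self._vars = var_attrs

    def global_attributes(self) -> Mapping[str, str]:
        return self._global

    def variable_attributes(self) -> Mapping[str, Mapping[str, str]]:
        return self._vars


def find_by_variable_flag(src: DataSource, role: str) -> List[str]:
    """Approach 1: scan every variable for a (hypothetical) 'role' attribute."""
    return [name for name, attrs in src.variable_attributes().items()
            if attrs.get("role") == role]


def find_by_global_attribute(src: DataSource, key: str) -> List[str]:
    """Approach 2: read one global attribute listing the variable names."""
    value = src.global_attributes().get(key)
    return value.split() if value else []


# Usage: three redundant temperature sensors, one arbitrarily marked
# as preferred for illustration.
src = InMemorySource(
    global_attrs={"temperature": "ATHR1"},  # hypothetical global key
    var_attrs={"ATHR1": {"role": "temperature"}, "ATHR2": {}, "ATFR": {}},
)
print(find_by_variable_flag(src, "temperature"))     # ['ATHR1']
print(find_by_global_attribute(src, "temperature"))  # ['ATHR1']
```

The per-variable approach has to visit every variable before it can answer, while the global approach is a single attribute lookup; both only need what any header-capable format can provide.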
Data discovery is needed for several reasons:
- Variable names can be cryptic.
- There can be multiple measurements of the same type.
- Automation: software that needs to find its way around a file.
For example, on an NCAR aircraft you will have the following latitudes to choose from:
and the following redundant ambient temperature measurements:
- ATHR1, ATHR2, ATFR
To that end, we have defined two global metadata attributes for our netCDF files: one identifying the aircraft position (coordinate) variables and a second identifying the wind field variables.
This probably needs some work. For example, I should move from space-separated to comma-separated lists, and possibly add a prefix (namespace).
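A reader can tolerate both the current space-separated lists and a later comma-separated form, and accept an optional prefix, with a small parser like the one below. This is a sketch only: the attribute names `coordinates` and `wind_field`, the `discovery:` prefix, and the variable names in the usage lines are hypothetical placeholders, not the actual attributes or variables in the NCAR files.

```python
import re
from typing import Dict, Iterable, List, Mapping

# Hypothetical namespace prefix; the text only suggests a prefix might be added.
NAMESPACE = "discovery:"

# Assumed attribute names for the position and wind field lists.
DISCOVERY_KEYS = ("coordinates", "wind_field")


def split_var_list(value: str) -> List[str]:
    """Split a variable list that may be space- or comma-separated."""
    # A run of commas and/or whitespace counts as a single separator.
    return [tok for tok in re.split(r"[,\s]+", value.strip()) if tok]


def discovery_attributes(global_attrs: Mapping[str, str],
                         keys: Iterable[str] = DISCOVERY_KEYS,
                         prefix: str = NAMESPACE) -> Dict[str, List[str]]:
    """Map each discovery key to its variable list.

    A prefixed attribute (e.g. 'discovery:coordinates') takes precedence
    over the bare name ('coordinates'), so old and new files both work.
    """
    found: Dict[str, List[str]] = {}
    for key in keys:
        raw = global_attrs.get(prefix + key, global_attrs.get(key))
        if raw is not None:
            found[key] = split_var_list(raw)
    return found


# Usage with placeholder variable names: the old space-separated style
# alongside a new prefixed, comma-separated style.
attrs = {"coordinates": "LON1 LAT1 ALT1 Time",
         "discovery:wind_field": "WS1, WD1"}
print(discovery_attributes(attrs))
# {'coordinates': ['LON1', 'LAT1', 'ALT1', 'Time'], 'wind_field': ['WS1', 'WD1']}
```

Accepting both separators lets files written before and after such a change be read by the same code, which eases a gradual migration.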
John Caron and Ethan Davis of Unidata made a couple of passes at conventions for observational data, including data discovery.
Unidata Observation Conventions (Draft)