Overview
- Written in Python 3
- Uses ConfigMaster for configuration parsing
- Lets users write Python when building configuration files (e.g. looping over directories to create dictionary elements for archiving)
- Uses strftime formatting to add time elements to paths
- Similar in functionality to Archiver.pl
- Iterative/Evolutionary design
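As a rough illustration of the Python-in-configuration idea above, the sketch below loops over the subdirectories of a source root and creates one dictionary entry per directory. The function name `build_archive_items`, the keys `source` and `destination`, and the overall dictionary layout are assumptions made for this example; they are not ConfigMaster's API.

```python
from pathlib import Path


def build_archive_items(source_root: str, dest_root: str) -> dict:
    """Create one archive item per subdirectory of source_root.

    Hypothetical layout: each item maps "source" to a strftime path
    template and "destination" to a directory under dest_root.
    """
    items = {}
    for subdir in sorted(Path(source_root).iterdir()):
        if subdir.is_dir():  # plain files in source_root are ignored
            items[subdir.name] = {
                "source": f"{subdir}/%Y%m%d",
                "destination": f"{dest_root}/{subdir.name}",
            }
    return items
```

A configuration file could then call something like `build_archive_items("/ldm3_d2/grib", "/RAPDMG/grib")` instead of listing every data set by hand.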
Definitions / Acronyms / Initialisms
- CS - Campaign Store
Resources
- https://github.com/globus/automation-examples
- https://www2.cisl.ucar.edu/sites/default/files/CISL_GlobusCLI_Nov2018.html
- https://www2.cisl.ucar.edu/resources/storage-and-file-systems/globus-file-transfers
- https://docs.globus.org/how-to/globus-connect-personal-linux/#globus-connect-personal-cli
- User-facing page for Globus information in the RAL User Knowledge Base: https://sdg.rap.ucar.edu/confluence/pages/viewpage.action?title=Globus&spaceKey=RUKB
Design
User Stories
Simple Copy
- User wants to copy directories of data to the CS every day
- from /ldm3_d2/grib/GFS002/%Y%m%d to /RAPDMG/grib/GFS002
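The strftime field in the source path can be filled in with the standard library alone. This is a minimal sketch, assuming an archive item is a plain dictionary with `source` and `destination` keys (a hypothetical layout) and using an arbitrary example date.

```python
from datetime import datetime

# Paths taken from the Simple Copy story; the dictionary layout is assumed.
simple_copy_item = {
    "source": "/ldm3_d2/grib/GFS002/%Y%m%d",
    "destination": "/RAPDMG/grib/GFS002",
}


def resolve_item(item: dict, when: datetime) -> dict:
    """Return a copy of item with strftime fields filled in for `when`."""
    return {key: when.strftime(value) for key, value in item.items()}


resolved = resolve_item(simple_copy_item, datetime(2019, 3, 14))
print(resolved["source"])       # /ldm3_d2/grib/GFS002/20190314
print(resolved["destination"])  # /RAPDMG/grib/GFS002
```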
Simple Tar
- User wants to copy a directory of data into a tar file
- from /ldm1_d2/NLDN/DATEYYYYMMDD/ to /RAPDMG/LDM/ARCHIVE/DATEYYYY/DATEMMDD
- put in tar file DATEYYYYMMDD.nldn.tar
- include cdDirTar so that paths stored in the tar file are relative to that directory
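The Simple Tar story could be sketched with the standard `tarfile` module as follows. The mapping from the `DATEYYYYMMDD`-style tokens to strftime fields is an assumption about what those tokens mean, and `expand_tokens` and `make_tar` are hypothetical helper names. The `cd_dir_tar` argument plays the role of cdDirTar: member names are stored relative to it.

```python
import os
import tarfile
from datetime import datetime

# Assumed token meanings. The longest token comes first so that
# DATEYYYYMMDD is not partially consumed by DATEYYYY.
DATE_TOKENS = [
    ("DATEYYYYMMDD", "%Y%m%d"),
    ("DATEYYYY", "%Y"),
    ("DATEMMDD", "%m%d"),
]


def expand_tokens(template: str, when: datetime) -> str:
    """Replace DATE* tokens in template with values taken from `when`."""
    for token, fmt in DATE_TOKENS:
        template = template.replace(token, when.strftime(fmt))
    return template


def make_tar(source_dir: str, tar_path: str, cd_dir_tar: str) -> str:
    """Tar source_dir, storing member names relative to cd_dir_tar."""
    arcname = os.path.relpath(source_dir, cd_dir_tar)
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=arcname)
    return tar_path
```

With a cdDirTar of `/ldm1_d2/NLDN`, for example, the files from one day would be stored under a top-level directory named after the date rather than under the full absolute path.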
Globs in Source
- User wants to copy several directories of data into a tar file
<source>/ldm1_d2/nids/raw/nids/*/BREF1/DATEYYYYMMDD</source>
<destination>/RAPDMG/LDM/ARCHIVE/DATEYYYY/DATEMMDD</destination>
<cdDirTar>/ldm1_d2/nids/raw/</cdDirTar>
<expectedNumFiles>25000</expectedNumFiles>
<expectedFileSize>245000000</expectedFileSize>
<tarFilename>DATEYYYYMMDD_all.nids.tar</tarFilename>
Open question: do we want to support this? An alternative is to have the user write Python that builds multiple entries in the archive items dictionary.
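If globs are left to the user, the configuration file could expand the pattern itself. The sketch below uses the standard `glob` module to turn one item with a wildcard source into one item per matching directory; `expand_glob_item`, the `name_index` key scheme, and the dictionary keys are assumptions for illustration only.

```python
import glob


def expand_glob_item(name: str, item: dict) -> dict:
    """Return one archive item per path matching item["source"].

    All other fields (for example the tar file name) are copied
    unchanged, so every expanded item targets the same tar file.
    """
    expanded = {}
    for index, match in enumerate(sorted(glob.glob(item["source"]))):
        expanded[f"{name}_{index}"] = {**item, "source": match}
    return expanded
```

Any date fields in the pattern would need to be filled in before globbing, since `glob` only understands shell-style wildcards.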
Multiple entries into the same tar file
- User gives several archive items with the same destination tar file
- All items are added to the tar file before it is sent to the CS.
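One possible way to handle this story is to group items by their destination tar file first, then build each tar file locally in a single pass. The helper names and the `source`, `destination`, and `tarFilename` keys below are hypothetical, and the choice to store each source under its own directory name is an assumption.

```python
import os
import tarfile
from collections import defaultdict


def group_sources_by_tar(items: dict) -> dict:
    """Map each destination tar path to the list of sources that go into it."""
    groups = defaultdict(list)
    for item in items.values():
        tar_path = os.path.join(item["destination"], item["tarFilename"])
        groups[tar_path].append(item["source"])
    return dict(groups)


def build_local_tar(local_tar_path: str, sources: list) -> None:
    """Write all sources into one local tar file, ready to be transferred."""
    with tarfile.open(local_tar_path, "w") as tar:
        for source in sources:
            # normpath drops a trailing slash so basename is never empty
            arcname = os.path.basename(os.path.normpath(source))
            tar.add(source, arcname=arcname)
```

Building the tar file once, after grouping, avoids appending to a tar file that may already have been sent.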
Requirements
Minimum Requirements
- Support user stories: Simple Copy, Simple Tar, Multiple entries into the same tar file
- Email the user the status of the gat run
Stretch Goals
- Support user story: Globs in Source
- Allow user to define expected file sizes, number of files, etc. and include the results in the email
- Skip underscore files
- Set mode on uploads to CR
- Accept metadata and make it available to other users for data discovery
- Zip files
- Encrypt files
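The expected file count and size goal above could be checked before transfer with a walk over the source directory. This sketch assumes the expected values are minimums and that the size is the total in bytes across all files; neither is stated in the requirements, and `check_expectations` is a hypothetical name.

```python
import os


def check_expectations(directory, expected_num_files=None, expected_file_size=None):
    """Count files and total bytes under directory and compare to expectations.

    Returns a dict whose "warnings" list can be pasted into the status email.
    """
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            num_files += 1
            total_size += os.path.getsize(os.path.join(root, filename))

    warnings = []
    if expected_num_files is not None and num_files < expected_num_files:
        warnings.append(
            f"expected at least {expected_num_files} files, found {num_files}"
        )
    if expected_file_size is not None and total_size < expected_file_size:
        warnings.append(
            f"expected at least {expected_file_size} bytes, found {total_size}"
        )
    return {"num_files": num_files, "total_size": total_size, "warnings": warnings}
```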
Iterations
Proof of Functionality - Connect and transfer via Globus
- Uses globus-cli to connect to the CS and transfer a file.
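A first iteration could simply shell out to the Globus CLI. The sketch below builds a `globus transfer` command and runs it with `subprocess`; it assumes `globus login` has already been done on the host, and the endpoint IDs, label, and helper names are placeholders rather than real values.

```python
import subprocess


def build_transfer_command(src_endpoint, src_path, dst_endpoint, dst_path,
                           label, recursive=False):
    """Build the argument list for a `globus transfer` call."""
    command = [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",
        f"{dst_endpoint}:{dst_path}",
        "--label", label,
    ]
    if recursive:
        command.append("--recursive")  # needed when the source is a directory
    return command


def run_transfer(command):
    """Submit the transfer and return the CLI's output text."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Keeping command construction separate from execution makes it possible to log or test the exact command without contacting Globus.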
Add ConfigMaster Support
- Use ConfigMaster config file to define a set of archive items
- Iterate over them to send to the CS
Support the "Simple Copy" user story