Yannick opened the meeting by announcing the focus topic: the Ewok workflow system and the R2D2 data repository. He presented an overview of both with the following slides.


Ewok is a generic wrapper that can be used to run specific workflow engines such as ecFlow and Cylc.  Rick and Claude presented further details on how this works, using ecFlow as an example:



In addition to the slides, Claude also gave a demo of the ecFlow GUI, running it interactively from his desktop computer. He emphasized ecFlow's graphical interface and shareable suites, which make it well suited for collaboration. He highlighted features of the interface including the server, suite, families, and triggers. Clicking on different components of the GUI brought up the underlying bash scripts, python scripts, and yaml configuration files. The configuration is controlled by yaml files and the plan is to integrate more models, including UFS.

Then Claude, Rick, and Yannick addressed questions, many of them posed in the chat.

Q (Chris H): How are inter-cycle dependencies addressed?

A (Claude): Inter-cycle dependencies are not addressed in ecflow - this is a weakness.

Q (Arun): What does type mean?

A (Claude): the type of suite

Q (Arun): Are the init scripts bash or python?

A (Claude): One could run python scripts in principle, but ewok uses bash scripts for easier access to system-level capabilities.

Q (Shun): What is the key difference between ECFLOW and ROCOTO?

A (Chris H): ROCOTO:

  • doesn't have a GUI
  • is XML-based
  • does have inter-cycle dependencies
  • is not as user-friendly
  • doesn't rely on a server (which is helpful for portability)

Q (Chris H): Does ECFLOW have a pipelining pattern for parallel execution to prevent overloading of resources?

A (Yannick, Claude): Yes, it has a mechanism to limit how many tasks are submitted at once. The limit can be adjusted on the fly.
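
The effect of such a submission limit can be illustrated outside any workflow engine. The following is a minimal sketch in plain Python, not ecFlow code and not its API: the function name `run_with_limit`, the task names, and the limit value are all hypothetical. A semaphore plays the role of the limit, so that no more than `max_active` tasks are in flight at any moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(task_names, max_active):
    """Run placeholder tasks, never more than max_active at the same time.

    Returns the finished task names (in input order) and the peak number
    of tasks that were observed running concurrently.
    """
    limit = threading.Semaphore(max_active)  # stand-in for a task limit
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def run_task(name):
        with limit:  # blocks while max_active tasks are already running
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # stand-in for the real work
            with lock:
                state["active"] -= 1
        return name

    # One worker per task, so only the semaphore restricts concurrency.
    with ThreadPoolExecutor(max_workers=max(1, len(task_names))) as pool:
        finished = list(pool.map(run_task, task_names))
    return finished, state["peak"]


if __name__ == "__main__":
    done, peak = run_with_limit(["obs_a", "obs_b", "obs_c", "obs_d"], max_active=2)
    print(done, "peak concurrency:", peak)
```

In this sketch the limit is fixed when the function is called; a workflow engine that allows the limit to be changed while a suite is running offers more flexibility than this illustration shows.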

Q (David S): Can yamls include other yamls?

A (Claude): This is possible in principle. We will implement it when needed.

Q (Chris H): Are the templated scripts in EWOK reusable outside ECFlow?

A: Yes, but they need to be carefully written.

Q (Chris H): Does ECFLOW have control-flow patterns in addition to the data-flow ones?  E.g. Can it take some value produced by a task at run time, and use it in a condition to determine workflow execution path?

A (Claude): The answer is mostly no. ecFlow does offer events and labels that can be changed by one job and read by another, but this is not currently utilized (?). Claude mentioned that if this is really needed, a queue system (e.g. RabbitMQ) could be put in place so that jobs can pass information to each other. This could be wrapped in ecFlow, but he is not keen to consider that for a long while.
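
The queue idea mentioned in this answer can be sketched as follows. This is only an illustration under stated assumptions: Python's in-process `queue.Queue` stands in for an external message broker, and the job names, the `metric` field, the threshold, and the branch names are all hypothetical. Nothing here is part of ewok or ecFlow.

```python
import queue


def producer_job(channel, value):
    """Hypothetical upstream job: publish a value computed at run time."""
    channel.put({"metric": value})


def choose_next_step(channel, threshold=1.0):
    """Hypothetical downstream step: read the value and pick a branch."""
    message = channel.get(timeout=1)  # raises queue.Empty if nothing arrives
    if message["metric"] > threshold:
        return "rerun_with_qc"
    return "continue_cycle"


if __name__ == "__main__":
    channel = queue.Queue()
    producer_job(channel, 2.5)
    print(choose_next_step(channel))  # prints "rerun_with_qc"
```

A real deployment would replace the in-process queue with a broker reachable from all jobs, which is where most of the extra complexity would come from.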

Q (Ming):  Any difference between this ECflow and the one used in NCO?

A: No - NCO also uses ecFlow. ewok is a generic layer around ecFlow. It can generate Cylc and other suites too.

Ming: Maybe this can help to bridge the research-to-operations (R2O) gap.

Q (Chris H): How do you abstract data locality when a user wants to run on a new machine no one else has ever used before? What is involved in moving a workflow to a new machine?  For example, the UFS workflow scripts are full of if <platform> do <this> statements.

A (Rick & Claude): ewok is portable and R2D2 gives visibility everywhere.   They make use of the jedi-stack environment modules on each system and the specific configuration for each system, including directories, is handled through yaml files.  It is most convenient if the workflow engine (e.g. ecflow) is installed and configured on the host by the sys admins, outside of the jedi-stack, but it can be installed as an environment module with the stack if needed.  Or, it could be provided through a container.  Chris H mentioned that ROCOTO was deliberately designed to be user-installable so there would be no need to rely on the sys admins.  Claude and Rick emphasized that ewok is user-installable if needed.
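
The approach of keeping machine-specific settings in configuration, rather than in platform-specific if statements scattered through the scripts, can be sketched as follows. This is a minimal illustration and not ewok code: a plain Python dictionary stands in for a parsed yaml file, and the platform names, keys, paths, and submit commands are all hypothetical.

```python
# Stand-in for the contents of a per-system yaml file (all values hypothetical).
PLATFORMS = {
    "hpc_a": {"workdir": "/scratch/hypothetical/ewok", "submit": ["sbatch"]},
    "laptop": {"workdir": "/tmp/hypothetical/ewok", "submit": ["bash"]},
}


def load_platform(name, platforms=PLATFORMS):
    """Return the settings for one platform, or fail with a clear message."""
    try:
        return platforms[name]
    except KeyError:
        raise ValueError(
            f"no configuration for platform {name!r}; "
            "add an entry to the configuration instead of editing scripts"
        ) from None


def submit_command(script, platform):
    """Build the command line for a script without branching on the platform."""
    cfg = load_platform(platform)
    return cfg["submit"] + [script]


if __name__ == "__main__":
    print(submit_command("run_hofx.sh", "hpc_a"))
    print(submit_command("run_hofx.sh", "laptop"))
```

With this layout, moving to a new machine means adding one configuration entry, while the scripts themselves stay unchanged.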

Q (Ricardo): In the case of a completely new obs type that only the user has access to, sitting somewhere in a work directory, would it be simple for the user to point to it without the data already having to be in the general archive?

A (Claude): Yes - you can have as many R2D2 databases and obs types as you want.

Q (Chris S): Related to Ricardo's question: What's involved in setting up a locally stored R2D2 archive?

A (Claude): It's just a line in the R2D2 yaml config file, much like the fetch queries.

Q (Arun): is there a yaml file for each task?

A: No - there is one main yaml for the whole experiment, plus some separate model-related yamls.

Q (Ricardo): Does the JEDI part of the flow pull from the repo every time the flow runs? That is, is the user going to get changes that might have been made on a particular branch from one cycle to the next?

A: No, generally you would just build code first, then cycle.  But, there may be some NRT applications where you might want to rebuild after every cycle

Q (Chris H): How easy is it to run a step manually?

A: You can do it if you know what you are doing.  But, Yannick emphasized that this is rare.  When running a similar system at ECMWF, many users would not need to log in explicitly to HPC systems; they could run the GUI on laptops & workstations and run applications remotely.

Claude emphasized that you can do this with ewok too: you can install the ecflow GUI client on your laptop and run an application on a remote server.

Q (Shun): Can jobs be triggered by the existence of a file in ECFLOW?

A: No - not generally, though you could design a workaround.
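
One possible shape for such a workaround, which was not discussed in the meeting and is offered only as an assumption, is a small polling task that completes once the file appears, so that downstream tasks can be triggered on the completion of that task. The function name `wait_for_file`, its parameters, and the file name in the example are hypothetical.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=60.0, poll_interval=1.0):
    """Poll until path exists; return True if found, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if Path(path).exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # A task script could exit non-zero when the file never shows up,
    # so that the workflow engine marks the task as failed.
    found = wait_for_file("hypothetical_obs_file.nc", timeout=5.0, poll_interval=0.5)
    raise SystemExit(0 if found else 1)
```

Polling keeps a job slot busy while it waits, so the timeout and polling interval would need to be chosen with the host system's policies in mind.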

Q (Joe A): Is EWOK useful for generating generic workflows or does it require a specific schema for the problem to be solved?

A (Claude, Yannick): you could use it without expecting an experiment schema.  It could be abstracted.

Q (David S): If an original HofX yaml file has been split into multiple smaller files for readability, can EWOK merge them back into a single file?

A: yes - this is what is happening

David S - the resulting file could be very large

Yannick: we started the implementation with one file but in the future we could construct this by including files, if there is a good use case for this.
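
Merging several smaller configuration fragments into one, as discussed in this exchange, can be sketched with a recursive dictionary merge. This is not how ewok does it; it is a minimal sketch in which plain dictionaries stand in for parsed yaml files (for instance as returned by PyYAML's `yaml.safe_load`), and the function names and example keys are hypothetical. The merge policy is also an assumption: nested mappings are merged, lists are concatenated, and scalar values from later fragments win.

```python
def deep_merge(base, extra):
    """Return a new dict combining base and extra (extra wins on conflicts)."""
    merged = dict(base)  # new top-level dict; inputs are not modified
    for key, value in extra.items():
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            merged[key] = deep_merge(current, value)  # merge nested mappings
        elif isinstance(value, list) and isinstance(current, list):
            merged[key] = current + value  # concatenate lists
        else:
            merged[key] = value  # later fragment overrides
    return merged


def merge_fragments(fragments):
    """Merge an ordered sequence of parsed fragments into one mapping."""
    result = {}
    for fragment in fragments:
        result = deep_merge(result, fragment)
    return result


if __name__ == "__main__":
    # Hypothetical fragments, as if each had been read from its own file.
    window = {"window": {"begin": "2020-01-01T00:00:00Z", "length": "PT6H"}}
    obs_a = {"observations": [{"name": "hypothetical_obs_a"}]}
    obs_b = {"observations": [{"name": "hypothetical_obs_b"}]}
    print(merge_fragments([window, obs_a, obs_b]))
```

As David S noted, the merged result can become very large, so a tool of this kind is mainly useful for generating the single file rather than for reading it.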

Q (Tom): which is the most challenging aspect of ewok/R2D2 to keep generic?

A (Claude, Rick): the model.  Machine dependencies can be abstracted through virtual environments.

Q (David S): What is the time scale for Cylc integration?

A (Yannick): Maybe Q1 of AOP 21? One problem with Cylc is that it does not have a nice GUI.

Q: Has Cylc been upgraded to the new python?

A (Oliver): Yes - this is coming in the next few weeks.

With this, we had reached the time allocated for the meeting, so the meeting was adjourned. Yannick noted in closing that this was very well attended - up to 96 participants at one point. Noting the interest in the topic, he promised more discussions on ewok and R2D2 in the future.

