Continuous Integration

GitHub: https://github.com/JCSDA-internal/ci

Documentation: via multiple READMEs inside the GitHub repository

Table of Contents

About

CI System Information

Quick reference


Presubmit tests can be controlled by single-line annotations in the pull
request description. These annotations will be re-examined for each run.
Here is an example of their use:

# Build tests with other unsubmitted packages.
build-group=https://github.com/JCSDA-internal/oops/pull/2284
build-group=https://github.com/JCSDA-internal/saber/pull/651

# Disable the build-cache for tests.
jedi-ci-build-cache=skip

Each configuration setting must be on a single line, but order and
position do not matter.

# Enable tests for your draft PR (disabled by default).
run-ci-on-draft=true

# Select the compiler used by CI (defaults to random choice).
jedi-ci-test-select=gcc

# Select the jedi-bundle branch used for building. Using this option
# disables the build cache.
jedi-ci-bundle-branch=feature/my-bundle-change
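Because each annotation is a single `key=value` line, standard line-based tools can extract them. The following is an illustrative sketch only, not the CI system's actual parser; the helper name and the temporary file path are made up for the example:

```shell
# Sketch: read a single-valued annotation from a PR description saved to a
# file, falling back to a default when the key is absent. Not the real
# CI implementation.
ci_annotation() {
    key="$1"; file="$2"; default="$3"
    value=$(grep -m1 "^${key}=" "$file" | cut -d= -f2-)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$default"
    fi
}

# Example PR description for illustration.
cat > /tmp/pr_body.txt <<'EOF'
Fix interpolation bug.
jedi-ci-build-cache=skip
run-ci-on-draft=true
EOF

ci_annotation jedi-ci-build-cache /tmp/pr_body.txt use      # prints: skip
ci_annotation jedi-ci-test-select /tmp/pr_body.txt random   # prints: random
```

Note that `cut -d= -f2-` keeps everything after the first `=`, so values containing URLs survive intact.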

Specifying a Build Group

In the default configuration, the CI system builds candidate code against
the latest submitted version of each package in the jedi-bundle. A pull
request can be built against unsubmitted versions of specific packages by
specifying each version with a tag in the pull request description.
Multiple tags may be added as long as each tag is on its own line of the
pull request description.

build-group=https://github.com/JCSDA-internal/oops/pull/2284

Selecting a Compiler

To save cloud compute resources, the CI test environment randomly selects
one of our three environments. If you want tests with a specific compiler,
set the annotation jedi-ci-test-select to gcc, intel, or gcc11. Please do
not use the special value all unless you have an especially risky change
known to affect all compilers or the CI environment.
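The selection behavior described above amounts to "use the annotation if present, otherwise pick at random." A minimal sketch of that logic, assuming the PR description has been saved to a file (this is illustrative only, not the CI system's actual code):

```shell
# Sketch of compiler selection: honor jedi-ci-test-select if set,
# otherwise choose one of the three environments at random.
pr_body=/tmp/pr_body_compiler.txt
cat > "$pr_body" <<'EOF'
jedi-ci-test-select=intel
EOF

compiler=$(grep -m1 '^jedi-ci-test-select=' "$pr_body" | cut -d= -f2)
if [ -z "$compiler" ]; then
    # No annotation present: random choice among the environments.
    compiler=$(shuf -n1 -e gcc intel gcc11)
fi
echo "$compiler"   # prints: intel (for this example input)
```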

Build Cache

The CI system relies on a build cache to speed up the build process. Some
changes can cause build failures arising from use of the cache. The CI
system has two controls for modifying cache behavior.

The build cache can be disabled by adding the annotation
jedi-ci-build-cache=skip to the PR description.

If it is necessary to rebuild the entire cache to remove a bug in the cached
binaries, add the annotation jedi-ci-build-cache=rebuild to the PR
description.
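The two controls give three possible cache behaviors: skip, rebuild, or the default of using the existing cache. A sketch of that three-way decision (illustrative only; the file path and messages are made up, and this is not the CI system's actual implementation):

```shell
# Sketch: map the jedi-ci-build-cache annotation onto the three
# cache behaviors described above.
printf 'jedi-ci-build-cache=rebuild\n' > /tmp/pr_body_cache.txt

mode=$(grep -m1 '^jedi-ci-build-cache=' /tmp/pr_body_cache.txt | cut -d= -f2)
case "$mode" in
    skip)    echo "building without the cache" ;;
    rebuild) echo "rebuilding the cache from scratch" ;;
    *)       echo "using the existing build cache" ;;
esac
```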

CI Development and Debug Options:

USE THESE OPTIONS WITH CAUTION

FAQ

Q: Why is this test running?

A: This test was run by the JEDI CI system, whose code is hosted at
github.com/JCSDA-internal/ci.

Q: My draft pull request's tests are not running.

A: You must enable tests for draft PRs by adding the annotation
  run-ci-on-draft=true in the pull request description.

Q: How can a test "pass with failures"?

A: Because the integration test is much larger than typical unit tests, a
small number of flaky test failures is allowed. Over time we track the
repeatedly flaky tests and fix them. Please examine any failures
carefully to ensure they were not caused by your change.

Q: Why can't I access the build log?

A: The AWS-hosted build logs require a login to the jcsda-usaf AWS
account. We also provide a public build log, available to anyone with the
link, but this log file is not available until all tests are complete for
an environment.

Administrative Tasks (For JEDI Infra team)

Updating CI instance disk space

Use the following procedure to increase disk space for the CI instances if they are running out of space:

  1. Add an additional EBS volume and mount it on the instance.
  2. Move the spack-stack build and source caches there and link them back to their current locations.
  3. Turn the swapfile off on the root filesystem and enable one on the new volume.

As a record of how this was last done:

  1. Created a 500 GB EBS volume "Ubuntu 22.04 CI Intel".
  2. Mounted it on the EC2 instance following https://docs.aws.amazon.com/ebs/latest/userguide/ebs-attaching-volume.html
  3. Partitioned the volume, created an ext4 filesystem, and mounted it as /mnt/addon via an /etc/fstab entry, following standard Linux practice; see https://docs.aws.amazon.com/ebs/latest/userguide/ebs-using-volumes.html for one of many tutorials.
  4. Moved the spack source and build caches to /mnt/addon/spack-stack/{build,source}-cache
  5. Created a 128 GB swapfile at /mnt/addon/swapfile and removed the 64 GB swapfile at /swapfile (including fstab entries); this again is Linux boilerplate, see e.g. https://phoenixnap.com/kb/linux-swap-file
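The volume and swapfile steps above correspond to standard Linux commands along the following lines. The device name /dev/xvdf and the variable $SPACK_CACHE_DIR (the caches' original location) are assumptions for illustration; confirm the actual device name with lsblk first, and note all of these commands require root:

```
# Assumption: the new volume appears as /dev/xvdf (verify with lsblk).
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/addon
sudo mount /dev/xvdf /mnt/addon    # add a matching /etc/fstab entry to persist

# Relocate the spack-stack caches and link them back to their old paths.
# $SPACK_CACHE_DIR is a hypothetical stand-in for the original location.
sudo mkdir -p /mnt/addon/spack-stack
sudo mv "$SPACK_CACHE_DIR/build-cache" "$SPACK_CACHE_DIR/source-cache" /mnt/addon/spack-stack/
sudo ln -s /mnt/addon/spack-stack/build-cache "$SPACK_CACHE_DIR/build-cache"
sudo ln -s /mnt/addon/spack-stack/source-cache "$SPACK_CACHE_DIR/source-cache"

# Replace the root-filesystem swapfile with one on the new volume.
sudo swapoff /swapfile
sudo fallocate -l 128G /mnt/addon/swapfile
sudo chmod 600 /mnt/addon/swapfile
sudo mkswap /mnt/addon/swapfile
sudo swapon /mnt/addon/swapfile
# Remove the old /swapfile entry from /etc/fstab and add the new one.
```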

Troubleshooting / FAQ

CDash Troubleshooting

CDash is hosted on an AWS EC2 instance in our USAF account in us-east-2 region. Members of the Infrastructure team can access this instance with SSH.

HTTPS / Signing Authority: The CDash server uses an SSL connection with a LetsEncrypt certificate, which is renewed on the 10th of each month by a cron job on the instance. If the renewal job fails, our CDash integration breaks.

Containerized Service Deployment: The CDash server is deployed on the instance via a docker compose deployment with three containers. During certificate updates the "cdash" container is temporarily brought down so that certbot can communicate with the signing authority. It is safe to stop and start the cdash container, although stopping it causes a temporary service outage. Stopping the MySQL container (without preserving the volume) will clear all data from our CDash server, including repository configurations, which will need to be added manually to re-enable test uploading.
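The monthly renewal described above can be sketched roughly as follows. The deployment path is a hypothetical placeholder, and the exact certbot invocation is an assumption; check the instance's actual crontab and compose file before relying on this:

```
# Hypothetical path to the compose deployment on the instance.
cd "$CDASH_DEPLOY_DIR"

docker compose stop cdash    # free the HTTP(S) ports for the ACME challenge
certbot renew                # renew the LetsEncrypt certificate
docker compose start cdash   # restore service
```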

Detailed debugging notes for the containerized deployment can be found in the CDash config code repository README file.

Running GitHub Workflow Locally

To save costs and time, GitHub workflows can be run locally. A heavily used tool that mimics GitHub's runners locally with minimal setup is act (https://github.com/nektos/act). It takes existing workflow YAML files and runs them locally using Docker.


MacOS Setup

MacOS behaves differently from Linux in several ways when using act. Make sure the following are set up on your machine.

Install act with: brew install act

To make sure the right Docker sockets are used, run the following command:

docker context use default

This ensures the Docker socket(s) are linked properly.

act command-line options

The most common command-line options you should remember are:
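For reference, a few commonly used act options are sketched below; the job name and workflow path shown are hypothetical, so verify the flags against `act --help` for your installed version:

```
act -l                             # list the jobs that would run
act pull_request                   # run workflows triggered by the pull_request event
act -j build-test                  # run a single job by name (hypothetical job name)
act -n                             # dry run: show steps without executing them
act -W .github/workflows/ci.yml    # run a specific workflow file (hypothetical path)
act -s GITHUB_TOKEN="$MY_TOKEN"    # pass a secret into the run
```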