This guide covers administrative tasks for the R2D2 API server, including managing, updating, fixing, and monitoring the API, database, and AWS infrastructure. It’s meant for JEDI Infrastructure team members and JCSDA AWS admins who need to keep things running smoothly, debug issues, and ensure uptime. In addition to a system overview, this document contains playbook entries and work logs that record the history and procedures of administrative work. If an entry is missing, please add it.
aws ssm start-session --target i-0a750b39c8e88d101 --region us-east-2
R2D2’s HTTP API is hosted on AWS utilizing the following components. For more info see this Feb 7, 2025 slide deck.
WARNING: Do not manually stop or upgrade R2D2 servers without first draining the instance from our load balancer! Doing so can drop inbound or queued API calls.
The standard process for updating servers involves draining them from the load balancer one at a time, updating the service, and then re-enabling them. This approach ensures continuous network availability and prevents the loss of inbound or queued API calls. A Bash script automates this process while maintaining service stability. However, updates should be performed during low-traffic periods, as serving capacity is temporarily reduced during the update.
1. Build and publish binaries with the prod tag as described later in this guide.
2. From a developer machine with admin AWS credentials (members of the JEDI-infra team) clone the r2d2 repository.
git clone https://github.com/JCSDA-internal/r2d2.git && cd r2d2
3. Run the following command and wait about 10 minutes for the full rollout to complete.
./server/scripts/update_prod_servers.sh --operation update
Note: If the rollout fails for any reason, refer to the update_prod_servers.sh script, which contains the manual update process and extensive debugging information. This script serves as the central resource for detailed rollout procedures and documentation.
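The drain, update, re-enable cycle described above can be illustrated with a short sketch. This is not the contents of update_prod_servers.sh, which remains the authoritative procedure. It assumes the instances are registered with an elbv2 target group; the target group ARN, the instance IDs, and the `update_instance` helper are hypothetical placeholders. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Illustrative rolling-update sketch: drain, update, re-enable, one instance
# at a time. ASSUMPTIONS (not taken from this guide): the load balancer uses
# an elbv2 target group, and update_instance is a hypothetical placeholder.

# RUN defaults to "echo", so commands are printed rather than executed.
# Export RUN="" to execute them for real.
RUN="${RUN-echo}"

# Hypothetical placeholder for the per-instance update steps.
update_instance() {
  echo "update steps for $1 would run here"
}

rolling_update() {
  local tg_arn="$1"
  shift
  local id
  for id in "$@"; do
    # 1. Drain: remove the instance from the target group and wait until
    #    deregistration (including connection draining) has finished.
    $RUN aws elbv2 deregister-targets --target-group-arn "$tg_arn" --targets "Id=$id"
    $RUN aws elbv2 wait target-deregistered --target-group-arn "$tg_arn" --targets "Id=$id"
    # 2. Update the drained instance.
    $RUN update_instance "$id"
    # 3. Re-enable: register the instance again and wait until it is healthy
    #    before moving on to the next one.
    $RUN aws elbv2 register-targets --target-group-arn "$tg_arn" --targets "Id=$id"
    $RUN aws elbv2 wait target-in-service --target-group-arn "$tg_arn" --targets "Id=$id"
  done
}
```

Calling `rolling_update <target-group-arn> <instance-id> [<instance-id> ...]` in the default dry-run mode prints five commands per instance. Because each instance is fully re-registered before the next one is drained, at most one replica is out of service at a time, which is why serving capacity is reduced but never zero during the rollout.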
aws s3 cp server/cfn/prod.yaml s3://jcsda-usaf-iac-artifacts/r2d2/prod.yaml
file_version=$(aws s3api list-object-versions \
--bucket jcsda-usaf-iac-artifacts \
--prefix "r2d2/prod.yaml" \
| jq -r '.Versions[] | select(.IsLatest==true) | .VersionId')
echo "https://jcsda-usaf-iac-artifacts.s3.us-east-2.amazonaws.com/r2d2/prod.yaml?versionId=${file_version}"
Logs for all API server replicas can be found under the "r2d2.*" log groups in AWS CloudWatch. The CloudWatch log exports run on a half-hour schedule, so expect a delay between an event and being able to view the logs in CloudWatch.
Each server replica writes logs to /var/log/gunicorn/, and those can be viewed in real time if you have a shell into the replica.
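As a rough sketch, the CloudWatch side of this can also be done from the AWS CLI instead of the console. The helper names below are hypothetical, and the sketch assumes the log groups live in us-east-2 (like the other resources in this guide) and that their names start with "r2d2". It prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# Sketch of CloudWatch log access via the AWS CLI (v2).
# ASSUMPTIONS: region us-east-2 and the "r2d2" name prefix; the function
# names are hypothetical helpers, not part of the r2d2 repository.

# RUN defaults to "echo" (dry run). Export RUN="" to execute the commands.
RUN="${RUN-echo}"

# List the names of all log groups whose names start with "r2d2".
list_r2d2_log_groups() {
  $RUN aws logs describe-log-groups \
    --region us-east-2 \
    --log-group-name-prefix r2d2 \
    --query 'logGroups[].logGroupName' \
    --output text
}

# Show recent events from one log group. The second argument is how far
# back to look (default: 2h).
tail_r2d2_log_group() {
  local group="$1"
  local since="${2:-2h}"
  $RUN aws logs tail "$group" --region us-east-2 --since "$since"
}
```

Because of the export delay, the most recent events may not appear yet; for those, read the files under /var/log/gunicorn/ on the replica itself.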
1) Build the docker image
git clone -b feature/restapi https://github.com/jcsda-internal/r2d2.git
cd r2d2
docker build -f server/docker/Dockerfile.app \
--platform=linux/amd64 \
-t 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod .
2) Get local credentials for our ECR repository.
aws ecr get-login-password --region us-east-2 \
| docker login --username AWS \
--password-stdin 747101682576.dkr.ecr.us-east-2.amazonaws.com
3) Push the docker image
docker push 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod
This is the process needed to push updated code to an API Server instance. Note that you can build the binary on the instance, but if you do, you should also push it to ECR to keep a persistent copy of our server code.
# If logged into the server, log out now and log back in to reset the shell environment

# From your developer machine log into the server and switch to the root user.
aws ssm start-session --target i-0a750b39c8e88d101 --region us-east-2
sudo su -

# Get local credentials for our ECR repository.
aws ecr get-login-password --region us-east-2 \
| docker login --username AWS \
--password-stdin 747101682576.dkr.ecr.us-east-2.amazonaws.com

# Pull the latest Docker container
docker pull 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod

# Stop the running service.
systemctl stop r2d2.service

# If necessary, edit service definition file: /etc/systemd/system/r2d2.service
# Do this only if docker arguments or environment variables have changed.
# If updating this file you will need to run `systemctl daemon-reload` to
# load in updates to the service.

# Restart the systemd service.
systemctl start r2d2.service

# Check the status of the service
docker ps

# Sample output from the production server
CONTAINER ID   IMAGE                                                               COMMAND                  CREATED          STATUS          PORTS                               NAMES
cd8745ac7561   747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod      "run_r2d2_app --port…"   47 seconds ago   Up 46 seconds   0.0.0.0:80->80/tcp, :::80->80/tcp   r2d2-api-service

# Check the output logs
docker logs -f r2d2-api-service
The RDS database cannot be accessed from the public internet; it must be accessed from one of our dev servers using an SSM shell login.
1) Log into the dev server (see quick reference)
2) Fetch the server password from the parameters in our CloudFormation stack (see link in quick reference)
3) From the dev server execute this command, using the password retrieved above
mysql -h r2d2-api-prod-rdsinstance-eja4eofrohvy.cg24vilqoa8w.us-east-2.rds.amazonaws.com -u admin -p
This should not be done! This procedure was used to test the R2D2 database prior to release and is documented here for historical context and in case some of these steps have useful analogs for other work.
For the localhost server container, can we pre-load r2d2-data and the r2d2 mysql user so that end users just download the container and execute it on port 8080? Basically execute the setup database script from r2d2/scripts.

Update crontabs on all platforms (experiment scrubber)
- Platforms include derecho, discover, hercules, orion, s4
- Update venv used by crontabs to use r2d2-client and not r2d2

Create username and api keys for all users
- We will have to massage the user database table since some humans have more than one username
- We need an api key generator and a way to send these api keys to the user

Go to dev server dump dev database into IAC bucket
- Access astromech-nonprod
  $ aws ssm start-session --target i-0e97feb060f886e43 --region us-east-2
  $ sudo su ubuntu && cd ~
- Use password "local-server-root-password" (only works on dev server)
- DATE=$(date "+%Y%m%dT%H%M%SZ")
- mysqldump -h 127.0.0.1 -u root -p --databases r2d2 > r2d2-dev-backup-${DATE}.sql
- aws s3 cp r2d2-dev-backup-20250205T171114Z.sql s3://jcsda-usaf-iac-artifacts/r2d2/

Log into prod server and load database with dev data
- Login and load data
  - ssm login command (see quick ref)
  - sudo su ubuntu && cd ~
  - aws s3 cp s3://jcsda-usaf-iac-artifacts/r2d2/r2d2-dev-backup-20250205T171114Z.sql ./r2d2-dev-backup.sql
- Get password cloud formation console parameters
- Log into server
  - mysql -h r2d2-api-prod-rdsinstance-eja4eofrohvy.cg24vilqoa8w.us-east-2.rds.amazonaws.com -u admin -p
- Run these sql commands
  CREATE DATABASE r2d2;
  USE r2d2;
  source /home/ubuntu/r2d2-dev-backup.sql