This guide covers administrative tasks for the R2D2 API server, including managing, updating, fixing, and monitoring the API, database, and AWS infrastructure. It’s meant for JEDI Infrastructure team members and JCSDA AWS admins who need to keep things running smoothly, debug issues, and ensure uptime. In addition to a system overview and administration guide, this document contains playbook entries and work logs that record the history and procedures of administrative work. If this guide is missing entries, please author them as needed.
Quick Reference
- AWS Resources for R2D2 Prod
- CloudFormation config: //r2d2/server/cfn/prod.yaml
- CloudFormation stack: arn:aws:cloudformation:us-east-2:747101682576:stack/r2d2-api-prod/c15cb090-d2dc-11ef-a026-06c957b621e9
- API Server EC2 Instances: ec2 filtered search
- Load Balancer: arn:aws:elasticloadbalancing:us-east-2:747101682576:loadbalancer/app/r2d2-prod-load-balancer/f526daf391077f70
- SSL certificate (for r2d2-api.jcsda.org): arn:aws:acm:us-east-2:747101682576:certificate/be95f8b1-4162-4615-9ed5-29ed897b527b
- Database
- Admin commands
- Login to a server instance:
aws ssm start-session --target i-0a750b39c8e88d101 --region us-east-2
Infrastructure Overview
R2D2’s HTTP API is hosted on AWS utilizing the following components. For more info see this Feb 7, 2025 slide deck.
- Application Load Balancer - API Server routing, load balancing, and SSL termination.
- EC2 Compute Hosts - R2D2 API server execution environment.
- AWS Systems Manager - shell access and administration.
- AWS Relational Database Service - R2D2 database.
Common Tasks
Deploy Updated R2D2 API Server Binaries
WARNING: Do not manually take down or upgrade R2D2 servers without first draining the instance from our load balancer! Doing so can drop inbound or queued API calls.
The standard process for updating servers involves draining them from the load balancer one at a time, updating the service, and then re-enabling them. This approach ensures continuous network availability and prevents the loss of inbound or queued API calls. A Bash script automates this process while maintaining service stability. However, updates should be performed during low-traffic periods, as serving capacity is temporarily reduced during the update.
1. Build and publish binaries with the prod tag as described later in this guide.
2. From a developer machine with admin AWS credentials (members of the JEDI-infra team) clone the r2d2 repository.
git clone https://github.com/JCSDA-internal/r2d2.git && cd r2d2
3. Run the following command and wait about 10 minutes for the full rollout to complete.
./server/scripts/update_prod_servers.sh --operation update
Note: If the rollout fails for any reason, refer to the update_prod_servers.sh script, which contains the manual update process and extensive debugging information. This script serves as the central resource for detailed rollout procedures and documentation.
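The drain, update, and re-enable cycle described above can be sketched as follows. This is a minimal illustration, not the contents of update_prod_servers.sh, which remains the authoritative procedure. The target group ARN is a hypothetical placeholder, and the helper names (run, update_one, rolling_update) are invented for this example. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of a drain -> update -> re-register loop for one target group.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
REGION="us-east-2"

# Hypothetical placeholder: substitute the real target group ARN.
TARGET_GROUP_ARN="${TARGET_GROUP_ARN:-arn:aws:elasticloadbalancing:us-east-2:ACCOUNT:targetgroup/EXAMPLE/ID}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Drain one instance, wait until the load balancer has finished
# deregistering it, then re-register it and wait until it is in service.
update_one() {
  local instance_id="$1"
  run aws elbv2 deregister-targets --region "$REGION" \
    --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=${instance_id}" || return 1
  run aws elbv2 wait target-deregistered --region "$REGION" \
    --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=${instance_id}" || return 1
  # Placeholder: the service update on the instance would happen here.
  echo "# (update r2d2.service on ${instance_id} here)"
  run aws elbv2 register-targets --region "$REGION" \
    --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=${instance_id}" || return 1
  run aws elbv2 wait target-in-service --region "$REGION" \
    --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=${instance_id}" || return 1
}

# Process instances strictly one at a time so the other replicas keep serving.
rolling_update() {
  local id
  for id in "$@"; do
    update_one "$id" || return 1
  done
}
```

Calling `rolling_update i-0a750b39c8e88d101` in dry-run mode prints the four load balancer commands for that instance in order. Stopping at the first failure keeps a broken update from being repeated on the remaining replicas.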
Deploy CloudFormation Updates
- Copy the updated prod.yaml config to our infrastructure-as-code bucket and get the versioned URL for the file as shown here.
aws s3 cp server/cfn/prod.yaml s3://jcsda-usaf-iac-artifacts/r2d2/prod.yaml
file_version=$(aws s3api list-object-versions \
--bucket jcsda-usaf-iac-artifacts \
--prefix "r2d2/prod.yaml" \
| jq -r '.Versions[] | select(.IsLatest==true) | .VersionId')
echo "https://jcsda-usaf-iac-artifacts.s3.us-east-2.amazonaws.com/r2d2/prod.yaml?versionId=${file_version}"
- Go to the CloudFormation stack "r2d2-api-prod" in the AWS console (or use quick-ref link above) and click "Update".
- In the update menu, select "Replace existing template" and enter the URL generated in the first step. Change any parameters and review deploy options.
- Review the change log and ensure that EC2 instances and RDS instances are not replaced.
- Monitor the CloudFormation rollout in console and ensure it executes to completion.
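The review step above (confirming that EC2 and RDS instances are not replaced) can also be done from the command line if the update is staged as a change set. The sketch below is one possible approach, not part of the documented console workflow. The function names are invented for this example, and the change set name passed to review_change_set is whatever name was chosen when creating it. Only resources marked Replacement == True are flagged; entries marked Conditional still need a manual look.

```shell
#!/usr/bin/env bash
# Reads "LogicalId<TAB>ResourceType" rows on stdin and reports any EC2 or RDS
# instance that would be replaced. Returns 1 if any are found, 0 otherwise.
flag_replacements() {
  local logical_id resource_type found=0
  local tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r logical_id resource_type || [ -n "$logical_id" ]; do
    case "$resource_type" in
      AWS::EC2::Instance|AWS::RDS::DBInstance)
        echo "WOULD REPLACE: ${logical_id} (${resource_type})"
        found=1
        ;;
    esac
  done
  return "$found"
}

# Lists resources with Replacement == True in an existing change set and
# checks them. Not called here: it needs AWS credentials and a change set.
review_change_set() {
  local change_set_name="$1"
  aws cloudformation describe-change-set \
    --region us-east-2 \
    --stack-name r2d2-api-prod \
    --change-set-name "$change_set_name" \
    --query "Changes[?ResourceChange.Replacement=='True'].[ResourceChange.LogicalResourceId, ResourceChange.ResourceType]" \
    --output text | flag_replacements
}
```

A non-zero exit status from `review_change_set` means the change set should not be executed until the replacement has been understood.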
Viewing Service and Error Logs
Logs for all API server replicas can be found under the "r2d2.*" log groups in AWS CloudWatch. The CloudWatch log exports run on a half-hour schedule, so expect a delay between an event and being able to view its logs in CloudWatch.
- Direct link to CloudWatch log groups.
- Filtered query in LogsExplorer.
Each server replica writes logs to /var/log/gunicorn/ and those can be viewed in realtime if you have a shell into the replica.
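As an alternative to the console links, the same log groups can be read with the AWS CLI. The sketch below assumes AWS CLI v2 (which provides `aws logs tail`); the helper names are invented for this example, and the log group name passed to recent_logs must be one of the real "r2d2.*" groups. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
REGION="us-east-2"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List log groups whose names start with "r2d2".
list_r2d2_log_groups() {
  run aws logs describe-log-groups --region "$REGION" \
    --log-group-name-prefix "r2d2" \
    --query "logGroups[].logGroupName" --output text
}

# Show recent events from one log group; the window defaults to one hour.
recent_logs() {
  local group="$1" since="${2:-1h}"
  run aws logs tail "$group" --region "$REGION" --since "$since"
}
```

Because of the export delay noted above, a window shorter than about 30 minutes may return nothing even when the service is logging normally.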
Build and publish API server binaries
1) Build the docker image
git clone -b feature/restapi https://github.com/jcsda-internal/r2d2.git
cd r2d2
docker build -f server/docker/Dockerfile.app \
--platform=linux/amd64 \
-t 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod .
2) Get local credentials for our ECR repository.
aws ecr get-login-password --region us-east-2 \
| docker login --username AWS \
--password-stdin 747101682576.dkr.ecr.us-east-2.amazonaws.com
3) Push the docker image
docker push 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod
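The three steps above repeat the same registry host and image name, so they can be collected into one helper. This is a minimal sketch under the assumption that it is run from the root of an r2d2 checkout; the function names are invented for this example, while the account ID, region, repository, and tag are the ones used in the steps above.

```shell
#!/usr/bin/env bash
ACCOUNT_ID="747101682576"
REGION="us-east-2"
REPO="r2d2-server"

# ECR registry host for this account and region.
registry_host() {
  echo "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
}

# Full image reference for a tag (defaults to "prod").
image_uri() {
  echo "$(registry_host)/${REPO}:${1:-prod}"
}

# Build, log in to ECR, and push; stops at the first failing step.
# Not called here: it needs docker, AWS credentials, and an r2d2 checkout.
build_and_publish() {
  local tag="${1:-prod}" uri
  uri="$(image_uri "$tag")"
  docker build -f server/docker/Dockerfile.app --platform=linux/amd64 -t "$uri" . &&
    aws ecr get-login-password --region "$REGION" |
      docker login --username AWS --password-stdin "$(registry_host)" &&
    docker push "$uri"
}
```

Deriving the image reference in one place avoids a mismatch between the tag that was built and the tag that is pushed.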
Update a Running API Server Instance
This is the process needed to push updated code to an API Server instance. Note that you can build the binary on the instance, but if you do, you should also push it to ECR to keep a persistent copy of our server code.
# If logged into the server, log out now and log back in to reset the shell environment
# From your developer machine log into the server and switch to the root user.
aws ssm start-session --target i-0a750b39c8e88d101 --region us-east-2
sudo su -
# Get local credentials for our ECR repository.
aws ecr get-login-password --region us-east-2 \
| docker login --username AWS \
--password-stdin 747101682576.dkr.ecr.us-east-2.amazonaws.com
# Pull the latest Docker container
docker pull 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod
# Stop the running service.
systemctl stop r2d2.service
# If necessary, edit service definition file: /etc/systemd/system/r2d2.service
# Do this only if docker arguments or environment variables have changed.
# If updating this file you will need to run `systemctl daemon-reload` to
# load in updates to the service.
# Restart the systemd service.
systemctl start r2d2.service
# Check the status of the service
docker ps
# Sample output from the production server
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# cd8745ac7561 747101682576.dkr.ecr.us-east-2.amazonaws.com/r2d2-server:prod "run_r2d2_app --port…" 47 seconds ago Up 46 seconds 0.0.0.0:80->80/tcp, :::80->80/tcp r2d2-api-service
# Check the output logs
docker logs -f r2d2-api-service
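The final status check above can be scripted so that a failed restart is noticed without reading the `docker ps` table by eye. This is a minimal sketch; the function names are invented for this example, and it assumes the container is named r2d2-api-service as in the sample output.

```shell
#!/usr/bin/env bash
# Succeeds when a `docker ps` status string reports a running container,
# for example "Up 46 seconds"; fails for empty or non-"Up" statuses.
is_up() {
  case "$1" in
    Up*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Looks up the status of the API container and reports it.
# Not called here: it requires docker on the server.
check_service() {
  local status
  status="$(docker ps --filter "name=r2d2-api-service" --format '{{.Status}}')"
  if is_up "$status"; then
    echo "r2d2-api-service: ${status}"
  else
    echo "r2d2-api-service is not running" >&2
    return 1
  fi
}
```

An empty status (no matching container listed) is treated the same as a stopped one, which is the case to catch after `systemctl start r2d2.service`.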
Accessing the RDS SQL database
The RDS database cannot be accessed from the public internet and must be accessed from one of our dev servers using an SSM shell login.
1) Log into the dev server (see quick reference)
2) Fetch the server password from the parameters in our CloudFormation stack (see link in quick reference)
3) From the dev server execute this command, using the password retrieved above:
mysql -h r2d2-api-prod-rdsinstance-eja4eofrohvy.cg24vilqoa8w.us-east-2.rds.amazonaws.com -u admin -p
Load "dev" server data onto r2d2 database
This should not be done! This procedure was used to test the R2D2 database prior to release and is documented here for historical context and in case some of these steps have useful analogs for other work.
For the localhost server container, can we pre-load r2d2-data and the r2d2 mysql user so that end users just download the container and execute it on port 8080? Basically execute the setup database script from r2d2/scripts.
Update crontabs on all platforms (experiment scrubber)
- Platforms include derecho, discover, hercules, orion, s4
- Update venv used by crontabs to use r2d2-client and not r2d2
Create username and api keys for all users
- We will have to massage the user database table since some humans have more than one username
- We need an api key generator and a way to send these api keys to the user
Go to dev server dump dev database into IAC bucket
- Access astromech-nonprod
$ aws ssm start-session --target i-0e97feb060f886e43 --region us-east-2
$ sudo su ubuntu && cd ~
- Use password "local-server-root-password" (only works on dev server)
- DATE=$(date "+%Y%m%dT%H%M%SZ")
- mysqldump -h 127.0.0.1 -u root -p --databases r2d2 > r2d2-dev-backup-${DATE}.sql
- aws s3 cp r2d2-dev-backup-20250205T171114Z.sql s3://jcsda-usaf-iac-artifacts/r2d2/
Log into prod server and load database with dev data
- Login and load data
- ssm login command (see quick ref)
- sudo su ubuntu && cd ~
- aws s3 cp s3://jcsda-usaf-iac-artifacts/r2d2/r2d2-dev-backup-20250205T171114Z.sql ./r2d2-dev-backup.sql
- Get password cloud formation console parameters
- Log into server
- mysql -h r2d2-api-prod-rdsinstance-eja4eofrohvy.cg24vilqoa8w.us-east-2.rds.amazonaws.com -u admin -p
- Run these sql commands
CREATE DATABASE r2d2;
USE r2d2;
source /home/ubuntu/r2d2-dev-backup.sql