This page is preliminary. Please edit it as needed as we gain experience with the tool.


This is a modified version of David Hahn's original instructions for using his awscluster tool.

1. Install the latest ‘awscluster’ Python utility, which is used to start up a cluster:

git clone https://github.com/JCSDA/jedi-tools.git
cd jedi-tools/AWS/python
git checkout feature/aws-jedi
make install


If this gives you errors, try changing the install target in the Makefile as follows (uncomment whichever line works):

install: egg
#easy_install-${PYTHON_VERSION} --user dist/awscluster-0.0-py${PYTHON_VERSION}.egg
easy_install --user dist/awscluster-0.0-py${PYTHON_VERSION}.egg


This will create a Python script called awscluster. By default it should be located in ~/.local/bin, under your home directory. You might want to add this directory to your PATH, or replace all instances of awscluster in Step 2 with ~/.local/bin/awscluster.
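
As a minimal sketch of the PATH option, assuming a Bourne-style shell such as bash and the default ~/.local/bin install location mentioned above, something like the following could be run in the current session (or added to ~/.bashrc to make it permanent):

```shell
# Prepend ~/.local/bin to PATH for this shell, unless it is already there.
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                    # already on PATH, nothing to do
    *) export PATH="$HOME/.local/bin:$PATH" ;;
esac

# Check whether the shell can now find the script.
# Prints the script's location if it is installed, otherwise a short message.
command -v awscluster || echo "awscluster not found on PATH"
```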

2. Start a cluster and login

awscluster start -stack_name [your_unique_name_alphanumeric_ONLY] -key [your_ssh_key_name_on_aws] -nodes 6 -ec2type c5.18xlarge
ssh -A -i [your_private_ssh_key] ubuntu@[ip_address_assigned_in_previous_step]


NOTE:
Our account is limited to 50 c5.18xlarge instances. You can check the CloudFormation console for other stacks that are already running. The IOSwebapp stack can be ignored. If c5.18xlarge nodes are unavailable, you can pass a different instance type to -ec2type.
A 48-node c5.18xlarge cluster costs about $109/hr to run, so please be sure to shut it down when you are finished, and double-check on the CloudFormation service in the AWS console that it has been deleted.
Starting the cluster (the first command in Step 2) should take 2-3 minutes at most. If it has not completed by then, check the AWS CloudFormation console to see if there was an error.
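
The double-check that a stack has been deleted can also be done from the command line. The sketch below is only a suggestion: it assumes the AWS CLI is installed and configured for our account and region, and that the CloudFormation stack has the same name you passed to -stack_name. Neither is confirmed on this page, and stack_status is a hypothetical helper name. The AWS console remains the authoritative check.

```shell
# Hypothetical helper: print the status of a CloudFormation stack.
# Assumes the AWS CLI is installed and configured, and that the stack name
# equals the -stack_name given to awscluster (an assumption, not confirmed).
# Any failure (stack gone, bad credentials, wrong region) is reported the
# same way, so treat NOT_FOUND_OR_ERROR as a hint and confirm in the console.
stack_status() {
    aws cloudformation describe-stacks \
        --stack-name "$1" \
        --query 'Stacks[0].StackStatus' \
        --output text 2>/dev/null || echo "NOT_FOUND_OR_ERROR"
}

# Example use (replace the bracketed placeholder with your own stack name):
#   stack_status [your_unique_name_alphanumeric_ONLY]
```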

3. Compile and run JEDI

You should be able to build JEDI as usual, but in order to run across nodes you'll have to pass mpirun the hostfile located in the home directory (/home/ubuntu/hostfile). You might also want to tag the output to confirm that it's running across nodes. For Open MPI, this would look something like this:

module load jedi-gnu
mpirun -hostfile /home/ubuntu/hostfile -np 6 -tag-output <executable>
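
Before launching a real executable, it can help to check how many hosts the hostfile lists. The sketch below assumes the hostfile uses the usual Open MPI layout of one host per line (optionally followed by slots=N); the format written by awscluster has not been verified here. count_hosts is a hypothetical helper, and using one process per host simply mirrors the 6-node, -np 6 example above. The script only prints the resulting command rather than running it.

```shell
# Hypothetical helper: count the hosts in an Open MPI style hostfile,
# skipping blank lines and lines that start with '#'.
count_hosts() {
    awk '!/^[[:space:]]*(#|$)/ { n++ } END { print n + 0 }' "$1"
}

HOSTFILE=/home/ubuntu/hostfile
if [ -r "$HOSTFILE" ]; then
    NP=$(count_hosts "$HOSTFILE")
    # Print the command for inspection; one process per host is an assumption.
    echo "mpirun -hostfile $HOSTFILE -np $NP -tag-output <executable>"
fi
```

Substituting the standard hostname command for the executable is a quick way to see which nodes the processes actually land on.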


I'm now trying to figure out how to tell ecbuild to use these options for mpirun when executing tests.

