I got this working today on AWS using No Tears Cluster and wanted to document it before I forget.
After creating the cluster (I chose us-east-2 since nobody uses it, so I can get spot instances all the time), I logged in, installed Miniconda using the IOOS Python instructions (skipping the creation of the IOOS environment), installed mamba into the base environment, and then ran:
```
mamba env create -f tensorflow-gpu.yml
```
using this slightly modified version of the environment:
The key was using only the defaults channel. If you include conda-forge in the channel list, it pulls in the latest cudatoolkit package, which doesn't work with the tensorflow-gpu build from defaults. (tensorflow-gpu is not currently available on conda-forge.)
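The environment file itself isn't reproduced in this post; a minimal sketch consistent with the description (defaults channel only, tensorflow-gpu, cudatoolkit capped below 11.1) might look something like this — the exact package list and pins here are my assumptions, not the author's actual file:

```yaml
# tensorflow-gpu.yml -- hypothetical sketch; the real file isn't shown in the post
name: tensorflow-gpu
channels:
  - defaults          # key point: no conda-forge, so cudatoolkit stays compatible
dependencies:
  - python=3.8        # assumed version
  - tensorflow-gpu    # pulls in a matching cudatoolkit from defaults
  - cudatoolkit<=11.0 # the driver supports nothing greater than 11 (see note below)
  - jupyterlab
```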
I activated the environment with `conda activate tensorflow-gpu` and then followed the instructions for setting up Pangeo on HPC.
I then started an interactive session on the gpu partition, which spins up an instance to support the request, and then starts a terminal there:
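The exact command isn't shown here; on a Slurm-based cluster like the one No Tears Cluster provisions, requesting an interactive GPU session would look something along these lines (the partition name and options are assumptions):

```shell
# Hypothetical example: request an interactive shell on the "gpu" partition.
# Slurm provisions an instance to satisfy the request and drops you into a
# terminal on that node.
srun --partition=gpu --pty /bin/bash
```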
Then I ran a `start_jupyter` script (`~/bin/start_jupyter`), which looks like this:
```bash
cd /shared
JPORT=$(shuf -i 8400-9400 -n 1)
echo ""
echo ""
echo "Step 1: Wait until this script says the Jupyter server"
echo "        has started. "
echo ""
echo "Step 2: Copy this ssh command into a terminal on your"
echo "        local computer:"
echo ""
echo "        ssh -N -i $HOME/.ssh/AWS-HPC-Ohio -L 8889:`hostname`:$JPORT $USER@ec2-3-129-67-246.us-east-2.compute.amazonaws.com"
echo ""
echo "Step 3: Browse to https://localhost:8889 on your local computer"
echo ""
echo ""
sleep 2
jupyter lab --no-browser --ip=`hostname` --port=$JPORT
```
Then, on my local Windows machine, I opened a Git Bash terminal and pasted in the ssh forwarding command echoed above.
Then I opened `localhost:8889` in my local browser and ran the notebook on the remote GPU instance!
Note: I discovered which version of cudatoolkit to specify by checking what the installed NVIDIA driver supports,
which told me I should not specify any cudatoolkit greater than 11.
Here's the proof that it worked: https://nbviewer.jupyter.org/gist/rsignell-usgs/1e1a7f3ae3483725dd8f78f4d02c023a
cc: @csherwood-usgs