Commit

Deployed bf1ab49 with MkDocs version: 1.6.1
Unknown committed Oct 28, 2024
0 parents commit caf9b61
Showing 547 changed files with 1,637,330 additions and 0 deletions.
Empty file added .nojekyll
Empty file.
6,687 changes: 6,687 additions & 0 deletions 404.html

Large diffs are not rendered by default.

7,144 changes: 7,144 additions & 0 deletions account-project-management/accounts-and-access/alcf-passcode-tokens/index.html

Large diffs are not rendered by default.

6,863 changes: 6,863 additions & 0 deletions account-project-management/accounts-and-access/user-account-overview/index.html

Large diffs are not rendered by default.

6,933 changes: 6,933 additions & 0 deletions account-project-management/allocation-management/overview/index.html

Large diffs are not rendered by default.

6,711 changes: 6,711 additions & 0 deletions account-project-management/index.html

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file not shown.
6,993 changes: 6,993 additions & 0 deletions account-project-management/project-management/project-reports/index.html

Large diffs are not rendered by default.

7,293 changes: 7,293 additions & 0 deletions account-project-management/project-management/starting-alcf-award/index.html

Large diffs are not rendered by default.

6,852 changes: 6,852 additions & 0 deletions account-project-management/project-management/team-management/index.html

Large diffs are not rendered by default.

6,861 changes: 6,861 additions & 0 deletions ai-testbed/cerebras/customizing-environment/index.html

Large diffs are not rendered by default.

7,171 changes: 7,171 additions & 0 deletions ai-testbed/cerebras/example-programs/index.html

Large diffs are not rendered by default.

Binary file added ai-testbed/cerebras/files/Trust_ctl.png
Binary file added ai-testbed/cerebras/files/compile-vs-run.png
Binary file added ai-testbed/cerebras/files/cs-getting-started.png
Binary file added ai-testbed/cerebras/files/grafana_ctl.png
6,813 changes: 6,813 additions & 0 deletions ai-testbed/cerebras/getting-started/index.html

Large diffs are not rendered by default.

6,787 changes: 6,787 additions & 0 deletions ai-testbed/cerebras/job-queuing-and-submission/index.html

Large diffs are not rendered by default.

6,906 changes: 6,906 additions & 0 deletions ai-testbed/cerebras/miscellaneous/index.html

Large diffs are not rendered by default.

7,029 changes: 7,029 additions & 0 deletions ai-testbed/cerebras/running-a-model-or-program/index.html

Large diffs are not rendered by default.

6,757 changes: 6,757 additions & 0 deletions ai-testbed/cerebras/system-overview/index.html

Large diffs are not rendered by default.

6,737 changes: 6,737 additions & 0 deletions ai-testbed/cerebras/tunneling-and-forwarding-ports/index.html

Large diffs are not rendered by default.

6,876 changes: 6,876 additions & 0 deletions ai-testbed/data-management/data-management-overview/index.html

Large diffs are not rendered by default.

84 changes: 84 additions & 0 deletions ai-testbed/files/dictionary.txt
@@ -0,0 +1,84 @@
aitestbed
ALCFUserID
analyser
ANL
arnoldw
AUTOTUNE
Cerebras
conv
cosmictagger
cpus
cuda
cudart
DATADIR
dlerror
dlopen
elif
finetune
flos
gbps
GEMM
Graphcore
graphcore_login
gres
inet
inplace
jsons
kaggle
keras
keygen
keyscan
layernorm
lenet
libcudart
libnvinfer
LOGDIR
logreg
mgmt
mnist
MNIST
modelzoo
nodelist
ntasks
OUTDIR
passcode
petaFLOPS
POPART
POPLIBS
poptorch
popvision
pretrain
pretraining
PYTHONPATH
relu
RELU
resnet
run_unet_256_256_single_4
SambaFlow
sambanova
sbatch
scancel
Slurm
snconfig
snpath
snthreads
sntilestat
snvenv
softmax
squeue
SRAM
srun
tensorrt
tf2tensorrt
TFLOPs
unet
UNet
unet
unet_compile_run_all
Venkat
venv
venvs
vipu
virtualenv
wilsonb
XRDU
64 changes: 64 additions & 0 deletions ai-testbed/files/example-multi-node-programs.sh
@@ -0,0 +1,64 @@
#! /bin/bash -x
set -e
#
# Usage: ./unet_all.sh 256 256
#
SECONDS=0

# Image size (first argument).
IM=${1}
# Batch size (second argument).
BS=${2}
NUM_WORKERS=1
export OMP_NUM_THREADS=16

source /opt/sambaflow/venv/bin/activate
UNET=$(pwd)/unet

echo "Model: UNET"
echo "Date: " $(date +%m/%d/%y)
echo "Time: " $(date +%H:%M)

echo "COMPILE"

# Compile for parallel RDUs
if [ ! -e out/unet_train_${BS}_${IM}_NN/unet_train_${BS}_${IM}_NN.pef ] ; then
python ${UNET}/unet.py compile -b ${BS} --in-channels=3 --in-width=${IM} --in-height=${IM} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name="unet_train_${BS}_${IM}_NN" --data-parallel -ws 2 > compile_${BS}_${IM}_NN.log 2>&1
fi

# Run Multi-Node, Data Parallel
NN=2
echo "RUN"
echo "NN=${NN}"
sbatch --gres=rdu:1 --tasks-per-node 8 --nodes 2 --nodelist sm-02,sm-01 --cpus-per-task=16 ./unet_batch.sh ${NN} ${NUM_WORKERS}
echo "Duration: " $SECONDS

#! /bin/bash -x
set -e
#
# Usage: ./unet_batch.sh 2 1
#
SECONDS=0

# Batch Size
BS=256

# Image size
IM=256
NN=${1}
NUM_WORKERS=${2}
export OMP_NUM_THREADS=16
DATADIR=/software/sambanova/dataset/kaggle_3m
UNET=$(pwd)/unet
export SAMBA_CCL_USE_PCIE_TRANSPORT=0

# TODO: Update this.
source /opt/sambaflow/venv/bin/activate

echo "Model: UNET_TRAIN"
echo "Date: " $(date +%m/%d/%y)
echo "Time: " $(date +%H:%M)

srun --mpi=pmi2 python ${UNET}/unet_hook.py run --do-train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 2 --data-dir ${DATADIR} --log-dir log_dir_unet_${NN}_train_kaggle --pef=$(pwd)/out/unet_train_${BS}_${IM}_NN/unet_train_${BS}_${IM}_NN.pef --data-parallel --reduce-on-rdu --num-workers=${NUM_WORKERS}

echo "Duration: " $SECONDS
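
Both scripts above spell out the same PEF path by hand, once in the compile step and once in the `srun` line, so the two must stay in sync. A minimal sketch of one way to centralize that is shown here. The function names `pef_path` and `check_args` are hypothetical and are not part of the committed file; only the `out/unet_train_${BS}_${IM}_NN/...pef` naming pattern is taken from the scripts.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions, not part of the committed scripts).

# Build the PEF path from batch size and image size, following the
# naming pattern used by the compile and run steps above.
pef_path() {
    local bs="$1" im="$2"
    local name="unet_train_${bs}_${im}_NN"
    printf 'out/%s/%s.pef\n' "${name}" "${name}"
}

# Succeed only when exactly two arguments are given and each one
# consists solely of digits.
check_args() {
    local arg
    [ "$#" -eq 2 ] || return 1
    for arg in "$@"; do
        case "${arg}" in
            ''|*[!0-9]*) return 1 ;;
        esac
    done
}
```

A caller could run `check_args "${BS}" "${IM}" && PEF=$(pef_path "${BS}" "${IM}")` before compiling and then pass `--pef="$(pwd)/${PEF}"` to the run step, so a change to the naming scheme is made in one place.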
Binary file added ai-testbed/files/home-cerebras-sambanova.png