Commit 0cef7bd6 authored by Klaus Zimmermann

Improve slurm integration (closes #236)

parent 5aa5301e
@@ -152,7 +152,7 @@ SCHEDULERS = OrderedDict([
 def setup_scheduler(args):
-    scheduler_spec = args.dask_scheduler.split(':')
+    scheduler_spec = args.dask_scheduler.split('@')
     scheduler_name = scheduler_spec[0]
     scheduler_kwargs = {k: v for k, v in (e.split('=')
                                           for e in scheduler_spec[1:])}
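The hunk above switches the separator between the scheduler name and its keyword arguments from `:` to `@`. A minimal sketch of the resulting parsing follows. It assumes a standalone helper (`parse_scheduler_spec` is a hypothetical name, not part of climix) and a made-up scheduler file path. Unlike the committed code, it splits each `key=value` entry only once, so that values containing `=` would survive; that is an illustrative choice, not the commit's behaviour.

```python
def parse_scheduler_spec(spec):
    """Split 'name@key=value@key=value' into a name and a kwargs dict."""
    parts = spec.split('@')
    name = parts[0]
    # Split each 'key=value' entry once, so values may themselves contain '='.
    kwargs = dict(entry.split('=', 1) for entry in parts[1:])
    return name, kwargs


# Hypothetical spec string in the style of the `-d` option used in the batch script.
name, kwargs = parse_scheduler_spec('external@scheduler_file=/tmp/scheduler.json')
print(name, kwargs)
```

A spec without any `@` yields the bare scheduler name and an empty dictionary.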
#SBATCH -J climix-test
#SBATCH -t 10:00:00
#SBATCH -N 1 --exclusive
#SBATCH hetjob
#SBATCH -N 16 --exclusive --cpus-per-task=4
# General approach
# ----------------
# We use slurm's heterogeneous job support for midas, using two components. The
# first component contains dask scheduler and client, the second component runs
# the workers. Since neither scheduler nor client is naturally parallel, we
# run both together on a single node. The workers, however, bring the scaling
# parallelism and can be run on an arbitrary number of nodes, depending on the
# size of the data and time and memory constraints. Note that we often want to
# use several nodes purely to gain access to sufficient memory.
# Bi specific notes
# -----------------
# Cores
# ~~~~~
# As of this writing, bi nodes are set up with hyperthreading active, with every
# node having 32 virtual cores provided by 16 physical cores. We want the
# workers to use 16 threads total per node, thus using all physical cores but
# avoiding conflicts due to hyperthreading. We achieve this by running 8 worker
# processes with 2 threads each on every node. This is implemented via slurm's
# `--cpus-per-task=4` option, which instructs slurm to start one task for every
# 4 (virtual) cpus. That means that the number of nodes can be freely chosen
# using the `-N` option in the header at the top of this file.
# Memory
# ~~~~~~
# Every normal (`thin`) node has 64GB of memory and there is a small number of
# `fat` nodes with 256GB of memory.
# We use a single fat node for the first component of the heterogeneous job,
# giving scheduler and client a bit of headroom for transfer and handling of
# larger chunks of memory.
# The workers are run on normal nodes. To allow for a little bit of breathing
# room for the system and other programs, we use 90% of the available memory,
# equally distributed among the worker processes (or equivalently slurm tasks)
# on each node for a total of 7.2GB per worker.
# MEM_PER_WORKER=$(echo "2 * $SLURM_CPUS_PER_TASK * $SLURM_MEM_PER_CPU * .9" |bc -l)
echo "Number of workers: $NO_WORKERS, memory: $MEM_PER_WORKER"
# >>> conda initialize >>>
__conda_setup="$('/nobackup/rossby20/rossby/software/conda/bi/miniconda3-20201119/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/nobackup/rossby20/rossby/software/conda/bi/miniconda3-20201119/etc/profile.d/" ]; then
        . "/nobackup/rossby20/rossby/software/conda/bi/miniconda3-20201119/etc/profile.d/"
    else
        export PATH="/nobackup/rossby20/rossby/software/conda/bi/miniconda3-20201119/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
conda activate climix-devel-2
# Start scheduler
srun --het-group=0 --ntasks 1 \
     dask-scheduler \
     --interface ib0 \
     --scheduler-file $SCHEDULER_FILE &
srun --het-group=1 \
     dask-worker \
     --interface ib0 \
     --scheduler-file $SCHEDULER_FILE \
     --memory-limit "${MEM_PER_WORKER}MB" \
     --nthreads 2 &
srun --het-group=0 --ntasks 1 \
     climix -e -s -d external@scheduler_file=$SCHEDULER_FILE -x tn90p -l debug /home/rossby/imports/cordex/EUR-11/CLMcom-CCLM4-8-17/v1/ICHEC-EC-EARTH/r12i1p1/rcp85/bc/links-hist-scn/day/tasmin_EUR-11_ICHEC-EC-EARTH_rcp85_r12i1p1_CLMcom-CCLM4-8-17_v1_day_*.nc
# wait
# Script ends here
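The core and memory notes in the script comments boil down to a small calculation. The sketch below reproduces it for checking, assuming the figures stated there (32 virtual cores and 64GB per thin node, `--cpus-per-task=4`, 2 threads per worker, 90% of memory usable). The function and constant names are hypothetical and do not appear in the script.

```python
VIRTUAL_CORES_PER_NODE = 32   # 16 physical cores with hyperthreading
CPUS_PER_TASK = 4             # value of --cpus-per-task in the header
THREADS_PER_WORKER = 2        # value of --nthreads passed to dask-worker
NODE_MEMORY_GB = 64           # memory of a thin node
USABLE_FRACTION = 0.9         # share of node memory handed to the workers


def worker_layout(nodes):
    """Return worker counts and per-worker memory for a given number of nodes."""
    # slurm starts one task (one dask worker) per CPUS_PER_TASK virtual cpus.
    workers_per_node = VIRTUAL_CORES_PER_NODE // CPUS_PER_TASK
    threads_per_node = workers_per_node * THREADS_PER_WORKER
    memory_per_worker_gb = NODE_MEMORY_GB * USABLE_FRACTION / workers_per_node
    return {
        'workers_per_node': workers_per_node,
        'threads_per_node': threads_per_node,
        'total_workers': nodes * workers_per_node,
        'memory_per_worker_gb': memory_per_worker_gb,
    }


# 16 worker nodes, as requested by `-N 16` in the second job component.
layout = worker_layout(nodes=16)
print(layout)
```

With these inputs, each node runs 8 workers using 16 threads in total, matching the number of physical cores, and each worker gets about 7.2GB.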