Slurm
Tip
Got questions? Reach out to us on Slack.
Have a cluster? Go straight to the guide on executing your pipelines.
This tutorial shows you how to export a Ploomber pipeline to SLURM.
If you encounter any issues with this tutorial, let us know.
Pre-requisites
Important
This integration requires ploomber 0.13.7 or higher and soopervisor 0.6 or higher. To upgrade: pip install ploomber soopervisor --upgrade
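To confirm that the installed versions meet these requirements, you can check with pip:

pip show ploomber soopervisor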
Setting up the project
Note
These instructions are based on this article.
First, let’s create a SLURM cluster for testing. Create the following docker-compose.yml file:
services:
  slurmjupyter:
    image: rancavil/slurm-jupyter:19.05.5-1
    hostname: slurmjupyter
    user: admin
    volumes:
      - shared-vol:/home/admin
    ports:
      - 8888:8888
  slurmmaster:
    image: rancavil/slurm-master:19.05.5-1
    hostname: slurmmaster
    user: admin
    volumes:
      - shared-vol:/home/admin
    ports:
      - 6817:6817
      - 6818:6818
      - 6819:6819
  slurmnode1:
    image: rancavil/slurm-node:19.05.5-1
    hostname: slurmnode1
    user: admin
    volumes:
      - shared-vol:/home/admin
    environment:
      - SLURM_NODENAME=slurmnode1
    links:
      - slurmmaster
  slurmnode2:
    image: rancavil/slurm-node:19.05.5-1
    hostname: slurmnode2
    user: admin
    volumes:
      - shared-vol:/home/admin
    environment:
      - SLURM_NODENAME=slurmnode2
    links:
      - slurmmaster
  slurmnode3:
    image: rancavil/slurm-node:19.05.5-1
    hostname: slurmnode3
    user: admin
    volumes:
      - shared-vol:/home/admin
    environment:
      - SLURM_NODENAME=slurmnode3
    links:
      - slurmmaster
volumes:
  shared-vol:
Now, start the cluster:
docker-compose up -d
Important
Ensure you’re running a recent version of docker-compose; older versions may throw an error like this:
Unsupported config option for volumes: 'shared-vol'
Unsupported config option for services: 'slurmmaster'
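To verify that the cluster came up correctly, list the containers and query the nodes registered with SLURM (sinfo is a standard SLURM command; this assumes the images put the SLURM binaries on the PATH, as these do):

# Check that all containers are running
docker-compose ps

# Ask the SLURM controller for the state of the three nodes
docker-compose exec slurmmaster sinfo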
Tip
Once the cluster is up, go to http://localhost:8888 to open JupyterLab, where you can edit files, open terminals, and monitor Slurm jobs from your browser (click Slurm Queue under HPC Tools in the Launcher menu).
Let’s connect to the cluster to submit the jobs:
docker-compose exec slurmjupyter /bin/bash
Configure the environment:
# Install miniconda (to get a Python environment ready; not needed if
# there's already a Python environment up and running)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
# Init conda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
# Create and activate env
conda create --name myenv python=3.9 -y
conda activate myenv
# Install ploomber and soopervisor in the myenv environment
pip install ploomber soopervisor
# Download sample pipeline to example/
ploomber examples -n templates/ml-basic -o example
cd example
# Install project dependencies
pip install -r requirements.txt
# Register a soopervisor environment with the SLURM backend
soopervisor add cluster --backend slurm
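soopervisor add also registers the new environment in a soopervisor.yaml file at the project root. You can inspect it to confirm the registration (the exact contents may vary; the minimal entry maps the environment name to its backend):

cat soopervisor.yaml

# Expected output (approximately):
#   cluster:
#     backend: slurm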
The soopervisor add command creates a cluster/ directory with a template.sh file; this is the template that Soopervisor uses to submit the tasks in your pipeline. It should contain the placeholders {{name}} and {{command}}, which Soopervisor replaces with the task name and the command to execute that task, respectively. You can customize the template to suit your needs.
For example, since we want the tasks to run in the conda environment we created, edit template.sh so it looks like this:
#!/bin/bash
#SBATCH --job-name={{name}}
#SBATCH --output=result.out
#
# Activate myenv
conda activate myenv
srun {{command}}
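For reference, when Soopervisor submits a task (say, one named fit), it fills in the placeholders and passes the rendered script to sbatch. The result would look roughly like this (the substituted command is illustrative, not the exact string Soopervisor generates):

#!/bin/bash
#SBATCH --job-name=fit
#SBATCH --output=result.out
#
# Activate myenv
conda activate myenv
srun ploomber task fit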
We can now submit the tasks:
soopervisor export cluster
Once the jobs finish execution, you’ll see the outputs in the output directory.
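You can also monitor progress from a terminal inside the cluster (these are standard SLURM commands; the artifact names depend on your pipeline):

# List jobs that are queued or running
squeue

# After the jobs complete, inspect the generated artifacts
ls output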
Tip
If you execute soopervisor export cluster again, only tasks whose source code has changed will be executed. To force the execution of all tasks, run soopervisor export cluster --mode force
Note
When scheduling jobs, soopervisor calls the sbatch command and passes the --kill-on-invalid-dep=yes flag; this causes a task to abort if any of its dependencies fail. For example, if you have a load -> clean pipeline and load fails, clean is aborted.
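In effect, for the load -> clean example, the submission is equivalent to something like the following (load.sh and clean.sh are hypothetical rendered scripts; Soopervisor captures the real job IDs at submission time):

# Submit the upstream task and capture its job ID
jid=$(sbatch --parsable load.sh)

# Submit the downstream task so it runs only if load succeeds,
# and is killed if that dependency can never be satisfied
sbatch --dependency=afterok:$jid --kill-on-invalid-dep=yes clean.sh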
Important
For Ploomber to determine which tasks to schedule, it needs to parse your pipeline and check each task’s status. If your pipeline has functions as tasks, the Python environment where you execute soopervisor export must have all the dependencies required to import those functions. For example, if a function train_model uses sklearn, then sklearn must be installed. If your pipeline only contains scripts/notebooks, this is not required.
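A quick way to check this before exporting is to load the pipeline with the Ploomber CLI, which imports any function tasks and fails with an ImportError if a dependency is missing:

# Loads the pipeline and prints each task's status
ploomber status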
Stop the cluster:
docker-compose stop