Slurm
=====

.. tip:: **Got questions?** Reach out to us on `Slack `_.

This tutorial shows you how to export a Ploomber pipeline to `SLURM `_.

If you encounter any issues with this tutorial, `let us know `_.

Pre-requisites
--------------

.. important::

    This integration requires ploomber 0.13.7 or higher and soopervisor 0.6
    or higher (to upgrade: ``pip install ploomber soopervisor --upgrade``).

* `docker and docker-compose `_

Setting up the project
----------------------

.. note:: These instructions are based on `this article `_.

First, let's create a SLURM cluster for testing. Create the following
``docker-compose.yml`` file:

.. code-block:: yaml

    services:
      slurmjupyter:
        image: rancavil/slurm-jupyter:19.05.5-1
        hostname: slurmjupyter
        user: admin
        volumes:
          - shared-vol:/home/admin
        ports:
          - 8888:8888
      slurmmaster:
        image: rancavil/slurm-master:19.05.5-1
        hostname: slurmmaster
        user: admin
        volumes:
          - shared-vol:/home/admin
        ports:
          - 6817:6817
          - 6818:6818
          - 6819:6819
      slurmnode1:
        image: rancavil/slurm-node:19.05.5-1
        hostname: slurmnode1
        user: admin
        volumes:
          - shared-vol:/home/admin
        environment:
          - SLURM_NODENAME=slurmnode1
        links:
          - slurmmaster
      slurmnode2:
        image: rancavil/slurm-node:19.05.5-1
        hostname: slurmnode2
        user: admin
        volumes:
          - shared-vol:/home/admin
        environment:
          - SLURM_NODENAME=slurmnode2
        links:
          - slurmmaster
      slurmnode3:
        image: rancavil/slurm-node:19.05.5-1
        hostname: slurmnode3
        user: admin
        volumes:
          - shared-vol:/home/admin
        environment:
          - SLURM_NODENAME=slurmnode3
        links:
          - slurmmaster
    volumes:
      shared-vol:

Now, start the cluster:

.. code-block:: sh

    docker-compose up -d

.. important::

    Ensure you're running a recent version of ``docker-compose``; older
    versions may throw an error like this:

    .. code-block:: console

        Unsupported config option for volumes: 'shared-vol'
        Unsupported config option for services: 'slurmmaster'

.. tip::

    Once the cluster is up, go to http://localhost:8888 to open JupyterLab,
    where you can edit files, open terminals, and monitor Slurm jobs from
    your browser (click on Slurm Queue under HPC Tools in the Launcher menu).

Let's connect to the cluster to submit the jobs:

.. code-block:: sh

    docker-compose exec slurmjupyter /bin/bash

Configure the environment:

.. code-block:: sh

    # Install miniconda (to get a Python environment ready; not needed if
    # there's already a Python environment up and running)
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda

    # Init conda
    eval "$($HOME/miniconda/bin/conda shell.bash hook)"

    # Create and activate env ("conda env create" would require an
    # environment.yml file, so we use "conda create" instead)
    conda create --name myenv python --yes
    conda activate myenv

    # Install ploomber and soopervisor in the active environment (myenv)
    pip install ploomber soopervisor

    # Download sample pipeline to example/
    ploomber examples -n templates/ml-basic -o example
    cd example

    # Install project dependencies
    pip install -r requirements.txt

    # Register a soopervisor environment with the SLURM backend
    soopervisor add cluster --backend slurm

The ``soopervisor add`` command creates a ``cluster/`` directory with a
``template.sh`` file. This is the template that Soopervisor uses to submit
the tasks in your pipeline. It should contain the placeholders ``{{name}}``
and ``{{command}}``, which Soopervisor replaces with the task name and the
command that executes that task, respectively. You can customize the template
to suit your needs. For example, since we want the tasks to run in the
``conda`` environment we created, edit ``template.sh`` so it looks like this:

.. code-block:: sh

    #!/bin/bash
    #SBATCH --job-name={{name}}
    #SBATCH --output=result.out
    #

    # Activate myenv
    conda activate myenv
    srun {{command}}

We can now submit the tasks:

.. code-block:: sh

    soopervisor export cluster

Once the jobs finish execution, you'll see the outputs in the ``output``
directory.

.. tip::

    If you execute ``soopervisor export cluster`` again, only tasks whose
    source code has changed will be executed. To force the execution of all
    tasks, run ``soopervisor export cluster --mode force``.

.. note::

    When scheduling jobs, ``soopervisor`` calls the ``sbatch`` command and
    passes the ``--kill-on-invalid-dep=yes`` option, which causes a task to
    abort if any of its dependencies fail. For example, if you have a
    ``load -> clean`` pipeline and ``load`` fails, ``clean`` is aborted.

.. important::

    For Ploomber to determine which tasks to schedule, it needs to parse your
    pipeline and check each task's status. **If your pipeline has functions
    as tasks**, the Python environment where you execute
    ``soopervisor export`` must have all the dependencies required to import
    those functions. For example, if a function ``train_model`` uses
    ``sklearn``, then ``sklearn`` must be installed. If your pipeline only
    contains scripts/notebooks, this is not required.

Stop the cluster:

.. code-block:: sh

    docker-compose stop
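
To make the role of the ``{{name}}`` and ``{{command}}`` placeholders in ``template.sh`` more concrete, here is a minimal, self-contained sketch that mimics the substitution with ``sed``. It is only an illustration, not how Soopervisor renders the template internally. The ``render`` helper is hypothetical, and ``ploomber task load`` is an assumed, simplified stand-in for the command Soopervisor would generate for a task named ``load``.

```shell
#!/bin/bash
# Illustration only: fill in the template placeholders with sed.
# Soopervisor does the real rendering; this is not its implementation.

# Same template as the one shown in this tutorial
template='#!/bin/bash
#SBATCH --job-name={{name}}
#SBATCH --output=result.out
#

# Activate myenv
conda activate myenv
srun {{command}}'

# render NAME COMMAND (hypothetical helper): print the template with both
# placeholders replaced. Values must not contain "|", "&" or backslashes,
# since they are passed straight into the sed replacement text.
render() {
    printf '%s\n' "$template" \
        | sed -e "s|{{name}}|$1|g" -e "s|{{command}}|$2|g"
}

# Assumed, simplified command for a task named "load"
rendered=$(render load "ploomber task load")
printf '%s\n' "$rendered"
```

The printed script is the kind of file that would be handed to ``sbatch`` for one task: the job name carries the task name, and the ``srun`` line carries the command that runs it.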