Soopervisor

Soopervisor runs Ploomber pipelines for batch processing (large-scale training or batch serving) or online inference.

pip install soopervisor

Watch our presentation at EuroPython 2021: Develop and Deploy a Machine Learning Pipeline in 30 Minutes With Ploomber.

Supported platforms

Batch serving and large-scale training:
- Airflow
- Argo/Kubernetes
- AWS Batch
- Kubeflow
- SLURM

Online inference:
- AWS Lambda

From notebook to a production pipeline

We also have an example that shows how to use our ecosystem of tools to go from a monolithic notebook to a pipeline deployed in Kubernetes.

Standard layout

Soopervisor expects your Ploomber project to be in the standard project layout, which requires the following:

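Dependencies file

One of the following lock files, depending on whether you manage dependencies with pip or conda:

requirements.lock.txt: pip dependencies file with pinned versions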

Tip

You can generate it with pip freeze > requirements.lock.txt

environment.lock.yml: conda environment with pinned dependencies

Tip

You can generate it with conda env export --no-build --file environment.lock.yml

Pipeline declaration

A pipeline.yaml file in the current working directory (or in src/{package-name}/pipeline.yaml if your project is a Python package).

Note

If your project is a package (i.e., it has a src/ directory), a setup.py file is also required.
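The layout rules above can be checked with a few lines of Python before exporting. This is a minimal sketch, not part of Soopervisor or Ploomber: the function name missing_layout_items is hypothetical, and it assumes the path you pass is the project's root folder.

```python
from pathlib import Path

# Hypothetical helper, not part of Soopervisor or Ploomber: it only mirrors
# the layout rules described in this section.
LOCK_FILES = ("requirements.lock.txt", "environment.lock.yml")


def missing_layout_items(root="."):
    """Return a list of problems found; the list is empty if nothing is missing."""
    root = Path(root)
    problems = []

    # Dependencies file: a pip or a conda lock file
    if not any((root / name).is_file() for name in LOCK_FILES):
        problems.append("no lock file (" + " or ".join(LOCK_FILES) + ")")

    # Pipeline declaration: project root, or src/{package-name}/ for packages
    in_root = (root / "pipeline.yaml").is_file()
    in_package = any(root.glob("src/*/pipeline.yaml"))
    if not in_root and not in_package:
        problems.append("no pipeline.yaml in the root or in src/{package-name}/")

    # Packages (projects with a src/ directory) also need setup.py
    if (root / "src").is_dir() and not (root / "setup.py").is_file():
        problems.append("src/ directory found but setup.py is missing")

    return problems
```

Calling missing_layout_items() from the project's root folder returns an empty list when the lock file, pipeline.yaml, and (for packages) setup.py are all in place. It only checks that the files exist, not that their contents are valid.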

Scaffolding standard layout

The fastest way to get started is to scaffold a new project:

# install ploomber
pip install ploomber

# scaffold project
ploomber scaffold

# or to use conda (instead of pip)
ploomber scaffold --conda

# or to use the package structure
ploomber scaffold --package

# or to use conda and the package structure
ploomber scaffold --conda --package

Then, configure the development environment:

# move to your project's root folder
cd {project-name}

# configure dev environment
ploomber install

Note

ploomber install automatically generates the environment.lock.yml or requirements.lock.txt file. If you prefer, you can skip ploomber install and create the lock file yourself.

Usage

Say you want to train multiple models in a Kubernetes cluster. You can create a new target environment that executes your pipeline with Argo Workflows:

soopervisor add training --backend argo-workflows

After filling in some basic configuration settings, export the pipeline with:

soopervisor export training

Soopervisor takes care of packaging your code and submitting it for execution. With the Argo Workflows backend, it builds a Docker image, uploads it to the configured registry, generates an Argo YAML spec, and submits the workflow.

Depending on the selected backend (for example, Argo, Airflow, AWS Batch, or AWS Lambda), configuration details will change, but the commands remain the same: soopervisor add, then soopervisor export.
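If you script your deployments, the two-step pattern can be captured in a small helper. The sketch below only builds the two commands shown above and does not run them. The function name soopervisor_commands is hypothetical, and argo-workflows is the only backend value taken from this page.

```python
import shlex


# Hypothetical helper, not part of Soopervisor: it assembles the same two
# commands shown above for a given target environment and backend.
def soopervisor_commands(env_name, backend):
    """Build the `add` and `export` commands for one target environment."""
    add = ["soopervisor", "add", env_name, "--backend", backend]
    export = ["soopervisor", "export", env_name]
    return add, export


add_cmd, export_cmd = soopervisor_commands("training", "argo-workflows")
print(shlex.join(add_cmd))     # soopervisor add training --backend argo-workflows
print(shlex.join(export_cmd))  # soopervisor export training
```

Each list can be passed to subprocess.run(cmd, check=True). Since soopervisor add is followed by filling in configuration settings, run the export command only after that step is done.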