Full workflow

Important

This tutorial requires soopervisor 0.6.2 or higher, and soorgeon 0.0.10 or higher.

This tutorial shows how to go from a monolithic Jupyter notebook to a modular, production-ready pipeline deployed to Kubernetes with Argo Workflows by using the tools in our ecosystem:

  1. soorgeon (refactors the notebook into a modular pipeline)

  2. ploomber (develops and executes the pipeline)

  3. soopervisor (packages the pipeline and exports it to the target platform)

Pre-requisites

To follow this tutorial, you need docker and git installed locally.

Building Docker image

We provide a Docker image so you can quickly run this example:

# get repository
git clone https://github.com/ploomber/soopervisor
cd soopervisor/tutorials/workflow

# build image
docker build --tag ploomber-workflow .

# create a directory to store the pipeline output
export SHARED_DIR=$HOME/ploomber-workflow
rm -rf $SHARED_DIR
mkdir -p $SHARED_DIR

# start (takes ~1 minute to be ready)
docker run -i -t \
    --privileged=true -v /var/run/docker.sock:/var/run/docker.sock \
    --volume $SHARED_DIR:/mnt/project \
    --env SHARED_DIR \
    --env PLOOMBER_STATS_ENABLED=false \
    -p 2746:2746 \
    -p 8888:8888 \
    ploomber-workflow

Note

We need to run docker run in privileged mode and mount the Docker socket because we'll be running docker commands inside the container.

Upon initialization, JupyterLab will be running at http://127.0.0.1:8888. The remaining commands in this tutorial run inside the container; you can open a terminal from the JupyterLab launcher.

Refactor notebook

First, we use soorgeon to refactor the notebook:

soorgeon refactor nb.ipynb -p /mnt/project/output -d parquet
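
soorgeon writes a pipeline.yaml plus one task per notebook section. Here's a sketch of the layout to expect (the task names below are hypothetical; soorgeon derives them from the notebook's Markdown headings):

# list the generated files: pipeline.yaml plus a tasks/ directory
# with one script per section, e.g. load.py, clean.py, train.py
ls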

We can generate a plot to visualize the dependencies:

ploomber plot

If you open the generated pipeline.png, you’ll see that soorgeon inferred the dependencies among the sections in the notebook and built a Ploomber pipeline automatically!

Now you can iterate on this modular pipeline with Ploomber (see the example below); once you're done, let's move on to the next stage and deploy to Kubernetes.
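
For example, to execute the pipeline locally before deploying:

# run all tasks (incremental: subsequent builds skip tasks whose
# source code hasn't changed)
ploomber build

# summarize each task's status and products
ploomber status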

Configure target platform

Soopervisor allows you to configure the target platform using a soopervisor.yaml file. Let's add one and set the backend to argo-workflows:

# soopervisor add requires a requirements.lock.txt file
cp requirements.txt requirements.lock.txt

# add the target environment
soopervisor add training --backend argo-workflows

Usually, you’d manually edit soopervisor.yaml to configure your environment; for this example, let’s use a pre-configured one that tells soopervisor to mount a local directory to every pod so we can review the results later:

cp /soopervisor-workflow.yaml soopervisor.yaml
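
For reference, here's a minimal sketch of what such a file can look like (values are illustrative; mounted_volumes follows soopervisor's argo-workflows schema and takes a Kubernetes volume spec):

# soopervisor.yaml (sketch)
training:
  backend: argo-workflows
  mounted_volumes:
    - name: shared-folder
      spec:
        hostPath:
          path: /host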

Submit pipeline

We finished configuring; let’s now submit the workflow:

# build the docker image and generate an Argo YAML spec
soopervisor export training --skip-tests --ignore-git --mode force

# import image to the k8s cluster
k3d image import project:latest --cluster mycluster

# submit workflow
argo submit -n argo --watch training/argo.yaml

Congratulations! You just went from a legacy notebook to a production-ready pipeline! 🎉

Note

k3d image import is only required if you created the cluster with k3d.
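
While the workflow runs, you can also check its progress from another terminal:

# list workflows and their current phase
argo list -n argo

# inspect the pods created for each task
kubectl get pods -n argo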

Once the execution finishes, take a look at the generated artifacts:

ls /mnt/project
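
Since we passed -p /mnt/project/output to soorgeon refactor, the task products (executed notebooks and parquet files) live under that prefix:

ls /mnt/project/output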

Tip

You may also watch the progress from the UI.

# port forwarding to enable the UI
kubectl -n argo port-forward --address 0.0.0.0 svc/argo-server 2746:2746

Then, open: https://127.0.0.1:2746 (the Argo UI serves a self-signed certificate by default, so your browser may show a warning).

Clean up

To delete the cluster:

k3d cluster delete mycluster