Full workflow#

Important

This tutorial requires soopervisor 0.6.2 or higher and soorgeon 0.0.10 or higher.

This tutorial shows how to go from a monolithic Jupyter notebook to a modular, production-ready pipeline deployed to Kubernetes (via Argo Workflows) using the tools in our ecosystem (installation is sketched right after the list):

  1. soorgeon

  2. ploomber

  3. soopervisor
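
If you'd rather follow along outside the provided Docker image, all three tools are on PyPI; a quick install sketch, pinning the versions from the note above:

# install the ecosystem (versions per the requirement above)
pip install "soorgeon>=0.0.10" "soopervisor>=0.6.2" ploomber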

Pre-requisites#

You only need docker installed to follow this tutorial; the image we build next bundles everything else (JupyterLab, k3d, and the CLI tools).

Building Docker image#

We provide a Docker image so you can quickly run this example:

# get repository
git clone https://github.com/ploomber/soopervisor
cd soopervisor/tutorials/workflow

# build image
docker build --tag ploomber-workflow .

# create a directory to store the pipeline output
export SHARED_DIR=$HOME/ploomber-workflow
rm -rf $SHARED_DIR
mkdir -p $SHARED_DIR

# start (takes ~1 minute to be ready)
docker run -i -t \
    --privileged=true -v /var/run/docker.sock:/var/run/docker.sock \
    --volume $SHARED_DIR:/mnt/shared-folder \
    --env SHARED_DIR \
    --env PLOOMBER_STATS_ENABLED=false \
    --network host \
    ploomber-workflow

Note

We need to run docker run in privileged mode and mount the Docker socket because we'll be running docker commands inside the container.

Upon initialization, JupyterLab will be running at http://127.0.0.1:8888
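
If you want to confirm the server is up before opening it in the browser, here's a quick check from another terminal (assuming curl is available on your host):

# prints an HTTP status code (200, or a redirect) once JupyterLab is ready
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8888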

Refactor notebook#

First, we use soorgeon to refactor the notebook:

soorgeon refactor nb.ipynb -p /mnt/project/output -d parquet
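
To see what soorgeon generated, list the working directory; you should find a pipeline.yaml spec plus a tasks/ directory with one task per notebook section (exact names depend on the notebook's headings):

# inspect the generated pipeline spec and tasks
ls
cat pipeline.yaml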

We can generate a plot to visualize the dependencies:

ploomber plot

If you open the generated pipeline.png, you’ll see that soorgeon inferred the dependencies among the sections in the notebook and built a Ploomber pipeline automatically!

Now you can iterate on this modular pipeline with Ploomber; an optional local run is sketched below. After that, let's move on to the next stage and deploy to Kubernetes.
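
A minimal local run, using Ploomber's standard build command (optional; the deployment below does not depend on it):

# optional: execute the refactored pipeline locally
ploomber build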

Configure target platform#

Soopervisor lets you configure the target platform through a soopervisor.yaml file. Let's add one and set the backend to argo-workflows:

# soopervisor add requires a requirements.lock.txt file
cp requirements.txt requirements.lock.txt

# add the target environment
soopervisor add training --backend argo-workflows

Usually, you’d manually edit soopervisor.yaml to configure your environment; for this example, let’s use one that we already configured, which tells soopervisor to mount a local directory to every pod so we can review results later:

cp /soopervisor-workflow.yaml soopervisor.yaml
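
For reference, an argo-workflows target in soopervisor.yaml looks roughly like the sketch below. The values here are illustrative assumptions, not the exact contents of the file we just copied: it names the target environment, sets the backend, and mounts a volume into every pod:

# sketch of an argo-workflows target (illustrative values)
training:
  backend: argo-workflows
  repository: null  # keep the image local; we import it into k3d below
  mounted_volumes:
    - name: shared-folder
      spec:
        hostPath:
          path: /host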

Submit pipeline#

We finished configuring; let’s now submit the workflow:

# build the docker image and generate an Argo YAML spec
soopervisor export training --skip-tests --ignore-git --mode force

# import the image into the k8s cluster
k3d image import project:latest --cluster mycluster

# submit workflow
argo submit -n argo --watch training/argo.yaml
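
If you detach from --watch (or just want a second view), standard Argo and kubectl commands report on progress:

# list workflows in the argo namespace
argo list -n argo

# see the pods created for each task
kubectl get pods -n argo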

Congratulations! You just went from a legacy notebook to a production-ready pipeline! 🎉

Note

k3d image import is only required if creating the cluster with k3d.

Once the execution finishes, take a look at the generated artifacts:

ls /mnt/project
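
Since we passed -p /mnt/project/output to soorgeon refactor, the task products (parquet files and executed notebooks) should sit under that subdirectory:

# products live under the path passed to soorgeon refactor
ls /mnt/project/output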

Tip

You may also watch the progress from the UI.

# port forwarding to enable the UI
kubectl -n argo port-forward --address 0.0.0.0 svc/argo-server 2746:2746

Then, open: https://127.0.0.1:2746

Clean up#

To delete the cluster:

k3d cluster delete mycluster
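
Optionally, from the host (outside the tutorial container), you can also remove the shared directory and the image built at the start:

# remove the shared output directory created in the pre-requisites
rm -rf $SHARED_DIR

# remove the tutorial image
docker rmi ploomber-workflow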