# Task communication
Since soopervisor executes tasks in isolated environments, you must provide a way to pass each task's output files to downstream tasks that use them as inputs. There are two ways of doing so: either mount a shared disk on all containers, or configure a File client to use remote storage. We describe both options in the next sections.
## Using remote storage
As an alternative, you can configure a File client to ensure each task has its input files before execution. We currently support Amazon S3 and Google Cloud Storage.

To configure a client, add the following to your `pipeline.yaml` file:

```yaml
# configure a client
clients:
  # note the capital F
  File: clients.get

tasks:
  # content continues...
```
Then, create a `clients.py` file (in the same directory as your `pipeline.yaml`) and declare a `get` function that returns a File client instance. For Amazon S3:

```python
from ploomber.clients import S3Client

def get():
    return S3Client(bucket_name='YOUR-BUCKET-NAME',
                    parent='PARENT-FOLDER-IN-BUCKET',
                    json_credentials_path='credentials.json')
```

Or, for Google Cloud Storage:

```python
from ploomber.clients import GCloudStorageClient

def get():
    return GCloudStorageClient(bucket_name='YOUR-BUCKET-NAME',
                               parent='PARENT-FOLDER-IN-BUCKET',
                               json_credentials_path='credentials.json')
```
Next, create a `credentials.json` file (in the same directory as your `pipeline.yaml`) with your authentication information. For Amazon S3, the file should look like this:

```json
{
    "aws_access_key_id": "YOUR-ACCESS-KEY-ID",
    "aws_secret_access_key": "YOUR-SECRET-ACCESS-KEY"
}
```

For Google Cloud Storage, use your service account key:

```json
{
    "type": "service_account",
    "project_id": "project-id",
    "private_key_id": "private-key-id",
    "private_key": "private-key",
    "client_email": "client-email",
    "client_id": "client-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account.iam.gserviceaccount.com"
}
```
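A malformed credentials file typically only fails once a task tries to upload, so it can help to validate it up front. Here is a minimal sketch of such a check (the `validate_credentials` helper is hypothetical, not part of Ploomber; the default keys follow the S3 example above, so pass different `required` keys for Google Cloud Storage):

```python
import json
from pathlib import Path

def validate_credentials(path='credentials.json',
                         required=('aws_access_key_id', 'aws_secret_access_key')):
    """Check that the file at `path` parses as JSON and contains the expected keys."""
    data = json.loads(Path(path).read_text())
    missing = [key for key in required if key not in data]
    if missing:
        raise KeyError(f'{path} is missing keys: {missing}')
    return data
```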
Note: if you're using a Docker-based exporter (K8s/Argo, Airflow, or AWS Batch), you must ensure that your `credentials.json` file is included in your Docker image. You can do so by adding the following to your `soopervisor.yaml` file:

```yaml
some-name:
  # tell soopervisor to include the credentials.json file
  include: [credentials.json]
  # continues
```
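Forgetting one of these files is an easy way to get a broken image, so a quick check before building can save a failed export. A small sketch (the helper and the file list are illustrative, not a soopervisor feature; adjust the names to your project):

```python
from pathlib import Path

# names mirror the example project layout described above
REQUIRED_FILES = ('pipeline.yaml', 'clients.py', 'credentials.json')

def missing_from_build_context(root='.', files=REQUIRED_FILES):
    """Return the files from `files` that are absent from the directory `root`."""
    return [name for name in files if not (Path(root) / name).exists()]
```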
You can check your local configuration by loading your pipeline with `ploomber status`. If you see a table listing your tasks, the client has been configured successfully.

Furthermore, when you run the `soopervisor export` command with a Docker-based exporter (K8s/Argo, Airflow, or AWS Batch), Soopervisor checks that the File client in the Docker image is correctly configured by using your credentials to establish a connection to the remote storage.