What is Composable Logs?
Composable Logs is an open source Python framework for running ML/data workflows.
A main feature of Composable Logs is that workflows can execute on stateless compute infrastructure (which may be ephemeral or serverless). Thus, no 24/7 running database or service is needed to record past runs or metrics. Instead, when Composable Logs executes a workflow, all key events (and logged artifacts) are emitted using the OpenTelemetry standard. After a workflow has completed, an immutable record of the run can therefore be persisted as a JSON file to, e.g., a data lake (as one storage option).
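To illustrate the mechanism, the snippet below is a minimal sketch using the OpenTelemetry Python SDK directly (not Composable Logs' own API; the tracer, span, and attribute names are made up). It emits one task event as an OpenTelemetry span, serialized as JSON:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Set up a tracer that prints each finished span as JSON to stdout;
# in a real deployment the JSON records could instead be written to a data lake.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("workflow-demo")

# Emit one span representing a workflow task, with an attribute as task metadata.
with tracer.start_as_current_span("ingest-task") as span:
    span.set_attribute("task.rows_processed", 1234)
```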
For reporting and experiment tracking, the structured logs can be converted into various formats. Currently, Composable Logs can convert logs into a static website that can be deployed, e.g., to GitHub Pages.
Open source
Composable Logs is distributed under the terms of the MIT license; see the repo for details.
There are two main dependencies:
The Ray framework for parallel execution of Python tasks (open source)
Ray makes it possible to develop and test code locally on a laptop, and the same code scales up to large Ray clusters with multiple nodes. Ray natively supports cluster setup on AWS, Azure, GCP, and Kubernetes; see details.
In more detail, Composable Logs uses Ray Workflows. These are essentially DAGs of Python tasks, but Ray Workflows also supports more advanced patterns, like computational DAGs created dynamically at runtime, or task recursion. As of 1/2023, Ray Workflows is still an alpha feature; see the documentation.
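As a minimal sketch of the underlying idea, the snippet below wires a small static DAG out of plain Ray tasks (not Ray Workflows or Composable Logs' own wrappers; the task names are made up): two ingest tasks feed one train task.

```python
import ray

ray.init()  # with no arguments this starts a local Ray instance, e.g., on a laptop

@ray.remote
def ingest(source: str) -> str:
    return f"data from {source}"

@ray.remote
def train(a: str, b: str) -> str:
    return f"model trained on [{a}] and [{b}]"

# The futures returned by .remote() wire up the DAG; Ray runs the two
# ingest tasks in parallel and starts train once both inputs are ready.
result = train.remote(ingest.remote("source-A"), ingest.remote("source-B"))
print(ray.get(result))
```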
Info
Currently, all example usages of Composable Logs are static DAGs, and in this documentation we use pipeline and workflow almost interchangeably. However, with workflow we emphasize that a pipeline is implemented (or executed) as a Ray workflow and could potentially be more involved than a static DAG.
As of 1/2023: some modifications would be expected for Composable Logs to execute workflows on multi-node clusters, or to implement non-static DAGs.
OpenTelemetry for observability and logging of task execution details (open standard)
- OpenTelemetry is an open standard and a CNCF incubating project.
- Broad support from various vendors (see list). In particular, the major clouds (Azure, AWS, Google Cloud) all support ingestion of OpenTelemetry logs (with varying levels of support for the standard).
- The OpenTelemetry steering group includes broad industry representation.
A motivation for using an open standard for ML/data log data is that it opens the option of correlating this data with system metrics (whether this is feasible in practice remains to be verified). For example:
- To troubleshoot a failed data ingestion task, it can be useful to view its network input/output.
- Before deploying a long-running ML training job, it might be useful to monitor GPU/CPU loads (e.g., is utilization at 10% or 90%?); see the sketch after this list.
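As a hedged sketch of what emitting such system metrics could look like (this is not part of Composable Logs; it uses the OpenTelemetry metrics SDK and assumes the psutil package is installed), the snippet below exports CPU utilization as an OpenTelemetry gauge:

```python
import time

import psutil  # assumption: psutil is installed for reading CPU load
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

def cpu_load(options: CallbackOptions):
    # Report the current system-wide CPU utilization (in percent).
    yield Observation(psutil.cpu_percent())

# Export collected metrics to stdout every 5 seconds; a real setup could
# instead send them to the same store as the workflow's OpenTelemetry logs.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=5000)
meter = MeterProvider(metric_readers=[reader]).get_meter("system-metrics")
meter.create_observable_gauge("cpu.utilization", callbacks=[cpu_load], unit="%")

time.sleep(15)  # keep the process alive long enough for a few exports
```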
Example architecture
```mermaid
graph TB;
subgraph " "
Git[Git repository]
Developer[Developer, <br />local development]
Developer --> Git
subgraph "<b>Stateless execution</b>"
subgraph "<b>Execution driver</b> (composable-logs)"
Code[Code]
end
subgraph "<b>Execution backend</b> (Ray cluster)"
run_1[Workflow run 1]
run_2[Workflow run 2]
run_3[Workflow run 3]
end
Code --> run_1
Code --> run_2
Code --> run_3
end
subgraph "<b>Storage</b>"
subgraph "<b>Persisted logs</b> <br/> (eg. data lake, OpenTelemetry DB, Github build artifact)"
logs_1[Logs for run 1]
logs_2[Logs for run 2]
logs_3[Logs for run 3]
end
end
subgraph "<b>Reporting and UI</b>"
direction TB
ui_data[Processed data]
ui_website[Website with past runs <br/> using a modified version of MLflow]
ui_data --> ui_website
end
run_1 --> logs_1
run_2 -->|Persist <br/>OpenTelemetry<br/>logs.| logs_2
run_3 --> logs_3
logs_1 --> ui_data
logs_2 -->|Convert logged <br/>OpenTelemetry data<br/> into UI-friendly format| ui_data
logs_3 --> ui_data
ui_website ---> Developer
Git --> Code
end
```
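As a hypothetical sketch of the diagram's "convert logged OpenTelemetry data into UI-friendly format" step (the file name and JSON keys below are assumptions, modeled on one JSON span per line as written, e.g., by the SDK's console exporter):

```python
import json
from pathlib import Path

def summarize_runs(log_file: Path) -> list[dict]:
    """Extract a per-span summary suitable for rendering in a UI."""
    summaries = []
    for line in log_file.read_text().splitlines():
        span = json.loads(line)
        summaries.append({
            "name": span.get("name"),
            "start": span.get("start_time"),
            "end": span.get("end_time"),
            "status": span.get("status", {}).get("status_code"),
        })
    return summaries

print(summarize_runs(Path("opentelemetry-spans.jsonl")))
```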
Demo ML training workflow
One motivation for developing Composable Logs is to have a framework for running workflows on limited or no cloud infrastructure. The Demo workflow section in this documentation describes in more detail how one can, e.g., deploy and run a workflow using only the services provided with a (free, personal) GitHub account.
Use cases and scope
- Composable Logs can currently run public data/ML pipelines using only the services provided with a (free, personal) GitHub account. Since such a pipeline can be scheduled to run daily, one could:
- Run (smaller scale) public data pipelines that process and report on open data.
- Showcase how to use a library with a publicly running pipeline.
- Improve the tooling for collaborating on notebooks and public open source data and ML pipelines.
- Offer example pipeline(s) exploring how MLOps can scale down to minimal "free tier" setups.
- Reproducible science: schedule the analysis for a paper to run, e.g., every month, potentially with updated dependencies.
Status
The project is already usable, but not for critical workloads.