
Netflix, the video and gaming streaming service, has open sourced the workflow orchestrator that its army of data scientists and analysts use every day to understand user behavior and other large-scale, data-driven trends.
The Maestro workflow orchestrator, released under an Apache 2.0 license, was designed to support hundreds of thousands of workflows and has completed up to 2 million jobs in a single day for the media company.
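For a sense of scale, spreading 2 million jobs evenly over 24 hours gives an average of roughly 23 jobs per second. Real load is not uniform (the engineers cite traffic spikes), so peak rates would run higher than this average:

```latex
\frac{2\,000\,000\ \text{jobs}}{24 \times 3600\ \text{s}} = \frac{2\,000\,000}{86\,400}\ \text{jobs/s} \approx 23.1\ \text{jobs/s}
```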
How Maestro Works
According to company engineers, Maestro is highly scalable, extensible and able to meet strict service level objectives (SLOs) even during traffic spikes.
It is built with a range of open source technologies, namely Git, Java 21, Gradle and Docker.
Maestro can be invoked from the command line with cURL through its REST API, which lets users create, run, and delete a workflow and its associated data. Workflows are defined in JSON, and the user's business logic can be packaged into Docker images, Jupyter notebooks, bash scripts, SQL, Python, and other formats.
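The rough shape of that interaction, a JSON workflow definition POSTed to a server, can be sketched in Python with the standard library standing in for cURL. This is only an illustration under stated assumptions: the URL, the field names ("steps", "type", "depends_on") and the helper functions are hypothetical placeholders, not Maestro's real endpoints or schema, which are documented in its repository.

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only; consult the Maestro README
# for the server's real address and API paths.
WORKFLOWS_URL = "http://localhost:8080/workflows"


def build_workflow_definition(workflow_id: str) -> dict:
    """Return an illustrative two-step workflow.

    Every field name here ("steps", "type", "depends_on", ...) is a
    placeholder for this sketch, not Maestro's actual JSON schema.
    """
    return {
        "id": workflow_id,
        "steps": [
            {"id": "extract", "type": "notebook", "path": "extract.ipynb"},
            {"id": "load", "type": "sql",
             "query": "INSERT INTO daily_summary SELECT * FROM staging",
             "depends_on": ["extract"]},
        ],
    }


def build_create_request(definition: dict) -> urllib.request.Request:
    """Wrap the definition in a JSON POST, the same shape as `curl -X POST -H ... -d @wf.json`."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        WORKFLOWS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_request(build_workflow_definition("demo-wf"))
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Building the request without sending it keeps the sketch runnable offline; against a real server, the body and headers would need to follow Maestro's documented schema.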
Behind the scenes, Maestro manages the entire lifecycle of a workflow, handling retries, queuing, and task distribution to compute engines. It supports not only directed acyclic graphs (DAGs), table stakes in the AI-driven world of 2024, but also cyclic workflows and multiple reusable patterns, including foreach loops, subworkflows, and conditional branching.
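Two of those lifecycle duties, ordering the steps of a DAG so each runs after its dependencies and retrying a step that fails, can be illustrated with a minimal Python sketch. It is a generic example, not Maestro's scheduler: it uses Kahn's algorithm with a deque as the ready queue, the function names `topological_order`, `run_with_retries` and `run_workflow` are hypothetical, and it rejects cycles outright, whereas Maestro also supports cyclic workflows.

```python
from collections import deque
from typing import Callable, Dict, List


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order steps so each one comes after all of its dependencies (Kahn's algorithm).

    `deps` maps every step id to the list of step ids it depends on.
    Raises ValueError if the graph has a cycle, which a plain DAG runner cannot order.
    """
    indegree = {step: len(parents) for step, parents in deps.items()}
    children: Dict[str, List[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)

    # Steps with no unfinished dependencies wait in a FIFO queue.
    ready = deque(step for step, count in indegree.items() if count == 0)
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for child in children[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("workflow graph contains a cycle")
    return order


def run_with_retries(task: Callable[[], None], max_attempts: int = 3) -> int:
    """Run `task`, retrying on any exception; return the attempt that succeeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
    raise AssertionError("unreachable")


def run_workflow(deps: Dict[str, List[str]],
                 tasks: Dict[str, Callable[[], None]],
                 max_attempts: int = 3) -> Dict[str, int]:
    """Run every step in dependency order; return the attempts used per step."""
    return {step: run_with_retries(tasks[step], max_attempts)
            for step in topological_order(deps)}
```

A step that fails once and then succeeds makes `run_with_retries` return 2. A production orchestrator would also add backoff between attempts and persist progress so a restart can resume; this sketch omits both.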
“It supports a wide range of workflow use cases, including ETL pipelines, ML workflows, AB test pipelines, pipelines to move data between different storages,” a group of Netflix engineers collectively wrote in a recent blog post announcing the release. “Maestro’s horizontal scalability ensures it can manage both a large number of workflows and a large number of jobs within a single workflow.”
Birth of Maestro
Netflix is no stranger to open source software, having released many of its internally developed tools to the public. The system stress-testing tool Chaos Monkey, introduced in 2011, inspired a whole generation of chaos testing tools. Other open source projects that Netflix has spun off include the routing gateway Zuul and the microservices orchestration engine Conductor, since deprecated.
Netflix first described Maestro publicly in a 2022 blog post that explained its origins. The orchestrator in use at the time, Meson, was straining under a workload of thousands of daily jobs, particularly at peak times.
“Meson was based on a single leader architecture with high availability. As the usage increased, we had to vertically scale the system to keep up and were approaching AWS instance type limits,” the engineers wrote in the 2022 post.
Worse, the workloads were expected to increase by at least 100% per year, and the sizes of the workflows were expected to grow as well.
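That growth compounds quickly. Writing W_0 for today's workload and g for the yearly growth rate (symbols introduced here for illustration), growth of at least 100% a year means the workload at least doubles annually:

```latex
W_n = W_0\,(1 + g)^n,\qquad g \ge 1 \;\Longrightarrow\; W_n \ge 2^n\,W_0
```

After three years that is at least 2^3 = 8 times the starting load, which helps explain why vertically scaling a single leader was not a lasting option.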
From the start, Maestro was designed to be highly scalable and extensible. It uses a DAG architecture in which each workflow consists of a series of steps, and each step can have dependencies, triggers and other conditionals. The business logic of each workflow runs in isolation, which helps ensure SLOs are met. All the services are designed to be stateless so they can be scaled out as needed.
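How per-step dependencies, conditions and stateless services fit together can be sketched in Python. This is a generic illustration under stated assumptions, not Maestro's code: the `Step` class, its hypothetical `run_if` condition field and the `runnable_steps` function are invented for the example, and a plain dict stands in for the external database that would hold workflow state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Step:
    """Illustrative step: an id, its upstream dependencies and an optional condition."""
    step_id: str
    depends_on: Tuple[str, ...] = ()
    run_if: Optional[str] = None  # hypothetical: name of a boolean parameter gating the step


def runnable_steps(steps: List[Step], completed: Set[str],
                   params: Dict[str, bool]) -> List[str]:
    """Return ids of steps whose dependencies are complete and whose condition holds.

    The function keeps no state between calls; everything it needs is passed in.
    That is what lets identical worker instances share the load: each one reads
    the same externally stored state and reaches the same answer.
    """
    ready = []
    for step in steps:
        if step.step_id in completed:
            continue  # already finished
        if not all(dep in completed for dep in step.depends_on):
            continue  # still waiting on an upstream step
        if step.run_if is not None and not params.get(step.run_if, False):
            continue  # conditional branch not taken
        ready.append(step.step_id)
    return ready


if __name__ == "__main__":
    steps = [
        Step("ingest"),
        Step("train", depends_on=("ingest",)),
        Step("backfill", depends_on=("ingest",), run_if="do_backfill"),
    ]
    shared_state: Dict[str, Set[str]] = {"completed": set()}  # stand-in for an external database
    workers = ["worker-a", "worker-b"]  # interchangeable, no local state
    turn = 0
    while True:
        ready = runnable_steps(steps, shared_state["completed"], {"do_backfill": False})
        if not ready:
            break
        worker = workers[turn % len(workers)]
        print(f"{worker} ran {ready[0]}")
        shared_state["completed"].add(ready[0])
        turn += 1
```

With `do_backfill` set to False, the demo has worker-a run `ingest` and worker-b run `train`, while `backfill` is skipped. Because neither worker holds state of its own, adding a third entry to `workers` changes only who does the work, not the result.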