Here at Instaclustr, we are big fans of Cadence, the open source workflow orchestration solution developed at Uber. Having been involved in the development of our managed service offering, I try to keep abreast of news and articles that highlight how people are taking advantage of this great open source solution.
One such article was released recently on the Doordash Engineering blog, Enabling Faster Financial Partnership Integrations Using Cadence. It’s a good article that goes into quite interesting detail about the challenges they were facing and how Cadence slotted into the solution.
Of particular interest to me was their investigation into possible alternatives to Cadence, which included:
- An in-house solution: ever tempting but rarely as easy as you hope!
- AWS Step functions: nice AWS integration but perhaps difficult to test and verify
- Netflix Conductor: a workflow orchestration service very similar in philosophy to Cadence and the subject for this article.
Netflix Conductor
Netflix Conductor is a well known workflow orchestration system. We have seen continued interest in the community, and in this article we shall try to answer some questions about it.
So how similar are Conductor and Cadence? You may be unsurprised by the answer: quite similar!
Let’s start with the project names and we already have our first resemblance: An orchestra conductor is responsible for controlling the tempo and cadence of a performance.
A superficial observation? Perhaps, but it does foreshadow a recurring theme of similarity that we will observe.
The Orders Workflow
To compare these two offerings, let’s first define a simple retail ordering workflow.
When an order is received, the following steps are taken:
- Check if the order is in stock
- If it’s not:
- Get the estimated restock date
- Notify the customer of the delay
- Wait for restock
- Start the order process again
- If it is in stock
- Package and send the order to the nominated address
- Notify the customer of pending delivery with a tracking reference
- Complete
We can visualize the workflow like this:
Fig 1. Orders workflow
As we can see it’s a simple workflow with a single decision and an optional loop, depending on if the item is in stock, and a possible wait.
Let’s see how this translates into Cadence and Conductor.
Getting Started
For both systems, the first step is to identify the individual tasks that make up the workflow. In Cadence, these are known as Activities, and in Conductor they are simply known as Tasks.
In both systems they represent the same thing, a distinct piece of work that can be encapsulated and handed off to a worker process in a fault tolerant manner.
For our workflow, we can identify the following actions, which are shown in Fig 1.
- Check stock: See if the order is in stock
- Notify the customer of a delay with an ETA
- Package and send: Send the order
- Notify the customer of delivery with a tracking reference
Now that we have identified these actions, let’s codify them in our systems.
First let’s look at Cadence:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
public interface OrdersActivities { @ActivityMethod StockInfo checkStock(String orderId); @ActivityMethod void notifyDelay(String emailAddress, Long eta); @ActivityMethod String packageAndSendOrder(String orderId, String address); @ActivityMethod void notifySent(String emailAddress, String trackingId); } |
As we see here, we can define our activities in Java (Go is also supported) using an interface class, which will be registered later and used by our worker processes.
At this stage the implementation isn’t important, but we will demonstrate that a bit later.
Now in Conductor:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
[ { "name": "check_stock", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180, }, { "name": "notify_customer_of_delay", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180, }, { "name": "package_and_send_order", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180, }, { "name": "notify_customer_order_sent", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180, } ] |
Here we have an object definition in json, which includes various configuration options such as retry counts, timeouts, and failure policies.
This is a great example of what I mentioned before. The differences in implementation are plain to see, but the similarities are also very clear: we must define the actions that our workflow will be doing, and the implementation of that work is elsewhere.
Defining the Workflow
Now that we have codified the activities (or tasks) our workflow can perform, we can write our workflow. The workflow definition contains the logic that will be applied when a workflow operation is started; it will contain the business logic used to fulfil the requirements.
Let’s start again in Cadence:
1 2 3 4 |
public interface OrdersWorkflow { @WorkflowMethod void processOrder(Order order); } |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
public class OrdersWorkflowImpl implements OrdersWorkflow { @Override public void processOrder(Order order) { // 1: Configure the workflow stub final OrdersActivities activities = Workflow.newActivityStub(OrdersActivities.class, new ActivityOptions.Builder() .setRetryOptions(new RetryOptions.Builder() .setInitialInterval(Duration.ofSeconds(10)) .setMaximumAttempts(3) .build()) .setScheduleToCloseTimeout(Duration.ofMinutes(5)).build()); // Check stock StockInfo stockInfo = activities.checkStock(order.getOrderId()); if (!stockInfo.isInStock()) { // Notify customer of delay activities.notifyDelay( order.getAccountEmail(), stockInfo.getRestockEta()); // 2: Wait for restock Workflow.sleep(stockInfo.getRestockWait()); // 3: Start a new workflow Workflow.continueAsNew(order); } // Package and send, returning tracking reference String trackingId = activities.packageAndSendOrder( order.getOrderId(), order.getAccountAddress()); // Notify customer activities.notifySent(order.getAccountEmail(), trackingId); } } |
Let’s call out some of the parts of the solution we have here that warrants some additional explanation:
- First, we must create the Cadence activities stub by which we will invoke the activity workers (more on that later).
- Here is where we can set the retry and timeout behaviours of the workflow. Now we can invoke the stub methods as part of our workflow.
- Here we see an important function that is built into Cadence, Workflow.Sleep. This provides a reliable way of pausing a workflow for extended periods of time with a guarantee it will awaken at the right time. And another function, Workflow.continueAsNew. This will restart a workflow as if it’s a new invocation but it will maintain the same unique ID for traceability.
The rest of the logic is self explanatory.
A keen eyed observer may notice this workflow contains no loop, but our workflow needs some way of continually waiting until our order is in stock.
In this case, we are using Workflow.continueAsNew as a means of implementing a type of recursion. As long as the item is out of stock, a new workflow will be created until it has been packaged and sent. Thus the requirement is satisfied.
That’s it for the Cadence workflow definition. We can see it’s a code first solution that gives intuitive control to the developer implementing the business logic. Even without a basic understanding of Java, the reader can follow along the code and determine what’s happening.
Now lets move to the Conductor workflow definition:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
{ "name": "process_order", "description": "Processes an order and sends it to the customer", "version": 1, "schemaVersion": 2, "tasks": [ { "name": "Wait for stock", "taskReferenceName": "LoopTask", "type": "DO_WHILE", "loopCondition": "!$.check_stock.output.instock", "loopOver": [ { "name": "check_stock", "taskReferenceName": "check_stock", "inputParameters": { "orderId": "${workflow.input.orderId}" }, "type": "SIMPLE" }, { "name": "switch_task", "taskReferenceName": "is_order_in_stock", "inputParameters": { "case_value_param": "${check_stock.output.instock}" }, "type": "SWITCH", "evaluatorType": "value-param", "expression": "case_value_param", "decisionCases": { "false": [ { "name": "notify_customer_of_delay", "taskReferenceName": "notify_customer_of_delay", "inputParameters": { "email": "${workflow.input.email}", "restockEta": "${check_stock.output.restockEta}" }, "type": "SIMPLE" }, { "name": "delay_workflow", "taskReferenceName": "delay_workflow", "type": "WAIT" } ] }, "defaultCase": [ { "name": "package_and_send_order", "taskReferenceName": "package_and_send_order_to_address", "inputParameters": { "orderId": "${workflow.input.orderId}", "address": "${workflow.input.email}" }, "type": "SIMPLE" }, { "name": "notify_customer_order_sent", "taskReferenceName": "notify_customer_order_sent", "inputParameters": { "email": "${workflow.input.email}", "trackingRef": "${package_and_send_order_to_address.output.trackingRef}" }, "type": "SIMPLE" }, { "name": "terminate", "taskReferenceName": "terminate", "type": "TERMINATE", "inputParameters": { "terminationStatus": "COMPLETED" } } ], "startDelay": 0, "optional": false } ] } ] } |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
{ "name": "process_order", "description": "Processes an order and sends it to the customer", "version": 1, "schemaVersion": 2, "tasks": [ { "name": "Wait for stock", "taskReferenceName": "LoopTask", "type": "DO_WHILE", "loopCondition": "!$.check_stock.output.instock", "loopOver": [ { "name": "check_stock", "taskReferenceName": "check_stock", "inputParameters": { "orderId": "${workflow.input.orderId}" }, "type": "SIMPLE" }, { "name": "switch_task", "taskReferenceName": "is_order_in_stock", "inputParameters": { "case_value_param": "${check_stock.output.instock}" }, "type": "SWITCH", "evaluatorType": "value-param", "expression": "case_value_param", "decisionCases": { "false": [ { "name": "notify_customer_of_delay", "taskReferenceName": "notify_customer_of_delay", "inputParameters": { "email": "${workflow.input.email}", "restockEta": "${check_stock.output.restockEta}" }, "type": "SIMPLE" }, { "name": "delay_workflow", "taskReferenceName": "delay_workflow", "type": "WAIT" } ] }, "defaultCase": [ { "name": "package_and_send_order", "taskReferenceName": "package_and_send_order_to_address", "inputParameters": { "orderId": "${workflow.input.orderId}", "address": "${workflow.input.email}" }, "type": "SIMPLE" }, { "name": "notify_customer_order_sent", "taskReferenceName": "notify_customer_order_sent", "inputParameters": { "email": "${workflow.input.email}", "trackingRef": "${package_and_send_order_to_address.output.trackingRef}" }, "type": "SIMPLE" }, { "name": "terminate", "taskReferenceName": "terminate", "type": "TERMINATE", "inputParameters": { "terminationStatus": "COMPLETED" } } ], "startDelay": 0, "optional": false } ] } ] } |
This is a bit harder to parse, so let’s break it down into the various parts.
First off, Conductor does not have functionality like Cadence’s Workflow.continueAsNew, so we need to achieve the workflow requirement by implementing a loop until the order is in stock.
1 2 3 4 5 6 7 8 9 10 |
... { "name": "Wait for stock", "taskReferenceName": "LoopTask", "type": "DO_WHILE", "loopCondition": "!$.check_stock.output.instock", "loopOver": [ ... ] ... |
The first task we invoke is a do-while loop. This is a system task built into Conductor, it’s the basic looping operator we have and it has a few interesting quirks:
- All the tasks defined within the loop always get executed once, the first time we encounter the loop, then the condition is checked to determine if it needs to loop again.
This seems counter intuitive, but it allows the loopCondition to reference the result of a task defined in the loop, in this case the check_stock task
- The loopCondition is defined as a Javascript expression
In our case, our condition is the “check_stock” task returns an instock boolean result. If false, we set the loopCondition to true, which means we want to continue looping. When loopCondition false, the loop will exit and the workflow will continue with the next task. This last quirk posed a challenge for our desired workflow. We only wanted to loop if the order is not in stock, but to achieve that, checking if it’s in stock must be part of the loop!
Thankfully, we can work around this. Switch is another system task which defines a condition and branches.
Here, we can check if the item is in stock
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
{ "name": "switch_task", "taskReferenceName": "is_order_in_stock", "inputParameters": { "case_value_param": "${check_stock.output.inStock}" }, "type": "SWITCH", "evaluatorType": "value-param", "expression": "case_value_param", "decisionCases": { "false": [ ... ] }, "defaultCase": [ ... ] } |
Note that this task is taking an output parameter from the previous task, and using it to drive the switch statement.
So now we are still inside the loop, remember it always executes the first time, and we can branch our workflow and execute the following tasks depending on the outcome.
If the Item is in stock:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
{ "name": "package_and_send_order", "taskReferenceName": "package_and_send_order_to_address", "inputParameters": { "orderId": "${workflow.input.orderId}", "address": "${workflow.input.email}" }, "type": "SIMPLE" }, { "name": "notify_customer_order_sent", "taskReferenceName": "notify_customer_order_sent", "inputParameters": { "email": "${workflow.input.email}", "trackingRef": "${package_and_send_order_to_address.output.trackingRef}" }, "type": "SIMPLE" }, { "name": "terminate", "taskReferenceName": "terminate", "type": "TERMINATE", "inputParameters": { "terminationStatus": "COMPLETED" } } |
- Invoke the package_and_send_order task
- Invoke the notify_customer_order_sent task
- Invoke the Terminate system task with the COMPLETED status
By terminating the workflow here, we break out of the loop, thereby satisfying our requirement.
If the Item is backordered:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
{ "name": "notify_customer_of_delay", "taskReferenceName": "notify_customer_of_delay", "inputParameters": { "email": "${workflow.input.email}", "restockEta": "${check_stock.output.restockEta}" }, "type": "SIMPLE" }, { "name": "delay_workflow", "taskReferenceName": "delay_workflow", "type": "WAIT" } |
- Invoke the notify_customer_delay task
- Invoke a Wait system task
Similarly to our Cadence example, we want to pause the workflow until we know the item is due to be restocked. Unlike Cadence however, Conductor does not have a method to wait a specific period of time before continuing. The Wait task will pause workflow execution, but the only way to restart from a Wait is by making an external call to the REST API and unblock it.
For this exercise, when it comes time to execute the workflow, we will need to unblock workflows that end up in this state. When the Wait task is completed, we reach the end of the loop and will start another iteration going over the same steps.
And there we have it, a fully defined workflow in Conductor, achieving very similar functionality as Cadence but going about it in a quite different manner.
The Workers
Until now, we have glossed over the work that gets done by the activities (or tasks) our workflows are executing.
In both Cadence and Conductor, we must register worker processes which connect to the central application and register themselves to deliver some functionality.
In Cadence and Conductor, workers can be written in either Java or Go, with Conductor also offering Python support.
We will stick with Java and see how each system does it, starting with Cadence:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
WorkflowClient workflowClient = WorkflowClient.newInstance( new WorkflowServiceTChannel( ClientOptions.newBuilder() .setHost(HOST) .setPort(7933) .build()), WorkflowClientOptions.newBuilder().setDomain(DOMAIN).build()); WorkerFactory factory = WorkerFactory.newInstance(workflowClient); Worker worker = factory.newWorker(TASK_LIST); worker.registerWorkflowImplementationTypes(worker.OrdersWorkflowImpl.class); worker.registerActivitiesImplementations( new worker.OrdersWorker(new StockService(), new NotificationService())); factory.start(); |
Stepping through the code
- First we create a workflow client, which we provide connection details for the Cadence cluster front end workers.
- We then register which workflow this worker provides an implementation for
- And we also register the activities and implementation this worker providers
Cadence defers not only the activity execution to workers but also the workflow execution itself. When you start a workflow, Cadence enqueues a workflow request, and a registered workflow worker must pick up that request and start processing it.
Cadence workers are often configured to execute both the workflow and activities, but it is not mandatory. We can have workers that only implement activities or workflows.
If we were to run this code now, it would start up and connect to the Cadence cluster, and periodically poll to see if there was any work to do.
Moving on to Conductor workers, in Conductor before we can register workers we need to register both task and workflow definitions using the provided REST API.
We do this with a simple curl command (truncated):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
curl -X POST \ https://localhost:8080/api/metadata/taskdefs \ -H 'Content-Type: application/json' \ -d '[ { "name": "check_stock", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180, "ownerEmail": "[email protected]" }, { "name": "notify_customer_of_delay", ... }, { "name": "package_and_send_order", ... }, { "name": "notify_customer_order_sent", ... } ]' |
And similarly for the entire workflow definition (truncated here)
1 2 3 4 5 6 7 8 9 |
curl -X POST \ https://localhost:8080/api/metadata/workflow \ -H 'Content-Type: application/json' \ -d '{ "name": "Wait for stock", "taskReferenceName": "LoopTask", "type": "DO_WHILE", ... }' |
Now we can define the workers
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
TaskClient taskClient = new TaskClient(); taskClient.setRootURI("https://localhost:8080/api/"); int threadCount = 4; // number of threads used to execute workers. Worker worker1 = new StockWorker("check_stock", new StockService()); Worker worker2 = new PackagingWorker("package_and_send_order", new StockService()); Worker worker3 = new NotificationWorker("notify_customer_of_delay", new NotificationService()); Worker worker4 = new NotificationWorker("notify_customer_order_sent", new NotificationService()); TaskRunnerConfigurer configurer = new TaskRunnerConfigurer.Builder(taskClient, Arrays.asList(worker1, worker2, worker3, worker4)) .withThreadCount(threadCount) .build(); // Start the polling and execution of tasks configurer.init(); |
- We create a Conductor task client, using the Conductor REST API details
- We then register our activity implementations by their unique name
- Finally we configure the taskrunner to serve these tasks
Now here we see another similarity between Cadence and Conductor: both systems must be informed of the workers that can deliver functionality, so that work can be handed off.
One key difference—the workflow code is executed on the Conductor server. Any system level task (do-while, switch, wait etc.) is handled by the workflow engine remotely. Once the system encounters a Simple task, it delegates that execution to the task workers.
Just as with the Cadence workers, running this code would start up and connect to the Conductor cluster, and periodically wait for work to get assigned.
Putting It All Together
Phew!
We now have two implementations of the same workflow in Cadence and Conductor.
We have tasks definitions, workflows and workers, all registered and ready to work—so let’s get them started!
Cadence:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
WorkflowOptions workflowOptions = new WorkflowOptions.Builder() .setExecutionStartToCloseTimeout(Duration.ofMinutes(5)) .setTaskList(App.TASK_LIST) .build(); WorkflowClient workflowClient = WorkflowClient.newInstance( new WorkflowServiceTChannel(ClientOptions.newBuilder() .setHost(App.HOST) .setPort(7933) .build()), WorkflowClientOptions.newBuilder().setDomain(App.DOMAIN).build()); OrdersWorkflow ordersWorkflow = workflowClient.newWorkflowStub(OrdersWorkflow.class, workflowOptions); Order order = new Order(“lr-3212-456x", "john@instaclustr.com", "20 Smith St"); WorkflowClient.start(ordersWorkflow::processOrder, order); |
For cadence we must invoke the workflow via code
- Create a workflow client
- Register the client to a particular workflow interface and configure the options
- Start the workflow
Relatively simple, but it does require configuring a client application.
Let’s consider how this would work in a real world scenario. If we assume there is an online store which creates the order request and processes the payment, then the final action it would do is start the workflow process using code similar to ours.
Now the workflow has started, we can use Cadence Web, to track workflow execution.
Here we see our workflow has the status Open.
If we look at the details, we can see that it has been delayed, and it is currently waiting for the restock sleep timer to start another iteration.
Cadence web is a powerful tool which can help query workflows and visualize their progress or problems, but it currently is not able to start workflows.
To start a Conductor workflow, it is much simpler:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
curl -X POST \ https://localhost:8080/api/workflow/ \ -H 'Content-Type: application/json' \ -d '{ "name": "process_order", "version": 1, "correlationId": "process_order_cf_128493x", "input": { "orderId": "cf-4421x", "email" : "[email protected]", "address" : "1 Smith St" } }' |
We just need to fire off a curl command to the REST API.
We can also go one better! We can explore the Conductor web UI, and use it to start our workflow and track its progress.
As you can see, not only can we start a workflow here, but Conductor has even created a flow chart with steps you can inspect and see their inputs and outputs.
This is undoubtedly useful but also incredibly important when trying to fix defects.
On closer inspection of our workflow we can see a bug! Our check_stock task has informed us the order is not in stock, but our switch statement has evaluated that it is.
What happened? If we look closely, we see that it is an issue with casing in the switch statement:
Our workflow is checking for the output field instock, but our task worker is returning a field inStock. Pretty subtle, but has important ramifications. We could fix this either by updating the workflow definition or by ensuring the task worker returns the correct value.
In fact, the task worker could decide to return any value it wants! We must be vigilant and have a well thought out testing framework to avoid these typing issues in the future. Here we see one of the drawbacks of our flexible data model, there are no enforceable data contracts.
Final Verdict
So what do we think? Let’s take a step back and reflect.
Cadence and Conductor provide a powerful way to orchestrate your microservice architecture, and codify and run business logic in a central place.
Both systems are able to handle the unpredictable nature of distributed systems and can help you by abstracting away the error handling logic and retry mechanisms that are necessary when working in a cloud environment.
Ultimately, the decision of which is right for you comes down to what features you find more important.
At Instaclustr, we are proud to be champions of Cadence, and continue to improve on our managed service that we are delivering every day for our customers. We find the code first approach to workflow definitions to be a compelling advantage, which allows the Development team more control over how their workflow is implemented, and also makes the workflow code easy to comprehend and maintain.
Since the workflow is run by workers, there is essentially no limit on the libraries and tools that can be employed to get to our stated goals.
Finally, the tighter data contracts ensure we have more predictable behaviour and makes testing and integration a simpler process.
Conductor offers a different approach. The convenience of a REST API makes a clear case for itself, along with the feature-rich user interface. However the workflow definitions can be cumbersome to design and tricky to debug.
The limited control mechanisms can make it difficult to port across an existing workflow without modification, and the readability of the DSL leaves a bit to be desired.
Either way, we’d love to hear from you about how Uber Cadence or Netflix Conductor has helped you solve problems at your company.