Adding Custom Search Attributes in Cadence®
Cadence, the open source workflow orchestration engine originally developed by Uber, facilitates the writing of asynchronous code that is fault-tolerant, scalable, and more. No wonder it has seen heavy uptake not only within Uber but in other large-scale organizations! However, with scale of use comes difficulty in managing the record-keeping and analysis of workflows run on a given Cadence cluster. Fortunately, Cadence (when run in an ‘advanced visibility’ configuration) offers the ability to add custom ‘search attributes’ to your workflows in order to greatly simplify this task.
The principle of adding a search attribute to a Cadence workflow, and using it to filter our workflow records, is quite straightforward. We are given the option to include a key-value map of SearchAttributes when initializing a Cadence workflow, and can use this map to attach arbitrary data that is relevant to our business process. For example, say that we use Cadence to run workflows that deliver products from Australia to neighbouring countries, and that we start 2 delivery workflows targeting New Zealand and Indonesia. We can simply include "destination": "New Zealand" in the SearchAttributes map for the first workflow, and "destination": "Indonesia" in the second.
We could then make use of these attributes by making a query like the following to the Cadence cluster via the inbuilt CLI, for an arbitrary domain called test-domain:
cadence --do test-domain wf list -q 'destination = "New Zealand"'
This would fetch us a list of the workflow(s) run exclusively for deliveries to New Zealand. This is a simple example, but the query language permits enough complexity in its possible operations and combinations of search attributes to be a powerful tool in accessing and managing your workflow records.
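To make the shape of that CLI call concrete, here is a minimal Python sketch that assembles the same command as an argument list. The helper names (build_list_query_command, equals) are our own inventions for illustration, and only the flags already shown above are used. Because we have not verified how the query language escapes embedded quotes, the sketch simply rejects values that contain a double quote.

```python
import shlex


def equals(attribute: str, value: str) -> str:
    """Render a simple equality clause such as: destination = "New Zealand"."""
    if '"' in value:
        # Escaping rules are not covered here, so refuse rather than guess.
        raise ValueError("values containing double quotes are not supported")
    return f'{attribute} = "{value}"'


def build_list_query_command(domain: str, query: str) -> list:
    """Return the argv for listing workflows in `domain` that match `query`."""
    return ["cadence", "--do", domain, "wf", "list", "-q", query]


argv = build_list_query_command("test-domain", equals("destination", "New Zealand"))
# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(argv))
# prints: cadence --do test-domain wf list -q 'destination = "New Zealand"'
```

Passing the list form to subprocess.run avoids shell quoting problems entirely, which is why the query is kept as a single argv element.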
However, there is an important preceding step to leveraging custom search attributes in Cadence, which will be our main focus in this piece: Before adding a given attribute key to a workflow, it must first be created as an available search attribute on a cluster level. Let’s explore how this is done.
In order to add a new attribute to our advanced visibility Cadence cluster, we use the provided Cadence CLI command add-search-attr, which has the following usage:
cadence --domain test-domain adm cl asa --search_attr_key destination --search_attr_type 0
This command has 2 important effects:
- It adds ‘destination’ as a searchable field in the Cadence cluster’s supporting OpenSearch cluster. The --search_attr_type 0 flag indicates that this field is of a ‘Text’ or string datatype.
- It updates the dynamic config of only the Cadence node serving the CLI request, to include the added search attribute in the frontend.validSearchAttributes property. This is a server-side whitelisting map that determines what search attributes can be searched for in OpenSearch via that given Cadence node.
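As a rough illustration of how the key and the integer type code fit together, the following Python sketch builds the same command from named values. Only the code 0 (string) is confirmed by the example above. The remaining codes reflect our understanding of the CLI help text and should be checked against your Cadence version. The SearchAttrType enum and the helper function are hypothetical names of our own.

```python
import shlex
from enum import IntEnum


class SearchAttrType(IntEnum):
    """Integer codes for --search_attr_type.

    Only STRING = 0 appears in the article; the other values are an
    assumption to verify against the CLI help output of your Cadence version.
    """
    STRING = 0
    KEYWORD = 1
    INT = 2
    DOUBLE = 3
    BOOL = 4
    DATETIME = 5


def build_add_search_attr_command(domain: str, key: str, attr_type: SearchAttrType) -> list:
    """Return the argv that registers `key` as a cluster-level search attribute."""
    return [
        "cadence", "--domain", domain, "adm", "cl", "asa",
        "--search_attr_key", key,
        "--search_attr_type", str(int(attr_type)),
    ]


command = build_add_search_attr_command("test-domain", "destination", SearchAttrType.STRING)
print(shlex.join(command))
# prints: cadence --domain test-domain adm cl asa --search_attr_key destination --search_attr_type 0
```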
It’s important to note here that in the general case, Cadence clusters should always be provisioned behind and accessed via a load balancer due to their lack of internal load balancing mechanisms. Therefore, if making calls via the load balancer, the node serving our add-search-attr call may not be the same node that we later query for that same search attribute, which would lead to our query failing validation.
The solution documented by Cadence is to update the dynamic config of all nodes in the cluster to include the new attribute(s) in the frontend.validSearchAttributes property value. However, this can be a problematic and demanding requirement, depending on your specific operational environment. Developers and other users who want to quickly add search attributes to a cluster and start using them may be blocked if they do not have permission to access nodes or make direct config changes.
In such situations, the value of an automated solution to propagate changes to one node’s list of valid search attributes throughout the rest of the Cadence cluster is evident. At Instaclustr, we have designed and implemented one such solution, which is outlined in the remainder of this blog.
Our solution involves introducing a scheduled daemon process to each node of the Cadence cluster—let’s call this the propagator process. On each scheduled run, propagator will conduct several actions.
Firstly, it will parse the dynamic config file—which is in YAML format—for the current value of the frontend.validSearchAttributes property, which is a map from the search attribute key to the integer representation of the search attribute datatype.
Having extracted this, propagator can compare the contents of this map in the current run with its contents as of the last run (the current value can be stored for later comparison at the end of each run).
Then, if new custom search attributes are detected in the map on any given node, propagator will make the CLI call to add each new attribute to the dynamic config of every other node in the cluster, by specifically targeting their private IP addresses:
cadence --ad <OTHER NODE’S IP>:7933 --domain test-domain adm cl asa --search_attr_key destination --search_attr_type 0
Note that we benefit from the fact that all the operations made by the CLI call are idempotent: that is to say, running the same command against the same target multiple times is safe, as the attribute will only be added to OpenSearch and the Cadence node’s dynamic config once.
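The compare-and-propagate step described above can be sketched in a few lines of Python. This is a simplified illustration rather than our production daemon: the function names, the JSON snapshot file, and the peer IP addresses are all assumptions made for the example, and the whitelist is taken as an already-parsed dictionary instead of being read from the YAML dynamic config.

```python
import json
import shlex
from pathlib import Path

FRONTEND_PORT = 7933  # port taken from the example command above


def load_snapshot(path: Path) -> dict:
    """Load the whitelist stored by the previous run (empty on the first run)."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_snapshot(path: Path, whitelist: dict) -> None:
    """Store the current whitelist so the next run can compare against it."""
    path.write_text(json.dumps(whitelist, sort_keys=True))


def detect_new_attributes(previous: dict, current: dict) -> dict:
    """Return the attributes present in `current` but absent from `previous`."""
    return {key: attr_type for key, attr_type in current.items() if key not in previous}


def build_propagation_commands(new_attrs: dict, peer_ips: list, domain: str) -> list:
    """Build one add-search-attr argv per (peer node, new attribute) pair."""
    commands = []
    for ip in peer_ips:
        for key, attr_type in sorted(new_attrs.items()):
            commands.append([
                "cadence", "--ad", f"{ip}:{FRONTEND_PORT}", "--domain", domain,
                "adm", "cl", "asa",
                "--search_attr_key", key, "--search_attr_type", str(attr_type),
            ])
    return commands


# Example run with made-up data: "origin" was seen last run, "destination" is new.
previous = {"origin": 0}
current = {"origin": 0, "destination": 0}
new_attrs = detect_new_attributes(previous, current)
for argv in build_propagation_commands(new_attrs, ["10.0.0.2", "10.0.0.3"], "test-domain"):
    print(shlex.join(argv))
```

A real run would execute each argv (for example with subprocess.run) and finish by calling save_snapshot with the current whitelist. Note that with an empty snapshot every attribute looks new on the first run, which is harmless only because the CLI call is idempotent.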
There is one additional consideration we should draw attention to with this method: the importance of having a way of avoiding redundant node-to-node ‘return calls’.
Consider that we have 2 Cadence nodes, A and B, and that A has a search attribute added to it via the CLI. Our daemon process on node A will note the change to the whitelist on its next iteration, and make its own CLI call to propagate this to B. However, the daemon process on node B will then notice a change in its own whitelist on its next iteration, which would trigger it to make another CLI call to attempt to propagate the attribute back to A.
This last internode communication would evidently be pointless. While it would not cause any direct issues—recall that the command is idempotent and so no change will occur due to a second application of the CLI call to a given node—at scale this could lead to significant wasteful traffic amongst our Cadence nodes. So, how can we mitigate this issue? There are a couple of possibilities.
One option is to be more selective regarding which neighbours each Cadence node targets for whitelist propagation. For example, consider a general case of an N-node cluster, with nodes indexed 0 through (N-1). Rather than following our initial design, in which the daemon process makes CLI calls to all other nodes in the cluster, we could have the n-th node make a call only to the ((n+1) mod N)-th node. This is a very simple strategy, but it should illustrate the point that we can limit ourselves to making only one redundant CLI call, at the cost of it taking (N-1) process iterations for all nodes to possess the new whitelist attribute. There are certainly more intelligent topological approaches available than the trivial one described here. However, we here at Instaclustr adopt a second, separate approach.
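The ring strategy is easy to check with a small simulation. The Python sketch below is purely illustrative and rests on two assumptions of our own: every daemon runs once per iteration, and a node that receives the attribute only acts on it in the following iteration. The function name simulate_ring is hypothetical.

```python
def simulate_ring(node_count: int, origin: int = 0) -> tuple:
    """Simulate ring propagation of one new attribute.

    Returns (iterations until every node has the attribute, redundant calls).
    """
    if node_count < 2:
        raise ValueError("a ring needs at least 2 nodes")

    has_attr = [False] * node_count
    has_attr[origin] = True
    pending = [origin]  # nodes that will notice a whitelist change next iteration
    iteration = 0
    iterations_to_full_coverage = 0
    redundant_calls = 0

    while pending:
        iteration += 1
        next_pending = []
        for n in pending:
            target = (n + 1) % node_count
            if has_attr[target]:
                # Idempotent no-op: the target already holds the attribute.
                redundant_calls += 1
            else:
                has_attr[target] = True
                next_pending.append(target)
        if iterations_to_full_coverage == 0 and all(has_attr):
            iterations_to_full_coverage = iteration
        pending = next_pending

    return iterations_to_full_coverage, redundant_calls


print(simulate_ring(5))  # prints: (4, 1)
```

For a 5-node cluster the attribute reaches every node after 4 iterations, and the only wasted call is the final one from the last node back to the node where the attribute originated.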
Since there may be various additional motivators for other services or applications to have access to an up-to-date record of the custom search attributes added to the cluster, it may make sense to maintain such a record in a central database—which can then also become a convenient source of truth against which to validate all potential internode CLI calls. This can be done by having the node daemon processes write back all new attributes to a central service when they are detected.
Consider a remedied version of the interaction between Node A and B described earlier in the section.
A new search attribute is added via node A using the CLI. At the next iteration of A’s daemon process, it will:
- Note that a change has occurred in its version of the whitelist. Notice that at this point, A doesn’t necessarily know that this change is the result of a direct client interaction; this could have been propagated to it by B.
- Fetch the current centralized list of search attributes. The detected change is not listed in the centralized service; therefore, A can treat this as a new attribute which currently only appears in its own version of the whitelist.
This will prompt A to do two things:
- Write this new property to the central service, meaning it is now stored in the database.
- Make a CLI call to node B, thus adding it to B’s dynamic config.
At B’s next daemon iteration, we would then see the following behaviour:
- B notes a change in its version of the whitelist, but cannot know the origin of the change at this stage.
- B fetches the current centralized list of attributes, and finds the new attribute is contained in that list. This allows B to derive that it has received this attribute from another node, which has also centralized the information; therefore, no action is required of B!
In this way, usage of the centralized record prevents redundant internode communication, thus protecting the cluster from wasteful traffic.
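The A and B interaction above can be expressed as a short, self-contained Python simulation. It is a sketch under stated assumptions, not our actual implementation: CentralRegistry is an in-memory stand-in for the central service, the Node class and its method names are hypothetical, and the CLI call is modelled as a direct method call on the peer.

```python
class CentralRegistry:
    """In-memory stand-in for the central record of search attributes."""

    def __init__(self):
        self._attrs = {}

    def fetch(self) -> dict:
        return dict(self._attrs)

    def record(self, key: str, attr_type: int) -> None:
        self._attrs[key] = attr_type


class Node:
    def __init__(self, name: str):
        self.name = name
        self.whitelist = {}   # this node's valid search attributes
        self.last_seen = {}   # whitelist as of the previous daemon run
        self.peers = []

    def receive_cli_call(self, key: str, attr_type: int) -> None:
        """Model of the add-search-attr call: adding twice has no extra effect."""
        self.whitelist.setdefault(key, attr_type)

    def run_iteration(self, registry: CentralRegistry) -> list:
        """One daemon run; returns the (sender, receiver, key) calls it made."""
        changed = {k: v for k, v in self.whitelist.items() if k not in self.last_seen}
        known = registry.fetch()
        calls = []
        for key, attr_type in changed.items():
            if key in known:
                continue  # already centralized, so another node propagated it
            registry.record(key, attr_type)
            for peer in self.peers:
                peer.receive_cli_call(key, attr_type)
                calls.append((self.name, peer.name, key))
        self.last_seen = dict(self.whitelist)
        return calls


registry = CentralRegistry()
a, b = Node("A"), Node("B")
a.peers, b.peers = [b], [a]

a.receive_cli_call("destination", 0)  # a client adds the attribute via node A
calls_from_a = a.run_iteration(registry)
calls_from_b = b.run_iteration(registry)
print(calls_from_a)  # prints: [('A', 'B', 'destination')]
print(calls_from_b)  # prints: []
```

Node B sees the new key in its whitelist, finds it already recorded centrally, and therefore makes no return call to A.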
Search Attributes are a powerful feature offered by Cadence—but they can be tricky to leverage in a multi-node Cadence setup due to the steps required to add them in the first place. This blog has aimed to illustrate an approach to mitigate these difficulties, based on our experience with running Cadence at scale here at Instaclustr.