When working with APIs, you’ve likely encountered the term “rate limit.” Whether you’re a developer consuming an API or building one, understanding rate limits is key to avoiding blocked requests and optimizing performance. In this post, we’ll discuss rate limiting in general and then dive into the behind-the-scenes work that NetApp is doing to enhance the Instaclustr Rate Limiting system.

What is rate limiting, and why is it essential?

Rate limiting is a technique used to control the number of requests a client can make to a server API within a given time interval. It’s ubiquitous in modern web services and API platforms for several reasons:

  1. Benefits clients and integrators:
    • Promotes more predictable API response times, leading to a smoother integration experience.
    • Facilitates the development of robust applications by establishing clear usage patterns.
  2. Fair usage: Prevents any single client from monopolizing resources.
  3. Costs: APIs often come with associated costs, including expenses for computing resources, data transfer, and third-party service integrations. Rate limiting helps keep these costs under control: capping the number of requests makes expenditure easier to manage and predict. It also gives customers real-time feedback when they are making unnecessary or excessive requests, which can encourage them to optimize their resource usage and ultimately reduce their overall costs.
  4. Protection and user experience: Defends against DDoS (Distributed Denial of Service) attacks and API abuse, which in turn improves user experience by reducing delays and keeping the services provided by APIs responsive.

NetApp Instaclustr API system

To understand where rate limiting matters most, it’s crucial to know how the NetApp Instaclustr APIs are organized. Within the NetApp Instaclustr API system, there are four main categories of APIs:

  1. Cluster management
  2. Monitoring
  3. User management
  4. Others (Organization management and pricing information)

The recent implementation of a new dynamic rate limiting solution focuses primarily on the Cluster Management and Monitoring APIs, as these two categories constitute over 90% of our API traffic. For further details on NetApp Instaclustr APIs, see our support documentation page.

Challenges with the previous system

Before diving into our new rate limiting solution, it’s essential to reflect on the challenges presented by our previous system. As the Instaclustr customer base grew, managing an ever-increasing number of clusters and nodes across various managed services became a top priority. The old rate limiting approach applied static limits to all API endpoints, regardless of fluctuations in customer fleet size. This one-size-fits-all strategy meant that whether a customer’s fleet expanded or contracted, the limits remained unchanged—potentially impacting performance.

Considering these challenges, we analyzed current API usage patterns to develop an optimized solution designed to scale dynamically. This new approach adjusts in real time to match the growth or reduction of a customer’s fleet, helping ensure more efficient and flexible performance.

New API rate limiting system

Our new API rate limiting solution targets a specific group of NetApp Instaclustr APIs dedicated to cluster management and monitoring. In its design phase, our primary aim was to address the following requirements effectively:

  1. Apply limits that are directly proportional to a customer’s fleet footprint, such as the number of clusters and nodes, rather than imposing fixed limits across the entire fleet.
  2. Implement rate limits at the account level, rather than the API key level. This flexibility enables accounts with many clusters and nodes to dynamically adjust rate limits based on their cluster size.
  3. Optimize the use of monitoring metrics and Prometheus endpoints, tailoring the rate limits to closely align with the metrics refresh rate of every 20 seconds, to ensure efficient utilization.
  4. Address the high latency and CPU usage that the cluster management and monitoring APIs were experiencing as customer fleets (clusters and nodes) grew, especially during our deployments, which last at least 30 minutes.

Rate limiter operation modes

The rate limit configuration also lets us manage the state of a rate limited endpoint through different modes of operation. The rate limiter has two modes:

  1. MONITOR mode: In this mode, the configured rate limit is evaluated but the request is still allowed. This lets us log and monitor any account requesting the API endpoint without rejecting the request.
  2. ENABLED mode: In this mode, the rate limiter rejects requests that exceed the configured limit (see the sketch below).
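As a minimal sketch of how these two modes might gate enforcement (the RateLimiterMode enum and check_request function are hypothetical names for illustration, not our actual implementation):

```python
from enum import Enum

class RateLimiterMode(Enum):
    MONITOR = "monitor"   # evaluate and log, but never reject
    ENABLED = "enabled"   # evaluate and reject when over the limit

def check_request(mode: RateLimiterMode, over_limit: bool) -> bool:
    """Return True if the request should be allowed through."""
    if over_limit:
        # Both modes record the violation for observability.
        print("rate limit exceeded for this account")  # stand-in for real metrics/logging
        if mode is RateLimiterMode.ENABLED:
            return False  # only ENABLED mode actually rejects
    return True
```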

Dynamic limits

To accommodate the varying needs of our customers, we have integrated our rate limiter with our cluster state database to dynamically retrieve the count of clusters. By multiplying that cluster count by a pre-configured per-endpoint rate limit (the multiplier), we can tailor the rate limits to each customer’s specific fleet footprint, so the limits scale dynamically with growth at the account level; a sketch of this calculation follows below. For more details, refer to our rate limit documentation.
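As an illustration, the effective limit for an account might be derived as follows (the function and parameter names here are hypothetical; the actual multipliers are defined per endpoint in our configuration):

```python
def effective_rate_limit(cluster_count: int, per_cluster_multiplier: int) -> int:
    """Scale an endpoint's rate limit with the account's fleet size.

    cluster_count is retrieved from the cluster state database;
    per_cluster_multiplier is the pre-configured limit for the endpoint.
    """
    return cluster_count * per_cluster_multiplier

# An account with 12 clusters on an endpoint that allows 10 requests
# per window per cluster gets an effective limit of 120 requests.
print(effective_rate_limit(cluster_count=12, per_cluster_multiplier=10))
```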

Regular and burst rate limit

Every API endpoint comes with built-in rate limiting, defined by two main parameters.

Regular rate limit
The regular rate limit establishes the number of requests that can be made within a sliding time window—say, any X seconds. With every new incoming request, the API calculates the number of requests made in the previous X seconds and compares that to the predefined limit. If the number of requests exceeds this threshold, the API will reject additional requests to maintain stability.

Burst rate limit
In contrast, the burst rate limit introduces a multiplication factor to the regular window. This allows clients to temporarily exceed the regular request limit when necessary. However, this burst capacity is only utilized if the regular rate limit is already reached. Essentially, while clients can make more requests in a short burst, the overall request volume over a longer period (such as an hour) still adheres to a controlled threshold. This is especially critical for environments like Terraform, where high burst factors are configured to support occasional surges in activity without straining the API over time.
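Putting the two parameters together, the admission decision can be sketched as a simple function. This is illustrative only, and it assumes the burst is expressed as a factor applied to the regular limit, as described above:

```python
def is_allowed(requests_in_window: int, regular_limit: int,
               burst_factor: float = 1.0) -> bool:
    """Decide whether a new request fits within the sliding window.

    requests_in_window: requests already counted in the last X seconds
    regular_limit:      the steady-state cap for the window
    burst_factor:       >= 1.0; scales the regular limit for short surges
    """
    if requests_in_window < regular_limit:
        return True                             # within the regular limit
    burst_limit = int(regular_limit * burst_factor)
    return requests_in_window < burst_limit     # spill into burst capacity

# With a regular limit of 100 and a burst factor of 1.5, up to
# 150 requests can land in a single window before rejection kicks in.
assert is_allowed(120, regular_limit=100, burst_factor=1.5)
assert not is_allowed(150, regular_limit=100, burst_factor=1.5)
```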

Together, these dual rate limiting strategies ensure that the API can gracefully handle both steady and burst traffic, maintaining performance and reliability for all users.

Technical implementation

Now let’s get into some of the technical implementation details of our new API rate limiting solution.

Redis is a highly popular in-memory database renowned for its speed, atomic operations, and distributed capabilities. These features make it an ideal choice for building efficient rate-limiting systems that track and enforce API request limits. In our implementation, we selected Redis to track API requests using a sliding window algorithm, leveraging its performance benefits in our rate-limiting system. The diagram below illustrates the sliding window algorithm as implemented using Redis.

What is a sliding window algorithm?

Unlike a fixed window approach to rate limiting, which resets the count after a fixed interval, a sliding window algorithm tracks requests in a rolling time frame. This provides a more accurate and smoother rate limiting mechanism than the fixed window approach, which can admit up to twice the limit in a short span around a window boundary.
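To make the difference concrete, here is a deliberately simplified in-memory sketch of both approaches (our production implementation lives in Redis, as described below); note how the fixed window resets its counter outright, while the sliding window trims a rolling log of timestamps:

```python
import time
from collections import deque

class FixedWindowLimiter:
    """Resets the counter at fixed intervals; bursty around boundaries."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.window_start, self.count = time.monotonic(), 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start, self.count = now, 0   # hard reset
        if self.count < self.limit:
            self.count += 1
            return True
        return False

class SlidingWindowLimiter:
    """Counts requests in a rolling time frame; smoother behavior."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.timestamps = deque()   # one entry per accepted request

    def allow(self) -> bool:
        now = time.monotonic()
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()   # trim entries older than the window
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```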

Algorithm steps:

  1. Each request is logged with a timestamp to a sorted set in Redis
  2. When a new request arrives, the rate limiter checks how many requests were made in the last N seconds (the window size)
  3. If the count is within the regular limit, the request is accepted
  4. If the count exceeds the regular limit, but a burst is configured for the endpoint and the count is within the regular + burst limit, the request is accepted
  5. If the count exceeds both the regular and burst limits, the request is rejected

Implementing in Redis:

  1. Use a Sorted Set (ZSET): Store each request’s timestamp as both the member and the score
  2. Check current count (ZCOUNT): Count the entries remaining in the window; if they exceed the limit, reject the request
  3. Add New Request (ZADD): If allowed, insert the new timestamp into the Sorted Set
  4. Remove/Trim Old Entries (ZREMRANGEBYSCORE): Remove entries older than the sliding window interval
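A minimal sketch of these steps using the redis-py client might look like the following; the key scheme, the uuid member suffix, and the use of a pipeline are illustrative choices rather than our production code, and in practice the trim runs before the count so stale entries don't inflate it:

```python
import time
import uuid

import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379

def allow_request(account_id: str, limit: int, window_seconds: int) -> bool:
    """Sliding window rate limit check for one account, backed by a ZSET."""
    key = f"ratelimit:{account_id}"   # hypothetical key scheme
    now = time.time()

    pipe = r.pipeline()
    # Step 4: trim entries older than the window (ZREMRANGEBYSCORE)
    # so they don't count against the limit.
    pipe.zremrangebyscore(key, 0, now - window_seconds)
    # Step 2: count the requests still inside the window (ZCOUNT).
    pipe.zcount(key, now - window_seconds, now)
    _, current_count = pipe.execute()

    if current_count >= limit:
        return False                  # over the limit: reject

    # Steps 1 and 3: record the new request (ZADD). A uuid suffix keeps
    # members unique when two requests share the same timestamp.
    r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    r.expire(key, window_seconds)     # let idle accounts' keys age out
    return True
```

Note that the check-then-add sequence above is not atomic; a production implementation would typically wrap this logic in a Lua script or a MULTI/EXEC transaction so that concurrent requests cannot both slip under the limit.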

[Figure: dynamic rate limiting chart]

Handling customer impact and release

At NetApp Instaclustr, we focus on delivering an exceptional support experience—whether you’re using our services, receiving help, or exploring our latest features. This customer-first mindset was also central when we designed our new rate limiting system.

We understand that introducing this new rate limiting approach might affect how customers integrate with our API, whether directly or via Terraform. To ease the transition, we launched the solution in MONITOR mode first. This allowed us to log rate limit metrics and profile customer accounts without immediately enforcing the new limits.

Working closely with our product management team, we developed clear, tailored communications that outlined the potential impacts and provided details on rate limit metrics. We also gave customers lead time to update their integrations to meet the new requirements, ensuring a smooth transition.

Results so far

After informing customers about the effects of our new rate limiting solution on their API integrations, we noticed that many began adjusting their API requests.

We initiated the process by enabling our rate limiting (ENABLED mode) on our most frequented and high-demand API category—the Prometheus monitoring endpoints. Gradually, we extended these limits across other API endpoints. This approach led to an approximate 45% reduction in the volume of monitoring API requests, resulting in notable performance improvements. Furthermore, by scaling our monitoring API instances more effectively, we achieved significant cost savings.

For one of our clients, we worked closely to identify and address a pattern of excessive and unnecessary traffic to a monitoring endpoint. Through joint analysis and implementation of request adjustments, the traffic volume was reduced by almost 50% in a short period. This collaborative effort not only boosted system efficiency but also resulted in cost savings for both the client and us.

Conclusion

Our API rate limiting system has proven key to maintaining fair resource usage, delivering consistent performance, and keeping costs under control. By enhancing our rate limiting system, we have successfully scaled the API system to support dynamic limits based on demand, proportional to the customer’s fleet size. This has resulted in optimal use of the NetApp Instaclustr Managed Platform, helping ensure that our customers receive the best possible service while we maintain efficiency and cost-effectiveness.

Other useful resources