Rate Limiting plugin bundled with
Kong works fine upto a certain throughput, after which cassandra and redis policy become hard to scale. This is mainly due to the following problems:
Problem 1: At high throughputs, updating the rate-limiting counters on each request can increase the load on the database causing Hot-Key problem.
Solution: We created a new policy batch-redis which maintains a counter in local memory and updates the rate-limiting counters in DB in a batch. Hot keys are not an issue anymore as the number of requests to Redis go down by a factor of batch size.
Problem 2: When rate-limiting counters data grows, we cannot shard them with a redis-cluster since the plugin did not have support for redis-cluster.
Solution: The lua-resty-redis client does not support clustered-redis so we replaced it with resty-redis-cluster. We faced some issues during stress tests when one of the Redis shard in a cluster went down. To make it more fault tolerant, we made some tweaks to resty-redis-cluster library and added the modified version to our plugin’s code.
Major changes made to the plugin code:
The plugin uses fixed time windows to maintain rate-limiting counters (similar to the bundled Rate Limiting plugin)
The rate limiting counters are updated on each request. Recommended for low throughput API’s.
Instead of updating the global counter in redis after each request, the request counts are maintained at a (local) node level in the shared cache (amongst nginx workers) which is synchronized with the (global) redis counter whenever a batch is complete. Consider this scenario:
batch-size = 500, and throughput = 1 million RPM
Each nginx worker updates the local shared cache after serving a request. Once the local_counter % 500 == 0 , the global redis counter is incremented (by batch size). This global count which is a representative of total number of requests is then used to update the local cache as well. And this process is repeated until the API limit is reached or the time period expires.
Batching reduces the effective writes made on global redis counter by a factor of
batch_size. It therefore makes the rate limiting scalable and faster as network calls are avoided on each request.
If you’re using
luarocks execute the following:
luarocks install scalable-rate-limiter
You will also need to enable this plugin by adding it to the list of enabled plugins using
KONG_PLUGINS environment variable or the
plugins key in
|second||integer||false||Maximum number of requests allowed in 1 second|
|minute||integer||false||Maximum number of requests allowed in 1 minute|
|hour||integer||false||Maximum number of requests allowed in 1 hour|
|day||integer||false||Maximum number of requests allowed in 1 day|
|limit_by||string||service||true||Property to limit the requests on (service / header)|
|header_name||string||true (limit_by: header)||The header name by which to limit the requests if limit_by is selected as header|
|policy||string||redis||true||Update redis at each request (redis) or in batches (batch-redis)|
|batch_size||integer||10||true (when policy: batch-redis)||Redis counters will be updated in batches of this size|
|redis_connect_timeout(ms)||integer||200||true||Redis connect timeout|
|redis_send_timeout(ms)||integer||100||true||Redis send timeout|
|redis_read_timeout(ms)||integer||100||true||Redis read timeout|
|redis_max_connection_attempts||integer||2||true||Total attempts to connect to a Redis node|
|redis_keepalive_timeout(ms)||integer||60000||true||Keepalive timeout for Redis connections|
|redis_max_redirection||integer||2||true||Number of times a keys is tried when MOVED or ASK response is received from redis server|
|redis_pool_size||integer||4||true||Pool size of redis connection pool|
|redis_backlog||integer||1||true||Size of redis connection pool backlog queue|
|error_message||string||API rate limit exceeded||true||Error message sent when rate limit is exhausted|
batch_size * (number of kong instances)since the local count is used to check if request should be allowed or not and it can be outdated (in the worst case) by
batch_size * (number of kong instances)giving the above error margin.