
I wonder if anyone has switched algorithms after hitting real-world scaling issues with one of these? Curious if there are any “gotchas” that only show up at scale. I only have experience with fixed-window rate limiting.


I have experience with token bucket and leaky bucket (or at least a variation where a request leaves the bucket when the server is done processing it) to prevent overload of backend servers. I switched from token bucket to leaky bucket. Token bucket is “the server can serve X requests per second,” while leaky bucket is “the server can process N requests concurrently.” I found the direct limit on concurrency much more responsive to overload, and it better controlled delay from contention on shared resources. This makes sense if you imagine your server dropping from processing 10 QPS to 5 QPS: with a 10 QPS token bucket limit it keeps accepting requests, and the request queue and response time grow without bound.
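A minimal sketch of the concurrency-capped variant described above, using a non-blocking semaphore so a slowed-down server sheds load instead of queueing it (class and method names are mine, not from any particular library):

```python
import threading

class ConcurrencyLimiter:
    """'Leaky bucket' variant where a request occupies a slot until the
    server finishes processing it, capping in-flight requests at `limit`."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Non-blocking: reject (e.g. with HTTP 429/503) rather than queue,
        # so the backlog can't grow unboundedly when the server slows down.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when the request finishes; this is the "leak."
        self._slots.release()
```

Because the limit is on concurrency rather than arrival rate, throughput automatically tracks however fast the server is actually completing work.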


We used a token bucket to allow, say, 100 requests immediately, but the limit would actually replenish at 10 per minute or something. Makes sense to allow bursts. This was to allow free-tier users to test things. Unless they go crazy, they would not even notice the rate limiter.
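A sketch of that setup, assuming a burst capacity of 100 and a refill rate of 10 tokens per minute (the numbers and names here are illustrative, not from the commenter's actual system):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity  # burst size, e.g. 100 requests
        self.rate = rate          # refill rate, e.g. 10/60 tokens per second
        self.tokens = capacity    # start full so a new user can burst
        self.now = now            # injectable clock, handy for testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Replenish tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A user who stays under the refill rate never drains the bucket, so they never see the limiter; only a sustained burst past the capacity gets rejected.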

Sliding window might work well with large intervals. If you have something like a 24h window, fixed window will abruptly cut things off for hours.

I mostly work with 1-minute windows, so it's fixed window all the way.
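For reference, a common sliding-window approximation keeps counts for only the current and previous fixed windows and weights the previous one by how much of it still overlaps the sliding window, avoiding the hard reset at the boundary. A minimal sketch (a generic technique, not anyone's production code; a shorter window is used in the example for brevity):

```python
import time

class SlidingWindowLimiter:
    """Sliding-window counter: estimate = current count plus the previous
    fixed window's count weighted by its remaining overlap."""

    def __init__(self, limit, window_secs, now=time.time):
        self.limit = limit
        self.window = window_secs
        self.now = now            # injectable clock, handy for testing
        self.curr_start = None
        self.curr_count = 0
        self.prev_count = 0

    def allow(self):
        t = self.now()
        start = t - (t % self.window)  # current fixed-window boundary
        if self.curr_start != start:
            # Roll over; keep the old count only if the windows are adjacent.
            adjacent = self.curr_start == start - self.window
            self.prev_count = self.curr_count if adjacent else 0
            self.curr_start = start
            self.curr_count = 0
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (t - start) / self.window
        if self.curr_count + self.prev_count * overlap < self.limit:
            self.curr_count += 1
            return True
        return False
```

With a 24h window this smooths the cutoff: instead of a full quota reappearing at midnight, capacity returns gradually as the previous day's traffic ages out of the overlap.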


We used leaky bucket IIRC, and the issue I saw was that the distributed aspect of it was coded incorrectly, so depending on which node you hit you were rate-limited or not :facepalm:


So it wasn’t really implemented correctly then.



