
I wonder if anyone has switched algorithms after hitting real-world scaling issues with one of these? Curious if there are any “gotchas” that only show up at scale. I only have experience with fixed-window rate limiting.


I have experience with token bucket and leaky bucket (or at least a variation where a request leaves the bucket when the server is done processing it) to prevent overload of backend servers. I switched from token bucket to leaky bucket. Token bucket is “the server can serve X requests per second,” while leaky bucket is “the server can process N requests concurrently.” I found the direct limit on concurrency much more responsive to overload, and it better controlled delay from contention on shared resources. This makes sense if you imagine your server dropping from processing 10 QPS to 5 QPS: with a 10 QPS token bucket limit it keeps accepting requests, and the request queue and response time grow without bound.
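A minimal sketch of the concurrency-capped variant described above, using a non-blocking semaphore so a slowed-down server sheds load instead of queueing it (class and method names are mine, not from any particular library):

```python
import threading

class ConcurrencyLimiter:
    """'Leaky bucket' variant where a request occupies a slot until the
    server finishes processing it, capping in-flight requests at `limit`."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Non-blocking: reject (e.g. with HTTP 429/503) rather than queue,
        # so the backlog can't grow unboundedly when the server slows down.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when the request finishes; this is the "leak."
        self._slots.release()
```

Because the limit is on concurrency rather than arrival rate, throughput automatically tracks however fast the server is actually completing work.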


We used a token bucket to allow, say, 100 requests immediately, but the limit would actually replenish at 10 per minute or something. Makes sense to allow bursts. This was to allow free-tier users to test things. Unless they go crazy, they would not even notice the rate limiter.
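A sketch of that setup, assuming a burst capacity of 100 and a refill rate of 10 tokens per minute (the numbers and names here are illustrative, not from the commenter's actual system):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity  # burst size, e.g. 100 requests
        self.rate = rate          # refill rate, e.g. 10/60 tokens per second
        self.tokens = capacity    # start full so a new user can burst
        self.now = now            # injectable clock, handy for testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Replenish tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A user who stays under the refill rate never drains the bucket, so they never see the limiter; only a sustained burst past the capacity gets rejected.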

Sliding window might work well with large intervals. If you have something like a 24h window, fixed window will abruptly cut things off for hours.

I mostly work with 1-minute windows, so it's fixed window all the way.
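For reference, a common sliding-window approximation keeps counts for only the current and previous fixed windows and weights the previous one by how much of it still overlaps the sliding window, avoiding the hard reset at the boundary. A minimal sketch (a generic technique, not anyone's production code; a shorter window is used in the example for brevity):

```python
import time

class SlidingWindowLimiter:
    """Sliding-window counter: estimate = current count plus the previous
    fixed window's count weighted by its remaining overlap."""

    def __init__(self, limit, window_secs, now=time.time):
        self.limit = limit
        self.window = window_secs
        self.now = now            # injectable clock, handy for testing
        self.curr_start = None
        self.curr_count = 0
        self.prev_count = 0

    def allow(self):
        t = self.now()
        start = t - (t % self.window)  # current fixed-window boundary
        if self.curr_start != start:
            # Roll over; keep the old count only if the windows are adjacent.
            adjacent = self.curr_start == start - self.window
            self.prev_count = self.curr_count if adjacent else 0
            self.curr_start = start
            self.curr_count = 0
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (t - start) / self.window
        if self.curr_count + self.prev_count * overlap < self.limit:
            self.curr_count += 1
            return True
        return False
```

With a 24h window this smooths the cutoff: instead of a full quota reappearing at midnight, capacity returns gradually as the previous day's traffic ages out of the overlap.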


We used leaky bucket IIRC, and the issue I saw was that the distributed aspect of it was coded incorrectly, so depending on which node you hit you were rate-limited or not :facepalm:


So it wasn’t really implemented correctly then.



