February 20, 2025
OpenAI API Rate Limiting: How to Avoid Costly Retries
5 min read
OpenAI rate limits are per-minute and per-day. When you hit a limit, your application retries, which can quickly multiply token consumption and costs.
We've seen teams double their OpenAI bills due to aggressive retry logic that doesn't respect rate limits.
Best practices
- Exponential backoff: Wait 2^n seconds before retrying (2s, 4s, 8s, 16s).
- Respect headers: Check X-RateLimit-Reset and X-RateLimit-Remaining headers.
- Queue requests: Use a queue system to smooth out traffic spikes.
- Monitor retries: Track retry rates and alert when they exceed 5%.
Sonar tracks retry rates per endpoint and alerts you when retries are driving up costs.