← Back to Blog

February 20, 2025

OpenAI API Rate Limiting: How to Avoid Costly Retries

5 min read

OpenAI rate limits are per-minute and per-day. When you hit a limit, your application retries, which can quickly multiply token consumption and costs.

We've seen teams double their OpenAI bills due to aggressive retry logic that doesn't respect rate limits.

Best practices

  • Exponential backoff: Wait 2^n seconds before retrying (2s, 4s, 8s, 16s).
  • Respect headers: Check X-RateLimit-Reset and X-RateLimit-Remaining headers.
  • Queue requests: Use a queue system to smooth out traffic spikes.
  • Monitor retries: Track retry rates and alert when they exceed 5%.

Sonar tracks retry rates per endpoint and alerts you when retries are driving up costs.