Rate Limits
Provider rate limits can cause requests to fail during high-traffic periods.
Symptoms
- Error messages containing "rate limit exceeded," "too many requests," or "quota exceeded"
- HTTP status code 429
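The symptoms above can be folded into a single check. The following is a minimal sketch, not part of the library: `isRateLimitError()` is a hypothetical helper name, and the phrase list simply mirrors the messages listed above, so a given provider may word its errors differently.

```php
<?php
/**
 * Hypothetical helper: returns true when an error looks like a rate-limit failure,
 * judged by HTTP status code 429 or by a known phrase in the error message.
 */
function isRateLimitError(int $statusCode, string $message): bool {
    if ($statusCode === 429) {
        return true;
    }
    // Phrases taken from the symptom list; matching is case-insensitive.
    $phrases = ['rate limit', 'too many requests', 'quota exceeded'];
    foreach ($phrases as $phrase) {
        if (stripos($message, $phrase) !== false) {
            return true;
        }
    }
    return false;
}
```

A check like this could be called from a `catch` block with the exception's code and message to decide whether a retry is worthwhile.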
Solutions
- Implement Retry Logic: Add automatic retries with exponential backoff
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Http\Exceptions\HttpRequestException;

function withRetry(callable $fn, int $maxRetries = 3): mixed {
    $attempt = 0;
    $lastException = null;
    while ($attempt < $maxRetries) {
        try {
            return $fn();
        } catch (HttpRequestException $e) {
            $lastException = $e;
            $attempt++;
            // Only retry on rate limit errors (case-insensitive message match or HTTP 429)
            if (stripos($e->getMessage(), 'rate limit') === false &&
                $e->getCode() !== 429) {
                throw $e;
            }
            if ($attempt >= $maxRetries) {
                break;
            }
            // Exponential backoff: 2, 4, 8, ... seconds
            $sleepTime = (2 ** $attempt);
            echo "Rate limit hit. Retrying in $sleepTime seconds...\n";
            sleep($sleepTime);
        }
    }
    // $lastException is null only when $maxRetries is less than 1
    throw $lastException ?? new InvalidArgumentException('maxRetries must be at least 1');
}

// Usage
$inference = new Inference();
try {
    $response = withRetry(function() use ($inference) {
        return $inference->with(
            messages: 'What is the capital of France?'
        )->get();
    });
    echo "Response: $response\n";
} catch (HttpRequestException $e) {
    echo "All retry attempts failed: " . $e->getMessage() . "\n";
}
- Request Throttling: Limit the rate of requests from your application
<?php
use Cognesy\Polyglot\Inference\Inference;

class RateLimiter {
    private float $lastRequestTime = 0.0;
    private int $requestsPerMinute;
    private float $minTimeBetweenRequests;

    public function __construct(int $requestsPerMinute = 60) {
        $this->requestsPerMinute = $requestsPerMinute;
        // Minimum spacing between requests, in seconds
        $this->minTimeBetweenRequests = 60 / $requestsPerMinute;
    }

    public function waitIfNeeded(): void {
        $currentTime = microtime(true);
        $timeSinceLastRequest = $currentTime - $this->lastRequestTime;
        if ($timeSinceLastRequest < $this->minTimeBetweenRequests) {
            $waitTime = $this->minTimeBetweenRequests - $timeSinceLastRequest;
            // usleep() expects a whole number of microseconds
            usleep((int) round($waitTime * 1000000));
        }
        $this->lastRequestTime = microtime(true);
    }
}

// Usage
$limiter = new RateLimiter(30); // 30 requests per minute
$inference = new Inference();
for ($i = 0; $i < 10; $i++) {
    $limiter->waitIfNeeded();
    $response = $inference->with(
        messages: "This is request $i"
    )->toText();
    echo "Response $i: $response\n";
}
- Request Batching: Combine multiple requests into batches when possible
<?php
// Assumes $inference (an Inference instance) and $questions (an array of strings) are already defined

// Instead of making many small requests
$responses = [];
foreach ($questions as $question) {
    // This would hit rate limits quickly
    $responses[] = $inference->with(messages: $question)->get();
}

// Better: Use a context-aware batch approach
$batchedQuestions = "Please answer the following questions:\n";
foreach ($questions as $i => $question) {
    $batchedQuestions .= ($i + 1) . ". $question\n";
}
$batchResponse = $inference->with(messages: $batchedQuestions)->get();
// Then parse the batch response into individual answers
- Upgrade API Plan: Consider upgrading to a higher tier with increased rate limits
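The batching example above ends with parsing the combined response into individual answers. One way to do that is sketched below, under the assumption that the model answers with the same `1.`, `2.`, ... numbering used in the prompt. `parseNumberedAnswers()` is a hypothetical helper, not part of the library, and it will misread an answer line that itself starts with a number followed by a period (for example "3.5 million").

```php
<?php
/**
 * Hypothetical helper: splits a numbered batch answer ("1. ...\n2. ...") into
 * an array keyed by question number. Assumes the model kept the numbering.
 */
function parseNumberedAnswers(string $batchResponse): array {
    $answers = [];
    $current = null;
    foreach (preg_split('/\R/', $batchResponse) as $line) {
        if (preg_match('/^\s*(\d+)[.)]\s*(.*)$/', $line, $m)) {
            // A new numbered item starts here.
            $current = (int) $m[1];
            $answers[$current] = trim($m[2]);
        } elseif ($current !== null && trim($line) !== '') {
            // Continuation line: append it to the answer being collected.
            $answers[$current] .= ' ' . trim($line);
        }
    }
    return $answers;
}

// Usage with a made-up response string
$sample = "1. Paris\n2. Berlin is the capital\nof Germany.\n3. Rome";
$answers = parseNumberedAnswers($sample);
// $answers[2] is "Berlin is the capital of Germany."
```

Because models do not always keep the requested format, it is worth comparing the number of parsed answers with the number of questions and falling back to individual requests when they differ.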