Request Options
The options array is the escape hatch for provider-specific request fields that fall
outside Polyglot's unified API. Anything you place in options is passed through to
the underlying provider driver.
Note: Except for max_tokens and stream, all option keys are provider-specific and may not be available or behave identically across providers. Always consult the provider's API documentation for details.
Setting Options
Pass options through withOptions() or the options parameter on with():
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

$text = Inference::using('openai')
    ->withMessages(Messages::fromString('Write one short sentence about PHP.'))
    ->withOptions([
        'temperature' => 0.2,
        'top_p' => 0.9,
    ])
    ->get();
Options are merged additively -- calling withOptions() multiple times will merge the
new keys into the existing set rather than replacing it.
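The merge behavior can be pictured with plain PHP arrays. This is a minimal sketch rather than Polyglot's implementation: mergeOptions() is a hypothetical helper, and it assumes that a key set by a later call overrides the same key from an earlier call, which is how array_merge() treats string keys.

```php
<?php
// Hypothetical helper modelling additive option merging; not part of Polyglot.
// Assumption: a key set by a later call overrides the earlier value.
function mergeOptions(array $existing, array $new): array
{
    // For string keys, array_merge() keeps the keys of both arrays and lets
    // the right-hand array win when the same key appears in both.
    return array_merge($existing, $new);
}

// Three successive "withOptions()" calls, modelled as three merges.
$options = mergeOptions([], ['temperature' => 0.2]);
$options = mergeOptions($options, ['top_p' => 0.9]);
$options = mergeOptions($options, ['temperature' => 0.5]);

print_r($options); // holds both 'temperature' and 'top_p'
```

Under that assumption the earlier top_p survives the third call, while temperature takes the most recent value.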
Common Options
These options are widely supported across most providers:
$options = [
    'temperature' => 0.7,          // Controls randomness (range varies by provider, commonly 0.0 to 1.0 or 0.0 to 2.0)
    'max_tokens' => 1000,          // Maximum tokens to generate
    'top_p' => 0.95,               // Nucleus sampling parameter
    'frequency_penalty' => 0.0,    // Penalize tokens in proportion to how often they have already appeared
    'presence_penalty' => 0.0,     // Penalize tokens that have already appeared at least once
    'stop' => ["\n\n", "User:"],   // Stop sequences
];
Dedicated Helpers Over Raw Options
For common behaviors, prefer the dedicated fluent helpers instead of manually placing
values in the options array. The helpers ensure correct handling across all providers:
| Instead of this... | Use this... |
|---|---|
| withOptions(['stream' => true]) | withStreaming(true) |
| withOptions(['max_tokens' => 256]) | withMaxTokens(256) |
| withOptions(['retryPolicy' => [...]]) | withRetryPolicy($policy) |
These helpers set values that the request builder manages separately from the raw options array, ensuring they are applied correctly regardless of the provider.
Provider-Specific Options
Different providers accept different option keys. Here are a few examples:
OpenAI
$response = Inference::using('openai')
    ->withMessages(Messages::fromString('Write a poem about programming.'))
    ->withOptions([
        'temperature' => 0.7,
        'top_p' => 0.9,
        'frequency_penalty' => 0.5,
        'presence_penalty' => 0.3,
    ])
    ->get();
Anthropic
$response = Inference::using('anthropic')
    ->withMessages(Messages::fromString('Write a poem about programming.'))
    ->withOptions([
        'temperature' => 0.7,
        'top_p' => 0.9,
        'top_k' => 40,
    ])
    ->get();
Retry Policy
Retry behavior is configured explicitly through withRetryPolicy() -- never place it
inside the options array. Polyglot will throw an InvalidArgumentException if you do.
<?php
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Config\InferenceRetryPolicy;
use Cognesy\Polyglot\Inference\Inference;

$retryPolicy = new InferenceRetryPolicy(
    maxAttempts: 3,
    baseDelayMs: 500,
    maxDelayMs: 8000,
    jitter: 'full', // none, full, or equal
    retryOnStatus: [429, 500, 502, 503, 504],
);

$response = Inference::using('openai')
    ->withMessages(Messages::fromString('Summarize this article.'))
    ->withRetryPolicy($retryPolicy)
    ->get();
The retry policy supports exponential backoff with configurable jitter and can also recover from truncated responses:
| Parameter | Default | Description |
|---|---|---|
| maxAttempts | 1 | Total attempts (1 = no retry) |
| baseDelayMs | 250 | Base delay in milliseconds |
| maxDelayMs | 8000 | Maximum delay cap in milliseconds |
| jitter | 'full' | Jitter strategy: none, full, or equal |
| retryOnStatus | [408,429,500,...] | HTTP status codes that trigger a retry |
| retryOnExceptions | Timeout, Network | Exception classes that trigger a retry |
| lengthRecovery | 'none' | Recovery mode: none, continue, increase_max_tokens |
| lengthMaxAttempts | 1 | Max attempts for length recovery |
| lengthContinuePrompt | 'Continue.' | Prompt used for continue recovery mode |
| maxTokensIncrement | 512 | Token increment for increase_max_tokens mode |
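To make the interaction of baseDelayMs, maxDelayMs, and jitter concrete, here is a rough sketch of how a capped exponential backoff delay could be computed. retryDelayMs() is a hypothetical helper, not a Polyglot function. The formulas are the commonly used definitions of the three jitter strategies and are assumed here, as is the choice that the first retry waits exactly baseDelayMs before doubling. Polyglot's internal calculation may differ.

```php
<?php
// Hypothetical helper; the formulas are the commonly used definitions of
// "none", "full", and "equal" jitter, assumed here rather than taken from Polyglot.
function retryDelayMs(int $attempt, int $baseDelayMs, int $maxDelayMs, string $jitter): int
{
    // Exponential growth: base, 2x base, 4x base, ... capped at $maxDelayMs.
    $exponent = min(max($attempt - 1, 0), 30); // clamp so the shift stays in integer range
    $capped = min($maxDelayMs, $baseDelayMs * (1 << $exponent));

    return match ($jitter) {
        'none'  => $capped,                                            // deterministic delay
        'full'  => random_int(0, $capped),                             // anywhere in [0, capped]
        'equal' => intdiv($capped, 2) + random_int(0, intdiv($capped, 2)), // upper half only
        default => throw new InvalidArgumentException("Unknown jitter strategy: {$jitter}"),
    };
}

// Without jitter, base 500 ms and cap 8000 ms give 500, 1000, 2000, 4000, 8000, 8000.
foreach ([1, 2, 3, 4, 5, 6] as $attempt) {
    echo $attempt, ': ', retryDelayMs($attempt, 500, 8000, 'none'), " ms\n";
}
```

Full jitter spreads retries most widely, which helps when many clients are throttled at once. Equal jitter keeps a guaranteed minimum wait of half the capped delay.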
Response Cache Policy
Control whether responses are cached in memory for reuse:
<?php
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Enums\ResponseCachePolicy;
use Cognesy\Polyglot\Inference\Inference;

$response = Inference::using('openai')
    ->withMessages(Messages::fromString('What is 2 + 2?'))
    ->withResponseCachePolicy(ResponseCachePolicy::Memory)
    ->get();
Available policies:
- ResponseCachePolicy::None -- no caching (default)
- ResponseCachePolicy::Memory -- cache responses in memory for the current process
Cached Context
Use withCachedContext() to attach stable, reusable context that should be prepended
to every request. This is useful when you have shared system instructions, tool
definitions, or response formats that remain constant across multiple calls:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Data\ToolChoice;
use Cognesy\Polyglot\Inference\Data\ResponseFormat;

// $sharedToolDefinitions is assumed to be defined elsewhere in your application.
$base = Inference::using('openai')->withCachedContext(
    messages: Messages::fromArray([
        ['role' => 'system', 'content' => 'You are an expert PHP developer.'],
    ]),
    tools: $sharedToolDefinitions,
    toolChoice: ToolChoice::auto(),
    responseFormat: ResponseFormat::jsonObject(),
);

// Each call inherits the cached context automatically
$response1 = $base->withMessages(Messages::fromString('Explain SOLID principles.'))->get();
$response2 = $base->withMessages(Messages::fromString('What is the Repository pattern?'))->get();
When the request is executed, cached context is merged with the request-level fields: cached messages are prepended to the request messages, and cached tools, tool choice, and response format are used as defaults when the request does not specify its own.
Drivers can map cached context to provider-native caching features (such as Anthropic's prompt caching) when available.
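The merge rules described above can be modelled with plain arrays. This is only an illustrative sketch: applyCachedContext() and the array keys it uses are hypothetical stand-ins for Polyglot's request and cached-context objects, and the sketch assumes that an empty tools list counts as "not specified".

```php
<?php
// Hypothetical model of the merge; plain arrays stand in for Polyglot's
// request and cached-context objects, and the key names are illustrative.
function applyCachedContext(array $cached, array $request): array
{
    return [
        // Cached messages come first, followed by the request's own messages.
        'messages' => array_merge($cached['messages'] ?? [], $request['messages'] ?? []),
        // Cached values act as defaults: the request wins whenever it sets a value.
        'tools' => !empty($request['tools']) ? $request['tools'] : ($cached['tools'] ?? []),
        'tool_choice' => $request['tool_choice'] ?? $cached['tool_choice'] ?? null,
        'response_format' => $request['response_format'] ?? $cached['response_format'] ?? null,
    ];
}

$cached = [
    'messages' => [['role' => 'system', 'content' => 'You are an expert PHP developer.']],
    'tool_choice' => 'auto',
    'response_format' => ['type' => 'json_object'],
];
$request = [
    'messages' => [['role' => 'user', 'content' => 'Explain SOLID principles.']],
];

$merged = applyCachedContext($cached, $request);
print_r($merged['messages']); // system message first, then the user message
```

A request that sets its own tool_choice or response_format would replace the cached default for that call only, leaving the cached context unchanged for later calls.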