Inference Class
The Inference class is a thin, immutable facade over InferenceRuntime. It provides the
unified entry point for configuring providers, building requests, and retrieving responses
from any supported LLM.
Creating an Instance¶
Choose the factory method that matches your level of control:
<?php
use Cognesy\Polyglot\Inference\Inference;
// Use a named preset from your configuration
$inference = Inference::using('openai');
// Use the default provider
$inference = new Inference();
// Explicit configuration object
$inference = Inference::fromConfig($config);
// From a provider instance
$inference = Inference::fromProvider($provider);
// From a fully assembled runtime
$inference = Inference::fromRuntime($runtime);
Presets¶
The most common pattern is Inference::using(), which loads a named preset from your
configuration files. Each preset defines the provider type, API key, base URL, default
model, and other connection details:
// Use different providers by switching the preset name
$openai = Inference::using('openai');
$anthropic = Inference::using('anthropic');
$ollama = Inference::using('ollama');
Configuring a Request¶
The fluent API lets you build requests step by step. Every method returns a new immutable instance, so you can safely branch from a shared configuration:
Messages and Model¶
$inference = Inference::using('openai')
->withMessages(Messages::fromString('Explain dependency injection in one paragraph.'))
->withModel('gpt-4.1-nano');
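Because every with*() method returns a new instance, a shared base configuration can be branched without affecting the original. A brief sketch (the messages and model name are illustrative):

```php
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;

// Shared base configuration; never mutated by later calls
$base = Inference::using('openai')->withModel('gpt-4.1-nano');

// Each branch is an independent request built from the same base
$summary = $base->withMessages(Messages::fromString('Summarize this article.'));
$outline = $base->withMessages(Messages::fromString('Outline this article.'));

// $base itself is unchanged and can be reused for further branches
```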
Tools and Response Format¶
use Cognesy\Polyglot\Inference\Data\ToolChoice;
use Cognesy\Polyglot\Inference\Data\ResponseFormat;
$inference = Inference::using('openai')
->withTools($toolDefinitions)
->withToolChoice(ToolChoice::auto())
->withResponseFormat(ResponseFormat::jsonObject());
Streaming and Token Limits¶
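A minimal sketch using the withStreaming() and withMaxTokens() methods from the reference table below (the exact argument types — a bool and an int — are assumptions based on the method names):

```php
$inference = Inference::using('openai')
    ->withMessages(Messages::fromString('List three uses of PHP.'))
    ->withStreaming(true)  // receive partial results as they arrive
    ->withMaxTokens(256);  // cap the length of the generated response
```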
Provider-Specific Options¶
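withOptions() passes provider-specific parameters through to the underlying API. A sketch — the available option keys (such as temperature here) depend on the provider you are targeting:

```php
$inference = Inference::using('openai')
    ->withMessages(Messages::fromString('Hello'))
    ->withOptions([
        'temperature' => 0.7,  // passed through to the provider as-is
    ]);
```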
The Combined with() Method¶
When you prefer a single call, use with() to set multiple fields at once:
use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Data\ToolChoice;
use Cognesy\Polyglot\Inference\Data\ResponseFormat;
$inference = Inference::using('openai')->with(
messages: Messages::fromString('Hello'),
model: 'gpt-4.1-nano',
toolChoice: ToolChoice::auto(),
responseFormat: ResponseFormat::text(),
options: ['temperature' => 0.7],
);
Full Method Reference¶
| Method | Purpose |
|---|---|
| withMessages(...) | Set conversation messages |
| withModel(...) | Override the model |
| withTools(...) | Attach tool/function definitions |
| withToolChoice(...) | Control tool selection strategy |
| withResponseFormat(...) | Specify the response format |
| withOptions(...) | Set provider-specific options |
| withStreaming(...) | Enable or disable streaming |
| withMaxTokens(...) | Set maximum token count |
| withCachedContext(...) | Attach reusable cached context |
| withRetryPolicy(...) | Configure retry behavior |
| withResponseCachePolicy(...) | Configure response caching |
| withRequest(...) | Load all fields from an InferenceRequest |
| withRuntime(...) | Replace the underlying runtime |
Executing Requests¶
Response Shortcuts¶
These methods build the request, execute it, and return the result in a single step:
<?php
use Cognesy\Polyglot\Inference\Inference;
use Cognesy\Messages\Messages;
$inference = Inference::using('openai')
->withMessages(Messages::fromString('What is PHP?'))
->withModel('gpt-4.1-nano');
// Plain text content
$text = $inference->get();
// Full InferenceResponse object (with usage, finish reason, etc.)
$response = $inference->response();
// JSON string extracted from the response
$json = $inference->asJson();
// Parsed JSON as an associative array
$data = $inference->asJsonData();
// JSON from a tool call response
$toolJson = $inference->asToolCallJson();
// Parsed tool call JSON as an array
$toolData = $inference->asToolCallJsonData();
Streaming¶
To receive partial results as they arrive from the provider:
$stream = Inference::using('openai')
->withMessages(Messages::fromString('Write a short story about a robot.'))
->stream();
foreach ($stream->deltas() as $partial) {
echo $partial->contentDelta;
}
The Lazy Handle: PendingInference¶
If you need to defer execution or pass the handle to another part of your system,
call create() to get a PendingInference instance. Execution happens only when
you call a response method on it:
$pending = Inference::using('openai')
->withMessages(Messages::fromString('Hello'))
->create();
// Nothing has been sent to the provider yet.
// Execution happens here:
$text = $pending->get();
PendingInference exposes the same response methods as Inference: get(),
response(), asJson(), asJsonData(), asToolCallJson(), asToolCallJsonData(),
and stream().
Custom Drivers¶
To use a custom driver, implement the CanProvideInferenceDrivers contract and pass
it to Inference::using() or Inference::fromConfig() via the drivers parameter:
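A hedged sketch of the wiring. MyDriverProvider is a hypothetical class name; the methods it must implement are defined by the CanProvideInferenceDrivers contract in the library and are not shown here:

```php
// Hypothetical: a class you wrote that implements CanProvideInferenceDrivers
$drivers = new MyDriverProvider();

// Supply your drivers alongside a preset via the named `drivers` parameter
$inference = Inference::using('openai', drivers: $drivers);
```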