Public API

Most applications only need a small part of the package. The Inference and Embeddings facades provide a fluent, immutable interface that handles provider differences behind the scenes.

Inference

The Inference class is the main entry point for LLM interactions. It hides provider-specific details behind a unified, fluent interface.

Fully qualified class name: Cognesy\Polyglot\Inference\Inference

Creating an Instance

Polyglot offers several ways to create an Inference instance depending on how much control you need:

use Cognesy\Polyglot\Inference\Inference;

// From a named preset (resolves config from YAML files)
$inference = Inference::using('openai');

// From an explicit config object
$inference = Inference::fromConfig($llmConfig);

// From a provider object (useful for overrides)
$inference = Inference::fromProvider($provider);

// From an already-built runtime
$inference = Inference::fromRuntime($runtime);

You can also pass a custom driver registry to using() or fromConfig() if you have registered custom drivers:

$inference = Inference::using('my-provider', drivers: $customRegistry);

Building a Request

All request methods return a new immutable instance, so you can safely branch configurations:

$base = Inference::using('openai')
    ->withModel('gpt-4.1-nano')
    ->withMaxTokens(1024);

// Branch into two different requests from the same base
$response1 = $base->withMessages(Messages::fromString('Explain PHP traits.'))->get();
$response2 = $base->withMessages(Messages::fromString('Explain PHP enums.'))->get();
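
Why branching is safe can be illustrated with a minimal, self-contained sketch. RequestSketch below is a hypothetical stand-in, not the Polyglot Inference class, and withPrompt() is an invented method used only for illustration. The sketch assumes nothing about the library beyond the fact that every with*() call returns a new instance.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a fluent, immutable request builder.
// It is NOT the Polyglot Inference class; it only mirrors the with*() style.
final class RequestSketch
{
    public function __construct(
        public readonly string $model = '',
        public readonly int $maxTokens = 0,
        public readonly string $prompt = '',
    ) {}

    public function withModel(string $model): self
    {
        // Return a copy with one field changed; $this stays untouched.
        return new self($model, $this->maxTokens, $this->prompt);
    }

    public function withMaxTokens(int $maxTokens): self
    {
        return new self($this->model, $maxTokens, $this->prompt);
    }

    public function withPrompt(string $prompt): self
    {
        return new self($this->model, $this->maxTokens, $prompt);
    }
}

$base = (new RequestSketch())
    ->withModel('gpt-4.1-nano')
    ->withMaxTokens(1024);

// Two branches derived from the same base; neither affects the other.
$traits = $base->withPrompt('Explain PHP traits.');
$enums  = $base->withPrompt('Explain PHP enums.');
```

Because $base is never modified, it can be kept around and reused for any number of follow-up requests.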

Available request methods:

Method	Purpose
`with(...)`	Set multiple parameters at once (messages, model, tools, toolChoice, responseFormat, options)
`withMessages(...)`	Set the conversation messages
`withModel(...)`	Override the model
`withMaxTokens(...)`	Set maximum output tokens
`withTools(...)`	Provide tool/function definitions
`withToolChoice(...)`	Control tool selection behavior
`withResponseFormat(...)`	Request structured output format
`withOptions(...)`	Pass additional provider options (merged with existing)
`withStreaming(...)`	Enable or disable streaming
`withCachedContext(...)`	Set cached context (messages, tools, toolChoice, responseFormat)
`withRetryPolicy(...)`	Configure retry behavior via `InferenceRetryPolicy`
`withResponseCachePolicy(...)`	Control response caching via `ResponseCachePolicy`
`withRequest(...)`	Set all parameters from an existing `InferenceRequest`
`withRuntime(...)`	Swap the underlying runtime

Executing and Reading Results

Shortcuts execute the request and return results directly:

// Get the text content
$text = $inference->withMessages(Messages::fromString('Hello'))->get();

// Get the full response object
$response = $inference->withMessages(Messages::fromString('Hello'))->response();

// Parse JSON from the response content
$data = $inference->withMessages(Messages::fromString('Return JSON'))->asJsonData();

// Get JSON as a string
$json = $inference->withMessages(Messages::fromString('Return JSON'))->asJson();

// Parse tool call arguments as JSON array
$args = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJsonData();

// Get tool call arguments as JSON string
$json = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJson();

// Stream the response
$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();

For lower-level control, create() returns a PendingInference without triggering execution:

$pending = $inference->withMessages(Messages::fromString('Hello'))->create();

// Then choose how to consume it
$text = $pending->get();
$response = $pending->response();
$stream = $pending->stream();
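
The idea of creating a pending object without triggering execution can be sketched in isolation. PendingSketch below is a hypothetical stand-in and not the library's PendingInference; it only shows the general pattern of storing the work and running it when a result is requested. Whether the real class caches results between calls is not covered by this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a "pending" result: it stores the work to do
// and runs it only when a result is requested.
final class PendingSketch
{
    /** @param \Closure(): string $execute */
    public function __construct(private \Closure $execute) {}

    public function get(): string
    {
        return ($this->execute)();
    }
}

$calls = 0;

// Creating the pending object does not run the closure.
$pending = new PendingSketch(function () use (&$calls): string {
    $calls++;          // counts how often the "request" actually runs
    return 'Hello!';
});
$callsBeforeGet = $calls;

// The work happens only when the result is consumed.
$text = $pending->get();
$callsAfterGet = $calls;
```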

Working with Responses

The InferenceResponse object provides access to all parts of the provider's response:

$response = $inference->withMessages(Messages::fromString('Hello'))->response();

$response->content();          // string -- the text content
$response->reasoningContent(); // string -- reasoning/thinking content (if supported)
$response->toolCalls();        // ToolCalls -- tool call collection
$response->usage();            // InferenceUsage -- token counts
$response->finishReason();     // InferenceFinishReason enum
$response->responseData();     // HttpResponse -- raw provider response
$response->isPartial();        // bool -- true for partial streaming responses

// Convenience checks
$response->hasContent();
$response->hasReasoningContent();
$response->hasToolCalls();
$response->hasFinishReason();

// JSON extraction
$response->findJsonData();         // Json object from content
$response->findToolCallJsonData(); // Json object from tool call args

Working with Streams

The InferenceStream provides several ways to consume streaming data:

$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();

// Iterate over visible deltas
foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;          // incremental text
    echo $delta->reasoningContentDelta; // incremental reasoning (if any)
    echo $delta->toolName;              // tool name (if starting a tool call)
    echo $delta->toolArgs;              // tool arguments fragment
}

// Get the final assembled response
$finalResponse = $stream->final();

// Register a callback for each delta
$stream->onDelta(function (PartialInferenceDelta $delta): void {
    echo $delta->contentDelta;
});

// Functional-style processing
$texts = $stream->map(fn($d) => $d->contentDelta);
$full = $stream->reduce(fn($carry, $d) => $carry . $d->contentDelta, '');
$toolOnly = $stream->filter(fn($d) => $d->toolName !== '');

// Collect all deltas at once
$allDeltas = $stream->all();
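
To make the shape of delta processing concrete, here is a minimal sketch that assembles text and tool call arguments from a sequence of deltas. DeltaSketch and fakeDeltas() are hypothetical stand-ins so the example runs without a provider; only the property names (contentDelta, toolName, toolArgs) come from the snippet above, and their string types are an assumption. The sketch handles a single tool call only.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a stream delta (not the library's class).
final class DeltaSketch
{
    public function __construct(
        public readonly string $contentDelta = '',
        public readonly string $toolName = '',
        public readonly string $toolArgs = '',
    ) {}
}

/** @return \Generator<int, DeltaSketch> */
function fakeDeltas(): \Generator
{
    yield new DeltaSketch(contentDelta: 'Checking ');
    yield new DeltaSketch(contentDelta: 'the weather.');
    yield new DeltaSketch(toolName: 'get_weather', toolArgs: '{"city":');
    yield new DeltaSketch(toolArgs: '"Paris"}');
}

$text = '';
$toolName = '';
$toolArgs = '';

foreach (fakeDeltas() as $delta) {
    $text .= $delta->contentDelta;      // concatenate text fragments in order
    if ($delta->toolName !== '') {
        $toolName = $delta->toolName;   // the name arrives when the call starts
    }
    $toolArgs .= $delta->toolArgs;      // argument JSON arrives in pieces
}

// Decode only after all fragments have been joined into one JSON document.
$args = json_decode($toolArgs, true, 512, JSON_THROW_ON_ERROR);
```

Tool arguments are decoded after the loop because an individual fragment is usually not valid JSON on its own.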

Embeddings

The Embeddings class provides a unified interface for generating vector embeddings across providers.

Fully qualified class name: Cognesy\Polyglot\Embeddings\Embeddings

Creating an Instance

use Cognesy\Polyglot\Embeddings\Embeddings;

// From a named preset
$embeddings = Embeddings::using('openai');

// From an explicit config
$embeddings = Embeddings::fromConfig($embeddingsConfig);

// From a provider object
$embeddings = Embeddings::fromProvider($provider);

// From a runtime
$embeddings = Embeddings::fromRuntime($runtime);

Building a Request

$embeddings = Embeddings::using('openai')
    ->withInputs('The quick brown fox')
    ->withModel('text-embedding-3-small')
    ->withOptions(['dimensions' => 256]);

Available request methods:

Method	Purpose
`with(...)`	Set input, options, and model at once
`withInputs(...)`	Set input text(s) to embed (string or array of strings)
`withModel(...)`	Override the model
`withOptions(...)`	Pass additional provider options
`withRetryPolicy(...)`	Configure retry behavior via `EmbeddingsRetryPolicy`
`withRequest(...)`	Set all parameters from an existing `EmbeddingsRequest`
`withRuntime(...)`	Swap the underlying runtime

Executing and Reading Results

// Get the full response
$response = $embeddings->withInputs('Hello world')->get();

// Get just the vector objects
$vectors = $embeddings->withInputs(['text one', 'text two'])->vectors();

// Get the first vector
$vector = $embeddings->withInputs('Hello world')->first();

For lower-level control, create() returns a PendingEmbeddings:

$pending = $embeddings->withInputs('Hello world')->create();
$response = $pending->get();

Working with Responses

The EmbeddingsResponse object provides access to the embedding vectors:

$response = $embeddings->withInputs(['Hello', 'World'])->get();

$response->vectors();       // Vector[] -- all embedding vectors
$response->all();           // Vector[] -- alias for vectors()
$response->first();         // ?Vector -- first vector
$response->last();          // ?Vector -- last vector
$response->usage();         // EmbeddingsUsage -- token counts
$response->toValuesArray(); // array -- raw float arrays

// Split vectors at a given index
[$before, $after] = $response->split(1);
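
A common next step with raw float arrays is comparing two embeddings. The sketch below computes cosine similarity in plain PHP. It assumes that toValuesArray() yields one array of floats per input, as the comment above suggests; the cosineSimilarity() helper and the sample numbers are illustrative and not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length float arrays.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b) || $a === []) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // similarity is undefined for a zero vector; treated as 0 here
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Stand-in for $response->toValuesArray(): made-up numbers, one array per input.
$values = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
];

$orthogonal = cosineSimilarity($values[0], $values[1]); // unrelated directions
$parallel   = cosineSimilarity($values[0], $values[2]); // same direction, different length
```

Cosine similarity ignores vector length, which is why the first and third sample vectors count as identical in direction.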

Registering Custom Drivers

Inference Drivers

Custom inference drivers are registered on an InferenceDriverRegistry, which is then passed in when the Inference instance (and its runtime) is created:

use Cognesy\Polyglot\Inference\Creation\BundledInferenceDrivers;

$registry = BundledInferenceDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$inference = Inference::using('my-provider', drivers: $registry);

Embeddings Drivers

Custom embeddings drivers are registered through the EmbeddingsDriverRegistry and passed to the runtime:

use Cognesy\Polyglot\Embeddings\Creation\BundledEmbeddingsDrivers;

$registry = BundledEmbeddingsDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$runtime = EmbeddingsRuntime::fromConfig($config, drivers: $registry);
$embeddings = Embeddings::fromRuntime($runtime);

See the Providers page for details on driver registration and factory patterns.