Public API

Most applications only need a small part of the package. The Inference and Embeddings facades provide a fluent, immutable interface that handles provider differences behind the scenes.

Inference

The Inference class is the main entry point for LLM interactions. It encapsulates provider complexities behind a unified, fluent interface.

Class: Cognesy\Polyglot\Inference\Inference

Creating an Instance

Polyglot offers several ways to create an Inference instance depending on how much control you need:

use Cognesy\Polyglot\Inference\Inference;

// From a named preset (resolves config from YAML files)
$inference = Inference::using('openai');

// From an explicit config object
$inference = Inference::fromConfig($llmConfig);

// From a provider object (useful for overrides)
$inference = Inference::fromProvider($provider);

// From an already-built runtime
$inference = Inference::fromRuntime($runtime);

You can also pass a custom driver registry to using() or fromConfig() if you have registered custom drivers:

$inference = Inference::using('my-provider', drivers: $customRegistry);

Building a Request

All request methods return a new immutable instance, so you can safely branch configurations:

$base = Inference::using('openai')
    ->withModel('gpt-4.1-nano')
    ->withMaxTokens(1024);

// Branch into two different requests from the same base
$response1 = $base->withMessages(Messages::fromString('Explain PHP traits.'))->get();
$response2 = $base->withMessages(Messages::fromString('Explain PHP enums.'))->get();
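The branching above is safe because each with...() call returns a modified copy instead of mutating the receiver. A minimal sketch of that pattern follows, assuming PHP 8.1+. RequestSketch and its withPrompt() method are hypothetical stand-ins written for illustration, not Polyglot's Inference class or its internals.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a fluent, immutable request builder.
// This is NOT Polyglot's Inference class; it only illustrates the pattern.
final class RequestSketch
{
    public function __construct(
        public readonly string $model = '',
        public readonly int $maxTokens = 0,
        public readonly string $prompt = '',
    ) {}

    public function withModel(string $model): self
    {
        // Return a modified copy; $this is left untouched.
        return new self($model, $this->maxTokens, $this->prompt);
    }

    public function withMaxTokens(int $maxTokens): self
    {
        return new self($this->model, $maxTokens, $this->prompt);
    }

    public function withPrompt(string $prompt): self
    {
        return new self($this->model, $this->maxTokens, $prompt);
    }
}

// Shared base configuration, then two independent branches.
$base = (new RequestSketch())->withModel('gpt-4.1-nano')->withMaxTokens(1024);
$traits = $base->withPrompt('Explain PHP traits.');
$enums = $base->withPrompt('Explain PHP enums.');
```

Because $base is never modified, neither branch can leak its prompt into the other, and both inherit the model and token limit set on the base.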

Available request methods:

Method                        Purpose
with(...)                     Set multiple parameters at once (messages, model, tools, toolChoice, responseFormat, options)
withMessages(...)             Set the conversation messages
withModel(...)                Override the model
withMaxTokens(...)            Set maximum output tokens
withTools(...)                Provide tool/function definitions
withToolChoice(...)           Control tool selection behavior
withResponseFormat(...)       Request structured output format
withOptions(...)              Pass additional provider options (merged with existing)
withStreaming(...)            Enable or disable streaming
withCachedContext(...)        Set cached context (messages, tools, toolChoice, responseFormat)
withRetryPolicy(...)          Configure retry behavior via InferenceRetryPolicy
withResponseCachePolicy(...)  Control response caching via ResponseCachePolicy
withRequest(...)              Set all parameters from an existing InferenceRequest
withRuntime(...)              Swap the underlying runtime

Executing and Reading Results

Shortcuts execute the request and return results directly:

// Get the text content
$text = $inference->withMessages(Messages::fromString('Hello'))->get();

// Get the full response object
$response = $inference->withMessages(Messages::fromString('Hello'))->response();

// Parse JSON from the response content
$data = $inference->withMessages(Messages::fromString('Return JSON'))->asJsonData();

// Get JSON as a string
$json = $inference->withMessages(Messages::fromString('Return JSON'))->asJson();

// Parse tool call arguments as a JSON array
$args = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJsonData();

// Get tool call arguments as a JSON string
$json = $inference->withMessages(Messages::fromString('Call a tool'))->asToolCallJson();

// Stream the response
$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();

For lower-level control, create() returns a PendingInference without triggering execution:

$pending = $inference->withMessages(Messages::fromString('Hello'))->create();

// Then choose how to consume it
$text = $pending->get();
$response = $pending->response();
$stream = $pending->stream();
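The idea of a pending object that defers execution can be illustrated with a small, self-contained sketch. PendingSketch below is a hypothetical class, not PendingInference itself. In particular, reusing the stored result on repeated reads is an assumption made for this example and is not confirmed by the text above.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a "pending" result: it stores how to run the
// request and executes only when a result is first read.
final class PendingSketch
{
    private ?string $result = null;

    public function __construct(private \Closure $execute) {}

    public function get(): string
    {
        // Run the closure on first access, then reuse the stored value.
        return $this->result ??= ($this->execute)();
    }
}

$calls = 0;
$pending = new PendingSketch(function () use (&$calls): string {
    $calls++; // count how often the simulated provider is called
    return 'Hello!';
});

$callsBeforeRead = $calls; // still 0: creating the object ran nothing
$first = $pending->get();  // triggers execution
$second = $pending->get(); // served from the stored result
```

Separating construction from execution lets calling code decide late whether it wants text, a full response, or a stream, without paying for a provider call until one of them is requested.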

Working with Responses

The InferenceResponse object provides access to all parts of the provider's response:

$response = $inference->withMessages(Messages::fromString('Hello'))->response();

$response->content();          // string -- the text content
$response->reasoningContent(); // string -- reasoning/thinking content (if supported)
$response->toolCalls();        // ToolCalls -- tool call collection
$response->usage();            // InferenceUsage -- token counts
$response->finishReason();     // InferenceFinishReason enum
$response->responseData();     // HttpResponse -- raw provider response
$response->isPartial();        // bool -- true for partial streaming responses

// Convenience checks
$response->hasContent();
$response->hasReasoningContent();
$response->hasToolCalls();
$response->hasFinishReason();

// JSON extraction
$response->findJsonData();         // Json object from content
$response->findToolCallJsonData(); // Json object from tool call args

Working with Streams

The InferenceStream provides several ways to consume streaming data:

$stream = $inference->withMessages(Messages::fromString('Hello'))->stream();

// Iterate over visible deltas
foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;          // incremental text
    echo $delta->reasoningContentDelta; // incremental reasoning (if any)
    echo $delta->toolName;              // tool name (if starting a tool call)
    echo $delta->toolArgs;              // tool arguments fragment
}

// Get the final assembled response
$finalResponse = $stream->final();

// Register a callback for each delta
$stream->onDelta(function (PartialInferenceDelta $delta): void {
    echo $delta->contentDelta;
});

// Functional-style processing
$texts = $stream->map(fn($d) => $d->contentDelta);
$full = $stream->reduce(fn($carry, $d) => $carry . $d->contentDelta, '');
$toolOnly = $stream->filter(fn($d) => $d->toolName !== '');

// Collect all deltas at once
$allDeltas = $stream->all();
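To show what the reduce() and filter() calls above compute, here is a sketch that runs without a provider. DeltaSketch and fakeDeltas() are hypothetical stand-ins that mirror only the contentDelta and toolName properties used above. They are not Polyglot types, and the delta values are made up for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical delta shape, mirroring only the two properties used here.
final class DeltaSketch
{
    public function __construct(
        public readonly string $contentDelta = '',
        public readonly string $toolName = '',
    ) {}
}

/** @return Generator<int, DeltaSketch> */
function fakeDeltas(): Generator
{
    // Simulated provider output arriving in fragments.
    yield new DeltaSketch(contentDelta: 'Hel');
    yield new DeltaSketch(contentDelta: 'lo');
    yield new DeltaSketch(toolName: 'search');
    yield new DeltaSketch(contentDelta: ', world');
}

$full = '';
$toolNames = [];
foreach (fakeDeltas() as $delta) {
    $full .= $delta->contentDelta;  // same accumulation as the reduce() call
    if ($delta->toolName !== '') {  // same predicate as the filter() call
        $toolNames[] = $delta->toolName;
    }
}
```

A generator like fakeDeltas() can be traversed only once, which is why this sketch gathers everything in a single pass. Whether InferenceStream can be replayed after it has been consumed is not stated here, so treating the consumption styles above as alternatives is the cautious reading.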

Embeddings

The Embeddings class provides a unified interface for generating vector embeddings across providers.

Class: Cognesy\Polyglot\Embeddings\Embeddings

Creating an Instance

use Cognesy\Polyglot\Embeddings\Embeddings;

// From a named preset
$embeddings = Embeddings::using('openai');

// From an explicit config
$embeddings = Embeddings::fromConfig($embeddingsConfig);

// From a provider object
$embeddings = Embeddings::fromProvider($provider);

// From a runtime
$embeddings = Embeddings::fromRuntime($runtime);

Building a Request

$embeddings = Embeddings::using('openai')
    ->withInputs('The quick brown fox')
    ->withModel('text-embedding-3-small')
    ->withOptions(['dimensions' => 256]);

Available request methods:

Method                Purpose
with(...)             Set input, options, and model at once
withInputs(...)       Set input text(s) to embed (string or array of strings)
withModel(...)        Override the model
withOptions(...)      Pass additional provider options
withRetryPolicy(...)  Configure retry behavior via EmbeddingsRetryPolicy
withRequest(...)      Set all parameters from an existing EmbeddingsRequest
withRuntime(...)      Swap the underlying runtime

Executing and Reading Results

// Get the full response
$response = $embeddings->withInputs('Hello world')->get();

// Get just the vector objects
$vectors = $embeddings->withInputs(['text one', 'text two'])->vectors();

// Get the first vector
$vector = $embeddings->withInputs('Hello world')->first();

For lower-level control, create() returns a PendingEmbeddings:

$pending = $embeddings->withInputs('Hello world')->create();
$response = $pending->get();

Working with Responses

The EmbeddingsResponse object provides access to the embedding vectors:

$response = $embeddings->withInputs(['Hello', 'World'])->get();

$response->vectors();       // Vector[] -- all embedding vectors
$response->all();           // Vector[] -- alias for vectors()
$response->first();         // ?Vector -- first vector
$response->last();          // ?Vector -- last vector
$response->usage();         // EmbeddingsUsage -- token counts
$response->toValuesArray(); // array -- raw float arrays

// Split vectors at a given index
[$before, $after] = $response->split(1);
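A common next step is comparing embeddings. The sketch below computes cosine similarity over plain float arrays, assuming toValuesArray() yields one list of floats per input. cosineSimilarity() is a helper written for this example and is not part of Polyglot. The sample vectors are made up.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length float vectors.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        // Similarity is undefined for zero vectors; returning 0.0 is a convention.
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Stand-in for $response->toValuesArray(): one float array per input.
$values = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
];

$orthogonal = cosineSimilarity($values[0], $values[1]); // unrelated directions
$parallel = cosineSimilarity($values[0], $values[2]);   // same direction, different length
```

Cosine similarity ignores vector length and compares direction only, so it behaves the same whether or not a provider normalizes its embeddings.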

Registering Custom Drivers

Inference Drivers

Custom inference drivers are registered through the InferenceDriverRegistry, which is then passed in when creating the Inference instance:

use Cognesy\Polyglot\Inference\Creation\BundledInferenceDrivers;

$registry = BundledInferenceDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$inference = Inference::using('my-provider', drivers: $registry);

Embeddings Drivers

Custom embeddings drivers are registered through the EmbeddingsDriverRegistry and passed to the runtime:

use Cognesy\Polyglot\Embeddings\Creation\BundledEmbeddingsDrivers;

$registry = BundledEmbeddingsDrivers::registry()
    ->withDriver('my-provider', MyCustomDriver::class);

$runtime = EmbeddingsRuntime::fromConfig($config, drivers: $registry);
$embeddings = Embeddings::fromRuntime($runtime);

See the Providers page for details on driver registration and factory patterns.