Overview

Polyglot is a PHP library that provides a unified API for interacting with various Large Language Model (LLM) providers. It serves as the low-level transport and normalization layer for InstructorPHP, but can also be used as a standalone library for direct LLM interactions.

The core philosophy behind Polyglot is to create a consistent, provider-agnostic interface that abstracts away the differences between LLM APIs while staying close to provider-native request shapes. This enables developers to:

Write code once and use it with any supported LLM provider
Easily switch between providers without changing application code
Use different providers in different environments (development, testing, production)
Fall back to alternative providers if one becomes unavailable
Use local models (via Ollama) for development and cloud providers for production

Polyglot has two main entrypoints:

Cognesy\Polyglot\Inference\Inference for model responses (chat completions)
Cognesy\Polyglot\Embeddings\Embeddings for vector embeddings

In 2.0, Polyglot stays close to provider-native request shapes, supporting:

Plain text output by default
Native JSON output through responseFormat
Native JSON schema output through responseFormat
Tool calling through tools and toolChoice
Streaming through withStreaming() and stream()
Embeddings through the Embeddings facade

Polyglot is a transport and normalization layer. If you need higher-level structured output workflows, fallback prompting, or schema-to-object extraction, use Instructor on top.

Key Features¶

Unified LLM API¶

Polyglot's primary feature is its unified API that works across multiple LLM providers:

Consistent interface for making inference and embedding requests
Common message format across all providers
Standardized response handling with InferenceResponse and EmbeddingsResponse
Unified error handling and retry policies

Framework-Agnostic¶

Polyglot is designed to work with any PHP framework or even in plain PHP applications. It does not depend on any specific framework, making it easy to integrate into existing projects.

Compatible with Laravel, Symfony, CodeIgniter, and others
Can be used in CLI scripts or web applications
Lightweight with minimal dependencies

Configuration Flexibility¶

Polyglot offers a flexible configuration system built around YAML preset files:

Configure multiple providers simultaneously using named presets
Environment-based configuration with ${ENV_VAR} interpolation in preset files
Runtime provider switching via Inference::using('preset-name')
Per-request customization through the fluent builder API
DSN-based configuration via LLMConfig::fromDsn()

Main Concepts¶

Inference¶

The Inference class is the main facade for sending requests to LLM providers and receiving responses. It provides a fluent builder API for constructing requests and several convenience methods for consuming responses.

Use Inference when you want a model response as:

Plain text via get() -- returns the raw content string
A full InferenceResponse via response() -- gives access to content, tool calls, usage, finish reason, and reasoning content
Decoded JSON via asJsonData() -- extracts and parses JSON from the response content
Tool call arguments via asToolCallJsonData() -- extracts arguments from tool/function calls
Streamed deltas via stream() -- returns an InferenceStream for real-time processing

The InferenceResponse object provides rich access to the full response, including:

content() -- the text content of the response
reasoningContent() -- reasoning/thinking content (for models that support it)
toolCalls() -- any tool calls made by the model
usage() -- token usage statistics
finishReason() -- why the model stopped generating
hasContent(), hasToolCalls(), hasReasoningContent() -- presence checks

Streaming¶

Polyglot provides first-class streaming support through the InferenceStream class. When you call stream(), you receive a stream object that yields PartialInferenceDelta objects as they arrive from the provider.

The stream supports several consumption patterns:

deltas() -- a generator that yields each visible delta as it arrives
all() -- collects all deltas into an array
map(callable) -- transforms each delta through a mapper function
filter(callable) -- yields only deltas matching a predicate
reduce(callable, initial) -- reduces the stream to a single value
final() -- drains the stream and returns the finalized InferenceResponse
onDelta(callable) -- registers a callback for each visible delta

Streaming also dispatches events for monitoring, including StreamFirstChunkReceived for time-to-first-chunk measurement.

Embeddings¶

The Embeddings class is the facade for generating vector embeddings from text inputs. It follows the same fluent builder pattern as Inference.

Use Embeddings when you want vectors from one or more text inputs. The EmbeddingsResponse gives you:

first() -- the first embedding vector (useful for single-input requests)
vectors() -- all embedding vectors as an array of Vector objects
all() -- alias for vectors()
last() -- the last embedding vector
split(index) -- splits vectors into two groups at a given index
usage() -- provider-reported token usage
toValuesArray() -- raw float arrays for all vectors

Presets¶

The usual entrypoint is Inference::using('openai') or Embeddings::using('openai'), which loads a named preset configuration.

Preset files are YAML files that define the connection details for a provider. They are loaded from the following locations (searched in order):

config/llm/presets (or config/embed/presets) in your application root
packages/polyglot/resources/config/llm/presets within the monorepo
vendor/cognesy/instructor-php/packages/polyglot/resources/config/llm/presets when installed via Composer
vendor/cognesy/instructor-polyglot/resources/config/llm/presets for standalone installs

A typical preset file looks like this:

driver: openai
apiUrl: 'https://api.openai.com/v1'
apiKey: '${OPENAI_API_KEY}'
endpoint: /chat/completions
model: gpt-4.1-nano
maxTokens: 1024
contextLength: 1000000
maxOutputLength: 16384

You can override any preset value at runtime using the fluent API -- for example, withModel() to change the model or withMaxTokens() to adjust the token limit.

Providers and Drivers¶

Each LLM provider is backed by a driver that knows how to format requests and parse responses for that provider's API. Polyglot ships with drivers for all supported providers, and you can register custom drivers when needed.

The LLMProvider and EmbeddingsProvider classes act as configuration holders that pair a config with an optional explicit driver. They are typically created behind the scenes when you use Inference::using() or Inference::fromConfig().

What Polyglot Covers¶

Provider selection -- choose any supported provider through presets or programmatic configuration
Request building -- fluent API for constructing messages, setting models, tools, response formats, and options
Request execution -- handles HTTP communication with provider APIs
Response normalization -- unified InferenceResponse and EmbeddingsResponse regardless of provider
Streaming deltas -- real-time streaming with event-driven processing
Retry policy -- configurable retry behavior for transient failures via InferenceRetryPolicy and EmbeddingsRetryPolicy
Custom drivers and runtimes -- extensible architecture for adding new providers or custom execution logic
Response caching -- configurable cache policy for inference responses

What It Does Not Try To Hide¶

Polyglot does not invent a synthetic output mode system. You shape requests with explicit fields that match current provider APIs -- responseFormat for JSON output, tools and toolChoice for function calling, and withStreaming() for streamed responses. This keeps the abstraction thin and predictable, making it easy to understand what will be sent to the provider.

Supported Providers¶

Inference Providers¶

Polyglot ships with drivers for the following LLM providers:

A21 -- API access to Jamba models
Anthropic -- Claude family of models
AWS Bedrock -- Amazon Bedrock hosted models
Microsoft Azure -- Azure-hosted OpenAI models
Cerebras -- Cerebras high-performance inference
Cohere -- Command models (v2 API)
Deepseek -- Deepseek models including reasoning capabilities
Fireworks -- Fireworks AI hosted models
GLM -- GLM (ChatGLM) models
Google Gemini -- Google's Gemini models (native API)
Google Gemini (OpenAI compatible) -- Gemini via OpenAI-compatible endpoint
Groq -- High-performance inference platform
Hugging Face -- Hugging Face hosted models
Inception -- Inception AI models
Meta -- Meta AI models
MiniMaxi -- MiniMax models (native and OpenAI compatible)
Mistral -- Mistral AI models
Moonshot -- Kimi models
Ollama -- Self-hosted open source models
OpenAI -- GPT models family (Chat Completions API)
OpenAI Responses -- OpenAI Responses API
OpenAI Compatible -- Generic driver for any OpenAI-compatible API
OpenRouter -- Multi-provider routing service
Perplexity -- Perplexity models
Qwen -- Qwen (Tongyi Qianwen) models
SambaNova -- SambaNova hosted models
Together -- Together AI hosted models
xAI -- xAI's Grok models

Embeddings Providers¶

For vector embeddings generation, Polyglot supports:

Microsoft Azure -- Azure-hosted OpenAI embeddings
Cohere -- Cohere embedding models
Google Gemini -- Google's embedding models
Jina -- Jina AI embeddings
Ollama -- Self-hosted embedding models
OpenAI -- OpenAI text embedding models

Use Cases¶

Polyglot is a good choice for a variety of scenarios:

Applications requiring LLM provider flexibility -- switch between providers based on cost, performance, or feature needs without rewriting application code
Multi-environment deployments -- use different LLM providers in development, staging, and production through preset configuration
Redundancy and fallback -- implement fallback strategies when a provider is unavailable
Hybrid approaches -- combine different providers for different tasks based on their strengths
Local + cloud development -- use local models via Ollama for development and cloud providers for production
Direct LLM access -- when you need raw LLM responses without the higher-level extraction that Instructor provides