Overview
Polyglot is a PHP library that provides a unified API for interacting with various Large Language Model (LLM) providers. It serves as the low-level transport and normalization layer for InstructorPHP, but can also be used as a standalone library for direct LLM interactions.
The core philosophy behind Polyglot is to create a consistent, provider-agnostic interface that abstracts away the differences between LLM APIs while staying close to provider-native request shapes. This enables developers to:
- Write code once and use it with any supported LLM provider
- Easily switch between providers without changing application code
- Use different providers in different environments (development, testing, production)
- Fall back to alternative providers if one becomes unavailable
- Use local models (via Ollama) for development and cloud providers for production
Polyglot has two main entrypoints:
- Cognesy\Polyglot\Inference\Inference for model responses (chat completions)
- Cognesy\Polyglot\Embeddings\Embeddings for vector embeddings
In 2.0, Polyglot stays close to provider-native request shapes and supports:
- Plain text output by default
- Native JSON output through responseFormat
- Native JSON schema output through responseFormat
- Tool calling through tools and toolChoice
- Streaming through withStreaming() and stream()
- Embeddings through the Embeddings facade
Polyglot is a transport and normalization layer. If you need higher-level structured output workflows, fallback prompting, or schema-to-object extraction, use Instructor on top.
Key Features
Unified LLM API
Polyglot's primary feature is its unified API that works across multiple LLM providers:
- Consistent interface for making inference and embedding requests
- Common message format across all providers
- Standardized response handling with InferenceResponse and EmbeddingsResponse
- Unified error handling and retry policies
Framework-Agnostic
Polyglot is designed to work with any PHP framework or even in plain PHP applications. It does not depend on any specific framework, making it easy to integrate into existing projects.
- Compatible with Laravel, Symfony, CodeIgniter, and others
- Can be used in CLI scripts or web applications
- Lightweight with minimal dependencies
Configuration Flexibility
Polyglot offers a flexible configuration system built around YAML preset files:
- Configure multiple providers simultaneously using named presets
- Environment-based configuration with ${ENV_VAR} interpolation in preset files
- Runtime provider switching via Inference::using('preset-name')
- Per-request customization through the fluent builder API
- DSN-based configuration via LLMConfig::fromDsn()
Main Concepts
Inference
The Inference class is the main facade for sending requests to LLM providers and receiving responses. It provides a fluent builder API for constructing requests and several convenience methods for consuming responses.
Use Inference when you want a model response as:
- Plain text via get() -- returns the raw content string
- A full InferenceResponse via response() -- gives access to content, tool calls, usage, finish reason, and reasoning content
- Decoded JSON via asJsonData() -- extracts and parses JSON from the response content
- Tool call arguments via asToolCallJsonData() -- extracts arguments from tool/function calls
- Streamed deltas via stream() -- returns an InferenceStream for real-time processing
The InferenceResponse object provides rich access to the full response, including:
- content() -- the text content of the response
- reasoningContent() -- reasoning/thinking content (for models that support it)
- toolCalls() -- any tool calls made by the model
- usage() -- token usage statistics
- finishReason() -- why the model stopped generating
- hasContent(), hasToolCalls(), hasReasoningContent() -- presence checks
Streaming
Polyglot provides first-class streaming support through the InferenceStream class. When you call stream(), you receive a stream object that yields PartialInferenceDelta objects as they arrive from the provider.
The stream supports several consumption patterns:
- deltas() -- a generator that yields each visible delta as it arrives
- all() -- collects all deltas into an array
- map(callable) -- transforms each delta through a mapper function
- filter(callable) -- yields only deltas matching a predicate
- reduce(callable, initial) -- reduces the stream to a single value
- final() -- drains the stream and returns the finalized InferenceResponse
- onDelta(callable) -- registers a callback for each visible delta
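The consumption patterns above map naturally onto PHP generators. The self-contained sketch below mimics deltas(), map(), filter(), and reduce() over plain string deltas. DeltaStream and fakeProviderDeltas() are hypothetical stand-ins, not InferenceStream or PartialInferenceDelta. Like a real provider stream, each generator can be read only once, so the example builds a fresh stream for each consumption pattern.

```php
<?php
// Hypothetical stand-in for InferenceStream, for illustration only: it wraps a
// single-pass generator of text deltas (plain strings instead of
// PartialInferenceDelta objects) and mimics deltas(), map(), filter(), reduce().
final class DeltaStream
{
    /** @param iterable<string> $source */
    public function __construct(private iterable $source) {}

    public function deltas(): Generator
    {
        foreach ($this->source as $delta) {
            yield $delta;
        }
    }

    public function map(callable $mapper): Generator
    {
        foreach ($this->deltas() as $delta) {
            yield $mapper($delta);
        }
    }

    public function filter(callable $predicate): Generator
    {
        foreach ($this->deltas() as $delta) {
            if ($predicate($delta)) {
                yield $delta;
            }
        }
    }

    public function reduce(callable $reducer, mixed $initial): mixed
    {
        $carry = $initial;
        foreach ($this->deltas() as $delta) {
            $carry = $reducer($carry, $delta);
        }
        return $carry;
    }
}

// Simulated provider output; a real stream arrives over HTTP and is read once.
function fakeProviderDeltas(): Generator
{
    yield 'Hel';
    yield 'lo';
    yield '';
    yield ', world';
}

// Concatenate all deltas into the final text.
$text = (new DeltaStream(fakeProviderDeltas()))
    ->reduce(fn(string $acc, string $d) => $acc . $d, '');

// Transform and filter deltas lazily, collecting results into lists.
$upper = iterator_to_array((new DeltaStream(fakeProviderDeltas()))->map('strtoupper'), false);
$visible = iterator_to_array(
    (new DeltaStream(fakeProviderDeltas()))->filter(fn(string $d) => $d !== ''),
    false
);

echo $text, PHP_EOL; // Hello, world
```

Because map() and filter() return generators, no delta is processed until something iterates over the result. This lazy evaluation is what makes real-time display possible.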
Streaming also dispatches events for monitoring, including StreamFirstChunkReceived for time-to-first-chunk measurement.
Embeddings
The Embeddings class is the facade for generating vector embeddings from text inputs. It follows the same fluent builder pattern as Inference.
Use Embeddings when you want vectors from one or more text inputs. The EmbeddingsResponse gives you:
- first() -- the first embedding vector (useful for single-input requests)
- vectors() -- all embedding vectors as an array of Vector objects
- all() -- alias for vectors()
- last() -- the last embedding vector
- split(index) -- splits vectors into two groups at a given index
- usage() -- provider-reported token usage
- toValuesArray() -- raw float arrays for all vectors
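A common reason to use split(index) is to embed a query together with its candidate documents in one request, then separate the two groups. The sketch below reproduces first(), last(), and split() over plain float arrays. The function names are hypothetical. It also assumes that index marks where the second group starts, which the description above does not state explicitly.

```php
<?php
// Hypothetical helpers over plain float arrays, mirroring first(), last(), and
// split(index) on EmbeddingsResponse (which returns Vector objects instead).

/** @param list<list<float>> $vectors */
function firstVector(array $vectors): ?array
{
    return $vectors[0] ?? null;
}

/** @param list<list<float>> $vectors */
function lastVector(array $vectors): ?array
{
    return $vectors === [] ? null : $vectors[count($vectors) - 1];
}

/**
 * Assumption: $index is the position where the second group starts.
 * @param list<list<float>> $vectors
 */
function splitVectors(array $vectors, int $index): array
{
    // array_slice() re-indexes numeric keys, so both groups are plain lists.
    return [array_slice($vectors, 0, $index), array_slice($vectors, $index)];
}

// One request embeds the query plus two documents (toy 2-D vectors).
$vectors = [
    [0.1, 0.2],   // query
    [0.1, 0.25],  // document A
    [0.9, -0.3],  // document B
];

[$queryPart, $documentPart] = splitVectors($vectors, 1);
```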
Presets
The usual entrypoint is Inference::using('openai') or Embeddings::using('openai'), which loads a named preset configuration.
Preset files are YAML files that define the connection details for a provider. They are loaded from the following locations (searched in order):
- config/llm/presets (or config/embed/presets) in your application root
- packages/polyglot/resources/config/llm/presets within the monorepo
- vendor/cognesy/instructor-php/packages/polyglot/resources/config/llm/presets when installed via Composer
- vendor/cognesy/instructor-polyglot/resources/config/llm/presets for standalone installs
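"Searched in order" means the first directory that contains a matching file wins. The sketch below illustrates that rule. resolvePresetFile() is a hypothetical helper, not Polyglot's loader. The `<name>.yaml` file naming and the project-root-relative paths are also assumptions.

```php
<?php
// Hypothetical resolver illustrating the documented search order; Polyglot's
// own loader may differ (file extension, base-path detection, caching).
function resolvePresetFile(string $preset, array $searchDirs): ?string
{
    foreach ($searchDirs as $dir) {
        $candidate = rtrim($dir, '/') . '/' . $preset . '.yaml'; // assumed naming
        if (is_file($candidate)) {
            return $candidate; // first match wins, so app presets shadow bundled ones
        }
    }
    return null; // preset not found in any location
}

// Locations from the list above, assumed relative to the project root.
$searchDirs = [
    'config/llm/presets',
    'packages/polyglot/resources/config/llm/presets',
    'vendor/cognesy/instructor-php/packages/polyglot/resources/config/llm/presets',
    'vendor/cognesy/instructor-polyglot/resources/config/llm/presets',
];

// Null unless one of these directories (relative to the cwd) holds openai.yaml.
$path = resolvePresetFile('openai', $searchDirs);
```

Under this rule, placing a file with the same name in your own config/llm/presets overrides a bundled preset without editing vendor code.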
A typical preset file looks like this:
```yaml
driver: openai
apiUrl: 'https://api.openai.com/v1'
apiKey: '${OPENAI_API_KEY}'
endpoint: /chat/completions
model: gpt-4.1-nano
maxTokens: 1024
contextLength: 1000000
maxOutputLength: 16384
```
You can override any preset value at runtime using the fluent API -- for example, withModel() to change the model or withMaxTokens() to adjust the token limit.
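Together, the ${OPENAI_API_KEY} placeholder and runtime overrides form a two-step pipeline: expand environment variables in the preset, then let per-request values win. The sketch below shows that idea in plain PHP over an array copy of the preset. interpolateEnv() and the model name another-model are hypothetical, and the sketch does not claim to reproduce how Polyglot applies withModel() or withMaxTokens().

```php
<?php
// Hypothetical sketch, not Polyglot's loader: expand ${VAR} placeholders in
// string preset values from the environment, then apply runtime overrides.
function interpolateEnv(array $preset): array
{
    return array_map(
        fn(mixed $value) => is_string($value)
            ? preg_replace_callback(
                '/\$\{([A-Za-z0-9_]+)\}/',
                fn(array $m) => (string) getenv($m[1]), // unset variable => ''
                $value
            )
            : $value, // numbers etc. pass through unchanged
        $preset
    );
}

// Values taken from the preset file shown above.
$preset = [
    'driver' => 'openai',
    'apiUrl' => 'https://api.openai.com/v1',
    'apiKey' => '${OPENAI_API_KEY}',
    'endpoint' => '/chat/completions',
    'model' => 'gpt-4.1-nano',
    'maxTokens' => 1024,
];

putenv('OPENAI_API_KEY=sk-demo'); // demo value for this sketch only

// Later keys win, similar in spirit to withModel() / withMaxTokens().
$config = array_replace(interpolateEnv($preset), [
    'model' => 'another-model', // hypothetical model name
    'maxTokens' => 256,
]);
```

Keeping secrets as placeholders means preset files can be committed to version control, while the actual keys stay in the environment.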
Providers and Drivers
Each LLM provider is backed by a driver that knows how to format requests and parse responses for that provider's API. Polyglot ships with drivers for all supported providers, and you can register custom drivers when needed.
The LLMProvider and EmbeddingsProvider classes act as configuration holders that pair a config with an optional explicit driver. They are typically created behind the scenes when you use Inference::using() or Inference::fromConfig().
What Polyglot Covers
- Provider selection -- choose any supported provider through presets or programmatic configuration
- Request building -- fluent API for constructing messages, setting models, tools, response formats, and options
- Request execution -- handles HTTP communication with provider APIs
- Response normalization -- unified InferenceResponse and EmbeddingsResponse regardless of provider
- Streaming deltas -- real-time streaming with event-driven processing
- Retry policy -- configurable retry behavior for transient failures via InferenceRetryPolicy and EmbeddingsRetryPolicy
- Custom drivers and runtimes -- extensible architecture for adding new providers or custom execution logic
- Response caching -- configurable cache policy for inference responses
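Retrying transient failures usually means exponential backoff: wait longer after each failed attempt and give up after a limit. The sketch below is a generic version of that technique. retryWithBackoff() is hypothetical and does not represent the InferenceRetryPolicy or EmbeddingsRetryPolicy API. It also treats every RuntimeException as transient, which is an assumption made for brevity.

```php
<?php
// Hypothetical retry helper illustrating exponential backoff with jitter;
// it is not Polyglot's InferenceRetryPolicy API.
function retryWithBackoff(
    callable $operation,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
    ?callable $sleepMs = null,
): mixed {
    $sleepMs ??= fn(int $ms) => usleep($ms * 1000);
    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            return $operation($attempt);
        } catch (RuntimeException $e) { // assumption: treated as transient
            if ($attempt >= $maxAttempts) {
                throw $e; // out of attempts: surface the last error
            }
            $delay = $baseDelayMs * 2 ** ($attempt - 1); // 1x, 2x, 4x, ...
            $sleepMs($delay + random_int(0, intdiv($delay, 2))); // add jitter
        }
    }
}

// Simulate a provider that returns HTTP 503 twice, then succeeds.
$calls = 0;
$result = retryWithBackoff(function (int $attempt) use (&$calls) {
    $calls++;
    if ($attempt < 3) {
        throw new RuntimeException('HTTP 503 (simulated)');
    }
    return 'ok';
}, maxAttempts: 3, baseDelayMs: 10);
```

The random jitter spreads out retries from many clients, so they do not all hit a recovering provider at the same moment.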
What It Does Not Try To Hide
Polyglot does not invent a synthetic output mode system. You shape requests with explicit fields that match current provider APIs -- responseFormat for JSON output, tools and toolChoice for function calling, and withStreaming() for streamed responses. This keeps the abstraction thin and predictable, making it easy to understand what will be sent to the provider.
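To show what "provider-native request shapes" means in practice, the arrays below follow OpenAI's Chat Completions wire format for native JSON schema output and for tool calling. This is an illustration of the provider's format, not of Polyglot's PHP method signatures. The assumption is that, for the OpenAI driver, responseFormat, tools, and toolChoice end up as response_format, tools, and tool_choice. The schema name capital and the tool get_weather are hypothetical.

```php
<?php
// Illustration of OpenAI Chat Completions request bodies (wire format).
// Assumption: for the OpenAI driver, Polyglot's responseFormat, tools and
// toolChoice end up as response_format, tools and tool_choice.

// 1) Native JSON schema output.
$jsonSchemaRequest = [
    'model' => 'gpt-4.1-nano',
    'messages' => [['role' => 'user', 'content' => 'Name the capital of France.']],
    'response_format' => [
        'type' => 'json_schema',
        'json_schema' => [
            'name' => 'capital', // hypothetical schema name
            'strict' => true,
            'schema' => [
                'type' => 'object',
                'properties' => ['city' => ['type' => 'string']],
                'required' => ['city'],
                'additionalProperties' => false,
            ],
        ],
    ],
];

// 2) Tool calling.
$toolRequest = [
    'model' => 'gpt-4.1-nano',
    'messages' => [['role' => 'user', 'content' => 'What is the weather in Paris?']],
    'tools' => [[
        'type' => 'function',
        'function' => [
            'name' => 'get_weather', // hypothetical tool
            'description' => 'Get the current weather for a city',
            'parameters' => [
                'type' => 'object',
                'properties' => ['city' => ['type' => 'string']],
                'required' => ['city'],
            ],
        ],
    ]],
    'tool_choice' => 'auto',
];

echo json_encode($toolRequest, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Because the fields mirror the provider's own request, you can compare what you configure against the provider's API reference without translating through an intermediate "mode".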
Supported Providers
Inference Providers
Polyglot ships with drivers for the following LLM providers:
- A21 -- API access to Jamba models
- Anthropic -- Claude family of models
- AWS Bedrock -- Amazon Bedrock hosted models
- Microsoft Azure -- Azure-hosted OpenAI models
- Cerebras -- Cerebras high-performance inference
- Cohere -- Command models (v2 API)
- Deepseek -- Deepseek models including reasoning capabilities
- Fireworks -- Fireworks AI hosted models
- GLM -- GLM (ChatGLM) models
- Google Gemini -- Google's Gemini models (native API)
- Google Gemini (OpenAI compatible) -- Gemini via OpenAI-compatible endpoint
- Groq -- High-performance inference platform
- Hugging Face -- Hugging Face hosted models
- Inception -- Inception AI models
- Meta -- Meta AI models
- MiniMaxi -- MiniMax models (native and OpenAI compatible)
- Mistral -- Mistral AI models
- Moonshot -- Kimi models
- Ollama -- Self-hosted open source models
- OpenAI -- GPT models family (Chat Completions API)
- OpenAI Responses -- OpenAI Responses API
- OpenAI Compatible -- Generic driver for any OpenAI-compatible API
- OpenRouter -- Multi-provider routing service
- Perplexity -- Perplexity models
- Qwen -- Qwen (Tongyi Qianwen) models
- SambaNova -- SambaNova hosted models
- Together -- Together AI hosted models
- xAI -- xAI's Grok models
Embeddings Providers
For vector embeddings generation, Polyglot supports:
- Microsoft Azure -- Azure-hosted OpenAI embeddings
- Cohere -- Cohere embedding models
- Google Gemini -- Google's embedding models
- Jina -- Jina AI embeddings
- Ollama -- Self-hosted embedding models
- OpenAI -- OpenAI text embedding models
Use Cases
Polyglot is a good choice for a variety of scenarios:
- Applications requiring LLM provider flexibility -- switch between providers based on cost, performance, or feature needs without rewriting application code
- Multi-environment deployments -- use different LLM providers in development, staging, and production through preset configuration
- Redundancy and fallback -- implement fallback strategies when a provider is unavailable
- Hybrid approaches -- combine different providers for different tasks based on their strengths
- Local + cloud development -- use local models via Ollama for development and cloud providers for production
- Direct LLM access -- when you need raw LLM responses without the higher-level extraction that Instructor provides
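A fallback strategy can be as simple as trying providers in order and returning the first successful answer. The sketch below uses simulated callables so that it runs without network access. In a real application, each callable would presumably wrap a request built with Inference::using('<preset>'). firstSuccessful() is a hypothetical helper, not part of Polyglot.

```php
<?php
// Hypothetical fallback helper: tries each provider callable in order and
// returns the first successful result, collecting errors along the way.
function firstSuccessful(array $attempts): mixed
{
    $errors = [];
    foreach ($attempts as $name => $attempt) {
        try {
            return $attempt();
        } catch (Throwable $e) {
            $errors[] = "{$name}: {$e->getMessage()}"; // remember why it failed
        }
    }
    throw new RuntimeException('All providers failed: ' . implode('; ', $errors));
}

// Simulated providers: the cloud call times out, the local model answers.
$answer = firstSuccessful([
    'openai' => fn() => throw new RuntimeException('timeout (simulated)'),
    'ollama' => fn() => 'Paris (from local model, simulated)',
]);
echo $answer, PHP_EOL;
```

Since every preset is consumed through the same interface, the fallback list is just an ordered set of preset names, and the calling code does not change when providers are added or reordered.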