Features
A comprehensive overview of Instructor's capabilities.
Core Features¶
Structured Output Extraction¶
Define a PHP class, get a populated object back:
<?php
use Cognesy\Instructor\StructuredOutput;

class Person {
    public string $name;
    public int $age;
    public string $occupation;
}

$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages("Extract: Sarah, 32, software architect")
    ->get();
Key capabilities:
- Works with any PHP class with typed properties
- Supports nested objects and arrays
- Handles nullable fields gracefully
- Preserves type information throughout
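Nested objects and nullable fields, for instance, can be declared with native type hints alone (a minimal sketch; the `Address` class and its fields are illustrative, not part of the library):

```php
<?php
class Address {
    public string $city;
    public ?string $zipCode; // nullable: left null if absent from the text
}

class Person {
    public string $name;
    public Address $address; // nested object, extracted recursively
}
```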
Automatic Validation¶
Built-in support for Symfony Validator:
<?php
use Symfony\Component\Validator\Constraints as Assert;

class User {
    #[Assert\NotBlank]
    #[Assert\Email]
    public string $email;

    #[Assert\Range(min: 18, max: 120)]
    public int $age;

    #[Assert\Choice(['active', 'inactive', 'pending'])]
    public string $status;
}
Validation features:
- All Symfony validation constraints supported
- Custom validators work out of the box
- Validation errors trigger automatic retry
- Error messages sent to LLM for self-correction
Self-Correcting Retries¶
When validation fails, Instructor automatically retries:
<?php
use Cognesy\Instructor\StructuredOutputRuntime;
use Cognesy\Polyglot\Inference\LLMProvider;

$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->withMaxRetries(3);

$result = (new StructuredOutput($runtime))
    ->withResponseClass(User::class)
    ->withMessages($text)
    ->get();
Retry behavior:
- LLM generates response
- Response validated against constraints
- On failure: errors sent back to LLM with context
- LLM attempts correction
- Repeat until valid or max retries reached
Input Flexibility¶
Text Input¶
Simple string input:
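A plain string is passed directly as the message (same fluent API as the `Person` example above):

```php
<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages("Jason is a 28-year-old data scientist")
    ->get();
```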
Chat Messages¶
OpenAI-style message arrays:
<?php
->withMessages([
    ['role' => 'system', 'content' => 'You are a data extraction expert.'],
    ['role' => 'user', 'content' => 'Extract the person: John, 25, engineer'],
])
Image Input¶
Process images with vision-capable models:
<?php
use Cognesy\Addons\Image\Image;
->with(messages: Image::fromFile('path/to/image.jpg')->toMessage())
->withPrompt("Extract all text from this document")
Supported formats: JPEG, PNG, GIF, WebP
Structured Input¶
Pass objects or arrays as input:
<?php
$inputData = [
    'document' => $documentText,
    'metadata' => ['source' => 'email', 'date' => '2024-01-15'],
];

$result = (new StructuredOutput)
    ->withResponseClass(Analysis::class)
    ->withInput($inputData)
    ->get();
Output Modes¶
Tools Mode (Default)¶
Uses LLM function/tool calling:
<?php
use Cognesy\Instructor\Enums\OutputMode;
use Cognesy\Instructor\StructuredOutputRuntime;
use Cognesy\Polyglot\Inference\LLMProvider;

$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->withOutputMode(OutputMode::Tools);
Best for: OpenAI, Anthropic, most modern models
JSON Schema Mode¶
Strict schema enforcement:
<?php
$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->withOutputMode(OutputMode::JsonSchema);
Best for: GPT-4, models with strict JSON Schema support
JSON Mode¶
Basic JSON response format:
<?php
$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->withOutputMode(OutputMode::Json);
Best for: Models supporting JSON mode without strict schemas
Markdown JSON Mode¶
Prompting-based extraction:
<?php
$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->withOutputMode(OutputMode::MdJson);
Best for: Models without JSON mode, fallback option
Response Types¶
Single Object¶
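Passing a plain class name (the default shown throughout this page) returns a single populated instance:

```php
<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();

echo $person->name; // one Person instance, not an array
```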
Arrays of Objects¶
Use Sequence::of() to extract lists:
<?php
use Cognesy\Instructor\Extras\Sequence\Sequence;

$people = (new StructuredOutput)
    ->withResponseClass(Sequence::of(Person::class))
    ->withMessages($text)
    ->get();

// Iterate over results
foreach ($people as $person) {
    echo $person->name;
}

// Or use array-like access
$first = $people->first();
$count = $people->count();
$all = $people->toArray();
Scalar Values¶
Extract simple types with adapters:
<?php
use Cognesy\Instructor\Extras\Scalars\Scalar;

// Boolean
$isSpam = (new StructuredOutput)
    ->withResponseClass(Scalar::boolean('isSpam'))
    ->withMessages($text)
    ->get();

// Integer
$count = (new StructuredOutput)
    ->withResponseClass(Scalar::integer('count'))
    ->withMessages($text)
    ->get();

// String
$summary = (new StructuredOutput)
    ->withResponseClass(Scalar::string('summary'))
    ->withMessages($text)
    ->get();
Enums¶
<?php
use Cognesy\Instructor\Extras\Scalars\Scalar;

enum Sentiment: string {
    case Positive = 'positive';
    case Negative = 'negative';
    case Neutral = 'neutral';
}

$sentiment = (new StructuredOutput)
    ->withResponseClass(Scalar::enum(Sentiment::class, 'sentiment'))
    ->withMessages($text)
    ->get();
Streaming¶
Partial Updates¶
Get incremental results as they arrive:
<?php
$stream = (new StructuredOutput)
    ->withResponseClass(Article::class)
    ->with(
        messages: $text,
        options: ['stream' => true],
    )
    ->stream();

foreach ($stream->partials() as $partial) {
    // $partial has incrementally populated fields
    updateUI($partial);
}

$final = $stream->finalValue();
Or subscribe to streaming events:
<?php
use Cognesy\Instructor\Events\PartialsGenerator\PartialResponseGenerated;

$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new())
    ->onEvent(
        PartialResponseGenerated::class,
        fn(PartialResponseGenerated $event) => updateUI($event->partialResponse),
    );

$stream = (new StructuredOutput)
    ->withRuntime($runtime)
    ->with(
        responseModel: Article::class,
        messages: $text,
        options: ['stream' => true],
    )
    ->stream();

$article = $stream->finalValue();
Sequence Streaming¶
Stream sequence items as they complete:
<?php
use Cognesy\Instructor\Extras\Sequence\Sequence;

$list = (new StructuredOutput)
    ->withResponseClass(Sequence::of(Person::class))
    ->with(
        messages: $text,
        options: ['stream' => true],
    )
    ->stream();

foreach ($list->sequence() as $seq) {
    // $seq grows by one completed item per iteration
    processComplete($seq->last());
}

$final = $list->finalValue();
LLM Providers¶
Supported Providers¶
| Provider | API Type | Streaming | Vision | Tool Calling |
|---|---|---|---|---|
| OpenAI | Native | ✓ | ✓ | ✓ |
| Anthropic | Native | ✓ | ✓ | ✓ |
| Google Gemini | Native | ✓ | ✓ | ✓ |
| Azure OpenAI | OpenAI-compatible | ✓ | ✓ | ✓ |
| Mistral | Native | ✓ | - | ✓ |
| Cohere | OpenAI-compatible | ✓ | - | ✓ |
| Groq | OpenAI-compatible | ✓ | - | ✓ |
| Fireworks AI | OpenAI-compatible | ✓ | ✓ | ✓ |
| Together AI | OpenAI-compatible | ✓ | ✓ | ✓ |
| Ollama | OpenAI-compatible | ✓ | ✓ | ✓ |
| OpenRouter | OpenAI-compatible | ✓ | ✓ | ✓ |
| Perplexity | OpenAI-compatible | ✓ | - | - |
| DeepSeek | OpenAI-compatible | ✓ | - | ✓ |
| xAI (Grok) | OpenAI-compatible | ✓ | - | ✓ |
| Cerebras | OpenAI-compatible | ✓ | - | ✓ |
| SambaNova | OpenAI-compatible | ✓ | - | ✓ |
Provider Selection¶
<?php
use Cognesy\Instructor\StructuredOutput;
use Cognesy\Instructor\StructuredOutputRuntime;

// Use a preset from configuration
$structuredOutput = StructuredOutput::using('anthropic');

// Or configure the runtime explicitly (advanced)
$structuredOutput = (new StructuredOutput)->withRuntime(
    StructuredOutputRuntime::fromConfig(
        \Cognesy\Polyglot\Inference\Config\LLMConfig::fromDsn(
            'preset=anthropic,model=claude-3-5-sonnet-latest'
        )
    )
);
Schema Definition¶
Type-Hinted Classes¶
<?php
class Order {
    public string $orderId;
    public Customer $customer;
    /** @var LineItem[] */
    public array $items;
    public float $total;
    public ?string $notes;
}
PHP DocBlocks for Instructions¶
<?php
class Product {
    /** The product SKU, e.g., "SKU-12345" */
    public string $sku;

    /** Price in USD, without currency symbol */
    public float $price;

    /** @var string[] List of applicable categories */
    public array $categories;
}
Attributes for Detailed Control¶
<?php
use Cognesy\Instructor\Schema\Attributes\Description;
use Cognesy\Instructor\Schema\Attributes\Instructions;

class Analysis {
    #[Description("Sentiment score from -1.0 (negative) to 1.0 (positive)")]
    public float $sentiment;

    /** @var string[] */
    #[Instructions("Extract the 3 most important points only")]
    public array $keyPoints;
}
Dynamic Schemas with Structure¶
<?php
use Cognesy\Dynamic\StructureBuilder;
use Cognesy\Instructor\StructuredOutput;

$schema = StructureBuilder::define('user')
    ->string('name')
    ->int('age', required: false)
    ->collection('tags', 'string', required: false)
    ->build();

$result = (new StructuredOutput)
    ->with(
        messages: 'Extract user profile from this text...',
        responseModel: $schema,
    )
    ->get();
Advanced Features¶
Context Caching¶
Reduce costs with cached context (Anthropic):
<?php
->withCachedContext([
    'Large document or context here...',
    'This won\'t be re-sent on retries',
])
Custom Prompts¶
Override default extraction prompts:
<?php
use Cognesy\Instructor\Config\StructuredOutputConfig;

->withPrompt("Extract the following fields precisely: ...")
->withConfig(new StructuredOutputConfig(
    retryPrompt: "The previous attempt had errors: {errors}. Please correct."
))
Event System¶
Monitor internal processing:
<?php
use Cognesy\Instructor\Events\StructuredOutput\StructuredOutputRequestReceived;
use Cognesy\Instructor\Events\StructuredOutput\StructuredOutputResponseGenerated;

$runtime = StructuredOutputRuntime::fromProvider(LLMProvider::new());

$runtime->onEvent(StructuredOutputRequestReceived::class, function($event) {
    logger()->info('Request received', $event->toArray());
});

$runtime->onEvent(StructuredOutputResponseGenerated::class, function($event) {
    logger()->info('Response generated', $event->toArray());
});
Debug Mode¶
See all LLM interactions:
Outputs:
- Full request payloads
- Raw LLM responses
- Validation errors
- Retry attempts
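A minimal sketch of enabling debug output, assuming a `withDebugPreset()` helper is available in your version (the method name may differ between releases):

```php
<?php
$result = (new StructuredOutput)
    ->withDebugPreset('on') // assumption: check your version's debug API
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();
```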
Framework Integration¶
Laravel¶
<?php
// The service provider is auto-registered

// Use the facade
use Cognesy\Instructor\Facades\Instructor;

$result = Instructor::respond()
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();

// Or inject via the container
public function handle(StructuredOutput $instructor)
{
    return $instructor
        ->withResponseClass(Person::class)
        ->withMessages($text)
        ->get();
}
Symfony¶
Register as a service in config/services.yaml:

services:
    Cognesy\Instructor\StructuredOutput:
        autowire: true

Then use it in a controller:

<?php
public function extract(StructuredOutput $instructor): Response
{
    $result = $instructor->withResponseClass(Person::class)->get();
    return $this->json($result);
}
Standalone¶
<?php
// No framework needed
require 'vendor/autoload.php';

$result = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();
Observability¶
Token Usage¶
<?php
$response = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->getResponse();

echo $response->usage->inputTokens;
echo $response->usage->outputTokens;
echo $response->usage->totalTokens;
Timing¶
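Basic latency measurement does not need a dedicated API; wall-clock timing around a call is often enough (a plain PHP sketch, using the `logger()` helper shown elsewhere on this page):

```php
<?php
$start = microtime(true);

$result = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();

// Elapsed wall-clock time in milliseconds
$elapsedMs = (microtime(true) - $start) * 1000;
logger()->info('Extraction completed', ['ms' => round($elapsedMs, 1)]);
```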
Event-Based Logging¶
<?php
// Subscribe to all events emitted during processing
$runtime->onEvent('*', function($event) {
    logger()->info($event->name(), $event->toArray());
});
What's Next¶
- Getting Started - Quick installation guide
- Why Instructor - Understanding the value proposition
- Use Cases - Industry-specific examples
- Cookbook - 60+ working examples
- API Reference - Complete documentation