Why Instructor

LLMs are powerful, but their outputs are unpredictable. Instructor solves this.

The Problem¶

You've integrated an LLM into your PHP application. Now what?

<?php
// Typical LLM integration without Instructor
$response = $openai->chat([
    'messages' => [['role' => 'user', 'content' => 'Extract the person name and age from: "John is 25"']]
]);

$text = $response['choices'][0]['message']['content'];
// $text = "The person's name is John and they are 25 years old."
// or "Name: John, Age: 25"
// or "{ name: 'John', age: 25 }"
// or something else entirely...

// Now you need to:
// 1. Parse this somehow
// 2. Handle all possible formats
// 3. Validate the data
// 4. Handle errors
// 5. Retry on failure
// 6. Hope it works

The result? Fragile code, inconsistent data, and endless edge cases.

The Solution¶

Instructor gives you structured, validated, type-safe outputs:

<?php
class Person {
    public string $name;
    public int $age;
}

$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages('John is 25')
    ->get();

// Always a Person object
// Always with string $name
// Always with int $age
// Validated automatically
// Retries on failure

How It Works¶

Instructor uses a three-step process:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Define    │ ──▶ │   Extract   │ ──▶ │   Validate  │
│  PHP Class  │     │   via LLM   │     │  & Return   │
└─────────────┘     └─────────────┘     └─────────────┘

Define - You create a PHP class with typed properties
Extract - Instructor sends your schema to the LLM with optimized prompts
Validate - Results are validated; failures trigger automatic retry with feedback

Key Benefits¶

1. Type Safety¶

Your IDE understands the response. Autocomplete works. Static analysis catches errors.

<?php
$person = (new StructuredOutput)->withResponseClass(Person::class)->get();

// IDE knows $person->name is a string
// IDE knows $person->age is an int
// Typos like $person->naem are caught immediately

2. Automatic Validation¶

Use Symfony Validator constraints. Invalid responses trigger automatic retry:

<?php
class Person {
    #[Assert\NotBlank]
    #[Assert\Length(min: 2, max: 100)]
    public string $name;

    #[Assert\Range(min: 0, max: 150)]
    public int $age;
}

// If LLM returns age: -5, Instructor:
// 1. Detects validation failure
// 2. Sends error feedback to LLM
// 3. Requests corrected response
// 4. Repeats until valid or max retries

3. Self-Correcting Retries¶

LLMs make mistakes. Instructor handles this gracefully:

<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->withMaxRetries(3)  // Try up to 3 times
    ->get();

On validation failure, Instructor tells the LLM exactly what went wrong:

"Validation failed: age must be greater than 0. Please correct and try again."

4. Provider Independence¶

Write once, run anywhere. Switch LLM providers without changing your code:

<?php
use Cognesy\Instructor\StructuredOutputRuntime;

// Development: Use local Ollama
$result = StructuredOutput::using('ollama')->withResponseClass(Task::class)->get();

// Staging: Use Groq for speed
$result = StructuredOutput::using('groq')->withResponseClass(Task::class)->get();

// Production: Use OpenAI for quality
$result = StructuredOutput::using('openai')->withResponseClass(Task::class)->get();

5. Multiple Output Modes¶

Works with any model capability:

Mode	Best For	How It Works
`Tools`	OpenAI, Claude	Uses function/tool calling
`JsonSchema`	GPT-4, newer models	Strict JSON Schema mode
`Json`	Most models	JSON response format
`MdJson`	Any model	Prompting-based extraction

6. Streaming Support¶

Get partial results as they arrive:

<?php
use Cognesy\Instructor\StructuredOutput;

$stream = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->with(messages: $text, options: ['stream' => true])
    ->stream();

foreach ($stream->partials() as $partial) {
    echo "Processing: " . ($partial->name ?? '...') . "\n";
}

$person = $stream->finalValue();

7. Multimodal Inputs¶

Process text, images, and chat conversations with the same API:

<?php
use Cognesy\Addons\Image\Image;

// Text
->withMessages("Extract from this text...")

// Images
->with(messages: Image::fromFile('receipt.jpg')->toMessage())
->withPrompt("Extract line items")

// Chat history
->withMessages([
    ['role' => 'system', 'content' => 'You extract data'],
    ['role' => 'user', 'content' => 'Process this...']
])

Comparison¶

Without Instructor¶

<?php
$response = $client->chat(['messages' => [...]]);
$json = json_decode($response['choices'][0]['message']['content'], true);

if (json_last_error() !== JSON_ERROR_NONE) {
    // Handle JSON parse error
    // Try to extract with regex?
    // Log and retry?
}

if (!isset($json['name']) || !is_string($json['name'])) {
    // Handle missing/invalid field
}

if (!isset($json['age']) || !is_int($json['age'])) {
    // Handle missing/invalid field
}

if ($json['age'] < 0) {
    // Handle validation error
    // Retry somehow?
}

$person = new Person();
$person->name = $json['name'];
$person->age = $json['age'];

With Instructor¶

<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMessages($text)
    ->get();

Same result. Zero boilerplate.

Why Not Just Use JSON Mode / JSON Schema?¶

"But OpenAI has response_format: json_object and strict JSON Schema mode now. Why do I need Instructor?"

Good question. Here's what you're still stuck with:

1. Provider Inconsistency¶

Every provider does it differently:

Provider	JSON Mode	JSON Schema	Tool Calling
OpenAI	`response_format: {type: "json_object"}`	`response_format: {type: "json_schema", ...}`	Yes
Anthropic	❌ No native support	❌ No native support	Yes (different format)
Gemini	Different API entirely	Different API entirely	Yes (different format)
Mistral	Partial support	No	Yes
Ollama	Model-dependent	Model-dependent	Model-dependent

With raw APIs: You write different code for each provider.

With Instructor: One API. Instructor picks the best extraction method automatically.

<?php
// Same code works everywhere
$result = StructuredOutput::using('anthropic')->withResponseClass(Person::class)
    ->get();

2. No Object Hydration¶

JSON Schema gives you... JSON. Not objects.

<?php
// OpenAI with JSON Schema
$response = $openai->chat([
    'messages' => [...],
    'response_format' => [
        'type' => 'json_schema',
        'json_schema' => [
            'name' => 'person',
            'schema' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'age' => ['type' => 'integer'],
                ],
                'required' => ['name', 'age'],
            ],
        ],
    ],
]);

$json = json_decode($response['choices'][0]['message']['content'], true);
// $json = ['name' => 'John', 'age' => 25]

// Now you manually hydrate:
$person = new Person();
$person->name = $json['name'];
$person->age = $json['age'];
// For nested objects? More manual work.
// For arrays of objects? Even more.

With Instructor: Direct to typed objects, including nested structures.

<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->get();
// $person is already a Person object

3. Schema Definition Hell¶

JSON Schema is verbose and lives separately from your code:

<?php
// JSON Schema approach - 20+ lines for a simple object
$schema = [
    'type' => 'object',
    'properties' => [
        'name' => [
            'type' => 'string',
            'description' => 'The person\'s full name',
            'minLength' => 1,
        ],
        'age' => [
            'type' => 'integer',
            'description' => 'Age in years',
            'minimum' => 0,
            'maximum' => 150,
        ],
        'email' => [
            'type' => 'string',
            'format' => 'email',
            'description' => 'Contact email',
        ],
    ],
    'required' => ['name', 'age'],
    'additionalProperties' => false,
];

With Instructor: Your PHP class IS the schema.

<?php
class Person {
    /** The person's full name */
    #[Assert\NotBlank]
    public string $name;

    /** Age in years */
    #[Assert\Range(min: 0, max: 150)]
    public int $age;

    #[Assert\Email]
    public string|null $email;
}

Schema and validation rules in one place. IDE autocomplete. Type checking. Refactoring support.

4. No Validation Beyond Types¶

JSON Schema validates structure, not business logic:

<?php
// JSON Schema says this is valid:
{ "name": "", "age": -5, "email": "not-an-email" }
// All correct types! But completely useless data.

With Instructor: Full validation with Symfony constraints.

<?php
class Person {
    #[Assert\NotBlank]
    #[Assert\Length(min: 2)]
    public string $name;  // Empty string? Rejected.

    #[Assert\Positive]
    public int $age;  // Negative? Rejected.

    #[Assert\Email]
    public string $email;  // Invalid format? Rejected.
}

5. No Retry Mechanism¶

JSON Schema mode fails silently or throws. You handle recovery:

<?php
// What happens when the LLM returns invalid JSON despite schema?
try {
    $response = $openai->chat([...]);
    $json = json_decode($response['choices'][0]['message']['content'], true);
} catch (Exception $e) {
    // Now what?
    // Retry with same prompt? Probably same error.
    // Modify the prompt? How?
    // Log and give up?
}

With Instructor: Automatic retry with error feedback.

<?php
$person = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->withMaxRetries(3)
    ->get();

// On failure, Instructor tells the LLM:
// "Validation failed: 'age' must be positive. You returned -5. Please correct."
// LLM tries again with that context.

6. No Streaming Support for Structured Data¶

JSON Schema mode gives you complete-or-nothing:

<?php
// Can't do this with raw JSON Schema mode:
// - Show partial results as they arrive
// - Update UI progressively
// - Stream array items one by one

With Instructor: Full streaming with partial updates.

<?php
use Cognesy\Instructor\StructuredOutput;

$stream = (new StructuredOutput)
    ->withResponseClass(Person::class)
    ->with(messages: $text, options: ['stream' => true])
    ->stream();

foreach ($stream->partials() as $partial) {
    updateUI($partial);
}

$person = $stream->finalValue();

7. Anthropic Doesn't Have JSON Mode¶

Claude is one of the best models, but Anthropic has no native JSON mode:

<?php
// This doesn't exist for Anthropic:
$response = $anthropic->messages([
    'response_format' => ['type' => 'json_object'],  // ❌ Not supported
]);

// You're stuck with:
// - Prompt engineering ("respond only in JSON...")
// - Hoping it complies
// - Parsing whatever comes back

With Instructor: Works seamlessly with Claude.

<?php
$person = StructuredOutput::using('anthropic')->withResponseClass(Person::class)
    ->get();
// Instructor uses tool calling or optimized prompts automatically

8. The Real-World Comparison¶

Capability	Raw JSON/JSON Schema	Instructor
Works with all providers	❌ Different APIs	✅ Unified
Object hydration	❌ Manual	✅ Automatic
Nested objects	❌ Manual recursion	✅ Automatic
Business validation	❌ None	✅ Full
Retry on failure	❌ Manual	✅ Automatic
Error feedback to LLM	❌ None	✅ Built-in
Streaming partials	❌ Not possible	✅ Supported
Type safety in IDE	❌ None	✅ Full
Schema = Code	❌ Separate	✅ Same file
Works with Claude	❌ No JSON mode	✅ Yes

The Bottom Line¶

JSON Schema mode is a step forward, but it's a low-level primitive. You still need to:

Write provider-specific code
Manually deserialize to objects
Implement your own validation
Build your own retry logic
Handle streaming yourself
Maintain schemas separate from code

Instructor handles all of this. You define a PHP class and call ->get().

When to Use Instructor¶

Great for:

Extracting structured data from unstructured text
Building forms that accept natural language
Processing documents (invoices, resumes, contracts)
Content classification and tagging
Data transformation pipelines
Any task requiring reliable LLM output structure

Not designed for:

Open-ended creative writing
Tasks where free-form text is the desired output
Simple completions without structure requirements

The Instructor Family¶

Instructor exists in multiple languages with consistent APIs:

Language	Repository
PHP (this)	cognesy/instructor-php
Python (original)	jxnl/instructor
JavaScript	instructor-ai/instructor-js
Elixir	instructor-ai/instructor-ex
Ruby	instructor-ai/instructor-rb

Ready to get started? Jump to the Getting Started Guide or explore the Cookbook for practical examples.