Misc

Beyond simple iteration, InferenceStream provides a set of functional helpers for processing deltas. The iteration helpers (map(), filter(), reduce() and all()) are built on top of the deltas() generator. A PHP generator can only be traversed once, so each of these helpers consumes the stream. Use only one of them per stream instance.
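
The single-use rule follows from how PHP generators behave. Once a generator has run to completion, it cannot be rewound, and iterating it again throws an exception. The sketch below shows this with a plain generator standing in for deltas(). `fakeDeltas()` is a hypothetical helper for illustration and is not part of the library.

```php
<?php

// Hypothetical stand-in for InferenceStream::deltas(): a plain PHP generator.
function fakeDeltas(): Generator
{
    yield 'Hello';
    yield ', ';
    yield 'world';
}

$deltas = fakeDeltas();

// First pass: drains the generator completely.
$text = implode('', iterator_to_array($deltas, false));
echo $text, "\n"; // Hello, world

// Second pass: a finished generator cannot be traversed again.
$secondPassFailed = false;
try {
    foreach ($deltas as $delta) {
        echo $delta;
    }
} catch (Exception $e) {
    $secondPassFailed = true;
    echo 'Second pass failed: ', $e->getMessage(), "\n";
}
```

This is why each helper should get its own stream instance. Calling a second helper on an already drained stream has nothing left to read.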

Reducing to a Single Value

The reduce() method works like array_reduce: it folds every delta into an accumulator and returns the final value. This is useful when you need a single result derived from the entire stream:

<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$text = Inference::using('openai')
    ->withMessages(Messages::fromString('Write three short lines about queues.'))
    ->stream()
    ->reduce(
        fn(string $carry, $delta) => $carry . $delta->contentDelta,
        '',
    );

echo $text;

Because reduce() drains the entire stream before returning, it blocks until the response is complete.
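
Conceptually, reduce() is a left fold over the delta generator. The following sketch reflects an assumption about those semantics and is not the library's code. It folds a generator of hypothetical `FakeDelta` objects, which expose only a `contentDelta` property, using a hypothetical `reduceDeltas()` helper. Like reduce(), the helper only returns after the last item has been consumed.

```php
<?php

// Hypothetical delta type exposing only the property used in the example above.
final class FakeDelta
{
    public function __construct(public readonly string $contentDelta) {}
}

// Hypothetical helper: fold every item of an iterable into an accumulator.
function reduceDeltas(iterable $deltas, callable $fn, mixed $initial): mixed
{
    $carry = $initial;
    foreach ($deltas as $delta) {
        $carry = $fn($carry, $delta);
    }
    return $carry; // only reached once the iterable is exhausted
}

// A generator standing in for a live stream of deltas.
$deltas = (function (): Generator {
    foreach (['Queues ', 'wait ', 'in line.'] as $chunk) {
        yield new FakeDelta($chunk);
    }
})();

$text = reduceDeltas(
    $deltas,
    fn(string $carry, FakeDelta $delta) => $carry . $delta->contentDelta,
    '',
);

echo $text, "\n"; // Queues wait in line.
```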

Mapping Deltas

The map() method transforms each delta into a new value and yields the results as a generator. Use it to extract or reshape data from each chunk without consuming the stream eagerly:

<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('List five fun facts about PHP.'))
    ->stream();

foreach ($stream->map(fn($delta) => strtoupper($delta->contentDelta)) as $chunk) {
    echo $chunk;
}
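
The sketch below shows what lazy mapping means in practice by counting how many items have been pulled from a source generator. It uses a hypothetical `mapLazy()` helper over plain strings rather than the library's map(), and assumes that map() likewise pulls one delta per mapped value. After the first mapped chunk is read, only one source item has been produced.

```php
<?php

// Hypothetical helper: lazily apply $fn to each item of an iterable.
function mapLazy(iterable $items, callable $fn): Generator
{
    foreach ($items as $key => $item) {
        yield $key => $fn($item);
    }
}

$pulled = 0; // how many chunks the source has produced so far

// A generator standing in for the delta stream.
$source = (function () use (&$pulled): Generator {
    foreach (['php ', 'is ', 'fun'] as $chunk) {
        $pulled++;
        yield $chunk;
    }
})();

$mapped = mapLazy($source, 'strtoupper');

$chunks = [$mapped->current()]; // runs both generators up to their first yield
$pulledAfterFirst = $pulled;    // 1: nothing beyond the first chunk was read

for ($mapped->next(); $mapped->valid(); $mapped->next()) {
    $chunks[] = $mapped->current();
}

echo implode('', $chunks), "\n"; // PHP IS FUN
```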

Filtering Deltas

The filter() method yields only the deltas that satisfy a given predicate. Deltas for which the callback returns a falsy value (such as false or 0) are silently skipped:

<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Count from one to ten.'))
    ->stream();

// Only process deltas that contain digits
foreach ($stream->filter(fn($delta) => preg_match('/\d/', $delta->contentDelta)) as $delta) {
    echo $delta->contentDelta;
}
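
Filtering can be sketched the same way. The hypothetical `filterLazy()` helper below is an illustration, not the library's filter(). It yields only the items for which the predicate is truthy. Note that preg_match() returns 1, 0 or false rather than a boolean. The sketch compares the result with `=== 1` to make the intent explicit, although the truthy integer returned in the example above works as well.

```php
<?php

// Hypothetical helper: lazily keep only items whose predicate result is truthy.
function filterLazy(iterable $items, callable $predicate): Generator
{
    foreach ($items as $key => $item) {
        if ($predicate($item)) {
            yield $key => $item;
        }
    }
}

// Plain strings standing in for delta content.
$chunks = (function (): Generator {
    yield from ['One,', ' 2,', ' three,', ' 4'];
})();

$withDigits = iterator_to_array(
    filterLazy($chunks, fn(string $chunk) => preg_match('/\d/', $chunk) === 1),
    false, // discard keys; keep only the values, in order
);

echo json_encode($withDigits), "\n"; // [" 2,"," 4"]
```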

Collecting All Deltas

The all() method drains the stream and returns every visible delta as an array. This is handy for inspection or testing, but keep in mind that it loads the entire stream into memory:

<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$deltas = Inference::using('openai')
    ->withMessages(Messages::fromString('Say hello.'))
    ->stream()
    ->all();

echo "Received " . count($deltas) . " deltas.\n";
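
If you only need the number of deltas, you do not have to keep them all in memory. PHP's built-in iterator_count() walks an iterator without storing its values. The sketch below compares the two approaches on a hypothetical generator. Assuming deltas() returns a Generator as listed in the summary table, `iterator_count($stream->deltas())` should behave the same way, and it consumes the stream just as all() does.

```php
<?php

// Hypothetical generator standing in for InferenceStream::deltas().
function fakeDeltas(): Generator
{
    foreach (['Hel', 'lo', ' there', '!'] as $chunk) {
        yield $chunk;
    }
}

// Materializes every item, similar in spirit to all().
$all = iterator_to_array(fakeDeltas(), false);
$countFromArray = count($all);

// Counts while iterating, without storing the items.
$countStreaming = iterator_count(fakeDeltas());

echo "Received {$countFromArray} deltas (array) / {$countStreaming} deltas (streaming).\n";
```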

Accessing the Last Delta

After the stream has been consumed (either partially or fully), you can retrieve the most recently yielded delta with lastDelta(). It returns null if no delta has been yielded yet:

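<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('What is 2 + 2?'))
    ->stream();

foreach ($stream->deltas() as $delta) {
    // process each delta...
}

$last = $stream->lastDelta(); // null if no delta was yielded
echo $last?->finishReason; // e.g. "stop"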

This is particularly useful for inspecting the finish reason or final usage data without keeping track of it manually during iteration.
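
The following sketch shows one way such bookkeeping can work, and why lastDelta() reflects partial consumption: a wrapper records each item just before yielding it. `TrackingStream` is a hypothetical class for illustration only and does not reproduce InferenceStream's internals.

```php
<?php

// Hypothetical wrapper that remembers the most recently yielded item.
final class TrackingStream
{
    private mixed $last = null;

    public function __construct(private iterable $source) {}

    public function deltas(): Generator
    {
        foreach ($this->source as $item) {
            $this->last = $item; // record before handing the item to the caller
            yield $item;
        }
    }

    public function lastDelta(): mixed
    {
        return $this->last; // null until the first item has been yielded
    }
}

$stream = new TrackingStream(['first', 'second', 'third']);
$before = $stream->lastDelta(); // null: nothing consumed yet

foreach ($stream->deltas() as $i => $delta) {
    if ($i === 1) {
        break; // stop early, after the second item
    }
}

echo $stream->lastDelta(), "\n"; // second
```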

Token Usage

The usage() method returns the accumulated InferenceUsage object for the stream, containing input tokens, output tokens, and any cache or reasoning token counts reported by the provider:

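<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Summarize the theory of relativity.'))
    ->stream();

foreach ($stream->deltas() as $delta) {
    echo $delta->contentDelta;
}

// Usage is accumulated as the stream is consumed, so read it after draining.
$usage = $stream->usage();
echo "\nTokens used: input={$usage->inputTokens}, output={$usage->outputTokens}\n";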

Execution Metadata

The execution() method returns the underlying InferenceExecution object, which contains the original request, the finalized response (once the stream completes), and execution metadata such as the execution ID:

<?php

use Cognesy\Messages\Messages;
use Cognesy\Polyglot\Inference\Inference;

$stream = Inference::using('openai')
    ->withMessages(Messages::fromString('Hello!'))
    ->stream();

$stream->final(); // drain the stream so the execution holds the finalized response

$execution = $stream->execution();
echo "Execution ID: " . $execution->id->toString() . "\n";
echo "Model used: " . $execution->request()->model() . "\n";

Summary of Available Methods

Method	Returns	Consumes stream?	Description
`deltas()`	`Generator<PartialInferenceDelta>`	Yes	Yields visible deltas one by one.
`map(callable)`	`iterable<T>`	Yes	Transforms each delta via a callback.
`filter(callable)`	`iterable<PartialInferenceDelta>`	Yes	Yields only deltas matching a predicate.
`reduce(callable, initial)`	`mixed`	Yes (blocking)	Folds all deltas into a single value.
`all()`	`array<PartialInferenceDelta>`	Yes (blocking)	Collects all deltas into an array.
`onDelta(callable)`	`self`	No (registers callback)	Registers a callback fired for each visible delta.
`final()`	`?InferenceResponse`	Drains if needed	Returns the assembled final response.
`lastDelta()`	`?PartialInferenceDelta`	No	Returns the most recently yielded delta.
`usage()`	`InferenceUsage`	No	Returns accumulated token usage.
`execution()`	`InferenceExecution`	No	Returns the execution context and metadata.
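
The table distinguishes registering a callback from consuming the stream. The sketch below models that difference under an assumption the table only implies: callbacks added with onDelta() run as deltas are pulled, for example while final() drains the stream. `SketchStream` is hypothetical, and its final() returns a plain string instead of an InferenceResponse.

```php
<?php

// Hypothetical, simplified stream used only to illustrate callback timing.
final class SketchStream
{
    /** @var list<callable> */
    private array $listeners = [];

    public function __construct(private iterable $source) {}

    public function onDelta(callable $listener): self
    {
        $this->listeners[] = $listener; // registration only: nothing is read yet
        return $this;
    }

    public function deltas(): Generator
    {
        foreach ($this->source as $delta) {
            foreach ($this->listeners as $listener) {
                $listener($delta); // fire callbacks as each item is pulled
            }
            yield $delta;
        }
    }

    public function final(): string
    {
        $text = '';
        foreach ($this->deltas() as $delta) { // drains the stream
            $text .= $delta;
        }
        return $text;
    }
}

$seen = [];
$stream = (new SketchStream(['Hel', 'lo', '!']))
    ->onDelta(function (string $delta) use (&$seen): void {
        $seen[] = $delta;
    });

$seenBeforeDrain = count($seen); // 0: registering did not consume anything
$result = $stream->final();

echo $result, "\n"; // Hello!
```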