Prioritize Uncertain Examples
Overview
When we have a large pool of unlabeled examples that could be used in a prompt, how should we decide which examples to manually label?
Active prompting identifies effective examples for human annotation using four steps:
- Uncertainty Estimation: measure the model's uncertainty on each unlabeled example.
- Selection: choose the most uncertain examples for human labeling.
- Annotation: humans label the selected examples.
- Inference: use the newly labeled examples to improve prompts.
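The loop can be sketched end to end as follows. This is a minimal sketch only: estimateUncertainty(), humanAnnotate(), and buildFewShotPrompt() are hypothetical placeholders for the steps detailed in the sections below.
<?php
// Minimal sketch of the active prompting loop. The helper functions
// estimateUncertainty(), humanAnnotate() and buildFewShotPrompt() are
// hypothetical placeholders for the steps shown in the sections below.
function activePrompting(array $unlabeled, int $n) : array {
    // 1. Uncertainty estimation: score every unlabeled example
    $scores = [];
    foreach ($unlabeled as $i => $example) {
        $scores[$i] = estimateUncertainty($example);
    }
    // 2. Selection: keep the n most uncertain examples
    arsort($scores);
    $selectedKeys = array_slice(array_keys($scores), 0, $n);
    // 3. Annotation: a human labels the selected examples
    $annotated = [];
    foreach ($selectedKeys as $i) {
        $annotated[] = humanAnnotate($unlabeled[$i]);
    }
    // 4. Inference: use the annotated examples as few-shot context
    return buildFewShotPrompt($annotated);
}
?>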
Uncertainty Estimation (Disagreement)
Query the model k times with the same example and measure disagreement as unique responses / total responses.
Example
<?php
require 'examples/boot.php';

use Cognesy\Instructor\Extras\Scalar\Scalar;
use Cognesy\Instructor\StructuredOutput;

class EstimateUncertainty {
    // Query the model k times with the same question and return the disagreement score.
    public function __invoke(int $k = 5) : float {
        $values = [];
        for ($i = 0; $i < $k; $i++) {
            $values[] = $this->queryHeight();
        }
        return $this->disagreement($values);
    }

    // Ask the same question and extract the answer as an integer.
    private function queryHeight() : int {
        return (new StructuredOutput)->with(
            messages: [['role' => 'user', 'content' => 'How tall is the Empire State Building in meters?']],
            responseModel: Scalar::integer('height'),
        )->get();
    }

    // Disagreement = number of unique responses / total responses.
    private function disagreement(array $responses) : float {
        $n = count($responses);
        if ($n === 0) {
            return 0.0;
        }
        return count(array_unique($responses)) / $n;
    }
}

$score = (new EstimateUncertainty)(k: 5);
dump($score);
?>
Selection & Annotation
Select the top-n most uncertain unlabeled examples for human annotation.
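A minimal sketch of the selection step, assuming each unlabeled question has already been scored with a disagreement-based routine like EstimateUncertainty above; the scores below are illustrative placeholders. The annotation step itself is manual: the selected questions are simply handed to a human annotator.
<?php
require 'examples/boot.php';

// Pick the top-n most uncertain questions for human annotation.
// $pool maps each unlabeled question to its disagreement score.
function selectForAnnotation(array $pool, int $n) : array {
    arsort($pool);                                // highest disagreement first
    return array_slice(array_keys($pool), 0, $n); // questions to hand to annotators
}

// Illustrative scores, e.g. computed with EstimateUncertainty above
$pool = [
    'How tall is the Empire State Building in meters?' => 0.8,
    'How many moons does Mars have?' => 0.6,
    'What year was the first moon landing?' => 0.2,
];
dump(selectForAnnotation($pool, n: 2));
?>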
Inference
Use newly annotated examples as few-shot context during inference.
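A minimal sketch of the inference step, reusing the StructuredOutput and Scalar API from the example above. The annotated question/answer pairs are illustrative placeholders standing in for the output of the annotation step.
<?php
require 'examples/boot.php';

use Cognesy\Instructor\Extras\Scalar\Scalar;
use Cognesy\Instructor\StructuredOutput;

// Illustrative placeholders for human-annotated examples
$annotated = [
    ['question' => 'How many moons does Mars have?', 'answer' => '2'],
    ['question' => 'How tall is the Eiffel Tower in meters?', 'answer' => '330'],
];

// Interleave the annotated pairs as prior user/assistant turns,
// then append the actual question to answer.
$messages = [];
foreach ($annotated as $pair) {
    $messages[] = ['role' => 'user', 'content' => $pair['question']];
    $messages[] = ['role' => 'assistant', 'content' => $pair['answer']];
}
$messages[] = ['role' => 'user', 'content' => 'How tall is the Empire State Building in meters?'];

$height = (new StructuredOutput)->with(
    messages: $messages,
    responseModel: Scalar::integer('height'),
)->get();
dump($height);
?>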
References
1) Active Prompting with Chain-of-Thought for Large Language Models (https://arxiv.org/abs/2302.12246)
2) The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)