Customizing prompts

To take control of the prompts Instructor sends to the LLM in each mode, use the prompt parameter of the request() or respond() methods.

It overrides the default Instructor prompt, letting you fully customize how the LLM is instructed to process the input.

Prompting models with tool calling support

Mode::Tools is usually the most reliable way to get structured output that follows the provided response schema.

Mode::Tools can use the $toolName and $toolDescription parameters to give the LLM additional semantic context by describing the tool used to process the input. Mode::Json and Mode::MdJson ignore these parameters because they do not use tools.

<?php
$user = (new Instructor)
    ->request(
        messages: "Our user Jason is 25 years old.",
        responseModel: User::class,
        prompt: "\nYour task is to extract correct and accurate data from the messages using provided tools.\n",
        toolName: 'extract',
        toolDescription: 'Extract information from provided content',
        mode: Mode::Tools)
    ->get();
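
To make the role of toolName and toolDescription more concrete, the sketch below builds an OpenAI-style tool definition by hand. The helper buildToolDefinition() is hypothetical and is not part of Instructor. The payload Instructor actually sends is assembled internally and may differ in detail.

```php
<?php
// Illustrative only: an OpenAI-style tool definition built by hand.
// buildToolDefinition() is a hypothetical helper, not an Instructor API.
function buildToolDefinition(string $toolName, string $toolDescription, array $jsonSchema): array {
    return [
        'type' => 'function',
        'function' => [
            'name' => $toolName,               // short identifier the model refers to
            'description' => $toolDescription, // semantic hint about the tool's purpose
            'parameters' => $jsonSchema,       // schema of the expected arguments
        ],
    ];
}

$tool = buildToolDefinition(
    'extract',
    'Extract information from provided content',
    [
        'type' => 'object',
        'properties' => [
            'name' => ['type' => 'string'],
            'age' => ['type' => 'integer'],
        ],
        'required' => ['name', 'age'],
    ]
);

echo json_encode($tool, JSON_PRETTY_PRINT), PHP_EOL;
```

The name and description are the only free-text hints about the tool that the model sees, so it helps to keep them specific to the extraction task.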

Prompting models supporting JSON output

Aside from tool calling, Instructor supports two other modes for getting structured output from the LLM: Mode::Json and Mode::MdJson.

Mode::Json uses the JSON mode offered by some models and API providers to make the LLM respond in JSON format rather than plain text.

<?php
$user = (new Instructor)->respond(
    messages: "Our user Jason is 25 years old.",
    responseModel: User::class,
    prompt: "\nYour task is to respond correctly with JSON object.",
    mode: Mode::Json
);

Note that models and API providers have specific requirements for the input format. For example, OpenAI's JSON mode requires the string "JSON" to appear in the prompt.

Including JSON Schema in the prompt

Instructor automatically sets the response_format parameter, but this may not be sufficient for some models or providers. Some of them (e.g. OpenAI) require the JSON response format to be specified in the prompt as well, not just through the response_format request parameter.

For this reason, when using Mode::Json or Mode::MdJson you should include the expected JSON Schema in the prompt. Otherwise, the response is unlikely to match your target model, and Instructor will not be able to deserialize it correctly.

<?php
$jsonSchema = json_encode([
    "type" => "object",
    "properties" => [
        "name" => ["type" => "string"],
        "age" => ["type" => "integer"]
    ],
    "required" => ["name", "age"]
]);

$user = (new Instructor)
    ->request(
        messages: "Our user Jason is 25 years old.",
        responseModel: User::class,
        prompt: "\nYour task is to respond correctly with JSON object. Response must follow JSONSchema: $jsonSchema\n",
        mode: Mode::Json)
    ->get();

The example above builds the JSON Schema manually, but you do not have to: the prompt template placeholder syntax lets you use the JSON Schema generated by Instructor.
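
As a rough illustration of how a schema can be derived from a response model instead of being written by hand, the sketch below maps the typed public properties of a class to JSON Schema types using reflection. Both the User class and the sketchJsonSchema() function are hypothetical stand-ins. Instructor's own schema generation is more capable and is not shown here.

```php
<?php
// Hypothetical response model with the fields used in the examples.
class User {
    public string $name;
    public int $age;
}

// Simplified sketch, not Instructor's implementation:
// maps typed public properties to JSON Schema types.
function sketchJsonSchema(string $class): array {
    $typeMap = [
        'string' => 'string',
        'int' => 'integer',
        'float' => 'number',
        'bool' => 'boolean',
        'array' => 'array',
    ];
    $properties = [];
    $required = [];
    $reflection = new ReflectionClass($class);
    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $type = $property->getType();
        if (!$type instanceof ReflectionNamedType) {
            continue; // skip untyped and union-typed properties in this sketch
        }
        // Unknown type names (e.g. nested classes) fall back to "object".
        $properties[$property->getName()] = ['type' => $typeMap[$type->getName()] ?? 'object'];
        if (!$type->allowsNull()) {
            $required[] = $property->getName(); // non-nullable properties are required
        }
    }
    return ['type' => 'object', 'properties' => $properties, 'required' => $required];
}

echo json_encode(sketchJsonSchema(User::class)), PHP_EOL;
```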

Prompt as template

Instructor allows you to use a template string as a prompt. Placeholders of the form <|variable|> in the template are replaced with actual values at execution time.

Currently, the following placeholders are supported:

- <|json_schema|> - replaced with the JSON Schema of the current response model

The example below demonstrates how to use a template string as a prompt:

<?php
$user = (new Instructor)
    ->request(
        messages: "Our user Jason is 25 years old.",
        responseModel: User::class,
        prompt: "\nYour task is to respond correctly with JSON object. Response must follow JSONSchema:\n<|json_schema|>\n",
        mode: Mode::Json)
    ->get();
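
Conceptually, placeholder substitution amounts to a string replacement over the template. The sketch below shows one way this could work. renderPrompt() is a hypothetical helper written for illustration and does not reflect how Instructor renders prompts internally.

```php
<?php
// Simplified stand-in for template rendering; not Instructor's actual code.
// Replaces every <|name|> placeholder with the matching value from $values.
function renderPrompt(string $template, array $values): string {
    $replacements = [];
    foreach ($values as $name => $value) {
        $replacements['<|' . $name . '|>'] = $value;
    }
    return strtr($template, $replacements);
}

// Hand-written schema used here only to have something to substitute.
$schema = json_encode([
    'type' => 'object',
    'properties' => [
        'name' => ['type' => 'string'],
        'age' => ['type' => 'integer'],
    ],
    'required' => ['name', 'age'],
]);

$prompt = renderPrompt(
    "Response must follow JSONSchema:\n<|json_schema|>\n",
    ['json_schema' => $schema]
);

echo $prompt;
```

Placeholders without a matching entry in $values are left untouched by this sketch, which makes missing values easy to spot in the rendered prompt.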

Prompting models with no support for tool calling or JSON output

Mode::MdJson is the most basic (and least reliable) way to get structured output from an LLM. Still, you may want to use it with models that support neither tool calling nor JSON output.

Mode::MdJson relies on prompting alone to get the LLM to respond with JSON-formatted data.

Many models prompted in this mode respond with a mixture of plain text and JSON data. Instructor tries to find the JSON fragment in the response and ignores the rest of the text.

This approach is the most prone to deserialization and validation errors, so include the JSON Schema in the prompt to increase the probability that the response is correctly structured and contains the expected data.

<?php
$user = (new Instructor)
    ->request(
        messages: "Our user Jason is 25 years old.",
        responseModel: User::class,
        prompt: "\nYour task is to respond correctly with strict JSON object containing extracted data within a ```json {} ``` codeblock. Object must validate against this JSONSchema:\n<|json_schema|>\n",
        mode: Mode::MdJson)
    ->get();
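
To illustrate the extraction step described above, the sketch below pulls a JSON object out of a response that mixes prose with a Markdown code block. extractJson() is a hypothetical helper written under simplifying assumptions (a single JSON object per response). Instructor's own extraction logic may differ.

```php
<?php
// Simplified sketch of extracting JSON from a mixed text response.
// Not Instructor's implementation; assumes at most one JSON object.
function extractJson(string $response): ?array {
    $fence = str_repeat('`', 3); // Markdown code fence delimiter
    $pattern = '/' . $fence . '(?:json)?\s*(\{.*?\})\s*' . $fence . '/s';
    if (preg_match($pattern, $response, $matches)) {
        // Prefer the contents of a fenced code block when one is present.
        $candidate = $matches[1];
    } else {
        // Otherwise fall back to the span from the first "{" to the last "}".
        $start = strpos($response, '{');
        $end = strrpos($response, '}');
        if ($start === false || $end === false || $end < $start) {
            return null;
        }
        $candidate = substr($response, $start, $end - $start + 1);
    }
    $data = json_decode($candidate, true);
    return is_array($data) ? $data : null; // null when the fragment is not valid JSON
}

$fence = str_repeat('`', 3);
$response = "Sure, here is the data:\n"
    . $fence . "json\n"
    . "{\"name\": \"Jason\", \"age\": 25}\n"
    . $fence . "\n"
    . "Let me know if you need anything else.";

var_dump(extractJson($response));
```

A null result signals that no usable JSON was found, which is the kind of failure the JSON Schema in the prompt is meant to make less likely.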