Specifying Data Model¶
Type Hints¶
Use PHP type hints to specify the type of extracted data.
Use nullable types to indicate that a given field is optional.
Instructor fills in only the public fields. Private and protected fields are ignored: their values are not extracted, and they keep the default values defined in your class.
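The sketch below shows what type hints and nullability look like on a class, using PHP's Reflection API. It does not call Instructor. The `Contact` class and the `describePublicFields()` helper are hypothetical names made up for this illustration.

```php
<?php
// Hypothetical data model: names are illustrative only.
class Contact {
    public string $name;                  // required field
    public ?string $email = null;         // nullable, so it can be treated as optional
    private string $internalNote = 'n/a'; // not public, so not listed below
}

// List the public properties of a class with their declared type and nullability.
function describePublicFields(string $class): array {
    $fields = [];
    $properties = (new ReflectionClass($class))->getProperties(ReflectionProperty::IS_PUBLIC);
    foreach ($properties as $property) {
        $type = $property->getType();
        $fields[$property->getName()] = [
            'type' => $type instanceof ReflectionNamedType ? $type->getName() : 'mixed',
            'optional' => $type === null || $type->allowsNull(),
        ];
    }
    return $fields;
}

$fields = describePublicFields(Contact::class);
print_r($fields);
```

Only `name` and `email` are listed, and only `email` is reported as optional.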
Private vs public object field¶
Instructor only sets the public fields of the object with the data provided by the LLM. Private and protected fields are left unchanged. If you want to read them after extraction, consider providing default values for them, because reading an uninitialized typed property throws an error in PHP.
See examples/PrivateVsPublicFields/run.php for details on how extraction behaves for classes with private and public fields.
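To make the behavior described above concrete, here is a minimal simulation that runs without Instructor or an LLM. The `fillPublicFields()` function is a hypothetical stand-in written for this sketch, not the library's actual hydration code, and the `Account` class is illustrative.

```php
<?php
// Hypothetical model with a mix of visibilities.
class Account {
    public string $owner = '';
    public int $balance = 0;
    private string $secret = 'unset';  // default stays in place
    protected bool $verified = false;  // default stays in place

    public function secret(): string { return $this->secret; }
    public function isVerified(): bool { return $this->verified; }
}

// Simplified stand-in for a hydrator: copies only keys that match public properties.
function fillPublicFields(object $target, array $data): object {
    // Called from outside the class, get_object_vars() returns public properties only.
    $public = get_object_vars($target);
    foreach ($data as $key => $value) {
        if (array_key_exists($key, $public)) {
            $target->$key = $value;
        }
    }
    return $target;
}

$account = fillPublicFields(new Account(), [
    'owner' => 'Alex',
    'balance' => 100,
    'secret' => 'leaked',  // ignored: private
    'verified' => true,    // ignored: protected
]);

echo $account->owner, PHP_EOL;    // Alex
echo $account->secret(), PHP_EOL; // unset
```

The private and protected fields keep their declared defaults, which is why giving them defaults makes them safe to read afterwards.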
DocBlock type hints¶
You can also use PHP DocBlock style comments to specify the type of extracted data. This is useful when you want to specify property types for the LLM, but can't or don't want to enforce the type at the code level.
<?php
class Person {
    /** @var string */
    public $name;
    /** @var int */
    public $age;
    /** @var Address $address person's address */
    public $address;
}
See PHPDoc documentation for more details on DocBlock: https://docs.phpdoc.org/3.0/guide/getting-started/what-is-a-docblock.html#what-is-a-docblock
Using DocBlocks as Additional Instructions for LLM¶
You can use PHP DocBlocks (/** */) to provide additional instructions for the LLM at class or field level, for example to clarify what you expect or how the LLM should process your data.
Instructor extracts DocBlock comments from the class and its properties and includes them in the response model specification sent to the LLM.
DocBlock instructions are not required, but they can clarify your intentions and improve the LLM's inference results.
<?php
/**
 * Represents a skill of a person and context in which it was mentioned.
 */
class Skill {
    public string $name;
    /** @var SkillType $type type of the skill, derived from the description and context */
    public SkillType $type;
    /** Directly quoted, full sentence mentioning person's skill */
    public string $context;
}
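For a sense of how DocBlock text can be read from a class at runtime, the sketch below uses PHP's built-in `getDocComment()` reflection method. It is an illustration under assumptions, not Instructor's actual parser: the `Note` class and the `docBlockText()` helper are hypothetical.

```php
<?php
/**
 * Represents a note attached to a document.
 */
class Note {
    /** Short, human-readable title */
    public string $title;
}

// Strip the comment markers from a DocBlock and return its text on one line.
function docBlockText(string|false $docComment): string {
    if ($docComment === false) {
        return ''; // no DocBlock present
    }
    $clean = [];
    foreach (preg_split('/\R/', $docComment) as $line) {
        $line = trim($line);
        $line = preg_replace('#\*/$#', '', $line);   // closing marker
        $line = preg_replace('#^/?\*+#', '', $line); // opening marker or leading asterisk
        $line = trim($line);
        if ($line !== '') {
            $clean[] = $line;
        }
    }
    return implode(' ', $clean);
}

$classText = docBlockText((new ReflectionClass(Note::class))->getDocComment());
$titleText = docBlockText((new ReflectionProperty(Note::class, 'title'))->getDocComment());

echo $classText, PHP_EOL; // Represents a note attached to a document.
echo $titleText, PHP_EOL; // Short, human-readable title
```

Note that `getDocComment()` returns `false` when comments are not available, for example when OPcache is configured not to save them.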
Attributes for data model descriptions and instructions¶
Instructor supports #[Description] and #[Instructions] attributes to provide more context or additional instructions to the language model.
The #[Description] attribute describes a class or property in your data model. Instructor will use this text to provide more context to the language model.
The #[Instructions] attribute provides additional instructions to the language model, such as how to process the data.
You can add multiple attributes to a class or property - Instructor will merge them into a single block of text.
Instructor will still include any PHPDoc comments provided in the class, but using attributes might be more convenient and easier to read.
<?php
#[Description("Information about user")]
class User {
    #[Description("User's age")]
    public int $age;
    #[Instructions("Make it ALL CAPS")]
    public string $name;
    #[Description("User's job")]
    #[Instructions("Ignore hobbies, identify profession")]
    public string $job;
}
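The sketch below illustrates the general idea of reading several attributes from a property and merging their text into one block. It uses stand-in `Description` and `Instructions` attribute classes defined locally so that it runs without the library. The `Product` class and the `mergedAttributeText()` helper are hypothetical, and the order and separator Instructor actually uses when merging are assumptions here.

```php
<?php
// Stand-in attribute classes so the sketch runs without the library installed.
#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_PROPERTY | Attribute::IS_REPEATABLE)]
class Description {
    public function __construct(public string $text) {}
}

#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_PROPERTY | Attribute::IS_REPEATABLE)]
class Instructions {
    public function __construct(public string $text) {}
}

class Product {
    #[Description("Product name")]
    #[Instructions("Use title case")]
    #[Instructions("Drop trademark symbols")]
    public string $name;
}

// Collect the text of all Description and Instructions attributes on a property.
function mergedAttributeText(ReflectionProperty $property): string {
    $parts = [];
    foreach ([Description::class, Instructions::class] as $attributeClass) {
        foreach ($property->getAttributes($attributeClass) as $attribute) {
            $parts[] = $attribute->newInstance()->text;
        }
    }
    return implode("\n", $parts);
}

$text = mergedAttributeText(new ReflectionProperty(Product::class, 'name'));
echo $text, PHP_EOL;
```

The three attribute texts end up in a single newline-separated string: the description first, then the two instructions in declaration order.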
Typed Collections / Arrays¶
PHP currently does not support generics or type hints that specify array element types.
Use PHP DocBlock style comments to specify the type of array elements.
<?php
class Person {
    // ...
}
class Event {
    // ...
    /** @var Person[] list of extracted event participants */
    public array $participants;
    // ...
}
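To show why the `@var Type[]` annotation is enough to recover the element type, here is a small sketch that reads it with reflection and a regular expression. It is not Instructor's actual implementation: the `Participant` and `Meeting` classes and the `arrayElementType()` helper are hypothetical names used only for this example.

```php
<?php
class Participant {
    public string $name = '';
}

class Meeting {
    /** @var Participant[] list of people attending */
    public array $participants = [];
    public array $tags = []; // no DocBlock: element type unknown
}

// Return the element type declared as "@var Type[]" on a property, or null if absent.
function arrayElementType(string $class, string $property): ?string {
    $doc = (new ReflectionProperty($class, $property))->getDocComment();
    if ($doc === false) {
        return null;
    }
    // Capture a class name (optionally namespaced) followed by "[]".
    if (preg_match('/@var\s+([\\\\\w]+)\[\]/', $doc, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(arrayElementType(Meeting::class, 'participants')); // string(11) "Participant"
var_dump(arrayElementType(Meeting::class, 'tags'));         // NULL
```

Without the DocBlock, the plain `array` type hint carries no information about what the array contains.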
Complex data extraction¶
Instructor can retrieve complex data structures from text. Your response model can contain nested objects, arrays, and enums.
<?php
use Cognesy\Instructor\Instructor;
// define data structures to extract data into
class Person {
    public string $name;
    public int $age;
    public string $profession;
    /** @var Skill[] */
    public array $skills;
}
class Skill {
    public string $name;
    public SkillType $type;
}
enum SkillType : string {
    case Technical = 'technical';
    case Other = 'other';
}
$text = "Alex is 25 years old software engineer, who knows PHP, Python and can play the guitar.";
$person = (new Instructor)->respond(
    messages: [['role' => 'user', 'content' => $text]],
    responseModel: Person::class,
);
// data is extracted into an object of given class
assert($person instanceof Person); // true
// you can access object's extracted property values
echo $person->name; // Alex
echo $person->age; // 25
echo $person->profession; // software engineer
echo $person->skills[0]->name; // PHP
echo $person->skills[0]->type->value; // technical (an enum case cannot be echoed directly)
// ...
var_dump($person);
// Person {
//     name: "Alex",
//     age: 25,
//     profession: "software engineer",
//     skills: [
//         Skill {
//             name: "PHP",
//             type: SkillType::Technical,
//         },
//         Skill {
//             name: "Python",
//             type: SkillType::Technical,
//         },
//         Skill {
//             name: "guitar",
//             type: SkillType::Other,
//         },
//     ]
// }
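The `SkillType` field in the example above is a string-backed enum. The standalone sketch below shows how PHP converts between strings and backed enum cases using the built-in `from()` and `tryFrom()` methods, and how to print a case. The `Level` enum is a hypothetical example, and how Instructor itself maps LLM output onto enum cases is not shown here.

```php
<?php
enum Level: string {
    case Junior = 'junior';
    case Senior = 'senior';
}

$level = Level::from('senior');      // string to enum case; throws ValueError on an unknown value
$unknown = Level::tryFrom('wizard'); // returns null instead of throwing

// An enum case is an object, so print its backing value or its name.
echo $level->value, PHP_EOL; // senior
echo $level->name, PHP_EOL;  // Senior
var_dump($unknown);          // NULL
```

Keeping the backing values short and descriptive helps, since those strings are what appear in serialized data.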
Dynamic data schemas with Structure class¶
See Structures for more details on how to work with dynamic data schemas.