Web page to PHP objects
Overview
This example demonstrates how to extract structured data from a web page and receive it as PHP objects.
Example
In this example we extract a list of Laravel companies from The Manifest
website. The result is a list of Company objects.
We use the Webpage extractor to get the content of the page and specify the 'none'
scraper, which means the built-in file_get_contents function is used to fetch the
content of the page.
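As a rough illustration of what fetching without a scraper amounts to, the sketch below reads a page with file_get_contents and picks out elements with PHP's DOM extension. It is not the Webpage class's actual implementation; the temporary file, the markup, and the XPath query are made up for this example.

```php
<?php
// Sketch only: approximates a plain fetch with no scraper involved.
// The temp file is a stand-in for a saved page (hypothetical markup).
$path = tempnam(sys_get_temp_dir(), 'page_');
file_put_contents(
    $path,
    '<ul class="list"><li class="card">Acme</li><li class="card">Globex</li></ul>'
);

// file_get_contents reads local paths and, if allow_url_fopen is enabled, http(s) URLs.
$html = file_get_contents($path);
if ($html === false) {
    throw new RuntimeException("Could not read {$path}");
}

// Parse the markup and collect the text of each card element.
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
$xpath = new DOMXPath($dom);

$cards = [];
foreach ($xpath->query('//li[@class="card"]') as $node) {
    $cards[] = trim($node->textContent);
}

// Clean up the temporary file.
unlink($path);
```

file_get_contents returns the raw HTML and does not execute JavaScript, so pages that render their content in the browser are one reason to prefer a browser-based scraper such as browsershot.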
In a production environment you might want to use one of the supported scrapers:
- browsershot
- scrapingbee
- scrapfly
- jinareader
Commercial scrapers require an API key, which can be set in the auxiliary web
configuration files (/packages/auxiliary/config/web/default.yaml and
/packages/auxiliary/config/web/scrapers/*.yaml).
<?php
require 'examples/boot.php';
use Cognesy\Auxiliary\Web\Webpage;
use Cognesy\Config\BasePath;
use Cognesy\Instructor\StructuredOutput;
use Cognesy\Instructor\StructuredOutputRuntime;
use Cognesy\Instructor\Enums\OutputMode;
use Cognesy\Polyglot\Inference\LLMProvider;
use Cognesy\Schema\Attributes\Instructions;
class Company {
    public string $name = '';
    public string $location = '';
    public string $description = '';
    public int $minProjectBudget = 0;
    public string $companySize = '';
    #[Instructions('Remove any tracking parameters from the URL')]
    public string $websiteUrl = '';
    /** @var string[] */
    public array $clients = [];
}
// Load the saved page, narrow it to the provider list,
// and convert up to 3 provider cards to Markdown.
$companyGen = Webpage::withScraper('none')
    ->get(BasePath::get('examples/A05_Extras/WebToObjects/companies.html'))
    ->select('.directory-providers__list')
    ->selectMany(
        selector: '.provider-card',
        callback: fn($item) => $item->asMarkdown(),
        limit: 3
    );
$companies = [];
echo "Extracting company data from:\n\n";
foreach($companyGen as $companyDiv) {
    /** @var string $companyDiv */
    echo " > " . substr($companyDiv, 0, 32) . "...\n\n";
    // Ask the LLM to map the card's Markdown onto a Company object.
    $company = new StructuredOutput(
        StructuredOutputRuntime::fromProvider(LLMProvider::using('openai'))
            ->withOutputMode(OutputMode::Json)
    )
        ->with(
            messages: $companyDiv,
            responseModel: Company::class,
        )->get();
    $companies[] = $company;
    dump($company);
}
assert(count($companies) === 3);
?>
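The Instructions attribute on $websiteUrl asks the model to remove tracking parameters, which is a best-effort request rather than a guarantee. If a deterministic check is wanted after extraction, something along these lines could serve as a fallback. The helper name stripTrackingParams and the list of tracking keys are assumptions made for this sketch; neither is part of Instructor.

```php
<?php
// Hypothetical helper (not part of Instructor): removes common tracking
// parameters from a URL and leaves everything else in place.
function stripTrackingParams(string $url): string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // leave unparseable or relative URLs untouched
    }

    // Split the query string into key/value pairs.
    $query = [];
    parse_str($parts['query'] ?? '', $query);

    // Assumed list of tracking keys: utm_* plus a few click identifiers.
    $clean = array_filter(
        $query,
        fn($key) => !str_starts_with((string) $key, 'utm_')
            && !in_array((string) $key, ['gclid', 'fbclid'], true),
        ARRAY_FILTER_USE_KEY
    );

    // Reassemble the URL from its remaining parts.
    $result = ($parts['scheme'] ?? 'https') . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    if ($clean !== []) {
        $result .= '?' . http_build_query($clean);
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

$cleaned = stripTrackingParams('https://example.com/path?utm_source=x&ref=1&gclid=abc');
```

It could be applied to each extracted object, for example $company->websiteUrl = stripTrackingParams($company->websiteUrl), before the results are stored.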