Text Classification using LLM

This tutorial shows how to implement single-label and multi-label text classification using an LLM (via the OpenAI API) together with PHP enums and classes.

Motivation

Text classification is a common problem in many NLP applications, such as spam detection or support ticket categorization. The goal is to provide a systematic way to handle these cases using language models in combination with PHP data structures.

Single-Label Classification

Defining the Structures

For single-label classification, we first define an enum for possible labels and a PHP class for the output.

<?php
// Enumeration for single-label text classification. 
enum Label : string {
    case SPAM = "spam";
    case NOT_SPAM = "not_spam";
}

// Class for a single class label prediction. 
class SinglePrediction {
    public Label $classLabel;
}
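
Because Label is a string-backed enum, any raw string can be checked against the allowed set of labels. The sketch below uses only PHP's built-in tryFrom() and cases() methods. It does not show how Instructor itself maps the model's answer onto the class, and the value "maybe_spam" is just an illustrative out-of-set label.

```php
<?php
// Same enum as above, repeated so the snippet runs on its own.
enum Label : string {
    case SPAM = "spam";
    case NOT_SPAM = "not_spam";
}

// tryFrom() returns the matching case, or null for an unknown value.
$known = Label::tryFrom("spam");
$unknown = Label::tryFrom("maybe_spam"); // illustrative invalid label

// cases() lists every allowed label in declaration order.
$allowed = array_map(fn(Label $label) => $label->value, Label::cases());
```

Here $known is Label::SPAM, $unknown is null, and $allowed holds the two backing strings, which is the full set of answers the classifier is allowed to give.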

Classifying Text

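The classify function performs the single-label classification. It passes SinglePrediction as the response model, so the result comes back as a typed object rather than raw text.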

<?php
use Cognesy\Instructor\Instructor;

/**
 * Perform single-label classification on the input text. 
 */
function classify(string $data) : SinglePrediction {
    return (new Instructor())->respond(
        messages: [[
            "role" => "user",
            "content" => "Classify the following text: $data",
        ]],
        responseModel: SinglePrediction::class,
        model: "gpt-3.5-turbo-0613",
    );
}

Testing and Evaluation

Let's run an example to see if it correctly identifies a spam message.

<?php

// Test single-label classification
$prediction = classify("Hello there I'm a Nigerian prince and I want to give you money");
assert($prediction->classLabel == Label::SPAM);
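
A single assertion only checks one message. One possible way to evaluate more broadly is to measure accuracy over a small labeled set, as sketched below. The accuracy() helper, the keyword-based stand-in classifier, and the example messages are all hypothetical and exist only so the snippet runs without an API call.

```php
<?php
// Same enum as above, repeated so the snippet runs on its own.
enum Label : string {
    case SPAM = "spam";
    case NOT_SPAM = "not_spam";
}

/**
 * Fraction of examples for which the classifier returns the expected label.
 *
 * @param callable(string): Label $classifier
 * @param array<int, array{text: string, expected: Label}> $examples
 */
function accuracy(callable $classifier, array $examples) : float {
    if ($examples === []) {
        return 0.0; // avoid division by zero on an empty set
    }
    $correct = 0;
    foreach ($examples as $example) {
        if ($classifier($example["text"]) === $example["expected"]) {
            $correct++;
        }
    }
    return $correct / count($examples);
}

// Hypothetical stand-in for the LLM call so the sketch runs offline.
$keywordClassifier = fn(string $text) : Label =>
    str_contains(strtolower($text), "prince") ? Label::SPAM : Label::NOT_SPAM;

$examples = [
    ["text" => "I'm a Nigerian prince and I want to give you money", "expected" => Label::SPAM],
    ["text" => "Can we move our meeting to 3pm?", "expected" => Label::NOT_SPAM],
];

$score = accuracy($keywordClassifier, $examples);
```

To evaluate the real classifier, the stand-in could be swapped for a closure such as fn(string $text) => classify($text)->classLabel.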

Multi-Label Classification

Defining the Structures

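For multi-label classification, we define a new enum for the ticket labels and a class that holds an array of them. This enum reuses the name Label, so it has to live in a separate file or namespace from the single-label example; PHP does not allow the same enum to be declared twice.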

<?php
/** Potential ticket labels */
enum Label : string {
    case TECH_ISSUE = "tech_issue";
    case BILLING = "billing";
    case SALES = "sales";
    case SPAM = "spam";
    case OTHER = "other";
}

/** Represents analysed ticket data */
class Ticket {
    /** @var Label[] */
    public array $ticketLabels = [];
}
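
The @var Label[] annotation says that ticketLabels should contain Label cases rather than plain strings. As a rough illustration of that relationship, the sketch below converts raw strings into unique Label cases. The toLabels() helper is hypothetical and is not part of Instructor, which handles deserialization on its own; "refund" is an illustrative value outside the enum.

```php
<?php
// Same enum as above, repeated so the snippet runs on its own.
enum Label : string {
    case TECH_ISSUE = "tech_issue";
    case BILLING = "billing";
    case SALES = "sales";
    case SPAM = "spam";
    case OTHER = "other";
}

/**
 * Convert raw strings to unique Label cases, skipping unknown values.
 *
 * @param string[] $raw
 * @return Label[]
 */
function toLabels(array $raw) : array {
    $labels = [];
    foreach ($raw as $value) {
        $label = Label::tryFrom($value);
        // Keep only known labels, and keep each label once.
        if ($label !== null && !in_array($label, $labels, true)) {
            $labels[] = $label;
        }
    }
    return $labels;
}

$labels = toLabels(["billing", "tech_issue", "billing", "refund"]);
```

The result contains Label::BILLING and Label::TECH_ISSUE, in that order: the duplicate and the unknown value are dropped.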

Classifying Text

The multi_classify function performs multi-label classification using an LLM.

<?php
use Cognesy\Instructor\Instructor;

// Perform multi-label classification on the input text.
function multi_classify(string $data) : Ticket {
    return (new Instructor())->respond(
        messages: [[
            "role" => "user",
            "content" => "Classify the following support ticket: {$data}",
        ]],
        responseModel: Ticket::class,
        model: "gpt-3.5-turbo-0613",
    );
}

Testing and Evaluation

Finally, we test the multi-label classification function using a sample support ticket.

<?php
// Test multi-label classification
$ticket = "My account is locked and I can't access my billing info.";
$prediction = multi_classify($ticket);

assert(in_array(Label::TECH_ISSUE, $prediction->ticketLabels));
assert(in_array(Label::BILLING, $prediction->ticketLabels));
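
Checking for individual labels does not reveal whether the model also returned labels that should not be there. One possible refinement is to compare the predicted and expected label sets using precision and recall, as in the sketch below. The precisionRecall() helper and the sample label sets are hypothetical, and the helper assumes neither array contains duplicates.

```php
<?php
// Same enum as above, repeated so the snippet runs on its own.
enum Label : string {
    case TECH_ISSUE = "tech_issue";
    case BILLING = "billing";
    case SALES = "sales";
    case SPAM = "spam";
    case OTHER = "other";
}

/**
 * Precision and recall of predicted labels against expected labels.
 * Assumes neither array contains duplicate labels.
 *
 * @param Label[] $predicted
 * @param Label[] $expected
 * @return array{precision: float, recall: float}
 */
function precisionRecall(array $predicted, array $expected) : array {
    $hits = 0;
    foreach ($predicted as $label) {
        if (in_array($label, $expected, true)) {
            $hits++;
        }
    }
    return [
        // The float cast keeps the result a float even when it divides evenly.
        "precision" => $predicted === [] ? 0.0 : (float) $hits / count($predicted),
        "recall" => $expected === [] ? 0.0 : (float) $hits / count($expected),
    ];
}

// Illustrative prediction with one extra label.
$predicted = [Label::TECH_ISSUE, Label::BILLING, Label::OTHER];
$expected = [Label::TECH_ISSUE, Label::BILLING];
$scores = precisionRecall($predicted, $expected);
```

In this example every expected label was found, so recall is 1.0, while the extra OTHER label lowers precision to two thirds.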