Data classification method based on combination of LLM and PLM

By combining large language models (LLMs) with pre-trained language models (PLMs), and by employing data augmentation and a classification knowledge base, the method solves the problems of the high cost of constructing training datasets and the low accuracy of multi-level, multi-label classification of government service data, achieving efficient, low-cost multi-level label classification.

CN118227789B · Active · Publication Date: 2026-06-26 · UNIV OF SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: UNIV OF SCI & TECH OF CHINA
Filing Date: 2024-03-19
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for high-precision, multi-level, multi-label classification of government service data suffer from the high cost of constructing training datasets, the difficulty of obtaining highly accurate classification results from an LLM, and information loss caused by the input-length limitation of PLMs.

Method used

The method combines a large language model (LLM) with a pre-trained language model (PLM). A high-quality training dataset is constructed through data augmentation and human intervention, and multi-level classification is achieved using a classification knowledge base and hierarchical prompts.
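The hierarchical-prompt idea can be sketched as a level-by-level walk down the label tree: each prompt offers the LLM only the child labels of the label chosen at the previous level, together with their knowledge-base descriptions. This is a minimal sketch under stated assumptions, not the patent's implementation: the label tree, the knowledge-base entries, the prompt wording, and the `llm` callable (any function that maps a prompt string to a reply string) are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical label tree: parent label -> child labels.
# In the patent, the multi-level label list is published by an authoritative organization.
LABEL_TREE: Dict[str, List[str]] = {
    "ROOT": ["Social Security", "Transportation"],
    "Social Security": ["Pensions", "Unemployment Insurance"],
    "Transportation": ["Vehicle Registration", "Road Permits"],
}

# Hypothetical classification knowledge base: label -> short description.
KNOWLEDGE_BASE: Dict[str, str] = {
    "Social Security": "Matters concerning social insurance and welfare.",
    "Transportation": "Matters concerning vehicles and roads.",
    "Pensions": "Applications for or enquiries about retirement pensions.",
    "Unemployment Insurance": "Claims for unemployment benefits.",
    "Vehicle Registration": "Registering, transferring or licensing vehicles.",
    "Road Permits": "Permits for using or occupying roads.",
}


def build_level_prompt(event: str, candidates: List[str], kb: Dict[str, str]) -> str:
    """Build a prompt (hypothetical wording) asking for one label from a single level."""
    options = "\n".join(f"- {label}: {kb.get(label, '')}" for label in candidates)
    return (
        "Classify the government service event into exactly one of the labels below.\n"
        f"Labels:\n{options}\n"
        f"Event: {event}\n"
        "Answer with the label name only."
    )


def classify_hierarchically(
    event: str,
    tree: Dict[str, List[str]],
    kb: Dict[str, str],
    llm: Callable[[str], str],
) -> List[str]:
    """Walk the label tree from the root, issuing one prompt per level."""
    path: List[str] = []
    node = "ROOT"
    while node in tree:
        candidates = tree[node]
        answer = llm(build_level_prompt(event, candidates, kb)).strip()
        if answer not in candidates:
            break  # reject labels outside the offered list; keep the deepest valid path
        path.append(answer)
        node = answer
    return path
```

Offering only one level's candidates per prompt keeps each option list short, and rejecting answers that are not among the offered labels prevents the LLM from returning labels outside the authoritative list.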

Benefits of technology

It reduces the cost of building training datasets, improves classification accuracy, overcomes the PLM input-length limitation, and achieves high-precision multi-level label classification.

✦ Generated by Eureka AI based on patent content.

Figure CN118227789B_ABST

Abstract

The application discloses a data classification method based on a combination of an LLM and a PLM, and relates to the technical field of data classification. The training process of the target classification model is as follows:
S1: construct a training dataset;
S2: train a PLM on the seed data in the training set to build a classification "small model";
S3: construct a classification knowledge base, and input the classification knowledge base, a multi-level label list published by an authoritative organization, and the event information in a selected dataset into an LLM to obtain classification label result A;
S4: input the event information in the selected dataset into the PLM for multi-level classification, which outputs classification label result B;
S5: determine whether classification label results A and B are consistent, and output a final classification label based on that determination;
S6: output all classification labels based on the superior (higher-level) label list corresponding to the final classification label.
The data classification method improves the precision of label classification.
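Steps S5 and S6 can be sketched as two small functions: one that accepts a label when the LLM result A and the PLM result B agree, and one that expands the final label into the chain of its superior labels. The abstract does not say how an inconsistent pair is resolved, so the `resolve` callback (for example, routing the case to human review) is a hypothetical placeholder, as are the `PARENT` map and the label names.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical parent map: label -> its superior (higher-level) label.
# Top-level labels map to None.
PARENT: Dict[str, Optional[str]] = {
    "Social Security": None,
    "Pensions": "Social Security",
    "Basic Pension Claims": "Pensions",
}


def final_label(label_a: str, label_b: str, resolve: Callable[[str, str], str]) -> str:
    """S5 sketch: accept the label if LLM result A and PLM result B agree;
    otherwise defer to a hypothetical resolver (e.g. human review)."""
    if label_a == label_b:
        return label_a
    return resolve(label_a, label_b)


def expand_with_superiors(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """S6 sketch: return the label plus all of its superior labels,
    ordered from the top level down to the label itself."""
    chain: List[str] = []
    current: Optional[str] = label
    while current is not None:
        if current in chain:
            raise ValueError("cycle in label hierarchy")
        chain.append(current)
        current = parent.get(current)  # unknown labels are treated as top-level
    return list(reversed(chain))
```

For example, a final label of "Basic Pension Claims" expands to the full path from "Social Security" down through "Pensions", so every level of the multi-level label is output.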