Data classification method based on combination of LLM and PLM

By combining large language models (LLMs) with pre-trained language models (PLMs), and by using data augmentation and a classification knowledge base, the method addresses the high cost and low accuracy of building training datasets for multi-level, multi-label classification of government service data, enabling efficient, low-cost multi-level label classification.

CN118227789B · Active · Publication Date: 2026-06-26 · UNIV OF SCI & TECH OF CHINA

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patent (China)
Current Assignee / Owner: UNIV OF SCI & TECH OF CHINA
Filing Date: 2024-03-19
Publication Date: 2026-06-26

Application Information

Patent Timeline

19 Mar 2024 · Application
26 Jun 2026 · Publication (CN118227789B)

IPC: G06F16/353; G06F18/214; G06F18/15; G06F18/22; G06F18/24; G06N5/022; G06N5/04; G06F40/30

AI Tagging

Technology Topics

Data set; Bioinformatics


Similar Technology Patents

A three-dimensional spatial organ medical image feature extraction system and method for reducing false positives
JP7877550B1 · Image analysis; Sensors; Data set; Image code
Spectral equalization diffractive neural network for image classification
CN122242605A · Physical realisation; Data set; Grating
Intelligent selection method for soil heavy metal pollution remediation agent based on data fusion
CN121919821B · Environmental resource management; Data set
A traffic car customer service marketing method and system based on a large language model
CN120851956B · Accurate perception; Accurate quantitative analysis; Input/output for user-computer interaction; Biological models; Personalization; Data set
Distributed adaptive framing output method based on large-scale remote sensing images
CN121214183B · Reduce read volume; Avoid the dilemma of being idle; Resource allocation; Character and pattern recognition; Data set; Image resolution


AI Technical Summary

Technical Problem

Existing techniques for high-precision, multi-level, multi-label classification of government service data suffer from three problems: constructing training datasets is expensive, LLM classification results are hard to make accurate, and the PLM's input-length limit causes information loss.

Method used

By combining a Large Language Model (LLM) with a Pre-trained Language Model (PLM), the method builds a high-quality training dataset through data augmentation and human intervention, and performs multi-level classification using a classification knowledge base and hierarchical prompts.
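Hierarchical prompting of this kind can be sketched as a level-by-level descent through a label tree: at each level the LLM sees only the children of the label it chose at the previous level, together with their knowledge-base descriptions. This is a minimal illustration under stated assumptions, not the patent's implementation. The tree layout, the `"ROOT"` key, the prompt wording, and the names `build_prompt`, `classify_hierarchically`, and `ask_llm` are all hypothetical; `ask_llm` stands in for whatever LLM client is actually used.

```python
from typing import Callable, Dict, List

# Hypothetical label hierarchy: parent label -> child labels; "ROOT" holds the top level.
LabelTree = Dict[str, List[str]]


def build_prompt(event: str, candidates: List[str], knowledge: Dict[str, str]) -> str:
    """Build the prompt for one level: the event text plus each candidate label
    and its knowledge-base description. The wording here is an assumption."""
    lines = [f"Event: {event}", "Choose exactly one label from the list below:"]
    for label in candidates:
        lines.append(f"- {label}: {knowledge.get(label, '')}")
    lines.append("Answer with the label name only.")
    return "\n".join(lines)


def classify_hierarchically(
    event: str,
    tree: LabelTree,
    knowledge: Dict[str, str],
    ask_llm: Callable[[str], str],  # hypothetical stand-in for an LLM call
    root: str = "ROOT",
) -> List[str]:
    """Descend the label tree one level per LLM call, offering only the children
    of the label chosen at the previous level. Returns the chosen label path."""
    path: List[str] = []
    node = root
    while tree.get(node):  # stop once the current label has no children
        candidates = tree[node]
        answer = ask_llm(build_prompt(event, candidates, knowledge)).strip()
        if answer not in candidates:
            break  # off-list answer: keep only the levels already decided
        path.append(answer)
        node = answer
    return path
```

Restricting each prompt to one level's candidates keeps prompts short and makes off-list answers easy to detect; how the patent itself handles an invalid LLM answer is not stated, so stopping early is an assumption.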

Benefits of technology

The method lowers the cost of building training datasets, improves classification accuracy, works around the PLM input-length limitation, and achieves high-precision multi-level label classification.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN118227789B_ABST


Abstract

The application discloses a data classification method based on combining an LLM and a PLM, in the technical field of data classification. The target classification model is trained as follows:
S1. Construct a training data set.
S2. Train a PLM on the seed data in the training set to build a classification "small model".
S3. Construct a classification knowledge base, then input the knowledge base, a multi-level label list published by an authoritative organization, and the event information in a selected data set into an LLM to obtain classification label result A.
S4. Input the event information in the selected data set into the PLM for multi-level classification, which outputs classification label result B.
S5. Judge whether classification label results A and B are consistent, and output a final classification label based on that judgment.
S6. Output all classification labels based on the superior (parent-level) label list corresponding to the final classification label.
The method improves the precision of label classification.
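Steps S5 and S6 can be sketched as a consistency check followed by an upward walk through the label hierarchy. This is a minimal sketch under stated assumptions: each result A and B is taken to be a single finest-level label, the "superior label list" is modeled as a hypothetical child-to-parent map, and because the abstract does not say how disagreements between A and B are settled, they are passed to an optional, hypothetical `resolve_conflict` callback (returning `None` is read as "route to manual review"). All function names are illustrative.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical "superior label list": child label -> parent label (None at the top level).
ParentMap = Dict[str, Optional[str]]
Resolver = Callable[[str, str], Optional[str]]


def decide_final_label(
    label_a: str, label_b: str, resolve_conflict: Optional[Resolver] = None
) -> Optional[str]:
    """S5 sketch: accept the label when LLM result A and PLM result B agree.
    Disagreements go to an optional resolver (an assumption); None means
    no label was accepted automatically."""
    if label_a == label_b:
        return label_a
    if resolve_conflict is not None:
        return resolve_conflict(label_a, label_b)
    return None


def expand_with_superiors(final_label: str, parents: ParentMap) -> List[str]:
    """S6 sketch: follow the parent chain upward and return every label,
    top level first. Labels missing from the map are treated as top level."""
    chain: List[str] = []
    seen = set()
    label: Optional[str] = final_label
    while label is not None:
        if label in seen:
            raise ValueError(f"cycle in label hierarchy at {label!r}")
        seen.add(label)
        chain.append(label)
        label = parents.get(label)
    chain.reverse()
    return chain


def final_labels(
    label_a: str,
    label_b: str,
    parents: ParentMap,
    resolve_conflict: Optional[Resolver] = None,
) -> List[str]:
    """Combine the S5 and S6 sketches; an empty list means no label was accepted."""
    final = decide_final_label(label_a, label_b, resolve_conflict)
    return [] if final is None else expand_with_superiors(final, parents)
```

For example, with the hypothetical map `parents = {"Transport": None, "Road": "Transport", "Pothole repair": "Road"}`, `final_labels("Pothole repair", "Pothole repair", parents)` returns `["Transport", "Road", "Pothole repair"]`, while disagreeing inputs with no resolver return `[]`.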
