Data classification method based on combination of LLM and PLM
By combining a large language model (LLM) with a pre-trained language model (PLM), and by using data augmentation and a classification knowledge base, the method addresses the high cost and low accuracy of constructing training datasets for multi-level, multi-label classification of government service data, achieving efficient, low-cost multi-level label classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- UNIV OF SCI & TECH OF CHINA
- Filing Date
- 2024-03-19
- Publication Date
- 2026-06-26
AI Technical Summary
Existing techniques for high-precision multi-level, multi-label classification of government service data suffer from the high cost of constructing training datasets, the difficulty of obtaining accurate classification results directly from an LLM, and information loss caused by the PLM's input-length limit.
The method combines a large language model (LLM) with a pre-trained language model (PLM). A high-quality training dataset is constructed through data augmentation and human intervention, and multi-level classification is achieved using a classification knowledge base and hierarchical prompts.
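The summary does not disclose the prompt format or the structure of the classification knowledge base. As a minimal sketch, assuming the knowledge base is a label tree and the LLM is any function that maps a prompt to a list of chosen label names, hierarchical prompting can be read as querying the model one level at a time, offering only the children of labels already chosen. `LabelNode`, `build_level_prompt`, `classify_hierarchically`, the example labels and the keyword-matching `fake_llm` are all hypothetical illustrations, not the patent's actual method.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LabelNode:
    """One node of a hypothetical classification knowledge base (a label tree)."""
    name: str
    description: str = ""
    children: list["LabelNode"] = field(default_factory=list)


def build_level_prompt(text: str, path: list[str], candidates: list[LabelNode]) -> str:
    """Hypothetical prompt asking the LLM to choose labels at a single level."""
    lines = [
        "Classify the government service item below.",
        f"Parent labels already assigned: {' > '.join(path) or '(root)'}",
        "Return every applicable label from this list, one per line:",
    ]
    lines += [f"- {node.name}: {node.description}" for node in candidates]
    lines.append(f"Item: {text}")
    return "\n".join(lines)


def classify_hierarchically(
    text: str, root: LabelNode, ask_llm: Callable[[str], list[str]]
) -> list[list[str]]:
    """Walk the label tree level by level and return every selected label path."""
    results: list[list[str]] = []

    def descend(node: LabelNode, path: list[str]) -> None:
        if not node.children:  # reached a leaf label
            if path:
                results.append(path)
            return
        chosen = set(ask_llm(build_level_prompt(text, path, node.children)))
        # Keep only names that really exist at this level of the knowledge base.
        matched = [child for child in node.children if child.name in chosen]
        if not matched:
            if path:  # stop at the deepest level the model selected
                results.append(path)
            return
        for child in matched:
            descend(child, path + [child.name])

    descend(root, [])
    return results


# Hypothetical two-level label tree and a keyword-matching stand-in for a real LLM.
ROOT = LabelNode("root", children=[
    LabelNode("Social security", "pensions and insurance", [
        LabelNode("Pension", "retirement benefits"),
        LabelNode("Medical insurance", "health coverage"),
    ]),
    LabelNode("Transport", "vehicles and roads", [
        LabelNode("Driving licence", "licence issue and renewal"),
    ]),
])


def fake_llm(prompt: str) -> list[str]:
    """Picks every candidate label whose name appears in the item text."""
    item = prompt.rsplit("Item: ", 1)[1].lower()
    names = [line[2:].split(":", 1)[0] for line in prompt.splitlines() if line.startswith("- ")]
    return [name for name in names if name.lower() in item]


print(classify_hierarchically("Apply for a pension under social security", ROOT, fake_llm))
# → [['Social security', 'Pension']]
```

Offering one level at a time keeps each candidate list short, and filtering the model's answer against the knowledge base (`child.name in chosen`) discards invented labels; a real system would replace `fake_llm` with an actual model call.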
The method reduces the cost of building training datasets, improves classification accuracy, overcomes the PLM input-length limitation, and achieves high-precision multi-level label classification.
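The summary states that the PLM length limitation is overcome but not how. One common, generic workaround (not confirmed as the patent's approach) is to split a long document into overlapping windows, score each window with the PLM classifier, and keep each label's maximum score. This sketch assumes whitespace tokens in place of a real subword tokenizer; `score_chunk` is a hypothetical stand-in for a fine-tuned PLM returning per-label probabilities, and the defaults `max_len=512`, `stride=384` and `threshold=0.5` are illustrative.

```python
from typing import Callable, Sequence


def sliding_windows(tokens: Sequence[str], max_len: int, stride: int) -> list[list[str]]:
    """Split tokens into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):  # this window reaches the end
            return windows
        start += stride


def classify_long_text(
    text: str,
    score_chunk: Callable[[str], dict[str, float]],
    max_len: int = 512,
    stride: int = 384,
    threshold: float = 0.5,
) -> set[str]:
    """Multi-label classification of text longer than the model's input limit."""
    tokens = text.split()  # assumption: whitespace tokens stand in for a subword tokenizer
    best: dict[str, float] = {}
    for window in sliding_windows(tokens, max_len, stride):
        for label, score in score_chunk(" ".join(window)).items():
            # Max-pool: a label counts if any window supports it strongly.
            best[label] = max(best.get(label, 0.0), score)
    return {label for label, score in best.items() if score >= threshold}


def fake_plm(chunk: str) -> dict[str, float]:
    """Hypothetical stand-in for a fine-tuned PLM returning per-label probabilities."""
    return {"Pension": 0.9 if "pension" in chunk else 0.1}


long_item = " ".join(["filler"] * 1000 + ["pension"])
print(classify_long_text(long_item, fake_plm))  # → {'Pension'}
```

Simple truncation to the first 512 tokens would miss the final "pension" token here; max-pooling lets a label fire when any part of a long document supports it, which suits multi-label output.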
Figure CN118227789B_ABST