Large language model training method, device and electronic equipment
By configuring a large language model to attach numerical labels and numerical sources to its output, and by applying the reward mechanism of reinforcement learning, the method addresses inaccurate numerical output and improves the accuracy of the numbers the model generates.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- BEIJING SANKUAI CLOUD COMPUTING TECH CO LTD
- Filing Date
- 2025-05-28
- Publication Date
- 2026-06-16
AI Technical Summary
Large language models suffer from hallucination in numerical output: generated numbers may be unrelated to, or inconsistent with, the original text. Numerical accuracy may also be sacrificed during reinforcement learning alignment.
A system prompt is set for the large language model so that its output carries preset numerical labels and numerical sources. During training, proximal policy optimization (PPO) or group relative policy optimization (GRPO) is combined with a first reward value and a second reward value to perform reinforcement learning, thereby optimizing the accuracy of the model's numerical output.
The method improves the accuracy of the numbers output by large language models, reduces hallucination, and achieves this improvement in numerical output at low cost.
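The summary does not define the two reward values or the label format, so the following Python sketch is only one plausible reading, not the patented method. It assumes a hypothetical tag `<num src="...">...</num>` for labelled numbers, treats the first reward as a format check (do numbers carry a label and source?) and the second as a grounding check (does the quoted source appear in the original text and contain the number?), and then computes group-relative advantages in the GRPO style by normalising rewards within a group of sampled outputs.

```python
import re
from statistics import mean, pstdev

# Hypothetical label format; the patent summary does not specify one.
NUM_TAG = re.compile(r'<num src="([^"]*)">([^<]*)</num>')
BARE_NUM = re.compile(r"\d+(?:\.\d+)?")


def format_reward(output: str) -> float:
    """Assumed 'first reward': share of numbers that carry a label and source."""
    tagged = NUM_TAG.findall(output)
    # Remove tagged spans, then count any numbers left without a label.
    untagged = BARE_NUM.findall(NUM_TAG.sub(" ", output))
    total = len(tagged) + len(untagged)
    return 1.0 if total == 0 else len(tagged) / total


def grounding_reward(output: str, original: str) -> float:
    """Assumed 'second reward': share of labelled numbers whose quoted
    source occurs in the original text and contains the number."""
    tagged = NUM_TAG.findall(output)
    if not tagged:
        return 0.0
    grounded = sum(
        1 for src, val in tagged
        if src and src in original and val.strip() in src
    )
    return grounded / len(tagged)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: standardise each reward within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


# Illustrative data, invented for this sketch.
original = "Revenue rose to 120 million yuan in 2023."
samples = [
    'Revenue was <num src="120 million yuan">120</num> million yuan.',
    'Revenue was <num src="150 million yuan">150</num> million yuan.',
    "Revenue was 120 million yuan.",
]
rewards = [format_reward(s) + grounding_reward(s, original) for s in samples]
advantages = group_relative_advantages(rewards)
```

In this toy group, the labelled and grounded output receives a positive advantage, the labelled but ungrounded output sits at the group mean, and the unlabelled output receives a negative advantage. Summing the two rewards with equal weight is itself an assumption; a real setup might weight them differently.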
Smart Images

Figure CN120632448B_ABST