Large language model training method, device and electronic equipment

By having a large language model attach number labels and number sources to its output, and by using the reward mechanism of reinforcement learning, the method addresses inaccurate number output and improves the accuracy of the numbers the model generates.

CN120632448B · Active · Publication Date: 2026-06-16 · BEIJING SANKUAI CLOUD COMPUTING TECH CO LTD

2 Cites · 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING SANKUAI CLOUD COMPUTING TECH CO LTD
Filing Date: 2025-05-28
Publication Date: 2026-06-16

Application Information

Patent Timeline

28 May 2025: Application
16 Jun 2026: Publication

CN120632448B

IPC: G06F18/214; G06F18/241; G06F40/284; G06F16/334; G06N5/04

CPC: G06F18/214; G06F18/241; G06F40/284; G06F16/3344; G06N5/041

AI Tagging

Application Domain

Digital data information retrieval; Natural language data processing


Similar Technology Patents

Unified data & analytics system and method using virtualized data access
US20260161648A1 · Visual data mining; Structured data browsing
Digital twin system construction method and device, electronic equipment and storage medium
CN122263947A · Lower build costs; Reduce the difficulty of building; Digital data information retrieval; Biological models; Computer hardware; Systems analysis
An agent-based urban digital twin interaction method
CN122195320A · Digital data information retrieval; Inference methods
A method and system for personalizing a self-service meal ordering package
CN122199101A · Mathematical models; Digital data information retrieval
A product public opinion report generation method, device and system
CN122196278A · Digital data information retrieval; Office automation


AI Technical Summary

Technical Problem

Large language models suffer from hallucination in numerical output: the numbers they generate may be unrelated to, or inconsistent with, the original text. The accuracy of numbers may also be sacrificed during reinforcement learning alignment.

Method used

A system prompt is set for the large language model so that its output carries preset number labels and number sources. The model is then trained by reinforcement learning with proximal policy optimization (PPO) or group relative policy optimization (GRPO), using a training reward formed from the first and second reward values, to improve the accuracy of its numerical output.
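As a rough illustration of how the two reward values might feed the group relative variant, the sketch below combines them with a weighted sum and normalizes the combined rewards within one group of sampled outputs. The weights, the helper names `combine_rewards` and `group_relative_advantages`, and the use of the population standard deviation are assumptions made for illustration; the summary only states that the training reward is formed from the first and second reward values.

```python
from statistics import mean, pstdev


def combine_rewards(first, second, w_first=0.5, w_second=0.5):
    """Form one training reward from the two reward values.

    A weighted sum is an assumption; the source does not give the formula.
    """
    return w_first * first + w_second * second


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within a group of outputs sampled for one prompt.

    Each advantage is (reward - group mean) / (group std + eps), which is
    the usual group-relative normalization. eps avoids division by zero
    when every output in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two sampled outputs: one fully labeled and correct, one neither.
group = [combine_rewards(1.0, 1.0), combine_rewards(0.0, 0.0)]
advantages = group_relative_advantages(group)
```

Under these assumptions the better output receives a positive advantage and the worse one a negative advantage, and a group whose outputs all score the same contributes zero advantage.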

Benefits of technology

It improves the accuracy of the numbers output by large language models, reduces hallucination, and achieves this improvement in numerical output at low cost.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN120632448B_ABST

Patent Text Reader

Abstract

The present disclosure provides a large language model training method, device and electronic equipment. The method comprises: setting control information for a large language model through the system prompt of the large language model; inputting training data to the large language model, obtaining its output data, and extracting N numbers from the output data; obtaining a first reward value according to M, the count of numbers among the N numbers that carry a preset number label and a number source; for each of the M numbers carrying the preset number label and a number source, determining the standard value corresponding to the number according to its number source, and obtaining a second reward value according to the result of comparing the standard value with the number; and performing reinforcement learning training on the large language model using a proximal policy optimization method or a group relative policy optimization method, wherein the training reward value in either method is formed from the first reward value and the second reward value. The present disclosure can improve the accuracy of the numbers generated by the large language model.
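The reward steps in the abstract can be sketched roughly as follows. Everything concrete here is an assumption for illustration, not the patent's specification: the `<num src="...">` tag standing in for the "preset number label" and "number source", the `reference` dictionary mapping each source to its standard value, the ratio M/N as the first reward, and the fraction of labeled numbers matching their standard value as the second reward.

```python
import re

# Hypothetical label format: <num src="SOURCE_KEY">VALUE</num>
TAG_RE = re.compile(r'<num src="([^"]+)">\s*(-?\d+(?:\.\d+)?)\s*</num>')
NUM_RE = re.compile(r'-?\d+(?:\.\d+)?')


def score_output(output, reference, tol=1e-6):
    """Return (first_reward, second_reward) for one model output.

    first_reward:  M / N, the share of numbers carrying a label and source.
    second_reward: share of labeled numbers equal to the standard value
                   looked up through their source (within tol).
    Both formulas are assumptions; the source only says the rewards are
    derived from M and from the comparison result.
    """
    tagged = [(src, float(val)) for src, val in TAG_RE.findall(output)]
    # Strip the tags so digits inside attributes are not counted as numbers.
    plain = TAG_RE.sub(lambda m: m.group(2), output)
    n = len(NUM_RE.findall(plain))
    m = len(tagged)
    first = m / n if n else 0.0  # assumption: no numbers yields zero reward
    correct = sum(
        1 for src, val in tagged
        if src in reference and abs(val - reference[src]) <= tol
    )
    second = correct / m if m else 0.0
    return first, second


output = ('Revenue was <num src="revenue">1250</num> and staff grew to 300, '
          'margin <num src="margin">0.2</num>.')
reference = {"revenue": 1250.0, "margin": 0.25}
first, second = score_output(output, reference)
```

In this example three numbers are extracted, two of them are labeled, and one of the labeled numbers matches its standard value, so the sketch yields a first reward of 2/3 and a second reward of 0.5.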
