Importance sampling ratio aggregation method, electronic device, medium and product

CN121543044B · Active · Publication Date: 2026-06-23 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date
2026-01-21
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing approaches, the aggregated importance sampling ratio is easily dominated by extreme per-position ratios, so the positions where the task model is weak are effectively ignored. The resulting model appears to converge but is inherently fragile, which lowers its task success rate, especially in long-link inference scenarios.
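To make the problem concrete, the sketch below shows how a plain arithmetic mean can hide a weak position. The ratio values and the function name `arithmetic_aggregate` are hypothetical and chosen only for illustration; the patent text does not state which aggregation the existing approaches use.

```python
from statistics import mean
from typing import Sequence


def arithmetic_aggregate(ratios: Sequence[float]) -> float:
    """Arithmetic mean of per-position importance sampling ratios."""
    return mean(ratios)


# Hypothetical ratios for a four-position sequence: one extreme value (8.0)
# and one very weak position (0.05).
ratios = [1.0, 1.1, 0.05, 8.0]

aggregated = arithmetic_aggregate(ratios)
weakest = min(ratios)

# The aggregate is well above 1 even though one position has collapsed,
# so a loss built on this value gives little signal about the weak position.
print(f"arithmetic aggregate = {aggregated:.4f}, weakest position = {weakest}")
```

Here the single large ratio pulls the aggregate far above the weakest position, which is the masking effect the summary describes.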

Method used

An aggregation model, such as a harmonic mean formula, aggregates the candidate word positions whose importance sampling ratios fall below a threshold, which increases the contribution of those positions to the aggregated value. Weak points are identified from the behavior change information and reinforced during training, which repairs the weaknesses of the task model.
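One possible reading of this step is sketched below: keep the positions whose ratio is below the threshold and take their harmonic mean. The function name `harmonic_aggregate`, the fallback to all positions when nothing is below the threshold, and the sample values are assumptions made for illustration, not details taken from the patent.

```python
from typing import Sequence


def harmonic_aggregate(ratios: Sequence[float], threshold: float) -> float:
    """Harmonic mean over positions whose ratio is below `threshold`.

    Assumption: if no position is below the threshold, all positions are used.
    """
    selected = [r for r in ratios if r < threshold]
    if not selected:
        selected = list(ratios)
    if not selected:
        raise ValueError("ratios must not be empty")
    if any(r <= 0 for r in selected):
        raise ValueError("importance sampling ratios must be positive")
    # Harmonic mean: n divided by the sum of reciprocals, so small ratios
    # contribute large reciprocals and pull the result toward themselves.
    return len(selected) / sum(1.0 / r for r in selected)


# Hypothetical ratios: two weak positions (0.5, 0.25) and two strong ones.
ratios = [0.5, 0.25, 1.2, 4.0]
value = harmonic_aggregate(ratios, threshold=1.0)
print(f"harmonic aggregate of weak positions = {value:.4f}")
```

Because the harmonic mean is driven by reciprocals, the smallest ratios carry the most weight, which matches the stated goal of raising the contribution of weak positions.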

Benefits of technology

The method improves the success rate of task models in long-link inference scenarios and enhances model accuracy and stability, with image analysis and code generation tasks named as particularly strong cases.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN121543044B_ABST
Patent Text Reader

Abstract

The application discloses an importance sampling ratio aggregation method, an electronic device, a medium and a product, and relates to the technical field of reinforcement learning. The method comprises the following steps. First, for the same target input condition, first probability information of a first sequence output by the current task model and second probability information of a second sequence output by a historical task model are acquired, together with the target aggregation model currently in use. Next, the first and second probability information are compared to determine behavior change information of the current task model relative to the historical task model. When the target aggregation model is a first aggregation model, it is used directly to aggregate the importance sampling ratios of at least one word position included in the behavior change information, yielding a target aggregation value from which a loss value is calculated. In this way, weak points of the task model can be repaired during training, and the success rate of the task model in executing tasks can be improved.
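The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the probability information is taken to be per-position log-probabilities of the same tokens under both models, the behavior change information is taken to be the per-position ratios, the harmonic mean stands in for the "first aggregation model", and the loss is a simple policy-gradient-style surrogate. The names `importance_ratios`, `surrogate_loss` and `advantage` are hypothetical.

```python
import math
from statistics import harmonic_mean
from typing import Callable, List, Sequence


def importance_ratios(
    current_logps: Sequence[float], historical_logps: Sequence[float]
) -> List[float]:
    """Per-position ratio: current probability / historical probability."""
    if len(current_logps) != len(historical_logps):
        raise ValueError("both sequences must cover the same positions")
    return [math.exp(c - h) for c, h in zip(current_logps, historical_logps)]


def surrogate_loss(
    current_logps: Sequence[float],
    historical_logps: Sequence[float],
    advantage: float,
    aggregate: Callable[[Sequence[float]], float] = harmonic_mean,
) -> float:
    """Assumed loss form: negative aggregated ratio times the advantage."""
    ratios = importance_ratios(current_logps, historical_logps)
    return -aggregate(ratios) * advantage


# Hypothetical two-position example: the current model doubled the
# probability at position 0 and halved it at position 1.
current = [math.log(0.5), math.log(0.2)]
historical = [math.log(0.25), math.log(0.4)]

print(importance_ratios(current, historical))
print(surrogate_loss(current, historical, advantage=1.0))
```

Passing the aggregator as a parameter mirrors the abstract's notion of a selectable target aggregation model: a different callable could be supplied without changing the rest of the sketch.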