Machine learning device, inference device, machine learning method, recording medium, and method for generating trained model

WO2026140531A1 · PCT designated stage · Publication Date: 2026-07-02 · ENEOS HLDG INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: ENEOS HLDG INC
Filing Date: 2025-11-07
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Existing reinforcement learning methods using neural networks are inefficient and inaccurate in determining optimal actions due to insufficient consideration of preferable actions during the learning process.

Method used

A machine learning device performs reinforcement learning by acquiring the current state of the environment, determining the agent's action from a probability distribution over actions, estimating how a predetermined evaluation index would change if the action were changed, calculating a loss from the probability distribution and that estimated change, and updating the learning model based on the loss, with the aim of improving the efficiency and accuracy of learning.

Benefits of technology

Refining the probability distribution so that it aligns with favorable actions makes learning more efficient and accurate, which in turn improves the performance of the learning model.

✦ Generated by Eureka AI based on patent content.

Abstract

This machine learning device performs reinforcement learning on a trained model that, given the state of an environment (E1), produces output pertaining to an action of an agent (A1). The device comprises a memory in which a program is stored and a processor that executes the program stored in the memory. The processor acquires the current state of the environment (E1) as a first state; determines an action of the agent (A1) on the basis of a first probability distribution, which is the probability distribution over a plurality of actions of the agent (A1) for the first state and is obtained using the trained model; infers how a prescribed evaluation index would change if the action were changed; calculates a loss on the basis of the first probability distribution and the inferred change in the evaluation index; and updates the trained model on the basis of the loss.
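
The processing loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the tabular softmax policy `theta`, the toy `evaluate_index` function, the choice of loss (the negative expected change of the index under the first probability distribution), and the learning rate. The abstract does not disclose the actual loss formula or model architecture.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def evaluate_index(state, action):
    # Hypothetical evaluation index (higher is better); in this toy
    # environment the preferable action for a state is state % 3.
    return 1.0 if action == state % 3 else 0.0

def train_step(theta, state, rng, lr=0.5):
    # First probability distribution over actions for the first state.
    probs = softmax(theta[state])
    # Determine the agent's action from that distribution.
    action = rng.choice(len(probs), p=probs)
    # Inferred change of the index if the action were changed to each alternative.
    base = evaluate_index(state, action)
    delta = np.array([evaluate_index(state, a) - base
                      for a in range(len(probs))])
    # Assumed loss: negative expected change of the index under probs.
    loss = -np.dot(probs, delta)
    # Analytic gradient of that loss with respect to the logits theta[state].
    grad = -probs * (delta - np.dot(probs, delta))
    # Update the model on the basis of the loss.
    theta[state] -= lr * grad
    return action, loss

def train(n_states=4, n_actions=3, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, n_actions))  # toy "trained model"
    for _ in range(steps):
        state = rng.integers(n_states)       # acquire the current state
        train_step(theta, state, rng)
    return theta
```

Calling `train()` and inspecting `softmax(theta[s])` for each state `s` shows the probability mass shifting toward the action with the best index value. In this sketch the loss depends on both the distribution and the inferred change, so actions that would improve the index are reinforced even when they were not the ones sampled.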
