Neural network training method, neural network training architecture, and program

WO2026133890A1 · PCT designated stage · Publication Date: 2026-06-25 · WASEDA UNIV

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: WASEDA UNIV
Filing Date: 2025-11-27
Publication Date: 2026-06-25

Application Information

IPC: G06N3/084; G06N3/09

AI Tagging

Application Domain

Biological models

Technology Topics

Network on · Data acquisition

Figure JP2025041345_25062026_PF_FP_ABST

Abstract

According to one aspect of the present disclosure, a neural network training method is provided, comprising: a training data acquisition step of acquiring data to be input to a neural network and a teacher probability distribution accompanying the data; and an output error minimization step of calculating, on the basis of an error between the teacher probability distribution and a pseudo-probability distribution that the neural network outputs from the data, a gradient for use in a procedure for minimizing the error, such that the minimization can reflect the properties of the teacher probability distribution, and updating each of a plurality of parameters using the calculated gradient.
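The two steps named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the single linear layer with a softmax output, the names `softmax`, `kl_divergence`, and `train_step`, the learning rate, and the use of plain Kullback-Leibler divergence as the error are all hypothetical choices made for the example. The abstract itself does not fix the model or the error measure.

```python
import math

def softmax(z):
    """Turn raw outputs into a pseudo-probability distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl_divergence(p, q):
    """KL(p || q); used here only as a stand-in error measure (assumption)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def train_step(W, x, p, lr=0.5):
    """One update of a hypothetical linear-softmax model.

    W: list of weight rows (one per class), x: input data,
    p: teacher probability distribution acquired together with x.
    """
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    q = softmax(z)                      # pseudo-probability distribution
    loss = kl_divergence(p, q)          # error between teacher and output
    # For a softmax output and sum(p) == 1, d loss / d z_i = q_i - p_i.
    grad_z = [qi - pi for qi, pi in zip(q, p)]
    # Update every parameter with its gradient component.
    new_W = [[w - lr * g * xj for w, xj in zip(row, x)]
             for row, g in zip(W, grad_z)]
    return new_W, loss

# Training data acquisition step: data plus its teacher distribution.
x = [1.0, -1.0]
p = [0.8, 0.1, 0.1]

# Output error minimization step, repeated a few times.
W = [[0.0, 0.0] for _ in range(3)]
losses = []
for _ in range(20):
    W, loss = train_step(W, x, p)
    losses.append(loss)
```

In this sketch the recorded `losses` shrink from one step to the next, which is the only behavior it is meant to show; any divergence-specific gradient would replace the `grad_z` line.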

Claims

1. A method for training a neural network, comprising the following steps: a training data acquisition step, in which data to be input to the neural network and a teacher probability distribution associated with the data are acquired; and an output error minimization step, in which, based on the error between the teacher probability distribution and a pseudo-probability distribution output from the neural network based on the data, a gradient used in the procedure for minimizing the error is calculated so as to perform minimization that can reflect the properties of the teacher probability distribution, and each of a plurality of parameters is updated with the calculated gradient.

2. A method for training a neural network according to claim 1, wherein the teacher probability distribution and the pseudo-probability distribution are probability vectors consisting of probability values for each class.

3. A method for training a neural network according to claim 1 or claim 2, wherein minimization that reflects the properties of the teacher probability distribution means minimizing a loss defined such that each component of the gradient includes at least a coefficient that depends on the teacher probability value of the class corresponding to each component, or a coefficient based on the difference between the teacher probability distribution and the pseudo-probability distribution.

4. A training method according to any one of claims 1 to 3, wherein the output error minimization step involves calculating an alpha divergence based on the error and calculating the gradient such that the alpha divergence is minimized.

5. A training method according to claim 4, wherein the alpha divergence is defined to include the product of the i-th training data and the i-th output node (where i is any natural number).

6. A training method according to claim 4 or claim 5, wherein, in the output error minimization step, the gradient is calculated such that the alpha divergence is minimized with a parameter α satisfying α > -1, where α is a parameter that adjusts the alpha divergence.

7. A training method according to any one of claims 4 to 6, wherein the alpha divergence comprises a plurality of alpha divergences, and in the output error minimization step, the gradient is calculated by selecting one of the plurality of alpha divergences according to the values of the plurality of alpha divergences.

8. A training method according to claim 7, wherein in the output error minimization step, the gradient is calculated by selecting the alpha divergence with the largest value from among the plurality of alpha divergences.

9. A training method according to any one of claims 4 to 8, wherein in the training data acquisition step, the teacher probability distribution is configured such that only a single component of the teacher vector is nonzero and the other components are zero, and in the output error minimization step, a gradient that reflects the output position corresponding to the single component is calculated.

10. A training method according to any one of claims 4 to 8, wherein in the training data acquisition step, a second matrix different from a first matrix is applied as the teacher probability distribution, where the first matrix is a matrix in which only one component of the teacher vector is the total probability and the other components are zero, and the second matrix is a matrix in which a portion of the total probability of the single component of the first matrix is distributed to the other components, and in the output error minimization step, a gradient that reflects the output position corresponding to the single component is calculated.

11. A training method according to any one of claims 4 to 8, wherein in the training data acquisition step, a third matrix different from both a first matrix and a second matrix is applied as the teacher probability distribution, where the first matrix is a matrix in which only one component of the teacher vector is the total probability and the other components are zero, the second matrix is a matrix in which a portion of the total probability of the single component of the first matrix is distributed to the other components, and the third matrix is a matrix obtained by quenching or annealing at least the second matrix, and in the output error minimization step, a gradient that reflects the output position corresponding to the single component is calculated.

12. A training method according to any one of claims 4 to 11, wherein in the output error minimization step, processing by an optimizer is performed before or after calculating the gradient based on the alpha divergence.

13. A neural network training architecture comprising a processor capable of executing a program so that each step of the training method according to any one of claims 1 to 12 is performed.

14. A program for causing at least one computer to perform each step of the neural network training method described in any one of claims 1 to 12.
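As an illustration of how claims 3 to 10 might fit together, the following Python sketch computes an alpha divergence between a smoothed teacher vector and a softmax output, its gradient with respect to the pre-softmax outputs, and a selection of the largest of several alpha divergences. Everything specific here is an assumption for the example: the claims do not state a formula, so the sketch uses one common convention, D_α(p‖q) = (Σ_i p_i^α q_i^(1-α) - 1) / (α(α - 1)), and the patent's own definition and sign convention for α may differ. The function names, the smoothing amount `eps`, and the candidate values of α are hypothetical.

```python
import math

def softmax(z):
    """Turn raw outputs into a pseudo-probability distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def smooth_one_hot(k, n, eps):
    """Teacher vector: one-hot at class k, with a share eps of the
    total probability spread evenly over the other n - 1 classes."""
    return [1.0 - eps if i == k else eps / (n - 1) for i in range(n)]

def alpha_divergence(p, q, alpha):
    """Assumed form: (sum_i p_i^alpha * q_i^(1-alpha) - 1) / (alpha*(alpha-1)).

    Each summand is a product of a teacher component and the matching
    output component. alpha = 0 and alpha = 1 are KL limits and are not
    handled; zero teacher entries additionally need alpha > 0.
    """
    if alpha in (0.0, 1.0):
        raise ValueError("alpha = 0 and alpha = 1 are not handled in this sketch")
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha * (alpha - 1.0))

def alpha_divergence_grad(p, z, alpha):
    """Gradient of the assumed divergence w.r.t. the pre-softmax outputs z.

    Component j is (q_j * S - p_j^alpha * q_j^(1-alpha)) / alpha, so it
    carries a coefficient that depends on the teacher value p_j.
    """
    q = softmax(z)
    r = [pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q)]
    s = sum(r)
    return [(qi * s - ri) / alpha for qi, ri in zip(q, r)]

def select_largest(p, z, alphas):
    """Evaluate several alpha divergences and return the alpha with the
    largest value together with the gradient of that divergence."""
    q = softmax(z)
    values = {a: alpha_divergence(p, q, a) for a in alphas}
    best = max(values, key=values.get)
    return best, alpha_divergence_grad(p, z, best)

# Hypothetical usage: 3 classes, true class 0, 10% of the mass smoothed away.
p = smooth_one_hot(k=0, n=3, eps=0.1)
z = [0.2, 1.0, -0.5]
best_alpha, grad = select_largest(p, z, alphas=[0.5, 2.0])
```

The returned `grad` could then be passed through the network by backpropagation and handed to any optimizer; that part is omitted because the claims leave the optimizer open.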