Neural network device

JP2026110362A · Pending · Publication Date: 2026-07-02 · KNOWLEDGE SATELLITE CO LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
KNOWLEDGE SATELLITE CO LTD
Filing Date
2024-12-20
Publication Date
2026-07-02

Abstract

[Problem] To provide a neural network device that improves prediction accuracy. [Solution] The neural network device 100 includes a neural network processing unit 130 that performs computational processing of a trained neural network on input data and outputs output data for the input data. The activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, and the parameters α, β, and γ are optimized during training. [Math 1] (equation image not reproduced)

Description

Technical Field

[0001] This disclosure relates to a neural network device.

Background Art

[0002] In recent years, neural networks that mathematically model neurons in the human nervous system have attracted attention. A neural network is a mathematical model that includes one or more non-linear units and is a machine learning model that predicts an output corresponding to an input.

Prior Art Documents

Non-Patent Documents

[0003]

Non-Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] This disclosure has been made in view of these circumstances, and an exemplary object of one aspect thereof is to provide a neural network device with improved prediction accuracy.

Means for Solving the Problems

[0005] To solve the above problems, a neural network device according to an aspect of the present invention includes a neural network processing unit that executes arithmetic processing of a learned neural network on input data and outputs output data for the input data. The activation function f(x) of the neural network is represented by Equation (1) using parameters α, β, and γ, and the parameters α, β, and γ are optimized during learning.

[Math 1] (equation (1); equation image not reproduced)

[0006] Another aspect of the present invention is also a neural network device. This neural network device comprises a learning unit that learns a neural network. The activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, and the learning unit optimizes parameters α, β, and γ.

[Math 1] (equation (1); equation image not reproduced)

[0007] Furthermore, any combination of the above components, or any substitution of the components or expressions of this disclosure between methods, apparatus, systems, etc., is also valid as a form of this disclosure.

Effects of the Invention

[0008] According to this disclosure, a neural network device with improved prediction accuracy can be provided.

Brief Description of the Drawings

[0009]
[Figure 1] A block diagram showing the functions and configuration of a neural network device according to an embodiment.
[Figure 2] A flowchart illustrating an example of the operation of the neural network device of Figure 1 during the learning phase.
[Figure 3] A flowchart illustrating an example of the operation of the neural network device of Figure 1 during the inference phase.

Modes for Carrying Out the Invention

[0010] The present disclosure will be described below with reference to the drawings, based on preferred embodiments. The embodiments are illustrative and not limiting, and not all features or combinations thereof described in the embodiments are necessarily essential to the disclosure.

[0011] Conventionally, various functions have been proposed as activation functions for the intermediate layer of neural networks. The sigmoid function is well known as an activation function, but it is prone to the vanishing gradient problem.

[0012] When the tanh function is used as the activation function, the vanishing gradient problem can be alleviated, but the gradient still tends to vanish when the input is far from 0.

[0013] When the ReLU function is used as the activation function, the vanishing gradient problem can also be alleviated. However, since the gradient for negative inputs is 0, the gradient vanishes completely with an expected probability of 1/2, causing learning to stagnate.
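The gradient behavior described in paragraphs [0011] to [0013] can be checked numerically. The following is a minimal sketch using the standard textbook definitions of the sigmoid, tanh, and ReLU derivatives; the function names are chosen for illustration and do not appear in the disclosure.

```python
import math


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid function; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of tanh; equals 1 at x = 0 and decays toward 0 far from 0."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x: float) -> float:
    """Derivative of ReLU; exactly 0 for every negative input."""
    return 1.0 if x > 0 else 0.0


# Print the three gradients at a negative, a zero, and a positive input.
for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```

At x = 0 the sigmoid gradient is only 0.25 while the tanh gradient is 1, and both become very small at x = ±10; the ReLU gradient is exactly 0 for the negative input.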

[0014] The ELU function, which is an improvement on the ReLU function, has also been proposed, but it has not led to an improvement in accuracy.

[0015] Therefore, the inventor has considered a new activation function that can achieve stable learning and improve the accuracy of prediction, and has thus obtained the neural network device of the present disclosure.

[0016] FIG. 1 is a block diagram showing the functions and configuration of a neural network device (hereinafter referred to as the "NN device") 100 according to an embodiment. In terms of hardware, each block shown here can be realized by elements and mechanical devices, including a computer's CPU (central processing unit); in terms of software, it can be realized by a computer program or the like. What is depicted here are functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

[0017] In the example of FIG. 1, the NN device 100 is realized by a single device (housing), but there is no limit to the physical number of housings of the NN device 100, and it may be realized by the cooperation of a plurality of devices each equipped with a CPU or the like.

[0018] In the learning phase, the NN device 100 functions as a machine learning device that learns the parameters of a neural network, and in the inference phase, it functions as an inference device that executes a predetermined inference process using the learned neural network.

[0019] In the learning phase, the NN device 100 executes the arithmetic processing of the neural network on the input data for learning (hereinafter referred to as learning data), and outputs the output data for the learning data. The NN device 100 updates the parameters to be optimized (learned) for the neural network so that the output data approaches the correct value. By repeating this, the parameters to be optimized are optimized, that is, learned.

[0020] In the inference phase, the NN device 100 executes the arithmetic processing of the learned neural network, that is, the neural network whose parameters have been optimized, on the input data, and outputs the output data for the input data. The NN device 100 interprets the output data and executes a predetermined inference process.

[0021] The NN device 100 includes an input data acquisition unit 110, a neural network processing unit (hereinafter referred to as the "NN processing unit") 112, a learning unit 114, an inference unit 116, and a storage unit 120. The machine learning function is mainly realized by the NN processing unit 112 and the learning unit 114, and the inference process is mainly realized by the NN processing unit 112 and the inference unit 116.

[0022] The storage unit 120 includes a learning data storage unit 122 and a neural network storage unit (hereinafter referred to as the "NN storage unit") 124. The learning data storage unit 122 stores a plurality of learning data and the correct values corresponding to each of the plurality of learning data. The NN storage unit 124 stores data related to the neural network. The NN storage unit 124 stores, for example, a program for executing the arithmetic processing of the neural network, parameters such as the weights and biases of the neural network, and parameters α, β, γ of the activation function f(x) described later.

[0023] The input data acquisition unit 110 acquires multiple training datasets during the learning process, each of which contains training data and the corresponding ground truth values. The input data acquisition unit 110 stores the acquired training datasets in the learning data storage unit 122. The input data acquisition unit 110 acquires unknown data to be inferred during the inference process.

[0024] The NN processing unit 112 performs neural network calculations on the input data and outputs output data for the input data. The NN processing unit 112 performs neural network calculations using, for example, the neural network program, the weights and biases, and the parameters α, β, and γ of the activation function f(x) stored in the NN storage unit 124.

[0025] Various structures can be employed in neural networks. For example, for time series prediction, LSTM (Long Short-Term Memory), Peephole Connection LSTM, and GRU (Gated Recurrent Unit) may be used, while for image recognition, CNN (Convolutional Neural Network) may be used. In any case, a neural network includes an input layer, one or more hidden layers, and an output layer.

[0026] The NN processing unit 112 includes functions for performing calculations on each node of the input layer of the neural network, functions for performing calculations on each node of one or more intermediate layers (hidden layers), and functions for performing calculations on each node of the output layer.

[0027] As the function of each node in the intermediate layers, the NN processing unit 112 performs activation processing, in which the activation function f(x) is applied to the input from the preceding layer (i.e., the input layer or the preceding intermediate layer). In addition to the activation processing, the NN processing unit 112 may also perform convolution, decimation, or other processing.

[0028] The activation function f(x) is given by the following equation (1).

[Math 1] (equation (1); equation image not reproduced)

[0029] This activation function f(x) is a function whose output value is continuous for all input values, whose output value is uniquely determined for any given input value, and whose slope is smoothly continuous.

[0030] The form of the activation function f(x) for each node is common to all nodes, as shown in equation (1), but the parameters α, β, and γ are preferably independent for each node.

[0031] The NN processing unit 112 performs calculations that combine functions such as the softmax function, sigmoid function, and cross-entropy function as part of the output layer's functionality.
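As an illustration of the output-layer calculation in paragraph [0031], the following is a minimal sketch that combines a softmax function with a cross-entropy error for a single sample. The function names and the single-label (class index) formulation are assumptions for illustration; the disclosure does not specify how these functions are combined.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw output-layer values into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Cross-entropy error for one sample whose correct class is target_index."""
    return -math.log(probs[target_index])


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
print(probs, loss)
```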

[0032] The learning unit 114 optimizes the parameters of the neural network that are to be optimized. The learning unit 114 calculates an error by comparing the output obtained by inputting the training data into the NN processing unit 112, and thus into the neural network, with the correct values corresponding to the training data. Based on the calculated error, the learning unit 114 calculates the gradients of the parameters to be optimized using a method such as backpropagation, and updates those parameters according to an update algorithm. The update algorithm can be, for example, gradient descent, stochastic gradient descent, or the momentum method. As the learning unit 114 repeats these updates, the parameters to be optimized are optimized.
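Two of the update algorithms named in paragraph [0032] can be sketched as follows. This is an illustrative sketch only: the dictionary-of-scalars layout, the function names, and the learning rate `lr` and momentum coefficient `mu` are assumptions, and the gradients are taken as given rather than computed by backpropagation.

```python
from typing import Dict, Tuple

Params = Dict[str, float]


def gradient_descent_step(params: Params, grads: Params, lr: float) -> Params:
    """Plain gradient descent: move each parameter against its gradient."""
    return {k: params[k] - lr * grads[k] for k in params}


def momentum_step(
    params: Params, grads: Params, velocity: Params, lr: float, mu: float
) -> Tuple[Params, Params]:
    """Momentum method: accumulate a velocity and add it to each parameter."""
    new_velocity = {k: mu * velocity[k] - lr * grads[k] for k in params}
    new_params = {k: params[k] + new_velocity[k] for k in params}
    return new_params, new_velocity


# Weights, biases, and activation parameters are all updated the same way.
params = {"w": 1.0, "alpha": 1.0}
grads = {"w": 0.5, "alpha": -0.2}
print(gradient_descent_step(params, grads, lr=0.1))
```

With zero initial velocity, the first momentum step coincides with a plain gradient descent step; later steps also carry over a fraction `mu` of the previous update.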

[0033] The parameters to be optimized include the weights and biases, as well as the parameters α, β, and γ of the activation function f(x) in equation (1). As mentioned above, the parameters α, β, and γ are preferably independent at each node, in which case the parameters α, β, and γ can be optimized to different values at each node.

[0034] The initial values of parameters α, β, and γ are set to, for example, 1.
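A minimal sketch of how independent per-node parameters with an initial value of 1 might be held in memory is shown below. The class name `ActivationParams` and the function `init_activation_params` are hypothetical names for illustration, and the list-per-parameter layout is an assumption; the sketch deliberately does not evaluate f(x) itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActivationParams:
    """Hypothetical container: one alpha, beta, and gamma value per node."""
    alpha: List[float]
    beta: List[float]
    gamma: List[float]


def init_activation_params(num_nodes: int, initial: float = 1.0) -> ActivationParams:
    """Create independent per-node parameters, each starting at `initial`."""
    return ActivationParams(
        alpha=[initial] * num_nodes,
        beta=[initial] * num_nodes,
        gamma=[initial] * num_nodes,
    )


layer_params = init_activation_params(4)
layer_params.alpha[0] = 1.3  # a later update changes this node only
print(layer_params)
```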

[0035] The parameters α, β, and γ may satisfy 0 ≤ α, 0 < β ≤ 1, and 1 ≤ γ. In this case, gradient explosion is less likely to occur.
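The disclosure states the ranges but not how they are maintained during training. One possible approach, shown here purely as an assumption, is to clip each node's parameters back into range after every update. The function name and the small positive floor `eps` (used because β must be strictly greater than 0) are hypothetical.

```python
from typing import Tuple


def project_activation_params(
    alpha: float, beta: float, gamma: float, eps: float = 1e-6
) -> Tuple[float, float, float]:
    """Clip one node's parameters into 0 <= alpha, 0 < beta <= 1, 1 <= gamma."""
    alpha = max(alpha, 0.0)
    beta = min(max(beta, eps), 1.0)  # eps keeps beta strictly positive
    gamma = max(gamma, 1.0)
    return alpha, beta, gamma


print(project_activation_params(-0.3, 1.7, 0.4))
```

Values that already satisfy the ranges pass through unchanged, so the clipping only takes effect when an update step leaves the permitted region.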

[0036] Furthermore, if the output error is E, the backpropagation errors of α, β, and γ are defined by the following equations (2) to (4).

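[Math 2] (equation (2); equation image not reproduced)

[Math 3] (equation (3); equation image not reproduced)

[Math 4] (equation (4); equation image not reproduced)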

[0037] The inference unit 116 interprets the output obtained by inputting the input data into the NN processing unit 112 and, consequently, into the trained neural network, and performs a predetermined inference process. The inference process may be, for example, time series prediction or image recognition.

[0038] The above describes the basic configuration of the NN device 100 according to the embodiment. Next, its operation will be explained.

[0039] Figure 2 is a flowchart showing an example of the operation of the NN device 100 during the learning phase.

[0040] The input data acquisition unit 110 acquires multiple training datasets (S10). Subsequent processing is performed sequentially for each of the multiple training datasets.

[0041] The NN processing unit 112 performs neural network calculations on the training data included in the training dataset and outputs output data for the training data (S12). The learning unit 114 updates the parameters to be optimized based on the output for the training data and the correct values included in the training dataset (S14). In this update, the parameters α, β, and γ of the activation function f(x) are updated as optimization targets in addition to the weights and biases.

[0042] The learning unit 114 determines whether or not to terminate the learning process (S16). The termination conditions for terminating the learning process include, for example, that parameter updates (i.e., learning) have been performed a predetermined number of times, that a termination instruction has been received from an external source, that the average value of the update amount of the parameters to be optimized has reached a predetermined value, or that the calculated error has fallen within a predetermined range. If the termination conditions are not met (N in S16), the process returns to S10. If the termination conditions are met (Y in S16), the process terminates.
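The termination check in S16 can be sketched as a single predicate. The function and argument names are hypothetical, and two interpretations are assumptions: "reached a predetermined value" is read as the mean update amount falling to or below a threshold, and "within a predetermined range" is read as the error falling to or below a tolerance.

```python
def should_stop(
    step: int,
    max_steps: int,
    stop_requested: bool,
    mean_update: float,
    update_threshold: float,
    error: float,
    error_tolerance: float,
) -> bool:
    """Return True when any of the example termination conditions holds."""
    if step >= max_steps:  # updates performed a predetermined number of times
        return True
    if stop_requested:  # termination instruction received from outside
        return True
    if mean_update <= update_threshold:  # parameter updates have become small
        return True
    if error <= error_tolerance:  # calculated error is within the target range
        return True
    return False


print(should_stop(10, 100, False, 0.05, 1e-4, 0.8, 0.01))
```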

[0043] Figure 3 is a flowchart showing an example of the operation of the NN device 100 during the inference phase.

[0044] The input data acquisition unit 110 acquires the input data to be inferred (S20). The NN processing unit 112 performs computational processing of the trained neural network on the input data and outputs output data for the input data (S22). The inference unit 116 interprets the output data and performs predetermined inference processing (S24).
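For an image recognition task, the interpretation step S24 could be as simple as picking the most probable class. The sketch below assumes the output data is a list of class probabilities; the function name and the example labels are hypothetical and are not taken from the disclosure.

```python
from typing import List, Sequence, Tuple


def interpret_output(probabilities: Sequence[float], labels: List[str]) -> Tuple[str, float]:
    """Return the label with the highest probability and that probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Example output data for one input, as produced in S22.
print(interpret_output([0.1, 0.7, 0.2], ["cat", "dog", "bird"]))
```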

[0045] Next, the effects of this embodiment will be explained. According to this embodiment, the activation function f(x) of the neural network is the function represented by equation (1). Verification conducted by the inventors has confirmed that the prediction accuracy is improved when using a trained neural network of the NN device 100 according to this embodiment, in which the activation function f(x) is the function of equation (1).

[0046] The present disclosure has been described above based on embodiments. These embodiments are illustrative, and it will be understood by those skilled in the art that various modifications are possible in the combinations of their components and processes, and that such modifications are also within the scope of the present disclosure. Such modifications will be described below.

[0047] (Variation 1) In the embodiment described, the case in which the NN device 100 has both the function of a machine learning device and the function of an inference device was explained, but the NN device 100 may have only one of the functions of a machine learning device or an inference device.

[0048] For example, the NN device 100 may only have the functions of a machine learning device. In this case, the NN device 100 does not need to have an inference unit 116. The trained neural network learned by the NN device 100 can be provided to other inference devices and used by those other inference devices.

[0049] Furthermore, the NN device 100 may only have the functionality of an inference device. In this case, the NN device 100 does not need to have a learning unit 114. The NN device 100 can perform inference processing using a pre-trained neural network provided by another machine learning device, similar to the pre-trained neural network in the embodiment.

[0050] Any combination of the embodiments and modifications described above is also useful as an embodiment of this disclosure. The new embodiments resulting from such combinations possess the combined effects of the respective embodiments and modifications.

[0051] The above-described embodiment can be generalized to obtain the following configuration.

[0052] [Aspect 1] A neural network device comprising a neural network processing unit that performs computational processing of a trained neural network on input data and outputs output data for the input data, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ.

[Math 1] (equation (1); equation image not reproduced)

[0053] [Aspect 2] A neural network device comprising a learning unit that trains a neural network, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ.

[Math 1] (equation (1); equation image not reproduced)

[0054] [Aspect 3] The neural network device according to aspect 1 or 2, wherein the parameters α, β, and γ are optimized independently for each node.

[0055] [Aspect 4] The neural network device according to any one of aspects 1 to 3, wherein the parameters α, β, and γ satisfy 0 ≤ α, 0 < β ≤ 1, and 1 ≤ γ.

[0056] [Aspect 5] A program that causes a computer to perform computational processing of a trained neural network on input data and to output output data for the input data, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ.

[Math 1] (equation (1); equation image not reproduced)

[0057] [Aspect 6] A program that causes a computer to perform a function of training a neural network, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ.

[Math 1] (equation (1); equation image not reproduced)

[0058] 100 Neural network device, 112 Neural network processing unit, 114 Learning unit, 116 Inference unit.

Claims

1. A neural network device comprising a neural network processing unit that performs computational processing of a trained neural network on input data and outputs output data for the input data, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, [Math 1] and the parameters α, β, and γ are optimized during learning.

2. A neural network device comprising a learning unit that trains a neural network, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, [Math 2] and the learning unit optimizes the parameters α, β, and γ.

3. The neural network device according to claim 1 or 2, wherein the parameters α, β, and γ are optimized independently for each node.

4. The neural network device according to claim 1 or 2, wherein the parameters α, β, and γ satisfy 0 ≤ α, 0 < β ≤ 1, and 1 ≤ γ.

5. A program that causes a computer to perform computational processing of a trained neural network on input data and to output output data for the input data, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, [Math 3] and the parameters α, β, and γ are optimized during learning.

6. A program that causes a computer to perform a learning function of training a neural network, wherein the activation function f(x) of the neural network is expressed by equation (1) using parameters α, β, and γ, [Math 4] and the learning function optimizes the parameters α, β, and γ.