Electronic device and method for training neural network

By adjusting the output ranges of neural network layers to be substantially the same through standard deviation-based loss minimization, the method addresses inefficiencies in quantization, ensuring accurate and efficient computation in environments with limited computing power.

WO2026142349A1 · PCT designated stage · Publication Date: 2026-07-02 · 42DOT INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
42DOT INC
Filing Date
2025-12-24
Publication Date
2026-07-02

Abstract

Disclosed are an electronic device and a method for training a neural network. The method for training a neural network, according to an embodiment, may include a step of obtaining an output of each of a first layer and a second layer included in the neural network. The method may include a step of training the neural network such that the range of the output of the first layer and the range of the output of the second layer become substantially the same. The output of the first layer and the output of the second layer may be input to a third layer included in the neural network.

Description

Electronic Devices and Neural Network Learning Methods

[0001] The following disclosure relates to an electronic device and a neural network learning method.

[0002] Quantization can refer to a technique for converting the data size of parameters (e.g., weights, biases, and / or activations) of a neural network into a fixed-point or lower-bit representation (e.g., 8-bit, and / or 4-bit). Although quantization results in some loss of information, it can reduce memory usage and increase computation speed.

[0003] For quantization, it is necessary to analyze the range of the quantization target (e.g., data) to find the minimum and maximum values of the data. However, calculating the minimum and maximum values of parameters for every layer of the neural network is inefficient, and this can be even more so in environments with limited computing power.

[0004] The background technology described above is possessed or acquired by the inventor in the process of deriving the content of the disclosure of the present application, and cannot necessarily be considered as prior art disclosed to the general public prior to the filing of this application.

[0005] A method for training a neural network according to one embodiment may include the operation of obtaining the outputs of each of the first layer and the second layer included in the neural network. The method may include the operation of training the neural network such that the range of the output of the first layer and the range of the output of the second layer become substantially the same. The output of the first layer and the output of the second layer may be input to a third layer included in the neural network.

[0006] The output of the first layer and the output of the second layer can be quantized to the same level and input to the third layer.

[0007] The above-mentioned training operation may include an operation to calculate the standard deviation of the output of the first layer. The above-mentioned training operation may include an operation to calculate the standard deviation of the output of the second layer. The above-mentioned training operation may include an operation to adjust the range of the output of the first layer and the range of the output of the second layer based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

[0008] The above adjusting operation may include an operation to calculate a loss function based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer. The above adjusting operation may include an operation to reduce the difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer based on the loss function.

[0009] The operation of reducing the difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer may include the operation of updating the parameters of the first layer and the parameters of the second layer so that the loss function is minimized.

[0010] The above loss function may include MSE (mean square error).

[0011] The first layer above may include one or more convolution layers and one or more normalization layers.

[0012] The second layer above may include one or more convolution layers and one or more normalization layers.

[0013] An electronic device for training a neural network according to one embodiment may include a processor. The electronic device may include a memory for storing instructions. The instructions may be executed individually or collectively by the processor to cause the electronic device to obtain the outputs of each of the first layer and the second layer included in the neural network. The instructions may be executed individually or collectively by the processor to cause the electronic device to train the neural network such that the range of the output of the first layer and the range of the output of the second layer become substantially the same. The output of the first layer and the output of the second layer may be input to a third layer included in the neural network.

[0014] The output of the first layer and the output of the second layer can be quantized to the same level and input to the third layer.

[0015] The above instructions may be executed individually or collectively by the processor to cause the electronic device to calculate the standard deviation of the output of the first layer. The above instructions may be executed individually or collectively by the processor to cause the electronic device to calculate the standard deviation of the output of the second layer. The above instructions may be executed individually or collectively by the processor to cause the electronic device to adjust the range of the output of the first layer and the range of the output of the second layer based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

[0016] The above instructions may be executed individually or collectively by the processor to cause the electronic device to calculate a loss function based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer. The above instructions may be executed individually or collectively by the processor to cause the electronic device to reduce the difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer based on the loss function.

[0017] The above instructions may be executed individually or collectively by the processor to cause the electronic device to update the parameters of the first layer and the parameters of the second layer so that the loss function is minimized.

[0018] The above loss function may include MSE (mean square error).

[0019] The first layer above may include one or more convolution layers and one or more normalization layers.

[0020] The second layer above may include one or more convolution layers and one or more normalization layers.

[0021] FIG. 1a is a diagram illustrating a deep learning computation method using an artificial neural network according to one embodiment.

[0022] FIG. 1b is a diagram illustrating a learning method for an artificial neural network model according to one embodiment.

[0023] FIG. 2 is a diagram illustrating a quantization method according to one embodiment.

[0024] FIG. 3 is a diagram illustrating a quantization method of a plurality of layers according to one embodiment.

[0025] FIG. 4 is a diagram illustrating a method for training a neural network so that the output range of a plurality of layers becomes substantially the same according to one embodiment.

[0026] FIG. 5 is an example of a flowchart of a neural network learning method according to one embodiment.

[0027] FIG. 6 is an example of an electronic device according to one embodiment.

[0028] Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions included in the technical concept described by the embodiments.

[0029] Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely for the purpose of distinguishing one component from another. For example, the first component may be named the second component, and similarly, the second component may be named the first component.

[0030] When it is stated that a component is "connected" to another component, it should be understood that it may be directly connected to or coupled with that other component, or that there may be other components in between.

[0031] The singular expression includes the plural expression unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended to specify the existence of the described features, numbers, steps, actions, components, parts, or combinations thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0032] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this specification.

[0033] Hereinafter, embodiments will be described in detail with reference to the attached drawings. In the description with reference to the attached drawings, identical components are given the same reference numeral regardless of the drawing number, and redundant descriptions thereof will be omitted.

[0034] FIG. 1a is a diagram illustrating a deep learning computation method using an artificial neural network according to one embodiment.

[0035] Artificial intelligence (AI) algorithms, including deep learning, are characterized by inputting input data (10) into an artificial neural network (ANN) and learning output data (30) through operations such as convolution. An artificial neural network may refer to a computational architecture that models a biological brain. Within an artificial neural network, nodes corresponding to neurons of the brain are connected to each other and operate collectively to process input data. Examples of various types of neural networks include, but are not limited to, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Deep Belief Networks (DBN), and Restricted Boltzmann Machines (RBM). In a feed-forward neural network, the neurons of the neural network have links with other neurons. These connections can be extended in one direction, for example, in the forward direction, through a neural network.

[0036] Referring to FIG. 1a, an artificial neural network structure is illustrated in which input data (10) is input into the artificial neural network, and output data (30) is output through the artificial neural network (e.g., Convolutional Neural Network (CNN) (20)) which includes one or more layers. The artificial neural network may be a deep neural network having two or more layers.

[0037] A convolutional neural network (20) can be used to extract "features" such as borders, line colors, etc. from input data (10). The convolutional neural network (20) may include multiple layers. Each layer can receive data and process the data input to that layer to generate data output from that layer. The data output from the layer may be a feature map generated by a convolution operation between the weight values of one or more filters and an image or feature map input to the convolutional neural network (20). The initial layers of the convolutional neural network (20) may be operated to extract low-level features such as edges or gradients from the input. The subsequent layers of the convolutional neural network (20) may extract progressively more complex features such as eyes, noses, etc. within the image.

[0038] FIG. 1b is a diagram illustrating a learning method for an artificial neural network model according to one embodiment.

[0039] Referring to FIG. 1b, the learning device (100) corresponds to a computing device having various processing functions such as generating a neural network, training (or learning) a neural network, or retraining a neural network. For example, the learning device (100) can be implemented as various types of devices such as a PC (personal computer), a server device, or a mobile device.

[0040] A learning device (100) can generate one or more trained neural networks (110) by repeatedly training (learning) a given initial neural network. Generating one or more trained neural networks (110) may mean determining neural network parameters. Here, the parameters may include various types of data input to / output from the neural network, such as input / output activations, weights, biases, etc. As the repeated training of the neural network progresses, the parameters of the neural network may be tuned to compute a more accurate output for a given input.

[0041] Below, we will describe a method for quantizing the output (e.g., output activation) of a layer included in a neural network (110).

[0042] FIG. 2 is a diagram illustrating a quantization method according to one embodiment.

[0043] Referring to FIG. 2, the output (e.g., output activation) (210) of a layer included in a neural network (e.g., neural network 110 of FIG. 1b) can be quantized through a quantization function. When the output (210) is quantized, it can be converted from a range (e.g., a range from a minimum value (230) to a maximum value (240)) to a range (e.g., a range from a minimum value (250) to a maximum value (260)), as in the quantization result (220).

[0044] The scale factor of the quantization function can be determined according to the range of the output (210). The scale factor may be a ratio for matching the range of the output (210) to the quantization range. To quantize the output (210), it may be necessary to determine the minimum value (230) and maximum value (240) of the output (210) and measure the range of the output (210) (e.g., a range from the minimum value (230) to the maximum value (240)). The scale factor can be calculated by methods such as dividing the range of the output (210) into a defined quantization range (e.g., a range from the minimum value (250) to the maximum value (260)). Through the scale factor obtained in this way, the output (210) can be quantized to match the quantization range.
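The scale-factor calculation described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the INT8 bounds (-128, 127), and the round-to-nearest mapping are assumptions chosen for the example.

```python
def scale_factor(lo, hi, qmin=-128, qmax=127):
    """Ratio of the measured output range to the integer quantization range."""
    if hi <= lo:
        raise ValueError("output range must satisfy hi > lo")
    return (hi - lo) / (qmax - qmin)


def quantize(values, qmin=-128, qmax=127):
    """Measure the min/max of the values, then map them onto [qmin, qmax]."""
    lo, hi = min(values), max(values)
    scale = scale_factor(lo, hi, qmin, qmax)
    codes = [max(qmin, min(qmax, round((v - lo) / scale) + qmin)) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale, qmin=-128):
    """Approximate inverse of quantize(); the rounding error is not recoverable."""
    return [lo + (c - qmin) * scale for c in codes]


# Hypothetical output activations of a single layer.
activations = [-0.8, -0.1, 0.0, 0.3, 1.2]
codes, lo, scale = quantize(activations)
restored = dequantize(codes, lo, scale)
```

Here the minimum of the output maps to the lowest integer code and the maximum to the highest, and each restored value differs from the original by at most about half a quantization step.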

[0045] However, if multiple layers included in the neural network (110) are all quantized to the same quantization range, the following problems may occur. Calculating the output range of each of the multiple layers included in the neural network (110) can be inefficient. Furthermore, even if the output ranges are calculated individually, if the outputs of multiple layers are used as inputs to the same layer and the output ranges of the multiple layers are different, the quantization performance may be significantly reduced. For example, assume that the outputs of multiple layers (e.g., a first layer and a second layer) are combined into a single channel (e.g., a third layer) through operations (e.g., add and / or concat), such as in a feature pyramid network (FPN). The output of the first layer may correspond to a range of (-128, 127) in FP32 precision, and the output of the second layer may correspond to a range of (-0.5, 0.5) in FP32 precision. When the output of the first layer and the output of the second layer are each quantized into an INT8 (8-bit integer) range, the quantization result of the first layer has a resolution of 1, whereas the quantization result of the second layer may have a resolution of 1 / 256. In this case, when an add operation is performed on the quantization result of the first layer and the quantization result of the second layer, the influence of the quantization result of the second layer on the result of the operation is minimal compared with that of the first layer, so information from the second layer may be lost. As such, when the outputs of multiple layers are quantized and connected as the input of a single layer, if the output ranges of the multiple layers differ significantly from each other, the influence of the layer with the smaller output range may be weakened.
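The example above can be reproduced numerically. The sketch below uses hypothetical sample values drawn from the two ranges mentioned in the text and a hypothetical helper, quantize_with_step. It assumes that "quantized to the same level" means sharing the coarse step of 1 that the first layer needs.

```python
def quantize_with_step(values, step, qmin=-128, qmax=127):
    """Round values onto an integer grid with the given step, then clamp to INT8."""
    return [max(qmin, min(qmax, round(v / step))) for v in values]


# Hypothetical FP32 outputs: the first layer spans roughly (-128, 127),
# the second layer spans roughly (-0.5, 0.5).
first_output = [-128.0, -40.0, 3.0, 127.0]
second_output = [-0.45, -0.2, 0.1, 0.4]

coarse_step = 1.0        # resolution needed to cover the first layer in INT8
fine_step = 1.0 / 256    # resolution the second layer would get on its own

q_first = quantize_with_step(first_output, coarse_step)
q_second_fine = quantize_with_step(second_output, fine_step)
q_second_coarse = quantize_with_step(second_output, coarse_step)

# Element-wise add of the two branches at the shared (coarse) resolution.
added = [a + b for a, b in zip(q_first, q_second_coarse)]
```

At its own resolution the second layer keeps four distinct codes, but at the shared coarse step every one of its values rounds to zero, so the sum equals the first layer's codes and the second branch contributes nothing.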

[0046] Below, with reference to FIGS. 3 and 4, we will explain a method for adjusting the output range of multiple layers to prevent problems such as the degradation of quantization performance described above.

[0047] FIG. 3 is a diagram illustrating a quantization method of a plurality of layers according to one embodiment.

[0048] Referring to FIG. 3, a neural network (e.g., the neural network (110) of FIG. 1b) may include a plurality of layers (e.g., a first layer (310) to a third layer (350)). As illustrated in FIG. 3, for convenience of explanation, the description is based on the first layer (310) to the third layer (350), but is not limited thereto.

[0049] A learning device (e.g., the learning device (100) of FIG. 1b) can adjust the output range of the first layer (310) and the output range of the second layer (320). The layer whose output range is adjusted may be a layer having a structure in which the input and / or output are different branches, or the output becomes the input of a different layer (e.g., the third layer (350)), such as the first layer (310) and the second layer (320) shown in FIG. 3. The output (315) of the first layer (310) and the output (325) of the second layer (320) may be quantized to the same level and input to the third layer (350). Being quantized to the same level may mean being quantized to the same quantization range or being quantized to the same resolution. For example, if the range of output (315) is (-n, n) and the range of output (325) is (-m, m), both outputs (315, 325) can be quantized into the same range (x, y). In this case, if the difference between n and m is large, problems such as the degradation of quantization performance described with reference to FIG. 2 may occur. Therefore, below, we will describe a method for training a neural network (110) so that n and m become substantially the same.

[0050] The first input (301) and the second input (302) may be feature maps of different modalities, such as image and / or radar, but may also be feature maps of the same modality.

[0051] The first layer (310) may include a plurality of layers. The first layer (310) may include, for example, one or more convolution layers and one or more normalization layers. The normalization layer may include a batch normalization layer. The second layer (320) may also be configured substantially the same as the first layer (310), so a redundant description will be omitted.

[0052] The learning device (100) can obtain outputs (315, 325) of the first layer (310) and the second layer (320), respectively. The learning device (100) can obtain a first output (315) through the first layer (310) based on a first input (301). The first output (315) may be an output activation of the first layer (310). The first output (315) may be normalized by a normalization layer (e.g., batch normalization layer) included in the first layer (310). The learning device (100) can generate a second output (325) through the second layer (320) based on a second input (302). The second output (325) may be an output activation of the second layer (320). The second output (325) can be normalized by a normalization layer (e.g., batch normalization layer) included in the second layer (320).

[0053] The learning device (100) can train the neural network (110) so that the range of the first output (315) and the range of the second output (325) become substantially the same. The learning device (100) can calculate a loss function (330) through the standard deviations of the first output (315) and the second output (325), respectively, so that the range of the first output (315) and the range of the second output (325) become substantially the same. The loss function (330) may include the mean square error (MSE). For example, the loss function (330) may include the MSE between the standard deviation of the first output (315) and the standard deviation of the second output (325).

[0054] The learning device (100) can update (or adjust) the parameters of the first layer (310) and the second layer (320) to reduce the difference in the standard deviation of each of the first output (315) and the second output (325) based on the loss function (330). Hereinafter, with reference to FIG. 4, a method for updating the parameters of the first layer (310) and the second layer (320) through a loss function based on the standard deviation of the outputs (315, 325) will be described.

[0055] FIG. 4 is a diagram illustrating a method for training a neural network so that the output range of a plurality of layers becomes substantially the same according to one embodiment.

[0056] Referring to FIG. 4, a learning device (e.g., the learning device (100) of FIG. 1b) can calculate the standard deviation (410) of the output (315) of the first layer (310). The learning device (100) can calculate the standard deviation (420) of the output (325) of the second layer (320). For example, the standard deviations (410, 420) can be calculated using the following Equation 1.

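[0057] [Equation 1]

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

[0058] In Equation 1, $\sigma$ represents the standard deviation, $x_i$ represents the $i$-th value of the output (315, 325), $N$ represents the number of values in the output (315, 325), and $\mu$ represents the average of the output (315, 325).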

[0059] The learning device (100) can adjust the range of the output (315) of the first layer (310) and the range of the output (325) of the second layer (320) based on the standard deviation (410) and the standard deviation (420). The learning device (100) can calculate a loss function (430) (e.g., the loss function (330) of FIG. 3) based on the standard deviation (410) and the standard deviation (420). The learning device (100) can calculate the loss function (430) by squaring the error (or difference) between the standard deviation (410) and the standard deviation (420) and averaging them.
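The loss calculation described above can be sketched as follows. Two points are assumptions made for illustration and are not specified in the text: the standard deviation uses a 1/N normalization, and each layer may contribute several standard deviations (for example, one per batch or per channel), so that there is something to average over. With a single pair, the loss reduces to the squared difference.

```python
import math


def std(values):
    """Standard deviation with 1/N normalization (assumed form of Equation 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def range_matching_loss(first_outputs, second_outputs):
    """MSE between paired standard deviations of two layers' outputs.

    Each argument is a list of flattened outputs, paired by position.
    """
    if len(first_outputs) != len(second_outputs):
        raise ValueError("outputs must be paired one-to-one")
    squared_errors = [
        (std(a) - std(b)) ** 2 for a, b in zip(first_outputs, second_outputs)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical flattened outputs of the first and second layers.
wide = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
narrow = [0.0, 0.0, 0.0, 0.0]
loss = range_matching_loss([wide], [narrow])
```

The loss is zero when the paired standard deviations already agree and grows with the square of their difference.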

[0060] The learning device (100) can reduce the difference between the standard deviation (410) of the output (315) of the first layer (310) and the standard deviation (420) of the output (325) of the second layer (320) based on the loss function (430). The learning device (100) can update (or adjust) the parameters of the first layer (310) and the parameters of the second layer (320) so that the loss function (430) is minimized. As the loss function (430) (e.g., MSE between the standard deviation (410) and the standard deviation (420)) is minimized, the difference between the standard deviation (410) and the standard deviation (420) can be minimized. As the difference between the standard deviation (410) and the standard deviation (420) is minimized, the difference between the range of the output (315) and the range of the output (325) can be minimized. That is, the range of output (315) and the range of output (325) can be substantially the same.
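To make the parameter update concrete, the toy sketch below reduces each layer to a single learnable gain (a hypothetical stand-in for the convolution and normalization parameters) and applies plain gradient descent to the squared difference of the output standard deviations. The learning rate, step count, and input values are arbitrary assumptions. The text does not state how this loss is combined with any other training objective, so none is included here.

```python
import math


def std(values):
    """Standard deviation with 1/N normalization."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def match_output_ranges(x1, x2, gain1=1.0, gain2=1.0, lr=0.01, steps=200):
    """Gradient descent on (std(gain1 * x1) - std(gain2 * x2)) ** 2.

    Uses std(g * x) = |g| * std(x), so the gradients are available in closed form.
    """
    s1, s2 = std(x1), std(x2)
    for _ in range(steps):
        diff = abs(gain1) * s1 - abs(gain2) * s2
        grad1 = 2.0 * diff * math.copysign(s1, gain1)
        grad2 = -2.0 * diff * math.copysign(s2, gain2)
        gain1 -= lr * grad1
        gain2 -= lr * grad2
    return gain1, gain2


# Hypothetical pre-gain outputs of the two branches.
x1 = [-4.0, -2.0, 0.0, 2.0, 4.0]
x2 = [-0.5, -0.25, 0.0, 0.25, 0.5]

g1, g2 = match_output_ranges(x1, x2)
y1 = [g1 * v for v in x1]
y2 = [g2 * v for v in x2]
```

After training, the standard deviations of y1 and y2 agree closely, which is the sense in which the two output ranges become substantially the same.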

[0061] If the range of output (315) and the range of output (325) become substantially the same through the learning method of the neural network (110) described above (e.g., adjusting the parameters of the first layer (310) and the second layer (320) so that the loss function (430) is minimized), then even if the output (315) and the output (325) are quantized to the same level (e.g., quantized according to the same quantization range or quantized with the same resolution) and an operation (e.g., Add and / or Concat) is performed, loss of information due to quantization may be prevented for both of the two layers (310, 320).

[0062] Through FIGS. 3 and 4, a method for training a neural network (110) so that the output ranges of the first layer (310) and the second layer (320) become substantially the same has been described. Although the above description was based on two layers for convenience of explanation, the number of layers is not limited to two, and can be applied substantially the same way in situations where multiple layers are combined (e.g., situations where the outputs of multiple layers become the input of a single layer).

[0063] FIG. 5 is an example of a flowchart of a neural network learning method according to one embodiment.

[0064] Referring to FIG. 5, operations 510 and 530 may be performed sequentially, but are not limited thereto. For example, the two operations may be performed in parallel. Operations 510 and 530 may be substantially identical to the operations of the learning device (e.g., the learning device (100) of FIG. 1b) described with reference to FIGS. 1a through 4. Accordingly, a detailed description is omitted.

[0065] In operation 510, the learning device (100) can obtain the outputs of the first layer and the second layer, respectively, included in the neural network. The outputs of the first layer and the second layer can be input to the third layer. The outputs of the first layer and the second layer can be quantized to the same level (e.g., the same minimum-maximum range and / or resolution) and input to the third layer.

[0066] In operation 530, the learning device (100) can train the neural network so that the range of the output of the first layer and the range of the output of the second layer become substantially the same.

[0067] FIG. 6 is an example of an electronic device according to one embodiment.

[0068] Referring to FIG. 6, the electronic device (600) may include a memory (610) and a processor (630). The description with reference to FIGS. 1a through 5 may be applied in the same way to FIG. 6. For example, the learning device (100) of FIG. 1b may be the electronic device (600).

[0069] The memory (610) can store instructions (e.g., programs) executable by the processor (630). For example, the instructions may include instructions for executing the operation of the processor (630) and / or the operation of each component of the processor (630).

[0070] The memory (610) can be implemented as a volatile memory device or a non-volatile memory device.

[0071] Volatile memory devices can be implemented as DRAM (dynamic random access memory), SRAM (static random access memory), T-RAM (thyristor RAM), Z-RAM (zero capacitor RAM), or TTRAM (Twin Transistor RAM).

[0072] Non-volatile memory devices can be implemented as EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, MRAM (Magnetic RAM), Spin-Transfer Torque (STT)-MRAM, Conductive Bridging RAM (CBRAM), FeRAM (Ferroelectric RAM), PRAM (Phase change RAM), Resistive RAM (RRAM), Nanotube RRAM, Polymer RAM (PoRAM), Nano Floating Gate Memory (NFGM), holographic memory, Molecular Electronic Memory Device, or Insulator Resistance Change Memory.

[0073] The processor (630) can process data stored in memory (610). The processor (630) can execute computer-readable code (e.g., software) stored in memory (610) and instructions triggered by the processor (630).

[0074] The processor (630) may be a data processing device implemented in hardware having a circuit having a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

[0075] For example, a data processing device implemented in hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an Application-Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).

[0076] The processor (630) can cause the electronic device (600) to perform one or more operations by executing code and / or instructions stored in memory (610). The operations performed by the electronic device (600) may be substantially the same as the operations performed by the learning device (100) described with reference to FIGS. 1a through 5. Such redundant descriptions are omitted.

[0077] The embodiments described above may be implemented as hardware components, software components, and / or combinations of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and software applications executed on said operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing unit may be described as being used as a single unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and / or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.

[0078] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or command the processing unit independently or collectively. Software and / or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and may be stored or executed in a distributed manner. Software and data may be stored on computer-readable recording media.

[0079] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, etc., either alone or in combination, and the program instructions recorded on the medium may be those specifically designed and configured for the embodiment or may be those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.

[0080] The hardware device described above may be configured to operate as one or more software modules to perform the operation of the embodiment, and vice versa.

[0081] Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based thereon. For example, suitable results may be achieved even if the described techniques are performed in an order different from that described, and / or if the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from that described, or are replaced or substituted by other components or equivalents.

[0082] Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims set forth below.

Claims

1. A method of training a neural network, the method comprising: obtaining an output of each of a first layer and a second layer included in the neural network; and training the neural network such that a range of the output of the first layer and a range of the output of the second layer become substantially the same, wherein the output of the first layer and the output of the second layer are input to a third layer included in the neural network.

2. The method of claim 1, wherein the output of the first layer and the output of the second layer are quantized to the same level and input to the third layer.
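
As a non-limiting illustration of claim 2, the following minimal sketch shows how two outputs whose ranges are substantially the same can share a single quantization scale before reaching a third layer. The function names `quantize` and `shared_scale`, the signed 8-bit range, and the choice of element-wise addition as the third layer are assumptions made for the example and are not taken from the disclosure.

```python
def quantize(values, scale, qmin=-128, qmax=127):
    """Symmetric integer quantization with a given scale, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def shared_scale(out_first, out_second, qmax=127):
    """One scale covering both outputs, derived from the largest absolute value."""
    peak = max(abs(v) for v in out_first + out_second)
    return peak / qmax


# Hypothetical layer outputs whose ranges already match (about [-1, 1]).
out_first = [-1.0, -0.5, 0.0, 0.5, 1.0]
out_second = [-0.9, -0.3, 0.1, 0.6, 1.0]

scale = shared_scale(out_first, out_second)
q_first = quantize(out_first, scale)
q_second = quantize(out_second, scale)

# Assumed third layer: element-wise addition carried out on the integers.
# Because both inputs use the same scale, no rescaling is needed first.
q_sum = [a + b for a, b in zip(q_first, q_second)]
approx_sum = [q * scale for q in q_sum]
```

Because the two outputs use the same scale, the integer values can be combined directly, and each dequantized sum differs from the floating-point sum by at most one quantization step.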

3. The method of claim 1, wherein the training comprises: calculating a standard deviation of the output of the first layer; calculating a standard deviation of the output of the second layer; and adjusting the range of the output of the first layer and the range of the output of the second layer based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

4. The method of claim 3, wherein the adjusting comprises: calculating a loss function based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer; and reducing, based on the loss function, a difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

5. The method of claim 4, wherein the reducing of the difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer comprises updating parameters of the first layer and parameters of the second layer such that the loss function is minimized.

6. The method of claim 4, wherein the loss function includes a mean squared error (MSE).

7. The method of claim 1, wherein the first layer includes one or more convolution layers and one or more normalization layers.

8. The method of claim 1, wherein the second layer includes one or more convolution layers and one or more normalization layers.

9. An electronic device for training a neural network, the electronic device comprising: a processor; and a memory storing instructions, wherein the instructions, when executed individually or collectively by the processor, cause the electronic device to: obtain an output of each of a first layer and a second layer included in the neural network; and train the neural network such that a range of the output of the first layer and a range of the output of the second layer become substantially the same, wherein the output of the first layer and the output of the second layer are input to a third layer included in the neural network.

10. The electronic device of claim 9, wherein the output of the first layer and the output of the second layer are quantized to the same level and input to the third layer.

11. The electronic device of claim 9, wherein the instructions, when executed individually or collectively by the processor, cause the electronic device to: calculate a standard deviation of the output of the first layer; calculate a standard deviation of the output of the second layer; and adjust the range of the output of the first layer and the range of the output of the second layer based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

12. The electronic device of claim 11, wherein the instructions, when executed individually or collectively by the processor, cause the electronic device to: calculate a loss function based on the standard deviation of the output of the first layer and the standard deviation of the output of the second layer; and reduce, based on the loss function, a difference between the standard deviation of the output of the first layer and the standard deviation of the output of the second layer.

13. The electronic device of claim 12, wherein the instructions, when executed individually or collectively by the processor, cause the electronic device to update parameters of the first layer and parameters of the second layer such that the loss function is minimized.

14. The electronic device of claim 12, wherein the loss function includes a mean squared error (MSE).

15. The electronic device of claim 9, wherein the first layer includes one or more convolution layers and one or more normalization layers.

16. The electronic device of claim 9, wherein the second layer includes one or more convolution layers and one or more normalization layers.
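
As an illustrative, non-limiting sketch of the operations recited in claims 3 to 6 (and mirrored in claims 11 to 14), the example below computes the standard deviation of two outputs, forms a squared-error loss between them, and updates the parameters of both layers by gradient descent until the loss is reduced. Each layer is reduced to a single positive gain applied to fixed inputs; this toy model, the names `std_matching_loss` and `train_gains`, the learning rate, and the step count are all assumptions made for the example. In practice such a term would presumably be combined with the task loss of the network, which is omitted here.

```python
import statistics


def std_matching_loss(out_first, out_second):
    """Squared error between the standard deviations of two layer outputs."""
    diff = statistics.pstdev(out_first) - statistics.pstdev(out_second)
    return diff ** 2


def train_gains(x_first, x_second, gain_first=1.0, gain_second=1.0,
                lr=0.05, steps=200):
    """Toy stand-in for two layers: each layer is one positive gain.

    Hypothetical setup: out = gain * x, so std(out) = gain * std(x) for gain > 0.
    """
    s_first = statistics.pstdev(x_first)
    s_second = statistics.pstdev(x_second)
    for _ in range(steps):
        diff = gain_first * s_first - gain_second * s_second
        # Analytic gradients of diff**2 with respect to each gain.
        grad_first = 2.0 * diff * s_first
        grad_second = -2.0 * diff * s_second
        # Update the parameters of both layers to reduce the loss.
        gain_first -= lr * grad_first
        gain_second -= lr * grad_second
    return gain_first, gain_second


x_first = [-4.0, -2.0, 0.0, 2.0, 4.0]     # wide pre-gain values
x_second = [-0.5, -0.25, 0.0, 0.25, 0.5]  # narrow pre-gain values

g1, g2 = train_gains(x_first, x_second)
out_first = [g1 * v for v in x_first]
out_second = [g2 * v for v in x_second]

loss_before = std_matching_loss(x_first, x_second)
loss_after = std_matching_loss(out_first, out_second)
```

After training, the two gains have moved so that the standard deviations of `out_first` and `out_second` agree, and `loss_after` is far smaller than `loss_before`.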