Method for training a neural network, and corresponding classification method and computer program
By determining a majority class and using sparse data structures to convert label matrices for storage in RAM, the method addresses the time and memory constraints of neural network training, achieving substantial time and cost reductions in training neural networks for image or sound classification.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- THALES SA
- Filing Date
- 2025-12-22
- Publication Date
- 2026-07-02
AI Technical Summary
Existing methods for training neural networks for image or sound classification, particularly for segmentation tasks, face significant time and memory constraints: the annotations cannot feasibly be stored entirely in RAM and must be read repeatedly from mass memory, which prolongs the learning phase.
The method determines a majority class, assigns intermediate class numbers, and uses a sparse data structure to convert the initial label matrices into intermediate label matrices that can be stored in RAM, which reduces memory requirements and enables faster training through direct access from RAM.
This approach significantly reduces training time and memory costs by enabling direct access to label data from RAM, resulting in a time saving proportional to the number of learning epochs, batches, and the difference in access times between storage and RAM, with potential time savings of up to several seconds.
Description
[0001] DESCRIPTION
[0002] TITLE: METHOD FOR TRAINING A NEURAL NETWORK, CLASSIFICATION METHOD AND CORRESPONDING COMPUTER PROGRAM
[0003] The present invention relates to a method for training a neural network configured for the classification of input data, in particular images, a method for classifying input data using such a neural network, and the corresponding computer program.
[0004] In the field of artificial intelligence, and in particular the automatic analysis of data such as image or sound files, it is known to train neural networks by deep learning, in order to generate a classification model.
[0005] As is known, the classification model is implemented by a neural network having an initial architecture chosen in terms of the number of layers, the number of neurons per layer, and the types of activation functions.
[0006] The values of the neural network parameters, including synaptic weights and any biases, are adjusted during an adaptation phase of the neural network, also called the learning phase.
[0007] The learning phase is carried out using a set of digital input files and respective annotations, including information relating to the classes to be assigned to these files.
[0008] The annotation of the input files provided for the learning phase is carried out by an operator, based on a set of classes that the operator either chooses or is given.
[0009] The classification model obtained at the end of the learning phase allows the automatic classification of new input files during a subsequent use phase of the classification model, also called the inference phase.
[0010] Input file annotation can be performed at different scales.
[0011] It is therefore possible to assign a single class to the entirety of an input file to be classified, for example, to each image provided as input to the neural network. Typically, in the field of image analysis, any image containing a car would be assigned the class "car". It is also possible to process each individual piece of data in the input file in such a way as to assign a class to each of these individual data points. In the case where the input file is an image, a class would be assigned to each pixel of the image. In this alternative, the classification is called segmentation.
[0012] Segmentation is called "semantic" if no distinction is made between different instances of the same class. Segmentation is called "instance" if, on the contrary, it allows the identification of individual components belonging to a single instance of a given class.
[0013] For example, if the input files are images and the classes to be assigned are the road class and the car class, instance segmentation will separate the pixels corresponding to a given car from the pixels belonging to other cars, unlike semantic segmentation.
[0014] The learning phase for semantic classification requires processing large volumes of data, since each input file is in bijection with the corresponding annotation.
[0015] Thus, in the example of image-type files, if the input images include L*H pixels, where L is the image width and H is its height, expressed in terms of the characteristic dimension of a pixel, the corresponding annotations include L*H class values, each pixel being able to include several sub-pixels if the sensor includes multiple recording channels.
[0016] It is therefore understood that it is most often not possible for the learning phase to place all the annotations of the training dataset in RAM associated with a processor on which the learning phase is implemented, nor a fortiori all the input images themselves.
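As a rough illustration of the orders of magnitude involved, the following sketch estimates the memory footprint of densely stored annotations. The dataset size, the image dimensions and the one-byte label type are hypothetical values chosen for the example, not figures taken from the description.

```python
def dense_label_bytes(num_images: int, width: int, height: int,
                      bytes_per_label: int = 1) -> int:
    """Bytes needed to hold one class value per pixel for every image."""
    return num_images * width * height * bytes_per_label


# Hypothetical dataset: 10,000 annotated images of 1920 x 1080 pixels,
# each label stored on a single byte.
total = dense_label_bytes(10_000, 1920, 1080)
print(f"{total / 1e9:.1f} GB of labels")  # prints "20.7 GB of labels"
```

Under these assumed values the labels alone exceed the RAM of many workstations, before the input images themselves are counted.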
[0017] As a result, the processor must repeatedly retrieve the information corresponding to the annotations from mass memory, for which the read and write access time is much higher than for RAM.
[0018] The learning phase is therefore particularly time-consuming in the case of classification by segmentation, whether it is semantic or instance segmentation.
[0019] One aim of the invention is therefore to propose a method for training a neural network for the purpose of classifying files by segmentation for which the cost in time, and optionally in memory, is reduced, including when no a priori simplification of the input data is possible.
[0020] To this end, the invention relates to a method for training a neural network configured for the classification of matrix input data, comprising:
a) the initialization of a neural network,
[0021] b) the provision of an initial training dataset comprising a plurality of matrix pairs, each matrix pair comprising an input matrix containing input data and a respective initial label matrix of the same dimension as the input matrix and each term of which is an initial class number representing a class of a respective term of the input matrix, said class being selected from a set of predetermined classes,
[0022] c) the formation, from the initial set, of a training set, a final validation set, and an intermediate validation set,
d) the iterative training of the neural network using the training set and the intermediate validation set, and
[0023] e) the final validation of the trained neural network using the final validation set,
[0024] characterized in that:
[0025] the formation c) of the training, final validation and intermediate validation sets includes:
[0026] i) the determination, for each class in the set of predetermined classes, of a frequency indicator representative of the total number of occurrences of that class in the set of terms of the initial label matrices of the initial set,
[0027] ii) the selection of a majority class, which is the class with the highest frequency indicator,
[0028] iii) assigning each class a respective intermediate class number such that the intermediate class number of the majority class is equal to zero, and
[0029] (iv) the formation of an intermediate training set from the initial set, by converting each initial label matrix into an intermediate label matrix whose terms are the intermediate class numbers corresponding to the initial class numbers of the initial label matrix, the intermediate label matrices being stored in memory using an intermediate data structure different from an initial data structure used for storing the initial label matrices in memory,
[0030] said training, final validation and intermediate validation sets being formed from the intermediate set.
[0031] Determining the majority class allows the conversion of the initial label matrices into intermediate label matrices, without any prior assumptions about the distribution of classes in the training set being necessary.
[0032] Unlike the input matrices and the initial label matrices, the intermediate label matrices in many cases have a significant proportion of zero terms, so they can be stored using a specifically adapted intermediate data structure.
[0033] In particular, data structures adapted for the representation of so-called sparse matrices can be used as an intermediate data structure.
[0034] Intermediate label matrices therefore have a lower, or even much lower, memory cost than the initial label matrices. Consequently, converting the initial label matrices into intermediate label matrices makes it possible in many cases to store all the intermediate label matrices needed for the iterative training of the neural network in RAM.
[0035] Unlike a conventional training method, the processor can then read the data relating to the labels directly from RAM instead of having to read them from storage memory.
[0036] It is therefore understood that the method according to the invention allows a time saving for the iterative training step which is on the order of the product NE*B*Δt_acc, where NE is the number of learning epochs, B is the number of input matrices used in each epoch, and Δt_acc is the difference between the access time to storage memory and the access time to RAM. This time saving is clearly very significant.
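By way of illustration only, the order-of-magnitude estimate given above (number of epochs, times number of input matrices per epoch, times the difference in access time) can be evaluated as follows. The function name and all numerical values are hypothetical; real access times depend on the hardware and on the size of each label matrix.

```python
def estimated_time_saving(num_epochs: int, matrices_per_epoch: int,
                          t_storage: float, t_ram: float) -> float:
    """Estimated saving in seconds: NE * B * (storage access - RAM access)."""
    return num_epochs * matrices_per_epoch * (t_storage - t_ram)


# Assumed values: 100 epochs, 5,000 label matrices read per epoch,
# 1 ms per read from mass storage versus 10 microseconds from RAM.
saving = estimated_time_saving(100, 5_000, 1e-3, 1e-5)
print(f"about {saving:.0f} s saved")
```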
[0037] Furthermore, the method also saves time when reading the relevant information in the label matrices, this reading time being shorter for the intermediate label matrices than for the initial label matrices thanks to the use of the intermediate data structure.
[0038] According to other advantageous aspects of the invention, the method comprises one or more of the following features, taken individually or in all technically possible combinations:
[0039] - the method includes, before the iterative training d) of the neural network: c2) a step of selecting a RAM that exchanges data with a processor used to implement the method and that has sufficient memory space to store information relating to the intermediate label matrices, and of storing this information in said RAM for use during the iterative training;
- during the iterative training, at least one set of estimated intermediate label matrices is obtained at the output of the neural network in response to a set of input matrices, chosen from the intermediate set, being provided as input to the neural network, and is compared either directly with the respective intermediate label matrices, or with the initial label matrices after inverse conversion of the estimated intermediate label matrices into respective estimated initial label matrices, said label matrices being stored in RAM;
[0040] - the RAM is chosen at the selection stage from a RAM memory dedicated to the processor and a VRAM memory optimized for managing graphics data;
[0041] - VRAM memory is chosen in preference to RAM memory;
[0042] - the RAM selection step includes, if necessary:
[0043] * the determination of a set of meta-classes from the set of predetermined classes based on the determined majority class,
[0044] * the formation of secondary label matrices whose terms are the meta-class numbers corresponding to the initial class numbers of the initial label matrix, and
* the reiteration of steps i), ii) and iii) from the secondary label matrices before step iv) of formation of the intermediate set (68);
[0045] - the input matrices are obtained beforehand using a digital image sensor or a digital sound sensor;
[0046] - the intermediate data structure is specifically adapted for sparse matrices.
[0047] The invention also relates to a method of data classification by segmentation using a neural network obtained at the end of the training process as described above.
[0048] The invention finally relates to a computer program product comprising software instructions which, when executed by a computer, implement the training method as defined above.
[0049] The invention will become clearer upon reading the following description, given solely by way of non-limiting example, and made with reference to the drawings in which:
[0050] [Fig. 1] Figure 1 schematically represents an electronic classification device on which the method according to the invention can be implemented.
[0051] [Fig. 2] Figure 2 schematically represents a particular example of a neural network implemented by the electronic classification device of Figure 1.
[Fig. 3] Figure 3 represents, in the form of a flowchart, the training method according to the invention for training the neural network of Figure 2 and the classification method implemented at the end of the training method.
[0052] The invention relates to a training method 10 for a neural network 15, implemented by means of an electronic classification device 20.
[0053] The electronic classification device 20 is described with reference to Figure 1. The electronic classification device 20 includes, for example, at least one processor 25 exchanging data with at least one mass storage 30 and at least one random access memory 35.
[0054] The processor 25 is for example chosen from a central processing unit (CPU) and / or a graphics processing unit (GPU).
[0055] Mass memory 30 is a non-volatile memory, with a storage capacity suitable for implementing the training process 10 and optionally for the subsequent classification of at least one matrix input data to be classified 45.
[0056] Mass storage 30 is preferably optimized in terms of the access time of the processor 25 to the data it stores, for reading and optionally for writing.
[0057] As an example, mass storage 30 is of the SSD (Solid State Drive) type.
[0058] Mass memory 30 is configured to receive and store software instructions to be executed by the processor 25 for the implementation of the training process 10 which will be described later, and optionally for the classification of at least one matrix input data to be classified 45.
[0059] Mass storage 30 is also configured to receive and store an initial set 50 of training data and optionally at least one input data to be classified 45.
[0060] The matrix input data to be classified 45 includes information relating to a physical object obtained by means of a sensor and stored electronically by means of a matrix data structure.
[0061] Each matrix input data to be classified 45 is a matrix of dimension N1*N2, where N1 and N2 are integers greater than or equal to 1 and such that at most one of N1 and N2 is equal to 1. Each term of the matrix comprises C sub-terms, where C denotes a number of sensor channels. Each term is therefore a singlet in cases where C is equal to 1 and a multiplet in cases where C is strictly greater than 1. Each matrix input data to be classified 45 is obtained using a suitable digital sensor, for example, a digital image sensor or a digital sound sensor.
[0062] In a first embodiment, the matrix input data to be classified 45 is stored in mass memory 30 in a matrix image file of dimension N1*N2. N1, respectively N2, represents in this case an integer number of pixels along a longitudinal direction of the digital image sensor by means of which the file was obtained, N1 and N2 being strictly greater than 1. Each term of the corresponding input matrix then encodes a color level on one or more channels, for example three channels, for example as one or more floating-point numbers ("float"), for example a triplet of floating-point numbers.
[0063] In this first embodiment, the number of channels C of the image sensor is, for example, equal to 1 for a grayscale image, 3 for an RGB color image, or even greater than three for a multispectral, hyperspectral, or ultraspectral sensor. In the case of an RGB color image, each pixel therefore comprises three subpixels, so that each term of the matrix input data to be classified 45 is a triplet of floating-point numbers. The raster format is chosen, for example, from among JPEG, PNG, GIF, TIF, and BMP.
[0064] In a second embodiment, the matrix input data to be classified 45 is stored in mass storage 30 in a matrix-format audio file of dimension N1*N2. In a particular embodiment, one of N1 and N2 is then equal to 1, the terms of the corresponding matrix, reduced to a row vector or a column vector, each representing a sample of a digital signal obtained by means of a digital sound sensor. The position of any term of the matrix then represents an acquisition time of the sample of the sound signal represented by that term.
[0065] In this second embodiment, the number of channels C of the sound sensor is for example equal to 1 or strictly greater than 1. In the case of a sound acquired by means of a multichannel sound sensor, each element of the matrix data 45 to be classified therefore comprises as many sub-terms as channels, so that each term of the matrix input data to be classified 45 is a multiplet of floating-point numbers.
[0066] The matrix input data to be classified 45 is, for example, a medical image of a patient obtained using a medical imaging device such as a 2D digital radiography device, a scanner or a magnetic resonance imaging device. Alternatively, the matrix input data to be classified 45 is an image file of a portion of space intended to be traversed by an autonomous vehicle or by a vehicle equipped with a driver assistance system.
[0067] The initial set 50 of training data comprises a plurality of matrix pairs 51.
[0068] Each pair of matrices 51 comprises an input matrix 51A containing input data and a respective initial label matrix 51B of the same dimension as the input matrix 51A.
[0069] The data format used to store the input matrices 51A in mass memory 30 is preferably the same as that of the matrix input data to be classified 45.
[0070] Thus, in the case of image classification, or audio file classification respectively, the input matrices 51A are preferably stored in memory 30 as image files, or audio files respectively, of the same format as that used for the matrix input data to be classified 45.
[0071] Each initial label matrix 51B is of the same dimension N1*N2 as the respective input matrix 51A.
[0072] Each term in the initial label matrix 51B is an initial class number num_i, representing a respective class of the corresponding term in the input matrix 51A, said class being selected from a set {Ci} of N predetermined classes Ci.
[0073] The classes Ci are chosen a priori by an operator who annotates the input matrices 51A upstream of the training process 10 to generate the initial label matrices 51B.
[0074] Each initial label matrix 51B is therefore formed at the end of a segmentation-type annotation step applied to a respective input matrix 51A, that is to say by assigning a class Ci to each term of the respective input matrix 51A, the annotation step being carried out by an operator upstream of the implementation of the training process 10.
[0075] Each class Ci is thus represented by a respective class number num_i, preferably a positive integer.
[0076] The initial label matrices 51B are stored in mass memory 30 using a suitable matrix data structure, for example the numpy.ndarray type available in the NumPy library of the Python language or the torch.Tensor type available in the PyTorch library of the Python language.
[0077] The random access memory 35 is a volatile memory configured to temporarily store data from the mass storage 30 for the implementation of the training process 10 and optionally for the subsequent classification of at least one input data to be classified 45.
[0078] The RAM 35 includes, for example, at least one memory of the RAM (Random Access Memory) type and/or one memory of the VRAM (Video Random Access Memory) type.
[0079] The RAM 35 is advantageously chosen depending on the nature of the input data.
[0080] In particular, in the case where the matrix input data to be classified 45 is an image, the RAM 35 is preferably of the VRAM type.
[0081] The classification device 20 is configured to implement the training method 10 of the neural network 15, an example of which is shown in Figure 2.
[0082] The neural network 15 comprises an ordered succession of layers 55-i (i being an integer between 1 and k, with k greater than or equal to three) of neurons 60, each of which takes its inputs from the outputs of the previous layer.
[0083] More specifically, each layer 55-i comprises respective neurons 60, taking their inputs from the outputs of the neurons 60 of the previous layer 55-(i-1) where applicable, or receiving the input data in the case of the first layer 55-1.
[0084] In the example in Figure 2, the neural network 15 includes an input layer 55-1, two hidden layers 55-2 and 55-3 and an output layer 55-4.
[0085] Alternatively, more complex neural network structures can be considered. In this case, a given layer 55-i can be linked to a layer 55-j further away than the immediately preceding layer 55-(i-1).
[0086] Each neuron 60 is also associated with an operation, that is to say a type of processing, to be carried out by said neuron within the corresponding processing layer.
[0087] Each layer 55-i is connected to the other layers 55-j by one or more synapses 65.
[0088] A synaptic weight is associated with each synapse 65, and each synapse forms a link between two respective neurons 60. Each synaptic weight is, for example, a real number or a complex number.
[0089] Each neuron 60 is capable of:
[0090] - performing a weighted sum of the value(s) received from the neurons 60 of the previous layer, each value being multiplied by the respective synaptic weight of the corresponding synapse, then
[0091] - applying an activation function, typically a non-linear function, to said weighted sum, and
- delivering at the output of said neuron 60 a value resulting from the application of the activation function.
[0092] The activation function allows for the introduction of non-linearity in the processing carried out by each neuron 60. The sigmoid function, the hyperbolic tangent function and the Heaviside function are examples of activation functions.
[0093] As an optional additional step, each neuron 60 is also capable of applying an additive factor, also called a bias, to the output of the activation function. The value delivered at the output of said neuron 60 is then the sum of the bias value and the value resulting from the activation function.
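The processing carried out by a single neuron, as described above, can be sketched as follows. This is a minimal illustration, not the implementation of the neural network 15: the function names are hypothetical and the sigmoid is only one of the activation functions mentioned.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation, one of the examples cited in the description."""
    return 1.0 / (1.0 + np.exp(-x))


def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs, then activation, then optional bias."""
    weighted_sum = np.dot(weights, inputs)
    # The bias is added to the output of the activation function,
    # following the order given in the description.
    return activation(weighted_sum) + bias


out = neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.25]), bias=0.1)
print(out)
```

The sketch follows the order stated in the description. Many common frameworks instead add the bias to the weighted sum before applying the activation function.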
[0094] The training process 10 aims to adjust the synaptic weights, including any biases, so that the neural network 15 performs the classification of the matrix input data 45 with the lowest possible error rate.
[0095] The training method 10, described with reference to Figure 3, comprises:
a) the initialization 110 of the neural network 15,
[0096] b) the provision 120 of the initial set 50 of training data comprising a plurality of matrix pairs 51, each matrix pair 51 comprising an input matrix 51A and a respective initial label matrix 51B of the same dimension as the input matrix 51A and each term of which is the initial class number num_i representing the class Ci of the respective term of the input matrix 51A selected from the set {Ci} of predetermined classes,
[0097] c) the formation 130, from the initial set 50, of a training set 65, a final validation set 66 and an intermediate validation set 67,
d) the iterative training 135 of the neural network 15 using the training set 65 and the intermediate validation set 67, and
[0098] e) the final validation 140 of the trained neural network 15 using the final validation set 66.
[0099] Initialization 110 includes:
[0100] - the choice of a type of neural network 15, including the choice of the different layers 55-i and the synapses 65 that link their respective neurons 60,
[0101] - the choice of activation functions for the different neurons 60, and
[0102] - the initialization, for example in a random manner, of the synaptic weights and, where appropriate, the biases of the neural network 15.
[0103] At the end of the initialization step 110, an initial neural network 15_in is thus formed, whose representative information is stored in the RAM 35 for the subsequent steps. The provision 120 of the initial set 50 includes the storage of the initial set 50 on the mass memory 30.
[0104] Initialization 110 and provision 120 can be performed in any order or simultaneously.
[0105] The formation 130 of the training set 65, the intermediate validation set 67 and the final validation set 66 is performed by the processor 25 and includes:
[0106] i) the determination 130-A, for each class Ci of the set {Ci} of predetermined classes, of a frequency indicator fi representative of the total number of occurrences of this class Ci in the set of terms of the initial label matrices 51B of the initial set 50, by reading these terms in mass memory 30,
[0107] ii) the selection 130-B of a majority class C_maj, which is the class Ci whose respective frequency indicator fi (hereafter designated f_maj) is the highest (or any one of these classes if there are several),
[0108] iii) the assignment 130-C to each of the classes Ci of a respective intermediate class number num_int_i, such that the intermediate class number num_int_maj of the majority class C_maj equals zero,
[0109] iv) the formation 130-D of an intermediate training set 68 from the initial set 50, by converting each initial label matrix 51B into an intermediate label matrix 51C whose terms are the intermediate class numbers num_int_i corresponding to the initial class numbers num_i of the initial label matrix 51B, the intermediate label matrices 51C being stored in memory by means of an intermediate data structure different from the initial data structure used for the in-memory storage of the initial label matrices, said training set 65, intermediate validation set 67 and final validation set 66 being formed from the intermediate set 68.
[0110] For determination 130-A, all the terms of the initial label matrices 51B are read by the processor 25 from mass memory 30, so as to count the total number of occurrences of each of the classes Ci among these terms.
[0111] The frequency indicator fi of the class Ci is for example equal to the total number of occurrences of this class in the set of terms of the initial label matrices 51B, or equal to the ratio of this total number to the number of terms of the set of initial label matrices 51B.
[0112] For selection 130-B, processor 25 then searches for the frequency indicator(s) fi with the highest value. The corresponding class, or any one of the corresponding classes, is then identified as the majority class C_maj, with frequency indicator f_maj.
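Determination 130-A and selection 130-B can be sketched as follows. This is a minimal illustration under stated assumptions: the initial class numbers are taken to be the integers 0 to N-1, the label matrices are supplied as an in-memory list of NumPy arrays rather than read from mass memory, and the function names are hypothetical.

```python
import numpy as np


def class_frequencies(label_matrices, num_classes):
    """Total number of occurrences of each class over all label matrices."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for labels in label_matrices:
        # Assumes every label is an integer in the range 0 .. num_classes - 1.
        counts += np.bincount(labels.ravel(), minlength=num_classes)
    return counts


def majority_class(counts):
    """Class with the highest frequency indicator (first one if tied)."""
    return int(np.argmax(counts))


labels_a = np.array([[2, 2, 1], [2, 0, 2]])
labels_b = np.array([[2, 2], [1, 2]])
counts = class_frequencies([labels_a, labels_b], num_classes=3)
print(counts, majority_class(counts))  # prints "[1 2 7] 2"
```

Dividing `counts` by `counts.sum()` gives the ratio form of the frequency indicator. The majority class is the same with either form.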
[0113] Processor 25 then proceeds to the assignment 130-C of the intermediate class numbers num_int_i.
[0114] The intermediate class numbers num_int_i are preferably integers. The intermediate class numbers num_int_i are in bijection with the initial class numbers num_i, under the constraint that the intermediate class number num_int_maj of the majority class is chosen to be equal to zero.
[0115] This bijection allows us to define an operation of conversion of an initial label matrix 51B into an intermediate label matrix 51C and conversely, the operation of inverse conversion of an intermediate label matrix 51C into an initial label matrix 51B.
[0116] The processor 25 can then proceed to the formation 130-D of the intermediate set 68. For this, each initial label matrix 51B is associated with an intermediate label matrix 51C, whose terms are the intermediate class numbers num_int_i corresponding to the initial class numbers num_i present at the same position in the initial label matrix 51B.
[0117] It is therefore understood that the initial label matrices 51B and the intermediate label matrices 51C have the same dimension N1*N2.
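The assignment 130-C and the conversion and inverse conversion operations can be illustrated with lookup tables. The description only requires a bijection that sends the majority class to zero. The particular bijection used here, which swaps the numbers of class 0 and of the majority class, is an assumption made for the example, as are the function and variable names.

```python
import numpy as np


def build_mappings(num_classes, majority):
    """Return lookup tables (to_intermediate, to_initial).

    Illustrative choice of bijection: the majority class and class 0
    exchange their numbers, all other class numbers are unchanged.
    """
    to_intermediate = np.arange(num_classes)
    to_intermediate[majority] = 0
    to_intermediate[0] = majority
    # argsort of a permutation yields its inverse permutation.
    to_initial = np.argsort(to_intermediate)
    return to_intermediate, to_initial


initial = np.array([[2, 2, 1], [2, 0, 2]])          # class 2 is the majority
to_intermediate, to_initial = build_mappings(num_classes=3, majority=2)
intermediate = to_intermediate[initial]             # conversion
restored = to_initial[intermediate]                 # inverse conversion
print(intermediate)
```

Indexing the lookup table with the whole label matrix converts every term at once and preserves the dimension of the matrix.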
[0118] In contrast, the intermediate label matrices 51C contain a proportion of zero terms, corresponding to the majority class C_maj. This proportion is in many cases significant, notably exceeding 30%, 40%, 50%, 60%, 70%, 80%, 90%, or even more.
[0119] As an example, in the context of semantic segmentation of road images, the proportion of zero terms is commonly between 95% and 99%.
[0120] Consequently, the intermediate data structure used for storing the intermediate label matrices 51C in memory can be chosen taking into account the frequency indicator f_maj of the majority class C_maj, and therefore the expected proportion of zeros, and the intermediate data structure is thus different from the initial data structure used for the initial label matrices 51B.
[0121] The intermediate data structure is chosen so that the memory cost of the set of intermediate label matrices 51C is less than the memory cost of the set of initial label matrices 51B.
[0122] In particular, the intermediate data structure is advantageously chosen from among the data structures adapted to sparse matrices, for which only information relating to non-zero terms is explicit in memory, information relating to zero terms being implicit.
[0123] As an example, a sparse matrix structure such as the scipy.sparse.coo_array structure from the SciPy library of the Python language can be used as an intermediate data structure, the initial data structure being, for example, the numpy.ndarray type available in the NumPy library of the Python language.
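The effect of the intermediate data structure on memory cost can be illustrated with the two types cited above. In this sketch, the matrix size, the 2% proportion of terms that differ from zero and the helper `coo_nbytes` are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import coo_array


def coo_nbytes(matrix):
    """Bytes used by the explicit values and their row/column indices."""
    return matrix.data.nbytes + matrix.row.nbytes + matrix.col.nbytes


# Hypothetical intermediate label matrix: 98% of the terms equal zero
# (majority class), the remaining 2% carry the intermediate class number 3.
dense = np.zeros((1000, 1000), dtype=np.uint8)
dense[:20, :] = 3

sparse = coo_array(dense)   # only the terms different from zero are stored
print(dense.nbytes, coo_nbytes(sparse), sparse.nnz)

# The dense matrix can be rebuilt when needed, e.g. to compute a cost.
assert np.array_equal(sparse.toarray(), dense)
```

Because the coordinate format also stores a row index and a column index for each explicit term, the saving only materializes when terms different from zero are sufficiently rare, which is why the choice of structure depends on the frequency of the majority class.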
[0124] The intermediate training set 68 therefore comprises a plurality of matrix pairs, each consisting of an input matrix 51A and a corresponding intermediate label matrix 51C. The intermediate label matrices 51C are advantageously stored in RAM 35.
[0125] The processor 25 then proceeds to form the training set 65, the final validation set 66 and the intermediate validation set 67 from the intermediate set 68, in a manner known to the person skilled in the art.
[0126] The training set 65 and the intermediate validation set 67 are used for the iterative training 135 of the initial neural network 15_in.
[0127] The training set 65 and the intermediate validation set 67 are preferably independent of each other.
[0128] The training set 65 and the final validation set 66 are preferably independent of each other.
[0129] The final validation set 66 and the intermediate validation set 67 are preferably independent of each other.
[0130] Alternatively, the final validation set 66 and the intermediate validation set 67 are not disjoint, or are even identical.
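The description leaves the formation of the three sets to the person skilled in the art. One common way of obtaining mutually independent sets, shown here purely as an assumption, is a random partition of the indices of the matrix pairs. The split fractions, the seed and the function name are hypothetical.

```python
import numpy as np


def split_indices(num_pairs, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly partition pair indices into three disjoint index sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_pairs)
    n_train = int(train_frac * num_pairs)
    n_val = int(val_frac * num_pairs)
    train = order[:n_train]
    intermediate_val = order[n_train:n_train + n_val]
    final_val = order[n_train + n_val:]     # whatever remains
    return train, intermediate_val, final_val


train_idx, inter_val_idx, final_val_idx = split_indices(100)
print(len(train_idx), len(inter_val_idx), len(final_val_idx))
```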
[0131] In what follows, the reference signs for matrices relating to the training set 65 are followed by "_ent". Similarly, in what follows, the reference signs for matrices relating to the intermediate validation set 67 are followed by "_VI"; and the reference signs for matrices relating to the final validation set 66 are followed by "_VF".
[0132] As is known, iterative training 135 comprises a plurality of iterations, called epochs E_p. In the following, the number of epochs E_p is denoted NE, p being an integer ranging from 1 to NE.
[0133] At each epoch E_p, the input matrices 51A_ent of the training set 65 are read by the processor 25 from the mass memory 30 and provided as input to the neural network 15 in its configuration for the current epoch E_p, so as to generate for each of them an estimated training label matrix 51D_ent at the output of the neural network 15 in its configuration for the current epoch E_p. Each epoch E_p optionally includes a step of dividing the training set 65 into K batches of B input matrices 51A_ent. The batches of input matrices 51A_ent are in this case provided successively to the neural network 15, each once at each epoch E_p. Each epoch E_p then includes K intermediate iterations, so that at the end of an epoch E_p, each input matrix 51A_ent of the training set 65 has been received as input to the neural network 15 once.
[0134] Alternatively, the division step is performed with replacement at each epoch E_p, so that at the end of a given epoch E_p, some input matrices 51A_ent may have been selected in several batches, and/or some input matrices 51A_ent may never have been selected.
[0135] In one particular embodiment, the division step is performed only once for all epochs E_p.
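As an illustration of the division step described in paragraphs [0133] to [0135], the following Python sketch splits the indices of the input matrices into K batches of B indices, with or without replacement. The function name `split_into_batches`, its parameters and the choice of drawing each batch independently in the replacement variant are assumptions made for this example; they are not taken from the method itself.

```python
import random


def split_into_batches(num_matrices, batch_size, rng, with_replacement=False):
    """Return K batches, each a list of B indices of input matrices."""
    indices = list(range(num_matrices))
    num_batches = num_matrices // batch_size  # K batches of B matrices
    if with_replacement:
        # Each batch is drawn independently of the others: a matrix may
        # appear in several batches of the same epoch, or in none of them.
        return [rng.sample(indices, batch_size) for _ in range(num_batches)]
    # Without replacement: shuffle once, then cut into K disjoint batches,
    # so that every matrix is seen exactly once per epoch.
    rng.shuffle(indices)
    return [indices[k * batch_size:(k + 1) * batch_size]
            for k in range(num_batches)]


# Hypothetical sizes: 12 input matrices, batches of 4, hence K = 3.
batches = split_into_batches(12, 4, random.Random(0))
```

Calling the function once before the first epoch corresponds to the embodiment of paragraph [0135]; calling it at the start of every epoch corresponds to the other two variants.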
[0136] The terms of an estimated training label matrix 51D_ent are the intermediate class numbers num_int_i estimated by the neural network 15, in its configuration for the current epoch E_p, in response to the provision of the input matrix 51A_ent on the input layer 55-1.
[0137] At the end of each epoch E_p, a first distance d(51C_ent, 51D_ent) between each of the estimated training label matrices 51D_ent and the respective intermediate label matrix 51C_ent is evaluated by the processor 25, and the value of a cost function, configured to measure a distance between the set of estimated label matrices 51D and the respective intermediate label matrices 51C, is calculated on the basis of the first distances d(51C_ent, 51D_ent) thus evaluated.
[0138] The first distance d(51C_ent, 51D_ent) is, for example, the cross-entropy distance.
[0139] Advantageously, each estimated training label matrix 51D_ent is cleared from RAM 35 once the respective first distance d(51C_ent, 51D_ent) has been obtained.
[0140] Finally, at the end of each epoch E_p, the synaptic weights, and optionally the biases, are adjusted according to the value taken by the cost function. This adjustment is carried out in a manner known to those skilled in the art, for example using the gradient backpropagation method, to provide an updated neural network 15.
[0141] The updated neural network 15 is validated using the intermediate validation set 67 before the next epoch E_(p+1) is carried out.
[0142] For this purpose, the input matrices 51A_VI of the intermediate validation set 67 are read by the processor 25 from the mass memory 30 and provided as input to the neural network 15 in its configuration for the current epoch E_p, so as to generate for each of them, at the output of the neural network 15, an estimated intermediate validation label matrix 51D_VI.
[0143] A performance indicator for the neural network 15, configured to track the evolution of the neural network's classification performance over successive epochs, is evaluated at the end of each epoch E_p on the basis of the set of estimated intermediate validation label matrices 51D_VI and the respective intermediate label matrices 51C_VI, in particular so as to decide, according to the principle of early stopping, whether or not to proceed to the following epoch E_(p+1), and/or to adapt one or more hyperparameters of the neural network 15 used during the initialization 110 of this neural network 15.
[0144] In particular, the performance indicator calculated for the intermediate validation set 67 is compared to the same performance indicator calculated for the training set 65 at the same epoch E_p, as well as to one or more of the performance indicators calculated for the intermediate validation set 67 at previous epochs.
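The early stopping decision mentioned in paragraphs [0143] and [0144] can be sketched as follows. This is only one common way of comparing the current indicator with those of previous epochs; the function name `should_stop_early`, the `patience` and `min_delta` parameters and the convention that a higher indicator is better are assumptions, since the text does not fix a metric or a stopping rule.

```python
def should_stop_early(validation_scores, patience=3, min_delta=0.0):
    """Return True when the last `patience` epochs brought no improvement.

    validation_scores holds one performance indicator per completed epoch,
    computed on the intermediate validation set (higher is assumed better).
    """
    if len(validation_scores) <= patience:
        return False  # not enough history to decide
    best_before = max(validation_scores[:-patience])
    best_recent = max(validation_scores[-patience:])
    # Stop if the recent epochs did not beat the earlier best by min_delta.
    return best_recent <= best_before + min_delta


# Hypothetical indicator values over six epochs.
history = [0.50, 0.60, 0.70, 0.70, 0.69, 0.70]
stop = should_stop_early(history, patience=3)
```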
[0145] The initialized neural network 15_in is used for the first epoch E_1. Each epoch E_p after the first is carried out using the updated neural network 15 from the previous epoch E_(p-1).
[0146] At the end of the last epoch E_NE, the trained neural network 15_ent is obtained.
[0147] For iterative training 135, at least a part, advantageously the whole, of the intermediate label matrices 51C_ent, 51C_VI of the training set 65 and of the intermediate validation set 67 is, if this has not already been done, loaded into RAM 35 at the beginning of the first epoch E_1 and kept in RAM 35 until the end of the last epoch E_NE.
[0148] This is made possible by the choice of the intermediate data structure, the space available in RAM 35 generally not being sufficient for the initial label matrices 51B but being sufficient for the intermediate label matrices 51C.
[0149] The memory cost of the set of intermediate label matrices 51C_ent, 51C_VI of the training set 65 and the intermediate validation set 67 is in fact in many cases lower, or even much lower, than that of the set of corresponding initial label matrices 51B_ent, 51B_VI, due to the proportion of zero terms.
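To make the origin of the zero terms concrete, here is a minimal Python sketch of the conversion of an initial label matrix into a sparse intermediate form. It assumes a dictionary keyed by (row, column) as the intermediate data structure and numbers the non-majority classes by decreasing frequency; both choices, like the function names `build_class_mapping` and `to_sparse`, are illustrative assumptions. The only constraint taken from the text is that the majority class receives the intermediate class number zero.

```python
from collections import Counter


def build_class_mapping(label_matrices):
    """Count classes, pick the majority class, and renumber the classes."""
    counts = Counter(v for m in label_matrices for row in m for v in row)
    # Majority class first (number 0); ties broken by the smaller
    # initial class number (an arbitrary choice for this sketch).
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {initial: num_int for num_int, initial in enumerate(ordered)}


def to_sparse(label_matrix, mapping):
    """Keep only the terms whose intermediate class number is non-zero."""
    entries = {}
    for r, row in enumerate(label_matrix):
        for c, value in enumerate(row):
            num_int = mapping[value]
            if num_int != 0:
                entries[(r, c)] = num_int
    shape = (len(label_matrix), len(label_matrix[0]))
    return shape, entries


# Hypothetical 3x4 label matrix in which class 3 dominates.
labels = [[3, 3, 3, 3],
          [3, 7, 3, 3],
          [3, 3, 3, 5]]
mapping = build_class_mapping([labels])
shape, entries = to_sparse(labels, mapping)
# Only 2 of the 12 terms are stored explicitly.
```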
[0150] It should be noted that it is generally not possible to perform a similar operation on the input matrices 51A, for which there is no reason why a particular value of a term should occur significantly more frequently than other values.
[0151] In the training step 135 according to the invention, the evaluation of the first distances and, ultimately, the calculation of the cost function at the end of each epoch E_p have a lower time cost than in prior art learning methods. In particular, according to the invention, the evaluation of the first distances and the calculation of the cost function only require reading information stored in the RAM 35, whereas in prior art methods, the initial label matrices 51B must be read from the mass memory 30.
[0152] If Δt_access denotes the difference between the access time t_mass to the mass memory 30 and the access time t_live to the RAM 35, the time gain for the training step 135 according to the invention is on the order of K*NE*B*Δt_access. From this must be deducted the time cost of the determination 130-A, selection 130-B and assignment 130-C steps, which is on the order of K*B*t_mass and is generally negligible compared to K*NE*B*Δt_access.
[0153] Furthermore, the evaluation of the first distances can be performed directly on the intermediate data structure and can therefore be optimized in time compared to a distance calculation performed on the initial data structure, so the time gain is even greater.
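The following sketch illustrates why a distance can be cheaper to evaluate on a sparse structure: only the explicitly stored positions need to be visited. Note that it uses a simple count of mismatching positions rather than the cross-entropy distance given as an example in the text, and that the dictionary representation and the name `sparse_mismatch_count` are assumptions made for this illustration.

```python
def sparse_mismatch_count(reference, estimate):
    """Count positions whose class numbers differ between two sparse maps.

    Both arguments map (row, column) to a non-zero intermediate class
    number; any position absent from a map implicitly holds class 0.
    """
    positions = set(reference) | set(estimate)
    return sum(1 for p in positions
               if reference.get(p, 0) != estimate.get(p, 0))


# Hypothetical reference and estimated label matrices in sparse form.
reference = {(1, 1): 2, (2, 3): 1}
estimate = {(1, 1): 2, (2, 3): 3, (0, 0): 1}
distance = sparse_mismatch_count(reference, estimate)
```

The work done is proportional to the number of stored entries, not to the N1*N2 terms of the dense matrices.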
[0154] Alternatively, the evaluation of each first distance includes a prior step of inverse conversion of the respective estimated training label matrix 51D into a respective estimated initial label matrix 51D' and of the respective intermediate label matrix 51C into the respective initial label matrix 51B, on the basis of the bijection between the initial class numbers num_i and the intermediate class numbers num_int_i, only the corresponding first distance being kept in RAM 35 until the end of the cost function calculation.
[0155] In this case, the time cost of the inverse conversion is on the order of K*B*t_mass.
[0156] It is therefore understood that the training method 10 according to the invention allows a time saving greater than K*(NE-2)*B*Δt_access. Bearing in mind that:
[0157] - the number NE of learning epochs is generally greater than 10, and frequently greater than 100 or even 1000,
[0158] - the number K*B of input matrices 51A is frequently greater than a thousand,
- t_mass is on the order of a few tens of milliseconds and t_live is on the order of a few nanoseconds to a few tens of nanoseconds, Δt_access therefore itself being on the order of a few tens of milliseconds,
the time saving is significant: it can reach a relative factor of two or more and, in absolute terms, a few seconds to a few tens of seconds.
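The lower bound K*(NE-2)*B*Δt_access given above can be evaluated numerically as in the short sketch below. The function name `estimated_time_saving` and the numerical values are purely illustrative assumptions chosen for the example; they are not measurements from the source.

```python
def estimated_time_saving(num_batches, batch_size, num_epochs, t_mass, t_live):
    """Lower bound K*(NE-2)*B*delta_t on the time saved, in seconds."""
    delta_t_access = t_mass - t_live  # access-time difference per matrix
    return num_batches * (num_epochs - 2) * batch_size * delta_t_access


# Hypothetical figures: K = 25 batches of B = 8 matrices, NE = 12 epochs,
# t_mass = 10 ms, t_live = 10 ns.
saving = estimated_time_saving(25, 8, 12, t_mass=0.01, t_live=1e-8)
# saving is about 20 seconds with these figures
```

Because the bound grows linearly with NE, K and B, larger training sets or longer trainings scale the saving accordingly.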
[0159] Optionally, the training method 10 includes a memory selection step 150, before the iterative training step 135, for example between the assignment 130-C and the formation 130-D of the intermediate set 68.
[0160] The memory selection step 150 includes an evaluation 150-A of the memory cost of the set of intermediate label matrices 51C of the training set 65 and the intermediate validation set 67, and optionally of the final validation set 66, on the basis of the frequency indicator f_maj of the majority class C_maj, the number of corresponding input matrices 51A and the dimension N1*N2 of these input matrices 51A.
[0161] The memory selection step 150 then includes the search 150-B for one or more available RAMs 35 based on this memory cost.
[0162] If a RAM 35 exchanging data with a processor used for the implementation of the method has sufficient memory space for storing information relating to the intermediate label matrices 51C, this information is loaded into that RAM 35 as previously described for use during iterative training 135.
[0163] In a particular embodiment, the RAM 35 is chosen from a RAM memory dedicated to the processor 25 and a VRAM memory optimized for graphics data management.
[0164] In this case, VRAM memory may be chosen in preference to RAM memory when VRAM memory is available, especially if the processor 25 is of the GPU type.
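The memory selection step 150 can be sketched as below. The sketch assumes that f_maj is expressed as a relative frequency between 0 and 1, that each stored term costs 9 bytes (two 4-byte indices and a 1-byte class number) and that a dense label term costs 1 byte; these figures, the function names and the dictionary of available memories are hypothetical and only serve the illustration.

```python
def sparse_memory_cost(num_matrices, n1, n2, f_maj, bytes_per_entry=9):
    """Approximate bytes needed to store only the non-majority terms."""
    non_zero_terms = round(num_matrices * n1 * n2 * (1.0 - f_maj))
    return non_zero_terms * bytes_per_entry


def dense_memory_cost(num_matrices, n1, n2, bytes_per_term=1):
    """Bytes needed to store every term of every label matrix."""
    return num_matrices * n1 * n2 * bytes_per_term


def select_memory(cost, available):
    """Return the first memory with enough free bytes, preferring VRAM."""
    for name in ("VRAM", "RAM"):
        if available.get(name, 0) >= cost:
            return name
    return None  # no memory is large enough


# Hypothetical case: 1000 label matrices of 512x512 terms, f_maj = 0.95.
cost = sparse_memory_cost(1000, 512, 512, 0.95)
chosen = select_memory(cost, {"RAM": 64 * 2**30, "VRAM": 8 * 2**30})
```

Under these particular byte sizes, the sparse form is smaller than the dense form only when f_maj exceeds 8/9, which shows why the frequency indicator of the majority class drives the evaluation.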
[0165] Optionally, if no RAM 35 allows the storage of the set of intermediate label matrices 51C corresponding to the input matrices 51A whose memory cost has been evaluated, the memory selection step 150 may include a step of selecting the largest subset of the training set 65 that allows the iterative training step 135 to be implemented as described previously, on the basis of a training set with reduced memory cost.
[0166] Alternatively, if no RAM 35 has sufficient available storage space, the memory selection step 150 optionally includes:
[0167] - the determination of a set {MCk} of meta-classes MCk from the set {Ci} of predetermined classes Ci, based on the majority class C_maj determined,
- the formation of secondary label matrices whose terms are the meta-class numbers corresponding to the initial class numbers of the initial label matrix, and
- the reiteration of the determination 130-A, selection 130-B and assignment 130-C steps from the secondary label matrices before the formation step 130-D.
[0168] The set {MCk} of meta-classes MCk has fewer elements than the set {Ci} of predetermined classes Ci, which makes it possible to determine a majority meta-class MC_maj whose frequency indicator is higher than that of the majority class C_maj, and thus to reduce the memory cost, at the price of a coarser annotation.
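The effect of grouping classes into meta-classes can be illustrated with a small example. The grouping table `meta_of`, the label values and the function names are hypothetical; the text does not specify how the meta-classes are chosen beyond their being determined on the basis of the majority class.

```python
from collections import Counter


def merge_into_meta_classes(label_matrix, meta_of):
    """Replace each initial class number by its meta-class number."""
    return [[meta_of[v] for v in row] for row in label_matrix]


def majority_frequency(label_matrix):
    """Relative frequency of the most frequent class number."""
    counts = Counter(v for row in label_matrix for v in row)
    return max(counts.values()) / sum(counts.values())


# Hypothetical labels: class 0 is the majority, class 1 is close behind.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 2],
          [0, 0, 1, 3]]
# Hypothetical grouping: classes 0 and 1 are merged into meta-class 0.
meta_of = {0: 0, 1: 0, 2: 1, 3: 2}
secondary = merge_into_meta_classes(labels, meta_of)

f_before = majority_frequency(labels)     # 6 of 12 terms
f_after = majority_frequency(secondary)   # 10 of 12 terms
```

After merging, only 2 of the 12 terms differ from the majority meta-class, so the sparse structure stores fewer entries, but classes 0 and 1 can no longer be distinguished.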
[0169] Finally, the processor 25 performs the final validation 140 of the trained neural network 15_ent thus obtained using the final validation set 66.
[0170] For the final validation 140, the set of intermediate label matrices 51C_VF of the final validation set 66 is, if possible and if not already done, loaded into RAM 35.
[0171] Optionally, the intermediate label matrices 51C_VF of the entire intermediate training set 68 are loaded into RAM 35 at the beginning of the first epoch E_1.
[0172] For final validation 140, an estimated final validation label matrix 51D_VF is generated using the trained neural network 15_ent for each input matrix 51A_VF of the final validation set 66.
[0173] For this purpose, the input matrices 51A_VF of the final validation set 66 are read by the processor 25 from the mass memory 30, so as to generate for each one an estimated final validation label matrix 51D_VF, estimated by the trained neural network 15_ent.
[0174] A classification performance indicator of the trained neural network 15_ent is evaluated by the processor 25 on the same principle as for the intermediate validation set 67, on the basis of the set of estimated final validation label matrices 51D_VF and the respective intermediate label matrices 51C_VF.
[0175] The performance indicator can be based on the same metric as, or a different metric from, that used for intermediate validations.
[0176] The performance indicator can then be compared to a predetermined setpoint value in order to validate or invalidate the trained neural network 15_ent for later use in an inference phase.
[0177] Optionally, the training method 10 according to the invention includes a step of cropping one or more portions of the input matrices 51A of the intermediate training set 68 based on the majority class C_maj.
[0178] This cropping step based on the majority class C_maj, known as class-aware cropping, includes searching for and cropping portions of the input matrices 51A corresponding to rare classes, that is, classes whose frequency indicator f_i is much lower than the majority frequency indicator f_maj, in order to learn these rare classes. This search and this cropping are particularly fast in the training method 10 according to the invention, since the intermediate data structure stores these rare classes explicitly rather than diluted among the terms of the majority class C_maj.
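A minimal sketch of class-aware cropping on a sparse label structure is given below. It assumes the intermediate label matrix is held as a dictionary mapping (row, column) to a non-zero intermediate class number, and it crops a fixed square window around the first rare-class term found; the function name, the window policy and the sample data are assumptions made for illustration.

```python
def crop_around_rare_class(input_matrix, sparse_entries, rare_class, half_size):
    """Crop a square window of the input matrix around a rare-class term."""
    # The search only visits stored (non-majority) entries.
    positions = [pos for pos, num_int in sparse_entries.items()
                 if num_int == rare_class]
    if not positions:
        return None
    row, col = positions[0]
    n_rows, n_cols = len(input_matrix), len(input_matrix[0])
    side = 2 * half_size + 1
    # Shift the window if needed so that it stays inside the matrix.
    top = max(0, min(row - half_size, n_rows - side))
    left = max(0, min(col - half_size, n_cols - side))
    bottom = min(n_rows, top + side)
    right = min(n_cols, left + side)
    return [r[left:right] for r in input_matrix[top:bottom]]


# Hypothetical 5x5 input matrix whose terms encode their own position.
image = [[10 * r + c for c in range(5)] for r in range(5)]
sparse_labels = {(4, 4): 2}  # a single rare-class term in the corner
patch = crop_around_rare_class(image, sparse_labels, rare_class=2, half_size=1)
```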
[0179] The invention also relates to a method 170 for classifying data by segmentation, using the trained neural network 15_ent obtained at the end of the training method 10 as described above, for the classification of at least one item of matrix input data 45.
[0180] As can be seen in Figure 2, in a first substep 170A, matrix data 45 is provided on the input layer 55-1 of the trained neural network 15_ent and an intermediate label matrix response 70 is obtained at the output 55-p of the trained neural network 15_ent.
[0181] In a second substep 170B, the intermediate label matrix response 70 is converted into the initial label matrix response 75 on the principle of the inverse conversion described previously.
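The inverse conversion of substep 170B can be sketched as follows, assuming the intermediate response is available as a shape and a dictionary of non-zero terms, and that the bijection is given as a dictionary from initial to intermediate class numbers. These representations and the name `to_initial_labels` are assumptions made for this example.

```python
def to_initial_labels(shape, sparse_entries, mapping):
    """Rebuild a dense initial label matrix from its sparse intermediate form.

    mapping associates each initial class number with its intermediate
    class number; it is a bijection, so it can be inverted.
    """
    inverse = {num_int: num for num, num_int in mapping.items()}
    n_rows, n_cols = shape
    # Every term not stored explicitly belongs to the majority class,
    # whose intermediate class number is zero.
    dense = [[inverse[0]] * n_cols for _ in range(n_rows)]
    for (r, c), num_int in sparse_entries.items():
        dense[r][c] = inverse[num_int]
    return dense


# Hypothetical bijection and sparse response for a 3x4 matrix.
mapping = {3: 0, 5: 1, 7: 2}
response = to_initial_labels((3, 4), {(1, 1): 2, (2, 3): 1}, mapping)
```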
[0182] Finally, the invention relates to a computer program product comprising software instructions which, when executed by a computer, implement the training method 10 or the classification method 170 as described above.
Claims
1. Method for training (10) a neural network (15) configured for classifying matrix input data (45), comprising:
a) the initialization (110) of a neural network (15),
b) the provision (120) of an initial set (50) of training data comprising a plurality of matrix pairs (51), each matrix pair (51) comprising an input matrix (51A) comprising input data and a respective initial label matrix (51B) of the same dimension as the input matrix (51A), each term of which is an initial class number (num_i) representing a class (Ci) of a respective term of the input matrix (51A), said class being selected from a set ({Ci}) of predetermined classes,
c) the formation (130), from the initial set (50), of a training set (65), a final validation set (66) and an intermediate validation set (67),
d) the iterative training (135) of the neural network (15) using the training set (65) and the intermediate validation set (67), and
e) the final validation (140) of the trained neural network (15_ent) using the final validation set (66),
characterized in that the formation c) of the training set (65), the final validation set (66) and the intermediate validation set (67) includes:
i) the determination (130-A), for each class (Ci) of the set ({Ci}) of predetermined classes, of a frequency indicator (f_i) representative of the total number of occurrences of this class (Ci) in the set of terms of the initial label matrices (51B) of the initial set (50),
ii) the selection (130-B) of a majority class (C_maj), which is the class whose frequency indicator (f_maj) is the highest,
iii) the assignment (130-C) to each of the classes (Ci) of a respective intermediate class number (num_int_i) such that the intermediate class number of the majority class (num_int_maj) is equal to zero, and
iv) the formation (130-D) of an intermediate training set (68) from the initial set (50), by converting each initial label matrix (51B) into an intermediate label matrix (51C) whose terms are the intermediate class numbers (num_int_i) corresponding to the initial class numbers (num_i) of the initial label matrix (51B), the intermediate label matrices (51C) being stored in memory by means of an intermediate data structure different from an initial data structure used for the in-memory storage of the initial label matrices (51B),
said training set (65), final validation set (66) and intermediate validation set (67) being formed from the intermediate set (68).
2. Method (10) according to claim 1, comprising, before the iterative training (135) of the neural network (15): c2) a step (150) of selecting a random access memory (35) exchanging data with a processor (25) used for the implementation of the method (10) and having sufficient memory space for the storage of information relating to the intermediate label matrices (51C), and the storage of this information in said random access memory (35) for its use during the iterative training (135).
3. Method (10) according to claim 2, wherein during the iterative training (135), at least one set of estimated intermediate label matrices (51D) is obtained at the output of the neural network (15) in response to the provision, at the input of the neural network (15), of a set of input matrices (51A) chosen from the intermediate set (68), and is compared either directly with the respective intermediate label matrices (51C) or with the initial label matrices (51B) after inverse conversion of the estimated intermediate label matrices (51D) into respective estimated initial label matrices (51D'), said label matrices being stored in RAM (35).
4. Method (10) according to claim 2 or claim 3, wherein the RAM (35) is chosen at the selection step (150) from a RAM memory dedicated to the processor and a VRAM memory optimized for graphics data management.
5. Method (10) according to claim 4, wherein VRAM memory is chosen in preference to RAM memory.
6. Method (10) according to any one of claims 2 to 5, wherein the RAM selection step (150) includes, if necessary: - the determination of a set ({MCk}) of meta-classes (MCk) from the set ({Ci}) of predetermined classes based on the majority class (C_maj) determined, - the formation of secondary label matrices whose terms are the meta-class numbers (MCk) corresponding to the initial class numbers (num_i) of the initial label matrix (51B), and - the reiteration of steps i), ii) and iii) from the secondary label matrices before step iv) of formation (130-D) of the intermediate set (68).
7. Method (10) according to any one of the preceding claims wherein the input matrices (51A) are previously obtained by means of a digital image sensor or by means of a digital sound sensor.
8. Method (10) according to any one of the preceding claims wherein the intermediate data structure is specifically adapted for sparse matrices.
9. Method for classifying (170) data by segmentation using a neural network (15_ent) obtained at the end of the training method (10) according to any one of the preceding claims.
10. Computer program product comprising software instructions which, when executed by a computer, implement the method (10) according to any one of claims 1 to 8.