Face recognition model training method and device for difficult example samples
The method computes the positive and negative cosine values of each class center in the face training set, derives a class representation degree from them, and updates the loss function twice. This addresses the insufficient mining of difficult examples in the loss function of the face recognition model and improves model accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN XUMI YUNTU SPACE TECH CO LTD
- Filing Date
- 2022-10-25
- Publication Date
- 2026-06-19
AI Technical Summary
The loss function of existing face recognition models lacks effective mining of difficult examples, resulting in low accuracy of the final trained model.
The positive and negative cosine values of each class center in the face training set are calculated, the class representation degree is derived from them, and the loss function is updated twice. Each updated loss function is then used for one of two training passes, so that each pass focuses on and mines different aspects of difficult examples.
It improves the accuracy of face recognition models and solves the problem that the loss function is insufficient for effectively mining difficult examples.
Figure CN115661899B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of face recognition technology, and in particular to a training method and apparatus for a face recognition model for difficult examples.
Background Technology
[0002] Current face recognition models lack effective methods for identifying difficult examples in their loss functions. Existing training methods for difficult examples are basically fixed-weighting approaches, which neglect high-quality images and lead to low accuracy in the final trained model.
[0003] In realizing the concept disclosed herein, the inventors discovered at least the following technical problem in the related technology: the loss function of the face recognition model lacks effective mining of difficult examples, resulting in low accuracy of the final trained model.
Summary of the Invention
[0004] In view of this, the present disclosure provides a training method, apparatus, electronic device, and computer-readable storage medium for a face recognition model targeting difficult examples, in order to solve the problem in the prior art that the loss function of the face recognition model lacks effective mining of difficult examples, resulting in low accuracy of the final trained model.
[0005] A first aspect of this disclosure provides a method for training a face recognition model for difficult examples, comprising: acquiring a face training set and calculating multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set; calculating the positive cosine mean and the negative cosine mean corresponding to each class center based on those values, and calculating the class representation degree corresponding to each class center based on the positive cosine mean and the negative cosine mean; performing a first update on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and performing a first update on the positive cosine value in the loss function based on the class representation degree corresponding to each class center; training the face recognition model for the first time on the face training set using the loss function after the first update; determining, using the face recognition model after the first training, the sample representation degree corresponding to each sample in the face training set; performing a second update on the positive and negative cosine values in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center; and, after freezing the first-stage and second-stage networks of the face recognition model, training the face recognition model for the second time on the face training set using the loss function after the second update.
[0006] A second aspect of this disclosure provides a training apparatus for a face recognition model for difficult examples, comprising: a first calculation module configured to acquire a face training set and calculate multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set; a second calculation module configured to calculate the positive cosine mean and the negative cosine mean corresponding to each class center based on those values, and to calculate the class representation degree corresponding to each class center based on the positive cosine mean and the negative cosine mean; a first update module configured to perform a first update on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and to perform a first update on the positive cosine value in the loss function based on the class representation degree corresponding to each class center; a first training module configured to train the face recognition model for the first time on the face training set using the loss function after the first update; a determination module configured to determine, using the face recognition model after the first training, the sample representation degree corresponding to each sample in the face training set; a second update module configured to perform a second update on the positive and negative cosine values in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center; and a second training module configured to, after freezing the first-stage and second-stage networks of the face recognition model, train the face recognition model for the second time on the face training set using the loss function after the second update.
[0007] A third aspect of this disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.
[0008] A fourth aspect of this disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method.
[0009] Compared with the prior art, the embodiments of this disclosure have the following beneficial effects. A face training set is acquired, and multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set are calculated. Based on these values, the positive cosine mean and the negative cosine mean corresponding to each class center are calculated, and the class representation degree corresponding to each class center is calculated from the two means. A first update is performed on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and a first update is performed on the positive cosine value in the loss function based on the class representation degree corresponding to each class center. The face recognition model is trained for the first time on the face training set using the loss function after the first update. Using the face recognition model after the first training, the sample representation degree corresponding to each sample in the face training set is determined. Based on the positive cosine mean and the class representation degree corresponding to each class center, and the sample representation degree corresponding to each sample belonging to that class center, a second update is performed on the positive and negative cosine values in the loss function. After the first-stage and second-stage networks of the face recognition model are frozen, the face recognition model is trained for the second time on the face training set using the loss function after the second update. These techniques therefore address the problem that the loss function of existing face recognition models lacks effective mining of difficult examples, which results in low accuracy of the final trained model, and thereby improve the accuracy of the face recognition model.
Brief Description of the Drawings
[0010] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a schematic diagram of an application scenario of an embodiment of this disclosure;
[0012] Figure 2 is a flowchart of a method for training a face recognition model for difficult examples provided in an embodiment of this disclosure;
[0013] Figure 3 is a schematic structural diagram of a training apparatus for a face recognition model for difficult examples provided in an embodiment of this disclosure;
[0014] Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of this disclosure.
Detailed Implementation
[0015] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, so as to provide a thorough understanding of the embodiments of this disclosure. However, those skilled in the art will understand that this disclosure may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this disclosure with unnecessary detail.
[0016] The following describes in detail, with reference to the accompanying drawings, a method and apparatus for training a face recognition model for difficult sample data according to an embodiment of the present disclosure.
[0017] Figure 1 is a schematic diagram of an application scenario of an embodiment of this disclosure. The application scenario may include terminal devices 101, 102, and 103, server 104, and network 105.
[0018] Terminal devices 101, 102, and 103 can be hardware or software. When terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with displays that support communication with server 104, including but not limited to smartphones, tablets, laptops, and desktop computers. When terminal devices 101, 102, and 103 are software, they can be installed in the aforementioned electronic devices. Terminal devices 101, 102, and 103 can be implemented as multiple software programs or software modules, or as a single software program or software module; this disclosure does not impose any limitations on this. Furthermore, various applications can be installed on terminal devices 101, 102, and 103, such as data processing applications, instant messaging tools, social platform software, search applications, shopping applications, etc.
[0019] Server 104 can be a server that provides various services, such as a backend server that receives requests sent by terminal devices with which it has established communication connections. This backend server can receive and analyze the requests sent by the terminal devices and generate processing results. Server 104 can be a single server, a server cluster consisting of several servers, or a cloud computing service center. This embodiment of the disclosure does not impose any limitations on these aspects.
[0020] It should be noted that server 104 can be either hardware or software. When server 104 is hardware, it can be various electronic devices that provide various services to terminal devices 101, 102, and 103. When server 104 is software, it can be multiple software programs or software modules that provide various services to terminal devices 101, 102, and 103, or it can be a single software program or software module that provides various services to terminal devices 101, 102, and 103. This disclosure does not limit the scope of the embodiments.
[0021] Network 105 can be a wired network using coaxial cable, twisted pair, and fiber optic connection, or it can be a wireless network that enables interconnection of various communication devices without wiring, such as Bluetooth, Near Field Communication (NFC), Infrared, etc. This disclosure does not limit the scope of the network.
[0022] Users can establish a communication connection with server 104 via network 105 through terminal devices 101, 102, and 103 to receive or send information, etc. It should be noted that the specific types, quantities, and combinations of terminal devices 101, 102, and 103, server 104, and network 105 can be adjusted according to the actual needs of the application scenario, and this disclosure embodiment does not impose any limitations on this.
[0023] Figure 2 is a flowchart of a method for training a face recognition model for difficult examples provided in an embodiment of this disclosure. The method of Figure 2 may be executed by the computer or server of Figure 1, or by software on that computer or server. As shown in Figure 2, the training method for the face recognition model for difficult examples includes:
[0024] S201: acquire a face training set and calculate multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set;
[0025] S202: calculate the positive cosine mean and the negative cosine mean corresponding to each class center based on the multiple positive and negative cosine values corresponding to that class center, and calculate the class representation degree corresponding to each class center based on its positive cosine mean and negative cosine mean;
[0026] S203: perform a first update on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and perform a first update on the positive cosine value in the loss function based on the class representation degree corresponding to each class center;
[0027] S204: train the face recognition model for the first time on the face training set using the loss function after the first update;
[0028] S205: using the face recognition model after the first training, determine the sample representation degree corresponding to each sample in the face training set;
[0029] S206: perform a second update on the positive and negative cosine values in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center;
[0030] S207: after freezing the first-stage and second-stage networks of the face recognition model, train the face recognition model for the second time on the face training set using the loss function after the second update.
[0031] The face recognition model is a residual network, which consists of five stages: stage 0 (zero stage), stage 1 (first stage), stage 2 (second stage), stage 3 (third stage), and stage 4 (fourth stage). The second training step freezes stages 1 and 2 and trains stages 0, 3, and 4. Since stage 0 is essentially a preprocessed network, the actual training focuses on stages 3 and 4. This embodiment of the present disclosure updates the loss function of the face recognition model twice, allowing each updated loss function to focus on or uncover different aspects of difficult examples. Through these two training steps, the face recognition model can effectively learn from difficult examples, improving its final accuracy.
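The stage selection for the second training pass can be sketched as follows. This is a minimal illustration only: the stage names and the `trainable_flags` helper are hypothetical and do not come from the disclosure, and no particular deep learning framework is assumed.

```python
# Hypothetical names for the five stages of the residual network (assumption).
STAGES = ["stage0", "stage1", "stage2", "stage3", "stage4"]


def trainable_flags(frozen=("stage1", "stage2")):
    """Map each stage name to True if it is trained, False if it is frozen."""
    return {name: name not in frozen for name in STAGES}


# Second training pass: stages 1 and 2 are frozen, stages 0, 3 and 4 are trained.
flags = trainable_flags()
trained = [name for name, is_trained in flags.items() if is_trained]
```

In a framework such as PyTorch, a flag of `False` would typically be applied by setting `requires_grad` to `False` on the parameters of the corresponding stage.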
[0032] According to the technical solution provided in this embodiment of the disclosure, a face training set is acquired, and multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set are calculated. Based on these values, the positive cosine mean and the negative cosine mean corresponding to each class center are calculated, and the class representation degree corresponding to each class center is calculated from the two means. A first update is performed on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and a first update is performed on the positive cosine value in the loss function based on the class representation degree corresponding to each class center. The face recognition model is trained for the first time on the face training set using the loss function after the first update. Using the face recognition model after the first training, the sample representation degree corresponding to each sample in the face training set is determined. Based on the positive cosine mean and the class representation degree corresponding to each class center, and the sample representation degree corresponding to each sample belonging to that class center, a second update is performed on the positive and negative cosine values in the loss function. After the first-stage and second-stage networks of the face recognition model are frozen, the face recognition model is trained for the second time on the face training set using the loss function after the second update. These techniques therefore address the problem in the prior art that the loss function of the face recognition model lacks effective mining of difficult examples, which results in low accuracy of the final trained model, and thereby improve the accuracy of the face recognition model.
[0033] In an optional embodiment, obtaining the class representation degree corresponding to each class center includes: calculating the cosine between the class center vector of each class center and the sample vector of each sample belonging to that class center, to obtain multiple positive cosine values corresponding to each class center; calculating the cosine between the class center vector of each class center and the sample vector of each sample not belonging to that class center, to obtain multiple negative cosine values corresponding to each class center; taking the mean of the multiple positive cosine values corresponding to each class center as the positive cosine mean corresponding to that class center, and taking the mean of the multiple negative cosine values corresponding to each class center as the negative cosine mean corresponding to that class center; and taking the difference between the positive cosine mean and the negative cosine mean corresponding to each class center as the class representation degree corresponding to that class center.
[0034] In the face training set, each person has multiple face images, each image is a sample, and each person's multiple samples correspond to a class center.
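The calculation of the class representation degree can be sketched in a few lines. This is an illustration under stated assumptions, not the disclosed implementation: "positive cosine" here means the cosine between a class center and one of its own samples, "negative cosine" means the cosine between a class center and a sample of another class, the sign of the difference (positive mean minus negative mean) is assumed, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_representation(centers, samples, labels):
    """Return {class_id: (positive_mean, negative_mean, degree)}.

    Assumes every class has at least one own sample and one foreign sample.
    """
    result = {}
    for cid, center in centers.items():
        pos = [cosine(center, x) for x, y in zip(samples, labels) if y == cid]
        neg = [cosine(center, x) for x, y in zip(samples, labels) if y != cid]
        p = sum(pos) / len(pos)   # positive cosine mean
        n = sum(neg) / len(neg)   # negative cosine mean
        result[cid] = (p, n, p - n)  # assumed sign: positive minus negative
    return result


# Toy data: two people (classes), one sample each.
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
samples = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
stats = class_representation(centers, samples, labels)
```

A well-separated class has a positive mean near 1 and a negative mean near 0, so its degree is close to 1; a class whose samples are easily confused with other classes yields a smaller degree.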
[0035] In S203, performing the first update on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center includes: when the negative cosine value corresponding to a sample belonging to a class center is greater than a preset threshold, multiplying the sum of the positive cosine mean and the class representation degree corresponding to that class center by the negative cosine value corresponding to that sample, and using the product as the negative cosine value in the loss function; when the negative cosine value corresponding to the sample is not greater than the preset threshold, using the negative cosine value corresponding to the sample, unchanged, as the negative cosine value in the loss function.
[0036] The initial loss function is as follows:
[0037]
[0038] Generally, s=64 and m=0.5, and N is the number of samples. The term with subscript yi is the cosine between a sample and its own class center, where yi denotes the i-th sample under that class center; that is, it is the positive cosine value corresponding to the i-th sample belonging to this class center. The term with subscript j is the cosine between the sample and the other class centers; that is, it is the negative cosine value corresponding to the j-th sample for this class center.
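The formula of paragraph [0037] did not survive extraction. For orientation only, a loss of the additive angular margin softmax family is consistent with the quantities named above (a scale s = 64, a margin m = 0.5, a sample count N, one positive cosine and several negative cosines per sample). Whether the omitted formula has exactly this form is an assumption, and the symbols \theta_{y_i} and \theta_j are introduced here for illustration:

```latex
L = -\frac{1}{N}\sum_{i=1}^{N}
    \log
    \frac{e^{\,s\cos(\theta_{y_i}+m)}}
         {e^{\,s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{\,s\cos\theta_j}}
```

In this form, \cos\theta_{y_i} plays the role of the positive cosine value and each \cos\theta_j plays the role of a negative cosine value, which are the two quantities the subsequent updates modify.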
[0039] When the negative cosine value corresponding to each class center is greater than the preset threshold, the negative cosine value in the loss function is calculated using the following formula:
[0040]
[0041] p is the positive cosine mean of the class center, d is the class representation degree of the class center, and the remaining symbol is the negative cosine value corresponding to the j-th sample belonging to this class center.
[0042] When the negative cosine value corresponding to each class center is not greater than the preset threshold, the negative cosine value in the loss function is left unchanged.
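The first update of the negative cosine value, as described in words in paragraph [0035], can be sketched as a piecewise rule. This is a hedged sketch: the function name is hypothetical, and the threshold value used in the example is arbitrary, since the disclosure only calls it a preset threshold.

```python
def update_negative_cosine(neg_cos, p, d, threshold):
    """First update of one negative cosine value.

    neg_cos:   negative cosine value of a sample
    p:         positive cosine mean of the class center
    d:         class representation degree of the class center
    threshold: preset threshold (its value is not given in the text)
    """
    if neg_cos > threshold:
        # Hard negative: scale the negative cosine by (p + d).
        return (p + d) * neg_cos
    # Otherwise the negative cosine is used unchanged.
    return neg_cos


# Example with an arbitrary threshold of 0.3 (assumption).
hard = update_negative_cosine(0.5, p=0.8, d=0.6, threshold=0.3)
easy = update_negative_cosine(0.2, p=0.8, d=0.6, threshold=0.3)
```

When p + d exceeds 1, a negative cosine above the threshold is enlarged, which increases its contribution to the denominator of a softmax-style loss and so puts more emphasis on that hard negative.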
[0043] In S203, performing the first update on the positive cosine value in the loss function based on the class representation degree corresponding to each class center includes: calculating a first pressure value corresponding to each sample belonging to each class center based on the class representation degree corresponding to that class center and the positive and negative cosine values corresponding to each sample belonging to that class center; taking the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as the second pressure value corresponding to that sample; and performing the first update on the positive cosine value in the loss function based on the second pressure value corresponding to each sample belonging to that class center.
[0044] The first pressure value is calculated using the following formula.
[0045]
[0046] max() is the maximum value function.
[0047] The second pressure value is calculated using the following formula.
[0048]
[0049] The loss function after the first update is as follows:
[0050]
[0051] In S205, determining the sample representation degree corresponding to each sample in the face training set using the face recognition model after the first training includes: calculating the cosine between the sample vector of each sample and the class center vector of the class center to which the sample belongs, to obtain the positive cosine value corresponding to each sample; calculating the cosine between the sample vector of each sample and the class center vectors of the class centers other than the one to which the sample belongs, to obtain multiple negative cosine values corresponding to each sample; and taking the difference between the mean of the multiple negative cosine values corresponding to each sample and the positive cosine value corresponding to that sample as the sample representation degree corresponding to that sample.
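The per-sample calculation of S205 can be sketched as below. The text does not state the sign of the difference; this sketch assumes positive cosine minus mean negative cosine, mirroring the class representation degree, and that choice is an assumption. The function names and toy vectors are hypothetical, and the sample vectors would in practice come from the model after the first training.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_representation(sample, label, centers):
    """Sample representation degree of one sample.

    Assumed sign: positive cosine minus the mean of the negative cosines.
    Requires at least one class center other than the sample's own.
    """
    pos = cosine(sample, centers[label])
    negs = [cosine(sample, c) for cid, c in centers.items() if cid != label]
    return pos - sum(negs) / len(negs)


centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
degree_easy = sample_representation([1.0, 0.0], 0, centers)
degree_hard = sample_representation([1.0, 1.0], 0, centers)
```

Under this sign convention a sample lying on its own class center scores close to 1, while a sample equidistant from two centers scores close to 0 and would be treated as a difficult example.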
[0052] In S206, performing the second update on the negative cosine value in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center includes: when the negative cosine value corresponding to a sample belonging to a class center is greater than a preset threshold, multiplying the sum of the positive cosine mean corresponding to that class center and the first weight corresponding to that sample by the negative cosine value corresponding to that sample, and using the product as the negative cosine value in the loss function, where the first weight corresponding to each sample belonging to the class center is determined by the class representation degree corresponding to the class center, the sample representation degree corresponding to that sample, the total number of training rounds, and the current training round; when the negative cosine value corresponding to the sample is not greater than the preset threshold, using the negative cosine value corresponding to that sample as the negative cosine value in the loss function.
[0053] The first weight for each sample is calculated using the following formula:
[0054]
[0055] T represents the total number of training rounds, and t represents the current training round. The remaining symbol represents the sample representation degree corresponding to the sample.
[0056] When the negative cosine value corresponding to each sample belonging to the class center is greater than the preset threshold, the negative cosine value in the loss function is determined by the following formula:
[0057]
[0058] In S206, performing the second update on the positive cosine value in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center includes: calculating the first pressure value corresponding to each sample belonging to each class center based on the class representation degree corresponding to that class center and the positive and negative cosine values corresponding to each sample belonging to that class center; taking the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as the second pressure value corresponding to that sample; determining the second weight corresponding to each sample belonging to that class center based on the positive cosine mean, the sample representation degree corresponding to that sample, and the second pressure value; and performing the second update on the positive cosine value in the loss function based on the second weight corresponding to each sample belonging to that class center.
[0059] The second weight corresponding to each sample belonging to the class center is calculated using the following formula:
[0060]
[0061] The loss function after the second update:
[0062]
[0063] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.
[0064] The following are embodiments of the apparatus disclosed herein, which can be used to execute embodiments of the method disclosed herein. For details not disclosed in the apparatus embodiments of this disclosure, please refer to the embodiments of the method disclosed herein.
[0065] Figure 3 is a schematic diagram of a training apparatus for a face recognition model for difficult examples provided in an embodiment of this disclosure. As shown in Figure 3, the training apparatus for the face recognition model for difficult examples includes:
[0066] The first calculation module 301 is configured to acquire a face training set and calculate multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set.
[0067] The second calculation module 302 is configured to calculate the positive cosine mean and the negative cosine mean corresponding to each class center based on the multiple positive and negative cosine values corresponding to that class center, and to calculate the class representation degree corresponding to each class center based on its positive cosine mean and negative cosine mean.
[0068] The first update module 303 is configured to perform a first update on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and to perform a first update on the positive cosine value in the loss function based on the class representation degree corresponding to each class center.
[0069] The first training module 304 is configured to train the face recognition model for the first time on the face training set using the loss function after the first update.
[0070] The determination module 305 is configured to use the face recognition model after the first training to determine the sample representation degree corresponding to each sample in the face training set.
[0071] The second update module 306 is configured to perform a second update on the positive and negative cosine values in the loss function based on the positive cosine mean and the class representation degree corresponding to each class center and the sample representation degree corresponding to each sample belonging to that class center.
[0072] The second training module 307 is configured to, after freezing the first-stage and second-stage networks of the face recognition model, train the face recognition model for the second time on the face training set using the loss function after the second update.
[0073] The face recognition model is a residual network, which consists of five stages: stage 0 (zero stage), stage 1 (first stage), stage 2 (second stage), stage 3 (third stage), and stage 4 (fourth stage). The second training step freezes stages 1 and 2 and trains stages 0, 3, and 4. Since stage 0 is essentially a preprocessed network, the actual training focuses on stages 3 and 4. This embodiment of the present disclosure updates the loss function of the face recognition model twice, allowing each updated loss function to focus on or uncover different aspects of difficult examples. Through these two training steps, the face recognition model can effectively learn from difficult examples, improving its final accuracy.
[0074] According to the technical solution provided in this embodiment of the disclosure, a face training set is acquired, and multiple positive cosine values and multiple negative cosine values corresponding to each class center in the face training set are calculated. Based on these values, the positive cosine mean and the negative cosine mean corresponding to each class center are calculated, and the class representation degree corresponding to each class center is calculated from the two means. A first update is performed on the negative cosine value in the loss function of the face recognition model based on the positive cosine mean and the class representation degree corresponding to each class center, and a first update is performed on the positive cosine value in the loss function based on the class representation degree corresponding to each class center. The face recognition model is trained for the first time on the face training set using the loss function after the first update. Using the face recognition model after the first training, the sample representation degree corresponding to each sample in the face training set is determined. Based on the positive cosine mean and the class representation degree corresponding to each class center, and the sample representation degree corresponding to each sample belonging to that class center, a second update is performed on the positive and negative cosine values in the loss function. After the first-stage and second-stage networks of the face recognition model are frozen, the face recognition model is trained for the second time on the face training set using the loss function after the second update. These techniques therefore address the problem in the prior art that the loss function of the face recognition model lacks effective mining of difficult examples, which results in low accuracy of the final trained model, and thereby improve the accuracy of the face recognition model.
[0075] Optionally, the first calculation module 301 is further configured to calculate the cosine value between the class center vector of each class center and the sample vector of the sample belonging to the class center, to obtain multiple sine and cosine values corresponding to each class center; calculate the cosine value between the class center vector of each class center and the sample vector of the sample not belonging to the class center, to obtain multiple negative cosine values corresponding to each class center; take the mean of the multiple sine and cosine values corresponding to each class center as the mean of the sine and cosine values corresponding to the class center, take the mean of the multiple negative cosine values corresponding to each class center as the mean of the negative cosine values corresponding to the class center; and take the difference between the mean of the sine and cosine values corresponding to each class center as the class representation degree corresponding to the class center.
[0076] In the face training set, each person has multiple face images, each image is a sample, and each person's multiple samples correspond to a class center.
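The class-level statistics described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function names `cosine` and `class_statistics`, the list-based vectors, and the toy data are all assumptions introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_statistics(center, samples, labels, class_id):
    """Return (positive cosine mean, negative cosine mean, class representation degree).

    Positive cosines use samples that belong to the class center;
    negative cosines use samples that do not.
    """
    pos = [cosine(center, x) for x, y in zip(samples, labels) if y == class_id]
    neg = [cosine(center, x) for x, y in zip(samples, labels) if y != class_id]
    p = sum(pos) / len(pos)
    n = sum(neg) / len(neg)
    # Class representation degree: positive cosine mean minus negative cosine mean.
    return p, n, p - n

# Toy data (hypothetical): two samples of person 0, one sample of person 1.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 0]
p, n, d = class_statistics([1.0, 0.0], samples, labels, class_id=0)
```

A class whose own samples lie close to its center and far from other samples gets a large representation degree, which is the signal the later updates build on.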
[0077] Optionally, the first update module 303 is further configured to: when the negative cosine value corresponding to each sample belonging to the class center is greater than a preset threshold, multiply the sum of the positive cosine mean of the class center and the class representation degree by the negative cosine value of that sample, and use the product as the negative cosine value in the loss function; and when the negative cosine value corresponding to each sample belonging to the class center is not greater than the preset threshold, use the negative cosine value of that sample as the negative cosine value in the loss function.
[0078] The initial loss function is as follows:
[0079]
[0080] Generally, s = 64 and m = 0.5, and N is the number of samples. The positive term is the cosine between a sample and its own class center, where yi denotes the i-th sample under that class center; that is, it is the positive cosine value corresponding to the i-th sample belonging to this class center. The negative term is the cosine between a sample and the other class centers, where j denotes the j-th sample that does not belong to this class center; that is, it is the negative cosine value corresponding to the j-th sample with respect to this class center.
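The formula for the initial loss did not survive in this text. A scale of s = 64 and a margin of m = 0.5 are the usual settings of an additive angular margin (ArcFace-style) softmax loss, so one form consistent with the symbol description is sketched below. This is an assumption for orientation only, not the patent's confirmed formula; the angle symbols θ are introduced here.

```latex
L = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{e^{\,s\cos(\theta_{y_i}+m)}}
             {e^{\,s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i} e^{\,s\cos\theta_j}}
```

Under this assumption, cos θ_{y_i} plays the role of the positive cosine value and cos θ_j the role of the negative cosine value that the two updates modify.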
[0081] When the negative cosine value corresponding to each sample belonging to the class center is greater than a preset threshold, the negative cosine value in the loss function is calculated using the following formula:
[0082] $\cos\theta_j \leftarrow (p + d)\cdot\cos\theta_j$
[0083] Here p is the positive cosine mean of the class center, d is the class representation degree of the class center, and $\cos\theta_j$ is the negative cosine value corresponding to the j-th sample belonging to this class center.
[0084] When the negative cosine value corresponding to each sample belonging to the class center is not greater than the preset threshold, the negative cosine value in the loss function is left unchanged; that is, the sample's own negative cosine value is used.
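The thresholded first update of the negative cosine value can be written as a small function. This is a sketch under stated assumptions: the function name `update_negative_cosine` and the numeric values of p, d, and the threshold are hypothetical, since the text gives no concrete threshold.

```python
def update_negative_cosine(neg_cos, p, d, threshold):
    """First update of a negative cosine value.

    neg_cos:   negative cosine value of one sample
    p:         positive cosine mean of the class center
    d:         class representation degree of the class center
    threshold: preset threshold (value is an assumption in the example below)
    """
    if neg_cos > threshold:
        # Hard negative: scale by the sum of the positive cosine mean and
        # the class representation degree.
        return (p + d) * neg_cos
    # Easy negative: keep the value unchanged.
    return neg_cos

# Hypothetical values for illustration only.
hard = update_negative_cosine(0.5, p=0.8, d=0.6, threshold=0.3)
easy = update_negative_cosine(0.2, p=0.8, d=0.6, threshold=0.3)
```

With these illustrative numbers the hard negative is scaled up to roughly 0.7, so it contributes more to the loss, while the easy negative passes through at 0.2.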
[0085] Optionally, the first update module 303 is further configured to: calculate a first pressure value for each sample belonging to a class center based on the class representation degree corresponding to that class center and the positive cosine value and negative cosine value corresponding to each sample belonging to that class center; use the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as a second pressure value corresponding to that sample; and update the positive cosine value in the loss function for the first time based on the second pressure value corresponding to each sample belonging to that class center.
[0086] The first pressure value is calculated using the following formula.
[0087]
[0088] max() is the maximum value function.
[0089] The second pressure value is calculated using the following formula.
[0090]
[0091] The loss function after the first update is as follows:
[0092]
[0093] Optionally, the determining module 305 is further configured to: calculate the cosine value between the sample vector of each sample and the class center vector of the class center to which the sample belongs, to obtain the positive cosine value corresponding to each sample; calculate the cosine value between the sample vector of each sample and the class center vectors of the class centers other than the one to which the sample belongs, to obtain multiple negative cosine values corresponding to each sample; and use the difference between the mean of the multiple negative cosine values corresponding to each sample and the positive cosine value corresponding to the sample as the sample representation degree corresponding to the sample.
[0094] Optionally, the second update module 306 is further configured to: when the negative cosine value corresponding to each sample belonging to the class center is greater than a preset threshold, multiply the sum of the positive cosine mean corresponding to the class center and the first weight corresponding to that sample by the negative cosine value corresponding to that sample, and use the product as the negative cosine value in the loss function, wherein the first weight corresponding to each sample belonging to the class center is determined by the class representation degree corresponding to the class center, the sample representation degree corresponding to that sample, the total number of training rounds, and the current training round; and when the negative cosine value corresponding to each sample belonging to the class center is not greater than the preset threshold, use the negative cosine value corresponding to that sample as the negative cosine value in the loss function.
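The per-sample statistic can be sketched the same way as the class-level one. The text does not make the order of the subtraction explicit, so this sketch assumes positive cosine minus negative cosine mean, mirroring the class representation degree; the names `cosine`, `sample_representation`, and the toy centers are likewise assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sample_representation(sample, centers, own_class):
    """Sample representation degree of one sample.

    centers maps a class id to its class center vector.
    Assumption: the difference is taken as (positive cosine) minus
    (mean of negative cosines); the source leaves the order ambiguous.
    """
    pos = cosine(sample, centers[own_class])
    negs = [cosine(sample, c) for k, c in centers.items() if k != own_class]
    neg_mean = sum(negs) / len(negs)
    return pos - neg_mean

# Toy class centers (hypothetical) and one sample of class 0.
centers = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [-1.0, 0.0]}
r = sample_representation([1.0, 0.0], centers, own_class=0)
```

In practice the sample vectors would come from the model after the first training, so this statistic reflects how well each individual image is already separated.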
[0095] The first weight for each sample is calculated using the following formula:
[0096]
[0097] T represents the total number of training rounds, and t represents the current training round. The remaining term represents the sample representation degree corresponding to the sample.
[0098] When the negative cosine value corresponding to each sample belonging to the class center is greater than a preset threshold, the negative cosine value in the loss function is determined by the following formula:
[0099]
[0100] Optionally, the second update module 306 is further configured to: calculate a first pressure value for each sample belonging to a class center based on the class representation degree corresponding to that class center and the positive cosine value and negative cosine value corresponding to each sample belonging to that class center; use the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as the second pressure value corresponding to that sample; determine a second weight corresponding to each sample belonging to that class center based on the positive cosine mean corresponding to that class center, the sample representation degree corresponding to that sample, and the second pressure value; and update the positive cosine value in the loss function for the second time based on the second weight corresponding to each sample belonging to that class center.
[0101] The second weight corresponding to each sample of the class center is calculated using the following formula:
[0102]
[0103] The loss function after the second update:
[0104]
[0105] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this disclosure.
[0106] Figure 4 is a schematic diagram of the electronic device 4 provided in an embodiment of this disclosure. As shown in Figure 4, the electronic device 4 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and executable on the processor 401. When the processor 401 executes the computer program 403, it implements the steps in the various method embodiments described above. Alternatively, when the processor 401 executes the computer program 403, it implements the functions of each module / unit in the various device embodiments described above.
[0107] Electronic device 4 can be a desktop computer, laptop, handheld computer, cloud server, or other electronic device. Electronic device 4 may include, but is not limited to, processor 401 and memory 402. Those skilled in the art will understand that Figure 4 is merely an example of electronic device 4 and does not constitute a limitation on electronic device 4. It may include more or fewer components than shown, or different components.
[0108] The processor 401 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0109] The memory 402 can be an internal storage unit of the electronic device 4, such as a hard disk or RAM of the electronic device 4. The memory 402 can also be an external storage device of the electronic device 4, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc., equipped on the electronic device 4. The memory 402 can also include both internal and external storage units of the electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
[0110] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0111] If an integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program may include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. A computer-readable medium may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in a computer-readable medium may be appropriately added to or subtracted according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
[0112] The above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit it. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure, and should all be included within the protection scope of this disclosure.
Claims
1. A training method for a face recognition model targeting difficult examples, characterized in that it comprises: obtaining a face training set and calculating multiple positive cosine values and negative cosine values corresponding to each class center in the face training set; calculating, based on the multiple positive cosine values corresponding to each class center, the positive cosine mean corresponding to each class center, and calculating the class representation degree corresponding to each class center based on the positive cosine mean corresponding to each class center; updating the negative cosine value in the loss function of the face recognition model for the first time based on the positive cosine mean and class representation degree corresponding to each class center; calculating, based on the class representation degree corresponding to each class center and the positive cosine value and negative cosine value corresponding to each sample belonging to that class center, a first pressure value corresponding to each sample belonging to that class center; using the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as a second pressure value corresponding to that sample; updating the positive cosine value in the loss function for the first time based on the second pressure value corresponding to each sample belonging to that class center; training the face recognition model for the first time, based on the face training set, using the first-updated loss function; using the face recognition model after the first training, calculating the cosine value between the sample vector of each sample and the class center vector of the class center to which the sample belongs to obtain the positive cosine value corresponding to each sample, calculating the cosine value between the sample vector of each sample and the class center vectors of the class centers other than the one to which the sample belongs to obtain multiple negative cosine values corresponding to each sample, and determining the sample representation degree corresponding to each sample in the face training set from the difference between the mean of the multiple negative cosine values corresponding to each sample and the positive cosine value corresponding to the sample; updating the negative cosine value in the loss function a second time based on the positive cosine mean and class representation degree corresponding to each class center, as well as the sample representation degree corresponding to each sample belonging to that class center; determining, based on the positive cosine mean corresponding to each class center, the sample representation degree corresponding to each sample belonging to that class center, and the second pressure value, a second weight corresponding to each sample belonging to that class center, and updating the positive cosine value in the loss function a second time according to the second weight corresponding to each sample belonging to that class center; and, after freezing the first-stage and second-stage networks of the face recognition model, training the face recognition model a second time based on the face training set using the second-updated loss function.
2. The method according to claim 1, characterized in that it comprises: calculating the cosine value between the class center vector of each class center and the sample vector of each sample belonging to that class center, to obtain multiple positive cosine values corresponding to each class center; calculating the cosine value between the class center vector of each class center and the sample vector of each sample that does not belong to that class center, to obtain multiple negative cosine values corresponding to each class center; taking the mean of the multiple positive cosine values corresponding to each class center as the positive cosine mean corresponding to that class center, and taking the mean of the multiple negative cosine values corresponding to each class center as the negative cosine mean corresponding to that class center; and using the difference between the positive cosine mean and the negative cosine mean corresponding to each class center as the class representation degree corresponding to that class center.
3. The method according to claim 1, characterized in that updating the negative cosine value in the loss function of the face recognition model for the first time based on the positive cosine mean and class representation degree corresponding to each class center comprises: when the negative cosine value corresponding to each sample belonging to each class center is greater than a preset threshold, multiplying the sum of the positive cosine mean corresponding to the class center and the class representation degree by the negative cosine value of that sample, and using the product as the negative cosine value in the loss function; and when the negative cosine value corresponding to each sample belonging to each class center is not greater than the preset threshold, using the negative cosine value corresponding to that sample as the negative cosine value in the loss function.
4. The method according to claim 1, characterized in that updating the negative cosine value in the loss function a second time based on the positive cosine mean and class representation degree corresponding to each class center, as well as the sample representation degree corresponding to each sample belonging to that class center, comprises: when the negative cosine value corresponding to each sample belonging to each class center is greater than a preset threshold, multiplying the sum of the positive cosine mean corresponding to the class center and the first weight corresponding to that sample by the negative cosine value corresponding to that sample, and using the product as the negative cosine value in the loss function, wherein the first weight corresponding to each sample belonging to the class center is determined by the class representation degree corresponding to the class center, the sample representation degree corresponding to that sample, the total number of training rounds, and the current training round; and when the negative cosine value corresponding to each sample belonging to each class center is not greater than the preset threshold, using the negative cosine value corresponding to that sample as the negative cosine value in the loss function.
5. A training device for a face recognition model targeting difficult examples, characterized in that it comprises: a first calculation module configured to acquire a face training set and calculate multiple positive cosine values and negative cosine values corresponding to each class center in the face training set; a second calculation module configured to calculate the positive cosine mean and the negative cosine mean for each class center based on the multiple positive cosine values and negative cosine values corresponding to each class center, and to calculate the class representation degree for each class center based on the positive cosine mean for each class center; a first update module configured to update the negative cosine value in the loss function of the face recognition model for the first time based on the positive cosine mean and class representation degree corresponding to each class center, to calculate a first pressure value corresponding to each sample belonging to each class center based on the class representation degree corresponding to that class center and the positive cosine value and negative cosine value corresponding to each sample belonging to that class center, to use the product of the first pressure value corresponding to each sample belonging to each class center and the positive cosine mean corresponding to that class center as a second pressure value corresponding to that sample, and to update the positive cosine value in the loss function for the first time based on the second pressure value corresponding to each sample belonging to that class center; a first training module configured to train the face recognition model for the first time based on the face training set using the first-updated loss function; a determination module configured to use the face recognition model after the first training to calculate the cosine value between the sample vector of each sample and the class center vector of the class center to which the sample belongs, to obtain the positive cosine value corresponding to each sample, to calculate the cosine value between the sample vector of each sample and the class center vectors of the class centers other than the one to which the sample belongs, to obtain multiple negative cosine values corresponding to each sample, and to determine the sample representation degree corresponding to each sample in the face training set based on the difference between the mean of the multiple negative cosine values corresponding to each sample and the positive cosine value corresponding to the sample; a second update module configured to update the negative cosine value in the loss function a second time based on the positive cosine mean and class representation degree of each class center and the sample representation degree of each sample belonging to that class center, to determine a second weight corresponding to each sample belonging to that class center based on the positive cosine mean corresponding to each class center, the sample representation degree corresponding to each sample belonging to that class center, and the second pressure value, and to update the positive cosine value in the loss function a second time according to the second weight corresponding to each sample belonging to that class center; and a second training module configured to train the face recognition model a second time, based on the face training set and using the second-updated loss function, after freezing the first-stage and second-stage networks of the face recognition model.
6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 4.
Citation Information
Patent Citations
Training method and device of face recognition neural network
CN111626235A
Deep network model construction method and system based on difficult sample mining
CN114548366A