Data processing methods, model training methods, devices and electronic equipment
By calculating the similarity and frequency in a face image dataset, low-quality images are cleaned out, solving the problem of high noise in massive face data and improving the accuracy of data processing and the training effect of face recognition models.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIAOMI TECH (WUHAN) CO LTD
- Filing Date
- 2023-04-10
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, massive facial datasets collected for training tend to be noisy and imbalanced, which lowers the accuracy of data processing and in turn degrades the training of facial recognition models.
By acquiring a dataset of facial images of the target object, calculating the similarity between the images in each image pair, identifying the image pairs whose similarity is below a threshold, counting how often each image occurs in those pairs, and deleting images whose frequency exceeds a threshold, the data is cleaned and its quality is improved.
It improves the accuracy and efficiency of facial data processing and enhances the training accuracy of facial recognition models.
Figure CN116664968B_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a data processing method, a model training method, an apparatus, and an electronic device.
Background Technology
[0002] With the development of machine learning technology, such as machine learning-based facial recognition technology, it has been widely applied to various aspects of social life. Facial recognition technology typically uses sample facial images to train facial recognition models, and training deep learning-based facial recognition models usually requires massive amounts of facial data.
[0003] In related technologies, massive amounts of data are usually acquired by using web crawlers to scrape and save data from the Internet based on keywords. Data obtained this way is characterized by high noise and data imbalance, so data cleaning is required to ensure that the image data in each dataset meets the quality requirements. Therefore, how to improve the accuracy of data processing in datasets is a technical problem that urgently needs to be solved.
Summary of the Invention
[0004] This application aims to at least partially address one of the technical problems in the related art.
[0005] To this end, this application proposes a data processing method, a model training method, an apparatus, and an electronic device that improve the accuracy of data processing.
[0006] One embodiment of this application proposes a data processing method, including:
[0007] Obtain a target dataset of facial images of a target object; wherein the target dataset includes at least one image pair, and the facial images in the image pair are different;
[0008] Based on the similarity between the face images in the at least one image pair, determine a target image pair with a similarity less than a similarity threshold;
[0009] Based on the target face images included in the target image pair, determine the frequency of occurrence of each of the target face images;
[0010] In response to the frequency of occurrence of any target face image being greater than a quantity threshold, delete that target face image from the target dataset.
[0011] One embodiment of this application proposes a model training method, including:
[0012] Obtain a sample target dataset; wherein, the sample target dataset contains sample face images of the target objects; the sample face images are labeled with the identity information of the included faces;
[0013] Input any of the sample face images into the recognition model for classification, so as to determine the identity information obtained by predicting any of the sample face images;
[0014] The recognition model is trained based on the difference between the predicted identity information and the labeled identity information.
[0015] Another embodiment of this application proposes a data processing apparatus, including:
[0016] An acquisition module is used to acquire a target dataset of facial images of a target object; wherein the target dataset includes at least one image pair, and the facial images in the image pair are different;
[0017] The first determining module is used to determine target image pairs with a similarity less than a similarity threshold based on the similarity between face images in the at least one image pair;
[0018] The second determining module is used to determine the frequency of occurrence of each of the target face images included in the target image pair;
[0019] The processing module is configured to, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, delete that target face image from the target dataset.
[0020] Another embodiment of this application proposes a model training apparatus, comprising:
[0021] The acquisition module is used to acquire a sample target dataset; wherein, the sample target dataset contains sample face images of target objects; the sample face images are labeled with the identity information of the included faces;
[0022] The recognition module is used to input any of the sample face images into the recognition model for classification, so as to determine the identity information obtained by predicting any of the sample face images;
[0023] The training module is used to train the recognition model based on the difference between the predicted identity information and the labeled identity information.
[0024] Another embodiment of this application proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the method described in the foregoing embodiments.
[0025] Another embodiment of this application proposes a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the foregoing embodiments.
[0026] Another embodiment of this application proposes a computer program product having a computer program stored thereon, which, when executed by a processor, implements the method described in the foregoing embodiments.
[0027] The data processing method, model training method, apparatus, and electronic device proposed in this application acquire a target dataset of facial images of a target object. The target dataset includes at least one image pair, where the facial images in each image pair are different. Based on the similarity between the facial images in the at least one image pair, target image pairs with a similarity less than a similarity threshold are identified. Based on the target facial images included in each target image pair, the frequency of occurrence of each target facial image is determined. In response to any target facial image having a frequency greater than a quantity threshold, that target facial image is deleted from the target dataset. By using the similarity between facial images, target image pairs with poor similarity are identified. Within the range of target image pairs, the frequency of occurrence of target facial images in each target image pair is further statistically determined. Based on the frequency of occurrence, the target dataset is cleaned to ensure that all facial images in the target dataset meet quality requirements, thus improving accuracy and efficiency.
[0028] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Attached Figure Description
[0029] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0030] Figure 1 is a schematic flowchart of a data processing method provided in an embodiment of this application;
[0031] Figure 2 is a flowchart of another data processing method provided in an embodiment of this application;
[0032] Figure 3 is a flowchart of another data processing method provided in an embodiment of this application;
[0033] Figure 4 is a flowchart of a model training method provided in an embodiment of this application;
[0034] Figure 5 is a schematic diagram of the structure of a data processing device provided in an embodiment of this application;
[0035] Figure 6 is a schematic diagram of the structure of a model training device provided in an embodiment of this application;
[0036] Figure 7 is a block diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0037] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0038] The data processing method, model training method, apparatus, and electronic device of this application are described below with reference to the accompanying drawings.
[0039] In related technologies, cleaning facial images in a dataset to remove outlier or noisy data involves face clustering. This clustering method categorizes facial images in the dataset, determines cluster centers, and then filters out images far from the cluster centers to remove noise. However, this approach is simplistic and has poor accuracy. Furthermore, the number of cluster centers in the clustering algorithm itself is uncertain; different folders may have different numbers of cluster centers, leading to bias. Additionally, this method is computationally intensive and inefficient.
[0040] Figure 1 is a schematic flowchart of a data processing method provided in an embodiment of this application.
[0041] The data processing method in this embodiment is executed by a data processing device, which can be located in an electronic device. The electronic device can be a terminal device or a server, which is not limited in this embodiment.
[0042] As shown in Figure 1, the method may include the following steps:
[0043] Step 101: Obtain the target dataset of facial images of the target object.
[0044] The target object is the target user.
[0045] In this embodiment, the face images included in the target dataset are face images of the target object. The face images in the target dataset are combined two at a time to obtain at least one image pair, where the two face images in each image pair are different. As one implementation, the face images in the target dataset are numbered, with different face images corresponding to different numbers, and face images with different numbers are combined two at a time to obtain at least one image pair.
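The pairing of numbered face images can be sketched as follows. This is only an illustration under the assumption that every two distinct images are paired exactly once; the function name `build_image_pairs` is hypothetical and does not come from the application.

```python
from itertools import combinations


def build_image_pairs(image_ids):
    """Pair every two distinct numbered face images exactly once.

    Assumption: all unordered pairs are formed; the application only
    requires that the two images in a pair be different.
    """
    unique_ids = sorted(set(image_ids))
    return list(combinations(unique_ids, 2))


# Four numbered face images yield 4 * 3 / 2 = 6 image pairs.
pairs = build_image_pairs([1, 2, 3, 4])
print(pairs)
```

Under this assumption a dataset of n images yields n(n-1)/2 pairs, so the number of similarity computations grows quadratically with the size of each target dataset.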
[0046] Step 102: Based on the similarity between face images in at least one image pair, determine the target image pair with a similarity less than a similarity threshold.
[0047] In this embodiment of the application, the similarity between the face images is determined for each image pair. The similarity indicates how alike the two face images are, that is, how likely they are to be face images of the same object: the higher the similarity, the greater the probability. Each similarity is compared with a similarity threshold, and the image pairs whose similarity is less than the similarity threshold are determined as target image pairs, i.e., image pairs that do not meet the quality requirements are filtered out.
[0048] Step 103: Determine the frequency of occurrence of each target face image based on the target image pair.
[0049] In this embodiment, the target face images included in each target image pair are obtained, and the occurrences of each target face image are counted to obtain the frequency with which it appears across all target image pairs. The frequency of a target face image indicates how often its similarity to other face images in the target dataset is low. The higher the frequency, the greater the difference between the target face image and the other face images in the target dataset, and the more likely it is to be an abnormal face image.
[0050] As an example, there are four target image pairs: target image pair 1, target image pair 2, target image pair 3, and target image pair 4. Target image pair 1 contains target face images numbered 1 and 3, target image pair 2 contains target face images numbered 2 and 3, target image pair 3 contains target face images numbered 1 and 4, and target image pair 4 contains target face images numbered 1 and 2. Statistically, the target face image numbered 1 appears 3 times, the target face image numbered 2 appears 2 times, the target face image numbered 3 appears 2 times, and the target face image numbered 4 appears once.
[0051] Step 104: In response to the frequency of occurrence of any target face image being greater than the quantity threshold, delete that target face image from the target dataset.
[0052] In this embodiment, the frequency of each target face image is compared with the quantity threshold to determine any target face image whose frequency exceeds the quantity threshold. For ease of identification, this can be referred to as the first target face image. The first target face image is regarded as noisy or abnormal face data and is deleted from the target dataset, thus improving the accuracy of face data processing in the target dataset. Furthermore, the processed target dataset can be used to train relevant face recognition models. By improving the accuracy of the face images used as training samples, the training accuracy of the face recognition model can be improved.
[0053] For example, if the quantity threshold is 2, the target face image numbered 1 appears 3 times, which is greater than the quantity threshold. Therefore, the target face image numbered 1 is determined to be abnormal data and is deleted from the target dataset, so that the face images of the target object remaining in the target dataset all meet the quality requirements.
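The worked example in steps 103 and 104 can be reproduced with a short sketch. The function names `count_occurrences` and `images_to_delete` are hypothetical; the pair list and the quantity threshold of 2 are taken from the example above.

```python
from collections import Counter


def count_occurrences(target_pairs):
    """Count how often each face image number appears in the target image pairs."""
    counts = Counter()
    for first_id, second_id in target_pairs:
        counts[first_id] += 1
        counts[second_id] += 1
    return counts


def images_to_delete(target_pairs, quantity_threshold):
    """Return the image numbers whose frequency is strictly greater than the threshold."""
    counts = count_occurrences(target_pairs)
    return {image_id for image_id, n in counts.items() if n > quantity_threshold}


# Target image pairs 1 to 4 from the example: (1, 3), (2, 3), (1, 4), (1, 2).
target_pairs = [(1, 3), (2, 3), (1, 4), (1, 2)]
counts = count_occurrences(target_pairs)
to_delete = images_to_delete(target_pairs, quantity_threshold=2)

dataset = [1, 2, 3, 4]
cleaned = [image_id for image_id in dataset if image_id not in to_delete]
print(dict(counts), to_delete, cleaned)
```

Images whose frequency is less than or equal to the threshold are kept, so in this example images 2, 3, and 4 remain in the dataset.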
[0054] It should be understood that the same processing method of the present application embodiment can be applied to the target datasets of other objects, thereby obtaining a large number of datasets that meet the quality requirements. Multiple processed datasets can be used to train the recognition model to improve its training accuracy. The trained recognition model is then used for face recognition.
[0055] In the data processing method of this application embodiment, a target dataset of facial images of a target object is obtained. The target dataset includes at least one image pair, where the facial images in each image pair are different. Based on the similarity between the facial images in the at least one image pair, target image pairs with a similarity less than a similarity threshold are identified. Based on the target facial images included in each target image pair, the frequency of occurrence of each target facial image is determined. In response to any target facial image having a frequency greater than a quantity threshold, that target facial image is deleted from the target dataset. By using the similarity between facial images, target image pairs with poor similarity are identified. Within the range of target image pairs, the frequency of occurrence of the target facial images in each target image pair is further statistically determined. Based on the frequency of occurrence, the target dataset is cleaned to ensure that all facial images in the target dataset meet quality requirements, thus improving accuracy and efficiency.
[0056] Based on the above embodiments, Figure 2 is a flowchart illustrating another data processing method provided in this application embodiment. As shown in Figure 2, the method includes the following steps:
[0057] Step 201: Obtain the initial dataset of face images of the target object.
[0058] As one approach, an initial dataset containing facial images of the target object is obtained from the internet using keywords. However, the facial images of the target object obtained from the internet contain a lot of noisy data, or anomalous data, such as facial images that do not belong to the target object, or facial images that are of poor quality and cannot be used. Therefore, the initial dataset needs to be cleaned to identify the noisy or anomalous data in the initial dataset.
[0059] Step 202: Recognize the face images in the initial dataset.
[0060] One approach is to use a face detection model to detect face images in the initial dataset, identifying whether each face image contains a face and the size of any faces that are included.
[0061] Step 203: Remove the identified face images whose face size is smaller than a set threshold, or face images that do not contain a face, from the initial dataset to obtain the target dataset.
[0062] As one implementation, for each face image in the initial dataset, if the face image does not contain a face, then the face image is determined to be noisy data and is deleted from the initial dataset.
[0063] As another implementation, for each face image in the initial dataset, if the face image contains a face, but the size of the face in the face image is smaller than the size threshold, or the proportion is smaller than the proportion threshold, then the face image is determined to be abnormal data and is deleted from the initial dataset. For example, the size threshold is 30*30 pixels and the proportion threshold is one-tenth.
[0064] In this embodiment, the initial dataset is inspected to remove images whose faces are too small and images that contain no face, thereby obtaining the target dataset. This achieves preliminary cleaning of the initial dataset, removes obviously abnormal data, and improves the effect of subsequent processing.
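A minimal sketch of the preliminary screening in steps 202 and 203 is given below. It assumes the face detection results are already available as simple records; the `Detection` type, the function name, and the reading of "proportion" as face area divided by image area are all assumptions made for illustration. The 30*30 pixel and one-tenth thresholds are the example values from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """Hypothetical record of a face detector's output for one image."""
    image_id: int
    image_size: Tuple[int, int]            # (width, height) in pixels
    face_size: Optional[Tuple[int, int]]   # (width, height) of the face, None if no face


def passes_preliminary_check(det, min_face_size=(30, 30), min_area_ratio=0.1):
    """Keep an image only if it contains a face that is large enough."""
    if det.face_size is None:
        return False  # no face detected: treated as noisy data
    face_w, face_h = det.face_size
    if face_w < min_face_size[0] or face_h < min_face_size[1]:
        return False  # face smaller than the size threshold
    img_w, img_h = det.image_size
    # Assumption: the proportion is the face area relative to the image area.
    return (face_w * face_h) / (img_w * img_h) >= min_area_ratio


initial_dataset: List[Detection] = [
    Detection(1, (200, 200), None),          # no face
    Detection(2, (200, 200), (20, 20)),      # face too small
    Detection(3, (1000, 1000), (100, 100)),  # face occupies only 1% of the image
    Detection(4, (200, 200), (120, 120)),    # acceptable
]
target_dataset = [d.image_id for d in initial_dataset if passes_preliminary_check(d)]
print(target_dataset)
```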
[0065] Step 204: Obtain the target dataset of face images of the target object.
[0066] The explanations and descriptions of the foregoing embodiments also apply to this embodiment, as the principles are the same, and will not be repeated here.
[0067] Step 205: For each image pair, determine the similarity between the face images in the image pair.
[0068] One implementation involves inputting the face images from an image pair into a recognition model for feature extraction. This yields the image features of the face images within the image pair. Based on these features, the distance between them is determined, and then the similarity between the face images in the image pair is calculated. Specifically, the greater the distance between the image features of two face images in an image pair, the less similar the two face images are, and the lower their similarity. Conversely, the smaller the distance between the image features of two face images in an image pair, the more similar the two face images are, and the higher their similarity.
[0069] The distance between image features can be any distance that can be used to determine the similarity between image features, such as cosine distance or Euclidean distance. This embodiment does not limit the distance between image features.
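As an illustration of step 205, the sketch below computes cosine and Euclidean distances between two feature vectors and converts a cosine distance into a similarity. The mapping `similarity = 1 - distance` is an assumption chosen for simplicity; the application does not fix a particular mapping, and the short toy vectors stand in for the real image features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance):
    """Assumed mapping for cosine distance: larger distance, lower similarity."""
    return 1.0 - distance


feat_a = [1.0, 0.0]
feat_b = [1.0, 0.0]   # same direction as feat_a
feat_c = [0.0, 1.0]   # orthogonal to feat_a

sim_ab = distance_to_similarity(cosine_distance(feat_a, feat_b))
sim_ac = distance_to_similarity(cosine_distance(feat_a, feat_c))
print(sim_ab, sim_ac, euclidean_distance(feat_a, feat_c))
```

Any monotonically decreasing mapping preserves the property stated above that a greater distance gives a lower similarity.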
[0070] Step 206: Compare each similarity score with a similarity threshold to determine the target image pairs with similarity scores less than the similarity threshold.
[0071] Step 207: Determine the frequency of occurrence of each target face image based on the target image pair.
[0072] Step 208: In response to the frequency of any target face image appearing being greater than the quantity threshold, delete any target face image from the target dataset.
[0073] For steps 206 to 208, refer to the explanations in the foregoing embodiments; the principle is the same and will not be repeated here.
[0074] In the data processing method of this application embodiment, an initial dataset including face images of the target object is obtained. A portion of data that is obviously noise or abnormal is removed through preliminary screening to obtain the target dataset. The target dataset is further cleaned, that is, by calculating the distance between the face image features in each image pair, the similarity between the face images in the image pair is determined. From the target image pairs with poor similarity, the frequency of the target face image in the target image pair is statistically obtained. The higher the frequency, the greater the difference between the target face image and other face images in the target data, and the greater the possibility that the target face image belongs to noise data or abnormal data. Furthermore, the target dataset is cleaned for noise based on the frequency of occurrence, so that the face image data in the target dataset are all face images that meet the quality requirements, thereby improving the accuracy.
[0075] Based on the above embodiments, Figure 3 is a flowchart illustrating another data processing method provided in this application embodiment. As shown in Figure 3, the method includes the following steps:
[0076] Step 301: Obtain the target dataset of facial images of the target object.
[0077] Step 302: Input the face image in the image pair into the recognition model for feature extraction to obtain the image features of the face image in the image pair.
[0078] The recognition model is a neural network model, for example, a recognition model trained based on the face recognition algorithm ArcFace, or a recognition model trained based on the face recognition algorithm CosFace. This embodiment does not impose any limitations.
[0079] As one implementation, the recognition model includes a backbone network and fully connected layers in the head network. The backbone network is, for example, the lightweight network MobileNet, which outputs image features of the face image. The image features are, for example, 512-dimensional features. The fully connected layers perform classification based on the image features to obtain the classification result. In the context of face recognition, the classification result is the identity information of the face in the face image, determined by recognizing the face image. This identity information can be the name, ID card information, etc. of the person to whom the face belongs.
[0080] Step 303: Determine the distance between image features based on the image features of the face images in the image pair.
[0081] Step 304: Determine the similarity between face images in an image pair based on the distance between image features.
[0082] One approach is to convert distance into corresponding similarity based on the mapping relationship between distance and similarity between image features.
[0083] For steps 303 and 304, refer to the explanations in the foregoing embodiments; the principle is the same and will not be repeated here.
[0084] Step 305: Sort the distances and determine the target distance from the sorted distances based on the set selection ratio.
[0085] Step 306: Determine the similarity threshold based on the target distance and the established mapping relationship.
[0086] In this embodiment of the application, the distances between target face images in each target image pair in the target dataset are sorted. For example, the distances are sorted in descending order to obtain a sorting result. A set selection ratio is determined according to the accuracy requirements of the target dataset cleaning, such as 5%, 7%, or 10%. Then, the corresponding target distance is selected from the distance sorting result according to the set selection ratio. The similarity corresponding to the target distance is determined according to the target distance and the set mapping relationship, and the similarity is used as the similarity threshold.
[0087] As an example, 20 target image pairs are identified from the target dataset. The distances between the target face images in these 20 pairs are s1, s2, s3, s4...s19 and s20, respectively. Setting the selection ratio to 20%, that is, selecting 4 distances from the 20, the 20 distances are then sorted in descending order. The top 4 distances are s4, s7, s15, and s11, respectively. Therefore, s11 is determined as the target distance. Based on the mapping relationship between target distance and similarity, the similarity threshold corresponding to the target distance is determined, improving the accuracy of the similarity threshold determination.
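Steps 305 and 306 can be sketched as follows, mirroring the example of 20 distances and a 20% selection ratio. The function names and the toy distance values are hypothetical, and the mapping from distance to similarity (`1 - distance`) is an assumption rather than the mapping used in the application.

```python
import math


def select_target_distance(distances, selection_ratio):
    """Sort distances in descending order and return the last one inside the selected share."""
    if not distances:
        raise ValueError("at least one distance is required")
    ranked = sorted(distances, reverse=True)
    # round() guards against floating-point noise such as 0.07 * 100 = 7.000000000000001.
    k = max(1, math.ceil(round(len(ranked) * selection_ratio, 9)))
    return ranked[k - 1]


def similarity_threshold(distances, selection_ratio, distance_to_similarity):
    """Map the target distance to a similarity threshold."""
    return distance_to_similarity(select_target_distance(distances, selection_ratio))


# 20 toy distances: 0.05, 0.10, ..., 1.00.
distances = [0.05 * i for i in range(1, 21)]
target = select_target_distance(distances, selection_ratio=0.2)
threshold = similarity_threshold(distances, 0.2, lambda d: 1.0 - d)
print(target, threshold)
```

With 20 distances and a 20% ratio, the fourth-largest distance is chosen, which corresponds to s11 in the example above. A larger selection ratio raises the similarity threshold and therefore marks more image pairs as target image pairs.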
[0088] Step 307: Compare each similarity score with a similarity threshold to determine the target image pairs with similarity scores less than the similarity threshold.
[0089] Step 308: Determine the frequency of occurrence of each target face image based on the target image pair.
Step 309: In response to any target face image having a frequency greater than a quantity threshold, delete that target face image from the target dataset.
[0090] For steps 307 to 309, refer to the explanations in the foregoing embodiments; the principle is the same and will not be repeated here.
[0091] Step 310: In response to the frequency of occurrence of any target face image being less than or equal to the quantity threshold, prohibit the deletion of that target face image from the target dataset.
[0092] In this embodiment, the frequency and quantity threshold of each target face image are compared. If the frequency of any target face image is less than or equal to the quantity threshold, it indicates that the target face image is a face image of the target object that meets the quality requirements. Thus, the deletion of the target face image from the target dataset is prohibited, and face images that meet the quality requirements are retained. This improves the quality of face images in the target dataset and can improve the training accuracy when further training the recognition model based on the target dataset.
[0093] In the data processing method of this application embodiment, the similarity between face images in each image pair in the target dataset is determined by calculating the distance between face image features in the image pair. From the target image pairs with poor similarity, the frequency of occurrence of the target face image in the target image pair is statistically obtained. The higher the frequency, the greater the difference between the target face image and other face images in the target dataset, and the greater the possibility that the target face image belongs to noisy data or abnormal data. Furthermore, the target dataset is cleaned for noisy data based on the frequency of occurrence, so that the face image data in the target dataset are all face images that meet the quality requirements, thereby improving the accuracy of the target dataset.
[0094] Based on the above embodiments, this application provides a model training method. Figure 4 is a flowchart illustrating a model training method provided in this application embodiment. As shown in Figure 4, the method includes the following steps:
[0095] Step 401: Obtain a sample target dataset.
[0096] The sample target dataset contains sample face images of the target objects; the sample face images are labeled with the identity information of the included faces.
[0097] The identity information is used to uniquely identify an object, namely a user, and includes name, ID card number, etc.
[0098] Step 402: Input any sample face image into the recognition model for classification in order to determine the identity information predicted for that sample face image.
[0099] In one implementation of this application, any face image is input into the feature extraction layer of a classification model to extract features, thereby obtaining high-dimensional image features of the face image, such as 512-dimensional image features. The image features of the face image are then input into the fully connected layer of the classification model, and the fully connected layer performs classification based on the image features to determine the identity information predicted from the face image.
[0100] The description of the structure of the recognition model in the foregoing embodiments also applies to this embodiment, and the principle is the same, so it will not be repeated here.
[0101] Step 403: Train the recognition model based on the difference between the predicted identity information and the labeled identity information.
[0102] In this embodiment, the recognition model is trained based on the difference between the predicted identity information and the labeled identity information. Specifically, the parameters of the recognition model are adjusted based on the difference, and the adjusted model is then trained again using a sample target dataset. Multiple iterations are performed using multiple samples until the recognition model converges. As one implementation, the trained recognition model can be tested using a sample target dataset. If the difference between the predicted and labeled identity information is less than a set threshold, the recognition model is considered to have completed training.
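The idea of adjusting parameters based on the difference between predicted and labeled identities can be illustrated with a deliberately small sketch. It trains only a single fully connected layer on precomputed toy features using softmax cross-entropy, which is a stand-in chosen here for illustration; the application does not specify the loss function, and the function names, learning rate, epoch count, and 2-dimensional features are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_classifier(features, labels, num_classes, epochs=200, lr=0.5, seed=0):
    """Train one fully connected layer (no bias) with stochastic gradient descent."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
            probs = softmax(logits)
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. logit c: predicted probability minus label.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights


def predict(weights, x):
    """Return the identity index with the highest logit."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy features for two identities (0 and 1); real features would be, e.g., 512-dimensional.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights = train_classifier(features, labels, num_classes=2)
predictions = [predict(weights, x) for x in features]
print(predictions)
```

In a full recognition model the backbone network would be updated together with the fully connected layers; this sketch keeps the features fixed only to keep the example short.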
[0103] Optionally, the recognition model can be trained by continuously adjusting its parameters and performing a set number of training rounds. Then, a test set containing multiple target sample datasets can be used for testing. If the recall rate of face recognition is greater than a first threshold and the non-recognition rate is less than a second threshold, the recognition model is considered to have completed training. The first threshold is, for example, 90%, and the second threshold is, for example, 1%.
[0104] It should be understood that if the target dataset is a target dataset that has not been processed by the data processing method of the embodiments of this application, such as the target dataset in step 101 of the above embodiments, it can be used to perform preliminary training on the recognition model. The accuracy of the recognition model obtained by training does not meet the final requirements, but it can be used to recognize the face images in the target dataset and obtain the image features of the face images.
[0105] If the target dataset is a target dataset that has been processed by the data processing method of the embodiments of this application, then the recognition model can be trained, so that the accuracy of the trained recognition model is higher and the training effect of the model is improved.
[0106] In the model training method of this application embodiment, by training the recognition model, the difference between the identity information of the face image recognized by the trained recognition model and the identity information of the labeled face image is less than a set threshold, which satisfies the accuracy requirement. This makes the face image features output by the recognition model have high accuracy, thereby improving the effect of data cleaning of the target dataset.
[0107] To implement the above embodiments, this application also proposes a data processing apparatus.
[0108] Figure 5 is a schematic diagram of the structure of a data processing device provided in an embodiment of this application.
[0109] As shown in Figure 5, the device may include:
[0110] The acquisition module 51 is used to acquire a target dataset of face images of a target object; wherein the target dataset includes at least one image pair, and the face images in the image pair are different.
[0111] The first determining module 52 is used to determine a target image pair with a similarity less than a similarity threshold based on the similarity between the face images in the at least one image pair.
[0112] The second determining module 53 is used to determine the frequency of occurrence of each of the target face images included in the target image pair.
[0113] The processing module 54 is configured to, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, delete that target face image from the target dataset.
[0114] Furthermore, in one implementation of this application embodiment, the first determining module 52 is specifically used for:
[0115] For each image pair, determine the similarity between the face images in the image pair;
[0116] Each similarity score is compared with a similarity threshold to determine target image pairs whose similarity scores are less than the similarity threshold.
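The comparison step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_target_pairs` and the dictionary layout (a pair of image identifiers mapped to its similarity score) are hypothetical and are not specified in this application.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def select_target_pairs(pair_similarities: Dict[Pair, float],
                        similarity_threshold: float) -> List[Pair]:
    """Return the image pairs whose similarity is below the threshold."""
    return [pair
            for pair, similarity in pair_similarities.items()
            if similarity < similarity_threshold]
```

A pair whose similarity equals the threshold is not selected, since the text requires the similarity to be less than the threshold.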
[0117] In one implementation of this application embodiment, the first determining module 52 is specifically used for:
[0118] The face images in the image pair are input into the recognition model for feature extraction to obtain the image features of the face images in the image pair;
[0119] Based on the image features of the face images in the image pair, determine the distance between the image features;
[0120] The similarity between the face images in the image pair is determined based on the distance.
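One possible way to go from image features to a distance and then to a similarity is sketched below. The application does not fix the distance metric or the mapping in this passage, so the choices here are assumptions: features are L2-normalized, the distance is Euclidean, and the similarity is the cosine similarity recovered from that distance. All function names are hypothetical, and feature extraction by the recognition model is assumed to have already produced the input vectors.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_to_similarity(distance: float) -> float:
    """Map distance to cosine similarity (assumed mapping).

    For unit vectors, d^2 = 2 - 2*cos, hence cos = 1 - d^2 / 2.
    """
    return 1.0 - distance * distance / 2.0


def pair_similarity(features_a: Sequence[float],
                    features_b: Sequence[float]) -> float:
    """Similarity of two face images given their extracted features."""
    d = euclidean_distance(l2_normalize(features_a), l2_normalize(features_b))
    return distance_to_similarity(d)
```

Under this assumed mapping, a smaller distance always gives a larger similarity, so thresholding on either quantity is equivalent.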
[0121] In one implementation of this application embodiment, the processing module 54 is further configured to:
[0122] In response to the frequency of occurrence of any target face image being less than or equal to the quantity threshold, deletion of that target face image from the target dataset is prohibited, that is, the image is retained.
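The frequency counting and the delete-or-retain decision can be sketched as follows. This is a minimal illustration: the function name `clean_by_frequency` and the use of string identifiers for images are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def clean_by_frequency(dataset: Iterable[str],
                       target_pairs: Iterable[Tuple[str, str]],
                       quantity_threshold: int) -> List[str]:
    """Drop images that occur in more than quantity_threshold target pairs."""
    # Count how often each image appears across all low-similarity pairs.
    frequency = Counter(image for pair in target_pairs for image in pair)
    # Only images whose count strictly exceeds the threshold are deleted.
    to_delete = {image for image, count in frequency.items()
                 if count > quantity_threshold}
    return [image for image in dataset if image not in to_delete]
```

For example, an image that is dissimilar to most other images of the same target object appears in many target pairs and is removed, while an image that appears in only a few target pairs is kept.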
[0123] In one implementation of this application embodiment, the apparatus further includes:
[0124] The recognition module is used to acquire an initial dataset of facial images of the target object; recognize the facial images in the initial dataset; and delete facial images with facial dimensions smaller than a set threshold, or facial images that do not contain a face, from the initial dataset to obtain the target dataset.
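The preliminary filtering can be sketched as follows. The detector itself is outside the scope of the sketch: the input is assumed to be a hypothetical mapping from image identifier to the detected face size (width, height) in pixels, or `None` when no face was found. Comparing the smaller side of the face against the threshold is also an assumption, since the text only refers to "facial dimensions".

```python
from typing import Dict, List, Optional, Tuple


def build_target_dataset(detections: Dict[str, Optional[Tuple[int, int]]],
                         min_face_size: int) -> List[str]:
    """Keep only images that contain a sufficiently large face."""
    kept = []
    for image_id, face_size in detections.items():
        if face_size is None:
            continue  # no face detected in this image
        width, height = face_size
        if min(width, height) < min_face_size:
            continue  # face smaller than the set threshold
        kept.append(image_id)
    return kept
```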
[0125] In one implementation of this application embodiment, the apparatus further includes:
[0126] The third determining module is used to sort the various distances; determine the target distance from the sorted distances based on a set selection ratio; and determine the similarity threshold according to the mapping relationship between the target distance and similarity.
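The threshold selection can be sketched as follows. Several details are assumptions, because this passage does not fix them: the distances are sorted in ascending order, the target distance is taken at index floor(ratio × n) clamped to the last element, and the distance-to-similarity mapping is passed in as a callable. The function name is hypothetical.

```python
from typing import Callable, Sequence


def similarity_threshold_from_distances(
        distances: Sequence[float],
        selection_ratio: float,
        distance_to_similarity: Callable[[float], float]) -> float:
    """Pick a target distance by ratio and map it to a similarity threshold."""
    if not distances:
        raise ValueError("distances must not be empty")
    if not 0.0 <= selection_ratio <= 1.0:
        raise ValueError("selection_ratio must lie in [0, 1]")
    ordered = sorted(distances)  # ascending order (assumed)
    index = min(int(selection_ratio * len(ordered)), len(ordered) - 1)
    target_distance = ordered[index]
    return distance_to_similarity(target_distance)
```

Deriving the threshold from the distribution of distances, rather than fixing it by hand, lets it adapt to each target dataset.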
[0127] It should be noted that the foregoing explanation of the method embodiments also applies to the apparatus of this embodiment, and will not be repeated here.
[0128] In the data processing apparatus of this application embodiment, a target dataset of face images of a target object is acquired, the target dataset including at least one image pair in which the face images are different. Based on the similarity between the face images in the at least one image pair, target image pairs with a similarity less than a similarity threshold are determined. Based on the target face images included in the target image pairs, the frequency of occurrence of each target face image is determined. In response to the frequency of occurrence of any target face image being greater than a quantity threshold, that target face image is deleted from the target dataset. In this way, the similarity between face images is used to identify target image pairs with low similarity, the frequency of occurrence of each target face image is counted within those target image pairs, and the target dataset is cleaned based on that frequency, so that the face images remaining in the target dataset meet the quality requirements and the accuracy of data processing is improved.
[0129] To implement the above embodiments, this application also proposes a model training device.
[0130] Figure 6 is a schematic diagram of the structure of a model training device provided in an embodiment of this application.
[0131] As shown in Figure 6, the device may include:
[0132] The acquisition module 61 is used to acquire a sample target dataset; wherein the sample target dataset contains sample face images of target objects; the sample face images are labeled with the identity information of the included faces.
[0133] The recognition module 62 is used to input any of the sample face images into the recognition model for classification, so as to determine the predicted identity information for that sample face image.
[0134] Training module 63 is used to train the recognition model based on the difference between the predicted identity information and the labeled identity information.
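Training on the difference between predicted and labeled identity information can be illustrated with a toy classifier. This sketch is an assumption-laden stand-in, not the recognition model of this application: a linear softmax classifier replaces the deep model, cross-entropy is assumed as the measure of the difference, and all names (`predict_probs`, `train_step`, `lr`) are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over identity scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights: List[List[float]],
                  features: Sequence[float]) -> List[float]:
    """Predicted probability of each identity (one weight row per identity)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def train_step(weights: List[List[float]],
               features: Sequence[float],
               label: int,
               lr: float = 0.5) -> float:
    """One gradient step on cross-entropy; returns the loss before the update."""
    probs = predict_probs(weights, features)
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit k is (p_k - one_hot_k).
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_logit * x
    return loss
```

Calling `train_step` repeatedly over the labeled sample face images updates `weights` in place and reduces the loss, which is the sense in which the model is trained on the difference between the predicted and labeled identity information.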
[0135] It should be noted that the foregoing explanation of the method embodiments also applies to the apparatus of this embodiment, and will not be repeated here.
[0136] In the model training device of this application embodiment, by training the recognition model, the difference between the identity information of the face image recognized by the trained recognition model and the identity information of the labeled face image is less than a set threshold, which satisfies the accuracy requirement. This makes the face image features output by the recognition model have high accuracy, thereby improving the effect of data cleaning of the target dataset.
[0137] To implement the above embodiments, this application also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the method described in the foregoing method embodiments.
[0138] To implement the above embodiments, this application also proposes a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the method described in the foregoing method embodiments.
[0139] To implement the above embodiments, this application also proposes a computer program product having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described in the foregoing method embodiments.
[0140] Figure 7 is a block diagram of an electronic device provided in an embodiment of this application. For example, the electronic device 800 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.
[0141] Referring to Figure 7, the electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input / output (I / O) interface 812, sensor component 814, and communication component 816.
[0142] Processing component 802 typically controls the overall operation of electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the methods described above. Furthermore, processing component 802 may include one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
[0143] Memory 804 is configured to store various types of data to support the operation of electronic device 800. Examples of this data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, etc. Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0144] Power component 806 provides power to various components of electronic device 800. Power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800.
[0145] Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 808 includes a front-facing camera and / or a rear-facing camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
[0146] Audio component 810 is configured to output and / or input audio signals. For example, audio component 810 includes a microphone (MIC) configured to receive external audio signals when electronic device 800 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
[0147] I / O interface 812 provides an interface between processing component 802 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.
[0148] Sensor component 814 includes one or more sensors for providing state assessments of various aspects of electronic device 800. For example, sensor component 814 can detect the on / off state of electronic device 800, the relative positioning of components such as the display and keypad of electronic device 800, changes in position of electronic device 800 or a component of electronic device 800, the presence or absence of user contact with electronic device 800, orientation or acceleration / deceleration of electronic device 800, and temperature changes of electronic device 800. Sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 814 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.
[0149] Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices. Electronic device 800 can access wireless networks based on communication standards, such as WiFi, 4G, or 5G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
[0150] In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.
[0151] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 804 including instructions, which can be executed by a processor 820 of an electronic device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device, etc.
[0152] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0153] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0154] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or processes, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of this application pertain.
[0155] The logic and / or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in a computer memory.
[0156] It should be understood that various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0157] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
[0158] Furthermore, the functional units in the various embodiments of this application can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0159] The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.
Claims
1. A data processing method, characterized in that the method includes: obtaining a target dataset of face images of a target object, wherein the target dataset includes at least one image pair, and the face images in the image pair are different; determining, based on the similarity between the face images in the at least one image pair, a target image pair with a similarity less than a similarity threshold; determining, based on the target face images included in the target image pair, the frequency of occurrence of each of the target face images; and in response to the frequency of occurrence of any target face image being greater than a quantity threshold, deleting that target face image from the target dataset.
2. The method as described in claim 1, characterized in that the step of determining a target image pair with a similarity less than a similarity threshold based on the similarity between the face images in the at least one image pair includes: for each image pair, determining the similarity between the face images in the image pair; and comparing each similarity with the similarity threshold to determine the target image pairs whose similarity is less than the similarity threshold.
3. The method as described in claim 2, characterized in that determining the similarity between the face images in each image pair includes: inputting the face images in the image pair into a recognition model for feature extraction to obtain the image features of the face images in the image pair; determining, based on the image features of the face images in the image pair, the distance between the image features; and determining the similarity between the face images in the image pair based on the distance.
4. The method as described in claim 3, characterized in that the method further includes: sorting the distances; determining a target distance from the sorted distances based on a set selection ratio; and determining the similarity threshold based on the target distance and an established mapping relationship.
5. The method according to any one of claims 1-4, characterized in that the method further includes: in response to the frequency of occurrence of any target face image being less than or equal to the quantity threshold, prohibiting deletion of that target face image from the target dataset.
6. The method according to any one of claims 1-4, characterized in that the method further includes: obtaining an initial dataset of face images of the target object; recognizing the face images in the initial dataset; and deleting, from the initial dataset, the recognized face images with a face size smaller than a set threshold, or face images that do not contain a face, to obtain the target dataset.
7. A model training method, characterized in that the method includes: obtaining a sample target dataset, wherein the sample target dataset is obtained by processing using the method described in any one of claims 1-6, the sample target dataset contains sample face images of target objects, and the sample face images are labeled with the identity information of the included faces; inputting any of the sample face images into a recognition model for classification, so as to determine the predicted identity information for that sample face image; and training the recognition model based on the difference between the predicted identity information and the labeled identity information.
8. A data processing apparatus, characterized in that the apparatus includes: an acquisition module, used to acquire a target dataset of face images of a target object, wherein the target dataset includes at least one image pair, and the face images in the image pair are different; a first determining module, used to determine a target image pair with a similarity less than a similarity threshold based on the similarity between the face images in the at least one image pair; a second determining module, used to determine the frequency of occurrence of each of the target face images included in the target image pair; and a processing module, configured to, in response to the frequency of occurrence of any target face image being greater than a quantity threshold, delete that target face image from the target dataset.
9. A model training device, characterized in that the device includes: an acquisition module, used to acquire a sample target dataset, wherein the sample target dataset is obtained by processing using the method described in any one of claims 1-6, the sample target dataset contains sample face images of target objects, and the sample face images are labeled with the identity information of the included faces; a recognition module, used to input any of the sample face images into a recognition model for classification, so as to determine the predicted identity information for that sample face image; and a training module, used to train the recognition model based on the difference between the predicted identity information and the labeled identity information.
10. An electronic device, characterized in that it includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the method as described in any one of claims 1-6, or implements the method as described in claim 7.
11. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the method as described in any one of claims 1-6, or the method as described in claim 7.