A clothing compliance recognition method, apparatus, medium, and electronic device

By including both person re-identification images and clothing recognition images in the training sample data for the clothing recognition model, the method improves the model's generalization and robustness, addresses the drop in recognition accuracy that work-clothes detection models suffer when the scene changes, and achieves higher accuracy in clothing compliance recognition.

CN115984897B · Active · Publication Date: 2026-06-30 · ZHEJIANG DAHUA TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG DAHUA TECH CO LTD
Filing Date: 2022-12-27
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, deep learning-based models for detecting work-uniform compliance lose recognition accuracy when the work scene changes and generalize poorly, which makes it difficult to improve the accuracy of clothing compliance recognition.

Method used

A feature extraction model is trained on training sample data that includes person re-identification images with first-type labels and clothing recognition images with second-type labels, yielding a clothing recognition model. The trained model produces the feature vector of the human body image to be recognized, and a similarity feature value computed from that vector determines the clothing compliance evaluation information.

Benefits of technology

It improves the generalization and robustness of the clothing recognition model and increases the recognition accuracy of new types of clothing that did not appear in the training sample data.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides a method, apparatus, medium, and electronic device for clothing compliance recognition, relating to the field of image detection technology. The method involves acquiring a human image containing the subject; inputting the human image into a trained clothing compliance recognition model to obtain a first feature vector, where the model is trained using training sample data that includes person re-identification images with a first type of label and clothing recognition images with a second type of label; calculating the similarity between the first feature vector and a preset baseline feature vector to obtain a similarity feature value; and determining the clothing compliance evaluation information of the subject based on the similarity feature value. This method can improve the accuracy of clothing compliance recognition.

Description

Technical Field

[0001] This application relates to the field of image detection technology, and in particular to a method, apparatus, medium, and electronic device for clothing compliance recognition.

Background Technology

[0002] In recent years, some workplaces have imposed strict requirements on the attire of their employees. They are required to wear standardized work clothes in specific locations. For example, production workshops and construction sites require relevant employees to wear appropriate work clothes due to safety and other factors. Employees who do not wear the correct clothes will be warned or even prohibited from entering the workplace.

[0003] With the rapid development of deep learning, some deep learning-based techniques have also been applied to the field of work uniform compliance detection. These techniques typically involve collecting images of properly worn work uniforms in specific work scenarios, feeding them into a network model for training, and then using the trained model to extract features from both the image of the work uniform to be identified and the registered image, calculating the similarity between the two to determine whether the worker is wearing the correct uniform. However, the accuracy of these models decreases when the work scenario changes, indicating poor versatility. Therefore, improving the accuracy of uniform compliance detection is a pressing issue that needs to be addressed.

Summary of the Invention

[0004] To address the existing technical problems, embodiments of this application provide a method, apparatus, medium, and electronic device for clothing compliance recognition, which can improve the accuracy of clothing compliance recognition.

[0005] To achieve the above objectives, the technical solution of this application embodiment is implemented as follows:

[0006] In a first aspect, embodiments of this application provide a method for clothing compliance recognition, including:

[0007] Obtain an image of the human body to be identified, which contains the human body to be processed;

[0008] The image of the human body to be identified is input into a trained clothing recognition model to obtain a first feature vector of the image. The clothing recognition model is obtained by training a feature extraction model using training sample data, which includes person re-identification images with a first type of label and clothing recognition images with a second type of label. The feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition. The first type of label represents the human body identity; the second type of label represents the type of clothing.

[0009] The similarity between the first feature vector and the preset benchmark feature vector is calculated to obtain a similarity feature value; the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting the benchmark clothing image into the trained clothing wearing recognition model.

[0010] Based on the similarity feature values, the clothing wearing standard evaluation information of the human body to be processed is determined.
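The last two steps above (similarity calculation and the compliance decision) can be sketched as follows. This is only an illustration under stated assumptions: this passage does not say which similarity measure or decision rule is used, so cosine similarity, the function names, and the `threshold` value of 0.8 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def evaluate_compliance(first_vec, benchmark_vec, threshold=0.8):
    """Compare the first feature vector with the benchmark feature vector.

    `threshold` is a hypothetical cut-off, not a value from the source.
    """
    score = cosine_similarity(first_vec, benchmark_vec)
    return {"similarity": score, "compliant": score >= threshold}
```

In practice the threshold would be tuned on validation data for the deployment scene.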

[0011] In the clothing compliance recognition method provided in this application, a human image containing the subject is acquired and input into a trained clothing recognition model to obtain a first feature vector. The clothing recognition model is obtained by training a feature extraction model using training sample data, which includes person re-identification images with a first type of label and clothing recognition images with a second type of label; the feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition. The first type of label represents the human identity, and the second type of label represents the type of clothing. The similarity between the first feature vector and a preset benchmark feature vector is calculated to obtain a similarity feature value, where the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting a benchmark clothing image into the trained clothing recognition model. The clothing compliance evaluation information of the subject is then determined based on the similarity feature value. Because the feature extraction model is trained on both kinds of labeled images, the generalization of the clothing recognition model improves, its feature extraction becomes more robust for new types of clothing that do not appear in the training sample data, and the accuracy of clothing compliance recognition improves.

[0012] In one optional embodiment, the output of the feature extraction model includes global features, upper body local features, and lower body local features; the feature vector is obtained by concatenating and fusing the global features, the upper body local features, and the lower body local features.
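The fusion described here can be pictured as a plain concatenation of the three feature groups. The sketch below assumes each group is already a flat list of numbers; the function name and the ordering of the groups are illustrative choices, not details from the source.

```python
def fuse_features(global_feat, upper_feat, lower_feat):
    """Concatenate global, upper-body, and lower-body features into one vector."""
    return list(global_feat) + list(upper_feat) + list(lower_feat)
```

The fused vector's length is the sum of the three input lengths, so the benchmark feature vector must be produced the same way for the similarity comparison to be meaningful.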

[0013] In one optional embodiment, the number of person re-identification images is a first quantity and the number of clothing recognition images is a second quantity; the first quantity is greater than the second quantity.

[0014] In one optional embodiment, the training process for obtaining the trained clothing wearing recognition model includes the following steps:

[0015] Obtain training sample data; the training sample data includes person re-identification images with a first type of label and clothing recognition images with a second type of label; the first type of label represents human identity; the second type of label represents clothing type;

[0016] Based on the training sample data, a batch of training images is selected; the batch of training images consists of the clothing recognition images and the person re-identification images.

[0017] The batch training images are input into the clothing recognition model to be trained, and the recognition loss value of the clothing recognition model to be trained is determined.

[0018] Determine whether the recognition loss value has converged to the preset target value. If it has, end the training to obtain the trained clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined recognition loss value and train it again.
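The train, check, adjust cycle in the steps above can be sketched as a generic loop. Everything concrete here is an assumption for illustration: plain gradient descent stands in for the real optimizer, a one-parameter toy regression stands in for the clothing recognition model, and the names `train_until_converged`, `toy_loss_and_grad`, `lr`, and `max_steps` are hypothetical.

```python
def train_until_converged(params, loss_and_grad, next_batch,
                          target_loss, lr=0.1, max_steps=10_000):
    """Repeat: draw a batch, compute the loss, stop if it has reached the
    preset target, otherwise adjust the parameters and train again."""
    loss = float("inf")
    for step in range(max_steps):
        batch = next_batch()
        loss, grad = loss_and_grad(params, batch)
        if loss <= target_loss:          # converged to the preset target
            return params, loss, step
        # Plain gradient descent stands in for the real optimizer.
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss, max_steps


def toy_loss_and_grad(params, batch):
    """Mean squared error of y ~ w * x and its gradient with respect to w."""
    w = params[0]
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (w * x - y) * x for x, y in batch) / n
    return loss, [grad]


toy_batch = [(1.0, 2.0), (2.0, 4.0)]     # generated by y = 2x
params, loss, steps = train_until_converged(
    [0.0], toy_loss_and_grad, lambda: toy_batch, target_loss=1e-6)
```

The `max_steps` guard is an addition of this sketch; the source only describes stopping once the loss converges to the preset target value.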

[0019] In this embodiment, the training sample data used for clothing compliance recognition includes person re-identification images with a first type of label and clothing recognition images with a second type of label. The first type of label represents human identity, and the second type of label represents clothing type. The feature extraction model is trained using this training sample data to obtain a trained clothing recognition model. This improves the generalization of the clothing recognition model, enhances its robustness in extracting feature vectors from new types of clothing that do not appear in the training sample data, and improves the accuracy of clothing compliance recognition.

[0020] In one optional embodiment, the training sample data of the clothing wearing recognition model contains N color categories, where N is an integer greater than 2.

[0021] In an optional embodiment, both the person re-identification images and the clothing recognition images further include color labels; the color labels are used to divide the training dataset into multiple sets of color sample data; the step of selecting batch training images based on the training sample data includes:

[0022] The color sample data is selected one by one. For each selected color sample data, a category nearest neighbor graph corresponding to the currently selected color sample data is constructed using the current clothing recognition model. The category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data.

[0023] One set of color sample data is randomly selected from the multiple sets of color sample data as the target color sample data;

[0024] A category is randomly selected from the target color sample data as the benchmark category;

[0025] According to the category nearest neighbor relationship graph, a preset first number of categories are selected from the target color sample data in ascending order of vector distance from the benchmark category, as the target nearest neighbor categories of the benchmark category;

[0026] The batch training image data is obtained by randomly selecting a preset second number of images from the benchmark category, and randomly selecting the second number of images from each of the target nearest neighbor categories of the benchmark category.

[0027] In this embodiment, the training dataset is divided into multiple color sample data using color labels; and a category nearest neighbor graph is constructed based on the color sample data. Then, images are selected based on the category nearest neighbor graph to obtain batch training images. This method constructs a category nearest neighbor graph for each color sample data. Each batch of training images consists of randomly selected categories and their similar nearest neighbor categories, ensuring that the images in the batch training images have the same or similar colors. The iteratively trained network model has a stronger ability to recognize clothing with the same or similar colors, improving the discrimination accuracy when faced with similar-colored clothing and increasing the accuracy of clothing style recognition.
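The batch construction described above can be sketched as follows. The source does not state how the vector distance between two categories is computed, so this sketch assumes the Euclidean distance between per-category mean feature vectors; the function names and data layouts (`features_by_class`, `images_by_color`, `graphs_by_color`) are hypothetical.

```python
import math
import random


def class_centroid(vectors):
    """Mean feature vector of one category."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_neighbor_graph(features_by_class):
    """Map each category to {other category: distance between centroids}."""
    centroids = {c: class_centroid(v) for c, v in features_by_class.items()}
    return {
        a: {b: math.dist(ca, cb) for b, cb in centroids.items() if b != a}
        for a, ca in centroids.items()
    }


def sample_batch(images_by_color, graphs_by_color,
                 num_neighbors, images_per_class, rng=random):
    """Pick a color, a benchmark category, its nearest categories, then images."""
    color = rng.choice(sorted(images_by_color))      # target color sample data
    classes = images_by_color[color]                 # {category: [image ids]}
    base = rng.choice(sorted(classes))               # benchmark category
    graph = graphs_by_color[color]
    # Ascending order of distance from the benchmark category.
    neighbors = sorted(graph[base], key=graph[base].get)[:num_neighbors]
    batch = []
    for cls in [base] + neighbors:
        pool = classes[cls]
        k = min(images_per_class, len(pool))         # guard for small categories
        batch.extend((cls, img) for img in rng.sample(pool, k))
    return color, base, batch
```

Because the graph is built with the current model, it would be rebuilt periodically during training so that the neighbor lists track the categories the model currently confuses.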

[0028] In an optional embodiment, the training sample data includes first training sample data and second training sample data; the first training sample data includes person re-identification images with the first type of label and clothing recognition images with the second type of label; the second training sample data is obtained by removing the person re-identification images that meet preset conditions from the first training sample data; the training process for obtaining the trained clothing recognition model includes the following steps:

[0029] The clothing recognition model to be trained undergoes first-stage training based on the first training sample data, and the first recognition loss value of the clothing recognition model to be trained is determined.

[0030] Determine whether the first recognition loss value converges to the preset first target value. If yes, end the training to obtain the intermediate clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined first recognition loss value and repeat the first-stage training using the first training sample data.

[0031] The intermediate clothing recognition model undergoes second-stage training based on the second training sample data, and a second recognition loss value of the intermediate clothing recognition model is determined; in the second-stage training, if a clothing recognition image is selected from the second training sample data, the selected clothing recognition image is resampled.

[0032] Determine whether the second recognition loss value converges to the preset second target value. If it does, end the training to obtain the trained clothing recognition model. Otherwise, fine-tune the parameters of the intermediate clothing recognition model according to the determined second recognition loss value and repeat the second-stage training using the second training sample data.

[0033] In the method of this embodiment, the network is trained in the first stage using all pedestrian re-identification data and clothing recognition images. In the second stage, the training sample data consists of a partial pedestrian re-identification dataset and all clothing recognition images. The clothing recognition images are resampled, and the parameters of the network model trained in the first stage are fine-tuned to improve the network model's ability to recognize work clothes and improve the efficiency of clothing standard recognition.
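One way to picture how the second-stage data could be assembled is sketched below. It rests on two loudly stated assumptions: the "preset conditions" for removing re-identification images are represented by a caller-supplied predicate `should_drop`, and "resampling" of clothing recognition images is read as simple oversampling by repetition (`clothing_repeat`). Neither detail is specified in this passage, and all names are hypothetical.

```python
def build_stage_two_data(samples, should_drop, clothing_repeat=2):
    """Build second-stage data from first-stage samples.

    samples: list of dicts with a 'kind' key of 'reid' or 'clothing'.
    should_drop: predicate standing in for the unspecified preset conditions.
    clothing_repeat: assumed oversampling factor for clothing images.
    """
    stage_two = []
    for sample in samples:
        if sample["kind"] == "reid":
            if not should_drop(sample):
                stage_two.append(sample)       # keep part of the re-ID data
        else:
            # Assumed reading of "resampled": repeat clothing images.
            stage_two.extend([sample] * clothing_repeat)
    return stage_two
```

Under this reading, the second stage sees all clothing recognition images more often than the remaining re-identification images, which matches the stated goal of fine-tuning toward work clothes.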

[0034] In one alternative embodiment, the clothing recognition model uses ResNet18 as the backbone network.

[0035] In one alternative embodiment, the clothing recognition model is trained using a cross-entropy loss function.

[0036] In a second aspect, embodiments of this application also provide a clothing compliance recognition apparatus, including:

[0037] The image acquisition module is used to acquire an image of the human body to be identified, which contains the human body to be processed.

[0038] A vector generation module is used to input the human image to be identified into a trained clothing recognition model to obtain a first feature vector of the human image to be identified. The clothing recognition model is obtained by training a feature extraction model using training sample data. The training sample data includes person re-identification images with a first type of label and clothing recognition images with a second type of label. The feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition. The first type of label represents human identity; the second type of label represents clothing type.

[0039] The similarity determination module is used to calculate the similarity between the first feature vector and the preset benchmark feature vector to obtain a similarity feature value; the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting the benchmark clothing image into the trained clothing wearing recognition model.

[0040] The benchmarking and identification module is used to determine the clothing wearing standard evaluation information of the human body to be processed based on the similarity feature value.

[0041] In one optional embodiment, the output of the feature extraction model includes global features, upper body local features, and lower body local features; the feature vector is obtained by concatenating and fusing the global features, the upper body local features, and the lower body local features.

[0042] In one optional embodiment, the number of person re-identification images is a first quantity and the number of clothing recognition images is a second quantity; the first quantity is greater than the second quantity.

[0043] In an optional embodiment, the apparatus further includes a first model training unit; the first model training unit is used to obtain the trained clothing wearing recognition model; the first model training unit is specifically used for:

[0044] Obtain training sample data; the training sample data includes person re-identification images with a first type of label and clothing recognition images with a second type of label; the first type of label represents human identity; the second type of label represents clothing type;

[0045] Based on the training sample data, a batch of training images is selected; the batch of training images consists of the clothing recognition images and the person re-identification images.

[0046] The batch training images are input into the clothing recognition model to be trained, and the recognition loss value of the clothing recognition model to be trained is determined.

[0047] Determine whether the recognition loss value has converged to the preset target value. If it has, end the training to obtain the trained clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined recognition loss value and train it again.

[0048] In one optional embodiment, the training sample data of the clothing wearing recognition model contains N color categories, where N is an integer greater than 2.

[0049] In an optional embodiment, both the person re-identification images and the clothing recognition images further include color labels; the color labels are used to divide the training dataset into multiple sets of color sample data; the first model training unit is specifically used for:

[0050] The color sample data is selected one by one. For each selected color sample data, a category nearest neighbor graph corresponding to the currently selected color sample data is constructed using the current clothing recognition model. The category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data.

[0051] One set of color sample data is randomly selected from the multiple sets of color sample data as the target color sample data;

[0052] A category is randomly selected from the target color sample data as the benchmark category;

[0053] According to the category nearest neighbor relationship graph, a preset first number of categories are selected from the target color sample data in ascending order of vector distance from the benchmark category, as the target nearest neighbor categories of the benchmark category;

[0054] The batch training image data is obtained by randomly selecting a preset second number of images from the benchmark category, and randomly selecting the second number of images from each of the target nearest neighbor categories of the benchmark category.

[0055] In an optional embodiment, the training sample data includes first training sample data and second training sample data; the first training sample data includes person re-identification images with the first type of label and clothing recognition images with the second type of label; the second training sample data is obtained by removing the person re-identification images that meet preset conditions from the first training sample data; the apparatus further includes a second model training unit; the second model training unit is specifically used for:

[0056] The clothing recognition model to be trained undergoes first-stage training based on the first training sample data, and the first recognition loss value of the clothing recognition model to be trained is determined.

[0057] Determine whether the first recognition loss value converges to the preset first target value. If yes, end the training to obtain the intermediate clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined first recognition loss value and repeat the first-stage training using the first training sample data.

[0058] The intermediate clothing recognition model undergoes second-stage training based on the second training sample data, and a second recognition loss value of the intermediate clothing recognition model is determined; in the second-stage training, if a clothing recognition image is selected from the second training sample data, the selected clothing recognition image is resampled.

[0059] Determine whether the second recognition loss value converges to the preset second target value. If it does, end the training to obtain the trained clothing recognition model. Otherwise, fine-tune the parameters of the intermediate clothing recognition model according to the determined second recognition loss value and repeat the second-stage training using the second training sample data.

[0060] In one alternative embodiment, the clothing recognition model uses ResNet18 as the backbone network.

[0061] In one alternative embodiment, the clothing recognition model is trained using a cross-entropy loss function.

[0062] In a third aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the clothing compliance recognition method of the first aspect.

[0063] In a fourth aspect, embodiments of this application also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program that can run on the processor, and when the computer program is executed by the processor, it causes the processor to implement the clothing compliance recognition method of the first aspect.

[0064] The technical effects of any of the implementation methods in the second to fourth aspects can be found in the technical effects of the corresponding implementation methods in the first aspect, and will not be repeated here.

Attached Figure Description

[0065] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0066] Figure 1 is a schematic flowchart of a clothing compliance recognition method provided in an embodiment of this application;

[0067] Figure 2 is the first schematic flowchart of training the clothing recognition model for a clothing compliance recognition method provided in an embodiment of this application;

[0068] Figure 3 is a schematic diagram of the process of constructing batch training data for a clothing compliance recognition method provided in an embodiment of this application;

[0069] Figure 4 is the second schematic flowchart of training the clothing recognition model for a clothing compliance recognition method provided in an embodiment of this application;

[0070] Figure 5 is the first structural schematic diagram of a clothing compliance recognition apparatus provided in an embodiment of this application;

[0071] Figure 6 is the second structural schematic diagram of a clothing compliance recognition apparatus provided in an embodiment of this application;

[0072] Figure 7 is the third structural schematic diagram of a clothing compliance recognition apparatus provided in an embodiment of this application;

[0073] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0074] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0075] It should be noted that the terms "comprising" and "having" and their variations used in this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such process, method, product, or device.

[0076] The following are explanations of some of the words that appear in the text:

[0077] (1) Person Re-identification (ReID): Person re-identification, also known as pedestrian re-identification, is a technique that uses computer vision to determine whether a specific person is present in an image or video sequence; in other words, it refers to identifying a target person across video sequences that may come from different sources with non-overlapping camera fields of view.

[0078] (2) Euclidean Distance: Also known as the Euclidean metric, Euclidean distance is a common distance measure that gives the absolute distance between two points in multidimensional space. Named after the ancient Greek mathematician Euclid, it is the intuitive straight-line (shortest) distance between two points.
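As a small illustration of this definition, the distance is the square root of the sum of squared coordinate differences. The function name below is illustrative; Python's standard library offers the same computation as `math.dist` (Python 3.8 and later).

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```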

[0079] (3) Epoch: The training process of a network model is usually as follows: first, the weights and biases between the neurons in the model are initialized; then, the training set is preprocessed; the preprocessed data is input into the model to obtain the predicted values; the loss is calculated using the predicted values and the true values; and the weights between the neurons in the model are updated through backpropagation. One epoch is one pass of training over all the data in the training set.

[0080] (4) ResNet18 Network Model: The name ResNet18 indicates that the basic architecture of the network is ResNet and that the network depth is 18 layers. The network depth counts only the weight layers, that is, the convolutional layers and the linear (fully connected) layer; batch normalization layers and pooling layers are not counted.

[0081] (5) Cross-entropy Loss Function: The cross-entropy loss function is a commonly used loss function in classification problems. Cross-entropy is an important concept in information theory that measures the degree of difference between two probability distributions of the same random variable; in machine learning, it expresses the difference between the true probability distribution and the predicted probability distribution. The smaller the cross-entropy value, the better the model's prediction performance. In classification problems, cross-entropy is often used in conjunction with softmax. Softmax processes the output results so that the predicted values of the categories sum to 1, and then the loss is calculated using the cross-entropy loss function.
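The softmax and cross-entropy pairing described above can be sketched for a single sample as follows. The function names are illustrative, and the max-subtraction step is a standard numerical-stability trick added by this sketch rather than something stated in the source.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Cross-entropy loss for one sample whose true class is `true_index`."""
    probs = softmax(logits)
    return -math.log(probs[true_index])
```

The loss shrinks as the probability assigned to the true class approaches 1, which is the sense in which a smaller cross-entropy indicates a better prediction.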

[0082] In recent years, some workplaces have imposed strict requirements on the attire of their employees. They are required to wear standardized work clothes in specific locations. For example, production workshops and construction sites require relevant employees to wear appropriate work clothes due to safety and other factors. Employees who do not wear the correct clothes will be warned or even prohibited from entering the workplace.

[0083] With the rapid development of deep learning, some deep learning-based techniques have also been applied to the field of work uniform compliance detection. These techniques typically involve collecting images of properly worn work uniforms in specific work scenarios, feeding them into a network model for training, and then using the trained model to extract features from both the image of the work uniform to be identified and the registered image, calculating the similarity between the two to determine whether the worker is wearing the correct uniform. However, the accuracy of these models decreases when the work scenario changes, indicating poor versatility. Therefore, improving the accuracy of uniform compliance detection is a pressing issue that needs to be addressed.

[0084] To address existing technical problems, this application provides a method for clothing compliance recognition. An image of a human body to be recognized, which includes the human body to be processed, is acquired and input into a trained clothing recognition model to obtain a first feature vector of the human body image. The clothing recognition model is obtained by training a feature extraction model using training sample data, which includes person re-identification images with a first type of label and clothing recognition images with a second type of label; the feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition. The first type of label represents the human body identity, and the second type of label represents the type of clothing. The similarity between the first feature vector and a preset benchmark feature vector is calculated to obtain a similarity feature value, where the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting a benchmark clothing image into the trained clothing recognition model. The clothing compliance evaluation information of the human body to be processed is then determined based on the similarity feature value. Because the feature extraction model is trained on both kinds of labeled images, the generalization of the clothing recognition model improves, its feature extraction becomes more robust for new types of clothing that do not appear in the training sample data, and the accuracy of clothing compliance recognition improves.

[0085] The technical solutions provided in the embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0086] This application provides a clothing wearing standard recognition method which, as shown in Figure 1, includes the following steps:

[0087] Step S101: Obtain an image of the human body to be identified, which includes the human body to be processed.

[0088] In practice, video frames can be acquired sequentially from the surveillance video to be analyzed, and the human body image to be identified, which contains the human body to be processed, can be extracted from each frame. Clothing wearing standard recognition is then performed on the acquired frames in sequence.

[0089] For example, the human image Tar_P to be identified is extracted from the acquired video image Video_P; the human image Tar_P to be identified includes the human body R_People to be processed.

[0090] Step S102: Input the human image to be identified into the trained clothing recognition model to obtain the first feature vector of the human image to be identified.

[0091] The clothing recognition model is obtained by training the feature extraction model with training sample data. The training sample data includes human body recognition images with first-class labels and clothing recognition images with second-class labels. The feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to the preset target condition. The first-class label represents human identity; the second-class label represents clothing type.

[0092] In one optional embodiment, the number of human body re-identification images is a first number, and the number of clothing recognition images is a second number; the first number is greater than the second number.

[0093] In practice, the clothing recognition model is obtained by training the feature extraction model with training sample data. The training sample data includes a first number of human body re-identification images with a first type of label and a second number of clothing recognition images with a second type of label. The feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to the preset target condition. The first type of label represents the human body identity; the second type of label represents the type of clothing; the first number is greater than the second number.

[0094] For example, the human image Tar_P to be identified is input into the trained clothing recognition model Mold_trained to obtain the first feature vector Info_1 of the human image Tar_P. The clothing recognition model Mold_trained is obtained by training the feature extraction model Mold_initial using training sample data Train_data. The training sample data Train_data includes a first number N_1 human body recognition images Person_pic with a first class label Label_1 and a second number N_2 clothing recognition images WorkClo_pic with a second class label Label_2. The feature extraction model Mold_initial is used to obtain the feature vector of the input object, and the trained clothing recognition model Mold_trained is obtained when the loss of the feature extraction model Mold_initial converges to a preset target condition. The first class label Label_1 represents the human identity; the second class label Label_2 represents the clothing type; the first number N_1 is greater than the second number N_2.

[0095] In the embodiments of this application, different types of work clothes collected in different scenarios are used as clothing recognition images. At the same time, a pedestrian re-identification dataset with a quantity several times greater than that of clothing recognition images is added and fed into the feature extraction model for training. This improves the generalization of the clothing recognition model and enables it to extract robust features even when faced with new types of clothing.

[0096] Step S103: Calculate the similarity between the first feature vector and the preset benchmark feature vector to obtain the similarity feature value; the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting the benchmark clothing image into the trained clothing wearing recognition model.

[0097] Optionally, in some embodiments of this application, the similarity feature value is obtained by taking the cosine similarity between the first feature vector and a preset benchmark feature vector.
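As a minimal sketch of the cosine similarity mentioned above, the following function compares two feature vectors given as plain Python sequences. The function name, the example values, and the handling of zero vectors are assumptions made for illustration; the application does not specify them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical values standing in for the first feature vector and the
# benchmark feature vector produced by the trained model.
info_1 = [0.2, 0.9, 0.4]
info_basic = [0.1, 0.8, 0.5]
similar_score = cosine_similarity(info_1, info_basic)
```

Cosine similarity depends only on the direction of the two vectors, so the score is unaffected by the overall magnitude of the extracted features.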

[0098] For example, the similarity between the first feature vector Info_1 and the preset benchmark feature vector Info_Basic is calculated to obtain the similarity feature value Similar_Score; the benchmark feature vector Info_Basic is the feature vector of the benchmark clothing Work_Cloth obtained by inputting the benchmark clothing image Work_Cloth_Pict into the trained clothing recognition model Mold_trained.

[0099] Step S104: Determine the clothing wearing standard evaluation information of the human body to be processed based on the similarity feature value.

[0100] In the embodiments of this application, multiple reference feature vectors corresponding to the reference clothing can be configured. The similarity calculation between the first feature vector and the preset reference feature vectors can be performed by calculating the similarity between the first feature vector and each of the reference feature vectors separately, resulting in multiple similarity feature values.

[0101] In one optional embodiment, the clothing wearing standard evaluation information of the human body to be processed is determined based on the similarity feature values as follows: if the maximum of the similarity feature values reaches a preset judgment threshold, the clothing worn by the human body to be processed is determined to match the benchmark clothing, that is, to meet the clothing wearing standard.
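The decision rule with several benchmark feature vectors can be sketched as follows. This is an illustration under stated assumptions: the function names and the threshold value are hypothetical, cosine similarity is used as the similarity measure, and "reaches the threshold" is read as greater than or equal to.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def meets_wearing_standard(
    feature: Sequence[float],
    benchmarks: List[Sequence[float]],
    threshold: float,
) -> Tuple[bool, float]:
    """Compare one feature vector with every benchmark vector.

    Returns (compliant, best_score), where compliant is True when the
    largest similarity reaches the threshold.
    """
    if not benchmarks:
        raise ValueError("at least one benchmark feature vector is required")
    scores = [cosine_similarity(feature, b) for b in benchmarks]
    best = max(scores)
    return best >= threshold, best


# Hypothetical usage: two benchmark vectors and a threshold of 0.8.
compliant, score = meets_wearing_standard([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], 0.8)
```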

[0102] The clothing recognition method provided in this application involves acquiring a human image containing the subject; inputting the human image into a trained clothing recognition model to obtain a first feature vector; the clothing recognition model is trained using training sample data, which includes human body recognition images with first-class labels and clothing recognition images with second-class labels; the feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition; the first-class label represents the human identity; the second-class label represents the clothing type; the similarity between the first feature vector and a preset benchmark feature vector is calculated to obtain a similarity feature value; the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting a benchmark clothing image into the trained clothing recognition model; and the clothing recognition evaluation information of the subject is determined based on the similarity feature value. This method, in clothing standard recognition, includes body recognition images with a first-class label and clothing recognition images with a second-class label in the training sample data. The first-class label represents the human body identity, and the second-class label represents the clothing type. The feature extraction model is trained using this training sample data to obtain a trained clothing recognition model. This improves the generalization of the clothing recognition model, enhances its robustness in extracting feature vectors from new types of clothing that do not appear in the training sample data, and improves the accuracy of clothing standard recognition.

[0103] In one alternative embodiment, the training process for obtaining the trained clothing recognition model, as shown in Figure 2, includes the following steps:

[0104] Step S201: Obtain training sample data; the training sample data includes human body re-identification images with first-class labels and clothing recognition images with second-class labels; the first-class labels represent human identity; the second-class labels represent clothing type.

[0105] In practice, different types of work clothes collected in different scenarios are used as clothing recognition images. Simultaneously, a pedestrian re-identification dataset, several times larger in number than the clothing recognition images, is added and fed into the feature extraction model for training. For example, the training sample data could include a first number of pedestrian re-identification images with a first type of label and a second number of clothing recognition images with a second type of label; the first type of label represents the person's identity; the second type of label represents the type of clothing; and the first number is greater than the second number.

[0106] For example, training sample data Train_data is obtained; the training sample data Train_data includes a first number N_1 of human body recognition images Person_pic with a first class label Label_1 and a second number N_2 of clothing recognition images WorkClo_pic with a second class label Label_2; the first class label Label_1 represents human identity; the second class label Label_2 represents clothing type; the first number N_1 is greater than the second number N_2.

[0107] Step S202: Select batch training images based on training sample data.

[0108] In one embodiment of this application, the batch training images consist of clothing recognition images and human body re-identification images.

[0109] In another embodiment of this application, the batch training images are obtained by randomly selecting from the clothing recognition images and the human body re-identification images.

[0110] For example, a batch of training images, Train_batch, is selected based on the training sample data Train_data.

[0111] In one alternative embodiment, the training sample data of the clothing recognition model contains N color categories, where N is an integer greater than 2.

[0112] In some embodiments of this application, the training sample data serves as a training set, which comprises two parts: one part consists of human body recognition images, and the other part consists of clothing recognition images collected in different scenes. In the human body recognition images, the category is represented by the pedestrian's identity, and in the clothing recognition images, the category is represented by the type of clothing. Each image in the training set is labeled with a category. Furthermore, each category is labeled with a color tag. In some embodiments, when labeling categories with color tags, there are nine major color categories: black, white, red, yellow, green, blue, purple, gray, and others, which is equivalent to the training set being divided into nine color sample data.
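Dividing the training set into color sample data can be sketched as a simple grouping step. The record layout `(image_id, category, color)` and all names below are assumptions made for illustration; the application only states that each category carries a color tag.

```python
from typing import Dict, Iterable, List, Tuple

# color -> category -> list of image ids
ColorSampleData = Dict[str, Dict[str, List[str]]]


def split_by_color(samples: Iterable[Tuple[str, str, str]]) -> ColorSampleData:
    """Group (image_id, category, color) records into per-color sample data."""
    groups: ColorSampleData = {}
    for image_id, category, color in samples:
        groups.setdefault(color, {}).setdefault(category, []).append(image_id)
    return groups


# Hypothetical records: one pedestrian identity and one work-clothes type.
samples = [
    ("img1", "person_0001", "blue"),
    ("img2", "person_0001", "blue"),
    ("img3", "uniform_X", "red"),
]
color_sample_data = split_by_color(samples)
```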

[0113] In one optional embodiment, both the human body re-identification images and the clothing recognition images also have color labels; the color labels are used to divide the training sample data into multiple color sample data. Selecting batch training images based on the training sample data, as shown in Figure 3, can be achieved through the following steps:

[0114] Step S301: Select color sample data one by one. For each selected color sample data, construct a category nearest neighbor graph corresponding to the currently selected color sample data through the current clothing recognition model. The category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data.

[0115] For example, both the human body re-identification images Person_pic and the clothing recognition images WorkClo_pic also have color labels color_labels; the color labels color_labels are used to divide the training sample data Train_data into multiple color sample data. Before each training epoch starts, for each color sample data, the distances between the categories in that color sample data are calculated using the clothing recognition model trained in the previous epoch, and a category nearest neighbor graph is constructed from these distances.

[0116] In the embodiments of this application, constructing the category nearest neighbor graph for the currently selected color sample data through the current clothing recognition model can be achieved as follows. Assuming that a color sample data contains C categories, one image is extracted from each category as the representative of that category. Features of the representative images are extracted with the model trained in the previous round, and the Euclidean distance between every pair of representative images is calculated, giving a distance matrix of size C × C. A category nearest neighbor graph G = (V, E) can be constructed in this way, where V = {1, 2, …, C} is the vertex set, each vertex representing a category, and E = {(c1, c2) | c1, c2 ∈ V, c1 ≠ c2} is the edge set.
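The graph construction can be sketched as follows, assuming the feature vector of each category's representative image has already been extracted by the previous round's model. The function names and the adjacency-list layout are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# category -> list of (distance, neighbor category), sorted by ascending distance
NeighborGraph = Dict[str, List[Tuple[float, str]]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_neighbor_graph(representatives: Dict[str, Sequence[float]]) -> NeighborGraph:
    """Build the category nearest neighbor graph for one color sample data.

    `representatives` maps each category to the feature vector of its
    representative image. Every pair of different categories gets an edge
    weighted by the Euclidean distance between the two representatives.
    """
    graph: NeighborGraph = {}
    for c1, f1 in representatives.items():
        edges = [
            (euclidean(f1, f2), c2)
            for c2, f2 in representatives.items()
            if c2 != c1
        ]
        edges.sort(key=lambda edge: edge[0])  # nearest categories first
        graph[c1] = edges
    return graph


# Hypothetical 2-D features for three categories of the same color.
graph = build_neighbor_graph({"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]})
```

Storing each vertex's edges in ascending order of distance means the p-1 nearest categories of any benchmark category can later be read off as a prefix of its list.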

[0117] Step S302: Randomly select one from the color sample data as the target color sample data.

[0118] When there are N color labels, meaning the training sample data contains N colors, we can construct N undirected graphs. In any undirected graph, each category can find p-1 nearest neighbor categories. Therefore, in each iteration, a color is randomly selected as the target color sample data.

[0119] Step S303: Randomly select a category from the target color sample data as the baseline category.

[0120] In practice, during each iteration, a color sample data of one color is randomly selected as the target color sample data, and then a category is selected from the target color sample data as the baseline category.

[0121] Step S304: Based on the category nearest neighbor relationship graph, select a preset first number of categories from the target color sample data in ascending order of vector distance from the benchmark category, as the target nearest neighbor categories of the benchmark category.

[0122] In practice, assuming that each batch of training data consists of p categories, the category nearest neighbor graph is searched in ascending order of vector distance from the benchmark category, and the p-1 categories with the smallest vector distance are selected as the target nearest neighbor categories of the benchmark category.

[0123] Step S305: Select a preset second number of images from the benchmark category, and select a second number of images from each target nearest neighbor category of the benchmark category to obtain batch training images.

[0124] In practice, the baseline category and its target nearest neighbor categories together make up p categories. From each of these p categories in the target color sample data, k images are extracted, giving a batch of p*k training images, which is then fed into the network for training. Specifically, if a category contains more than k images, the k extracted images are all distinct; otherwise, the k extracted images include repeats.
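Steps S302 to S305 can be sketched as one sampling function. The data layout and names are assumptions: `color_data` maps color to category to image ids, and `graphs` maps color to a neighbor list per category sorted by ascending distance. A category with exactly k images is sampled without repeats here, which is an assumption since the text only covers "more than k" and "otherwise".

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_batch(
    color_data: Dict[str, Dict[str, List[str]]],
    graphs: Dict[str, Dict[str, List[Tuple[float, str]]]],
    p: int,
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Select p categories of one color and k images per category."""
    if rng is None:
        rng = random.Random()
    color = rng.choice(sorted(color_data))        # S302: target color sample data
    categories = color_data[color]
    base = rng.choice(sorted(categories))         # S303: baseline category
    # S304: the p-1 nearest categories, read from the pre-sorted neighbor list.
    neighbors = [cat for _, cat in graphs[color][base][: p - 1]]
    batch: List[str] = []
    for cat in [base] + neighbors:                # S305: k images per category
        images = categories[cat]
        if len(images) >= k:
            batch.extend(rng.sample(images, k))     # k distinct images
        else:
            batch.extend(rng.choices(images, k=k))  # too few images: repeats allowed
    return batch


# Hypothetical data: one color with three categories.
data = {"red": {"a": ["a1", "a2", "a3"], "b": ["b1"], "c": ["c1", "c2"]}}
graphs = {"red": {
    "a": [(1.0, "b"), (2.0, "c")],
    "b": [(1.0, "a"), (3.0, "c")],
    "c": [(2.0, "a"), (3.0, "b")],
}}
batch = sample_batch(data, graphs, p=3, k=2, rng=random.Random(0))
```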

[0125] The method in this embodiment constructs a category nearest neighbor relationship graph for each color sample data. Each batch of training images consists of randomly selected categories and their similar nearest neighbor categories, so that the images in the batch of training images have the same or similar colors. The network model after iterative training has a stronger ability to identify work clothes with the same or similar colors, improves the discrimination when faced with clothing with similar colors, and improves the accuracy of clothing standard recognition.

[0126] In some embodiments of this application, to further shorten the time needed to construct the category nearest neighbor graph, the following reduction is applied when a color sample data contains more than 10,000 categories. For the category selected in the iteration, the 3,000 categories closest to it are taken from the previous round's category nearest neighbor graph, and 2,000 more categories are randomly selected from the remaining categories in the color sample data, giving 5,000 categories in total. Only the distances between these 5,000 categories and the selected category are calculated to construct the category nearest neighbor graph. In the initial epoch, all 5,000 categories are selected randomly. After that, as in the previous steps, p*k images are selected to form a training batch and sent to the network for training.
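The candidate reduction described above can be sketched as follows. The function and parameter names are hypothetical, and `prev_neighbors` is assumed to be the selected category's neighbor list from the previous round, sorted by ascending distance (or `None` in the initial epoch).

```python
import random
from typing import List, Optional, Sequence


def candidate_categories(
    selected: int,
    all_categories: Sequence[int],
    prev_neighbors: Optional[Sequence[int]],
    n_near: int = 3000,
    n_random: int = 2000,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick the categories whose distance to `selected` will be recomputed."""
    if rng is None:
        rng = random.Random()
    others = [c for c in all_categories if c != selected]
    if prev_neighbors is None:
        # Initial epoch: all candidates are chosen at random.
        return rng.sample(others, min(n_near + n_random, len(others)))
    near = list(prev_neighbors[:n_near])          # closest categories last round
    near_set = set(near)
    rest = [c for c in others if c not in near_set]
    return near + rng.sample(rest, min(n_random, len(rest)))


# Small hypothetical example: 20 categories, 3 near + 2 random candidates.
cands = candidate_categories(0, range(20), [5, 6, 7, 8, 9],
                             n_near=3, n_random=2, rng=random.Random(1))
```

Distances then need to be computed for at most `n_near + n_random` categories per selected category rather than for all of them.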

[0127] Step S203: Input the batch training images into the clothing recognition model to be trained for training, and determine the recognition loss value of the clothing recognition model to be trained.

[0128] For example, the batch training images Train_batch are input into the clothing recognition model to be trained for training, and the recognition loss value of the clothing recognition model to be trained is determined.

[0129] In one alternative embodiment, the clothing recognition model uses ResNet18 as the backbone network.

[0130] In one alternative embodiment, the output of the feature extraction model includes global features, upper body local features, and lower body local features; the feature vector is obtained by concatenating and fusing the global features, upper body local features, and lower body local features.

[0131] In some embodiments, the feature extraction model of this application makes slight modifications to the ResNet18 network model, which typically contains 17 convolutional layers and 1 fully connected layer. This application adds a branch network after the 13th convolutional layer of ResNet18. This branch network includes a split module, which divides the output features of the convolutional layer into upper and lower parts, thus enabling the feature extraction model of this application to output a three-part feature vector. When a human image to be identified is input into this feature extraction model, three parts of features are obtained: global features of the overall work clothes, first local features of the upper body work clothes, and second local features of the lower body work clothes. The feature extraction model fuses these three parts of features in a concatenated manner to form the feature vector of the input object.
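The split-and-concatenate fusion can be sketched with NumPy on already computed feature maps. Several details here are assumptions not stated in the application: the feature maps are laid out as (channels, height, width), each part is reduced by global average pooling, the branch output is split at the middle row, and the shapes are hypothetical.

```python
import numpy as np


def fuse_features(global_map: np.ndarray, branch_map: np.ndarray) -> np.ndarray:
    """Fuse global, upper-body, and lower-body features into one vector.

    global_map: (C1, H1, W1) output of the main network.
    branch_map: (C2, H2, W2) output of the layer feeding the branch network.
    """
    half = branch_map.shape[1] // 2
    upper = branch_map[:, :half, :]   # top rows: upper-body region
    lower = branch_map[:, half:, :]   # bottom rows: lower-body region
    # Assumption: global average pooling turns each map into a C-dim vector.
    pooled = [m.mean(axis=(1, 2)) for m in (global_map, upper, lower)]
    return np.concatenate(pooled)


# Hypothetical feature maps with 512 and 256 channels.
global_map = np.ones((512, 8, 4))
branch_map = np.zeros((256, 16, 8))
branch_map[:, :8, :] = 2.0
feature_vector = fuse_features(global_map, branch_map)  # length 512 + 256 + 256
```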

[0132] In one alternative embodiment, the clothing recognition model is trained using a cross-entropy loss function.
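For reference, the cross-entropy loss for a single training image can be written as below. This is a generic sketch of softmax cross-entropy; how the identity labels and clothing-type labels are mapped onto class indices is not specified in the application, so `target` is simply assumed to be the index of the image's category.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy of one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits over three categories, true category at index 0.
loss = cross_entropy([2.0, 0.5, -1.0], target=0)
```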

[0133] It should be noted that the use of ResNet18 as the backbone network in the clothing recognition model is merely an illustrative example of the clothing recognition model in this application. In other embodiments of this application, the clothing recognition model may also employ other deep learning network models, such as GoogLeNet. This application does not specifically limit the network model used in the clothing recognition model.

[0134] Step S204: Determine whether the recognition loss value has converged to the preset target value. If so, end the training and obtain the trained clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined recognition loss value and retrain it.

[0135] The method in this embodiment, when identifying clothing specifications, includes body recognition images with a first type of label and clothing recognition images with a second type of label in the training sample data. The first type of label represents the identity of the person, and the second type of label represents the type of clothing. The feature extraction model is trained using this training sample data to obtain a trained clothing recognition model. This improves the generalization of the clothing recognition model, enhances the robustness when extracting feature vectors from new types of clothing that do not appear in the training sample data, and improves the accuracy of clothing specification recognition.

[0136] In some embodiments of this application, the parameters of the clothing wearing recognition model can be fine-tuned through two-stage training.

[0137] In one optional embodiment, the training sample data includes first training sample data and second training sample data; the first training sample data includes human body re-identification images with a first type of label and clothing recognition images with a second type of label; the second training sample data is obtained by removing human body re-identification images that meet preset conditions from the first training sample data. The training process for obtaining the trained clothing recognition model, as shown in Figure 4, includes the following steps:

[0138] Step S401: Perform first-stage training on the clothing recognition model to be trained based on the first training sample data, and determine the first recognition loss value of the clothing recognition model to be trained.

[0139] Step S402: Determine whether the first recognition loss value has converged to the preset first target value. If so, end the first-stage training to obtain the intermediate clothing wearing model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined first recognition loss value and continue the first-stage training using the first training sample data.

[0140] Step S403: Perform second-stage training on the intermediate clothing wearing model based on the second training sample data, and determine the second recognition loss value of the intermediate clothing wearing model; in the second-stage training, if a clothing recognition image is selected from the second training sample data, the selected clothing recognition image is resampled.

[0141] In specific implementation, the training sample data includes first training sample data and second training sample data. The first training sample data includes human body re-identification images with first-class labels and clothing recognition images with second-class labels. The second training sample data is obtained by removing human body re-identification images that meet preset conditions from the first training sample data. Compared with the first training sample data used in the first stage of training, the second training sample data used for second-stage fine-tuning selectively removes some human body re-identification images, for example images showing pedestrians riding non-motorized vehicles, which improves the quality of the training data. In addition, in each epoch of network training, clothing recognition images are sampled repeatedly, which is equivalent to increasing the proportion of clothing recognition images in the overall training sample data. In each training iteration, p categories are extracted from the training sample data; these p categories may contain duplicate work clothes categories. Then, k images are extracted from each category, resulting in a training batch of p*k images, which is fed into the network for training.

[0142] Step S404: Determine whether the second recognition loss value has converged to the preset second target value. If so, end the training to obtain the trained clothing recognition model. Otherwise, fine-tune the parameters of the intermediate clothing wearing model according to the determined second recognition loss value and continue the second-stage training using the second training sample data.
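One way to realize the repeated sampling of clothing recognition images is to list each work clothes category several times in the pool that the p categories are drawn from. This is only a sketch under assumptions: the `repeat` factor, the function names, and the pool-based mechanism are hypothetical, since the application does not state how many times the images are resampled or how.

```python
import random
from typing import List, Optional, Sequence


def build_category_pool(
    reid_categories: Sequence[str],
    clothing_categories: Sequence[str],
    repeat: int,
) -> List[str]:
    """List every identity once and every work clothes category `repeat` times."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return list(reid_categories) + list(clothing_categories) * repeat


def draw_categories(pool: Sequence[str], p: int,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Draw p pool entries; a repeated work clothes category may appear twice."""
    if rng is None:
        rng = random.Random()
    return rng.sample(list(pool), p)


# Hypothetical categories: three identities and one work clothes type.
pool = build_category_pool(["id_1", "id_2", "id_3"], ["uniform_A"], repeat=3)
chosen = draw_categories(pool, p=4, rng=random.Random(0))
```

With `repeat=3`, the single work clothes category accounts for half of the pool instead of a quarter, which raises its share of each epoch's batches.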

[0143] In one optional embodiment, during the second-stage training, the convolutional layer parameters of the model trained in the first stage are fixed so that they do not participate in gradient updates; the remaining parameters of the model trained in the first stage participate in gradient updates until the network converges.
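The effect of fixing the convolutional layer parameters can be illustrated with a framework-free gradient step that skips frozen parameters. The parameter names, the name-prefix convention for identifying convolutional layers, and the plain SGD update are all assumptions for illustration; in a deep learning framework the same effect is usually obtained by disabling gradient computation for those parameters.

```python
from typing import Dict, List, Tuple


def sgd_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    lr: float,
    frozen_prefixes: Tuple[str, ...] = ("conv",),
) -> None:
    """Apply one SGD update in place, leaving frozen parameters untouched."""
    for name, values in params.items():
        if name.startswith(frozen_prefixes):
            continue  # convolutional parameters do not take part in the update
        params[name] = [w - lr * g for w, g in zip(values, grads[name])]


# Hypothetical parameters of a model after first-stage training.
params = {"conv1.weight": [1.0, -2.0], "fc.weight": [1.0, 3.0]}
grads = {"conv1.weight": [0.5, 0.5], "fc.weight": [0.5, -1.0]}
sgd_step(params, grads, lr=0.1)
```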

[0144] In the method of the above embodiment, the network is trained in the first stage using all pedestrian re-identification data and clothing recognition images. In the second stage, the training sample data consists of a partial pedestrian re-identification dataset and all clothing recognition images. The clothing recognition images are resampled, and the parameters of the network model trained in the first stage are fine-tuned to improve the network model's ability to recognize work clothes and improve the efficiency of clothing standard recognition.

[0145] Based on the same inventive concept as the clothing wearing standard recognition method shown in Figure 1, this application also provides a clothing wearing standard recognition device. Since this device corresponds to the method of this application and solves the problem on a similar principle, the implementation of the device can refer to the implementation of the above method; repeated details are not elaborated further.

[0146] Figure 5 is a schematic diagram of the structure of a clothing wearing standard recognition device according to an embodiment of this application. As shown in Figure 5, the device includes an image acquisition module 501, a vector generation module 502, a similarity determination module 503, and a benchmark recognition module 504.

[0147] The image acquisition module 501 is used to acquire an image of a human body to be identified, which contains the human body to be processed.

[0148] The vector generation module 502 is used to input the human image to be identified into the trained clothing recognition model to obtain the first feature vector of the human image to be identified. The clothing recognition model is obtained by training the feature extraction model with training sample data. The training sample data includes human body recognition images with first-class labels and clothing recognition images with second-class labels. The feature extraction model is used to obtain the feature vector of the input object, and the trained clothing recognition model is obtained when the loss of the feature extraction model converges to the preset target condition. The first-class label represents the human identity; the second-class label represents the type of clothing.

[0149] The similarity determination module 503 is used to calculate the similarity between the first feature vector and the preset benchmark feature vector to obtain the similarity feature value; the benchmark feature vector is the feature vector of the benchmark clothing obtained by inputting the benchmark clothing image into the trained clothing wearing recognition model.

[0150] The benchmark recognition module 504 is used to determine the clothing wearing standard evaluation information of the human body to be processed based on the similarity feature value.

[0151] In one alternative embodiment, the output of the feature extraction model includes global features, upper body local features, and lower body local features; the feature vector is obtained by concatenating and fusing the global features, upper body local features, and lower body local features.

[0152] In one optional embodiment, the number of human body re-identification images is a first number, and the number of clothing recognition images is a second number; the first number is greater than the second number.

[0153] In one alternative embodiment, as shown in Figure 6, the device also includes a first model training unit 601, which is used to obtain a trained clothing recognition model; the first model training unit 601 is specifically used for:

[0154] Obtain training sample data; the training sample data includes human body re-identification images with first-class labels and clothing recognition images with second-class labels; the first-class labels represent human identity; the second-class labels represent clothing type;

[0155] Based on the training sample data, batch training images are selected; the batch training images consist of clothing recognition images and human body re-identification images.

[0156] The batch training images are input into the clothing recognition model to be trained for training, and the recognition loss value of the clothing recognition model to be trained is determined.

[0157] Determine whether the recognition loss value has converged to the preset target value. If it has, end the training to obtain the trained clothing recognition model. Otherwise, adjust the parameters of the clothing recognition model to be trained according to the determined recognition loss value and retrain it.

[0158] In one alternative embodiment, the training sample data of the clothing recognition model contains N color categories, where N is an integer greater than 2.

[0159] In an optional embodiment, both the human body re-identification images and the clothing recognition images are further labeled with color tags; the color tags are used to divide the training sample data into multiple color sample data; the first model training unit 601 is specifically used for:

[0160] Color sample data is selected one by one. For each selected color sample data, a category nearest neighbor graph corresponding to the currently selected color sample data is constructed using the current clothing recognition model. The category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data.

[0161] One color sample is randomly selected from the color sample data and used as the target color sample data;

[0162] Randomly select a category from the target color sample data as the baseline category;

[0163] Based on the category nearest neighbor graph, a preset number of categories are selected from the target color sample data in ascending order of vector distance from the baseline category, and these categories are used as the target nearest neighbor categories of the baseline category.

[0164] Select a second set of images from the baseline category, and then select a second set of images from each target nearest neighbor category of the baseline category to obtain batch training images.

[0165] In one optional embodiment, the training sample data includes first training sample data and second training sample data; the first training sample data includes human weight recognition images with a first type of label and clothing recognition images with a second type of label; the second training sample data is obtained by removing human weight recognition images that meet preset conditions from the first training sample data; such as Figure 7 As shown, the device also includes a second model training unit 701; the second model training unit 701 is specifically used for:

[0166] First-stage training is performed on the clothing wearing recognition model to be trained based on the first training sample data, and the first recognition loss value of the clothing wearing recognition model to be trained is determined.

[0167] Determine whether the first recognition loss value converges to the preset first target value. If it does, end the training and obtain the intermediate clothing wearing recognition model. Otherwise, adjust the parameters of the clothing wearing recognition model to be trained according to the determined first recognition loss value and perform the first-stage training again using the first training sample data.

[0168] Second-stage training is performed on the intermediate clothing wearing recognition model based on the second training sample data, and the second recognition loss value of the intermediate clothing wearing recognition model is determined. In the second-stage training, if a clothing recognition image is selected from the second training sample data, the selected clothing recognition image is resampled.

[0169] Determine whether the second recognition loss value converges to the preset second target value. If it does, end the training to obtain the trained clothing wearing recognition model. Otherwise, fine-tune the parameters of the intermediate clothing wearing recognition model according to the determined second recognition loss value and perform the second-stage training again using the second training sample data.
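The two-stage control flow above can be outlined with the following Python sketch, given for illustration only. The helper names (`train_until_converged`, `build_second_stage_data`, `two_stage_training`), the sample dictionaries with a `"kind"` field, and the `max_steps` safeguard are hypothetical. Convergence is assumed here to mean that the loss is at or below the preset target value, which the description does not define precisely. The resampling of clothing recognition images in the second stage is not modeled, because the description does not specify how it is performed.

```python
def train_until_converged(train_step, target_loss, max_steps=1000):
    """Call train_step() repeatedly until the returned loss reaches target_loss.

    train_step is assumed to run one round of training, adjust the model
    parameters, and return the current recognition loss value.
    """
    steps = 0
    loss = float("inf")
    while steps < max_steps:
        loss = train_step()
        steps += 1
        if loss <= target_loss:   # assumed convergence test
            break
    return loss, steps

def build_second_stage_data(first_stage_data, meets_removal_condition):
    """Drop re-identification samples that meet the preset condition.

    Clothing recognition samples are always kept.
    """
    return [sample for sample in first_stage_data
            if not (sample["kind"] == "reid" and meets_removal_condition(sample))]

def two_stage_training(stage_one_step, stage_two_step, first_target, second_target):
    """Run the first stage to convergence, then fine-tune in the second stage."""
    first_loss, first_steps = train_until_converged(stage_one_step, first_target)
    second_loss, second_steps = train_until_converged(stage_two_step, second_target)
    return {"first_loss": first_loss, "first_steps": first_steps,
            "second_loss": second_loss, "second_steps": second_steps}
```

In practice, `stage_one_step` would draw batches from the first training sample data and `stage_two_step` from the output of `build_second_stage_data`, typically with a smaller learning rate for fine-tuning; that choice is likewise an assumption.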

[0170] In one alternative embodiment, the clothing recognition model uses ResNet18 as the backbone network.

[0171] In one alternative embodiment, the clothing recognition model is trained using a cross-entropy loss function.
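For reference, a cross-entropy loss over a batch of classification scores can be computed as in the following NumPy sketch. The function name `cross_entropy` and the input layout (one row of raw class scores per image, one integer class index per image) are assumptions for illustration; the description does not specify the label space or how the loss is combined across the two types of labels.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels.

    logits: array of shape (batch, num_classes) with raw class scores.
    labels: array of shape (batch,) with the correct class index per row.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in each row.
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With four classes and identical scores the loss equals ln 4, and it decreases as the score of the correct class rises relative to the others.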

[0172] Based on the same inventive concept as the above-described method embodiments, this application also provides an electronic device. This electronic device can be used for clothing wearing standard recognition. In one embodiment, the electronic device can be a server, a terminal device, or another electronic device. In this embodiment, the structure of the electronic device can be as shown in Figure 8, including a memory 101, a communication module 103, and one or more processors 102.

[0173] The memory 101 is used to store computer programs executed by the processor 102. The memory 101 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and programs required to run instant messaging functions, etc.; the data storage area may store various instant messaging information and operation instruction sets, etc.

[0174] Memory 101 may be volatile memory, such as random-access memory (RAM); memory 101 may also be non-volatile memory, such as read-only memory, flash memory, hard disk drive (HDD), or solid-state drive (SSD); or memory 101 may be any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. Memory 101 may be a combination of the above-described memories.

[0175] The processor 102 may include one or more central processing units (CPUs) or digital processing units, etc. The processor 102 is used to implement the aforementioned clothing wearing standard recognition method when calling the computer program stored in the memory 101.

[0176] The communication module 103 is used to communicate with terminal devices and other servers.

[0177] This application does not limit the specific connection medium between the memory 101, communication module 103, and processor 102 described above. In this embodiment of the disclosure, as shown in Figure 8, the memory 101 and the processor 102 are connected via a bus 104, which is drawn as a thick line in Figure 8; the connections between the other components are shown for illustration only and are not intended to be limiting. The bus 104 can be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the bus is represented in Figure 8 by a single thick line, but this does not mean that there is only one bus or one type of bus.

[0178] According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the clothing wearing standard recognition method described in the above embodiments. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof.

[0179] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application.

Claims

1. A clothing wearing standard recognition method, characterized in that the method includes:
obtaining a human body image to be identified, which contains the human body to be processed;
inputting the human body image to be identified into a trained clothing wearing recognition model to obtain a first feature vector of the human body image to be identified; the clothing wearing recognition model is obtained by training a feature extraction model using training sample data, which includes human body re-identification images with a first type of label and clothing recognition images with a second type of label; the feature extraction model is used to obtain the feature vector of the input object, and the trained clothing wearing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition; the first type of label represents human identity; the second type of label represents the type of clothing;
calculating the similarity between the first feature vector and a preset baseline feature vector to obtain a similarity feature value; the baseline feature vector is the feature vector of the baseline clothing, obtained by inputting a baseline clothing image into the trained clothing wearing recognition model;
determining, based on the similarity feature value, the clothing wearing standard evaluation information of the human body to be processed;
wherein the training sample data of the clothing wearing recognition model contains N color categories, where N is an integer greater than 2; both the human body re-identification images and the clothing recognition images also have color labels; the color labels are used to divide the training dataset into multiple color sample data; in the process of obtaining the trained clothing wearing recognition model, batch training images are selected based on the training sample data; selecting the batch training images based on the training sample data includes:
selecting the color sample data one by one and, for each selected color sample data, constructing a category nearest neighbor graph corresponding to the currently selected color sample data using the current clothing wearing recognition model; the category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data;
randomly selecting one color sample data from the multiple color sample data as the target color sample data;
randomly selecting a category from the target color sample data as the baseline category;
selecting, according to the category nearest neighbor graph, a preset first number of categories from the target color sample data in ascending order of vector distance from the baseline category, as the target nearest neighbor categories of the baseline category;
randomly selecting a preset second number of images from the baseline category, and randomly selecting the second number of images from each target nearest neighbor category of the baseline category, to obtain the batch training images.

2. The method of claim 1, wherein the output of the feature extraction model includes global features, upper body local features, and lower body local features; the feature vector is obtained by concatenating and fusing the global features, upper body local features, and lower body local features.

3. The method of claim 1, wherein the number of human body re-identification images is a first quantity, and the number of clothing recognition images is a second quantity; the first quantity is greater than the second quantity.

4. The method of claim 1, wherein the process of obtaining the trained clothing wearing recognition model includes the following steps:
obtaining training sample data; the training sample data includes human body re-identification images with a first type of label and clothing recognition images with a second type of label; the first type of label represents human identity; the second type of label represents the type of clothing;
selecting batch training images based on the training sample data; the batch training images consist of the clothing recognition images and the human body re-identification images;
inputting the batch training images into the clothing wearing recognition model to be trained for training, and determining the recognition loss value of the clothing wearing recognition model to be trained;
determining whether the recognition loss value has converged to the preset target value; if so, ending the training to obtain the trained clothing wearing recognition model; otherwise, adjusting the parameters of the clothing wearing recognition model to be trained according to the determined recognition loss value and training it again.

5. The method of claim 1, wherein the training sample data includes first training sample data and second training sample data; the first training sample data includes the human body re-identification images with the first type of label and the clothing recognition images with the second type of label; the second training sample data is obtained by removing, from the first training sample data, the human body re-identification images that meet preset conditions; the training process for obtaining the trained clothing wearing recognition model includes the following steps:
performing first-stage training on the clothing wearing recognition model to be trained based on the first training sample data, and determining the first recognition loss value of the clothing wearing recognition model to be trained;
determining whether the first recognition loss value converges to the preset first target value; if so, ending the training to obtain the intermediate clothing wearing recognition model; otherwise, adjusting the parameters of the clothing wearing recognition model to be trained according to the determined first recognition loss value and performing the first-stage training again using the first training sample data;
performing second-stage training on the intermediate clothing wearing recognition model based on the second training sample data, and determining the second recognition loss value of the intermediate clothing wearing recognition model; wherein, in the second-stage training, if a clothing recognition image is selected from the second training sample data, the selected clothing recognition image is resampled;
determining whether the second recognition loss value converges to the preset second target value; if so, ending the training to obtain the trained clothing wearing recognition model; otherwise, fine-tuning the parameters of the intermediate clothing wearing recognition model according to the determined second recognition loss value and performing the second-stage training again using the second training sample data.

6. A clothing wearing standard recognition apparatus, characterized by comprising:
an image acquisition module, used to acquire a human body image to be identified, which contains the human body to be processed;
a vector generation module, used to input the human body image to be identified into a trained clothing wearing recognition model to obtain a first feature vector of the human body image to be identified; the clothing wearing recognition model is obtained by training a feature extraction model using training sample data; the training sample data includes human body re-identification images with a first type of label and clothing recognition images with a second type of label; the feature extraction model is used to obtain the feature vector of the input object, and the trained clothing wearing recognition model is obtained when the loss of the feature extraction model converges to a preset target condition; the first type of label represents human identity; the second type of label represents the type of clothing;
a similarity determination module, used to calculate the similarity between the first feature vector and a preset baseline feature vector to obtain a similarity feature value; the baseline feature vector is the feature vector of the baseline clothing, obtained by inputting a baseline clothing image into the trained clothing wearing recognition model;
a benchmarking and identification module, used to determine the clothing wearing standard evaluation information of the human body to be processed based on the similarity feature value;
wherein the training sample data of the clothing wearing recognition model contains N color categories, where N is an integer greater than 2; both the human body re-identification images and the clothing recognition images also have color labels; the color labels are used to divide the training dataset into multiple color sample data; the apparatus also includes a first model training unit; the first model training unit is used to obtain the trained clothing wearing recognition model; the first model training unit is specifically used for:
selecting the color sample data one by one and, for each selected color sample data, constructing a category nearest neighbor graph corresponding to the currently selected color sample data using the current clothing wearing recognition model; the category nearest neighbor graph includes the vector distance between any two different categories in the same color sample data;
randomly selecting one color sample data from the multiple color sample data as the target color sample data;
randomly selecting a category from the target color sample data as the baseline category;
selecting, according to the category nearest neighbor graph, a preset first number of categories from the target color sample data in ascending order of vector distance from the baseline category, as the target nearest neighbor categories of the baseline category;
randomly selecting a preset second number of images from the baseline category, and randomly selecting the second number of images from each target nearest neighbor category of the baseline category, to obtain the batch training images.

7. A computer-readable storage medium having stored therein a computer program, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 5.

8. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program that can run on the processor, and when the computer program is executed by the processor, it implements the method of any one of claims 1 to 5.