Data processing method and device for garment image

A data processing method and device, applied in the field of computer vision, that can solve problems such as the poor processing of key points in garment images

Pending Publication Date: 2019-03-26
BEIJING MOSHANGHUA TECH CO LTD
Cites: 5 · Cited by: 8

AI-Extracted Technical Summary

Problems solved by technology

[0005] The main purpose of this application is to provide a data processing method and devi...

Abstract

The present application discloses a data processing method and apparatus for garment images. The method comprises the following steps: inputting a picture to be detected; detecting the position of the clothing in the picture according to a pre-trained detection model; generating a heat map and obtaining a feature map of key points in the picture; and obtaining the positions of the clothing key points in the picture according to the key point feature map, wherein the detection model is pre-trained once. The present application solves the technical problem of poor processing of key points of garment images. Through a single training, the present application not only completes the detection of the garment position but also calculates the key points of the garment. In addition, detection accuracy is maintained while detection speed is improved.

Application Domain

Image enhancement, Image analysis

Technology Topic

Computer vision, Data processing


Examples

  • Experimental program (1)

Example Embodiment

[0025] In order to enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
[0026] It should be noted that the terms "first" and "second" in the description, claims, and drawings of this application are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or order. It should be understood that data used in this way may be interchanged under appropriate circumstances so that the embodiments of this application described herein can be implemented. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, and may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
[0027] It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the present application will be described in detail with reference to the drawings and in conjunction with embodiments.
[0028] As shown in Figure 1, the method includes the following steps S102 to S108:
[0029] Step S102, input a picture to be detected;
[0030] The input pictures to be detected are usually not high-quality close-ups of clothing. Most of them come from scenes such as street photography, so the proportion of the picture occupied by the target clothing varies. In addition, the picture to be detected may contain interference from images of multiple people.
[0031] It should be noted that in order to prevent the image from being distorted, the images to be detected can be adjusted to a uniform size.
[0032] Step S104, detecting the position of the clothing in the picture according to a pre-trained detection model;
[0033] After the detection model is pre-trained once on designed training data, the pre-trained detection model can be used to detect the input picture to be detected, and the position of the clothing in the picture can be obtained.
[0034] Step S106, generating a heat map and obtaining a feature map of key points in the picture;
[0035] The heat map is generated in the pre-trained detection model, usually after the RPN (Region Proposal Network) in the model generates the candidate regions. The heat map is a heat map of the clothing key points. At the same time, the clothing key point detection network in the pre-trained detection model outputs the feature map of the key points in the picture.
[0036] Step S108: Obtain the positions of key points of clothing in the picture according to the key point feature map.
[0037] Determine the position of the clothing key points in the picture according to the key point feature map, and map the position of the clothing key points back to the original image to get the position of the clothing key points on the original image.
[0038] It should be noted that in the above-mentioned image data processing process, it is first necessary to detect the position of the clothing in the picture, and then determine the position of the key points of the clothing in the picture.
[0039] From the above description, it can be seen that this application has achieved the following technical effects:
[0040] In the embodiments of the present application, the detection model is pre-trained once. A picture to be detected is input, the position of the clothing in the picture is detected according to the pre-trained detection model, a heat map is generated, and the key point feature map of the picture is obtained, so that the positions of the clothing key points in the picture can be obtained according to the key point feature map. This solves the technical problem of poor processing of the key points of clothing images. Through this application, the position of the clothing in the picture is first detected, then the key points of the clothing are determined through the heat map, and the key point results are output.
[0041] According to an embodiment of this application, as a preference in this embodiment, as shown in Figure 2, the method further includes a training phase, and the training phase includes:
[0042] Step S202, configuring training data to pre-train the detection model;
[0043] When the training data is configured, the training data consists of labeled pictures, and the pictures need to be adjusted to a consistent size. At the same time, the corresponding positions of the detection frames and the key points need to be adjusted accordingly.
[0044] Step S204, training a first detection network for detecting the position of clothing;
[0045] The first detection network used to detect the clothing position can be trained using existing data parameters. It is not limited in this application, and those skilled in the art can make selections according to relevant usage scenarios.
[0046] Preferably, a backbone network based on the combination of ResNet101 and FPN can be used.
[0047] Step S206: Train a second detection network for detecting key points of clothing.
[0048] The second detection network used to detect key points of clothing can be trained using existing data parameters.
[0049] Preferably, the same backbone network is shared when the position of the clothing in the picture is detected according to the pre-trained detection model and when the positions of the clothing key points in the picture are obtained according to the key point feature map.
[0050] Preferably, the inputting of the picture to be detected further includes: adjusting the size of the labeled picture, and converting the coordinate system of the key points from the original picture to a coordinate system relative to the detection frame.
[0051] The method further includes: generating a heat map according to the candidate regions generated by the first detection network. Generally, after the RPN (Region Proposal Network) generates the candidate frames, the key point coordinate system is converted to a coordinate system relative to the candidate frame, and then the heat map is generated.
[0052] The method further includes: generating the key points and the number of key points according to the feature map generated by the second detection network. Specifically, the size of the feature map can first be adjusted to the size of the detection frame. Then the channels are adjusted, the Softmax calculation is performed on the transformed feature map, the number and positions of the key points are output, and the position with the largest value is selected as the position of each key point.
[0053] It should be noted that in the above image data training process, the first detection network for detecting the position of the clothing and the second detection network for detecting the key points of the clothing are trained simultaneously. The heat map generated from the candidate regions of the first detection network, and the key points and key point count generated from the feature map of the second detection network, are likewise produced simultaneously during training.
[0054] It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in an order different from the one here.
[0055] According to an embodiment of the present application, there is also provided a data processing device for clothing images for implementing the above method. As shown in Figure 3, the device includes: an input module 10 for inputting a picture to be detected; a clothing position module 20 for detecting the position of the clothing in the picture according to a pre-trained detection model; a heat map module 30 for generating a heat map and obtaining a feature map of key points in the picture; and a clothing key point module 40 for obtaining the positions of the clothing key points in the picture according to the key point feature map, wherein the detection model is pre-trained once.
[0056] The pictures to be detected that are input in the input module 10 of the embodiment of the present application are usually not high-quality close-ups of clothing. Most of them come from scenes such as street photography, so the proportion of the picture occupied by the target clothing varies. In addition, the picture to be detected may contain interference from images of multiple people.
[0057] It should be noted that in order to prevent the image from being distorted, the images to be detected can be adjusted to a uniform size.
[0058] The pre-trained detection model described in the clothing position module 20 of the embodiment of the present application is pre-trained once; that is, after the detection model is pre-trained once on designed training data, it can be used to detect the input picture to be detected, and the position of the clothing in the picture can then be obtained.
[0059] In the heat map module 30 of the embodiment of the present application, the heat map is generated in the pre-trained detection model, generally after the RPN (Region Proposal Network) in the model generates the candidate regions. The heat map is a heat map of the clothing key points. At the same time, the clothing key point detection network in the pre-trained detection model outputs the feature map of the key points in the picture.
[0060] The clothing key point module 40 of the embodiment of the present application determines the position of the clothing key point in the picture according to the key point feature map, and maps the position of the clothing key point back to the original image to obtain the position of the clothing key point on the original image.
[0061] According to an embodiment of this application, as a preference in this embodiment, as shown in Figure 4, the device further includes a training module 50. The training module 50 includes: a training data unit 501 for configuring training data to pre-train the detection model; a first training unit 502 for training a first detection network for detecting the position of clothing; and a second training unit 503 for training a second detection network for detecting the key points of clothing.
[0062] When the training data is configured in the training data unit 501 of the embodiment of the present application, the training data consists of labeled pictures, and the pictures need to be adjusted to a consistent size. At the same time, the corresponding positions of the detection frames and the key points need to be adjusted accordingly.
[0063] The first detection network used for detecting the clothing position in the first training unit 502 of the embodiment of the present application can be trained using existing data parameters. It is not limited in this application, and those skilled in the art can make selections according to relevant usage scenarios.
[0064] Preferably, a backbone network based on the combination of ResNet101 and FPN can be used.
[0065] The second detection network used for detecting key points of clothing in the second training unit 503 of the embodiment of the present application may be trained using existing data parameters.
[0066] Preferably, the same backbone network is shared when the position of the clothing in the picture is detected according to the pre-trained detection model and when the positions of the clothing key points in the picture are obtained according to the key point feature map.
[0067] Preferably, the inputting of the picture to be detected further includes: adjusting the size of the labeled picture, and converting the coordinate system of the key points from the original picture to a coordinate system relative to the detection frame.
[0068] According to an embodiment of this application, as a preference in this embodiment, as shown in Figure 5, the training module includes: a heat map unit 504, configured to generate a heat map according to the candidate regions generated by the first detection network.
[0069] The generation of the heat map in the heat map unit 504 of the embodiment of the present application requires that after the RPN (Region Proposal Network) candidate region network generates the candidate frame, the key point coordinate system is converted to the coordinate system relative to the candidate frame, and then the heat map is generated.
[0070] According to an embodiment of this application, as a preference in this embodiment, as shown in Figure 5, the training module includes a key point unit 505, which is used to generate the key points and the number of key points according to the feature map generated by the second detection network.
[0071] The key point unit 505 of the embodiment of the present application may first adjust the size of the feature map to the size of the detection frame. It then adjusts the channels, performs the Softmax calculation on the transformed feature map, outputs the number and positions of the key points, and selects the position with the largest value as the position of each key point.
[0072] As a preference in the embodiments of this application, this application adopts a strategy based on Mask RCNN and proposes a CRCNN algorithm architecture, where C stands for Clothes and RCNN stands for Mask RCNN or Fast RCNN.
[0073] As a preference in this embodiment, a human body key point detection solution based on deep learning is adopted to solve the technical problem of poor clothing key point detection effect in the background art.
[0074] In this application, a "top-down" strategy for "multi-person" human body key point detection is adopted. The target picture considered is not a high-quality close-up of clothing; for example, it may be a picture taken on the street. Therefore, the proportion of the picture occupied by the target clothing is uncertain and is sometimes very small. For this reason, before performing clothing key point detection, clothing detection must be performed to obtain the position of the clothing in the image. "Single person" human body key point detection and "bottom-up" "multi-person" human body key point detection methods are interfered with by other information in the picture. Specifically, this application adopts a strategy based on Mask RCNN, which combines the two functions of clothing detection and clothing key point detection in one convolutional neural network. Whereas G-RMI and CPN require two models, one for position detection and one for key point detection, this application only needs one model. After testing, the accuracy rate of this application reaches 95.4%.
[0075] The specific implementation steps of CRCNN in this application are as follows:
[0076] 1) In the training phase:
[0077] Step 1: Obtain the training data and adjust all input labeled pictures to a size of 512x512 in order to prevent image distortion: the long side is adjusted to 512, the short side is scaled proportionally, the remaining part is symmetrically padded with 0, and the positions of the detection frames and key points are adjusted synchronously. Finally, the coordinate system of the key points is converted from the original image to a coordinate system relative to the detection frame. A minimal sketch of this preprocessing is shown below.
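The patent gives no code for this step; the following Python sketch only illustrates the described resize-and-pad strategy under stated assumptions (OpenCV-style images, boxes as [x1, y1, x2, y2], key points as (x, y) pairs). All function and variable names are illustrative, not from the patent.

```python
import numpy as np
import cv2  # assumed available for resizing

def letterbox_512(image, boxes, keypoints, size=512):
    """Resize the long side to `size`, scale the short side proportionally,
    pad the remainder symmetrically with zeros, and adjust annotations."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image, (new_w, new_h))

    # symmetric zero padding of the insufficient part
    pad_top = (size - new_h) // 2
    pad_left = (size - new_w) // 2
    canvas = np.zeros((size, size, 3), dtype=image.dtype)
    canvas[pad_top:pad_top + new_h, pad_left:pad_left + new_w] = resized

    # boxes: (N, 4) as [x1, y1, x2, y2]; keypoints: (K, 2) as [x, y]
    offset = np.array([pad_left, pad_top, pad_left, pad_top], dtype=np.float32)
    boxes = boxes * scale + offset
    keypoints = keypoints * scale + offset[:2]
    return canvas, boxes, keypoints

def keypoints_to_box_frame(keypoints, box):
    """Convert key point coordinates from the image frame to a frame
    relative to the top-left corner of the detection frame."""
    return keypoints - np.array([box[0], box[1]], dtype=np.float32)
```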
[0078] Step 2: Clothing detection network. This application uses a backbone network based on the combination of ResNet101 and FPN to generate feature maps P2-P6; candidate frames are generated through the RPN (Region Proposal Network), and the feature maps are then passed to the Fast RCNN part through the ROI Pooling operation to complete the detection. A hedged sketch of this pipeline follows.
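The patent does not specify an implementation; the sketch below merely approximates the described pipeline (ResNet101 + FPN backbone, RPN, ROI pooling, Fast RCNN head) using torchvision's detection utilities. The number of classes and the weight-loading argument are assumptions, and the exact keyword names depend on the torchvision version.

```python
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet101 + FPN backbone producing multi-level feature maps (P2-P6 style).
# Newer torchvision uses `weights=...`; older versions use `pretrained=...`.
backbone = resnet_fpn_backbone('resnet101', weights=None)

# FasterRCNN wires the RPN, ROI pooling, and the Fast RCNN box head on top
# of the backbone; 2 classes = clothing + background (assumption).
detector = FasterRCNN(backbone, num_classes=2)
```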
[0079] In addition, the loss functions in the detection part can use the Softmax Cross Entropy loss function for category training and the Smooth L1 loss function for detection frame training.
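As a hedged illustration of these two loss terms, the sketch below combines them in PyTorch; the function name, tensor shapes, and the unweighted sum are assumptions.

```python
import torch.nn.functional as F

def detection_losses(class_logits, class_labels, box_regression, box_targets):
    # Softmax cross entropy for category training
    cls_loss = F.cross_entropy(class_logits, class_labels)
    # Smooth L1 loss for detection-frame (bounding box) regression
    box_loss = F.smooth_l1_loss(box_regression, box_targets)
    return cls_loss + box_loss
```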
[0080] Step 3: Generate the heat map. The heat map is generated after the candidate frames are produced by the RPN; the IOU between each candidate frame and the ground-truth frame is then calculated. For example, when a candidate frame with an IOU greater than 0.5 is associated with the corresponding ground-truth frame, the key point coordinate system is converted to a coordinate system relative to the candidate frame, and then the heat map is generated. In this application, the CRCNN heat map is generated in a way similar to Mask RCNN. Since the size of the final heat map in this embodiment is 56x56, the key points are mapped onto the 56x56 grid, and the 56x56 heat map is flattened into a one-dimensional vector of length 3136, in which the key point position is set to 1 and the remaining positions are 0. The label of a key point is the index of the 1 among the 3136 numbers; for example, if the 1 is at the 3100th position, the label of the key point is 3100. Note that indexing starts at 0. A small sketch of this label construction follows.
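The following sketch shows one way to build the flattened heat map label described above; the clipping behavior and the row-major flattening order are assumptions not stated in the patent.

```python
import numpy as np

HEATMAP_SIZE = 56  # final heat map is 56x56, flattened to 3136

def keypoint_label(kp_xy, box):
    """Map one key point (x, y) in image coordinates onto the 56x56 grid of
    its candidate frame and return the flattened (0-based) label index."""
    x1, y1, x2, y2 = box
    # coordinates relative to the candidate frame, scaled to the 56x56 grid
    gx = (kp_xy[0] - x1) / max(x2 - x1, 1e-6) * HEATMAP_SIZE
    gy = (kp_xy[1] - y1) / max(y2 - y1, 1e-6) * HEATMAP_SIZE
    gx = int(np.clip(gx, 0, HEATMAP_SIZE - 1))
    gy = int(np.clip(gy, 0, HEATMAP_SIZE - 1))
    # flatten the 56x56 heat map into a 3136-dim one-hot vector; the label is
    # the index of the single 1 (indexing starts at 0, as noted in the text)
    return gy * HEATMAP_SIZE + gx
```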
[0081] Step 4: Clothing key point detection network. This network shares a backbone network with the clothing detection network. Specifically, for the candidate frames generated by the RPN, ROI Pooling is performed to generate a 14x14 feature map, followed by 8 3x3x256x256 convolution operations, each followed by a ReLU activation layer, to obtain a 14x14x256 feature map. A deconvolution operation and a ReLU operation are then connected to obtain a 28x28x256 feature map, and finally another deconvolution operation is applied to obtain a 56x56xNUM_KEYPOINTS feature map, where NUM_KEYPOINTS is the number of key points. A minimal sketch of such a head is given below.
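A minimal PyTorch sketch of a key point head with the layer counts and output sizes described above; the kernel sizes of the deconvolutions and the value of NUM_KEYPOINTS are assumptions, as the patent only fixes the spatial shapes.

```python
import torch.nn as nn

NUM_KEYPOINTS = 24  # assumption; the patent leaves this configurable

class ClothesKeypointHead(nn.Module):
    """Sketch of the described head: 8 3x3 convolutions with 256 channels
    (ReLU after each), then two deconvolutions taking the 14x14 ROI feature
    map to 28x28x256 and finally to 56x56xNUM_KEYPOINTS."""
    def __init__(self, in_channels=256, num_keypoints=NUM_KEYPOINTS):
        super().__init__()
        layers = []
        for i in range(8):
            layers.append(nn.Conv2d(in_channels if i == 0 else 256, 256,
                                    kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
        self.convs = nn.Sequential(*layers)                       # 14x14x256
        self.up1 = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)
        self.relu = nn.ReLU(inplace=True)                         # 28x28x256
        self.up2 = nn.ConvTranspose2d(256, num_keypoints, kernel_size=2, stride=2)

    def forward(self, roi_feats):                                 # (N, 256, 14, 14)
        x = self.convs(roi_feats)
        x = self.relu(self.up1(x))                                # (N, 256, 28, 28)
        return self.up2(x)                                        # (N, K, 56, 56)
```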
[0082] Step 5: Clothing key point detection loss function. For the 56x56xNUM_KEYPOINTS feature map, channel adjustment is first performed to obtain a NUM_KEYPOINTS x 56 x 56 feature map, and the last two dimensions are then merged to obtain a NUM_KEYPOINTS x 3136 vector. Finally, the Softmax operation is performed to obtain the result, and the Softmax Cross Entropy loss is calculated together with the generated heat map labels. A sketch of this loss appears below.
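A hedged PyTorch sketch of the described loss; tensor shapes are assumptions, and `F.cross_entropy` is used because it applies the softmax and cross entropy jointly over the flattened 3136 positions.

```python
import torch.nn.functional as F

def keypoint_loss(keypoint_logits, keypoint_labels):
    """keypoint_logits: (N, K, 56, 56) from the key point head;
    keypoint_labels: (N, K) flattened 0..3135 heat map label indices."""
    n, k, h, w = keypoint_logits.shape
    # merge the spatial dimensions: (N, K, 56, 56) -> (N*K, 3136)
    logits = keypoint_logits.reshape(n * k, h * w)
    labels = keypoint_labels.reshape(n * k)
    # Softmax cross entropy against the one-hot heat map labels
    return F.cross_entropy(logits, labels)
```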
[0083] The training strategy for the above training steps is as follows: the batch size used during training is 5, the number of anchor frames sampled by the RPN is 50, the number of candidate frames selected for training is 35, the initial learning rate is 0.02, training runs for 30 iterations in total, and every 10 iterations the learning rate is reduced to 1/10 of its previous value. A sketch of this schedule is shown below.
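A minimal sketch of this optimizer and learning-rate schedule in PyTorch; the use of SGD and the placeholder model are assumptions, as the patent only states the numeric hyperparameters.

```python
import torch

# Placeholder standing in for the CRCNN-style network described above (assumption).
model = torch.nn.Conv2d(3, 256, kernel_size=3, padding=1)

BATCH_SIZE = 5             # batch size used during training
RPN_SAMPLED_ANCHORS = 50   # anchor frames sampled by the RPN
TRAIN_PROPOSALS = 35       # candidate frames selected for training

optimizer = torch.optim.SGD(model.parameters(), lr=0.02)
# 30 iterations in total; every 10 iterations the learning rate drops to 1/10
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for iteration in range(30):
    # ... one pass over the training data with batch size BATCH_SIZE ...
    scheduler.step()
```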
[0084] 2) In the testing phase:
[0085] Step 1: Input the image to be detected. The image to be detected is again adjusted to a size of 512x512, using the same strategy as during training.
[0086] Step 2: When calculating the heat map, the clothing detection frame is first obtained from the pre-trained CRCNN network. Unlike the training phase, the testing phase uses the output of Fast RCNN, rather than the output of the RPN, as the input to the key point detection network, and the 56x56xNUM_KEYPOINTS feature map is obtained through the key point detection network.
[0087] Step 3: When calculating the key points, the size of the feature map is first adjusted to the size of the detection frame, H x W x NUM_KEYPOINTS; the present invention adopts bilinear interpolation for this. The channels are then adjusted to convert the feature map to NUM_KEYPOINTS x H x W. The Softmax operation is performed over each H x W vector, and the position with the largest value is selected as the position of the key point. Finally, the calculated key points are mapped back to the original image to obtain the result. A sketch of this decoding step follows.
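A hedged sketch of this test-time decoding in PyTorch; it assumes the detection frame is given in the 512x512 input-image coordinates, and it omits the final inverse of the resize-and-pad step that maps coordinates back onto the original picture.

```python
import torch
import torch.nn.functional as F

def decode_keypoints(keypoint_logits, box):
    """keypoint_logits: (K, 56, 56) for one detection; box: [x1, y1, x2, y2]
    in input-image coordinates. Returns (K, 2) key point (x, y) positions."""
    x1, y1, x2, y2 = box
    h, w = int(round(y2 - y1)), int(round(x2 - x1))
    # bilinear interpolation to the detection-frame size: (K, H, W)
    maps = F.interpolate(keypoint_logits[None], size=(h, w),
                         mode='bilinear', align_corners=False)[0]
    k = maps.shape[0]
    # Softmax over the H*W positions, then take the largest value per key point
    probs = F.softmax(maps.reshape(k, -1), dim=1)
    idx = probs.argmax(dim=1)
    ys = (idx // w).float() + y1
    xs = (idx % w).float() + x1
    return torch.stack([xs, ys], dim=1)
```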
[0088] The CRCNN network structure provided in this application is used for the detection of clothing key points, with an accuracy rate of 95.4%, and CRCNN can also output the location of the clothing. In addition, the CRCNN proposed in this application can complete both the detection of the clothing position and the calculation of the clothing key points through a single training, whereas methods such as G-RMI and CPN need to train two models, one for position detection and one for key point detection; this also improves the detection speed.
[0089] Obviously, those skilled in the art should understand that the above-mentioned modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, this application is not limited to any specific combination of hardware and software.
[0090] The foregoing descriptions are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of this application shall be included in the protection scope of this application.
