Target region detection method and device, storage medium and processor
By performing feature extraction and feature augmentation on the image, a second feature tensor is generated to determine the target label, which improves the accuracy of the neural network model in recognizing the target region and solves the problem of low image detection accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA FAW CO LTD
- Filing Date
- 2022-10-20
- Publication Date
- 2026-06-23

Figure CN115578595B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and more specifically, to a method, apparatus, storage medium, and processor for detecting target regions.
Background Technology
[0002] Image recognition technology is currently widely used to detect objects such as faces, pedestrians, obstacles, and vehicles. The distance between the image acquisition device and the object varies, so the size of the area the object occupies in the acquired image also varies. In addition, interference from the external environment, such as lighting and noise, degrades image quality. As a result, the accuracy of image-based object detection is relatively low.
[0003] There is currently no effective solution to the aforementioned technical problem of low image detection accuracy.
Summary of the Invention
[0004] This invention provides a target region detection method, apparatus, storage medium, and processor to at least address the technical problem of low image detection accuracy.
[0005] According to one aspect of the present invention, a target region detection method is provided. The method may include the following steps:
- acquiring an image to be detected;
- extracting features from the image to be detected to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, and the target feature region is the feature region in the first feature tensor that corresponds to the target region in the image to be detected;
- performing feature augmentation on the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region; and
- determining, based on the second feature tensor, a target label corresponding to the target feature region, wherein the target label indicates the position of the target feature region in the second feature tensor.
[0006] Optionally, performing feature augmentation on the target feature region based on a preset feature augmentation method includes performing one of the following operations on the target feature region: single-sample region feature replication, cross-sample region feature fusion, or multi-sample region feature stitching.
[0007] Optionally, when the preset feature augmentation method is single-sample region feature replication, performing feature augmentation on the target feature region to obtain a second feature tensor includes the following:
- translating or rotating the target feature region within the first feature tensor to obtain a new feature region; and
- fusing the new feature region into the first feature tensor to obtain the second feature tensor.

The second feature tensor includes the target feature region and the new feature region.
[0008] Optionally, when the preset feature augmentation method is cross-sample region feature fusion, performing feature augmentation on the target feature region to obtain a second feature tensor includes the following:
- acquiring at least one first reference image;
- performing feature extraction on the at least one first reference image to obtain a first reference feature tensor, which includes at least one first reference feature region; and
- fusing the first feature tensor with the first reference feature tensor to obtain the second feature tensor.

The second feature tensor includes the target feature region and the first reference feature region.
[0009] Optionally, when the preset feature augmentation method is multi-sample region feature stitching, performing feature augmentation on the target feature region to obtain a second feature tensor includes the following:
- acquiring at least two second reference images;
- performing feature extraction on the at least two second reference images to obtain at least two second reference feature tensors, each of which includes at least one second reference feature region;
- resizing the first feature tensor and each second reference feature tensor based on a target size to obtain multiple preprocessed feature tensors; and
- stitching the multiple preprocessed feature tensors together to obtain the second feature tensor.

The second feature tensor includes the target feature region and the second reference feature regions.
[0010] Optionally, determining the target label corresponding to the target feature region based on the second feature tensor includes the following:
- determining transformation parameters of the target feature region according to the feature augmentation method applied to the target feature region in the second feature tensor; and
- transforming the original label of the target feature region according to the transformation parameters to obtain the target label corresponding to the target feature region.
[0011] According to another aspect of the present invention, a target region detection device is also provided. The device comprises the following modules:
- an acquisition module configured to acquire an image to be detected;
- an extraction module configured to extract features from the image to be detected to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, and the target feature region is the feature region in the first feature tensor that corresponds to the target region in the image to be detected;
- an augmentation module configured to perform feature augmentation on the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region; and
- a determination module configured to determine, based on the second feature tensor, a target label corresponding to the target feature region, wherein the target label indicates the position of the target feature region in the second feature tensor.
[0012] According to another aspect of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored program. When the program runs, it controls the device on which the storage medium resides to perform the target region detection method of the present invention.
[0013] According to another aspect of the present invention, a processor is also provided. The processor is configured to run a program, and when run, the program performs the target region detection method of the present invention.
[0014] According to another aspect of the present invention, a vehicle is also provided, which is configured to perform the target region detection method of the present invention.
[0015] In this embodiment of the invention, an image to be detected is acquired, and features are extracted from it to obtain a first feature tensor. The first feature tensor includes a target feature region, which is the feature region in the first feature tensor that corresponds to the target region in the image to be detected. The target feature region is then augmented using a preset feature augmentation method to obtain a second feature tensor that includes at least the target feature region. A target label for the target feature region is determined based on the second feature tensor.

In other words, feature augmentation is applied to the feature region that corresponds to the target region of the image to be detected, which produces an augmented feature tensor. Training a neural network model with the augmented feature tensor can improve the model's ability to focus on the target feature region. This improves image recognition accuracy and solves the technical problem of low image detection accuracy.
Attached Figure Description
[0016] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:
[0017] Figure 1 is a flowchart of a target region detection method according to an embodiment of the present invention;
[0018] Figure 2 is a schematic diagram of the training process of a neural network model according to an embodiment of the present invention;
[0019] Figure 3 is a schematic diagram of single-sample region feature replication according to an embodiment of the present invention;
[0020] Figure 4 is a schematic diagram of cross-sample region feature fusion according to an embodiment of the present invention;
[0021] Figure 5 is a schematic diagram of multi-sample region feature stitching according to an embodiment of the present invention;
[0022] Figure 6 is a schematic diagram of a target region detection device according to an embodiment of the present invention.
Detailed Implementation
[0023] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0024] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0025] Example 1
[0026] According to an embodiment of the present invention, an embodiment of a target region detection method is provided. The steps shown in the flowchart of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although the flowchart shows a logical order, in some cases the steps shown or described may be executed in a different order.
[0027] Figure 1 is a flowchart of a target region detection method according to an embodiment of the present invention. As shown in Figure 1, the method may include the following steps:
[0028] Step S101: Obtain the image to be detected.
[0029] In the technical solution provided by step S101 of the present invention, sample images are input into a neural network model to train it. These sample images can be multiple images acquired by an image acquisition device, or point cloud images acquired by radar. Each sample image includes a target region. Each sample image can in turn be used as the image to be detected, and the remaining sample images serve as reference images. The neural network model is trained this way multiple times, so that it learns to accurately identify the target region included in an image.
[0030] Optionally, during training, one sample image can be randomly selected from the sample images as the image to be detected, and the remaining sample images can be used as reference images. After the model has been trained with this image, another image can be randomly selected from the sample images that have not yet been selected. This image becomes the new image to be detected, and all other sample images serve as reference images. By repeating this procedure, each sample image is used as the image to be detected to train the neural network model.
[0031] For example, suppose there are N sample images. These N sample images can be input into a neural network model all at once. Each of these N sample images includes a target region. After receiving the N sample images, the neural network model can first randomly select one sample image from the N sample images as the image to be detected, and use the remaining N-1 sample images as reference images for training. Similarly, after the neural network model is trained based on the image to be detected, it can be trained again by randomly selecting one image from the remaining N-1 sample images as the image to be detected, and using the other images in the sample images as reference images. According to this method, each of the N sample images can be used as the image to be detected to train the neural network model, thereby enhancing the accuracy of the neural network model in identifying the target region in the image.
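The selection scheme in [0030] and [0031] can be read as a random permutation of the N sample images. In this reading, each image is used once as the image to be detected, and all other images form its reference pool. The Python sketch below illustrates that reading. The helper name `detection_schedule` and the seeded `random.Random` are assumptions for illustration and are not part of the original disclosure.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def detection_schedule(samples: Sequence[T], seed: int = 0) -> Iterator[Tuple[T, List[T]]]:
    """Yield (image_to_detect, reference_images) pairs (hypothetical helper).

    Each sample is chosen exactly once, in random order, as the image to be
    detected; every other sample is placed in its reference pool.
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)  # random selection without replacement
    for idx in order:
        references = [s for j, s in enumerate(samples) if j != idx]
        yield samples[idx], references


if __name__ == "__main__":
    images = ["img0", "img1", "img2", "img3"]  # stand-ins for N = 4 sample images
    for query, refs in detection_schedule(images, seed=42):
        print(query, "->", refs)
```

The indices are shuffled, not the images themselves. This way, duplicate images in the input would still be treated as distinct samples.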
[0032] Step S102: Perform feature extraction on the image to be detected to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, and the target feature region is the feature region in the first feature tensor that corresponds to the target region in the image to be detected.
[0033] In the technical solution provided by step S102 of the present invention, as can be seen from the foregoing description, each sample image includes a target region. Based on this, after determining the image to be detected, feature extraction can be performed on the image to be detected to obtain a first feature tensor. The first feature tensor includes a target feature region, which is the feature region corresponding to the target region in the image to be detected. The position of the target feature region in the first feature tensor corresponds to the position of the target region in the image to be detected.
[0034] Optionally, the image to be detected can be input into the convolutional layer of the neural network model for convolution operation to obtain the first feature tensor corresponding to the image to be detected.
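The sketch below gives a rough illustration of [0033] and [0034]. It makes the following assumptions:
- A single strided convolution followed by ReLU stands in for the model's convolutional layer.
- A target box is mapped from pixel coordinates to feature-tensor coordinates by dividing by the stride.
- The random kernel weights, the 2×2 kernel with stride 2, and the helper names are illustrative only.

A real backbone stacks many layers, and its receptive fields make such a coordinate mapping only approximate.

```python
import numpy as np


def conv2d_relu(image: np.ndarray, kernels: np.ndarray, stride: int) -> np.ndarray:
    """Valid, strided 2-D convolution (cross-correlation, as in CNN libraries) + ReLU.

    image:   (C_in, H, W)
    kernels: (C_out, C_in, k, k)
    returns: (C_out, H_out, W_out), playing the role of a "first feature tensor"
    """
    c_out, _, k, _ = kernels.shape
    _, h, w = image.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out), dtype=np.float32)
    for i in range(h_out):
        for j in range(w_out):
            patch = image[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k) to get one value per output channel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)


def box_to_feature(box, stride):
    """Map an (x1, y1, x2, y2) pixel box to feature-tensor cells (end exclusive)."""
    x1, y1, x2, y2 = box
    # Floor the start and ceil the end so the feature region covers the whole box.
    return (x1 // stride, y1 // stride, -(-x2 // stride), -(-y2 // stride))


rng = np.random.default_rng(0)
image = rng.random((3, 32, 32), dtype=np.float32)              # hypothetical RGB image
kernels = rng.standard_normal((8, 3, 2, 2)).astype(np.float32)  # hypothetical weights
feat = conv2d_relu(image, kernels, stride=2)
print(feat.shape)                                # (8, 16, 16)
print(box_to_feature((8, 4, 20, 13), stride=2))  # (4, 2, 10, 7)
```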
[0035] Step S103: Perform feature augmentation on the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region.
[0036] In the technical solution provided by step S103 of the present invention, as can be seen from the foregoing description, feature extraction is performed on the image to be detected, and the first feature tensor obtained includes the target feature region. Therefore, the target feature region can be augmented based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region.
[0037] Optionally, the preset feature augmentation methods may include single-sample region feature replication, cross-sample region feature fusion, and multi-sample region feature stitching.
[0038] Optionally, when the preset feature augmentation method is single-sample region feature copying, the target feature region can be translated or rotated in the first feature tensor to obtain a new feature region. Then, the new feature region is fused into the first feature tensor to obtain a second feature tensor. The second feature tensor includes the target feature region and the new feature region, where both the target feature region and the new feature region are feature regions corresponding to the target region in the image to be detected.
[0039] Optionally, when the preset feature augmentation method is cross-sample region feature fusion, a first reference image can be arbitrarily selected from the reference images described in step S101 above. Then, feature extraction is performed on the obtained first reference image to obtain a first reference feature tensor. After that, the first feature tensor is fused with the first reference feature tensor to obtain a second feature tensor. It should be noted that when fusing the first feature tensor with the first reference feature tensor, a splicing or stacking method can be used for fusion. The fused second feature tensor includes a target feature region and a first reference feature region. The position of the target feature region in the first feature tensor corresponds to the position of the target region in the image to be detected, and the position of the first reference feature region in the first reference feature tensor corresponds to the target region in the first reference image.
[0040] Optionally, when the preset feature augmentation method is multi-sample region feature stitching, at least two second reference images can be arbitrarily selected from the reference images described in step S101 above. Then, feature extraction is performed on the at least two second reference images to obtain at least two second reference feature tensors. Each second reference feature tensor includes at least one second reference feature region. After obtaining the first feature tensor and at least two second reference feature tensors, the first feature tensor and at least two second reference feature tensors can be stitched together to obtain a second feature tensor. The second feature tensor includes the target feature region and the second reference feature region. It should be noted that before stitching the first feature tensor and at least two second reference feature tensors, the size of the first feature tensor and each second reference feature tensor can be adjusted based on the target size so that the size of the second feature tensor obtained after stitching is the same as the size of the first feature tensor. The target size can be preset and is not specifically limited here.
[0041] Optionally, the neural network model may include multiple branches. Each branch may randomly select one of the above preset feature augmentation methods, and use it to augment the target feature region in the first feature tensor and obtain a second feature tensor. The feature regions contained in the second feature tensor differ depending on which preset feature augmentation method is selected.
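A minimal sketch of the per-branch random choice in [0041] follows. It assumes that the three preset methods are represented by the string tags below and that each branch makes one independent draw. The tag strings and the helper name are hypothetical.

```python
import random
from typing import List

# Hypothetical tags for the three preset feature augmentation methods.
METHODS = (
    "single_sample_replication",
    "cross_sample_fusion",
    "multi_sample_stitching",
)


def pick_branch_methods(num_branches: int, rng: random.Random) -> List[str]:
    """Each branch independently draws one preset augmentation method."""
    return [rng.choice(METHODS) for _ in range(num_branches)]


rng = random.Random(0)
print(pick_branch_methods(4, rng))  # one method tag per branch
```

Because the draws are independent, two branches may pick the same method in a given step. The source does not say whether this is allowed or excluded.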
[0042] Step S104: Based on the second feature tensor, determine the target label corresponding to the target feature region, wherein the target label is used to indicate the position of the target feature region in the second feature tensor.
[0043] In the technical solution provided by step S104 of the present invention, once the second feature tensor is obtained, the transformation parameters of the target feature region can be determined. They are determined according to the feature augmentation method applied to the target feature region in the second feature tensor. The original label of the target feature region is then transformed according to these parameters to obtain the target label for the target feature region. The target label indicates the position of the target feature region in the second feature tensor.
[0044] Optionally, as described above, feature extraction from the image to be detected yields a first feature tensor that includes the target feature region, which corresponds to the target region in the image to be detected. Feature augmentation can then be performed on the target feature region based on a preset feature augmentation method to obtain a second feature tensor.

The preset feature augmentation methods are single-sample region feature replication, cross-sample region feature fusion, and multi-sample region feature stitching. Each method yields different feature regions in the second feature tensor:
- with single-sample region feature replication, the second feature tensor includes the target feature region and the new feature region;
- with cross-sample region feature fusion, it includes the target feature region and the first reference feature region;
- with multi-sample region feature stitching, it includes the target feature region and the second reference feature regions.

Accordingly, the transformation parameters of the target feature region can be determined from the preset feature augmentation method applied to it in the second feature tensor. These parameters indicate which preset feature augmentation method was applied to the target feature region. The original label of the target feature region is then transformed according to the transformation parameters to obtain its target label. The original label indicates the position of the target feature region in the first feature tensor, and the target label indicates its position in the second feature tensor.
[0045] In steps S101 to S104 of this application, an image to be detected is acquired, and features are extracted from it to obtain a first feature tensor. The first feature tensor includes a target feature region, which is the feature region in the first feature tensor that corresponds to the target region in the image to be detected. Feature augmentation is then performed on the target feature region based on a preset feature augmentation method. This produces a second feature tensor that includes at least the target feature region. Finally, a target label corresponding to the target feature region is determined based on the second feature tensor. The target label indicates the position of the target feature region in the second feature tensor.

Training a neural network model on the second feature tensor can improve the model's ability to focus on the target feature region. This improves image recognition accuracy and solves the technical problem of low image detection accuracy.
[0046] The method described in this embodiment will be further described below.
[0047] As an optional embodiment, augmenting the target feature region based on a preset feature augmentation method in step S103 includes performing one of the following operations on the target feature region: single-sample region feature replication, cross-sample region feature fusion, or multi-sample region feature stitching.
[0048] In this embodiment, when the preset feature augmentation method is single-sample region feature copying, the target feature region can be translated or rotated in the first feature tensor to obtain a new feature region. Then, the new feature region is fused into the first feature tensor to obtain a second feature tensor. The second feature tensor includes the target feature region and the new feature region. The new feature region contains the same feature information as the target feature region. That is, both the new feature region and the target feature region correspond to the target region in the image to be detected.
[0049] For example, the target feature region in the first feature tensor can be copied to obtain a new feature region. The new feature region can then be either translated or rotated, and pasted at another position in the first feature tensor. The result is a second feature tensor that includes the target feature region and the new feature region.
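The copy-and-paste operation of [0048] and [0049] can be sketched on a (C, H, W) NumPy array as follows. The sketch makes three assumptions that the source does not state:
- Regions are axis-aligned boxes in feature coordinates, with exclusive ends.
- Rotation is limited to multiples of 90 degrees (`np.rot90`), with the copy anchored at its top-left corner.
- "Fusing" the new region means overwriting the cells it lands on.

```python
import numpy as np


def replicate_region(feat, box, dx, dy, k_rot=0):
    """Single-sample region feature replication (illustrative sketch).

    feat:  first feature tensor, shape (C, H, W)
    box:   target feature region (x1, y1, x2, y2), end exclusive
    dx/dy: translation applied to the copy's top-left corner
    k_rot: number of 90-degree rotations applied to the copy
    Returns (second_feature_tensor, box_of_new_region).
    """
    x1, y1, x2, y2 = box
    patch = feat[:, y1:y2, x1:x2].copy()
    patch = np.rot90(patch, k=k_rot, axes=(1, 2))  # rotate in the spatial plane only
    ph, pw = patch.shape[1:]
    nx1, ny1 = x1 + dx, y1 + dy
    _, h, w = feat.shape
    if nx1 < 0 or ny1 < 0 or nx1 + pw > w or ny1 + ph > h:
        raise ValueError("new feature region falls outside the feature tensor")
    out = feat.copy()  # work on a copy so the input tensor is not modified
    out[:, ny1:ny1 + ph, nx1:nx1 + pw] = patch  # paste (fuse by overwriting)
    return out, (nx1, ny1, nx1 + pw, ny1 + ph)


feat = np.zeros((2, 8, 8), dtype=np.float32)
feat[:, 1:3, 1:4] = 1.0                         # target feature region (1, 1, 4, 3)
second, new_box = replicate_region(feat, (1, 1, 4, 3), dx=3, dy=4)
print(new_box)                                  # (4, 5, 7, 7)
print(int(second.sum()))                        # 24: original 12 cells + copied 12 cells
```

If the translated copy overlaps the original region, overwriting partly occludes it. The source leaves two choices open: picking non-overlapping offsets, or blending instead of overwriting.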
[0050] Optionally, when the preset feature augmentation method is cross-sample region feature fusion, at least one first reference image can first be obtained from the reference images. Feature extraction is then performed on it to obtain a first reference feature tensor that includes at least one first reference feature region. Fusing the first feature tensor with the first reference feature tensor yields a second feature tensor that includes the target feature region and the first reference feature region. The target feature region corresponds to the target region in the image to be detected. The first reference feature region corresponds to the target region in the first reference image.
[0051] For example, at least one first reference image can be randomly selected from multiple reference images. Then, the first reference image is input into the convolutional layer of the neural network model for feature extraction to obtain a first reference feature tensor corresponding to the first reference image. The first reference feature tensor includes a first reference feature region corresponding to the target region in the first reference image. Based on this, the first feature tensor where the target feature region is located can be fused with the first reference feature tensor where the first reference feature region is located to obtain a second feature tensor.
[0052] Optionally, the first feature tensor and the first reference feature tensor can be fused by concatenation or by stacking. With concatenation, the target feature region can be concatenated into the first reference feature tensor containing the first reference feature region to obtain the second feature tensor. Alternatively, the first reference feature tensor can be concatenated into the first feature tensor containing the target feature region.

Take the concatenation of the target feature region into the first reference feature tensor as an example. First, the position coordinates of each point of the target feature region are determined in the three-dimensional coordinate system of the first feature tensor. Next, there is a coordinate transformation between the three-dimensional coordinate systems of the first feature tensor and the first reference feature tensor. Using these coordinates and this transformation, each point of the target feature region is mapped into the coordinate system of the first reference feature tensor, which yields the second feature tensor. The same method can be used to concatenate the first reference feature tensor into the first feature tensor containing the target feature region.
[0053] Optionally, when using a stacking method for fusion, the first feature tensor containing the target feature region can be directly superimposed on the first reference feature tensor containing the first reference feature region, or the first reference feature tensor can be directly superimposed on the first feature tensor to obtain the second feature tensor.
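The sketch below covers the two fusion options in [0052] and [0053]. It makes the following assumptions:
- Both tensors have the same channel count.
- The coordinate transformation between the two tensors' coordinate systems reduces to an integer offset. This holds when both tensors were produced at the same resolution.
- Stacking is interpreted as an element-wise weighted superposition with a hypothetical weight `lam`. The source does not specify the superposition rule.

```python
import numpy as np


def fuse_by_concatenation(feat, box, ref_feat, offset=(0, 0)):
    """Paste the target feature region into the reference feature tensor.

    feat:     first feature tensor (C, H, W) containing the target region `box`
    ref_feat: first reference feature tensor (C, H2, W2)
    offset:   (dx, dy) mapping feat coordinates into ref_feat coordinates
    Returns (second_feature_tensor, target_box_in_second_tensor).
    """
    if feat.shape[0] != ref_feat.shape[0]:
        raise ValueError("channel counts must match")
    x1, y1, x2, y2 = box
    dx, dy = offset
    nx1, ny1, nx2, ny2 = x1 + dx, y1 + dy, x2 + dx, y2 + dy
    _, h2, w2 = ref_feat.shape
    if nx1 < 0 or ny1 < 0 or nx2 > w2 or ny2 > h2:
        raise ValueError("transformed region falls outside the reference tensor")
    out = ref_feat.copy()  # reference regions stay in place unless overwritten
    out[:, ny1:ny2, nx1:nx2] = feat[:, y1:y2, x1:x2]
    return out, (nx1, ny1, nx2, ny2)


def fuse_by_stacking(feat, ref_feat, lam=0.5):
    """Superimpose two equally shaped tensors; every region keeps its coordinates."""
    if feat.shape != ref_feat.shape:
        raise ValueError("stacking requires identical shapes")
    return lam * feat + (1.0 - lam) * ref_feat


feat = np.zeros((2, 6, 6), dtype=np.float32)
feat[:, 0:2, 0:2] = 1.0                          # target feature region (0, 0, 2, 2)
ref = np.full((2, 6, 6), 5.0, dtype=np.float32)  # stand-in reference tensor
fused, fused_box = fuse_by_concatenation(feat, (0, 0, 2, 2), ref, offset=(3, 3))
print(fused_box)                                 # (3, 3, 5, 5)
stacked = fuse_by_stacking(feat, ref)
```

With concatenation, the target label moves by the offset. With stacking, all labels keep their original coordinates, but the two sets of features overlap.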
[0054] Optionally, when the preset feature augmentation method is multi-sample region feature stitching, at least two second reference images can first be acquired. Feature extraction on them produces at least two second reference feature tensors, each including at least one second reference feature region. The first feature tensor and each second reference feature tensor can then be resized based on the target size to obtain multiple preprocessed feature tensors. These are stitched together to obtain a second feature tensor.

The second feature tensor includes the target feature region and the second reference feature regions. The target feature region corresponds to the target region in the image to be detected, and each second reference feature region corresponds to the target region in its second reference image.
[0055] For example, at least two second reference images can be randomly selected from the reference images. Each selected image is input into the convolutional layer of the neural network model for feature extraction. This yields one second reference feature tensor per image, each containing at least one second reference feature region.

The second feature tensor is obtained by stitching the first feature tensor with the at least two second reference feature tensors, and it should have the same size as the first feature tensor. To ensure this, the first feature tensor and each second reference feature tensor can be resized based on the target size before stitching. The resulting preprocessed feature tensors are the first feature tensor and the second reference feature tensors, all at the target size. Stitching them together yields the second feature tensor.
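One way to realize [0054] and [0055] is to resize every tensor to a common tile size and concatenate the tiles side by side. The stitched result then has the same spatial size as the first feature tensor. This sketch makes the following assumptions, none of which the source fixes:
- the tiles are laid out as a horizontal strip;
- resizing uses the nearest neighbour;
- the function returns a per-tile scale and offset, which can serve as transformation parameters for label mapping.

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) tensor to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return feat[:, rows[:, None], cols[None, :]]


def stitch_horizontally(first, references):
    """Stitch the first feature tensor with >= 2 reference tensors into one tensor.

    Every tensor is resized to a tile of height H and width W // n, where (H, W)
    is the size of the first feature tensor and n the number of tensors.
    Returns (second_feature_tensor, per_tile_params).
    """
    if len(references) < 2:
        raise ValueError("need at least two second reference feature tensors")
    tensors = [first, *references]
    _, H, W = first.shape
    n = len(tensors)
    tile_w = W // n                      # target size of each tile: (H, tile_w)
    tiles, params = [], []
    for i, t in enumerate(tensors):
        _, h, w = t.shape
        tiles.append(resize_nearest(t, H, tile_w))
        # Scale and offset that map coordinates in tensor i onto the stitched tensor.
        params.append({"sx": tile_w / w, "sy": H / h, "dx": i * tile_w, "dy": 0})
    return np.concatenate(tiles, axis=2), params


first = np.ones((4, 8, 12), dtype=np.float32)
refs = [np.zeros((4, 6, 6), dtype=np.float32), np.full((4, 10, 10), 2.0, dtype=np.float32)]
second, params = stitch_horizontally(first, refs)
print(second.shape)        # (4, 8, 12)
print(params[1]["dx"])     # 4
```

When W is not divisible by the number of tensors, the strip comes out slightly narrower than the first tensor. Padding the last tile would restore the exact size. A 2×2 mosaic is another common layout when exactly four tensors are stitched.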
[0056] Alternatively, when the first feature tensor is combined with multiple second reference feature tensors, the feature tensors need not be resized. They may instead be combined by the concatenation or stacking methods described above. No specific restriction is imposed here.
[0057] As an optional embodiment, step S104, determining the target label corresponding to the target feature region based on the second feature tensor, includes: determining the transformation parameters of the target feature region based on the feature augmentation method corresponding to the target feature region in the second feature tensor; and performing transformation calculation on the original label corresponding to the target feature region according to the transformation parameters to obtain the target label corresponding to the target feature region.
[0058] In this embodiment, after obtaining the second feature tensor, the transformation parameters of the target feature region can be determined based on the feature augmentation method corresponding to the target feature region in the second feature tensor. These transformation parameters can indicate whether the feature augmentation method used for the target feature region is single-sample region feature replication, cross-sample region feature fusion, or multi-sample region feature concatenation. After determining the transformation parameters, the original label corresponding to the target feature region can be transformed and calculated according to the transformation parameters to obtain the target label corresponding to the target feature region. The original label corresponding to the target feature region indicates the position of the target feature region in the first feature tensor, and the target label indicates the position of the target feature region in the second feature tensor.
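The label update of [0057] and [0058] can be sketched as applying per-region transformation parameters to the original box. Here, the parameters are assumed to be a scale (`sx`, `sy`), an offset (`dx`, `dy`), and a flag for an odd number of 90-degree rotations. Rotation is assumed to keep the region's top-left corner as its anchor.

This hypothetical `TransformParams` record covers translation, 90-degree rotation, pasting at an offset, and resize-and-stitch. Stacking leaves coordinates unchanged, so it corresponds to the default (identity) parameters.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), end exclusive


@dataclass
class TransformParams:
    """Hypothetical transformation parameters recorded for one augmented region."""
    method: str            # which preset augmentation produced the region
    sx: float = 1.0        # horizontal scale (e.g. from resizing before stitching)
    sy: float = 1.0        # vertical scale
    dx: float = 0.0        # horizontal offset (translation, paste position, tile offset)
    dy: float = 0.0        # vertical offset
    swap_wh: bool = False  # True after an odd number of 90-degree rotations


def transform_label(original: Box, p: TransformParams) -> Box:
    """Map the original label (position in the first feature tensor) to the
    target label (position in the second feature tensor)."""
    x1, y1, x2, y2 = original
    w, h = x2 - x1, y2 - y1
    if p.swap_wh:          # rotating by 90 degrees swaps the region's width and height
        w, h = h, w
    nx1 = x1 * p.sx + p.dx
    ny1 = y1 * p.sy + p.dy
    return (nx1, ny1, nx1 + w * p.sx, ny1 + h * p.sy)


# Copy translated by (3, 4): label of the new feature region.
print(transform_label((1, 1, 4, 3), TransformParams("single_sample_replication", dx=3, dy=4)))
# Region in the second tile of a stitched tensor: scaled, then shifted right by 4.
print(transform_label((3, 0, 6, 3), TransformParams("multi_sample_stitching", sx=4 / 6, sy=8 / 6, dx=4)))
```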
[0059] In this embodiment, the target feature regions corresponding to the target regions in the image to be detected are augmented using various preset augmentation methods to obtain a second feature tensor. Since different preset augmentation methods are used, the feature regions included in the second feature tensor are also different. In this way, the neural network model can be trained using the second feature tensor, which can effectively improve the neural network model's ability to focus on and recognize target feature regions in complex backgrounds. This achieves the goal of improving the accuracy and robustness of the neural network model's recognition, realizes the technical effect of improving the accuracy of the neural network model in recognizing images, and solves the technical problem of low image detection accuracy.
[0060] Example 2
[0061] The technical solutions of the embodiments of the present invention will be illustrated below with reference to preferred embodiments.
[0062] With the widespread application of image recognition technology, the accuracy of image recognition has become particularly important. However, due to the varying distances between the image acquisition device and the target object, the size of the area occupied by the target object in the acquired image varies. Furthermore, the image quality of the acquired image is poor due to interference from external environments such as lighting and noise. Therefore, the accuracy of target object detection based on images is relatively low.
[0063] To overcome the above problems, a method for improving the detection capability of neural network models has been proposed in the related art. This method increases the number of target objects in the training samples by augmenting the images, that is, by stitching or fusing them, and then inputs the augmented images into the neural network model for training, thereby enhancing the model's ability to recognize target objects. However, stitching sample images in this way only increases the number of training samples; the training method itself remains unchanged. Therefore, training the neural network model with this method cannot substantially improve the accuracy and robustness of its recognition.
[0064] In contrast, the target region detection method proposed in this invention extracts features from the image to be detected to obtain a corresponding feature tensor, and then augments the target feature regions included in the feature tensor using different feature augmentation methods to obtain multiple target feature tensors. Each target feature tensor includes not only the target feature region but also other feature regions. Further training the neural network model with these target feature tensors improves the model's ability to focus on target feature regions as well as its overall recognition ability. This improves the accuracy and robustness of the model's recognition and solves the technical problem of low image detection accuracy.
[0065] The training process of the neural network model provided in the embodiments of the present invention will be further illustrated below:
[0066] Figure 2 is a schematic diagram illustrating the training process of a neural network model according to an embodiment of the present invention. As shown in Figure 2, sample image 1 can be downsampled multiple times to obtain downsampled sample images 2, 3, and 4, or sample image 4' can be upsampled multiple times to obtain upsampled sample images 3', 2', and 1'. Features are then extracted from each sample image, and feature augmentation is applied to the extracted features to obtain augmented target feature tensors. A loss function is computed on the target feature tensors to obtain loss function values. In addition, the target feature tensors obtained with each feature augmentation method can be recognized to obtain target labels, and a loss function is computed on each target label to obtain a loss function value. Finally, the loss function values computed on the target feature tensors are compared with those computed on the target labels to determine the accuracy of the neural network model's detection results.
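The multi-scale sample images of Figure 2 can be sketched as an image pyramid. This is a minimal illustration under assumptions not stated in the description: single-channel (H, W) NumPy images with even sides, 2x2 average pooling for downsampling, and nearest-neighbour repetition for upsampling. The function names are hypothetical.

```python
import numpy as np


def downsample(image: np.ndarray) -> np.ndarray:
    """Halve an (H, W) image with 2x2 average pooling (H and W must be even)."""
    h, w = image.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(image: np.ndarray) -> np.ndarray:
    """Double an (H, W) image with nearest-neighbour repetition."""
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)


def build_pyramid(image: np.ndarray, levels: int, down: bool = True) -> list:
    """Return `levels` images, starting from `image` and repeatedly down- or upsampling it."""
    step = downsample if down else upsample
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(step(pyramid[-1]))
    return pyramid
```

With `levels=4`, the downsampling path corresponds to sample images 1 to 4, and the upsampling path to sample images 4' to 1'.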
[0067] The following examples illustrate several different feature augmentation methods:
[0068] In this embodiment, the feature augmentation method may include single-sample region feature replication, cross-sample region feature fusion, and multi-sample region feature combination. Figure 3 is a schematic diagram illustrating single-sample region feature replication according to an embodiment of the present invention. As shown in Figure 3, the image to be detected 301 includes a target region 3011. The image to be detected is input into a convolutional neural network (CNN) layer for convolution operations to obtain a feature map 302, which includes a target feature region 3021 corresponding to the target region 3011 in the image to be detected 301. The target feature region 3021 in feature map 302 can then be copied multiple times to obtain newly added feature regions 3022, 3023, 3024, and 3025, which are translated or rotated to other positions in feature map 302 and pasted there. The feature map obtained after pasting these newly added feature regions is called the target feature map; it includes the target feature region 3021 and the newly added feature regions 3022, 3023, 3024, and 3025. Training the neural network model with this target feature map makes the model focus more on the target region of the target object during training, thereby improving the accuracy with which the model recognizes the object.
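Single-sample region feature replication can be sketched as a copy-and-paste on the feature tensor. This is an illustration under stated assumptions, not the patented implementation: the feature map is a NumPy (C, H, W) array, boxes use a hypothetical (x0, y0, x1, y1) integer-cell convention, rotation is limited to 90 degrees via `np.rot90`, and pasting simply overwrites the destination cells.

```python
import numpy as np


def copy_region(feat, box, positions, rotate=False):
    """Paste copies of a region of a (C, H, W) feature tensor at new positions.

    box is (x0, y0, x1, y1) in integer feature-map cells; positions lists the (x, y)
    top-left corners of the newly added regions. Returns the augmented tensor and the
    boxes of the newly added regions in the same (x0, y0, x1, y1) convention.
    """
    x0, y0, x1, y1 = box
    patch = feat[:, y0:y1, x0:x1]
    if rotate:
        patch = np.rot90(patch, k=1, axes=(1, 2))   # 90-degree rotation in the H-W plane
    ph, pw = patch.shape[1:]
    out = feat.copy()                                # the input tensor is left unchanged
    new_boxes = []
    for tx, ty in positions:
        if tx < 0 or ty < 0 or ty + ph > out.shape[1] or tx + pw > out.shape[2]:
            raise ValueError("pasted region would fall outside the feature map")
        out[:, ty:ty + ph, tx:tx + pw] = patch
        new_boxes.append((tx, ty, tx + pw, ty + ph))
    return out, new_boxes
```

The returned boxes give the positions of the newly added regions in the second feature tensor, which is the information the target labels need.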
[0069] Figure 4 is a schematic diagram of cross-sample region feature fusion according to an embodiment of the present invention. As shown in Figure 4, the image to be detected 401 contains a target region 4011, and the reference image 402 contains a target region 4021. The image to be detected 401 and the reference image 402 are both input into a CNN layer and subjected to deconvolution operations to obtain feature map 403 and feature map 404. Feature map 403 contains the target feature region 4031 corresponding to the target region 4011 in the image to be detected 401, and feature map 404 contains the target feature region 4041 corresponding to the target region 4021 in the reference image 402. The target feature region 4041 is copied, and the copied feature region 4042 is pasted into feature map 403 to obtain target feature map 405. Target feature map 405 contains target feature regions 4051 and 4052: target feature region 4051 corresponds to the target region 4011 in the image to be detected 401, and target feature region 4052 corresponds to the target region 4021 in the reference image 402. Training the neural network model with target feature map 405 can improve the model's ability to recognize target feature regions against different backgrounds.
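The copy-and-paste fusion of Figure 4 can be sketched as below. This is a hedged illustration assuming NumPy (C, H, W) feature maps with the same channel count and a hypothetical (x0, y0, x1, y1) box convention. The claims also mention fusion by concatenation or stacking; `stack_channels` shows one plausible reading of that (channel-wise concatenation), which is an assumption rather than something the description specifies.

```python
import numpy as np


def fuse_cross_sample(first, ref, ref_box, dest_xy):
    """Copy ref[:, y0:y1, x0:x1] into a copy of `first` with its top-left corner at dest_xy.

    Returns the fused tensor and the box of the pasted region in the fused tensor.
    """
    if first.shape[0] != ref.shape[0]:
        raise ValueError("channel counts must match")
    x0, y0, x1, y1 = ref_box
    tx, ty = dest_xy
    patch = ref[:, y0:y1, x0:x1]
    h, w = patch.shape[1:]
    if tx < 0 or ty < 0 or ty + h > first.shape[1] or tx + w > first.shape[2]:
        raise ValueError("pasted region would fall outside the first feature map")
    fused = first.copy()
    fused[:, ty:ty + h, tx:tx + w] = patch
    return fused, (tx, ty, tx + w, ty + h)


def stack_channels(first, ref):
    """Alternative fusion (assumed reading of 'stacking'): concatenate along the channel axis."""
    return np.concatenate([first, ref], axis=0)
```

In terms of Figure 4, `first` plays the role of feature map 403, `ref` of feature map 404, and the returned box locates target feature region 4052 in target feature map 405.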
[0070] Figure 5 is a schematic diagram of multi-sample region feature combination according to an embodiment of the present invention. As shown in Figure 5, the image to be detected 501 includes a target region 5011, and the reference images 502, 503, and 504 include target regions 5021, 5031, and 5041, respectively. The image to be detected 501 and the reference images 502, 503, and 504 are all input into a CNN layer and subjected to deconvolution operations to obtain feature maps 505, 506, 507, and 508. Feature map 505 contains the target feature region 5051 corresponding to the target region 5011 in the image to be detected 501; feature map 506 contains the target feature region 5061 corresponding to the target region 5021 in the reference image 502; feature map 507 contains the target feature region 5071 corresponding to the target region 5031 in the reference image 503; and feature map 508 contains the target feature region 5081 corresponding to the target region 5041 in the reference image 504. Feature maps 505, 506, 507, and 508 have the same size. To ensure that the target feature map 5013 obtained after stitching has the same size as feature map 505, each feature map can first be resized to the target size to obtain feature maps 509, 510, 511, and 512, which are then stitched together to obtain the target feature map 5013. The target feature map 5013 contains target feature regions 50131, 50132, 50133, and 50134. Target feature region 50131 is the feature region corresponding to target region 5011 in the image to be detected 501; target feature region 50132 is the feature region corresponding to target region 5021 in the reference image 502; target feature region 50133 is the feature region corresponding to target region 5031 in the reference image 503; and target feature region 50134 is the feature region corresponding to target region 5041 in the reference image 504. Training a neural network model with this target feature map 5013 can improve the model's feature representation of the feature regions and enhance its performance.
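The target labels for the stitched map (regions 50131 to 50134) can be sketched by scaling each source box and shifting it into its quadrant. This assumes, as an illustration only, that all four feature maps share one (H, W) size, each is resized by half, and they are placed top-left, top-right, bottom-left and bottom-right in that order; boxes use a hypothetical (x0, y0, x1, y1) convention and `mosaic_labels` is a hypothetical name.

```python
def mosaic_labels(boxes_per_source, height, width):
    """Map boxes from four same-size (H, W) maps into a 2x2 mosaic of the same (H, W).

    Source k (0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right) is assumed to be
    resized by 0.5 and placed in quadrant k. Boxes are (x0, y0, x1, y1).
    """
    if len(boxes_per_source) != 4:
        raise ValueError("expected boxes for exactly four source maps")
    offsets = [(0.0, 0.0), (width / 2, 0.0), (0.0, height / 2), (width / 2, height / 2)]
    return [
        [(x0 * 0.5 + dx, y0 * 0.5 + dy, x1 * 0.5 + dx, y1 * 0.5 + dy) for x0, y0, x1, y1 in boxes]
        for boxes, (dx, dy) in zip(boxes_per_source, offsets)
    ]
```

Each inner list holds the target labels for one source map; a source map with no target region simply contributes an empty list.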
[0071] In this embodiment of the invention, a variety of feature augmentation methods are proposed. By using several different feature augmentation methods, the target feature regions corresponding to the target objects in the image to be detected are augmented to obtain target feature maps. Then, the target feature maps are used to train the neural network model, which can enhance the accuracy of the neural network model in recognizing objects and improve the neural network model's ability to recognize target features under different backgrounds. This achieves the technical effect of improving the accuracy of the neural network model in recognizing images and solves the technical problem of inaccurate image recognition.
[0072] Example 3
[0073] According to an embodiment of the present invention, a target region detection device is also provided. It should be noted that this target region detection device can be used to perform the target region detection method in Embodiment 1.
[0074] Figure 6 is a schematic diagram of a target region detection device according to an embodiment of the present invention. As shown in Figure 6, the target region detection device 600 may include: an acquisition module 601, an extraction module 602, an augmentation module 603, and a determination module 604.
[0075] The acquisition module 601 is used to acquire the image to be detected;
[0076] Extraction module 602 is used to extract features from the image to be detected to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, and the target feature region is the feature region in the first feature tensor corresponding to the target region in the image to be detected;
[0077] Augmentation module 603 is used to augment the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region;
[0078] The determination module 604 is used to determine the target label corresponding to the target feature region based on the second feature tensor, wherein the target label is used to indicate the position of the target feature region in the second feature tensor.
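The four-module structure of device 600 can be sketched as one class with one method per module. This is a hypothetical skeleton, not the disclosed device: 2x2 average pooling stands in for the convolutional layer, only single-sample replication by translation is implemented (the region is copied to the right edge of the feature map), and the target label returned is the box of the pasted copy.

```python
import numpy as np


class TargetRegionDetectionDevice:
    """Hypothetical sketch of device 600: one method per module 601 to 604."""

    def acquire(self, image):
        # Acquisition module 601: accept the image to be detected as a float (H, W) array.
        return np.asarray(image, dtype=float)

    def extract(self, image):
        # Extraction module 602: stand-in for the CNN layer, 2x2 average pooling -> (1, H/2, W/2).
        h, w = image.shape
        cropped = image[: h - h % 2, : w - w % 2]
        return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))[np.newaxis]

    def augment(self, feat, box):
        # Augmentation module 603: single-sample copy, translating the region to the right edge.
        x0, y0, x1, y1 = box
        dx = feat.shape[2] - x1
        out = feat.copy()
        out[:, y0:y1, x0 + dx:x1 + dx] = feat[:, y0:y1, x0:x1]
        return out, {"method": "single_sample_copy", "dx": dx, "dy": 0}

    def determine(self, box, params):
        # Determination module 604: shift the original label by the recorded offsets.
        x0, y0, x1, y1 = box
        return (x0 + params["dx"], y0 + params["dy"], x1 + params["dx"], y1 + params["dy"])
```

Keeping the recorded offsets in the dictionary returned by `augment` mirrors the role of the transformation parameters: the determination module needs them, not the pixels, to compute the target label.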
[0079] Optionally, augmentation module 603 is used to augment the target feature region by one of the following operations: single-sample region feature copying, cross-sample region feature fusion, and multi-sample region feature splicing.
[0080] Optionally, in response to the preset feature augmentation method being single-sample region feature replication, the augmentation module 603 includes: an adjustment unit for translating or rotating the target feature region in a first feature tensor to obtain a new feature region; and a fusion unit for fusing the new feature region into the first feature tensor to obtain a second feature tensor, wherein the second feature tensor includes the target feature region and the new feature region.
[0081] Optionally, in response to the preset feature augmentation method being cross-sample region feature fusion, the augmentation module 603 includes: a first acquisition unit for acquiring at least one first reference image; a first extraction unit for extracting features from the at least one first reference image to obtain a first reference feature tensor, wherein the first reference feature tensor includes at least one first reference feature region; and a fusion unit for fusing the first feature tensor with the first reference feature tensor to obtain a second feature tensor, wherein the second feature tensor includes a target feature region and a first reference feature region.
[0082] Optionally, in response to the preset feature augmentation method being multi-sample region feature stitching, the augmentation module 603 includes: a second acquisition unit for acquiring at least two second reference images; a second extraction unit for extracting features from the at least two second reference images to obtain at least two second reference feature tensors, wherein each second reference feature tensor includes at least one second reference feature region; an adjustment unit for adjusting the size of the first feature tensor and each second reference feature tensor based on a target size to obtain multiple preprocessed feature tensors; and a stitching unit for stitching the multiple preprocessed feature tensors to obtain a second feature tensor, wherein the second feature tensor includes a target feature region and a second reference feature region.
[0083] Optionally, the determining module 604 includes: a determining unit, used to determine the transformation parameters of the target feature region based on the feature augmentation method corresponding to the target feature region in the second feature tensor; and a calculation unit, used to perform transformation calculation on the original label corresponding to the target feature region according to the transformation parameters to obtain the target label corresponding to the target feature region.
[0084] In this embodiment, the acquisition module is used to acquire the image to be detected; the extraction module is used to extract features from the image to be detected to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, and the target feature region is the feature region in the first feature tensor corresponding to the target region in the image to be detected; the augmentation module is used to augment the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region; and the determination module is used to determine the target label corresponding to the target feature region based on the second feature tensor, wherein the target label is used to indicate the position of the target feature region in the second feature tensor. That is, in this embodiment of the invention, the target feature region corresponding to the target region in the image to be detected can be augmented to obtain a second feature tensor that includes multiple feature regions. Training the neural network model with the second feature tensor can effectively improve the model's ability to focus on and recognize the target feature region, thereby improving the accuracy and robustness of its detection and solving the technical problem of low image detection accuracy.
[0085] Example 4
[0086] According to an embodiment of the present invention, a computer-readable storage medium is also provided. The storage medium includes a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to execute the target region detection method in Embodiment 1.
[0087] Example 5
[0088] According to an embodiment of the present invention, a processor is also provided for running a program, wherein the program executes the target region detection method in embodiment 1 during runtime.
[0089] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0090] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0091] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of modules may be a division by logical function, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be implemented through certain interfaces, and the indirect coupling or communication connections between units or modules may be electrical or take other forms.
[0092] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0093] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0094] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0095] The above are merely preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A target region detection method, characterized in that it comprises: acquiring an image to be detected; inputting the image to be detected into the convolutional layer of a neural network model for feature extraction to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, the target feature region being the feature region in the first feature tensor corresponding to the target region in the image to be detected; augmenting the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region; and determining, based on the feature augmentation method corresponding to the target feature region in the second feature tensor, a target label corresponding to the target feature region, wherein the target label is used to indicate the position of the target feature region in the second feature tensor; wherein augmenting the target feature region based on the preset feature augmentation method includes: randomly performing feature augmentation on the target feature region according to one of the following operations: single-sample region feature copying, cross-sample region feature fusion, and multi-sample region feature splicing; in response to the preset feature augmentation method being cross-sample region feature fusion, performing feature augmentation on the target feature region to obtain the second feature tensor includes: acquiring at least one first reference image; extracting features from the at least one first reference image through the convolutional layer of the neural network model to obtain a first reference feature tensor, wherein the first reference feature tensor includes at least one first reference feature region; and fusing the first feature tensor and the first reference feature tensor by a concatenation or stacking method to obtain the second feature tensor, wherein the second feature tensor includes the target feature region and the first reference feature region; and determining the target label corresponding to the target feature region based on the feature augmentation method corresponding to the target feature region in the second feature tensor includes: determining transformation parameters of the target feature region based on the feature augmentation method corresponding to the target feature region in the second feature tensor, wherein the transformation parameters are used to indicate the preset feature augmentation method corresponding to the target feature region; and performing a transformation calculation on the original label corresponding to the target feature region according to the transformation parameters to obtain the target label corresponding to the target feature region, wherein the original label is used to indicate the position of the target feature region in the first feature tensor.
2. The method according to claim 1, characterized in that, in response to the preset feature augmentation method being single-sample region feature copying, performing feature augmentation on the target feature region to obtain the second feature tensor includes: translating or rotating the target feature region in the first feature tensor to obtain a newly added feature region; and fusing the newly added feature region into the first feature tensor to obtain the second feature tensor, wherein the second feature tensor includes the target feature region and the newly added feature region.
3. The method according to claim 1, characterized in that, in response to the preset feature augmentation method being multi-sample region feature splicing, performing feature augmentation on the target feature region to obtain the second feature tensor includes: acquiring at least two second reference images; performing feature extraction on the at least two second reference images to obtain at least two second reference feature tensors, wherein each second reference feature tensor includes at least one second reference feature region; resizing, based on a target size, the first feature tensor and each of the second reference feature tensors to obtain multiple preprocessed feature tensors; and splicing the multiple preprocessed feature tensors to obtain the second feature tensor, wherein the second feature tensor includes the target feature region and the second reference feature region.
4. A target region detection device, characterized in that it comprises: an acquisition module, configured to acquire an image to be detected; an extraction module, configured to input the image to be detected into the convolutional layer of a neural network model for feature extraction to obtain a first feature tensor, wherein the first feature tensor includes a target feature region, the target feature region being the feature region in the first feature tensor corresponding to the target region in the image to be detected; an augmentation module, configured to augment the target feature region based on a preset feature augmentation method to obtain a second feature tensor, wherein the second feature tensor includes at least the target feature region; and a determination module, configured to determine, based on the feature augmentation method corresponding to the target feature region in the second feature tensor, a target label corresponding to the target feature region, wherein the target label is used to indicate the position of the target feature region in the second feature tensor; wherein the augmentation module is further configured to randomly augment the target feature region by one of the following operations: single-sample region feature copying, cross-sample region feature fusion, and multi-sample region feature splicing; in response to the preset feature augmentation method being cross-sample region feature fusion, the augmentation module is further configured to acquire at least one first reference image, extract features from the at least one first reference image through the convolutional layer of the neural network model to obtain a first reference feature tensor, wherein the first reference feature tensor includes at least one first reference feature region, and fuse the first feature tensor and the first reference feature tensor by a concatenation or stacking method to obtain the second feature tensor, wherein the second feature tensor includes the target feature region and the first reference feature region; and the determination module is further configured to determine transformation parameters of the target feature region based on the feature augmentation method corresponding to the target feature region in the second feature tensor, wherein the transformation parameters are used to indicate the preset feature augmentation method corresponding to the target feature region, and to perform a transformation calculation on the original label corresponding to the target feature region according to the transformation parameters to obtain the target label corresponding to the target feature region, wherein the original label is used to indicate the position of the target feature region in the first feature tensor.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to perform the method according to any one of claims 1 to 3.
6. A processor, characterized in that the processor is configured to run a program, wherein the program, when executed by the processor, performs the method according to any one of claims 1 to 3.
7. A vehicle, characterized in that the vehicle is configured to perform the method according to any one of claims 1 to 3.