Multi-target matching method and device, electronic equipment and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- UISEE TECH BEIJING LTD
- Filing Date
- 2022-12-07
- Publication Date
- 2026-06-16
Abstract drawing: Figure CN115830348B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of target tracking technology, and in particular to a multi-target matching method, apparatus, electronic device, and storage medium.
Background Technology
[0002] Target matching is a crucial part of multi-target tracking, and accurate multi-target matching is a prerequisite for achieving multi-target tracking.
[0003] Currently, target matching can be performed using deep learning-based feature re-identification metrics, distance metrics, and intersection-over-union (IoU) metrics.
[0004] However, the target matching methods mentioned above all treat each target in isolation. In multi-target scenarios, dense targets and their actual movements can cause matching errors, which in turn lead to errors in subsequent target tracking.
Summary of the Invention
[0005] To address or at least partially address the aforementioned technical problems, embodiments of this disclosure provide a multi-target matching method, apparatus, electronic device, and storage medium that improve the accuracy and robustness of multi-target matching by considering the interactions and positional relationships between multiple targets.
[0006] In a first aspect, embodiments of this disclosure provide a multi-target matching method, the method comprising:
[0007] Determining multiple detection target boxes based on the image to be matched and detected;
[0008] For each detection target box, constructing a target field corresponding to the current target box according to the size of the current target box and a preset target field construction method, where the current target box is any one of the multiple detection target boxes;
[0009] Determining, based on the image to be matched and a detection target field image, the detection target field sub-image corresponding to each target field, where the detection target field image includes each target field;
[0010] Determining a similarity matrix based on each detection target field sub-image and each predicted target field sub-image corresponding to the image to be matched, and determining, based on the similarity matrix, the association relationship between each detection target box and each predicted target box in the image to be matched;
[0011] Determining, based on the association relationship, combinations of detection target boxes and predicted target boxes that represent the same target.
[0012] In a second aspect, embodiments of this disclosure also provide a multi-target matching device, the device comprising:
[0013] The target bounding box determination module is used to determine multiple detection target bounding boxes based on the image to be matched and detected.
[0014] The target field construction module is used to construct a target field corresponding to each detection target box according to the size of the current target box and a preset target field construction method. The current target box is any one of the plurality of detection target boxes.
[0015] The target field sub-image cropping module is used to determine the detection target field sub-image corresponding to each target field based on the image to be matched and the detection target field image; wherein, the detection target field image includes each of the target fields;
[0016] The target bounding box association module is used to determine a similarity matrix based on each of the detection target field sub-images and each of the predicted target field sub-images corresponding to the image to be matched, and to determine, based on the similarity matrix, the association relationship between each of the detection target bounding boxes and each of the predicted target bounding boxes in the image to be matched.
[0017] The target bounding box combination module is used to determine a combination of detected target bounding boxes and predicted target bounding boxes representing the same target based on the association relationship.
[0018] In a third aspect, embodiments of this disclosure also provide an electronic device, comprising: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-target matching method described above.
[0019] In a fourth aspect, embodiments of this disclosure also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the multi-target matching method described above.
[0020] This disclosure provides a multi-target matching method that detects multiple target boxes in an image to be matched, constructs a target field for each target box to obtain a target field image, and then crops a target field sub-image from the target field image for each target box. By solving the similarity matrix between the target field sub-images and the predicted target field sub-images, the association relationship between each target box and each predicted target box is obtained, which identifies the boxes that represent the same target and thus completes target matching. Because the method takes into account the interactions and positional relationships between multiple targets, it improves the accuracy and robustness of multi-target matching.
Description of the Drawings
[0021] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and components and elements are not necessarily drawn to scale.
[0022] Figure 1 is a schematic diagram of the target bounding boxes at the current time and the target bounding boxes at the predicted time;
[0023] Figure 2 is a flowchart of a multi-target matching method according to an embodiment of this disclosure;
[0024] Figure 3 is a schematic diagram of a target field in an embodiment of this disclosure;
[0025] Figure 4 is a flowchart of a target field construction method according to an embodiment of this disclosure;
[0026] Figure 5 is a schematic diagram of the first distance and the second distance in an embodiment of this disclosure;
[0027] Figure 6 is a schematic diagram of the detection target bounding boxes and the detection target field image in an embodiment of this disclosure;
[0028] Figure 7 is a schematic diagram of cropping a target field sub-image in an embodiment of this disclosure;
[0029] Figure 8 is a flowchart of a method for determining target field sub-images according to an embodiment of this disclosure;
[0030] Figure 9 is a schematic structural diagram of a multi-target matching device according to an embodiment of this disclosure;
[0031] Figure 10 is a schematic structural diagram of an electronic device according to an embodiment of this disclosure.
Detailed Description
[0032] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be understood more thoroughly and completely. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0033] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.
[0034] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
[0035] Figure 1 illustrates the target bounding boxes at the current time and at the predicted time. As shown in Figure 1, in the same coordinate system, the solid-line target boxes represent the target boxes identified at the previous time step, and the dashed-line target boxes represent the target boxes identified at the current time step. If target matching is performed according to the maximum IoU (Intersection over Union) principle or the nearest-distance principle, every match may be wrong because of the targets' position shifts.
[0036] To address the aforementioned issues, this disclosure provides a multi-target matching method that considers the interactions and positional relationships between multiple targets and uses these as important matching features to improve the accuracy of target matching.
[0037] Figure 2 is a flowchart of a multi-target matching method according to an embodiment of this disclosure. The method can be executed by a multi-target matching apparatus, which can be implemented in software and/or hardware and configured in an electronic device. As shown in Figure 2, the method may specifically include the following steps:
[0038] S110. Based on the image to be matched and detected, determine multiple detection target boxes.
[0039] The image to be matched and detected can be the image for which target tracking and matching are performed. The detected target bounding boxes can be multiple target bounding boxes obtained after target detection is performed on the image to be matched and detected.
[0040] Specifically, images that have been actually collected but have not been tracked and detected can be used as images to be matched and detected. Target detection is then performed on these images to determine multiple target bounding boxes on them.
[0041] For example, the detection bounding box can be a two-dimensional bounding box to represent the detection target, and can be represented by cls, xc, yc, w, h, and θ. Here, cls represents the category of the detection target, xc and yc are the x and y coordinates of the center of the detection bounding box, respectively, w and h are the length and width of the detection bounding box, respectively, and θ represents the tilt of the detection bounding box.
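For illustration only, this box representation can be held in a small Python data class. The class name `DetectionBox`, the `corners` helper, the use of radians for θ, and its positive rotation direction are assumptions, not details given in the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectionBox:
    """Hypothetical container for one detection box (cls, xc, yc, w, h, theta)."""
    cls: str        # category of the detected target
    xc: float       # x coordinate of the box center
    yc: float       # y coordinate of the box center
    w: float        # extent along the box's own x-axis (the length, per [0041])
    h: float        # extent along the box's own y-axis (the width)
    theta: float = 0.0  # tilt in radians; positive direction is an assumption

    def corners(self):
        """Return the four corners in image coordinates."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        local = [(-self.w / 2, -self.h / 2), (self.w / 2, -self.h / 2),
                 (self.w / 2, self.h / 2), (-self.w / 2, self.h / 2)]
        # Rotate each local offset by theta, then shift it to the box center.
        return [(self.xc + u * c - v * s, self.yc + u * s + v * c)
                for u, v in local]
```

For an untilted box, `DetectionBox("car", 10.0, 20.0, 4.0, 2.0).corners()` returns the axis-aligned corners (8, 19), (12, 19), (12, 21), and (8, 21).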
[0042] S120. For each detected target bounding box, construct a target field corresponding to the current target bounding box based on the size of the current target bounding box and the preset target field construction method.
[0043] The current target bounding box can be any one of multiple detected target bounding boxes. The dimensions of the current target bounding box can include its length and width. The target field construction method can be a method used to construct the target field for the current target bounding box, such as constructing it according to a preset function or a preset model. The target field can be the result of the preset target field construction method, composed of the target field intensity of each pixel, with the target field intensity gradually decreasing from the center of the current target bounding box towards the boundary. The target field describes the strength of the current target's influence on other targets, and can include boundary scale and interaction information between multiple targets. The tilt of the target field is the same as the tilt of the corresponding target bounding box.
[0044] Specifically, the same method can be used to construct the target field for every detection target bounding box. The following description therefore takes one detection target bounding box as the current target bounding box; the target fields of the other detection target bounding boxes are constructed in the same way by treating each of them in turn as the current target bounding box. The target field intensity of each pixel within the current target bounding box is calculated from the box size and the preset target field construction method, and a target field of the same size as the current target bounding box is then constructed from these intensities. An example target field is shown in Figure 3. As can be seen from Figure 3, the target field intensity gradually decreases from the center of the current target box towards the boundary. The decrease can take various forms, such as uniform attenuation, inverse-proportional attenuation, or exponential attenuation.
[0045] Figure 4 is a flowchart of a target field construction method according to an embodiment of this disclosure. Building on the above example, the target field corresponding to the current target box can be constructed from the size of the current target box and the preset target field construction method through the following steps S1201 to S1203.
[0046] S1201. For a pixel within the current target box, determine the first distance from the pixel to the long boundary of the current target box and the second distance from the pixel to the wide boundary of the current target box.
[0047] Here, the long boundary of the current target box can be understood as one edge of the current target box, and the wide boundary as an adjacent edge perpendicular to the long boundary. The first distance is the distance from a pixel to the long boundary of the current target box, and the second distance is the distance from the pixel to the wide boundary of the current target box.
[0048] Specifically, for each pixel within the current target bounding box, the first and second distances corresponding to that pixel can be determined using the same method. Therefore, taking any one pixel as an example, the first and second distances for other pixels in the current target bounding box can be determined in the same way. The distance from the pixel to the long boundary of the current target bounding box is used as the first distance, and the distance from the pixel to the wide boundary of the current target bounding box is used as the second distance.
[0049] For example, Figure 5 illustrates the first and second distances, where xc and yc are the horizontal and vertical coordinates of the center of the current target box, w and h are the length and width of the current target box, θ represents the tilt of the current target box, t is the first distance corresponding to pixel A, and l is the second distance corresponding to pixel A.
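The geometry of S1201 can be sketched as a small helper that undoes the box tilt and measures the two distances in the box's own frame. The helper name `boundary_distances` is hypothetical, and so are its conventions: judging only from the symbol names in Figure 5, it assumes t is measured to the box's top edge and l to its left edge, with w the length along the box's own x-axis, so that the top edge is a long boundary. The disclosure does not confirm these choices.

```python
import math


def boundary_distances(px, py, xc, yc, w, h, theta):
    """Return (t, l) for pixel (px, py) inside a box tilted by theta (radians).

    Assumed conventions: t = distance to the top (long) edge,
    l = distance to the left (wide) edge, both in the box's local frame.
    """
    dx, dy = px - xc, py - yc
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the offset from the center into the box's local frame.
    u = dx * c + dy * s
    v = -dx * s + dy * c
    if abs(u) > w / 2 or abs(v) > h / 2:
        raise ValueError("pixel lies outside the box")
    t = v + h / 2   # first distance: to the long boundary (top edge)
    l = u + w / 2   # second distance: to the wide boundary (left edge)
    return t, l
```

At the center of an untilted 4 x 2 box, the helper gives t = 1 and l = 2, that is, half the width and half the length.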
[0050] S1202. Determine the target field intensity corresponding to the pixel based on the first distance, the second distance, and the current target box size.
[0051] Specifically, for each pixel, the first distance, the second distance, and the size of the current target box corresponding to that pixel are substituted into the pre-determined target field intensity determination formula to obtain the target field intensity corresponding to that pixel.
[0052] It should be noted that the formula for determining the target field intensity can be a formula constructed based on multiple experimental verifications and calculations.
[0053] Based on the above example, the target field intensity corresponding to the pixel is determined according to the first distance, the second distance, and the size of the current target box. Specifically, it can be:
[0054] The target field intensity corresponding to a pixel is determined using the following formula:
[0055]
[0056] Where f(x,y) represents the target field intensity of the pixel at coordinates (x,y), l represents the first distance, t represents the second distance, w represents the width of the current target box, and h represents the length of the current target box.
[0057] S1203. Construct a target field corresponding to the current target box based on the target field intensity corresponding to each pixel.
[0058] Specifically, by arranging the target field intensity corresponding to each pixel according to the position of each pixel, the target field corresponding to the current target box can be obtained.
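The intensity formula of [0055] is not reproduced in this text, so the sketch below uses a hypothetical stand-in for f(x, y): a separable parabola that equals 1 at the box center and 0 on the boundary. This satisfies the stated property that intensity decreases from the center towards the boundary, but it is not the patent's formula. The sketch builds the field for an axis-aligned box by sampling pixel centers. Placing a tilted field into the image by rotating it through θ is not shown. Function names are illustrative.

```python
import numpy as np


def hypothetical_intensity(t, l, w, h):
    """Stand-in for f(x, y): 1 at the box center, 0 on every boundary."""
    return 16.0 * l * (w - l) * t * (h - t) / (w * w * h * h)


def build_target_field(w, h):
    """Arrange per-pixel intensities by position for a w x h box (S1203)."""
    ls = np.arange(w) + 0.5   # distance of each column center to the left edge
    ts = np.arange(h) + 0.5   # distance of each row center to the top edge
    T, L = np.meshgrid(ts, ls, indexing="ij")  # both of shape (h, w)
    return hypothetical_intensity(T, L, w, h)
```

For a 5 x 3 box, the center pixel has intensity 1.0, and values fall off towards the corners (0.2 at the top-left pixel).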
[0059] S130. Based on the image to be matched and detected and the detection target field image, determine the detection target field sub-image corresponding to each target field.
[0060] The detection target field image includes the target fields of all detection target boxes. A detection target field sub-image is the sub-image cropped from the detection target field image for a given target field.
[0061] It can be understood that a detection target field sub-image may contain all or part of the current target field, and may also contain all or part of other target fields.
[0062] Specifically, the size of the detection target field sub-image for each target field can be determined according to a predetermined sub-image size determination method, based on the dimensions of each detection bounding box in the image to be matched, the distances between the detection bounding boxes, and the dimensions of the image to be matched. Then, for each target field, the corresponding detection target field sub-image is cropped from the detection target field image according to the position of the target field and the determined sub-image size.
[0063] For example, the detection target bounding boxes and the detection target field image are shown schematically in Figure 6. Figure 7 illustrates the cropping of a target field sub-image, where wsub represents the width of the sub-image and hsub represents its length.
[0064] Figure 8 is a flowchart of a method for determining target field sub-images according to an embodiment of this disclosure. Building on the above example, the detection target field sub-image corresponding to each target field can be determined from the image to be matched and the target field image through the following steps S1301 to S1302.
[0065] S1301. Determine the sub-image size based on the distance between each detection target box in the image to be matched and detected.
[0066] The sub-image size can be the size of the subsequent detection target field sub-image.
[0067] It should be noted that, in order to accurately consider the distribution characteristics of each target in the image to be matched and detected, the sub-image size is a crucial parameter affecting subsequent matching results. Existing technologies typically select a fixed sub-image size. However, using a fixed sub-image size suffers from poor adaptation to the image to be matched and to each detected target bounding box, leading to poor accuracy in subsequent multi-target matching and tracking. Therefore, a method is proposed to adaptively determine the sub-image size based on the distance between the detected target bounding boxes in the image to be matched and detected.
[0068] Specifically, the distance between the centers of every pair of detection bounding boxes in the image to be matched can be calculated, for example as the Euclidean distance. The distribution of the detection bounding boxes is then determined from these pairwise center distances, and the sub-image size is determined from that distribution.
[0069] Based on the above example, the sub-image size can be determined according to the distance between the detection bounding boxes in the image to be matched and detected, as follows:
[0070] For each detected bounding box, determine the nearest bounding box corresponding to the current bounding box, and determine the distance between the current bounding box and the nearest bounding box as the candidate distance corresponding to the current bounding box; determine the maximum value among the candidate distances corresponding to each detected bounding box as the statistical distance; determine the sub-image size based on the statistical distance and empirical size.
[0071] The nearest bounding box can be the detected bounding box that is closest to the current bounding box. The candidate distance can be the distance between the center of the current bounding box and the center of the nearest bounding box. The statistical distance can be the maximum value among the candidate distances. The empirical size can be pre-defined size information.
[0072] Specifically, each detected bounding box can be used as the current bounding box to calculate candidate distances. Taking one detected bounding box as an example, this box is designated as the current bounding box, and the distances between its center and the centers of all other detected bounding boxes are calculated. The bounding box with the minimum distance among these is selected as the nearest bounding box, and this minimum distance is used as a candidate distance. Furthermore, candidate distances can be obtained for each detected bounding box, and the maximum value among these candidate distances is taken as the statistical distance. Based on a predetermined empirical size and sub-image size determination method, the statistical distance is processed to obtain the sub-image size.
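The candidate-distance and statistical-distance computation described above can be sketched as follows, assuming the Euclidean distance between box centers. The function name is illustrative. Raising an error when fewer than two boxes are detected is also an assumption, because the disclosure does not say how that case is handled.

```python
import math


def statistical_distance(centers):
    """Maximum over all boxes of the distance to each box's nearest neighbour.

    centers: list of (xc, yc) detection box centers.
    """
    if len(centers) < 2:
        raise ValueError("need at least two detection boxes")
    candidates = []
    for i, (xi, yi) in enumerate(centers):
        # Candidate distance: center distance to the nearest other box.
        nearest = min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(centers) if j != i)
        candidates.append(nearest)
    return max(candidates)  # statistical distance r
```

With centers (0, 0), (3, 4), and (10, 0), the candidate distances are 5, 5, and √65 ≈ 8.06, so r = √65.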
[0073] Based on the above example, the empirical size includes a maximum and a minimum length and a maximum and a minimum width. The sub-image size can then be determined as follows:
[0074] The larger of the statistical distance and the minimum width is taken as the first process value, and the smaller of the first process value and the maximum width is taken as the width of the sub-image size; the larger of the statistical distance and the minimum length is taken as the second process value, and the smaller of the second process value and the maximum length is taken as the length of the sub-image size.
[0075] Specifically, the sub-image size can be determined using the following formulas:
[0076] wsub = min(wmax, max(r, wmin))
[0077] hsub = min(hmax, max(r, hmin))
[0078] Where wsub represents the width of the sub-image size, hsub represents the length of the sub-image size, wmin and wmax represent the minimum and maximum width of the empirical size, hmin and hmax represent the minimum and maximum length of the empirical size, the empirical size is set based on human experience, and r represents the statistical distance.
[0079] It is understandable that max(r, wmin) represents the first process value, and max(r, hmin) represents the second process value.
[0080] The empirical size is introduced to keep the sub-image size from becoming too large or too small. With the above formulas, whatever the value of the statistical distance r, wsub always lies within the interval [wmin, wmax]: if r < wmin, then wsub = wmin; if r > wmax, then wsub = wmax. Likewise, hsub always lies within the interval [hmin, hmax]: if r < hmin, then hsub = hmin; if r > hmax, then hsub = hmax.
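The clamping in [0076] and [0077] can be checked with a short sketch. The function name is illustrative, and the empirical bounds in the usage line (64 to 256 for width, 32 to 128 for length) are hypothetical values chosen only for demonstration.

```python
def sub_image_size(r, w_min, w_max, h_min, h_max):
    """Clamp the statistical distance r into the empirical size ranges."""
    first = max(r, w_min)        # first process value
    w_sub = min(w_max, first)    # width of the sub-image size
    second = max(r, h_min)       # second process value
    h_sub = min(h_max, second)   # length of the sub-image size
    return w_sub, h_sub


print(sub_image_size(20, 64, 256, 32, 128))  # (64, 32)
```

Because the crop in S1302 works in whole pixels, an implementation would probably round the result to integers. That rounding step is not described in the disclosure.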
[0081] S1302. Based on the sub-image size and the target field image, determine the target field sub-image corresponding to each target field.
[0082] Specifically, for each target field, a region centered at the center of the target field, with the width wsub and length hsub of the sub-image size, is cropped from the detection target field image and used as the detection target field sub-image.
[0083] For example, Figure 7 illustrates the cropping of a target field sub-image, where wsub represents the width and hsub represents the length. For each target field, the region of width wsub and length hsub centered at the center of the target field is taken as its target field sub-image. It should be noted that if the sub-image extends beyond the image to be matched during cropping, the excess portion is padded with a preset value, such as 0.
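Below is a sketch of the centered crop with constant padding. It assumes the detection target field image is a single-channel NumPy array indexed as [row, column], that (cx, cy) are the pixel coordinates of the target field center, and that w_sub and h_sub are already integers. The function name and the rounding convention are illustrative.

```python
import numpy as np


def crop_centered(field_image, cx, cy, w_sub, h_sub, pad_value=0.0):
    """Crop an h_sub x w_sub window centered at (cx, cy); pad outside the image."""
    H, W = field_image.shape
    out = np.full((h_sub, w_sub), pad_value, dtype=field_image.dtype)
    x0 = int(round(cx - w_sub / 2))   # left column of the window
    y0 = int(round(cy - h_sub / 2))   # top row of the window
    # Part of the window that overlaps the image.
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x0 + w_sub, W), min(y0 + h_sub, H)
    if sx0 < sx1 and sy0 < sy1:
        out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = field_image[sy0:sy1, sx0:sx1]
    return out
```

When a target field sits at the image corner, only one quadrant of its sub-image is copied from the field image. The remaining three quadrants keep the padding value.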
[0084] S140. Based on each detected target field sub-image and each predicted target field sub-image corresponding to the image to be matched, determine the similarity matrix, and based on the similarity matrix, determine the association between each detected target box and each predicted target box in the image to be matched.
[0085] It should be noted that the predicted target field sub-images are constructed in a similar way to the detection target field sub-images, specifically:
[0086] The target locations are predicted from the target bounding boxes in a preset number of historical detection images. For each predicted target bounding box, a target field corresponding to the current predicted bounding box is constructed according to the size of the current predicted bounding box and the preset target field construction method. A predicted target field image is constructed according to the target fields corresponding to each predicted target bounding box. Based on the detection image to be matched and the predicted target field image, a predicted target field sub-image corresponding to each predicted target bounding box is determined.
[0087] The historical detection images can be images acquired before the image to be matched and detected. These historical detection images carry labeled target tracking results, and the preset number can be one or more frames. The similarity matrix can be a matrix composed of the similarities between each detection target field sub-image and each predicted target field sub-image. The association relationship can be an association between a detection target bounding box and a predicted target bounding box.
[0088] For example, if the preset number is one frame, the position of each historical target box after movement can be predicted from the predetermined moving speed and direction of the corresponding target in the historical detection image, combined with the acquisition time interval between that historical detection image and the image to be matched and detected; this gives the predicted target box. If the preset number is two or more frames, the moving speed and direction of each target can first be estimated from the positions of each tracked target in the preset number of historical detection images, and the predicted target boxes are then obtained in the same way.
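The prediction step can be sketched under a constant-velocity assumption. Velocity is estimated from the last two observations of a tracked target, or is supplied directly in the one-frame case, and is then extrapolated over the acquisition interval. The disclosure does not specify the motion model, so constant velocity, an unchanged box size, and the helper names below are all assumptions.

```python
def estimate_velocity(history):
    """Velocity (vx, vy) from the last two (timestamp, xc, yc) observations."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    (t0, x0, y0), (t1, x1, y1) = history[-2], history[-1]
    dt = t1 - t0
    return (x1 - x0) / dt, (y1 - y0) / dt


def predict_center(xc, yc, vx, vy, dt):
    """Extrapolate a box center over the acquisition interval dt."""
    return xc + vx * dt, yc + vy * dt
```

The predicted center, combined with the historical box's size and tilt, gives the predicted target box. Its target field and predicted target field sub-image are then built exactly like the detection-side ones.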
[0089] Specifically, for the image to be matched and detected, the similarity between each detection target field sub-image and each predicted target field sub-image can be calculated, and a similarity matrix constructed from these similarities. Solving the similarity matrix then determines the association between each detection target box and each predicted target box, for example with the Hungarian algorithm.
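One way to solve the association is sketched below with SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as the Hungarian algorithm. `maximize=True` (available since SciPy 1.4) maximizes total similarity. The function name and the `min_similarity` gate, which leaves weak pairs unmatched, are hypothetical additions not described in the disclosure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(similarity, min_similarity=0.5):
    """Return (detection_index, prediction_index) pairs maximizing total similarity.

    similarity[i, j]: similarity between detection sub-image i and prediction j.
    Pairs scoring below the hypothetical min_similarity gate are dropped.
    """
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    return [(int(i), int(j)) for i, j in zip(rows, cols)
            if similarity[i, j] >= min_similarity]
```

For the matrix [[0.9, 0.8], [0.85, 0.1]], greedily taking the single best pair (0, 0) would force detection 1 onto prediction 1 with similarity 0.1. The assignment solver instead returns (0, 1) and (1, 0), with total similarity 1.65.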
[0090] S150. Based on the association relationship, determine the combinations of detection target boxes and predicted target boxes that represent the same target.
[0091] Specifically, based on the determined association relationship, a detection target box and a predicted target box that are associated can be regarded as representing the same target; that is, the target corresponding to the predicted target box is taken as the target of the associated detection target box, which completes target matching and supports target tracking.
[0092] Based on the above example, the similarity matrix can be determined as follows:
[0093] For each detection target field sub-image, the similarity between the current detection target field sub-image and each predicted target field sub-image is determined; a similarity vector corresponding to the current detection target field sub-image is constructed from these similarities; and the similarity matrix is constructed from the similarity vectors of all detection target field sub-images.
[0094] The similarity vector can be a vector composed of the similarities between the current detection target field sub-image and each predicted target field sub-image. The current detection target field sub-image can be any one of the multiple detection target field sub-images.
[0095] Specifically, each detection target field sub-image can in turn be taken as the current detection target field sub-image, and the following operations performed: the similarity between the current detection target field sub-image and each predicted target field sub-image is calculated according to a pre-defined similarity calculation method, and these similarities are arranged in order to obtain the similarity vector of the current detection target field sub-image. Combining the similarity vectors of all detection target field sub-images then yields the similarity matrix.
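The row-by-row construction of the similarity matrix can be sketched as follows. Cosine similarity over flattened pixel values is used here only as one of the example measures listed in [0096]. Any function that takes two equally sized sub-images and returns a score could be passed as `sim`. All sub-images are assumed to share the same wsub x hsub size, and the function names are illustrative.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two equally sized sub-images, flattened to vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def similarity_matrix(det_subs, pred_subs, sim=cosine_similarity):
    """Row i is the similarity vector of detection sub-image i."""
    rows = []
    for d in det_subs:                               # current detection sub-image
        rows.append([sim(d, p) for p in pred_subs])  # its similarity vector
    return np.array(rows)
```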
[0096] For example, the pre-defined similarity calculation method can be any existing similarity calculation method, such as cosine similarity, Euclidean distance, Manhattan distance, etc.
[0097] Optionally, the pre-defined similarity calculation method is SSIM (Structural Similarity), and the SSIM formula is shown below:
[0098] $$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$
[0099] Where x represents the detection target field sub-image, y represents the predicted target field sub-image, $\mu_x$ and $\mu_y$ represent the means of x and y, $\sigma_x^2$ and $\sigma_y^2$ represent the variances of x and y, $\sigma_{xy}$ represents the covariance of x and y, and $c_1$ and $c_2$ are two constants used to maintain numerical stability, with $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where L represents the dynamic range of pixel values, $k_1$ defaults to 0.01, and $k_2$ defaults to 0.03.
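Below is a direct transcription of the SSIM formula, applied as a single window over the whole sub-image. Applying the formula globally, rather than over sliding local windows as common library implementations such as scikit-image's `structural_similarity` do, is an assumption. The same goes for the default `data_range=1.0`, which fits field intensities in [0, 1]. The function name is illustrative.

```python
import numpy as np


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized sub-images x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2          # c1 = (k1 L)^2
    c2 = (k2 * data_range) ** 2          # c2 = (k2 L)^2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

Identical sub-images score 1. An inverted sub-image (1 - x) is negatively correlated with x, so its score drops below 0.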
[0100] The multi-target matching method provided in this embodiment detects and identifies multiple target boxes in the image to be matched, and constructs a target field for each target box to obtain a target field image. Then, for each target box, a target field sub-image is cropped from the target field image. By solving the similarity matrix between each target field sub-image and each predicted target field sub-image, the association relationship between each target box and each predicted target box is obtained to determine that they represent the same target, thus completing the target matching. This method considers the interaction and positional relationship between multiple targets to improve the accuracy and robustness of multi-target matching.
[0101] Figure 9 is a schematic structural diagram of a multi-target matching device according to an embodiment of this disclosure. As shown in Figure 9, the device includes: a target bounding box determination module 710, a target field construction module 720, a target field sub-image cropping module 730, a target bounding box association module 740, and a target bounding box combination module 750.
[0102] Specifically, the target bounding box determination module 710 is used to determine multiple detection target bounding boxes based on the image to be matched; the target field construction module 720 is used to construct, for each detection target bounding box, a corresponding target field based on the size of the current target bounding box and a preset target field construction method, where the current target bounding box is any one of the multiple detection target bounding boxes; the target field sub-image cropping module 730 is used to determine the detection target field sub-image corresponding to each target field based on the image to be matched and the detection target field image, where the detection target field image includes each target field; the target bounding box association module 740 is used to determine a similarity matrix based on each detection target field sub-image and each predicted target field sub-image corresponding to the image to be matched, and to determine, based on the similarity matrix, the association relationship between each detection target bounding box and each predicted target bounding box in the image to be matched; and the target bounding box combination module 750 is used to determine, based on the association relationship, combinations of detection target bounding boxes and predicted target bounding boxes that represent the same target.
[0103] The multi-target matching device provided in this embodiment determines multiple detection target boxes in the image to be matched and constructs a target field for each box, yielding a detection target field image. For each box, a target field sub-image is then cropped from this field image. A similarity matrix between the detection target field sub-images and the predicted target field sub-images gives the association between each detection target box and each predicted target box. From this association, the pairs that represent the same target are determined, which completes target matching. Because it takes into account the interactions and positional relationships among multiple targets, the device improves the accuracy and robustness of multi-target matching.
[0104] Optionally, the target field construction module 720 is further configured to: for each pixel within the current target box, determine a first distance from the pixel to the long boundary of the current target box and a second distance from the pixel to the wide boundary of the current target box; determine the target field intensity of the pixel based on the first distance, the second distance, and the size of the current target box; and construct the target field corresponding to the current target box from the target field intensities of all the pixels.
[0105] Optionally, the target field construction module 720 is further configured to determine the target field intensity corresponding to the pixel according to the following formula:
[0106]
[0107] Where f(x,y) represents the target field intensity of the pixel at coordinates (x,y), l represents the first distance, t represents the second distance, w represents the width of the current target box, and h represents the length of the current target box.
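The formula itself is not reproduced in this text, so the sketch below cannot show the exact expression. It is a minimal illustration of the construction in [0104] under several assumptions. The box is axis-aligned, with width w along x and length h along y. The "long boundary" is taken as the two sides of length h, and the "wide boundary" as the two sides of length w. The stand-in intensity f = (2l/w) * (2t/h) is close to 1 near the box centre and falls towards 0 at the edges, which matches the behaviour described in claim 1. The text also does not say how the fields of several boxes are merged into one detection target field image, so an element-wise maximum is assumed. The function names `target_field` and `field_image` are hypothetical.

```python
import numpy as np


def target_field(w: int, h: int) -> np.ndarray:
    """Return an (h, w) array of field intensities for one box.

    ASSUMPTION: f = (2l/w) * (2t/h) is a stand-in for the patent's formula;
    it is near 1 at the box centre and near 0 at the boundary.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # Use pixel centres so that a symmetric box gets a symmetric field.
    cx = xs + 0.5
    cy = ys + 0.5
    # l: first distance, to the nearest long side (assumed vertical, length h).
    l = np.minimum(cx, w - cx)
    # t: second distance, to the nearest wide side (assumed horizontal, length w).
    t = np.minimum(cy, h - cy)
    return (2.0 * l / w) * (2.0 * t / h)


def field_image(shape, boxes):
    """Compose the fields of all boxes into one detection target field image.

    boxes: list of (x0, y0, w, h) in integer pixels, assumed to lie inside
    the image. ASSUMPTION: overlapping fields are merged with a maximum.
    """
    img = np.zeros(shape, dtype=float)
    for x0, y0, w, h in boxes:
        region = img[y0:y0 + h, x0:x0 + w]  # a view into img
        np.maximum(region, target_field(w, h), out=region)
    return img
```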
[0108] Optionally, the target field sub-image cropping module 730 is further configured to determine the sub-image size based on the distances between the detection target boxes in the image to be matched, and to determine the detection target field sub-image corresponding to each target field based on the sub-image size and the detection target field image.
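The sketch below shows one possible way to carry out the cropping step, under stated assumptions. Each window has the sub-image size and is centred on the centre of its detection target box. Where the window extends past the image border, it is zero-padded so that all sub-images share one shape and can be compared directly. The centring and padding rules and the name `crop_subimage` are assumptions and are not taken from the text.

```python
import numpy as np


def crop_subimage(field_img, center, size):
    """Crop a (length, width) window of field_img centred on `center`.

    center: (cx, cy) box centre in pixels; size: (width, length).
    ASSUMPTION: centred window, zero padding outside the image.
    """
    sw, sl = size
    cx, cy = int(round(center[0])), int(round(center[1]))
    x0, y0 = cx - sw // 2, cy - sl // 2
    out = np.zeros((sl, sw), dtype=field_img.dtype)
    H, W = field_img.shape
    # Intersection of the window with the image bounds.
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1, iy1 = min(x0 + sw, W), min(y0 + sl, H)
    if ix0 < ix1 and iy0 < iy1:
        out[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = field_img[iy0:iy1, ix0:ix1]
    return out
```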
[0109] Optionally, the target field sub-image cropping module 730 is further configured to: for each detection target box, determine the target box nearest to the current target box, and take the distance between the current target box and that nearest box as the candidate distance of the current target box; take the maximum of the candidate distances of all detection target boxes as the statistical distance; and determine the sub-image size based on the statistical distance and the empirical dimensions.
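The statistical distance in [0109] is the maximum of the nearest-neighbour distances between boxes. The sketch below computes it under two assumptions. First, the distance between two boxes is taken as the Euclidean distance between their centres, because the text does not fix the metric. Second, at least two boxes are required. `statistical_distance` is a hypothetical name.

```python
import math


def statistical_distance(centers):
    """Largest nearest-neighbour distance among detection boxes.

    centers: list of (cx, cy) box centres.
    ASSUMPTION: box-to-box distance = Euclidean distance between centres.
    """
    if len(centers) < 2:
        raise ValueError("need at least two boxes to define a nearest neighbour")
    candidates = []
    for i, (xi, yi) in enumerate(centers):
        # Candidate distance: distance from box i to its nearest other box.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        candidates.append(nearest)
    # Statistical distance: the maximum candidate distance.
    return max(candidates)
```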
[0110] Optionally, the empirical dimensions include a maximum length, a minimum length, a maximum width, and a minimum width. The target field sub-image cropping module 730 is further configured to: take the larger of the statistical distance and the minimum width as a first process value; take the smaller of the first process value and the maximum width as the width of the sub-image size; take the larger of the statistical distance and the minimum length as a second process value; and take the smaller of the second process value and the maximum length as the length of the sub-image size.
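The clamping rules in [0110] translate directly into code. The sketch follows the four steps as written. Only the function and parameter names are hypothetical.

```python
def subimage_size(stat_dist, min_w, max_w, min_l, max_l):
    """Clamp the statistical distance into the empirical size range.

    Returns (width, length) of the sub-image.
    """
    first = max(stat_dist, min_w)   # first process value
    width = min(first, max_w)       # width of the sub-image size
    second = max(stat_dist, min_l)  # second process value
    length = min(second, max_l)     # length of the sub-image size
    return width, length
```

For example, take hypothetical empirical limits of 32 to 128 for the width and 48 to 256 for the length. A statistical distance of 50 then gives a sub-image size of (50, 50). A distance of 10 is raised to the minimums, giving (32, 48).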
[0111] Optionally, the target box association module 740 is further configured to: for each detection target field sub-image, determine the similarity between the current detection target field sub-image and each predicted target field sub-image, and construct a similarity vector for the current detection target field sub-image from these similarity values; and construct the similarity matrix from the similarity vectors of all detection target field sub-images.
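The text does not name the similarity measure. The sketch below assumes cosine similarity between flattened sub-images of equal shape. Each row of the result is the similarity vector of one detection target field sub-image, and stacking the rows gives the matrix described in [0111]. `similarity_matrix` is a hypothetical name.

```python
import numpy as np


def similarity_matrix(det_subs, pred_subs):
    """Rows: detection sub-images; columns: predicted sub-images.

    ASSUMPTION: similarity = cosine similarity of the flattened sub-images.
    """
    def unit(v):
        v = np.asarray(v, dtype=float).ravel()
        n = np.linalg.norm(v)
        return v / n if n > 0 else v  # an all-zero sub-image stays zero

    S = np.zeros((len(det_subs), len(pred_subs)))
    for i, d in enumerate(det_subs):
        du = unit(d)
        # Similarity vector for detection sub-image i.
        S[i] = [float(du @ unit(p)) for p in pred_subs]
    return S
```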
[0112] The multi-target matching device provided in this disclosure can execute the steps of the multi-target matching method provided in this disclosure and has the corresponding execution steps and beneficial effects, which are not repeated here.
[0113] Figure 10 is a schematic structural diagram of an electronic device according to an embodiment of this disclosure, suitable for implementing the electronic device 800 of the embodiments of this disclosure. The electronic device shown in Figure 10 is merely an example and should not be construed as limiting the functionality or scope of the embodiments disclosed herein.
[0114] As shown in Figure 10, the electronic device 800 may include a processing device 801 (e.g., a central processing unit or a graphics processor). The processing device 801 can perform various appropriate actions and processes to implement the methods of the embodiments described herein. It does so according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, the ROM 802, and the RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
[0115] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium. The computer program contains program code for performing the methods shown in the flowcharts, thereby implementing the multi-target matching method described above. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 809, installed from the storage device 808, or installed from the ROM 802. When the computer program is executed by the processing device 801, it performs the functions defined in the methods of the embodiments of this disclosure.
[0116] It should be noted that the computer-readable medium described in this disclosure can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wires, optical fibers, RF (radio frequency), or any suitable combination thereof.
[0117] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to:
[0118] Based on the image to be matched, multiple detection target boxes are determined;
[0119] For each detection target box, a target field corresponding to the current target box is constructed according to the size of the current target box and a preset target field construction method, where the current target box is any one of the plurality of detection target boxes;
[0120] Based on the image to be matched and the detection target field image, a detection target field sub-image corresponding to each target field is determined, where the detection target field image contains every target field;
[0121] Based on the detection target field sub-images and the predicted target field sub-images corresponding to the image to be matched, a similarity matrix is determined, and based on the similarity matrix, the association between each detection target box and each predicted target box in the image to be matched is determined;
[0122] Based on the association, the combinations of detection target boxes and predicted target boxes representing the same target are determined.
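This section does not specify how the association is derived from the similarity matrix. The sketch below substitutes a simple greedy one-to-one assignment with a hypothetical similarity threshold. The Hungarian algorithm is a common alternative for the same step. Each returned pair is one combination of a detection box and a predicted box that is taken to represent the same target. The name `associate` and the default threshold are assumptions.

```python
def associate(S, threshold=0.5):
    """Pair detection boxes with predicted boxes from a similarity matrix.

    S: sequence of rows, S[i][j] = similarity of detection i and prediction j.
    ASSUMPTION: greedy matching on descending similarity, with a
    hypothetical cut-off `threshold` below which no pair is formed.
    Returns a sorted list of (detection_index, prediction_index) pairs.
    """
    cells = sorted(
        ((S[i][j], i, j) for i in range(len(S)) for j in range(len(S[i]))),
        reverse=True,
    )
    used_det, used_pred, pairs = set(), set(), []
    for score, i, j in cells:
        if score < threshold:
            break  # every remaining cell is even less similar
        if i in used_det or j in used_pred:
            continue  # keep the matching one-to-one
        used_det.add(i)
        used_pred.add(j)
        pairs.append((i, j))
    return sorted(pairs)
```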
[0123] Optionally, when the one or more programs described above are executed by the electronic device, the electronic device may also perform the other steps described in the above embodiments.
[0124] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0125] The above description is merely a preferred embodiment of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by the specific combinations of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above concept. One example is a technical solution formed by interchanging the above features with technical features of similar function disclosed in (but not limited to) this disclosure.
Claims
1. A multi-target matching method, characterized in that the method comprises: determining multiple detection bounding boxes based on the image to be matched; for each detection bounding box, calculating the target field intensity corresponding to each pixel based on the size of the current bounding box and a preset target field construction method, and constructing a target field corresponding to the current bounding box from the target field intensities of the pixels, wherein the current bounding box is any one of the plurality of detection bounding boxes, the target field describes the strength of the current target's influence on other targets and is composed of the target field intensities of the pixels, and the target field intensity gradually decreases from the center of the current bounding box towards its boundary; wherein, for each pixel within the current bounding box, a first distance from the pixel to the long boundary of the current bounding box and a second distance from the pixel to the wide boundary of the current bounding box are determined, the target field intensity corresponding to the pixel is determined based on the first distance, the second distance, and the size of the current bounding box, and the target field corresponding to the current bounding box is constructed based on the target field intensity corresponding to each pixel;
determining, based on the image to be matched and the detection target field image, a detection target field sub-image corresponding to each target field, wherein the detection target field image contains every target field; determining a similarity matrix based on the detection target field sub-images and the predicted target field sub-images corresponding to the image to be matched, and determining, based on the similarity matrix, the association relationship between each detection bounding box and each predicted bounding box in the image to be matched; and determining, based on the association relationship, the combinations of detection bounding boxes and predicted bounding boxes representing the same target.
2. The method according to claim 1, characterized in that determining the target field intensity corresponding to the pixel based on the first distance, the second distance, and the size of the current bounding box comprises: determining the target field intensity corresponding to the pixel according to the following formula: where f(x,y) denotes the target field intensity of the pixel at coordinates (x,y), l denotes the first distance, t denotes the second distance, w denotes the width of the current bounding box, and h denotes the length of the current bounding box.
3. The method according to claim 1, characterized in that determining the detection target field sub-image corresponding to each target field based on the image to be matched and the detection target field image comprises: determining a sub-image size based on the distances between the detection bounding boxes in the image to be matched; and determining, based on the sub-image size and the detection target field image, the detection target field sub-image corresponding to each target field.
4. The method according to claim 3, characterized in that determining the sub-image size based on the distances between the detection bounding boxes in the image to be matched comprises: for each detection bounding box, determining the bounding box nearest to the current bounding box, and taking the distance between the current bounding box and the nearest bounding box as the candidate distance of the current bounding box; taking the maximum of the candidate distances of all detection bounding boxes as the statistical distance; and determining the sub-image size based on the statistical distance and empirical dimensions.
5. The method according to claim 4, characterized in that the empirical dimensions include a maximum length, a minimum length, a maximum width, and a minimum width, and determining the sub-image size based on the statistical distance and the empirical dimensions comprises: taking the larger of the statistical distance and the minimum width as a first process value; taking the smaller of the first process value and the maximum width as the width of the sub-image size; taking the larger of the statistical distance and the minimum length as a second process value; and taking the smaller of the second process value and the maximum length as the length of the sub-image size.
6. The method according to claim 1, characterized in that determining the similarity matrix based on the detection target field sub-images and the predicted target field sub-images corresponding to the image to be matched comprises: for each detection target field sub-image, determining the similarity between the current detection target field sub-image and each predicted target field sub-image, and constructing a similarity vector for the current detection target field sub-image from these similarity values; and constructing the similarity matrix from the similarity vectors of all detection target field sub-images.
7. A multi-target matching device, characterized in that the device comprises: a detection bounding box determination module, configured to determine multiple detection bounding boxes based on the image to be matched; a target field construction module, configured to, for each detection bounding box, calculate the target field intensity corresponding to each pixel based on the size of the current bounding box and a preset target field construction method, and to construct a target field corresponding to the current bounding box from the target field intensities of the pixels, wherein the current bounding box is any one of the plurality of detection bounding boxes, the target field describes the strength of the current target's influence on other targets and is composed of the target field intensities of the pixels, and the target field intensity gradually decreases from the center of the current bounding box towards its boundary; wherein the target field construction module is specifically configured to, for each pixel within the current bounding box, determine a first distance from the pixel to the long boundary of the current bounding box and a second distance from the pixel to the wide boundary of the current bounding box, determine the target field intensity corresponding to the pixel based on the first distance, the second distance, and the size of the current bounding box, and construct the target field corresponding to the current bounding box based on the target field intensity corresponding to each pixel;
a target field sub-image cropping module, configured to determine the detection target field sub-image corresponding to each target field based on the image to be matched and the detection target field image, wherein the detection target field image contains every target field; a bounding box association module, configured to determine a similarity matrix based on the detection target field sub-images and the predicted target field sub-images corresponding to the image to be matched, and to determine, based on the similarity matrix, the association relationship between each detection bounding box and each predicted bounding box in the image to be matched; and a bounding box combination module, configured to determine, based on the association relationship, the combinations of detection bounding boxes and predicted bounding boxes representing the same target.
8. An electronic device, characterized in that the electronic device comprises: one or more processors; and a storage device for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the multi-target matching method according to any one of claims 1 to 6.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the multi-target matching method according to any one of claims 1 to 6.