A target positioning system and a target positioning method
By combining image acquisition, image processing, and result display modules, the system addresses the inability of existing techniques to accurately reflect the distribution of target objects in local areas, enabling an intuitive and accurate display of target object distribution in exhibition halls and other venues.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BOE TECHNOLOGY GROUP CO LTD
- Filing Date
- 2023-03-06
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing, and more particularly to a target positioning system and a target positioning method.
Background Technology
[0002] When determining the distribution of target objects within a specified area, the distribution can usually be judged based on the number of target objects in that area. For example, when determining the density of people in an exhibition hall or other similar scenarios (such as a shopping mall), the number of people in the exhibition hall can be determined based on images collected by the monitoring equipment at the entrances and exits. If the number of people exceeds a certain set threshold, the exhibition hall is considered to be relatively densely populated. However, this method cannot accurately reflect the distribution of people in a local area.
Summary of the Invention
[0003] This invention provides a target positioning system and a target positioning method to address the shortcomings of related technologies.
[0004] According to a first aspect of the present invention, a target positioning system is provided, the system comprising: at least one image acquisition device, the image acquisition device being used to acquire an image of a target area to be processed;
[0005] An image processing module is used to obtain the image coordinates of a target object in the image to be processed, and to transform the image coordinates to obtain the spatial coordinates of the target object;
[0006] The results display module is used to display the spatial coordinates of the target object on the map corresponding to the target area.
[0007] In some embodiments, the image processing module is specifically used for:
[0008] Perform moving object detection on the image to be processed to obtain moving object detection results;
[0009] In response to the moving object detection result including a moving object detection box, target detection is performed on the image to be processed to obtain a target object detection box;
[0010] In response to the overlapping area between the moving object detection box and the corresponding target object detection box meeting the set requirements, the image coordinates of the target object are obtained based on the target object detection box.
[0011] In some embodiments, the target object includes a human body, and the image processing module is specifically used for:
[0012] The image coordinates of the human body are determined based on the position information of the human body's feet in the target object detection frame.
[0013] In some embodiments, the image processing module is specifically used for:
[0014] If the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, updating of the detection conditions for moving object detection is stopped. The detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or to the background.
[0015] In some embodiments, the image processing module is specifically used for:
[0016] Obtain the first transformation relationship corresponding to the image acquisition device, the first transformation relationship being used to indicate the relationship between image coordinates in the image coordinate system and spatial coordinates in the world coordinate system;
[0017] The image coordinates are converted into spatial coordinates according to the first transformation relationship.
[0018] In some embodiments, the system further includes a calibration module;
[0019] The calibration module is used to acquire a calibration image acquired by the image acquisition device. The calibration image includes at least four marker points, and each marker point corresponds to a spatial coordinate.
[0020] Obtain the image coordinates of each marker point in the calibration image;
[0021] The first transformation relationship corresponding to the image acquisition device is determined based on the image coordinates and spatial coordinates of at least four marker points.
[0022] In some embodiments, a calibration plate of a predetermined size is placed at the marking point, and the center point of the calibration plate coincides with the marking point; the calibration module is specifically used for:
[0023] Obtain the spatial coordinates of the marker point;
[0024] Based on the spatial coordinates of the marker points and the dimensions of the calibration board, determine the spatial coordinates of at least one vertex in the calibration board;
[0025] Obtain the image coordinates of the vertex from the calibration image;
[0026] The first transformation relationship corresponding to the image acquisition device is determined based on at least four sets of coordinate pairs, wherein each set of coordinate pairs includes spatial coordinates and image coordinates corresponding to the spatial coordinates.
[0027] In some embodiments, where the acquisition viewpoints of the first image acquisition device and the second image acquisition device partially overlap, the calibration module is further configured to:
[0028] Acquire a first image captured by the first image acquisition device and a second image captured by the second image acquisition device, wherein the first image and the second image contain the same reference object;
[0029] Based on the first image coordinates of the reference object in the first image and the second image coordinates in the second image, a second transformation relationship between the first image acquisition device and the second image acquisition device is determined.
[0030] Obtain the first transformation relationship corresponding to the first image acquisition device;
[0031] Based on the first transformation relationship of the first image acquisition device and the second transformation relationship between the first image acquisition device and the second image acquisition device, the first transformation relationship of the second image acquisition device is determined.
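As an illustration of how the two relationships might be chained, the following is a minimal sketch. It assumes that both relationships are 3x3 planar homographies and that the second transformation relationship maps image coordinates of the second device into image coordinates of the first device; neither assumption is stated in the text, and the function and variable names are hypothetical.

```python
import numpy as np

def chain_to_world(h1_world: np.ndarray, h21: np.ndarray) -> np.ndarray:
    """Compose two homographies to get the second device's image-to-world mapping.

    h1_world: assumed 3x3 matrix mapping device-1 image coordinates to world coordinates.
    h21:      assumed 3x3 matrix mapping device-2 image coordinates to device-1 image coordinates.
    """
    # Apply h21 first (device 2 image -> device 1 image), then h1_world (-> world).
    return h1_world @ h21

# Illustrative values only: device 1 scales pixels by 2, device 2 is shifted by 5 pixels in x.
h1_world = np.diag([2.0, 2.0, 1.0])
h21 = np.array([[1.0, 0.0, 5.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
h2_world = chain_to_world(h1_world, h21)
```

Under these assumptions, the second device would not need its own set of measured marker points, because its mapping is derived from the first device's calibration.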
[0032] In some embodiments, the image processing module is applied to an edge box, the edge box is connected to at least one image acquisition device, and the result display module is applied to a terminal device;
[0033] The terminal device is also used to send processing instructions to the edge box according to user instructions;
[0034] The edge box is also used to acquire the image to be processed acquired by the image acquisition device indicated by the processing instruction.
[0035] In some embodiments, the edge box includes multiple image processing modules, each corresponding to an image acquisition device.
[0036] According to a second aspect of the present invention, a target positioning method is provided, the method comprising:
[0037] Acquire an image to be processed of the target area, captured by at least one image acquisition device;
[0038] Obtain the image coordinates of the target object in the image to be processed, and transform the image coordinates to obtain the spatial coordinates of the target object;
[0039] The spatial coordinates of the target object are sent to the result display module, which then displays the spatial coordinates of the target object on the map corresponding to the target area.
[0040] In some embodiments, obtaining the image coordinates of the target object in the image to be processed includes:
[0041] Perform moving object detection on the image to be processed to obtain moving object detection results;
[0042] In response to the moving object detection result including a moving object detection box, target detection is performed on the image to be processed to obtain a target object detection box;
[0043] In response to the overlapping area between the moving object detection box and the corresponding target object detection box meeting the set requirements, the image coordinates of the target object are obtained based on the target object detection box.
[0044] In some embodiments, the target object includes a human body, and obtaining the image coordinates of the target object based on the target object detection box includes:
[0045] The image coordinates of the human body are determined based on the position information of the human body's feet in the target object detection frame.
[0046] In some embodiments, the method further includes:
[0047] If the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, updating of the detection conditions for moving object detection is stopped. The detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or to the background.
[0048] As can be seen from the above embodiments, the target positioning system provided by the present invention acquires an image of the target area to be processed through at least one image acquisition device; obtains the image coordinates of the target object in the image to be processed using an image processing module, and transforms the image coordinates to obtain the spatial coordinates of the target object; and displays the spatial coordinates of the target object on a map corresponding to the target area using a result display module. By displaying the spatial coordinates of the target object on a map corresponding to the target area, the distribution of the target object in the target area can be seen intuitively and accurately.
[0049] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit the invention.
Description of the Drawings
[0050] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
[0051] Figure 1 is a schematic diagram of a target positioning system according to an embodiment of the present invention.
[0052] Figure 2 is a schematic diagram of a human body detection box according to an embodiment of the present invention.
[0053] Figure 3 is a schematic diagram of the mapping results displayed by a result display module according to an embodiment of the present invention.
[0054] Figure 4 is a schematic diagram illustrating the process of obtaining image coordinates according to an embodiment of the present invention.
[0055] Figure 5 is a schematic diagram of a target object detection box according to an embodiment of the present invention.
[0056] Figure 6 is a schematic diagram illustrating the calibration of a first transformation relationship according to an embodiment of the present invention.
[0057] Figure 7 is a schematic diagram of a target positioning system according to an embodiment of the present invention.
[0058] Figure 8 is a schematic diagram illustrating the interaction between threads in an image processing module according to an embodiment of the present invention.
[0059] Figure 9 is a schematic diagram illustrating a target positioning method according to an embodiment of the present invention.
Detailed Description
[0060] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.
[0061] To determine the distribution of target objects within a target area, the present invention provides a target positioning system, the following embodiments of which will be described in conjunction with the accompanying drawings.
[0062] Figure 1 is a schematic diagram of a target positioning system according to an embodiment of the present invention. As shown in Figure 1, the target positioning system provided by the present invention includes: at least one image acquisition device 101, an image processing module 102, and a result display module 103.
[0063] The image acquisition device 101 is used to acquire the image to be processed in the target area; the image processing module 102 is used to obtain the image coordinates of the target object in the image to be processed, and to convert the image coordinates to obtain the spatial coordinates of the target object; the result display module 103 is used to display the spatial coordinates of the target object on the map corresponding to the target area.
[0064] The target area refers to the area that the user is interested in, and at least one image acquisition device can be deployed in the target area as needed.
[0065] The image processing module 102 can perform target detection on the image to be processed, obtain a target object detection box, and obtain the image coordinates of the target object based on the target object detection box.
[0066] The target object in this embodiment of the invention can be determined according to actual needs. For example, when obtaining information on the distribution of visitors in an exhibition hall, the target object can be visitors, i.e., human bodies. Similarly, when determining the parking situation in a parking lot, the target object can be vehicles.
[0067] In one implementation, a deep learning network, such as the Faster R-CNN network, can be used to perform object detection on the image to be processed, obtaining object detection results. When an object is detected in the image to be processed, the object detection results may include the object detection box, the position of the object detection box, etc. Those skilled in the art should understand that the Faster R-CNN network is merely an example, and other networks, such as the YOLO network, can also be used for object detection; this invention does not limit this.
[0068] When the target object includes a human body, the image processing module 102 is specifically used to determine the image coordinates of the human body based on the foot position information of the human body within the target object detection box. That is, using the position of the human body on the ground as the positioning reference, the image coordinates of the human body in the image coordinate system are determined from the position of the feet. Figure 2 is a schematic diagram of a human body detection box according to an embodiment of the present invention. As shown in Figure 2, once a human body detection box is obtained, the midpoint of the side of the detection box on which the feet are located can be taken as the image coordinates of the human body in the image coordinate system.
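The foot-point rule can be sketched as follows. This is a minimal illustration that assumes an upright person, a detection box given as (x_min, y_min, x_max, y_max), and an image y axis pointing downward so that the feet lie on the bottom edge; the function name is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)

def foot_point(box: Box) -> Tuple[float, float]:
    """Return the midpoint of the bottom edge of a detection box."""
    x_min, _y_min, x_max, y_max = box
    # Horizontal center of the box, vertical position of the bottom edge.
    return ((x_min + x_max) / 2.0, float(y_max))

print(foot_point((100, 50, 160, 230)))  # (130.0, 230.0)
```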
[0069] After obtaining the image coordinates of the target object in the image coordinate system, the image processing module 102 can transform the image coordinates of the target object to the world coordinate system according to the pre-calibrated transformation relationship to obtain the spatial coordinates of the target object.
[0070] The result display module 103 can pre-store the map corresponding to the target area. When the spatial coordinates of the target object in the target area are obtained, the spatial coordinates of the target object can be mapped onto the map corresponding to the target area, and the mapping result can be displayed to the user.
[0071] Figure 3 is a schematic diagram illustrating the mapping results displayed by a result display module according to an embodiment of the present invention. As shown in Figure 3, the target objects in Figure 2 are displayed on the map corresponding to the target area, so that the distribution of people within the target area can be seen intuitively.
[0072] The target positioning system provided by this invention acquires an image of a target area using at least one image acquisition device; obtains the image coordinates of a target object in the image using an image processing module, and transforms the image coordinates to obtain the spatial coordinates of the target object; and displays the spatial coordinates of the target object on a map corresponding to the target area using a result display module. By displaying the spatial coordinates of the target object on the map corresponding to the target area, the distribution of the target object within the target area can be obtained intuitively and accurately. In other words, by displaying the target object on the map corresponding to the target area based on its spatial coordinates, the location of the target object in the target area and the density of target objects within the target area can be clearly seen.
[0073] In one implementation, when the target areas captured by at least two image acquisition devices partially overlap, the result display module 103 may receive, for the same target object, multiple spatial coordinates obtained from the images to be processed captured by different image acquisition devices. In this case, the result display module 103 can compute the distance between each pair of spatial coordinates within the target area. If the distance is less than a set threshold, the two spatial coordinates are taken to correspond to the same target object, and only one of them needs to be retained.
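One possible way to carry out this duplicate removal is sketched below. It is a greedy pass that keeps the first coordinate of each close pair; the text does not specify which coordinate is retained, so that choice, the function name, and the threshold value are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def merge_duplicates(points: List[Point], threshold: float) -> List[Point]:
    """Drop any spatial coordinate closer than `threshold` to one already kept."""
    kept: List[Point] = []
    for p in points:
        # Keep p only if it is at least `threshold` away from every retained point.
        if all(math.dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept

# Two devices report the same visitor at slightly different positions.
print(merge_duplicates([(1.0, 1.0), (1.1, 1.0), (5.0, 5.0)], threshold=0.5))
# [(1.0, 1.0), (5.0, 5.0)]
```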
[0074] In another implementation, if the target areas acquired by at least two image acquisition devices partially overlap, the region of interest corresponding to each image acquisition device can be pre-divided. During the processing of the images to be processed acquired by the image acquisition devices, only the images corresponding to the region of interest in the images to be processed can be processed, thus avoiding the situation where the same target object corresponds to multiple spatial coordinates.
[0075] When using deep learning networks for object detection, objects possessing characteristics of the target object may be mistakenly identified as the target object during feature recognition, leading to false detections. For example, in shopping malls, mannequins are often used to display clothing. When using a deep learning network to detect objects in an image from that mall, false detections are easily made, such as misidentifying the mannequin as a human body. Alternatively, training a deep learning network using sample images from a specific scene can result in a significantly higher accuracy rate for object detection in that scene compared to other scenes. Considering the potential for false detections, this embodiment of the invention combines moving object detection and target detection to improve the accuracy of detecting target object bounding boxes from the image to be processed. Specifically, the image processing module can be used to: perform moving object detection on the image to be processed to obtain a moving object detection result; in response to the moving object detection result including a moving object bounding box, perform target detection on the image to be processed to obtain a target object bounding box; and in response to the overlapping area between the moving object bounding box and the corresponding target object bounding box meeting a set requirement, obtain the image coordinates of the target object based on the target object bounding box.
[0076] The process of detecting moving objects in the image to be processed and obtaining the moving object detection result includes: comparing the pixel value of each pixel in the image to be processed with the means of the multiple Gaussian models corresponding to that pixel; if the difference between the pixel value and the mean is greater than a first preset condition (e.g., 3 times the standard deviation), the pixel belongs to the foreground region (i.e., a moving object), and its pixel value is set to a first value (e.g., 255); otherwise, the pixel belongs to the background region, and its pixel value is set to a second value (e.g., 0); after all pixels of the image to be processed have been traversed, a binary image is obtained; based on the binary image, at least one connected region is obtained; if the area of a connected region meets a second preset condition, a moving object is present at the location of that connected region, and the connected region is determined as a moving object detection box.
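The thresholding and connected-region steps can be illustrated with the simplified sketch below. It uses a single Gaussian (one mean and one standard deviation) per pixel instead of the multiple Gaussian models described above, treats the second preset condition as a minimum pixel count, and assumes grayscale frames and 4-connectivity; all names and values are hypothetical.

```python
from collections import deque
import numpy as np

def foreground_mask(frame, mean, std, k=3.0):
    """Set a pixel to 255 if it deviates from the mean by more than k * std, else 0."""
    deviates = np.abs(frame.astype(float) - mean) > k * std
    return np.where(deviates, 255, 0).astype(np.uint8)

def region_boxes(mask, min_area):
    """Return (x_min, y_min, x_max, y_max) boxes of 4-connected regions with >= min_area pixels."""
    height, width = mask.shape
    seen = np.zeros((height, width), dtype=bool)
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0, x0] == 0 or seen[y0, x0]:
                continue
            # Breadth-first search over one connected foreground region.
            queue = deque([(y0, x0)])
            seen[y0, x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Synthetic example: a 3x3 moving blob plus one isolated noisy pixel.
mean = np.full((10, 10), 100.0)
std = np.full((10, 10), 2.0)
frame = mean.copy()
frame[2:5, 3:6] = 200.0
frame[7, 7] = 200.0
mask = foreground_mask(frame, mean, std)
print(region_boxes(mask, min_area=4))  # [(3, 2, 5, 4)]
```

The isolated pixel is classified as foreground but discarded by the area check, which is the role the second preset condition plays in the description.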
[0077] In one implementation, the target detection can employ the YOLOv5 algorithm, and the moving object detection can employ a Gaussian mixture model (MOG model). As shown in Figure 4, the process of obtaining image coordinates by combining the MOG model and the YOLOv5 algorithm includes the following steps 401 to 406.
[0078] In step 401, the image to be processed is acquired.
[0079] In step 402, the MOG model is used to detect moving objects in the image to be processed, and the moving object detection results are subjected to morphological processing and maximum region contour determination.
[0080] In step 403, if the area of the moving object detection box in the moving object detection result meets the second preset condition, it indicates that there is a moving object, and step 404 is executed; otherwise, the MOG model is updated.
[0081] In step 404, target detection is performed on the image to be processed. If a target object detection box is detected, step 405 is executed; otherwise, the MOG model is updated.
[0082] In step 405, if the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, then step 406 is executed; otherwise, the MOG model is updated.
[0083] In step 406, the image coordinates of the target object are obtained based on the target object detection box.
[0084] In other words, the image processing module first uses the MOG model to detect moving objects in the image to be processed, and then uses the YOLOv5 algorithm to perform target detection. When the target object detection box detected by the YOLOv5 algorithm intersects with the moving object detection box detected by the MOG model (i.e., the foreground region), the target object can be confirmed to exist; otherwise, it is considered that the target object does not exist. By combining the MOG model and the YOLOv5 algorithm, the accuracy of target object detection is improved.
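The cross-check between the two detectors can be sketched as follows. The text only says that the overlapping area must meet a set requirement; this sketch assumes the requirement is a minimum ratio of intersection area to target box area, and the 0.5 default and the function names are hypothetical. The detectors themselves (MOG and YOLOv5) are not reproduced here; their outputs are represented by plain box tuples.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)

def overlap_ratio(motion: Box, target: Box) -> float:
    """Intersection area divided by the area of the target box."""
    inter_w = max(0.0, min(motion[2], target[2]) - max(motion[0], target[0]))
    inter_h = max(0.0, min(motion[3], target[3]) - max(motion[1], target[1]))
    target_area = (target[2] - target[0]) * (target[3] - target[1])
    return (inter_w * inter_h) / target_area if target_area > 0 else 0.0

def confirm_targets(motion_boxes: List[Box], target_boxes: List[Box],
                    min_ratio: float = 0.5) -> List[Box]:
    """Keep target boxes that sufficiently overlap at least one moving object box."""
    return [t for t in target_boxes
            if any(overlap_ratio(m, t) >= min_ratio for m in motion_boxes)]

motion_boxes = [(0.0, 0.0, 10.0, 10.0)]
target_boxes = [(2.0, 2.0, 8.0, 8.0),       # inside the moving region: kept
                (20.0, 20.0, 30.0, 30.0)]   # static, e.g. a mannequin: dropped
print(confirm_targets(motion_boxes, target_boxes))  # [(2.0, 2.0, 8.0, 8.0)]
```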
[0085] Figure 5 is a schematic diagram of a target object detection box according to an embodiment of the present invention. As shown in Figure 5, when the YOLOv5 algorithm is used to perform object detection on the image 500, an object resembling a human body in the image 500 is falsely detected as a human body, resulting in a human body detection box 501. Using the MOG model to perform moving object detection on the image 500, a moving object detection result 503 is obtained. It can be seen that the region 502 corresponding to the human body detection box 501 is a background region, indicating that a false detection has occurred. In this case, the human body detection box 501 is deleted from the object detection result.
[0086] In this embodiment of the invention, the image processing module is specifically used to: stop updating the detection conditions for moving object detection when the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, wherein the detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or to the background.
[0087] Understandably, if the overlap between the moving object detection box and the corresponding target object detection box meets the set requirements, it means that the target object can be identified based on the current detection conditions. Therefore, updating the moving object detection model should be stopped to prevent the target object from being updated into the model's background. For example, taking a human body as the target object, if the presence of a human body is confirmed, updating the MOG model should be stopped to avoid updating the background model of the MOG.
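The effect of freezing the update can be illustrated with a deliberately simplified background model. The sketch below uses an exponential running mean as a stand-in for the MOG model, which is an assumption made for brevity; the class name, the learning rate, and the `target_confirmed` flag are hypothetical.

```python
import numpy as np

class RunningBackground:
    """Simplified stand-in for a MOG background model: a per-pixel running mean."""

    def __init__(self, first_frame: np.ndarray, rate: float = 0.05):
        self.mean = first_frame.astype(float)
        self.rate = rate

    def update(self, frame: np.ndarray, target_confirmed: bool) -> None:
        """Blend the frame into the background unless a target has been confirmed."""
        if target_confirmed:
            # Freeze the model so a confirmed target is not absorbed into the background.
            return
        self.mean += self.rate * (frame.astype(float) - self.mean)

background = RunningBackground(np.zeros((2, 2)))
bright = np.full((2, 2), 100.0)
background.update(bright, target_confirmed=True)   # model unchanged
frozen_mean = background.mean.copy()
background.update(bright, target_confirmed=False)  # model moves toward the frame
```

Without the freeze, a visitor standing still for long enough would gradually be learned as background and would stop producing a moving object detection box.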
[0088] In this embodiment of the invention, each image acquisition device corresponds to its own transformation relationship, which indicates the transformation of image coordinates in the image coordinate system to spatial coordinates in the world coordinate system. To distinguish it from the transformation relationships mentioned later, the transformation relationship between image coordinates and spatial coordinates is referred to as the first transformation relationship.
[0089] Given the image coordinates of the target object in the image to be processed, a first transformation relationship corresponding to the image acquisition device that acquired the image to be processed is obtained, and the image coordinates of the target object are transformed according to the first transformation relationship to obtain the spatial coordinates of the target object in the world coordinate system.
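Applying the first transformation relationship to a single point can be sketched as below, assuming the relationship is stored as a 3x3 matrix acting on homogeneous coordinates. The matrix values are illustrative only and the function name is hypothetical.

```python
import numpy as np

def image_to_world(h: np.ndarray, point):
    """Map an image point (u, v) to world coordinates with a 3x3 transformation matrix."""
    u, v = point
    x, y, w = h @ np.array([u, v, 1.0])
    # Divide by the homogeneous scale to return to ordinary 2D coordinates.
    return (float(x / w), float(y / w))

h = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 2.0]])
print(image_to_world(h, (4.0, 6.0)))  # (4.5, 7.5)
```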
[0090] In this embodiment of the invention, a first transformation relationship corresponding to each image acquisition device can be pre-calibrated. For example, a calibration module in the system can be used to calibrate the first transformation relationship corresponding to each image acquisition device. The calibration module is used to acquire a calibration image acquired by the image acquisition device, the calibration image including at least four marker points, each marker point corresponding to a spatial coordinate; acquire the image coordinates of each marker point in the calibration image; and determine the first transformation relationship corresponding to the image acquisition device based on the image coordinates and spatial coordinates of the at least four marker points.
[0091] The transformation between image coordinates and spatial coordinates can be viewed as a perspective transformation. The transformation relationship (also known as the transformation matrix) has 8 degrees of freedom. Therefore, it is necessary to establish 8 equations based on the image coordinates and spatial coordinates of 4 marker points to solve for the transformation matrix.
[0092] Assuming the spatial coordinates of the four marker points are: marker point A$(x'_1, y'_1)$, marker point B$(x'_2, y'_2)$, marker point C$(x'_3, y'_3)$, and marker point D$(x'_4, y'_4)$, the spatial coordinates of each marker point can be represented by formula (1).
[0093] $$\begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} = H \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$ Formula (1)
[0094] In formula (1), $(x'_i, y'_i, 1)$ on the left represents the spatial coordinates, $H$ in the middle represents the transformation matrix, and $(x_i, y_i, 1)$ on the right represents the image coordinates.
[0095] Based on the image coordinates and spatial coordinates of at least four marker points, the transformation matrix H, i.e., the first transformation relationship of the image acquisition device, can be obtained. Considering actual operational errors, more marker points can be marked to improve the accuracy of calibrating the first transformation relationship.
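To make the count of 8 equations concrete, formula (1) can be expanded as follows. This is a sketch that assumes the entries of $H$ are written $h_{jk}$ and that formula (1) holds up to a homogeneous scale factor, which is the usual convention for a perspective transformation but is not spelled out in the text.

```latex
\begin{aligned}
x'_i &= \frac{h_{11}x_i + h_{12}y_i + h_{13}}{h_{31}x_i + h_{32}y_i + h_{33}}, &
y'_i &= \frac{h_{21}x_i + h_{22}y_i + h_{23}}{h_{31}x_i + h_{32}y_i + h_{33}} \\[4pt]
0 &= h_{11}x_i + h_{12}y_i + h_{13} - x'_i\,(h_{31}x_i + h_{32}y_i + h_{33}) \\
0 &= h_{21}x_i + h_{22}y_i + h_{23} - y'_i\,(h_{31}x_i + h_{32}y_i + h_{33})
\end{aligned}
```

Each marker point contributes the last two linear equations in the nine entries of $H$. Since $H$ is only defined up to scale, 8 unknowns remain, so 4 marker points give exactly the 8 equations needed, and additional points make the system overdetermined.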
[0096] Since measuring the spatial coordinates of marker points is not only time-consuming but also prone to errors, in some embodiments, a calibration plate of a set size can be placed at the marker point, and the center point of the calibration plate can be aligned with the marker point. In this way, after measuring the spatial coordinates of the marker point, the spatial coordinates of at least one vertex of the calibration plate can be obtained according to the size of the calibration plate. This method can reduce the workload of measurement and reduce the error caused by measurement.
[0097] For example, if the calibration plate is a 20*20 rectangular calibration plate and the spatial coordinates of marker point A are determined to be (20, 30), then, since the center point of the rectangular calibration plate coincides with marker point A, the spatial coordinates of the four vertices of the rectangular calibration plate can be obtained as (10, 20), (30, 20), (10, 40), and (30, 40).
[0098] Those skilled in the art should understand that the shape of the calibration plate described above can be specifically set according to actual needs, and this embodiment does not impose any limitations on it.
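The vertex computation in this example can be reproduced with a short sketch. It assumes the plate edges are aligned with the axes of the world coordinate system, which the example implies but the text does not state; the function name is hypothetical.

```python
def plate_vertices(center, width, height):
    """Return the four corner coordinates of an axis-aligned plate centered on a marker point."""
    cx, cy = center
    half_w, half_h = width / 2.0, height / 2.0
    return [(cx - half_w, cy - half_h), (cx + half_w, cy - half_h),
            (cx - half_w, cy + half_h), (cx + half_w, cy + half_h)]

print(plate_vertices((20, 30), 20, 20))
# [(10.0, 20.0), (30.0, 20.0), (10.0, 40.0), (30.0, 40.0)]
```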
[0099] In this case, the calibration module is specifically used to: obtain the spatial coordinates of the marker point; determine the spatial coordinates of at least one vertex in the calibration board based on the spatial coordinates of the marker point and the size of the calibration board; obtain the image coordinates of the vertex from the calibration image; and determine the first transformation relationship corresponding to the image acquisition device based on at least four sets of coordinate pairs, wherein each set of coordinate pairs includes spatial coordinates and image coordinates corresponding to the spatial coordinates.
[0100] In one example, the spatial coordinates of a marker point can be obtained. Then, based on the spatial coordinates of the marker point and the size of the calibration plate, the spatial coordinates of four vertices in the calibration plate are determined, and the image coordinates of the four vertices are obtained from the calibration image. Based on the spatial coordinates and image coordinates of the four vertices, a first transformation relationship corresponding to the image acquisition device is determined. The calibration method in this example can reduce the workload of measurement.
[0101] In another example, the spatial coordinates of four marker points can be obtained. Then, based on the spatial coordinates of each marker point and the size of the calibration plate at each marker point, the spatial coordinates of four vertices on the corresponding calibration plate are determined, and the image coordinates of each vertex are obtained from the calibration image. In this case, 16 sets of coordinate pairs can be obtained, each set including a spatial coordinate and an image coordinate corresponding to the spatial coordinate. Based on these 16 sets of coordinate pairs, the first transformation relationship corresponding to the image acquisition device can be determined. The calibration method in this example can obtain multiple spatial coordinates with less measurement work. Calibration based on these multiple spatial coordinates and the corresponding image coordinates can improve the accuracy of calibrating the first transformation relationship.
[0102] In one implementation, to save storage space, a QR code can be generated based on the spatial coordinates of the marker point, and the QR code can be printed as a fixed-size QR code image. The QR code image is then placed at the corresponding marker point, with the center point of the QR code image coinciding with the marker point. Figure 6 is a schematic diagram illustrating the calibration of a first transformation relationship according to an embodiment of the present invention. As shown in Figure 6, the calibration process includes steps 601 to 605.
[0103] In step 601, a calibration image is acquired by the image acquisition device, the calibration image including at least 4 QR code images.
[0104] In step 602, the QR codes in the calibration image are located, and the image coordinates of the four vertices in each QR code image are obtained.
[0105] In step 603, each QR code is decoded to obtain the spatial coordinates of the corresponding marker point.
[0106] In step 604, the spatial coordinates of the four vertices in the QR code image are determined based on the spatial coordinates of each marker point and the size of the QR code image.
[0107] In step 605, based on the 16 sets of spatial coordinates and image coordinates obtained from the four QR code images, the first transformation relationship corresponding to the image acquisition device is determined.
[0108] With 16 sets of spatial coordinates and image coordinates obtained, solving the system of equations corresponding to the above formula (1) can be transformed into solving the homogeneous system of equations Ax = 0. Because this system is overdetermined, solving Ax = 0 can in turn be transformed into the nonlinear optimization problem min ||Ax||_2 over nonzero x, whose solution can be obtained by calculating the eigenvalues and eigenvectors of A'A, where A is the coefficient matrix. In this embodiment, the least squares solution can be obtained through formula (2).
[0109] [V,D] = eig(A'*A) Formula (2)
[0110] In formula (2), D is the diagonal matrix of the eigenvalues of A'A, V is the matrix whose columns are the eigenvectors corresponding to the eigenvalues in D, and A' denotes the transpose of A. With the eigenvalues arranged so that the smallest comes first on the main diagonal, the least squares solution is V(1), the first column of V. That is, the eigenvector of A'A corresponding to its smallest eigenvalue is the least squares solution of the overdetermined system of equations Ax = 0.
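The eigenvector-based solution of formula (2) can be sketched as follows. Formula (1) is not reproduced in this part of the description, so the sketch assumes that the first transformation relationship is a 3x3 planar homography H mapping homogeneous image coordinates (u, v, 1) to ground-plane spatial coordinates (x, y). NumPy's `numpy.linalg.eigh` is used here in place of `eig`; it returns the eigenvalues of a symmetric matrix in ascending order, so the first eigenvector column corresponds to the smallest eigenvalue. The function names are hypothetical.

```python
import numpy as np


def build_coefficient_matrix(image_pts, world_pts):
    """Stack two rows of A per coordinate pair (assumed homography model)."""
    rows = []
    for (u, v), (x, y) in zip(image_pts, world_pts):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    return np.asarray(rows, dtype=float)


def solve_homography(image_pts, world_pts):
    """Least squares solution of Ax = 0 via the eigenvectors of A'A."""
    A = build_coefficient_matrix(image_pts, world_pts)
    # eigh: eigenvalues ascending, eigenvectors stored as columns.
    _, eigvecs = np.linalg.eigh(A.T @ A)
    H = eigvecs[:, 0].reshape(3, 3)
    # Remove the arbitrary scale (assumes H[2, 2] is not close to zero).
    return H / H[2, 2]


def image_to_world(H, u, v):
    """Apply H to an image coordinate and dehomogenise."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w
```

With 16 coordinate pairs the matrix A has 32 rows and 9 columns. In practice, normalizing the coordinates before building A is commonly done to improve numerical conditioning; that step is omitted here for brevity.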
[0111] In this embodiment of the invention, the target objects in the images to be processed acquired by each image acquisition device from its own acquisition perspective are transformed into the same world coordinate system through the first transformation relationship corresponding to each image acquisition device, so that the user can view the distribution of target objects in the target area from a global perspective.
[0112] In some embodiments, where the acquisition viewpoints of the first image acquisition device and the second image acquisition device partially overlap, the calibration module is further configured to: acquire a first image acquired by the first image acquisition device and a second image acquired by the second image acquisition device, wherein the first image and the second image contain the same reference object; determine a second transformation relationship between the first image acquisition device and the second image acquisition device based on the first image coordinates of the reference object in the first image and the second image coordinates in the second image; acquire a first transformation relationship corresponding to the first image acquisition device; and determine a first transformation relationship of the second image acquisition device based on the first transformation relationship of the first image acquisition device and the second transformation relationship between the first image acquisition device and the second image acquisition device.
[0113] In this embodiment, the transformation relationship between the first image coordinates of the same reference object in the first image acquisition device and the second image coordinates of the same reference object in the second image acquisition device is called the second transformation relationship.
[0114] If the acquisition viewpoints of the first image acquisition device and the second image acquisition device partially overlap, then the first image acquired by the first image acquisition device and the second image acquired by the second image acquisition device contain the same object. In this case, the second transformation relationship between the first image acquisition device and the second image acquisition device can be determined based on the first image coordinates of the same object in the first image and the second image coordinates of the object in the second image.
[0115] For example, the first image acquired by the first image acquisition device and the second image acquired by the second image acquisition device contain the same reference object G. Based on the first image coordinates of the reference object G in the first image and the second image coordinates of the reference object G in the second image, a second transformation relationship between the first image acquisition device and the second image acquisition device is determined.
[0116] Once the second transformation relationship between the first image acquisition device and the second image acquisition device has been obtained, the first transformation relationship corresponding to the first image acquisition device is obtained, and the first transformation relationship of the second image acquisition device is determined based on the first transformation relationship of the first image acquisition device and the second transformation relationship between the two devices.
[0117] In other words, for image acquisition devices without overlapping perspectives, the aforementioned method can be used to calibrate each device separately, i.e., each device is calibrated from its own image coordinate system to the same world coordinate system, yielding a separate transformation matrix H for each image acquisition device. For image acquisition devices with overlapping perspectives, the perspective transformation relationship HN between the two devices can be determined from the overlapping region. In this case, calibrating the transformation matrix H of one image acquisition device allows the transformation matrix H' of the other image acquisition device to be calculated as H' = H * HN. The calibration method in this embodiment not only reduces the number of ground markers required and the calibration time, but also, when an image acquisition device moves in subsequent use, allows the second transformation relationship to be recalculated from matching points in the overlapping region and the transformation matrix H to be recalculated from it, thereby reducing unnecessary calibration steps.
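The relation H' = H * HN can be illustrated with a small sketch. It assumes a column-vector convention in which HN maps image coordinates of the second device to image coordinates of the first device, and H maps image coordinates of the first device to spatial coordinates; the description does not state the direction of HN explicitly, so this is an assumption. The matrices and helper names are illustrative values only.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(h, u, v):
    """Map a point through a 3x3 homography and dehomogenise."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


# Hypothetical values: H scales pixels of device 1 to metres,
# HN shifts pixels of device 2 by 100 px into the frame of device 1.
H = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
HN = [[1.0, 0.0, 100.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# First transformation relationship of the second device.
H_prime = matmul3(H, HN)
```

Applying `H_prime` to a pixel of the second device gives the same spatial coordinate as first applying `HN` and then `H`, which is why only one of the two devices needs ground markers.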
[0118] In this embodiment of the invention, the image processing module can be applied to an edge box and / or a server. The edge box can be connected to at least one image acquisition device, and the result display module can be applied to a server or a terminal device. For example, a user can use the target positioning system provided in this embodiment to view the distribution of target objects on a server or terminal device.
[0119] When the image processing module is applied to the edge box, the server or terminal device is also used to send processing instructions to the edge box according to user instructions; the edge box is also used to acquire the image to be processed acquired by the image acquisition device indicated by the processing instructions.
[0120] The processing instructions can also be used to instruct the detection of target objects from the images to be processed, such as instructing the detection of human bodies from the images to be processed acquired by image acquisition devices numbered 1 and 2, and the detection of vehicles from the images to be processed acquired by image acquisition devices numbered 3 and 4.
[0121] In some scenarios, if the target area is large or comprises multiple sub-areas in different locations (e.g., an exhibition hall may include different exhibition areas), multiple image acquisition devices can be used to acquire images of the target area to be processed. In this case, the image acquisition devices are relatively dispersed. To reduce the pressure on data transmission, at least one image acquisition device can be configured with an edge box or server to handle the processing of the images. Compared to configuring a server for the image acquisition devices, configuring an edge box for them can save on configuration costs.
[0122] In one implementation, as shown in Figure 7, the hard disk recorder 704 can be used to manage each image acquisition device in the image acquisition device group 701 and store the images to be processed acquired by each image acquisition device for easy viewing by the terminal device 703. In scenarios with a large number of image acquisition devices, the hard disk recorder 704 manages and stores the images to be processed acquired by the image acquisition devices. The terminal device 703 can issue processing instructions to the edge box 702 according to user instructions. The edge box 702 can obtain the images to be processed acquired by the image acquisition devices indicated by the processing instructions from the hard disk recorder 704 and report the spatial coordinates (positioning data) of the target object to the terminal device 703.
[0123] In some embodiments, the edge box may include multiple image processing modules, each corresponding to an image acquisition device. This processing structure improves the flexibility of the edge box and facilitates parallel processing of images acquired by multiple image acquisition devices. Utilizing the edge box's powerful computing capabilities, images acquired by multiple image acquisition devices can be processed simultaneously to locate target objects within the target area.
[0124] In one implementation, the image acquisition device can acquire images to be processed in real time, thus obtaining the video stream acquired by the image acquisition device. As the number of image acquisition devices increases, the computing power required to process the video streams acquired by the image acquisition devices also increases accordingly. In this embodiment of the invention, a corresponding image processing module can be configured for each image acquisition device in the edge box. The video streams acquired by each image acquisition device are processed using its respective image processing module, providing more powerful computing power support. On the other hand, providing a corresponding image processing module for each image acquisition device allows for independent processing of the video streams of each device, avoiding mutual interference and thus achieving decoupling, facilitating processing according to user needs. For example, in the case of multiple image acquisition devices, all or some of the image processing modules can be activated according to user needs.
[0125] Furthermore, for an image processing module, the entire processing of a single frame of image to be processed can be divided into multiple processing tasks, with corresponding threads created for each task. This approach can improve the processing efficiency of the image processing module. It should be noted that the processing time for each task varies. If the entire processing of the image processing module is implemented using a single thread, the time required for that thread to complete the entire process will be long, leading to the loss of images from the video stream. In other words, due to the low efficiency of the image processing module, the images captured by the image acquisition device will not be utilized.
[0126] For example, suppose the entire processing process is E, which includes two processing tasks, E1 and E2. If a single thread is used, the thread processes the first frame of the image to be processed completely, and processes the second frame only after the first frame is finished. If, in this embodiment, threads F1 and F2 are created for the two processing tasks, thread F1 processes task E1 of the first frame and passes the result to thread F2, which processes task E2 of the first frame. While thread F2 is processing task E2 of the first frame, thread F1 can already process task E1 of the second frame, and so on. It can be seen that compared to using a single thread to complete the entire processing process, using multiple threads to process different processing tasks for different images not only improves processing efficiency but also avoids the loss of images in the video stream due to excessive processing time.
[0127] Figure 8 is a schematic diagram illustrating the interaction between threads in an image processing module according to an embodiment of the present invention. As shown in Figure 8, one image acquisition device corresponds to one image processing module, and each image processing module includes multiple threads.
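The two-thread arrangement in this example can be sketched with Python's standard `threading` and `queue` modules. This is an illustrative sketch rather than the embodiment's implementation: the function `run_pipeline`, the hand-over queue, and the stop sentinel are assumptions introduced here, and `task_e1` and `task_e2` stand in for the processing tasks E1 and E2.

```python
import queue
import threading

_STOP = object()  # sentinel telling the second thread that no more frames follow


def run_pipeline(frames, task_e1, task_e2):
    """Run E1 in thread F1 and E2 in thread F2, handing frames over via a queue."""
    handover = queue.Queue()
    results = []

    def f1():
        for frame in frames:
            # F1 moves on to the next frame as soon as E1 is done for this one.
            handover.put(task_e1(frame))
        handover.put(_STOP)

    def f2():
        while True:
            item = handover.get()
            if item is _STOP:
                break
            results.append(task_e2(item))

    threads = [threading.Thread(target=f1), threading.Thread(target=f2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the queue is first-in, first-out and only F2 writes to `results`, the frames leave the pipeline in the order in which they entered it.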
[0128] In this embodiment of the invention, the entire processing procedure of the image processing module includes acquiring an image to be processed captured by an image acquisition device, acquiring the image coordinates of a target object in the image to be processed, transforming the image coordinates to obtain the spatial coordinates of the target object, and finally sending the spatial coordinates of the target object to the result display module. The entire processing procedure can be divided into three processing tasks: (1) acquiring the image to be processed captured by the image acquisition device; (2) acquiring the image coordinates of the target object in the image to be processed and transforming the image coordinates to obtain the spatial coordinates of the target object; and (3) sending the spatial coordinates of the target object to the result display module.
[0129] At least one thread is created for each processing task. As shown in Figure 8, for the task of acquiring images to be processed from an image acquisition device, a streaming thread and a decoding thread are created. The streaming thread is responsible for acquiring the data to be processed from the video stream of the image acquisition device, and the decoding thread is responsible for decoding the data to be processed acquired by the streaming thread to obtain the image data to be processed. For the task of acquiring the image coordinates of the target object in the image to be processed and converting the image coordinates to obtain the spatial coordinates of the target object, a model inference service thread and an algorithm implementation thread are created. The model inference service thread is responsible for loading and initializing the trained target detection model and moving object detection module, using the target detection model to perform target detection on the image to be processed to obtain the target object detection box, and using the moving object detection module to perform moving object detection on the image to be processed to obtain the moving object detection result. The algorithm implementation thread is responsible for obtaining the image coordinates of the target object based on the target object detection box when the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, and for converting the image coordinates into spatial coordinates according to the first transformation relationship corresponding to the image acquisition device. For the task of sending the spatial coordinates of the target object to the result display module, a data sending thread is created.
[0130] In addition, since the sending efficiency of the data sending thread depends on the network environment between the edge box and the server or terminal device, in order to avoid the loss of processing results or thread blocking caused by network quality problems, the algorithm implementation thread pushes the processing results (i.e., spatial coordinates) into the corresponding message queue, and the data sending thread retrieves the processing results to be sent from the message queue.
[0131] It should be noted that those skilled in the art should understand that the division of the above-mentioned processing tasks and the creation of corresponding threads can be specifically set according to actual needs. This embodiment does not limit the way the entire processing process is divided, or the number of corresponding threads created for each processing task.
[0132] Furthermore, if a model inference service thread is created for each image processing module, space will be wasted due to data redundancy. Therefore, in this embodiment, multiple image processing modules can share a single model inference service thread. That is, the loading and initialization process of the target detection model and the moving object detection model is completed in the model inference service thread. When the algorithm implementation thread needs to perform target detection and / or moving object detection, it calls the model inference service thread and obtains the corresponding detection results.
[0133] In practical applications, the image processing module can also use the receiving control parameter thread to receive processing instructions, such as algorithm control parameters, sent from the server or terminal device to the edge box, and use the algorithm control parameter thread to update the parameters of the algorithm implementation thread.
[0134] For example, the current edge box processes the images to be processed acquired by image acquisition devices numbered 1, 2, and 3. When the receiving control parameter thread receives a processing instruction to process the images to be processed acquired by image acquisition devices numbered 1, 4, and 5, the algorithm control parameter thread can update the address parameters in the algorithm implementation thread based on the addresses corresponding to image acquisition devices numbered 4 and 5. A corresponding streaming thread and decoding thread are created for image acquisition device numbered 4, and similarly, for image acquisition device numbered 5. The streaming threads and decoding threads corresponding to image acquisition devices numbered 2 and 3 will be closed to release the corresponding resources.
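The bookkeeping in this example amounts to comparing the set of devices currently being processed with the set named in the new processing instruction. A minimal sketch follows; the function name `plan_thread_changes` and the returned dictionary layout are hypothetical.

```python
def plan_thread_changes(active_devices, requested_devices):
    """Decide which devices need threads created, closed, or left untouched."""
    active = set(active_devices)
    requested = set(requested_devices)
    return {
        "start": sorted(requested - active),  # create streaming and decoding threads
        "stop": sorted(active - requested),   # close threads and release resources
        "keep": sorted(active & requested),   # already running, nothing to do
    }


plan = plan_thread_changes([1, 2, 3], [1, 4, 5])
# plan == {"start": [4, 5], "stop": [2, 3], "keep": [1]}
```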
[0135] In this embodiment of the invention, the entire processing process is divided into different processing tasks, which are executed by different threads. In the event of a failure in the processing process, the fault can be located by elimination, thereby improving the efficiency of fault location.
[0136] Those skilled in the art should understand that the type of edge box can be selected according to actual needs. For example, NVIDIA's Jetson AGX Xavier module can be used. With the help of AGX's powerful computing capabilities, it can process images to be processed from multiple image acquisition devices at the same time, thereby locating the target object in the target area. This invention does not limit this.
[0137] Figure 9 is a schematic diagram illustrating a target localization method according to an embodiment of the present invention. As shown in Figure 9, this embodiment provides a target localization method, including the following steps 901 to 903.
[0138] In step 901, an image of the target area to be processed is acquired by at least one image acquisition device.
[0139] In step 902, the image coordinates of the target object in the image to be processed are obtained, and the image coordinates are transformed to obtain the spatial coordinates of the target object.
[0140] In step 903, the spatial coordinates of the target object are sent to the result display module, so that the result display module displays the spatial coordinates of the target object on the map corresponding to the target area.
[0141] In some embodiments, obtaining the image coordinates of the target object in the image to be processed includes:
[0142] Perform moving object detection on the image to be processed to obtain moving object detection results;
[0143] In response to the moving object detection result including a moving object detection box, target detection is performed on the image to be processed to obtain a target object detection box;
[0144] In response to the overlapping area between the moving object detection box and the corresponding target object detection box meeting the set requirements, the image coordinates of the target object are obtained based on the target object detection box.
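The three steps above can be sketched as a gating function. The description does not specify the set requirement, so this sketch assumes that it is a minimum ratio between the intersection area and the area of the target object detection box, and that a target box is kept if any moving object detection box satisfies it. Boxes are `(x1, y1, x2, y2)` tuples; all names and the default threshold are hypothetical.

```python
def overlap_ratio(motion_box, target_box):
    """Intersection area divided by the area of the target object detection box."""
    x1 = max(motion_box[0], target_box[0])
    y1 = max(motion_box[1], target_box[1])
    x2 = min(motion_box[2], target_box[2])
    y2 = min(motion_box[3], target_box[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (target_box[2] - target_box[0]) * (target_box[3] - target_box[1])
    return inter / area if area > 0 else 0.0


def select_target_boxes(motion_boxes, run_target_detector, min_ratio=0.5):
    """Run target detection only when motion was found, then filter by overlap."""
    if not motion_boxes:
        return []  # no moving object detection box: target detection is skipped
    kept = []
    for target_box in run_target_detector():
        if any(overlap_ratio(m, target_box) >= min_ratio for m in motion_boxes):
            kept.append(target_box)
    return kept
```

Passing the detector as a callable keeps the more expensive target detection from running on frames in which no moving object was found.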
[0145] In some embodiments, the target object includes a human body, and obtaining the image coordinates of the target object based on the target object detection box includes:
[0146] The image coordinates of the human body are determined based on the position information of the human body's feet in the target object detection box.
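One simple way to approximate the position of the feet is to take the midpoint of the bottom edge of the target object detection box. This choice is an assumption made for illustration; the description only states that the position information of the feet is used. The box is an `(x1, y1, x2, y2)` tuple with the image y axis pointing downward, and the function name is hypothetical.

```python
def foot_point(target_box):
    """Bottom-centre of a detection box, used as a stand-in for the feet position."""
    x1, _, x2, y2 = target_box
    return ((x1 + x2) / 2.0, y2)


u, v = foot_point((100, 50, 140, 250))  # (120.0, 250)
```

A point on the ground is a natural choice here because a transformation calibrated with markers on the ground plane is only valid for points lying in that plane.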
[0147] In some embodiments, the method further includes:
[0148] If the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, updating of the detection conditions for moving object detection is stopped. The detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or the background.
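The effect of stopping the update can be illustrated with a toy background model. The sketch assumes that the detection conditions are a per-pixel running-average background, which is only one possible form; the description does not specify the moving object detection algorithm. The function name, the flat pixel list, and the learning rate are hypothetical.

```python
def update_background(background, frame, rate, target_confirmed):
    """Blend the new frame into the background unless a target has been confirmed."""
    if target_confirmed:
        # Updating is stopped: the background model is returned unchanged.
        return list(background)
    return [(1.0 - rate) * b + rate * f for b, f in zip(background, frame)]


background = [10.0, 10.0]
frame = [20.0, 30.0]
updated = update_background(background, frame, 0.5, target_confirmed=False)
frozen = update_background(background, frame, 0.5, target_confirmed=True)
```

In this toy model, `updated` moves toward the new frame while `frozen` stays equal to the previous background.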
[0149] The server described in this embodiment may include a display device, which may be any product or component with display function, such as electronic paper, mobile phone, tablet computer, television, laptop computer, digital photo frame, or navigator.
[0150] It should be noted that the dimensions of layers and regions may be exaggerated in the accompanying drawings for clarity. Furthermore, it is understood that when an element or layer is referred to as being "on" another element or layer, it can be directly on the other element, or there may be intermediate layers. Additionally, it is understood that when an element or layer is referred to as being "below" another element or layer, it can be directly below the other element, or there may be more than one intermediate layer or element. Furthermore, it is also understood that when a layer or element is referred to as being "between" two layers or two elements, it can be the only layer between the two layers or two elements, or there may be more than one intermediate layer or element. Similar reference numerals throughout indicate similar elements.
[0151] In this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. The term "multiple" refers to two or more unless otherwise expressly defined.
[0152] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. The invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of the invention are indicated by the following claims.
[0153] It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Claims
1. A target positioning system, characterized in that, The system includes: At least one image acquisition device, said image acquisition device being used to acquire an image of a target area to be processed; An image processing module is configured to perform moving object detection on the image to be processed to obtain a moving object detection result; in response to the moving object detection result including a moving object detection box, perform target detection on the image to be processed to obtain a target object detection box; in response to the overlapping area between the moving object detection box and the corresponding target object detection box meeting a set requirement, obtain the image coordinates of the target object based on the target object detection box, and transform the image coordinates to obtain the spatial coordinates of the target object; The results display module is used to display the spatial coordinates of the target object on the map corresponding to the target area.
2. The system according to claim 1, characterized in that the target object includes a human body, and the image processing module is specifically used for: determining the image coordinates of the human body based on the position information of the human body's feet in the target object detection box.
3. The system according to claim 1, characterized in that the image processing module is specifically used for: stopping the updating of the detection conditions for moving object detection if the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, wherein the detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or the background.
4. The system according to claim 1, characterized in that, The image processing module is specifically used for: Obtain the first transformation relationship corresponding to the image acquisition device, the first transformation relationship being used to indicate the relationship between image coordinates in the image coordinate system and spatial coordinates in the world coordinate system; The image coordinates are converted into spatial coordinates according to the first transformation relationship.
5. The system according to claim 1, characterized in that, The system also includes a calibration module; The calibration module is used to acquire a calibration image acquired by the image acquisition device. The calibration image includes at least four marker points, and each marker point corresponds to a spatial coordinate. Obtain the image coordinates of each marker point in the calibration image; The first transformation relationship corresponding to the image acquisition device is determined based on the image coordinates and spatial coordinates of at least four marker points.
6. The system according to claim 5, characterized in that, A calibration plate of a predetermined size is placed at the marked point, and the center point of the calibration plate coincides with the marked point; the calibration module is specifically used for: Obtain the spatial coordinates of the marker point; Based on the spatial coordinates of the marker points and the dimensions of the calibration board, determine the spatial coordinates of at least one vertex in the calibration board; Obtain the image coordinates of the vertex from the calibration image; The first transformation relationship corresponding to the image acquisition device is determined based on at least four sets of coordinate pairs, wherein each set of coordinate pairs includes spatial coordinates and image coordinates corresponding to the spatial coordinates.
7. The system according to claim 5 or 6, characterized in that, When the viewing angles of the first image acquisition device and the second image acquisition device partially overlap, the calibration module is further used for: Acquire a first image captured by the first image acquisition device and a second image captured by the second image acquisition device, wherein the first image and the second image contain the same reference object; Based on the first image coordinates of the reference object in the first image and the second image coordinates in the second image, a second transformation relationship between the first image acquisition device and the second image acquisition device is determined. Obtain the first conversion relationship corresponding to the first image acquisition device; Based on the first conversion relationship of the first image acquisition device and the second conversion relationship between the first image acquisition device and the second image acquisition device, the first conversion relationship of the second image acquisition device is determined.
8. The system according to any one of claims 1 to 6, characterized in that, The image processing module is applied to the edge box, the edge box is connected to at least one image acquisition device, and the result display module is applied to the terminal device. The terminal device is also used to send processing instructions to the edge box according to user instructions; The edge box is also used to acquire the image to be processed acquired by the image acquisition device indicated by the processing instruction.
9. The system according to claim 8, characterized in that, The edge box includes multiple image processing modules, each corresponding to an image acquisition device.
10. A target localization method, characterized in that the method includes: acquiring an image of the target area to be processed, captured by at least one image acquisition device; performing moving object detection on the image to be processed to obtain a moving object detection result; in response to the moving object detection result including a moving object detection box, performing target detection on the image to be processed to obtain a target object detection box; in response to the overlapping area between the moving object detection box and the corresponding target object detection box meeting the set requirements, obtaining the image coordinates of the target object according to the target object detection box, and transforming the image coordinates to obtain the spatial coordinates of the target object; and sending the spatial coordinates of the target object to the result display module, so that the result display module displays the spatial coordinates of the target object on the map corresponding to the target area.
11. The method according to claim 10, characterized in that the target object includes a human body, and obtaining the image coordinates of the target object based on the target object detection box includes: determining the image coordinates of the human body based on the position information of the human body's feet in the target object detection box.
12. The method according to claim 10, characterized in that the method further includes: stopping the updating of the detection conditions for moving object detection if the overlapping area between the moving object detection box and the corresponding target object detection box meets the set requirements, wherein the detection conditions are used to indicate whether each pixel in the image to be processed belongs to a moving object or the background.