A target positioning method and device, storage medium and electronic device

By extracting key point image features from monitoring equipment and matching them with 3D point cloud maps, combined with intrinsic parameter matrices and ground depth functions, the problem of low satellite positioning accuracy was solved, and high-precision target positioning was achieved.

CN116342690B · Active · Publication Date: 2026-06-26 · BEIJING SANKUAI ONLINE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING SANKUAI ONLINE TECH CO LTD
Filing Date: 2023-03-20
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, satellite positioning is not accurate enough for scenarios requiring high-precision positioning, and the beacon-based positioning typically used instead suffers from signal interference when many targets are present, which degrades the accuracy of target positioning.

Method used

By acquiring historical monitoring images collected by monitoring equipment, extracting image features of key points, and matching them with a pre-constructed 3D point cloud map, the 3D coordinates of the key points are determined. Combined with the intrinsic parameter matrix, extrinsic parameter matrix, and ground depth function of the monitoring equipment, the 3D coordinates of the target object are calculated.

Benefits of technology

It improves the accuracy of target positioning, ensures accurate positioning of targets in complex environments, and reduces the impact of signal interference on positioning results.

Abstract

In the target object positioning method provided in this specification, key points are first extracted from historical monitoring images collected by a monitoring device. The extracted key points are matched against a pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring device to determine the three-dimensional coordinates of the key points, and an extrinsic parameter matrix of the monitoring device is determined from those coordinates. In response to a positioning request, a target object and the real-time monitoring image in which it appears are determined, and the three-dimensional coordinates of the target object are determined from its two-dimensional coordinates in the real-time monitoring image, the intrinsic and extrinsic parameter matrices of the monitoring device, and the determined ground depth function. Because the extrinsic parameter matrix is derived from the three-dimensional coordinates of the extracted key points, the three-dimensional coordinates of the target object can be recovered from its two-dimensional coordinates in the real-time monitoring image, which greatly improves the positioning accuracy of the target object.

Description

Technical Field

[0001] This application relates to the field of computers, and more particularly to a method, apparatus, storage medium, and electronic device for locating a target object.

Background Technology

[0002] With the development of the times, the demand for target object positioning is increasing, and the requirements for positioning accuracy are becoming higher and higher. Currently, by positioning targets, it is possible to record the movement trajectory of targets, or to collect statistics on passenger flow information in a scene, etc.

[0003] In existing technologies, due to the relatively low accuracy of satellite positioning, beacon positioning is typically used in scenarios requiring high precision. For example, in a warehouse setting where the work trajectories of workers need to be recorded, numerous Bluetooth beacons can be deployed to determine the signal strength between each worker's Bluetooth device and different beacons, thus determining the distance between the worker and the beacons and achieving worker positioning. However, when there are many workers, signal interference can occur, leading to inaccurate positioning results.

[0004] Therefore, improving the accuracy of target positioning is an urgent problem to be solved.

Summary of the Invention

[0005] This specification provides a method, apparatus, storage medium, and electronic device for locating a target object, in order to at least partially solve the above-mentioned problems.

[0006] The embodiments in this specification adopt the following technical solutions:

[0007] This specification provides a method for locating a target object, the method comprising:

[0008] Acquire historical monitoring images collected by monitoring equipment, and determine the image features of key points in the historical monitoring images;

[0009] Determine the pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring equipment;

[0010] Image feature matching is performed between the key points and the three-dimensional points in the three-dimensional point cloud map to determine the three-dimensional point that matches each key point, and the three-dimensional coordinates of each key point are determined based on the three-dimensional coordinates of the matched three-dimensional point.

[0011] Based on the determined three-dimensional coordinates of the key points, the extrinsic parameter matrix of the monitoring device is determined, and based on the three-dimensional points matched with the key points belonging to the ground, the ground depth function of the monitoring device is determined.

[0012] In response to a positioning request, a target object in the real-time monitoring image acquired by the monitoring device is identified, and the three-dimensional coordinates of the target object are determined based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

[0013] Optionally, constructing the 3D point cloud map corresponding to the monitoring range of the monitoring device specifically includes:

[0014] Acquire environmental images around the monitoring equipment;

[0015] The pose of each environmental image is determined based on the pose of the acquisition device when acquiring the environmental images.

[0016] Based on each environmental image and its corresponding pose, an initial 3D point cloud map is constructed.

[0017] Based on the initial three-dimensional point cloud map, a three-dimensional point cloud map corresponding to the monitoring range of the monitoring device is determined, and based on at least some environmental images, the image features corresponding to the three-dimensional points in the three-dimensional point cloud map are determined.

[0018] Optionally, performing image feature matching between the key points and the 3D points in the 3D point cloud map to determine the 3D point that matches each key point, and determining the 3D coordinates of the key point based on the 3D coordinates of the matched 3D point, specifically includes:

[0019] For each key point, the image features of the key point are matched with the image features of each three-dimensional point in the three-dimensional point cloud to determine the similarity between the key point and each three-dimensional point.

[0020] The 3D coordinates of the 3D point with the highest similarity are used as the 3D coordinates of the key point.

[0021] Optionally, the three-dimensional coordinates of the target object are determined based on its two-dimensional coordinates in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function. Specifically, this includes:

[0022] Target object identification is performed on the real-time monitoring image to determine the image region of the target object in the real-time monitoring image;

[0023] The two-dimensional coordinates of the target object in the real-time monitoring image are determined based on the lower edge of the image region in the real-time monitoring image;

[0024] The three-dimensional coordinates of the target object are determined based on the two-dimensional coordinates, the intrinsic and extrinsic parameter matrices of the monitoring device, and the ground depth function.
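
The step of taking the two-dimensional coordinates from the lower edge of the image region can be sketched as follows. This is a minimal illustration, not the patented implementation: the box layout `(x_min, y_min, x_max, y_max)`, the function name `ground_contact_pixel`, and the choice of the midpoint of the lower edge are all assumptions made here; the specification only states that the lower edge is used.

```python
from typing import Tuple


def ground_contact_pixel(bbox: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Return the midpoint of the lower edge of a bounding box.

    bbox is (x_min, y_min, x_max, y_max) in pixels. Image coordinates are
    assumed to have y increasing downward, so the lower edge lies at y_max.
    The midpoint is an illustrative choice (hypothetical, not from the source).
    """
    x_min, y_min, x_max, y_max = bbox
    if x_max < x_min or y_max < y_min:
        raise ValueError("bbox must satisfy x_min <= x_max and y_min <= y_max")
    return ((x_min + x_max) / 2.0, y_max)
```

The lower edge is a natural choice because it is the part of the region most likely to touch the ground, which is where a ground depth function is valid.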

[0025] Optionally, multiple monitoring devices are installed in the area where the target object is located;

[0026] The method further includes:

[0027] In response to a positioning request, determine the real-time monitoring images collected by each monitoring device at the same time, and determine the three-dimensional coordinates of each target object in each real-time monitoring image;

[0028] Based on the three-dimensional coordinates of each target object, duplicate targets are removed;

[0029] Store the three-dimensional coordinates of the target object after deduplication.

[0030] Optionally, deduplication can be performed based on the three-dimensional coordinates of each target object, specifically including:

[0031] Based on the three-dimensional coordinates of each target object, target objects whose distance between them is no greater than a preset first threshold are identified as the same target objects.

[0032] The three-dimensional coordinates corresponding to the same target are deduplicated, and one three-dimensional coordinate is retained as the three-dimensional coordinate of the same target.

[0033] Optionally, target objects whose distance is no greater than a preset first threshold are identified as the same target object, specifically including:

[0034] Target object identification is performed on each real-time monitoring image to determine the image region of the target object in each real-time monitoring image;

[0035] Based on the image regions, determine the image similarity between target objects;

[0036] Target objects whose image similarity to one another is not less than a preset second threshold are identified as target objects to be determined.

[0037] Based on the three-dimensional coordinates of the target objects to be determined, those whose distance from one another is not greater than the preset first threshold are identified as the same target object.
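
The deduplication described above can be sketched as a greedy pass over the detections from all monitoring devices. This is only an illustration under assumptions made here: the function name `deduplicate`, the use of cosine similarity between appearance vectors as the "image similarity", and the rule of keeping the first detection of each group are hypothetical choices, not taken from the specification.

```python
import numpy as np


def deduplicate(coords, first_threshold, features=None, second_threshold=None):
    """Return indices of detections kept after removing duplicates.

    coords: (N, 3) three-dimensional coordinates of detected target objects.
    first_threshold: maximum distance at which two detections count as one target.
    features / second_threshold: optional (N, D) appearance vectors and the
    minimum cosine similarity required before two detections may be merged.
    """
    coords = np.asarray(coords, dtype=float)
    feats = None
    if features is not None:
        feats = np.asarray(features, dtype=float)
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    kept = []
    for i in range(len(coords)):
        duplicate = False
        for j in kept:
            close = np.linalg.norm(coords[i] - coords[j]) <= first_threshold
            similar = True
            if feats is not None:
                similar = float(feats[i] @ feats[j]) >= second_threshold
            if close and similar:
                duplicate = True  # same target already retained as detection j
                break
        if not duplicate:
            kept.append(i)
    return kept
```

A greedy pass compares each detection only with the ones already kept, so it does not merge chains of detections transitively; a clustering step could be substituted if that behaviour is needed.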

[0038] This specification also provides a target positioning device, the device comprising:

[0039] The acquisition module is used to acquire historical monitoring images collected by the monitoring equipment and determine the image features of key points in the historical monitoring images;

[0040] The range determination module is used to determine the pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring equipment;

[0041] The matching module is used to perform image feature matching between the key point and the three-dimensional points in the three-dimensional point cloud map, determine the three-dimensional points that match the key point, and determine the three-dimensional coordinates of the key point based on the three-dimensional coordinates of the matched three-dimensional points.

[0042] The extrinsic parameter determination module is used to determine the extrinsic parameter matrix of the monitoring device based on the determined three-dimensional coordinates of the key points, and to determine the ground depth function of the monitoring device based on the three-dimensional points matched by the key points belonging to the ground.

[0043] The coordinate determination module is used to respond to a positioning request, determine the target object in the real-time monitoring image collected by the monitoring device, and determine the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

[0044] This specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described target location method.

[0045] This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the above-described target positioning method.

[0046] The above-described at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects:

[0047] In the target object positioning method provided in this specification, key points are first extracted from historical monitoring images collected by the monitoring equipment, and the extracted key points are matched with a pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring equipment to determine the three-dimensional coordinates of the key points. The extrinsic parameter matrix of the monitoring equipment is then determined using the three-dimensional coordinates. In response to a positioning request, the target object and the real-time monitoring image in which the target object is located are determined. Based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring equipment, and the determined ground depth function, the three-dimensional coordinates of the target object are determined.

[0048] As can be seen from the above method, by using the three-dimensional coordinates of the extracted key points to determine the extrinsic parameter matrix of the monitoring equipment, and by using the two-dimensional coordinates of the target object in the real-time monitoring image to determine the three-dimensional coordinates of the target object, the accuracy of the target object positioning is greatly improved.

Attached Figure Description

[0049] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0050] Figure 1 is a flowchart illustrating a target location method provided in an embodiment of this specification;

[0051] Figure 2 is a schematic diagram for determining key points on the ground, as provided in this specification;

[0052] Figure 3 is a schematic diagram illustrating ground depth, as provided in this specification;

[0053] Figure 4 is a schematic diagram illustrating the construction of the initial 3D point cloud map, as provided in this specification;

[0054] Figure 5 is a schematic diagram illustrating the semantic segmentation and recognition results provided in the embodiments of this specification;

[0055] Figure 6 is a schematic diagram illustrating the output of the bounding box of the target object using a model, as provided in the embodiments of this specification;

[0056] Figure 7 is a schematic diagram illustrating the output of the bounding box of the target object using a model, as provided in the embodiments of this specification;

[0057] Figure 8 is a schematic diagram of a target positioning device provided in an embodiment of this specification;

[0058] Figure 9 is a schematic diagram of an electronic device corresponding to Figure 1, as provided in this specification.

Detailed Implementation

[0059] To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments in this specification without creative effort are within the scope of protection of this application.

[0060] The technical solutions provided by the various embodiments of this application are described in detail below with reference to the accompanying drawings.

[0061] Figure 1 is a flowchart of a target location method provided in this specification, which includes the following steps:

[0062] S100: Acquire historical monitoring images collected by the monitoring equipment, and determine the image features of key points in the historical monitoring images.

[0063] The target object location method provided in this specification can be executed by a server or by an electronic device such as a personal computer (PC). For ease of description, the following uses only a server as the executing entity to explain the method.

[0064] In the embodiments described in this specification, the server can acquire historical monitoring images collected by the monitoring equipment and extract key points from these historical monitoring images to obtain the image features of the key points in the historical monitoring images. Here, historical monitoring images refer to monitoring images used for key point extraction to facilitate subsequent determination of the monitoring equipment's extrinsic parameter matrix. These historical monitoring images can be one or multiple images; this specification does not impose a specific limit on the number of historical monitoring images.

[0065] In the embodiments of this specification, by determining the key points and image features of the key points in historical monitoring images, the server can then use a pre-constructed 3D point cloud map of the area where the monitoring device is located to determine the 3D coordinates of the key points and obtain the extrinsic parameter matrix of the monitoring device through subsequent steps.

[0066] To improve the efficiency and accuracy of the subsequent matching between key points and 3D points, the subsequent steps should generally be based on key points and 3D points of static objects. Therefore, when acquiring historical monitoring images collected by the monitoring equipment, target object identification can be performed on each historical monitoring image to select the images that do not contain target objects. Key point extraction is then performed on the selected images, so that the extracted key points belong to static objects, which improves the efficiency of matching key points with 3D points in the initial 3D point cloud map. In the embodiments of this specification, to ensure the accuracy of target object positioning, the monitoring angle of the monitoring equipment is fixed, and the camera of the monitoring equipment cannot rotate.

[0067] S101: Determine the pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring equipment.

[0068] In the embodiments of this specification, a three-dimensional point cloud map can be constructed using environmental images pre-collected around the monitoring equipment. This three-dimensional point cloud map is then used to match the historical monitoring images. Since the collected environmental images may contain moving objects such as people, which can interfere with the matching of the three-dimensional point cloud map and the historical monitoring images, resulting in reduced matching efficiency, the three-dimensional point cloud map can be constructed based solely on static objects in the environmental images, such as tables, chairs, walls, etc.

[0069] In the embodiments of this specification, since the monitoring device is a single device, it is impossible to determine the extrinsic parameter matrix of the monitoring device based on the historical monitoring images captured by the monitoring device, and therefore it is impossible to locate the three-dimensional coordinates of the target object. Therefore, a three-dimensional point cloud map corresponding to the monitoring range of the monitoring device can be pre-constructed, and the three-dimensional coordinates of key points in the historical monitoring images can be determined using the three-dimensional point cloud map.

[0070] Specifically, the server first acquires several environmental images of the monitoring equipment's surroundings using acquisition devices, and then constructs an initial 3D point cloud map based on these images. This initial 3D point cloud map contains several 3D points, and each 3D point can include its 3D coordinates and corresponding image features. Since the 3D point cloud map is constructed from these environmental images, each 3D point corresponds to at least two environmental images. Therefore, the image features of a 3D point can be determined from an environmental image containing the corresponding pixel, for example the feature descriptor of the 3D point in image A. After the initial 3D point cloud map is constructed, image feature matching can be performed between the extracted key points and the initial 3D point cloud map.

[0071] However, in the embodiments of this specification, only the three-dimensional points within the monitoring range can be used to determine the three-dimensional coordinates of the key points, and the initial three-dimensional point cloud map constructed from the environmental images around the monitoring device may contain areas outside the monitoring range. Therefore, to avoid interference and improve the efficiency of image feature matching, the monitoring range of the monitoring device can first be determined within the area corresponding to the initial three-dimensional point cloud map, and image feature matching can then be performed between the key points and the three-dimensional point cloud map corresponding to that monitoring range.

[0072] For example, to locate people entering and exiting an office on a floor of a building, the extrinsic and intrinsic parameter matrices of the monitoring equipment for that office need to be calibrated. An initial 3D point cloud map of the floor containing the office is constructed, giving initial point cloud map A, and the area corresponding to the office in this map is designated B. Monitoring images captured by the monitoring equipment in that office are acquired, and the image features of their key points are determined. These key points can be matched against the 3D points in A; to avoid interference and improve the efficiency of image feature matching, they can instead be matched only against the 3D points in B.
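
Restricting the initial map A to the region B can be sketched as a simple spatial filter. The sketch assumes, purely for illustration, that the monitoring range is approximated by an axis-aligned box given by two corner points; the specification does not say how the range is represented, and the name `crop_to_region` is hypothetical.

```python
import numpy as np


def crop_to_region(points, descriptors, region_min, region_max):
    """Keep only the 3D points (and their descriptors) inside an axis-aligned box.

    points: (N, 3) coordinates of the initial point cloud map.
    descriptors: (N, D) image features stored with each 3D point.
    region_min, region_max: opposite corners of the assumed monitoring range.
    """
    points = np.asarray(points, dtype=float)
    descriptors = np.asarray(descriptors)
    lo = np.asarray(region_min, dtype=float)
    hi = np.asarray(region_max, dtype=float)
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return points[inside], descriptors[inside]
```

Matching against the cropped map reduces both the number of similarity comparisons and the chance of a key point matching a look-alike structure elsewhere on the floor.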

[0073] S102: Perform image feature matching between the key point and the three-dimensional points in the three-dimensional point cloud map to determine the three-dimensional points that match the key point, and determine the three-dimensional coordinates of the key point based on the three-dimensional coordinates of the matched three-dimensional points.

[0074] In the embodiments of this specification, the three-dimensional coordinates of the key point can be determined by performing image feature matching between the key point and the three-dimensional points in the three-dimensional point cloud. The specific method is as follows: determine the similarity between the key point and each three-dimensional point; among each similarity, determine the three-dimensional point with a similarity greater than a preset similarity threshold, and use the three-dimensional coordinates of the three-dimensional point as the three-dimensional coordinates of the key point; when there are multiple three-dimensional points with a similarity greater than the preset similarity threshold, among each similarity greater than the preset similarity threshold, determine the three-dimensional coordinates of the three-dimensional point corresponding to the highest similarity, and use the three-dimensional coordinates of the three-dimensional point as the three-dimensional coordinates of the key point.

[0075] However, the presence of multiple 3D points with a similarity greater than the preset similarity threshold indicates that the match between the key point and the 3D points may be ambiguous or erroneous. Therefore, to improve the accuracy of image feature matching, such a key point can instead be removed and a new key point determined.
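
The matching rule in S102, including the option of discarding ambiguous key points, can be sketched as below. Cosine similarity between descriptor vectors, the default threshold of 0.8, and the names `match_keypoints` and `reject_ambiguous` are assumptions made for this illustration; the specification does not fix a similarity measure or a threshold value.

```python
import numpy as np


def match_keypoints(kp_desc, pt_desc, pt_xyz, sim_threshold=0.8, reject_ambiguous=False):
    """Assign 3D coordinates to key points by descriptor similarity.

    kp_desc: (M, D) key point descriptors from the historical monitoring image.
    pt_desc: (N, D) descriptors stored with the 3D points of the map.
    pt_xyz:  (N, 3) coordinates of those 3D points.
    Returns a dict mapping key point index -> matched 3D coordinates.
    """
    kp = np.asarray(kp_desc, dtype=float)
    pt = np.asarray(pt_desc, dtype=float)
    xyz = np.asarray(pt_xyz, dtype=float)
    kp = kp / np.linalg.norm(kp, axis=1, keepdims=True)
    pt = pt / np.linalg.norm(pt, axis=1, keepdims=True)
    sim = kp @ pt.T  # cosine similarity between every key point and 3D point
    matches = {}
    for i, row in enumerate(sim):
        candidates = np.flatnonzero(row > sim_threshold)
        if candidates.size == 0:
            continue  # no 3D point is similar enough
        if reject_ambiguous and candidates.size > 1:
            continue  # several plausible matches: drop this key point
        best = candidates[np.argmax(row[candidates])]
        matches[i] = xyz[best]
    return matches
```

With `reject_ambiguous=False` the highest-similarity candidate is used; with `reject_ambiguous=True` a key point with more than one candidate above the threshold is skipped, leaving the caller to select a replacement key point.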

[0076] S103: Based on the determined three-dimensional coordinates of the key points, determine the extrinsic parameter matrix of the monitoring device, and based on the three-dimensional points matched with the key points belonging to the ground, determine the ground depth function of the monitoring device.

[0077] After determining the 3D coordinates of key points, the extrinsic parameter matrix of the monitoring device is determined using these coordinates. However, the specific 3D coordinates of the target object cannot be determined solely based on its 2D coordinates, the extrinsic parameter matrix, and the intrinsic parameter matrix. Therefore, it is necessary to identify ground-based key points from the historical monitoring images. Based on the 3D coordinates of these ground-based key points, the ground depth function of the monitoring device is determined. This ground depth function reflects the depth mapping relationship between the monitoring device and the ground. At this point, the server can determine the 3D coordinates of the target object based on its 2D coordinates, the extrinsic parameter matrix, the intrinsic parameter matrix, and the ground depth function. The intrinsic parameter matrix of the monitoring device refers to its focal length, pixel size, etc., and can be preset based on actual conditions or determined from the 3D coordinates of the key points. The extrinsic parameter matrix refers to the monitoring device's position, rotation direction, etc.
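
One way the extrinsic parameter matrix could be recovered from the matched key points is a direct linear transform (DLT) over the 2D-3D correspondences. The specification does not name a solver, so the sketch below is an assumption: it presumes a known intrinsic matrix `K`, an undistorted pinhole camera with the convention X_cam = R · X_world + t, and at least six non-coplanar correspondences. The function name `estimate_extrinsics` is hypothetical. In practice a robust PnP solver (for example OpenCV's `cv2.solvePnP`) would more likely be used.

```python
import numpy as np


def estimate_extrinsics(world_pts, pixel_pts, K):
    """Estimate rotation R and translation t from 2D-3D correspondences (DLT).

    world_pts: (N, 3) 3D coordinates of the key points, N >= 6, not coplanar.
    pixel_pts: (N, 2) pixel coordinates of the same key points.
    K: (3, 3) intrinsic parameter matrix, assumed known.
    """
    world_pts = np.asarray(world_pts, dtype=float)
    pixel_pts = np.asarray(pixel_pts, dtype=float)
    n = len(world_pts)
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    # Remove the intrinsics: normalized image coordinates (x, y, 1).
    pix_h = np.hstack([pixel_pts, np.ones((n, 1))])
    norm = (np.linalg.inv(K) @ pix_h.T).T
    norm = norm / norm[:, 2:3]
    Xh = np.hstack([world_pts, np.ones((n, 1))])
    # Two linear equations per correspondence in the 12 entries of [R|t].
    A = np.zeros((2 * n, 12))
    for i in range(n):
        x, y = norm[i, 0], norm[i, 1]
        A[2 * i, 0:4] = Xh[i]
        A[2 * i, 8:12] = -x * Xh[i]
        A[2 * i + 1, 4:8] = Xh[i]
        A[2 * i + 1, 8:12] = -y * Xh[i]
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)  # [R|t] up to an unknown scale and sign
    if np.linalg.det(P[:, :3]) < 0:
        P = -P  # fix the sign so the rotation part has positive determinant
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    R = U @ Vt_m          # closest rotation matrix to the 3x3 block
    t = P[:, 3] / S.mean()  # undo the unknown scale
    return R, t
```

The DLT is degenerate when all correspondences lie on one plane, so key points on walls and furniture are needed in addition to ground points.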

[0078] As shown in Figure 2 (the dashed-box portion), key points belonging to the ground are identified from among the key points in the historical monitoring image. Specifically, after determining the 3D coordinates of each key point, the RANSAC algorithm can be used to determine the key points belonging to the ground based on the dispersion of the 3D coordinates. Alternatively, semantic segmentation can be performed on the acquired historical monitoring images to determine the ground regions within the images; based on the 2D coordinates of these ground regions and the pre-constructed 3D point cloud map, the 3D coordinates of the ground regions can then be determined. Figure 2 shows a top-down view of the key points within the monitoring range of the monitoring equipment after their 3D coordinates have been determined. The solid lines represent the monitoring range, the dots represent the key points, and the dashed box encloses the key points identified as belonging to the ground area.
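
Selecting the ground key points with RANSAC can be sketched as a plane fit over the 3D coordinates. The distance threshold, iteration count, and the name `ransac_plane` are illustrative assumptions; the specification only states that RANSAC is applied to the 3D coordinates. The sketch also assumes the ground is the dominant plane among the key points.

```python
import numpy as np


def ransac_plane(points, dist_threshold=0.05, iterations=200, seed=0):
    """Fit the dominant plane to 3D points and flag its inliers.

    Returns ((normal, d), inlier_mask) for the plane normal . X + d = 0,
    where inlier_mask marks points closer than dist_threshold to the plane.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-12:
            continue  # the three sampled points are collinear
        normal = normal / length
        d = -normal @ sample[0]
        inliers = np.abs(pts @ normal + d) < dist_threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```

Key points flagged as inliers play the role of the dashed-box points in Figure 2; the remaining key points (walls, furniture) are still useful for estimating the extrinsic parameter matrix.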

[0079] After determining the three-dimensional coordinates of the ground area, a nonlinear optimization library, such as Ceres Solver, can be used to determine the ground depth function of the monitoring device. The depth in the ground depth function refers to the distance between the monitoring device and three-dimensional points within the area captured by the device, as shown in Figure 3. Figure 3 is a side view of the key points within the monitoring range of the monitoring equipment. After the key points are identified, their locations are examined to determine which of them belong to the ground area.

[0080] In Figure 3, the key points within the dashed box are densely distributed and all lie on the same plane, indicating that they are the key points of the ground area. After the ground-area key points are located, their three-dimensional coordinates are determined. Based on these coordinates, the distance between each ground-area key point within the monitoring range and the monitoring equipment can be determined, thereby establishing the ground depth function for the monitoring equipment. In Figure 3, the plane on which the ground is located is the xy plane, and the direction indicated by the arrow is the z-axis. The three-dimensional coordinates of each key point are expressed in this coordinate system.
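
A ground depth function can be sketched as a fit from pixel coordinates to depth over the ground key points. The sketch below swaps in ordinary linear least squares for the nonlinear optimization library mentioned above, and it rests on assumptions made here: the ground is planar, the camera is an undistorted pinhole, and "depth" means the z-coordinate in the camera frame rather than the Euclidean distance. Under those assumptions the inverse depth of a ground point is exactly linear in its pixel coordinates, 1/z = a·u + b·v + c. The name `fit_ground_depth` is hypothetical.

```python
import numpy as np


def fit_ground_depth(pixels, depths):
    """Fit 1/z = a*u + b*v + c to ground key points and return depth(u, v).

    pixels: (N, 2) pixel coordinates of ground key points.
    depths: (N,) camera-frame z-coordinates of the same points (all > 0).
    """
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    A = np.column_stack([pixels, np.ones(len(pixels))])
    coeffs, *_ = np.linalg.lstsq(A, 1.0 / depths, rcond=None)

    def depth(u, v):
        # Evaluate the fitted inverse-depth plane and invert it.
        return 1.0 / (coeffs[0] * u + coeffs[1] * v + coeffs[2])

    return depth
```

If the ground is uneven or the lens distortion is significant, a richer model fitted with a nonlinear solver would be the closer analogue of what the specification describes.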

[0081] S104: In response to a positioning request, determine the target object in the real-time monitoring image collected by the monitoring device, and determine the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

[0082] In steps S100-S103, by determining the three-dimensional coordinates of key points in the historical monitoring images collected by the monitoring equipment, the intrinsic and extrinsic parameter matrices of the monitoring equipment can be determined. After the intrinsic and extrinsic parameter matrices are determined, the server receives a positioning request, determines the real-time monitoring images collected by the monitoring equipment, identifies the target object in the real-time monitoring images, determines the two-dimensional coordinates of the target object in the real-time monitoring images, and determines the three-dimensional coordinates of the target object based on the two-dimensional coordinates, the intrinsic and extrinsic parameter matrices of the monitoring equipment, and the ground depth function, so as to realize the positioning of the target object.
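
The final computation in S104 can be sketched as a back-projection: the pixel is turned into a viewing ray with the intrinsic parameter matrix, scaled by the depth returned by the ground depth function, and moved into the map frame with the extrinsic parameters. The convention X_cam = R · X_world + t, the interpretation of depth as the camera-frame z-coordinate, and the name `locate_on_ground` are assumptions made for this illustration.

```python
import numpy as np


def locate_on_ground(pixel, K, R, t, depth_fn):
    """Map a pixel on the ground to 3D coordinates in the map frame.

    pixel: (u, v) two-dimensional coordinates of the target object.
    K: (3, 3) intrinsic parameter matrix.
    R, t: extrinsic rotation and translation, with X_cam = R @ X_world + t.
    depth_fn: callable (u, v) -> camera-frame depth z of the ground at that pixel.
    """
    u, v = pixel
    z = depth_fn(u, v)
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # direction with z-component 1
    cam_point = z * ray                              # point in the camera frame
    return R.T @ (cam_point - t)                     # point in the map frame
```

The depth is what resolves the ambiguity noted earlier: without it, the intrinsic and extrinsic matrices only constrain the target to lie somewhere along the viewing ray.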

[0083] In the embodiments of this specification, the positioning request can be sent by the user to the server through the terminal, so that the server can obtain the real-time monitoring image collected by the monitoring device and determine the three-dimensional coordinates or motion trajectory of the target object in the real-time monitoring image.

[0084] For example, if a user wants to know the real-time location of target object C, after receiving the location request from the user, the server obtains the real-time monitoring image collected by the monitoring device at that moment, determines the two-dimensional coordinates of target object C in the real-time monitoring image, and determines the three-dimensional coordinates of target object C at that moment based on the two-dimensional coordinates, the intrinsic and extrinsic parameter matrices of the monitoring device, and the ground depth function. The server then sends the determined three-dimensional coordinates of target object C at that moment to the terminal, which displays them to the user.

[0085] In the embodiments of this specification, the target object can be either a person or an object. When the target object is an object, for example when a batch of goods entering a warehouse needs to be counted, the target object can be determined to be the goods. The server can then pre-construct an initial 3D point cloud map based on images of the warehouse's surrounding environment, acquire historical monitoring images collected by the monitoring devices covering the area where the batch of goods is located, and extract key points from these historical monitoring images. To improve the accuracy of target object positioning, these historical monitoring images are images of empty shelves without goods. Image feature matching is performed between the key points of these historical monitoring images and the part of the initial 3D point cloud map corresponding to the area where the batch of goods is located, which determines the 3D coordinates of the key points. Based on these 3D coordinates, the intrinsic parameter matrix, the extrinsic parameter matrix, and the ground depth function of the monitoring device are then determined. When a positioning request is received, the server responds by determining the 2D coordinates of the goods from the area the goods occupy in the real-time monitoring image, and then determining the 3D coordinates of the goods. Based on the determined 3D coordinates, an inventory of the goods in the warehouse can be taken.

[0086] By inventorying the goods in the warehouse, the quantity and location of the goods can be determined. Based on this, warehouse staff can be rationally allocated, with more staff assigned to areas storing large quantities of goods. Furthermore, the real-time quantity of each batch of goods can be determined and recorded based on their location data, enabling updates to warehouse inventory data.

[0087] In the target object positioning method provided in this specification, key points are extracted from the acquired historical monitoring images. The extracted key points are then matched with three-dimensional points in a pre-constructed three-dimensional point cloud to determine the three-dimensional coordinates of the key points. The extrinsic parameter matrix of the monitoring device is determined using these three-dimensional coordinates. When responding to a positioning request to locate a target object, a real-time monitoring image is acquired, and the two-dimensional coordinates of the target object in the real-time monitoring image are determined. Based on these two-dimensional coordinates, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the determined ground depth function, the three-dimensional coordinates of the target object are determined, thereby achieving target object positioning and improving the accuracy of target object positioning.
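
As a non-limiting illustration, the final step of the paragraph above can be sketched as follows. The sketch assumes that the ground within the monitoring range is approximately planar, so that the ground depth function reduces to the depth at which a pixel's viewing ray meets a plane fitted to the three-dimensional points matched by ground key points. The planar model, the convention x_cam = R · X_world + t, and the function names `fit_ground_plane` and `pixel_to_ground` are assumptions made for illustration; this specification does not prescribe them.

```python
import numpy as np

def fit_ground_plane(ground_points):
    """Least-squares plane normal . X + d = 0 through N x 3 world points."""
    pts = np.asarray(ground_points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return normal, float(-normal @ centroid)

def pixel_to_ground(uv, K, R, t, normal, d):
    """Intersect the viewing ray of pixel uv with the fitted ground plane.

    Assumed convention: x_cam = R @ X_world + t, with K the intrinsic matrix.
    """
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_world = R.T @ ray_cam          # ray direction in world coordinates
    center = -R.T @ t                  # camera centre in world coordinates
    denom = normal @ ray_world
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the ground plane")
    depth = -(normal @ center + d) / denom  # ground depth along this ray
    if depth <= 0:
        raise ValueError("ground intersection lies behind the camera")
    return center + depth * ray_world
```

In this sketch the value `depth` plays the role of the ground depth function evaluated at the pixel; a non-planar ground would require a different model.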

[0088] In the embodiments of this specification, to determine the three-dimensional coordinates of key points in historical monitoring images, an initial three-dimensional point cloud map can be pre-constructed as needed. Environmental images around the monitoring device are acquired using a data acquisition device. Based on the pose of the data acquisition device when acquiring the environmental images, the pose corresponding to each environmental image is determined. Then, based on each environmental image and its corresponding pose, an initial three-dimensional point cloud map is constructed, as shown in Figure 4. Based on the initial 3D point cloud map, the 3D point cloud map corresponding to the monitoring range of the monitoring device is determined, and based on at least some environmental images, the image features corresponding to the 3D points in the 3D point cloud map are determined.

[0089] Then, when determining the extrinsic parameter matrix of the monitoring equipment, the image features of key points in the historical monitoring images can be determined. Then, by matching the image features of the key point with the image features corresponding to the 3D point, the 3D coordinates of the key point can be determined.
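
By way of example only, the matching described above can be sketched as a nearest-neighbour search over feature vectors. The sketch assumes each key point and each three-dimensional point carries a fixed-length feature vector and uses cosine similarity as the similarity measure; both choices, and the name `match_keypoints`, are illustrative assumptions rather than requirements of this specification.

```python
import numpy as np

def match_keypoints(kp_features, pt_features, pt_coords):
    """Give each key point the 3D coordinates of its most similar 3D point.

    kp_features: M x D key-point features; pt_features: N x D features of the
    3D points; pt_coords: N x 3 coordinates of those 3D points.
    Returns (M x 3 coordinates, M best similarity scores).
    """
    kp = np.asarray(kp_features, dtype=float)
    pt = np.asarray(pt_features, dtype=float)
    kp = kp / np.linalg.norm(kp, axis=1, keepdims=True)
    pt = pt / np.linalg.norm(pt, axis=1, keepdims=True)
    similarity = kp @ pt.T                 # cosine similarity, M x N
    best = similarity.argmax(axis=1)       # most similar 3D point per key point
    scores = similarity[np.arange(len(kp)), best]
    return np.asarray(pt_coords, dtype=float)[best], scores
```

In practice a minimum score could be required before a match is accepted; that filter is omitted here for brevity.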

[0090] Since the 3D point cloud map is constructed from several environmental images, after determining the 3D point cloud map, the positions of the 3D points in the 3D point cloud map within the environmental images can be determined. Using these positions as centers, the images of the surrounding areas can then be identified. When determining the matching relationship between 3D points and key points, similarity calculations are performed between the images of the area surrounding the 3D point and the images of the areas surrounding key points in the historical monitoring images. 3D points with high image similarity are matched with the key points, and their 3D coordinates are then used as the 3D coordinates of the key point.
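
Once three-dimensional coordinates have been assigned to the key points, the extrinsic parameter matrix can be estimated from the resulting 2D-3D correspondences. This specification does not name a particular solver; the following sketch uses a plain direct linear transform (DLT) with a known intrinsic matrix K, which is one possible choice and not necessarily the one used in the embodiments. It assumes at least six non-coplanar, noise-free correspondences and the convention x_cam = R · X_world + t; the function name is hypothetical.

```python
import numpy as np

def estimate_extrinsics_dlt(points_3d, points_2d, K):
    """Estimate R and t from 3D key-point coordinates and their pixel positions."""
    rows = []
    zero = np.zeros(4)
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = np.array([X, Y, Z, 1.0])
        # Each correspondence gives two linear equations in the entries of P.
        rows.append(np.concatenate([Xh, zero, -u * Xh]))
        rows.append(np.concatenate([zero, Xh, -v * Xh]))
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)               # projection matrix, up to scale
    M = np.linalg.inv(K) @ P               # proportional to [R | t]
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt                             # nearest orthogonal matrix
    t = M[:, 3] / S.mean()                 # undo the unknown scale
    if np.linalg.det(R) < 0:               # resolve the sign ambiguity
        R, t = -R, -t
    return R, t
```

With noisy matches, a robust estimator (for example, outlier rejection followed by nonlinear refinement) would normally replace this minimal sketch.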

[0091] In the embodiments of this specification, after receiving a positioning request, the server can acquire a real-time monitoring image and determine the two-dimensional coordinates of the target object in the real-time monitoring image. Since the target object is usually a person or object, rather than a single pixel, the area where the target object is located in the real-time monitoring image contains several pixels. It is necessary to determine one pixel among the several pixels and use the two-dimensional coordinates of that pixel as the two-dimensional coordinates of the target object.

[0092] To identify a single pixel among these pixels, semantic segmentation can be performed on the real-time monitoring image to determine the image region of the target object within that image. As shown in Figure 5, the image region of the target object determined by semantic segmentation is usually an irregularly shaped region, i.e., the region filled with diagonal lines. Therefore, to improve the accuracy of target object localization and reduce errors, the two-dimensional coordinates of the lowest pixel at the bottom edge of this image region can be used as the two-dimensional coordinates of the target object. When there are multiple lowest pixels at the bottom edge of the image region, the two-dimensional coordinates of the lowest pixel closest to the midpoint of the bottom edge of the image region can be used as the two-dimensional coordinates of the target object. The bottom edge of this image region is usually located at the foot or bottom of the target object.
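
A minimal sketch of this pixel selection is given below. It assumes the segmentation result is available as a boolean mask with the image origin at the top-left corner (so the lowest pixel has the largest row index), and it interprets "the midpoint of the bottom edge" as the midpoint of the region's horizontal extent; both are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def ground_contact_pixel(mask):
    """Return (u, v) of the lowest mask pixel, preferring the one nearest
    the horizontal midpoint of the region when several share the lowest row."""
    rows, cols = np.nonzero(np.asarray(mask))
    if rows.size == 0:
        raise ValueError("mask contains no target pixels")
    bottom = rows.max()                        # lowest row of the region
    bottom_cols = cols[rows == bottom]         # all pixels on that row
    midpoint = (cols.min() + cols.max()) / 2.0
    u = bottom_cols[np.abs(bottom_cols - midpoint).argmin()]
    return int(u), int(bottom)
```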

[0093] Alternatively, the real-time monitoring image can be input into a pre-trained model, which can output the bounding box of the target object. Since the size of the bounding box can be preset, when there is no gap between the bounding box and the target object, as shown in Figure 6, the two-dimensional coordinates of the pixel at the midpoint of the lower edge of the bounding box can be used as the two-dimensional coordinates of the target object; when there is a gap between the bounding box and the target object, as shown in Figure 7, the pixel at the midpoint of the lower edge of the bounding box is taken as the starting point, the pixel reached by moving vertically upward from it by a preset distance threshold is determined, and the coordinates of that pixel are used as the two-dimensional coordinates of the target object. Various methods exist for determining the two-dimensional coordinates of the target object, and this specification does not limit them.
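
For illustration, the bounding-box variant can be written in a few lines. The sketch assumes boxes in (x_min, y_min, x_max, y_max) pixel form with the image origin at the top-left corner, so that "moving vertically upward" means decreasing the vertical coordinate; the box layout and the parameter name `gap_px` are assumptions, not part of this specification.

```python
def bbox_anchor(box, gap_px=0):
    """Return (u, v) for a target given its bounding box.

    box: (x_min, y_min, x_max, y_max) in pixels.
    gap_px: preset distance threshold used when the box is larger than the
    target; 0 when the box fits the target tightly.
    """
    x_min, _, x_max, y_max = box
    u = (x_min + x_max) / 2.0     # midpoint of the lower edge
    v = y_max - gap_px            # shift upward by the preset threshold
    return u, v
```

For example, `bbox_anchor((10, 20, 30, 60))` gives the midpoint of the lower edge directly, while `bbox_anchor((10, 20, 30, 60), gap_px=5)` moves that point five pixels upward.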

[0094] When a target object is obscured by other objects in a real-time monitoring image, making it impossible to determine its two-dimensional coordinates and consequently its three-dimensional coordinates, image recognition can be performed on the target object to determine its category. Simultaneously, the two-dimensional coordinates of the pixels at the target object's upper edge are determined. Based on a preset threshold corresponding to that category, the vertical coordinate of these two-dimensional coordinates is subtracted from the preset threshold, while the horizontal coordinate remains unchanged, thus determining the estimated coordinates of the target object. These estimated coordinates are then used as the two-dimensional coordinates of the pixels at the point where the target object's feet or bottom connect to the ground. Based on these estimated coordinates, the two-dimensional coordinates of the target object can be determined, which in turn allows for the further determination of its three-dimensional coordinates. For example, if target object 'a' is a person whose lower body is obscured, its two-dimensional coordinates cannot be determined based on its foot coordinates. Therefore, image recognition can be performed on 'a' to determine its gender. Assuming 'a' is female, the two-dimensional coordinates of the pixels at 'a's head are determined. Based on a preset threshold for female, the estimated coordinates of 'a' are determined. These estimated coordinates are then used as the foot coordinates to determine the two-dimensional coordinates of 'a', and subsequently, its three-dimensional coordinates.

[0095] In the embodiments of this specification, in addition to the location request sent by the user to the server through the terminal, the location request can also be automatically created and sent to the server according to business needs. For example, suppose that when sales are lower than a preset risk value, it is necessary to automatically analyze customer traffic. When the business server determines that the sales are lower than the preset risk value, it will initiate the corresponding business process, create a location request, and send the location request to the server.

[0096] In step S104, the server responds to the location request, locates the target object, and records the location result and movement trajectory of the target object. When the target object is a person, the location result and movement trajectory of the target object can be used to statistically analyze the pedestrian traffic in the area where the target object is located; when the target object is an object, such as goods in a warehouse, the location result and movement trajectory of the target object can be used to inventory the goods in the warehouse.

[0097] In the embodiments of this specification, the number of monitoring devices can be one or more. The above method will be described below with the example of multiple monitoring devices.

[0098] When multiple monitoring devices are installed within the area where a target object is located, after the server receives the location request, it acquires the real-time monitoring images collected by each device at the same moment. These images are then filtered to obtain those containing the target object. For example, if the target objects are pedestrians in a shopping mall, images that do not contain pedestrians are filtered out of the acquired real-time monitoring images, and the 3D coordinates of each target object contained in each of the remaining real-time monitoring images are determined. Because there are multiple monitoring devices, their monitoring ranges may overlap, and different monitoring devices may capture the same target object in real-time monitoring images at the same moment. Storing the determined 3D coordinates of each target object without deduplication would not only waste storage space but also affect the subsequent determination of the target object's movement trajectory and the target object flow statistics. Therefore, when there are multiple monitoring devices, deduplication is necessary after the 3D coordinates of each target object are determined, and the 3D coordinates of each target object are stored based on the deduplication result.

[0099] After determining the 3D coordinates of each target object, deduplication of each target object can be performed by identifying the target object in each real-time monitoring image, determining the image region of the target object in each real-time monitoring image, determining the image similarity between target objects based on the image region, and identifying target objects whose image similarity between target objects is not less than a preset second threshold as candidate target objects. Based on the 3D coordinates of the candidate target objects, identifying candidate target objects whose distance is not greater than a preset first threshold as identical target objects. The 3D coordinates corresponding to the identical target objects are then deduplicated, retaining only one 3D coordinate as the 3D coordinates of the identical target object.

[0100] Alternatively, the distance between each target object can be determined based on its three-dimensional coordinates. Target objects whose distance is not greater than a preset first threshold are designated as potential targets. The real-time monitoring image containing each potential target object is determined, and target object identification is performed on this image. The image region of each potential target object in the real-time monitoring image is determined. Based on the image region, the image similarity between each potential target object is determined, and those potential target objects whose image similarity is not less than a preset second threshold are designated as identical targets. The three-dimensional coordinates corresponding to these identical targets are deduplicated, and only one three-dimensional coordinate is retained as the three-dimensional coordinate of the identical target. The first threshold can be preset based on the target object flow rate within the monitoring range in the past. The larger the target object flow rate within the monitoring range, the smaller the first threshold is set; the smaller the target object flow rate within the monitoring range, the larger the first threshold is set. The second threshold can be preset according to specific circumstances.
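
The two deduplication orders described above accept the same pairs: a pair of detections is treated as the same target object when its distance does not exceed the first threshold and its image similarity is not less than the second threshold. The sketch below illustrates this under the assumption that each detection carries an appearance feature vector and that cosine similarity stands in for image similarity; the feature representation, the grouping by union-find, and the function name are illustrative choices only.

```python
import numpy as np

def deduplicate(coords, features, dist_thresh, sim_thresh):
    """Keep one 3D coordinate per group of detections judged identical."""
    coords = np.asarray(coords, dtype=float)
    feats = np.asarray(features, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    n = len(coords)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(coords[i] - coords[j]) <= dist_thresh
            if close and feats[i] @ feats[j] >= sim_thresh:
                parent[find(j)] = find(i)   # merge the two groups

    keep = sorted({find(i) for i in range(n)})
    return coords[keep]
```

Checking the distance first, as here, avoids computing similarity for pairs that are far apart; checking similarity first gives the same result.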

[0101] In the embodiments described in this specification, not only can the target object be located, but the movement trajectory of the target object can also be recorded.

[0102] After determining the 3D coordinates of each target object at the current moment and storing the deduplicated 3D coordinates of the target objects, according to a preset time, the real-time monitoring images collected by each monitoring device at the next moment are acquired. The real-time monitoring images are filtered to obtain real-time monitoring images containing the target objects. The 3D coordinates of each target object at this moment are determined, deduplicated, and the deduplicated 3D coordinates of the target objects are stored. The real-time monitoring images of each target object stored at the previous moment are acquired as the first monitoring image, and the real-time monitoring images of each target object stored at this moment are acquired as the second monitoring image. Target object identification is performed on the first monitoring image and the second monitoring image to determine the similarity between the identified target objects. For each target object, the target object with the highest similarity is identified as the similar target object, and the 3D coordinates of the target object are associated with the 3D coordinates of the similar target object. The motion trajectory of each target object is determined based on the associated 3D coordinates.
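
The association step can be sketched as follows, again assuming that each stored target object has an appearance feature vector and that cosine similarity is the similarity measure. For every target object from the previous moment, the most similar target object at the current moment is found and its three-dimensional coordinates are appended to the trajectory. The data layout and the name `extend_trajectories` are hypothetical.

```python
import numpy as np

def extend_trajectories(tracks, prev_features, curr_features, curr_coords):
    """Append to each track the coordinates of its most similar current object.

    tracks: list of coordinate lists, one per object seen at the previous moment.
    prev_features / curr_features: appearance features at the two moments.
    curr_coords: 3D coordinates of the objects at the current moment.
    """
    prev = np.asarray(prev_features, dtype=float)
    curr = np.asarray(curr_features, dtype=float)
    prev = prev / np.linalg.norm(prev, axis=1, keepdims=True)
    curr = curr / np.linalg.norm(curr, axis=1, keepdims=True)
    best = (prev @ curr.T).argmax(axis=1)   # most similar current object
    for track, j in zip(tracks, best):
        track.append(tuple(float(c) for c in curr_coords[j]))
    return tracks
```

This minimal version does not handle target objects that enter or leave the monitoring range between the two moments; a similarity floor would be one way to detect such cases.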

[0103] The above describes a target positioning method provided by one or more embodiments of this specification. Based on the same idea, this specification also provides a corresponding target positioning device, as shown in Figure 8.

[0104] Figure 8 is a schematic diagram of a target positioning device provided in this specification, which specifically includes:

[0105] The acquisition module 801 is used to acquire historical monitoring images collected by the monitoring equipment and determine the image features of key points in the historical monitoring images;

[0106] The range determination module 802 is used to determine the three-dimensional point cloud map corresponding to the pre-constructed monitoring range of the monitoring equipment;

[0107] The matching module 803 is used to perform image feature matching between the key point and the three-dimensional points in the three-dimensional point cloud map, determine the three-dimensional points that match the key point, and determine the three-dimensional coordinates of the key point based on the three-dimensional coordinates of the matched three-dimensional points.

[0108] The extrinsic parameter determination module 804 is used to determine the extrinsic parameter matrix of the monitoring device based on the determined three-dimensional coordinates of the key points, and to determine the ground depth function of the monitoring device based on the three-dimensional points matched by the key points belonging to the ground.

[0109] The coordinate determination module 805 is used to respond to a positioning request, determine the target object in the real-time monitoring image collected by the monitoring device, and determine the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

[0110] Optionally, the range determination module 802 is specifically used to acquire environmental images collected around the monitoring device; determine the pose corresponding to each environmental image based on the pose of the acquisition device when acquiring the environmental images; construct an initial three-dimensional point cloud map based on each environmental image and the pose corresponding to each environmental image; determine the three-dimensional point cloud map corresponding to the monitoring range of the monitoring device based on the initial three-dimensional point cloud map, and determine the image features corresponding to the three-dimensional points in the three-dimensional point cloud map based on at least some of the environmental images.

[0111] Optionally, the matching module 803 is specifically used to match the image features of each key point with the image features of each three-dimensional point in the three-dimensional point cloud map to determine the similarity between the key point and each three-dimensional point; and to take the three-dimensional coordinates of the three-dimensional point with the highest similarity as the three-dimensional coordinates of the key point.

[0112] Optionally, the coordinate determination module 805 is specifically used to identify the target object in the real-time monitoring image, determine the image region of the target object in the real-time monitoring image; determine the two-dimensional coordinates of the target object in the real-time monitoring image based on the lower edge of the image region in the real-time monitoring image; and determine the three-dimensional coordinates of the target object based on the two-dimensional coordinates, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

[0113] Optionally, multiple monitoring devices are installed in the area where the target object is located;

[0114] The device further includes:

[0115] The deduplication module 806 is used to respond to the positioning request, determine the real-time monitoring images collected by each monitoring device at the same time, and determine the three-dimensional coordinates of each target object in each real-time monitoring image; perform deduplication of the target objects based on the three-dimensional coordinates of each target object; and store the three-dimensional coordinates of the deduplicated target objects.

[0116] Optionally, the deduplication module 806 is specifically used to determine, based on the three-dimensional coordinates of each target object, each target object whose distance between them is not greater than a preset first threshold, and to identify them as identical target objects; to deduplicate the three-dimensional coordinates corresponding to the identical target objects, and to retain one three-dimensional coordinate as the three-dimensional coordinate of the identical target object.

[0117] Optionally, the deduplication module 806 is specifically used to identify target objects in each real-time monitoring image, determine the image region of the target object in each real-time monitoring image; determine the image similarity between target objects based on the image region; determine each target object whose image similarity between the target objects is not less than a preset second threshold as a candidate target object; and determine each candidate target object whose distance between the candidate target objects is not greater than a preset first threshold based on the three-dimensional coordinates of the candidate target objects as the same target object.

[0118] This specification also provides a computer-readable storage medium storing a computer program that can be used to execute the target object positioning method provided in Figure 1 above.

[0119] The embodiments in this specification also provide a schematic structural diagram of the electronic device, shown in Figure 9. As shown in Figure 9, at the hardware level, this electronic device includes a processor, internal bus, network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then executes it to implement the target object positioning method described above with reference to Figure 1. Of course, in addition to software implementation, this specification does not exclude other implementation methods, such as logic devices or a combination of hardware and software. In other words, the execution subject of the following processing flow is not limited to individual logic units, but can also be hardware or logic devices.

[0120] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0121] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0122] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0123] For ease of description, the above devices are described separately by function as various units. Of course, in implementing this application, the functions of each unit can be implemented in one or more pieces of software and / or hardware.

[0124] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0125] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0126] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0127] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0128] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0129] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0130] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0131] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0132] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0133] This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0134] The various embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the description of the method embodiments.

[0135] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. A method for locating a target object, characterized in that the method includes:
acquiring historical monitoring images collected by a monitoring device, and determining the image features of key points in the historical monitoring images;
determining a pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring device;
matching the key points with the three-dimensional points in the three-dimensional point cloud map to determine the three-dimensional points that match the key points, and determining the three-dimensional coordinates of the key points based on the three-dimensional coordinates of the matched three-dimensional points;
determining the extrinsic parameter matrix of the monitoring device based on the determined three-dimensional coordinates of the key points, and determining the ground depth function of the monitoring device based on the three-dimensional points matched by the key points belonging to the ground;
in response to a positioning request, determining a target object in the real-time monitoring image collected by the monitoring device, and determining the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and extrinsic parameter matrix of the monitoring device, and the ground depth function.

2. The method as described in claim 1, characterized in that constructing the three-dimensional point cloud map corresponding to the monitoring range of the monitoring device specifically comprises: acquiring environmental images around the monitoring device; determining the pose of each environmental image based on the pose of the acquisition device at the time the environmental image was acquired; constructing an initial three-dimensional point cloud map based on each environmental image and its corresponding pose; and determining, based on the initial three-dimensional point cloud map, the three-dimensional point cloud map corresponding to the monitoring range of the monitoring device, and determining, based on at least some of the environmental images, the image features corresponding to the three-dimensional points in the three-dimensional point cloud map.

3. The method as described in claim 1, characterized in that matching the key points with the three-dimensional points in the three-dimensional point cloud map to determine the three-dimensional points that match the key points, and determining the three-dimensional coordinates of the key points based on the three-dimensional coordinates of the matched three-dimensional points, specifically comprises: for each key point, matching the image features of the key point with the image features of each three-dimensional point in the three-dimensional point cloud map to determine the similarity between the key point and each three-dimensional point; and using the three-dimensional coordinates of the three-dimensional point with the highest similarity as the three-dimensional coordinates of the key point.
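The highest-similarity matching of claim 3 can be sketched as follows. The claim does not name a similarity measure, so cosine similarity between feature vectors is an assumption here, as are the function names and the representation of the point cloud as a list of `(xyz, descriptor)` pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_keypoint(descriptor, cloud_points):
    """Return the 3D coordinates of the cloud point most similar to the key point.

    cloud_points is a list of (xyz, descriptor) pairs.
    """
    best_xyz, _ = max(cloud_points, key=lambda p: cosine_similarity(descriptor, p[1]))
    return best_xyz


# Hypothetical two-point cloud with 2-dimensional descriptors.
cloud = [((0.0, 0.0, 0.0), [1.0, 0.0]), ((1.0, 2.0, 3.0), [0.0, 1.0])]
keypoint_xyz = match_keypoint([0.1, 0.9], cloud)
```

A practical system would probably also reject matches whose best similarity is too low, but the claim as written only requires selecting the maximum.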

4. The method as described in claim 1, characterized in that determining the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and the extrinsic parameter matrix of the monitoring device, and the ground depth function specifically comprises: performing target object recognition on the real-time monitoring image to determine the image region of the target object in the real-time monitoring image; determining the two-dimensional coordinates of the target object in the real-time monitoring image based on the lower edge of the image region; and determining the three-dimensional coordinates of the target object based on the two-dimensional coordinates, the intrinsic parameter matrix and the extrinsic parameter matrix of the monitoring device, and the ground depth function.
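As a small illustration of the lower-edge step in claim 4, the sketch below takes the midpoint of the bottom edge of an axis-aligned bounding box as the two-dimensional coordinates of the target object. The claim only requires that the coordinates be based on the lower edge; the choice of the midpoint, the box layout `(x_min, y_min, x_max, y_max)`, and the name `ground_contact_pixel` are assumptions.

```python
def ground_contact_pixel(box):
    """Return the midpoint of the lower edge of a bounding box.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates, with the y axis
    pointing down the image, so y_max is the lower edge.
    """
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2.0, float(y_max))


# Hypothetical detection box.
pixel = ground_contact_pixel((100, 50, 200, 300))
```

Using the lower edge is consistent with a ground depth function, since the bottom of the image region is the part of the target most likely to lie on the ground.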

5. The method as described in claim 1, characterized in that a plurality of monitoring devices are deployed in the area where the target object is located, and the method further comprises: in response to a positioning request, determining the real-time monitoring images collected by the monitoring devices at the same time, and determining the three-dimensional coordinates of each target object in each real-time monitoring image; deduplicating the target objects based on the three-dimensional coordinates of each target object; and storing the three-dimensional coordinates of the target objects after deduplication.

6. The method as described in claim 5, characterized in that deduplicating the target objects based on the three-dimensional coordinates of each target object specifically comprises: identifying, based on the three-dimensional coordinates of each target object, target objects whose mutual distance is no greater than a preset first threshold as the same target object; and deduplicating the three-dimensional coordinates corresponding to the same target object, retaining one set of three-dimensional coordinates as the three-dimensional coordinates of that target object.
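The distance-based deduplication of claim 6 might be sketched as below. The claim does not specify which of the duplicate coordinates is retained; this greedy version keeps the first one encountered, which is an assumption, as is the function name `deduplicate`.

```python
import math


def deduplicate(points, first_threshold):
    """Greedy deduplication of 3D points.

    A point is treated as a duplicate when its Euclidean distance to an
    already retained point is no greater than first_threshold.
    """
    kept = []
    for p in points:
        if all(math.dist(p, q) > first_threshold for q in kept):
            kept.append(p)
    return kept


# Hypothetical coordinates reported by two monitoring devices, in metres.
detections = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 0.0)]
unique_targets = deduplicate(detections, first_threshold=0.5)
```

The result of a greedy pass depends on input order when several points chain together; averaging the coordinates of each group would be an alternative design, and the claim permits either.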

7. The method as described in claim 6, characterized in that identifying target objects whose mutual distance is no greater than a preset first threshold as the same target object specifically comprises: performing target object recognition on each real-time monitoring image to determine the image region of the target object in each real-time monitoring image; determining the image similarity between target objects based on the image regions; identifying target objects whose image similarity to one another is not less than a preset second threshold as candidate target objects; and identifying, based on the three-dimensional coordinates of the candidate target objects, candidate target objects whose mutual distance is not greater than the preset first threshold as the same target object.
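The two-stage test of claim 7 can be sketched as a pairwise predicate: an appearance gate followed by a distance check. The claim does not say how image similarity is computed, so cosine similarity over appearance feature vectors extracted from the image regions is an assumption, as are the dictionary layout and the name `same_target`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two appearance vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_target(det_a, det_b, first_threshold, second_threshold):
    """Decide whether two detections are the same target object.

    Each detection is a dict with "xyz" (3D coordinates) and "feature"
    (an appearance vector derived from the image region).
    """
    # Stage 1: only sufficiently similar-looking detections are candidates.
    if cosine_similarity(det_a["feature"], det_b["feature"]) < second_threshold:
        return False
    # Stage 2: candidates must also be close together in 3D.
    return math.dist(det_a["xyz"], det_b["xyz"]) <= first_threshold


# Hypothetical detections from different monitoring devices.
a = {"xyz": (0.0, 0.0, 0.0), "feature": [1.0, 0.0]}
b = {"xyz": (0.2, 0.0, 0.0), "feature": [0.9, 0.1]}
c = {"xyz": (0.1, 0.0, 0.0), "feature": [0.0, 1.0]}
a_matches_b = same_target(a, b, first_threshold=0.5, second_threshold=0.8)
a_matches_c = same_target(a, c, first_threshold=0.5, second_threshold=0.8)
```

The appearance gate keeps two distinct targets standing close together from being merged on distance alone.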

8. A target object positioning apparatus, characterized in that the apparatus comprises: an acquisition module, configured to acquire historical monitoring images collected by a monitoring device and determine image features of key points in the historical monitoring images; a range determination module, configured to determine a pre-constructed three-dimensional point cloud map corresponding to the monitoring range of the monitoring device; a matching module, configured to perform image feature matching between the key points and the three-dimensional points in the three-dimensional point cloud map, determine the three-dimensional points that match the key points, and determine the three-dimensional coordinates of the key points based on the three-dimensional coordinates of the matched three-dimensional points; an extrinsic parameter determination module, configured to determine an extrinsic parameter matrix of the monitoring device based on the determined three-dimensional coordinates of the key points, and to determine a ground depth function of the monitoring device based on the three-dimensional points matched with those key points that belong to the ground; and a coordinate determination module, configured to, in response to a positioning request, determine a target object in a real-time monitoring image collected by the monitoring device, and determine the three-dimensional coordinates of the target object based on the two-dimensional coordinates of the target object in the real-time monitoring image, the intrinsic parameter matrix and the extrinsic parameter matrix of the monitoring device, and the ground depth function.

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method described in any one of claims 1 to 7.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method described in any one of claims 1 to 7.