Object positioning system and object positioning method

By combining instance segmentation models and historical trajectories, the problem of inconsistent object positions caused by differences in images captured by different cameras is solved, improving the accuracy of object positioning and the safety of self-propelled devices.

CN122244824A · Pending · Publication Date: 2026-06-19 · LITE ON TECH CORP

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: LITE ON TECH CORP
Filing Date: 2025-11-03
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, differences in images captured by different cameras lead to inconsistent determination of object positions, reducing positional accuracy and increasing the risk of collisions.

Method used

An instance segmentation model is used to generate a mask in the image frame and project it onto the top view plane of the global coordinate system. By identifying the front edge and reference position, the measurement position is generated. Combined with the historical trajectory and the predicted position, the distance is calculated to update the entity position.

Benefits of technology

It improves the accuracy of object positioning, reduces the demand for computing resources, and enhances the safety of self-propelled devices.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention proposes an object positioning system and method. The object positioning system includes a processing device, a sensing camera, and a memory. The sensing camera is coupled to the processing device and mounted on a self-propelled device to generate image frames. The processing device executes computer-readable program code stored in the memory to: generate a mask of an entity in the image frame and determine its category using an instance segmentation model; project the mask onto a top view plane of a global coordinate system to generate a projection mask; identify the front edge of the projection mask relative to the sensing camera; determine a reference position corresponding to the front edge; and generate a measured position of the entity on the top view plane based on the reference position and the entity category.

Description

Technical Field

[0001] This invention relates to image analysis technology, and more particularly to an object positioning system and object positioning method.

Background Technology

[0002] Bounding box representation is a common method used in vehicle processing devices to determine the position or movement of surrounding objects. In existing methods, the processing device selects a specific point in an image corresponding to the bounding box of the object to determine its position in space. However, because different cameras capturing the same object may produce different results, the position determined based on images from different cameras may be inconsistent. Therefore, the accuracy of the object's position is reduced, potentially increasing the risk of collision.

[0003] Therefore, an object positioning system and its object positioning method are needed to address the above challenges.

Summary of the Invention

[0004] One embodiment of the present invention provides an object positioning system, which includes a processing device, a sensing camera, and a memory. The sensing camera is coupled to the processing device and disposed on a self-propelled device, wherein the sensing camera is configured to generate an image frame. The memory includes computer-readable program code executable by the processing device.

[0005] The processing device executes the computer-readable program code to generate a mask of an entity in the image frame using an instance segmentation model and determine the category of the entity. The processing device further projects the mask onto a top-view (BEV) plane of a global coordinate system associated with the self-propelled device to generate a projected mask. On the top-view plane, the processing device identifies a front edge relative to the sensing camera. The processing device further determines a reference position corresponding to the front edge, wherein the reference position includes at least one set of coordinates representing the entity on the top-view plane. Based on the reference position and the entity's category, the processing device further generates a measured position of the entity on the top-view plane.

[0006] Furthermore, the memory stores a historical trajectory including a previous position of the entity on the top-view plane, and the computer-readable code is executable by the processing device to generate a predicted position on the top-view plane based on the historical trajectory. The processing device further calculates a distance between the measured position and the predicted position. The processing device further associates the measured position with the predicted position to obtain an updated position of the entity on the top-view plane, provided that the distance between the measured position and the predicted position meets a second predetermined condition.

[0007] Another embodiment of the present invention provides an object positioning method, performed by a processing device, wherein the method includes generating an image frame by a sensing camera disposed on a self-propelled device. The method further includes generating a mask of an entity in the image frame using an instance segmentation model and determining the category of the entity. The method further includes projecting the mask onto a top view plane of a global coordinate system associated with the self-propelled device to generate a projection mask. The method further includes identifying a front edge of the projection mask relative to the sensing camera on the top view plane. The method further includes determining a reference position corresponding to the front edge, wherein the reference position includes at least one set of coordinates representing the entity on the top view plane. The method further includes generating a measured position of the entity on the top view plane based on the reference position and the category of the entity.

Attached Figure Description

[0008] This invention can be more fully understood by referring to the accompanying drawings and reading the following detailed description and examples, wherein: Figure 1 illustrates a scenario in which a self-propelled device, according to an embodiment of the present disclosure, measures surrounding entities.

[0009] Figure 2 shows a block diagram of a processing device for object positioning according to an embodiment of the present disclosure.

[0010] Figure 3 and Figure 4 illustrate an object positioning method according to an embodiment of the present disclosure.

[0011] Figures 5A to 5D show a procedure for obtaining the front edge of an entity according to an embodiment of this disclosure.

[0012] Figure 6A and Figure 6B show a procedure for obtaining the front pixel group of the front edge according to an embodiment of the present disclosure.

[0013] Figures 7A to 7E show a procedure for obtaining the optimal rectangle according to an embodiment of the present disclosure.

[0014] Figures 8A to 8C show a procedure for generating a resized rectangle according to an embodiment of the present disclosure.

[0015] Figure 9 illustrates a method for tracking and predicting the positions of entities according to an embodiment of the present disclosure.

[0016] Figure 10A and Figure 10B show historical trajectories of an entity with different predicted positions according to an embodiment of this disclosure.

[0017] The attached figures are labeled as follows:
10: Self-propelled device
12, 14, 16, 18: Sensing cameras
22, 24, 26, 28: Entities
200: Block diagram
210: Sensing camera
220: Processing device
222: Instance segmentation model
224: Memory
IM: Image frame
CT: Entity category
CP: Camera pose
HT: Historical trajectory
PL: Predicted position
ML: Measured position
300: Object positioning method
302, 304, 306, 308, 310, 312: Steps
400: Object positioning method
402, 404, 406, 408, 410: Steps
510a, 520a: Entities
510b, 520b: Masks
510c, 520c: Projection masks
PC1, PC2, PC3, PC4: Reference points
530: Front edge
600a, 600b, 600c, 600d: Pixels
610a, 610b: Dashed lines
620: Front pixel group
700: Convex hull
700a, 700b, 700c, 700d: Pixels
710a, 710b: Dashed lines
720: Front edge
730, 740: Candidate rectangles
750a, 750b, 750c, 750d: Pixels / rectangle points
D1, D2, D3, D4: Distances
810: Optimal rectangle
812, 814: Midpoints
820: Adjusted rectangle
A1, A2: Angles
AB, BC: Sides of the rectangle
900: Tracking and prediction method
902, 904, 906, 908, 910: Steps
PD: Predetermined distance
PL1, PL2: Predicted positions

Detailed Implementation

[0018] The following description is intended to illustrate the general principles of the invention and should not be construed as limiting. The scope of the invention should be determined by the appended claims.

[0019] Figure 1 shows a scenario in which a self-propelled device 10, according to an embodiment of this disclosure, measures surrounding entities 22, 24, 26, and 28. Four sensing cameras 12, 14, 16, and 18 are mounted on the self-propelled device 10 to generate image frames that cover the surrounding environment, including entities 22, 24, 26, and 28. As shown in Figure 1, sensing cameras 12, 14, 16, and 18 are mounted on the front, left, right, and rear sides of the self-propelled device 10, respectively. In other embodiments, a different number of cameras can be mounted on the self-propelled device 10. Furthermore, the cameras can also be mounted at locations other than those shown in Figure 1.

[0020] The self-propelled device 10 may be a self-driving vehicle, and the sensing cameras 12, 14, 16, and 18 may be fisheye cameras mounted on the self-driving vehicle. Each of the sensing cameras 12, 14, 16, and 18 can generate an image frame including one or more entities selected from entities 22, 24, 26, and 28, and output these image frames to a processing device within the self-propelled device 10 to perform object localization. Since in this embodiment, the sensing cameras 12, 14, 16, and 18 are implemented as fisheye cameras and configured to generate wide-angle images, image frames generated by different sensing cameras may include the same entities. For example, image frames generated by sensing cameras 12 and 16 may both include entity 24, while image frames generated by sensing cameras 16 and 18 may both include entity 26. By utilizing image frames with overlapping fields of view, the same entities can be captured by different cameras, thereby improving the accuracy of object localization.

[0021] Figure 2 shows a block diagram 200 of a processing device 220 for object positioning according to an embodiment of the present disclosure. A sensing camera 210 is mounted on the self-propelled device 10 to capture an entity 28 and generate an image frame IM. The image frame IM is then output to the processing device 220 for object positioning, further details of which are described below. The processing device 220 includes an instance segmentation model 222 for determining the category CT of the entity 28. A memory 224 is configured to store computer-readable program code executable by the processing device 220. The memory 224 also stores historical trajectories HT, predicted positions PL, and measured positions ML for trajectory prediction of the entity.

[0022] Memory 224 is also configured to store a camera pose CP for projecting image frames IM onto the bird's-eye-view (BEV) plane. The camera pose CP includes extrinsic and intrinsic parameters of the sensing cameras 12, 14, 16, and 18 relative to the global coordinate system. Extrinsic parameters include height information (e.g., the distance between the sensing camera and the ground), horizontal position, and orientation information (e.g., the orientation of the self-propelled device). When the computer-readable program code is executed, the processing device 220 performs object positioning on entity 28, further details of which are described with reference to Figure 3 and Figure 4.

[0023] Figure 3 and Figure 4 show object positioning methods 300 and 400, respectively, according to embodiments of the present disclosure. The method 300 shown in Figure 3 represents the overall procedure executed by the processing device 220 for object positioning. In step 302, each of the sensing cameras 12, 14, 16, and 18 generates an image frame IM that includes at least one of entities 22, 24, 26, and 28. In step 304, the image frames IM are provided to the processing device 220 to determine the category CT of each entity 22, 24, 26, or 28 included in the image frames IM. Using the instance segmentation model 222, the processing device 220 generates a mask for each included entity and determines the category CT based on these masks (see Figure 5B). In step 306, the processing device 220 projects the masks of entities 22, 24, 26, and 28 onto the top-view (BEV) plane according to the camera pose CP stored in the memory 224 (see Figure 5C).

[0024] After the masks are projected, in step 308, the processing device 220 identifies the front edge of each projection mask relative to the sensing cameras 12, 14, 16, and 18 (see Figure 5D). The front edge of the projection mask is determined by extracting the boundary contour of the projection mask. Then, in step 310, the processing device 220 determines, on the top-view plane, the reference position of entity 22, 24, 26, or 28 corresponding to the front edge. In step 312, the processing device 220 generates the measured position ML of entity 22, 24, 26, or 28 based on the reference position and the category CT.

[0025] Figure 4 shows the detailed procedures of steps 310 and 312 as method 400. In step 402, the processing device 220 identifies a front pixel group from the boundary contour. Then, in step 404, the processing device 220 generates a convex hull that encloses the front pixel group. In step 406, the processing device 220 identifies the front edge of the convex hull using a method similar to that used to identify the front pixel group (explained further with reference to Figure 6A and Figure 6B). In step 408, the processing device 220 generates a plurality of candidate rectangles suitable for surrounding the front pixel group, and selects one of the candidate rectangles as the optimal rectangle to determine the reference position of the entity. In step 410, the processing device 220 resizes the optimal rectangle according to the category CT to generate a resized rectangle. Based on the resized rectangle, the processing device 220 determines the measured position ML of the entity.

[0026] Through methods 300 and 400, the processing device 220 of this embodiment can identify the category, orientation, and distance of entities 22, 24, 26, and 28 relative to the self-propelled device 10. The processing device 220 employs instance segmentation instead of traditional bounding box methods. By measuring based on the boundary contours of entities, rather than solely on specific points of the bounding box, the accuracy of object localization is improved. Furthermore, methods 300 and 400 involve simple image processing, requiring fewer computational resources and exhibiting lower complexity compared to fully end-to-end deep learning methods.

[0027] Each step of methods 300 and 400 is explained in detail below with reference to Figures 5A to 8C.

[0028] Figures 5A to 5D show the procedure for identifying the front edge 530 of entity 510a according to an embodiment of the present disclosure. Figure 5A shows an image frame IM captured by a sensing camera mounted on the vehicle, the image frame IM including entities 510a and 520a (step 302). Subsequently, as shown in Figure 5B, the image frame IM is provided to the instance segmentation model 222 to generate masks 510b and 520b corresponding to entities 510a and 520a, respectively. After generating masks 510b and 520b, the instance segmentation model 222 determines the category CT of each entity 510a and 520a based on its mask 510b or 520b. In this embodiment, both entities 510a and 520a can be classified as medium-sized vehicles (step 304).

[0029] In Figure 5C, masks 510b and 520b are projected onto the top-view plane to generate projection masks 510c and 520c (step 306). In this embodiment, four sensing cameras are installed on the vehicle. These four sensing cameras are projected onto the top-view plane and shown in Figure 5C as reference points PC1 to PC4. Since the image frame IM is generated by the sensing camera represented by reference point PC1, masks 510b and 520b are projected onto the top-view plane using the camera pose CP of that sensing camera. In other words, projection masks 510c and 520c are generated by extending projection lines from reference point PC1 on the top-view plane based on the camera pose CP.

[0030] The top-view plane is a plane in the global coordinate system associated with the vehicle. Specifically, the processing device 220 determines the spatial transformation from the sensing camera's camera coordinate system to the global coordinate system based on the camera pose CP. Subsequently, the processing device 220 defines the top-view plane based on this spatial transformation. In one embodiment, as shown in Figure 5A, the sensing camera that generates the image frame IM can be used as the center (or origin) of the top-view plane. In another embodiment, the top-view plane can be defined as the ground plane of the global coordinate system.
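As a non-limiting illustration of the projection in step 306, the sketch below maps one mask pixel to the ground plane by intersecting its viewing ray with the plane z = 0. It assumes an ideal pinhole camera, whereas the fisheye cameras of this embodiment would first require an undistortion step that is not shown. The function name `pixel_to_ground` and the parameters `fx`, `fy`, `cx`, `cy`, `rot_cam_to_world`, and `cam_pos` are hypothetical stand-ins for the intrinsic and extrinsic parameters of the camera pose CP.

```python
from typing import Optional, Sequence, Tuple


def pixel_to_ground(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    rot_cam_to_world: Sequence[Sequence[float]],
    cam_pos: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Project pixel (u, v) onto the ground plane z = 0 of the global frame.

    Assumption: ideal pinhole intrinsics (fx, fy, cx, cy); lens distortion
    is ignored. Returns None when the viewing ray does not hit the ground.
    """
    # Viewing ray in the camera coordinate system.
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray into the global coordinate system.
    ray_world = tuple(
        sum(rot_cam_to_world[r][c] * ray_cam[c] for c in range(3))
        for r in range(3)
    )
    if ray_world[2] >= 0.0:
        return None  # ray is parallel to the ground or points upward
    # Scale factor at which the ray reaches z = 0.
    s = -cam_pos[2] / ray_world[2]
    return (cam_pos[0] + s * ray_world[0], cam_pos[1] + s * ray_world[1])


# Hypothetical camera 2 m above the ground, looking straight down.
R_DOWN = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
centre_hit = pixel_to_ground(320, 240, 400, 400, 320, 240, R_DOWN, (0.0, 0.0, 2.0))
offset_hit = pixel_to_ground(720, 240, 400, 400, 320, 240, R_DOWN, (0.0, 0.0, 2.0))
```

Applying the same mapping to every pixel of a mask would yield a projection mask on the top-view plane under these assumptions.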

[0031] For clarity, Figure 5D shows only reference point PC1 (representing the sensing camera used in this embodiment) and the front edge 530 (i.e., the main target of this embodiment). It should be noted that the black line in Figure 5D represents the boundary contour of the front edge 530, which can be extracted using various methods. One method is to extend the front edge 530 outward by one pixel and generate another image. That is, the newly generated image will have a larger front edge. Then, the two images are subtracted, and the remaining pixels of the larger front edge form the boundary contour of the front edge 530 (step 308).
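The extend-and-subtract extraction described above can be sketched as follows. This is a minimal example that represents a mask as a set of integer pixel coordinates and assumes an 8-connected one-pixel extension; the names `dilate` and `boundary_contour` are illustrative only.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def dilate(mask: Set[Pixel]) -> Set[Pixel]:
    """Extend the mask outward by one pixel (8-connected neighbourhood)."""
    return {
        (x + dx, y + dy)
        for (x, y) in mask
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
    }


def boundary_contour(mask: Set[Pixel]) -> Set[Pixel]:
    """Subtract the original mask from its extended copy.

    The pixels that remain form a one-pixel-wide ring around the mask.
    """
    return dilate(mask) - mask


# A 2x2 mask is surrounded by a ring of 12 contour pixels.
square = {(0, 0), (1, 0), (0, 1), (1, 1)}
ring = boundary_contour(square)
```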

[0032] Figure 6A and Figure 6B illustrate the process of obtaining a front pixel group 620 of the front edge 530 according to an embodiment of the present disclosure. Object positioning requires knowing the orientation of entities 510a and 520a. Therefore, it is necessary to identify, within the front edge 530, the side portions of entities 510a and 520a that face the sensing camera. Since the side facing the sensing camera is the closest to the sensing camera among all sides, the following process is used to identify this side.

[0033] As shown in Figure 6A, after the boundary contour of the front edge 530 is extracted, multiple dashed lines extend from the reference point PC1 (only dashed lines 610a and 610b are shown in the figure). Each dashed line connects the reference point PC1 to one pixel of the boundary contour. The processing device 220 calculates the slope of each dashed line and the distance between each pixel and the reference point PC1. For pixels located on dashed lines with the same slope, the pixel with the shortest distance to the reference point PC1 is selected (step 402). For example, in Figure 6A, pixels 600a and 600b are both located on dashed line 610a, while pixels 600c and 600d are both located on dashed line 610b. Compared to their corresponding pixels 600b and 600d, pixels 600a and 600c are closer to reference point PC1; therefore, pixels 600a and 600c are selected. After the above process is repeated for each dashed line, a front pixel group 620, including pixels 600a and 600c, is selected, as shown in Figure 6B.
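The nearest-pixel-per-line selection can be illustrated with the following sketch. It is an assumption of this example, not a statement of the disclosure, that lines are grouped by a quantized direction angle (`angle_step_deg`) rather than by exact slope equality; a direction angle also separates pixels lying on opposite sides of the reference point, which a bare slope would not. The name `front_pixel_group` is hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, float]


def front_pixel_group(
    contour: Iterable[Point], ref: Point, angle_step_deg: float = 1.0
) -> Set[Point]:
    """Keep, for each viewing direction from ref, the contour pixel nearest to ref.

    Assumption: directions are binned to angle_step_deg degrees so that
    pixels on (nearly) the same line from ref compete with each other.
    """
    nearest: Dict[int, Tuple[float, Point]] = {}
    for (x, y) in contour:
        dx, dy = x - ref[0], y - ref[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # a pixel at the reference point has no direction
        key = round(math.degrees(math.atan2(dy, dx)) / angle_step_deg)
        if key not in nearest or dist < nearest[key][0]:
            nearest[key] = (dist, (x, y))
    return {pixel for _, pixel in nearest.values()}


# Two lines from PC1 = (0, 0), each crossing the contour twice.
contour_pixels = [(2, 0), (4, 0), (0, 3), (0, 6)]
front_group = front_pixel_group(contour_pixels, (0, 0))
```

In this toy input the nearer pixel on each line is kept, mirroring how pixels 600a and 600c are preferred over 600b and 600d.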

[0034] Figures 7A to 7E show the process of obtaining the optimal rectangle 810 according to an embodiment of this disclosure. As shown in Figure 7A, a convex hull 700 is generated from the front pixel group 620 shown in Figure 6B (step 404). Specifically, the convex hull 700 is the smallest convex polygon that encloses all pixels of the front pixel group 620. Various methods can be used to generate the convex hull based on a set of points, such as the Graham scan, the Quickhull algorithm, or the divide-and-conquer algorithm, but this disclosure is not limited thereto.
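For illustration, a convex hull of a point set can be computed as sketched below. Note that this example uses Andrew's monotone chain algorithm, which is a different technique from the Graham scan, Quickhull, and divide-and-conquer algorithms named above; it is chosen here only because it is compact. The function names are illustrative.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```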

[0035] Similar to the process of identifying the side of an entity facing the sensing camera, as shown in Figure 7B, multiple dashed lines extend from the reference point PC1 (only dashed lines 710a and 710b are shown in the figure). Each dashed line connects the reference point PC1 to a pixel of the convex hull 700. The processing device 220 calculates the slope of each dashed line and the distance between each pixel and the reference point PC1. For pixels located on dashed lines with the same slope, the pixel with the smallest distance is selected (step 406).

[0036] For example, referring to Figure 7B, pixels 700a and 700b are located on dashed line 710a, while pixels 700c and 700d are located on dashed line 710b. Compared to their corresponding pixels 700b and 700d, pixels 700a and 700c are closer to reference point PC1; therefore, pixels 700a and 700c are selected. After the above process is repeated for each dashed line, a front edge 720 of the convex hull 700 is identified, as shown in Figure 7C. Through these two processes for identifying the side portion of the entity 510a facing the sensing camera, the orientation of the entity 510a can be determined more accurately, thereby improving the accuracy of object positioning.

[0037] After the front edge 720 is generated, the orientation of the entity 510a can be determined. Next, the processing device 220 further uses the rotated rectangle method to reconstruct the shape of the entity 510a. Figure 7D shows a candidate rectangle 730 suitable for enclosing the convex hull 700. However, multiple candidate rectangles may exist that can fit the convex hull 700. To select the desired candidate rectangle, the processing device 220 uses the front edge 720 instead of the convex hull 700, as shown in Figure 7E.

[0038] The front edge 720 comprises multiple pixels, and there exists a minimum distance between each pixel of the front edge 720 and a corresponding point of the candidate rectangle 740. For example, as shown in Figure 7E, there is a minimum distance D1 between point 750b of candidate rectangle 740 and pixel 750a of the front edge 720; similarly, there is a minimum distance D2 between point 750d of candidate rectangle 740 and pixel 750c of the front edge 720. After determining the minimum distance between each pixel of the front edge 720 and the corresponding point of candidate rectangle 740, the processing device 220 sums these minimum distances to calculate the total distance. The candidate rectangle with the smallest total distance is considered the most suitable and is selected as the optimal rectangle (step 408).
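The selection criterion of step 408 can be sketched as follows. The example assumes each candidate rectangle is given as a list of four corner points in order and measures, for each front-edge pixel, the shortest distance to the rectangle's outline; how candidate rectangles are generated is outside this sketch. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def total_distance(front_edge: Sequence[Point], rect: List[Point]) -> float:
    """Sum, over all front-edge pixels, of the minimum distance to the rectangle."""
    sides = list(zip(rect, rect[1:] + rect[:1]))
    return sum(
        min(point_segment_distance(p, a, b) for a, b in sides)
        for p in front_edge
    )


def best_rectangle(
    front_edge: Sequence[Point], candidates: Sequence[List[Point]]
) -> List[Point]:
    """Pick the candidate rectangle with the smallest total distance."""
    return min(candidates, key=lambda rect: total_distance(front_edge, rect))


edge_pixels = [(0, 0), (1, 0), (2, 0)]
tight = [(0, 0), (2, 0), (2, 1), (0, 1)]
loose = [(-1, -1), (3, -1), (3, 2), (-1, 2)]
chosen = best_rectangle(edge_pixels, [loose, tight])
```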

[0039] Figures 8A to 8C show the procedure for generating a resized rectangle 820 according to an embodiment of the present disclosure. Figure 8A shows the convex hull 700 and the optimal rectangle 810. After the optimal rectangle 810 is selected, the processing device 220 further adjusts the size of the optimal rectangle 810 to determine the reference position of the entity 510a (steps 310 and 410).

[0040] As mentioned above, the accuracy of object positioning is affected by the entity's orientation. Therefore, during the resizing process, it is necessary to determine the front side of entity 510a. In this embodiment, the category CT of entity 510a is determined to be a medium-sized vehicle, which means that the front side of entity 510a is the shorter side. As shown in Figure 8B, the optimal rectangle 810 has a short side AB and a long side BC facing the reference point PC1. Points 812 and 814 are the midpoints of the short side AB and the long side BC, respectively. Two dashed lines are extended from the reference point PC1 to the midpoints 812 and 814, respectively. In this way, one dashed line forms an angle A1 with the short side AB, and the other dashed line forms an angle A2 with the long side BC.

[0041] As shown in Figure 8B, angle A2 is greater than angle A1. This indicates that the longer side BC is more oriented towards the sensing camera than the shorter side AB. Next, as shown in Figure 8C, an adjusted rectangle 820 can be obtained based on the optimal rectangle 810 and the category CT of entity 510a. In this embodiment, the longer side BC is selected as the key side of the optimal rectangle 810.
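The comparison of angles A1 and A2 can be sketched as below. The disclosure does not state how the angle between a dashed line and a side is measured; this example assumes the acute angle (0 to 90 degrees), so that a side viewed head-on yields 90 degrees. The names `facing_angle` and `key_side` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def facing_angle(ref: Point, a: Point, b: Point) -> float:
    """Acute angle in degrees between side a-b and the line from ref to its midpoint."""
    mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    view = (mid[0] - ref[0], mid[1] - ref[1])
    side = (b[0] - a[0], b[1] - a[1])
    norm = math.hypot(*view) * math.hypot(*side)
    if norm == 0.0:
        raise ValueError("degenerate side or reference point on the midpoint")
    cos_angle = abs(view[0] * side[0] + view[1] * side[1]) / norm
    return math.degrees(math.acos(min(1.0, cos_angle)))


def key_side(ref: Point, a: Point, b: Point, c: Point) -> str:
    """Return 'BC' if side BC faces ref more directly than side AB, else 'AB'."""
    return "BC" if facing_angle(ref, b, c) > facing_angle(ref, a, b) else "AB"


# Short side AB (length 2) and long side BC (length 4), camera below BC.
A, B, C = (0.0, 2.0), (0.0, 0.0), (4.0, 0.0)
PC1 = (2.0, -5.0)
angle_a1 = facing_angle(PC1, A, B)
angle_a2 = facing_angle(PC1, B, C)
```

With these hypothetical coordinates the long side BC is viewed head-on, so it would be chosen as the key side.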

[0042] For example, the category CT of entity 510a is a medium-sized vehicle, which corresponds to an adjusted rectangle with predetermined dimensions. Next, the processing device 220 compares the shorter and longer sides of the adjusted rectangle with the key side of the optimal rectangle 810. For example, as shown in Figure 8C, compared to the shorter side of the adjusted rectangle 820, the longer side of the adjusted rectangle 820 is closer in length to the key side (i.e., the longer side BC) of the optimal rectangle 810. Therefore, the longer side of the adjusted rectangle 820 is configured to align with the longer side BC of the optimal rectangle 810.

[0043] The adjusted rectangle 820 includes multiple coordinate sets for representing entity 510a on the top-view plane. These coordinate sets (i.e., reference positions) are used to generate the measured position ML of entity 510a. For example, the coordinates of the center point of the adjusted rectangle 820 can be selected as the reference position. In another embodiment, the coordinates of the four corners of the adjusted rectangle 820 can be selected as the reference position. In yet another embodiment, the entire adjusted rectangle 820 can be selected as the reference position. That is, at least one set of coordinates included in the adjusted rectangle 820 can be selected to generate the measured position ML of entity 510a.
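A possible reading of the resizing in step 410 is sketched below. Several details are assumptions of this example rather than statements of the disclosure: the table `CATEGORY_DIMENSIONS` and its values are hypothetical, the adjusted rectangle is centered on the midpoint of the key side, and it extends away from the reference point. The center point is returned as one of the reference-position options described above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical predetermined (long side, short side) dimensions in metres.
CATEGORY_DIMENSIONS: Dict[str, Tuple[float, float]] = {
    "medium_vehicle": (4.5, 1.8),
}


def adjusted_rectangle(
    p: Point, q: Point, ref: Point, category: str
) -> Tuple[List[Point], Point]:
    """Build a category-sized rectangle aligned with key side p-q.

    Returns (corners, center). Assumes p != q.
    """
    long_side, short_side = CATEGORY_DIMENSIONS[category]
    key_len = math.hypot(q[0] - p[0], q[1] - p[1])
    # The predetermined side closest in length to the key side is aligned with it.
    if abs(long_side - key_len) <= abs(short_side - key_len):
        along, depth = long_side, short_side
    else:
        along, depth = short_side, long_side
    ux, uy = (q[0] - p[0]) / key_len, (q[1] - p[1]) / key_len
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    # Unit normal of the key side, flipped to point away from the camera.
    nx, ny = -uy, ux
    if (mx - ref[0]) * nx + (my - ref[1]) * ny < 0.0:
        nx, ny = -nx, -ny
    a = (mx - ux * along / 2.0, my - uy * along / 2.0)
    b = (mx + ux * along / 2.0, my + uy * along / 2.0)
    c = (b[0] + nx * depth, b[1] + ny * depth)
    d = (a[0] + nx * depth, a[1] + ny * depth)
    center = (mx + nx * depth / 2.0, my + ny * depth / 2.0)
    return [a, b, c, d], center


corners, center = adjusted_rectangle((0.0, 0.0), (4.0, 0.0), (2.0, -5.0), "medium_vehicle")
```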

[0044] The above procedure provides a method for positioning surrounding entities. Because the self-propelled device 10 and/or the surrounding entities 22, 24, 26, and 28 (as shown in Figure 1) may be moving, their relative direction and speed are important parameters for driving safety. Therefore, this disclosure further provides a trajectory prediction method based on the aforementioned object positioning method.

[0045] Figure 9 shows a method 900 for tracking and predicting entity positions according to an embodiment of this disclosure. As the self-propelled device 10 moves, the processing device 220 measures the positions of surrounding entities at predetermined time intervals. These measured positions are stored in the memory 224 shown in Figure 2 as the historical trajectory HT of each entity. Next, in step 902, the processing device 220 generates the predicted position PL of each entity based on its respective historical trajectory HT. Simultaneously, the processing device 220 measures the current position of each entity. In step 904, the processing device 220 calculates the distance between the predicted position PL and the measured position ML.

[0046] In step 906, the processing device 220 compares the distance with a predetermined distance PD. If the distance exceeds the predetermined distance PD, the processing device 220 determines that the predicted position PL is unrelated to the measured position ML (step 910), and the predicted position PL is not added to the historical trajectory HT. If the distance does not exceed the predetermined distance PD, the processing device 220 determines that the predicted position PL is related to the measured position ML (step 908), and the predicted position PL is added to the historical trajectory HT and used to generate subsequent predicted positions.

[0047] When generating the predicted position, the measured position is used as a correction to improve the accuracy of the trajectory prediction. Through method 900, the processing device 220 is able to generate a predicted position that is highly correlated with the measured position (i.e., the actual position) of the entity.
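Steps 902 to 910 can be sketched as follows. The disclosure does not specify how the predicted position is derived from the historical trajectory; this example assumes a constant-velocity extrapolation from the last two stored positions, and the names `predict_next` and `update_track` are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict_next(history: List[Point]) -> Point:
    """Step 902: extrapolate the next position (constant-velocity assumption)."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def update_track(history: List[Point], measured: Point, pd: float) -> bool:
    """Steps 904 to 910: gate the prediction against the measurement.

    Returns True and appends the predicted position to the history when the
    distance does not exceed pd; otherwise leaves the history unchanged.
    """
    predicted = predict_next(history)
    distance = math.hypot(measured[0] - predicted[0], measured[1] - predicted[1])
    associated = distance <= pd
    if associated:
        history.append(predicted)
    return associated


track = [(0.0, 0.0), (1.0, 0.0)]
first = update_track(track, (2.1, 0.0), pd=0.5)    # prediction (2, 0) is 0.1 away
second = update_track(track, (10.0, 0.0), pd=0.5)  # prediction (3, 0) is 7.0 away
```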

[0048] Figure 10A and Figure 10B show the historical trajectory HT of entity 510a with different predicted positions according to an embodiment of this disclosure. The historical trajectory HT and the measured position ML are the same in Figure 10A and Figure 10B. However, the processing device 220 generates different predicted positions PL1 and PL2 in Figure 10A and Figure 10B, respectively. The measured position ML in Figure 10A and Figure 10B represents the current position of the entity (i.e., the object position at the current timestamp), while the historical trajectory HT represents the entity's previous measured positions ML. The processing device 220 generates predicted positions PL1 and PL2 based on the historical trajectory HT (step 902). Next, the processing device 220 calculates the distance D3 between the predicted position PL1 and the measured position ML, and the distance D4 between the predicted position PL2 and the measured position ML.

[0049] Assume that distance D3 is less than a predetermined distance PD, while distance D4 exceeds the predetermined distance PD. This would result in predicted position PL1 being associated with measured position ML, while predicted position PL2 would not be associated with measured position ML. Therefore, processing device 220 only adds predicted position PL1 to the historical trajectory HT.

[0050] The above embodiments describe the methods and procedures of this disclosure using a single sensing camera. However, the methods and procedures provided in this disclosure can also be implemented using multiple sensing cameras. For example, referring to Figure 1, the sensing cameras 12 and 14 each generate and output an image frame IM to the processing device 220. Then, methods 300 and 400 are executed to process each image frame IM and generate reference positions of surrounding entities.

[0051] In this embodiment, each sensing camera 12 and 14 generates a reference position for an entity. Considering potential errors in the camera pose CP of sensing cameras 12 and 14, the two reference positions may not coincide. Therefore, the processing device 220 determines whether the two reference positions meet a criterion. Specifically, the criterion includes that the two reference positions must be within a preset distance (this distance may differ from a predetermined distance PD). If the criterion is met, the processing device 220 merges the two reference positions (e.g., determining the midpoint as the reference position of the entity). If the criterion is not met, the processing device 220 executes methods 300 and 400 again to determine a new reference position for the entity.
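The merging criterion for two cameras can be sketched as below, using the midpoint option mentioned above. The function name `merge_reference_positions` and the parameter `max_gap` (the preset distance) are hypothetical; returning `None` stands in for the case in which methods 300 and 400 would be executed again.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def merge_reference_positions(
    pos_a: Point, pos_b: Point, max_gap: float
) -> Optional[Point]:
    """Merge two per-camera reference positions of the same entity.

    Returns the midpoint when the positions lie within max_gap of each
    other, otherwise None to signal that a new measurement is needed.
    """
    gap = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    if gap <= max_gap:
        return ((pos_a[0] + pos_b[0]) / 2.0, (pos_a[1] + pos_b[1]) / 2.0)
    return None


merged = merge_reference_positions((0.0, 0.0), (0.4, 0.0), max_gap=0.5)
rejected = merge_reference_positions((0.0, 0.0), (2.0, 0.0), max_gap=0.5)
```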

[0052] This disclosure provides a method, procedure, and system for object localization and trajectory prediction of entities surrounding a self-propelled device. Compared to methods using bounding boxes, the method described in this disclosure improves accuracy through instance segmentation. Furthermore, compared to end-to-end deep learning methods, the method of this disclosure uses a combination of simple image processing techniques, thus reducing complexity.

[0053] While the invention has been described with reference to the above examples and preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. Rather, the invention is intended to cover various modifications and similar arrangements (as will be apparent to those skilled in the art). Therefore, the scope of the appended claims should be given the broadest interpretation to cover all such modifications and similar arrangements.

Claims

1. An object positioning system, comprising: a processing device; a sensing camera, coupled to the processing device and mounted on a self-propelled device, wherein the sensing camera is configured to generate an image frame; and a memory, including computer-readable program code executed by the processing device to: generate a mask of an entity in the image frame and determine a category of the entity using an instance segmentation model; project the mask onto a top view plane of a global coordinate system associated with the self-propelled device to generate a projection mask; identify a front edge of the projection mask on the top view plane relative to the sensing camera; determine a reference position corresponding to the front edge, wherein the reference position includes at least one set of coordinates representing the entity on the top view plane; and generate a measured position of the entity on the top view plane based on the reference position and the category of the entity.

2. The object positioning system of claim 1, wherein the memory stores a camera pose of the sensing camera associated with the self-propelled device, and the computer-readable program code is executed by the processing device to determine a spatial transformation from a camera coordinate system of the sensing camera to the global coordinate system based on the camera pose, and to define the top view plane based on the spatial transformation.

3. The object positioning system of claim 2, wherein the computer-readable program code is executed by the processing device to: extend, based on the camera pose, multiple projection lines from a reference point on the top view plane to project the mask onto the top view plane of the global coordinate system.

4. The object positioning system of claim 3, wherein the reference point on the top view plane corresponds to a projection position of the sensing camera on the top view plane.

5. The object positioning system of claim 2, wherein the camera pose includes external parameters and internal parameters of the sensing camera, and wherein the external parameters include height information, a horizontal position, and direction information relative to the global coordinate system.

6. The object positioning system of claim 1, wherein the computer-readable program code is executed by the processing device to identify the front edge by extracting a boundary contour of the projection mask on the top view plane.

7. The object positioning system of claim 6, wherein the computer-readable program code is executed by the processing device to: identify a front pixel group of the boundary contour of the projection mask, wherein the front pixel group is located on the side of the projection mask facing a reference point on the top view plane; select, from a candidate rectangle group suitable for enclosing the front pixel group, an optimal rectangle based on the distances between the front pixel group and the candidate rectangle group; adjust the optimal rectangle according to the category of the entity to obtain an adjusted rectangle that represents the entity on the top view plane; and generate the reference position based on the adjusted rectangle.

8. The object positioning system of claim 7, wherein the operation of selecting the optimal rectangle from the candidate rectangle group further includes: generating a convex hull based on the front pixel group; identifying a front edge of the convex hull, wherein the front edge is located on the side of the convex hull facing the reference point on the top view plane; and determining, from the candidate rectangle group, the optimal rectangle suitable for enclosing the convex hull based on the distance between the candidate rectangle group and the front edge.

9. The object positioning system of claim 1, further comprising: another sensing camera, wherein the sensing camera and the other sensing camera are configured to synchronously generate a first wide-angle image and a second wide-angle image, respectively, the first wide-angle image and the second wide-angle image having an overlapping field of view; wherein the computer-readable program code is executed by the processing device to: generate a first mask and a second mask of the entity in the first wide-angle image and the second wide-angle image, respectively, and identify a category of the entity using the instance segmentation model; project the first mask and the second mask onto the top view plane of the global coordinate system associated with the self-propelled device to generate a first projection mask and a second projection mask; identify, on the top view plane, a first front edge of the first projection mask relative to the sensing camera and a second front edge of the second projection mask relative to the other sensing camera; determine a first reference position corresponding to the first front edge and a second reference position corresponding to the second front edge, wherein each of the first reference position and the second reference position includes at least one set of coordinates representing the entity on the top view plane; and when the first reference position and the second reference position satisfy a first predetermined condition, merge the first reference position and the second reference position to generate the measured position of the entity at a current timestamp.

10. The object positioning system of claim 9, wherein the first predetermined condition includes the first reference position and the second reference position being within a first predetermined distance on the top view plane.

11. The object positioning system of claim 1, wherein the memory further stores a historical trajectory including a previous position of the entity on the top view plane, and the computer-readable program code is executed by the processing device to: generate a predicted position on the top view plane based on the historical trajectory; calculate a distance between the measured position and the predicted position; and when the distance between the measured position and the predicted position satisfies a second predetermined condition, link the measured position and the predicted position to obtain an updated position of the entity on the top view plane.

12. The object positioning system of claim 11, wherein the second predetermined condition includes the distance between the measured position and the predicted position being within a second predetermined distance on the top view plane.

13. The object positioning system of claim 1, wherein the top view plane is defined as a ground plane of the global coordinate system.

14. The object positioning system as claimed in claim 1, wherein the sensing camera is a fisheye camera.

15. An object positioning method, executed by a processing device, the object positioning method comprising: generating an image frame by a sensing camera mounted on a self-propelled device; generating a mask of an object in the image frame and identifying a category of the object using an instance segmentation model; projecting the mask onto a top view plane of a global coordinate system associated with the self-propelled device to generate a projection mask; identifying, on the top view plane, a front edge of the projection mask relative to the sensing camera; determining a reference position corresponding to the front edge, wherein the reference position includes at least one set of coordinates representing the object on the top view plane; and generating a measured position of the object on the top view plane based on the reference position and the category of the object.

16. The object positioning method of claim 15, wherein the operation of projecting the mask onto the top view plane of the global coordinate system further includes: determining, based on a camera pose, a spatial transformation from a camera coordinate system of the sensing camera to the global coordinate system, and defining the top view plane according to the spatial transformation.

17. The object positioning method of claim 16, further comprising: extending, based on the camera pose, multiple projection lines from a reference point on the top view plane to project the mask onto the top view plane of the global coordinate system.

18. The object positioning method of claim 17, wherein the reference point on the top view plane corresponds to a projection position of the sensing camera on the top view plane.

19. The object positioning method of claim 16, wherein the camera pose includes external parameters and internal parameters of the sensing camera, and wherein the external parameters include height information, a horizontal position, and direction information of the sensing camera relative to the global coordinate system.

20. The object positioning method of claim 15, wherein the operation of identifying the front edge of the projection mask relative to the sensing camera on the top view plane further includes: identifying the front edge by extracting a boundary contour of the projection mask on the top view plane.

21. The object positioning method of claim 20, wherein the operation of determining the reference position corresponding to the front edge further comprises: identifying a front pixel group of the boundary contour of the projection mask, wherein the front pixel group is located on the side of the projection mask facing a reference point on the top view plane; selecting, from a candidate rectangle group suitable for enclosing the front pixel group, an optimal rectangle based on the distances between the front pixel group and the candidate rectangle group; adjusting the size of the optimal rectangle based on the category of the object to obtain an adjusted rectangle that represents the object on the top view plane; and generating the reference position based on the adjusted rectangle.

22. The object positioning method of claim 21, wherein the operation of selecting the optimal rectangle from the candidate rectangle group further includes: generating a convex hull based on the front pixel group; identifying a front edge of the convex hull, wherein the front edge is located on the side of the convex hull facing the reference point on the top view plane; and determining, from the candidate rectangle group, the optimal rectangle suitable for enclosing the convex hull based on the distances between the candidate rectangle group and the front edge.

23. The object positioning method of claim 15, further comprising: synchronously generating a first wide-angle image and a second wide-angle image by the sensing camera and another sensing camera, the first wide-angle image and the second wide-angle image having an overlapping field of view; generating, using the instance segmentation model, a first mask and a second mask of the object in the first wide-angle image and the second wide-angle image, respectively, and determining the category of the object; projecting the first mask and the second mask onto the top view plane of the global coordinate system associated with the self-propelled device to generate a first projection mask and a second projection mask; identifying, on the top view plane, a first front edge of the first projection mask relative to the sensing camera and a second front edge of the second projection mask relative to the other sensing camera; determining a first reference position corresponding to the first front edge and a second reference position corresponding to the second front edge, wherein the first reference position and the second reference position each include at least one set of coordinates representing the object on the top view plane; and when the first reference position and the second reference position satisfy a first predetermined condition, merging the first reference position and the second reference position to generate the measured position of the object at a current timestamp.

24. The object positioning method as claimed in claim 23, wherein the first predetermined condition includes the first reference position and the second reference position being within a first predetermined distance on the top view plane.

25. The object positioning method of claim 15, further comprising: generating a predicted position on the top view plane based on a historical trajectory of the object; calculating a distance between the measured position and the predicted position; and when the distance between the measured position and the predicted position satisfies a second predetermined condition, linking the measured position and the predicted position to obtain an updated position of the object on the top view plane.

26. The object positioning method of claim 25, wherein the second predetermined condition includes the distance between the measured position and the predicted position being within a second predetermined distance on the top view plane.