A target detection method, device and electronic equipment

By converting fisheye images into cylindrical projection images and constructing 3D frustum point clouds, the problem of insufficient 3D mapping relationships in fisheye image target detection is solved, achieving high-precision 3D target detection and improving detection accuracy and the ability to identify distant targets.

CN122199253A · Pending · Publication Date: 2026-06-12 · GUANGZHOU AUTOMOBILE GROUP CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUANGZHOU AUTOMOBILE GROUP CO LTD
Filing Date: 2026-03-05
Publication Date: 2026-06-12

IPC: G06T3/12; G06V10/40; G06V20/64; G06V10/25

AI Tagging

Application Domain

Geometric image transformation; Three-dimensional object recognition

Abstract

The application discloses a target detection method and device and electronic equipment. The method comprises the following steps: converting an original fisheye image into a target cylindrical projection image; performing feature extraction on the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; converting the target feature map to a three-dimensional camera coordinate system based on a first mapping relationship to obtain a target point cloud, wherein the first mapping relationship is used to represent the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system; projecting the target point cloud to a bird's eye plane to obtain a bird's eye feature map; and detecting a three-dimensional target based on the bird's eye feature map to obtain a three-dimensional target detection result, thereby improving the accuracy of three-dimensional target detection based on the fisheye image.

Description

Technical Field

[0001] This application relates to the field of image processing, and more specifically, to a target detection method, apparatus, and electronic device.

Background Technology

[0002] Fisheye cameras are widely used because of their ultra-wide field of view (typically up to 180 degrees) and their ability to cover a panoramic environment with a single lens. For example, they are used in fields such as autonomous driving environmental perception, intelligent robot navigation, and security monitoring to achieve wide-angle 3D perception. However, fisheye lenses introduce strong radial distortion, causing objects in the image to appear distorted, and the distortion becomes more severe the closer an object is to the edge of the field of view. Accurately detecting targets in images acquired by fisheye cameras is therefore a significant technical challenge.

Summary of the Invention

[0003] In view of this, embodiments of this application propose a target detection method, apparatus, and electronic device to alleviate the above-mentioned problems.

[0004] In a first aspect, embodiments of this application provide a target detection method, the method comprising: converting an original fisheye image into a target cylindrical projection image; extracting features from the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; converting the target feature map to a three-dimensional camera coordinate system based on a first mapping relationship to obtain a target point cloud, wherein the first mapping relationship is used to characterize the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system; projecting the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map; and detecting a three-dimensional target based on the bird's-eye view feature map to obtain a three-dimensional target detection result.

[0005] In a second aspect, embodiments of this application provide a target detection device, comprising: a cylindrical projection image generation module, a cylindrical projection image feature extraction module, a frustum construction module, a viewpoint conversion module, and a target detection module. The cylindrical projection image generation module converts an original fisheye image into a target cylindrical projection image; the cylindrical projection image feature extraction module extracts features from the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; the frustum construction module converts the target feature map to a three-dimensional camera coordinate system based on a first mapping relationship to obtain a target point cloud, wherein the first mapping relationship characterizes the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system; the viewpoint conversion module projects the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map; and the target detection module detects three-dimensional targets based on the bird's-eye view feature map to obtain a three-dimensional target detection result.

[0006] In a third aspect, embodiments of this application provide an electronic device, including a memory and a processor, wherein the memory is coupled to the processor, the memory stores instructions, and when the instructions are executed by the processor, the processor executes the target detection method provided in the first aspect above.

[0007] In a fourth aspect, embodiments of this application provide a vehicle that includes the electronic device provided in the third aspect above.

[0008] In a fifth aspect, embodiments of this application provide a computer-readable storage medium storing program code, which can be invoked by a processor to execute the target detection method provided in the first aspect above.

[0009] In this application, the original fisheye image is converted into a target cylindrical projection image; features are extracted from the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; the target feature map is converted to the three-dimensional camera coordinate system, based on a first mapping relationship that characterizes the mapping between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system, to obtain a target point cloud; the target point cloud is projected onto a bird's-eye view plane to obtain a bird's-eye view feature map; and three-dimensional targets are detected based on the bird's-eye view feature map to obtain three-dimensional target detection results. Thus, the fisheye image is converted into a cylindrical projection image, which preserves the panoramic information of the image. Based on the first mapping relationship between the camera coordinate system and the coordinates of the cylindrical projection image, a three-dimensional frustum point cloud is constructed, accurately restoring the cylindrical image to three-dimensional space. The point cloud is then projected onto the bird's-eye view plane, and three-dimensional target detection results are obtained from the bird's-eye view features. This reduces the error accumulated over multiple coordinate transformations and improves the accuracy of three-dimensional target detection on fisheye images.

Description of the Drawings

[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings described below show only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 shows a schematic flowchart of a target detection method provided in an embodiment of this application;
Figure 2 shows a schematic flowchart of a target detection method provided in an embodiment of this application;
Figure 3 shows a schematic diagram of the mapping relationship construction process provided in an embodiment of this application;
Figure 4 shows a schematic flowchart of a target detection method provided in an embodiment of this application;
Figure 5 shows a block diagram of a target detection device according to an embodiment of this application;
Figure 6 shows a block diagram of an electronic device for performing a target detection method according to an embodiment of this application.

Detailed Implementation

[0012] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

[0013] To better understand the solutions of the embodiments of this application, the technical terms used in the embodiments of this application will be explained below.

[0014] Bird's-eye view (BEV) maps information from cameras, LiDAR, millimeter-wave radar, or maps onto a single plane centered on the vehicle or based on world coordinates. This allows the autonomous driving system to see, as if from the air, the positions of all objects around the vehicle, lane lines, and the distribution of static obstacles and dynamic traffic participants. BEV transforms three-dimensional perception problems into two-dimensional spatial reasoning problems, facilitating the coupling of perception, prediction, and planning, thereby improving the safety of autonomous driving.

[0015] Horizontal Field of View (HFOV) describes the angle of view that a camera can capture in the horizontal direction.

[0016] The Kannala-Brandt (KB) fisheye model is a general geometric camera model widely used for fisheye lenses and ultra-wide-angle cameras. The model can uniformly handle lenses ranging from traditional pinhole cameras to fisheye lenses with a field of view of more than 180°.
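As an illustration of the KB model, the following minimal sketch projects a 3D point in the camera coordinate system to fisheye pixel coordinates. It assumes the commonly used polynomial form theta_d = theta · (1 + k1·theta² + k2·theta⁴ + k3·theta⁶ + k4·theta⁸); the function name `kb_project` and the parameter names `fx`, `fy`, `cx`, `cy`, `k` are hypothetical and do not come from the application.

```python
import numpy as np

def kb_project(point_cam, fx, fy, cx, cy, k):
    """Project a camera-frame point (x, y, z) to fisheye pixels with a KB-style model.

    Assumption: the distortion polynomial is the common four-coefficient form;
    the application itself does not state which variant it uses.
    """
    x, y, z = point_cam
    r_xy = np.hypot(x, y)            # distance of the point from the optical axis
    theta = np.arctan2(r_xy, z)      # angle between the ray and the optical axis
    k1, k2, k3, k4 = k
    theta_d = theta * (1 + k1 * theta**2 + k2 * theta**4
                       + k3 * theta**6 + k4 * theta**8)
    if r_xy < 1e-12:                 # point on the optical axis maps to the principal point
        return cx, cy
    u = fx * theta_d * x / r_xy + cx
    v = fy * theta_d * y / r_xy + cy
    return u, v
```

Because the angle is computed with `arctan2`, a point slightly behind the image plane (z < 0) still gets a finite pixel position, which is how such a model can represent fields of view beyond 180°.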

[0017] The implementation details of the technical solutions in the embodiments of this application are described in detail below: In the fields of computer vision and intelligent sensing technology, the application of fisheye images for three-dimensional (3D) target detection is becoming increasingly widespread. For example, in scenarios such as autonomous driving environment perception, intelligent robot navigation, and security monitoring, fisheye cameras are often needed to achieve wide-angle three-dimensional environment perception.

[0018] Currently, fisheye image target detection technologies mainly include the following methods: traditional perspective projection combined with 2D detection, equidistant cylindrical projection combined with 2D detection, polar coordinate bird's-eye view (BEV) combined with 3D detection, and 3D detection based on spherical projection. Among them:

1. Traditional perspective projection + 2D detection method: This method "flattens" the fisheye image into a regular viewpoint image through perspective transformation, and then uses a 2D model (such as YOLO) for target detection. However, this method suffers from severe stretching (stretching rate >30%) in the edge region of the fisheye image, resulting in the loss of effective features. Furthermore, it does not establish a 3D coordinate mapping relationship and cannot output information such as the depth and height of the target, making it difficult to meet the requirements of 3D perception.

[0019] 2. Equidistant cylindrical projection + 2D detection method: This method unfolds a fisheye image using equidistant cylindrical projection, establishes a 2D coordinate transformation relationship between the unfolded image and the fisheye image, and then performs target recognition. However, the projection formula of this method is not related to the field of view of the fisheye camera, edge distortion is not optimized, and it can only output 2D detection boxes, failing to provide 3D spatial position information. Therefore, it is not suitable for scenarios requiring 3D perception, such as autonomous driving.

[0020] 3. Polar coordinate BEV+3D detection method: This method converts cylindrical projection images into polar coordinate BEV features and achieves 3D detection through center point regression. However, this method is prone to accumulating errors during multiple conversions between polar and Cartesian coordinates (error accumulation rate > 20%), and it does not pre-establish an accurate mapping table. The generation of cylindrical images relies on random cropping, resulting in low utilization of the original fisheye distortion information and poor accuracy in detecting distant targets.

[0021] 4. 3D detection method based on spherical projection: This method projects fisheye images onto a sphere for processing. While it preserves image information to some extent, the conversion between spherical coordinates and 3D spatial coordinates is complex, resulting in low computational efficiency, and there is still room for improvement in the accuracy of target localization.

[0022] For example, related technologies have proposed obtaining the unfolded image of a fisheye image through equidistant cylindrical projection and establishing coordinate transformation relationships; then using a pre-trained recognition model to detect the unfolded image, and finally mapping the coordinates of the recognition box back to the fisheye image. However, this scheme is only applicable to 2D object detection, does not involve accurate transformation of 3D spatial coordinates, and cannot provide depth information; moreover, equidistant cylindrical projection still has significant distortion in the edge region, and lacks an effective adaptation mechanism with 3D detection models.

[0023] In addition, a 3D object detection method based on cylindrical projection and polar coordinates has been proposed in related technologies. First, a planar view is generated through cylindrical projection, and the image is preprocessed and randomly cropped. Then, multi-scale features are extracted and fused, and features are output through viewpoint transformation and a Feature Pyramid Network (FPN). Next, polar coordinate transformation is performed using center point regression, and the loss is calculated. Finally, the center point, width, and height information are converted into predicted bounding boxes. However, this scheme uses polar coordinate BEV representation and center point regression, which has limited accuracy in detecting distant targets. Furthermore, the multiple transformations between polar and Cartesian coordinates accumulate errors, affecting 3D localization accuracy.

[0024] In summary, existing fisheye image target detection schemes suffer from limitations such as being restricted to 2D detection, equidistant projection distortion, the lack of 3D mapping relationships, and polar coordinate transformation errors. Therefore, improving the accuracy of 3D target detection in fisheye images remains a pressing issue.

[0025] To address the aforementioned problems, the inventors, through extensive research, have developed the target detection method, apparatus, and electronic device provided in this application. By converting fisheye images into cylindrical projection images, the cylindrical projection preserves the panoramic information of the image. Based on a first mapping relationship between the camera coordinate system and the cylindrical projection image coordinates, a three-dimensional frustum point cloud is constructed, achieving accurate reconstruction from the cylindrical image to three-dimensional space. Then, a bird's-eye view projection is performed to obtain the three-dimensional target detection result based on the bird's-eye view features. This reduces the error from multiple coordinate transformations and improves the accuracy of three-dimensional target detection on fisheye images. The specific target detection method will be described in detail in subsequent embodiments.

[0026] The embodiments involved in this application will now be described with reference to the accompanying drawings.

[0027] Please refer to Figure 1, which shows a schematic flowchart of a target detection method according to an embodiment of this application. In a specific embodiment, this target detection method can be applied to, for example, the target detection device 200 shown in Figure 5 and the electronic device 100, shown in Figure 6, that is equipped with the target detection device 200. The following uses an electronic device as an example to illustrate the specific process of this embodiment. It is understood that the electronic device in this embodiment may include vehicles, in-vehicle terminals, computers, and the like, and is not limited thereto. The process shown in Figure 1 is described in detail below. The target detection method may specifically include the following steps:

Step S110: Convert the original fisheye image into a target cylindrical projection image.

[0028] In some implementations, the electronic device may include a fisheye camera, acquire images captured by the fisheye camera as original fisheye images, and perform 3D object detection on them. Optionally, the electronic device may instead be communicatively connected to a fisheye camera, acquire the images captured by that camera as original fisheye images, and perform 3D object detection on them.

[0029] For example, the electronic device can be a vehicle, which may be equipped with one or more fisheye cameras, such as one fisheye camera installed at the front license plate, rear license plate, left rearview mirror, and right rearview mirror of the vehicle; based on this, the vehicle can acquire the original fisheye images collected by the four surround-view fisheye cameras, and perform three-dimensional target detection on the original fisheye images collected by each fisheye camera.

[0030] In some implementations, after acquiring the original fisheye image, the electronic device can convert the original fisheye image into a target cylindrical projection image. For example, the electronic device is a vehicle, which can acquire original fisheye images from four surround views. The vehicle can then convert each of the four original surround view images into a target cylindrical projection image, thus obtaining the target cylindrical projection image corresponding to each of the four surround views.

[0031] The electronic device may have a pre-set correspondence between the pixel coordinates of the cylindrical image and the pixel coordinates of the fisheye image, such as a mapping table between the pixel coordinates of the cylindrical image and the pixel coordinates of the fisheye image (which can be understood as a first mapping table in this embodiment); the electronic device may convert the original fisheye image into a target cylindrical projection image based on the first mapping table.

[0032] As one feasible approach, the electronic device can initialize a blank cylindrical projection image (e.g., with dimensions width × height and pixel values initialized to 0). The electronic device can then traverse the initialized cylindrical projection image pixel by pixel and, for each cylindrical pixel (u_c, v_c), read the pixel coordinates (u_f, v_f) of the corresponding fisheye image from the first mapping table.

[0033] If the fisheye pixel coordinates (u_f, v_f) determined for a cylindrical pixel fall within the effective range of the original fisheye image (0 ≤ u_f < width of the original fisheye image, 0 ≤ v_f < height of the original fisheye image), the electronic device can use bilinear interpolation to sample the RGB pixel value of the original fisheye image at (u_f, v_f) and assign it to (u_c, v_c) of the initialized blank cylindrical projection image. If (u_f, v_f) falls outside the effective range of the original fisheye image, the cylindrical pixel can be determined to lie outside the field of view of the original fisheye image, and the electronic device can fill (u_c, v_c) of the initialized blank cylindrical projection image with black. In this way, the target cylindrical projection image is obtained.
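The table-driven conversion described in paragraphs [0032] and [0033] can be sketched as follows. This is a minimal NumPy illustration, not the applicant's implementation: it assumes the first mapping table is stored as two float arrays `map_u` and `map_v` (hypothetical names) holding u_f and v_f for every cylindrical pixel, and it vectorizes the per-pixel traversal.

```python
import numpy as np

def remap_fisheye_to_cylinder(fisheye, map_u, map_v):
    """Build the cylindrical image from a fisheye image and a lookup table.

    fisheye: (Hf, Wf, 3) array. map_u, map_v: (Hc, Wc) float arrays giving the
    fisheye coordinates (u_f, v_f) for each cylindrical pixel (u_c, v_c).
    The table layout is an assumption made for this sketch.
    """
    hf, wf = fisheye.shape[:2]
    src = fisheye.astype(np.float32)
    # Effective range check: 0 <= u_f < width and 0 <= v_f < height.
    valid = (map_u >= 0) & (map_u < wf) & (map_v >= 0) & (map_v < hf)
    u = np.where(valid, map_u, 0.0)
    v = np.where(valid, map_v, 0.0)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, wf - 1)   # clamp the right/bottom neighbours
    v1 = np.minimum(v0 + 1, hf - 1)
    au = (u - u0)[..., None]          # horizontal interpolation weight
    av = (v - v0)[..., None]          # vertical interpolation weight
    top = (1 - au) * src[v0, u0] + au * src[v0, u1]
    bottom = (1 - au) * src[v1, u0] + au * src[v1, u1]
    out = (1 - av) * top + av * bottom
    out[~valid] = 0.0                 # outside the fisheye field of view: black
    if np.issubdtype(fisheye.dtype, np.integer):
        out = np.rint(out)            # round before casting back to integer pixels
    return out.astype(fisheye.dtype)
```

Precomputing the table once per camera means the per-frame cost is only the interpolation itself, which is one reason a lookup table is attractive for this step.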

[0034] It is understood that, in this embodiment, the electronic device performs a bidirectional mapping between the coordinates of the cylindrical image and the coordinates of the fisheye image according to the first mapping table. The fisheye image is converted into an initialized blank cylindrical image through the first mapping table to obtain the target cylindrical projection image, which reduces the information loss in the feature projection process. Moreover, for the edge area of the fisheye image, the panoramic information of the fisheye image is preserved, which improves the accuracy of three-dimensional target detection of the fisheye image.

[0035] Step S120: Extract features from the target cylindrical projection image to obtain the target feature map corresponding to the target cylindrical projection image.

[0036] In some implementations, after obtaining the target cylindrical projection image corresponding to the original fisheye image, the electronic device can perform feature extraction on the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image. The electronic device may have pre-set feature extraction algorithms, such as SIFT, Transformer, CNN, color feature extraction, edge feature extraction, and texture feature extraction algorithms. Specifically, the electronic device can use these feature extraction algorithms to extract features from the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image (e.g., a multi-scale feature map with 256 channels and a resolution of 1 / 16th that of the target cylindrical projection image).

[0037] For example, the feature extraction algorithm can be the BEV LSS model, which uses EfficientNet-B4 as the backbone network and extracts multi-scale features through 5 downsampling layers: Layers 1-2 (downsampling by 4 times): can extract low-level texture features (such as target edges, contours, etc.) for small target detection; Layers 3-4 (downsampling by 8 times): can extract mid-level semantic features (such as target local structure, etc.) for preliminary target category differentiation; Layer 5 (downsampling by 16 times): can fuse multi-scale features through spatial pyramid pooling (SPP) to output a target feature map with 256 channels, balancing image accuracy and computational efficiency in image processing.

[0038] Step S130: Based on the first mapping relationship, the target feature map is transformed to the three-dimensional camera coordinate system to obtain the target point cloud, wherein the first mapping relationship is used to characterize the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system.

[0039] In some implementations, the electronic device may pre-store a first mapping relationship. This first mapping relationship can be obtained based on the cylindrical projection parameters and the intrinsic parameters of the fisheye camera corresponding to the original fisheye image. It can be used to characterize the mapping or transformation relationship between the coordinates of the cylindrical projection image and the coordinates in the 3D camera coordinate system. Specifically, after acquiring the target feature map corresponding to the target cylindrical projection image, the electronic device can transform the target feature map to the 3D camera coordinate system based on the first mapping relationship to obtain the target point cloud. This associates 2D features (e.g., 256 channels) with 3D points, forming a 3D frustum point cloud with "feature-coordinate" binding.

[0040] For example, the first mapping relationship is used to characterize the mapping relationship between the pixel coordinates of the cylindrical image and the coordinates in the three-dimensional camera coordinate system, such as a mapping table between the pixel coordinates of the cylindrical image and the coordinates in the three-dimensional camera coordinate system (which can be understood as the second mapping table in this embodiment); wherein, the electronic device can obtain the 3D point (x, y, z) in the three-dimensional camera coordinate system corresponding to each pixel in the target feature map based on the second mapping table, thereby obtaining the target point cloud.
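One way to picture the second mapping table and the resulting frustum point cloud is sketched below. All specifics are assumptions made for illustration: a cylindrical model in which image columns are uniform in azimuth over `hfov`, camera axes with x right, y down and z forward, and a set of candidate radial depths `depths` along each pixel ray, as is common in Lift-Splat-Shoot style pipelines. The helper names `cylinder_rays` and `lift_to_point_cloud` are hypothetical.

```python
import numpy as np

def cylinder_rays(width, height, hfov, fy, cy):
    """Per-pixel ray directions on a unit-radius cylinder (assumed model)."""
    u = np.arange(width) + 0.5
    v = np.arange(height) + 0.5
    theta = (u / width - 0.5) * hfov          # azimuth, zero at the centre column
    theta_grid, v_grid = np.meshgrid(theta, v)  # both (height, width)
    x = np.sin(theta_grid)
    z = np.cos(theta_grid)
    y = (v_grid - cy) / fy                    # height on the unit cylinder
    return np.stack([x, y, z], axis=-1)       # (height, width, 3)

def lift_to_point_cloud(features, rays, depths):
    """Bind each feature vector to 3D points along its pixel ray.

    features: (C, H, W); rays: (H, W, 3); depths: (D,) radial distances.
    Returns points (D*H*W, 3) and the matching features (D*H*W, C).
    """
    depths = np.asarray(depths, dtype=float)
    c, h, w = features.shape
    points = depths[:, None, None, None] * rays[None]        # (D, H, W, 3)
    feats = np.broadcast_to(features.transpose(1, 2, 0), (len(depths), h, w, c))
    return points.reshape(-1, 3), feats.reshape(-1, c)
```

Since the rays depend only on the projection parameters, they can be computed once and stored, which matches the idea of a pre-stored mapping table.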

[0041] Optionally, the electronic device can transform the target point cloud from the 3D camera coordinate system to the world coordinate system, based on cylindrical projection extrinsic parameters, to obtain the target point cloud in the world coordinate system. The cylindrical projection extrinsic parameters can be determined based on the camera extrinsic parameters corresponding to the original fisheye image. For example, when the electronic device is a vehicle, the cylindrical projection extrinsic parameters can include a translation vector Tc and a rotation matrix Rc. The translation vector Tc can reuse the offset Tf from the original fisheye camera extrinsic parameters, i.e., Tc = Tf (e.g., a 3×1 translation vector [T_x, T_y, T_z]^T), which keeps the cylindrical projection and the original fisheye image at the same position in the vehicle coordinate system. The rotation matrix Rc is a 3×3 matrix that can be set from a fixed orientation angle (in radians) according to the fisheye camera's viewpoint type, where the angle can be the angle of rotation around the y-axis of the vehicle coordinate system. For example: for a forward-looking camera with an orientation angle of 0 rad (fisheye camera horizontally facing forward), Rc = [[0,0,1], [-1,0,0], [0,-1,0]]; for a left-looking camera with an orientation angle of π/2 rad (fisheye camera horizontally facing left), Rc = [[1,0,0], [0,0,1], [0,-1,0]]; for a right-looking camera with an orientation angle of -π/2 rad (fisheye camera horizontally facing right), Rc = [[-1,0,0], [0,0,-1], [0,-1,0]]; and for a rear-looking camera with an orientation angle of π rad (fisheye camera horizontally facing rear), Rc = [[0,0,-1], [1,0,0], [0,-1,0]].
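The extrinsic transform can be sketched with the four rotation matrices listed above. The sketch assumes the convention P_vehicle = Rc · P_camera + Tc, which the text does not state explicitly; the names `RC_BY_VIEW` and `camera_to_vehicle` are hypothetical.

```python
import numpy as np

# Rotation matrices copied from paragraph [0041], keyed by viewpoint type.
RC_BY_VIEW = {
    "front": np.array([[0, 0, 1], [-1, 0, 0], [0, -1, 0]], dtype=float),
    "left": np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float),
    "right": np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], dtype=float),
    "rear": np.array([[0, 0, -1], [1, 0, 0], [0, -1, 0]], dtype=float),
}

def camera_to_vehicle(points_cam, view, t_c):
    """Apply P_vehicle = Rc @ P_camera + Tc to an (N, 3) array of points.

    Assumption: Rc maps camera coordinates to vehicle coordinates and Tc is the
    camera position in the vehicle frame.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    r_c = RC_BY_VIEW[view]
    return points_cam @ r_c.T + np.asarray(t_c, dtype=float)
```

Under this convention, a point one metre along the optical axis of the front camera lands one metre ahead of the camera position along the vehicle x-axis, and the same point for the left camera lands along the vehicle y-axis.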

[0042] Step S140: Project the target point cloud onto the bird's-eye view plane to obtain a bird's-eye view feature map.

[0043] In some implementations, after obtaining the target point cloud, the electronic device can project the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map. Optionally, the electronic device can define the XZ plane in the 3D camera coordinate system as the bird's-eye view plane; wherein, the bird's-eye view plane can be divided into BEV space according to a grid of 0.1m × 0.1m; wherein, the electronic device can project the target point cloud onto this bird's-eye view plane to obtain a bird's-eye view feature map. The electronic device can perform "Splat" summation (fusion of multi-pixel features within the same grid) on the 3D frustum point cloud features within each BEV grid to generate a BEV feature map with a resolution of 1000 × 1000 (e.g., covering the 100m × 100m detection range of a fisheye camera). Optionally, the electronic device can also define the XZ plane in the world coordinate system as the bird's-eye view plane (BEV plane, i.e., ground top view) and project the target point cloud onto this bird's-eye view plane to obtain a bird's-eye view feature map.
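The "Splat" summation onto the BEV grid can be sketched as a scatter-add. The sketch assumes the grid is centred on the coordinate origin and that x and z are the two axes of the bird's-eye view plane; the function name `splat_to_bev` and its parameters are hypothetical.

```python
import numpy as np

def splat_to_bev(points, feats, cell=0.1, size=1000):
    """Sum point features that fall into the same BEV cell.

    points: (N, 3) with x and z spanning the BEV plane; feats: (N, C).
    Returns a (C, size, size) feature map. Centring the grid on the origin
    is an assumption of this sketch.
    """
    half = cell * size / 2.0
    ix = np.floor((points[:, 0] + half) / cell).astype(int)
    iz = np.floor((points[:, 2] + half) / cell).astype(int)
    keep = (ix >= 0) & (ix < size) & (iz >= 0) & (iz < size)
    bev = np.zeros((size, size, feats.shape[1]), dtype=feats.dtype)
    # np.add.at accumulates correctly even when several points share a cell.
    np.add.at(bev, (iz[keep], ix[keep]), feats[keep])
    return bev.transpose(2, 0, 1)
```

With the default `cell=0.1` and `size=1000`, the grid spans 100 m × 100 m, matching the example resolution in the paragraph above.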

[0044] Step S150: Detect three-dimensional targets based on the bird's-eye view feature map and obtain three-dimensional target detection results.

[0045] In some implementations, after obtaining the bird's-eye view feature map corresponding to the original fisheye image, the electronic device can detect 3D targets based on the bird's-eye view feature map to obtain 3D target detection results. Specifically, the electronic device may have a pre-set target detection algorithm, which can be used to perform 3D target detection on the bird's-eye view feature map to obtain 3D target detection results. These 3D target detection results include, but are not limited to, the target's corresponding 3D bounding box, size, category, and confidence level.

[0046] For example, an electronic device can pre-configure a BEV LSS model, which can be used for 3D object detection. The detection head of the BEV LSS model can be a CenterPoint architecture, thus combining cylindrical projection with BEV perception to achieve accurate detection of 3D objects in fisheye images. The electronic device can input a bird's-eye view feature map into the BEV LSS model and predict object parameters on the BEV feature map based on the detection head of the BEV LSS model, such as the 3D bounding box of the object in world coordinates (center coordinates (X_obj, Y_obj, Z_obj), dimensions L (length) × W (width) × H (height), and heading angle θ_obj), the category (e.g., pedestrian, vehicle, or obstacle), and the confidence level (0 to 1).

[0047] In some implementations, after obtaining the 3D target detection results corresponding to the original fisheye image, the electronic device can perform environmental perception, path planning, and security monitoring based on the 3D target detection results. Thus, according to the first and second mapping tables, the cylindrical projection method is combined with BEV perception. Unlike traditional equidistant projection or equidistant cylindrical projection, this method establishes a precise bidirectional mapping between the camera coordinate system and the cylindrical projection image coordinate system. It utilizes the panoramic information preserved by the cylindrical projection to optimize the 3D view frustum construction of the LSS model. The precise coordinate mapping relationship reduces information loss during feature projection, overcomes the problem of coordinate transformation error accumulation in the polar coordinate BEV method, and improves the detection accuracy of distant targets.

[0048] An embodiment of this application provides a target detection method that converts an original fisheye image into a cylindrical projection image of a target; extracts features from the cylindrical projection image to obtain a target feature map corresponding to the cylindrical projection image; transforms the target feature map to the 3D camera coordinate system based on a first mapping relationship that characterizes the coordinates of the cylindrical projection image and the coordinates in the 3D camera coordinate system to obtain a target point cloud; projects the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map; and detects 3D targets based on the bird's-eye view feature map to obtain 3D target detection results. This method converts the fisheye image into a cylindrical projection image, retains the panoramic information of the image using cylindrical projection, and constructs a 3D frustum point cloud based on the first mapping relationship between the camera coordinate system and the coordinates of the cylindrical projection image, achieving accurate restoration of the cylindrical image to 3D space. Then, a bird's-eye view projection is performed to obtain 3D target detection results based on the bird's-eye view features. This reduces the error from multiple coordinate transformations and improves the accuracy of 3D target detection on fisheye images.

[0049] Please refer to Figure 2, which shows a schematic flowchart of a target detection method according to an embodiment of this application. This method is applied to the aforementioned electronic device, and the process shown in Figure 2 is described in detail below. The target detection method may specifically include the following steps: Step S201: Based on the first mapping relationship and the second mapping relationship, convert the original fisheye image into a target cylindrical projection image, wherein the second mapping relationship is used to characterize the mapping relationship between the coordinates in the three-dimensional camera coordinate system and the coordinates of the fisheye image.

[0050] In some implementations, the electronic device may pre-store a first mapping relationship and a second mapping relationship. Optionally, the electronic device may also determine the first mapping relationship and the second mapping relationship before converting the original fisheye image into a target cylindrical projection image.

[0051] In some implementations, a first mapping relationship may be pre-set in the electronic device; wherein the first mapping relationship may be obtained based on cylindrical projection parameters and the intrinsic parameters of the fisheye camera corresponding to the original fisheye image. The cylindrical projection parameters include, but are not limited to, the cylinder image width, the cylinder image height, and the horizontal field of view (HFOV) in rad. The intrinsic parameters of the fisheye camera corresponding to the original fisheye image include, but are not limited to, the KB model distortion coefficients k1~k4, the lens horizontal focal length $fx_f$, the lens vertical focal length $fy_f$, the horizontal coordinate of the principal point $cx_f$, and the vertical coordinate of the principal point $cy_f$.

[0052] Specifically, the electronic device can obtain the cylindrical projection intrinsic parameters based on the cylindrical projection parameters, and can obtain a first mapping relationship based on the cylindrical projection intrinsic parameters and the cylindrical projection reference radius. It can also convert the initialized cylindrical image to a 3D camera coordinate system based on the first mapping relationship to obtain the camera coordinate system coordinates. Furthermore, the electronic device can determine the horizontal and vertical focal lengths of the cylinder based on the cylinder image width and the horizontal field of view; it can also determine the abscissa of the principal point of the cylinder image based on the cylinder image width; it can also determine the ordinate of the principal point of the cylinder image based on the cylinder image height; and it can obtain the cylindrical projection intrinsic parameters based on the horizontal and vertical focal lengths, the abscissa of the principal point of the cylinder image, and the ordinate of the principal point of the cylinder image.

[0053] For example, the horizontal focal length of the cylinder $fx_c$ = the vertical focal length of the cylinder $fy_c$ = cylinder image width / horizontal field of view (HFOV) of the cylinder image. This ties the horizontal focal length of the cylinder to the field of view and the pixel range, and keeps it equal to the vertical focal length of the cylinder, so that the horizontal and vertical proportions of the cylindrical projection are consistent. For example, the abscissa of the principal point of the cylinder image $cx_c$ = cylinder image width / 2, and the ordinate of the principal point of the cylinder image $cy_c$ = cylinder image height / 2, thus centering the principal point of the cylinder image.
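
As an illustration only, the intrinsic parameter computation described above can be sketched in Python as follows. The function name `cylinder_intrinsics` and the example values (a 1024 × 512 cylinder image with an HFOV of π rad) are assumptions chosen for this sketch and do not come from the application.

```python
import math

def cylinder_intrinsics(width, height, hfov_rad):
    """Return (fx_c, fy_c, cx_c, cy_c) for a cylinder image (hypothetical helper)."""
    fx_c = width / hfov_rad   # pixels per radian of horizontal angle
    fy_c = fx_c               # equal focal lengths keep proportions consistent
    cx_c = width / 2.0        # principal point centred horizontally
    cy_c = height / 2.0       # principal point centred vertically
    return fx_c, fy_c, cx_c, cy_c

# Example values are assumptions: 1024 x 512 pixels, 180 degree HFOV.
fx_c, fy_c, cx_c, cy_c = cylinder_intrinsics(1024, 512, math.pi)
```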

[0054] In some implementations, the electronic device may pre-store a cylindrical projection reference radius; wherein, the cylindrical projection reference radius can characterize a fixed distance in the positive z-axis direction of the three-dimensional camera coordinate system to ensure the uniqueness of the coordinate mapping. The cylindrical projection reference radius can be set by the user or obtained from third-party experimental data, and is not limited here; for example, the cylindrical projection reference radius Radius is set by the user to 1m.

[0055] After obtaining the cylindrical projection intrinsic parameters and the cylindrical projection reference radius, the electronic device can determine the first mapping relationship based on the cylindrical projection intrinsic parameters and the cylindrical projection reference radius, and can convert the initialized cylindrical image to the three-dimensional camera coordinate system based on the first mapping relationship to obtain the camera coordinate system coordinates.

[0056] For example, an electronic device can iterate through each pixel coordinate $(u_c, v_c)$ ($u_c \in [0, width-1]$, $v_c \in [0, height-1]$) of an initialized blank cylindrical image and, based on the cylindrical projection intrinsic parameters (e.g., the abscissa of the principal point of the cylindrical image $cx_c$, the horizontal focal length of the cylinder $fx_c$, the ordinate of the principal point of the cylindrical image $cy_c$, and the vertical focal length of the cylinder $fy_c$) and the cylindrical projection reference radius Radius, transform the pixel coordinates $(u_c, v_c)$ to 3D points (x, y, z) in the 3D camera coordinate system. The first mapping relationship may include: $x = \mathrm{Radius} \times \sin((u_c - cx_c) / fx_c)$, $y = \mathrm{Radius} \times ((v_c - cy_c) / fy_c)$, $z = \mathrm{Radius} \times \cos((u_c - cx_c) / fx_c)$.
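
The first mapping relationship above can be sketched, for illustration, as a small Python function. The function name `cyl_pixel_to_camera` and the example intrinsic values are assumptions made for this sketch; only the three formulas come from the text.

```python
import math

def cyl_pixel_to_camera(u_c, v_c, fx_c, fy_c, cx_c, cy_c, radius=1.0):
    """Map a cylinder pixel (u_c, v_c) to a 3D camera point (hypothetical helper)."""
    alpha = (u_c - cx_c) / fx_c           # horizontal angle in radians
    x = radius * math.sin(alpha)
    y = radius * (v_c - cy_c) / fy_c      # height along the cylinder axis
    z = radius * math.cos(alpha)
    return x, y, z

# Assumed example intrinsics: 1024 x 512 image, HFOV of pi rad, Radius of 1 m.
f_c = 1024 / math.pi
center = cyl_pixel_to_camera(512, 256, f_c, f_c, 512.0, 256.0)
side = cyl_pixel_to_camera(768, 256, f_c, f_c, 512.0, 256.0)
```

Every point produced this way satisfies x² + z² = Radius², i.e. it lies on a cylinder of the reference radius around the camera's y-axis.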

[0057] In some implementations, the electronic device converts the initial blank cylindrical image to a three-dimensional camera coordinate system. After obtaining the camera coordinate system coordinates, it can determine the distortion radius of the original fisheye image and the target polar angle of the initial blank cylindrical image based on the camera coordinate system coordinates and the distortion parameters of the fisheye model corresponding to the original fisheye image.

[0058] For example, camera coordinate system coordinates (x, y, z) can be converted to pixel coordinates $(u_f, v_f)$ of the original fisheye image using a fisheye model. The electronic device can determine, based on the camera coordinate system coordinates (x, y, z), the angle between the ray and the optical axis when there is no distortion, that is, the ideal incident angle $\theta_{ideal} = \operatorname{atan2}(\sqrt{x^2 + y^2}, z)$. Based on the distortion coefficients k1~k4 of the fisheye model and the ideal incident angle $\theta_{ideal}$, the electronic device can determine the incident angle after distortion $\theta_{distort} = \theta_{ideal} \times (1 + k_1\theta_{ideal}^2 + k_2\theta_{ideal}^4 + k_3\theta_{ideal}^6 + k_4\theta_{ideal}^8)$. The electronic device can also determine, based on the horizontal focal length of the fisheye camera corresponding to the original fisheye image and the distorted incident angle $\theta_{distort}$, the distortion radius corresponding to the original fisheye image, $r_{distort} = \theta_{distort}$. Furthermore, the electronic device can obtain, based on the camera coordinate system coordinates (x, y, z), the angle $\varphi = \operatorname{atan2}(y, x)$ between the projection of the 3D point onto the xy plane and the x-axis, and use it as the target polar angle.
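
For illustration, the distortion radius and target polar angle computation can be sketched as follows. The function name `kb_distortion` and the example inputs are assumptions for this sketch; the coefficients are set to zero in the example only to make the result easy to check by hand.

```python
import math

def kb_distortion(x, y, z, k1, k2, k3, k4):
    """Return (r_distort, phi) for a camera point (hypothetical helper)."""
    theta_ideal = math.atan2(math.hypot(x, y), z)   # undistorted incident angle
    t2 = theta_ideal * theta_ideal
    theta_distort = theta_ideal * (
        1 + k1 * t2 + k2 * t2 ** 2 + k3 * t2 ** 3 + k4 * t2 ** 4
    )
    r_distort = theta_distort                       # as stated in the text
    phi = math.atan2(y, x)                          # target polar angle
    return r_distort, phi

# Assumed example: a ray 45 degrees off the optical axis, zero distortion.
r_distort, phi = kb_distortion(1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
```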

[0059] In some implementations, after the electronic device obtains the distortion radius corresponding to the original fisheye image, the target polar angle corresponding to the initialized blank cylindrical image, and the fisheye camera intrinsic parameters corresponding to the original fisheye image, it can obtain a second mapping relationship based on the fisheye camera intrinsic parameters, distortion radius, and target polar angle used to acquire the original fisheye image. This second mapping relationship can be used to characterize the mapping relationship between coordinates in the three-dimensional camera coordinate system and the coordinates of the fisheye image.

[0060] For example, the transformation relationship between the camera coordinate system coordinates (x, y, z) and the original fisheye image coordinates $(u_f, v_f)$ is: $u_f = cx_f + fx_f \times r_{distort} \times \cos\varphi$; $v_f = cy_f + fy_f \times r_{distort} \times \sin\varphi$. Here, $cx_f$ can characterize the horizontal coordinate of the optical center in the intrinsic parameters of the fisheye camera, and $cy_f$ can characterize the vertical coordinate of the optical center in the intrinsic parameters of the fisheye camera; $fx_f$ can characterize the horizontal focal length in the intrinsic parameters of the fisheye camera, and $fy_f$ can characterize the vertical focal length in the intrinsic parameters of the fisheye camera; $r_{distort}$ can characterize the distortion radius, and $\varphi$ can characterize the target polar angle.
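
A minimal Python sketch of this second mapping relationship is given below. It takes the distortion radius and target polar angle as inputs; the function name `fisheye_pixel` and the example intrinsic values (focal length 300, principal point (640, 480)) are assumptions for illustration.

```python
import math

def fisheye_pixel(r_distort, phi, fx_f, fy_f, cx_f, cy_f):
    """Map (r_distort, phi) to fisheye pixel coordinates (hypothetical helper)."""
    u_f = cx_f + fx_f * r_distort * math.cos(phi)
    v_f = cy_f + fy_f * r_distort * math.sin(phi)
    return u_f, v_f

# A ray along the optical axis (r_distort = 0) lands on the principal point.
on_axis = fisheye_pixel(0.0, 0.0, 300.0, 300.0, 640.0, 480.0)
# A ray with r_distort = 0.5 and phi = pi / 2 is displaced vertically only.
below = fisheye_pixel(0.5, math.pi / 2, 300.0, 300.0, 640.0, 480.0)
```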

[0061] In some implementations, after determining the first and second mapping relationships, the electronic device can convert the original fisheye image into a target cylindrical projection image based on these relationships. It is understood that the first mapping relationship can characterize the transformation relationship between the cylindrical projection image coordinates and the camera coordinate system coordinates, and the second mapping relationship can characterize the transformation relationship between the camera coordinate system coordinates and the original fisheye image coordinates. Based on this, the electronic device can obtain the transformation relationship between the cylindrical projection image coordinates and the original fisheye image coordinates based on the first and second mapping relationships, as a third mapping relationship. The electronic device can then convert the original fisheye image into the target cylindrical projection image based on this third mapping relationship.

[0062] For example, according to the first mapping relationship, the electronic device can convert between cylindrical projection image coordinates $(u_c, v_c)$ ($u_c \in [0, width-1]$, $v_c \in [0, height-1]$) and 3D points (x, y, z) in the camera coordinate system: $x = \mathrm{Radius} \times \sin((u_c - cx_c) / fx_c)$; $y = \mathrm{Radius} \times ((v_c - cy_c) / fy_c)$; $z = \mathrm{Radius} \times \cos((u_c - cx_c) / fx_c)$. The electronic device can also convert between the 3D points (x, y, z) of the camera coordinate system and the pixel coordinates $(u_f, v_f)$ of the original fisheye image according to the second mapping relationship: $u_f = cx_f + fx_f \times r_{distort} \times \cos\varphi$; $v_f = cy_f + fy_f \times r_{distort} \times \sin\varphi$; $\varphi = \operatorname{atan2}(y, x)$; $r_{distort} = \theta_{distort}$; $\theta_{distort} = \theta_{ideal} \times (1 + k_1\theta_{ideal}^2 + k_2\theta_{ideal}^4 + k_3\theta_{ideal}^6 + k_4\theta_{ideal}^8)$; $\theta_{ideal} = \operatorname{atan2}(\sqrt{x^2 + y^2}, z)$. Specifically, the electronic device can construct, based on the first mapping relationship and the second mapping relationship, a mapping table $Map_{c2f}$ for converting between cylindrical images and fisheye images: $Map_{c2f}[(u_c, v_c)] = (u_f, v_f)$. The electronic device can also construct, based on the first mapping relationship, a mapping table $Map_{c2cam}$ for mutual conversion between cylindrical image coordinates and camera coordinates: $Map_{c2cam}[(u_c, v_c)] = (x, y, z)$. Based on this, the electronic device can initialize a blank cylindrical projection image (e.g., with dimensions width × height and pixel values initialized to 0) and, after acquiring the original fisheye image, traverse the initialized blank cylindrical projection image pixel by pixel; for each cylindrical pixel $(u_c, v_c)$, it reads the corresponding original fisheye pixel $(u_f, v_f)$ from $Map_{c2f}$. If $(u_f, v_f)$ is within the effective range of the original fisheye image ($0 \le u_f <$ fisheye image width, $0 \le v_f <$ fisheye image height), the RGB pixel value of the original fisheye image at $(u_f, v_f)$ is extracted by bilinear interpolation and assigned to $(u_c, v_c)$; if $(u_f, v_f)$ is outside this range (outside the fisheye field of view), the pixel is filled with black, thus converting the original fisheye image into a target cylindrical projection image.
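
The resampling step can be sketched as follows, as an illustration under simplifying assumptions: the image is a single-channel nested list rather than an RGB array, the mapping table is a hand-written dictionary standing in for $Map_{c2f}$, and the function names `bilinear_sample` and `resample_to_cylinder` are hypothetical.

```python
import math

def bilinear_sample(img, u, v):
    """Bilinearly interpolate img (list of rows) at float position (u, v)."""
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)   # clamp at the border
    ax, ay = u - x0, v - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def resample_to_cylinder(fisheye, map_c2f, width, height):
    """Fill a blank cylinder image by looking up each pixel in map_c2f."""
    fh, fw = len(fisheye), len(fisheye[0])
    cyl = [[0.0] * width for _ in range(height)]      # blank image, all zeros
    for v_c in range(height):
        for u_c in range(width):
            u_f, v_f = map_c2f[(u_c, v_c)]
            if 0 <= u_f < fw and 0 <= v_f < fh:
                cyl[v_c][u_c] = bilinear_sample(fisheye, u_f, v_f)
            # otherwise the pixel keeps its black (zero) fill
    return cyl

# Assumed toy data: a 2 x 2 fisheye image and a 2 x 1 cylinder image.
fisheye = [[0.0, 10.0], [20.0, 30.0]]
map_c2f = {(0, 0): (0.5, 0.5), (1, 0): (5.0, 0.0)}    # second entry is out of range
cyl = resample_to_cylinder(fisheye, map_c2f, 2, 1)
```

Because the table depends only on the camera and projection parameters, it can be computed once and reused for every frame.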

[0063] For example, please refer to Figure 3, which shows a flowchart of the mapping relationship construction provided in an embodiment of this application. The electronic device can calculate basic parameters (cylindrical projection intrinsic parameters) based on the cylindrical projection parameters. For example, the cylindrical projection intrinsic parameters are determined based on the cylindrical projection parameters: the abscissa of the principal point of the cylindrical image $cx_c$ = cylinder image width / 2, and the ordinate of the principal point of the cylindrical image $cy_c$ = cylinder image height / 2, thus centering the principal point of the cylinder image. The electronic device can construct a transformation relationship (first mapping relationship) between the coordinates of the cylinder projection image and the camera coordinate system based on the cylinder projection intrinsic parameters and the cylinder projection reference radius. It can also determine the distortion radius of the original fisheye image and the target polar angle of the cylinder projection image based on the camera coordinate system coordinates and the distortion parameters of the fisheye model corresponding to the original fisheye image. Furthermore, it can obtain a transformation relationship (second mapping relationship) between the camera coordinate system coordinates and the coordinates of the original fisheye image based on the fisheye camera intrinsic parameters, distortion radius, and target polar angle. Finally, it can construct, based on the first and second mapping relationships, a mapping table $Map_{c2f}$ for mutual conversion between cylinder images and fisheye images and a mapping table $Map_{c2cam}$ for converting between cylindrical image coordinates and camera coordinates. Furthermore, the electronic device can construct cylindrical projection extrinsic parameters based on the fisheye camera extrinsic parameters corresponding to the original fisheye image.
Based on a custom cylindrical projection formula and the Kannala-Brandt (KB) fisheye model, a mapping table is generated for "cylindrical projection image → camera coordinate system → original fisheye image." This mapping table is then used to resample the original fisheye image to generate a cylindrical projection image. Subsequently, using the cylindrical projection image as input, the BEV LSS model is employed to complete feature extraction, viewpoint transformation, and 3D object detection. The detection results are then visualized and projected onto the cylindrical projection image. This approach is suitable for scenarios requiring wide-view 3D environmental perception, such as autonomous driving and robot navigation, improving the accuracy of 3D object detection in fisheye images, enhancing the detection capabilities for edge regions and distant objects, and maintaining computational efficiency. Furthermore, in this embodiment, the LSS model can effectively utilize the panoramic information preserved by cylindrical projection, and can accurately estimate the depth by referencing the projection radius parameter, thus solving the problem of feature dilution of distant targets in the polar coordinate BEV method. In addition, the custom cylindrical projection maintains a uniform angular resolution in the horizontal direction, so that distant targets can still maintain identifiable feature intensity in the BEV feature map, thereby enhancing the edge and distant target detection capabilities.

[0064] Step S202: Extract features from the target cylindrical projection image using a feature extraction algorithm to obtain a multi-scale feature map corresponding to the target cylindrical projection image, which is used as the target feature map. The feature extraction algorithm includes at least one of an image texture feature extraction algorithm and an image semantic feature extraction algorithm.

[0065] In some implementations, the electronic device may have a pre-configured feature extraction algorithm, which may include at least one of an image texture feature extraction algorithm and an image semantic feature extraction algorithm. For example, the feature extraction algorithm may include the BEV LSS model, which can be used for low-order texture feature extraction and mid-order semantic feature extraction, and can fuse multi-scale features through spatial pyramid pooling (SPP) to output a feature map with 256 channels. The electronic device can use the feature extraction algorithm to extract features from the target cylindrical projection image to obtain a multi-scale feature map corresponding to the target cylindrical projection image, which serves as the target feature map.

[0066] Step S203: Based on the first mapping relationship, the target feature map is transformed into the three-dimensional camera coordinate system to obtain the first point cloud.

[0067] In some implementations, after obtaining the target feature map corresponding to the target cylindrical projection image, the electronic device can transform the target feature map into a three-dimensional camera coordinate system based on a first mapping relationship to obtain a first point cloud. The electronic device can use the mapping table $Map_{c2cam}$ corresponding to the first mapping relationship to obtain the 3D point (x, y, z) in the camera coordinate system corresponding to each pixel of the target feature map, and then associate the 2D features (e.g., 256 channels) with the 3D points to form a 3D view frustum point cloud with "feature-target" binding.
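
The binding of 2D features to 3D points can be sketched as follows. This is a simplified illustration, not the LSS model itself: it assumes the mapping table has been built at the resolution of the feature map, attaches each pixel to a single 3D point, and omits any per-pixel depth handling. The function name `lift_features` and the toy data are hypothetical.

```python
def lift_features(feature_map, map_c2cam):
    """Pair each feature vector with its 3D camera point.

    feature_map[v][u] is a list of channel values; map_c2cam maps (u, v)
    to (x, y, z). Returns a list of ((x, y, z), features) tuples.
    """
    cloud = []
    for v, row in enumerate(feature_map):
        for u, feat in enumerate(row):
            cloud.append((map_c2cam[(u, v)], feat))
    return cloud

# Assumed toy data: a 2 x 1 feature map with 2 channels instead of 256.
feature_map = [[[1.0, 2.0], [3.0, 4.0]]]
map_c2cam = {(0, 0): (0.0, 0.0, 1.0), (1, 0): (0.1, 0.0, 0.9)}
cloud = lift_features(feature_map, map_c2cam)
```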

[0068] In some implementations, before the electronic device transforms the target feature map into the three-dimensional camera coordinate system based on the first mapping relationship and obtains the first point cloud, it can obtain the cylindrical projection intrinsic parameters based on the cylindrical projection parameters; and obtain the first mapping relationship based on the cylindrical projection intrinsic parameters and the cylindrical projection reference radius, and then transform the coordinates of the target feature map into the three-dimensional camera coordinate system based on the first mapping relationship to obtain the first point cloud.

[0069] The cylindrical projection intrinsic parameters may include the horizontal focal length of the cylinder, the vertical focal length of the cylinder, the abscissa of the principal point of the cylindrical image, and the ordinate of the principal point of the cylindrical image. The electronic device can obtain the horizontal and depth coordinates of the first point cloud in the 3D camera coordinate system based on the cylindrical projection reference radius, the width coordinates of the target feature map, the abscissa of the principal point of the cylindrical image, the horizontal focal length of the cylinder, and the first mapping relationship; and obtain the vertical coordinates of the first point cloud in the 3D camera coordinate system based on the cylindrical projection reference radius, the height coordinates of the target feature map, the ordinate of the principal point of the cylindrical image, the vertical focal length of the cylinder, and the first mapping relationship.

[0070] Step S204: Convert the first point cloud into a point cloud in the world coordinate system based on the cylindrical projection extrinsic parameters to obtain the target point cloud, wherein the cylindrical projection extrinsic parameters are obtained based on the fisheye camera extrinsic parameters used to acquire the original fisheye image.

[0071] In some implementations, after obtaining the first point cloud, the electronic device can convert the first point cloud into a point cloud in the world coordinate system based on cylindrical projection extrinsic parameters to obtain the target point cloud. The cylindrical projection extrinsic parameters can be obtained based on the fisheye camera extrinsic parameters corresponding to the original fisheye image. These cylindrical projection extrinsic parameters can include a rotation matrix and a translation vector. The translation vector can be consistent with the offset in the fisheye camera extrinsic parameters. The rotation matrix, for example a 3×3 rotation matrix, can describe a rotation by a fixed orientation angle around the y-axis of the electronic device's coordinate system, with the angle set according to the fisheye camera's viewpoint type.

[0072] For example, after obtaining the first point cloud, the electronic device can use the cylindrical projection extrinsic parameters (rotation matrix Rc and translation vector Tc) to convert the 3D points (x, y, z) in the camera coordinate system into points (Xw, Yw, Zw) = Rc × [x; y; z] + Tc in the world coordinate system to obtain the target point cloud.
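
For illustration, the camera-to-world conversion can be sketched as follows. The function name `camera_to_world` and the example extrinsics (a 90 degree rotation about the y-axis and an offset of (1, 2, 3)) are assumptions for this sketch.

```python
def camera_to_world(point, R_c, T_c):
    """Apply (Xw, Yw, Zw) = Rc x [x; y; z] + Tc to one 3D point."""
    x, y, z = point
    return tuple(
        R_c[i][0] * x + R_c[i][1] * y + R_c[i][2] * z + T_c[i]
        for i in range(3)
    )

# Assumed example extrinsics: rotate 90 degrees about the y-axis, then offset.
R_c = [[0, 0, 1],
       [0, 1, 0],
       [-1, 0, 0]]
T_c = (1, 2, 3)
world_point = camera_to_world((0, 0, 1), R_c, T_c)
```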

[0073] Step S205: Project the target point cloud onto the bird's-eye view plane to obtain a bird's-eye view feature map.

[0074] For a detailed description of step S205, please refer to the previous description of step S140, which will not be repeated here.

[0075] Step S206: Detect three-dimensional targets based on the bird's-eye view feature map and obtain three-dimensional target detection results, wherein the three-dimensional target detection results include the three-dimensional bounding box corresponding to the target.

[0076] In some implementations, after obtaining the bird's-eye feature map corresponding to the original fisheye image, the electronic device can detect three-dimensional targets based on the bird's-eye feature map and obtain three-dimensional target detection results. The three-dimensional target detection results include, but are not limited to, the three-dimensional bounding box corresponding to the target (e.g., the target's center coordinates (Xobj, Yobj, Zobj), size L (length) × W (width) × H (height), heading angle θobj), category (e.g., pedestrian, vehicle, obstacle, etc.), and confidence level (0~1), etc.

[0077] In some implementations, the 3D target detection result includes the confidence level corresponding to the target. The electronic device, after acquiring the 3D target detection result corresponding to the original fisheye image, can filter false detections based on the confidence levels of the targets included in the 3D target detection result. Specifically, the electronic device can filter out targets with confidence levels less than a confidence threshold in the 3D target detection result, retaining only those with confidence levels greater than or equal to the confidence threshold. The confidence threshold can be preset in the electronic device, set by the user, or obtained from third-party experimental data; for example, it can be set by the user to 0.5.
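
The confidence filtering can be sketched in a few lines of Python. The dictionary layout of a detection and the function name `filter_by_confidence` are assumptions for illustration; the threshold of 0.5 is the example value given above.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep detections whose confidence is at least the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

# Assumed toy detections; only the confidence field matters for this sketch.
detections = [
    {"category": "vehicle", "confidence": 0.9},
    {"category": "pedestrian", "confidence": 0.5},
    {"category": "obstacle", "confidence": 0.2},
]
kept = filter_by_confidence(detections)
```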

[0078] Step S207: Based on the fisheye camera extrinsic parameters used to acquire the original fisheye image, transform the three-dimensional bounding box corresponding to the target to the three-dimensional camera coordinate system to obtain the camera coordinates corresponding to the three-dimensional bounding box.

[0079] In some implementations, the 3D target detection result may include the 3D bounding box corresponding to the target. Considering that the 3D target detection result can correspond to the world coordinate system, the electronic device can transform the 3D bounding box corresponding to the target to the 3D camera coordinate system according to the fisheye camera extrinsic parameters used to acquire the original fisheye image, and obtain the camera coordinates corresponding to the 3D bounding box. Subsequently, the 3D bounding box corresponding to the target is projected onto the cylindrical projection image of the target to realize the visualization processing of the 3D target detection result.

[0080] The electronic device can acquire the vertices of the 3D bounding box corresponding to the target (e.g., the eight vertices (x1, y1, z1) ~ (x8, y8, z8) in the world coordinate system), and can transform the 3D bounding box corresponding to the target to the 3D camera coordinate system based on the fisheye camera's extrinsic parameters to obtain the camera coordinates corresponding to the 3D bounding box. The fisheye camera's extrinsic parameters may include an extrinsic parameter matrix R; the electronic device can use the inverse matrix $R^{-1}$ of the extrinsic parameter matrix R to convert the 8 vertices (x1, y1, z1) ~ (x8, y8, z8) in the world coordinate system to points in the camera coordinate system: $[x; y; z] = R^{-1} \times [X_w - T_x;\ Y_w - T_y;\ Z_w - T_z]$.
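
The world-to-camera conversion of box vertices can be sketched as follows. The sketch assumes that R is a pure rotation matrix, so that its inverse equals its transpose; this is an assumption of the example, not a statement in the text. The function name `world_to_camera` and the example values are hypothetical.

```python
def world_to_camera(vertices, R, T):
    """Apply [x; y; z] = R^-1 x (Pw - T) to each vertex.

    Assumes R is orthonormal, so R^-1 is computed as the transpose of R.
    """
    out = []
    for p in vertices:
        d = [p[j] - T[j] for j in range(3)]           # remove the offset
        out.append(tuple(
            sum(R[j][i] * d[j] for j in range(3))     # row i of R transposed
            for i in range(3)
        ))
    return out

# Assumed example extrinsics: 90 degree rotation about the y-axis, offset (1, 2, 3).
R = [[0, 0, 1],
     [0, 1, 0],
     [-1, 0, 0]]
T = (1, 2, 3)
camera_vertices = world_to_camera([(2, 2, 3), (1, 2, 3)], R, T)
```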

[0081] Step S208: Based on the first mapping relationship, convert the camera coordinates corresponding to the three-dimensional bounding box into cylindrical pixel coordinates.

[0082] In some implementations, after obtaining the camera coordinates corresponding to the 3D bounding box in the target 3D detection result, the electronic device can convert the camera coordinates corresponding to the 3D bounding box into cylindrical pixel coordinates according to a first mapping relationship. For example, the electronic device can use the mapping table $Map_{c2cam}$ corresponding to the first mapping relationship to map the camera coordinates (x, y, z) corresponding to each 3D bounding box to cylindrical image pixels $(u_c, v_c)$.
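
The text performs this step with the mapping table. As an alternative illustration, the first mapping relationship can also be inverted analytically: since x = Radius × sin(α) and z = Radius × cos(α), the horizontal angle is α = atan2(x, z), and the ratio y / sqrt(x² + z²) equals $(v_c - cy_c) / fy_c$. The sketch below follows this derivation; the closed-form inverse, the function name `camera_to_cyl_pixel`, and the example values are assumptions and are not stated in the application.

```python
import math

def camera_to_cyl_pixel(x, y, z, fx_c, fy_c, cx_c, cy_c):
    """Project a 3D camera point to cylinder pixel coordinates.

    Undefined for points on the cylinder axis (x = z = 0).
    """
    alpha = math.atan2(x, z)      # horizontal angle around the y-axis
    rho = math.hypot(x, z)        # distance from the cylinder axis
    u_c = cx_c + fx_c * alpha
    v_c = cy_c + fy_c * y / rho
    return u_c, v_c

# Assumed example: a point on the unit cylinder at alpha = 0.3 rad, y = 0.2.
px = camera_to_cyl_pixel(math.sin(0.3), 0.2, math.cos(0.3),
                         100.0, 100.0, 50.0, 40.0)
# The same ray at five times the distance projects to the same pixel.
px_far = camera_to_cyl_pixel(5 * math.sin(0.3), 1.0, 5 * math.cos(0.3),
                             100.0, 100.0, 50.0, 40.0)
```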

[0083] Step S209: Display the three-dimensional bounding box corresponding to the target on the target cylindrical projection image according to the cylindrical pixel coordinates.

[0084] In some implementations, after obtaining the cylindrical pixel coordinates corresponding to the 3D bounding box in the 3D detection result of the target, the electronic device can display the 3D bounding box corresponding to the target on the cylindrical projection image of the target based on the cylindrical pixel coordinates. Specifically, the electronic device can draw the 3D detection box on the cylindrical projection image of the target based on the cylindrical pixel coordinates and label the target's category and confidence level, thereby displaying the 3D detection box, category, and confidence level of the 3D target on the cylindrical projection image, achieving visualization of the 3D target detection results.

[0085] In this embodiment, a custom cylindrical projection scheme is used, and a novel calculation formula for projecting fisheye images into cylindrical images is designed, which differs from traditional equidistant or isometric cylindrical projections. By establishing a precise bidirectional mapping between the camera coordinate system (x, y, z) and the cylindrical projection image coordinate system (u, v), and introducing a Radius parameter, accurate reconstruction of the cylindrical projection image into 3D space is achieved. This solves the problem that existing equidistant cylindrical projections cannot accurately deduce 3D coordinates, and also improves the accuracy of 3D target spatial positioning, especially in the image edge region (projecting the fisheye image onto the initialized cylindrical projection image). Furthermore, in this embodiment, the collaborative design of cylindrical projection and the BEV LSS model utilizes the panoramic information preserved by cylindrical projection to optimize the 3D view frustum construction of the LSS model. Through precise coordinate mapping relationships, information loss during feature projection is reduced, overcoming the problem of coordinate transformation error accumulation in the polar coordinate BEV method, and improving the detection accuracy of distant targets.

[0086] For example, please refer to Figure 4, which illustrates a flowchart of a target detection method provided in an embodiment of this application. The electronic device can convert the original fisheye image into a target cylindrical projection image based on a mapping table that converts between cylindrical projection images and fisheye images. It can then extract features from the target cylindrical projection image to obtain a target feature map. Based on the target feature map, it can construct a view frustum point cloud in the world coordinate system to obtain a target point cloud. Furthermore, it can perform a BEV (bird's-eye view) transformation on the target point cloud to obtain a bird's-eye view feature map. Based on the bird's-eye view feature map, it can detect 3D targets to obtain 3D target detection results. Finally, it can project the 3D target detection results onto the target cylindrical projection image to visualize the 3D target detection results. It is understood that in this embodiment, the custom-designed bidirectional cylindrical projection mapping mechanism accurately preserves spatial geometric relationships and reduces distortion errors during the projection process. Moreover, since the conversion formula from cylindrical projection image to 3D space is directly based on geometric derivation, it avoids the cumulative errors caused by approximate conversion, significantly improving the positioning accuracy of the 3D bounding box and enhancing the accuracy of 3D target detection.

[0087] Compared with the target detection method shown in Figure 1, the target detection method provided in this embodiment of this application can also transform the target feature map into the 3D camera coordinate system based on the first mapping relationship to obtain the first point cloud, and transform the first point cloud into a point cloud in the world coordinate system based on the cylindrical projection extrinsic parameters to obtain the target point cloud. The cylindrical projection extrinsic parameters are obtained based on the fisheye camera extrinsic parameters used to acquire the original fisheye image. Thus, according to the first mapping relationship between the camera coordinate system and the cylindrical projection image coordinates, a 3D frustum point cloud is constructed, and the point cloud is transformed into the world coordinate system. Bird's-eye feature extraction is performed in the world coordinate system, which improves the accuracy of 3D target detection of fisheye images.

[0088] Meanwhile, in this embodiment, the 3D target detection result can include the 3D bounding box corresponding to the target. In this embodiment, after detecting the 3D target based on the bird's-eye feature map and obtaining the 3D target detection result, the 3D bounding box corresponding to the target is transformed to the 3D camera coordinate system according to the fisheye camera extrinsic parameters used to acquire the original fisheye image, and the camera coordinates corresponding to the 3D bounding box are obtained. According to the cylindrical projection intrinsic parameters and the first mapping relationship, the camera coordinates corresponding to the 3D bounding box are converted into cylindrical pixel coordinates. According to the cylindrical pixel coordinates, the 3D bounding box corresponding to the target is displayed on the cylindrical projection image of the target, thereby visualizing the target detection result and improving the user experience.

[0089] In addition, this embodiment can also convert the original fisheye image into the target cylindrical projection image according to the first mapping relationship and the second mapping relationship, where the second mapping relationship is used to characterize the mapping relationship between the coordinates in the three-dimensional camera coordinate system and the coordinates of the fisheye image. This establishes a custom mapping chain of cylindrical projection image → camera coordinate system → original fisheye image, so that the image can be converted directly according to the mapping relationships, avoiding the cumulative error caused by approximate conversion and improving the positioning accuracy of target detection.
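The direct conversion can be pictured as a per-pixel lookup: for each pixel of the cylindrical image, the chained mapping yields a source location in the fisheye image, and the pixel value is sampled there. The sketch below assumes the chained mapping is supplied as a hypothetical callable `cyl_to_fisheye`, and uses nearest-neighbour sampling purely for brevity; the application does not specify the interpolation scheme.

```python
def remap_nearest(src, out_h, out_w, cyl_to_fisheye, fill=0):
    """Build the cylindrical image by sampling the fisheye image `src`.

    src            : fisheye image as a list of rows
    cyl_to_fisheye : hypothetical callable (u, v) -> (su, sv) giving the
                     fisheye pixel that corresponds to cylindrical pixel (u, v)
    fill           : value used when the source location falls outside `src`
    """
    src_h, src_w = len(src), len(src[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            su, sv = cyl_to_fisheye(u, v)
            iu, iv = int(round(su)), int(round(sv))  # nearest source pixel
            if 0 <= iv < src_h and 0 <= iu < src_w:
                out[v][u] = src[iv][iu]
    return out
```

Because each output pixel is computed from the original fisheye image in a single step, no intermediate resampled image is needed.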

[0090] Meanwhile, this embodiment can also extract features from the target cylindrical projection image through a feature extraction algorithm to obtain a multi-scale feature map corresponding to the target cylindrical projection image, which serves as the target feature map. The feature extraction algorithm includes at least one of an image texture feature extraction algorithm and an image semantic feature extraction algorithm, thereby obtaining a multi-scale feature map of the cylindrical projection image for target detection, reducing the computational load of target detection and improving the accuracy of three-dimensional target detection for fisheye images.

[0091] Please refer to Figure 5, which shows a block diagram of a target detection device according to an embodiment of the present application. The target detection device 200 is applied to the aforementioned electronic device. The block diagram shown in Figure 5 is described in detail below. The target detection device 200 may include: a cylindrical projection image generation module 210, a cylindrical projection image feature extraction module 220, a view frustum construction module 230, a viewpoint conversion module 240, and a target detection module 250, wherein: the cylindrical projection image generation module 210 is used to convert the original fisheye image into a target cylindrical projection image.

[0092] The cylindrical projection image feature extraction module 220 is used to extract features from the target cylindrical projection image to obtain the target feature map corresponding to the target cylindrical projection image.

[0093] The view frustum construction module 230 is used to transform the target feature map to the three-dimensional camera coordinate system based on the first mapping relationship to obtain the target point cloud, wherein the first mapping relationship is used to characterize the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system.

[0094] The viewpoint conversion module 240 is used to project the target point cloud onto the bird's-eye view plane to obtain a bird's-eye view feature map.
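One simple way to picture the projection onto the bird's-eye view plane is to drop the height of each point and accumulate its feature vector into a ground-plane grid cell. The sketch below is illustrative only: sum pooling, the z-up world convention, and the grid parameters are assumptions not stated in this section.

```python
import math

def points_to_bev(points, features, x_range, y_range, cell_size):
    """Sum per-point feature vectors into a 2D grid over the ground plane.

    points   : list of (x, y, z) in the world frame, z assumed to point up
    features : list of equal-length feature vectors, one per point
    Returns grid[row][col] -> pooled feature vector for that cell.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int(round((x_max - x_min) / cell_size))
    n_rows = int(round((y_max - y_min) / cell_size))
    channels = len(features[0]) if features else 0
    grid = [[[0.0] * channels for _ in range(n_cols)] for _ in range(n_rows)]
    for (x, y, _z), feat in zip(points, features):
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:  # ignore points off the grid
            cell = grid[row][col]
            for c in range(channels):
                cell[c] += feat[c]
    return grid
```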

[0095] The target detection module 250 is used to detect three-dimensional targets based on the bird's-eye view feature map and obtain three-dimensional target detection results.

[0096] Furthermore, the view frustum construction module 230 may include: a 3D camera coordinate system point cloud acquisition unit and a world coordinate system point cloud acquisition unit, wherein: The three-dimensional camera coordinate system point cloud acquisition unit is used to transform the target feature map into the three-dimensional camera coordinate system based on the first mapping relationship to obtain the first point cloud.

[0097] The world coordinate system point cloud acquisition unit is used to convert the first point cloud into a point cloud in the world coordinate system based on the cylindrical projection extrinsic parameters to obtain the target point cloud. The cylindrical projection extrinsic parameters are obtained based on the extrinsic parameters of the fisheye camera used to acquire the original fisheye image.
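The conversion of the first point cloud into the world coordinate system can be sketched as a rigid transform. This assumes the cylindrical projection extrinsic parameters take the form of a 3×3 rotation matrix and a translation vector mapping camera coordinates to world coordinates; the section does not spell out their representation, and the function name is hypothetical.

```python
def camera_to_world(points, rotation, translation):
    """Apply p_world = R @ p_cam + t to every point of the first point cloud.

    rotation    : 3x3 nested list (assumed camera-to-world rotation)
    translation : length-3 sequence (assumed camera position in the world frame)
    """
    world_points = []
    for p in points:
        world_points.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world_points
```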

[0098] Furthermore, before transforming the target feature map into the three-dimensional camera coordinate system based on the first mapping relationship to obtain the first point cloud, the target detection device 200 may further include: a cylindrical projection intrinsic parameter acquisition unit and a first mapping relationship acquisition unit, wherein: The cylindrical projection intrinsic parameter acquisition unit is used to acquire the cylindrical projection intrinsic parameters based on the cylindrical projection parameters.

[0099] The first mapping relationship acquisition unit is used to acquire the first mapping relationship based on the cylindrical projection intrinsic parameters and the cylindrical projection reference radius.

[0100] Further, the cylindrical projection intrinsic parameters include the horizontal focal length of the cylinder, the vertical focal length of the cylinder, the abscissa of the principal point of the cylindrical image, and the ordinate of the principal point of the cylindrical image. The three-dimensional camera coordinate system point cloud acquisition unit may include: a first point cloud horizontal and depth coordinate acquisition unit and a first point cloud vertical coordinate acquisition unit, wherein: the first point cloud horizontal and depth coordinate acquisition unit is used to acquire the horizontal and depth coordinates of the first point cloud in the three-dimensional camera coordinate system based on the cylindrical projection reference radius, the width coordinate of the target feature map, the abscissa of the principal point of the cylindrical image, and the horizontal focal length of the cylinder.

[0101] The first point cloud vertical coordinate acquisition unit is used to acquire the vertical coordinates of the first point cloud in the three-dimensional camera coordinate system based on the cylindrical projection reference radius, the height coordinate of the target feature map, the ordinate of the principal point of the cylindrical image, and the vertical focal length of the cylinder.
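Paragraphs [0100] and [0101] name the inputs of the coordinate computation but this section does not restate the formula. The sketch below therefore assumes a common cylindrical model, in which the horizontal pixel offset maps to an azimuth angle around the cylinder axis and the vertical pixel offset scales linearly with height. The function name and the axis convention (x right, y down, z forward) are illustrative assumptions.

```python
import math

def cylinder_pixel_to_camera(u, v, fx, fy, cx, cy, radius):
    """Map a cylindrical-image pixel (u, v) to a 3D camera-frame point.

    fx, fy : horizontal / vertical focal length of the cylinder
    cx, cy : abscissa / ordinate of the principal point
    radius : cylindrical projection reference radius
    """
    theta = (u - cx) / fx          # azimuth angle around the cylinder axis
    x = radius * math.sin(theta)   # horizontal coordinate
    z = radius * math.cos(theta)   # depth coordinate
    y = radius * (v - cy) / fy     # vertical coordinate
    return x, y, z
```

Under this model the horizontal and depth coordinates depend only on the reference radius, the width coordinate, the principal point abscissa and the horizontal focal length, while the vertical coordinate depends only on the reference radius, the height coordinate, the principal point ordinate and the vertical focal length, which matches the split between the two units.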

[0102] Furthermore, the cylindrical projection image generation module 210 may include a cylindrical projection image generation subunit, wherein: the cylindrical projection image generation subunit is used to convert the original fisheye image into the target cylindrical projection image according to the first mapping relationship and the second mapping relationship, wherein the second mapping relationship is used to characterize the mapping relationship between the coordinates in the three-dimensional camera coordinate system and the coordinates of the fisheye image.

[0103] Further, before converting the original fisheye image into the target cylindrical projection image according to the first mapping relationship and the second mapping relationship, the target detection device 200 may further include: a camera coordinate system coordinate acquisition unit, a distortion radius determination unit, and a second mapping relationship acquisition unit, wherein: The camera coordinate system coordinate acquisition unit is used to convert the initialized cylindrical image to the three-dimensional camera coordinate system based on the first mapping relationship, and obtain the camera coordinate system coordinates.

[0104] The distortion radius determination unit is used to determine the distortion radius corresponding to the original fisheye image and the target polar angle corresponding to the initialized cylindrical image based on the coordinates of the camera coordinate system and the distortion parameters of the fisheye model corresponding to the original fisheye image.

[0105] The second mapping relationship acquisition unit is used to acquire the second mapping relationship based on the fisheye camera intrinsic parameters used to acquire the original fisheye image, the distortion radius, and the target polar angle.
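The second mapping relationship (camera coordinates to fisheye pixel coordinates) can be illustrated with a widely used polynomial fisheye model. This is an assumption: this section does not name the fisheye model, and reading the "distortion radius" as the distorted radial distance `r_d` and the "target polar angle" as the in-plane angle `phi` is one plausible interpretation rather than a confirmed one. The function name and coefficient layout are hypothetical.

```python
import math

def camera_to_fisheye_pixel(x, y, z, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project a camera-frame point into the fisheye image (assumed model).

    k : four distortion coefficients of an assumed equidistant polynomial model
    """
    theta = math.atan2(math.hypot(x, y), z)  # angle between the ray and the optical axis
    phi = math.atan2(y, x)                   # polar angle in the image plane
    t2 = theta * theta
    # Distorted radius: theta * (1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8)
    r_d = theta * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4)
    u = fx * r_d * math.cos(phi) + cx
    v = fy * r_d * math.sin(phi) + cy
    return u, v
```

Composing this function with the first mapping relationship gives, for each cylindrical pixel, the fisheye pixel from which its value should be sampled.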

[0106] Furthermore, the 3D target detection result includes the 3D bounding box corresponding to the target. After detecting the 3D target based on the bird's-eye view feature map and obtaining the 3D target detection result, the target detection device 200 may further include: a camera coordinate acquisition unit for the 3D bounding box, a cylindrical pixel coordinate acquisition unit for the 3D bounding box, and a 3D bounding box visualization unit, wherein: The camera coordinate acquisition unit for the three-dimensional bounding box is used to transform the three-dimensional bounding box corresponding to the target to the three-dimensional camera coordinate system based on the fisheye camera extrinsic parameters used to acquire the original fisheye image, and obtain the camera coordinates corresponding to the three-dimensional bounding box.

[0107] The cylindrical pixel coordinate acquisition unit of the three-dimensional bounding box is used to convert the camera coordinates corresponding to the three-dimensional bounding box into cylindrical pixel coordinates according to the first mapping relationship.

[0108] The 3D bounding box visualization unit is used to display the 3D bounding box corresponding to the target on the target cylindrical projection image based on the cylindrical pixel coordinates.

[0109] Furthermore, the cylindrical projection image feature extraction module 220 may include a multi-scale feature map acquisition unit, wherein: the multi-scale feature map acquisition unit is used to extract features from the target cylindrical projection image using a feature extraction algorithm to obtain a multi-scale feature map corresponding to the target cylindrical projection image, which is used as the target feature map. The feature extraction algorithm includes at least one of an image texture feature extraction algorithm and an image semantic feature extraction algorithm.

[0110] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described device and module can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0111] In the several embodiments provided in this application, the coupling between modules can be electrical, mechanical, or other forms of coupling.

[0112] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0113] Please refer to Figure 6, which shows a structural block diagram of an electronic device according to an embodiment of this application. The electronic device 100 can be a vehicle, in-vehicle terminal, server, computer, or other device with processing capabilities. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs. The one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110. The one or more application programs are configured to perform the methods described in the foregoing method embodiments.

[0114] The processor 110 may include one or more processing cores. The processor 110 connects to various parts within the electronic device 100 using various interfaces and lines, and performs various functions of the electronic device 100 and processes its data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content to be displayed; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 110 and may instead be implemented separately using a communication chip.

[0115] The memory 120 may include random access memory (RAM) or read-only memory (ROM). The memory 120 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, sound playback functionality, image playback functionality, etc.), and instructions for implementing the various method embodiments described above. The data storage area may store data created by the electronic device 100 during use (such as phonebook data, audio and video data, chat log data, etc.).

[0116] In this embodiment, the vehicle may include the electronic device described in the above embodiments. The vehicle may be an electric vehicle, a car, or another mobile device.

[0117] In this embodiment, a computer-readable medium stores program code, which can be called by a processor to execute the methods described in the above method embodiments.

[0118] The computer-readable storage medium can be an electronic storage device such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, or ROM. Optionally, the computer-readable storage medium includes a non-transitory computer-readable storage medium. The computer-readable storage medium has storage space for program code that performs any of the method steps described above. This program code can be read from or written to one or more computer program products. The program code can, for example, be compressed in a suitable form.

[0119] In this application, "multiple" refers to two or more.

[0120] In this application, unless otherwise expressly defined, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection between two components. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.

[0121] The terms “first,” “second,” “third,” “fourth,” etc., in this application (if present) are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0122] In this application, the term "and/or" merely describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, in this application, the character "/" generally indicates that the preceding and following related objects have an "or" relationship.

[0123] Unless otherwise specified, the steps in this application may be performed in the order described or in any other order. For example, if the method includes steps A and B, the method may perform step A and then step B, or step B and then step A. If the method may also include step C, step C may be added at any position in the sequence: the method may perform steps A, B, and C in that order, or steps A, C, and B, or steps C, A, and B, and so on.

[0124] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A target detection method, characterized in that the method includes: converting an original fisheye image into a target cylindrical projection image; performing feature extraction on the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; transforming the target feature map to a three-dimensional camera coordinate system based on a first mapping relationship to obtain a target point cloud, wherein the first mapping relationship is used to characterize the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system; projecting the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map; and detecting three-dimensional targets based on the bird's-eye view feature map to obtain a three-dimensional target detection result.

2. The method according to claim 1, characterized in that the step of transforming the target feature map to the three-dimensional camera coordinate system based on the first mapping relationship to obtain the target point cloud includes: transforming the target feature map to the three-dimensional camera coordinate system based on the first mapping relationship to obtain a first point cloud; and converting the first point cloud into a point cloud in the world coordinate system based on cylindrical projection extrinsic parameters to obtain the target point cloud, wherein the cylindrical projection extrinsic parameters are obtained based on the extrinsic parameters of the fisheye camera used to acquire the original fisheye image.

3. The method according to claim 2, characterized in that, before transforming the target feature map to the three-dimensional camera coordinate system based on the first mapping relationship to obtain the first point cloud, the method further includes: obtaining cylindrical projection intrinsic parameters based on cylindrical projection parameters; and obtaining the first mapping relationship based on the cylindrical projection intrinsic parameters and a cylindrical projection reference radius.

4. The method according to claim 3, characterized in that the cylindrical projection intrinsic parameters include the horizontal focal length of the cylinder, the vertical focal length of the cylinder, the abscissa of the principal point of the cylindrical image, and the ordinate of the principal point of the cylindrical image, and the step of transforming the target feature map to the three-dimensional camera coordinate system based on the first mapping relationship to obtain the first point cloud includes: obtaining the horizontal coordinates and depth coordinates of the first point cloud in the three-dimensional camera coordinate system based on the cylindrical projection reference radius, the width coordinate of the target feature map, the abscissa of the principal point of the cylindrical image, and the horizontal focal length of the cylinder; and obtaining the vertical coordinates of the first point cloud in the three-dimensional camera coordinate system based on the cylindrical projection reference radius, the height coordinate of the target feature map, the ordinate of the principal point of the cylindrical image, and the vertical focal length of the cylinder.

5. The method according to any one of claims 1-4, characterized in that the step of converting the original fisheye image into the target cylindrical projection image includes: converting the original fisheye image into the target cylindrical projection image based on the first mapping relationship and a second mapping relationship, wherein the second mapping relationship is used to characterize the mapping relationship between the coordinates in the three-dimensional camera coordinate system and the coordinates of the fisheye image.

6. The method according to claim 5, characterized in that, before converting the original fisheye image into the target cylindrical projection image according to the first mapping relationship and the second mapping relationship, the method further includes: transforming an initialized cylindrical image to the three-dimensional camera coordinate system based on the first mapping relationship to obtain camera coordinate system coordinates; determining, based on the camera coordinate system coordinates and the distortion parameters of the fisheye model corresponding to the original fisheye image, the distortion radius corresponding to the original fisheye image and the target polar angle corresponding to the initialized cylindrical image; and obtaining the second mapping relationship based on the intrinsic parameters of the fisheye camera used to acquire the original fisheye image, the distortion radius, and the target polar angle.

7. The method according to any one of claims 1-4, characterized in that the three-dimensional target detection result includes a three-dimensional bounding box corresponding to the target, and, after detecting the three-dimensional targets based on the bird's-eye view feature map to obtain the three-dimensional target detection result, the method further includes: transforming the three-dimensional bounding box corresponding to the target to the three-dimensional camera coordinate system based on the extrinsic parameters of the fisheye camera used to acquire the original fisheye image, to obtain the camera coordinates corresponding to the three-dimensional bounding box; converting the camera coordinates corresponding to the three-dimensional bounding box into cylindrical pixel coordinates based on the first mapping relationship; and displaying the three-dimensional bounding box corresponding to the target on the target cylindrical projection image based on the cylindrical pixel coordinates.

8. The method according to any one of claims 1-4, characterized in that the step of performing feature extraction on the target cylindrical projection image to obtain the target feature map corresponding to the target cylindrical projection image includes: performing feature extraction on the target cylindrical projection image by a feature extraction algorithm to obtain a multi-scale feature map corresponding to the target cylindrical projection image, which is used as the target feature map, wherein the feature extraction algorithm includes at least one of an image texture feature extraction algorithm and an image semantic feature extraction algorithm.

9. A target detection device, characterized in that the device includes: a cylindrical projection image generation module, used to convert an original fisheye image into a target cylindrical projection image; a cylindrical projection image feature extraction module, used to extract features from the target cylindrical projection image to obtain a target feature map corresponding to the target cylindrical projection image; a view frustum construction module, used to transform the target feature map to a three-dimensional camera coordinate system based on a first mapping relationship to obtain a target point cloud, wherein the first mapping relationship is used to characterize the mapping relationship between the coordinates of the cylindrical projection image and the coordinates in the three-dimensional camera coordinate system; a viewpoint conversion module, used to project the target point cloud onto a bird's-eye view plane to obtain a bird's-eye view feature map; and a target detection module, used to detect three-dimensional targets based on the bird's-eye view feature map and obtain a three-dimensional target detection result.

10. An electronic device, characterized in that the electronic device includes: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method as described in any one of claims 1-8.

11. A vehicle, characterized in that the vehicle includes the electronic device as described in claim 10.