Storage location detection method, electronic device, and computer-readable medium

By fusing features from multi-view fisheye images and inverse perspective transformation images, and using a feature extraction and transformation network for storage location detection, the problem of insufficient robustness and accuracy in existing storage location detection technologies is solved, achieving stronger robustness and more accurate storage location detection results.

CN115331196B · Active · Publication Date: 2026-06-12 · BEIJING MAICHI ZHIXING TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING MAICHI ZHIXING TECHNOLOGY CO LTD
Filing Date
2022-07-20
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

The robustness and accuracy of existing technologies for warehouse location detection are low, mainly because inverse perspective transformation images have missing information, distortion, and are easily affected by environmental factors.

Method used

By acquiring multi-view fisheye images captured by multiple fisheye cameras and inverse perspective transformation images generated based on these images, and inputting them into a storage location detection model for detection, the features of the original multi-view fisheye images and inverse perspective transformation images are fused, and storage location detection is performed using feature extraction, feature transformation, and detection networks.

Benefits of technology

It improves the robustness and accuracy of parking space detection, mitigates the information loss in inverse perspective transformation images and the adverse effects of environmental factors, and is suitable for parking space detection of different vehicles.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the application disclose a storage location detection method, an electronic device and a computer readable medium. An embodiment of the method comprises: acquiring a plurality of multi-view fisheye images captured by a plurality of fisheye cameras, and acquiring inverse perspective transformation images generated based on the plurality of multi-view fisheye images, the plurality of fisheye cameras being arranged on a vehicle; inputting the plurality of multi-view fisheye images and the inverse perspective transformation images into a storage location detection model to perform storage location detection, and obtaining a storage location detection result output by the storage location detection model; wherein the storage location detection result is used to indicate the orientation of a storage location. The embodiment can improve the robustness of storage location detection and the accuracy of the storage location detection result.

Description

Technical Field

[0001] This application relates to the field of computer technology, specifically to storage location detection methods, electronic devices, and computer-readable media.

Background Technology

[0002] With the development of intelligent driving technology, users have an urgent need for assisted parking or automatic parking. In order to provide users with effective assisted parking or automatic parking services, accurate detection of parking spaces near the vehicle is required.

[0003] In existing technologies, a typical approach is to train a storage location detection model and then input inverse perspective mapping (IPM) images obtained from multiple fisheye images into the model for storage location detection. However, because the storage location detection model uses relatively limited information, and the IPM images suffer from missing information and distortion and are easily affected by environmental factors (such as lighting and occlusion), the robustness of storage location detection is poor and the accuracy of the detection results is low.

Summary of the Invention

[0004] This application proposes a warehouse location detection method, electronic device, and computer-readable medium to address the technical problems of poor robustness and low accuracy of warehouse location detection results in the prior art.

[0005] In a first aspect, embodiments of this application provide a warehouse location detection method, the method comprising: acquiring multi-view fisheye images captured by multiple fisheye cameras, and acquiring an inverse perspective transformation image generated based on the multi-view fisheye images, wherein the multiple fisheye cameras are mounted on a vehicle; inputting the multi-view fisheye images and the inverse perspective transformation image into a warehouse location detection model for warehouse location detection, and obtaining a warehouse location detection result output by the warehouse location detection model; wherein the warehouse location detection result is used to indicate the location of the warehouse location.

[0006] In a second aspect, embodiments of this application provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.

[0007] In a third aspect, embodiments of this application provide a computer-readable medium having a computer program stored thereon that, when executed by a processor, implements the method described in the first aspect.

[0008] In a fourth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method described in the first aspect.

[0009] The storage location detection method, electronic device, and computer-readable medium provided in this application acquire multi-view fisheye images captured by multiple fisheye cameras installed on a vehicle, as well as inverse perspective transformed images generated based on the aforementioned multi-view fisheye images. The acquired multi-view fisheye images and inverse perspective transformed images are then input into a storage location detection model to obtain storage location detection results indicating the location of the storage location. Because the storage location detection model uses both the inverse perspective transformed images and the original multi-view fisheye images during the storage location detection process, it can improve the perception range of the storage location detection model, mitigate the adverse effects of information loss, distortion, and environmental factors caused by the inverse perspective transformed images, make the storage location detection more robust, and make the storage location detection results more accurate.

Attached Figure Description

[0010] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0011] Figure 1 is a flowchart of an embodiment of the warehouse location detection method according to this application;

[0012] Figure 2 is a schematic diagram of the feature extraction process of the feature extraction network according to this application;

[0013] Figure 3 is a schematic diagram of the processing procedure of the feature transformation network according to this application;

[0014] Figure 4 is a schematic diagram of one embodiment of the warehouse location detection device according to this application;

[0015] Figure 5 is a schematic diagram of the structure of an electronic device used to implement the embodiments of this application.

Detailed Implementation

[0016] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0017] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0018] It should be noted that all actions involving the acquisition of signals, information, or data in this application are carried out in compliance with the relevant data protection laws and policies of the country where the application is located, and with the authorization granted by the owner of the relevant device.

[0019] In recent years, significant progress has been made in research on technologies based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, technologies, and application systems to simulate and extend human intelligence. AI is a comprehensive discipline involving numerous technologies, including chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, as an important branch of AI, specifically enables machines to recognize the world. Computer vision technologies typically include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, 3D reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and localization. 
With the research and advancement of artificial intelligence technology, this technology has been applied in numerous fields, such as security, urban management, traffic management, building management, park management, facial recognition access control, facial recognition attendance, logistics management, warehouse management, robotics, intelligent marketing, computational photography, mobile imaging, cloud services, smart homes, wearable devices, autonomous driving, smart healthcare, facial payment, facial unlocking, fingerprint unlocking, identity verification, smart screens, smart TVs, cameras, mobile internet, live streaming, beautification, makeup, medical aesthetics, and intelligent temperature measurement.

[0020] In the field of autonomous driving, to provide users with effective assisted parking or automated parking services, it is typically necessary to accurately detect parking spaces near the vehicle using artificial intelligence technology. Existing technologies usually rely on inverse perspective mapping (IPM) images, composed from multiple fisheye images, for parking space detection. However, IPM images suffer from information loss and distortion and are easily affected by environmental factors (such as lighting and occlusion), so parking space detection has poor robustness and the detection results have low accuracy. This application provides a parking space detection method that improves the robustness of parking space detection and the accuracy of the detection results.

[0021] Please refer to Figure 1, which illustrates a flow 100 of an embodiment of the warehouse location detection method according to this application. The entity executing the warehouse location detection method can be an electronic device. This electronic device can be communicatively connected to multiple fisheye cameras installed in a vehicle. These multiple fisheye cameras can be used to capture multi-view fisheye images. The electronic device can be an in-vehicle device installed in the vehicle, such as an in-vehicle terminal, or an electronic device installed outside the vehicle, such as a server; no limitation is made here.

[0022] The storage location detection method includes the following steps:

[0023] Step 101: Obtain multi-view fisheye images captured by multiple fisheye cameras, and obtain inverse perspective transformation images generated based on the multi-view fisheye images.

[0024] In this embodiment, the aforementioned executing entity can acquire multi-view fisheye images captured by the multiple fisheye cameras. These multi-view fisheye images can be fisheye images from multiple perspectives acquired simultaneously. In practice, the number of fisheye cameras can be set as needed and can be positioned at different locations within the vehicle to capture fisheye images from different perspectives, wherein one fisheye camera captures a fisheye image from one perspective. For example, this may include, but is not limited to: front-view fisheye images, rear-view fisheye images, left-view fisheye images, and right-view fisheye images.

[0025] In this embodiment, the inverse perspective transformation image can be generated based on the multi-view fisheye images. For example, each fisheye image can be converted into an inverse perspective transformation image using an inverse perspective transformation algorithm, and the resulting images can then be stitched together to obtain the final inverse perspective transformation image.
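The conversion of a single fisheye image into a top-down view can be sketched as follows. This is a minimal illustration only: the source does not specify the inverse perspective transformation algorithm, so the equidistant fisheye model (r = f·θ), the flat-ground assumption, nearest-neighbour sampling, and all names (`ground_to_fisheye_pixel`, `ipm_from_fisheye`, `metres_per_pixel`) are assumptions. Stitching the per-camera results is not shown.

```python
import numpy as np

def ground_to_fisheye_pixel(point_world, R, t, f, cx, cy):
    """Project a 3-D world point into a fisheye image.

    Assumes an equidistant fisheye model (r = f * theta); the source does
    not state which camera model is used.
    """
    x, y, z = R @ point_world + t            # world -> camera coordinates
    theta = np.arctan2(np.hypot(x, y), z)    # angle from the optical axis
    phi = np.arctan2(y, x)                   # azimuth around the optical axis
    r = f * theta
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

def ipm_from_fisheye(image, R, t, f, cx, cy, bev_size=64, metres_per_pixel=0.1):
    """Fill a top-down image by sampling the fisheye image at ground points (z = 0)."""
    h, w = image.shape[:2]
    bev = np.zeros((bev_size, bev_size) + image.shape[2:], dtype=image.dtype)
    half = bev_size / 2.0
    for row in range(bev_size):
        for col in range(bev_size):
            # Ground-plane point for this output pixel; the origin is the image centre.
            gx = (col - half) * metres_per_pixel
            gy = (row - half) * metres_per_pixel
            u, v = ground_to_fisheye_pixel(np.array([gx, gy, 0.0]), R, t, f, cx, cy)
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < w and 0 <= vi < h:
                bev[row, col] = image[vi, ui]  # nearest-neighbour sampling
    return bev

# Hypothetical camera 2 m above the ground, looking straight down.
R_down = np.diag([1.0, -1.0, -1.0])
t_down = np.array([0.0, 0.0, 2.0])
fisheye = np.arange(100 * 100).reshape(100, 100)
top_down = ipm_from_fisheye(fisheye, R_down, t_down, f=40.0, cx=50.0, cy=50.0)
```

Output pixels whose ground point falls outside the fisheye image stay at zero, which is one way the missing information mentioned in the text can arise.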

[0026] Step 102: Input the multi-view fisheye image and the inverse perspective transformation image into the storage location detection model to perform storage location detection and obtain the storage location detection result.

[0027] In this embodiment, after the execution entity obtains the multi-view fisheye image and the inverse perspective transformation image generated based on the multi-view fisheye image, the multi-view fisheye image and the inverse perspective transformation image can be input into the pre-trained storage location detection model to obtain the storage location detection result.

[0028] The storage location detection model can be pre-trained using machine learning methods (such as supervised learning). The base model used to train the storage location detection model can be a neural network, etc. The storage location detection model can perform feature extraction, feature transformation, and detection on the input information to output the storage location detection result.

[0029] In this embodiment, the parking space detection result can be used to indicate the orientation of the parking space. Orientation can include direction and location. Specifically, the parking space detection result may include, but is not limited to, the location information, direction information, and corner point pair information of the parking space corner points. The corner point pair information can be used to indicate which two corner points can form a parking space. Based on the parking space detection result, the aforementioned execution entity can estimate the vehicle's pose, thereby controlling the vehicle to park based on the pose.
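One possible in-memory layout for such a detection result is sketched below. The source lists only the kinds of information (corner location, direction, and corner point pairs), so the class names, field names, units, and the `pair_midpoints` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CornerPoint:
    x: float          # location of the corner (hypothetical units and frame)
    y: float
    direction: float  # direction associated with the corner, in radians (assumed)

@dataclass
class DetectionResult:
    corners: List[CornerPoint]
    # Each pair holds the indices of two corners that can form one parking space.
    corner_pairs: List[Tuple[int, int]]

def pair_midpoints(result: DetectionResult) -> List[Tuple[float, float]]:
    """Return the midpoint between the two corners of every pair."""
    midpoints = []
    for i, j in result.corner_pairs:
        a, b = result.corners[i], result.corners[j]
        midpoints.append(((a.x + b.x) / 2.0, (a.y + b.y) / 2.0))
    return midpoints

example = DetectionResult(
    corners=[CornerPoint(0.0, 0.0, 0.0), CornerPoint(2.5, 0.0, 0.0), CornerPoint(5.0, 0.0, 0.0)],
    corner_pairs=[(0, 1), (1, 2)],
)
print(pair_midpoints(example))
```

A downstream planner could use the paired corners and their direction as inputs for pose estimation; the source does not describe that step in detail.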

[0030] The method provided in the above embodiments of this application acquires multi-view fisheye images captured by multiple fisheye cameras installed on a vehicle, and inverse perspective transformation images generated based on the multi-view fisheye images. The acquired multi-view fisheye images and inverse perspective transformation images are then input into a warehouse location detection model to obtain warehouse location detection results indicating the warehouse location. Since the warehouse location detection model uses both the inverse perspective transformation image and the original multi-view fisheye image during the warehouse location detection process, it can improve the perception range of the warehouse location detection model, mitigate the adverse effects of information loss, distortion, and environmental factors caused by the inverse perspective transformation image, make the warehouse location detection more robust, and make the warehouse location detection results more accurate.

[0031] In some optional embodiments, before performing warehouse location detection based on the warehouse location detection model, the aforementioned execution entity may also obtain parameter information from the multiple fisheye cameras. For example, the parameter information may include, but is not limited to, at least one of the following: intrinsic parameter information and extrinsic parameter information. Intrinsic parameter information may include, but is not limited to, focal length, distortion parameters, etc. Extrinsic parameter information may include, but is not limited to, rotation and translation matrices between the camera coordinate system and the world coordinate system, etc.

[0032] When inputting multi-view fisheye images and inverse perspective transformed images into a storage location detection model for storage location detection, parameter information, multi-view fisheye images, and inverse perspective transformed images can all be input into the model. Because the storage location detection model also incorporates parameter information from the fisheye camera during the detection process, it is applicable to storage location detection for different vehicles and has stronger generalization capabilities.

[0033] In some optional embodiments, the storage location detection model can perform storage location detection through the following steps: First, extract the mixed features of the multi-view fisheye image and the inverse perspective transformation image; then, fuse the parameter information and the mixed features to obtain bird's-eye view features; finally, detect the bird's-eye view features to obtain the storage location detection result. In practice, the storage location detection model may include a feature extraction network, a feature transformation network, and a detection network. The feature extraction network can be used to extract the mixed features of the multi-view fisheye image and the inverse perspective transformation image. The feature transformation network can be used to fuse parameter information and mixed features to obtain bird's-eye view (BEV) features. The detection network can be used to detect the bird's-eye view features to obtain the storage location detection result.

[0034] In some examples, the feature extraction network can use neural network architectures such as ResNet (Deep Residual Network), RegNet (Network Design Space), and HRNet (High-Resolution Net). Among these, HRNet is a multi-scale network architecture that can be used to extract multi-scale features. Taking the HRNet architecture as an example, Figure 2 shows a schematic diagram of the feature extraction process of the feature extraction network. As shown in Figure 2, each row of subnets corresponds to one feature scale. After four fisheye images from different perspectives (left-view, right-view, front-view, and rear-view fisheye images) and an inverse perspective transformation image are input into the feature extraction network, the network can extract feature maps of four sizes. Since feature maps of different sizes have different resolutions, feature maps at four resolutions are extracted in total. HRNet has a parallel structure: during feature extraction, subnets ranging from high resolution to low resolution can be added progressively, and these multi-resolution subnets can be connected in parallel to perform multi-scale feature fusion multiple times, thus enriching the extracted features.

[0035] In some examples, feature transformation networks can be built upon transformer networks. In practice, transformers employ a self-attention mechanism, including an encoder and a decoder. Here, the decoder can be removed from the transformer, and the feature transformation network is built based on the remaining network structure after removing the decoder. The encoder can contain multiple layers (e.g., six identical layers). Each layer can contain two sub-layers: a self-attention layer and a feedforward layer. The self-attention mechanism is embodied in the self-attention layer. The feature transformation network encodes and decodes the received features, thereby achieving feature transformation. Due to the high degree of parallelization and strong modeling capabilities of transformers, feature transformation networks can achieve low response times and high detection accuracy in various complex environments for location detection tasks.
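The two sub-layers of one encoder layer can be sketched in NumPy as follows. This is an illustrative sketch, not the network of this application: single-head attention, residual connections with layer normalization, the ReLU feedforward layer, and all sizes and parameter names are assumptions taken from the common transformer design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_layer(x, p):
    """One encoder layer: a self-attention sub-layer followed by a feedforward sub-layer."""
    # Self-attention: queries, keys, and values all come from the same input x.
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    x = layer_norm(x + (softmax(scores) @ v) @ p["wo"])
    # Feedforward: two linear maps with a ReLU in between.
    hidden = np.maximum(0.0, x @ p["w1"])
    return layer_norm(x + hidden @ p["w2"])

def init_layer(rng, dim, hidden):
    shapes = {"wq": (dim, dim), "wk": (dim, dim), "wv": (dim, dim),
              "wo": (dim, dim), "w1": (dim, hidden), "w2": (hidden, dim)}
    return {name: rng.standard_normal(shape) * 0.02 for name, shape in shapes.items()}

rng = np.random.default_rng(0)
layers = [init_layer(rng, dim=16, hidden=32) for _ in range(6)]  # six layers of identical structure
tokens = rng.standard_normal((2, 10, 16))                       # (batch, sequence, channels)
encoded = tokens
for params in layers:
    encoded = encoder_layer(encoded, params)
print(encoded.shape)
```

Every token attends to every other token in a single matrix product, which is the source of the parallelism the text refers to.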

[0036] In some examples, the detection network may include fully connected layers, which can predict the location detection result from the received bird's-eye view features.

[0037] The aforementioned feature extraction network extracts hybrid features from multi-view fisheye images and inverse perspective transformed images. By fusing the inverse perspective transformed images with the original multi-view fisheye images, the perception range is improved, mitigating the adverse effects of information loss, distortion, and environmental factors caused by inverse perspective transformed images. This results in stronger robustness and more accurate storage location detection. Furthermore, the feature transformation network incorporates parameter information from the fisheye camera, making the storage location detection model applicable to storage location detection for different vehicles, thus exhibiting stronger generalization ability.

[0038] In some optional embodiments, when the storage location detection model fuses parameter information and hybrid features, it can first convert the parameter information into location encoding information; then, it fuses the location encoding information and hybrid features to generate bird's-eye view features.

[0039] In practice, the feature transformation network can further include an encoding module and a converter module. See Figure 3, which shows a schematic diagram of the processing procedure of the feature transformation network. As shown in Figure 3, the encoding module can be used to convert the parameter information into location encoding information. The converter (transformer) module can be used to fuse the location encoding information and the mixed features to generate bird's-eye view features. In practice, the encoding module can include fully connected layers and ReLU (Rectified Linear Unit) activation function layers. The initial value of the location encoding information can be a random value, which can be updated during model training. The converter module can be obtained based on the remaining network structure after removing the decoder from the transformer network.
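An encoding module built from fully connected and ReLU layers can be sketched as below. The layer count, the hidden width of 64, the 20-value parameter vector per camera, and the function name `encode_parameters` are assumptions; the source specifies only the layer types.

```python
import numpy as np

def encode_parameters(params, w1, b1, w2, b2):
    """Map camera parameter vectors to location encoding vectors."""
    hidden = np.maximum(0.0, params @ w1 + b1)  # fully connected layer + ReLU
    return hidden @ w2 + b2                     # fully connected layer

rng = np.random.default_rng(0)
num_cameras, param_dim, hidden_dim, embed_dim = 4, 20, 64, 256

# Hypothetical flattened parameters per camera, for example focal length and
# distortion coefficients (intrinsic) plus rotation and translation (extrinsic).
camera_params = rng.standard_normal((num_cameras, param_dim))

# Randomly initialized weights; in training these would be updated by the optimizer.
w1 = rng.standard_normal((param_dim, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, embed_dim)) * 0.1
b2 = np.zeros(embed_dim)

location_encoding = encode_parameters(camera_params, w1, b1, w2, b2)
print(location_encoding.shape)
```

How the encoding is shaped to match the converter module's input is not stated in the source, so the (4, 256) output here is only one possibility.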

[0040] In some optional embodiments, the aforementioned hybrid features may include multiple sets of feature maps, each set having the same number of feature channels. Since the hybrid features may not conform to the input format requirements of the encoding module, when fusing the location encoding information and the hybrid features, feature maps with the same feature channel order from each set can first be merged into a merged feature map, resulting in an updated hybrid feature containing multiple merged feature maps. Then, the updated hybrid feature and the location encoding information are fused to obtain the bird's-eye view feature.

[0041] In practice, when inputting location-encoded information and hybrid features into the converter module of the feature transformation network, the hybrid features can first be format-converted as described above to obtain updated hybrid features. Then, the updated hybrid features and location-encoded information are input into the converter module of the feature transformation network to obtain the bird's-eye view features. For example, suppose the shape of the hybrid features is (32, 5, 256, 13, 13), which can be regarded as 5 groups of feature maps, each with 256 channels and a size of 13×13. After merging feature maps with the same channel order into a merged feature map, the hybrid features are converted to a shape of (32, 5×13×13, 256), i.e., (32, 845, 256). It should be noted that the batch size and the number of channels do not participate in the conversion. The first two layers of the converter module can be two linear layers. After the target-format features of shape (32, 845, 256) are input into the converter module, the two linear layers produce two linearly processed features of shape (32, 845, 256), which can be used as the key and the value, respectively. The location encoding information, the key, and the value are then input into the subsequent network structure of the converter module to obtain the bird's-eye view feature of shape (32, 256, 13, 13).
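The shape bookkeeping in this paragraph can be checked with a short NumPy sketch. The reshape and the two linear layers follow the text; using the location encoding as a (169, 256) query in single-head attention is an assumption, since the source does not describe the "subsequent network structure". The names `merge_feature_maps` and `fuse_to_bev` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def merge_feature_maps(hybrid):
    """(batch, images, channels, h, w) -> (batch, images*h*w, channels)."""
    b, n, c, h, w = hybrid.shape
    # For each channel, the n maps of size h*w are laid end to end.
    return hybrid.transpose(0, 1, 3, 4, 2).reshape(b, n * h * w, c)

def fuse_to_bev(hybrid, location_encoding, w_key, w_value, bev_size=13):
    tokens = merge_feature_maps(hybrid)
    key = tokens @ w_key        # first linear layer
    value = tokens @ w_value    # second linear layer
    # Assumption: the location encoding acts as the attention query.
    scores = location_encoding @ key.transpose(0, 2, 1) / np.sqrt(key.shape[-1])
    attended = softmax(scores) @ value            # (batch, bev_size*bev_size, channels)
    b, _, c = attended.shape
    return attended.transpose(0, 2, 1).reshape(b, c, bev_size, bev_size)

rng = np.random.default_rng(0)
hybrid = rng.standard_normal((32, 5, 256, 13, 13), dtype=np.float32)
location_encoding = rng.standard_normal((13 * 13, 256), dtype=np.float32)
w_key = rng.standard_normal((256, 256), dtype=np.float32) * 0.05
w_value = rng.standard_normal((256, 256), dtype=np.float32) * 0.05

tokens = merge_feature_maps(hybrid)
bev = fuse_to_bev(hybrid, location_encoding, w_key, w_value)
print(tokens.shape, bev.shape)
```

The transpose before the reshape matters: reshaping (32, 5, 256, 13, 13) directly would mix channel and spatial positions instead of keeping the 256 channels as the last axis.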

[0042] In some optional embodiments, since the sizes of the multi-view fisheye images and the inverse perspective transformed images may differ, they can first be converted into images of the same size. Then, the converted images of the same size can be concatenated to obtain the input data. When inputting the multi-view fisheye images and the inverse perspective transformed images into the storage location detection model for storage location detection, the input data can be directly input into the model. When it is necessary to combine parameter information for storage location detection, the input data and parameter information can be input together into the model.

[0043] In practice, multi-view fisheye images and inverse perspective transformation images can be merged based on image channels. For example, if each image has 3 image channels and there are 4 fisheye images and 1 inverse perspective transformation image, merging these 5 images yields input data with 5×3 image channels. This input data is then fed into a feature extraction network to obtain hybrid features. Specifically, after unifying the dimensions of the images, the shape of each image can be represented as (3, 416, 416), where 3 is the number of channels and the two values of 416 are the width and height, respectively. Merging these 5 images yields input data with a shape of (5, 3, 416, 416). If the batch size is 32, the input data can be represented as (32, 5, 3, 416, 416). The shape of the hybrid features can be defined as (batch_size, num_image, channel, feature_map_size_x, feature_map_size_y), where batch_size is the batch size, num_image is the number of images, channel is the number of feature channels, feature_map_size_x is the feature map width, and feature_map_size_y is the feature map height. After inputting the data of shape (32, 5, 3, 416, 416) into the feature extraction network, a hybrid feature with a shape of (32, 5, 256, 13, 13) can be obtained. The hybrid feature has 256 channels and a feature map size of 13×13.
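Building the merged input can be sketched as follows. Nearest-neighbour resizing, the source image sizes, and the helper names `resize_nearest` and `build_input` are assumptions; the source only requires that all images be brought to the same size before merging.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of a (channels, height, width) image to (channels, size, size)."""
    _, h, w = image.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[:, rows[:, None], cols[None, :]]

def build_input(fisheye_images, ipm_image, size=416):
    """Resize the fisheye images and the IPM image, then stack them into one array."""
    images = list(fisheye_images) + [ipm_image]
    return np.stack([resize_nearest(img, size) for img in images], axis=0)

# Hypothetical sizes: four 480x640 fisheye images and one 600x600 IPM image.
fisheye_images = [np.zeros((3, 480, 640), dtype=np.uint8) for _ in range(4)]
ipm_image = np.zeros((3, 600, 600), dtype=np.uint8)

sample = build_input(fisheye_images, ipm_image)  # one sample
batch = np.stack([sample, sample], axis=0)       # a batch of two samples
print(sample.shape, batch.shape)
```

With a batch of 32 samples the same stacking gives (32, 5, 3, 416, 416). The 13×13 feature map in the text would be consistent with an overall downsampling factor of 32 (416 / 32 = 13), although the source does not state the factor.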

[0044] By merging multi-view fisheye images and inverse perspective transformation images into the feature extraction network, the processing time can be reduced and the feature extraction efficiency can be improved compared to the conventional method of inputting images sequentially one by one.

[0045] With further reference to Figure 4, as an implementation of the methods shown in the above figures, this application provides an embodiment of a warehouse location detection device. This device embodiment corresponds to the method embodiment shown in Figure 1. Specifically, this device can be applied to an electronic device. The aforementioned electronic device is communicatively connected to multiple fisheye cameras installed in a vehicle.

[0046] As shown in Figure 4, the warehouse location detection device 400 of this embodiment includes: a first acquisition unit 401, used to acquire multi-view fisheye images captured by multiple fisheye cameras, and to acquire an inverse perspective transformation image generated based on the multi-view fisheye images, wherein the multiple fisheye cameras are installed on the vehicle; and a detection unit 402, used to input the multi-view fisheye images and the inverse perspective transformation image into a warehouse location detection model for warehouse location detection, and to obtain a warehouse location detection result output by the warehouse location detection model; wherein the warehouse location detection result is used to indicate the location of the warehouse location.

[0047] In some optional implementations of this embodiment, the above-mentioned device further includes: a second acquisition unit, used to acquire parameter information of the plurality of fisheye cameras; and the detection unit 402 is further used to input the parameter information, the multi-view fisheye image and the inverse perspective transformation image into the storage location detection model for storage location detection.

[0048] In some optional implementations of this embodiment, the above-mentioned storage location detection model performs storage location detection through the following steps: extracting the mixed features of the above-mentioned multi-view fisheye image and the above-mentioned inverse perspective transformation image; fusing the above-mentioned parameter information and the above-mentioned mixed features to obtain bird's-eye view features; and detecting the above-mentioned bird's-eye view features to obtain the storage location detection result.

[0049] In some optional implementations of this embodiment, the above-mentioned fusion processing of the parameter information and the mixed features to obtain the bird's-eye view features includes: converting the parameter information into location encoding information; and fusion processing of the location encoding information and the mixed features to generate the bird's-eye view features.

[0050] In some optional implementations of this embodiment, the storage location detection model includes a converter module, and the step of fusing the location encoding information and the hybrid features to generate the bird's-eye view features includes: fusing the location encoding information and the hybrid features through the converter module to generate the bird's-eye view features.

[0051] In some optional implementations of this embodiment, the above-mentioned hybrid feature includes multiple sets of feature maps, each set of feature maps having the same number of feature channels; the above-mentioned fusion processing of the above-mentioned location encoding information and the above-mentioned hybrid feature to generate the above-mentioned bird's-eye view feature includes: merging feature maps with the same feature channel order in each set of feature maps into a merged feature map to obtain an updated hybrid feature containing multiple merged feature maps; and fusing the above-mentioned updated hybrid feature and the above-mentioned location encoding information to obtain the above-mentioned bird's-eye view feature.

[0052] In some optional implementations of this embodiment, before the multi-view fisheye image and the inverse perspective transformation image are input into the storage location detection model, the above-mentioned device is further used to convert the multi-view fisheye image and the inverse perspective transformation image into images of the same size and to merge the images of the same size to obtain input data; and inputting the multi-view fisheye image and the inverse perspective transformation image into the storage location detection model includes: inputting the input data into the storage location detection model.

[0053] In some optional implementations of this embodiment, the above parameter information includes at least one of the following: intrinsic parameter information and extrinsic parameter information; the above multi-view fisheye images include: front-view fisheye image, rear-view fisheye image, left-view fisheye image, and right-view fisheye image.

[0054] The apparatus provided in the above embodiments of this application acquires multi-view fisheye images captured by multiple fisheye cameras installed on a vehicle, and inverse perspective transformation images generated based on the multi-view fisheye images. The acquired multi-view fisheye images and inverse perspective transformation images are then input into a storage location detection model to obtain storage location detection results that indicate the location of the storage location. Since the storage location detection model uses both the inverse perspective transformation image and the original multi-view fisheye image during the storage location detection process, it can improve the perception range of the storage location detection model, mitigate the adverse effects of information loss, distortion, and environmental factors caused by the inverse perspective transformation image, and make the storage location detection more robust and accurate.

[0055] This application also provides an electronic device, including one or more processors and a storage device storing one or more programs thereon. When the one or more programs are executed by the one or more processors, the one or more processors implement the above-described storage location detection method.

[0056] Reference is now made to Figure 5, which shows a schematic diagram of the structure of an electronic device used to implement some embodiments of this application. The electronic device shown in Figure 5 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this application.

[0057] As shown in Figure 5, the electronic device 500 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0058] Typically, the following devices can be connected to the I/O interface 505: input devices 506 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 507 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 508 including, for example, disks, hard disks, etc.; and communication devices 509. The communication device 509 allows the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 5 shows an electronic device 500 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may alternatively be implemented or possessed. Each box shown in Figure 5 can represent one device or multiple devices as needed.

[0059] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described storage location detection method.

[0060] In particular, according to some embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing unit 501, it performs the functions defined in the methods of some embodiments of this application.

[0061] This application also provides a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the above-described storage location detection method.

[0062] It should be noted that the computer-readable medium described in some embodiments of this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0063] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

[0064] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist independently without being assembled into the electronic device. The aforementioned computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire multi-view fisheye images captured by multiple fisheye cameras and inverse perspective transformation images generated based on the multi-view fisheye images, wherein the multiple fisheye cameras are installed on a vehicle; input the multi-view fisheye images and the inverse perspective transformation images into a storage location detection model for storage location detection, and obtain storage location detection results output by the storage location detection model; wherein the storage location detection results are used to indicate the location of the storage location. This embodiment improves both the robustness of storage location detection and the accuracy of the storage location detection results.

[0065] Computer program code for performing operations of some embodiments of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0066] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0067] The units described in some embodiments of this application can be implemented in software or hardware. The described units can also be housed in a processor; for example, a processor may be described as including a first determining unit, a second determining unit, a selecting unit, and a third determining unit. The names of these units do not necessarily limit the specific unit itself.

[0068] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0069] The above description is merely a selection of preferred embodiments of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. One example is a technical solution formed by replacing the above-described features with technical features having similar functions that are disclosed in (but not limited to) the embodiments of this application.

Claims

1. A method for detecting storage locations, characterized in that the method includes:
acquiring multi-view fisheye images captured by multiple fisheye cameras, and acquiring inverse perspective transformation images generated based on the multi-view fisheye images, wherein the multiple fisheye cameras are installed on a vehicle;
obtaining parameter information of the multiple fisheye cameras, the parameter information including at least one of the following: intrinsic parameter information and extrinsic parameter information;
inputting the parameter information, the multi-view fisheye image, and the inverse perspective transformation image into a storage location detection model to perform storage location detection, and obtaining the storage location detection result output by the storage location detection model;
wherein the storage location detection model includes a feature extraction network, a feature transformation network, and a detection network, and the feature transformation network includes a converter module;
wherein the storage location detection specifically includes the following steps: extracting the hybrid features of the multi-view fisheye image and the inverse perspective transformation image, the hybrid features including multiple sets of feature maps, each set of feature maps having the same number of feature channels; merging feature maps with the same feature channel order in each set of feature maps into a merged feature map, obtaining an updated hybrid feature containing multiple merged feature maps; converting the parameter information into location encoding information; performing fusion processing on the location encoding information and the updated hybrid features through the converter module to obtain bird's-eye view features; and detecting the bird's-eye view features to obtain the storage location detection result;
wherein the storage location detection result is used to indicate the location of the storage location.

2. The method according to claim 1, characterized in that, before inputting the multi-view fisheye image and the inverse perspective transformation image into the storage location detection model for storage location detection, the method further includes:
converting the multi-view fisheye image and the inverse perspective transformation image into images of the same size; and
merging the images of the same size to obtain input data;
wherein the step of inputting the multi-view fisheye image and the inverse perspective transformation image into the storage location detection model for storage location detection includes: inputting the input data into the storage location detection model for storage location detection.

3. The method according to claim 1 or 2, characterized in that the multi-view fisheye images include: a front-view fisheye image, a rear-view fisheye image, a left-view fisheye image, and a right-view fisheye image.

4. An electronic device, characterized by comprising: one or more processors; and a storage device on which one or more programs are stored, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1-3.

5. A computer-readable medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method as described in any one of claims 1-3.

6. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1-3.