A method for realizing automatic inventory of an automated stereoscopic warehouse based on linkage of a camera and a stacker

By linking the stacker crane with the camera, the focal plane and depth of field are dynamically adjusted, solving the problem of image clarity differences between front and rear rows of goods in multi-depth shelves, and realizing efficient and accurate inventory counting in automated warehouses.

CN122264698A · Pending · Publication Date: 2026-06-23 · GUANGZHOU HOTENT SOFTWARE CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUANGZHOU HOTENT SOFTWARE CO LTD
Filing Date: 2026-04-01
Publication Date: 2026-06-23


IPC: G06Q10/087; H04N23/69; H04N23/67; H04N23/958; B65G1/04; B65G1/137; G06Q10/047; G06T7/73; G06T7/80; G06V20/60; G06V10/12; G06V20/62; G06K7/14

AI Tagging

Application Domain

Image analysis; Character and pattern recognition

Technology Topics

Computer graphics (images); Engineering


AI Technical Summary

Technical Problem

In automated storage and retrieval systems with multi-depth racking, traditional inventory methods adapt poorly to the diversity and complexity of storage locations. Goods in the front and rear rows are imaged with different clarity, which reduces the accuracy and efficiency of inventory counting.

Method used

The optimal movement path of the stacker crane is planned, and the fork extension stroke is combined with camera image acquisition to dynamically adjust the focal plane offset and depth-of-field coverage. The variable-focus camera and its focusing motor image goods in the front and rear rows clearly at the same time, so that identification information can be read in multi-depth shelves.

Benefits of technology

Goods in the front and rear rows are imaged clearly at the same time, which improves the efficiency and accuracy of inventory counting on multi-depth shelves in automated warehouses and avoids interference with normal operations.


Abstract

The application provides a method for automatic inventory counting in an automated stereoscopic warehouse based on the linkage of a camera and a stacker crane, and relates to the technical field of intelligent inventory. The method comprises: preprocessing a current shelf image, identifying the form of the goods in a storage location, analyzing the pixel displacement distribution of front-to-rear misalignment of the goods according to a focal plane offset, identifying the tilt state of the pallet, and determining the degree of depth-of-field coverage deviation caused by the tilted pallet; locating a target area to be refocused according to the coordinates of identification information with severe detail loss, and, using the relative distance between each row of goods and the camera, driving the focusing motor of a variable-focus camera for the target area to dynamically adjust the depth-of-field coverage of the target area and obtain an optimized focal plane position; and obtaining a second real-time telescopic stroke corresponding to the optimized focal plane position, locating the identification area of each row of goods using the edges of the rear row of goods, identifying the identification information of each row of goods in a multi-depth shelf, and determining a target focal length at which the front and rear goods are imaged clearly at the same time.


Description

Technical Field

[0001] This invention relates to the field of intelligent inventory technology, and in particular to a method for automatic inventory management in automated warehouses based on the linkage between cameras and stacker cranes.

Background Technology

[0002] In modern logistics and warehouse management, inventory counting in automated storage and retrieval systems (AS/RS) is a key step in improving efficiency and accuracy. With the rapid growth of e-commerce and supply chains, fast identification and accurate management of goods in the warehouse directly affect a company's operating costs and customer satisfaction. Achieving efficient inventory counting in complex environments, however, remains an open problem for the industry. Many current inventory methods struggle to adapt to the diversity and complexity of storage locations in multi-depth racking. In particular, when goods are misaligned between rows on the rack, traditional fixed-viewpoint or static imaging can leave the information on some goods blurred or entirely unreadable. This limitation stems not only from insufficient equipment performance but also from a lack of dynamic perception of, and adaptation to, differences in goods depth, which leaves a significant gap between imaging results and actual needs. A deeper technical challenge is balancing imaging range against clarity in multi-depth racking. Expanding the imaging range usually means covering more rows of goods, but it also introduces differences in clarity between goods at different depths. When pallets are placed at an angle, the distance differences between goods at different depths amplify this trade-off further, and detail on goods in the back rows is lost. For example, during an inventory count a camera may cover three rows of goods on a shelf, yet because the rows lie at different distances, the labels on the front row are sharp while the labels on the back rows are blurred and illegible. This directly reduces the accuracy and efficiency of the count. A key issue in improving automated warehouse inventory counting is therefore how to adjust imaging parameters dynamically as the distance between the goods and the camera changes, so that both front and back rows are imaged clearly.

Summary of the Invention

[0003] This invention provides a method for automatic inventory management in an automated warehouse based on the linkage between a camera and a stacker crane. The method includes: Based on the warehouse location information of the goods to be inventoried, the optimal movement path of the stacker crane is planned. At the same time, the first real-time extension stroke of the forks in the movement path of the stacker crane is collected, the current shelf image is obtained, the relative distance between each row of goods and the camera in the movement path of the stacker crane is determined, and the focal plane offset of the current movement path of the stacker crane is obtained. The current shelf image is preprocessed to identify the shape of goods in the warehouse. If goods are present, the pixel displacement distribution of the goods is analyzed based on the focal plane offset to identify the tilt state of the pallet and determine the degree of depth coverage deviation caused by the tilted pallet. Assess whether the depth of field coverage deviation exceeds a preset threshold. If it does, process the current shelf image to determine the area of increased resolution difference, and determine the coordinates of the identification information with severe loss of detail based on the area of increased resolution difference. Based on the coordinates of the severely lost identification information, the target area to be refocused is located. Combined with the relative distance between each row of goods and the camera, the focus motor of the variable zoom camera in the target area is driven to dynamically adjust the depth of field coverage of the target area and obtain the optimized focal plane position. The second real-time telescopic stroke corresponding to the optimized focal plane position is obtained. 
Combined with the edge positioning of the rear goods, the identification area of each row of goods is located, the identification information of each row of goods in the multi-layer depth shelf is identified, and the target focal length for simultaneous clear imaging of front and rear goods is determined. The current focus response time of the zoom camera is obtained, the matching degree between the target focal length and the focus response time is evaluated, the imaging balance degree is determined based on the matching degree, and a verification image of clear imaging of goods at multiple depths is obtained. The identification information and image shape of each row of goods in the verification image are identified, and the inventory of goods on the multi-depth shelf is completed.

[0004] Furthermore, the optimal movement path of the stacker crane is planned based on the warehouse location information of the goods to be inventoried. Simultaneously, the first real-time extension stroke of the forks along the stacker crane's movement path is collected, and the current rack image is acquired. The relative distance between each row of goods and the camera along the stacker crane's movement path is determined, and the focal plane offset of the current stacker crane's movement path is obtained, including: Obtain the storage location code sequence of the goods to be inventoried, read the real-time operation status data in the warehouse management system, identify the channel occupancy status of the current inbound and outbound operations, generate an available channel time window matrix based on the channel occupancy timestamp and the estimated operation completion time, arrange the storage location inventory order through the matching operation of the time window matrix and the storage location code sequence, and form an inventory path sequence to be executed. Based on the inventory path sequence, the Dijkstra algorithm is used to calculate the shortest movement path of the stacker crane from its current position to each target storage location in the aisle. During the movement of the stacker crane along the path, the fork extension stroke value is collected in real time by the fork position sensor as the first real-time extension stroke. At the same time, the camera is triggered to collect the current shelf image and record the pixel coordinate position of the front, middle and back rows of goods in the image. By using the correspondence between the pixel coordinate positions and the first real-time telescopic stroke, the actual distance between the front, middle, and rear rows of goods and the camera is calculated. 
Based on the camera lens focal length parameters and object distance information, the imaging distance is calculated based on the optical imaging principle to determine the actual position of the current focal plane. By comparing the theoretical position of the focal plane when the front, middle, and rear rows of goods are clearly imaged with the actual position of the current focal plane, the focal plane offset of the current stacker crane movement path is obtained.

[0005] Furthermore, the step of obtaining the second real-time telescopic stroke corresponding to the optimized focal plane position, combining it with the positioning of the marking areas of each row of goods at the edge of the rear goods, identifying the marking information of each row of goods in the multi-layer depth shelf, and determining the target focal length for simultaneous clear imaging of the front and rear goods includes: Obtain the second real-time extension stroke value corresponding to the optimized focal plane position. The second real-time extension stroke value is the current extension stroke data of the fork after dynamic focus adjustment. Read the current data of the fork position sensor. Calculate the actual distance from the camera to each row of goods based on the second real-time extension stroke value. Combine the edge coordinates of the rear row of goods and locate the marked area range of each layer of goods in the front, middle and rear rows through coordinate mapping relationship. Extract the image blocks of each marked area through the region segmentation method. The image block is preprocessed, the barcode area is processed by adaptive binarization, the barcode decoder is used to identify the one-dimensional barcode and two-dimensional barcode information to obtain the goods code data, the text information on the label is extracted by optical character recognition method, and the goods category attribute is determined by matching the goods code and text information with the pre-stored goods list. Based on the cargo category attributes, the color histogram and contour shape features of the corresponding image blocks are extracted. Image features of similar cargo are identified through color clustering. The ratio of the contour area to the standard cargo area is calculated. When the ratio is an integer multiple, it is determined to be a stack of multiple items. 
The number of cargo in each row is counted, and the proportion of successful identification of the identification information in each row is calculated. At the same time, the image features and morphological features of the cargo are identified. The cargo category is determined based on the extracted cargo code and text information. The quantity is counted by combining the cargo image features and morphological features. The current focal length value is determined as the target focal length for simultaneous clear imaging of cargo in front and behind.
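
The stacking test described above (contour area compared with the standard cargo area) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `stacked_item_count` and the `tolerance` parameter are hypothetical, since the text does not say how close to an integer the ratio must be.

```python
from typing import Optional


def stacked_item_count(contour_area: float, standard_area: float,
                       tolerance: float = 0.1) -> Optional[int]:
    """Return the item count if the area ratio is close to an integer.

    tolerance is an assumed parameter: the text only says the ratio is
    "an integer multiple" and gives no numeric margin.
    """
    if standard_area <= 0:
        raise ValueError("standard_area must be positive")
    ratio = contour_area / standard_area
    nearest = round(ratio)
    if nearest >= 1 and abs(ratio - nearest) <= tolerance:
        return nearest   # 1 = single item, 2 or more = stack
    return None          # ratio is not near an integer: no decision


if __name__ == "__main__":
    print(stacked_item_count(3030.0, 1000.0))
```

A `None` result leaves the decision to the other cues the paragraph mentions, such as the decoded cargo code and the color features.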

[0006] Furthermore, the preprocessing of the current shelf image to identify the shape of goods in the storage location, and if goods are present, the pixel displacement distribution of the goods' forward and backward misalignment is analyzed based on the focal plane offset to identify the pallet's tilting state and determine the degree of depth-of-field coverage deviation caused by the tilted pallet, including: The current shelf image is preprocessed by using Gaussian filtering to remove image noise and histogram equalization to enhance image contrast, resulting in an enhanced image. The Canny edge detection algorithm is then used to extract the outline of the goods on the enhanced image, and the presence of goods in the storage location is determined based on the closure and area threshold of the outline.

[0007] Furthermore, the assessment of whether the depth-of-field coverage deviation exceeds a preset threshold is performed. If it does, the current shelf image is processed to determine the area of increased resolution difference, and the coordinates of severely detail-loss-prone identification information are determined based on the area of increased resolution difference, including: The depth coverage deviation value is obtained and compared with a preset threshold. If the deviation exceeds the threshold, the current shelf image is partitioned into front, middle and back areas according to the shelf depth direction. In the back area, the Sobel operator is used to calculate the gradient values in the horizontal and vertical directions, and the set of edge pixels of the back goods is determined by the gradient magnitude.
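
As a rough illustration of the Sobel step, the sketch below computes the horizontal and vertical gradients of a small grayscale image and collects the pixels whose gradient magnitude passes a threshold. The image layout (a list of rows), the function names, and the threshold value are assumptions made for the example; the patent does not specify them.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude per pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_pixels(mag, threshold):
    """Set of (x, y) pixels whose magnitude meets the (assumed) threshold."""
    return {(x, y) for y, row in enumerate(mag)
            for x, m in enumerate(row) if m >= threshold}


if __name__ == "__main__":
    image = [[0, 0, 10, 10]] * 3      # vertical step edge
    print(edge_pixels(sobel_magnitude(image), 30.0))
```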

[0008] Furthermore, based on the coordinates of the severely lost identification information, the target area to be refocused is located. Combined with the relative distance between each row of goods and the camera, the focus motor of the variable-focus camera in the target area is driven to dynamically adjust the depth-of-field coverage of the target area, resulting in an optimized focal plane position, including: Based on the coordinates of the severely lost identification information, the boundary of the target area to be refocused is delineated in the image space, the coordinates of the center point of the target area are obtained, and the average object distance of the goods in the target area is calculated by combining the relative distance data between each row of goods and the camera. The required focal length adjustment is calculated according to the thin lens imaging principle, and a drive pulse is sent to the focusing motor of the variable zoom camera through the drive circuit.

[0009] Furthermore, the process of acquiring the current focus response time of the variable-focus camera, evaluating the matching degree between the target focal length and the focus response time, determining the imaging balance degree based on the matching degree, obtaining a verification image of clear multi-depth cargo imaging, identifying the identification information and image shape of each row of cargo in the verification image, and completing the inventory of cargo on the multi-depth shelf includes: The current focus response time of the zoom camera is obtained, and the time data from the time the focus motor receives the drive signal to the time the lens reaches the target position is read. The theoretical response time is calculated by dividing the lens movement distance corresponding to the target focal length by the motor movement speed. The matching degree is evaluated by the ratio of the actual response time to the theoretical response time. Image acquisition is triggered according to the matching degree to obtain a verification image of clear imaging of goods at multiple depths. The verification image is partitioned to identify the goods identification information in each area of the front row, middle row, and rear row.
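
The matching-degree calculation can be illustrated with a few lines of Python. The theoretical time and the ratio follow the paragraph above; the trigger rule (`should_trigger` and its `tolerance`) is an assumption, because the text says only that acquisition is triggered "according to the matching degree".

```python
def theoretical_response_time(lens_travel_mm: float,
                              motor_speed_mm_s: float) -> float:
    """Lens travel for the target focal length divided by motor speed."""
    if motor_speed_mm_s <= 0:
        raise ValueError("motor speed must be positive")
    return lens_travel_mm / motor_speed_mm_s


def matching_degree(actual_s: float, theoretical_s: float) -> float:
    """Ratio of the measured focus response time to the theoretical one."""
    return actual_s / theoretical_s


def should_trigger(match: float, tolerance: float = 0.2) -> bool:
    """Assumed rule: capture when the ratio is within tolerance of 1.0."""
    return abs(match - 1.0) <= tolerance


if __name__ == "__main__":
    t_theory = theoretical_response_time(2.0, 10.0)   # 0.2 s
    m = matching_degree(0.22, t_theory)
    print(round(m, 3), should_trigger(m))
```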

[0010] The technical solutions provided by the embodiments of the present invention may include the following beneficial effects. The invention discloses a method for automatic inventory counting in an automated storage and retrieval system (AS/RS) based on the linkage of a camera and a stacker crane. It addresses two problems of traditional inventory counting: poorly planned stacker crane paths that conflict with inbound and outbound operations, and depth-of-field differences between front and rear rows on multi-depth racks that blur labels and make them hard to identify. By combining storage location information with the real-time operating status of the warehouse, the method plans an avoidance path for the stacker crane so that inventory counting does not interfere with normal operations. While the crane moves along the path, the method acquires the real-time fork extension stroke and rack images, calculates the focal plane offset, dynamically assesses the depth-of-field coverage deviation, and, when the deviation exceeds a threshold, locates the coordinates of labels whose detail has been lost. It then drives the variable-focus camera to refocus on the rear goods area and, together with the second extension stroke, images the front and rear goods clearly at the same time. Finally, it identifies the label information, morphological characteristics, and quantity of multiple rows of goods and automatically verifies the inventory data. The method improves the efficiency and accuracy of inventory counting on multi-depth racks and avoids interference with normal operations.

Attached Figure Description

[0011] Figure 1 is a flowchart illustrating a method for automatic inventory management in an automated warehouse based on the linkage between a camera and a stacker crane, according to the present invention.

[0012] Figure 2 is a schematic diagram of a method for automatic inventory management in an automated warehouse based on the linkage between a camera and a stacker crane, according to the present invention.

[0013] Figure 3 is another schematic diagram of a method for automatic inventory counting in an automated warehouse based on the linkage between a camera and a stacker crane, according to the present invention.

Detailed Implementation

[0014] To further understand the content of this invention, a detailed description of the invention is provided in conjunction with the accompanying drawings and embodiments. The specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0015] As shown in Figures 1-3, this embodiment of a method for automatic inventory management in an automated warehouse based on the linkage between a camera and a stacker crane may specifically include: Step S101: Based on the warehouse location information of the goods to be inventoried, plan the optimal movement path of the stacker crane. At the same time, collect the first real-time extension stroke of the forks along the movement path of the stacker crane, and obtain the current shelf image. Determine the relative distance between each row of goods and the camera along the movement path of the stacker crane, and obtain the focal plane offset of the current movement path of the stacker crane.

[0016] The system acquires the storage location code sequence of the goods to be inventoried, reads real-time operation status data from the warehouse management system, identifies the occupancy status of currently performing inbound / outbound operations, and generates an available channel time window matrix based on the channel occupancy timestamp and the estimated completion time of the operation. By matching the time window matrix with the storage location code sequence, the storage location inventory order is arranged to form an inventory path sequence to be executed. Based on the inventory path sequence, the Dijkstra algorithm is used to calculate the shortest movement path of the stacker crane from its current position to each target storage location within the aisle. During the stacker crane's movement along the path, the fork extension stroke value is collected in real-time by the fork position sensor as the first real-time extension stroke. Simultaneously, a camera is triggered to capture the current shelf image, recording the pixel coordinate positions of the front, middle, and back rows of goods in the image. By using the correspondence between the pixel coordinate positions and the first real-time telescopic stroke, the actual distance between the front, middle, and rear rows of goods and the camera is calculated. Based on the camera lens focal length parameters and object distance information, the imaging distance is calculated based on the optical imaging principle to determine the actual position of the current focal plane. By comparing the theoretical position of the focal plane when the front, middle, and rear rows of goods are clearly imaged with the actual position of the current focal plane, the focal plane offset of the current stacker crane movement path is obtained.

[0017] In one implementation, the system acquires information on all target storage locations for the goods to be inventoried, and combines this with the real-time warehouse inbound and outbound operation status to avoid busy aisles and occupied storage locations, thus determining the order of multi-location inventory checks and ensuring that inventory checks do not interfere with normal inbound and outbound operations. When acquiring the storage location code sequence for the goods to be inventoried, the system extracts the daily inventory task list from the warehouse management database. Each storage location code includes three dimensions: aisle number, layer number, and column number.

[0018] For example, the storage location code A03-05-12 indicates the storage location in the 12th column of the 5th layer of the 3rd row in aisle A. By monitoring the operating status of stacker cranes in each aisle in real time, the system records the type of task currently being performed by each stacker crane, the estimated completion time, and the aisle area occupied.
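
A small parser makes the example code concrete. The pattern below follows the reading given for A03-05-12 (aisle letter, two-digit row, layer, and column); the exact field widths and the `StorageLocation` type are assumptions, since the patent does not define the code format formally.

```python
import re
from typing import NamedTuple


class StorageLocation(NamedTuple):
    aisle: str
    row: int
    layer: int
    column: int


# Assumed layout: one aisle letter, then three two-digit fields.
_CODE_RE = re.compile(r"^([A-Z])(\d{2})-(\d{2})-(\d{2})$")


def parse_location_code(code: str) -> StorageLocation:
    """Split a code such as 'A03-05-12' into its numeric fields."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized storage location code: {code!r}")
    aisle, row, layer, column = match.groups()
    return StorageLocation(aisle, int(row), int(layer), int(column))


if __name__ == "__main__":
    print(parse_location_code("A03-05-12"))
```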

[0019] Specifically, the generation process of the time window matrix is based on two key parameters: channel occupancy timestamps and estimated operation completion time. The system divides time into multiple time segments according to a preset granularity, with each time segment corresponding to an element in the matrix. When a channel is occupied within a specific time segment, the corresponding matrix element value is set as an occupancy status indicator; when idle, it is set as an idle status indicator. By scanning the entire matrix, a sequence of consecutive idle time segments is identified, forming an available time window. When the time window is matched with the storage location code sequence, the system calculates the inventory time required for each storage location, including stacker crane movement time, fork extension and retraction time, and image acquisition time, ensuring that the inventory operation can be completed within the available time window.
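
The time window matrix can be sketched as a grid of 0 (idle) and 1 (occupied) values, one row per channel and one column per time segment. The function names, the integer-second time base, and the use of 0 and 1 as status indicators are assumptions made for this illustration.

```python
def build_occupancy_matrix(occupied, n_channels, horizon_s, slot_s):
    """occupied: iterable of (channel, start_s, end_s) in whole seconds."""
    n_slots = -(-horizon_s // slot_s)            # ceiling division
    matrix = [[0] * n_slots for _ in range(n_channels)]
    for channel, start_s, end_s in occupied:
        first = start_s // slot_s
        last = min(n_slots, -(-end_s // slot_s))
        for slot in range(first, last):
            matrix[channel][slot] = 1            # occupied indicator
    return matrix


def idle_windows(row, slot_s):
    """Merge consecutive idle slots into (start_s, end_s) windows."""
    windows, start = [], None
    for i, value in enumerate(row):
        if value == 0 and start is None:
            start = i
        elif value == 1 and start is not None:
            windows.append((start * slot_s, i * slot_s))
            start = None
    if start is not None:
        windows.append((start * slot_s, len(row) * slot_s))
    return windows


def fits(window, move_s, fork_s, capture_s):
    """Inventory time = crane movement + fork extension/retraction + imaging."""
    return window[1] - window[0] >= move_s + fork_s + capture_s


if __name__ == "__main__":
    m = build_occupancy_matrix([(0, 20, 50)], 1, 100, 10)
    print(idle_windows(m[0], 10))
```

A storage location is scheduled into a window only when `fits` returns true, which mirrors the requirement that the inventory operation complete within the available time window.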

[0020] It should be noted that Dijkstra's algorithm, when calculating the movement path of the stacker crane, abstracts the aisle network of the automated warehouse as a weighted directed graph, where nodes represent storage locations and edge weights represent the movement time of the stacker crane between adjacent storage locations. Starting from the current position of the stacker crane, the algorithm gradually expands to each target storage location, maintaining a set of visited nodes and a set of unvisited nodes. By continuously updating the shortest distance value of each node, it ultimately obtains the shortest path from the starting point to all target storage locations.
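
For reference, a compact version of Dijkstra's algorithm on such a graph might look like the following. The graph encoding (a dictionary of neighbor-to-travel-time maps) and the node names are illustrative assumptions; a priority queue replaces the explicit scan of the unvisited set, which gives the same result.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel time from source to every reachable node.

    graph maps node -> {neighbor: travel_time_s}; weights must be >= 0.
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale queue entry
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the path; raises KeyError if target is unreachable."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    aisle = {"S": {"A": 2.0, "B": 5.0}, "A": {"B": 1.0, "C": 4.0},
             "B": {"C": 1.0}, "C": {}}
    dist, prev = dijkstra(aisle, "S")
    print(dist["C"], path_to(prev, "S", "C"))
```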

[0021] In one possible implementation, the fork position sensor uses laser ranging to measure the extension distance of the forks relative to the upright in real time. As the forks extend forward, the sensor collects position data every 100 milliseconds, forming a continuous extension / retraction curve. The camera's trigger is synchronized with the fork position; when the forks reach the preset shooting position, image acquisition is automatically triggered.

[0022] Preferably, the focal plane offset is calculated from the principles of optical imaging, using the Gaussian imaging formula 1/f = 1/u + 1/v, where f is the lens focal length, u is the object distance (from the goods to the lens), and v is the image distance (from the lens to the imaging plane). The goods in the front row are closer to the camera than those in the middle and rear rows, so each row has a different optimal imaging plane position. The theoretical image distance v is calculated for each of the three rows, and the average of the three values is taken as the reference focal plane position. Comparing this reference with the current actual focal plane position gives the focal plane offset, which provides the data for subsequent dynamic focus adjustment.
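
The calculation in this paragraph can be written out directly. Solving the Gaussian formula for the image distance gives v = f·u / (u − f). The sign convention of the offset (current minus reference) and the millimeter units are assumptions, as the patent states neither.

```python
def image_distance(f_mm: float, u_mm: float) -> float:
    """Solve 1/f = 1/u + 1/v for v (object must lie beyond the focal length)."""
    if u_mm <= f_mm:
        raise ValueError("object distance must exceed the focal length")
    return f_mm * u_mm / (u_mm - f_mm)


def focal_plane_offset(f_mm, row_distances_mm, current_v_mm):
    """Current image distance minus the mean of the per-row ideal distances."""
    ideal = [image_distance(f_mm, u) for u in row_distances_mm]
    reference = sum(ideal) / len(ideal)
    return current_v_mm - reference   # sign convention is an assumption


if __name__ == "__main__":
    # Front, middle, and rear rows at illustrative distances.
    print(focal_plane_offset(50.0, [1050.0, 2050.0, 3050.0], 52.0))
```

Because v shrinks toward f as u grows, the rear rows pull the reference position closer to the focal length than the front row alone would.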

[0023] Step S102: Preprocess the current shelf image, identify the shape of goods in the storage location, and if there are goods, analyze the pixel displacement distribution of the goods' front and rear misalignment based on the focal plane offset, identify the pallet carrying tilt state, and determine the degree of depth coverage deviation caused by the tilted pallet carrying.

[0024] The current shelf image is preprocessed by using Gaussian filtering to remove image noise and histogram equalization to enhance image contrast, resulting in an enhanced image. The Canny edge detection algorithm is then applied to this enhanced image to extract the outlines of the goods. The presence of goods in the storage location is determined based on the closure and area threshold of the outlines. If a closed outline is detected and its area exceeds a preset threshold, goods are confirmed to be present in the storage location. The coordinates of the bounding rectangle of the goods outline are obtained. Based on these bounding rectangle coordinates, the goods region is located in the enhanced image. The pixel coordinate sets of the front and back rows of goods are extracted, and the pixel distance difference between the center points of the front and back rows is calculated. Combined with focal plane offset data, the actual misalignment distance between the front and back rows of goods is determined through the mapping relationship between the pixel distance difference and the actual distance. The pixel coordinates of the four corner points of the pallet are obtained, and the tilt angle between the pallet plane and the horizontal plane is calculated using these corner coordinates. Based on the tilt angle and the misalignment distance of the goods, the distances from the front and rear of the pallet to the camera are calculated. The effective depth of field (DF) under the current imaging conditions is determined using the lens focal length and aperture parameters. The effective DF refers to the distance range from the nearest sharp point to the farthest sharp point. It is then determined whether the distance difference between the front and rear of the pallet exceeds the effective DF. 
If the distance difference exceeds the effective DF, the actual distances from the goods at different heights to the camera are calculated based on the tilt angle and the goods height. By comparing the difference between the required DF for each layer of goods and the current actual DF coverage, the coordinate range of the insufficient DF coverage area in the image is determined. Finally, the degree of DF deviation caused by the tilted pallet is quantified by multiplying the distance exceeding the DF by the tilt angle.
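
One way to make the depth-of-field check concrete is with the standard hyperfocal-distance formulas. The patent does not state which formula it uses, so this is a sketch under assumptions: the circle-of-confusion value `coc_mm`, the millimeter units, and the tilt angle in radians are all chosen for the example.

```python
import math


def depth_of_field(f_mm, f_number, focus_mm, coc_mm=0.03):
    """Near and far limits of acceptable sharpness (standard thin-lens DoF).

    coc_mm (circle of confusion) is an assumed value, not from the patent.
    """
    hyperfocal = f_mm ** 2 / (f_number * coc_mm) + f_mm
    near = focus_mm * (hyperfocal - f_mm) / (hyperfocal + focus_mm - 2 * f_mm)
    if focus_mm >= hyperfocal:
        far = math.inf
    else:
        far = focus_mm * (hyperfocal - f_mm) / (hyperfocal - focus_mm)
    return near, far


def dof_deviation(front_mm, rear_mm, near_mm, far_mm, tilt_rad):
    """Distance exceeding the depth of field, multiplied by the tilt angle."""
    excess = max(0.0, (rear_mm - front_mm) - (far_mm - near_mm))
    return excess * tilt_rad


if __name__ == "__main__":
    near, far = depth_of_field(50.0, 2.8, 1000.0)
    print(round(near, 1), round(far, 1))
    print(dof_deviation(1000.0, 1100.0, near, far, 0.1))
```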

[0025] In one implementation, the stacker crane moves to the first target storage location according to the planned path, adjusts its posture to align the camera with the storage location area, and acquires an image of the current storage location area. During preprocessing of the current shelf image, Gaussian filtering uses a 5×5 convolution kernel to perform convolution operations on the original image. The kernel has the highest weight at the center and decreases towards the edges, smoothing noise points in the image through weighted averaging. Histogram equalization redistributes gray values by counting the number of pixels at each gray level in the image, making the gray level distribution more uniform and enhancing details in dark areas and tonal gradations in bright areas.
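
The two preprocessing steps can be sketched without any imaging library. The 5×5 kernel size comes from the text; the value of `sigma` is an assumption, and the equalization uses the common cumulative-histogram mapping, which the patent describes only in general terms.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel, heaviest at the center."""
    half = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def equalize_histogram(img, levels=256):
    """Remap gray levels through the normalized cumulative histogram."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:                  # single gray level: nothing to do
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]


if __name__ == "__main__":
    print(equalize_histogram([[50, 50], [100, 150]]))
```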

[0026] Specifically, the Canny edge detection algorithm extracts cargo contours in three key steps: gradient calculation, non-maximum suppression, and dual-threshold processing. Gradient calculation uses the Sobel operator to compute the gradient components in the horizontal and vertical directions, and determines edge strength and direction from the gradient magnitude and direction. Non-maximum suppression compares the gradient magnitude of the current pixel with that of its neighbors along the gradient direction, retaining local maxima and suppressing non-edge pixels. Dual-threshold processing sets a high threshold and a low threshold: pixels above the high threshold are kept as strong edges, pixels below the low threshold are discarded, and weak-edge pixels in between are retained only if connectivity analysis links them to a strong edge. When the detected contour forms a closed region with an area exceeding a preset threshold of 500 square pixels, cargo is determined to be present in the storage location, and the coordinates of the top-left and bottom-right corners of the bounding rectangle are recorded.
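
The dual-threshold stage and the area test are easy to show in isolation. The sketch below assumes that the gradient magnitudes have already been computed and thinned, uses 8-connectivity for the connectivity analysis, and measures the contour area with the shoelace formula; those choices and the function names are illustrative, while the 500-square-pixel threshold comes from the text.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep strong pixels, plus weak pixels connected to a strong one."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx] and mag[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


def polygon_area(points):
    """Shoelace area of a closed contour given as (x, y) vertices."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2.0


def cargo_present(contour, min_area=500.0):
    """True when the closed contour's area exceeds the preset threshold."""
    return polygon_area(contour) > min_area


if __name__ == "__main__":
    print(hysteresis([[0, 60, 120, 60, 0, 60]], 50, 100))
    print(cargo_present([(0, 0), (30, 0), (30, 30), (0, 30)]))
```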

[0027] It should be noted that when extracting the pixel coordinates of the front and rear rows of goods, the system delineates the region of interest in the enhanced image based on the coordinates of the circumscribed rectangle and distinguishes between the front and rear rows of goods using a color clustering method. The front row of goods is typically located at the bottom of the image and has higher pixel brightness, while the rear row of goods is located at the top of the image and has lower brightness due to occlusion. The centroid of each row of goods is taken as its center point, and the pixel distance between the center points of the front and rear rows reflects the positional offset of the goods on the image plane.

[0028] In one possible implementation, the mapping relationship between pixel distance and actual distance is determined by camera calibration parameters. The camera intrinsic parameter matrix includes focal length, principal point coordinates, and distortion coefficients, while the extrinsic parameter matrix includes rotation and translation parameters. According to the pinhole imaging model, three-dimensional points in actual space are projected onto the image plane to form two-dimensional pixels. The mapping relationship is represented by $Z\,[u, v, 1]^{T} = K\,[R \mid t]\,[X, Y, Z, 1]^{T}$, where Z is the depth value, K is the intrinsic parameter matrix, R is the rotation matrix, and t is the translation vector. The actual misalignment distance between the front and rear cargo is calculated in reverse using focal plane offset data and depth information.
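As a hedged illustration of the pinhole mapping, the sketch below projects a point and converts a pixel gap back into a physical distance. It ignores lens distortion, treats depth as the point's z coordinate in the camera frame, and uses hypothetical function names and calibration values.

```python
def project_point(K, R, t, point_world):
    """Project a 3D point to pixel coordinates with the pinhole model (no distortion)."""
    # Camera-frame coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    depth = cam[2]
    if depth <= 0:
        raise ValueError("point is not in front of the camera")
    # Homogeneous pixel coordinates K * X_cam, then division by depth
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / depth, uvw[1] / depth, depth

def pixel_gap_to_distance(pixel_gap, depth, focal_px):
    """Invert the projection for a lateral gap observed at a known depth."""
    return pixel_gap * depth / focal_px

# Hypothetical calibration: 800 px focal length, principal point (320, 240)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
u, v, z = project_point(K, R, t, [0.5, 0.25, 2.0])
```

With these assumed values the point 0.5 m to the right and 2 m away lands 200 pixels right of the principal point, and `pixel_gap_to_distance(200, 2.0, 800.0)` recovers the 0.5 m offset.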

[0029] Preferably, the pallet tilt angle is calculated based on the spatial relationship of the four corner points of the pallet. A corner detection algorithm identifies four feature points on the pallet edge, and the PnP algorithm is used to solve the pallet's pose in the camera coordinate system based on the two-dimensional pixel coordinates and the known physical dimensions of the pallet. The angle between the pallet plane normal vector and the horizontal plane normal vector is the tilt angle, which directly affects the difference in distance between the front and rear goods and the camera.

[0030] For example, depth of field characterizes the range of object distances within which an imaging system can produce a clear image, and its calculation involves the diameter of the circle of confusion. The circle of confusion refers to the blurred spot formed by a point light source on the imaging plane; when the diameter of the circle of confusion is smaller than the resolving limit of the human eye, the image appears clear. The near limit of the effective depth of field is $D_n = \frac{f^{2} s}{f^{2} + N c s}$ and the far limit is $D_f = \frac{f^{2} s}{f^{2} - N c s}$, where f is the focal length, s is the focusing distance, N is the aperture value, and c is the diameter of the circle of confusion. The distances from the front and rear ends of the pallet to the camera are calculated from the trigonometric relationship between the tilt angle and the pallet length. When the difference between the front and rear distances exceeds $D_f - D_n$, it is determined that the effective depth of field is exceeded.
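A worked sketch of the depth-of-field check follows. The lens values in the usage note are illustrative assumptions, and the far limit is treated as infinite when the denominator is not positive, a case the text does not discuss.

```python
import math

def dof_limits(f_mm, s_mm, n_aperture, c_mm):
    """Near and far limits of the depth of field, using the formulas in the text."""
    ncs = n_aperture * c_mm * s_mm
    near = f_mm ** 2 * s_mm / (f_mm ** 2 + ncs)
    denom = f_mm ** 2 - ncs
    far = math.inf if denom <= 0 else f_mm ** 2 * s_mm / denom
    return near, far

def exceeds_dof(front_mm, rear_mm, near_mm, far_mm):
    """True when the front-to-rear distance difference is larger than Df - Dn."""
    return abs(rear_mm - front_mm) > (far_mm - near_mm)
```

For an assumed 25 mm lens focused at 2000 mm with N = 2.8 and c = 0.02 mm, the limits are roughly 1696 mm and 2437 mm, so a pallet whose ends sit at 1800 mm and 2700 mm exceeds the effective depth of field.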

[0031] In one embodiment, when quantifying the degree of depth-of-field coverage deviation, the system establishes a multi-level deviation assessment mechanism. Based on the cargo height, the cargo space on the pallet is divided into three height zones: bottom, middle, and top. The distance from each layer of cargo to the camera is calculated by adding a height offset to the base distance. The ideal depth-of-field range required for each layer of cargo is compared with the actual depth-of-field coverage range provided by the current optical system, and the intersection and difference regions are calculated. Cargo within the intersection region can be clearly imaged, while cargo within the difference region is at risk of being blurred.

[0032] Understandably, areas with insufficient depth of field coverage manifest as reduced contrast and loss of detail in an image. The sharpness level of a region is assessed by calculating the average gradient magnitude of a local image patch, and a region is marked as having insufficient depth of field coverage when its gradient magnitude is below a preset threshold. The distance exceeding the depth of field is multiplied by the sine of the tilt angle to obtain the vertical offset. The ratio of this offset to the pallet length is then taken as the normalized quantitative indicator of the degree of depth of field coverage deviation.

[0033] For example, when the deviation exceeds 0.3, it indicates that more than 30% of the pallet area cannot be clearly imaged. At this time, the system marks the storage location as needing dynamic focus adjustment. By recording the deviation values at different tilt angles, a correspondence table between tilt state and image quality is established, providing a reference for subsequent adaptive focus control and achieving accurate inventory counting in multi-depth shelving environments.
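The deviation indicator and the 0.3 decision rule can be sketched as below. Clamping the result to the range 0 to 1 is an assumption made for the example, as are the function names.

```python
import math

def dof_deviation(excess_mm, tilt_deg, pallet_length_mm):
    """Vertical offset (excess distance times sin of tilt) divided by pallet length."""
    offset = excess_mm * math.sin(math.radians(tilt_deg))
    return min(max(offset / pallet_length_mm, 0.0), 1.0)  # clamp: assumed

def needs_dynamic_focus(deviation, threshold=0.3):
    """Mark the storage location for dynamic focus adjustment above the threshold."""
    return deviation > threshold
```

With an excess distance of 1200 mm, a 30 degree tilt, and a 1200 mm pallet, the deviation is about 0.5, so the location would be marked for dynamic focus adjustment.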

[0034] Step S103: Evaluate whether the depth of field coverage deviation exceeds a preset threshold. If it does, process the current shelf image to determine the edge of the back row of goods. By comparing the changes in the clarity of the edges of the front and back rows of goods, identify the area with expanded resolution difference. Based on this area, determine the coordinates of the identification information with severe loss of detail.

[0035] The depth-of-field coverage deviation is obtained and compared with a preset threshold. If the deviation exceeds the threshold, the current shelf image is partitioned into front, middle, and back areas based on the shelf depth direction. Within the back area, the Sobel operator is used to calculate horizontal and vertical gradient values. The gradient magnitude determines the set of edge pixels for the back goods, and adjacent edge points are connected to form a complete back goods edge contour. Based on the back goods edge contour, the grayscale gradient values of each pixel on the edge line are extracted, and the standard deviation of the gradient values is calculated as the back edge sharpness value. The corresponding edge contours are extracted within the front area, and the front edge sharpness value is calculated. The sharpness reduction rate is obtained by dividing the back sharpness value by the front sharpness value. When the reduction rate is lower than a preset ratio, the location is marked as a region with increased resolution difference. Within the region of increased resolution difference, an image block is traversed using a sliding window. The local contrast value within each window is calculated, where the contrast value is the difference between the maximum and minimum grayscale values within the window. Texture complexity, i.e., the entropy of the gray-level co-occurrence matrix within the window, is also calculated. When the contrast value is below a threshold and the texture complexity is above a threshold, the window is determined to contain label information. The center coordinates of windows meeting the conditions are recorded. Adjacent windows are merged to form connected candidate regions for label information. Based on the shape characteristics and size ratio of these candidate regions, rectangular regions conforming to the cargo label specifications are selected. 
The grayscale distribution histogram within these regions is extracted. The bimodal characteristic of the histogram is used to identify the boundary between text and background, determining the boundaries of areas with severe detail loss. The coordinates of the label information are output, completing the precise positioning of the cargo label.

[0036] In one implementation, the assessment of depth-of-field coverage deviation is based on tilt angle and distance difference data obtained from prior processing. Once the stacker crane reaches the target storage location, the system reads the stored deviation value, which is normalized and mapped to a range of 0 to 1. Preset thresholds are set according to the type of goods and label specifications in the warehouse; for food goods with dense label information, a threshold of 0.25 is set, while for industrial parts with larger labels, a threshold of 0.35 is set.

[0037] Specifically, the image partitioning process employs an adaptive partitioning method based on depth information. The system divides the image space into three regions—front, middle, and back—according to shelf structure parameters, including beam spacing, pallet depth, and the number of goods stacking layers. The front region occupies the lower third of the image, the middle region the middle third, and the back region the upper third. The Sobel operator is applied to the back region using a 3×3 convolution kernel, with the horizontal kernel set to [-1,0,1;-2,0,2;-1,0,1] and the vertical kernel set to [-1,-2,-1;0,0,0;1,2,1]. After convolution, the gradient magnitude of each pixel is obtained by taking the square root of the sum of the squares of the horizontal and vertical gradient components, and the gradient direction is calculated using the arctangent function. When the gradient magnitude exceeds a set threshold, the pixel is marked as an edge point, and discrete edge points are connected into a continuous contour using an eight-neighborhood connection criterion.
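The Sobel step can be sketched with the two kernels quoted above. This pure-Python version applies the kernels by correlation (no kernel flip), which leaves the gradient magnitude unchanged; border pixels are left at zero and the helper names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def back_region(image):
    """Upper third of the image, which the text assigns to the back row."""
    return image[:len(image) // 3]

def sobel(image):
    """Return (magnitude, direction) maps; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for ky in range(3):
                for kx in range(3):
                    p = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * p
                    gy += SOBEL_Y[ky][kx] * p
            mag[y][x] = math.hypot(gx, gy)   # sqrt(gx^2 + gy^2)
            ang[y][x] = math.atan2(gy, gx)   # gradient direction
    return mag, ang

def edge_points(mag, threshold):
    """Pixels whose gradient magnitude exceeds the threshold."""
    return [(y, x) for y, row in enumerate(mag)
            for x, m in enumerate(row) if m > threshold]
```

On a synthetic image with a vertical step from 0 to 100, the pixels next to the step have magnitude 400 and direction 0, while pixels inside the flat region have magnitude 0.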

[0038] It should be noted that the calculation of sharpness values involves statistical principles. After extracting the grayscale gradient values of all pixels on the edge contour, the mean and standard deviation of these gradient values are calculated. The standard deviation reflects the dispersion of the gradient values. In sharp images, the edge gradient values are concentrated and have large amplitudes, resulting in a relatively small standard deviation; in blurry images, the edge gradient values are scattered and have small amplitudes, resulting in a relatively large standard deviation. The sharpness degradation rate is obtained by dividing the sharpness values of the rear rows by the sharpness values of the front rows. When this ratio is below 0.6, it indicates a severe degradation in the quality of the rear rows of images.

[0039] In one possible implementation, the size of the sliding window is adjusted dynamically based on the expected size of the label. The initial window size is set to 32×32 pixels, and the window slides in 8-pixel steps within the region of increased resolution difference. The local contrast at each window location is calculated using the Weber contrast formula, taken here as the difference between the maximum and minimum grayscale values within the window divided by the average grayscale value.
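A minimal sketch of the window traversal and contrast measure, assuming a fixed 32×32 window and an 8-pixel step (the dynamic resizing is not shown) and returning zero contrast for an all-black window:

```python
def sliding_windows(height, width, size=32, step=8):
    """Yield the (top, left) corner of every window that fits in the region."""
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            yield top, left

def window_contrast(window):
    """(max - min) / mean of the gray values in the window."""
    values = [v for row in window for v in row]
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0  # all-black window: contrast treated as zero (assumed)
    return (max(values) - min(values)) / mean
```

A 48×40 region yields six window positions, from `(0, 0)` to `(16, 8)`.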

[0040] Preferably, texture complexity is calculated based on the gray-level co-occurrence matrix (GLCM), which describes the spatial distribution of pixel pairs in an image. The GLCM P(i,j,d,θ) represents the frequency with which a pixel with gray value i and a pixel with gray value j co-occur at distance d in direction θ. Typically, GLCMs are calculated for four directions (0°, 45°, 90°, and 135°) with the distance d set to one pixel. After the matrix is constructed, its entropy is calculated as the texture complexity indicator: $H = -\sum_{i}\sum_{j} P(i,j)\,\log P(i,j)$, where P(i,j) are the normalized elements of the GLCM. A higher entropy value indicates a more complex texture. The texture complexity of the goods label area is significantly higher than that of the uniform background area because it contains text and barcode information. When the contrast is below 0.3 and the texture entropy value is above 4.5, the window is determined to likely contain identification information.
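The GLCM entropy can be sketched as follows. The logarithm base (2) and the quantization to 16 gray levels are assumptions, since the text specifies neither; the entropy threshold of 4.5 would need to be interpreted under whichever convention is actually used.

```python
import math
from collections import Counter

# (row offset, column offset) for each direction at distance 1
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm_entropy(window, angle=0, d=1, levels=16, max_gray=256):
    """Entropy of the normalized co-occurrence matrix for one direction."""
    dy, dx = OFFSETS[angle][0] * d, OFFSETS[angle][1] * d
    h, w = len(window), len(window[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i = window[y][x] * levels // max_gray    # quantized gray value i
                j = window[ny][nx] * levels // max_gray  # quantized gray value j
                counts[(i, j)] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def likely_label(contrast, entropy, max_contrast=0.3, min_entropy=4.5):
    """Decision rule from the text: low contrast and high texture entropy."""
    return contrast < max_contrast and entropy > min_entropy
```

A uniform window has entropy 0, while a 2×2 checkerboard of 0 and 255 has two equally likely horizontal pairs and therefore an entropy of 1 bit.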

[0041] For example, the merging of adjacent windows employs a region growing algorithm. Starting with a seed window that meets the conditions, it checks whether windows within its eight neighborhoods also meet the contrast and texture complexity conditions. If they do, the neighboring window is merged into the current region, and the process continues to check the neighborhoods of newly added windows until no new windows meet the merging conditions. The merged connected regions form candidate regions for identification information, each represented by its minimum bounding rectangle.
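The region growing over qualifying windows can be sketched on a grid of flags, one per window position. The grid representation and the `(top, left, bottom, right)` box format in grid units are assumptions made for the example.

```python
from collections import deque

def grow_regions(flags):
    """Group qualifying windows into 8-connected regions and return each
    region's minimum bounding rectangle as (top, left, bottom, right)."""
    h, w = len(flags), len(flags[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not flags[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])          # seed window
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and flags[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Growth stops for a region when none of the newly added windows has an unvisited qualifying neighbor, which mirrors the stopping condition described above.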

[0042] In one embodiment, the specification selection of cargo labels is based on prior knowledge. Standard cargo labels in automated warehouses are typically rectangular, with an aspect ratio between 1.5 and 3, and an area ranging from 2,000 to 20,000 square pixels. The system calculates the aspect ratio and area of each candidate region and selects regions that meet the label specifications.
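A short sketch of the specification filter, assuming the aspect ratio is measured as long side over short side so that rotated labels are treated alike (the text does not state the orientation convention):

```python
def matches_label_spec(width_px, height_px,
                       ratio_range=(1.5, 3.0), area_range=(2000, 20000)):
    """Check a candidate rectangle against the aspect-ratio and area limits."""
    if width_px <= 0 or height_px <= 0:
        return False
    ratio = max(width_px, height_px) / min(width_px, height_px)
    area = width_px * height_px
    return (ratio_range[0] <= ratio <= ratio_range[1]
            and area_range[0] <= area <= area_range[1])

candidates = [(120, 60), (40, 30), (400, 200)]
selected = [c for c in candidates if matches_label_spec(*c)]
```

Only the 120×60 candidate passes: the 40×30 one fails the aspect ratio and the 400×200 one fails the area limit.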

[0043] Understandably, the bimodal characteristic of the grayscale histogram stems from the binarization of the label image. The label background is typically white or light-colored, while the text and barcode are black or dark-colored, resulting in two distinct peaks in the grayscale histogram. By finding the valley between the two peaks in the histogram, the segmentation threshold between the text and the background is determined. When the histogram does not exhibit a clear bimodal pattern, it indicates severe image quality degradation and significant loss of detail in that area. The system records the boundary coordinates of these areas and outputs the final label information coordinates, which indicate the locations of the cargo labels that require focused attention and refocusing.
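One simple way to test for a bimodal histogram and locate the valley is sketched below. The peak-separation and peak-size parameters are hypothetical tuning values, and the method (two largest well-separated bins, then the minimum between them) is a simplification that may misjudge broad or noisy peaks.

```python
def bimodal_threshold(hist, min_separation=32, min_peak_fraction=0.05):
    """Return the valley bin between two peaks, or None if no clear
    second peak exists (treated as severe loss of detail)."""
    total = sum(hist)
    if total == 0:
        return None
    bins = range(len(hist))
    first = max(bins, key=hist.__getitem__)
    far_bins = [i for i in bins if abs(i - first) >= min_separation]
    if not far_bins:
        return None
    second = max(far_bins, key=hist.__getitem__)
    if hist[second] < min_peak_fraction * total:
        return None  # second mode too weak: histogram is not bimodal
    lo, hi = sorted((first, second))
    return min(range(lo, hi + 1), key=hist.__getitem__)
```

A histogram with dark text near gray level 30 and a light background near 220 yields a threshold between the two, whereas a histogram with a single mode returns `None`.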

[0044] Step S104: Locate the target area to be refocused based on the coordinates of the identification information with severe loss of detail. Combine the relative distance between each row of goods and the camera, drive the focus motor of the variable zoom camera in the target area to dynamically adjust the depth of field coverage of the target area and obtain the optimized focal plane position.

[0045] Based on the coordinates of the identification information with severe loss of detail, the boundary of the target area to be refocused is delineated in image space, and the coordinates of its center point are obtained. Combined with the relative distance data between each row of goods and the camera, the average object distance of the goods within the target area is calculated. Based on the thin lens imaging principle, the required focal length adjustment is calculated from the relationship between object distance, image distance, and focal length. A drive signal for the focusing motor is generated from the focal length adjustment. The drive signal includes pulse quantity and pulse frequency parameters: the pulse quantity determines the lens movement distance, and the pulse frequency determines the movement speed. Drive pulses are sent through the drive circuit to the focusing motor of the variable zoom camera, and the focusing motor moves the lens assembly. The position fed back by the lens position sensor is read in real time, and the drive stops when the lens reaches the target position. Using the lens position and aperture parameters, the near and far limits of the depth-of-field coverage are determined from the focusing distance, aperture value, and circle of confusion diameter according to the depth-of-field calculation principle. The current depth-of-field coverage is obtained by checking whether the distances of the front and rear rows of goods within the target area fall within the depth-of-field range. If any goods lie outside the depth-of-field range, the lens position is finely adjusted until the range covers all target goods. The final lens position is recorded as the optimized focal plane position.

[0046] In one implementation, after the coordinates of the identification information with severe loss of detail are obtained through preprocessing, the system constructs a minimum bounding rectangle around these coordinates as the target area to be refocused. The center point of the target area is obtained by calculating the intersection of the rectangle's diagonals, and this center point serves as the reference position for focus adjustment.

[0047] Specifically, the application of the thin lens imaging principle is reflected in the precise calculation of the focal length adjustment. The system reads stored distance data for each row of goods, covering different distance configurations for the front, middle, and rear rows. The average object distance within the target area is calculated using a weighted average method, with the weights determined based on the area proportion of each row of goods within the target area. According to the imaging relationship 1 / u + 1 / v = 1 / f, when the object distance u changes, the focal length f needs to be adjusted accordingly to maintain a clear image. The focal length adjustment Δf is determined by the difference between the current focal length and the target focal length, which is converted into the physical distance the lens assembly needs to move.
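The weighted average and the thin lens relation can be illustrated as follows. How the adjustment is converted into lens travel is not detailed in the text; the sketch assumes one possible reading, namely that the lens assembly moves by the change in image distance v at a fixed focal length. All names are hypothetical.

```python
def weighted_object_distance(distances_mm, area_weights):
    """Average object distance, weighted by each row's area share in the target area."""
    total = sum(area_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(distances_mm, area_weights)) / total

def image_distance(u_mm, f_mm):
    """Solve 1/u + 1/v = 1/f for the image distance v."""
    if u_mm <= f_mm:
        raise ValueError("object must lie beyond the focal length")
    return u_mm * f_mm / (u_mm - f_mm)

def lens_travel(u_current_mm, u_target_mm, f_mm):
    """Assumed interpretation: travel equals the change in image distance."""
    return image_distance(u_target_mm, f_mm) - image_distance(u_current_mm, f_mm)
```

For rows at 2000, 3000, and 4000 mm with area shares 0.5, 0.3, and 0.2, the weighted object distance is 2700 mm; refocusing a 50 mm lens from 2000 mm to 4000 mm shortens the image distance by about 0.65 mm.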

[0048] It should be noted that the pulse parameters of the drive signal directly determine the focusing accuracy. The number of pulses is linearly related to the lens movement distance; precise displacement control is achieved by controlling the number of pulses. The pulse frequency setting must ensure focusing speed while avoiding vibration caused by excessive movement. The drive circuit adopts a constant current drive method, with the current value set at 70% of the rated current of the focusing motor to ensure smooth motor operation.

[0049] Preferably, the lens position sensor uses a photoelectric encoder to achieve position feedback. The encoder has a resolution of 100 pulses per millimeter, which can accurately detect the actual position of the lens. The system implements closed-loop control by comparing the difference between the target position and the actual position. When the difference is less than 0.05 millimeters, it is determined that the target position has been reached.
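The closed-loop stop condition can be sketched with a simple simulation. The motor step of 0.01 mm per drive pulse is a hypothetical value chosen to match the encoder resolution of 100 pulses per millimeter; the text only states that pulse count and travel are linearly related.

```python
def drive_to_target(target_mm, start_mm=0.0, mm_per_pulse=0.01,
                    tol_mm=0.05, max_pulses=100_000):
    """Send one pulse at a time until the position error is below the tolerance."""
    position = start_mm
    pulses = 0
    while abs(target_mm - position) >= tol_mm and pulses < max_pulses:
        # Step toward the target; a real system would read the encoder here
        position += mm_per_pulse if target_mm > position else -mm_per_pulse
        pulses += 1
    return position, pulses

position, pulses = drive_to_target(1.0)
```

The loop stops as soon as the remaining error drops under 0.05 mm, so the lens settles slightly short of the 1.0 mm target rather than exactly on it.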

[0050] In one possible implementation, the dynamic adjustment of depth-of-field coverage is achieved through iterative optimization. After the system initially calculates the depth-of-field range, it examines the distance distribution of all goods within the target area, identifying goods not covered by the depth of field. By adjusting the aperture value N, the depth-of-field range is expanded while keeping the focal length constant. When the aperture is adjusted from f / 2.8 to f / 5.6, the depth-of-field range can be expanded by approximately 1.5 times. If there are still goods outside the depth of field, the lens position is fine-tuned, adjusting by 0.1 mm each time, and the depth of field range is recalculated until all target goods fall within the depth of field. The final lens position value is stored as the optimized focal plane position, achieving clear imaging in multi-depth shelf environments.
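The aperture part of this iteration can be sketched as a search over aperture stops. The intermediate stop f/4 and the lens values in the usage note are assumptions, and the subsequent 0.1 mm lens fine-tuning is omitted because the text does not give the mapping from lens position to focusing distance.

```python
import math

def dof_limits(f_mm, s_mm, n_aperture, c_mm):
    """Near and far depth-of-field limits for a given aperture value."""
    ncs = n_aperture * c_mm * s_mm
    near = f_mm ** 2 * s_mm / (f_mm ** 2 + ncs)
    denom = f_mm ** 2 - ncs
    far = math.inf if denom <= 0 else f_mm ** 2 * s_mm / denom
    return near, far

def choose_aperture(goods_distances_mm, f_mm, s_mm, c_mm, stops=(2.8, 4.0, 5.6)):
    """Return the first (widest) aperture whose depth of field covers all goods."""
    nearest, farthest = min(goods_distances_mm), max(goods_distances_mm)
    for n_aperture in stops:
        near, far = dof_limits(f_mm, s_mm, n_aperture, c_mm)
        if near <= nearest and farthest <= far:
            return n_aperture
    return None  # not coverable by aperture alone; lens fine-tuning would follow
```

With an assumed 25 mm lens focused at 2000 mm and c = 0.02 mm, goods between 1800 mm and 2200 mm are covered at f/2.8, while goods between 1500 mm and 2600 mm require f/5.6.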

[0051] Step S105: Obtain the second real-time extension stroke corresponding to the optimized focal plane position, combine it with the edge positioning of the rear goods to locate the identification areas of each row of goods, identify the identification information of each row of goods in the multi-layer depth shelf, and determine the target focal length for simultaneous clear imaging of the front and rear goods.

[0052] The second real-time extension stroke value corresponding to the optimized focal plane position is obtained. This second real-time extension stroke value is the current extension stroke data of the fork after dynamic focus adjustment. The current data of the fork position sensor is read, and the actual distance from the camera to each row of goods is calculated based on the second real-time extension stroke value. Combined with the edge coordinates of the rear row of goods, the identification area range of each layer of goods in the front, middle, and rear rows is located through coordinate mapping. Image blocks of each identification area are extracted using a region segmentation method. The image blocks are preprocessed, and the barcode area is processed using adaptive binarization. The barcode decoder identifies the one-dimensional and two-dimensional barcode information to obtain the goods code data. The text information on the label is extracted using optical character recognition (OCR). The goods code and text information are matched with a pre-stored goods list to determine the goods category attribute. Based on the cargo category attributes, the color histogram and contour shape features of the corresponding image blocks are extracted. Image features of similar cargo are identified through color clustering. The ratio of the contour area to the standard cargo area is calculated. When the ratio is an integer multiple, it is determined to be a stack of multiple items. The number of cargo in each row is counted, and the proportion of successful identification of the identification information in each row is calculated. At the same time, the image features and morphological features of the cargo are identified. The cargo category is determined based on the extracted cargo code and text information. The quantity is counted by combining the cargo image features and morphological features. 
The current focal length value is determined as the target focal length for simultaneous clear imaging of cargo in front and behind.

[0053] In one implementation, the second real-time extension stroke value is obtained through real-time acquisition by the fork position sensor, and this value represents the distance the forks extend beyond the column. Based on the geometric relationship between the fork extension stroke and the camera's field of view, the system establishes a mapping table from the stroke value to image coordinates, achieving precise positioning of each row of goods.

[0054] Specifically, the coordinate mapping relationship is established based on the camera calibration parameters and the dimensions of the rack structure. When the fork extension stroke is 2 meters, the camera's field of view covers the front row of goods, corresponding to the lower third of the image; when the stroke is 3 meters, it covers the middle row of goods, corresponding to the middle area of the image; and when the stroke is 4 meters, it covers the rear row of goods, corresponding to the upper area of the image. The system automatically adjusts the boundaries of each area based on the edge coordinates of the rear row of goods, and segments independent image blocks of labeled regions using connected component analysis. Each image block contains a complete goods label, approximately 200×150 pixels in size.

[0055] It should be noted that barcode recognition uses dedicated decoding algorithms to process 1D and 2D barcodes. 1D barcode recognition detects changes in stripe width through scan lines, converting the black and white stripe sequence into digital codes. 2D barcode recognition determines the code area range through locator detection and parses the data matrix according to encoding rules. Optical character recognition uses a template matching method, comparing the printed text on the label with a character template library to identify textual information such as product name, specifications, and batch number.

[0056] Preferably, the quantity of goods is counted based on the calculation of the outline area ratio. The system pre-stores the standard outline area of various types of goods. By extracting the outline of the goods in the current image, the ratio of the actual area to the standard area is calculated. When the ratio is 1.0±0.1, it is determined to be a single item; when the ratio is 2.0±0.2, it is determined to be two items stacked together, and so on.
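The area-ratio counting rule can be sketched as below. Extending the tolerance as 0.1 times the item count is an interpretation of "and so on", based on the two examples given (1.0±0.1 and 2.0±0.2).

```python
def count_items(contour_area, standard_area, rel_tol=0.1):
    """Return the stacked item count, or None if the ratio is not close
    enough to an integer multiple of the standard area."""
    ratio = contour_area / standard_area
    n = round(ratio)
    if n < 1:
        return None
    return n if abs(ratio - n) <= rel_tol * n else None
```

A contour of 2150 square pixels against a standard area of 1000 gives a ratio of 2.15, which falls within 2.0±0.2 and is counted as two items; a ratio of 1.5 matches no integer multiple and returns `None`.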

[0057] In one possible implementation, the target focal length is determined through a recognition success rate assessment. The system counts the number of successfully recognized tags for the front, middle, and back rows, and calculates the proportion of successfully recognized tags to the total number of tags. The total number of tags refers to all tags on the shelf, and the tag information includes barcodes, QR codes, and text information on the product tags. The system presets acceptable recognition rate thresholds for different rows. For the front row, where goods are closer, the imaging quality is optimal, and a higher recognition rate requirement is set; for the middle and back rows, where goods are farther away, the recognition rate requirements decrease sequentially. When the actual recognition rate for each row reaches the corresponding preset threshold, the current focal length setting is deemed reasonable. The system records this focal length value as the target focal length, which enables clear imaging of goods at different depths within the shelf. In step S106, this target focal length is used to verify the imaging effect, completing the optimization of key parameters for the inventory operation.

[0058] Step S106: Obtain the current focus response time of the variable focal length camera, evaluate the matching degree between the target focal length and the focus response time, determine the imaging balance degree based on the matching degree, obtain a verification image of clear imaging of multi-depth goods, identify the identification information and image shape of each row of goods in the verification image, and complete the inventory of goods on the multi-depth shelf.

[0059] The system acquires the current focusing response time of the zoom camera, reads the time data from when the focusing motor receives the drive signal to when the lens reaches the target position, calculates the theoretical response time by dividing the lens movement distance corresponding to the target focal length by the motor movement speed, and evaluates the matching degree by the ratio of the actual response time to the theoretical response time. When the ratio is within a preset range, the matching is considered good, and the imaging balance degree is determined based on the matching degree. Image acquisition is triggered based on the imaging balance degree to acquire a verification image of clear multi-depth goods. The verification image is partitioned to identify the goods identification information in each area of the front, middle, and back rows, extracting barcode data and text content. At the same time, edge detection is used to obtain the morphological features of the goods image, and the contour integrity value and texture clarity value are recorded. Based on the identification information and image morphological features, the system compares them with the goods information stored in the warehouse management system to verify the matching of goods categories. The quantity of goods is counted by contour area. Combined with the current storage location coordinate information, the three data of goods category, goods quantity, and storage location are automatically verified. When the verification results are consistent, the inventory data is confirmed to be accurate, the inventory result of the storage location is recorded, and the inventory of goods in the multi-depth shelf is completed.

[0060] In one implementation, the focus response time is acquired via a built-in timer. The system starts timing simultaneously with sending the drive signal and stops timing when the lens position sensor sends a signal indicating that the target position has been reached. The recorded duration is the actual response time. The theoretical response time is calculated based on the physical distance the lens needs to move and the rated speed of the motor, typically 10 millimeters per second.

[0061] Specifically, the matching degree is evaluated using a ratio method. When the ratio of the actual response time to the theoretical response time is between 0.9 and 1.1, it indicates that the focusing system is working normally and the imaging balance is good. If the ratio exceeds this range, the system determines that there is mechanical delay or control error, and the drive parameters need to be adjusted. The imaging balance degree is divided into three levels: excellent, acceptable, and needs adjustment, each corresponding to a different ratio range.
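The ratio method can be sketched as follows. The 0.9 to 1.1 band and the 10 mm/s rated speed come from the text; the narrower 0.95 to 1.05 band used to separate "excellent" from "acceptable" is a hypothetical choice, since the text does not give the ranges for the three levels.

```python
def theoretical_response_s(travel_mm, speed_mm_s=10.0):
    """Lens travel divided by the rated motor speed."""
    return travel_mm / speed_mm_s

def matching_ratio(actual_s, travel_mm, speed_mm_s=10.0):
    """Actual response time over theoretical response time."""
    return actual_s / theoretical_response_s(travel_mm, speed_mm_s)

def balance_level(ratio):
    """Map the ratio to one of the three imaging balance levels."""
    if 0.95 <= ratio <= 1.05:   # hypothetical sub-range
        return "excellent"
    if 0.9 <= ratio <= 1.1:     # range stated in the text
        return "acceptable"
    return "needs adjustment"
```

For 2 mm of lens travel the theoretical response time is 0.2 s, so a measured 0.216 s gives a ratio of 1.08 and an "acceptable" level.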

[0062] It should be noted that the timing of verification image acquisition is closely tied to the imaging balance degree. When the imaging balance degree reaches the acceptable level or above, the system triggers high-resolution image acquisition to obtain a complete verification image containing the front, middle, and rear rows of goods. The partitioning process employs an adaptive segmentation method based on depth information: the front row corresponds to the lower part of the image, the middle row to the middle section, and the rear row to the upper section. Identification information is extracted independently for each region.

[0063] Preferably, the acquisition of image morphological features includes two key indicators. The contour integrity value is calculated by obtaining the detected contour perimeter using the Canny edge detection method, with the formula CI = DP / TP, where CI represents contour integrity, DP represents the detected perimeter, and TP represents the theoretical perimeter calculated based on the standard dimensions of the cargo. The closer the value is to 1, the more complete the contour. The texture sharpness value is obtained by calculating the gradient magnitude of a local region using the Sobel operator and then calculating its standard deviation, with the formula TCD = σ(G), where TCD represents texture sharpness, σ represents the standard deviation, and G represents the gradient magnitude. The larger the standard deviation, the clearer the texture details. These two indicators together reflect the imaging quality of the cargo image.

[0064] In one possible implementation, the automatic verification process employs triple verification. Goods category verification is achieved by comparing the identified barcode information with the goods code recorded in the system; goods quantity verification is achieved by comparing the quantity counted from contours with the expected quantity recorded in the system; and warehouse location verification is achieved by matching the coordinates of the current stacker crane with the warehouse location number in the system. When all three checks match, the system confirms the accuracy of the inventory count for that warehouse location and automatically updates the inventory status marker.
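The triple verification can be sketched as a comparison of two records. The record fields and identifiers below are hypothetical, and the lookup that maps stacker crane coordinates to a location number is assumed to have been done beforehand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationRecord:
    location_id: str   # warehouse location number
    goods_code: str    # code decoded from the barcode, or stored in the system
    quantity: int      # counted or expected quantity

def verify_location(observed, expected):
    """Return (all_match, per-check results) for category, quantity, and location."""
    checks = {
        "category": observed.goods_code == expected.goods_code,
        "quantity": observed.quantity == expected.quantity,
        "location": observed.location_id == expected.location_id,
    }
    return all(checks.values()), checks

expected = LocationRecord("A-03-12", "SKU-1001", 4)
ok, detail = verify_location(LocationRecord("A-03-12", "SKU-1001", 4), expected)
bad, bad_detail = verify_location(LocationRecord("A-03-12", "SKU-1001", 3), expected)
```

Returning the individual check results alongside the overall flag makes it possible to record which of the three items failed when a location cannot be confirmed.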

[0065] Understandably, the completion of the entire inventory process is marked by the acquisition and verification of all images of the storage locations to be inventoried. The system records the inventory timestamp, goods information, and verification results for each storage location, forming a complete inventory data record to achieve automated and accurate inventory counting in multi-depth racking environments. In another implementation, the stacker crane moves sequentially to the remaining target storage locations according to a preset path, repeating the above steps to complete the identification and verification of goods in all storage locations to be inventoried, synchronously recording the location information of goods in each storage location, summarizing the goods category, quantity, and location data of all target storage locations, comparing and verifying them with the inventory information in the warehouse management system, and generating an inventory report.

[0066] The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. The present invention has been described in detail with reference to preferred embodiments. Those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications and substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for automatic inventory management in an automated warehouse based on the linkage between a camera and a stacker crane, characterized in that the method comprises:
planning the optimal movement path of the stacker crane based on the storage location information of the goods to be inventoried while, at the same time, collecting the first real-time extension stroke of the forks along the movement path of the stacker crane, acquiring the current shelf image, determining the relative distance between each row of goods and the camera along the movement path, and obtaining the focal plane offset for the current movement path of the stacker crane;
preprocessing the current shelf image to identify the shape of goods in the storage location and, if goods are present, analyzing the pixel displacement distribution of the goods based on the focal plane offset to identify the tilt state of the pallet and determine the degree of depth-of-field coverage deviation caused by the tilted pallet;
assessing whether the depth-of-field coverage deviation exceeds a preset threshold and, if it does, processing the current shelf image to determine the area of increased resolution difference, and determining, based on that area, the coordinates of the identification information with severe loss of detail;
locating the target area to be refocused based on the coordinates of the identification information with severe loss of detail and, combined with the relative distance between each row of goods and the camera, driving the focus motor of the zoom camera for the target area to dynamically adjust the depth-of-field coverage of the target area and obtain the optimized focal plane position;
obtaining the second real-time extension stroke corresponding to the optimized focal plane position and, combined with the edge positioning of the rear goods, locating the identification area of each row of goods, identifying the identification information of each row of goods in the multi-depth shelf, and determining the target focal length for simultaneous clear imaging of front and rear goods; and
obtaining the current focus response time of the zoom camera, evaluating the matching degree between the target focal length and the focus response time, determining the imaging balance degree based on the matching degree, obtaining a verification image in which goods at multiple depths are clearly imaged, identifying the identification information and image shape of each row of goods in the verification image, and completing the inventory of goods on the multi-depth shelf.

2. The method according to claim 1, characterized in that the step of planning the optimal movement path of the stacker crane based on the storage location information of the goods to be inventoried while collecting the first real-time extension stroke of the forks along the movement path, acquiring the current shelf image, determining the relative distance between each row of goods and the camera along the movement path, and obtaining the focal plane offset for the current movement path of the stacker crane, comprises:
obtaining the storage location code sequence of the goods to be inventoried, reading the real-time operation status data in the warehouse management system, and identifying the aisle occupancy status of the current inbound and outbound operations;
generating an available-aisle time window matrix based on the aisle occupancy timestamps and the estimated operation completion times, and ordering the storage locations for inventory by matching the time window matrix against the storage location code sequence, thereby forming the inventory path sequence to be executed;
based on the inventory path sequence, using the Dijkstra algorithm to calculate the shortest movement path of the stacker crane from its current position to each target storage location in the aisle;
while the stacker crane moves along the path, collecting the fork extension stroke value in real time from the fork position sensor as the first real-time extension stroke and, at the same time, triggering the camera to capture the current shelf image and recording the pixel coordinate positions of the front, middle, and rear rows of goods in the image;
calculating the actual distances between the front, middle, and rear rows of goods and the camera from the correspondence between the pixel coordinate positions and the first real-time extension stroke;
calculating the image distance from the camera lens focal length parameters and the object distance information according to the optical imaging principle, so as to determine the actual position of the current focal plane; and
comparing the theoretical focal plane position at which the front, middle, and rear rows of goods are clearly imaged with the actual position of the current focal plane, to obtain the focal plane offset for the current movement path of the stacker crane.
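
As an illustration of the shortest-path step named in claim 2, the following is a minimal sketch of the Dijkstra algorithm in Python. The graph representation (a dictionary mapping each node to its neighbours and travel costs) and the node names are assumptions for illustration; the claim does not specify how the aisle is modelled.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Return (total cost, node list) of the cheapest route from start to goal.

    Returns (inf, []) when the goal cannot be reached.
    """
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry, a shorter route was already settled
        visited.add(node)
        if node == goal:
            break
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path
```

For example, with hypothetical nodes `home`, `A`, `B`, and `C`, calling `shortest_path(graph, "home", "C")` returns the total travel cost together with the ordered list of nodes the stacker crane would pass through.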

3. The method according to claim 1, characterized in that the step of obtaining the second real-time extension stroke corresponding to the optimized focal plane position and, combined with the edge positioning of the rear goods, locating the identification area of each row of goods, identifying the identification information of each row of goods in the multi-depth shelf, and determining the target focal length for simultaneous clear imaging of front and rear goods, comprises:
obtaining the second real-time extension stroke value corresponding to the optimized focal plane position, the second real-time extension stroke value being the current extension stroke of the forks after dynamic focus adjustment, and reading the current data of the fork position sensor;
calculating the actual distance from the camera to each row of goods based on the second real-time extension stroke value;
combining the edge coordinates of the rear row of goods and locating, through a coordinate mapping relationship, the identification area of each layer of goods in the front, middle, and rear rows;
extracting the image block of each identification area by region segmentation;
preprocessing each image block, applying adaptive binarization to the barcode area, decoding one-dimensional and two-dimensional barcodes with a barcode decoder to obtain the goods code data, and extracting the text on the label by optical character recognition;
determining the goods category attribute by matching the goods code and text information against a pre-stored goods list;
based on the goods category attribute, extracting the color histogram and contour shape features of the corresponding image blocks, and identifying the image features of goods of the same kind by color clustering;
calculating the ratio of the contour area to the standard goods area and, when the ratio is an integer multiple, determining that the contour is a stack of multiple items;
counting the number of goods in each row and calculating the proportion of identification information successfully identified in each row, while identifying the image features and morphological features of the goods;
determining the goods category from the extracted goods code and text information, and counting the quantity by combining the image features and morphological features of the goods; and
determining the current focal length value as the target focal length for simultaneous clear imaging of front and rear goods.
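
The stack-detection rule in claim 3 (ratio of contour area to standard goods area) can be sketched as follows. The tolerance used to decide that a measured ratio is "an integer multiple" is an assumption added here, since real contour areas are never exact; the claim itself gives no tolerance.

```python
from typing import Optional


def estimate_item_count(contour_area: float,
                        standard_area: float,
                        tolerance: float = 0.1) -> Optional[int]:
    """Estimate how many items a contour covers from its area ratio.

    Returns the nearest whole number of items when the ratio lies within
    `tolerance` of it (hypothetical parameter), otherwise None to signal
    that the contour needs another look.
    """
    if standard_area <= 0:
        raise ValueError("standard_area must be positive")
    ratio = contour_area / standard_area
    nearest = round(ratio)
    if nearest >= 1 and abs(ratio - nearest) <= tolerance:
        return nearest
    return None
```

A result of 2 or more corresponds to the "stack of multiple items" case in the claim, while `None` marks a contour whose area does not fit any whole number of standard items.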

4. The method according to claim 1, characterized in that the step of preprocessing the current shelf image to identify the shape of goods in the storage location and, if goods are present, analyzing the pixel displacement distribution of the goods' misalignment based on the focal plane offset, identifying the tilt state of the pallet, and determining the degree of depth-of-field coverage deviation caused by the tilted pallet load, comprises: preprocessing the current shelf image by applying Gaussian filtering to remove image noise and histogram equalization to enhance image contrast, yielding an enhanced image; extracting the outlines of the goods from the enhanced image with the Canny edge detection algorithm; and determining whether goods are present in the storage location based on the closure of each outline and an area threshold.
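
The final decision in claim 4 (closure of the outline plus an area threshold) can be sketched in plain Python. This sketch assumes the outlines have already been extracted, for example by an edge detector, and are given as ordered lists of (x, y) points; the closure gap `max_gap` and the area threshold are hypothetical parameters, and the filtering and edge detection steps are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def contour_area(points: Sequence[Point]) -> float:
    """Area enclosed by an ordered outline, using the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_closed(points: Sequence[Point], max_gap: float = 1.5) -> bool:
    """Treat an outline as closed when its end returns to its start."""
    if len(points) < 3:
        return False
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.hypot(x1 - x0, y1 - y0) <= max_gap


def goods_present(contours: List[Sequence[Point]], min_area: float) -> bool:
    """Goods are reported when any closed outline is large enough."""
    return any(is_closed(c) and contour_area(c) >= min_area for c in contours)
```

Requiring both conditions keeps small closed specks (noise) and large open edge fragments (shelf beams, for instance) from being counted as goods.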

5. The method according to claim 1, characterized in that the step of assessing whether the depth-of-field coverage deviation exceeds a preset threshold and, if it does, processing the current shelf image to determine the area of increased resolution difference and determining, based on that area, the coordinates of the identification information with severe loss of detail, comprises: obtaining the depth-of-field coverage deviation value and comparing it with the preset threshold; if the deviation exceeds the threshold, partitioning the current shelf image into front, middle, and rear areas along the shelf depth direction; and, in the rear area, using the Sobel operator to calculate the gradient values in the horizontal and vertical directions and determining the set of edge pixels of the rear goods from the gradient magnitude.
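
The Sobel step in claim 5 can be illustrated with a small dependency-free sketch. It assumes the rear area has already been cropped into a grayscale image stored as a list of rows, and it uses a hypothetical magnitude threshold to select edge pixels; the claim does not say how the threshold is chosen.

```python
import math
from typing import List, Set, Tuple

# Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edge_pixels(image: List[List[float]],
                      threshold: float) -> Set[Tuple[int, int]]:
    """Return the (x, y) pixels whose gradient magnitude reaches the threshold.

    Border pixels are skipped because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    edges = set()
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * pixel
                    gy += SOBEL_Y[ky][kx] * pixel
            if math.hypot(gx, gy) >= threshold:
                edges.add((x, y))
    return edges
```

A production system would more likely call an optimized image library for this step; the loop form is used here only to make the gradient computation explicit.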

6. The method according to claim 1, characterized in that the step of locating the target area to be refocused based on the coordinates of the identification information with severe loss of detail and, combined with the relative distance between each row of goods and the camera, driving the focus motor of the zoom camera for the target area to dynamically adjust the depth-of-field coverage of the target area and obtain the optimized focal plane position, comprises: delineating the boundary of the target area to be refocused in image space based on the coordinates of the identification information with severe loss of detail; obtaining the coordinates of the center point of the target area; calculating the average object distance of the goods in the target area from the relative distance data between each row of goods and the camera; calculating the required focal length adjustment according to the thin-lens imaging principle; and sending a drive pulse to the focus motor of the zoom camera through the drive circuit.
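
The thin-lens calculation referred to in claim 6 follows the relation 1/f = 1/u + 1/v, where f is the focal length, u the object distance, and v the image distance. The sketch below is one possible reading of the claim: it treats the adjustment as the change in image distance needed to refocus from the current object distance to the average object distance of the target area, and converts it to motor pulses with a hypothetical `mm_per_pulse` step size. None of these parameter names come from the source.

```python
from typing import Sequence, Tuple


def image_distance(focal_length_mm: float, object_distance_mm: float) -> float:
    """Thin-lens image distance v = f * u / (u - f), valid for u > f."""
    if object_distance_mm <= focal_length_mm:
        raise ValueError("object must lie beyond the focal length")
    return (focal_length_mm * object_distance_mm
            / (object_distance_mm - focal_length_mm))


def focus_adjustment(focal_length_mm: float,
                     current_object_mm: float,
                     row_distances_mm: Sequence[float],
                     mm_per_pulse: float) -> Tuple[float, float, int]:
    """Return (average object distance, lens shift in mm, signed pulse count)."""
    # Average object distance of the goods inside the target area.
    target_object_mm = sum(row_distances_mm) / len(row_distances_mm)
    # Shift of the image plane between the current and the target focus.
    shift_mm = (image_distance(focal_length_mm, target_object_mm)
                - image_distance(focal_length_mm, current_object_mm))
    pulses = round(shift_mm / mm_per_pulse)
    return target_object_mm, shift_mm, pulses
```

A negative pulse count in this sketch means the lens moves toward the sensor, which is what happens when the focus shifts from a nearer row to a farther one.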

7. The method according to claim 1, characterized in that the step of obtaining the current focus response time of the zoom camera, evaluating the matching degree between the target focal length and the focus response time, determining the imaging balance degree based on the matching degree, obtaining a verification image in which goods at multiple depths are clearly imaged, identifying the identification information and image shape of each row of goods in the verification image, and completing the inventory of goods on the multi-depth shelf, comprises: obtaining the current focus response time of the zoom camera by reading the time elapsed from the moment the focus motor receives the drive signal to the moment the lens reaches the target position; calculating the theoretical response time by dividing the lens travel distance corresponding to the target focal length by the motor movement speed; evaluating the matching degree as the ratio of the actual response time to the theoretical response time; triggering image acquisition according to the matching degree to obtain a verification image in which goods at multiple depths are clearly imaged; and partitioning the verification image to identify the goods identification information in each of the front-row, middle-row, and rear-row areas.
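
The matching-degree evaluation in claim 7 reduces to two divisions, sketched below. The trigger rule (capture when the ratio lies within a tolerance of 1.0) and the tolerance value are assumptions for illustration, since the claim states only that acquisition is triggered "according to the matching degree".

```python
def theoretical_response_time(lens_travel_mm: float,
                              motor_speed_mm_per_s: float) -> float:
    """Time the lens should need: travel distance divided by motor speed."""
    if motor_speed_mm_per_s <= 0:
        raise ValueError("motor speed must be positive")
    return lens_travel_mm / motor_speed_mm_per_s


def matching_degree(actual_s: float, theoretical_s: float) -> float:
    """Ratio of the measured response time to the theoretical one."""
    if theoretical_s <= 0:
        raise ValueError("theoretical response time must be positive")
    return actual_s / theoretical_s


def should_trigger_capture(degree: float, tolerance: float = 0.2) -> bool:
    """Assumed rule: capture when the lens arrived about as fast as expected."""
    return abs(degree - 1.0) <= tolerance
```

A ratio well above 1.0 in this sketch would suggest the lens is still settling or the motor is lagging, in which case the capture would be delayed rather than risk a blurred verification image.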