Machine vision-based unattended management system for electric power materials

Polarization-guided depth restoration and depth-guided non-maximum suppression address the identification problems caused by surface reflection and dense stacking of metal materials in power material warehouses, enabling high-precision and reliable material management.

CN122244773A · Pending · Publication Date: 2026-06-19 · HANGZHOU FANSHENG TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HANGZHOU FANSHENG TECH CO LTD
Filing Date: 2026-03-18
Publication Date: 2026-06-19

Application Information

Patent Timeline

18 Mar 2026: Application
19 Jun 2026: Publication (CN122244773A)

IPC: G06V20/50; G06V10/764; G06V10/28; G06V10/44; G06V10/82; G06V10/72; G06V10/52; G06V10/26; G06V20/70; G06N3/0455; G06N3/0442; G06N3/0464; G06V10/74; G06V10/75

AI Tagging

Application Domain

Character and pattern recognition; Biological models


AI Technical Summary

Technical Problem

Existing power material warehouse management systems suffer from problems such as depth measurement failure when dealing with surface reflections of metal materials, inaccurate material identification in densely stacked scenarios, and insufficient identification reliability due to nameplate wear.

Method used

Polarization-guided depth restoration and a depth-guided non-maximum suppression algorithm are combined with adaptive illumination enhancement and progressive knowledge base matching verification. Specular reflection areas on metal surfaces are identified from degree-of-polarization information, depth holes are repaired, and foreground and background materials are accurately distinguished in densely stacked scenes. Material information is extracted using nameplate localization and text recognition.

Benefits of technology

It improves the positioning accuracy and identification rate of metal materials, reduces the dependence on special light sources and hardware configurations, and enhances the system's environmental adaptability and the reliability of identification results.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of intelligent warehouse management technology, and discloses an unmanned power material management system based on machine vision. The system includes an image acquisition and processing module, a material detection and positioning module, a nameplate recognition and information extraction module, a 3D positioning and identity verification module, and an anomaly monitoring and early warning module. It solves the problem of depth holes caused by surface reflection of metallic materials through polarization-guided depth restoration, accurately identifies densely stacked materials using a depth-guided dual-stream detection network and a depth-guided non-maximum suppression algorithm, improves identification reliability through progressive knowledge base matching verification and association verification, and achieves anomaly detection and tiered early warning through personnel behavior analysis and status monitoring. This invention enables automatic identification, real-time monitoring, and intelligent inventory management of power materials, improving warehouse management efficiency and security.


Description

Technical Field

[0001] This invention relates to the field of intelligent warehouse management technology, and more specifically, to an unmanned power material management system based on machine vision.

Background Technology

[0002] Warehouse management of power materials is a crucial aspect of power supply company operations. Traditional management methods rely on manual inventory and record-keeping, which is inefficient and prone to errors. As the power system expands and the types and quantities of materials continue to increase, manual management can no longer meet the demands for rapid response and precise management. This is especially true in emergency repair scenarios, where the timeliness of material allocation directly affects power supply reliability.

[0003] Existing automated warehouse management systems mostly use RFID tags or QR code technology for material identification, but these methods require each material to be tagged, resulting in high costs and a large maintenance workload. While machine vision-based identification methods can reduce reliance on tags, they face many technical challenges in practical applications in power material warehouses. For example, the reflective surface of metal materials causes depth measurement failures, materials in densely stacked environments are difficult to segment accurately due to mutual obstruction, and nameplate wear and degradation make text recognition difficult.

[0004] Existing material identification systems typically address metal surface reflections by adjusting the light source or adding polarizing filters, but these methods have limited effectiveness and increase hardware costs. In densely stacked scenarios, traditional target detection algorithms easily misclassify foreground and background materials as the same object, leading to missed detections or duplicate counts. For nameplate recognition, existing methods require high image quality, and the recognition rate drops significantly when the nameplate is worn or under uneven lighting. Furthermore, there is a lack of effective verification mechanisms to improve recognition reliability.

Summary of the Invention

[0005] This invention provides an unattended management system for power materials based on machine vision, which solves the technical problems in related technologies, such as the failure of depth measurement due to surface reflection of metal materials, inaccurate material identification in densely stacked scenarios, and insufficient identification reliability caused by nameplate wear and degradation.

[0006] This invention provides a machine vision-based unmanned power material management system, comprising:
an image acquisition and processing module, which acquires polarization images and depth data of warehouse materials and applies polarization-guided depth restoration and illumination-adaptive enhancement to obtain an illumination-balanced color image and a complete and stable depth image;
a material detection and positioning module, which receives the illumination-balanced color image and the complete and stable depth image, identifies densely stacked materials through a depth-guided dual-stream detection network, and generates a list of candidate material areas;
a nameplate recognition and information extraction module, which extracts material nameplate information from the candidate material area list using nameplate localization and text recognition, thereby obtaining structured material information and its overall credibility;
a 3D positioning and identity verification module, which performs 3D positioning of materials by combining the structured material information with the complete and stable depth image, and verifies the recognition results by knowledge base matching to confirm material identity and record inventory changes;
an anomaly monitoring and early warning module, which detects anomalies from the material identification results and inventory change records through personnel behavior analysis and status monitoring, and generates early warning information according to the severity level.

[0007] In a preferred embodiment, the image acquisition and processing module includes: Deploy an RGB-D camera with polarization acquisition capability to simultaneously acquire color images, depth images, and images at four polarization angles; for each pixel position in the image, extract the light intensity value at the corresponding position under the four polarization angles, calculate the light intensity difference between the zero-degree and ninety-degree directions and the light intensity difference between the forty-five-degree and one hundred and thirty-five-degree directions, then calculate the sum of the squares of the two differences and take the square root, and divide by the sum of the light intensities at the four polarization angles to obtain the degree of polarization. A polarization degree calculation process is performed on all pixels in the image to generate a polarization degree map. The polarization degree information is used to perform weighted fusion repair on the hole regions in the depth image. A search neighborhood is set around each hole pixel, and the radius of the neighborhood is adaptively determined according to the hole size. All effective depth values and their corresponding polarization degree values are extracted within the search neighborhood. The effective depth value is the depth value that is within the effective measurement range and the polarization degree of the corresponding position is less than the preset high reflectivity threshold. The contribution weight of each effective depth value in the neighborhood to the hole pixel is calculated. The weight is obtained by multiplying the spatial distance weight and the polarization degree similarity weight. The filling depth value of the hole pixel is obtained by normalizing and weighting all effective depth values in the neighborhood according to their comprehensive weight.

[0008] In a preferred embodiment, the image acquisition and processing module further includes: The brightness histogram of the image is analyzed to calculate the brightness mean and standard deviation. An enhancement strategy is automatically selected based on the illumination characteristics. Multi-scale illumination estimation is used to enhance low-light images, and contrast-limited adaptive histogram equalization is used to process overexposed images. Connectivity analysis is performed on residual hole regions. Small holes with an area less than a first area threshold are filled using a distance-weighted interpolation method based on the surrounding depth value. Large holes with an area greater than the first area threshold are constrained by combining edge information of the color image. Temporal consistency filtering is performed on depth images of multiple consecutive frames. For each pixel position in the depth image, the depth value sequence of the corresponding position in the multi-frame image is extracted. Outliers with a difference between the depth value and the median of the sequence exceeding a preset depth difference threshold are removed. The remaining depth values are then weighted and averaged, with the weights decaying over time.
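The temporal consistency filtering described above can be sketched for a single pixel position as follows. This is a minimal illustration, not the patented implementation: the function name `temporal_filter`, the 30 mm deviation threshold, and the exponential decay factor of 0.8 are all assumptions chosen for the example, since the text only states that outliers relative to the median are removed and that weights decay over time.

```python
from statistics import median


def temporal_filter(depth_sequence, max_deviation_mm=30.0, decay=0.8):
    """Fuse one pixel's depth values from consecutive frames (oldest first).

    Zero values are treated as holes. Values that differ from the sequence
    median by more than max_deviation_mm are dropped, and the rest are
    averaged with weights that decay with frame age (assumed exponential).
    """
    valid = [(i, d) for i, d in enumerate(depth_sequence) if d > 0]
    if not valid:
        return 0.0  # no measurement at this pixel in any frame

    med = median(d for _, d in valid)
    newest = len(depth_sequence) - 1
    weighted_sum = 0.0
    weight_total = 0.0
    for index, depth in valid:
        if abs(depth - med) > max_deviation_mm:
            continue  # outlier relative to the median
        weight = decay ** (newest - index)  # newer frames weigh more
        weighted_sum += weight * depth
        weight_total += weight

    # If every sample was rejected, report the pixel as still unresolved.
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

In this sketch the most recent frame always has weight 1, so a single flickering measurement (such as the 1500 mm value in a sequence around 1000 mm) is discarded before averaging rather than pulled toward the mean.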

[0009] In a preferred embodiment, the material detection and positioning module includes: A depth-guided dual-stream object detection network is constructed, comprising two parallel feature extraction branches: RGB stream and depth stream. The RGB stream processes color images to extract appearance features, while the depth stream processes depth images to extract geometric features. The backbone networks of both the RGB and depth streams adopt residual network structures, including an input layer and several residual stages. Each residual stage contains several residual blocks, and each residual block contains convolutional layers and skip connections. A cross-modal feature fusion module is executed at each stage of the dual-stream network, first concatenating the two feature maps along the channel dimension to obtain a concatenated feature map. The concatenated features are processed by convolutional layers. The output of the convolutional layers is batch normalized and activated to obtain the fused feature map. The fused feature map is then added to the original feature map of the RGB stream through residual connections to obtain the cross-modal fused feature.

[0010] In a preferred embodiment, the material detection and positioning module further includes: Cross-modal fusion features are input into a feature pyramid network for multi-scale processing. The feature pyramid network uses a top-down path and lateral connections to achieve feature fusion. A detection head network is used for material detection at each feature pyramid layer, and the detection head adopts an anchor box mechanism. Depth information of the region corresponding to each detection box is extracted from the complete and stable depth image. Depth-guided non-maximum suppression is applied to the detection results. The non-maximum suppression algorithm is executed independently for each material category. First, all detection boxes of the category are extracted and sorted by category confidence from high to low. The detection box with the highest confidence is selected as the retained box. The intersection over union (IoU) of the retained box and each remaining detection box is calculated. At the same time, the depth values of the regions corresponding to the two detection boxes are extracted, the average depth value of each region is calculated, and the difference between the two averages is taken as the depth difference. If the IoU is greater than the preset overlap threshold and the depth difference is less than the preset depth difference threshold, the detection box with lower confidence is suppressed. If the IoU is greater than the preset overlap threshold but the depth difference is greater than the preset depth difference threshold, both detection boxes are retained.
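The depth-guided non-maximum suppression rule can be sketched as below for the detections of a single category. The dictionary keys (`box`, `score`, `depth`), the function names, and the threshold values (IoU 0.5, depth difference 150 mm) are assumptions made for illustration; the text does not specify them. Each `depth` is taken to be the average depth of the box region, already extracted from the depth image.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_guided_nms(detections, iou_threshold=0.5, depth_threshold_mm=150.0):
    """Suppress a box only if it overlaps a kept box AND lies at a similar depth.

    detections: list of dicts with 'box', 'score', and 'depth' (mean depth, mm)
    for ONE material category. Returns the kept detections, best score first.
    """
    remaining = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        survivors = []
        for det in remaining:
            overlaps = iou(best["box"], det["box"]) > iou_threshold
            same_depth = abs(best["depth"] - det["depth"]) < depth_threshold_mm
            if overlaps and same_depth:
                continue  # duplicate of the same physical item
            survivors.append(det)  # different item, e.g. one stacked behind
        remaining = survivors
    return kept
```

Compared with plain non-maximum suppression, the only change is the extra `same_depth` condition: two heavily overlapping boxes whose mean depths differ by more than the threshold are treated as a foreground item and a background item, so both survive.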

[0011] In a preferred embodiment, the nameplate recognition and information extraction module includes: The material image blocks are cropped according to the bounding box coordinates, and a nameplate localization network is constructed to perform semantic segmentation on the material image blocks. The nameplate localization network adopts an encoder-decoder architecture. The encoder contains several downsampling stages, and the decoder contains several upsampling stages. Each stage of the decoder and the corresponding stage of the encoder are fused through skip connections. Morphological processing is performed on the binary mask of the nameplate area. First, morphological closing operation is performed to fill the small holes inside the nameplate area, and then morphological opening operation is performed to remove isolated noise points. Connectivity analysis is performed on the processed binary mask to extract all connected candidate nameplate areas. The connected region with the largest area is selected as the final nameplate area. The minimum bounding rectangle of the nameplate area is calculated to determine the precise position of the nameplate.

[0012] In a preferred embodiment, the nameplate recognition and information extraction module further includes: The text detection network employs a segmentation-based detection method. The enhanced nameplate image is input into the network, and multi-scale features are extracted through the feature extraction backbone network. The feature maps are then input into the text region prediction branch and the text geometric attribute prediction branch. The text recognition network adopts a convolutional recurrent neural network architecture, which includes a convolutional feature extraction module, a recurrent sequence encoding module, and a connectionist temporal classification (CTC) decoding module. The recurrent sequence encoding module uses a bidirectional long short-term memory network structure to encode feature sequences simultaneously from left to right and from right to left. A keyword dictionary is constructed based on the standard format of power equipment nameplates. Keyword matching is performed on the nameplate text content to locate the position of each parameter. The extracted parameters are then organized into structured data. The overall credibility is obtained by weighted fusion of the detection confidence, localization confidence, average text detection confidence, average recognition confidence, and parsing confidence.
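As a small illustration of the final step, the overall credibility can be written as a normalized weighted sum of the five confidence values. The factor names and the default equal weights below are assumptions; the text names the five inputs but does not give their weights.

```python
# Hypothetical names for the five confidence factors listed in the text.
FACTORS = ("detection", "localization", "text_detection", "recognition", "parsing")


def overall_credibility(confidences, weights=None):
    """Weighted fusion of the five confidence values, each in [0, 1].

    weights defaults to equal weighting (an assumption); any non-negative
    weights work because the sum is normalized by the total weight.
    """
    if weights is None:
        weights = {name: 1.0 for name in FACTORS}
    total = sum(weights[name] for name in FACTORS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[name] * confidences[name] for name in FACTORS) / total
```

Normalizing by the total weight keeps the result in the same zero-to-one range as the inputs, which makes a single acceptance threshold easy to apply downstream.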

[0013] In a preferred embodiment, the three-dimensional positioning and authentication module includes: Based on the coordinates of the detection bounding box of the material, the corresponding rectangular area is located in the depth image. All depth values within the bounding box range are extracted from the depth image to form a depth value matrix. After removing invalid depth values, the median of the depth values is calculated as the representative depth value of the material surface. The three-dimensional coordinates of the materials in the camera coordinate system are calculated based on the back projection formula of the pinhole camera model. The camera coordinates of the materials are converted into the three-dimensional position in the warehouse world coordinate system through the extrinsic parameter matrix. The Kalman filter algorithm is used to perform temporal fusion of continuous multi-frame position data. The state vector of the Kalman filter contains the three-dimensional coordinates and three-dimensional velocity of the materials. The measurement noise covariance matrix of the observation model is set according to the uncertainty of depth measurement.
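The positioning chain in this paragraph (representative depth, pinhole back-projection, extrinsic transform) can be sketched as follows. The function names and the valid depth range of 300 to 5000 mm are illustrative assumptions. The extrinsic parameters are assumed here to be the camera's rotation `R` and position `t` in the world frame, so that a world point is `R * p_cam + t`; a calibration that stores the inverse transform would need the inverse applied instead. The Kalman filtering step is not covered.

```python
from statistics import median


def representative_depth(depth_roi, min_mm=300, max_mm=5000):
    """Median of the valid depth values inside a bounding box region."""
    valid = [d for row in depth_roi for d in row if min_mm <= d <= max_mm]
    return median(valid) if valid else None


def back_project(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def camera_to_world(p_cam, R, t):
    """Apply world = R * p_cam + t, with R a 3x3 nested list and t a 3-tuple."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )
```

For example, with a focal length of 1000 pixels and the principal point at (960, 540), a box center at pixel (1060, 540) with a representative depth of 2000 mm back-projects to 200 mm to the right of the optical axis.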

[0014] In a preferred embodiment, the three-dimensional positioning and authentication module further includes: The actual physical size of the material is estimated based on the bounding box size and the representative depth value. The estimated physical size is then compared and verified with the standard size of the corresponding material in the knowledge base. The knowledge base is constructed in a progressive manner. For each identified material, its key attributes are extracted for matching. All candidate materials with the same category as the corresponding material are searched in the knowledge base. The similarity between the identified material and each candidate material is calculated. The similarity calculation takes into account both text similarity and attribute similarity. The material with the highest similarity is selected as the matching result. The correlation verification includes matching relationship verification and parameter compatibility verification. Matching relationship verification checks whether the materials that need to be used together exist at the same time, and parameter compatibility verification checks whether the technical parameters of materials in the same area are compatible.
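A minimal sketch of the size check and the knowledge base matching follows. Everything specific in it is an assumption for illustration: the 15% size tolerance, the use of `difflib.SequenceMatcher` as the text similarity, the fraction of equal attributes as the attribute similarity, the equal weighting of the two, and the record layout with `category`, `model`, and `attrs` keys.

```python
from difflib import SequenceMatcher


def estimate_size_mm(box, depth_mm, fx, fy):
    """Physical width and height from a pixel box (x1, y1, x2, y2) and depth."""
    width_px = box[2] - box[0]
    height_px = box[3] - box[1]
    return width_px * depth_mm / fx, height_px * depth_mm / fy


def size_matches(estimated, standard, tolerance=0.15):
    """True if each estimated dimension is within tolerance of the standard."""
    return all(abs(e - s) <= tolerance * s for e, s in zip(estimated, standard))


def best_match(item, knowledge_base, text_weight=0.5):
    """Return the same-category candidate with the highest combined similarity."""

    def similarity(candidate):
        text_sim = SequenceMatcher(None, item["model"], candidate["model"]).ratio()
        shared = item["attrs"].keys() & candidate["attrs"].keys()
        if shared:
            equal = sum(item["attrs"][k] == candidate["attrs"][k] for k in shared)
            attr_sim = equal / len(shared)
        else:
            attr_sim = 0.0
        return text_weight * text_sim + (1.0 - text_weight) * attr_sim

    candidates = [c for c in knowledge_base if c["category"] == item["category"]]
    return max(candidates, key=similarity, default=None)
```

Restricting the search to candidates of the same category first, as the text describes, keeps a worn nameplate from matching a textually similar model of a different material type.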

[0015] In a preferred embodiment, the anomaly monitoring and early warning module includes: Extract the category, model, quantity, and three-dimensional location information of each material from the material identification results and compare it with the data of the previous moment in the historical time series database. For materials with changes in quantity, analyze whether there are corresponding authorized entry and exit records. For materials with unchanged quantity, compare their spatial location. If the Euclidean distance between the current location and the historical location exceeds the preset location change threshold, generate a location change anomaly record. Personnel detection networks and multi-target tracking algorithms are used to analyze personnel behavior, establish personnel movement trajectories, analyze the spatial relationship between personnel movement trajectories and material areas, and match personnel retrieval operations with authorization records. Three-dimensional spatial analysis methods are used to monitor the stacking status of materials, detecting the stacking height, tilt status, and whether the materials have exceeded the storage location boundaries. For various abnormal information, warning levels are determined according to their type and severity: Level 1 warning indicates a serious abnormality that requires immediate handling, Level 2 warning indicates an important abnormality that requires prompt handling, and Level 3 warning indicates a general abnormality that requires attention.
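The inventory comparison logic can be sketched as below. The record layout (`qty`, `pos`), the 200 mm position change threshold, and in particular the mapping from anomaly type to warning level are assumptions for the example; the text defines what the three levels mean but not which anomaly falls under which level.

```python
import math

# Hypothetical assignment of anomaly types to warning levels (1 = most severe).
WARNING_LEVEL = {"unauthorized_quantity_change": 1, "position_change": 3}


def detect_anomalies(current, previous, authorized_ids, move_threshold_mm=200.0):
    """Compare the current snapshot with the previous one.

    current / previous: dicts mapping item id -> {'qty': int, 'pos': (x, y, z)}.
    authorized_ids: ids with a matching authorized entry or exit record.
    Returns a list of (item_id, anomaly_type, warning_level) tuples.
    """
    anomalies = []
    for item_id, now in current.items():
        before = previous.get(item_id)
        if before is None:
            continue  # newly seen item; not handled in this sketch
        if now["qty"] != before["qty"]:
            # Quantity changed: acceptable only with an authorization record.
            if item_id not in authorized_ids:
                kind = "unauthorized_quantity_change"
                anomalies.append((item_id, kind, WARNING_LEVEL[kind]))
        elif math.dist(now["pos"], before["pos"]) > move_threshold_mm:
            # Quantity unchanged: check the Euclidean displacement.
            kind = "position_change"
            anomalies.append((item_id, kind, WARNING_LEVEL[kind]))
    return anomalies
```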

[0016] The beneficial effects of this invention are as follows: With polarization-guided depth repair, the specular reflection area on the metal surface is identified using degree-of-polarization information, and a weighted fusion method is used to repair depth holes. This solves the problem of traditional RGB-D cameras failing to measure depth on the surface of metal materials, improves the positioning accuracy and recognition accuracy of metal materials, reduces the dependence on special light sources and hardware configurations, and suits the actual application environment of power material warehouses. The depth-guided non-maximum suppression algorithm jointly considers the planar overlap and depth difference of the detection boxes to accurately distinguish between foreground and background materials in densely stacked scenes. This avoids the false suppression problem of traditional methods and improves the completeness and accuracy of material detection. Combined with progressive knowledge base matching verification and association verification mechanisms, the reliability of the recognition results and the system's environmental adaptability are improved.

Attached Figure Description

[0017] Figure 1 is a block diagram of an unattended power material management system based on machine vision according to the present invention; Figure 2 is a flowchart of an unattended power material management system based on machine vision according to the present invention.

Detailed Implementation

[0018] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, some features described in the examples may be combined in other examples.

[0019] At least one embodiment of the present invention discloses a machine vision-based unmanned power material management system which, as shown in Figures 1 and 2, includes the following: The image acquisition and processing module is used to acquire polarization images and depth data of warehouse materials. It uses polarization-guided depth restoration and illumination-adaptive enhancement to obtain a color image with balanced illumination and a complete and stable depth image. S1.1 deploys a camera network and performs multi-angle image acquisition, simultaneously acquiring color images, depth images, and polarization angle images. It calculates the degree of polarization at each pixel location and outputs the original color image, original depth image, and polarization degree map. Industrial-grade RGB-D cameras, preferably with polarization acquisition capability, are deployed at a preset distance directly in front of each row of shelves. During the system deployment phase, the cameras are calibrated to obtain their intrinsic and extrinsic parameter matrices. The intrinsic parameter matrix contains the camera's focal length and principal point coordinates, while the extrinsic parameter matrix contains the camera's position and attitude in the warehouse's world coordinate system. These calibration parameters are used for subsequent 3D coordinate transformation. Insulators, surge arrester housings, and hardware fittings among power materials are largely made of metal or ceramic with smooth surfaces and high reflectivity. Under warehouse lighting conditions, these materials are prone to specular reflection, which oversaturates the depth measurement module of a traditional RGB-D camera, so that no valid depth values are obtained and large depth holes appear in the depth image.

[0020] For cameras with polarization capabilities, the polarization properties of light are used to solve the reflection problem. When light is reflected from an object's surface, the polarization state of the reflected light changes. Specular reflection is strongly polarized, while diffuse reflection is weakly polarized. By analyzing the light intensity distribution at different polarization angles, the specular and diffuse reflection components can be separated, allowing surface information of the object to be extracted even under high reflectivity conditions. The camera operates at a preset acquisition frequency. A polarization camera outputs color images at four polarization angles and a depth image simultaneously in each frame. An ordinary camera outputs a color image and a depth image. Depth values are obtained using structured light technology and stored in millimeters. The acquired image data is transmitted to a local server via a network, and includes timestamps for subsequent time-series analysis.

[0021] Simultaneously, polarization feature parameters are extracted from images at four polarization angles to analyze the reflection characteristics of the object's surface. For each pixel location in the image, the light intensity values at four polarization angles (0°, 45°, 90°, and 135°) are extracted and denoted as the first, second, third, and fourth light intensity values. Based on polarization optics theory, the degree of polarization at this pixel location is calculated. Specifically, the intensity difference between the 0° and 90° directions is calculated first, then the intensity difference between the 45° and 135° directions. The square root of the sum of the squares of these two differences is then calculated, and finally, the result is divided by the sum of the light intensities at the four polarization angles to obtain the degree of polarization. The degree of polarization represents the degree of polarization of light, ranging from zero to one. A polarization degree close to one indicates highly polarized light, typically corresponding to specular reflection, while a polarization degree close to zero indicates natural or diffuse light. The polarization angle is calculated by first calculating the ratio of the intensity difference between 0° and 90° to the intensity difference between 45° and 135°, then performing an arctangent operation on this ratio, and finally multiplying by 0.5 to obtain the polarization angle. The above-described calculations of polarization degree and polarization angle are performed on all pixels in the image, generating a polarization degree map and a polarization angle map. These two images reflect the distribution of reflectivity characteristics of object surfaces in the scene. Metallic surfaces and highlight areas show high values in the polarization degree map, while diffuse reflective surfaces show low values. 
For ordinary cameras without polarization capabilities, reflection problems can be mitigated by adjusting exposure parameters or using high dynamic range imaging techniques.
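The per-pixel polarization computation can be sketched as follows, under two stated assumptions. First, the degree is normalized by S0, taken as half the sum of the four intensities (the usual Stokes-parameter convention), so that fully polarized light maps to one, in line with the zero-to-one range given above; dividing by the full sum of the four intensities would cap the value at 0.5. Second, the angle uses the common form 0.5 · atan2(S2, S1) with S1 = I0 − I90 and S2 = I45 − I135, which may order the ratio differently from the wording above. Function names are illustrative.

```python
import math


def polarization_features(i0, i45, i90, i135):
    """Degree and angle of linear polarization for one pixel.

    i0, i45, i90, i135: light intensities behind polarizers at those angles.
    Returns (degree in [0, 1], angle in radians).
    """
    s1 = i0 - i90      # difference between the 0 and 90 degree directions
    s2 = i45 - i135    # difference between the 45 and 135 degree directions
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # assumed normalization (see lead-in)
    if s0 <= 0:
        return 0.0, 0.0  # dark pixel: no polarization information
    degree = min(1.0, math.hypot(s1, s2) / s0)  # clamp against sensor noise
    angle = 0.5 * math.atan2(s2, s1)
    return degree, angle


def polarization_degree_map(img0, img45, img90, img135):
    """Apply the per-pixel computation to four equally sized 2D intensity lists."""
    return [
        [polarization_features(a, b, c, d)[0] for a, b, c, d in zip(r0, r45, r90, r135)]
        for r0, r45, r90, r135 in zip(img0, img45, img90, img135)
    ]
```

As a check on the convention, ideal light fully polarized along 0° gives intensities (1, 0.5, 0, 0.5) and a degree of 1, while unpolarized light gives four equal intensities and a degree of 0.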

[0022] S1.2 receives the raw depth image acquired in S1.1 and the calculated polarization map. It uses the polarization information to perform weighted fusion repair on the hole regions in the depth image, outputting a preliminarily repaired depth image. For cameras with polarization capabilities, the hole regions in the depth image are mainly caused by specular reflection from metal surfaces; these regions exhibit high polarization characteristics in the polarization map. The depth image is traversed to detect pixels with a depth value of zero or exceeding the camera's effective measurement range, and these pixels are marked as hole pixels. For each hole pixel, its corresponding polarization value in the polarization map is extracted. Simultaneously, a search neighborhood is set around the pixel. The neighborhood radius is adaptively determined based on the hole size; a smaller neighborhood radius is used for isolated small holes, and a larger neighborhood radius is used for large, contiguous holes.

[0023] Within the search neighborhood, all valid depth values and their corresponding polarization values are extracted. A valid depth value is defined as a depth value within the effective measurement range whose corresponding polarization degree is less than the high reflectivity threshold. The high reflectivity threshold is used to distinguish between specular reflection and diffuse reflection regions. Based on polarization optics characteristics, the polarization degree of specular reflection light is typically between 0.7 and 1.0, while the polarization degree of diffuse reflection light is typically between 0 and 0.3. Therefore, the high reflectivity threshold is usually set to 0.5 to 0.7, which can be adjusted according to the material properties of the surface and the camera calibration results. For each valid depth value within the neighborhood, its contribution weight to the hole pixel is calculated. The weight consists of two parts: spatial distance weight and polarization degree similarity weight. The spatial distance weight is calculated based on the Euclidean distance from the valid depth pixel to the hole pixel; the closer the distance, the greater the weight. The polarization degree similarity weight is calculated based on the absolute value of the difference between the polarization degree of the valid depth pixel and the polarization degree of the hole pixel; the closer the polarization degrees, the greater the weight.

[0024] The spatial distance weight and polarization similarity weight are multiplied to obtain a comprehensive weight. All effective depth values within the neighborhood are then normalized and weighted according to this comprehensive weight to obtain the filling depth value of the hole pixel. This polarization-guided weighted fusion method can reasonably infer the depth value of the hole region by utilizing the depth information of surrounding areas with similar surface reflectivity. Compared to simple spatial interpolation methods, this method considers the consistency of the object's surface material, resulting in more accurate repair. The above repair process is performed on all hole pixels in the depth image to obtain a preliminarily repaired depth image. For ordinary cameras without polarization capabilities, a weighted interpolation method based on neighborhood depth values is used for depth hole repair. Weighted average filling is performed based on the spatial distance of the effective depth values around the hole pixel. Although the effect is slightly inferior to the polarization-guided method, it still meets the identification needs of most materials.
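The polarization-guided weighted fusion for one hole pixel can be sketched as below. The text only requires that each weight grows as the spatial distance and the polarization difference shrink; the Gaussian kernels, their widths (`sigma_s`, `sigma_p`), the fixed radius, the depth range, and the 0.6 high reflectivity threshold are assumptions chosen for the example.

```python
import math


def fill_hole_pixel(depth, dolp, y, x, radius=3, depth_range=(300, 5000),
                    high_reflect_thr=0.6, sigma_s=2.0, sigma_p=0.2):
    """Estimate the depth of hole pixel (y, x) from valid neighbors.

    depth, dolp: 2D lists (depth in mm, degree of polarization in [0, 1]).
    A neighbor is valid if its depth is in range and its polarization degree
    is below the high reflectivity threshold. Returns 0.0 if none is valid.
    """
    rows, cols = len(depth), len(depth[0])
    weighted_sum = 0.0
    weight_total = 0.0
    for ny in range(max(0, y - radius), min(rows, y + radius + 1)):
        for nx in range(max(0, x - radius), min(cols, x + radius + 1)):
            d = depth[ny][nx]
            if not (depth_range[0] <= d <= depth_range[1]):
                continue  # hole or out-of-range measurement
            if dolp[ny][nx] >= high_reflect_thr:
                continue  # likely specular, so the depth is not trusted
            dist_sq = (ny - y) ** 2 + (nx - x) ** 2
            w_spatial = math.exp(-dist_sq / (2 * sigma_s ** 2))
            diff = dolp[ny][nx] - dolp[y][x]
            w_polar = math.exp(-diff ** 2 / (2 * sigma_p ** 2))
            weight = w_spatial * w_polar  # comprehensive weight
            weighted_sum += weight * d
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

Dividing by `weight_total` is the normalization step: the result is always a convex combination of the valid neighbor depths, so it cannot fall outside their range.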

[0025] S1.3 receives the raw color image acquired by S1.1, processes it using an adaptive illumination enhancement method, and outputs a color image with balanced illumination. Emergency material warehouses in district and county power supply stations typically use a combination of natural lighting and artificial illumination. Illumination conditions vary with time and weather, directly affecting the accuracy of material identification. The system analyzes the image's brightness histogram to calculate the mean and standard deviation of brightness, and automatically selects an enhancement strategy based on illumination characteristics. For low-light images, a multi-scale illumination estimation method is used for enhancement, restoring the true appearance of objects by estimating and removing illumination components. For overexposed images, a contrast-limited adaptive histogram equalization method is used to enhance details in dark areas while avoiding excessive enhancement in bright areas. For images under normal illumination, a slight contrast stretch is applied, linearly mapping the image's grayscale values to the full range to enhance visual effects. The enhanced color image has uniform illumination and moderate contrast, providing high-quality input for subsequent material inspection and nameplate recognition.
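The strategy selection and the contrast stretch for normally lit images can be sketched as follows. The brightness thresholds (70 and 180 on a 0 to 255 scale) and the strategy labels are assumptions; the text does not say how the standard deviation enters the decision, so this sketch computes it but decides on the mean alone. The multi-scale illumination estimation and contrast-limited adaptive histogram equalization steps themselves are not implemented here.

```python
from statistics import mean, pstdev


def brightness_stats(luma):
    """Mean and standard deviation of a 2D list of brightness values (0-255)."""
    values = [v for row in luma for v in row]
    return mean(values), pstdev(values)


def select_strategy(luma, low_mean=70.0, high_mean=180.0):
    """Pick an enhancement strategy label from the mean brightness."""
    avg, _spread = brightness_stats(luma)
    if avg < low_mean:
        return "multi_scale_illumination"  # low-light image
    if avg > high_mean:
        return "clahe"                     # overexposed image
    return "contrast_stretch"              # normal illumination


def contrast_stretch(luma):
    """Linearly map the observed brightness range onto the full 0-255 range."""
    lo = min(min(row) for row in luma)
    hi = max(max(row) for row in luma)
    if hi == lo:
        return [row[:] for row in luma]  # flat image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in luma]
```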

[0026] S1.4 receives the preliminary repaired depth image output from S1.2 and the illumination-equalized color image output from S1.3. It employs neighborhood interpolation and multi-frame temporal filtering to output a complete and stable depth image and an illumination-equalized color image. After polarization-guided depth repair, most holes caused by metallic reflections in the depth image have been filled, but a small number of small holes due to extreme reflections or object edges may still exist. These residual holes need further completion to ensure the accuracy of subsequent 3D positioning. First, residual hole regions in the depth image are detected. Each pixel in the depth image is traversed; if a pixel's depth value is zero or exceeds the camera's effective measurement range, the pixel is marked as a hole pixel. Connectivity analysis is performed on the hole pixels, clustering adjacent hole pixels into hole regions, and the area of each hole region is calculated.
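
The residual hole detection and connectivity analysis can be illustrated with a breadth-first flood fill. This is a sketch under assumptions: 4-connectivity and the parameter name `max_range` are choices made here, since the source specifies neither.

```python
from collections import deque

def find_hole_regions(depth, max_range):
    """Group hole pixels (zero or beyond max_range) into 4-connected regions."""
    rows, cols = len(depth), len(depth[0])
    is_hole = [[d == 0 or d > max_range for d in row] for row in depth]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not is_hole[r][c] or seen[r][c]:
                continue
            # Flood fill from this unvisited hole pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and is_hole[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

The area of each hole region is then simply `len(region)`, which can be compared against the first area threshold.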

[0027] For small holes with an area smaller than the first area threshold, a distance-weighted interpolation method based on surrounding depth values is used for filling. For each pixel in the hole region, a search window is set around it, and all valid depth values within the window are extracted. The weighted average of these depth values is calculated as the filling depth value of that pixel. The weight is inversely proportional to the distance from the pixel to the center of the hole; the closer the distance, the greater the weight. The first area threshold is used to distinguish between small and large holes. For an image with a resolution of 1920 x 1080 pixels, the first area threshold is usually set to one hundred to two hundred pixels. It can be adjusted according to the minimum size of the material and the image resolution. Specifically, the adjustment method is to use 10% to 20% of the pixel area corresponding to the minimum size of the material as a reference value. This method assumes that the surface of the object is smooth and continuous in local areas, and the depth value of the hole region can be obtained by smoothly transitioning from the surrounding depth values.

[0028] For large holes with an area greater than the first area threshold, the interpolation of the surrounding depth values may not be accurate enough, requiring constraints based on edge information from the color image. Edge features are extracted from the enhanced color image obtained in step S1.3, and an edge detection algorithm is used to detect strong edges in the image. Strong edges typically correspond to object boundaries, and the depth values on both sides of the boundary may exhibit abrupt changes. The advantage of using the enhanced color image for edge detection is that the enhanced image has better contrast and clarity, enabling more accurate detection of object boundaries. During depth completion, if the hole region crosses the edge of the color image, simple smoothing interpolation is not performed; instead, interpolation is performed separately based on the depth values on both sides of the edge, maintaining the discontinuity of depth.

[0029] This completion method, based on neighborhood interpolation and edge constraints, effectively fills residual holes in the depth image, ensuring the integrity of the depth data. Temporal consistency filtering is applied to multiple consecutive depth images to reduce random noise. Depth camera measurements are affected by factors such as ambient lighting and the reflectivity of object surfaces, resulting in some random noise and fluctuations in single-frame depth images. Temporal fusion of multiple consecutive depth images, utilizing historical depth information to smooth the current depth measurement, further improves the stability of the depth data. The temporal filtering window includes several depth images, including the current frame and historical frames, for filtering.

[0030] For each pixel location in the depth image, the depth value sequence at that location across multiple frames is extracted. Outliers are first removed; an outlier is defined as a depth value whose difference from the median of the sequence exceeds a depth difference threshold. After outlier removal, the remaining depth values are weighted and averaged, with the weights decreasing over time. Newer frames are assigned higher weights to ensure rapid response to material movement. The weighted average yields a filtered depth value that incorporates both current observations and historical information, effectively reducing random noise. Temporal filtering is then applied to all pixels in the entire depth image to obtain a stable, filtered depth image. The output consists of an enhanced color image and a fully inpainted, filtered depth image, both pixel-level registered, serving as input data for subsequent material detection and identification.
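
A per-pixel sketch of this temporal filter follows. The exponential form of the time-decaying weights and the `decay` value are assumptions; the source only states that newer frames receive higher weights.

```python
from statistics import median

def temporal_filter(samples, diff_threshold, decay=0.7):
    """Filter one pixel's depth sequence, ordered oldest to newest (0 = invalid)."""
    valid = [(i, d) for i, d in enumerate(samples) if d > 0]
    if not valid:
        return 0.0
    med = median(d for _, d in valid)
    # Outliers deviate from the sequence median by more than the threshold.
    kept = [(i, d) for i, d in valid if abs(d - med) <= diff_threshold]
    if not kept:  # every sample was rejected: fall back to the median
        return med
    newest = len(samples) - 1
    # The newest frame gets weight 1; older frames decay geometrically (assumed).
    num = sum(decay ** (newest - i) * d for i, d in kept)
    den = sum(decay ** (newest - i) for i, _ in kept)
    return num / den
```

Applying this function to every pixel location of the frame window yields the filtered depth image.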

[0031] The material detection and positioning module is used to acquire uniformly lit color images and complete and stable depth images. It identifies densely stacked materials through a depth-guided dual-stream detection network and generates a list of candidate material areas. S2.1 receives a uniformly illuminated color image and a complete and stable depth image, constructs a depth-guided dual-stream target detection network, and outputs RGB features and depth features. In power supply warehouses, small items such as hardware accessories and cable connectors are often densely stacked in the same storage location, with close contact and even partial overlap. Relying solely on color images is insufficient to accurately segment the boundaries of adjacent items, as they may have similar color and texture features. Depth information provides the positional relationship of items in three-dimensional space. Even if two items are closely adjacent in the color image, they may exhibit different depth values in the depth image due to their different positions. Utilizing depth information can effectively distinguish between foreground and background items.

[0032] This step designs a dual-stream network architecture comprising two parallel feature extraction branches: an RGB stream and a depth stream. The RGB stream processes color images to extract appearance features, while the depth stream processes depth images to extract geometric features. The features from the two streams are fused at multiple layers of the network to achieve complementary enhancement. The backbone network of the RGB stream adopts a residual network structure, including an input layer and several residual stages. The input layer receives a resized color image, which is then normalized to scale pixel values to a preset range. Each residual stage contains several residual blocks, each containing a convolutional layer and skip connections. Skip connections directly add the input to the output, allowing gradients to propagate directly backward, mitigating the gradient vanishing problem in deep networks and improving the network's training efficiency.

[0033] The backbone of the depth stream has the same structure as that of the RGB stream, but its input is a single-channel depth image, which is resized and normalized before being fed into the network. The convolutional layers of the depth stream learn to extract geometric features from the depth image, including object surface normals, depth gradients, and depth discontinuities. These geometric features are crucial for distinguishing foreground from background materials and for recognizing the 3D shape of materials. The network employs a transfer learning strategy: it is first pre-trained on a publicly available object detection dataset to obtain general object detection capabilities, and then fine-tuned using power material samples. For each common material category, fifty to one hundred images are collected and labeled under different angles and lighting conditions. For rare material categories, data augmentation techniques are used to expand the sample size. The system supports incremental learning; when a new material type is added, it can quickly adapt to the new category through online learning with a small number of samples without retraining the entire network.

[0034] Step S2.2 receives the RGB and depth features output from S2.1 and fuses them using a cross-modal feature fusion module, outputting a multi-scale fused feature map. The cross-modal feature fusion module executes at each stage of the dual-stream network, fusing the feature maps from the RGB and depth streams, enabling the network to comprehensively utilize both appearance and geometric information. The fusion module first concatenates the two feature maps along their channel dimensions to obtain a concatenated feature map. Then, a convolutional layer processes the concatenated feature map, learning how to adaptively fuse RGB and depth features. For textured material regions, the network learns to retain more RGB features; for material regions with similar textures but different depths, the network learns to retain more depth features.

[0035] The output of the convolutional layer undergoes batch normalization and activation functions to obtain a fused feature map. To preserve the original feature information, the fused feature map is added to the original feature map of the RGB stream via residual connections to obtain the final cross-modal fused feature. Cross-modal feature fusion is performed at multiple stages of the network, resulting in multiple fused feature maps at different scales. These multi-scale fused features contain rich appearance and geometric information, providing powerful feature representations for subsequent material detection.

[0036] Step S2.3 receives the multi-scale fused feature map output from S2.2, employs a feature pyramid network and a multi-scale detection strategy, and outputs preliminary detection results. The cross-modal fused features are input into the feature pyramid network for multi-scale processing. The feature pyramid network uses a top-down path and lateral connections to achieve feature fusion. The deepest fused feature map is reduced in channel number through convolution, and then upsampled using bilinear interpolation to obtain a higher resolution feature map. The upsampled feature map is then added element-wise with the corresponding stage's fused feature map to achieve lateral connection fusion. The fused feature map contains deep semantic information and mid-level shape information.

[0037] Further upsampling and lateral connections are performed to obtain several feature pyramid layers with different resolutions. A detection head network is used for material detection in each feature pyramid layer. The detection head employs an anchor box mechanism, pre-setting multiple anchor boxes with different aspect ratios and scales at each location in the feature map. These anchor boxes serve as the initial positions for candidate detection boxes. Different sizes of anchor boxes are pre-set for feature maps of different resolutions; larger anchor boxes are used for lower-resolution feature maps to detect large materials, while smaller anchor boxes are used for higher-resolution feature maps to detect small materials.
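
Anchor box generation for one pyramid layer can be sketched as below. The stride, scales, and aspect ratios are hypothetical values, and boxes use the (top-left x, top-left y, width, height) convention that the document uses for bounding boxes.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Generate anchors (x, y, w, h) centered on every cell of one feature map.

    stride: input pixels per feature-map cell (assumed parameter).
    ratios: width / height aspect ratios.
    """
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx = (gx + 0.5) * stride
            cy = (gy + 0.5) * stride
            for s in scales:
                for ratio in ratios:
                    # Keep the area near s*s while varying the aspect ratio.
                    w = s * math.sqrt(ratio)
                    h = s / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, w, h))
    return anchors
```

A lower-resolution layer would be called with a larger `stride` and larger `scales`, which matches the rule that coarse feature maps detect large materials and fine feature maps detect small ones.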

[0038] Step S2.4 receives the preliminary detection results from S2.3 and the complete, stable depth image from S1. It then uses a depth-guided non-maximum suppression (NMS) algorithm to filter the detection results and outputs a list of candidate material regions. Traditional NMS algorithms determine overlap solely from the intersection over union (IoU) of the detection boxes on the image plane. However, in densely stacked scenes, foreground and background materials may overlap heavily on the image plane, so traditional methods can easily suppress the detection boxes of foreground materials by mistake. Each detection result passed to this step includes bounding box coordinates, a class probability distribution, and the maximum class confidence.

[0039] In S2.4, the depth information of the region corresponding to each detection box is extracted from the complete, stable depth image output from S1, and depth-guided non-maximum suppression is performed on the detection results to remove overlapping redundant detection boxes. Traditional non-maximum suppression algorithms decide whether detection boxes overlap using only the intersection over union (IoU) on the image plane. However, in densely stacked scenes, foreground and background materials may overlap heavily on the image plane, and traditional methods easily suppress the detection boxes of foreground materials by mistake. This step designs a depth-guided non-maximum suppression algorithm that jointly considers the planar overlap and the depth difference of the detection boxes, which is one of the core innovations of this invention. The non-maximum suppression algorithm is executed independently for each material category. First, all detection boxes of that category are extracted and sorted from high to low according to category confidence.

[0040] The detection box with the highest confidence is selected as the retained box. The intersection over union (IoU) of this retained box with each of the remaining detection boxes is calculated. The IoU is defined as the area of the intersection of the two detection boxes divided by the area of their union. At the same time, the depth values of the regions corresponding to the two detection boxes are extracted, and the average depth value of each region is calculated. If the difference between the two average depth values exceeds a depth difference threshold, the materials corresponding to the two detection boxes are separated in 3D space, and neither box should be suppressed even if they overlap on the image plane. Specifically, if the IoU is greater than the overlap threshold and the depth difference is less than the depth difference threshold, the two detection boxes are considered to detect the same material, and the detection box with the lower confidence is suppressed. If the IoU is greater than the overlap threshold but the depth difference is greater than the depth difference threshold, the two detection boxes are considered to detect different materials, and both are retained.
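
The suppression rule above can be sketched for a single material category as follows. The dictionary keys `box`, `score`, and `depth` and the default thresholds are illustrative assumptions; `depth` is taken to be the precomputed average depth of the box region, and boxes are (top-left x, top-left y, width, height).

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def depth_guided_nms(dets, iou_thr=0.5, depth_thr=15.0):
    """Suppress a box only if it overlaps a kept box AND lies at a similar depth."""
    remaining = sorted(dets, key=lambda d: d["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-confidence box still in play
        kept.append(best)
        remaining = [
            d for d in remaining
            if not (iou(best["box"], d["box"]) > iou_thr
                    and abs(best["depth"] - d["depth"]) < depth_thr)
        ]
    return kept
```

Compared with plain NMS, the only change is the extra depth condition in the filter, so a heavily overlapping box at a clearly different depth survives.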

[0041] The depth difference threshold is typically set to ten to twenty times the camera's depth measurement accuracy, depending on the camera's accuracy. For cameras with an accuracy of one millimeter, the threshold can be set to ten to twenty millimeters. The overlap threshold is typically set to 0.5 to 0.7, which can be adjusted according to the density of the materials. A depth-guided non-maximum suppression system can accurately preserve the detection boxes of foreground and background materials in densely stacked scenes, avoiding false suppression and improving detection integrity. A comprehensive confidence score is calculated for the preserved detection results, and threshold filtering is performed. The comprehensive confidence score considers both category confidence and depth consistency. First, the depth values of the corresponding regions of the detection boxes are extracted, and the standard deviation of the depth values is calculated. The standard deviation reflects the uniformity of depth within the region. For a single material, its surface depth should be relatively uniform with a small standard deviation. If the standard deviation is too large, it may indicate that the detection box contains multiple overlapping materials or that the detection box boundaries are inaccurate.

[0042] A depth consistency factor is defined and calculated from the depth standard deviation. The smaller the depth standard deviation, the closer the depth consistency factor is to one. The overall confidence score is calculated by combining the class confidence score with the depth consistency factor. Introducing the depth consistency factor allows detection results with a reasonable depth distribution to achieve higher overall confidence. An overall confidence score is calculated for every detection result, and results whose overall confidence score falls below a detection threshold are filtered out. This detection threshold is tuned on the validation dataset to balance precision and recall, and is typically set to 0.7 to 0.9.
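
The source does not give the formula for the depth consistency factor or for the combination step. One possible form, shown purely as an assumption, maps the standard deviation through `1 / (1 + std / scale)` and multiplies the result by the class confidence; the `scale` parameter is hypothetical.

```python
from statistics import pstdev

def depth_consistency(depths, scale=20.0):
    """Factor in (0, 1]; equals 1 when depth is perfectly uniform (assumed form)."""
    return 1.0 / (1.0 + pstdev(depths) / scale)

def overall_confidence(class_conf, depths, scale=20.0):
    """Combine class confidence with depth consistency (product is an assumption)."""
    return class_conf * depth_consistency(depths, scale)
```

Any monotonically decreasing mapping of the standard deviation that reaches one at zero would satisfy the stated property equally well.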

[0043] The filtered detection results are sorted from high to low based on overall confidence level, and a candidate material region list is output. Each candidate region includes bounding box coordinates, category label, overall confidence score, and average depth value. The bounding box coordinates include the coordinates of the top-left corner, width, and height. The category label is the category name of the material. The overall confidence score reflects the reliability of the detection result, and the average depth value is the average depth value of all pixels within the detection box area. The candidate material region list serves as input for subsequent nameplate information extraction and material identification.

[0044] The nameplate recognition and information extraction module extracts material nameplate information from the candidate material region list using nameplate localization and text recognition technology, thereby obtaining structured material information and its overall credibility. S3.1 receives the list of candidate material regions and the color image with balanced illumination, crops material image blocks based on the bounding box coordinates, performs semantic segmentation using a nameplate localization network, and outputs a nameplate region mask and a nameplate image. The bounding box coordinates include the top-left corner coordinates, width, and height. During cropping, the bounding box is expanded appropriately to include the complete appearance of the material and the surrounding context, avoiding overly tight cropping that would truncate nameplate information at the boundary. The size of the cropped material image block varies with the detection box size. To ensure consistency in subsequent processing, the material image block is adjusted to a fixed size using aspect-ratio-preserving scaling: the long side of the image is scaled to a preset size, the short side is scaled proportionally, and pixels are then padded along the short side to reach the preset image size.
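
The aspect-ratio-preserving resize can be planned with a few lines of arithmetic. In this sketch the function name and the choice to split the padding evenly between the two sides are assumptions.

```python
def letterbox_plan(width, height, target):
    """Return (new_w, new_h, (pad_left, pad_right), (pad_top, pad_bottom))."""
    if width >= height:
        # Width is the long side: it maps exactly onto the target size.
        new_w = target
        new_h = max(1, round(height * target / width))
    else:
        new_h = target
        new_w = max(1, round(width * target / height))
    pad_w = target - new_w
    pad_h = target - new_h
    # Padding is split evenly, with any odd pixel going to the right/bottom.
    return new_w, new_h, (pad_w // 2, pad_w - pad_w // 2), (pad_h // 2, pad_h - pad_h // 2)
```

For a 400 x 200 crop and a preset size of 256, this yields a 256 x 128 scaled image with 64 rows of padding above and below.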

[0045] A nameplate localization network is constructed to perform semantic segmentation on material image blocks to accurately locate the nameplate region. The nameplate localization network adopts an encoder-decoder architecture. The encoder is responsible for extracting image features, and the decoder is responsible for restoring spatial resolution and generating pixel-level classification results. The encoder contains several downsampling stages, each containing convolutional layers and pooling layers. The convolutional layers extract image features, and the pooling layers reduce the resolution of the feature map. The decoder contains several upsampling stages, each containing an upsampling layer and a convolutional layer. The upsampling layers enlarge the feature map size, and the convolutional layers refine the feature representation.

[0046] Each stage of the decoder fuses features with its corresponding stage of the encoder via skip connections. These skip connections concatenate the encoder's feature map with the decoder's upsampled feature map along the channel dimension. The fused feature map contains both low-level detail information and high-level semantic information. The decoder ultimately outputs a feature map, where each channel represents the normalized probability of each pixel belonging to the background or the nameplate. The probability map of the nameplate channel is extracted and thresholded. Pixels with probabilities greater than a preset threshold are labeled as nameplate pixels, and pixels with probabilities less than the threshold are labeled as background pixels, resulting in a binary mask for the nameplate region.

[0047] S3.2 receives the nameplate region mask output from S3.1, extracts the nameplate region using morphological processing and connected component analysis, preprocesses and enhances the nameplate image, and outputs the enhanced nameplate image. First, morphological processing is applied to the binary mask to remove noise and fill holes: a morphological closing operation fills small holes and broken edges within the nameplate region, and a morphological opening operation then removes isolated noise points. Next, connected component analysis is performed on the processed binary mask to extract all connected candidate nameplate regions. The area of each connected component is calculated, and the component with the largest area is selected as the final nameplate region, since nameplates are typically the largest text markings on materials.

[0048] The minimum bounding rectangle of the nameplate area is calculated, and the coordinates of the four vertices of the bounding rectangle determine the precise location of the nameplate. The average probability value of the pixels within the nameplate area is calculated as the location confidence score, which reflects the network's degree of confidence in the nameplate location result. The location confidence score is combined with the detection confidence score of the material to obtain the cumulative confidence score, which reflects the reliability of the entire process from detection to location. If the location confidence score is lower than a preset location threshold, it indicates that the nameplate location is unreliable, possibly because the material has no obvious nameplate or the nameplate is obscured. The material is then marked as a material to be confirmed and recorded in the pending review list, skipping further processing.

[0049] For materials with a location confidence level higher than a threshold, the nameplate area image is cropped from the material image block based on the coordinates of the minimum bounding rectangle. During cropping, a slight expansion of the bounding rectangle is performed to ensure that the text at the edge of the nameplate is not truncated. Preprocessing of the nameplate image includes contrast enhancement, noise reduction, and sharpening to improve text clarity, providing high-quality input for subsequent text detection and recognition. For materials without nameplates or with unidentifiable nameplates, the system employs a supplementary appearance feature-based recognition method. This method extracts the material's shape, color, and texture features and matches them with standard material appearance templates in the knowledge base, selecting the material category with the highest similarity. Although the accuracy of appearance recognition is lower than that of nameplate recognition, it serves as a backup solution when nameplate recognition fails.

[0050] S3.3 receives the enhanced nameplate image output from S3.2, employs a text detection and recognition network to extract text information, and outputs the character sequence of each text line together with its recognition confidence score. The text detection network uses a segmentation-based detection method that turns the text detection task into a pixel-level classification problem. The enhanced nameplate image is fed into the network, and a feature extraction backbone extracts multi-scale features from it. The resulting feature maps are input to a text region prediction branch and a text geometric attribute prediction branch. The text region prediction branch outputs the probability that each pixel belongs to text, forming a text probability map, which is then thresholded to obtain a binary mask of the text region. The text geometric attribute prediction branch outputs the distance from each text pixel to the boundary of its corresponding text line; this distance information is used to cluster pixels belonging to the same text line together.

[0051] Connectivity analysis is performed on the binary mask of the text region to extract all connected text regions. For each text region, its bounding box is calculated using predicted geometric properties. The bounding box is represented by a rotated rectangle to handle tilted text lines. The average text probability of pixels within the bounding box of each text line is calculated as the detection confidence of that text line. Text lines with detection confidence below the text detection threshold are filtered out, retaining only those with high confidence. The retained text lines are sorted from top to bottom and left to right according to their spatial position in the nameplate to conform to the typical reading order of the nameplate. Character-level text recognition is performed on each detected text line image. The text line image is cropped from the nameplate image based on the rotated rectangle bounding box of the text line. If the text line is tilted, it is corrected to a horizontal orientation using an affine transformation. The height of the corrected text line image is normalized, and the width is scaled proportionally.

[0052] Normalized text line images are input into a text recognition network, which employs a convolutional recurrent neural network architecture comprising a convolutional feature extraction module, a recurrent sequence encoding module, and a connectionist temporal classification (CTC) decoding module. The convolutional feature extraction module converts the input text line image into a feature sequence, where each element corresponds to a visual feature of a local region within the text line. This feature sequence is then input to the recurrent sequence encoding module, which uses a bidirectional long short-term memory network to encode the feature sequence from left to right and from right to left simultaneously. The forward network captures the left-side contextual information of the characters, while the backward network captures the right-side contextual information. The hidden states from both directions are concatenated to obtain a character representation that incorporates context from both directions.

[0053] The encoded sequence is input to the connectionist temporal classification (CTC) decoding module, which predicts the character sequence directly from sequence features without requiring character-level positional annotations. At each time step, the decoder outputs a probability distribution over character categories, including digits, uppercase letters, lowercase letters, common symbols, and the blank symbol. A greedy decoding strategy selects the category with the highest probability at each time step as the predicted character. Post-processing first merges consecutively repeated characters and then removes blank symbols, yielding the final character sequence. The average predicted probability of the characters in the sequence is calculated as the recognition confidence score, reflecting the network's degree of certainty about the recognition result. The output is the character sequence of the text line and the recognition confidence score.
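
Greedy decoding with the merge-then-remove post-processing can be sketched as below. The alphabet layout (index 0 reserved for the CTC blank placeholder class) and the choice to take the probability of the first frame of each repeated run as that character's confidence are assumptions made for illustration.

```python
def ctc_greedy_decode(probs, alphabet, blank=0):
    """Decode per-time-step class probabilities into (text, mean confidence).

    probs: list of per-time-step probability lists, one entry per class.
    alphabet: string where alphabet[i] is the character of class i;
              the character at index `blank` is a placeholder and never emitted.
    """
    chars = []
    confs = []
    prev = None
    for step in probs:
        # Greedy choice: the most probable class at this time step.
        idx = max(range(len(step)), key=step.__getitem__)
        # Merge consecutive repeats, then drop the blank class.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
            confs.append(step[idx])
        prev = idx
    text = "".join(chars)
    conf = sum(confs) / len(confs) if confs else 0.0
    return text, conf
```

Because repeats are merged before blanks are removed, a doubled character such as "AA" survives only when a blank frame separates the two occurrences.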

[0054] S3.4 receives the character sequence of the text lines output from S3.3 and the recognition confidence level, and uses semantic parsing to extract structured parameters, outputting structured material information and its overall credibility. The character sequences of all text lines are combined according to their order of appearance on the nameplate to form the complete nameplate text content. According to the standard format and industry specifications for power material nameplates, nameplates typically include information such as material name, model, specifications, rated voltage, rated current, manufacturer, and production date, identified by specific keywords. A keyword dictionary is constructed, containing standard keywords and variations of material parameters. Keyword matching is performed on the nameplate text content to locate the position of each parameter; the string immediately following the keyword is the value of that parameter. The extracted parameter values are formatted and their units are identified, using regular expressions to extract the numerical values and units.
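
A minimal sketch of keyword-driven parsing follows. The keyword dictionary, the parameter names, and the rule that a keyword must start its line are hypothetical simplifications; a real dictionary would be built from the nameplate standards the paragraph refers to.

```python
import re

# Hypothetical keyword dictionary: parameter name -> accepted keyword variants.
KEYWORDS = {
    "model": ["Model No.", "Model", "Type"],
    "rated_voltage": ["Rated Voltage", "Voltage"],
    "rated_current": ["Rated Current", "Current"],
}
NUMERIC = {"rated_voltage", "rated_current"}
# A number followed by a unit made of letters, e.g. "12 kV" or "630A".
VALUE_UNIT = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+)")

def parse_nameplate(text):
    """Extract parameters from nameplate text, one keyword-led line per parameter."""
    result = {}
    for line in text.splitlines():
        stripped = line.strip()
        for name, variants in KEYWORDS.items():
            if name in result:
                continue
            for kw in variants:
                if stripped.lower().startswith(kw.lower()):
                    # The string right after the keyword is the parameter value.
                    raw = stripped[len(kw):].lstrip(" :").strip()
                    if name in NUMERIC:
                        m = VALUE_UNIT.match(raw)
                        if m:
                            result[name] = (float(m.group(1)), m.group(2))
                    elif raw:
                        result[name] = raw
                    break
    return result
```

Numeric parameters come back as `(value, unit)` tuples, while textual parameters such as the model number are kept as strings.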

[0055] For textual parameters such as model number and specifications, extract complete alphanumeric strings. Organize the extracted parameters into structured data, including key-value pairs of parameter names and values. Calculate the confidence score for semantic parsing. The parsing confidence score is comprehensively evaluated based on the number of successfully extracted parameters and the correctness of the parameter value format. If key parameters such as model number and specifications are successfully extracted and in the correct format, the parsing confidence score is high; if key parameters are missing or have abnormal formats, the parsing confidence score is low. The confidence scores from each processing stage are fused to calculate the overall credibility of the material information. The overall credibility score is obtained by weighted fusion of detection confidence score, location confidence score, average text detection confidence score, average recognition confidence score, and parsing confidence score. This step-by-step fusion of confidence scores quantifies the reliability of the entire recognition process.
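
The weighted fusion of stage confidences can be written as a normalized weighted average. The stage names and any particular weight values are assumptions, since the source does not specify them.

```python
def fuse_confidence(scores, weights):
    """Weighted average of per-stage confidence scores.

    scores / weights: dicts keyed by stage name (names are illustrative).
    """
    total = sum(weights[k] for k in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * scores[k] for k in scores) / total
```

Normalizing by the weight total keeps the fused score in the same 0 to 1 range as the inputs, so it can be compared directly against a review threshold.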

[0056] The output structured material information includes parameters such as material category, model, specifications, rated voltage, and rated current, as well as overall confidence level and confidence level distribution at each stage. The overall confidence level serves as the basis for subsequent processing and system prompts for administrator review; low-confidence identification results require manual confirmation.

[0057] The 3D positioning and identity verification module is used to perform 3D positioning of materials by combining structured material information and complete and stable depth images. The recognition results are verified by matching the knowledge base to realize material identity confirmation and inventory change recording. S4.1 receives the complete stable depth image and candidate material region list output by S1, extracts the depth data of the material regions and performs statistical analysis, outputting the representative depth value of the material and the depth measurement quality assessment result. Based on the detection bounding box coordinates of the material, the corresponding rectangular region is located in the depth image. Since the color image and depth image have already undergone pixel-level registration during the acquisition stage, the bounding box coordinates can be directly applied to the depth image. All depth values within the bounding box range are extracted from the depth image to form a depth value matrix. The depth value matrix is validated; in some cases, the depth camera cannot obtain valid depth measurements, and invalid depth values are usually marked as zero or special values in the depth image. Each element of the depth value matrix is traversed; if the depth value is equal to zero or exceeds the effective measurement range of the camera, the depth value is marked as invalid and discarded.

[0058] After removing invalid values, the number of valid depth values remaining is counted. If the percentage of valid depth values is lower than a preset threshold, it indicates poor depth measurement quality for that material area, possibly due to surface characteristics or occlusion, reducing the reliability of the material location results. Valid depth values are sorted from smallest to largest, and the median is calculated as the representative depth of the material surface. The median is more robust to outliers than the mean, effectively suppressing the influence of a few abnormal depth points. The standard deviation of the depth values is calculated as an indicator of depth measurement uncertainty. The standard deviation reflects the dispersion of depth values; a large standard deviation indicates uneven depth on the material surface or significant measurement noise, resulting in high positioning uncertainty.

[0059] The depth standard deviation threshold is determined based on the camera's depth measurement accuracy. When the standard deviation exceeds this threshold, it indicates significant surface undulations or measurement anomalies. The interquartile range (IQR) of the depth values is also calculated; the IQR is the difference between the 75th percentile and the 25th percentile. The IQR likewise reflects the dispersion of the depth distribution and is less sensitive to extreme values than the standard deviation. The quality of depth measurement is assessed by combining the two indicators: if either the standard deviation exceeds the depth standard deviation threshold or the IQR exceeds the IQR threshold, the depth measurement quality is considered poor and the estimated positioning uncertainty is increased.
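The depth statistics of S4.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `assess_depth` and all threshold values (`max_range_m`, `min_valid_ratio`, `std_thresh_m`, `iqr_thresh_m`) are assumptions chosen for the example, since the text only states that such thresholds are preset.

```python
import numpy as np

def assess_depth(depth_roi, max_range_m=10.0, min_valid_ratio=0.5,
                 std_thresh_m=0.05, iqr_thresh_m=0.05):
    """Summarise the depth values inside one detection bounding box.

    All thresholds are hypothetical placeholders.
    """
    values = np.asarray(depth_roi, dtype=float).ravel()
    # Discard zero readings and readings beyond the effective range.
    valid = values[(values > 0) & (values <= max_range_m)]
    valid_ratio = valid.size / values.size if values.size else 0.0
    if valid.size == 0:
        return {"depth": None, "valid_ratio": valid_ratio,
                "std": None, "iqr": None, "poor_quality": True}
    q25, q75 = np.percentile(valid, [25, 75])
    std = float(np.std(valid))
    iqr = float(q75 - q25)
    poor = bool(valid_ratio < min_valid_ratio
                or std > std_thresh_m
                or iqr > iqr_thresh_m)
    return {"depth": float(np.median(valid)),  # representative depth
            "valid_ratio": valid_ratio,
            "std": std, "iqr": iqr, "poor_quality": poor}
```

Here the median serves as the representative depth, while the valid-pixel ratio, standard deviation, and interquartile range jointly decide the quality flag.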

[0060] S4.2 receives the material's representative depth value and bounding box coordinates output from S4.1. Using a pinhole camera model and coordinate system transformation method, it performs Kalman filtering on multiple consecutive frames of position data, outputting the material's 3D position in the warehouse world coordinate system. The pixel coordinates of the material in the image are taken from the center point of its detected bounding box. The x-coordinate of the center point equals the x-coordinate of the top-left corner of the bounding box plus half the width of the bounding box; the y-coordinate of the center point equals the y-coordinate of the top-left corner of the bounding box plus half the height of the bounding box. The camera intrinsic parameter matrix was obtained during the system calibration phase. The intrinsic parameter matrix includes the camera's focal length and principal point coordinates. The focal length includes both lateral and longitudinal focal lengths, and the principal point coordinates are the position of the image center in the pixel coordinate system. According to the back-projection formula of the pinhole camera model, the x-coordinate of the material in the camera coordinate system is equal to the pixel x-coordinate minus the principal point x-coordinate, multiplied by the depth value, and divided by the lateral focal length; the y-coordinate is equal to the pixel y-coordinate minus the principal point y-coordinate, multiplied by the depth value, and divided by the longitudinal focal length; the depth coordinate is directly equal to the measured depth value.

[0061] The 3D coordinates of the material's center point in the camera coordinate system are obtained through back projection calculations, with the origin of the coordinate system located at the camera's optical center. The warehouse world coordinate system is a predefined global coordinate system, with its origin located at a fixed reference position within the warehouse. The position and orientation of each camera in the world coordinate system have been determined through calibration during system installation, and the calibration results are represented as extrinsic parameter matrices, which include rotation matrices and translation vectors. The rotation matrix describes the rotation relationship between the camera coordinate system and the world coordinate system, while the translation vector describes the position of the camera's optical center in the world coordinate system. The coordinate system transformation formula is: world coordinates equal to the rotation matrix multiplied by the camera coordinates plus the translation vector. Matrix operations are used to convert the material's camera coordinates to world coordinates. After transformation, the 3D position of the material's center point in the warehouse world coordinate system is obtained. This position represents the material's absolute spatial coordinates and can be used for material positioning, navigation, and warehouse location management.
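The back-projection and coordinate transformation described above can be sketched in a few lines. The function name `pixel_to_world` and the `(x, y, w, h)` bounding box layout are assumptions for illustration; the formulas follow the text (pinhole back-projection, then world = R · camera + t).

```python
import numpy as np

def pixel_to_world(bbox, depth, fx, fy, cx, cy, R, t):
    """Map a bounding box centre plus depth to world coordinates.

    bbox is (x_top_left, y_top_left, width, height) in pixels.
    fx, fy are the lateral and longitudinal focal lengths, (cx, cy)
    the principal point, R and t the calibrated extrinsics.
    """
    x, y, w, h = bbox
    u, v = x + w / 2.0, y + h / 2.0          # box centre in pixels
    cam = np.array([(u - cx) * depth / fx,   # camera-frame X
                    (v - cy) * depth / fy,   # camera-frame Y
                    depth])                  # camera-frame Z
    return np.asarray(R, dtype=float) @ cam + np.asarray(t, dtype=float)
```

For example, a box centred on the principal point at a depth of 2 m maps to the camera-frame point (0, 0, 2), which the extrinsics then shift into the warehouse frame.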

[0062] To improve positioning accuracy and stability, temporal fusion is performed on the 3D coordinates of the same material calculated from multiple consecutive frames of images. Due to noise in depth measurements, the 3D coordinates calculated in a single frame fluctuate; multi-frame fusion smooths out random errors. For each material, a historical position sequence is established, containing the 3D coordinates calculated from the most recent few frames. A Kalman filter algorithm is used for temporal fusion. Kalman filtering is an optimal recursive estimation method that can combine historical state estimates and current observations to provide the optimal estimate of the current state. The state vector of the Kalman filter contains the material's 3D coordinates and 3D velocity. The state transition model assumes the material moves at a constant velocity; the position at the next moment is equal to the current position plus the velocity multiplied by the time interval, while the velocity remains constant. The observation model assumes the observed 3D coordinates are equal to the true position plus measurement noise. The covariance matrix of the measurement noise is set according to the uncertainty of depth measurements; measurements with greater uncertainty are assigned a larger noise variance and have a smaller weight in the filtering process.

[0063] Kalman filtering iteratively executes two steps: prediction and update. The prediction step predicts the current state based on the state estimate from the previous time step and the state transition model. The update step corrects the predicted state based on the current observations, with the correction weights determined by the Kalman gain, which is dynamically calculated based on prediction and measurement uncertainties. After Kalman filtering, a smooth 3D coordinate sequence is obtained. The latest filtered result is taken as the current position estimate of the material. The filtered position estimate has high accuracy and good stability.
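A minimal sketch of the constant-velocity Kalman filter described above is given below. The class name `ConstantVelocityKalman`, the isotropic noise matrices, and the default values of `process_var` and `init_var` are assumptions for illustration; the text does not specify how the noise covariances are parameterised beyond tying the measurement noise to the depth uncertainty.

```python
import numpy as np

class ConstantVelocityKalman:
    """State = [x, y, z, vx, vy, vz]; observation = [x, y, z]."""

    def __init__(self, first_pos, process_var=1e-4, init_var=1.0):
        self.x = np.concatenate([np.asarray(first_pos, dtype=float), np.zeros(3)])
        self.P = np.eye(6) * init_var          # state covariance
        self.q = process_var                   # hypothetical process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])

    def step(self, z, dt, meas_var):
        # Prediction: position advances by velocity * dt, velocity is kept.
        F = np.eye(6)
        F[:3, 3:] = np.eye(3) * dt
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + np.eye(6) * self.q
        # Update: a larger meas_var yields a smaller Kalman gain.
        R = np.eye(3) * meas_var
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3].copy()               # filtered position
```

Passing the depth standard deviation (squared) from S4.1 as `meas_var` would be one way to give uncertain frames less weight, in line with the description.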

[0064] S4.3 receives the representative depth value and bounding box size output from S4.1, estimates the actual physical size of the material, and compares it with the standard size in the knowledge base for verification, outputting the size verification result. The actual width of the material is equal to the bounding box width multiplied by the depth value and then divided by the camera's lateral focal length; the actual height is equal to the bounding box height multiplied by the depth value and then divided by the camera's longitudinal focal length. The estimated physical size is compared with the standard size of this type of material in the material knowledge base. The standard size is obtained from the product database based on the material's specifications and model. The relative deviation between the estimated size and the standard size is calculated. If the relative deviation exceeds the size deviation threshold, it indicates that the size estimation result does not match the expectation. Possible reasons include incorrect material category identification, incorrect specification and model identification, or a large depth measurement error. In this case, the overall credibility of the material identification result is reduced, and the material is marked as pending confirmation. For materials with size deviations within a reasonable range, the size comparison result is used as cross-validation of the identification result to enhance the reliability of the identification.
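The size check of S4.3 reduces to two multiplications and a relative deviation, as the sketch below shows. The function name `verify_size` and the default `max_rel_dev` of 0.2 are assumptions; taking the larger of the width and height deviations is also an illustrative choice, since the text does not say how the two are combined.

```python
def verify_size(bbox_w_px, bbox_h_px, depth_m, fx, fy,
                std_w_m, std_h_m, max_rel_dev=0.2):
    """Estimate physical size from the box and compare with the standard size."""
    width_m = bbox_w_px * depth_m / fx
    height_m = bbox_h_px * depth_m / fy
    rel_dev = max(abs(width_m - std_w_m) / std_w_m,
                  abs(height_m - std_h_m) / std_h_m)
    # Passing means the deviation does not exceed the threshold.
    return width_m, height_m, rel_dev, rel_dev <= max_rel_dev
```

A 100-pixel-wide box at 2 m with a 500-pixel focal length gives an estimated width of 0.4 m; against a 0.8 m standard width the relative deviation is 0.5, so the material would be marked as pending confirmation.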

[0065] S4.4 receives the structured material information output by S3 and the size verification results output by S4.3. It employs knowledge base similarity matching and rule-based association verification methods to compare verified materials with the historical database and update inventory information, outputting material identification results and inventory change records. The knowledge base is built incrementally. Initially, basic data, including the model specifications of commonly used materials, can be imported from existing equipment ledgers and product manuals. During system operation, the system prompts the administrator for confirmation and correction, gradually accumulating and improving the knowledge base content. For materials not present in the knowledge base, the system marks them as new materials and prompts the administrator to supplement information; after confirmation, they are automatically added to the knowledge base. For each identified material, key attributes are extracted for matching, including material category, model string, and specification parameters. The system searches the knowledge base for all candidate materials of the same category, calculating the similarity between the identified material and each candidate material. The similarity calculation comprehensively considers text similarity and attribute similarity.

[0066] Text similarity primarily targets model string, employing a string matching algorithm to calculate the similarity between two strings. Attribute similarity targets specifications and technical parameters; for numerical parameters such as rated voltage and rated current, the relative error between the identified value and the standard value is converted into a similarity score. The total similarity is obtained by weighted summation of text similarity and attribute similarity, with model text having a higher weight as it is the primary basis for material identification. After calculating the total similarity for all candidate materials, the material with the highest similarity is selected as the matching result. If the highest similarity exceeds the matching threshold, the match is considered successful, and the identified material is mapped to that standard material, updating the material information to the standardized information of the standard material. If the highest similarity is below the matching threshold, the match is considered unsuccessful, the material is marked as unknown, the administrator is prompted for confirmation, and it is added to the knowledge base. The purpose of association verification is to check whether the identified material combinations conform to the supporting usage specifications and technical standards for power materials and to identify potential identification errors.
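The knowledge base similarity matching described above might look like the following sketch. The text only mentions "a string matching algorithm", so the use of Python's `difflib.SequenceMatcher` is an assumption, as are the weights (0.7 and 0.3), the matching threshold (0.8), and the dictionary layout with `model` and `params` keys.

```python
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Similarity of two model strings in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def attribute_similarity(identified, standard):
    """Convert the relative error of numeric parameters into a mean score."""
    scores = []
    for key, std_val in standard.items():
        if key in identified and std_val:
            rel_err = abs(identified[key] - std_val) / abs(std_val)
            scores.append(max(0.0, 1.0 - rel_err))
    return sum(scores) / len(scores) if scores else 0.0

def best_match(item, candidates, w_text=0.7, w_attr=0.3, threshold=0.8):
    """Return (matched candidate or None, highest total similarity)."""
    best, best_score = None, -1.0
    for cand in candidates:
        score = (w_text * text_similarity(item["model"], cand["model"])
                 + w_attr * attribute_similarity(item["params"], cand["params"]))
        if score > best_score:
            best, best_score = cand, score
    # A match succeeds only if the top score exceeds the threshold.
    return (best if best_score > threshold else None), best_score
```

With these weights, a nameplate read with a single wrong character can still match its standard entry when the numeric parameters agree, whereas an item with no usable parameters cannot exceed a total score of 0.7 and is reported as unknown.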

[0067] The verification rules include matching relationship verification and parameter compatibility verification. Matching relationship verification checks whether required materials exist simultaneously. It queries the matching relationship information of each identified material in the knowledge base for matching requirements. If a material is identified but its corresponding matching material is not, a matching missing warning is generated, indicating a possible error in the material identification or that a matching material has been omitted. Parameter compatibility verification checks whether the technical parameters of materials within the same area are compatible, primarily checking key parameters such as rated voltage and rated current. If the rated voltage of identified materials differs significantly, a parameter incompatibility warning is generated, possibly indicating an incorrect material identification or improper material placement. The above verification rules are applied to each material, and the number and severity of warnings triggered by that material are tallied to calculate a verification score. The verification score reflects the reliability of the material identification result.

[0068] For materials with low verification scores, the system prompts the administrator for review. For suspicious materials, the system analyzes the reasons for the low verification scores, checking which warning rules were triggered. If a missing item warning is triggered, the system checks if other unidentified materials exist in the scenario. If a parameter incompatibility warning is triggered, it suspects that the material's model or specification has been incorrectly identified; in this case, the original image block of the material can be re-extracted for nameplate information. The system compares the verification results of successfully verified materials with the historical material database and updates the inventory information. The historical material database records historical data for all materials in the warehouse, including material identification information, entry time, storage location, status information, and entry / exit records. For each successfully verified material, a matching record is queried in the historical database based on its model and specifications.

[0069] If a record for the material exists in the database, compare the currently identified spatial location with the location recorded in the database. If the locations match or the deviation is within the location tolerance range, it indicates that the material has not been moved. Update the last monitoring time for the material. If the location deviation exceeds the location tolerance, it indicates that the material has been moved. Update the material's location information and generate a location change record. Compare the quantity of each type of material currently identified with the inventory quantity recorded in the database. If the quantities match, the inventory is confirmed to be accurate. If the quantities do not match, generate an inventory discrepancy record. If the currently identified material has no record in the database, it indicates that it is a newly received material. Generate a new material record and insert the record into the database.

[0070] If a material with a record in the database does not appear in the current identification, it may be because the material has been shipped out or is obscured and not identified. Check the most recent shipping record; if a shipping record exists, confirm that the material has been shipped out; otherwise, mark the material as pending confirmation, as there may be missed detection. Output highly reliable material identification results and inventory change records, summarizing all verified material identification results. Each material contains complete identity information, spatial information, and credibility information. The identification result list serves as the main output of the system for use by inventory management, material query, and inventory counting functions. Summarize inventory change records, including newly received materials, materials with location changes, materials with inventory discrepancies, and materials pending confirmation. Change records are sorted by priority; inventory discrepancies and materials pending confirmation have higher priority and require timely processing. Store the identification results and change records in the database and simultaneously push them to downstream business systems via a message queue to achieve real-time data synchronization.
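The comparison against the historical database can be sketched as a small reconciliation routine. The function name `reconcile`, the use of per-item identifiers as dictionary keys, and the 0.3 m location tolerance are assumptions for illustration; the described system looks records up by model and specification.

```python
import math

def reconcile(identified, db_records, shipped_ids, tol_m=0.3):
    """Compare identified positions with stored ones and list changes.

    identified and db_records map a hypothetical material id to an
    (x, y, z) world position; shipped_ids holds ids with a recent
    shipping record.
    """
    changes = []
    for mid, pos in identified.items():
        if mid not in db_records:
            changes.append((mid, "new"))                  # newly received
        elif math.dist(pos, db_records[mid]) > tol_m:
            changes.append((mid, "moved"))                # location change
    for mid in db_records:
        if mid not in identified:
            status = "shipped" if mid in shipped_ids else "pending_confirmation"
            changes.append((mid, status))
    return changes
```

Materials found where the database expects them produce no change record; only their last monitoring time would be refreshed.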

[0071] The anomaly monitoring and early warning module is used to detect anomalies based on material identification results and inventory change records, through personnel behavior analysis and status monitoring, and to generate early warning information according to the severity level. S5.1 receives the material identification results and inventory change records output by S4, and uses time-series comparison and spatial location analysis methods to output the material quantity and location anomaly detection results. It extracts the category, model, quantity, and three-dimensional location information of each material from the material identification results and compares them with the previous records in the time-series database. For each material category, it compares the currently identified quantity with the historical quantity. If the quantity decreases and the decrease exceeds the quantity change threshold, a quantity decrease anomaly record is generated, containing the material category, model, decreased quantity, and detection time. If the quantity increases and the increase exceeds the quantity change threshold, a quantity increase anomaly record is generated. The quantity increase may correspond to an inbound operation or a previous missed detection.

[0072] For materials with quantity changes, further analysis is performed to determine if corresponding authorized inbound / outbound records exist. Authorization records for the current time period are queried from the inbound / outbound management system. If a matching authorization record exists, the quantity change is considered normal; otherwise, the anomaly is marked as an unauthorized change requiring close monitoring. For materials with unchanged quantities, their spatial location is compared, and the Euclidean distance between the current and historical locations is calculated. If the distance exceeds the location change threshold, it indicates that the material has moved, and a location change anomaly record is generated. This record includes the material identifier, original location coordinates, current location coordinates, movement distance, and detection time. The location change threshold is set based on the material size and shelf spacing, typically 0.5 to 1 times the minimum material size, so that meaningful location changes are detected while false alarms due to measurement errors are avoided. A list of material quantity and location anomaly detection results is output, with each anomaly record including anomaly type, material information, anomaly details, and detection time.
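The quantity comparison of S5.1 can be sketched as below. The function name `quantity_anomalies`, the dictionary-based inputs, and the rule that a change is authorized only when an authorization record has exactly the same signed quantity are simplifying assumptions; a real deployment would query the inbound / outbound management system.

```python
def quantity_anomalies(current, previous, authorized, qty_threshold=0):
    """List quantity changes per model and flag unauthorized ones.

    current / previous map a model string to a count; authorized maps a
    model string to the signed quantity change covered by a record.
    """
    records = []
    for model in sorted(set(current) | set(previous)):
        delta = current.get(model, 0) - previous.get(model, 0)
        if abs(delta) <= qty_threshold:
            continue                       # change too small to report
        kind = "quantity_decrease" if delta < 0 else "quantity_increase"
        records.append({"material": model, "type": kind, "delta": delta,
                        "authorized": authorized.get(model, 0) == delta})
    return records
```

A record with `authorized` set to false corresponds to the unauthorized change that the description marks for close monitoring.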

[0073] S5.2 receives the material identification result and the evenly lit color image output from S4. It analyzes personnel behavior using a personnel detection network and a multi-target tracking algorithm, matches personnel retrieval operations with authorization records, and outputs an unauthorized retrieval warning. Personnel detection is performed in the evenly lit color image using a deep learning-based personnel detection network to identify personnel targets in the image. The network outputs the bounding box coordinates and detection confidence of the personnel. Multi-target tracking is performed on the personnel detected in multiple consecutive frames, establishing their motion trajectories. The tracking algorithm matches personnel between adjacent frames based on their appearance and motion characteristics, assigning a unique tracking identifier to each person.

[0074] The system analyzes the spatial relationship between personnel movement trajectories and the material area to determine if personnel are approaching the material area. If the distance between the personnel's bounding box and the material's bounding box is less than an interaction distance threshold, it is considered that the personnel are interacting with the material. The interaction distance threshold is set according to the image resolution and material size, typically 0.5 to 1 times the width of the material's bounding box. Upon detecting personnel-material interaction, the system records the start time of the interaction and the initial state of the material, continuously tracking the states of both. When the personnel leave the material area, the system records the end time of the interaction and the final state of the material. By comparing the initial and final states of the material, if the quantity of material decreases or its location changes significantly, it is considered that the personnel have performed a retrieval operation, generating a retrieval event record. This record includes the personnel tracking identifier, material information, retrieval time, and quantity retrieved.
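The interaction test can be illustrated with a short sketch. The text does not define how the distance between two bounding boxes is measured, so the shortest gap between axis-aligned `(x, y, w, h)` rectangles used here is an assumption, as are the function names and the default factor of 0.5.

```python
import math

def box_gap(a, b):
    """Shortest pixel distance between two (x, y, w, h) boxes; 0 if they overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return math.hypot(dx, dy)

def is_interacting(person_box, material_box, factor=0.5):
    """Interaction threshold = factor * width of the material box."""
    return box_gap(person_box, material_box) < factor * material_box[2]
```

Overlapping boxes have a gap of zero and therefore always count as interacting.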

[0075] The system queries authorized outbound records for the current time period from the inbound / outbound management system. These records include the authorized personnel's identity, authorized material category and model, authorized quantity, and authorized time period. Detected retrieval events are matched against these authorization records, with matching rules including: retrieval time within the authorized time period, retrieved material matching the authorized material, and retrieved quantity not exceeding the authorized quantity. If a retrieval event satisfies all matching rules for an authorization record, the retrieval operation is considered legitimate, and the execution status of the authorization record is updated. If a retrieval event cannot be matched to any authorization record, an unauthorized retrieval warning is generated, including personnel tracking information, material information, retrieval time, retrieved quantity, and warning level.
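The three matching rules can be expressed directly in code. The sketch below assumes hypothetical dictionary fields (`time`, `model`, `quantity`, `start`, `end`) and the function name `match_authorization`; it returns the first authorization record that satisfies all rules, or `None` when a warning should be raised.

```python
from datetime import datetime

def match_authorization(event, authorizations):
    """Return the first authorization record covering the event, else None."""
    for auth in authorizations:
        if (auth["start"] <= event["time"] <= auth["end"]      # time rule
                and auth["model"] == event["model"]            # material rule
                and event["quantity"] <= auth["quantity"]):    # quantity rule
            return auth
    return None
```

For instance, taking two connectors at 14:35 under an authorization for two connectors valid from 14:30 to 15:00 matches, whereas taking three under the same authorization does not.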

[0076] The level of unauthorized access warnings is determined based on the importance and value of the materials. Unauthorized access to high-value or critical materials is set as a Level 1 warning, while unauthorized access to general materials is set as a Level 2 warning. The system outputs a list of unauthorized access warnings, with each warning containing detailed event information and its warning level.

[0077] S5.3 receives the material identification results and complete stable depth image output from S4, uses a 3D spatial analysis method to monitor the material stacking status, and outputs stacking anomaly warning information. It extracts the 3D position, bounding box size, and depth data of each material from the material identification results to analyze whether the material stacking status complies with safety regulations. First, it detects the stacking height of the materials. For multiple materials vertically stacked in the same storage location, it calculates the number of stacking layers and the total height based on the differences in their 3D vertical coordinates. It compares the stacking height with the height limit of the storage location. The height limit is set according to the load-bearing capacity of the rack and safety regulations. If the stacking height exceeds the height limit, a stacking over-limit warning is generated. The warning information includes the storage location identifier, stacking height, height limit, and excess quantity.

[0078] The system detects the tilt of materials by calculating the normal vector of the material's surface using depth images. This normal vector is obtained through depth gradient calculations. For regularly placed materials, the normal vector of the top surface should be close to the vertical direction. The angle between the material's top surface normal vector and the vertical direction is calculated. If the angle exceeds a tilt angle threshold, the material is considered tilted, generating a tilt warning. The warning information includes the material's identifier, tilt angle, and tilt direction. The tilt angle threshold is set based on the material's stability and safety requirements. A smaller threshold is set for fragile or hazardous materials, while a more lenient threshold can be set for materials with good stability. The system also detects whether materials have exceeded their storage location boundaries. The boundary range of each storage location is obtained from the racking management system. In the world coordinate system, the boundary range is defined as a rectangular area in three-dimensional space.
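The tilt check described above can be sketched as follows. The sketch assumes that the depth data of the top surface has already been resampled into a height map on a regular grid in world coordinates (the text only says the normal is obtained through depth gradient calculations); the function names and the `cell_size_m` parameter are hypothetical.

```python
import numpy as np

def surface_normal_from_height(height_map, cell_size_m):
    """Average unit normal of a surface z = h(x, y) from its gradients."""
    gy, gx = np.gradient(np.asarray(height_map, dtype=float), cell_size_m)
    n = np.array([-gx.mean(), -gy.mean(), 1.0])
    return n / np.linalg.norm(n)

def tilt_angle_deg(normal, vertical=(0.0, 0.0, 1.0)):
    """Angle in degrees between the surface normal and the vertical direction."""
    n = np.asarray(normal, dtype=float)
    v = np.asarray(vertical, dtype=float)
    cos = abs(n @ v) / (np.linalg.norm(n) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

A tilt warning would then be raised when `tilt_angle_deg` exceeds the tilt angle threshold configured for the material.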

[0079] The system determines whether the 3D position of the material is within the boundary of its assigned storage location. If the center point of the material exceeds the boundary or the overlap between the material's bounding box and the storage location boundary is less than the overlap threshold, the material is considered to have crossed the boundary, generating a boundary crossing warning. The warning information includes the material identifier, current position, assigned storage location, and boundary crossing distance. Boundary crossing may be caused by improper placement of materials or material slippage, requiring timely handling to avoid damage to materials or impact on passageway safety. The system outputs a list of abnormal stacking warnings. Each warning includes the anomaly type, material information, anomaly details, and warning level. Overstack and tilt warnings determine their warning levels based on the degree of overstack and tilt angle, while boundary crossing warnings determine their warning levels based on the boundary crossing distance and the importance of the material.
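A minimal sketch of the boundary check is shown below. The text does not define how the overlap is measured, so the fraction of the material's 3D bounding box volume lying inside the storage location is used here as an assumption; the function names and the default `min_overlap` of 0.8 are also hypothetical. Boxes are given as `(min_xyz, max_xyz)` pairs in world coordinates.

```python
def overlap_ratio(box, slot):
    """Fraction of the material box volume that lies inside the slot."""
    inter, vol = 1.0, 1.0
    for lo_b, hi_b, lo_s, hi_s in zip(box[0], box[1], slot[0], slot[1]):
        inter *= max(0.0, min(hi_b, hi_s) - max(lo_b, lo_s))
        vol *= hi_b - lo_b
    return inter / vol if vol > 0 else 0.0

def crosses_boundary(center, box, slot, min_overlap=0.8):
    """True if the centre leaves the slot or the overlap is too small."""
    inside = all(lo <= c <= hi for c, lo, hi in zip(center, slot[0], slot[1]))
    return (not inside) or overlap_ratio(box, slot) < min_overlap
```

A material whose box protrudes halfway out of its slot has an overlap ratio of 0.5 and is flagged even if its centre point still lies on the slot boundary.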

[0080] S5.4 receives various anomaly detection results from S5.1 to S5.3, and uses a tiered warning system and multi-channel push mechanism to output tiered warning information and push it to the monitoring system. It summarizes various anomaly information such as abnormal material quantity and location, unauthorized access warnings, and abnormal stacking warnings, and determines the warning level for each anomaly based on its type and severity. The tiered warning system is as follows: Level 1 warnings are serious anomalies requiring immediate action, including unauthorized access to high-value materials, abnormal quantity of critical materials, stacking exceeding limits with safety hazards, and severe tilting. Upon triggering a Level 1 warning, the system immediately notifies administrators and security personnel via SMS and telephone, and simultaneously displays an alarm window on the monitoring interface.

[0081] Level 2 alerts indicate critical anomalies requiring prompt attention. These include unauthorized access to general materials, changes in material location, minor stacking exceeding limits or tilting, and materials crossing boundaries. Level 2 alerts are communicated to relevant personnel via system messages and emails and are highlighted in the alert list on the monitoring interface. Level 3 alerts indicate general anomalies requiring attention. These include minor fluctuations in material quantities, low-confidence identification results, and new materials not found in the knowledge base. Level 3 alerts are logged in the system log and displayed normally in the alert list on the monitoring interface. Administrators can review and handle these alerts periodically. For multiple similar alerts generated within a short period for the same material or area, the system aggregates the alerts to avoid information overload caused by duplicate alerts. The aggregation rule merges multiple alerts of the same material and anomaly type within a preset time window into one, retaining the earliest alert time and the most severe alert level.
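The aggregation rule can be sketched as follows. The function name `aggregate_alerts`, the dictionary fields, timestamps expressed in seconds, and the 300-second window are assumptions for illustration; the window is measured from the earliest alert of each group, and level 1 is treated as the most severe level.

```python
def aggregate_alerts(alerts, window_s=300):
    """Merge alerts with the same material and anomaly type within a window."""
    merged = []
    open_groups = {}
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["material"], alert["type"])
        group = open_groups.get(key)
        if group is not None and alert["time"] - group["time"] <= window_s:
            # Keep the earliest time, adopt the most severe (lowest) level.
            group["level"] = min(group["level"], alert["level"])
            group["count"] += 1
        else:
            group = {**alert, "count": 1}
            open_groups[key] = group
            merged.append(group)
    return merged
```

The added `count` field is not part of the description; it simply records how many raw alerts were folded into each aggregated entry.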

[0082] Early warning information is pushed to the monitoring system interface in real time. The monitoring interface includes modules such as a shelf floor plan, real-time video stream, warning list, and timeline. On the shelf floor plan, the location of abnormal materials is marked according to their world coordinates. Different warning levels are indicated by different colors: red for level one, orange for level two, and yellow for level three. Administrators can intuitively see the spatial distribution of abnormal materials. The timeline marks the time when the warning occurred, allowing administrators to review historical warning records and their handling.

[0083] The alert list is sorted by alert level and time, displaying detailed information for each alert, including alert level, anomaly type, material information, anomaly details, detection time, and handling status. Administrators can click on alert entries to view related images and video clips, confirm the anomaly, and mark the handling status. The system supports closed-loop management of alerts. After handling an anomaly, the administrator marks the alert as handled and fills in the handling result in the system. The system records the handling time and the personnel involved, and the alert records are archived in the historical database for subsequent statistical analysis and pattern mining. Through real-time alerts and visualization, the system improves the efficiency and security of warehouse management, realizing a shift from a passive inventory management model to a proactive monitoring model.

[0084] One embodiment of the invention addresses the practical application scenario of an emergency material warehouse at a district / county power supply station. This warehouse is responsible for the storage and management of power repair materials within the region. By deploying the intelligent management system of the present invention, automatic identification, real-time monitoring, and intelligent inventory of materials are achieved. Specific application data examples are given below.

[0085] In the identification of metal materials, items such as surge arresters and hardware fittings in warehouses have highly reflective surfaces. Traditional RGB-D cameras produce large depth holes on these surfaces, leading to positioning failures. This system employs polarization-guided depth repair technology. By analyzing polarization information to identify specular reflection areas, it fills the holes through weighted fusion of depth information from surrounding areas with similar surface characteristics, successfully repairing the depth holes and achieving accurate positioning of metal materials. During an inventory check, the system scanned metal materials on shelves in Zone A and successfully identified highly reflective items such as surge arresters and hardware fittings, achieving a significantly higher accuracy rate than traditional methods.

[0086] In identifying densely stacked materials, multiple cable connectors and hardware accessories were piled on the shelves in Zone B, with the materials in close contact and highly overlapping on the image plane. This system employs a depth-guided non-maximum suppression algorithm, distinguishing between foreground and background materials based on depth differences, successfully identifying all materials and avoiding the false suppression problem of traditional methods. The system can accurately distinguish between foreground and background materials, maintaining detection integrity even in cases of high overlap on the image plane.

[0087] Regarding the identification of degraded nameplates, some material nameplates have become rusted and worn due to long-term storage, resulting in illegible text. The system successfully extracted nameplate information and completed material identification through image preprocessing enhancement and knowledge base-assisted recognition, combined with a progressive knowledge base verification mechanism. For new materials not found in the knowledge base, the system automatically marks them and prompts the administrator to supplement the information. After confirmation, the information is added to the knowledge base, enabling continuous improvement of the knowledge base.

[0088] During the material receiving process, the system automatically collects images of the incoming materials to identify their category and model. Table 1 shows some of the material data identified during a specific receiving operation.

[0089] Table 1. Example of inbound material identification data

[0090] During the material outbound process, the system automatically identifies the retrieved materials and records the outbound information. Once authorized personnel enter the warehouse after authentication, the system continuously monitors the material status. The system uses a personnel detection network to identify personnel targets in color images and establishes personnel movement trajectories through a multi-target tracking algorithm. When the system detects personnel approaching the cable connector storage location on the 4th layer of row 1 in section B, it records the interaction start time as 14:32 on February 5, 2026. At this time, there are five cable connectors of model JLS-10 / 3×50 in that location. The system continuously tracks the interaction between personnel and materials. After the personnel leave the location at 14:35, the system detects that the number of cable connectors has decreased to three, indicating that the personnel have performed a retrieval operation. The system queries the inbound / outbound management system and finds an authorization record for that time period. The authorized person is the emergency repair team leader with employee number 2026, the authorized material is cable connectors JLS-10 / 3×50, the authorized quantity is two, and the authorized time period is from 14:30 to 15:00. The system matches the detected retrieval event with the authorization record, confirms that the retrieval time, material type, and quantity all meet the authorization requirements, determines that it is a legitimate retrieval operation, automatically updates the inventory records, and generates an outbound order.

[0091] During the inventory count, the system performs a comprehensive scan of the warehouse to identify all materials and compares them with inventory records. Table 2 shows some of the inventory data detected during a particular inventory count.

[0092] Table 2. Example of inventory count data

[0093] The system performed anomaly detection on the inventory results and found that the actual number of HY5WS-17 / 50 surge arresters identified was 35, while the theoretical inventory quantity was 36, a shortfall of one. The system checked the inbound and outbound records, found no authorized outbound record for this material during that time period, and generated an anomaly alert. The alert information included the material category (surge arrester), the model (HY5WS-17 / 50), the quantity decrease (one), the detection time (2026-02-08 10:25), and the anomaly type (unauthorized quantity reduction). Because surge arresters are high-value critical materials, the system classified this alert as a Level 1 alert and immediately notified the warehouse manager and security personnel via SMS and telephone. Simultaneously, a red alert window popped up on the monitoring interface, and the location at row 2, layer 3 in zone A was marked in red on the shelf floor plan.

[0094] For cable connector JLS-10 / 3×50, the actual number identified was 122, while the theoretical inventory quantity was 120, a surplus of two. The system attributed this to possible missed detections in a previous inventory count or to recent inbound operations that had not yet been entered into the system. It generated a Level 3 anomaly record, which was written to the system log for administrator review.
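
The two reconciliation outcomes above can be expressed as a small rule. This is a sketch under stated assumptions: the `reconcile` function and `Finding` type are hypothetical, the Level 2 fallback for shortfalls of non-high-value materials is an assumption, and the label used for a surplus is illustrative rather than taken from the description.

```python
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    delta: int                      # counted quantity minus recorded quantity
    anomaly_type: Optional[str]     # None when nothing needs reporting
    alert_level: Optional[int]      # 1 (most severe) to 3, or None


def reconcile(actual: int, expected: int, has_authorized_outbound: bool,
              high_value: bool) -> Finding:
    """Compare a counted quantity with the inventory record."""
    delta = actual - expected
    if delta == 0:
        return Finding(0, None, None)
    if delta < 0:
        if has_authorized_outbound:
            # Assumption: an authorized outbound record explains the shortfall.
            return Finding(delta, None, None)
        # Level 1 for high-value items follows [0093];
        # Level 2 for other items is an assumption of this sketch.
        level = 1 if high_value else 2
        return Finding(delta, "unauthorized quantity reduction", level)
    # Surplus: logged as a Level 3 record as in [0094]; the label is illustrative.
    return Finding(delta, "unexplained quantity increase", 3)


# Surge arrester HY5WS-17 / 50: counted 35, recorded 36, no outbound record.
print(reconcile(35, 36, has_authorized_outbound=False, high_value=True))
# Cable connector JLS-10 / 3×50: counted 122, recorded 120.
print(reconcile(122, 120, has_authorized_outbound=False, high_value=False))
```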

[0095] Regarding stacking status monitoring, the system detected that the stacking height of safety helmets in row 3, layer 5 in zone C was 1.8 meters, exceeding the 1.5-meter height limit set for this location by 0.3 meters, and generated a stacking over-limit warning. The system classified this as a Level 2 warning and notified the administrator via system message. The location was marked in orange on the shelf plan in the monitoring interface, with the warning information displaying a stacking height of 1.8 meters, a height limit of 1.5 meters, and an over-limit of 0.3 meters. The system also detected that the drop-out fuses in row 2, layer 1 in zone B were tilted. The angle between the normal vector of the material's top surface and the vertical direction, calculated from the depth image, was 18 degrees, exceeding the tilt angle threshold of 15 degrees. The system generated a Level 2 tilt warning, prompting the administrator to adjust the material placement in time to prevent tipping.
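
The height and tilt checks in this paragraph reduce to a subtraction and an angle between two vectors. The sketch below assumes that the top-surface normal has already been estimated from the depth image and that the world z axis is vertical; the function names are hypothetical.

```python
import math


def tilt_angle_deg(normal, vertical=(0.0, 0.0, 1.0)):
    """Angle in degrees between a surface normal and the vertical direction."""
    norms = math.hypot(*normal) * math.hypot(*vertical)
    if norms == 0.0:
        raise ValueError("normal and vertical must be non-zero vectors")
    dot = sum(n * v for n, v in zip(normal, vertical))
    # abs() makes the result independent of the sign chosen for the normal.
    cos_angle = min(1.0, abs(dot) / norms)
    return math.degrees(math.acos(cos_angle))


def stacking_warnings(height_m, height_limit_m, normal, tilt_limit_deg):
    """Return human-readable warnings for over-height and tilted stacks."""
    warnings = []
    overrun = height_m - height_limit_m
    if overrun > 0:
        warnings.append(f"stacking over-limit by {overrun:.1f} m")
    angle = tilt_angle_deg(normal)
    if angle > tilt_limit_deg:
        warnings.append(f"tilt {angle:.0f} deg exceeds {tilt_limit_deg:.0f} deg")
    return warnings


# Safety helmets: 1.8 m stack against a 1.5 m limit, level top surface.
print(stacking_warnings(1.8, 1.5, (0.0, 0.0, 1.0), 15))

# Drop-out fuses: top-surface normal tilted 18 degrees from vertical.
tilted = (math.sin(math.radians(18)), 0.0, math.cos(math.radians(18)))
print(stacking_warnings(1.0, 1.5, tilted, 15))
```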

[0096] After the system was implemented, it achieved automated management of daily operations, reducing inventory counting time from the traditional three days to less than two hours, and significantly improving counting accuracy. Emergency repair personnel can retrieve materials themselves through identity authentication at night and on holidays. The system automatically completes personnel detection, retrieval behavior recognition, and authorization matching, reducing response time from an average of thirty minutes to less than five minutes, significantly improving emergency response capabilities. The application of the early warning function transformed warehouse management from passive inventory counting to proactive monitoring. During a certain month of operation, the system generated three Level 1 warnings, fifteen Level 2 warnings, and forty-two Level 3 warnings. All Level 1 warnings were handled within ten minutes, effectively preventing material loss and safety accidents. The visual display of the monitoring interface allows administrators to intuitively grasp the warehouse status. The shelving floor plan displays the real-time distribution of materials and abnormal locations, and the timeline records historical warnings and handling, providing data support for management decisions. The successful application of the system provides an effective solution for the intelligent transformation of power material management.

[0097] The specific embodiments of the present invention have been described in detail above. By organically combining techniques such as polarization-guided depth repair, depth-guided dual-stream target detection, and progressive knowledge base verification, an innovative solution is proposed to address practical difficulties in power material warehouses, such as metal surface reflection, dense stacking obstruction, and nameplate wear and degradation. This achieves high-precision identification and reliable intelligent management of power materials, solves problems such as insufficient identification accuracy and poor environmental adaptability in existing technologies, and provides an effective technical solution for the intelligent transformation of power enterprise material management.

[0098] The embodiments of the present invention have been described above. However, the embodiments are not limited to the specific implementation methods described above. The specific implementation methods described above are merely illustrative and not restrictive. Those skilled in the art can make more equivalent embodiments under the guidance of the present embodiments, and all of them are within the protection scope of the present embodiments.

Claims

1. A machine vision-based unmanned power material management system, characterized in that it comprises: an image acquisition and processing module, used to acquire polarization images and depth data of warehouse materials and, through polarization-guided depth restoration and illumination-adaptive enhancement, to obtain a color image with balanced illumination and a complete and stable depth image; a material detection and positioning module, used to receive the illumination-balanced color image and the complete and stable depth image, identify densely stacked materials through a depth-guided dual-stream detection network, and generate a list of candidate material areas; a nameplate recognition and information extraction module, used to extract material nameplate information from the candidate material area list using nameplate positioning and text recognition technology, thereby obtaining structured material information and its overall credibility; a 3D positioning and identity verification module, used to perform 3D positioning of materials by combining the structured material information with the complete and stable depth image, and to verify the recognition results by matching against the knowledge base, so as to realize material identity confirmation and inventory change recording; and an anomaly monitoring and early warning module, used to detect anomalies based on the material identification results and inventory change records through personnel behavior analysis and status monitoring, and to generate early warning information according to the severity level.

2. The unmanned power material management system based on machine vision according to claim 1, characterized in that the image acquisition and processing module includes: An RGB-D camera with polarization acquisition capability is deployed to simultaneously acquire color images, depth images, and images at four polarization angles. For each pixel position in the image, the light intensity values at the corresponding position under the four polarization angles are extracted; the light intensity difference between the zero-degree and ninety-degree directions and the light intensity difference between the forty-five-degree and one-hundred-and-thirty-five-degree directions are calculated; the square root of the sum of the squares of the two differences is taken and divided by the sum of the light intensities at the four polarization angles to obtain the degree of polarization. This calculation is performed on all pixels in the image to generate a polarization degree map. The polarization degree information is used to perform weighted fusion repair on the hole regions in the depth image. A search neighborhood is set around each hole pixel, with its radius adaptively determined according to the hole size. All effective depth values and their corresponding polarization degree values are extracted within the search neighborhood, an effective depth value being one that lies within the effective measurement range and whose corresponding polarization degree is less than the preset high-reflectivity threshold. The contribution weight of each effective depth value in the neighborhood to the hole pixel is calculated as the product of a spatial distance weight and a polarization degree similarity weight. The filling depth value of the hole pixel is obtained as the normalized weighted sum of all effective depth values in the neighborhood according to these combined weights.
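
The per-pixel computation and hole filling recited in claim 2 can be sketched as follows. The degree of polarization follows the claim wording, including division by the sum of all four intensities. The Gaussian form of both weights, the parameter values, the use of 0 as the invalid-depth marker, and the function names are assumptions of this sketch; the claim only requires the product of a spatial distance weight and a polarization similarity weight.

```python
import math


def degree_of_polarization(i0, i45, i90, i135):
    """sqrt((I0 - I90)^2 + (I45 - I135)^2) / (I0 + I45 + I90 + I135)."""
    total = i0 + i45 + i90 + i135
    if total <= 0:
        return 0.0
    return math.sqrt((i0 - i90) ** 2 + (i45 - i135) ** 2) / total


def fill_hole_pixel(depth, dop, row, col, radius=2, sigma_s=1.5, sigma_p=0.1,
                    dop_max=0.6, depth_range=(0.2, 10.0)):
    """Fill one hole pixel from valid, weakly polarized neighbours.

    depth and dop are 2-D lists of equal shape; 0.0 marks a missing depth.
    The radius is passed in; choosing it from the hole size is left out here.
    """
    height, width = len(depth), len(depth[0])
    num = den = 0.0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            d, p = depth[r][c], dop[r][c]
            # Keep only depths in range whose pixel is not highly reflective.
            if not (depth_range[0] <= d <= depth_range[1]) or p >= dop_max:
                continue
            w_space = math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2 * sigma_s ** 2))
            w_pol = math.exp(-((p - dop[row][col]) ** 2) / (2 * sigma_p ** 2))
            num += w_space * w_pol * d
            den += w_space * w_pol
    return num / den if den > 0 else 0.0


# One hole at the centre; the top-right neighbour is a reflective outlier.
depth = [[2.0, 2.0, 9.0],
         [2.0, 0.0, 2.0],
         [2.0, 2.0, 2.0]]
dop = [[0.1, 0.1, 0.8],
       [0.1, 0.1, 0.1],
       [0.1, 0.1, 0.1]]
print(degree_of_polarization(100, 50, 0, 50))          # 0.5
print(round(fill_hole_pixel(depth, dop, 1, 1, radius=1), 6))
```

Because the outlier's degree of polarization is above `dop_max`, it is excluded and the filled value stays at the depth of the remaining neighbours.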

3. The unmanned power material management system based on machine vision according to claim 1, characterized in that, The image acquisition and processing module further includes: The brightness histogram of the image is analyzed to calculate the brightness mean and standard deviation. An enhancement strategy is automatically selected based on the illumination characteristics. Multi-scale illumination estimation is used to enhance low-light images, and contrast-limited adaptive histogram equalization is used to process overexposed images. Connectivity analysis is performed on residual hole regions. Small holes with an area less than a first area threshold are filled using a distance-weighted interpolation method based on the surrounding depth value. Large holes with an area greater than the first area threshold are constrained by combining edge information of the color image. Temporal consistency filtering is performed on depth images of multiple consecutive frames. For each pixel position in the depth image, the depth value sequence of the corresponding position in the multi-frame image is extracted. Outliers with a difference between the depth value and the median of the sequence exceeding a preset depth difference threshold are removed. The remaining depth values are then weighted and averaged, with the weights decaying over time.
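
The temporal consistency filtering recited at the end of claim 3 can be sketched for a single pixel as follows. Exponential decay is one possible choice for weights that decay over time; the decay factor, the threshold value, and the function name are assumptions of this sketch.

```python
import statistics


def temporal_filter(depths, max_diff=0.05, decay=0.8):
    """Filter one pixel's depth sequence, ordered from oldest to newest.

    Values further than max_diff from the sequence median are dropped;
    the rest are averaged with weights that shrink with frame age.
    """
    if not depths:
        return None
    median = statistics.median(depths)
    newest = len(depths) - 1
    num = den = 0.0
    for index, value in enumerate(depths):
        if abs(value - median) > max_diff:
            continue  # outlier relative to the median
        weight = decay ** (newest - index)  # newest frame has weight 1
        num += weight * value
        den += weight
    # With an even-length sequence every value can be rejected; fall back
    # to the median in that case.
    return num / den if den > 0 else median


# Five consecutive frames; the third reading is a transient outlier.
print(round(temporal_filter([2.00, 2.01, 3.50, 2.02, 2.01]), 4))
```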

4. The unmanned power material management system based on machine vision according to claim 1, characterized in that, The material detection and positioning module includes: A depth-guided dual-stream object detection network is constructed, comprising two parallel feature extraction branches: RGB stream and depth stream. The RGB stream processes color images to extract appearance features, while the depth stream processes depth images to extract geometric features. The backbone networks of both the RGB and depth streams adopt residual network structures, including an input layer and several residual stages. Each residual stage contains several residual blocks, and each residual block contains convolutional layers and skip connections. A cross-modal feature fusion module is executed at each stage of the dual-stream network, first concatenating the two feature maps along the channel dimension to obtain a concatenated feature map. The concatenated features are processed by convolutional layers. The output of the convolutional layers is batch normalized and activated to obtain the fused feature map. The fused feature map is then added to the original feature map of the RGB stream through residual connections to obtain the cross-modal fused feature.

5. The unmanned power material management system based on machine vision according to claim 1, characterized in that the material detection and positioning module also includes: Cross-modal fusion features are input into a feature pyramid network for multi-scale processing. The feature pyramid network uses a top-down path and lateral connections to achieve feature fusion. A detection head network is used for material detection at each feature pyramid layer, and the detection head adopts an anchor box mechanism. Depth information of the region corresponding to each detection box is extracted from the complete and stable depth image. Depth-guided non-maximum suppression is applied to the detection results. The non-maximum suppression algorithm is executed independently for each material category. First, all detection boxes of the corresponding category are extracted and sorted from high to low category confidence. The detection box with the highest confidence is selected as the retained box. The intersection over union (IoU) of the retained box and each remaining detection box is calculated. At the same time, the depth values of the regions corresponding to the two detection boxes are extracted, the average depth value of each region is calculated, and the difference between the two averages is taken as the depth difference. If the IoU is greater than the preset overlap threshold and the depth difference is less than the preset depth difference threshold, the detection box with lower confidence is suppressed. If the IoU is greater than the preset overlap threshold but the depth difference is greater than the preset depth difference threshold, both detection boxes are retained.
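
The depth-guided non-maximum suppression recited in claim 5 can be sketched as follows. The `Detection` record, the threshold values, and the assumption that each detection already carries the average depth of its box region are illustrative choices, not details from the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    box: tuple      # (x1, y1, x2, y2) in pixels
    score: float    # category confidence
    label: str      # material category
    depth: float    # average depth inside the box, in meters


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def depth_guided_nms(detections, iou_thr=0.5, depth_thr=0.3):
    """Suppress a box only if it overlaps a better box AND lies at a similar depth."""
    kept = []
    for label in sorted({d.label for d in detections}):
        pending = sorted((d for d in detections if d.label == label),
                         key=lambda d: d.score, reverse=True)
        while pending:
            best = pending.pop(0)
            kept.append(best)
            pending = [d for d in pending
                       if not (iou(best.box, d.box) > iou_thr
                               and abs(best.depth - d.depth) < depth_thr)]
    return kept


dets = [
    Detection((0, 0, 10, 10), 0.9, "fuse", 1.00),
    Detection((1, 1, 11, 11), 0.8, "fuse", 1.05),  # same object: suppressed
    Detection((1, 0, 11, 10), 0.7, "fuse", 1.60),  # stacked behind: retained
]
print([d.score for d in depth_guided_nms(dets)])  # [0.9, 0.7]
```

Plain non-maximum suppression would discard the third box because it overlaps the first; the depth test keeps it because the two boxes lie 0.6 m apart along the viewing direction.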

6. The unmanned power material management system based on machine vision according to claim 1, characterized in that, The nameplate recognition and information extraction module includes: The material image blocks are cropped according to the bounding box coordinates, and a nameplate localization network is constructed to perform semantic segmentation on the material image blocks. The nameplate localization network adopts an encoder-decoder architecture. The encoder contains several downsampling stages, and the decoder contains several upsampling stages. Each stage of the decoder and the corresponding stage of the encoder are fused through skip connections. Morphological processing is performed on the binary mask of the nameplate area. First, morphological closing operation is performed to fill the small holes inside the nameplate area, and then morphological opening operation is performed to remove isolated noise points. Connectivity analysis is performed on the processed binary mask to extract all connected candidate nameplate areas. The connected region with the largest area is selected as the final nameplate area. The minimum bounding rectangle of the nameplate area is calculated to determine the precise position of the nameplate.

7. The unmanned power material management system based on machine vision according to claim 1, characterized in that the nameplate recognition and information extraction module also includes: The text detection network employs a segmentation-based detection method. The enhanced nameplate image is input into the network, and multi-scale features are extracted through the feature extraction backbone network. The feature maps are then input into the text region prediction branch and the text geometric attribute prediction branch. The text recognition network adopts a convolutional recurrent neural network architecture, which includes a convolutional feature extraction module, a recurrent sequence encoding module, and a connectionist temporal classification (CTC) decoding module. The recurrent sequence encoding module uses a bidirectional long short-term memory network structure to encode feature sequences simultaneously from left to right and from right to left. A keyword dictionary is constructed based on the standard format of power equipment nameplates. Keyword matching is performed on the nameplate text content to locate the position of each parameter. The extracted parameters are then organized into structured data. The overall credibility is obtained by weighted fusion of the detection confidence, the location confidence, the average text detection confidence, the average recognition confidence, and the parsing confidence.
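
The keyword-based parsing and the weighted fusion at the end of claim 7 can be sketched as follows. The keyword dictionary, the sample nameplate text, the confidence values, the equal weights, and the definition of parsing confidence as the fraction of expected fields found are all assumptions made for illustration.

```python
# Hypothetical keyword dictionary: structured field -> accepted nameplate keywords.
KEYWORDS = {
    "model": ("model", "type"),
    "rated_voltage": ("rated voltage",),
    "manufacturer": ("manufacturer",),
}


def parse_nameplate(lines):
    """Map recognized 'keyword: value' lines to structured fields."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no keyword separator on this line
        key = key.strip().lower()
        for field, aliases in KEYWORDS.items():
            if key in aliases:
                fields[field] = value.strip()
    return fields


def overall_credibility(confidences, weights):
    """Weighted average of the individual confidences."""
    total = sum(weights.values())
    return sum(weights[name] * confidences[name] for name in weights) / total


# Made-up recognized text for illustration only.
text_lines = ["Model: HY5WS-17/50", "Rated voltage: 17 kV", "S/N 00123"]
fields = parse_nameplate(text_lines)

confidences = {
    "detection": 0.9,
    "location": 0.8,
    "text_detection": 0.7,
    "recognition": 0.6,
    # Assumption: parsing confidence = share of expected fields that were found.
    "parsing": len(fields) / len(KEYWORDS),
}
weights = {name: 1.0 for name in confidences}  # equal weights, an assumption
print(fields)
print(round(overall_credibility(confidences, weights), 3))
```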

8. The unmanned power material management system based on machine vision according to claim 1, characterized in that, The three-dimensional positioning and authentication module includes: Based on the coordinates of the detection bounding box of the material, the corresponding rectangular area is located in the depth image. All depth values within the bounding box range are extracted from the depth image to form a depth value matrix. After removing invalid depth values, the median of the depth values is calculated as the representative depth value of the material surface. The three-dimensional coordinates of the materials in the camera coordinate system are calculated based on the back projection formula of the pinhole camera model. The camera coordinates of the materials are converted into the three-dimensional position in the warehouse world coordinate system through the extrinsic parameter matrix. The Kalman filter algorithm is used to perform temporal fusion of continuous multi-frame position data. The state vector of the Kalman filter contains the three-dimensional coordinates and three-dimensional velocity of the materials. The measurement noise covariance matrix of the observation model is set according to the uncertainty of depth measurement.
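
The representative depth, pinhole back projection, and camera-to-world conversion recited in claim 8 can be sketched as follows. The intrinsic parameters, the valid depth range, and the convention that the extrinsic parameters are given as a rotation matrix and translation vector mapping camera coordinates to world coordinates are assumptions. The Kalman filtering step is not covered here.

```python
import statistics


def representative_depth(patch, valid_range=(0.2, 10.0)):
    """Median of the valid depth values inside a bounding-box patch."""
    low, high = valid_range
    values = [d for row in patch for d in row if low <= d <= high]
    return statistics.median(values) if values else None


def back_project(u, v, z, fx, fy, cx, cy):
    """Pinhole back projection of pixel (u, v) at depth z to camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def camera_to_world(point, rotation, translation):
    """Apply p_world = R * p_cam + t with R as a 3x3 nested list."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical values: a 2x2 depth patch with one invalid reading (0.0).
z = representative_depth([[2.0, 0.0], [2.1, 1.9]])
p_cam = back_project(u=420, v=340, z=z, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_world = camera_to_world(p_cam, identity, (1.0, 2.0, 0.0))
print(z, [round(c, 3) for c in p_cam], [round(c, 3) for c in p_world])
```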

9. The unmanned power material management system based on machine vision according to claim 1, characterized in that, The three-dimensional positioning and authentication module also includes: The actual physical size of the material is estimated based on the bounding box size and the representative depth value. The estimated physical size is then compared and verified with the standard size of the corresponding material in the knowledge base. The knowledge base is constructed in a progressive manner. For each identified material, its key attributes are extracted for matching. All candidate materials with the same category as the corresponding material are searched in the knowledge base. The similarity between the identified material and each candidate material is calculated. The similarity calculation takes into account both text similarity and attribute similarity. The material with the highest similarity is selected as the matching result. The correlation verification includes matching relationship verification and parameter compatibility verification. Matching relationship verification checks whether the materials that need to be used together exist at the same time, and parameter compatibility verification checks whether the technical parameters of materials in the same area are compatible.
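
The size estimation and knowledge base matching recited in claim 9 can be sketched as follows. The size estimate uses the pinhole relation between pixel extent, depth, and focal length. The relative tolerance, the use of `difflib.SequenceMatcher` for text similarity, the attribute similarity as a share of equal attributes, the weights, and the dictionary layout of the knowledge base entries are all assumptions of this sketch.

```python
from difflib import SequenceMatcher


def estimate_size(box, depth, fx, fy):
    """Physical width and height (meters) from a pixel box and its depth."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * depth / fx, (y2 - y1) * depth / fy)


def size_matches(estimated, standard, rel_tol=0.15):
    """True when every dimension is within rel_tol of the standard size."""
    return all(abs(e - s) <= rel_tol * s for e, s in zip(estimated, standard))


def similarity(identified, candidate, w_text=0.6, w_attr=0.4):
    """Weighted mix of model-string similarity and attribute agreement."""
    text_sim = SequenceMatcher(None, identified["model"], candidate["model"]).ratio()
    keys = set(identified["attrs"]) | set(candidate["attrs"])
    same = sum(1 for k in keys
               if identified["attrs"].get(k) == candidate["attrs"].get(k))
    attr_sim = same / len(keys) if keys else 1.0
    return w_text * text_sim + w_attr * attr_sim


def best_match(identified, knowledge_base):
    """Most similar knowledge base entry of the same category, or None."""
    same_category = [c for c in knowledge_base
                     if c["category"] == identified["category"]]
    return max(same_category, key=lambda c: similarity(identified, c), default=None)


# Hypothetical entries; the identified model contains an OCR error ('O' for '0').
knowledge_base = [
    {"category": "surge arrester", "model": "HY5WS-17/50", "attrs": {"rated_kv": 17}},
    {"category": "surge arrester", "model": "HY5WS-10/30", "attrs": {"rated_kv": 10}},
]
identified = {"category": "surge arrester", "model": "HY5WS-17/5O",
              "attrs": {"rated_kv": 17}}
print(best_match(identified, knowledge_base)["model"])  # HY5WS-17/50
print(size_matches(estimate_size((100, 100, 300, 200), 2.0, 500.0, 500.0), (0.8, 0.4)))
```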

10. The unmanned power material management system based on machine vision according to claim 1, characterized in that the anomaly monitoring and early warning module includes: The category, model, quantity, and three-dimensional location information of each material are extracted from the material identification results and compared with the data of the previous moment in the historical time series database. For materials whose quantity has changed, whether corresponding authorized inbound and outbound records exist is analyzed. For materials whose quantity is unchanged, their spatial locations are compared; if the Euclidean distance between the current location and the historical location exceeds the preset location change threshold, a location change anomaly record is generated. A personnel detection network and a multi-target tracking algorithm are used to analyze personnel behavior, establish personnel movement trajectories, analyze the spatial relationship between the trajectories and material areas, and match personnel retrieval operations with authorization records. Three-dimensional spatial analysis methods are used to monitor the stacking status of materials, detecting the stacking height, the tilt status, and whether the materials exceed the storage location boundaries. For each kind of abnormal information, a warning level is determined according to its type and severity: a Level 1 warning indicates a serious abnormality that requires immediate handling, a Level 2 warning indicates an important abnormality that requires prompt handling, and a Level 3 warning indicates a general abnormality that requires attention.
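
The location change check recited in claim 10 can be sketched as follows. The snapshot layout (material identifier mapped to a quantity and a three-dimensional position), the threshold value, and the function name are assumptions made for illustration.

```python
import math


def location_anomalies(current, previous, threshold_m=0.2):
    """Return (material_id, shift in meters) for unchanged-quantity materials
    whose position moved by more than threshold_m since the last snapshot."""
    records = []
    for material_id, (quantity, position) in current.items():
        if material_id not in previous:
            continue  # new material: no history to compare against
        prev_quantity, prev_position = previous[material_id]
        if quantity != prev_quantity:
            continue  # quantity changes go through the authorization check
        shift = math.dist(position, prev_position)  # Euclidean distance
        if shift > threshold_m:
            records.append((material_id, round(shift, 3)))
    return records


# Hypothetical snapshots: material A moved by 0.5 m, material B by 0.05 m.
previous = {"A": (5, (1.0, 2.0, 0.5)), "B": (3, (4.0, 1.0, 1.5))}
current = {"A": (5, (1.3, 2.4, 0.5)), "B": (3, (4.0, 1.05, 1.5))}
print(location_anomalies(current, previous))  # [('A', 0.5)]
```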