A smart timber sorting system based on multi-perspective collaboration and multi-modal fusion
The intelligent timber inventory management system, which integrates multimodal data fusion and multi-view collaboration, solves the problems of timber counting drift and omission in multi-view stacking scenarios, achieves high-precision timber measurement and statistics, and improves the robustness and transparency of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHANGJIAGANG ZHONGLI OCEAN SHIPPING TALLY CO LTD
- Filing Date
- 2026-04-09
- Publication Date
- 2026-06-30
AI Technical Summary
Existing automated sorting solutions suffer from topological inaccuracies in 2D feature matching due to visual homogenization of timber end faces in multi-view stacking scenarios, resulting in counting drift and omissions. 2D image processing solutions are affected by random tilt projections, making it difficult to accurately restore the true tilt angle of the end face. Furthermore, the single visible light modality has low robustness under complex working conditions, making it difficult to achieve accurate reconstruction of multi-view topology and accurate mapping of three-dimensional spatial scale in large-scale stacking scenarios.
A multimodal data acquisition array is used to acquire high-resolution color images, 3D laser point clouds, and long-wave infrared thermal imaging data. Combined with a spatiotemporal reference synchronization module, a multi-source feature deep fusion module, a 3D spatial geometric correction engine, and a cross-view topology collaborative deduplication module, the accurate identification and measurement of wood end faces are achieved through multi-view collaboration and multimodal fusion.
It enables high-precision measurement of timber diameter data under non-orthophoto conditions, eliminates counting drift and omission problems, improves the reliability and robustness of volume calculation, ensures the uniqueness and objectivity of tallying and statistical results, reduces manual labor intensity, and improves the fairness and transparency of trade settlement.
Figure CN121999421B_ABST
Description
Technical Field
[0001] This invention belongs to the field of computer vision and smart logistics, specifically a smart timber sorting system based on multi-view collaboration and multi-modal fusion.
Background Technology
[0002] With the expansion of global timber trade and the advancement of digital transformation in forestry, automated cargo handling of logs and products has become key to improving the efficiency of port warehousing, logistics turnover and trade settlement. Traditional manual on-site measurement, coding and counting have problems such as high labor intensity, subjective judgment differences, parallax fatigue and environmental obstruction, resulting in data lag and bias.
[0003] Existing automated sorting solutions use high-resolution industrial cameras to capture images of the end faces of timber, combine them with deep learning models to achieve detection and feature extraction, and introduce multi-view image stitching technology to construct a global digital view. This approach is superior when dealing with individual timber or simple standardized stacking scenarios, and significantly improves the level of automation in counting.
[0004] The increasing demands for measurement accuracy and system robustness in industrial settings expose limitations in existing technologies. These limitations include: visual homogenization of timber end faces in multi-view stacking scenarios, leading to inaccurate 2D feature matching topology and causing counting drift and undercounting; the 2D image processing scheme is affected by random tilt projection, making it difficult to accurately recover the true tilt angle of the end face and resulting in deviations in diameter and volume calculations; and the single visible light modality has low robustness under complex working conditions. The core technical bottleneck is to achieve accurate topological reconstruction of large-scale stacking scenarios from multiple perspectives, accurate mapping of three-dimensional spatial scale under oblique projection, and multi-source information fusion compensation. To address these challenges, this invention provides a timber intelligent sorting system based on multi-view collaboration and multi-modal fusion.
Summary of the Invention
[0005] This invention aims to overcome the shortcomings of the prior art and to solve at least one of the technical problems raised in the background art.
[0006] The technical solution adopted by this invention to solve its technical problem is: a timber intelligent sorting system based on multi-view collaboration and multi-modal fusion, the system being built on a mobile sorting operation platform, the system comprising:
[0007] A multimodal data acquisition array is used to acquire multi-source sensing data of timber stacks to be sorted, including high-resolution color image data, three-dimensional laser point cloud data, and long-wave infrared thermal imaging data.
[0008] The spatiotemporal reference synchronization module is electrically connected to the multimodal data acquisition array and is used to provide a time alignment reference and spatial pose parameters for the multi-source sensing data.
[0009] The multi-source feature deep fusion module is used to perform spatial voxel mapping on the multi-source sensing data and generate an enhanced feature map for identifying the end face of wood by dynamically adjusting the feature weights of different modalities.
[0010] The 3D spatial geometry correction engine is used to construct the inverse mapping matrix of projective transformation based on the 3D spatial pose of the wood end face, restore the wood contour in the 2D image to the orthogonal circular contour in physical space, and calculate the physical scale parameters of the wood.
[0011] The cross-view topology collaborative deduplication module is used to build a global dynamic spatial database based on absolute geographic coordinates. Through spatiotemporal topology association and motion compensation logic, it can uniquely identify and deduplicate timber targets that are repeatedly observed from different perspectives.
[0012] The central control and data storage system is used to coordinate the concurrent operation of each module, generate electronic inventory lists with geofence tags, and transmit the inventory results to the remote management terminal.
[0013] The multimodal data acquisition array is fixedly mounted on the support frame of the sorting and handling platform. It includes at least two sets of high-resolution industrial cameras, one set of multi-line lidar, and one set of long-wave infrared thermal imaging sensors. The high-resolution industrial cameras use global shutter CMOS sensors with an effective pixel count of at least 12 million and are equipped with low-distortion industrial lenses with constant apertures to acquire high-definition color texture information of the wood end face. The multi-line lidar is positioned above the geometric center of the camera array, with a horizontal field of view of at least 120 degrees, a vertical field of view of at least 40 degrees, and a ranging accuracy of ±2 cm. It is used to scan the three-dimensional spatial point cloud data of the wood stack in real time. The long-wave infrared thermal imaging sensor operates in the wavelength range of 8 micrometers to 14 micrometers and is used to extract the end face contour features by utilizing the difference in emissivity between the wood and the environmental background when there is insufficient lighting or when the wood end face is obstructed by coverings.
[0014] The spatiotemporal reference synchronization module includes a high-precision inertial measurement module, a differential positioning system, and a hardware trigger controller;
[0015] The hardware trigger controller is electrically connected to the industrial camera, LiDAR, and infrared sensor via external trigger cables. Driven by a field-programmable gate array, the hardware trigger controller generates a frequency-controlled synchronization pulse sequence to ensure that all sensors complete data sampling at the same microsecond-level timestamp. The inertial measurement module records the three-axis acceleration and three-axis angular velocity of the cargo handling platform at the moment of sampling. The differential positioning system provides centimeter-level spatial coordinates. The two systems are fused through loosely or tightly coupled logic to output the six-degree-of-freedom real-time pose parameters of the cargo handling platform. The real-time pose parameters include latitude and longitude, altitude, pitch angle, roll angle, and yaw angle.
[0016] The multi-source feature deep fusion module runs within an embedded computing core. This module receives a two-dimensional image stream from an industrial camera and a three-dimensional point cloud stream from a lidar.
[0017] The camera intrinsic parameter matrix, the lidar extrinsic parameter matrix, and the relative transformation matrix between the two are obtained through offline calibration.
[0018] During the online processing phase, this module uses reprojection logic to project the 3D point cloud onto the 2D image plane, achieving pixel-level spatial alignment.
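The reprojection step can be illustrated with a minimal sketch, assuming a standard pinhole camera model. The function name `project_points` and the symbols `K` (intrinsic matrix), `R` and `t` (lidar-to-camera rotation and translation) are illustrative assumptions, not names taken from this invention.

```python
import numpy as np

def project_points(points_lidar, K, R, t):
    """Project N x 3 lidar points into pixel coordinates (pinhole model).

    K is the 3 x 3 camera intrinsic matrix; R (3 x 3) and t (3,) are the
    assumed lidar-to-camera extrinsics obtained from offline calibration.
    Returns pixel coordinates, depths, and a mask of points in front of the camera.
    """
    pts_cam = points_lidar @ R.T + t           # lidar frame -> camera frame
    in_front = pts_cam[:, 2] > 0               # discard points behind the camera
    pts_cam = pts_cam[in_front]
    uvw = pts_cam @ K.T                        # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]              # perspective division
    return uv, pts_cam[:, 2], in_front

# Illustrative calibration values (hypothetical).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 10.0], [0.0, 0.0, -1.0]])
uv, depth, mask = project_points(points, K, R, t)
```

Each projected point carries its depth, so a pixel that receives a lidar return can be annotated with a range value in addition to its color.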
[0019] For the end face features of wood, this module adopts a semantic segmentation architecture based on deep residual networks to perform pixel-level annotation of the end face of wood in color images;
[0020] Using depth information obtained by lidar, background noise interference from non-timber areas is eliminated;
[0021] In areas where image textures are indistinct due to dirt or peeling, this module automatically calls contour data from a long-wave infrared thermal imaging sensor for compensation. Through dynamic adjustment of multimodal feature weights, it generates an enhanced end-face candidate region feature map.
[0022] The spatial voxelization mapping strategy divides the spatial volume into several voxel modules. Each voxel module simultaneously stores the occupancy status and reflection intensity from the multi-line lidar, the RGB color components from the industrial camera, and the temperature components from the long-wave infrared thermal imaging sensor, forming a five-dimensional feature data structure.
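A minimal sketch of such a voxel container follows. It assumes that occupancy is represented implicitly by the presence of a voxel key, and that the five stored dimensions are reflection intensity, the three RGB components, and temperature; the 0.05 m voxel size and the function name `voxelize` are hypothetical.

```python
import numpy as np
from collections import defaultdict

def voxelize(points, intensity, rgb, temperature, voxel_size=0.05):
    """Group fused points into voxels and average their features.

    Returns a dict mapping an integer voxel index (i, j, k) to a feature
    vector [intensity, R, G, B, temperature]. A voxel is considered
    occupied if its key exists in the dict (an assumption of this sketch).
    """
    indices = np.floor(np.asarray(points) / voxel_size).astype(int)
    features = np.column_stack([intensity, rgb, temperature])  # shape (N, 5)
    buckets = defaultdict(list)
    for index, feature in zip(indices, features):
        buckets[tuple(int(i) for i in index)].append(feature)
    return {key: np.mean(rows, axis=0) for key, rows in buckets.items()}

points = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02], [0.32, 0.0, 0.0]])
intensity = np.array([0.2, 0.4, 0.9])
rgb = np.array([[100, 80, 60], [120, 100, 80], [10, 20, 30]], dtype=float)
temperature = np.array([15.0, 17.0, 25.0])
grid = voxelize(points, intensity, rgb, temperature)
```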
[0023] When extracting features from the end face of wood, an adaptive illumination compensation logic is introduced. When the image contrast of the visible light modality is lower than a preset contrast threshold, the feature weights of the long-wave infrared thermal imaging sensor data and of the multi-line lidar reflection intensity are automatically increased.
[0024] The three-dimensional spatial geometry correction engine obtains the local point cloud slices corresponding to each detected wood end face from the multi-source feature deep fusion module. The engine uses random sample consensus (RANSAC) logic to fit the geometric plane representing the wood end face in the local point cloud slices.
[0025] Calculate the normal vector of the geometric plane and perform a vector angle calculation with the direction vector of the camera optical axis to determine the actual tilt posture of the wood end face in three-dimensional space.
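The plane fitting and tilt calculation can be sketched as follows, assuming a basic random sample consensus loop followed by a least-squares refinement on the inliers. The iteration count, the 1 cm inlier tolerance, and the function names are hypothetical choices for illustration, and the camera optical axis is assumed to be the +Z axis of the camera frame.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_tol=0.01, seed=0):
    """Fit a plane to an N x 3 point cloud; returns (unit normal, centroid, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                        # degenerate (collinear) sample
            continue
        normal = normal / norm
        inliers = np.abs((points - p0) @ normal) < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no non-degenerate sample found")
    # Refine on the inliers: the plane normal is the direction of least variance.
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid, full_matrices=False)
    return vt[-1], centroid, best_inliers

def tilt_angle_deg(normal, optical_axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between the end-face normal and the camera optical axis."""
    axis = np.asarray(optical_axis, dtype=float)
    cos_angle = abs(normal @ axis) / (np.linalg.norm(normal) * np.linalg.norm(axis))
    return float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))
```

The absolute value in `tilt_angle_deg` makes the result independent of the sign of the fitted normal, which is arbitrary for a plane.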
[0026] Based on the calculated tilt angle and rotation angle, the engine constructs an inverse mapping matrix for projective transformation, restoring the elliptical projection contour in the two-dimensional image to an orthographic circle contour in three-dimensional physical space.
[0027] Based on this, the pixel scale is converted into the physical scale by using the camera's focal length parameters and the instantaneous depth distance measured by the lidar, and the precise diameter, circumference and cross-sectional area of each end face of the wood are calculated.
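One way to picture the restoration from image contour to physical scale is the sketch below. Instead of building an explicit inverse projective matrix, it intersects each contour pixel ray with the fitted end-face plane, which is geometrically equivalent for a planar target; this substitution, the function names, and the use of the mean point as circle centre (reasonable only when contour samples are spread evenly around the end face) are assumptions of the sketch.

```python
import numpy as np

def backproject_to_plane(uv, K, normal, point_on_plane):
    """Intersect pixel rays with the end-face plane; returns N x 3 points (camera frame)."""
    homogeneous = np.hstack([uv, np.ones((len(uv), 1))])
    rays = homogeneous @ np.linalg.inv(K).T              # ray directions with z = 1
    scale = (point_on_plane @ normal) / (rays @ normal)  # ray-plane intersection
    return rays * scale[:, None]

def circle_metrics(points_3d):
    """Diameter, circumference and area of a roughly circular 3D contour (metres)."""
    centre = points_3d.mean(axis=0)
    radius = np.linalg.norm(points_3d - centre, axis=1).mean()
    return 2.0 * radius, 2.0 * np.pi * radius, np.pi * radius ** 2

# Synthetic check: a 0.30 m end face tilted 30 degrees, 5 m from the camera.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
theta = np.radians(30.0)
e1 = np.array([np.cos(theta), 0.0, -np.sin(theta)])
e2 = np.array([0.0, 1.0, 0.0])
normal = np.cross(e1, e2)
centre = np.array([0.2, 0.1, 5.0])
phi = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)[:, None]
circle = centre + 0.15 * (np.cos(phi) * e1 + np.sin(phi) * e2)
uvw = circle @ K.T
uv = uvw[:, :2] / uvw[:, 2:3]                            # elliptical contour in the image
diameter, circumference, area = circle_metrics(
    backproject_to_plane(uv, K, normal, centre))
```

In this synthetic case the elliptical image contour maps back to a circle of the original diameter, because the plane pose is known exactly; with real data the accuracy would depend on the quality of the fitted plane and the calibration.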
[0028] The three-dimensional spatial geometry correction engine also includes:
[0029] The correction module based on the longitudinal axis prediction of timber is used to combine the sparse point cloud captured by the multi-line lidar on the side of the timber to estimate the overall orientation and length of a single timber, and to perform an integral operation on the end face area and the estimated effective length to realize the measurement of the volume of a single timber.
[0030] The edge filtering module based on geometric curvature variation is used to calculate the local curvature of sampling points on the contour line after extracting the end face contour, remove abnormal points with abrupt curvature changes, and use a robust fitting algorithm to fit the remaining effective feature points to eliminate measurement errors caused by heartwood cracking or sapwood damage.
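The curvature-based filtering can be sketched as follows, assuming a closed, ordered 2D contour. Local curvature is estimated from each point and its two neighbours (Menger curvature), points whose curvature deviates from the median by more than a relative tolerance are dropped, and a plain least-squares circle fit stands in for the robust fitting algorithm, which the text does not specify. The 50% tolerance and the function names are hypothetical.

```python
import numpy as np

def menger_curvature(contour):
    """Curvature at each point of a closed N x 2 contour from its two neighbours."""
    prev_pts = np.roll(contour, 1, axis=0)
    next_pts = np.roll(contour, -1, axis=0)
    a = np.linalg.norm(contour - prev_pts, axis=1)
    b = np.linalg.norm(next_pts - contour, axis=1)
    c = np.linalg.norm(next_pts - prev_pts, axis=1)
    v1, v2 = contour - prev_pts, next_pts - prev_pts
    twice_area = np.abs(v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0])
    return 2.0 * twice_area / np.maximum(a * b * c, 1e-12)

def filter_by_curvature(contour, rel_tol=0.5):
    """Boolean mask of points whose curvature stays close to the median curvature."""
    kappa = menger_curvature(contour)
    median = np.median(kappa)
    return np.abs(kappa - median) <= rel_tol * median

def fit_circle(points):
    """Least-squares circle fit; returns (centre_x, centre_y, radius)."""
    x, y = points[:, 0], points[:, 1]
    design = np.column_stack([2.0 * x, 2.0 * y, np.ones(len(x))])
    (cx, cy, c), *_ = np.linalg.lstsq(design, x ** 2 + y ** 2, rcond=None)
    return cx, cy, np.sqrt(c + cx ** 2 + cy ** 2)

# A 0.15 m radius contour with one point pulled inward, imitating a crack.
angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
radii = np.full(72, 0.15)
radii[10] = 0.10
contour = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
keep = filter_by_curvature(contour)
cx, cy, radius = fit_circle(contour[keep])
```

The defect point and its two immediate neighbours are rejected, since all three see an abrupt curvature change, and the fit then uses only the undamaged part of the contour.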
[0031] The cross-view topology collaborative deduplication module is used to process overlapping observation data generated during the movement of the work platform. This module establishes a global dynamic spatial database with the absolute geographic coordinates provided by RTK-GNSS as the reference coordinate system. Whenever the system identifies a new piece of timber, it stores the coordinates of its three-dimensional centroid, the direction of its normal vector, and the corrected physical characteristic parameters into the database and assigns a unique global identification number, i.e., UID.
[0032] When the work platform moves to the next view pose, the module uses the current pose parameters to predict the theoretical projection position of the stored wood on the current image plane. By performing spatiotemporal topological association between the target detected in the current frame and the predicted target in the database, and combining the motion compensation algorithm logic, it determines whether the current target is a recorded target.
[0033] If both the spatial distance deviation and feature similarity are within the set deviation threshold, they are determined to be the same target, and their pose attributes are updated without increasing the count value.
[0034] If it does not meet the requirements, it is determined to be newly entered wood in the field of vision. This collaborative logic based on spatial absolute coordinates and motion vector prediction fundamentally eliminates the deduplication failure caused by visual self-similarity.
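The matching decision can be sketched as below. For brevity the sketch compares centroids directly in the global coordinate frame rather than projecting stored targets onto the current image plane, and it omits motion compensation; the class name, the 0.10 m distance threshold, the 0.8 cosine-similarity threshold, and the feature vectors are hypothetical.

```python
import itertools
from dataclasses import dataclass

import numpy as np

@dataclass
class TimberRecord:
    uid: int
    centroid: np.ndarray   # global 3D centroid of the end face (metres)
    feature: np.ndarray    # appearance descriptor of the end face

class TimberDatabase:
    """Minimal global database that deduplicates repeated observations."""

    def __init__(self, max_dist_m=0.10, min_similarity=0.8):
        self.max_dist_m = max_dist_m
        self.min_similarity = min_similarity
        self.records = []
        self._uids = itertools.count(1)

    def observe(self, centroid, feature):
        """Return (uid, is_new) for one detected end face."""
        best = None
        for rec in self.records:
            dist = np.linalg.norm(rec.centroid - centroid)
            sim = rec.feature @ feature / (
                np.linalg.norm(rec.feature) * np.linalg.norm(feature))
            if dist <= self.max_dist_m and sim >= self.min_similarity:
                if best is None or dist < best[0]:
                    best = (dist, rec)
        if best is not None:
            best[1].centroid = centroid        # update pose, do not increase the count
            return best[1].uid, False
        rec = TimberRecord(next(self._uids), centroid, feature)
        self.records.append(rec)
        return rec.uid, True

db = TimberDatabase()
feature = np.array([0.9, 0.1, 0.3])
first = db.observe(np.array([10.0, 5.0, 1.0]), feature)
again = db.observe(np.array([10.03, 5.01, 1.0]), feature)   # same log, new viewpoint
other = db.observe(np.array([10.5, 5.0, 1.0]), feature)     # neighbouring log
```

The neighbouring log has the same appearance descriptor but lies 0.5 m away, so it receives its own UID; this is the case in which purely visual matching would be prone to merging the two.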
[0035] The feature vector includes descriptors of wood end-face annual ring distribution frequency, core area eccentricity, and texture complexity extracted using a deep convolutional network;
[0036] The state estimation logic based on pose graph optimization is used to generate key frames during the movement of the cargo handling platform. A constraint network is constructed by matching common observation targets between adjacent key frames, and nonlinear optimization logic is used to adjust the pose of the key frames to minimize the observation residuals in the global scope.
[0037] The loop-closure detection logic corrects pose drift by recognizing distinctive timber features when the sorting platform returns to the vicinity of known geographic coordinates. If the pose closure error exceeds the set error threshold, the poses of the keyframes along the entire path are adjusted using linear interpolation logic.
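The linear interpolation adjustment can be sketched as follows, assuming only keyframe positions are corrected (orientation is omitted) and that the closure error is the offset between the last keyframe and its known true position. The 0.05 m threshold and the function name are hypothetical.

```python
import numpy as np

def distribute_closure_error(keyframe_positions, closure_error, threshold=0.05):
    """Spread a loop-closure position error linearly over all keyframes.

    The first keyframe is left unchanged and the last one absorbs the full
    correction. If the error is within the threshold, nothing is changed.
    """
    positions = np.asarray(keyframe_positions, dtype=float)
    error = np.asarray(closure_error, dtype=float)
    if np.linalg.norm(error) <= threshold:
        return positions.copy()
    weights = np.linspace(0.0, 1.0, len(positions))[:, None]
    return positions - weights * error

# A square path that should end where it started but drifted by (0.2, 0.1) m.
path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.2, 0.1]]
corrected = distribute_closure_error(path, closure_error=[0.2, 0.1])
```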
[0038] The central control and data storage system is responsible for coordinating the operation status of each module and transmitting the final inventory results, including the total number of timbers, diameter distribution histogram, total volume assessment data, and electronic inventory list with geofence tags, to the remote management terminal via an encrypted wireless link.
[0039] Preferably, in the multimodal data acquisition array, the mounting bracket of the industrial camera is equipped with a three-axis active image stabilization gimbal. The control logic of the image stabilization gimbal is linked in real time with the inertial measurement module, and by reverse compensation for the high-frequency vibration of the cargo handling platform on bumpy roads or during movement, it ensures the stability of the optical axis during image acquisition, thereby reducing feature extraction errors caused by motion blur;
[0040] An auxiliary infrared fill light array, whose emission wavelength matches the peak photosensitive response of the industrial camera, is controlled by the central control and data storage system and automatically turns on when the ambient light level is below a set threshold.
[0041] Preferably, the multi-source feature deep fusion module further includes an adaptive illumination compensation logic; this logic adjusts the exposure time and gain parameters of the industrial camera in real time by analyzing the distribution characteristics of the image histogram; under backlight or strong shadow conditions, the module enhances the recognition of wood end-face texture in dark areas through local contrast enhancement processing; when the ambient light is lower than the set illumination threshold, the system automatically activates the auxiliary infrared fill light array, and the emission wavelength of the fill light matches the near-infrared response peak of the industrial camera's photosensitive element to maintain a stable imaging signal-to-noise ratio.
[0042] Preferably, the three-dimensional spatial geometry correction engine introduces a correction factor based on the prediction of the longitudinal axis of the wood when calculating the volume. The engine not only analyzes the geometry of the end face, but also combines the sparse point cloud captured by the lidar on the side of the wood to estimate the overall orientation and length of the wood. By integrating the end face area with the estimated effective length, the engine achieves a refined measurement of the volume of a single piece of wood, rather than using a uniform average length assumption.
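A minimal sketch of the length and volume estimate follows. It assumes the log axis can be approximated by the principal direction of the sparse side point cloud, that the effective length is the extent of the points along that axis, and that the cross-section is constant along the log so the integral reduces to area times length. These simplifications and the function names are assumptions of the sketch.

```python
import math

import numpy as np

def estimate_axis_and_length(side_points):
    """Principal axis (unit vector) and extent along it for an N x 3 side point cloud."""
    centred = side_points - side_points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    axis = vt[0]                                # direction of largest variance
    projection = centred @ axis
    return axis, float(projection.max() - projection.min())

def log_volume(end_face_area_m2, length_m):
    """Volume under a constant cross-section assumption."""
    return end_face_area_m2 * length_m

# Sparse lidar returns along the side of a 4 m log (synthetic).
x = np.linspace(0.0, 4.0, 50)
y = 0.01 * np.where(np.arange(50) % 2 == 0, 1.0, -1.0)
side = np.column_stack([x, y, np.full(50, 0.15)])
axis, length = estimate_axis_and_length(side)
volume = log_volume(math.pi * 0.15 ** 2, length)
```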
[0043] Preferably, the cross-view topology collaborative deduplication module also has breakpoint resume and closed-loop detection logic. When the cargo handling platform completes a cycle of a circular path or a reciprocating path, the system identifies the landmark timber features of known geographical coordinates, performs global optimization and correction, eliminates accumulated pose drift errors, and ensures the statistical consistency of the total number of cargo handled.
[0044] The system also includes:
[0045] The dynamic occlusion processing module is used to detect the edge of the back row of wood at the gap using the multi-echo technology of the multi-line lidar, and infer the shape of the occlusion area by combining the three-dimensional spatial occupancy relationship, and extract the features of the target from different perspectives for splicing and supplementation.
[0046] The physical environment correction module is used to integrate temperature and humidity sensors to acquire atmospheric environmental parameters and to correct the time-of-flight data of the multi-line lidar for the corresponding atmospheric refractive index.
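The effect of this correction can be illustrated with the basic time-of-flight relation, range = c·t / (2n), where n is the refractive index of air. The sketch takes n as an input, because the model that converts temperature and humidity into a refractive index is not specified here; the value 1.000273 is only an illustrative placeholder.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0   # speed of light in vacuum

def tof_range(round_trip_time_s, n=1.0):
    """One-way range in metres from a round-trip time and a refractive index n."""
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / (2.0 * n)

range_vacuum = tof_range(100e-9)                 # no atmospheric correction
range_air = tof_range(100e-9, n=1.000273)        # illustrative refractive index
correction_mm = (range_vacuum - range_air) * 1000.0
```

At a range of about 15 m the correction is on the order of a few millimetres, well below the stated ±2 cm ranging accuracy.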
[0047] The self-diagnostic module is used to monitor the current, temperature and frame rate of each sensor in the multimodal data acquisition array in real time, and triggers an audible and visual alarm when lens contamination or signal loss is detected.
[0048] The beneficial effects of this invention are as follows:
[0049] 1. The intelligent timber sorting system based on multi-view collaboration and multi-modal fusion described in this invention reconstructs the geometric posture of the timber end face in three-dimensional space through the deep fusion of lidar and visible light camera. Unlike traditional two-dimensional image processing schemes, it completely eliminates projective distortion caused by the tilt of the shooting angle by using end face normal vector correction logic. This makes the diameter data obtained under non-orthophoto conditions have a high degree of accuracy consistent with the physical measured value, thus improving the reliability of volume calculation.
[0050] 2. The intelligent timber sorting system based on multi-view collaboration and multi-modal fusion described in this invention, through the tight coupling positioning of RTK-GNSS and IMU, can establish a unique geospatial label for each piece of timber. This deduplication logic based on absolute physical location overcomes the mismatch and omission problems that are prone to occur in traditional visual stitching algorithms in large-scale, high-similarity stacking scenarios, ensuring the uniqueness and objectivity of the sorting and statistical results.
[0051] 3. The intelligent timber sorting system based on multi-view collaboration and multi-modal fusion described in this invention integrates an infrared thermal imaging sensor, an auxiliary infrared supplementary lighting system, and adaptive illumination compensation logic. The system can effectively resist the adverse effects of drastic changes in illumination, end-face dirt, and overlapping shadows. The collaborative compensation of multi-modal data improves the robustness of feature extraction.
[0052] 4. The intelligent timber sorting system based on multi-view collaboration and multi-modal fusion described in this invention allows sorting personnel to intuitively view the stacking status on a remote terminal through a real-time generated three-dimensional digital twin model with geographic coordinates. Data such as the diameter, volume, and location of each timber can be traced. This not only greatly reduces the intensity of manual labor, but also ensures the fairness and transparency of timber trade settlement by reducing human measurement errors and subjective intervention.
Attached Figure Description
[0053] The invention will now be further described with reference to the accompanying drawings.
[0054] Figure 1 is a structural block diagram of the timber intelligent sorting system based on multi-view collaboration and multi-modal fusion in this invention.
Detailed Implementation
[0055] To make the technical means, creative features, objectives and effects of this invention easier to understand, the invention will be further described below in conjunction with specific embodiments.
[0056] As shown in Figure 1, in this embodiment of the present invention, a timber intelligent sorting system based on multi-view collaboration and multi-modal fusion is deployed on a mobile sorting operation platform. The platform can be designed as a vehicle-mounted, rail-mounted, or hoisted type according to the needs of the operation scenario.
[0057] At the physical architecture level, the system achieves high-precision and automated inventory management of timber stacks through the organic combination of a multimodal data acquisition array, a spatiotemporal reference synchronization module, a multi-source feature deep fusion module, a three-dimensional spatial geometric correction engine, a cross-view topology collaborative deduplication module, and a central control and data storage system.
[0058] Regarding the engineering layout of the multimodal data acquisition array, it is securely mounted on the top reinforced support frame of the tallying platform. The array includes two sets of symmetrically distributed high-resolution industrial cameras. These cameras use global shutter CMOS sensors, each with an effective pixel count of over 12 million, and are equipped with industrial-grade lenses with fixed focal length and ultra-low distortion characteristics. The optical system is designed so that, even at the edge of the field of view, the image resolution can still support the extraction of fine textures.
[0059] Located directly above the geometric center of the camera array is a multi-line lidar. The mounting reference plane of this lidar is strictly parallel to or offset from the imaging plane of the industrial camera. The multi-line lidar has a horizontal field of view of no less than 120 degrees and a vertical field of view of no less than 40 degrees. Its laser emission frequency and scanning line number have been optimized to ensure that the ranging accuracy is stable at ±2 cm within a typical working distance of 3 to 15 meters.
[0060] The array also integrates a set of long-wave infrared thermal imaging sensors, whose sensing band is locked in the range of 8 micrometers to 14 micrometers, mainly used to capture the difference in thermal emissivity between the end face of the wood and the background environment.
[0061] To ensure the quality of the acquired data, the industrial camera is connected to the support frame via a three-axis active image stabilization gimbal. The three-axis active image stabilization gimbal integrates a high-frequency servo motor and drive circuit, and its control signal comes directly from the inertial measurement module in the spatiotemporal reference synchronization module.
[0062] When the cargo handling platform moves in the rugged timber yard, the resulting bumps and vibrations are sensed in real time by the inertial measurement module. The three-axis active image stabilization gimbal compensates for the pitch, roll and yaw directions by reverse motion, so that the optical axis of the industrial camera always points to the preset observation area, thus eliminating the image blur caused by motion from a physical level.
[0063] In situations where ambient light is insufficient, the system will automatically activate an auxiliary infrared fill light array. The near-infrared light emitted by this array matches the peak sensitivity of the industrial camera's image sensor, providing sufficient exposure energy to the camera without producing visible light pollution.
[0064] The spatiotemporal reference synchronization module serves as the time and space skeleton of the system. Its core is a hardware trigger controller driven by a field-programmable gate array. This controller is physically connected to all sensors in the multimodal data acquisition array via hardwired connections.
[0065] Specifically, the hardware trigger controller generates a set of synchronization pulse sequences with nanosecond-level jitter, which serve as external trigger signals for all sensors. This forces the industrial camera's exposure start time, the start time of each scan of the lidar, and the frame acquisition time of the infrared thermal imaging sensor to be aligned, with the time deviation strictly controlled within 1 microsecond.
[0066] In terms of spatial positioning, the differential positioning system, combined with the inertial measurement module, provides centimeter-level absolute geographic coordinates and high-precision attitude parameters for each set of synchronously acquired data. These data are processed internally by the spatiotemporal reference synchronization module and encapsulated into a unified data frame header, providing a foundation for subsequent cross-view fusion.
[0067] The multi-source feature deep fusion module is executed within a ruggedized embedded computing core, and its primary task is to align data from different modalities in spatial dimensions.
[0068] During the system initialization phase, the camera intrinsic parameters, lidar extrinsic parameters, and relative transformation matrices between various sensors obtained through the offline calibration program are loaded into memory.
[0069] During online processing, the system uses reprojection logic to project the 3D point cloud captured by the LiDAR onto the 2D image coordinate system generated by the industrial camera. Each image pixel not only carries RGB color information, but is also given a depth attribute obtained by point cloud interpolation.
[0070] The multi-source feature deep fusion module adopts a spatial voxelization mapping strategy, which divides the spatial volume into a large number of tiny cubic modules. Each voxel serves as a multi-dimensional feature container, storing occupancy status, reflection intensity, color components, and temperature components.
[0071] In the process of identifying the end face of wood, the multi-source feature deep fusion module not only relies on the texture features of visible light, but also automatically increases the feature weights of the long-wave infrared thermal imaging sensor and lidar reflection intensity data when the visual features of the end face of wood are degraded due to soil covering, peeling or light shadows.
[0072] For example, at dusk, the end face of a piece of wood may appear as a faint shadow under visible light, but due to the difference in heat capacity between the wood fibers and the surrounding air and soil, it will show a clear temperature gradient outline in the infrared image. Through this dynamic adjustment of multimodal weights, the system can generate a fused enhanced feature map, thereby accurately locking the candidate region of the end face of the wood.
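The dynamic weighting can be sketched as a simple threshold rule, assuming image contrast is measured as the ratio of standard deviation to mean intensity and that the three modalities are already aligned as per-pixel feature maps. The weight values, the 0.2 contrast threshold, and the function names are hypothetical; a deployed system could equally use learned or continuously varying weights.

```python
import numpy as np

# Hypothetical (visible, thermal, lidar intensity) weights.
BASE_WEIGHTS = (0.6, 0.2, 0.2)
DEGRADED_WEIGHTS = (0.2, 0.4, 0.4)

def modality_weights(gray_image, contrast_threshold=0.2):
    """Pick modality weights from the contrast of the visible-light image."""
    contrast = gray_image.std() / max(gray_image.mean(), 1e-6)
    return BASE_WEIGHTS if contrast >= contrast_threshold else DEGRADED_WEIGHTS

def fuse(visible, thermal, intensity, weights):
    """Weighted sum of aligned per-pixel feature maps."""
    w_visible, w_thermal, w_intensity = weights
    return w_visible * visible + w_thermal * thermal + w_intensity * intensity

daylight = np.tile(np.array([[0.0, 1.0], [1.0, 0.0]]), (4, 4))   # high contrast
dusk = np.full((8, 8), 0.1)
dusk[0, 0] = 0.11                                                # nearly uniform
weights_day = modality_weights(daylight)
weights_dusk = modality_weights(dusk)
fused_dusk = fuse(dusk, np.ones((8, 8)), np.zeros((8, 8)), weights_dusk)
```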
[0073] The three-dimensional spatial geometric correction engine addresses oblique projection error, one of the most troublesome problems in the tallying operation. In actual operation, it is difficult for the tallying operation platform to always face the end face of each piece of wood head-on when shooting.
[0074] When there is an angle, the round end face of the wood will appear as an ellipse in the image, and traditional two-dimensional measurement methods will produce huge diameter deviations.
[0075] The 3D spatial geometry correction engine obtains local point cloud slices corresponding to candidate regions and fits the geometric plane of the wood end face in 3D space using random sample consensus (RANSAC) logic. The system determines the true tilt attitude of the end face by calculating the 3D angle between the normal vector of the fitted plane and the direction vector of the camera optical axis.
[0076] The engine constructs an inverse mapping matrix of projective transformation, which remaps the deformed contour points in the image back to their orthogonal positions in three-dimensional physical space. Combined with the instantaneous depth distance provided by the LiDAR and the optical focal length of the camera, the system can convert the distance between pixels into a real physical millimeter value.
[0077] Furthermore, the 3D spatial geometry correction engine not only focuses on the two-dimensional properties of the end face, but also incorporates a correction factor based on the prediction of the longitudinal axis of the wood. By analyzing the sparse point cloud captured by the lidar on the side of the wood, the engine can estimate the axial direction of a single piece of wood. This axial information, combined with the end face area, can achieve accurate calculation of the volume of a single piece of wood without assuming that all wood has a uniform length.
[0078] The cross-view topology collaborative deduplication module is key to ensuring the accuracy of the total number of goods. This module establishes a global dynamic spatial database, whose coordinate system is based on the absolute geographic coordinates provided by the differential positioning system.
[0079] Whenever a new wood end face is identified and its physical parameters are calculated, the system assigns it a unique global identification number and stores its three-dimensional centroid coordinates, plane normal vector, and physical dimensions in the database.
[0080] As the tallying platform moves along the timber stack, new observation data will be continuously generated from subsequent perspectives. The cross-view topology collaborative deduplication module will use the current pose parameters to predict the expected position of the timber already stored in the database on the current image plane.
[0081] If the physical distance deviation and the feature similarity between the target detected in the current frame and the predicted target both fall within the set deviation threshold, the system determines that the target is a previously recorded target, updates only its pose attributes to improve database accuracy, and does not count it again in the total.
[0082] In actual engineering implementation, the system's central control and data storage system is responsible for coordinating the entire process. The hardware platform adopts a ruggedized computer that integrates a high-performance GPU and a multi-core CPU, and conducts heat dissipation through an aluminum-magnesium alloy shell. Data transmission between modules is completed through high-speed industrial Ethernet. In terms of software architecture, the system runs on a customized real-time Linux operating system kernel, and each processing link is encapsulated as an independent microservice.
[0083] Based on the above, the implementation logic of this system is as follows:
[0084] At the physical level, the core computing nodes of the system adopt a ruggedized embedded computing platform that integrates a high-performance graphics processor and a multi-core central processing unit. The platform conducts heat dissipation through a sealed aluminum-magnesium alloy shell, meeting the IP67 protection requirements to cope with the extreme environment of high dust and high humidity in the timber yard.
[0085] Each sensor is connected to the main control box via a highly flexible shielded cable. The cable connectors are aviation plugs to ensure the stability of the electrical connection under mechanical vibration.
[0086] In terms of data flow logic, after the system starts up, it first enters the self-test and initialization state. The inertial measurement module performs zero-bias calibration, while the RTK-GNSS module searches for satellite signals and waits to enter the high-precision fixed solution state. Once a valid spatiotemporal reference is obtained, the hardware trigger controller starts to send synchronization signals to each acquisition sensor. The raw image data output by the industrial camera is denoised and tone-mapped by the preprocessing circuit. The raw messages output by the lidar are converted through protocol parsing into a point cloud set in the local coordinate system.
[0087] In the specific implementation of multimodal fusion, the system adopts a spatial voxelization mapping strategy, which divides the spatial volume into small cubic cells (voxels). Each voxel simultaneously stores the occupancy status and reflection intensity from the lidar, the RGB color components from the camera, and the temperature component from the infrared sensor. This five-dimensional information data structure provides a high-dimensional feature input for subsequent semantic recognition. During end face recognition, the system uses a pre-trained deep learning model to extract features from the five-dimensional voxel field. By sliding a convolution operator through three-dimensional space, it directly locates the voxel clusters that conform to the geometric and physical characteristics of a wood end face.
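The voxel data structure can be sketched with NumPy as follows. This is an assumption-laden illustration: the function name, the per-voxel averaging, and the channel layout are not taken from the specification. Note that the specification describes the structure as five-dimensional, while this sketch stores six scalar channels per voxel (occupancy, intensity, R, G, B, temperature).

```python
import numpy as np

CHANNELS = ("occupancy", "intensity", "r", "g", "b", "temperature")


def voxelize(points, features, origin, voxel_size, grid_shape):
    """Accumulate per-point features into a dense voxel grid.

    points:   (N, 3) coordinates already registered in a common frame.
    features: (N, 5) columns = intensity, r, g, b, temperature.
    Returns an array of shape grid_shape + (6,) holding the occupancy flag
    and the mean of each feature over the points falling in each voxel.
    """
    grid = np.zeros(tuple(grid_shape) + (len(CHANNELS),), dtype=np.float64)
    counts = np.zeros(grid_shape, dtype=np.int64)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, feats = idx[inside], features[inside]
    cell = tuple(idx.T)
    np.add.at(counts, cell, 1)            # points per voxel
    np.add.at(grid[..., 1:], cell, feats)  # feature sums per voxel
    occupied = counts > 0
    grid[occupied, 1:] /= counts[occupied][:, None]  # sums -> means
    grid[..., 0] = occupied
    return grid
```

A dense array is used only to keep the sketch short; a sparse voxel representation would be a more natural fit for a mostly empty yard scene.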
[0088] For the correction of oblique projection, the system does not rely on any preset placement angle assumptions. When the system identifies an end face voxel cluster, the 3D spatial geometry correction engine proceeds as follows. It extracts the centroid position of the voxel cluster in 3D space. It applies principal component analysis to the voxel cluster distribution and takes the direction of least variance, that is, the direction perpendicular to the plane in which the cluster is spread, as the normal vector direction of the end face. The system maintains a virtual projection plane that is always perpendicular to the camera optical axis. The engine calculates the rotation and translation matrix between the real end face plane and the virtual projection plane, and maps the detected end face contour pixels back to their real physical positions in 3D space one by one. This correction method based on point cloud geometry handles fluctuations in the angle between the end face and the camera within the range of 0 to 75 degrees, keeping the absolute deviation of the diameter measurement within 3 mm.
[0089] In the engineering implementation of multi-view collaborative work, the system adopts a state estimation framework based on pose graph optimization. Whenever the sorting platform moves a certain distance or rotates a certain angle, the system automatically generates a key frame. Each key frame records the local spatial coordinates, UID, and feature descriptors of all observed timber targets at the current moment. By matching the common observed targets between adjacent key frames, the system constructs a constraint network and uses nonlinear optimization logic to continuously adjust the pose estimation values of each key frame, so as to minimize the observation residuals of all timber targets in the global scope. This processing method not only solves the deduplication problem, but also generates a three-dimensional digital twin model of timber stacks in real time.
[0090] The system software's logical architecture employs a multi-threaded concurrent processing mechanism. The data acquisition thread is responsible for writing high-bandwidth sensor data into a circular buffer in real time; the feature extraction thread reads data from the buffer and performs parallel target detection; the geometric correction and deduplication thread maintains and updates the global map in the background; and the threads are synchronized with each other through mutexes and semaphores to ensure the temporal strictness and logical coherence of the data flow during processing.
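The buffer hand-off between the acquisition and feature extraction threads can be sketched with a mutex and a counting semaphore. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, frames are plain integers, and the oldest frame is overwritten when the buffer is full.

```python
import threading
from collections import deque

STOP = object()  # sentinel marking the end of the stream


class RingBuffer:
    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)
        self._lock = threading.Lock()               # mutex guarding the deque
        self._available = threading.Semaphore(0)    # counts readable frames
        self.dropped = 0

    def put(self, frame):
        with self._lock:
            overwrote = len(self._frames) == self._frames.maxlen
            self._frames.append(frame)  # a full deque discards its oldest item
            if overwrote:
                self.dropped += 1
        if not overwrote:
            self._available.release()

    def get(self, timeout=None):
        """Return the oldest frame, or None if nothing arrives in time."""
        if not self._available.acquire(timeout=timeout):
            return None
        with self._lock:
            return self._frames.popleft()


def run_pipeline(n_frames, capacity):
    buf, processed = RingBuffer(capacity), []

    def acquire():  # stands in for the data acquisition thread
        for i in range(n_frames):
            buf.put(i)
        buf.put(STOP)

    def extract():  # stands in for the feature extraction thread
        while True:
            frame = buf.get()
            if frame is STOP:
                break
            processed.append(frame)

    threads = [threading.Thread(target=acquire), threading.Thread(target=extract)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, buf.dropped
```

The `dropped` counter makes backlog visible: a non-zero value means the extraction thread is not keeping up with the acquisition rate.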
[0091] Furthermore, to address potential issues such as soil coverage, localized damage, or shadow interference on the wood end face, this system incorporates attention-based feature weighting logic in the feature extraction layer. When the confidence level of a particular modality, such as visible light texture, is low, the system automatically increases the weights from the infrared modality or the laser reflection intensity modality. For instance, a wet wood end face exhibits significant characteristics in laser reflection intensity, while a dirty end face can still maintain a clear outline in the infrared thermogram due to differences in heat capacity. This multimodal complementary mechanism allows the system to maintain an end face detection rate of over 99.5% even in harsh environments such as after rain, at dusk, or during sandstorms.
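The confidence-driven reweighting can be illustrated with a softmax over per-modality confidence scores. This is only a stand-in: the specification does not give the attention formulation, so the function names, the temperature parameter, and the use of scalar confidences are assumptions.

```python
import numpy as np


def modality_weights(confidences, temperature=0.2):
    """Turn per-modality confidences into weights that sum to 1.

    Lower-confidence modalities receive smaller weights; `temperature`
    (an assumed tuning parameter) controls how sharply weight shifts.
    """
    c = np.asarray(confidences, dtype=float)
    z = (c - c.max()) / temperature  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def fuse(feature_maps, confidences):
    """Weighted sum of same-shaped feature maps, one per modality."""
    w = modality_weights(confidences)
    return np.tensordot(w, np.stack(feature_maps), axes=1)


# Order: visible light, infrared, lidar intensity (muddy end face, low RGB confidence).
weights = modality_weights([0.2, 0.9, 0.8])
```

In this example the visible-light modality receives the smallest weight, so the infrared and lidar intensity channels dominate the fused map.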
[0092] Furthermore, for the processing of the wood end face, the present invention provides a dedicated robust fitting logic for imperfect shapes such as heartwood cracking and sapwood damage. After extracting the end face contour, the system does not simply look for the minimum enclosing circle. Instead, it uses an edge filtering mechanism based on geometric curvature changes to remove abnormal points caused by bark peeling, and retains only the effective feature points representing the cross-section of the trunk for circle or ellipse fitting. This refined processing further reduces measurement error.
[0093] Furthermore, in the multi-view collaboration process, the system also includes a dynamic occlusion processing logic. When the end face of a piece of wood is partially occluded by the wood in front from a certain viewpoint, the system automatically identifies the occluded area using the spatial occupancy relationship of the 3D point cloud, and extracts the complete features of the target from subsequent or preceding views for splicing and supplementation. This occlusion inference capability based on spatial relationships ensures that, in complex scenarios of tightly stacked goods, each sorting target can obtain a physical parameter evaluation that is closest to the true value.
[0094] Furthermore, under the management of the central control system, all collected raw multimodal data are indexed and stored according to a unified timeline, which provides a complete original chain of evidence for subsequent quality traceability and dispute resolution. The system's built-in self-diagnostic module can monitor the health status of each sensor in real time. Once it is found that the lens is blocked by a large area of dirt or the GNSS signal is lost, it will immediately notify the operators through audible and visual alarms and interface prompts to prevent the generation of erroneous cargo handling data.
[0095] The specific data processing flow within the system is as follows:
[0096] In each set of synchronous sampling sequences, the system first executes a point cloud preprocessing procedure. This procedure uses a pass-through filter to remove irrelevant point clouds from the background according to a preset working distance range; then, voxel mesh downsampling logic is applied to reduce the amount of point cloud data while preserving topological features, thereby improving computational efficiency.
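The two preprocessing steps can be sketched in NumPy. This is an illustrative sketch: the function names and the 0.1 m voxel size are assumptions, the 5 to 8 m range is borrowed from the working distance reported in Example 1, and each occupied voxel is replaced by the centroid of its points.

```python
import numpy as np


def pass_through(points, axis, lo, hi):
    """Keep only points whose coordinate along `axis` lies in [lo, hi]."""
    keep = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[keep]


def voxel_downsample(points, voxel_size):
    """Replace all points in each occupied voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]


cloud = np.array([[0.02, 0.02, 6.02], [0.04, 0.04, 6.04],
                  [1.05, 1.05, 7.05], [0.0, 0.0, 20.0], [0.0, 0.0, 2.0]])
in_range = pass_through(cloud, axis=2, lo=5.0, hi=8.0)   # drop background and near clutter
reduced = voxel_downsample(in_range, voxel_size=0.1)
```

Here the two points that share a voxel collapse into one, and the points at 2 m and 20 m are discarded by the range filter.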
[0097] For the filtered point cloud, cluster analysis based on Euclidean distance is used to make an initial division into independent timber stacking areas.
[0098] In the image processing sub-link, the system first performs distortion correction, using the pre-stored camera lens distortion coefficients to apply radial and tangential correction to the original image.
[0099] The RGB image is transformed into a more feature-discriminating space, such as Lab space, through color space conversion, and the edge strength of the wood end face is enhanced by multi-scale morphological operators.
[0100] After the image feature map and the projected point cloud are aligned in the feature space, the system enters the critical end face semantic parsing stage. The system not only identifies that this is a wood end face, but also extracts the micro-feature descriptors of the end face through a deep convolutional network, including texture complexity, annual ring distribution frequency, core area eccentricity, etc.
[0101] These descriptors are quantized into a series of fixed-length feature vectors, which are stored in the UID database along with the coordinate information.
[0102] In cross-view matching, in addition to positional constraints, cosine similarity comparison of feature vectors provides a double guarantee for deduplication.
[0103] For the 3D geometric correction engine, the specific operation involves converting the feature point set of the point cloud in the local coordinate system into a normalized coordinate system with the centroid of the end face as the origin. In this coordinate system, the principal plane equation of the end face is determined by analyzing the distribution of the point cloud in each axis.
[0104] The correction logic then projects the contour points in the original image onto the principal plane and calculates the spatial distribution radius of each point. In order to obtain the final radius, the system uses median radius extraction logic, which effectively eliminates outlier interference caused by jagged edges or small patches of skin residue at the contour edges.
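The projection of contour pixels onto the principal plane and the median radius extraction can be sketched as a ray-plane intersection. The sketch makes several assumptions: the function names are hypothetical, the intrinsic matrix `K` is in pixel units, the plane (a point and a normal) is expressed in the camera frame, and the end face center is supplied by the caller.

```python
import numpy as np


def backproject_to_plane(pixels, K, plane_point, plane_normal):
    """Intersect the viewing ray of each pixel with the end face plane.

    pixels: (N, 2) image coordinates. Returns (N, 3) points in the camera frame.
    """
    homogeneous = np.hstack([pixels, np.ones((len(pixels), 1))])
    rays = (np.linalg.inv(K) @ homogeneous.T).T      # ray directions with z = 1
    depth = (plane_point @ plane_normal) / (rays @ plane_normal)
    return rays * depth[:, None]


def median_radius(points_3d, centre):
    """Median distance from the center; insensitive to a few jagged points."""
    return float(np.median(np.linalg.norm(points_3d - centre, axis=1)))
```

Because the median is used, a handful of contour points displaced by bark residue shift the result far less than they would shift a mean or a maximum radius.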
[0105] During the global statistics phase, the system sets up geofences based on the latitude and longitude boundaries of the cargo handling area, and only counts timber within the geofence to prevent the collection of irrelevant interfering targets outside the geofence. After completing the scanning of the entire stack, the system performs a global consistency check, confirming the pose closure of the start and end points through loop closure detection logic. If the pose closure error exceeds the set error threshold, the system uses linear interpolation logic to fine-tune the pose of the keyframes along the entire path and updates the associated timber positions synchronously, ensuring that the final cargo handling report has extremely high spatial accuracy and logical rigor.
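The geofence check and the linear distribution of the loop closure error can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the geofence is assumed to be a latitude/longitude rectangle, only keyframe positions (not orientations) are corrected, and the 0.10 m default threshold is taken from the 10 cm figure in paragraph [0130].

```python
import numpy as np


def inside_geofence(lat, lon, bounds):
    """bounds = (lat_min, lat_max, lon_min, lon_max), a rectangular fence."""
    lat_min, lat_max, lon_min, lon_max = bounds
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def distribute_closure_error(positions, closure_error, threshold_m=0.10):
    """Spread the loop closure error linearly along the keyframe path.

    positions:     (N, D) keyframe positions in path order.
    closure_error: (D,) estimated end position minus the true closing position.
    The first keyframe is left untouched and the last receives the full
    correction. Errors within the threshold are not corrected.
    """
    positions = np.asarray(positions, dtype=float)
    closure_error = np.asarray(closure_error, dtype=float)
    if np.linalg.norm(closure_error) <= threshold_m:
        return positions.copy()
    fractions = np.linspace(0.0, 1.0, len(positions))[:, None]
    return positions - fractions * closure_error


path = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [0.4, 0.2]], dtype=float)
corrected = distribute_closure_error(path, path[-1] - path[0])
```

Timber positions attached to a keyframe would be shifted by the same offset as that keyframe so that the map stays consistent.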
[0106] The sensor array is mounted on a reinforced beam with shock-absorbing pads. All exposed lens surfaces are coated with a nano-level hydrophobic coating, which can automatically reduce the adhesion of water stains and dust. The system's power supply module is designed with a wide voltage input module and an electromagnetic compatibility filter, which can directly utilize the power supply of the cargo handling platform and effectively isolate electromagnetic interference from high-power motors.
[0107] At the software implementation level, the data processing architecture adopts a containerized microservice model. Each functional module, such as the acquisition driver service, the feature fusion service, and the geometric correction service, runs within an independent logical container. This architecture not only enhances the system's fault tolerance, ensuring the main control process continues to run even if a non-core service experiences an occasional failure, but also greatly facilitates future system upgrades. When more advanced edge recognition models or point cloud processing logic emerge, system evolution can be achieved simply by replacing the corresponding microservice image.
[0108] In terms of data communication security, the communication link between the tallying platform and the remote management terminal is protected by the TLS encryption protocol, and each batch of tallying data is uploaded with a digital signature so that any tampering with the tallying results during transmission can be detected.
[0109] The system also has a local caching function. In areas of the port where the wireless signal is unstable, the cargo handling data is first stored in the vehicle's high-speed solid-state drive. Once the signal is restored, it will automatically retransmit the data and synchronize it with the cloud.
[0110] To address the physical differences between different wood species, the system has a built-in parameter configuration library, allowing users to select specific identification strategy parameters based on the type of wood being sorted, such as coniferous or broadleaf wood.
[0111] For tree species with highly irregular end-face features, the system automatically increases the weight of laser point cloud in geometric contour extraction; while for tree species with clear annual ring textures, the visible light mode is given a higher priority in diameter calculation. This targeted parameter adaptation further ensures the system's universality in diverse trade scenarios.
[0112] The system's real-time status monitoring interface uses augmented reality technology to overlay the recognition results with the real-time video stream, allowing inventory personnel to intuitively see the green highlight mark of each identified timber and its corresponding real-time diameter data on the screen.
[0113] If the system detects a suspected target with low confidence, it prompts manual verification with a yellow warning mark. This human-machine collaboration mode maintains a high degree of automation while retaining human intervention as the last line of defense, safeguarding the accuracy of the tallying process.
[0114] In terms of maintainability design, the system has a full life cycle status monitoring function. By collecting metadata such as sensor operating current, internal temperature, and data frame rate, the system can predict potential hardware failure risks. For example, when the detector temperature of the infrared thermal imaging sensor continues to rise abnormally, the system will automatically trigger an early warning to remind maintenance personnel to check the cooling air duct or clean the heat sink. This preventive maintenance logic significantly reduces the system's unplanned downtime and ensures the efficient operation of port logistics.
[0115] Furthermore, when processing image stitching in overlapping areas from multiple perspectives, the system first searches for stable geometric feature points, such as the center of annual rings or specific crack bifurcation points, within a specific semantic region on the end face of the wood. Since these feature points have clear physical meanings, their matching robustness is far higher than that of blind local gradient feature matching. Through these high-quality corresponding points, the system can calculate a more accurate cross-perspective transformation matrix, thereby achieving seamless texture mapping and target deduplication in the global coordinate system.
[0116] Furthermore, the system's diameter-level calculation module also considers the influence of the environmental refractive index on laser ranging. By integrating temperature and humidity sensors, the system can acquire atmospheric environmental parameters in real time and correct the time-of-flight data of the lidar. This extreme pursuit of physical environmental details ensures that the system's measurement accuracy remains at the highest industrial-grade standard under different seasons and climate conditions.
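The effect of the refractive index on a time-of-flight range can be illustrated with the basic relation d = c·t / (2n). This sketch treats the refractive index as an input: the specification does not state how it is derived from the temperature and humidity readings, and the value 1.0003 used below is only an illustrative figure for air.

```python
C_VACUUM_M_PER_S = 299_792_458.0


def corrected_range_m(round_trip_s, refractive_index):
    """Range from a round-trip time of flight in a medium of given index.

    Light travels at c / n in the medium, so range = c * t / (2 * n).
    """
    if refractive_index < 1.0:
        raise ValueError("refractive index of air cannot be below 1")
    return C_VACUUM_M_PER_S * round_trip_s / (2.0 * refractive_index)


# Round-trip time for a target at exactly 6 m when n = 1.0003.
t = 2 * 6.0 * 1.0003 / C_VACUUM_M_PER_S
uncorrected = corrected_range_m(t, 1.0)      # assumes vacuum speed of light
corrected = corrected_range_m(t, 1.0003)
```

At a 6 m working distance, ignoring the index in this example overestimates the range by about 1.8 mm, which is of the same order as the 3 mm diameter tolerance stated in paragraph [0088].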
[0117] Furthermore, for the deep internal gaps that may exist in timber stacks, lidar, through multi-echo technology, can penetrate some branches or gaps to obtain deeper spatial information. By analyzing multiple echo signals, the system can identify the end edges hidden behind the surface timber, thereby providing a more accurate assessment of stacking density and volume, and providing data support for optimizing the volume ratio of logistics warehousing.
[0118] Example 1
[0119] This embodiment was tested at a port timber terminal. The test subject was a batch of tightly stacked radiata pine logs. The tallying platform moved parallel to the stack at a speed of 1.5 m/s, with the working distance maintained between 5 and 8 meters. The industrial camera configuration was: 1/1.2-inch global shutter sensor, 12.3 million effective pixels, 16 mm lens focal length, and a fixed aperture of F4.0. The lidar configuration was: 32-line rotating lidar, ranging frequency of 640 kHz, and vertical resolution of 1.25 degrees. The system's unified synchronization trigger frequency was set to 10 Hz. The environmental conditions were cloudy, and some of the log ends were covered with soil from the loading and unloading process.
[0120] Comparative Example 1
[0121] The traditional single-view visual sorting system relies solely on high-resolution cameras for image acquisition, uses conventional deep learning algorithms for end face detection, calculates diameter based on image pixel ratios, and lacks global deduplication functionality based on geographic coordinates, primarily relying on image stitching technology.
[0122] By tallying 1000 logs and comparing the results with manually measured reference data, the following experimental data table was obtained:
[0123]
[0124] Data analysis shows that in complex stacked scenarios, due to the lack of depth information and spatial geometric correction logic in Comparative Example 1, the diameter measurement error increases rapidly when the shooting angle is skewed or the end face is damaged. However, this invention can restore the oblique projection to an orthophoto image through a three-dimensional spatial geometric correction engine, thereby ensuring measurement accuracy at the physical level. The cross-view topology collaborative deduplication module based on RTK-GNSS and IMU completely solves the problem of duplicate statistics in mobile operations.
[0125] Specifically, the multi-source feature deep fusion module employs adaptive illumination compensation logic during processing. This logic monitors the image histogram in real time and performs local contrast stretching on the wood end face in the shadow area. When the ambient brightness is below 30 lux, the system not only activates the auxiliary infrared fill light array but also automatically switches to the contour extraction mode based on the infrared thermal imaging sensor. In this mode, due to the difference in moisture content between the fresh cut surface and the old bark of the wood end face, its thermal emissivity exhibits obvious ring-shaped characteristics, which provides a second layer of protection for the extraction of growth rings and center points in addition to visible light.
[0126] In the detailed implementation of 3D geometric correction, the system first executes a point cloud preprocessing program, which removes background point clouds, such as cranes and containers behind stacks, according to a preset working distance range using a pass-through filter.
[0127] By applying voxel mesh downsampling logic, the point cloud density is reduced to a level that the computing core can process in real time while preserving key topological features. For each selected end face voxel cluster, the 3D spatial geometry correction engine calculates its principal component directions. Specifically, the system analyzes the scatter matrix of the voxel cluster in 3D space and extracts the eigenvector corresponding to the minimum eigenvalue as the normal vector of the plane. In this way, even if the end face itself is not completely flat due to sawing, the system can fit a most representative mean plane.
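The normal estimation described above can be sketched with an eigendecomposition of the scatter matrix. The function names are hypothetical, and the optical axis is assumed to be the camera z axis.

```python
import numpy as np


def fit_plane_normal(points):
    """Fit a mean plane to an end face cluster.

    Returns (centroid, normal), where the normal is the eigenvector of the
    scatter matrix that corresponds to the smallest eigenvalue.
    """
    centroid = points.mean(axis=0)
    centred = points - centroid
    scatter = centred.T @ centred
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    return centroid, eigvecs[:, 0]


def tilt_angle_deg(normal, optical_axis=(0.0, 0.0, 1.0)):
    """Angle between the end face normal and the optical axis, in degrees.

    The absolute value removes the sign ambiguity of the eigenvector.
    """
    axis = np.asarray(optical_axis, dtype=float)
    cos_t = abs(normal @ axis) / (np.linalg.norm(normal) * np.linalg.norm(axis))
    return float(np.degrees(np.arccos(np.clip(cos_t, 0.0, 1.0))))
```

Because the fit uses all points of the cluster, moderate saw marks on the end face perturb the smallest eigenvalue but leave the estimated normal close to that of the mean plane.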
[0128] To address imperfect shapes on the end faces of wood, such as heartwood cracks or localized bark peeling, this invention introduces an edge filtering mechanism based on geometric curvature changes in the feature extraction layer. After extracting the contour, the system calculates the local curvature of each sampling point on the contour line. Outliers with abrupt changes in curvature, usually caused by cracks or drooping bark, are automatically removed. A robust ellipse fitting algorithm is then applied to the remaining valid edge points. This approach yields parameters that are closer to the true diameter of the wood than simply finding the minimum enclosing circle.
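The curvature-based edge filter can be sketched as follows. This is a simplified stand-in rather than the method of the invention: local curvature is approximated by the turning angle at each vertex of an ordered closed contour, the 0.2 rad deviation threshold is a hypothetical value, and a least-squares circle fit is substituted for the robust ellipse fit named in the text.

```python
import numpy as np


def turning_angles(contour):
    """Signed turning angle (rad) at each vertex of a closed, ordered contour."""
    v_in = contour - np.roll(contour, 1, axis=0)
    v_out = np.roll(contour, -1, axis=0) - contour
    cross = v_in[:, 0] * v_out[:, 1] - v_in[:, 1] * v_out[:, 0]
    dot = np.sum(v_in * v_out, axis=1)
    return np.arctan2(cross, dot)


def filter_by_curvature(contour, max_dev_rad=0.2):
    """Drop vertices whose turning angle deviates sharply from the median."""
    angles = turning_angles(contour)
    keep = np.abs(angles - np.median(angles)) <= max_dev_rad
    return contour[keep]


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (cx, cy, radius).

    Solves x^2 + y^2 = 2*cx*x + 2*cy*y + c with c = r^2 - cx^2 - cy^2.
    """
    A = np.column_stack([2 * points[:, 0], 2 * points[:, 1], np.ones(len(points))])
    rhs = np.sum(points ** 2, axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return cx, cy, np.sqrt(c + cx ** 2 + cy ** 2)
```

Note that a turning-angle filter only catches abrupt changes: a long, smooth dent has normal local curvature and would have to be handled by the robust fit itself.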
[0129] In the multi-view collaborative software logic, the system adopts a multi-threaded concurrent processing mechanism. The data acquisition thread uses zero-copy technology to write high-bandwidth image and point cloud data into a circular buffer; the feature extraction thread reads data from the buffer in parallel and uses GPU-accelerated operators for target detection; the geometric correction and deduplication thread maintains the global map in the background. This architecture ensures that even during high-speed movement, the system will not experience frame drops or processing backlog.
[0130] The cross-view topology collaborative deduplication module also has loop closure detection logic. When the cargo handling platform completes a circuit of a closed path and returns to the vicinity of the starting position, the system triggers global consistency optimization by recognizing landmark timber features at known geographic coordinate points. If the accumulated pose drift error exceeds the preset error threshold of 10 cm, the system automatically starts the adjustment logic to fine-tune the poses of the key frames along the entire path. This process is similar to closure error adjustment in the field of surveying and mapping, and ensures that the final cargo handling report remains logically consistent after long-distance operations.
[0131] The central control and data storage system is also specially designed with a data security and integrity verification mechanism. Each set of generated inventory data, including the UID, timestamp, spatial coordinates, diameter parameters and corresponding multimodal original slices of a single timber, will generate a unique hash value and be digitally signed. This data is synchronized to the cloud management platform in real time through encrypted wireless links, such as 5G private networks or satellite links. This end-to-end digital evidence storage provides tamper-proof evidence support for subsequent quality traceability and trade dispute resolution.
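The hashing and signing of an inventory record can be sketched with the Python standard library. Two assumptions are made loudly here: the record fields are illustrative, and an HMAC with a shared secret is used as a stand-in for the digital signature, whereas a deployed system would more plausibly use an asymmetric signature scheme so that the verifier does not hold the signing key.

```python
import hashlib
import hmac
import json


def _canonical_bytes(record):
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_record(record, secret_key):
    """Attach a SHA-256 digest and an HMAC tag to an inventory record."""
    payload = _canonical_bytes(record)
    return {
        "record": record,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(secret_key, payload, hashlib.sha256).hexdigest(),
    }


def verify_record(sealed, secret_key):
    """Return True only if both the digest and the tag match the record."""
    payload = _canonical_bytes(sealed["record"])
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    digest_ok = hashlib.sha256(payload).hexdigest() == sealed["sha256"]
    return digest_ok and hmac.compare_digest(expected, sealed["signature"])


sealed = seal_record(
    {"uid": 17, "timestamp": "2026-04-09T08:00:00Z",
     "centroid": [12.5, 3.2, 1.1], "diameter_mm": 312.4},
    secret_key=b"demo-key-not-for-production",
)
```

Any later change to a sealed field, such as the diameter, causes `verify_record` to return False.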
[0132] At the human-computer interaction level, the system uses augmented reality technology to overlay the cargo handling status on the operation monitoring interface in real time. Operators can directly see each piece of timber marked as counted by the system on the screen. The end face of the timber is covered by a green semi-transparent digital ring, and the center of the ring displays the predicted diameter of the timber.
[0133] If a target is identified by the system as a suspected target, such as one that is severely damaged or whose area is obscured by more than 70%, the interface will flash yellow to alert the operator, allowing the operator to manually confirm or retest by clicking the screen. This closed loop of human-machine collaboration ensures the accuracy of cargo handling in extreme and special circumstances.
[0134] Furthermore, in the multi-view collaboration process, the system includes a dynamic occlusion processing logic. In tightly stacked timber, the back row of timber is often partially occluded by the front row. The system uses the multi-echo technology of lidar to detect the edges of the back row of timber that are exposed due to gaps. Combined with the spatial occupancy relationship of the three-dimensional point cloud, the system can infer the possible shape of the occluded area and find the complete visible frame of the target from different observation angles generated by subsequent movement. Once a match is successful, the system will automatically perform feature stitching to obtain the complete physical parameters of the occluded timber.
[0135] All sensors are encapsulated in an airtight protective cover with a temperature control system. The surface of the cover is coated with a nano-level hydrophobic and anti-scaling layer, which can effectively prevent rainwater drips and dust adsorption. The small semiconductor cooling / heating chip integrated inside the cover can stabilize the internal ambient temperature between 15 and 30 degrees Celsius, ensuring that the precision optical components and electronic chips will not experience thermal drift due to drastic fluctuations in ambient temperature, thereby guaranteeing long-term calibration accuracy.
[0136] This system is not only suitable for log inventory management, but its flexible parameter configuration library also supports the adaptation of different specifications of wood such as square timber and boards. For regularly shaped wood, the three-dimensional spatial geometric correction engine will automatically switch to the rectangle fitting mode and add detection logic for the flatness of the sides. This high degree of versatility makes this invention widely applicable to the entire industry chain, including logging, timber yards, port transshipment, and warehousing statistics in back-end processing plants.
[0137] In the advanced maintenance mode of the system, the self-diagnostic module monitors the health status of the sensor array in real time. For example, if the industrial camera detects that the average contrast of multiple consecutive frames of images is lower than the contrast threshold, or if the lidar reports that the echo intensity in a certain sector is abnormally low, the system will determine that the lens may be blocked or contaminated and immediately issue a cleaning warning to the operator.
[0138] In summary, this invention constructs a complete and rigorous intelligent timber sorting technology system through multi-sensor hard synchronization acquisition, three-dimensional spatial geometric reconstruction, global deduplication based on absolute geographic coordinates, and robust design for harsh working conditions. It not only solves the bottleneck of accuracy in two-dimensional vision solutions, but also overcomes the shortcomings of traditional manual sorting in terms of efficiency and subjectivity.
[0139] By implementing this invention, the measurement accuracy in the timber trade process has been qualitatively improved, and key technical support has been provided for the transparency and intelligence of forestry logistics. The functions of each module and their collaborative logic shown in this embodiment fully demonstrate the high feasibility and excellent performance of this invention in practical engineering applications.
[0140] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.
Claims
1. A wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion, characterized in that, The system is built on a mobile cargo handling platform and includes: A multimodal data acquisition array acquires multi-source sensing data of the timber stacks to be sorted, including color image data, three-dimensional laser point cloud data, and long-wave infrared thermal imaging data. The spatiotemporal reference synchronization module is electrically connected to the multimodal data acquisition array to provide a time alignment reference and spatial pose parameters for the multi-source sensing data. The multi-source feature deep fusion module performs spatial voxel mapping on the multi-source sensing data and generates an enhanced feature map for identifying the end face of wood by dynamically adjusting the feature weights of different modalities. The 3D spatial geometry correction engine constructs an inverse mapping matrix of projective transformation based on the 3D spatial pose of the wood end face, restores the wood contour in the 2D image to the orthogonal circular contour in physical space, and calculates the physical scale parameters of the wood. The cross-view topology collaborative deduplication module uses absolute geographic coordinates as a benchmark to construct a global dynamic spatial database. Through spatiotemporal topology association and motion compensation logic, it uniquely identifies and deduplicates timber targets that are repeatedly observed from different perspectives. The central control and data storage system coordinates the concurrent operation of each module, generates electronic inventory lists with geofence tags, and transmits the electronic inventory lists to the remote management terminal. 
The multimodal data acquisition array includes: At least two symmetrically distributed industrial cameras, employing global shutter CMOS sensors, are used to capture the texture features of the wood end face; A set of multi-line lidar is positioned above the geometric center of the industrial camera for real-time scanning of the three-dimensional point cloud of the timber stack. A set of long-wave infrared thermal imaging sensors is used to extract the contour features of the wood end face by utilizing the difference in emissivity between the wood and the environmental background. The industrial camera is mounted on the sorting platform via a three-axis active image stabilization gimbal. The three-axis active image stabilization gimbal uses reverse compensation logic to cancel the vibration of the sorting platform based on the real-time motion parameters provided by the spatiotemporal reference synchronization module. An auxiliary infrared fill light array, whose emission wavelength matches the peak photosensitive response of the industrial camera, is controlled by the central control and data storage system and automatically turns on when the ambient light intensity is lower than the set light intensity threshold. The three-dimensional spatial geometry correction engine includes performing the following steps when calculating timber diameter grades: The local point cloud slices corresponding to each end face of the wood to be identified are obtained from the multi-source feature deep fusion module, and the geometric plane representing the end face of the wood is fitted by random sampling consistency logic. Calculate the normal vector of the geometric plane, and determine the actual tilt posture of the wood end face in three-dimensional space based on the three-dimensional angle between the normal vector and the camera optical axis direction vector. 
Construct the inverse mapping matrix of the projective transformation to resample the elliptical end-face projection contour in the two-dimensional image into an orthographic circular contour in three-dimensional physical space.
Calculate the physical diameter, perimeter, and cross-sectional area of the wood end face by combining the focal length parameters of the industrial camera with the depth distance measured by the multi-line lidar.
The three-dimensional spatial geometry correction engine also includes:
The correction module based on prediction of the timber's longitudinal axis uses the sparse point cloud captured by the multi-line lidar on the side of the timber to estimate the overall orientation and length of a single timber, and integrates the end-face area over the estimated effective length to measure the volume of that timber.
The edge filtering module based on geometric curvature variation, after the end-face contour has been extracted, calculates the local curvature at sampling points on the contour, removes outlier points with abrupt curvature changes, and fits the remaining valid feature points with a robust fitting algorithm to eliminate measurement errors caused by heartwood cracking or sapwood damage.
2. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 1, characterized in that the spatiotemporal reference synchronization module includes:
The hardware trigger controller, driven by a field-programmable gate array (FPGA), is hardwired to the industrial cameras, the multi-line lidar, and the long-wave infrared thermal imaging sensor and generates frequency-controlled synchronization pulse sequences.
The inertial measurement module records the triaxial acceleration and triaxial angular velocity of the cargo handling platform at the instant of sampling.
The differential positioning system provides the absolute spatial coordinates of the cargo handling platform at the instant of sampling.
The spatiotemporal reference synchronization module fuses data from the inertial measurement module and the differential positioning system to output the real-time six-degree-of-freedom pose parameters of the cargo handling platform.
3. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 2, characterized in that the multi-source feature deep fusion module executes the following logic during operation:
A parameter matrix obtained through offline calibration is used to achieve pixel-level spatial alignment between the three-dimensional point cloud captured by the multi-line lidar and the two-dimensional image captured by the industrial camera.
A spatial voxelization mapping strategy is adopted to divide the spatial volume into a number of voxel units. Each voxel unit simultaneously stores the occupancy status and reflection intensity from the multi-line lidar, the RGB color components from the industrial camera, and the temperature component from the long-wave infrared thermal imaging sensor, forming a five-dimensional feature data structure.
When features are extracted from the wood end face, adaptive illumination compensation logic is applied: when the image contrast of the visible-light modality falls below a preset threshold, the feature weights of the long-wave infrared thermal imaging data and of the multi-line lidar reflection intensity are automatically adjusted.
4. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 2, characterized in that the cross-view topology collaborative deduplication module applies the following logic when processing overlapping observation data:
A global spatial database is established based on the absolute geographic coordinates provided by the differential positioning system.
A unique global identification number (UID) is assigned to each newly identified timber, and its three-dimensional centroid coordinates, plane normal vector, and corrected physical characteristic parameters are stored.
When the cargo handling platform moves to a subsequent viewpoint, the theoretical projection position on the current image plane of each timber stored in the database is predicted from the current pose parameters.
Whether a target detected in the current frame is an already recorded target is determined by calculating the spatial distance deviation and the feature vector similarity between the detected target and the predicted target from the database.
The feature vector includes descriptors of the wood end face's annual ring distribution frequency, core area eccentricity, and texture complexity, extracted using a deep convolutional network.
If both the spatial distance deviation and the feature vector similarity are within the set deviation thresholds, the two are determined to be the same target, and the pose attributes of that target are updated.
5. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 4, characterized in that the cross-view topology collaborative deduplication module also includes:
The state estimation logic based on pose graph optimization generates key frames during the movement of the cargo handling platform, constructs a constraint network by matching commonly observed targets between adjacent key frames, and uses nonlinear optimization logic to adjust the key frame poses so as to minimize the observation residuals globally.
The loop closure detection logic corrects pose drift by recognizing key timber features when the cargo handling platform returns to known geographic coordinates. If the pose closure error exceeds a set error threshold, linear interpolation logic is used to adjust the key frame poses along the entire path.
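The matching rule of claim 4 can be illustrated with a minimal sketch. It is not the claimed implementation: for simplicity it compares centroids directly in global coordinates instead of projecting stored timbers onto the current image plane, it uses cosine similarity as a stand-in for the unspecified similarity measure, and the names GlobalTimberDatabase, max_distance_m, and min_similarity as well as the threshold values are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import count


@dataclass
class TimberRecord:
    uid: int
    centroid: tuple  # (x, y, z) in global coordinates, metres
    feature: tuple   # end-face appearance descriptor


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class GlobalTimberDatabase:
    """Assigns one UID per physical timber across repeated observations."""

    def __init__(self, max_distance_m=0.05, min_similarity=0.9):
        # Both thresholds are illustrative assumptions, not values from the claims.
        self.max_distance_m = max_distance_m
        self.min_similarity = min_similarity
        self.records = {}
        self._next_uid = count(1)

    def observe(self, centroid, feature):
        """Return the UID of the matching record, or of a newly created one."""
        best, best_dist = None, math.inf
        for rec in self.records.values():
            dist = math.dist(rec.centroid, centroid)
            sim = cosine_similarity(rec.feature, feature)
            # Both criteria must pass; the nearest passing record wins.
            if (dist <= self.max_distance_m and sim >= self.min_similarity
                    and dist < best_dist):
                best, best_dist = rec, dist
        if best is not None:
            best.centroid = tuple(centroid)  # refresh the stored position
            return best.uid
        uid = next(self._next_uid)
        self.records[uid] = TimberRecord(uid, tuple(centroid), tuple(feature))
        return uid
```

In this sketch a detection that is spatially close to a stored timber but looks different still receives a new UID, which mirrors the requirement that both criteria be satisfied before two observations are merged.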
6. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 1, characterized in that the central control and data storage system performs the following operations in its operation management:
A multi-threaded concurrent processing mechanism is adopted, using mutexes and semaphores to synchronize the data acquisition thread, the feature extraction thread, and the geometric correction and deduplication thread.
A real-time monitoring interface based on augmented reality (AR) technology is provided, which overlays the recognition results, real-time path data, and global identification numbers (UIDs) onto the real-time video stream as a semi-transparent layer.
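One common way to synchronize three such threads with mutexes and semaphores is a pair of bounded buffers between the stages. The following is a minimal sketch under that assumption; the claim does not specify the buffering scheme, and BoundedBuffer, run_pipeline, and the placeholder per-frame work are hypothetical.

```python
import threading
from collections import deque


class BoundedBuffer:
    """FIFO queue guarded by a mutex and two counting semaphores."""

    def __init__(self, capacity):
        self._items = deque()
        self._lock = threading.Lock()                # mutex around the deque
        self._free = threading.Semaphore(capacity)   # counts empty slots
        self._used = threading.Semaphore(0)          # counts filled slots

    def put(self, item):
        self._free.acquire()          # block while the buffer is full
        with self._lock:
            self._items.append(item)
        self._used.release()

    def get(self):
        self._used.acquire()          # block while the buffer is empty
        with self._lock:
            item = self._items.popleft()
        self._free.release()
        return item


STOP = object()  # sentinel that tells the next stage to finish


def run_pipeline(frames):
    raw, feats = BoundedBuffer(2), BoundedBuffer(2)
    results = []

    def acquire():                    # data acquisition thread
        for frame in frames:
            raw.put(frame)
        raw.put(STOP)

    def extract():                    # feature extraction thread
        while (frame := raw.get()) is not STOP:
            feats.put(("features", frame))   # placeholder for real extraction
        feats.put(STOP)

    def correct_and_dedup():          # geometric correction and deduplication thread
        while (item := feats.get()) is not STOP:
            results.append(item)             # placeholder for real processing

    threads = [threading.Thread(target=t)
               for t in (acquire, extract, correct_and_dedup)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each buffer has a single producer and a single consumer, frames leave the pipeline in the order they were acquired, and a slow downstream stage throttles the upstream one instead of letting unprocessed frames accumulate.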
7. The wood intelligent cargo handling system based on multi-view cooperation and multi-modal fusion of claim 1, characterized in that the system also includes:
The dynamic occlusion processing module uses the multi-echo capability of the multi-line lidar to detect the edges of back-row timber through gaps, infers the shape of the occluded area from the three-dimensional spatial occupancy relationship, and extracts features of the target from different viewpoints to stitch them together and complete the target.
The physical environment correction module integrates temperature and humidity sensors to acquire atmospheric environmental parameters and performs refractive index correction on the time-of-flight data of the multi-line lidar.
The self-diagnostic module monitors the current, temperature, and frame rate of each sensor in the multimodal data acquisition array in real time, and triggers an audible and visual alarm when lens contamination or signal loss is detected.
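The refractive index correction of the physical environment correction module can be illustrated as follows. This is a minimal sketch, not the claimed method: it assumes the refractive index of air has already been derived from the temperature and humidity readings (the claims do not state which model is used), and the function names and the example value 1.0003 are illustrative assumptions.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def vacuum_range(round_trip_s):
    """Range computed as if the pulse travelled at the vacuum speed of light."""
    return C_VACUUM * round_trip_s / 2.0


def corrected_range(round_trip_s, refractive_index):
    """Range corrected for the slower propagation speed c / n in air.

    refractive_index is assumed to come from an atmospheric model fed by the
    temperature and humidity sensors; that model is outside this sketch.
    """
    if refractive_index < 1.0:
        raise ValueError("refractive index of air must be at least 1.0")
    return C_VACUUM * round_trip_s / (2.0 * refractive_index)


# Example: a pulse whose uncorrected range is exactly 10 m.
t = 2.0 * 10.0 / C_VACUUM
error_m = vacuum_range(t) - corrected_range(t, 1.0003)
```

With the assumed index of 1.0003 the uncorrected range overestimates the distance by about 3 mm at 10 m, and the error grows in proportion to distance.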