A coal gangue multi-modal fusion identification method adaptive to complex working conditions
By employing a multimodal fusion recognition method, utilizing visible light, spectral, and 3D point cloud data, and combining global feature attenuation weights, the problem of low accuracy in coal gangue identification under complex working conditions was solved, achieving efficient and accurate coal gangue classification and ensuring the quality of clean coal.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUIZHOU SURVEY & DESIGN RES INST FOR WATER RESOURCES & HYDROPOWER
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-19
AI Technical Summary
Under complex working conditions, existing technologies are unable to effectively identify coal gangue, resulting in low identification accuracy. Coal gangue is mixed into the clean coal bin, causing a decline in coal quality and environmental pollution.
A multimodal fusion recognition method is adopted. By acquiring visible light images, hyperspectral images and three-dimensional point cloud data, a pre-trained multimodal network is used to analyze the predicted probability distribution of coal and gangue in individual material clusters. Coal and gangue are classified by combining global feature attenuation weights, which solves the problem of recognition accuracy under complex working conditions.
It significantly improves the stability and accuracy of identification under complex working conditions, effectively prevents large-volume disguised gangue from mixing into clean coal, and ensures that the ash content of washed clean coal meets the standards.
Figure CN122244623A_ABST
Description
Technical Field
[0001] This invention relates to the field of multimodal data processing technology, and more specifically to a multimodal fusion identification method for coal gangue under complex working conditions.
Background Technology
[0002] Coal gangue is a solid waste generated during coal mining and washing. Its composition is complex and contains a wide variety of impurities. If coal gangue is not effectively separated, it will not only degrade coal quality but also occupy significant land resources and cause environmental pollution. Therefore, accurate and efficient identification of coal gangue is a crucial step in achieving the clean utilization of coal.
[0003] Existing technologies typically employ a combination of visible light cameras, hyperspectral cameras, and 3D depth cameras for multimodal fusion identification of coal gangue materials. However, in actual working conditions, the coal gangue materials on the conveyor belt are usually wet. Under the high-speed vibration of the conveyor belt, wet coal slime can spread and adhere to adjacent materials. Furthermore, the 3D depth camera is affected by mutual occlusion between materials, resulting in incomplete local point cloud data. Under these complex conditions, existing technologies typically use fixed-weight fusion or simple numerical filtering, which significantly reduces identification accuracy and can easily lead to coal gangue being mixed into the clean coal bin.
Summary of the Invention
[0004] Under actual working conditions, coal and gangue materials on conveyor belts are usually wet; under the high-speed vibration of the conveyor belt, wet coal slime can spread and adhere to adjacent materials, and the 3D depth camera is affected by mutual occlusion between materials, resulting in incomplete local point cloud data. Fixed-weight fusion or simple numerical filtering then significantly reduces recognition accuracy and can easily lead to coal gangue mixing into the clean coal bin. To address this technical problem, the present invention aims to provide a multimodal fusion recognition method for coal and gangue applicable to complex working conditions. The specific technical solution adopted is as follows:
At each time point, visible light images, hyperspectral images, and 3D point clouds of coal gangue materials within the same field of view are acquired; clustering and segmentation are performed on the 3D point clouds to obtain individual material clusters; and the occupied volume and center coordinates of each individual material cluster are determined.
A pre-trained multimodal network is used to analyze the data of the regions corresponding to each individual material cluster in the visible light images, hyperspectral images, and 3D point clouds, outputting the predicted probability distributions of coal gangue under visible light, hyperspectral, and depth-of-field conditions. Based on the divergence between these predicted probability distributions and their probability values, the surface-to-interior prediction difference of each individual material cluster is determined. Based on the center coordinates of all individual material clusters, the adjacency relationship between individual material clusters is analyzed and, combined with the surface-to-interior prediction difference, the coal slime pollution confidence of each individual material cluster is determined.
The global feature attenuation weight is determined based on the coal slime pollution confidence and occupied volume of each individual material cluster; at each time point, the global feature attenuation weight and the multimodal features of each individual material cluster are fused to determine the coal gangue classification result.
[0005] Furthermore, the method for obtaining the individual material clusters includes: The DBSCAN clustering algorithm is used to cluster and segment the 3D point cloud at each time point to obtain all 3D point cloud clusters. Each 3D point cloud cluster is treated as an individual material cluster. The neighborhood search radius and the minimum number of contained points are preset values.
[0006] Furthermore, determining the occupied volume and center coordinates of each individual material cluster includes: At each time step, for each of the individual material clusters, a three-dimensional bounding box is generated that encloses its outermost edge; The occupied volume is calculated based on the dimensions of the three-dimensional bounding box, and the geometric center point of the three-dimensional bounding box is used as the center coordinates.
[0007] Furthermore, the method of using a pre-trained multimodal network to analyze the data of each individual material cluster in the corresponding region of the visible light image, hyperspectral image, and 3D point cloud, and outputting the predicted probability distribution of coal gangue under visible light, hyperspectral, and depth-of-field conditions, includes: Based on a pre-calibrated extrinsic matrix, the visible light image and hyperspectral image are subjected to perspective projection, and the pixels and spectral curves are mapped to the corresponding three-dimensional point cloud coordinates, thereby determining the visible light image slice, hyperspectral sequence slice and three-dimensional point cloud cluster corresponding to each individual material cluster. The visible light image slices, hyperspectral sequence slices, and three-dimensional point cloud clusters are respectively input into the pre-trained visible light network, hyperspectral network, and point cloud network for forward inference to obtain high-dimensional features under visible light, hyperspectral, and depth conditions. The values in each of the high-dimensional features are normalized, and the corresponding two-dimensional numerical vectors are output as the predicted probability distributions of coal and gangue under visible light, hyperspectral, and depth-of-field conditions, respectively. The values in the two-dimensional numerical vectors are the coal probability and the gangue probability, respectively.
[0008] Furthermore, the method for obtaining the surface-to-interior prediction difference includes: At each time point, the divergence between the visible light, hyperspectral, and depth-of-field coal gangue prediction probability distributions of each individual material cluster is analyzed to determine the prediction difference between any two of these distributions. The coal probability in the hyperspectral coal gangue prediction probability distribution is extracted as the hyperspectral coal judgment probability. The product of three terms is used as the surface-to-interior prediction difference of each individual material cluster: the negative-correlation mapping of the prediction difference between visible light and hyperspectral, the prediction difference between hyperspectral and depth of field, and the hyperspectral coal judgment probability.
[0009] Further, determining the prediction difference between any two coal gangue prediction probability distributions includes: At each time point, for the visible light, hyperspectral, and depth-of-field prediction probability distributions of each individual material cluster, the information entropy distance between each pair of distributions is calculated as the prediction difference between the two corresponding modes, wherein the information entropy distance is the Jensen-Shannon divergence computed with base-2 logarithms.
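As an illustration of the pairwise prediction difference described above, the following is a minimal sketch of the base-2 Jensen-Shannon divergence between two two-class (coal, gangue) probability vectors. The function names `kl_divergence` and `js_divergence` are chosen here for illustration and do not come from the patent text.

```python
from math import log2

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms; the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Example: optical modes say "coal", the depth mode says "gangue".
p_hyperspectral = [0.9, 0.1]   # [coal probability, gangue probability]
p_depth = [0.2, 0.8]
d_hs_depth = js_divergence(p_hyperspectral, p_depth)
```

With base-2 logarithms the divergence is bounded by 1, which is reached only when the two distributions are completely contradictory, so the value can be read directly as a normalized disagreement score.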
[0010] Furthermore, the method for obtaining the confidence level of coal slime pollution includes: Based on the center coordinates of all individual material clusters, the adjacency relationship between individual material clusters is analyzed. On this basis, the numerical characteristics of the surface-to-interior prediction differences of individual material clusters within a local area are analyzed to determine the point cloud missing isolation degree of each individual material cluster. The difference between the point cloud missing isolation degree of the individual material cluster and a preset isolation degree judgment threshold is calculated. If the difference is positive, the product of the difference and a preset noise attenuation coefficient is subjected to negative-correlation normalization, and the result is used as the attenuation factor; otherwise, a preset value is used as the attenuation factor. The product of the attenuation factor of the individual material cluster and its surface-to-interior prediction difference is used as the coal slime pollution confidence of the individual material cluster. If the point cloud missing isolation degree of the individual material cluster has been assigned the preset upper limit value, its coal slime pollution confidence is directly set to 0.
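The confidence rule above can be sketched as follows. The text does not specify the negative-correlation normalization, the threshold, the noise attenuation coefficient, the fallback attenuation factor, or the upper limit constant; this sketch assumes `exp(-k * excess)` for the normalization and uses placeholder values for the other parameters, all of which are hypothetical.

```python
from math import exp

def slime_confidence(isolation, surf_int_diff,
                     threshold=2.0,        # assumed isolation degree judgment threshold
                     k=1.0,                # assumed noise attenuation coefficient
                     default_factor=1.0,   # assumed preset attenuation factor
                     upper_limit=1e6):     # assumed upper limit constant for isolated clusters
    """Coal slime pollution confidence of one individual material cluster."""
    if isolation >= upper_limit:
        # No neighbors at all: treated as an isolated point cloud fragment.
        return 0.0
    excess = isolation - threshold
    # Assumed negative-correlation normalization: exp(-k * excess) lies in (0, 1).
    factor = exp(-k * excess) if excess > 0 else default_factor
    return factor * surf_int_diff

consistent = slime_confidence(isolation=1.0, surf_int_diff=0.6)   # not isolated
outlier = slime_confidence(isolation=3.0, surf_int_diff=0.6)      # isolated, attenuated
fragment = slime_confidence(isolation=1e6, surf_int_diff=0.6)     # no neighbors
```

The effect is that a cluster whose surface-to-interior difference is far larger than that of its neighbors has its confidence suppressed, which matches the stated intent of separating occlusion artifacts from sheet-like slime contamination.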
[0011] Furthermore, the method for obtaining the point cloud missing isolation degree includes: The x and y coordinates of the center coordinates of each individual material cluster are extracted as nodes on the two-dimensional x-y projection plane. The two-dimensional Delaunay triangulation algorithm is used to connect the nodes of all individual material clusters to generate a non-intersecting material surface adjacency graph, in which the difference between the z-axis values of any two connected individual material clusters is less than a preset height threshold. For any individual material cluster, the neighboring individual material clusters directly connected to it are identified in the material surface adjacency graph. The average of the surface-to-interior prediction differences of all neighboring individual material clusters is calculated as the mean difference. The surface-to-interior prediction difference of the individual material cluster is used as the numerator, and the sum of the mean difference and a preset minimum positive number is used as the denominator; the resulting ratio is used as the point cloud missing isolation degree of the individual material cluster. If the individual material cluster has no directly connected neighboring individual material clusters, its point cloud missing isolation degree is set to a preset upper limit value.
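The ratio described above can be sketched as below, assuming the adjacency graph is already available as a dictionary mapping each cluster index to its directly connected neighbors. The values of the minimum positive number (`eps`) and the upper limit (`upper_limit`) are placeholders, since the text only says they are preset.

```python
def isolation_degree(i, neighbors, diff, eps=1e-6, upper_limit=1e6):
    """Point cloud missing isolation degree of cluster i.

    neighbors: dict mapping cluster index -> iterable of adjacent cluster indices
    diff:      dict mapping cluster index -> surface-to-interior prediction difference
    eps, upper_limit: assumed preset constants (not given in the text)
    """
    nbrs = list(neighbors.get(i, ()))
    if not nbrs:
        return upper_limit  # no directly connected neighbors
    mean_diff = sum(diff[j] for j in nbrs) / len(nbrs)
    return diff[i] / (mean_diff + eps)

# Cluster 0 disagrees twice as strongly as its neighbors; cluster 3 has no neighbors.
diff = {0: 0.8, 1: 0.4, 2: 0.4, 3: 0.5}
neighbors = {0: [1, 2], 1: [0], 2: [0], 3: []}
iso_0 = isolation_degree(0, neighbors, diff)
iso_3 = isolation_degree(3, neighbors, diff)
```

A value near 1 means the cluster behaves like its surroundings; a large value means its surface-to-interior difference stands out locally.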
[0012] Furthermore, the method for obtaining the global feature attenuation weight includes: The occupied volume of each individual material cluster in the field of view at each time point is multiplied by its coal slime pollution confidence, and the products are summed to obtain the total volume of coal slime encapsulation. The difference between 1 and the coal slime pollution confidence of each individual material cluster is taken as the safety probability of that cluster. The occupied volume of each individual material cluster is multiplied by its safety probability, and the products are summed to obtain the total volume of normal material. The sum of the total volume of normal material, the total volume of coal slime encapsulation, and a preset constant is used as the denominator, and the total volume of normal material is used as the numerator. The resulting ratio is used as the global feature attenuation weight in the field of view at each time point.
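A minimal sketch of this volume-weighted ratio follows. The preset constant in the denominator is not given in the text; the default `c=1e-6` is an assumption used only to avoid division by zero.

```python
def global_attenuation_weight(volumes, confidences, c=1e-6):
    """Global feature attenuation weight for one time point.

    volumes:     occupied volume of each individual material cluster
    confidences: coal slime pollution confidence of each cluster, in [0, 1]
    c:           assumed small preset constant in the denominator
    """
    v_slime = sum(v * p for v, p in zip(volumes, confidences))
    v_normal = sum(v * (1.0 - p) for v, p in zip(volumes, confidences))
    return v_normal / (v_normal + v_slime + c)

# One small clean cluster and one large, fully contaminated cluster.
w_dirty = global_attenuation_weight([100.0, 300.0], [0.0, 1.0])
# The same clusters with no contamination at all.
w_clean = global_attenuation_weight([100.0, 300.0], [0.0, 0.0])
```

Because each cluster is weighted by its volume, a single large contaminated lump lowers the weight far more than many small contaminated fragments would, which reflects the volume-based quality criterion stated elsewhere in the description.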
[0013] Furthermore, fusing the global feature attenuation weight with the multimodal features of each individual material cluster at each time point to determine the coal gangue classification result includes: For each individual material cluster, the visible light high-dimensional features, hyperspectral high-dimensional features, and depth-of-field high-dimensional features generated during the forward inference process of the multimodal network are extracted. At each time point, the global feature attenuation weight is multiplied by the visible light high-dimensional feature vector and the hyperspectral high-dimensional feature vector of each individual material cluster to obtain the visible light feature array and the hyperspectral feature array. The visible light feature array, the hyperspectral feature array, and the depth-of-field high-dimensional features are concatenated to obtain the comprehensive feature vector after channel fusion. The comprehensive feature vector is input into a multilayer perceptron classifier for forward inference, and the specific label of each individual material cluster is output as the final coal gangue classification result. The specific label is either a coal label or a gangue label.
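The scaling and concatenation step can be sketched as below. Only the optical features are multiplied by the weight and the depth features pass through unchanged, as the paragraph describes. The multilayer perceptron itself is omitted, and the feature lengths in the example are arbitrary illustrative values.

```python
def fuse_features(weight, f_visible, f_hyperspectral, f_depth):
    """Scale the optical features by the global attenuation weight and concatenate all three."""
    scaled_visible = [weight * v for v in f_visible]
    scaled_hyperspectral = [weight * v for v in f_hyperspectral]
    # Depth features are not attenuated, so they dominate when the weight is small.
    return scaled_visible + scaled_hyperspectral + list(f_depth)

# Heavy contamination (weight 0.5) halves the optical channels only.
fused = fuse_features(0.5, [2.0, 4.0], [6.0], [1.0, 1.0])
```

The fused vector would then be passed to the pre-trained classifier, which outputs either a coal label or a gangue label for the cluster.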
[0014] The present invention has the following beneficial effects: By synchronously acquiring multimodal data of coal gangue materials within the same field of view, it is possible to ensure strict spatial and temporal alignment of visible light, hyperspectral, and 3D point cloud data, providing a foundation for subsequent accurate fusion and significantly improving the recognition stability under complex working conditions (such as material flow and vibration). Clustering and segmenting the coal gangue materials using 3D point clouds decomposes the mixture into independent units, avoiding misclassification caused by material adhesion or overlap. Furthermore, the quantitative analysis of occupied volume and center coordinates provides key geometric parameters for subsequent adjacency relationship judgment and global feature weighting. Utilizing multimodal networks to fully mine the complementary information of visible light (texture / color), hyperspectral (compositional spectrum), and 3D point cloud (spatial structure), the predicted probability distribution of coal gangue under each data type is obtained. Simultaneously, the difference divergence and numerical characteristics of the predicted probability distribution of coal gangue between different data types are analyzed to determine the surface-to-internal prediction difference of individual material clusters. This is used to characterize the degree of inconsistency between the optical signal on the material surface and the internal geometric contour, effectively improving the recognition capability for complex scenarios such as coal gangue surface covered with coal slime and similar colors. Furthermore, the central coordinate adjacency analysis incorporates spatial context information. 
By analyzing the adjacency relationships between individual material clusters and combining this with the surface-to-interior prediction difference of individual material clusters, the coal slime pollution confidence of individual material clusters is calculated. This effectively distinguishes between "isolated point cloud fragments" caused by hardware occlusion and genuine "coal slime contamination." By determining the global feature attenuation weight based on the coal slime pollution confidence and the occupied volume of individual material clusters, and using this weight for dynamic fusion classification of multimodal features, the method can effectively prevent large-volume disguised gangue from mixing into clean coal under harsh working conditions where large-area coal slime contamination occurs on the conveyor belt, thus helping to ensure that the ash content of the washed clean coal meets the standards.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart of a method for multimodal fusion identification of coal gangue adapted to complex working conditions, provided as an embodiment of the present invention.
Detailed Implementation
[0017] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a multimodal fusion identification method for coal gangue adapted to complex working conditions proposed according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0019] The following description, in conjunction with the accompanying drawings, details a specific scheme for a multimodal fusion identification method for coal gangue adapted to complex working conditions provided by the present invention.
[0020] Please refer to Figure 1, which shows a flowchart of a multimodal fusion identification method for coal gangue adapted to complex working conditions, provided by an embodiment of the present invention. The method includes the following steps:
Step S1: At each time point, acquire visible light images, hyperspectral images, and three-dimensional point clouds of coal gangue materials within the same field of view; perform clustering and segmentation on the three-dimensional point clouds to obtain individual material clusters; determine the occupied volume and center coordinates of each individual material cluster.
[0021] On the coal gangue conveyor belt, a multi-sensor rigid mounting bracket integrates a visible light camera, a hyperspectral camera, and a 3D depth-sensing camera onto the same mechanical structure, all facing the same detection area of the conveyor belt. This ensures that the optical axes of all cameras are parallel and their fields of view overlap, guaranteeing that the images acquired by each camera contain the same material object. A standard-sized black and white checkerboard calibration plate is placed on the stationary conveyor belt surface. The three cameras simultaneously capture images of this calibration plate. Based on the multiple sets of images and point clouds captured, the system uses a known technique based on Zhang Zhengyou's calibration method to calculate the rotation and translation extrinsic parameter matrices of the visible light camera and the hyperspectral camera relative to the absolute coordinate system of the 3D depth-sensing camera.
[0022] After entering the online real-time processing flow, a synchronous acquisition timestamp is issued at the preset fixed acquisition frequency (e.g., 20 Hz). Whenever this timestamp is received, the visible light camera, hyperspectral camera, and 3D depth camera simultaneously capture visible light images, hyperspectral images, and 3D point clouds of coal gangue materials within the common field of view at the current moment.
[0023] After obtaining the visible light image, hyperspectral image, and 3D point cloud at each moment, the 3D point cloud can be clustered to obtain individual clusters, which are used to decompose the mixture into independent individuals and avoid misclassification caused by material adhesion or overlap.
[0024] Preferably, in one embodiment of the present invention, the method for obtaining individual material clusters includes: The coal gangue on the conveyor belt has extremely irregular shapes and large differences in size. The DBSCAN clustering algorithm does not require a preset number of clusters and can adaptively discover material point clouds of arbitrary shapes. Therefore, the DBSCAN clustering algorithm is used to cluster and segment the three-dimensional point cloud at each time point to obtain all three-dimensional point cloud clusters. Each three-dimensional point cloud cluster is regarded as an individual material cluster. The neighborhood search radius and the minimum number of contained points are preset values. In this embodiment of the invention, the neighborhood search radius is set to 3 mm and the minimum number of contained points is set to 20. The specific values can be adjusted according to the implementation scenario and are not limited here.
[0025] It should be noted that the DBSCAN clustering algorithm is a well-known technique, and the specific process will not be described in detail here.
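For readers unfamiliar with DBSCAN, the following is a minimal brute-force sketch in pure Python, using the embodiment's values of a 3 mm search radius and 20 minimum points. It is an illustration only: the function name `dbscan` and the synthetic point cloud are assumptions, and a production system would normally use an optimized library implementation with a spatial index.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each 3D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def region(i):
        # Brute-force radius query; the result includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Synthetic cloud (units: mm): two dense 3x3x3 lumps 50 mm apart plus one stray point.
grid = [(float(x), float(y), float(z)) for x in range(3) for y in range(3) for z in range(3)]
cloud = grid + [(x + 50.0, y, z) for x, y, z in grid] + [(25.0, 25.0, 25.0)]
labels = dbscan(cloud, eps=3.0, min_pts=20)
```

Each distinct non-noise label corresponds to one individual material cluster; the stray point is labeled as noise and would not be passed on to the later steps.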
[0026] The evaluation standard for the quality of clean coal in coal washing plants is the total volume or weight of the mixed-in gangue, rather than the number of gangue pieces. Therefore, after obtaining all the individual material clusters, the occupied volume and center coordinates of each individual material cluster can be determined, which provides key geometric parameters for subsequent analysis.
[0027] Preferably, in one embodiment of the present invention, determining the occupied volume and center coordinates of each individual material cluster includes: Given the highly irregular shape of coal gangue, at each time step, for each individual material cluster, the AABB (Axis-Aligned Bounding Box) algorithm is first used to generate a 3D bounding box that encloses its outermost edge, thus defining the maximum physical outline of the individual material cluster. Then, based on the dimensions (length, width, and height) of the 3D bounding box, the occupied volume (the product of length, width, and height) is calculated, and the geometric center point of the 3D bounding box is used as the center coordinate.
[0028] It should be noted that the AABB algorithm is a well-known technique, and the specific process will not be elaborated here.
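The bounding box computation described above amounts to taking per-axis minima and maxima. A minimal sketch, with the function name `aabb_volume_and_center` chosen for illustration:

```python
def aabb_volume_and_center(cluster):
    """Return (occupied volume, center coordinates) of the axis-aligned bounding box.

    cluster: iterable of (x, y, z) points belonging to one individual material cluster
    """
    xs, ys, zs = zip(*cluster)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    length, width, height = (hi - lo for lo, hi in zip(mins, maxs))
    volume = length * width * height
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    return volume, center

volume, center = aabb_volume_and_center([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (1.0, 1.0, 1.0)])
```

Note that the box volume is an upper bound on the true volume of an irregular lump, which is consistent with the description of the box as the maximum physical outline of the cluster.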
[0029] Step S2: Analyze the data of each individual material cluster in the corresponding region of the visible light image, hyperspectral image, and 3D point cloud using a pre-trained multimodal network, and output the predicted probability distribution of coal gangue under visible light, hyperspectral, and depth-of-field conditions; determine the surface-to-interior prediction difference degree of each individual material cluster based on the difference divergence and probability values between the predicted probability distributions of coal gangue; analyze the adjacency relationship between individual material clusters based on the center coordinates of all individual material clusters, and on this basis, combine the surface-to-interior prediction difference degree of individual material clusters to determine the coal slime pollution confidence level of each individual material cluster.
[0030] Visible light is easily affected by dust or becomes ineffective due to similar colors. Hyperspectral imaging can identify the differences in mineral composition (chemical properties) of coal and gangue through spectral features, while 3D point clouds can reflect the differences in geometric shape after crushing (physical properties). Therefore, in this step, data complementarity from different modalities is utilized to extract identification features of coal and gangue from three dimensions: chemical composition (hyperspectral), surface texture (visible light), and physical morphology (3D point cloud). This outputs multi-dimensional classification evidence, yielding the predicted probability distribution of coal and gangue under visible light, hyperspectral, and depth-of-field conditions.
[0031] Preferably, in one embodiment of the present invention, a pre-trained multimodal network is used to analyze the data of the corresponding region of each individual material cluster in visible light images, hyperspectral sequences, and three-dimensional point clouds, and output the predicted probability distribution of coal gangue under visible light, hyperspectral, and depth-of-field conditions, including: Based on the pre-calibrated extrinsic parameter matrix (the rotation and translation extrinsic parameter matrix calculated in step S1), perspective projection is performed on the visible light image and the hyperspectral image, mapping the pixels in the visible light image and the spectral curves in the hyperspectral image to the corresponding three-dimensional point cloud coordinates, thereby achieving spatial alignment of multimodal data, and thus determining the visible light image slice, hyperspectral sequence slice and three-dimensional point cloud cluster corresponding to each individual material cluster.
[0032] Then, visible light image slices, hyperspectral sequence slices, and 3D point cloud clusters are input into pre-trained visible light networks, hyperspectral networks, and point cloud networks for forward inference to obtain high-dimensional features in visible light, hyperspectral, and depth-of-field conditions.
[0033] The values in each high-dimensional feature are normalized (using the Softmax activation function), and the corresponding two-dimensional numerical vectors are output as the predicted probability distributions of coal and gangue under visible light, hyperspectral, and depth of field. The values in the two-dimensional numerical vectors are the coal probability (first value) and the gangue probability (second value), respectively.
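As a small illustration of the normalization step, the sketch below applies Softmax to a two-element score vector. It assumes that each network ends in a two-unit output whose raw scores are ordered as (coal, gangue); the text does not describe how the high-dimensional features are reduced to two values, so that reduction is outside this sketch.

```python
from math import exp

def softmax(scores):
    """Numerically stable Softmax: subtracting the maximum avoids overflow in exp."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from one modality: [coal score, gangue score].
coal_prob, gangue_prob = softmax([2.0, 0.0])
```

The two outputs are non-negative and sum to 1, so they can be compared across modalities as probability distributions.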
[0034] It should be noted that the multimodal network in this embodiment of the invention refers to a visible light network, a hyperspectral network, and a point cloud network. The visible light network uses the YOLOv8n network, which is good at extracting the surface texture, color, and two-dimensional contour edges of 2D visible light. The hyperspectral network uses the 1D-ResNet18 network, which can deeply mine the peak and trough features of the hyperspectral curve to reflect the different chemical absorption rates of coal and gangue. The point cloud network uses the PointNet network, which can directly process disordered three-dimensional point sets and extract the real spatial volume, morphology, and surface roughness. The training process is briefly described as follows: A pre-established offline sample set is read, which includes clean materials and materials contaminated to varying degrees by coal slime. For this offline sample set, supervised pre-training is performed on the YOLOv8n network, the 1D-ResNet18 network, and the PointNet network, respectively. When performing offline pre-training on the multilayer perceptron (MLP) classifier that will subsequently perform fusion decision-making, simulated attenuation weights randomly generated in the range of 0 to 1 are introduced. Using these simulated attenuation weights, the optical (visible light and hyperspectral) high-dimensional features of the offline sample set are synchronously and randomly scaled. By introducing this random scaling data augmentation operation, the multilayer perceptron classifier can adapt to the dynamic fluctuations of the optical feature amplitude during the training phase, thereby establishing a classification mapping relationship based on depth features as the dominant factor under extreme pollution conditions. After pre-training is completed, the internal weight parameters of the above networks are fixed.
[0035] In complex real-world applications, if a piece of gangue is encased in black coal slime, visible light and hyperspectral imaging can only detect the surface slime (predicting it as "coal" with a high probability), while 3D point cloud imaging detects its geometric shape (predicting it as "gangue" with a high probability). This physical phenomenon of "inconsistency between surface (optical) and internal (morphological)" is mathematically represented by a large divergence in probability distributions between different modes. Therefore, based on the divergence and probability values between the predicted probability distributions of coal and gangue under different modes, the surface-to-internal prediction difference of each individual material cluster can be determined, which characterizes the consistency of the data from different modes in determining the surface material of the individual material cluster.
[0036] Preferably, in one embodiment of the present invention, the method for obtaining the surface-to-interior prediction difference includes: The JS divergence measures the difference between two probability distributions. Therefore, at each time point, for each individual material cluster, the information entropy distance between any two of the predicted probability distributions of coal and gangue under visible light, hyperspectral, and depth-of-field conditions is calculated as the prediction difference between the two modes. The information entropy distance is the Jensen-Shannon divergence computed with base-2 logarithms. Thus, any two of the three predicted probability distributions of coal and gangue have a prediction difference; the larger the value, the lower the consistency between the two modes' judgments of the individual material cluster.
[0037] Then, the coal probability in the hyperspectral coal gangue prediction probability distribution is extracted as the hyperspectral coal identification probability. The hyperspectral coal identification probability is used as a constraint factor to ensure that the subsequent analysis applies only when a high-carbon signal is present on the surface of the individual material cluster. The smaller the prediction difference between visible light and hyperspectral, the higher the consistency between the surface materials detected by visible light and hyperspectral. Therefore, this prediction difference is passed through a negative-correlation mapping, so that a larger result indicates higher consistency, and the result serves as the surface consistency factor. The larger the prediction difference between hyperspectral and depth of field, the more serious the discrepancy between the surface carbon signal of the individual material cluster sensed by the optical cameras and the internal geometric contour of the individual material cluster sensed by the depth camera.
[0038] Therefore, the product of the surface consistency factor, the prediction difference between hyperspectral and depth of field, and the probability of coal identification by hyperspectral is used as the prediction difference between the surface and interior of each individual material cluster. The greater the prediction difference between the surface and interior, the higher the suspicion that the surface of the individual material cluster is wrapped with high carbon material and the interior shows the outline of gangue, that is, the higher the degree of deviation between the surface and interior prediction.
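The product described above can be sketched as follows. The specification does not state which negative correlation mapping is used for the surface consistency factor; this sketch assumes `1 - d`, which is valid only because the base-2 Jensen-Shannon divergence lies in [0, 1]. All function names are hypothetical.

```python
import math

def js_bits(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def surface_interior_difference(p_vis, p_hyp, p_depth):
    """Surface-to-interior prediction difference for one cluster.

    Each argument is a [coal, gangue] probability pair.
    """
    d_vis_hyp = js_bits(p_vis, p_hyp)
    d_hyp_depth = js_bits(p_hyp, p_depth)
    surface_consistency = 1.0 - d_vis_hyp  # ASSUMPTION: one possible negative correlation mapping
    p_coal_hyp = p_hyp[0]                  # hyperspectral coal identification probability
    return surface_consistency * d_hyp_depth * p_coal_hyp

wrapped_gangue = surface_interior_difference([0.9, 0.1], [0.9, 0.1], [0.1, 0.9])
clean_coal = surface_interior_difference([0.9, 0.1], [0.9, 0.1], [0.9, 0.1])
```

Under these assumptions, a cluster whose optical modalities both report coal while the depth modality reports gangue receives a high value, and a cluster on which all three modalities agree receives a value of zero.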
[0039] In actual conveyor belt operation, coal slime splashing, dripping, or water vapor adhesion usually presents a "sheet-like distribution," meaning that the pollution has a strong spatial continuity. Therefore, by establishing the adjacency relationship between individual material clusters and combining this with the predicted difference between the surface and interior of individual material clusters, the confidence level of coal slime pollution for each individual material cluster can be determined, which can better reflect the pollution assessment situation under real physical conditions.
[0040] Preferably, in one embodiment of the present invention, the method for obtaining the confidence level of coal slime pollution includes: Since the material spreading on the conveyor belt has strong two-dimensional planar characteristics, the height represented by the z-axis has low reference value for adjacency analysis. Therefore, the horizontal and vertical coordinate nodes of the center coordinates of each individual material cluster are extracted on the two-dimensional projection plane of the x-axis and y-axis, which greatly reduces the complexity and accurately restores the physical path of coal slime on the surface of the conveyor belt.
[0041] The two-dimensional Delaunay triangulation algorithm (a well-known technique, which will not be elaborated here) is used to connect the horizontal and vertical coordinate nodes of all individual material clusters to generate a non-intersecting material surface adjacency graph. In order to avoid incorrectly connecting stacked materials, the difference in the z-axis values of any two connected individual material clusters in the adjacency graph should be less than a preset height threshold (which can be set according to the implementation scenario, and is not limited here). In this material surface adjacency graph, two individual material clusters that are directly connected are determined to be in contact or directly adjacent to each other on the conveyor belt surface, which has the physical basis for the mutual penetration of coal slime.
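A minimal sketch of building the material surface adjacency graph from an existing triangulation is given below. It assumes the triangles come from any two-dimensional Delaunay routine applied to the (x, y) center nodes (for example, the `simplices` attribute of `scipy.spatial.Delaunay`), and it interprets the height constraint as dropping any edge whose z-axis difference reaches the threshold. The function name and sample values are hypothetical.

```python
from itertools import combinations

def build_adjacency(triangles, z_values, height_threshold):
    """Turn Delaunay triangles into an adjacency map, dropping edges between stacked clusters.

    triangles: iterable of 3-tuples of cluster indices.
    z_values: z coordinate of each cluster center, indexed by cluster.
    """
    adjacency = {i: set() for i in range(len(z_values))}
    for tri in triangles:
        for a, b in combinations(tri, 2):  # the three edges of the triangle
            if abs(z_values[a] - z_values[b]) < height_threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

# Four clusters; cluster 2 sits on top of the others and is disconnected by the filter.
triangles = [(0, 1, 2), (1, 2, 3)]
z_values = [0.10, 0.12, 0.50, 0.11]
graph = build_adjacency(triangles, z_values, height_threshold=0.05)
```

Storing neighbors in sets keeps the graph undirected and free of duplicate edges, since adjacent triangles share edges.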
[0042] Therefore, for any individual material cluster, the neighboring individual material clusters directly connected to it are identified in the material surface adjacency graph. The average of the surface-to-interior prediction differences of all neighboring individual material clusters is calculated as the mean difference, which characterizes the overall divergence level of the local environment in which the individual material cluster is located. The surface-to-interior prediction difference of the individual material cluster is used as the numerator, and the sum of the mean difference and a preset minimum positive number is used as the denominator. The resulting ratio is the point cloud missing isolation degree of the individual material cluster. The closer the point cloud missing isolation degree is to 1, the more consistent the surface-to-interior divergence of the individual material cluster is with that of the surrounding clusters, which matches the pattern of coal slime contamination. If the point cloud missing isolation degree is significantly greater than 1, the surface-to-interior divergence of the individual material cluster is much higher than that of its surroundings, and the cluster is considered to be in an abnormal, isolated state. It should be noted that the preset minimum positive number is set to $10^{-6}$; its function is to prevent the denominator from being 0, and the specific value can be adjusted according to the implementation scenario.
[0043] Specifically, if the individual material cluster does not have any directly connected adjacent individual material clusters, its point cloud isolation degree is set to a preset upper limit value. In this embodiment of the present invention, the preset upper limit value is 2. The specific value can be adjusted according to the implementation scenario and is not limited here.
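The ratio and the no-neighbor fallback described above can be sketched as follows. The default values mirror the embodiment (a minimum positive number of 1e-6 and an upper limit of 2); the function name and the sample data are hypothetical.

```python
def isolation_degree(index, differences, adjacency, eps=1e-6, upper_limit=2.0):
    """Point cloud isolation degree of one cluster.

    differences: surface-to-interior prediction difference of every cluster.
    adjacency: map from cluster index to the set of directly connected clusters.
    """
    neighbours = adjacency.get(index, ())
    if not neighbours:
        return upper_limit  # no direct neighbours: assign the preset upper limit
    mean_diff = sum(differences[j] for j in neighbours) / len(neighbours)
    return differences[index] / (mean_diff + eps)

differences = [0.5, 0.5, 0.9, 0.1]
adjacency = {0: {1}, 1: {0}, 2: set(), 3: {0, 1}}

iso_consistent = isolation_degree(0, differences, adjacency)  # close to 1
iso_isolated = isolation_degree(2, differences, adjacency)    # upper limit, no neighbours
```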
[0044] Then, the difference between the point cloud isolation degree of the individual material cluster and the preset isolation degree judgment threshold (set to 1.2 in this embodiment; the specific value can be set according to the implementation scenario) is calculated. If the difference is positive, the individual material cluster is a significant, isolated outlier in the two-dimensional topology. In this case, its divergence is considered likely to be isolated computational noise caused by locally missing point cloud data, for example due to occlusion of the depth camera. Therefore, the product of this difference and the preset noise attenuation coefficient (set to 5.0 in this embodiment; the specific value can be adjusted according to the implementation scenario) is subjected to negative correlation mapping and normalization, and the result is used as the attenuation factor. The attenuation factor is then small, which facilitates the removal of occasional noise in subsequent processing. Conversely, if the difference is not positive, the surface-to-interior divergence of the individual material cluster is consistent with that of the surrounding clusters, which matches the pattern of coal slime contamination, and the preset value of 1 is used as the attenuation factor. The negative correlation mapping and normalization here can be performed using the formula $\exp(-x)$, where $\exp$ denotes the exponential function with the natural constant e as the base, and x denotes the independent variable.
[0045] Finally, the product of the attenuation factor of the individual material cluster and its surface-to-interior prediction difference is used as the coal slime pollution confidence of the individual material cluster. The larger the value, the lower the possibility of a false alarm and the higher the confidence. If the point cloud isolation degree of the individual material cluster was assigned the preset upper limit value, its coal slime pollution confidence is directly set to 0.
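The attenuation factor and the resulting confidence can be sketched as below. The mapping formula is not legible in the source text; this sketch assumes the negative correlation mapping and normalization is exp(-x), matching the description of an exponential function with base e. The threshold 1.2 and coefficient 5.0 are the embodiment values; the function name and the `has_neighbours` flag (used instead of comparing against the upper limit value) are hypothetical.

```python
import math

def slime_confidence(diff, isolation, has_neighbours, threshold=1.2, noise_coeff=5.0):
    """Coal slime pollution confidence of one cluster.

    diff: surface-to-interior prediction difference of the cluster.
    isolation: point cloud isolation degree of the cluster.
    has_neighbours: False when the isolation degree was assigned the upper limit.
    """
    if not has_neighbours:
        return 0.0  # isolated cluster with no direct neighbours
    excess = isolation - threshold
    # ASSUMPTION: exp(-x) as the negative correlation mapping and normalization.
    attenuation = math.exp(-noise_coeff * excess) if excess > 0 else 1.0
    return attenuation * diff

conf_consistent = slime_confidence(0.6, isolation=1.0, has_neighbours=True)  # factor 1
conf_outlier = slime_confidence(0.6, isolation=1.6, has_neighbours=True)     # strongly attenuated
conf_isolated = slime_confidence(0.6, isolation=2.0, has_neighbours=False)   # forced to 0
```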
[0046] Step S3: Determine the global feature attenuation weight based on the coal slime pollution confidence and occupied volume of each individual material cluster; at each time point, fuse the global feature attenuation weight and the multimodal features of each individual material cluster to determine the coal gangue classification result.
[0047] In coal washing plant operations, the benchmark for assessing the quality of clean coal is the total volume of mixed gangue rather than the number of gangue pieces. If the network features are modified in isolation based on the coal slime contamination confidence of individual material clusters, extra-large gangue pieces disguised by heavy coal slime are easily missed, which causes more serious ash content deterioration. Therefore, in this step, the global feature attenuation weight is first determined based on the coal slime contamination confidence and the occupied volume of each individual material cluster.
[0048] Preferably, in one embodiment of the present invention, the method for obtaining the global feature attenuation weight includes: The volume occupied by each individual material cluster in the field of view at each time step is multiplied by its corresponding coal slime contamination confidence level, and the products are summed to obtain the total volume of coal slime encapsulation. Since the coal slime contamination confidence level represents the probability that the individual material cluster, after noise reduction, is a true case of coal slime encapsulation, the larger the total volume of coal slime encapsulation, the higher the material space load of large-area continuous contamination by lean coal slime in the field of view at that time step, and the more severe the contamination.
[0049] The difference between 1 and the coal slime contamination confidence of each individual material cluster is taken as the safety probability of each individual material cluster, which is the probability that the individual material cluster is not contaminated by coal slime, the surface is clean, and the subsequent classification prediction results are more reliable. Then, the occupied volume of each individual material cluster is multiplied by its corresponding safety probability and summed to obtain the total volume of normal material, which reflects the total capacity of normal material whose surface is not covered by coal slime in the field of view at that moment.
[0050] Finally, the sum of the total volume of normal material, the total volume of coal slime encapsulation, and the preset minimum positive number is used as the denominator, and the total volume of normal material is used as the numerator. The resulting ratio is used as the global feature attenuation weight in the field of view at each moment. This calculation ensures that the global feature attenuation weight always lies between 0 and 1. The closer it is to 0, the more serious the coal slime contamination load on the conveyor belt. With this global weight, subsequent calculations shift trust toward the internal geometric contour, which is not affected by surface color, to prevent interference from surface optical false alarms.
[0051] It should be noted that the preset minimum positive number is set to $10^{-6}$ to prevent the denominator from being zero; the specific value can be adjusted according to the implementation scenario.
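The volume-weighted ratio described in the preceding paragraphs can be sketched as follows. The function name and the sample volumes and confidences are hypothetical; the default `eps` matches the preset minimum positive number of the embodiment.

```python
def global_attenuation_weight(volumes, confidences, eps=1e-6):
    """Global feature attenuation weight for one time step.

    volumes: occupied volume of each cluster in the field of view.
    confidences: coal slime pollution confidence of each cluster, in [0, 1].
    """
    slime_volume = sum(v * c for v, c in zip(volumes, confidences))
    normal_volume = sum(v * (1.0 - c) for v, c in zip(volumes, confidences))
    return normal_volume / (normal_volume + slime_volume + eps)

# One large, heavily contaminated cluster dominates a small clean one.
weight_dirty = global_attenuation_weight([10.0, 1.0], [0.8, 0.0])
# Clean belt: the weight stays close to 1.
weight_clean = global_attenuation_weight([10.0, 1.0], [0.0, 0.0])
```

Because each confidence is weighted by volume, a single large contaminated cluster lowers the weight far more than several small ones would, which reflects the volume-based quality benchmark stated in the text.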
[0052] In an ideal, clean environment, visible light (color and texture) and hyperspectral data (chemical composition) are the modes with the highest identification accuracy. However, under coal slime cover, these two modes provide extremely high false confidence levels. Therefore, by using the global feature attenuation weights obtained above, the multimodal features of each individual material cluster can be adjusted and then fused. This allows for adjustment of the participation weights of various modes, thereby determining a more accurate coal gangue classification result.
[0053] Preferably, in one embodiment of the present invention, the method for obtaining the coal gangue classification result includes: For each individual material cluster, we extract the visible light high-dimensional features, hyperspectral high-dimensional features, and depth-of-field high-dimensional features generated during the forward inference process of the multimodal network.
[0054] Then, at each time step, the global feature attenuation weight is multiplied by the visible light high-dimensional feature vector and the hyperspectral high-dimensional feature vector of each individual material cluster to obtain the visible light feature array and the hyperspectral feature array, thereby attenuating the optical features as a whole.
[0055] The visible light feature array, the hyperspectral feature array, and the depth high-dimensional feature are concatenated to obtain a comprehensive feature vector after channel fusion. Finally, the comprehensive feature vector is input into a multilayer perceptron classifier to perform forward inference, outputting a specific label for each individual material cluster as the final coal and gangue classification result. The specific label is divided into a coal label or a gangue label.
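The scaling and concatenation steps above can be sketched as follows, using plain lists in place of network tensors. The function name and feature values are hypothetical, and the multilayer perceptron classifier is left out because its architecture is not specified in the text.

```python
def fuse_features(weight, f_visible, f_hyperspectral, f_depth):
    """Attenuate the optical features by the global weight, then concatenate all three.

    weight: global feature attenuation weight for the current time step.
    The depth-of-field features are passed through unchanged.
    """
    visible_array = [weight * x for x in f_visible]
    hyperspectral_array = [weight * x for x in f_hyperspectral]
    return visible_array + hyperspectral_array + list(f_depth)

# Hypothetical low-dimensional features for one cluster under heavy contamination.
fused = fuse_features(0.5, [2.0, 4.0], [1.0], [3.0, 3.0])
# `fused` would then be passed to the trained multilayer perceptron classifier.
```

With a weight near 0, the optical parts of the fused vector shrink toward zero, so the classifier input is dominated by the depth-of-field features.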
[0056] In summary, synchronously acquiring multimodal data of coal gangue materials within the same field of view ensures strict spatial and temporal alignment of the visible light, hyperspectral, and 3D point cloud data, providing a foundation for subsequent accurate fusion and significantly improving recognition stability under complex working conditions (such as material flow and vibration). Clustering and segmenting the coal gangue materials using the 3D point cloud decomposes the mixture into independent units, avoiding misclassification caused by material adhesion or overlap. Furthermore, the quantitative analysis of occupied volume and center coordinates provides key geometric parameters for the subsequent adjacency relationship judgment and global feature weighting. The multimodal network makes full use of the complementary information of visible light (texture / color), hyperspectral (compositional spectrum), and 3D point cloud (spatial structure) to obtain the predicted probability distribution of coal and gangue under each data type. The divergence and probability values between these predicted probability distributions are then analyzed to determine the surface-to-interior prediction difference of each individual material cluster. This characterizes the degree of inconsistency between the surface optical signal and the internal geometric contour of the material, effectively improving recognition in complex scenarios such as gangue covered with coal slime or gangue similar in color to coal. Furthermore, the central coordinate adjacency analysis incorporates spatial context information.
By analyzing the adjacency relationships between individual material clusters and combining them with the surface-to-interior prediction difference of each cluster, the coal slime contamination confidence of each individual material cluster is calculated. This effectively distinguishes "isolated point cloud fragments" caused by hardware occlusion from genuine "coal slime contamination." By determining the global feature attenuation weight from the coal slime contamination confidence and the occupied volume of each individual material cluster, and using this weight for dynamic fusion classification of multimodal features, large-volume disguised gangue can be effectively prevented from mixing into clean coal under harsh working conditions where large-area coal slime contamination occurs on the conveyor belt, thus ensuring that the ash content of the washed clean coal meets the standards.
[0057] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0058] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.
Claims
1. A multimodal fusion identification method for coal gangue adapted to complex working conditions, characterized in that, The method includes: At each time point, acquire visible light images, hyperspectral images, and 3D point clouds of coal gangue materials within the same field of view; perform clustering and segmentation on the 3D point clouds to obtain individual material clusters; determine the occupied volume and center coordinates of each individual material cluster. A pre-trained multimodal network is used to analyze the data of the corresponding regions of each individual material cluster in visible light images, hyperspectral images, and 3D point clouds, outputting the predicted probability distribution of coal gangue under visible light, hyperspectral, and depth-of-field conditions. Based on the difference divergence and probability values between the predicted probability distributions of coal gangue, the prediction difference between the surface and interior of each individual material cluster is determined. Based on the center coordinates of all individual material clusters, the adjacency relationship between individual material clusters is analyzed, and based on this, combined with the prediction difference between the surface and interior of individual material clusters, the confidence level of coal slime pollution for each individual material cluster is determined. The global feature attenuation weight is determined based on the coal slime pollution confidence and occupied volume of each individual material cluster; at each time point, the global feature attenuation weight and the multimodal features of each individual material cluster are fused to determine the coal gangue classification result.
2. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 1, characterized in that, The method for obtaining the individual material clusters includes: The DBSCAN clustering algorithm is used to cluster and segment the 3D point cloud at each time step to obtain all 3D point cloud clusters. Each 3D point cloud cluster is treated as an individual material cluster. The neighborhood search radius and the minimum number of contained points are preset values.
3. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 1, characterized in that, Determining the occupied volume and center coordinates of each individual material cluster includes: At each time step, for each of the individual material clusters, a three-dimensional bounding box is generated that encloses its outermost edge; The occupied volume is calculated based on the dimensions of the three-dimensional bounding box, and the geometric center point of the three-dimensional bounding box is used as the center coordinates.
4. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 1, characterized in that, The method utilizes a pre-trained multimodal network to analyze the data of each individual material cluster in the corresponding region of visible light images, hyperspectral images, and 3D point clouds, outputting the predicted probability distribution of coal gangue under visible light, hyperspectral, and depth-of-field conditions, including: Based on a pre-calibrated extrinsic matrix, the visible light image and hyperspectral image are subjected to perspective projection, and the pixels and spectral curves are mapped to the corresponding three-dimensional point cloud coordinates, thereby determining the visible light image slice, hyperspectral sequence slice and three-dimensional point cloud cluster corresponding to each individual material cluster. The visible light image slices, hyperspectral sequence slices, and three-dimensional point cloud clusters are respectively input into the pre-trained visible light network, hyperspectral network, and point cloud network for forward inference to obtain high-dimensional features under visible light, hyperspectral, and depth conditions. The values in each of the high-dimensional features are normalized, and the corresponding two-dimensional numerical vectors are output as the predicted probability distributions of coal and gangue under visible light, hyperspectral, and depth-of-field conditions, respectively. The values in the two-dimensional numerical vectors are the coal probability and the gangue probability, respectively.
5. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 4, characterized in that, The method for obtaining the surface-to-interior prediction difference includes: At each time point, the divergence between the visible light coal gangue prediction probability distribution, the hyperspectral coal gangue prediction probability distribution, and the depth-of-field coal gangue prediction probability distribution for each individual material cluster is analyzed to determine the prediction difference between any two coal gangue prediction probability distributions. The coal probability in the hyperspectral coal gangue prediction probability distribution is extracted as the hyperspectral coal judgment probability. The product of the negative correlation mapping of the prediction difference between visible light and hyperspectral, the prediction difference between hyperspectral and depth of field, and the hyperspectral coal judgment probability is used as the surface-to-interior prediction difference for each individual material cluster.
6. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 5, characterized in that, Determining the prediction difference between any two coal gangue prediction probability distributions includes: At each time point, in the visible light prediction probability distribution, hyperspectral prediction probability distribution, and depth-of-field prediction probability distribution of each individual material cluster, the information entropy distance between each pair is calculated as the prediction difference between each mode, wherein the information entropy distance is the Jensen-Shannon divergence based on the logarithm with a base of 2.
7. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 1, characterized in that, The method for obtaining the confidence level of coal slime pollution includes: Based on the center coordinates of all individual material clusters, the adjacency relationship between individual material clusters is analyzed. On this basis, the numerical characteristics of the surface-to-interior prediction differences of individual material clusters within a local area are analyzed to determine the point cloud missing isolation degree of each individual material cluster. The difference between the point cloud isolation degree of the individual material cluster and the preset isolation degree judgment threshold is calculated. If it is positive, the product of the difference and the preset noise attenuation coefficient is subjected to negative correlation mapping and normalization, and the result is used as the attenuation factor. Otherwise, the preset value is used as the attenuation factor. The product of the attenuation factor corresponding to the individual material cluster and its surface-to-interior prediction difference is used as the confidence level of the coal slime pollution of the individual material cluster. If the point cloud isolation degree of the individual material cluster is assigned the preset upper limit value, then its coal slime pollution confidence level is directly set to 0.
8. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 7, characterized in that, The method for obtaining the isolation degree of the missing point cloud includes: Extract the x and y coordinate nodes of the center coordinates of each individual material cluster on the two-dimensional projection plane of the x and y axes; The two-dimensional Delaunay triangulation algorithm is used to connect the horizontal and vertical coordinate nodes of all individual material clusters to generate a non-intersecting material surface adjacency graph. The numerical difference between any two connected individual material clusters on the z-axis in the adjacency graph is less than a preset height threshold. For any single material cluster, neighboring single material clusters that are directly connected to it are identified in the adjacency graph of the material surface. The average of the predicted difference between the surface and interior of all neighboring single material clusters is calculated as the mean difference. The predicted difference between the surface and interior of the single material cluster is used as the numerator, and the sum of the mean difference and a preset minimum positive number is used as the denominator. The resulting ratio is used as the point cloud isolation degree of the single material cluster. If the single material cluster does not have any directly connected neighboring single material clusters, its point cloud isolation degree is set to a preset upper limit value.
9. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 1, characterized in that, The method for obtaining the global feature attenuation weight includes: The volume occupied by each individual material cluster in the field of view at each time step is multiplied by its corresponding coal slime pollution confidence level and then summed to obtain the total volume of coal slime encapsulation. The difference between 1 and the coal slime pollution confidence level of each individual material cluster is taken as the safety probability of each individual material cluster. The occupied volume of each individual material cluster is multiplied by its corresponding safety probability and then summed to obtain the total volume of normal material. The sum of the total volume of normal material, the total volume of coal slime encapsulation, and a preset constant is used as the denominator, and the total volume of normal material is used as the numerator. The resulting ratio is used as the global feature attenuation weight in the field of view at each time step.
10. The multimodal fusion identification method for coal gangue adapted to complex working conditions according to claim 4, characterized in that, At each time step, the global feature attenuation weights and the multimodal features of each individual material cluster are fused to determine the coal gangue classification result, including: For each individual material cluster, we extract the visible light high-dimensional features, hyperspectral high-dimensional features, and depth-of-field high-dimensional features generated during the forward inference process of the multimodal network. At each time point, the global feature attenuation weight is multiplied by the visible light high-dimensional feature vector and the hyperspectral high-dimensional feature vector of each individual material cluster to obtain the visible light feature array and the hyperspectral feature array. The visible light feature array, the hyperspectral feature array, and the depth high-dimensional feature are concatenated to obtain the comprehensive feature vector after channel fusion. The comprehensive feature vector is input into a multilayer perceptron classifier to perform forward inference, and the specific label of each individual material cluster is output as the final coal gangue classification result. The specific label is divided into coal label or gangue label.