Zinc rougher cell underflow grade prediction method, device, equipment and storage medium

By combining foam images and process parameters through a dynamic cross-attention fusion module, the method addresses the accuracy and lag problems of underflow grade prediction in zinc roughing flotation cells, supporting stable real-time control and improving the adaptability and economy of the flotation process.

CN122090189B · Active · Publication Date: 2026-06-26 · SHENZHEN ZHONGJIN LINGNAN NONFEMET COMPANY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHENZHEN ZHONGJIN LINGNAN NONFEMET COMPANY
Filing Date: 2026-04-27
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing techniques for predicting the underflow grade of zinc roughing flotation cells suffer from insufficient accuracy, measurement lag, and strong subjectivity, which prevents real-time, accurate control of the flotation process.

Method used

A dynamic cross-attention fusion module combines global and local features of foam images with the process parameters of the flotation process. Multi-dimensional features are extracted by a Transformer and a convolutional neural network and fused adaptively to achieve high-precision underflow grade prediction.

Benefits of technology

It significantly improves the accuracy and stability of underflow grade prediction, maintains high adaptability to fluctuations in ore properties and changes in process conditions, supports real-time optimization control of the flotation process, and reduces production costs and reagent consumption.


Abstract

The present application provides a method, device, equipment and storage medium for predicting the underflow grade of a zinc roughing flotation cell. By fusing the appearance features and deep fusion features of the foam image with the process parameters of the flotation process, and by using a dynamic cross-attention mechanism for adaptive fusion of multi-source heterogeneous features, the method achieves high-precision regression prediction of grade. It makes comprehensive, synergistic use of the visual information in the foam image and the operating-condition information of the process, significantly improving the accuracy and stability of prediction. The present application can more completely represent the inherent laws of grade change under complex flotation conditions and effectively overcomes the insufficient information utilization caused by simple concatenation or independent input. Further, the dynamic feature interaction and fusion capability of the method gives it stronger adaptability and generalization when facing fluctuations in ore properties, adjustments to process conditions, or changes in equipment state.

Description

Technical Field

[0001] This invention relates to the field of mineral processing technology, and in particular to a method, apparatus, equipment and storage medium for predicting the underflow grade of a zinc roughing flotation cell.

Background Technology

[0002] Froth flotation is a key mineral processing step for separating valuable metals such as zinc from impurities. In this process, the underflow grade of the zinc rougher is a core indicator that reflects flotation efficiency and guides reagent addition and process control. Industrial sites currently monitor the underflow grade in two main ways. The first is offline or online detection with X-ray fluorescence analyzers. This method is accurate, but the equipment is expensive, maintenance is complex, and the results lag significantly (usually by more than ten minutes), so it cannot meet the needs of real-time control. The second is experience-based judgment by operators who observe the appearance of the foam (such as its color, bubble size, and fluidity). This method is highly subjective, lacks quantitative standards, and struggles to cope with changes caused by fluctuations in ore properties.

[0003] With the development of machine vision and artificial intelligence, "soft measurement" (soft-sensor) methods that predict grade from foam images have become a research hotspot. Existing methods either rely on deep learning models (such as convolutional neural networks or Transformers) to extract features from images automatically, or use image processing techniques to extract hand-crafted appearance features such as bubble size, texture, color, and velocity, and then establish a mapping to grade. However, these methods have clear limitations in feature extraction and fusion. First, most models attend to a single scale: they either focus on local texture details while ignoring the global semantics of the operating condition, or model only the global context while losing the microstructural information that is crucial for judging grade, so complex flotation states are represented incompletely. Second, when fusing multi-source features (such as deep visual features and hand-crafted appearance features), they usually use simple concatenation or static weighting, which does not account for the "semantic gap" between different features and cannot achieve dynamic synergy and complementarity between them, limiting the accuracy of the prediction model and its adaptability to changing operating conditions.

[0004] Therefore, there is an urgent need for a grade prediction method that can fully explore and synergistically utilize multi-dimensional and multi-scale information from flotation images and achieve dynamic adaptive fusion, so as to overcome the shortcomings of existing technologies and provide a reliable basis for real-time and precise control of the flotation process.

Summary of the Invention

[0005] This invention provides a method, apparatus, equipment, and storage medium for predicting the underflow grade of a zinc roughing flotation cell, which addresses the deficiencies in the prior art and significantly improves the accuracy and stability of underflow grade prediction.

[0006] This invention provides a method for predicting the underflow grade of a zinc roughing flotation cell, comprising:

[0007] Acquiring real-time foam images and corresponding real-time process parameters of the target zinc roughing cell; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0008] Extracting appearance features from the real-time foam image to obtain an appearance feature vector;

[0009] Performing fused extraction of global and local features on the real-time foam image to obtain deep fusion features;

[0010] Converting the appearance feature vector and the process parameters into an appearance feature map and a process parameter feature map, respectively, each aligned with the spatial size of the deep fusion feature;

[0011] Using the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as the respective guiding branch features, inputting them into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature;

[0012] Fusing the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature;

[0013] Performing regression prediction based on the multi-source fusion feature to output the predicted underflow grade of the target zinc roughing cell.

[0014] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the step of fusing and extracting global and local features from the real-time foam image to obtain deep fused features specifically includes:

[0015] Global semantic features of the real-time foam image are extracted using a Transformer network;

[0016] The local structural features of the real-time foam image are extracted by a convolutional neural network. During the extraction process, the extracted texture features are used as a guiding signal to dynamically modulate and enhance the local features, resulting in texture-enhanced local features.

[0017] The global semantic features and the texture-enhanced local features are fused using a global-local bidirectional dynamic cross-attention process to obtain the deep fused features.

[0018] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the texture features are extracted in the following manner:

[0019] The real-time foam image is converted to a grayscale foam image;

[0020] A local gray-level co-occurrence matrix is constructed for the grayscale foam image, and the homogeneity, contrast, energy and correlation indices are calculated from the local gray-level co-occurrence matrix to form a texture feature map.

[0021] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the step of extracting the global semantic features of the real-time foam image through a Transformer network specifically includes:

[0022] The real-time foam image is segmented into a sequence of image patches;

[0023] The image patch sequence is processed by a multi-layer Transformer encoder that incorporates a sparse self-attention mechanism to extract global semantic features.

[0024] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the step of using the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as the respective guiding branch features, and inputting them into a dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature, specifically includes:

[0025] The deep fusion features are aligned with the appearance feature map in both spatial and channel dimensions;

[0026] Using the aligned deep fusion features as queries and the aligned appearance feature map as keys and values, cross-attention weights are calculated and used for weighted aggregation of the values to obtain appearance-guided intermediate features;

[0027] The appearance-guided intermediate features are fused with the deep fusion features to output the appearance-guided enhancement features;

[0028] The deep fusion features are aligned with the process parameter feature map in both spatial and channel dimensions;

[0029] Using the aligned deep fusion features as queries and the aligned process parameter feature map as keys and values, cross-attention weights are calculated and used for weighted aggregation of the values to obtain process parameter-guided intermediate features;

[0030] The process parameter-guided intermediate features are fused with the deep fusion features to output the process parameter-guided enhancement features.

[0031] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the step of fusing the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature specifically includes:

[0032] The appearance-guided enhancement features, the deep fusion features, and the process parameter-guided enhancement features are aggregated along the feature dimension to obtain aggregated multi-source features.

[0033] The aggregated multi-source features are subjected to feature compression and channel integration to obtain the multi-source fused features.

[0034] According to the zinc roughing flotation cell underflow grade prediction method provided by the present invention, the step of performing regression prediction based on the multi-source fusion features and outputting the predicted underflow grade of the target zinc roughing cell specifically includes:

[0035] The multi-source fusion features are subjected to global spatial pooling to obtain a compact feature vector;

[0036] The compact feature vector is input into a multilayer perceptron regression head, which outputs the predicted underflow grade.

[0037] The present invention also provides a zinc roughing flotation cell underflow grade prediction device, comprising:

[0038] The data acquisition module is used to acquire real-time foam images of the target zinc roughing cell and the corresponding real-time process parameters; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0039] The appearance feature module is used to extract appearance features from the real-time foam image to obtain an appearance feature vector.

[0040] The feature fusion module is used to extract global and local features from the real-time foam image to obtain deep fusion features.

[0041] The feature map module is used to convert the appearance feature vector and the process parameters into an appearance feature map and a process parameter feature map, respectively, each aligned with the spatial size of the deep fusion feature.

[0042] The feature enhancement module is used to use the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as the guiding branch features, respectively, and input them into the dynamic cross-attention fusion module for fusion to obtain appearance-guided enhancement features and process parameter-guided enhancement features.

[0043] The multi-source fusion module is used to fuse the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain the multi-source fusion feature;

[0044] The regression prediction module is used to perform regression prediction based on the multi-source fusion features and output the predicted underflow grade of the target zinc roughing cell.

[0045] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the zinc roughing flotation cell underflow grade prediction method as described above.

[0046] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the zinc roughing flotation cell underflow grade prediction method as described above.

[0047] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the zinc roughing flotation cell underflow grade prediction method as described above.

[0048] The present invention provides a method, apparatus, equipment, and storage medium for predicting the underflow grade of a zinc roughing flotation cell. By fusing the appearance features and deep fusion features of foam images with the process parameters of the flotation process, and by employing a dynamic cross-attention mechanism for adaptive fusion of multi-source heterogeneous features, it achieves high-precision regression prediction of grade. The method makes comprehensive, synergistic use of the visual information in the foam images and the operating conditions of the process, significantly improving the accuracy and stability of the prediction. Compared with existing methods that rely on a single visual feature or ignore process parameters, this invention can more completely characterize the inherent laws governing grade changes under complex flotation conditions, effectively overcoming the insufficient information utilization caused by simple concatenation or independent input. Furthermore, the dynamic feature interaction and fusion capabilities of the method give it stronger adaptability and generalization when facing fluctuations in ore properties, adjustments in process conditions, or changes in equipment status. Ultimately, the method lays a reliable technical foundation for real-time, closed-loop optimized control of the flotation process and is expected to replace lagging offline detection and subjective manual judgment, giving it significant industrial value in improving metal recovery and reducing reagent consumption and production costs.

Description of the Drawings

[0049] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0050] Figure 1 is a flowchart of the zinc roughing flotation cell underflow grade prediction method provided by the present invention;

[0051] Figure 2 is a schematic diagram of the neural network used in the zinc roughing flotation cell underflow grade prediction method provided by the present invention;

[0052] Figure 3 is a schematic diagram of the cross-fusion module of the zinc roughing flotation cell underflow grade prediction method provided by the present invention;

[0053] Figure 4 is a schematic diagram of the attention fusion module of the zinc roughing flotation cell underflow grade prediction method provided by the present invention;

[0054] Figure 5 is a schematic diagram of the zinc roughing flotation cell underflow grade prediction device provided by the present invention;

[0055] Figure 6 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Description

[0056] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0057] To address the problems in existing technologies, this invention proposes a method for predicting the underflow grade of a zinc roughing flotation cell that significantly improves the accuracy and stability of underflow grade prediction. As shown in Figures 1 and 2, the method includes, but is not limited to, the following steps:

[0058] Step 110: Obtain real-time foam images of the target zinc roughing cell and the corresponding real-time process parameters; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate.

[0059] In step 110, an industrial camera deployed above the zinc roughing cell continuously captures a video stream of the foam on the cell surface at a fixed sampling frequency (e.g., 1-5 frames per second). The video stream is transmitted to an industrial computer or edge computing device via an image acquisition card (frame grabber). During prediction, a clear foam image is extracted from the real-time video stream as the real-time foam image. This image can undergo standardized preprocessing, such as resizing to a fixed resolution and normalization, before being input into the model.

[0060] At the same time, process parameters are collected synchronously with the foam images using sensors or process control systems at the flotation site. These process parameters include, but are not limited to, at least one of the following: slurry concentration, reagent dosage (such as collector and frother dosage), aeration rate, liquid level, and flow rate. They reflect the equipment and operating conditions of the flotation process and have a significant impact on the underflow grade. The collected process parameters need to be time-aligned with the foam images so that each frame corresponds to a set of synchronized process parameter data.
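
By way of non-limiting illustration, the time alignment described above could be sketched as a nearest-timestamp match. The function name `align_parameters`, the tolerance `max_gap`, and the array layout are assumptions made for this sketch and do not appear in the source.

```python
import numpy as np

def align_parameters(frame_times, param_times, param_values, max_gap=2.0):
    """For each frame timestamp, pick the nearest process-parameter sample.

    frame_times:  (F,) frame timestamps in seconds
    param_times:  (S,) sorted sensor timestamps in seconds, S >= 2
    param_values: (S, K) one row of K process parameters per sensor sample
    max_gap:      hypothetical tolerance in seconds; frames farther than this
                  from every sensor sample are flagged as unmatched
    """
    frame_times = np.asarray(frame_times, dtype=float)
    param_times = np.asarray(param_times, dtype=float)
    param_values = np.asarray(param_values, dtype=float)

    # Index of the first sensor sample at or after each frame time,
    # clipped so that both a left and a right neighbour always exist.
    right = np.clip(np.searchsorted(param_times, frame_times),
                    1, len(param_times) - 1)
    left = right - 1

    # Choose whichever neighbour is closer in time.
    use_left = (frame_times - param_times[left]) <= (param_times[right] - frame_times)
    nearest = np.where(use_left, left, right)

    gap = np.abs(param_times[nearest] - frame_times)
    return param_values[nearest], gap <= max_gap
```

Frames whose flag is `False` have no sensor sample within the tolerance and could be dropped or handled separately; the appropriate tolerance depends on the sensor sampling rate at the site.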

[0061] To construct a dataset for training and testing, it is also necessary to synchronously record the true value of the underflow grade corresponding to each frame of foam image and its corresponding process parameters, which is measured offline by an X-ray fluorescence analyzer (XRF), thereby establishing a multi-source dataset containing the "image-process parameter-grade" triplet.

[0062] Step 120: Extract appearance features from the real-time foam image to obtain appearance feature vectors.

[0063] In step 120, multi-dimensional hand-crafted features are calculated from the preprocessed foam image to obtain an appearance feature vector with clear physical meaning. Specifically, classic algorithms from image processing and computer vision are used to extract the following types of appearance features from the foam image:

[0064] Texture features: reflect the roughness, regularity, and contrast of the foam surface, and can be quantified using methods such as the gray-level co-occurrence matrix and local binary patterns.

[0065] Bubble size features: individual bubble regions are identified through image segmentation, and statistics of their area distribution are computed, including average area and area variance.

[0066] Color features: the color mean, variance, and ratios between channels are calculated over the bubble regions in RGB or HSV color space.

[0067] Motion features: based on consecutive frames, the velocity field of the foam is calculated using optical flow or block matching, and the mean velocity, velocity variance, and stability index are computed.

[0068] The above features are concatenated or combined to form a one-dimensional appearance feature vector, which is used for subsequent fusion with the deep features.
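
As a non-limiting sketch of the texture features, the following computes contrast, homogeneity, energy, and correlation from a gray-level co-occurrence matrix for one grayscale window. The quantization to 8 levels, the single horizontal offset, and the name `glcm_features` are assumptions of this sketch. Energy is computed here as the sum of squared probabilities; some libraries report the square root of that quantity instead.

```python
import numpy as np

def glcm_features(gray, levels=8):
    """Texture statistics from a symmetric, normalised GLCM.

    gray:   2-D array of 8-bit intensities (0..255), at least 2 columns wide
    levels: number of quantised gray levels (assumed value)
    Uses a single offset: each pixel paired with its right-hand neighbour.
    """
    gray = np.asarray(gray, dtype=float)
    q = np.clip((gray / 256.0 * levels).astype(int), 0, levels - 1)

    # Count co-occurrences of (pixel, right neighbour) level pairs.
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1.0)
    glcm = glcm + glcm.T          # make the matrix symmetric
    p = glcm / glcm.sum()         # joint probabilities

    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sum(p ** 2)

    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i * sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0         # convention chosen here for a flat window

    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}
```

Applying the function to a sliding window over the grayscale image would give one value per window position for each statistic, which is one way a texture feature map could be assembled.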

[0069] Step 130: Extract global and local features from the real-time foam image to obtain deep fusion features.

[0070] Step 130 aims to use a deep learning model to automatically extract deep features at different scales from the foam image and fuse them to obtain a more expressive deep representation.

[0071] First, the foam image is processed by a global feature extraction network (e.g., a Transformer-based encoder) to extract global semantic features. This network models the dependencies between distant pixels in the image through a self-attention mechanism, enabling it to capture the overall structural morphology, macroscopic distribution pattern, and illumination trend of the foam.

[0072] Second, the same foam image is processed by a local feature extraction network (e.g., an encoder built on a convolutional neural network) to extract local structural features. By stacking convolutional and pooling layers, this network captures microscopic information such as the edges, corners, and texture details of the bubbles.

[0073] During local feature extraction, this invention introduces texture features as a guiding signal to dynamically modulate and enhance the local features. Specifically, the texture feature map extracted from the image (contrast, homogeneity, and so on) is mapped and then interacts with the local features element by element, so that the network pays more attention to the texture detail regions that strongly influence the grade judgment, thereby obtaining texture-enhanced local features.
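
The source does not specify how the texture map is mapped before the element-wise interaction. One possible reading, given purely as an assumed sketch, is a sigmoid gate obtained from a per-pixel linear map of the texture channels; the names `texture_modulate`, `w`, and `b` are hypothetical.

```python
import numpy as np

def texture_modulate(local_feat, texture_map, w, b):
    """Gate local CNN features with a texture feature map (assumed form).

    local_feat:  (C, H, W) local feature map
    texture_map: (T, H, W) texture channels, e.g. contrast and homogeneity
    w, b:        (C, T) weights and (C,) bias of a per-pixel linear map
                 (equivalent to a 1x1 convolution); learnable in practice
    """
    # Map the T texture channels to C gate channels at every pixel.
    gate = np.einsum("ct,thw->chw", w, texture_map) + b[:, None, None]
    gate = 1.0 / (1.0 + np.exp(-gate))        # sigmoid, values in (0, 1)
    # Residual-style modulation: features are kept and scaled up where
    # the gate is high, never zeroed out.
    return local_feat * (1.0 + gate)
```

The residual form `1 + gate` is a design choice of this sketch so that regions with weak texture response still pass their original features through.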

[0074] Finally, the global semantic features and the texture-enhanced local features are fused using a global-local bidirectional dynamic cross-attention process. This fusion process, through an attention mechanism, allows global and local features to guide and refine each other: on the one hand, global features provide macro-contextual information to local features; on the other hand, local features supplement global features with micro-detail information. After multiple rounds of interaction and aggregation, a deep fused feature containing both macro- and micro-information is obtained.
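
A minimal sketch of such a bidirectional exchange follows, assuming both feature sets have been flattened to token sequences of equal width. Learned projections, normalization layers, and the final aggregation are omitted, and the names `cross_attend` and `bidirectional_fusion` and the round count are assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_tokens, context_tokens):
    """Scaled dot-product attention of one token set over another.

    query_tokens: (Nq, D), context_tokens: (Nc, D); returns (Nq, D).
    """
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    return softmax(scores) @ context_tokens

def bidirectional_fusion(global_tokens, local_tokens, rounds=2):
    """Let global and local tokens refine each other for several rounds."""
    g, l = global_tokens, local_tokens
    for _ in range(rounds):
        # Both updates read the previous round's tokens, so neither
        # direction sees the other's update within the same round.
        g_new = g + cross_attend(g, l)   # local detail flows into global
        l_new = l + cross_attend(l, g)   # global context flows into local
        g, l = g_new, l_new
    return g, l
```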

[0075] Step 140: Convert the appearance feature vector and the process parameters into an appearance feature map and a process parameter feature map, respectively, each aligned with the spatial size of the deep fusion feature.

[0076] In step 140, because the appearance feature vector obtained in step 120 is a one-dimensional vector, while the deep fusion feature obtained in step 130 is a multi-channel three-dimensional feature map (with height, width, and channel dimensions), the two cannot interact directly at the pixel or region level. Similarly, the process parameters form a one-dimensional vector and need to be converted to a format compatible with the deep features.

[0077] To this end, this step first projects the one-dimensional appearance feature vector onto the same channel dimension as the deep fusion feature through a learnable linear mapping (such as a fully connected layer). Then, through a spatial broadcast operation, the vector is copied in the spatial height and width dimensions to generate an appearance feature map with the same spatial size and channel dimension as the deep fusion feature.

[0078] Similarly, the one-dimensional process parameter vector is transformed into a process parameter feature map aligned with the spatial dimensions of the deep fusion feature map through a similar linear mapping and spatial broadcasting. After this transformation, both the appearance features and the process parameter features acquire the same spatial structure as the deep features, facilitating subsequent position-by-position feature interactions.
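
The linear mapping followed by spatial broadcast can be sketched as follows. The channel-first layout and the name `vector_to_feature_map` are assumptions of this sketch; in practice `weight` and `bias` would be learned.

```python
import numpy as np

def vector_to_feature_map(vec, weight, bias, height, width):
    """Project a 1-D vector to C channels and copy it to every location.

    vec:    (K,) appearance feature vector or process parameter vector
    weight: (C, K) and bias: (C,) of a fully connected layer
    Returns a (C, height, width) feature map.
    """
    projected = weight @ vec + bias                      # (C,)
    fmap = np.broadcast_to(projected[:, None, None],
                           (projected.shape[0], height, width))
    return fmap.copy()   # materialise so the map can be modified later
```

The same function serves both inputs, with separate weights for the appearance vector and the process parameter vector.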

[0079] Step 150: Using the deep fusion feature as the main branch feature, and the appearance feature map and the process parameter feature map as guiding branch features respectively, input them into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature.

[0080] Step 150 aims to calibrate and enhance the deep fusion features using the explicit physical information contained in the appearance features and process parameters.

[0081] Specifically, the deep fusion feature obtained in step 130 is used as the main branch feature (i.e., the target to be enhanced), and the appearance feature map generated in step 140 is used as the first guiding branch feature (i.e., the source of guiding information). Both are input into a dynamic cross-attention fusion module. In this module, cross-attention is calculated using the main branch feature as the query and the guiding branch feature as the key and value. This allows each spatial location in the deep fusion feature to aggregate information from the appearance feature map according to its correlation with the locations in that map, thereby achieving guided enhancement of the deep fusion feature. The module output is the appearance-guided enhancement feature.

[0082] Similarly, the same deep fusion feature is used as the main branch feature and the process parameter feature map as the second guiding branch feature, and both are input into a dynamic cross-attention fusion module (which can share its structure with the module above but has independent parameters). Cross-attention is calculated to obtain the process parameter-guided enhancement feature. This feature incorporates process condition information such as slurry concentration, reagent dosage, and aeration rate into the deep fusion feature, further enhancing its ability to characterize the flotation state.

[0083] The two fusion processes described above can be executed in parallel or sequentially.
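
One guided branch could be sketched as below, assuming the two maps are already aligned to the same shape. The projection matrices `wq`, `wk`, `wv`, the residual form of the final fusion, and the name `guided_cross_attention` are assumptions of this sketch; the source only states that the intermediate features are fused with the deep fusion features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_cross_attention(main, guide, wq, wk, wv):
    """Enhance the main-branch map with information from a guide map.

    main, guide: (C, H, W) aligned feature maps
    wq, wk, wv:  (C, C) projection matrices (learnable in practice)
    """
    c, h, w = main.shape
    m = main.reshape(c, h * w).T          # (HW, C), one row per location
    g = guide.reshape(c, h * w).T

    q, k, v = m @ wq, g @ wk, g @ wv      # query from main, key/value from guide
    attn = softmax(q @ k.T / np.sqrt(c))  # (HW, HW) cross-attention weights
    intermediate = attn @ v               # guided intermediate features

    enhanced = m + intermediate           # residual fusion (assumed)
    return enhanced.T.reshape(c, h, w)
```

Calling the function once with the appearance feature map and once with the process parameter feature map, each with its own projection matrices, yields the two enhanced features. In this simplified sketch, a guide map that is identical at every location produces uniform attention weights, so the guide contributes the same projected vector everywhere.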

[0084] Step 160: Fuse the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature.

[0085] In step 160, following step 150, three sets of features are available: the original deep fusion features, the features enhanced by appearance features, and the features enhanced by process parameters. These three sets of features reflect the state of the foam image and the flotation process from different perspectives.

[0086] In this step, the three sets of features are aggregated along the feature dimension (i.e., the channel dimension), for example, by channel concatenation, to form a multi-source feature tensor. This tensor contains the image's deep semantic information, appearance statistics, and process condition information, providing more comprehensive information.

[0087] To reduce the computational complexity of subsequent regression predictions and extract the most representative information, the aggregated multi-source features are further compressed and channel integrated. For example, the number of channels is reduced by pointwise convolution (such as 1×1 convolution), while information from different channels is fused to finally output a compact multi-source fused feature.
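
Channel concatenation followed by a 1×1 convolution can be sketched as a per-pixel linear map over the stacked channels. The channel-first layout and the name `fuse_multi_source` are assumptions of this sketch.

```python
import numpy as np

def fuse_multi_source(appearance_enh, deep, process_enh, weight, bias):
    """Concatenate three (C, H, W) maps and compress them with a 1x1 conv.

    weight: (C_out, 3*C) and bias: (C_out,); learnable in practice.
    Returns a (C_out, H, W) multi-source fused feature.
    """
    # Aggregate along the channel dimension: (3C, H, W).
    stacked = np.concatenate([appearance_enh, deep, process_enh], axis=0)
    # A 1x1 convolution applies the same linear map at every pixel.
    return np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
```

Choosing `C_out` smaller than `3*C` gives the channel reduction described above.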

[0088] Step 170: Perform regression prediction based on the multi-source fusion features and output the predicted underflow grade of the target zinc roughing cell.

[0089] Step 170 is the final output stage of the method and maps the multi-source fusion features to a single predicted grade value.

[0090] First, the multi-source fusion features obtained in step 160 are subjected to global spatial pooling. A common method is global average pooling, which calculates the average value of each channel across all spatial locations, thereby compressing the three-dimensional feature map into a compact one-dimensional feature vector. This vector summarizes the global information of the entire feature map.

[0091] Then, the compact feature vector is input into a regression prediction model. This model typically employs a multilayer perceptron (MLP) structure, consisting of several stacked fully connected layers with nonlinear activation functions between them. The output layer of the regression model has one node, and its output value is the predicted underflow grade of the target zinc roughing cell.
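
A minimal sketch of this output stage follows, assuming a single hidden layer with ReLU activation; the depth, activation, and the name `regression_head` are assumptions of this sketch.

```python
import numpy as np

def regression_head(fused, w1, b1, w2, b2):
    """Global average pooling followed by a two-layer MLP.

    fused:  (C, H, W) multi-source fused feature
    w1, b1: (Hid, C) weights and (Hid,) bias of the hidden layer
    w2, b2: (Hid,) weights and scalar bias of the single output node
    Returns the predicted underflow grade as a float.
    """
    pooled = fused.mean(axis=(1, 2))             # (C,) compact vector
    hidden = np.maximum(0.0, w1 @ pooled + b1)   # ReLU hidden layer
    return float(w2 @ hidden + b2)
```

For example, with an all-ones 4-channel input, identity hidden weights, output weights of 0.25 each, and an output bias of 0.5, the head returns 1.5.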

[0092] In practical applications, this predicted value can be displayed in real time on the operation interface, or input as a feedback signal to the automatic control system of the flotation process to guide operations such as reagent addition and liquid level adjustment.

[0093] As a further optional embodiment, the step of fusing and extracting global and local features from the real-time foam image to obtain deep fused features specifically includes:

[0094] Global semantic features of the real-time foam image are extracted using a Transformer network;

[0095] The local structural features of the real-time foam image are extracted by a convolutional neural network. During the extraction process, the extracted texture features are used as a guiding signal to dynamically modulate and enhance the local features, resulting in texture-enhanced local features.

[0096] The global semantic features and the texture-enhanced local features are fused using a global-local bidirectional dynamic cross-attention process to obtain the deep fused features.

[0097] Referring to Figure 3, in this embodiment the global feature extraction branch receives a preprocessed real-time foam image as input. First, the image of size H x W x 3 is segmented into N non-overlapping image blocks of size P x P. Each image block is mapped to a feature vector of dimension D through a linear projection layer, forming a feature sequence X_p of length N. Subsequently, a learnable classification token ([CLS] token) is prepended to the sequence. The sequence is then fed into a network of L_g stacked Sparse-T Block Transformer encoder layers (e.g., L_g = 6). Each encoder layer contains a sparse self-attention mechanism and a feedforward neural network. The self-attention calculation employs a sparsity strategy (e.g., based on Locality Sensitive Hashing, LSH) to reduce computational complexity. The core computation process can be represented as follows:

[0098]

[0099] ;

[0100] Here, one input is the feature to be enhanced and the other is the guiding feature; the query, key, and value are first generated using three independent OD convolutions. Through multi-layer encoding, the model can effectively model long-range dependencies between image patches, thereby capturing global contextual information such as the overall structural morphology, macroscopic distribution pattern, and illumination trend of the foam. Finally, either the feature vector corresponding to the [CLS] token output by the last encoder layer is taken, or global average pooling is performed over all image patch features, to obtain the global semantic features, denoted F_g.

[0101] Meanwhile, the local feature extraction branch processes the same foam image in parallel. The main body of this branch is a lightweight convolutional neural network (CNN) built on mobile inverted bottleneck convolution (MBConv) blocks. The network first applies a standard 3x3 convolutional layer to the input image I for preliminary feature extraction, yielding an initial feature map.

[0102] Subsequently, the feature map is passed sequentially through N MBConv blocks (e.g., N = 4), which perform depthwise separable convolution, channel expansion and compression, and related operations to gradually extract and abstract the local structural information of the image (such as edges, corners, and bubble boundaries), yielding intermediate local features.

[0103] To enhance the micro-texture information sensitive to the flotation state, this invention introduces a texture feature-guided modulation mechanism during the local feature extraction process. Specifically, the four types of texture feature maps (homogeneity, contrast, energy, and correlation) obtained from step S120 are concatenated and then subjected to channel transformation through a 1x1 convolutional layer to generate a texture guidance map that matches the size and number of channels of the intermediate local feature space.

[0104] Subsequently, the texture guidance map T_g is used as a spatially adaptive gating signal and multiplied element-wise with the intermediate local feature F_l_mid (before the multiplication, T_g may first be passed through a sigmoid activation) to enhance the texture perception of the local features. The enhancement calculation can be represented as follows:

[0105] $F_l = F_{l\_mid} \odot \mathrm{Sigmoid}(T_g)$;
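The gating described above can be illustrated with a minimal sketch. It assumes the gate is simply the sigmoid of the texture guidance map multiplied element-wise into the intermediate local feature; the function names `sigmoid` and `texture_gate` and the single-channel nested-list layout are hypothetical choices for illustration, not the implementation of this embodiment.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def texture_gate(local_feat, texture_guide):
    """Element-wise F_l_mid * sigmoid(T_g) for two equally shaped 2-D maps.

    Both arguments are H x W nested lists (one channel shown for brevity;
    an assumption made only to keep the sketch small).
    """
    return [
        [f * sigmoid(t) for f, t in zip(f_row, t_row)]
        for f_row, t_row in zip(local_feat, texture_guide)
    ]


# A guidance value of 0 gives a gate of 0.5, halving the local response.
gated = texture_gate([[2.0, 4.0]], [[0.0, 0.0]])
```

Large positive guidance values leave the local response nearly unchanged, while large negative values suppress it.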

[0106] After the global semantic features F_g and the texture-enhanced local features F_l are obtained, the two representations need to be merged to generate a unified deep representation that combines macroscopic semantics with microscopic details. This invention achieves this goal through an innovative Global-Local Cross-Fusion Module (GLCF).

[0107] First, a size alignment mapping function (typically consisting of up / downsampling operations and 1x1 convolutions) maps F_g and F_l to a uniform spatial resolution and number of channels, yielding the aligned features F_g_align and F_l_align.

[0108] Referring to Figure 4, the core bidirectional dynamic cross-attention fusion process then begins. This process is completed within a dynamic cross-attention fusion module (DCAFM). Given a pair of input features (X1, X2), the DCAFM first uses three independent OD convolutional layers (dynamic convolutions) to generate the query (Q), key (K), and value (V), respectively. OD convolutions dynamically generate convolution kernel weights based on the input feature content, enhancing the model's adaptability.

[0109] The GLCF module constructs two symmetrical DCAFM paths between F_g_align and F_l_align to achieve dynamic cross-fusion of "global guidance of local" and "local refinement of global":

[0110] ;

[0111] The second round of cross-fusion takes the two first-round outputs as input and performs bidirectional DCAFM interaction again:

[0112] ;

[0113] Finally, the two second-round outputs are aggregated by a concatenation (C) operation and then passed through a linear mapping function to obtain the global-local deep fused features:

[0114] ;

[0115] As a further optional embodiment, the texture features are extracted in the following manner:

[0116] The real-time foam image is processed to obtain a grayscale foam image;

[0117] A local gray-level co-occurrence matrix is constructed for the foam grayscale image, and homogeneity, contrast, energy and correlation indices are calculated based on the local gray-level co-occurrence matrix to form a texture feature map.

[0118] In this embodiment, texture feature extraction follows a systematic process, aiming to quantify and characterize the microstructural information such as roughness, contrast, and regularity of the foam surface from the grayscale image. The specific steps are as follows:

[0119] First, the acquired real-time color image of the foam (usually in RGB format) is converted to grayscale. A weighted average method is used, assigning different weights to the R (red), G (green), and B (blue) channels based on the human eye's sensitivity to different colors (for example, using the classic formula grayscale value = 0.299*R + 0.587*G + 0.114*B). This converts the color information of each pixel into a single grayscale intensity value, yielding the foam grayscale image I_gray. This step simplifies subsequent calculations and focuses on the texture information reflected by changes in brightness.
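As a small illustration of the weighted-average conversion, the following sketch applies the 0.299 / 0.587 / 0.114 weights quoted above to every pixel. The function name `to_grayscale` and the nested-list image layout are assumptions made for the example.

```python
def to_grayscale(rgb_image):
    """Convert an H x W grid of (R, G, B) tuples into an H x W grid of gray values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# Tiny 1 x 3 example: a pure red, a pure green, and a pure blue pixel.
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
gray = to_grayscale(image)
```

Green contributes the most to the resulting intensity and blue the least, reflecting the sensitivity weighting described in the text.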

[0120] The gray-level co-occurrence matrix (GLCM) is a classic method for describing texture by studying the spatial repetition patterns of image gray levels. In this invention, a local sliding-window approach is applied to the foam grayscale image I_gray to construct a series of GLCMs that capture local texture properties in different regions of the image.

[0121] Parameter settings: Define key parameters. For example, the sliding window size s = 6 (i.e., analyze a 6x6 pixel local region); quantize the image's gray levels to N_g = 8 levels (compressing the original 0-255 gray level range into 8 levels, reducing computational complexity and enhancing robustness); define a set of displacement vectors, typically determined by the step size d and direction θ. A typical setup is: step size set d∈{1, 2, 3, 4}, direction set θ∈{0°, 45°, 90°, 135°} (corresponding to the horizontal, right diagonal, vertical, and left diagonal directions, respectively).

[0122] Matrix calculation: for each pixel of the grayscale image I_gray (taken as the window center, ignoring edge pixels), the frequency of pixel pairs satisfying a specific spatial relationship is counted within its s x s neighborhood window for each given (d, θ) pair. Specifically, the number of occurrences of pixel pairs that are d pixels apart in direction θ and have gray values i and j is counted, and this count is written into position (i, j) of the matrix P(i, j). Each (d, θ) combination generates an N_g x N_g co-occurrence matrix P_dθ. The matrix is usually normalized so that the sum of its elements is 1, representing a probability distribution.
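The parameter settings and counting procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions: pairs are counted in one direction only (a non-symmetric GLCM), the direction-to-offset table `OFFSETS` uses image rows increasing downward, and the names `quantize` and `glcm` are hypothetical.

```python
# Unit (row, column) offsets for each direction; rows increase downward (assumption).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(gray, levels=8, max_value=256):
    """Compress gray values in [0, max_value) into `levels` integer levels."""
    return [[min(int(v * levels / max_value), levels - 1) for v in row] for row in gray]


def glcm(window, d, theta, levels=8):
    """Normalized co-occurrence matrix of one quantized window for one (d, theta)."""
    dr, dc = OFFSETS[theta]
    dr, dc = dr * d, dc * d
    rows, cols = len(window), len(window[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:  # no valid pair fits inside the window
        return [[0.0] * levels for _ in range(levels)]
    return [[n / total for n in row] for row in counts]


# 3 x 3 toy window with 4 gray levels, horizontal neighbours at distance 1.
window = [[0, 0, 1], [0, 0, 1], [2, 2, 3]]
p = glcm(window, d=1, theta=0, levels=4)
```

In this toy window six horizontal pairs exist, two of which are (0, 0), so `p[0][0]` equals 2/6. In practice the window would be the s x s neighborhood of each pixel after quantization to N_g levels.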

[0123] Based on the constructed normalized GLCM P(i, j), a set of classic Haralick texture feature indices is calculated. This invention mainly extracts the following four types of features sensitive to the flotation foam state:

[0124] Homogeneity: Measures the uniformity of local regions in an image. A higher value indicates a more uniform and smoother texture. The formula for calculating homogeneity is as follows:

[0125] $\mathrm{Homogeneity} = \sum_{i}\sum_{j} \frac{P(i,j)}{1 + (i-j)^{2}}$;

[0126] Contrast: Reflects the sharpness and texture depth of an image. A higher value indicates stronger texture contrast and sharper edges. The formula for calculating contrast is as follows:

[0127] $\mathrm{Contrast} = \sum_{i}\sum_{j} (i-j)^{2}\, P(i,j)$;

[0128] Energy, also known as the angular second moment, measures the coarseness and regularity of image texture. A higher value indicates a simpler and more regular texture pattern. The formula for calculating energy is as follows:

[0129] $\mathrm{Energy} = \sum_{i}\sum_{j} P(i,j)^{2}$;

[0130] Correlation describes the linear dependence of gray levels in an image. It reflects the linear directionality of texture. The formula for calculating correlation is as follows:

[0131] $\mathrm{Correlation} = \sum_{i}\sum_{j} \frac{(i-\mu_i)(j-\mu_j)\, P(i,j)}{\sigma_i \sigma_j}$;

[0132] where μ_i and μ_j are the means of the rows and columns of the matrix, respectively, and σ_i and σ_j are the corresponding standard deviations.
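The four indices can be computed from a normalized GLCM as in the sketch below. It assumes the common textbook definitions: homogeneity as the inverse difference moment with denominator 1 + (i - j)^2, energy as the sum of squared entries, and correlation set to 0 when a standard deviation is zero (a convention chosen only for this example). The function name `haralick_features` is hypothetical.

```python
import math


def haralick_features(p):
    """Homogeneity, contrast, energy, and correlation of a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)  # mean over the row index
    mu_j = sum(j * v for _, j, v in cells)  # mean over the column index
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    sigma = math.sqrt(var_i * var_j)
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v ** 2 for _, _, v in cells),
        # Assumed convention: undefined correlation (flat window) is reported as 0.
        "correlation": covariance / sigma if sigma > 0 else 0.0,
    }


# All mass on the diagonal: neighbouring pixels always share the same gray level.
smooth = haralick_features([[0.5, 0.0], [0.0, 0.5]])
# All mass off the diagonal: neighbouring pixels always differ.
checker = haralick_features([[0.0, 0.5], [0.5, 0.0]])
```

The smooth case gives homogeneity 1, contrast 0, and correlation 1, whereas the alternating case gives homogeneity 0.5, contrast 1, and correlation -1, which matches the qualitative descriptions of the indices.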

[0133] Computation and synthesis: for each pixel location in the image, the four metrics above are calculated from the GLCM of its neighborhood window. Then, for each texture metric (such as homogeneity), the calculated value is averaged across all predefined combinations of step size d and direction θ. This yields a more robust texture metric that is insensitive to direction. Each pixel thus receives four averaged scalar values (homogeneity, contrast, energy, and correlation). Traversing all pixels of the image generates four two-dimensional maps with the same spatial dimensions as the original grayscale image, i.e., the texture feature maps T_m, comprising the homogeneity, contrast, energy, and correlation feature maps.
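The averaging over all (d, θ) combinations can be illustrated for a single metric. The sketch below computes contrast directly as the mean squared gray-level difference over valid pixel pairs, which equals the sum of (i - j)^2 P(i, j) for a normalized, one-directional GLCM. The helper names, the offset table, and the choice to count a (d, θ) combination with no valid pair as 0 are assumptions made for the example.

```python
# Unit (row, column) offsets for each direction; rows increase downward (assumption).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def contrast(window, d, theta):
    """GLCM contrast of one window for one (d, theta) combination."""
    dr, dc = (k * d for k in OFFSETS[theta])
    rows, cols = len(window), len(window[0])
    diffs = [
        (window[r][c] - window[r + dr][c + dc]) ** 2
        for r in range(rows)
        for c in range(cols)
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def averaged_contrast(window, steps=(1, 2, 3, 4), angles=(0, 45, 90, 135)):
    """Average the contrast over every step-size and direction combination."""
    values = [contrast(window, d, t) for d in steps for t in angles]
    return sum(values) / len(values)


# 6 x 6 window of vertical stripes (gray levels alternate 0, 1 along each row).
stripes = [[c % 2 for c in range(6)] for _ in range(6)]
flat = [[3] * 6 for _ in range(6)]
```

For the striped window only odd step sizes in non-vertical directions see a gray-level change, so 6 of the 16 combinations contribute a contrast of 1 and the average is 0.375; a flat window gives 0. Repeating this for every pixel's window yields one value per pixel, i.e., one texture feature map.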

[0134] Through the above steps, the present invention completes the extraction process from the original foam image to a set of quantitative texture feature maps with clear physical meaning, providing accurate guiding signals for subsequent texture enhancement of local features.

[0135] As a further optional embodiment, the step of extracting the global semantic features of the real-time foam image through a Transformer network specifically includes:

[0136] The real-time foam image is segmented into a sequence of image blocks;

[0137] The image patch sequence is processed by a multi-layer Transformer encoder that incorporates a sparse self-attention mechanism to extract global semantic features.

[0138] This embodiment aims to leverage the powerful global context modeling capabilities of the Transformer architecture to extract semantic features from foam images that reflect the overall working conditions.

[0139] First, the preprocessed real-time foam image (for example, of size H x W = 224 x 224 pixels with C = 3 channels) is divided into a series of non-overlapping square image patches, each P x P pixels in size (e.g., P = 16). A total of N = (H / P) * (W / P) image patches is thus obtained (in this example, N = (224 / 16) * (224 / 16) = 196).

[0140] Next, each image patch (of dimension P x P x C) is flattened into a vector of length P * P * C (16 * 16 * 3 = 768 in this example). This flattened vector is then mapped to an embedding space of fixed dimension D (e.g., D = 768) through a trainable linear projection layer (fully connected layer). The resulting vector is called the patch embedding. Thus, the original two-dimensional image is transformed into a sequence of length N: X_p = [x_p1, x_p2, ..., x_pN], where each x_pi ∈ R^D (i.e., is a D-dimensional vector).

[0141] To preserve the positional information of image patches in the original two-dimensional space, positional encoding needs to be added to each patch embedding. Positional encoding can be fixed (e.g., sine and cosine functions) or learnable. Ultimately, the sequence input to the Transformer encoder is the sum of the patch embedding and the positional encoding: Z_0 = X_p + E_pos, where E_pos is the positional encoding matrix.
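The patch arithmetic above can be checked with a short sketch. It splits a 224 x 224 x 3 image into non-overlapping 16 x 16 patches, flattens each one, and shows the element-wise addition Z_0 = X_p + E_pos. The trainable linear projection is omitted, and the function names `patchify` and `add_position` are hypothetical.

```python
def patchify(image, patch):
    """Split an H x W x C nested-list image into flattened non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                value
                for r in range(top, top + patch)
                for c in range(left, left + patch)
                for value in image[r][c]
            ])
    return patches


def add_position(embeddings, positions):
    """Z_0 = X_p + E_pos: add one positional vector to each patch embedding."""
    return [
        [x + e for x, e in zip(emb, pos)]
        for emb, pos in zip(embeddings, positions)
    ]


image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = patchify(image, 16)
num_patches, flat_length = len(patches), len(patches[0])
```

With the example sizes from the text, `num_patches` is 196 and `flat_length` is 768.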

[0142] The sequence Z_0 is input into a module consisting of L stacked Transformer encoder layers (e.g., L = 6). Each encoder layer typically contains two main sub-layers: a (sparse) multi-head self-attention layer (MSA) and a feedforward neural network layer (FFN). Each sub-layer is accompanied by layer normalization and a residual connection, which is the standard structure.

[0143] The key to this invention lies in employing a sparse self-attention mechanism in the self-attention sublayer to address the problem of excessive computational cost of standard self-attention when the image sequence is long (N is large). In specific implementations, attention based on Locality Sensitive Hashing (LSH), sliding window attention, or other sparsification strategies can be used. The core idea is not to calculate the attention between each element in the sequence and all other elements, but rather to allow each query to interact only with a few key elements selected through a specific filtering mechanism (such as hash bucket matching).

[0144] For a certain input sequence Z_l-1, the calculation process of its sparse self-attention can be summarized as follows:

[0145] Generate Q, K, V: Through linear transformation, the input sequence is projected into the query matrix Q, the key matrix K, and the value matrix V, respectively.

[0146] Sparsity filtering: Apply a sparsity strategy (such as LSH) to Q and K. For example, map the vectors of Q and K to multiple hash buckets, so that only queries and keys falling into the same bucket undergo attention computation. After filtering, let the key set associated with a specific query q_i be K_i (a subset of the complete K).

[0147] Attention computation: for each query q_i, attention weights are computed only against the keys in K_i, and the corresponding values V_i (the subset of V corresponding to K_i) are then weighted and summed. The computation of a single head can be formally represented as:

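[0148] $\mathrm{Attention}(q_i, K_i, V_i) = \mathrm{Softmax}\left(\frac{q_i K_i^{T}}{\sqrt{d_k}}\right) V_i$;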

[0149] where d_k is the dimension of the key vectors, and Softmax() is applied over the dimension corresponding to K_i.

[0150] Multi-head concatenation and output: The outputs of multiple attention heads are concatenated and then subjected to a linear projection to obtain the final output of the self-attention sub-layer.
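The filtering and attention steps above can be sketched for a single head. This sketch assumes one particular hashing scheme (the sign pattern of each vector against a set of hyperplanes), omits the learned Q / K / V projections, and falls back to full attention when a query's bucket contains no key; all of these choices and the function names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lsh_bucket(vec, planes):
    """Hash a vector to a bucket: its sign pattern against the given hyperplanes."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0 for plane in planes)


def sparse_attention(queries, keys, values, planes):
    """Each query attends only to the keys that share its hash bucket."""
    d_k = len(keys[0])
    key_buckets = [lsh_bucket(k, planes) for k in keys]
    outputs = []
    for q in queries:
        bucket = lsh_bucket(q, planes)
        subset = [j for j, b in enumerate(key_buckets) if b == bucket]
        if not subset:  # assumed fallback: attend to every key
            subset = list(range(len(keys)))
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k) for j in subset
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * values[j][t] for w, j in zip(weights, subset))
            for t in range(len(values[0]))
        ])
    return outputs


# One hyperplane: vectors are bucketed by the sign of their first coordinate.
planes = [[1.0, 0.0]]
keys = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = sparse_attention([[1.0, 0.0], [-1.0, 0.0]], keys, values, planes)
```

The first query shares a bucket with the first two keys and never sees the third; the second query sees only the third key, so its output is exactly that key's value.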

[0151] After processing by the L encoder layers, the feature at each position of the output sequence Z_L (corresponding to one patch of the original image) incorporates global context. To obtain the global semantic features F_g of the entire image, there are generally two aggregation methods:

[0152] Method 1 ([CLS] token): A learnable special token [CLS] is prepended to the initial sequence Z_0. After Transformer encoding, the final output vector z_L^0 corresponding to this [CLS] token is regarded as a summary representation of the entire input sequence / image, i.e., F_g = z_L^0.

[0153] Method 2 (Global Average Pooling): The image patch feature vectors z_L^1, z_L^2, ..., z_L^N output from the last layer are averaged over the sequence dimension (i.e., over the N patches) to yield a D-dimensional vector, i.e., F_g = Mean(z_L^1, z_L^2, ..., z_L^N).
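The two aggregation methods can be contrasted in a few lines. The sketch assumes the encoder output is a list of vectors with the [CLS] token at position 0 followed by the N patch vectors; the function names are hypothetical.

```python
def cls_feature(z_last):
    """Method 1: use the output vector at position 0 (the [CLS] token)."""
    return z_last[0]


def mean_pooled_feature(z_last):
    """Method 2: average the N patch vectors (positions 1..N), skipping [CLS]."""
    patch_vectors = z_last[1:]
    return [sum(column) / len(patch_vectors) for column in zip(*patch_vectors)]


# Toy encoder output: one [CLS] vector followed by two patch vectors (D = 2).
z_last = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]
f_g_cls = cls_feature(z_last)
f_g_mean = mean_pooled_feature(z_last)
```

Both methods return a D-dimensional vector; they differ only in whether the summary is learned through the [CLS] token or computed as a plain average.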

[0154] Through the above steps, the model extracts from the foam image global semantic features F_g that capture its overall distribution, structural relationships and macroscopic working conditions, providing key macroscopic information for subsequent multi-scale feature fusion.

[0155] As a further optional embodiment, the step of using the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as guiding branch features, respectively, and inputting them into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature, specifically includes:

[0156] Align the deep fusion features with the appearance feature map in both spatial and channel dimensions;

[0157] Using aligned deep fusion features as queries and aligned appearance feature maps as keys and values, cross-attention weights are calculated and the values are weighted and aggregated to obtain appearance-guided intermediate features.

[0158] The appearance-guided intermediate features are fused with the deep fusion features to output the appearance-guided enhanced features;

[0159] Align the deep fusion features with the process parameter feature map in both spatial and channel dimensions;

[0160] Using the aligned deep fusion features as queries and the aligned process parameter feature maps as keys and values, cross-attention weights are calculated and the values are weighted and aggregated to obtain intermediate features guided by process parameters.

[0161] The intermediate features guided by the process parameters are fused with the deep fusion features to output the enhanced features guided by the process parameters.

[0162] In this embodiment, firstly, the deep fusion features obtained in step S130 and the appearance feature map generated in step S140 are aligned in terms of spatial size and channel dimension. Since the two may have different resolutions or channel numbers, they need to be adjusted through upsampling or downsampling operations (such as bilinear interpolation or pooling) and pointwise convolution (such as 1×1 convolution) to make them completely consistent in height, width and channel number.

[0163] Secondly, cross-attention calculation is performed using the aligned deep fusion features as the source of the query and the aligned appearance feature map as the source of the keys and values. Specifically, the aligned deep fusion features are flattened into a query sequence, and the aligned appearance feature map is flattened into a key sequence and a value sequence, respectively. The similarity matrix between the query sequence and the key sequence is calculated, and the attention weights are obtained by normalization using the Softmax function. Subsequently, the value sequences are weighted and summed using these attention weights to obtain the attention aggregation result. This result is the appearance-guided intermediate feature, which reflects the initial enhancement of the deep fusion features under the guidance of appearance statistical information.

[0164] Finally, the appearance-guided intermediate features are fused with the original deep fusion features. The fusion method can be element-wise addition (i.e., residual connection), or concatenation along the channel dimension followed by dimensionality reduction through a linear mapping. This fusion operation preserves the semantic information of the original deep fusion features while incorporating the physical meaning calibration signal provided by the appearance features. The fused output features are the appearance-guided enhancement features.
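The three steps of the appearance-guided branch (flattening, attention aggregation, residual fusion) can be sketched as follows. The sketch assumes the features are already aligned and flattened into lists of equal-length vectors, omits the learned query / key / value transformations, scales the similarity by the square root of the channel count, and uses element-wise addition for the final fusion; these choices and the function names are assumptions, not details confirmed by the text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(main, guide):
    """Enhance `main` (queries) with `guide` (keys and values), then add a residual.

    main:  list of N vectors, the flattened aligned deep fusion features.
    guide: list of M vectors, the flattened aligned guiding feature map.
    """
    dim = len(main[0])
    fused = []
    for query in main:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in guide
        ]
        weights = softmax(scores)
        intermediate = [
            sum(w * value[t] for w, value in zip(weights, guide)) for t in range(dim)
        ]
        # Residual fusion keeps the original semantics of the main branch.
        fused.append([q + a for q, a in zip(query, intermediate)])
    return fused


# With a single guiding position, every query receives exactly that value.
enhanced = cross_attention_fuse([[0.5, 0.5]], [[1.0, 2.0]])
```

The same function could serve the process-parameter branch by passing the aligned process parameter feature map as `guide`.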

[0165] Similarly, the same three-step process is performed on the process parameter-guided branches.

[0166] Align the deep fusion features obtained in step S130 with the process parameter feature map generated in step S140 in terms of spatial size and channel dimension. The alignment method is the same as that of the appearance branch, which can be achieved by resolution adjustment and channel mapping.

[0167] Using aligned deep fusion features as queries and aligned process parameter feature maps as keys and values, cross-attention is calculated. This process enables each position in the deep fusion features to aggregate information from the process parameter feature map according to its relevance to each position in that map. The specific operations for attention calculation are consistent with the appearance branch: flattening, similarity calculation, normalization, and weighted aggregation to obtain intermediate features guided by process parameters.

[0168] The intermediate features guided by the process parameters are fused with the original deep fusion features. The fusion method can likewise be a residual connection (element-wise addition) or channel concatenation followed by a linear mapping. The output feature is the process parameter-guided enhanced feature.

[0169] It should be noted that the appearance-guided branch and the process parameter-guided branch are independent of each other. They can both use the same dynamic cross-attention fusion module, but the module parameters (such as the linear transformation matrix) are learned independently. In actual computation, the two branches can be executed in parallel to improve processing efficiency; or they can be executed sequentially, i.e., one branch is calculated first, and then its result is used as input to calculate the other branch. Regardless of the order, two enhanced features are ultimately obtained: an appearance-guided enhanced feature and a process parameter-guided enhanced feature, which are used in subsequent step S160.

[0170] Through the detailed steps described in this embodiment, dynamic and adaptive calibration and enhancement of deep fusion features using apparent statistical information and process parameter information are realized, laying a solid foundation for subsequent multi-source feature fusion.

[0171] As a further optional embodiment, the step of fusing the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature specifically includes:

[0172] The appearance-guided enhancement features, the deep fusion features, and the process parameter-guided enhancement features are aggregated along the feature dimension to obtain aggregated multi-source features.

[0173] The aggregated multi-source features are subjected to feature compression and channel integration to obtain the multi-source fused features.

[0174] In this embodiment, the fusion of multi-source features adopts a two-stage strategy of "aggregation first, compression later", which specifically includes the following operations.

[0175] After step S150, we obtained three features: the original deep fusion feature, the appearance-guided enhancement feature, and the process parameter-guided enhancement feature. These three features have the same spatial dimensions (height and width) and channel dimension (after alignment and mapping in steps S140 and S150, the three are now in the same feature space). To fully utilize the complementary information among the three, we first aggregate these three features along the feature dimension (i.e., the channel dimension).

[0176] A typical implementation of aggregation is channel concatenation. Specifically, the three feature maps are concatenated end-to-end along the channel direction to form a multi-source feature tensor with three times the number of channels. For example, assuming each feature has C channels, the concatenated feature tensor has 3C channels while maintaining the same spatial size. This aggregation method is simple and effective, preserving the original information of each feature for subsequent network selection and fusion. Besides concatenation, other aggregation methods such as weighted summation and attention fusion can also be used; this invention does not limit these methods. The features obtained after aggregation are called aggregated multi-source features.

[0177] While the aggregated multi-source features contain rich information, the number of channels increases significantly, leading to a substantial computational burden and potential information redundancy when used directly for regression prediction. Therefore, it is necessary to compress and integrate these features to extract the most representative information while reducing the computational complexity of the subsequent regression model.

[0178] A typical approach to compression and integration is to use pointwise convolutions (such as 1×1 convolutional layers) to process the aggregated multi-source features. A 1×1 convolution can linearly combine and fuse information from each channel without changing the spatial dimensions, reducing the number of channels from a high value (such as 3C) to a preset smaller value (such as C or a smaller dimension). This convolutional layer has learnable parameters and can adaptively learn how to fuse features from three sources through training. Besides pointwise convolutions, stacking multiple convolutional layers or combining batch normalization and non-linear activation functions can also enhance expressive power.
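The "aggregation first, compression later" strategy can be sketched with channel concatenation followed by a 1×1 convolution, which is the same linear map applied to every pixel's channel vector. The H x W x C nested-list layout, the fixed example weights, and the function names are assumptions for illustration; in practice the weights would be learned.

```python
from itertools import chain


def concat_channels(*feature_maps):
    """Concatenate H x W x C feature maps along the channel dimension."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [list(chain.from_iterable(fm[r][c] for fm in feature_maps)) for c in range(cols)]
        for r in range(rows)
    ]


def pointwise_conv(feature_map, weight, bias):
    """1x1 convolution: weight is C_out x C_in, applied independently per pixel."""
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) + b for w_row, b in zip(weight, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]


# Three 1 x 1 feature maps with C = 2 channels each.
deep = [[[1, 2]]]
appearance_enhanced = [[[3, 4]]]
process_enhanced = [[[5, 6]]]
aggregated = concat_channels(deep, appearance_enhanced, process_enhanced)  # 3C = 6

# Example weights that sum matching channels of the three sources (3C -> C).
weight = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
fused = pointwise_conv(aggregated, weight, bias=[0, 0])
```

The spatial size is unchanged throughout; only the channel count goes from C to 3C and back to C.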

[0179] After compression and integration, the output feature map has a smaller number of channels and the same spatial size as the original feature. This feature is called a multi-source fusion feature, which condenses the image's deep semantics, apparent statistics, and process parameter information, providing high-quality feature input for the final grade regression prediction.

[0180] Through the above two-stage processing, the present invention achieves effective fusion of heterogeneous multi-source information, which not only preserves the uniqueness of each feature, but also removes redundancy through compression and integration, thereby improving the efficiency and accuracy of subsequent predictions.

[0181] As a further optional embodiment, the step of performing regression prediction based on the comprehensive features and outputting the predicted underflow grade value of the target zinc rougher cell specifically includes:

[0182] Channel adjustment and global spatial pooling are performed on the comprehensive features to obtain a one-dimensional compact feature vector;

[0183] The compact feature vector is input into the regression prediction model to calculate the predicted value of the underflow grade.

[0184] This step is the information aggregation and final output stage of the method of the present invention, which aims to map the rich features obtained in the preceding steps into a single, accurate grade prediction value.

[0185] The comprehensive feature F_da obtained after step S140 is a three-dimensional tensor with shape (C2, H2, W2), where C2 is the number of channels, and H2 and W2 are the spatial dimensions. For the final scalar regression, this spatial-channel feature first needs to be compressed into a representative global descriptive vector.

[0186] Channel adjustment (optional): In some designs, a 1x1 convolutional layer may be used to perform a lightweight adjustment of the number of channels in F_da. This operation does not change the spatial dimensions, but only fuses and compresses the information between channels, and can be represented as:

[0187] F_adj = Conv_1x1(F_da);

[0188] Here, Conv_1x1 represents a convolution operation with a kernel size of 1x1. The adjusted feature map F_adj may become (C3, H2, W2), where C3 is the preset final number of channels (e.g., C3 = 512 or C3 = 256). This step aims to further refine the features and match the input dimensions of the subsequent regression head.

[0189] Global spatial pooling: Next, global spatial pooling is performed on the channel-adjusted feature map (or on F_da directly if no channel adjustment was performed). The most common and effective method is Global Average Pooling (GAP). This operation calculates the average value of each channel of the feature map across all spatial locations (H2 x W2), thereby compressing the two-dimensional activation map of each channel into a scalar.

[0190] The formula for calculating compact eigenvectors is:

[0191] g = GAP(F_adj);

[0192] Or g = GAP(F_da);

[0193] For the c-th channel (c = 1, 2, ..., C), the specific calculation is as follows:

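[0194] $g_c = \frac{1}{H_2 W_2} \sum_{h=1}^{H_2} \sum_{w=1}^{W_2} F_{adj}(c, h, w)$ (with F_da in place of F_adj when no channel adjustment is performed);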

[0195] After GAP, a one-dimensional vector g of length C (C3 or C2) is obtained. This vector summarizes the spatial information of the comprehensive feature map and serves as a highly condensed numerical summary of the overall state of the current foam image; it is called the compact feature vector.
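Global average pooling can be written in a few lines. The sketch assumes a C x H x W nested-list layout and a hypothetical function name `global_average_pool`.

```python
def global_average_pool(feature_map):
    """Reduce a C x H x W feature map to a length-C vector of channel means."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Two channels, each a 2 x 2 activation map.
feature_map = [
    [[1, 2], [3, 4]],
    [[0, 0], [0, 8]],
]
g = global_average_pool(feature_map)
```

Each channel collapses to one scalar, so the output length equals the number of channels regardless of H2 and W2.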

[0196] After obtaining the compact feature vector g, it needs to be mapped to the final grade value. This is done through a regression prediction model (or regression head).

[0197] Model Structure: The regression prediction model is typically a multilayer perceptron (MLP). An MLP consists of several fully connected layers (linear layers) stacked together. Non-linear activation functions (such as ReLU) and optional normalization layers (such as batch normalization) can be inserted between layers to prevent overfitting and accelerate training. A typical structure might be:

[0198] Input layer: Receives a vector g with dimension C.

[0199] One or more hidden layers: for example, the first hidden layer maps the dimension from C to D1 (e.g., 1024), and the second hidden layer maps from D1 to D2 (e.g., 256). Each layer is followed by a ReLU activation function.

[0200] Output layer: The last fully connected layer maps the output dimension of the hidden layers to 1, i.e., outputs a scalar. This layer typically uses no non-linear activation function, so that it can output continuous predictions.

[0201] Therefore, the entire regression process can be formally represented as:

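[0202] $\hat{y} = W^{(3)} \sigma\left(W^{(2)} \sigma\left(W^{(1)} g + b^{(1)}\right) + b^{(2)}\right) + b^{(3)}$;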

[0203] where W^(l) and b^(l) are the weights and biases of the l-th layer, σ is the activation function (such as ReLU), and y_hat is the predicted underflow grade value.
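The regression head can be sketched as a stack of linear layers with ReLU after every layer except the last. The weights below are toy values chosen by hand, and the function names and the `(weight, bias)` list layout are assumptions for illustration.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]


def linear(vector, weight, bias):
    """Fully connected layer: weight is D_out x D_in, bias has length D_out."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weight, bias)]


def regression_head(g, layers):
    """Map the compact feature vector g to one scalar grade prediction.

    layers: list of (weight, bias) pairs; ReLU follows all but the last layer,
    and the last layer must have a single output node.
    """
    hidden = g
    for weight, bias in layers[:-1]:
        hidden = relu(linear(hidden, weight, bias))
    weight, bias = layers[-1]
    return linear(hidden, weight, bias)[0]


# Toy head: one 2 -> 2 hidden layer followed by a 2 -> 1 output layer.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 3.0]], [0.5]),
]
y_hat = regression_head([1.0, -2.0], layers)
```

In this toy case the hidden layer clips the negative input to zero, so the prediction is 2.0 * 1.0 + 3.0 * 0.0 + 0.5 = 2.5.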

[0204] Model Training and Output: During model training, a large number of labeled "foam image-true grade" data pairs are used to minimize the difference between the predicted value y_hat and the true value y_true (commonly used loss functions are the mean squared error (MSE) and the mean absolute error (MAE)). All parameters in the network (including those of feature extraction, fusion, and this regression head) are optimized end-to-end using the backpropagation algorithm. After training, for a new real-time foam image, its compact feature vector g is obtained through the aforementioned steps. This vector is then input into the trained MLP regression head, which directly and rapidly computes and outputs the predicted underflow grade at that moment.
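The two loss functions named above can be stated concretely. The grade values in the example are made-up numbers for illustration, and the function names are hypothetical.

```python
def mse(predicted, actual):
    """Mean squared error between predicted and true grade values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mae(predicted, actual):
    """Mean absolute error between predicted and true grade values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and assay values for two samples.
y_hat = [1.0, 2.0]
y_true = [1.0, 4.0]
loss_mse = mse(y_hat, y_true)
loss_mae = mae(y_hat, y_true)
```

MSE penalizes the single large error more heavily (2.0) than MAE does (1.0), which is the usual consideration when choosing between them.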

[0205] Through the above steps, this invention achieves accurate and efficient mapping from high-dimensional, multi-scale image features to a key process indicator (grade). The entire process realizes end-to-end, real-time, non-contact intelligent prediction of the underflow grade of the zinc rougher cell, providing real-time data support for the optimized control of the flotation process.

[0206] In summary, a preferred embodiment of the present invention is as follows:

[0207] Step 1: Data Preparation;

[0208] Foam images of zinc flotation were acquired using an on-site flotation foam imaging system, and the true value of the underflow grade was measured by an X-ray fluorescence analyzer. A foam image dataset was established, in which each sample consists of one foam image and its corresponding true grade value.

[0209] Step 2: Apparent feature extraction;

[0210] The acquired images are cropped and denoised, and existing methods are used to extract the image's appearance features (texture, foam size, color, and speed features).

[0211] Step 3: Global Feature Extraction;

[0212] The raw input foam image I is divided into blocks to form a patch sequence X_p, which is then passed through linear mapping and L_g layers of sparse attention blocks (Sparse-T Blocks) to extract global structural semantics:

[0213] ;

[0214] Finally, the global feature representation F_g is obtained. This feature captures the overall structural morphology and macroscopic distribution relationship of the foam.

[0215] Step 4: Local feature extraction for texture enhancement;

[0216] The foam image is processed by convolution to obtain higher-dimensional features of the foam image:

[0217] ;

[0218] To improve network efficiency and performance, the local branch extracts local structural features using mobile inverted bottleneck convolution (MBConv) blocks:

[0219] ;

[0220] The texture map is mapped by convolution to the same size as the local features, and the texture features are introduced as gating signals to enhance the local features:

[0221] ;

[0222] The gating signals are then multiplied element-wise with the corresponding local features, respectively:

[0223] ;

[0224] After two stages of texture enhancement, the final texture-enhanced local features are obtained through the MBConv block:

[0225] ;
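The texture-gating idea of Step 4 can be sketched as below. Because the original equations are not reproduced in this text, the sketch makes several assumptions that should be read as hypothetical: the gate is a sigmoid of a 1×1 convolution of the texture map, the texture map already has the same spatial size as the local features, and the names `conv1x1` and `texture_gate` are illustrative only.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def texture_gate(local_feat, texture_map, weight, bias):
    """Enhance local features with a texture-derived gate (assumed sigmoid gating)."""
    gate = 1.0 / (1.0 + np.exp(-conv1x1(texture_map, weight, bias)))
    return local_feat * gate  # element-wise multiplication

rng = np.random.default_rng(0)
local_feat = rng.normal(size=(8, 16, 16))    # local features, 8 channels
texture_map = rng.uniform(size=(4, 16, 16))  # four texture channels
weight = rng.normal(size=(8, 4)) * 0.1       # maps 4 texture channels to 8
bias = np.zeros(8)
enhanced = texture_gate(local_feat, texture_map, weight, bias)
print(enhanced.shape)
```

With a sigmoid gate, each output value is the local feature scaled by a factor between 0 and 1, so texture-rich regions can be emphasized relative to the rest.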

[0226] Step 5: Global-Local Cross-Fusing Module;

[0227] After the global features and the local features are obtained, this invention further models the bidirectional dependency between the two through a Global-Local Cross-Fusing Module (GLCF). First, to facilitate the subsequent cross-attention calculation, the two feature paths are mapped to a unified space through size alignment:

[0228]

[0229] Here, the upsampling and downsampling convolutional structures only change the resolution and the number of channels, without altering the semantic features.

[0230] Building upon this, a Dynamic Cross-Attention Fusion Module (DCAFM) based on OD convolution is introduced. Given a pair of input features, one being the feature to be enhanced and the other the guiding feature, the query, key, and value are first generated using three independent OD convolutions:

[0231] ;

[0232] OD convolution adaptively generates convolution kernels and combination weights based on the input, allowing the spatial response of Q / K / V to adjust dynamically according to the foam state. Subsequently, Q, K, and V are flattened along the spatial dimension into sequences of length [missing information]. The cross-attention matrix is then calculated and the attention aggregation result is obtained:

[0233]

[0234] For the sake of brevity, the above calculation process is recorded as follows:

[0235] ;
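The flatten-then-attend computation described above can be sketched as follows. This is not the patent's DCAFM: plain linear projections stand in for the three OD convolutions, the scaling by the square root of the projection width is the standard scaled dot-product convention and is an assumption here, and the names `softmax` and `cross_attention` are hypothetical. Taking queries from the feature to be enhanced and keys and values from the guiding feature follows the wording of claim 5.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x_main, x_guide, wq, wk, wv):
    """Cross-attention between two (C, H, W) maps of equal spatial size.

    Queries come from x_main; keys and values come from x_guide.
    """
    _, h, w = x_main.shape
    main_seq = x_main.reshape(x_main.shape[0], h * w).T     # (N, C)
    guide_seq = x_guide.reshape(x_guide.shape[0], h * w).T  # (N, C)
    q, k, v = main_seq @ wq, guide_seq @ wk, guide_seq @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))          # (N, N), rows sum to 1
    out = attn @ v                                          # (N, d)
    return out.T.reshape(-1, h, w), attn

rng = np.random.default_rng(0)
x_main = rng.normal(size=(6, 4, 4))
x_guide = rng.normal(size=(6, 4, 4))
wq, wk, wv = (rng.normal(size=(6, 3)) for _ in range(3))
out, attn = cross_attention(x_main, x_guide, wq, wk, wv)
print(out.shape, attn.shape)
```

Each spatial position of the main feature forms a weighted average of the guiding feature's values, which is what lets one branch steer the other.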

[0236] The GLCF module constructs two symmetrical DCAFM paths between the global and local features to achieve dynamic cross-fusion in the form of "global guidance of local" and "local refinement of global":

[0237] ;

[0238] The second round of cross-fusion takes the two first-round outputs as input and performs the bidirectional DCAFM interaction again:

[0239] ;

[0240] Finally, the two second-round outputs are aggregated through a concatenation (Concat) operation and then processed by a linear mapping function to obtain the global-local deep fusion features:

[0241] ;

[0242] Step 6: Dynamic fusion of deep features and apparent features;

[0243] The apparent feature vector calculated in Step 2 has the following components, in order: contrast, red / gray mean ratio, variance, velocity mean, velocity variance, stability, mean, area variance, kurtosis, skewness, correlation, mean area, energy, and balance.

[0244] To align with the deep features in the channel dimension, the apparent feature vector is first mapped into the channel space and then expanded over the spatial dimensions:

[0245]

[0246] Here, the mapping weights are learnable parameters, and Expand(·) denotes broadcast replication along the spatial dimensions.
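The mapping-and-broadcast step can be sketched as follows. The 14-component input matches the component list in paragraph [0243]; the channel count of 8, the 5×7 spatial size, and the name `expand_to_map` are hypothetical choices for illustration, and the random weights stand in for learned parameters.

```python
import numpy as np

def expand_to_map(vec, weight, bias, height, width):
    """Linearly map a feature vector to C channels, then replicate it over H x W."""
    channels = weight @ vec + bias  # (C,)
    fmap = np.broadcast_to(channels[:, None, None], (channels.shape[0], height, width))
    return fmap.copy()  # materialize so the map can be modified later

rng = np.random.default_rng(0)
apparent_vec = rng.normal(size=14)  # 14 apparent feature components
weight = rng.normal(size=(8, 14))   # stand-in for learnable weights
bias = rng.normal(size=8)           # stand-in for learnable bias
feature_map = expand_to_map(apparent_vec, weight, bias, 5, 7)
print(feature_map.shape)
```

Every spatial location carries the same channel vector, so the resulting map has the layout required by a spatial cross-attention module while still encoding only image-level statistics.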

[0247] Subsequently, with the deep fusion features as the main branch and the apparent feature map as the guiding branch, the two are fed into the aforementioned Dynamic Cross-Attention Fusion Module (DCAFM) to obtain features that simultaneously encode deep and apparent information:

[0248] ;

[0249] The fused features then undergo lightweight linear compression and global spatial aggregation to obtain a compact representation for regression. Specifically, the number of channels is first adjusted through a convolution:

[0250] ;

[0251] Step 7: Grade prediction;

[0252] Finally, the compact representation is input into a multilayer perceptron (MLP) regression head to predict the underflow grade of the zinc roughing cell:

[0253]

[0254] Here, the output is the target grade prediction value produced by the method of this invention.
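The final aggregation and regression step can be sketched as below. The use of global average pooling for the aggregation, the single hidden layer with ReLU, and the name `regression_head` are assumptions for illustration; the source does not specify the pooling type or the depth of the MLP in this passage.

```python
import numpy as np

def regression_head(feature_map, w1, b1, w2, b2):
    """Pool a (C, H, W) map to a compact vector g, then regress a scalar grade."""
    g = feature_map.mean(axis=(1, 2))      # assumed global average pooling -> (C,)
    hidden = np.maximum(0.0, w1 @ g + b1)  # hidden layer with ReLU
    return float(w2 @ hidden + b2)         # scalar grade prediction

rng = np.random.default_rng(0)
fused = rng.normal(size=(16, 8, 8))        # stand-in for the fused feature map
w1, b1 = rng.normal(size=(32, 16)), np.zeros(32)
w2, b2 = rng.normal(size=32), 0.0
y_hat = regression_head(fused, w1, b1, w2, b2)
print(y_hat)
```

Because pooling removes the spatial dimensions, the head's parameter count depends only on the channel count, not on the image resolution.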

[0255] In the aforementioned method for predicting the underflow grade of a zinc rapid flotation cell based on multi-feature dynamic cross-fusion, in Step 3, the input foam image is divided into a patch sequence, and stacked sparse self-attention and feedforward networks are applied to the resulting token sequence to obtain a global structural semantic sequence.

[0256] In the above-mentioned method for predicting the underflow grade of a zinc rapid flotation cell based on multi-feature dynamic cross-fusion, in Step 4, the apparent texture feature map comprises four categories of texture features: homogeneity, contrast, energy, and correlation. For the normalized foam grayscale image, a local gray-level co-occurrence matrix is constructed using a given window size, gray quantization level, step-size set, and direction set, and the four texture metrics above are calculated according to the Haralick feature formulas. Finally, the results for all step sizes and directions are averaged to obtain four texture feature maps.
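The co-occurrence computation for a single window and a single offset can be sketched as follows. The symmetric, normalized matrix, the definition of energy as the sum of squared probabilities, and the convention of returning a correlation of 1.0 for a constant window are assumptions; the names `glcm` and `haralick_features` are hypothetical. A full texture map would repeat this over sliding windows and average over the step-size and direction sets.

```python
import numpy as np

def glcm(window, levels, dy, dx):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    h, w = window.shape
    counts = np.zeros((levels, levels), dtype=float)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = window[y, x], window[y2, x2]
                counts[i, j] += 1
                counts[j, i] += 1  # count both orderings (symmetric matrix)
    total = counts.sum()
    return counts / total if total > 0 else counts

def haralick_features(p):
    """Homogeneity, contrast, energy, and correlation of a normalized GLCM."""
    levels = p.shape[0]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    diff2 = (i - j) ** 2
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    denom = sd_i * sd_j
    cov = ((i - mu_i) * (j - mu_j) * p).sum()
    return {
        "homogeneity": float((p / (1.0 + diff2)).sum()),
        "contrast": float((p * diff2).sum()),
        "energy": float((p ** 2).sum()),  # assumed: sum of squared probabilities
        "correlation": float(cov / denom) if denom > 0 else 1.0,  # convention for flat windows
    }

# Toy window quantized to 4 gray levels, horizontal offset of one pixel.
window = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]])
features = haralick_features(glcm(window, levels=4, dy=0, dx=1))
print(features)
```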

[0257] In the aforementioned method for predicting the underflow grade of a zinc rapid flotation cell based on multi-feature dynamic cross-fusion, in Step 5, the query matrix Q, key matrix K, and value matrix V are obtained by channel compression of the input features through independent 1×1 OD-Conv layers, resulting in an output dimension of... while the spatial resolution H×W remains unchanged.

[0258] In the above-mentioned method for predicting the underflow grade of a zinc rapid flotation cell based on multi-feature dynamic cross-fusion, in Step 6, the Expand(·) operation replicates the channel vector along the spatial dimensions, resulting in an apparent feature map of the corresponding size.

[0259] The following describes the zinc roughing flotation cell underflow grade prediction device provided by the present invention, as shown in Figure 5. The device described below and the zinc roughing flotation cell underflow grade prediction method described above may be referred to in correspondence with each other.

[0260] A zinc roughing flotation cell underflow grade prediction device includes:

[0261] The data acquisition module 510 is used to acquire real-time foam images of the target zinc roughing cell and the corresponding real-time process parameters; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0262] The appearance feature module 520 is used to extract appearance features from the real-time foam image to obtain an appearance feature vector;

[0263] The feature fusion module 530 is used to extract global and local features from the real-time foam image to obtain deep fusion features;

[0264] The feature map module 540 is used to convert the apparent feature vector and the process parameters into an apparent feature map and a process parameter feature map that are aligned with the size of the deep fusion feature space, respectively;

[0265] The feature enhancement module 550 is used to use the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as the guiding branch features, respectively, and input them into the dynamic cross-attention fusion module for fusion to obtain appearance-guided enhancement features and process parameter-guided enhancement features;

[0266] The multi-source fusion module 560 is used to fuse the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature;

[0267] The regression prediction module 570 is used to perform regression prediction based on the multi-source fusion features and output the predicted value of the underflow grade of the target zinc roughing cell.
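The role of the multi-source fusion module can be sketched as follows, based on the wording of claim 6 (aggregation along the feature dimension followed by feature compression and channel integration). Treating the aggregation as channel concatenation and the compression as a 1×1 convolution is an assumption, and the name `fuse_multi_source` and all shapes are hypothetical.

```python
import numpy as np

def fuse_multi_source(f_app, f_deep, f_proc, weight, bias):
    """Concatenate three (C, H, W) maps along channels, then compress with a 1x1 conv."""
    stacked = np.concatenate([f_app, f_deep, f_proc], axis=0)  # (3C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]

rng = np.random.default_rng(0)
c, h, w = 4, 6, 6
f_app = rng.normal(size=(c, h, w))   # appearance-guided enhancement feature
f_deep = rng.normal(size=(c, h, w))  # deep fusion feature
f_proc = rng.normal(size=(c, h, w))  # process parameter-guided enhancement feature
weight = rng.normal(size=(c, 3 * c)) # stand-in for learned 1x1 conv weights
bias = np.zeros(c)
fused = fuse_multi_source(f_app, f_deep, f_proc, weight, bias)
print(fused.shape)
```

Concatenation keeps the three sources separate until the learned compression decides how much each contributes to every output channel.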

[0268] Figure 6 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 can call logic instructions in the memory 630 to execute a zinc roughing flotation cell underflow grade prediction method, which includes:

[0269] Acquire real-time foam images and corresponding real-time process parameters of the target zinc roughing cell; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0270] The apparent features of the real-time foam image are extracted to obtain an apparent feature vector;

[0271] The real-time foam image is subjected to fusion extraction of global and local features to obtain deep fusion features;

[0272] The apparent feature vector and the process parameters are respectively converted into an apparent feature map and a process parameter feature map aligned with the size of the deep fusion feature space;

[0273] Using the deep fusion feature as the main branch feature, and the appearance feature map and the process parameter feature map as the guiding branch features respectively, the features are input into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature.

[0274] The appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature are fused to obtain a multi-source fusion feature;

[0275] Regression prediction is performed based on the multi-source fusion feature to output the predicted underflow grade of the target zinc roughing cell.

[0276] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0277] On the other hand, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium, wherein, when the computer program is executed by a processor, the computer is able to execute the zinc roughing flotation cell underflow grade prediction method provided above, the method comprising:

[0278] Acquire real-time foam images and corresponding real-time process parameters of the target zinc roughing cell; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0279] The apparent features of the real-time foam image are extracted to obtain an apparent feature vector;

[0280] The real-time foam image is subjected to fusion extraction of global and local features to obtain deep fusion features;

[0281] The apparent feature vector and the process parameters are respectively converted into an apparent feature map and a process parameter feature map aligned with the size of the deep fusion feature space;

[0282] Using the deep fusion feature as the main branch feature, and the appearance feature map and the process parameter feature map as the guiding branch features respectively, the features are input into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature.

[0283] The appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature are fused to obtain a multi-source fusion feature;

[0284] Regression prediction is performed based on the multi-source fusion feature to output the predicted underflow grade of the target zinc roughing cell.

[0285] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the zinc roughing flotation cell underflow grade prediction method provided by the methods described above, the method comprising:

[0286] Acquire real-time foam images and corresponding real-time process parameters of the target zinc roughing cell; the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate;

[0287] The apparent features of the real-time foam image are extracted to obtain an apparent feature vector;

[0288] The real-time foam image is subjected to fusion extraction of global and local features to obtain deep fusion features;

[0289] The apparent feature vector and the process parameters are respectively converted into an apparent feature map and a process parameter feature map aligned with the size of the deep fusion feature space;

[0290] Using the deep fusion feature as the main branch feature, and the appearance feature map and the process parameter feature map as the guiding branch features respectively, the features are input into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature.

[0291] The appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature are fused to obtain a multi-source fusion feature;

[0292] Regression prediction is performed based on the multi-source fusion feature to output the predicted underflow grade of the target zinc roughing cell.

[0293] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0294] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0295] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for predicting the underflow grade of a zinc roughing flotation cell, characterized in that it includes: acquiring real-time foam images and corresponding real-time process parameters of the target zinc roughing cell, wherein the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate; extracting the apparent features of the real-time foam image to obtain an apparent feature vector; subjecting the real-time foam image to fusion extraction of global and local features to obtain deep fusion features; converting the apparent feature vector and the process parameters, respectively, into an apparent feature map and a process parameter feature map aligned with the size of the deep fusion feature space; using the deep fusion feature as the main branch feature, and the appearance feature map and the process parameter feature map as the guiding branch features respectively, inputting the features into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature; fusing the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature; and performing regression prediction based on the multi-source fusion feature to output the predicted underflow grade of the target zinc roughing cell.

2. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 1, characterized in that the step of fusing and extracting global and local features from the real-time foam image to obtain deep fusion features specifically includes: extracting global semantic features of the real-time foam image using a Transformer network; extracting the local structural features of the real-time foam image using a convolutional neural network, wherein, during the extraction process, the extracted texture features are used as a guiding signal to dynamically modulate and enhance the local features, resulting in texture-enhanced local features; and fusing the global semantic features and the texture-enhanced local features using a global-local bidirectional dynamic cross-attention process to obtain the deep fusion features.

3. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 2, characterized in that the texture features are extracted in the following way: the real-time foam image is processed to obtain a foam grayscale image; a local gray-level co-occurrence matrix is constructed for the foam grayscale image, and homogeneity, contrast, energy, and correlation indices are calculated based on the local gray-level co-occurrence matrix to form a texture feature map.

4. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 2, characterized in that the step of extracting the global semantic features of the real-time foam image using a Transformer network specifically includes: segmenting the real-time foam image into a sequence of image patches; and processing the image patch sequence with a multi-layer Transformer encoder that incorporates a sparse self-attention mechanism to extract the global semantic features.

5. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 1, characterized in that the step of using the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as guiding branch features, respectively, and inputting them into the dynamic cross-attention fusion module for fusion to obtain the appearance-guided enhancement feature and the process parameter-guided enhancement feature, specifically includes: aligning the deep fusion features with the appearance feature map in both spatial and channel dimensions; using the aligned deep fusion features as queries and the aligned appearance feature map as keys and values, calculating cross-attention weights and performing weighted aggregation on the values to obtain appearance-guided intermediate features; fusing the appearance-guided intermediate features with the deep fusion features to output the appearance-guided enhancement feature; aligning the deep fusion features with the process parameter feature map in both spatial and channel dimensions; using the aligned deep fusion features as queries and the aligned process parameter feature map as keys and values, calculating cross-attention weights and performing weighted aggregation on the values to obtain process parameter-guided intermediate features; and fusing the process parameter-guided intermediate features with the deep fusion features to output the process parameter-guided enhancement feature.

6. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 1, characterized in that the step of fusing the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain a multi-source fusion feature specifically includes: aggregating the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature along the feature dimension to obtain aggregated multi-source features; and subjecting the aggregated multi-source features to feature compression and channel integration to obtain the multi-source fusion feature.

7. The method for predicting the underflow grade of a zinc roughing flotation cell according to claim 1, characterized in that the step of performing regression prediction based on the multi-source fusion feature and outputting the predicted underflow grade of the target zinc roughing cell specifically includes: performing global spatial aggregation on the multi-source fusion feature to obtain a compact feature vector; and inputting the compact feature vector into the multilayer perceptron regression head to output the predicted value of the underflow grade.

8. A device for predicting the underflow grade of a zinc roughing flotation cell, characterized in that it includes: a data acquisition module, used to acquire real-time foam images and corresponding real-time process parameters of the target zinc roughing cell, wherein the process parameters include at least one of slurry concentration, reagent dosage, aeration rate, liquid level, and flow rate; an appearance feature module, used to extract appearance features from the real-time foam image to obtain an appearance feature vector; a feature fusion module, used to extract global and local features from the real-time foam image to obtain deep fusion features; a feature map module, used to convert the apparent feature vector and the process parameters into an apparent feature map and a process parameter feature map that are aligned with the size of the deep fusion feature space, respectively; a feature enhancement module, used to use the deep fusion feature as the main branch feature and the appearance feature map and the process parameter feature map as the guiding branch features, respectively, and input them into the dynamic cross-attention fusion module for fusion to obtain appearance-guided enhancement features and process parameter-guided enhancement features; a multi-source fusion module, used to fuse the appearance-guided enhancement feature, the deep fusion feature, and the process parameter-guided enhancement feature to obtain the multi-source fusion feature; and a regression prediction module, used to perform regression prediction based on the multi-source fusion feature and output the predicted value of the underflow grade of the target zinc roughing cell.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the zinc roughing flotation cell underflow grade prediction method as described in any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the zinc roughing flotation cell underflow grade prediction method as described in any one of claims 1 to 7.