A high-temperature melt liquid level detection method and system based on multi-modal image fusion

By employing infrared-visible image fusion technology, combined with feature registration and a hybrid CNN-Mamba network, the accuracy and stability issues of high-temperature melt level detection have been resolved, achieving high-precision, interference-resistant level detection suitable for complex industrial environments.

CN122199641A · Pending · Publication Date: 2026-06-12 · CENT SOUTH UNIV

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CENT SOUTH UNIV
Filing Date: 2026-05-18
Publication Date: 2026-06-12

Application Information

Patent Timeline

18 May 2026: Application
12 Jun 2026: Publication (CN122199641A)

IPC: G06T7/33; G06T7/73; G06T5/50; G06T7/80; G06V10/44; G06V10/42; G06V10/54; G06V10/22; G06V10/764; G06V10/26; G06V10/80; G06V10/82; G06N3/0464; G06N3/0442; G06N3/048; G06N3/09

AI Tagging

Application Domain: Image enhancement; Image analysis


AI Technical Summary

⚠Technical Problem

Existing methods for detecting the liquid level of high-temperature melts suffer from problems such as low measurement accuracy, poor environmental adaptability, and high maintenance costs. In particular, it is difficult to achieve high-precision and stable liquid level detection under complex and interfering working conditions.

⚗Method used

Infrared and visible light cameras acquire images simultaneously. Multimodal image fusion, combining a feature registration network with a hybrid CNN-Mamba fusion network, achieves accurate registration and fusion of the infrared and visible light images and eliminates the effects of dust, scattering, and uneven illumination. A liquid level edge recognition and localization model is then constructed, and the liquid level is calculated from the camera attitude and structural parameters.

🎯Benefits of technology

It achieves non-contact, high-precision, anti-interference, and long-term stable high-temperature melt level detection, and has high detection accuracy, strong environmental adaptability and long life.

✦ Generated by Eureka AI based on patent content.

Figure CN122199641A_ABST (abstract figure)

Abstract

The application provides a high-temperature melt liquid level detection method based on multi-modal image fusion. Infrared and visible light images are collected synchronously, and a bidirectional cyclic consistency alignment method completes modal registration, effectively eliminating cross-modal differences and image misalignment. To address complex interference such as dust, scattering, and uneven illumination at smelting sites, a hybrid CNN-Mamba fusion framework is constructed to generate high-quality fused images that clearly show the liquid level and the slot edge. On this basis, a model for accurate edge identification and pixel distance calculation is established and, combined with a tilt angle correction method for the obliquely mounted camera, converts the image pixel distance accurately into the true vertical height. The application thus addresses image feature blurring, strong dust interference, and measurement deviation caused by camera tilt under high-temperature working conditions, and realizes non-contact, high-precision, interference-resistant online detection of the high-temperature melt liquid level.


Description

Technical Field

[0001] This invention belongs to the field of automatic control technology for high-temperature industrial production processes, and specifically relates to a method and system for detecting the liquid level of high-temperature melts based on multimodal image fusion. It is particularly suitable for non-contact online detection of the liquid level of high-temperature melts (such as molten aluminum in an aluminum electrolysis cell or molten steel in a ladle) in open molten pools.

Background Technology

[0002] High-temperature melts are liquid or partially liquid systems formed when a substance is heated above its melting point and transforms from the solid state. They are common in high-temperature industrial processes in metallurgy, chemical engineering, and materials science, for example molten steel in ladles and molten aluminum in aluminum electrolysis cells. The melt level is a crucial process parameter that provides key information for optimizing and controlling process parameters during smelting. High-temperature melt level detection is therefore important both for safety in high-temperature industrial production and for product quality.

[0003] Based on whether the detection equipment contacts the high-temperature melt, existing level detection methods fall into two types: contact and non-contact. Contact methods include air-blowing detection, rod-type probe detection, and weighted detection, while non-contact methods mainly include laser ranging, radar, and infrared detection. In high-temperature environments, the vapors and fumes emitted by the melt form a mist along the laser path and around radar antennas that strongly absorbs and scatters the laser signals and radar waves, interfering with detection of the true level signal and increasing measurement error. Furthermore, the dielectric constant of the melt fluctuates with temperature, weakening the radar reflection signal and degrading detection stability. Infrared detection results, in turn, are affected by the surface temperature distribution of the melt, environmental radiation, atmospheric attenuation, and other factors, producing significant errors that make high-precision level detection difficult.

[0004] Chinese invention patent CN104198011A discloses a device and method for measuring the level of high-temperature molten metal in a metallurgical furnace. This patent employs a rod-type probe contact detection combined with encoder stroke measurement to measure the level of high-temperature molten metal in the furnace, and uses a PLC or DCS system to complete signal determination and level calculation. However, this patent does not fully consider the impact of high-temperature molten metal erosion, probe wear, and strong electromagnetic interference within the furnace on the stability of the contact measurement, thus limiting the reliability and lifespan of long-term continuous measurement.

[0005] Chinese invention patent CN101905303A discloses a method for detecting the liquid level of high-temperature molten metal using a laser sensor. This patent employs a laser displacement sensor aligned with the riser level setting point, and determines whether the high-temperature molten metal has reached the set level by detecting changes in the output voltage. The control system then stops the pouring process. However, this patent does not consider the interference of high-temperature molten metal radiation, smoke, and dust on the laser detection signal, making it difficult to achieve continuous, high-precision liquid level measurement; it can only achieve fixed-point liquid level determination.

[0006] Therefore, to address the technical defects of existing high-temperature melt level detection methods and systems for open molten pools, such as low measurement accuracy, poor environmental adaptability, and high maintenance cost, a high-temperature melt level detection method based on infrared-visible dual-modal image fusion is proposed. The method overcomes the interference that dust, high temperature, and strong light at the smelting site cause for single-modality non-contact image detection, and realizes accurate online detection of the high-temperature melt level.

Summary of the Invention

[0007] This invention aims to propose a method and system for detecting the level of high-temperature molten metal based on multimodal image fusion. The invention utilizes an infrared camera and a visible light camera to simultaneously acquire infrared and visible light images containing the high-temperature molten metal region. Through infrared-visible light image fusion technology, it fully integrates the target thermal radiation characteristics of the infrared image and the texture details of the visible light image, effectively improving the feature recognition of the target region in complex environments. Based on this, a liquid level edge recognition and localization model based on image fusion features is established to eliminate the influence of dust, scattering, and uneven illumination on edge detection, achieving accurate identification of liquid level edges under complex interference conditions. Furthermore, by mapping the image pixels to the actual physical dimensions, combined with the camera attitude and aluminum electrolysis cell structural parameters, an online liquid level calculation model is constructed to complete the detection of the high-temperature molten metal level in an open molten pool. The high-temperature molten metal level detection method and system based on multimodal image fusion proposed in this invention can achieve non-contact, high-precision, strong anti-interference, and long-term stable liquid level detection, with advantages such as high detection accuracy, strong environmental adaptability, safety and reliability, and long service life.

[0008] The specific technical solution is as follows:

[0009] A method for detecting the liquid level of high-temperature melt based on multimodal image fusion includes the following steps:

[0010] S1. Acquire images containing high-temperature melt and the area at the opening of the aluminum electrolysis cell, and complete the accurate registration of the two modal images through a bidirectional cyclic consistency alignment method;

[0011] S2. Construct a dual-module framework that includes a feature registration network and a hybrid CNN-Mamba fusion network. After processing the registration features, generate a fused image with clearly displayed edges.

[0012] S3. In the generated fused image, identify the liquid level edge and the groove edge, count the number of pixels between them, and calculate the distance between them in the image;

[0013] S4. Apply tilt correction for the angle between the camera's optical axis and the horizontal direction and, combined with the camera parameters, convert the distance in the image into the actual vertical height to obtain the high-temperature melt level.

[0014] Further, step S1 specifically includes:

[0015] S11. Use an infrared camera and a visible light camera to simultaneously acquire target images including the high-temperature melt and the slot area. The two cameras are triggered synchronously to ensure that the acquired infrared images and visible light images are completely matched in the time dimension.

[0016] S12. During the acquisition process, the infrared camera captures the infrared features of the high-temperature melt and the edge of the groove; the visible light camera acquires detailed information including the edge of the groove and the surface texture of the melt.

[0017] S13. After acquisition, the infrared image and the visible light image are transmitted and stored, and the image is processed by the feature registration network. The feature registration network learns the pixel-level correspondence through the modal alignment module by using a bidirectional cyclic consistency alignment method, while mitigating cross-modal differences.

[0018] Furthermore, the hybrid CNN-Mamba fusion network in step S2 includes:

[0019] Gated residual blocks: residual convolutional blocks extract local texture features, and a gating mechanism based on global context adaptively fuses the visible light and infrared features;

[0020] Mamba fusion blocks: multilayer perceptrons and bidirectional state-space models model global semantic features and capture long-range cross-modal dependencies;

[0021] Gated fusion module: gating weights generated by a fully connected layer and a sigmoid activation function adaptively fuse the local and global features to generate the fused image.

[0022] Furthermore, step S3 specifically includes:

[0023] S31. The Canny edge detection algorithm is used to extract the edges of the fused image generated in step S2, the optimal threshold parameter is determined, and the complete contours of the liquid level edge and the groove edge are extracted.

[0024] S32. Perform pixel coordinate localization on the extracted liquid level edge and slot edge. Based on the image pixel coordinate system, collect the coordinate information of all pixels on the liquid level edge and slot edge respectively, and construct a set of liquid level edge pixel coordinates:

[0025] $P_L = \{(x_i^L, y_i^L) \mid i = 1, 2, \dots, n\}$

[0026] And the set of pixel coordinates of the slot edge:

[0027] $P_S = \{(x_j^S, y_j^S) \mid j = 1, 2, \dots, m\}$

[0028] Where n and m represent the total number of pixels on the two edge contours, and x and y correspond to the horizontal and vertical coordinates of each pixel in the image, respectively.

[0029] S33. Based on the actual diameter D of the slot, measure the number of pixels occupied by the slot diameter in the fused image. Calculate the actual length represented by each pixel:

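[0030] $k = \dfrac{D}{N_D}$, where $N_D$ is the number of pixels occupied by the slot diameter in the fused image.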

[0031] S34. Calculate the pixel distance between the liquid level edge and the groove edge as the shortest Euclidean distance between the two edge pixel sets, and average the result over multiple measurements for calibration:

[0032] $d = \min\limits_{1 \le i \le n,\ 1 \le j \le m} \sqrt{(x_i^L - x_j^S)^2 + (y_i^L - y_j^S)^2}$

[0033] where $(x_i^L, y_i^L)$ are the horizontal and vertical coordinates of the $i$-th pixel on the liquid level edge, and $(x_j^S, y_j^S)$ are those of the $j$-th pixel on the groove edge.
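Steps S32 to S34 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the binary edge maps stand in for the Canny output of S31, the function names are hypothetical, and the 400 mm / 200 px slot figures are made-up values used only to show the arithmetic $k = D / N_D$ and the shortest pairwise Euclidean distance.

```python
import numpy as np

def edge_coordinates(edge_map: np.ndarray) -> np.ndarray:
    """(x, y) coordinates of every nonzero pixel in a binary edge map, shape (N, 2)."""
    ys, xs = np.nonzero(edge_map)
    return np.stack([xs, ys], axis=1).astype(float)

def length_per_pixel(slot_diameter: float, slot_diameter_px: float) -> float:
    """S33: physical length represented by one pixel, k = D / N_D."""
    return slot_diameter / slot_diameter_px

def shortest_distance(level_pts: np.ndarray, slot_pts: np.ndarray) -> float:
    """S34: minimum Euclidean distance over all (liquid-level, slot-edge) pixel pairs."""
    diff = level_pts[:, None, :] - slot_pts[None, :, :]  # (n, m, 2) pairwise offsets
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())

def averaged_distance(measurements) -> float:
    """Average the shortest distance over repeated (level, slot) edge measurements."""
    return float(np.mean([shortest_distance(a, b) for a, b in measurements]))

# Toy edge maps (hypothetical): slot edge on row 2, liquid-level edge on row 6.
slot_map = np.zeros((10, 10), dtype=np.uint8)
slot_map[2, 2:8] = 1
level_map = np.zeros((10, 10), dtype=np.uint8)
level_map[6, 2:8] = 1

level_pts = edge_coordinates(level_map)
slot_pts = edge_coordinates(slot_map)
d_px = averaged_distance([(level_pts, slot_pts), (level_pts, slot_pts)])
k = length_per_pixel(400.0, 200.0)  # hypothetical: a 400 mm slot spans 200 px
print(d_px, d_px * k)               # 4.0 8.0
```

The all-pairs distance matrix is $O(nm)$ in memory; for long contours a KD-tree query would be the usual substitute, but the result is the same minimum.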

[0034] Further, step S4 specifically includes:

[0035] S41. Camera tilt calibration

[0036] A tilt sensor is rigidly connected to the camera to synchronously acquire the real-time tilt angle of the camera at the smelting site, thereby obtaining the angle between the camera's optical axis and the vertical direction. The measurement error is reduced by taking multiple measurements and averaging them, while the ambient temperature and vibration parameters are recorded during the calibration process.

[0037] S42. Camera internal and external parameter calibration

[0038] The camera is calibrated using Zhang Zhengyou's calibration method to obtain the camera intrinsic parameter matrix and extrinsic matrix. The intrinsic parameter matrix includes the camera focal length, pixel size, and image principal point coordinates; the extrinsic matrix includes the camera rotation matrix and translation vector;

[0039] S43. Parameter Verification and Calibration

[0040] Substitute the calibrated internal and external parameters into the standard test scenario, and verify the accuracy of the parameters by comparing the pixel distance of the target with the known real distance in the image. If the error exceeds 0.5%, recalibrate until the parameters meet the detection requirements.
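The averaging in S41 and the 0.5% acceptance check in S43 reduce to simple arithmetic. The sketch below assumes the check compares a single measured distance against a single known reference distance; the readings and distances are hypothetical values chosen for illustration.

```python
import statistics

def mean_tilt_deg(readings_deg):
    """S41: average repeated tilt-sensor readings (degrees) to reduce random error."""
    return statistics.fmean(readings_deg)

def relative_error(measured: float, reference: float) -> float:
    """Relative deviation of a measured distance from the known true distance."""
    return abs(measured - reference) / abs(reference)

def parameters_acceptable(measured: float, reference: float, tol: float = 0.005) -> bool:
    """S43: accept the calibration only if the relative error is at most 0.5 %."""
    return relative_error(measured, reference) <= tol

theta = mean_tilt_deg([30.1, 29.9, 30.0])    # hypothetical sensor readings
ok = parameters_acceptable(100.3, 100.0)     # 0.3 % error: accept
retry = parameters_acceptable(100.8, 100.0)  # 0.8 % error: recalibrate
```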

[0041] S44. Convert the image distance to the actual vertical height and obtain the liquid level parameters.

[0042] A model for the true vertical height H is constructed from the angle between the camera's optical axis and the vertical direction, the intrinsic and extrinsic parameters, and the distance in the image. First, the image pixel distance is converted into a distance in the camera coordinate system using the camera intrinsic parameters:

[0043]

[0044] Then, using the angle between the camera's optical axis and the vertical direction, the distance in the camera coordinate system is converted into the actual vertical height. Because the camera's optical axis forms an angle with the horizontal direction, the distance projected in the image has an angular deviation that must be corrected with a cosine function. The final formula is:

[0045]

[0046] Given the known height $H_0$ of the electrolytic cell, the liquid level $h$ is calculated as:

[0047] $h = H_0 - H$.
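The conversion in S44 can be sketched as below. Because the source's own formulas are not reproduced here, every step is an assumption: a pinhole back-projection (pixel count × pixel size × object distance / focal length), a foreshortening correction in which a vertical segment viewed along an axis tilted by α from horizontal appears with length $H\cos\alpha$, and a level computed as cell height minus $H$. All numbers and function names are hypothetical.

```python
import math

def image_to_metric(d_px: float, pixel_size: float, focal_length: float, depth: float) -> float:
    """Pinhole back-projection (assumption): image length -> length on a plane at `depth`.

    pixel_size and focal_length share units (e.g. mm); the result has depth's units.
    """
    return d_px * pixel_size * depth / focal_length

def vertical_height(d_metric: float, axis_to_vertical_deg: float) -> float:
    """Undo foreshortening of a vertical segment (assumption).

    alpha = angle between optical axis and horizontal; the segment appears with
    length H * cos(alpha), so H = d / cos(alpha).
    """
    alpha = math.radians(90.0 - axis_to_vertical_deg)
    return d_metric / math.cos(alpha)

def liquid_level(cell_height: float, rim_to_surface: float) -> float:
    """Assumed reading: level above the cell bottom = cell height - H."""
    return cell_height - rim_to_surface

d_c = image_to_metric(100, 0.005, 25.0, 2000.0)  # hypothetical: 100 px, 5 um pixels, 25 mm lens, 2 m range
H = vertical_height(d_c, 30.0)                   # optical axis 30 deg from vertical
h = liquid_level(600.0, H)                       # hypothetical 600 mm cell height
```

With these inputs the metric distance is 40 mm, the corrected height 80 mm, and the level 520 mm. Note that the correction diverges as the axis approaches vertical, which is why an oblique mounting is required for this kind of measurement.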

[0048] The present invention also provides a high-temperature melt level detection system based on multimodal image fusion, the system comprising:

[0049] Infrared and visible light cameras are mounted on the same bracket to simultaneously acquire infrared and visible light images of the high-temperature melt and the groove area.

[0050] Data transmission line, used to transmit the acquired infrared and visible light images to the computer system;

[0051] A computer system for performing the method described in any one of claims 1 to 5, performing image registration, fusion, edge recognition, distance calculation, and liquid level conversion.

[0052] Furthermore, the computer system includes:

[0053] The registration module is used to achieve bidirectional cyclic consistency alignment.

[0054] The fusion module is a dual-module framework that includes a feature registration network and a hybrid CNN-Mamba fusion network to generate fused images;

[0055] The recognition module is used to extract the liquid level edge and the groove edge and calculate the pixel distance;

[0056] The conversion module is used for tilt correction and actual height calculation.

[0057] Compared with the prior art, the present invention has at least the following beneficial effects:

[0058] 1. A multimodal image registration model based on bidirectional cyclic consistency alignment was constructed, which automatically completes the precise pixel-level alignment of infrared and visible light images, effectively eliminating cross-modal differences and image misalignment problems, and laying a precise foundation for subsequent fusion processing;

[0059] 2. A dual-module fusion framework combining the feature registration network and the hybrid CNN-Mamba fusion network was designed. The CNN captures local texture, the Mamba model captures global structure, and the gating mechanism suppresses redundant information, generating a high-quality fused image that clearly shows the liquid level edge and the groove edge;

[0060] 3. A model integrating accurate image edge recognition and pixel distance calculation was constructed. Canny edge detection was used to extract target edges, and the pixel distance between the liquid level and the edge of the tank was calculated using the Euclidean distance shortest method. At the same time, a mapping relationship between pixels and real physical size is established to achieve accurate quantization of image distance;

[0061] 4. A tilt correction and true height conversion model was constructed for a camera whose optical axis is inclined to the horizontal. Using the angle between the camera's optical axis and the vertical direction together with the intrinsic and extrinsic parameters, it accurately converts the image pixel distance into the true vertical height between the liquid level and the tank opening, enabling accurate online detection of the high-temperature melt level in an open molten pool;

[0062] 5. A method and system for online detection of the high-temperature melt level based on infrared-visible light image fusion are proposed, realizing online detection of the high-temperature melt level under complex interference such as dynamic dust and high-temperature radiation.

Description of the Drawings

[0063] Figure 1 is a flowchart of the high-temperature melt level detection method based on multimodal image fusion according to the present invention;

[0064] Figure 2 is a schematic diagram of the high-temperature melt level detection system based on multimodal image fusion according to the present invention;

[0065] Figure 3 is a graph showing the results of applying the present invention to detection of the high-temperature aluminum liquid level in an aluminum electrolysis cell.

Detailed Description of the Embodiments

[0066] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of the application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this invention. All other embodiments obtained by persons skilled in the art on the basis of this invention without creative effort fall within the protection scope of this invention.

[0067] Example 1:

[0068] As shown in Figure 1, this embodiment provides a method for detecting the liquid level of high-temperature melt based on multimodal image fusion; the specific steps are as follows:

[0069] S1. Acquire infrared and visible light images of the high-temperature melt and achieve image registration through a bidirectional cyclic consistency alignment method.

[0070] This invention employs an infrared camera and a visible light camera to simultaneously acquire target images containing the high-temperature melt and the groove area. The two cameras must maintain strict synchronous triggering to ensure complete temporal matching between the acquired infrared and visible light images, avoiding image misalignment caused by asynchronous acquisition. During acquisition, the infrared camera focuses on capturing the infrared features of the high-temperature melt and the groove edge; the visible light camera clearly captures details such as the groove edge and melt surface texture, providing support for subsequent edge recognition. After acquisition, the infrared and visible light images are transmitted to a computer in the monitoring room via a dedicated fiber optic network for storage and subsequent image processing. The feature registration network is driven by a bidirectional cyclic consistency strategy, learning pixel-level correspondences through a modal alignment module while mitigating cross-modal differences.

[0071] a) Modal alignment module

[0072] The modality alignment (MA) module performs learnable alignment from the source modality to the target modality. It transforms the source modality features based on statistics of the target modality, reducing distribution mismatch and promoting consistent feature fusion. The MA module first computes the per-channel mean and standard deviation of the source modality features. These statistics are then modulated by learnable multilayer perceptrons (MLPs) to generate content-adaptive alignment parameters. Finally, the source features are normalized with their own statistics and re-normalized with the learned statistics aligned to the target modality, as follows:

[0073]

[0074]

[0075]

[0076] Here, the two feature inputs are the source modality features and the target modality features.

[0077] To further enhance the local structural representation, the aligned source features are subjected to residual enhancement transformation through 3×3 convolution:

[0078]

[0079] where the output is the final aligned source feature.
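The MA module's normalize-then-re-normalize step resembles adaptive instance normalization and can be sketched in NumPy as follows. This is an illustration under assumptions, not the patent's network: `mlp` is a hypothetical callable that maps the source statistics to alignment parameters, and in the demo it is replaced by an "oracle" that returns the target statistics directly, which shows what a perfectly trained MLP would achieve. The residual 3×3 convolution of [0077] is included with caller-supplied weights.

```python
import numpy as np

def channel_stats(feat: np.ndarray, eps: float = 1e-5):
    """Per-channel mean and standard deviation of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    return mu, sigma

def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Zero-padded 3x3 convolution (cross-correlation); w has shape (C_out, C_in, 3, 3)."""
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out

def modality_align(src, mlp, w_res, eps=1e-5):
    """Normalize src with its own stats, re-normalize with MLP-predicted stats, add a 3x3 residual."""
    mu_s, sigma_s = channel_stats(src, eps)
    mu_hat, sigma_hat = mlp(mu_s, sigma_s)  # content-adaptive alignment parameters
    aligned = sigma_hat * (src - mu_s) / sigma_s + mu_hat
    return aligned + conv3x3(aligned, w_res)

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (4, 8, 8))           # e.g. infrared features
tgt = rng.normal(3.0, 2.0, (4, 8, 8))           # e.g. visible features
mu_t, sigma_t = channel_stats(tgt)
oracle_mlp = lambda mu, sigma: (mu_t, sigma_t)  # stand-in for a trained MLP (hypothetical)
out = modality_align(src, oracle_mlp, np.zeros((4, 4, 3, 3)))
```

With zero residual weights the output carries the target's per-channel mean and standard deviation while keeping the source's spatial pattern.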

[0080] b) Dense Flow Estimation Module

[0081] The dense flow estimation (DFE) module takes the features modulated by the MA module as input and performs fine-grained alignment on multi-scale features. Notably, the DFE module processes the input in both directions by swapping the order of its two inputs. Taking one direction as an example, the DFE module consists of a bottleneck layer and three progressive alignment layers. The bottleneck layer processes features at 1/16 scale and contains a global correlation block and a flow predictor block, which capture global feature correspondences and generate a coarse flow field. The progressive alignment layers process features at 1/8, 1/4, and 1/2 scale; each consists of a local correlation block, a flow predictor block, and a flow refinement block, and progressively refines the flow field. The flow predictor and flow refinement blocks consist of 4 and 2 convolutional layers, respectively.

[0082] The two inputs are defined as input pyramid feature maps. At the lowest (1/16) resolution, the global correlation block computes the global correlation volume:

[0083]

[0084] where the two indices denote the corresponding spatial positions.

[0085] The correlation volume is then fed into the flow predictor and upsampling module to generate a coarse flow field. At the 1/8 scale, the upsampled flow is used to geometrically warp the features, producing warped features. The local correlation volume is then computed as follows:

[0086]

[0087] where the displacement relative to the reference position is constrained within the search radius R. Next, the local correlation volume is reshaped to match the spatial dimensions of the features and concatenated with them, and the result is fed into the flow predictor block to estimate the residual flow field. Finally, the upsampled flow and the residual flow are concatenated and fed into the flow refinement block to obtain the final refined flow field. The 1/4 and 1/2 scales follow the same procedure as the 1/8 scale.
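The two correlation blocks of the DFE module can be illustrated with plain dot products between feature vectors. This sketch assumes a $1/\sqrt{C}$ scaling (common in flow networks but not stated in the source), a square search window of radius R, and zero padding outside the image; the function names are hypothetical.

```python
import numpy as np

def global_correlation(f_src: np.ndarray, f_tgt: np.ndarray) -> np.ndarray:
    """All-pairs correlation at the coarsest scale, shape (H*W, H*W); 1/sqrt(C) scaling assumed."""
    C = f_src.shape[0]
    a = f_src.reshape(C, -1)
    b = f_tgt.reshape(C, -1)
    return a.T @ b / np.sqrt(C)

def local_correlation(f_ref: np.ndarray, f_warped: np.ndarray, radius: int) -> np.ndarray:
    """Correlation over displacements with |dx|, |dy| <= R, shape ((2R+1)**2, H, W)."""
    C, H, W = f_ref.shape
    r = radius
    pad = np.pad(f_warped, ((0, 0), (r, r), (r, r)))  # zeros outside the image
    volumes = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = pad[:, r + dy:r + dy + H, r + dx:r + dx + W]  # f_warped[y+dy, x+dx]
            volumes.append((f_ref * shifted).sum(axis=0) / np.sqrt(C))
    return np.stack(volumes)

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 5, 6))
g_corr = global_correlation(f, f)
l_corr = local_correlation(f, f, radius=2)
```

The global volume is quadratic in the number of pixels, which is why it is only affordable at 1/16 scale; the local volume is linear in pixels and is used at the finer scales.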

[0088] c) Bidirectional cyclic consistency alignment strategy

[0089] The bidirectional cyclic consistency alignment strategy comprises forward and backward alignment paths. By explicitly modeling the spatial relationship between each direct path and the corresponding indirect path, it constrains the geometric consistency of the two registration directions, improving the robustness and accuracy of cross-modal image alignment. Specifically, for a pair of misaligned real-world images, a random augmentation deformation is applied to each image to generate the corresponding auxiliary images:

[0090]

[0091] where the deformation is a random spatial transformation. On this basis, image triplets are constructed to form closed-loop spatial alignment paths in the forward and backward registration directions; one such triplet is used as the example below.
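The source does not specify the form of the random spatial deformation used to create the auxiliary images. As a hypothetical stand-in, the sketch below draws a near-identity random affine map about the image centre and returns it as a dense displacement field, the same representation the registration network predicts.

```python
import numpy as np

def random_affine_flow(h: int, w: int, strength: float = 0.05, rng=None) -> np.ndarray:
    """Displacement field (2, H, W) of a random near-identity affine map about the image centre."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.eye(2) + strength * rng.standard_normal((2, 2))    # scale/shear/rotation jitter
    t = strength * rng.standard_normal(2) * np.array([w, h])  # translation jitter in pixels
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    px, py = xs - cx, ys - cy
    new_x = A[0, 0] * px + A[0, 1] * py + cx + t[0]
    new_y = A[1, 0] * px + A[1, 1] * py + cy + t[1]
    return np.stack([new_x - xs, new_y - ys])                 # channel 0: dx, channel 1: dy

flow = random_affine_flow(8, 10, rng=np.random.default_rng(1))
identity = random_affine_flow(8, 10, strength=0.0)
```

Because the applied deformation is known exactly, it can serve as ground truth for the direct path of each triplet; smoother non-rigid fields (e.g. upsampled random noise) would be an equally plausible choice.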

[0092] Forward consistency path: the forward consistency path is the direct spatial alignment from an image to its augmented version, which should be consistent with the indirect path through the intermediate image. In practice, the registration network predicts a flow field representing the direct spatial transformation from the original image to the augmented image. It also predicts the two flows along the indirect path, which are combined into an indirect transformation. Because flow fields are nonlinear, the composition is performed by forward sampling, in which the sampling locations of the second flow are given by the first. This can be expressed as:

[0093]

[0094] where the operator denotes flow-field composition. Ideally, the direct and indirect spatial alignment paths are consistent, indicating that the network has fully learned the spatial correspondences between the multimodal images. To achieve this, a forward consistency loss is designed to supervise network training:
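Flow composition by sampling can be sketched as follows: the second flow is bilinearly sampled at the positions reached by the first and added to it, $u_{ac}(p) = u_{ab}(p) + u_{bc}(p + u_{ab}(p))$. The endpoint-error term comparing the direct and composite flows is a hypothetical example of a flow-consistency penalty; the source's exact loss terms are not reproduced here.

```python
import numpy as np

def bilinear_sample(field: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Sample a (C, H, W) field at float pixel coordinates (x, y), clamped to the border."""
    _, H, W = field.shape
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = field[:, y0, x0] * (1 - wx) + field[:, y0, x1] * wx
    bottom = field[:, y1, x0] * (1 - wx) + field[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def compose_flows(flow_ab: np.ndarray, flow_bc: np.ndarray) -> np.ndarray:
    """A->C flow from A->B then B->C: u_ac(p) = u_ab(p) + u_bc(p + u_ab(p))."""
    _, H, W = flow_ab.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    return flow_ab + bilinear_sample(flow_bc, xs + flow_ab[0], ys + flow_ab[1])

def flow_consistency(direct: np.ndarray, composite: np.ndarray) -> float:
    """Mean endpoint error between the direct flow and the composed indirect flow."""
    return float(np.mean(np.sqrt(((direct - composite) ** 2).sum(axis=0))))

H, W = 6, 7
u1 = np.stack([np.full((H, W), 1.0), np.full((H, W), 0.5)])
u2 = np.stack([np.full((H, W), 2.0), np.full((H, W), -0.5)])
u_comp = compose_flows(u1, u2)  # constant flows simply add
loss = flow_consistency(u1 + u2, u_comp)
```

For constant flows the composition reduces to a sum, so a direct flow equal to that sum gives zero loss; for spatially varying flows the sampling step is what makes composition differ from addition.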

[0095]

[0096] where the two warped terms denote the samples warped by the corresponding flows. For notational simplicity, the subscript is omitted in the subsequent derivation. The first term promotes convergence and tends to yield smooth solutions, while the second term learns realistic motion patterns and appearance changes.

[0097] Backward consistency path: to enhance the invertibility of the spatial transformations, backward consistency paths are constructed to learn more accurate spatial correspondences. Analogous to forward consistency, the direct spatial correspondence from the augmented image back to the original image should be consistent with the corresponding indirect path. Specifically, the registration network first predicts the direct flow and then predicts the two flows that form the indirect composite flow; this composition also performs coordinate transformation through pixel sampling. The backward consistency loss is defined as follows:

[0098]

[0099] where the warped terms denote the samples warped by the corresponding flows, and the reverse flow maps coordinates from the target domain back to the source domain.

[0100] Probabilistic modeling: accurately modeling spatial deformation is inherently difficult because of modal differences, which often lead to ambiguous correspondences between images. To better capture the uncertainty in the registration process, the bidirectional consistency constraint is reformulated as a probabilistic model, enhancing modal robustness in complex scenes. For clarity, the forward path is used as the example; the backward path is formulated analogously. In the forward consistency loss, both the direct flow and the composite flow are modeled as probability distributions. Specifically, the flow field is assumed to follow a two-dimensional Gaussian distribution, which can be represented as:

[0101]

[0102] where the two parameters are the mean and variance of the flow field, respectively, and N denotes the standard normal distribution with zero mean and unit variance. Then, following the principle of maximum likelihood estimation, the negative log-likelihood loss of the flow field under the Gaussian model is minimized:

[0103]

[0104] which gives the loss function after probabilistic modeling.

[0105] Accurately modeling the probability distribution of the composite flow is challenging. A conditional independence assumption is therefore introduced, under which the two constituent flows are treated as independent random variables:

[0106]

[0107] Under this assumption, the composite flow also follows a two-dimensional Gaussian distribution, and its mean and covariance are the sum of the means and covariances of the individual flows:

[0108]

[0109] Based on the above equation, the formula for the negative log-likelihood loss of the composite flow field is as follows:

[0110]
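The probabilistic terms can be sketched with a diagonal Gaussian per pixel and per flow component. The diagonal covariance, the choice of scoring a reference flow under the predicted Gaussian, and the function names are assumptions for illustration; the sketch also ignores the resampling of the second flow's statistics onto the first flow's grid.

```python
import numpy as np

def sample_flow(mean, var, rng):
    """Reparameterised sample u = mu + sigma * eps with eps ~ N(0, I)."""
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)

def gaussian_nll(target, mean, var, eps=1e-6):
    """Per-pixel NLL of a 2-D flow under a diagonal Gaussian (constant log(2*pi) term dropped)."""
    var = var + eps
    return float(0.5 * np.mean(np.sum(np.log(var) + (target - mean) ** 2 / var, axis=0)))

def composite_stats(mean_1, var_1, mean_2, var_2):
    """Independence assumption: the composite flow's mean and variance are the sums."""
    return mean_1 + mean_2, var_1 + var_2

H, W = 4, 5
zeros, ones = np.zeros((2, H, W)), np.ones((2, H, W))
mu_c, var_c = composite_stats(zeros, 0.5 * ones, ones, 0.5 * ones)  # N(1, 1) per component
nll = gaussian_nll(ones, mu_c, var_c)  # target equals the mean: only the log-variance term remains
u = sample_flow(mu_c, var_c, np.random.default_rng(0))
```

The log-variance term stops the network from inflating the variance everywhere, while the squared-error term is down-weighted where the predicted variance is large, which is how ambiguous cross-modal regions contribute less to the loss.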

[0111] Finally, the forward and backward consistency losses are combined to form the overall consistency constraint term:

[0112]

[0113] S2. Construct a dual-module framework, process registration features, and generate high-quality fused images.

[0114] A hybrid CNN-Mamba fusion network framework was constructed, focusing on deep processing of the alignment features obtained during the registration stage. The fusion network leverages the local perception advantages of the CNN module to accurately capture local texture details and feature information of the liquid level and trough edges in the image; it utilizes the efficient global modeling capabilities of the Mamba module to fully mine the global structural features of the image, achieving an organic fusion of local texture details and global structural information. Subsequently, a gated fusion block adaptively combines features from the two branches, balancing local and global information. The final result is a high-quality fused image that clearly displays the liquid level edges and trough edges.

[0115] a) Hybrid CNN-Mamba module

[0116] Gated residual blocks: in the CNN branch, given the aligned visible and infrared features, residual convolutional blocks (ResBlocks) first extract local features:

[0117]

[0118] where Res denotes a ResBlock. The extracted features are then adaptively fused by a gating mechanism based on global context:

[0119]

[0120] Here, the weight is dynamically predicted by two convolutional layers to adaptively balance the contributions of the visible and infrared features.
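The gating step can be sketched in Python with plain lists, under several assumptions that the excerpt does not confirm. The global context is taken to be the global average of each channel of both modalities. The two convolutional layers are reduced to two linear maps on that pooled vector (a 1×1 convolution on a 1×1 map is a linear layer), with a ReLU between them. The fusion is assumed to take the convex form w·F_vis + (1 − w)·F_ir per channel. The ResBlocks are omitted, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]   # rows x cols weight matrix
Feature = List[List[float]]  # channels x flattened spatial positions


def _linear(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    # A 1x1 convolution applied to a pooled (1x1 spatial) map is just a linear layer.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_fuse(f_vis: Feature, f_ir: Feature,
               w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]) -> Feature:
    """Fuse visible and infrared features with a channel-wise gate from global context."""
    # Global context: average of each channel of both modalities, concatenated.
    ctx = [sum(ch) / len(ch) for ch in f_vis] + [sum(ch) / len(ch) for ch in f_ir]
    hidden = [max(0.0, h) for h in _linear(w1, b1, ctx)]                 # layer 1 + ReLU
    gate = [1.0 / (1.0 + math.exp(-g)) for g in _linear(w2, b2, hidden)]  # layer 2 + sigmoid
    # Per channel: gate -> visible share, (1 - gate) -> infrared share.
    return [
        [g * v + (1.0 - g) * i for v, i in zip(ch_v, ch_i)]
        for g, ch_v, ch_i in zip(gate, f_vis, f_ir)
    ]


# One channel, two positions; zero weights give gate = 0.5, i.e. a plain average.
print(gated_fuse([[1.0, 3.0]], [[3.0, 5.0]], [[0.0, 0.0]], [0.0], [[0.0]], [0.0]))
```

With trained weights the gate moves toward 1 where the visible image is informative and toward 0 where the infrared image dominates, which is the balancing role the text attributes to the predicted weight.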

[0121] Mamba Fusion Block: Mamba excels at modeling long sequences and capturing global context. Building on this capability, a Mamba fusion module is proposed to model cross-modal semantic relationships. Specifically, global semantic features are first extracted using an MLP, and then processed using a 1×1 convolution.

[0122]

[0123] The sequences of the two modalities are then concatenated and input into the SSM, where a bidirectional scanning strategy is employed to capture the long-term dependencies between the two modalities. This process can be described as follows:

[0124]

[0125] Here, the concatenation operator joins the two modality sequences. In parallel, the activated features are fused element-wise with the global features:

[0126]

[0127] Where Act represents the activation function. The final output of the Mamba branch is calculated as follows:

[0128]
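The effect of concatenating the two modality sequences and scanning in both directions can be seen with a deliberately simplified state-space model. The Python sketch below uses a scalar, time-invariant recurrence h_t = a·h_(t−1) + b·x_t, y_t = c·h_t. This is not Mamba's selective scan, whose parameters are input-dependent and which runs a hardware-aware parallel scan. The names and the default values of a, b, and c are hypothetical.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_cross_modal_scan(seq_vis: List[float], seq_ir: List[float],
                                   a: float = 0.5, b: float = 1.0,
                                   c: float = 1.0) -> List[float]:
    """Concatenate two modality token sequences and scan them in both directions."""
    seq = seq_vis + seq_ir  # concatenation along the token axis
    forward = linear_scan(seq, a, b, c)
    backward = linear_scan(seq[::-1], a, b, c)[::-1]  # reverse, scan, reverse back
    return [f + r for f, r in zip(forward, backward)]


# Visible token first, infrared token second.
print(bidirectional_cross_modal_scan([0.0], [1.0]))  # [0.5, 2.0]
```

With a forward-only scan the visible token (index 0) would output 0.0 and never see the infrared token. The backward pass supplies that dependency (the 0.5), which is why a bidirectional strategy is used to capture dependencies between the two modalities in both directions.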

[0129] b) Gated fusion module

[0130] The gated fusion module learns the relative importance of local and global features. It enhances the consistency of the overall structure while preserving fine details, so that the fused image has high visibility and a better cross-modal semantic representation. The module takes the local CNN features and the global Mamba features as input. The two features are first concatenated and passed through two 1×1 convolutions for an initial fusion of local and global information. The result is then processed by a fully connected layer followed by sigmoid activation to generate gating weights, which adaptively control the contributions of the CNN and Mamba features to the fused image. The process can be formalized as follows:

[0131]

[0132] In the above equation, a fully connected layer is applied before Sigmoid, which denotes the sigmoid activation function.
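The steps above (concatenation, two 1×1 convolutions, a fully connected layer, sigmoid) can be sketched per pixel in Python. Because a 1×1 convolution acts independently at each pixel, one pixel's feature vector is enough to show the data flow. The sketch assumes that the gate combines the branches as g·F_cnn + (1 − g)·F_mamba and that no nonlinearity sits between the two convolutions; the excerpt confirms neither. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias)


def _affine(w: Matrix, b: List[float], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gated_local_global_fusion(f_cnn: List[float], f_mamba: List[float],
                              conv1: Layer, conv2: Layer, fc: Layer) -> List[float]:
    """Fuse local (CNN) and global (Mamba) feature vectors at a single pixel."""
    x = f_cnn + f_mamba   # channel concatenation: 2C values
    x = _affine(*conv1, x)  # first 1x1 conv == per-pixel linear map
    x = _affine(*conv2, x)  # second 1x1 conv
    gate = [1.0 / (1.0 + math.exp(-z)) for z in _affine(*fc, x)]  # FC + sigmoid
    # Assumed combination: gate weights the CNN branch, (1 - gate) the Mamba branch.
    return [g * l + (1.0 - g) * m for g, l, m in zip(gate, f_cnn, f_mamba)]


zero = ([[0.0, 0.0]], [0.0])  # C = 1: maps 2 channels to 1 and outputs 0
ident = ([[1.0]], [0.0])
print(gated_local_global_fusion([4.0], [2.0], zero, ident, ident))  # [3.0]
```

Applying the same function at every pixel reproduces the full feature map that is passed to the reconstruction layer.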

[0133] Finally, the fused features are fed into a reconstruction layer to generate the fused image. The reconstruction layer consists of two 3×3 convolutional layers that progressively integrate local textures and global structure, balancing detail against overall visual quality.

[0134] S3. Identify edges in the fused image, count the number of pixels, and calculate the distance on the image.

[0135] In the generated high-quality fused image, the edges of the liquid level and the groove are accurately identified, the number of pixels between them is counted, and the distance between them in the image is calculated. Specifically, it includes the following steps:

[0136] S31. The Canny edge detection algorithm is applied to the high-quality fused image generated in step S2. The optimal threshold parameters are determined by repeated tuning so that the complete contours of the liquid level edge and the groove edge are extracted. False edges, broken edges, and irrelevant background edges are removed, and two continuous, clear target edge contours are obtained.

[0137] S32. Perform pixel coordinate localization on the two extracted edge contours. Based on the image pixel coordinate system, collect the coordinate information of all pixels on the liquid level edge and the groove edge respectively, and construct the liquid level edge pixel coordinate set:

[0138]

[0139] and the groove edge pixel coordinate set:

[0140]

[0141] Where n and m represent the total number of pixels on the two edge contours, and x and y correspond to the horizontal and vertical coordinates of each pixel in the image.
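Building the two coordinate sets is a scan of a binary edge image. The Python sketch below assumes that the liquid level edge and the groove edge are already available as two separate binary masks (nonzero = edge pixel). The excerpt does not say how the two contours are separated after Canny, and the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def edge_pixel_coordinates(edge_map: List[List[int]]) -> List[Point]:
    """Collect (x, y) coordinates of all nonzero pixels in a binary edge map.

    x is the column index (horizontal) and y is the row index (vertical),
    following the usual image pixel coordinate convention.
    """
    return [
        (x, y)
        for y, row in enumerate(edge_map)
        for x, value in enumerate(row)
        if value
    ]


level_mask = [[0, 1], [0, 0], [1, 1]]  # hypothetical 3x2 mask
level_set = edge_pixel_coordinates(level_mask)
print(level_set, len(level_set))  # [(1, 0), (0, 2), (1, 2)] 3
```

The length of each list gives n (liquid level edge) and m (groove edge).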

[0142] S33. Given the true diameter D of the groove, measure the number of pixels spanned by the groove diameter in the fused image and calculate the actual length represented by each pixel:

[0143]

[0144] This coefficient is used to subsequently convert pixel distance into actual physical distance.
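The scale factor described in words above is the known diameter divided by its pixel count. The Python sketch below assumes millimetres as the unit. The function names and the numbers (a 2000 mm groove spanning 800 pixels) are hypothetical illustrations, not data from the source.

```python
def mm_per_pixel(true_diameter_mm: float, diameter_pixels: float) -> float:
    """Physical length represented by one pixel: D divided by the pixel count of D."""
    if diameter_pixels <= 0:
        raise ValueError("pixel count must be positive")
    return true_diameter_mm / diameter_pixels


def pixels_to_mm(distance_pixels: float, scale_mm_per_px: float) -> float:
    """Convert an image distance in pixels to millimetres with the scale factor."""
    return distance_pixels * scale_mm_per_px


k = mm_per_pixel(2000.0, 800.0)  # hypothetical: 2000 mm groove spans 800 px
print(k, pixels_to_mm(40, k))    # 2.5 100.0
```

A single scale factor is only valid for features at roughly the same depth and orientation as the reference diameter; the tilt of the camera is handled separately in step S4.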

[0145] S34. Count the number of pixels between the two edges and calculate the image distance. To ensure accuracy, the shortest distance method is used: all pixels in both coordinate sets are traversed, and the Euclidean distance between each pixel on the liquid level edge and each pixel on the groove edge is calculated as follows:

[0146]

[0147] Here, the two coordinate pairs in the formula are the horizontal and vertical coordinates of a pixel on the liquid level edge and of a pixel on the groove edge, respectively. To further improve reliability, the obtained distance is verified and calibrated to eliminate outliers caused by edge extraction deviations. Random errors are reduced by repeating the calculation and averaging the results, which ensures the accuracy of the image distance.
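The traversal in S34 is a brute-force nearest-pair search, sketched below in Python together with one possible way of carrying out the outlier removal and averaging. The outlier rule (keep values within 3 × the median absolute deviation of the median), the function names, and the sample values are assumptions for illustration; the source does not specify which outlier test is used.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]


def shortest_edge_distance(level_edge: Sequence[Point],
                           groove_edge: Sequence[Point]) -> float:
    """Minimum Euclidean distance over all pairs (O(n*m) brute-force traversal)."""
    return min(
        math.hypot(xl - xg, yl - yg)
        for xl, yl in level_edge
        for xg, yg in groove_edge
    )


def robust_average(distances: Sequence[float], k: float = 3.0) -> float:
    """Average repeated measurements after dropping outliers (median +/- k * MAD)."""
    values = list(distances)
    med = statistics.median(values)
    mad = statistics.median([abs(d - med) for d in values])
    if mad == 0:
        kept = [d for d in values if d == med] or values
    else:
        kept = [d for d in values if abs(d - med) <= k * mad]
    return statistics.fmean(kept)


print(shortest_edge_distance([(0, 0), (10, 0)], [(3, 4), (20, 20)]))  # 5.0
# Hypothetical repeated measurements; 30.0 mimics an edge-extraction glitch.
print(robust_average([10.0, 10.2, 9.8, 10.0, 30.0]))
```

The search costs O(n·m) distance evaluations. For long contours a spatial index such as a k-d tree would be faster, but the brute-force form mirrors the traversal the text describes.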

[0148] S4. For a camera whose optical axis is inclined to the horizontal, perform tilt correction and, using the camera parameters, convert the image distance into the actual vertical height to complete the detection.

[0149] For a camera whose optical axis is inclined to the horizontal, tilt correction and calibration are performed to obtain the angle between the optical axis and the vertical direction, together with the intrinsic and extrinsic parameters. The image distance is then converted into the actual vertical height to complete the detection of the high-temperature melt level. This involves the following steps:

[0150] S41. Camera tilt calibration

[0151] A high-precision tilt sensor is rigidly attached to the camera to synchronously acquire the camera's real-time tilt angle at the smelting site, in particular the angle (in degrees) between the camera's optical axis and the vertical direction. Measurement error is reduced by averaging multiple measurements, which keeps the tilt angle accuracy within ±0.1°. The ambient temperature and vibration parameters during calibration are also recorded for subsequent error compensation.
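Averaging repeated sensor readings and checking the result against the ±0.1° target can be sketched in Python. The sketch assumes that the standard error of the mean is the accuracy figure to compare with the tolerance. The source only states the target accuracy, not how it is checked, and the function name and readings are hypothetical.

```python
import statistics
from typing import Sequence


def average_tilt_angle(readings_deg: Sequence[float], tolerance_deg: float = 0.1) -> float:
    """Average repeated tilt-sensor readings and check their spread against a tolerance.

    Raises ValueError when the standard error of the mean exceeds the tolerance,
    signalling that more readings (or a steadier mount) are needed.
    """
    if len(readings_deg) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.fmean(readings_deg)
    sem = statistics.stdev(readings_deg) / len(readings_deg) ** 0.5
    if sem > tolerance_deg:
        raise ValueError(f"tilt estimate too noisy: SEM={sem:.3f} deg")
    return mean


print(average_tilt_angle([35.0, 35.1, 34.9, 35.0]))  # hypothetical readings, about 35.0
```

Temperature and vibration compensation, which the text mentions, is not modeled here.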

[0152] S42. Calibration of intrinsic and extrinsic camera parameters

[0153] The camera is calibrated using Zhang Zhengyou's calibration method to obtain the intrinsic parameter matrix and the extrinsic matrix. The intrinsic matrix contains the camera focal length, the pixel sizes, and the principal point coordinates. The extrinsic matrix contains the camera rotation matrix and translation vector, which describe the position and attitude of the camera in the world coordinate system. During calibration, calibration boards are placed at multiple angles and positions to ensure the generality and accuracy of the calibrated parameters.
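The parameters listed above fit into the standard pinhole model s·[u, v, 1]ᵀ = K(R·X + t). The Python sketch below builds a zero-skew intrinsic matrix K from the focal length and pixel size and projects a world point. It does not perform Zhang's calibration itself; in practice OpenCV's `cv2.calibrateCamera`, which is based on Zhang's method, estimates K, R, and t from checkerboard views. The function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def intrinsic_matrix(focal_mm: float, pixel_w_mm: float, pixel_h_mm: float,
                     cx: float, cy: float) -> Matrix3:
    """Zero-skew pinhole intrinsic matrix with focal lengths converted to pixels."""
    fx = focal_mm / pixel_w_mm  # focal length in units of horizontal pixel size
    fy = focal_mm / pixel_h_mm  # focal length in units of vertical pixel size
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def project(K: Matrix3, R: Matrix3, t: Sequence[float],
            X: Sequence[float]) -> Tuple[float, float]:
    """Map a world point to pixel coordinates via s * [u, v, 1]^T = K (R X + t)."""
    # Extrinsics: world coordinates -> camera coordinates.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Intrinsics: camera coordinates -> homogeneous pixel coordinates.
    p = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


K = intrinsic_matrix(8.0, 0.005, 0.005, 320.0, 240.0)  # 8 mm lens, 5 um pixels
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project(K, identity, [0.0, 0.0, 0.0], [0.1, 0.0, 2.0]))  # about (400.0, 240.0)
```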

[0154] S43. Parameter verification and calibration

[0155] The calibrated intrinsic and extrinsic parameters are applied to a standard test scene, and their accuracy is verified by comparing the distance of a target computed from its pixel distance in the image with its known real distance. If the error exceeds the allowable limit of 0.5%, the camera is recalibrated until the parameters meet the detection requirements.
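The 0.5% acceptance test is a relative-error check, sketched below in Python. The sketch assumes that every test target must pass individually (a worst-case criterion); the source could also mean an average error. The function names and sample distances are hypothetical.

```python
from typing import Sequence, Tuple


def relative_error(measured: float, reference: float) -> float:
    """Relative deviation of a measured distance from its known reference."""
    return abs(measured - reference) / abs(reference)


def calibration_passes(pairs: Sequence[Tuple[float, float]],
                       max_rel_error: float = 0.005) -> bool:
    """True when every (measured, known) distance pair is within the 0.5 % limit."""
    return all(relative_error(m, r) <= max_rel_error for m, r in pairs)


print(calibration_passes([(100.3, 100.0), (49.9, 50.0)]))  # True: 0.3 % and 0.2 %
print(calibration_passes([(101.0, 100.0)]))                # False: 1.0 %
```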

[0156] S44. Convert to the actual vertical height to obtain the liquid level parameters.

[0157] A model for calculating the true vertical height H is constructed from the angle between the camera's optical axis and the vertical direction, the intrinsic and extrinsic parameters, and the image distance. First, based on the camera intrinsic parameters, the image pixel distance is converted into a distance in the camera coordinate system:

[0158]

[0159] Then, using the angle between the camera's optical axis and the vertical direction, the distance in the camera coordinate system is converted into the actual vertical height. Because the camera's optical axis is inclined to the horizontal, the distance projected in the image has an angular deviation that must be corrected using a cosine function. The final formula is:

[0160]

[0161] With the height of the electrolytic cell known, the liquid level is calculated as:

[0162]
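The source's own formulas for this step are not reproduced in this text. The Python sketch below shows one geometrically consistent form under stated assumptions, not the patent's exact model:

- A pinhole back-projection d_c = d_px · Z / f, with the depth Z along the optical axis assumed known (for example from the extrinsics) and f in pixels.
- A weak-perspective tilt correction. If θ is the angle between the optical axis and the vertical, the depression angle below the horizontal is φ = 90° − θ. A vertical segment of length H then spans H·cos φ perpendicular to the optical axis, so H = d_c / cos φ.
- The relation level = cell height − H, which is assumed here.

All function names and numbers are hypothetical.

```python
import math


def pixel_to_camera_distance(d_pixels: float, depth: float, focal_pixels: float) -> float:
    """Pinhole back-projection: an image length at depth Z spans d * Z / f in camera units."""
    return d_pixels * depth / focal_pixels


def vertical_height(d_camera: float, axis_to_vertical_deg: float) -> float:
    """Correct the foreshortening of a vertical segment seen by a tilted camera."""
    # Depression angle of the optical axis below the horizontal.
    phi = math.radians(90.0 - axis_to_vertical_deg)
    # A vertical segment of length H spans H * cos(phi) perpendicular to the optical axis.
    cos_phi = math.cos(phi)
    if cos_phi < 1e-6:
        raise ValueError("camera looks straight down; vertical extent is not observable")
    return d_camera / cos_phi


def liquid_level(cell_height: float, rim_to_surface: float) -> float:
    """Assumed relation: level = known cell height minus rim-to-melt-surface height."""
    return cell_height - rim_to_surface


d_c = pixel_to_camera_distance(100.0, 2000.0, 2000.0)  # 100 px at Z = 2000 mm, f = 2000 px
H = vertical_height(d_c, 30.0)                           # optical axis 30 deg from vertical
print(d_c, round(H, 6), round(liquid_level(600.0, H), 6))  # 100.0 200.0 400.0
```

The weak-perspective correction is adequate when the rim-to-surface height is small compared with the camera distance. Otherwise each edge point should be back-projected separately with the full extrinsic model.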

[0163] S45. Regression analysis is performed between the liquid level detection results obtained by the present invention and the actual probe results.

[0164] Under the same working conditions and at the same time points, collect at least 30 sets of sample data covering different liquid levels and camera attitudes. After outlier removal and standardization, build a linear regression model and solve for the regression parameters. Evaluate the goodness of fit using indicators such as the coefficient of determination (R²) and the root mean square error (RMSE). If the preset standard is not met, adjust the relevant parameters.
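The regression check is ordinary least squares followed by R² and RMSE, sketched below in Python. The sketch assumes the probe readings are regressed on the image-based detections (probe ≈ a · detected + b). It omits the outlier removal and standardization mentioned above, and the acceptance thresholds are not given in the source. The function name and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_and_score(detected: Sequence[float],
                  probe: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares line probe ~ a * detected + b, with R^2 and RMSE of the fit."""
    n = len(detected)
    if n != len(probe) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(detected) / n, sum(probe) / n
    sxx = sum((x - mx) ** 2 for x in detected)
    ss_tot = sum((y - my) ** 2 for y in probe)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("inputs must not be constant")
    sxy = sum((x - mx) * (y - my) for x, y in zip(detected, probe))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(detected, probe))
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return a, b, r2, rmse


print(fit_and_score([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (2.0, 1.0, 1.0, 0.0)
```

For well-agreeing measurements one would expect a slope near 1, an intercept near 0, R² near 1, and a small RMSE in the level's units.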

[0165] Example 2:

[0166] Based on the above method, this embodiment uses an aluminum electrolysis cell as the experimental object. The aluminum electrolysis cell 1 contains high-temperature molten aluminum, and the method of Example 1 is applied to it. An infrared camera 2 and a visible light camera 3 are fixed at an incline on a bracket 4 above the side of the aluminum electrolysis cell 1. The optical axes of both cameras point downward at an angle to the horizontal plane, toward the molten pool inside the aluminum electrolysis cell 1. The cameras simultaneously acquire infrared and visible light images containing the edge of the molten aluminum level and the edge of the cell opening. The detection system, shown in Figure 2, includes:

[0167] Infrared camera 2 and visible light camera 3 are mounted on the same bracket 4 to simultaneously acquire infrared and visible light images of the high-temperature melt and the groove area.

[0168] Data transmission line 5 is used to transmit the acquired infrared and visible light images to the computer system 6;

[0169] Computer system 6 is used to execute the method in Example 1 to complete image registration, fusion, edge recognition, distance calculation, and liquid level conversion. Computer system 6 includes:

[0170] The registration module is used to achieve bidirectional cyclic consistency alignment.

[0171] The fusion module is a dual-module framework that includes a feature registration network and a hybrid CNN-Mamba fusion network to generate fused images;

[0172] The recognition module is used to extract the liquid level edge and the groove edge and calculate the pixel distance;

[0173] The conversion module is used for tilt correction and actual height calculation.

[0174] The test results are shown in Figure 3. The horizontal axis represents time and the vertical axis represents the liquid level; the solid line is the detection result of the present invention and the dashed line is the actual probe measurement. The two are highly consistent.

[0175] The above description is only a preferred embodiment of the present invention. For those skilled in the art, several modifications and substitutions can be made without departing from the principles and concepts of the present invention, and all equivalent modifications and substitutions should fall within the protection scope of the present invention.

Claims

1. A method for detecting the liquid level of a high-temperature melt based on multimodal image fusion, characterized in that it comprises the following steps: S1. acquiring images containing the high-temperature melt and the area at the opening of the aluminum electrolysis cell, and accurately registering the two modal images through a bidirectional cyclic consistency alignment method; S2. constructing a dual-module framework comprising a feature registration network and a hybrid CNN-Mamba fusion network, and processing the registered features to generate a fused image with clearly displayed edges; S3. identifying the liquid level edge and the groove edge in the generated fused image, counting the number of pixels between them, and calculating the distance between them on the image; S4. performing tilt correction for the angle between the camera's optical axis and the horizontal direction and, combined with the camera parameters, converting the image distance into the actual vertical height to detect the high-temperature melt level.

2. The method for detecting the liquid level of a high-temperature melt based on multimodal image fusion according to claim 1, characterized in that step S1 specifically comprises: S11. using an infrared camera and a visible light camera to simultaneously acquire target images including the high-temperature melt and the groove area, the two cameras being triggered synchronously so that the acquired infrared and visible light images are exactly matched in time; S12. during acquisition, the infrared camera captures the infrared features of the high-temperature melt and the groove edge, and the visible light camera acquires detailed information including the groove edge and the surface texture of the melt; S13. after acquisition, transmitting and storing the infrared and visible light images and processing them with the feature registration network, which learns pixel-level correspondences through the modal alignment module using a bidirectional cyclic consistency alignment method while mitigating cross-modal differences.

3. The method for detecting the liquid level of a high-temperature melt based on multimodal image fusion according to claim 1, characterized in that the hybrid CNN-Mamba fusion network in step S2 comprises: a gated residual block, which uses residual convolutional blocks to extract local texture features and a global-context-based gating mechanism to adaptively fuse the visible light and infrared features; a Mamba fusion block, which models global semantic features using a multilayer perceptron and a bidirectional state-space model to capture long-range cross-modal dependencies; and a gated fusion module, which generates gating weights through a fully connected layer and a sigmoid activation function to adaptively fuse the local and global features and generate the fused image.

4. The method for detecting the liquid level of a high-temperature melt based on multimodal image fusion according to claim 1, characterized in that step S3 specifically comprises: S31. extracting edges of the fused image generated in step S2 using the Canny edge detection algorithm, determining the optimal threshold parameters, and extracting the complete contours of the liquid level edge and the groove edge; S32. performing pixel coordinate localization on the extracted liquid level edge and groove edge: based on the image pixel coordinate system, collecting the coordinates of all pixels on the liquid level edge and on the groove edge, respectively, and constructing a liquid level edge pixel coordinate set and a groove edge pixel coordinate set, where n and m are the total numbers of pixels on the two edge contours and x and y are the horizontal and vertical coordinates of each pixel in the image; S33. based on the actual diameter D of the groove, measuring the number of pixels spanned by the groove diameter in the fused image and calculating the actual length represented by each pixel; S34. calculating the pixel distance between the liquid level edge and the groove edge using the shortest Euclidean distance method, with calibration by repeated averaging, where the two coordinate pairs are the horizontal and vertical coordinates of a pixel on the liquid level edge and of a pixel on the groove edge, respectively.

5. The method for detecting the liquid level of a high-temperature melt based on multimodal image fusion according to claim 1, characterized in that step S4 specifically comprises: S41. camera tilt calibration: a tilt sensor rigidly attached to the camera synchronously acquires the real-time tilt angle of the camera at the smelting site, thereby obtaining the angle between the camera's optical axis and the vertical direction; the measurement error is reduced by averaging multiple measurements, and the ambient temperature and vibration parameters are recorded during calibration; S42. calibration of intrinsic and extrinsic camera parameters: the camera is calibrated using Zhang Zhengyou's calibration method to obtain the intrinsic parameter matrix, comprising the camera focal length, the pixel sizes, and the principal point coordinates, and the extrinsic matrix, comprising the camera rotation matrix and translation vector; S43. parameter verification and calibration: the calibrated intrinsic and extrinsic parameters are applied to a standard test scene, and their accuracy is verified by comparing the pixel distance of a target in the image with its known real distance; if the error exceeds 0.5%, recalibration is performed until the parameters meet the detection requirements; S44. conversion to the actual vertical height to obtain the liquid level parameters: a model for calculating the true vertical height H is constructed from the angle between the camera's optical axis and the vertical direction, the intrinsic and extrinsic parameters, and the image distance; the image pixel distance is first converted into a distance in the camera coordinate system based on the camera intrinsic parameters; then, using the angle between the optical axis and the vertical direction, the distance in the camera coordinate system is converted into the actual vertical height, with a cosine correction for the angular deviation of the projected distance caused by the inclination of the optical axis to the horizontal; and, with the height of the electrolytic cell known, the liquid level is calculated.

6. A high-temperature melt level detection system based on multimodal image fusion, characterized in that the system comprises: an infrared camera and a visible light camera mounted on the same bracket to simultaneously acquire infrared and visible light images of the high-temperature melt and the groove area; a data transmission line for transmitting the acquired infrared and visible light images to a computer system; and a computer system for performing the method according to any one of claims 1 to 5, carrying out image registration, fusion, edge recognition, distance calculation, and liquid level conversion.

7. The high-temperature melt level detection system based on multimodal image fusion according to claim 6, characterized in that the computer system comprises: a registration module for achieving bidirectional cyclic consistency alignment; a fusion module, being a dual-module framework comprising a feature registration network and a hybrid CNN-Mamba fusion network, for generating fused images; a recognition module for extracting the liquid level edge and the groove edge and calculating the pixel distance; and a conversion module for tilt correction and actual height calculation.