Infrared image data processing method and device

CN118229535B · Active · Publication Date: 2026-06-26 · PETROCHINA CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PETROCHINA CO LTD
Filing Date
2022-12-15
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Because existing infrared spectral cameras use narrowband filters, the signal strength is low and the effective grayscale range is limited in gas leak monitoring, so the visualized infrared images are unclear. The problem is most pronounced in high-dynamic-range scenes containing objects with large temperature differences, where the images appear overly gray or overly dark.

Method used

A cooled infrared camera equipped with a narrowband filter is used to acquire 16-bit infrared image data, which is then processed by a Transformer-based Unet neural network. 8-bit visualization image data is generated by training with a pre-built historical sample dataset. By combining self-attention encoding and residual connections, adaptive dynamic range stretching of the image is achieved.

Benefits of technology

It achieves clear visualization of narrowband infrared images, reduces the influence of temperature differences on objects, improves image recognition and sensory effects, enhances image similarity, and reduces distortion.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an infrared image data processing method and device. The method comprises the following steps: obtaining current 16-bit infrared image data through a cooled infrared camera equipped with a narrowband filter; and inputting the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain current 8-bit visualized image data. The Unet neural network is generated by pre-training on a historical sample dataset; each historical sample in the dataset comprises an image pair composed of historical 16-bit infrared image data and historical 8-bit visualized image data; the historical 16-bit infrared image data is obtained using the cooled infrared camera equipped with the narrowband filter; and the historical 8-bit visualized image data is obtained by processing the historical 16-bit infrared image data. The application can realize clear visualized display of narrowband infrared image data.
Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to an infrared image data processing method and apparatus.

Background Technology

[0002] This section is intended to provide background or context for the embodiments of the invention set forth in the claims. The description herein is not an admission that it is prior art simply because it is included in this section.

[0003] Strong infrared radiation is generated during the use of special-purpose lighting equipment or high-temperature operations in industrial and agricultural production, scientific research, and medicine. Infrared monitoring technology can be used to monitor, track, and mark these targets. Taking industrial safety monitoring as an example, the demand for industrial safety monitoring technology has become increasingly urgent in recent years. Gas leaks in chemical production can cause explosions, fires, and other consequences, resulting in loss of life and property. Using infrared spectral cameras to monitor leaked gases is a relatively advanced method. Because most gases have characteristic spectra in the infrared band, using infrared spectral equipment to monitor gas leaks can clearly detect gas gushing out and can monitor different locations and temperatures to determine the danger.

[0004] However, when using infrared spectroscopy equipment for gas leak monitoring, the imaging has a high dynamic range, thus requiring stretching of the infrared camera image. Because the output bias of an infrared camera changes with its internal temperature and the ambient temperature, it's impossible to achieve a good visualization of the infrared image by fixing the stretching range. Furthermore, during monitoring, objects with significant temperature differences may exist within the monitoring range. These objects can easily cause the infrared image to appear overly gray or overly dark, making it difficult to clearly display other objects besides the affected object in the visualized image.

[0005] In particular, to increase the contrast of gas in the image, a narrowband filter is often needed to capture and display the characteristic spectrum. When a narrowband filter is used in a cooled infrared camera, the signal strength is very low, and the corresponding effective grayscale range may only be a few hundred grayscale levels. This further hinders the clear visualization of the infrared image.

Summary of the Invention

[0006] This invention provides an infrared image data processing method for achieving clear image visualization of narrowband infrared image data. The method includes:

[0007] The current 16-bit infrared image data is obtained by using a cooled infrared camera equipped with a narrowband filter;

[0008] The current 16-bit infrared image data is input into the Transformer-based Unet neural network to obtain the current 8-bit visualization image data. The Transformer-based Unet neural network is pre-trained using a pre-built historical sample dataset. Each historical sample data in the historical sample dataset includes an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualization image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data.

[0009] This invention also provides an infrared image data processing device for clearly visualizing narrowband infrared image data. The device includes:

[0010] The acquisition unit is used to acquire current 16-bit infrared image data through a cooled infrared camera equipped with a narrowband filter;

[0011] The image processing unit is used to input the current 16-bit infrared image data into the Transformer-based Unet neural network to obtain the current 8-bit visualization image data. The Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset. Each historical sample data in the historical sample dataset includes an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualization image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data.

[0012] This invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the above-described infrared image data processing method.

[0013] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described infrared image data processing method.

[0014] This invention also provides a computer program product, which includes a computer program that, when executed by a processor, implements the above-described infrared image data processing method.

[0015] Compared with existing technologies that cannot clearly visualize infrared images, the infrared image data processing scheme in this embodiment of the invention achieves clear image visualization of narrowband infrared image data by: obtaining current 16-bit infrared image data using a cooled infrared camera equipped with a narrowband filter; and inputting the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain current 8-bit visualized image data. The Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset, where each historical sample includes an image pair consisting of historical 16-bit infrared image data and historical 8-bit visualized image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualized image data is obtained by processing the historical 16-bit infrared image data.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. In the drawings:

[0017] Figure 1 is a flowchart illustrating the infrared image data processing method in an embodiment of the present invention;

[0018] Figure 2 is a schematic diagram of the network structure used in the infrared image processing method in this embodiment of the invention;

[0019] Figure 3a is the image obtained after decrypting the output image of the cooled infrared camera configured with a narrowband filter in this embodiment of the invention;

[0020] Figure 3b is the histogram of the image shown in Figure 3a;

[0021] Figure 3c is the visualized image obtained by processing the image shown in Figure 3a;

[0022] Figure 4a is a schematic diagram comparing the results of image processing in different ways in an embodiment of the present invention;

[0023] Figure 4b is a comparison of the metric results for the different image processing methods of Figure 4a;

[0024] Figure 5 is a schematic diagram of the infrared image processing device in an embodiment of the present invention;

[0025] Figure 6 is a schematic diagram of the structure of the computing device in an embodiment of the present invention.

Detailed Implementation

[0026] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be further described in detail below with reference to the accompanying drawings. Here, the illustrative embodiments of the present invention and their descriptions are used to explain the present invention, but are not intended to limit the present invention.

[0027] Figure 1 is a flowchart illustrating the infrared image data processing method in an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0028] Step 101: Obtain the current 16-bit infrared image data using a cooled infrared camera equipped with a narrowband filter;

[0029] Step 102: Input the current 16-bit infrared image data into the Transformer-based Unet neural network to obtain the current 8-bit visualization image data; the Transformer-based Unet neural network is pre-trained using a pre-built historical sample dataset. Each historical sample data in the historical sample dataset includes an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualization image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data.

[0030] The infrared image data processing method provided in this embodiment of the invention operates as follows: current 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter; the current 16-bit infrared image data is input into a Transformer-based Unet neural network to obtain current 8-bit visual image data, which is used for real-time imaging during gas leak monitoring. The Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset, where each historical sample includes an image pair consisting of historical 16-bit infrared image data and historical 8-bit visual image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visual image data is obtained by processing the historical 16-bit infrared image data.

[0031] Compared with existing technologies that cannot clearly visualize infrared images, the infrared image data processing method provided in this invention can achieve clear image visualization of narrowband infrared image data. The infrared image data processing method will now be described in detail.

[0032] The infrared image data processing method according to embodiments of the present invention includes: constructing a historical sample dataset and training a Transformer-based Unet neural network, where each historical sample consists of historical 16-bit infrared image data and historical 8-bit visualization image data; obtaining the historical 16-bit infrared image data using a cooled infrared camera equipped with a narrowband filter, the narrowband filter having a filtering range of 3.2µm-3.25µm; processing the historical 16-bit infrared image data into the historical 8-bit visualization image data; and, after training the Transformer-based Unet neural network, inputting the current 16-bit infrared image data obtained through the cooled infrared camera into the trained neural network to obtain current 8-bit visualization image data. Embodiments of the present invention can achieve relatively clear image visualization display of narrowband infrared image signals.

[0033] This invention provides an infrared image data processing method, the method comprising:

[0034] First, a historical sample dataset is pre-constructed, and a Transformer-based Unet neural network is trained using this dataset. Each historical sample in the dataset consists of an image pair composed of a historical 16-bit infrared image and a historical 8-bit visualization image. The historical 16-bit infrared image is obtained using a cooled infrared camera equipped with a narrowband filter; in one embodiment, the filter range of the narrowband filter is 3.2µm-3.25µm. The historical 16-bit infrared image is then processed into the historical 8-bit visualization image.

[0035] Secondly, after training the Transformer-based Unet neural network, the current 16-bit infrared image data obtained through the cooled infrared camera is input into the pre-trained Transformer-based Unet neural network to obtain the current 8-bit visualization image data output by the neural network. (The "14 bit" label in Figure 2 refers to the output bit depth; in this embodiment of the invention, 16 bits are used for image reading, which means the same thing.)

[0036] As described above, the difference between this embodiment of the invention and the conventional convolution-based Unet lies in replacing the convolution operator with a Transformer module. In a convolution-based Unet, the convolutional structure has a limited receptive field, and the convolution kernel has difficulty adapting to the input content. A Unet based on the Transformer module does not have this problem: the Transformer module can capture global dependencies and perform adaptive dynamic range stretching. Furthermore, because the Transformer module emphasizes local spatial context information while implicitly modeling the contextual relationships between pixels, it is very helpful for reducing the image darkening caused by locally high-emissivity objects in infrared scenes.

[0037] In one embodiment, the Transformer-based Unet neural network includes: a downsampling portion composed of Transformer modules arranged in sequence, and an upsampling portion composed of Transformer modules arranged in sequence.

[0038] In Figure 2, the three arrows of the upsampling module are the arrows corresponding to the three Transformer modules on the right (Transformer Block1, Transformer Block2, Transformer Block3), and the three arrows of the downsampling module are the arrows corresponding to the three Transformer modules on the left. The three Transformer modules on the right and their corresponding arrows constitute the upsampling part, and the three Transformer modules on the left and their corresponding arrows constitute the downsampling part.

[0039] In one embodiment, when the Transformer module is executed, it includes the following operations:

[0040] For the input feature map X (16-bit infrared image data), generate a query vector, a key vector, and a value vector;

[0041] After reshaping the vector matrices of Query and Key using a row-reshape operation, a dot product is performed to generate an attention map. The size of this attention map is [size missing].

[0042] The attention map is multiplied by the value vector matrix, and then reshaped using a row reshape operation to restore the original image size.

[0043] The result, restored to the original size, is added to the input feature map X to obtain the feature map output by the Transformer module. This feature map is used to produce the 8-bit visualization image data.

[0044] In specific implementation, the Transformer module of this invention has been optimized: it is designed to process an attention map of size [size missing] rather than the traditional [size missing], so that its size and processing time are on the order of the square root of those of the traditional Transformer module.

[0045] In one embodiment, in the Transformer-based Unet neural network, skip connections are used for fusion in the corresponding downsampling and upsampling stages (e.g., the three horizontal arrows shown in Figure 2: the first arrow between downsampling Transformer Block 1 and upsampling Transformer Block 1, the second arrow between downsampling Transformer Block 2 and upsampling Transformer Block 2, and the third arrow between downsampling Transformer Block 3 and upsampling Transformer Block 3); and/or a residual connection is set between the input and output of the neural network (drawn in Figure 2 in the same way as the skip connections).

[0046] In practice, skip connections are used for fusion (e.g., fusion by concatenation), so that the final recovered feature map incorporates more low-level features. The residual connection allows the Unet network to learn the difference between the 16-bit infrared image data and the 8-bit visualization image data, rather than the full mapping from one to the other.

[0047] The Transformer-based Unet neural network in this embodiment of the invention may include:

[0048] A downsampling section composed of Transformer modules arranged in sequence, and an upsampling section also composed of Transformer modules arranged in sequence; skip connections (e.g., the skip connections shown in Figure 2) are used for fusion in the corresponding downsampling and upsampling stages; and a residual connection is provided between the input and output of the neural network;

[0049] When the Transformer module is executed, it includes:

[0050] For the input feature map X, generate a query vector, a key vector, and a value vector;

[0051] After reshaping the vector matrices of Query and Key using a row-reshape operation, a dot product is performed to generate an attention map. The size of this attention map is [size missing].

[0052] The attention map is multiplied by the value vector matrix, and then reshaped using a row reshape operation to restore the original image size.

[0053] The feature map output by the Transformer module is obtained by adding it to the input feature map X.

[0054] The term "one embodiment" or "an embodiment" as used in this specification means that a particular feature, structure, or characteristic described in conjunction with that embodiment is included in at least one embodiment of the invention. Therefore, the phrases "in one embodiment" or "in an embodiment" appearing throughout this specification may, but do not necessarily, refer to the same embodiment. Furthermore, in one or more embodiments, the particular features, structures, or characteristics can be combined in any suitable manner, as will be apparent to those skilled in the art from this disclosure.

[0055] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. In case of any inconsistency, the meaning set forth in this specification or derived from the content described herein shall prevail. Furthermore, the terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application. To accurately describe the technical content of this application and to accurately understand the invention, the following explanations or definitions of the terms used in this specification are provided before describing specific embodiments:

[0056] 1) Unet: also known as U-shaped network, which includes downsampling and upsampling parts. Since the overall structure of the downsampling and upsampling network is similar to a U shape, it is called U-shaped network.

[0057] In the corresponding stages of downsampling and upsampling, UNet can use skip connections for fusion (e.g., fusion by concatenation), so that the final recovered feature map incorporates more low-level features.

[0058] 2) Transformer module: abbreviated as TRM module; it generally includes a self-attention layer and a feedforward network layer.

[0059] 3) Self-attention encoding: Encodes the input vector based on a self-attention function.

[0060] The first step in computing self-attention is to generate three vectors from the input vector of each encoder. That is, for each input vector, create a query vector, a key vector, and a value vector. These three vectors are created by multiplying each input vector by a weight matrix, which is obtained through training the neural network.

[0061] The second step in calculating self-attention is to calculate and normalize the attention score. The attention score is calculated using a computational model of the query vector Q and key vector K. Specifically, for the query vector matrix Q, key vector matrix K, and value vector matrix V, one of the following computational models can be used to calculate the attention score matrix:

[0062] a) Dot product model: softmax(QK^T). The specific implementation method described below uses this calculation model.

[0063] b) Scaled dot product model: [formula missing], where d_k is the scaling factor, which is a constant, such as 6, 8, 9, etc.

[0064] c) Additive model: softmax(tanh(WK+UQ)), where W and U are learnable parameters.

[0065] d) Bilinear model: softmax(K^T WQ), where W is a learnable parameter.

[0066] The third step in calculating self-attention is to weight the value vector V with attention scores and sum them.

[0067] The self-attention function Attention, when described by a formula and represented by a matrix vector, can be expressed as follows:

[0068] Attention(Q,K,V)=V·softmax(Q,K)
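The three steps of computing self-attention described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `self_attention`, the layout with one input vector per row, and the random weight matrices are hypothetical. Because the input vectors are stored as rows, the weighted sum is written `scores @ v` rather than with V on the left.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Dot-product self-attention over n input vectors stored as rows of x.

    x: (n, d) inputs; w_q, w_k, w_v: (d, d) weight matrices
    (obtained by training in practice; random here for illustration).
    """
    # Step 1: one query, key and value vector per input vector.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Step 2: attention scores from the dot-product model, normalized by softmax.
    scores = softmax(q @ k.T, axis=-1)  # (n, n); each row sums to 1
    # Step 3: weight the value vectors by the scores and sum them.
    return scores @ v, scores

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, scores = self_attention(x, w_q, w_k, w_v)
```

Here `out` has the same shape as `x`, and each row of `scores` describes how strongly one input vector attends to every other input vector.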

[0069] 4) Training and Inference Processes. The training process refers to training the neural network using a sample dataset. Network training includes supervised learning, unsupervised learning, methods based on loss function optimization (e.g., using gradient descent), and adversarial training methods with a discriminator. The inference process is the process of using the trained neural network. Unless otherwise specified, the network mentioned in this application refers to the inference process.

[0070] To facilitate understanding of how this invention is implemented, detailed examples are provided below.

[0071] The infrared image processing method provided in this invention converts 16-bit infrared image data into 8-bit visualized image data. It can discriminate within the 16-bit infrared image data, identifying the effective signal ranges for the main scene and objects with significant temperature differences. By utilizing these effective signal ranges to process the infrared image while maintaining stable information boundaries, it obtains a visualized image (8-bit visualized image data) that fuses the main scene and objects with significant temperature differences. This reduces the impact of objects with large temperature differences, improving target recognition and the overall sensory experience of the image. This description can be used as a training objective for training a Transformer-based Unet neural network.

[0072] Figure 2 shows the neural network structure used in the infrared image processing method provided in this embodiment of the invention. This neural network employs a symmetrical Unet structure. The left side is the encoder (the left three of the seven blocks in the middle of Figure 2), which extracts and encodes the input features; the right side is the decoder (the right three of the seven blocks in the middle of Figure 2), which decodes the encoded features and outputs an image. The Unet structure maintains computational efficiency while performing hierarchical multi-scale cascaded representation.

[0073] As shown in Figure 2, the Unet uses skip connections in the corresponding stages of downsampling and upsampling to fuse the downsampling output of the corresponding stage (e.g., by concatenation) into the upsampling input.
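The U-shaped data flow with concatenation-based skip connections can be sketched as below. This is an illustrative skeleton only: the text does not specify the downsampling or upsampling operators, the channel counts, or how concatenated features are projected, so the 2x2 average pooling, the nearest-neighbour upsampling, the constant channel count, and the `w_fuse` projection matrices are all assumptions. The `block` argument stands in for a Transformer module.

```python
import numpy as np

def downsample(x):
    """Halve H and W with 2x2 average pooling (assumed operator)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """Double H and W by nearest-neighbour repetition (assumed operator)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(up, skip, w_fuse):
    """Skip connection: concatenate along channels, then project 2C -> C."""
    return np.concatenate([up, skip], axis=-1) @ w_fuse

def unet_forward(x, block, w_fuse_list):
    """Three encoder blocks, one middle block, three decoder blocks."""
    skips = []
    for _ in range(3):                 # downsampling part (encoder)
        x = block(x)
        skips.append(x)                # keep the stage output for the skip connection
        x = downsample(x)
    x = block(x)                       # middle block
    for w_fuse in w_fuse_list:         # upsampling part (decoder)
        x = upsample(x)
        x = fuse(x, skips.pop(), w_fuse)
        x = block(x)
    return x

c = 4
identity_block = lambda t: t           # placeholder for a Transformer module
w_avg = 0.5 * np.vstack([np.eye(c), np.eye(c)])  # averages the two fused inputs
y = unet_forward(np.ones((16, 16, c)), identity_block, [w_avg] * 3)
```

With the identity placeholder and the averaging projection, a constant input passes through unchanged, which makes the routing of the three skip connections easy to check.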

[0074] The difference between this embodiment of the invention and the conventional convolution-based Unet is that the convolution operator is replaced with a Transformer module. In a convolution-based Unet, the convolution structure has a limited receptive field, and the convolution kernel has difficulty adapting to the input content. The Unet built with the Transformer module does not have this problem, as the Transformer module can capture global dependencies and perform adaptive dynamic range stretching.

[0075] As shown in Figure 2, a residual connection is set between the original 16-bit infrared image at the input and the 8-bit visualization image data at the output of the Unet network. The data output by the Unet network is added to the original 16-bit infrared image as a residual image (i.e., the residual connection), yielding the stretched 8-bit visualization image data. Through the residual connection, the Unet network learns the difference between the 16-bit infrared image data and the 8-bit visualization image data, and the additive operation corrects the 16-bit infrared image data into the 8-bit visualization image data.
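A minimal sketch of this input-to-output residual connection follows. The text does not state how the 16-bit input and the network output are scaled before they are added, so the normalization to [0, 1], the clipping, and the final quantization to 8 bits are assumptions, and `add_global_residual` is a hypothetical name.

```python
import numpy as np

def add_global_residual(img16, residual):
    """Add a network-predicted residual to the input and quantize to 8 bits.

    img16:    uint16 array, the original infrared image.
    residual: float array of the same shape, in normalized [0, 1] units
              (assumed scaling; stands in for the Unet output).
    """
    base = img16.astype(np.float64) / 65535.0   # normalize the 16-bit input
    out = np.clip(base + residual, 0.0, 1.0)    # residual connection
    return np.round(out * 255.0).astype(np.uint8)

dark = np.zeros((2, 2), dtype=np.uint16)
vis = add_global_residual(dark, np.ones((2, 2)))
```

Under this reading the network only has to predict the correction to the input, not the whole output image.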

[0076] As shown in Figure 2, in each Transformer module, the input feature map X is encoded using self-attention before being output. The Transformer module performs self-attention encoding and output as follows:

[0077] Step 1: For the input feature map X, generate a query vector, a key vector, and a value vector. These three vectors are obtained by multiplying the feature map vectors by weight matrices, where each weight matrix is obtained by training the neural network.

[0078] Step 2: In this embodiment, a dot product model is used to calculate the attention score matrix, including: reshaping the vector matrices of Query and Key, and then performing a dot product to generate an attention map. The size of this attention map is...

[0079] Therefore, this application optimizes the Transformer module: it is designed to process an attention map of size [size missing] rather than the traditional [size missing], so that its size and processing time are on the order of the square root of those of the traditional Transformer module.

[0080] Step 3: Perform a dot product between the attention map and the value vector matrix, then reshape the image to restore its original size.

[0081] Step 4: Add the feature map to the input feature map X (this can be cascaded) to obtain the feature map output by the Transformer module.

[0082] The Transformer module emphasizes local spatial context information while implicitly modeling the global contextual relationships between pixels. This is particularly helpful in infrared scenes where images appear dark due to locally high-emissivity objects. Steps one through four above can be expressed as follows:

[0083]

[0084] Attention(Q,K,V)=V·softmax(Q·K / a)

[0085] In the above formula, a is a learnable scaling parameter used to control the magnitude of the dot product of K and Q before the softmax function is applied, W_p denotes parameters that can be learned through training, and softmax is the normalization function.
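Steps one through four and the formula Attention(Q,K,V)=V·softmax(Q·K / a) can be sketched as follows. The attention-map size is not recoverable from the text, so this sketch assumes a channel-by-channel (C x C) map computed on the flattened feature map, which is one reading consistent with V appearing on the left of the product. The name `transformer_module`, the weight shapes, and the placement of W_p as an output projection are likewise assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_module(x, w_q, w_k, w_v, w_p, a=1.0):
    """Self-attention encoding of a feature map x of shape (H, W, C)."""
    h, w, c = x.shape
    flat = x.reshape(h * w, c)                     # reshape to (H*W, C)
    # Step 1: query, key and value from learned weight matrices.
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    # Step 2: dot product of reshaped K and Q, scaled by a, then softmax.
    attn = softmax(k.T @ q / a, axis=-1)           # (C, C), ASSUMED size
    # Step 3: multiply the value matrix by the attention map, restore the size.
    out = ((v @ attn) @ w_p).reshape(h, w, c)
    # Step 4: add the input feature map X.
    return out + x, attn

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))
w_q, w_k, w_v, w_p = (rng.normal(size=(4, 4)) for _ in range(4))
y, attn = transformer_module(x, w_q, w_k, w_v, w_p, a=2.0)
```

Under this assumption the attention map grows with the number of channels rather than with the number of pixels, which keeps the cost manageable for full-resolution infrared frames.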

[0086] The infrared image processing method provided in this embodiment of the invention is described below with reference to the flowchart shown in Figure 1 and the network structure shown in Figure 2:

[0087] First, a historical sample dataset is pre-built, and the neural network shown in Figure 2 is trained using this historical sample dataset.

[0088] In this embodiment of the invention, each historical sample in the historical sample dataset consists of an image pair composed of historical 16-bit infrared image data and historical 8-bit visualization image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, the filtering range of the narrowband filter being 3.2µm-3.25µm; the historical 16-bit infrared image data is then processed into the historical 8-bit visualization image data. The specific steps for training the Unet neural network are as follows:

[0089] Step 1: Acquire historical infrared image data. Using a cooled infrared camera equipped with a 3.2µm-3.25µm filter at the front end, capture video sequences of 10 scenes and export the historical 16-bit infrared image data in encrypted format.

[0090] Because the camera is equipped with a narrowband filter, the signal is very small. The decrypted image is shown in Figure 3a and its histogram in Figure 3b: within the dynamic range of a 16-bit infrared image (grayscale values 0-65535), the effective grayscale range is only a few hundred grayscale levels.
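One way to quantify the effective grayscale range of such a frame is the spread between two histogram percentiles. The sketch below is illustrative only: the function name `effective_gray_range`, the percentile thresholds, and the synthetic frame are hypothetical and do not come from the text.

```python
import numpy as np

def effective_gray_range(img16, low_pct=0.5, high_pct=99.5):
    """Width, in gray levels, of the band holding most of the pixel values."""
    lo, hi = np.percentile(img16, [low_pct, high_pct])
    return int(round(hi - lo))

# Synthetic stand-in for a narrowband frame: about 300 occupied levels out of 65536.
rng = np.random.default_rng(0)
frame = rng.integers(30000, 30300, size=(64, 64)).astype(np.uint16)
width = effective_gray_range(frame)
```

For this synthetic frame the width is a few hundred levels, a small fraction of the 16-bit range, which is why a fixed full-range mapping to 8 bits would render it almost uniformly gray.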

[0091] Step 2: Generate historical 8-bit visual image data, and use the image pairs formed by this data and the historical 16-bit infrared image data as historical samples in the historical sample dataset.

[0092] The decrypted historical 16-bit infrared image data is stretched to 8-bit visual image data, i.e., grayscale values of 0-255, to make it visible. The stretched historical 8-bit visual image is used as the target image.

[0093] An example of a stretching result is shown in Figure 3c; the target image is obtained through this stretching process.
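The text does not say which stretching procedure produced the target images. As one common possibility, a percentile-clipped linear stretch from 16 bits to 8 bits can be sketched as below; the function name `stretch_to_8bit` and the percentile defaults are assumptions made for illustration.

```python
import numpy as np

def stretch_to_8bit(img16, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile band of img16 to 0-255."""
    lo, hi = np.percentile(img16, [low_pct, high_pct])
    if hi <= lo:                                   # flat image: nothing to stretch
        return np.zeros(img16.shape, dtype=np.uint8)
    scaled = (img16.astype(np.float64) - lo) / (hi - lo)
    return np.round(np.clip(scaled, 0.0, 1.0) * 255.0).astype(np.uint8)

rng = np.random.default_rng(0)
frame16 = rng.integers(30000, 30300, size=(32, 32)).astype(np.uint16)
target8 = stretch_to_8bit(frame16)                 # candidate target image
sample = (frame16, target8)                        # one (16-bit, 8-bit) image pair
```

Each such pair would form one historical sample; the stretch limits are recomputed per frame, so they follow the drift of the camera's output bias.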

[0094] The image pair formed by the historical 16-bit infrared image data and the historical 8-bit visual image data constitutes one historical sample.

[0095] Following this method, the required number of historical sample data are generated, forming a historical sample dataset.

[0096] Step 3: Train the neural network shown in Figure 2 using the historical sample dataset.

[0097] The training methods can include supervised learning, unsupervised learning, methods based on optimizing the loss function such as gradient descent, or adversarial training with a discriminator.

[0098] Secondly, the decrypted 16-bit infrared image data (the image data to be processed) obtained through the aforementioned cooled infrared camera is input into the trained neural network shown in Figure 2 for inference, and the current 8-bit stretched visualization image data is obtained and output.

[0099] Specifically, such as Figure 2 The input data of the Unet network shown is decrypted 16-bit infrared image data captured by a cooled infrared camera, and the output of the Unet network is 8-bit visualized image data.
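
As a non-authoritative sketch of the inference step, the 16-bit input may be normalized, passed through the trained network, and quantized to 8 bits. The wrapper `infer_8bit`, the normalization by 65535, and `placeholder_model` (a stand-in for the trained network, not the network of Figure 2) are all assumptions made for illustration.

```python
import numpy as np

def infer_8bit(raw16, model):
    """Run an image-to-image model on one 16-bit frame and quantize to 8 bits.

    `model` is any callable mapping a float array in [0, 1] to a float array
    of the same shape. Scaling the input by 65535 is an assumed convention.
    """
    x = np.asarray(raw16, dtype=np.float32) / 65535.0
    y = np.asarray(model(x), dtype=np.float32)
    if y.shape != x.shape:
        raise ValueError("model output must match the input shape")
    return np.round(np.clip(y, 0.0, 1.0) * 255.0).astype(np.uint8)

def placeholder_model(x):
    """Hypothetical stand-in for the trained network: a min-max rescale."""
    lo, hi = float(x.min()), float(x.max())
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)

# Synthetic 16-bit frame confined to a narrow band (not real camera data).
frame = np.linspace(12000, 12400, 64 * 64).reshape(64, 64).astype(np.uint16)
vis = infer_8bit(frame, placeholder_model)
```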

[0100] As shown above, through multi-level U-net structures with different spatial resolutions, the embodiments of the present invention obtain feature space representations at different scales.

[0101] Figure 4a and Figure 4b show the results of the infrared image processing method provided by the embodiments of the present invention, together with a comparison of results and metrics against other techniques. Figure 4a compares the results of processing five images with different techniques. Groundtruth corresponds to the image before processing; CLAHE denotes contrast-limited adaptive histogram equalization; HE denotes histogram equalization; LGT denotes linear grayscale transformation; ILGT denotes improved linear grayscale transformation; and Ours denotes the method provided in this application. Comparing the images in Figure 4a shows that the 8-bit image obtained with the method provided in this application has the highest similarity to the original image and the lowest processing distortion, and resolves the problem of the image being too dark because of local high-radiation objects. Figure 4b shows the comparison metrics.

[0102] SSIM stands for Structural Similarity, a metric used to measure the similarity between two images. It can be seen that the 8-bit image obtained by the method provided in this application, compared to the Ground truth image (corresponding to the image before processing), has the highest SSIM score, indicating the highest similarity compared to the original image. PSNR, or Peak Signal-to-Noise Ratio, is a metric used to measure distortion after image processing; a higher PSNR indicates less distortion. It can also be seen that the 8-bit image obtained by the method provided in this application, compared to the Ground truth image (corresponding to the image before processing), has the highest PSNR score, indicating the lowest distortion compared to the original image.
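
For reference, the two metrics can be computed as in the following minimal sketch. PSNR follows its standard definition for 8-bit images. The SSIM shown is a simplified single-window (global) variant, which is an assumption made for brevity: common implementations average SSIM over local windows, and the text does not state which variant produced the values in Figure 4b.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def global_ssim(reference, test, peak=255.0):
    """SSIM computed over the whole image as a single window (simplified)."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(test, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

# Synthetic 8-bit image and a noisy copy (illustrative data only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
noisy = np.clip(image + rng.normal(0.0, 20.0, image.shape), 0, 255).astype(np.uint8)

psnr_noisy = psnr(image, noisy)
ssim_noisy = global_ssim(image, noisy)
```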

[0103] This invention also provides an infrared image data processing device, as described in the following embodiments. Since the principle by which this device solves the problem is similar to that of the infrared image data processing method, the implementation of the device can refer to the implementation of the method, and repeated details are not elaborated further.

[0104] Figure 5 is a schematic diagram of the infrared image processing device in an embodiment of the present invention. As shown in Figure 5, the device includes:

[0105] Acquisition unit 01 is used to acquire current 16-bit infrared image data through a cooled infrared camera equipped with a narrowband filter;

[0106] Image processing unit 02 is used to input the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain 8-bit visualization image data. The Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset. Each historical sample data in the historical sample dataset includes an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualization image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data.

[0107] In one embodiment, the infrared image processing apparatus provided by the present invention may further include: a training unit for constructing a historical sample dataset and using the historical sample dataset to train a Transformer-based Unet neural network; wherein each historical sample data in the historical sample dataset consists of an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualization image data; wherein the historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, the filtering range of the narrowband filter being 3.2um-3.25um; and the historical 16-bit infrared image data is processed into the historical 8-bit visualization image data.

[0108] In one embodiment, the Transformer-based Unet neural network includes: a downsampling portion composed of Transformer modules connected in sequence, and an upsampling portion composed of Transformer modules connected in sequence.

[0109] In one embodiment, when the Transformer module is executed, it includes the following operations:

[0110] For the input infrared feature map vector X, generate a query vector, a key vector, and a value vector;

[0111] After reshaping the vector matrices of Query and Key using a row-reshape operation, a dot product is performed to generate an attention map. The size of this attention map is [size missing].

[0112] The attention map is multiplied by the value vector matrix, and then reshaped using a row reshape operation to restore the original image size.

[0113] The reshaped result, restored to the original image size, is added to the input infrared feature map vector X to obtain the feature map vector output by the Transformer module.
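
The operations above can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: the query, key, and value are produced by 1x1 projections with hypothetical weights `wq`, `wk`, `wv`; the attention map is taken across channels (size C x C), although an attention map across spatial positions would be equally consistent with the wording; and a scaled softmax is applied to the dot product.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_module(x, wq, wk, wv):
    """Self-attention with a residual connection on a (C, H, W) feature map.

    wq, wk, wv: (C, C) weights of assumed 1x1 projections (hypothetical).
    """
    c, h, w = x.shape
    # Generate query, key and value from the input feature map X.
    q = np.einsum("oc,chw->ohw", wq, x)
    k = np.einsum("oc,chw->ohw", wk, x)
    v = np.einsum("oc,chw->ohw", wv, x)
    # Row reshape: flatten the spatial dimensions, giving (C, H*W) matrices.
    q, k, v = (t.reshape(c, h * w) for t in (q, k, v))
    # Dot product of query and key; assumed channel-wise attention, (C, C).
    attn = softmax((q @ k.T) / np.sqrt(h * w), axis=-1)
    # Multiply by the value matrix and reshape back to the original size.
    out = (attn @ v).reshape(c, h, w)
    # Add the input feature map X.
    return out + x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 5))
wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
y = transformer_module(x, wq, wk, wv)
```

Because of the final addition, the module reduces to the identity when the value projection is zero, so the attention branch only has to learn a correction to its input.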

[0114] In one embodiment, in the Transformer-based Unet neural network:

[0115] Skip connections are used for fusion at the corresponding stages of downsampling and upsampling;

[0116] and/or a residual connection is provided between the input and output of the neural network.
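
A structural sketch of these two kinds of connection is given below. It is illustrative only: average pooling, nearest-neighbour upsampling, fusion by addition, a constant channel count, and the `block` callable standing in for a Transformer module are all assumptions, since the text does not specify these details.

```python
import numpy as np

def downsample(x):
    """2x average pooling on a (C, H, W) map; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """2x nearest-neighbour upsampling on a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_forward(x, block, levels=2):
    """U-shaped forward pass with skip connections and a global residual.

    `block` is a shape-preserving callable standing in for a Transformer module.
    """
    skips = []
    feat = x
    for _ in range(levels):            # downsampling path
        feat = block(feat)
        skips.append(feat)             # keep features for the skip connection
        feat = downsample(feat)
    feat = block(feat)                 # lowest-resolution stage
    for skip in reversed(skips):       # upsampling path
        feat = upsample(feat)
        feat = block(feat + skip)      # skip fusion (assumed: addition)
    return feat + x                    # residual between input and output

x = np.ones((1, 8, 8))
out = unet_forward(x, block=lambda f: f)  # identity blocks, for checking shapes
```

With the global residual, the network only has to predict the difference between its input and the desired output, which is consistent with the role the residual connection is described as playing.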

[0117] In one embodiment, the narrowband filter has a filtering range of 3.2µm-3.25µm.

[0118] Figure 6 is a schematic structural diagram of a computing device 900 provided in an embodiment of this application. The computing device 900 includes: a processor 910, a memory 920, and a communication interface 930.

[0119] It should be understood that the communication interface 930 in the computing device 900 shown in Figure 6 can be used to communicate with other devices.

[0120] The processor 910 can be connected to the memory 920. The memory 920 can be used to store the program code and data. Therefore, the memory 920 can be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a component that includes both the storage unit inside the processor 910 and the external storage unit independent of the processor 910.

[0121] Optionally, the computing device 900 may also include a bus. The memory 920 and communication interface 930 can be connected to the processor 910 via the bus. The bus can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc.

[0122] It should be understood that in the embodiments of this application, the processor 910 may be a central processing unit (CPU). The processor may also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 910 may employ one or more integrated circuits to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0123] The memory 920 may include read-only memory and random access memory, and provides instructions and data to the processor 910. A portion of the processor 910 may also include non-volatile random access memory. For example, the processor 910 may also store device type information.

[0124] When the computing device 900 is running, the processor 910 executes the computer execution instructions in the memory 920 to perform the operation steps of the above method.

[0125] It should be understood that the computing device 900 according to the embodiments of this application can correspond to the corresponding subject in executing the methods according to the various embodiments of this application, and the above and other operations and / or functions of each module in the computing device 900 are respectively for implementing the corresponding processes of the methods of this embodiment. For the sake of brevity, they will not be described in detail here.

[0126] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0127] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0128] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0129] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0130] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0131] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0132] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, is used to perform the above-described method, which includes at least one of the schemes described in the above embodiments.

[0133] The computer storage medium in this application embodiment can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium can be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0134] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0135] The program code contained on a computer-readable medium may be transmitted using any suitable medium, including, but not limited to, wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0136] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0137] This invention also provides a computer program product, which includes a computer program that, when executed by a processor, implements the above-described infrared image data processing method.

[0138] In this embodiment of the invention, the infrared image data processing scheme, compared with existing technologies that cannot clearly visualize infrared images, achieves clear image visualization of narrowband infrared image data by: obtaining current 16-bit infrared image data using a cooled infrared camera equipped with a narrowband filter; inputting the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain current 8-bit visualized image data; the Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset, where each historical sample data includes an image pair consisting of a historical 16-bit infrared image data and a historical 8-bit visualized image data. The historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualized image data is obtained by processing the historical 16-bit infrared image data.

[0139] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0140] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0141] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0142] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0143] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for processing infrared image data, characterized in that it comprises: obtaining current 16-bit infrared image data by using a cooled infrared camera equipped with a narrowband filter; and inputting the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain current 8-bit visualization image data; wherein the Transformer-based Unet neural network is pre-trained using a pre-built historical sample dataset; each historical sample data in the historical sample dataset includes an image pair consisting of historical 16-bit infrared image data and historical 8-bit visualization image data; the historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data; the Transformer-based Unet neural network includes: a downsampling section composed of Transformer modules in sequence, and an upsampling section composed of Transformer modules in sequence; skip connections are used for fusion at corresponding stages of downsampling and upsampling; and/or residual connections are provided between the input and output of the neural network; when the Transformer module is executed, it includes the following operations: for the input infrared feature map vector X, generating a query vector, a key vector, and a value vector; reshaping the Query and Key vector matrices and then performing a dot product to generate an attention map, the size of which is [value missing]; multiplying the attention map by the Value vector matrix and then reshaping the result to restore the original image size; and adding the restored result to the input infrared feature map vector X to obtain the feature map vector output by the Transformer module.

2. The method as described in claim 1, characterized in that the narrowband filter has a filtering range of 3.2µm-3.25µm.

3. An infrared image data processing device, characterized in that it comprises: an acquisition unit, used to acquire current 16-bit infrared image data through a cooled infrared camera equipped with a narrowband filter; and an image processing unit, used to input the current 16-bit infrared image data into a Transformer-based Unet neural network to obtain current 8-bit visualization image data; wherein the Transformer-based Unet neural network is pre-trained using a pre-constructed historical sample dataset; each historical sample data in the historical sample dataset includes an image pair consisting of historical 16-bit infrared image data and historical 8-bit visualization image data; the historical 16-bit infrared image data is obtained using a cooled infrared camera equipped with a narrowband filter, and the historical 8-bit visualization image data is obtained by processing the historical 16-bit infrared image data; the Transformer-based Unet neural network includes: a downsampling section composed of Transformer modules in sequence, and an upsampling section composed of Transformer modules in sequence; skip connections are used for fusion at corresponding stages of downsampling and upsampling; and/or residual connections are provided between the input and output of the neural network; when the Transformer module is executed, it includes the following operations: for the input infrared feature map vector X, generating a query vector, a key vector, and a value vector; reshaping the Query and Key vector matrices and then performing a dot product to generate an attention map, the size of which is [value missing]; multiplying the attention map by the Value vector matrix and then reshaping the result to restore the original image size; and adding the restored result to the input infrared feature map vector X to obtain the feature map vector output by the Transformer module.

4. The apparatus as described in claim 3, characterized in that the narrowband filter has a filtering range of 3.2µm-3.25µm.

5. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method of any one of claims 1 to 2.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of any one of claims 1 to 2.

7. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the method of any one of claims 1 to 2.