Thermal image depth estimation method and device

The thermal image depth estimation method addresses the challenges of high-cost LiDAR data by using a neural network trained on thermal images with specific loss functions, achieving accurate depth estimation and edge preservation.

WO2026141769A1 · PCT designated stage · Publication Date: 2026-07-02 · THERMOEYE INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
THERMOEYE INC
Filing Date
2025-01-21
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Monocular depth estimation in computer vision and robotics faces two challenges: collecting LiDAR ground-truth data is expensive, and the depth information LiDAR provides is sparse, which makes accurate, dense depth estimation difficult.

Method used

A thermal image depth estimation method using a neural network model trained on a thermal image dataset, employing normalization, preprocessing, and multiple loss functions such as Scale-Shift Invariant Loss, Gradient Matching Loss, and Thermal Edge Loss to enhance depth estimation accuracy.

Benefits of technology

Accurately estimates depth from a single thermal image, achieving sub-meter error in pixel-level distance estimation while preserving thermal edges and maintaining consistent predictions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2025001173_02072026_PF_FP_ABST

Abstract

Disclosed are a thermal image depth estimation method and device. According to the present invention, a thermal image depth estimation device is provided, comprising a processor and a memory connected to the processor, wherein the memory stores program instructions executed by the processor to: develop a prototype neural network model using a pre-collected thermal image dataset; train the neural network model based on a preset loss; and optimize the trained neural network model and deploy it in the Open Neural Network Exchange (ONNX) format.

Description

Thermal imaging depth estimation method and device

[0001] The present invention relates to a thermal image depth estimation method and apparatus, and more specifically, to a monocular depth estimation method and apparatus that measures the depth for each pixel based on a single 2D image.

[0002] This invention results from the Agency for Defense Development research project "Civil-Military Technology Cooperation (R&D)" (Ministry of Trade, Industry and Energy; Defense Acquisition Program Administration; Ministry of Science and ICT), under the research task "Development of a domestically produced thermal imaging-visible light fusion edge camera for robots utilizing thermal imaging-visible light image compression and delay compensation technology."

[0003] In recent years, monocular depth estimation has become an important component of computer vision and robotics, with applications such as autonomous driving and augmented reality, because of its potential to replace LiDAR.

[0004] Many studies have shown that monocular depth estimation performance can be improved through supervised learning using LiDAR as ground truth data.

[0005] However, supervised learning has two problems: the high cost of LiDAR makes data collection difficult, and because LiDAR provides only sparse depth information, accurate and dense depth cannot be estimated.

[0006] To address this problem, researchers proposed a self-supervised learning method that does not require actual depth information throughout the learning process.

[0007] Many researchers have proposed self-supervised monocular depth estimation methods, and although the performance gap between supervised and self-supervised learning is narrowing, depth estimation accuracy remains low.

[0008] To solve the problems of the aforementioned prior art, the present invention proposes a thermal image depth estimation method and apparatus capable of accurately estimating depth using a single thermal image even in various environments.

[0009] To achieve the above purpose, according to one embodiment of the present invention, a thermal image depth estimation device is provided, comprising: a processor; and a memory connected to the processor, wherein the memory stores program instructions executed by the processor to develop a prototype neural network model using a previously collected thermal image dataset, train the neural network model based on a preset loss, and optimize the trained neural network model and deploy it in the Open Neural Network Exchange (ONNX) format.

[0010] The program instructions may normalize the previously collected thermal image dataset, perform preprocessing including noise removal and resizing on the normalized thermal image dataset, input the thermal images of the preprocessed dataset into the neural network model to output a depth map, and postprocess the output depth map.

[0011] Before a thermal image is input into the neural network model, the program instructions may perform depth variation processing on it, taking into account how thermal images differ from RGB images, so as to accurately capture temperature-based depth features.

[0012] The program instructions may use a preset model to convert the depth of a thermal image into disparity (parallax) space (d = 1 / t) and normalize it from 0 (closest) to 1 (farthest).

[0013] The program instructions may use a Scale-Shift Invariant Loss and a Gradient Matching Loss to train the neural network model.

[0014] The program instructions may implement the scale-shift invariant loss as an affine-invariant loss that ignores the unknown scale and shift of each sample.

[0015] The final training loss may additionally include a Thermal Edge Loss and a Thermal Contrast Loss so that the model stays adapted to thermal conditions.

[0016] According to another aspect of the present invention, a thermal image depth estimation method is provided, comprising the steps of: developing a prototype neural network model using a pre-collected thermal image dataset; training the neural network model based on a preset loss; and optimizing the trained neural network model and deploying it in the Open Neural Network Exchange (ONNX) format.

[0017] According to another aspect of the present invention, a computer program stored in a computer-readable recording medium for performing the above-described method is provided.

[0018] The present invention has the advantage of enabling accurate monocular depth estimation from a thermal image.

[0019] FIG. 1 is a flowchart illustrating a process for estimating thermal image depth according to a preferred embodiment of the present invention.

[0020] FIG. 2 is a diagram illustrating the prototype development process according to the present embodiment.

[0021] FIG. 3 is a diagram illustrating the inference process of a neural network model according to the present embodiment.

[0022] FIG. 4 is a diagram illustrating exemplary inference using a neural network model.

[0023] FIG. 5 is a diagram showing the gradient matching loss according to the present embodiment.

[0024] FIG. 6 is a diagram illustrating the configuration of a thermal image depth estimation device according to the present embodiment.

[0025] FIG. 7 shows the monocular depth estimation performance according to the present embodiment.

[0026] The present invention is capable of various modifications and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the invention to specific embodiments, and it should be understood that the invention includes all modifications, equivalents, and substitutions that fall within the spirit and scope of the invention.

[0027] The terms used herein are merely for describing specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as “comprising” or “having” are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0028] Furthermore, the components of the embodiments described with reference to each drawing are not limited to the respective embodiments and may be implemented to be included in other embodiments within the scope of maintaining the technical spirit of the present invention. It is also obvious that multiple embodiments may be re-implemented as a single embodiment that integrates multiple embodiments, even if a separate description is omitted.

[0029] Furthermore, in the description referring to the attached drawings, identical components are assigned the same or related reference numerals regardless of drawing symbols, and redundant descriptions thereof are omitted. In describing the present invention, if it is determined that a detailed description of related prior art could unnecessarily obscure the essence of the present invention, such detailed description is omitted.

[0030]

[0031] The present embodiment relates to a deep learning-based monocular depth estimation method for measuring accurate pixel-level distances from a thermal image.

[0032] This embodiment proposes a method for generating an absolute distance map, converting it into a high-quality 3D point cloud, and optimizing the model so that it can be deployed efficiently on embedded Neural Processing Unit (NPU) hardware.
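Paragraph [0032] does not say how the absolute distance map becomes a point cloud. A common approach, shown here as a minimal sketch, back-projects each pixel through an ideal pinhole camera model. The intrinsics fx, fy, cx, cy and the function name depth_to_point_cloud are hypothetical, and lens distortion is ignored.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a metric depth map into 3D camera coordinates.

    Assumes an ideal pinhole camera with hypothetical intrinsics
    (fx, fy: focal lengths in pixels; cx, cy: principal point).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: row index (image y)
        for u, z in enumerate(row):      # u: column index (image x)
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each valid pixel yields one 3D point whose z coordinate is the metric depth itself, so the accuracy of the point cloud follows directly from the accuracy of the distance map.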

[0033] FIG. 1 is a flowchart illustrating a process for estimating thermal image depth according to a preferred embodiment of the present invention.

[0034] Referring to FIG. 1, the thermal image depth estimation process according to the present embodiment may include prototype development (step 100), training (step 102), optimization (step 104), and deployment (step 106).

[0035] FIG. 2 is a diagram illustrating the prototype development process according to the present embodiment. The process may include normalizing a previously collected thermal image dataset (step 200), preprocessing it (step 202), inputting the preprocessed data into the neural network model for inference (step 204), and postprocessing the result (step 206).

[0036] A thermal image dataset contains thermal images and depth information for each thermal image. It can be collected in advance, either from existing data from various sources or by capturing images directly with dedicated equipment.

[0037] Step 200 is to normalize the previously collected thermal images, and Step 202 is to perform processes such as noise removal and resizing on the normalized thermal images. Preprocessing improves the quality of the data input to the neural network model.
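Steps 200 and 202 can be sketched in plain Python as follows. The specific choices are assumptions, not the embodiment's stated method: min-max normalization of raw thermal values, a 3x3 median filter for noise removal, and nearest-neighbor resizing. All function names are hypothetical.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def normalize(img: Image) -> Image:
    """Min-max normalize raw thermal values (e.g. sensor counts) to [0, 1]."""
    lo = min(min(r) for r in img)
    hi = max(max(r) for r in img)
    span = hi - lo or 1.0  # avoid division by zero on a flat frame
    return [[(p - lo) / span for p in r] for r in img]


def median_denoise(img: Image) -> Image:
    """3x3 median filter; border pixels use only in-bounds neighbors."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(median(window))
        out.append(row)
    return out


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbor resize to the model's input resolution."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // out_h][x * w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(raw: Image, out_h: int, out_w: int) -> Image:
    """Normalize (step 200), then denoise and resize (step 202)."""
    return resize_nearest(median_denoise(normalize(raw)), out_h, out_w)
```

The median filter is a plausible choice for thermal sensors because it suppresses isolated hot or dead pixels without blurring edges as much as a mean filter would.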

[0038] FIG. 3 is a diagram showing the inference process of a neural network model according to the present embodiment, and FIG. 4 is a diagram exemplarily illustrating inference using a neural network model.

[0039] In step 204, the neural network model according to the present embodiment, which has a multi-head-based transformer structure, takes the preprocessed thermal image data as input and outputs a depth map.

[0040] The quality of the depth map output in this way is improved through a post-processing process.

[0041] The prototype model code may be written in Python using machine learning libraries such as TensorFlow or PyTorch and computer vision libraries such as OpenCV.

[0042] To ensure accurate depth estimation, a high-quality thermal image dataset is critical for effective neural network model training and performance. In this embodiment, considering that the thermal images included in the dataset differ from RGB images, depth variation processing (Handling Depth Variability in Thermal Data) is performed on the thermal images to accurately capture temperature-based depth features.

[0043] A TEACHER model is used to handle depth variation in the thermal image data.

[0044] The TEACHER model starts from a pre-trained DINOv2 encoder and is then fine-tuned on a thermal image dataset. Because depth varies across thermal images, depth is converted into disparity space (d = 1 / t) and normalized from 0 (nearest) to 1 (farthest).

[0045] This normalization aligns the entire thermal image dataset, enabling consistent disparity prediction across thermal images.
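The disparity conversion of [0044] can be sketched as below, assuming strictly positive metric depths and a linear min-max rescaling (the text does not state how the rescaling is done). Because disparity is largest for the nearest point, the mapping is inverted so that the nearest point lands at 0 and the farthest at 1, matching the orientation stated in the text. The function name is hypothetical, and a flat list stands in for a depth map.

```python
from typing import List


def depth_to_normalized_disparity(depth: List[float]) -> List[float]:
    """Convert metric depths t to disparity d = 1 / t, then min-max
    rescale so the nearest point maps to 0 and the farthest to 1."""
    disparity = [1.0 / t for t in depth]  # assumes every depth t > 0
    d_max, d_min = max(disparity), min(disparity)
    span = d_max - d_min or 1.0  # guard against a constant-depth map
    # Largest disparity (nearest point) -> 0, smallest (farthest) -> 1.
    return [(d_max - d) / span for d in disparity]
```

Working in disparity compresses distant depths, so far background pixels with large, noisy depth values do not dominate the target range.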

[0046] The loss functions used in the training process of the neural network model according to the present embodiment are Scale-Shift Invariant Loss and Gradient Matching Loss.

[0047] A scale-shift invariant loss is needed because the predicted depth and the actual depth may differ by a scale or shift factor. For example, if the actual depths are 1, 0.5, and 0.1 but the predictions are 0.9, 0.6, and 0.3, the relative relationship is similar but the values are not consistent. To keep scale and shift from affecting the loss, the depth maps must be aligned before the mean squared error is applied.
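The example in [0047] can be worked through numerically. This sketch assumes the alignment is a closed-form least-squares fit of a scale s and shift t applied to the prediction, as in MiDaS-style scale-and-shift-invariant losses; the text does not specify how the alignment is computed, and the helper names are hypothetical.

```python
from typing import List, Tuple


def align_scale_shift(pred: List[float], gt: List[float]) -> Tuple[float, float]:
    """Closed-form least-squares s, t minimizing sum((s * p + t - g) ** 2)."""
    n = len(pred)
    mp, mg = sum(pred) / n, sum(gt) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gt))
    var = sum((p - mp) ** 2 for p in pred)
    s = cov / var
    t = mg - s * mp
    return s, t


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


gt = [1.0, 0.5, 0.1]    # actual depths from the text
pred = [0.9, 0.6, 0.3]  # predictions from the text
s, t = align_scale_shift(pred, gt)
aligned = [s * p + t for p in pred]
print(round(s, 3), round(t, 3))     # 1.5 -0.367
print(round(mse(pred, gt), 4))      # 0.02 (before alignment)
print(round(mse(aligned, gt), 5))   # 0.00056 (after alignment)
```

After alignment the error drops from 0.02 to about 0.00056; what remains is only the part of the discrepancy that no global scale and shift can explain.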

[0048] To ignore the unknown scale and shift of each sample, the following Affine-Invariant Loss is adopted.

[0049]

[0050] Here, the two quantities compared for each pixel are the predicted value and the actual value, respectively, and ρ is the affine-invariant mean absolute error.

[0051] The scaled and shifted versions of the predicted and actual values are defined as follows.

[0052]

[0053] t(d) and s(d) are used to align the predicted and actual values so that each has zero translation and unit scale, as follows.

[0054]
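The equations of [0049] to [0054] did not survive extraction. Assuming they follow the widely used affine-invariant formulation from MiDaS, with t(d) as the median of a map and s(d) as the mean absolute deviation from that median, the loss can be sketched as follows. The function names are hypothetical.

```python
from statistics import median
from typing import List


def normalize_affine(d: List[float]) -> List[float]:
    """Shift by t(d) = median(d) and divide by s(d) = mean(|d_i - t(d)|),
    giving zero translation and unit scale."""
    t = median(d)
    s = sum(abs(x - t) for x in d) / len(d)
    if s == 0:
        raise ValueError("constant map: scale s(d) is zero")
    return [(x - t) / s for x in d]


def affine_invariant_mae(pred: List[float], gt: List[float]) -> float:
    """Mean over pixels of |normalized pred - normalized gt|."""
    p_hat, g_hat = normalize_affine(pred), normalize_affine(gt)
    return sum(abs(a - b) for a, b in zip(p_hat, g_hat)) / len(pred)
```

Any prediction of the form a * gt + b with a > 0 normalizes to the same values as the ground truth and therefore has zero loss, which is exactly the invariance to unknown scale and shift described in [0048].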

[0055] FIG. 5 is a diagram showing the gradient matching loss according to the present embodiment.

[0056] Referring to FIG. 5, the gradient matching loss preserves detail by aligning the vertical and horizontal gradients of the predicted depth map with those of the actual depth map. Gradients are computed along the x-axis and y-axis of both maps, and the loss is applied at the gradient level.
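A minimal sketch of a multi-scale gradient matching loss follows, loosely modeled on the MiDaS formulation: it penalizes forward-difference gradients of the residual between prediction and ground truth at several subsampled scales. The number of scales, the per-scale normalization, and the assumption that the prediction is already scale-and-shift aligned are choices made for illustration.

```python
from typing import List

Grid = List[List[float]]


def gradient_matching_loss(pred: Grid, gt: Grid, scales: int = 4) -> float:
    """Multi-scale gradient matching on the residual R = pred - gt.

    At scale k both maps are subsampled with stride 2**k; the loss sums
    |dR/dx| + |dR/dy| (forward differences), divides by the number of
    pixels at that scale, and averages over scales.
    """
    total = 0.0
    for k in range(scales):
        step = 2 ** k
        # Residual at scale k (stride-2**k subsampling of both maps).
        r = [
            [p - g for p, g in zip(row_p[::step], row_g[::step])]
            for row_p, row_g in zip(pred[::step], gt[::step])
        ]
        h, w = len(r), len(r[0])
        grad = 0.0
        for y in range(h):
            for x in range(w):
                if x + 1 < w:
                    grad += abs(r[y][x + 1] - r[y][x])  # horizontal gradient
                if y + 1 < h:
                    grad += abs(r[y + 1][x] - r[y][x])  # vertical gradient
        total += grad / (h * w)
    return total / scales
```

Because the loss acts on gradients of the residual, a prediction that is off by a constant everywhere costs nothing, while a prediction that smooths away a depth edge is penalized at every scale where the edge remains visible.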

[0057] This allows neural network models to better capture fine features such as edges, which is important for object detection and segmentation tasks.

[0058] The final loss combines the scale-shift invariant loss and the multi-scale gradient matching loss. The scale-shift invariant loss learns relative depth, while the gradient matching loss preserves sharp edges and fine details. In the training phase, a thermal condition loss is coupled with this base loss to keep the neural network model adapted to thermal conditions.

[0059] Here, thermal condition loss may include thermal edge loss and thermal contrast loss.
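The text does not define the thermal edge and thermal contrast losses or give any weights, so the sketch below only shows how the terms in [0058] and [0059] could be combined into one scalar. The component losses are taken as precomputed values, and the LossWeights values and the function name total_loss are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights; the text does not give any values.
    grad: float = 0.5
    thermal_edge: float = 0.1
    thermal_contrast: float = 0.1


def total_loss(
    ssi: float,
    grad: float,
    thermal_edge: float,
    thermal_contrast: float,
    w: LossWeights = LossWeights(),
) -> float:
    """Base loss (SSI + weighted gradient matching) plus the weighted
    thermal condition loss (thermal edge + thermal contrast terms)."""
    base = ssi + w.grad * grad
    thermal = w.thermal_edge * thermal_edge + w.thermal_contrast * thermal_contrast
    return base + thermal
```

Keeping the thermal terms as separately weighted additions lets them be tuned or switched off (weight 0) without changing the base loss.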

[0060] FIG. 6 is a diagram illustrating the configuration of a thermal image depth estimation device according to the present embodiment.

[0061] As illustrated in FIG. 6, the thermal image depth estimation device according to the present embodiment may include a processor (600) and a memory (602).

[0062] Here, the processor (600) may include a CPU (central processing unit) capable of executing a computer program or other virtual machines.

[0063] The memory (602) may include a non-volatile storage device such as a fixed hard drive or a removable storage device. The removable storage device may include a CompactFlash unit, a USB memory stick, or the like. The memory (602) may also include volatile memory such as various types of random access memory and may be defined as a computer-readable recording medium.

[0064] The memory (602) according to the present embodiment stores program instructions for developing a prototype neural network model using a pre-collected thermal image dataset, training the neural network model based on a preset loss, and optimizing the trained neural network model and deploying it in the Open Neural Network Exchange (ONNX) format.

[0065] FIG. 7 shows the monocular depth estimation performance according to the present embodiment.

[0066] Referring to FIG. 7, the input is a normalized thermal frame showing a pedestrian and a background scene, and the output is a color-coded depth map visualization. The depth map is expressed in metric units.

[0067] Pixel-level comparison is performed over a 3x3 region of interest: the ground-truth measured distance is 11 m (uniform across the region), the predicted distance ranges from 9.8 m to 11.4 m, and the error range is 0.6 m to 1.0 m. The monocular depth estimation model according to the present embodiment achieves depth estimation accurate to within a meter while preserving thermal edges and keeping predictions consistent.

[0068] The embodiments of the present invention described above are disclosed for illustrative purposes only, and those skilled in the art with ordinary knowledge of the present invention may make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be considered to fall within the scope of the following claims.

Claims

1. A thermal image depth estimation device, comprising: a processor; and a memory connected to the processor, wherein the memory stores program instructions executed by the processor to: develop a prototype neural network model using a pre-collected thermal image dataset; train the neural network model based on a preset loss; and optimize the trained neural network model and deploy it in the Open Neural Network Exchange (ONNX) format.

2. The thermal image depth estimation device of claim 1, wherein the program instructions: normalize the pre-collected thermal image dataset; perform preprocessing, including noise removal and scaling, on the normalized thermal image dataset; input thermal images included in the preprocessed thermal image dataset into the neural network model to output a depth map; and postprocess the output depth map.

3. The thermal image depth estimation device of claim 1, wherein the program instructions perform depth variation processing on a thermal image before it is input into the neural network model, taking into account the differences between the thermal image and an RGB image, so as to accurately capture temperature-based depth features.

4. The thermal image depth estimation device of claim 3, wherein the program instructions use a preset model to convert the depth of a thermal image into disparity (parallax) space (d = 1 / t) and normalize it from 0 (closest) to 1 (farthest).

5. The thermal image depth estimation device of claim 1, wherein the program instructions use a Scale-Shift Invariant Loss and a Gradient Matching Loss to train the neural network model.

6. The thermal image depth estimation device of claim 5, wherein the program instructions implement the scale-shift invariant loss as an affine-invariant loss that ignores the unknown scale and shift of each sample.

7. The thermal image depth estimation device of claim 5, wherein the final loss for the training additionally includes a Thermal Edge Loss and a Thermal Contrast Loss to maintain adaptation to thermal conditions.

8. A thermal image depth estimation method performed in a device including a processor and a memory, the method comprising: developing a prototype neural network model using a pre-collected thermal image dataset; training the neural network model based on a preset loss; and optimizing the trained neural network model and deploying it in the Open Neural Network Exchange (ONNX) format.

9. The thermal image depth estimation method of claim 8, wherein the developing step performs depth variation processing on a thermal image before it is input into the neural network model, taking into account the differences between the thermal image and an RGB image, so as to accurately capture temperature-based depth features.

10. A computer program stored on a computer-readable recording medium for performing the method of claim 8.