A method for three-dimensional depth analysis of intraluminal endoscopic lipids based on physical constraints and neural interpretable mapping

By introducing an optical attenuation coefficient and a non-negative/monotonic physical prior, and combining a saliency map with the optical attenuation coefficient, the problems of coarse output granularity, three-dimensional inconsistency, and physical uninterpretability in lipid recognition in OCT imaging are solved. This achieves precise localization of the lipid core and consistent three-dimensional reconstruction, making it suitable for clinical interventional diagnosis and treatment.

CN121998976B · Active · Publication Date: 2026-06-16 · JIAXING RES INST ZHEJIANG UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JIAXING RES INST ZHEJIANG UNIV
Filing Date
2026-04-08
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing OCT imaging technology suffers from problems such as coarse output granularity, three-dimensional inconsistency, physical uninterpretability, and sensitivity to noise artifacts in lipid identification. It is difficult to provide stable, accurate, interpretable, and three-dimensionally consistent quantitative analysis results of lipid core, which cannot meet the clinical needs of precise interventional diagnosis and treatment.

Method used

By employing a method based on physical constraints and neural interpretable mapping, and by introducing an optical attenuation coefficient and a non-negative/monotonic physical prior, combined with a saliency map and the optical attenuation coefficient, the three-dimensional distribution of lipids is reconstructed, and the upper and lower boundaries and volume of lipids are obtained, achieving precise localization in the depth direction and consistency in three-dimensional space.

Benefits of technology

It enhances the physical consistency of deep localization, improves the coherence and robustness of three-dimensional space, provides an interpretable network decision-making process, and is suitable for clinical deployment.

✦ Generated by Eureka AI based on patent content.

Figure CN121998976B_ABST

Abstract

The application discloses a method for three-dimensional depth analysis of intraluminal endoscopic lipids based on physical constraints and neural interpretable mapping. The method comprises: inputting each original intensity image into a pre-trained classification model to obtain the lipid probability vector corresponding to that frame; obtaining an initial class activation saliency map from the feature map output by the last convolutional layer of the classification model; obtaining an optical attenuation coefficient from the original intensity image; obtaining the lipid saliency from the optical attenuation coefficient and the initial class activation saliency map; obtaining the lipid upper boundary and the lipid lower boundary from the original intensity image and the lipid saliency; obtaining the thickness from the two boundaries; reconstructing the three-dimensional lipid distribution; and, from that distribution, obtaining the lipid volume, the minimum thickness, and its spatial position in the circumferential and retraction directions. The application offers enhanced physical consistency, strong interpretability, high three-dimensional stability, and ease of engineering deployment.

Description

Technical Field

[0001] This invention belongs to the field of medical image processing and computer vision technology, and specifically relates to a method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping.

Background Technology

[0002] As a leading endoscopic imaging technology, optical coherence tomography (OCT) has become the gold standard imaging tool for identifying vulnerable plaques (such as lipid cores) during coronary interventions due to its high axial resolution at the micrometer level. However, the inherent physical characteristics of OCT imaging pose significant challenges to automated lipid identification: 1) Strong speckle noise blurs tissue boundaries, resulting in a low signal-to-noise ratio; 2) Signal decays approximately exponentially with depth, making deep tissue features weak and difficult to distinguish from noise; 3) Shadows caused by highly reflective structures such as calcification result in downstream signal loss and artifacts. These factors collectively lead to OCT image interpretation being highly dependent on physician experience and exhibiting significant inter-observer variability.

[0003] In recent years, significant progress has been made in OCT lipid recognition methods based on deep learning. However, existing methods suffer from the following key limitations and defects, hindering their clinical translation and reliable application:

[0004] 1. Coarse-grained recognition and lack of fine-grained depth localization: Existing methods are mostly based on image patch or whole-frame classification, and their output is usually a region-level probability (such as the probability of plaque presence) or a pixel-level binary segmentation map. These methods cannot stably and accurately output the upper and lower boundaries of the lipid core along the depth direction of the blood vessel wall, while boundary information (especially the thickness of the thinnest fibrous cap) is the most critical clinical indicator for assessing plaque vulnerability. The root cause is that the network's feature learning in the depth dimension lacks clear anatomical and physical constraints.

[0005] 2. Poor 3D spatial consistency: Existing methods typically analyze each frame of the retracted image sequence independently, ignoring the natural continuity of vascular structures and the spatial continuity of lipid plaques in the retraction direction (longitudinal). This leads to unreasonable jitter, breaks, or abrupt changes in the identification results between adjacent frames, i.e., the "cross-frame inconsistency" problem, making it impossible to form a smooth and coherent 3D lipid distribution model, thus reducing the reliability and visualization effect of the results.

[0006] 3. Neglecting imaging physics priors and poor interpretability: Existing neural networks, as "black box" models, lack physical interpretability in their decision-making processes. Crucially, they fail to explicitly incorporate the core physical law that OCT signals decay approximately exponentially with depth. As a result, the class activation maps (CAMs) generated by the networks to explain decisions often exhibit physically implausible responses in the depth direction, such as activation in high-attenuation noise regions, or weak or absent activation in real lipid regions due to signal decay (i.e., "depth drift"). This reduces the credibility of CAMs and makes them difficult to reconcile with the clinician's physical understanding of the image.

[0007] 4. Insufficient robustness to imaging artifacts and noise: Despite the use of data augmentation and other techniques, existing methods are still insufficient in modeling the inherent speckle noise, occlusion shadows, and signal attenuation in OCT. In areas with poor image quality (such as deep or shadowed areas), model predictions are prone to isolated false positives or false negatives, and stability needs to be improved.

[0008] In summary, the shortcomings of existing technologies can be attributed to the following: in the complex imaging environment of OCT, existing deep learning lipid recognition methods suffer from problems such as coarse output granularity, three-dimensional inconsistency, physical uninterpretability, and sensitivity to noise artifacts. Therefore, they are unable to provide stable, accurate, interpretable, and three-dimensionally consistent lipid core quantitative analysis results, and cannot fully meet the clinical needs of precise interventional diagnosis and treatment.

Summary of the Invention

[0009] To address the problems existing in the background technology, this invention provides a three-dimensional depth analysis method for endoscopic lipids based on physical constraints and neural interpretable mapping. This method solves the technical problems of coarse output granularity, three-dimensional inconsistency, physical uninterpretability, and sensitivity to noise artifacts in the prior art. It provides an interpretable intravascular lipid identification method that can achieve precise depth direction positioning, consistent three-dimensional spatial optimization, and fusion of imaging physical priors.

[0010] The technical solution adopted in this invention is:

[0011] I. A method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping:

[0012] S1. OCT imaging is used to obtain the original intensity image sequence of B-scan acquired in the order of retraction within the cavity. Each frame of the original intensity image in the sequence is input into a pre-trained classification model for processing to obtain the lipid probability vector corresponding to each frame of the original intensity image.

[0013] S2. Obtain the initial class activation saliency map based on the feature map output by the last convolutional layer of the pre-trained classification model, obtain the optical attenuation coefficient based on the original intensity image, and obtain the final lipid saliency based on the optical attenuation coefficient and the initial class activation saliency map.

[0014] S3. Obtain the upper and lower lipid boundaries of each A-line based on the original intensity image and lipid saliency, and obtain the thickness based on the upper and lower lipid boundaries.

[0015] S4. Reconstruct the three-dimensional distribution of lipids based on the upper and lower boundaries of lipids and the saliency of lipids. Then, obtain the lipid volume, minimum thickness and spatial position in the circumferential and retraction directions based on the three-dimensional distribution of lipids.

[0016] Step S1 specifically involves: obtaining the original intensity image sequence of B-scans acquired in the order of retraction, and inputting the original intensity image of each frame in the sequence into the pre-trained classification model to obtain the lipid probabilities of all circumferential A-lines of that frame; these probabilities together form the lipid probability vector of the frame.

[0017] Step S2 specifically involves:

[0018] S21. Take the feature map output by the last convolutional layer of the pre-trained classification model. Compute the gradient of the A-line lipid classification output with respect to each channel k of the feature map and take its global average along the depth direction h to obtain the importance weight of each channel k for A-line classification. Using these weights, sum the feature map along the channel dimension and apply a ReLU activation function to obtain the depth saliency map. Then apply one-dimensional median or Gaussian smoothing to the depth saliency map along the depth direction h, followed by linear normalization, to obtain the initial class activation saliency map.

[0019] S22. For each position (h, w) of the original intensity image, take the points within the neighborhood window [h−r, h+r] along the depth direction and fit a linear model using the Huber loss as the robustness criterion. Obtain the initial estimate of the local optical attenuation coefficient from the fitted slope, then apply one-dimensional total variation smoothing regularization to this initial estimate along the depth direction h to obtain the final optical attenuation coefficient.

[0020] S23. Based on the initial class activation saliency map and the optical attenuation coefficient, obtain the final lipid saliency using either a product-normalization-gating method or an optimization-based iterative fine-tuning method.

[0021] The product-normalization-gating method obtains the lipid saliency according to the following formulas:

[0022]

[0023]

[0024]

[0025] where the formulas use, in order: the lipid saliency; the physically consistent depth saliency; the indicator function; the lipid classification probability of the w-th A-line in the z-th frame; the probability gating threshold; the intermediate fusion result; the depth-direction index i; the maximum of the intermediate fusion result over all depth indices i on the w-th A-line; the initial class activation saliency map; the Sigmoid activation function; the scaling parameter; the bias parameters; and the optical attenuation coefficient.

[0026] The optimization-based iterative fine-tuning method is as follows:

[0027] D1. Construct the alignment loss from the lipid saliency to be optimized and the optical attenuation coefficient:

[0028]

[0029] where the formula uses, in order: the alignment loss; the KL divergence as the metric; the Sigmoid function; the scaling parameter; the bias parameters; and the lipid saliency to be optimized.

[0030] D2. Construct the physical loss from the linear model and the optical attenuation coefficient:

[0031]

[0032] where the formula uses, in order: the physical loss; the weighting coefficient of the monotonicity constraint term; the activation function; the first-order partial derivative of the logarithmic intensity image in the depth direction h; the weighting coefficient of the attenuation coefficient smoothing term; the norm; and the gradient of the optical attenuation coefficient along the depth direction h.

[0033] D3. Define the objective function using the following formula:

[0034]

[0035] D4. Initialize the lipid saliency to be optimized with the initial class activation saliency map, then iteratively search the objective function for the optimal solution. Process the optimal solution according to the following formula to obtain the final lipid saliency:

[0036]

[0037] where the formulas use, in order: the weighting coefficient of the smoothing term; the one-dimensional total variation; the indicator function; the lipid classification probability of the w-th A-line in the z-th frame; and the probability gating threshold.

[0038] Step S3 specifically involves:

[0039] S31. Differentiate the original intensity image along the depth direction h to obtain the depth gradient map, and linearly combine the lipid saliency with the gradient magnitude of the depth gradient map to obtain the upper-boundary-sensitive response map. For each A-line at a fixed circumferential position w, search the response map within a preset range in the depth direction and select the depth location of the local maximum as the lipid upper boundary of that A-line.

[0040] S32. Set a saliency threshold and mark the regions where the lipid saliency meets the threshold condition as candidate lipid core depth regions. Compute the gradient of the optical attenuation coefficient in the depth direction h, set a positive threshold, and mark the regions where this gradient meets the threshold condition as regions where the attenuation coefficient undergoes a positive abrupt change. The union of the candidate lipid core regions and the positive-abrupt-change regions forms the candidate depth point set for lower boundary detection.

[0041] S33. Take the positions h in the candidate depth point set as candidate lower boundaries. For each candidate A-line, construct a one-dimensional energy function and minimize it with a dynamic programming or graph cut algorithm to obtain the lipid lower boundary of each A-line.

[0042] S34. Subtract the lipid upper boundary from the lipid lower boundary to obtain the thickness.

[0043] The energy function is set according to the following formula:

[0044]

[0045] where the formula uses, in order: the energy function; the weighting coefficients of the data terms; the weighting coefficient of the smoothing term; the lipid lower boundaries of two circumferentially adjacent A-lines; the lipid saliency value at the lipid lower boundary; and the gradient magnitude of the optical attenuation coefficient at the lipid lower boundary. The lower boundary is selected only from within the candidate point set.

[0046] Step S4 specifically involves:

[0047] S41. Based on the lipid upper boundary, the lipid lower boundary, and the lipid saliency, use the Huber loss to construct the variational energy functions for the three-dimensional voxel-level saliency volume and the smooth boundary surface.

[0048] S42. Solve the variational energy functions in discrete form to obtain the three-dimensional voxel-level saliency volume and the smooth boundary surface. The three-dimensional lipid distribution can be reconstructed from either of them.

[0049] S43. Based on the three-dimensional lipid distribution, calculate the lipid volume, minimum thickness, and spatial position in the circumferential and retraction directions.

[0050] The variational energy functions for the three-dimensional voxel-level saliency volume and the smooth boundary surface are set according to the following formulas:

[0051]

[0052]

[0053]

[0054]

[0055] where the formulas use, in order: the three-dimensional voxel-level saliency volume; the initial observation of the saliency volume; the smooth boundary surface; the initial observation of the boundary surface; the two robust Huber losses; the weighting coefficients; the first-order smoothness term; the partial derivative with respect to the retraction direction z; the second-order difference in the circumferential direction w; and the lipid saliency.

[0056] II. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.

[0057] The beneficial effects of this invention are:

[0058] (1) Enhanced physical consistency: By introducing optical attenuation coefficient and non-negative / monotonic physical prior, physical consistency of depth positioning is achieved, reducing depth drift across devices.

[0059] (2) High interpretability: Combining saliency maps with optical attenuation coefficients provides a visual basis for the network's decision-making process.

[0060] (3) High three-dimensional stability: Through three-dimensional consistency repair, the coherence and robustness of the pullback direction and circumferential direction are improved.

[0061] (4) Engineering deployment friendly: The classification network module is decoupled from the physical constraint module, so each can be independently optimized or accelerated in clinical deployment.

Attached Figure Description

[0062] Figure 1 is a flowchart of the present invention.

[0063] Figure 2 shows examples of the final effect of the method of the present invention: (a) is a schematic diagram of the original intensity image converted to the Cartesian coordinate system; (b) is the obtained initial class activation saliency map; (c) is a visualization example of the obtained lipid saliency; and (d) is a schematic diagram of the obtained three-dimensional lipid distribution.

[0064] Figure 3 is a module diagram of the sequence modeling module used in the embodiment.

Detailed Implementation

[0065] The present invention will now be described in more detail with reference to the accompanying drawings and embodiments. However, the present invention is not limited thereto. For those skilled in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications are also considered to be within the scope of protection of the present invention. Contents not described in detail in this specification are prior art known to those skilled in the art.

[0066] As shown in Figure 1, the endoscopic lipid three-dimensional depth analysis method of this embodiment includes the following steps:

[0067] S1. Use OCT imaging to obtain the intraluminal original intensity image sequence of B-scans acquired in the retraction order. Input the original intensity image of each frame in the sequence into a pre-trained and frozen classification model to obtain the lipid probability vector corresponding to that frame. Figure 2(a) shows the original intensity image converted to the Cartesian coordinate system.

[0068] Acquire the original intensity image sequence of B-scans in the retraction order. The sequence is indexed by the frame index z in the retraction direction, the depth (radial) index h, and the circumferential index w, and contains a given total number of frames in the retraction direction. Input the original intensity image of each frame (the z-th frame) into the pre-trained classification model to obtain the lipid probabilities of all circumferential A-lines of that frame, which together form the lipid probability vector of the frame.

[0069] In practice, the original intensity image is obtained as follows: acquire several frames of intravascular optical imaging data in the retraction sequence; perform spectral processing on each frame of optical imaging data A-line by A-line to obtain the spectrally processed A-line spectra of that frame; and apply FFT processing to all spectrally processed A-line spectra to convert them into depth-domain images, which are the original intensity images.
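The spectrum-to-depth conversion described above can be illustrated with a minimal sketch. The text does not specify the spectral processing, so the DC removal and Hann window used here are assumptions, as are the function name `spectra_to_intensity` and the array layout (one row per A-line).

```python
import numpy as np

def spectra_to_intensity(spectra):
    """Convert spectral fringes (n_alines, n_samples) to a depth-domain
    intensity image of shape (depth, n_alines)."""
    spectra = np.asarray(spectra, dtype=float)
    # Assumed spectral processing: remove each A-line's DC level,
    # then apply a Hann window to reduce side lobes.
    fringes = spectra - spectra.mean(axis=1, keepdims=True)
    fringes = fringes * np.hanning(spectra.shape[1])
    # FFT along the spectral axis gives the complex depth profile.
    depth_profile = np.fft.rfft(fringes, axis=1)
    # Intensity is the squared magnitude; transpose to (depth, A-line).
    return (np.abs(depth_profile) ** 2).T
```

A single reflector produces a cosine fringe whose frequency maps to one depth bin, which is a convenient sanity check for the layout.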

[0070] S2. Obtain the initial class activation saliency map from the feature map output by the last convolutional layer of the pre-trained classification model, obtain the optical attenuation coefficient (OAC) from the original intensity image, and obtain the final lipid saliency from the optical attenuation coefficient and the initial class activation saliency map.

[0071] S21. Take the feature map output by the last convolutional layer of the pre-trained classification model, with k as the channel index. Compute the gradient of the A-line lipid classification output with respect to each channel k of the feature map and take its global average along the depth direction h to obtain the importance weight of each channel k for A-line classification. Using these weights, sum the feature map along the channel dimension and apply a ReLU activation function to obtain the depth saliency map. Then apply one-dimensional median or Gaussian smoothing to the depth saliency map along the depth direction h (kernel width preferably 3 to 7 pixels), and linearly normalize the result to [0,1] to obtain the initial class activation saliency map. Figure 2(b) shows the initial class activation saliency map obtained.

[0072] The depth saliency map is obtained according to the following formulas:

[0073]

[0074]

[0075] where the formulas use, in order: the depth saliency map; the activation function; the weight; the channel-dimension index; the feature map output by the last convolutional layer of the classification model; the depth-direction index; the total length in the depth direction; the lipid classification logit of the w-th A-line in the z-th frame, that is, the value from which the lipid classification probability of the A-line is obtained before the Sigmoid function is applied; and the partial derivative operator.
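Step S21 follows the general Grad-CAM pattern, restricted to the depth axis. The sketch below is one possible reading under stated assumptions: `features` and `grads` are hypothetical arrays of shape (K, H, W), where `grads[k, h, w]` is taken to be the gradient of the lipid logit of A-line w with respect to `features[k, h, w]`; median smoothing with edge padding and per-A-line min-max normalization are also assumptions.

```python
import numpy as np

def depth_saliency(features, grads, kernel=5):
    """Return an (H, W) initial saliency map scaled to [0, 1] per A-line."""
    # Channel weights: average the gradient along the depth axis h.
    weights = grads.mean(axis=1)                               # (K, W)
    # Weighted sum over channels followed by ReLU.
    cam = np.maximum((weights[:, None, :] * features).sum(axis=0), 0.0)
    # 1-D median smoothing along depth, using edge padding.
    pad = kernel // 2
    padded = np.pad(cam, ((pad, pad), (0, 0)), mode="edge")
    windows = np.stack([padded[i:i + cam.shape[0]] for i in range(kernel)])
    smooth = np.median(windows, axis=0)
    # Linear (min-max) normalization of each A-line to [0, 1].
    lo = smooth.min(axis=0, keepdims=True)
    hi = smooth.max(axis=0, keepdims=True)
    return (smooth - lo) / np.maximum(hi - lo, 1e-12)
```

In a real pipeline the gradients would come from the classification framework's automatic differentiation; they are passed in here so the sketch stays framework-independent.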

[0076] S22. For each position (h, w) of the original intensity image, take the points within the neighborhood window [h−r, h+r] along the depth direction (the window is centered at the current depth h and has a half-width of r) and fit a linear model using the Huber loss as the robustness criterion. Obtain the initial estimate of the local optical attenuation coefficient from the fitted slope, then apply one-dimensional total variation (TV) smoothing regularization to this initial estimate along the depth direction h to obtain the final optical attenuation coefficient.

[0077] The linear model and the initial estimate of the local optical attenuation coefficient are set according to the following formulas:

[0078]

[0079]

[0080] where the formulas use, in order: the fitted linear model, that is, the logarithmic intensity value obtained by taking the natural logarithm of the original intensity at the given position; the fitted intercept; the fitted slope; the depth-direction variable, which ranges over [h−r, h+r]; and the initial estimate of the local optical attenuation coefficient, in which the modifier on the symbol has no separate meaning and the symbol as a whole denotes the estimated value of the optical attenuation coefficient.
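The local attenuation estimate in S22 can be sketched as a windowed robust line fit on the log-intensity. This is an illustration only: the Huber fit is implemented here by iteratively reweighted least squares, the slope-to-coefficient conversion `-slope / 2` (round-trip attenuation, in units of inverse pixels) and the clamp to non-negative values are assumptions, and the subsequent TV smoothing step is omitted. The names `huber_line_fit` and `local_attenuation` are hypothetical.

```python
import numpy as np

def huber_line_fit(x, y, delta=1.0, iters=20):
    """Fit y ~ a + b*x under the Huber loss via IRLS; return (a, b)."""
    X = np.stack([np.ones_like(x), x], axis=1)
    w = np.ones_like(y)
    for _ in range(iters):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = np.abs(y - X @ coef)
        # Huber weights: 1 inside delta, delta/|r| outside.
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
    return coef[0], coef[1]

def local_attenuation(intensity, r=5, delta=1.0):
    """Return an (H, W) map of initial attenuation estimates."""
    H, W = intensity.shape
    log_i = np.log(np.maximum(intensity, 1e-12))
    mu = np.zeros((H, W))
    for w in range(W):
        for h in range(H):
            lo, hi = max(0, h - r), min(H, h + r + 1)  # clipped window
            depth = np.arange(lo, hi, dtype=float)
            _, slope = huber_line_fit(depth, log_i[lo:hi, w], delta)
            # Assumed conversion and non-negativity prior.
            mu[h, w] = max(0.0, -slope / 2.0)
    return mu
```

On a noise-free exponential decay the fit recovers the decay constant exactly, which makes a simple check of the sign and scaling conventions.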

[0081] S23. Based on the initial class activation saliency map and the optical attenuation coefficient, obtain the final lipid saliency using either a product-normalization-gating method or an optimization-based iterative fine-tuning method. Figure 2(c) shows a visualization example of the obtained lipid saliency.

[0082] The lipid saliency obtained in step S23 provides a visual basis for the network's decision-making process and has the advantage of strong interpretability.

[0083] This embodiment provides a product-normalization-gating method, which obtains the lipid saliency according to the following formulas:

[0084]

[0085]

[0086]

[0087] where the formulas use, in order: the lipid saliency; the physically consistent depth saliency; the indicator function, which takes the value 1 when the condition inside the parentheses is true and 0 otherwise; the lipid classification probability of the w-th A-line in the z-th frame, which is the entry of the lipid probability vector corresponding to the w-th A-line; the probability gating threshold; the intermediate fusion result; the depth-direction index i; the maximum of the intermediate fusion result over all depth indices i on the w-th A-line, where the indexed term has the same meaning as the intermediate fusion result and differs only in its depth index variable; the initial class activation saliency map; the Sigmoid activation function; the scaling parameter; the bias parameters; and the optical attenuation coefficient.
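The product-normalization-gating method can be read from these definitions as three operations: multiply the class activation map by a Sigmoid of the attenuation coefficient, divide by the per-A-line maximum, and zero out A-lines whose lipid probability is below the threshold. The sketch below follows that reading; the exact form of the original formulas is not recoverable from the text, so the argument `a * mu + b` of the Sigmoid, the comparison `prob >= tau`, and all parameter names are assumptions.

```python
import numpy as np

def fuse_saliency(cam, mu, prob, a=1.0, b=0.0, tau=0.5):
    """cam, mu: (H, W) arrays; prob: (W,) lipid probabilities per A-line."""
    # Product: weight the saliency by a Sigmoid of the attenuation.
    attenuation_gate = 1.0 / (1.0 + np.exp(-(a * mu + b)))
    fused = cam * attenuation_gate
    # Normalization: divide by the maximum over depth on each A-line.
    peak = fused.max(axis=0, keepdims=True)
    normalized = fused / np.maximum(peak, 1e-12)
    # Gating: keep only A-lines classified as lipid.
    return normalized * (prob >= tau)[None, :]
```

Because the gate is applied per A-line, a low classifier probability suppresses the whole depth profile regardless of how strong the fused response is.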

[0088] This embodiment also provides an optimization-based iterative fine-tuning method, which specifically includes:

[0089] D1. Construct the alignment loss from the lipid saliency to be optimized and the optical attenuation coefficient:

[0090]

[0091] where the formula uses, in order: the alignment loss; the KL divergence as the metric; the Sigmoid function; the scaling parameter; the bias parameters; and the lipid saliency to be optimized.

[0092] D2. Construct the physical loss from the linear model and the optical attenuation coefficient:

[0093]

[0094] where the formula uses, in order: the physical loss; the weighting coefficient of the monotonicity constraint term; the activation function; the first-order partial derivative (i.e., gradient) of the logarithmic intensity image in the depth direction h, which reflects how fast the signal changes with depth; the weighting coefficient of the attenuation coefficient smoothing term; the norm; the gradient of the optical attenuation coefficient along the depth direction h; the summation over all frames and all circumferential directions; and the summation over all frames, all circumferential directions, and all depth positions.
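The physical loss combines a monotonicity penalty on the log-intensity with a smoothness penalty on the attenuation coefficient. A minimal sketch under stated assumptions: the activation function is taken to be ReLU (so only increases of log-intensity with depth are penalized), the norm is taken to be L1, derivatives are forward differences, and the names `lam_mono` and `lam_tv` are hypothetical.

```python
import numpy as np

def physical_loss(log_intensity, mu, lam_mono=1.0, lam_tv=0.1):
    """log_intensity, mu: (Z, H, W) arrays; depth is axis 1."""
    # Monotonicity term: penalize log-intensity that rises with depth,
    # summed over all frames, A-lines, and depth positions.
    d_log = np.diff(log_intensity, axis=1)
    mono = np.maximum(d_log, 0.0).sum()
    # Smoothness term: L1 norm of the depth gradient of mu,
    # summed over all frames and A-lines.
    tv = np.abs(np.diff(mu, axis=1)).sum()
    return lam_mono * mono + lam_tv * tv
```

A profile that decays monotonically with a constant attenuation coefficient incurs zero loss under this reading.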

[0095] D3. Define the objective function using the following formula:

[0096]

[0097] D4. Initialize the lipid saliency to be optimized with the initial class activation saliency map, then iteratively search the objective function for the optimal solution. Process the optimal solution according to the following formula to obtain the final lipid saliency:

[0098]

[0099] where the formulas use, in order: the weighting coefficient of the smoothing term; the one-dimensional total variation; the indicator function, which takes the value 1 when the condition inside the parentheses is true and 0 otherwise; the lipid classification probability of the w-th A-line in the z-th frame; and the probability gating threshold.

[0100] Steps S21-S23 achieve enhanced physical consistency: By introducing an optical attenuation coefficient and a non-negative / monotonic physical prior, physical consistency of depth positioning is achieved, reducing depth drift across devices.

[0101] S3. Based on the original intensity image and the lipid saliency, obtain the lipid upper boundary and the lipid lower boundary of each A-line, and obtain the thickness from the two boundaries.

[0102] S31. Differentiate the original intensity image along the depth direction h to obtain the depth gradient map, and linearly combine the lipid saliency with the gradient magnitude of the depth gradient map to obtain the upper-boundary-sensitive response map. For each A-line at a fixed circumferential position w, search the response map within a preset range in the depth direction (e.g., 20-50 micrometers downward from the lumen boundary) and select the depth location of the local maximum as the lipid upper boundary of that A-line.
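The upper-boundary search in S31 can be sketched as follows. Several details are assumptions: the lumen boundary is supplied as a hypothetical per-A-line index array `lumen`, the search range is expressed in pixels rather than micrometers, the gradient magnitude is scaled to [0, 1] before combination, and the largest response inside the window stands in for the "local maximum" named in the text.

```python
import numpy as np

def upper_boundary(intensity, saliency, lumen, search=(2, 10),
                   w_sal=1.0, w_grad=1.0):
    """intensity, saliency: (H, W); lumen: (W,) integer depth indices."""
    # Depth gradient magnitude, scaled so both terms are comparable.
    grad = np.abs(np.gradient(intensity, axis=0))
    grad = grad / max(grad.max(), 1e-12)
    # Linear combination of saliency and gradient magnitude.
    response = w_sal * saliency + w_grad * grad
    H, W = response.shape
    top = np.empty(W, dtype=int)
    for w in range(W):
        # Search window below the lumen boundary, clipped to the image.
        lo = min(lumen[w] + search[0], H - 1)
        hi = min(lumen[w] + search[1], H - 1)
        top[w] = lo + int(np.argmax(response[lo:hi + 1, w]))
    return top
```

Restricting the search to a window below the lumen keeps the detector from locking onto the lumen edge itself or onto deep noise.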

[0103] S32. Set a saliency threshold and mark the regions where the lipid saliency meets the threshold condition as candidate lipid core depth regions. Compute the gradient of the optical attenuation coefficient in the depth direction h, set a positive threshold, and mark the regions where this gradient meets the threshold condition as regions where the attenuation coefficient undergoes a positive abrupt change. The union of the candidate lipid core regions and the positive-abrupt-change regions forms the candidate depth point set for lower boundary detection.

[0104] S33. Take the positions h in the candidate depth point set as candidate lower boundaries. For each candidate A-line, construct a one-dimensional energy function and minimize it with a dynamic programming or graph cut algorithm to obtain the lipid lower boundary of each A-line.

[0105] The energy function is set according to the following formula:

[0106]

[0107] where the formula uses, in order: the energy function; the weighting coefficients of the data terms; the weighting coefficient of the smoothing term; the lipid lower boundaries of two A-lines that are adjacent in the circumferential (rotational) direction of the blood vessel cross-section; the lipid saliency value at the lipid lower boundary; and the gradient magnitude of the optical attenuation coefficient at the lipid lower boundary. In specific implementation, the lower boundary can only be selected from within the candidate point set.
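The lower-boundary search in S33 can be sketched with dynamic programming over the A-lines. The energy used here is an assumed form consistent with the definitions: a data term that rewards high saliency and high attenuation gradient at the chosen depth, plus a smoothing term proportional to the absolute depth difference between adjacent A-lines. The chain is treated as open rather than circular, every A-line is assumed to have at least one candidate, and all names are hypothetical.

```python
import numpy as np

def lower_boundary(saliency, mu_grad, candidates,
                   lam_s=1.0, lam_g=1.0, lam_smooth=0.1):
    """saliency, mu_grad: (H, W); candidates: (H, W) boolean mask."""
    H, W = saliency.shape
    # Data term (lower is better); non-candidates are forbidden.
    unary = np.where(candidates, -(lam_s * saliency + lam_g * mu_grad), np.inf)
    depth = np.arange(H)
    # Smoothing cost between depths of adjacent A-lines: (prev, cur).
    pair = lam_smooth * np.abs(depth[:, None] - depth[None, :])
    cost = unary[:, 0].copy()
    back = np.zeros((H, W), dtype=int)
    for w in range(1, W):
        total = cost[:, None] + pair
        back[:, w] = np.argmin(total, axis=0)     # best predecessor
        cost = total.min(axis=0) + unary[:, w]
    # Backtrack the minimum-energy path.
    path = np.empty(W, dtype=int)
    path[-1] = int(np.argmin(cost))
    for w in range(W - 1, 0, -1):
        path[w - 1] = back[path[w], w]
    return path
```

Handling the wrap-around between the last and first A-line would require a circular variant, for example by fixing the first depth and repeating the pass.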

[0108] S34. The upper lipid boundary is subtracted from the lower lipid boundary to obtain the thickness, which is used to evaluate the local thickness of the lipid layer at each circumferential position (each A-line).

[0109] S4. Based on the upper lipid boundary, the lower lipid boundary, and the lipid saliency, the three-dimensional distribution of lipids is reconstructed, and the lipid volume, the minimum thickness, and its spatial position in the circumferential and pullback directions are obtained from the three-dimensional distribution.

[0110] S41. Based on the upper lipid boundary, the lower lipid boundary, and the lipid saliency, a variational energy function over the three-dimensional voxel-level saliency and the smooth boundary surfaces is constructed using the Huber loss.

[0111] S42. The variational energy function is solved in discrete form to obtain the three-dimensional voxel-level saliency and the smooth boundary surfaces. The three-dimensional distribution of lipids can be reconstructed from either the three-dimensional voxel-level saliency or the smooth boundary surfaces. For example, Figure 2(d) is a schematic diagram of the resulting three-dimensional lipid distribution.

[0112] The variational energy function over the three-dimensional voxel-level saliency and the smooth boundary surfaces is set according to the following formulas:

[0113]

[0114]

[0115]

[0116]

[0117] where the symbols denote, in order: the three-dimensional voxel-level saliency volume (abbreviated as the saliency volume); the smooth boundary surface (abbreviated as the boundary surface); the initial observations of the saliency volume and of the boundary surface; the two robust Huber losses; the weighting coefficients; the first-order smoothness term (L1 regularization); the partial derivative along the pullback direction z; and the second-order difference along the circumferential direction w. In specific implementations, the initial observation of the saliency volume is obtained by stacking the lipid saliency of all frames along the pullback direction z.

[0118] The initial observations of the boundary surfaces are obtained by arranging the upper and lower lipid boundaries of each A-line in pullback order (z).

[0119] Specifically, for each frame (each pullback index z), the upper and lower lipid boundaries of every A-line (each circumferential index w) within that frame have already been obtained.

[0120] Therefore, for frame z, the initial observation of the upper boundary surface is the upper lipid boundary of that frame, and the initial observation of the lower boundary surface is the lower lipid boundary of that frame.

[0121] Finally, arranging the upper and lower boundaries of all frames in pullback order yields the initial observation data of two two-dimensional surfaces, one for the upper surface and one for the lower surface.

[0122] In practice, the discrete problem can be solved with the alternating direction method of multipliers (ADMM), the iterative shrinkage-thresholding algorithm (ISTA), or other convex optimization methods.
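The following sketch illustrates the kind of regularized boundary-surface refinement described in S41-S42. It is not ADMM or ISTA: to stay short it uses plain gradient descent on a smoothed surrogate, which is an assumption made here for illustration. The assumed energy has a Huber data term against the initial observation, a smoothed L1 penalty on differences along the pullback direction z, and a quadratic penalty on the second-order circumferential difference along w. The names `smooth_surface`, `energy`, `alpha`, `beta`, `delta`, and `eps` are hypothetical.

```python
import numpy as np

def d2w(B):
    # Second-order difference along the circumferential axis (axis 1), with wrap-around.
    return np.roll(B, 1, axis=1) - 2.0 * B + np.roll(B, -1, axis=1)

def energy(B, B0, alpha, beta, delta, eps):
    r = np.abs(B - B0)
    data = np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta)).sum()  # Huber
    d = np.abs(np.diff(B, axis=0))
    tv = np.where(d <= eps, d**2 / (2.0 * eps), d - 0.5 * eps).sum()  # smoothed L1 in z
    return data + alpha * tv + 0.5 * beta * (d2w(B) ** 2).sum()

def smooth_surface(B0, alpha=0.5, beta=0.5, delta=1.0, eps=0.1, iters=200):
    """B0: (Z, W) initial boundary surface (pullback index z, A-line index w)."""
    B = B0.astype(float).copy()
    # Step 1/L, where L bounds the Lipschitz constant of the energy gradient.
    step = 1.0 / (1.0 + 4.0 * alpha / eps + 16.0 * beta)
    for _ in range(iters):
        grad = np.clip(B - B0, -delta, delta)             # Huber data gradient
        g = alpha * np.clip(np.diff(B, axis=0) / eps, -1.0, 1.0)
        grad[:-1] -= g                                    # z-smoothness gradient
        grad[1:] += g
        grad += beta * d2w(d2w(B))                        # circumferential smoothness
        B -= step * grad
    return B
```

The same pattern applies to the upper and lower surfaces independently; a production solver would more likely handle the non-smooth L1 term exactly, as ADMM or ISTA do.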

[0123] Through this three-dimensional consistency repair, steps S41-S42 improve the consistency and robustness of the results in the pullback and circumferential directions.

[0124] S43. Based on the three-dimensional lipid distribution, calculate the lipid volume, the minimum thickness, and its spatial position in the circumferential and pullback directions.
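As an illustration of S43, the sketch below derives the three quantities from boundary surfaces stored in polar form. It assumes that depth is measured as a radius from the catheter center, that each A-line covers an equal angular sector, and that `dr` and `dz` are the axial pixel size and frame spacing. These assumptions, and the names `lipid_metrics`, `mask`, `dr`, and `dz`, are not stated in the source.

```python
import numpy as np

def lipid_metrics(upper, lower, mask, dr, dz):
    """Hypothetical sketch of S43.

    upper, lower: (Z, W) upper and lower lipid boundaries in pixels (radius).
    mask: (Z, W) boolean, True where the A-line contains lipid.
    dr: radial pixel size; dz: spacing between pullback frames.
    """
    Z, W = upper.shape
    dtheta = 2.0 * np.pi / W
    r_up, r_low = upper * dr, lower * dr
    # Area of the annular sector between the two boundaries on each A-line.
    sector = 0.5 * dtheta * (r_low**2 - r_up**2)
    volume = float((sector * mask).sum() * dz)
    # Thickness is only meaningful on lipid-bearing A-lines.
    thickness = np.where(mask, (lower - upper) * dr, np.inf)
    z_min, w_min = np.unravel_index(np.argmin(thickness), thickness.shape)
    return {
        "volume": volume,
        "min_thickness": float(thickness[z_min, w_min]),
        "z_index": int(z_min),
        "angle_deg": float(np.degrees(w_min * dtheta)),
    }
```

The returned frame index and angle give the pullback and circumferential position of the thinnest point.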

[0125] Step S1 is an independent classification network module, and steps S2-S4 are independent physical constraint modules.

[0126] Furthermore, the quantitative indicators extracted from the three-dimensional lipid distribution are key outputs for turning the artificial intelligence algorithm into a clinical diagnostic tool, providing doctors with quantitative reference data. Lipid volume is used to quantify plaque burden; minimum thickness is used to locate and assess the thinnest fibrous cap; spatial position (in the circumferential and pullback directions) is used to provide three-dimensional map navigation; and the three-dimensional lipid distribution itself is used for visual diagnosis and doctor-patient communication.

[0127] Furthermore, the pre-trained classification model in this embodiment employs a pre-trained spectral-depth joint modeling network.

[0128] The original intensity image is obtained through the following steps: acquire several frames of intravascular optical imaging data in pullback order; perform A-line spectral processing on each frame of optical imaging data to obtain the spectrally processed A-line spectra of that frame; and apply an FFT to all spectrally processed A-line spectra to convert them into depth-domain images, which are the original intensity images.
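The spectrum-to-depth conversion can be sketched with NumPy's FFT. The source does not specify what the A-line spectral processing consists of, so the DC removal and Hann windowing used here are assumptions, as is the function name `spectra_to_intensity`.

```python
import numpy as np

def spectra_to_intensity(spectra):
    """Hypothetical sketch: convert A-line spectra to a depth-domain intensity image.

    spectra: (K, W) real-valued spectral fringes, one column per A-line.
    Returns a (K // 2, W) magnitude image, depth along axis 0.
    """
    K = spectra.shape[0]
    # Assumed spectral processing: remove the DC term and apply a Hann window.
    spectra = spectra - spectra.mean(axis=0, keepdims=True)
    window = np.hanning(K)[:, None]
    depth = np.fft.fft(spectra * window, axis=0)
    # A real spectrum yields a mirrored transform, so keep the first half only.
    return np.abs(depth[: K // 2])
```

A single reflector produces a cosine fringe across the spectrum, which the transform maps to a peak at the corresponding depth row.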

[0129] The dataset is constructed as follows: the original intensity images are transformed into intensity images in the Cartesian coordinate system and labeled; then, with each frame of the original intensity image as input and its corresponding label as output, an optical imaging dataset is constructed.

[0130] The pre-trained spectral-depth joint modeling network consists of a preprocessing module, a physical layer module, a feature extraction module, and a classification module connected in series.

[0131] The preprocessing module performs the following steps:

[0132] Each frame of the original intensity image is converted by inverse FFT into several A-line spectra. Each A-line spectrum is arranged according to its spectral sampling points to construct a spectral vector. The spectral vector is then expanded along the spatial dimension according to the spatial sampling points to obtain the spectral-depth representation.

[0133] The processing procedure of the physical layer module is as follows:

[0134] F1. The spectral-depth representation is subjected to bandpass filtering and then logarithmic transformation along the spectral dimension to obtain the filtered depth features.

[0135] F2. Input the filtered depth features into the depth reconstruction regularization module for processing to obtain a depth-resolved feature image.

[0136] The depth reconstruction regularization module operates as follows: the filtered depth features are input into the feature encoding unit for learnable mapping to obtain an intermediate representation reflecting local attenuation features; learnable regularization is applied to the intermediate representation according to the preset physical regularization constraints to obtain the regularization result; finally, the regularization result is mapped back to the depth domain to obtain the depth-resolved features. The physical regularization constraints include at least one of: monotonic attenuation conforming to the Lambert-Beer law, local smoothness, and scattering consistency.

[0137] In specific implementations, the feature encoding unit is implemented by convolutional layers (such as 1x1 convolutions) or fully connected layers. Learnable regularization based on monotonic attenuation can be applied to the intermediate representation using learnable depth-direction convolution kernels or recurrent networks along the depth dimension (such as state-space models, SSMs). To ensure that the network output strictly follows the monotonic attenuation characteristic of the Lambert-Beer law, this embodiment applies a non-positive constraint to the weight parameters of the depth-direction convolution kernels or of the recurrent networks along the depth dimension, or introduces an exponential decay term in the activation function. Learnable regularization based on local smoothness can be applied using small-scale isotropic or anisotropic smoothing convolutions; learnable regularization based on scattering consistency can be applied using non-local or self-attention mechanisms. Mapping the regularization result back to the depth domain can be achieved using convolutional layers or fully connected layers.

[0138] F3. Based on the intensity image in the Cartesian coordinate system, perform adaptive flattening processing on the obtained depth-resolved feature image to obtain the physical feature map.

[0139] The adaptive flattening process based on feature extraction alignment is as follows:

[0140] H1. The position of the lumen boundary is obtained by applying a lumen boundary detection algorithm to the intensity image in the Cartesian coordinate system.

[0141] In specific implementations, lumen boundary detection algorithms include rule-based methods using grayscale thresholds, neural-network-based segmentation methods, geometric model fitting methods, and others. This invention does not limit the specific implementation.

[0142] H2. The lumen boundary obtained in step H1 is located at the corresponding position in the depth-resolved feature image, giving the lumen boundary depth position of each A-line.

[0143] H3. On each A-line, starting from the lumen boundary depth position (indexed by the circumferential index, i.e., the A-line index), a candidate depth range is selected in the depth direction, and local features related to the tissue structure are calculated within the candidate depth range.

[0144] Local features include the trend of local energy or amplitude variation, first or second-order gradient variation in the depth direction, local variance, skewness and other statistical characteristics.

[0145] H4. Based on the obtained local features, determine the cutoff position for depth selection on each A-line.

[0146] H5. For each A-line, the signal within the depth range from the lumen boundary to the cutoff position is cropped as the effective region. The cropped effective region is then mapped to a uniform depth dimension through interpolation or resampling, so that the lumen boundary is aligned to the same row.

[0147] H6. Perform the same processing on all A-lines as in step H5 to obtain the physical feature map.
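Steps H5-H6 amount to cropping each A-line between its lumen boundary and its cutoff and resampling the crop to a common length. A minimal sketch using linear interpolation follows; the function name `flatten_alines` and the output length `out_depth` are assumptions, and the lumen and cutoff positions are taken as given (steps H1-H4 are not modeled).

```python
import numpy as np

def flatten_alines(feat, lumen, cutoff, out_depth=64):
    """Hypothetical sketch of H5-H6.

    feat: (H, W) depth-resolved feature image, depth along axis 0.
    lumen, cutoff: (W,) depth positions with lumen[w] < cutoff[w].
    Returns an (out_depth, W) map whose row 0 is the lumen boundary of every A-line.
    """
    H, W = feat.shape
    rows = np.arange(H)
    out = np.empty((out_depth, W))
    for w in range(W):
        # Uniformly spaced sample positions across the effective region.
        query = np.linspace(lumen[w], cutoff[w], out_depth)
        out[:, w] = np.interp(query, rows, feat[:, w])
    return out
```

After this step the lumen boundary sits on the first row for all A-lines, regardless of catheter eccentricity.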

[0148] The feature extraction module operates as follows. The physical feature map obtained in step H6 is passed sequentially through the first, second, third, and fourth encoding modules. The outputs of the third and fourth encoding modules are input together into the sequence modeling module to obtain the sequence modeling result. The sequence modeling result and the output of the fourth encoding module are input together into the fourth decoding module. The output of the fourth decoding module and the output of the third encoding module are input together into the third decoding module. The output of the third decoding module and the output of the second encoding module are input together into the second decoding module. The output of the second decoding module and the output of the first encoding module are input together into the first decoding module, whose output is the output of the feature extraction module.

[0149] The first, second, third, and fourth encoding modules have the same structure: a convolutional layer, a batch normalization layer, an activation function, and a max pooling layer connected in series. The fourth, third, second, and first decoding modules also share one structure: an upsampling layer, a convolutional layer, a batch normalization layer, and an activation function connected in series.

[0150] In specific implementation, the fourth decoding module first processes the sequence modeling result through an upsampling layer. The result of this upsampling is then connected to the result of the fourth encoding module via a skip connection. The result of this skip connection is then sequentially input into a convolutional layer, batch normalization, and activation function for further processing, ultimately outputting the result of the fourth decoding module. Similarly, the third decoding module first processes the output of the fourth decoding module through an upsampling layer. The result of this upsampling is then connected to the output of the third encoding module via a skip connection. The result of this skip connection is then sequentially input into a convolutional layer, batch normalization, and activation function for further processing, ultimately outputting the result of the third decoding module. The second decoding module first processes the output of the third decoding module through an upsampling layer. The result of this upsampling is then connected to the output of the second encoding module via a skip connection. The result of this skip connection is then sequentially input into a convolutional layer, batch normalization, and activation function for further processing, ultimately outputting the result of the second decoding module. Finally, the first decoding module first processes the output of the second decoding module through an upsampling layer. The result of this upsampling is then connected to the output of the first encoding module via a skip connection. The result of this skip connection is then sequentially input into a convolutional layer, batch normalization, and activation function for further processing, ultimately outputting the result of the first decoding module.

[0151] As shown in Figure 3, the sequence modeling module is set according to the following formulas:

[0152]

[0153] ;

[0154] ;

[0155] ;

[0156] ;

[0157] ;

[0158] where the symbols denote, in order: the output of the sequence modeling module; a 1×1 convolution; the reshape operation; the output of the fourth encoding module; the output of the third encoding module; the two feedforward networks; the element-wise product; the two gating coefficients; the first to tenth computational features (all intermediate features); and the two Mamba modules.

[0159] The Mamba module is mainly composed of layer normalization, a first linear mapping layer, a depthwise separable convolution, a SiLU activation layer, a state space computation layer (SSM), and a second linear mapping layer connected in series. The input of the layer normalization layer serves as the input of the Mamba module, and the output of the second linear mapping layer serves as the output of the Mamba module.

[0160] The sequence modeling module employs a dual-path architecture. First, each of the two input features is converted by convolution into a contextual representation and serialized, then fed into its own Mamba module to capture long-range dependencies. The core innovation is that a feedforward network extracts transformed features from the other path, which are weighted and fused using gating coefficients to achieve bidirectional adaptive interaction. The fused sequence features are then convolved to restore the spatial representation and added to the original features as residuals. Finally, the outputs of the two paths are added together, unifying local and global information as well as self and cross information. This design replaces conventional attention with a gated cross-information mechanism, maintaining computational efficiency while preserving information flow.

[0161] This sequence modeling module achieves efficient bidirectional long-range dependency capture and adaptive feature fusion. It extracts contextual information from different directions using a dual-path Mamba structure and dynamically adjusts cross-path interaction strength using a gating mechanism, effectively enhancing the model's ability to model complex sequence relationships. Simultaneously, the combination of 1×1 convolutions and the Mamba module balances local feature transformation and global dependency capture, significantly improving feature representation capabilities while maintaining near-linear computational complexity. The module employs residual connections to ensure training stability, and the overall design achieves a balance between performance and efficiency in long-sequence tasks such as vision and speech processing.

[0162] The classification module operates as follows. The feature map output by the feature extraction module is indexed by the pullback frame index z, the depth (radial) index h, the circumferential (discrete) index w, and the channel index. The feature map is passed through a learnable linear projection layer to obtain the task feature map. For each A-line, all task features along the depth direction h are aggregated into an A-line-level feature vector, which is passed through a fully connected layer and a sigmoid function to obtain the lipid probability of that A-line. The lipid probabilities of all circumferential A-lines in frame z form the lipid probability vector of that frame of optical imaging data. During training, the loss function is computed from the lipid probability vector of each frame and its label, and backpropagation is performed to optimize the model.

[0163] The lipid probability vector is set according to the following formulas:

[0164]

[0165] ;

[0166]

[0167] where the symbols denote, in order: the pullback frame index; the depth (radial) and circumferential indices; the lipid probability vector; the lipid probabilities of the 1st, 2nd, w-th, and N-th A-lines in the circumferential direction; the sigmoid function; the lipid classification logit of the w-th circumferential A-line in the z-th pullback frame; the transpose of the weight vector; the bias; the depth index; the circumferential A-line index; the A-line-level feature vector of the w-th A-line; the task feature map; and the depth-direction weights, which may be fixed priors or learnable. When the depth-direction weights are uniform, the aggregation degenerates into global average pooling.
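The A-line-level classification head described above can be sketched in NumPy for a single frame. The shapes and the names `aline_lipid_prob`, `depth_w`, `fc_w`, and `fc_b` are assumptions for illustration; the sketch shows a weighted aggregation along depth, a linear layer, and a sigmoid.

```python
import numpy as np

def aline_lipid_prob(task_map, depth_w, fc_w, fc_b):
    """Hypothetical sketch of the classification head for one frame.

    task_map: (C, H, W) task feature map (channel, depth h, A-line w).
    depth_w: (H,) depth-direction weights, fixed or learned.
    fc_w: (C,) weight vector of the fully connected layer; fc_b: scalar bias.
    Returns the (W,) lipid probability vector of the frame.
    """
    # Weighted aggregation along depth gives one feature vector per A-line.
    v = np.einsum("chw,h->cw", task_map, depth_w)
    logits = fc_w @ v + fc_b
    return 1.0 / (1.0 + np.exp(-logits))
```

Passing `depth_w = np.full(H, 1.0 / H)` reproduces global average pooling along depth, matching the degenerate case noted in the symbol definitions.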

[0168] The total loss function of the spectral-depth joint modeling network is set according to the following formula:

[0169]

[0170]

[0171] where the symbols denote, in order: the total loss function; the cross-entropy (CE) loss; the focal loss; the weighting parameters; the loss for the neighboring-frame consistency constraint; the total number of pixels participating in the neighboring-frame constraint; the frame index; the total number of adjacent frame pairs involved in the calculation; the predicted lipid probabilities at corresponding spatial coordinates in two adjacent frames, where the coordinates in one frame are obtained from those in the other through a coordinate transformation; and the weight.
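A hedged sketch of a loss with the three named components follows. The exact formulas are not reproduced in this text, so several choices are assumptions: binary cross-entropy for the CE term, the standard focal loss with focusing parameter `gamma`, a squared difference for the neighboring-frame term, and an identity coordinate transformation between adjacent frames. The function names and the weights `lam` are hypothetical.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1.0 - p))))

def focal(p, y, gamma=2.0, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(y == 1, p, 1.0 - p)  # probability assigned to the true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt)))

def neighbor_consistency(P, weight=None):
    """P: (Z, W) predicted probabilities with Z >= 2 frames in pullback order.
    Assumes an identity coordinate transformation between adjacent frames."""
    diff = (P[1:] - P[:-1]) ** 2
    if weight is None:
        weight = np.ones_like(diff)
    return float((weight * diff).sum() / diff.size)

def total_loss(P, Y, lam=(1.0, 1.0, 0.1)):
    # Weighted sum of the three components; the weights are illustrative.
    return (lam[0] * bce(P, Y) + lam[1] * focal(P, Y)
            + lam[2] * neighbor_consistency(P))
```

With `gamma=0` the focal term reduces to the cross-entropy term, and a prediction that does not change between frames incurs no consistency penalty.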

[0172] Furthermore, this invention can also employ other existing classification models. Any classification model that can produce the lipid probability vector corresponding to each frame of the original intensity image may be used.

[0173] This embodiment also provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.

[0174] The classification network module and physical constraint module of this invention are decoupled, allowing for independent optimization or acceleration in clinical deployment, and exhibiting engineering-friendly deployment characteristics.

[0175] The above embodiments are merely preferred embodiments provided to fully illustrate the present invention, and the scope of protection of the present invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art based on the present invention are all within the scope of protection of the present invention. The scope of protection of the present invention is defined by the claims.

Claims

1. A method for three-dimensional depth analysis of endoscopic lipids in luminal cavity based on physical constraints and neurally interpretable mapping, characterized by comprising: S1. using OCT imaging to obtain a B-scan original intensity image sequence acquired in pullback order, and inputting each frame of the original intensity image in the sequence into a pre-trained classification model to obtain the lipid probability vector corresponding to each frame of the original intensity image; S2. obtaining an initial class activation saliency map from the feature map output by the last convolutional layer of the pre-trained classification model, obtaining an optical attenuation coefficient from the original intensity image, and obtaining the final lipid saliency from the optical attenuation coefficient and the initial class activation saliency map; S3. obtaining the upper and lower lipid boundaries of each A-line from the original intensity image and the lipid saliency, and obtaining the thickness from the upper and lower lipid boundaries; S4. reconstructing the three-dimensional distribution of lipids from the upper and lower lipid boundaries and the lipid saliency, and then obtaining the lipid volume, the minimum thickness, and its spatial position in the circumferential and pullback directions from the three-dimensional distribution of lipids.

2. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 1, characterized in that step S1 specifically comprises: obtaining the B-scan original intensity image sequence in pullback order, and inputting each frame of the original intensity image in the sequence into the pre-trained classification model to obtain the lipid probabilities of all circumferential A-lines of that frame, which together form the lipid probability vector; wherein z is the pullback frame index, and h and w are the indices in the depth direction and the circumferential direction, respectively.

3. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 1, characterized in that step S2 specifically comprises: S21. denoting the feature map output by the last convolutional layer of the pre-trained classification model as the feature map; computing the gradient associated with each channel k of the feature map and taking its global average along the depth direction h to obtain the importance weight of each channel k for A-line classification; performing a weighted summation of the feature map along the channel dimension and applying a ReLU activation function to obtain the depth saliency map; then performing one-dimensional median or Gaussian smoothing on the depth saliency map along the depth direction h, followed by linear normalization, to obtain the initial class activation saliency map; S22. for each position (h, w) of the original intensity image, taking the points within the neighborhood window [h-r, h+r] along the depth direction, fitting a linear model using the Huber loss as the robustness criterion, obtaining an initial estimate of the local optical attenuation coefficient from the fitted slope, and applying one-dimensional total variation smoothing regularization to the initial estimate along the depth direction h to obtain the final optical attenuation coefficient; S23. based on the initial class activation saliency map and the optical attenuation coefficient, obtaining the final lipid saliency using a product-normalization-gating method or an optimization-based iterative fine-tuning method.

4. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 3, characterized in that the product-normalization-gating method obtains the lipid saliency according to the following formula, where the symbols denote, in order: the lipid saliency; the physically consistent depth saliency; the indicator function; the lipid classification probability of the w-th A-line in the z-th frame; the probability gating threshold; the intermediate fusion result; the depth index; the maximum taken over all depth indices i on the w-th A-line; the initial class activation saliency map; the sigmoid activation function; the scaling parameter; the bias parameters; and the optical attenuation coefficient.

5. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 3, characterized in that the optimization-based iterative fine-tuning method is as follows: D1. constructing an alignment loss from the lipid saliency to be optimized and the optical attenuation coefficient, where the symbols of the alignment loss formula denote, in order: the alignment loss; the KL divergence used as the metric; the sigmoid function; the scaling parameter; the bias parameters; and the lipid saliency to be optimized; D2. constructing a physical loss from the linear model and the optical attenuation coefficient, where the symbols of the physical loss formula denote, in order: the physical loss; the weighting coefficient of the monotonicity constraint term; the activation function; the first-order partial derivative of the logarithmic intensity image along the depth direction h; the weighting coefficient of the attenuation coefficient smoothness term; the norm; and the gradient of the optical attenuation coefficient along the depth direction h; D3. defining the objective function according to the corresponding formula; D4. initializing the lipid saliency to be optimized with the initial class activation saliency map, iteratively minimizing the objective function to obtain the optimal solution, and processing the optimal solution according to the corresponding formula to obtain the final lipid saliency, where the remaining symbols denote, in order: the weighting coefficient of the smoothness term; the one-dimensional total variation; the indicator function; the lipid classification probability of the w-th A-line in the z-th frame; and the probability gating threshold.

6. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 1, characterized in that step S3 specifically comprises: S31. differentiating the original intensity image along the depth direction h to obtain a depth gradient map; linearly combining the lipid saliency with the gradient magnitude of the depth gradient map to obtain an upper-boundary-sensitive response map; and, for each A-line at a fixed circumferential index w, searching the response map within a preset range in the depth direction and taking the depth of the local maximum as the upper lipid boundary of that A-line; S32. setting a saliency threshold on the lipid saliency and marking the regions that satisfy the threshold condition as candidate lipid-core depth regions; computing the gradient of the optical attenuation coefficient along the depth direction h, setting a positive threshold, and marking the regions that satisfy the threshold condition as regions where the attenuation coefficient undergoes a positive abrupt change; and taking the union of the candidate lipid-core regions and the positive-abrupt-change regions as the candidate depth point set for lower boundary detection; S33. treating each position h in the candidate depth point set as a candidate lower boundary, constructing a one-dimensional energy function over the candidate A-lines, and minimizing it with dynamic programming or a graph-cut algorithm to obtain the lower lipid boundary of each A-line; S34. subtracting the upper lipid boundary from the lower lipid boundary to obtain the thickness.

7. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 6, characterized in that the energy function is set according to the following formula, where the symbols denote, in order: the energy function; the weighting coefficients of the data terms; the weighting coefficient of the smoothness term; the lower lipid boundaries of two A-lines that are adjacent in the circumferential direction; the lipid saliency value at the lower lipid boundary; and the gradient magnitude of the optical attenuation coefficient at the lower lipid boundary; the lower boundary may only take values from the candidate depth point set.

8. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 1, characterized in that step S4 specifically comprises: S41. based on the upper lipid boundary, the lower lipid boundary, and the lipid saliency, constructing a variational energy function over the three-dimensional voxel-level saliency and the smooth boundary surfaces using the Huber loss; S42. solving the variational energy function in discrete form to obtain the three-dimensional voxel-level saliency and the smooth boundary surfaces, from either of which the three-dimensional distribution of lipids can be reconstructed; S43. based on the three-dimensional lipid distribution, calculating the lipid volume, the minimum thickness, and its spatial position in the circumferential and pullback directions.

9. The method for three-dimensional depth analysis of endoscopic lipids based on physical constraints and neurally interpretable mapping according to claim 8, characterized in that the variational energy function over the three-dimensional voxel-level saliency and the smooth boundary surfaces is set according to the following formula, where the symbols denote, in order: the three-dimensional voxel-level saliency volume; the initial observation of the saliency volume; the smooth boundary surface; the initial observation of the boundary surface; the two robust Huber losses; the weighting coefficients; the first-order smoothness term; the partial derivative along the pullback direction z; the second-order difference along the circumferential direction w; and the lipid saliency.

10. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that: When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 9.