A deep learning-based gastroscopic polyp data recognition method and system
By employing deep learning methods for dynamic deformation correction and subpixel-level boundary segmentation, the problem of dynamic deformation interference in polyp identification during gastrointestinal endoscopy was solved, achieving high-precision polyp measurement and automated identification, thereby improving diagnostic efficiency and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- FUJIAN ZHIKANGYUN MEDICAL TECH CO LTD
- Filing Date
- 2026-05-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for polyp identification in gastrointestinal endoscopy suffer from problems such as dynamic deformation interference affecting measurement accuracy and difficulty in achieving sub-pixel-level precise measurement. In particular, under non-rigid deformation caused by intestinal peristalsis and respiratory movements, existing methods cannot meet clinical needs.
A dynamic deformation correction module is introduced to eliminate morphological distortion through temporal deformation field modeling and dynamic mode decomposition. A sub-pixel-level segmentation method based on level sets and boundary heatmap regression is adopted, combined with a deep learning model, to accurately locate and measure polyp boundaries.
It significantly improves the measurement consistency and accuracy of polyp identification in gastrointestinal endoscopy, achieves sub-pixel-level boundary positioning, meets the requirements of precise clinical diagnosis and treatment, reduces physician examination time, and has the ability to continuously learn and adapt to the imaging characteristics of different hospital equipment.
Figure CN122244560A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of medical image processing and deep learning technology, specifically to a method and system for identifying polyps in gastrointestinal endoscopy data based on deep learning.
Background Technology
[0002] Gastroscopy and colonoscopy are core clinical tools for the screening, diagnosis, and prediction of precancerous lesions of the digestive tract. The accurate identification, classification, and size measurement of polyps directly determine the treatment plan and patient prognosis. With the advancement of digitalization in gastroenterology, the polyp imaging data (white light endoscopy images, narrow-band imaging endoscopy images, etc.), pathological data, and clinical medical record data generated from gastroscopy and colonoscopy are growing explosively and have become core data assets for the precise diagnosis and treatment of gastrointestinal diseases.
[0003] Deep learning technology, with its powerful feature extraction and pattern recognition capabilities, has been widely applied in the field of medical image recognition. For example, medical image segmentation methods based on convolutional neural networks (CNNs) have achieved good results in scenarios such as lung nodules and fundus lesions. However, in the specific scenario of polyp identification in gastrointestinal endoscopy, existing identification methods still have many limitations. The causes include the complex internal environment of the gastrointestinal tract (mucosal folds obstructing the view, interference from intestinal secretions, uneven lighting), the diverse morphologies of polyps (flat, raised, pedunculated), the small difference in grayscale between polyps and normal mucosa, and inconsistent diagnostic standards among physicians. Currently, clinical practice still relies mainly on physicians to identify polyps manually, which suffers from high labor intensity, high rates of missed diagnosis and misdiagnosis, and difficulty in achieving real-time polyp classification and accurate size measurement. It therefore fails to meet the clinical needs of large-scale screening and precise diagnosis and treatment.
[0004] In particular, during gastroscopy and colonoscopy, intestinal peristalsis, patient respiration, or heartbeat cause periodic deformation of tissues, resulting in non-rigid morphological distortions (stretching, compression, rotation) of polyps in consecutive frames. This dynamic deformation severely affects the accuracy of single-frame-based identification and measurement, especially for pedunculated polyps, where the length of the pedicle can vary by more than 30% across different frames. Current techniques only perform rigid registration or inter-frame alignment, which cannot eliminate measurement errors caused by non-linear deformation. This leads to significant differences in measurement results for the same polyp at different times, affecting the reliability of disease progression tracking.
[0005] Furthermore, accurate measurement of polyps (clinically required error ≤0.5mm) is crucial for developing treatment plans (such as determining indications for endoscopic resection). Existing deep learning segmentation methods (such as U-Net and DeepLab) output pixel-level classifications, and the accuracy of boundary localization is limited by pixel size (each pixel may correspond to a physical size of 0.1~0.3mm). For flat polyps, the gray-level gradient between them and normal mucosa is extremely gentle, and pixel-level deviations are prone to occur at the segmentation boundaries, resulting in poor measurement repeatability and making it difficult to meet clinical requirements.
[0006] In summary, existing technologies have significant shortcomings in handling dynamic deformation interference and achieving sub-pixel-level accurate measurement. There is an urgent need for an intelligent gastrointestinal polyp identification method and system that can comprehensively solve the above problems.
Summary of the Invention
[0007] In view of the shortcomings of existing technologies, this invention proposes a deep learning-based method and system for identifying polyps in gastrointestinal endoscopic data. This invention introduces a dynamic deformation correction module, which eliminates polyp morphological distortions caused by intestinal peristalsis and respiratory movements through temporal deformation field modeling and dynamic mode decomposition. It also introduces a sub-pixel-level segmentation method based on level sets and boundary heatmap regression to achieve sub-pixel-level positioning of polyp boundaries, thereby supporting measurements with an error ≤0.5mm. This invention aims to automate the entire process of gastrointestinal endoscopic polyp data acquisition, standardized preprocessing, intelligent identification, classification, and measurement, improving identification accuracy and efficiency and providing gastroenterologists with an efficient and accurate auxiliary diagnostic tool.
[0008] To achieve the above objectives, this invention provides a deep learning-based method for identifying polyps in gastrointestinal endoscopic data, comprising the following steps:
Step S1: Automatically collect multimodal data related to polyps in gastrointestinal endoscopy through a standardized interface; the multimodal data includes gastrointestinal endoscopy image data, polyp pathology data, patient clinical medical record data, and polyp diagnosis and treatment record data;
Step S2: The collected multimodal data is automatically preprocessed according to the preset standardized preprocessing specifications, including denoising, enhancement, registration, normalization, and preliminary localization of suspected areas for gastrointestinal endoscopy image data; feature extraction and natural language processing for images and text in pathological data; and structured organization of clinical medical records and diagnosis and treatment records;
Step S3: Input the preprocessed gastrointestinal endoscopy image data into a pre-trained improved deep learning fusion model. The model integrates convolutional neural networks and the Transformer architecture, and introduces an attention mechanism to output polyp identification results, including whether polyps exist, polyp location coordinates, and confidence level;
Step S4: Based on the output polyp identification results, automatically call the polyp typing algorithm and measurement algorithm to automatically type and accurately measure the identified polyps, and output the polyp morphology type, long diameter, short diameter, and measurement error;
Step S5: Combine the results of steps S3 and S4 with the other multimodal data processed in step S2, using the patient's unique identifier as the basis for association, to establish a complete data link and perform standardized collection and storage;
Step S6: Present the identification, classification, and measurement results and related data in a visual manner, receive manual verification and correction from physicians, and feed back the verified valid data as training samples to the model iteration module.
[0009] In one specific embodiment, the precise measurement in step S4 further includes the following sub-pixel-level boundary segmentation and measurement steps:
Step S41: Based on the polyp location coordinates output in step S3, crop the polyp region image from the preprocessed image;
Step S42: Input the polyp region image into a deep learning segmentation network. In addition to outputting a pixel-level polyp mask, the segmentation network has a boundary distance regression head that predicts the signed distance function value from each pixel to the real polyp boundary. The signed distance function value is a continuous floating-point number, which enables sub-pixel precision boundary localization;
Step S43: Introduce a differentiable level set evolution layer that uses the signed distance function as the initial level set function and iteratively minimizes an energy functional so that the zero level set converges to the true boundary of the polyp, outputting a sub-pixel-level continuous closed contour curve; the energy functional includes a boundary gradient term, a region consistency term, and a curvature regularization term;
Step S44: Design the level set evolution process as a differentiable neural network layer and train it end to end jointly with the segmentation network. The loss function includes a boundary overlap loss and a measurement error loss;
Step S45: Based on the output sub-pixel-level contour curve, calculate the major axis, minor axis, and area of the polyp using a pixel calibration algorithm combined with the parameters of the gastrointestinal endoscope, and generate a contour map with measurement annotations.
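The measurement in step S45 can be illustrated with a minimal NumPy sketch. The text does not specify the pixel calibration algorithm, so the sketch assumes a single hypothetical scale factor `mm_per_pixel` and takes the long and short diameters as the contour's extents along its principal axes; the function name `measure_contour` is likewise illustrative and not taken from the source.

```python
import numpy as np

def measure_contour(contour_px: np.ndarray, mm_per_pixel: float) -> dict:
    """Measure an ordered, closed (K, 2) contour given in sub-pixel coordinates.

    Assumption: one isotropic calibration factor converts pixels to millimetres.
    """
    centered = contour_px - contour_px.mean(axis=0)
    # Principal directions of the contour points (eigenvectors of the 2x2 covariance).
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    proj = centered @ vecs
    extents = proj.max(axis=0) - proj.min(axis=0)
    x, y = contour_px[:, 0], contour_px[:, 1]
    # Shoelace formula for the area enclosed by the polygonal contour.
    area_px = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return {
        "long_mm": float(extents.max() * mm_per_pixel),
        "short_mm": float(extents.min() * mm_per_pixel),
        "area_mm2": float(area_px * mm_per_pixel ** 2),
    }

# Synthetic check: an ellipse with semi-axes 20 px and 10 px, rotated by 30 degrees.
t = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
ellipse = np.stack([20.0 * np.cos(t), 10.0 * np.sin(t)], axis=1)
rot = np.deg2rad(30.0)
R = np.array([[np.cos(rot), -np.sin(rot)], [np.sin(rot), np.cos(rot)]])
contour = ellipse @ R.T + np.array([128.0, 128.0])
result = measure_contour(contour, mm_per_pixel=0.2)
```

Because the contour coordinates are floating-point values, the measured diameters are not restricted to whole multiples of the pixel size.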
[0010] In one specific embodiment, after step S3 outputs the polyp identification result and before step S4, a dynamic deformation correction step is included to eliminate polyp morphological distortion caused by intestinal peristalsis or respiratory movements:
Step S31: Extract the temporal contour sequence of the same polyp from the continuous multi-frame polyp identification results output in step S3;
Step S32: Construct a temporal deformation field estimation network. The inputs are the polyp region images of two adjacent frames, and the output is a dense displacement field representing the motion vector of each pixel from the current frame to the next frame. The temporal deformation field estimation network adopts a U-Net structure, and the loss function includes a smoothness constraint on the displacement field and supervision based on contour feature points;
Step S33: Accumulate the displacement fields of N consecutive frames over time to obtain the accumulated deformation field with the first frame as a reference; decompose the accumulated deformation field into periodic components and drift components through dynamic mode decomposition, and model them separately;
Step S34: Select the frame with the smallest deformation in the frame sequence as the standard shape frame, or warp the polyp contours of all frames back to the average shape through the inverse deformation field;
Step S35: Output the corrected standard shape contour or average shape contour to step S4 for subsequent accurate measurement, and output the deformation amplitude quantification index at the same time.
[0011] In one specific embodiment, the preprocessing of the gastrointestinal endoscopy image data in step S2 specifically includes: denoising using an adaptive median filtering algorithm; improving image contrast through histogram equalization; aligning multiple frames of images using image registration technology; initially locating suspected polyp areas and cropping invalid backgrounds using a contour extraction algorithm; and uniformly converting the images to the DICOM 3.0 standardized format.
[0012] In one specific implementation, the improved deep learning fusion model in step S3 is specifically executed as follows: local features in the image are extracted through a convolutional neural network, including polyp boundaries, surface texture, and blood vessel distribution; global features are extracted through a Transformer architecture, including the correlation features between the polyp and the surrounding mucosa; local features and global features are fused; fused features are filtered through an attention mechanism to strengthen key features; classification and judgment are performed through a fully connected layer to output the recognition result.
[0013] In one specific embodiment, the automatic classification in step S4 includes classifying polyps into flat, raised, or pedunculated types based on their elevation height and pedicle condition using a classification algorithm; the measurement error of the polyp's long and short diameters in the precise measurement is controlled within 0.5 mm.
[0014] In another aspect of the present invention, a deep learning-based system for identifying polyps in gastrointestinal endoscopy data is provided for implementing the aforementioned method, comprising:
Data acquisition module: used to interface with the hospital information system through a standardized interface to automatically collect multimodal data related to polyps in gastrointestinal endoscopy;
Data preprocessing module: used to perform denoising, enhancement, registration, structured processing, and natural language processing on the collected multimodal data according to its type;
Deep learning recognition module: has a built-in improved deep learning fusion model for polyp recognition on preprocessed gastrointestinal endoscopic image data, outputting location and confidence level;
Dynamic deformation correction module: used to estimate the temporal deformation field and correct the shape of polyps identified in multiple consecutive frames, eliminating the shape distortion caused by intestinal peristalsis or respiratory movements;
Classification and measurement module: used to automatically classify the identified polyps and measure their geometric dimensions. The classification and measurement module includes a sub-pixel-level boundary segmentation unit, which uses steps S41 to S45 of claim 2 to achieve accurate measurement;
Data association and storage module: used to associate multimodal data with recognition results, establish standardized data links, and store the data;
Interaction and feedback module: used to visualize the results, receive manual verification and corrections, and feed back the corrected data to the model iteration module.
[0015] In one specific implementation, the dynamic deformation correction module incorporates a temporal deformation field estimation network and a dynamic mode decomposition unit to output standard shape contours and deformation amplitude quantification indicators; the sub-pixel-level boundary segmentation unit in the classification and measurement module includes a differentiable level set evolution layer to achieve end-to-end training.
[0016] In one specific implementation, the standardized interfaces supported by the data acquisition module include DICOM, HL7, and RESTful interfaces; the acquired data formats include DICOM and JPG image data and structured / unstructured text data.
Beneficial effects
[0017] Compared with the prior art, the present invention has the following beneficial effects: Dynamic deformation correction significantly improves measurement consistency: Through temporal deformation field estimation and dynamic mode decomposition, the non-rigid deformation caused by intestinal peristalsis and respiratory movement is quantified and corrected for the first time in the identification of polyps in gastrointestinal endoscopy.
[0018] Subpixel-level boundary segmentation with measurement accuracy meeting clinical requirements: By employing signed distance function regression and differentiable level set evolution, boundary localization exceeding pixel resolution is achieved. This invention effectively reduces measurement errors, significantly outperforming traditional U-Net and meeting the requirements for precise clinical diagnosis and treatment.
[0019] End-to-end trainability avoids error accumulation: The level set evolution is designed as a differentiable layer, enabling joint optimization of the segmentation network and measurement post-processing, thus avoiding the error propagation caused by separating segmentation and measurement in traditional methods.
[0020] High robustness and adaptability to complex clinical environments: The region consistency and curvature regularization terms in the energy functional make the segmentation results insensitive to image noise, local gray-level fluctuations, and secretion occlusion.
[0021] Fully automated workflow improves diagnostic and treatment efficiency: From data acquisition, preprocessing, identification, deformation correction, classification measurement to report generation, the entire process is automated, with a single polyp processing time of ≤200ms (including dynamic correction and subpixel measurement). Using this system can effectively reduce the average examination time per polyp for physicians.
[0022] Continuous learning capability to support model evolution: Through a physician verification and feedback mechanism, clinically confirmed valid data is automatically incorporated into the model's iterative training, enabling the system to continuously improve its performance after deployment and adapt to the imaging characteristics of different hospitals and equipment.
Attached Figure Description
[0023] Figure 1 is a flowchart of a deep learning-based method for identifying polyps in gastrointestinal endoscopy data, provided in an embodiment of the present invention; Figure 2 is a schematic diagram of the data acquisition and management interface in an embodiment of the present invention; Figure 3 is a schematic diagram of the polyp interface in an embodiment of the present invention.
Detailed Implementation
[0024] The embodiments of this patent are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this patent, and should not be construed as limiting this patent.
[0025] Example 1
As shown in Figures 1-3, the first embodiment of the present invention discloses a method for identifying polyps in gastrointestinal endoscopic data based on deep learning, comprising:
Step S1: Automated acquisition of multimodal data. Through predefined standardized interfaces (DICOM, HL7, RESTful, etc.), the system integrates with hospital endoscopy equipment, endoscopy center systems, pathology systems, and electronic medical record systems, establishing a real-time data exchange channel. Based on user-defined acquisition rules (acquisition frequency, data type, acquisition range, etc.), the system automatically collects various multimodal data, including: endoscopy image data (white light endoscopy images and narrow-band imaging images, in DICOM, JPG, and other formats), polyp pathology data (pathology slide images, pathology diagnosis reports), patient clinical medical record data (symptoms, medical history, test results), and polyp diagnosis and treatment records. During the acquisition process, the system automatically performs preliminary data verification: it checks data integrity and format compliance, marks missing or abnormal data, and identifies and removes duplicate data.
[0026] In this embodiment, the system connects in real-time to the hospital's Olympus CV-290 endoscopy system via a DICOM interface, acquiring white light endoscopic images and narrow-band imaging (NBI) images at a frequency of 30 frames per second in DICOM format. Simultaneously, it interfaces with the pathology system and electronic medical record system via an HL7 interface to obtain the patient's pathology report (text) and medical record data. The acquisition rules are set to: acquire images in real-time, and automatically retrieve the corresponding pathology and medical record data after each examination. During the acquisition process, if the system detects a frame of image blurred due to lens fogging, it automatically marks it as "quality abnormality" and records it in the log.
[0027] Step S2: Standardized preprocessing of multimodal data. The collected multimodal data is automatically preprocessed according to preset standardized preprocessing specifications, categorized by type:
Preprocessing of gastrointestinal endoscopy image data: An adaptive median filtering algorithm is used to remove image noise and artifacts; histogram equalization is used to improve image contrast and compensate for uneven illumination; image registration technology is used to align multiple frames; a contour extraction algorithm is used to initially locate suspected polyp areas and crop invalid backgrounds; images are uniformly converted to the DICOM 3.0 standardized format and normalized to a uniform size (e.g., 512×512 pixels) and grayscale range (e.g., [0,1]).
[0028] Typically, for the acquired DICOM sequence, adaptive median filtering (window size 5×5) is first used to remove salt-and-pepper noise; then histogram equalization is performed to stretch the grayscale range to 0-255; SIFT feature matching is used to achieve inter-frame registration; Canny edge detection and contour extraction algorithms are used to initially locate suspected polyp regions (in this example, a candidate region was detected in the ascending colon), and a 512×512 pixel sub-image containing this region is cropped. Finally, the images are uniformly converted to DICOM 3.0 format.
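The contrast enhancement and normalization steps described above can be illustrated with a minimal NumPy sketch. It assumes an 8-bit grayscale frame; the function names are hypothetical, and the adaptive median filter, SIFT registration, Canny localization, and DICOM conversion are not reproduced here.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Classic histogram equalization of an 8-bit grayscale image to the range 0-255."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(hist)[0][0]]   # CDF value at the darkest occupied level
    total = gray.size
    if total == cdf_min:                     # constant image: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                         # apply the lookup table per pixel

def normalize_unit(gray: np.ndarray) -> np.ndarray:
    """Map 8-bit intensities to the [0, 1] range used for model input."""
    return gray.astype(np.float32) / 255.0

# Synthetic low-contrast frame standing in for a cropped 512x512 sub-image.
frame = np.random.default_rng(0).integers(60, 120, size=(512, 512), dtype=np.uint8)
equalized = equalize_histogram(frame)
normalized = normalize_unit(equalized)
```

The lookup table stretches the occupied intensity levels so that the darkest maps to 0 and the brightest to 255, which matches the "stretch the grayscale range to 0-255" description.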
[0029] Pathological data preprocessing: The pathological slide images are calibrated, enhanced, and segmented to extract pathological features (cell morphology, tissue structure); the pathological diagnosis report (text data) is segmented, stop words are removed, and entities are identified using natural language processing techniques (such as the BERT model) to extract key information (benign or malignant polyps, pathological classification) and to achieve standardized encoding of the text data.
[0030] Typically, for the pathology report text "(ascending colon) tubular adenoma, low-grade intraepithelial neoplasia", the medical BERT model is used for named entity recognition to extract three structured fields: "location: ascending colon", "pathological diagnosis: tubular adenoma", and "neoplasia grade: low grade".
[0031] Clinical medical records and other data preprocessing: The patient's clinical medical record data and diagnosis and treatment record data are structured and organized, key information (such as the location of polyp onset, patient age, past medical history, and medication history) is extracted, invalid data and outliers are removed, standardized data tables are established, and correlation and matching with imaging data and pathology data are achieved.
[0032] For example, information such as the patient's age (52 years old), gender (male), chief complaint of "rectal bleeding for 1 month," and past medical history of "hypertension" were extracted from the EMR system and organized into JSON format.
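As an illustration of the JSON organization mentioned above, the extracted fields might be laid out as follows. The key names are hypothetical, since the text does not specify a schema; only the values come from the example.

```python
import json

# Field values are taken from the example; the key names are illustrative assumptions.
record = {
    "patient": {"age": 52, "sex": "male"},
    "chief_complaint": "rectal bleeding for 1 month",
    "past_medical_history": ["hypertension"],
}

serialized = json.dumps(record, ensure_ascii=False, indent=2)
```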
[0033] Step S3: Intelligent polyp identification based on a deep learning model. The preprocessed gastrointestinal endoscopic images are input into an improved deep learning fusion model. This model combines a convolutional neural network (CNN) and a Transformer architecture, with the following specific structure:
CNN branch: Uses a pre-trained ResNet-50 as an encoder to extract local features from images, including polyp boundaries, surface texture, blood vessel distribution, etc.
[0034] Transformer branch: Divides the image into 16×16 patches and extracts global contextual features through the Vision Transformer (ViT) encoder to capture the association information between the polyp and the surrounding mucosa.
[0035] Feature fusion layer: The feature map output by the CNN is concatenated with the sequence features output by the Transformer, and the dimensionality is reduced by 1×1 convolution to obtain the fused features.
[0036] Attention mechanism: Channel attention (SENet) and spatial attention (CBAM) are introduced to weight the fused features, enhance the key features of polyps, and suppress noise such as secretions and artifacts.
[0037] Output layer: Outputs the probability (confidence score) of polyp presence through a fully connected layer and a sigmoid activation function, and outputs the coordinates of the polyp bounding box (center point, width, height) through a regression branch. The default confidence threshold is 0.85. Values higher than this are considered valid identifications, while values lower than this are marked as "suspected polyps" and prompt for manual verification.
[0038] The model was trained using 5000 labeled gastrointestinal images collected clinically. A joint loss function of cross-entropy loss (classification) and SmoothL1 loss (bounding box regression) was employed, with an initial learning rate of 0.001 and the AdamW optimizer. Training lasted 50 epochs with a batch size of 16. Training data augmentation included random rotation, flipping, and brightness adjustment.
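The joint loss described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented training code: the weighting factor `box_weight` and the Smooth L1 transition point `beta` are assumptions, since the text does not state how the two terms are balanced.

```python
import numpy as np

def binary_cross_entropy(prob, target, eps=1e-7):
    """Cross-entropy between predicted polyp probabilities and 0/1 labels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(prob) + (1 - target) * np.log(1 - prob))))

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(np.mean(loss))

def joint_loss(prob, label, box_pred, box_true, box_weight=1.0):
    """Classification loss plus weighted bounding-box regression loss."""
    return binary_cross_entropy(prob, label) + box_weight * smooth_l1(box_pred, box_true)

# Toy values: one sample, box given as (center x, center y, width, height).
example = joint_loss(
    prob=np.array([0.5]),
    label=np.array([1.0]),
    box_pred=np.array([0.0, 3.0, 1.0, 1.0]),
    box_true=np.array([0.5, 0.0, 1.0, 1.0]),
)
```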
[0039] Typically, the preprocessed image is input into an improved deep learning fusion model. The model detects a candidate region with a confidence score of 0.92, and outputs bounding box coordinates of [120, 200, 80, 60] (center point x=120 pixels, y=200 pixels, width 80 pixels, height 60 pixels). Since the confidence score is higher than the threshold of 0.85, the system determines it to be a valid polyp.
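The thresholding rule and the bounding box format in this example can be sketched as below. The names `Detection` and `triage` are hypothetical. The text does not say how a score exactly equal to the threshold is handled; the sketch assumes it is routed to manual verification.

```python
from typing import NamedTuple, Sequence, Tuple

class Detection(NamedTuple):
    label: str
    corners: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def triage(confidence: float, box_cxcywh: Sequence[float], threshold: float = 0.85) -> Detection:
    """Convert a (center x, center y, width, height) box to corners and apply the threshold."""
    cx, cy, w, h = box_cxcywh
    corners = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    # Assumption: a score equal to the threshold is treated as "suspected".
    label = "valid polyp" if confidence > threshold else "suspected polyp"
    return Detection(label, corners)

result = triage(0.92, [120, 200, 80, 60])
```

For the values in the example, the box spans x from 80 to 160 pixels and y from 170 to 230 pixels.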
[0040] Furthermore, in this embodiment, after outputting multiple consecutive frames of polyp identification results in step S3, the following sub-steps are performed to eliminate dynamic deformation:
Step S31: Extract the temporal contour sequence. For the same polyp, extract its initial contour point set in N consecutive frames (N is 10, covering about 1-2 peristaltic cycles).
[0041] Step S32: Temporal Deformation Field Estimation. A U-Net-structured temporal deformation field estimation network is constructed. Input is the polyp region image (128×128 pixels) from two adjacent frames, and the output is a dense displacement field (a two-dimensional motion vector for each pixel). The loss function includes: optical flow consistency loss, displacement field smoothness loss, and supervised loss for contour feature points (obtained through SIFT matching).
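Two of the loss terms listed above can be illustrated with a short NumPy sketch. The exact formulations are not given in the text, so the sketch assumes a squared finite-difference penalty for smoothness and a squared landmark error for the feature-point supervision; the optical flow consistency term is omitted, and the (row, column) channel ordering is also an assumption.

```python
import numpy as np

def smoothness_loss(disp: np.ndarray) -> float:
    """Penalize spatial variation of a dense (H, W, 2) displacement field."""
    d_rows = disp[1:, :, :] - disp[:-1, :, :]   # differences between vertical neighbours
    d_cols = disp[:, 1:, :] - disp[:, :-1, :]   # differences between horizontal neighbours
    return float(np.mean(d_rows ** 2) + np.mean(d_cols ** 2))

def landmark_loss(disp: np.ndarray, points: np.ndarray, targets: np.ndarray) -> float:
    """Squared error between displaced feature points and their matched positions.

    points: (K, 2) integer (row, col) positions in the current frame.
    targets: (K, 2) matched (row, col) positions in the next frame.
    Assumption: disp[..., 0] is the row shift and disp[..., 1] the column shift.
    """
    rows, cols = points[:, 0], points[:, 1]
    predicted = points + disp[rows, cols]
    return float(np.mean(np.sum((predicted - targets) ** 2, axis=1)))

# A constant field is perfectly smooth; a ramp in one channel is not.
constant = np.full((128, 128, 2), 0.3)
ramp = np.zeros((128, 128, 2))
ramp[..., 0] = 0.5 * np.arange(128)[None, :]
pts = np.array([[10, 20], [64, 64]])
```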
[0042] Step S33: Accumulated Deformation Field and Dynamic Mode Decomposition. Accumulate N-1 consecutive displacement fields over time to obtain the accumulated deformation field with the first frame as a reference. Arrange this accumulated deformation field into a matrix along the time dimension and perform dynamic mode decomposition (DMD) to obtain eigenvalues and DMD modes. Components near |λ|=1 correspond to periodic components (caused by heartbeats and respiration), while components with |λ|<1 correspond to drift components (caused by peristaltic propulsion).
[0043] Step S34: Shape Correction. The "shape pullback" method is used: the contour points of every frame are transformed into the coordinate system of the first frame by applying the accumulated inverse deformation field. The average position of all transformed contours is then taken to obtain the average shape contour. Alternatively, the frame with the smallest deformation amplitude is selected as the standard shape frame.
[0044] Step S35: Output the correction results. Output the corrected contour to step S4, and simultaneously output the deformation amplitude quantification indicators (periodic component amplitude, drift velocity) for physicians to assess polyp activity.
[0045] Because the patient exhibited mild intestinal peristalsis, the system automatically extracted the polyp's contour sequence over 10 consecutive frames (approximately 0.33 seconds). A temporal deformation field estimation network calculated the dense displacement field between adjacent frames. After accumulation, the field was decomposed using dynamic mode decomposition to obtain a periodic component amplitude of 0.12 pixels (primarily caused by abdominal aortic pulsation) and a drift velocity of 0.08 pixels / frame (peristaltic propulsion). Using the shape pullback method, the contours of the 10 frames were transformed to the coordinate system of the first frame and then averaged to obtain the corrected average shape contour.
[0046] The temporal deformation field estimation network adopts a U-Net structure. The input is the polyp region image (size 128×128×3, normalized to [0,1]) of two adjacent frames, and the output is a dense displacement field (size 128×128×2), representing the two-dimensional motion vector (unit: pixels) of each pixel from the current frame to the next frame. The network encoder contains 4 downsampling blocks, and the decoder contains 4 upsampling blocks. Concatenation is used for skip connections. The final layer uses bilinear upsampling to restore the image to 128×128 and has no activation function.
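[0047] The cumulative deformation field of N consecutive frames (N=20) is flattened into an M×T matrix X, where M is the number of pixels × 2 (each pixel has two displacement components, x and y), and T is the number of time frames. Singular value decomposition is performed on X to reduce its dimensionality to r=10. The dimensionality-reduced system matrix A_tilde is then calculated. In the standard DMD formulation, with X_1 the first T-1 columns of X, X_2 the last T-1 columns, and the rank-r truncated decomposition X_1 ≈ U_r Σ_r V_r^*, this matrix is
$$\tilde{A} = U_r^{*} X_2 V_r \Sigma_r^{-1}.$$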
[0048] Solve for the eigenvalues λ_i and eigenvectors w_i of A_tilde. Components with |λ_i|≈1 are classified as periodic deformations (e.g., heartbeat, respiration), while components with |λ_i|<1 are classified as drift deformations (e.g., peristaltic propulsion). Reconstruct the deformation fields of each component for subsequent correction.
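The decomposition and the eigenvalue-based split can be sketched with NumPy as follows. This is a minimal sketch of standard DMD applied to a synthetic snapshot matrix, not the patented implementation; the tolerance `tol` used to decide whether |λ| is "approximately 1" is an assumption, as the text gives no threshold.

```python
import numpy as np

def dmd(X: np.ndarray, rank: int):
    """Standard dynamic mode decomposition of an (M, T) snapshot matrix."""
    X1, X2 = X[:, :-1], X[:, 1:]                      # snapshots and their successors
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]    # rank-r truncation
    B = X2 @ Vh.conj().T @ np.diag(1.0 / s)
    A_tilde = U.conj().T @ B                          # reduced system matrix
    eigvals, W = np.linalg.eig(A_tilde)
    modes = B @ W                                     # DMD modes in the full space
    return eigvals, modes

def split_components(eigvals: np.ndarray, tol: float = 0.05):
    """Indices of periodic (|lambda| close to 1) and drift (|lambda| < 1) components."""
    mag = np.abs(eigvals)
    periodic = np.where(np.abs(mag - 1.0) <= tol)[0]
    drift = np.where(mag < 1.0 - tol)[0]
    return periodic, drift

# Synthetic data: a pure rotation (|lambda| = 1) plus a decaying component (lambda = 0.8),
# embedded in a 6-dimensional "displacement" space over 20 frames.
rng = np.random.default_rng(0)
theta = 0.5
A = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           0.8]])
P = rng.standard_normal((6, 3))
z = np.array([1.0, 0.0, 1.0])
columns = []
for _ in range(20):
    columns.append(P @ z)
    z = A @ z
X = np.stack(columns, axis=1)

eigvals, modes = dmd(X, rank=3)
periodic, drift = split_components(eigvals)
```

On this synthetic input the two rotation eigenvalues fall in the periodic group and the decaying one in the drift group; the deformation field of each group can then be reconstructed from the corresponding columns of `modes`.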
[0049] This embodiment employs the 'inverse deformation field accumulation' method: for the contour point set P_i of the i-th frame, the accumulated inverse deformation field from the i-th frame to the reference frame (such as the 1st frame) is calculated as
$$\Phi_{1\leftarrow i} = \phi_{1\leftarrow 2} \circ \phi_{2\leftarrow 3} \circ \cdots \circ \phi_{(i-1)\leftarrow i},$$
where $\phi_{(k-1)\leftarrow k}$ denotes the inverse deformation field from frame k to frame k-1 and $\circ$ denotes the composition operation of functions. This formula describes how to sequentially concatenate the inverse deformation fields from frame i to frame i-1, ..., and from frame 2 to frame 1 to finally obtain the cumulative inverse deformation field that transforms frame i directly to frame 1.
[0050] Map each point p in P_i to the reference frame coordinates using bilinear interpolation: p' = Φ_{1←i}(p). The median or mean of the transformed contour points for all frames is taken to obtain the average shape contour. If the 'standard shape frame' is selected, the deformation energy of the contour in each frame (the sum of the squares of the displacements of each point on the contour) is calculated, and the frame with the minimum energy is selected as the standard shape frame.
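The pullback, averaging, and minimum-energy selection can be sketched in NumPy as below. The sketch assumes that each inverse field stores an (x, y) displacement per pixel and that contour points correspond one to one across frames; both are assumptions, and the function names are illustrative.

```python
import numpy as np

def bilinear_sample(field: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Sample an (H, W, 2) displacement field at (K, 2) sub-pixel (x, y) points."""
    h, w = field.shape[:2]
    x = np.clip(pts[:, 0], 0, w - 1)
    y = np.clip(pts[:, 1], 0, h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = (1 - fx) * field[y0, x0] + fx * field[y0, x0 + 1]
    bottom = (1 - fx) * field[y0 + 1, x0] + fx * field[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

def pull_back(pts: np.ndarray, inverse_fields) -> np.ndarray:
    """Apply inverse fields in order: frame i -> i-1 -> ... -> reference frame."""
    out = pts.astype(float)
    for field in inverse_fields:
        out = out + bilinear_sample(field, out)
    return out

def deformation_energy(pts: np.ndarray, pulled: np.ndarray) -> float:
    """Sum of squared displacements of the contour points."""
    return float(np.sum((pulled - pts) ** 2))

# Synthetic sequence: the contour drifts 1 px to the right per frame,
# so every inverse field moves points 1 px to the left.
shift_back = np.zeros((16, 16, 2))
shift_back[..., 0] = -1.0
base = np.array([[5.0, 5.0], [8.0, 5.0], [8.0, 9.0], [5.0, 9.0]])
contours = [base + np.array([float(i), 0.0]) for i in range(3)]
pulled = [pull_back(c, [shift_back] * i) for i, c in enumerate(contours)]
mean_shape = np.mean(np.stack(pulled), axis=0)
energies = [deformation_energy(c, p) for c, p in zip(contours, pulled)]
standard_frame = int(np.argmin(energies))
```

In this toy sequence all pulled-back contours coincide with the first-frame contour, and the first frame has the lowest deformation energy.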
[0051] Step S4: Automatic polyp classification and subpixel-level precise measurement. (I) Automatic classification: Based on the morphological characteristics of the corrected polyps (elevation height and the ratio of the pedicle width to the base width), the polyps are classified into flat type (height < 2 mm and pedicle ratio < 0.3), raised type (height ≥ 2 mm and pedicle ratio < 0.3), and pedicled type (pedicle ratio ≥ 0.3) by a three-class support vector machine (SVM) or a lightweight neural network, and the classification confidence score is output at the same time.
[0052] For example, the calculated corrected height of the polyp was 3.2 mm, the ratio of the pedicle width to the base width was 0.15, and it was classified as "raised type" with a confidence level of 0.96.
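The class boundaries stated above can be written down as a plain threshold rule. This is only a sketch of the thresholds, not of the SVM or neural network classifier itself, and it does not produce a confidence score; the function name `classify_polyp` is hypothetical.

```python
def classify_polyp(height_mm, pedicle_ratio):
    """Apply the height and pedicle-ratio thresholds from the classification step."""
    if pedicle_ratio >= 0.3:
        return "pedicled"
    if height_mm >= 2.0:
        return "raised"
    return "flat"

example = classify_polyp(3.2, 0.15)   # the worked example: height 3.2 mm, pedicle ratio 0.15
```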
[0053] (II) Subpixel-level boundary segmentation and measurement (steps S41-S45): Step S41: Crop the polyp region image (256×256 pixels) from the corrected image.
[0054] Step S42: Input the polyp region image into an improved U-Net segmentation network. This network outputs a pixel-level binary mask while adding a boundary distance regression head to predict the signed distance function (SDF) value from each pixel to the actual polyp boundary. The SDF value is a continuous floating-point number (positive inside the boundary, negative outside, and zero on the boundary), enabling sub-pixel accuracy boundary localization. During training, the SDF regression head uses mean squared error loss.
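To show why a continuous SDF gives sub-pixel boundary positions, the following sketch evaluates the SDF of a circular boundary on the integer pixel grid and recovers the zero crossings along one image row by linear interpolation. The circle, its centre and radius, and the function names are hypothetical; in the method itself the SDF values come from the regression head.

```python
import math

def circle_sdf(x, y, cx, cy, radius):
    """Signed distance to a circular boundary: positive inside, negative outside, zero on it."""
    return radius - math.hypot(x - cx, y - cy)

def zero_crossings_along_row(sdf_row):
    """Sub-pixel x positions where the SDF changes sign between neighbouring pixels."""
    crossings = []
    for x in range(len(sdf_row) - 1):
        a, b = sdf_row[x], sdf_row[x + 1]
        if a == 0.0:
            crossings.append(float(x))
        elif a * b < 0.0:
            crossings.append(x + a / (a - b))   # linear interpolation between the two samples
    return crossings

# One row through the centre of a circle of radius 20.3 px centred at (100, 128).
row = [circle_sdf(x, 128, 100.0, 128.0, 20.3) for x in range(256)]
crossings = zero_crossings_along_row(row)
```

Although the SDF is sampled only at integer pixels, the recovered boundary positions are 79.7 and 120.3, that is, fractional pixel coordinates.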
[0055] In this embodiment, the U-Net structure used for SDF regression is as follows: The encoder contains 5 downsampling blocks, each consisting of two 3×3 convolutions (stride 1, padding 1), BatchNorm, ReLU, and a 2×2 max pooling. The number of channels is 64, 128, 256, 512, and 512 respectively. The decoder contains 4 upsampling blocks, each consisting of a transposed convolution (2×2, stride 2), concatenation with the corresponding encoder features, and two 3×3 convolutions. The output layer is divided into two branches: the first branch is a 1-channel sigmoid output, representing the pixel-level segmentation mask; the second branch is a 1-channel linear output, representing the signed distance function (SDF) value. The SDF value is calculated during training using precise boundary annotations, with positive values inside the boundary, negative values outside, and zero values on the boundary.
[0056] Typically, in this embodiment, the U-Net+SDF head network predicts the initial SDF to obtain the approximate location of the boundary.
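The spatial bookkeeping of this encoder/decoder can be traced with a few lines of Python. The text lists a 2×2 max pooling in each of the five encoder blocks; five halvings of a 256×256 input give 8×8, and four 2× upsamplings would then end at 128×128. The sketch therefore assumes that the fifth block acts as a bottleneck without pooling, so that the output returns to 256×256. That assumption, and the function name, are not taken from the source.

```python
def trace_unet_shapes(size=256, channels=(64, 128, 256, 512, 512), n_up=4):
    """Return (channels, spatial size) per encoder block and (skip channels, size) per decoder block."""
    encoder = []
    for i, ch in enumerate(channels):
        encoder.append((ch, size))        # 3x3 convs with stride 1, padding 1 keep the size
        if i < len(channels) - 1:         # ASSUMPTION: no pooling after the last (bottleneck) block
            size //= 2                    # 2x2 max pooling
    decoder = []
    for k in range(n_up):
        size *= 2                         # 2x2 transposed convolution, stride 2
        skip_channels, skip_size = encoder[-2 - k]
        assert skip_size == size          # concatenation needs matching spatial sizes
        decoder.append((skip_channels, size))
    return encoder, decoder

encoder, decoder = trace_unet_shapes()
```

Under this assumption the bottleneck works at 16×16 and the last decoder block is concatenated with the 64-channel, 256×256 encoder features.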
[0057] Step S43: Introduce a differentiable level set evolution layer. Use the predicted initial SDF as the initial value φ0(x,y) of the level set function, and perform iterative optimization by minimizing the energy functional (20 iterations).
[0058] The energy functional of level set evolution is defined as follows: E(φ)=λ1·∫g(x,y)·δ(φ)·|∇φ|dxdy+λ2·∫[H(φ)·(I-c1)²+(1-H(φ))·(I-c2)²]dxdy+λ3·∫|∇H(φ)|dxdy, where g is the edge indicator function, H(φ) is the Heaviside function, δ(φ) is the Dirac function, I is the image gray value, and c1 and c2 are the average gray values inside and outside the contour, respectively. The three terms are the boundary gradient term, the region consistency term, and the curvature regularization term.
[0059] Typically, the energy functional parameters are set to λ1=0.1, λ2=1.0, and λ3=0.01. After evolution, the zero level set converges to the fine boundary, and the coordinates of the boundary points reach sub-pixel accuracy (e.g., the coordinates of a certain boundary point are (120.3, 156.7)).
[0060] The differentiable level set evolution layer is discretized using the finite difference method, and each iteration takes an explicit gradient-descent step on the energy functional: φ^(k+1) = φ^k - Δt·∂E/∂φ, evaluated at φ^k. Here, Δt is set to 0.1, and the number of iterations is set to 20. δ(φ) uses the regularized Dirac function: δ_ε(φ)=ε / (π(ε²+φ²)), where ε is set to 1.0. All operations are implemented using PyTorch, supporting automatic differentiation, thereby enabling end-to-end joint training with the segmentation network.
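A finite-difference evolution of this kind might look like the sketch below. It is written with NumPy rather than PyTorch, so it illustrates the update rule only and is not differentiable end to end. The descent direction is derived here from the energy functional by the usual Euler-Lagrange argument; the edge indicator g = 1/(1+|∇I|²) and the arctan-regularized Heaviside function (whose derivative is exactly the stated δ_ε) are assumptions, as are all function names.

```python
import numpy as np

def divergence(fx, fy):
    """Central-difference divergence of the vector field (fx, fy); axis 1 is x, axis 0 is y."""
    return np.gradient(fx, axis=1) + np.gradient(fy, axis=0)

def evolve_level_set(phi, image, lam1=0.1, lam2=1.0, lam3=0.01, dt=0.1, n_iter=20, eps=1.0):
    gy, gx = np.gradient(image)
    g = 1.0 / (1.0 + gx ** 2 + gy ** 2)                      # edge indicator (assumed form)
    phi = phi.astype(float).copy()
    for _ in range(n_iter):
        heaviside = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))
        delta = eps / (np.pi * (eps ** 2 + phi ** 2))         # regularized Dirac function
        c1 = (heaviside * image).sum() / (heaviside.sum() + 1e-8)            # mean inside
        c2 = ((1 - heaviside) * image).sum() / ((1 - heaviside).sum() + 1e-8)  # mean outside
        py, px = np.gradient(phi)
        norm = np.sqrt(px ** 2 + py ** 2) + 1e-8
        nx, ny = px / norm, py / norm
        edge_term = divergence(g * nx, g * ny)                # from the boundary gradient term
        region_term = -(image - c1) ** 2 + (image - c2) ** 2  # from the region consistency term
        curvature_term = divergence(nx, ny)                   # from the curvature regularization term
        phi = phi + dt * delta * (lam1 * edge_term + lam2 * region_term + lam3 * curvature_term)
    return phi

# Synthetic check: bright disk of radius 20 px, initial contour of radius 15 px.
yy, xx = np.mgrid[0:64, 0:64]
dist = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2)
image = (dist <= 20).astype(float)
phi0 = 15.0 - dist                                            # positive inside, as for the SDF
phi = evolve_level_set(phi0, image)
```

With the stated Δt and 20 iterations the contour moves only a fraction of a pixel, which matches its role as a refinement of an already close SDF prediction: in the synthetic case the positive region grows slightly toward the true disk boundary.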
[0061] The total loss function is: L_total = L_ce + α·L_sdf + β·L_hausdorff + γ·L_measure, where L_ce is the cross-entropy loss, L_sdf is the mean squared error loss of SDF prediction, L_hausdorff is the Hausdorff distance between the predicted contour and the true contour, and L_measure is the smooth L1 loss of the major / minor axis measurement error. α, β, and γ are set to 0.5, 0.2, and 0.1, respectively.
[0062] Step S44: End-to-end joint training. The total loss function includes: segmentation mask cross-entropy loss, SDF regression mean squared error loss, contour Hausdorff distance loss, and measurement error loss. The training dataset contains 1000 annotated gastrointestinal images with subpixel-level polyp boundaries.
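The weighting of the four loss terms can be sketched in plain Python as follows. The Hausdorff distance is computed here on small point sets with a brute-force max/min, which is an illustration of the quantity rather than a training-ready differentiable loss; the smooth L1 transition point of 1.0 and all function names are assumptions.

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))

def smooth_l1(error, beta=1.0):
    """Smooth L1 of a scalar error (quadratic below beta, linear above)."""
    e = abs(error)
    return 0.5 * e * e / beta if e < beta else e - 0.5 * beta

def total_loss(l_ce, l_sdf, l_hausdorff, l_measure, alpha=0.5, beta=0.2, gamma=0.1):
    """L_total = L_ce + alpha * L_sdf + beta * L_hausdorff + gamma * L_measure."""
    return l_ce + alpha * l_sdf + beta * l_hausdorff + gamma * l_measure

l_h = hausdorff([(0, 0), (1, 0)], [(0, 0), (3, 0)])   # toy contours
l_m = smooth_l1(0.5)                                   # toy axis error of 0.5
loss = total_loss(0.3, 0.04, l_h, l_m)                 # toy L_ce and L_sdf values
```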
[0064] Step S45: Based on the output subpixel-level contour curve, a pixel calibration algorithm (combining the CCD pixel size, lens focal length, and working distance parameters of the endoscope device) is used to convert the pixel coordinates into physical millimeter coordinates. The minimum circumscribed ellipse of the contour is calculated; its major and minor axes are taken as the long and short diameters of the polyp, and the area is calculated. The measurement results are rounded to one decimal place, and a contour map with measurement annotations is generated and superimposed on the original image.
[0065] Typical pixel calibration: The endoscope device has a CCD pixel size of 2.2μm, a working distance of 8mm, a lens focal length of 3.5mm, and a conversion factor K=0.025mm / pixel.
[0066] Measurement results: Major axis = 5.4 mm, minor axis = 4.1 mm, area = 17.4 mm². Generate a measurement annotation map, draw the outline and major and minor axis line segments on the image, and label the values.
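The conversion from pixels to millimeters can be checked with a short calculation. The sketch takes the conversion factor K = 0.025 mm/pixel directly from the text rather than deriving it from the optical parameters, assumes the reported area is that of the fitted ellipse (π·a·b/4 for full axis lengths a and b), and uses hypothetical pixel axis lengths of 216 and 164 chosen to reproduce the example values.

```python
import math

K_MM_PER_PIXEL = 0.025   # conversion factor quoted in the text

def measure(major_px, minor_px, k=K_MM_PER_PIXEL):
    """Convert ellipse axes from pixels to mm and compute the ellipse area, rounded to 0.1."""
    major_mm = major_px * k
    minor_mm = minor_px * k
    area_mm2 = math.pi * major_mm * minor_mm / 4.0   # assumed: area of the fitted ellipse
    return round(major_mm, 1), round(minor_mm, 1), round(area_mm2, 1)

result = measure(216.0, 164.0)   # hypothetical sub-pixel axis lengths in pixels
```

Under the ellipse-area assumption these inputs give 5.4 mm, 4.1 mm, and 17.4 mm², in agreement with the measurement results quoted above.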
[0067] Step S5: Multimodal data association and standardized aggregation. Using the patient's unique ID as the primary key, the polyp identification results (location, confidence level), classification results, and subpixel measurement results obtained in steps S3 and S4 are associated with the pathological data (benign / malignant, pathological classification) and clinical record data (age, medical history, medication) processed in step S2. The data is coded and archived according to a pre-defined classification system (by location, polyp type, and examination date). A distributed storage system (HDFS) and an index database (Elasticsearch) are used to achieve secure storage and fast retrieval, supporting incremental aggregation.
[0068] For example, using the patient's visit ID as the primary key, the above results (polyp location, confidence level 0.92, classification "raised type", long diameter 5.4 mm, short diameter 4.1 mm) are associated with preprocessed pathological information (tubular adenoma, low grade) and medical record information (52 years old, male, rectal bleeding). The data is coded according to the classification system: location code "C18.2" (ascending colon), examination date "2026-02-27", and stored in an Elasticsearch index. Simultaneously, the original image and contour data are stored in HDFS.
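The association step can be pictured as a key-based join of the per-source results. The sketch below uses plain dictionaries and involves neither Elasticsearch nor HDFS; the visit ID "V0001", the field names, and the function name are hypothetical, while the values are those of the example above.

```python
def associate_by_visit(recognition, measurement, pathology, clinical):
    """Join per-source dictionaries on the visit ID used as primary key."""
    records = {}
    for visit_id in recognition.keys() & measurement.keys():
        records[visit_id] = {
            "visit_id": visit_id,
            **recognition[visit_id],
            **measurement[visit_id],
            "pathology": pathology.get(visit_id),   # None if no pathology report is linked yet
            "clinical": clinical.get(visit_id),
        }
    return records

records = associate_by_visit(
    recognition={"V0001": {"confidence": 0.92, "polyp_type": "raised",
                           "location_code": "C18.2", "exam_date": "2026-02-27"}},
    measurement={"V0001": {"long_diameter_mm": 5.4, "short_diameter_mm": 4.1}},
    pathology={"V0001": {"diagnosis": "tubular adenoma", "grade": "low"}},
    clinical={"V0001": {"age": 52, "sex": "male", "symptom": "rectal bleeding"}},
)
```

Each resulting record is a flat document of the kind that could be written to an index, with the original image and contour data stored separately and referenced by the same key.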
[0069] Step S6: Result verification, application, and feedback. The system presents the final results in a visual interface, including: polyp bounding boxes and confidence levels overlaid on the original images; a sidebar displaying the classification results, measurement values, and measurement annotations; and the associated pathological diagnosis and medical record summary displayed at the bottom. Physicians can manually verify the results on the interface: correcting the classification, re-measuring, or marking false positives / false negatives. The corrected data is automatically saved as the final diagnostic record. Physicians can add the verified valid data (including the original images and corrected labels) to the model training pool with a single click. The system periodically triggers incremental model training (e.g., weekly) for continuous iterative optimization.
[0070] For example, the system displays the following on the interactive interface: a green bounding box and a "0.92" confidence level label overlaid on the original image; the right side shows "raised polyp, long diameter 5.4mm, short diameter 4.1mm," along with a measurement annotation diagram; the bottom shows "Pathology: tubular adenoma (low grade), endoscopic resection recommended." After reviewing the case and finding the identification accurate with no correction needed, the attending physician clicks the "Confirm" button. The system automatically generates a structured report and saves it to the electronic medical record. The physician also marks this case as a "high-quality sample" and feeds it into the model training pool for incremental training the following week.
[0071] Example 2: Deep Learning-Based Polyp Recognition System for Gastrointestinal Endoscopy. This embodiment provides a system for executing the above method. The system is deployed on a hospital intranet server (configuration: 2×Intel Xeon Gold 6248, 4×NVIDIA RTX 4090 GPU, 256GB RAM, 2TB NVMe SSD). The specific implementation of each module is as follows: Data acquisition module: Written in Go, it communicates with various data sources via gRPC. It supports a DICOM C-STORE SCP service, an HL7 MLLP receiver, and RESTful API polling. After receiving data, it writes it to a Kafka message queue (3 partitions) to buffer peak traffic.
[0072] Data preprocessing module: Distributed across 4 computing nodes, each running Python 3.9 + OpenCV 4.8 + PyTorch 1.13. The image preprocessing pipeline uses Dask for parallel processing; the text NLP module is based on a fine-tuned BioBERT model from HuggingFace; structured data processing uses Pandas.
[0073] Deep learning recognition module: The model is implemented using PyTorch, with TensorRT 8.5 as the inference engine, and the inference time per image is ≤35ms. The model version is managed by MLflow and supports A / B testing. Mixed precision (FP16) and gradient accumulation are used during training.
[0074] Dynamic deformation correction module: The temporal deformation field estimation network uses a U-Net architecture (4 layers for the encoder, 4 layers for the decoder), with an input of 128×128×3 and an output of 128×128×2. Dynamic mode decomposition is implemented using the scipy.linalg module in Python. The correction time for a single polyp is approximately 80ms (10 frames).
[0075] Classification and Measurement Module: The subpixel-level boundary segmentation unit contains a differentiable level set evolution layer, implemented using a custom PyTorch torch.autograd.Function. The measurement algorithm is based on OpenCV's fitEllipse and contourArea. Single polyp measurement time is approximately 50ms.
[0076] Data association and storage module: Elasticsearch 8.5 is used as the index database, and HDFS 3.3 is used as the object storage. The data association logic is executed hourly by an Apache Spark job to build the graph structure.
[0077] Interaction and Feedback Module: The front-end is developed based on Vue3 and ECharts, communicating with the back-end (FastAPI) via WebSocket. The physician verification interface supports drag-and-drop bounding box correction, manual input of measurement values, and adding comments. Feedback data is automatically written to the "retraining_queue" table for model iteration.
[0078] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.
Claims
1. A deep learning-based method for identifying data of polyps in a gastroscopic examination, characterized in that it includes the following steps:
Step S1: Automatically collect multimodal data related to polyps in gastrointestinal endoscopy through a standardized interface; the multimodal data includes gastrointestinal endoscopy image data, polyp pathology data, patient clinical medical record data, and polyp diagnosis and treatment record data;
Step S2: Automatically preprocess the collected multimodal data according to preset standardized preprocessing specifications, including denoising, enhancement, registration, normalization and preliminary localization of suspected areas for gastrointestinal endoscopy image data, feature extraction and natural language processing for images and text in pathological data, and structured organization of clinical medical records and diagnosis and treatment records;
Step S3: Input the preprocessed gastrointestinal endoscopy image data into a pre-trained improved deep learning fusion model; the model integrates convolutional neural networks and a Transformer architecture, and introduces an attention mechanism to output polyp identification results, including whether polyps exist, polyp location coordinates, and confidence level;
Step S4: Based on the output polyp identification results, automatically call the polyp classification algorithm and measurement algorithm to classify and accurately measure the identified polyps, and output the polyp morphology type, long diameter, short diameter and measurement error;
Step S5: Combine the results of steps S3 and S4 with the other multimodal data processed in step S2, using the patient's unique identifier as the basis for association, to establish a complete data link and perform standardized collection and storage;
Step S6: Present the identification, classification, measurement results and related data in a visual manner, receive manual verification and correction from physicians, and feed back the verified valid data as training samples to the model iteration module.
2. The method of claim 1, wherein the precise measurement in step S4 further includes the following sub-pixel-level boundary segmentation and measurement steps:
Step S41: Based on the polyp location coordinates output in step S3, crop out the polyp region image from the preprocessed image;
Step S42: Input the polyp region image into a deep learning segmentation network; while outputting a pixel-level polyp mask, the segmentation network adds a boundary distance regression head to predict the signed distance function value from each pixel to the real polyp boundary; the signed distance function value is a continuous floating-point number, achieving sub-pixel precision boundary localization;
Step S43: Introduce a differentiable level set evolution layer, using the signed distance function as the initial level set function, and perform iterative optimization by minimizing an energy functional to make the zero level set converge to the true boundary of the polyp, outputting a sub-pixel level continuous closed contour curve; the energy functional includes a boundary gradient term, a region consistency term, and a curvature regularization term;
Step S44: Design the level set evolution process as a differentiable neural network layer, and perform end-to-end joint training with the segmentation network; the loss function includes boundary overlap loss and measurement error loss;
Step S45: Based on the output subpixel-level contour curve, calculate the major axis, minor axis and area of the polyp using a pixel calibration algorithm combined with the parameters of the gastrointestinal endoscope, and generate a contour map with measurement annotations.
3. The method of claim 1, wherein after step S3 outputs the polyp identification result and before step S4, a dynamic deformation correction step is also included to eliminate polyp morphological distortion caused by intestinal peristalsis or respiratory movements:
Step S31: Extract the temporal contour sequence of the same polyp from the continuous multi-frame polyp identification results output in step S3;
Step S32: Construct a temporal deformation field estimation network; input the polyp region images of two adjacent frames and output a dense displacement field, representing the motion vector of each pixel from the current frame to the next frame; the temporal deformation field estimation network adopts a U-Net structure, and the loss function includes the smoothness constraint of the displacement field and the supervision based on contour feature points;
Step S33: Accumulate the displacement fields of N consecutive frames in time to obtain the accumulated deformation field with the first frame as a reference; the cumulative deformation field is decomposed into periodic and drift components through dynamic mode decomposition, and the components are modeled separately;
Step S34: Select the frame with the smallest deformation in the frame sequence as the standard shape frame, or deform the polyp contours of all frames back to the average shape through the inverse deformation field;
Step S35: Output the corrected standard shape contour or average shape contour to step S4 for subsequent accurate measurement, and output the deformation amplitude quantification index at the same time.
4. The method of claim 1, wherein the preprocessing of gastrointestinal endoscopy image data in step S2 specifically includes: denoising using an adaptive median filtering algorithm; improving image contrast through histogram equalization; aligning multiple frames of images using image registration technology; initially locating suspected polyp areas and cropping invalid backgrounds using a contour extraction algorithm; and uniformly converting the images to the DICOM 3.0 standardized format.
5. The method of claim 1, wherein the improved deep learning fusion model in step S3 is specifically implemented as follows: local features in the image are extracted through a convolutional neural network, including polyp boundaries, surface texture, and blood vessel distribution; global features are extracted through a Transformer architecture, including the correlation features between the polyp and the surrounding mucosa; local features and global features are fused; fused features are filtered through an attention mechanism to strengthen key features; and classification and judgment are performed through a fully connected layer to output the recognition result.
6. The method of claim 1, wherein the automatic classification in step S4 includes classifying polyps into flat, raised, or pedunculated types based on their elevation height and pedicle condition using a classification algorithm; the measurement error of the polyp's long and short diameters in the precise measurement is controlled within 0.5 mm.
7. A deep learning-based gastrointestinal endoscopy polyp data recognition system for performing the method of any one of claims 1 to 6, characterized in that it includes:
Data acquisition module: used to interface with the hospital information system through a standardized interface to automatically collect multimodal data related to polyps in gastrointestinal endoscopy;
Data preprocessing module: used to perform denoising, enhancement, registration, structured processing, and natural language processing on the collected multimodal data according to its type;
Deep learning recognition module: has a built-in improved deep learning fusion model for polyp recognition on preprocessed gastrointestinal endoscopic image data, outputting location and confidence level;
Dynamic deformation correction module: used to estimate the temporal deformation field and correct the shape of polyps identified in multiple consecutive frames, eliminating the shape distortion caused by intestinal peristalsis or respiratory movements;
Classification and measurement module: used to automatically classify the identified polyps and measure their geometric dimensions; the classification and measurement module includes a sub-pixel-level boundary segmentation unit, which uses steps S41 to S45 of claim 2 to achieve accurate measurement;
Data association and storage module: used to associate multimodal data with recognition results, establish standardized data links, and store the data;
Interaction and feedback module: used to visualize the results, receive manual verification and corrections, and feed back the corrected data to the model iteration module.
8. The system of claim 7, wherein the dynamic deformation correction module has a built-in temporal deformation field estimation network and a dynamic mode decomposition unit, which are used to output standard shape contours and deformation amplitude quantification indicators; the sub-pixel level boundary segmentation unit in the classification and measurement module includes a differentiable level set evolution layer to achieve end-to-end training.
9. The system of claim 7, wherein the data acquisition module supports standardized interfaces including DICOM, HL7, and RESTful interfaces; the acquired data formats include DICOM and JPG image data, as well as structured / unstructured text data.