Early esophageal cancer invasion depth classification method based on conventional endoscopic images and deep learning
By constructing the EID-Net model from conventional endoscopic images using deep learning, this method addresses the subjectivity and limited accuracy of assessing the invasion depth of early esophageal cancer. The model classifies invasion depth automatically, improves the objectivity and accuracy of the assessment, and assists in developing individualized treatment plans.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- ZHONGSHAN HOSPITAL FUDAN UNIV
- Filing Date
- 2026-04-28
- Publication Date
- 2026-06-19

Figure CN122244636A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and a deep learning model, and belongs to the fields of medical image analysis and artificial-intelligence-assisted diagnosis and treatment.
Background Technology
[0002] Esophageal cancer remains one of the most common and deadliest malignant tumors of the digestive tract worldwide. With improvements in endoscopic screening and early diagnosis and treatment, the detection rate of early esophageal cancer (EEC) is gradually increasing. For EEC, endoscopic submucosal dissection (ESD) has become an important radical treatment method due to its advantages such as being minimally invasive, allowing for en bloc resection, and preserving organ function.
[0003] However, whether ESD achieves radical resection depends strictly on accurate preoperative assessment of tumor invasion depth. Generally, lesions confined to the lamina propria (LP), the muscularis mucosae (MM), or the superficial submucosa (SM1, invasion ≤200 μm into the submucosa) are more likely to be suitable for radical ESD, whereas lesions with deep submucosal invasion (SM2, >200 μm) often carry a higher risk of lymph node metastasis and non-radical resection and frequently require consideration of surgery, radiotherapy, chemotherapy, or combined therapy. The accuracy of preoperative invasion depth assessment therefore directly affects the choice of treatment strategy and patient prognosis.
[0004] In current clinical practice, assessment of EEC invasion depth relies primarily on the endoscopist's overall judgment of lesion surface morphology, mucosal structure, and microvascular morphology. Commonly used imaging methods include white light imaging (WLI), narrow-band imaging (NBI), iodine staining, and blue light imaging (BLI). Although these techniques provide rich visual information, their interpretation depends heavily on the endoscopist's experience, is markedly subjective, and shows limited agreement between physicians. In particular, for invasion subtypes that are crucial for clinical decision-making, such as SM2, manual assessment is often insufficiently accurate, which can lead to undertreatment or overtreatment.
[0005] On the one hand, misdiagnosing deep invasive lesions as superficial lesions may lead to non-radical endoscopic resection in patients unsuitable for ESD, delaying further treatment. On the other hand, misdiagnosing superficial lesions suitable for ESD as deep invasive lesions may lead to unnecessary surgery, increasing trauma and medical burden. Therefore, a technical method that can identify the invasion depth of early esophageal cancer from routine endoscopic images objectively, automatically, and with high accuracy has significant clinical implications and application value.
Summary of the Invention
[0006] Accurate preoperative assessment of tumor invasion depth is crucial in determining whether early-stage esophageal cancer patients can undergo radical ESD. This invention addresses the problems of existing manual endoscopic assessments, such as high subjectivity, limited accuracy, and insufficient inter-observer consistency. It aims to provide an objective, stable, and highly accurate preoperative decision support tool to improve the scientific rigor and accuracy of ESD treatment decisions.
[0007] To achieve the above objectives, the present invention discloses a method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning, characterized by comprising the following steps:
[0008] Step 1: Screen patients with early esophageal cancer and collect their clinical data, preoperative routine endoscopic images, and postoperative pathological data as sample data.
Step 2: Preprocess the collected sample data and, based on the postoperative pathological results, use the invasion depth of the lesion matched to each sample as its supervision label, thereby establishing a standard dataset containing K invasion depth categories, K ≥ 2.
Step 3: Construct the EID-Net model, which comprises a visual Transformer encoder followed by a classification head.
Step 4: Train the EID-Net model on the standard dataset as follows: pre-train the visual Transformer encoder using a standard masked autoencoder framework; then, using labeled endoscopic images, fine-tune the entire EID-Net network end-to-end under the supervision of pathological labels, jointly optimizing the cross-entropy classification loss and the supervised deep clustering loss.
Step 5: Input preoperative routine endoscopic images of the patient under examination into the trained EID-Net model, which preprocesses the images, extracts features, classifies invasion depth, and outputs a prediction assigning the lesion to one of the K invasion depth categories.
[0009] Preferably, in step 1, the inclusion criteria for patients with early esophageal cancer include: the patient has a complete postoperative pathology report stating at least the invasion depth, lesion size, lesion location, lymphovascular invasion, and surgical margin status; the patient has high-quality preoperative endoscopic images comprising one or more of white light imaging, narrow-band imaging, iodine staining images, and blue light imaging, in which the lesion morphology is clear, the microvascular structure is clearly displayed, and there is no obvious motion blur or artifact; and the patient has complete clinical data.
[0010] Preferably, in step 1, the exclusion criteria for patients with early esophageal cancer include: poor-quality endoscopic images with severe blurring, artifacts, insufficient air insufflation, or unclear lesion boundaries; a history of esophageal surgery; prior neoadjuvant radiotherapy, chemotherapy, or combined chemoradiotherapy; other concurrent malignant tumors; missing pathological invasion depth data; multiple esophageal lesions; or synchronous distant metastases.
[0011] Preferably, in step 2, there are four invasion depth categories: lamina propria, muscularis mucosae, superficial submucosa, and deep submucosa.
[0012] Preferably, in step 5, when the EID-Net model predicts the lamina propria, muscularis mucosae, or superficial submucosa, the lesion is more likely to meet the indications for radical resection by endoscopic submucosal dissection; when the model predicts the deep submucosa, the lesion is more likely unsuitable for endoscopic submucosal dissection alone, and surgery and/or radiotherapy or chemotherapy should be considered.
[0013] Preferably, in step 2, the standard dataset is divided into three independent cohorts according to patient source and purpose: a training cohort, an internal validation cohort, and an external validation cohort. The division is performed patient by patient, so all images of the same patient belong to exactly one cohort and never appear in more than one of the training, internal validation, and external validation cohorts.
[0014] Preferably, in step 4, the model is developed and evaluated on a large retrospective cohort of patients with early esophageal cancer, divided into a model development cohort and an external validation cohort, so that training yields a high-accuracy classification prediction model.
[0015] Preferably, in step 4, when the entire EID-Net network is fine-tuned end-to-end under supervision, the overall loss function $\mathcal{L}$ is:
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda\,\mathcal{L}_{SDC}$$
[0016] where $\mathcal{L}_{CE}$ is the cross-entropy classification loss, $\mathcal{L}_{SDC}$ is the supervised deep clustering loss, and $\lambda$ is the balance coefficient.
[0017] Preferably, the supervised deep clustering loss $\mathcal{L}_{SDC}$ is defined as:
$$\mathcal{L}_{SDC} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(z_i^{\top}c_{y_i}/\tau\right)}{\sum_{k=1}^{K}\exp\left(z_i^{\top}c_k/\tau\right)}$$
[0018] where $N$ is the number of samples in the mini-batch; $z_i$ is the feature vector extracted from the $i$-th sample by the pre-trained visual Transformer encoder; $y_i$ is its true category label; $c_{y_i}$ is the center of category $y_i$ in the feature space; $c_k$ is the center of category $k$ in the feature space; $^{\top}$ denotes the transpose operation; and $\tau$ is the temperature coefficient.
[0019] Preferably, in step 4, the EID-Net model is trained using the SGD optimizer, combined with a warm-up learning rate strategy and a cosine annealing learning rate scheduling strategy.
[0020] Accurate preoperative assessment of tumor invasion depth is crucial in determining whether patients with early esophageal cancer are suitable candidates for radical ESD. This invention constructs a classification model that integrates masked autoencoder (MAE) pre-training, visual Transformer (ViT) feature extraction, and supervised deep clustering constraints. The model automatically classifies the invasion depth of early esophageal cancer into four categories, predicting whether the lesion is confined to the lamina propria (LP), the muscularis mucosae (MM), or the superficial submucosa (SM1), or invades the deep submucosa (SM2). This prediction helps determine preoperatively whether the lesion meets the indications for radical ESD resection, gives clinicians an objective, automated, and accurate basis for individualized treatment planning, and reduces the risks of overtreatment and undertreatment.
[0021] Compared with existing technical solutions, the present invention has the following beneficial effects:
1. Improved objectivity of invasion depth assessment. This invention uses a deep learning model to analyze conventional endoscopic images automatically, reducing the dependence of traditional manual judgment on operator experience and minimizing errors caused by subjectivity.
2. Improved classification accuracy, especially for clinically critical categories. This invention adds a clustering loss to the ViT classification framework. By strengthening intra-class aggregation and inter-class separation, it highlights the key discriminative features between invasion depth categories, which helps improve accuracy on the four-class LP/MM/SM1/SM2 task and is especially helpful in identifying SM2 lesions, which are of major clinical significance.
3. Better generalization through MAE pre-training. The MAE pre-training mechanism lets the model learn more representative visual features from a large number of endoscopic images, improving its adaptability to different imaging modes, lesion morphologies, and complex clinical scenarios.
4. Full use of routine endoscopic images and high clinical application value. The images used in this invention come from routine clinical examinations, including white light imaging, narrow-band imaging, iodine staining, and blue light imaging. No additional expensive equipment or complex examination procedures are required, so the method is highly feasible and suitable for widespread application.
5. Support for ESD treatment decisions with reduced risks of overtreatment and undertreatment. This invention stratifies lesion invasion depth more precisely before surgery, providing an objective basis for determining the indications for radical ESD, thereby reducing non-radical ESD, avoiding unnecessary surgery, and reducing the risk of undertreatment due to missed deep invasion.
Attached Figure Description
[0022] Figure 1 is a flowchart of the method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to an embodiment of the present invention.
Detailed Implementation
[0023] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined by the appended claims.
[0024] Different treatment strategies are required for different depths of invasion in early esophageal cancer. Conventional endoscopic images objectively contain visual information reflecting the extent of lesion invasion. However, current clinical assessment methods mainly rely on human experience and judgment, which have problems such as strong subjectivity, poor repeatability and limited accuracy, making it difficult to meet the needs of precision treatment.
[0025] Deep learning, in particular visual Transformer networks with global modeling capability, can extract deep features related to invasion depth from conventional endoscopic images. Pre-training with a masked autoencoder further strengthens the model's ability to learn general structural features of endoscopic images, making more efficient use of limited medical annotations. In addition, introducing a clustering loss into the classification task constrains the distribution of samples in the feature space: features of same-class samples become more concentrated and features of different-class samples become more separated, which highlights the key discriminative features between invasion depth categories and improves classification performance. Based on these principles, and to overcome the shortcomings of existing approaches to assessing early esophageal cancer invasion depth (reliance on human experience, high subjectivity, low accuracy, poor repeatability, and insufficient ability to identify deep submucosal invasion), this invention constructs a classification method that integrates MAE pre-training, a ViT classification network, and clustering loss constraints. The method automatically identifies four invasion depths, LP, MM, SM1, and SM2, and assists in determining whether a lesion meets the indications for radical ESD resection. As shown in Figure 1, the specific steps are as follows.
Step 1: Case screening and data collection. Clinical data, preoperative routine endoscopic images, and postoperative pathological data were collected from patients with early esophageal cancer.
[0026] In a preferred embodiment of the present invention, the inclusion criteria include: 1) The patient has a complete postoperative pathology report, which includes at least the depth of invasion, size of the lesion, location of the lesion, lymphovascular invasion and the condition of the surgical margin; 2) The patient has high-quality preoperative endoscopic images, which include at least one or more of white light imaging (WLI), narrow band imaging (NBI), iodine staining images and blue light imaging (BLI). At the same time, the lesion morphology is clear, the microvascular structure is clearly displayed, and there is no obvious motion blur or artifacts in the endoscopic images. 3) The patient has complete clinical data.
[0027] In a preferred embodiment of the present invention, the exclusion criteria include: 1) Poor endoscopic image quality, with severe blurring, artifacts, insufficient air filling, or unclear lesion boundaries; 2) The patient has previously undergone esophageal surgery; 3) The patient has previously received neoadjuvant radiotherapy, chemotherapy, or a combination of radiotherapy and chemotherapy; 4) The patient has other malignant tumors; 5) Missing data on pathological infiltration depth; 6) The patient has multiple esophageal lesions; 7) The patient has synchronous distant metastases.
[0028] In one optional example of this invention, approved by the Ethics Committee of Zhongshan Hospital Affiliated to Fudan University, a retrospective multicenter study enrolled 890 patients with early esophageal cancer who underwent endoscopic submucosal dissection (ESD). A total of 9,580 routine preoperative endoscopic images were collected and analyzed, including white light imaging (WLI), narrow band imaging (NBI), blue light imaging (BLI), and iodine-stained images.
[0029] Step 2, Image Preprocessing and Data Labeling: The acquired endoscopic images were preprocessed, and based on the postoperative pathological results, the infiltration depth corresponding to each lesion was used as a supervision label to establish a standard dataset.
[0030] In a preferred embodiment of the present invention, the preprocessing of endoscopic images includes image screening, size standardization, and format unification.
[0031] In another preferred embodiment of the present invention, the infiltration depth corresponding to each lesion can be divided into the following four categories: lamina propria (LP); muscularis mucosae (MM); superficial submucosa (SM1); and deep submucosa (SM2).
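To make the labeling rule concrete, the sketch below maps a pathology finding to one of the four class indices, using the 200 μm submucosal cutoff stated in the background section. The function name `depth_label`, the layer strings, and the class-index order are hypothetical choices for illustration; the document does not specify how pathology reports are encoded.

```python
# Hypothetical class-index order for the four invasion-depth categories.
CLASSES = ["LP", "MM", "SM1", "SM2"]

# Submucosal cutoff from the background section: SM1 <= 200 um, SM2 > 200 um.
SM_CUTOFF_UM = 200.0


def depth_label(deepest_layer: str, sm_depth_um: float = 0.0) -> int:
    """Return the class index for one lesion (illustrative encoding only).

    deepest_layer: "LP", "MM" or "SM", the deepest layer reached per pathology.
    sm_depth_um:   depth of invasion into the submucosa in micrometres,
                   used only when deepest_layer == "SM".
    """
    if deepest_layer in ("LP", "MM"):
        return CLASSES.index(deepest_layer)
    if deepest_layer == "SM":
        if sm_depth_um < 0:
            raise ValueError("submucosal depth must be non-negative")
        return CLASSES.index("SM1" if sm_depth_um <= SM_CUTOFF_UM else "SM2")
    raise ValueError(f"unknown layer: {deepest_layer!r}")


cases = [("LP",), ("MM",), ("SM", 150), ("SM", 350)]
print([CLASSES[depth_label(*case)] for case in cases])  # ['LP', 'MM', 'SM1', 'SM2']
```

Keeping the cutoff as a named constant makes the boundary case explicit: a lesion invading exactly 200 μm into the submucosa is labeled SM1.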
[0032] In another preferred embodiment of the invention, the established standard dataset is divided into three independent cohorts based on patient origin and purpose. To avoid data leakage, the cohorts are divided patient by patient, so all images of the same patient belong to only one cohort.
[0033] In one embodiment of the present invention, an optional example is that the three queues include: Training cohort: from the main center, a total of 634 patients and 6833 endoscopic images; Internal validation cohort: from the main center, a total of 159 patients and 1867 endoscopic images; External validation cohort: 97 patients and 880 endoscopic images from an independent external center, Zhongshan Hospital Affiliated to Fudan University (Xiamen Hospital).
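As a consistency check, the three cohorts add up to the totals reported in paragraph [0028]; the shares in parentheses are rounded to one decimal place:

```latex
\begin{aligned}
\text{patients: } & 634 + 159 + 97 = 890 && (71.2\%,\ 17.9\%,\ 10.9\%)\\
\text{images: } & 6833 + 1867 + 880 = 9580 && (71.3\%,\ 19.5\%,\ 9.2\%)
\end{aligned}
```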
[0034] No patient's images appear in more than one of the training, internal validation, and external validation cohorts.
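A patient-level split of this kind can be sketched as follows. The record format `(patient_id, center, image_path)`, the center names, the shuffling seed, and the `internal_frac` parameter are hypothetical; the 634/159 main-center division above is close to an 80/20 split, but the document does not state how patients were assigned.

```python
import random
from collections import defaultdict


def split_by_patient(records, main_center, internal_frac=0.2, seed=0):
    """Assign every image to exactly one cohort, grouping images by patient.

    records: iterable of (patient_id, center, image_path) tuples (hypothetical format).
    Patients from centers other than main_center form the external validation cohort;
    main-center patients are shuffled and divided into training / internal validation.
    """
    images_by_patient = defaultdict(list)
    center_of = {}
    for patient_id, center, image_path in records:
        images_by_patient[patient_id].append(image_path)
        center_of[patient_id] = center

    main_patients = sorted(p for p in images_by_patient if center_of[p] == main_center)
    external_patients = sorted(p for p in images_by_patient if center_of[p] != main_center)

    # Shuffle patients, not images, so one patient's images never straddle cohorts.
    rng = random.Random(seed)
    rng.shuffle(main_patients)
    n_internal = round(len(main_patients) * internal_frac)

    def gather(patients):
        return [img for p in patients for img in images_by_patient[p]]

    return {
        "train": gather(main_patients[n_internal:]),
        "internal_val": gather(main_patients[:n_internal]),
        "external_val": gather(external_patients),
    }


records = [("p1", "main", "p1_wli.png"), ("p1", "main", "p1_nbi.png"),
           ("p2", "main", "p2_wli.png"), ("p3", "main", "p3_wli.png"),
           ("p4", "main", "p4_wli.png"), ("p5", "xiamen", "p5_wli.png")]
cohorts = split_by_patient(records, main_center="main", internal_frac=0.25)
print({name: len(imgs) for name, imgs in cohorts.items()})
```

Splitting on patient identifiers rather than on images is what prevents near-duplicate frames of one lesion from leaking between training and validation.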
[0035] Step 3: Construct a feature extraction network based on MAE pre-training: A Vision Transformer (ViT) is used as the basic backbone network, and a Masked Autoencoder (MAE) pre-training strategy is introduced. The input image is segmented and embedded into multiple fixed-size image patches. These image patches are then input into the encoder for representation learning, thereby obtaining high-dimensional feature representations suitable for esophageal endoscopy images. MAE pre-training enhances the model's ability to express lesion texture, blood vessel morphology, and local structural features, improving the robustness and generalization ability of feature extraction.
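The patch-splitting and masking step of MAE pre-training can be illustrated without a deep-learning framework. The sketch below cuts an image (a nested list of pixel values) into non-overlapping square patches and randomly keeps a subset for the encoder. The 16-pixel patch size and 75% mask ratio are common defaults (16×16 patches as in ViT-B/16, 75% masking as in the original MAE paper), not values stated in this document, and the encoder and decoder themselves are omitted.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened, non-overlapping patch x patch patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return (visible_idx, masked_idx); MAE feeds only the visible patches to the encoder."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    n_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


# Toy 64 x 64 single-channel "image"; real inputs would be RGB endoscopic frames.
image = [[(r * 64 + c) % 256 for c in range(64)] for r in range(64)]
patches = patchify(image, patch=16)          # 16 patches of 256 values each
visible, masked = random_mask(len(patches), mask_ratio=0.75, seed=0)
print(len(patches), len(visible), len(masked))  # 16 4 12
```

During pre-training the decoder would reconstruct the masked patches from the encoder's output; only the encoder is carried forward to the classification stage.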
[0036] Step 4: Construct an infiltration depth classification model incorporating clustering loss: After completing self-supervised pre-training using a standard masked autoencoder framework, we transferred the pre-trained visual Transformer encoder to a downstream four-class classification task for esophageal cancer invasion depth. Based on this, a classification head was added after the pre-trained encoder, and the entire network was fine-tuned end-to-end using labeled endoscopic images to obtain the EID-Net model. Considering the subtle visual differences between the four invasion depth categories, relying solely on traditional cross-entropy loss for optimization, while improving the class discrimination ability of the output layer, still offers limited direct constraints on the internal structure of the feature space. Therefore, samples from different categories may still exhibit local overlap in the embedding space, affecting the model's generalization ability to unseen samples, a problem particularly pronounced in fine-grained medical image classification tasks.
[0037] To further enhance the discriminative power of feature representations, supervised deep clustering (SDC) loss is introduced during the supervised fine-tuning stage to impose additional constraints on the feature distribution. The core idea is that in an ideal feature space, samples of the same class should cluster as much as possible, while samples of different classes should remain sufficiently separated. For esophageal cancer invasion depth classification, different depth classes often exhibit only subtle but crucial differences in endoscopic images; therefore, classification loss alone may not be sufficient to learn a well-structured feature distribution. By explicitly introducing class prototypes or cluster centers into the embedding space, SDC can guide the encoder to learn more compact and discriminative feature representations, thereby simultaneously enhancing intra-class consistency and inter-class separability.
[0038] Let $z_i$ denote the feature vector extracted from the $i$-th sample by the pre-trained visual Transformer encoder and $y_i$ its true category label. In this embodiment of the invention, the total number of categories is $K$ ($K = 4$ for LP, MM, SM1, and SM2). For each category $k$, define its class center in the feature space as $c_k$. The supervised deep clustering loss is defined as:
$$\mathcal{L}_{SDC} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(z_i^{\top}c_{y_i}/\tau\right)}{\sum_{k=1}^{K}\exp\left(z_i^{\top}c_k/\tau\right)}$$
[0039] where $N$ is the number of samples in a mini-batch, $^{\top}$ denotes the transpose operation, and $\tau$ is the temperature coefficient, which adjusts the smoothness of the similarity distribution. This loss is essentially a discriminative constraint based on class prototypes: it encourages sample features to have higher similarity to their true class centers while suppressing their similarity to other class centers. As training progresses, the feature representation gradually exhibits a clearer class structure: samples of the same class are more compactly distributed, and the boundaries between dissimilar samples are more pronounced.
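The loss above can be computed for one mini-batch as follows. This is a framework-free sketch: features and class centers are plain lists, the centers are treated as given (the document does not say whether they are learnable parameters or running means of class features), no feature normalization is applied because none is stated, and the temperature of 0.1 is an illustrative value.

```python
import math


def sdc_loss(features, labels, centers, tau=0.1):
    """Supervised deep clustering loss for one mini-batch.

    features: N feature vectors z_i (lists of floats) from the encoder.
    labels:   N integer class labels y_i in [0, K).
    centers:  K class-center vectors c_k, same dimension as z_i.
    tau:      temperature (illustrative value, not from the source).
    Returns -1/N * sum_i log softmax_k(z_i . c_k / tau)[y_i].
    """
    total = 0.0
    for z, y in zip(features, labels):
        logits = [sum(a * b for a, b in zip(z, c)) / tau for c in centers]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[y]
    return total / len(features)


centers = [[1.0, 0.0], [0.0, 1.0]]
near = sdc_loss([[0.9, 0.1], [0.1, 0.9]], [0, 1], centers)  # features near their own centers
far = sdc_loss([[0.1, 0.9], [0.9, 0.1]], [0, 1], centers)   # features near the wrong centers
print(near < far)  # True
```

The example shows the intended effect: the loss is small when each feature is closest to its own class center and large when it sits near another class's center.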
[0040] Through the above constraints, the model is guided to learn a feature space that satisfies the following structural characteristics: samples of the same class are clustered as much as possible in the embedding space, and samples of different classes are separated as much as possible, thereby enhancing intra-class compactness and inter-class separability, improving the model's ability to generalize to unseen samples, and especially helping to identify deep categories with subtle visual differences but important clinical significance.
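[0041] In a preferred embodiment of the present invention, the overall loss function of the model is:
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda\,\mathcal{L}_{SDC}$$
[0042] where $\mathcal{L}_{CE}$ is the cross-entropy classification loss, $\mathcal{L}_{SDC}$ is the supervised deep clustering loss, and $\lambda$ is the balance coefficient.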
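A minimal sketch of this combined objective for one mini-batch follows. The cross-entropy term is computed from the classification head's four-class logits; the supervised deep clustering term is passed in as a precomputed value, and $\lambda = 0.5$ is an illustrative setting, since the document does not give one.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]


def total_loss(batch_logits, labels, l_sdc, lam=0.5):
    """L = L_CE + lambda * L_SDC, with L_CE averaged over the mini-batch.

    l_sdc: supervised deep clustering loss for the same mini-batch, computed elsewhere.
    lam:   balance coefficient (0.5 is illustrative, not from the source).
    """
    l_ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
    return l_ce + lam * l_sdc


# Four-class logits (LP, MM, SM1, SM2) for a batch of two images.
logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
print(total_loss(logits, [0, 3], l_sdc=1.0, lam=0.5))
```

Setting `lam=0` recovers plain cross-entropy training, which makes the clustering term easy to ablate.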
[0043] Step 5: Model training and optimization. The annotated and preprocessed endoscopic images are input into the EID-Net model for training, which proceeds in two stages: first, MAE pre-training of the visual Transformer backbone network; then downstream supervised fine-tuning under the supervision of pathological labels, jointly optimizing the cross-entropy classification loss and the supervised deep clustering loss.
[0044] The EID-Net model is trained using the SGD optimizer, combined with a warm-up learning rate strategy and a cosine annealing learning rate scheduling strategy to improve the stability and convergence efficiency of the training process. The model selection is based on the validation set accuracy, retaining the model parameters with the best performance on the validation set as the final model.
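The warm-up plus cosine-annealing schedule can be written as a function of the training step. Linear warm-up, the base learning rate of 0.01, the 500-step warm-up, and a final learning rate of zero are all assumptions; the document names the strategies but gives no hyperparameters. In a PyTorch training loop, for example, the returned value could be written into each SGD parameter group's `lr` entry before every step.

```python
import math


def lr_at(step, total_steps, base_lr=0.01, warmup_steps=500, min_lr=0.0):
    """Learning rate at a given step: linear warm-up, then cosine annealing to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


total = 10_000
for s in (0, 499, 500, 5250, 9999):
    print(s, round(lr_at(s, total), 6))
```

The warm-up avoids large, unstable updates while the freshly attached classification head is still random; the cosine decay then lowers the rate smoothly toward the end of fine-tuning.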
[0045] In one embodiment of the present invention, as an optional example, the model is developed and evaluated on a large retrospective cohort of patients with early esophageal cancer, divided into a model development cohort and an external validation cohort, so that training yields a high-accuracy classification prediction model.
[0046] Step 6: Invasion depth classification and clinical decision support. Preoperative routine endoscopic images of the patient are input into the trained EID-Net model, which preprocesses the images, extracts features, classifies the invasion depth, and outputs a prediction of whether the lesion belongs to LP, MM, SM1, or SM2. Clinicians use this prediction to help determine whether the lesion meets the indications for radical ESD resection and to optimize the treatment plan accordingly. A prediction of LP, MM, or SM1 suggests that the lesion is more likely to meet the indications for radical ESD resection; a prediction of SM2 suggests that the lesion is more likely unsuitable for ESD alone, and further treatment options such as surgery, radiotherapy, or chemotherapy should be considered.
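The decision rule in this step reduces to a lookup from the predicted class to a suggestion. The sketch below assumes the model outputs one probability per class in the order LP, MM, SM1, SM2 (an assumption) and takes the most probable class; the function name `suggest` is hypothetical and the suggestion strings paraphrase this paragraph as decision support only.

```python
CLASSES = ("LP", "MM", "SM1", "SM2")

ESD_LIKELY = "more likely to meet the indications for radical ESD"
SUGGESTION = {
    "LP": ESD_LIKELY,
    "MM": ESD_LIKELY,
    "SM1": ESD_LIKELY,
    "SM2": "more likely unsuitable for ESD alone; consider surgery, radiotherapy or chemotherapy",
}


def suggest(probs):
    """Map per-class probabilities (ordered as CLASSES) to (predicted class, suggestion)."""
    if len(probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    best = max(range(len(CLASSES)), key=lambda k: probs[k])
    label = CLASSES[best]
    return label, SUGGESTION[label]


print(suggest([0.10, 0.15, 0.25, 0.50]))
```

Taking the arg-max is the simplest choice; the document does not describe any probability threshold for flagging SM2.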
[0047] Those skilled in the art will understand that the features described in the various embodiments and/or claims of this invention can be combined in various ways without departing from the spirit and teachings of this disclosure, even if such combinations are not explicitly described herein. All such combinations fall within the scope of this disclosure.
[0048] The embodiments of the present invention have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. The scope of the present invention is defined by the appended claims and their equivalents. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of the present invention, and all such substitutions and modifications should fall within the scope of this disclosure.
Claims
1. A method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning, characterized by comprising the following steps: Step 1: screening patients with early esophageal cancer and collecting their clinical data, preoperative routine endoscopic images, and postoperative pathological data as sample data; Step 2: preprocessing the collected sample data and, based on the postoperative pathological results, using the invasion depth of the lesion matched to each sample as its supervision label to establish a standard dataset containing K invasion depth categories, K ≥ 2; Step 3: constructing the EID-Net model, which comprises a visual Transformer encoder followed by a classification head; Step 4: training the EID-Net model on the standard dataset by pre-training the visual Transformer encoder using a standard masked autoencoder framework and then, using labeled endoscopic images, fine-tuning the entire EID-Net network end-to-end under the supervision of pathological labels while jointly optimizing the cross-entropy classification loss and the supervised deep clustering loss; Step 5: inputting preoperative routine endoscopic images of the patient under examination into the trained EID-Net model, which performs preprocessing, feature extraction, and invasion depth classification on the input images and outputs a prediction assigning the lesion to one of the K invasion depth categories.
2. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 1, the inclusion criteria for patients with early esophageal cancer include: the patient has a complete postoperative pathology report stating at least the invasion depth, lesion size, lesion location, lymphovascular invasion, and surgical margin status; the patient has high-quality preoperative endoscopic images comprising one or more of white light imaging, narrow-band imaging, iodine staining images, and blue light imaging, in which the lesion morphology is clear, the microvascular structure is clearly displayed, and there is no obvious motion blur or artifact; and the patient has complete clinical data.
3. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 1, the exclusion criteria for patients with early esophageal cancer include: poor-quality endoscopic images with severe blurring, artifacts, insufficient air insufflation, or unclear lesion boundaries; a history of esophageal surgery; prior neoadjuvant radiotherapy, chemotherapy, or combined chemoradiotherapy; other concurrent malignant tumors; missing pathological invasion depth data; multiple esophageal lesions; or synchronous distant metastases.
4. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 2, there are four invasion depth categories: lamina propria, muscularis mucosae, superficial submucosa, and deep submucosa.
5. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 4, characterized in that in step 5, when the EID-Net model predicts the lamina propria, muscularis mucosae, or superficial submucosa, the lesion is more likely to meet the indications for radical resection by endoscopic submucosal dissection; when the model predicts the deep submucosa, the lesion is more likely unsuitable for endoscopic submucosal dissection alone, and surgery and/or radiotherapy or chemotherapy should be considered.
6. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 2, the standard dataset is divided into three independent cohorts according to patient source and purpose: a training cohort, an internal validation cohort, and an external validation cohort; the division is performed patient by patient, so all images of the same patient belong to exactly one cohort and never appear in more than one of the training, internal validation, and external validation cohorts.
7. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 6, characterized in that in step 4, the model is developed and evaluated on a large retrospective cohort of patients with early esophageal cancer, divided into a model development cohort and an external validation cohort, so that training yields a high-accuracy classification prediction model.
8. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 4, when the entire EID-Net network is fine-tuned end-to-end under supervision, the overall loss function is $\mathcal{L} = \mathcal{L}_{CE} + \lambda\,\mathcal{L}_{SDC}$, where $\mathcal{L}_{CE}$ is the cross-entropy classification loss, $\mathcal{L}_{SDC}$ is the supervised deep clustering loss, and $\lambda$ is the balance coefficient.
9. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 8, characterized in that the supervised deep clustering loss is defined as $\mathcal{L}_{SDC} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\left(z_i^{\top}c_{y_i}/\tau\right)}{\sum_{k=1}^{K}\exp\left(z_i^{\top}c_k/\tau\right)}$, where $N$ is the number of samples in the mini-batch; $z_i$ is the feature vector extracted from the $i$-th sample by the pre-trained visual Transformer encoder; $y_i$ is its true category label; $c_{y_i}$ is the center of category $y_i$ in the feature space; $c_k$ is the center of category $k$ in the feature space; $^{\top}$ denotes the transpose operation; and $\tau$ is the temperature coefficient.
10. The method for classifying the invasion depth of early esophageal cancer based on conventional endoscopic images and deep learning according to claim 1, characterized in that in step 4, the EID-Net model is trained using the SGD optimizer combined with a warm-up learning rate strategy and a cosine annealing learning rate schedule.