A method for detecting cholangiocarcinoma

A bile duct cancer detection method optimized through machine learning and artificial fish swarm algorithms, utilizing Raman spectroscopy data and a classification model, solves the problems of low accuracy and high cost in bile duct cancer detection, achieving efficient, rapid, and economical detection.

CN120275364B · Active · Publication Date: 2026-06-12 · PEKING UNIVERSITY THIRD HOSPITAL (THE THIRD CLINICAL MEDICAL SCHOOL OF PEKING UNIVERSITY)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PEKING UNIVERSITY THIRD HOSPITAL (THE THIRD CLINICAL MEDICAL SCHOOL OF PEKING UNIVERSITY)
Filing Date
2025-04-07
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing methods for detecting bile duct cancer have low accuracy, high cost, and low efficiency, making it difficult to achieve efficient, rapid, and economical detection.

Method used

Using machine learning methods, based on Raman spectroscopy data and a pre-trained cholangiocarcinoma classification model, combined with the artificial fish swarm algorithm to optimize the convolutional neural network and random forest model, bile samples are classified. Through feature extraction, fusion, and classification, the cholangiocarcinoma detection results are output.

Benefits of technology

It improves the efficiency and accuracy of cholangiocarcinoma detection, reduces detection costs, and optimizes model training speed and accuracy through the artificial fish swarm algorithm, avoiding local optima.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to a method for detecting bile duct cancer, belongs to the technical field of auxiliary cancer detection, and addresses the low accuracy, high cost, and low efficiency of prior-art bile duct cancer detection methods. Multiple Raman spectra of the bile to be classified are obtained, and data preprocessing is performed on each Raman spectrum. A first classification result of the bile to be classified is obtained based on the preprocessed Raman spectra and a pre-trained first bile duct cancer classification model. A second classification result of the bile to be classified is obtained based on the preprocessed Raman spectra and a pre-trained second bile duct cancer classification model, which includes a feature extraction module, a feature fusion module, and a classification module. If the first classification result and the second classification result are consistent, the classification result of the bile to be classified is output; the classification result is either benign bile duct disease or bile duct cancer. An efficient, rapid, and economical method for detecting bile duct cancer is thereby realized.

Description

Technical Field

[0001] This invention relates to the field of cancer auxiliary detection technology, and in particular to a method for detecting cholangiocarcinoma.

Background Technology

[0002] Cholangiocarcinoma (CCA) is a malignant tumor originating from the epithelial cells of the bile ducts. It exhibits aggressive biological behavior and a relatively high cancer-related mortality rate. Surgical resection with therapeutic intent remains the best option for achieving long-term survival. Because surgery typically involves the removal of peripheral organs and requires complex digestive tract reconstruction, misdiagnosis can lead to a high incidence of perioperative complications. Therefore, accurate detection of CCA is crucial for timely and appropriate subsequent treatment.

[0003] In the prior art, detection methods for cholangiocarcinoma include: (1) Endoscopic retrograde cholangiopancreatography (ERCP) with tissue or cytological biopsy, whose sensitivity is relatively low (often below 50%), leading to delayed diagnosis. (2) Tumor markers in bile. Bile comes into direct contact with bile duct lesions and is rich in metabolites secreted by the bile duct system, so it is a potential resource for the diagnosis and detection of CCA, and bile samples can easily be obtained from patients undergoing ERCP. Although some scholars have explored tumor markers in bile, no highly sensitive and specific tumor marker has yet entered clinical practice. (3) "Liquid biopsy" involving gene sequencing of bile, which is expensive, requires time-consuming sequencing analysis, and has a diagnostic performance that has not yet been recognized.

[0004] Therefore, there is a need to provide an efficient, rapid, and economical method for detecting cholangiocarcinoma.

Summary of the Invention

[0005] Based on the above analysis, the present invention aims to provide a method for detecting cholangiocarcinoma, in order to solve the problems of low accuracy, high cost and low efficiency of existing cholangiocarcinoma detection methods.

[0006] This invention provides a method for detecting cholangiocarcinoma, comprising:

[0007] Multiple Raman spectra of the bile to be classified are obtained, and data preprocessing is performed on each Raman spectrum;

[0008] The first classification result of the bile to be classified is obtained based on the preprocessed Raman spectra and the pre-trained first bile duct cancer classification model;

[0009] Based on the preprocessed multiple Raman spectra and the pre-trained second bile duct carcinoma classification model, the second classification result of the bile to be classified is obtained. The second bile duct carcinoma classification model includes a feature extraction module, a feature fusion module, and a classification module.

[0010] If the first classification result and the second classification result are consistent, the classification result of the bile to be classified is output, and the classification result includes: benign bile duct disease or bile duct cancer.

[0011] Based on a further improvement to the above method, obtaining multiple Raman spectra of the bile to be classified includes:

[0012] With the equipment parameters of the stimulated Raman scattering (T-SRS) device fixed, multiple regions of the bile to be classified are sampled to obtain multiple Raman spectra.

[0013] Based on further improvements to the above method, the first cholangiocarcinoma classification model is a KPCA-LDA-SVM model; the KPCA-LDA-SVM model includes a kernel principal component analysis (KPCA) module, an LDA classification model, and a support vector machine (SVM) model.

[0014] The first classification result of the bile to be classified, obtained based on multiple pre-processed Raman spectra and a pre-trained first cholangiocarcinoma classification model, includes:

[0015] For each Raman spectrum, the spectral data between 2700 cm⁻¹ and 3100 cm⁻¹ are selected as the target Raman spectrum;

[0016] For each target Raman spectrum, the Kernel Principal Component Analysis (KPCA) module is used to reduce the dimensionality of the data. The dimensionality-reduced data is then input into the LDA classification model to extract discriminant features. All the obtained discriminant features are then input into the Support Vector Machine (SVM) model to obtain the first classification result.

[0017] Based on a further improvement of the above method, the data preprocessing includes:

[0018] Noise is filtered using a Savitzky-Golay filter, the fluorescence background is eliminated using a polynomial fitting method, and the spectral shifts of the samples are aligned using dynamic time warping (DTW).
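As an illustration of the alignment step, the following minimal sketch computes a classic dynamic time warping alignment between a reference spectrum and a sample spectrum. The function name `dtw_path` and the absolute-difference point cost are assumptions made for illustration; the specification does not state which DTW variant is used or how the warped spectrum is resampled afterwards.

```python
def dtw_path(ref, sample):
    """Return (total_cost, path) of the classic DTW alignment between two sequences."""
    n, m = len(ref), len(sample)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with sample[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - sample[j - 1])  # assumed point-wise cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which sample index maps to which reference index.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

Each `(reference_index, sample_index)` pair in the returned path indicates which sample point is matched to which reference point, which is the information needed to re-map a drifted spectrum onto a common axis.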

[0019] Based on further improvements to the above method, the second classification result of the bile to be classified, obtained based on multiple preprocessed Raman spectra and a pre-trained second cholangiocarcinoma classification model, includes:

[0020] Each preprocessed spectral data is input into the feature extraction module to obtain the feature vector of the Raman spectrum. Multiple feature vectors are input into the feature fusion module to obtain the feature fusion vector. The feature fusion module adopts feature fusion based on the self-attention mechanism.

[0021] The feature fusion vector is input into the classification module to obtain the second classification result of the bile to be classified.

[0022] Based on further improvements to the above method, the feature extraction module adopts a CNN model, including: an input layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, and a pooling layer. The first convolutional layer is a 1D convolution with a kernel width of 3, the second convolutional layer is a 1D convolution with a kernel width of 5, and the third convolutional layer is a 1D convolution with a kernel width of 7. The activation function is LeakyReLU. The classification module adopts a random forest model.

[0023] Based on further improvements to the above method, the number of neurons in the first, second, and third convolutional layers of the second cholangiocarcinoma classification model, as well as the number of trees and maximum depth in the random forest model, are obtained based on the artificial fish swarm algorithm.

[0024] Based on further improvements to the above method, the artificial fish swarm algorithm is executed in the following manner:

[0025] S1: Initialize the artificial fish swarm algorithm parameters, including: initial fish swarm size, maximum number of iterations, field of view, crowding factor, movement step size, maximum number of trials, each fish represents the number of neurons in the first, second, and third convolutional layers, and the number and maximum depth of trees in the random forest model;

[0026] S2: Simulate artificial fish swarm behavior, including: in the foraging behavior, exploring with an adaptive step size by randomly generating a new position within the field of view; if the fitness value at the new position is greater than the fitness value at the current position, the fish moves one movement step toward the new position;

[0027] The adaptive step size is:

[0028]

[0029] where $\mathrm{Step}_{\mathrm{new}}$ is the updated adaptive step size, $\mathrm{Step}_{\mathrm{initial}}$ is the initial step size, $\varepsilon$ is the decay rate, $t$ is the current iteration number, and $T$ is the maximum number of iterations.

[0030] In the swarming behavior, for each fish, the lowest fitness value among all neighboring fish within the current i-th fish's field of vision is calculated. Based on the current i-th fish's fitness value and the lowest fitness value, the normalized weight of the current i-th fish is calculated.

[0031]

[0032] The weighted center is calculated based on the normalized weight of each fish and its corresponding position, i.e.:

[0033]

[0034] where $w_i$ is the normalized weight of the current i-th fish, $\mathrm{Position}_i$ is the position vector of the current i-th fish, $F_i$ is the fitness value of the current i-th fish, $F_{\min}$ is the minimum fitness value, $n$ is the number of all neighboring fish within the field of view, $j = 1, 2, 3, \ldots, n$, $N$ is the initial fish swarm size, and $i = 1, 2, 3, \ldots, N$. When $j < \eta(t) \cdot N$, the position is moved according to the weighted center, where $\eta(t)$ is the dynamically updated crowding factor.

[0035] In the tail-chasing behavior, for each fish, the highest fitness value $F_{\max}$ among all neighboring fish within the current i-th fish's field of view is calculated. If $F_{\max}/n > \eta(t) \cdot N$, the normalized fitness difference is calculated from the fitness difference between the current i-th fish and the target fish, the tail-chasing weight is calculated from the normalized fitness difference, and the position is updated based on the tail-chasing weight.

[0036] S3: If none of the foraging, swarming, or tail-chasing behaviors is performed, a random behavior is performed;

[0037] S4: Record the current global optimal solution and optimal fitness, where the global optimal solution is the fish at the position corresponding to the optimal fitness;

[0038] S5: Iterative optimization, repeat steps S2-S4 until the maximum number of iterations is met;

[0039] S6: Output the optimization results and use the optimization results as the number of neurons in the first, second, and third convolutional layers, as well as the number of trees and the maximum depth in the random forest model.

[0040] A further improvement to the above method, the step of moving positions based on the weighted center, includes:

[0041]

[0042] where $x_{\mathrm{new}}$ is the updated position, $x_{\mathrm{old}}$ is the current position, and $\eta_{\mathrm{initial}}$ is the crowding factor at initialization;

[0043] The method of updating the position based on the tail-chase weight includes:

[0044]

[0045] where $x_{\mathrm{new}}$ is the updated position, $x_{\mathrm{old}}$ is the current position, $\varphi$ is the tail-chasing weight, and $x_{\mathrm{best}}$ is the position of the neighboring fish with the highest fitness value;

[0046] The training samples for the second bile duct cancer classification model consist of multiple Raman spectra of each bile sample and their corresponding labels, which indicate benign bile duct disease or bile duct cancer.

[0047] Based on the above method, if the number of training samples is insufficient, it can be expanded in the following ways:

[0048] At least one Raman spectrum of an existing bile is randomly shifted, and the shifted Raman spectrum is input into a pre-trained generative adversarial network DCGAN to generate a new Raman spectrum. The original Raman spectrum is then replaced with the new Raman spectrum, thus forming a new set of Raman spectra corresponding to the bile.

[0049] or,

[0050] Noise is added to at least one Raman spectrum of existing bile. The noise-added Raman spectrum is then input into a pre-trained Generative Adversarial Network (DCGAN) to generate a new Raman spectrum. The original Raman spectrum is then replaced with the new Raman spectrum, thus forming a new set of Raman spectra corresponding to the bile.

[0051] or,

[0052] At least two existing Raman spectra of bile are linearly combined. The combined Raman spectra are then input into a pre-trained Generative Adversarial Network (DCGAN) to generate new Raman spectra. The new Raman spectra are then used to replace the original Raman spectra, thus forming a new set of Raman spectra corresponding to the bile.

[0053] Compared with the prior art, the present invention can achieve at least one of the following beneficial effects:

[0054] 1. This invention provides a method for detecting cholangiocarcinoma. It employs a machine learning approach, using Raman spectral data of bile to be classified as a basis, and determines the classification result based on two classification models to obtain cholangiocarcinoma detection information. Compared with existing technologies such as ERCP and gene sequencing, this method further improves the efficiency and accuracy of cholangiocarcinoma detection and reduces detection costs.

[0055] 2. This invention provides a method for detecting cholangiocarcinoma. In the training process of the cholangiocarcinoma classification model, an artificial fish swarm algorithm is used to improve the training speed of the classification model. The foraging, clustering, and tail-chasing behaviors of the artificial fish swarm algorithm are adaptively improved, so that the fish swarm algorithm can find the global optimum faster during the optimization process, while improving the accuracy of the solution. The optimized fish swarm algorithm can also better balance global search and local search, and avoid getting trapped in local optima.

[0056] In this invention, the above-described technical solutions can be combined with each other to achieve more preferred combinations. Other features and advantages of this invention will be set forth in the following description, and some advantages may become apparent from the description or be learned by practicing the invention. The objects and other advantages of this invention can be realized and obtained from what is particularly pointed out in the description and drawings.

Attached Figure Description

[0057] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts.

[0058] Figure 1 is an example diagram of a method for detecting cholangiocarcinoma according to an embodiment of the present invention.

Detailed Implementation

[0059] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.

[0060] A specific embodiment of the present invention discloses a method for detecting cholangiocarcinoma which, as shown in Figure 1, includes:

[0061] S1: Obtain multiple Raman spectra of the bile to be classified, and perform data preprocessing on each Raman spectrum.

[0062] The process of obtaining multiple Raman spectra of the bile to be classified includes: fixing the equipment parameters of the stimulated Raman scattering device (T-SRS) and sampling multiple regions of the bile to be classified to obtain multiple Raman spectra.

[0063] Understandably, bile is a complex liquid composed of cholesterol, bile salts, proteins, metabolites, etc., and its composition may vary unevenly depending on the location of the tumor, the degree of inflammation, or the pathological stage. For example, tumor-related metabolites (such as abnormal lipids) may accumulate in certain areas; therefore, different sampling points may lead to local differences in the acquired Raman spectra. Furthermore, the parameter settings of the T-SRS device directly affect the spectral resolution, signal-to-noise ratio, and detection sensitivity, thus causing spectral differences. Therefore, this invention requires standardizing the device parameters of the stimulated Raman scattering (T-SRS) device when acquiring Raman spectra, including fixing the laser power, integration time, and scanning mode, to ensure that the differences in multiple spectra mainly originate from the sample itself. By analyzing the Raman spectra of multiple regions, the accuracy of the classification results can be further improved.

[0064] For example, at least four Raman spectra need to be obtained.

[0065] To further improve data quality, after acquiring the Raman spectral data, each spectrum needs to undergo data processing. Specifically, this includes:

[0066] For noise reduction, the Savitzky-Golay filter can be used to filter noise, which can smooth the spectral curve and preserve high-frequency characteristics; alternatively, wavelet transform can be used, which can decompose the signal and remove high-frequency noise components.

[0067] Baseline correction is performed using polynomial fitting or Asymmetric Least Squares (ALS) to eliminate baseline drift, thereby eliminating the fluorescence background.

[0068] Outlier handling: box plots or Z-score methods are used to identify spectra with abnormal signal intensity (such as extreme values caused by laser power fluctuations), which can be removed if necessary.

[0069] Spectral alignment, through dynamic time warping (DTW) or characteristic peak matching, aligns the spectral shifts of different samples, eliminating the effects of device drift or sample differences.

[0070] Standardization / normalization: Perform Z-score normalization or Min-Max normalization on each Raman spectrum to eliminate dimensional differences.
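The noise reduction, baseline correction, and normalization steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the invention: the 5-point quadratic Savitzky-Golay window, the degree-1 (linear) baseline, and the function names are all chosen here for brevity, since the specification does not give the window length or polynomial order. Outlier handling and spectral alignment are omitted.

```python
from statistics import mean, pstdev

# 5-point quadratic Savitzky-Golay smoothing coefficients (they sum to 35).
SG5 = (-3, 12, 17, 12, -3)

def savgol5(y):
    """Smooth with the 5-point quadratic Savitzky-Golay kernel; edge points stay unchanged."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5)) / 35.0
    return out

def subtract_linear_baseline(x, y):
    """Fit y ~ a*x + b by least squares and subtract it (assumed degree-1 baseline)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - a * mx
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]

def zscore(y):
    """Z-score normalization: zero mean and unit (population) standard deviation."""
    m, s = mean(y), pstdev(y)
    return [(v - m) / s for v in y]

def preprocess(x, y):
    """Smooth, remove the baseline, then normalize one spectrum."""
    return zscore(subtract_linear_baseline(x, savgol5(y)))
```

Here `x` holds the Raman shifts and `y` the intensities of one spectrum; a higher-order polynomial or ALS baseline, as mentioned in the text, would replace `subtract_linear_baseline`.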

[0071] S2: Based on the preprocessed Raman spectra and the pre-trained first bile duct cancer classification model, the first classification result of the bile to be classified is obtained.

[0072] The first bile duct cancer classification model is the KPCA-LDA-SVM model, which includes the KPCA module, the LDA classification model, and the SVM model.

[0073] The first classification result of the bile to be classified, obtained based on multiple pre-processed Raman spectra and a pre-trained first cholangiocarcinoma classification model, includes:

[0074] For each Raman spectrum, the spectral data between 2700 cm⁻¹ and 3100 cm⁻¹ are selected as the target Raman spectrum;

[0075] For each target Raman spectrum, the Kernel Principal Component Analysis (KPCA) module is used to reduce the dimensionality of the data. The dimensionality-reduced data is then input into the LDA classification model to extract discriminant features. All the obtained discriminant features are then input into the Support Vector Machine (SVM) model to obtain the first classification result.
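The selection of the target Raman spectrum can be illustrated with a short sketch. The function name `select_window` and the inclusive treatment of both endpoints are assumptions; the KPCA, LDA, and SVM stages that follow are not shown here.

```python
def select_window(wavenumbers, intensities, low=2700.0, high=3100.0):
    """Keep only the points whose Raman shift (cm^-1) lies in [low, high]."""
    return [(w, v) for w, v in zip(wavenumbers, intensities) if low <= w <= high]

def target_spectra(spectra):
    """Apply the window to every (wavenumbers, intensities) pair of one bile sample."""
    return [select_window(w, v) for w, v in spectra]
```

For a bile sample with four spectra, `target_spectra` returns four windowed spectra, each of which would then be passed through dimensionality reduction and discriminant feature extraction.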

[0076] Raman spectroscopy analysis of patient bile samples revealed that lipids and proteins are key metabolites in tumor diagnostic testing. Therefore, the first classification model for cholangiocarcinoma is established based on the spectral data between 2700 cm⁻¹ and 3100 cm⁻¹ in the Raman spectrum.

[0077] To train the KPCA-LDA-SVM model, this invention collected 172 Raman spectra from 43 benign disease samples and 132 Raman spectra from 33 CCA samples. Each Raman spectrum was processed using the preprocessing described in step S1, and the four Raman spectra corresponding to each bile sample (i.e., the spectral data between 2700 cm⁻¹ and 3100 cm⁻¹) together with the corresponding label (i.e., benign bile duct disease or cholangiocarcinoma) were used as one training sample. The specific implementation of training the KPCA-LDA-SVM model, such as the cumulative variance contribution rate and parameters of the SVM model such as the penalty coefficient and kernel function, can be set according to actual needs and is not limited by this invention.

[0078] If the model prediction accuracy obtained based on the above sample data is not good, data augmentation can be performed based on the existing sample data. Augmentation methods include: randomly shifting at least one Raman spectrum of existing bile, inputting the shifted Raman spectrum into a pre-trained Generative Adversarial Network (DCGAN) to generate a new Raman spectrum, and replacing the original Raman spectrum with the new Raman spectrum to form a new set of Raman spectra corresponding to the bile; or, adding noise to at least one Raman spectrum of existing bile, inputting the noise-added Raman spectrum into a pre-trained DCGAN to generate a new Raman spectrum, and replacing the original Raman spectrum with the new Raman spectrum to form a new set of Raman spectra corresponding to the bile; or, linearly combining at least two Raman spectra of existing bile, inputting the linearly combined Raman spectrum into a pre-trained DCGAN to generate a new Raman spectrum, and replacing the original Raman spectrum with the new Raman spectrum to form a new set of Raman spectra corresponding to the bile.
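The three perturbations that precede the DCGAN stage can be sketched as below. The function names, the edge-padding strategy for shifts, and the Gaussian noise model are assumptions made for illustration; the DCGAN itself is not implemented here, so the outputs correspond only to the spectra that would be fed into the pre-trained generator.

```python
import random

def random_shift(spectrum, max_shift, rng=None):
    """Shift the spectrum by a random number of points, padding with the edge value."""
    rng = rng or random.Random()
    k = rng.randint(-max_shift, max_shift)
    if k > 0:
        return [spectrum[0]] * k + spectrum[:-k]
    if k < 0:
        return spectrum[-k:] + [spectrum[-1]] * (-k)
    return list(spectrum)

def add_noise(spectrum, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma (assumed noise model)."""
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in spectrum]

def linear_combination(a, b, alpha):
    """Blend two spectra of equal length: alpha * a + (1 - alpha) * b."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
```

Each function preserves the spectrum length, so an augmented spectrum can replace an original one in the four-spectrum set of a bile sample without changing the input shape of the classification models.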

[0079] S3: Based on the preprocessed multiple Raman spectra and the pre-trained second bile duct carcinoma classification model, the second classification result of the bile to be classified is obtained. The second bile duct carcinoma classification model includes a feature extraction module, a feature fusion module, and a classification module.

[0080] The second classification result of the bile to be classified, obtained based on multiple preprocessed Raman spectra and a pre-trained second cholangiocarcinoma classification model, includes:

[0081] Each preprocessed spectral data is input into the feature extraction module to obtain the feature vector of the Raman spectrum. Multiple feature vectors are input into the feature fusion module to obtain the feature fusion vector. The feature fusion module adopts feature fusion based on the self-attention mechanism.

[0082] The feature fusion vector is input into the classification module to obtain the second classification result of the bile to be classified.
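One possible reading of the self-attention fusion step is sketched below: scaled dot-product self-attention over the feature vectors of one bile sample, followed by mean pooling into a single fusion vector. The identity projections (queries, keys, and values all equal to the input features) and the mean pooling are simplifying assumptions; the specification does not describe the learned projections or the pooling used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_fuse(features):
    """Fuse k feature vectors (each of length d) into one vector of length d."""
    d = len(features[0])
    attended = []
    for q in features:
        # Attention scores of this vector against every vector, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)])
    # Mean-pool the attended vectors into the fusion vector (assumed pooling).
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(d)]
```

With four spectra per bile sample, `features` would contain four vectors, and the returned fusion vector is what the classification module receives.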

[0083] The feature extraction module uses a CNN model, including an input layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, and a pooling layer. The first convolutional layer is a 1D convolution with a kernel width of 3, the second convolutional layer is a 1D convolution with a kernel width of 5, and the third convolutional layer is a 1D convolution with a kernel width of 7. The activation function is LeakyReLU. The classification module uses a random forest model.
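The layer sequence of the feature extraction module can be illustrated with a single-channel sketch in plain Python. The fixed averaging kernels, the valid-mode convolution, the LeakyReLU slope of 0.01, and the pooling size of 2 are all hypothetical choices; a trained model would use many learned filters per layer, and none of these values come from the specification.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def leaky_relu(values, slope=0.01):
    """LeakyReLU: pass positive values, scale negative values by `slope`."""
    return [v if v > 0 else slope * v for v in values]

def max_pool(values, size=2):
    """Non-overlapping max pooling."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def extract_features(spectrum, kernels):
    """Apply the convolutional layers in order, each followed by LeakyReLU, then pool."""
    out = spectrum
    for kernel in kernels:
        out = leaky_relu(conv1d(out, kernel))
    return max_pool(out)

# Hypothetical kernels of width 3, 5, and 7, matching the three layers in the text.
KERNELS = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
```

For an input of 32 points, the three valid-mode convolutions shorten the signal to 30, 26, and 20 points, and pooling yields a 10-element feature vector.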

[0084] The number of neurons in the first, second, and third convolutional layers of the second bile duct cancer classification model, as well as the number of trees and maximum depth in the random forest model, are obtained based on the artificial fish swarm algorithm, thereby improving the training efficiency of the second bile duct cancer classification model.

[0085] The artificial fish swarm algorithm is executed in the following manner:

[0086] M1: Initialize the artificial fish swarm algorithm parameters, including: initial fish swarm size, maximum number of iterations, field of view, crowding factor, movement step size, maximum number of trials, each fish represents the number of neurons in the first, second, and third convolutional layers, and the number and maximum depth of trees in the random forest model;

[0087] M2: Simulate artificial fish swarm behavior, including: in the foraging behavior, exploring with an adaptive step size by randomly generating a new position within the field of view; if the fitness value at the new position is greater than the fitness value at the current position, the fish moves one movement step toward the new position;

[0088] The adaptive step size is:

[0089]

[0090] where $\mathrm{Step}_{\mathrm{new}}$ is the updated adaptive step size, $\mathrm{Step}_{\mathrm{initial}}$ is the initial step size, $\varepsilon$ is the decay rate, $t$ is the current iteration number, and $T$ is the maximum number of iterations. A larger step size is adopted in the initial stage, which enables the artificial fish to quickly cover a larger search space, thereby accelerating the global search. As the number of iterations increases, the step size is gradually reduced, so that the artificial fish can perform a more accurate local search when it is close to the optimal solution.
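Because the step-size formula itself is not reproduced in this text, the sketch below uses an assumed exponential decay law, step_initial * exp(-decay_rate * t / t_max), which merely satisfies the described behavior (large steps early, small steps late). The decay law, the function names, and the uniform sampling within the field of view are all assumptions, not the formula of the invention.

```python
import math
import random

def adaptive_step(step_initial, decay_rate, t, t_max):
    """ASSUMED decay law: step_initial * exp(-decay_rate * t / t_max)."""
    return step_initial * math.exp(-decay_rate * t / t_max)

def forage(position, fitness, visual, step, max_tries, rng=None):
    """Probe random positions within the field of view; move at most one step
    toward the first one with higher fitness, else return the old position."""
    rng = rng or random.Random()
    current = fitness(position)
    for _ in range(max_tries):
        candidate = [x + visual * rng.uniform(-1.0, 1.0) for x in position]
        if fitness(candidate) > current:
            dist = math.sqrt(sum((c - x) ** 2 for c, x in zip(candidate, position)))
            scale = min(step, dist) / dist  # never overshoot the candidate
            return [x + scale * (c - x) for x, c in zip(position, candidate)]
    return list(position)
```

In an optimization loop, `adaptive_step` would be evaluated once per iteration and its result passed to `forage` as `step`.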

[0091] In the swarming behavior, for each fish, the lowest fitness value among all neighboring fish within the current i-th fish's field of vision is calculated. Based on the current i-th fish's fitness value and the lowest fitness value, the normalized weight of the current i-th fish is calculated.

[0092]

[0093] The weighted center is calculated based on the normalized weight of each fish and its corresponding position, i.e.:

[0094]

[0095] where $w_i$ is the normalized weight of the current i-th fish, $\mathrm{Position}_i$ is the position vector of the current i-th fish, $F_i$ is the fitness value of the current i-th fish, $F_{\min}$ is the minimum fitness value, $n$ is the number of all neighboring fish within the field of view, $j = 1, 2, 3, \ldots, n$, $N$ is the initial fish swarm size, and $i = 1, 2, 3, \ldots, N$. When $j < \eta(t) \cdot N$, the position is moved according to the weighted center, where $\eta(t)$ is the dynamically updated crowding factor, i.e.:

[0096]

[0097] where $x_{\mathrm{new}}$ is the updated position, $x_{\mathrm{old}}$ is the current position, and $\eta_{\mathrm{initial}}$ is the crowding factor at initialization;
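The weight and weighted-center formulas are not reproduced in this text, so the following sketch rests on an assumed normalization, w_j = (F_j - F_min) / sum_k (F_k - F_min), with the weighted center taken as the weight-averaged neighbor position. The `fraction` argument of `move_toward` is a hypothetical stand-in for the factor that depends on the crowding factor and its initial value, whose exact form is not recoverable here.

```python
def normalized_weights(fitnesses):
    """ASSUMED form: w_j = (F_j - F_min) / sum_k (F_k - F_min); uniform if all equal."""
    f_min = min(fitnesses)
    shifted = [f - f_min for f in fitnesses]
    total = sum(shifted)
    if total == 0:
        return [1.0 / len(fitnesses)] * len(fitnesses)
    return [s / total for s in shifted]

def weighted_center(positions, fitnesses):
    """Weight-averaged position of the neighboring fish."""
    weights = normalized_weights(fitnesses)
    dim = len(positions[0])
    return [sum(w * p[i] for w, p in zip(weights, positions)) for i in range(dim)]

def move_toward(x_old, target, fraction):
    """Move a fraction of the way from x_old to target (fraction is hypothetical)."""
    return [x + fraction * (c - x) for x, c in zip(x_old, target)]
```

Under this assumed weighting, the least fit neighbor contributes nothing to the center, so the swarm drifts toward the fitter part of the neighborhood.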

[0098] In the tail-chasing behavior, for each fish, the highest fitness value $F_{\max}$ among all neighboring fish within the current i-th fish's field of view is calculated. If $F_{\max}/n > \eta(t) \cdot N$, the normalized fitness difference is calculated from the fitness difference between the current i-th fish and the target fish. The tail-chasing weight is then calculated from this normalized fitness difference, and the position is updated based on this tail-chasing weight. That is:

[0099]

[0100] where $x_{\mathrm{new}}$ is the updated position, $x_{\mathrm{old}}$ is the current position, $\varphi$ is the tail-chasing weight, and $x_{\mathrm{best}}$ is the position of the neighboring fish with the highest fitness value;
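A tail-chasing update consistent with the symbols defined above can be sketched as follows. The update x_new = x_old + phi * (x_best - x_old) and the choice of phi as the fitness difference normalized by the local fitness range are assumptions, since the formula is not reproduced in this text; the crowding condition is also omitted for brevity.

```python
def tail_chase(x_old, f_old, neighbors):
    """Move toward the fittest neighbor.

    neighbors: list of (position, fitness) pairs within the field of view.
    ASSUMED update: x_new = x_old + phi * (x_best - x_old), where phi is the
    fitness gap to the best neighbor divided by the local fitness range.
    """
    x_best, f_best = max(neighbors, key=lambda nf: nf[1])
    if f_best <= f_old:
        return list(x_old)  # no better neighbor: stay in place
    f_min = min(min(f for _, f in neighbors), f_old)
    phi = (f_best - f_old) / (f_best - f_min)  # lies in (0, 1]
    return [x + phi * (b - x) for x, b in zip(x_old, x_best)]
```

With this choice, a fish that is far less fit than its best neighbor takes a large step toward it, while a fish that is nearly as fit moves only slightly.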

[0101] M3: If none of the foraging, swarming, or tail-chasing behaviors is performed, a random behavior is performed;

[0102] M4: Records the current global optimal solution and optimal fitness, where the global optimal solution is the fish at the position corresponding to the optimal fitness;

[0103] M5: Iterative optimization, repeating steps M2-M4 until the maximum number of iterations is met;

[0104] M6: Output the optimization results and use these results as the number of neurons in the first, second, and third convolutional layers, as well as the number of trees and the maximum depth in the random forest model.
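Steps M1 to M6 can be sketched as a compact search loop. This is a simplified illustration and not the algorithm of the invention: swarming and the crowding factor are left out, the step decay law is assumed, and the function name `afsa`, its default parameter values, and the toy fitness function in the usage note are all hypothetical.

```python
import math
import random

def afsa(fitness, bounds, n_fish=10, max_iter=50, visual=2.0, step=1.0,
         decay=2.0, max_tries=5, seed=0):
    """Simplified artificial fish swarm search that maximizes `fitness`.

    bounds: list of (low, high) per dimension. Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    clip = lambda p: [min(max(x, lo), hi) for x, (lo, hi) in zip(p, bounds)]
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # M1: initialize the swarm uniformly inside the bounds.
    fish = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fish)]
    best = max(fish, key=fitness)
    best_fit = fitness(best)
    for t in range(max_iter):  # M5: iterate until the maximum number of iterations
        cur_step = step * math.exp(-decay * t / max_iter)  # assumed decay law
        for i, x in enumerate(fish):
            fx = fitness(x)
            new = None
            # M2 (tail-chasing): move toward the fittest neighbor in the field of view.
            neighbors = [y for j, y in enumerate(fish) if j != i and dist(x, y) <= visual]
            if neighbors:
                leader = max(neighbors, key=fitness)
                if fitness(leader) > fx:
                    d = dist(x, leader)
                    new = [a + min(cur_step, d) / d * (b - a) for a, b in zip(x, leader)]
            # M2 (foraging): otherwise probe random points within the field of view.
            if new is None:
                for _ in range(max_tries):
                    cand = clip([a + visual * rng.uniform(-1, 1) for a in x])
                    if fitness(cand) > fx:
                        d = dist(x, cand)
                        new = [a + min(cur_step, d) / d * (b - a) for a, b in zip(x, cand)]
                        break
            # M3: fall back to a random move.
            if new is None:
                new = [a + cur_step * rng.uniform(-1, 1) for a in x]
            fish[i] = clip(new)
            # M4: record the global best solution and its fitness.
            f_new = fitness(fish[i])
            if f_new > best_fit:
                best, best_fit = list(fish[i]), f_new
    return best, best_fit  # M6: output the optimization result
```

A toy call such as `afsa(lambda p: -((p[0] - 3) ** 2 + (p[1] + 1) ** 2), [(-10, 10), (-10, 10)])` searches a two-dimensional space; in the setting described here, each position would instead hold the five hyperparameters and `fitness` would return the accuracy of the corresponding model.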

[0105] In this algorithm, the fitness value is calculated from classification accuracy. Each fish's position represents a set of parameters (i.e., the number of neurons in the first, second, and third convolutional layers, and the number of trees and the maximum depth in the random forest model). To calculate the fitness value, a corresponding second cholangiocarcinoma classification model is first constructed from these parameters and then trained using sample data. To train the second cholangiocarcinoma classification model, this invention collected 172 Raman spectra from 43 benign disease samples and 132 Raman spectra from 33 CCA samples. Each Raman spectrum was processed using the preprocessing described in step S1, and the four Raman spectra corresponding to each bile sample, together with the corresponding label (i.e., benign bile duct disease or cholangiocarcinoma), were used as one training sample. The specific implementation of training the second cholangiocarcinoma classification model, such as the ratio of the training set, test set, and validation set, can be set according to actual needs and is not limited by this invention, as long as a second cholangiocarcinoma classification model capable of classification is obtained. After the model is trained on the training set, it is tested on the test set. If the test accuracy is poor, the sample expansion method described in step S2 can be used to retrain the constructed model. Finally, once the second cholangiocarcinoma classification model corresponding to this set of parameters is obtained, the spectral data of each bile sample in the validation set is used as input and the prediction for each sample is checked: a prediction is correct if it is consistent with the sample's label and incorrect otherwise. The resulting proportion of correct predictions is the accuracy of the second cholangiocarcinoma classification model corresponding to this set of parameters.
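The fitness evaluation described above reduces to two small pieces: turning a fish position into integer hyperparameters, and scoring predictions on the validation set. The sketch below shows both; the hyperparameter names, the rounding rule, and the lower bound of 1 are hypothetical, and model construction and training are deliberately left out.

```python
def decode_position(position):
    """Map a 5-dimensional fish position to integer hyperparameters (names are hypothetical)."""
    names = ("conv1_units", "conv2_units", "conv3_units", "n_trees", "max_depth")
    return {name: max(1, round(value)) for name, value in zip(names, position)}

def accuracy(predictions, labels):
    """Fraction of validation samples whose predicted label matches the true label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

A fitness function for the swarm search would decode the position, build and train the model with those values, predict on the validation set, and return `accuracy(predictions, labels)`.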

[0106] After finding the optimal set of parameters using the fish swarm algorithm, the number of neurons in the first, second, and third convolutional layers of the CNN model is set according to this set of parameters, and the number of trees and maximum depth in the random forest model are set. Then, the established CNN model and random forest model are trained using the training sample set, and the second bile duct cancer classification model is obtained after the training is completed.

[0107] S4: If the first classification result and the second classification result are consistent, output the classification result of the bile to be classified, which includes: benign bile duct disease or bile duct cancer.

[0108] If the first classification result and the second classification result are inconsistent, a reminder notification can be sent indicating that no classification result can be obtained for the current bile and that a new bile sample needs to be obtained for identification and classification.
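The agreement rule of step S4 can be expressed in a few lines. The label strings, function names, and notification wording below are illustrative assumptions rather than part of the specification.

```python
from typing import Optional

BENIGN = "benign bile duct disease"
CCA = "bile duct cancer"

def combine_results(first: str, second: str) -> Optional[str]:
    """Return the shared label if both models agree, otherwise None."""
    return first if first == second else None

def report(first: str, second: str) -> str:
    """Produce the output of step S4, or a reminder to resample when the models disagree."""
    result = combine_results(first, second)
    if result is None:
        return "No classification result: please obtain a new bile sample and repeat the test."
    return f"Classification result: {result}"
```

Requiring agreement trades coverage for reliability: a sample on which the two models disagree yields no result instead of a possibly wrong one.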

[0109] Compared with existing technologies, this embodiment provides a method for detecting cholangiocarcinoma that employs machine learning. Based on Raman spectral data of bile to be classified, it determines the classification result using two classification models to obtain cholangiocarcinoma detection information. Compared with existing methods such as ERCP and gene sequencing, this method further improves the efficiency and accuracy of cholangiocarcinoma detection while reducing detection costs. The training process of the cholangiocarcinoma classification model utilizes an artificial fish swarm algorithm, which improves the training speed of the classification model. Adaptive improvements are made to the foraging, clustering, and tail-chasing behaviors of the artificial fish swarm algorithm, enabling it to find the global optimum faster during optimization and improving solution accuracy. The optimized fish swarm algorithm also better balances global and local searches, avoiding getting trapped in local optima.

[0110] It is worth noting that the bile duct cancer detection method provided by this invention is a process of using artificial intelligence technology to process medical information to obtain intermediate results. In the actual diagnosis process, the detection results obtained by the bile duct cancer detection method proposed by this invention are only an intermediate result. Doctors can refer to this result when making a diagnosis and make a final diagnosis based on other clinical information of the patient.

[0111] Those skilled in the art will understand that all or part of the processes of the methods described in the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium. The computer-readable storage medium may be a disk, optical disk, read-only memory, or random access memory, etc.

[0112] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. A computer program product for cholangiocarcinoma detection, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the following method: obtaining multiple Raman spectra of the bile to be classified, and performing data preprocessing on each Raman spectrum; obtaining a first classification result of the bile to be classified based on the preprocessed multiple Raman spectra and a pre-trained first cholangiocarcinoma classification model; obtaining a second classification result of the bile to be classified based on the preprocessed multiple Raman spectra and a pre-trained second cholangiocarcinoma classification model, wherein the second cholangiocarcinoma classification model includes a feature extraction module, a feature fusion module, and a classification module; and, if the first classification result and the second classification result are consistent, outputting the classification result of the bile to be classified, the classification result being benign bile duct disease or bile duct cancer; wherein obtaining the second classification result of the bile to be classified based on the preprocessed multiple Raman spectra and the pre-trained second cholangiocarcinoma classification model includes: inputting each preprocessed Raman spectrum into the feature extraction module to obtain the feature vector of that Raman spectrum; inputting the multiple feature vectors into the feature fusion module to obtain a feature fusion vector, wherein the feature fusion module adopts feature fusion based on a self-attention mechanism; and inputting the feature fusion vector into the classification module to obtain the second classification result of the bile to be classified.
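For illustration only, self-attention-based fusion of the per-spectrum feature vectors might look like the sketch below. It assumes scaled dot-product attention with identity query, key, and value projections and mean pooling over the attended vectors; the claim does not specify the projections or the pooling, so these are assumptions, as is the name `self_attention_fuse`.

```python
import math
from typing import List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse equal-length feature vectors (one per spectrum) into one vector.

    Assumed design: identity Q/K/V projections, scaled dot-product
    attention, then mean pooling over the attended vectors.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    attended = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)]
        )
    n = len(attended)
    return [sum(row[i] for row in attended) / n for i in range(d)]
```

With four spectra per bile sample, `features` would hold four vectors, and the returned vector would be passed to the classification module.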

2. The computer program product for cholangiocarcinoma detection according to claim 1, characterized in that obtaining the multiple Raman spectra of the bile to be classified includes: with the equipment parameters of the stimulated Raman scattering (T-SRS) device fixed, sampling multiple regions of the bile to be classified to obtain the multiple Raman spectra.

3. The computer program product for cholangiocarcinoma detection according to claim 1, characterized in that the first cholangiocarcinoma classification model is a KPCA-LDA-SVM model; the KPCA-LDA-SVM model includes a kernel principal component analysis (KPCA) module, an LDA classification model, and a support vector machine (SVM) model; and obtaining the first classification result of the bile to be classified based on the preprocessed multiple Raman spectra and the pre-trained first cholangiocarcinoma classification model includes: for each Raman spectrum, extracting the spectral data between 2700 cm⁻¹ and 3100 cm⁻¹ as the target Raman spectrum; for each target Raman spectrum, reducing the dimensionality of the data with the KPCA module and inputting the dimensionality-reduced data into the LDA classification model to extract discriminant features; and inputting all the obtained discriminant features into the SVM model to obtain the first classification result.
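As a minimal sketch, the extraction of the 2700 cm⁻¹ to 3100 cm⁻¹ target spectrum can be written as below. The function name `crop_window` and the inclusive treatment of both endpoints are assumptions; the subsequent KPCA, LDA, and SVM stages are not shown.

```python
from typing import List, Sequence, Tuple


def crop_window(
    wavenumbers: Sequence[float],
    intensities: Sequence[float],
    low: float = 2700.0,
    high: float = 3100.0,
) -> Tuple[List[float], List[float]]:
    """Keep only the points whose wavenumber (cm^-1) lies in [low, high]."""
    if len(wavenumbers) != len(intensities):
        raise ValueError("wavenumbers and intensities must have equal length")
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]
```

Each cropped spectrum would then be reduced by KPCA, passed through LDA to extract discriminant features, and classified by the SVM.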

4. The computer program product for cholangiocarcinoma detection according to claim 1, characterized in that the data preprocessing includes: filtering noise with a Savitzky-Golay filter, eliminating the fluorescence background with a polynomial fitting method, and aligning the spectral shift of the samples with dynamic time warping (DTW).
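To illustrate the dynamic time warping step, the sketch below computes the classic DTW alignment cost between two intensity sequences using an absolute-difference local cost. It shows only the cost computation, not how the claimed method uses the alignment to correct spectral shift; the function name and the choice of local cost are assumptions.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative DTW cost between sequences a and b (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A spectrum and a slightly shifted copy of it yield a low DTW cost even though a point-by-point comparison would report a large difference, which is why DTW suits shift alignment.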

5. A computer program product for cholangiocarcinoma detection according to claim 1, characterized in that, The feature extraction module uses a CNN model, including an input layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, and a pooling layer. The first convolutional layer is a 1D convolution with a kernel width of 3, the second convolutional layer is a 1D convolution with a kernel width of 5, and the third convolutional layer is a 1D convolution with a kernel width of 7. The activation function is LeakyReLU. The classification module uses a random forest model.
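The three stacked 1D convolutions with kernel widths 3, 5, and 7 and the LeakyReLU activation can be illustrated with the sketch below. It assumes stride 1, no padding, and a negative slope of 0.01, none of which is stated in the claim; the function names are illustrative.

```python
from typing import List, Sequence


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """LeakyReLU: identity for x >= 0, small linear slope for x < 0."""
    return x if x >= 0 else negative_slope * x


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D sliding-window product sum (cross-correlation), stride 1, no padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def stacked_output_length(n_points: int, kernel_widths=(3, 5, 7)) -> int:
    """Length of the signal after the three unpadded convolutions."""
    length = n_points
    for k in kernel_widths:
        length -= k - 1
    return length
```

Under these assumptions each layer shortens the spectrum by its kernel width minus one, so increasing widths let later layers respond to progressively broader spectral features.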

6. The computer program product for cholangiocarcinoma detection according to claim 5, characterized in that the number of neurons in the first, second, and third convolutional layers of the second cholangiocarcinoma classification model, as well as the number of trees and the maximum depth of the random forest model, are obtained based on the artificial fish swarm algorithm.

7. The computer program product for cholangiocarcinoma detection according to claim 6, characterized in that the artificial fish swarm algorithm is executed in the following manner:
S1: Initialize the parameters of the artificial fish swarm algorithm, including the initial fish swarm size, maximum number of iterations, field of view, crowding factor, step size, and maximum number of trials; each fish represents the number of neurons in the first, second, and third convolutional layers, and the number of trees and the maximum depth of the random forest model;
S2: Simulate artificial fish swarm behavior, including:
in the foraging behavior, explore with an adaptive step size: randomly generate a new position within the field of view and, if the fitness value at the new position is greater than the fitness value at the current position, move one step toward the new position; the updated adaptive step size is determined by the movement step size at initialization, the decay rate, the current iteration number t, and the maximum iteration number T;
in the swarming behavior, for each fish, determine the lowest fitness value among all neighboring fish within the field of view of the current i-th fish; calculate the normalized weight of the current i-th fish from its fitness value and that lowest fitness value; and calculate the weighted center from the normalized weight of each fish and its corresponding position vector, where n is the number of neighboring fish within the field of view, j = 1, 2, 3, …, n, N is the initial fish swarm size, and i = 1, 2, 3, …, N; when the crowding condition, evaluated with the dynamically updated crowding factor, is satisfied, move the position based on the weighted center;
in the tail-chasing behavior, for each fish, determine the highest fitness value among all neighboring fish within the field of view of the current i-th fish; calculate the normalized fitness difference from the fitness difference between the current i-th fish and the target fish; calculate the tail-chasing weight from the normalized fitness difference; and update the position based on the tail-chasing weight;
S3: If none of the foraging, swarming, or tail-chasing behaviors is performed, perform a random behavior;
S4: Record the current global optimal solution and optimal fitness, where the global optimal solution is the fish at the position corresponding to the optimal fitness;
S5: Iterative optimization: repeat steps S2 to S4 until the maximum number of iterations is reached;
S6: Output the optimization result and use it as the number of neurons in the first, second, and third convolutional layers, as well as the number of trees and the maximum depth of the random forest model.
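The formulas of this claim are not legible in the present text, so the sketch below uses assumed forms purely to illustrate the quantities involved: an exponential decay for the adaptive step size, and normalized weights proportional to each neighbor's fitness above the lowest neighboring fitness. Both forms, and the names `adaptive_step` and `weighted_center`, are assumptions and may differ from the claimed formulas.

```python
import math
from typing import List, Sequence, Tuple


def adaptive_step(step0: float, decay: float, t: int, t_max: int) -> float:
    """Step size at iteration t (ASSUMED form: step0 * exp(-decay * t / t_max))."""
    return step0 * math.exp(-decay * t / t_max)


def weighted_center(
    positions: Sequence[Sequence[float]], fitnesses: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Normalized weights and weighted center of the neighboring fish.

    ASSUMED weighting: w_j = (f_j - f_min) / sum_k (f_k - f_min), falling
    back to uniform weights when all neighbors have equal fitness.
    """
    f_min = min(fitnesses)
    raw = [f - f_min for f in fitnesses]
    total = sum(raw)
    n = len(positions)
    weights = [1.0 / n] * n if total == 0 else [r / total for r in raw]
    dim = len(positions[0])
    center = [
        sum(w * p[d] for w, p in zip(weights, positions)) for d in range(dim)
    ]
    return weights, center
```

Under these assumptions the step size shrinks as iterations progress, favoring broad exploration early and fine adjustment late, while the weighted center is pulled toward the fitter neighbors rather than the plain geometric mean.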

8. The computer program product for cholangiocarcinoma detection according to claim 7, characterized in that: moving the position based on the weighted center includes computing the updated position from the current position and the weighted center, with the crowding factor at initialization as a parameter; updating the position based on the tail-chasing weight includes computing the updated position from the current position, the tail-chasing weight, and the position of the neighboring fish corresponding to the highest fitness value; and the training samples for the second cholangiocarcinoma classification model consist of the multiple Raman spectra of each bile sample and the corresponding label, which indicates benign bile duct disease or bile duct cancer.
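For illustration, a tail-chasing position update could take the form sketched below. The update formula is not legible in the present text, so both the normalized fitness difference (relative to the best neighbor's fitness, clamped to the range 0 to 1) and the linear move toward the best neighbor are assumptions, as is the name `tail_chase_update`.

```python
from typing import List, Sequence, Tuple


def tail_chase_update(
    position: Sequence[float],
    fitness_value: float,
    best_position: Sequence[float],
    best_fitness: float,
    eps: float = 1e-12,
) -> Tuple[float, List[float]]:
    """Move toward the best neighbor in proportion to the fitness gap.

    ASSUMED forms:
      weight = clamp((best_fitness - fitness_value) / |best_fitness|, 0, 1)
      new_position = position + weight * (best_position - position)
    """
    diff = (best_fitness - fitness_value) / (abs(best_fitness) + eps)
    weight = min(max(diff, 0.0), 1.0)
    new_position = [x + weight * (b - x) for x, b in zip(position, best_position)]
    return weight, new_position
```

With this form a fish that is already as fit as its best neighbor stays in place, and a much less fit fish moves most of the way toward that neighbor.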

9. The computer program product for cholangiocarcinoma detection according to claim 8, characterized in that, if the number of training samples is insufficient, the training samples can be expanded by any of the following methods: randomly shifting at least one Raman spectrum of an existing bile sample, inputting the shifted Raman spectrum into a pre-trained deep convolutional generative adversarial network (DCGAN) to generate a new Raman spectrum, and replacing the original Raman spectrum with the new Raman spectrum to form a new set of Raman spectra corresponding to that bile sample; or adding noise to at least one Raman spectrum of an existing bile sample, inputting the noise-added Raman spectrum into the pre-trained DCGAN to generate a new Raman spectrum, and replacing the original Raman spectrum with the new Raman spectrum to form a new set of Raman spectra corresponding to that bile sample; or linearly combining at least two Raman spectra of an existing bile sample, inputting the combined Raman spectrum into the pre-trained DCGAN to generate a new Raman spectrum, and replacing the original Raman spectra with the new Raman spectrum to form a new set of Raman spectra corresponding to that bile sample.
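The three perturbations that precede the DCGAN in this claim (random shift, added noise, linear combination) can be sketched as below. The DCGAN generation step is not shown. Edge padding for the shift, Gaussian noise, and a convex combination with weight `alpha` are assumptions chosen for the example, as are the function names.

```python
import random
from typing import List, Sequence


def random_shift(spectrum: Sequence[float], max_shift: int, rng: random.Random) -> List[float]:
    """Shift by a random number of points, padding with the edge value (assumed)."""
    n = len(spectrum)
    if not 0 <= max_shift < n:
        raise ValueError("max_shift must be in [0, len(spectrum))")
    k = rng.randint(-max_shift, max_shift)
    if k > 0:  # shift right
        return [spectrum[0]] * k + list(spectrum[: n - k])
    if k < 0:  # shift left
        return list(spectrum[-k:]) + [spectrum[-1]] * (-k)
    return list(spectrum)


def add_noise(spectrum: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma (assumed noise model)."""
    return [y + rng.gauss(0.0, sigma) for y in spectrum]


def linear_combination(a: Sequence[float], b: Sequence[float], alpha: float) -> List[float]:
    """Combine two spectra point by point as alpha * a + (1 - alpha) * b."""
    if len(a) != len(b):
        raise ValueError("spectra must have equal length")
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
```

In the claimed method each perturbed spectrum would be passed through the pre-trained DCGAN, and the generated spectrum would replace the original one in that bile sample's set of spectra.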