A method for caries fine-grained classification fusing attention mechanism and key features
This fine-grained caries classification method integrates attention mechanisms and key features to address the difficulty of timely diagnosis and severity assessment in traditional caries examination, enabling automated and rapid diagnosis of caries severity.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DALIAN UNIV
- Filing Date
- 2023-01-13
- Publication Date
- 2026-06-19
Figure CN116012343B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical artificial intelligence technology, specifically to a fine-grained classification method for dental caries that integrates attention mechanisms and key features.
Background Technology
[0002] Dental caries is the most common oral disease in humans and significantly impairs quality of life. Data from the 2011-2012 U.S. National Health and Nutrition Examination Survey show that 37% of children aged 2-8 have caries in their primary teeth, that the prevalence is 58% among adolescents aged 12-19, and that approximately 90% of adults aged 20 and over have caries to varying degrees.
[0003] Traditional methods for examining and diagnosing dental caries include visual inspection, probing, temperature testing, and X-ray examination, all of which rely on professional dentists. This dependence limits the timeliness of caries examination and diagnosis while increasing the workload of dentists and the financial burden on patients. The rapid development of smart devices has made color digital images much easier to acquire. Caries diagnosis methods based on color digital images can therefore provide dentists with a clinical reference, help prevent the onset and further development of caries, and are of great significance for improving quality of life.
[0004] Using digital images of the oral cavity acquired with consumer-grade cameras and mobile phones, dentists can visually identify and diagnose dental caries according to the International Caries Detection and Assessment System (ICDAS). In the field of image classification, deep learning models are trained on a large number of acquired and diagnosed images, enabling computers to diagnose caries types automatically and support caries prevention and screening.
[0005] From the perspective of patient treatment, current caries diagnosis methods based on color digital images only determine whether a tooth is decayed, without assessing the extent of the decay in more detail, and so fail to offer the most direct support for the dentist's decision-making. From the perspective of caries diagnosis, different types of caries differ only subtly in color, location, size, and shape; even for experienced dentists, diagnosing the extent of caries is time-consuming and difficult.
Summary of the Invention
[0006] The purpose of this invention is to provide a fine-grained classification method for dental caries that integrates attention mechanisms and key features, which improves the speed and convenience of dental caries detection and provides technical support for the development and application of smart devices.
[0007] To achieve the above objectives, the technical solution of this application is: a fine-grained classification method for dental caries that integrates attention mechanisms and key features, comprising:
[0008] Step 1, Data Sample Collection: Professional dentists collect oral image data of patients visiting the dental clinic using mobile phones or consumer-grade cameras;
[0009] Step 2, Data Annotation: Professional dentists annotate the collected oral image data, using labelme software to select and annotate the areas of caries lesions, with each annotation box containing only one carious tooth; at the same time, the severity of caries lesions is classified according to the Merged Codes standard in ICDAS, that is, divided into three categories according to the severity: mild caries, moderate caries, and severe caries.
[0010] Step 3: Data preprocessing: Check the labeling of the data samples, delete images with incorrect labeling and low quality, and create a dental caries classification dataset.
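The annotation and dataset-building steps above can be sketched as follows. This is only an illustration, assuming annotations are stored in labelme's JSON export with rectangle shapes; the label strings `mild`, `moderate`, and `severe` and the file name are hypothetical, since the source names the three severity classes but not how they are encoded.

```python
import json

# Hypothetical label strings for the three severity classes.
SEVERITY_CLASSES = ("mild", "moderate", "severe")


def boxes_from_labelme(annotation: dict) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Return (label, (left, upper, right, lower)) for each rectangle annotation."""
    boxes = []
    for shape in annotation.get("shapes", []):
        if shape.get("shape_type") != "rectangle":
            continue  # only rectangular boxes are considered in this sketch
        label = shape["label"]
        if label not in SEVERITY_CLASSES:
            raise ValueError(f"unexpected label: {label!r}")
        (x1, y1), (x2, y2) = shape["points"]
        # The two corners may be stored in any order, so sort each axis.
        box = (int(min(x1, x2)), int(min(y1, y2)), int(max(x1, x2)), int(max(y1, y2)))
        boxes.append((label, box))
    return boxes


example = json.loads("""
{"imagePath": "mouth_001.jpg",
 "shapes": [{"label": "moderate", "shape_type": "rectangle",
             "points": [[120.5, 80.0], [60.0, 150.9]]}]}
""")
print(boxes_from_labelme(example))  # [('moderate', (60, 80, 120, 150))]
```

Each box is in (left, upper, right, lower) order, which is the form accepted by PIL's `Image.crop`, so one cropped image per carious tooth can be saved with its severity label.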
[0011] Furthermore, step 3 specifically includes:
[0012] Step 3.1: Manually clean the oral images by comparing them with the labeled examples and removing unreasonable labels or low-quality images, keeping only the oral images with clear and correct labels and high image quality.
[0013] Step 3.2: Flip each retained oral image with a probability of 50%; when an image is flipped, the direction is horizontal or vertical with a probability of 50% each. Every oral image then undergoes a color adjustment (probability 100%), in which brightness, contrast, or saturation is randomly changed with a probability of 33.3% each.
[0014] Step 3.3: Add noise with a probability of 30%; when noise is added, Gaussian noise, pepper noise, or salt noise is used with a probability of 33.3% each. After all operations are completed, both the processed oral image and the original oral image are retained. The caries images are then cropped out using PIL and saved as the caries classification dataset.
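The probabilities in Steps 3.2 and 3.3 can be made concrete with a small sampler. The sketch below only decides which operations to apply to one image; it assumes that the flip direction, the color property, and the noise type are each a single choice among the listed options, and the operation names are hypothetical placeholders for the actual image transforms.

```python
import random

FLIP_OPS = ("horizontal_flip", "vertical_flip")
COLOR_OPS = ("brightness", "contrast", "saturation")
NOISE_OPS = ("gaussian_noise", "pepper_noise", "salt_noise")


def sample_augmentation_plan(rng: random.Random) -> list[str]:
    """Sample the list of operations applied to one retained oral image."""
    plan = []
    # Step 3.2: flip with probability 0.5, direction chosen uniformly.
    if rng.random() < 0.5:
        plan.append(rng.choice(FLIP_OPS))
    # Step 3.2: always adjust one color property, chosen uniformly.
    plan.append(rng.choice(COLOR_OPS))
    # Step 3.3: add noise with probability 0.3, type chosen uniformly.
    if rng.random() < 0.3:
        plan.append(rng.choice(NOISE_OPS))
    return plan


rng = random.Random(0)
for _ in range(3):
    print(sample_augmentation_plan(rng))
```

Because the original image is kept alongside the processed one, each retained image contributes two samples before cropping.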
[0015] Furthermore, this application also includes step 4, model training: feeding the caries images and labeled data into a fine-grained color digital image classification model for caries that integrates attention mechanisms and key features for training.
[0016] Furthermore, step 4 specifically includes:
[0017] Step 4.1: Use a multi-spectral attention mechanism, which addresses the insufficient feature information used by existing channel attention methods by introducing more frequency components so that feature information is fully utilized. Given a local feature $X \in \mathbb{R}^{C \times H \times W}$ of a caries image after successive convolution operations ($C$ is the number of channels, $H$ the height, and $W$ the width of the caries image), divide it into $n$ equal parts to obtain $[X^0, X^1, \ldots, X^{n-1}]$, where $X^i \in \mathbb{R}^{C' \times H \times W}$, $i \in \{0, 1, \ldots, n-1\}$, and $C' = C/n$.
[0018] First, obtain the channel attention value for each equally divided feature map:
[0019] $$\mathrm{Freq}^i = \mathrm{2DDCT}^{u_i, v_i}(X^i)$$
[0020] $$\mathrm{Freq} = \mathrm{cat}([\mathrm{Freq}^0, \mathrm{Freq}^1, \ldots, \mathrm{Freq}^{n-1}])$$
[0021] where 2DDCT denotes the two-dimensional discrete cosine transform, $\mathrm{Freq}^i$ is the channel attention value of the $i$-th equally divided feature map, $[u_i, v_i]$ is the two-dimensional frequency component index corresponding to part $X^i$, and $\mathrm{Freq}$ is the resulting multi-spectral attention vector. The attention coefficient $\mathrm{ms\_att}$ is expressed as:
[0022] $$\mathrm{ms\_att} = \mathrm{sigmoid}(\mathrm{fc}(\mathrm{Freq}))$$
[0023] where fc denotes a fully connected function;
[0024] Finally, after the attention vector for all channels is obtained, each channel of the local feature $X$ of the caries image is scaled by multiplying it by its attention coefficient:
[0025] $$\tilde{X}_{c,:,:} = \mathrm{ms\_att}_c \, X_{c,:,:}, \quad c \in \{0, 1, \ldots, C-1\}$$
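Step 4.1 can be illustrated with a small pure-Python sketch. It makes several assumptions not fixed by the source: the DCT basis is the unnormalized cosine product, fc is a single square weight matrix (`fc_weight`, a hypothetical parameter), and the result is passed through a sigmoid before scaling. It is meant to show the data flow, not the trained model.

```python
import math


def dct_basis(u, v, height, width):
    """Unnormalized 2D DCT basis for frequency index (u, v)."""
    return [[math.cos(math.pi * u * (h + 0.5) / height)
             * math.cos(math.pi * v * (w + 0.5) / width)
             for w in range(width)] for h in range(height)]


def multispectral_attention(x, freq_indices, fc_weight):
    """x: C x H x W nested lists; freq_indices: n (u, v) pairs; fc_weight: C x C.

    Returns (attention coefficients per channel, scaled feature)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    n = len(freq_indices)
    assert channels % n == 0, "channels must split into n equal parts"
    per_part = channels // n
    freq = []
    for c in range(channels):
        u, v = freq_indices[c // per_part]  # frequency assigned to this part
        basis = dct_basis(u, v, height, width)
        freq.append(sum(x[c][h][w] * basis[h][w]
                        for h in range(height) for w in range(width)))
    # Assumed form of fc: one linear map, followed by a sigmoid.
    logits = [sum(fc_weight[i][j] * freq[j] for j in range(channels))
              for i in range(channels)]
    att = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    scaled = [[[att[c] * x[c][h][w] for w in range(width)]
               for h in range(height)] for c in range(channels)]
    return att, scaled


ones = [[1.0, 1.0], [1.0, 1.0]]
att, scaled = multispectral_attention(
    [ones, ones], [(0, 0), (0, 1)], [[1.0, 0.0], [0.0, 1.0]])
print(att)
```

In the example, the first channel uses the lowest frequency (0, 0), which reduces to a plain sum over the map, while the second uses (0, 1) and responds only to horizontal variation, so a constant map yields a coefficient of about 0.5.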
[0026] Step 4.2: Use a positional attention mechanism to establish associations between key features to explore global contextual information and capture feature dependencies in the spatial dimension, learning the spatial relevance of features. Specifically, for features at a certain location in a caries image, update the feature by weighted summation of features from all locations, where the weights are determined by the feature similarity between two corresponding locations;
[0027] Obtain the caries feature $F \in \mathbb{R}^{C \times H \times W}$ from the final output of the backbone network. First, $F$ is fed into two convolutional layers with 1×1 kernels and reshaped, yielding matrices $Q$ and $K$ respectively, where $\{Q, K\} \in \mathbb{R}^{C \times N}$ and $N = H \times W$ is the number of pixels;
[0028] Generate a spatial attention matrix whose entries $s_{ji}$ model the spatial relationship between any two pixels:
[0029] $$s_{ji} = \frac{\exp(Q_i \cdot K_j)}{\sum_{i=1}^{N} \exp(Q_i \cdot K_j)}$$
[0030] Then, $F$ is fed into another convolutional layer with a 1×1 kernel to generate a feature map $V \in \mathbb{R}^{C \times H \times W}$, which is reshaped to $\mathbb{R}^{C \times N}$. Matrix multiplication is then performed between the attention matrix and the features, and the resulting weighted features are added to the original features:
[0031] $$E_j = \alpha \sum_{i=1}^{N} (s_{ji} V_i) + F_j$$
[0032] $E$ has a global contextual view and selectively aggregates context according to the spatial attention map, making key fine-grained features more compact and clustered; $\alpha$ is a learnable parameter that is initialized to 0;
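The weighted summation in Step 4.2 can be sketched in pure Python as below. The sketch assumes the 1×1 convolutions and reshaping have already produced the query, key, and value matrices (named `q`, `k`, `v` here for illustration) with one column per pixel, and that the similarity weights are a softmax over dot products.

```python
import math


def position_attention(q, k, v, f, alpha):
    """q, k, v, f: C x N nested lists (columns are pixels). Returns E as C x N."""
    n = len(f[0])
    out = [row[:] for row in f]
    for j in range(n):
        # Similarity of every position i to position j.
        scores = [sum(q[c][i] * k[c][j] for c in range(len(q))) for i in range(n)]
        m = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        for c in range(len(v)):
            aggregated = sum(weights[i] * v[c][i] for i in range(n))
            out[c][j] = alpha * aggregated + f[c][j]
    return out


q = [[0.0, 0.0, 0.0]]
k = [[0.0, 0.0, 0.0]]
v = [[1.0, 2.0, 3.0]]
f = [[10.0, 20.0, 30.0]]
print(position_attention(q, k, v, f, alpha=0.5))
```

With all-zero queries and keys every position is equally similar, so each output adds half of the mean value (2.0) to the original feature. With `alpha` at 0 the output equals the input feature, which is why a zero initial value lets the network start from the unmodified backbone features.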
[0033] Step 4.3: Select key feature points using a weakly supervised approach. Each feature point passes through a fully connected layer to obtain a classification probability. When the probability is higher than a certain value, these feature points are considered helpful features, retained, and applied to the later feature fusion. Feature points that are not selected are considered to have no contribution to the fine-grained classification task and are selectively filtered out.
[0034] First, define $Z_i \in \mathbb{R}^{C \times H \times W}$ as the caries feature map output by the $i$-th block of the CNN backbone network. The caries features are first mapped to the global space through a 1×1 convolution, and then each feature point is classified:
[0035] $$L(Z_i) = \mathrm{MLP}(\mathrm{Conv}(Z_i))$$
[0036] where Conv denotes the 1×1 convolution and MLP denotes a multilayer perceptron; after the transformation, $L(Z_i) \in \mathbb{R}^{t \times H \times W}$, where $t$ is the number of categories in fine-grained classification;
[0037] After the classification result for each feature point is obtained, the probability distribution of each feature point over the fine-grained classification targets is also needed. The feature map is first flattened according to the following formula:
[0038] $$\hat{L}(Z_i) = \mathrm{flatten}(L(Z_i))$$
[0039] where $\hat{L}(Z_i) \in \mathbb{R}^{t \times N}$ and $N = H \times W$.
[0040] Then, the softmax function is applied to obtain the classification probability values:
[0041] $$S(Z_i) = \mathrm{softmax}(\hat{L}(Z_i))$$
[0042] $S(Z_i)$ is the probability distribution of each feature point in the $i$-th block over the fine-grained classification targets. After the probability distribution is obtained, the category with the highest classification probability is selected for each feature point, and all feature points are sorted by this probability. Since features at different levels contribute differently to the final classification, different numbers of feature points are selected from features at different levels.
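The selection rule in Step 4.3 can be sketched as follows, assuming the per-point class scores are already available as a t × S matrix of logits and that a fixed number of points (`keep`, a hypothetical per-level setting) is retained from each level.

```python
import math


def select_key_points(logits, keep):
    """logits: t x S nested lists (class scores per feature point).

    Returns (indices of the `keep` most confident points, confidence per point)."""
    t, s = len(logits), len(logits[0])
    confidences = []
    for p in range(s):
        column = [logits[c][p] for c in range(t)]
        m = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(z - m) for z in column]
        # Highest class probability of this feature point.
        confidences.append(max(exps) / sum(exps))
    ranked = sorted(range(s), key=lambda p: confidences[p], reverse=True)
    return ranked[:keep], confidences


logits = [[5.0, 0.0, 1.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 3.0]]
selected, confidences = select_key_points(logits, keep=2)
print(selected)  # [0, 2]
```

The middle point has identical scores for all three classes, so its top probability is 1/3 and it is the one filtered out. A different `keep` value per backbone block reflects the statement that different levels contribute different numbers of feature points.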
[0043] Furthermore, step 4 also includes:
[0044] Step 4.4: Use a graph convolutional neural network to aggregate key feature points, obtain feature points with global discriminative power, and apply them for classification. All selected feature points are considered as a graph structure, with each selected feature point represented as a node in the graph. These nodes represent information about the feature points at different spatial locations and scales.
[0045] The goal of the GCN is to learn a function $h(\cdot, \cdot)$ on a graph $G$; each GCN layer is expressed as:
[0046] $$E^{l+1} = h(E^l, A)$$
[0047] where $E^l \in \mathbb{R}^{n \times d}$ is the current node representation, $E^{l+1} \in \mathbb{R}^{n \times d'}$ is the updated node representation, and $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix input to the GCN;
[0048] With all feature points represented as a graph $G$, after the GCN is applied to $G$, the function $h(\cdot, \cdot)$ is expressed as:
[0049] $$h(E^l, A) = \delta(\hat{A} E^l W^l)$$
[0050] where $E^l$ is the node representation at the $l$-th graph convolutional layer, $E^{l+1}$ is the node representation updated at layer $l+1$, and $A$ is the adjacency matrix input to the GCN; $E \in \mathbb{R}^{N \times M \times t}$, where $N$ is the total number of key feature points selected from feature maps at different levels, $M$ is the total number of global graph convolutional feature points used for fusion, and $t$ is the total number of categories for fine-grained classification; $\delta(\cdot)$ is a non-linear activation function, $W^l$ is a learnable transformation matrix, and $\hat{A}$ is obtained by batch normalization of the adjacency matrix $A$.
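A single graph convolution of the kind described in Step 4.4 can be sketched in pure Python. Two choices here are assumptions rather than the source's method: the adjacency matrix is row-normalized (used in place of the batch normalization mentioned in the text, to keep the example self-contained), and ReLU is used as the non-linear activation.

```python
def gcn_layer(nodes, adjacency, weight):
    """nodes: n x d, adjacency: n x n, weight: d x d_out.

    Returns ReLU(A_hat @ nodes @ weight) as an n x d_out nested list."""
    n, d = len(nodes), len(nodes[0])
    # Assumption: row-normalize A so each node averages over its neighbours.
    a_hat = []
    for row in adjacency:
        total = sum(row)
        a_hat.append([a / total if total else 0.0 for a in row])
    # Aggregate neighbour features, then apply the learnable projection.
    aggregated = [[sum(a_hat[i][k] * nodes[k][j] for k in range(n))
                   for j in range(d)] for i in range(n)]
    projected = [[sum(aggregated[i][k] * weight[k][j] for k in range(d))
                  for j in range(len(weight[0]))] for i in range(n)]
    return [[max(0.0, value) for value in row] for row in projected]


nodes = [[1.0, 0.0], [0.0, 1.0]]
adjacency = [[1.0, 1.0], [0.0, 1.0]]
weight = [[1.0, -1.0], [2.0, 0.0]]
print(gcn_layer(nodes, adjacency, weight))
```

Here the first node mixes its own features with those of the second node, while the second node only sees itself. Stacking such layers lets each selected feature point absorb information from points at other spatial locations and scales before classification.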
[0051] Furthermore, this application also includes:
[0052] Step 5, Model Iteration and Optimization: By analyzing the training loss function curve and accuracy curve, the network structure is adjusted and the training parameters are modified. The final training strategy is: batch size of 8, learning rate of 0.001, learning rate decay of 0.005, and number of iterations of 200.
[0053] Step 6: Model prediction: Select the best performing model and load it. Input the image of the caries to be diagnosed into the model to diagnose the degree of caries lesion.
[0054] Compared with the prior art, the above technical solution adopted in this invention has the following advantages: This invention first uses a mobile portable device to capture oral cavity images, then marks the location of the decayed teeth, and uses the decayed tooth images as input to perform caries grading diagnosis using a saved model, achieving good recognition accuracy, improving the speed and convenience of caries detection, and providing technical support for the development and application of intelligent devices.
Attached Figure Description
[0055] Figure 1 is a framework diagram of the fine-grained caries classification method that integrates attention mechanisms and key features;
[0056] Figure 2 is a schematic diagram of the weakly supervised feature point selection process;
[0057] Figure 3 is a schematic diagram of the process of aggregating feature points at different levels.
Detailed Implementation
[0058] The embodiments of the present invention are implemented under the premise of the technical solution of the present invention, and detailed implementation methods and specific operation processes are given. However, the protection scope of the present invention is not limited to the following embodiments.
[0059] The present invention will be described in detail below with reference to the embodiments and accompanying drawings, so that those skilled in the art can implement it after referring to this specification.
[0060] Example 1
[0061] This embodiment uses PyCharm as the development platform and Python as the development language. It employs the fine-grained caries classification method of this invention, which integrates attention mechanisms and key features, to detect and identify caries. The specific process is as follows:
[0062] Oral images are captured with mobile devices such as smartphones and cameras. Caries images are then annotated and cropped from these oral images and fed as input to the loaded model of this method, which outputs the caries grading diagnosis for each image. The model is evaluated overall using accuracy (ACC), precision (PR), recall (RE), and F1-score (F1), defined as follows:
[0063] $$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
[0064] $$\mathrm{PR} = \frac{TP}{TP + FP}$$
[0065] $$\mathrm{RE} = \frac{TP}{TP + FN}$$
[0066] $$\mathrm{F1} = \frac{2 \times \mathrm{PR} \times \mathrm{RE}}{\mathrm{PR} + \mathrm{RE}}$$
[0067] where TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, TN is the number of negative samples correctly predicted as negative, and FN is the number of positive samples incorrectly predicted as negative.
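The four metrics can be computed directly from the confusion counts, as in the short sketch below. For the three severity classes, the counts would presumably be taken per class in a one-vs-rest fashion; that treatment and the example counts are assumptions, not results from the source.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    pr = tp / (tp + fp)
    re = tp / (tp + fn)
    f1 = 2 * pr * re / (pr + re)
    return {"ACC": acc, "PR": pr, "RE": re, "F1": f1}


# Hypothetical counts for one class treated as the positive class.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
print(metrics)
```

The sketch does not guard against empty denominators, which occur when a class is never predicted or never present; a production evaluation would need to define the metric value for those cases.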
[0068] The foregoing description of specific exemplary embodiments of the invention is for illustrative and explanatory purposes. These descriptions are not intended to limit the invention to the precise forms disclosed, and it will be apparent that many changes and variations can be made in accordance with the foregoing teachings. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application, thereby enabling those skilled in the art to implement and utilize various different exemplary embodiments of the invention, as well as various different choices and variations. The scope of the invention is intended to be defined by the claims and their equivalents.
Claims
1. A method for fine-grained caries classification fusing an attention mechanism and key features, characterized in that it includes:
Step 1, Data Sample Collection: Collect oral image data of patients visiting the dental department;
Step 2, Data Annotation: Professional dentists annotate the collected oral image data, using labelme software to select and annotate the areas of caries lesions, with each annotation box containing only one carious tooth; at the same time, the severity of caries lesions is classified according to the Merged Codes standard in ICDAS, that is, divided into three categories according to severity: mild caries, moderate caries, and severe caries;
Step 3, Data Preprocessing: Check the labeling of the data samples, delete images with incorrect labeling and low quality, and create a dental caries classification dataset;
Step 4, Model Training: Feed the caries images and labeled data into a fine-grained color digital image classification model for caries that integrates attention mechanisms and key features for training;
Step 4 specifically includes:
Step 4.1: Given a local feature $X \in \mathbb{R}^{C \times H \times W}$ of a caries image after successive convolution operations, where $C$ is the number of channels, $H$ the height, and $W$ the width of the caries image, divide it into $n$ equal parts to obtain $[X^0, X^1, \ldots, X^{n-1}]$, where $X^i \in \mathbb{R}^{C' \times H \times W}$, $i \in \{0, 1, \ldots, n-1\}$, and $C' = C/n$; First, obtain the channel attention value for each equally divided feature map:
$$\mathrm{Freq}^i = \mathrm{2DDCT}^{u_i, v_i}(X^i) \tag{1}$$
$$\mathrm{Freq} = \mathrm{cat}([\mathrm{Freq}^0, \mathrm{Freq}^1, \ldots, \mathrm{Freq}^{n-1}]) \tag{2}$$
where 2DDCT denotes the two-dimensional discrete cosine transform, $\mathrm{Freq}^i$ is the channel attention value of the $i$-th equally divided feature map, $[u_i, v_i]$ is the two-dimensional component index corresponding to part $X^i$, and $\mathrm{Freq}$ is the obtained multi-spectral attention vector; the attention coefficient $\mathrm{ms\_att}$ is expressed as:
$$\mathrm{ms\_att} = \mathrm{sigmoid}(\mathrm{fc}(\mathrm{Freq})) \tag{3}$$
where fc denotes a fully connected function; Finally, after the attention vector for all channels is obtained, each channel of the local feature $X$ of the caries image is multiplied by its attention coefficient for scaling:
$$\tilde{X}_{c,:,:} = \mathrm{ms\_att}_c \, X_{c,:,:}, \quad c \in \{0, 1, \ldots, C-1\} \tag{4}$$
Step 4.2: Obtain the caries feature $F \in \mathbb{R}^{C \times H \times W}$ from the final output of the backbone network. First, $F$ is fed into two convolutional layers with 1×1 kernels and reshaped, yielding matrices $Q$ and $K$ respectively, where $\{Q, K\} \in \mathbb{R}^{C \times N}$ and $N = H \times W$ is the number of pixels; Generate a spatial attention matrix whose entries $s_{ji}$ model the spatial relationship between any two pixels:
$$s_{ji} = \frac{\exp(Q_i \cdot K_j)}{\sum_{i=1}^{N} \exp(Q_i \cdot K_j)} \tag{5}$$
Then, $F$ is fed into another convolutional layer with a 1×1 kernel to generate a feature map $V \in \mathbb{R}^{C \times H \times W}$, which is reshaped to $\mathbb{R}^{C \times N}$; matrix multiplication is then performed between the attention matrix and the features, and the resulting weighted features are added to the original features:
$$E_j = \alpha \sum_{i=1}^{N} (s_{ji} V_i) + F_j \tag{6}$$
$E$ has a global contextual view and selectively aggregates context according to the spatial attention map, making key fine-grained features more compact and clustered; $\alpha$ is a parameter that is pre-set to 0 and can be learned;
Step 4.3: First, define $Z_i \in \mathbb{R}^{C \times H \times W}$ as the caries feature map output by the $i$-th block of the CNN backbone network; the caries features are first mapped to the global space through a 1×1 convolution, and then each feature point is classified:
$$L(Z_i) = \mathrm{MLP}(\mathrm{Conv}(Z_i)) \tag{7}$$
where Conv denotes the 1×1 convolution and MLP denotes a multilayer perceptron; after the transformation, $L(Z_i) \in \mathbb{R}^{t \times H \times W}$, where $t$ is the number of categories in fine-grained classification; After the classification result for each feature point is obtained, the probability distribution of each feature point over the fine-grained classification targets is also needed; according to the following formula (8), the feature map is first flattened:
$$\hat{L}(Z_i) = \mathrm{flatten}(L(Z_i)) \tag{8}$$
where $\hat{L}(Z_i) \in \mathbb{R}^{t \times N}$ and $N = H \times W$; Then, the softmax function is applied to obtain the classification probability values:
$$S(Z_i) = \mathrm{softmax}(\hat{L}(Z_i)) \tag{9}$$
$S(Z_i)$ is the probability distribution of each feature point in the $i$-th block over the fine-grained classification targets; after the probability distribution is obtained, select the category with the highest classification probability for each feature point, and sort all feature points according to that probability; select different numbers of feature points from features at different levels.
2. The method according to claim 1, wherein Step 3 specifically includes:
Step 3.1: Manually clean the oral images by comparing them with the labeled examples and removing unreasonable labels or low-quality images, keeping only the oral images with clear and correct labels and high image quality;
Step 3.2: Flip each retained oral image with a probability of 50%; when an image is flipped, the direction is horizontal or vertical with a probability of 50% each. Every oral image then undergoes a color adjustment (probability 100%), in which brightness, contrast, or saturation is randomly changed with a probability of 33.3% each;
Step 3.3: Add noise with a probability of 30%; when noise is added, Gaussian noise, pepper noise, or salt noise is used with a probability of 33.3% each. Then, retain both the processed oral image and the original oral image, and use PIL to crop out the caries images and save them as the caries classification dataset.
3. The method according to claim 1, wherein Step 4 also includes:
Step 4.4: Treat all selected feature points as a graph structure. Each selected feature point is represented as a node in the graph. These nodes represent information about the feature points at different spatial locations and scales. The GCN aims to learn a function $h(\cdot, \cdot)$ on a graph $G$; each layer of the GCN is expressed as:
$$E^{l+1} = h(E^l, A) \tag{10}$$
where $E^l \in \mathbb{R}^{n \times d}$ is the current node representation, $E^{l+1} \in \mathbb{R}^{n \times d'}$ is the updated node representation, and $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix input to the GCN; With all feature points represented as a graph $G$, after the GCN is applied to $G$, the function $h(\cdot, \cdot)$ is expressed as:
$$h(E^l, A) = \delta(\hat{A} E^l W^l) \tag{11}$$
where $E^l$ is the node representation at the $l$-th graph convolutional layer and $E^{l+1}$ is the node representation updated at layer $l+1$; $E \in \mathbb{R}^{N \times M \times t}$, where $N$ is the total number of key feature points selected from feature maps at different levels, $M$ is the total number of global graph convolutional feature points used for fusion, and $t$ is the total number of categories in fine-grained classification; $\delta(\cdot)$ is a non-linear activation function, $W^l$ is a learnable transformation matrix, and $\hat{A}$ is obtained by batch normalization of the adjacency matrix $A$.
4. The fine-grained caries classification method integrating attention mechanisms and key features according to claim 1, characterized in that it also includes:
Step 5, Model Iteration and Optimization: By analyzing the training loss function curve and accuracy curve, the network structure is adjusted and the training parameters are modified;
Step 6, Model Prediction: Select the best-performing model and load it. Input the caries image to be diagnosed into the model to diagnose the degree of the caries lesion.