A tongue diagnosis recognition method and system with progressive feature refinement
By employing a progressive feature refinement method, multilayer perceptrons and graph convolutional networks are used to decouple features and model continuity relationships in tongue diagnosis images. This solves the problem of insufficient fine-grained feature extraction in tongue diagnosis image recognition and improves the accuracy and stability of recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHAN DONG MSUN HEALTH TECH GRP CO LTD
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies lack effective extraction of subtle local features and suppression of background noise in tongue diagnosis image recognition, resulting in insufficient recognition accuracy and stability.
A progressive feature refinement method is adopted, which decouples and models the continuity of features in tongue diagnosis images through multilayer perceptron and graph convolutional network, and uses masking for feature refinement and fusion to suppress background noise and enhance fine-grained diagnostic information representation.
It improves the accuracy and stability of tongue diagnosis, enhances the ability to represent fine-grained information such as tongue color distribution, local texture and edge contour, and improves the recognition ability of traditional methods in complex backgrounds.
Figure CN122244542A_ABST
Description
Technical Field
[0001] This invention relates to the field of tongue diagnosis and recognition, and more particularly to a method and system for tongue diagnosis and recognition with progressive feature refinement.
Background Technology
[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
[0003] Tongue diagnosis is the core of "inspection" among the four diagnostic methods of Traditional Chinese Medicine (TCM): inspection, listening and smelling, inquiry, and palpation. In TCM theory, the tongue is closely connected to the zang-fu organs (the five zang organs and six fu organs) through the meridians. The state of Qi and blood, the abundance or deficiency of body fluids, and the strength of organ function are all reflected in the tongue's appearance (including the tongue body and coating). For example, tongue color (such as red or dark purple) reflects cold or heat and deficiency or excess in the body, as well as the circulation of Qi and blood; tongue coating (such as yellow and greasy or thin and white) reveals the nature and depth of pathogenic factors; and tongue shape (such as enlarged or swollen with teeth marks) indicates the state of the essence and Qi of the zang-fu organs. Tongue diagnosis provides a non-invasive, convenient, and intuitive physiological observation window, and is an indispensable objective basis for doctors to conduct "syndrome differentiation and treatment."
[0004] Currently, although deep learning technology has been gradually applied to the field of tongue diagnosis in Traditional Chinese Medicine (TCM) and has improved the automation level of tongue image classification to some extent, existing technologies still have significant technical shortcomings in fine-grained feature extraction. Most existing solutions focus on learning the overall semantic information of tongue images, typically using global classification networks to uniformly represent the entire image. This lack of targeted modeling of subtle local features makes it difficult to effectively identify fine-grained information with important diagnostic value, such as differences in tongue color distribution, crack texture, teeth mark edges, tongue coating thickness, and local morphological changes. Furthermore, in actual tongue image acquisition, images are often accompanied by complex background interference such as lips, teeth, oral shadows, and lighting changes. Existing methods, lacking an effective mechanism to distinguish between target and irrelevant areas, easily incorporate background noise into the feature learning process, thereby weakening the model's ability to focus on key local areas, reducing the accuracy of detail representation, and affecting the accuracy and stability of recognition.
Summary of the Invention
[0005] To overcome the shortcomings of the prior art, the present invention provides a progressive feature refinement tongue diagnosis method and system, aiming to solve the technical problem of insufficient local detail information feature extraction capability of the prior art.
[0006] To achieve the above objectives, one or more embodiments of the present invention provide the following technical solutions:
Firstly, a progressive feature refinement method for tongue diagnosis is disclosed, including:
obtaining a tongue diagnosis image and extracting a first feature, a second feature, and a third feature from it;
constructing a first mask using the third feature, and processing the second feature with the first mask to obtain a refined second feature; constructing a second mask using the refined second feature, and processing the first feature with the second mask to obtain a refined first feature; fusing the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature;
decoupling the comprehensive query features using a multilayer perceptron, and modeling the continuous relationship of the decoupled comprehensive query features using a graph convolutional network to obtain corrected query features;
classifying the corrected query features using a classifier, and outputting the tongue diagnosis image classification result.
[0007] Furthermore, acquiring tongue diagnosis images and extracting the first, second, and third features respectively includes: inputting the acquired tongue diagnosis images into the YOLOv12 backbone network for feature extraction to obtain the first, second, and third features, with the resolution of the first feature > the resolution of the second feature > the resolution of the third feature.
[0008] Furthermore, constructing the first mask using the third feature includes: aligning the third feature to the second feature using bilinear interpolation, then inputting it into the sub-network constructed by the convolutional layer, and then normalizing it using the Sigmoid activation function to obtain the first mask.
[0009] Furthermore, constructing the second mask using the refined second feature includes: aligning the refined second feature to the first feature using bilinear interpolation, then inputting it into a sub-network constructed by a convolutional layer, and then normalizing it using a Sigmoid activation function to obtain the second mask.
[0010] Furthermore, the second feature is processed by the first mask to obtain the refined second feature, which includes: performing convolution processing on the second feature, and multiplying the convolution-processed second feature element-wise with the first mask to obtain the refined second feature.
[0011] Furthermore, the first feature is processed by a second mask to obtain a refined first feature, which includes: performing convolution on the first feature, and then multiplying the convolutional first feature element-wise with the second mask to obtain the refined first feature.
[0012] Furthermore, a multilayer perceptron is used to decouple the comprehensive query features, and a graph convolutional network is used to model the continuous relationship of the decoupled comprehensive query features to obtain the corrected query features, including:
using a multilayer perceptron to decouple the comprehensive query features to obtain the query features;
constructing the class prototype matrix;
calculating the cosine similarity between the query features and the class prototype matrix to obtain the relation matrix;
concatenating the query features with the matrix of all class prototypes along the node dimension to obtain the concatenation matrix;
inputting the relation matrix and the concatenation matrix into the graph convolutional network to obtain the optimized query features;
obtaining the corrected query features by weighted summation of the query features and the optimized query features.
[0013] Secondly, a tongue diagnosis and recognition system with progressive feature refinement is disclosed, including:
a feature extraction module, used to acquire a tongue diagnosis image and extract the first feature, the second feature, and the third feature;
a progressive feature refinement module, used to construct a first mask with the third feature and process the second feature with the first mask to obtain a refined second feature; construct a second mask with the refined second feature and process the first feature with the second mask to obtain a refined first feature; and fuse the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature;
a feature correction module, used to decouple the comprehensive query features using a multilayer perceptron and to model the continuous relationship of the decoupled comprehensive query features using a graph convolutional network to obtain the corrected query features;
an output module, used to classify the corrected query features using a classifier and output the tongue diagnosis image classification result.
[0014] Thirdly, an electronic device is disclosed, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described above.
[0015] Fourthly, a computer-readable storage medium is disclosed having a computer program stored thereon that, when executed by a processor, implements the method described above.
[0016] The above one or more technical solutions have the following beneficial effects: Compared with existing technologies, this invention extracts the first, second, and third features of the tongue diagnosis image and progressively refines the multi-layer features from deep to shallow based on the first and second masks. It utilizes deep global semantic information to guide and constrain the mid-layer local texture features and shallow detail features step by step, which can effectively suppress background noise and irrelevant feature responses outside the tongue. This approach can enhance the representation ability of fine-grained diagnostic information such as tongue color distribution, local texture, and edge contours. At the same time, it can also improve the consistency of multi-layer feature fusion and improve the problems of insufficient detail extraction ability and susceptibility to complex background interference in traditional tongue diagnosis methods, thereby improving the accuracy and stability of tongue diagnosis.
[0017] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Description of the Drawings
[0018] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0019] Figure 1 is a flowchart of a tongue diagnosis method with progressive feature refinement according to Embodiment 1 of the present invention.
Detailed Implementation
It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0020] It should be noted that the terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the exemplary implementations of the present invention.
[0021] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.
[0022] Example 1
As shown in Figure 1, this embodiment discloses a tongue diagnosis method with progressive feature refinement, including:
S1: Obtain the tongue diagnosis image and extract the first feature, the second feature, and the third feature.
[0023] The technical solution provided in this disclosure can use image acquisition devices such as cameras to acquire tongue diagnosis images.
[0024] In one optional implementation, the acquired tongue diagnosis image is input into the YOLOv12 backbone network for feature extraction to obtain a first feature, a second feature, and a third feature, wherein the resolution of the first feature > the resolution of the second feature > the resolution of the third feature, specifically expressed as follows:
$F_1, F_2, F_3 = \mathrm{Backbone}(X)$;
where $X$ represents the tongue diagnosis image, $F_1$ represents the first feature, $F_2$ represents the second feature, and $F_3$ represents the third feature.
[0025] In this embodiment, the YOLOv12 backbone network consists of convolutional layers, C3K2 modules, and A2C2f modules.
[0026] It can be understood that the YOLOv12 backbone network performs hierarchical downsampling and feature encoding on the input tongue diagnosis image, generating multi-scale feature maps at three levels (shallow, mid-level, and deep) to provide differentiated visual representations for subsequent feature refinement.
[0027] The first feature is a shallow feature, the second feature is a mid-level feature, and the third feature is a deep feature.
[0028] The first feature contains edge detail features of the tongue diagnosis image; the second feature contains local texture features of the tongue diagnosis image; and the third feature contains global semantic features of the tongue diagnosis image.
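The hierarchical downsampling in S1 can be illustrated with a toy stand-in for the backbone. The sketch below is not YOLOv12: it replaces the learned convolutional, C3K2, and A2C2f stages with plain 2×2 average pooling. The strides 8, 16, and 32 and the names `avg_pool2x` and `toy_backbone` are assumptions, chosen only to show how one image yields three feature maps of decreasing resolution.

```python
import numpy as np

def avg_pool2x(x):
    """Halve the spatial size of a (C, H, W) array by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def toy_backbone(image):
    """Return three maps with resolution f1 > f2 > f3 (assumed strides 8, 16, 32)."""
    f1 = image
    for _ in range(3):          # three halvings give stride 8
        f1 = avg_pool2x(f1)
    f2 = avg_pool2x(f1)         # stride 16
    f3 = avg_pool2x(f2)         # stride 32
    return f1, f2, f3

# Hypothetical 3-channel 64x64 input standing in for a tongue image.
image = np.random.default_rng(0).random((3, 64, 64))
f1, f2, f3 = toy_backbone(image)
print(f1.shape, f2.shape, f3.shape)  # (3, 8, 8) (3, 4, 4) (3, 2, 2)
```

A real backbone would also increase the channel count with depth; the toy version keeps three channels so that only the resolution ordering is visible.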
[0029] S2: Construct a first mask using the third feature, perform first mask processing on the second feature to obtain a refined second feature; construct a second mask using the refined second feature, perform second mask processing on the first feature to obtain a refined first feature; fuse the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature.
[0030] Existing solutions often employ global classification networks, neglecting the extraction of fine-grained local features. Under background noise from sources such as lips, teeth, or lighting conditions, the ability to focus and extract fine-grained features is severely insufficient, leading to classification errors in tongue diagnosis images. This embodiment refines features through a progressive feature refinement method combining a first mask and a second mask, achieving accurate classification of tongue diagnosis images.
[0031] In one optional implementation, constructing the first mask using the third feature includes: aligning the third feature to the second feature using bilinear interpolation, then inputting it into a sub-network constructed from convolutional layers, and then normalizing it using a Sigmoid activation function to obtain the first mask, specifically expressed as follows:
$F_3^{\uparrow} = \mathrm{BI}(F_3)$;
$M_1 = \sigma(\Phi_1(F_3^{\uparrow}))$;
where $F_3^{\uparrow}$ represents the third feature after bilinear interpolation, $\mathrm{BI}$ represents bilinear interpolation, $F_3$ represents the third feature, $M_1$ represents the first mask, $\sigma$ represents the Sigmoid activation function, and $\Phi_1$ represents the sub-network constructed from convolutional layers that processes the third feature.
[0032] In this embodiment, the sub-network constructed from convolutional layers is a stack of 5 convolutional layers with 3×3 kernels.
[0033] In one optional implementation, processing the second feature with the first mask to obtain a refined second feature includes: performing convolution processing on the second feature, and then multiplying the convolved second feature element-wise with the first mask to obtain the refined second feature, specifically expressed as follows:
$\hat{F}_2 = M_1 \odot \mathrm{Conv}(F_2)$;
where $\hat{F}_2$ represents the refined second feature, $M_1$ represents the first mask, $\odot$ represents element-wise multiplication, $\mathrm{Conv}$ represents convolution processing, and $F_2$ represents the second feature.
[0034] In one optional implementation, constructing the second mask using the refined second feature includes: aligning the refined second feature to the first feature using bilinear interpolation, then inputting it into a sub-network constructed from convolutional layers, and then normalizing it using a Sigmoid activation function to obtain the second mask, specifically expressed as follows:
$\hat{F}_2^{\uparrow} = \mathrm{BI}(\hat{F}_2)$;
$M_2 = \sigma(\Phi_2(\hat{F}_2^{\uparrow}))$;
where $\hat{F}_2^{\uparrow}$ represents the refined second feature after bilinear interpolation, $\mathrm{BI}$ represents bilinear interpolation, $\hat{F}_2$ represents the refined second feature, $M_2$ represents the second mask, $\sigma$ represents the Sigmoid activation function, and $\Phi_2$ represents the sub-network constructed from convolutional layers that processes the refined second feature.
[0035] In one optional implementation, processing the first feature with the second mask to obtain a refined first feature includes: performing convolution processing on the first feature, and then multiplying the convolved first feature element-wise with the second mask to obtain the refined first feature, specifically expressed as follows:
$\hat{F}_1 = M_2 \odot \mathrm{Conv}(F_1)$;
where $\hat{F}_1$ represents the refined first feature, $M_2$ represents the second mask, $\odot$ represents element-wise multiplication, $\mathrm{Conv}$ represents convolution processing, and $F_1$ represents the first feature.
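The deep-to-shallow mask chain described above (third feature to first mask to refined second feature, then second mask to refined first feature) can be sketched in NumPy. This is a minimal illustration under several assumptions that are not in the source: a single 1×1 convolution stands in for both the 5-layer 3×3 mask sub-network and the convolution applied to each feature, each mask has one channel that is broadcast over all feature channels, the interpolation uses corner-aligned sampling, and all shapes, weights, and function names are hypothetical.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear interpolation of a (C, H, W) array (corner-aligned sampling)."""
    _, h, w = x.shape
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    rows0, rows1 = x[:, y0], x[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, weight):
    """1x1 convolution: mix the channels of (C_in, H, W) with a (C_out, C_in) weight."""
    return np.einsum("oc,chw->ohw", weight, x)

def build_mask(guide, target_hw, weight):
    """Upsample the guiding feature, convolve it, and squash it into (0, 1)."""
    return sigmoid(conv1x1(bilinear_resize(guide, *target_hw), weight))

def refine(feature, mask, weight):
    """Convolve the feature, then gate it element-wise with the mask."""
    return conv1x1(feature, weight) * mask

rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 32, 32))    # shallow, highest resolution
f2 = rng.normal(size=(16, 16, 16))   # mid-level
f3 = rng.normal(size=(32, 8, 8))     # deep, lowest resolution
w_m1, w_c2 = rng.normal(size=(1, 32)), rng.normal(size=(16, 16))
w_m2, w_c1 = rng.normal(size=(1, 16)), rng.normal(size=(8, 8))

m1 = build_mask(f3, f2.shape[1:], w_m1)      # first mask from the third feature
f2_hat = refine(f2, m1, w_c2)                # refined second feature
m2 = build_mask(f2_hat, f1.shape[1:], w_m2)  # second mask from the refined second feature
f1_hat = refine(f1, m2, w_c1)                # refined first feature
print(m1.shape, f2_hat.shape, m2.shape, f1_hat.shape)
```

Because every mask value lies between 0 and 1, gating can only attenuate a response, which is how regions the deeper layer considers irrelevant are suppressed in the shallower layer.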
[0036] In one optional implementation, the third feature, the refined second feature, and the refined first feature are fused to obtain a comprehensive query feature. Specifically, the third feature undergoes bilinear interpolation twice: the third feature after the first bilinear interpolation is interpolated a second time. The result is then concatenated with the refined second feature after bilinear interpolation and with the refined first feature, followed by global pooling to obtain the comprehensive query feature, as shown below:
$Q = \mathrm{GP}(\mathrm{Concat}(\hat{F}_1, \hat{F}_2^{\uparrow}, \mathrm{BI}(F_3^{\uparrow})))$;
where $Q$ represents the comprehensive query feature, $\mathrm{GP}$ represents global pooling, $\mathrm{Concat}$ represents feature concatenation, $\hat{F}_1$ represents the refined first feature, $\hat{F}_2^{\uparrow}$ represents the refined second feature after bilinear interpolation, $\mathrm{BI}$ represents bilinear interpolation, and $F_3^{\uparrow}$ represents the third feature after bilinear interpolation.
[0037] In this embodiment, the global pooling is global max pooling.
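The fusion step can be sketched as follows: the once-interpolated third feature is interpolated again to the resolution of the first feature, the three maps are concatenated along the channel axis, and global max pooling reduces each channel to one value. The channel counts, spatial sizes, corner-aligned interpolation, and the function name `fuse` are assumptions for illustration only.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear interpolation of a (C, H, W) array (corner-aligned sampling)."""
    _, h, w = x.shape
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    rows0, rows1 = x[:, y0], x[:, y1]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1] * wx
    bottom = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def fuse(f1_hat, f2_hat_up, f3_up):
    """Concatenate three aligned maps and apply global max pooling per channel."""
    h, w = f1_hat.shape[1:]
    f3_up2 = bilinear_resize(f3_up, h, w)   # second interpolation of the third feature
    stacked = np.concatenate([f1_hat, f2_hat_up, f3_up2], axis=0)
    return stacked.max(axis=(1, 2))         # one value per channel

rng = np.random.default_rng(0)
f1_hat = rng.normal(size=(8, 32, 32))       # refined first feature
f2_hat_up = rng.normal(size=(16, 32, 32))   # refined second feature after interpolation
f3_up = rng.normal(size=(32, 16, 16))       # third feature after the first interpolation
q = fuse(f1_hat, f2_hat_up, f3_up)
print(q.shape)  # (56,)
```

The resulting vector has one entry per concatenated channel (8 + 16 + 32 in this toy setup), so its length does not depend on the input image size.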
[0038] The technical solution provided in this disclosure extracts the first, second, and third features of a tongue diagnosis image, and progressively refines the multi-layer features from deep to shallow based on the first and second masks. It utilizes deep global semantic information to guide and constrain the mid-layer local texture features and shallow detail features step by step, which can effectively suppress background noise and irrelevant feature responses outside the tongue. This approach can enhance the representation ability of fine-grained diagnostic information such as tongue color distribution, local texture, and edge contours. At the same time, it can also improve the consistency of multi-layer feature fusion and improve the problems of insufficient detail extraction ability and susceptibility to interference from complex backgrounds in traditional tongue diagnosis methods, thereby improving the accuracy and stability of tongue diagnosis.
[0039] Understandably, compared to the original YOLOv12 neck layer which only achieves feature fusion through multi-scale feature transfer and aggregation, the above scheme utilizes deep global semantic features to construct a mask, performs saliency screening on mid-level local texture features, and further uses the refined mid-level features to construct a mask to constrain shallow detail features, thereby achieving progressive feature refinement across layers. This approach can effectively suppress background noise and irrelevant responses, enhance the texture and subtle feature representation of target-related regions, and narrow the semantic gap between shallow, mid, and deep features, thus exhibiting superior performance compared to the original neck layer in fine-grained texture recognition scenarios.
[0040] S3: Use a multilayer perceptron to decouple the comprehensive query features, and use a graph convolutional network to model the continuous relationship of the decoupled comprehensive query features to obtain the corrected query features.
[0041] Although there are natural physiological connections between different diagnostic dimensions (e.g., an enlarged tongue is often accompanied by teeth marks on the sides), traditional models often either train each task in isolation, ignoring feature sharing, or use simple hard-sharing mechanisms that lack feature decoupling for each dimension. This embodiment avoids both ineffective and excessive sharing of feature information by orthogonally decoupling the comprehensive query features between progressive feature refinement and continuous relationship modeling.
[0042] Furthermore, the pathological evolution of tongue appearance in Traditional Chinese Medicine (TCM) forms a gradual, continuous spectrum, while existing classification models do not model these continuous relationships, leading to misclassification of ambiguous tongue images at transitional boundaries. This embodiment addresses the inaccurate classification of such ambiguous tongue images by modeling the continuous relationships using graph convolutional networks.
[0043] In this embodiment, a multilayer perceptron is used to decouple the comprehensive query features to obtain the query features. Specifically, the comprehensive query features are projected onto $D$ independent diagnostic subspaces to obtain the query features, expressed as follows:
$q_d = \mathrm{MLP}_d(Q)$, $d = 1, 2, \dots, D$;
where $q_d$ represents the query feature of the $d$-th diagnostic subspace, $\mathrm{MLP}_d$ represents the multilayer perceptron of the $d$-th diagnostic subspace, and $Q$ represents the comprehensive query feature.
[0044] The feature decoupling of the present invention is further described with reference to a specific implementation. Assume that the comprehensive query feature is 128-dimensional and that there are 4 independent diagnostic subspaces, each corresponding to its own multilayer perceptron. The comprehensive query feature is input into each of the 4 multilayer perceptrons, and each outputs the query feature of its subspace; that is, the query feature in each of the 4 independent diagnostic subspaces is 128-dimensional.
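The 128-dimensional, 4-subspace example above can be sketched with one small perceptron per diagnostic subspace. The hidden width of 64, the random initialization, and the names `init_mlp`, `mlp_forward`, and `decouple` are assumptions. The sketch also does not enforce the orthogonality between subspaces that the description mentions; it only shows the per-subspace projection.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Randomly initialize a two-layer perceptron (hypothetical sizes)."""
    return {
        "w1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp_forward(params, x):
    """Linear -> ReLU -> Linear."""
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]

def decouple(comprehensive_query, heads):
    """Project one comprehensive query feature into one query feature per subspace."""
    return np.stack([mlp_forward(p, comprehensive_query) for p in heads])

rng = np.random.default_rng(0)
D, DIM, HIDDEN = 4, 128, 64   # 4 subspaces, 128-dim features; HIDDEN is an assumption
heads = [init_mlp(rng, DIM, HIDDEN, DIM) for _ in range(D)]
comprehensive_query = rng.normal(size=DIM)
subspace_queries = decouple(comprehensive_query, heads)
print(subspace_queries.shape)  # (4, 128)
```

Each row of `subspace_queries` is the query feature of one diagnostic subspace; because the heads have separate weights, the rows differ even though they share the same input.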
[0045] In one optional implementation, a multilayer perceptron is used to decouple the comprehensive query features, and a graph convolutional network is used to model the continuous relationship of the decoupled comprehensive query features to obtain the corrected query features, including:
using a multilayer perceptron to decouple the comprehensive query features to obtain the query features;
constructing the class prototype matrix;
calculating the cosine similarity between the query features and the class prototype matrix to obtain the relation matrix;
concatenating the query features with the matrix of all class prototypes along the node dimension to obtain the concatenation matrix;
inputting the relation matrix and the concatenation matrix into the graph convolutional network to obtain the optimized query features;
obtaining the corrected query features by weighted summation of the query features and the optimized query features, as follows:
$P = [P_1, P_2, \dots, P_M]$;
$A_m = \cos(q, P_m)$;
$A = [A_1, A_2, \dots, A_M]$;
$H = \mathrm{Concat}(q, P)$;
$H' = \mathrm{ReLU}(\hat{D}(A + I)HW)$;
$\tilde{q} = \alpha q + \beta q'$;
where $P$ represents the prototype matrix of all classes, $P_1$ represents the prototype matrix of the first class, $P_2$ represents the prototype matrix of the second class, $P_M$ represents the prototype matrix of the $M$-th class, $A_m$ represents the relation matrix of the prototype matrix of the $m$-th class, $P_m$ represents the prototype matrix of the $m$-th class, $q$ represents the query features, $A$ represents the relation matrix, $H$ represents the concatenation matrix, $\mathrm{ReLU}$ represents the ReLU activation function, $\hat{D}$ represents the normalization degree matrix, $I$ represents the identity matrix, $W$ represents the learnable weight matrix of the graph convolutional network, $H'$ represents the output features of the graph convolutional network, $q'$ represents the optimized query features, $\alpha$ and $\beta$ represent the learnable weight coefficients, with $\alpha = 0.5$ and $\beta = 0.5$, and $\tilde{q}$ represents the corrected query features.
[0046] The continuous relationship modeling of the present invention is further described with reference to a specific implementation, taking the query feature $q$ as an example. Assume there are 10 manually selected typical images of different categories: 3 tongue color images, 4 tongue coating images, 2 tongue shape images, and 1 tongue body image. That is, across the four diagnostic dimensions, four class prototype matrices are constructed: $P_1$, $P_2$, $P_3$, $P_4$. Assume the class prototype matrix corresponding to the query feature $q$ is $P_1$. The relation matrix $A_1$ of the first class prototype matrix is constructed from the class prototype matrix $P_1$ and the query feature $q$, and so on for the other classes; the individual relation matrices are then concatenated to obtain the relation matrix $A$. The class prototype matrix $P$ and the query feature $q$ are concatenated along the node dimension into the concatenation matrix $H$. The concatenation matrix and the relation matrix are input into the graph convolutional network to obtain the optimized concatenation matrix $H' = \mathrm{ReLU}(\hat{D}(A + I)HW)$. The optimized concatenation matrix can also be written as $H' = [q', P']$, which is split into the optimized query features $q'$ and the optimized prototypes $P'$. The optimized query features and the original query features are then weighted and summed to obtain the corrected query features.
[0047] The weighted summation of the query features and the optimized query features is to ensure the stability of the query features and avoid situations where the optimized query features alone deviate significantly. In real-world scenarios, although the optimized query features can specifically improve the accuracy of feature representation, they may be affected by factors such as data noise and scene fluctuations, resulting in deviations such as feature distribution shifts and numerical anomalies. If the optimized query features are used directly and alone, it may lead to the distortion of the query features, thereby reducing the reliability and stability of the query features.
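The relation modeling and the stabilizing weighted sum can be sketched as a single graph convolution over one query node and the prototype nodes. The source does not fully specify how the relation matrix becomes a square adjacency, so the sketch makes explicit assumptions: edges exist only between the query and each prototype, negative cosine similarities are clipped to zero, the "normalization degree matrix" is taken to be the inverse row-degree of $A + I$, and each prototype is a single vector. The function names and the toy sizes are hypothetical.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gcn_correct(query, prototypes, weight, alpha=0.5, beta=0.5):
    """One graph-convolution step over [query; prototypes], then a weighted sum."""
    nodes = np.vstack([query[None, :], prototypes])   # concatenation matrix H
    n = nodes.shape[0]
    adj = np.zeros((n, n))                            # relation matrix A (assumed star graph)
    for k, proto in enumerate(prototypes, start=1):
        sim = max(cosine(query, proto), 0.0)          # assumption: clip negatives to 0
        adj[0, k] = adj[k, 0] = sim
    a_hat = adj + np.eye(n)                           # A + I adds self-loops
    deg_inv = 1.0 / a_hat.sum(axis=1, keepdims=True)  # assumed row normalization
    out = np.maximum((deg_inv * a_hat) @ nodes @ weight, 0.0)  # ReLU(D (A + I) H W)
    query_opt = out[0]                                # split: first row is the query node
    return alpha * query + beta * query_opt           # corrected query feature

rng = np.random.default_rng(0)
dim = 128
query = rng.normal(size=dim)
prototypes = rng.normal(size=(10, dim))   # e.g. 10 typical images, one vector each
weight = rng.normal(scale=0.1, size=(dim, dim))
corrected = gcn_correct(query, prototypes, weight)
print(corrected.shape)  # (128,)
```

With alpha = beta = 0.5, half of the corrected feature always comes from the original query, so a poorly behaved graph output can shift the result but cannot replace it entirely, which matches the stability argument given above.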
[0048] The technical solution provided in this disclosure orthogonally decouples comprehensive query features between progressive feature refinement and continuous relationship modeling. Leveraging the characteristics of orthogonal spaces, it achieves hierarchical feature control for diagnostic tasks. On one hand, it connects the underlying feature links, allowing different diagnostic dimensions to share basic common features and mine general representational information from the data. On the other hand, it independently decouples high-level features, ensuring that the exclusive features of each diagnostic branch do not interfere with each other and are precisely focused. This fundamentally overcomes the problems of traditional multi-task isolated training modes, such as fragmented and unshared features, redundant training resources, and weak generalization ability; as well as the inherent defects of severe interference from hard-shared features in multi-task systems, insufficient extraction of differentiated features, and limited model robustness. This feature decoupling achieves efficient reuse of feature resources while ensuring the independence and discrimination accuracy of each diagnostic task, thus improving the stability and accuracy of tongue diagnosis image recognition.
[0049] In this embodiment, graph convolutional networks are used to model the continuous relationship of the query features, which allows borderline tongue images in color and texture transition regions to be classified more accurately.
[0050] S4: Use a classifier to classify the corrected query features and output the tongue diagnosis image classification result.
[0051] In this embodiment, a joint loss function is used for overall end-to-end optimization of the network model. The total loss function is expressed as follows:
$L = \lambda_{box} L_{box} + \sum_{d=1}^{D} \lambda_d L_{cls}^{d}$;
where $\lambda_{box}$ represents the weight coefficient of the detection box loss, $L_{box}$ represents the regression loss of the detection box, $\lambda_d$ represents the classification loss weight coefficient of the $d$-th dimension, and $L_{cls}^{d}$ represents the classification loss of the $d$-th dimension. This classification loss is calculated using a few-shot metric learning formula that incorporates the temperature coefficient and cosine similarity, and is expressed as:
$L_{cls}^{d} = -\sum_{k} y_k^{d} \log \frac{\exp(S(\tilde{q}, P_k) / \tau)}{\sum_{j} \exp(S(\tilde{q}, P_j) / \tau)}$;
where $\tau$ represents the temperature coefficient, $S$ represents the cosine similarity, $\tilde{q}$ represents the corrected query features, $P_k$ represents the prototype matrix of the $k$-th class, and $y^{d}$ represents the ground-truth matrix of the $d$-th dimension target.
[0052] It can be understood that tongue diagnosis image classification typically involves first locating the tongue region and then classifying the tongue in multiple dimensions, and that information is coupled between these two tasks. This embodiment uses a joint loss function, combining a detection box regression loss and a classification loss, for end-to-end optimization. The detection box provides a precise tongue region for classification, and classification takes place within the detection box, which improves accuracy. Furthermore, the temperature coefficient adjusts the sharpness of the similarity distribution: a larger temperature coefficient smooths the distribution and helps prevent overfitting, while a smaller temperature coefficient sharpens the distribution and amplifies inter-class differences, enhancing the model's ability to separate similar, hard-to-distinguish tongue diagnosis images. Cosine similarity measures the semantic similarity between samples and captures fine-grained semantic differences, further improving the classification accuracy of tongue diagnosis images.
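The effect of the temperature coefficient and the structure of the joint loss can be checked numerically. The sketch below assumes a standard softmax cross-entropy over temperature-scaled cosine similarities with one prototype vector per class; the exact loss form, the temperature values, the loss weights, and all function names are assumptions rather than values from the source.

```python
import numpy as np

def class_probabilities(query, prototypes, tau):
    """Softmax over cosine similarities between a query and class prototypes, scaled by 1/tau."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (p @ q) / tau
    logits = logits - logits.max()          # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def classification_loss(query, prototypes, label, tau=0.1):
    """Cross-entropy of the true class under the temperature-scaled distribution."""
    return -np.log(class_probabilities(query, prototypes, tau)[label])

def total_loss(box_loss, cls_losses, lambda_box, lambda_cls):
    """Weighted box regression loss plus weighted per-dimension classification losses."""
    return lambda_box * box_loss + sum(w * l for w, l in zip(lambda_cls, cls_losses))

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 128))
query = prototypes[2] + 0.1 * rng.normal(size=128)   # a sample close to class 2

sharp = class_probabilities(query, prototypes, tau=0.05)   # small temperature
smooth = class_probabilities(query, prototypes, tau=5.0)   # large temperature
print(sharp.round(3), smooth.round(3))
```

The small temperature concentrates almost all probability on the nearest prototype, while the large temperature leaves the distribution close to uniform, which is the smoothing versus sharpening trade-off described above.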
[0053] Example 2
This embodiment provides a tongue diagnosis recognition system with progressive feature refinement, including:
a feature extraction module, used to acquire a tongue diagnosis image and extract the first feature, the second feature, and the third feature;
a progressive feature refinement module, used to construct a first mask with the third feature and process the second feature with the first mask to obtain a refined second feature; construct a second mask with the refined second feature and process the first feature with the second mask to obtain a refined first feature; and fuse the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature;
a feature correction module, used to decouple the comprehensive query features using a multilayer perceptron and to model the continuous relationship of the decoupled comprehensive query features using a graph convolutional network to obtain the corrected query features;
an output module, used to classify the corrected query features using a classifier and output the tongue diagnosis image classification result.
[0054] It should be noted that each module in this embodiment corresponds one-to-one with each step in Embodiment 1, and their specific implementation processes are the same, so they will not be repeated here.
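As a rough illustration of the progressive feature refinement module, the following NumPy sketch follows the sequence recited in claims 3 to 6 (bilinear alignment, convolution, Sigmoid normalization, element-wise multiplication). It is a sketch under stated assumptions, not the implementation of this embodiment: a single 1x1 convolution stands in for the convolutional sub-network, the weights are random, the feature shapes are hypothetical, and fusion by global average pooling followed by concatenation is an assumed choice.

```python
import numpy as np

def bilinear_resize(feat, out_h, out_w):
    """Bilinearly resize a (C, H, W) map to (C, out_h, out_w), corners aligned."""
    _, h, w = feat.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]   # vertical interpolation weights
    wx = (xs - x0)[None, None, :]   # horizontal interpolation weights
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bottom = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def conv1x1(feat, weight):
    """1x1 convolution: mixes channels per pixel; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_mask(coarse, target_hw, weight):
    """Align the coarser feature to the target size, convolve, squash to [0, 1]."""
    aligned = bilinear_resize(coarse, *target_hw)
    return sigmoid(conv1x1(aligned, weight))

def refine(fine, mask, weight):
    """Convolve the finer feature, then gate it element-wise with the mask."""
    return conv1x1(fine, weight) * mask

rng = np.random.default_rng(0)
# Hypothetical multi-scale features: first > second > third in resolution.
f1 = rng.normal(size=(4, 16, 16))
f2 = rng.normal(size=(4, 8, 8))
f3 = rng.normal(size=(4, 4, 4))
w_mask1, w_conv2, w_mask2, w_conv1 = (0.5 * rng.normal(size=(4, 4)) for _ in range(4))

mask1 = build_mask(f3, f2.shape[1:], w_mask1)          # first mask from third feature
f2_refined = refine(f2, mask1, w_conv2)                # refined second feature
mask2 = build_mask(f2_refined, f1.shape[1:], w_mask2)  # second mask
f1_refined = refine(f1, mask2, w_conv1)                # refined first feature

# Assumed fusion: global average pooling per scale, then concatenation.
query = np.concatenate([f.mean(axis=(1, 2)) for f in (f3, f2_refined, f1_refined)])
print(query.shape)
```

The coarse-to-fine order means each mask is derived from a lower-resolution feature and gates the next higher-resolution one.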
[0055] Embodiment 3 This embodiment provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the tongue diagnosis recognition method with progressive feature refinement described in Embodiment 1 above.
[0056] Embodiment 4 This embodiment provides a computer-readable storage medium.
[0057] The computer-readable storage medium has a computer program stored thereon which, when executed by a processor, performs the steps of the tongue diagnosis recognition method with progressive feature refinement described in Embodiment 1.
[0058] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of hardware embodiments, software embodiments, or embodiments combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
[0059] The present invention is described with reference to flowcharts and/or block diagrams of methods, systems, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0060] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0061] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0062] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0063] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations made without creative effort on the basis of the technical solutions of the present invention remain within the scope of protection of the present invention.
Claims
1. A tongue diagnosis recognition method with progressive feature refinement, characterized in that it comprises: acquiring a tongue diagnosis image and extracting a first feature, a second feature, and a third feature; constructing a first mask from the third feature, and applying the first mask to the second feature to obtain a refined second feature; constructing a second mask from the refined second feature, and applying the second mask to the first feature to obtain a refined first feature; fusing the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature; decoupling the comprehensive query feature using a multilayer perceptron, and modeling the continuous relationship of the decoupled comprehensive query feature using a graph convolutional network to obtain a corrected query feature; and classifying the corrected query feature using a classifier and outputting a tongue diagnosis image classification result.
2. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that acquiring the tongue diagnosis image and extracting the first, second, and third features includes: inputting the acquired tongue diagnosis image into a YOLOv12 backbone network for feature extraction to obtain the first, second, and third features, wherein the resolution of the first feature is greater than that of the second feature, which in turn is greater than that of the third feature.
3. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that constructing the first mask from the third feature includes: aligning the third feature to the second feature using bilinear interpolation, inputting the aligned feature into a sub-network built from convolutional layers, and normalizing the output using a Sigmoid activation function to obtain the first mask.
4. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that constructing the second mask from the refined second feature includes: aligning the refined second feature to the first feature using bilinear interpolation, inputting the aligned feature into a sub-network built from convolutional layers, and normalizing the output using a Sigmoid activation function to obtain the second mask.
5. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that applying the first mask to the second feature to obtain the refined second feature includes: performing a convolution on the second feature, and then multiplying the convolved second feature element-wise with the first mask to obtain the refined second feature.
6. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that applying the second mask to the first feature to obtain the refined first feature includes: performing a convolution on the first feature, and then multiplying the convolved first feature element-wise with the second mask to obtain the refined first feature.
7. The tongue diagnosis recognition method with progressive feature refinement as described in claim 1, characterized in that decoupling the comprehensive query feature using the multilayer perceptron and modeling the continuous relationship of the decoupled comprehensive query feature using the graph convolutional network to obtain the corrected query feature includes: decoupling the comprehensive query feature using the multilayer perceptron to obtain a query feature; constructing a class prototype matrix; calculating the cosine similarity between the query feature and the class prototype matrix to obtain a relation matrix; concatenating the query feature with the matrix of all class prototypes along the node dimension to obtain a concatenated matrix; inputting the relation matrix and the concatenated matrix into the graph convolutional network to obtain an optimized query feature; and obtaining the corrected query feature as a weighted sum of the query feature and the optimized query feature.
8. A tongue diagnosis recognition system with progressive feature refinement, characterized in that it comprises: a feature extraction module, used to acquire a tongue diagnosis image and extract a first feature, a second feature, and a third feature; a progressive feature refinement module, used to construct a first mask from the third feature and apply the first mask to the second feature to obtain a refined second feature, to construct a second mask from the refined second feature and apply the second mask to the first feature to obtain a refined first feature, and to fuse the third feature, the refined second feature, and the refined first feature to obtain a comprehensive query feature; a feature correction module, used to decouple the comprehensive query feature using a multilayer perceptron and to model the continuous relationship of the decoupled comprehensive query feature using a graph convolutional network to obtain a corrected query feature; and an output module, used to classify the corrected query feature using a classifier and output a tongue diagnosis image classification result.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the tongue diagnosis recognition method with progressive feature refinement according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it performs the tongue diagnosis recognition method with progressive feature refinement according to any one of claims 1-7.
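For illustration only, the feature correction steps recited in claim 7 can be sketched in NumPy as follows. Several details are assumptions made for this sketch and are not taken from the claims: the way the relation matrix is expanded into a graph adjacency (self-loops plus query-to-prototype edges with negative similarities clipped to zero), the row normalization, the use of a single graph convolution layer, the weighting factor `alpha`, and all dimensions, random weights, and placeholder prototypes.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, standing in for the decoupling step."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def cosine_similarity(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def build_adjacency(relation):
    """Assumed expansion of a (1, K) relation matrix into a (1+K, 1+K) graph."""
    k = relation.shape[1]
    adj = np.eye(k + 1)                         # self-loops on every node
    weights = np.clip(relation[0], 0.0, None)   # drop negative similarities
    adj[0, 1:] = weights                        # query -> prototype edges
    adj[1:, 0] = weights                        # prototype -> query edges
    return adj / adj.sum(axis=1, keepdims=True) # row normalization

def gcn_layer(adj, nodes, weight):
    """One graph convolution: aggregate neighbors, project, apply ReLU."""
    return np.maximum(adj @ nodes @ weight, 0.0)

def correct_query(fused, prototypes, params, alpha=0.5):
    query = mlp(fused, *params["mlp"])                   # query feature, (1, D)
    relation = cosine_similarity(query, prototypes)      # relation matrix, (1, K)
    nodes = np.concatenate([query, prototypes], axis=0)  # concatenated, (1+K, D)
    refined = gcn_layer(build_adjacency(relation), nodes, params["gcn"])
    optimized = refined[:1]                              # optimized query feature
    corrected = alpha * query + (1 - alpha) * optimized  # weighted sum
    return corrected, relation

rng = np.random.default_rng(0)
fused = rng.normal(size=(1, 12))        # hypothetical comprehensive query feature
prototypes = rng.normal(size=(5, 8))    # placeholder class prototype matrix
params = {
    "mlp": (rng.normal(size=(12, 16)), np.zeros(16),
            rng.normal(size=(16, 8)), np.zeros(8)),
    "gcn": rng.normal(size=(8, 8)),
}
corrected, relation = correct_query(fused, prototypes, params)
print(corrected.shape, relation.shape)
```

Setting `alpha` to 1 returns the multilayer perceptron output unchanged, so the weighting controls how strongly the graph-refined feature corrects the original query feature.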