Face-to-face diagnosis image intelligent processing method and system based on artificial intelligence model
By improving the visual Transformer model and combining the region association mask matrix and the negative correlation bias matrix, the problem of insufficient integration of global and local features in the TCM facial diagnosis system was solved, achieving higher diagnostic accuracy and interpretability, and improving the practicality of TCM facial diagnosis as a clinical auxiliary tool.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG HUIKANG TECH CO LTD
- Filing Date
- 2026-02-12
- Publication Date
- 2026-06-16
AI Technical Summary
Existing intelligent TCM facial diagnosis systems cannot effectively combine global and local features, resulting in low consistency between diagnostic results and TCM theory, which limits their acceptance and practicality as a clinical auxiliary tool.
An improved visual Transformer model is used to extract key facial regions through semantic segmentation. By combining the region association mask matrix, region bias matrix, and negative correlation bias matrix, attention calculation is adjusted to enhance feature interaction within the same diagnostic region and suppress false associations caused by individual skin color differences.
It improves the diagnostic accuracy and universality of TCM facial diagnosis systems, enhances the transparency and interpretability of results, conforms to the clinical thinking habits of TCM diagnosis, and improves the practicality and credibility of intelligent processing of facial diagnosis images.
Figure CN121708014B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of facial diagnosis image processing technology. In particular, it relates to an intelligent method and system for facial diagnosis image processing based on an artificial intelligence model.
Background Technology
[0002] Traditional Chinese medicine (TCM) facial diagnosis, as a core component of observational diagnosis, is a unique diagnostic method that infers the state of a person's Qi and blood and the nature of their illness by observing changes in facial color and shape. With the increasing penetration of artificial intelligence technology into the medical field, research on intelligent TCM facial diagnosis based on computer vision has become an important direction.
[0003] In current technologies, intelligent diagnosis in Traditional Chinese Medicine (TCM) facial diagnosis faces two major challenges: low levels of theoretical digitization and insufficient effectiveness in clinical assistance. Most current systems employ traditional image processing or general deep learning models based on single facial images for feature extraction and analysis. However, relying solely on facial image data presents significant technical bottlenecks for machine diagnosis of complex and abstract comprehensive syndromes such as "Qi and Blood status."
[0004] Existing visual models, such as convolutional neural networks or standard visual transformers, cannot establish an organic connection between global features (such as overall complexion and expression) and local features (such as lip color and eyelid color) in traditional Chinese medicine diagnosis. This results in a low degree of consistency between the diagnostic results and traditional Chinese medicine theory, which limits their acceptance and practicality as a clinical auxiliary tool.
Summary of the Invention
[0005] To address the technical problem that the diagnostic results of existing facial diagnosis image intelligent processing methods have a low degree of consistency with traditional Chinese medicine theory, which limits their acceptance and practicality as clinical auxiliary tools, this invention provides solutions in the following aspects.
[0006] In the first aspect, the intelligent image processing method for facial diagnosis based on artificial intelligence models includes:
[0007] Obtain a global image of the user's face and extract multiple key facial regions through semantic segmentation;
[0008] The global image of the face is input into the improved visual Transformer model for processing;
[0009] The improved visual Transformer model outputs a global disease probability distribution vector and a region importance weight vector corresponding to the global image, and obtains the diagnostic result based on the global disease probability distribution vector;
[0010] The improved visual Transformer model's method for processing the global image includes: dividing the global image into multiple patches, determining the key facial regions to which each patch belongs based on semantic segmentation results, and generating a region association mask matrix based on the key facial regions to which the patches belong; calculating a region bias matrix adapted to the global image based on the color, texture, and spatial features of the global image and each patch; calculating an initial negative correlation matrix based on the color, texture, and spatial features of the patches and the key facial regions to which the patches belong; calculating a negative correlation bias matrix in real time based on the initial negative correlation matrix and the region bias matrix; and adjusting the standard attention calculation using the region association mask matrix, the region bias matrix, and the negative correlation bias matrix.
[0011] Preferably, the region association mask matrix is a binary mask matrix. For any element in the binary mask matrix, when two patches belong to the same key facial region, the element is a first preset value; when two patches belong to different key facial regions, the element is a second preset value.
[0012] Preferably, the method for calculating the region bias matrix includes: calculating the RGB average value of all pixels in the global image and each tile to obtain global color features and tile color features; calculating the LBP feature vector of the global image and each tile to obtain global texture features and tile texture features; calculating the normalized center coordinates of the tile based on the global image size and tile size; and calculating the element values of the region bias matrix based on the global color features, tile color features, global texture features, tile texture features, and normalized center coordinates of the tile, thereby obtaining the region bias matrix.
[0013] Preferably, the calculation of the element values of the region bias matrix includes: calculating the Euclidean distance between the tile color features and the global color features, and dividing the Euclidean distance by the theoretical maximum Euclidean distance in the RGB color space to obtain the tile color difference; calculating the cosine similarity between the tile texture features and the global texture features to obtain the tile texture difference; for any two tiles, calculating the difference between their tile color differences and the difference between their tile texture differences, and subtracting the product of the two differences from 1 to obtain the similarity in color and texture; calculating the Euclidean distance between the normalized center coordinates of the two tiles, normalizing the Euclidean distance, and subtracting the normalized Euclidean distance from 1 to obtain the spatial similarity; and multiplying the similarity in color and texture by the spatial similarity to obtain the element value.
[0014] Preferably, the method for calculating the negative correlation bias matrix includes: calculating an initial negative correlation matrix based on the tile color difference, tile texture difference, and the key facial region to which the tile belongs; calculating an ideal negative correlation bias matrix based on the region bias matrix; and calculating a negative correlation bias matrix based on the initial negative correlation matrix and the ideal negative correlation bias matrix.
[0015] Preferably, the calculation of the initial negative correlation matrix based on the tile color difference, tile texture difference, and the key facial region to which the tile belongs includes: calculating a skin color difference influence factor based on the tile color difference and tile texture difference; for any two tiles, selecting a preset calculation method according to the key facial region to which the tile belongs, and calculating a consistency adjustment factor based on the calculation method and the skin color difference influence factor; calculating the difference between 1 and the consistency adjustment factor, and multiplying the difference by the similarity of the two tiles in color and texture and the similarity in space to obtain the initial factor, and the initial factors between all pairs of tiles constitute the initial negative correlation matrix.
[0016] Preferably, calculating the ideal negative correlation bias matrix based on the regional bias matrix includes: calculating the difference between the regional bias matrix and the minimum value of the element in the regional bias matrix as the first difference; calculating the difference between the maximum value and the minimum value of the element in the regional bias matrix as the second difference; calculating the ratio of the first difference to the second difference, and subtracting the ratio from 1 to obtain the ideal negative correlation bias matrix.
[0017] Preferably, calculating the negative correlation bias matrix based on the initial negative correlation matrix and the ideal negative correlation bias matrix includes: calculating the mean and variance of the initial negative correlation matrix; calculating the mean and variance of the ideal negative correlation bias matrix; calculating the Pearson correlation coefficient between the two matrices based on these means and variances; calculating the difference between 1 and the absolute value of the Pearson correlation coefficient, and multiplying the obtained difference by the initial negative correlation matrix to obtain a first product; calculating the product of the absolute value of the Pearson correlation coefficient and the ideal negative correlation bias matrix to obtain a second product; and calculating the sum of the first product and the second product to obtain the negative correlation bias matrix.
[0018] Preferably, the method for obtaining the region importance weight vector includes: obtaining the importance score of each patch from the attention matrix of the last layer of the improved visual Transformer model, aggregating the importance scores of all patches belonging to the same key facial region to obtain the importance score of each key facial region, and normalizing the importance scores of all key facial regions to obtain the region importance weight vector.
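The aggregation described in [0018] can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how a patch's importance score is read from the attention matrix, how scores are aggregated, or how they are normalized, so the column mean (attention received by each patch), summation per region, and division by the total are all assumptions, as is the function name `region_importance`.

```python
def region_importance(attention, patch_labels, regions):
    """Aggregate last-layer attention into one normalized weight per region.

    attention    : N x N list of lists (row i = attention paid by patch i)
    patch_labels : region label of each of the N patches
    regions      : the key facial regions to report
    """
    n = len(attention)
    # Assumption: a patch's importance is the mean attention it receives.
    patch_score = [sum(attention[i][j] for i in range(n)) / n for j in range(n)]

    # Assumption: a region's score is the sum of its patches' scores.
    totals = {region: 0.0 for region in regions}
    for score, label in zip(patch_score, patch_labels):
        if label in totals:
            totals[label] += score

    # Assumption: normalize so that the region weights sum to 1.
    total = sum(totals.values())
    return {r: (totals[r] / total if total > 0 else 0.0) for r in regions}


example_attention = [
    [0.5, 0.25, 0.25],
    [0.5, 0.25, 0.25],
    [0.5, 0.25, 0.25],
]
example_weights = region_importance(
    example_attention, ["lips", "forehead", "forehead"], ["lips", "forehead"]
)
```

In this toy case the single lips patch receives as much attention as the two forehead patches combined, so both regions end up with equal weight.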
[0019] In the second aspect, a facial diagnosis image intelligent processing system based on an artificial intelligence model includes a processor and a memory, wherein the memory stores computer program instructions, and when the computer program instructions are executed by the processor, the aforementioned facial diagnosis image intelligent processing method based on an artificial intelligence model is implemented.
[0020] The present invention has the following effects:
[0021] 1. This invention introduces a region association mask matrix to transform facial semantic segmentation results into attention guidance signals, forcing the model to enhance feature interactions within the same diagnostic region, which aligns with the theoretical basis of TCM regional syndrome differentiation. Simultaneously, by dynamically calculating the region bias matrix, multi-dimensional TCM diagnostic features such as color, texture, and spatial location are fused into a continuous similarity metric, softly encoding the physiological and pathological correlations between different facial regions, enabling the model to learn a deeper feature representation that better fits TCM theory.
[0022] 2. This invention effectively suppresses false associations caused by distortion of color and texture features due to individual skin color differences by calculating the negative correlation bias matrix in real time and designing a negative correlation optimization mechanism between the negative correlation bias matrix and the region bias matrix. This mechanism guides the model to reduce its dependence on features heavily contaminated by skin color differences when allocating attention, and instead focus on more stable structural information, thereby ensuring that the system can stably and accurately capture underlying pathological features when facing users with different skin colors, thus improving the universality and practicality of the method.
[0023] 3. The region importance weight vector output by this invention originates from the model's own attention mechanism, which can clearly reveal the key facial regions on which the diagnostic decision is based and their contribution. This is equivalent to exposing the model's diagnostic reasoning, greatly enhancing the transparency and credibility of the results, conforming to the clinical thinking habits of traditional Chinese medicine practitioners, and contributing to human-machine collaboration and decision support.
Attached Figure Description
[0024] Figure 1 is a flowchart of steps S1-S3 in an intelligent facial diagnosis image processing method based on an artificial intelligence model according to an embodiment of the present invention.
[0025] Figure 2 is a schematic diagram of steps S20-S23 in an intelligent facial diagnosis image processing method based on an artificial intelligence model according to an embodiment of the present invention.
Detailed Implementation
[0026] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments.
[0027] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0028] Referring to Figure 1, a method for intelligent processing of facial diagnosis images based on an artificial intelligence model includes steps S1-S3, as follows:
[0029] S1: Obtain a global image of the user's face and extract multiple key facial regions through semantic segmentation.
[0030] A smart facial diagnostic device (such as a dedicated device integrating a 1080P camera and a ring light) is used to acquire a frontal facial image of the user under standard lighting conditions. To reduce the impact of uneven lighting during acquisition, the acquired facial image is processed as follows: the facial image is converted to grayscale; a Gaussian filter is applied to smooth the image; the Sobel operator is used to calculate the image gradient, and the image is divided into bright and dark areas based on the gradient magnitude; the brightness histogram and LBP (Local Binary Pattern) texture features of the bright and dark areas are calculated; if the textures are similar, brightness equalization is performed; the balanced global image is output for subsequent processing.
[0031] A training dataset is constructed using a large number of global facial images. A typical UNet (U-Network, a semantic segmentation network) is used as the segmentation model. Its input is a 512×512×3 RGB (Red Green Blue) global image labeled with six categories: background, lips, eyelids, forehead, cheekbone, and bridge of the nose. Cross-entropy loss is used, and the Adam (Adaptive Moment Estimation) optimizer is used to train for 1000 epochs until the IoU (Intersection over Union) on the validation set is stable, resulting in a trained UNet network.
[0032] The currently acquired, illumination-equalized global image is input into the UNet network to obtain the class prediction for each pixel in the global image. For each target region (lips, eyelids, forehead, cheekbone, bridge of the nose), a binary mask for that target region is generated based on the predicted class of each pixel (i.e., pixels belonging to that target region are 1, and others are 0). For each target region, its bounding box is found using its binary mask, and the rectangular region is cropped from the global image and resized into a fixed-size local region image. Using the binary mask, regions labeled as background in the global image are removed, resulting in a clean global image. Thus, one clean global image and five local region images are obtained.
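The mask, bounding-box, and background-removal steps of [0032] can be sketched as follows. This is a minimal illustration on nested lists under stated assumptions: the helper names, the integer class codes (0 for background, 1 for lips), and the fill value for removed pixels are hypothetical, and the resize to a fixed-size local region image is omitted.

```python
def region_mask(labels, target):
    """Binary mask: 1 where the predicted class equals `target`, else 0."""
    return [[1 if v == target else 0 for v in row] for row in labels]


def bounding_box(mask):
    """Return (top, left, bottom, right), bottom/right exclusive, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image, box):
    """Cut the rectangular region described by `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def remove_background(image, labels, background=0, fill=0):
    """Replace every pixel predicted as background with `fill`."""
    return [
        [fill if lab == background else px for px, lab in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, labels)
    ]


# Toy 3x4 example: class 1 stands in for the lips region.
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
image = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
lips_box = bounding_box(region_mask(labels, target=1))
lips_crop = crop(image, lips_box)
clean_image = remove_background(image, labels)
```

Note that the crop is rectangular, so it can include pixels outside the region itself (here the value 32), which matches the description of cropping the bounding rectangle rather than the mask.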
[0033] S2: Input the global image of the face into the improved visual Transformer model for processing.
[0034] The clean global image is input into the improved visual Transformer model for forward propagation inference. The inference process includes steps S20-S23, as follows:
[0035] S20: Divide the global image into multiple patches, determine the key facial regions to which each patch belongs based on the semantic segmentation results, and generate a region association mask matrix based on the key facial regions to which the patches belong.
[0036] Divide the global image into $N$ patches of size $P \times P$. After linear projection, each patch forms a $D$-dimensional feature vector, and the feature vectors of all patches constitute an $N \times D$ patch matrix.
[0037] Based on the category label of each pixel output by the UNet network, the category labels of all pixels within each patch are counted, and the category with the most pixels is taken as the final category label of that patch, thus obtaining the category labels of all $N$ patches.
[0038] Construct a binary mask matrix $M$ of size $N \times N$, whose elements are set as follows: if patch $i$ and patch $j$ belong to the same key facial region (e.g., both on the lips), the element $M_{ij}$ in row $i$, column $j$ takes the first preset value (for example, 1); if patch $i$ and patch $j$ do not belong to the same key facial region, $M_{ij}$ takes the second preset value (for example, 0). This yields the region association mask matrix.
[0039] The region association mask matrix (i.e., the binary mask matrix) will be used as a modulation factor for attention score in subsequent attention calculations, guiding the model to pay more attention to the associations of texture, color, etc. within the same key facial region, and reducing possible interference between different key facial regions.
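Step S20 can be sketched as a majority vote per patch followed by a pairwise label comparison. This is a minimal illustration with hypothetical function names and integer class codes. Two points are assumptions because the text does not cover them: ties in the vote are resolved by whatever `Counter.most_common` returns first, and background patches are treated like any other label.

```python
from collections import Counter


def patch_labels(pixel_labels, patch_size):
    """Assign each patch the most frequent pixel label inside it (row-major order)."""
    height, width = len(pixel_labels), len(pixel_labels[0])
    labels = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            counts = Counter(
                pixel_labels[r][c]
                for r in range(top, min(top + patch_size, height))
                for c in range(left, min(left + patch_size, width))
            )
            labels.append(counts.most_common(1)[0][0])
    return labels


def region_association_mask(labels, same=1, different=0):
    """N x N binary mask: `same` if two patches share a label, else `different`."""
    return [[same if a == b else different for b in labels] for a in labels]


# Toy 4x4 label map split into four 2x2 patches.
pixel_map = [
    [1, 1, 2, 2],
    [1, 0, 2, 2],
    [3, 3, 1, 1],
    [3, 3, 1, 0],
]
labels_per_patch = patch_labels(pixel_map, patch_size=2)
mask = region_association_mask(labels_per_patch)
```

The first and last patches both resolve to class 1, so the mask links them even though they are not adjacent, which is the intended effect of grouping by diagnostic region rather than by position.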
[0040] S21: Calculate the region bias matrix adapted to the global image based on the color, texture and spatial features of the global image and each tile.
[0041] The aforementioned region association mask matrix is a static matrix based on prior category partitioning. Its function is to ensure that basic attentional connections can be established between patches within the same diagnostic region (i.e., key facial regions). However, even within the same diagnostic region, there may be subtle differences in texture, color, and other features at different locations that are diagnostically significant. A static mask matrix alone cannot model such complex and dynamic internal relationships, nor can it adapt to feature distortions caused by individual skin color differences. Therefore, this embodiment introduces a learnable region bias matrix and a negative correlation bias matrix to address the complex regional relationships of the human face in traditional Chinese medicine facial diagnosis and the individual skin color differences of the tested individuals.
[0042] The method for constructing the region bias matrix includes: calculating the RGB average value of all pixels in the global image and each tile to obtain global color features and tile color features; calculating the LBP feature vector of the global image and each tile to obtain global texture features and tile texture features; calculating the normalized center coordinates of the tiles based on the global image size and tile size; and calculating the element values of the region bias matrix based on the global color features, tile color features, global texture features, tile texture features, and normalized center coordinates of the tiles, thereby obtaining the region bias matrix.
[0043] In this embodiment, the mean values of the red, green, and blue channels of all pixels in the global image are calculated to form a three-dimensional vector, denoted as the global color feature.
[0044] The global face image is converted to grayscale, and a 59-dimensional LBP feature vector is obtained using the uniform pattern mode (radius 2, 16 sampling points), which is denoted as the global texture feature.
[0045] The calculation method for the tile color feature of each tile is the same as the calculation method for the global color feature, and the calculation method for the tile texture feature is the same as the calculation method for the global texture feature.
[0046] Based on the size of the global image ($H \times W$) and the size of the tiles ($P \times P$), calculate the normalized center coordinates of each tile. For the tile located at row $r$ and column $c$ (row and column indices starting from 0), the normalized center coordinates are: $\left(\dfrac{(c + 0.5)\,P}{W},\ \dfrac{(r + 0.5)\,P}{H}\right)$.
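The normalized center coordinates of [0046] can be sketched as below. This is an assumed form rather than the patent's own formula: the tile center is taken as the midpoint of the tile in pixels, divided by the image width and height, and returned in (x, y) order. The function name is hypothetical, and the image size is assumed to be a multiple of the tile size.

```python
def normalized_centers(height, width, tile):
    """Normalized (x, y) center of every tile, in row-major order.

    Assumes 0-based row/column indices and an image size divisible by `tile`.
    """
    rows, cols = height // tile, width // tile
    return [
        ((c + 0.5) * tile / width, (r + 0.5) * tile / height)
        for r in range(rows)
        for c in range(cols)
    ]


centers_4x4 = normalized_centers(height=4, width=4, tile=2)
```

All coordinates fall strictly inside (0, 1), so the distance between any two centers is bounded by the diagonal of the unit square.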
[0047] Calculate the Euclidean distance between the tile color features and the global color features, and divide this Euclidean distance by the theoretical maximum Euclidean distance in the RGB color space, which for 8-bit channels is $\sqrt{255^2 + 255^2 + 255^2} = 255\sqrt{3} \approx 441.67$. This yields the tile color difference of each tile.
[0048] Calculate the cosine similarity between the tile texture features and the global texture features to obtain the tile texture differences.
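The two per-tile quantities defined in [0047] and [0048] can be sketched as follows. This is a minimal illustration assuming 8-bit RGB channels and texture features already available as plain numeric vectors (the LBP extraction itself is not shown). The function names are hypothetical, and returning 0.0 for a zero-length vector is an assumption.

```python
import math

# Largest possible distance between two 8-bit RGB colors (black to white).
RGB_MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def mean_rgb(pixels):
    """Mean of each channel over a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[k] for p in pixels) / n for k in range(3))


def color_difference(tile_pixels, global_pixels):
    """Distance between tile and global mean colors, scaled to [0, 1]."""
    return math.dist(mean_rgb(tile_pixels), mean_rgb(global_pixels)) / RGB_MAX_DISTANCE


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0
```

For example, `color_difference([(255, 255, 255)], [(0, 0, 0)])` evaluates to approximately 1, the largest value the measure can take.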
[0049] The comprehensive similarity value of the two tiles is calculated based on the differences in tile texture, tile color, and normalized center coordinates. The comprehensive similarity value constitutes the element values of the region bias matrix, thus obtaining the region bias matrix.
[0050] The calculation method for the comprehensive similarity value includes: for any two tiles, calculating the difference between their tile color differences and the difference between their tile texture differences, and subtracting the product of the two differences from 1 to obtain the similarity between the two tiles in color and texture; calculating the Euclidean distance between the normalized center coordinates of the two tiles, normalizing the Euclidean distance, and subtracting the normalized Euclidean distance from 1 to obtain the spatial similarity between the two tiles; multiplying the similarity in color and texture by the spatial similarity to obtain the comprehensive similarity value between the two tiles, which is the element value in the region bias matrix. The specific formula is as follows:
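[0051] $$B_{ij} = \left(1 - \left|D^{c}_{i} - D^{c}_{j}\right| \cdot \left|D^{t}_{i} - D^{t}_{j}\right|\right) \times \left(1 - \frac{\lVert p_i - p_j \rVert_2}{d_{\max}}\right)$$
[0052] In the formula, $B_{ij}$ represents the comprehensive similarity value of tile $i$ and tile $j$; $D^{t}_{i}$ and $D^{t}_{j}$ represent the tile texture differences of tile $i$ and tile $j$; $D^{c}_{i}$ and $D^{c}_{j}$ represent the tile color differences of tile $i$ and tile $j$; $p_i$ and $p_j$ represent the normalized center coordinates of tile $i$ and tile $j$; $d_{\max}$ represents the theoretical maximum value of the Euclidean distance between the normalized center coordinates.
[0053] $\left|D^{c}_{i} - D^{c}_{j}\right| \cdot \left|D^{t}_{i} - D^{t}_{j}\right|$ characterizes the distance between tile $i$ and tile $j$ in color and texture; $1 - \left|D^{c}_{i} - D^{c}_{j}\right| \cdot \left|D^{t}_{i} - D^{t}_{j}\right|$ characterizes the similarity of tile $i$ and tile $j$ in color and texture, denoted as $S^{ct}_{ij}$; $1 - \lVert p_i - p_j \rVert_2 / d_{\max}$ characterizes the spatial similarity of tile $i$ and tile $j$, denoted as $S^{sp}_{ij}$.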
[0054] At this point, the region bias matrix has been initialized. The region bias matrix mainly encodes the comprehensive similarity between tiles in terms of color, texture and spatial location. It reflects the strength of tile association under ideal conditions and injects prior knowledge that conforms to traditional Chinese medicine theory and adapts to individual differences into the attention mechanism of the visual Transformer model, thereby guiding the model to more accurately focus on facial region associations with diagnostic relevance.
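The construction of the region bias matrix described in [0050] can be sketched as follows. This is a reading of the prose, not the patent's own code: the two differences are taken in absolute value, and the maximum distance between normalized centers is assumed to be $\sqrt{2}$ (the diagonal of the unit square). The function names are hypothetical.

```python
import math


def color_texture_similarity(dc_i, dc_j, dt_i, dt_j):
    """1 minus the product of the color-difference gap and the texture-difference gap."""
    return 1.0 - abs(dc_i - dc_j) * abs(dt_i - dt_j)


def spatial_similarity(p_i, p_j, d_max=math.sqrt(2.0)):
    """1 minus the center distance, normalized by the assumed maximum distance."""
    return 1.0 - math.dist(p_i, p_j) / d_max


def region_bias_matrix(color_diff, texture_diff, centers):
    """Pairwise comprehensive similarity over all tiles."""
    n = len(centers)
    return [
        [
            color_texture_similarity(
                color_diff[i], color_diff[j], texture_diff[i], texture_diff[j]
            )
            * spatial_similarity(centers[i], centers[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two tiles side by side, with hypothetical per-tile differences.
bias = region_bias_matrix(
    color_diff=[0.2, 0.6],
    texture_diff=[0.9, 0.4],
    centers=[(0.25, 0.25), (0.75, 0.25)],
)
```

Each tile is fully similar to itself, so the diagonal is 1, and the matrix is symmetric because both factors depend only on the unordered pair of tiles.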
[0055] S22: Calculate the initial negative correlation matrix based on the color features, texture features, spatial features of the tile and the key facial region to which the tile belongs, and calculate the negative correlation bias matrix in real time based on the initial negative correlation matrix and the region bias matrix.
[0056] Individual skin color differences can affect the color and texture of symptoms, leading to distorted similarity of certain patch pairs in the region bias matrix. The negative correlation bias matrix needs to compensate for this: when some patch pairs in the region bias matrix exhibit falsely high similarity due to individual skin color differences, the negative correlation bias matrix should provide a lower bias value at the corresponding position, thereby weakening this false association during attention computation; conversely, when some patch pairs have weak associations but stable spatial structures in the region bias matrix, the negative correlation bias matrix should provide a higher compensatory bias. By establishing this inverse correspondence, the negative correlation bias matrix can guide the model to reduce its reliance on potentially distorted color and texture features in scenarios involving individual skin color differences, focusing more on reliable structural information.
[0057] For each tile in the current global image, calculate the color difference between the tile and its neighboring tiles, and average the resulting color differences. Calculate the texture difference between the tile and its neighboring tiles, and average the resulting texture differences. Based on the average color difference and the average texture difference, calculate the skin color difference influence factor, as shown in the formula below:
[0058]
[0059] In the formula, the terms are, in order: the skin color difference influence factor of tile $i$; the average of the tile color differences between tile $i$ and its adjacent tiles; the maximum, over all tiles, of this average color difference; the average of the tile texture differences between tile $i$ and its adjacent tiles; and the maximum, over all tiles, of this average texture difference.
[0060] The consistency adjustment factor for two tiles is calculated based on their category labels and skin color difference influence factors. When tile $i$ and tile $j$ have the same category label, the consistency adjustment factor is calculated using the following formula:
[0061]
[0062] In the formula, the terms are, in order: the consistency adjustment factor of tile $i$ and tile $j$; the skin color difference influence factor of tile $i$; and the skin color difference influence factor of tile $j$.
[0063] The consistency adjustment factor quantifies the joint confidence that tile $i$ and tile $j$ are unaffected, or only minimally affected, by individual skin color differences. Its value lies in [0,1]: the closer it is to 1, the more reliable the connection between tile $i$ and tile $j$; the closer it is to 0, the more likely it is that at least one end of the connection is severely contaminated by individual skin color differences, and the lower its reliability.
[0064] When tile $i$ and tile $j$ do not have the same category label, the consistency adjustment factor is calculated using the following formula:
[0065]
[0066] In the formula, the terms are, in order: the consistency adjustment factor of tile $i$ and tile $j$; the skin color difference influence factor of tile $i$; and the skin color difference influence factor of tile $j$.
[0067] In this case, the consistency adjustment factor is a decay factor based on the risk of individual skin color differences. It quantifies the reliability of cross-regional connections as a weight between 0 and 1, which is used directly to reduce the negative correlation bias matrix, so that the model can adjust its dependence on associations between different diagnostic regions according to the actual interference from individual skin color differences.
[0068] Based on the similarity of two tiles in color and texture, spatial similarity, and a consistency adjustment factor, initial factors for the two tiles are calculated. The initial factors between all pairs of tiles constitute an initial negative correlation matrix. The formula for calculating the initial factors between two tiles is as follows:
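[0069] $$N^{0}_{ij} = S^{sp}_{ij} \cdot \left(1 - \gamma_{ij}\right) \cdot S^{ct}_{ij}$$
[0070] In the formula, $N^{0}_{ij}$ represents the initial factor of tile $i$ and tile $j$; $S^{sp}_{ij}$ represents the spatial similarity of tile $i$ and tile $j$; $\gamma_{ij}$ represents the consistency adjustment factor of tile $i$ and tile $j$; $S^{ct}_{ij}$ represents the similarity of tile $i$ and tile $j$ in color and texture.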
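The initial factor of [0068] can be sketched as an element-wise product over three precomputed matrices. This is a minimal illustration with a hypothetical function name. The consistency adjustment factors are taken as given inputs here, because their formulas are not recoverable from the text.

```python
def initial_negative_correlation(ct_sim, sp_sim, consistency):
    """Element-wise (1 - consistency) * color/texture similarity * spatial similarity.

    All three arguments are N x N lists of lists with values in [0, 1].
    """
    n = len(consistency)
    return [
        [(1.0 - consistency[i][j]) * ct_sim[i][j] * sp_sim[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical values for two tiles.
initial = initial_negative_correlation(
    ct_sim=[[1.0, 0.8], [0.8, 1.0]],
    sp_sim=[[1.0, 0.5], [0.5, 1.0]],
    consistency=[[1.0, 0.75], [0.75, 1.0]],
)
```

A pair with a consistency adjustment factor of 1 gets an initial factor of 0, and the factor grows as the consistency adjustment factor falls.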
[0071] The ideal negative correlation bias matrix is calculated based on the region bias matrix. The specific method includes: calculating the difference between the region bias matrix and the minimum value of its elements, as the first difference; calculating the difference between the maximum value and the minimum value of the elements in the region bias matrix, as the second difference; calculating the ratio of the first difference to the second difference, and subtracting this ratio from 1 to obtain the ideal negative correlation bias matrix. The specific formula is expressed as follows:
[0072] $$N^{ideal} = 1 - \frac{B - B_{\min}}{B_{\max} - B_{\min}}$$
[0073] In the formula, $N^{ideal}$ represents the ideal negative correlation bias matrix; $B$ represents the region bias matrix; $B_{\min}$ represents the minimum value of the elements in the region bias matrix; $B_{\max}$ represents the maximum value of the elements in the region bias matrix.
[0074] The min-max normalization linearly maps all elements of the region bias matrix B to the interval [0,1] while preserving the relative order of the elements. Subtracting the normalized values from 1 then reverses them completely: large values become small and small values become large, which yields the negative correlation between the ideal negative correlation bias matrix and the region bias matrix.
[0075] The ideal negative correlation bias matrix therefore represents the bias that would completely cancel out the distortion that individual skin color differences introduce into the region bias matrix.
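The inverted min-max normalization of [0071] can be sketched as follows. This is a minimal illustration with a hypothetical function name. The degenerate case where all elements are equal is not covered by the text, so returning all zeros there is an assumption.

```python
def ideal_negative_correlation(bias):
    """1 - (B - min(B)) / (max(B) - min(B)), applied element-wise."""
    flat = [v for row in bias for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: with no spread there is nothing to invert.
        return [[0.0 for _ in row] for row in bias]
    return [[1.0 - (v - lo) / (hi - lo) for v in row] for row in bias]


ideal = ideal_negative_correlation([[1.0, 0.5], [0.5, 0.0]])
```

The largest entry of the region bias matrix maps to 0 and the smallest to 1, so the ordering of the elements is exactly reversed.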
[0076] The negative correlation bias matrix is calculated based on the initial negative correlation matrix and the ideal negative correlation bias matrix. Specifically, this includes: calculating the mean and variance of the initial negative correlation matrix; calculating the mean and variance of the ideal negative correlation bias matrix; calculating the Pearson correlation coefficient between the initial negative correlation matrix and the ideal negative correlation bias matrix based on the mean and variance of both matrices; calculating the difference between 1 and the absolute value of the Pearson correlation coefficient, and multiplying this difference by the initial negative correlation matrix to obtain the first product; calculating the product of the absolute value of the Pearson correlation coefficient and the ideal negative correlation bias matrix to obtain the second product; and finally, calculating the sum of the first and second products to obtain the negative correlation bias matrix. The specific formula is as follows:
[0077] $$N = \left(1 - |\rho|\right) N^{0} + |\rho|\, N^{ideal}$$
[0078] In the formula, $N$ represents the negative correlation bias matrix; $\rho$ represents the Pearson correlation coefficient between the initial negative correlation matrix and the ideal negative correlation bias matrix; $N^{0}$ represents the initial negative correlation matrix; $N^{ideal}$ represents the ideal negative correlation bias matrix.
[0079] Based on the degree of matching between the initial negative correlation matrix and the ideal negative correlation bias matrix, the formula dynamically determines how much the final negative correlation bias matrix relies on the learned pattern and how much on the theoretical ideal pattern.
[0080] When the initial negative correlation matrix and the ideal negative correlation bias matrix are highly consistent in pattern, the skin color difference interference pattern in the current image is relatively typical; directly using the ideal pattern, i.e., the ideal negative correlation bias matrix, is sufficiently effective and can stabilize the output and accelerate convergence. When there is almost no linear relationship between the initial negative correlation matrix and the ideal negative correlation bias matrix, the skin color differences in the current image are highly specific or complex (e.g., extremely uneven lighting, local occlusion, or extremely uneven skin tone), and the ideal negative correlation bias matrix cannot provide effective guidance; the formula then relies mainly on the more detailed compensation calculated specifically for this image, namely the initial negative correlation matrix. When the initial negative correlation matrix and the ideal negative correlation bias matrix have completely opposite patterns, the initial negative correlation matrix calculated for the current global image may be completely wrong (for example, extremely poor image quality or a feature extraction failure may yield a completely inverted compensation). In this case, the formula completely discards the erroneous initial negative correlation matrix and uses the ideal negative correlation bias matrix 100%, forcibly reversing the learning direction of the model as a conservative but safe fallback.
[0081] A clipping function in Python is used to restrict the values of the adjusted negative correlation bias matrix to the range [0,1], yielding the final negative correlation bias matrix.
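The blending and clipping steps can be illustrated with a small standard-library sketch. It assumes the blend weights are the absolute value of the Pearson correlation coefficient and its complement, which is the reading consistent with the three cases discussed above. The names `pearson` and `blend_negative_bias`, the example matrices, and the zero-variance fallback are assumptions of this sketch; a production implementation would more likely rely on an array library's clipping routine.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: with no variance, treat the matrices as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def blend_negative_bias(N_init, N_ideal):
    """Blend the two matrices by |rho| and clip every element to [0, 1]."""
    flat_init = [v for row in N_init for v in row]
    flat_ideal = [v for row in N_ideal for v in row]
    w = abs(pearson(flat_init, flat_ideal))
    return [
        [min(1.0, max(0.0, (1.0 - w) * a + w * b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(N_init, N_ideal)
    ]


# Hypothetical matrices: the initial matrix is the exact inverse of the ideal
# one, so rho is -1 and the blend falls back entirely to the ideal matrix.
N_ideal = [[0.0, 1.0], [1.0, 0.5]]
N_opposite = [[1.0, 0.0], [0.0, 0.5]]
N_final = blend_negative_bias(N_opposite, N_ideal)
```

Because the weight depends only on the magnitude of the correlation, both a strongly positive and a strongly negative correlation push the result toward the ideal matrix, while a near-zero correlation keeps the initial matrix.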
[0082] S23: Adjust the standard attention calculation using the region association mask matrix, region bias matrix, and negative correlation bias matrix.
[0083] The tile matrix obtained in step S20 is input into the visual Transformer model to obtain the query matrix $Q$, the key matrix $K$, and the value matrix $V$. The attention calculation in the visual Transformer model is improved using the region association mask matrix, region bias matrix, and negative correlation bias matrix, with the following formula:
[0084] $$A(Q, K) = \mathrm{softmax}\left(\frac{(QK^{T}) \odot M}{\sqrt{d_k}} + B + N\right)$$
[0085] In the formula, $A(Q, K)$ represents the attention weight matrix of the query matrix $Q$ and the key matrix $K$; $K^{T}$ is the transpose of the key matrix $K$; $QK^{T}$ represents the global similarity matrix; $\odot$ represents the Hadamard product, i.e., element-wise multiplication; $M$ represents the binary mask matrix; $d_k$ represents the feature dimension after projection; $B$ represents the region bias matrix; $N$ represents the negative correlation bias matrix; $\mathrm{softmax}$ represents the activation function that normalizes each row of the attention weight matrix, ensuring that the attention weights from each query position to all key positions sum to 1.
[0086] The global similarity matrix reflects the basic association strength between the features of any two tiles, that is, the degree of matching between the query vector of one tile and the key vector of another. Multiplying the global similarity matrix element-wise by the binary mask matrix strengthens attention connections between tiles within the same diagnostic region and weakens attention connections between different diagnostic regions. The region bias matrix adds a positive bias (initial value 0.5) to tile pairs within the same region, directly raising the attention scores of these tile pairs. The negative correlation bias matrix negatively biases the attention scores of certain tiles, especially in regions prone to spurious associations caused by individual skin color differences. Together, these adjustments improve the diagnostic accuracy, robustness, and interpretability of the final model.
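A minimal sketch of the adjusted attention weights may help make the order of operations concrete: mask the similarity scores, scale by the square root of the feature dimension, add the two bias matrices, then normalize each row. It uses only the standard library and assumes mask values of 1 for same-region pairs and 0 otherwise (the claims speak only of first and second preset values). The function names and the three-tile example are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_attention(Q, K, M, B, N):
    """Row-normalized attention weights with mask M and biases B and N."""
    scale = math.sqrt(len(Q[0]))  # square root of the projected dimension
    weights = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            similarity = sum(a * b for a, b in zip(q, k))  # entry of Q K^T
            scores.append(similarity * M[i][j] / scale + B[i][j] + N[i][j])
        weights.append(softmax(scores))
    return weights


# Hypothetical example: tiles 0 and 1 share a region, tile 2 is in another.
Q = K = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
M = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]             # assumed 1/0 preset values
B = [[0.5 * m for m in row] for row in M]         # 0.5 bias inside a region
N = [[0.0] * 3 for _ in range(3)]                 # no compensation here
A = adjusted_attention(Q, K, M, B, N)
```

Even though all three tiles have identical features in this example, tile 0 attends more strongly to tile 1 than to tile 2, because only the same-region pair keeps its similarity score and receives the positive bias.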
[0087] S3: The improved visual Transformer model outputs the global disease probability distribution vector and the regional importance weight vector corresponding to the global image, and obtains the diagnostic results based on the global disease probability distribution vector.
[0088] After hierarchical feature fusion by an L-layer encoder, the improved visual Transformer model finally outputs a global disease probability distribution vector, which represents the probability that the input global image belongs to each TCM syndrome type (such as Qi deficiency, blood deficiency, Yin deficiency, etc.).
[0089] The importance score for each patch is obtained from the attention matrix of the last Transformer encoder layer. Based on the category label of each patch, the patch is assigned to the corresponding key facial region, and the importance score for each key facial region is calculated based on the importance scores of each patch. The specific formula is as follows:
[0090] $$I_r = \frac{1}{n_r} \sum_{i \in r} s_i$$
[0091] In the formula, $I_r$ represents the importance score of key facial region $r$; $n_r$ represents the total number of patches belonging to key facial region $r$; $s_i$ represents the importance score of patch $i$.
[0092] A normalization function converts the importance scores of the five key facial regions into percentages, giving the weights of the five key facial regions and forming a 5-dimensional region importance weight vector. This region importance weight vector is visualized using heatmaps and other methods, showing which facial regions the diagnostic decision primarily relied on. This enhances the model's interpretability and matches the reasoning habits of traditional Chinese medicine diagnosis.
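The aggregation from patch scores to region weights can be sketched as follows. Several details are assumptions rather than statements of the source: per-region scores are taken as the mean of the patch scores, the normalization divides by the sum (the name of the normalization function did not survive in the text, and a softmax would be an equally plausible reading), and regions with no patches receive a score of 0. The function name and example values are hypothetical.

```python
def region_importance_weights(patch_scores, patch_labels, num_regions=5):
    """Average patch scores per region, then normalize the region scores."""
    sums = [0.0] * num_regions
    counts = [0] * num_regions
    for score, region in zip(patch_scores, patch_labels):
        sums[region] += score
        counts[region] += 1
    # Assumption: a region without any patch gets an importance score of 0.
    region_scores = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(region_scores)
    if total == 0:
        raise ValueError("all region importance scores are zero")
    # Assumption: sum normalization, so the weights add up to 1.
    return [score / total for score in region_scores]


# Hypothetical scores for four patches assigned to three regions.
patch_scores = [0.2, 0.4, 0.1, 0.3]
patch_labels = [0, 0, 1, 2]
weights = region_importance_weights(patch_scores, patch_labels, num_regions=3)
```

The resulting vector can be multiplied by 100 to obtain percentages for display in a heatmap legend.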
[0093] The diagnostic results (i.e., the global disease probability distribution vector, including the main syndrome types and their probabilities) are displayed side by side with the region importance heatmap and presented to users or doctors to complete intelligent assisted diagnosis.
[0094] This application also discloses a facial diagnosis image intelligent processing system based on an artificial intelligence model. The system includes a processor and a memory. The memory stores computer program instructions. When the computer program instructions are executed by the processor, the facial diagnosis image intelligent processing method based on an artificial intelligence model according to the above embodiments of the present invention is implemented.
[0095] The system also includes other components well known to those skilled in the art, such as communication buses and communication interfaces, the settings and functions of which are known in the art and will not be described in detail here.
[0096] It should be noted that those skilled in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of this invention. Therefore, the scope of protection of this patent should be determined by the appended claims.
Claims
1. An intelligent processing method for facial diagnosis images based on an artificial intelligence model, characterized in that it includes: obtaining a global image of the user's face and extracting multiple key facial regions through semantic segmentation; inputting the global image of the face into an improved visual Transformer model for processing; outputting, by the improved visual Transformer model, a global disease probability distribution vector and a region importance weight vector corresponding to the global image, and obtaining the diagnostic result based on the global disease probability distribution vector; wherein the processing of the global image by the improved visual Transformer model includes: dividing the global image into multiple patches, determining the key facial region to which each patch belongs based on the semantic segmentation results, and generating a region association mask matrix based on the key facial regions to which the patches belong; calculating the RGB average value of all pixels in the global image and in each tile to obtain global color features and tile color features; calculating the LBP feature vector of the global image and of each tile to obtain global texture features and tile texture features; calculating the normalized center coordinates of the tiles based on the global image size and tile size; calculating the element values of the region bias matrix based on the global color features, tile color features, global texture features, tile texture features, and normalized center coordinates of the tiles to obtain the region bias matrix;
calculating the Euclidean distance between the tile color features and the global color features, and dividing this Euclidean distance by the theoretical maximum Euclidean distance in the RGB color space to obtain the tile color difference; calculating the cosine similarity between the tile texture features and the global texture features to obtain the tile texture difference; calculating the initial negative correlation matrix based on the tile color difference, tile texture difference, and the key facial region to which the tile belongs; calculating the ideal negative correlation bias matrix based on the region bias matrix; calculating the negative correlation bias matrix based on the initial negative correlation matrix and the ideal negative correlation bias matrix; and adjusting the standard attention calculation using the region association mask matrix, region bias matrix, and negative correlation bias matrix: the element-wise product of the global similarity matrix and the region association mask matrix is calculated, the element-wise product is divided by the scaling factor of the feature dimension to obtain an intermediate result, the intermediate result is added to the region bias matrix and the negative correlation bias matrix to obtain the attention score matrix, and finally the attention score matrix is normalized to obtain the adjusted attention weight matrix.
2. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the region association mask matrix is a binary mask matrix; for any element in the binary mask matrix, when the two patches corresponding to that element belong to the same key facial region, the element is a first preset value; when the two patches belong to different key facial regions, the element is a second preset value.
3. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the calculation of the element values of the region bias matrix includes: for any two tiles, calculating the difference between their tile color differences and the difference between their tile texture differences, and subtracting the product of the two differences from 1 to obtain the similarity in color and texture; calculating the Euclidean distance between the normalized center coordinates of the two tiles, normalizing the Euclidean distance, and subtracting the normalized Euclidean distance from 1 to obtain the spatial similarity; and multiplying the similarity in color and texture by the spatial similarity to obtain the element value.
4. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the calculation of the initial negative correlation matrix based on tile color difference, tile texture difference, and the key facial region to which the tile belongs includes: calculating a skin color difference influence factor based on tile color difference and tile texture difference; for any two tiles, selecting a preset calculation method according to the key facial regions to which the tiles belong, and calculating a consistency adjustment factor based on the calculation method and the skin color difference influence factor; calculating the difference between 1 and the consistency adjustment factor, and multiplying the difference by the similarity of the two tiles in color and texture and by their spatial similarity to obtain the initial factor; the initial factors between all pairs of tiles constitute the initial negative correlation matrix.
5. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the calculation of the ideal negative correlation bias matrix based on the region bias matrix includes: calculating the difference between each element of the region bias matrix and the minimum value of the elements in the region bias matrix, as the first difference; calculating the difference between the maximum value and the minimum value of the elements in the region bias matrix, as the second difference; calculating the ratio of the first difference to the second difference, and subtracting the ratio from 1 to obtain the ideal negative correlation bias matrix.
6. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the calculation of the negative correlation bias matrix based on the initial negative correlation matrix and the ideal negative correlation bias matrix includes: calculating the mean and variance of the initial negative correlation matrix, calculating the mean and variance of the ideal negative correlation bias matrix, and calculating the Pearson correlation coefficient based on the mean and variance of the initial negative correlation matrix and of the ideal negative correlation bias matrix; calculating the difference between 1 and the absolute value of the Pearson correlation coefficient, and multiplying the difference by the initial negative correlation matrix to obtain the first product; calculating the product of the absolute value of the Pearson correlation coefficient and the ideal negative correlation bias matrix to obtain the second product; and calculating the sum of the first product and the second product to obtain the negative correlation bias matrix.
7. The intelligent image processing method for facial diagnosis based on an artificial intelligence model according to claim 1, characterized in that the method for obtaining the region importance weight vector includes: obtaining the importance score of each patch from the attention matrix of the last layer of the improved visual Transformer model, aggregating the importance scores of all patches belonging to the same key facial region to obtain the importance score of each key facial region, and normalizing the importance scores of all key facial regions to obtain the region importance weight vector.
8. A facial diagnosis image intelligent processing system based on an artificial intelligence model, characterized in that it includes: a processor and a memory, wherein the memory stores computer program instructions that, when executed by the processor, implement the intelligent processing method for facial diagnosis images based on an artificial intelligence model according to any one of claims 1-7.