Deep learning-based eating micro-expression and food satisfaction correlation analysis method and application

By analyzing users' micro-expressions and physiological health indicators during eating using deep learning technology, personalized menu recommendations are generated, resolving the contradiction between user satisfaction and health constraints in existing meal planning systems and achieving a balanced recommendation that considers both health and taste.

CN122244925A · Pending · Publication Date: 2026-06-19 · YUNNAN AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
YUNNAN AGRICULTURAL UNIVERSITY
Filing Date
2026-03-26
Publication Date
2026-06-19

Smart Images

  • Figure CN122244925A_ABST
Patent Text Reader

Abstract

This invention relates to the field of health data analysis technology, and particularly to a method and application for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning. The method includes the following steps: S1, capturing facial video images of users during the eating process using a camera device, and extracting eating facial feature sequences from them; S2, separating chewing actions and facial muscle movements from the eating facial feature sequences based on the spatial displacement differences of facial key points, and obtaining target micro-expression feature vectors. In this invention, by cross-evaluating the captured eating micro-expression emotional state with the user's physiological health indicators, nutritional threshold constraints and weight corrections are applied to the initial food preferences generated based on emotions. This allows for strict control of the intake of core risk nutrients while catering to the user's personal taste satisfaction, ultimately generating personalized recommended recipes that balance emotional experience and medical health standards.

Description

Technical Field

[0001] This invention relates to the field of health data analysis technology, and in particular to a method and application for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning.

Background Technology

[0002] With the deep integration and rapid evolution of digital health management, IoT technology, and intelligent recommendation algorithms, personalized meal planning systems for patients with chronic diseases, people with hypertension or hyperglycemia, and people in a sub-healthy state are gradually becoming widespread and are increasingly important tools for precision nutrition intervention. Such systems typically collect an individual's static physiological parameters, such as height, weight, and basal metabolic rate, and combine them with a taste preference questionnaire filled out manually by the user at initial registration to construct a basic user profile. In daily use, the system either directly distributes standardized nutritional meal plans for a fixed period using expert-preset dietary guide templates, or tracks the user's historical ordering frequency and payment records on a smart ordering platform and applies conventional collaborative filtering to guess the user's dietary preferences, then automatically generates daily recommended menus that appear to match the user's taste and claim to meet the daily total calorie requirement. However, existing meal recommendation systems have significant shortcomings in balancing user satisfaction against medical and health constraints. They often either exceed the limits for risky ingredients such as sugar and fat by over-pursuing user taste, or generate bland recipes by rigidly adhering to medical nutrition indicators, which leads to strong user resistance to eating and very poor long-term compliance.

Summary of the Invention

[0003] To address the above shortcomings, this invention provides a method and application for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning, aiming to remedy the significant deficiencies of existing meal recommendation systems in balancing user satisfaction against medical and health constraints.

[0004] In a first aspect, the present invention provides the following technical solution: a method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning, comprising the following steps:
S1. Collect facial video images of the user during eating using camera equipment, and extract the eating facial feature sequence from them;
S2. Based on the spatial displacement differences of facial key points, separate chewing actions from facial muscle movements in the eating facial feature sequence to obtain the target micro-expression feature vector;
S3. Input the target micro-expression feature vector into a pre-trained deep learning micro-expression recognition model for feature extraction and classification, and output multi-dimensional emotional state labels;
S4. Map and fuse the multi-dimensional emotional state labels with the dish intake records obtained during the same time period in which the facial video images were captured, to construct an initial emotional-food preference profile;
S5. Obtain the user's physiological health indicators, evaluate the initial emotional-food preference profile against preset nutritional threshold rules, and generate a preference correction matrix;
S6. Based on the preference correction matrix, numerically adjust the preset basic recommendation weights of dishes to obtain the target flavor recommendation weight parameters;
S7. Based on the target flavor recommendation weight parameters, match and filter in the preset dish database to generate the target recommended recipe.

[0005] Preferably, in step S1, collecting facial video images of the user during eating using camera equipment and extracting the eating facial feature sequence from them specifically includes the following steps: The system captures the raw facial video stream of the user during eating using camera equipment. The raw facial video stream is segmented frame by frame to obtain continuous facial image frames. Facial key points are located in the continuous facial image frames using an active appearance model, and the facial key point coordinate sequence is extracted. The facial key point coordinate sequences are concatenated along the time dimension to obtain the eating facial feature sequence.

[0006] Preferably, in step S2, obtaining the target micro-expression feature vector specifically includes the following steps: Extract, from the eating facial feature sequence, the relative displacement signals between the mandibular key point region and the key point regions at the corners of the eyes and lips. A low-pass filter is used to filter out the high-frequency displacement signal representing chewing action in the relative displacement signal, while retaining the low-frequency displacement signal representing facial muscle movement. The low-frequency displacement signal is vectorized to obtain the target micro-expression feature vector.

[0007] Preferably, in step S3, outputting the multi-dimensional emotional state labels specifically includes the following steps: The target micro-expression feature vector is input into the convolutional layer of the pre-trained deep learning micro-expression recognition model to extract local spatial features and obtain a facial latent feature map. The facial latent feature map is input into the long short-term memory network layer of the pre-trained deep learning micro-expression recognition model to extract temporal features and obtain the temporal emotion feature vector. The temporal emotion feature vector is mapped to a preset emotional state classification space through a fully connected layer, and the multi-dimensional emotional state labels are output.

[0008] Preferably, in step S4, constructing the initial emotional-food preference profile specifically includes the following steps: Acquire the smart device weighing data and dish identification code corresponding to the same time period as the facial video images, parse them to obtain the dish ingredient information, and generate a dish intake record. The dish ingredient information in the dish intake record is mapped and bound to the multi-dimensional emotional state labels to generate emotional-ingredient correspondence pairs. The emotional-ingredient correspondence pairs are used to update the preset user historical preference matrix and construct the initial emotional-food preference profile.

[0009] Preferably, in step S5, generating the preference correction matrix specifically includes the following steps: The user's physiological health indicators are retrieved through the database interface. The physiological health indicators are input into the nutritional threshold rules to calculate the upper intake threshold for each nutrient component. The initial emotional-food preference profile is analyzed, and the target food ingredients whose multi-dimensional emotional state labels are associated with preset positive emotional dimensions are extracted. The intake of the target food ingredients is calculated from the dish intake records and compared with the upper intake threshold to calculate the ingredient spillover coefficient. A corresponding numerical adjustment factor is generated from the ingredient spillover coefficient, and the preference correction matrix is generated using the numerical adjustment factor.

[0010] Preferably, in step S6, obtaining the target flavor recommendation weight parameters specifically includes the following steps: Obtain the preset basic recommendation weights for dishes for the user; The preset basic recommendation weights of the dishes are multiplied by the preference correction matrix to obtain the weight adjustment offset. The weight adjustment offset is superimposed on the preset basic recommendation weight of the dish and normalized to obtain the target flavor recommendation weight parameter.
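The weight adjustment in step S6 can be illustrated with a minimal sketch. The text does not state the shape of the preference correction matrix or the normalization used, so the sketch assumes a square matrix acting on the weight vector, clips negative weights to zero, and normalizes so the weights sum to one. The function name `adjust_weights` and the sample numbers are hypothetical.

```python
import numpy as np

def adjust_weights(base_weights, correction):
    """Apply a correction matrix to base weights and renormalize (assumed L1)."""
    base = np.asarray(base_weights, dtype=float)
    # Weight adjustment offset: correction matrix times the base weights.
    offset = np.asarray(correction, dtype=float) @ base
    # Superimpose the offset; clipping at zero is an assumption of this sketch.
    adjusted = np.clip(base + offset, 0.0, None)
    total = adjusted.sum()
    return adjusted / total if total > 0 else adjusted

base = [0.5, 0.3, 0.2]
# Hypothetical correction: halve the weight of the first flavor dimension.
correction = np.diag([-0.5, 0.0, 0.0])
target = adjust_weights(base, correction)
```

With these numbers the first weight falls from 0.5 to about 0.33 after renormalization, while the other two rise proportionally.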

[0011] Preferably, in step S7, generating the target recommended recipe specifically includes the following steps: Extract multiple candidate dishes and the corresponding candidate dish feature vectors from the preset dish database; Calculate the cosine similarity between the target flavor recommendation weight parameter and the feature vector of each candidate dish; The candidate dishes are sorted in descending order of cosine similarity, and the target candidate dish combinations that fall within a preset threshold in the sorting results are extracted to generate the target recommended recipe.
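As a rough illustration of the matching in step S7, the sketch below ranks candidate dishes by cosine similarity to the target weight vector. It assumes the "preset threshold in the sorting results" means keeping the top `top_k` entries; the function name `rank_dishes`, the three-component flavor space, and the sample dishes are all hypothetical.

```python
import numpy as np

def rank_dishes(target_weights, dish_features, top_k):
    """Return indices of the top_k dishes by cosine similarity, best first."""
    target = np.asarray(target_weights, dtype=float)
    feats = np.asarray(dish_features, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(feats, axis=1) * np.linalg.norm(target)
    sims = feats @ target / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims, kind="stable")  # descending similarity
    return order[:top_k], sims

# Hypothetical 3-component flavor space and four candidate dishes.
target = [0.6, 0.3, 0.1]
dishes = [[0.6, 0.3, 0.1],   # same direction as the target
          [0.1, 0.3, 0.6],
          [1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]
top, sims = rank_dishes(target, dishes, top_k=2)
```

Here the first dish matches the target direction exactly and the third dish is the next closest, so `top` holds indices 0 and 2.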

[0012] In a second aspect, the present invention provides the following technical solution: an application of the deep learning-based method for analyzing the correlation between eating micro-expressions and food satisfaction, in which the method is applied to a personalized health meal planning system for a specific population.

[0013] The present invention has the following beneficial effects: 1. In this invention, by cross-evaluating the captured micro-expression emotions during eating with the user's physiological health indicators, nutritional threshold constraints and weight corrections are applied to the initial food preferences generated based on emotions. This allows for strict control over the intake of core risk nutrients while catering to the user's personal taste satisfaction, ultimately generating personalized recommended recipes that take into account both emotional experience and medical health standards.

[0014] 2. This invention separates the high-frequency chewing physical displacement and the low-frequency facial muscle movement displacement in a continuous eating facial feature sequence, effectively filtering out the deformation interference caused by the violent opening and closing of the jaw in the eating scene, thereby extracting a pure target micro-expression feature vector and improving the accuracy of deep learning models in classifying eating emotions.

[0015] 3. This invention binds the visually identified multi-dimensional emotional category labels to the weighed dish intake data obtained synchronously by smart devices through feature mapping, so as to transform subjective and difficult-to-quantify dining emotions into preference weight values for specific food components, thereby constructing an objective and rigorous digital preference profile that does not require manual scoring.

Attached Figure Description

[0016] Figure 1 is a flowchart of the deep learning-based method for analyzing the correlation between eating micro-expressions and food satisfaction proposed in this invention.

Detailed Implementation

[0017] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] This invention provides a method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning. As shown in Figure 1, it includes the following steps: S1. Collect facial video images of the user during eating using camera equipment, and extract the eating facial feature sequence from them. Furthermore, S1 specifically includes the following steps: The system captures the raw facial video stream of the user during eating using camera equipment. The raw facial video stream is segmented frame by frame to obtain continuous facial image frames. Facial key points are located in the continuous facial image frames using an active appearance model, and the facial key point coordinate sequence is extracted. The facial key point coordinate sequences are concatenated along the time dimension to obtain the eating facial feature sequence.

[0019] Specifically, the system acquires raw facial video streams of the user during eating using a camera positioned directly in front of the user. To fully capture the extremely short-lived micro-expression changes, the camera continuously captures images of the user's facial area at a fixed high sampling frame rate, obtaining dynamic video files containing the user's gaze shifts, chewing movements, and facial muscle changes. The processing terminal then performs frame-by-frame segmentation of the raw facial video stream according to a fixed time step, discarding redundant information between frames to obtain a time-ordered sequence of facial image frames.

[0020] After acquiring the facial image frame sequence, the system uses a pre-trained active appearance model to locate facial key points and extract their coordinate sequences. The active appearance model fits the user's current facial state by fusing a shape model and a global texture model. The system inputs each facial image frame into the active appearance model and iteratively finds the optimal position of each facial key point by minimizing the texture error between the target facial image and the model-synthesized image. The error minimization objective function for this process is:

$$E(\mathbf{p}) = \left\| I_t\big(W(\mathbf{u}; \mathbf{p})\big) - \Big(A_0(\mathbf{u}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{u})\Big) \right\|_2^2$$

and the shape parameter vector that minimizes the error is:

$$\mathbf{p}^{*} = \arg\min_{\mathbf{p}} E(\mathbf{p})$$

where $E(\mathbf{p})$ is the appearance matching error calculated by the model on the current facial image frame; $\|\cdot\|_2^2$ is the square of the L2 norm; $I_t$ denotes the pixel features of the $t$-th facial image frame; $W(\mathbf{u}; \mathbf{p})$ is the global geometric deformation mapping function controlled by the shape parameter vector $\mathbf{p}$; $\mathbf{u}$ denotes the base reference coordinates of the defined facial key points; $A_0(\mathbf{u})$ is the average facial texture predefined by the active appearance model at the base reference coordinates $\mathbf{u}$; $m$ is the total number of appearance principal components; $\lambda_i$ is the weight coefficient of the $i$-th appearance principal component; and $A_i(\mathbf{u})$ is the $i$-th appearance principal component feature at the base reference coordinates $\mathbf{u}$. The shape parameter vector $\mathbf{p}$ is iteratively optimized using the Gauss-Newton method until the error $E(\mathbf{p})$ converges, at which point the precise pixel locations of the facial key points in the current frame are output. For the $t$-th facial image frame, the output facial key point coordinate sequence is:

$$L_t = [x_{t,1}, y_{t,1}, x_{t,2}, y_{t,2}, \ldots, x_{t,N}, y_{t,N}]^{\top}$$

where the superscript $\top$ denotes the transpose of a vector, $N$ is the total number of facial key points extracted, and $x_{t,n}$ and $y_{t,n}$ are the horizontal and vertical pixel coordinates of the $n$-th facial key point in the $t$-th frame. These key points cover the user's jawline and the core dynamic areas for eating, such as the muscles at the corners of the eyes and lips.

[0021] After locating the key points in all image frames, the system concatenates the facial key point coordinate sequences extracted from each frame along the time dimension to obtain the eating facial feature sequence. The time window for one eating micro-expression analysis is set to $T$ frames. The system arranges the facial key point coordinate sequences $L_1, L_2, \ldots, L_T$ of the frames within the time window in chronological order to generate the feature matrix. The concatenation formula is:

$$F = [L_1, L_2, \ldots, L_T]$$

The feature matrix $F$ is the final extracted eating facial feature sequence.
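The concatenation described above can be sketched in a few lines, assuming each frame's key points arrive as an array of (x, y) pairs from some upstream landmark detector. The helper name `build_feature_matrix` and the sample coordinates are hypothetical; the active appearance model fitting itself is not shown.

```python
import numpy as np

def build_feature_matrix(frames_keypoints):
    """Stack per-frame (N, 2) key point arrays into a (2N, T) matrix."""
    columns = []
    for pts in frames_keypoints:
        pts = np.asarray(pts, dtype=float)
        # Flatten (N, 2) row-major so the column reads x1, y1, x2, y2, ...
        columns.append(pts.reshape(-1))
    # One column per frame, in chronological order.
    return np.stack(columns, axis=1)

# Hypothetical window: 3 frames, 2 key points per frame.
frames = [[(10, 20), (30, 40)],
          [(11, 21), (31, 41)],
          [(12, 22), (32, 42)]]
F = build_feature_matrix(frames)
```

Each column of `F` holds the interleaved coordinates of one frame, so the matrix has twice as many rows as key points and one column per frame.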

[0022] By transforming unstructured facial video streams into structured spatial coordinate sequence matrices, the absolute spatial position changes of various facial regions over time were quantified, providing a basic input data source for the next step of separating high-frequency displacement signals representing chewing and low-frequency displacement signals representing facial muscles.

[0023] S2. Based on the spatial displacement difference of facial key points, the chewing action and facial muscle movement of the eating facial feature sequence are separated to obtain the target micro-expression feature vector. Furthermore, in S2, obtaining the target micro-expression feature vector specifically includes the following steps: Extract the relative displacement signals of the mandibular key point region and the eye and lip key point regions from the facial feature sequence of eating; A low-pass filter is used to filter out the high-frequency displacement signal representing chewing action in the relative displacement signal, while retaining the low-frequency displacement signal representing facial muscle movement. The low-frequency displacement signal is vectorized to obtain the target micro-expression feature vector.

[0024] Specifically, the system takes the eating facial feature sequence matrix $F$ generated in the preceding step, which covers the $T$ frames of the time window. Based on a preset facial region index table, the extracted facial key points are divided into the mandibular key point region and the key point regions at the corners of the eyes and lips. To eliminate the rigid spatial displacement caused by overall head movement during eating, the system uses the coordinates of the center key point of the mandibular region as a spatial reference anchor and extracts, for each frame, the Euclidean distance of each key point at the corners of the eyes and lips relative to that anchor. Let the index of the mandibular center key point be $c$, and let the set of key points in the eye-corner and lip-corner regions be $\Omega$. Then, for the $k$-th key point in $\Omega$, the relative displacement signal in the $t$-th frame is:

$$d_{k,t} = \sqrt{(x_{t,k} - x_{t,c})^2 + (y_{t,k} - y_{t,c})^2}$$

where $d_{k,t}$ is the relative spatial displacement of the $k$-th eye-corner or lip-corner key point with respect to the mandibular center key point in the $t$-th frame; $x_{t,k}$ and $y_{t,k}$ are the horizontal and vertical pixel coordinates of the $k$-th eye-corner or lip-corner key point in the $t$-th frame, extracted from the feature matrix $F$; and $x_{t,c}$ and $y_{t,c}$ are the horizontal and vertical pixel coordinates of the mandibular center key point $c$ in the $t$-th frame, extracted from the feature matrix $F$. The system iterates over all $T$ frames and all key points in $\Omega$ to extract a time series of relative displacement signals that changes dynamically with chewing and facial expressions.
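A minimal sketch of this anchor-relative distance computation follows. It assumes the feature matrix stores interleaved x and y rows per key point with one column per frame, and it uses 0-based key point indices. The function name `relative_displacement` and the toy coordinates are hypothetical.

```python
import numpy as np

def relative_displacement(F, anchor, region):
    """F: (2N, T) matrix of interleaved x, y rows; returns (len(region), T)."""
    xs, ys = F[0::2, :], F[1::2, :]      # split into (N, T) x and y arrays
    dx = xs[region, :] - xs[anchor, :]   # broadcast the anchor over the region
    dy = ys[region, :] - ys[anchor, :]
    return np.hypot(dx, dy)              # Euclidean distance per key point and frame

# Toy data: key point 0 is the anchor, key points 1 and 2 form the region.
F = np.array([[0.0, 0.0],   # x of key point 0 in frames 1 and 2
              [0.0, 0.0],   # y of key point 0
              [3.0, 6.0],   # x of key point 1
              [4.0, 8.0],   # y of key point 1
              [0.0, 0.0],   # x of key point 2
              [5.0, 5.0]])  # y of key point 2
d = relative_displacement(F, anchor=0, region=[1, 2])
```

Because only distances to the anchor are kept, a rigid translation of the whole face between frames leaves `d` unchanged.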

[0025] Because the chewing motion during eating has pronounced periodic mechanical characteristics and its spatial displacement frequency is generally higher than the frequency of sustained facial muscle contraction, the system applies a finite impulse response (FIR) low-pass filter to the extracted relative displacement time series by one-dimensional convolution. The filter blocks the high-frequency displacement signal produced by the vigorous opening and closing of the jaw while chewing, and passes the weak low-frequency displacement signal of the eye and lip muscle movements that reflects the user's eating emotions. The low-pass filter is calculated as:

$$\tilde{d}_{k,t} = \sum_{j=0}^{M} h_j \, d_{k,t-j}$$

where $\tilde{d}_{k,t}$ is the low-frequency displacement signal, representing facial muscle movement, retained after filtering for the $k$-th eye-corner or lip-corner key point in the $t$-th frame; $j$ is the delay index of the convolution; $M$ is the order of the filter; $h_j$ is the $j$-th filter coefficient; and $d_{k,t-j}$ is the relative displacement signal value of the input $j$ frames earlier. When the time index $t-j$ is less than or equal to zero, the corresponding relative displacement signal value is set to zero. Through this convolution, the system filters out the high-frequency chewing deformation signal and retains the low-frequency displacement signal.
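The causal convolution can be sketched as below. The text does not give filter coefficients, so the example assumes a 5-tap moving average purely for illustration; the function name `fir_lowpass` and the synthetic signals are likewise hypothetical.

```python
import numpy as np

def fir_lowpass(signal, coeffs):
    """Causal FIR filter: out[t] = sum_j coeffs[j] * signal[t - j], zero-padded."""
    signal = np.asarray(signal, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Full convolution truncated to the input length gives the causal output,
    # with samples before the start of the window treated as zero.
    return np.convolve(signal, coeffs)[: len(signal)]

# Assumed 5-tap moving average (order 4), a crude low-pass filter.
h = np.ones(5) / 5
t = np.arange(200)
slow = np.sin(2 * np.pi * t / 100)       # slow drift standing in for muscle movement
fast = 0.5 * np.sin(2 * np.pi * t / 5)   # fast periodic component, period 5 frames
filtered = fir_lowpass(slow + fast, h)
```

A 5-tap average sums to zero over one full period of the fast component, so once the first four start-up samples have passed the output matches the filtered slow signal alone. A practical design would choose the cutoff from the camera frame rate and the observed chewing rate.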

[0026] After low-pass filtering all key points at the corners of the eyes and lips, the system concatenates and vectorizes the low-frequency displacement signals of each key point within the time window. Let the eye-corner and lip-corner regions contain $K$ key points in total. The system flattens and merges all low-frequency displacement signals into a one-dimensional feature vector, ordered first by key point index and then by frame. The vectorized concatenation formula is:

$$V = [\tilde{d}_{1,1}, \tilde{d}_{1,2}, \ldots, \tilde{d}_{1,T}, \tilde{d}_{2,1}, \ldots, \tilde{d}_{K,T}]^{\top}$$

where $V$ is the final extracted target micro-expression feature vector; $\tilde{d}_{k,t}$ is the low-frequency displacement signal value of the $k$-th eye-corner or lip-corner key point in the $t$-th frame; $k$ ranges over the integers from 1 to $K$ and $t$ over the integers from 1 to $T$; $K$ is the total number of key points in the eye-corner and lip-corner regions; $T$ is the total number of frames in the time window; and the superscript $\top$ denotes the transpose of a vector.
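The flattening, and the inverse reshaping that a time-series model would need, can be sketched as follows. The ordering (all frames of the first key point, then all frames of the second, and so on) is an assumption read from "key point index order and time frame order"; the array values are made up.

```python
import numpy as np

K, T = 2, 3  # hypothetical: 2 key points, 3 frames
low_freq = np.array([[0.1, 0.2, 0.3],    # key point 1 over frames 1..3
                     [1.1, 1.2, 1.3]])   # key point 2 over frames 1..3

# Row-major flattening: all frames of key point 1, then all frames of key point 2.
V = low_freq.reshape(-1)

# Inverse view: one K-dimensional vector per time step.
per_step = V.reshape(K, T).T
```

`per_step[0]` collects the values of every key point in the first frame, which is the per-time-step layout a convolutional or recurrent layer consumes.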

[0027] This step decouples the mixed facial movements in the eating scene, eliminates the irrelevant deformation caused by chewing food, and extracts facial muscle feature vectors that reflect satisfaction. This provides clean input data for subsequent feature extraction and emotion classification in the deep learning model.

[0028] S3. Input the target micro-expression feature vector into a pre-trained deep learning micro-expression recognition model for feature extraction and classification, and output multi-dimensional emotional state labels. Furthermore, in S3, outputting the multi-dimensional emotional state labels specifically includes the following steps: The target micro-expression feature vector is input into the convolutional layer of the pre-trained deep learning micro-expression recognition model to extract local spatial features and obtain a facial latent feature map. The facial latent feature map is input into the long short-term memory network layer of the pre-trained deep learning micro-expression recognition model to extract temporal features and obtain the temporal emotion feature vector. The temporal emotion feature vector is mapped to a preset emotional state classification space through a fully connected layer, and the multi-dimensional emotional state labels are output.

[0029] Specifically, in the offline construction phase, a large number of clean facial displacement sequence samples collected in eating scenarios are used as model input data, with the corresponding manually labeled eating emotion category probabilities as the ground-truth labels. The error between the predicted probabilities and the ground-truth labels is calculated using the cross-entropy loss function, and the network weight parameters are updated through backpropagation until the loss converges, yielding the pre-trained deep learning micro-expression recognition model. In the online application phase, after obtaining the one-dimensional target micro-expression feature vector extracted in the preceding step, the system first reshapes it according to the total number of frames in the time window so that it fits the temporal input format of the convolutional network. It gathers the low-frequency displacement signals of all key points at the same time step into a feature input vector for that time step, thereby converting the target micro-expression feature vector into a sequence of facial displacement feature input vectors covering all time steps. The system then inputs this sequence into the convolutional layer of the pre-trained deep learning micro-expression recognition model for local spatial feature extraction. The convolutional layer uses a one-dimensional convolution kernel that slides along the time dimension of the sequence to extract latent spatial features capturing the coordinated contraction of local facial muscle groups. The feature extraction formula of the convolutional layer at the $t$-th time step is:

$$\mathbf{z}_t = \mathrm{ReLU}\left(\sum_{j=0}^{s-1} W_j \, \mathbf{v}_{t+j} + \mathbf{b}\right)$$

where $\mathbf{z}_t$ is the feature vector of the facial latent feature map output by the convolutional layer at the $t$-th time step; $\mathrm{ReLU}$ is the rectified linear unit activation function; $t$ is the current time step index; $s$ is the window size of the one-dimensional convolution kernel; $j$ is the offset index within the convolution window, ranging from zero to $s-1$; $W_j$ is the weight parameter matrix of the one-dimensional convolution kernel at the $j$-th position in the window; $\mathbf{v}_{t+j}$ is the facial displacement feature input vector at time step $t+j$, obtained by reshaping the target micro-expression feature vector; and $\mathbf{b}$ is the bias vector of the convolutional layer.
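The sliding-window operation of the convolutional layer can be sketched with plain NumPy. The sketch assumes no padding, so the output has fewer time steps than the input, and it uses tiny hand-picked weights rather than trained ones. The function name `conv1d_relu` and all array shapes are hypothetical.

```python
import numpy as np

def conv1d_relu(inputs, kernels, bias):
    """inputs: (T, K); kernels: (s, H, K); bias: (H,). Returns (T - s + 1, H)."""
    T = inputs.shape[0]
    s = kernels.shape[0]
    out = []
    for t in range(T - s + 1):
        # Sum the per-offset linear maps over the window, add bias, apply ReLU.
        acc = bias.copy()
        for j in range(s):
            acc = acc + kernels[j] @ inputs[t + j]
        out.append(np.maximum(acc, 0.0))
    return np.stack(out)

# Toy case: one input channel, one output channel, window size 2.
inputs = np.array([[1.0], [2.0], [4.0]])
kernels = np.array([[[-1.0]], [[1.0]]])   # computes input[t + 1] - input[t]
bias = np.zeros(1)
features = conv1d_relu(inputs, kernels, bias)
```

With this kernel the layer outputs the positive part of the frame-to-frame difference, which is one simple kind of local pattern a learned kernel could pick up.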

[0030] After obtaining the facial latent feature map sequence, the system inputs the sequence into the long short-term memory (LSTM) network layer of the pre-trained deep learning micro-expression recognition model for temporal feature extraction. Through its internal gating mechanism, the LSTM layer extracts features from the dynamic evolution of micro-expressions during eating, combining long-term memory with short-term fluctuations and capturing the temporal emotional changes in the user's subtle muscle movements while chewing. The formulas for the cell state and hidden state of the LSTM layer at the $t$-th time step are:

$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tanh\left(W_c \mathbf{z}_t + U_c \mathbf{h}_{t-1} + \mathbf{b}_c\right)$$

$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t)$$

where $\mathbf{c}_t$ is the cell state vector updated at the $t$-th time step; $\mathbf{c}_{t-1}$ is the cell state vector of the previous time step; $\mathbf{h}_t$ is the hidden state vector updated at the $t$-th time step; $\mathbf{h}_{t-1}$ is the hidden state vector of the previous time step; $\mathbf{f}_t$, $\mathbf{i}_t$, and $\mathbf{o}_t$ are the forget gate, input gate, and output gate activation vectors, each obtained from the facial latent feature map $\mathbf{z}_t$ at the current time step and the hidden state vector $\mathbf{h}_{t-1}$ of the previous time step by an affine transformation followed by the logistic sigmoid activation function; $\odot$ is element-wise multiplication; $W_c$ is the weight parameter matrix that maps the input facial latent feature map to the cell state; $U_c$ is the transition weight parameter matrix from the hidden state of the previous time step to the current cell state; $\mathbf{b}_c$ is the bias vector of the cell state; and $\tanh$ is the hyperbolic tangent activation function. At the first time step, the preceding state vectors $\mathbf{c}_0$ and $\mathbf{h}_0$ are both initialized to zero vectors. Let the total number of time steps in the time window be $T$. The system takes the final hidden state vector $\mathbf{h}_T$ output by the LSTM layer at the last time step, that is, the $T$-th time step, as the temporal emotion feature vector that integrates the entire dynamic process of eating.
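The recurrence can be sketched in NumPy as a forward pass over a short sequence. Random weights stand in for trained parameters, and the function name `lstm_last_hidden`, the parameter layout, and the dimensions are assumptions of this sketch rather than details from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(inputs, params):
    """inputs: (T, D). params maps each gate in 'fioc' to (W (H,D), U (H,H), b (H,))."""
    H = params["c"][2].shape[0]
    h = np.zeros(H)   # initial hidden state is the zero vector
    c = np.zeros(H)   # initial cell state is the zero vector
    for z in inputs:
        # Affine transform of the current input and previous hidden state per gate.
        pre = {g: W @ z + U @ h + b for g, (W, U, b) in params.items()}
        f, i, o = sigmoid(pre["f"]), sigmoid(pre["i"]), sigmoid(pre["o"])
        c = f * c + i * np.tanh(pre["c"])   # cell state update
        h = o * np.tanh(c)                  # hidden state update
    return h                                # hidden state at the last time step

rng = np.random.default_rng(0)
D, H, T = 3, 4, 5   # hypothetical input size, hidden size, window length
params = {g: (rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H))
          for g in "fioc"}
h_T = lstm_last_hidden(rng.normal(size=(T, D)), params)
```

Only the hidden state after the final frame is returned, matching the use of the last time step as the summary of the whole eating window.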

[0031] After obtaining the temporal emotion feature vector $\mathbf{h}_T$, the system uses a fully connected layer to map it to the preset emotional state classification space and output the multi-dimensional emotional state labels. The fully connected layer linearly combines the high-dimensional temporal features and applies the normalized exponential (softmax) function to convert the values into a probability distribution over the preset eating emotion categories. The preset dimensions of the emotional classification space are pleasure, indifference, surprise, and disgust, which represent levels of satisfaction with the food. The mapping and label output of the fully connected layer are calculated as:

$$\mathbf{y} = \mathrm{softmax}\left(W_y \mathbf{h}_T + \mathbf{b}_y\right)$$

where $\mathbf{y}$ is the output multi-dimensional emotion category probability distribution vector; $\mathrm{softmax}$ is the normalized exponential function; $W_y$ is the weight parameter matrix of the fully connected layer; $\mathbf{h}_T$ is the temporal emotion feature vector extracted in the preceding step; and $\mathbf{b}_y$ is the bias vector of the fully connected layer. The system directly outputs the multi-dimensional emotion category probability distribution vector, which contains the probability values of all preset dimensions, as the multi-dimensional emotional state label that quantitatively characterizes the current user's eating experience. By transforming the facial displacement features, with chewing interference filtered out, into a satisfaction emotion probability distribution in a high-dimensional semantic space, the method maps low-level motion data to quantitative indicators of eating emotion and provides the numerical label basis for the next step of constructing the mapping between emotions and food ingredient preferences.
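A minimal sketch of the fully connected layer with softmax over the four named emotion categories follows. The weights and the two-dimensional feature vector are made-up numbers chosen so the result is easy to check by hand; the function name `classify` is hypothetical.

```python
import numpy as np

EMOTIONS = ("pleasure", "indifference", "surprise", "disgust")

def classify(h, W, b):
    """Map a feature vector to a probability distribution over EMOTIONS."""
    logits = W @ h + b
    # Subtracting the maximum keeps the exponentials numerically stable.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

h = np.array([1.0, -1.0])             # stand-in temporal emotion feature vector
W = np.array([[2.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0],
              [-1.0, 1.0]])           # hypothetical layer weights
b = np.zeros(4)
probs = classify(h, W, b)
label = EMOTIONS[int(np.argmax(probs))]
```

The logits here are 2, 0, -1, and -2, so the largest probability falls on the first category. The full vector `probs`, not just the top label, is what the later steps consume.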

[0032] S4. Map and fuse the multi-dimensional emotional state labels with the dish intake records obtained during the same time period in which the facial video images were captured, to construct an initial emotional-food preference profile. Furthermore, in S4, constructing the initial emotional-food preference profile specifically includes the following steps: Acquire the smart device weighing data and dish identification code corresponding to the same time period as the facial video images, parse them to obtain the dish ingredient information, and generate a dish intake record. The dish ingredient information in the dish intake record is mapped and bound to the multi-dimensional emotional state labels to generate emotional-ingredient correspondence pairs. The emotional-ingredient correspondence pairs are used to update the preset user historical preference matrix and construct the initial emotional-food preference profile.

[0033] Specifically, the system acquires, via an IoT interface, smart device weighing data and dish identification codes that are strictly aligned with the timestamps of facial video image capture. Based on the timestamps, the system matches the weighing data at the start and end of the meal and calculates the actual total weight of food consumed. The system then uses the dish identification code to query a preset food nutrition database and parses the basic component proportion vector of the target dish. Finally, the system multiplies the total food intake, a scalar, by the basic component proportion vector to calculate the actual intake of each food component, thereby generating a dish intake record vector. The calculation formula is: $R = (m_{start} - m_{end}) \cdot c$; where $R$ denotes the generated dish intake record vector, which records the actual intake of each food component and is an $N$-dimensional column vector; $m_{start}$ denotes the initial weight of the dish recorded by the smart device at the start of the meal; $m_{end}$ denotes the final weight of the dish recorded by the smart device at the end of the facial video image acquisition period; $c$ denotes the basic component proportion column vector of the dish, obtained by querying with the dish identification code; and $N$ denotes the total number of food component categories in the preset food nutrition database.
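The scalar multiplication described above is simple enough to show directly. The sketch below assumes weights in grams and a three-component proportion vector (carbohydrate, salt, fat); the function name `intake_record` and all numbers are illustrative, not taken from the source.

```python
def intake_record(start_weight_g, end_weight_g, proportions):
    """Scale the dish's component proportion vector by the mass actually eaten."""
    eaten = start_weight_g - end_weight_g
    if eaten < 0:
        raise ValueError("end weight exceeds start weight")
    return [eaten * p for p in proportions]

# Assumed example: a 400 g dish with 150 g left on the scale.
# Proportions are (carbohydrate, salt, fat) by mass: illustrative values only.
record = intake_record(400.0, 150.0, [0.30, 0.01, 0.10])
```

With these inputs the record holds the grams of each component consumed, which is the form the later threshold comparison needs.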

[0034] After generating the dish intake record vector, the system maps and binds the component intake information in the vector to the multi-dimensional emotional state labels output in the preceding steps, generating an emotion-component correspondence matrix. The system uses a vector outer product to cross-fuse the column vector reflecting food intake with the row vector reflecting the emotional state distribution, so that each food component acquires a quantitative weighted relationship with the preset emotional dimensions in the multi-dimensional emotional state labels, including pleasure and disgust. The outer product is calculated as: $M = R \cdot E^{T}$; where $M$ denotes the generated emotion-component correspondence matrix, which quantifies the binding between components and emotions during the current eating period and has $N$ rows and $K$ columns; $R$ denotes the $N$-dimensional dish intake record column vector calculated previously; $E$ denotes the $K$-dimensional multi-dimensional emotional state label column vector output in the preceding steps; $K$ denotes the total number of emotion categories in the preset emotional state classification space; and the superscript $T$ denotes transposition, which converts the emotional state label column vector into a row vector. Each element of $M$ represents the strength of the association between the intake of a specific dish component and a specific emotional dimension.
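A small sketch of the outer product may help make the matrix shape concrete. The function name `outer_product` and the sample values are assumptions for illustration; rows correspond to food components and columns to emotion categories.

```python
def outer_product(intake, emotions):
    """Return the matrix whose (i, j) entry is intake[i] * emotions[j]."""
    return [[r * e for e in emotions] for r in intake]

intake = [75.0, 2.5, 25.0]        # 3 components, grams (assumed values)
emotions = [0.6, 0.2, 0.1, 0.1]   # 4 emotion probabilities (assumed values)
M = outer_product(intake, emotions)
```

Each row of `M` is the emotion distribution scaled by how much of that component was eaten, so heavily consumed components dominate the profile for the meal.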

[0035] After obtaining the emotion-component correspondence matrix, the system uses it to update the preset user historical preference matrix in the database, thereby constructing the initial emotional-food preference profile for the current moment. Because user taste preferences are continuous over time and drift slowly with recent dietary experience, the system uses an exponentially weighted moving average to blend the preference data generated in the current period into the long-term historical preference data. The preference update formula is: $P = \alpha M + (1 - \alpha) H$; where $P$ denotes the initial emotional-food preference profile matrix obtained after the update, which also has $N$ rows and $K$ columns; $\alpha$ denotes the preset preference update learning rate factor, with a value greater than zero and less than one; $M$ denotes the emotion-component correspondence matrix generated during the current eating period; and $H$ denotes the user historical preference matrix pre-stored in the database for the specific user. When the user is a first-time user with no prior history in the system, $H$ is initialized by default to an all-zero matrix with $N$ rows and $K$ columns. The system writes $P$ to the storage module as a digital record of the user's emotional tendencies toward food, for retrieval and correction in the next step.
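The exponentially weighted update can be sketched as follows. Applying the learning rate factor to the current-period matrix and its complement to the history is an assumption about the formula's form; the names `update_preference` and `zero_history` and the sample values are hypothetical.

```python
def update_preference(history, current, alpha):
    """Blend the current matrix into the history: alpha*current + (1-alpha)*history."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return [[alpha * c + (1.0 - alpha) * h for c, h in zip(c_row, h_row)]
            for c_row, h_row in zip(current, history)]

def zero_history(n_rows, n_cols):
    """All-zero history matrix used for a first-time user."""
    return [[0.0] * n_cols for _ in range(n_rows)]

current = [[45.0, 15.0], [1.5, 0.5]]   # assumed emotion-component matrix
profile = update_preference(zero_history(2, 2), current, alpha=0.2)
profile_next = update_preference(profile, current, alpha=0.2)
```

Repeating the update with similar meals moves the profile gradually toward the current matrix, which matches the slow drift of preferences the text describes.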

[0036] By performing a deep feature cross-product between the visually recognized emotion probability tags and the physical quantities of food intake collected by IoT devices, the subjective micro-expression feedback was objectively quantified into specific food components, constructing a preference profile matrix with clear numerical basis. This provides a basic user feature tensor input for the next step of introducing physiological health indicators for nutritional constraints and preference correction.

[0037] S5. Obtain the user's physiological health indicators, evaluate the initial emotional-food preference profile through preset nutritional threshold rules, and generate a preference correction matrix; Furthermore, in S5, generating the preference correction matrix specifically includes the following steps: Retrieve the user's physiological health indicators through the database interface; Input physiological health indicators into the nutritional threshold rules to calculate the upper limit of intake thresholds for each nutrient. Analyze the initial emotional-food preference profile, extract target food components with multi-dimensional emotional state labels that are associated with preset positive emotional dimensions, calculate the intake of target food components based on food intake records, compare the intake with the upper limit of the intake threshold, and calculate the component spillover coefficient. Numerical adjustment factors are generated based on the component spillover coefficients, and preference correction matrices are generated using these numerical adjustment factors.

[0038] Specifically, the system first retrieves the target user's physiological health indicator data via a network database interface. This data includes the user's blood glucose, blood pressure, and blood lipid levels. The system then inputs the data into the preset nutritional threshold rules. These rules are mathematical models developed by an expert system to limit the intake of three core risk nutrients: carbohydrates, salt, and fat. The models store conversion coefficients between each physiological health indicator and its corresponding food component. Based on these rules, the system calculates the upper intake limit of each nutrient for a single meal. The calculation formula is as follows: $T_i = B_i - \lambda_i \cdot \gamma \cdot (V_i - V_i^{std})$; where $T_i$ denotes the upper limit of the single-meal intake threshold calculated by the system for the $i$-th nutrient; $i$ denotes the dimension index of the nutrient, ranging from 1 to $N$; $N$ denotes the total number of food component categories defined in the preceding steps; $B_i$ denotes the preset baseline intake mass for the $i$-th nutrient; $\lambda_i$ denotes the dimensionless risk penalty coefficient preset for the $i$-th nutrient; $V_i$ denotes the actual physiological value, extracted from the physiological health indicators, that corresponds to the $i$-th nutrient; $V_i^{std}$ denotes the standard value of the corresponding physiological indicator in a healthy human body; and $\gamma$ denotes the preset dimensional mapping coefficient that converts a difference in the physiological indicator into a deduction in component mass. For ordinary nutrients for which the physiological health indicators specify no risk, the corresponding risk penalty coefficient $\lambda_i$ is fixed at zero. When the value obtained from the subtraction above is less than the preset minimum mass required for survival, the upper limit $T_i$ is forcibly set to that preset minimum.
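The threshold rule, a baseline reduced by a risk deduction and floored at a survival minimum, might look like the sketch below. The exact expression is an assumption assembled from the quantities the paragraph defines; the function name `intake_upper_limit` and every number (glucose in mmol/L, masses in grams) are illustrative.

```python
def intake_upper_limit(baseline, penalty, measured, standard, mapping, floor):
    """Baseline minus a risk deduction, never dropping below the survival floor."""
    limit = baseline - penalty * mapping * (measured - standard)
    return max(limit, floor)

# Assumed carbohydrate example: 100 g baseline, 30 g survival floor.
carb_limit = intake_upper_limit(100.0, 1.0, 8.0, 5.5, 10.0, 30.0)
# Extreme reading: the deduction would push the limit below the floor.
floored = intake_upper_limit(100.0, 1.0, 15.0, 5.5, 10.0, 30.0)
# Ordinary component: penalty coefficient fixed at zero keeps the baseline.
no_risk = intake_upper_limit(100.0, 0.0, 15.0, 5.5, 10.0, 30.0)
```

The three calls cover the cases the text distinguishes: a moderate deduction, a deduction clipped at the minimum, and a component with no specified risk.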

[0039] After obtaining the upper intake limit for each nutrient, the system analyzes the initial emotional-food preference profile matrix $P$, with $N$ rows and $K$ columns, constructed in the preceding steps and extracts the target dish components whose multi-dimensional emotional state labels are associated with the preset positive emotional dimension. Specifically, the system designates the pleasure category of the emotional classification space as the preset positive emotional dimension and obtains the column index $j$ corresponding to the pleasure category in $P$. The system extracts the vector of values in column $j$ of $P$ and marks the food components at the row indices whose values exceed a preset preference threshold as target dish components. The system then reads the intake of each target dish component from the $N$-dimensional dish intake record column vector generated in the preceding steps, and divides this actual intake by the previously calculated upper intake limit to calculate the component spillover coefficient. The calculation formula is as follows: [formula missing]; where $S_i$ denotes the component spillover coefficient, including the emotional preference weight, calculated for the $i$-th target dish component; $R_i$ denotes the actual intake mass of the $i$-th target dish component, extracted from the dish intake record column vector generated in the preceding steps; $T_i$ denotes the previously calculated upper limit of the intake mass threshold for the $i$-th food component; $P_{i,j}$ denotes the preference intensity value at row $i$, column $j$ of the initial emotional-food preference profile matrix constructed in the preceding steps; and $\theta$ denotes the preset preference threshold constant.
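The selection of target components and the division of intake by the upper limit can be sketched as below. The text does not preserve how the preference intensity and the preference threshold enter the coefficient, so multiplying the intake ratio by the preference intensity divided by the threshold is purely an assumption made for illustration; the function name and all values are hypothetical.

```python
def spillover_coefficients(profile, pleasure_col, intake, limits, theta):
    """Map each target component index to its (assumed) spillover coefficient."""
    result = {}
    for i, row in enumerate(profile):
        pref = row[pleasure_col]
        if pref > theta:                        # component is a target component
            ratio = intake[i] / limits[i]       # actual intake divided by upper limit
            result[i] = ratio * (pref / theta)  # assumed preference weighting
    return result

profile = [[9.0, 3.0], [0.3, 0.1], [5.0, 1.0]]  # rows: components, col 0: pleasure
intake = [75.0, 2.5, 25.0]                      # grams eaten (assumed)
limits = [50.0, 5.0, 25.0]                      # upper limits in grams (assumed)
coeffs = spillover_coefficients(profile, 0, intake, limits, theta=4.0)
```

Only components whose pleasure-column value exceeds the threshold appear in the result; the others are left for the fixed adjustment factor of 1 described in the next paragraph.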

[0040] After calculating the component spillover coefficient for each target dish component, the system generates a corresponding numerical adjustment factor from it. To ensure that the health constraint takes effect smoothly, the system uses a logistic function with exponential decay characteristics to map each component spillover coefficient to a numerical adjustment factor between zero and one. The mapping formula is as follows: [formula missing]; where $f_i$ denotes the numerical adjustment factor generated for the $i$-th target dish component; $e$ denotes the natural constant; $k$ denotes the smoothness coefficient, which controls how steep the penalty is; and $S_i$ denotes the previously calculated component spillover coefficient of the $i$-th target dish component. For ordinary dish components not marked as target dish components, the corresponding numerical adjustment factor is fixed at 1.

[0041] After obtaining the numerical adjustment factors for all components, the system uses them to generate the preference correction matrix. The system arranges the numerical adjustment factors of all food components in dimension index order to form a feature column vector, which serves as the preference correction matrix. The construction formula is as follows: $G = [f_1, f_2, \ldots, f_N]^{T}$; where $G$ denotes the final preference correction matrix, which is an $N$-dimensional column vector; $f_1$ to $f_N$ denote the numerical adjustment factors of the first to the $N$-th food components, respectively; and the superscript $T$ denotes transposition, which converts the row vector into a column vector. The system outputs the preference correction matrix to the recommendation system module for weight adjustment in the next stage.
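The logistic mapping and the assembly of the correction vector can be sketched together. The exact logistic form is not preserved in the text; the version below, centred at a spillover coefficient of 1 so that larger overshoots decay toward zero, is an assumption, and the names `adjustment_factor` and `correction_vector` are hypothetical.

```python
import math

def adjustment_factor(spillover, steepness):
    """Assumed logistic decay centred at spillover = 1; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(steepness * (spillover - 1.0)))

def correction_vector(n_components, spillovers, steepness):
    """Factor 1.0 for ordinary components, logistic factor for target components."""
    return [adjustment_factor(spillovers[i], steepness) if i in spillovers else 1.0
            for i in range(n_components)]

# Spillover coefficients keyed by component index (assumed values).
G = correction_vector(3, {0: 3.375, 2: 1.25}, steepness=2.0)
```

A larger steepness value makes the factor fall off more sharply once the spillover coefficient passes the centre point, which is the role the text assigns to the smoothness coefficient.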

[0042] By using users' medical vital signs data as hard constraints and incorporating the subjective emotional preference profile obtained in the previous step into the overflow calculation, quantitative attenuation factors were calculated for components that are liked but harmful to health, and a preference correction matrix with strict alignment to the component dimensions was generated. This provides a rigid constraint feature vector for the next step of making a weight compromise between catering to tastes and ensuring medical health.

[0043] S6. Based on the preference correction matrix, the preset basic recommendation weights of dishes are numerically adjusted to obtain the target flavor recommendation weight parameters; Furthermore, in S6, obtaining the target flavor recommendation weight parameters specifically includes the following steps: Obtain the preset basic recommendation weights for dishes based on the user; The weight adjustment offset is obtained by multiplying the preset basic recommendation weights of dishes with the preference correction matrix. The weight adjustment offset is superimposed on the preset basic recommendation weight of the dish and normalized to obtain the target flavor recommendation weight parameter.

[0044] Specifically, the system first retrieves the preset basic recommendation weights of dishes for the target user from the user configuration database stored in the recommendation system module. These preset basic recommendation weights are a set of numerical values representing the user's preference for the various core nutritional components, initialized by the system using a collaborative filtering algorithm based on the user's historical order records. To ensure strict alignment with the data structures generated in the preceding steps, the system organizes the retrieved weights into a basic weight column vector whose number of dimensions equals the total number of food component categories. Each value in the basic weight column vector represents the user's initial expected preference strength for the corresponding food component in the absence of health constraints.

[0045] After obtaining the preset basic recommendation weights for dishes, the system multiplies these weights element-wise with the preference correction matrix output in the preceding steps to obtain the weight adjustment offset. The system uses the Hadamard product, which multiplies two vectors element by element, and applies the numerical adjustment factors in the preference correction matrix, which carry health constraint and emotional feedback information, to attenuate the basic preferences in a targeted way. To make the magnitude of the correction explicit, the system takes the attenuated share of each weight as the offset. The calculation formula is as follows: $\Delta W = W_{base} \odot (G - \mathbf{1})$; where $\Delta W$ denotes the weight adjustment offset column vector calculated by the system; $W_{base}$ denotes the $N$-dimensional basic weight column vector retrieved from the user configuration database, that is, the preset basic recommendation weights of dishes; $\odot$ denotes the Hadamard product, the element-wise multiplication of two vectors; $G$ denotes the preference correction matrix generated in the preceding steps from the physiological health indicators and component spillover coefficients, which is also an $N$-dimensional column vector; and $\mathbf{1}$ denotes a column vector with all elements equal to one and the same dimension as the preference correction matrix. With this combination of subtraction and element-wise multiplication, a target dish component whose numerical adjustment factor is less than one receives a negative weight adjustment offset, indicating that its recommendation preference strength should be reduced, while an ordinary dish component whose numerical adjustment factor equals one receives an offset of zero.
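The offset calculation, an element-wise product of the base weights with the correction factors minus one, can be sketched in a few lines. The function name `weight_offset` and the sample vectors are assumptions for illustration.

```python
def weight_offset(base_weights, correction):
    """Element-wise (Hadamard) product of base weights with (correction - 1)."""
    return [w * (g - 1.0) for w, g in zip(base_weights, correction)]

base = [0.5, 0.2, 0.3]          # assumed base weights per component
correction = [0.25, 1.0, 0.5]   # assumed adjustment factors per component
offset = weight_offset(base, correction)
```

Components with a factor below one get a negative offset proportional to their base weight, and the component with a factor of exactly one is left unchanged, as the text states.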

[0046] After calculating the weight adjustment offset, the system adds the offset to the preset basic recommendation weights for dishes and normalizes the result to obtain the target flavor recommendation weight parameters. The system first performs vector addition to apply the targeted penalty to the basic weights. It then uses a normalization function to compress all superimposed weight values into a unified probability distribution range between zero and one, ensuring that the final recommendation parameters remain on a consistent scale in the subsequent similarity calculation of the recommendation algorithm. To prevent a computational anomaly when all weights sum to zero under extreme penalty conditions, the system adds a very small smoothing constant to the denominator. The calculation and normalization formula is as follows: $W_{target} = \dfrac{W_{base} + \Delta W}{\sum_{i=1}^{N} (w_i + \Delta w_i) + \epsilon}$; where $W_{target}$ denotes the target flavor recommendation weight parameter column vector obtained after numerical adjustment and normalization; $W_{base}$ denotes the preset basic recommendation weight column vector for dishes; $\Delta W$ denotes the weight adjustment offset column vector calculated in the preceding steps; the summation in the denominator adds all element values of the superimposed column vector to obtain the total weight; $i$ denotes the dimension index of the food component, ranging from 1 to $N$; $N$ denotes the total number of food component categories; $w_i$ denotes the value of the $i$-th element of the preset basic recommendation weight column vector; $\Delta w_i$ denotes the value of the $i$-th element of the weight adjustment offset column vector; and $\epsilon$ denotes a preset positive smoothing constant close to zero that prevents division by zero.
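The superposition and normalization step, including the smoothing constant in the denominator, can be sketched as follows. The function name `target_weights`, the default constant `1e-8`, and the sample vectors are assumptions.

```python
def target_weights(base_weights, offset, eps=1e-8):
    """Add the offset to the base weights, then divide by the smoothed total."""
    combined = [w + d for w, d in zip(base_weights, offset)]
    total = sum(combined) + eps   # eps guards against a zero denominator
    return [c / total for c in combined]

target = target_weights([0.5, 0.2, 0.3], [-0.375, 0.0, -0.15])
# Extreme case: every weight is zero, yet no division error occurs.
all_zero = target_weights([0.0, 0.0], [0.0, 0.0])
```

After normalization the unpenalized component carries the largest share, even though it had the smallest base weight, which shows how the health correction reorders the preferences.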

[0047] By directly applying the calibration matrix, which integrates medical indicators and emotional profiles, to the underlying feature vector of the recommendation system, the weight reconstruction was completed from solely catering to taste to being driven by both health and emotional indicators. This provides normalized core computational parameters for accurately matching personalized and medically compliant healthy recipes.

[0048] S7. Based on the target flavor recommendation weight parameters, match and filter in the preset dish database to generate the target recommended recipe.

[0049] Furthermore, in S7, generating the target recommended recipe specifically includes the following steps: Extract multiple candidate dishes and their corresponding candidate dish feature vectors from the preset dish database; Calculate the cosine similarity between the target flavor recommendation weight parameters and the feature vector of each candidate dish; Sort the candidate dishes in descending order of cosine similarity, and extract the target candidate dish combination that falls within a preset threshold in the sorting results to generate the target recommended recipe.

[0050] Specifically, the system first connects to the preset dish database via a network interface. Based on the current mealtime, the system performs a preliminary search of the database to extract multiple candidate dishes that match that mealtime. To measure similarity with the target flavor recommendation weight parameters generated in the preceding steps within a unified feature space, the system also retrieves the candidate dish feature vector corresponding to each candidate dish from the preset dish database. A candidate dish feature vector is a set of values consisting of the relative mass proportions of the food components contained in the corresponding candidate dish, and the types and order of the food components are strictly aligned with the basic weight column vector defined above.

[0051] After extracting the candidate dishes and their feature vectors, the system calculates the cosine similarity between the target flavor recommendation weight parameters output in the preceding steps and the feature vector of each candidate dish. By measuring the cosine of the angle between two multi-dimensional vectors in the feature space, the cosine similarity quantifies how closely the user's expected flavor preferences, after dual correction by physiological health indicators and emotional feedback, match the nutritional distribution of each candidate dish. The closer the cosine value is to positive one, the better the dish meets the user's current overall needs. Let the total number of candidate dishes be $Q$. For the $q$-th candidate dish, the system takes the dot product of its feature vector with the target flavor recommendation weight parameters and divides the result by the product of their L2 norms. The calculation formula is as follows: $Sim_q = \dfrac{W_{target}^{T} D_q}{\|W_{target}\|_2 \, \|D_q\|_2}$; where $Sim_q$ denotes the cosine similarity between the target flavor recommendation weight parameters calculated by the system and the feature vector of the $q$-th candidate dish; $q$ denotes the index of the candidate dish, taking integer values from 1 to $Q$; $Q$ denotes the total number of candidate dishes extracted from the preset dish database; $W_{target}$ denotes the $N$-dimensional target flavor recommendation weight parameter column vector output after offset adjustment and normalization in the preceding steps; $N$ denotes the total number of food component categories; the superscript $T$ denotes transposition, which converts the column vector into a row vector so that it can be used in a dot product with a candidate dish feature vector; $D_q$ denotes the feature vector of the $q$-th candidate dish, also $N$-dimensional, extracted from the preset dish database; $\|W_{target}\|_2$ denotes the L2 norm of the target flavor recommendation weight parameter column vector; and $\|D_q\|_2$ denotes the L2 norm of the feature vector of the $q$-th candidate dish.
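A plain-Python sketch of the cosine similarity between a weight vector and a dish feature vector follows. Returning 0.0 when either vector has zero length is an assumed convention that the source does not address; the function name and sample vectors are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for zero vectors, not from the source
    return dot / (norm_u * norm_v)

target = [0.26, 0.42, 0.32]          # assumed target weight vector
dish_a = [0.20, 0.50, 0.30]          # assumed dish close to the target
dish_b = [0.90, 0.05, 0.05]          # assumed dish dominated by one component
sim_a = cosine_similarity(target, dish_a)
sim_b = cosine_similarity(target, dish_b)
```

Because the measure depends only on direction, a dish whose component proportions resemble the target scores high regardless of portion size.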

[0052] After calculating the cosine similarity of all candidate dishes, the system globally sorts the candidate dishes in descending order of cosine similarity. The system sets a preset quantity threshold based on the number of dish types typically selected by a user per meal. The system then extracts the top-ranking target candidate dishes from the global ranking results that fall within the preset quantity threshold ranking range to form a target candidate dish combination, ultimately generating a target recommended menu and outputting it to the terminal.
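The final ranking and cut-off step can be sketched as a sort followed by a slice. The function name `top_dishes`, the dish names, and the similarity scores are hypothetical, and the quantity threshold of 2 is an assumed setting.

```python
def top_dishes(similarities, top_k):
    """Sort dish names by similarity, descending, and keep the first top_k."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Assumed similarity scores for four candidate dishes.
scores = {
    "steamed fish": 0.91,
    "fried rice": 0.42,
    "stir-fried greens": 0.87,
    "braised pork": 0.35,
}
menu = top_dishes(scores, top_k=2)
```

The returned list preserves the ranking order, so the terminal can display the best match first.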

[0053] By using a multi-dimensional feature space cosine similarity measurement mechanism, the system achieves efficient comparison between the target flavor recommendation weight parameters and the actual nutritional components of the dishes. This accurately filters out dishes that deviate from the user's emotional and health profiles, and objectively outputs target recommended recipes that take into account both user eating satisfaction and physiological health indicator limitations.

[0054] This invention also provides an application of a deep learning-based method for analyzing the correlation between eating micro-expressions and food satisfaction, which can be applied to a personalized health meal planning system for specific populations.

[0055] Finally, it should be noted that the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning, characterized in that it includes the following steps: S1. Capture facial video images of the user during eating using a camera device, and extract an eating facial feature sequence from them; S2. Based on the spatial displacement differences of facial key points, separate chewing actions from facial muscle movements in the eating facial feature sequence to obtain a target micro-expression feature vector; S3. Input the target micro-expression feature vector into a pre-trained deep learning micro-expression recognition model for feature extraction and classification, and output multi-dimensional emotional state labels; S4. Map and fuse the multi-dimensional emotional state labels with the dish intake records obtained during the same time period in which the facial video images were captured to construct an initial emotional-food preference profile; S5. Obtain the user's physiological health indicators, evaluate the initial emotional-food preference profile through preset nutritional threshold rules, and generate a preference correction matrix; S6. Based on the preference correction matrix, numerically adjust the preset basic recommendation weights of dishes to obtain the target flavor recommendation weight parameters; S7. Based on the target flavor recommendation weight parameters, match and filter in the preset dish database to generate the target recommended recipe.

2. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, in step S1, capturing facial video images of the user during eating using a camera device and extracting an eating facial feature sequence from them specifically includes the following steps: Capture the raw facial video stream of the user during eating using the camera device; Segment the raw facial video stream frame by frame to obtain continuous facial image frames; Locate the facial key points in the continuous facial image frames using an active appearance model, and extract the facial key point coordinate sequence; Concatenate the facial key point coordinate sequence along the time dimension to obtain the eating facial feature sequence.

3. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, In step S2, obtaining the target micro-expression feature vector specifically includes the following steps: Extract the relative displacement signals of the key point regions of the mandible and the key point regions of the corners of the eyes and lips from the facial feature sequence of the eating; A low-pass filter is used to filter out the high-frequency displacement signal representing chewing action in the relative displacement signal, while retaining the low-frequency displacement signal representing facial muscle movement. The low-frequency displacement signal is vectorized to obtain the target micro-expression feature vector.

4. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, In step S3, the output of multi-dimensional sentiment state labels specifically includes the following steps: The target micro-expression feature vector is input into the convolutional layer of the pre-trained deep learning micro-expression recognition model to extract local spatial features and obtain a facial latent feature map. The facial latent feature map is input into the long short-term memory network layer of the pre-trained deep learning micro-expression recognition model to extract temporal features and obtain temporal emotion feature vectors. The temporal sentiment feature vectors are mapped to a preset sentiment state classification space through a fully connected layer, and multi-dimensional sentiment state labels are output.

5. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, In step S4, constructing the initial emotional-food preference profile specifically includes the following steps: Acquire the smart device weighing data and dish identification code corresponding to the same time period as the facial video image, parse and obtain the dish ingredient information, and generate a dish intake record; The dish ingredient information in the dish intake record is mapped and bound to the multi-dimensional emotional state label to generate an emotional-ingredient correspondence pair; The emotional-component correspondence is used to update the preset user historical preference matrix and construct an initial emotional-food preference profile.

6. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, in step S5, generating the preference correction matrix specifically includes the following steps: Retrieve the user's physiological health indicators through the database interface; Input the physiological health indicators into the nutritional threshold rules to calculate the upper limit of the intake threshold for each nutrient; Analyze the initial emotional-food preference profile, extract the target dish components whose multi-dimensional emotional state labels are associated with the preset positive emotional dimension, calculate the intake of the target dish components based on the dish intake records, and compare the intake with the upper limit of the intake threshold to calculate the component spillover coefficient; Generate a corresponding numerical adjustment factor based on the component spillover coefficient, and generate the preference correction matrix using the numerical adjustment factor.

7. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, In step S6, obtaining the target flavor recommendation weight parameters specifically includes the following steps: Obtain the preset basic recommendation weights for dishes for the user; The preset basic recommendation weights of the dishes are multiplied by the preference correction matrix to obtain the weight adjustment offset. The weight adjustment offset is superimposed on the preset basic recommendation weight of the dish and normalized to obtain the target flavor recommendation weight parameter.

8. The method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to claim 1, characterized in that, In step S7, generating the target recommended recipe specifically includes the following steps: Extract multiple candidate dishes and the corresponding candidate dish feature vectors from the preset dish database; Calculate the cosine similarity between the target flavor recommendation weight parameter and the feature vector of each candidate dish; The candidate dishes are sorted in descending order of cosine similarity, and the target candidate dish combinations that fall within a preset threshold in the sorting results are extracted to generate the target recommended recipe.

9. An application of a deep learning-based method for analyzing the correlation between eating micro-expressions and food satisfaction, using the method for analyzing the correlation between eating micro-expressions and food satisfaction based on deep learning according to any one of claims 1-8, characterized in that the method is applied to a personalized health meal planning system for specific population groups.