Method, apparatus, device and storage medium for nutritional analysis of dishes

By automatically identifying dishes with an image acquisition device and a dish recognition model, and then calculating their nutritional components with a nutritional analysis model, the invention avoids the calculation errors that manual user input causes in existing tools and achieves efficient, accurate nutritional analysis of dishes.

CN122244850A · Pending · Publication Date: 2026-06-19 · SHENZHEN MSU-BIT UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN MSU-BIT UNIVERSITY
Filing Date
2026-02-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing nutritional calculation tools require users to manually enter ingredient information, which can easily lead to inaccurate nutritional calculation results due to input errors.

Method used

Images are captured by image acquisition devices, and dishes are automatically identified using multi-target detection and dish recognition models. Nutritional components are calculated using a nutritional analysis model, and gesture recognition is supported to optimize dish selection.

Benefits of technology

It enables nutritional analysis of dishes without requiring users to manually input ingredient information, thus improving the accuracy and efficiency of the analysis.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN122244850A_ABST

Abstract

This application discloses a method, apparatus, device, and storage medium for nutritional analysis of dishes, relating to the field of artificial intelligence technology. The method includes: acquiring an image to be identified using a preset image acquisition device; performing multi-target detection on the image to be identified to obtain candidate detection boxes; identifying the dish using a preset dish recognition model based on the candidate detection boxes to obtain dish identification information; and analyzing the dish identification information using a preset nutritional analysis model to obtain the nutritional components of the dish. This application automatically completes dish detection, identification, and nutritional analysis from images, eliminating the need for users to manually input ingredient-related information or manually calculate nutritional data, effectively improving the accuracy of nutritional analysis of dishes.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device and storage medium for nutritional analysis of dishes.

Background

[0002] As living standards improve, people pay more and more attention to healthy eating, and the nutritional components of dishes have become a core consideration in daily diets. Most dietary nutrient calculation tools on the market require users to manually enter the names, quantities, and cooking methods of ingredients, and input errors easily lead to inaccurate nutrient calculation results.

Summary of the Invention

[0003] The main purpose of this application is to provide a method, apparatus, equipment and storage medium for nutritional analysis of dishes, with the aim of improving the accuracy of nutritional analysis of dishes.

[0004] To achieve the above objectives, this application proposes a method for nutritional analysis of dishes, the method comprising:

[0005] acquiring an image to be identified through a preset image acquisition device; performing multi-target detection on the image to be identified to obtain candidate detection boxes; obtaining dish recognition information from the candidate detection boxes using a preset dish recognition model; and obtaining the nutritional components of the dish from the dish recognition information using a preset nutritional analysis model.

[0006] In one embodiment, the step of obtaining dish recognition information from the candidate detection boxes using a preset dish recognition model includes: inputting the image to be recognized into a preset gesture recognition model to detect the fingertip position in the image; filtering the candidate detection boxes based on the fingertip position to obtain a target detection box; and performing recognition on the target detection box with the dish recognition model to obtain the dish recognition information.

[0007] In one embodiment, the step of filtering the candidate detection boxes based on the fingertip position to obtain the target detection box includes: determining whether the fingertip position falls within any of the candidate detection boxes; if so, taking the candidate detection box containing the fingertip as the target detection box; if not, determining the target distance between the fingertip position and the center point of each candidate detection box, and filtering the candidate detection boxes according to the target distance to obtain the target detection box.

[0008] In one embodiment, the dish recognition model includes a first backbone network, a second backbone network, a third backbone network, and a gating classification head, and the step of performing recognition on the target detection box with the dish recognition model to obtain the dish recognition information includes: performing feature extraction on the target detection box with the first, second, and third backbone networks respectively to obtain the feature vector output by each of them; fusing the feature vectors to obtain a fused feature vector; and inputting the fused feature vector into the gating classification head for adaptive weighting and classification to obtain the dish recognition information.

[0009] In one embodiment, after the nutritional components of the dish are obtained using the preset nutritional analysis model, the method further includes: obtaining the user's historical dietary data over a preset time period; comparing the historical dietary data with preset standard nutritional information to obtain a nutritional difference result; and generating recommended recipes based on the nutritional difference result and pushing them to the user.
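The comparison and recommendation steps are not specified in detail. The sketch below is one possible reading, under stated assumptions: intake is summed per nutrient over the period, compared with a daily reference multiplied by the number of days, and recipes are ranked by how much of each deficit one serving covers. The function names, the `Recipe` type, the coverage score and all numbers are hypothetical illustrations, not values or methods taken from the application.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Recipe:
    name: str
    nutrients: dict[str, float]  # nutrient -> amount per serving (hypothetical)


def nutrition_gap(history: list[dict[str, float]],
                  daily_reference: dict[str, float]) -> dict[str, float]:
    """Reference intake over the period minus actual intake, per nutrient.

    `history` holds one dict per day (nutrient -> amount eaten that day).
    Positive values are deficits; negative values are excesses.
    """
    days = len(history)
    gap = {}
    for nutrient, per_day in daily_reference.items():
        eaten = sum(day.get(nutrient, 0.0) for day in history)
        gap[nutrient] = per_day * days - eaten
    return gap


def recommend(recipes: list[Recipe], gap: dict[str, float], top_k: int = 1) -> list[Recipe]:
    """Rank recipes by the fraction of each current deficit one serving covers."""
    deficits = {n: g for n, g in gap.items() if g > 0}

    def coverage(recipe: Recipe) -> float:
        # Normalising by the deficit keeps nutrients with different units comparable.
        return sum(min(recipe.nutrients.get(n, 0.0), d) / d for n, d in deficits.items())

    return sorted(recipes, key=coverage, reverse=True)[:top_k]


# Usage with made-up numbers (not real dietary reference values):
history = [{"protein_g": 40, "fiber_g": 10}, {"protein_g": 50, "fiber_g": 12}]
reference = {"protein_g": 60, "fiber_g": 25}
gap = nutrition_gap(history, reference)
recipes = [Recipe("tofu soup", {"protein_g": 30, "fiber_g": 2}),
           Recipe("vegetable salad", {"protein_g": 5, "fiber_g": 20})]
best = recommend(recipes, gap)[0].name
```

A real system would use the "preset standard nutritional information" for the reference values and could weight nutrients by health priority rather than equally.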

[0010] In one embodiment, the step of performing multi-target detection on the image to be identified to obtain candidate detection boxes includes: inputting the image to be identified into a preset target detection model for multi-target detection to obtain the candidate detection boxes. The target detection model is trained as follows: acquiring several image samples; for each image sample, inputting it into the detection model to be trained, which outputs a predicted bounding box; determining the intersection over union (IoU) of the predicted bounding box and the ground-truth bounding box associated with the sample, the distance between the centers of the two boxes, and the aspect ratio of each box; and training the detection model based on the IoU, the center distance, and the aspect ratios to obtain the target detection model.
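IoU, center distance and aspect ratio are exactly the three terms of the published Complete-IoU (CIoU) box-regression loss. The sketch below implements CIoU for a single pair of boxes in `(x1, y1, x2, y2)` form; that the application uses precisely this formula, rather than some other combination of the same three quantities, is an assumption.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike plain 1 - IoU, this loss still produces a useful signal when the boxes do not overlap, because the center-distance term keeps shrinking as the prediction moves toward the ground truth.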

[0011] In one embodiment, acquiring the image to be identified through a preset image acquisition device includes: In response to the user's wake-up command, the image acquisition device is activated to acquire the image to be identified.

[0012] Furthermore, to achieve the above objectives, this application also proposes an apparatus for nutritional analysis of dishes, which includes: an image acquisition module, configured to acquire the image to be recognized through a preset image acquisition device; a target detection module, configured to perform multi-target detection on the image to be identified to obtain candidate detection boxes; a dish recognition module, configured to obtain dish recognition information based on the candidate detection boxes and a preset dish recognition model; and a nutrition analysis module, configured to obtain the nutritional components of the dish from its recognition information using a preset nutrition analysis model.

[0013] In addition, to achieve the above objectives, this application also proposes a device for nutritional analysis of dishes, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the method for nutritional analysis of dishes described above.

[0014] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the above-described method for nutritional analysis of dishes.

[0015] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the nutritional analysis method for dishes as described above.

[0016] This application provides a method, apparatus, device, and storage medium for nutritional analysis of dishes. The method includes: acquiring an image to be identified using a preset image acquisition device; performing multi-target detection on the image to be identified to obtain candidate detection boxes; obtaining dish recognition information from the candidate detection boxes using a preset dish recognition model; and obtaining the nutritional components of the dish from the dish recognition information using a preset nutritional analysis model. Because dish detection, recognition, and nutritional analysis are completed automatically from the image, users no longer need to manually input ingredient information or manually calculate nutritional data, which effectively improves the accuracy of nutritional analysis.

Brief Description of the Drawings

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0018] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of the first embodiment of the method for nutritional analysis of dishes in this application.
Figure 2 is a flowchart of the second embodiment of the method for nutritional analysis of dishes in this application.
Figure 3 is a flowchart of the third embodiment of the method for nutritional analysis of dishes in this application.
Figure 4 is a flowchart of the fourth embodiment of the method for nutritional analysis of dishes in this application.
Figure 5 is a schematic diagram of the module structure of the apparatus for nutritional analysis of dishes according to an embodiment of this application.
Figure 6 is a schematic diagram of the hardware operating environment involved in the method for nutritional analysis of dishes in this application.

[0020] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

[0021] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0022] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0023] It should be noted that the executing entity in this embodiment may be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, mobile phone, or wearable device, or an electronic device, big data service platform, or food nutrition analysis system capable of realizing the above functions. Wearable devices include smart glasses, smartwatches, and the like. This embodiment and the following embodiments are described on this basis.

[0024] Based on this, an embodiment of this application provides a method for nutritional analysis of dishes. Referring to Figure 1, Figure 1 is a flowchart of the first embodiment of the method for nutritional analysis of dishes in this application.

[0025] Step S11: Acquire the image to be identified using a preset image acquisition device. In this embodiment, the image acquisition device is activated in response to the user's wake-up command to acquire the image to be identified. Examples of wake-up commands include "take a picture" and "food recognition".

[0026] It should be noted that the image acquisition device is located in the wearable device; this embodiment uses smart glasses as an example. The frame is made of lightweight, high-strength PLA (polylactic acid) and integrally molded by FDM (Fused Deposition Modeling), with the overall weight kept within a preset limit, for example under 45 grams, which significantly improves wearing comfort. Structurally, the frame is optimized according to ergonomic principles: the temples are foldable and fold through 180 degrees via a purpose-designed hinge. To integrate the hardware components, the front of the frame has precisely positioned mounting holes. The mounting holes for the camera (the image acquisition device in this embodiment) use a standard M2 thread, allowing quick installation and removal, and their positions are optically calculated to ensure the best shooting angle. The AR optical engine is secured in its mounting holes by a snap fit. All holes have been optimized by stress analysis to minimize weight and volume while maintaining structural strength.

[0027] Furthermore, the smart glasses in this embodiment use AR waveguide technology to display virtual information fused with the real-world scene. The AR optical engine is driven and controlled via a GPIO (General-Purpose Input/Output) interface and converts digital image signals into optical signals, which an optical system projects onto the AR lenses. The waveguide relies on total internal reflection: light travels through the waveguide by repeated reflection and finally forms a clear virtual image superimposed on the user's field of view. Optionally, the waveguide display has a resolution of approximately 600 × 800 pixels. The optical waveguide lens uses a multi-layer optical thin-film structure, and nanoscale precision machining achieves efficient coupling and transmission of light. The optical engine drive circuitry is integrated on the expansion board of the Raspberry Pi device and communicates over the I2C bus.

[0028] It should be noted that this embodiment constructs a cloud-edge collaborative IoT architecture, rationally distributing complex AI computing tasks to the cloud server and Raspberry Pi edge devices. The cloud server is responsible for deploying multimodal AI models and a nutrition analysis engine, undertaking complex visual understanding and natural language processing tasks, and uniformly managing historical information such as user nutrition data. The edge devices directly control physical hardware such as cameras, microphones, and speakers to achieve local language processing and image acquisition, ensuring real-time interaction. Optionally, the cloud server and edge devices use the WebSocket real-time communication protocol to achieve bidirectional data exchange.

[0029] It should be noted that the image acquisition device is installed inside the smart glasses and connects to the Raspberry Pi via a pre-defined interface, such as a MIPI CSI-2 interface, supporting real-time video streaming and high-quality image capture. The smart glasses also have a microphone capable of recognizing user voice input commands.

[0030] In addition, the smart glasses are equipped with bone conduction speakers. Bone conduction transducers mounted at the ends of the temples transmit sound to the user's temporal bone through vibration, providing an open-ear listening experience. This design avoids the ear-canal blockage common with traditional headphones and preserves awareness of ambient sound, making it particularly suitable for outdoor use. The system supports volume adjustment and sound-effect optimization to keep voice prompts clear and comfortable.

[0031] Specifically, the microphone collects the user's voice signal and converts it into a voice data stream, which is then sent to a cloud server. The voice recognition model on the cloud server recognizes the voice data stream and obtains corresponding instructions. Based on these instructions, a preset function program is triggered, which calls the hardware to take a picture and upload it, ultimately obtaining the image to be recognized.
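The step from a recognized instruction to "a preset function program" can be pictured as a lookup table. The sketch below is hypothetical throughout: the phrase list reuses the two example wake-up commands, and `capture_and_upload` is a stand-in for the camera trigger and upload, which the text does not describe at code level.

```python
from __future__ import annotations

from typing import Callable, Optional


def capture_and_upload() -> str:
    """Hypothetical stand-in for the preset function program that triggers the
    camera, uploads the frame and returns a handle to the image to be recognized."""
    return "image_to_identify.jpg"


# Hypothetical mapping from recognized command phrases to preset function programs.
COMMANDS: dict[str, Callable[[], str]] = {
    "take a picture": capture_and_upload,
    "food recognition": capture_and_upload,
}


def handle_transcript(text: str) -> Optional[str]:
    """Run the first preset function whose command phrase occurs in the transcript."""
    lowered = text.lower()
    for phrase, action in COMMANDS.items():
        if phrase in lowered:
            return action()
    return None  # not a wake-up command, so nothing is triggered
```

Keeping the table on the cloud side, next to the voice recognition model, would let new commands be added without changing the edge device.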

[0032] Step S12: Perform multi-target detection on the image to be identified to obtain candidate detection boxes. It should be noted that if the object detection model identifies no food items or gestures, it outputs no corresponding detection boxes. If multiple food items are detected but no gesture is detected, the food item occupying the largest proportion of the image is selected for recognition.

[0033] In this embodiment, the image to be identified is input into a preset target detection model for multi-target detection, yielding at least one detection box and a confidence score for each box. Detection boxes whose confidence scores exceed a preset confidence threshold are selected as candidate detection boxes. Optionally, the target detection model can be YOLOv8 or another detection model capable of detecting various food categories, including fruits, vegetables, and staple foods. YOLOv8 uses a single-stage detection architecture that achieves real-time processing speed while maintaining high accuracy, meeting the real-time interaction requirements of smart glasses.
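Confidence filtering, together with the no-gesture fallback of choosing the food item that occupies the largest part of the image, can be sketched as follows. Detections are assumed to be `(x1, y1, x2, y2, confidence)` tuples, and the 0.5 threshold is an illustrative default rather than a value from the application.

```python
def candidate_boxes(detections, conf_threshold=0.5):
    """Keep detections whose confidence exceeds the threshold.

    Each detection is (x1, y1, x2, y2, confidence); 0.5 is an illustrative
    default, not a value taken from the source.
    """
    return [d for d in detections if d[4] > conf_threshold]


def largest_box(boxes):
    """Fallback when no gesture is detected: the box covering the largest image area."""
    if not boxes:
        return None  # nothing detected, so there is nothing to recognize
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


detections = [(0, 0, 10, 10, 0.9), (0, 0, 50, 50, 0.3), (5, 5, 30, 30, 0.8)]
candidates = candidate_boxes(detections)
```

Here the 50 × 50 box is discarded by the threshold before the area comparison, so the fallback picks the 25 × 25 box rather than a low-confidence large one.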

[0034] More specifically, the process of inputting the image to be identified into a preset target detection model for multi-target detection includes steps 1 to 4, wherein: Step 1: Input the image to be identified into the backbone network of the object detection model. This will generate multiple feature maps at different scales. For example, feature maps at different scales include: P3 feature map (large scale, high resolution): corresponding to shallow features, focusing on small food items (e.g., a nut, a leaf); P5 feature map (small scale, low resolution): corresponding to deep features, focusing on large plates (e.g., a whole plate of steak, a hot pot); P4 feature map: intermediate scale, taking into account medium-sized targets.

[0035] Step 2: Pass high-level semantic information from top to bottom through an FPN (Feature Pyramid Network) so that shallow feature maps obtain sufficient semantic information. Optionally, starting from the deepest feature map P5, P5 is upsampled to the size of the P4 feature map one level above it, and the upsampled P5 is fused with P4 (by element-wise addition or by concatenation). The resulting fused feature map P4' retains the detail of P4 and gains the high-level semantic information of P5.

[0036] Furthermore, the fused P4' feature map is upsampled and then fused with the P3 feature map to obtain the P3' feature map. At this point, the P3' feature map has both shallow high-resolution details (capable of locating small ingredients) and deep semantic information (capable of accurately identifying the category of small ingredients).
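The top-down pass can be illustrated with NumPy arrays of shape `(C, H, W)`. This is a minimal sketch, assuming nearest-neighbour 2× upsampling, element-wise addition as the fusion operation, and maps that already share a channel count; a real FPN (including the one inside YOLOv8) adds learned convolutions that are omitted here.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(p3, p4, p5):
    """Top-down FPN pass: P4' = P4 + up(P5), then P3' = P3 + up(P4').

    Assumes every level already has the same channel count (a real FPN applies
    1x1 lateral convolutions first) and that each level halves H and W.
    Element-wise addition is used; concatenation is the other option named.
    """
    p4_td = p4 + upsample2x(p5)
    p3_td = p3 + upsample2x(p4_td)
    return p3_td, p4_td


p3, p4, p5 = np.ones((8, 32, 32)), np.ones((8, 16, 16)), np.ones((8, 8, 8))
p3_td, p4_td = fpn_top_down(p3, p4, p5)
```

With all-ones inputs, P4' holds 2 everywhere and P3' holds 3, which makes it easy to see that P3' accumulates information from every deeper level.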

[0037] In addition, a PAN (Path Aggregation Network) is added to the FPN to compensate for the problem of insufficient propagation of shallow details to deeper layers in the traditional FPN.

[0038] Step 3: Pass high-resolution detail from bottom to top through the PAN (enabling localization of large targets). Optionally, starting from the FPN-fused feature map P3', P3' is downsampled to the size of P4', and the downsampled P3' is fused with P4' to obtain P4''. P4'' thus retains the high-level semantic information of P4' and is supplemented with the detail of P3'.

[0039] Furthermore, the fused P4'' feature map is downsampled and then fused with the P5 feature map to obtain the P5'' feature map, so that the deep feature map also has sufficient details, improving the positioning accuracy of the large plate (for example, accurately outlining the edge of the plate and separating the boundary between the plate and the food).
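The bottom-up PAN pass mirrors the FPN sketch in the opposite direction. As a simplification, 2×2 average pooling stands in for the learned strided convolution a real PAN would use, and fusion is again element-wise addition; inputs are shaped like the FPN outputs P3', P4' and the original P5.

```python
import numpy as np


def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x downsampling of a (C, H, W) map by 2x2 average pooling.

    Stands in for the strided convolution a real PAN would learn; H and W must be even.
    """
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def pan_bottom_up(p3_td, p4_td, p5):
    """Bottom-up PAN pass: P4'' = P4' + down(P3'), then P5'' = P5 + down(P4'')."""
    p4_out = p4_td + downsample2x(p3_td)
    p5_out = p5 + downsample2x(p4_out)
    return p4_out, p5_out


# Inputs shaped like the FPN outputs P3', P4' and the original P5.
p3_td, p4_td, p5 = 3 * np.ones((8, 32, 32)), 2 * np.ones((8, 16, 16)), np.ones((8, 8, 8))
p4_out, p5_out = pan_bottom_up(p3_td, p4_td, p5)
```

Together with the top-down pass this gives the three outputs P3', P4'' and P5'' used by the detection heads in Step 4.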

[0040] Step 4: After bidirectional multi-scale feature fusion via FPN (top-down) and PAN (bottom-up), three optimized feature maps P3', P4'', and P5'' are obtained. Each feature map possesses both high-resolution details and high semantic information. Specifically: the large-scale feature map P3' is used to detect small food targets (such as small-particle ingredients or chopped side dishes), accurately locating the pixel-level position of the target and accurately determining the food category; the small-scale feature map P5'' is used to detect large food targets (such as whole dishes or large plates), clearly identifying the target category and accurately defining the complete range of the target; and the intermediate-scale feature map P4'' covers medium-sized food targets, filling the recognition gap between large and small targets. Furthermore, by performing detection and segmentation prediction on these three scale feature maps respectively, accurate recognition of food targets of all sizes, from large plates to small ingredients, is achieved, avoiding the problems of missed detection of large targets and false detection of small targets.

[0041] Step S13: Based on the candidate detection boxes, obtain dish recognition information using a preset dish recognition model. It should be noted that the dish recognition information includes the dish name, the recognition confidence, and other information.

[0042] In one embodiment, the candidate detection box obtained from the target detection is directly input into a preset dish recognition model to perform dish recognition and obtain dish recognition information.

[0043] In another embodiment, the user can specify the dish to be identified. Optionally, during image capture, the user points at the dish with a finger, and the image to be identified is input into a preset gesture recognition model, which detects the fingertip position in the image. Based on the fingertip position, a matching target detection box is then selected from the candidate detection boxes; optionally, the candidate box closest to the fingertip is taken as the target detection box. Finally, the dish recognition model recognizes the image region corresponding to the target detection box and outputs the dish recognition information.

[0044] It should be noted that the dish recognition model includes a first backbone network, a second backbone network, a third backbone network, and a gating classification head. The first backbone network, the second backbone network, and the third backbone network are regnet_y_16gf, regnet_y_32gf, and regnet_y_128gf, respectively. The first backbone network, the second backbone network, and the third backbone network extract features from the detection boxes to be recognized. The three feature vectors extracted by the three backbone networks are fused together. The gating classification head adaptively weights the feature representations at different scales to achieve effective fusion of multi-scale features, thereby classifying and obtaining dish recognition information.

[0045] Step S14: Based on the dish identification information, analyze the dish using a preset nutritional analysis model to obtain the nutritional components of the dish.

[0046] It should be noted that a data distillation technique based on a pre-set large model was employed to distill high-quality datasets at the level of professional nutritionists, thereby constructing a professional knowledge base for the field of nutrition and health. This distillation process utilizes the powerful reasoning capabilities and rich nutritional knowledge of the large model to generate training data covering professional content such as nutritional analysis of dishes, dietary pairing suggestions, and health assessments.

[0047] It should be noted that the nutritional components of the dishes mentioned include macronutrients, such as calories, protein, fat, carbohydrates, and dietary fiber; vitamins, such as vitamin A, vitamin B1, vitamin B2, and vitamin B6; minerals, such as calcium, iron, zinc, selenium, magnesium, phosphorus, potassium, sodium, copper, and manganese; and other nutrients, such as cholesterol, saturated fatty acids, unsaturated fatty acids, trans fatty acids, and sugars.

[0048] In this embodiment, the nutritional analysis model has a built-in or associated authoritative nutritional database containing basic nutritional information for various standard dishes (such as the content of protein, carbohydrates, fat, vitamins, and minerals per 100 grams of dish, as well as core indicators such as calories and dietary fiber). Further, based on the identified standard dish name, the nutritional analysis model accurately extracts the basic nutritional data of the corresponding dish from the database; for compound dishes (such as platters or rice bowls), it extracts the basic nutritional data of each sub-dish separately. Then, the nutritional components of the dish are calculated based on the basic nutritional data.
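Database entries are expressed per 100 grams, so one natural way to compute a dish's totals is to scale each entry by the portion weight and sum across sub-dishes. The sketch below assumes the portion weights are already known; the text does not say how they are estimated. The database contents are placeholder values for illustration only, not real nutrition data.

```python
# Placeholder per-100 g values for illustration only; real figures would come from
# the authoritative nutrition database described in the text.
NUTRITION_DB = {
    "dish A": {"energy_kcal": 100.0, "protein_g": 2.0},
    "dish B": {"energy_kcal": 50.0, "protein_g": 4.0},
}


def dish_nutrients(components, db=NUTRITION_DB):
    """Scale per-100 g data by portion weight and sum over sub-dishes.

    `components` is a list of (standard dish name, portion in grams). A single dish
    has one entry; a compound dish (platter, rice bowl) lists each sub-dish.
    Portion weights are assumed inputs; the text does not say how they are obtained.
    """
    totals = {}
    for name, grams in components:
        for nutrient, per_100g in db[name].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return totals


rice_bowl = [("dish A", 200), ("dish B", 50)]
```

Treating a compound dish as a list of sub-dishes means single and compound dishes go through the same code path.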

[0049] This embodiment acquires an image to be identified using a pre-set image acquisition device; performs multi-target detection on the image to obtain candidate detection boxes; identifies the dish using a pre-set dish recognition model based on the candidate detection boxes; and analyzes the dish's nutritional components using a pre-set nutritional analysis model based on the dish recognition information. By automatically completing dish detection, recognition, and nutritional analysis from the image, the entire process eliminates the need for users to manually input ingredient information or manually calculate nutritional data, effectively improving the accuracy of dish nutritional analysis.

[0050] In one feasible implementation, referring to Figure 2, Figure 2 is a flowchart of the second embodiment of the method for nutritional analysis of dishes in this application. Obtaining dish recognition information from the candidate detection boxes using a preset dish recognition model includes the following. Step S21: Input the image to be recognized into a preset gesture recognition model to detect the fingertip position in the image. In this embodiment, the gesture recognition model uses a deep-learning hand keypoint detection algorithm that can accurately locate 21 hand joints, including the wrist, knuckle, and fingertip joints. The backbone network of the gesture recognition model extracts deep features of the hand region, and a keypoint prediction head analyzes these features and outputs the two-dimensional pixel coordinates of each joint in the image; the user's fingertip position is then obtained from these coordinates.
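Picking the fingertip out of the 21 keypoints reduces to indexing into the keypoint list. The sketch below assumes the widely used 21-point hand layout, in which index 8 is the index-finger tip, and assumes the user points with the index finger. The text confirms neither the layout nor the finger.

```python
# In the widely used 21-point hand layout (wrist = 0, then four points per finger
# from thumb to little finger), index 8 is the index-finger tip. Using that layout
# and that finger is an assumption; the text only says 21 joints are located.
INDEX_FINGERTIP = 8


def fingertip_position(keypoints):
    """Return the (x, y) pixel coordinate of the pointing fingertip.

    `keypoints` is the list of 21 (x, y) coordinates output by the keypoint head.
    """
    if len(keypoints) != 21:
        raise ValueError(f"expected 21 hand keypoints, got {len(keypoints)}")
    return keypoints[INDEX_FINGERTIP]
```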

[0051] Step S22: Based on the fingertip position, a target detection box is obtained by filtering from each of the candidate detection boxes; In this embodiment, precise selection of the target detection box is achieved by calculating the Euclidean distance between the fingertip position and the center point of the food detection box. More specifically, it is determined whether the fingertip position falls within the region of any of the candidate detection boxes; if so, the candidate detection box containing the fingertip position is directly used as the target detection box; if not, the center point of each candidate detection box is determined, and then the target distance between the fingertip position and each of the center points is calculated; among the candidate detection boxes, the candidate detection box with the smallest target distance is selected as the target detection box.
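The selection rule in Step S22 translates directly into code. In this sketch boxes are `(x1, y1, x2, y2)` tuples. If the fingertip lies inside several overlapping boxes, the first one found is returned. The text does not cover that case, so this tie-break is an assumption.

```python
import math


def select_target_box(fingertip, boxes):
    """Return the box containing the fingertip, otherwise the box whose
    centre is nearest to the fingertip in Euclidean distance.

    `boxes` are (x1, y1, x2, y2). With overlapping boxes that all contain the
    fingertip, the first one is returned (an assumed tie-break).
    """
    if not boxes:
        return None
    fx, fy = fingertip

    # Case 1: the fingertip falls inside a candidate box.
    for box in boxes:
        x1, y1, x2, y2 = box
        if x1 <= fx <= x2 and y1 <= fy <= y2:
            return box

    # Case 2: pick the candidate whose centre point is closest.
    def centre_distance(box):
        x1, y1, x2, y2 = box
        return math.hypot(fx - (x1 + x2) / 2, fy - (y1 + y2) / 2)

    return min(boxes, key=centre_distance)


boxes = [(0, 0, 10, 10), (20, 20, 40, 40)]
```

For a fingertip at (18, 18), which lies outside both boxes, the second box wins because its centre (30, 30) is about 17.0 pixels away, against about 18.4 for the centre (5, 5) of the first.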

[0052] Step S23: Based on the target detection box, the dish recognition model is used to perform recognition to obtain the dish recognition information.

[0053] It should be noted that the model is trained on a large-scale dataset of Chinese food images, covering a variety of common Chinese dishes from the eight major cuisines, including Sichuan, Cantonese, Hunan, and Shandong cuisines.

[0054] In this embodiment, three backbone networks, regnet_y_16gf, regnet_y_32gf, and regnet_y_128gf, are used to extract feature vectors from the image regions corresponding to the detection boxes. Then, feature fusion is performed on the feature vectors output by each backbone network. A gated classification head is used to automatically assign weights through a gating mechanism. Furthermore, the fused feature vector is multiplied element-wise with the gate weights, and the weighted feature vectors are summed to obtain the final adaptive fused feature vector, thus achieving effective fusion of multi-scale features. The latter half of the gated classification head is the classification module, which completes the dish recognition based on the final adaptive fused feature vector, thereby obtaining the dish recognition information through classification inference.

[0055] This embodiment inputs the image to be recognized into a preset gesture recognition model to detect the fingertip position in the image; based on the fingertip position, a target detection box is obtained by filtering from the candidate detection boxes; based on the target detection box, the dish recognition model is used for recognition to obtain the dish recognition information. Users can independently select the dish to be recognized, thereby accurately obtaining the corresponding dish recognition information with the help of the dish recognition model.

[0056] In one feasible implementation, referring to Figure 3, Figure 3 is a flowchart of the third embodiment of the method for nutritional analysis of dishes in this application. Performing recognition on the target detection box with the dish recognition model to obtain the dish recognition information includes the following. Step S31: Based on the target detection box, perform feature extraction with the first, second, and third backbone networks respectively to obtain the feature vector output by each of them. In this embodiment, the image region corresponding to the target detection box is preprocessed, for example cropped and scaled to the uniform input size preset for the backbone networks. The preprocessed region is then input into the three pre-trained backbone networks, that is, the first, second, and third backbone networks, which perform feature extraction in parallel as independent inferences, each outputting its own feature vector.

[0057] Step S32: Perform feature fusion on the feature vectors to obtain a fused feature vector. It should be noted that the three feature vectors may have different dimensions. Therefore, each feature vector can first be mapped to the same target feature dimension, and the mapped feature vectors are then fused to obtain the fused feature vector.
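As an illustration of Step S32, the sketch below maps three feature vectors of different lengths to a common target dimension with one fully connected (linear) mapping each, and then fuses them. The text does not specify the mapping or the fusion operation, so the linear projection, the concatenation, and the names `linear` and `project_and_fuse` are assumptions made for illustration; in a trained model the weights would be learned.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output dimension


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Fully connected mapping y = W x + b for a single feature vector."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input dimension")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def project_and_fuse(features: List[Vector], weights: List[Matrix], biases: List[Vector]) -> Vector:
    """Map each backbone's feature vector to a shared target dimension, then concatenate."""
    projected = [linear(W, b, f) for W, b, f in zip(weights, biases, features)]
    target_dim = len(projected[0])
    if any(len(p) != target_dim for p in projected):
        raise ValueError("all projections must output the same target dimension")
    # Concatenation keeps each branch in its own slice of length target_dim,
    # so a later gating step can still reweight the branches individually.
    return [value for p in projected for value in p]
```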

[0058] Step S33: Input the fused feature vector into the gating classification head for adaptive weighting and classification to obtain the dish recognition information.

[0059] It should be noted that the gating classification head includes a gating module and a classification module. The classification module includes a fully connected layer and a Softmax activation function. In this embodiment, the fused feature vector is input into the gating module, which outputs three corresponding gating weights. The gating module automatically adjusts the weight allocation according to the specific characteristics of the dish, and then performs element-wise weighted multiplication of the gating weights with the fused feature vector. The weighted feature vectors are then summed to obtain the adaptively fused target feature vector. The classification module completes dish recognition based on the adaptively fused target feature vector, outputting the probability value of each dish category. The dish category with the highest probability value is selected as the dish name, and this highest probability value is used as the recognition confidence level. After integration, the complete dish recognition information is output.
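One possible reading of the gating classification head is sketched below in pure Python. The gating module turns the fused feature vector into three gating weights with a fully connected layer and Softmax; each backbone's projected feature vector is scaled by its weight and the results are summed; and the classification module (a fully connected layer plus Softmax) yields per-category probabilities, from which the highest-probability category becomes the dish name and its probability the recognition confidence. The layer shapes, the Softmax over gate logits, the function names, and the dish labels are hypothetical; the text does not state them.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Fully connected layer: W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_classify(branches: List[Vector], gate_w: Matrix, gate_b: Vector,
                   cls_w: Matrix, cls_b: Vector, labels: List[str]) -> Dict[str, object]:
    """branches: the three projected backbone features, all of the same length."""
    fused = [v for branch in branches for v in branch]   # fused feature vector
    gates = softmax(dense(gate_w, gate_b, fused))         # one gating weight per backbone
    dim = len(branches[0])
    # Weight each branch by its gate and sum -> adaptively fused target feature vector.
    adaptive = [sum(g * branch[i] for g, branch in zip(gates, branches)) for i in range(dim)]
    probs = softmax(dense(cls_w, cls_b, adaptive))        # FC layer + Softmax
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"dish_name": labels[best], "confidence": probs[best], "gate_weights": gates}
```

With all-zero gate weights the three backbones contribute equally; a trained gate would shift weight towards whichever backbone suits the dish at hand.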

[0060] This embodiment extracts features from the target detection box using the first backbone network, the second backbone network, and the third backbone network, respectively, to obtain feature vectors output by each network. These feature vectors are then fused to obtain a fused feature vector. This fused feature vector is input to the gating classification head for adaptive weighting and classification to obtain the dish recognition information. By using multiple backbone networks for feature extraction and a gating classification head to automatically allocate weights through a gating mechanism, efficient fusion of multi-dimensional features is achieved, effectively improving the accuracy and robustness of dish recognition.

[0061] In one feasible implementation, referring to Figure 4, which is a flowchart illustrating Embodiment 4 of the nutritional analysis method for dishes in this application, after the nutritional components of the dish are obtained by using the preset nutritional analysis model based on the dish identification information, the method further includes: Step S41: Obtain the user's historical dietary data within a preset time period; Step S42: Perform a differential comparative analysis of the historical dietary data and the preset standard nutritional information to obtain nutritional difference results; Step S43: Based on the nutritional difference results, generate a recommended recipe and push the recommended recipe to the user.

[0062] In this embodiment, historical dietary data of the user over a preset time period is acquired, including the names of the dishes consumed during the preset time period, the portion size per serving, and the corresponding basic nutrient intakes such as protein, carbohydrates, and vitamins. The preset time period can be set according to actual circumstances and is not specifically limited here. Further, the historical dietary data and preset standard nutritional information are compared and analyzed to calculate the difference between the user's actual intake and the standard value, as well as the deviation ratio. Further, based on the difference and deviation ratio, the nutritional difference results are presented in a structured form, clearly indicating the degree of deviation for various nutrients. Then, based on the nutritional difference results, dishes that can specifically fill gaps or control excesses are selected from a preset recipe library to generate recommended recipes, which are then pushed to the user.
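The comparison and recipe selection of [0062] can be sketched as follows, assuming intake and standard values are given per nutrient in the same units. The difference is actual minus standard, and the deviation ratio is that difference divided by the standard value. The ranking rule (reward dishes rich in nutrients that fall short, penalize dishes rich in nutrients already in excess, each normalized by the standard value), the 10% tolerance, and all names are illustrative assumptions, not the application's specified method.

```python
from typing import Dict, List

Nutrients = Dict[str, float]


def nutrition_gaps(intake: Nutrients, standard: Nutrients) -> Dict[str, Dict[str, float]]:
    """Difference (actual - standard) and deviation ratio for every standard nutrient."""
    gaps = {}
    for nutrient, target in standard.items():
        diff = intake.get(nutrient, 0.0) - target
        gaps[nutrient] = {"target": target, "difference": diff,
                          "deviation_ratio": diff / target if target else 0.0}
    return gaps


def recommend(gaps: Dict[str, Dict[str, float]], recipes: Dict[str, Nutrients],
              tolerance: float = 0.1, top_k: int = 2) -> List[str]:
    """Rank recipes: reward nutrients in deficit, penalise nutrients in excess."""
    def score(nutrients: Nutrients) -> float:
        s = 0.0
        for name, g in gaps.items():
            share = nutrients.get(name, 0.0) / g["target"] if g["target"] else 0.0
            if g["deviation_ratio"] < -tolerance:   # deficit: this dish helps fill the gap
                s += share * -g["deviation_ratio"]
            elif g["deviation_ratio"] > tolerance:  # excess: this dish makes it worse
                s -= share * g["deviation_ratio"]
        return s
    return sorted(recipes, key=lambda r: score(recipes[r]), reverse=True)[:top_k]
```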

[0063] In other embodiments, the user's dietary preferences can be analyzed based on the names of the dishes consumed by the user within a preset time period, and then recommended recipes can be generated based on the nutritional differences and the user's dietary preferences.
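A minimal sketch of combining dietary preferences with the nutritional ranking is given below. It assumes each dish name can be mapped to a preference tag such as a cuisine (the `dish_tags` mapping is hypothetical), measures preference as each tag's share of past meals, and blends it with a nutrition score through a hypothetical weight `beta`; the application does not specify how the two factors are combined.

```python
from collections import Counter
from typing import Dict, List


def preference_weights(history: List[str], dish_tags: Dict[str, str]) -> Dict[str, float]:
    """Share of past meals per tag, derived from the names of dishes eaten."""
    tags = Counter(dish_tags[d] for d in history if d in dish_tags)
    total = sum(tags.values())
    return {t: c / total for t, c in tags.items()} if total else {}


def rerank(nutrition_scores: Dict[str, float], dish_tags: Dict[str, str],
           prefs: Dict[str, float], beta: float = 0.5) -> List[str]:
    """Blend each candidate's nutrition score with the user's tag preference."""
    def blended(name: str) -> float:
        return (1 - beta) * nutrition_scores[name] + beta * prefs.get(dish_tags.get(name, ""), 0.0)
    return sorted(nutrition_scores, key=blended, reverse=True)
```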

[0064] In addition, other embodiments support analyzing nutritional intake trends in the historical dietary data across different time dimensions, such as day, week, and month, to help users understand changes in their eating habits. When nutritional intake deviates significantly from the preset nutritional recommendation value, an alert is automatically triggered and specific improvement suggestions are provided. The nutritional recommendation value can be automatically adjusted according to personal characteristics such as the user's age, gender, and activity level.
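The trend analysis and alert of [0064] can be sketched for a single nutrient at the weekly level, assuming the historical data is a list of (date, amount) records. Meals on the same day are summed, daily totals are averaged per ISO calendar week, and a week is flagged when its average deviates from the recommended value by more than a hypothetical 20% threshold; daily or monthly views would only change the grouping key.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def weekly_average(records: List[Tuple[date, float]]) -> Dict[Week, float]:
    """Average daily intake of one nutrient per ISO calendar week."""
    daily: Dict[date, float] = defaultdict(float)
    for day, amount in records:
        daily[day] += amount  # several meals on one day add up
    weeks: Dict[Week, List[float]] = defaultdict(list)
    for day, total in daily.items():
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])].append(total)
    return {wk: sum(v) / len(v) for wk, v in sorted(weeks.items())}


def alerts(averages: Dict[Week, float], recommended: float,
           threshold: float = 0.2) -> List[Tuple[Week, float]]:
    """Flag periods whose average deviates from the recommendation by more than threshold."""
    flagged = []
    for period, avg in averages.items():
        deviation = (avg - recommended) / recommended
        if abs(deviation) > threshold:
            flagged.append((period, deviation))
    return flagged
```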

[0065] This embodiment performs a differential comparative analysis of a user's historical dietary data over a preset time period and preset standard nutritional information to obtain nutritional difference results. Based on these results, a recommended recipe is generated and pushed to the user. This achieves the generation of nutritionally balanced recommended recipes based on the user's personalized dietary data, helping users to scientifically adjust their daily diet.

[0066] In one feasible implementation, the target detection model is trained according to the following steps: Step S51: Obtain several image samples; Step S52: For each image sample, input the image sample into the detection model to be trained and output a predicted bounding box. In this embodiment, several image samples related to dishes are collected in advance, and each image sample is annotated with a corresponding ground truth bounding box. The ground truth bounding box is a manually labeled bounding box that accurately defines the actual position and outline of the dish target in the image sample. Each image sample is input into the detection model to be trained, which performs multi-target dish detection and outputs the predicted bounding box corresponding to each dish target detected in the image sample.

[0067] Step S53: Determine the intersection-over-union ratio of the predicted bounding box and the ground truth bounding box associated with the image sample, the center distance between the predicted bounding box and the ground truth bounding box, and the aspect ratio of each of the predicted bounding box and the ground truth bounding box. In this embodiment, the intersection-over-union (IoU) ratio is the core indicator of the degree of overlap between the predicted bounding box and the ground truth bounding box; its value lies in [0, 1], and a higher value indicates greater overlap. The IoU is calculated from the predicted bounding box and the ground truth bounding box associated with the image sample, as the area of their intersection divided by the area of their union. Furthermore, the center points of the predicted bounding box and the ground truth bounding box are determined, together with the distance between them. In addition, the aspect ratio term quantifies the shape difference between the predicted bounding box and the ground truth bounding box; the aspect ratio of each box is calculated from its width and height.
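The three geometric quantities of Step S53 can be computed as shown below, assuming axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1. The IoU is the intersection area divided by the union area, the center distance is the Euclidean distance between box centers, and the aspect ratio is taken here as width divided by height; the function names are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(p: Box, g: Box) -> float:
    """Intersection area divided by union area, in [0, 1]."""
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
    return inter / union


def center_distance(p: Box, g: Box) -> float:
    """Euclidean distance between the two box centres."""
    dx = (p[0] + p[2]) / 2 - (g[0] + g[2]) / 2
    dy = (p[1] + p[3]) / 2 - (g[1] + g[3]) / 2
    return math.hypot(dx, dy)


def aspect_ratio(b: Box) -> float:
    """Width divided by height."""
    return (b[2] - b[0]) / (b[3] - b[1])
```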

[0068] Step S54: Train the detection model to be trained based on the intersection-union ratio, the center distance, and the aspect ratio to obtain the target detection model.

[0069] In this embodiment, a loss value is calculated based on the intersection-over-union ratio, the center distance, and the aspect ratios, and the detection model to be trained is then trained based on the loss value until a termination condition is met, thereby obtaining the target detection model. The termination condition includes the loss value converging or the number of training iterations reaching the preset maximum.
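The application does not give the loss formula. One well-known loss built from exactly these three quantities is the Complete-IoU (CIoU) loss, 1 - IoU + rho^2/c^2 + alpha*v, where rho is the center distance, c is the diagonal of the smallest box enclosing both boxes, v measures the aspect-ratio mismatch, and alpha weights it. The sketch below assumes that form, together with a hypothetical stopping rule that ends training when an iteration budget is exhausted or the recent losses vary by less than a tolerance; neither is confirmed as the applicant's choice.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def ciou_loss(p: Box, g: Box) -> float:
    """Assumed CIoU form: 1 - IoU + rho^2 / c^2 + alpha * v."""
    wp, hp = p[2] - p[0], p[3] - p[1]
    wg, hg = g[2] - g[0], g[3] - g[1]
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    iou = inter / (wp * hp + wg * hg - inter)
    # rho^2: squared distance between the two box centres.
    rho2 = (((p[0] + p[2]) - (g[0] + g[2])) ** 2 + ((p[1] + p[3]) - (g[1] + g[3])) ** 2) / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(p[2], g[2]) - min(p[0], g[0])) ** 2 + (max(p[3], g[3]) - min(p[1], g[1])) ** 2
    # v: aspect-ratio mismatch; alpha: its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v


def should_stop(losses: List[float], max_iters: int, tol: float = 1e-4, patience: int = 3) -> bool:
    """Hypothetical termination rule: budget exhausted, or the loss has plateaued."""
    if len(losses) >= max_iters:
        return True
    if len(losses) <= patience:
        return False
    recent = losses[-(patience + 1):]
    return max(recent) - min(recent) < tol
```

Compared with a plain 1 - IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap, and the aspect-ratio term penalizes shape mismatch between boxes that overlap well.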

[0070] This embodiment inputs each image sample into a detection model to be trained, outputting a predicted bounding box. It determines the intersection-over-union (IoU) ratio between the predicted bounding box and the ground truth bounding box associated with the image sample, the center distance between the predicted bounding box and the ground truth bounding box, and the aspect ratios of the predicted bounding box and the ground truth bounding box, respectively. Based on the IoU, the center distance, and the aspect ratios, the detection model is trained to obtain the target detection model. Training the model using the IoU, the center distance, and the aspect ratios improves the accuracy of target detection.

[0071] It should be noted that the examples in the figure are only for understanding this application and do not constitute a limitation on the nutritional analysis method of the dishes in this application. Any simple modifications based on this technical concept are within the protection scope of this application.

[0072] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0073] This application also provides a food nutrition analysis device. Referring to Figure 5, which is a schematic diagram of the module structure of the food nutrition analysis device according to an embodiment of this application, the food nutrition analysis device includes: an image acquisition module 61, used to acquire the image to be recognized through a preset image acquisition device; a target detection module 62, used to perform multi-target detection on the image to be identified and obtain candidate detection boxes; a dish recognition module 63, used to obtain dish recognition information based on the candidate detection boxes and a preset dish recognition model; and a nutrition analysis module 64, used to analyze the nutritional components of the dish based on the dish identification information using a preset nutrition analysis model.
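The data flow between modules 61 to 64 can be summarized with a small Python sketch in which each module is a pluggable callable. The class name, the field names, and the idea of passing the image and boxes as plain Python objects are assumptions for illustration; the application does not define a software interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


@dataclass
class DishNutritionAnalyzer:
    """Hypothetical wiring of the four modules; each field is a stand-in callable."""
    acquire_image: Callable[[], object]                          # image acquisition module 61
    detect: Callable[[object], List[Box]]                        # target detection module 62
    recognize: Callable[[object, List[Box]], Dict[str, object]]  # dish recognition module 63
    analyze: Callable[[Dict[str, object]], Dict[str, float]]     # nutrition analysis module 64

    def run(self) -> Dict[str, float]:
        image = self.acquire_image()
        boxes = self.detect(image)
        if not boxes:
            raise ValueError("no dish detected in the image")
        info = self.recognize(image, boxes)
        return self.analyze(info)
```

Keeping each module behind a callable lets the detection, recognition, and nutrition models be swapped or tested independently.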

[0074] The dish recognition module 63 is also used for: The image to be recognized is input into a preset gesture recognition model to detect the fingertip position in the image to be recognized; Based on the fingertip position, a target detection box is obtained by filtering among the candidate detection boxes; Based on the target detection box, the dish recognition model is used to identify the dish and obtain the dish recognition information.

[0075] The dish recognition module 63 is also used for: Determine whether the fingertip position falls within any of the candidate detection boxes; If so, the candidate detection box where the fingertip is located is taken as the target detection box; If not, then determine the target distance between the fingertip position and the center point of each of the candidate detection boxes; and filter the candidate detection boxes according to the target distance to obtain the target detection box.
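The selection rule of [0075] can be sketched as follows, assuming the fingertip position is an (x, y) pixel coordinate and each candidate box is (x1, y1, x2, y2). Two details are not specified in the application and are assumptions here: when the fingertip lies inside several overlapping boxes the smallest one is chosen, and an optional `max_distance` lets the caller reject a nearest box that is too far away to be a plausible selection.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # fingertip (x, y)


def contains(box: Box, pt: Point) -> bool:
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]


def select_target_box(fingertip: Point, candidates: List[Box],
                      max_distance: Optional[float] = None) -> Optional[Box]:
    inside = [b for b in candidates if contains(b, fingertip)]
    if inside:
        # Assumption: with overlapping boxes, prefer the smallest box containing the fingertip.
        return min(inside, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    if not candidates:
        return None

    def dist(b: Box) -> float:
        """Target distance: fingertip to the box's center point."""
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return math.hypot(fingertip[0] - cx, fingertip[1] - cy)

    nearest = min(candidates, key=dist)
    if max_distance is not None and dist(nearest) > max_distance:
        return None  # hypothetical rejection threshold
    return nearest
```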

[0076] The dish recognition model includes a first backbone network, a second backbone network, a third backbone network, and a gating classification head.

[0077] The dish recognition module 63 is also used for: Based on the target detection box, feature extraction is performed using the first backbone network, the second backbone network, and the third backbone network respectively, to obtain the feature vectors output by the first backbone network, the second backbone network, and the third backbone network respectively; The feature vectors are fused to obtain a fused feature vector. The fused feature vector is input into the gating classification head for adaptive weighting and classification to obtain the dish recognition information.

[0078] The target detection module 62 is also used for: The image to be identified is input into a preset target detection model for multi-target detection to obtain candidate detection boxes.

[0079] The nutritional analysis device for the dishes also includes: The diet data acquisition module is used to acquire the user's historical diet data over a preset time period; The analysis module is used to perform a comparative analysis of the historical dietary data and preset standard nutritional information to obtain nutritional difference results. The recommendation module is used to generate recommended recipes based on the nutritional differences and push the recommended recipes to the user.

[0080] The nutritional analysis device for dishes provided in this application, employing the nutritional analysis method for dishes in the above embodiments, can solve the technical problems mentioned in the background art. Compared with the prior art, the beneficial effects of the nutritional analysis device for dishes provided in this application are the same as the beneficial effects of the nutritional analysis method for dishes provided in the above embodiments, and other technical features in the nutritional analysis device for dishes are the same as the features disclosed in the methods of the above embodiments, and will not be repeated here.

[0081] This application provides food nutrition analysis equipment, which includes: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the food nutrition analysis method in Embodiment 1 above.

[0082] Reference is made to Figure 6, which is a schematic diagram of the hardware operating environment involved in the nutritional analysis method for dishes in the embodiments of this application. The nutritional analysis equipment for dishes in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (such as in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The food nutrition analysis equipment shown in Figure 6 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of this application.

[0083] As shown in Figure 6, the food nutrition analysis equipment may include a processing device 1001 (e.g., a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory 1002 or a program loaded from a storage device 1003 into a random access memory 1004. The random access memory 1004 also stores various programs and data required for the operation of the food nutrition analysis equipment. The processing device 1001, the read-only memory 1002, and the random access memory 1004 are interconnected via a bus 1005. An input/output interface 1006 is also connected to the bus 1005. Typically, the following devices can be connected to the input/output interface 1006: input devices 1007 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, and gyroscope; output devices 1008 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 1003 including, for example, magnetic tape and hard disk; and a communication device 1009. The communication device 1009 allows the food nutrition analysis equipment to communicate with other devices wirelessly or by wire to exchange data. Although the figure shows food nutrition analysis equipment with various devices, it should be understood that it is not required to implement or include all of the devices shown; more or fewer devices may alternatively be implemented.

[0084] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from storage device 1003, or installed from read-only memory 1002. When the computer program is executed by processing device 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.

[0085] The nutritional analysis equipment for dishes provided in this application, employing the nutritional analysis method for dishes in the above embodiments, can solve the technical problems mentioned in the background art. Compared with the prior art, the beneficial effects of the nutritional analysis equipment for dishes provided in this application are the same as those of the nutritional analysis method for dishes provided in the above embodiments, and the other technical features of the nutritional analysis equipment for dishes are the same as those disclosed in the method of the previous embodiment, and will not be repeated here.

[0086] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.

[0087] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0088] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, which are used to execute the food nutrition analysis method in the above embodiments.

[0089] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited thereto; it may be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to wires, optical cables, RF (Radio Frequency), or any suitable combination thereof.

[0090] The aforementioned computer-readable storage medium may be included in the food nutrition analysis equipment, or it may exist independently without being assembled into the food nutrition analysis equipment.

[0091] The aforementioned computer-readable storage medium carries one or more programs. When these programs are executed by the food nutrition analysis device, the device performs the following actions: acquires an image to be identified using a preset image acquisition device; performs multi-target detection on the image to be identified to obtain candidate detection boxes; identifies the food using a preset food recognition model based on the candidate detection boxes to obtain food identification information; and analyzes the food identification information using a preset nutrition analysis model to obtain the nutritional components of the food. By automatically completing food detection, identification, and nutritional analysis from images, the entire process eliminates the need for users to manually input ingredient-related information or manually calculate nutritional data, effectively improving the accuracy of food nutrition analysis.

[0092] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0093] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

[0094] The modules described in the embodiments of this application can be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

[0095] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described method for nutritional analysis of dishes, and is capable of solving the technical problems described in the background section. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as the beneficial effects of the method for nutritional analysis of dishes provided in the above embodiments, and will not be repeated here.

[0096] This application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-described method for nutritional analysis of dishes.

[0097] The computer program product provided in this application can solve the technical problems described in the background section. Compared with the prior art, the beneficial effects of the computer program product provided in the embodiments of this application are the same as the beneficial effects of the food nutrition analysis method provided in the above embodiments, and will not be repeated here.

[0098] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0099] The above description covers only some of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application using the contents of the specification and drawings of this application, or direct or indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A method for nutritional analysis of dishes, characterized in that it comprises: acquiring an image to be identified through a preset image acquisition device; performing multi-target detection on the image to be identified to obtain candidate detection boxes; obtaining dish recognition information based on the candidate detection boxes by using a preset dish recognition model; and obtaining the nutritional components of the dish based on the dish recognition information by using a preset nutritional analysis model.

2. The method for nutritional analysis of dishes as described in claim 1, characterized in that, The step of identifying dish recognition information using a preset dish recognition model based on the candidate detection box includes: The image to be recognized is input into a preset gesture recognition model to detect the fingertip position in the image to be recognized; Based on the fingertip position, a target detection box is obtained by filtering among the candidate detection boxes; Based on the target detection box, the dish recognition model is used to identify the dish and obtain the dish recognition information.

3. The method for nutritional analysis of dishes as described in claim 2, characterized in that, The step of filtering the candidate detection boxes based on the fingertip position to obtain the target detection box includes: Determine whether the fingertip position falls within any of the candidate detection boxes; If so, the candidate detection box where the fingertip is located is taken as the target detection box; If not, then determine the target distance between the fingertip position and the center point of each of the candidate detection boxes; and filter the candidate detection boxes according to the target distance to obtain the target detection box.

4. The method for nutritional analysis of dishes as described in claim 2, characterized in that, The dish recognition model includes a first backbone network, a second backbone network, a third backbone network, and a gating classification head; the step of using the dish recognition model to identify the target detection box and obtain the dish recognition information includes: Based on the target detection box, feature extraction is performed using the first backbone network, the second backbone network, and the third backbone network respectively, to obtain the feature vectors output by the first backbone network, the second backbone network, and the third backbone network respectively; The feature vectors are fused to obtain a fused feature vector. The fused feature vector is input into the gating classification head for adaptive weighting and classification to obtain the dish recognition information.

5. The method for nutritional analysis of dishes as described in claim 1, characterized in that, after the nutritional components of the dish are obtained by using the preset nutritional analysis model based on the dish identification information, the method further includes: obtaining the user's historical dietary data over a preset time period; comparing and analyzing the historical dietary data and the preset standard nutritional information to obtain nutritional difference results; and generating recommended recipes based on the nutritional difference results and pushing the recommended recipes to the user.

6. The method for nutritional analysis of dishes as described in claim 1, characterized in that the step of performing multi-target detection on the image to be identified to obtain candidate detection boxes includes: inputting the image to be identified into a preset target detection model for multi-target detection to obtain the candidate detection boxes; wherein the target detection model is trained according to the following steps: acquiring several image samples; for each image sample, inputting the image sample into a detection model to be trained and outputting a predicted bounding box; determining the intersection-union ratio of the predicted bounding box and the ground truth bounding box associated with the image sample, the center distance between the predicted bounding box and the ground truth bounding box, and the aspect ratios of the predicted bounding box and the ground truth bounding box respectively; and training the detection model to be trained based on the intersection-union ratio, the center distance, and the aspect ratios to obtain the target detection model.

7. The method for nutritional analysis of dishes as described in claim 1, characterized in that, The acquisition of the image to be identified through a preset image acquisition device includes: In response to the user's wake-up command, the image acquisition device is activated to acquire the image to be identified.

8. A nutritional analysis device for dishes, characterized in that it is applied to a wearable device and includes: an image acquisition module, used to acquire the image to be recognized through a preset image acquisition device; a target detection module, used to perform multi-target detection on the image to be identified and obtain candidate detection boxes; a dish recognition module, used to obtain dish recognition information based on the candidate detection boxes and a preset dish recognition model; and a nutrition analysis module, used to analyze the nutritional components of a dish based on its identification information using a preset nutrition analysis model.

9. Nutritional analysis equipment for dishes, characterized in that the nutritional analysis equipment includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the food nutrition analysis method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that, The storage medium is a computer-readable storage medium, and a computer program is stored on the storage medium. When the computer program is executed by a processor, it implements the steps of the food nutrition analysis method as described in any one of claims 1 to 7.