Diet planning method, apparatus, device, and storage medium

By combining preset rules with dining images and object data, and using a large language model to select target tasks, dietary suggestions in natural language form are generated. This addresses the mismatch between large-model suggestions and the user's actual situation and yields more accurate, personalized dietary planning.

CN122245629A · Pending · Publication Date: 2026-06-19 · 北京爱和健康科技服务有限公司

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: 北京爱和健康科技服务有限公司
Filing Date: 2026-03-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, when large models rely solely on images of meals to generate dietary recommendations, they cannot match the actual situation of the user, resulting in inaccurate recommendations.

Method used

By acquiring images and data of users dining, and combining them with preset rules, a large language model is used to fuse multi-source information, select suitable target tasks, and generate dietary suggestions in natural language.

Benefits of technology

It improves the personalization and accuracy of dietary planning, reduces the risk of erroneous judgments in large models, and enhances the reliability of dietary recommendations.

Abstract

This application discloses a diet planning method, apparatus, device, and storage medium, relating to the field of image processing technology. The method includes: acquiring a user's dining image and object data; acquiring preset rules, which include judgment conditions corresponding to at least one task; determining the target task that the user should perform from at least one task using a large language model based on the preset rules, dining image, and object data; and performing natural language processing on the target task using the large language model to generate output text. This multi-source information (dining image, object data, and preset rules) filtering method avoids the problem in related technologies where the large model only references dining images, leading to a mismatch between the output diet advice and the user's actual situation, thus improving the personalization of diet planning.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a diet planning method, apparatus, device, and storage medium.

Background Technology

[0002] Diet planning is used to plan the dietary behavior of users in order to generate dietary recommendations.

[0003] In related technologies, large models can be used to plan users' dietary behavior. Specifically, an image of the user's meal can be input into a large model, which identifies the types of ingredients in the image and the weight of each type of ingredient, and generates dietary recommendations based on the general nutritional knowledge it acquired during training.

[0004] However, since the large model references only images of meals, the resulting dietary recommendations may not match the actual situation of the user.

Summary of the Invention

[0005] This application provides a diet planning method, apparatus, device, and storage medium. The technical solution provided by this application includes the following aspects.

[0006] According to one aspect of the embodiments of this application, a diet planning method is provided, the method comprising: acquiring a dining image and object data of a user object, wherein the object data is used to indicate attribute information and pathological information of the user object; acquiring preset rules, the preset rules including at least one judgment condition corresponding to each task, wherein different tasks are used to indicate different suggestions for the dietary behavior of the user object, and the judgment condition corresponding to a task is used to evaluate whether the user object meets the requirements for performing the task; determining, using a large language model, the target task that the user object should perform from the at least one task, based on the judgment conditions corresponding to each task in the preset rules, the dining image, and the object data; and performing natural language organization processing on the target task through the large language model to generate output text, wherein the output text is used to provide dietary advice corresponding to the target task to the user object in natural language form.

[0007] According to one aspect of the embodiments of this application, a diet planning device is provided, the device comprising: a first acquisition module, used to acquire a dining image and object data of a user object, wherein the object data is used to indicate attribute information and pathological information of the user object; a second acquisition module, used to acquire preset rules, the preset rules including at least one judgment condition corresponding to each task, wherein different tasks are used to indicate different suggestions for the dietary behavior of the user object, and the judgment condition corresponding to a task is used to evaluate whether the user object meets the requirements for performing the task; a determination module, used to determine, using a large language model, the target task that the user object should perform from the at least one task, based on the judgment conditions corresponding to each task in the preset rules, the dining image, and the object data; and a generation module, used to perform natural language organization processing on the target task through the large language model to generate output text, wherein the output text is used to provide dietary suggestions corresponding to the target task to the user object in natural language form.

[0008] According to one aspect of the present application, a computer device is provided, the computer device including a processor and a memory, the memory storing a computer program, the computer program being loaded and executed by the processor to implement the above-described diet planning method.

[0009] According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, wherein a computer program is stored in the computer-readable storage medium, the computer program being loaded and executed by a processor to implement the above-described diet planning method.

[0010] According to one aspect of the embodiments of this application, a computer program product is provided, the computer program product including a computer program stored in a computer-readable storage medium, and a processor reading from the computer-readable storage medium and executing the computer program to implement the above-described diet planning method.

[0011] The technical solution provided in this application can bring the following beneficial effects: by acquiring the dining image and object data of the user object and combining them with preset rules, the large language model can accurately select, from multiple tasks, the target task suited to the user object. This selection method based on multi-source information (dining images, object data, and preset rules) avoids the problem in related technologies where the large model references only dining images, which leads to mismatches between the output dietary recommendations and the user's actual situation, and thus improves the personalization of dietary planning. Furthermore, because the preset rules conform to professional nutritional knowledge, they can constrain the large language model's selection process, avoiding erroneous judgments caused by large-model hallucination and improving the accuracy of dietary planning.

Attached Figure Description

[0012] Figure 1 is a schematic diagram of a computer system provided in one embodiment of this application; Figure 2 is a flowchart of a diet planning method provided in one embodiment of this application; Figure 3 is a flowchart of a diet planning method provided in another embodiment of this application; Figure 4 is a flowchart of a diet planning method provided in another embodiment of this application; Figure 5 is a block diagram of a diet planning device provided in one embodiment of this application; Figure 6 is a structural block diagram of a computer device provided in one embodiment of this application.

Detailed Implementation

[0013] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0014] Please refer to Figure 1, which shows a schematic diagram of a computer system provided in one embodiment of this application. The computer system may include a terminal device 10 and a server 20.

[0015] Terminal device 10 is an electronic device with data computing, processing, and storage functions. In some embodiments, terminal device 10 is used as a client to run a target application, which may include a diet planning application. This diet planning application generates output text based on a dining image input by a user, combined with the user's object data and preset rules; wherein the object data indicates the user's attribute and pathological information, the preset rules include at least one judgment condition corresponding to each task, and the output text provides dietary suggestions corresponding to the target task to the user in natural language.

[0016] The execution entity of the above-mentioned diet planning method can be either terminal device 10 or server 20. When the execution entity is server 20, server 20 sends output text to terminal device 10 so that terminal device 10 can receive and display the output text to the user.

[0017] In some embodiments, the terminal device 10 may include, but is not limited to, at least one of the following: mobile phone, tablet computer, personal computer, vehicle terminal, smart wearable device, smart TV, smart voice interaction device, multimedia playback device, etc., and may also include other electronic devices, which are not limited in this application embodiment.

[0018] Server 20 is a computer system specifically designed to provide services, resources, or functions. In some embodiments, server 20 is used to provide background services to clients of a target application running on terminal device 10.

[0019] In some embodiments, the server may include, but is not limited to, at least one of the following: physical server, cloud server, edge server, server cluster, etc., and may also include other types of servers, which are not limited in this application embodiment.

[0020] Terminal device 10 and server 20 communicate with each other via a network. This network can be a wired network or a wireless network.

[0021] Please refer to Figure 2, which shows a flowchart of a diet planning method provided in one embodiment of this application. The execution entity for each step of the method can be a computer device; for example, the computer device may be the terminal device 10 or the server 20 in the computer system shown in Figure 1. The method may include at least one of the following steps (210-240).

[0022] Step 210: Obtain the dining image and object data of the user object. The object data is used to indicate the attribute information and pathological information of the user object.

[0023] The user subject refers to the individual receiving the dietary planning services provided in this application. In the context of this application, the user subject may also be referred to as a patient, service recipient, or user, depending on their specific health status or application scenario. This application does not specifically limit such designations; its core purpose is to refer to the user subject receiving personalized dietary intervention recommendations.

[0024] Dining images refer to image data containing the contents of the current meal, captured by a user through a camera device.

[0025] Attribute information is used to indicate basic information about a user object. Attribute information may include at least one of the following: age, gender, height, weight, body mass index (BMI), basal metabolic rate (BMR), daily activity level (e.g., sedentary, light activity, moderate activity, heavy activity), dietary preferences (e.g., vegetarian, low-fat diet, etc.), which are not limited in this application.

[0026] Pathological information is used to indicate the health status of the user. Pathological information may include diseases or physiological indicators related to dietary management, used to identify the user's specific nutritional needs and dietary restrictions. Pathological information may include at least one of the following: disease diagnosis (such as type 2 diabetes, hypertension, hyperlipidemia, gout, kidney disease), key physiological indicators (such as fasting blood glucose, glycated hemoglobin, blood pressure, lipid profile, uric acid), medication use (especially drugs that affect diet or metabolism), and history of food allergies or intolerances. This application does not limit this.
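The object data described in step 210 can be pictured as a small record type. The following is a minimal sketch only: the class names, field names, and units are assumptions made for illustration and are not defined by this application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeInfo:
    """Basic information about the user object (hypothetical field names)."""
    age: Optional[int] = None
    gender: Optional[str] = None
    height_cm: Optional[float] = None
    weight_kg: Optional[float] = None
    activity_level: Optional[str] = None  # e.g. "sedentary", "light", "moderate", "heavy"
    dietary_preferences: list[str] = field(default_factory=list)

    @property
    def bmi(self) -> Optional[float]:
        # BMI = weight (kg) / height (m) squared; None when data is missing.
        if not self.height_cm or not self.weight_kg:
            return None
        height_m = self.height_cm / 100
        return self.weight_kg / (height_m ** 2)


@dataclass
class PathologicalInfo:
    """Health status relevant to dietary management (hypothetical field names)."""
    diagnoses: list[str] = field(default_factory=list)
    indicators: dict[str, float] = field(default_factory=dict)  # e.g. {"hba1c": 7.4}
    medications: list[str] = field(default_factory=list)
    allergies: list[str] = field(default_factory=list)


@dataclass
class ObjectData:
    attributes: AttributeInfo
    pathology: PathologicalInfo


user = ObjectData(
    attributes=AttributeInfo(age=52, height_cm=170, weight_kg=65),
    pathology=PathologicalInfo(diagnoses=["type 2 diabetes"], indicators={"hba1c": 7.4}),
)
```

Keeping attribute information and pathological information in separate records mirrors the distinction the text draws between the two.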

[0027] Step 220: Obtain preset rules, which include at least one judgment condition corresponding to each task; wherein, different tasks are used to indicate different suggestions for the user's dietary behavior, and the judgment condition corresponding to the task is used to evaluate whether the user meets the requirements for performing the task.

[0028] A task refers to a dietary intervention message with a clear objective and execution content, used to provide actionable optimization suggestions or adjustments to the dietary behavior of a user.

[0029] Optionally, the task may include at least one of the following: a task to adjust the intake of nutrients, a task to optimize eating behavior, a task to plan meal structure, or a task to adapt to a special diet. This application does not limit the scope of the task.

[0030] For example, tasks related to adjusting nutritional elements may include "reducing fat intake" or "increasing dietary fiber intake to 25 grams per day"; tasks related to optimizing eating behavior may include "reducing eating speed to more than 20 chews per bite" or "adjusting the order of eating to vegetables-protein-staple food"; tasks related to planning meal structure may include "adjusting three meals a day to a small, frequent meal pattern" or "recommending that dinner's calorie content should not exceed 30% of the total daily calorie intake"; and tasks related to adapting to special diets may include "activating a low-purine diet pattern" or "initiating a diabetic diet replacement program." It should be noted that the above examples are merely illustrative, and this application does not limit the specific content of the tasks.

[0031] User subject dietary behavior refers to the set of observable and quantifiable behavioral characteristics exhibited by user subjects during the process of consuming food, including but not limited to food selection behavior (such as food type and processing method preference), intake control behavior (such as food portion size and frequency), eating process behavior (such as eating speed and order of eating), and diet-related behavior (such as post-meal activity intensity and drinking habits). This application does not limit this.

[0032] In some embodiments, the preset rule is determined by a nutrition expert. Specifically: a rule configuration interface is displayed for configuring the preset rule; in response to the nutrition expert's configuration operation on the preset rule, the configured rule is displayed; a verification process is performed on the configured rule, and if the configured rule passes the verification process, the configured rule is determined as the preset rule. The verification process includes at least one of the following: logical conflict verification, data integrity verification, threshold rationality verification, syntax and structure verification, and expert review and confirmation; this application does not limit the scope of the verification process.

[0033] The logical conflict check examines whether newly configured rules contradict or conflict with existing rules in the rule base. For example, it checks whether there are conflicting rules targeting the same user group, such as "recommending increased dairy intake" and "prohibiting dairy intake for lactose-intolerant patients." The data integrity check verifies whether the data items required for the judgment conditions referenced in the rules can be obtained or derived from the "meal images" and "object data" defined in step 210. For example, if a rule condition is "triggered when the user's glycated hemoglobin (HbA1c) value is greater than 7%", it checks whether the system can obtain this value from the pathology information field in the object data. The threshold rationality check reviews the reasonableness range of numerical thresholds (such as calorie percentage, nutrient grams) set in the rules based on publicly published clinical nutrition guidelines or medical consensus. For example, it checks whether the value of X in "adjusting the calorie percentage of dinner to no more than X% of the total daily calories" falls within the common reasonable range of 20%-35%. Syntax and structure validation ensures that rules are configured in a standard format that the system can parse (such as a specific scripting language, structured query language, or predefined logical expressions) and are free of syntax errors. Expert review and confirmation, after automated validation, involves submitting the rules to another person or the same expert group for cross-review and final confirmation to increase the rules' authority and reliability.
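Two of the automated checks above, data integrity and threshold rationality, can be sketched as follows. The rule layout, the set of available fields, and the function names are assumptions for illustration; only the 20%-35% range for the dinner calorie percentage comes from the text.

```python
# Data items the system can obtain from the dining image and object data
# (hypothetical field names).
AVAILABLE_FIELDS = {"diagnoses", "hba1c", "fasting_glucose", "meal_image"}

# Reasonable ranges for numerical thresholds; the dinner range is the
# 20%-35% example given in the text.
REASONABLE_RANGES = {"dinner_calorie_pct": (20.0, 35.0)}


def check_data_integrity(rule: dict) -> list[str]:
    """Report data items the rule needs but the system cannot supply."""
    missing = sorted(set(rule.get("required_fields", [])) - AVAILABLE_FIELDS)
    return [f"unavailable data item: {name}" for name in missing]


def check_thresholds(rule: dict) -> list[str]:
    """Report numerical thresholds that fall outside the reasonable range."""
    problems = []
    for name, value in rule.get("thresholds", {}).items():
        low, high = REASONABLE_RANGES.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            problems.append(f"{name}={value} outside {low}-{high}")
    return problems


def validate_rule(rule: dict) -> list[str]:
    """Return all problems found; an empty list means the rule passes."""
    return check_data_integrity(rule) + check_thresholds(rule)


good_rule = {"required_fields": ["hba1c"], "thresholds": {"dinner_calorie_pct": 30}}
bad_rule = {"required_fields": ["uric_acid"], "thresholds": {"dinner_calorie_pct": 50}}
```

In this sketch a configured rule would become a preset rule only when `validate_rule` returns an empty list; logical conflict checks, syntax checks, and expert review are not modeled here.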

[0034] The preset rule's judgment criteria are logical judgment rules built upon nutritional and clinical medical knowledge, used to determine whether the user meets the requirements for performing the task. For example, suppose the task could include: "Inform the user to reduce sugar intake." The judgment criteria could include: if the user's pathological information includes "type 2 diabetes," and the meal image shows that high-glycemic index carbohydrates account for >60% of the calories in the current meal, then the task "Inform the user to reduce sugar intake" is triggered.
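The example judgment condition above can be written as a small predicate. This is an illustrative sketch: the `MealState` and `Rule` types and their field names are assumptions, and the share of calories from high-glycemic-index carbohydrates is taken as already estimated from the dining image.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MealState:
    diagnoses: set[str]
    # Fraction of the meal's calories from high-GI carbohydrates, assumed to
    # have been estimated from the dining image by an upstream step.
    high_gi_calorie_share: float


@dataclass
class Rule:
    task: str
    condition: Callable[[MealState], bool]


reduce_sugar = Rule(
    task="Inform the user to reduce sugar intake",
    condition=lambda s: (
        "type 2 diabetes" in s.diagnoses and s.high_gi_calorie_share > 0.60
    ),
)

state = MealState(diagnoses={"type 2 diabetes"}, high_gi_calorie_share=0.72)
triggered = reduce_sugar.condition(state)
```

Both parts of the condition must hold: a diagnosis alone, or a carbohydrate-heavy meal alone, does not trigger the task.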

[0035] Step 230: Based on the judgment conditions corresponding to each task in the preset rules, the dining image, and the object data, use the large language model to determine the target task that the user object should perform from at least one task.

[0036] Large language models refer to artificial intelligence language processing models trained on large-scale text data, used to understand complex contexts, fuse multimodal information, and perform logical reasoning. Optionally, a large language model can be built in at least one of the following ways: on a Transformer architecture, for example using an encoder-decoder structure, a decoder-only structure, or a specific variant as the core architecture; through large-scale unsupervised pre-training, performing self-supervised learning on ultra-large-scale, diverse general text corpora to acquire basic language understanding and generation capabilities; through multi-task instruction fine-tuning, using instruction datasets covering multiple tasks (such as question answering, summarization, reasoning, and code generation) to perform supervised fine-tuning of the pre-trained model so that it aligns with user intent and follows instructions; or through reinforcement learning methods. It should be noted that the above are merely examples, and this application does not limit how the model is built.

[0037] Step 230 above can be understood as a task selection mechanism. The large language model matches and infers the current state of the user object (determined based on dining images and object data) with the judgment conditions of preset rules, thereby selecting tasks that meet the requirements from at least one task. In steps 210 to 230, this application obtains dining images and object data of the user object and introduces the judgment conditions corresponding to each task in the preset rules, so that the large language model not only refers to image information when determining the target task, but also makes a comprehensive judgment based on user attribute information, pathological information, and professional nutrition rules. This selection mechanism, which combines multi-source information with rule constraints, makes the process of determining the target task have a clear technical constraint path, thereby improving the matching degree between dietary recommendations and the user's actual health status.
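One way to picture the selection mechanism of step 230 is as a prompt that packs the rules, the meal, and the object data together, followed by a filter on the model's reply. The sketch below rests on several assumptions: the large language model is represented by a plain callable `llm`, the dining image is stood in for by a text summary, and the rule layout and JSON reply format are invented for illustration.

```python
import json
from typing import Callable


def build_selection_prompt(rules: list[dict], meal_summary: str, object_data: dict) -> str:
    """Combine the three information sources into one prompt."""
    payload = {"rules": rules, "meal": meal_summary, "user": object_data}
    return (
        "Select every task whose judgment condition is satisfied. "
        "Answer with a JSON list of task ids only.\n"
        + json.dumps(payload, ensure_ascii=False, sort_keys=True)
    )


def select_target_tasks(
    llm: Callable[[str], str],
    rules: list[dict],
    meal_summary: str,
    object_data: dict,
) -> list[str]:
    """Ask the model for task ids and keep only ids defined in the preset rules."""
    reply = llm(build_selection_prompt(rules, meal_summary, object_data))
    known_ids = {rule["id"] for rule in rules}
    try:
        chosen = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: select nothing rather than guess
    if not isinstance(chosen, list):
        return []
    # Discard anything that is not a task from the preset rules.
    return [t for t in chosen if isinstance(t, str) and t in known_ids]


rules = [{
    "id": "reduce_sugar",
    "condition": "type 2 diabetes and high-GI carbohydrates > 60% of meal calories",
}]
```

Filtering the reply against the ids in the preset rules is one simple way to keep the model's choice inside the rule set, in the spirit of the rule constraint the text describes.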

[0038] Therefore, this application achieves technical control over the decision-making process of large language models through rule constraints and multi-source information fusion mechanisms. Compared with schemes that rely solely on image input or single model reasoning, it can improve the personalization, accuracy, and reliability of diet planning.

[0039] Step 240: Perform natural language organization processing on the target task using a large language model to generate output text; wherein, the output text is used to provide dietary advice corresponding to the target task to the user in natural language form.

[0040] Natural language organization processing refers to the process of transforming structured task instructions into textual expressions that conform to human communication habits, making suggestions more readable, approachable, and instructive. Natural language form refers to suggestions presented in a conversational style. For example, the task "Inform the patient that they should eat 150 grams of vegetables" can be processed in this way to output: "Based on your health goals, it is recommended that this meal include 150 grams of vegetables. You can allocate them like this: for example, 100 grams of stir-fried broccoli plus 50 grams of cucumber salad, which gives a better mix of color and nutrition."

[0041] In step 240 above, performing natural language organization processing on the target task makes the output text more approachable and easier for users to understand and act on, thereby improving the acceptability of dietary recommendations.

[0042] In summary, the technical solution provided in this application, by acquiring dining images and object data of the user and combining them with preset rules, enables a large language model to accurately select the target task suitable for the user from multiple tasks. This filtering method based on multi-source information (dining images, object data, and preset rules) avoids the problem in related technologies where the large model's reference information only includes dining images, resulting in the output dietary suggestions not matching the user's actual situation, thus improving the personalization of dietary planning.

[0043] Furthermore, since the preset rules conform to professional nutritional knowledge, using the judgment conditions in the preset rules as constraints on the decision-making process of the large language model reduces the risk of hallucination when generating dietary suggestions, avoids erroneous judgments caused by large-model hallucination, and improves the accuracy, stability, and reliability of dietary planning.

[0044] The following describes how to determine the target task that a user object should perform from at least one task.

[0045] In some embodiments, the decision conditions corresponding to each task include at least one decision sub-condition, and the at least one decision sub-condition corresponds to a preset order, which is used to indicate the order in which the at least one decision sub-condition is executed in the decision process.

[0046] In some embodiments, for any one of the at least one task, a large language model is used to judge the at least one decision sub-condition corresponding to the task in the preset order, based on the dining image and object data. If it is determined that the user object meets the requirements for performing the task, the task is determined as the target task.

[0047] In some embodiments, the decision criteria for each task include at least one decision sub-condition. For example, as shown in Table 1 below, a task may include: informing the patient that they should eat xx grams of vegetables. The at least one decision sub-condition corresponding to this task may include four decision sub-conditions: whether vegetables are present, whether it is breakfast, whether it is a group meal, and whether the patient's vegetables seriously fail to meet nutritional standards.

[0048] Table 1

[0049] Preset order means that when judging a task, the decision conditions are not executed randomly or arbitrarily, but are executed one by one in a predetermined order.

[0050] In some embodiments, the preset order of at least one judgment sub-condition can be set based on the experience of relevant technical personnel (such as nutrition experts), and this application does not limit this. Optionally, the preset order can be determined based on the complexity of at least one judgment sub-condition. Judgments on judgment sub-conditions with lower complexity are performed first. For example, a basic judgment such as "whether there are vegetables" is performed first, then "whether it is breakfast / group meal" is judged, and finally a judgment such as "whether it is seriously unsatisfactory" that requires more information / calculation is performed. For example, as shown in Table 1 above, the preset order can be to perform the judgment conditions in ascending order of their identifiers id, that is, to judge "id=1, whether there are vegetables", "id=2, whether it is breakfast", "id=3, whether it is group meal", and "id=4, whether the patient's vegetables seriously fail to meet nutritional standards" in sequence.
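The ordering described above can be sketched as a sort. The four sub-conditions and their ids follow the text; the `complexity` scores are hypothetical values added for illustration, with the id used as a tie-breaker.

```python
sub_conditions = [
    {"id": 4, "complexity": 3,
     "text": "whether the patient's vegetables seriously fail to meet nutritional standards"},
    {"id": 2, "complexity": 2, "text": "whether it is breakfast"},
    {"id": 1, "complexity": 1, "text": "whether there are vegetables"},
    {"id": 3, "complexity": 2, "text": "whether it is a group meal"},
]


def in_preset_order(conditions: list[dict]) -> list[dict]:
    """Lower-complexity sub-conditions first; ties are broken by ascending id."""
    return sorted(conditions, key=lambda c: (c["complexity"], c["id"]))


ordered_ids = [c["id"] for c in in_preset_order(sub_conditions)]
```

With these scores, ordering by complexity and ordering by ascending id give the same sequence, which matches the two options the paragraph describes.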

[0051] In some embodiments, at least one decision sub-condition corresponding to a task is judged sequentially in a preset order, and if the requirement to execute the task is met, the task is determined as the target task. This includes: judging at least one decision sub-condition corresponding to a task in a preset order, determining that the first decision sub-condition among the at least one decision sub-condition triggers a termination flag, and if the termination flag is an acceptance flag, the task is determined as the target task.

[0052] In some embodiments, if a first decision sub-condition among at least one decision sub-condition triggers a termination flag, and the termination flag is a rejection flag, then it is determined that the task is not the target task. The first decision sub-condition can be any one of the at least one decision sub-conditions. For detailed procedures, please refer to the corresponding description below; they will not be repeated here.

[0053] The above method judges each decision sub-condition corresponding to the task in a preset order, making the judgment process more logical. That is, the decision sub-conditions can be judged step by step, from simple to complex, thereby improving the stability, reproducibility, and overall processing efficiency of the task judgment results.

[0054] The following describes the specific implementation method of performing judgments on at least one decision sub-condition corresponding to the task in sequence, which may include the following steps S1~S4 (not shown in the figure).

[0055] Step S1: Based on the dining image and object data, the large language model determines the output result of the i-th decision sub-condition in at least one decision sub-condition corresponding to the task. The output result is used to indicate whether the i-th decision sub-condition is satisfied, where i is a positive integer.

[0056] In some embodiments, the output result indicates either that the i-th decision sub-condition is met or that it is not met. For example, as shown in Table 1 above, meeting the i-th decision sub-condition corresponds to T (yes), and not meeting the i-th decision sub-condition corresponds to F (no).

[0057] In some embodiments, as shown in Figure 3, determining the output result of the i-th decision sub-condition among the at least one decision sub-condition corresponding to a task, using a large language model based on the dining image and object data, includes: performing a task planning recognition operation on the i-th decision sub-condition using the large language model to determine a planning result, the planning result being used to indicate the execution steps required to determine the output result of the i-th decision sub-condition; determining a first target tool function from at least one candidate tool function included in the function library based on the planning result, the dining image, and the object data, and determining the input parameters of the first target tool function; and inputting the input parameters into the first target tool function to generate the output result of the i-th decision sub-condition.
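The three stages described above (planning, choosing a tool function and its input parameters, then calling it) can be sketched with a small function library. Everything named here is an assumption for illustration: the `plan` function is a hard-coded stand-in for the large language model's planning step, and both tool functions are stubs rather than real model calls.

```python
from typing import Any, Callable

TOOL_LIBRARY: dict[str, Callable[..., bool]] = {}


def tool(name: str):
    """Register a candidate tool function in the function library."""
    def register(fn):
        TOOL_LIBRARY[name] = fn
        return fn
    return register


@tool("multimodal_model")
def ask_multimodal_model(prompt: str, image: bytes) -> bool:
    # Stub: a real system would send the prompt and image to a multimodal
    # model. Here the image bytes simply carry a text label, and the
    # prompt is ignored.
    return b"vegetable" in image


@tool("compare_number")
def compare_number(value: float, threshold: float) -> bool:
    return value > threshold


def plan(sub_condition: str) -> list[str]:
    """Stand-in for the LLM planning step: returns execution steps, not an answer."""
    if sub_condition == "whether there are vegetables":
        return ["call multimodal_model with the dining image and a yes/no prompt"]
    if sub_condition == "whether HbA1c exceeds 7%":
        return ["call compare_number on hba1c with threshold 7.0"]
    raise ValueError(f"no plan for: {sub_condition}")


def choose_tool(steps: list[str], image: bytes, object_data: dict) -> tuple[str, dict[str, Any]]:
    """Pick the first target tool function and build its input parameters."""
    step = steps[0]
    if "multimodal_model" in step:
        prompt = "Does this image contain vegetables? Answer yes or no."
        return "multimodal_model", {"prompt": prompt, "image": image}
    if "compare_number" in step:
        return "compare_number", {"value": object_data["hba1c"], "threshold": 7.0}
    raise ValueError(f"no tool for step: {step}")


def judge_sub_condition(sub_condition: str, image: bytes, object_data: dict) -> bool:
    steps = plan(sub_condition)                             # planning result
    name, params = choose_tool(steps, image, object_data)   # tool + parameters
    return TOOL_LIBRARY[name](**params)                     # output result
```

The point of the split is that the planner never answers yes or no itself; it only decides which tool function to call, and the tool function produces the output result.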

[0058] The planning result refers to the detailed operational steps generated by the large language model to obtain the output result for the decision sub-condition to be judged in the current task. It does not directly provide a yes or no answer, but rather generates action guidelines to clearly indicate which steps and which tool functions to call to calculate the final output result. For example, for the task "Inform the patient that they should eat xx grams of vegetables," assuming the current decision sub-condition is "Are there vegetables?", the planning result may include calling the tool function for the multimodal large model; determining the input parameters of the multimodal large model (e.g., the input parameters of the multimodal large model are the image of the meal and the prompt words); and determining the output result as yes (meaning vegetables are included) or no (meaning vegetables are not included). It should be noted that the above is merely an example, and this application does not limit it.

[0059] Tool functions refer to pre-packaged program modules within a system that can independently perform specific calculations, analyses, or information processing functions. Tool functions can be called by large language models during planning to obtain information or judgments that they cannot directly, accurately, or reliably generate themselves. Optionally, tool functions may include at least one of the following: multimodal large model functions, which call AI (Artificial Intelligence) models with image understanding capabilities to perform content recognition, description, or question answering on dining images; specialized calculation functions, such as nutrition calculation functions (calculating nutritional needs based on individual data) and calorie estimation functions; rule engine functions, which perform logical matching and judgment based on preset business rules (such as dietary restrictions for diseases, meal standard rules); data query functions, which retrieve relevant information (such as historical dietary records, food nutritional components) from user databases, knowledge bases, or external APIs (Application Programming Interfaces); traditional computer vision functions, such as dedicated algorithms for image segmentation, object detection, and weight estimation; natural language processing functions, such as sentiment analysis (analyzing user feedback) and key information extraction; and logic and comparison functions, which perform basic logical operations such as numerical comparison and set operations. It should be noted that the above is merely illustrative and is not intended to be limiting.

[0060] The above method, which combines task planning and calling tool functions to determine the output of decision sub-conditions, combines the logical planning capabilities of large language models with the professional execution capabilities of various tool functions, thereby achieving more accurate, reliable and interpretable judgments on each decision sub-condition.
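
As an illustration of the planning-plus-tool-call pattern described above, the following is a minimal sketch and not the implementation of this application. The registry `TOOLS`, the tool names, and the plan dictionary layout are all hypothetical, and the multimodal large model is replaced by a stand-in that reads pre-labelled foods so that the sketch runs without any model.

```python
from typing import Any, Callable, Dict

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function under a tool name so a planning result can refer to it."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("multimodal_model")
def multimodal_model(image: dict, prompt: str) -> str:
    # Stand-in for a real multimodal large model call: the "image" is a dict
    # of pre-labelled foods, and the prompt is accepted but not interpreted.
    return "yes" if "vegetables" in image.get("foods", []) else "no"

@tool("compare")
def compare(value: float, threshold: float) -> str:
    # Logic and comparison tool: a basic numerical comparison.
    return "yes" if value >= threshold else "no"

def execute_plan(plan: dict) -> str:
    """Run a planning result: look up the named tool and call it with the planned inputs."""
    fn = TOOLS[plan["tool"]]
    return fn(**plan["inputs"])

# A planning result for the decision sub-condition "Are there vegetables?".
plan = {
    "sub_condition": "Are there vegetables?",
    "tool": "multimodal_model",
    "inputs": {
        "image": {"foods": ["rice", "vegetables"]},
        "prompt": "Does this image contain vegetables? Answer only 'yes' or 'no'.",
    },
}
result = execute_plan(plan)
```

The planning result here does not contain the answer itself; it only names the tool and its input parameters, and the yes/no output is produced when the tool runs.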

[0061] In some embodiments, the first target tool function includes a multimodal large model, and the input parameters of the first target tool function include a prompt word and a dining image; inputting the input parameters into the first target tool function to generate the output result of the i-th decision sub-condition includes: inputting the prompt word and the dining image into the multimodal large model, performing image recognition processing on the dining image through the multimodal large model, and generating the output result of the i-th decision sub-condition.

[0062] A multimodal large model refers to an artificial intelligence model that can simultaneously process and fuse multiple types of information (modalities). In the context of this application, it specifically refers to a large pre-trained model that can receive and jointly understand image information (such as photos of meals) and text information (such as prompts), and generate corresponding text answers accordingly.

[0063] In some embodiments, generating prompt words based on the decision sub-conditions may include the following steps. (1) Parsing the sub-condition: the system identifies the core semantics of the decision sub-condition to be judged. For example, the core of the sub-condition "Are there vegetables?" is an "existence judgment", and the object is "vegetables". (2) Selecting either template filling or large language model generation. For template filling, a series of prompt word templates suited to different judgment types can be preset. For example, for an "existence judgment", the template may be "Does the image contain [object]? Please answer only 'yes' or 'no'." The system fills the specific object in the sub-condition (such as "vegetables") into the corresponding position of the template. For large language model generation, the large language model responsible for planning directly generates a clear, unambiguous prompt word suitable for execution by the multimodal large model, based on its understanding of the sub-condition. (3) Optimization and formatting: ensuring that the generated prompt word is clear and in a standardized format, so as to guide the multimodal large model to output structured or simple answers (such as only "yes" or "no") that the system can then parse automatically. For example, for the task "Inform the patient that they should eat xx grams of vegetables", assuming the current decision sub-condition is "Does this contain vegetables?", the prompt word could be: "Does this image contain vegetables? Answer yes or no."
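
The template filling approach and the subsequent automatic parsing can be sketched as follows. This is only an illustrative sketch: the names `PROMPT_TEMPLATES`, `build_prompt`, and `parse_answer` are hypothetical, and only the "existence judgment" type from the example above is covered.

```python
# Hypothetical preset templates keyed by judgment type.
PROMPT_TEMPLATES = {
    "existence": "Does the image contain {object}? Please answer only 'yes' or 'no'.",
}

def build_prompt(judgment_type: str, obj: str) -> str:
    """Fill the object of the sub-condition into the template for its judgment type."""
    return PROMPT_TEMPLATES[judgment_type].format(object=obj)

def parse_answer(raw: str) -> bool:
    """Parse a constrained yes/no answer; reject anything else instead of guessing."""
    answer = raw.strip().strip(".'\"").lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"unexpected model answer: {raw!r}")
    return answer == "yes"

prompt = build_prompt("existence", "vegetables")
```

Constraining the answer to "yes" or "no" is what makes the strict parser workable; a free-form answer would raise an error rather than be silently misread.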

[0064] In some embodiments, the input parameters are determined based on the specific circumstances of the decision sub-conditions. When necessary, the input parameters may also include user data, which is not limited in this application.

[0065] Related technologies may require concatenating multiple dedicated image recognition models (e.g., first identifying food regions, then classifying whether they are vegetables) to obtain the output results of the decision sub-conditions. In contrast, the above method leverages the general understanding capabilities of multimodal large models, so that the output results of the decision sub-conditions can be obtained directly using only prompt words, which improves the efficiency of generating those output results.

[0066] Step S2: Based on the output result of the i-th decision sub-condition, determine the decision result of the i-th decision sub-condition; wherein the decision result includes at least one of the following: a flow identifier and a termination identifier. The flow identifier is used to indicate the (i+1)-th decision sub-condition to be executed among the at least one decision sub-condition. The termination identifier includes an acceptance identifier and a rejection identifier. The acceptance identifier is used to indicate that the task is accepted, and the rejection identifier is used to indicate that the task is rejected.

[0067] For example, as shown in Table 1 above, the acceptance identifier can be denoted as required, and the rejection identifier can be denoted as reject.

[0068] Step S3: If the decision result of the i-th decision sub-condition is a flow identifier, determine the output result of the (i+1)-th decision sub-condition based on the dining image and object data, and determine the decision result of the (i+1)-th decision sub-condition based on the output result of the (i+1)-th decision sub-condition.

[0069] Step S4: If the decision result of the i-th decision sub-condition is a termination identifier, and the termination identifier is an acceptance identifier, then the task is determined to be the target task.

[0070] In some embodiments, if the termination identifier is a rejection identifier, the task is rejected. In other words, the task is not the target task.

[0071] For example, Table 1 above illustrates a task (the task column) that concerns whether the patient should be told how many grams of vegetables to eat. It contains four conditions (decision sub-conditions) with id=1, 2, 3, and 4. The condition with id=1 (whether vegetables are present) determines whether the patient ate vegetables during the meal. If the answer is yes, the process proceeds to the next step according to column T; if the answer is no, it proceeds according to column F. The destination may be either the id of the next step or a termination condition: required (the acceptance identifier) or reject (the rejection identifier). For instance, assuming the patient ate vegetables, then according to column T the process proceeds to the decision sub-condition with id=2, which determines whether the meal was breakfast. If the meal was indeed breakfast, the process again follows column T and reaches the termination condition reject, meaning the task is rejected; that is, it is unnecessary to tell the patient "they should eat xx grams of vegetables." In natural language, this logic means: if the patient ate vegetables at breakfast, that is sufficient, and there is no need to control the quantity. If the meal was not breakfast, the process follows column F to id=3, which determines whether the patient ate a group meal. If it was a group meal, the task is accepted, meaning the patient needs to be informed how many grams of vegetables they should eat. If it was not a group meal, that is, it was an individual meal, the required amount of vegetables is determined based on nutritional guidelines and compared with the patient's uploaded image to check for a significant discrepancy, such as an excessive or insufficient amount of vegetables. If there is such a discrepancy, the patient needs to be advised on how many grams of vegetables they should eat.

[0072] Figure 4 illustrates a flowchart of a diet planning method provided in another embodiment of this application. The flowchart describes the process of determining whether a task is a target task through rule-based flow judgment. First, the user's dining image and object data are acquired, and the decision sub-condition index i is initialized. Then, the i-th decision sub-condition is evaluated to obtain its output result (which indicates whether the decision sub-condition is satisfied). Next, the decision result is determined based on the output result; the decision result includes at least one of a flow identifier and a termination identifier, where the flow identifier indicates the next decision sub-condition to be executed, and the termination identifier includes an acceptance identifier and a rejection identifier. When the decision result is a flow identifier, the system updates i according to the flow identifier and continues with the next decision sub-condition. When the decision result is a termination identifier that is an acceptance identifier, the corresponding task is determined to be a target task. When the decision result is a termination identifier that is a rejection identifier, the task is determined not to be a target task and the process ends. For example, taking the task "whether to recommend vegetable intake", the flow judgment process can be represented as the following logical chain. Decision sub-condition i=1: Determine whether the user's current meal contains vegetables. If the output is "Yes", the result is a flow identifier, and the flow points to i=2. If it is "No", the result is an acceptance identifier, and this task is identified as the target task (the user needs to be reminded to increase vegetable intake). Decision sub-condition i=2: Determine whether the meal is breakfast. If the output is "Yes", the result is a rejection identifier, and this task is not adopted (the breakfast already contains vegetables, so there is no need to emphasize the quantity). If it is "No", the result is a flow identifier, and the flow points to i=3. Decision sub-condition i=3: Determine whether the user is in a group meal scenario. If the output is "Yes", the result is an acceptance identifier, and this task is identified as the target task (group meals require assessment and recommendations on vegetable intake). If it is "No", the result is a flow identifier, and the flow points to i=4. Decision sub-condition i=4: Based on nutritional recommendations and image analysis, determine whether the user's current vegetable intake is severely insufficient or excessive (e.g., below 50% or above 200% of the recommended amount). If the output is "Yes", the result is an acceptance identifier, and this task is identified as the target task (a reasonable intake range should be indicated). If the output is "No", the result is a rejection identifier, and this task is not adopted (vegetable intake is within a reasonable range).
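
The logical chain above can be expressed as a small rule table that is walked until a termination identifier is reached. The sketch below is illustrative only: the table layout, the sub-condition names, and the function `is_target_task` are assumptions, and the output result of each sub-condition is supplied as a precomputed dictionary, whereas in the method described here it would be obtained by calling tool functions.

```python
from typing import Dict, List, Tuple

# For each decision sub-condition id: destination for a "Yes" output (T) and
# for a "No" output (F). A destination is either the id of the next
# sub-condition (flow identifier) or a termination identifier:
# "required" (acceptance) or "reject" (rejection).
RULES: Dict[int, dict] = {
    1: {"name": "has_vegetables",      "T": 2,          "F": "required"},
    2: {"name": "is_breakfast",        "T": "reject",   "F": 3},
    3: {"name": "is_group_meal",       "T": "required", "F": 4},
    4: {"name": "amount_out_of_range", "T": "required", "F": "reject"},
}

def is_target_task(rules: Dict[int, dict], outputs: Dict[str, bool],
                   start: int = 1) -> Tuple[bool, List[Tuple[int, bool]]]:
    """Walk the rule table; return (accepted, trace of (id, output) pairs)."""
    i = start
    trace: List[Tuple[int, bool]] = []
    # A well-formed table visits each sub-condition at most once.
    for _ in range(len(rules)):
        rule = rules[i]
        answer = outputs[rule["name"]]
        trace.append((i, answer))
        destination = rule["T"] if answer else rule["F"]
        if destination == "required":
            return True, trace
        if destination == "reject":
            return False, trace
        i = destination  # flow identifier: continue with the next sub-condition
    raise ValueError("rule table does not reach a termination identifier")

# Breakfast that already contains vegetables: the task is rejected at i=2.
accepted, trace = is_target_task(RULES, {"has_vegetables": True, "is_breakfast": True})
```

The returned trace plays the role of a rule flow log: it records which sub-conditions were evaluated and what each one output, which is what later error tracing relies on.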

[0073] The above method can accurately determine the target task by finely evaluating and filtering multiple decision sub-conditions for each task in a step-by-step confirmation manner based on the user's dining images and user data.

[0074] In some embodiments, as shown in Table 1 above, any one of the at least one task may correspond to a score, which is not limited in this application. The task column lists each item that may need to be communicated to the patient; here, these items are defined as things the patient has done wrong. Each such item is a deduction item, and the score column records the points to be deducted if the patient's diet has the corresponding problem.

[0075] In some embodiments, a score corresponding to the target task is obtained; when the target task includes multiple tasks, the scores of these tasks can be summed to obtain the user's total score. Alternatively, a weighted sum of the scores of the multiple tasks can be computed based on the importance or urgency of the tasks to obtain the user's total score. For example, if the system identifies two target tasks for a diabetic patient, with a score of 15 for task 1 and a score of 10 for task 2, the user's total score is calculated as follows: Total score = 15 + 10 = 25 points. In other embodiments, weighting can also be performed based on the urgency and intervention priority of the tasks. For example, task 1, which involves health safety and acute risks, can be given a higher weight, while task 2, which involves long-term behavior development or general optimization suggestions, can be given a lower weight, making the scoring system more aligned with the tiered goals of actual health interventions.
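
The plain sum and the weighted sum can be sketched as follows. The scores 15 and 10 come from the example above; the weights 1.5 and 0.5 and the function name `total_score` are hypothetical values chosen only to illustrate the weighting.

```python
from typing import Dict, Optional

def total_score(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Sum task scores; if weights are given, tasks without a weight default to 1.0."""
    if weights is None:
        return sum(scores.values())
    return sum(score * weights.get(task, 1.0) for task, score in scores.items())

scores = {"task_1": 15, "task_2": 10}
plain = total_score(scores)                                      # 15 + 10
weighted = total_score(scores, {"task_1": 1.5, "task_2": 0.5})   # 15*1.5 + 10*0.5
```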

[0076] Optionally, the total score is the cumulative deduction value, and the higher the total score, the more problems there are; or, the total score is a normalized score, and the higher the total score, the higher the degree of matching. This application does not limit this.

[0077] In some embodiments, the user's total score is used to characterize the degree to which the user's eating behavior in the current dining scenario matches the preset eating rules. The user's total score can be used for at least one of the following: (1) generating a rating display for the user object, so that users can intuitively understand the overall performance of this meal; (2) determining the prompt intensity or intervention level of the output text, for example, when the total score is higher than the preset threshold, the prompt message is mainly praise, and when the total score is lower than the preset threshold, the prompt message is mainly correction and reminders; (3) sorting multiple target tasks so that tasks with higher scores or higher weights are presented first in the output text, thereby highlighting the dietary issues that need to be improved first; (4) triggering subsequent processing procedures, such as further health risk warnings, review procedures, or medical staff intervention procedures, when the total score is lower than the preset threshold or is lower than the threshold multiple times in a row; (5) generating historical rating records for the user object and calculating trend indicators based on those records for subsequent personalized rule adjustment, model evaluation, or effect tracking.

[0078] In some embodiments, the user's total score is also used to quantitatively assess the overall health compliance of their current meal eating behavior, and serves as the basis for generating periodic nutrition reports, adjusting the intensity of subsequent interventions, or triggering personalized educational content, thereby achieving a closed loop from single recommendations to long-term behavior management. Specifically, the total score will be mapped to a preset health grading range (e.g., excellent, good, need improvement, need intervention), and can automatically trigger corresponding subsequent actions according to different levels, which may include at least one of the following: (1) Report generation: when the cumulative score of a single meal or cycle (e.g., one week) falls into the "need improvement" or "need intervention" range, a detailed behavior analysis report will be automatically generated and pushed. The report may include a summary of deviation items, trend charts, and suggestions for improvement priorities. (2) Adjustment of intervention intensity: if the user's score remains in a low range for several consecutive cycles, the system can automatically increase the intervention intensity, such as increasing the frequency of daily reminders, upgrading to follow-up by a human nutritionist, or recommending a more structured diet plan. (3) Matching educational content: The system can push relevant popular science articles, recipe examples, or short videos based on the specific task type that results in a deduction. For example, it can trigger content about the importance of dietary fiber for the task of "insufficient vegetable intake." (4) Dynamic calibration of goals: Long-term scoring trends can serve as a basis for dynamically adjusting users' health goals. 
For example, when the score is consistently "excellent," the system can suggest that users move on to the next stage of dietary optimization goals (such as increasing food diversity or trying specific dietary patterns). Based on the above, the total score is not only a quantitative evaluation of a single meal, but also a hub connecting identification, analysis, intervention, education, and long-term monitoring, forming an intelligent health management closed loop of evaluation, feedback, and adjustment.
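
The mapping from total score to a health grading range and to follow-up actions might look like the following sketch. It assumes a normalized score in which higher is better; the boundaries 60, 75, and 90, the three-cycle window, and the action names are all hypothetical and are not specified in this application.

```python
from bisect import bisect_right
from typing import List

# Hypothetical grade boundaries on a normalized score (higher is better).
GRADE_BOUNDS = [60, 75, 90]
GRADES = ["need intervention", "need improvement", "good", "excellent"]

def grade(score: float) -> str:
    """Map a score to its grading range using the boundary list."""
    return GRADES[bisect_right(GRADE_BOUNDS, score)]

def follow_up_actions(history: List[float], low_cycles: int = 3) -> List[str]:
    """Derive follow-up actions from the most recent scores (oldest first)."""
    actions = []
    current = grade(history[-1])
    if current in ("need improvement", "need intervention"):
        actions.append("generate_report")
    recent = history[-low_cycles:]
    if len(recent) == low_cycles and all(grade(s) == "need intervention" for s in recent):
        actions.append("increase_intervention_intensity")
    if current == "excellent":
        actions.append("suggest_next_goal")
    return actions
```

For instance, `follow_up_actions([50, 55, 40])` yields both a report and an escalation, because all three recent cycles fall in the lowest range.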

[0079] In some embodiments, when providing output text to a user object, the overall score of the user object can also be provided, and this application does not limit this.

[0080] The following describes how to perform fine-tuning on large language models and / or multimodal large models.

[0081] In some embodiments, the method further includes: obtaining a corrected target task in response to a correction operation for the target task; performing a difference recognition process on the target task and the corrected target task to obtain difference information; determining an erroneous task from the target task based on the difference information, wherein the erroneous task refers to a task that the user object does not need to perform; generating at least one training sample based on the erroneous task; and training at least one of a large language model and a multimodal large model based on the at least one training sample, provided that preset training conditions are met.

[0082] In some embodiments, obtaining a corrected target task in response to a correction operation on the target task can also be replaced by obtaining a corrected output text in response to a correction operation on the output text. However, the correction operation on the output text is essentially a correction operation performed on the target task, so these two concepts can be used interchangeably in this application.

[0083] Difference information refers to the structured, comparable differences between two versions of a target task before and after modification. When a user (referring to relevant reviewers, such as medical personnel) modifies the generated target task and / or output text, the system automatically compares the "before modification" and "after modification" versions to identify specific differences. This allows the system to accurately pinpoint which specific steps in the original task are redundant, incorrect, or unnecessary for the user.

[0084] Erroneous tasks refer to those tasks in the original system-generated task sequence that, after correction, are deemed redundant, unnecessary, or unrealistic. Essentially, they are tasks that the user object does not actually need to perform.
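
A simple form of difference recognition over two task lists is sketched below. It is only an illustration: the function `diff_tasks` and the task "remind the patient to drink water" are hypothetical, tasks are compared as exact strings, and a task that the reviewer rewrote would appear as one removal plus one addition.

```python
from typing import Dict, List

def diff_tasks(original: List[str], corrected: List[str]) -> Dict[str, List[str]]:
    """Compare the target tasks before and after correction, preserving order."""
    original_set, corrected_set = set(original), set(corrected)
    return {
        "removed": [t for t in original if t not in corrected_set],
        "added": [t for t in corrected if t not in original_set],
    }

original = [
    "remind the patient to drink water",  # hypothetical task
    "inform the patient that they should eat xx grams of vegetables",
]
corrected = ["remind the patient to drink water"]

difference = diff_tasks(original, corrected)
# Tasks the reviewer deleted are treated as erroneous tasks.
erroneous_tasks = difference["removed"]
```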

[0085] The preset training conditions refer to the rules that trigger the training and updating of the model using the aforementioned training samples. Training conditions may be related to at least one of the following: data volume conditions, time or period conditions, system resource and security conditions; however, this application does not limit these.

[0086] Regarding the data volume condition, a sample accumulation threshold condition can be included. When the number of collected training samples (generated based on the error task) is greater than a first threshold (e.g., 100 or 500), it is determined that the preset training condition is met.

[0087] Regarding the time or period condition, this can include timed or periodic triggering. For example, the system is set to automatically start training tasks during the off-peak hours of early Sunday morning to ensure that performance is not affected during daily use. It can also include batch triggering, where the system starts a round of training after receiving a certain number of user correction operations (e.g., upon the N-th operation).

[0088] Regarding the system resource and security conditions, training tasks are only allowed to start when system computing resources are sufficient (e.g., server load is below a certain level) and during off-peak business periods.
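
One possible way to combine the three kinds of preset training conditions is sketched below. The application only states that the conditions may relate to at least one of them; requiring all three at once, the 0.5 load limit, and the "before 06:00 on Sunday" window are assumptions made for this sketch.

```python
from datetime import datetime

def should_start_training(sample_count: int, now: datetime, server_load: float,
                          *, min_samples: int = 100, max_load: float = 0.5) -> bool:
    """Return True only when the data, time, and resource conditions all hold."""
    enough_data = sample_count > min_samples            # data volume condition
    off_peak = now.weekday() == 6 and now.hour < 6      # early Sunday morning
    resources_ok = server_load < max_load               # system resource condition
    return enough_data and off_peak and resources_ok

# 2026-03-22 is a Sunday.
ready = should_start_training(150, datetime(2026, 3, 22, 2, 0), server_load=0.2)
```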

[0089] In some embodiments, the method further includes: obtaining a modified target task in response to a modification operation for the target task; performing natural language organization processing on the modified target task through a large language model to generate output text; wherein the output text is used to provide dietary advice corresponding to the target task to a user object in natural language form.

[0090] In some embodiments, in response to a correction operation on the output text, a corrected output text is obtained, wherein the corrected output text is used to provide dietary recommendations corresponding to the target task to the user object in natural language.

[0091] In other words, regardless of the target task the user chooses to modify or the final output text, the system can optimize the output text through different paths and ensure that the dietary advice presented to the user is revised, clearly expressed, and tailored to their needs.

[0092] In some embodiments, training at least one of a large language model and a multimodal large model based on at least one training sample means updating the parameters of the target model using the input information and corresponding expected output contained in the training sample, so that the target model is more likely to output results consistent with the expected output when processing the same or similar inputs in the future, thereby reducing the probability of the same type of error recurring. The training object can be a large language model, a multimodal large model, or a combination of both, depending on the type of error loop corresponding to the training sample. Optionally, training may include the following steps: (1) reading at least one training sample from the dataset to be trained, the training sample including sample input and expected output, wherein the sample input includes dining images and / or object data and prompt information related to decision subconditions or tasks; (2) determining the target training model according to the type of training sample: when the sample input contains dining images, it is determined to train the multimodal large model; when the sample input is mainly text or structured data, it is determined to train the large language model; (3) inputting the training sample into the target training model to obtain the model output result; (4) comparing the model output result with the expected output to obtain training feedback information to represent the difference between the two; (5) updating the parameters of the target training model based on the training feedback information, and repeating the above steps until the preset training termination condition is met; (6) saving the updated model parameters to obtain the trained large language model and / or trained multimodal large model for subsequent deployment or testing.
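
Step (2) above, choosing the target training model from the type of training sample, can be sketched as follows. The sample layout and the model labels are hypothetical; the actual parameter update in steps (3) to (6) depends on the training framework in use and is not shown.

```python
from collections import defaultdict
from typing import Dict, List

def select_target_model(sample: dict) -> str:
    """Samples whose input contains a dining image go to the multimodal large model."""
    if sample["input"].get("image") is not None:
        return "multimodal_large_model"
    return "large_language_model"

def group_by_target_model(samples: List[dict]) -> Dict[str, List[dict]]:
    """Split a dataset into one batch per target training model."""
    groups = defaultdict(list)
    for sample in samples:
        groups[select_target_model(sample)].append(sample)
    return dict(groups)

samples = [
    {"input": {"image": "meal_001.jpg", "prompt": "judge whether the meal has vegetables"},
     "expected_output": "yes"},
    {"input": {"meal": "lunch"}, "expected_output": "no"},
]
batches = group_by_target_model(samples)
```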

[0093] Related technologies typically require engineers to manually analyze logs and compile specific data to train and correct large language models and / or multimodal large models, a process that is expensive and time-consuming. The above method, in contrast, proactively identifies erroneous tasks and generates corresponding training samples, thus automating model training. Furthermore, it provides a process for correcting the target task, further ensuring the accuracy of the output text. Moreover, by generating training samples and updating the model in response to correction operations, the system can be optimized continuously, which helps to reduce the recurrence of the same type of error and improves the accuracy and adaptability of the overall system.

[0094] In some embodiments, generating at least one training sample based on an erroneous task includes: determining an erroneous decision sub-condition from at least one decision sub-condition corresponding to the erroneous task, wherein the erroneous decision sub-condition refers to a decision sub-condition whose output result is incorrect; performing correction processing on the output result of the erroneous decision sub-condition to obtain a corrected output result; and generating at least one training sample based on the input parameters of the erroneous decision sub-condition and the corrected output result.

[0095] In some embodiments, the training samples include input parameters and corrected output results.

[0096] An erroneous decision sub-condition refers to a specific decision sub-condition in a task's execution whose final output or intermediate judgment is deemed incorrect, inaccurate, or undesirable by the system or a human reviewer. In other words, it is the faulty link that causes the task as a whole to fail or to produce misleading results. Identifying and correcting erroneous decision sub-conditions is central to the system's iterative learning and performance improvement. For example, suppose a patient uploads a picture of a breakfast with sufficient vegetables. According to the preset rules, the review should not instruct the patient on vegetable intake. However, due to a recognition error, the image is judged to lack vegetables, so the AI-generated review states: "Your meal lacks vegetables; you should maintain a sufficient vegetable intake of xx grams...". Medical staff will not adopt this AI-generated review (i.e., the output text) directly; at the very least, they will delete the suggestion regarding vegetables. The large language model identifies the differences between the corrected output text and the original output text (equivalent to identifying the differences between the corrected target task and the target task), captures the difference information "vegetable suggestion deleted", and traces it in the rule flow log. The trace shows that the AI concluded there were no vegetables because the decision sub-condition "id=1: judge whether there are vegetables" was judged as "no". The large language model changes this judgment to "yes" and saves the input, output, and related information to the data table awaiting fine-tuning (node: multimodal node (i.e., the multimodal large model); input parameter: meal image; prompt word: "judge whether the meal has vegetables"; output result: "yes"). Over time, a large amount of such data may accumulate. For example, each time 1000 records have accumulated, the preset training condition is met, and the pre-written fine-tuning script is called automatically to fine-tune the multimodal large model.
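
The tracing and sample generation in this example can be sketched as follows. The rule flow log layout, the field names, and the functions `build_training_sample` and `should_fine_tune` are hypothetical; the batch size of 1000 comes from the example above.

```python
from typing import List

def build_training_sample(flow_log: List[dict], sub_condition_id: int,
                          corrected_output: str) -> dict:
    """Find the logged call for the erroneous sub-condition and pair its
    original input parameters with the corrected output result."""
    for entry in flow_log:
        if entry["id"] == sub_condition_id:
            return {
                "node": entry["node"],
                "input": entry["input"],
                "prompt": entry["prompt"],
                "output": corrected_output,
            }
    raise KeyError(f"sub-condition {sub_condition_id} not found in rule flow log")

def should_fine_tune(pending: List[dict], batch_size: int = 1000) -> bool:
    """Trigger fine-tuning each time another full batch of records has accumulated."""
    return len(pending) > 0 and len(pending) % batch_size == 0

# Rule flow log for the breakfast example: id=1 was wrongly judged "no".
flow_log = [
    {"id": 1, "node": "multimodal", "input": "meal_001.jpg",
     "prompt": "judge whether the meal has vegetables", "output": "no"},
]
pending = [build_training_sample(flow_log, 1, corrected_output="yes")]
```

The original log entry is left unchanged; only the new training sample carries the corrected output, so the log remains usable for later error tracing.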

[0097] In some embodiments, the method further includes: performing correction processing on the output result of the erroneous decision sub-condition to obtain a corrected output result; or performing correction processing on the input parameters of the erroneous decision sub-condition to obtain corrected input parameters; generating at least one training sample based on the input parameters and the corrected output result; and / or generating at least one training sample based on the corrected input parameters and the corrected output result.

[0098] In other words, the system can generate training samples not only by correcting the output results but also by correcting the input parameters. Specifically, correcting the input parameters means that when discrepancy analysis and error tracing indicate that the error is not caused by insufficient model reasoning ability but by missing, incorrectly valued, or improperly expressed input information entering the decision condition, the input parameters corresponding to the decision condition are corrected or completed to make them consistent with the actual business semantics, and training samples are constructed accordingly. Input parameters may include, but are not limited to: the dining image itself or its preprocessing results (e.g., cropped image regions, image quality enhancement results), meal information, vital sign information, pathological information, medication information in the object data, and intermediate structured results generated by upstream tool functions (e.g., identified food categories, portion sizes, beverage type identifiers, etc.). For example, when the system misreads "lunch" as "breakfast," causing an error in the rule flow path, the meal field in the input parameters can be corrected from "breakfast" to "lunch," and training samples can be generated while maintaining the original output or combining it with the corrected expected output. Similarly, when vegetable areas in an image are occluded or incorrectly cropped, causing multimodal judgment failure, training samples can be constructed by correcting the image input parameters (e.g., replacing them with an uncropped image or correcting the cropped area) and using the correct output labels. 
By simultaneously supporting both output correction and input parameter correction in sample generation, more targeted training data can be generated under different error root cause scenarios, improving the model's adaptability to real data distribution and further enhancing the accuracy and stability of subsequent training and deployment.
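
The meal field example above can be sketched as follows. The function `build_sample` and the field names are hypothetical; the sketch only shows that a sample may carry corrected input parameters together with the expected output, without modifying the logged inputs.

```python
from typing import Optional

def build_sample(inputs: dict, expected_output: str,
                 input_corrections: Optional[dict] = None) -> dict:
    """Build a training sample, optionally overriding erroneous input fields."""
    corrected_inputs = {**inputs, **(input_corrections or {})}
    return {"input": corrected_inputs, "expected_output": expected_output}

# The system misread "lunch" as "breakfast" for the sub-condition "is it breakfast?".
logged_inputs = {"meal": "breakfast", "image": "meal_001.jpg"}
sample = build_sample(logged_inputs, expected_output="no",
                      input_corrections={"meal": "lunch"})
```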

[0099] The above method identifies the erroneous decision sub-conditions among the decision sub-conditions corresponding to the erroneous task, corrects the output results of these erroneous decision sub-conditions, and constructs training samples based on the input parameters and the corrected output results. This approach transforms the adoption / modification feedback from medical staff into structured, reusable supervisory data, enabling targeted correction of error root causes. Compared to simply fine-tuning on the final output text, this method more precisely targets the key nodes that generate errors, reduces interference from irrelevant samples, and improves training data quality and efficiency. At the same time, the training samples are bound to the corresponding input parameters and decision chains, facilitating subsequent error tracing, regression verification, and performance evaluation. This forms a closed-loop optimization mechanism that runs from discovering differences, to locating nodes, to generating samples, to updating the model, reducing the impact of model hallucinations and recognition errors in actual business operations, and improving the stability and consistency of dietary recognition and nutritional advice output.

[0100] The following describes how to perform completion processing on the target task.

[0101] In some embodiments, steps 250-260 (not shown in Figure 2) are further included after step 240 described above.

[0102] Step 250: In the case of missing information in the target task, completion information is determined by using a large language model based on the target task, dining image, and object data.

[0103] Information gaps in a target task mean that the target task is logically or informationally incomplete or insufficient. For example, a target task might include "inform the patient that they should eat xx grams of vegetables." However, the target task does not specify the exact number of grams of vegetables that should be consumed.

[0104] Completion information refers to the specific numerical values, explicit objects, or detailed constraints generated by a large language model based on context (dining images, object data, common sense, etc.) to fill in missing parameters or vague descriptions in a target task. It is not an independent, executable task in itself, but rather a data fragment or precise description used to complete the task.

[0105] Step 260: Based on the completion information, perform completion processing on the target task to obtain the completed target task.

[0106] Completion processing for a target task refers to filling in, replacing, or structurally assigning values to the task description, parameter, or execution fields of the target task based on the completion information, so as to address any missing information in the target task. This transforms the target task from one containing placeholders or undefined parameters into an executable task with defined parameters. For example, when the target task is "inform the patient that they should eat xx grams of vegetables" and "xx grams" is missing, completion processing can replace "xx grams" with the specific intake amount or range (e.g., 150 grams or 120–180 grams) determined by the completion information, resulting in the completed target task "inform the patient that they should eat 150 grams of vegetables." When the target task requires further indication of restrictions or thresholds, completion processing can also add the corresponding thresholds, precautions, or applicable conditions for subsequent output text generation.
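
The placeholder replacement in this example can be sketched as follows. Treating the literal token "xx" as the marker of missing information is an assumption of this sketch, as are the names `has_missing_info` and `complete_task`.

```python
import re

# Assumed convention: the standalone token "xx" marks a missing value.
PLACEHOLDER = re.compile(r"\bxx\b")

def has_missing_info(task: str) -> bool:
    """Detect whether the target task still contains an unfilled placeholder."""
    return PLACEHOLDER.search(task) is not None

def complete_task(task: str, value: str) -> str:
    """Replace every placeholder with the completion value; leave complete tasks as-is."""
    # A function replacement avoids interpreting backslashes in the value.
    return PLACEHOLDER.sub(lambda _match: value, task)

task = "inform the patient that they should eat xx grams of vegetables"
completed = complete_task(task, "150")
```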

[0107] The above method identifies missing information in the target task and, when missing information is identified, calls the corresponding tool function to generate completion information based on the dining image and object data. The completion information is then written into the target task to perform completion processing, ensuring the information integrity and reliability of the final output text.

[0108] In some embodiments, determining completion information based on the target task, dining image, and object data using a large language model includes: determining a second target tool function matching the target task from at least one candidate tool function included in the function library using the large language model; determining input parameters of the second target tool function based on the dining image and object data; and inputting the input parameters into the second target tool function to generate completion information.

[0109] In some embodiments, determining a second target tool function that matches the target task from at least one candidate tool function included in the function library may include the following steps: parsing the task description field or the field to be completed of the target task to determine the completion target type, where the completion target type includes at least one of quantitative values, value ranges, threshold parameters, and taboo parameters; and, based on the completion target type, retrieving from the function library the second target tool function whose functional description corresponds to that type.

[0110] With the method described above, when information is missing in the target task, the large language model does not directly generate the completion content. Instead, the large language model selects a second target tool function that matches the task, the dining image, and the object data, and constructs the input parameters of this tool function to obtain the completion information. This delegates the key completion step to a reproducible tool function, reducing the bias introduced when the large language model directly generates numerical values or professional conclusions, and improving the accuracy of the completion information. At the same time, the process of generating the completion information can be recorded and traced back through the task, tool selection, input parameters, and output results, facilitating subsequent error localization, data accumulation, and model optimization, thereby improving the overall interpretability and reliability of the system.
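
The tool-based completion flow and its traceability can be sketched as follows. This is a simplified illustration under stated assumptions: the selection step that this application assigns to the large language model is replaced by a plain lookup on the completion target type, and the two tool functions are stubs that return example values taken from this description (the 120-180 gram vegetable range and the sugary-drink restriction). All names (`ToolFunction`, `TraceRecord`, `select_tool`, `run_completion`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolFunction:
    name: str
    target_type: str          # e.g. "value_range" or "taboo"
    func: Callable[..., Any]


@dataclass
class TraceRecord:
    """One traceable completion step: task, tool selection, inputs, output."""
    task: str
    tool: str
    inputs: Dict[str, Any]
    output: Any


def stub_vegetable_range(**_inputs: Any) -> Dict[str, float]:
    # Stub: returns the example range from the description, not a computed value.
    return {"low": 120.0, "high": 180.0}


def stub_restricted_foods(**_inputs: Any) -> List[str]:
    # Stub: returns the example restriction from the description.
    return ["sugary drinks"]


FUNCTION_LIBRARY = [
    ToolFunction("vegetable_range", "value_range", stub_vegetable_range),
    ToolFunction("restricted_foods", "taboo", stub_restricted_foods),
]


def select_tool(target_type: str, library: List[ToolFunction]) -> ToolFunction:
    """Stand-in for the model's selection: first tool matching the target type."""
    for tool in library:
        if tool.target_type == target_type:
            return tool
    raise LookupError(f"no tool function for completion target type {target_type!r}")


def run_completion(task: str, target_type: str, inputs: Dict[str, Any],
                   library: List[ToolFunction], trace: List[TraceRecord]) -> Any:
    """Call the selected tool and append a record so the step can be traced back."""
    tool = select_tool(target_type, library)
    output = tool.func(**inputs)
    trace.append(TraceRecord(task, tool.name, dict(inputs), output))
    return output
```

Because every call appends a `TraceRecord`, an incorrect completion can later be attributed to the tool selection, the input parameters, or the tool's own output.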

[0111] In some embodiments, the second target tool function includes a nutrition function used to determine the intake amount or intake range of the nutrient element corresponding to the target task; determining the second target tool function matching the target task from at least one candidate tool function included in the function library through the large language model includes: determining the disease label of the user object based on the object data, wherein different disease labels correspond to different nutrition functions; and determining, based on the disease label, the nutrition function matching the disease label from the at least one candidate tool function.

[0112] Nutritional functions refer to tool functions provided in advance by the system and used to determine nutrition-related target parameters based on the object data of the user object. These parameters include at least recommended intake values, recommended intake ranges, and / or threshold parameters corresponding to the target task. For example, nutritional functions may include: calculation functions for calculating daily energy requirements and generating the energy allocation for each meal; calculation functions for calculating the recommended vegetable intake or range for a single meal; calculation functions for determining the recommended carbohydrate intake or range; calculation functions for determining the upper limit of sodium intake; and calculation functions for generating a list of prohibited or restricted foods. This application does not limit these functions.

[0113] The nutrient elements (that is, nutrients or nutrient components) corresponding to the target task refer to nutritional indicators related to the dietary components constrained or recommended by the target task. For example, when the target task is "inform the patient to eat xx grams of vegetables," the corresponding nutrient elements may include vegetable intake (in grams) and / or dietary fiber intake; when the target task is "inform the patient not to drink sugary drinks," the corresponding nutrient elements may include added sugar intake or carbohydrate intake; when the target task involves controlling blood pressure, the corresponding nutrient elements may include sodium intake; when the target task involves renal function management, the corresponding nutrient elements may include protein intake and / or potassium and phosphorus intake. This application does not limit this.

[0114] The intake amount or range of a nutrient element refers to the target value or range recommended for the user to consume in the current meal. The intake amount or range can be expressed in grams, milligrams, kilocalories or servings, and may further include upper limit values, lower limit values or recommended interval boundary values, for parameter filling of fields to be completed in the target task. This application does not limit this.

[0115] The disease label of a user object refers to identification information determined based on the pathological information in the object data, used to characterize the type of chronic disease, risk type, or disease stratification of the user object. The disease label is used to guide the selection of subsequent nutritional functions and the determination of nutritional parameter calculation strategies. For example, the disease label may characterize at least one of diabetes, hypertension, chronic kidney disease, gout, or hyperlipidemia, or characterize the stage, risk level, or complication stratification of the above diseases. This application does not limit this.

[0116] In some embodiments, the nutritional functions corresponding to different disease labels differ in the intake amount or intake range they output for the same nutrient element. In other words, for the same nutrient element, the system selects different calculation strategies, parameter configurations, or constraints based on the user object's disease label, so as to output an intake amount or intake range suited to the management goal of that disease. For example, when the nutrient element is carbohydrates, the nutritional function corresponding to the diabetes label outputs a single-meal carbohydrate target value or target range oriented toward blood sugar control, while the nutritional function corresponding to the hypertension label may output no carbohydrate target or a relatively loose range. When the nutrient element is sodium, the nutritional function corresponding to the hypertension label outputs the upper limit or recommended range of sodium intake, while the one corresponding to the diabetes label may output no sodium upper limit or use different threshold parameters. When the nutrient element is protein, potassium, or phosphorus, the nutritional function corresponding to the chronic kidney disease label outputs the upper limit of protein intake and / or the restricted range of potassium and phosphorus intake, while the nutritional functions corresponding to other disease labels may output different restriction parameters or not enable the corresponding restrictions. This application does not limit this.

[0117] The above method determines disease labels based on object data and selects matching nutritional functions from the function library accordingly. This ensures that the completion information for the target task is adapted to the pathological characteristics of the user object, thereby improving the relevance and rationality of the completion results. At the same time, the nutritional functions output intake amounts or ranges in a structured manner, giving the completion process clear field meanings and unit constraints, which limits arbitrary interpretation by the large language model during the numerical completion stage and reduces output bias.
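
The label-dependent selection of nutritional functions and their structured output can be sketched as a lookup keyed by disease label and nutrient element. This is an illustrative sketch only: the registry, the function names, and the `params` keys are hypothetical, and all numeric bounds are supplied by the caller rather than hard-coded, because this application does not specify concrete values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class IntakeTarget:
    """Structured output: explicit nutrient, unit, and optional bounds."""
    nutrient: str
    unit: str
    low: Optional[float] = None
    high: Optional[float] = None


Params = Mapping[str, float]
NutritionFunction = Callable[[Params], Optional[IntakeTarget]]


def diabetes_carbohydrate(params: Params) -> Optional[IntakeTarget]:
    # Diabetes label: output a single-meal carbohydrate target range.
    return IntakeTarget("carbohydrate", "g", params["carb_low_g"], params["carb_high_g"])


def hypertension_carbohydrate(params: Params) -> Optional[IntakeTarget]:
    # Hypertension label: in this sketch, no carbohydrate target is output.
    return None


def hypertension_sodium(params: Params) -> Optional[IntakeTarget]:
    # Hypertension label: output only an upper limit for sodium.
    return IntakeTarget("sodium", "mg", high=params["sodium_max_mg"])


# Hypothetical registry keyed by (disease label, nutrient element).
NUTRITION_FUNCTIONS: Dict[Tuple[str, str], NutritionFunction] = {
    ("diabetes", "carbohydrate"): diabetes_carbohydrate,
    ("hypertension", "carbohydrate"): hypertension_carbohydrate,
    ("hypertension", "sodium"): hypertension_sodium,
}


def select_nutrition_function(disease_label: str, nutrient: str) -> Optional[NutritionFunction]:
    """Return the function registered for this label and nutrient, if any."""
    return NUTRITION_FUNCTIONS.get((disease_label, nutrient))
```

Returning an `IntakeTarget` with a named unit, rather than free text, is what gives the completion step the fixed field meanings and unit constraints mentioned above.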

[0118] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.

[0119] Please refer to Figure 5, which illustrates a block diagram of a diet planning apparatus according to an embodiment of this application. The apparatus has the functionality to implement the diet planning method examples described above; this functionality can be implemented in hardware or by hardware executing corresponding software. The apparatus can be the computer device described above, or it can be installed within a computer device. As shown in Figure 5, the device 500 may include: a first acquisition module 510, a second acquisition module 520, a determination module 530, and a generation module 540.

[0120] The first acquisition module 510 is used to acquire the dining image and object data of the user object, wherein the object data is used to indicate the attribute information and pathological information of the user object.

[0121] The second acquisition module 520 is used to acquire preset rules, the preset rules including at least one judgment condition corresponding to each task; wherein, different tasks are used to indicate different suggestions for the dietary behavior of the user object, and the judgment condition corresponding to the task is used to evaluate whether the user object meets the requirements for performing the task.

[0122] The determining module 530 is used to determine the target task that the user object should perform from the at least one task by using a large language model based on the judgment conditions corresponding to each task in the preset rules, the dining image and the object data.

[0123] The generation module 540 is used to perform natural language organization processing on the target task through the large language model to generate output text; wherein, the output text is used to provide dietary suggestions corresponding to the target task to the user in natural language form.

[0124] In some embodiments, the judgment condition corresponding to each task includes at least one decision sub-condition, the at least one decision sub-condition corresponds to a preset order, and the preset order is used to indicate the order in which the at least one decision sub-condition is executed during the judgment process; the determining module 530 is used to, for any one of the at least one task, perform judgments on the at least one decision sub-condition corresponding to the task in sequence according to the preset order based on the dining image and the object data using the large language model, and determine the task as the target task if it is determined that the requirements for executing the task are met.

[0125] In some embodiments, the determining module 530 is configured to, for any one of the at least one task, determine the output result of the i-th decision sub-condition among the at least one decision sub-condition corresponding to the task based on the dining image and the object data using the large language model, wherein the output result is used to indicate whether the i-th decision sub-condition is satisfied, and i is a positive integer; and determine the decision result of the i-th decision sub-condition based on the output result of the i-th decision sub-condition; wherein the decision result includes at least one of the following: a transition identifier and a termination identifier, the transition identifier is used to indicate the (i+1)-th decision sub-condition to be executed among the at least one decision sub-condition, the termination identifier includes an acceptance identifier and a rejection identifier, the acceptance identifier is used to indicate acceptance of execution of the task, and the rejection identifier is used to indicate rejection of execution of the task. If the decision result of the i-th decision sub-condition is the transition identifier, the output result of the (i+1)-th decision sub-condition is determined based on the dining image and the object data, and the decision result of the (i+1)-th decision sub-condition is determined based on the output result of the (i+1)-th decision sub-condition. If the decision result of the i-th decision sub-condition is the termination identifier, and the termination identifier is the acceptance identifier, the task is determined as the target task.
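
The sequential evaluation of decision sub-conditions can be sketched as a small state machine. This is a minimal sketch under stated assumptions: each sub-condition's output result is produced by an ordinary Python predicate standing in for the large language model or tool call, the transition identifier always points to the next sub-condition in the preset order, and a task whose sub-conditions are exhausted without an acceptance identifier is treated as rejected. The example task, context keys, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Mapping, Sequence


class Decision(Enum):
    TRANSITION = "transition"   # continue with the (i+1)-th sub-condition
    ACCEPT = "accept"           # termination: the task becomes a target task
    REJECT = "reject"           # termination: the task is not executed


@dataclass
class SubCondition:
    description: str
    evaluate: Callable[[Mapping[str, Any]], bool]  # stand-in for the model's output result
    on_true: Decision
    on_false: Decision


def is_target_task(sub_conditions: Sequence[SubCondition],
                   context: Mapping[str, Any]) -> bool:
    """Evaluate sub-conditions in preset order until a termination identifier."""
    for cond in sub_conditions:
        satisfied = cond.evaluate(context)
        decision = cond.on_true if satisfied else cond.on_false
        if decision is Decision.ACCEPT:
            return True
        if decision is Decision.REJECT:
            return False
    # Assumption: no acceptance identifier reached means the task is not selected.
    return False


# Hypothetical judgment condition for "inform the patient not to drink sugary drinks".
SUGARY_DRINK_TASK = [
    SubCondition("user object has a diabetes label",
                 lambda ctx: "diabetes" in ctx["disease_labels"],
                 on_true=Decision.TRANSITION, on_false=Decision.REJECT),
    SubCondition("dining image contains a sugary drink",
                 lambda ctx: bool(ctx["detected_items"] & {"cola", "juice"}),
                 on_true=Decision.ACCEPT, on_false=Decision.REJECT),
]
```

With a context such as `{"disease_labels": {"diabetes"}, "detected_items": {"cola", "rice"}}`, the first sub-condition yields a transition identifier and the second an acceptance identifier, so the task is selected.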

[0126] In some embodiments, the determining module 530 is configured to perform a task planning recognition operation on the i-th decision sub-condition using the large language model, determine a planning result, the planning result being used to indicate the execution steps required to determine the output result of the i-th decision sub-condition; based on the planning result, the dining image, and the object data, determine a first target tool function from at least one candidate tool function included in the function library, and determine the input parameters of the first target tool function; input the input parameters into the first target tool function to generate the output result of the i-th decision sub-condition.

[0127] In some embodiments, the first target tool function includes a multimodal large model, and the input parameters of the first target tool function include a prompt word and the dining image; the determining module is used to input the prompt word and the dining image into the multimodal large model, and perform image recognition processing on the dining image through the multimodal large model to generate the output result of the i-th decision sub-condition.

[0128] In some embodiments, the device 500 further includes a training module (not shown in Figure 5).

[0129] The training module is configured to: respond to a correction operation for the target task to obtain a corrected target task; perform difference recognition processing on the target task and the corrected target task to obtain difference information; based on the difference information, identify erroneous tasks from the target task, wherein the erroneous task refers to a task that the user object does not need to perform; generate at least one training sample based on the erroneous task; and, under preset training conditions, train at least one of the large language model and the multimodal large model based on the at least one training sample.

[0130] In some embodiments, the training module is configured to determine erroneous decision sub-conditions from at least one decision sub-condition corresponding to the erroneous task, wherein the erroneous decision sub-condition refers to a decision sub-condition in which the output result is erroneous; perform correction processing on the output result of the erroneous decision sub-condition to obtain a corrected output result; and generate the at least one training sample based on the input parameters of the erroneous decision sub-condition and the corrected output result.
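
The correction-driven generation of training samples can be sketched as follows. This is an illustrative sketch only, assuming that tasks are compared as plain strings, that the output result of a decision sub-condition is a boolean (so correcting an erroneous result means flipping it), and that the indices of the erroneous sub-conditions are already known; how they are identified is not shown here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class SubConditionRecord:
    """Recorded input parameters and output result of one decision sub-condition."""
    inputs: Dict[str, Any]
    output: bool


def find_erroneous_tasks(target_tasks: Sequence[str],
                         corrected_tasks: Sequence[str]) -> List[str]:
    """Difference recognition: tasks removed by the correction are erroneous tasks."""
    kept = set(corrected_tasks)
    return [task for task in target_tasks if task not in kept]


def build_training_samples(records: Sequence[SubConditionRecord],
                           erroneous_indices: Sequence[int]) -> List[Dict[str, Any]]:
    """Pair each erroneous sub-condition's inputs with its corrected output."""
    samples = []
    for i in erroneous_indices:
        record = records[i]
        # Assumption: a boolean output result is corrected by negating it.
        samples.append({"input": record.inputs, "label": not record.output})
    return samples
```

The resulting samples keep the original input parameters unchanged, so the same prompt and image can be replayed against the model with the corrected label during training.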

[0131] In some embodiments, the device 500 further includes a completion module (not shown in Figure 5).

[0132] The completion module is used to determine completion information based on the target task, the dining image, and the object data using the large language model when the target task has missing information; and to perform completion processing on the target task based on the completion information to obtain the completed target task.

[0133] In some embodiments, the completion module is configured to determine a second target tool function that matches the target task from at least one candidate tool function included in the function library using the large language model; determine the input parameters of the second target tool function based on the dining image and the object data; and input the input parameters into the second target tool function to generate the completion information.

[0134] In some embodiments, the second target tool function includes a nutrition function, which is used to determine the intake amount or intake range of the nutrient element corresponding to the target task; the completion module is used to determine the disease label of the user object based on the object data, wherein different disease labels correspond to different nutrition functions; and to determine, based on the disease label, a nutrition function matching the disease label from the at least one candidate tool function.

[0135] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0136] Please refer to Figure 6, which shows a structural block diagram of a computer device 600 provided in one embodiment of this application.

[0137] Typically, computer device 600 includes a processor 610 and a memory 620.

[0138] Processor 610 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 610 may be implemented using at least one hardware form selected from DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). Processor 610 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the awake state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 610 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 610 may also include an AI processor for handling computational operations related to machine learning.

[0139] The memory 620 may include one or more computer-readable storage media, which may be non-transitory. The memory 620 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 620 are used to store a computer program configured to be executed by one or more processors to implement the above-described diet planning method.

[0140] Those skilled in the art will understand that the structure shown in Figure 6 does not constitute a limitation on the computer device 600, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

[0141] In an exemplary embodiment, a computer-readable storage medium is also provided, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the above-described diet planning method. Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), or an optical disc, etc. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory).

[0142] In an exemplary embodiment, a computer program product is also provided, the computer program product including a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, causing the computer device to perform the above-described diet planning method.

[0143] It should be noted that the acquisition, storage, processing and use of the relevant data involved in this application, such as user dining images and object data, all comply with relevant laws and regulations and privacy policies, and are carried out with the explicit authorization of the user to protect the security and privacy rights of user data.

[0144] It should be understood that "multiple" as used herein refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. Furthermore, the step numbers described herein are merely illustrative of one possible execution order. In some other embodiments, the steps may not be executed in numerical order, such as two steps with different numbers being executed simultaneously, or two steps with different numbers being executed in the reverse order of the illustration. This application does not limit this.

[0145] The above description is merely an exemplary embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A diet planning method, characterized in that the method includes: acquiring a dining image and object data of a user object, wherein the object data is used to indicate the attribute information and pathological information of the user object; acquiring preset rules, the preset rules including a judgment condition corresponding to each of at least one task, wherein different tasks are used to indicate different suggestions for the dietary behavior of the user object, and the judgment condition corresponding to a task is used to evaluate whether the user object meets the requirements for performing the task; determining, from the at least one task and using a large language model, the target task that the user object should perform based on the judgment conditions corresponding to each task in the preset rules, the dining image, and the object data; and performing natural language organization processing on the target task through the large language model to generate output text, wherein the output text is used to provide dietary suggestions corresponding to the target task to the user object in natural language form.

2. The method according to claim 1, characterized in that the judgment condition corresponding to each task includes at least one decision sub-condition, the at least one decision sub-condition corresponds to a preset order, and the preset order is used to indicate the order in which the at least one decision sub-condition is executed in the judgment process; the step of determining the target task that the user object should perform from the at least one task using a large language model based on the judgment conditions corresponding to each task in the preset rules, the dining image, and the object data includes: for any one of the at least one task, performing, by the large language model, judgments on the at least one decision sub-condition corresponding to the task in sequence according to the preset order based on the dining image and the object data, and determining the task as the target task if the requirements for executing the task are met.

3. The method according to claim 2, characterized in that the step of using the large language model to perform judgments on the at least one decision sub-condition corresponding to the task in sequence according to the preset order based on the dining image and the object data, and determining the task as the target task when the requirements for executing the task are met, includes: determining, by the large language model and based on the dining image and the object data, the output result of the i-th decision sub-condition among the at least one decision sub-condition corresponding to the task, wherein the output result is used to indicate whether the i-th decision sub-condition is satisfied, and i is a positive integer; determining the decision result of the i-th decision sub-condition based on the output result of the i-th decision sub-condition, wherein the decision result includes at least one of the following: a transition identifier and a termination identifier, the transition identifier is used to indicate the (i+1)-th decision sub-condition to be executed among the at least one decision sub-condition, the termination identifier includes an acceptance identifier and a rejection identifier, the acceptance identifier is used to indicate acceptance of execution of the task, and the rejection identifier is used to indicate rejection of execution of the task; if the decision result of the i-th decision sub-condition is the transition identifier, determining the output result of the (i+1)-th decision sub-condition based on the dining image and the object data, and determining the decision result of the (i+1)-th decision sub-condition based on the output result of the (i+1)-th decision sub-condition; and if the decision result of the i-th decision sub-condition is the termination identifier and the termination identifier is the acceptance identifier, determining the task as the target task.

4. The method according to claim 3, characterized in that, The step of determining the output of the i-th decision sub-condition among at least one decision sub-condition corresponding to the task based on the dining image and the object data using the large language model includes: The large language model is used to perform a task planning recognition operation on the i-th decision sub-condition to determine the planning result. The planning result is used to indicate the execution steps required to determine the output result of the i-th decision sub-condition. Based on the planning results, the dining image, and the object data, a first target tool function is determined from at least one candidate tool function included in the function library, and the input parameters of the first target tool function are determined. The input parameters are input into the first target tool function to generate the output result of the i-th decision sub-condition.

5. The method according to claim 4, characterized in that, The first target tool function includes a multimodal large model, and the input parameters of the first target tool function include prompt words and the dining image; The step of inputting the input parameters into the first target tool function to generate the output result of the i-th decision sub-condition includes: The prompt word and the dining image are input into the multimodal large model. The multimodal large model performs image recognition processing on the dining image to generate the output result of the i-th decision sub-condition.

6. The method according to claim 5, characterized in that the method further includes: in response to a correction operation for the target task, obtaining a corrected target task; performing difference recognition processing on the target task and the corrected target task to obtain difference information; based on the difference information, identifying an erroneous task from the target task, wherein the erroneous task refers to a task that the user object does not need to perform; generating at least one training sample based on the erroneous task; and, when preset training conditions are met, training at least one of the large language model and the multimodal large model based on the at least one training sample.

7. The method according to claim 6, characterized in that the step of generating at least one training sample based on the erroneous task includes: determining an erroneous decision sub-condition from the at least one decision sub-condition corresponding to the erroneous task, wherein the erroneous decision sub-condition refers to a decision sub-condition whose output result is erroneous; performing correction processing on the output result of the erroneous decision sub-condition to obtain a corrected output result; and generating the at least one training sample based on the input parameters of the erroneous decision sub-condition and the corrected output result.

8. The method according to any one of claims 1 to 7, characterized in that the method further includes: when the target task has missing information, determining completion information using the large language model based on the target task, the dining image, and the object data; and performing completion processing on the target task based on the completion information to obtain the completed target task.

9. The method according to claim 8, characterized in that, The step of determining the completion information based on the target task, the dining image, and the object data using the large language model includes: The large language model is used to determine a second target tool function that matches the target task from at least one candidate tool function included in the function library; Based on the dining image and the object data, determine the input parameters of the second target tool function; The input parameters are input into the second target tool function to generate the completion information.

10. The method according to claim 9, characterized in that the second target tool function includes a nutrition function, which is used to determine the intake amount or intake range of the nutrient element corresponding to the target task; the step of determining a second target tool function matching the target task from at least one candidate tool function included in the function library using the large language model includes: determining the disease label of the user object based on the object data, wherein different disease labels correspond to different nutrition functions; and determining, based on the disease label, a nutrition function matching the disease label from the at least one candidate tool function.

11. A diet planning device, characterized in that, The device includes: The first acquisition module is used to acquire the dining image and object data of the user object, wherein the object data is used to indicate the attribute information and pathological information of the user object; The second acquisition module is used to acquire preset rules, the preset rules including at least one judgment condition corresponding to each task; wherein, different tasks are used to indicate different suggestions for the dietary behavior of the user object, and the judgment condition corresponding to the task is used to evaluate whether the user object meets the requirements for performing the task. The determination module is used to determine the target task that the user object should perform from the at least one task by using a large language model based on the judgment conditions corresponding to each task in the preset rules, the dining image and the object data; The generation module is used to perform natural language organization processing on the target task through the large language model to generate output text; wherein the output text is used to provide dietary suggestions corresponding to the target task to the user in natural language form.

12. A computer device, characterized in that, The computer device includes a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the method as claimed in any one of claims 1 to 10.

13. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program, which is loaded and executed by a processor to implement the method as described in any one of claims 1 to 10.

14. A computer program product, characterized in that, The computer program product includes a computer program stored in a computer-readable storage medium, which a processor reads from and executes to implement the method as described in any one of claims 1 to 10.