A machine learning based method and apparatus for predicting hydroponic lettuce yield in a facility

By applying random forest, support vector machine regression, and extreme gradient boosting models to hydroponically grown lettuce in greenhouses, combined with lettuce images and environmental heat indicators, the problems of high computational resource requirements and low prediction accuracy of vegetable crops at different growth stages in greenhouses were solved, achieving more efficient yield prediction.

CN119494437B · Active · Publication Date: 2026-06-16 · BEIJING JINGWA AGRICULTURAL SCIENCE & TECHNOLOGY INNOVATION CENTER

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING JINGWA AGRICULTURAL SCIENCE & TECHNOLOGY INNOVATION CENTER
Filing Date
2024-10-29
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In greenhouses, vegetable crops are at different growth stages, and existing machine learning algorithms require input of multiple feature parameters, resulting in high computational resource requirements and low prediction accuracy.

Method used

A machine learning-based method for predicting the yield of hydroponically grown lettuce was adopted. By determining the number of plants and growth indicators at each growth stage in the lettuce image, and combining the environmental heat index, the yield prediction was carried out using random forest, support vector machine regression and extreme gradient boosting models. The lettuce yield models for each growth stage were trained respectively.

Benefits of technology

This reduces the computational resource requirements of machine learning algorithms and improves the accuracy of yield prediction.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a facility hydroponic lettuce yield prediction method and device based on machine learning, comprising obtaining an environmental heat index value in a current growth period in a to-be-predicted greenhouse, inputting a growth index set of plants in each growth stage, the number of plants in each growth stage and the environmental heat index value in the to-be-predicted greenhouse into a corresponding lettuce yield model of each growth stage respectively, obtaining a lettuce yield prediction value of each growth stage, and obtaining a final lettuce yield prediction value of hydroponic lettuce in the to-be-predicted greenhouse according to the lettuce yield prediction values of all growth stages. In the application, the problem that multiple characteristic parameters need to be input into a machine learning algorithm due to the fact that the vegetable crops in the greenhouse are in different growth stages is solved, the machine learning algorithm does not require large computing resources, the training and calculation cost is reduced, and the prediction accuracy is improved.

Description

Technical Field

[0001] This specification relates to one or more embodiments in the field of network technology, and more particularly to a method and apparatus for predicting the yield of hydroponic lettuce based on machine learning.

Background Technology

[0002] Hydroponic lettuce is mostly cultivated in greenhouses. Greenhouses provide a relatively stable microclimate environment for the growth and development of horticultural crops, enabling multiple cropping and improving yield and quality. In greenhouses, due to the different planting times of vegetables, multiple growth stages coexist. Machine learning algorithms can capture features that cannot be fully summarized through manual statistics and have been widely used in interdisciplinary research. With the application of machine learning and deep learning in agriculture, related algorithms have also been applied to predict agricultural yields. However, because greenhouse vegetables are at different growth stages, multiple feature parameters need to be input into the machine learning algorithm, requiring significant computational resources and increasing training costs. Furthermore, using only a subset of feature parameters affects the model's prediction accuracy.

Summary of the Invention

[0003] This application describes a method and apparatus for predicting the yield of hydroponically grown lettuce based on machine learning, which can solve the above-mentioned technical problems.

[0004] According to the first aspect, a method for predicting the yield of hydroponically grown lettuce in greenhouses based on machine learning is provided, the method comprising:

[0005] Based on images of hydroponic lettuce in the greenhouse to be predicted during its current growth period, the number of plants in each growth stage and the set of growth indicators for each growth stage in the hydroponic lettuce in the greenhouse to be predicted are determined.

[0006] Obtain the environmental heat index values of the greenhouse to be predicted during the current growth period, the environmental heat index values including daily photosynthetically active radiation product, daily effective accumulated temperature and daily radiative heat product;

[0007] The growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value of the greenhouse to be predicted are respectively input into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. The lettuce yield model includes random forest lettuce yield model, support vector machine regression lettuce yield model, and extreme gradient boosting lettuce yield model.

[0008] Based on the predicted yield values of lettuce at all growth stages, the final predicted yield value of hydroponic lettuce in the greenhouse to be predicted is obtained.

[0009] In some embodiments, the growth stages of hydroponic lettuce include germination, seedling, and growth stages;

[0010] The process involves inputting the growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value of the greenhouse to be predicted into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. Specifically, this includes:

[0011] When the hydroponic lettuce is in the germination stage, the first growth index subset, the number of plants in the germination stage, and the environmental heat index value in the current growth stage are input into the random forest lettuce yield model to obtain the first yield prediction result. The first growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0012] When the hydroponic lettuce is in the seedling stage, the second growth index subset, the number of plants in the seedling stage, and the environmental heat index value in the current growth stage are input into the support vector machine regression lettuce yield model to obtain the second yield prediction result. The second growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0013] When the hydroponic lettuce is in the growth stage, the third growth index subset and the environmental heat index value in the current growth stage are input into the extreme gradient boosting lettuce yield model to obtain the third yield prediction result. The third growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0014] The random forest lettuce yield model is trained based on a first sample growth index subset and a first sample predicted value; the support vector machine regression lettuce yield model is trained based on a second sample growth index subset and a second sample predicted value; and the extreme gradient boost lettuce yield model is trained based on a third sample growth index subset and a third sample predicted value.

[0015] In some embodiments, obtaining the final predicted yield of the hydroponic lettuce in the greenhouse to be predicted based on the predicted yield values of lettuce at all growth stages specifically includes:

[0016] Based on the first yield prediction result, the second yield prediction result, and the third yield prediction result, the final predicted yield of hydroponic lettuce in the greenhouse to be predicted is obtained.

[0017] In some embodiments, determining the number of hydroponic lettuce plants at each growth stage and the set of growth indicators for each growth stage in the hydroponic lettuce in the greenhouse to be predicted, based on images of the lettuce at its current growth stage, specifically includes:

[0018] The images of hydroponic lettuce at the current growth stage are divided according to a preset growth stage division rule to obtain a first sub-lettuce image, a second sub-lettuce image, and a third sub-lettuce image. The first sub-lettuce image is the lettuce image in the germination stage, the second sub-lettuce image is the lettuce image in the seedling stage, and the third sub-lettuce image is the lettuce image in the growth stage.

[0019] Extract a first growth index subset for lettuce in the germination stage from the first sub-lettuce image, extract a second growth index subset for lettuce in the seedling stage from the second sub-lettuce image, and extract a third growth index subset for lettuce in the growth stage from the third sub-lettuce image.

[0020] The first subset of growth indicators includes plant height and number of leaves, the second subset of growth indicators includes the number of leaves and leaf area index characteristics, and the third subset of growth indicators includes the crown width and leaf area index characteristics.

[0021] In some embodiments, the method further includes:

[0022] Establish a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boosting lettuce yield model, wherein the random forest lettuce yield model is based on the random forest model, the support vector machine regression lettuce yield model is based on the support vector regression model, and the extreme gradient boosting lettuce yield model is based on the extreme gradient boosting model;

[0023] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the random forest lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the random forest lettuce yield model.

[0024] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the support vector machine regression lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the support vector machine regression lettuce yield model.

[0025] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the extreme gradient boosting lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the extreme gradient boosting lettuce yield model.

[0026] Calculate a first evaluation index value by comparing the first predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the first evaluation index value, determine that the prediction model corresponding to the first growth index subset is the random forest lettuce yield model.

[0027] Calculate a second evaluation index value by comparing the second predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the second evaluation index value, determine that the prediction model corresponding to the second growth index subset is the support vector machine regression lettuce yield model.

[0028] Calculate a third evaluation index value by comparing the third predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the third evaluation index value, determine that the prediction model corresponding to the third growth index subset is the extreme gradient boosting lettuce yield model.

[0029] In some embodiments, the method further includes:

[0030] Input the following parameters into the function: the lower limit temperature Tb for plant development, the upper limit temperature Tm for plant development, the lower limit temperature Tob for optimal growth, and the upper limit temperature Tou for optimal growth:

[0031]

[0032] The relative thermal effect ERTE(T) at the current average temperature T is obtained.

[0033] The daily radiative heat product is obtained based on the relative thermal effect ERTE(T) of the current average temperature T and the average daily photosynthetically active radiation during the current growth period.

[0034] The effective accumulated temperature for each day is obtained based on the current average temperature T, the lower limit temperature for development Tmin, and the preset number of growth days.

[0035] According to a second aspect, a machine learning-based device for predicting the yield of hydroponically grown lettuce is provided, the device comprising:

[0036] The first processing module is used to determine the number of hydroponic lettuce plants in each growth stage and the set of growth indicators of the plants in each growth stage in the hydroponic lettuce in the greenhouse to be predicted, based on the lettuce images in the current growth stage of the hydroponic lettuce in the greenhouse to be predicted.

[0037] The second processing module is used to obtain the environmental heat index value of the greenhouse to be predicted during the current growth period. The environmental heat index value includes the daily photosynthetically active radiation product, the daily effective accumulated temperature, and the daily radiative heat product.

[0038] The third processing module is used to input the growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value in the greenhouse to be predicted into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. The lettuce yield model includes a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boost lettuce yield model.

[0039] The fourth processing module is used to obtain the final predicted yield of hydroponic lettuce in the greenhouse to be predicted based on the predicted yield values of lettuce at all growth stages.

[0040] In some embodiments, the growth stages of hydroponic lettuce include germination, seedling, and growth stages;

[0041] The third processing module is specifically used to input the first growth index subset, the number of plants in the germination stage, and the environmental heat index value in the current growth stage into the random forest lettuce yield model when the hydroponic lettuce is in the germination stage, to obtain the first yield prediction result. The first growth index subset is extracted from the growth index set of hydroponic lettuce in the greenhouse to be predicted.

[0042] When the hydroponic lettuce is in the seedling stage, the second growth index subset, the number of plants in the seedling stage, and the environmental heat index value in the current growth stage are input into the support vector machine regression lettuce yield model to obtain the second yield prediction result. The second growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0043] When the hydroponic lettuce is in the growth stage, the third growth index subset and the environmental heat index value in the current growth stage are input into the extreme gradient boosting lettuce yield model to obtain the third yield prediction result. The third growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0044] The random forest lettuce yield model is trained based on a first sample growth index subset and a first sample predicted value; the support vector machine regression lettuce yield model is trained based on a second sample growth index subset and a second sample predicted value; and the extreme gradient boost lettuce yield model is trained based on a third sample growth index subset and a third sample predicted value.

[0045] According to a third aspect, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the machine learning-based method for predicting the yield of hydroponic lettuce in facilities as described above.

[0046] According to a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the aforementioned machine learning-based method for predicting the yield of hydroponic lettuce in facilities.

[0047] The systems and methods provided in the embodiments of this specification solve the problem that, due to the different growth stages of vegetable crops in greenhouses, multiple feature parameters need to be input into machine learning algorithms. The machine learning algorithms do not require large computing resources, reducing training and computation costs, and improving prediction accuracy.

Description of the Drawings

[0048] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0049] Figure 1 is a flowchart of a machine learning-based method for predicting the yield of hydroponically grown lettuce in a facility, as provided in an embodiment of this specification.

[0050] Figure 2 is a module schematic of a machine learning-based hydroponic lettuce yield prediction device provided in an embodiment of this specification.

[0051] Figure 3 is a schematic diagram of an example sensor installation location provided in an embodiment of this specification.

[0052] Figure 4 is a schematic diagram of the random forest principle provided in the embodiments of this specification.

[0053] Figure 5 is a schematic diagram of the support vector principle provided in the embodiments of this specification.

[0054] Figure 6 is a schematic diagram of the extreme gradient boosting principle provided in the embodiments of this specification.

[0055] Figure 7 illustrates the model training process provided in the embodiments of this specification.

[0056] Figure 8 illustrates the dataset segmentation process provided in the embodiments of this specification.

Detailed Implementation

[0057] The solution provided in this specification will now be described with reference to the accompanying drawings.

[0058] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application will be described below with reference to the accompanying drawings.

[0059] In the description of the embodiments of this application, the words "exemplary," "for example," or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplary," "for example," or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the words "exemplary," "for example," or "for instance" is intended to present the relevant concepts in a specific manner.

[0060] In the description of the embodiments of this application, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, B existing alone, and A and B existing simultaneously. Furthermore, unless otherwise stated, the term "multiple" means two or more.

[0061] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and their variations all mean "including but not limited to," unless otherwise specifically emphasized.

[0062] Figure 1 illustrates the processing flow of a machine learning-based method for predicting the yield of hydroponically grown lettuce in greenhouses, comprising:

[0063] 110. Based on the images of hydroponic lettuce in the greenhouse to be predicted during its current growth period, determine the number of plants in each growth stage and the set of growth indicators for plants in each growth stage in the hydroponic lettuce in the greenhouse to be predicted.

[0064] 120. Obtain the environmental heat index values of the greenhouse to be predicted during the current growth period. The environmental heat index values include the daily photosynthetically active radiation product, the daily effective accumulated temperature, and the daily radiative heat product.

[0065] 130. Input the growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value of the greenhouse to be predicted into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. The lettuce yield model includes random forest lettuce yield model, support vector machine regression lettuce yield model, and extreme gradient boosting lettuce yield model.

[0066] 140. Based on the predicted yield values of lettuce at all growth stages, the final predicted yield value of hydroponic lettuce in the greenhouse to be predicted is obtained.

[0067] In some embodiments, the growth stages of hydroponic lettuce include germination, seedling, and growth stages;

[0068] Step 130 specifically includes:

[0069] When hydroponic lettuce is in the germination stage, the first growth index subset, the number of plants in the germination stage, and the environmental heat index value in the current growth stage are input into the random forest lettuce yield model to obtain the first yield prediction result. The first growth index subset is extracted from the growth index set of hydroponic lettuce in the greenhouse to be predicted.

[0070] When hydroponic lettuce is in the seedling stage, the second growth index subset, the number of plants in the seedling stage, and the environmental heat index value in the current growth stage are input into the support vector machine regression lettuce yield model to obtain the second yield prediction result. The second growth index subset is extracted from the growth index set of hydroponic lettuce in the greenhouse to be predicted.

[0071] When the hydroponic lettuce is in the growth stage, the third growth index subset and the environmental heat index value in the current growth stage are input into the extreme gradient boosting lettuce yield model to obtain the third yield prediction result. The third growth index subset is extracted from the growth index set of hydroponic lettuce in the greenhouse to be predicted.

[0072] Among them, the random forest lettuce yield model is trained based on the first sample growth index subset and the first sample predicted value; the support vector machine regression lettuce yield model is trained based on the second sample growth index subset and the second sample predicted value; and the extreme gradient boost lettuce yield model is trained based on the third sample growth index subset and the third sample predicted value.

[0073] In some embodiments, step 140 specifically includes:

[0074] Based on the first, second, and third yield prediction results, the final predicted yield of hydroponically grown lettuce in the greenhouse is obtained.
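The routing described in steps 130 and 140 can be sketched as follows. This is a minimal illustration, not the patented implementation: the three models are replaced by hypothetical stand-in callables, the stage keys are made-up names, and summing the per-stage predictions is an assumption, since the text only says the final value is obtained "based on" the three results.

```python
from typing import Callable, Dict, Sequence

# A "model" is any callable mapping a feature vector to a yield estimate.
# In the source these are trained random forest, SVR, and extreme gradient
# boosting regressors; the lambdas below are hypothetical stand-ins.
Model = Callable[[Sequence[float]], float]

STAGE_MODELS: Dict[str, Model] = {
    "germination": lambda x: 0.5 * sum(x),  # stand-in for the random forest model
    "seedling": lambda x: 1.0 * sum(x),     # stand-in for the SVR model
    "growth": lambda x: 2.0 * sum(x),       # stand-in for the gradient boosting model
}


def predict_stage_yields(
    features_by_stage: Dict[str, Sequence[float]],
    models: Dict[str, Model] = STAGE_MODELS,
) -> Dict[str, float]:
    """Route each stage's feature vector to the model assigned to that stage."""
    return {stage: models[stage](x) for stage, x in features_by_stage.items()}


def combine_yields(stage_yields: Dict[str, float]) -> float:
    """Assumed combination rule: add up the per-stage predictions."""
    return sum(stage_yields.values())


features = {"germination": [1.0, 2.0], "seedling": [3.0], "growth": [4.0, 1.0]}
stage_yields = predict_stage_yields(features)
total = combine_yields(stage_yields)
```

Keeping the stage-to-model mapping in a dictionary means a different model can be assigned to a stage without touching the routing code.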

[0075] In some embodiments, step 110 specifically includes:

[0076] The images of hydroponic lettuce at the current growth stage are divided according to a preset growth stage division rule to obtain the first sub-lettuce image, the second sub-lettuce image, and the third sub-lettuce image. The first sub-lettuce image is the lettuce image in the germination stage, the second sub-lettuce image is the lettuce image in the seedling stage, and the third sub-lettuce image is the lettuce image in the growth stage.

[0077] Extract the first growth index subset of lettuce in the germination stage from the first sub-lettuce image, extract the second growth index subset of lettuce in the seedling stage from the second sub-lettuce image, and extract the third growth index subset of lettuce in the growth stage from the third sub-lettuce image.

[0078] The first subset of growth indicators includes plant height and number of leaves; the second subset includes the number of leaves and leaf area index characteristics; and the third subset includes the crown width and leaf area index characteristics.
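One way to assemble the per-stage model inputs from these subsets is sketched below. The dictionary keys, the `HeatIndices` class, and the function name are hypothetical. The plant count is appended only for the germination and seedling stages because the stage-specific paragraphs list it only for those two stages; that reading is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

# Indicators per stage follow the text; the key names are hypothetical.
STAGE_INDICATORS: Dict[str, List[str]] = {
    "germination": ["plant_height", "leaf_count"],
    "seedling": ["leaf_count", "leaf_area_index"],
    "growth": ["crown_width", "leaf_area_index"],
}


@dataclass
class HeatIndices:
    daily_par_product: float
    effective_accumulated_temp: float
    daily_radiative_heat_product: float


def build_features(
    stage: str,
    indicators: Dict[str, float],
    plant_count: int,
    heat: HeatIndices,
) -> List[float]:
    """Build one input vector: stage indicators, optional plant count, heat indices."""
    x = [indicators[name] for name in STAGE_INDICATORS[stage]]
    # Assumption: plant count is an input for germination and seedling only.
    if stage != "growth":
        x.append(float(plant_count))
    x.extend(
        [
            heat.daily_par_product,
            heat.effective_accumulated_temp,
            heat.daily_radiative_heat_product,
        ]
    )
    return x


heat = HeatIndices(1.0, 2.0, 3.0)
germination_x = build_features(
    "germination", {"plant_height": 2.0, "leaf_count": 3.0}, 50, heat
)
growth_x = build_features(
    "growth", {"crown_width": 20.0, "leaf_area_index": 3.5}, 100, heat
)
```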

[0079] In some embodiments, it also includes:

[0080] Establish a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boosting lettuce yield model, wherein the random forest lettuce yield model is based on the random forest model, the support vector machine regression lettuce yield model is based on the support vector regression model, and the extreme gradient boosting lettuce yield model is based on the extreme gradient boosting model;

[0081] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the random forest lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the random forest lettuce yield model.

[0082] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the support vector machine regression lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the support vector machine regression lettuce yield model.

[0083] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the extreme gradient boosting lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the extreme gradient boosting lettuce yield model.

[0084] Calculate a first evaluation index value by comparing the first predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the first evaluation index value, determine that the prediction model corresponding to the first growth index subset is the random forest lettuce yield model.

[0085] Calculate a second evaluation index value by comparing the second predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the second evaluation index value, determine that the prediction model corresponding to the second growth index subset is the support vector machine regression lettuce yield model.

[0086] Calculate a third evaluation index value by comparing the third predicted value of each of the random forest, support vector machine regression, and extreme gradient boosting lettuce yield models with the actual yield value of the sample. Based on the third evaluation index value, determine that the prediction model corresponding to the third growth index subset is the extreme gradient boosting lettuce yield model.
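The per-stage model selection can be illustrated with a short sketch. The text does not name the evaluation index, so root mean squared error (RMSE) is used here as an assumption; the model names and sample numbers are invented for illustration only.

```python
import math
from typing import Dict, List, Tuple


def rmse(predicted: List[float], actual: List[float]) -> float:
    """Root mean squared error between predictions and measured yields."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def select_model(
    predictions_by_model: Dict[str, List[float]],
    actual: List[float],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on one stage's samples and keep the lowest error."""
    scores = {name: rmse(pred, actual) for name, pred in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Invented sample data for a single growth stage.
actual = [100.0, 120.0, 140.0]
predictions = {
    "random_forest": [101.0, 119.0, 141.0],
    "svr": [110.0, 125.0, 150.0],
    "xgboost": [95.0, 120.0, 150.0],
}
best, scores = select_model(predictions, actual)
```

Running the same comparison once per growth stage yields the stage-to-model assignment described in the text.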

[0087] In some embodiments, the method further includes:

[0088] Input the following parameters into the function: the lower limit temperature Tb for plant development, the upper limit temperature Tm for plant development, the lower limit temperature Tob for optimal growth, and the upper limit temperature Tou for optimal growth:

[0089]

[0090] The relative thermal effect ERTE(T) at the current average temperature T is obtained.

[0091] The daily radiative heat product is obtained based on the relative thermal effect ERTE(T) of the current average temperature T and the average daily photosynthetically active radiation during the current growth period.

[0092] The effective accumulated temperature for each day is obtained based on the current average temperature T, the lower limit temperature for development Tmin, and the preset number of growth days.
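The text does not give the formula for effective accumulated temperature. A minimal sketch, assuming the common growing-degree-day form in which each day contributes the amount by which its mean temperature exceeds the lower limit, is shown below. The function name is hypothetical, and the 8 °C default is taken from the lower limit temperature stated later in the description.

```python
from typing import List


def effective_accumulated_temperature(
    daily_mean_temps: List[float],
    t_min: float = 8.0,
) -> List[float]:
    """Running total of max(T - t_min, 0) over the given growth days.

    Assumption: days with a mean temperature below t_min contribute nothing.
    """
    total = 0.0
    running = []
    for t in daily_mean_temps:
        total += max(t - t_min, 0.0)
        running.append(total)
    return running


# Three growth days; the second day is below the lower limit.
accumulated = effective_accumulated_temperature([10.0, 6.0, 20.0])
```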

[0093] Hydroponic lettuce is mostly cultivated in greenhouses, which provide a relatively stable microclimate environment for the growth and development of horticultural crops, enabling multiple cropping and improving yield and quality. However, greenhouse cultivation management and yield prediction rely heavily on empirical methods, lacking multi-factor analysis of the entire crop growth process. Modeling methods need to be applied to horticultural crop growth research to provide more accurate predictions.

[0094] Plant growth models, which simulate crop growth and development using computer simulations and establish several relationships, are important tools for studying the growth patterns of crops and horticultural crops. Machine learning algorithms can capture features that cannot be fully summarized through manual statistics and have been widely used in interdisciplinary research. With the application of machine learning and deep learning in agriculture, related algorithms have also been applied to predict agricultural yields. Machine learning methods such as Artificial Neural Networks (ANN), Random Forests, Support Vector Machines (SVMs), and Extreme Gradient Boosting are used to predict crop yields and can perform yield analysis under the influence of multiple factors, featuring low training costs and high efficiency.

[0095] This invention studies the growth characteristics of hydroponically grown lettuce in large-span plastic greenhouses throughout the year and the relationship between lettuce growth and the climate environment, and predicts lettuce leaf area and yield using a nonlinear Logistic regression model and three machine learning algorithms: random forest, support vector regression, and extreme gradient boosting. The appropriate model is selected based on the evaluation indices.

[0096] Specifically, for environmental data collection in the hydroponic lettuce greenhouse, an indoor data logger (HOBOH21-USB) was placed on each of the east and west sides of the planting area. This logger included a temperature and humidity sensor (HOBOS-THC-M00x), a photosynthetically active radiation sensor (HOBOS-LIA-M003), a total radiation sensor (HOBOS-LIB-M003), and a carbon dioxide sensor (EspecE2020255). The sensor installation locations are shown in Figure 3.

[0097] Specifically, the radiation heat product is calculated as follows:

[0098] The daily relative thermal effect (ERTE) is calculated using a piecewise function. The relative thermal effect and the radiative heat product (TEP, Accumulated Product of Thermal Effectiveness and PAR) are calculated using the following formulas:

[0099]

[0100] Where ERTE(T) represents the relative thermal effect at temperature T, Tb represents the lower limit temperature for growth, Tm represents the upper limit temperature for growth, and Tob and Tou represent the lower and upper limits of the optimal growth temperature, respectively. The values of Tb, Tm, Tob, and Tou are 8℃, 40℃, 25℃, and 30℃, respectively. The unit of the daily radiative heat product is MJ/m².

[0101]

[0102] Where TTEP is the radiative heat product per ten minutes, ERTE is the relative thermal effect, PAR is the average photosynthetically active radiation (W / m2) within ten minutes, and DTEP is the radiative heat product per day.

[0103] The calculation process is implemented using Python statements. First, the Pandas library is imported; temperature data from the directory is read and a list is created to store the results; then, nested if statements are used to perform piecewise function calculations; finally, each result is appended to the list, a new data frame is created, and the data is saved.
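The calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the invention: the formula images are not reproduced in this text, so the piecewise-linear shape of ERTE(T), the conversion of ten-minute average PAR (W/m²) to energy by multiplying by 600 s, and the function names are all assumptions. The four cardinal temperatures are assumed to map to Tb, Tm, Tob, and Tou in the order quoted in the text. Plain Python lists stand in for the Pandas data frame to keep the sketch dependency-free.

```python
# Cardinal temperatures (deg C), assumed order: Tb, Tm, Tob, Tou.
TB, TM, TOB, TOU = 8.0, 40.0, 25.0, 30.0

def relative_thermal_effect(t, tb=TB, tm=TM, tob=TOB, tou=TOU):
    """Piecewise-linear relative thermal effect in [0, 1] (assumed form)."""
    if t < tb or t > tm:
        return 0.0                      # outside the developmental range
    if t < tob:
        return (t - tb) / (tob - tb)    # rising branch below the optimum
    if t <= tou:
        return 1.0                      # inside the optimal range
    return (tm - t) / (tm - tou)        # falling branch above the optimum

def daily_tep(temps_c, par_w_m2, interval_s=600):
    """Sum the ten-minute products ERTE * PAR * seconds and return MJ/m2."""
    total_j = 0.0
    for t, par in zip(temps_c, par_w_m2):
        total_j += relative_thermal_effect(t) * par * interval_s
    return total_j / 1e6
```

For example, two ten-minute records at 27℃ with PAR of 100 and 200 W/m² would contribute 0.18 MJ/m² under these assumptions.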

[0104] Method for calculating effective accumulated temperature:

[0105] HU = d * (T - Tb)

[0106] HU represents the effective accumulated temperature (°C·d: degree-day); T represents the daily average temperature; Tb represents the lower limit temperature for development; and d represents the number of growth days.
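As a small worked illustration of the formula above, the first function below evaluates HU exactly as written. The second function is a hypothetical variant that accumulates daily contributions from a series of daily mean temperatures; clamping negative contributions to zero is an assumption and is not stated in the text.

```python
def heat_units(t_mean, t_base, days):
    """HU = d * (T - Tb), as written in the text (degree-days)."""
    return days * (t_mean - t_base)

def cumulative_heat_units(daily_means, t_base):
    """Running total of daily (T - Tb); the clamp at zero is an assumption."""
    total, series = 0.0, []
    for t in daily_means:
        total += max(t - t_base, 0.0)
        series.append(total)
    return series
```

With T = 20℃, Tb = 8℃, and d = 30 days, `heat_units(20, 8, 30)` gives 360 °C·d.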

[0107] Methods for calculating the photosynthetically active radiation product:

[0108]

[0109] Where Q is the average total solar radiation over ten minutes (W / m2), PAR is the average photosynthetically active radiation over ten minutes (W / m2), PAR(t) is the photosynthetically active radiation per ten minutes (J / m2), and DPARA is the daily accumulation of photosynthetically active radiation (MJ / m2).
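The unit chain in the definitions above (W/m² averaged over ten minutes, J/m² per ten minutes, MJ/m² per day) can be sketched as follows. The formula itself is not reproduced in this text, so this is only an assumed reading: `par_fraction`, the share of total solar radiation counted as PAR, is a hypothetical parameter whose value the text does not give, and the 600 s interval follows from the ten-minute averaging.

```python
def daily_par_accumulation(q_w_m2, par_fraction, interval_s=600):
    """Return the daily PAR accumulation in MJ/m2 (assumed unit chain)."""
    total_j = 0.0
    for q in q_w_m2:
        par = par_fraction * q          # average PAR over the interval, W/m2
        total_j += par * interval_s     # energy in the interval, J/m2
    return total_j / 1e6                # J/m2 -> MJ/m2
```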

[0110] Growth index measurements include:

[0111] Fresh weight: The above-ground and underground parts of the lettuce plant are separated, and the fresh weight is measured.

[0112] Dry weight: The above-ground and underground parts of lettuce are blanched in an oven at 105℃ for 20 minutes, then dried at 72℃ until constant weight, and the dry weight is measured.

[0113] Leaf count: Record the number of leaves.

[0114] Leaf area: The leaf area was measured by scanning with a leaf tile scanner (Epson DS-5000) and analyzing the results with leaf area measurement software (Leaf1000).

[0115] Plant height: Measure the height from the base of the lettuce stem to the canopy.

[0116] Crown width: Measure the distance the lettuce crown spreads out horizontally in its natural state.

[0117] The nonlinear regression yield model uses the nonlinear Logistic growth function, with the photosynthetically active radiation product, effective accumulated temperature, and radiative heat product as input variables and the aboveground fresh weight as the output value.

[0118] The Logistic function is calculated using the following formula:

[0119]

[0120] Where Y is the aboveground fresh weight, b and k are constants, A is the maximum aboveground fresh weight, and t is the number of growing days.
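For orientation, a commonly used three-parameter form of the Logistic growth function that matches the symbols defined above is Y = A / (1 + b·e^(−kt)). The formula image is not reproduced in this text, so this exact form is an assumption, and the sketch below only illustrates its behavior.

```python
import math

def logistic_growth(t, a_max, b, k):
    """Y = A / (1 + b * exp(-k * t)); assumed three-parameter Logistic form."""
    return a_max / (1.0 + b * math.exp(-k * t))
```

Under this form, Y rises monotonically with t and approaches the maximum aboveground fresh weight A for large t.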

[0121] Machine learning can capture the relationships between various feature variables. By training on an existing dataset, it can predict points outside the dataset, demonstrating a certain degree of learning capability. Compared to nonlinear regression models, machine learning models can analyze more relationships between feature variables, provide better interpretability of the output results, and achieve better model fit.

[0122] This invention selects three machine learning regression algorithms that are widely used in agricultural research, namely Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost), to model lettuce yield, leaf area, etc.

[0123] Random Forest (RF) improves the accuracy and stability of predictions by assembling multiple CART regression decision trees. As shown in Figure 4, each decision tree is built by randomly sampling the dataset (drawing multiple subsamples) and randomly selecting the splitting features (used to split the tree nodes). For regression problems, the squared error of every possible splitting point is calculated for each feature, and the splitting point that minimizes the sum of the squared errors of the two child nodes after the split is selected as the optimal splitting point. The node splitting criterion is as follows:

[0124]

[0125] Where K represents the number of samples, yi represents the actual value, and ŷi represents the predicted value.
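The splitting rule described above can be illustrated for a single feature as follows. This is a minimal sketch under the assumption that each child node predicts the mean of its samples, as in a standard CART regression tree; the function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean of a node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) minimizing the children's summed SSE."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best
```

A full tree would repeat this search over a random subset of features at every node.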

[0126] The training subset for each tree is drawn at random with replacement, and each decision tree produces one result. The final result of the random forest is the average of the predictions from multiple decision trees, as shown in the following formula:

[0127] $Y = \frac{1}{K}\sum_{k=1}^{K} h_k(x)$

[0128] Where Y represents the model prediction, K represents the number of decision regression trees, and h(x) represents the result of each decision tree.

[0129] Support Vector Machine (SVM) is a supervised learning algorithm for classification and regression that finds the optimal hyperplane by maximizing the margin. As shown in Figure 5, the SVR model can be understood as creating a "gap" on both sides of a linear function. Values inside the dashed lines are treated as correct predictions. The goal is to have as many samples as possible fall within the gap while minimizing the loss; that is, the samples should lie as far as possible within the band bounded by f(x) − ε and f(x) + ε, where ε is a hyperparameter of SVR.

[0130] The formula for calculating a linear function is:

[0131] f(x) = ωᵀx + b

[0132] Where f(x) represents the predicted value (aboveground fresh weight or leaf area), ωᵀ is the transpose of the weight vector ω, x is the feature vector of the sample, and b is the bias term.
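The linear function and the ε-band described above can be sketched as follows. The ε-insensitive loss shown here is the standard SVR loss and is named as such; the text itself does not write it out, and the function names are hypothetical.

```python
def linear_predict(weights, x, bias):
    """f(x) = w^T x + b for plain Python sequences."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the band |y - f(x)| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

A sample whose prediction error is smaller than ε contributes no loss, which is why values inside the dashed lines count as correct predictions.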

[0133] Extreme Gradient Boosting (XGBoost) is a decision tree-based ensemble optimization algorithm, as shown in Figure 6. Its main approach is to increase the complexity of the model gradually, fitting the residuals at each step to optimize the prediction performance. In other words, the decision trees that make up XGBoost are built sequentially, in a specific order.

[0134] In the XGBoost algorithm, the objective function determines the model's training process and typically consists of two parts: a loss function and a regularization term. The regularization term controls the model's complexity, and its formula is:

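[0135] $Obj^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$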

[0136] Where ŷi is the predicted value of the i-th sample, fk is the prediction function of the k-th tree, l(yi, ŷi(t)) represents the loss function, Ω(f) represents the regularization term for a single tree, γ and λ are regularization parameters, T is the number of leaf nodes, and ω is the weight of the leaf nodes.
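As a small numeric illustration of the regularization term, the sketch below evaluates the standard XGBoost penalty Ω(f) = γT + ½λΣω² for one tree. The formula image is not reproduced in this text, so this form is assumed from the symbols defined above; the function name is hypothetical.

```python
def tree_regularization(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2) for a single tree."""
    n_leaves = len(leaf_weights)                      # T, number of leaf nodes
    l2 = sum(w * w for w in leaf_weights)             # sum of squared leaf weights
    return gamma * n_leaves + 0.5 * lam * l2
```

A tree with more leaves or larger leaf weights receives a larger penalty, which is how the term limits model complexity.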

[0137] XGBoost uses the gradient boosting algorithm to generate weak learners. For a training set, it first trains a decision tree and computes a residual for each sample. These residuals are then used as the new training targets, and another decision tree is trained on them. This process is repeated until a stopping condition is met. Therefore, the prediction after each tree is the sum of the predictions of all preceding trees plus that of the current tree. The formula for the predicted value is:

[0138] $\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$

[0139] Where xi is the feature vector of the i-th sample, ŷi(t) is the predicted value of the i-th sample (aboveground fresh weight or leaf area) at the t-th iteration, ŷi(t-1) is the sum of the predicted values of the first t-1 trees, and ft is the prediction function of the t-th tree.
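The residual-fitting loop described above can be sketched with one-split regression stumps on a single feature. This is plain gradient boosting for squared error, not XGBoost itself: it omits the regularization term and second-order information, starts from the mean of the targets (an assumption), and requires at least two distinct feature values. All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def boost(xs, ys, n_trees, learning_rate):
    """Add trees one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # y_hat(t) = y_hat(t-1) + learning_rate * f_t(x)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return base, trees, preds
```

With a learning rate below 1, each tree corrects only part of the residual, so more trees are needed to reach the same fit.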

[0140] The experimental dataset consisted of lettuce growth and environmental data from April 2023 to April 2024. During the planting process, the conductivity and pH of the nutrient solution fluctuated slightly within a certain range and were relatively stable; therefore, they were treated as constants in the modeling process. Environmental parameters such as the photosynthetically active radiation product, carbon dioxide concentration, daily average air temperature, effective accumulated temperature, and radiative heat product directly affected plant growth during planting and had their own characteristics; therefore, they were directly used as input variables. Growth indicators such as crown width, plant height, number of leaves, leaf area, number of days after transplanting, and above-ground fresh weight reflected the overall growth of the plant and were also used as input variables. The final output variables were above-ground fresh weight and leaf area, respectively. Twenty-two batches of lettuce were planted in the experiment; 20 batches were used as the training set, 1 batch as the validation set, and 1 batch as the test set. The datasets for the lettuce yield and leaf area models each contained 1870 data points.

[0141] The model training flowchart is shown in Figure 7. First, the raw data is preprocessed to remove outliers and null values. To ensure the repeatability of the model, Random_State is set to values in the range 100-600 in steps of 100, which defines the sampling rule and yields six different training and test sets. The same Random_State value always results in the same sampling batch.
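The seeding rule above can be illustrated as follows. The text does not say which splitting routine was used, so this sketch simply shuffles the 22 batch numbers with Python's standard `random.Random(seed)` and takes 20/1/1 batches for training, validation, and testing; the function name and the batch numbering are assumptions.

```python
import random

def split_batches(batch_ids, seed, n_val=1, n_test=1):
    """Shuffle batch ids reproducibly and split into train/val/test."""
    rng = random.Random(seed)        # same seed -> same shuffle
    shuffled = list(batch_ids)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Six random states: 100, 200, ..., 600.
seeds = list(range(100, 601, 100))
splits = {seed: split_batches(range(1, 23), seed) for seed in seeds}
```

Calling `split_batches` twice with the same seed returns the same partition, which is the repeatability property the text relies on.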

[0142] The training set segmentation flowchart is shown in Figure 8. Using 10-fold cross-validation, the training data is divided into 10 folds; in each round, 9 folds are used for training and 1 fold for validation. This is used to evaluate the performance of different hyperparameter combinations, ensuring the generalization ability of the model and helping to avoid overfitting or underfitting.
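A minimal sketch of how 10-fold index sets can be generated is shown below. It uses contiguous, unshuffled folds for clarity; whether the original procedure shuffles the samples before folding is not stated, so that choice is an assumption.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

Each sample appears in the validation fold exactly once across the ten rounds.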

[0143] After dividing the training, validation, and test sets, a grid search is performed over the different parameter combinations. All combinations are evaluated on the validation set, and the combination that performs best is identified as the optimal model parameter combination. Finally, each model with its optimal parameter combination is applied to the test set to obtain the final performance result of the model.
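The grid search step can be sketched generically as follows. The scoring callable stands in for "train the model with these parameters and return its validation error"; it is a hypothetical placeholder, and lower scores are assumed to be better (for example, RMSE).

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) for the lowest score_fn value."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)     # e.g. validation RMSE for these settings
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The number of evaluations is the product of the list lengths in the grid, which is why only a few hyperparameters are usually searched.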

[0144] The hyperparameters in a random forest are the depth of each decision tree (max_depth), the maximum number of features used per decision tree (max_features), the minimum number of samples required to split an internal node (min_samples_split), and the number of trees (n_estimators). The maximum depth of the decision trees is generally not limited when the sample size or number of features is small. The number of trees is the number of weak learners; a larger value generally gives better accuracy, but excessively large values offer limited improvement and waste considerable computational resources.

[0145] In the SVR model, after comparing the performance results of using three kernel functions, the poly kernel was chosen as the model kernel function. The hyperparameters are the penalty coefficient C and the constant value coef0 in the kernel function, with all other parameters set to default values. A larger penalty coefficient indicates a higher level of focus on the total error during optimization and a greater requirement for error reduction. The constant value in the kernel function is only used when the kernel function is poly or sigmoid.

[0146] In extreme gradient boosting, the hyperparameters are the learning rate and the maximum depth of the tree. The learning rate is a value in the range (0, 1) by which XGBoost multiplies the output of each weak learner (tree) fitted to the residuals, in order to prevent overfitting. With a small learning rate, more weak learners are needed to make up for the residual that each step leaves unfitted. Setting the maximum depth of the tree (weak learner) reasonably can also prevent overfitting.

[0147] At the beginning of each model's development, the hyperparameters of each model's algorithm must first be determined. The best-performing model parameters are obtained through a grid search method. Generally, only the hyperparameters are selected, while the remaining parameters are left at default values. The preset hyperparameter combinations for the three machine learning models are shown in Table 1.

[0148]

[0149] Table 1 Model Hyperparameter Combinations

[0150] After establishing each model, the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) are used to evaluate the models comprehensively. Smaller MSE, RMSE, and MAE values indicate smaller model errors and better predictive performance. R² expresses the linear agreement between predicted and measured values: the closer R² is to 1, the closer the predicted values are to the measured values. The parameter values are selected based on the best result. The evaluation index formulas are as follows:

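[0151] $MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2, \quad RMSE = \sqrt{MSE}, \quad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|, \quad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$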

[0152] Where ŷi is the predicted value, yi is the true value, n is the sample size, and ȳ is the mean of all yi.
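The four evaluation indices can be computed directly from their standard definitions, as in the following sketch. The function name is hypothetical, and the code assumes the true values are not all identical (otherwise R² is undefined).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                        # 1 - SS_res / SS_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}
```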

[0153] Using the photosynthetically active radiation product, effective accumulated temperature, and radiative heat product during the lettuce's growth period as inputs, and the aboveground fresh weight as the output, an initial dataset was obtained. After removing outliers, the initial dataset was used to calculate the parameters of the Logistic model using the SPSS nonlinear regression algorithm. The calculation model is as follows:

[0154]

[0155] Where Y is the aboveground fresh weight, and t represents the photosynthetically active radiation product, effective accumulated temperature, or radiative heat product, respectively.

[0156] Based on the above calculation model, the yield prediction model for lettuce was evaluated. The results are shown in Table 2. The effective accumulated temperature-above-ground fresh weight model had the highest R² of 0.832 and the lowest RMSE of 18.693g, showing the best overall performance. The model comparison results indicate that when effective accumulated temperature is used as an input variable, the predicted value is more accurate, and the effect of effective accumulated temperature on lettuce growth is relatively stable.

[0157]

[0158] Table 2 Evaluation of three models for the fresh weight of aboveground parts of lettuce

[0159] Table 3 shows that in the validation set of lettuce aboveground fresh weight, the random_state setting of 600 performed best, while the setting of 400 performed worst. Therefore, the random_state can be determined to be 600. At this setting, the hyperparameters are set as max_depth=6, max_features=0.5, min_samples_split=4, and n_estimators=50. The evaluation metrics for the validation set are RMSE=4.363g and R²=0.991.

[0160]

[0161] Table 3. Parameter settings and evaluation indicators for the random forest lettuce yield model.

[0162] Table 4 shows that the random_state setting of 600 performed best in the validation set of lettuce aboveground fresh weight, while the performance was worst at 500. Therefore, the random_state setting can be determined to be 600, with the hyperparameters set to C=20 and coef0=7. The evaluation metrics for the validation set were RMSE 2.165g and R² 0.998, indicating that the support vector regression (SVR) model fits the lettuce yield very well.

[0163]

[0164] Table 4. Parameter settings and evaluation indicators for the support vector regression model of lettuce yield.

[0165]

[0166] Table 5. Parameter settings and evaluation indicators for the extreme gradient boosting lettuce yield model.

[0167] Table 5 shows that the evaluation performance on the lettuce validation set varied greatly. The best performance across all evaluation metrics was achieved with a random_state setting of 600, while the worst performance was achieved with a random_state setting of 400. Therefore, the random_state value was set to 600, with the hyperparameters set as follows: Gamma = 0.7, Learning_rate = 0.1, max_depth = 12, min_child_weight = 3, and subsample = 0.6. The RMSE of the validation set was 4.509 g, and R² was 0.991.

[0168] Lower mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) indicate lower error and better model prediction accuracy; a higher R² indicates that the predicted values are closer to the measured values and that the model fits better. Table 6 shows that all three lettuce yield prediction models performed well, with RMSE and MAE both below 5 g and R² values all exceeding 0.99. This indicates that all three models can be used for lettuce yield prediction with good results.

[0169]

[0170] Table 6 Evaluation Indicators for Lettuce Yield by Model

[0171] Machine learning, as a branch of artificial intelligence, can handle the complex relationship between climate change and crop growth and development, and has great potential for establishing smart agricultural systems. The machine learning regression models used in this invention, such as Logistic Regression, Random Forest, Support Vector Regression, and Extreme Gradient Boosting, are all black-box models. Black-box models do not require knowledge of complex physiological processes; they can obtain prediction results by parameterizing the growth process and influencing factors. In practical applications, they are convenient, fast, simple, and practical.

[0172] Crop yield models, developed using various methods, are of great significance to agricultural production. By collecting and analyzing historical yield data and various factors affecting crop growth (such as climate, soil conditions, fertilization, and pests and diseases), they can predict future lettuce yields. This helps managers make production plans, allocate resources rationally, and decide when to plant, when to harvest, and how to manage crops, thereby improving production efficiency and crop quality. Yield models can also provide insights into market supply. By understanding expected yields, market participants can better predict changes in crop supply and adjust procurement plans, inventory management, and pricing strategies accordingly. This helps reduce losses caused by supply-demand imbalances, optimize supply chain efficiency, and improve market stability.

[0173] As shown in Figure 2, the present invention provides a machine learning-based device for predicting the yield of hydroponically grown lettuce, the device comprising:

[0174] The first processing module is used to determine the number of hydroponic lettuce plants in each growth stage and the set of growth indicators of the plants in each growth stage in the hydroponic lettuce in the greenhouse to be predicted, based on the lettuce images in the current growth stage of the hydroponic lettuce in the greenhouse to be predicted.

[0175] The second processing module is used to obtain the environmental heat index value of the greenhouse to be predicted during the current growth period. The environmental heat index value includes the daily photosynthetically active radiation product, the daily effective accumulated temperature, and the daily radiative heat product.

[0176] The third processing module is used to input the growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value in the greenhouse to be predicted into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. The lettuce yield model includes a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boost lettuce yield model.

[0177] The fourth processing module is used to obtain the final predicted yield of hydroponic lettuce in the greenhouse to be predicted based on the predicted yield values ​​of lettuce at all growth stages.

[0178] In some embodiments, the growth stages of hydroponic lettuce include germination, seedling, and growth stages;

[0179] The third processing module is specifically used to input the first growth index subset, the number of plants in the germination stage, and the environmental heat index value in the current growth stage into the random forest lettuce yield model when the hydroponic lettuce is in the germination stage, to obtain the first yield prediction result. The first growth index subset is extracted from the growth index set of hydroponic lettuce in the greenhouse to be predicted.

[0180] When the hydroponic lettuce is in the seedling stage, the second growth index subset, the number of plants in the seedling stage, and the environmental heat index value in the current growth stage are input into the support vector machine regression lettuce yield model to obtain the second yield prediction result. The second growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0181] When the hydroponic lettuce is in the growth stage, the third growth index subset and the environmental heat index value in the current growth stage are input into the extreme gradient boosting lettuce yield model to obtain the third yield prediction result. The third growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted.

[0182] The random forest lettuce yield model is trained based on a first sample growth index subset and a first sample predicted value; the support vector machine regression lettuce yield model is trained based on a second sample growth index subset and a second sample predicted value; and the extreme gradient boost lettuce yield model is trained based on a third sample growth index subset and a third sample predicted value.

[0183] In some embodiments, the fourth processing module is specifically used to obtain the final predicted yield of hydroponic lettuce in the greenhouse to be predicted based on the first yield prediction result, the second yield prediction result, and the third yield prediction result.

[0184] In some embodiments, the first processing module is specifically used to divide the hydroponic lettuce image at the current growth stage according to a preset growth stage division rule to obtain a first sub-lettuce image, a second sub-lettuce image, and a third sub-lettuce image, wherein the first sub-lettuce image is a lettuce image in the germination stage, the second sub-lettuce image is a lettuce image in the seedling stage, and the third sub-lettuce image is a lettuce image in the growth stage.

[0185] Extract a first set of growth indicators for lettuce in the germination stage from the first lettuce image, extract a second set of growth indicators for lettuce in the seedling stage from the second lettuce image, and extract a third set of growth indicators for lettuce in the growth stage from the third lettuce image.

[0186] The first subset of growth indicators includes plant height and number of leaves, the second subset of growth indicators includes the number of leaves and leaf area index characteristics, and the third subset of growth indicators includes the crown width and leaf area index characteristics.
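The stage-wise routing described in these embodiments can be sketched as a lookup from growth stage to model. The text does not state how the three stage predictions are combined into the final value, so summing them is an assumption; the stub models, feature names, and numbers below are placeholders for illustration only, not trained models or measured data.

```python
def predict_total_yield(stage_inputs, stage_models):
    """Route each stage's features to its model; sum the results (assumed)."""
    per_stage = {}
    for stage, features in stage_inputs.items():
        per_stage[stage] = stage_models[stage](features)
    return per_stage, sum(per_stage.values())

# Placeholder callables standing in for the trained RF, SVR, and XGBoost models.
stub_models = {
    "germination": lambda f: 2.0 * f["plant_count"],
    "seedling": lambda f: 5.0 * f["plant_count"],
    "growth": lambda f: 10.0 * f["plant_count"],
}
stub_inputs = {
    "germination": {"plant_count": 3},
    "seedling": {"plant_count": 2},
    "growth": {"plant_count": 1},
}
per_stage, total = predict_total_yield(stub_inputs, stub_models)
```

In a real system each feature dictionary would also carry the stage's growth index subset and the environmental heat index values.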

[0187] In some embodiments, a pre-validation module is also included, used to establish a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boosting lettuce yield model, wherein the random forest lettuce yield model is based on a random forest model, the support vector machine regression lettuce yield model is based on a support vector regression model, and the extreme gradient boosting lettuce yield model is based on an extreme gradient boosting model.

[0188] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the random forest lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the random forest lettuce yield model.

[0189] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the support vector machine regression lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the support vector machine regression lettuce yield model.

[0190] The first sample growth index subset, the second sample growth index subset, and the third sample growth index subset are respectively input into the extreme gradient boosting lettuce yield model to obtain the first predicted value, the second predicted value, and the third predicted value of the extreme gradient boosting lettuce yield model.

[0191] The first evaluation index values are calculated by comparing the first predicted value of the random forest lettuce yield model, the first predicted value of the support vector machine regression lettuce yield model, and the first predicted value of the extreme gradient boosting lettuce yield model, respectively, with the actual sample yield value. Based on the first evaluation index values, the prediction model corresponding to the first growth index subset is determined to be the random forest lettuce yield model.

[0192] The second evaluation index values are calculated by comparing the second predicted value of the random forest lettuce yield model, the second predicted value of the support vector machine regression lettuce yield model, and the second predicted value of the extreme gradient boosting lettuce yield model, respectively, with the actual sample yield value. Based on the second evaluation index values, the prediction model corresponding to the second growth index subset is determined to be the support vector machine regression lettuce yield model.

[0193] The third evaluation index values are calculated by comparing the third predicted value of the random forest lettuce yield model, the third predicted value of the support vector machine regression lettuce yield model, and the third predicted value of the extreme gradient boosting lettuce yield model, respectively, with the actual sample yield value. Based on the third evaluation index values, the prediction model corresponding to the third growth index subset is determined to be the extreme gradient boosting lettuce yield model.
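The selection step can be sketched as picking, for a given growth index subset, the model with the best evaluation index. Using the lowest RMSE as the criterion is an assumption, since the text does not say how the indices are weighed. The example values are the whole-dataset validation RMSEs reported earlier in the description (4.363 g, 2.165 g, and 4.509 g), not per-stage results.

```python
def select_model(rmse_by_model):
    """Return the model name with the lowest RMSE (assumed criterion)."""
    return min(rmse_by_model, key=rmse_by_model.get)

# Validation RMSE values (g) quoted in the description for the three models.
validation_rmse = {"RF": 4.363, "SVR": 2.165, "XGBoost": 4.509}
chosen = select_model(validation_rmse)
```

The same function would be applied once per growth index subset, each time with that subset's own evaluation index values.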

[0194] In some embodiments, the first processing module is further configured to input the following parameters into the function: the lower limit temperature Tb for plant development, the upper limit temperature Tm for plant development, the lower limit temperature Tob for optimal growth, and the upper limit temperature Tou for optimal growth:

[0195]

[0196] The relative thermal effect ERTE(T) at the current average temperature T is obtained.

[0197] The daily radiative heat product is obtained based on the relative thermal effect ERTE(T) of the current average temperature T and the average daily photosynthetically active radiation during the current growth period.

[0198] The effective accumulated temperature for each day is obtained based on the current average temperature T, the lower limit temperature for development, and the preset number of growth days.

[0199] According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed in a computer, causes the computer to perform the method described in conjunction with Figure 1.

[0200] According to another embodiment, a computing device is also provided, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements the method described in conjunction with Figure 1.

[0201] Those skilled in the art will recognize that, in one or more of the examples above, the functions described in this application can be implemented using hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

[0202] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above description is only a specific embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solution of this application should be included within the scope of protection of this application.

Claims

1. A method for predicting the yield of hydroponically grown lettuce based on machine learning, characterized in that, The method includes: Based on images of hydroponic lettuce in the greenhouse to be predicted during its current growth period, the number of plants in each growth stage and the set of growth indicators for each growth stage in the hydroponic lettuce in the greenhouse to be predicted are determined. Obtain the environmental heat index values ​​of the greenhouse to be predicted during the current growth period, the environmental heat index values ​​including daily photosynthetically active radiation product, daily effective accumulated temperature and daily radiative heat product; The growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value of the greenhouse to be predicted are respectively input into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. The lettuce yield model includes random forest lettuce yield model, support vector machine regression lettuce yield model, and extreme gradient boosting lettuce yield model. Based on the predicted yield values ​​of lettuce at all growth stages, the final predicted yield value of hydroponic lettuce in the greenhouse to be predicted is obtained. The growth stages of hydroponically grown lettuce include germination, seedling, and growth stages; The process involves inputting the growth index set of plants at each growth stage, the number of plants at each growth stage, and the environmental heat index value of the greenhouse to be predicted into the corresponding lettuce yield model for each growth stage to obtain the predicted lettuce yield value for each growth stage. 
Specifically, this includes: When the hydroponic lettuce is in the germination stage, the first growth index subset, the number of plants in the germination stage, and the environmental heat index value in the current growth stage are input into the random forest lettuce yield model to obtain the first yield prediction result. The first growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted. When the hydroponic lettuce is in the seedling stage, the second growth index subset, the number of plants in the seedling stage, and the environmental heat index value in the current growth stage are input into the support vector machine regression lettuce yield model to obtain the second yield prediction result. The second growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted. When the hydroponic lettuce is in the growth stage, the third growth index subset and the environmental heat index value in the current growth stage are input into the extreme gradient boosting lettuce yield model to obtain the third yield prediction result. The third growth index subset is extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted. The random forest lettuce yield model is trained based on a first sample growth index subset and a first sample predicted value; the support vector machine regression lettuce yield model is trained based on a second sample growth index subset and a second sample predicted value; and the extreme gradient boosting lettuce yield model is trained based on a third sample growth index subset and a third sample predicted value.

2. The method according to claim 1, characterized in that obtaining the final predicted yield value of the hydroponic lettuce in the greenhouse to be predicted based on the predicted lettuce yield values of all growth stages specifically comprises:
obtaining the final predicted yield value of the hydroponic lettuce in the greenhouse to be predicted based on the first yield prediction result, the second yield prediction result, and the third yield prediction result.
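
Claim 2 does not state how the three stage results are combined. The Python sketch below assumes the simplest option, a plain sum of the per-stage predictions; the function name `final_yield` and the stage keys are hypothetical.

```python
from typing import Mapping

REQUIRED_STAGES = ("germination", "seedling", "growth")


def final_yield(stage_predictions: Mapping[str, float]) -> float:
    """Combine per-stage predictions into one greenhouse-level value.

    Assumption: the final value is the sum of the three stage predictions.
    The claim only says the final value is "based on" the three results.
    """
    missing = [s for s in REQUIRED_STAGES if s not in stage_predictions]
    if missing:
        raise KeyError(f"missing stage predictions: {missing}")
    return float(sum(stage_predictions[s] for s in REQUIRED_STAGES))
```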

3. The method according to claim 2, characterized in that determining, based on the lettuce images of the hydroponic lettuce in the greenhouse to be predicted during the current growth period, the number of plants in each growth stage and the growth index set of the plants in each growth stage specifically comprises:
dividing the lettuce images of the current growth period according to a preset growth stage division rule to obtain a first sub-lettuce image, a second sub-lettuce image, and a third sub-lettuce image, the first sub-lettuce image being a lettuce image of the germination stage, the second sub-lettuce image being a lettuce image of the seedling stage, and the third sub-lettuce image being a lettuce image of the growth stage;
extracting a first growth index subset of the lettuce in the germination stage from the first sub-lettuce image, extracting a second growth index subset of the lettuce in the seedling stage from the second sub-lettuce image, and extracting a third growth index subset of the lettuce in the growth stage from the third sub-lettuce image;
wherein the first growth index subset includes plant height and number of leaves, the second growth index subset includes number of leaves and leaf area index features, and the third growth index subset includes crown width and leaf area index features.
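
The per-stage subsets named in claim 3 can be written down as a small lookup table. In the Python sketch below, the key names (`plant_height`, `leaf_count`, `leaf_area_index`, `crown_width`) and the function `growth_index_subset` are illustrative assumptions; the image analysis that would produce these measurements is out of scope here.

```python
from typing import Dict, List, Mapping, Tuple

# Growth index subset used at each stage, following claim 3.
STAGE_FEATURES: Dict[str, Tuple[str, ...]] = {
    "germination": ("plant_height", "leaf_count"),
    "seedling": ("leaf_count", "leaf_area_index"),
    "growth": ("crown_width", "leaf_area_index"),
}


def growth_index_subset(stage: str, indicators: Mapping[str, float]) -> List[float]:
    """Pick the stage-specific subset out of a full set of measured indicators."""
    names = STAGE_FEATURES[stage]
    missing = [n for n in names if n not in indicators]
    if missing:
        raise KeyError(f"indicators missing for stage {stage!r}: {missing}")
    return [float(indicators[n]) for n in names]
```

Because each stage passes only two growth indicators to its model, the input dimension stays small, which is consistent with the reduced computing cost the patent describes.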

4. The method according to claim 3, characterized in that the method further comprises:
establishing a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boosting lettuce yield model, wherein the random forest lettuce yield model is based on a random forest model, the support vector machine regression lettuce yield model is based on a support vector regression model, and the extreme gradient boosting lettuce yield model is based on an extreme gradient boosting model;
inputting the first sample growth index subset, the second sample growth index subset, and the third sample growth index subset respectively into the random forest lettuce yield model to obtain a first predicted value, a second predicted value, and a third predicted value of the random forest lettuce yield model;
inputting the first sample growth index subset, the second sample growth index subset, and the third sample growth index subset respectively into the support vector machine regression lettuce yield model to obtain a first predicted value, a second predicted value, and a third predicted value of the support vector machine regression lettuce yield model;
inputting the first sample growth index subset, the second sample growth index subset, and the third sample growth index subset respectively into the extreme gradient boosting lettuce yield model to obtain a first predicted value, a second predicted value, and a third predicted value of the extreme gradient boosting lettuce yield model;
calculating first evaluation index values between the actual sample yield value and, respectively, the first predicted value of the random forest lettuce yield model, the first predicted value of the support vector machine regression lettuce yield model, and the first predicted value of the extreme gradient boosting lettuce yield model, and determining, based on the first evaluation index values, that the prediction model corresponding to the first growth index subset is the random forest lettuce yield model;
calculating second evaluation index values between the actual sample yield value and, respectively, the second predicted value of the random forest lettuce yield model, the second predicted value of the support vector machine regression lettuce yield model, and the second predicted value of the extreme gradient boosting lettuce yield model, and determining, based on the second evaluation index values, that the prediction model corresponding to the second growth index subset is the support vector machine regression lettuce yield model;
calculating third evaluation index values between the actual sample yield value and, respectively, the third predicted value of the random forest lettuce yield model, the third predicted value of the support vector machine regression lettuce yield model, and the third predicted value of the extreme gradient boosting lettuce yield model, and determining, based on the third evaluation index values, that the prediction model corresponding to the third growth index subset is the extreme gradient boosting lettuce yield model.
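
The model selection in claim 4 amounts to scoring every candidate model on the same sample subset and keeping the best one. The claim does not name the evaluation index; the Python sketch below assumes root-mean-square error (lower is better), and the helper names `rmse` and `select_model` are hypothetical.

```python
import math
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predictions and actual sample yields."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def select_model(
    predictions_by_model: Dict[str, Sequence[float]],
    actual: Sequence[float],
) -> str:
    """Return the name of the model with the lowest RMSE on one sample subset.

    Assumption: RMSE is the evaluation index; the claim leaves it unspecified.
    """
    scores = {name: rmse(pred, actual) for name, pred in predictions_by_model.items()}
    return min(scores, key=scores.get)
```

Running `select_model` once per sample growth index subset yields one chosen model per stage, which is how the claim arrives at random forest for germination, support vector regression for the seedling stage, and extreme gradient boosting for the growth stage.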

5. The method according to claim 4, characterized in that the method further comprises:
inputting the lower limit temperature for plant development Tb, the upper limit temperature for plant development Tm, the lower limit temperature for optimal growth Tob, and the upper limit temperature for optimal growth Tou into the following formula to obtain the relative thermal effect ERTE(T) at the current average temperature T: [formula not reproduced in the source text];
obtaining the daily radiative heat product based on the relative thermal effect ERTE(T) at the current average temperature T and the average daily photosynthetically active radiation during the current growth period;
obtaining the daily effective accumulated temperature based on the current average temperature T, the lower limit temperature for plant development Tb, and a preset number of growth days.
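
The formula for ERTE(T) is not present in this text, so the following Python sketch should be read as an assumption rather than the claimed function. It uses a piecewise-linear relative thermal effectiveness form that is common in greenhouse crop models: zero outside [Tb, Tm], rising linearly from Tb to Tob, equal to one between Tob and Tou, and falling linearly from Tou to Tm. The daily radiative heat product is assumed to be ERTE(T) multiplied by the daily photosynthetically active radiation, and the effective accumulated temperature is assumed to be the sum of daily (T - Tb) with negative values clipped to zero. All function names are hypothetical.

```python
from typing import Sequence


def relative_thermal_effect(t: float, tb: float, tob: float, tou: float, tm: float) -> float:
    """Assumed piecewise-linear ERTE(T); requires tb <= tob <= tou <= tm."""
    if not (tb <= tob <= tou <= tm):
        raise ValueError("expected tb <= tob <= tou <= tm")
    if t < tb or t > tm:
        return 0.0
    if t < tob:
        return (t - tb) / (tob - tb)   # rising limb below the optimal range
    if t <= tou:
        return 1.0                     # inside the optimal range
    return (tm - t) / (tm - tou)       # falling limb above the optimal range


def daily_radiative_heat_product(
    t: float, par: float, tb: float, tob: float, tou: float, tm: float
) -> float:
    """Assumed form: ERTE(T) times the daily photosynthetically active radiation."""
    return relative_thermal_effect(t, tb, tob, tou, tm) * par


def effective_accumulated_temperature(daily_mean_temps: Sequence[float], tb: float) -> float:
    """Assumed form: sum over growth days of max(T - Tb, 0)."""
    return sum(max(t - tb, 0.0) for t in daily_mean_temps)
```

With illustrative (not source-provided) cardinal temperatures Tb = 5, Tob = 18, Tou = 22, and Tm = 30, a day with mean temperature 26 gives ERTE = 0.5 under this assumed form, so a daily PAR of 10 units contributes a radiative heat product of 5 units.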

6. A machine learning-based device for predicting the yield of hydroponic lettuce in a facility, characterized in that the device comprises:
a first processing module, configured to determine, based on lettuce images of the hydroponic lettuce in a greenhouse to be predicted during the current growth period, the number of plants in each growth stage and the growth index set of the plants in each growth stage;
a second processing module, configured to obtain environmental heat index values of the greenhouse to be predicted during the current growth period, the environmental heat index values including a daily photosynthetically active radiation product, a daily effective accumulated temperature, and a daily radiative heat product;
a third processing module, configured to input the growth index set of the plants in each growth stage, the number of plants in each growth stage, and the environmental heat index values of the greenhouse to be predicted into the lettuce yield model corresponding to each growth stage to obtain a predicted lettuce yield value for each growth stage, the lettuce yield models including a random forest lettuce yield model, a support vector machine regression lettuce yield model, and an extreme gradient boosting lettuce yield model; and
a fourth processing module, configured to obtain a final predicted yield value of the hydroponic lettuce in the greenhouse to be predicted based on the predicted lettuce yield values of all growth stages;
wherein the growth stages of the hydroponic lettuce include a germination stage, a seedling stage, and a growth stage;
wherein the third processing module is specifically configured to:
when the hydroponic lettuce is in the germination stage, input a first growth index subset, the number of plants in the germination stage, and the environmental heat index values of the current growth period into the random forest lettuce yield model to obtain a first yield prediction result, the first growth index subset being extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted;
when the hydroponic lettuce is in the seedling stage, input a second growth index subset, the number of plants in the seedling stage, and the environmental heat index values of the current growth period into the support vector machine regression lettuce yield model to obtain a second yield prediction result, the second growth index subset being extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted;
when the hydroponic lettuce is in the growth stage, input a third growth index subset and the environmental heat index values of the current growth period into the extreme gradient boosting lettuce yield model to obtain a third yield prediction result, the third growth index subset being extracted from the growth index set of the hydroponic lettuce in the greenhouse to be predicted;
wherein the random forest lettuce yield model is trained based on a first sample growth index subset and a first sample predicted value, the support vector machine regression lettuce yield model is trained based on a second sample growth index subset and a second sample predicted value, and the extreme gradient boosting lettuce yield model is trained based on a third sample growth index subset and a third sample predicted value.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the machine learning-based method for predicting the yield of hydroponic lettuce in a facility according to any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the machine learning-based method for predicting the yield of hydroponic lettuce in a facility according to any one of claims 1 to 5.