Prediction model of milk yield of dairy cows based on dry matter intake and effective nutrient composition

A dairy cow milk yield prediction model is built on dry matter intake and nutrient composition using the XGBoost algorithm and feature engineering. It addresses the nonlinear relationships that limit milk yield prediction, delivers high-precision predictions and accurate decision support for nutrient supply, and thereby improves the efficiency and economic returns of farm management.

CN122245628A · Pending · Publication Date: 2026-06-19 · NORTHWEST A & F UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
NORTHWEST A & F UNIV
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the existing technology, dairy cow milk production prediction models fail to effectively capture the complex nonlinear relationship between dry matter intake and nutrient parameters, and lack the integration of prediction models with the estimation of nutrient requirements and parameter optimization decisions in actual production, resulting in insufficient prediction accuracy and stability.

Method used

A prediction model for dairy cow milk production based on dry matter intake and effective nutrient components was constructed. Machine learning algorithms such as XGBoost were used, combined with feature engineering and data preprocessing, to build a high-precision prediction model. Nutrient requirements were estimated by controlling variables to provide precise feeding decision support.

Benefits of technology

It achieves high-precision prediction of dairy cow milk yield, with a coefficient of determination (R²) of 0.856. It is also integrated into the farm management system through web services to provide real-time, quantitative nutrient supply suggestions, supporting precision feeding and production optimization.

Generated by Eureka AI based on patent content.

Abstract figure: CN122245628A_ABST

Abstract

This invention relates to the field of smart animal husbandry and specifically to a dairy cow milk yield prediction model based on dry matter intake and effective nutrient composition. The model uses dry matter intake as the core input feature, combines it with body weight, days of lactation, parity, rumination time, and the daily intake of major nutrients, and applies an advanced random forest machine learning algorithm to build a solution that integrates accurate prediction, intelligent analysis, and visual decision support. The invention overcomes the limited accuracy of traditional linear models by quantifying the complex nonlinear relationships between multiple factors and milk yield, achieving high-precision milk yield prediction. By linking prediction with decision-making, it provides a complete technical solution for precision dairy farming, from data collection and intelligent prediction to nutritional optimization, which helps improve farm production efficiency and promotes the digital transformation of animal husbandry.

Description

Technical Field

[0001] This invention relates to the field of smart animal husbandry technology and, more specifically, to a model for predicting milk yield in dairy cows based on dry matter intake and effective nutrient components.

Background Technology

[0002] Dry matter intake is a core factor determining the nutritional intake and production performance of dairy cows. Accurate prediction of milk yield is crucial for optimizing feed formulation, achieving precision feeding, and improving farm economic efficiency. Traditional methods for predicting milk yield often rely on linear regression models in standards such as the NRC (2001) or the experience and judgment of farmers. These methods have significant drawbacks: linear models struggle to capture the complex nonlinear relationships between dry matter intake, nutritional parameters, and milk yield; experience-based judgments are highly subjective, lack quantitative evidence, and suffer from low accuracy and poor stability.

[0003] Machine learning has shown clear advantages in handling complex nonlinear problems. However, existing approaches that apply machine learning directly to milk yield prediction still have shortcomings: first, the models often do not fully account for the key role of dry matter intake as the core independent variable and its interactions with other nutritional factors; second, the selection of input features is insufficiently optimized, which limits prediction accuracy and generalization; and third, no complete technical solution effectively combines the prediction model with nutrient requirement estimation and parameter optimization decisions in actual production.

[0004] Therefore, developing a method that uses dry matter intake as a key input, employs advanced machine learning algorithms, and can accurately predict milk yield, further guiding the optimization of nutritional parameters, has become an urgent need in the field of precision dairy farming. Some studies have attempted to introduce machine learning algorithms into production performance prediction, but existing solutions still have many shortcomings when applied to milk yield prediction.

[0005] Prior art 1 related to this invention: a method and system for predicting dry matter intake in dairy cows, application number CN202410860401.8.

[0006] The technical solution of prior art 1 is as follows: it discloses a method for predicting dry matter intake (DMI) in dairy cows. It collects parameters such as days of lactation, milk yield, fat-corrected milk yield, rumination time, and the neutral detergent fiber content of the diet, combines at least two parameters, obtains multiple models through linear fitting, and selects the final DMI prediction model based on the best fit and the smallest standard deviation.

[0007] The shortcomings of the existing technology are as follows: First, its core algorithm relies on linear fitting, which is insufficient in capturing the complex nonlinear relationship that commonly exists between dry matter intake (DMI), nutritional parameters, and milk production, resulting in a ceiling on prediction accuracy. Second, the goal of this solution is to predict dry matter intake (DMI), not directly predict milk production. DMI is an intermediate variable, and ultimately, to serve production decisions (such as adjusting the formula to increase milk production), it is still necessary to establish the relationship between DMI and milk production. This patent does not solve this core problem. Third, its feature engineering is relatively simple and does not fully consider using DMI and the daily intake of various nutrients as key features, resulting in insufficient information dimensions in the model input.

[0008] Prior art 2 related to this invention: a method and apparatus for predicting the optimal branched-chain amino acid ratio in the diet of lactating sows, application number CN202510941414.2.

[0009] The technical solution of prior art 2: This invention relates to the field of information technology and discloses a method for predicting the proportion of branched-chain amino acids in sow diets. It obtains effective feature vectors by acquiring production data and formulation data, performing data processing and correlation calculations, and then uses these feature vectors to train and validate a preset model. Finally, the best prediction model is selected, and the predicted value of amino acid content is output.

[0010] The shortcomings of prior art 2: although this technical solution applies a machine learning model, its technical field and application goals differ fundamentally from those of this invention, as it focuses on the amino acid nutrition of sows. It does not address the specific problem of predicting dairy cow milk yield, nor does it recognize the core role of dry matter intake in predicting lactation performance in ruminants. Its solution therefore cannot be transferred directly to the specific challenges of dairy cow milk yield prediction, such as how to integrate multi-source heterogeneous data including DMI, nutrient intake, and the cow's physiological status (days of lactation, parity).

[0011] Prior art 3 related to this invention: an assessment model and method for the appropriate standardized ileal digestible lysine requirement of lactating sows fed low-protein diets, application number CN202411112024.6.

[0012] The technical solution of prior art 3: This invention discloses a model and method for assessing the lysine requirement of lactating sows, employing a piecewise linear model and a piecewise quadratic model. This model determines the appropriate lysine level by finding the inflection point (R) of the sensitive effect index.

[0013] The drawbacks of prior art 3: this technology employs a traditional parametric statistical model (a broken-line model), which relies heavily on prior assumptions about the model's form, lacks flexibility, and struggles to adapt to complex data distributions. This is particularly limiting for dairy cow lactation, which is a multi-factor, high-dimensional, and highly interactive system. Furthermore, the model is designed for sows, and its structure and parameters are unsuitable for predicting dairy cow milk yield.

[0014] In summary, the existing models (linear or piecewise linear models) are too simple to meet the needs of high-precision milk yield prediction. There is currently no complete technical solution specifically for dairy cows that uses dry matter intake as the key input and applies advanced machine learning algorithms to predict milk yield with high precision. A dairy cow milk yield prediction model based on dry matter intake and effective nutrient components is needed to solve these problems.

Summary of the Invention

[0015] The purpose of this invention is to provide a model for predicting dairy cow milk yield based on dry matter intake and effective nutrient components, and to solve the following core problems.

[0016] The invention addresses the following problems:
1. Traditional linear models and empirical methods cannot accurately describe the complex nonlinear relationships between dry matter intake, nutritional parameters, and milk yield in dairy cows; the invention enables high-precision milk yield prediction.
2. It provides a method for constructing a machine learning model dedicated to milk yield prediction that uses dry matter intake as the core input feature and integrates information on the cows' physiological state and nutrient intake.
3. To bridge the gap between prediction models and actual production, it provides a method for estimating, from the prediction model, the nutrient supply required to reach a target milk yield, giving direct data support and a decision basis for precision feeding.
4. It improves the practicality and convenience of the prediction method so that the model can be deployed as a software system for daily farm management.

[0017] To achieve the above objectives, the present invention provides the following technical solution: a dairy cow milk yield prediction model based on dry matter intake and effective nutrient components, comprising the following steps:
S1. Data preparation and preprocessing: historical production data of Holstein dairy cows are collected, including at least age in months, body weight, parity, days of lactation, body condition score, dry matter intake, milk yield, 4% fat-corrected milk yield, milk fat, lactose, and milk protein percentages and yields, together with the dietary intake of crude protein, crude fat, starch, and neutral detergent fiber, as characteristic variables, with milk yield as the target variable. The collected data are cleaned, outliers are handled, and the data are standardized.
S2. Feature engineering and dataset construction: the data are integrated; based on animal nutrition knowledge, the daily intake of each nutrient is calculated and introduced as a key feature; and a dataset for model training and validation is constructed.
S3. Prediction model training: six algorithms (linear regression, support vector regression, random forest regression, gradient boosting regression, extreme gradient boosting, and k-nearest neighbors) are trained on the dataset constructed in step S2, and their hyperparameters are optimized by grid search with cross-validation, yielding a trained machine learning model for milk yield prediction. A preliminary descriptive statistical analysis gives the sample count, mean, standard deviation, minimum, lower quartile, median, upper quartile, and maximum of every independent variable. Using animal nutrition knowledge and the 3σ principle, each value is checked against its normal range; values outside it are identified as outliers and removed.
The algorithm requires the target variable to follow a normal distribution in order to improve prediction accuracy. Therefore, the normality of the dairy cow DMI data is verified by describing the distribution of the target variable and determining whether it follows a normal distribution.
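The descriptive statistics and 3σ screening in step S3 can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's code: the record layout (dictionaries keyed by column name), the column name `dmi`, and the optional `bounds` argument standing in for "animal nutrition knowledge" are all assumptions.

```python
import statistics


def describe(values):
    """Count, mean, SD, min, quartiles and max, mirroring the summary listed in step S3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


def three_sigma_filter(records, column, bounds=None):
    """Keep records whose `column` lies within mean +/- 3 SD.

    `bounds` is a hypothetical (low, high) pair of expert limits that can
    narrow the 3-sigma window further, e.g. a plausible DMI range in kg/day.
    """
    values = [r[column] for r in records]
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    lo, hi = mu - 3 * sd, mu + 3 * sd
    if bounds is not None:
        lo, hi = max(lo, bounds[0]), min(hi, bounds[1])
    return [r for r in records if lo <= r[column] <= hi]
```

Note that with very few records a single extreme value inflates the SD enough to hide itself, which is one reason the patent pairs the 3σ rule with domain knowledge.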

[0018] S4. Milk yield prediction: the dry matter intake, body weight, days of lactation, parity, rumination time, crude protein intake, starch intake, neutral detergent fiber intake, and crude fat intake of the cow to be predicted are input into the prediction model trained in step S3, which outputs the cow's predicted milk yield.
S5. Nutrient requirement estimation and parameter optimization: using the trained prediction model, the controlled-variable method or a parameter scan determines the optimal ranges of dry matter intake and of each nutrient's intake at which the target milk yield is reached, providing a basis for optimizing diet formulation.

[0019] As a preferred technical solution of the present invention, the dry matter intake in step S1 is indirectly obtained through a prediction model based on the gradient boosting algorithm. The input features of the DMI prediction model include the cow's body weight, lactation days, milk yield, milk protein percentage, milk fat percentage, lactose percentage, and the content of crude protein, starch, and crude fat in the diet.
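The gradient boosting idea behind the DMI sub-model can be illustrated with a minimal, dependency-free sketch: start from the mean, then repeatedly fit a one-split regression stump to the residuals (the negative gradient of squared-error loss) and add a shrunken copy of it. This is a toy stand-in, assumed for illustration only; it is not XGBoost and omits its regularized objective, column subsampling, and deeper trees. The function names are hypothetical.

```python
import statistics


def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # every feature is constant: fall back to a constant stump
        m = statistics.fmean(residuals)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting with regression stumps as weak learners."""
    base = statistics.fmean(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(row) for p, row in zip(pred, X)]

    def predict(row):
        return base + learning_rate * sum(s(row) for s in stumps)

    return predict
```

In the two-stage design of [0019], a model like this would be fitted on body weight, days of lactation, milk yield, milk composition, and diet composition to predict DMI, and that predicted DMI would then become an input to the milk yield model.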

[0020] As a preferred technical solution of the present invention, the key features introduced in step S2 also include energy- and protein-balance indicators calculated from daily intake, including but not limited to net energy for lactation (NEL) intake and metabolizable protein intake.
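The patent names NEL intake as a balance indicator but does not give formulas. One plausible way to derive an energy-balance feature is sketched below, assuming the NRC (2001) maintenance requirement of 0.08 × BW^0.75 Mcal/d and the milk energy equation 0.0929 × fat% + 0.0547 × crude protein% + 0.0395 × lactose% (Mcal/kg milk). The diet NEL density is a hypothetical input, the function names are illustrative, and metabolizable protein is left out because it needs feed-specific degradability data.

```python
def nel_requirement(body_weight_kg, milk_kg, fat_pct, protein_pct, lactose_pct):
    """Daily NEL requirement (Mcal/d) for maintenance plus milk, using NRC (2001)-style equations."""
    maintenance = 0.08 * body_weight_kg ** 0.75
    milk_energy_per_kg = 0.0929 * fat_pct + 0.0547 * protein_pct + 0.0395 * lactose_pct
    return maintenance + milk_energy_per_kg * milk_kg


def nel_balance(dmi_kg, diet_nel_mcal_per_kg, **requirement_inputs):
    """NEL intake (DMI x diet NEL density) minus requirement; positive means an energy surplus."""
    intake = dmi_kg * diet_nel_mcal_per_kg
    return intake - nel_requirement(**requirement_inputs)


# Example: a 600 kg cow producing 30 kg/day of 3.8% fat, 3.2% protein, 4.8% lactose milk,
# eating 22 kg DM/day of a diet assumed to contain 1.6 Mcal NEL/kg DM.
balance = nel_balance(22.0, 1.6, body_weight_kg=600, milk_kg=30,
                      fat_pct=3.8, protein_pct=3.2, lactose_pct=4.8)
```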

[0021] As a preferred embodiment of the present invention, in step S3 the milk yield data are judged to follow a normal distribution when the histogram is approximately bell-shaped and the Kolmogorov-Smirnov (KS) test p-value is greater than 0.05.
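The KS normality check can be sketched with the standard library alone. The sketch compares the empirical CDF with a normal distribution that uses the sample mean and SD, and converts the statistic to an approximate p-value with the asymptotic Kolmogorov series (with the common small-sample scaling of √n). Because the mean and SD are estimated from the same data, this is strictly a Lilliefors-type situation and the p-value is conservative; treat it as an assumption-laden sketch rather than the patent's exact test.

```python
import math
from statistics import NormalDist, fmean, stdev


def kolmogorov_sf(x):
    """Asymptotic P(K > x) for the Kolmogorov distribution."""
    if x < 0.2:
        return 1.0  # the series converges slowly here and the true value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x) for k in range(1, 101))
    return min(1.0, max(0.0, 2 * total))


def ks_normality(values):
    """One-sample KS test of `values` against a normal with the sample mean and SD.

    Returns (D, approximate p-value).
    """
    n = len(values)
    dist = NormalDist(fmean(values), stdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = dist.cdf(v)
        # Largest gap just after and just before each step of the empirical CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    sqrt_n = math.sqrt(n)
    p = kolmogorov_sf((sqrt_n + 0.12 + 0.11 / sqrt_n) * d)
    return d, p
```

Under the rule in [0021], a milk yield column would pass when `ks_normality(milk_yields)[1] > 0.05` and its histogram looks bell-shaped.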

[0022] As a preferred technical solution of the present invention, in step S4 the parameters input to the prediction model must satisfy preset validity rules, including numerical range checks and logical consistency checks. For inputs that fail validation, the model returns an error message instead of a prediction result.
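The two kinds of validation rules can be sketched as follows. The field names and every numeric limit in `RANGES` are hypothetical placeholders, not values from the patent. The logical check shown (the four listed nutrient intakes cannot together exceed DMI, since they are fractions of the dry matter) is one example of a cross-field rule; the patent does not list its actual rules.

```python
# Hypothetical plausibility limits; a real deployment would derive these from farm data.
RANGES = {
    "dmi": (5.0, 40.0),                 # kg DM/day
    "body_weight": (350.0, 1000.0),     # kg
    "days_in_milk": (1, 500),           # days
    "parity": (1, 12),
    "rumination_time": (100.0, 900.0),  # min/day
    "cp_intake": (0.5, 8.0),            # kg/day
    "starch_intake": (0.0, 12.0),       # kg/day
    "ndf_intake": (1.0, 15.0),          # kg/day
    "fat_intake": (0.0, 2.0),           # kg/day
}
NUTRIENTS = ("cp_intake", "starch_intake", "ndf_intake", "fat_intake")


def validate(inputs):
    """Return a list of error messages; an empty list means the inputs passed."""
    errors = []
    for name, (lo, hi) in RANGES.items():  # numerical range checks
        value = inputs.get(name)
        if value is None:
            errors.append(f"{name}: missing")
        elif not lo <= value <= hi:
            errors.append(f"{name}: {value} outside [{lo}, {hi}]")
    if not errors:  # logical consistency checks, only meaningful once ranges pass
        total = sum(inputs[n] for n in NUTRIENTS)
        if total > inputs["dmi"]:
            errors.append(f"nutrient intakes ({total:.1f} kg) exceed DMI ({inputs['dmi']} kg)")
        if inputs["parity"] != int(inputs["parity"]):
            errors.append("parity: must be a whole number")
    return errors


def predict_or_error(inputs, model):
    """Return an error message instead of a prediction when validation fails."""
    errors = validate(inputs)
    if errors:
        return {"error": "; ".join(errors)}
    return {"milk_yield": model(inputs)}
```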

[0023] As a preferred technical solution of the present invention, the nutrient requirement in S5 is determined as follows: all characteristic variables other than the target nutrient intake are fixed at their mean values; the target nutrient intake is varied stepwise; the prediction model computes the milk yield at each intake level; and the range of the nutrient requirement at which the target milk yield is reached is then determined.
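The controlled-variable scan described above can be sketched as a small function that works with any trained model exposed as a callable on a feature dictionary. The function name, the dictionary-based model interface, and the toy linear model in the example are hypothetical; in practice `model` would wrap the trained XGBoost predictor.

```python
import statistics


def scan_nutrient(model, dataset, target_feature, grid, target_yield):
    """Fix every feature except `target_feature` at its dataset mean, sweep it over `grid`,
    and return the predicted curve plus the smallest intake reaching `target_yield`
    (None if no grid value reaches it)."""
    baseline = {k: statistics.fmean([row[k] for row in dataset]) for k in dataset[0]}
    curve = []
    for value in grid:
        inputs = dict(baseline, **{target_feature: value})
        curve.append((value, model(inputs)))
    reaching = [v for v, y in curve if y >= target_yield]
    return curve, (min(reaching) if reaching else None)


# Toy stand-in model (hypothetical): milk yield rises linearly with CP and NDF intake.
toy_model = lambda x: 2 * x["cp"] + 0.5 * x["ndf"] + 10
data = [{"cp": 3.0, "ndf": 6.0}, {"cp": 4.0, "ndf": 8.0}]
curve, cp_needed = scan_nutrient(toy_model, data, "cp", [i * 0.25 for i in range(41)], 25.0)
```

Returning the whole curve, not just the threshold, matches the line graphs the patent uses to read off critical values, and it exposes plateaus where extra intake no longer raises predicted yield.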

[0024] As a preferred embodiment of the present invention, the method further includes deploying the trained prediction model on a web server, providing a graphical user interface, and allowing users to input parameters through the interface and obtain prediction results and optimization suggestions in real time.

[0025] Compared with the prior art, the present invention has the following beneficial effects: (1) High prediction accuracy. The invention is the first to systematically apply the XGBoost algorithm to milk yield prediction with dry matter intake at its core, making full use of XGBoost's ability to model nonlinear relationships and feature interactions. The model reaches a coefficient of determination (R²) of 0.856, significantly outperforming traditional linear models and the NRC standard model. Scientific feature design: by introducing the daily intake of nutrients as key features through feature engineering, the model more directly reflects the causal relationship between nutrient supply and milk production performance, improving both its interpretability and its predictive accuracy.

[0026] (2) High degree of technical integration. The invention not only provides a prediction method but also forms a complete closed loop from data preparation and model training through accurate prediction to nutrient requirement estimation and parameter optimization suggestions, realizing the leap from "prediction" to "decision". The input parameters (such as body weight, days of lactation, parity, and nutrient intake) are easy to obtain on modern farms. The model can be integrated into farm management systems through web services, which facilitates its adoption in actual production and provides real-time, quantitative data support for precision feeding.

Attached Figure Description

[0027] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0028] Figure 1 is a schematic diagram of the overall process of the dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to an embodiment of the present invention.
Figure 2 is a flowchart of the construction of the dry matter intake prediction model according to an embodiment of the present invention.
Figure 3 is a flowchart of the construction of the milk yield prediction model based on dry matter intake and effective nutrient components according to an embodiment of the present invention.
Figure 4 is a line graph showing the effect of a single nutrient on milk yield in the dairy cow milk yield prediction model based on dry matter intake and effective nutrients according to an embodiment of the present invention.

Detailed Implementation

[0029] The invention is further described below with reference to the accompanying drawings and specific embodiments. Example 1 (refer to Figures 1-4) illustrates the construction and validation of the prediction model. Data collection: 11,562 valid individual dairy cow records were collected from large-scale farms. Each record includes body weight, days of lactation, parity, rumination time, milk yield, and individual dry matter intake calculated by the DMI prediction model. The daily intakes of crude protein, starch, neutral detergent fiber, and crude fat were then calculated by combining the DMI with the dietary nutrient composition.
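The daily nutrient intake features can be derived by multiplying each cow's DMI by the diet's nutrient concentrations. A minimal sketch, assuming the composition is expressed as a percentage of dry matter; the nutrient keys and the example diet are hypothetical.

```python
def daily_nutrient_intake(dmi_kg, composition_pct_dm):
    """Convert diet composition (% of dry matter) into daily intake (kg/day) for one cow."""
    return {nutrient: dmi_kg * pct / 100.0 for nutrient, pct in composition_pct_dm.items()}


# Hypothetical diet: 16% CP, 26% starch, 30% NDF, 4% crude fat (DM basis), DMI of 25 kg/day.
intake = daily_nutrient_intake(25.0, {"cp": 16.0, "starch": 26.0, "ndf": 30.0, "fat": 4.0})
```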

[0030] Data preprocessing: the data are cleaned, obvious outliers are removed (e.g., based on the 3σ principle or business knowledge), missing values are filled with the mean, and continuous numerical features are standardized.
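Mean imputation followed by z-score standardization can be sketched as a fit/transform pair. The function names and the dictionary record layout are assumptions. In this sketch the means and SDs are learned from one set of rows (ideally the training split) and reused on others, which keeps test-set statistics from leaking into training.

```python
import statistics


def fit_preprocessor(rows, columns):
    """Learn per-column means (for imputation) and SDs (for standardization), ignoring missing values."""
    params = {}
    for c in columns:
        observed = [r[c] for r in rows if r.get(c) is not None]
        params[c] = (statistics.fmean(observed), statistics.stdev(observed))
    return params


def transform(rows, params):
    """Fill missing values with the learned mean, then standardize to zero mean and unit SD."""
    out = []
    for r in rows:
        new = dict(r)
        for c, (mu, sd) in params.items():
            value = r.get(c)
            if value is None:
                value = mu  # mean imputation
            new[c] = (value - mu) / sd if sd > 0 else 0.0
        out.append(new)
    return out
```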

[0031] Model training and optimization: the preprocessed data were randomly divided into training and test sets in a 7:3 ratio. The XGBRegressor class from the Python xgboost library was used for model training. A grid search with 5-fold cross-validation, performed with GridSearchCV, determined the optimal hyperparameter combination: n_estimators=100, max_depth=6, learning_rate=0.1, subsample=0.5. The final model was trained on the training set with this parameter combination.
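With xgboost and scikit-learn installed, the setup above corresponds roughly to `GridSearchCV(XGBRegressor(), param_grid, cv=5)`. To show the mechanics without those dependencies, the sketch below performs a 7:3 split and a 5-fold cross-validated grid search over the neighbor count of a k-nearest neighbor regressor, one of the six algorithms listed in step S3. The swap from XGBoost to k-NN is deliberate and for illustration only; all function names are hypothetical, and rows are assumed to be `(feature_tuple, target)` pairs.

```python
import math
import random


def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle reproducibly and split into (train, test) at the given ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(rows, k, folds=5):
    """Cross-validated RMSE; fold f holds every `folds`-th row starting at index f."""
    squared_errors = []
    for f in range(folds):
        valid = rows[f::folds]
        train = [r for i, r in enumerate(rows) if i % folds != f]
        squared_errors += [(knn_predict(train, x, k) - y) ** 2 for x, y in valid]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def grid_search(rows, grid, folds=5):
    """Return the grid value with the lowest cross-validated RMSE, plus all scores."""
    scores = {k: cv_rmse(rows, k, folds) for k in grid}
    return min(scores, key=scores.get), scores
```

In use, the grid search runs on the training portion only, and the test portion is held back for the final evaluation in [0032].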

[0032] Model evaluation: model performance was evaluated on the test set. In this embodiment, the model achieved a coefficient of determination (R²) of 0.856, a mean absolute error (MAE) of 2.672 kg, and a root mean square error (RMSE) of 3.492 kg, indicating high prediction accuracy. In the scatter plot of predicted versus actual values (Figure 3), the data points lie close to the diagonal.
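The three reported metrics follow standard definitions, sketched below with a hypothetical function name. RMSE is always at least as large as MAE, which is consistent with the reported 3.492 kg and 2.672 kg.

```python
import math


def regression_metrics(actual, predicted):
    """R^2, mean absolute error and root mean square error for paired observations."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }
```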

[0033] Feature importance analysis: a feature importance analysis of the trained model (Figure 2) identified days of lactation, rumination time, body weight, neutral detergent fiber intake, and crude protein intake as the five most important features for predicting milk yield. This is consistent with the basic principles of animal nutrition and supports the soundness of the model's logic.
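The patent does not state which importance measure was used; XGBoost offers built-in, split-based scores. A model-agnostic alternative that works with any trained predictor is permutation importance: shuffle one feature column and measure how much the error grows. The sketch below implements that alternative technique, not the patent's method. Rows are assumed to be lists of feature values, and the function names are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` over rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=0):
    """Increase in MSE when each feature column is shuffled (larger means more important)."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - base)
    return importances
```

A feature the model ignores scores exactly zero, while a feature it relies on scores positive. Ranking the scores gives a top-five list comparable in spirit to the one reported above.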

[0034] Example 2 (refer to Figure 2) builds on Example 1 and covers the determination and application of nutrient requirements, which are estimated with the high-precision prediction model trained in Example 1.

[0035] Taking a target daily milk yield of 50 kg as an example, the non-nutritional variables (days of lactation, body weight, rumination time, parity) are set to their dataset means.

[0036] Adjust the intake of each nutrient (crude protein, starch, neutral detergent fiber, crude fat) one by one and observe the changes in milk production predicted by the model.

[0037] By analyzing how predicted milk yield changes with the intake of each nutrient, the critical value or optimal range of each nutrient's intake at which milk yield reaches or exceeds 50 kg is determined.

[0038] The results showed that to achieve a milk yield of 50 kg/day, crude protein intake needs to reach about 5.8 kg/day, starch about 8.2 kg/day, neutral detergent fiber about 7.5 kg/day, and crude fat about 0.9 kg/day.

[0039] This result can provide direct and quantitative scientific basis for ranches to formulate daily rations and precisely regulate the nutritional supply of dairy cows.

[0040] Example 3 builds on Example 1 and covers system integration and deployment. The trained XGBoost model is serialized and saved with the joblib library. A backend API service built on the Python Flask framework receives prediction parameters from the frontend. The frontend uses HTML, JavaScript, and the Bootstrap framework to provide a responsive user interface in which users enter the cow's parameters through forms or sliders. The frontend calls the backend prediction API via AJAX and displays, in real time, the returned predicted milk yield together with parameter optimization suggestions generated from built-in rules. The system thus provides online, real-time milk yield prediction and supports feeding decisions.
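The core of such a backend endpoint is a function that turns a JSON request body into a JSON response. The sketch below keeps that logic dependency-free so it can be tested directly. In the described system it would sit inside a Flask route and call a model restored with `joblib.load`, neither of which is shown here. `FEATURES`, `handle_predict`, the target-yield parameter, and the suggestion text are all hypothetical; the patent does not disclose its built-in rules.

```python
import json

# Feature order expected by the serialized model (hypothetical; it must match training).
FEATURES = ["dmi", "body_weight", "days_in_milk", "parity", "rumination_time",
            "cp_intake", "starch_intake", "ndf_intake", "fat_intake"]


def handle_predict(body, model, target_yield=None):
    """Return (HTTP status, JSON string) for a prediction request body."""
    try:
        payload = json.loads(body)
        row = [float(payload[name]) for name in FEATURES]
    except (ValueError, KeyError, TypeError) as exc:  # bad JSON, missing field, or wrong type
        return 400, json.dumps({"error": f"bad input: {exc}"})
    predicted = model(row)
    response = {"predicted_milk_yield_kg": round(predicted, 2)}
    if target_yield is not None and predicted < target_yield:
        # Placeholder rule standing in for the undisclosed built-in suggestion rules.
        response["suggestion"] = "Predicted yield is below target; review DMI and nutrient supply."
    return 200, json.dumps(response)
```

Keeping the request handling separate from the web framework makes the validation and response format testable without starting a server, and the same function can back both the form and the slider interface.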

[0041] In the description of this invention, it should be noted that the terms "top," "bottom," "one side," "the other side," "front," "rear," "middle part," "inner," "top," and "bottom," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are used only for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. The terms "first," "second," and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, unless otherwise explicitly specified and limited, the terms "installed," "connected," and "linked" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal communication of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0042] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A model for predicting milk yield of a dairy cow based on dry matter intake and effective nutrient composition, characterized in that it comprises the following steps: S1. Data preparation and preprocessing: historical production data of Holstein dairy cows are collected, including at least age in months, body weight, parity, days of lactation, body condition score, dry matter intake, milk yield, 4% fat-corrected milk yield, milk fat, lactose, and milk protein percentages and yields, together with the dietary intake of crude protein, crude fat, starch, and neutral detergent fiber, as characteristic variables, with milk yield as the target variable; the collected data are cleaned, outliers are handled, and the data are standardized. S2. Feature engineering and dataset construction: the data are integrated; based on animal nutrition knowledge, the daily intake of each nutrient is calculated and introduced as a key feature; and a dataset for model training and validation is constructed. S3. Prediction model training: six algorithms, namely linear regression, support vector regression, random forest regression, gradient boosting regression, extreme gradient boosting, and k-nearest neighbors, are trained on the dataset constructed in step S2, and their hyperparameters are optimized by grid search and cross-validation to obtain a trained machine learning model for milk yield prediction; a preliminary descriptive statistical analysis gives the sample count, mean, standard deviation, minimum, lower quartile, median, upper quartile, and maximum of all independent variables; using animal nutrition knowledge and the 3σ principle, each value is checked against its normal range, and values outside it are judged to be outliers and filtered out.
The algorithm requires the target variable to follow a normal distribution in order to improve prediction accuracy; therefore, the normality of the dairy cow DMI data is verified by describing the distribution of the target variable and determining whether it follows a normal distribution. S4. Milk yield prediction: the dry matter intake, body weight, days of lactation, parity, rumination time, crude protein intake, starch intake, neutral detergent fiber intake, and crude fat intake of the cow to be predicted are input into the prediction model trained in step S3, which outputs the cow's predicted milk yield. S5. Nutrient requirement estimation and parameter optimization: using the trained prediction model, the controlled-variable method or a parameter scan determines the optimal ranges of dry matter intake and of each nutrient's intake at which the target milk yield is reached, providing a basis for optimizing diet formulation.

2. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 1, characterized in that in step S1, the dry matter intake is obtained indirectly through a prediction model based on the gradient boosting algorithm, the input features of this DMI prediction model including the cow's body weight, days of lactation, milk yield, milk protein percentage, milk fat percentage, lactose percentage, and the crude protein, starch, and crude fat content of the diet.

3. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 2, characterized in that in step S2, the key features introduced also include energy- and protein-balance indicators calculated from daily intake, including but not limited to net energy for lactation (NEL) intake and metabolizable protein intake.

4. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 1, characterized in that in step S3, the milk yield data are judged to follow a normal distribution based on the histogram and a Kolmogorov-Smirnov (KS) test p-value greater than 0.05.

5. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 1, characterized in that in step S4, the parameters input to the prediction model must satisfy preset validity rules, including numerical range checks and logical consistency checks, and for inputs that fail validation the model returns an error message instead of a prediction result.

6. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 1, characterized in that the nutrient requirement in S5 is determined as follows: all characteristic variables other than the target nutrient intake are fixed at their mean values; the target nutrient intake is varied stepwise; the prediction model computes the milk yield at each intake level; and the range of the nutrient requirement at which the target milk yield is reached is then determined.

7. The dairy cow milk yield prediction model based on dry matter intake and effective nutrient components according to claim 1, characterized in that the method further includes deploying the trained prediction model on a web server and providing a graphical user interface through which users can input parameters and obtain prediction results and optimization suggestions in real time.

Citation Information

Patent Citations

  • Dairy cow dry matter feed intake prediction method and system

    CN118864136A

  • Evaluation model and method for proper standard ileum digestible lysine demand amount of lactating sows under low-protein daily ration level

    CN119601087A

  • Prediction method and device for optimal branched chain amino acid proportion of daily ration of lactating sow

    CN120510926A