A method for intelligent recognition of drinking water state of cattle based on rumen temperature characteristics and XGBoost

Temperature data are collected by a sensor implanted in the cattle rumen, features are constructed from these data, and a model is built with the XGBoost algorithm. This addresses the insufficient accuracy of existing cattle drinking-behavior recognition methods in complex environments and achieves efficient, low-cost drinking recognition suitable for various breeding scenarios.

CN121744073B · Active · Publication Date: 2026-06-26 · YUNNAN ZHENTU INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: YUNNAN ZHENTU INFORMATION TECHNOLOGY CO LTD
Filing Date: 2026-03-02
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing methods for recognizing cattle drinking behavior suffer from simplistic logic and insufficient accuracy in complex environments, and their equipment deployment and maintenance costs are high, making them difficult to adapt to changing farming scenarios.

Method used

A smart identification method for cattle drinking status based on rumen temperature characteristics and XGBoost was adopted. Temperature data was collected by implanting a capsule sensor in the rumen of cattle, drinking-related features were constructed, and the XGBoost binary classification algorithm was used for modeling to achieve accurate identification of drinking events.

Benefits of technology

It improves the accuracy and adaptability of drinking identification and reduces equipment dependence and operation and maintenance costs. It suits various breeding scenarios such as pen rearing and grazing, and offers environmental robustness and computational efficiency.

✦ Generated by Eureka AI based on patent content.

Figure CN121744073B_ABST

Abstract

The present application relates to an intelligent method for identifying the drinking state of cattle based on rumen temperature characteristics and XGBoost, and belongs to the technical field of animal behavior recognition. The method collects temperature data with an implanted rumen capsule sensor. It then constructs characteristic variables including the first-order temperature difference and the three-point rolling mean temperature. Accurate identification of the drinking state is achieved by combining the XGBoost algorithm with random-search hyperparameter optimization. The present application overcomes limitations of the prior art such as strong scene dependence, complex systems, and low recognition accuracy. It offers stable data acquisition, flexible deployment, high computational efficiency, and strong environmental robustness. It adapts to a variety of scenarios such as pen rearing and grazing, providing an efficient technical path for intelligent animal behavior monitoring and health early warning on ranches.

Description

Technical Field

[0001] This invention belongs to the technical field of animal behavior recognition methods, and in particular relates to a method for intelligent recognition of cattle drinking status based on rumen temperature characteristics and XGBoost.

Background Technology

[0002] In existing technologies, there are two main methods for identifying cattle drinking behavior. The first is based on external monitoring devices. This involves deploying sensing units around the drinking point to observe and record the interaction between the cattle and the water source, or changes in water parameters, thereby recording the amount and frequency of drinking. For example, patent publication number CN211631322U, "A Cattle Drinking Behavior Measurement Device for Scientific Research," proposes a method that integrates a weighing mechanism and a liquid level sensor in the water tank, combined with a control system, to detect the weight and level of the water in real time. The second method is based on changes in the cattle's physiological state. This involves using sensors implanted in the body or worn on the skin to continuously monitor changes in the cattle's core physiological parameters during drinking, thereby identifying and recording drinking events. For example, patent application publication number CN117337786A, "A Method for Analyzing Cattle and Sheep Drinking Behavior Based on Temperature Monitoring," continuously collects body temperature signals and determines drinking behavior based on temperature change thresholds.

[0003] Although existing technologies can largely identify and monitor cattle drinking behavior, they still face technical limitations of varying degrees in practical application and promotion.

Methods based on external monitoring devices generally rely on sensing units deployed in the environment to detect cattle behavior. For example, they use infrared sensing, weight changes, image recognition, or liquid-level changes to determine whether a drinking event has occurred. However, these devices must be deployed at specific locations such as watering points. This makes them strongly scene-dependent and difficult to apply in complex environments such as grazing or free-flowing watering systems. Secondly, existing solutions often require multiple sensors to work together, resulting in complex system structures, high deployment and maintenance costs, and difficulty in large-scale deployment. Furthermore, cattle sheds and pastures are humid, corrosive, and easily contaminated environments. In such conditions these complex structures are prone to corrosion, malfunction, or performance degradation, further affecting system stability and service life.

In contrast, the other approach identifies drinking behavior from changes in cattle physiological parameters. It has a relatively simple implementation path and has proven feasible in practice. However, judging drinking solely by a temperature-change threshold makes the decision logic crude. The recognition results are easily affected by factors such as individual body temperature, ambient temperature, and water temperature. This results in low recognition accuracy, frequent false or missed detections, and difficulty in applying the method stably across ever-changing farming scenarios.

Summary of the Invention

[0004] This invention aims to propose a cattle drinking signal recognition algorithm based on feature engineering and machine learning, overcoming the problems of simple recognition logic, insufficient accuracy, and poor adaptability in existing drinking behavior recognition methods. This algorithm uses temperature data collected by sensors in the cattle's stomach as a foundation, constructs drinking-related features, and introduces machine learning methods for modeling, achieving accurate recognition of drinking events. Compared with traditional methods relying on static threshold rules, this invention has stronger adaptability and environmental robustness, effectively handling differences in cattle body temperature, drinking methods, water source temperature, and external air temperature. Furthermore, since this invention relies entirely on in-body sensor signals, it possesses significant advantages such as stable data acquisition, low computational overhead, and flexible deployment, making it widely applicable to various farming scenarios such as penning and grazing, effectively improving the practicality and adaptability of cattle behavior monitoring systems.

[0005] The present invention is implemented using the following technical solution.

[0006] A method for intelligent identification of cattle drinking status based on rumen temperature characteristics and XGBoost, comprising the following steps:

[0007] Step 1) Data acquisition: a capsule sensor is implanted in the rumen of the cattle to collect rumen temperature time-series data and to record the time of each sampling moment.

[0008] Step 2) Feature construction: based on the collected temperature and time data, key features are generated. These are the current rumen temperature, the temperature difference between adjacent time points, the average temperature of the last 3 time points, the hour of the sampling time, whether it is morning, and whether it is noon;

[0009] Step 3) Model training: the feature data and the labeled drinking status are combined into a training set, and the model is built with the XGBoost binary classification algorithm. The model hyperparameters are optimized by random search, and the optimal parameters are selected by five-fold stratified cross-validation;

[0010] Step 4) State recognition: for newly collected temperature data, feature vectors are constructed by the same rules and input into the trained model to calculate the drinking probability. A probability ≥ 0.5 is classified as a drinking state, and a probability < 0.5 as a non-drinking state.

[0011] Furthermore, step 2) of the present invention includes assigning a drinking-status label $y_t \in \{0, 1\}$ to each record. Here $y_t = 1$ indicates that the sample is in a drinking state at that moment, and $y_t = 0$ indicates that it is not. Let the rumen temperature sequence of a given cow during the observation period be denoted $\{(S_t, T_t)\}_{t=1}^{N}$. In this notation, $S_t$ is the sampling time of the $t$-th record, $T_t$ is the corresponding rumen temperature, and $N$ is the number of temperature sampling points for that cow during the period.

[0012] Furthermore, step 2) of the present invention includes constructing the following feature variables:

[0013] 1) The first-order temperature difference, temperature_diff1, characterizes the instantaneous rise or fall in temperature between adjacent sampling times. It reflects the intensity and direction of local changes in rumen temperature over a short time scale, and is defined from the second sampling point onward as:

$$\Delta T_t = T_t - T_{t-1}, \quad t = 2, 3, \ldots, N \qquad (1)$$

[0014] 2) The three-point rolling mean temperature, rolling_mean_3, characterizes the average temperature level within the local window formed by the current sampling time and the two sampling times preceding it. Starting from the third sampling point, it is defined as:

[0015] $$\bar{T}^{(3)}_t = \frac{T_t + T_{t-1} + T_{t-2}}{3}, \quad t = 3, 4, \ldots, N \qquad (2)$$

[0016] 3) The intra-day hour index, hour_of_day, extracts the hour from the sampling time $S_t$ to characterize the position of the sample within the day. It is defined as:

[0017] $$H_t = \operatorname{hour}(S_t), \quad H_t \in \{0, 1, \ldots, 23\} \qquad (3)$$

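[0018] 4) Time-period indicators: this feature consists of a morning indicator (is_morning) and a noon indicator (is_noon), reflecting the distribution of drinking behavior over typical periods of the day. They are defined as:

[0019] $$I^{\text{morning}}_t = \begin{cases} 1, & H_t \in \mathcal{H}_{\text{morning}} \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

[0020] $$I^{\text{noon}}_t = \begin{cases} 1, & H_t \in \mathcal{H}_{\text{noon}} \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $\mathcal{H}_{\text{morning}}$ and $\mathcal{H}_{\text{noon}}$ denote the sets of hours designated as morning and noon, respectively.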
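The feature rules in Eqs. (1) to (5) can be illustrated with a short Python sketch. It assumes one cow's records are already sorted by sampling time. The hour windows `MORNING_HOURS` and `NOON_HOURS` are hypothetical placeholders, because the text does not give the hours that count as morning or noon. The function name `build_features` is likewise illustrative. Features that are undefined for the first one or two records are returned as `None`.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical hour windows: the text does not state which hours count as
# "morning" or "noon", so these sets are placeholders, not the patent's values.
MORNING_HOURS = set(range(6, 10))  # assumed 06:00-09:59
NOON_HOURS = set(range(11, 14))    # assumed 11:00-13:59


def build_features(times: List[datetime], temps: List[float]) -> List[Dict[str, Optional[float]]]:
    """Build the step-2 features for one cow's rumen temperature series.

    times[t] is the sampling time S_t and temps[t] the rumen temperature T_t;
    records are assumed to be sorted by time. Features that need earlier
    samples are None where they are undefined.
    """
    if len(times) != len(temps):
        raise ValueError("times and temps must have the same length")
    rows = []
    for t, (s_t, temp_t) in enumerate(zip(times, temps)):
        # Eq. (1): first-order difference, defined from the second record on.
        diff1 = temp_t - temps[t - 1] if t >= 1 else None
        # Eq. (2): mean of the current and two preceding temperatures, from the third record on.
        mean3 = (temp_t + temps[t - 1] + temps[t - 2]) / 3 if t >= 2 else None
        hour = s_t.hour  # Eq. (3): hour of day, 0..23
        rows.append({
            "temperature": temp_t,
            "temperature_diff1": diff1,
            "rolling_mean_3": mean3,
            "hour_of_day": hour,
            "is_morning": int(hour in MORNING_HOURS),  # Eq. (4)
            "is_noon": int(hour in NOON_HOURS),        # Eq. (5)
        })
    return rows
```

Computing the features per cow, rather than over a pooled series, keeps the differences and rolling means from mixing readings of different animals.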

[0021] Furthermore, step 3) of the present invention includes step 3.1), which constructs a gradient boosting tree model composed of multiple regression trees, and maps the model output to the drinking status prediction probability through the Sigmoid activation function, thereby realizing cattle drinking identification modeling based on temperature and time features.

[0022] Furthermore, step 3.1) of this invention uses the following formula:

[0023] $$F(x) = \sum_{k=1}^{K} f_k(x) \qquad (8)$$

[0024] where $K$ is the number of regression trees and $f_k(x)$ is the output of the $k$-th regression tree for the input feature vector $x$;

[0025] For any feature sample $x$, the predicted probability that it is in a drinking state at the corresponding time is defined as:

[0026] $$p(x) = \sigma\big(F(x)\big) = \frac{1}{1 + e^{-F(x)}} \qquad (9)$$

[0027] where $\sigma(\cdot)$ is the Sigmoid activation function.
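Eqs. (8) and (9) describe an additive ensemble whose summed output passes through a Sigmoid. The sketch below shows that structure with two hand-written stumps. The trees, split features, thresholds and leaf values are invented for illustration and are not a trained model. The numerically stable form of the Sigmoid is a standard implementation choice, not something the text specifies.

```python
import math
from typing import Callable, Dict, List

Tree = Callable[[Dict[str, float]], float]


def raw_score(trees: List[Tree], x: Dict[str, float]) -> float:
    """Eq. (8): F(x) = sum over k of f_k(x), the summed regression-tree outputs."""
    return sum(f(x) for f in trees)


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def drinking_probability(trees: List[Tree], x: Dict[str, float]) -> float:
    """Eq. (9): p(x) = sigma(F(x))."""
    return sigmoid(raw_score(trees, x))


# Two hypothetical depth-1 trees (stumps); values are illustrative only.
def tree_1(x: Dict[str, float]) -> float:
    return 1.2 if x["temperature_diff1"] < -0.5 else -0.8


def tree_2(x: Dict[str, float]) -> float:
    return 0.4 if x["is_morning"] == 1 else -0.3


TREES: List[Tree] = [tree_1, tree_2]
```

Consider a morning sample with temperature_diff1 = -1.0. Here F(x) = 1.2 + 0.4 = 1.6 and p(x) is about 0.83, so the sample would be labeled as drinking under the 0.5 threshold of step 4).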

[0028] Furthermore, step 3) of the present invention includes step 3.2), which uses a logarithmic loss function to measure the deviation between the model output and the actual drinking water state, constructs an overall optimization objective function, and iteratively minimizes the objective function through a gradient boosting framework to update the model parameters, thereby achieving the solution of the optimal model parameters.

[0029] Furthermore, in step 3.2) of this invention, a logarithmic loss function is used as the training objective to measure the deviation between the model output and the actual drinking state. The specific expression is:

$$\ell(y_t, p_t) = -\big[\, y_t \log p_t + (1 - y_t) \log (1 - p_t) \,\big] \qquad (10)$$

[0030] Here $p_t = \sigma\big(F(x_t)\big)$ is the model's predicted probability (positive-class probability) that the $t$-th sampling time is in a drinking state, and $x_t$ is the feature vector of the $t$-th record. The term $\ell(y_t, p_t)$ is the logarithmic loss of the $t$-th sample, measuring the deviation between the predicted probability and the true label. At the same time, to control the structural complexity of the regression trees and suppress overfitting, a regularization term is introduced for each regression tree, defined as:

[0031] $$\Omega(f_k) = \gamma J_k + \frac{1}{2} \lambda \sum_{j=1}^{J_k} w_{k,j}^2 \qquad (11)$$

[0032] The symbols in Eq. (11) are defined as follows:
- $\Omega(f_k)$ is the regularization term of the $k$-th regression tree, used to penalize its structural complexity in the optimization objective;
- $J_k$ is the number of leaf nodes in the $k$-th regression tree;
- $w_{k,j}$ is the output weight of the $j$-th leaf node in the $k$-th tree;
- $\gamma$ is the penalty coefficient for the number of leaf nodes;
- $\lambda$ is the L2 regularization coefficient for the leaf-node weights.
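The per-sample loss of Eq. (10) and the tree penalty of Eq. (11) can be written directly in Python. This is a minimal sketch. The probability-clipping constant `eps` is an assumption added to avoid `log(0)`. The arguments `gamma` and `lam` stand for the leaf-count penalty and the L2 weight coefficient; in XGBoost's own parameters these roles are played by `gamma` and `reg_lambda`.

```python
import math
from typing import Sequence


def log_loss(y: int, p: float, eps: float = 1e-15) -> float:
    """Eq. (10): -[y log p + (1 - y) log(1 - p)] for one sample."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def tree_penalty(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Eq. (11): gamma * J_k + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
```

A confident wrong prediction (p = 0.1 for a drinking sample) costs far more than a confident correct one (p = 0.9). Likewise, a tree with more or larger leaves adds more penalty. Together these are the two pressures that the overall objective balances.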

[0033] Combining the empirical loss term and the regularization term gives the overall optimization objective of the model:

[0034] $$\mathcal{L} = \sum_{t=1}^{N} \ell(y_t, p_t) + \sum_{k=1}^{K} \Omega(f_k) \qquad (12)$$

[0035] where $\mathcal{L}$ is the overall optimization objective for model training. The objective is minimized iteratively within the gradient boosting framework. Let $F^{(K-1)}(x)$ be the model obtained after the $(K-1)$-th iteration. A new regression tree $f_K(x)$ is added in the $K$-th iteration, and the model is updated as:

$$F^{(K)}(x) = F^{(K-1)}(x) + f_K(x) \qquad (13)$$

[0036] This process is repeated until the objective function converges, yielding the optimal model parameters for drinking identification.
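The sketch below makes the additive update of Eq. (13) concrete by adding one tree per call. For brevity, the new tree has a single leaf shared by all samples. Its weight follows the second-order rule from XGBoost's documented derivation, w* = -G / (H + λ). Here G and H are the summed gradients p - y and Hessians p(1 - p) of the log loss. The patent text does not spell out how f_K is fitted, so this illustrates the general technique rather than the embodiment's procedure. Real XGBoost trees split samples into many leaves and also scale each new tree by the learning rate.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Plain logistic function; adequate for the moderate raw scores used here.
    return 1.0 / (1.0 + math.exp(-z))


def total_log_loss(raw: List[float], y: List[int]) -> float:
    """Empirical-loss part of Eq. (12), summed over all samples."""
    loss = 0.0
    for f, label in zip(raw, y):
        p = sigmoid(f)
        loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return loss


def boost_one_leaf(raw: List[float], y: List[int], lam: float = 1.0) -> Tuple[List[float], float]:
    """Eq. (13) with a single-leaf tree: F_K(x) = F_{K-1}(x) + w* for every sample."""
    g_sum = 0.0  # G: sum of first derivatives of the log loss, p - y
    h_sum = 0.0  # H: sum of second derivatives, p * (1 - p)
    for f, label in zip(raw, y):
        p = sigmoid(f)
        g_sum += p - label
        h_sum += p * (1.0 - p)
    w = -g_sum / (h_sum + lam)  # regularized Newton step for the leaf weight
    return [f + w for f in raw], w
```

Start from raw scores of 0 for one drinking and three non-drinking samples. The single leaf then receives w* = -1 / (1 + 1) = -0.5, which moves all predicted probabilities toward the majority class and lowers the total log loss.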

[0037] Furthermore, step 3) of the present invention includes step 3.3). Reasonable value ranges are preset for the key hyperparameters of the XGBoost model, and multiple candidate hyperparameter combinations are generated by random sampling. Each combination is trained by iteratively minimizing the objective function within the gradient boosting framework. Its average performance is then computed by five-fold stratified cross-validation, with the recall of the drinking class as the core evaluation metric. The hyperparameter configuration with the highest average recall is selected, and the final drinking recognition model is obtained by training on all the training data.

[0038] Further, in step 3.3) of this invention, the XGBoost hyperparameters are searched over the following ranges:
- the number of trees n_estimators: integer range [200, 1000);
- the maximum depth max_depth: integer range [5, 15);
- the learning rate learning_rate: continuous range [0.01, 0.31);
- the row subsampling ratio subsample and the feature sampling ratio colsample_bytree: both continuous range [0.60, 1.00);
- the minimum child weight min_child_weight: integer range [1, 6);
- the split penalty coefficient gamma: continuous range [0, 5);
- the regularization coefficients reg_alpha and reg_lambda: continuous ranges [0, 1) and [0.5, 2.5), respectively;
- the class-imbalance weight scale_pos_weight: chosen from the discrete set {5, 10, 20, 50}.

Several candidate hyperparameter combinations are generated within these ranges by random sampling. For each candidate, the training process based on the gradient boosting framework described above is repeated. That is, the objective function is minimized iteratively under that hyperparameter configuration to obtain a set of model parameters. The model's recognition performance is then evaluated by five-fold stratified cross-validation. The samples are stratified by drinking label and the training data are divided into five mutually exclusive subsets. In each of five rounds, one subset serves as the validation set while the remaining four serve as the training set. During validation, the recall of the drinking class is the primary evaluation metric.
Let TP be the number of samples that were actually drinking and were correctly identified as drinking by the model. Let FN be the number of samples that were actually drinking but were incorrectly identified as not drinking. Recall is then:

[0039] $$\text{Recall} = \frac{TP}{TP + FN} \qquad (14)$$
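The search space listed in [0038] can be sampled as follows, as a sketch under stated assumptions. The number of candidates `n` and the `seed` are hypothetical, because the text only says "several sets". Integer ranges use `randrange`, so the upper bound is excluded as in [a, b). Continuous ranges use `random.uniform`, whose upper end point may occasionally be reached through floating-point rounding. The function names are illustrative.

```python
import random
from typing import Any, Dict, List


def sample_candidate(rng: random.Random) -> Dict[str, Any]:
    """Draw one hyperparameter combination from the step-3.3 search space."""
    return {
        "n_estimators": rng.randrange(200, 1000),   # integer [200, 1000)
        "max_depth": rng.randrange(5, 15),          # integer [5, 15)
        "learning_rate": rng.uniform(0.01, 0.31),   # continuous [0.01, 0.31)
        "subsample": rng.uniform(0.60, 1.00),
        "colsample_bytree": rng.uniform(0.60, 1.00),
        "min_child_weight": rng.randrange(1, 6),    # integer [1, 6)
        "gamma": rng.uniform(0.0, 5.0),
        "reg_alpha": rng.uniform(0.0, 1.0),
        "reg_lambda": rng.uniform(0.5, 2.5),
        "scale_pos_weight": rng.choice([5, 10, 20, 50]),
    }


def sample_candidates(n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate n candidates reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_candidate(rng) for _ in range(n)]
```

The dictionary keys match the parameter names of XGBoost's scikit-learn wrapper, so a candidate could presumably be passed as `XGBClassifier(**params)`; that step is not shown here. The large values allowed for `scale_pos_weight` reflect that drinking moments are far rarer than non-drinking ones in a continuous temperature series.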

[0040] For each candidate hyperparameter combination, its average recall over the five-fold stratified cross-validation is calculated. The combination with the highest average recall is selected as the final hyperparameter configuration. Under this configuration, the model is trained on all training data to obtain the final drinking recognition model:

[0041] $$F^{*}(x) = \sum_{k=1}^{K} f^{*}_k(x) \qquad (15)$$

[0042] Here $F^{*}(x)$ is the raw output function of the final trained drinking recognition model (i.e., the model output before Sigmoid mapping), and $K$ is the number of regression trees in the final model. The term $f^{*}_k(x)$ is the output value of the $k$-th trained regression tree for the input feature vector $x$.
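Eq. (14) and the selection rule of [0040] reduce to a few lines. The sketch assumes each candidate has already been trained and evaluated on the five validation folds. The functions `recall`, `mean_cv_recall`, and `select_best` are illustrative names. Returning 0.0 when a fold contains no drinking samples is an assumption the text does not address.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Eq. (14): TP / (TP + FN) for the drinking class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def mean_cv_recall(fold_results: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average recall over the (y_true, y_pred) pairs of the validation folds."""
    return mean(recall(t, p) for t, p in fold_results)


def select_best(fold_recalls: List[List[float]]) -> int:
    """Index of the candidate whose mean fold recall is highest (first one wins ties)."""
    means = [mean(r) for r in fold_recalls]
    return max(range(len(means)), key=means.__getitem__)
```

Ranking by recall alone favors candidates that rarely miss a drinking event, even at the cost of some false alarms. This matches the text's choice of recall as the core metric.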

[0043] Furthermore, in step 4) of the present invention, the final drinking recognition model is obtained after model training and hyperparameter optimization are complete. At this point the model is able to recognize the drinking status of cattle. For newly collected body temperature data, feature construction and preprocessing are performed in the same manner as in the training phase to obtain the feature vector $x_{\text{new}}$ at the current moment. Substituting $x_{\text{new}}$ into the trained model gives the predicted probability of being in a drinking state at that moment:

[0044] $$\hat{p}_{\text{new}} = \sigma\big(F^{*}(x_{\text{new}})\big) = \frac{1}{1 + e^{-F^{*}(x_{\text{new}})}} \qquad (16)$$

[0045] where $\sigma(\cdot)$ is the Sigmoid activation function;

[0046] After the predicted probability for that moment is obtained, a predicted label $\hat{y}_{\text{new}}$ is generated according to the following rule:

[0047] $$\hat{y}_{\text{new}} = \begin{cases} 1, & \hat{p}_{\text{new}} \ge 0.5 \\ 0, & \hat{p}_{\text{new}} < 0.5 \end{cases} \qquad (17)$$

[0048] When $\hat{y}_{\text{new}} = 1$, the system determines that the moment is in a drinking state;

[0049] when $\hat{y}_{\text{new}} = 0$, the system determines that the moment is not in a drinking state.
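Step 4) amounts to applying the Sigmoid to the final model's raw output and thresholding at 0.5, as in Eqs. (16) and (17). The sketch assumes the raw score F*(x_new) has already been computed by the trained model. `predict_drinking` is a hypothetical helper name.

```python
import math
from typing import Tuple


def predict_drinking(raw_score: float, threshold: float = 0.5) -> Tuple[float, int]:
    """Eqs. (16)-(17): map F*(x_new) to (probability, 0/1 drinking label)."""
    # Numerically stable logistic function.
    if raw_score >= 0:
        p = 1.0 / (1.0 + math.exp(-raw_score))
    else:
        e = math.exp(raw_score)
        p = e / (1.0 + e)
    return p, int(p >= threshold)
```

The Sigmoid is monotonic and equals 0.5 at zero. The rule p ≥ 0.5 is therefore equivalent to F*(x_new) ≥ 0, so the threshold can also be applied directly to the raw score.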

[0050] The core technical highlights of this invention are as follows. 1) Feature construction based on temperature and time.

[0051] This invention focuses on the patterns of body temperature changes during bovine drinking behavior. Based on raw rumen temperature data, it systematically constructs several representative characteristic variables, including first-order temperature difference and three-point moving average, to reflect instantaneous abrupt changes and local fluctuations in temperature signals during drinking. Building upon this, and considering the repetitive and concentrated characteristics of drinking behavior over time, it innovatively introduces hourly indexes and typical time period indicator variables, incorporating the time factor into the modeling system. This significantly improves the model's ability to identify and accurately judge drinking events.

[0052] 2) Supervised modeling framework based on gradient boosting tree.

[0053] This invention employs XGBoost as the classification model, using constructed temperature and time features as input to train a gradient boosting framework composed of multiple regression trees. Simultaneously, by introducing model complexity penalties and regularization mechanisms during the modeling process, the invention significantly suppresses the risk of overfitting. Furthermore, by combining hyperparameter random search optimization and five-fold cross-validation, it improves the overall performance and generalization of drinking water event recognition while ensuring model structural stability. Ultimately, this invention forms a structurally complete and robust supervised modeling and drinking water recognition process.

[0054] 3) Integrated rumen-capsule architecture for multidimensional behavior analysis.

[0055] This invention, while designing the algorithm, fully considers the application environment and data link of the rumen capsule sensing terminal, integrating the drinking behavior recognition method into the existing rumen capsule data acquisition and transmission system. This allows functional modules such as drinking recognition, posture recognition, and abnormal gastric movement recognition to operate collaboratively on the same terminal. By using the rumen capsule as a unified data entry point, an in vivo multidimensional analysis architecture capable of simultaneously supporting the mining of multiple behavioral features and state recognition is constructed, providing a unified data foundation and computing platform for subsequent health assessments and livestock management optimization based on in vivo signals.

[0056] This invention achieves automatic identification of cattle's drinking behavior by constructing features and using machine learning modeling based on time-series temperature data collected from rumen capsule sensors in cattle, yielding the following beneficial effects:

[0057] 1) Stable and reliable recognition accuracy: compared with existing drinking determination methods based on fixed temperature thresholds, the present invention achieves higher accuracy and stability in drinking recognition under complex environments and across individual cattle. In actual tests, the algorithm achieved high recall in recognizing cattle drinking behavior. It also maintained a relatively consistent level across datasets from different regions, ages, and seasons, indicating good adaptability and recognition stability under various farming conditions.

[0058] 2) Low environmental dependence and adaptable to various drinking scenarios: The data relied upon by this invention all come from sensors inside the cattle, so there is no need to deploy additional sensing hardware near the water source. It is almost unaffected by light, dust, and changes in the structure of drinking facilities, and has higher environmental robustness and scenario versatility.

[0059] 3) Simplified deployment and significantly reduced hardware and maintenance costs: Compared with solutions that require additional external monitoring devices, this invention can complete drinking water identification by relying only on the existing rumen capsule and field gateway, without the need for additional external hardware devices such as cameras and pressure sensors. This not only reduces the overall hardware and installation investment, but also reduces the pressure of later inspection and maintenance, which is conducive to ensuring the long-term stable operation of the system.

[0060] 4) High computational efficiency and low response latency: The present invention balances recognition performance and implementation complexity in algorithm design, and adopts a lightweight classification model, which can maintain a low processing latency in window recognition calculation, and can meet the needs of near real-time recognition of cattle drinking behavior in actual production environment.

[0061] 5) Facilitates iterative optimization and long-term application and maintenance: This invention establishes a clearly structured modeling process for cattle drinking behavior, encompassing data preprocessing, feature construction, and model training, demonstrating excellent iterative optimization and functional expansion capabilities. As historical data accumulates, the system can periodically update feature combinations and model parameters based on new data without altering the physical hardware or data acquisition methods, improving the accuracy and applicability of drinking behavior identification. When changes occur in farming practices, environmental conditions, or management needs, new features can be integrated or the model structure replaced within the existing framework, ensuring the algorithm's continuous evolution and maintainability.

[0062] 6) Intelligent Early Warning: During continuous operation, this invention dynamically tracks and models the daily drinking frequency, intervals between adjacent drinking sessions, and typical drinking times of cattle based on identified drinking event sequences, forming an individualized baseline for drinking behavior. When an individual fails to drink for a certain period of time, its drinking frequency decreases significantly compared to historical levels, or its drinking time distribution becomes abnormally concentrated, the system can trigger an early warning mechanism, marking suspected abnormal individuals and pushing the information to managers to assist in conducting health checks and adjusting feeding management, thereby enabling early detection and intervention of potential risks.
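The three warning conditions in [0062] can be sketched as simple rule checks on the recognized drinking events. All thresholds below (`max_gap`, `drop_ratio`, `max_hour_share`) are hypothetical defaults. The text does not quantify "a certain period", "significantly", or "abnormally concentrated". The function name and alert strings are likewise illustrative, and a production baseline would likely be maintained per individual over a rolling history.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


def drinking_alerts(
    history_daily_counts: List[int],
    events_today: List[datetime],
    last_event: Optional[datetime],
    now: datetime,
    max_gap: timedelta = timedelta(hours=12),  # hypothetical threshold
    drop_ratio: float = 0.5,                   # hypothetical threshold
    max_hour_share: float = 0.8,               # hypothetical threshold
) -> List[str]:
    """Return the warning conditions triggered for one animal."""
    alerts = []
    # 1) No drinking event for longer than max_gap.
    if last_event is None or now - last_event > max_gap:
        alerts.append("no_drinking")
    # 2) Daily frequency well below the individual's historical baseline
    #    (intended to be evaluated once the day is complete).
    if history_daily_counts and len(events_today) < drop_ratio * mean(history_daily_counts):
        alerts.append("frequency_drop")
    # 3) Drinking times abnormally concentrated in a single hour of the day.
    if len(events_today) >= 3:
        hours = [e.hour for e in events_today]
        top_share = max(hours.count(h) for h in set(hours)) / len(hours)
        if top_share > max_hour_share:
            alerts.append("time_concentration")
    return alerts
```

For example, suppose a cow that usually drinks about nine times a day drinks only three times, all within one morning hour, and nothing since. By evening, that cow would trigger all three alerts.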

[0063] Through the aforementioned performance advantages, this invention achieves stable identification of cattle drinking behavior without relying on external monitoring devices, providing a more efficient and scalable technical approach for animal behavior monitoring, health early warning, and management decision-making in smart ranches.

[0064] The present invention will be further explained below with reference to the accompanying drawings and specific embodiments.

Attached Figure Description

[0065] Figure 1 is a flowchart of the method steps of the present invention.

Detailed Implementation

[0066] The following embodiments are only a part of the technical solutions of the present invention and are not intended to limit all the technical solutions of the present invention. The embodiments of the present invention are provided to further explain and illustrate the details of the technical solutions of the present invention.

[0067] See Figure 1, which is a flowchart of the method steps of the present invention.

[0068] A method for intelligent identification of cattle drinking status based on rumen temperature characteristics and XGBoost includes the following steps:

[0069] Step 1: Collect temperature data inside the rumen by implanting a capsule sensor in the rumen of cattle.

[0070] Step 2: Based on the temperature time series obtained in Step 1, a drinking-status label $y_t \in \{0, 1\}$ is assigned to each record. Here $y_t = 1$ indicates that the sample is in a drinking state at that moment, and $y_t = 0$ indicates that it is not. Let the rumen temperature sequence of a given cow during the observation period be denoted $\{(S_t, T_t)\}_{t=1}^{N}$. In this notation, $S_t$ is the sampling time of the $t$-th record, $T_t$ is the corresponding rumen temperature, and $N$ is the number of temperature sampling points for that cow during the period. On this basis, the following feature variables are constructed:

[0071] 1) The first-order temperature difference (temperature_diff1) characterizes the instantaneous rise or fall in temperature between adjacent sampling times. It reflects the intensity and direction of local changes in rumen temperature over a short time scale, and is defined as:

[0072] $$\Delta T_t = T_t - T_{t-1}, \quad t = 2, 3, \ldots, N \qquad (1)$$

[0073] 2) The three-point rolling mean temperature (rolling_mean_3) characterizes the average temperature level within the local window formed by the current sampling time and the two sampling times preceding it. Starting from the third sampling point, it is defined as:

[0074] $$\bar{T}^{(3)}_t = \frac{T_t + T_{t-1} + T_{t-2}}{3}, \quad t = 3, 4, \ldots, N \qquad (2)$$

[0075] 3) The intra-day hour index (hour_of_day) extracts the hour from the sampling time $S_t$ to characterize the position of the sample within the day. It is defined as:

[0076] $$H_t = \operatorname{hour}(S_t), \quad H_t \in \{0, 1, \ldots, 23\} \qquad (3)$$

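[0077] 4) Time-period indicators: this feature consists of a morning indicator (is_morning) and a noon indicator (is_noon). Together they reflect the distribution of drinking behavior over typical periods of the day (such as morning and noon). They are defined as:

[0078] $$I^{\text{morning}}_t = \begin{cases} 1, & H_t \in \mathcal{H}_{\text{morning}} \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

[0079] $$I^{\text{noon}}_t = \begin{cases} 1, & H_t \in \mathcal{H}_{\text{noon}} \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $\mathcal{H}_{\text{morning}}$ and $\mathcal{H}_{\text{noon}}$ denote the sets of hours designated as morning and noon, respectively.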

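[0080] Step 3: Based on the features constructed in Step 2, the feature vector of the $t$-th record is denoted as:

[0081] $$x_t = \big[\, T_t,\ \Delta T_t,\ \bar{T}^{(3)}_t,\ H_t,\ I^{\text{morning}}_t,\ I^{\text{noon}}_t \,\big]^{\top} \qquad (6)$$

[0082] Pairing each feature vector with its label gives the training dataset:

[0083] $$\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N} \qquad (7)$$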
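Assembling Eqs. (6) and (7) in code raises one practical question the text leaves open. The first two records of each series lack a defined difference or rolling mean. The sketch below assumes such records are simply dropped. An alternative would be to keep them as missing values, which XGBoost can handle natively. `FEATURE_ORDER` and `assemble_dataset` are illustrative names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Column order of the feature vector x_t in Eq. (6).
FEATURE_ORDER = ("temperature", "temperature_diff1", "rolling_mean_3",
                 "hour_of_day", "is_morning", "is_noon")


def assemble_dataset(
    rows: Sequence[Dict[str, Optional[float]]], labels: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Eq. (7): pair each feature vector with its label y_t."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    X: List[List[float]] = []
    y: List[int] = []
    for row, label in zip(rows, labels):
        vector = [row[name] for name in FEATURE_ORDER]
        if any(v is None for v in vector):
            continue  # assumption: skip records whose diff or rolling mean is undefined
        X.append([float(v) for v in vector])
        y.append(int(label))
    return X, y
```

Dropping the first two records of each cow costs little when the series is long. Keeping them as missing values would preserve every labeled moment.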

[0084] Based on this training dataset, an XGBoost (Extreme Gradient Boosting) binary classification model is used to construct a drinking identifier that learns drinking recognition patterns from the temperature and time features. The key sub-steps are:

[0085] Step 3.1: A gradient boosting tree model composed of multiple regression trees is constructed to model drinking identification based on temperature and time features. The overall structure is defined as:

[0086] $$F(x) = \sum_{k=1}^{K} f_k(x) \qquad (8)$$

[0087] where $K$ is the number of regression trees and $f_k(x)$ is the output of the $k$-th regression tree for the input feature vector $x$.

[0088] For any feature sample $x$, the predicted probability that it is in a drinking state at the corresponding time is defined as:

[0089] $$p(x) = \sigma\big(F(x)\big) = \frac{1}{1 + e^{-F(x)}} \qquad (9)$$

[0090] where $\sigma(\cdot)$ is the Sigmoid activation function.

[0091] Step 3.2: A logarithmic loss function is used as the training objective to measure the deviation between the model output and the actual drinking state. The specific expression is:

[0092] $$\ell(y_t, p_t) = -\big[\, y_t \log p_t + (1 - y_t) \log (1 - p_t) \,\big] \qquad (10)$$

[0093] Here $p_t = \sigma\big(F(x_t)\big)$ is the model's predicted probability (positive-class probability) that the $t$-th sampling time is in a drinking state. The term $\ell(y_t, p_t)$ is the logarithmic loss of the $t$-th sample, measuring the deviation between the predicted probability and the true label. Meanwhile, to control the structural complexity of the regression trees and suppress overfitting, a regularization term is introduced for each regression tree, defined as:

[0094] $$\Omega(f_k) = \gamma J_k + \frac{1}{2} \lambda \sum_{j=1}^{J_k} w_{k,j}^2 \qquad (11)$$

[0095] The symbols in Eq. (11) are defined as follows:
- $\Omega(f_k)$ is the regularization term of the $k$-th regression tree, used to penalize its structural complexity in the optimization objective;
- $J_k$ is the number of leaf nodes in the $k$-th regression tree;
- $w_{k,j}$ is the output weight of the $j$-th leaf node in the $k$-th tree;
- $\gamma$ is the penalty coefficient for the number of leaf nodes;
- $\lambda$ is the L2 regularization coefficient for the leaf-node weights.

[0096] Combining the empirical loss term and the regularization term gives the overall optimization objective of the model:

[0097] $$\mathcal{L} = \sum_{t=1}^{N} \ell(y_t, p_t) + \sum_{k=1}^{K} \Omega(f_k) \qquad (12)$$

[0098] where $\mathcal{L}$ is the overall optimization objective for model training. This objective is minimized iteratively within the gradient boosting framework. Let $F^{(K-1)}(x)$ be the model obtained after the $(K-1)$-th iteration. A new regression tree $f_K(x)$ is added in the $K$-th iteration, and the model is updated as:

$$F^{(K)}(x) = F^{(K-1)}(x) + f_K(x) \qquad (13)$$

[0099] This process is repeated until the objective function converges, yielding the optimal model parameters for drinking identification.

[0100] Step 3.3: To further improve drinking recognition performance, this embodiment uses random search to optimize the key hyperparameters of XGBoost over the following ranges:
- the number of trees (n_estimators): integer range [200, 1000);
- the maximum depth (max_depth): integer range [5, 15);
- the learning rate (learning_rate): continuous range [0.01, 0.31);
- the subsample ratio (subsample) and the feature sampling ratio (colsample_bytree): both continuous range [0.60, 1.00);
- the minimum child weight (min_child_weight): integer range [1, 6);
- the split penalty coefficient (gamma): continuous range [0, 5);
- the regularization coefficients (reg_alpha) and (reg_lambda): continuous ranges [0, 1) and [0.5, 2.5), respectively;
- the class-imbalance weight (scale_pos_weight): chosen from the discrete set {5, 10, 20, 50}.

Several candidate hyperparameter combinations are generated within these ranges by random sampling. For each candidate, the training process based on the gradient boosting framework described above is repeated. That is, the objective function is minimized iteratively under that hyperparameter configuration to obtain the model parameters. The model's recognition performance is then evaluated by five-fold stratified cross-validation. The samples are stratified by drinking label, and the training data are divided into five mutually exclusive subsets. In each round, one fold is used as the validation set and the remaining four folds as the training set. This is repeated five times to complete training and validation.
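The five-fold stratified split described above can be sketched without external libraries. The indices of each class are shuffled and dealt round-robin into the folds. Each fold therefore keeps roughly the overall ratio of drinking to non-drinking samples. The seed and the function name are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k mutually exclusive folds stratified by label.

    Returns one (train_indices, validation_indices) pair per fold: each fold is
    used once for validation while the other k - 1 folds form the training set.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of this class, then deal them round-robin so
        # every fold receives an (almost) equal share of the class.
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    splits = []
    for v in range(k):
        validation = sorted(folds[v])
        train = sorted(i for f, fold in enumerate(folds) if f != v for i in fold)
        splits.append((train, validation))
    return splits
```

With 10 drinking and 40 non-drinking samples, every validation fold receives 2 and 8 of them, respectively. For real use, scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provides an equivalent, well-tested split.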

[0101] During the validation phase, the recall of the drinking category (true label 1) is used as the primary evaluation metric. Let TP be the number of samples that are actually drinking and are identified as drinking by the model, and FN the number of samples that are actually drinking but are identified as not drinking. Recall is then computed as:

[0102] $\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (14);

[0103] For each candidate hyperparameter combination, the average recall over the five folds of stratified cross-validation is calculated, and the combination with the highest average recall is selected as the final hyperparameter configuration. The model is then retrained on all training data under this configuration to obtain the final drinking recognition model:
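Equation (14) and the selection rule above can be written as a short Python sketch. One assumption is not covered by the text: a fold with no drinking samples has no defined recall, and this sketch reports 0.0 for it. Stratified folds normally avoid that case. The helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, Hashable, Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall of the drinking class (label 1): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # A fold without any drinking samples has no defined recall; report 0.0 here.
    return tp / (tp + fn) if (tp + fn) else 0.0


def select_best(fold_recalls: Dict[Hashable, Sequence[float]]) -> Hashable:
    """Return the candidate whose mean recall over the folds is highest (first one on ties)."""
    return max(fold_recalls, key=lambda cand: mean(fold_recalls[cand]))
```

For example, `select_best({"a": [0.80, 0.90], "b": [0.85, 0.95]})` returns `"b"`. For binary labels, scikit-learn's `recall_score` with its default `pos_label=1` computes the same quantity. Selecting on recall alone favours configurations that miss few drinking events. Precision is not part of this criterion.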

[0104] $F^{*}(\mathbf{x}) = \sum_{k=1}^{K} f_k^{*}(\mathbf{x})$ (15);

[0105] where $F^{*}(\mathbf{x})$ is the raw output function of the final drinking recognition model obtained after training (i.e., the model output before Sigmoid mapping), and $f_k^{*}(\mathbf{x})$ is the output value of the k-th trained regression tree for the input feature vector $\mathbf{x}$.
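The additive form of the final model, a raw score equal to the sum of the tree outputs, can be shown with a toy Python sketch. The trees here are hypothetical depth-1 stumps with made-up thresholds and leaf values, not the trained trees of this embodiment. The feature positions assume the order listed in claim 1 (current temperature, temperature_diff1, rolling_mean_3, hour_of_day, is_morning, is_noon). That order is also an assumption.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]


def make_stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    """Hypothetical depth-1 regression tree: `left` if x[feature] < threshold, else `right`."""
    def tree(x: Sequence[float]) -> float:
        return left if x[feature] < threshold else right
    return tree


def raw_output(trees: List[Tree], x: Sequence[float]) -> float:
    """F*(x) = sum_k f_k*(x): the margin before Sigmoid mapping."""
    return sum(tree(x) for tree in trees)


# Illustrative, untrained values: in this toy, a negative temperature_diff1 raises the score.
toy_trees = [
    make_stump(feature=1, threshold=-0.5, left=1.2, right=-0.8),  # temperature_diff1
    make_stump(feature=3, threshold=12, left=-0.1, right=0.3),    # hour_of_day
]
x = [38.5, -1.0, 39.0, 14, 0, 1]  # hypothetical feature vector
score = raw_output(toy_trees, x)  # 1.2 + 0.3
```

With the xgboost library, `XGBClassifier.predict(X, output_margin=True)` returns this pre-Sigmoid margin. Note that the library also adds a global bias (`base_score`) to the sum of tree outputs.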

[0106] Step 4: After model training and hyperparameter optimization are complete, the final drinking recognition model is obtained, and the model is able to recognize the drinking status of cattle. For newly collected body temperature data, feature construction and preprocessing are performed in the same manner as in the training phase to obtain the feature vector $\mathbf{x}_t$ for the current moment $t$. Substituting $\mathbf{x}_t$ into the trained model gives the predicted probability that the animal is in a drinking state at that moment:

[0107] $p_t = \sigma\big(F^{*}(\mathbf{x}_t)\big)$ (16);

[0108] where $F^{*}(\mathbf{x}_t)$ is the raw (pre-Sigmoid) output of the final model for $\mathbf{x}_t$, and $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the Sigmoid activation function.

[0109] After the predicted probability $p_t$ of a drinking state at that moment is obtained, a predicted label is generated according to the following rule:

[0110] $\hat{y}_t = \begin{cases} 1, & p_t \ge 0.5 \\ 0, & p_t < 0.5 \end{cases}$ (17);

[0111] when $\hat{y}_t = 1$, the system determines that the moment is in a drinking state;

[0112] when $\hat{y}_t = 0$, the system determines that the moment is not in a drinking state.
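The probability mapping (16) and the decision rule (17) reduce to a few lines of Python. This is a minimal sketch. The numerically stable branch for negative margins is an implementation choice, not something the text specifies, and the function names are hypothetical.

```python
import math


def drinking_probability(raw_margin: float) -> float:
    """Eq. (16): p_t = sigmoid(F*(x_t)), computed without overflow for large |margin|."""
    if raw_margin >= 0:
        return 1.0 / (1.0 + math.exp(-raw_margin))
    z = math.exp(raw_margin)
    return z / (1.0 + z)


def drinking_label(probability: float, threshold: float = 0.5) -> int:
    """Eq. (17): 1 (drinking) if probability >= threshold, else 0 (not drinking)."""
    return 1 if probability >= threshold else 0
```

A raw margin of exactly 0 gives p = 0.5, which the "greater than or equal" rule labels as drinking. Any positive margin also maps to label 1, and any negative margin maps to label 0.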

[0113] Application Example 1

[0114] The algorithm of this invention was trained and iteratively optimized on a total of 545,952 samples, yielding the best-performing drinking recognition model. To further verify the feasibility and applicability of the algorithm in real-world scenarios, the following example uses real body temperature observations collected from one cow between 14:56 and 14:59 on June 28, 2025. These data are input into the model, and the feature construction, drinking probability calculation, and final classification results are traced step by step to verify the recognition performance of the algorithm on real data. The body temperature observations of the cow at the four consecutive sampling times are as follows:

[0115] ;

[0116] ;

[0117] ;

[0118] ;

[0119] Because the three-point rolling mean feature uses a sliding window of length 3 and the first-order difference feature depends on the body temperature at the previous moment, this embodiment gives the complete feature construction and drinking probability calculation only for the 3rd and 4th records. The 1st and 2nd records serve only as historical inputs to the rolling window, and no rolling features or classification results are output for them. Note that in actual deployment, once the monitoring system has started up and entered stable operation, enough historical body temperature observations have accumulated at any moment of classification to construct dynamic features such as the first-order difference and the three-point rolling mean, so the drinking probability and the corresponding classification result can be output for every moment. This embodiment selects only a few moments to demonstrate the calculation process, which does not affect the continuous recognition capability of the algorithm in actual deployment.

[0120] Based on the above sampling, and following the feature construction method given in step 2, the feature values are calculated step by step as follows:

[0121] 1) First-order temperature difference (temperature_diff1):

[0122] $\text{temperature\_diff1}_3 = T_3 - T_2$ (1);

[0123] $\text{temperature\_diff1}_4 = T_4 - T_3$ (2);

[0124] 2) Three-point rolling mean temperature (rolling_mean_3):

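[0125] $\text{rolling\_mean\_3}_3 = \dfrac{T_1 + T_2 + T_3}{3}$ (3);

[0126] $\text{rolling\_mean\_3}_4 = \dfrac{T_2 + T_3 + T_4}{3}$ (4);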
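The two dynamic features can be computed over a whole temperature series with the minimal Python sketch below. It uses zero-based indices, so positions 2 and 3 correspond to the 3rd and 4th records. Undefined leading values are returned as `None`. The readings in the example are hypothetical, because the embodiment's actual values are not reproduced here.

```python
from typing import List, Optional, Sequence


def temperature_diff1(temps: Sequence[float]) -> List[Optional[float]]:
    """T_t - T_{t-1}; None for the first record, which has no predecessor."""
    return [None if t < 1 else temps[t] - temps[t - 1] for t in range(len(temps))]


def rolling_mean_3(temps: Sequence[float]) -> List[Optional[float]]:
    """(T_{t-2} + T_{t-1} + T_t) / 3; None for the first two records."""
    return [None if t < 2 else (temps[t - 2] + temps[t - 1] + temps[t]) / 3
            for t in range(len(temps))]


# Hypothetical readings in deg C (not the embodiment's data).
temps = [39.0, 39.0, 36.0, 37.5]
diffs = temperature_diff1(temps)
means = rolling_mean_3(temps)
```

Here `diffs` is `[None, 0.0, -3.0, 1.5]` and `means` is `[None, None, 38.0, 37.5]`. In pandas, `Series.diff()` and `Series.rolling(window=3).mean()` produce the same columns, with `NaN` in place of `None`.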

[0127] 3) Hourly index (hour_of_day): extracting the hour from each sampling time $S_t$ gives the hour index of the four sampling times in this embodiment:

[0128] $\text{hour\_of\_day}_1 = \text{hour\_of\_day}_2 = \text{hour\_of\_day}_3 = \text{hour\_of\_day}_4 = 14$ (5);

[0129] 4) Time Period Indication: Based on the definition of time period indication characteristics, the time period indication for the four sampling times in this embodiment is as follows:

[0130] (6);

[0131] (7);

[0132] Based on the feature concatenation order given in step 3, the feature vectors for records 3 and 4 are denoted as follows:

[0133] (8);

[0134] (9);
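Assembling the feature vector for one record can be sketched in Python as follows. Two assumptions apply. First, the concatenation order follows the feature list in claim 1. Second, the morning and noon hour ranges are supplied by the caller, because their exact bounds are defined elsewhere in Step 2. The values `MORNING` and `NOON` below are placeholder ranges chosen only to exercise the code. The temperatures and the one-minute spacing of the sampling times are also hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

HourRange = Tuple[int, int]


def build_feature_vector(times: Sequence[datetime], temps: Sequence[float], t: int,
                         morning_hours: HourRange, noon_hours: HourRange) -> List[float]:
    """Assemble x_t = [T_t, temperature_diff1, rolling_mean_3, hour_of_day, is_morning, is_noon].

    `t` is a zero-based index and needs two earlier records; morning_hours and
    noon_hours are half-open [start, end) hour ranges supplied by the caller.
    """
    if t < 2:
        raise ValueError("need at least two earlier records for rolling_mean_3")
    hour = times[t].hour
    return [
        temps[t],
        temps[t] - temps[t - 1],
        (temps[t - 2] + temps[t - 1] + temps[t]) / 3,
        float(hour),
        1.0 if morning_hours[0] <= hour < morning_hours[1] else 0.0,
        1.0 if noon_hours[0] <= hour < noon_hours[1] else 0.0,
    ]


# Placeholder hour ranges and hypothetical readings, for illustration only.
MORNING: HourRange = (6, 10)
NOON: HourRange = (11, 15)
times = [datetime(2025, 6, 28, 14, 56) + timedelta(minutes=i) for i in range(4)]
temps = [39.0, 39.0, 36.0, 37.5]
x3 = build_feature_vector(times, temps, 2, MORNING, NOON)
```

Whatever the real period bounds are, the hour index for all four records in this embodiment is 14. Only the two indicator entries depend on the definitions in Step 2.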

[0135] After the above feature construction is complete, the feature vectors $\mathbf{x}_3$ and $\mathbf{x}_4$ corresponding to the 3rd and 4th sampling times are substituted into the previously trained drinking-state probability prediction model, and their predicted probabilities are calculated as follows:

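[0136] $p_t = \sigma\big(F^{*}(\mathbf{x}_t)\big) = \dfrac{1}{1 + e^{-F^{*}(\mathbf{x}_t)}},\quad t = 3, 4$ (10), where $F^{*}(\mathbf{x}_t)$ is the raw (pre-Sigmoid) output of the trained model;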

[0137] Because the XGBoost model is an ensemble of many decision trees, and its internal node split conditions, leaf node weights, and other parameters are too numerous to list, only the final predicted probabilities are shown here as an example:

[0138] (11);

[0139] (12);

[0140] After the predicted probability $p_t$ of a drinking state at each moment is obtained, a predicted label is generated according to the following rule:

[0141] $\hat{y}_t = \begin{cases} 1, & p_t \ge 0.5 \\ 0, & p_t < 0.5 \end{cases}$ (13);

[0142] Therefore:

[0143] the predicted label for the 3rd record is $\hat{y}_3 = 1$, indicating a drinking state;

[0144] the predicted label for the 4th record is $\hat{y}_4 = 0$, indicating a non-drinking state.

[0145] The above descriptions are merely some specific embodiments of the present invention. Commonly known details or common knowledge in the solutions are not described in detail here (including but not limited to abbreviations, acronyms, units commonly used in the art, experimental methods, parameter conditions, etc.). It should be noted that the above embodiments do not limit the present invention in any way. For those skilled in the art, any technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention. The scope of protection claimed in this application should be determined by the content of its claims, and the specific embodiments described in the specification can be used to interpret the content of the claims.

Claims

1. A method for intelligent identification of cattle drinking status based on rumen temperature characteristics and XGBoost, characterized in that the method includes the following steps: Step 1) Data acquisition: a capsule sensor is implanted in the rumen of cattle to collect rumen temperature time-series data and record the time of each sampling moment. Step 2) Feature construction: based on the collected temperature and time data, key features are generated, including the current rumen temperature, the temperature difference between adjacent time points, the average temperature of the last 3 time points, the hour of the sampling time, whether it is morning, and whether it is noon. Step 2) includes assigning a drinking category label $y_t \in \{0, 1\}$ to each record, where $y_t = 1$ indicates that the animal is drinking at that moment and $y_t = 0$ indicates that it is not drinking at that moment. Let the body temperature sequence of a cow during the observation period be denoted as $\{(S_t, T_t)\}_{t=1}^{N}$, where $S_t$ is the sampling time of the t-th record, $T_t$ is the corresponding rumen temperature, and N is the number of temperature sampling points of the cow during that period. The following feature variables are constructed: a) The first-order temperature difference temperature_diff1, which characterizes the instantaneous rise or fall in temperature between adjacent sampling times and reflects the intensity and direction of local changes in rumen temperature over a short time scale: $\text{temperature\_diff1}_t = T_t - T_{t-1},\ t \ge 2$ (1); b) The three-point rolling mean temperature rolling_mean_3, which characterizes the average temperature level within the local window formed by the current sampling time and the two sampling times preceding it; starting from the third sampling point, it is defined as: $\text{rolling\_mean\_3}_t = \dfrac{T_{t-2} + T_{t-1} + T_t}{3},\ t \ge 3$ (2); c) The intra-day hour index hour_of_day, extracted from the sampling time $S_t$ to characterize the position of the sample within the day: $\text{hour\_of\_day}_t = \mathrm{hour}(S_t)$ (3); d) Time-period indicators, divided into a morning indicator (is_morning) and a midday indicator (is_noon), which reflect the distribution of drinking behavior across typical time periods, defined as: (4); (5). Step 3) Model training: the feature data and labeled drinking status are combined into a training set, and the XGBoost binary classification algorithm is used to build the model; the model hyperparameters are optimized by random search, and the optimal parameters are selected by five-fold stratified cross-validation. Step 4) State recognition: for newly collected temperature data, feature vectors are constructed according to the same rules and input into the trained model to calculate the probability of drinking; if the probability is ≥ 0.5, the state is determined to be drinking, and if the probability is < 0.5, the state is determined to be not drinking.

2. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 1, characterized in that, Step 3) includes step 3.1), which constructs a gradient boosting tree model composed of multiple regression trees, and maps the model output to the drinking status prediction probability through the Sigmoid activation function, thereby realizing the modeling of cattle drinking identification based on temperature and time features.

3. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 2, characterized in that step 3.1) uses the following formula: $F(\mathbf{x}) = \sum_{k=1}^{K} f_k(\mathbf{x})$ (8); where K is the number of regression trees and $f_k(\mathbf{x})$ is the output of the k-th regression tree for the input feature vector $\mathbf{x}$; for any feature vector $\mathbf{x}$, the predicted probability that it is in a drinking state at the corresponding time is defined as: $p(\mathbf{x}) = \sigma\big(F(\mathbf{x})\big)$ (9); where $\sigma(\cdot)$ is the Sigmoid activation function.

4. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 1, characterized in that, Step 3) includes step 3.2), which uses the logarithmic loss function to measure the deviation between the model output and the actual drinking water state, constructs an overall optimization objective function, and uses the gradient boosting framework to iteratively minimize the objective function to update the model parameters, thereby achieving the solution of the optimal model parameters.

5. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 4, characterized in that step 3.2) uses the logarithmic loss function as the training objective to measure the deviation between the model output and the actual drinking state, with the specific expression: $l(y_i, p_i) = -\left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]$ (10); where $p_i = \sigma\big(F(\mathbf{x}_i)\big)$ is the model's predicted probability of a drinking state at the i-th sampling time, $y_i$ is the corresponding true label, and $l(y_i, p_i)$ is the log loss of the i-th sample, which measures the deviation between the predicted probability and the true label; at the same time, to control the complexity of the regression tree structure and suppress overfitting, a regularization term is introduced for each regression tree, defined as: $\Omega(f_k) = \gamma J_k + \frac{1}{2}\lambda \sum_{j=1}^{J_k} w_{kj}^{2}$ (11); where $\Omega(f_k)$ is the regularization term of the k-th regression tree, which penalizes its structural complexity in the optimization objective; $J_k$ is the number of leaf nodes in the k-th regression tree; $w_{kj}$ is the output weight of the j-th leaf node in the k-th tree; $\gamma$ is the penalty coefficient for the number of leaf nodes; and $\lambda$ is the L2 regularization coefficient for the leaf node weights. Combining the empirical loss term and the regularization term gives the overall optimization objective of the model: $\mathcal{L} = \sum_{i=1}^{n} l(y_i, p_i) + \sum_{k=1}^{K} \Omega(f_k)$ (12); where $\mathcal{L}$ is the overall optimization objective for model training and n is the number of training samples. The gradient boosting framework is used to minimize this objective iteratively: letting the model obtained in the (K-1)-th iteration be $F^{(K-1)}(\mathbf{x})$, a new regression tree $f_K(\mathbf{x})$ is added in the K-th iteration and the model is updated as: $F^{(K)}(\mathbf{x}) = F^{(K-1)}(\mathbf{x}) + f_K(\mathbf{x})$ (13); this process is repeated iteratively until the objective function converges, yielding the optimal model parameters for drinking identification.

6. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 1, characterized in that step 3) includes step 3.3), in which value ranges are preset for the key hyperparameters of the XGBoost model and multiple candidate hyperparameter combinations are generated by random sampling; each combination is trained by iteratively minimizing the objective function within the gradient boosting framework; then, through five-fold stratified cross-validation, the average performance of each combination is calculated with the recall of the drinking state as the core evaluation metric; the hyperparameter configuration with the highest average recall is selected, and the final drinking recognition model is obtained by training on all the training data.

7. The intelligent identification method for bovine drinking status based on rumen temperature characteristics and XGBoost according to claim 6, characterized in that step 3.3) involves setting the following search ranges for XGBoost: the number of trees n_estimators takes integer values in [200, 1000); the maximum depth max_depth takes integer values in [5, 15); the learning rate learning_rate takes continuous values in [0.01, 0.31); the subsample ratio subsample and the feature sampling ratio colsample_bytree both take continuous values in [0.60, 1.00); the minimum child weight min_child_weight takes integer values in [1, 6); the split penalty coefficient gamma takes continuous values in [0, 5); the regularization coefficients reg_alpha and reg_lambda take continuous values in [0, 1) and [0.5, 2.5), respectively; and the class imbalance weight scale_pos_weight is selected from the discrete set {5, 10, 20, 50}. Several candidate hyperparameter combinations are generated by random sampling within these ranges. For each candidate combination, the aforementioned training process based on the gradient boosting framework is repeated, that is, the objective function is iteratively minimized under that hyperparameter configuration to obtain a set of model parameters. Five-fold stratified cross-validation is then used to evaluate the model's recognition performance: the samples are stratified by drinking label, the training data is divided into five mutually exclusive subsets, one subset is selected as the validation set each time and the remaining four are used as the training set, and this process is repeated five times. During the validation phase, the recall of the drinking category is used as the primary evaluation metric. Let TP be the number of samples that were actually drinking and were correctly identified as such by the model, and FN the number of samples that were actually drinking but were incorrectly identified as not drinking. The formula is: $\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (14); for each candidate hyperparameter combination, its average recall in five-fold stratified cross-validation is calculated, and the combination with the highest average recall is selected as the final hyperparameter configuration; under this configuration, the model is trained on all training data to obtain the final drinking recognition model: $F^{*}(\mathbf{x}) = \sum_{k=1}^{K} f_k^{*}(\mathbf{x})$ (15); where $F^{*}(\mathbf{x})$ is the raw output function of the final drinking recognition model obtained after training, i.e., the model output before Sigmoid mapping, and $f_k^{*}(\mathbf{x})$ is the output value of the k-th trained regression tree for the input feature vector $\mathbf{x}$.

8. The intelligent identification method for cattle drinking status based on rumen temperature characteristics and XGBoost according to claim 1, characterized in that step 4) involves obtaining the final drinking recognition model after model training and hyperparameter optimization are complete, at which point the model is able to recognize the drinking status of cattle; for newly collected body temperature data, feature construction and preprocessing are performed in the same manner as in the training phase to obtain the feature vector $\mathbf{x}_t$ at the current moment; substituting $\mathbf{x}_t$ into the trained model gives the predicted probability of a drinking state at that moment: $p_t = \sigma\big(F^{*}(\mathbf{x}_t)\big)$ (16); where $F^{*}(\mathbf{x}_t)$ is the raw output of the trained model and $\sigma(\cdot)$ is the Sigmoid activation function; after the predicted probability is obtained, a predicted label is generated according to the following rule: $\hat{y}_t = 1$ if $p_t \ge 0.5$, and $\hat{y}_t = 0$ if $p_t < 0.5$ (17); when $\hat{y}_t = 1$, the system determines that the moment is in a drinking state; when $\hat{y}_t = 0$, the system determines that the moment is not in a drinking state.