Unsupervised live pig health anomaly real-time detection method and system based on monitoring video
By employing an unsupervised method for detecting abnormal pig health, this paper utilizes ModNet and multi-frame optical flow estimation algorithms to extract pig movement features and constructs a model for detecting abnormal pig health. This method addresses the issues of data requirements and environmental adaptability in pig health detection, achieving high-accuracy detection and early warning in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTH CHINA AGRICULTURAL UNIVERSITY
- Filing Date
- 2024-03-26
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies require a large amount of tagged data for detecting abnormal health in pigs, and the complex monitoring environment of farms makes it impossible to accurately extract pig characteristics, reducing detection accuracy. Furthermore, existing methods lack generalization ability across different pig houses.
An unsupervised real-time detection method for abnormal pig health is adopted. The ModNet neural network model is used for pig instance segmentation. Combined with a motion information reconstruction module and a health anomaly detection module, the motion features of pigs are extracted through a multi-frame optical flow estimation algorithm and a memory network to construct a pig health anomaly detection model, automatically set health warning thresholds and perform real-time detection.
It improves the accuracy of detecting abnormal health conditions in pigs, can accurately extract the movement characteristics of pigs in complex and ever-changing monitoring environments, reduces errors, achieves sensitive and accurate early warning of health abnormalities, and adapts to environmental changes in different pig houses.
Figure CN118196709B_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, specifically to a method and system for unsupervised real-time detection of pig health abnormalities based on surveillance video.
Background Technology
[0002] Abnormal behavior detection plays a crucial role in swine health management. By identifying pig behaviors and collecting behavioral data, the correlation between this data and the pigs' physiological state can be studied to assess their health status. This allows pigs exhibiting abnormal behavior to be detected in time and given physical examinations, thereby reducing losses for pig farms and preventing sick pigs from entering the market. Existing methods for detecting abnormal swine health patterns fall into two main categories: manual detection and intelligent detection. Traditional manual detection relies on periodic manual inspections of pigs; it is susceptible to subjective bias and cannot provide accurate real-time detection. Intelligent detection encompasses two approaches. One equips pigs with sensors that record parameters such as acceleration and movement trajectories, and machine learning methods then process these parameters to determine whether the behavior indicates an abnormal health condition. However, this approach is invasive, and battery replacement is difficult, which hinders efficient anomaly detection. The other approach uses non-contact sensors such as cameras, combined with computer vision technology, to detect abnormal behavior in captured video footage.
[0003] Many scholars have used the latter approach in their research. Xue Yueju et al. used an improved Faster R-CNN algorithm to identify feeding and mating behaviors in pig farm videos and analyzed the relationship between the recognized behaviors and pig health. Li Dan et al. improved Mask R-CNN and applied it to experiments on pig mounting and other behaviors. Chen et al. extracted pig acceleration features by analyzing the displacement changes of pig targets between adjacent keyframe images and used hierarchical clustering to classify the degree of pig aggression. However, these are all supervised learning methods for specific abnormal behaviors. Each abnormal behavior requires a separate detection algorithm, which limits the applicability and generalization ability of supervised learning methods in large-scale farming scenarios. Furthermore, obtaining a large number of labeled abnormal samples is difficult and expensive. Wutke et al. used unsupervised neural networks to detect the activity behavior of pig herds and classified the activity scores of pig herds based on thresholds to achieve behavioral monitoring and analysis of sick pigs. However, this method has low real-time performance, and the selection of thresholds may be affected by many factors. In addition, this method ignores the importance of motion features in abnormal pig health patterns. Finally, methods based on surveillance video are susceptible to interference from the pigsty background, which varies from pigsty to pigsty in layout, floor color, and equipment placement. This variability makes model generalization harder and can lead to performance degradation or misjudgments when a model is applied to different pigsties.
Summary of the Invention
[0004] To address the aforementioned shortcomings in existing technologies, this invention provides a method and system for unsupervised real-time detection of pig health abnormalities based on surveillance video. It solves the problems that detecting pig behavior requires a large amount of labeled data, that the complex monitoring environment of the farm makes it difficult to accurately extract pig features, and that existing methods do not focus on the more obvious pig movement features, all of which reduce detection accuracy.
[0005] To achieve the above-mentioned objectives, the technical solution adopted by this invention is as follows:
[0006] A method for unsupervised real-time detection of pig health abnormalities based on surveillance video is provided, which includes the following steps:
[0007] S1. Acquire and store real-time video data from the pigsty;
[0008] S2. Construct a pig health anomaly detection model; the pig health anomaly detection model includes a pig instance segmentation module, a motion information reconstruction module, and a pig health anomaly detection module;
[0009] S3. Input the pigsty video data at the current moment into the pig instance segmentation module to obtain video frame data containing only pig instances at the current moment; wherein, the video frame data includes frame data to be predicted and real frame data;
[0010] S4. Input the frame data to be predicted at the current moment into the motion information reconstruction module to obtain the reconstructed pig motion information data at the current moment;
[0011] S5. Input the current pigsty video data, the current reconstructed pig movement information data, and the current real frame data into the pig health abnormality detection module to complete the real-time detection of pig health abnormalities.
[0012] Furthermore, the pig instance segmentation module adopts the ModNet neural network model; the motion information reconstruction module includes a motion optical flow generation submodule, a motion information reconstruction encoder, a motion information memory submodule, and a motion information reconstruction decoder connected in series; the motion information reconstruction encoder includes three stacked convolutional blocks; the motion information memory submodule adopts a memory network model; the motion information reconstruction decoder includes three stacked deconvolutional blocks; both the deconvolutional blocks and the convolutional blocks include convolutional layers, batch normalization layers, and ReLU activation layers connected in series.
[0013] The swine health anomaly detection module includes a future frame prediction submodule, a dynamic health warning threshold selection submodule, a health score generation submodule, and a swine health anomaly warning submodule. The future frame prediction submodule includes an encoding layer, a feature fusion layer, and a decoder connected in series. The encoding layer includes a parallel prior encoder and a posterior encoder. Both the prior encoder and the posterior encoder include a convolutional layer, a batch normalization layer, and a ReLU activation layer connected in series.
[0014] Furthermore, the pigsty video data in step S1 includes six frames of video data, that is, the pigsty video data at each moment includes six frames of video data, and is processed as a unit.
[0015] The specific process of step S3 is as follows:
[0016] The current pigsty video data is input into the pig instance segmentation module, and the ModNet neural network model separates the foreground from the background of the video to obtain the video frame data $I_{t-5:t}$ containing only pig instances at the current moment. The first five frames of $I_{t-5:t}$ are used as the frame data to be predicted, $I_{t-5:t-1}$, and the sixth frame of $I_{t-5:t}$ is used as the real frame data $I_t$, where t is the current time.
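The split of a six-frame processing unit into five frames to be predicted and one real frame can be sketched as follows. This is a minimal illustration, not the invention's implementation: the function names `apply_mask` and `split_unit` are hypothetical, frames are toy grayscale grids, and the binary mask merely stands in for a ModNet foreground output.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # a grayscale frame as rows of pixel values


def apply_mask(frame: Frame, mask: Sequence[Sequence[int]]) -> Frame:
    """Keep pig (foreground) pixels and zero out background pixels."""
    return [[p if m else 0.0 for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]


def split_unit(frames: Sequence[Frame]) -> Tuple[List[Frame], Frame]:
    """Split a six-frame unit I_{t-5:t} into I_{t-5:t-1} and I_t."""
    if len(frames) != 6:
        raise ValueError("a processing unit must contain exactly six frames")
    return list(frames[:5]), frames[5]


# Toy 2x2 frames; the mask is a made-up stand-in for a segmentation result.
unit = [[[float(k), 1.0], [2.0, 3.0]] for k in range(6)]
mask = [[1, 0], [1, 1]]
masked = [apply_mask(f, mask) for f in unit]
to_predict, real = split_unit(masked)
```

Here `to_predict` holds the five masked frames that feed the motion information reconstruction module, and `real` is the masked sixth frame kept for scoring.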
[0017] Furthermore, step S4 further includes:
[0018] S4-1. Input the current frame data to be predicted into the motion optical flow generation submodule and process it using a multi-frame optical flow estimation algorithm to obtain the current pig movement information $y_{t-5:t-1}$;
[0019] S4-2. Input the current pig movement information $y_{t-5:t-1}$ into the motion information reconstruction encoder for encoding to obtain the current moment's pig motion feature vector;
[0020] S4-3. Input the current pig movement feature vector into the memory network model to obtain the optimized pig movement feature vector at the current time.
[0021] S4-4. Input the optimized pig movement feature vector at the current moment into the movement information reconstruction decoder for decoding to obtain the reconstructed pig movement information data at the current moment.
[0022] Furthermore, the formula for the memory network model in step S4-3 is:
[0023] $$\hat{Z} = W M = \sum_{i=1}^{N} w_i m_i$$
[0024] $$w_i = \frac{\exp\left(d(Z, m_i)\right)}{\sum_{n=1}^{N} \exp\left(d(Z, m_n)\right)}$$
[0025] $$d(Z, m_i) = \frac{Z m_i^{T}}{\|Z\| \, \|m_i\|}$$
[0026] where $\hat{Z}$ is the optimized pig movement feature vector at the current moment, $Z$ is the pig movement feature vector at the current moment and $\|Z\|$ is its magnitude, $W$ is the addressing matrix, $M$ is a four-dimensional real matrix, $w_i$ is the i-th element of the addressing matrix $W$, $m_i$ and $m_i^{T}$ are the i-th row vector of the four-dimensional real matrix and its transpose, respectively, $\|m_i\|$ is the magnitude of the i-th row vector of the four-dimensional real matrix, $N$ is the total number of rows in the four-dimensional real matrix, $m_n$ is the n-th row vector of the four-dimensional real matrix, $d(\cdot)$ is the cosine similarity function, $\exp(\cdot)$ is the exponential function with the natural constant e as the base, and $\sum(\cdot)$ is the summation function.
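The addressing step described here (cosine similarity between the query and each memory row, exponential normalization into addressing weights, then a weighted read) can be sketched in a few lines. This is an illustrative sketch under assumptions: the names `cosine`, `address` and `read` are hypothetical, and each memory entry is flattened to a one-dimensional vector for simplicity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity d(Z, m_i) between a query and one memory row."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def address(z: Sequence[float], memory: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over cosine similarities: one weight w_i per memory row."""
    scores = [math.exp(cosine(z, m)) for m in memory]
    total = sum(scores)
    return [s / total for s in scores]


def read(z: Sequence[float], memory: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of memory rows, i.e. the optimized feature vector."""
    w = address(z, memory)
    dim = len(memory[0])
    return [sum(w_i * m[k] for w_i, m in zip(w, memory)) for k in range(dim)]


# Two toy memory rows (made-up values) and a query aligned with the first row.
memory = [[1.0, 0.0], [0.0, 1.0]]
weights = address([1.0, 0.0], memory)
z_hat = read([1.0, 0.0], memory)
```

Because the output is always a combination of stored rows, a query that resembles no stored healthy pattern cannot be reproduced closely, which is what makes the reconstruction error informative.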
[0027] Furthermore, step S5 further includes:
[0028] S5-1. Input the reconstructed pig movement information data at the current moment into the prior encoder to obtain the prior feature vector at the current moment;
[0029] S5-2. Input the current pigsty video data and the current reconstructed pig movement information data into the posterior encoder to obtain the current hybrid feature vector;
[0030] S5-3. Input the prior feature vector and the mixed feature vector at the current time into the feature fusion layer for fusion to obtain the prediction frame feature map at the current time.
[0031] S5-4. Input the feature map of the predicted frame at the current time into the decoder and process it by upsampling layer by layer to obtain the predicted frame data at the current time.
[0032] S5-5. Input the predicted frame data and the real frame data at the current time into the health score generation submodule, and apply the formula:
[0033] $$\mathrm{PSNR}(I_t, \hat{I}_t) = 10 \log_{10} \frac{\left[\max_{\hat{I}_t}\right]^{2}}{\frac{1}{m n'} \sum_{i=1}^{m} \sum_{j=1}^{n'} \left(I_{i,j} - \hat{I}_{i,j}\right)^{2}}$$
[0034] to obtain the frame-level pig health score at the current moment, where $\hat{I}_t$ is the predicted frame data at the current time, $\hat{I}_{i,j}$ is the predicted frame data of the pig pixel with spatial index (i,j), $I_t$ is the current real frame data, $I_{i,j}$ is the real frame data of the pig pixel with spatial index (i,j), $\log_{10}(\cdot)$ is the base-10 logarithm, $\max_{\hat{I}_t}$ is the maximum pixel value in the predicted frame data, $m$ and $n'$ are the total numbers of spatial indices, and $\sum(\cdot)$ is the summation function;
[0035] S5-6. Input the frame-level pig health score of the previous moment into the dynamic health warning threshold selection submodule to judge health anomalies, and adjust the warning threshold according to the health anomaly judgment result to obtain the dynamic health warning threshold at the current moment.
[0036] S5-7. Input the current frame-level pig health score and the current dynamic health warning threshold into the pig health abnormality warning submodule for judgment; if the current frame-level pig health score is less than the current dynamic health warning threshold, a warning message is obtained and sent to the management personnel; otherwise, monitoring is performed.
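Steps S5-5 and S5-7 can be illustrated with a small sketch: a peak signal-to-noise ratio between the real and predicted frames serves as the frame-level health score, and a score below the threshold triggers a warning. The sketch assumes the standard PSNR definition with the peak taken from the predicted frame; the function names and the threshold value of 20.0 are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[float]]


def psnr(real: Frame, pred: Frame) -> float:
    """PSNR between a real frame and a predicted frame (higher = closer)."""
    h, w = len(real), len(real[0])
    mse = sum((r - p) ** 2
              for rrow, prow in zip(real, pred)
              for r, p in zip(rrow, prow)) / (h * w)
    if mse == 0:
        return float("inf")   # perfect prediction
    peak = max(max(row) for row in pred)
    if peak == 0:
        return float("-inf")  # all-zero prediction with non-zero error
    return 10 * math.log10(peak ** 2 / mse)


def is_abnormal(score: float, threshold: float) -> bool:
    """Step S5-7: warn when the health score falls below the threshold."""
    return score < threshold


# Toy frames with made-up pixel values in [0, 1].
real = [[1.0, 0.5], [0.0, 1.0]]
good = [[1.0, 0.5], [0.1, 1.0]]  # prediction close to the real frame
bad = [[0.2, 1.0], [0.9, 0.1]]   # prediction far from the real frame

warn_good = is_abnormal(psnr(real, good), threshold=20.0)
warn_bad = is_abnormal(psnr(real, bad), threshold=20.0)
```

A well-predicted frame gets a high score and no warning, while a poorly predicted frame gets a low score and is flagged.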
[0037] Furthermore, the parameters of the future frame prediction submodule are adjusted using a total loss function consisting of an appearance loss function and an action loss function, wherein the appearance loss function includes the KL divergence minimization loss function and the gradient loss function;
[0038] The formulas for the loss functions are:
[0039] $$L = \lambda_{cvae} L_{cvae} + \lambda_{gl} L_{gl} + \lambda_{ofl} L_{ofl}$$
[0040] $$L_{cvae} = KL\left[ q\left(z \mid I_{t-5:t-1}, \hat{y}_{t-5:t-1}\right) \,\middle\|\, p\left(z \mid \hat{y}_{t-5:t-1}\right) \right] + \left\| \hat{I}_t - I_t \right\|_2$$
[0041] $$L_{gl} = \sum_{i,j} \left\| \left|\hat{I}_{i,j} - \hat{I}_{i-1,j}\right| - \left|I_{i,j} - I_{i-1,j}\right| \right\|_1 + \left\| \left|\hat{I}_{i,j} - \hat{I}_{i,j-1}\right| - \left|I_{i,j} - I_{i,j-1}\right| \right\|_1$$
[0042] $$L_{ofl} = \left\| f\left(\hat{I}_t, I_{t-1}\right) - f\left(I_t, I_{t-1}\right) \right\|_1$$
[0043] where $L_{cvae}$, $L_{gl}$, and $L_{ofl}$ are the KL divergence minimization loss function, the gradient loss function, and the action loss function, respectively; $\lambda_{cvae}$, $\lambda_{gl}$, and $\lambda_{ofl}$ are weighting parameters; $KL$ is the divergence term; $I_{t-5:t-1}$ is the frame data to be predicted; $I_t$ and $I_{t-1}$ are the current real frame data and the frame data to be predicted from the previous time step, respectively; $\hat{I}_t$ is the predicted frame data at the current time; $\hat{y}_{t-5:t-1}$ is the reconstructed pig movement information data corresponding to the frame data to be predicted $I_{t-5:t-1}$ and the real frame data $I_t$; $q(\cdot)$ is the posterior distribution given the frame data to be predicted $I_{t-5:t-1}$ and its corresponding reconstructed pig movement information data $\hat{y}_{t-5:t-1}$; $p(\cdot)$ is the prior distribution given the reconstructed pig movement information data corresponding to the real frame data $I_t$; $\|\cdot\|_1$ and $\|\cdot\|_2$ are the 1-norm and 2-norm, respectively; $|\cdot|$ is the absolute value; $\hat{I}_{i,j}$ and $\hat{I}_{i-1,j}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $I_{i,j}$ and $I_{i-1,j}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $\hat{I}_{i,j}$ and $\hat{I}_{i,j-1}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; $I_{i,j}$ and $I_{i,j-1}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; and $f(\cdot)$ is the optical flow function in FlowNet.
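A gradient loss over neighbouring pixels with spatial indices (i,j), (i-1,j) and (i,j-1) is commonly written in the gradient-difference form sketched below, together with a weighted sum of three loss terms. This is an assumption-laden illustration: the function names, the example weights and the example loss values are hypothetical, and the exact form used by the invention may differ.

```python
from typing import Sequence

Frame = Sequence[Sequence[float]]


def gradient_loss(pred: Frame, real: Frame) -> float:
    """Sum of absolute differences between neighbouring-pixel gradients."""
    h, w = len(real), len(real[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            if i > 0:  # vertical neighbours (i, j) and (i-1, j)
                gp = abs(pred[i][j] - pred[i - 1][j])
                gr = abs(real[i][j] - real[i - 1][j])
                total += abs(gp - gr)
            if j > 0:  # horizontal neighbours (i, j) and (i, j-1)
                gp = abs(pred[i][j] - pred[i][j - 1])
                gr = abs(real[i][j] - real[i][j - 1])
                total += abs(gp - gr)
    return total


def total_loss(l_kl: float, l_grad: float, l_flow: float,
               lam_kl: float = 1.0, lam_grad: float = 1.0,
               lam_flow: float = 1.0) -> float:
    """Weighted sum of the three terms; the default weights are placeholders."""
    return lam_kl * l_kl + lam_grad * l_grad + lam_flow * l_flow


# A sharp vertical edge versus a blurred (flat) prediction, made-up values.
real = [[0.0, 1.0], [0.0, 1.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
edge_penalty = gradient_loss(flat, real)
```

The flat prediction loses the edge present in the real frame, so the gradient term penalizes it even though its average intensity is the same.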
[0044] Furthermore, steps S5-6 further include:
[0045] S5-6-1. Set the frame-level pig health score of the previous moment as the initial value of the dynamic health early warning threshold.
[0046] S5-6-2. Determine whether the frame-level pig health score at the current moment is less than the initial value of the dynamic health warning threshold; if so, use the initial value of the dynamic health warning threshold as the dynamic health warning threshold in the current state; otherwise, proceed to step S5-6-3.
[0047] S5-6-3, According to the formula:
[0048]
[0049]
[0050]
[0051] The initial value of the dynamic health warning threshold is adjusted to obtain the dynamic health warning threshold $DS_t$ under the current state, where $\mu_t$ and $\sigma_t$ are the mean and variance of the peak signal-to-noise ratio within the window, $T$ is the sliding window length, $k$ is the sequence number of the frame data to be predicted in the current state, $t-1$ denotes the (t-1)-th frame of the video frame data, and $\mathrm{PSNR}(I_k, \hat{I}_t)$ is the peak signal-to-noise ratio between the k-th frame $I_k$ of the frame data to be predicted in the current state and the predicted frame data $\hat{I}_t$ of frame t.
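A sliding-window threshold driven by the mean and variance of recent health scores can be sketched as below. The rule that combines the two statistics is an assumption made purely for illustration (mean minus a multiple of the standard deviation); the class name `DynamicThreshold` and the parameter `alpha` are hypothetical and do not come from the text.

```python
from collections import deque
from statistics import fmean, pvariance


class DynamicThreshold:
    """Keep the last `window` scores and derive a warning threshold."""

    def __init__(self, window: int, alpha: float = 1.0) -> None:
        self.scores = deque(maxlen=window)  # sliding window of length T
        self.alpha = alpha                  # assumed sensitivity factor

    def update(self, score: float) -> float:
        """Add a new score and return the threshold for the current state."""
        self.scores.append(score)
        mu = fmean(self.scores)             # window mean
        var = pvariance(self.scores, mu)    # window (population) variance
        # Assumed rule: tolerate scores within alpha standard deviations.
        return mu - self.alpha * var ** 0.5


tracker = DynamicThreshold(window=3)
history = [tracker.update(s) for s in (30.0, 32.0, 34.0)]
```

Small fluctuations in the score widen the tolerated band slightly instead of triggering a warning, which is the kind of noise robustness a windowed threshold is meant to provide.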
[0052] An unsupervised real-time detection system for abnormal pig health based on surveillance video is provided, comprising a video data acquisition module, a pig instance segmentation module, a motion information reconstruction module, and a pig health abnormality detection module; wherein:
[0053] The video data acquisition module is used to monitor the pigsty in real time through cameras, acquire video data of the pigsty, and transmit it to the video recorder for local storage using a switch;
[0054] The pig instance segmentation module is used to segment the pigsty video data in the current state to obtain video frame data containing only pig instances;
[0055] The motion information reconstruction module is used to reconstruct video frame data containing only live pig instances to obtain reconstructed live pig motion information data.
[0056] The swine health anomaly detection module is used to detect reconstructed swine movement information data, video frame data containing only swine instances, and swine house video data, obtain early warning information, and notify management personnel.
[0057] The beneficial effects of this invention are as follows:
[0058] 1. This method constructs a model for detecting abnormal health in pigs and uses models and algorithms such as ModNet neural network model, memory network and multi-frame optical flow estimation algorithm to detect healthy pig samples. It accurately extracts the movement and appearance features of pigs to obtain health scores, reduces model errors, and further improves the accuracy of model detection. It also makes up for the lack of abnormal health samples in pigs, and sends early warning information to farmers in a timely manner to assist in coordination and treatment. It can be applied to pig houses in complex and ever-changing monitoring environments.
[0059] 2. This method employs a pig instance segmentation module, which can accurately extract pig objects from video frames and separate them from the background to obtain video frames containing only pig instances. This solves the problem of performance degradation or misjudgment caused by the complex and ever-changing monitoring environment of farms when the model is applied to different pig houses.
[0060] 3. This method constructs a model that learns the movement features of healthy and abnormal behavior patterns through a movement information reconstruction module. By utilizing reconstruction errors, the model focuses more on the detection of movement features, making the early warning of abnormal pig health more sensitive and accurate. It also uses a memory network to memorize and reconstruct movement information under healthy pig behavior patterns, and can reduce the possibility of incorrectly reconstructing the movement features of abnormal pigs as those of healthy pigs, further improving the accuracy of detecting abnormal pig behavior.
[0061] 4. This method uses the pig health anomaly detection module and the reconstructed pig movement features from the motion information reconstruction module as guidance to accurately distinguish between healthy and abnormally healthy pigs. By automatically calculating the peak signal-to-noise ratio between the predicted frame and the real frame containing pig instances, the method assesses the degree of difference between the predicted and real pig frames, thereby quantifying the health status and time of the pigs. A data-driven approach is used to automatically set the health warning threshold, and the threshold is fine-tuned through a sliding window to solve the false alarms caused by minor changes in pixel values or features between video frames due to noise in the pigsty.
[0062] 5. The system has a simple structure and detects abnormal pig behaviors through a video data acquisition module, a pig instance segmentation module, a motion information reconstruction module, and a pig health abnormality detection module, which makes it flexible and adaptable.
Attached Figure Description
[0063] Figure 1 is a flowchart of the method of the present invention;
[0064] Figure 2 is a flowchart of the model of the present invention;
[0065] Figure 3 is a structure and flow diagram of the motion information reconstruction module;
[0066] Figure 4 is a schematic diagram illustrating the structure and addressing process of the motion information memory submodule;
[0067] Figure 5 is a structure and flow diagram of the swine health abnormality detection module;
[0068] Figure 6 is a graph showing the results of the swine health anomaly detection module.
Detailed Implementation
[0069] The specific embodiments of the present invention are described below to enable those skilled in the art to understand the present invention. However, it should be understood that the present invention is not limited to the scope of the specific embodiments. For those skilled in the art, various changes are obvious as long as they are within the spirit and scope of the present invention as defined and determined by the appended claims. All inventions utilizing the concept of the present invention are protected.
[0070] As shown in Figure 1 and Figure 2, a method for unsupervised real-time detection of pig health abnormalities based on surveillance video includes the following steps:
[0071] S1. Real-time acquisition and storage of pigsty video data; The pigsty video data in step S1 includes six frames of video data, that is, the pigsty video data at each moment includes six frames of video data, and is treated as a processing unit.
[0072] S2. Construct a pig health anomaly detection model; the pig health anomaly detection model includes a pig instance segmentation module, a motion information reconstruction module, and a pig health anomaly detection module;
[0073] Because different pig houses have different background conditions, including layout, floor color, and equipment placement, these differences increase the difficulty of model generalization, and may lead to performance degradation or misclassification when applied to different pig houses. The model should focus on learning the features of the foreground region, accurately classifying the complex and varied morphology of pigs, and identifying and locating each pig in the herd. Therefore, the pig instance segmentation module adopts the ModNet neural network model.
[0074] The motion information reconstruction module comprises a motion optical flow generation submodule, a motion information reconstruction encoder, a motion information memory submodule, and a motion information reconstruction decoder, all connected in series. The motion information reconstruction encoder consists of three stacked convolutional blocks. The motion information memory submodule employs a memory network model. The motion information reconstruction decoder consists of three stacked deconvolutional blocks. Both the deconvolutional and convolutional blocks consist of convolutional layers, batch normalization layers, and ReLU activation layers connected in series. The kernel size of all convolutional layers is fixed at 3*3.
[0075] The swine health anomaly detection module includes a future frame prediction submodule, a dynamic health warning threshold selection submodule, a health score generation submodule, and a swine health anomaly warning submodule. The future frame prediction submodule includes an encoding layer, a feature fusion layer, and a decoder connected in series. The encoding layer includes a parallel prior encoder and a posterior encoder. Both the prior encoder and the posterior encoder include a convolutional layer, a batch normalization layer, and a ReLU activation layer connected in series.
[0076] S3. Input the pigsty video data at the current moment into the pig instance segmentation module to obtain video frame data containing only pig instances at the current moment; wherein, the video frame data includes frame data to be predicted and real frame data;
[0077] The specific process of step S3 is as follows:
[0078] The trained ModNet neural network model is used to perform foreground-background separation on each processing unit to obtain the corresponding video frame data $I_{t-5:t}$ containing only pig instances. The first five frames of $I_{t-5:t}$ are used as the frame data to be predicted, $I_{t-5:t-1}$, which is input into the motion information reconstruction module and the future frame prediction submodule for processing. The sixth frame of $I_{t-5:t}$ is used as the real frame data $I_t$, which is input into the health score generation submodule to generate a health score; here t denotes a frame.
[0079] S4. Input the current frame data to be predicted into the motion information reconstruction module to obtain the reconstructed pig motion information data for the current moment. Since pig motion features are often easier to detect than appearance features, the motion information reconstruction module extracts and reconstructs the pig motion information. The motion information reconstruction module reconstructs the motion information of pigs under healthy behavioral patterns such as standing, sleeping, and eating, but it will produce a large reconstruction error for the motion information under abnormal health patterns such as convulsions, aggression, and diarrhea. By amplifying the prediction error through reconstruction error, it provides the pig health abnormality detection module with motion information references that clearly distinguish the behavioral differences between healthy and abnormal pigs. During the training phase, only video frames containing only healthy pigs are needed for training, without additional annotation. The motion information memory submodule will only remember the behavioral patterns of healthy pigs. During the testing phase, video frames containing only pig images are input into the motion optical flow generation submodule to output pig motion information. The obtained pig motion information is input into the motion information reconstruction encoder, and the motion information memory module is used to remember the motion information of healthy pigs at different feature levels.
[0080] As shown in Figure 3, step S4 further includes:
[0081] S4-1. Input the current frame data to be predicted into the motion optical flow generation submodule and process it using a multi-frame optical flow estimation algorithm to obtain the current pig movement information $y_{t-5:t-1}$. The frame data to be predicted is processed to calculate the displacement and direction of movement of the pig pixels between frames. For each frame, there is a corresponding optical flow field representing the pixel displacement between that frame and the previous frame. These optical flow fields constitute the pig movement information $y_{t-5:t-1}$. Each item of pig movement information is a three-dimensional matrix.
[0082] Optical flow is used to describe image motion, capturing motion patterns in an image sequence by inferring the direction and velocity of motion of each pixel. In healthy pig behavior patterns, the optical flow of pigs typically exhibits relatively small amplitude and a consistent direction. This is because when pigs walk or turn normally, their optical flow vectors usually show some consistency in the corresponding motion direction. However, abnormal health behaviors often involve larger optical flow amplitudes and inconsistent motion directions. For example, when pigs convulse or attack, their optical flow vectors may show drastic changes and inconsistent directions. Therefore, the model introduces a multi-frame optical flow estimation algorithm to extract pig motion features. The algorithm takes a series of consecutive image frames as input and outputs the optical flow field corresponding to each frame. By utilizing temporal continuity and spatial consistency of optical flow, the multi-frame optical flow estimation algorithm can jointly estimate optical flow at multiple time steps, thus providing more accurate optical flow results.
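The contrast drawn here between small, consistently directed flow and large, scattered flow can be made concrete with two summary statistics of a flow field. The sketch is only an illustration of that intuition, not part of the described model: the function name `flow_stats`, the consistency measure (length of the mean unit vector) and the sample vectors are all assumptions.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]  # per-pixel displacement (dx, dy)


def flow_stats(flow: Sequence[Vector]) -> Tuple[float, float]:
    """Return (mean magnitude, direction consistency in [0, 1])."""
    if not flow:
        raise ValueError("flow field must contain at least one vector")
    mags = [math.hypot(dx, dy) for dx, dy in flow]
    mean_mag = sum(mags) / len(mags)
    # Average the unit vectors: length 1 means one shared direction,
    # length near 0 means the directions cancel each other out.
    units = [(dx / m, dy / m) for (dx, dy), m in zip(flow, mags) if m > 0]
    if not units:
        return mean_mag, 1.0  # no motion at all counts as consistent
    rx = sum(u[0] for u in units) / len(units)
    ry = sum(u[1] for u in units) / len(units)
    return mean_mag, math.hypot(rx, ry)


# Made-up flow vectors: steady walking versus erratic movement.
walking = [(1.0, 0.0), (1.0, 0.1), (0.9, 0.0), (1.1, -0.1)]
erratic = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
walk_mag, walk_consistency = flow_stats(walking)
err_mag, err_consistency = flow_stats(erratic)
```

The walking sample yields a small magnitude with consistency close to 1, while the erratic sample yields a larger magnitude with consistency close to 0.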
[0083] S4-2. Input the current pig movement information $y_{t-5:t-1}$ into the motion information reconstruction encoder for encoding, yielding the current moment's pig motion feature vector. In Figure 3, each cube represents the output feature map of the corresponding layer. Downsampling is achieved through convolutions with a stride of 2. The motion information reconstruction encoder is denoted as $F_{en}(\cdot): y_i \to Z$, where $y_i$ is the i-th data item of the pig movement information $y_{t-5:t-1}$, $Z$ is the pig movement feature vector, and $F_{en}(\cdot)$ represents the encoder.
[0084] S4-3. The current pig movement feature vector is input into the memory network model to obtain the optimized pig movement feature vector at the current moment. To optimize the reconstruction of pig movement information and capture the time-series patterns and long-term dependencies of pig movement, a movement information memory module is introduced at the connection between the movement information reconstruction encoder and the movement information reconstruction decoder. Instead of directly passing the pig movement feature vector Z to the movement information decoder, relevant pig behavior pattern entries are retrieved through the memory. The pig movement feature vector Z serves as a query request, used to find the most relevant pig behavior pattern entry in the movement information memory submodule. This is then passed to the movement information reconstruction decoder for reconstruction. The movement information memory submodule, based on the memory network architecture, learns the characteristics and patterns of healthy pig behavior by memorizing past movement information and applies them to the reconstruction process. This allows for more accurate recovery of the movement information of healthy pig behavior, while introducing larger reconstruction errors for behaviors under abnormal health patterns, thereby improving the accuracy of health anomaly detection.
[0085] Through external memory, the motion information memory submodule can acquire the encoded features of healthy pig behavior patterns and perform addressing using an attention mechanism. The addressing process is shown in Figure 4. The attention mechanism can handle long-term dependencies between motion information, since pig movement behavior exhibits specific patterns and regularities over longer periods. By using a memory mechanism for addressing, the model can remember past pig movement information and match it with current pig movement information, thereby better capturing the long-term dependencies in pig movement and obtaining the pattern that best matches the pig movement feature vector $Z$, which enhances the ability to reconstruct the behavioral patterns of healthy pigs. The storage component is designed as a four-dimensional real matrix $M \in \mathbb{R}^{N \times M \times H \times C}$, where $N$ represents the memory capacity, each storage entry has dimensions $(M, H, C)$, and $\mathbb{R}$ denotes the real numbers. For the motion information reconstruction module, the dimensions of the storage entries are the same as the dimensions of the pig movement feature vector $Z$. Given the query request $Z \in \mathbb{R}^{N \times M \times H \times C}$, the motion information memory submodule uses the addressing matrix $W \in \mathbb{R}^{1 \times N}$ to obtain the optimized pig movement feature vector $\hat{Z}$, whose dimension is the same as that of the pig movement feature vector $Z$.
[0086] The formula for the memory network model in step S4-3 is:
[0087] $$\hat{Z} = W M = \sum_{i=1}^{N} w_i m_i$$
[0088] $$w_i = \frac{\exp\big(d(Z, m_i)\big)}{\sum_{n=1}^{N} \exp\big(d(Z, m_n)\big)}$$
[0089] $$d(Z, m_i) = \frac{Z m_i^{T}}{\lVert Z \rVert \, \lVert m_i \rVert}$$
[0090] where $\hat{Z}$ is the optimized pig movement feature vector at the current moment, Z is the pig movement feature vector at the current moment, $\lVert Z \rVert$ is the magnitude of Z, W is the addressing matrix, M is the four-dimensional real matrix, $w_i$ is the i-th element of the addressing matrix W, $m_i$ and $m_i^{T}$ are the i-th row vector of the four-dimensional real matrix and its transpose, respectively, $\lVert m_i \rVert$ is the magnitude of the i-th row vector, N is the total number of rows in the four-dimensional real matrix, $m_n$ is the n-th row vector of the four-dimensional real matrix, d(·) is the cosine similarity function, exp(·) is the exponential function with the natural constant e as its base, and ∑(·) is the summation function.
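The attention-based addressing described for the memory network (cosine similarity between the query and each memory entry, a softmax over those similarities, then a weighted sum of entries) can be illustrated with a minimal NumPy sketch. The function name `memory_read`, the flattening of each (M, H, C) entry into a row vector, the small `eps` guard, and the assumption of a single query shaped like one storage entry are illustrative choices, not details taken from this description.

```python
import numpy as np

def memory_read(z, memory):
    """Attention read. z: query of shape (M, H, C); memory: shape (N, M, H, C)."""
    n = memory.shape[0]
    q = z.reshape(-1)                  # flatten the query into one vector
    m = memory.reshape(n, -1)          # one row per memory entry
    eps = 1e-12                        # assumption: guard against zero norms
    # cosine similarity between the query and every memory entry
    cos = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + eps)
    # softmax over similarities gives the addressing weights (sum to 1)
    w = np.exp(cos - cos.max())
    w /= w.sum()
    # the optimized feature is the weighted sum of memory entries
    z_hat = (w @ m).reshape(z.shape)
    return z_hat, w

rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 2, 3, 3))       # N=4 entries of shape (2, 3, 3)
z_hat, w = memory_read(memory[1], memory)    # query equal to entry 1
```

In this toy case the query coincides with the second entry, so that entry receives the largest addressing weight, and the output keeps the shape of the query.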
[0091] Because only healthy pig samples are input into the memory network model for training, the memory only learns the behavioral patterns and movement characteristics of healthy pigs. During the testing phase, it is stipulated that only the most relevant healthy pig movement characteristics can be retrieved from the trained memory, significantly reducing the similarity between input and output data of movement characteristics under abnormal pig health patterns.
[0092] S4-4. Input the optimized pig movement feature vector at the current moment into the movement information reconstruction decoder for decoding to obtain the reconstructed pig movement information data at the current moment. Upsampling is achieved through deconvolution with a stride of 2. The motion information reconstruction decoder is denoted $F_{de}(\cdot)$; it produces the i-th item of the reconstructed pig movement information data.
[0093] S5. Input the current pigsty video data, the current reconstructed pig movement information data, and the current real frame data into the pig health abnormality detection module to complete the real-time detection of pig health abnormalities.
[0094] As shown in Figures 5 and 6, step S5 further includes:
[0095] S5-1. Input the reconstructed pig movement information data at the current moment into the prior encoder to obtain the prior feature vector at the current moment; wherein, the dimension of the prior feature vector is 3.
[0096] S5-2. Input the current pigsty video data and the current reconstructed pig movement information data into the posterior encoder to obtain the current hybrid feature vector. Specifically, the video frame data containing only pig instances and the reconstructed pig movement information data are concatenated and convolved to obtain the posterior feature vector, which is then randomly sampled to obtain the hybrid feature vector.
[0097] S5-3. Input the prior feature vector and the hybrid feature vector at the current time into the feature fusion layer for fusion to obtain the predicted frame feature map at the current time; wherein, the dimension of the predicted frame feature map is 3.
[0098] S5-4. Input the predicted frame feature map at the current time into the decoder and restore it to the size of the original image by upsampling layer by layer to obtain the predicted frame data at the current time.
[0099] S5-5. Input the predicted frame data and the actual frame data at the current time into the health score generation submodule, and apply the formula:
[0100] $$PSNR(I_t, \hat{I}_t) = 10 \log_{10} \frac{\big[\max_{\hat{I}_t}\big]^{2}}{\frac{1}{m n'} \sum_{i=1}^{m} \sum_{j=1}^{n'} \big(I_{i,j} - \hat{I}_{i,j}\big)^{2}}$$
[0101] to obtain the frame-level pig health score at the current moment, where $\hat{I}_t$ is the predicted frame data at the current time, $\hat{I}_{i,j}$ is the predicted frame data of the pig pixel with spatial index (i,j), $I_t$ is the real frame data at the current time, $I_{i,j}$ is the real frame data of the pig pixel with spatial index (i,j), $\log_{10}(\cdot)$ is the logarithmic function with base 10, $\max_{\hat{I}_t}$ is the maximum pixel value in the predicted frame data, m and n' are the total numbers of spatial indices i and j, respectively, and ∑(·) is the summation function. The higher the peak signal-to-noise ratio (PSNR) computed by the health score generation submodule, the smaller the difference between the predicted frame data and the real frame data, that is, the greater the probability that the pigs in the real frame are in a healthy state.
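As a rough illustration of the frame-level health score, the following NumPy sketch computes a peak signal-to-noise ratio between a real frame and a predicted frame, using the maximum pixel value of the predicted frame as the peak. The function name `psnr`, the handling of a zero error, and the toy frames are assumptions made for the example only.

```python
import numpy as np

def psnr(real, pred):
    """PSNR between a real frame and a predicted frame (higher = more similar)."""
    real = np.asarray(real, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    mse = np.mean((real - pred) ** 2)        # mean squared error over all pixels
    if mse == 0:
        return float("inf")                  # assumption: identical frames score infinity
    peak = pred.max()                        # maximum pixel value in the predicted frame
    return 10.0 * np.log10(peak ** 2 / mse)

real = np.full((4, 4), 0.5)      # toy "real" frame
close = real + 0.01              # prediction with a small error
far = real + 0.2                 # prediction with a large error
score_close = psnr(real, close)
score_far = psnr(real, far)
```

The prediction with the smaller error yields the higher score, which is the property the health score relies on.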
[0102] S5-6. Input the frame-level pig health score of the previous moment into the dynamic health warning threshold selection submodule to judge health anomalies, and adjust the warning threshold according to the health anomaly judgment result to obtain the dynamic health warning threshold at the current moment.
[0103] To definitively determine whether pigs are in an abnormal health state, a health warning threshold needs to be pre-set for the health score. If the health score is lower than the set health warning threshold, the pig is in an abnormal health state. If warning thresholds for different scenarios are pre-set, adjustments can only be made within the preset scenarios, and the impact of noise in the pigsty video cannot be fine-tuned. In this case, to avoid false alarms or missed alarms, a dynamic health warning threshold selection submodule is used. The dynamic health warning threshold selection submodule can automatically set the health warning threshold and adjust it as needed. To eliminate the impact of individual health score calculation errors due to model errors, the dynamic threshold selection module uses a sliding window to calculate the mean and variance of the corresponding health scores of the video sequence within the current window in real time, and finally adjusts the warning threshold through a weighted sum. That is, steps S5-6 further include:
[0104] S5-6-1. Set the frame-level pig health score of the previous moment as the initial value of the dynamic health early warning threshold.
[0105] S5-6-2. Determine whether the frame-level pig health score at the current moment is less than the initial value of the dynamic health warning threshold; if so, use the initial value of the dynamic health warning threshold as the dynamic health warning threshold in the current state; otherwise, proceed to step S5-6-3.
[0106] S5-6-3. Noise in large-scale pig farming monitoring videos can cause slight changes in pixel values or features between video frames. These changes may be related to actual abnormal events or influenced by piggery noise. These changes may cause the health score of healthy pig video frames to exceed the initial value of the dynamic health warning threshold, leading to false alarms. Furthermore, the model will introduce some error. The future frame prediction module uses the previous 5 frames to predict the current frame. As abnormal events continue to occur, the previous 5 input frames may also be abnormal, potentially leading to a better prediction of the t-th frame. This error may cause the health score of abnormal pig video frames to fall below the initial value of the dynamic health warning threshold, resulting in missed alarms. To eliminate the impact of noise in piggery monitoring videos and model errors, the dynamic warning threshold selection submodule uses a sliding window of length T to calculate the PSNR statistics of the current video segment in real time, fine-tuning the initial value of the dynamic health warning threshold according to the formula:
[0107]
[0108]
[0109]
[0110] The initial value of the dynamic health warning threshold is adjusted to obtain the dynamic health warning threshold $DS_t$ in the current state, where $\mu_t$ and $\sigma_t$ are, respectively, the mean and variance of the peak signal-to-noise ratio within the window, T is the sliding window length, k is the sequence number of the frame data to be predicted in the current state, t-1 denotes the (t-1)-th frame of the video frame data, and the PSNR term is the peak signal-to-noise ratio between the k-th frame data $I_k$ of the frame data to be predicted in the current state and the predicted frame data. The dynamic health warning threshold is adjusted each time a health score is generated.
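The sliding-window adjustment can be sketched as follows. This is only an illustration of the control flow in steps S5-6-1 to S5-6-3: the class name `DynamicThreshold`, the fixed initial value, the parameters `alpha` and `beta`, and in particular the way the window mean and variance are combined into a weighted sum are hypothetical, since the exact weighting is not reproduced here.

```python
from collections import deque
import numpy as np

class DynamicThreshold:
    """Sliding-window threshold sketch; the weighting scheme is hypothetical."""

    def __init__(self, initial, window=5, alpha=0.5, beta=1.0):
        self.initial = float(initial)          # initial warning threshold
        self.scores = deque(maxlen=window)     # last T health scores (PSNR values)
        self.alpha = alpha                     # hypothetical weight on the initial value
        self.beta = beta                       # hypothetical weight on the spread

    def update(self, score):
        """Return the threshold for this score, then record the score in the window."""
        if score < self.initial or not self.scores:
            # score already below the initial value: keep the initial threshold
            threshold = self.initial
        else:
            window = list(self.scores)
            mean = np.mean(window)             # window mean of the health scores
            var = np.var(window)               # window variance of the health scores
            # hypothetical weighted sum of the initial value and window statistics
            threshold = (self.alpha * self.initial
                         + (1 - self.alpha) * (mean - self.beta * np.sqrt(var)))
        self.scores.append(float(score))
        return float(threshold)

dt = DynamicThreshold(initial=30.0, window=3)
first = dt.update(25.0)     # below the initial value, so the initial value is used
second = dt.update(35.0)    # adjusted using the window statistics
```

A frame would then be flagged when its health score falls below the returned threshold.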
[0111] Specifically, when setting the initial warning threshold at the initial moment, it is necessary to statistically analyze the health scores of all pig training data, calculate the boundary values using a box plot, and select the minimum boundary value as the initial warning threshold, i.e., according to the formula:
[0112] $$B_l = Q1 - 1.5\,IQR$$
[0113] $$B_u = Q3 + 1.5\,IQR$$
[0114] This yields the minimum normal value $B_l$ and the maximum normal value $B_u$ among the frame-level pig health scores corresponding to all pig training data, and the minimum normal value $B_l$ is used as the initial value of the dynamic health warning threshold. Here Q1 is the lower quartile of all frame-level pig health scores, Q3 is the upper quartile of all frame-level pig health scores, and IQR is the interquartile range between the lower quartile Q1 and the upper quartile Q3. During the training of the dynamic health warning threshold selection submodule, points between the minimum and maximum normal values are considered normal points, and points outside this range are considered outliers. Therefore, the health scores of healthy pig video frames (normal points) are selected for training.
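The box-plot rule for the initial threshold can be sketched with NumPy as follows. The function name `boxplot_bounds` and the sample scores are illustrative, and `np.percentile` with its default linear interpolation is assumed as the quartile estimator.

```python
import numpy as np

def boxplot_bounds(scores):
    """Return (B_l, B_u) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for a set of health scores."""
    q1, q3 = np.percentile(scores, [25, 75])   # lower and upper quartiles
    iqr = q3 - q1                              # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# toy frame-level health scores from training data; 10.0 is an obvious outlier
scores = np.array([30.0, 31.0, 32.0, 33.0, 34.0, 10.0])
lower, upper = boxplot_bounds(scores)
initial_threshold = lower                      # B_l serves as the initial threshold
normal = scores[(scores >= lower) & (scores <= upper)]   # normal points only
```

Scores outside the two bounds are treated as outliers and excluded, so only normal points remain.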
[0115] S5-7. Input the current frame-level pig health score and the current dynamic health warning threshold into the pig health abnormality warning submodule for judgment; if the current frame-level pig health score is less than the current dynamic health warning threshold, a warning message is obtained and sent to the management personnel; otherwise, monitoring is performed.
[0116] To make the generated predicted frames as similar as possible to the real frames, the future frame prediction module constrains the model in terms of both the appearance and the movement of the pigs. Specifically, it uses a total loss function consisting of an appearance loss function and an action loss function to adjust the parameters of the future frame prediction submodule. The appearance loss function includes a KL-divergence-minimization loss function and a gradient loss function.
[0117] The KL-divergence-minimization loss consists of two parts: a KL divergence term and a prediction loss term. The KL divergence measures the difference between two probability distributions, while the prediction loss measures the difference between the predicted frame and the target frame, i.e., the prediction error of the model when generating future frames. The KL divergence term acts as a regularizer during training, forcing the posterior distribution to approach the prior distribution. The goal is to make the predicted pig images as close as possible to the real ones, allowing the future frame prediction module to better capture the latent distribution in the pig video data and to use the learned distribution features for future frame prediction. By minimizing the prediction loss, the model makes the predicted pig images as similar as possible to the real pig images. The gradient loss constrains the pixels of the pig area in the pigsty monitoring video. Because the edges of the pig's body contour in the predicted frame are blurred, minimizing the gradient difference of pig pixels along the x and y axes between the predicted and real frames sharpens the pig in the predicted frame and brings its posture closer to that of a real pig. To estimate the movement trend of the pigs, the action loss function uses FlowNet to calculate optical flow; minimizing the optical flow difference ensures the consistency of pig movement in the predicted future frames.
[0118] The formula for the appearance loss function is:
[0119] $$L = \lambda_{cvae} L_{cvae} + \lambda_{gl} L_{gl} + \lambda_{ofl} L_{ofl}$$
[0120] $$L_{cvae} = KL\big[\, q(z \mid I_{t-5:t-1}, \hat{y}_{t-5:t-1}) \,\big\Vert\, p(z \mid \hat{y}_t) \,\big] + \lVert \hat{I}_t - I_t \rVert_2$$
[0121] $$L_{gl} = \sum_{i,j} \Big( \big\lVert\, |\hat{I}_{i,j} - \hat{I}_{i-1,j}| - |I_{i,j} - I_{i-1,j}| \,\big\rVert_1 + \big\lVert\, |\hat{I}_{i,j} - \hat{I}_{i,j-1}| - |I_{i,j} - I_{i,j-1}| \,\big\rVert_1 \Big)$$
[0122] $$L_{ofl} = \big\lVert f(\hat{I}_t, I_{t-1}) - f(I_t, I_{t-1}) \big\rVert_1$$
[0123] where $L$ is the total loss function, and $L_{cvae}$, $L_{gl}$ and $L_{ofl}$ are the KL-divergence-minimization, gradient, and action loss functions, respectively; $\lambda_{cvae}$, $\lambda_{gl}$ and $\lambda_{ofl}$ are the weighting parameters; KL is the divergence term; $I_{t-5:t-1}$ is the frame data to be predicted; $I_t$ and $I_{t-1}$ are the real frame data at the current time and the frame data to be predicted from the previous time step, respectively; $\hat{I}_t$ is the predicted frame data at the current time; $\hat{y}_{t-5:t-1}$ and $\hat{y}_t$ are the reconstructed pig movement information data corresponding to the frame data to be predicted $I_{t-5:t-1}$ and to the real frame data $I_t$ at the current time, respectively; $q(z \mid I_{t-5:t-1}, \hat{y}_{t-5:t-1})$ is the posterior distribution of the frame data to be predicted $I_{t-5:t-1}$ and its corresponding reconstructed pig movement information data; $p(z \mid \hat{y}_t)$ is the prior distribution of the reconstructed pig movement information data corresponding to the real frame data $I_t$ at the current time; $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are the 1-norm and 2-norm, respectively; $|\cdot|$ is the absolute value; $\hat{I}_{i,j}$ and $\hat{I}_{i-1,j}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $I_{i,j}$ and $I_{i-1,j}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $\hat{I}_{i,j}$ and $\hat{I}_{i,j-1}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; $I_{i,j}$ and $I_{i,j-1}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; and f(·) is the optical flow function in FlowNet. The weight of each loss function can be set according to the dataset of each pig house in practical applications.
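To make the gradient loss and the weighted total loss more concrete, here is a minimal NumPy sketch. It compares absolute intensity differences between neighboring pixels along both image axes in the predicted and real frames, and combines three scalar loss terms with weights. The function names `gradient_loss` and `total_loss`, the default weights of 1.0, and the toy frames are assumptions for illustration; the KL and optical flow terms are passed in as precomputed numbers rather than implemented.

```python
import numpy as np

def gradient_loss(pred, real):
    """Sum of 1-norm differences between neighbor-pixel gradients of pred and real."""
    pred = np.asarray(pred, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    # absolute differences between pixels (i, j) and (i-1, j)
    dy_pred = np.abs(pred[1:, :] - pred[:-1, :])
    dy_real = np.abs(real[1:, :] - real[:-1, :])
    # absolute differences between pixels (i, j) and (i, j-1)
    dx_pred = np.abs(pred[:, 1:] - pred[:, :-1])
    dx_real = np.abs(real[:, 1:] - real[:, :-1])
    return np.abs(dy_pred - dy_real).sum() + np.abs(dx_pred - dx_real).sum()

def total_loss(l_cvae, l_gl, l_ofl, lam_cvae=1.0, lam_gl=1.0, lam_ofl=1.0):
    """Weighted sum of the three loss terms; default weights are placeholders."""
    return lam_cvae * l_cvae + lam_gl * l_gl + lam_ofl * l_ofl

sharp = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))      # real frame with a crisp edge
blurred = np.tile([0.0, 0.25, 0.75, 1.0], (4, 1))  # predicted frame with a soft edge
loss_blur = gradient_loss(blurred, sharp)
loss_same = gradient_loss(sharp, sharp)
combined = total_loss(0.5, loss_blur, 0.25)
```

A blurred edge in the predicted frame gives a positive gradient loss, while an identical frame gives zero, which is why minimizing this term sharpens contours.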
[0124] An unsupervised real-time detection system for abnormal pig health based on surveillance video is provided, comprising a video data acquisition module, a pig instance segmentation module, a motion information reconstruction module, and a pig health abnormality detection module; wherein:
[0125] The video data acquisition module is used to monitor the pigsty in real time via cameras, acquire video data, and transmit it to a video recorder for local storage using a switch. The video data acquisition module includes a monitoring camera, a network transmission submodule, and a data storage submodule. The network transmission submodule includes two switches, a video recorder, and a primary router. One monitoring camera and one switch are located inside the pigsty, while the other switch, video recorder, and primary router are located outside the pigsty. The monitoring camera continuously records the pigs inside the pigsty 24 hours a day. The video data is transmitted to the video recorder for local storage via the switch inside the pigsty, and then uploaded to the server via the switch and primary router for use by the pig instance segmentation module, achieving non-contact pig behavior monitoring and data collection.
[0126] The pig instance segmentation module is used to segment the pigsty video data in the current state to obtain video frame data containing only pig instances;
[0127] The motion information reconstruction module is used to reconstruct video frame data containing only pig instances to obtain reconstructed pig movement information data.
[0128] The pig health anomaly detection module is used to process the reconstructed pig movement information data, the video frame data containing only pig instances, and the pigsty video data, generate early warning information, and notify management personnel. This early warning information includes, but is not limited to, the specific location of the pigsty and health warning details.
[0129] In summary, this invention provides a method and system for detecting abnormal pig health based on surveillance video. It treats pig health detection as an unsupervised video anomaly detection problem, requiring only healthy pig samples for training to provide early warning of abnormal pig health. The pig instance segmentation module accurately extracts pig objects and separates them from the background, solving the problem of performance degradation or misjudgment caused by the complex and ever-changing farm monitoring environment when the method is applied to different pig houses. The motion information reconstruction module, trained only on healthy pig samples, correctly learns and models the motion features of healthy pig behavior patterns and therefore produces a large reconstruction error for motion input from abnormal health behavior patterns, which focuses the method on motion feature detection. The motion information memory module memorizes and reconstructs motion information from healthy pig behavior patterns, reducing the possibility that the motion features of pigs with abnormal health are reconstructed as those of healthy pigs and ensuring that such features produce a large reconstruction error, which improves the accuracy of motion feature reconstruction. The future frame prediction submodule adaptively learns the distribution of pig motion information and uses it to guide the prediction of future pig frames. Specifically for pig video data, it uses two encoders and one decoder and takes pig motion information as a reference, improving the accuracy of distinguishing healthy pigs from pigs with abnormal health.
The health score generation submodule automatically calculates the peak signal-to-noise ratio between the predicted frame and the real frame containing pig instances, and evaluates the degree of difference between the two to quantify the health status of pigs at each moment. The dynamic health warning threshold selection submodule uses a data-driven method to automatically set the health warning threshold and fine-tunes it through a sliding window. This avoids false alarms caused by small changes in pixel values or features between video frames due to noise in the pig house, thereby improving the accuracy and adaptability of the system in judging abnormal pig health.
Claims
1. A method for real-time detection of unsupervised health abnormalities in pigs based on surveillance video, characterized in that: Includes the following steps: S1. Acquire and store real-time video data from the pigsty; S2. Construct a pig health anomaly detection model; the pig health anomaly detection model includes a pig instance segmentation module, a motion information reconstruction module, and a pig health anomaly detection module; S3. Input the pigsty video data at the current moment into the pig instance segmentation module to obtain video frame data containing only pig instances at the current moment; wherein, the video frame data includes frame data to be predicted and real frame data; S4. Input the frame data to be predicted at the current moment into the motion information reconstruction module to obtain the reconstructed pig motion information data at the current moment; S5. Input the current pigsty video data, the current reconstructed pig movement information data, and the current real frame data into the pig health abnormality detection module to complete the real-time detection of pig health abnormalities. The pig instance segmentation module uses the ModNet neural network model; The motion information reconstruction module includes a motion optical flow generation submodule, a motion information reconstruction encoder, a motion information memory submodule, and a motion information reconstruction decoder connected in series. The motion information reconstruction encoder includes three stacked convolutional blocks. The motion information memory submodule adopts a memory network model. The motion information reconstruction decoder includes three stacked deconvolutional blocks. Both the deconvolutional blocks and the convolutional blocks include convolutional layers, batch normalization layers, and ReLU activation layers connected in series. 
The pig health anomaly detection module includes a future frame prediction submodule, a dynamic health warning threshold selection submodule, a health score generation submodule, and a pig health anomaly warning submodule. The future frame prediction submodule includes an encoding layer, a feature fusion layer, and a decoder connected in series. The encoding layer includes a prior encoder and a posterior encoder in parallel. Both the prior encoder and the posterior encoder include a convolutional layer, a batch normalization layer, and a ReLU activation layer connected in series. Step S5 further includes: S5-1. Input the reconstructed pig movement information data at the current moment into the prior encoder to obtain the prior feature vector at the current moment; S5-2. Input the current pigsty video data and the current reconstructed pig movement information data into the posterior encoder to obtain the current hybrid feature vector; S5-3. Input the prior feature vector and the hybrid feature vector at the current time into the feature fusion layer for fusion to obtain the predicted frame feature map at the current time; S5-4. Input the predicted frame feature map at the current time into the decoder and process it by upsampling layer by layer to obtain the predicted frame data at the current time; S5-5. Input the predicted frame data and the real frame data at the current time into the health score generation submodule, and apply the formula: $$PSNR(I_t, \hat{I}_t) = 10 \log_{10} \frac{\big[\max_{\hat{I}_t}\big]^{2}}{\frac{1}{m n'} \sum_{i=1}^{m} \sum_{j=1}^{n'} \big(I_{i,j} - \hat{I}_{i,j}\big)^{2}}$$ to obtain the frame-level pig health score at the current moment; where $\hat{I}_t$ is the predicted frame data at the current time, $\hat{I}_{i,j}$ is the predicted frame data of the pig pixel with spatial index (i,j), $I_t$ is the real frame data at the current time, $I_{i,j}$ is the real frame data of the pig pixel with spatial index (i,j), $\log_{10}(\cdot)$ is the logarithmic function with base 10, $\max_{\hat{I}_t}$ is the maximum pixel value in the predicted frame data, m and n' are the total numbers of spatial indices, and ∑(·) is the summation function; S5-6. 
Input the frame-level pig health score of the previous moment into the dynamic health warning threshold selection submodule to judge health anomalies, and adjust the warning threshold according to the health anomaly judgment result to obtain the dynamic health warning threshold at the current moment. S5-7. Input the current frame-level pig health score and the current dynamic health warning threshold into the pig health abnormality warning submodule for judgment; if the current frame-level pig health score is less than the current dynamic health warning threshold, a warning message is obtained and sent to the management personnel; otherwise, monitoring is performed.
2. The method for real-time detection of unsupervised pig health abnormalities based on monitoring video according to claim 1, characterized in that: The pigsty video data in step S1 includes six frames of video data, that is, the pigsty video data at each moment includes six frames of video data and is processed as a unit. The specific process of step S3 is as follows: The current pigsty video data is input into the pig instance segmentation module, and the ModNet neural network model is used to separate the foreground and background of the video to obtain video frame data containing only pig instances at the current moment; the first five frames of the video frame data are used as the frame data to be predicted $I_{t-5:t-1}$, and the sixth frame of the video frame data is used as the real frame data $I_t$; where t denotes the current moment.
3. The method for real-time detection of unsupervised pig health abnormalities based on monitoring video according to claim 1, characterized in that: Step S4 further includes: S4-1. Input the current frame data to be predicted into the motion optical flow generation submodule and process it using a multi-frame optical flow estimation algorithm to obtain the pig movement information at the current moment; S4-2. Input the current pig movement information into the motion information reconstruction encoder for encoding to obtain the pig movement feature vector at the current moment; S4-3. Input the current pig movement feature vector into the memory network model to obtain the optimized pig movement feature vector at the current moment; S4-4. Input the optimized pig movement feature vector at the current moment into the movement information reconstruction decoder for decoding to obtain the reconstructed pig movement information data at the current moment.
4. The method for real-time detection of unsupervised pig health abnormalities based on monitoring video according to claim 1, characterized in that: The formula for the memory network model in step S4-3 is: $$\hat{Z} = W M = \sum_{i=1}^{N} w_i m_i$$ $$w_i = \frac{\exp\big(d(Z, m_i)\big)}{\sum_{n=1}^{N} \exp\big(d(Z, m_n)\big)}$$ $$d(Z, m_i) = \frac{Z m_i^{T}}{\lVert Z \rVert \, \lVert m_i \rVert}$$ where $\hat{Z}$ is the optimized pig movement feature vector at the current moment, Z is the pig movement feature vector at the current moment, $\lVert Z \rVert$ is the magnitude of the pig movement feature vector at the current moment, W is the addressing matrix, M is a four-dimensional real matrix, $w_i$ is the i-th element of the addressing matrix W, $m_i$ and $m_i^{T}$ are the i-th row vector of the four-dimensional real matrix and its transpose, respectively, $\lVert m_i \rVert$ is the magnitude of the i-th row vector of the four-dimensional real matrix, N is the total number of rows in the four-dimensional real matrix, $m_n$ is the n-th row vector of the four-dimensional real matrix, d(·) is the cosine similarity function, exp(·) is the exponential function with the natural constant e as its base, and ∑(·) is the summation function.
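5. The method for real-time detection of unsupervised pig health abnormalities based on monitoring video according to claim 1, characterized in that: The parameters of the future frame prediction submodule are adjusted using a total loss function consisting of an appearance loss function and an action loss function; the appearance loss function includes a KL-divergence-minimization loss function and a gradient loss function. The formula for the appearance loss function is: $$L = \lambda_{cvae} L_{cvae} + \lambda_{gl} L_{gl} + \lambda_{ofl} L_{ofl}$$ $$L_{cvae} = KL\big[\, q(z \mid I_{t-5:t-1}, \hat{y}_{t-5:t-1}) \,\big\Vert\, p(z \mid \hat{y}_t) \,\big] + \lVert \hat{I}_t - I_t \rVert_2$$ $$L_{gl} = \sum_{i,j} \Big( \big\lVert\, |\hat{I}_{i,j} - \hat{I}_{i-1,j}| - |I_{i,j} - I_{i-1,j}| \,\big\rVert_1 + \big\lVert\, |\hat{I}_{i,j} - \hat{I}_{i,j-1}| - |I_{i,j} - I_{i,j-1}| \,\big\rVert_1 \Big)$$ $$L_{ofl} = \big\lVert f(\hat{I}_t, I_{t-1}) - f(I_t, I_{t-1}) \big\rVert_1$$ where $L$ is the total loss function, and $L_{cvae}$, $L_{gl}$ and $L_{ofl}$ are the KL-divergence-minimization, gradient, and action loss functions, respectively; $\lambda_{cvae}$, $\lambda_{gl}$ and $\lambda_{ofl}$ are the weight parameters; KL is the divergence term; $I_{t-5:t-1}$ is the frame data to be predicted; $I_t$ and $I_{t-1}$ are the real frame data at the current time and the frame data to be predicted from the previous time step, respectively; $\hat{I}_t$ is the predicted frame data at the current time; $\hat{y}_{t-5:t-1}$ and $\hat{y}_t$ are the reconstructed pig movement information data corresponding to the frame data to be predicted and to the real frame data at the current time, respectively; $q(z \mid I_{t-5:t-1}, \hat{y}_{t-5:t-1})$ is the posterior distribution of the frame data to be predicted at the current moment and its corresponding reconstructed pig movement information data; $p(z \mid \hat{y}_t)$ is the prior distribution of the reconstructed pig movement information data corresponding to the real frame data at the current moment; $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are the 1-norm and 2-norm, respectively; $|\cdot|$ is the absolute value; $\hat{I}_{i,j}$ and $\hat{I}_{i-1,j}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $I_{i,j}$ and $I_{i-1,j}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i-1,j), respectively; $\hat{I}_{i,j}$ and $\hat{I}_{i,j-1}$ are the predicted frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; $I_{i,j}$ and $I_{i,j-1}$ are the real frame data of the pig pixels with spatial indices (i,j) and (i,j-1), respectively; and f(·) is the optical flow function in FlowNet.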
6. The method for real-time detection of unsupervised pig health abnormalities based on monitoring video according to claim 1, characterized in that: Step S5-6 further includes: S5-6-1. Set the frame-level pig health score of the previous moment as the initial value of the dynamic health warning threshold; S5-6-2. Determine whether the frame-level pig health score at the current moment is less than the initial value of the dynamic health warning threshold; if so, use the initial value of the dynamic health warning threshold as the dynamic health warning threshold in the current state; otherwise, proceed to step S5-6-3; S5-6-3. Adjust the initial value of the dynamic health warning threshold according to the formula to obtain the dynamic health warning threshold $DS_t$ in the current state; where $\mu_t$ and $\sigma_t$ are the mean and variance of the peak signal-to-noise ratio within the window, respectively, T is the length of the sliding window, k is the sequence number of the frame data to be predicted in the current state, t-1 denotes the (t-1)-th frame of the video frame data, and the PSNR term is the peak signal-to-noise ratio between the k-th frame data $I_k$ of the frame data to be predicted in the current state and the predicted frame data.
7. A detection system based on the unsupervised real-time detection method for abnormal pig health based on monitoring video as described in any one of claims 1 to 6, characterized in that: It includes a video data acquisition module, a pig instance segmentation module, a motion information reconstruction module, and a pig health anomaly detection module; wherein: The video data acquisition module is used to monitor the pigsty in real time through cameras, acquire video data of the pigsty, and transmit it to the video recorder for local storage using a switch; The pig instance segmentation module is used to segment the pigsty video data in the current state to obtain video frame data containing only pig instances; The motion information reconstruction module is used to reconstruct video frame data containing only pig instances to obtain reconstructed pig movement information data; The pig health anomaly detection module is used to process the reconstructed pig movement information data, the video frame data containing only pig instances, and the pigsty video data, obtain early warning information, and notify management personnel.