System and method for monitoring and analysis based on a multimodal large model for intelligent poultry farming
The Poultry Eye Smart Farming Multimodal Large Model System addresses problems in the poultry farming industry such as delayed disease detection, inaccurate production-efficiency data, reliance on personal experience for management decisions, limited functionality of existing AI systems, and limited computing power of edge devices. It enables early disease warning, precise production management, and comprehensive decision support, thereby reducing costs and improving efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HAINAN YILIAN TECH CO LTD
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-12
AI Technical Summary
The poultry farming industry faces challenges such as delayed disease detection, inaccurate production-efficiency data, reliance on personal experience for management decisions, limited functionality of existing AI systems, limited computing power of edge devices, and difficulties in multimodal data fusion.
The system adopts the Poultry Eye multimodal large model for intelligent farming, comprising a data and infrastructure layer, a core model layer, a task and application layer, and a deployment and evolution layer. Through multimodal data fusion and deep learning, it provides early disease warning, precise production management, data-driven decision-making, real-time edge inference, and continuous optimization.
It enables early disease warning and precise production management, reduces mortality, extends the peak egg production period, improves the feed conversion ratio, lowers labor costs, and provides comprehensive farming analysis and integrated decision support.
Figure CN122196437A_ABST
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and computer vision, and in particular to livestock and poultry farming technology. Specifically, it concerns a system and method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming.
Background Technology
[0002] With the development of large-scale, intensive farming, traditional manual management can no longer meet the needs of modern operations. The farming industry currently faces the following technical challenges: 1. Severe delay in disease detection: Manual visual inspection cannot provide 24/7 monitoring without blind spots. Early and mild symptoms are easily overlooked, and by the time they are noticed the disease has often already spread and caused significant economic losses.
[0003] 2. Production efficiency "black box": Key production indicators (such as accurate stocking, individual egg production performance, weight uniformity, and actual feed intake) rely on manual sampling and estimation. The data is not real-time, comprehensive, or accurate, and cannot support refined management.
[0004] 3. Management decisions are highly dependent on personal experience: decisions such as breeding, culling, population adjustment, and timing of immunization lack precise data support, and excellent experience is difficult to standardize and replicate on a large scale.
[0005] 4. Existing AI systems have limited functionality: Traditional farming AI systems typically focus on a single task (such as counting or detecting dead animals), lack comprehensive analytical capabilities, and cannot make the leap from perception to cognition.
[0006] 5. Limited computing power of edge devices: Complex deep learning models are difficult to run in real time on resource-constrained edge devices, while cloud inference suffers from high latency and network dependence.
[0007] 6. Difficulty in multimodal data fusion: Farming scenarios produce multiple data types, such as video, images, sensor readings, and text. Effectively integrating these data for comprehensive analysis is a technical challenge.
Summary of the Invention
[0008] The purpose of this invention is to overcome the shortcomings of the prior art and provide a system and method for monitoring and analysis based on a multimodal large model of intelligent poultry farming that meets the requirements of high accuracy, comprehensiveness, and wide applicability.
[0009] To achieve the above objectives, the present invention provides a system and method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming, as follows: The system for monitoring and analysis based on the multimodal large model of intelligent poultry farming is characterized by the following features: the system includes a data and infrastructure layer, a core model layer, a task and application layer, and a deployment and evolution layer, which are connected sequentially. The data and infrastructure layer includes a multimodal aquaculture database, an automated training platform, a model repository, and a version management module, which are connected sequentially. The multimodal aquaculture database is used to store and manage multimodal data such as videos, images, sensor data, and text data from aquaculture scenarios. The automated training platform is used to automatically train, evaluate, and deploy models. The model repository and version management module are used to manage multiple versions of models. The core model layer includes a dual-tower encoder, a multimodal fusion unit, and a unified task decoder, which are connected sequentially. The dual-tower encoder is used to process visual data from aquaculture scenarios and text data from the aquaculture field. The multimodal fusion unit is used to semantically align visual features with text features and to deeply fuse visual and text multimodal features. The unified task decoder contains multiple task heads for inference output of aquaculture-related tasks such as detection, segmentation, classification, and regression. The task and application layer includes a perception task module, a cognitive task module, and an interaction task module, which are connected sequentially. The perception task module is used to perform visual perception tasks such as chicken detection, segmentation, and pose estimation. 
The cognitive task module is used to perform cognitive analysis tasks such as poultry health diagnosis, behavior understanding, and production performance prediction. The interaction task module is used to perform human-computer interaction tasks such as visual question answering, semantic retrieval, and natural language interaction. The deployment and evolution layer includes a cloud server, edge computing nodes, and a data flywheel closed-loop module, which are connected sequentially. The cloud server is used to perform model development, continuous learning, and complex task processing. The edge computing nodes are used to achieve domain adaptation and real-time inference in aquaculture scenarios. The data flywheel closed-loop module is used to achieve difficult sample backflow and continuous model optimization.
[0010] Preferably, the dual-tower encoder includes a visual encoding tower and a text encoding tower. The visual encoding tower is used to extract global context and local detail features from visual data of aquaculture scenarios, and the text encoding tower is used to process text data in the aquaculture field and extract text features.
[0011] Preferably, the multimodal fusion unit includes an alignment module and an inference module. The alignment module semantically aligns visual features with textual features through contrastive learning, and the inference module deeply fuses visual and textual multimodal features through a cross-attention mechanism.
[0012] Preferably, the multimodal aquaculture database adopts a distributed storage architecture, with a built-in automatic data classification and indexing module and a data quality assessment system; the automated training platform realizes distributed training, supports automatic hyperparameter search and automatic model selection, and integrates model evaluation and automated deployment processes; the model repository and version management module uses Git-LFS to manage model files and has a built-in model version control and rollback mechanism.
[0013] Preferably, the unified task decoder adopts a multi-task shared decoder architecture, and the task head includes a detection head, a segmentation head, a classification head, and a regression head, and each task head is configured with a task-specific adapter to reduce interference between multiple tasks.
[0014] Preferably, the cloud server and edge computing nodes of the deployment and evolution layer construct a hybrid inference architecture, dynamically allocating computing resources according to task type and device computing power. The data flywheel closed-loop module has a built-in unit for automatic identification and feedback of difficult samples, which is used to realize incremental learning and continuous optimization of the model, and completes the model update of the edge computing nodes through OTA.
[0015] The method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming is characterized in that it includes the following steps: (1) Collect multimodal data from aquaculture scenarios, and clean, label, and augment the collected data to construct a multimodal aquaculture dataset; (2) Pre-train the visual encoding tower and the text encoding tower of the dual-tower encoder separately, semantically align the visual and text features through contrastive learning, and complete multi-task joint training on the unified task decoder to obtain the aquaculture multimodal large model; (3) Process real-time detection tasks in aquaculture scenarios on local edge computing nodes, and process complex cognitive and interaction tasks on cloud servers; (4) Based on the inference output of the aquaculture multimodal large model, perform poultry health monitoring and production performance analysis in the aquaculture scenario, and generate the corresponding disease early-warning information and management decision suggestions; (5) Automatically identify and transmit back difficult samples from the aquaculture scenario, periodically update the aquaculture multimodal large model in the cloud based on the returned samples, and update the models on the edge computing nodes via OTA.
[0016] Preferably, in step (1), multimodal data such as videos, images, environmental sensor data, and aquaculture management texts of the aquaculture scene are collected through cameras and sensor devices; and a multi-level labeling system is established during the preprocessing process.
[0017] Preferably, step (2) specifically includes the following steps: (2.1) Perform basic pre-training of the visual encoding tower and the text encoding tower on a general public dataset; (2.2) Perform cross-modal semantic alignment of visual features and text features through contrastive learning; (2.3) Based on the labeled dataset of aquaculture scenarios, perform joint training of the detection, segmentation, classification, and regression tasks on the unified task decoder to build the aquaculture multimodal large model.
[0018] Preferably, step (5) specifically includes: The data flywheel closed loop enables the automatic collection, labeling, and backhaul of difficult samples. Incremental model training is carried out based on the backhauled samples, and regular iterative updates of the cloud model are completed. The model is then updated seamlessly on the edge computing nodes via OTA remote distribution.
[0019] The system and method of this invention for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming provide the following benefits. Early disease warning: multimodal data fusion and deep learning allow potential health problems to be identified 24-48 hours in advance. Precise production management: real-time monitoring and analysis of production performance data enable accurate calculation of indicators such as stock counts, individual egg production performance, and weight uniformity. Data-driven decision-making: multi-dimensional data analysis provides a scientific basis for management decisions and reduces reliance on personal experience. Comprehensive analysis: detection, segmentation, classification, and regression tasks are integrated to achieve the leap from perception to cognition and provide a comprehensive analysis of the farming operation. Real-time edge inference: model compression, quantization, and optimization allow complex models to run in real time on edge devices with end-to-end latency below 200 ms. Multimodal data fusion: video, image, sensor, and text data are integrated to give more comprehensive and accurate analysis results. Continuous evolution: the data flywheel closed loop continuously optimizes the model so that it adapts to different farming scenarios and breeds. Lower farming costs: mortality is reduced by more than 10%, the peak egg-laying period is extended by 1-2 weeks, the feed conversion ratio is improved by more than 0.05, and labor costs are reduced by more than 30%.
Attached Figure Description
[0020] Figure 1 is a schematic diagram of the system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to the present invention.
Detailed Implementation
[0021] To more clearly describe the technical content of the present invention, the following description is provided in conjunction with specific embodiments.
[0022] The system for monitoring and analysis based on a multimodal large model of intelligent poultry farming according to the present invention includes a data and infrastructure layer, a core model layer, a task and application layer, and a deployment and evolution layer, wherein the data and infrastructure layer, the core model layer, the task and application layer, and the deployment and evolution layer are connected in sequence. The data and infrastructure layer includes a multimodal aquaculture database, an automated training platform, a model repository, and a version management module, which are connected sequentially. The multimodal aquaculture database is used to store and manage multimodal data such as videos, images, sensor data, and text data from aquaculture scenarios. The automated training platform is used to automatically train, evaluate, and deploy models. The model repository and version management module are used to manage multiple versions of models. The core model layer includes a dual-tower encoder, a multimodal fusion unit, and a unified task decoder, which are connected sequentially. The dual-tower encoder is used to process visual data from aquaculture scenarios and text data from the aquaculture field. The multimodal fusion unit is used to semantically align visual features with text features and to deeply fuse visual and text multimodal features. The unified task decoder contains multiple task heads for inference output of aquaculture-related tasks such as detection, segmentation, classification, and regression. The task and application layer includes a perception task module, a cognitive task module, and an interaction task module, which are connected sequentially. The perception task module is used to perform visual perception tasks such as chicken detection, segmentation, and pose estimation. 
The cognitive task module is used to perform cognitive analysis tasks such as poultry health diagnosis, behavior understanding, and production performance prediction. The interaction task module is used to perform human-computer interaction tasks such as visual question answering, semantic retrieval, and natural language interaction. The deployment and evolution layer includes a cloud server, edge computing nodes, and a data flywheel closed-loop module, which are connected sequentially. The cloud server is used to perform model development, continuous learning, and complex task processing. The edge computing nodes are used to achieve domain adaptation and real-time inference in aquaculture scenarios. The data flywheel closed-loop module is used to achieve difficult sample backflow and continuous model optimization.
[0023] Preferably, the dual-tower encoder includes a visual encoding tower and a text encoding tower. The visual encoding tower is used to extract global context and local detail features from visual data of aquaculture scenarios, and the text encoding tower is used to process text data in the aquaculture field and extract text features.
[0024] In a preferred embodiment of the present invention, the multimodal fusion unit includes an alignment module and an inference module. The alignment module semantically aligns visual features with text features through contrastive learning, and the inference module deeply fuses visual and text multimodal features through a cross-attention mechanism.
[0025] In a preferred embodiment of the present invention, the multimodal aquaculture database adopts a distributed storage architecture, with a built-in automatic data classification and indexing module and a data quality assessment system; the automated training platform realizes distributed training, supports automatic hyperparameter search and automatic model selection, and integrates model evaluation and automated deployment processes; the model repository and version management module adopts Git-LFS to manage model files, and has a built-in model version control and rollback mechanism.
[0026] In a preferred embodiment of the present invention, the unified task decoder adopts a multi-task shared decoder architecture. The task head includes a detection head, a segmentation head, a classification head, and a regression head, and each task head is equipped with a task-specific adapter to reduce interference between multiple tasks.
[0027] In a preferred embodiment of the present invention, the cloud server and edge computing nodes of the deployment and evolution layer construct a hybrid inference architecture, dynamically allocate computing resources according to task type and device computing power, and the data flywheel closed-loop module has a built-in hard sample automatic identification and backhaul unit to realize incremental learning and continuous optimization of the model, and complete the model update of the edge computing nodes through OTA.
[0028] The present invention discloses a method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming, wherein the method includes the following steps: (1) Collect multimodal data from aquaculture scenarios, and clean, label, and augment the collected data to construct a multimodal aquaculture dataset; (2) Pre-train the visual encoding tower and the text encoding tower of the dual-tower encoder separately, semantically align the visual and text features through contrastive learning, and complete multi-task joint training on the unified task decoder to obtain the aquaculture multimodal large model; (3) Process real-time detection tasks in aquaculture scenarios on local edge computing nodes, and process complex cognitive and interaction tasks on cloud servers; (4) Based on the inference output of the aquaculture multimodal large model, perform poultry health monitoring and production performance analysis in the aquaculture scenario, and generate the corresponding disease early-warning information and management decision suggestions; (5) Automatically identify and transmit back difficult samples from the aquaculture scenario, periodically update the aquaculture multimodal large model in the cloud based on the returned samples, and update the models on the edge computing nodes via OTA.
[0029] As a preferred embodiment of the present invention, in step (1), multimodal data such as videos, images, environmental sensor data, and aquaculture management texts of the aquaculture scene are collected through cameras and sensor devices; and a multi-level labeling system is established during the preprocessing process.
[0030] In a preferred embodiment of the present invention, step (2) specifically includes the following steps: (2.1) Perform basic pre-training of the visual encoding tower and the text encoding tower on a general public dataset; (2.2) Perform cross-modal semantic alignment of visual features and text features through contrastive learning; (2.3) Based on the labeled dataset of aquaculture scenarios, perform joint training of the detection, segmentation, classification, and regression tasks on the unified task decoder to build the aquaculture multimodal large model.
[0031] In a preferred embodiment of the present invention, step (5) specifically comprises: The data flywheel closed loop enables the automatic collection, labeling, and backhaul of difficult samples. Incremental model training is carried out based on the backhauled samples, and regular iterative updates of the cloud model are completed. The model is then updated seamlessly on the edge computing nodes via OTA remote distribution.
[0032] This invention relates to the fields of artificial intelligence, computer vision, and animal husbandry technology, and particularly to a smart animal husbandry monitoring and analysis system and method based on a multimodal large model. This invention aims to solve the following problems existing in the prior art: delayed disease detection in animal husbandry scenarios, making early warning impossible; inaccurate production efficiency data, making it difficult to support refined management; management decisions relying on personal experience and lacking data support; existing AI systems having limited functionality and lacking comprehensive analytical capabilities; complex models being difficult to run in real time on edge devices; and difficulties in multimodal data fusion, hindering comprehensive analysis.
[0033] This invention provides a multimodal large-scale model system for intelligent poultry farming, comprising: 1. Data and infrastructure layer, including multimodal aquaculture database, automated training platform, model repository and version management.
[0034] A multimodal aquaculture database stores and manages multimodal data such as videos, images, sensors, and text from aquaculture scenarios.
[0035] An automated training platform for the automatic training, evaluation, and deployment of models.
[0036] Model repository and version management: Manages different versions of models and supports model rollback and A/B testing.
[0037] 2. Core model layer, namely the large-scale multimodal aquaculture model, which includes a dual-tower encoder, a multimodal fusion unit, and a unified task decoder.
[0038] A dual-tower encoder consists of a visual encoding tower and a text encoding tower.
[0039] The visual encoding tower uses a Swin Transformer V3 and a lightweight CNN in a parallel architecture to capture global context and local details.
[0040] The text encoding tower uses a lightweight BERT model to process text data in the aquaculture field.
[0041] The multimodal fusion unit includes an alignment module and an inference module.
[0042] The alignment module aligns visual and textual features through contrastive learning.
[0043] The inference module achieves deep fusion of multimodal features through a cross-attention mechanism.
[0044] The unified task decoder contains multiple task heads and supports various tasks such as detection, segmentation, classification, and regression.
[0045] 3. Task and application layer, including perceptual tasks, cognitive tasks, and interactive tasks.
[0046] Perception tasks include chicken detection, segmentation, and pose estimation.
[0047] Cognitive tasks include health diagnosis, behavioral understanding, and productivity prediction.
[0048] Interactive tasks include visual question answering, semantic retrieval, and natural language interaction.
[0049] 4. Deployment and Evolution Layer, including cloud servers, edge computing nodes, and data flywheel closed loop.
[0050] Cloud-based: Responsible for model development, continuous learning, and handling complex tasks.
[0051] Edge: Responsible for domain adaptation and real-time inference.
[0052] Data flywheel closed loop: enables the return of difficult samples and continuous model optimization.
[0053] 5. An intelligent farming method, comprising: Data acquisition and preprocessing: multimodal data from aquaculture scenarios are collected and processed; Model training: the visual and text towers of the dual-tower encoder are pre-trained separately, multimodal feature alignment is achieved through contrastive learning, and multi-task joint training is performed on the unified task decoder; Hybrid inference: local edge devices handle real-time detection tasks, while the cloud handles complex cognitive and interaction tasks; Intelligent analysis and decision-making: health monitoring and production performance analysis are performed on the model output to generate early-warning information and decision suggestions; Continuous model evolution: difficult samples are automatically collected and transmitted back, the cloud model is updated regularly, and the edge device models are updated via OTA.
[0054] The specific embodiments of the present invention include the following examples: 1. An example of the data and infrastructure layer is as follows: (1) Multimodal aquaculture database: It adopts a distributed storage architecture to support petabyte-level data storage; it enables automatic data classification and indexing to improve data retrieval efficiency; and it establishes a data quality assessment system to ensure data reliability.
[0055] (2) Automated training platform: It implements distributed training based on Kubernetes, supports automatic hyperparameter search and model selection, and integrates model evaluation and deployment processes.
[0056] (3) Model repository and version management: Git-LFS is used to manage model files, providing model version control and rollback, and supporting model A/B testing and canary releases.
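The version control and rollback behavior described for the model repository can be illustrated with a minimal in-memory sketch. This is not the Git-LFS mechanism named in the text; the `ModelRegistry` class, its method names, and the artifact paths are hypothetical and only show the bookkeeping a rollback needs.

```python
class ModelRegistry:
    """Hypothetical in-memory registry; a real system would back this with Git-LFS."""

    def __init__(self):
        self._versions = []  # ordered list of (version, artifact_path)
        self._active = None  # index of the version currently being served

    def register(self, version, artifact_path):
        """Store a new model version and make it the active one."""
        self._versions.append((version, artifact_path))
        self._active = len(self._versions) - 1

    def active(self):
        """Return the version tag that is currently served."""
        if self._active is None:
            raise LookupError("no model registered")
        return self._versions[self._active][0]

    def rollback(self):
        """Switch back to the version registered just before the active one."""
        if self._active is None or self._active == 0:
            raise LookupError("no earlier version to roll back to")
        self._active -= 1
        return self.active()


registry = ModelRegistry()
registry.register("v1", "models/v1.bin")  # paths are illustrative only
registry.register("v2", "models/v2.bin")
served_before = registry.active()
served_after = registry.rollback()
```

Keeping every registered version in order is what makes the rollback a constant-time pointer move rather than a retraining step.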
[0057] 2. An example of the core model layer is as follows: (1) Dual-tower encoder: Visual encoding tower: It uses Swin Transformer V3 Base as the backbone network, with a lightweight MobileNetV4 CNN branch connected in parallel to extract local details.
[0058] Text encoding tower: The lightweight TinyBERT model is used, with domain-adaptive pre-training on texts from the aquaculture field.
[0059] (2) Multimodal fusion unit: Alignment module: It adopts CLIP-style contrastive learning to achieve semantic alignment of visual and textual features.
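As a rough illustration of CLIP-style contrastive alignment, the sketch below computes a symmetric contrastive loss over a small batch of paired image and text feature vectors. The function names, the temperature value of 0.07, and the toy feature vectors are assumptions for illustration; the patent does not specify the loss in this detail.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Average of the image-to-text and text-to-image losses for paired features."""
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text-to-image
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Pair k of the image batch belongs with pair k of the text batch.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

The loss is low when each image is most similar to its own text and high when the pairs are mismatched, which is the signal that pulls the two towers into a shared feature space.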
[0060] Inference module: Employs a multi-layer cross-attention mechanism to achieve deep fusion of visual and textual features.
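A single cross-attention step can be sketched as follows, with text tokens acting as queries over visual tokens. This is a simplified, assumption-laden example: it omits the learned query, key, and value projections and the multiple layers a real inference module would have, and all names and values are illustrative.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query attends over all keys and returns a weighted sum of the values."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


text_tokens = [[1.0, 0.0], [0.0, 1.0]]                 # queries from the text tower
visual_tokens = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]   # keys and values from the visual tower
fused = cross_attention(text_tokens, visual_tokens, visual_tokens)
```

Each output row is a text token enriched with the visual content most relevant to it, which is the sense in which the features are fused rather than merely concatenated.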
[0061] (3) Unified Task Decoder: Design a multi-task shared decoder architecture, including multiple task heads such as detection head, segmentation head, classification head, and regression head, and implement task-specific adapters to reduce interference between tasks.
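One way to picture a shared decoder with task-specific adapters is the sketch below: a shared layer feeds four branches, and each branch owns a small residual bottleneck adapter plus its own head. The layer sizes, random weights, and class names are hypothetical, and the bottleneck adapter is one common design rather than the one the patent necessarily uses.

```python
import random


def linear(x, weight):
    """Bias-free dense layer; weight holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TaskBranch:
    """A bottleneck adapter followed by a task head; parameters belong to one task only."""

    def __init__(self, dim, bottleneck, out_dim, rng):
        self.down = random_matrix(bottleneck, dim, rng)
        self.up = random_matrix(dim, bottleneck, rng)
        self.head = random_matrix(out_dim, dim, rng)

    def __call__(self, shared):
        delta = linear(relu(linear(shared, self.down)), self.up)
        adapted = [s + d for s, d in zip(shared, delta)]  # residual connection
        return linear(adapted, self.head)


class UnifiedDecoder:
    """Shared layer plus one adapter-and-head branch per task (sizes are illustrative)."""

    def __init__(self, dim=8, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(dim, dim, rng)
        self.branches = {
            "detection": TaskBranch(dim, 2, 4, rng),       # e.g. one bounding box
            "segmentation": TaskBranch(dim, 2, 16, rng),   # e.g. a 4x4 grid of mask logits
            "classification": TaskBranch(dim, 2, 3, rng),  # e.g. three class scores
            "regression": TaskBranch(dim, 2, 1, rng),      # e.g. one scalar estimate
        }

    def forward(self, fused_features):
        shared = relu(linear(fused_features, self.shared))  # computed once for all tasks
        return {name: branch(shared) for name, branch in self.branches.items()}


outputs = UnifiedDecoder().forward([0.1 * i for i in range(8)])
```

Because the adapters are the only place where tasks diverge before their heads, updating one task's adapter leaves the shared layer and the other tasks' parameters untouched.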
[0062] 3. Examples of the intelligent farming method are as follows: (1) Data acquisition and preprocessing: multimodal data from aquaculture scenarios are collected using devices such as cameras and sensors; the data are cleaned, labeled, and augmented; and a multi-level annotation system is established to ensure annotation quality.
[0063] (2) Model training: the visual and text encoding towers are pre-trained on a general dataset; multimodal features are aligned through contrastive learning; and multi-task joint training is performed on the unified task decoder.
[0064] (3) Hybrid inference: edge devices handle real-time detection tasks; the cloud handles complex cognitive and interaction tasks; and computing resources are dynamically allocated based on task type and device capabilities.
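The allocation rule in this step could look like the following minimal sketch, which routes a request by task type and by the compute currently free on an edge node. The task names, the `EdgeNode` fields, and the abstract cost units are assumptions made for illustration, not values taken from the patent.

```python
from dataclasses import dataclass

# Assumed set of latency-sensitive perception tasks that should stay on the edge.
REALTIME_TASKS = {"detection", "segmentation", "pose_estimation"}


@dataclass
class EdgeNode:
    name: str
    free_compute: float  # hypothetical capacity units currently available


def route(task_type: str, cost: float, edge: EdgeNode) -> str:
    """Return 'edge' for real-time tasks the node can afford, otherwise 'cloud'."""
    if task_type in REALTIME_TASKS and cost <= edge.free_compute:
        return "edge"
    return "cloud"


node = EdgeNode(name="house-1", free_compute=4.0)
decisions = [
    route("detection", 1.0, node),                  # real-time and affordable
    route("visual_question_answering", 1.0, node),  # cognitive or interactive task
    route("detection", 8.0, node),                  # real-time but too heavy for this node
]
```

Falling back to the cloud when the edge node is saturated trades latency for availability, which is one plausible reading of "dynamically allocated".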
[0065] (4) Intelligent analysis and decision-making: health monitoring and production performance analysis are performed based on model output; early-warning information and decision recommendations are generated; and analysis results are provided to users through natural language interaction.
[0066] (5) Continuous model evolution: difficult samples are automatically collected and transmitted back; the cloud model is updated regularly; and edge device models are updated via OTA (Over-The-Air).
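Two of the production indicators mentioned in this document, weight uniformity and feed conversion ratio, can be computed from model outputs along the following lines. The definitions used here are assumptions: uniformity as the percentage of birds within 10% of the flock mean weight, and feed conversion ratio as feed mass divided by output mass. The patent does not state which definitions it uses, and the sample numbers are invented.

```python
def weight_uniformity(weights_g, band=0.10):
    """Percentage of birds whose weight lies within +/- band of the flock mean."""
    mean = sum(weights_g) / len(weights_g)
    inside = sum(1 for w in weights_g if abs(w - mean) <= band * mean)
    return 100.0 * inside / len(weights_g)


def feed_conversion_ratio(feed_kg, output_kg):
    """Feed consumed per unit of output (for example, egg mass); lower is better."""
    return feed_kg / output_kg


# Illustrative values only, e.g. per-bird weights estimated from images.
uniformity = weight_uniformity([1000, 1050, 950, 1300, 700])
fcr = feed_conversion_ratio(feed_kg=210.0, output_kg=100.0)
```

With these definitions, an improvement of 0.05 in the feed conversion ratio means 0.05 kg less feed per kilogram of output.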
[0067] This invention features a multimodal fusion architecture: an innovative design of a dual-tower encoder and a multimodal fusion unit to achieve deep fusion of visual and textual elements.
[0068] This invention features a unified task decoder: an innovative design that shares underlying features to create a unified task decoder that supports multiple tasks.
[0069] This invention enables edge-cloud collaborative inference through an innovative hybrid inference architecture that dynamically allocates computing resources based on task type and device capabilities.
[0070] This invention features a data flywheel closed loop and an innovative design for automatic identification and feedback of difficult samples, enabling incremental learning and continuous optimization of the model.
[0071] This invention enables specific optimization for aquaculture scenarios, optimizing the model architecture and algorithm to meet the special needs of aquaculture scenarios.
[0072] As shown in Figure 1, the connection relationships of this technical solution are as follows: Data and infrastructure layer output to core model layer: The multimodal aquaculture database provides labeled video, image, sensor, and text data for model training.
[0073] The automated training platform uses this data to complete the pre-training, contrastive learning alignment, and multi-task joint training of the dual-tower encoder, and then stores the generated model in the model repository.
[0074] The core model layer loads the latest version of the model from the model repository through an interface for inference.
[0075] Core model layer output to task and application layer: The unified task decoder outputs the results of basic tasks such as detection, segmentation, classification, and regression. These results are encapsulated into a unified feature representation for use by the upper-level task modules.
[0076] The perception task module directly uses visually relevant outputs; the cognition task module performs health diagnosis and behavior understanding based on perception results and text features; and the interaction task module combines multimodal fusion features to achieve visual question answering and semantic retrieval.
[0077] Task and application layer output to deployment and evolution layer: Real-time data generated by perception, cognition, and interaction tasks (such as edge detection results and complex cloud analysis requests) are distributed to the corresponding execution units.
[0078] Edge computing nodes are responsible for handling real-time perception tasks (such as chicken detection), while cloud servers handle complex cognitive and interactive tasks.
[0079] Low-confidence samples or error cases encountered during task execution are marked as difficult samples.
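A minimal sketch of this marking rule is shown below: a prediction is flagged as a difficult sample if its confidence falls under a threshold or if an operator's correction disagrees with the model. The `Prediction` fields and the 0.6 threshold are hypothetical; the patent does not give a threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    sample_id: str
    label: str
    confidence: float
    corrected_label: Optional[str] = None  # set when an operator overrides the model


def is_hard_sample(pred: Prediction, threshold: float = 0.6) -> bool:
    """Flag low-confidence predictions and predictions known to be wrong."""
    low_confidence = pred.confidence < threshold
    known_error = pred.corrected_label is not None and pred.corrected_label != pred.label
    return low_confidence or known_error


def select_hard_samples(preds: List[Prediction], threshold: float = 0.6) -> List[str]:
    """Return the IDs of the samples that should be sent back for retraining."""
    return [p.sample_id for p in preds if is_hard_sample(p, threshold)]


batch = [
    Prediction("a", "healthy", 0.95),
    Prediction("b", "healthy", 0.41),           # low confidence
    Prediction("c", "healthy", 0.90, "sick"),   # confident but corrected by an operator
]
hard_ids = select_hard_samples(batch)
```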
[0080] The deployment and evolution layer outputs to the data and infrastructure layer (closed loop): The data flywheel closed-loop module automatically collects difficult samples identified by edge nodes and the cloud, and transmits them back to the multimodal aquaculture database.
[0081] The automated training platform regularly uses the newly added difficult samples for incremental training. Once the optimized model has been versioned in the model repository, it is pushed to the edge nodes via OTA, completing the continuous evolution of the model.
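The closed loop described in the preceding paragraphs can be sketched as a small state machine: difficult samples accumulate, an update cycle runs once enough have arrived, and the new version is then pushed to every edge node. The `Flywheel` class, the batch-size trigger, and the integer version counter are assumptions; training and OTA delivery are replaced by stand-ins.

```python
class Flywheel:
    """Hypothetical closed loop: collect hard samples, retrain, redistribute."""

    def __init__(self, batch_size, edge_nodes):
        self.batch_size = batch_size
        self.pending = []  # hard samples waiting for the next training run
        self.cloud_version = 1
        self.edge_versions = {node: 1 for node in edge_nodes}

    def report(self, sample_id):
        """Called by an edge node or the cloud when a hard sample is identified."""
        self.pending.append(sample_id)

    def run_cycle(self):
        """Run one update if enough samples arrived; return True when a push happened."""
        if len(self.pending) < self.batch_size:
            return False
        self.pending.clear()          # stand-in for incremental training on these samples
        self.cloud_version += 1       # stand-in for versioning in the model repository
        for node in self.edge_versions:
            self.edge_versions[node] = self.cloud_version  # stand-in for the OTA push
        return True


loop = Flywheel(batch_size=2, edge_nodes=["house-1", "house-2"])
loop.report("b")
first = loop.run_cycle()   # only one sample so far, nothing happens
loop.report("c")
second = loop.run_cycle()  # threshold reached, model updated and pushed
```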
[0082] The modules within each layer are connected as follows: In the data and infrastructure layer, the multimodal aquaculture database, automated training platform, model repository, and version management module are connected in sequence, and data flows through them in that order.
[0083] In the core model layer, the dual-tower encoder, multimodal fusion unit, and unified task decoder are connected in sequence, and features are processed step by step.
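As an illustration of the fusion step between the encoder and the decoder, the sketch below applies single-head cross-attention in which visual tokens attend over text tokens. It is a simplified assumption: it omits learned projection matrices, multiple heads, and the task decoder, and the toy feature values are invented for the example.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Each query (visual token) becomes a weighted mix of the values (text tokens)."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused


# Toy outputs of the visual tower and the text tower (two tokens each).
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(visual_tokens, text_tokens, text_tokens)
```

In a full model the fused tokens would then be passed to the unified task decoder and its task heads.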
[0084] In the task and application layer, the perception task module, cognition task module, and interaction task module are connected in sequence, and the output of the lower-level task serves as the input of the higher-level task.
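The chaining of the three task modules can be sketched as three functions, each consuming the output of the one below it. All function names, the dictionary fields, and the rules inside them are placeholders chosen for illustration; in particular, the attention rule is a toy example and not a diagnostic method from this description.

```python
from typing import Dict, List


def perception(frame_boxes: List[Dict]) -> Dict:
    """Perception: summarize detections (boxes are supplied directly here)."""
    return {
        "count": len(frame_boxes),
        "inactive": sum(1 for b in frame_boxes if b["posture"] == "lying"),
    }


def cognition(percept: Dict, note: str) -> Dict:
    """Cognition: combine perception output with a text note (toy rule only)."""
    ratio = percept["inactive"] / percept["count"] if percept["count"] else 0.0
    flagged = ratio > 0.5 or "cough" in note.lower()
    return {"inactive_ratio": ratio, "needs_attention": flagged}


def interaction(question: str, percept: Dict, finding: Dict) -> str:
    """Interaction: answer a question from the lower-level outputs."""
    if "how many" in question.lower():
        return f"{percept['count']} birds detected"
    return "attention needed" if finding["needs_attention"] else "no issue flagged"


boxes = [{"posture": "standing"}, {"posture": "lying"}, {"posture": "lying"}]
percept = perception(boxes)
finding = cognition(percept, "normal feed intake")
reply = interaction("How many birds are in view?", percept, finding)
```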
[0085] In the deployment and evolution layer, cloud servers and edge computing nodes process data in parallel, and both are connected to the data flywheel closed-loop module for sample backhaul.
[0086] For the specific implementation scheme of this embodiment, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0087] It is understood that the same or similar parts in the above embodiments can be referred to each other, and the contents not described in detail in some embodiments can be referred to the same or similar contents in other embodiments.
[0088] It should be noted that in the description of this invention, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, in the description of this invention, unless otherwise stated, "a plurality of" means at least two.
[0089] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.
[0090] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0091] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The corresponding program can be stored in a computer-readable storage medium. When executed, the program performs one or a combination of the steps of the method embodiments.
[0092] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0093] The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disk, or the like.
[0094] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0095] This invention provides a system and method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming, with the following effects. Through multimodal data fusion and deep learning, it enables early disease warning, identifying potential health problems 24-48 hours in advance. It supports precise production management by monitoring and analyzing production performance data in real time, enabling accurate calculation of indicators such as stock count, individual egg production performance, and weight uniformity. It enables data-driven decision-making by providing a scientific basis for farming management decisions from multi-dimensional data analysis, reducing reliance on personal experience. It has comprehensive analytical capability, integrating tasks such as detection, segmentation, classification, and regression to achieve a comprehensive understanding of poultry farming and a leap in cognitive capability. It enables real-time edge inference: models are compressed, quantized, and optimized so that complex models run in real time on edge devices with end-to-end latency below 200 ms. It achieves multimodal data fusion, integrating data types such as video, images, sensor data, and text to provide more comprehensive and accurate analysis results. It evolves continuously, using the data flywheel closed loop to keep optimizing the model and adapting it to different farming scenarios and breeds. It reduces farming costs: mortality is reduced by more than 10%, the peak egg-laying period is extended by 1-2 weeks, the feed conversion ratio is improved by more than 0.05, and labor costs are reduced by more than 30%.
[0096] In this specification, the invention has been described with reference to specific embodiments thereof. However, it will be apparent that various modifications and variations can be made without departing from the spirit and scope of the invention. Therefore, the specification and drawings should be considered illustrative rather than restrictive.
Claims
1. A system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming, characterized in that the system comprises a data and infrastructure layer, a core model layer, a task and application layer, and a deployment and evolution layer, which are connected sequentially. The data and infrastructure layer includes a multimodal aquaculture database, an automated training platform, a model repository, and a version management module, which are connected sequentially. The multimodal aquaculture database is used to store and manage multimodal data such as videos, images, sensor data, and text data from aquaculture scenarios. The automated training platform is used to automatically train, evaluate, and deploy models. The model repository and version management module are used to manage multiple versions of models. The core model layer includes a dual-tower encoder, a multimodal fusion unit, and a unified task decoder, which are connected sequentially. The dual-tower encoder is used to process visual data from aquaculture scenarios and text data from the aquaculture field. The multimodal fusion unit is used to semantically align visual features with text features and to deeply fuse visual and text multimodal features. The unified task decoder contains multiple task heads for inference output of aquaculture-related tasks such as detection, segmentation, classification, and regression. The task and application layer includes a perception task module, a cognitive task module, and an interaction task module, which are connected sequentially. The perception task module is used to perform visual perception tasks such as chicken detection, segmentation, and pose estimation. The cognitive task module is used to perform cognitive analysis tasks such as poultry health diagnosis, behavior understanding, and production performance prediction. The interaction task module is used to perform human-computer interaction tasks such as visual question answering, semantic retrieval, and natural language interaction. The deployment and evolution layer includes a cloud server, edge computing nodes, and a data flywheel closed-loop module, which are connected sequentially. The cloud server is used to perform model development, continuous learning, and complex task processing. The edge computing nodes are used to achieve domain adaptation and real-time inference in aquaculture scenarios. The data flywheel closed-loop module is used to achieve difficult sample backflow and continuous model optimization.
2. The system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 1, characterized in that the dual-tower encoder includes a visual encoding tower and a text encoding tower. The visual encoding tower is used to extract global context and local detail features from visual data of aquaculture scenarios, while the text encoding tower is used to process text data in the aquaculture field and extract text features.
3. The system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 1, characterized in that the multimodal fusion unit includes an alignment module and an inference module. The alignment module semantically aligns visual features with text features through contrastive learning, and the inference module deeply fuses visual and text multimodal features through a cross-attention mechanism.
4. The system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 1, characterized in that the multimodal aquaculture database adopts a distributed storage architecture, with a built-in automatic data classification and indexing module and a data quality assessment system; the automated training platform realizes distributed training, supports automatic hyperparameter search and automatic model selection, and integrates model evaluation and automated deployment processes; the model repository and version management module uses Git-LFS to manage model files and has a built-in model version control and rollback mechanism.
5. The system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 1, characterized in that the unified task decoder adopts a multi-task shared decoder architecture. The task heads include a detection head, a segmentation head, a classification head, and a regression head, and each task head is configured with a task-specific adapter to reduce interference between multiple tasks.
6. The system for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 1, characterized in that the cloud server and edge computing nodes of the deployment and evolution layer construct a hybrid inference architecture, dynamically allocating computing resources according to task type and device computing power. The data flywheel closed-loop module has a built-in unit for automatic identification and feedback of difficult samples, which is used to realize incremental learning and continuous optimization of the model, and completes the model update of the edge computing nodes through OTA.
7. A method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming using the system according to claim 1, characterized in that the method includes the following steps: (1) collecting multimodal data from aquaculture scenarios, and cleaning, labeling, and augmenting the collected data to construct a multimodal aquaculture dataset; (2) pre-training the visual encoding tower and the text encoding tower of the dual-tower encoder separately, achieving semantic alignment of visual and text multimodal features through contrastive learning, and completing multi-task joint training on the unified task decoder to obtain a large multimodal aquaculture model; (3) processing real-time detection tasks in aquaculture scenarios on local edge computing nodes, and processing complex cognitive and interactive tasks on cloud servers; (4) based on the inference output of the large multimodal aquaculture model, completing poultry health monitoring and production performance analysis in the aquaculture scenario, and generating corresponding disease early-warning information and aquaculture management decision suggestions; (5) automatically identifying and transmitting back difficult samples from the aquaculture scenario, periodically updating the large multimodal aquaculture model in the cloud based on the returned difficult samples, and updating the models of the edge computing nodes through OTA.
8. The method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 7, characterized in that, in step (1), multimodal data such as videos, images, environmental sensor data, and aquaculture management text are collected from the aquaculture scene using cameras and sensor devices, and a multi-level annotation system is established during preprocessing.
9. The method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 7, characterized in that step (2) specifically includes the following steps: (2.1) performing basic pre-training of the visual encoding tower and the text encoding tower on a general public dataset; (2.2) performing cross-modal semantic alignment of visual features and text features through contrastive learning; (2.3) based on the labeled dataset of aquaculture scenarios, performing joint training of detection, segmentation, classification, and regression tasks on the unified task decoder to build the large multimodal aquaculture model.
10. The method for monitoring and analysis based on a multimodal large-scale model of intelligent poultry farming according to claim 7, characterized in that step (5) specifically comprises: automatically collecting, labeling, and backhauling difficult samples through the data flywheel closed loop; performing incremental model training based on the backhauled samples to complete regular iterative updates of the cloud model; and seamlessly updating the models on the edge computing nodes via OTA remote distribution.