A water quality detection method for a factory circulating water culture system

By combining multi-parameter water quality sensors and underwater visual scanning technology in a factory-scale recirculating aquaculture system, and employing a multimodal data fusion network with self-attention and cross-domain attention mechanisms, high-precision prediction and early warning of water quality status are achieved. This solves the problems of water quality prediction lag and limited accuracy in existing technologies and improves the adaptive control capability of the aquaculture system.

CN122045963B · Active · Publication Date: 2026-06-16 · YELLOW SEA FISHERIES RES INST CHINESE ACAD OF FISHERIES SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: YELLOW SEA FISHERIES RES INST CHINESE ACAD OF FISHERIES SCI
Filing Date: 2026-04-17
Publication Date: 2026-06-16

IPC: G06F18/241; G01N33/18; G06F18/27; G06F18/25; G06F18/213; G06V20/05; G06V20/40; G06V40/20; G06V10/44; G06V10/82; G06V10/52; G06N3/045; G06N3/049; G06N3/0464; G06N3/047; G06N3/0499; G06F123/02

AI Tagging

Application Domain

Character and pattern recognition; Biological models

AI Technical Summary

Technical Problem

Existing technologies struggle to integrate multi-source heterogeneous data and predict water quality in factory-scale recirculating aquaculture systems. This results in water quality warnings relying on alarms triggered by parameters that have already deteriorated, hindering proactive early risk identification and intelligent decision-making.

Method used

By using a multi-parameter water quality sensor array and dynamic visual scanning of underwater community biological behavior, combined with a multimodal data fusion network that integrates self-attention and cross-domain attention mechanisms, the system extracts and outputs a holographic quantitative representation of the current comprehensive ecological water quality status of the aquaculture water body, thereby achieving multi-task joint decoding and water quality index inversion prediction.

Benefits of technology

It achieves high-precision and forward-looking prediction of water quality, can provide early warning of water quality deterioration, reduce physiological damage to aquaculture organisms, and improve the accuracy and timeliness of detection and early warning through closed-loop adaptive management and control.

Abstract

The application provides a water quality detection method for a factory-scale recirculating aquaculture system and belongs to the technical field of deep learning and intelligent aquaculture. Multi-dimensional physicochemical parameters of the aquaculture environment and dynamic visual information on underwater population behavior are collected at high frequency; the multi-source heterogeneous data undergo anomaly filtering and standardized spatiotemporal tensor processing to construct a joint data benchmark aligned in space and time. A deep residual network and a visual self-attention mechanism extract visual features and identify the stress state of the population. Temporal convolution and cross-domain cross-attention fuse the multimodal data, bridging the gap between lagging physicochemical data and abrupt visual changes and forming a holographic water quality representation tensor. Finally, multi-task joint decoding inverts the biochemical water quality indicators, determines the safety level, generates a comprehensive early warning message, and drives closed-loop control of the underlying equipment. The application improves the accuracy and timeliness of water quality detection and early warning and the level of adaptive control of the aquaculture environment.

Description

Technical Field

[0001] This invention belongs to the field of deep learning and intelligent aquaculture technology, and in particular relates to a water quality detection method for a factory-scale recirculating aquaculture system.

Background Technology

[0002] In the fields of modern agriculture and intelligent equipment technology, recirculating aquaculture systems (RAS) are widely used in the intensive production of high-value aquatic products due to their small footprint, high water resource utilization, high stocking density, and strong environmental controllability. However, in actual aquaculture production and management, managers typically rely on a single array of physicochemical sensors, pre-setting absolute safety thresholds for various water quality parameters. They then collect water quality data periodically or frequently, and use basic data statistics and charting methods to conduct post-event assessments of the water condition. Under this monitoring model, early warning of water quality often depends on alarms triggered by parameters exceeding limits after deterioration, making it difficult to establish proactive early risk identification and intelligent decision-making pathways.

[0003] The aforementioned traditional one-way monitoring methods have revealed a series of significant shortcomings in the field of factory-scale recirculating aquaculture. First, water treatment systems exhibit a significant physical lag effect. By the time water quality sensors detect excessive concentrations of harmful substances such as ammonia nitrogen or nitrite, water environment deterioration is often already inevitable, causing irreversible physiological damage to farmed organisms. Existing methods struggle to characterize the early mapping relationship between water quality evolution trends and fish stress states, let alone provide early warnings of sudden water quality changes. Second, abnormal behavior in farmed organisms is the most direct early indication of water quality deterioration, but existing methods cannot directly quantify and integrate these high-frequency visual abrupt changes with low-frequency physicochemical lag characteristics. Therefore, it is impossible to deduce the specific concentrations and risk levels of core biochemical indicators of water quality from these dynamic visual manifestations.

[0004] Therefore, how to achieve the fusion of multi-source heterogeneous data and water quality prediction while eliminating the modal gap between physicochemical hysteresis and visual abrupt change has become a key technical problem in this field.

[0005] From the perspective of existing technologies, water quality testing mainly employs single physicochemical model prediction and basic machine vision recognition methods. While physicochemical models can predict future trends based on historical time-series data, they neglect the immediate stress feedback of organisms as living sensors, resulting in prediction lag and limited accuracy. Basic machine vision methods typically focus on fish population statistics or simple dead fish identification, lacking the ability to explicitly model the complex morphological characteristics of groups and their correlation with physicochemical evolution trends. This makes it difficult to output intuitive and quantifiable prediction results of core biochemical indicators, thus hindering the direct conversion of recognition results into specific automated control commands and early warning messages for underlying equipment. In summary, the existing technological system cannot yet meet the actual needs of factory-scale recirculating aquaculture systems for high-precision water quality inversion prediction.

Summary of the Invention

[0006] To address the above problems, this invention provides a method for water quality testing in a factory-scale recirculating aquaculture system, comprising the following steps:

[0007] S1: Continuously collect a multi-dimensional physicochemical sensor time series through a multi-parameter water quality sensor array in the aquaculture pond; simultaneously, perform dynamic visual scanning and image sequence extraction of underwater group biological behavior to obtain a dynamic video stream of underwater group behavior.

[0008] Preprocess the collected data to obtain a standardized physicochemical feature time-series tensor and a set of spatiotemporal visual image frames.

[0009] S2: Input the set of spatiotemporal visual image frames into a morphological visual feature extraction network that integrates a self-attention mechanism, and extract and output a dimensionality-reduced visual representation feature vector that represents the current biological stress state in the aquatic environment.

[0010] S3: Input the standardized physicochemical feature time-series tensor and the dimensionality-reduced visual representation feature vector into a multimodal data fusion decoupling network based on temporal convolution and a cross-domain attention mechanism, in order to mine the long-term evolution trend of the physicochemical features and output a holographic water quality state fusion representation tensor that quantifies the current comprehensive ecological water quality level of the aquaculture water body.

[0011] S4: Based on the holographic water quality state fusion representation tensor, perform multi-task joint decoding for water quality index inversion and prediction. A continuous numerical regression branch inverts the core biochemical indicators of water quality into a quantitative prediction matrix.

[0012] A classification decision branch outputs the safety rating label.

[0013] Preferably, the multi-parameter water quality sensor array continuously collects electrical signals of water temperature, pH, dissolved oxygen concentration, and redox potential at various depths, and extracts the absolute physical timestamp at the corresponding sampling moment; then, the absolute physical timestamp and the corresponding electrical signal are bound to the underlying time axis to generate a single-point multidimensional water quality state primitive; finally, all the single-point multidimensional water quality state primitives collected in the continuous time steps are serialized and arranged in chronological order to output a multidimensional physicochemical sensor time sequence without algorithm feature processing.
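The binding and serialization described in [0013] can be illustrated with a small sketch. The class name `WaterQualityPrimitive`, its field names, the `depth_zone` label, and the choice of seconds as the timestamp unit are all hypothetical; the patent text only states that each reading is bound to its absolute physical timestamp and that the primitives are arranged in chronological order without further feature processing.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class WaterQualityPrimitive:
    """One timestamped multi-parameter reading (hypothetical layout)."""
    timestamp: float         # absolute physical timestamp, seconds (assumed unit)
    depth_zone: str          # e.g. "surface", "middle", "bottom" (assumed label)
    temperature: float       # raw signal for water temperature
    ph: float                # raw signal for pH
    dissolved_oxygen: float  # raw signal for dissolved oxygen concentration
    orp: float               # raw signal for redox potential


def serialize_primitives(primitives: Iterable[WaterQualityPrimitive]) -> List[WaterQualityPrimitive]:
    """Arrange primitives in chronological order; values are left untouched."""
    return sorted(primitives, key=lambda p: p.timestamp)
```

Sorting on the timestamp alone keeps the raw signal values unmodified, which matches the "without algorithm feature processing" wording of the paragraph.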

[0014] Preferably, the specific process of acquiring the dynamic video stream of underwater group behavior is as follows:

[0015] First, a binocular camera is controlled to continuously capture panoramic images of the fish school's movement trajectories and feeding activity at a fixed video frame rate, yielding raw underwater color dynamic video. Second, frame-level alignment based on minimum time difference matching is performed on the raw video against the absolute physical timestamps: the absolute difference between the exposure time of each video frame and each absolute physical timestamp is calculated, and only the key video frame with the smallest time difference is retained, so that the multimodal data are synchronized on the time axis. Third, the global color space histogram features of each key video frame are extracted and encapsulated into a raw single-frame underwater visual image primitive. Finally, the raw single-frame primitives corresponding to the same time step are channel-concatenated and stacked along the time dimension, and the underwater group behavior dynamic video stream, parallel in time to the multi-dimensional physicochemical sensor time series, is output.
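The minimum time difference matching in [0015] amounts to a nearest-timestamp lookup. The following is a minimal sketch, assuming frame exposure times are sorted in ascending order and expressed in the same unit as the sensor timestamps (milliseconds in the example); the function name `align_frames` is illustrative and not taken from the patent.

```python
from bisect import bisect_left
from typing import List, Sequence


def align_frames(frame_times: Sequence[float], sensor_times: Sequence[float]) -> List[int]:
    """For each sensor timestamp, return the index of the video frame whose
    exposure time has the smallest absolute difference to it.

    frame_times must be sorted in ascending order. On an exact tie the
    earlier frame is kept (an arbitrary choice made for this sketch).
    """
    if not frame_times:
        raise ValueError("no video frames to align against")
    indices = []
    for t in sensor_times:
        pos = bisect_left(frame_times, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(frame_times)]
        indices.append(min(candidates, key=lambda i: abs(frame_times[i] - t)))
    return indices


# Frames every 40 ms (25 fps), sensor samples at 50, 110 and 170 ms.
key_frames = align_frames([0, 40, 80, 120, 160], [50, 110, 170])
```

Here `key_frames` is `[1, 3, 4]`: the frames exposed at 40 ms, 120 ms and 160 ms are retained as key frames and all others are discarded.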

[0016] Preferably, the morphological visual feature extraction network incorporating a self-attention mechanism is specifically:

[0017] First, a set of high-frequency spatiotemporal visual image frames is input into the visual backbone feature extractor to perform layer-by-layer spatial downsampling and feature mapping operations, extracting and outputting the basic feature map of shallow water morphology. Then, a multi-scale self-attention mechanism is used to redistribute visual features and generate dimensionality reduction representations. The basic feature map of shallow water morphology is input into the visual self-attention encoding module to generate a global spatial attention feature matrix and a multi-scale underwater visual perception feature tensor. After passing through global spatial feature aggregation operations and high-dimensional information compression, the dimensionality reduction visual representation feature vector representing the current biological stress state of the aquatic environment is finally extracted and output.

[0018] Preferably, the multimodal data fusion decoupling network based on temporal convolution and cross-domain attention mechanisms specifically comprises:

[0019] First, a physicochemical evolution trend feature extraction based on one-dimensional causal convolution is implemented. The standardized physicochemical feature time-series tensor is input into a one-dimensional dilated causal convolutional network to perform causal sliding feature extraction and time dimension feature compression, outputting a latent feature vector of physicochemical evolution trend. Then, multimodal cross-domain cross-attention fusion and holographic state tensor generation are performed. The latent feature vector of physicochemical evolution trend and the dimensionality-reduced visual representation feature vector are simultaneously input into the cross-domain cross-attention fusion module for correlation score calculation and cross-modal feature redistribution. The residual jump connection and nonlinear feature smoothing fusion are then performed with the original physicochemical features to output a holographic water quality state fusion representation tensor that can holographically quantify the current comprehensive ecological water quality level of the aquaculture water body.

[0020] Preferably, the implementation of the physicochemical evolution trend feature extraction based on one-dimensional causal convolution is as follows: the standardized physicochemical feature time series tensor is input into a one-dimensional dilated causal convolutional network containing an input mapping layer, four cascaded residual modules, a batch normalization layer, a ReLU activation function layer, a time dimension aggregation module, and a fully connected layer;

[0021] First, the input mapping layer is invoked to perform an initial channel dimension expansion on the standardized physicochemical feature time-series tensor, generating an initial time-series feature matrix. Second, the initial time-series feature matrix is passed through the four cascaded residual modules in turn. Each cascaded residual module performs causal sliding feature extraction unidirectionally along the time axis. Before each one-dimensional convolution, the input features are zero-padded on one side of the time axis to prevent information from future time steps leaking into earlier outputs. The one-dimensional convolution kernel within each cascaded residual module is configured with a dilation coefficient that increases exponentially with base 2, which multiplies the receptive field along the time dimension by sampling with skipped steps and generates a dilated causal feature tensor. Third, the one-dimensional causal convolution operators inside the cascaded residual module perform feature mapping on the dilated causal feature tensor, and the results are passed to the batch normalization layer and the ReLU activation function layer, which perform gradient stabilization and nonlinear activation. The nonlinear activation result is then added element-wise to the original input features of the residual module (a residual skip connection) in order to mine the long-term dependencies of the physicochemical indicators over time and output a deep temporal feature tensor. Finally, the deep temporal feature tensor of the last time step of the one-dimensional dilated causal convolutional network is extracted and input into the time dimension aggregation module, which is configured with a one-dimensional global average pooling layer. The one-dimensional global average pooling layer averages the values of each feature channel in the deep temporal feature tensor over all time steps to complete the time dimension feature compression. Then, the fully connected layer at the end of the one-dimensional dilated causal convolutional network flattens and reduces the dimensionality of the compressed features and outputs the latent feature vector of the physicochemical evolution trend.
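The one-sided zero padding and base-2 dilation schedule of [0021] can be sketched in plain Python for a single feature channel. This is an illustration under stated assumptions rather than the patented network: the kernel size of 3, one convolution per residual module, and the omission of batch normalization, the input mapping layer, the pooling layer, and the final fully connected layer are all simplifications chosen here.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Single-channel causal convolution: output[t] depends only on x[t],
    x[t - dilation], x[t - 2*dilation], ... Zeros are padded on the left
    (past) side only, so no future sample can influence output[t]."""
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def residual_stack(x: Sequence[float], kernel: Sequence[float], num_layers: int = 4) -> List[float]:
    """Cascade of residual modules with dilations 1, 2, 4, 8, ...
    Each module computes h + ReLU(conv(h)); batch normalization is omitted."""
    h = list(x)
    for i in range(num_layers):
        conv = causal_dilated_conv1d(h, kernel, 2 ** i)
        h = [a + max(0.0, c) for a, c in zip(h, conv)]
    return h


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Number of past-and-present time steps that can influence one output."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))
```

With the assumed kernel size of 3 and four modules, `receptive_field(3, 4)` is 31 time steps, which shows how the exponential dilation schedule widens the temporal context without adding layers. Changing only the last input sample leaves every earlier output of `residual_stack` unchanged, which is the no-leakage property the paragraph describes.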

[0022] Preferably, the specific process of performing multimodal cross-domain cross-attention fusion and holographic state tensor generation is as follows: The latent feature vector of the physicochemical evolution trend and the dimensionality-reduced visual representation feature vector are simultaneously input into the cross-domain cross-attention fusion module, which contains a linear mapping network and a matrix multiplication operator. First, three independent fully connected mapping layers perform linear projections on the input cross-modal features: the first fully connected mapping layer transforms the latent feature vector of the physicochemical evolution trend into the query matrix of the cross-domain attention mechanism, and the second and third fully connected mapping layers transform the dimensionality-reduced visual representation feature vector into the key matrix and the value matrix, respectively. Second, the matrix multiplication operator multiplies the query matrix by the transpose of the key matrix (a dot-product operation) to calculate the correlation score between the different physical modalities, and the correlation score is then divided by a scaling factor to prevent vanishing gradients. The scaled correlation score is normalized with the Softmax activation function to generate a multimodal cross-domain interaction weight matrix spanning the different physical modalities. The multimodal cross-domain interaction weight matrix is multiplied by the value matrix to redistribute features across modalities, from visual features to physicochemical features, and generate a redistributed feature tensor. Finally, the redistributed feature tensor is added element-wise to the original latent feature vector of the physicochemical evolution trend through a residual skip connection, which preserves the original physicochemical time-series information. The tensor after residual addition is then input into a feature smoothing processing unit, composed of a layer normalization operator followed by a feedforward fully connected network, which performs high-dimensional space mapping and nonlinear feature fusion and outputs the holographic water quality state fusion representation tensor.
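The query/key/value flow of [0022] can be sketched as follows. Several points are assumptions made for illustration: the visual features are treated as a small set of tokens (with a single visual vector the Softmax weight would trivially equal 1), the scaling factor is taken to be the square root of the query dimension as in standard scaled dot-product attention, the projection matrices are passed in explicitly instead of being learned, and the layer normalization and feedforward stages are omitted.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Apply a linear projection (one fully connected layer without bias)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(phys: Sequence[float], visual_tokens: Sequence[Sequence[float]],
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[List[float], List[float]]:
    """Query from the physicochemical vector, keys and values from visual tokens.
    Returns (fused vector after the residual addition, attention weights).
    The value dimension must equal len(phys) for the residual addition."""
    q = matvec(w_q, phys)
    keys = [matvec(w_k, tok) for tok in visual_tokens]
    values = [matvec(w_v, tok) for tok in visual_tokens]
    scale = math.sqrt(len(q))  # assumed scaling factor
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    attended = [sum(w * v[i] for w, v in zip(weights, values))
                for i in range(len(values[0]))]
    fused = [p + a for p, a in zip(phys, attended)]  # residual skip connection
    return fused, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = cross_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                                 identity, identity, identity)
```

In this toy call the first visual token is aligned with the physicochemical query, so it receives the larger weight, and the residual addition keeps the original physicochemical vector present in the fused output.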

[0023] Preferably, in step S4, the holographic water quality status fusion representation tensor is first input into the multi-task joint decoder to purify and generate a decoder-shared semantic feature tensor. Then, the tensor is subjected to high-dimensional spatial mapping and physical dimension inversion through the first continuous numerical regression prediction branch to obtain the peak values of ammonia nitrogen and nitrite concentration changes in the water body within a future preset time window, and a quantitative prediction matrix of core biochemical indicators of water quality is constructed. Subsequently, the decoder-shared semantic feature tensor is input into the second classification decision branch to perform normalized probability mapping calculation to generate a water quality safety status rating label.

[0024] Preferably, the specific generation process of the quantitative prediction matrix of the core biochemical indicators of water quality is as follows:

[0025] The holographic water quality state fusion representation tensor is input into a multi-task joint decoder containing a shared feature dimensionality reduction network and two independent task mapping branches. First, the shared feature dimensionality reduction network, which contains two downsampling layers, performs feature purification and dimensionality compression on the holographic water quality state fusion representation tensor to generate a decoder-shared semantic feature tensor. Second, the decoder-shared semantic feature tensor is input into the first continuous numerical regression prediction branch of the multi-task joint decoder, where alternating hidden fully connected layers, nonlinear activation functions, and dropout (random deactivation) operators perform multi-level high-dimensional space mapping and overfitting suppression on the input features to generate the latent feature vector of the regression task. Next, the two independent linear output nodes configured in parallel at the end of the first continuous numerical regression prediction branch perform linear activation and scalar mapping on the latent feature vector of the regression task and output dimensionless normalized prediction values. Then, the inverse min-max normalization operator maps the normalized prediction values back to physical units, predicting the peak ammonia nitrogen concentration and the peak nitrite concentration in the water body within a preset time window. Finally, the inverted peak values are concatenated in the preset channel order and encapsulated into a two-dimensional numerical matrix.
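The final step of [0025], restoring physical units from normalized outputs, is the inverse of min-max scaling. The sketch below assumes the normalization ranges are known from training; the specific ranges shown (0 to 2 mg/L for ammonia nitrogen, 0 to 1 mg/L for nitrite) are made-up placeholders, not values from the patent, and the 1×2 row layout is one possible reading of "two-dimensional matrix in preset channel order".

```python
from typing import Dict, List, Tuple

# Hypothetical training-set ranges in mg/L; placeholders for illustration only.
ASSUMED_RANGES: Dict[str, Tuple[float, float]] = {
    "ammonia_nitrogen": (0.0, 2.0),
    "nitrite": (0.0, 1.0),
}
CHANNEL_ORDER = ("ammonia_nitrogen", "nitrite")  # assumed preset channel order


def inverse_min_max(normalized: float, lo: float, hi: float) -> float:
    """Undo x_norm = (x - lo) / (hi - lo)."""
    return normalized * (hi - lo) + lo


def build_prediction_matrix(normalized: Dict[str, float]) -> List[List[float]]:
    """Map the two normalized regression outputs back to concentrations and
    pack them as a single-row matrix in the preset channel order."""
    row = [inverse_min_max(normalized[name], *ASSUMED_RANGES[name])
           for name in CHANNEL_ORDER]
    return [row]


matrix = build_prediction_matrix({"ammonia_nitrogen": 0.5, "nitrite": 0.25})
```

With the placeholder ranges, the normalized outputs 0.5 and 0.25 map to predicted peaks of 1.0 mg/L ammonia nitrogen and 0.25 mg/L nitrite.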

[0026] Preferably, the step of combining the classification decision branch to output the security rating identifier specifically includes:

[0027] First, the generated decoder-shared semantic feature tensor is input into the second classification decision branch of the multi-task joint decoder. The multiple fully connected mapping layers configured within the second classification decision branch perform high-dimensional space dimensionality reduction and feature recombination on the input features to generate a classification decision latent feature vector. Second, the nonlinear classifier nodes connected in series at the end of the second classification decision branch, together with a Softmax activation function, perform normalized probability mapping on the classification decision latent feature vector. The output is a discrete probability distribution over four independent categories for the current circulating water body: high quality, sub-healthy, slightly polluted, and severely deteriorated. The category label with the highest probability is extracted to generate the water quality safety status rating label.
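The normalized probability mapping and label selection of [0027] can be sketched as a Softmax followed by an argmax over the four categories. The use of Softmax here is an assumption consistent with the "normalized probability mapping" wording; the function name and the example scores are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ("high quality", "sub-healthy", "slightly polluted", "severely deteriorated")


def rate_water_quality(scores: Sequence[float]) -> Tuple[str, List[float]]:
    """Turn four raw classifier scores into a probability distribution and
    return the label with the highest probability plus the distribution."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per category")
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = rate_water_quality([0.1, 2.0, 0.3, -1.0])
```

For the example scores the second category has the largest score, so `label` is `"sub-healthy"` and `probs` sums to 1.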

[0028] Compared with the prior art, the present invention has the following innovative features and beneficial effects:

[0029] 1. Design of a standardized spatiotemporal reconstruction and morphological visual feature extraction network for multi-source heterogeneous data: A multimodal feature extraction architecture based on absolute physical timestamp alignment and self-attention mechanism is proposed. This architecture innovatively introduces a visual self-attention encoding module. By calculating global spatial dependency weights, a global spatial attention feature matrix is generated and bilinear interpolation upsampling and multi-scale feature concatenation are performed. Finally, a dimensionality-reduced visual representation feature vector representing the current biological stress state of the aquatic environment is extracted.

[0030] 2. Design of a cross-domain attention multimodal fusion mechanism to eliminate the gap between physical and chemical lag and visual mutation: A parallel decoupling and deep fusion strategy based on one-dimensional dilated causal convolution and cross-domain attention fusion is proposed, which abandons single-modal serial analysis. In the multimodal data processing, a one-dimensional dilated causal convolutional network containing a cascaded residual module is constructed to target biochemical hysteresis features. By configuring an exponentially increasing dilation coefficient and zero-padding on one side of the time axis, the receptive field of the time dimension is multiplied to prevent future information leakage and accurately extract the implicit feature vectors of physicochemical evolution trends. On this basis, a cross-domain cross-attention fusion module is constructed as a modal interaction hub. Three independent fully connected mapping layers are used to map physicochemical temporal features and visual representation features into query matrices and key-value matrices, respectively. The correlation scores between different physical modalities are calculated by transpose matrix multiplication dot product operation, and the Softmax activation function is used to generate a multimodal cross-domain interaction weight matrix to perform cross-modal feature redistribution. Furthermore, by combining residual skip connections and feature smoothing processing units with internally connected layer normalization operators, the cross-validation and feature compensation of visual stress signals to biochemical hysteresis signals are realized, and the output is a holographic water quality state fusion representation tensor that can holographically quantify the current comprehensive ecological water quality level of aquaculture water bodies.

[0031] 3. Quantitative Inversion of Water Quality Biochemical Indicators Based on Multi-Task Joint Decoding: Addressing the lack of forward-looking quantitative early warning in industrialized aquaculture, a full-link deep feature decoding and structured message generation system is established. A structured data encapsulation method based on generation timestamp primary key alignment is proposed. This method performs absolute time alignment operations on the quantitative prediction matrix of core water quality biochemical indicators and the water quality safety status rating identifier, constructing a standard message data dictionary based on key-value pair mapping. Following preset protocol specifications, a start frame header, physical address, and cyclic redundancy check (CRC) code frame tail are added sequentially to generate a comprehensive water quality detection early warning message. This transforms the abstract model deduction results into a structured communication payload that can directly trigger automated control commands for underlying equipment, achieving closed-loop adaptive management and control across the entire business chain, from status perception to early warning intervention.

Attached Figure Description

[0032] To more clearly illustrate the technical solutions of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the following description is only one embodiment of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0033] Figure 1 is a flowchart illustrating the overall technical process of the present invention.

[0034] Figure 2 is a schematic diagram of the data acquisition and preprocessing process of the present invention.

[0035] Figure 3 is a diagram of the morphological visual feature extraction network architecture that incorporates a self-attention mechanism, as described in this invention.

[0036] Figure 4 is a diagram of the multimodal data fusion decoupling network architecture based on temporal convolution and cross-domain attention mechanisms of the present invention.

[0037] Figure 5 is a diagram of the water quality index inversion and prediction architecture based on multi-task joint decoding of the present invention.

[0038] Figure 6 is a comparison chart of the results of multiple algorithms in predicting core biochemical indicators of water quality in the embodiments.

[0039] Figure 7 shows the distribution of the early warning response time deviation before and after the introduction of visual feature compensation in the embodiment.

[0040] Figure 8 is a comparison curve of detection stability under sensor noise interference in the embodiment.

Detailed Implementation

[0041] To achieve proactive quantitative assessment and adaptive control of water quality in factory-scale aquaculture, this invention proposes a water quality detection method for factory-scale recirculating aquaculture systems. The overall process is shown in Figure 1. The method performs high-frequency acquisition of multi-dimensional physicochemical parameters of the underlying aquaculture environment and dynamic visual scanning of underwater group biological behavior. It applies joint anomaly filtering and standardized spatiotemporal tensor generation to the multi-source heterogeneous data to establish a joint data benchmark with aligned physical timelines and standardized spatiotemporal modalities. A morphological visual feature extraction network is built on a deep residual network and a visual self-attention encoding mechanism. A visual backbone feature extractor purifies the shallow basic features, and a multi-scale self-attention mechanism redistributes the high-dimensional visual features and generates a dimensionality-reduced representation that captures the group's stress state. Furthermore, a multimodal data processing mechanism based on temporal convolution and cross-domain cross-attention is implemented. Following a decoupling strategy, one-dimensional dilated causal convolution is used to mine the long-term evolution trend of the physicochemical properties, and a cross-domain cross-attention fusion module bridges the modal gap between physicochemical lag and abrupt visual changes, outputting a holographic water quality state fusion representation tensor. Finally, water quality index inversion prediction and closed-loop adaptive control based on multi-task joint decoding are executed. The core biochemical indicators of water quality are inverted and quantitatively predicted through the continuous numerical regression branch. Combined with the safety rating label output by the classification decision branch, a comprehensive early warning message containing continuous concentrations and a discrete risk level is generated, triggering closed-loop control of the underlying equipment. This improves the accuracy and timeliness of detection and early warning, and the closed-loop adaptive control capability, at industrial sites.

[0042] The implementation process of the present invention will be further described below with reference to specific embodiments.

[0043] S1. Synchronous acquisition and reconstruction of multimodal water quality parameters and population visual data for factory-scale aquaculture

[0044] First, high-frequency acquisition and serialization of multi-dimensional physicochemical parameters of the aquaculture environment are performed. Physicochemical feature data are continuously acquired through a multi-parameter water quality sensor array and bound to the underlying timeline, yielding a multi-dimensional physicochemical sensor time series that has not undergone any algorithmic feature processing. Second, dynamic visual scanning and image sequence extraction of underwater group biological behavior are performed to acquire raw underwater color video. Frame-level alignment based on minimum time difference matching is performed against the absolute physical timestamps, and the underwater group behavior dynamic video stream is extracted and output. Finally, joint anomaly filtering and standardized spatiotemporal tensor generation are applied to the multi-source heterogeneous data. Noise reduction with compensation and spatial resampling are applied to the multi-dimensional physicochemical sensor time series and the underwater group behavior dynamic video stream, respectively, yielding a standardized physicochemical feature time-series tensor and a high-frequency spatiotemporal visual image frame set. The specific implementation process is shown in Figure 2.

[0045] S1-1 High-Frequency Acquisition and Serialization of Multi-Dimensional Physicochemical Parameters of the Aquaculture Environment: This step aims to acquire continuous digital electrical signals reflecting the core physicochemical state of the current water body. First, the vertical distance from the air-water interface at the still water surface to the physical bottom of the recirculating aquaculture pond is measured and established as the effective working water depth. Second, a multi-parameter water quality sensor array deployed in the surface, middle, and bottom water areas of the recirculating aquaculture pond samples continuously at a preset fixed high-frequency sampling rate to acquire physicochemical characteristic data. Specifically, the surface water area is defined as the spatial region extending downwards from the air-water interface to 15% of the effective working water depth; the middle water area is defined as the spatial region extending upwards and downwards by 10% from the center at 50% of the effective working water depth; and the bottom water area is defined as the spatial region extending upwards from a pollution-prevention safety level offset by 15 cm from the physical bottom of the pond to 80% of the effective working water depth. This upward offset serves as a preliminary protective measure to prevent residual feed and silt from physically clogging the sensor probes at the bottom of the pond. Furthermore, the multi-parameter water quality sensor array is composed of composite sensing nodes integrating platinum resistance temperature electrodes, glass pH electrodes, fluorescence dissolved oxygen probes, and platinum redox potential probes, positioned along a vertical line according to the spatial zones defined above. This allows for the simultaneous acquisition of raw analog voltage signals of water temperature, pH, dissolved oxygen concentration, and redox potential at various depths.
Subsequently, the built-in analog-to-digital converter module discretizes the raw analog voltage signals into standard digital signals conforming to the RS485 industrial communication protocol. The system then receives the physical quantity electrical signals transmitted from each sensor node in real time via an industrial-grade fieldbus and accurately extracts the absolute physical timestamp at the corresponding sampling moment. Next, the absolute physical timestamp and the corresponding physical quantity electrical signal are bound together at the underlying time axis to generate a single-point multi-dimensional water quality state primitive. Finally, all the single-point multi-dimensional water quality state primitives acquired within consecutive time steps are serialized and arranged in chronological order to output a multi-dimensional physicochemical sensor time sequence without algorithmic feature processing.
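The three sampling zones defined in step S1-1 reduce to simple arithmetic on the effective working water depth. The Python sketch below is illustrative only. The function name `water_layer_zones`, the convention that depths are measured downward from the still water surface in metres, and the reading of "10%" and "80%" as fractions of the effective working water depth are all assumptions, not details confirmed by the description.

```python
def water_layer_zones(effective_depth_m, bottom_offset_m=0.15):
    """Return (upper, lower) depth bounds in metres for each sampling zone.

    Depths are measured downward from the air-water interface (an assumed
    convention). bottom_offset_m is the 15 cm anti-fouling safety offset.
    """
    d = effective_depth_m
    if d <= 0:
        raise ValueError("effective working water depth must be positive")
    zones = {
        # From the surface down to 15% of the effective depth.
        "surface": (0.0, 0.15 * d),
        # 10% of the effective depth above and below the 50% centre line.
        "middle": (0.40 * d, 0.60 * d),
        # From the 80% depth level down to the safety level above the pond floor.
        "bottom": (0.80 * d, d - bottom_offset_m),
    }
    upper, lower = zones["bottom"]
    if lower <= upper:
        # In a very shallow pond the safety offset would swallow the bottom zone.
        raise ValueError("pond too shallow for the configured bottom offset")
    return zones
```

For a pond with an effective working water depth of 2.0 m, this reading places the bottom zone between 1.60 m and 1.85 m below the surface.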

[0046] S1-2 Performing Dynamic Visual Scanning and Image Sequence Extraction of Underwater Group Biological Behavior: This step aims to acquire visual image data including fish stress responses and the state of suspended matter in the water. Specifically, firstly, ultra-high-definition binocular cameras, fixedly installed at the surface of the aquaculture pond and via underwater side-view observation windows, continuously capture panoramic images of the fish's swimming trajectory and feeding activity at a fixed video acquisition frame rate to obtain raw underwater color dynamic video. Secondly, strictly adhering to the absolute physical timestamps described in step S1-1, frame-level alignment based on minimum time difference matching is performed on the acquired raw underwater color dynamic video. The absolute time difference between the exposure time of each video frame and the absolute physical timestamp is calculated, and the key video frames with the smallest time difference are strictly retained to ensure absolute synchronization of multimodal data on the timeline. Thirdly, the global color space histogram features of each extracted key video frame are extracted and encapsulated into raw primitives for a single-frame underwater visual image. Finally, the original primitives of the single-frame underwater visual image corresponding to the same time step are channel-cascaded and time-stacked according to the time sequence dimension, and finally the underwater group behavior dynamic video stream is extracted and output that is parallel to the time sequence of the multi-dimensional physicochemical sensor on the time axis.
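The minimum time difference matching in step S1-2 can be sketched as a nearest-neighbour search over sorted frame exposure times. This is a minimal illustration, not the patented implementation. The function name `match_frames`, the use of seconds as the time unit, and the tie-breaking rule (the earlier frame wins) are assumptions.

```python
from bisect import bisect_left

def match_frames(sensor_timestamps, frame_timestamps):
    """For each sensor timestamp, return the index of the closest video frame.

    frame_timestamps must be sorted in ascending order. When two frames are
    equally close, the earlier one is kept (an assumed tie-breaking rule).
    """
    if not frame_timestamps:
        raise ValueError("no video frames to match against")
    matches = []
    for t in sensor_timestamps:
        i = bisect_left(frame_timestamps, t)
        # Only the frames just before and just after t can be the closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_timestamps)]
        best = min(candidates, key=lambda j: abs(frame_timestamps[j] - t))
        matches.append(best)
    return matches
```

With frames at 25 fps (one every 0.04 s) and a slower sensor clock, each sensor sample keeps exactly one key frame, which is what keeps the two streams parallel on the time axis.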

[0047] S1-3 Implementing Joint Anomaly Filtering and Standardized Spatiotemporal Tensor Generation from Multi-Source Heterogeneous Data: This step aims to eliminate electromagnetic interference from the industrial environment and underwater optical distortion, and to complete the standardization and noise reduction of multimodal data. First, the multi-dimensional physicochemical sensor time series output from step S1-1 is obtained. The Kalman filter algorithm is used to remove the high-frequency hardware noise contained in the water temperature, pH, dissolved oxygen concentration, and redox potential channels, and linear interpolation is used to compensate for missing values. Second, the water temperature, pH, dissolved oxygen concentration, and redox potential channels of the compensated time series are each scaled to the range of zero to one using min-max normalization, which reduces the risk of gradient explosion caused by differences in the measurement ranges of different sensors and generates a standardized physicochemical feature time-series tensor. Next, the underwater group behavior dynamic video stream output from step S1-2 is acquired, and contrast-limited adaptive histogram equalization and nonlinear gamma correction are applied to compensate for the dark underwater environment in the video stream. This stretches the grayscale levels in the shadow areas and enhances the texture of fish edges and suspended particles. Finally, spatial bilinear interpolation resampling is performed on the corrected underwater group behavior dynamic video stream to convert it into a fixed-size, channel-aligned four-dimensional image data packet, outputting a set of high-frequency spatiotemporal visual image frames.
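The physicochemical half of step S1-3 can be sketched for a single sensor channel as follows. This is a simplified illustration under stated assumptions. A scalar constant-state Kalman filter stands in for whatever filter model the inventors use, missing samples are represented by `None`, interpolation runs before filtering, and the noise variances and the clipping to [0, 1] are hypothetical choices that the description does not specify.

```python
def interpolate_missing(series):
    """Fill None entries by linear interpolation between known neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # gap at the start: copy the first known value
            out[i] = series[right]
        elif right is None:       # gap at the end: copy the last known value
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

def kalman_smooth(series, process_var=1e-4, measurement_var=1e-2):
    """Scalar Kalman filter with a constant-state model (assumed variances)."""
    estimate, error = series[0], 1.0
    out = [estimate]
    for z in series[1:]:
        error += process_var                    # predict step
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)       # update step
        error *= 1.0 - gain
        out.append(estimate)
    return out

def min_max_normalize(series, lo, hi):
    """Scale values to [0, 1] using the sensor range (lo, hi), with clipping."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in series]
```

A pH channel, for example, could be processed as `min_max_normalize(kalman_smooth(interpolate_missing(raw)), 0.0, 14.0)`, where the 0 to 14 range is the assumed sensor span.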

[0048] S2. Construct a morphological visual feature extraction network that integrates a self-attention mechanism.

[0049] The network architecture is shown in Figure 3. First, the high-frequency spatiotemporal visual image frame set is input into the visual backbone feature extractor, which performs layer-by-layer spatial downsampling and feature mapping and outputs the basic feature map of shallow water morphology. Subsequently, a multi-scale self-attention mechanism is used for visual feature redistribution and dimensionality-reduced representation generation. The basic feature map of shallow water morphology is input into the visual self-attention encoding module to generate a global spatial attention feature matrix and a multi-scale underwater visual perception feature tensor. After global spatial feature aggregation and high-dimensional information compression, the dimensionality-reduced visual representation feature vector representing the current biological stress state of the aquatic environment is output.

[0050] S2-1 Implementation of Morphological Fundamental Feature Space Dimensionality Reduction and Extraction Based on Deep Residual Networks: This step aims to extract key biological morphological and physical environment noise features from the complex underwater background. The high-frequency spatiotemporal visual image frame set output from step S1 is obtained and input into a visual backbone feature extractor constructed using a deep residual network topology. First, five stacked 3x3 two-dimensional convolutional layers and max-pooling layers within the visual backbone feature extractor are used to perform layer-by-layer spatial downsampling and feature mapping operations on the high-frequency spatiotemporal visual image frame set. Second, the bottleneck residual module cascaded within the visual backbone feature extractor is invoked to perform deep convolution calculations on the downsampled features to extract a shallow visual tensor containing the fish body edge contour and the distribution texture of suspended matter in the water. Finally, the ReLU activation function is used to perform nonlinear feature purification on the shallow visual tensor, ultimately extracting and outputting the basic feature map of shallow water morphology.

[0051] S2-2 Visual Feature Redistribution and Dimensionality-Reduced Representation Generation Using a Multi-Scale Self-Attention Mechanism: This step aims to focus on the regions of abnormal biological behavior in underwater images, which are highly indicative of water quality, and to compress and flatten the high-dimensional features. The basic feature map of shallow water morphology output from step S2-1 is input into the visual self-attention encoding module. First, the basic feature map of shallow water morphology is linearly transformed by three independent fully connected layers to generate a query matrix, a key matrix, and a value matrix. Second, the dot product of the query matrix and the transposed key matrix is computed and scaled, and the result is passed to the Softmax activation function to calculate the global spatial dependency weights. The global spatial dependency weights are then multiplied by the value matrix to output the global spatial attention feature matrix. Third, the global spatial attention feature matrix is upsampled in the spatial dimension using a bilinear interpolation algorithm. The upsampled features are then concatenated along the channel dimension with the basic feature map of shallow water morphology output from step S2-1 to generate a multi-scale underwater visual perception feature tensor. Finally, a global average pooling layer performs global spatial feature aggregation on the multi-scale underwater visual perception feature tensor. The aggregated tensor is then input into a dimensionality-reducing fully connected layer with a ReLU activation function for high-dimensional information compression, and a dimensionality-reduced visual representation feature vector representing the current biological stress state of the aquatic environment is output.
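The query, key, and value computation in step S2-2 follows the standard scaled dot-product self-attention pattern. The sketch below implements it on plain Python lists for clarity. It assumes the feature map has already been flattened into a list of per-position feature vectors, omits bias terms, and uses the square root of the key dimension as the scaling factor. None of these details are stated in the description, and the helper names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """x: positions x channels. Returns (attended features, attention weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))                       # assumed scaling factor
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]         # each row sums to 1
    return matmul(weights, v), weights
```

Each row of `weights` expresses how strongly one spatial position attends to every other position, which is the "global spatial dependency weight" in the text.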

[0052] S3. Design of a multimodal data fusion decoupling network based on temporal convolution and cross-domain attention mechanisms.

[0053] As shown in Figure 4, physicochemical evolution trend feature extraction based on one-dimensional causal convolution is performed first. The standardized physicochemical feature time-series tensor is input into a one-dimensional dilated causal convolutional network to perform causal sliding feature extraction and time-dimension feature compression, outputting a latent feature vector of the physicochemical evolution trend. Subsequently, multimodal cross-domain cross-attention fusion and holographic state tensor generation are performed. The latent feature vector of the physicochemical evolution trend and the dimensionality-reduced visual representation feature vector are simultaneously input into the cross-domain cross-attention fusion module for correlation score calculation and cross-modal feature redistribution. Then, a residual skip connection with the original physicochemical features and nonlinear feature smoothing are applied to output a holographic water quality state fusion representation tensor that can holographically quantify the current comprehensive ecological water quality level of the aquaculture water body.

[0054] S3-1 Implementation of Physicochemical Evolution Trend Feature Extraction Based on One-Dimensional Causal Convolution: This step aims to extract long-sequence temporal evolution patterns from densely sampled low-level sensor electrical signals. The standardized physicochemical feature temporal tensor output from step S1 is obtained and input into a one-dimensional dilated causal convolutional network containing an input mapping layer, four cascaded residual modules, a batch normalization layer, a ReLU activation function layer, a temporal dimension aggregation module, and a fully connected layer. First, the input mapping layer is invoked to perform initial channel dimension expansion calculations on the standardized physicochemical feature temporal tensor to generate an initial temporal feature matrix. Secondly, the initial temporal feature matrix is input into each of the four cascaded residual modules. Each cascaded residual module performs causal sliding feature extraction unidirectionally along the time axis. Before each one-dimensional convolutional slide, zero-padding is performed on the input features to the left on one side of the time axis to prevent premature leakage of information from future time steps. An exponentially increasing dilation coefficient (base 2) is configured for the one-dimensional convolutional kernel within each cascaded residual module. This multiplies the receptive field of the time dimension through a step-skip sampling method, thereby generating a dilated causal feature tensor. Thirdly, the cascaded residual modules call the interconnected one-dimensional causal convolution operators to perform feature mapping operations on the dilated causal feature tensor. The results are input into a batch normalization layer and a ReLU activation function layer to perform gradient stabilization and nonlinear activation operations. 
The nonlinear activation results are then combined with the preceding original features input to the residual module using element-wise addition of residual skip connections. This process deeply mines the long-term dependency evolution of various physicochemical indicators over time and outputs a deep temporal feature tensor. Finally, the deep temporal feature tensor of the last time step of the one-dimensional dilated causal convolutional network is extracted and input into the temporal dimension aggregation module configured with a one-dimensional global average pooling layer. The one-dimensional global average pooling layer is called to perform an average operation on the values of each feature channel in the deep temporal feature tensor at all time steps to complete the temporal dimension feature compression. Then, the fully connected layer at the end of the one-dimensional dilated causal convolutional network is called to perform a flattening and dimensionality reduction operation on the compressed features, and output the implicit feature vector of physicochemical evolution trend.
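The left-only zero padding and base-2 dilation schedule in step S3-1 can be illustrated on a single channel. The sketch below is a simplification. It uses one convolution per residual module, omits batch normalization, and assumes a kernel size of 2, none of which is specified in the description. The function names are hypothetical.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """1-D causal convolution: output[t] depends only on x[t] and earlier samples.

    Zero padding is added on the left side of the time axis only, so no
    information from future time steps can leak into the output.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]

def receptive_field(kernel_size, num_layers):
    """Receptive field of stacked layers with dilations 1, 2, 4, ..."""
    dilations = [2 ** i for i in range(num_layers)]
    return 1 + (kernel_size - 1) * sum(dilations)

def tcn_stack(x, kernels):
    """Apply residual modules with dilation 2**i in module i (ReLU, then skip add)."""
    out = list(x)
    for i, kernel in enumerate(kernels):
        conv = causal_dilated_conv1d(out, kernel, 2 ** i)
        out = [max(0.0, c) + r for c, r in zip(conv, out)]
    return out
```

With four modules and the assumed kernel size of 2, `receptive_field(2, 4)` gives 16 time steps, which shows how doubling the dilation widens the temporal context without adding parameters.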

[0055] S3-2 Performing Multimodal Cross-Domain Cross-Attention Fusion and Holographic State Tensor Generation: This step aims to use visual stress signals to cross-validate and compensate for lagging biochemical signals. The physicochemical evolution trend latent feature vector output from step S3-1 and the dimensionality-reduced visual representation feature vector output from step S2 are simultaneously input into the cross-domain cross-attention fusion module, which contains a linear mapping network and matrix multiplication operators. First, three independent fully connected mapping layers are used to perform linear projection transformations on the input cross-modal features. Specifically, the physicochemical evolution trend latent feature vector is transformed into the query matrix of the cross-domain attention mechanism through the first fully connected mapping layer, and the dimensionality-reduced visual representation feature vector is transformed into the key matrix and value matrix of the cross-domain attention mechanism through the second and third fully connected mapping layers, respectively. Second, the matrix multiplication operator is invoked to compute the dot product of the query matrix and the transposed key matrix, which gives the correlation score between the different physical modalities. The correlation score is then divided by a scaling factor to prevent gradient vanishing. Subsequently, the Softmax activation function is used to perform probability normalization on the scaled correlation score to generate a multimodal cross-domain interaction weight matrix spanning different physical modalities. The multimodal cross-domain interaction weight matrix is then multiplied with the value matrix to achieve cross-modal feature redistribution from visual features to physicochemical features, generating a redistributed feature tensor.
Finally, the redistributed feature tensor and the original physicochemical evolution trend latent feature vector are subjected to an element-wise addition residual jump connection operation to preserve the original physicochemical temporal information. The tensor after residual addition is then input into a feature smoothing processing unit that internally connects a layer normalization operator and a feedforward fully connected network to perform high-dimensional space mapping and nonlinear feature fusion, outputting a holographic water quality state fusion representation tensor that can holographically quantify the current comprehensive ecological water quality level of the aquaculture water body.
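The fusion in step S3-2 differs from ordinary self-attention in that the query comes from one modality and the keys and values come from the other. The sketch below shows that asymmetry together with the residual addition and layer normalization. It rests on several assumptions. The visual representation is treated as a short list of token vectors so that the Softmax has more than one entry to normalize, the value dimension equals the physicochemical vector dimension so the residual addition is well defined, and the feedforward network that follows layer normalization in the text is omitted for brevity.

```python
import math

def linear(vec, weight):
    """Apply a bias-free linear layer; weight is a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def cross_attention_fuse(phys_vec, visual_tokens, w_q, w_k, w_v, eps=1e-5):
    """Query from physicochemical features, keys and values from visual features."""
    q = linear(phys_vec, w_q)
    keys = [linear(tok, w_k) for tok in visual_tokens]
    values = [linear(tok, w_v) for tok in visual_tokens]

    # Scaled correlation scores between the two modalities.
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]

    # Softmax turns the scores into cross-domain interaction weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of visual values, then residual addition of the original
    # physicochemical vector to preserve its temporal information.
    dim = len(values[0])
    redistributed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    residual = [p + r for p, r in zip(phys_vec, redistributed)]

    # Layer normalization over the feature dimension (no learned scale or shift).
    mean = sum(residual) / len(residual)
    var = sum((r - mean) ** 2 for r in residual) / len(residual)
    fused = [(r - mean) / math.sqrt(var + eps) for r in residual]
    return fused, weights
```

Because the residual path carries `phys_vec` through unchanged before normalization, the visual modality can only adjust the physicochemical representation and cannot replace it.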

[0056] S4. Water quality index inversion prediction and closed-loop adaptive control based on multi-task joint decoding

[0057] As shown in Figure 5, the holographic water quality state fusion representation tensor is first input into the multi-task joint decoder, which refines it into a decoder shared semantic feature tensor. Then, the first continuous numerical regression prediction branch performs high-dimensional spatial mapping and physical dimension inversion on this tensor to obtain the peak values of ammonia nitrogen and nitrite concentrations within a preset future time window, and a quantitative prediction matrix of core water quality biochemical indicators is constructed. Subsequently, the decoder shared semantic feature tensor is input into the second classification decision branch, which performs normalized probability mapping to generate a water quality safety status rating identifier. Finally, the quantitative prediction matrix of core water quality biochemical indicators and the water quality safety status rating identifier are combined according to a preset protocol to generate and output a comprehensive water quality detection early warning message.

[0058] S4-1 Implementing Inversion Prediction of Core Biochemical Indicators of Water Quality Based on Continuous Numerical Regression: This step aims to provide predictions of future water quality concentrations in correct physical units as a basis for judgment. The holographic water quality state fusion representation tensor output from step S3 is obtained and input into a multi-task joint decoder containing a shared feature dimensionality reduction network and two independent task mapping branches. First, the shared feature dimensionality reduction network, which contains two downsampling layers, performs feature refinement and dimensionality compression on the holographic water quality state fusion representation tensor to generate a decoder shared semantic feature tensor. Second, the decoder shared semantic feature tensor is input into the first continuous numerical regression prediction branch of the multi-task joint decoder. The alternating hidden fully connected layers, nonlinear activation functions, and dropout (random deactivation) operators within this branch perform multi-level high-dimensional space mapping and overfitting suppression on the input features to generate the latent feature vector for the regression task. Next, the two independent linear output nodes configured in parallel at the end of the first continuous numerical regression prediction branch perform linear activation and scalar mapping on the latent feature vector of the regression task to output dimensionless normalized prediction values. Then, the inverse min-max normalization operator maps the normalized prediction values back to physical units, giving the predicted peak values of ammonia nitrogen and nitrite concentrations in the water body within a preset future time window. Finally, these predicted peak values are concatenated in a preset channel order and encapsulated into a two-dimensional numeric matrix, which is output as the quantitative prediction matrix of core biochemical indicators of water quality in physical units.
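The physical dimension inversion in step S4-1 is the algebraic inverse of min-max normalization. The sketch below restores two normalized outputs to concentrations and packs them in a fixed channel order. The calibration ranges in `RANGES` and the function names are hypothetical placeholders. The description does not give the actual ranges used for ammonia nitrogen or nitrite.

```python
# Hypothetical calibration ranges in mg/L; real values would come from the
# normalization statistics of the training data.
RANGES = {"ammonia_nitrogen": (0.0, 2.0), "nitrite": (0.0, 0.5)}
CHANNEL_ORDER = ("ammonia_nitrogen", "nitrite")

def inverse_min_max(normalized, lo, hi):
    """Map a value in [0, 1] back to the physical range [lo, hi]."""
    return lo + normalized * (hi - lo)

def build_prediction_matrix(normalized_outputs):
    """Return a 1 x 2 matrix of predicted peak concentrations in mg/L.

    normalized_outputs maps each indicator name to the dimensionless value
    produced by the corresponding linear output node.
    """
    row = [
        inverse_min_max(normalized_outputs[name], *RANGES[name])
        for name in CHANNEL_ORDER
    ]
    return [row]
```

Using the same range for inversion as for the forward normalization is what guarantees that the output carries the correct physical unit.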

[0059] S4-2 Implementation of Water Quality Safety Status Rating and Comprehensive Detection and Early Warning Based on the Classification Decision Branch: This step aims to transform the multimodal latent features extracted by the deep networks into intuitive safety levels, completing a comprehensive quantitative assessment and visual detection output of the water quality status of factory aquaculture. First, the decoder shared semantic feature tensor generated in step S4-1 is input into the second classification decision branch of the multi-task joint decoder. Multiple fully connected mapping layers within this branch perform dimensionality reduction and feature recombination on the input features to generate a classification decision latent feature vector. Second, the classifier nodes cascaded at the end of the second classification decision branch, together with the Softmax activation function, perform normalized probability mapping on the classification decision latent feature vector, outputting a discrete probability distribution over four mutually exclusive categories for the current circulating water body: high quality, sub-healthy, slightly polluted, and severely deteriorated. The category with the highest probability is selected to generate the water quality safety status rating label.
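The final stage of the classification branch in step S4-2 can be sketched as a normalized probability mapping followed by selecting the most probable class. This sketch assumes the mapping is a softmax over four raw scores (logits). The name `rate_water_quality` is hypothetical, and the label strings are taken from the four categories named in the text.

```python
import math

LABELS = ("high quality", "sub-healthy", "slightly polluted", "severely deteriorated")

def rate_water_quality(logits):
    """Return (rating label, class probabilities) for four raw class scores."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one score per water quality category")
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs
```

Returning the full probability vector alongside the label lets downstream control logic apply its own confidence threshold before acting on the rating.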

[0060] Finally, the generation timestamps corresponding to the quantitative prediction matrix of the core biochemical indicators of water quality and the water quality safety status rating identifier are extracted as the primary key for time axis matching, eliminating the asynchronous time difference caused by the parallel operation of multi-task decoding branches and completing the absolute time alignment operation of multi-dimensional data. Subsequently, a standard message data dictionary based on key-value pair mapping is constructed, and the aligned quantitative prediction matrix of the core biochemical indicators of water quality and the water quality safety status rating identifier are accurately assigned to the preset continuous concentration value field and discrete risk level field, respectively, and structured sequence encapsulation is performed. Next, according to the preset factory-scale water quality testing standard communication protocol specification, the start frame header, sensor network node physical address, data payload length identifier, and cyclic redundancy check code frame tail are added to the standard message data dictionary in sequence to complete the assembly of the underlying communication message frame structure. Finally, a comprehensive water quality testing early warning message containing continuous concentration prediction values and discrete risk categories is generated and output.
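The frame assembly described in paragraph [0060] can be sketched as follows. The actual factory communication protocol is not disclosed, so every field layout here is an assumption made for illustration: a two-byte header `0xAA55`, a one-byte node address, a two-byte big-endian payload length, a JSON-encoded key-value payload, and a CRC-32 trailer computed with Python's `zlib.crc32`.

```python
import json
import struct
import zlib

FRAME_HEADER = b"\xAA\x55"  # hypothetical start-of-frame marker

def build_frame(node_address, concentrations, risk_level, timestamp):
    """Assemble header + address + length + payload + CRC into one frame."""
    payload = json.dumps(
        {
            "timestamp": timestamp,                    # shared alignment key
            "concentration_mg_per_l": concentrations,  # continuous value field
            "risk_level": risk_level,                  # discrete risk field
        },
        sort_keys=True,
    ).encode("utf-8")
    body = FRAME_HEADER + struct.pack(">BH", node_address, len(payload)) + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack(">I", crc)

def parse_frame(frame):
    """Verify the CRC and header, then return (node address, message dict)."""
    body, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    if body[:2] != FRAME_HEADER:
        raise ValueError("bad frame header")
    address, length = struct.unpack(">BH", body[2:5])
    payload = body[5:5 + length]
    return address, json.loads(payload.decode("utf-8"))
```

A receiver that calls `parse_frame` rejects any frame whose bytes were altered in transit, which is the role the cyclic redundancy check plays in the described message structure.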

[0061] Experimental Analysis:

[0062] To verify the technical advantages of the proposed water quality detection method for factory-scale recirculating aquaculture systems in terms of multimodal fusion, hysteresis compensation, and real-time early warning, this embodiment constructs a simulation verification platform that includes multi-source heterogeneous data acquisition, biological behavior feature encoding, cross-domain attention fusion, and multi-task decoding inversion. The experimental dataset is constructed based on historical operational data from a high-density recirculating aquaculture base, extracting 3000 sets of standardized physicochemical feature time-series tensors and high-frequency spatiotemporal visual image frames covering different feeding intensities, varying dissolved oxygen levels, and biochemical filtration failure states. The experimental verification process focuses on evaluating the performance of this invention in three key areas: the accuracy of biochemical indicator inversion prediction, the early warning response lead time, and the compensation performance for sensor hysteresis.

[0063] 1. Experiment on the accuracy of inversion prediction of core biochemical indicators of water quality

[0064] Figure 6 shows a comparison of the mean absolute errors of multiple algorithms in predicting ammonia nitrogen and nitrite concentrations. The horizontal axis represents four different model configurations, and the vertical axis represents the mean absolute error of the predicted values. With the architecture of this invention, the prediction errors for ammonia nitrogen and nitrite concentrations were reduced to 0.024 mg/L and 0.015 mg/L, respectively. The combined prediction errors of a single-modal physicochemical time-series network and a conventional multimodal feature concatenation network were both higher than 0.08 mg/L. These comparative data indicate that the cross-domain cross-attention fusion module, by calculating the correlation weights between visual stress features and physicochemical evolution trends, redistributes the feature space and improves the model's accuracy in inverting core biochemical indicators.

[0065] 2. Ablation experiment on early warning lead time for sensor physicochemical hysteresis

Figure 7 shows the distribution of early warning response time deviation before and after introducing visual feature compensation. The horizontal axis represents the ammonia nitrogen accumulation rate during the simulated water quality deterioration process, and the vertical axis represents the lead time of the early warning system's alarm issuance. Experimental results show that the baseline model, which does not incorporate feedback from group biological behavior, is limited by the physical hysteresis of the physicochemical sensors, and its early warning trigger point often lags behind the critical point of water environment deterioration. This invention uses a visual self-attention encoding module to capture abrupt changes in fish school movement trajectories and stress responses in real time, and uses a cross-domain cross-attention mechanism to make the features complementary, issuing risk messages an average of 18 minutes before the ammonia nitrogen concentration reaches the safety threshold. These test data indicate that this invention bridges the modal differences between physicochemical and visual signals and achieves early risk identification.

[0066] 3. Stability analysis of multimodal fusion model in complex interference environment

[0067] Figure 8 shows the detection success rate distribution curves under sensor noise interference and underwater optical distortion. The horizontal axis represents the input hardware noise intensity coefficient, and the vertical axis represents the confidence score of the comprehensive early warning result. Even under the extreme condition of a noise intensity coefficient of 0.75, the system's prediction confidence score remains above 0.88, owing to the Kalman filtering and contrast-limited adaptive histogram equalization preprocessing performed in step S1-3 and the robustness gained from the multi-scale feature concatenation mechanism. These distribution characteristics indicate that the standardized spatiotemporal tensor generation logic of this invention can suppress random interference at the hardware level, supporting reliable closed-loop management at industrial aquaculture sites.

[0068] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0069] While the above description illustrates specific embodiments of the present invention, it is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.

Claims

1. A method for water quality testing in a factory-scale recirculating aquaculture system, characterized in that it includes the following steps: S1: continuously collecting multi-dimensional physicochemical sensor time series through a multi-parameter water quality sensor array in the aquaculture pond; simultaneously performing dynamic visual scanning and image sequence extraction of underwater group biological behavior to obtain a dynamic video stream of underwater group behavior; and preprocessing the collected data to obtain a standardized physicochemical feature time-series tensor and a spatiotemporal visual image frame set; S2: inputting the spatiotemporal visual image frame set into a morphological visual feature extraction network that integrates a self-attention mechanism, and extracting and outputting a dimensionality-reduced visual representation feature vector that represents the current biological stress state in the aquatic environment; S3: inputting the standardized physicochemical feature time-series tensor and the dimensionality-reduced visual representation feature vector into a multimodal data fusion decoupling network based on temporal convolution and a cross-domain attention mechanism to mine the long-term evolution trend of physicochemical features and output a holographic water quality state fusion representation tensor that can holographically quantify the current comprehensive ecological water quality level of the aquaculture water body. The multimodal data fusion decoupling network based on temporal convolution and cross-domain attention mechanisms is specifically as follows: First, physicochemical evolution trend feature extraction based on one-dimensional causal convolution is implemented.
The standardized physicochemical feature time-series tensor is input into a one-dimensional dilated causal convolutional network, which performs causal sliding feature extraction and time-dimension feature compression and outputs a physicochemical evolution trend latent feature vector. Then, multimodal cross-domain cross-attention fusion and holographic state tensor generation are performed: the physicochemical evolution trend latent feature vector and the dimensionality-reduced visual representation feature vector are simultaneously input into a cross-domain cross-attention fusion module for correlation score calculation and cross-modal feature redistribution, after which a residual skip connection with the original physicochemical features and nonlinear feature smoothing fusion are performed to output the holographic water quality state fusion representation tensor that holographically quantifies the current comprehensive ecological water quality level of the aquaculture water body.
The specific process of performing multimodal cross-domain cross-attention fusion and holographic state tensor generation is as follows: The physicochemical evolution trend latent feature vector and the dimensionality-reduced visual representation feature vector are simultaneously input into the cross-domain cross-attention fusion module, which contains a linear mapping network and a matrix multiplication operator. First, three independent fully connected mapping layers perform linear projections on the input cross-modal features: the first fully connected mapping layer transforms the physicochemical evolution trend latent feature vector into the query matrix of the cross-domain attention mechanism, and the second and third fully connected mapping layers transform the dimensionality-reduced visual representation feature vector into the key matrix and the value matrix of the cross-domain attention mechanism, respectively.
Second, the matrix multiplication operator is called to multiply the query matrix by the transpose of the key matrix, a dot-product operation that yields the correlation scores between the different physical modalities, and the correlation scores are scaled by a scaling factor to prevent gradient vanishing. Then, the scaled correlation scores are normalized using the Softmax activation function to generate a multimodal cross-domain interaction weight matrix that spans the different physical modalities. The multimodal cross-domain interaction weight matrix is multiplied by the value matrix to achieve cross-modal feature redistribution from visual features to physicochemical features and to generate a redistributed feature tensor. Finally, the redistributed feature tensor and the original physicochemical evolution trend latent feature vector are added element-wise through a residual skip connection to preserve the original physicochemical time-series information. The tensor after residual addition is then input into a feature smoothing processing unit, which internally connects a layer normalization operator and a feedforward fully connected network, to perform high-dimensional space mapping and nonlinear feature fusion and to output the holographic water quality state fusion representation tensor.
S4: based on the holographic water quality state fusion representation tensor, performing multi-task joint decoding for water quality index inversion and prediction, in which a continuous numerical regression branch inverts the core biochemical indicators of water quality into a quantitative prediction matrix, and a classification decision branch outputs a safety rating label.
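As a rough illustration of the cross-domain cross-attention fusion recited in claim 1, the following pure-Python sketch walks through the query/key/value projections, the scaled dot product, Softmax normalization, value redistribution, the residual addition, and the layer-normalization plus feedforward smoothing. Several things here are assumptions and not taken from the claim: each modality is represented as a small matrix of token rows (so the attention weights are non-trivial), the scaling factor is taken to be the square root of the key dimension (the claim does not state its value), the feedforward network is a single linear layer with ReLU, and all weights are random placeholders.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise Softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def layer_norm(m, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain or bias)."""
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def cross_attention_fuse(phys, vis, w_q, w_k, w_v, w_ff):
    q = matmul(phys, w_q)                      # query from physicochemical features
    k = matmul(vis, w_k)                       # key from visual features
    v = matmul(vis, w_v)                       # value from visual features
    scale = math.sqrt(len(q[0]))               # ASSUMED scaling factor: sqrt(d_k)
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)             # cross-domain interaction weights
    redistributed = matmul(weights, v)         # visual -> physicochemical redistribution
    residual = [[a + b for a, b in zip(r, p)] for r, p in zip(redistributed, phys)]
    smoothed = matmul(layer_norm(residual), w_ff)
    fused = [[max(0.0, x) for x in row] for row in smoothed]  # ReLU (assumed)
    return weights, fused


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
phys = rand_matrix(3, 4, rng)   # 3 physicochemical tokens, width 4 (hypothetical sizes)
vis = rand_matrix(6, 8, rng)    # 6 visual tokens, width 8 (hypothetical sizes)
weights, fused = cross_attention_fuse(
    phys, vis,
    rand_matrix(4, 4, rng),     # first mapping layer  -> query
    rand_matrix(8, 4, rng),     # second mapping layer -> key
    rand_matrix(8, 4, rng),     # third mapping layer  -> value (width matches phys)
    rand_matrix(4, 5, rng),     # feedforward layer
)
```

The value projection is given the same width as the physicochemical features so that the element-wise residual addition is well defined; any real implementation would need to satisfy the same constraint in some way.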

2. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 1, characterized in that: the multi-parameter water quality sensor array continuously collects electrical signals for water temperature, pH, dissolved oxygen concentration, and redox potential at various depths, and extracts the absolute physical timestamp of each sampling moment. The absolute physical timestamp and the corresponding electrical signals are then bound on the underlying time axis to generate a single-point multi-dimensional water quality state primitive. Finally, all single-point multi-dimensional water quality state primitives collected over consecutive time steps are serialized in chronological order to output the multi-dimensional physicochemical sensor time series, which has not undergone algorithmic feature processing.
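One way to picture the timestamp binding and chronological serialization of claim 2 is the short sketch below. The class name, field names, units, and sample values are all hypothetical; the claim only names the measured quantities and the timestamp.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WaterStatePrimitive:
    """One sampling moment: a timestamp bound to its raw readings (names assumed)."""
    timestamp: float              # absolute physical timestamp, seconds
    depth_m: float                # sampling depth
    temperature_c: float
    ph: float
    dissolved_oxygen_mg_l: float
    orp_mv: float                 # redox potential


def build_time_series(samples: List[Tuple[float, ...]]) -> List[WaterStatePrimitive]:
    """Bind each reading tuple to a primitive and arrange them chronologically."""
    primitives = [WaterStatePrimitive(*s) for s in samples]
    return sorted(primitives, key=lambda p: p.timestamp)


# Made-up readings, deliberately out of order.
samples = [
    (1700000120.0, 1.0, 18.4, 7.9, 6.8, 210.0),
    (1700000000.0, 1.0, 18.2, 8.0, 7.1, 215.0),
    (1700000060.0, 1.0, 18.3, 8.0, 7.0, 213.0),
]
series = build_time_series(samples)
```

No filtering or normalization is applied here, matching the claim's statement that this sequence carries no algorithmic feature processing.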

3. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 1, characterized in that: the dynamic video stream of underwater group behavior is acquired as follows. First, a binocular camera is controlled to continuously capture panoramic images of the fish school's movement trajectories and feeding activity at a fixed video acquisition frame rate, yielding a raw underwater color dynamic video. Second, frame-level alignment based on minimum time difference matching is performed on the raw underwater color dynamic video with reference to the absolute physical timestamp: the absolute time difference between the exposure time of each video frame and the absolute physical timestamp is calculated, and only the key video frame with the smallest time difference is strictly retained, ensuring that the multimodal data are synchronized on the time axis. Third, the global color space histogram features of each retained key video frame are extracted and encapsulated into a single-frame underwater visual image raw primitive. Finally, the single-frame underwater visual image raw primitives corresponding to the same time step are channel-concatenated and stacked along the time dimension, and the dynamic video stream of underwater group behavior, parallel to the multi-dimensional physicochemical sensor time series, is output.
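The minimum time difference matching in claim 3 can be sketched as a nearest-neighbor search over sorted frame exposure times. The function name and the millisecond timestamps below are illustrative assumptions; the sketch also assumes the frame times are already sorted in ascending order.

```python
from bisect import bisect_left
from typing import List


def align_frames(frame_times_ms: List[int], sensor_times_ms: List[int]) -> List[int]:
    """For each sensor timestamp, return the index of the frame whose exposure
    time has the smallest absolute difference. frame_times_ms must be sorted."""
    picked = []
    for t in sensor_times_ms:
        pos = bisect_left(frame_times_ms, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(frame_times_ms)]
        picked.append(min(candidates, key=lambda i: abs(frame_times_ms[i] - t)))
    return picked


frame_times_ms = [0, 400, 800, 1200, 1600, 2000]   # hypothetical exposure times
sensor_times_ms = [100, 1100, 1900]                # hypothetical sensor timestamps
key_frames = align_frames(frame_times_ms, sensor_times_ms)
print(key_frames)  # [0, 3, 5]
```

Integer milliseconds are used so that equal-distance ties are not obscured by floating-point rounding; how ties should be broken is not specified in the claim.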

4. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 1, characterized in that: the morphological visual feature extraction network that integrates a self-attention mechanism operates as follows. First, the set of high-frequency spatiotemporal visual image frames is input into a visual backbone feature extractor, which performs layer-by-layer spatial downsampling and feature mapping and outputs the basic feature map of shallow water morphology. Then, a multi-scale self-attention mechanism is used to redistribute the visual features and generate the dimensionality-reduced representation: the basic feature map of shallow water morphology is input into a visual self-attention encoding module to generate a global spatial attention feature matrix and a multi-scale underwater visual perception feature tensor. After global spatial feature aggregation and high-dimensional information compression, the dimensionality-reduced visual representation feature vector representing the current biological stress state of the aquatic environment is extracted and output.

5. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 1, characterized in that: the physicochemical evolution trend feature extraction based on one-dimensional causal convolution is implemented as follows. The standardized physicochemical feature time-series tensor is input into a one-dimensional dilated causal convolutional network containing an input mapping layer, four cascaded residual modules, a batch normalization layer, a ReLU activation function layer, a time dimension aggregation module, and a fully connected layer. First, the input mapping layer is invoked to perform an initial channel dimension expansion on the standardized physicochemical feature time-series tensor, generating an initial time-series feature matrix. Second, the initial time-series feature matrix is passed through the four cascaded residual modules in turn. The cascaded residual modules perform causal sliding feature extraction unidirectionally along the time axis. Before each one-dimensional convolution slide, the input features are zero-padded on the left side of the time axis only, to prevent premature leakage of information from future time steps. The one-dimensional convolution kernel inside each cascaded residual module is configured with a dilation factor that increases exponentially with base 2, and the receptive field along the time dimension is enlarged multiplicatively through skip-step sampling to generate a dilated causal feature tensor. Next, the one-dimensional causal convolution operator cascaded within the cascaded residual module is invoked to perform feature mapping on the dilated causal feature tensor. The result is input to the batch normalization layer and the ReLU activation function layer for gradient stabilization and nonlinear activation.
The nonlinear activation result is then added element-wise, through a residual skip connection, to the original features that were input to the residual module, thereby mining the long-term dependency evolution laws of the various physicochemical indicators over time and outputting a deep temporal feature tensor. Finally, the deep temporal feature tensor of the last time step of the one-dimensional dilated causal convolutional network is extracted and input into the time dimension aggregation module, which is configured with a one-dimensional global average pooling layer. The one-dimensional global average pooling layer is invoked to average the values of each feature channel of the deep temporal feature tensor over all time steps, completing the time-dimension feature compression. Subsequently, the fully connected layer at the end of the one-dimensional dilated causal convolutional network is invoked to flatten and reduce the dimensionality of the compressed features, outputting the physicochemical evolution trend latent feature vector.
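A minimal single-channel sketch of the dilated causal convolution stack in claim 5 follows. It keeps the parts that define the technique (left-only zero padding, dilation factors 1, 2, 4, 8 across four residual modules, ReLU, residual addition, global average pooling over time) and omits the input mapping layer, batch normalization, and the final fully connected layer. The kernel size of 2 and the kernel weights are assumptions, not values from the claim.

```python
from typing import List


def causal_dilated_conv(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """1-D convolution whose output at step t only sees x[t], x[t-d], x[t-2d], ..."""
    pad = (len(kernel) - 1) * dilation
    padded = [0.0] * pad + list(x)          # zeros on the LEFT of the time axis only
    return [
        sum(kernel[j] * padded[t + pad - j * dilation] for j in range(len(kernel)))
        for t in range(len(x))
    ]


def residual_stack(x: List[float], kernels: List[List[float]]) -> List[float]:
    """Cascade of residual modules; module i uses dilation 2**i."""
    h = list(x)
    for level, kernel in enumerate(kernels):
        conv = causal_dilated_conv(h, kernel, dilation=2 ** level)
        activated = [max(0.0, v) for v in conv]          # ReLU
        h = [a + b for a, b in zip(activated, h)]        # residual skip connection
    return h


def global_average_pool(h: List[float]) -> float:
    """Compress the time dimension by averaging over all time steps."""
    return sum(h) / len(h)


series = [float(i % 5) for i in range(20)]     # made-up standardized readings
kernels = [[0.5, 0.5]] * 4                     # four modules, kernel size 2 (assumed)

deep = residual_stack(series, kernels)
trend_feature = global_average_pool(deep)

# Causality check: altering only the final input must not change earlier outputs.
deep_changed = residual_stack(series[:-1] + [99.0], kernels)

# Receptive field of the stack: 1 + sum over modules of (kernel_size - 1) * dilation.
receptive_field = 1 + sum((len(k) - 1) * 2 ** i for i, k in enumerate(kernels))
```

With kernel size 2 and four modules the receptive field comes to 16 time steps, which shows how the exponentially increasing dilation widens temporal coverage without adding layers.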

6. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 1, characterized in that: In S4, the holographic water quality state fusion representation tensor is first input into the multi-task joint decoder to purify and generate the decoder shared semantic feature tensor. Then, the high-dimensional space mapping and physical dimension inversion are performed on it through the first continuous numerical regression prediction branch to obtain the peak values of the changes in ammonia nitrogen concentration and nitrite concentration in the water body within the future preset time window, and a quantitative prediction matrix of the core biochemical indicators of water quality is constructed. The decoder's shared semantic feature tensor is then input into the second classification decision branch to perform normalized probability mapping calculations to generate a water quality safety status rating label.

7. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 6, characterized in that: the quantitative prediction matrix for the core biochemical indicators of water quality is generated as follows. The holographic water quality state fusion representation tensor is input into a multi-task joint decoder containing a shared feature dimensionality reduction network and two independent task mapping branches. First, the shared feature dimensionality reduction network, which contains two downsampling layers, performs feature purification and dimensionality compression on the holographic water quality state fusion representation tensor to generate a decoder shared semantic feature tensor. Second, the decoder shared semantic feature tensor is input into the first continuous numerical regression prediction branch of the multi-task joint decoder. The hidden fully connected layers, nonlinear activation functions, and dropout (random deactivation) operators that are alternately connected within the first continuous numerical regression prediction branch perform multi-level high-dimensional space mapping and overfitting suppression on the input features to generate the regression task latent feature vector. Next, the two independent linear output nodes configured in parallel at the end of the first continuous numerical regression prediction branch are invoked to perform linear activation and scalar mapping on the regression task latent feature vector, outputting dimensionless normalized prediction values. Then, the inverse min-max normalization operator is invoked to map the normalized prediction values back to their physical dimensions, yielding the predicted peak values of the changes in ammonia nitrogen and nitrite concentrations in the water body within the preset time window.
Finally, the inverted peak values are concatenated and encapsulated into a two-dimensional numerical matrix according to the preset channel order.
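The physical dimension inversion at the end of claim 7 amounts to undoing min-max normalization, as the sketch below suggests. The concentration ranges, the normalized outputs, and the channel order are invented for illustration; the claim does not give numeric values or the matrix shape beyond "two-dimensional".

```python
from typing import List, Tuple


def inverse_min_max(normalized: float, lower: float, upper: float) -> float:
    """Map a value in [0, 1] back to the physical range [lower, upper]."""
    return normalized * (upper - lower) + lower


def build_prediction_matrix(
    normalized_outputs: List[float],
    ranges: List[Tuple[float, float]],
) -> List[List[float]]:
    """Invert each channel and pack the results as a 1 x N matrix in channel order."""
    row = [
        inverse_min_max(value, lower, upper)
        for value, (lower, upper) in zip(normalized_outputs, ranges)
    ]
    return [row]


# Hypothetical channel order: [ammonia nitrogen, nitrite], both in mg/L.
ranges = [(0.0, 2.0), (0.0, 0.5)]
normalized_outputs = [0.25, 0.5]        # made-up outputs of the two linear nodes
prediction_matrix = build_prediction_matrix(normalized_outputs, ranges)
print(prediction_matrix)  # [[0.5, 0.25]]
```

The lower and upper bounds would have to be the same ones used when the training targets were normalized; otherwise the inverted values would not be in meaningful physical units.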

8. The water quality testing method for a factory-scale recirculating aquaculture system as described in claim 7, characterized in that: the safety rating label output by the classification decision branch is generated as follows. First, the generated decoder shared semantic feature tensor is input into the second classification decision branch of the multi-task joint decoder. The multiple fully connected mapping layers configured within the second classification decision branch are invoked to perform high-dimensional space dimensionality reduction and feature recombination on the input features, generating a classification decision latent feature vector. Second, the nonlinear classifier nodes connected in series at the end of the second classification decision branch, together with the Softmax activation function, are invoked to perform normalized probability mapping on the classification decision latent feature vector. The output is a discrete probability distribution over four independent categories for the current circulating water body: high quality, sub-healthy, slightly polluted, and severely deteriorated. The category label with the highest probability value is extracted to generate the water quality safety status rating label.
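The final decision step of claim 8 can be pictured as a Softmax over four scores followed by picking the most probable category. The input scores below are made up, and the function name is hypothetical; only the four category names come from the claim.

```python
import math
from typing import List, Tuple

LABELS = ("high quality", "sub-healthy", "slightly polluted", "severely deteriorated")


def softmax(scores: List[float]) -> List[float]:
    """Normalized probability mapping with max-subtraction for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rate_water_quality(scores: List[float]) -> Tuple[str, List[float]]:
    """Return the label with the highest probability and the full distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = rate_water_quality([0.1, 2.0, 0.3, -1.0])   # made-up classifier scores
print(label)  # sub-healthy
```

Returning the full distribution alongside the label keeps the classifier's confidence available to downstream early-warning logic, although the claim itself only requires the label.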