A method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators
By establishing a random forest regression model that takes conventional water quality physicochemical indicators as inputs, the method predicts the relative abundance of Microcystis OTUs and the Microcystis cell density. This addresses the difficulty of predicting the growth status of Microcystis and enables real-time monitoring and early warning of Microcystis growth trends. The method is suitable for online water quality monitoring systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- RES CENT FOR ECO ENVIRONMENTAL SCI THE CHINESE ACAD OF SCI
- Filing Date
- 2022-09-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies cannot effectively predict the growth status and trend of Microcystis using conventional water quality physicochemical indicators, making it impossible to achieve real-time monitoring and prevention of Microcystis outbreaks.
A regression model was established with the random forest algorithm. Conventional water quality physicochemical indicators such as temperature, pH, total nitrogen, and total phosphorus serve as explanatory variables to predict the relative abundance of Microcystis OTUs and the Microcystis cell density.
It enables accurate prediction of Microcystis growth, providing early warning and prevention of Microcystis outbreaks. It is suitable for online monitoring systems, saving time and manpower, and is applicable to water quality control.
Figure CN115472230B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of wastewater treatment, and specifically to a method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators.
Background Technology
[0002] Cyanobacteria, also known as blue-green algae or blue-green bacteria, are a major cause of eutrophication and cyanobacterial blooms in surface waters. This makes them a key focus of water pollution control in China and abroad. Microcystis is a common freshwater cyanobacterial genus that includes *Microcystis aeruginosa*, a species that forms harmful algal blooms; its toxins (microcystins) can damage the liver and gallbladder. Microcystis is the most common bloom-forming cyanobacterium and is widely distributed in eutrophic lakes such as Dianchi, Taihu, and Chaohu. Numerous studies have shown that nitrogen and phosphorus in water are the dominant factors governing Microcystis growth. However, Microcystis growth and reproduction are also shaped by its internal growth mechanisms and patterns and by complex interactions with external environmental factors. Accurately predicting the growth status and trends of Microcystis under these complex interactions is crucial for effective control, and it remains a major challenge for researchers.
[0003] Currently, the growth status of Microcystis can be reflected by the relative abundance of Microcystis OTUs and by the Microcystis cell density. However, neither is a routine water quality monitoring indicator, and neither has standard limits. Both involve relatively complex detection and data processing, because each requires gene sequencing to obtain sequencing data from which the indicator is then calculated. This cannot meet the requirements for real-time monitoring of Microcystis growth status or for controlling growth trends, and therefore cannot help prevent Microcystis outbreaks.
[0004] Common water quality physicochemical indicators generally include temperature (Temp), pH, total nitrogen (TN), ammonia nitrogen (NH₄⁺-N), nitrate nitrogen (NO₃⁻-N), total phosphorus (TP), soluble reactive phosphorus (SRP), dissolved organic carbon (DOC), chlorophyll a (Chl-a), chemical oxygen demand (COD), biochemical oxygen demand (BOD), and dissolved oxygen (DO). Detecting these indicators is a necessary step in water quality monitoring. Various water quality standards define limits for them, and the detection methods are well established. Currently, there is no method to predict the growth status of Microcystis from water physicochemical indicators.
Summary of the Invention
[0005] Therefore, the present invention aims to provide a method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators.
[0006] To achieve the above objective, the present invention provides the following technical solution:
[0007] This invention provides a method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators, comprising the following steps:
[0008] (1) Take several river water samples. For each sample, obtain the relative abundance of Microcystis OTUs and/or the Microcystis cell density, together with its conventional water quality physicochemical index data, including temperature, pH, Chl-a, TN, NH₄⁺-N, NO₃⁻-N, TP, SRP, and DOC;
[0009] (2) Use temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC as explanatory variables and the relative abundance of Microcystis OTUs as the response variable. Establish a regression model with the random forest algorithm as the prediction model for the relative abundance of Microcystis OTUs; and/or
[0010] use temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC as explanatory variables and the Microcystis cell density as the response variable. Establish a regression model with the random forest algorithm as the prediction model for Microcystis cell density;
[0011] (3) Use the prediction model for the relative abundance of Microcystis OTUs to predict the relative abundance of Microcystis OTUs in the river water sample to be tested; and/or
[0012] use the Microcystis cell density prediction model to predict the Microcystis cell density of the river water sample to be tested.
[0013] Furthermore, in step (2), when the regression model is established with the random forest algorithm, the response variables are log-transformed.
[0014] Furthermore, in step (2), the regression model is established using the randomForest package in R software.
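Paragraph [0014] names R's randomForest package. The Python block below is a toy, dependency-free stand-in written for illustration only; it is not the randomForest package. It shows the core idea: each tree is grown on a bootstrap sample, each split considers a random subset of `mtry` variables, and the forest averages the tree predictions. Several parts are simplifying assumptions: the class name `ToyRandomForestRegressor`, the `min_leaf` stopping rule, and the restriction to variables that still vary within a node. Only the defaults `ntree = 500` and `mtry = floor(p/3)` follow randomForest's regression defaults.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean: the regression impurity of a node."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def _grow(X, y, mtry, min_leaf, rng):
    """Grow one regression tree. Leaves are floats; internal nodes are (var, threshold, left, right)."""
    if len(y) < 2 * min_leaf or len(set(y)) == 1:
        return float(mean(y))
    # Only variables that still vary within this node can be split on.
    candidates = [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]
    best = None
    # Random subset of mtry candidate variables at each split.
    for j in rng.sample(candidates, min(mtry, len(candidates))):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return float(mean(y))
    _, j, t, left, right = best
    return (j, t,
            _grow([X[i] for i in left], [y[i] for i in left], mtry, min_leaf, rng),
            _grow([X[i] for i in right], [y[i] for i in right], mtry, min_leaf, rng))


def _predict_tree(node, x):
    """Walk from the root to a leaf for one sample x."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


class ToyRandomForestRegressor:
    """Bagged regression trees with a random subset of mtry variables per split."""

    def __init__(self, ntree=500, mtry=None, min_leaf=5, seed=0):
        self.ntree = ntree
        self.mtry = mtry
        self.min_leaf = min_leaf
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, p = len(X), len(X[0])
        mtry = self.mtry or max(1, p // 3)  # same default rule as randomForest regression
        self.trees = []
        for _ in range(self.ntree):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    mtry, self.min_leaf, self.rng))
        return self

    def predict(self, x):
        """Average the predictions of all trees for one sample x."""
        return mean(_predict_tree(tree, x) for tree in self.trees)
```

In the setting of Example 1, each row of `X` would hold a sample's 11 explanatory variables, and `y` would hold its log-transformed response. Pure Python is slow, so the sketch is meant for reading rather than production use.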
[0015] Furthermore, in step (3), the method for predicting the relative abundance of Microcystis OTUs in the river water sample to be tested includes:
[0016] Obtain the conventional water quality physicochemical index data of the river water sample to be tested: temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC. Use these as explanatory variables and substitute their values into the prediction model for the relative abundance of Microcystis OTUs. The resulting predicted value is the predicted relative abundance of Microcystis OTUs in the river water sample to be tested.
[0017] Furthermore, in step (3), the method for predicting the Microcystis cell density of the river water sample to be tested includes:
[0018] Obtain the conventional water quality physicochemical index data of the river water sample to be tested: temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC. Use these as explanatory variables and substitute their values into the prediction model for Microcystis cell density. The resulting predicted value is the predicted Microcystis cell density of the river water sample to be tested.
[0019] Furthermore, in step (1), the relative abundance of Microcystis OTUs in the river water sample is obtained by performing 16S rRNA sequencing.
[0020] Furthermore, methods for obtaining the relative abundance of Microcystis OTUs in river water samples include:
[0021] DNA was extracted from the river water sample, and the V4 region of the 16S rRNA gene was amplified.
[0022] The PCR products were purified, and the DNA concentration of the purified PCR products was measured.
[0023] The purified PCR products were sequenced. The raw sequences were quality-filtered, and the high-quality paired-end reads were merged into tags based on their overlapping regions. Chimeras were then filtered out, and the sequences were clustered into OTUs, from which the relative abundance of Microcystis OTUs was calculated.
[0024] Furthermore, DNA was extracted using a DNA isolation kit.
[0025] Furthermore, the V4 region of the 16S rRNA gene was amplified using barcode primers 515F and 806R.
[0026] Furthermore, the PCR products were purified using a DNA gel extraction kit.
[0027] Furthermore, the DNA concentration of the purified PCR product was measured using a fluorometer.
[0028] Furthermore, the purified PCR products were sequenced on the Illumina HiSeq 2500 sequencing platform.
[0029] Furthermore, the raw sequences were quality-filtered using Flash and Trimmomatic software. This removed sequences with two or more primer mismatches and merged sequences whose overlapping region had a mismatch rate of 0.2 or higher.
[0030] Furthermore, chimeras were filtered using Uparse software, and all sequences were aligned and clustered into OTUs at 97% sequence similarity.
[0031] Furthermore, in step (1), the method for obtaining the Microcystis cell density of the river water sample includes:
[0032] The total cell count of the river water sample was determined by flow cytometry, and the cell density was obtained by dividing the total cell count by the water sample volume.
[0033] The Microcystis cell density is obtained by multiplying the cell density of the water sample by the relative abundance of Microcystis OTUs.
[0034] Furthermore, the method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators further includes: randomly dividing the dataset of the river water sample into a training set and a test set, using the training set to establish the prediction model, and using the test set to verify the prediction ability of the prediction model. Preferably, 70% of the dataset is used as the training set and 30% of the dataset is used as the test set.
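Paragraphs [0034] and [0064] describe a random 70/30 split of the dataset. The minimal Python sketch below performs a simple random split, similar in spirit to R's `sample` function. Caret's `createDataPartition` instead stratifies on the outcome, which this sketch does not attempt. The function name `split_dataset` is hypothetical.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=None):
    """Randomly split samples into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 99 samples -> 69 train / 30 test
    return shuffled[:n_train], shuffled[n_train:]
```

With the 99 complete observations of Example 1, `split_dataset(range(99), seed=42)` yields 69 training and 30 test indices, matching the counts reported in [0064].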
[0035] Furthermore, the coefficient of determination R² is used to measure the predictive power of the prediction model:
[0036] When R² ≤ 0.3, the predicted values fit the observed values poorly, and the prediction model has poor predictive ability.
[0037] When 0.3 < R² ≤ 0.4, the predicted values fit the observed values weakly, and the predictive ability of the prediction model is weak.
[0038] When 0.4 < R² ≤ 0.6, the predicted values fit the observed values moderately, and the predictive ability of the prediction model is moderate.
[0039] When 0.6 < R² ≤ 1.0, the predicted values fit the observed values well, and the prediction model has strong predictive ability.
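The R² grading in paragraphs [0036] to [0039] can be written as a small lookup. The Python sketch below uses the hypothetical function name `fit_grade`. Test-set R² can be negative, and such values fall into the "poor" grade.

```python
def fit_grade(r2: float) -> str:
    """Map an R² value to the predictive-ability grade used in the method."""
    if r2 <= 0.3:
        return "poor"
    if r2 <= 0.4:
        return "weak"
    if r2 <= 0.6:
        return "moderate"
    if r2 <= 1.0:
        return "strong"
    raise ValueError("R² cannot exceed 1.0")
```

Applied to the R² values reported in Example 2 (0.68 and 0.72 on the training set, 0.66 and 0.70 on the test set), every model falls in the "strong" grade.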
[0040] The technical solution of this invention has the following advantages:
[0041] This invention provides a method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators. River water samples are collected in advance, and a prediction model is established through machine learning; the model then predicts the relative abundance of Microcystis OTUs and the Microcystis cell density in river water samples. The explanatory variables are the conventional indicators from water quality monitoring together with nutrient ratios calculated from them: temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC. The response variable is the relative abundance of Microcystis OTUs or the Microcystis cell density, and the prediction model is established with the random forest (RF) algorithm. Verification showed that a prediction model established with this method fits the data strongly. It needs only the water quality physicochemical test results of a river sample to accurately predict the relative abundance of Microcystis OTUs and the Microcystis cell density. On the one hand, this avoids complex detection procedures and the inability to monitor in real time, saving considerable time and manpower. On the other hand, the growth status and trend of Microcystis in a river can be predicted from routine physicochemical indicators obtained in daily water quality monitoring. This supports early prediction and warning of Microcystis outbreaks, is especially suitable for the early warning function of online water quality monitoring systems, and facilitates the prevention and control of Microcystis outbreaks from a water quality management perspective.
This invention is of great significance not only for controlling the eutrophication of water bodies but also for protecting aquatic organisms and controlling the human health risks posed by Microcystis toxins in drinking water sources.
Attached Figure Description
[0042] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0043] Figure 1 shows the training and testing results of the models that predict the relative abundance of Microcystis OTUs and the Microcystis cell density from conventional water quality physicochemical index data in an embodiment of the invention. Panel (a) shows the prediction of the relative abundance of Microcystis OTUs, and panel (b) shows the prediction of Microcystis cell density.
[0044] Figure 2 is a partial dependence plot of the response variable against each explanatory variable in the prediction model for the relative abundance of Microcystis OTUs.
[0045] Figure 3 is a partial dependence plot of the response variable against each explanatory variable in the prediction model for Microcystis cell density.
Detailed Implementation
[0046] The following embodiments are provided to better understand the present invention and are not limited to the preferred embodiments described. They do not constitute a limitation on the content and scope of protection of the present invention. Any product that is the same as or similar to the present invention, derived by any person under the guidance of the present invention or by combining the features of the present invention with other prior art, falls within the protection scope of the present invention.
[0047] Where specific experimental steps or conditions are not specified in the embodiments, they can be performed according to the conventional experimental steps or conditions described in the literature in this field. All raw materials or instruments used are commercially available conventional products, including but not limited to those used in the embodiments of this application.
[0048] Example 1: Establishment of the Prediction Model
[0049] This embodiment provides a method for establishing prediction models for the relative abundance of Microcystis OTUs and the Microcystis cell density. The specific steps are as follows:
[0050] (1) Water sampling
[0051] The Chaobai River basin (39°–40°N, 116°–117°E) in the Haihe River system of China was selected as the study area. Sampling was conducted in December 2016 (winter), March 2017 (spring), June 2017 (summer), and September 2017 (autumn). A total of 34 sampling sites were set along the main channel of the Chaobai River and at the confluences of important tributaries: 13 in mountainous areas, 7 in urban areas, and 14 in agricultural areas. A total of 126 water samples were collected, of which 99 had no missing values.
[0052] (2) Testing of water quality physicochemical indicators
[0053] The following routine water quality physicochemical indicators were measured in each water sample: temperature, pH, chlorophyll a (Chl-a), total nitrogen (TN), ammonia nitrogen (NH₄⁺-N), nitrate nitrogen (NO₃⁻-N), total phosphorus (TP), soluble reactive phosphorus (SRP), and dissolved organic carbon (DOC). The detection methods, reference standards, and main instruments and equipment for these indicators are shown in Table 1.
[0054] The 11 water quality physicochemical indicators (PCIs) used as explanatory variables are temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC. Five of these are nutrient ratios, calculated as follows: DOC/TN is the ratio of DOC to TN; NO₃⁻-N/TP is the ratio of NO₃⁻-N to TP; TN/TP is the ratio of TN to TP; SRP/TP is the ratio of SRP to TP; and NH₄⁺-N/TP is the ratio of NH₄⁺-N to TP.
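The explanatory-variable definitions in [0054] can be sketched as a function that turns one sample's measured indicators into the 11-element input vector. The dictionary keys and the function name `explanatory_variables` are hypothetical. The source does not say whether the ratios are mass or molar ratios, so the sketch assumes they are computed directly from the concentrations as measured (for example, all in mg/L).

```python
FEATURE_ORDER = (
    "Temp", "DOC/TN", "NO3-N/TP", "TN", "NO3-N",
    "TN/TP", "SRP/TP", "NH4-N/TP", "Chl-a", "pH", "DOC",
)


def explanatory_variables(m):
    """Build the 11 explanatory variables, in FEATURE_ORDER, from one sample.

    `m` is a dict with the hypothetical keys Temp, pH, Chl-a, TN, NH4-N,
    NO3-N, TP, SRP and DOC; ratios divide the concentrations as given.
    """
    derived = {
        "Temp": m["Temp"],
        "DOC/TN": m["DOC"] / m["TN"],
        "NO3-N/TP": m["NO3-N"] / m["TP"],
        "TN": m["TN"],
        "NO3-N": m["NO3-N"],
        "TN/TP": m["TN"] / m["TP"],
        "SRP/TP": m["SRP"] / m["TP"],
        "NH4-N/TP": m["NH4-N"] / m["TP"],
        "Chl-a": m["Chl-a"],
        "pH": m["pH"],
        "DOC": m["DOC"],
    }
    return [derived[name] for name in FEATURE_ORDER]
```

A fixed ordering matters because a fitted model expects the variables in the same positions at prediction time as during training.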
[0055] Table 1. Methods, reference standards, and main instruments and equipment for testing common water quality physicochemical indicators
[0056]
[0057] Note: The results of each indicator are the average of three test results.
[0058] (3) Detection of relative abundance of Microcystis OTUs
[0059] DNA was extracted from the filtered water samples using a DNA isolation kit (MoBio Laboratories Inc., Carlsbad, CA, USA). The V4 region of the 16S rRNA gene was amplified using the barcoded primers 515F (5′-gtgccagccggtaa-3′) and 806R (5′-ggactachvggwtctaat-3′). PCR was performed with a high-fidelity PCR master mix (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's instructions. Cycling conditions were 95°C for 3 min; 30 cycles of 95°C for 45 s, 56°C for 45 s, and 72°C for 45 s; and a final extension at 72°C for 10 min. The PCR products were purified using the AxyPrep DNA Gel Extraction Kit (Axygen, USA) according to the manufacturer's protocol. The DNA concentration of the purified products was measured with a TBS-380 fluorometer (Turner Biosystems, CA, USA). Replicate DNA aliquots were pooled at equal concentrations into a single DNA pool. The pool was sequenced on an Illumina HiSeq 2500 platform at the Beijing Genomics Institute (BGI, Shenzhen, China).
[0060] The raw sequences were quality-filtered using Flash and Trimmomatic software. This removed sequences with two or more primer mismatches and sequences whose overlap mismatch rate was 0.2 or higher. The remaining high-quality paired-end reads were merged into tags based on their overlapping regions. Chimeras were filtered using Uparse software (Uparse v7, http://drive5.com/uparse/). All sequences were then aligned and clustered into OTUs at 97% sequence similarity to determine the relative abundance of Microcystis OTUs.
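The last step of [0060] turns an OTU table into the relative abundance of Microcystis OTUs. The sketch below assumes this means the reads of all OTUs classified as Microcystis divided by the sample's total reads, expressed as a fraction. The source does not describe how OTUs are assigned to genera, so the taxonomy mapping is taken as given, and the names are hypothetical.

```python
def microcystis_relative_abundance(otu_counts, otu_genus, genus="Microcystis"):
    """Fraction of a sample's reads that belong to OTUs assigned to `genus`.

    otu_counts: {otu_id: read count} for one sample.
    otu_genus:  {otu_id: genus name} from a separate taxonomy step (assumed).
    """
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    hits = sum(n for otu, n in otu_counts.items() if otu_genus.get(otu) == genus)
    return hits / total
```

For example, with counts `{"OTU1": 50, "OTU2": 30, "OTU3": 20}` and OTU1 and OTU3 assigned to Microcystis, the relative abundance is 70/100 = 0.7.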
[0061] (4) Microcystis cell density detection
[0062] SYBR Green I was diluted 100-fold with anhydrous dimethyl sulfoxide to prepare the staining solution. For every 100 μL of water sample, 10 μL of stain was added, and the samples were stained for 15 min at room temperature in the dark. All samples were diluted 10-fold with sterile TE buffer (10 mM Tris-HCl, pH 8.0; 1 mM EDTA, pH 8.0). An equal volume of fluorescent microspheres (BD Biosciences) was added to each sample for quantitative fluorescence signal collection. The total number of cells in each water sample was determined by flow cytometry and divided by the water sample volume to obtain the cell density; each sample value is the average of four parallel measurements. The Microcystis cell density was then calculated by multiplying the water sample cell density by the relative abundance of Microcystis OTUs.
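The calculation at the end of [0062] is simple arithmetic. Cell density is total cells divided by sample volume, averaged over the four parallel measurements, and the result is multiplied by the Microcystis relative abundance. The Python sketch below assumes each replicate count has already been corrected for the staining and dilution steps and that volumes are in mL, which gives cells/mL as in Tables 3 and 5. The function name is hypothetical.

```python
from statistics import mean


def microcystis_cell_density(replicate_counts, volume_ml, microcystis_rel_abundance):
    """Microcystis cells/mL = mean(total cells / sample volume) x relative abundance."""
    if volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    sample_density = mean(count / volume_ml for count in replicate_counts)  # cells/mL
    return sample_density * microcystis_rel_abundance
```

For instance, four replicate counts of 2000, 2200, 1800, and 2000 cells in 2.0 mL average 1000 cells/mL. At a Microcystis relative abundance of 0.05, this gives about 50 Microcystis cells/mL.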
[0063] (5) A prediction model is established using the random forest algorithm.
[0064] After observations with missing data were removed, the final dataset contained 99 sets of observations. The 11 water quality physicochemical indicators described above were used as explanatory variables: temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC. The relative abundance of Microcystis OTUs and the Microcystis cell density were used as response variables, and regression models were built with the random forest algorithm. The response variables were log₁₀-transformed; for the relative abundance of Microcystis OTUs, zero values were first replaced by 1/10 of the minimum non-zero value. Before model building, the optimal value of the key parameter `mtry` was determined by 10-fold cross-validation, and the parameter `ntree` was set once the results had stabilized. For each model, 70% of the dataset (69 samples) was randomly assigned to the training set and the remaining 30% (30 samples) to the test set.
Two performance parameters, the mean squared error (MSE) and the coefficient of determination (R²), calculated on both the training and test sets, were used to evaluate the models. Because different indicators have different numerical ranges, MSE is an absolute measure that is only suitable for comparing predictions of the same indicator. In general, the smaller the MSE, the larger the R², and the larger the R², the better the fit, so R² was used as the primary evaluation parameter. When R² ≤ 0.3, the predicted values fit the observed values poorly; when 0.3 < R² ≤ 0.4, they fit weakly; when 0.4 < R² ≤ 0.6, they fit moderately; and when 0.6 < R² ≤ 1.0, they fit well. The random forest algorithm was implemented with the "randomForest" package in R (v.3.5.2). The dataset was randomly partitioned with the sample function or the createDataPartition function (from the caret package) in R (v.3.5.2).
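The response transformation and the two evaluation metrics in [0064] are sketched below in Python. The source says the OTU relative abundance values were "replaced by 1/10 of the minimum value". The sketch assumes this applies to zero values, which cannot be log-transformed, and uses the smallest non-zero value. R² is computed as 1 − SS_res/SS_tot; some studies instead report the squared correlation between predicted and observed values, and the source does not say which it used. The function names are hypothetical.

```python
import math


def log10_response(values):
    """log10-transform a response; zeros -> 1/10 of the smallest non-zero value (assumed)."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    floor = min(positive) / 10
    return [math.log10(v if v > 0 else floor) for v in values]


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = sum(observed) / len(observed)
    ss_tot = sum((o - m) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot
```

Because the models are fitted on the transformed response, metrics computed this way are on the log₁₀ scale. This is one reason MSE is only comparable within the same indicator.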
[0065] Example 2: Validation of the Prediction Model
[0066] (1) Model training effect based on training set
[0067] Table 2 shows the results of predicting the relative abundance of Microcystis OTUs on the training set from the combination of the 11 water quality physicochemical indicators (explanatory variables). The fit between predicted and observed values is shown in the gray part of Figure 1(a). Table 3 shows the corresponding training-set results for Microcystis cell density, and the fit between predicted and observed values is shown in the gray part of Figure 1(b).
[0068] Table 2. Training results of the Microcystis OTU relative abundance prediction model based on the training set.
[0069]
[0070]
[0071]
[0072] Table 3. Training performance of the Microcystis cell density prediction model based on the training set (unit: cells / mL)
[0073]
[0074]
[0075]
[0076] As shown in Table 2 and Figure 1(a), the predicted relative abundance of Microcystis OTUs agrees well with the measured values in the training set, with an MSE of 0.66 and an R² of 0.68, indicating a strong fit.
[0077] As shown in Table 3 and Figure 1(b), the predicted Microcystis cell density agrees well with the measured values in the training set, with an MSE of 0.67 and an R² of 0.72, indicating a strong fit.
[0078] (2) Model prediction performance based on the test set
[0079] The prediction models obtained from the training set were applied to the test set to verify their generalization ability, that is, their predictive ability on new data. This was assessed with the mean squared error (MSE) and the coefficient of determination (R²).
[0080] The test-set predictions of the Microcystis OTU relative abundance model and the Microcystis cell density model are shown in Tables 4 and 5, respectively. The fit between predicted and observed values is shown in the black parts of Figure 1(a) and Figure 1(b), respectively.
[0081] For the prediction models for the relative abundance of Microcystis OTUs and for Microcystis cell density, the partial dependence plots of the response variable against each explanatory variable are shown in Figure 2 and Figure 3, respectively.
[0082] Table 4. Prediction performance of the Microcystis OTU relative abundance prediction model based on the test set.
[0083]
[0084]
[0085] Table 5. Prediction performance of the Microcystis cell density prediction model based on the test set (unit: cells / mL)
[0086]
[0087]
[0088] As shown in Table 4 and Figure 1(a), the prediction model still achieves a strong fit when predicting the relative abundance of Microcystis OTUs in the new dataset (test set) (R² = 0.66), with an MSE of 0.98.
[0089] As shown in Table 5 and Figure 1(b), the prediction model still achieves a strong fit when predicting the Microcystis cell density in the new dataset (test set) (R² = 0.70), with an MSE of 0.95.
[0090] As shown in Figure 2 and Figure 3, temperature is the most important factor for Microcystis growth. The relative abundance of Microcystis OTUs and the cell density increase with increasing temperature, and about 25°C is the optimal growth temperature. DOC/TN, Chl-a, DOC, and pH (8–9.5) are all positively correlated with the relative abundance of Microcystis OTUs and cell density. In contrast, TN/TP, NO₃⁻-N/TP, NH₄⁺-N/TP, TN, NO₃⁻-N, and SRP/TP are negatively correlated with them. This indicates that growing Microcystis takes up inorganic nitrogen (NO₃⁻-N, NH₄⁺-N) and phosphorus (mainly SRP). It converts the nitrogen into organic nitrogen and synthesizes organic substances including Chl-a, which raises DOC/TN and Chl-a. DOC provides an organic carbon source for bacteria in the Microcystis symbiotic system, thereby promoting the growth of Microcystis. The variables in the partial dependence plots of Figure 2 and Figure 3 are ordered by importance. On this basis, NO₃⁻-N/TP is identified as the main influencing factor on Microcystis biomass and the primary water quality indicator for Microcystis control.
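Partial dependence plots such as Figures 2 and 3 are commonly computed as follows. For each value on a grid, one explanatory variable is set to that value in every sample, and the model's predictions are averaged. The source does not state how its plots were produced; R's randomForest package provides `partialPlot` for this purpose. The Python sketch below is a generic version with a hypothetical `predict` callable and function name, not that package's implementation.

```python
from statistics import mean


def partial_dependence(predict, X, j, grid):
    """Average prediction when variable j is set to each grid value for every sample."""
    curve = []
    for value in grid:
        modified = [list(row[:j]) + [value] + list(row[j + 1:]) for row in X]
        curve.append(mean(predict(row) for row in modified))
    return curve
```

Plotting `curve` against `grid` for each of the 11 variables gives one panel of a partial dependence figure. A rising curve for temperature would correspond to the positive temperature effect described above.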
[0091] Therefore, the prediction model established by the method provided by the present invention can accurately predict the relative abundance of OTUs and cell density of Microcystis in rivers, and can be used to monitor the growth status of Microcystis.
[0092] Example 3: Prediction of Microcystis growth status
[0093] The prediction model established in Example 1 (based on the training set) was used to predict the relative abundance and cell density of Microcystis OTUs in the river water samples to be tested. The method is as follows:
[0094] Obtain the routine water quality physicochemical index data of the river sample to be tested. Use the routine indicators (Temp, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC) as explanatory variables, and substitute the data into the prediction model for the relative abundance of Microcystis OTUs. The resulting predicted value is the predicted relative abundance of Microcystis OTUs in the river water sample to be tested.
[0095] Obtain the routine water quality physicochemical index data of the river sample to be tested. Use the routine indicators (Temp, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC) as explanatory variables, and substitute the data into the prediction model for Microcystis cell density. The resulting predicted value is the predicted Microcystis cell density of the river water sample to be tested.
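Paragraphs [0094] and [0095] substitute a new sample's indicators into a trained model. The response was log₁₀-transformed during training ([0064]), so a prediction would presumably need back-transforming (10 raised to the predicted value) to return relative abundance or cells/mL. The source does not state this step, so treat it as an assumption. In the Python sketch below, `predict_log10` stands for any fitted model's prediction function and is hypothetical, as are the variable keys.

```python
FEATURE_ORDER = (
    "Temp", "DOC/TN", "NO3-N/TP", "TN", "NO3-N",
    "TN/TP", "SRP/TP", "NH4-N/TP", "Chl-a", "pH", "DOC",
)


def predict_sample(predict_log10, variables):
    """Predict one new sample with a model trained on a log10-transformed response.

    `predict_log10` maps an 11-element list (in FEATURE_ORDER) to a log10 prediction;
    `variables` maps each name in FEATURE_ORDER to the sample's value.
    """
    missing = [name for name in FEATURE_ORDER if name not in variables]
    if missing:
        raise KeyError(f"missing explanatory variables: {missing}")
    x = [variables[name] for name in FEATURE_ORDER]  # same order as in training
    return 10 ** predict_log10(x)  # back-transform to the original scale
```

Rejecting incomplete inputs mirrors Example 1, where only the 99 samples without missing values were used for training.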
[0096] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A method for predicting the growth status of Microcystis based on conventional water quality physicochemical indicators, characterized in that it comprises the following steps: (1) taking several river water samples, obtaining the relative abundance of Microcystis OTUs and/or the Microcystis cell density of each, and obtaining their conventional water quality physicochemical index data, including temperature, pH, Chl-a, TN, NH₄⁺-N, NO₃⁻-N, TP, SRP, and DOC; wherein the Microcystis cell density of a river water sample is obtained by detecting the total number of cells in the sample by flow cytometry, dividing the total number of cells by the volume of the water sample to obtain the cell density of the water sample, and multiplying the cell density of the water sample by the relative abundance of Microcystis OTUs to obtain the Microcystis cell density; (2) using temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC as explanatory variables and the relative abundance of Microcystis OTUs as the response variable, establishing a regression model with the random forest algorithm as the prediction model for the relative abundance of Microcystis OTUs; and/or using temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC as explanatory variables and the Microcystis cell density as the response variable, establishing a regression model with the random forest algorithm as the prediction model for Microcystis cell density; wherein the response variables are log-transformed when the regression model is built with the random forest algorithm;
(3) predicting the relative abundance of Microcystis OTUs in the river water sample to be tested with the prediction model for the relative abundance of Microcystis OTUs, and/or predicting the Microcystis cell density of the river water sample to be tested with the Microcystis cell density prediction model.
2. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 1, characterized in that, In step (2), the regression model is established using the randomForest package in R software.
3. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 1, characterized in that, In step (3), the method for predicting the relative abundance of Microcystis OTUs in the river water sample to be tested includes: Obtain the conventional water quality physicochemical index data of the river water sample to be tested, including temperature, DOC / TN, and NO3. - -N / TP, TN, NO3 - -N, TN / TP, SRP / TP, NH4 + Using -N / TP, Chl-a, pH, and DOC as explanatory variables, the explanatory variable data are substituted into the prediction model for the relative abundance of Microcystis OTUs, and the resulting predicted variable values are the predicted results of the abundance of Microcystis OTUs in the river water sample to be tested.
4. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 1, characterized in that in step (3), the Microcystis cell density in the river water sample to be tested is predicted as follows: obtain the conventional water quality physicochemical index data of the sample, including temperature, DOC/TN, NO₃⁻-N/TP, TN, NO₃⁻-N, TN/TP, SRP/TP, NH₄⁺-N/TP, Chl-a, pH, and DOC; take these as the explanatory variables and substitute their values into the prediction model for Microcystis cell density; the resulting value of the predicted variable is the predicted Microcystis cell density of the river water sample to be tested.
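Claims 3 and 4 require the 11 explanatory variables for the sample to be tested, five of which are ratios of directly measured indicators. The sketch below derives that vector from raw measurements. It is illustrative only: the dictionary keys and the function name are hypothetical, and it assumes all concentrations are in the same mass units, since the claims do not say whether the ratios are mass or molar ratios.

```python
from typing import Dict, List

# Column order of the explanatory variables as listed in step (2) of claim 1.
FEATURE_ORDER = ["temperature", "DOC/TN", "NO3-N/TP", "TN", "NO3-N", "TN/TP",
                 "SRP/TP", "NH4-N/TP", "Chl-a", "pH", "DOC"]


def explanatory_vector(m: Dict[str, float]) -> List[float]:
    """Derive the 11 explanatory variables from directly measured indicators.

    `m` must contain: temperature, pH, Chl-a, TN, NH4-N, NO3-N, TP, SRP, DOC.
    Ratios use concentrations in the same mass units (an assumption).
    """
    if m["TN"] <= 0 or m["TP"] <= 0:
        raise ValueError("TN and TP must be positive to form the ratios")
    derived = {
        "temperature": m["temperature"],
        "DOC/TN": m["DOC"] / m["TN"],
        "NO3-N/TP": m["NO3-N"] / m["TP"],
        "TN": m["TN"],
        "NO3-N": m["NO3-N"],
        "TN/TP": m["TN"] / m["TP"],
        "SRP/TP": m["SRP"] / m["TP"],
        "NH4-N/TP": m["NH4-N"] / m["TP"],
        "Chl-a": m["Chl-a"],
        "pH": m["pH"],
        "DOC": m["DOC"],
    }
    return [derived[name] for name in FEATURE_ORDER]


# Usage with made-up measurements (mg/L for nutrients and DOC).
sample = {"temperature": 25.0, "pH": 8.0, "Chl-a": 20.0, "TN": 2.0,
          "NH4-N": 0.5, "NO3-N": 1.0, "TP": 0.25, "SRP": 0.125, "DOC": 4.0}
print(explanatory_vector(sample))
# [25.0, 2.0, 4.0, 2.0, 1.0, 8.0, 0.5, 2.0, 20.0, 8.0, 4.0]
```

The returned list, kept in the same column order used for training, can be passed as a single row to a fitted regression model; if the model was trained on log-transformed responses, its output must be back-transformed before it is reported as an abundance or cell density.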
5. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 1, characterized in that in step (1), the relative abundance of Microcystis OTUs in the river water samples is obtained by 16S rRNA gene sequencing.
6. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 5, characterized in that the relative abundance of Microcystis OTUs in a river water sample is obtained as follows: DNA is extracted from the sample and the V4 region of the 16S rRNA gene is amplified by PCR; the PCR products are purified and their DNA concentration is measured; the purified PCR products are sequenced; the raw sequences are quality-filtered; the high-quality paired-end reads are merged into tags based on their overlapping regions; chimeras are removed and the sequences are clustered into OTUs; and the relative abundance of Microcystis OTUs is calculated.
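The final step of claim 6, calculating the relative abundance of Microcystis OTUs, can be sketched as the share of a sample's reads that fall in OTUs classified as Microcystis. This assumes each OTU has already been assigned a genus-level taxonomy, a step the claims do not describe; the function name and the OTU identifiers are hypothetical.

```python
from typing import Mapping


def genus_relative_abundance(otu_counts: Mapping[str, int],
                             otu_genus: Mapping[str, str],
                             genus: str = "Microcystis") -> float:
    """Fraction of a sample's reads assigned to OTUs classified as `genus`."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    hits = sum(n for otu, n in otu_counts.items() if otu_genus.get(otu) == genus)
    return hits / total


# Usage: read counts per OTU for one sample and a genus label per OTU.
counts = {"OTU1": 300, "OTU2": 100, "OTU3": 600}
taxonomy = {"OTU1": "Microcystis", "OTU2": "Microcystis", "OTU3": "Synechococcus"}
print(genus_relative_abundance(counts, taxonomy))  # 0.4
```

Summing over all Microcystis OTUs gives a genus-level value; if the model is meant to target individual OTUs, the same ratio would be taken per OTU instead.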
7. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 6, characterized in that DNA is extracted using a DNA isolation kit; the V4 region of the 16S rRNA gene is amplified using the barcoded primers 515F and 806R; the PCR products are purified using a DNA gel extraction kit; the DNA concentration of the purified PCR products is measured with a fluorometer; the purified PCR products are sequenced on the Illumina HiSeq 2500 platform; the raw sequences are cleaned with FLASH and Trimmomatic, removing sequences with 2 or more primer mismatches and sequences whose overlapping region in the merged read has a mismatch rate of 0.2 or more; chimeras are filtered out with UPARSE; and all sequences are clustered into OTUs at 97% sequence similarity.
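The two read-filtering thresholds in claim 7 can be expressed as a simple predicate. This is a toy sketch of the stated cut-offs only, not of how FLASH or Trimmomatic work internally; the `MergedRead` record is a hypothetical per-read summary assumed to be produced upstream by primer matching and pair merging.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergedRead:
    """Hypothetical per-read summary produced upstream by merging/QC tools."""
    read_id: str
    primer_mismatches: int        # mismatches against the PCR primer sequence
    overlap_mismatch_rate: float  # mismatched bases / overlap length when merged


def passes_claim7_filters(read: MergedRead) -> bool:
    # Discard reads with >= 2 primer mismatches or an overlap mismatch rate >= 0.2.
    return read.primer_mismatches < 2 and read.overlap_mismatch_rate < 0.2


def filter_reads(reads: List[MergedRead]) -> List[MergedRead]:
    """Keep only reads that satisfy both thresholds."""
    return [r for r in reads if passes_claim7_filters(r)]


# Usage: only r1 survives; r2 fails the primer check, r3 the overlap check.
reads = [MergedRead("r1", 0, 0.05), MergedRead("r2", 2, 0.0),
         MergedRead("r3", 1, 0.2)]
print([r.read_id for r in filter_reads(reads)])  # ['r1']
```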
8. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 1, characterized in that it further comprises: randomly dividing the dataset of river water samples into a training set and a test set, building the prediction model with the training set, and verifying the predictive ability of the prediction model with the test set.
9. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 8, characterized in that 70% of the dataset is used as the training set and 30% as the test set.
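The random 70/30 division of claims 8 and 9 can be sketched with the standard library. The seeded shuffle (for reproducibility) and rounding the training-set size down are assumptions; the claims only fix the proportions. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split samples into a 70% training set and a 30% test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded shuffle (assumption)
    n_train = (7 * len(shuffled)) // 10     # 70% of samples, rounded down
    return shuffled[:n_train], shuffled[n_train:]


# Usage: 20 sample IDs -> 14 for training, 6 held out for testing.
train, test = split_70_30(list(range(20)))
print(len(train), len(test))  # 14 6
```

Holding the test set out of model fitting means the R² computed on it reflects how the model performs on samples it has not seen, which is what the predictive-ability check is meant to measure.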
10. The method for predicting the growth status of Microcystis aeruginosa based on conventional water quality physicochemical indicators according to claim 8, characterized in that the predictive ability of the prediction model is measured by the coefficient of determination R²: when R² ≤ 0.3, the predicted values fit the observed values poorly and the predictive ability of the prediction model is poor; when 0.3 < R² ≤ 0.4, the predicted values fit the observed values poorly and the predictive ability of the prediction model is weak; when 0.4 < R² ≤ 0.6, the predicted values fit the observed values moderately and the predictive ability of the prediction model is moderate; when 0.6 < R² ≤ 1.0, the predicted values fit the observed values well and the predictive ability of the prediction model is strong.
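The R² check in claim 10 can be sketched as below, computing the coefficient of determination on the test set and mapping it to the four bands. This assumes the usual definition R² = 1 − SS_res/SS_tot; the claims do not say whether the squared Pearson correlation is used instead. On a held-out test set this R² can be negative, which falls into the "poor" band. Function names are hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def predictive_ability(r2: float) -> str:
    """Map R² onto the four bands defined in claim 10."""
    if r2 <= 0.3:
        return "poor"
    if r2 <= 0.4:
        return "weak"
    if r2 <= 0.6:
        return "moderate"
    return "strong"  # 0.6 < R² <= 1.0


# Usage: observed vs predicted values on a small test set.
r2 = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 3.5, 4.5, 5.0])
print(r2, predictive_ability(r2))  # 0.75 strong
```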