A machine learning-based method and system for predicting urban water supply demand
By building an urban water demand prediction system on machine learning methods, the invention addresses the reliance on human experience and outdated models in existing approaches. It enables accurate, automated prediction of water demand, improving the operational efficiency and economic performance of the water supply system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- GUANGZHOU WATER SUPPLY CO
- Filing Date
- 2026-01-23
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for predicting urban water demand rely on human experience and on outdated, rigid models. Their prediction accuracy is low and they cannot support fine-grained scheduling, so water supply planning is largely guesswork, which raises operating costs and wastes resources.
A machine learning approach is used: multidimensional correlated data are fused, time series analysis and feature engineering select the input features, a candidate set of machine learning models is built and evaluated, hyperparameters are optimized, the optimal prediction model is selected, and the predicted water demand is output.
The method enables accurate, automated prediction of water demand. It improves prediction accuracy, reduces resource waste, lowers operating costs, and enhances the operating efficiency and reliability of the water supply system.
Figure CN122198205A_ABST
Description
Technical Field
[0001] This invention relates to the fields of smart water management and artificial intelligence, and specifically to a machine learning-based method and system for predicting urban water supply demand.
Background Art
[0002] Urban water supply systems are core infrastructure that keep cities running and underpin residents' quality of life, and optimizing water supply scheduling is crucial both to serving the public and to saving energy. Currently, water supply dispatch centers draw up supply plans for the next day and for longer periods mainly on the basis of forecasts of future urban water demand. However, existing forecasting methods have several significant shortcomings and cannot meet the practical needs of fine-grained scheduling:
[0003] Reliance on human experience and high subjectivity: Existing forecasting methods depend largely on dispatchers' personal "intuition" and past work experience, and lack a scientific, quantitative basis. Because dispatchers differ in experience and judgment criteria, forecasts are highly subjective and unstable: under the same conditions, markedly different predicted values may be obtained, which seriously undermines the reliability of water supply plans.
[0004] Outdated and rigid models with poor adaptability: Some water supply companies still use traditional forecasting models established more than a decade ago. The structural design, parameter configuration, and selection of influencing factors of such models are all based on the city size, climate conditions, and user water usage habits at that time. With rapid urban development, intensified climate change, and changes in residents' water usage patterns, traditional models can no longer adapt to the dynamic changes in the external environment and internal demand, resulting in a large deviation between the forecast results and the actual water demand, making it difficult to guide actual water supply scheduling.
[0005] Coarse-grained forecasting: Traditional forecasting methods often employ a holistic and average-based approach, making it difficult to effectively differentiate water usage patterns across seasons, time periods, and regions, and thus failing to provide refined water demand forecasts. This forces water supply scheduling to adopt a broad, one-size-fits-all strategy, hindering precise adjustments based on actual demand variations and impacting the operational efficiency of the water supply system.
[0006] These deficiencies make urban water demand forecasts inaccurate, so water supply planning is largely blind. When forecasts exceed actual demand, water plants overproduce, pipeline network pressure becomes excessive, pumping-station power consumption rises, and network aging and physical leakage worsen. Conversely, when forecasts fall short of actual demand, localized water shortages may occur, disrupting residents' normal water use and industrial production. These problems directly raise water supply companies' operating costs, make the gap between water produced and water sold harder to control, and waste significant resources.
[0007] Therefore, developing a high-precision, automated, and highly adaptable method and system for predicting urban water supply demand has become a pressing technical challenge in smart water management.
Summary of the Invention
[0008] This invention provides a method and system for predicting urban water demand based on machine learning, which enables accurate and automated prediction of water demand, provides a scientific basis for water supply scheduling, and achieves the goals of optimizing resource allocation, reducing operating costs, and improving service quality. It solves the problems of existing technologies, such as reliance on human experience, poor model adaptability, and low prediction accuracy in predicting urban water demand.
[0009] To achieve the above objectives, the present invention is implemented through the following technical solution:
[0010] In a first aspect, the present invention provides a method for predicting urban water supply demand based on machine learning, comprising the following steps:
[0011] Acquire multidimensional correlated data and perform fusion preprocessing on them to obtain a structured training dataset;
[0012] Perform automated feature engineering based on time series analysis to select key input features and construct feature vectors;
[0013] Construct a candidate model set containing multiple machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization;
[0014] Output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding features of the date to be predicted.
[0015] As a further improvement to the technical solution of the present invention, the step of obtaining multidimensional correlation data and performing fusion preprocessing to obtain a structured training dataset specifically includes:
[0016] Obtain historical daily water supply time series data for each water supply zone and water plant in the city;
[0017] Obtain external influence characteristic data for the date to be predicted and historical dates, wherein the external influence characteristic data includes at least meteorological data and holiday characteristic data;
[0018] The historical daily water supply time series data and external influence feature data are cleaned to remove outliers and fill in missing values, forming the structured training dataset.
[0019] As a further improvement to the technical solution of this invention, the step of performing automated feature engineering based on time series analysis to screen key input features and construct feature vectors specifically includes:
[0020] Analyze the historical daily water supply time series data using the autocorrelation function (ACF) and partial autocorrelation function (PACF), and on that basis take the actual water supply on each of the N days preceding the date to be predicted as the core autoregressive features, where N is a positive integer;
[0021] The core autoregressive features are fused with the external influence feature data to construct an input feature vector for model training and prediction.
[0022] As a further improvement to the technical solution of this invention, the construction of a candidate model set containing multiple machine learning algorithms, the selection of the optimal prediction model through multi-dimensional performance evaluation, and the completion of hyperparameter optimization specifically include:
[0023] Define a candidate model set, which includes at least the gradient boosting decision tree model, the random forest model, the support vector machine model, and the linear regression model;
[0024] Train and validate each model in the candidate model set on the structured training dataset, and evaluate each model quantitatively using preset performance evaluation metrics, which include at least the mean absolute percentage error (MAPE), training time, and inference time;
[0025] The model with the best overall performance is selected as the initial prediction model based on the evaluation results;
[0026] The key hyperparameters of the initial prediction model are optimized using cross-validation to obtain the optimal prediction model.
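For reference, MAPE is conventionally defined as below, where y_t is the actual water supply on day t, \hat{y}_t is the predicted value, and n is the number of evaluated days. The text does not state the formula explicitly, so this standard definition is an assumption. Under it, a MAPE of 1.01% means the daily predictions deviate from actual supply by about 1% on average.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```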
[0027] As a further improvement to the technical solution of the present invention, the value of N is 2, and the core autoregressive feature is the actual water supply of the day before the predicted day and the actual water supply of the two days before the predicted day.
[0028] As a further improvement to the technical solution of the present invention, the step of outputting the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted specifically includes:
[0029] For the date to be predicted, obtain the meteorological data, the holiday characteristic data, and the actual water supply on each of the N days preceding that date;
[0030] The above data is integrated into a prediction input vector and then input into the optimal prediction model.
[0031] The predicted urban water demand for the predicted date is calculated and output using the optimal prediction model.
[0032] As a further improvement to the technical solution of the present invention, the meteorological data includes at least the highest temperature, lowest temperature, weather conditions, rainfall, and evapotranspiration; the holiday characteristic data is used to distinguish between weekdays, ordinary weekends, and major holidays.
[0033] A second aspect of the present invention provides a machine learning-based urban water supply demand prediction system, comprising:
[0034] The data acquisition and preprocessing module is used to acquire multidimensional related data and perform fusion preprocessing to obtain a structured training dataset;
[0035] The feature engineering module is used to perform automated feature engineering based on time series analysis, select key input features and construct feature vectors;
[0036] The model training and optimization module is used to build a candidate model set containing various machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization.
[0037] The prediction output module is used to output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted.
[0038] The user interaction module provides a human-computer interaction interface, receives parameters input by the user, and displays the prediction results.
[0039] A third aspect of the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program and the processor being configured to retrieve the computer program and execute the machine learning-based urban water demand prediction method described above.
[0040] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the machine learning-based urban water demand prediction method described above.
[0041] The technical solution of the present invention has the following advantages over the prior art:
[0042] By fusing and preprocessing multidimensional correlated data, this invention builds a comprehensive system of influencing factors. It then uses automated feature engineering based on time series analysis to select key input features, and obtains the optimal prediction model through multi-model performance evaluation and hyperparameter optimization, ultimately achieving accurate prediction of urban water demand. This overcomes the shortcomings of traditional methods, namely reliance on manual experience, outdated and rigid models, and coarse prediction granularity, and significantly improves prediction accuracy (keeping MAPE at around 1.01%). It also automates the entire prediction process, greatly shortening decision-making time for water supply scheduling and improving work efficiency. At the same time, the model automatically learns the nonlinear relationship between water demand and multiple complex factors and adapts dynamically to urban development, climate change, and changing water use patterns. This effectively resolves the mismatch between water production and actual demand, reduces pumping-station power consumption and physical leakage in the pipeline network, narrows the water supply company's production-sales gap, and provides a scientific, reliable quantitative basis for water supply scheduling, yielding significant economic and social benefits.
Brief Description of the Drawings
[0043] Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0044] Figure 1 is a schematic diagram of the framework of a machine learning-based method for predicting urban water supply demand according to an embodiment of the present invention.
[0045] Figure 2 is a schematic diagram of the module framework of a machine learning-based urban water demand prediction system according to an embodiment of the present invention.
[0046] Figure 3 is a schematic diagram of the composition of a computing device according to an embodiment of the present invention.
Detailed Description of Embodiments
[0047] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0048] The present invention will be further described in detail below with reference to the accompanying drawings.
[0049] Referring to Figure 1, in a first aspect, the present invention provides a method for predicting urban water supply demand based on machine learning, comprising the following steps:
[0050] Acquire multidimensional correlated data and perform fusion preprocessing on them to obtain a structured training dataset;
[0051] Perform automated feature engineering based on time series analysis to select key input features and construct feature vectors;
[0052] Construct a candidate model set containing multiple machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization;
[0053] Output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding features of the date to be predicted.
[0054] In practice, multidimensional correlated data related to urban water demand are first acquired through multiple data collection channels. These data undergo preprocessing operations such as cleaning, completion, and standardization to form a structured training dataset with a unified structure and complete records. Next, time series analysis is used to mine features from the historical water volume data and automatically select the input features that most strongly affect demand prediction; these key features are then integrated into feature vectors for model training and prediction. A candidate model set containing various types of machine learning algorithms is then constructed, and each model in the set is trained, validated, and evaluated against multi-dimensional performance metrics such as mean absolute percentage error, training time, and inference time. The model with the best overall performance is selected, and its key hyperparameters are optimized through cross-validation to obtain the optimal prediction model. Finally, the feature data for the date to be predicted are collected and input into the optimal prediction model, which calculates and outputs the predicted urban water demand.
[0055] Through an end-to-end design covering multidimensional data fusion, automated feature engineering, model selection, and hyperparameter optimization, this invention overcomes the shortcomings of traditional water demand forecasting, namely reliance on human experience, outdated and rigid models, and coarse prediction granularity, and achieves scientific, automated, and accurate water demand forecasting. Multi-dimensional performance evaluation and hyperparameter optimization ensure that the prediction model has the best overall performance and can adapt dynamically to urban development, climate change, and changing water use patterns. The model thus provides a high-precision quantitative basis for water supply scheduling, effectively reduces the resource waste caused by mismatches between water production and actual demand, and improves the economic and social benefits of water supply enterprises.
[0056] In some embodiments, obtaining multidimensional correlation data and performing fusion preprocessing to obtain a structured training dataset specifically includes:
[0057] Obtain historical daily water supply time series data for each water supply zone and water plant in the city;
[0058] Obtain external influence characteristic data for the date to be predicted and historical dates, wherein the external influence characteristic data includes at least meteorological data and holiday characteristic data;
[0059] The historical daily water supply time series data and external influence feature data are cleaned to remove outliers and fill in missing values, forming the structured training dataset.
[0060] It should be noted that historical water volume data are acquired through the flow monitoring equipment and data acquisition system of the city's water supply system, and cover the historical daily water supply time series of every water supply zone and water plant in the city, ensuring that the data are comprehensive and representative. Among the external influence characteristic data, the meteorological data are obtained from a public meteorological service platform, and the holiday characteristic data are constructed from the national statutory holiday schedule together with the associated working-day adjustments. The meteorological data contain the key indicators that affect water demand, and the holiday characteristic data clearly distinguish between date types. Outliers in the collected historical water volume data and external influence characteristic data are identified and removed, and missing values are filled using methods suited to each variable, such as linear interpolation and mode imputation, ultimately forming a structured training dataset that meets the requirements of model training.
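The cleaning step could be sketched as follows in standard-library Python. The text names linear interpolation and mode imputation but not the outlier rule, so the interquartile-range (IQR) fence with k = 1.5 used here is an assumption, as are all function names; values flagged as outliers are treated as missing and then filled like any other gap.

```python
import statistics
from typing import List, Optional, Sequence


def flag_outliers_iqr(values: Sequence[Optional[float]], k: float = 1.5) -> List[Optional[float]]:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None (assumed outlier rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]


def linear_interpolate(values: Sequence[Optional[float]]) -> List[float]:
    """Fill interior gaps linearly between known neighbours; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        prev = max((j for j in known if j < i), default=None)
        nxt = min((j for j in known if j > i), default=None)
        if prev is None:
            filled.append(values[nxt])
        elif nxt is None:
            filled.append(values[prev])
        else:
            w = (i - prev) / (nxt - prev)
            filled.append(values[prev] + w * (values[nxt] - values[prev]))
    return filled


def mode_impute(labels: Sequence[Optional[str]]) -> List[str]:
    """Fill missing categorical entries (e.g. weather condition) with the most common label."""
    most_common = statistics.mode([x for x in labels if x is not None])
    return [x if x is not None else most_common for x in labels]


# Illustrative daily supply values only; 500 is an injected outlier.
cleaned = linear_interpolate(flag_outliers_iqr([100, 102, None, 101, 500, 103, 99]))
# cleaned == [100, 102, 101.5, 101, 102.0, 103, 99]
weather = mode_impute(["sunny", None, "rain", "sunny"])
# weather == ['sunny', 'sunny', 'rain', 'sunny']
```

Interpolation suits the smooth daily supply series, while mode imputation suits categorical fields where averaging is meaningless; a robust fence like the IQR rule avoids letting the outlier itself distort the threshold, as a mean and standard deviation rule would.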
[0061] This invention overcomes the limitations of traditional prediction methods that rely solely on historical water volume data by systematically acquiring multidimensional data, and constructs a more comprehensive system of influencing factors. The data fusion preprocessing operation effectively removes noise and anomalies from the data, fills in data gaps, and significantly improves data quality, providing a reliable and high-quality data foundation for subsequent feature engineering and model training, and avoiding the adverse effects of poor-quality data on the accuracy of prediction results.
[0062] In some embodiments, the automated feature engineering based on time series analysis, which filters key input features and constructs feature vectors, specifically includes:
[0063] Analyze the historical daily water supply time series data using the autocorrelation function (ACF) and partial autocorrelation function (PACF), and on that basis take the actual water supply on each of the N days preceding the date to be predicted as the core autoregressive features, where N is a positive integer;
[0064] The core autoregressive features are fused with the external influence feature data to construct an input feature vector for model training and prediction.
[0065] It should be noted that autocorrelation and partial autocorrelation analyses are performed on the historical daily water supply time series, and the autocorrelation and partial autocorrelation structure of the data is inspected by plotting the corresponding charts. Based on the tail-off and cut-off patterns shown in the charts, the actual water supply on each of the N days preceding the date to be predicted is taken as the core autoregressive features, where N is a positive integer. The selected core autoregressive features are then combined with the preprocessed external influence characteristic data, and redundant, highly correlated features are removed through correlation analysis, yielding an input feature vector of appropriate dimension that carries the relevant information.
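The text selects N by reading ACF and PACF charts. A minimal automated counterpart is sketched below: the sample ACF, the PACF computed from it with the Durbin-Levinson recursion, and a hypothetical rule `choose_n_lags` that counts consecutive PACF lags, starting at lag 1, lying outside the approximate 95% band of plus or minus 1.96/sqrt(n). The rule and its defaults are assumptions standing in for visual inspection, not the method prescribed by the source.

```python
import math
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0..r_max_lag (biased estimator, as used for ACF plots)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0 for k in range(max_lag + 1)]


def sample_pacf(x: Sequence[float], max_lag: int) -> List[float]:
    """Partial autocorrelations via the Durbin-Levinson recursion on the sample ACF."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi: List[float] = []  # AR coefficients phi_{k-1,1..k-1} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def choose_n_lags(x: Sequence[float], max_lag: int = 7) -> int:
    """Hypothetical rule: N = number of consecutive significant PACF lags from lag 1."""
    bound = 1.96 / math.sqrt(len(x))
    pacf = sample_pacf(x, max_lag)
    n_lags = 0
    for k in range(1, max_lag + 1):
        if abs(pacf[k]) <= bound:
            break
        n_lags = k
    return max(n_lags, 1)
```

For a series that behaves like an AR(2) process, the PACF is large at lags 1 and 2 and near zero afterwards, which is the cut-off pattern the chart-based reading looks for.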
[0066] This invention utilizes time series analysis technology to scientifically screen key autoregressive features, avoiding subjectivity in feature selection and ensuring that core features can reflect the impact of historical water volume on future water demand to the greatest extent. Through feature fusion and redundancy removal, it reduces the interference of irrelevant features on model training, improves the quality and effectiveness of model input data, and thus enhances the learning efficiency and prediction accuracy of the prediction model.
[0067] In some embodiments, the construction of a candidate model set containing multiple machine learning algorithms, the selection of the optimal prediction model through multi-dimensional performance evaluation, and the completion of hyperparameter optimization specifically include:
[0068] Define a candidate model set, which includes at least the gradient boosting decision tree model, the random forest model, the support vector machine model, and the linear regression model;
[0069] Train and validate each model in the candidate model set on the structured training dataset, and evaluate each model quantitatively using preset performance evaluation metrics, which include at least the mean absolute percentage error (MAPE), training time, and inference time;
[0070] The model with the best overall performance is selected as the initial prediction model based on the evaluation results;
[0071] The key hyperparameters of the initial prediction model are optimized using cross-validation to obtain the optimal prediction model.
[0072] It should be noted that a candidate model set is predefined, including various types of machine learning algorithms such as gradient boosting decision tree model, random forest model, support vector machine model, and linear regression model, covering both linear and nonlinear algorithms. The structured training dataset is divided into a training set and a validation set. Each model in the candidate model set is trained using the training set, and the prediction accuracy, training efficiency, and inference efficiency of each model are quantitatively compared and evaluated using multi-dimensional performance evaluation metrics on the validation set. Based on the evaluation results, the model with the best overall performance is selected as the initial prediction model. Then, a cross-validation method is used to traverse the hyperparameter space and optimize and adjust the key hyperparameters of the initial prediction model to obtain the optimal prediction model with better performance.
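The comparison stage could look roughly like the sketch below. To stay self-contained it uses two stand-in models, a hypothetical persistence baseline and an ordinary least-squares linear regression; scikit-learn's `GradientBoostingRegressor`, `RandomForestRegressor`, `SVR`, and `LinearRegression` expose the same `fit`/`predict` convention and could be placed in the same candidates dictionary. The chronological 80/20 split and the selection rule (lowest validation MAPE within an inference-time budget, ties broken by training time) are assumptions, since the text says only "best overall performance".

```python
import time
from typing import Dict


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent; assumes no zero actual values."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


class PersistenceModel:
    """Hypothetical baseline: predicts the previous day's supply (feature column 0)."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [row[0] for row in X]


class OLSRegression:
    """Ordinary least squares with intercept, solved from the normal equations."""

    def fit(self, X, y):
        A = [[1.0] + list(row) for row in X]
        p = len(A[0])
        # Augmented matrix [A^T A | A^T y]
        M = [[sum(a[i] * a[j] for a in A) for j in range(p)] + [sum(a[i] * t for a, t in zip(A, y))]
             for i in range(p)]
        # Gauss-Jordan elimination with partial pivoting
        for c in range(p):
            pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
            M[c], M[pivot] = M[pivot], M[c]
            for r in range(p):
                if r != c:
                    f = M[r][c] / M[c][c]
                    M[r] = [a - f * b for a, b in zip(M[r], M[c])]
        self.coef = [M[i][p] / M[i][i] for i in range(p)]
        return self

    def predict(self, X):
        return [self.coef[0] + sum(c * v for c, v in zip(self.coef[1:], row)) for row in X]


def evaluate_candidates(candidates: Dict[str, object], X, y, train_frac: float = 0.8):
    """Chronological split (no shuffling), then record MAPE, training time and inference time."""
    split = int(len(X) * train_frac)
    report = {}
    for name, model in candidates.items():
        t0 = time.perf_counter()
        model.fit(X[:split], y[:split])
        t1 = time.perf_counter()
        pred = model.predict(X[split:])
        t2 = time.perf_counter()
        report[name] = {"mape": mape(y[split:], pred), "train_s": t1 - t0, "infer_s": t2 - t1}
    return report


def select_best(report, max_infer_s: float = 1.0) -> str:
    """Assumed rule: lowest MAPE among models within the inference budget; ties -> faster training."""
    eligible = {k: v for k, v in report.items() if v["infer_s"] <= max_infer_s}
    return min(eligible, key=lambda k: (eligible[k]["mape"], eligible[k]["train_s"]))
```

Splitting chronologically rather than randomly keeps future days out of the training data, which would otherwise make the validation MAPE look better than it will be in daily operation.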
[0073] This invention uses a multi-model comparison and screening mechanism to ensure that the selected model can adapt to the complex scenario of water demand prediction, balancing prediction accuracy and operational efficiency. Hyperparameter optimization further improves the performance ceiling of the model, enabling the model to better learn the complex relationship between water demand and multidimensional influencing factors, effectively avoiding the performance bottleneck that may exist in a single model, and providing core technical support for high-precision prediction.
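The cross-validated hyperparameter search could be sketched as an exhaustive grid search scored by mean validation MAPE over expanding-window folds, so that each validation block lies strictly after its training data (the same idea as scikit-learn's `TimeSeriesSplit`, which can be passed as `cv` to `GridSearchCV`). The `KNNRegressor` with its single hyperparameter `k` is a hypothetical stand-in chosen only because it fits in a few lines; for the gradient boosting model named in the text, the grid would instead cover parameters such as the number of trees, learning rate, and tree depth.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, Sequence, Tuple


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent; assumes no zero actual values."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def forward_chaining_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[range, range]]:
    """Expanding-window folds: train on [0, k*fold), validate on [k*fold, (k+1)*fold)."""
    fold = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield range(0, k * fold), range(k * fold, (k + 1) * fold)


class KNNRegressor:
    """Hypothetical stand-in model with one hyperparameter, k (not one of the listed candidates)."""

    def __init__(self, k: int = 5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(row, self.X[i]))[: self.k]
            out.append(sum(self.y[i] for i in nearest) / len(nearest))
        return out


def grid_search_cv(make_model: Callable[..., object], grid: Dict[str, Sequence], X, y, n_folds: int = 3):
    """Try every grid combination; score = mean validation MAPE across folds (lower is better)."""
    keys = sorted(grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        fold_scores = []
        for tr, va in forward_chaining_splits(len(X), n_folds):
            model = make_model(**params).fit([X[i] for i in tr], [y[i] for i in tr])
            fold_scores.append(mape([y[i] for i in va], model.predict([X[i] for i in va])))
        score = sum(fold_scores) / len(fold_scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Usage: `grid_search_cv(KNNRegressor, {"k": [1, 3, 5]}, X, y)` returns the best parameter dictionary and its mean fold MAPE; the winning parameters would then be used to refit the model on all available data.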
[0074] In some embodiments, the value of N is 2, and the core autoregressive feature is the actual water supply of the day before the predicted day and the actual water supply of the two days before the predicted day.
[0075] It should be noted that, in the ACF and PACF analysis of the historical water supply time series, when the ACF tails off gradually while the PACF shows significant spikes at lags 1 and 2 and then cuts off, N is set to 2. That is, the core autoregressive features are specifically the actual water supply on the day before the predicted date and on the day two days before it. This value captures the influence of historical water volume on current water demand to the greatest extent. By fixing the specific range of the core autoregressive features, the invention makes feature selection more targeted and scientific and avoids choosing N arbitrarily. Used as the core features, the actual water supply of the previous two days accurately reflects the effect of recent water use patterns on future water demand, further improving the effectiveness of the model's input features and supporting higher prediction accuracy.
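With N = 2, training rows can be built with a sliding window: the row for day t holds the supply on days t-1 and t-2 followed by day t's external features, and the target is the supply on day t. The sketch below shows that alignment together with one possible form of the correlation-based redundancy removal mentioned for the feature-fusion step; the greedy strategy, the 0.95 threshold, and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def lagged_samples(supply: Sequence[float], external: Sequence[Sequence[float]],
                   n_lags: int = 2) -> Tuple[List[List[float]], List[float]]:
    """Row for day t: [supply[t-1], ..., supply[t-n_lags]] + external[t]; target: supply[t].
    The first n_lags days lack a complete history and are skipped."""
    X, y = [], []
    for t in range(n_lags, len(supply)):
        lags = [supply[t - k] for k in range(1, n_lags + 1)]
        X.append(lags + list(external[t]))
        y.append(supply[t])
    return X, y


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; constant columns are reported as 0.0 to avoid division by zero."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def drop_redundant(X: List[List[float]], names: List[str], threshold: float = 0.95):
    """Greedy filter: keep a column only if |r| <= threshold against every column already kept."""
    cols = list(zip(*X))
    keep: List[int] = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[i])) <= threshold for i in keep):
            keep.append(j)
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]
```

Because columns are visited in order, placing the two autoregressive features first guarantees they are retained and any external feature that merely duplicates them is the one dropped.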
[0076] In some embodiments, the step of outputting the predicted urban water demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted specifically includes:
[0077] For the date to be predicted, obtain the meteorological data, the holiday characteristic data, and the actual water supply on each of the N days preceding that date;
[0078] The above data is integrated into a prediction input vector and then input into the optimal prediction model.
[0079] The predicted urban water demand for the predicted date is calculated and output using the optimal prediction model.
[0080] It should be noted that the process involves obtaining meteorological forecast data for the predicted date from a public meteorological service platform, determining holiday characteristic data for the predicted date based on the national statutory holiday schedule, and obtaining the actual water supply volume for the N days prior to the predicted date from a water supply system data acquisition system. Following a preset feature vector format, the data obtained above are integrated into a unified prediction input vector. This prediction input vector is then input into a trained and optimized optimal prediction model, and the predicted urban water demand for the predicted date is obtained through the model's calculation and reasoning.
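Assembling the prediction input vector in a preset feature order might look like the sketch below. The field names, the integer `weather_code` encoding of the weather condition, the one-hot day-type encoding, and the order itself are hypothetical; what matters is that the order matches the one used when the training rows were built, and that a missing field fails loudly instead of silently producing a shorter vector.

```python
from typing import List, Mapping, Sequence

# Hypothetical preset feature order; must match the order used at training time.
WEATHER_KEYS = ("t_max", "t_min", "weather_code", "rainfall", "et0")
DAY_TYPES = ("weekday", "ordinary_weekend", "major_holiday")


def prediction_vector(recent_supply: Sequence[float], forecast: Mapping[str, float],
                      day_type: str, n_lags: int = 2) -> List[float]:
    """recent_supply is ordered oldest to newest, so its last entry is the day before (D-1)."""
    if len(recent_supply) < n_lags:
        raise ValueError(f"need the last {n_lags} days of actual supply")
    if day_type not in DAY_TYPES:
        raise ValueError(f"unknown day type: {day_type!r}")
    lags = [float(recent_supply[-k]) for k in range(1, n_lags + 1)]  # D-1, D-2, ...
    weather = [float(forecast[key]) for key in WEATHER_KEYS]  # KeyError if a field is missing
    day_onehot = [1.0 if day_type == d else 0.0 for d in DAY_TYPES]
    return lags + weather + day_onehot


def predict_demand(model, recent_supply: Sequence[float], forecast: Mapping[str, float],
                   day_type: str, n_lags: int = 2) -> float:
    """Feed one assembled vector to any fitted model exposing predict(list_of_rows)."""
    return float(model.predict([prediction_vector(recent_supply, forecast, day_type, n_lags)])[0])
```

For example, with illustrative values, `prediction_vector([98, 100, 102], {"t_max": 33, "t_min": 25, "weather_code": 1, "rainfall": 0, "et0": 4.2}, "ordinary_weekend")` yields `[102.0, 100.0, 33.0, 25.0, 1.0, 0.0, 4.2, 0.0, 1.0, 0.0]`.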
[0081] This invention realizes a standardized process for acquiring prediction data, constructing input vectors, and outputting model predictions, ensuring the standardization and efficiency of the prediction process. Based on the optimal prediction model and complete feature inputs, it can quickly output high-precision water demand predictions, meeting the actual needs of the water supply dispatch center in formulating the next day's water supply plan and providing timely and reliable basis for dispatch decisions.
[0082] In some embodiments, the meteorological data includes at least the highest temperature, lowest temperature, weather conditions, rainfall, and evapotranspiration; the holiday characteristic data is used to distinguish between weekdays, ordinary weekends, and major holidays.
[0083] It should be noted that the meteorological data specifically includes indicators closely related to water demand, such as maximum temperature, minimum temperature, weather conditions, rainfall, and evapotranspiration. Among these, temperature and evapotranspiration directly affect the intensity of residential and industrial water use, rainfall affects groundwater recharge and outdoor water demand, and weather conditions affect users' water use behavior patterns. Holiday characteristic data accurately reflects the differences in water use patterns under different date types by distinguishing between weekdays, ordinary weekends, and major holidays. These data together constitute a comprehensive system of external influencing factors.
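The three-way holiday characteristic could be derived from a calendar roughly as follows. The text states only that holiday data combine the national statutory holiday schedule with working-day adjustments; the function name, label strings, and precedence (listed holiday first, then a weekend day officially designated a working day, then Saturday or Sunday) are assumptions. The two date sets are hypothetical inputs that would have to be filled from the officially published schedule each year; the function itself knows no calendar.

```python
from datetime import date
from typing import AbstractSet


def classify_day(day: date, statutory_holidays: AbstractSet[date],
                 makeup_workdays: AbstractSet[date]) -> str:
    """Map a date to 'weekday', 'ordinary_weekend', or 'major_holiday' (assumed labels)."""
    if day in statutory_holidays:
        return "major_holiday"
    if day in makeup_workdays:  # weekend day officially designated a working day
        return "weekday"
    if day.weekday() >= 5:  # Monday = 0, so 5 and 6 are Saturday and Sunday
        return "ordinary_weekend"
    return "weekday"
```

Checking the make-up workday set before the weekend test matters: such days follow working-day water use patterns even though they fall on a Saturday or Sunday.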
[0084] This invention refines the specific content of external influence feature data, making the key factors affecting water demand clearer and avoiding the ambiguity of external features. The comprehensive and specific external influence factors can more realistically simulate various influencing factors in actual water use scenarios, making the learning of the prediction model closer to reality, and further improving the accuracy and reliability of water demand prediction.
[0085] Referring to Figure 2, a second aspect of the present invention provides a machine learning-based urban water supply demand prediction system, comprising:
[0086] The data acquisition and preprocessing module is used to acquire multidimensional related data and perform fusion preprocessing to obtain a structured training dataset;
[0087] The feature engineering module is used to perform automated feature engineering based on time series analysis, select key input features and construct feature vectors;
[0088] The model training and optimization module is used to build a candidate model set containing various machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization.
[0089] The prediction output module is used to output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted.
[0090] The user interaction module provides a human-computer interaction interface, receives parameters input by the user, and displays the prediction results.
[0091] It should be noted that the data acquisition and preprocessing module connects to multiple data sources, collects and fuses multidimensional correlated data, and outputs a structured training dataset. The feature engineering module receives the structured training dataset, selects the key autoregressive features through time series analysis, and fuses them with the external influence features to construct the input feature vector. Based on the input feature vector, the model training and optimization module trains the candidate models, evaluates them on multiple performance dimensions, selects the optimal model, and optimizes its hyperparameters. The prediction output module receives the feature data for the date to be predicted and calculates and outputs the predicted value using the optimal model. The user interaction module provides a visual human-computer interface through which users can enter parameters, query historical data, and export prediction results, so that the prediction results are displayed intuitively. Together, these modules form a complete water demand prediction workflow.
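The data flow between the modules can be summarized as a small composition of callables. The sketch below is a hypothetical wiring, not the system's actual interfaces: each field stands for one processing module and accepts any callable with the noted contract, and a user interaction layer would call `run` and display the result.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class DemandPredictionSystem:
    """Hypothetical wiring of the processing modules; each field may be swapped independently."""
    acquire_and_preprocess: Callable[[], Any]  # -> structured training dataset
    build_features: Callable[[Any], Tuple[List[List[float]], List[float]]]  # dataset -> (X, y)
    train_and_optimize: Callable[[List[List[float]], List[float]], Any]  # (X, y) -> fitted model
    assemble_query: Callable[[Any], List[float]]  # date to be predicted -> input vector

    def run(self, query: Any) -> float:
        """One end-to-end pass from raw data to a single predicted value."""
        dataset = self.acquire_and_preprocess()
        X, y = self.build_features(dataset)
        model = self.train_and_optimize(X, y)
        return float(model.predict([self.assemble_query(query)])[0])
```

Retraining on every call keeps the sketch short; a deployed system would more likely retrain on a schedule and reuse the stored optimal model for each daily prediction.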
[0092] This invention clarifies the functional division of each part of the system through modular design, realizing full automation of data processing, feature engineering, model training, prediction output, and user interaction, which significantly reduces the cost of manual intervention. The user interaction module improves the usability of the system, enabling dispatchers to operate quickly and obtain prediction results, significantly improving the work efficiency of water supply dispatch. The coordinated cooperation of each module ensures the stable operation and efficient output of the prediction system, providing an integrated technical solution for urban water supply dispatch.
[0093] Referring to Figure 3, a third aspect of the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program and the processor being configured to retrieve the computer program and execute the machine learning-based urban water demand prediction method described above.
[0094] It should be noted that the computer device's memory stores the computer program and related data that implement the machine learning-based urban water demand prediction method, including multidimensional correlated data, structured training datasets, feature vectors, model parameters, etc. The processor reads the computer program stored in the memory and, according to the program's preset logical steps, sequentially executes operations such as data acquisition and preprocessing, feature engineering, model building and optimization, and prediction output to complete the urban water demand prediction process. The computer device provides a reliable hardware platform for the machine learning-based urban water demand prediction method. The processor's computing power and the memory's storage capacity ensure that the prediction method can run efficiently and stably. The versatility and scalability of the computer device enable the prediction method to adapt to water supply systems of different sizes, facilitating its application in various water supply scheduling scenarios and providing hardware support for the practical promotion of the method.
[0095] In some embodiments, the machine learning-based urban water demand prediction method described above can be implemented by a computer device, which includes at least one processor, a communication bus, a memory, and at least one communication interface.
[0096] A processor can be a general-purpose central processing unit (CPU) or an application-specific integrated circuit (ASIC).
[0097] A communication bus can be used to transmit information between the aforementioned components.
[0098] The memory can be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM), or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk or other magnetic storage device; or any other medium that can carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, without being limited thereto. The memory can exist independently and be connected to the processor via the communication bus, or it can be integrated with the processor.
[0099] The memory stores program code for executing the present invention, and its execution is controlled by a processor. The processor executes the program code stored in the memory. The program code may include one or more software modules. In the above embodiments, the machine learning-based urban water demand prediction method can be implemented by a processor and one or more software modules in the program code in the memory.
[0100] The communication interface uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
[0101] In a specific implementation, as one example, a computer device may include multiple processors, each of which may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. Here, a processor may refer to one or more devices, circuits, and / or processing cores used to process data (e.g., computer program instructions).
[0102] The aforementioned computer device can be a general-purpose or a special-purpose computer device. In specific implementations, it can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. This embodiment of the invention does not limit the type of computer device.
[0103] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the machine learning-based urban water demand prediction method described above.
[0104] It should be noted that the computer-readable storage medium uses a suitable storage format to store the computer program implementing the machine learning-based urban water demand prediction method. This storage medium is compatible with various computer devices. When the processor of a computer device reads the computer program stored in this storage medium, it can execute the corresponding prediction steps according to the program instructions, including data preprocessing, feature engineering, model training and optimization, and prediction output, thereby achieving accurate prediction of urban water demand. The computer-readable storage medium facilitates the storage, transmission, and deployment of the computer program for the machine learning-based urban water demand prediction method, lowering the barrier to promotion and application. The stability and compatibility of the computer-readable storage medium ensure that the program can run normally on different computer devices, expanding the application scope of this prediction method and enabling more water supply companies to easily adopt this method to improve the accuracy of water demand prediction and the scientific nature of scheduling.
[0105] To provide a clearer understanding of the invention, the invention is further described below:
[0106] Referring to Figure 1, in a first aspect, the present invention provides a method for predicting urban water supply demand based on machine learning, comprising the following steps:
[0107] Step 1: Obtain multidimensional correlation data and perform fusion preprocessing to obtain a structured training dataset;
[0108] Specifically, step 1 includes:
[0109] Sub-step 1.1: Obtain historical daily water supply time series data for each water supply zone and each water plant in the city. The data comes from the flow monitoring equipment and data acquisition system of the water supply system, and the time span covers at least 3 full years to ensure the representativeness and completeness of the data.
[0110] Sub-step 1.2: Obtain external influence characteristic data for the date to be predicted and historical dates. The external influence characteristic data includes at least meteorological data and holiday characteristic data. The meteorological data is obtained from a public meteorological service platform or a professional meteorological database and includes at least the highest temperature, lowest temperature, weather conditions (sunny, cloudy, rainy, snowy, etc.), rainfall, and evapotranspiration. The holiday characteristic data is constructed based on the national statutory holiday arrangements and weekend adjustments to clearly distinguish between weekdays, ordinary weekends, and major holidays (such as Spring Festival, National Day, etc.).
[0111] Sub-step 1.3: Clean the historical daily water supply time series data and the external influence characteristic data: use the ... criterion to identify and remove abnormal data, such as zero values and extreme values caused by equipment failures or abnormal data transmission. Missing values are filled with a strategy suited to the data type: continuous data (such as temperature and rainfall) are filled by linear interpolation, and discrete data (such as weather conditions and holiday types) are filled by mode imputation. The result is a structured training dataset with a uniform format and complete data.
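The two missing-value strategies in Sub-step 1.3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes each series is a date-ordered list in which a missing or removed day is represented by `None`, and the helper names `linear_interpolate` and `mode_impute` are hypothetical. The three-day window for mode imputation is taken from the embodiment in [0140]; the outlier criterion is outside the scope of this sketch.

```python
from collections import Counter
from typing import List, Optional, Sequence


def linear_interpolate(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) in a continuous series, e.g. daily temperature,
    by straight-line interpolation between the nearest known neighbours.
    Leading and trailing gaps have only one neighbour and are left as None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def mode_impute(labels: Sequence[Optional[str]], window: int = 3) -> List[Optional[str]]:
    """Fill gaps in a discrete series, e.g. weather condition, with the most
    frequent observed label among the `window` days before and after the gap."""
    out = list(labels)
    for i, label in enumerate(labels):
        if label is None:
            lo, hi = max(0, i - window), min(len(labels), i + window + 1)
            neighbours = [x for x in labels[lo:hi] if x is not None]
            if neighbours:  # leave the gap if the whole window is missing
                out[i] = Counter(neighbours).most_common(1)[0][0]
    return out


temps = linear_interpolate([10.0, None, 14.0, None, 20.0])
# temps == [10.0, 12.0, 14.0, 17.0, 20.0]
weather = mode_impute(["sunny", "sunny", "rainy", None, "sunny", "cloudy", "sunny"])
# weather[3] == "sunny"
```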
[0112] Step 2: Perform automated feature engineering based on time series analysis to select key input features and construct feature vectors;
[0113] Specifically, step 2 includes:
[0114] Sub-step 2.1: Perform autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis on the historical daily water supply time series data, plotting the ACF and PACF graphs to observe the autocorrelation and partial autocorrelation characteristics of the data. If the ACF graph tails off gradually while the PACF graph shows significant partial autocorrelation spikes at lag 1 and lag 2 and then cuts off sharply, the actual water supply on the day before the date to be predicted and the actual water supply on the second day before it are determined as the core autoregressive features;
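Sub-step 2.1 reads the lag order off the plots. A numerical counterpart is sketched below in Python under stated assumptions: it computes the sample ACF, derives the PACF with the Durbin-Levinson recursion, and counts how many consecutive PACF spikes, starting at lag 1, lie outside the approximate ±1.96/√n significance band. The band and the "consecutive spikes" rule are common heuristics assumed here, not taken from the source, and the synthetic AR(2) series stands in for real water supply data.

```python
import math
import random
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (standard 1/n-normalised estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def sample_pacf(acf: Sequence[float]) -> List[float]:
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = [1.0] + list(acf)        # r[k] is the lag-k autocorrelation, r[0] = 1
    pacf: List[float] = []
    phi: List[float] = []        # phi[j-1] holds phi_{k-1,j}
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j < k.
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def select_lag_order(pacf: Sequence[float], n: int, z: float = 1.96) -> int:
    """Number of consecutive PACF spikes, starting at lag 1, outside +/- z/sqrt(n)."""
    bound = z / math.sqrt(n)
    order = 0
    for value in pacf:
        if abs(value) <= bound:
            break  # first non-significant lag: the PACF has "cut off"
        order += 1
    return order


# Synthetic AR(2) series x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + noise, with a burn-in.
random.seed(0)
series: List[float] = []
prev1 = prev2 = 0.0
for t in range(2100):
    prev1, prev2 = 0.5 * prev1 + 0.3 * prev2 + random.gauss(0.0, 1.0), prev1
    if t >= 100:
        series.append(prev1)

acf = sample_acf(series, max_lag=10)
pacf = sample_pacf(acf)
n_lags = select_lag_order(pacf, len(series))
```

For a genuine AR(2) process `n_lags` will usually come out as 2, but each PACF estimate is noisy, so a spurious spike at a higher lag is possible; the plots described in Sub-step 2.1 remain a useful cross-check.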
[0115] Sub-step 2.2: Fuse the core autoregressive features with the external influence feature data (meteorological data and holiday feature data), remove redundant features and, for any pair of features whose linear correlation coefficient is greater than 0.8, remove one of the two; then construct the input feature vector for model training and prediction.
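A minimal Python sketch of the correlation filter in Sub-step 2.2, assuming features are held as equal-length numeric columns keyed by name. When two features exceed the 0.8 threshold, this version keeps whichever appears first in the dictionary, so the column order acts as an assumed priority list (the source does not say how the surviving feature is chosen). It also compares the absolute value of the coefficient, so a strong negative correlation is treated as redundant too; that is an assumption. The function names are hypothetical and the sample columns are made-up values.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / math.sqrt(va * vb)


def prune_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.8) -> List[str]:
    """Walk the features in priority (insertion) order and keep a feature only if
    its |r| with every already-kept feature is at most `threshold`."""
    kept: List[str] = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "max_temp": [10.0, 12.0, 15.0, 20.0, 25.0, 28.0],
    "evapotranspiration": [1.0, 1.2, 1.6, 2.1, 2.4, 2.9],
    "rainfall": [0.0, 5.0, 0.0, 12.0, 0.0, 3.0],
}
kept = prune_correlated(features)  # ["max_temp", "rainfall"]
```

With these toy columns, evapotranspiration tracks maximum temperature almost linearly and is dropped, which mirrors the outcome described in the embodiment.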
[0116] Step 3: Construct a candidate model set containing multiple machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization;
[0117] Specifically, step 3 includes:
[0118] Sub-step 3.1: Define a candidate model set, which includes at least gradient boosting decision tree models (such as LightGBM model, XGBoost model), random forest models, support vector machine (SVM) models and linear regression models, covering both linear and nonlinear algorithms to ensure comprehensive model selection;
[0119] Sub-step 3.2: Divide the structured training dataset into a training set and a validation set in a 7:3 ratio. Train each model in the candidate model set using the training set and evaluate the model performance using the validation set. Use multi-dimensional preset performance evaluation metrics, including mean absolute percentage error (MAPE), training time, and inference time, to quantitatively compare the overall performance of each model.
[0120] Sub-step 3.3: Based on the evaluation results, select the model with the best overall performance as the initial prediction model, where the best overall performance means achieving the lowest MAPE while keeping the training time and inference time as short as possible.
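Sub-steps 3.2 and 3.3 compare candidates on MAPE, training time, and inference time. The Python sketch below shows one way to record these metrics and turn "lowest MAPE, then shortest time" into a concrete rule. The tolerance `mape_tolerance` (how close two MAPEs must be to count as tied) is an assumption, as are all function names; `fit` and `predict` are placeholders for whatever training and prediction calls a real candidate model exposes, and the candidate scores below are invented for illustration, not results from the source.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


@dataclass
class Evaluation:
    name: str
    mape_pct: float
    train_seconds: float
    infer_seconds: float


def evaluate(name: str, fit: Callable[[Any, Any], Any],
             predict: Callable[[Any, Any], List[float]],
             X_train, y_train, X_val, y_val) -> Evaluation:
    """Train on the training set, predict the validation set, and time both phases."""
    t0 = time.perf_counter()
    model = fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = predict(model, X_val)
    t2 = time.perf_counter()
    return Evaluation(name, mape(y_val, preds), t1 - t0, t2 - t1)


def select_best(results: List[Evaluation], mape_tolerance: float = 0.1) -> Evaluation:
    """Lowest MAPE first; candidates within `mape_tolerance` percentage points of the
    best MAPE count as tied, and the fastest (train + inference) of those wins."""
    best_mape = min(r.mape_pct for r in results)
    tied = [r for r in results if r.mape_pct <= best_mape + mape_tolerance]
    return min(tied, key=lambda r: r.train_seconds + r.infer_seconds)


# A trivial "predict the training mean" baseline, just to exercise evaluate().
baseline = evaluate("mean_baseline",
                    lambda X, y: sum(y) / len(y),
                    lambda m, X: [m] * len(X),
                    [[0.0]] * 4, [50.0, 52.0, 48.0, 50.0],
                    [[0.0]] * 2, [50.0, 55.0])

# Hypothetical scores for three candidates (not results from the source).
chosen = select_best([
    Evaluation("model_a", 1.20, 2.0, 0.01),
    Evaluation("model_b", 1.25, 0.5, 0.01),
    Evaluation("model_c", 3.00, 0.1, 0.001),
])  # model_b: within 0.1 points of the best MAPE and faster than model_a
```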
[0121] Sub-step 3.4: Optimize the key hyperparameters of the initial prediction model using 5-fold cross-validation: with MAPE as the optimization objective, search the hyperparameter space by grid search or random search to find the hyperparameter combination with the lowest cross-validated MAPE, thus obtaining the optimal prediction model.
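Sub-step 3.4 can be sketched in plain Python as an exhaustive grid search scored by the mean 5-fold MAPE. To keep the example self-contained, a hypothetical one-parameter model (a least-squares slope through the origin with ridge penalty `alpha`) stands in for LightGBM; with a real model, `param_grid` would list candidate values for hyperparameters such as the learning rate or tree depth. The folds here are contiguous and unshuffled, which is an assumption, and all names are illustrative. Random search would instead evaluate a fixed number of combinations drawn at random from the same space.

```python
import itertools
from typing import Any, Dict, List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def kfold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split 0..n-1 into k contiguous folds; each fold serves once as validation."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def grid_search_cv(X, y, param_grid: Dict[str, Sequence[Any]], fit, predict, k: int = 5):
    """Try every combination in param_grid and return the one with the lowest
    mean validation MAPE across the k folds, together with that score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        fold_scores = []
        for train_idx, val_idx in kfold_indices(len(y), k):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
            preds = predict(model, [X[i] for i in val_idx])
            fold_scores.append(mape([y[i] for i in val_idx], preds))
        score = sum(fold_scores) / len(fold_scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in model: y ~ w * x through the origin, with ridge penalty alpha.
def fit_ridge(X: List[float], y: List[float], alpha: float) -> float:
    return sum(a * b for a, b in zip(X, y)) / (sum(a * a for a in X) + alpha)


def predict_ridge(w: float, X: List[float]) -> List[float]:
    return [w * a for a in X]


X = [float(i) for i in range(1, 51)]
y = [3.0 * v for v in X]  # noiseless toy data, so alpha = 0 should win
best, score = grid_search_cv(X, y, {"alpha": [0.0, 10.0, 100.0]}, fit_ridge, predict_ridge)
```

A design note: with ordinary k-fold splitting, some folds train on days that come after the validation days. If that look-ahead matters for the data at hand, a forward-chaining split (train on earlier days, validate on the following block) is a common alternative.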
[0122] Step 4: Based on the optimal prediction model and the corresponding characteristics of the date to be predicted, output the predicted value of urban water supply demand;
[0123] Specifically, step 4 includes:
[0124] Sub-step 4.1: Obtain meteorological data for the date to be predicted (forecast data obtained from the public meteorological service platform), holiday characteristic data (determined according to the statutory holiday schedule), and actual water supply for the day before and two days before the date to be predicted (obtained from the water supply system data acquisition system).
[0125] Sub-step 4.2: Integrate the above data according to the input feature vector format constructed in step 2 to form the prediction input vector;
[0126] Sub-step 4.3: Input the prediction input vector into the optimal prediction model, and the model automatically calculates and outputs the predicted value of urban water supply demand for the date to be predicted;
[0127] Sub-step 4.4: Display the prediction results to the user through a visualization interface, and support the export and storage of prediction data for easy subsequent query and analysis.
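Sub-steps 4.1 and 4.2 amount to looking up a few values and placing them in the fixed feature order from step 2. The Python sketch below does this for the January 1, 2023 figures in the embodiment ([0148]-[0149]). Assumptions: the function and constant names are hypothetical, water volumes are kept in units of 10,000 cubic meters as the embodiment's input vector implies, and `WEATHER_CODES` is an invented integer encoding, since most numeric model interfaces need the weather label converted to a number and the source does not say how this is done.

```python
from datetime import date, timedelta
from typing import Dict, List, Union

# Hypothetical integer codes; the source does not specify how weather is encoded.
WEATHER_CODES = {"sunny": 0, "cloudy": 1, "rainy": 2, "snowy": 3}

# Feature order of the input vector constructed in step 2 (embodiment [0143]).
FEATURE_ORDER = ["lag1", "lag2", "max_temp", "min_temp", "rainfall", "weather", "date_type"]


def build_input_vector(target: date,
                       supply_history: Dict[date, float],
                       forecast: Dict[str, Union[float, str]],
                       date_type: int) -> List[float]:
    """Assemble one prediction row for `target` in FEATURE_ORDER."""
    row = {
        "lag1": supply_history[target - timedelta(days=1)],  # day before the target
        "lag2": supply_history[target - timedelta(days=2)],  # two days before the target
        "max_temp": forecast["max_temp"],
        "min_temp": forecast["min_temp"],
        "rainfall": forecast["rainfall"],
        "weather": WEATHER_CODES[forecast["weather"]],
        "date_type": date_type,
    }
    return [float(row[name]) for name in FEATURE_ORDER]


vector = build_input_vector(
    date(2023, 1, 1),
    {date(2022, 12, 31): 51.8, date(2022, 12, 30): 52.3},  # units of 10,000 m³
    {"max_temp": 12, "min_temp": 3, "rainfall": 0, "weather": "sunny"},
    date_type=2,
)
# vector == [51.8, 52.3, 12.0, 3.0, 0.0, 0.0, 2.0]
```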
[0128] Preferably, N, the number of preceding days whose actual water supply serves as core autoregressive features, is 2; that is, the core autoregressive features are the actual water supply on the day before the date to be predicted and on the second day before it. This value is verified by the ACF and PACF analysis and best reflects the influence of historical water volume on future water demand.
[0129] Preferably, the meteorological data includes at least the highest temperature, lowest temperature, weather conditions, rainfall, and evapotranspiration. This type of data is significantly correlated with water demand: temperature and evapotranspiration affect residential water use (such as bathing and irrigation) and industrial production water use, rainfall affects groundwater recharge and outdoor water demand, and weather conditions directly affect users' water use behavior patterns.
[0130] Referring to Figure 2, in a second aspect, the present invention provides a machine learning-based urban water supply demand prediction system, comprising:
[0131] The data acquisition and preprocessing module is used to perform the operation in step 1, acquire multidimensional related data and perform fusion preprocessing to obtain a structured training dataset;
[0132] The feature engineering module is used to perform the operations in step 2. It performs automated feature engineering based on time series analysis, selects key input features, and constructs feature vectors.
[0133] The model training and optimization module is used to perform the operations in step 3, build a candidate model set containing multiple machine learning algorithms, screen the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization.
[0134] The prediction output module is used to perform the operation in step 4, and output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted.
[0135] The user interaction module provides a human-computer interaction interface, supporting users to input the date to be predicted, query historical prediction data, export prediction results, and display the prediction results in the form of charts and numerical values, thereby improving the user experience.
[0136] To better understand the technical solution of the present invention, the technical solution of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
[0137] Figure 1 is an exemplary flowchart of a machine learning-based urban water demand prediction method according to some embodiments of the present invention. The method mainly includes the following steps:
[0138] Step 1: Obtain multidimensional correlation data and perform fusion preprocessing to obtain a structured training dataset;
[0139] In this embodiment, the historical daily water supply time series data of each water supply zone and water plant from January 1, 2020 to December 31, 2022 are obtained from the flow monitoring equipment and data acquisition system of a certain city's water supply system. The data sampling frequency is once a day. Meteorological data for the same period are obtained from the National Meteorological Science Data Center, including daily maximum temperature, minimum temperature, weather conditions, rainfall, and evapotranspiration. According to the national statutory holiday arrangement and weekend adjustment, holiday characteristic data are constructed, and the date type is divided into weekdays (marked as 0), ordinary weekends (marked as 1), and major holidays (marked as 2).
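The 0/1/2 date-type labelling can be expressed as a small function. This is a sketch under assumptions: the statutory holiday days and the weekend days reassigned as working days under the adjustment arrangement are supplied as sets taken from the published annual schedule, the function name is hypothetical, and the sets used below are illustrative placeholders rather than a real schedule.

```python
from datetime import date
from typing import Set

WEEKDAY, ORDINARY_WEEKEND, MAJOR_HOLIDAY = 0, 1, 2  # codes used in the embodiment


def date_type(d: date, major_holidays: Set[date], makeup_workdays: Set[date]) -> int:
    """Label a date as weekday (0), ordinary weekend (1) or major holiday (2).

    `major_holidays` lists statutory holiday days; `makeup_workdays` lists
    weekend days officially designated as working days under the adjustment
    arrangement. Both sets are assumed inputs from the annual schedule.
    """
    if d in major_holidays:
        return MAJOR_HOLIDAY
    if d.weekday() >= 5 and d not in makeup_workdays:  # Saturday = 5, Sunday = 6
        return ORDINARY_WEEKEND
    return WEEKDAY


# Illustrative holiday set only; a real system would load the published schedule.
holidays = {date(2023, 1, 1)}
labels = [date_type(d, holidays, set())
          for d in (date(2022, 12, 28), date(2022, 12, 31), date(2023, 1, 1))]
# labels == [0, 1, 2]  (Wednesday, Saturday, New Year's Day)
```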
[0140] Outliers in the water supply data are identified and removed using the following criterion: if the water supply on a given date is 0 or falls outside the range of the mean ± 3 standard deviations, it is treated as an outlier and removed. Missing temperature data are filled by linear interpolation, and missing weather data are filled with the mode of the weather conditions over the three days before and after the missing date. Finally, a structured training dataset with the fields "date - water supply - maximum temperature - minimum temperature - weather conditions - rainfall - evapotranspiration - holiday type" is formed.
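The outlier rule in this paragraph (a value of 0, or one outside the mean ± 3 standard deviations) can be sketched as follows. Assumptions: the statistics are computed over the whole series being cleaned, the population standard deviation is used, and removed days are replaced by `None` rather than deleted so that later interpolation and lag steps still see one entry per day. The function names are hypothetical and the daily volumes are made up. One known limitation: a single extreme value inflates the standard deviation it is compared against, so outliers in very short series can go undetected.

```python
import statistics
from typing import List, Optional, Sequence


def flag_outliers(supply: Sequence[float], k: float = 3.0) -> List[bool]:
    """True where a daily value is 0 or lies outside mean +/- k * std of the series."""
    mean = statistics.mean(supply)
    std = statistics.pstdev(supply)  # population std; the source does not specify which
    return [v == 0 or abs(v - mean) > k * std for v in supply]


def drop_outliers(supply: Sequence[float], k: float = 3.0) -> List[Optional[float]]:
    """Replace flagged values with None so the daily index stays aligned."""
    return [None if bad else v for v, bad in zip(supply, flag_outliers(supply, k))]


daily = [50.0 + (i % 3) for i in range(18)] + [0.0, 500.0]  # made-up volumes
flags = flag_outliers(daily)
cleaned = drop_outliers(daily)  # the last two entries become None
```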
[0141] Step 2: Perform automated feature engineering based on time series analysis to select key input features and construct feature vectors;
[0142] ACF and PACF analyses were performed on the historical daily water supply time series data, and the corresponding ACF and PACF plots were generated. The ACF tailed off gradually, while the PACF showed significant spikes at lag 1 (lag = 1) and lag 2 (lag = 2) and then cut off sharply. Therefore, the actual water supply on the day before the predicted date (Lag1) and on the second day before it (Lag2) were taken as the core autoregressive features.
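Once the lag order is fixed at 2, each training row pairs a day's water supply with the values of the two preceding days. A minimal Python sketch, assuming a cleaned, gap-free, date-ordered daily series; `make_lag_rows` is a hypothetical helper and the numbers are placeholders. The first N days have no full history and therefore produce no row.

```python
from typing import List, Sequence, Tuple


def make_lag_rows(series: Sequence[float], n_lags: int = 2) -> Tuple[List[List[float]], List[float]]:
    """Turn a daily series into supervised rows: features [Lag1, ..., LagN]
    (the values 1..N days earlier) and target = that day's value."""
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(series)):
        X.append([series[t - lag] for lag in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


X, y = make_lag_rows([10.0, 11.0, 12.0, 13.0], n_lags=2)
# X == [[11.0, 10.0], [12.0, 11.0]], y == [12.0, 13.0]
```

In a full pipeline these lag columns would then be joined, row by row, with the meteorological and date-type features of the same target day.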
[0143] The core autoregressive features (Lag1, Lag2) are fused with the meteorological features (maximum temperature, minimum temperature, weather conditions, rainfall, evapotranspiration) and the holiday feature (date type). Redundant features are removed by Pearson correlation analysis: the correlation coefficient between maximum temperature and evapotranspiration is 0.85, which exceeds the 0.8 threshold, so the evapotranspiration feature is removed. The final input feature vector is [Lag1, Lag2, maximum temperature, minimum temperature, rainfall, weather conditions, date type].
[0144] Step 3: Construct a candidate model set containing multiple machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization;
[0145] The candidate model set is defined to include eight mainstream machine learning models: LightGBM, XGBoost, Random Forest, Support Vector Machine (SVM), Linear Regression, Decision Tree, Gradient Boosting Regression (GBR), and K-Nearest Neighbors (KNN).
[0146] The structured training dataset was divided chronologically into a training set (January 1, 2020 to August 31, 2022) and a validation set (September 1, 2022 to December 31, 2022). The candidate models were trained on the training set and evaluated on the validation set, with MAPE, training time, and inference time as the performance evaluation metrics, and the LightGBM model was selected as the initial prediction model. Five-fold cross-validation with grid search was then used to optimize its key hyperparameters over the following ranges: learning rate (0.01-0.1), maximum tree depth (3-10), number of leaf nodes (20-100), and number of iterations (100-1000). The optimal combination was learning rate = 0.05, maximum tree depth = 6, number of leaf nodes = 50, and number of iterations = 500, yielding the optimal prediction model.
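Because the training and validation periods in this paragraph are defined by calendar dates, the split can be expressed as a cut-off date rather than a random shuffle, which keeps every validation day after every training day. The Python sketch below (hypothetical helper names) reproduces the split and records the tuned hyperparameters. Writing them with the argument names `learning_rate`, `max_depth`, `num_leaves`, and `n_estimators` of LightGBM's scikit-learn wrapper `LGBMRegressor` is an assumption about how the method would be coded; the library itself is not imported here.

```python
from datetime import date, timedelta
from typing import List, Tuple


def daily_range(start: date, end: date) -> List[date]:
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def split_by_cutoff(days: List[date], cutoff: date) -> Tuple[List[date], List[date]]:
    """Chronological split: days up to and including `cutoff` train, later days validate."""
    return [d for d in days if d <= cutoff], [d for d in days if d > cutoff]


days = daily_range(date(2020, 1, 1), date(2022, 12, 31))
train_days, val_days = split_by_cutoff(days, date(2022, 8, 31))

# Tuned values reported in this embodiment, keyed by LightGBM scikit-learn
# wrapper argument names (assumed mapping; "number of iterations" -> n_estimators).
TUNED_PARAMS = {"learning_rate": 0.05, "max_depth": 6, "num_leaves": 50, "n_estimators": 500}
```

With these dates the split yields 974 training days and 122 validation days out of 1,096.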
[0147] Step 4: Based on the optimal prediction model and the corresponding characteristics of the date to be predicted, output the predicted value of urban water supply demand;
[0148] Taking January 1, 2023 as the date to be predicted, the weather forecast for that date was obtained: maximum temperature 12 °C, minimum temperature 3 °C, sunny, with 0 mm of rainfall. According to the holiday schedule, January 1, 2023 is a major holiday (marked as 2). The actual water supply on December 31, 2022 (Lag1 = 518,000 cubic meters) and on December 30, 2022 (Lag2 = 523,000 cubic meters) was also obtained.
[0149] These data were assembled into the prediction input vector [51.8, 52.3, 12, 3, 0, sunny, 2], with water volumes expressed in units of 10,000 cubic meters, and fed into the optimal LightGBM prediction model, which output a predicted urban water demand of 531,000 cubic meters for January 1, 2023. The actual water supply that day was 528,000 cubic meters, giving an absolute percentage error of 0.57%; this prediction accuracy meets the needs of practical applications.
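For reference, the error figure in this paragraph follows from the usual MAPE definition, written here with assumed notation: A_t is the actual and F_t the forecast water supply on day t, over n predicted days. With a single predicted day (n = 1) it reduces to the absolute percentage error:

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|,
\qquad
n = 1:\quad \frac{|528{,}000 - 531{,}000|}{528{,}000} \times 100\%
          = \frac{3{,}000}{528{,}000} \times 100\% \approx 0.57\%
```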
[0150] Figure 2 is a schematic diagram of the structure of an urban water supply demand prediction system according to some embodiments of the present invention. The system includes a data acquisition and preprocessing module, a feature engineering module, a model training and optimization module, a prediction output module, and a user interaction module.
[0151] The data acquisition and preprocessing module interfaces with the water supply system data acquisition system and the public meteorological service platform to automatically acquire multidimensional correlated data and complete cleaning and preprocessing. The feature engineering module performs ACF / PACF analysis on historical water volume data, selects core autoregressive features, and constructs feature vectors. The model training and optimization module trains, evaluates, selects the best candidate model, and optimizes hyperparameters. The prediction output module receives the prediction input vector, calculates and outputs the prediction results through the optimal model. The user interaction module provides a visual interface to support user operation and result display.
[0152] The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
[0153] Significantly improved prediction accuracy: This invention abandons the traditional prediction model that relies solely on historical water volume and instead integrates multidimensional related information, such as historical water volume data, meteorological data, and holiday characteristic data, into a comprehensive system of influencing factors. Core autoregressive features are selected through ACF/PACF analysis and, combined with model selection and hyperparameter optimization, reduce the mean absolute percentage error (MAPE) of water demand prediction to about 1.01%, an order-of-magnitude improvement in accuracy over traditional methods, providing high-precision data support for water supply scheduling.
[0154] Highly scientific and adaptable: This invention constructs a prediction model based on machine learning algorithms, which can automatically learn the nonlinear relationship between water demand and multidimensional influencing factors. It can adapt to the impact of urban development, climate change and changes in water use patterns without human intervention. The model selection mechanism ensures that the model with the best overall performance can be selected in different application scenarios, further improving the adaptability and robustness of the prediction method.
[0155] High degree of automation, improving scheduling efficiency: This invention realizes full automation of the process from data acquisition, preprocessing, feature engineering, model training to prediction output, completely eliminating the dependence on human experience; dispatchers only need to input a few necessary parameters through the user interface to quickly obtain accurate prediction results, which greatly shortens the decision-making time and improves the efficiency and scientific nature of water supply scheduling.
[0156] Significant economic and social benefits: High-precision water demand forecasting enables accurate matching between water supply production and actual demand, which helps optimize pump start-up and shutdown strategies, reduce power consumption of water pumping stations, reduce excess pressure in the pipeline network, reduce physical leakage rate of the pipeline network, and effectively control the production-demand gap; at the same time, it avoids situations of insufficient or excessive water supply, ensuring stable water use for residential life and industrial production, and achieving a win-win situation for both economic and social benefits.
[0157] The technical solutions provided by the embodiments of the present invention have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of the embodiments of the present invention. The descriptions of the embodiments above are only for helping to understand the principles of the embodiments of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the embodiments of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A method for predicting urban water supply and demand based on machine learning, characterized in that it comprises the following steps: acquiring multidimensional correlated data and performing fusion preprocessing to obtain a structured training dataset; performing automated feature engineering based on time series analysis to select key input features and construct feature vectors; constructing a candidate model set containing multiple machine learning algorithms, selecting the optimal prediction model through multi-dimensional performance evaluation, and completing hyperparameter optimization; and outputting the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted.
2. The urban water supply demand prediction method based on machine learning according to claim 1, characterized in that acquiring multidimensional correlated data and performing fusion preprocessing to obtain a structured training dataset specifically comprises: obtaining historical daily water supply time series data for each water supply zone and water plant in the city; obtaining external influence characteristic data for the date to be predicted and for historical dates, wherein the external influence characteristic data include at least meteorological data and holiday characteristic data; and cleaning the historical daily water supply time series data and the external influence characteristic data to remove outliers and fill in missing values, thereby forming the structured training dataset.
3. The urban water supply demand prediction method based on machine learning according to claim 1, characterized in that performing automated feature engineering based on time series analysis to select key input features and construct feature vectors specifically comprises: performing autocorrelation and partial autocorrelation function analysis on the historical daily water supply time series data to determine the actual water supply on each of the N days preceding the date to be predicted as the core autoregressive features, where N is a positive integer; and fusing the core autoregressive features with the external influence feature data to construct an input feature vector for model training and prediction.
4. The urban water supply demand prediction method based on machine learning according to claim 1, characterized in that constructing a candidate model set containing multiple machine learning algorithms, selecting the optimal prediction model through multi-dimensional performance evaluation, and completing hyperparameter optimization specifically comprise: defining a candidate model set that includes at least a gradient boosting decision tree model, a random forest model, a support vector machine model, and a linear regression model; training and validating each model in the candidate model set using the structured training dataset, and quantitatively evaluating each model using preset performance evaluation metrics that include at least the mean absolute percentage error, training time, and inference time; selecting the model with the best overall performance as the initial prediction model based on the evaluation results; and optimizing the key hyperparameters of the initial prediction model using cross-validation to obtain the optimal prediction model.
5. The urban water supply demand prediction method based on machine learning according to claim 3, characterized in that N is 2, the core autoregressive features being the actual water supply on the day before the date to be predicted and the actual water supply on the second day before it.
6. The urban water supply demand prediction method based on machine learning according to claim 1, characterized in that outputting the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted specifically comprises: obtaining, for the date to be predicted, the meteorological data, the holiday characteristic data, and the actual water supply on each of the N days preceding it; integrating the above data into a prediction input vector and inputting it into the optimal prediction model; and calculating and outputting, by the optimal prediction model, the predicted urban water demand for the date to be predicted.
7. The urban water supply demand prediction method based on machine learning according to claim 2, characterized in that: The meteorological data includes at least the highest temperature, lowest temperature, weather conditions, rainfall, and evapotranspiration; the holiday characteristic data is used to distinguish between weekdays, ordinary weekends, and major holidays.
8. A machine learning-based urban water supply demand prediction system, characterized in that it comprises: a data acquisition and preprocessing module, used to acquire multidimensional related data and perform fusion preprocessing to obtain a structured training dataset; a feature engineering module, used to perform automated feature engineering based on time series analysis, select key input features, and construct feature vectors; a model training and optimization module, used to build a candidate model set containing various machine learning algorithms, select the optimal prediction model through multi-dimensional performance evaluation, and complete hyperparameter optimization; a prediction output module, used to output the predicted value of urban water supply demand based on the optimal prediction model and the corresponding characteristics of the date to be predicted; and a user interaction module, which provides a human-computer interaction interface, receives parameters input by the user, and displays the prediction results.
9. A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program and the processor being configured to retrieve the computer program and execute the machine learning-based urban water demand prediction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program that, when executed by a processor, implements the machine learning-based urban water demand prediction method according to any one of claims 1 to 7.