Method and system for predicting carbon emissions of a park based on multi-source data

By employing a dual-constraint clustering modeling method based on energy consumption patterns and event rules at the workshop level in industrial parks, the problem of insufficient prediction accuracy caused by the heterogeneity of production units is solved, enabling accurate prediction and real-time response to carbon emissions.

CN122242841A · Pending · Publication Date: 2026-06-19 · SHENZHEN POLYTECHNIC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENZHEN POLYTECHNIC
Filing Date: 2026-03-06
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

At the workshop level in industrial parks, existing carbon emission monitoring and forecasting methods cannot effectively overcome the insufficient prediction accuracy of single models due to the strong heterogeneity of production units, and they also fail to effectively integrate multi-source event data to improve the timeliness and interpretability of forecasts.

Method used

By using a dual-constraint clustering modeling method based on energy consumption patterns and event rules, detailed energy consumption data and event indication data of multiple production units are obtained. Clustering is performed in conjunction with business rules, a suitable prediction algorithm is selected to establish a carbon emission prediction sub-model, and the total carbon emission prediction result is generated.

Benefits of technology

It enables precise carbon emission prediction for heterogeneous production units, improving prediction accuracy and real-time performance, and can respond to real-time changes in production plans.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method and system for predicting carbon emissions from industrial park workshops based on multi-source data. The method includes: acquiring detailed energy consumption data and event indication data related to production activities from multiple production units within a target workshop; calculating the energy consumption curve morphology index of each unit based on the energy consumption data, and clustering the production units according to preset business rules established based on the event indication data, resulting in multiple carbon emission feature clusters; for each feature cluster, selecting a corresponding prediction algorithm to establish a carbon emission prediction sub-model based on its morphological characteristics and the correlation between event data; finally, generating the total carbon emission prediction result for the workshop based on the output of each sub-model, and outputting the prediction contribution of each feature cluster. The system includes a multi-dimensional data interface module, a dual-constraint clustering engine, a clustering prediction modeling module, and a prediction integration module to implement the method. This invention achieves accurate carbon emission prediction for heterogeneous production units within a workshop through a dynamic clustering mechanism of "data-driven clustering + business rule verification," effectively improving prediction accuracy and management operability.

Description

Technical Field

[0001] This invention relates to the field of industrial process data analysis and low-carbon management technology, specifically to a method and system for predicting carbon emissions from industrial park workshops based on multi-source data. It is particularly suitable for complex workshop scenarios in industrial parks where production units are highly heterogeneous and carbon emission sources are dispersed and strongly coupled with production activities.

Background Technology

[0002] To achieve the "dual carbon" target, industrial parks, especially their production workshops, are key control units. Currently, workshop-level carbon emission monitoring and forecasting mainly rely on two types of methods: one is to perform macro-level accounting and time-series extrapolation forecasting based on energy bill data such as workshop electricity meters and gas meters; the other is to install sensors on key energy-consuming equipment and establish energy consumption simulations based on the physical models of the equipment.

[0003] However, in real-world discrete manufacturing or process industry workshops, the aforementioned methods all face significant limitations. First, workshops typically contain multiple production lines, various process equipment, and auxiliary systems (such as air conditioners and air compressors). These production units exhibit vastly different energy consumption patterns, carbon emission characteristics, and degrees of production planning influence, demonstrating strong "unit heterogeneity." Using only overall workshop data fails to identify the contributions and dynamics of each internal unit, resulting in coarse predictions that are difficult to guide precise energy conservation. Modeling only a few devices cannot cover the entire workshop, and the modeling costs are high and lack flexibility. Second, the fundamental driver of workshop carbon emissions is production activity, whose fluctuations are closely related to events such as production plan adjustments, equipment start-ups and shutdowns, and order changes. Existing methods largely rely on statistical patterns from historical energy consumption data for prediction, failing to effectively integrate real-time event information from Manufacturing Execution Systems (MES), equipment sensors, and video surveillance. This leads to a lag in the predictive model's response to sudden carbon emission fluctuations caused by temporary changes in production plans or abnormal equipment operation, resulting in insufficient prediction accuracy.

[0004] Therefore, overcoming the insufficient prediction accuracy of a single model caused by the strong heterogeneity of production units at the workshop level, while effectively integrating multi-source event data to improve the timeliness and interpretability of predictions, has become a technical bottleneck that urgently needs to be solved in the refined carbon management of industrial parks.

Summary of the Invention

[0005] In a first aspect, embodiments of this disclosure provide a method for predicting carbon emissions from industrial park workshops based on multi-source data, comprising:
obtaining detailed energy consumption data for multiple production units within the target workshop, as well as event indication data related to the production activities of each production unit;
calculating, based on the detailed energy consumption data, the energy consumption curve shape index of each production unit over a historical period, and clustering the multiple production units in combination with preset business rules that reflect the characteristics of unit production, to obtain multiple carbon emission feature clusters, wherein the judgment basis of the business rules is at least partially derived from the event indication data;
for each carbon emission feature cluster, selecting a corresponding prediction algorithm based on the correlation characteristics between its energy consumption curve shape and the event indication data, and establishing a carbon emission prediction sub-model;
generating the total carbon emission prediction result of the target workshop based on the output of each carbon emission prediction sub-model, and outputting the prediction contribution of each carbon emission feature cluster in association with that result.

[0006] Preferably, the event indication data includes at least one of the following: production work order start/stop event data obtained from the Manufacturing Execution System; key equipment operating status change data obtained from equipment status sensors; and personnel density change data obtained by recognition on workshop video streams.

[0007] Preferably, the step of calculating the energy consumption curve shape index based on the detailed energy consumption data includes: extracting a first type of indicator characterizing energy consumption fluctuations, including the load factor variance and peak-to-valley ratio within a preset time period; extracting a second type of indicator characterizing the periodicity of energy consumption, including the intensity of the main period identified through spectral analysis; and extracting a third type of indicator characterizing the event responsiveness of energy consumption, including the correlation response strength between the energy consumption curve and the occurrence times of the events marked by the event indication data.

[0008] Preferably, the clustering process based on business rules includes: initially clustering the production units based on the similarity of their energy consumption curve shape indicators; and validating and adjusting the initial clustering results by applying the business rules, which include: if the event indication data shows that two units in the same initial cluster follow production planning patterns of different types, they are assigned to different clusters; if the event indication data shows that two units in different initial clusters are driven synchronously by the same production plan event, they are merged into the same cluster.

[0009] Preferably, the selection of the corresponding prediction algorithm includes: for clusters that remain stable after initial clustering and business rule verification, and whose third type of indicator is below a threshold, selecting a time series prediction algorithm as the prediction sub-model for the cluster; for clusters whose third type of indicator is above the threshold, selecting a machine learning regression algorithm that uses the event indication data as a key input feature as the prediction sub-model for the cluster.

[0010] Preferably, when a prediction sub-model is established using a machine learning regression algorithm, the method for converting the event indication data into model input features includes: transforming production work order events into planned production intensity features for the next N hours; and transforming equipment state transition events into time-series features that characterize the probability of equipment start-up and shutdown.

[0011] In a second aspect, this disclosure also provides a prediction system for carbon emissions from industrial park workshops based on multi-source data, the system comprising:
a multi-dimensional data interface module, used to synchronize the detailed energy consumption data from the workshop energy management system and the event indication data from the manufacturing execution system or the Internet of Things sensing network;
a dual-constraint clustering engine, used to perform the following operations based on the time-series data provided by the multi-dimensional data interface module: calculate the energy consumption curve shape index of each production unit, and divide the multiple production units into multiple carbon emission feature clusters according to a preset business rule base, wherein the business rule base defines the logic for judging production synergy and mode differences based on event indication data;
a clustering prediction modeling module, used to train a model and produce a carbon emission sub-prediction for each carbon emission feature cluster output by the dual-constraint clustering engine, by calling a model algorithm that matches the characteristics of that cluster;
a prediction integration and traceability module, used to aggregate the sub-prediction results of each cluster into the overall prediction result at the workshop level, and configured to start a traceability analysis process when a significant prediction deviation occurs, in order to locate the main feature clusters and related events that caused the deviation.

[0012] The present invention has the following advantages: To address the insufficient accuracy of single prediction models caused by the high heterogeneity of production units and the complex, variable carbon emission patterns within workshops, as well as the disconnect between carbon emission prediction and production activity event data, this invention achieves accurate carbon emission prediction for heterogeneous production units through a "dynamic clustering modeling method based on dual constraints of energy consumption patterns and event rules." Specifically, by combining energy consumption pattern indicators with business rules defined by event data for intelligent clustering, and constructing adaptive prediction sub-models for different clusters, the prediction accuracy is effectively improved. At the same time, by using multi-source event data as key inputs to the model, the prediction results can respond to real-time changes in production plans, thereby improving the real-time performance and interpretability of the predictions.

Attached Figure Description

[0013] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings used in the embodiments will be briefly described below. These drawings are incorporated in and constitute a part of this specification. They illustrate embodiments conforming to this disclosure and, together with the specification, serve to explain the technical solutions of this disclosure. It should be understood that the following drawings only show some embodiments of this disclosure and should not be considered as limiting the scope. Those skilled in the art can obtain other related drawings based on these drawings without creative effort.

[0014] Figure 1 is a flowchart of the steps of a method for predicting carbon emissions from industrial park workshops based on multi-source data, provided in an embodiment of the present invention. Figure 2 is a schematic diagram of a prediction system for carbon emissions from industrial park workshops based on multi-source data, provided in an embodiment of the present invention.

Detailed Implementation

[0015] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings to make the technical solution of the present invention clearer and more complete. It should be noted that the described embodiments are for illustrative purposes only and are not intended to limit the present invention. Other implementation methods that can be made by those skilled in the art based on the content of the present invention without creative effort should all fall within the protection scope of the present invention.

[0016] In this application and its claims, unless otherwise expressly stated, the terms "comprising," "including," and similar expressions should be understood to indicate the presence of the listed items without excluding the presence or addition of other items. The words "an," "a," and similar terms should not be construed as limited to the singular in this application, and may also include multiple items.

[0017] Furthermore, the accompanying drawings in this application are merely illustrative and not necessarily drawn to scale. The same reference numerals denote components with the same or similar functions. To clearly illustrate the present invention, specific details are provided in the following embodiments. Those skilled in the art should understand that these details are not essential for implementing the present invention, and other methods may be used to implement it without affecting the basic idea of the invention.

[0018] I. Carbon Emission Prediction Method and Procedure. The overall process flow is shown in Figure 1.

[0019] This embodiment uses an injection molding workshop in an industrial park as the implementation scenario. The workshop contains three large injection molding machines (Unit_M1, Unit_M2, Unit_M3) and a centralized cooling system (Unit_Cooling). Each production unit is equipped with smart meters, flow meters, and PLC controllers, and is connected to the manufacturing execution system and energy management system within the workshop.

[0020] S1: Multi-source data acquisition and synchronization
(1) Collection of detailed energy consumption data
Detailed energy consumption data is obtained through smart sensing devices deployed on the energy supply lines of the production units, and specifically covers the consumption of electricity and compressed air.

[0021] ① Data source and metering points: Three-phase smart meters (model example: SFE PD194Z-2S4, accuracy class 0.5S) are installed at the three-phase power input terminals of the three injection molding machines Unit_M1, Unit_M2, and Unit_M3. The meters communicate via an RS-485 interface using the Modbus-RTU protocol.

[0022] Gas turbine flow meters (model example: Jinding Instrument LWQ-50, outputting a 4-20 mA analog signal) are installed on the compressed air branch pipes that supply air to the above-mentioned injection molding machines.

[0023] A power monitoring instrument is configured for Unit_Cooling (the chilled water unit) to directly read its total power consumption.

[0024] ② Data collection and uploading: Deploy an industrial data acquisition unit (such as the ICP-841 series from ICP) on the workshop side. This acquisition unit is equipped with multiple serial ports and analog input modules.

[0025] For electricity data: The data collector acts as a Modbus master station, polling each smart meter sequentially every 15 minutes to read the cumulative value of its "Total Positive Active Energy" register (address example: 0x0000). The data collector calculates the difference between the current cycle and the previous cycle to obtain the actual electricity consumption (unit: kWh) within that 15 minutes.
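The differencing step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the sample readings, and the choice to raise an error when the cumulative register decreases are all assumptions.

```python
def interval_energy_kwh(prev_reading: float, curr_reading: float) -> float:
    """Return the energy used between two cumulative meter readings (kWh)."""
    delta = curr_reading - prev_reading
    if delta < 0:
        # A cumulative register should not decrease; this sketch treats a
        # decrease as a reset or rollover and leaves handling to the caller.
        raise ValueError("cumulative reading decreased; possible reset or rollover")
    return delta


# Hypothetical cumulative readings taken at consecutive 15-minute polls.
readings = [1520.4, 1523.1, 1526.0, 1526.0]
consumption = [interval_energy_kwh(a, b) for a, b in zip(readings, readings[1:])]
```

Each entry of `consumption` is the electricity used in one 15-minute interval; an idle interval yields 0.0.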

[0026] For compressed air data: The analog input module of the data acquisition unit receives the 4-20 mA current signal from the flow meter in real time and converts it into an instantaneous flow rate (m³/h). The data acquisition unit integrates the instantaneous flow rate over each 15-minute interval to obtain the total compressed air consumption (unit: m³) during that period.
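A minimal sketch of the signal scaling and integration, assuming a linear 4-20 mA mapping onto a flow range of 0 to full scale and a fixed sampling period. The full-scale value, sampling period, and function names are hypothetical, and rectangle-rule integration is one simple choice among several.

```python
def current_to_flow(current_ma: float, full_scale_m3h: float) -> float:
    """Linearly map a 4-20 mA signal onto 0..full_scale (m^3/h)."""
    clamped = min(max(current_ma, 4.0), 20.0)  # ignore out-of-range readings
    return (clamped - 4.0) / 16.0 * full_scale_m3h


def integrate_volume(currents_ma, sample_period_s: float, full_scale_m3h: float) -> float:
    """Rectangle-rule integration of instantaneous flow into volume (m^3)."""
    hours_per_sample = sample_period_s / 3600.0
    return sum(current_to_flow(c, full_scale_m3h) * hours_per_sample for c in currents_ma)


# 900 one-second samples (15 minutes) at a constant 12 mA, assumed full scale 100 m^3/h.
volume_m3 = integrate_volume([12.0] * 900, sample_period_s=1.0, full_scale_m3h=100.0)
```

With these assumed values, 12 mA corresponds to 50 m³/h, so a quarter of an hour yields about 12.5 m³.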

[0027] The data collector encapsulates the processed data (including timestamp, device ID, energy consumption type, value, and unit) into a JSON format message and sends it to the designated data receiving interface of the workshop energy management system periodically via the workshop industrial Ethernet using an HTTP POST request.
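The message structure listed above might look like the following sketch. The JSON key names are assumptions chosen to mirror the listed fields (timestamp, device ID, energy consumption type, value, unit); the source does not specify them, and the HTTP POST itself is omitted.

```python
import json
from datetime import datetime, timezone


def build_energy_message(device_id, energy_type, value, unit, ts):
    """Serialize one 15-minute energy record as a JSON string."""
    payload = {
        "timestamp": ts.astimezone(timezone.utc).isoformat(),  # normalized to UTC
        "device_id": device_id,
        "energy_type": energy_type,
        "value": value,
        "unit": unit,
    }
    return json.dumps(payload)


message = build_energy_message(
    "Unit_M1", "electricity", 2.7, "kWh",
    datetime(2023, 12, 1, 8, 15, tzinfo=timezone.utc),
)
```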

[0028] (2) Event indication data collection
Event indication data is acquired in real time from the production management system and the Internet of Things sensing network, and is used to characterize the status and intensity of production activities.

[0029] ① Production work order start / stop event data: Data source: The manufacturing execution system (MES) in the workshop. The MES system maintains the scheduling, execution, and status information of all production work orders.

[0030] Data collection method: A data synchronization proxy service runs on the server where the system of this invention is deployed. This service retrieves work order events by calling the RESTful API provided by the MES. For example, it periodically (e.g., every minute) calls the GET /mes-api/production-orders/events?since={last_query_time} interface to query new work order status change events since the last synchronization time.

[0031] Data Structure: Each event record in the returned JSON data contains at least the following fields: event_timestamp (ISO 8601 format timestamp), workstation_id (corresponding production unit ID, such as "Unit_M1"), event_type (such as "JOB_START" or "JOB_FINISH"), order_number (work order number, such as "WO20231201001"), product_sku (product model), and planned_quantity (planned output).

[0032] ② Equipment operating status change data: Data source: Programmable logic controller built into the injection molding machine.

[0033] Data acquisition method: Deploy an OPC UA server / data acquisition gateway (such as KEPServerEX) in the workshop network. This gateway establishes a connection with each injection molding machine PLC through the driver provided by the manufacturer (such as the Mitsubishi MELSEC protocol) and subscribes to the discrete registers inside the PLC that represent the status of key processes (such as D100.0 representing "injection" and D100.1 representing "holding pressure").

[0034] Event generation: The gateway is configured with data-change trigger rules. When any subscribed status bit changes from 0 to 1 or from 1 to 0, the gateway immediately generates a status event message containing timestamp, device_id, signal_name, and new_value, and publishes it to the topic /shopfloor/equipment/{device_id}/status via the MQTT protocol.
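The change-trigger logic can be illustrated with a short sketch. It only shows edge detection on a sampled status bit and construction of the topic string; the actual gateway configuration and MQTT publishing are not shown, and the function names are hypothetical.

```python
def detect_status_events(device_id, signal_name, samples):
    """Emit one event per 0->1 or 1->0 transition.

    samples: iterable of (timestamp, bit) pairs in time order.
    """
    events = []
    previous = None
    for timestamp, bit in samples:
        if previous is not None and bit != previous:
            events.append({
                "timestamp": timestamp,
                "device_id": device_id,
                "signal_name": signal_name,
                "new_value": bit,
            })
        previous = bit
    return events


def status_topic(device_id):
    """Build the MQTT topic named in the text for a given device."""
    return f"/shopfloor/equipment/{device_id}/status"


events = detect_status_events("Unit_M1", "D100.0", [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0)])
```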

[0035] ③ Personnel density change data: Data source: Network high-definition cameras deployed in key areas such as the workshop material preparation area and finished product stacking area.

[0036] Acquisition and Analysis: The video stream is fed into an edge computing device (such as NVIDIA Jetson AGX Orin). This device runs a trained object detection model (such as YOLOv5s) to analyze the video stream in real time and count the number of people in the frame.

[0037] Data reporting: Edge computing devices calculate the time-weighted average of the number of people per second within a 30-minute statistical period, which serves as the "personnel density index". At the end of each period, the device reports this index, along with the period end timestamp and region ID, to the central data service via HTTPS.
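One way to compute a time-weighted average head count over a statistical period is sketched below. It assumes each count is held until the next observation and that time is expressed in seconds; the function name and the sample observations are hypothetical.

```python
def personnel_density_index(observations, period_start, period_end):
    """Time-weighted mean head count over [period_start, period_end].

    observations: time-ordered list of (time_s, head_count); each count is
    assumed to hold until the next observation (or the end of the period).
    """
    weighted_sum = 0.0
    for i, (t, count) in enumerate(observations):
        t_next = observations[i + 1][0] if i + 1 < len(observations) else period_end
        lo, hi = max(t, period_start), min(t_next, period_end)
        if hi > lo:
            weighted_sum += count * (hi - lo)
    return weighted_sum / (period_end - period_start)


# 2 people for 10 minutes, 4 for the next 10, none for the last 10 (30-minute period).
index = personnel_density_index([(0, 2), (600, 4), (1200, 0)], 0, 1800)
```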

[0038] (3) Unified data entry and standardization
All raw data collected through the above channels is sent to a unified data access layer (such as a message queue built on Apache Kafka).

[0039] The energy management system's HTTP receiving interface, MES data synchronization service, MQTT Broker, and personnel counting reporting interface all act as producers, publishing data to corresponding topics in Kafka, such as topic_energy_raw, topic_mes_events, topic_equipment_status, and topic_personnel_density.

[0040] A data standardization microservice acts as a consumer, consuming messages from these topics. This service cleanses data from different sources (e.g., deduplication, outlier filtering), aligns timestamps (to UTC time), converts formats, and ultimately organizes all data into a time series database (such as InfluxDB), providing a unified and standardized data foundation for subsequent processing step S2.

[0041] Data in InfluxDB is organized by measurement name and tags. For example, energy consumption data is stored in the energy_consumption measurement with the tag unit=M1 and the fields electricity=125.6 and compressed_air=8.3; work order events are stored in the production_events measurement with the tags unit=M1 and event_type=JOB_START.

[0042] S2: Calculation of energy consumption curve shape indicators
Based on the historical data collected and stored in S1, this step performs quantitative analysis on the energy consumption sequence of each production unit, extracting three types of morphological indicators that characterize its inherent patterns. This process is executed by a periodic batch job (e.g., weekly), with the analysis window being the most recent 30 consecutive calendar days.

[0043] (1) Data preprocessing
First, the raw energy consumption data of each production unit within a specified historical period is queried from the time-series database. Taking Unit_M1 as an example, all its energy consumption records from T_start to T_end (a total of 30 days) are queried.

[0044] Data alignment and integration: For ease of processing, the electricity consumption (kWh) and compressed air consumption (m³) are merged by converting both to standard carbon emissions using energy conversion factors. The conversion formula is: Total carbon emissions = Electricity consumption × Grid emission factor + Compressed air consumption × Compressed air carbon emission factor. This generates, for each unit, a single time series of standard carbon emission intensity at 15-minute intervals, denoted $C_u(t)$, where $u$ is the unit identifier and $t$ is the time point.
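The conversion formula can be written as a one-line function. The two emission factors below are placeholder values chosen only for illustration (the source gives none), and the kgCO2 unit is likewise an assumption; the input values reuse the example field values electricity=125.6 and compressed_air=8.3.

```python
# Hypothetical emission factors; real values depend on the grid and the
# compressed air system and are not given in the source.
GRID_FACTOR_KG_PER_KWH = 0.5
AIR_FACTOR_KG_PER_M3 = 0.1


def carbon_emission(electricity_kwh, compressed_air_m3,
                    grid_factor=GRID_FACTOR_KG_PER_KWH,
                    air_factor=AIR_FACTOR_KG_PER_M3):
    """Total emissions = electricity * grid factor + compressed air * air factor."""
    return electricity_kwh * grid_factor + compressed_air_m3 * air_factor


emission = carbon_emission(125.6, 8.3)
```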

[0045] Data cleaning: The sequence $C_u(t)$ is cleaned. Linear interpolation is used to fill individual data gaps caused by communication interruptions, and a statistical criterion is applied to remove obvious outliers.

[0046] (2) Calculation of morphological indicators
For each preprocessed unit carbon emission time series $C_u(t)$, the following three types of indicators are calculated.
① First type of indicators: volatility indicators. These indicators reflect the dispersion and fluctuation of energy consumption.

[0047] The calculation process for load factor variance is as follows: For each day $d$, take the instantaneous carbon emission intensity sequence of this unit at 96 time points (24 hours × 4 points/hour), $\{C_u(d,k)\}_{k=1}^{96}$.

[0048] Calculate the daily average carbon intensity for that day: $\bar{C}_u(d) = \frac{1}{96}\sum_{k=1}^{96} C_u(d,k)$.

[0049] Calculate the instantaneous load rate at each moment of the day: $LR_u(d,k) = C_u(d,k) / C_u^{\mathrm{rated}}$, where $C_u^{\mathrm{rated}}$ is the rated carbon emission power of the unit (which can be calculated based on the rated power of the equipment and the operating time).

[0050] Calculate the sample variance of the 96 $LR_u(d,k)$ values on that day: $\sigma_u^2(d) = \frac{1}{95}\sum_{k=1}^{96}\left(LR_u(d,k) - \overline{LR}_u(d)\right)^2$, where $\overline{LR}_u(d)$ is the mean of those 96 values.

[0051] Take the arithmetic mean of the daily variances $\sigma_u^2(d)$ over the 30 days to obtain the load factor variance index of the unit.

[0052] The calculation process for the peak-to-valley ratio is as follows: For each day $d$, find the maximum value $C_u^{\max}(d)$ and minimum value $C_u^{\min}(d)$ of the daily carbon emission intensity sequence.

[0053] Calculate the daily average carbon intensity for that day, $\bar{C}_u(d)$.

[0054] Calculate the peak-to-valley ratio for that day, $PVR_u(d)$, from $C_u^{\max}(d)$, $C_u^{\min}(d)$, and $\bar{C}_u(d)$.

[0055] Take the arithmetic mean of the daily $PVR_u(d)$ values over the 30 days to obtain the peak-to-valley ratio index for this unit.
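The two volatility indicators can be sketched with the standard library. The load rate is taken as intensity divided by rated carbon emission power, and the sample variance uses the n-1 denominator. The exact peak-to-valley formula is not legible in the source, so `(max - min) / mean` is used here purely as an assumption; the function names and sample data are hypothetical.

```python
from statistics import mean, variance


def daily_load_factor_variance(day_values, rated):
    """Sample variance of the instantaneous load rates for one day."""
    load_rates = [v / rated for v in day_values]
    return variance(load_rates)  # n-1 denominator


def daily_peak_valley_ratio(day_values):
    """ASSUMED form: (daily max - daily min) / daily mean."""
    return (max(day_values) - min(day_values)) / mean(day_values)


def window_index(days, daily_fn):
    """Arithmetic mean of a daily indicator over the analysis window."""
    return mean(daily_fn(day) for day in days)


# Hypothetical data: 30 identical days of 96 points cycling through 1..4.
day = [1.0, 2.0, 3.0, 4.0] * 24
days = [day] * 30
lf_variance_index = window_index(days, lambda d: daily_load_factor_variance(d, rated=4.0))
pvr_index = window_index(days, daily_peak_valley_ratio)
```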

[0056] ② Second type of indicators: periodicity indicators. These indicators identify recurring patterns hidden in energy consumption sequences.

[0057] Detrending: For the complete 30-day sequence $C_u(t)$, fit a linear trend line $\hat{C}_u(t)$ using the least squares method and subtract it to obtain the detrended sequence $\tilde{C}_u(t) = C_u(t) - \hat{C}_u(t)$. This eliminates the influence of long-term trends on the cycle analysis.

[0058] Fast Fourier Transform: Apply the Fast Fourier Transform to $\tilde{C}_u(t)$ to obtain the spectrum $X_u(f)$.

[0059] Power spectrum calculation: Calculate the power spectral density of the spectrum, $P_u(f) = |X_u(f)|^2$.

[0060] Main cycle identification: After excluding the DC component (zero frequency), search within a reasonable frequency range (e.g., corresponding to cycles from 15 minutes to 24 hours) for the maximum of $P_u(f)$. The frequency at this point, $f_u^{*}$, is the dominant frequency.

[0061] Indicator calculation: Main cycle: $T_u = 1 / f_u^{*}$ (unit: hours or days). For example, if the dominant frequency corresponds to 2 cycles per day, then the main cycle is 12 hours.

[0062] Main cycle intensity: the value $P_u(f_u^{*})$ at the maximum point, after normalization, serves as the intensity indicator; the larger the value, the more significant the periodicity.
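The periodicity steps can be sketched without external libraries by using a direct discrete Fourier transform, which yields the same spectrum as an FFT but more slowly. Two points are assumptions: the intensity is normalized as the peak power divided by the total non-DC power, and the search covers all resolvable frequencies rather than a configured range. With 15-minute sampling, the shortest resolvable period is 30 minutes.

```python
import cmath
import math


def detrend(series):
    """Subtract the least-squares linear trend from the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - t_mean)) for i, y in enumerate(series)]


def dominant_period(series, sample_hours=0.25):
    """Return (main cycle in hours, normalized peak power), DC excluded."""
    x = detrend(series)
    n = len(x)
    best_k, best_power, total_power = None, 0.0, 0.0
    for k in range(1, n // 2 + 1):  # k = 0 is the DC component
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        total_power += power
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None, 0.0
    return n * sample_hours / best_k, best_power / total_power


# Hypothetical 2-day series: 12-hour cycle (48 samples) plus a slow linear drift.
series = [10 + 3 * math.sin(2 * math.pi * t / 48) + 0.01 * t for t in range(192)]
period_hours, strength = dominant_period(series)
```

For long windows (30 days is 2,880 samples) an FFT routine would be the practical choice; the direct transform is used here only to keep the sketch self-contained.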

[0063] ③ Third type of indicators: event responsiveness indicators. These indicators quantify the sensitivity of energy consumption to specific production events.

[0064] Work order response intensity: Extract all "Work Order Start" events that occurred in this unit within the analysis period from the event database, and denote them as the event set $E_u^{\mathrm{wo}}$. Each event $e_i$ has its own time of occurrence $t_i$.

[0065] For each event $e_i$: Calculate the average carbon emission intensity within the 1 hour before the event (from $t_i - 1\,\mathrm{h}$ to $t_i$), $\bar{C}_i^{\mathrm{pre}}$.

[0066] Calculate the average carbon emission intensity within the 1 hour after the event (from $t_i$ to $t_i + 1\,\mathrm{h}$), $\bar{C}_i^{\mathrm{post}}$.

[0067] Calculate the instantaneous response strength of the event, $R_i$, from $\bar{C}_i^{\mathrm{pre}}$ and $\bar{C}_i^{\mathrm{post}}$.

[0068] Calculate the arithmetic mean of all $R_i$ of this unit. To prevent the influence of extreme values, the $R_i$ values can be winsorized (e.g., truncated at the 5th and 95th percentiles) before averaging. The result is the work order response intensity index for that unit.

[0069] Equipment status response intensity: From the event database, extract all events in which the state of any critical equipment in this unit changes from a "low-energy consumption state" (such as standby or shutdown) to a "high-energy consumption state" (such as injection or pressure holding), and denote them as the set $E_u^{\mathrm{eq}}$.

[0070] For each event $e_j$ with occurrence time $t_j$: Calculate the average carbon intensity in the 15 minutes before the state transition, $\bar{C}_j^{\mathrm{pre}}$.

[0071] Calculate the average carbon intensity in the 15 minutes after the state transition, $\bar{C}_j^{\mathrm{post}}$.

[0072] Calculate the instantaneous response strength of the event, $R_j$, from $\bar{C}_j^{\mathrm{pre}}$ and $\bar{C}_j^{\mathrm{post}}$.

[0073] Calculate the arithmetic mean of all $R_j$ of this unit as the equipment status response intensity index.
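A minimal sketch of the event response calculation follows. The response formula itself is not legible in the source, so the relative change `(post - pre) / pre` is used here as an assumption, and the nearest-rank percentile rule in the winsorizing step is likewise one choice among several. Function names and sample data are hypothetical.

```python
from statistics import mean


def window_mean(series, start, end):
    """Mean of values whose timestamp t satisfies start <= t < end."""
    values = [v for t, v in series if start <= t < end]
    return mean(values) if values else None


def event_response(series, event_time, window):
    """ASSUMED form: relative change of mean intensity across the event."""
    pre = window_mean(series, event_time - window, event_time)
    post = window_mean(series, event_time, event_time + window)
    if pre is None or post is None or pre == 0:
        return None
    return (post - pre) / pre


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the nearest-rank lower and upper percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(round(lower_pct / 100 * (n - 1)))]
    hi = ordered[int(round(upper_pct / 100 * (n - 1)))]
    return [min(max(v, lo), hi) for v in values]


def response_intensity(series, event_times, window):
    """Mean winsorized response over all events that have usable data."""
    candidates = (event_response(series, t, window) for t in event_times)
    responses = [r for r in candidates if r is not None]
    return mean(winsorize(responses)) if responses else 0.0


# Hypothetical series of (minute, intensity): intensity doubles at minute 60.
series = [(m, 10.0) for m in range(0, 60, 15)] + [(m, 20.0) for m in range(60, 120, 15)]
work_order_intensity = response_intensity(series, [60], window=60)
```

Calling the same function with `window=15` gives the equipment status variant described in the text.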

[0074] (3) Result storage
After the calculation is completed, each production unit has a morphology indicator vector containing the above six values (load factor variance, peak-to-valley ratio, main cycle, main cycle intensity, work order response intensity, and equipment status response intensity). This vector is stored in the unit_morphology_features table of a relational database (such as PostgreSQL) for use in the S3 clustering process.

[0075] S3: Production unit clustering based on dual constraints
This step aims to group units with similar production patterns and carbon emission characteristics into one category, laying the foundation for subsequent differentiated modeling. Its core is to integrate two constraints, "data-driven morphological similarity" and "knowledge-driven event logic," to perform dynamic clustering. This process is executed by the clustering engine service and consists of three stages: data clustering, rule validation, and dynamic adjustment.

[0076] (1) Initial clustering based on energy consumption pattern indicators
In this stage, the quantitative morphological indicators calculated in S2 are used for preliminary grouping with an unsupervised machine learning algorithm.

[0077] ① Feature vector construction: This step constructs a multi-dimensional feature vector for each production unit. To balance the scales of the indicators, Z-score normalization is performed on each morphological indicator across all units. For example, for Unit_M1, the feature vector can be represented as: V_M1 = [Z(Load Factor Variance), Z(Peak-Valley Ratio), Z(Main Cycle), Z(Main Cycle Intensity), Z(Work Order Response Intensity), Z(Equipment Status Response Intensity)]. Note: Because the main cycle value may be expressed in different units, the main cycle intensity is usually taken as the clustering feature.
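Column-wise Z-score normalization across units can be sketched as below. The use of the population standard deviation and the choice to map a constant column to 0.0 are assumptions; the raw indicator values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(features):
    """Normalize each indicator (column) across all units.

    features: dict mapping unit id -> list of raw indicator values.
    """
    units = list(features)
    n_cols = len(features[units[0]])
    result = {u: [] for u in units}
    for j in range(n_cols):
        column = [features[u][j] for u in units]
        mu, sigma = mean(column), pstdev(column)
        for u in units:
            # A constant column carries no information; map it to 0.0.
            result[u].append((features[u][j] - mu) / sigma if sigma > 0 else 0.0)
    return result


normalized = zscore_columns({"Unit_M1": [1.0, 5.0], "Unit_M2": [3.0, 5.0]})
```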

[0078] ② Clustering algorithm execution: The K-Means algorithm with k-means++ initialization from the scikit-learn machine learning library is used. This initialization reduces the risk of converging to a poor local optimum caused by improper selection of initial centroids.

[0079] Input: The standardized feature vector matrix of all production units.

[0080] Key parameter: Preset number of clusters K. In this embodiment, the elbow method or the silhouette coefficient method is used to automatically determine the number of clusters. The program calculates the within-cluster sum of squared errors for different K values (e.g., 2 to 5), and selects the K value corresponding to the inflection point (e.g., K=3) as the initial number of clusters.
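Locating the inflection point can be automated in several ways. The sketch below uses the largest discrete second difference of the error curve, which is one common heuristic and an assumption here, not a rule stated in the source. The error values are hypothetical; in practice they would come from fitting K-Means once per candidate K.

```python
def pick_elbow(sse_by_k):
    """Return the K with the largest second difference of the SSE curve.

    sse_by_k: dict mapping K -> within-cluster sum of squared errors.
    Needs at least three candidate K values; returns None otherwise.
    """
    ks = sorted(sse_by_k)
    best_k, best_curvature = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


# Hypothetical SSE values for K = 2..5: the curve flattens after K = 3.
chosen_k = pick_elbow({2: 100.0, 3: 40.0, 4: 35.0, 5: 32.0})
```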

[0081] Output: Each unit is assigned an initial cluster label. Assume three initial clusters are obtained:
Initial_Cluster_1: {Unit_M1, Unit_M2}
Initial_Cluster_2: {Unit_M3, Unit_Cooling}
Initial_Cluster_3: {} (may be empty or contain other units)
(2) Validation and adjustment based on the business rule base
This stage introduces domain knowledge to refine the purely data-driven clustering results, making them more aligned with actual production management practices. Business rules are stored in a structured manner in a rule base (such as the Drools rule engine or a JSON configuration file).

[0082] ① Example of rule base definition:

[0083] ② Rule execution and calculation logic: 1) Execute rule "R001" (mode difference separation): For Unit_M3 and Unit_Cooling in Initial_Cluster_2, extract the work order event sequence of each unit over the past 30 days from the MES event history collected in S1.

[0084] Define the calculation method for the production planning coordination degree SCORE: Divide the time axis into 15-minute intervals. If both units have active work orders in the same interval, that interval is counted as a "coordination period". SCORE = Number of coordination periods / Total number of periods with active work orders in Unit_M3.

[0085] Calculations show that the work orders of Unit_M3 are mostly independent small batches, while Unit_Cooling has no work order concept. The coordination between its start / stop behavior and the Unit_M3 work orders (SCORE) is extremely low (e.g., <0.1), far below the preset threshold (e.g., 0.6).

[0086] According to rule R001, the two are determined to have "different production planning modes," triggering a separation action. The system removes Unit_Cooling from Initial_Cluster_2.
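The SCORE calculation and the R001 separation check can be sketched as follows. This is an illustrative sketch only: the function names and the interval data are hypothetical, and treating the running periods of Unit_Cooling as its "active" intervals is an assumption, since that unit has no work orders.

```python
def coordination_score(active_a, active_b):
    """SCORE = coordination periods / active periods of unit A.

    active_a, active_b: sets of 15-minute interval indices in which each
    unit is active (has an active work order, or is running).
    """
    if not active_a:
        return 0.0
    return len(active_a & active_b) / len(active_a)

def should_separate(score, threshold=0.6):
    """Rule R001: separate the two units when SCORE is below the threshold."""
    return score < threshold

# Hypothetical interval indices within one day (96 intervals)
m3_active = set(range(32, 72))          # 40 intervals with active work orders
cooling_active = {30, 31, 32, 80, 81}   # assumed running periods of the chiller
score = coordination_score(m3_active, cooling_active)
separate = should_separate(score)
```

Here only one of the 40 active intervals of Unit_M3 overlaps with Unit_Cooling, so the score is 0.025 and the separation action is triggered.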

[0087] 2) Execute rule "R002" (event synchronization and merging): Similarly, extract the work order event sequences for Unit_M1 and Unit_M2 in Initial_Cluster_1.

[0088] Define the calculation method for the event synchronization drive strength SYNC: Take each work order start event of Unit_M1 as the reference and check whether Unit_M2 has a work order start event within a time window (e.g., 30 minutes) before or after it. Each reference event with such a match counts as one synchronization event. SYNC = Number of synchronization events / Total number of work order start events in Unit_M1.

[0089] Calculations show that the SYNC value of Unit_M1 and Unit_M2 is very high (e.g., >0.85), exceeding the preset threshold (e.g., 0.7).

[0090] According to rule R002, it is determined that the two are "driven by the same production plan event", triggering a merge action (in this case, to maintain the merged state).
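A minimal sketch of the SYNC calculation is shown below. The function name `sync_strength` and the event timestamps are hypothetical, and the window is applied symmetrically (before and after each reference event), which is one reading of the description.

```python
from datetime import datetime, timedelta

def sync_strength(starts_ref, starts_other, window=timedelta(minutes=30)):
    """SYNC = reference start events that have a start event of the other
    unit within +/- window, divided by all reference start events."""
    if not starts_ref:
        return 0.0
    matched = sum(
        1 for t in starts_ref
        if any(abs(t - s) <= window for s in starts_other)
    )
    return matched / len(starts_ref)

# Hypothetical work order start times for one day
day = datetime(2026, 1, 5)
m1_starts = [day + timedelta(hours=h) for h in (8, 10, 13, 16)]
m2_starts = [
    day + timedelta(hours=8, minutes=10),
    day + timedelta(hours=10, minutes=20),
    day + timedelta(hours=13, minutes=5),
    day + timedelta(hours=18),
]
sync = sync_strength(m1_starts, m2_starts)
merge = sync > 0.7  # rule R002 with the example threshold
```

Three of the four Unit_M1 start events have a Unit_M2 start within 30 minutes, giving SYNC = 0.75, which exceeds the example threshold of 0.7.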

[0091] (3) Generate the final carbon emission feature clusters. After adjustment by the rule engine, a final stable clustering result is formed. The system assigns a descriptive name and a unique ID to each final cluster and records all unit members within the cluster.

[0092] Table 1 Final Clustering Results

[0093] The clustering results are persistently stored in the system configuration database and serve as key inputs for independently selecting and training prediction models for each cluster in step S4. This dual-constraint clustering mechanism of "data + rules" is one of the core innovations of this method in improving the relevance and accuracy of subsequent prediction models.

[0094] S4: Construction and Training of Clustered Carbon Emission Prediction Sub-models. For each carbon emission feature cluster identified in S3, this step selects a prediction algorithm, constructs a feature engineering pipeline, and trains the model according to the cluster's energy consumption patterns and event response characteristics. The result is a dedicated carbon emission prediction sub-model for each cluster.

[0095] (1) Model selection strategy and decision logic. The core basis for model selection is the third type of indicator (the event responsiveness indicator) calculated in S2, with the other morphological characteristics used as a reference. The system presets a response intensity threshold for quantitative decision-making.

[0096] ① For the mainline synchronous production type feature cluster (F_001, including Unit_M1 and Unit_M2): Judgment: The "work order response strength" index of this cluster is calculated to be 0.72, which is far higher than the threshold. This indicates a strong causal relationship between its carbon emission fluctuations and production planning events (MES work orders).

[0097] Decision: Select a machine learning regression model that uses event-indicating data as the key explanatory variable. This embodiment uses the Extreme Gradient Boosting Regressor (XGBoost Regressor) because it can effectively handle nonlinear relationships and feature interactions, and is robust to missing values.

[0098] ② For the flexible independent production type feature cluster (F_002, including Unit_M3): Judgment: The "work order response strength" index of this cluster is 0.35, slightly lower than the threshold. However, its "main cycle intensity" index is relatively high. This indicates that its carbon emissions have certain inherent time-series patterns and are relatively less affected by event-driven factors.

[0099] Decision: Select a classic time series forecasting model. This embodiment uses the Facebook Prophet model because it can automatically handle trends, seasonality, and holiday effects, and its assumption of the periodicity of the series is consistent with the characteristics of this cluster.

[0100] ③ For conditionally triggered supportive feature clusters (F_003, including Unit_Cooling): Judgment: The "work order response intensity" index of this cluster is extremely low (0.08), but its energy consumption is strongly correlated with the total heat load of the workshop (which can be approximated as the sum of the predicted loads of other clusters) and the ambient temperature.

[0101] Decision: Select a machine learning regression model driven by exogenous variables. This embodiment also uses the XGBoost Regressor, but its input features are fundamentally different from those of the F_001 cluster.
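The decision logic for the three clusters can be summarized in a short sketch. Everything here is illustrative: the function name, the returned labels, and the `exogenous_driven` flag are hypothetical, and the threshold value 0.4 is a placeholder, because the source does not state the actual threshold.

```python
def select_model(work_order_response, exogenous_driven=False, threshold=0.4):
    """Return a label for the sub-model family of one feature cluster.

    threshold is a placeholder value; the source does not state it.
    exogenous_driven marks clusters whose energy use follows external
    variables (e.g. workshop heat load and ambient temperature).
    """
    if work_order_response >= threshold:
        return "event_regression"      # e.g. XGBoost with event features (F_001)
    if exogenous_driven:
        return "exogenous_regression"  # e.g. XGBoost with load/weather inputs (F_003)
    return "time_series"               # e.g. Prophet (F_002)

clusters = {
    "F_001": {"work_order_response": 0.72, "exogenous_driven": False},
    "F_002": {"work_order_response": 0.35, "exogenous_driven": False},
    "F_003": {"work_order_response": 0.08, "exogenous_driven": True},
}
choices = {cid: select_model(**c) for cid, c in clusters.items()}
```

The flag for exogenous dependence is needed because the response intensity alone cannot distinguish F_002 from F_003: both fall below the threshold but are modeled differently.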

[0102] (2) Feature Engineering and Model Input Construction For different types of models, construct differentiated feature inputs.

[0103] ① Feature engineering for the XGBoost model of the F_001 cluster:
Core event feature transformation (corresponding to claim 6):
Planned production intensity features: Retrieve the detailed production schedule for the next 24 hours from the MES at 15-minute intervals. For each predicted time period t (e.g., the third future 15-minute interval), extract all planned work orders belonging to cluster F_001 within the [t-2h, t+2h] window of the schedule. Multiply the "planned quantity" of each work order by the "unit standard carbon consumption" of its product and sum the results to obtain the planned carbon emission load for that time period, which is used as a continuous feature.
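The planned carbon emission load feature can be sketched as below. The field names (`planned_start`, `planned_qty`, `unit_carbon_kg`) and the sample work orders are hypothetical, and selecting work orders by their planned start time is an assumption; the source only says that work orders "within" the window are extracted.

```python
from datetime import datetime, timedelta

def planned_carbon_load(t, work_orders, half_window=timedelta(hours=2)):
    """Sum planned_qty * unit_carbon_kg over work orders whose planned
    start falls within [t - half_window, t + half_window]."""
    lo, hi = t - half_window, t + half_window
    return sum(
        wo["planned_qty"] * wo["unit_carbon_kg"]
        for wo in work_orders
        if lo <= wo["planned_start"] <= hi
    )

# Hypothetical schedule for cluster F_001
orders = [
    {"planned_start": datetime(2026, 1, 6, 0, 0), "planned_qty": 100, "unit_carbon_kg": 0.5},
    {"planned_start": datetime(2026, 1, 6, 2, 0), "planned_qty": 40, "unit_carbon_kg": 1.5},
    {"planned_start": datetime(2026, 1, 6, 6, 0), "planned_qty": 10, "unit_carbon_kg": 2.0},
]
t = datetime(2026, 1, 6, 0, 30)  # start of the third 15-minute interval
load = planned_carbon_load(t, orders)
```

For this interval the first two work orders fall inside the window and the third does not, so the feature value is 100 × 0.5 + 40 × 1.5 = 110 kg.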

[0104] Equipment status prospective features: Based on historical data of equipment status changes, the expected probability of each unit being in a "high energy consumption state" (such as injection, pressure holding) in the future is statistically analyzed, which serves as another dimension of equipment activity intensity features.

[0105] Spatiotemporal context features: one-hot encoding of the hour of day (0-23), one-hot encoding of the day of the week (1-7), and a flag indicating whether the day is a holiday.

[0106] Historical lag features: The carbon emissions of the cluster 1 hour, 3 hours, and 24 hours earlier (the last being the same time point on the previous day) are added as features.

[0107] Each row of the final training feature matrix X_F001 corresponds to a historical time point and contains all the feature columns mentioned above.
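Assembling the context and lag columns of one row could look like the sketch below. The function names, the column order, and the synthetic history are hypothetical; the event-derived features described earlier would be appended in the same way. Python's `weekday()` numbers days 0-6 from Monday, which differs from the 1-7 numbering in the text only by an offset.

```python
from datetime import datetime

STEPS_PER_HOUR = 4  # 15-minute resolution

def one_hot(index, size):
    """Return a list of zeros with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]

def build_row(ts, history, is_holiday=False):
    """Build the context and lag features for timestamp ts.

    history: past cluster emissions at 15-minute steps, where history[-1]
    is the value one step before ts (so history[-4] is one hour before).
    """
    lags = [history[-h * STEPS_PER_HOUR] for h in (1, 3, 24)]
    return (
        one_hot(ts.hour, 24)
        + one_hot(ts.weekday(), 7)  # Monday = 0
        + [int(is_holiday)]
        + lags
    )

history = [float(i) for i in range(96)]  # hypothetical: 24 h of values 0..95
ts = datetime(2026, 1, 6, 9, 0)
row = build_row(ts, history)
```

Each row then has 24 + 7 + 1 + 3 = 35 columns before the event features are added.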

[0108] ② Data preparation for the Prophet model for cluster F_002: The input to the Prophet model is a DataFrame with two columns: ds (time stamp) and y (target value).

[0109] The historical carbon emission data of all units within the cluster are summed by timestamps to obtain the cluster-level historical total carbon emission sequence y.

[0110] The timestamp ds needs to be converted to Pandas datetime format.

[0111] Additional regression factors (add_regressor) can be added, such as "total planned hours for the day", to incorporate some planning information.

[0112] ③ Feature engineering for the XGBoost model of the F_003 cluster:
Key driving features:
Heat load source features: The heat load generated by the main production activities in the workshop is characterized by using the predicted carbon emissions of the F_001 and F_002 clusters as input (their historical actual values are used during the training phase).

[0113] Environmental characteristics: Access meteorological API to obtain forecast temperature and humidity for the next 24 hours.

[0114] Operating strategy features: including the chiller unit's "set temperature" and "operating mode" (energy saving / standard).

[0115] Historical and periodic features: Same as F_001 cluster, including historical lag features and spatiotemporal context features.

[0116] (3) Model training and hyperparameter optimization. ① Data partitioning: Historical data from the past 90 consecutive days is used as the modeling dataset. For the XGBoost models, the data is divided chronologically, with the first 80% used for training and the last 20% for validation.

[0117] ② Training process. For the XGBoost models: Using the xgboost library, initial parameters are set (in this implementation, learning_rate=0.05, max_depth=6, n_estimators=200), and training minimizes the squared error, with the root mean square error (RMSE) as the evaluation metric. Early stopping (early_stopping_rounds=20) is applied on the validation set to prevent overfitting. Grid search is then employed to optimize within the hyperparameter space (e.g., learning_rate, max_depth, subsample).

[0118] For the Prophet model: use the prophet library and call the fit() method to fit the model. Parameters such as seasonality_prior_scale can be adjusted to adapt to the characteristics of the data.
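The chronological partition can be sketched as follows. The function name `chronological_split` and the integer stand-in for samples are hypothetical; the point is only that the rows are not shuffled, so the validation set always lies after the training set in time.

```python
def chronological_split(rows, train_frac=0.8):
    """Split time-ordered rows without shuffling: the earliest part is used
    for training and the most recent part for validation."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

n_points = 90 * 96            # 90 days at 15-minute resolution
rows = list(range(n_points))  # stand-in for time-ordered samples
train, valid = chronological_split(rows)
```

For 90 days of 15-minute data this yields 6912 training points (72 days) and 1728 validation points (18 days).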

[0119] ③ Model Saving: The best model that has been trained and validated, along with its corresponding feature engineering pipeline (including normalizers, encoders, etc.), is serialized and saved using joblib or MLflow, and registered in the model repository. Each feature cluster has its own independent model storage path and version number.

[0120] At this point, each carbon emission feature cluster has a tailor-made, trained predictive sub-model, ready for S5's integrated prediction.

[0121] S5: Integrated Prediction and Contribution Analysis of Carbon Emissions. As the output stage of the prediction process, this step coordinates and schedules the prediction sub-models of the feature clusters trained in S4, generates unified workshop-level prediction results, and performs attribution analysis to quantify the expected contribution of each production unit cluster to future carbon emissions.

[0122] (1) Predictive task scheduling and execution. This step is automatically triggered daily at a preset time (e.g., 18:00:00 UTC+8) by a predictive scheduling service in the system (implemented based on Apache Airflow in this embodiment). Its core is the execution of an atomic prediction pipeline job.

[0123] ① Data preparation before forecasting: The service first accesses external data interfaces to obtain the latest forward-looking information needed for prediction, including the details of the production plan from T_start (00:00 the next day) to T_end (23:45 the next day), retrieved from the MES system.

[0124] Obtain hourly forecast data of future outdoor temperature and humidity for the same period from the meteorological service interface.

[0125] Meanwhile, historical actual carbon emission sequences of each feature cluster are extracted from the internal time series database for a certain window period (such as the most recent 72 hours) before the prediction start time, and used for the construction of lag features required by some models.

[0126] ② Parallel inference in clustered models: Based on the clustering configuration determined by S3, the scheduling service creates parallel prediction tasks for the three feature clusters F_001, F_002, and F_003 respectively.

[0127] Each prediction task loads the prediction sub-model and feature engineering pipeline that are trained and saved in S4 for the corresponding feature cluster.

[0128] Real-time feature construction: Each task dynamically constructs the model input feature matrix for the next 24 hours (96 points at 15-minute intervals) according to the cluster type, using the future and historical data obtained in the data preparation step ① above. This process reuses the feature engineering logic defined for each cluster in S4.

[0129] For F_001 (XGBoost): Construct a feature matrix that includes "future planned production intensity", "equipment status probability", "spatiotemporal characteristics", etc.

[0130] For F_002 (Prophet): Construct a DataFrame containing ds (future time series) and external regression factors (such as total planned hours).

[0131] For F_003 (XGBoost): the "heat load source" part of its input features depends on the intermediate prediction results of F_001 and F_002. Therefore, the system adopts a sequential-parallel hybrid execution strategy: first, the predictions of F_001 and F_002 are executed in parallel, and after both are completed, their outputs are used as known inputs before the prediction of F_003 is started.
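The sequential-parallel hybrid strategy can be illustrated with the standard library. The three `predict_*` functions are stubs standing in for the saved sub-models, the constant outputs are hypothetical, and summing the two upstream predictions as the heat load input follows the approximation mentioned for F_003; the embodiment itself schedules these tasks with Apache Airflow rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

HORIZON = 96  # 24 h at 15-minute intervals

def predict_f001():
    return [2.0] * HORIZON  # stub for the F_001 sub-model

def predict_f002():
    return [1.0] * HORIZON  # stub for the F_002 sub-model

def predict_f003(heat_load):
    # Stub: cooling emissions as a fixed fraction of the upstream load.
    return [0.1 * x for x in heat_load]

def run_pipeline():
    # Stage 1: F_001 and F_002 are independent, so they run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut1 = pool.submit(predict_f001)
        fut2 = pool.submit(predict_f002)
        p1, p2 = fut1.result(), fut2.result()
    # Stage 2: F_003 starts only after both upstream outputs are available.
    heat_load = [a + b for a, b in zip(p1, p2)]
    p3 = predict_f003(heat_load)
    return {"F_001": p1, "F_002": p2, "F_003": p3}

preds = run_pipeline()
```

Calling `result()` on both futures acts as the barrier between the two stages, so F_003 never sees partial upstream output.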

[0132] Model inference: Input the constructed feature matrix into the corresponding model and execute the predict method. Each model outputs a vector of length 96, $\hat{E}_i = [\hat{E}_i(1), \ldots, \hat{E}_i(96)]$, where $\hat{E}_i(t)$ is the predicted carbon emissions of feature cluster i in the t-th future 15-minute period (unit: kilograms of carbon dioxide equivalent, kgCO2e).

[0133] (2) Prediction Result Integration and Assembly. ① Time series alignment and summation: The predicted sequences $\hat{E}_i$ of all feature clusters are naturally aligned in the time dimension (all starting from 00:00 the next day, with 15-minute intervals).

[0134] For each future time point t (t = 1 to 96), calculate the workshop-level total forecast value. In this embodiment, the values of F_001, F_002, and F_003 are added together directly, i.e., $\hat{E}_{total}(t) = \hat{E}_{F\_001}(t) + \hat{E}_{F\_002}(t) + \hat{E}_{F\_003}(t)$.

[0135] This generates a complete workshop-level carbon emission forecast curve for the next 24 hours, $\hat{E}_{total} = [\hat{E}_{total}(1), \ldots, \hat{E}_{total}(96)]$.

[0136] ② Persistence and publication of prediction results: The integrated overall prediction curve and the independent predicted sequences of each cluster are all written to the prediction results database (such as a specific measurement in InfluxDB), together with metadata such as the prediction generation time, the prediction target date, and the data source (model version).

[0137] The prediction results are also pushed to a message bus (such as Kafka) to notify downstream data visualization services, alarm services, etc.

[0138] (3) Calculation and analysis of carbon emission contribution Contribution analysis aims to break down the integrated total prediction into individual feature clusters, providing management insights.

[0139] ① Calculation of cluster-level predicted totals: For each feature cluster i, its predicted total carbon emissions for the next 24 hours, $S_i$, are obtained by summing its predicted sequence $\hat{E}_i(t)$ over the 96 periods: $S_i = \sum_{t=1}^{96} \hat{E}_i(t)$ (unit: kgCO2e). This calculation is performed immediately after the prediction task is completed.

[0140] ② Calculation of contribution percentage: Calculate the predicted total emissions of the workshop, $S_{total} = \sum_i S_i$.

[0141] Calculate the percentage contribution of each feature cluster i: $P_i = S_i / S_{total} \times 100\%$.

[0142] Examples of calculation results: F_001: 68.5%, F_002: 18.2%, F_003: 13.3%.
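The integration and contribution calculation can be sketched in a few lines. The function name `contributions`, the JSON key names, and the constant sample predictions are hypothetical and chosen for easy arithmetic; they are not the values reported in the example results.

```python
import json

def contributions(pred_by_cluster):
    """Return (total curve, per-cluster totals, per-cluster percentages)."""
    clusters = list(pred_by_cluster)
    horizon = len(pred_by_cluster[clusters[0]])
    total_curve = [
        sum(pred_by_cluster[c][t] for c in clusters) for t in range(horizon)
    ]
    totals = {c: sum(pred_by_cluster[c]) for c in clusters}
    grand = sum(totals.values())
    pct = {c: (100.0 * totals[c] / grand if grand else 0.0) for c in clusters}
    return total_curve, totals, pct

# Hypothetical constant predictions in kgCO2e per 15-minute period
preds = {"F_001": [6.0] * 96, "F_002": [3.0] * 96, "F_003": [1.0] * 96}
curve, totals, pct = contributions(preds)

# Structured result for storage alongside the prediction curve
report = json.dumps({"totals_kgCO2e": totals, "contribution_pct": pct})
```

With these inputs the cluster totals are 576, 288, and 96 kgCO2e, giving contributions of 60%, 30%, and 10%.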

[0143] ③Result formatting and visualization: The contribution results are stored as structured data (JSON format) and correlated with the prediction curve.

[0144] The system visualization module (such as a cockpit based on Grafana or ECharts) automatically reads the latest forecast data and generates and displays the following views:
Main view: The projected workshop-level total carbon emission curve for the next 24 hours, with the predicted curves of each feature cluster overlaid as layers so that the composition is shown visually.

[0145] Contribution dashboard: Visually displays contributions in the form of pie charts or stacked bar charts. In the pie chart, each sector represents a feature cluster, and the size of the sector corresponds to its percentage contribution.

[0146] Data table: Provides a detailed list of the name, predicted total emissions, and contribution percentage of each feature cluster.

[0147] This contribution analysis enables workshop managers not only to grasp the overall emission trends but also to clearly identify the main sources of emissions (such as the F_001 main production line), providing direct quantitative basis for implementing precise energy-saving and carbon-reduction measures (such as optimizing the F_001 production schedule).

[0148] II. Carbon Emission Prediction System

Referring to Figure 2, this disclosure also provides a prediction system for carbon emissions from industrial parks based on multi-source data, the system comprising:
a multi-dimensional data interface module, used to synchronize the detailed energy consumption data from the workshop energy management system and the event indication data from the manufacturing execution system or the Internet of Things sensing network;
a dual-constraint clustering engine, used to perform the following operations based on the time-series data provided by the multi-dimensional data interface module: calculate the energy consumption curve shape index of each production unit, and divide the multiple production units into multiple carbon emission feature clusters according to the preset business rule base, wherein the business rule base defines the logic for judging production synergy and mode differences based on the event indication data;
a clustering prediction modeling module, used, for each carbon emission feature cluster output by the dual-constraint clustering engine, to train a model by calling an algorithm that matches the characteristics of the cluster and to predict the carbon emission sub-prediction result of that cluster;
a prediction integration module, used to aggregate the sub-prediction results of each cluster, generate the overall prediction result at the workshop level, and output the prediction contribution of each carbon emission feature cluster.

[0149] For a detailed description of this embodiment, please refer to the corresponding descriptions in the foregoing embodiments, which will not be repeated here.

[0150] The basic principles of this disclosure have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in this disclosure are merely examples and not limitations, and should not be considered as essential features of each embodiment of this disclosure. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the scope of this disclosure to the necessity of employing the aforementioned specific details for implementation.

[0151] In this disclosure, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The block diagrams of components, apparatuses, devices, and systems involved in this disclosure are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as "comprising," "including," "having," etc., are open-ended terms meaning "including but not limited to," and are used interchangeably with that phrase. The terms "or" and "and" as used herein mean "and / or," and are used interchangeably with it unless the context clearly indicates otherwise. The term "such as" as used herein means "such as but not limited to," and is used interchangeably with it.

[0152] Additionally, as used herein, the "or" used in a list of items beginning with "at least one" indicates a separate list, such that a list of, for example, "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not imply that the described example is preferred or better than other examples.

[0153] It should also be noted that in the systems and methods of this disclosure, the components or steps can be decomposed and / or recombined. These decompositions and / or recombinations should be considered as equivalent solutions to this disclosure.

[0154] Various changes, substitutions, and modifications can be made to the technology described herein without departing from the teachings defined by the appended claims. Furthermore, the scope of the claims of this disclosure is not limited to the specific aspects of the processes, machines, manufactures, events, means, methods, and actions described above. Currently existing or later-developed processes, machines, manufactures, events, means, methods, or actions that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein can be utilized. Therefore, the appended claims include such processes, machines, manufactures, events, means, methods, or actions within their scope.

[0155] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use this disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of this disclosure. Therefore, this disclosure is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.

[0156] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of this disclosure to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations thereof.

Claims

1. A method for predicting carbon emissions from industrial park workshops based on multi-source data, characterized in that it comprises:
obtaining detailed energy consumption data for multiple production units within the target workshop, as well as event indication data related to the production activities of each production unit;
calculating, based on the detailed energy consumption data, the energy consumption curve shape index of each production unit in the historical period, and clustering the multiple production units in combination with preset business rules that reflect the production characteristics of the units, to obtain multiple carbon emission feature clusters, wherein the judgment basis of the business rules is at least partially derived from the event indication data;
for each carbon emission feature cluster, selecting a corresponding prediction algorithm based on the correlation characteristics between its energy consumption curve shape and the event indication data, and establishing a carbon emission prediction sub-model;
generating, based on the output of each carbon emission prediction sub-model, the total carbon emission prediction result of the target workshop, and outputting in association the prediction contribution of each carbon emission feature cluster.

2. The method according to claim 1, characterized in that the event indication data includes at least one of the following:
production order start / stop event data obtained from the Manufacturing Execution System;
key equipment operating status change data obtained from equipment status sensors;
data on changes in personnel density based on workshop video stream recognition.

3. The method according to claim 1, characterized in that the calculation of the energy consumption curve shape index based on the detailed energy consumption data includes:
extracting a first type of indicator characterizing energy consumption fluctuations, including the load factor variance and peak-to-valley ratio within a preset time period;
extracting a second type of indicator characterizing the periodicity of energy consumption, including the intensity of the main period identified through spectral analysis;
extracting a third type of indicator characterizing the event responsiveness of energy consumption, including the correlation response strength between the energy consumption curve and the occurrence times of the events marked by the event indication data.

4. The method according to claim 1, characterized in that the clustering process combined with the business rules includes:
initially clustering the production units based on the similarity of their energy consumption curve shape indicators;
validating and adjusting the initial clustering results by applying the business rules, which include:
if the event indication data show that two units belonging to the same initial cluster have production planning patterns of different types, the two units are assigned to different clusters;
if the event indication data show that two units belonging to different initial clusters are driven synchronously by the same production plan event, the two units are merged into the same cluster.

5. The method according to claim 4, characterized in that selecting the corresponding prediction algorithm includes:
for clusters that remain stable after initial clustering and business rule verification, and whose third type of indicator is below the threshold, selecting a time series prediction algorithm as the prediction sub-model for the cluster;
for clusters whose third type of indicator is higher than the threshold, selecting a machine learning regression algorithm that uses the event indication data as the key input feature as the prediction sub-model for the cluster.

6. The method according to claim 5, characterized in that, when building a prediction sub-model with the machine learning regression algorithm, the methods for transforming the event indication data into model input features include:
transforming production work order events into planned production intensity features for the next N hours;
transforming equipment state transition events into time-series features that characterize the probability of equipment start-up and shutdown.

7. A prediction system for carbon emissions from industrial park workshops based on multi-source data, characterized in that the system, for implementing the method of any one of claims 1-6, comprises:
a multi-dimensional data interface module, used to synchronize the detailed energy consumption data from the workshop energy management system and the event indication data from the manufacturing execution system or the Internet of Things sensing network;
a dual-constraint clustering engine, used to perform the following operations based on the time-series data provided by the multi-dimensional data interface module: calculate the energy consumption curve shape index of each production unit, and divide the multiple production units into multiple carbon emission feature clusters according to the preset business rule base, wherein the business rule base defines the logic for judging production synergy and mode differences based on the event indication data;
a clustering prediction modeling module, used, for each carbon emission feature cluster output by the dual-constraint clustering engine, to train a model by calling an algorithm that matches the characteristics of the cluster and to predict the carbon emission sub-prediction result of that cluster;
a prediction integration module, used to aggregate the sub-prediction results of each cluster, generate the overall prediction result at the workshop level, and output the prediction contribution of each carbon emission feature cluster.