Campus building equipment flexible control method and system

By combining hybrid deep reinforcement learning (DQN-PPO) with a hierarchical secure delivery mechanism, the invention addresses rigid control, data silos, and fragmented decision-making in building equipment management systems. This improves equipment operating efficiency and emergency response in real time while ensuring the security and reliability of control commands.

CN122242858A · Pending · Publication Date: 2026-06-19 · XINGYUAN BRANCH OF CHONGQING SMART NET TECH CO LTD +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: XINGYUAN BRANCH OF CHONGQING SMART NET TECH CO LTD
Filing Date: 2026-03-24
Publication Date: 2026-06-19

Application Information

Patent Timeline

24 Mar 2026

Application

19 Jun 2026

Publication

CN122242858A

IPC: G06Q10/04; G06Q10/0637; G06Q10/067; G06Q50/06; G06N3/045; G06N3/092

AI Tagging

Application Domain

Forecasting Biological models


AI Technical Summary

Technical Problem

Existing building equipment management systems suffer from rigid control strategies with insufficient adaptability, data silos arising from multiple heterogeneous sources, the difficulty of discrete-continuous hybrid decision-making, and the conflict between security and real-time performance when issuing control commands, leading to energy waste and poor comfort.

Method used

A flexible decision engine based on DQN-PPO hybrid deep reinforcement learning is adopted, combined with a hierarchical security delivery mechanism. The DQN network processes discrete device start-up and shutdown decisions, and the PPO network processes continuous parameter adjustment. A spatiotemporal correlation feature fusion model is constructed to achieve unified situational awareness of device status, environmental parameters and user behavior. A hierarchical security verification mechanism is adopted to ensure the reliability and real-time performance of control commands.

Benefits of technology

It improves equipment operating efficiency, reduces grid load during peak hours, meets second-scale response requirements for emergency events, ensures the integrity and security of control commands, and adapts decision-making strategies through a closed-loop feedback optimization mechanism.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122242858A_ABST


Abstract

This invention proposes a flexible management and control method and system for building equipment in a park, comprising: S1, constructing a flexible decision engine based on deep reinforcement learning, wherein the engine adopts a dual-network collaborative architecture, including a deep Q-network (DQN) and a proximal policy optimization network (PPO), and the dual-network collaborative architecture achieves collaborative output through a dynamic weight fusion mechanism, with the weight coefficients adaptively adjusted according to the current decision context; S2, adaptively selecting the instruction generation mode based on the trigger type determination result; for active trigger types, generating pre-adjusted instructions based on predictive analysis; for passive trigger types, generating emergency instructions based on event response; S3, establishing a hierarchical distribution strategy based on trigger type and instruction importance, with key instructions adopting a triple confirmation mechanism of cloud generation-edge verification-local execution, regular instructions adopting an efficient mode of edge generation-batch distribution-asynchronous confirmation, and emergency instructions adopting a rapid response mode of direct triggering by local contingency plans and subsequent cloud synchronization.


Description

Technical Field

[0001] This invention relates to intelligent park energy management methods, and more particularly to a flexible control method and system for park building equipment based on DQN-PPO hybrid deep reinforcement learning and hierarchical security deployment.

Background Technology

[0002] As the digital transformation of buildings deepens, park buildings, as major carriers of energy consumption, face the following technical challenges in equipment management:

[0003] Rigid control strategies lack adaptability: traditional building automation systems (BAS) typically use fixed rule-based logic (for example, "turn on the air conditioner when the temperature exceeds 28°C") that cannot adapt to complex factors such as outdoor weather, dynamic changes in indoor occupancy density, and electricity price signals, which wastes energy and degrades comfort.

Severe data silos across multiple sources: the campus contains many types of equipment, including HVAC, lighting, water supply and drainage, elevators, and renewable energy generation, which use diverse data protocols (Modbus, BACnet, OPC UA, etc.) and differ significantly in sampling frequency and data format. Without an effective spatiotemporal alignment and fusion mechanism, it is difficult to form global situational awareness.

Discrete-continuous hybrid decision-making: building control couples discrete decisions (equipment start-up/shutdown and mode switching) with continuous decisions (temperature setpoints and frequency adjustment). Existing methods often optimize these separately in stages or rely on simplifying assumptions, which makes it difficult to reach a globally optimal solution within an acceptable computation delay and to meet real-time, second-scale response requirements.

Conflict between security and real-time performance in command issuance: critical equipment controls such as fire alarm linkage and main power switching require high reliability and security verification, whereas routine adjustments such as fine-tuning lighting brightness require efficient issuance. Existing unified transmission mechanisms cannot accommodate both needs and lack tiered security protection through edge-cloud collaboration.

Those skilled in the art therefore need a solution to these technical challenges.

Summary of the Invention

[0004] This invention aims to solve the technical problems of rigid control, delayed response, fragmented decision-making, and a one-size-fits-all distribution mechanism in existing building equipment management in parks, and provides a flexible control method based on DQN-PPO hybrid deep reinforcement learning and hierarchical security distribution.

[0005] To achieve the above-mentioned objectives of the present invention, the present invention provides a flexible management and control method for building equipment in a park, comprising:

[0006] S1. A flexible decision engine is constructed based on deep reinforcement learning. The engine adopts a dual-network collaborative architecture, including a deep Q-network (DQN) and a proximal policy optimization network (PPO). The dual-network collaborative architecture achieves collaborative output through a dynamic weight fusion mechanism, and the weight coefficients are adaptively adjusted according to the current decision-making situation.

[0007] S2. Based on the trigger type determination result, adaptively select the instruction generation mode: for active trigger types, generate pre-adjustment instructions based on predictive analysis; for passive trigger types, generate emergency instructions based on event response.

[0008] S3. Establish a tiered distribution strategy based on trigger type and instruction importance. Critical instructions adopt a triple confirmation mechanism of cloud generation, edge verification, and local execution. Regular instructions adopt an efficient mode of edge generation, batch distribution, and asynchronous confirmation. Emergency instructions adopt a rapid response mode of direct triggering by local contingency plans followed by cloud synchronization.

[0009] In a preferred embodiment of the above technical solution, S1 includes:

[0010] Flexible decision-making mechanism based on a DQN-PPO hybrid decision network: the state space of the DQN network includes an equipment operating state vector, an environmental parameter vector, an energy consumption index vector, and a user behavior vector, and the action space is a discretized combination of equipment start/stop and mode-switching options.

[0011] In a preferred embodiment of the above technical solution, S1 includes:

[0012] Establish a coupled constraint model of discrete and continuous actions to identify the dependency relationship between equipment start-up and shutdown states and operating parameters; generate a set of candidate discrete actions in the DQN decision stage, and input each candidate action into the PPO network for continuous parameter optimization.

[0013] In a preferred embodiment of the above technical solution, step S2 includes:

[0014] Trigger types are classified based on time characteristics, event characteristics, and prediction deviation characteristics; active triggering conditions include predicted load deviation exceeding a threshold, arrival of the planned execution time, and triggering at periodic optimization points; passive triggering conditions include abnormal event detection alarms, emergency demand response signals, and external system command access.

[0015] In a preferred embodiment of the above technical solution, step S2 includes:

[0016] For sudden events such as equipment failure, safety alarms, and emergency response, a pre-built contingency plan library is used at local edge nodes to directly trigger the corresponding control sequence after matching the event type. For common scenarios such as load fluctuations, environmental changes, and prediction deviations, a lightweight online optimization algorithm is used to quickly solve for the near-optimal strategy based on the current state snapshot. This layer balances response speed and optimization accuracy and supports rapid inference and deployment of the DQN-PPO network. For strategic scenarios such as day-ahead planning, peak-valley arbitrage, and maintenance arrangements, a solution method combining mixed integer programming and reinforcement learning is used.

[0017] In a preferred embodiment of the above technical solution, step S2 includes:

[0018] Generate a trigger type discrimination vector from the time, event, and prediction deviation characteristics.

[0019] The trigger is determined to be active when any of the following holds: the predicted load deviation exceeds the preset deviation threshold; the current time t falls within the tolerance window around the planned execution time; or a periodic optimization point arrives, that is, t reaches k times the optimization cycle, where k is a positive integer. For an active trigger, the complete DQN-PPO collaborative decision-making process is initiated and substantial computing resources are allocated for in-depth optimization.

[0020] The trigger is determined to be passive when a value collected in real time by a sensor exceeds its safety constraint range, when an emergency demand response signal is received, or when an external system issues a mandatory command via the API. A passive trigger first calls the preset emergency rule base, bypassing the deep decision network, and directly generates safety-oriented emergency commands.

[0021] Construct a trigger type confidence evaluation function $C = \sigma(W\mathbf{v} + b)$, where $\sigma$ is the Sigmoid activation function, $W$ is the weight matrix, $\mathbf{v}$ is the trigger type discrimination vector, and $b$ is the bias term. Trigger types then switch adaptively: when $C > 0.8$ the trigger type is confirmed; when $0.5 \le C \le 0.8$, a hybrid decision-making mode is activated that generates both proactive optimization instructions and passive emergency instructions, and the safety arbitrator selects which of them to execute.
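The trigger rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names, the reuse of the tolerance window for periodic optimization points, the single-row weight matrix, and the label returned when the confidence is below 0.5 (which the text does not specify) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class TriggerInputs:
    # All field names are hypothetical; units follow the deployment.
    load_deviation: float            # predicted load deviation
    deviation_threshold: float       # preset deviation threshold
    t: float                         # current time (s)
    planned_time: Optional[float]    # planned execution time, if scheduled
    tolerance: float                 # tolerance window (s)
    cycle: float                     # optimization cycle (s)
    sensor_value: float              # real-time sensor reading
    safe_range: tuple                # (low, high) safety constraint range
    dr_signal: bool = False          # emergency demand-response signal received
    api_command: bool = False        # mandatory command from an external system

def classify_trigger(x: TriggerInputs) -> str:
    """Return 'passive', 'active', or 'none'."""
    low, high = x.safe_range
    # Passive conditions are checked first: they bypass the deep decision network.
    if not (low <= x.sensor_value <= high) or x.dr_signal or x.api_command:
        return "passive"
    if abs(x.load_deviation) > x.deviation_threshold:
        return "active"
    if x.planned_time is not None and abs(x.t - x.planned_time) <= x.tolerance:
        return "active"
    # Periodic optimization point: t close to k * cycle for a positive integer k.
    k = round(x.t / x.cycle)
    if k >= 1 and abs(x.t - k * x.cycle) <= x.tolerance:
        return "active"
    return "none"

def trigger_confidence(v: Sequence[float], w: Sequence[float], b: float) -> float:
    """C = sigmoid(w . v + b); a single-row weight matrix yields a scalar score."""
    z = sum(wi * vi for wi, vi in zip(w, v)) + b
    return 1.0 / (1.0 + math.exp(-z))

def decision_mode(c: float) -> str:
    if c > 0.8:
        return "confirmed"   # trigger type is confirmed
    if c >= 0.5:
        return "hybrid"      # generate both instruction kinds, then arbitrate
    return "unresolved"      # behaviour below 0.5 is not specified in the source
```

Checking passive conditions before active ones mirrors the text's priority: a safety violation should never wait for the deeper optimization path.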

[0022] In a preferred embodiment of the above technical solution, step S3 includes:

[0023] Construct an instruction importance scoring model based on three factors: the risk level of the equipment operation, the economic value of the instruction, and its latency sensitivity.

[0024] After a critical instruction is generated, the cloud-based security verification module first performs a logical consistency check to verify that the instruction sequence satisfies the device interlock constraints. Once this check passes, the instruction is sent to the edge node, which performs an executability check against a local snapshot of the device states, verifying that the instruction's target state is reachable from the current device state. Finally, the local controller verifies the digital signature using the SM3 algorithm, and execution is permitted only if verification succeeds. If any step fails, a rollback mechanism is triggered and the cloud regenerates corrected instructions.
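The triple confirmation path for critical instructions can be sketched as three checks run in order, with rollback on the first failure. Assumptions: the interlock rule, the state-transition table, and the device names are hypothetical, and a keyed HMAC-SHA256 from Python's standard library stands in for the SM3-based signature (SM3 is a hash function, and its availability in `hashlib` depends on the local OpenSSL build).

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Instruction:
    device: str
    target_state: str
    signature: bytes = b""

# Hypothetical interlock: never command the chiller on while its cooling pump is off.
INTERLOCKS = [frozenset({("chiller", "on"), ("cooling_pump", "off")})]
# Hypothetical one-step reachability: current state -> allowed target states.
TRANSITIONS = {"off": {"standby"}, "standby": {"on", "off"}, "on": {"standby"}}

def sign(instr: Instruction, key: bytes) -> bytes:
    # Stand-in for the SM3-based signature described in the text.
    msg = f"{instr.device}:{instr.target_state}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def cloud_consistency_ok(seq: List[Instruction]) -> bool:
    """Cloud stage: the commanded set must not contain any interlocked pair."""
    commanded = {(i.device, i.target_state) for i in seq}
    return not any(rule <= commanded for rule in INTERLOCKS)

def edge_reachability_ok(seq: List[Instruction], snapshot: Dict[str, str]) -> bool:
    """Edge stage: each target state must be reachable from the snapshot state."""
    return all(i.target_state in TRANSITIONS.get(snapshot.get(i.device, ""), set())
               for i in seq)

def local_signature_ok(seq: List[Instruction], key: bytes) -> bool:
    """Local stage: constant-time comparison of each signature."""
    return all(hmac.compare_digest(i.signature, sign(i, key)) for i in seq)

def deliver_critical(seq: List[Instruction], snapshot: Dict[str, str], key: bytes) -> str:
    """Cloud check -> edge check -> local check; any failure triggers rollback."""
    if not cloud_consistency_ok(seq):
        return "rollback:cloud"
    if not edge_reachability_ok(seq, snapshot):
        return "rollback:edge"
    if not local_signature_ok(seq, key):
        return "rollback:local"
    return "executed"
```

Returning the failing stage makes it easy for the cloud to decide what to regenerate: an interlock violation needs a new sequence, while a reachability failure may only need a fresh device snapshot.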

[0025] In a preferred embodiment of the above technical solution, S3 includes:

[0026] Regular commands use edge-side batch delivery with asynchronous confirmation. They are generated locally at the edge node and delivered in batches, with a single communication frame carrying a set of commands for multiple devices and published to the device topics via the MQTT protocol. After execution, each device reports its result asynchronously, and the edge node collects the confirmations within a preset time window; if a confirmation is missing, retransmission is triggered automatically, up to a preset maximum number of retransmissions.
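A minimal sketch of the batch-and-retry loop for regular commands, assuming a callback that reports whether each device's asynchronous confirmation arrived within the window for a given attempt. The callback, the retry limit of 3, and the device names are illustrative assumptions, and no real MQTT client is used.

```python
from typing import Callable, Dict

def deliver_batch(
    commands: Dict[str, dict],
    ack_in_window: Callable[[str, int], bool],
    max_retries: int = 3,
) -> Dict[str, str]:
    """Send commands for many devices in one frame, then resend only unconfirmed ones.

    ack_in_window(device, attempt) simulates whether the device's asynchronous
    result report arrives inside the confirmation window for that attempt.
    Attempt 0 is the original send; attempts 1..max_retries are retransmissions.
    """
    pending = set(commands)
    status: Dict[str, str] = {}
    for attempt in range(max_retries + 1):
        if not pending:
            break
        frame = {dev: commands[dev] for dev in sorted(pending)}  # one frame, many devices
        # In a real deployment the frame would be published to device topics over MQTT.
        confirmed = {dev for dev in frame if ack_in_window(dev, attempt)}
        for dev in confirmed:
            status[dev] = f"confirmed after attempt {attempt}"
        pending -= confirmed
    for dev in pending:
        status[dev] = "failed"  # escalation policy is not specified in the source
    return status
```

Resending only the still-pending devices keeps retransmission frames small, which matters when most of a batch confirms on the first attempt.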

[0027] Emergency commands directly trigger local contingency plans. For emergency scenarios such as fire-fighting linkage and safety over-limit protection, a local emergency command library is pre-configured. When the triggering conditions are met, the edge node calls the pre-stored instructions directly without waiting for cloud computation, so the execution path is a local closed loop from perception through decision-making to execution. Execution records are then synchronized to the cloud to support post-event auditing and strategy optimization.
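The local emergency path can be sketched as a dictionary lookup followed by immediate actuation, with the execution record queued for later cloud synchronization. The plan contents, event names, and the `actuate` callback are hypothetical; the 500 ms budget is the figure the document gives for emergency response.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical pre-configured plan library: event type -> ordered control sequence.
PLAN_LIBRARY: Dict[str, List[Tuple[str, str]]] = {
    "fire_alarm": [("ahu_fan", "off"), ("smoke_exhaust", "on"), ("elevator", "recall")],
    "overcurrent": [("feeder_3_breaker", "trip")],
}
LATENCY_BUDGET_MS = 500.0  # emergency response latency target stated in the document

cloud_sync_queue: Deque[dict] = deque()  # records uploaded later for auditing

def handle_emergency(event_type: str, actuate: Callable[[str, str], None]) -> bool:
    """Run the matched plan locally; queue an execution record for cloud sync."""
    plan = PLAN_LIBRARY.get(event_type)
    if plan is None:
        return False  # no match: other handling applies (not specified here)
    start = time.monotonic()
    for device, action in plan:
        actuate(device, action)  # local closed loop, no cloud round trip
    elapsed_ms = (time.monotonic() - start) * 1000.0
    cloud_sync_queue.append({
        "event": event_type,
        "steps": list(plan),
        "elapsed_ms": elapsed_ms,
        "within_budget": elapsed_ms < LATENCY_BUDGET_MS,
    })
    return True
```

Because synchronization is deferred to a queue, a slow or unavailable cloud link cannot delay the actuation itself.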

[0028] The present invention also discloses a computer system, comprising:

[0029] a processor; and a memory for storing processor-executable instructions;

[0030] The processor is configured to implement the flexible management and control method for building equipment in a park when executing the executable instructions.

[0031] In summary, due to the adoption of the above technical solution, the beneficial effects of the present invention are:

[0032] By constructing a spatiotemporal correlation feature fusion model, heterogeneous sensor data is processed uniformly to extract equipment load characteristics, energy consumption patterns, and environmental response laws, forming a high-dimensional situational awareness representation. A DQN-PPO dual-network collaborative architecture is proposed, with DQN handling discrete equipment start-up and shutdown decisions and PPO handling continuous parameter adjustments. Joint optimization of the hybrid action space is achieved through a gating fusion unit. A dual-judgment mechanism, employing both proactive predictive optimization triggering and passive event response triggering, dynamically selects decision paths and computational resource allocation strategies based on trigger type, balancing optimization depth and response speed. A hierarchical instruction distribution strategy based on instruction importance is constructed, implementing a differentiated security verification mechanism, and continuous evolution of the decision strategy is achieved through experience replay and incremental learning.

[0033] Deep reinforcement learning dynamically optimizes equipment operating parameters, improving overall energy efficiency and significantly reducing peak-hour grid load. A multi-objective reward function balances energy efficiency and comfort, lowering the Predicted Percentage of Dissatisfied (PPD). In passively triggered scenarios, the local contingency plan triggering mechanism keeps emergency response latency below 500 ms, meeting the safety requirements of fire-fighting linkage and equipment protection. In actively triggered scenarios, edge-cloud collaboration shortens the optimization cycle. The tiered distribution mechanism and digital signature technology ensure the integrity and non-repudiation of critical control commands and guard against man-in-the-middle and replay attacks. A closed-loop feedback optimization mechanism supports online incremental training of the policy network, accumulating operational data in an experience replay pool so that decision-making strategies adapt to equipment aging, seasonal changes, and evolving usage habits instead of becoming rigid.

[0034] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0035] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:

[0036] Figure 1 is a schematic diagram of the overall architecture of the flexible management and control method for building equipment in an industrial park according to the present invention;

[0037] Figure 2 is a diagram of the DQN-PPO dual-network collaborative architecture of the flexible decision engine;

[0038] Figure 3 is a diagram of the multi-time-scale hierarchical scheduling architecture;

[0039] Figure 4 is a flowchart of trigger type determination and tiered security distribution;

[0040] Figure 5 is a schematic diagram of the closed-loop feedback optimization mechanism;

[0041] Figure 6 is a comparison chart of the indoor temperature control curve and the energy consumption curve for a typical summer day (August 15, 2024) in an office park, according to a specific embodiment.

Detailed Implementation

[0042] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0043] As shown in Figure 1, this invention discloses a flexible management and control system and working method for building equipment in a park, comprising the following:

[0044] By deploying a multi-type sensor network in the park buildings, real-time equipment operation status data is collected. The sensor network includes temperature and humidity sensors, power monitoring devices, access control systems, WiFi probes and smart meters. The data acquisition frequency is dynamically configured according to the data type. The sampling frequency of key equipment status data is not less than 1Hz, and the sampling frequency of environmental parameter data is 0.1-1Hz, forming a multi-source heterogeneous raw data stream.

[0045] The collected raw data undergoes standardized preprocessing, including data cleaning, outlier removal, time alignment, and format unification. A data fusion model based on spatiotemporal correlation features is constructed to perform multi-dimensional correlation analysis on equipment operation data, environmental data, energy consumption data, and user behavior data, extracting equipment load characteristics, energy consumption patterns, and environmental response laws to form a unified situational awareness representation vector.
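The time-alignment step can be illustrated with a zero-order hold onto a shared time grid, which lets, for example, a 1 Hz equipment stream and a 0.1 Hz environmental stream be compared sample by sample. The document does not name an alignment method; zero-order hold, the stream names, and the grid are assumptions chosen for brevity.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def zoh_align(samples: Sequence[Sample], grid: Sequence[float]) -> List[Optional[float]]:
    """Hold the most recent value at or before each grid time (None before the first sample)."""
    times = [t for t, _ in samples]  # samples must be sorted by timestamp
    out: List[Optional[float]] = []
    for g in grid:
        i = bisect_right(times, g) - 1
        out.append(samples[i][1] if i >= 0 else None)
    return out

def fuse_streams(streams: Dict[str, Sequence[Sample]],
                 grid: Sequence[float]) -> List[Dict[str, Optional[float]]]:
    """Build one record per grid time containing every stream's aligned value."""
    aligned = {name: zoh_align(s, grid) for name, s in streams.items()}
    return [{name: aligned[name][k] for name in streams} for k in range(len(grid))]
```

A zero-order hold never uses future samples, so it is safe for online use; interpolation would give smoother values but requires waiting for the next sample.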

[0046] As shown in Figure 2, in step S1 a flexible decision engine is constructed based on deep reinforcement learning. The engine adopts a dual-network collaborative architecture consisting of a deep Q-network (DQN) and a proximal policy optimization network (PPO). The DQN network optimizes decisions in the discrete action space, and the PPO network optimizes the policy in the continuous action space. The two networks produce a collaborative output through a dynamic weight fusion mechanism whose weight coefficients are adjusted adaptively according to the current decision context. The flexible decision engine includes a flexible decision-making mechanism based on a DQN-PPO hybrid decision network: the state space of the DQN network includes equipment operating state vectors, environmental parameter vectors, energy consumption index vectors, and user behavior vectors; the action space is a discrete combination of equipment start/stop and mode-switching options; and the reward function is a weighted sum of the energy efficiency improvement rate, comfort compliance rate, carbon emission reduction rate, and equipment lifespan loss reduction rate. The PPO network shares the underlying state representation with the DQN; its action space consists of continuous values for the temperature setpoint, frequency, valve opening, and power allocation ratio; and its reward function design incorporates a multi-objective Pareto front tracking mechanism. The outputs of the two networks are dynamically weighted by a gating fusion unit, whose gating signal is determined jointly by the urgency of the current situation, the optimization complexity, and historical decision performance.

[0047] Establish a coupling constraint model for discrete and continuous actions to identify the dependency relationship between equipment start / stop status and operating parameters; generate a set of candidate discrete actions in the DQN decision stage, and input each candidate action into the PPO network for continuous parameter optimization, forming a hierarchical optimization process of "discrete decision-continuous refinement-joint evaluation"; finally output the optimal composite action instruction that satisfies the coupling constraints.
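The "discrete decision-continuous refinement-joint evaluation" loop can be sketched as follows. This is a toy under explicit assumptions: a grid search over a few setpoint offsets and fan frequencies stands in for the PPO network, the Q-values and scoring function are hypothetical, and the only coupling constraint modeled is that a device switched off takes no continuous parameters.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

def top_k_actions(q_values: Dict[str, float], k: int) -> List[str]:
    """DQN stage: keep the k discrete actions with the highest Q-values."""
    return sorted(q_values, key=q_values.get, reverse=True)[:k]

def refine(action: str, score: Callable[[str, Params], float]) -> Tuple[Params, float]:
    """Continuous stage (grid search standing in for PPO), subject to coupling."""
    if action.endswith("_off"):
        # Coupling constraint: an off device takes no continuous setpoints.
        return {}, score(action, {})
    best_params: Params = {}
    best_score = float("-inf")
    for offset, fan_hz in product([-2, -1, 0, 1, 2], [30, 40, 50, 60]):
        params = {"setpoint_offset_c": offset, "fan_hz": fan_hz}
        s = score(action, params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

def hybrid_decision(q_values: Dict[str, float],
                    score: Callable[[str, Params], float], k: int = 2):
    """Joint evaluation: pick the (discrete, continuous) pair with the best score."""
    candidates = [(a, *refine(a, score)) for a in top_k_actions(q_values, k)]
    return max(candidates, key=lambda c: c[2])
```

Refining only the top-k discrete candidates keeps the continuous search bounded while still letting a lower-ranked discrete action win if its refined parameters score better.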

[0048] As shown in Figure 3, in step S2 an instruction generation mode is selected adaptively based on the trigger type determination result: for active trigger types, pre-adjustment instructions are generated based on predictive analysis; for passive trigger types, emergency instructions are generated based on event response. Each instruction contains the equipment identifier, action type, target parameters, execution sequence, and safety constraints, where the action type may combine discrete switching actions with continuous adjustment actions. A trigger type identification module classifies triggers based on time characteristics, event characteristics, and prediction deviation characteristics. Active trigger conditions include predicted load deviation exceeding a threshold, arrival of the planned execution time, and arrival of a periodic optimization point. Passive trigger conditions include abnormal event detection alarms, emergency demand response signals, and instructions received from external systems. Different trigger types lead to different decision paths, computing resource allocations, and instruction formats.

[0049] As shown in Figure 4, in step S3 a hierarchical distribution strategy based on trigger type and instruction importance is established. Critical instructions adopt a triple confirmation mechanism of cloud generation, edge verification, and local execution. Regular instructions adopt an efficient mode of "edge generation, batch distribution, and asynchronous confirmation". Emergency instructions adopt a rapid response mode of direct triggering by local contingency plans followed by cloud synchronization. All instructions are digitally signed using the SM3 algorithm and are encrypted and integrity-checked during transmission.

[0050] Monitor the execution status and actual effect of instructions and quantify the control deviation; construct an experience replay pool that stores decision-execution-effect samples to support incremental training and online updating of the policy network; and dynamically adjust the exploration-exploitation balance parameters based on the feedback to achieve continuous evolution of the decision-making strategy.

[0051] The state space of the deep reinforcement learning model comprises four types of state vectors: equipment operating status, including the start/stop status, operating mode, current power, cumulative runtime, and health index of each device; environmental parameters, including indoor and outdoor temperature and humidity, light intensity, occupant density, CO2 concentration, and the outdoor weather forecast; energy consumption indicators, including real-time power, cumulative electricity consumption, itemized energy consumption shares, peak and off-peak distribution, and energy efficiency benchmarking results; and user behavior, including historical energy consumption habits, reservation information, comfort preference settings, and user-initiated adjustment records. After encoding by an embedding layer, the four types of state vectors are fused into a unified state representation through a spatiotemporal attention mechanism.

[0052] The action space is defined as follows. The discrete action space includes single-device start/stop, multi-device combined start/stop, operating mode switching, linkage scene activation, and system reset operations; its size grows exponentially with the number of devices, so an action masking mechanism is used to block invalid actions. The continuous action space includes temperature setpoint adjustment (±2℃), fan frequency adjustment (30-60 Hz), valve opening adjustment (0-100%), lighting brightness adjustment (10-100%), and power distribution ratio adjustment. The amplitude of each action dimension is limited by the physical constraints and safety boundaries of the equipment.
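Action masking and bounded continuous outputs can be sketched as below, using the tanh mapping the document mentions for the PPO output. The bounds are the ranges stated in the text; the power distribution ratio is omitted because its range is not given, and the dimension names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Ranges taken from the text; dimension names are illustrative.
CONTINUOUS_BOUNDS: Dict[str, tuple] = {
    "setpoint_offset_c": (-2.0, 2.0),
    "fan_hz": (30.0, 60.0),
    "valve_pct": (0.0, 100.0),
    "light_pct": (10.0, 100.0),
}

def masked_greedy(q_values: Sequence[float], valid: Sequence[bool]) -> int:
    """Pick the best discrete action after setting invalid actions' Q to -inf."""
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid)]
    return max(range(len(masked)), key=masked.__getitem__)

def squash(raw: Dict[str, float]) -> Dict[str, float]:
    """Map unbounded policy outputs into each dimension's range via tanh."""
    out = {}
    for name, x in raw.items():
        lo, hi = CONTINUOUS_BOUNDS[name]
        out[name] = lo + 0.5 * (math.tanh(x) + 1.0) * (hi - lo)
    return out

def n_onoff_combinations(n_devices: int) -> int:
    """Start/stop combinations alone grow as 2**n, hence the need for masking."""
    return 2 ** n_devices
```

Masking before the argmax guarantees an interlocked or unavailable action is never selected, even if the network assigns it a high Q-value.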

[0053] The reward function takes a multi-objective weighted-sum form, $R = w_1 R_{\mathrm{eff}} + w_2 R_{\mathrm{comf}} + w_3 R_{\mathrm{carb}} + w_4 R_{\mathrm{life}}$, where $R_{\mathrm{eff}}$ is the energy efficiency improvement rate relative to a historical baseline; $R_{\mathrm{comf}}$ is the comfort compliance rate, a composite score of temperature, humidity, and air quality; $R_{\mathrm{carb}}$ is the carbon emission reduction rate relative to the quota benchmark; and $R_{\mathrm{life}}$ is the equipment lifespan loss reduction rate, achieved by suppressing start-stop frequency and load fluctuations. $w_1$, $w_2$, $w_3$, $w_4$ are the weighting coefficients for energy efficiency, comfort, carbon emissions, and equipment lifespan, respectively, and satisfy $\sum_{i=1}^{4} w_i = 1$. The weights are adjusted dynamically according to the park's operational goals: in energy-saving priority mode, $w_1=0.4$, $w_2=0.2$, $w_3=0.3$, $w_4=0.1$; in comfort priority mode, $w_1=0.2$, $w_2=0.4$, $w_3=0.2$, $w_4=0.2$.
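A worked version of the weighted reward, using the two weight presets given in the text. The component values below are made up for illustration only; they are not results from the source.

```python
from typing import Dict, Tuple

# Weight presets (w1..w4) from the text: efficiency, comfort, carbon, lifespan.
WEIGHT_PRESETS: Dict[str, Tuple[float, float, float, float]] = {
    "energy_saving": (0.4, 0.2, 0.3, 0.1),
    "comfort": (0.2, 0.4, 0.2, 0.2),
}

def reward(r_eff: float, r_comf: float, r_carb: float, r_life: float,
           mode: str = "energy_saving") -> float:
    """R = w1*R_eff + w2*R_comf + w3*R_carb + w4*R_life, with sum(w) == 1."""
    w = WEIGHT_PRESETS[mode]
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(wi * ri for wi, ri in zip(w, (r_eff, r_comf, r_carb, r_life)))

# Illustrative component values (not from the source):
components = (0.10, 0.90, 0.05, 0.80)
r_saving = reward(*components, mode="energy_saving")   # 0.04 + 0.18 + 0.015 + 0.08
r_comfort = reward(*components, mode="comfort")        # 0.02 + 0.36 + 0.01 + 0.16
```

With these illustrative inputs, a state that is comfortable but only modestly efficient scores about 0.315 under the energy-saving preset and 0.55 under the comfort preset, which shows how switching presets reorders the same states.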

[0054] In a preferred embodiment of the above technical solution, the dual-network collaborative architecture includes:

[0055] The DQN network adopts a Dueling DQN structure, which separately estimates the state value function and the action advantage function. The network input is the fused state representation, and the output is a Q-value estimate for each discrete action. The experience replay pool holds $10^6$ transitions and uses prioritized experience replay with sampling probability $P(i) = p_i^{\alpha} / \sum_k p_k^{\alpha}$, where $p_i = |\delta_i| + \epsilon$, $\delta_i$ is the TD error of transition $i$, $\epsilon = 0.01$ is a small constant that prevents zero priority, and $\alpha = 0.6$ is the priority exponent, so the sampling priority increases with the absolute TD error. The target network copies the main network parameters every 1000 steps, and the Double DQN mechanism is used to mitigate Q-value overestimation.
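The prioritized sampling rule can be sketched with the stated $\epsilon = 0.01$ and $\alpha = 0.6$. This linear-time version is for clarity only; a buffer of $10^6$ transitions would typically use a sum-tree for logarithmic-time sampling, and importance-sampling correction weights (not mentioned in the text) are omitted.

```python
import random
from typing import List, Sequence

def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 0.01) -> List[float]:
    """P(i) = p_i**alpha / sum_k p_k**alpha with p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(td_errors: Sequence[float], batch_size: int,
                 rng: random.Random) -> List[int]:
    """Draw transition indices with replacement according to P(i)."""
    probs = per_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

The $\epsilon$ term keeps transitions with zero TD error sampleable, and $\alpha < 1$ flattens the distribution so high-error transitions do not completely dominate each batch.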

[0056] The PPO network adopts an Actor-Critic architecture: the policy network outputs the mean and standard deviation of a Gaussian distribution, and the value network outputs the state value estimate. Generalized advantage estimation (GAE) is used to compute the advantage function with λ=0.95 and γ=0.99, where λ is the exponential decay coefficient of GAE and γ is the discount factor. The PPO clipped surrogate objective is used with clipping coefficient ε=0.2. Each batch of 2048 samples is reused for 10 optimization epochs. The action output is passed through a tanh activation and mapped to the valid action range, and a log-probability correction for the invertible tanh transform is applied.
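The advantage and objective terms can be sketched in plain Python for a single trajectory. These are the standard GAE and PPO-clip formulas with the stated hyperparameters; network architecture, batching, and the optimizer are omitted, and the 1e-6 stabilizer in the tanh correction is a common convention rather than something the text specifies.

```python
import math
from typing import List, Sequence

def gae(rewards: Sequence[float], values: Sequence[float], dones: Sequence[bool],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation; values has one extra bootstrap entry."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages

def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO per-sample objective: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def tanh_gaussian_logprob(u: float, mean: float, std: float) -> float:
    """log pi(a) for a = tanh(u), u ~ N(mean, std^2): Gaussian term minus log|da/du|."""
    log_normal = -0.5 * ((u - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
    return log_normal - math.log(1.0 - math.tanh(u) ** 2 + 1e-6)
```

The `min` in the clipped objective makes the update pessimistic: once the probability ratio leaves $[1-\epsilon, 1+\epsilon]$ in the direction that would increase the objective, the gradient through the ratio is cut off.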

[0057] Dynamic weight fusion mechanism: a context-aware gating network is constructed. Its inputs are a current-state urgency score based on prediction deviation and constraint violation, an optimization-problem complexity score based on the decision dimension and the number of coupled constraints, and a historical decision performance score based on recent cumulative reward. The gating network outputs a fusion weight α∈[0,1] for the DQN and PPO network outputs, and the final composite action instruction is $a = \alpha\, \mathrm{Emb}(a_d) + (1-\alpha)\, a_c$, where $a_d$ is the discrete action command vector output by the DQN network, $\mathrm{Emb}(\cdot)$ embeds and encodes the discrete action so that its dimension matches that of the continuous actions, and $a_c$ is the continuous action parameter vector output by the PPO network. The gating network and the main policy network are trained jointly to maximize long-term cumulative reward.
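The gating fusion can be sketched assuming it takes the form $a = \alpha\,\mathrm{Emb}(a_d) + (1-\alpha)\,a_c$ with a logistic gate over the three context scores. The gate weights, bias, and embedding table below are hypothetical placeholders; in the document they are learned jointly with the policy network.

```python
import math
from typing import List, Sequence

# Hypothetical gate parameters; the text trains these jointly with the policy.
GATE_WEIGHTS = (1.5, -1.0, 0.5)   # urgency, complexity, historical performance
GATE_BIAS = 0.0

def gate_alpha(urgency: float, complexity: float, performance: float) -> float:
    """Context-aware gate producing alpha in (0, 1) via a logistic unit."""
    z = (GATE_WEIGHTS[0] * urgency + GATE_WEIGHTS[1] * complexity
         + GATE_WEIGHTS[2] * performance + GATE_BIAS)
    return 1.0 / (1.0 + math.exp(-z))

def embed_discrete(index: int, table: Sequence[Sequence[float]]) -> List[float]:
    """Look up the embedding that lifts a discrete action to the continuous dimension."""
    return list(table[index])

def fuse(alpha: float, a_d_emb: Sequence[float], a_c: Sequence[float]) -> List[float]:
    """a = alpha * Emb(a_d) + (1 - alpha) * a_c, element-wise."""
    if len(a_d_emb) != len(a_c):
        raise ValueError("embedding and continuous action must have equal length")
    return [alpha * d + (1.0 - alpha) * c for d, c in zip(a_d_emb, a_c)]
```

Because α is a smooth function of its inputs, the gate can be trained by gradient descent together with the policy, which is what joint training toward long-term cumulative reward requires.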

[0058] As shown in Figure 4, in a preferred embodiment of the above technical solution, the trigger type determination process uses an intelligent scheduling strategy executed on a multi-time-scale layered architecture:

[0059] The second-level real-time response layer, with a response cycle of 1-10 seconds, uses a pre-built rule base combined with a lightweight neural network. It is suitable for scenarios such as rapid isolation of device faults, protection against safety over-limits, and rapid execution of demand responses, and is deployed on edge nodes.

[0060] Minute-level optimization and adjustment layer: with a response cycle of 1-15 minutes, this layer adopts full DQN-PPO collaborative inference. It is suited to scenarios such as load-tracking optimization, real-time energy efficiency improvement, and dynamic comfort adjustment, and is deployed on edge-cloud collaborative nodes.

[0061] Hour-level planning and scheduling layer: with a response cycle of 1-24 hours, this layer adopts a hybrid of MPC, MILP, and reinforcement learning. It is suited to day-ahead unit commitment, energy procurement strategy, and maintenance plan orchestration, and is deployed in the cloud.

[0062] For equipment failures, safety alarms, and emergency response events, the response latency must be below 500 ms. A pre-built contingency plan library on the local edge nodes is used: once the event type is matched, the corresponding control sequence is triggered directly. During plan execution, data is reported to the cloud in parallel to support remote manual takeover. This layer prioritizes system safety and personnel comfort; economic objectives are secondary.

[0063] For common scenarios such as load fluctuations, environmental changes, and prediction deviations, the optimization cycle is 5-15 minutes. A lightweight online optimization algorithm quickly solves for a near-optimal strategy from the current state snapshot. This layer balances response speed against optimization accuracy and supports fast inference deployment of the DQN-PPO network; its optimization results are distributed in batches to edge execution nodes after security verification.

[0064] For strategic scenarios such as day-ahead planning, peak-valley arbitrage, and maintenance arrangements, the optimization cycle is 1-24 hours; a solution method combining mixed integer programming and reinforcement learning is adopted, taking into account multi-time period coupling constraints and uncertainties; this layer pursues global optimization, has high computational complexity, and is deployed on high-performance computing resources in the cloud; the optimization results are decomposed into executable time period instruction sequences, which are distributed to the edge and local layers level by level.
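The routing of events to the three layers described above can be sketched as a lookup table. The event category names below are hypothetical labels derived from the scenarios listed in [0062] to [0064]; a deployed system would classify events with the trigger type determination described later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    cycle: str
    deployment: str
    method: str

REALTIME = Layer("second-level", "1-10 s", "edge", "rule base + lightweight NN")
MINUTE = Layer("minute-level", "1-15 min", "edge-cloud", "DQN-PPO inference")
HOUR = Layer("hour-level", "1-24 h", "cloud", "MPC + MILP + RL")

# Hypothetical event categories mapped to the layer that handles them.
ROUTES = {
    "equipment_failure": REALTIME, "safety_alarm": REALTIME,
    "emergency_response": REALTIME,
    "load_fluctuation": MINUTE, "environment_change": MINUTE,
    "prediction_deviation": MINUTE,
    "day_ahead_plan": HOUR, "peak_valley_arbitrage": HOUR,
    "maintenance_plan": HOUR,
}

def route(event_type: str) -> Layer:
    """Return the scheduling layer for an event, rejecting unknown categories."""
    try:
        return ROUTES[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type}") from None
```

Rejecting unknown categories, rather than silently defaulting, is a design choice that keeps misclassified safety events from landing in a slow layer unnoticed.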

[0065] In the preferred embodiment of the above technical solution, the conflict resolution mechanism includes:

[0066] Construct an equipment association map to identify energy flow coupling in combined cooling, heating and power systems, air-water balance constraints in HVAC systems, and light-heat coupling relationships in lighting and shading systems; transform physical coupling constraints into mathematical equations or inequalities and embed them into the optimization solution process; for strongly coupled equipment groups, adopt a joint decision-making strategy rather than an independent optimization strategy.

[0067] A multi-dimensional priority assessment model is established. Its assessment factors are user type (key area or general area), time sensitivity (production or non-production period), economic value (high or low electricity price period), and safety level (fire protection system or general equipment). Each factor is normalized, and the factors are weighted and summed to produce a dynamic priority score. When resource conflicts occur, requests are served in descending order of score, and alternative strategies such as delayed execution or partial satisfaction are applied to low-priority requests.
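A minimal sketch of the scoring and conflict-resolution order described above. The factor weights are hypothetical (the text does not give them), and each factor is assumed to be already normalized to [0, 1], for example 1 for a key area and 0 for a general area.

```python
# Hypothetical weights; the source does not specify values.
WEIGHTS = {"user": 0.3, "time": 0.2, "economic": 0.2, "safety": 0.3}

def normalize(value, lo, hi):
    """Min-max normalization to [0, 1]; a constant range maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def priority_score(req):
    """Weighted sum of normalized factors."""
    return sum(WEIGHTS[k] * req[k] for k in WEIGHTS)

def serve(requests, capacity):
    """Serve the highest scores first; the remainder is deferred
    (delayed execution or partial satisfaction)."""
    ranked = sorted(requests, key=priority_score, reverse=True)
    return ranked[:capacity], ranked[capacity:]
```

With capacity for one request, a key-area fire-protection request during production hours outranks a general-area request that is only economically valuable.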

[0068] For multi-device collaborative scenarios, control commands are organized into a directed acyclic graph (DAG) with priority, where nodes represent control actions and edges represent execution dependencies. Topological sorting and critical path analysis are used to identify subsets of commands that can be executed in parallel and those that must be executed sequentially. The optimization objective is to minimize the total execution time and maximize resource utilization, with constraints including device response timing, state transition safety intervals, and manual operation windows.
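The DAG analysis above can be sketched with the standard library's `graphlib`. The command names and durations in the usage example are hypothetical; the sketch groups commands into stages that can run in parallel and finds the duration-weighted critical path, which bounds the total execution time.

```python
from graphlib import TopologicalSorter

def parallel_stages(deps):
    """deps maps each control action to the set of actions it must wait for.
    Returns successive groups of actions whose dependencies are all satisfied."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    stages = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        stages.append(ready)
        ts.done(*ready)
    return stages

def critical_path(deps, duration):
    """Longest duration-weighted dependency chain and its total length."""
    finish, prev = {}, {}
    for node in TopologicalSorter(deps).static_order():
        preds = deps.get(node, ())
        best = max(preds, key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0.0
        finish[node] = start + duration[node]
        prev[node] = best
    end = max(finish, key=finish.get)
    path = [end]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return finish[end], path[::-1]
```

Usage: with `deps = {"start_pump": {"open_valve"}, "start_chiller": {"start_pump", "start_tower_fan"}}`, the valve and tower fan commands form the first parallel stage, and the chiller start waits on the longer valve-then-pump chain.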

[0069] In a preferred embodiment of the above technical solution, the trigger type determination includes:

[0070] Multi-dimensional features are extracted and encoded: time features (current time, weekday/holiday flag, peak/valley electricity price period label), event features (equipment alarm level, user reservation records, external system command type, prediction deviation amplitude), and system state features (load deviation rate, energy storage SOC, renewable energy absorption gap). The 4-dimensional time features, 64-dimensional event features, and 60-dimensional system state features are standardized and concatenated into a 128-dimensional vector, which is input to a multilayer perceptron (MLP) encoder to generate the trigger type discrimination vector $\mathbf{h} \in \mathbb{R}^{d}$, where $d$ is the feature dimension.
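A minimal sketch of the input assembly and one MLP layer. The per-feature means and standard deviations are assumed to come from historical data (hypothetical here), and a single dense ReLU layer stands in for the full encoder, whose depth and width the text does not specify.

```python
def standardize(xs, means, stds):
    """z-score each feature; a zero standard deviation maps the feature to 0."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(xs, means, stds)]

def build_trigger_input(time_f, event_f, state_f, stats):
    """Concatenate 4 + 64 + 60 features into the 128-dim MLP input."""
    if (len(time_f), len(event_f), len(state_f)) != (4, 64, 60):
        raise ValueError("expected 4 + 64 + 60 feature dimensions")
    raw = list(time_f) + list(event_f) + list(state_f)
    return standardize(raw, stats["mean"], stats["std"])

def dense_relu(x, W, b):
    """One MLP layer: relu(W x + b), with W given as a list of rows."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```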

[0071] The conditions for generating active triggers are as follows:

[0072] The trigger is determined to be of the active type when the predicted load deviation $\Delta P = |P_{\mathrm{pred}} - P_{\mathrm{real}}| / P_{\mathrm{rated}} > \theta$, where $\theta$ is the preset deviation threshold (default 0.15); or when the current time $t$ reaches a planned execution time $t_{\mathrm{plan}}$, i.e. $|t - t_{\mathrm{plan}}| \le \Delta t$, where $\Delta t$ is the tolerance window (default 30 seconds); or when a periodic optimization instant $t = kT_{\mathrm{opt}}$ arrives, where $k$ is a positive integer and $T_{\mathrm{opt}}$ is the optimization cycle (default 5 minutes). Here $P_{\mathrm{pred}}$ is the predicted power output by the load forecasting model, $P_{\mathrm{real}}$ is the actual power collected in real time by the smart meter, and $P_{\mathrm{rated}}$ is the rated power of the equipment. An active trigger initiates the complete DQN-PPO collaborative decision-making process and allocates substantial computing resources to in-depth optimization.
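The three active-trigger conditions above can be sketched as a single predicate. This assumes the deviation is normalized by rated power as written above, and that time is measured in whole seconds so the periodic check `t = k * T_opt` can be an exact modulo test.

```python
def is_active_trigger(p_pred, p_real, p_rated, t, t_plan=None,
                      theta=0.15, tol=30.0, period=300.0):
    """Active trigger check; t and t_plan in seconds (assumed units)."""
    # Condition 1: normalized load forecast deviation exceeds the threshold
    deviation = abs(p_pred - p_real) / p_rated
    if deviation > theta:
        return True
    # Condition 2: within the tolerance window of a planned execution time
    if t_plan is not None and abs(t - t_plan) <= tol:
        return True
    # Condition 3: periodic optimization instant t = k * period, k a positive integer
    return t >= period and t % period == 0
```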

[0073] The passive trigger condition is generated as follows:

[0074] The trigger is determined to be of the passive type when a real-time sensor value $x(t)$ leaves the safety constraint range $[x_{\min}, x_{\max}]$, i.e. $x(t) < x_{\min}$ or $x(t) > x_{\max}$; when an emergency demand response signal is received; or when an external system issues a mandatory command through the API. A passive trigger first invokes the preset emergency rule base, bypassing the deep decision network, and directly generates safety-oriented emergency commands, with response latency kept within 500 milliseconds.
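A minimal sketch of the passive-trigger check. The sensor names and limits are hypothetical; the room temperature band of 20 to 28℃ in the usage example is borrowed from Example 1.

```python
def is_passive_trigger(readings, limits, dr_signal=False, api_mandatory=False):
    """readings: sensor -> value; limits: sensor -> (low, high) safety range."""
    for name, value in readings.items():
        low, high = limits[name]
        if value < low or value > high:
            return True
    # Emergency demand response signal or mandatory external API command
    return dr_signal or api_mandatory
```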

[0075] A trigger type confidence evaluation function $c = \sigma(W\mathbf{h} + b)$ is constructed, where $\sigma$ is the Sigmoid activation function, $\mathbf{h}$ is the trigger type discrimination vector, $W$ is the weight matrix of the trigger type discriminator, and $b$ is the bias term. The trigger type is then switched adaptively: when $c > 0.8$, the trigger type is confirmed; when $0.5 \le c \le 0.8$, a hybrid decision-making mode is activated, in which proactive optimization instructions and passive emergency instructions are generated simultaneously and executed after selection by the safety arbitrator.
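A sketch of the confidence function and the mode switch above, assuming a binary discriminator so that the weight matrix reduces to a single row vector. The text does not state what happens when c < 0.5, so the sketch returns `None` for that case rather than guessing.

```python
import math

def trigger_confidence(h, w, b):
    """c = sigmoid(w . h + b) for a single-row weight matrix w."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))

def decision_mode(c):
    if c > 0.8:
        return "confirmed"
    if c >= 0.5:
        return "hybrid"  # generate both kinds of instruction; safety arbitrator selects
    return None          # behavior below 0.5 is not specified in the text
```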

[0076] In a preferred embodiment of the above technical solution, the hierarchical security distribution mechanism includes:

[0077] An instruction importance scoring model $I = w_1\tilde{R} + w_2\tilde{V} + w_3\tilde{L}$ is constructed, where $R$ is the equipment operation risk level (dimensionless, 0 to 10), $V$ is the economic value of the instruction (yuan/hour), $L$ is the latency sensitivity (dimensionless, 0 to 1), the tilde denotes each factor's normalized dimensionless score on a 0 to 10 scale, and $w_1$, $w_2$, $w_3$ are the weighting coefficients for risk level, economic value, and latency sensitivity, satisfying $w_1 + w_2 + w_3 = 1$ with defaults 0.4, 0.3, and 0.3. The economic value is linearly normalized to [0,10], and the risk level and latency sensitivity are rescaled to 0 to 10 scores; the weighted sum gives a dimensionless instruction importance score. Based on the score, instructions are classified as critical ($I \ge 8$), standard ($3 \le I < 8$), or emergency ($I \ge 9$ and $L \ge 0.9$);
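A minimal sketch of the scoring and classification rules above. The normalization ceiling `value_max` for the economic value is a hypothetical parameter, emergency is checked first because its range overlaps the critical range, and scores below 3 are left unclassified because the text assigns them no class.

```python
def importance(risk, value_yuan_h, latency, value_max, w=(0.4, 0.3, 0.3)):
    """Importance score on a 0-10 scale."""
    s_risk = min(max(risk, 0.0), 10.0)                          # already 0-10
    s_value = 10.0 * min(max(value_yuan_h / value_max, 0.0), 1.0)  # linear normalization
    s_lat = 10.0 * min(max(latency, 0.0), 1.0)                  # 0-1 rescaled to 0-10
    return w[0] * s_risk + w[1] * s_value + w[2] * s_lat

def classify(score, latency):
    if score >= 9 and latency >= 0.9:
        return "emergency"
    if score >= 8:
        return "critical"
    if score >= 3:
        return "standard"
    return "unclassified"
```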

[0078] After a critical instruction is generated, the cloud security verification module first performs a logical consistency check to verify that the instruction sequence satisfies the device interlock constraints. Once this check passes, the instruction is sent to the edge node, which performs an executability check against a local device-status snapshot, confirming that the instruction's target state is reachable from the current device state. Finally, the local controller performs SM3-based digital signature verification, and execution is allowed only after this verification succeeds. If any step fails, a rollback mechanism is triggered and the cloud regenerates corrected instructions.

[0079] Regular instructions use edge-side batch delivery with asynchronous confirmation. They are generated locally at the edge node and delivered in batches, with a single communication frame carrying a set of N device commands (N ≤ 10), published to the device topics via the MQTT protocol. After execution, each device reports its result asynchronously, and the edge node collects confirmations within a preset time window (in seconds); if a confirmation is missing, retransmission is triggered automatically, up to a preset maximum number of retransmissions;
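The batching and retransmission logic above can be sketched without committing to a particular MQTT client. `publish` and `collect_acks` are hypothetical hooks standing in for the MQTT publish call and for the acknowledgements gathered within the confirmation window; the retry limit of 3 is a placeholder, since the text does not give the value.

```python
MAX_BATCH = 10  # at most N = 10 commands per communication frame

def make_frames(commands):
    """Split commands into frames of at most MAX_BATCH entries."""
    return [commands[i:i + MAX_BATCH] for i in range(0, len(commands), MAX_BATCH)]

def deliver(commands, publish, collect_acks, max_retries=3):
    """Send all commands, then retransmit unconfirmed ones.

    publish(frame) sends one frame; collect_acks() returns the ids confirmed
    within the window. Returns the ids that were never confirmed.
    """
    pending = {c["id"]: c for c in commands}
    for _ in range(max_retries + 1):  # first send plus retransmissions
        for frame in make_frames(list(pending.values())):
            publish(frame)
        for cid in collect_acks():
            pending.pop(cid, None)
        if not pending:
            return []
    return sorted(pending)
```

Only unconfirmed commands are resent, so a single lost acknowledgement does not cause the whole batch to be re-executed.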

[0080] Emergency instructions directly trigger local contingency plans. For emergency scenarios such as fire-fighting linkage and safety limit protection, a local emergency instruction library is pre-configured. When the trigger conditions are met, the edge node invokes the pre-stored instructions directly without waiting for cloud computation, forming a local closed-loop execution path from perception through decision to execution. After execution, the execution record is synchronized to the cloud within seconds to support post-event auditing and strategy optimization.

[0081] Transmission security and integrity verification: all command payloads are encrypted with AES-256-GCM, and an SM3 hash is appended as an integrity check code; critical commands additionally carry a timestamp and a nonce to prevent replay attacks. Communication between the edge and local layers uses TLS 1.3 mutual authentication based on digital certificates, and certificates are rotated automatically 7 days before they expire.
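The timestamp-plus-nonce replay protection and the hash-based integrity check can be sketched with the standard library. SHA-256 stands in for SM3 here because SM3 support in `hashlib` depends on the underlying OpenSSL build; the 30-second acceptance window is a hypothetical value, and AES-256-GCM encryption is outside the scope of this sketch.

```python
import hashlib
import hmac
import time

class ReplayGuard:
    """Rejects stale timestamps and nonces already seen within the window."""

    def __init__(self, window_s=30.0):
        self.window_s = window_s
        self.seen = {}  # nonce -> timestamp

    def accept(self, nonce, ts, now=None):
        now = time.time() if now is None else now
        # Forget nonces older than the window so the cache stays bounded
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.window_s}
        if abs(now - ts) > self.window_s or nonce in self.seen:
            return False
        self.seen[nonce] = ts
        return True

def integrity_ok(payload: bytes, digest: bytes) -> bool:
    # SHA-256 used as a stand-in for SM3; constant-time comparison
    return hmac.compare_digest(hashlib.sha256(payload).digest(), digest)
```

Because old nonces are pruned only after the window has passed, a replayed command either still has its nonce in the cache or carries a timestamp that is already too old.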

[0082] As shown in Figure 5, in a preferred embodiment of the above technical solution, closed-loop feedback optimization includes:

[0083] Quantitative evaluation of execution effectiveness: a device response model $y(t) = f\big(u(t-\tau)\big) + d_{\mathrm{dist}}$ is built, where $u$ is the issued control command, $\tau$ is the communication and execution delay, and $d_{\mathrm{dist}}$ is an unmeasurable disturbance. The control deviation $e$ between the measured response $y(t)$ and the target control trajectory $y^{*}(t)$ is calculated over the evaluation window $T$ (default 10 minutes); when $e > \eta$, where $\eta$ is the deviation tolerance threshold (default 0.05), an execution anomaly is indicated.
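The text does not preserve the exact deviation formula, so the sketch below assumes one plausible form: the mean relative absolute deviation between measured and target samples over the evaluation window, compared against the 0.05 tolerance.

```python
def control_deviation(measured, target):
    """Assumed form: mean of |y - y*| / |y*| over the evaluation window."""
    if len(measured) != len(target) or not target:
        raise ValueError("need equal-length, non-empty trajectories")
    return sum(abs(y - r) / abs(r) for y, r in zip(measured, target)) / len(target)

def is_execution_anomaly(measured, target, tol=0.05):
    return control_deviation(measured, target) > tol
```

For example, a chilled water supply temperature that tracks a 7.0℃ target except for one 7.7℃ sample out of four gives a deviation of about 0.025, below the tolerance.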

[0084] Experience replay pool construction and sampling: a prioritized experience replay pool $D$ is built in the cloud, storing tuples $(s_t, a_t, r_t, s_{t+1}, \delta_t)$, where $s_t$ is the state at time $t$, $a_t$ is the action performed, $r_t$ is the immediate reward, $s_{t+1}$ is the state after the transition, and $\delta_t$ is the TD error. Prioritized experience replay is used, with sampling probability $P(i) = p_i^{\alpha} / \sum_k p_k^{\alpha}$ and priority $p_i = |\delta_i| + \epsilon$, where $\epsilon = 0.01$ is a small constant that avoids zero priority and $\alpha = 0.6$ is the priority exponent.
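The proportional prioritized sampling rule above translates directly into code. This sketch samples indices with replacement using `random.Random.choices`; a production replay pool of the size used in the embodiment would normally use a sum-tree for efficient sampling.

```python
import random

def priorities(td_errors, eps=0.01):
    """p_i = |delta_i| + eps, so zero TD error never means zero priority."""
    return [abs(d) + eps for d in td_errors]

def sampling_probs(td_errors, alpha=0.6, eps=0.01):
    """P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities(td_errors, eps)]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_indices(td_errors, k, rng, alpha=0.6, eps=0.01):
    probs = sampling_probs(td_errors, alpha, eps)
    return rng.choices(range(len(td_errors)), weights=probs, k=k)
```

Standard prioritized replay also applies importance-sampling weights to correct the bias this sampling introduces; the text does not mention them, so they are omitted here.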

[0085] Incremental training of policy networks: based on newly acquired execution data, the DQN and PPO networks are trained incrementally with batch size B=256, a learning rate of [missing information], and the Adam optimizer. An Elastic Weight Consolidation (EWC) mechanism with regularization coefficient $\lambda_{\mathrm{EWC}} = 10^{4}$ prevents new knowledge from overwriting historical strategies. An online update is triggered each time a preset number of new samples has accumulated; the update runs in shadow mode, and the updated model is switched to the production environment only after verification succeeds.
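A minimal sketch of the EWC penalty, using the common formulation with a diagonal Fisher information estimate: the penalty is (λ/2) Σ F_i (θ_i − θ*_i)², where θ* are the parameters learned before the new data arrived. Parameters are plain lists here for illustration.

```python
def fisher_diagonal(grad_samples):
    """Diagonal Fisher estimate: mean of squared per-sample gradients."""
    n = len(grad_samples)
    return [sum(g[i] ** 2 for g in grad_samples) / n
            for i in range(len(grad_samples[0]))]

def ewc_penalty(theta, theta_old, fisher, lam=1e4):
    """(lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(f * (t - t0) ** 2
                           for f, t, t0 in zip(fisher, theta, theta_old))

def total_loss(task_loss, theta, theta_old, fisher, lam=1e4):
    return task_loss + ewc_penalty(theta, theta_old, fisher, lam)
```

Parameters with a large Fisher value were important for the old strategy, so moving them is expensive, while unimportant parameters remain free to adapt to the new data.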

[0086] Dynamic exploration-exploitation balance adjustment: an adaptive exploration rate adjustment function is designed.

[0087] In this function, $\varepsilon_{\max} = 0.3$ and $\varepsilon_{\min} = 0.01$ are the upper and lower limits of the exploration rate, $T_{\varepsilon}$ is the exploration rate decay time constant, $S$ is the recent policy performance score based on normalized cumulative reward, and $e$ is the natural constant (approximately 2.718). When the system is operating stably, the exploration rate is raised to discover better strategies; when anomalies occur frequently, the exploration rate is lowered and the system relies on mature strategies.
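The exact adjustment function is not preserved in the text. The sketch below assumes one form consistent with the stated behavior, ε(t) = ε_min + (ε_max − ε_min)·S·e^(−t/T_ε), so that a high performance score (stable operation) raises exploration and the rate stays within [ε_min, ε_max]. The time constant of 3600 s is a hypothetical default.

```python
import math

def exploration_rate(t, score, eps_max=0.3, eps_min=0.01, tau=3600.0):
    """Assumed form: eps_min + (eps_max - eps_min) * S * exp(-t / tau).

    score S is the normalized recent performance, clamped to [0, 1];
    t is the elapsed time in seconds since the last reset (assumed).
    """
    s = min(max(score, 0.0), 1.0)
    return eps_min + (eps_max - eps_min) * s * math.exp(-t / tau)
```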

[0088] By constructing a spatiotemporal correlation feature fusion model, heterogeneous sensor data is processed uniformly to extract equipment load characteristics, energy consumption patterns, and environmental response laws, forming a high-dimensional situational awareness representation. A DQN-PPO dual-network collaborative architecture is proposed, with DQN handling discrete equipment start-up and shutdown decisions and PPO handling continuous parameter adjustments. Joint optimization of the hybrid action space is achieved through a gating fusion unit. A dual-judgment mechanism, employing both proactive predictive optimization triggering and passive event response triggering, dynamically selects decision paths and computational resource allocation strategies based on trigger type, balancing optimization depth and response speed. A hierarchical instruction distribution strategy based on instruction importance is constructed, implementing a differentiated security verification mechanism, and continuous evolution of the decision strategy is achieved through experience replay and incremental learning.

[0089] The present invention discloses a flexible management and control method for building equipment in a park, which includes six core steps: data acquisition at the whole-domain perception layer, fusion and analysis of multi-source heterogeneous data, construction of a flexible decision engine, generation of dynamic control instructions, hierarchical security issuance mechanism, and closed-loop feedback optimization.

[0090] The flexible decision engine adopts a DQN-PPO dual-network collaborative architecture. The DQN network is responsible for discrete action space decision-making. The state space includes equipment operating state vectors, environmental parameter vectors, energy consumption index vectors, and user behavior vectors. The action space is a discrete combination of equipment start-up and shutdown. The reward function is a weighted sum of energy efficiency improvement rate, comfort compliance rate, carbon emission reduction rate, and equipment lifespan loss rate. The PPO network is responsible for continuous action space strategy optimization. The action space includes continuous temperature setpoints, frequency adjustments, valve openings, and power allocation ratios. The reward function incorporates a multi-objective Pareto front tracking mechanism. The outputs of the two networks are dynamically weighted through a gating fusion unit. The gating signal is jointly determined by the urgency of the current situation, optimization complexity, and historical decision-making effects.

[0091] Example 1: Flexible Management of Central Air Conditioning Systems in Commercial Office Parks

[0092] This embodiment is applied to a commercial office park with a building area of 50,000 square meters, comprising 3 office buildings. Each building is equipped with 4 water-cooled screw chillers (rated cooling capacity 800kW / unit), 12 chilled water pumps, 12 cooling water pumps, and 96 combined air handling units (AHUs). Before the method of this invention was implemented, the system used fixed timetable control (start at 7:00, stop at 19:00, setpoint 26℃), which suffered from an overly long morning pre-cooling period, overcooling at noon, and localized overheating in the afternoon.

[0093] S1: Data collection at the overall perception layer includes power monitoring devices with a sampling frequency of 1Hz and a measurement accuracy of 0.5 class installed on each chiller unit, water pump, and AHU, along with vibration sensors with a sampling frequency of 10Hz; temperature and humidity sensors are deployed in key office areas, one per 100 square meters with a sampling frequency of 0.1Hz, CO2 concentration sensors are deployed in areas, one per 200 square meters with a sampling frequency of 0.05Hz, and personnel density detection cameras are deployed with a sampling frequency of 1 frame / minute, using AI algorithms to identify the number of people; smart meters are installed in the park's main power distribution room with a sampling frequency of 1Hz for separate metering; an integrated weather station is used to obtain outdoor temperature and humidity, solar radiation intensity with a sampling frequency of 0.1Hz, and weather forecasts for the next 4 hours are updated every 15 minutes; and building access control systems and WiFi probes are used to obtain personnel flow data.

[0094] S2: Multi-source heterogeneous data fusion and analysis. A data standardization preprocessing pipeline is built: outliers in the power monitoring data are removed with the 3σ criterion, i.e. points that deviate from the mean by more than three standard deviations are discarded; personnel density data are time-aligned by linear interpolation at 1-minute intervals; and weather forecast data are converted to a standard JSON format. A data fusion model based on spatiotemporal correlation features is then constructed: a graph neural network (GNN) builds an equipment correlation graph whose nodes represent physical equipment and whose edges represent energy-flow or control-flow connections, and a spatio-temporal graph convolutional network (ST-GCN) extracts multi-dimensional correlation features from equipment operation, environmental, energy consumption, and user behavior data to generate a unified situational awareness representation vector.
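The two preprocessing steps above, 3σ outlier removal and linear interpolation onto a 1-minute grid, can be sketched with the standard library. Timestamps are assumed to be in seconds and sorted, with at least two samples.

```python
import statistics

def remove_3sigma(values):
    """Drop points farther than three (population) standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) <= 3 * sigma]

def interpolate_to_grid(samples, step=60):
    """samples: sorted (t_seconds, value) pairs, at least two.
    Returns linearly interpolated (t, value) pairs every `step` seconds."""
    out = []
    j = 0
    t = samples[0][0]
    while t <= samples[-1][0]:
        # Advance to the segment [samples[j], samples[j + 1]] containing t
        while j + 1 < len(samples) - 1 and samples[j + 1][0] < t:
            j += 1
        (ta, va), (tb, vb) = samples[j], samples[j + 1]
        frac = 0.0 if tb == ta else (t - ta) / (tb - ta)
        out.append((t, va + frac * (vb - va)))
        t += step
    return out
```

Note that with very few samples a single outlier inflates σ enough to survive the 3σ test, so the filter is only meaningful on reasonably long windows.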

[0095] S3: Build a flexible decision engine and deploy a DQN-PPO dual-network collaborative architecture:

[0096] DQN network configuration: the state space includes the current time, one-hot encoded over the 24 hours of the day; the outdoor temperature, normalized to [-1, 1]; the indoor temperature deviation; the number of chiller units currently operating (integer, 0 to 4); the current system load rate (continuous, 0 to 1); and the personnel density level (integer, 0 to 3). The action space is the discrete combination of chiller start/stop and AHU mode switching among standard, energy-saving, and high-load modes. The reward function incorporates the energy efficiency improvement rate and the comfort compliance rate; the comfort compliance rate is the percentage of time the PMV index lies within [-0.5, 0.5], calculated according to the ASHRAE 55 standard.
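A sketch of the state encoding and the comfort compliance rate described above. The outdoor temperature bounds used for the [-1, 1] normalization (-10℃ to 45℃) are hypothetical, since the text does not give them, and the resulting 29-element layout is only one possible arrangement.

```python
def encode_state(hour, outdoor_temp, indoor_dev, n_running, load_rate,
                 density_level, t_min=-10.0, t_max=45.0):
    """t_min / t_max: hypothetical bounds for normalizing outdoor temperature."""
    if not (0 <= hour < 24 and 0 <= n_running <= 4 and 0 <= density_level <= 3):
        raise ValueError("state component out of range")
    one_hot = [0.0] * 24
    one_hot[hour] = 1.0
    t_norm = 2.0 * (outdoor_temp - t_min) / (t_max - t_min) - 1.0
    t_norm = min(max(t_norm, -1.0), 1.0)
    load = min(max(load_rate, 0.0), 1.0)
    return one_hot + [t_norm, indoor_dev, float(n_running), load, float(density_level)]

def comfort_compliance(pmv_series):
    """Share of samples with PMV inside the [-0.5, 0.5] comfort band."""
    return sum(1 for p in pmv_series if -0.5 <= p <= 0.5) / len(pmv_series)
```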

[0097] PPO network configuration: the state space shares the underlying representation with the DQN. The action space includes the chilled water supply temperature setpoint (range 5-9℃), the cooling water return temperature setpoint (range 30-35℃), the chilled water pump frequency (range 35-50Hz), and the cooling tower fan frequency (range 30-45Hz). The reward function introduces a multi-objective Pareto front tracking mechanism and defines a comprehensive cost built from electricity, carbon emission, and comfort terms: electricity is priced in real time at 1.2 yuan / kWh during peak hours and 0.4 yuan / kWh during off-peak hours, the carbon emission cost is 0.08 yuan / kgCO2, and the comfort penalty coefficient is 0.5 yuan per unit of PMV deviation.
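A sketch of two pieces of the PPO configuration above: mapping unbounded policy outputs through tanh into the four physical setpoint ranges, and an additive per-step cost using the stated prices. The additive form, the use of |PMV| as the deviation, and the grid emission factor (kgCO2/kWh) are assumptions made for illustration.

```python
import math

# (low, high) ranges from [0097]
ACTION_RANGES = [
    (5.0, 9.0),    # chilled water supply temperature setpoint, degC
    (30.0, 35.0),  # cooling water return temperature setpoint, degC
    (35.0, 50.0),  # chilled water pump frequency, Hz
    (30.0, 45.0),  # cooling tower fan frequency, Hz
]

def to_setpoints(raw):
    """Squash each raw output with tanh, then scale into its physical range."""
    return [lo + (math.tanh(x) + 1.0) * 0.5 * (hi - lo)
            for x, (lo, hi) in zip(raw, ACTION_RANGES)]

def step_cost(energy_kwh, peak, pmv, emission_factor):
    """Hypothetical additive comprehensive cost in yuan.
    emission_factor: assumed grid emission factor in kgCO2 per kWh."""
    price = 1.2 if peak else 0.4                    # yuan/kWh
    carbon = 0.08 * energy_kwh * emission_factor    # yuan/kgCO2 * kgCO2
    comfort = 0.5 * abs(pmv)                        # yuan per unit PMV deviation
    return price * energy_kwh + carbon + comfort
```

A zero raw output lands at the midpoint of every range, for example 7.0℃ for the chilled water supply setpoint.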

[0098] Gating fusion mechanism: a context-aware gating network is constructed. Its inputs are the urgency, measured as the absolute value of the current room temperature deviation; the optimization complexity, measured by the number of currently operating units (more units mean higher complexity); and the historical decision effect, measured as the average reward over the past hour. The gating network outputs a fusion weight α∈[0,1]. When α>0.7, DQN discrete decisions are preferred, which suits scenarios with large load changes; when α<0.3, PPO continuous adjustment is preferred, which suits steady-state optimization; otherwise the outputs are weighted and fused as $a = \alpha\,\mathrm{Emb}(a_{\mathrm{DQN}}) + (1-\alpha)\,a_{\mathrm{PPO}}$, where $a_{\mathrm{DQN}}$ is the discrete action command output by the DQN network, $\mathrm{Emb}(\cdot)$ is its dimension-aligning embedding, and $a_{\mathrm{PPO}}$ is the continuous action instruction output by the PPO network.
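The three-way rule above can be sketched as follows. The DQN action is assumed to be already embedded to the same dimension as the PPO action.

```python
def select_action(alpha, dqn_embedded, ppo_action):
    """Threshold rule: prefer DQN above 0.7, PPO below 0.3, fuse in between."""
    if alpha > 0.7:
        return "dqn", list(dqn_embedded)
    if alpha < 0.3:
        return "ppo", list(ppo_action)
    fused = [alpha * d + (1 - alpha) * c for d, c in zip(dqn_embedded, ppo_action)]
    return "fused", fused
```

The hard thresholds avoid blending when one network clearly dominates, which keeps large discrete switches from being diluted during rapid load changes.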

[0099] S4: Dynamic control instruction generation. For trigger type determination, the active trigger conditions are the periodic optimization cycle $T_{\mathrm{opt}}$ = 5 minutes (300 seconds) or a load forecast deviation above 15%; the passive trigger conditions are a room temperature above 28℃ or below 20℃, or a chiller fault alarm. During the morning peak of 7:00-9:00 on typical workdays, the system mainly uses active triggering and optimizes every 5 minutes; during the stable period of 10:00-11:00, the triggering frequency drops to once every 15 minutes; during the strong solar radiation period of 14:00-15:00, rapidly rising room temperature frequently triggers the passive response, and the system switches to the preset emergency rule base and lowers the chilled water supply temperature setpoint to cool the space quickly.

[0100] Instruction content generation: for active triggers, pre-adjustment instructions are generated, such as "discrete action: start chiller No. 2; continuous action: set the chilled water supply temperature to 7.5℃; execute immediately; safety constraint: unit start interval of at least 5 minutes". For passive triggers, emergency instructions are generated, such as "force all available cooling tower fans to maximum frequency for 15 minutes".

[0101] S5: Hierarchical security distribution. Critical instructions for starting and stopping chillers use triple confirmation: after the instruction is generated in the cloud, the unit interlock constraints are verified first, ensuring that no two units start within 5 minutes of each other; the instruction is then sent to the edge controller, which checks the current equipment status to confirm that the unit has no fault alarm and is in remote-controllable mode; finally, the local PLC performs SM3 signature verification. Regular instructions, such as water pump frequency adjustments, are batch-distributed from the edge, packaged every 2 seconds, and confirmed asynchronously by the equipment. Emergency instructions, such as fire linkage, are triggered directly by the local PLC, bypassing cloud and edge decision-making; the measured response latency is 320 ms.
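The triple confirmation for chiller start/stop can be sketched as three sequential checks, each of which short-circuits to a rollback. The `Unit` record and the boolean `signature_ok` (standing in for the result of the local PLC's SM3-based check) are hypothetical simplifications.

```python
from dataclasses import dataclass

MIN_START_INTERVAL_S = 300  # no two chillers start within 5 minutes

@dataclass
class Unit:
    uid: str
    fault_alarm: bool
    remote_mode: bool

def cloud_interlock_ok(start_times):
    """start_times: planned chiller start times (s) in the instruction sequence."""
    ts = sorted(start_times)
    return all(b - a >= MIN_START_INTERVAL_S for a, b in zip(ts, ts[1:]))

def edge_state_ok(unit):
    """The unit must have no fault alarm and be in remote-controllable mode."""
    return not unit.fault_alarm and unit.remote_mode

def dispatch_critical(start_times, units, signature_ok):
    """Return 'execute', or the stage whose failure triggers rollback."""
    if not cloud_interlock_ok(start_times):
        return "rollback:cloud"
    if not all(edge_state_ok(u) for u in units):
        return "rollback:edge"
    if not signature_ok:
        return "rollback:local"
    return "execute"
```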

[0102] S6: Closed-loop feedback optimization is implemented by building an experience replay pool with a capacity of 500,000 transitions, storing the state, action, reward, and next state for each step. During the low-load period at 2:00 AM daily, the system automatically starts incremental training, fine-tuning the DQN and PPO networks based on the day's data. The learning rate is set to 1e-5, with 5 training epochs, employing an EWC mechanism to prevent catastrophic forgetting. After training, the performance of the new policy is validated in a shadow environment. If the average reward increases by more than 2% for three consecutive days, the system is switched to the production environment.
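The promotion rule above can be sketched as a comparison of daily average rewards between the shadow policy and the production policy. Treating the 2% gain as relative to the production policy's daily average, and scaling by its absolute value so that negative rewards are handled sensibly, are assumptions.

```python
def should_promote(shadow_rewards, prod_rewards, min_gain=0.02, days=3):
    """Promote if the shadow policy beats production by more than min_gain
    (relative) on each of the last `days` days of average reward."""
    if len(shadow_rewards) < days or len(prod_rewards) < days:
        return False
    recent = zip(shadow_rewards[-days:], prod_rewards[-days:])
    return all((s - p) > min_gain * abs(p) for s, p in recent)
```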

[0103] Comparison of implementation results:

[0104] August 15, 2024 (Thursday, sunny, outdoor temperature 28-35℃) was selected as a typical test day for comparison with the traditional fixed-set temperature control method:

[0105] Table

[0106] Performance indicator | Traditional method | Method of the present invention | Change
Daily cumulative energy consumption (kWh) | 28,500 | 21,200 | -25.6%
Average indoor temperature (°C) | 25.8±1.2 | 25.5±0.4 | Fluctuation reduced by 67%
Duration of temperature above standard (>27℃) | 45 minutes | 8 minutes | -82%
Chiller start-up and shutdown count | 12 | 7 | -42%
Peak power (kW) | 1,850 | 1,420 | -23%
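The relative changes in the table can be recomputed from the before and after values. This sketch treats the ± figures for indoor temperature as the fluctuation measure; the printed results round to the table's -25.6%, -67%, -82%, -42%, and -23%.

```python
TABLE = {
    "daily energy (kWh)": (28_500, 21_200),
    "temperature fluctuation (degC)": (1.2, 0.4),
    "minutes above 27 degC": (45, 8),
    "chiller start/stop count": (12, 7),
    "peak power (kW)": (1_850, 1_420),
}

def relative_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100.0

for name, (before, after) in TABLE.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
```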

[0107] Because the traditional method uses fixed temperature settings, it consumes excessive energy during the morning pre-cooling phase (7:00-8:00), with peak power reaching 1,850 kW. Through predictive optimization, the present invention starts pre-cooling gradually at a lower power (800 kW) at 6:30, smoothing the load curve. At midday (12:00-13:00), occupancy drops in some areas; the traditional method still runs at high load, whereas the present invention automatically switches to energy-saving mode and reduces energy consumption by 30%. During the afternoon high-temperature period (15:00-16:00), the present invention continuously adjusts the chilled water temperature and fan frequency, avoiding the temperature fluctuations caused by unit start-up and shutdown under the traditional method.

[0108] Example 2: Flexible Management and Control of Multi-Energy Complementary Systems in Industrial Parks

[0109] This embodiment is applied to an industrial park comprising photovoltaic power generation (2MW), an energy storage system (1MWh / 2MW), diesel generators (backup), and multiple factory buildings. In implementing this invention, the DQN action space is expanded to include discrete decisions such as energy storage charging and discharging modes, diesel generator start / stop, and interruptible load switching; the PPO action space is expanded to include continuous adjustments such as energy storage charging and discharging power, photovoltaic curtailment rate, and diesel generator output; and the reward function includes a renewable energy absorption rate target. Implementation results show a significant improvement in photovoltaic self-absorption rate. By optimizing the charging and discharging depth and frequency, the energy storage cycle life is extended, and the park's annual electricity costs are significantly reduced.

[0110] Figure 1 shows the overall architecture of the flexible management and control method for park building equipment. From bottom to top, it comprises the global perception layer (multiple sensor types), the edge layer (data preprocessing and lightweight inference), the cloud layer (DQN-PPO dual-network engine and experience replay pool), the execution layer (HVAC / lighting / energy storage equipment, etc.), and the closed-loop feedback loop.

[0111] Figure 2 shows the DQN-PPO dual-network collaborative architecture of the flexible decision engine. On the left, the DQN network (Dueling structure, Double DQN, prioritized experience replay) handles the discrete action space of device start-up and shutdown; on the right, the PPO network (Actor-Critic, GAE, proximal clipping) handles the continuous action space of temperature and frequency settings. The outputs are dynamically fused by the context-aware gating unit, and the bottom of the figure shows the experience replay pool and the multi-objective reward function design.

[0112] Figure 3 shows the multi-timescale hierarchical scheduling architecture: the second-level real-time response layer (1-10 seconds, edge nodes, pre-built rule base), the minute-level optimization and adjustment layer (1-15 minutes, edge-cloud collaboration, DQN-PPO inference), and the hour-level planning and scheduling layer (1-24 hours, cloud high-performance computing, MPC+MILP hybrid optimization). A 24-hour timeline at the bottom indicates the time period covered by each layer.

[0113] Figure 4 is the flowchart of trigger type determination and hierarchical security distribution, laid out vertically: data collection at the top → trigger type determination (three branches: active, passive, and hybrid; including the confidence calculation formula) → instruction generation (pre-adjustment, emergency, or hybrid) → hierarchical security distribution at the bottom (triple confirmation for critical instructions, batch distribution for regular instructions, and direct local triggering for emergency instructions), with the latency and reliability indicators of each path marked.

[0114] Figure 5 shows the closed-loop feedback optimization mechanism as a standard reinforcement learning interaction loop: the park building environment on the left (four types of state space) ↔ the DQN-PPO agent in the middle (policy network + value network) → the experience replay pool on the right (priority sampling mechanism) → the policy network incremental training module (including EWC regularization) → the exploration-exploitation balance adjustment module. The bottom shows the control deviation calculation formula and the time step axis.

[0115] Figure 6 compares indoor temperature and energy consumption curves on a typical summer day using dual Y axes: the left axis shows temperature (°C) to display control accuracy, and the right axis shows power (kW) to display energy consumption characteristics.

[0116] Morning pre-cooling phase (6:30-8:00): Traditional methods suddenly start at high power (1850kW peak) at 7:00, resulting in large temperature fluctuations; the present invention performs smooth pre-cooling (800kW) at 6:30, with the temperature slowly dropping to the set value.

[0117] Midday energy-saving mode (12:00-14:00): Traditional methods maintain a high load (1100kW), but this invention automatically reduces the load to 750kW, while the indoor temperature is still maintained at 25.7℃±0.2℃.

[0118] During the afternoon high-temperature period (15:00-18:00): Traditional methods cause power fluctuations (fluctuation of 1400-1600kW) due to the start-up and shutdown of the unit. This invention continuously adjusts (1100±100kW) and stabilizes the temperature at 25.5℃±0.3℃.

[0119] To address the pain points in the management of building equipment in industrial parks and adapt to the needs of digital transformation and green carbon development, a flexible management and control system for building equipment in industrial parks based on AI technology is being developed. This system will replace the traditional manual-driven, fixed-strategy management and control model, and solve problems such as prominent data silos, insufficient intelligent applications, and rigid resource scheduling, thereby promoting improved quality and efficiency in equipment management and energy conservation and carbon reduction.

[0120] 1. Comprehensive Perception Innovation: Integrating AI video analysis, access control, and perimeter monitoring technologies, it constructs an integrated prevention and control system of "comprehensive perception, access control, anomaly alarm, trajectory tracking, and coordinated response," breaking the traditional manual local monitoring model. Through dynamic adaptation and flexible linkage, it achieves precise control of visitors, key personnel, and perimeter anomalies, and can adjust strategies as needed to improve the level of security intelligence and initiative.

[0121] 2. Innovative Equipment Fault Prediction: Integrating AI detection, high-resolution equipment imaging, and dynamic inspection technologies, it enables health monitoring of core equipment in the park, early fault prediction, and generation of personalized inspection plans. This breaks away from the reactive remediation model, adjusts strategies based on equipment type and operating conditions, facilitates intelligent management of the entire equipment lifecycle, achieves one-stop decision support, and flexibly presents differentiated data according to the needs of managers.

[0122] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method for flexible management and control of building equipment in a park, characterized in that it comprises: S1, constructing a flexible decision engine based on deep reinforcement learning, wherein the engine adopts a dual-network collaborative architecture comprising a deep Q-network (DQN) and a proximal policy optimization (PPO) network, the two networks produce a collaborative output through a dynamic weight fusion mechanism, and the weight coefficients are adaptively adjusted according to the current decision-making situation; S2, adaptively selecting an instruction generation mode based on the trigger type determination result, wherein pre-adjustment instructions based on predictive analysis are generated for active trigger types, and emergency instructions based on event response are generated for passive trigger types; S3, establishing a tiered distribution strategy based on trigger type and instruction importance, wherein critical instructions adopt a triple-confirmation mechanism of cloud generation, edge verification, and local execution; regular instructions adopt an efficient mode of edge generation, batch distribution, and asynchronous confirmation; and emergency instructions adopt a rapid-response mode in which they are triggered directly by local contingency plans and synchronized to the cloud afterwards.
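Claim 1 does not say how the dynamic weight fusion of the DQN and PPO outputs is computed. The sketch below shows one plausible rule, under the assumption (not from the source) that the DQN branch is trusted in proportion to how sharply its Q-values single out one action. The names `fusion_weight` and `fuse` are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_weight(q_values: list[float]) -> float:
    """Hypothetical rule: DQN weight = 1 - normalized entropy of softmax(Q).
    A sharply peaked Q-distribution gives w near 1; a flat one gives w near 0,
    handing the decision to the PPO branch. Requires at least 2 actions."""
    probs = softmax(q_values)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def fuse(q_values: list[float], ppo_probs: list[float]) -> tuple[float, list[float], int]:
    """Blend DQN and PPO preferences over the same discrete action set."""
    w = fusion_weight(q_values)
    q_probs = softmax(q_values)
    fused = [w * q + (1.0 - w) * p for q, p in zip(q_probs, ppo_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return w, fused, best
```

With flat Q-values the fused output falls back to the PPO distribution, and with a dominant Q-value the DQN choice wins. Any other adaptive weighting, such as one learned from recent reward, would fit the claim's wording equally well.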

2. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S1 comprises a flexible decision-making mechanism based on a DQN-PPO hybrid decision network, wherein the state space of the DQN network comprises an equipment operating state vector, an environmental parameter vector, an energy consumption indicator vector, and a user behavior vector, and the action space is the discretized combination of equipment start/stop and mode-switching options.
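The state and action spaces of claim 2 can be sketched as below, assuming the four sub-vectors are simply concatenated into the DQN input and that every device is either off or on in one of its modes. The field contents, device names, and mode names are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Observation:
    """The four sub-vectors named in claim 2 (contents are illustrative)."""
    equipment_state: list[float]  # e.g. run flags, load ratios
    environment: list[float]      # e.g. indoor/outdoor temperature, humidity
    energy: list[float]           # e.g. power draw, cumulative kWh
    user_behavior: list[float]    # e.g. occupancy rate

    def to_state_vector(self) -> list[float]:
        # DQN input: plain concatenation of the four sub-vectors (an assumption).
        return self.equipment_state + self.environment + self.energy + self.user_behavior


def build_action_space(device_modes: dict[str, list[str]]) -> list[dict[str, tuple]]:
    """Discrete joint actions: each device is either off, or on in one of its modes."""
    names = list(device_modes)
    per_device = [[("off", None)] + [("on", m) for m in device_modes[n]] for n in names]
    return [dict(zip(names, combo)) for combo in product(*per_device)]
```

The joint action space grows as the product of per-device option counts (here 3 x 3 = 9 for two devices with two modes each), which is why it must stay discrete and small for a DQN.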

3. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S1 comprises: establishing a coupled constraint model of discrete and continuous actions to capture the dependency between equipment start/stop states and operating parameters; and generating a set of candidate discrete actions in the DQN decision stage and inputting each candidate action into the PPO network for continuous parameter optimization.
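The two-stage coupling in claim 3 can be sketched as follows. The DQN stage shortlists the top-k discrete actions, and a continuous stage assigns setpoints only to devices that are on. A plain callable `policy` stands in for the PPO network, and `score` is a hypothetical evaluation function; both are assumptions, not the patent's networks.

```python
from typing import Callable

Action = dict[str, str]    # device -> "on" / "off"
Params = dict[str, float]  # device -> continuous setpoint


def top_k(q_values: dict[int, float], k: int) -> list[int]:
    """DQN stage: indices of the k highest-valued discrete candidates."""
    return sorted(q_values, key=q_values.get, reverse=True)[:k]


def refine(action: Action, policy: Callable[[str], float],
           bounds: dict[str, tuple[float, float]]) -> Params:
    """PPO-stage stand-in. Coupling constraint: a device that is off gets no
    continuous parameter; one that is on gets a setpoint clipped to its bounds."""
    params: Params = {}
    for device, state in action.items():
        if state == "on":
            lo, hi = bounds[device]
            params[device] = min(max(policy(device), lo), hi)
    return params


def hybrid_decide(candidates: list[Action], q_values: dict[int, float], k: int,
                  policy: Callable[[str], float],
                  bounds: dict[str, tuple[float, float]],
                  score: Callable[[Action, Params], float]) -> tuple[int, Params, float]:
    """Refine each shortlisted discrete action and keep the best-scoring pair."""
    best = None
    for idx in top_k(q_values, k):
        params = refine(candidates[idx], policy, bounds)
        value = score(candidates[idx], params)
        if best is None or value > best[2]:
            best = (idx, params, value)
    return best
```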

4. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S2 comprises: classifying trigger types based on time characteristics, event characteristics, and prediction deviation characteristics, wherein active trigger conditions include the predicted load deviation exceeding a threshold, the arrival of a planned execution time, and the arrival of a periodic optimization point; and passive trigger conditions include abnormal-event detection alarms, emergency demand response signals, and command access from external systems.
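The trigger classification of claim 4 reduces to two groups of boolean conditions. In the sketch below, the decision to check passive conditions before active ones, and the `NONE` outcome when neither group fires, are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    NONE = "none"


@dataclass
class Context:
    predicted_load_deviation: float = 0.0
    at_planned_time: bool = False
    at_periodic_point: bool = False
    anomaly_alarm: bool = False
    demand_response_signal: bool = False
    external_command: bool = False


def classify(ctx: Context, deviation_threshold: float) -> Trigger:
    # Assumed priority: passive (event-driven) conditions win over active ones.
    if ctx.anomaly_alarm or ctx.demand_response_signal or ctx.external_command:
        return Trigger.PASSIVE
    if (abs(ctx.predicted_load_deviation) > deviation_threshold
            or ctx.at_planned_time or ctx.at_periodic_point):
        return Trigger.ACTIVE
    return Trigger.NONE
```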

5. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S2 comprises: for sudden events such as equipment failures, safety alarms, and emergency responses, matching the event type against a pre-built contingency plan library at the local edge node and directly triggering the corresponding control sequence; for common scenarios such as load fluctuations, environmental changes, and prediction deviations, using a lightweight online optimization algorithm to quickly solve for a near-optimal strategy from the current state snapshot, this layer balancing response speed against optimization accuracy and supporting rapid inference and deployment of the DQN-PPO network; and for strategic scenarios such as day-ahead planning, peak-valley arbitrage, and maintenance scheduling, using a solution method that combines mixed-integer programming with reinforcement learning.
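The three response layers of claim 5 amount to routing each scenario to a solver tier. In this sketch the scenario keys and the plan library contents (the control step names) are hypothetical examples, not sequences taken from the source.

```python
from enum import Enum


class Layer(Enum):
    CONTINGENCY_PLAN = "local contingency plan"               # sudden events
    ONLINE_OPTIMIZATION = "lightweight online optimization"   # common scenarios
    STRATEGIC = "mixed-integer programming + RL"              # strategic scenarios


SCENARIO_LAYER = {
    "equipment_failure": Layer.CONTINGENCY_PLAN,
    "safety_alarm": Layer.CONTINGENCY_PLAN,
    "emergency_response": Layer.CONTINGENCY_PLAN,
    "load_fluctuation": Layer.ONLINE_OPTIMIZATION,
    "environment_change": Layer.ONLINE_OPTIMIZATION,
    "prediction_deviation": Layer.ONLINE_OPTIMIZATION,
    "day_ahead_planning": Layer.STRATEGIC,
    "peak_valley_arbitrage": Layer.STRATEGIC,
    "maintenance_scheduling": Layer.STRATEGIC,
}

# Hypothetical contingency plan library held at the edge node.
PLAN_LIBRARY = {
    "equipment_failure": ["isolate_faulty_unit", "start_standby_unit", "notify_operator"],
    "safety_alarm": ["stop_affected_zone", "open_smoke_exhaust", "notify_operator"],
}


def route(scenario: str) -> Layer:
    """Pick the decision layer for a scenario; unknown scenarios are rejected."""
    try:
        return SCENARIO_LAYER[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario}") from None


def contingency_sequence(event_type: str) -> list[str]:
    """Match the event type against the plan library; no match yields no plan."""
    return PLAN_LIBRARY.get(event_type, [])
```

A plain dictionary lookup keeps the emergency layer free of any model inference, which is consistent with the claim's emphasis on triggering the control sequence directly at the edge.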

6. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S2 comprises: generating a trigger type discrimination vector $\mathbf{x}$; when the predicted load deviation satisfies $|\Delta P| > \delta$, where $\delta$ is the preset deviation threshold, or the current time $t$ meets the planned execution time, i.e. $|t - t_{\mathrm{plan}}| \le \varepsilon$, where $\varepsilon$ is the tolerance window, or a periodic optimization point $t = kT$ is reached, where $k$ is a positive integer and $T$ is the optimization cycle, the trigger is determined to be of the active type; an active trigger initiates the complete DQN-PPO collaborative decision-making process and allocates high computing resources for in-depth optimization; when a value $s_i(t)$ collected by a sensor in real time exceeds the safety constraint range, i.e. $s_i(t) \notin [s_i^{\min}, s_i^{\max}]$, or an emergency demand response signal is received, or an external system issues a mandatory command via an API, the trigger is determined to be of the passive type; a passive trigger preferentially calls the preset emergency rule base, bypasses the deep decision network, and directly generates safety-oriented emergency commands; constructing a trigger type confidence evaluation function $C = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b})$, where $\sigma(\cdot)$ is the Sigmoid activation function, $\mathbf{W}$ is the weight matrix, and $\mathbf{b}$ is the bias term; and adaptively switching the trigger type, wherein when $C > 0.8$ the trigger type is confirmed, and when $0.5 \le C \le 0.8$ a hybrid decision-making mode is activated that simultaneously generates proactive optimization instructions and passive emergency instructions, which are executed after selection by a safety arbiter.
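The active-trigger test and the confidence-based switching in claim 6 are sketched below, with times in whole seconds. The scalar form of $C$ (a single weight row), the stable sigmoid, and the `low_confidence` outcome for $C < 0.5$ are assumptions; the claim does not state what happens below 0.5.

```python
import math


def is_active(deviation: float, threshold: float, t: int, t_plan: int,
              tolerance: int, period: int) -> bool:
    """Active trigger: |deviation| > threshold, or |t - t_plan| <= tolerance,
    or t = k * period for some positive integer k."""
    return (abs(deviation) > threshold
            or abs(t - t_plan) <= tolerance
            or (t > 0 and t % period == 0))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def trigger_confidence(x: list[float], w: list[float], b: float) -> float:
    """C = sigmoid(w . x + b); a single weight row is assumed so C is a scalar."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def decision_mode(c: float) -> str:
    if c > 0.8:
        return "confirmed"     # trust the discriminated trigger type
    if c >= 0.5:
        return "hybrid"        # generate both instruction kinds, let the arbiter pick
    return "low_confidence"    # hypothetical: not covered by the claim
```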

7. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S3 comprises: constructing an instruction importance scoring model $I = f(R, E, L)$, where $R$ is the risk level of equipment operation, $E$ is the economic value of the instruction, and $L$ is its latency sensitivity; after a critical instruction is generated, a cloud-based security verification module first performs a logical consistency check to verify that the instruction sequence satisfies the device interlock constraints; after this check passes, the instruction is sent to the edge node, which performs an executability check on a local snapshot of the device status to confirm that the target state $S_{\mathrm{target}}$ of the instruction is reachable from the current device state $S_{\mathrm{cur}}$; finally, the local controller performs digital signature verification based on the SM3 algorithm, and execution is permitted only if this verification succeeds; if verification fails at any step, a rollback mechanism is triggered and the cloud regenerates a corrected instruction.
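The cloud, edge, and local confirmation chain of claim 7 can be sketched as three checks, where the first failure triggers a rollback. Because Python's `hashlib` does not reliably provide SM3 on every build, HMAC-SHA256 is swapped in as a stand-in for the SM3-based verification. The weighted-sum importance score, the shared key, the interlock pairs, and the transition table are all hypothetical.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key"  # hypothetical key provisioned to cloud and local controller


def importance(risk: float, value: float, latency: float,
               weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Hypothetical weighted sum; the claim does not fix the form of I = f(R, E, L)."""
    return weights[0] * risk + weights[1] * value + weights[2] * latency


def canonical(cmd: dict[str, str]) -> bytes:
    """Deterministic byte encoding of a command (device -> target state)."""
    return repr(sorted(cmd.items())).encode()


def sign(cmd: dict[str, str]) -> str:
    # Stand-in: HMAC-SHA256 replaces the SM3-based verification named in the claim.
    return hmac.new(SHARED_KEY, canonical(cmd), hashlib.sha256).hexdigest()


def cloud_check(cmd: dict[str, str], interlocks: list[tuple[str, str]]) -> bool:
    """Logical consistency: no interlocked device pair may both be switched on."""
    return all(not (cmd.get(a) == "on" and cmd.get(b) == "on") for a, b in interlocks)


def edge_check(cmd: dict[str, str], snapshot: dict[str, str],
               transitions: dict[str, set[str]]) -> bool:
    """Reachability: each target is the current state or one allowed step away."""
    return all(target == snapshot[dev] or target in transitions.get(snapshot[dev], set())
               for dev, target in cmd.items())


def deliver_critical(cmd, signature, interlocks, snapshot, transitions) -> str:
    """Run the three checks in order; the first failure triggers a rollback."""
    if not cloud_check(cmd, interlocks):
        return "rollback:cloud"
    if not edge_check(cmd, snapshot, transitions):
        return "rollback:edge"
    if not hmac.compare_digest(sign(cmd), signature):
        return "rollback:local"
    return "execute"
```

In a real deployment, the final step would verify an asymmetric signature (for example SM2 over an SM3 digest) rather than a shared-key MAC. The shared key here only keeps the sketch self-contained.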

8. The flexible management and control method for building equipment in a park according to claim 1, characterized in that S3 comprises: configuring regular instructions for edge-side batch delivery and asynchronous confirmation, wherein regular instructions are generated locally at edge nodes and delivered in batch mode, with a single communication frame carrying the instructions for multiple devices and being published to the device topics via the MQTT protocol; after execution, each device reports its execution result asynchronously, and the edge node collects the confirmations within a time window $T_w$; if a confirmation is missing, a retransmission is triggered automatically, up to a maximum number of retransmissions $N_{\max}$; and configuring emergency instructions to be triggered directly by local contingency plans, wherein a local emergency instruction library is pre-configured for emergency scenarios such as fire-protection linkage and safety over-limit protection; when the trigger conditions are met, the edge node calls the pre-stored instructions directly without waiting for cloud computation, the execution path forming a local closed loop from perception through decision-making to execution; and the execution records of the instructions are synchronized to the cloud to support post-event auditing and strategy optimization.
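The batching and asynchronous-confirmation logic of claim 8 is sketched below, with the confirmation window $T_w$ as `window` and $N_{\max}$ as `max_retries`. MQTT publishing is left out so that the block stays self-contained. The frame size, the command IDs, and the `failed` set for exhausted retries are illustrative assumptions.

```python
def make_frames(commands: list[str], frame_size: int) -> list[list[str]]:
    """Batch commands for several devices into communication frames."""
    return [commands[i:i + frame_size] for i in range(0, len(commands), frame_size)]


class AckTracker:
    """Tracks asynchronous confirmations and decides what to retransmit."""

    def __init__(self, window: float, max_retries: int):
        self.window = window            # T_w: confirmation window in seconds
        self.max_retries = max_retries  # N_max
        self.sent_at: dict[str, float] = {}
        self.retries: dict[str, int] = {}
        self.failed: set[str] = set()   # hypothetical: escalated after N_max retries

    def send(self, cmd_id: str, now: float) -> None:
        self.sent_at[cmd_id] = now
        self.retries.setdefault(cmd_id, 0)

    def ack(self, cmd_id: str) -> None:
        """Device reported its execution result; stop tracking the command."""
        self.sent_at.pop(cmd_id, None)

    def due_for_retransmit(self, now: float) -> list[str]:
        """Return the unconfirmed commands whose window expired and can be resent."""
        due = []
        for cmd_id, t in list(self.sent_at.items()):
            if now - t > self.window:
                if self.retries[cmd_id] < self.max_retries:
                    self.retries[cmd_id] += 1
                    self.sent_at[cmd_id] = now
                    due.append(cmd_id)
                else:
                    del self.sent_at[cmd_id]
                    self.failed.add(cmd_id)
        return due
```

For example, with `window=2.0` and `max_retries=2`, an unacknowledged command is resent at t = 3 s and t = 6 s and then marked failed at t = 9 s.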

9. A computer system, characterized in that it comprises: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured, when executing the executable instructions, to implement the flexible management and control method for building equipment in a park according to any one of claims 1 to 8.