A method and system for optimizing urban risk governance decisions based on reinforcement learning

By constructing a gridded risk map and dynamically adjusting the environmental risk index using reinforcement learning, and combining radar data to generate a dynamic risk heat map, the problems of data silos and response delays in urban risk governance have been solved, achieving efficient risk identification and resource optimization.

CN121543779B · Active · Publication Date: 2026-06-23 · INNER MONGOLIA SPACE-TIME BIG DATA DEV CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
INNER MONGOLIA SPACE-TIME BIG DATA DEV CO LTD
Filing Date
2025-10-16
Publication Date
2026-06-23

  • Figure CN121543779B_ABST

Abstract

The application provides a reinforcement learning-based method and system for optimizing urban risk governance decisions, relating to the technical fields of urban risk governance and reinforcement learning. A gridded risk map is constructed by collecting the location coordinates and historical fault records of urban components. An environmental risk index is set in the map, and its weights are dynamically adjusted through reinforcement learning. Three-dimensional point cloud data of crowd behavior is obtained through radar emission and registered to the map coordinates to generate an overlaid dynamic risk heat map. High-risk cluster areas are identified by combining the map index with the heat map features, and composite event identifiers and handling instructions are generated according to their spatiotemporal evolution. The instructions are executed in a virtual environment and their effects are recorded; the maintenance priority and crowd routes are adjusted accordingly, and global decision feedback is generated to update the grid models, forming a dynamic decision-making closed loop that can improve the accuracy and response efficiency of urban risk governance.

Description

Technical Field

[0001] This application relates to the fields of urban risk governance and reinforcement learning technology, and in particular to a method and system for optimizing urban risk governance decisions based on reinforcement learning.

Background Technology

[0002] In urban grid-based governance, public safety emergencies such as urban component malfunctions and overcrowding occur frequently, requiring real-time risk perception, accurate identification of high-risk areas, and rapid generation of response strategies. This necessitates that the governance system possess dynamic data fusion capabilities, intelligent risk assessment capabilities, and cross-grid collaborative decision-making capabilities to address the governance needs arising from the complexity of urban spaces and the strong interconnectedness of risks.

[0003] Currently, the existing solution for this type of need is an early warning system based on fixed sensor monitoring and preset rules. This system collects data on the status of urban components and pedestrian flow by deploying sensors within a grid, sets fixed thresholds to trigger early warnings, and each grid responds independently according to preset handling procedures, forming an initial governance chain of monitoring, early warning, and basic handling.

[0004] However, existing solutions have obvious drawbacks: sensor data is stored in separate departmental systems, making cross-grid risk correlation analysis difficult; the preset rules are fixed, so risk assessment standards cannot be adjusted dynamically according to the aging trend of urban components and the patterns of crowd flow; and each grid decides its response strategy independently, with no global optimization mechanism, resulting in delayed identification of high-risk areas and unbalanced allocation of disposal resources. Together these make it difficult to solve the problems of data silos and response delays across multiple departments.

Summary of the Invention

[0005] The purpose of this application is to provide a method and system for optimizing urban risk governance decisions based on reinforcement learning, so as to solve the problems of data silos and response delays in the prior art.

[0006] To address the aforementioned technical problems, firstly, this application provides a method for optimizing urban risk governance decisions based on reinforcement learning, comprising:

[0007] The location coordinates and historical fault records of urban components are collected. Based on the relationship between the location coordinates of each urban component and the historical fault records, a gridded risk map is constructed. The urban components are manhole covers and fire hydrants.

[0008] An environmental risk index is set in the gridded risk map. The weights of the environmental risk index are dynamically adjusted by reinforcement learning based on the correlation between changes in the status of urban components and historical failures. The environmental risk index is a comprehensive indicator used to quantify the risk level of urban components within the grid. It is determined by the number of failures of urban components within the grid, the time elapsed since the most recent failure, and the basic impact value corresponding to the type of urban component.

[0009] By transmitting radar signals to acquire three-dimensional point cloud data of crowd behavior, the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data are spatially registered with the location coordinates of the gridded risk map to generate an overlaid dynamic risk heat map.

[0010] Based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, high-risk cluster areas with geographically associated locations are identified. According to the spatiotemporal evolution trend of the high-risk cluster areas, composite event identifiers and handling instructions containing urban component failure and crowd congestion characteristics are generated. The composite event identifier is a comprehensive identifier that includes the risk type, location of occurrence, and severity.

[0011] Based on the risk data corresponding to the composite event identifier, the disposal instructions are implemented in the local virtual environment and the execution effect is recorded. The maintenance priority order and crowd route allocation are adjusted according to the execution effect to generate a global optimization decision. The global optimization decision is fed back to each grid to update the local strategy model, forming a dynamic decision-making closed-loop system. The local strategy model refers to the model used by each grid for autonomous decision-making.

[0012] Optionally, based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, high-risk cluster areas with geographically associated locations are identified. According to the spatiotemporal evolution trend of the high-risk cluster areas, a composite event identifier and handling instructions containing characteristics of urban component failure and crowd congestion are generated, including:

[0013] The environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map are invoked, wherein the crowd behavior characteristics include crowd density distribution, movement speed vector and fall behavior characteristics;

[0014] Based on the numerical value of the environmental risk index and the parameter range of the crowd behavior characteristics, grids with high environmental risk and grid areas with abnormal crowd behavior are marked and identified as high-risk cluster areas.

[0015] The location, extent, and risk characteristics of the high-risk clusters are continuously recorded at different points in time to form the spatiotemporal evolution trend of the high-risk clusters.

[0016] Based on the risks associated with urban component failures and crowd congestion in the spatiotemporal evolution trend, a composite event identifier containing risk type, location, and severity is generated, and a handling instruction is generated based on the composite event identifier and the spatiotemporal evolution trend.
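The steps in [0013] to [0016] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Snapshot` and `CompositeEvent` record layouts, the identifier string format, the thresholds, and the severity rule (a cluster that shows both risk types and has spread to more grids since its first observation is rated `HIGH`) are all hypothetical, since the source does not define how severity or the handling instruction is derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One observation of a high-risk cluster area (hypothetical record layout)."""
    t: int                # observation time step
    grid_ids: frozenset   # grids covered by the cluster at time t
    max_risk: float       # highest environmental risk index in the cluster
    max_density: float    # highest crowd density in the cluster


@dataclass(frozen=True)
class CompositeEvent:
    risk_types: tuple     # e.g. ("COMPONENT_FAILURE", "CROWD_CONGESTION")
    location: tuple       # sorted grid ids
    severity: str         # "LOW", "MEDIUM" or "HIGH"

    def identifier(self) -> str:
        return f"{'+'.join(self.risk_types)}@{','.join(self.location)}#{self.severity}"


def build_event(history, risk_threshold, density_threshold):
    """Derive a composite event and a handling instruction from a cluster's history."""
    first, latest = history[0], history[-1]
    risk_types = []
    if latest.max_risk >= risk_threshold:
        risk_types.append("COMPONENT_FAILURE")
    if latest.max_density >= density_threshold:
        risk_types.append("CROWD_CONGESTION")
    # Spatiotemporal trend: the cluster is spreading if it now covers more grids.
    spreading = len(latest.grid_ids) > len(first.grid_ids)
    if len(risk_types) == 2 and spreading:
        severity = "HIGH"
    elif risk_types:
        severity = "MEDIUM"
    else:
        severity = "LOW"
    event = CompositeEvent(tuple(risk_types), tuple(sorted(latest.grid_ids)), severity)
    where = ",".join(event.location)
    instruction = []
    if "COMPONENT_FAILURE" in risk_types:
        instruction.append(f"repair components in {where}")
    if "CROWD_CONGESTION" in risk_types:
        instruction.append(f"divert crowd away from {where}")
    return event, instruction


history = [
    Snapshot(0, frozenset({"G1", "G2"}), max_risk=7.5, max_density=2.5),
    Snapshot(1, frozenset({"G1", "G2", "G3"}), max_risk=8.0, max_density=3.1),
]
event, instruction = build_event(history, risk_threshold=6.0, density_threshold=2.0)
print(event.identifier())  # COMPONENT_FAILURE+CROWD_CONGESTION@G1,G2,G3#HIGH
print(instruction)
```

The string form of the identifier is only one convenient choice; it keeps risk type, location, and severity together in a single loggable key.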

[0017] Optionally, based on the numerical value of the environmental risk index and the parameter range of the crowd behavior characteristics, grids with higher environmental risks and grid areas with abnormal crowd behavior are marked and identified as high-risk cluster areas, including:

[0018] Based on the magnitude of the environmental risk index, the highest value range of environmental risk is set, and based on the parameter range of the crowd behavior characteristics, the abnormal parameter ranges of crowd density distribution, movement speed vector, and fall behavior characteristics are set respectively.

[0019] Traverse the gridded risk map and mark the grids whose environmental risk index values are in the highest range as grids with high environmental risk. Traverse the dynamic risk heat map and mark the grids whose crowd behavior characteristics fall within the abnormal parameter ranges as grids with abnormal crowd behavior.

[0020] Calculate the distance between each grid with high environmental risk and each grid with abnormal crowd behavior, and merge the associated grids whose distance is less than a preset length into an initial clustering area;

[0021] An initial clustering area whose total number of grids reaches a preset number is identified as a high-risk cluster area. An initial clustering area with fewer grids than the preset number is also identified as a high-risk cluster area if the highest environmental risk index value or the most abnormal crowd behavior characteristic value among its grids exceeds the corresponding limit value.
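A minimal sketch of the cluster identification in [0018] to [0021], assuming planar grid-center coordinates in meters, Euclidean distance between grid centers, and a union-find structure to merge associated grids (the source does not name a merging method). The `Grid` fields, thresholds, and limits are hypothetical, and only crowd density is checked against the crowd limit to keep the example short.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Grid:
    gid: str
    center: tuple      # planar (x, y) of the grid center in meters (assumed)
    risk_index: float  # environmental risk index
    density: float     # crowd density, persons per m^2 (assumed unit)
    falls: int         # number of detected fall events


def identify_high_risk_clusters(grids, risk_min, density_min, max_dist,
                                min_size, risk_limit, density_limit):
    # [0019] Mark high-risk grids and grids with abnormal crowd behavior.
    high_risk = [g for g in grids if g.risk_index >= risk_min]
    abnormal = [g for g in grids if g.density >= density_min or g.falls > 0]

    # Union-find over grid ids; only grids in at least one close pair are added.
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # [0020] Merge high-risk/abnormal pairs closer than the preset length.
    for h in high_risk:
        for a in abnormal:
            if dist(h.center, a.center) < max_dist:
                union(h.gid, a.gid)

    by_id = {g.gid: g for g in grids}
    clusters = {}
    for gid in list(parent):
        clusters.setdefault(find(gid), []).append(by_id[gid])

    # [0021] Keep large clusters, or small ones that contain an extreme value.
    result = []
    for members in clusters.values():
        large = len(members) >= min_size
        extreme = (max(g.risk_index for g in members) > risk_limit
                   or max(g.density for g in members) > density_limit)
        if large or extreme:
            result.append(sorted(g.gid for g in members))
    return sorted(result)


grids = [
    Grid("G1", (0, 0), 8.0, 0.5, 0),
    Grid("G2", (50, 0), 2.0, 3.0, 0),
    Grid("G3", (100, 0), 7.0, 0.2, 0),
    Grid("G4", (500, 500), 9.5, 0.1, 0),
    Grid("G5", (550, 500), 1.0, 0.3, 1),
    Grid("G6", (1000, 1000), 6.0, 0.4, 0),
    Grid("G7", (1050, 1000), 1.5, 2.5, 0),
]
clusters = identify_high_risk_clusters(grids, risk_min=6.0, density_min=2.0, max_dist=60.0,
                                       min_size=3, risk_limit=9.0, density_limit=4.0)
print(clusters)  # [['G1', 'G2', 'G3'], ['G4', 'G5']]
```

Here G4 and G5 form a cluster of only two grids but are kept because G4's index exceeds the limit, while G6 and G7 are dropped because they are small and contain no extreme value.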

[0022] Optionally, three-dimensional point cloud data of crowd behavior is acquired by transmitting radar signals. The crowd density distribution, movement speed vector, and fall behavior characteristics in the three-dimensional point cloud data are spatially registered with the location coordinates of the gridded risk map to generate an overlaid dynamic risk heat map, including:

[0023] Deploy radar equipment to transmit and receive reflected signals, and generate three-dimensional point cloud data covering the monitoring area based on the time difference and intensity changes of the reflected signals. The three-dimensional point cloud data contains spatial location information corresponding to the behavior of people in the monitoring area.

[0024] The crowd density distribution, movement speed vector, and fall behavior features are extracted from the three-dimensional point cloud data, and the position coordinates of each grid in the gridded risk map are obtained.

[0025] Spatial registration is performed between the spatial locations corresponding to the crowd density distribution, the movement speed vector, and the fall behavior features and the position coordinates of each grid, to determine the grid to which each feature belongs and store them together.

[0026] On the gridded risk map, different color shades are assigned according to the magnitude of the crowd density within each grid, arrows mark the direction of the movement speed vectors, and special marks are added at locations where fall behavior features exist, generating an overlaid dynamic risk heat map.
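The overlay in [0026] can be illustrated with a text-only rendering, assuming the features have already been registered per grid: a density level stands in for the color shade, one of eight arrows approximates the movement speed vector, and `!` is the special fall mark. The palette, the eight-sector quantization, and the `cells` layout are hypothetical choices, not taken from the source.

```python
import math

SHADES = " .:*#"      # light to dark; a hypothetical text stand-in for color shades
ARROWS = "→↗↑↖←↙↓↘"   # eight compass sectors, counter-clockwise from east


def shade(density, max_density):
    """Map a grid's crowd density to one of len(SHADES) levels."""
    if max_density <= 0:
        return SHADES[0]
    level = int(density / max_density * (len(SHADES) - 1))
    return SHADES[min(level, len(SHADES) - 1)]


def arrow(vx, vy):
    """Quantize a movement speed vector to the nearest of eight directions."""
    if vx == 0 and vy == 0:
        return "·"
    sector = round(math.atan2(vy, vx) / (math.pi / 4)) % 8
    return ARROWS[sector]


def render_heat_map(cells, rows, cols):
    """cells maps (row, col) -> {"density", "velocity", "falls"} for registered grids."""
    max_density = max((c["density"] for c in cells.values()), default=0.0)
    lines = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cell = cells.get((r, c))
            if cell is None:
                row.append("   ")  # grid with no registered crowd data
                continue
            fall_mark = "!" if cell["falls"] else " "  # special mark for falls
            row.append(shade(cell["density"], max_density) + arrow(*cell["velocity"]) + fall_mark)
        lines.append("|".join(row))
    return "\n".join(lines)


cells = {
    (0, 0): {"density": 0.0, "velocity": (0.0, 0.0), "falls": 0},
    (0, 1): {"density": 4.0, "velocity": (1.0, 0.0), "falls": 1},
    (1, 0): {"density": 2.0, "velocity": (0.0, -0.8), "falls": 0},
}
print(render_heat_map(cells, rows=2, cols=2))
```

A graphical version would map the same levels to a color ramp and draw real arrows; the quantization logic stays the same.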

[0027] Optionally, the spatial registration of the crowd density distribution, the movement speed vector, and the spatial location corresponding to the fall behavior feature with the position coordinates of each grid is performed to determine the grid to which each feature belongs and store them in association, including:

[0028] Extract the spatial coordinates of each density measurement point corresponding to the crowd density distribution, the spatial coordinates of the starting point of each vector corresponding to the movement speed vector, and the spatial coordinates of each behavior occurrence point corresponding to the fall behavior feature;

[0029] The boundary range of each grid is determined from its position coordinates; the boundary range consists of the minimum and maximum coordinate values of the grid in the horizontal and vertical directions;

[0030] Each density measurement point of the crowd density distribution, each vector starting point of the movement speed vector, and each behavior occurrence point of the fall behavior features is assigned to a grid if both its horizontal and vertical coordinates lie between that grid's minimum and maximum coordinate values.

[0031] A unique identifier is assigned to each grid, and the crowd density distribution, movement speed vectors, and fall behavior features belonging to that grid are associated with its unique identifier and stored in the corresponding data record table.
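The bounding-box test in [0029] to [0031] might look like this, assuming axis-aligned rectangular grids and features already reduced to tuples whose first two elements are the horizontal and vertical coordinates. `GridBounds`, `register_features`, and the tuple layouts are hypothetical names; a point on an edge shared by two grids is assigned to the first matching grid, a tie-breaking rule the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridBounds:
    gid: str      # unique grid identifier, e.g. "G1"
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y):
        # [0030] both horizontal and vertical values lie within the grid's range
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def register_features(bounds, density_points, vector_starts, fall_points):
    """Assign each feature to the grid containing its (x, y) and store it per grid id.

    density_points: (x, y, density); vector_starts: (x, y, vx, vy); fall_points: (x, y).
    """
    table = {b.gid: {"density": [], "vectors": [], "falls": []} for b in bounds}
    for key, items in (("density", density_points),
                       ("vectors", vector_starts),
                       ("falls", fall_points)):
        for item in items:
            x, y = item[0], item[1]
            for b in bounds:
                if b.contains(x, y):
                    table[b.gid][key].append(item)
                    break  # a point on a shared edge goes to the first matching grid
    return table


bounds = [GridBounds("G1", 0, 50, 0, 50), GridBounds("G2", 50, 100, 0, 50)]
table = register_features(
    bounds,
    density_points=[(10, 10, 1.2), (60, 20, 3.4)],
    vector_starts=[(12, 8, 0.5, 0.0)],
    fall_points=[(70, 30), (500, 500)],  # the second point lies outside every grid
)
print(table["G2"])
```

The linear scan over grids keeps the sketch close to the text; with regular square grids, the containing cell can instead be computed directly by integer division of the coordinates.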

[0032] Optionally, based on the risk data corresponding to the composite event identifier, the handling instructions are implemented in a local virtual environment and the execution effect is recorded. The maintenance priority order and crowd route allocation are adjusted according to the execution effect to generate a global optimization decision. The global optimization decision is then fed back to each grid to update its local strategy model, forming a dynamic decision-making closed-loop system, including:

[0033] Collect the risk data corresponding to the composite event identifier, input the risk data into the local virtual environment, execute the handling instruction in the local virtual environment, and record its execution process and execution effect. The execution effect includes the efficiency of resolving urban component failures, the degree to which crowd congestion is relieved, and resource consumption.

[0034] Based on the execution effect, the maintenance priority order for urban component failure risks and the crowd route allocation for crowd congestion risks are adjusted accordingly to obtain a revised maintenance priority order and crowd route allocation.

[0035] Based on the revised maintenance priority order and crowd route allocation, a global optimization decision is generated and fed back to each grid. Each grid updates its local strategy model according to the global optimization decision, forming a dynamic decision-making closed-loop system.
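The closed loop in [0033] to [0035] can be sketched with a toy virtual environment. Everything here is an assumption for illustration: the `simulate` dynamics, the rule that events with more simulated unresolved failures move up the maintenance queue, the route split inversely proportional to simulated congestion, and the `LocalPolicy` class standing in for each grid's local strategy model.

```python
from dataclasses import dataclass, field


@dataclass
class Effect:
    failures: int            # component failures in the event
    resolution_rate: float   # share of failures resolved in the simulation
    route_congestion: dict   # simulated residual crowd density per route
    resource_cost: float     # crew-hours consumed (assumed unit)


def simulate(risk, crews):
    """Toy stand-in for the local virtual environment (hypothetical dynamics)."""
    resolution = min(1.0, crews / max(risk["failures"], 1))
    relief = min(0.9, 0.1 * crews)  # assume each crew relieves congestion by 10 %
    congestion = {r: d * (1 - relief) for r, d in risk["routes"].items()}
    return Effect(risk["failures"], resolution, congestion, crews * 2.0)


def global_decision(effects):
    # Maintenance priority: larger simulated residual risk goes first.
    residual = {eid: e.failures * (1 - e.resolution_rate) for eid, e in effects.items()}
    priority = sorted(residual, key=lambda eid: (-residual[eid], eid))
    # Crowd routes: within each event, share flow inversely to simulated congestion.
    routes = {}
    for e in effects.values():
        inverse = {r: 1.0 / max(c, 1e-6) for r, c in e.route_congestion.items()}
        total = sum(inverse.values())
        routes.update({r: v / total for r, v in inverse.items()})
    return {"priority": priority, "routes": routes}


@dataclass
class LocalPolicy:
    """Per-grid local strategy model; here just the decisions it acts on."""
    grid_id: str
    priority: list = field(default_factory=list)
    routes: dict = field(default_factory=dict)

    def update(self, decision, event_grid):
        self.priority = [e for e in decision["priority"] if event_grid[e] == self.grid_id]
        self.routes = dict(decision["routes"])


risks = {
    "E1": {"grid": "G1", "failures": 4, "routes": {"R1": 3.0, "R2": 1.5}},
    "E2": {"grid": "G2", "failures": 1, "routes": {"R3": 2.0}},
}
effects = {eid: simulate(r, crews=2) for eid, r in risks.items()}  # execute and record
decision = global_decision(effects)                               # global optimization
policies = {g: LocalPolicy(g) for g in ("G1", "G2")}
for policy in policies.values():                                  # feed back to each grid
    policy.update(decision, {eid: r["grid"] for eid, r in risks.items()})
print(decision["priority"], policies["G1"].priority)  # ['E1', 'E2'] ['E1']
```

In a real deployment the toy `simulate` function would be replaced by the virtual environment itself; the structure of record effect, decide globally, and push back to each grid is what the sketch is meant to show.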

[0036] Optionally, an environmental risk index is set in the gridded risk map, and the weight of the environmental risk index is dynamically adjusted based on the correlation between changes in the state of urban components and historical failures through reinforcement learning, including:

[0037] An environmental risk index is set in each grid of the gridded risk map. The environmental risk index includes the number of failures of urban components within the grid, the time elapsed since the most recent failure, and the basic impact value corresponding to the type of urban component.

[0038] When the state of an urban component changes, the grid position and the state information before and after the change are extracted. Combined with the failures that followed similar state changes in the historical failure records, a reinforcement learning framework is used to generate weight adjustment suggestions for each influencing factor.

[0039] Based on the proposed weight adjustment, the weights of the corresponding influencing factors in the environmental risk index are corrected, and the environmental risk index is recalculated. The status changes and failure occurrences of subsequent urban components are continuously monitored, and the new feedback results are input into the reinforcement learning framework to dynamically adjust the weights of the environmental risk index.
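The source does not say which reinforcement learning algorithm its framework uses. As a minimal sketch, the loop in [0037] to [0039] can be modeled as a multi-armed bandit, the simplest reinforcement learning setting: each action shifts one weight step from one factor to another, and the reward is how much the shift improves the index's separation between historical cases that did and did not end in failure. The factor names, the pre-normalized case data, the step size, and the epsilon-greedy policy are all hypothetical.

```python
import random

FACTORS = ("failure_count", "interval", "base_impact")
# Action = move one weight step from factor src to factor dst.
ACTIONS = [(src, dst) for src in FACTORS for dst in FACTORS if src != dst]


def separation(weights, cases):
    """How much higher the index is for cases that failed than for cases that did not."""
    def index(case):
        return sum(weights[f] * case[f] for f in FACTORS)
    failed = [index(c) for c in cases if c["failed"]]
    no_fail = [index(c) for c in cases if not c["failed"]]
    return sum(failed) / len(failed) - sum(no_fail) / len(no_fail)


class WeightTuner:
    """Epsilon-greedy bandit that proposes weight adjustments (hypothetical stand-in)."""

    def __init__(self, weights, step=0.1, epsilon=0.1, seed=0):
        self.weights = dict(weights)
        self.step = step
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in ACTIONS}  # running mean reward per action
        self.count = {a: 0 for a in ACTIONS}

    def suggest(self):
        untried = [a for a in ACTIONS if self.count[a] == 0]
        if untried:                              # try every action once first
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)      # explore
        return max(ACTIONS, key=lambda a: self.value[a])  # exploit

    def shifted(self, action):
        src, dst = action
        delta = min(self.step, self.weights[src])  # never drive a weight below zero
        new = dict(self.weights)
        new[src] -= delta
        new[dst] += delta
        return new

    def learn(self, action, reward):
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


# Historical "state change then failure?" cases, features pre-normalized to [0, 1].
cases = [
    {"failure_count": 0.9, "interval": 0.2, "base_impact": 0.5, "failed": True},
    {"failure_count": 0.8, "interval": 0.6, "base_impact": 0.5, "failed": True},
    {"failure_count": 0.1, "interval": 0.3, "base_impact": 0.5, "failed": False},
    {"failure_count": 0.2, "interval": 0.7, "base_impact": 0.5, "failed": False},
]
tuner = WeightTuner({"failure_count": 0.3, "interval": 0.4, "base_impact": 0.3})
for _ in range(50):
    action = tuner.suggest()
    reward = separation(tuner.shifted(action), cases) - separation(tuner.weights, cases)
    tuner.learn(action, reward)
best = max(ACTIONS, key=lambda a: tuner.value[a])
tuner.weights = tuner.shifted(best)  # correct the weights, then keep monitoring
print(best, tuner.weights)
```

With these illustrative cases the learned suggestion is to move weight from the failure interval to the failure count, the same direction as the example in [0072]; in continuous operation the loop would repeat as new monitoring feedback arrives.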

[0040] Secondly, this application provides a reinforcement learning-based urban risk governance decision optimization system, including:

[0041] The data acquisition module is used to collect the location coordinates and historical fault records of urban components. Based on the relationship between the location coordinates of each urban component and the historical fault records, a gridded risk map is constructed. The urban components are manhole covers and fire hydrants.

[0042] The adjustment module is used to set an environmental risk index in the gridded risk map. Through reinforcement learning, the weights of the environmental risk index are dynamically adjusted according to the correlation between changes in the status of urban components and historical failures. The environmental risk index is a comprehensive indicator used to quantify the risk level of urban components within the grid. It is determined by the number of failures of urban components within the grid, the time elapsed since the most recent failure, and the basic impact value corresponding to the type of urban component.

[0043] The registration module is used to acquire three-dimensional point cloud data of crowd behavior by transmitting radar signals, and to spatially register the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data with the position coordinates of the gridded risk map to generate an overlaid dynamic risk heat map.

[0044] The generation module is used to identify high-risk cluster areas with geographical location association based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map. According to the spatiotemporal evolution trend of the high-risk cluster areas, it generates composite event identifiers and handling instructions that include urban component failure and crowd congestion characteristics. The composite event identifier refers to a comprehensive identifier that includes risk type, location of occurrence, and severity.

[0045] The feedback module is used to implement the disposal instructions in the local virtual environment based on the risk data corresponding to the composite event identifier and record the execution effect. Based on the execution effect, the module adjusts the maintenance priority order and the allocation of crowd routes, generates a global optimization decision, and feeds the global optimization decision back to each grid to update the local strategy model, forming a dynamic decision-making closed-loop system. The local strategy model refers to the model used by each grid for autonomous decision-making.

[0046] Thirdly, this application provides an electronic device, comprising:

[0047] Memory, used to store computer programs;

[0048] A processor is configured to execute the computer program to implement the steps of a reinforcement learning-based urban risk governance decision optimization method as described in the first aspect above.

[0049] Fourthly, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, can implement the steps of the reinforcement learning-based urban risk governance decision optimization method described in the first aspect above.

[0050] This application presents a reinforcement learning-based urban risk governance decision optimization method. By collecting the location coordinates and historical fault records of urban components and combining their location relationships with the fault records to construct a gridded risk map, it spatially integrates scattered urban component data and provides a structured spatial carrier for risk analysis. By setting an environmental risk index in the gridded risk map and using reinforcement learning to dynamically adjust the index weights based on the correlation between state changes of urban components and historical faults, the risk assessment indicators can be adaptively optimized, improving the timeliness and accuracy of environmental risk assessment. Furthermore, three-dimensional point cloud data of crowd behavior is acquired by radar, and the crowd characteristics are registered to the coordinates of the gridded risk map to generate an overlaid dynamic risk heat map, fusing crowd data with spatial risk data and intuitively presenting the spatial distribution of crowd-related risks. By combining gridded risk map indices with dynamic risk heat map features, high-risk clusters are identified, and composite event identifiers and handling instructions are generated based on their spatiotemporal evolution, which allows complex high-risk areas to be located precisely and makes risk response more targeted. Finally, handling instructions are executed in a virtual environment and their effects are recorded; maintenance priorities and crowd route allocation are adjusted accordingly to generate global optimization decisions, which are fed back to update the grid models, forming a dynamic decision-making closed loop. This enables continuous decision optimization and coordinated linkage across grids, enhancing the overall effectiveness of urban risk governance.

Attached Figure Description

[0051] To more clearly illustrate the technical solutions of the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0052] Figure 1 is a flowchart of a reinforcement learning-based urban risk governance decision optimization method provided in an embodiment of this application;

[0053] Figure 2 is a flowchart of a specific implementation of a reinforcement learning-based urban risk governance decision optimization method provided in an embodiment of this application;

[0054] Figure 3 is a scenario diagram of a reinforcement learning-based urban risk governance decision optimization method provided in an embodiment of this application;

[0055] Figure 4 is a schematic structural diagram of a reinforcement learning-based urban risk governance decision optimization system provided in an embodiment of this application.

Detailed Implementation

[0056] In urban grid-based governance, existing early warning systems based on fixed sensors and preset rules have significant limitations: sensor data from various departments are stored in a scattered manner, making it difficult to achieve risk correlation analysis across grids, resulting in a lack of overall risk identification; risk judgment relies on fixed thresholds and preset rules, which cannot be adjusted according to dynamic changes such as aging urban components and population movement, making it prone to misjudgment or omission; independent responses from each grid lack global coordination, resulting in delayed identification of high-risk areas and an imbalance in the allocation of disposal resources, making it difficult to effectively address the pain points of data silos and response delays.

[0057] To address the aforementioned issues, this application proposes a reinforcement learning-based method for optimizing urban risk governance decisions. This method integrates urban component data by constructing a gridded risk map, dynamically adjusts environmental risk index weights using reinforcement learning to adapt to changes, generates a dynamic risk heat map by fusing radar-acquired crowd behavior data to accurately identify high-risk clusters, and verifies the effectiveness of responses in a virtual environment to generate global optimization decisions, which in turn update the grid model, forming a closed loop. This approach breaks down data silos through integration, dynamically adjusts to adapt to changes, and enhances collaboration through global optimization, fundamentally solving the shortcomings of existing solutions and improving the accuracy and response efficiency of urban risk governance.

[0058] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are merely some embodiments of the present application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0059] The core of this application is to provide a method for optimizing urban risk governance decisions based on reinforcement learning. A flowchart of one specific implementation is shown in Figure 1; the method includes:

[0060] S101. Collect the location coordinates and historical fault records of urban components. Based on the relationship between the location coordinates of each urban component and the historical fault records, construct a gridded risk map.

[0061] In the above steps, urban components are manhole covers and fire hydrants; the location coordinates of urban components refer to the specific location information of urban components in the geographic space of the city, usually including latitude and longitude or planar coordinates, which are used to identify the spatial location of urban components; historical fault records refer to relevant information about past faults of urban components, including the time of fault occurrence, fault type such as missing urban components, water leakage of urban components, etc., and fault handling results; gridded risk map refers to a visual spatial model formed by dividing the urban space into several regular grids and integrating the location coordinates of urban components with historical fault records, which is used to intuitively present the spatial distribution of risks of urban components.

[0062] In this embodiment, the basic data collection is first completed by collecting the location coordinates and historical fault records of urban components. For example, the latitude and longitude coordinates and fault records of 100 urban components in District A over the past three years are retrieved through the urban component management system. Secondly, spatial processing is performed based on the relationships between the location coordinates of each urban component. For example, according to the geographical area of District A, the area is divided into several grids of 50 m × 50 m. The grid to which each urban component belongs is calculated, determining the specific distribution of the urban components within the grids. For example, urban component 1 is located in grid G1, urban component A is located in grid G3, etc., clarifying the spatial association between urban components and grids. Next, risk information is labeled on the grids based on historical fault records. For example, the number of faults and fault types of urban components within each grid are statistically analyzed. For instance, if two urban component faults occurred in grid G1 over three years, and one urban component leakage fault occurred in grid G3, this fault information is associated with and stored alongside the corresponding grids. Finally, by integrating the spatial division of the grid, the distribution of urban components, and the associated fault records, structured data containing grid numbers, a list of urban components within each grid, and fault statistics is generated. This data is then presented as a gridded risk map using visualization technology, with each grid in the map displaying the urban components it contains and their historical fault information.
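The gridding step in [0062] can be sketched as below, assuming the coordinates have already been projected to planar meters (the source allows either latitude/longitude or planar coordinates) and that grids are square cells indexed from a chosen origin. `Component`, `grid_of`, and `build_risk_map` are hypothetical names, and the `(row, col)` tuple stands in for labels such as G1.

```python
from collections import Counter
from dataclasses import dataclass

CELL_SIZE = 50.0  # meters, as in the District A example


@dataclass(frozen=True)
class Component:
    cid: str
    kind: str   # "manhole_cover" or "fire_hydrant"
    x: float    # planar easting in meters (assumed already projected)
    y: float    # planar northing in meters


def grid_of(x, y, origin=(0.0, 0.0), cell=CELL_SIZE):
    """Return the (row, col) index of the square cell containing (x, y)."""
    return int((y - origin[1]) // cell), int((x - origin[0]) // cell)


def build_risk_map(components, faults):
    """faults: (component_id, fault_type) records; returns per-grid components and fault counts."""
    grid_by_component = {}
    risk_map = {}
    for c in components:
        gid = grid_of(c.x, c.y)
        grid_by_component[c.cid] = gid
        cell = risk_map.setdefault(gid, {"components": [], "faults": Counter()})
        cell["components"].append(c.cid)
    for cid, fault_type in faults:
        if cid in grid_by_component:  # skip records for components not in the inventory
            risk_map[grid_by_component[cid]]["faults"][fault_type] += 1
    return risk_map


components = [
    Component("c1", "manhole_cover", 10.0, 20.0),
    Component("c2", "fire_hydrant", 120.0, 30.0),
    Component("c3", "manhole_cover", 40.0, 45.0),
]
faults = [("c1", "missing"), ("c3", "missing"), ("c2", "leak")]
risk_map = build_risk_map(components, faults)
for gid, cell in sorted(risk_map.items()):
    print(gid, cell["components"], dict(cell["faults"]))
```

Computing the cell by integer division keeps the assignment constant-time per component; the per-grid fault counts are the raw inputs for the environmental risk index of step S102.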

[0063] In a practical application, in the urban risk management project of District B, staff first exported detailed data on 200 and 50 urban components of the two types, respectively, from the urban component management database. This included the latitude and longitude coordinates of each component, such as component A1 at (E116.3°, N39.9°) and component B5 at (E116.4°, N39.8°), as well as fault records from the past five years covering the time of occurrence, the specific problem (for example, component displacement or interface damage), and maintenance records. Based on the actual area of District B, the area was then divided into 50 grids (G1 to G50) of a standard size of 100 m × 100 m, and each urban component was located in its grid by coordinate matching; for example, 20 and 3 urban components of the two types, respectively, were assigned to grid G12. The faults in each grid were then analyzed, revealing that grid G8 experienced 6 and 2 failures of the two component types, respectively, within 5 years; three of these failures were due to missing components and two were due to leaks. Finally, this information is integrated to generate a gridded risk map in which each grid is colored according to its failure frequency. Grids with high failure rates, such as G8, are displayed in dark red, clearly showing the spatial distribution of urban component risks.

[0064] In the overall scheme of step S101 above, by integrating the scattered urban component location data and historical fault information, the urban space is processed in a grid structure, realizing the spatial presentation of urban component risk information. This provides a unified spatial carrier for subsequent risk assessment and high-risk area identification, enabling the originally scattered urban component data to form a risk distribution model that can be intuitively analyzed, and improving the basic data support capability for urban component risk governance.

[0065] S102. Set an environmental risk index in the gridded risk map, and dynamically adjust the weight of the environmental risk index based on the correlation between changes in the status of urban components and historical failures through reinforcement learning.

[0066] Optionally, step S102 may specifically include the following steps:

S1021. Set an environmental risk index in each grid of the gridded risk map. The environmental risk index includes the number of failures of urban components within the grid, the time elapsed since the most recent failure, and the basic impact value corresponding to the type of urban component.

[0068] S1022. When the state of an urban component changes, extract the grid position of the urban component and its state information before and after the change. Combine this with the failures that followed similar state changes in the historical failure records, and generate weight adjustment suggestions for each influencing factor through a reinforcement learning framework.

[0069] S1023. Based on the weight adjustment suggestions, correct the weights of the corresponding influencing factors in the environmental risk index, recalculate the environmental risk index, continuously monitor the status changes and failure occurrences of subsequent urban components, and input the new feedback results into the reinforcement learning framework to dynamically adjust the weights of the environmental risk index.

[0070] In the above steps, the environmental risk index is a comprehensive indicator used to quantify the risk level of urban components within the grid. It comprises the number of failures of urban components within the grid, the time elapsed since the most recent failure, and the basic impact value corresponding to the type of urban component. The number of failures is the total number of times urban components in the grid have failed; the time since the most recent failure is the time elapsed since the last failure; and the basic impact value is an inherent risk value set according to the functional importance of each type of urban component. For example, a component type with a more critical safety function is usually assigned a higher basic impact value than an ordinary urban component. Reinforcement learning is a technique that optimizes decisions by continuously interacting with an environment and learning from its feedback; here it is used to adjust the weight of each influencing factor in the environmental risk index based on the correlation between changes in the state of urban components and failures. A change in the state of an urban component is a change in its physical condition, such as loosening, leakage, or abnormal pressure. A weight adjustment suggestion is a set of suggested values, output by the reinforcement learning framework, for correcting the proportion of each influencing factor in the environmental risk index.

[0071] In this embodiment of the application, an environmental risk index is first set in each grid of the gridded risk map in step S1021. For example, in grid G1 of City A, the system finds that the grid contains 3 urban components, 2 of which have failed over the past 3 years, so the failure count is recorded as 2. The most recent failure occurred 6 months ago, so the most recent failure interval is 6 months. Since all components in this grid are ordinary urban components, the basic impact value for the component type is preset to 3. These values are the initial constituent elements of the environmental risk index of this grid.

[0072] Next, in step S1022, when the state of a city component changes, the relevant information is extracted and weight adjustment suggestions are generated. For example, if urban component J1 in grid G1 is found to have loose edges during a monthly inspection, the system immediately extracts its location in grid G1 together with the state change information "normal for 3 months before loosening; a 'missing' fault within 1 month after loosening". At the same time, it retrieves similar "loose edges of urban components" state changes from the historical fault records and finds 8 cases in the past 5 years, 6 of which were followed by a fault within 2 months of loosening, indicating a high fault correlation rate. By analyzing these "state change to fault occurrence" correlations, the reinforcement learning framework learns how the fault probability in the loose state relates to the number of faults and to the fault interval, and outputs specific suggestions such as "increase the weight of the number of faults and decrease the weight of the most recent fault interval".

[0073] Finally, in step S1023 the weights are corrected and the index is adjusted dynamically. For example, following the suggestion from step S1022, the weights of the number of failures, the most recent failure interval, and the basic impact value in grid G1, originally 0.3, 0.4, and 0.3, are changed: the weight of the number of failures rises to 0.4, the weight of the most recent failure interval falls to 0.3, and the weight of the basic impact value stays at 0.3. The environmental risk index of the grid is then recalculated as "environmental risk index = number of failures × weight 1 + most recent failure interval × weight 2 + basic impact value × weight 3". The system continues to monitor the urban components in the grid through sensors, for example by collecting loosening data for component J1 every week. If no failure occurs within 3 months, the new feedback "continuously loose but no failure" is input into the reinforcement learning framework, which fine-tunes the weights on the new data so that the index keeps tracking actual changes in risk.
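The index calculation in step S1023 is a plain weighted sum. The sketch below applies it to the grid G1 figures from [0071] and [0073]. It assumes the interval is expressed in months and that the raw factors are not normalized, since the text specifies neither; `GridFactors` and `environmental_risk_index` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GridFactors:
    """Raw inputs of the environmental risk index for one grid (hypothetical names)."""
    failure_count: int         # number of failures of urban components in the grid
    months_since_last: float   # most recent failure interval (assumed unit: months)
    base_impact: float         # basic impact value preset for the component type


def environmental_risk_index(f: GridFactors, weights: Tuple[float, float, float]) -> float:
    """index = failures x w1 + most recent failure interval x w2 + basic impact value x w3."""
    w1, w2, w3 = weights
    return f.failure_count * w1 + f.months_since_last * w2 + f.base_impact * w3


g1 = GridFactors(failure_count=2, months_since_last=6, base_impact=3)
before = environmental_risk_index(g1, (0.3, 0.4, 0.3))  # 0.6 + 2.4 + 0.9
after = environmental_risk_index(g1, (0.4, 0.3, 0.3))   # 0.8 + 1.8 + 0.9
print(round(before, 2), round(after, 2))  # 3.9 3.5
```

With unnormalized factors, moving weight from the interval term to the failure-count term lowers the G1 index from 3.9 to 3.5, because the interval (6) is numerically larger than the failure count (2).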

[0074] In a practical application, City B configured an environmental risk index for each grid of its gridded risk map. Grid G8 contains two urban components; the historical fault records show one water leakage fault in the past two years, with a most recent fault interval of 10 months, and the basic impact value for the component type is preset to 5. When urban component X2 showed corrosion at its interface, the system extracted the component's location in grid G8 together with the state information "normal for 6 months before corrosion; water leakage 3 weeks after corrosion". Combining this with historical data showing that water leakage faults become more frequent within 2 months of interface corrosion, the reinforcement learning framework found a stronger correlation between faults under corrosion and the most recent fault interval, and recommended raising the weight of the most recent fault interval from 0.2 to 0.3. After applying the recommendation, the system recalculated the environmental risk index of grid G8 and then monitored component X2 monthly. When the leak was resolved and no leakage recurred within the following month, this fault-free result was fed back into the framework, which fine-tuned the weight to 0.25 so that the index more accurately reflects the actual risk.
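The text does not name the reinforcement learning algorithm. As a deliberately simplified stand-in, the sketch below treats each piece of feedback as a signed reward on one factor weight: a failure after a state change pushes the weight up, a fault-free outcome pulls it down, and the step size decays as 1/(1 + n) with the number of updates n. The class name, `base_step`, and the decay schedule are assumptions; with `base_step = 0.1` they happen to reproduce the G8 sequence 0.2, 0.3, 0.25.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FactorWeightLearner:
    """Hypothetical stand-in for the unspecified RL framework (not the patented method)."""
    weights: Dict[str, float]
    base_step: float = 0.1   # assumed initial step size
    updates: int = 0         # number of feedback events processed so far

    def feedback(self, factor: str, failure_followed: bool) -> float:
        step = self.base_step / (1 + self.updates)       # step size decays as 1/(1+n)
        reward = 1.0 if failure_followed else -1.0       # sign of the feedback
        w = self.weights[factor] + reward * step
        self.weights[factor] = min(1.0, max(0.0, w))     # keep the weight in [0, 1]
        self.updates += 1
        return self.weights[factor]


# Grid G8: only the interval weight is given in the text, so only it is tracked here.
learner = FactorWeightLearner({"most_recent_interval": 0.2})
w1 = learner.feedback("most_recent_interval", failure_followed=True)   # leak after corrosion
w2 = learner.feedback("most_recent_interval", failure_followed=False)  # one month fault-free
print(round(w1, 2), round(w2, 2))  # 0.3 0.25
```

The sketch does not renormalize the other weights. In the G1 example the weights keep summing to 1 because weight is moved between factors, which a fuller implementation would also need to handle.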

[0075] In the overall scheme of step S102 above, an environmental risk index combining multiple factors is set, and reinforcement learning continuously analyzes the correlation between state changes of urban components and historical failures to adjust the weight of each factor dynamically, so that the risk index tracks the actual risk trend of the urban components in real time. This avoids the mismatch between assessed and actual risk that arises under a fixed-weight model, brings the environmental risk index closer to the real risk status of urban components, and provides dynamic, reliable quantitative support for subsequent identification of high-risk areas and governance decisions.

[0076] S103. Acquire three-dimensional point cloud data of crowd behavior by transmitting radar signals, and spatially register the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data with the position coordinates of the gridded risk map to generate an overlaid dynamic risk heat map.

[0077] Optionally, step S103 may specifically include the following steps:

[0078] S1031. Deploy radar equipment, transmit and receive reflected signals through the radar equipment, and generate three-dimensional point cloud data covering the monitoring area based on the time difference and intensity change of the reflected signals. The three-dimensional point cloud data contains spatial location information corresponding to the behavior of people in the monitoring area.

[0079] S1032. Extract crowd density distribution, movement speed vector, and fall behavior features from the three-dimensional point cloud data, and obtain the position coordinates of each grid in the gridded risk map;

[0080] S1033. Spatially register the locations associated with the crowd density distribution, the movement speed vectors, and the fall behavior features against the position coordinates of each grid, determine the grid to which each feature belongs, and store the features together.

[0081] Specifically, step S1033 may include the following processes: extracting the spatial coordinates of each density measurement point of the crowd density distribution, of each vector starting point of the movement speed vectors, and of each behavior occurrence point of the fall behavior features; determining the boundary range of each grid from its position coordinates, the boundary range comprising the minimum and maximum coordinate values of the grid in the horizontal and vertical directions; for each density measurement point, vector starting point, and behavior occurrence point, selecting those whose horizontal and vertical coordinates both lie between the minimum and maximum coordinate values of a grid and assigning them to that grid; and assigning a unique identifier to each grid, associating the crowd density distribution, movement speed vectors, and fall behavior features that belong to the grid with that identifier, and storing them in the corresponding data record table.
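The boundary test in step S1033 amounts to a point-in-rectangle check on the horizontal and vertical coordinates. A minimal sketch follows, assuming inclusive bounds (so a point on a shared edge goes to the first matching grid) and a dictionary as the data record table. `GridCell`, `register_features`, the G6 bounds, and the feature coordinates are hypothetical; the G5 bounds and the identifier "G5-2024" come from [0086].

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (kind, x, y, payload), where kind is "density", "velocity" or "fall"
Feature = Tuple[str, float, float, object]


@dataclass(frozen=True)
class GridCell:
    grid_id: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Both the horizontal and the vertical coordinate must lie within the bounds.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def register_features(cells: Sequence[GridCell], features: Sequence[Feature],
                      tag: str) -> Dict[str, Dict[str, List[tuple]]]:
    """Assign each feature to the first grid containing it and group the
    features under a unique identifier of the form '<grid_id>-<tag>'."""
    table: Dict[str, Dict[str, List[tuple]]] = {}
    for kind, x, y, payload in features:
        for cell in cells:
            if cell.contains(x, y):
                record = table.setdefault(f"{cell.grid_id}-{tag}",
                                          {"density": [], "velocity": [], "fall": []})
                record[kind].append((x, y, payload))
                break  # each feature belongs to exactly one grid
    return table


cells = [GridCell("G5", 100, 120, 80, 100), GridCell("G6", 120, 140, 80, 100)]  # G6 bounds hypothetical
features = [("density", 110.0, 95.0, 8), ("velocity", 112.0, 90.0, (-2.0, 1.0)),
            ("fall", 105.0, 85.0, True)]
table = register_features(cells, features, tag="2024")
print(sorted(table))  # ['G5-2024']
```

Features that fall outside every grid are silently dropped in this sketch; a production version would likely log them instead.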

[0082] S1034. On the gridded risk map, shade each grid in a color whose depth corresponds to the crowd density within it, use arrows to indicate the direction of the movement speed vectors, and mark the locations where fall behavior features are present, to generate an overlaid dynamic risk heat map.

[0083] In the above steps, 3D point cloud data refers to a set of 3D data containing spatial location information within the monitoring area, generated through radar signal reflection. It consists of a large number of discrete spatial point coordinates and can reflect the spatial distribution of the population. Population density distribution refers to the distribution of people per unit area, used to reflect the degree of population concentration within the area. Movement speed vector refers to vector data describing the direction and speed of population movement, including both direction and speed magnitude. Fall behavior characteristics refer to the feature information corresponding to human fall actions identified from the point cloud data, including sudden drops in height, abnormal posture, etc. Spatial registration refers to the process of matching spatial data from different sources, specifically matching population feature coordinates with grid coordinates. A dynamic risk heatmap is a visual chart that overlays population behavior characteristics with a gridded risk map, intuitively displaying risk distribution through color depth, arrows, and other markers.

[0084] In this embodiment of the application, radar equipment is first deployed and three-dimensional point cloud data is generated in step S1031. For example, three radar devices are deployed around the central square of District A: on a lamp post on the east side of the square, on a shop roof on the west side, and on a pillar of the corridor on the north side. The radars continuously emit electromagnetic signals, which are reflected back when they hit people in the square. Each device calculates the distance to a person from the time difference between signal transmission and reception, and distinguishes different human targets using changes in signal strength. The result is three-dimensional point cloud data covering the entire square, in which the coordinates (X, Y, Z) of each point correspond to a spatial position on a human body.
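The range measurement in [0084] follows from the round-trip time of the echo: the signal travels to the person and back, so the distance is r = c·Δt/2. Turning that range into an (X, Y, Z) point also needs the echo's direction, which the text does not describe. The sketch assumes each echo arrives with an azimuth and an elevation angle and that echoes below a hypothetical intensity threshold are discarded; `Echo`, `echoes_to_points`, and `min_intensity` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

C = 299_792_458.0  # speed of light in m/s


@dataclass
class Echo:
    delay_s: float        # time between transmission and reception
    azimuth_rad: float    # assumed: horizontal angle from the radar's x axis
    elevation_rad: float  # assumed: vertical angle above the horizontal
    intensity: float      # received signal strength (arbitrary units)


def echo_range(delay_s: float) -> float:
    """Round-trip time of flight: the signal covers the distance twice."""
    return C * delay_s / 2.0


def echoes_to_points(echoes: Sequence[Echo], radar_xyz: Tuple[float, float, float],
                     min_intensity: float = 0.1) -> List[Tuple[float, float, float]]:
    """Convert echoes to 3D points in the map frame (spherical to Cartesian)."""
    x0, y0, z0 = radar_xyz
    points = []
    for e in echoes:
        if e.intensity < min_intensity:   # hypothetical clutter filter
            continue
        r = echo_range(e.delay_s)
        horizontal = r * math.cos(e.elevation_rad)
        points.append((x0 + horizontal * math.cos(e.azimuth_rad),
                       y0 + horizontal * math.sin(e.azimuth_rad),
                       z0 + r * math.sin(e.elevation_rad)))
    return points


# A 100 ns round trip corresponds to roughly 15 m.
print(round(echo_range(100e-9), 3))  # 14.99
```

Fusing points from the three radars into one cloud would additionally require each device's mounting position, passed here as `radar_xyz`.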

[0085] Next, in step S1032, crowd behavior features are extracted and grid coordinates are obtained. For example, the number of points per 10 square meters in the generated 3D point cloud data is counted to obtain the crowd density distribution. The northeast corner of the square has 8 points per 10 square meters, indicating relatively high density there. The movement speed vector of the crowd is calculated from the change in point positions across consecutive point cloud frames: the crowd in the northeast corner moves mainly southwest, with a speed vector of (-2 m/s, 1 m/s). Fall behavior features are extracted by detecting points whose height drops from 1.7 meters to 0.5 meters within a short period, accompanied by an irregular posture. At the same time, the grid coordinates of the square are obtained from the gridded risk map: the square is divided into 10 grids, G1 to G10, each with a defined coordinate range.
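The three features in [0085] can each be computed from simple geometry on the point cloud. The sketch below assumes the points have already been grouped into per-person clusters that are tracked across frames (the text does not say how). It takes the 1.7 m and 0.5 m heights from the example and treats "a short period" as at most 1 second, which is an assumption. It checks only the height drop, not posture. Function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def density_per_10m2(points: Sequence[Point], area_m2: float) -> float:
    """People per 10 m², treating each point as one person as in the example."""
    return len(points) / area_m2 * 10.0


def velocity(prev_xy: Tuple[float, float], curr_xy: Tuple[float, float],
             dt_s: float) -> Tuple[float, float]:
    """Speed vector (m/s) from a tracked position in two consecutive frames."""
    return ((curr_xy[0] - prev_xy[0]) / dt_s, (curr_xy[1] - prev_xy[1]) / dt_s)


def is_fall(heights: Sequence[float], dt_s: float, top: float = 1.7,
            bottom: float = 0.5, max_window_s: float = 1.0) -> bool:
    """True if a tracked person's height goes from >= top to <= bottom
    within max_window_s seconds (frames spaced dt_s apart)."""
    for i, h_start in enumerate(heights):
        if h_start < top:
            continue
        for j in range(i + 1, len(heights)):
            if (j - i) * dt_s > max_window_s:
                break  # too slow to count as a fall
            if heights[j] <= bottom:
                return True
    return False


eight_points = [(float(i), 0.0, 1.7) for i in range(8)]
print(density_per_10m2(eight_points, area_m2=10.0))  # 8.0
```

A slow descent such as sitting down over several seconds is rejected by the time window, which is the main reason the sketch looks at the drop rate rather than the final height alone.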

[0086] Next, spatial registration and associated storage are performed in step S1033. For example, the coordinates of the density measurement point in the northeast corner are extracted as (X5, Y5, Z0), the starting point of the movement speed vector as (X6, Y6, Z0), and the fall point as (X7, Y7, Z0). From the grid coordinates, the boundary range of G5 is determined as X from 100 to 120 meters and Y from 80 to 100 meters. All three coordinates fall within this range, so these features are assigned to G5. G5 is given the unique identifier "G5-2024", and its crowd density, speed vector, and fall feature data are associated with this identifier and stored in the corresponding data record table.

[0087] Finally, the overlaid dynamic risk heat map is generated in step S1034. For example, grids on the gridded risk map are colored by crowd density: high-density G5 is shaded dark red, while low-density G3 is shaded light orange. Arrows indicate the direction of the movement speed vectors; in G5 they point southwest to show the crowd's movement trend. Yellow triangles are added as special markers at the points where falls occurred. As the data is updated, the heat map shows changes in crowd behavior in real time, forming an overlaid dynamic risk heat map.
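Step S1034 reduces each grid to a color, an arrow, and zero or more markers. A minimal sketch follows, assuming density thresholds of 6 and 3 people per 10 m² for the dark red, orange, and light orange tiers (the text names only the colors) and an x-east, y-north frame for turning the speed vector into a compass heading. `heat_map_layer` and its output format are hypothetical.

```python
import math
from typing import Dict, List, Tuple

HEADINGS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def density_color(density_per_10m2: float) -> str:
    # Assumed thresholds; the text only names the colors.
    if density_per_10m2 >= 6:
        return "dark red"
    if density_per_10m2 >= 3:
        return "orange"
    return "light orange"


def heading(vx: float, vy: float) -> str:
    """Nearest of eight compass directions for a speed vector (x east, y north)."""
    angle = math.degrees(math.atan2(vy, vx))
    return HEADINGS[round(angle / 45) % 8]


def heat_map_layer(grids: Dict[str, Tuple[float, Tuple[float, float], int]]) -> List[dict]:
    """grids: grid id -> (density per 10 m², mean speed vector, number of falls)."""
    layer = []
    for gid, (density, (vx, vy), falls) in grids.items():
        layer.append({
            "grid": gid,
            "color": density_color(density),
            "arrow": heading(vx, vy) if (vx, vy) != (0.0, 0.0) else None,
            "markers": ["yellow triangle"] * falls,  # one marker per fall location
        })
    return layer


print(heat_map_layer({"G5": (8, (-1.0, -1.0), 1), "G3": (2, (0.0, 0.0), 0)}))
```

Keeping the layer as plain records separates the risk logic from whatever map library draws it.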

[0088] In a practical application, to monitor crowd risk during holidays, five radar devices were deployed at the main entrances of a commercial area, its central square, and the junctions of its underground passages to ensure monitoring without blind spots. The radar devices emit signals at preset frequencies; the signals reflect off pedestrians, and the devices receive and process the reflected waves to generate three-dimensional point cloud data covering the entire commercial area, in which the coordinates of each point correspond to the real-time spatial location of a pedestrian. From the point cloud data, the system identifies 6 points per 10 square meters in the pedestrian street area, which determines the crowd density there. By analyzing the displacement of points over consecutive time periods, it determines that most pedestrians are moving toward the exit at 1.5 m/s. It also identifies a fall by a pedestrian avoiding a collision and extracts the corresponding fall behavior features. The system then retrieves the coordinate range of each grid in the business district from the gridded risk map and compares the coordinates of the crowd density measurement points, the movement speed vector starting points, and the fall points in the pedestrian street area with the boundary range of grid G7 (X: 50-70 meters, Y: 30-50 meters). This confirms that these features belong to grid G7, which is assigned the unique identifier "G7-202405"; the grid's crowd density, movement speed vector, and fall behavior data are associated with this identifier and stored in the corresponding data record table.
On the resulting dynamic risk heat map, grid G7 is shown in dark red because of its high crowd density, arrows mark the crowd's movement toward the exit, and yellow triangles mark the fall locations, visually presenting the distribution of crowd risk within the business district.

[0089] In the overall scheme of step S103 above, three-dimensional point cloud data of crowd behavior is acquired by radar, key features are extracted and spatially registered with the gridded risk map, and a dynamic risk heat map is generated. This integrates crowd behavior data with spatial risk data and presents the spatial distribution of crowd-related risks intuitively and dynamically, providing a clear visual basis and data support for subsequently identifying high-risk cluster areas and formulating targeted governance strategies.

[0090] Figure 2 is a flowchart of a specific implementation of the reinforcement learning-based urban risk governance decision optimization method provided by this application. As shown in Figure 2, the method includes the following:

[0091] S104. Based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, identify high-risk cluster areas associated with geographical locations, and generate composite event identifiers and handling instructions that include urban component failure and crowd congestion characteristics according to the spatiotemporal evolution trend of the high-risk cluster areas.

[0092] Optionally, step S104 may specifically include the following steps:

[0093] S1041. Invoke the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, wherein the crowd behavior characteristics include crowd density distribution, movement speed vector and fall behavior characteristics;

[0094] S1042. Based on the numerical value of the environmental risk index and the parameter range of the crowd behavior characteristics, mark the grids with high environmental risk and the grid areas with abnormal crowd behavior, and identify them as high-risk cluster areas.

[0095] Step S1042 may specifically include the following processes: setting a high-value range for environmental risk based on the magnitude of the environmental risk index; setting abnormal parameter ranges for the crowd density distribution, the movement speed vector, and the fall behavior features, respectively, based on the parameter ranges of the crowd behavior features; traversing the gridded risk map and marking grids whose environmental risk index falls within the high-value range as grids with high environmental risk; traversing the dynamic risk heat map and marking grids whose crowd behavior features fall within the abnormal parameter ranges as grids with abnormal crowd behavior; calculating the positional distance between the grids with high environmental risk and the grids with abnormal crowd behavior, and merging related grids whose distance is less than a preset length into an initial cluster area; determining initial cluster areas that contain at least a preset number of grids as high-risk cluster areas; and, for initial cluster areas with fewer grids, still determining them as high-risk cluster areas if the highest environmental risk index or the most extreme crowd behavior value exceeds the corresponding limit value.
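The identification rule in step S1042 can be written as two threshold tests, a distance-based merge, and a size-or-limit check. The sketch below links a grid with high environmental risk to a grid with abnormal crowd behavior when their centers are closer than the preset length (using grid centers is an assumption; the text does not say which points are compared) and takes connected components as the initial cluster areas. The default thresholds follow [0100]. `index_limit`, `density_limit`, the G6 position and speed, and the reduction of "most extreme crowd behavior value" to density alone are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Grid:
    gid: str
    x: float                # grid centre in metres (assumed reference point)
    y: float
    risk_index: float       # environmental risk index
    density: float          # people per 10 m²
    speed: Optional[float]  # m/s, None when not reported
    falls: int              # number of fall records


def high_env_risk(g: Grid, lo: float = 80, hi: float = 100) -> bool:
    return lo <= g.risk_index <= hi


def abnormal_crowd(g: Grid, min_density: float = 6, slow: float = 1.0, fast: float = 3.0) -> bool:
    speed_abnormal = g.speed is not None and (g.speed <= slow or g.speed >= fast)
    return g.density >= min_density or speed_abnormal or g.falls > 0


def high_risk_clusters(grids: Sequence[Grid], preset_length: float = 20.0,
                       preset_count: int = 2, index_limit: float = 95,
                       density_limit: float = 10) -> List[List[str]]:
    high = {g.gid for g in grids if high_env_risk(g)}
    abnormal = {g.gid for g in grids if abnormal_crowd(g)}
    marked = [g for g in grids if g.gid in high or g.gid in abnormal]

    # Link a high-risk grid with an abnormal-crowd grid closer than the preset length.
    neighbours = {g.gid: set() for g in marked}
    for i, a in enumerate(marked):
        for b in marked[i + 1:]:
            pair = (a.gid in high and b.gid in abnormal) or (a.gid in abnormal and b.gid in high)
            if pair and math.dist((a.x, a.y), (b.x, b.y)) < preset_length:
                neighbours[a.gid].add(b.gid)
                neighbours[b.gid].add(a.gid)

    # Connected components are the initial cluster areas.
    by_id = {g.gid: g for g in marked}
    seen, clusters = set(), []
    for g in marked:
        if g.gid in seen:
            continue
        seen.add(g.gid)
        stack, members = [g.gid], []
        while stack:
            cur = stack.pop()
            members.append(cur)
            for nb in neighbours[cur] - seen:
                seen.add(nb)
                stack.append(nb)
        cells = [by_id[m] for m in members]
        exceeds_limit = (max(c.risk_index for c in cells) > index_limit
                         or max(c.density for c in cells) > density_limit)
        if len(members) >= preset_count or exceeds_limit:
            clusters.append(sorted(members))
    return clusters


grids = [Grid("G5", 110, 90, 85, 8, 2.0, 1),
         Grid("G6", 125, 90, 72, 7, None, 0)]   # G6 position and speed are assumptions
print(high_risk_clusters(grids))  # [['G5', 'G6']]
```

With the [0099] and [0100] values, G5 is both high-risk and abnormal, G6 is abnormal by density alone, and their 15 m separation is under the 20 m preset length, so the pair forms one two-grid high-risk cluster area.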

[0096] S1043. Continuously record the changes in the location, range, and risk characteristics of the high-risk cluster area at different points in time to form the spatiotemporal evolution trend of the high-risk cluster area;

[0097] S1044. Based on the risks related to urban component failures and crowd congestion in the spatiotemporal evolution trend, generate a composite event identifier that includes the risk type, location, and severity, and generate a disposal instruction based on the composite event identifier and the spatiotemporal evolution trend.

[0098] In the above steps, a high-risk cluster area is a geographically associated area of high environmental risk and abnormal crowd behavior, formed by merging grids with high environmental risk indices and grids with abnormal crowd behavior. The spatiotemporal evolution trend is the change in the location, extent, and risk characteristics (including environmental risk level and crowd congestion) of the high-risk cluster area over time. A composite event identifier is a combined label covering risk type, location, and severity; the risk types include urban component failure and crowd congestion, and the identifier is used to mark composite risk events uniformly. Disposal instructions are targeted response instructions formulated from the composite event identifier and the spatiotemporal evolution trend to guide risk response actions. Abnormal parameter ranges are the criteria for judging crowd behavior as abnormal, covering critical value ranges for crowd density, movement speed, and fall behavior. The preset length is the distance threshold for deciding whether grids are geographically associated; the preset number is the number of grids an initial cluster area must contain to be determined as a high-risk cluster area; and the limit value is the critical risk parameter value above which a cluster is still classified as high-risk even if it does not reach the preset number.

[0099] In this embodiment of the application, the relevant data is first retrieved in step S1041. For example, the environmental risk index of the gridded risk map of the central square of City A is retrieved from the system database: the index of grid G5 is 85 and that of grid G6 is 72. At the same time, the crowd behavior features of the dynamic risk heat map are retrieved, including the crowd density, movement speed vector, and fall behavior features of each grid. Grid G5 has 8 people per 10 square meters, the crowd there is moving southwest at 2 m/s, and there is 1 fall record.

[0100] Next, high-risk cluster areas are identified in step S1042. For example, the high environmental risk range is set to 80-100; crowd density is abnormal at 6 or more people per 10 square meters; movement speed is abnormal at 1 m/s or less or at 3 m/s or more; and any fall record counts as abnormal fall behavior. Traversing the maps, grid G5 (index 85) is marked as a grid with high environmental risk, while grid G5 (density 8 with a fall record) and the adjacent grid G6 (density 7) are marked as grids with abnormal crowd behavior. The positional distance between G5 and G6 is calculated as 15 meters, less than the preset length of 20 meters, so the two are merged into an initial cluster area. This area contains 2 grids, meeting the preset number of 2, and is therefore determined to be a high-risk cluster area.

[0101] Next, the spatiotemporal evolution trend is recorded in step S1043. For example, from 9:00 AM to 11:00 AM the changes in the high-risk cluster area are recorded every 30 minutes. At 9:00 AM the area spans grids G5 to G6, with environmental risk indices of 85 and 72 and crowd densities of 8 and 7 people, respectively. At 10:00 AM the area expands to grids G5 to G7, with the index of grid G7 rising to 78 and its density increasing to 6 people. At 11:00 AM the index of grid G5 remains at 85 and its density drops to 6 people, but a second fall is recorded in grid G6. These records form the spatiotemporal evolution trend.
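Step S1043 only requires comparing successive snapshots of a cluster. A minimal sketch follows, assuming each snapshot maps a grid id to (risk index, density per 10 m², cumulative falls). Values that [0101] does not state (G5 and G6 at 10:00, and G6 and G7 at 11:00 apart from the new fall) are carried forward unchanged as placeholders, and G5's fall count uses the single record from [0099]. `Snapshot` and `evolution` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

GridState = Tuple[float, float, int]  # (risk index, density per 10 m², cumulative falls)


@dataclass
class Snapshot:
    time: str
    grids: Dict[str, GridState]


def evolution(snapshots: List[Snapshot]) -> List[dict]:
    """Describe how a high-risk cluster changed between consecutive snapshots."""
    trend = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        shared = [g for g in curr.grids if g in prev.grids]
        trend.append({
            "from": prev.time,
            "to": curr.time,
            "joined": sorted(set(curr.grids) - set(prev.grids)),
            "left": sorted(set(prev.grids) - set(curr.grids)),
            "index_change": {g: curr.grids[g][0] - prev.grids[g][0] for g in shared},
            "density_change": {g: curr.grids[g][1] - prev.grids[g][1] for g in shared},
            "new_falls": {g: curr.grids[g][2] - prev.grids[g][2]
                          for g in shared if curr.grids[g][2] > prev.grids[g][2]},
        })
    return trend


snapshots = [
    Snapshot("09:00", {"G5": (85, 8, 1), "G6": (72, 7, 0)}),
    Snapshot("10:00", {"G5": (85, 8, 1), "G6": (72, 7, 0), "G7": (78, 6, 0)}),
    Snapshot("11:00", {"G5": (85, 6, 1), "G6": (72, 7, 1), "G7": (78, 6, 0)}),
]
for step in evolution(snapshots):
    print(step["to"], step["joined"], step["new_falls"])
```

The output captures the two signals the text relies on: G7 joining the cluster at 10:00 and the new fall in G6 at 11:00.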

[0102] Finally, in step S1044, a composite event identifier and disposal instructions are generated. For example, grids G5 to G6 show both risks: the urban component failure risk appears as a high index, and the crowd congestion risk appears as high density and repeated falls. From this trend the composite event identifier "High Risk of Urban Components - Crowd Congestion - G5 - G6 - Severe" is generated. Based on this identifier and the expansion trend of the area, disposal instructions are generated: dispatching maintenance personnel to repair the urban components around grid G5 first, adding security personnel to divert crowds in grids G5 to G6, and setting up temporary warning signs.
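Step S1044 combines risk types, location, and severity into one label and maps that label to actions. The sketch below uses a hypothetical severity rule that awards one point each for a maximum index of at least 80, a maximum density of at least 6 per 10 m², and at least two falls; 3 points gives "Severe" and 2 gives "Relatively Severe". It also assumes grid ids of the form "G<number>". The rule is an illustration only, although it happens to give "Severe" for the G5 to G6 cluster and "Relatively Severe" for the District B G7 to G8 cluster in [0103]. The instruction wording paraphrases [0102].

```python
from typing import Dict, List, Tuple

GridState = Tuple[float, float, int]  # (risk index, density per 10 m², falls)


def grid_number(gid: str) -> int:
    return int(gid[1:])  # assumes ids like "G5", "G12"


def severity(max_index: float, max_density: float, falls: int) -> str:
    score = int(max_index >= 80) + int(max_density >= 6) + int(falls >= 2)  # hypothetical rule
    return {3: "Severe", 2: "Relatively Severe"}.get(score, "Moderate")


def composite_event(cluster: Dict[str, GridState]) -> Tuple[str, List[str]]:
    ids = sorted(cluster, key=grid_number)
    max_index = max(s[0] for s in cluster.values())
    max_density = max(s[1] for s in cluster.values())
    falls = sum(s[2] for s in cluster.values())

    risk_types = []
    if max_index >= 80:
        risk_types.append("Urban Component Failure")
    if max_density >= 6 or falls > 0:
        risk_types.append("Crowd Congestion")

    identifier = " - ".join(risk_types + [ids[0], ids[-1], severity(max_index, max_density, falls)])

    worst = max(ids, key=lambda g: cluster[g][0])  # grid with the highest risk index
    instructions = []
    if "Urban Component Failure" in risk_types:
        instructions.append(f"Repair urban components around {worst} first")
    if "Crowd Congestion" in risk_types:
        instructions.append(f"Add staff to divert crowds in {ids[0]} to {ids[-1]}")
    if falls:
        instructions.append("Set up temporary warning signs at fall sites")
    return identifier, instructions


ident, actions = composite_event({"G5": (85, 8, 1), "G6": (72, 7, 1)})
print(ident)  # Urban Component Failure - Crowd Congestion - G5 - G6 - Severe
```

Sorting by the numeric part of the id keeps ranges like G9 to G12 in the right order, which a plain string sort would not.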

[0103] In a practical application, during holiday risk monitoring in the commercial area of District B, the system retrieves the environmental risk index of the gridded risk map, where grid G7 has an index of 82 and grid G8 an index of 76. It also retrieves the crowd behavior features of the dynamic risk heat map, where grid G7 has a density of 7 people per 10 square meters and one recorded fall, and grid G8 has a density of 6 people and a movement speed of 0.8 m/s. The high environmental risk range is set to 75-100, abnormal crowd density to at least 6 people, and abnormal speed to no more than 1 m/s. On this basis, grids G7 and G8 are marked as grids with high environmental risk and abnormal crowd behavior. The distance between the two grids is calculated as 18 meters, less than the preset length of 25 meters, so they are merged into an initial area of 2 grids, which meets the preset number of 2 and is identified as a high-risk cluster area. Continuous recording shows that from 10:00 AM to 12:00 PM the area expands from grids G7 to G8 to grids G7 to G9, with the index of grid G9 rising to 77 and its density increasing to 5 people. On this basis, the composite event identifier "Urban Component Risk - Crowd Congestion - G7 - G8 - Relatively Severe" is generated, together with disposal instructions: dispatching the maintenance team to inspect urban components from grid G7 to grid G8, assigning staff to guide the flow of people at the entrance of grid G7, and setting up a temporary passage at the fall site.

[0104] In the overall scheme of step S104 above, by integrating environmental risk index and population behavior characteristics, high-risk cluster areas with geographical location association are accurately identified, their spatiotemporal change trends are dynamically tracked, and then composite event identifiers containing multiple risk characteristics and targeted disposal instructions are generated. This realizes the collaborative identification and response to environmental and population risks, provides clear target areas and action basis for risk governance, and improves the accuracy and timeliness of risk disposal.

[0105] S105. Based on the risk data corresponding to the composite event identifier, implement the disposal instructions in the local virtual environment and record the execution effect. Adjust the maintenance priority order and crowd route allocation according to the execution effect to generate a global optimization decision. Feed back the global optimization decision to each grid to update the local strategy model to form a dynamic decision-making closed-loop system.

[0106] Optionally, step S105 may specifically include the following steps:

[0107] S1051. Collect the risk data corresponding to the composite event identifier, input the risk data into the local virtual environment, implement the disposal instruction in the local virtual environment, and record the execution process and execution effect of the disposal instruction. The execution effect includes the efficiency of urban component failure resolution, the degree of crowd congestion relief, and resource consumption.

[0108] S1052. Based on the execution effect, adjust the maintenance priority order for the urban component failure risks and the crowd route allocation for the crowd congestion risks accordingly, to obtain a corrected maintenance priority order and crowd route allocation.

[0109] S1053. Based on the revised maintenance priority order and crowd route allocation, a global optimization decision is generated and fed back to each grid. Each grid updates its local strategy model according to the global optimization decision, forming a dynamic decision-making closed-loop system.

[0110] In the above steps, the risk data corresponding to the composite event identifier is the detailed risk information related to that identifier, including the specific location, type, and severity of urban component failures and the area, peak density, and movement obstacles of crowd congestion. The local virtual environment is a digital simulation of the actual urban spatial layout, distribution of urban components, and crowd flow characteristics, used to test the execution effect of disposal instructions. The execution effect is the result of implementing the disposal instructions, covering the efficiency of resolving urban component failures, the degree of crowd congestion relief, and resource consumption: failure resolution efficiency can be reflected in repair time, congestion relief in changes in regional density, and resource consumption in the manpower and materials invested. The maintenance priority order is the order of urban component maintenance determined by urgency and scope of impact. Crowd route allocation is the crowd flow path plan designed to relieve congestion. A global optimization decision is the overall plan formed after maintenance priorities and route allocation have been adjusted together. The local strategy model is the model each grid uses for autonomous decision-making. The dynamic decision-making closed-loop system is a continuous improvement cycle of decision implementation, effect feedback, and optimization adjustment.

[0111] In this embodiment, firstly, risk data is collected in step S1051, and instructions are tested in a virtual environment. For example, the risk data corresponding to the composite event identifier in the central square of District A includes the location of loose urban components in grids G5 and G6, abnormal pressure points of urban components, peak crowd density in the two grids, and fall records. This data is input into the local virtual environment, simulating the execution process of instructions for maintenance personnel to inspect urban components in G5 and security personnel to guide crowd diversion in G6. Simultaneously, the execution effect is recorded, including the time taken from discovery to repair of urban component faults, the decrease in crowd density in the two grids, and the number of maintenance and security personnel deployed.

[0112] Next, in step S1052, the maintenance priority and route allocation are adjusted based on the execution results. For example, the virtual environment records showed that repairing the loose components in G5 took longer than expected and affected crowd safety, so the maintenance priority of G5 was raised above that of the surrounding grids. It was also observed that after diversion the crowd in G6 still gathered toward the southwest, causing local congestion, so the crowd route allocation was optimized by adding a guidance path from the east side of G6 to the square exit.
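Step S1052 can be sketched as two rules: re-rank repairs by how far the simulated repair overran its expected time, and add an alternative route wherever the simulated density stays above a target. The overrun criterion, the density target of 4 people per 10 m², the candidate route table, and all numbers below are assumptions; only the roles of G5, G6, and G7 come from [0112]. Resource consumption is recorded but not used by this simplified rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ExecutionEffect:
    grid: str
    repair_expected_h: float   # planned repair time in the virtual run
    repair_actual_h: float     # simulated repair time
    density_after: float       # people per 10 m² after the instruction was applied
    staff_used: int            # resource consumption (not used by this rule)


def adjust(effects: List[ExecutionEffect], priority: List[str],
           candidate_routes: Optional[Dict[str, str]] = None,
           density_target: float = 4.0) -> Tuple[List[str], Dict[str, str]]:
    """Return a corrected maintenance priority order and added crowd routes."""
    candidate_routes = candidate_routes or {}
    overrun = {e.grid: e.repair_actual_h / e.repair_expected_h for e in effects}
    # Larger overrun moves a grid earlier; sorted() is stable, so ties keep their order.
    new_priority = sorted(priority, key=lambda g: -overrun.get(g, 1.0))
    new_routes = {e.grid: candidate_routes[e.grid] for e in effects
                  if e.density_after > density_target and e.grid in candidate_routes}
    return new_priority, new_routes


effects = [ExecutionEffect("G5", 1.0, 2.0, 3.0, 2),   # repair took twice as long
           ExecutionEffect("G6", 1.0, 1.0, 7.0, 3)]   # crowd still dense after diversion
priority, routes = adjust(effects, ["G7", "G5", "G6"],
                          {"G6": "east side of G6 to square exit"})
print(priority, routes)
```

Grids without a recorded repair default to an overrun of 1.0, so they keep their relative order and are only displaced by grids whose repairs ran late.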

[0113] Finally, step S1053 generates a global optimization decision and feeds it back to update the models. For example, the adjusted priority order "G5 urban component repair takes precedence over G7" and the route allocation "add an eastern diversion route in G6" are integrated into a global optimization decision, which is fed back to G5, G6, and the related surrounding grids. Each grid updates its local strategy model based on the global decision: for example, the G5 model adds an emergency repair response mechanism and the G6 model updates its crowd guidance rules, forming a dynamic decision-making closed loop of "decision making - virtual verification - effect feedback - model optimization".
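The feedback in step S1053 can be sketched as broadcasting one global decision to the affected grids. Each grid folds the parts that concern it into its own local strategy model and bumps a version number, so the next round of virtual verification starts from the updated rules. `GlobalDecision`, `LocalStrategyModel`, their fields, and `broadcast` are hypothetical; the decision content mirrors [0113].

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class GlobalDecision:
    priority: List[str]                  # maintenance order across grids
    routes: Dict[str, List[str]]         # grid id -> added guidance routes


@dataclass
class LocalStrategyModel:
    grid: str
    priority_rank: Optional[int] = None  # 1 = repaired first
    routes: List[str] = field(default_factory=list)
    version: int = 0

    def update(self, decision: GlobalDecision) -> None:
        if self.grid in decision.priority:
            self.priority_rank = decision.priority.index(self.grid) + 1
        self.routes.extend(decision.routes.get(self.grid, []))
        self.version += 1  # each feedback round produces a new model version


def broadcast(decision: GlobalDecision, models: Dict[str, LocalStrategyModel],
              affected: Iterable[str]) -> None:
    for gid in affected:
        models[gid].update(decision)


models = {g: LocalStrategyModel(g) for g in ("G5", "G6", "G7")}
decision = GlobalDecision(priority=["G5", "G7"],
                          routes={"G6": ["east side of G6 to square exit"]})
broadcast(decision, models, affected=["G5", "G6", "G7"])
print(models["G5"].priority_rank, models["G6"].routes)
```

Versioning each local model makes it possible to tell which feedback round a grid's current rules came from when the loop runs repeatedly.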

[0114] In a practical application, for the composite event "Urban Component Risk - Crowd Congestion - G7 - G8 - Relatively Severe" in the commercial area of District B, the collected risk data included the location of the damaged interface of the urban component in G7, the peak crowd density in G8, and the bottleneck where movement slowed. This data was input into a local virtual environment to simulate a maintenance team replacing the interface in G7 and staff guiding pedestrian flow toward the northwest exit in G8. The simulation recorded a repair time of 2 hours for the urban component, a reduction in crowd density in G8 from 6 to 4 people per 10 square meters, and the deployment of 2 maintenance personnel and 3 guidance personnel. From these results, it was found that the maintenance priority of the urban component in G7 should be higher than that of urban components in other areas during the same period, and that insufficient guidance at the northwest exit of G8 still caused localized congestion. The maintenance priority was therefore adjusted to make G7 the first-level maintenance target, and the crowd route allocation was optimized by adding a temporary passage on the north side of G8. These adjustments were integrated into a global optimization decision and fed back to grids G7, G8, and the adjacent G9. Each grid updated its local strategy model: for example, the G7 model strengthened the emergency response process for urban components and the G8 model added multi-exit coordinated guidance logic, forming a dynamic decision-making closed loop.

[0115] In the overall scheme of step S105, the effectiveness of the disposal instructions is verified in a virtual environment, the maintenance priorities and crowd routes are adjusted based on the results, and a global optimization decision is generated and fed back to update the grid models, achieving a closed-loop process from virtual testing to actual optimization. This process reduces the risk of errors in real-world decisions and, through continuous feedback optimization, lets the decision scheme adapt dynamically to changes in risk, thereby improving the scientific rigor, flexibility and long-term effectiveness of urban risk governance decisions.

[0116] The following is a complete embodiment for steps S101 to S105:

[0117] As shown in Figure 3, the method was applied in an urban risk management project in District C. First, the system collected the latitude and longitude coordinates of 300 urban components of one category and 80 urban components of another category within the jurisdiction, together with their historical fault records for the past five years. Based on the spatial relationships between the urban components, the area was divided into 100 m × 100 m grids, and a gridded risk map was constructed by combining fault frequency and fault type; the map clearly showed a high incidence of loosening faults among urban components in the northeastern grids. Next, an environmental risk index comprising the number of faults, the most recent fault interval and the basic impact value was set in the map, and its weights were dynamically adjusted through reinforcement learning. For example, when a pressure drop was detected in an urban component in a given grid, the weight of the most recent fault interval was automatically increased, because historical data showed that the probability of leakage rose after similar changes. At the same time, radar equipment deployed in the commercial plaza acquired three-dimensional point cloud data of the crowd, from which a density distribution of 7 people per 10 square meters, a velocity vector pointing northwest and two instances of fall behavior were extracted. After these data were registered with the grid coordinates, a dynamic heat map overlaid with crowd risk was generated. By combining the environmental risk indices of the map with the heat map features, the system identified high-risk clusters in grids G12 to G15 caused by high-risk urban components and overcrowding. After tracking the expansion of these clusters from 9:00 AM to 11:00 AM, it generated the composite event identifier "Urban Component Failure - Crowd Congestion - G12 - G15 - Severe" together with corresponding instructions for repairing urban components and diverting crowds.
Finally, the risk data were input into a local virtual environment to simulate execution of the instructions, recording the repair time of the urban components and the changes in crowd density. On this basis, the repair priority of the urban components in G12 was raised to the highest level and a new diversion route to the east was added, forming a global optimization decision. This decision was then fed back to each grid to update its local strategy model, forming a continuously optimizing dynamic decision-making closed loop.
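The 100 m × 100 m gridding used in this embodiment can be sketched in Python as below. The application does not say how latitude and longitude are converted to grid cells; this sketch assumes a local equirectangular projection from a chosen origin point, and the function names `to_grid_cell` and `group_by_cell` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_grid_cell(lat: float, lon: float, origin_lat: float, origin_lon: float,
                 cell_m: float = 100.0) -> Tuple[int, int]:
    """Return the (column, row) of the cell_m x cell_m cell containing (lat, lon).

    Uses a local equirectangular projection around the origin, which is a
    reasonable approximation over a district a few kilometres across.
    """
    east_m = math.radians(lon - origin_lon) * math.cos(math.radians(origin_lat)) * EARTH_RADIUS_M
    north_m = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return math.floor(east_m / cell_m), math.floor(north_m / cell_m)


def group_by_cell(components: Dict[str, Tuple[float, float]],
                  origin: Tuple[float, float]) -> Dict[Tuple[int, int], List[str]]:
    """Group component ids by the grid cell their (lat, lon) falls into."""
    cells: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for cid, (lat, lon) in components.items():
        cells[to_grid_cell(lat, lon, *origin)].append(cid)
    return dict(cells)
```

Using `math.floor` rather than `int` keeps cells consistent on both sides of the origin: a component 85 m west of the origin lands in column -1, not column 0.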

[0118] The reinforcement learning-based urban risk governance decision optimization method of this application achieves spatial integration of urban component risks by constructing a gridded risk map. It dynamically adjusts the weights of the environmental risk index through reinforcement learning, so that risk assessment adapts to changes in the state of urban components and to historical failure patterns, overcoming the limitations of fixed assessment standards. By generating dynamic risk heat maps from crowd behavior data through radar sensing and spatial registration, it enables multi-dimensional correlation analysis between urban component risks and crowd risks, addressing the blind spots of traditional governance that relies on a single data dimension. By combining risk map and heat map features, it identifies high-risk cluster areas and tracks their spatiotemporal evolution, and the generated composite event identifiers and handling instructions enable a targeted risk response. Through local virtual environment verification and global optimization decision feedback, a dynamic decision-making closed loop is formed, improving the efficiency of maintenance resource allocation and the scientific rigor of crowd management, and thereby enhancing the accuracy, timeliness and coordination of urban risk governance.

[0119] Figure 4 is a schematic diagram of a specific implementation of the reinforcement learning-based urban risk governance decision optimization system provided in this application. As shown in Figure 4, the system may include:

[0120] The data acquisition module 41 is used to collect the location coordinates and historical fault records of urban components, and to construct a gridded risk map based on the relationship between the location coordinates of each urban component and the historical fault records.
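One plausible reading of how the data acquisition module relates location coordinates to fault records is a per-grid tally of fault frequency and fault type, as sketched below. The input formats (a component-to-grid mapping and (component, fault type) pairs) and the name `build_risk_map` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_risk_map(component_cells: Dict[str, str],
                   fault_records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Aggregate historical faults into per-grid fault-type counts.

    component_cells: component id -> id of the grid it lies in
    fault_records:   (component id, fault type) pairs from the fault history
    """
    risk_map: Dict[str, Counter] = defaultdict(Counter)
    for component_id, fault_type in fault_records:
        grid = component_cells.get(component_id)
        if grid is None:
            continue  # no known location for this component, so it cannot be mapped
        risk_map[grid][fault_type] += 1
    return dict(risk_map)
```

Each grid's `Counter` gives both the fault frequency (the sum of its values) and the dominant fault type (`most_common(1)`), which is the kind of pattern [0117] describes for loosening faults in the northeastern grids.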

[0121] The adjustment module 42 is used to set an environmental risk index in the gridded risk map and dynamically adjust the weight of the environmental risk index based on the correlation between changes in the status of urban components and historical failures through reinforcement learning.
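The index and its weight adjustment might be sketched as follows, using the three factors named in [0117] and claim 7. The application does not disclose its reinforcement learning framework, so this sketch substitutes a deliberately simple reward-driven update: when a monitored state change (such as the pressure drop in [0117]) is or is not followed by a failure, the weights move toward or away from the factors that were active. The normalisation of each factor to [0, 1], the name `recency` (higher meaning a shorter most-recent-fault interval), the learning rate and the renormalisation step are all assumptions.

```python
from typing import Dict

FACTORS = ("fault_count", "recency", "impact")


def risk_index(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalised factors, each assumed to lie in [0, 1]."""
    return sum(weights[f] * features[f] for f in FACTORS)


def update_weights(weights: Dict[str, float], features: Dict[str, float],
                   failed: bool, lr: float = 0.1) -> Dict[str, float]:
    """Adjust weights after observing whether a state change led to a failure.

    Step 1 is a TD(0)-style update on a linear value estimate with a terminal
    next state: w_i += lr * (reward - V(s)) * x_i, with reward 1 for a failure.
    Step 2 (an addition for readability) clips negatives and renormalises so
    the weights keep summing to 1.
    """
    error = (1.0 if failed else 0.0) - risk_index(features, weights)
    raw = {f: max(0.0, weights[f] + lr * error * features[f]) for f in FACTORS}
    total = sum(raw.values()) or 1.0  # guard against all-zero weights
    return {f: v / total for f, v in raw.items()}
```

Because each weight moves in proportion to its own factor value, a failure that follows a change with a very recent prior fault raises the `recency` weight most, which matches the behavior described in [0117].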

[0122] The registration module 43 is used to acquire three-dimensional point cloud data of crowd behavior by transmitting radar signals, and to spatially register the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data with the position coordinates of the gridded risk map to generate an overlaid dynamic risk heat map.
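The bounding-box assignment that claim 5 spells out for this registration can be sketched as follows. It assumes the point cloud features have already been converted into the same planar coordinate system as the grid (in metres), and the names `GridRecord` and `register` are hypothetical; each `GridRecord` plays the role of one row of the data record table keyed by a unique grid identifier. The claim says a point's coordinates must lie between a grid's minimum and maximum values; the sketch uses half-open intervals so that a point on a shared edge is not counted twice, which is a design choice rather than something the application specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GridRecord:
    grid_id: str
    bounds: Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max) in metres
    densities: List[float] = field(default_factory=list)             # people per 10 m^2
    vectors: List[Tuple[float, float]] = field(default_factory=list)  # (vx, vy)
    falls: List[Tuple[float, float]] = field(default_factory=list)    # fall locations

    def contains(self, x: float, y: float) -> bool:
        # Half-open intervals so a point on a shared edge belongs to exactly one grid.
        x_min, x_max, y_min, y_max = self.bounds
        return x_min <= x < x_max and y_min <= y < y_max


def register(grids, densities, vectors, falls) -> None:
    """Assign point-cloud features to the grid whose boundary contains them.

    densities: (x, y, density) measurement points
    vectors:   (x0, y0, vx, vy), located by the vector's starting point
    falls:     (x, y) fall-behavior occurrence points
    Points outside every grid are ignored.
    """
    def locate(x: float, y: float) -> Optional[GridRecord]:
        return next((g for g in grids if g.contains(x, y)), None)

    for x, y, d in densities:
        grid = locate(x, y)
        if grid is not None:
            grid.densities.append(d)
    for x0, y0, vx, vy in vectors:
        grid = locate(x0, y0)
        if grid is not None:
            grid.vectors.append((vx, vy))
    for x, y in falls:
        grid = locate(x, y)
        if grid is not None:
            grid.falls.append((x, y))
```

The linear search in `locate` is fine for a few hundred grids; for a regular grid, the cell could instead be computed directly from the coordinates.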

[0123] The generation module 44 is used to identify high-risk cluster areas with geographical location association based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, and generate composite event identifiers and handling instructions containing urban component failure and crowd congestion characteristics according to the spatiotemporal evolution trend of the high-risk cluster areas.
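The cluster identification detailed in claim 3 and the identifier format used in [0117] can be sketched as follows. The sketch assumes grids have already been marked (high environmental risk or abnormal crowd behavior) and are represented by their centre coordinates in metres. It treats all marked grids alike and links any two whose centres are closer than the preset length, a simplification of the claim, which measures distance between high-risk grids and abnormal-crowd grids. The single `peak` score per grid (standing in for the highest environmental risk index or the most abnormal crowd behavior value), the `G<number>` grid naming, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def find_clusters(marked: Dict[str, Tuple[float, float]], max_dist: float) -> List[Set[str]]:
    """Link marked grids whose centres are closer than max_dist; return connected groups."""
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in marked:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # depth-first traversal of the "closer than max_dist" graph
            g = stack.pop()
            group.add(g)
            for h in marked:
                if h not in seen and math.dist(marked[g], marked[h]) < max_dist:
                    seen.add(h)
                    stack.append(h)
        clusters.append(group)
    return clusters


def high_risk(clusters: List[Set[str]], peak: Dict[str, float],
              min_grids: int, limit: float) -> List[Set[str]]:
    """Keep clusters that are large enough, or small but containing an extreme value."""
    return [c for c in clusters
            if len(c) >= min_grids or max(peak[g] for g in c) > limit]


def composite_event_id(risk_types: List[str], cluster: Set[str], severity: str) -> str:
    """Format 'type - type - first grid - last grid - severity' for grid ids like 'G12'."""
    ordered = sorted(cluster, key=lambda g: int(g[1:]))
    return " - ".join([*risk_types, ordered[0], ordered[-1], severity])
```

Applied to four adjacent marked grids G12 to G15, this yields the identifier "Urban Component Failure - Crowd Congestion - G12 - G15 - Severe" from [0117]; tracking the spatiotemporal evolution would repeat the clustering at each time step and compare the results.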

[0124] The feedback module 45 is used to implement the handling instructions in the local virtual environment based on the risk data corresponding to the composite event identifier and to record the execution effect. Based on the execution effect, it adjusts the maintenance priority order and the allocation of crowd routes, generates a global optimization decision, and feeds the global optimization decision back to each grid to update its local strategy model, forming a dynamic decision-making closed-loop system.

[0125] The reinforcement learning-based urban risk governance decision optimization system of this application is used to implement the reinforcement learning-based urban risk governance decision optimization method described above. For its specific implementation, refer to the corresponding embodiments of the method above; the details are not repeated here.

[0126] This application also provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of any of the above-described reinforcement learning-based urban risk governance decision optimization methods.

[0127] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above-described reinforcement learning-based urban risk governance decision optimization methods.

[0128] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as USB flash drives, read-only memory, random access memory, portable hard drives, magnetic disks, or optical disks.

[0129] Embodiments of the present invention also provide a computer program product, which includes a computer program that, when executed by a processor, implements the steps in any of the embodiments of the reinforcement learning-based urban risk governance decision optimization method.

[0130] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0131] The foregoing has provided a detailed description of the urban risk governance decision optimization method and system based on reinforcement learning provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are merely for the purpose of helping to understand the method and its core ideas. It should be noted that those skilled in the art can make various improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the protection scope of this application.

Claims

1. A method for optimizing urban risk governance decisions based on reinforcement learning, characterized by comprising: collecting the location coordinates and historical fault records of urban components, and constructing a gridded risk map based on the relationship between the location coordinates of each urban component and the historical fault records, the urban components being manhole covers and fire hydrants; setting an environmental risk index in the gridded risk map, and dynamically adjusting the weight of the environmental risk index through reinforcement learning based on the correlation between changes in the status of urban components and historical failures, the environmental risk index being a comprehensive indicator used to quantify the risk level of urban components within the grid and being determined by the number of failures of urban components within the grid, the duration of the most recent failure, and the basic impact value corresponding to the type of urban component; acquiring three-dimensional point cloud data of crowd behavior by transmitting radar signals, and spatially registering the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data with the location coordinates of the gridded risk map to generate an overlaid dynamic risk heat map; identifying high-risk cluster areas with geographical location associations based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, and generating, according to the spatiotemporal evolution trend of the high-risk cluster areas, composite event identifiers and handling instructions containing urban component failure and crowd congestion characteristics, the composite event identifier being a comprehensive identifier that includes risk type, location of occurrence, and severity;
implementing the handling instructions in a local virtual environment based on the risk data corresponding to the composite event identifier and recording the execution effect; adjusting the maintenance priority order and crowd route allocation according to the execution effect to generate a global optimization decision; and feeding the global optimization decision back to each grid to update its local strategy model, forming a dynamic decision-making closed-loop system, the local strategy model being the model used by each grid for autonomous decision-making.

2. The method according to claim 1, characterized in that identifying high-risk clusters associated with geographical locations based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, and generating, according to the spatiotemporal evolution trend of these high-risk clusters, composite event identifiers and handling instructions containing characteristics of urban component failures and crowd congestion, comprises: invoking the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, the crowd behavior characteristics including crowd density distribution, movement speed vector and fall behavior characteristics; marking, based on the numerical value of the environmental risk index and the parameter range of the crowd behavior characteristics, grids with high environmental risk and grid areas with abnormal crowd behavior, and identifying them as high-risk cluster areas; continuously recording the location, extent, and risk characteristics of the high-risk clusters at different points in time to form the spatiotemporal evolution trend of the high-risk clusters; and generating, based on the risks associated with urban component failures and crowd congestion in the spatiotemporal evolution trend, a composite event identifier containing risk type, location, and severity, and generating a handling instruction based on the composite event identifier and the spatiotemporal evolution trend.

3. The method according to claim 2, characterized in that marking grids with high environmental risk and grid areas with abnormal crowd behavior based on the numerical value of the environmental risk index and the parameter range of the crowd behavior characteristics, and identifying them as high-risk cluster areas, comprises: setting the highest value range of environmental risk based on the magnitude of the environmental risk index, and setting abnormal parameter ranges for the crowd density distribution, movement speed vector, and fall behavior characteristics respectively based on the parameter range of the crowd behavior characteristics; traversing the gridded risk map and marking the grids whose environmental risk index values are in the highest range as grids with high environmental risk; traversing the dynamic risk heat map and marking the grids whose crowd behavior characteristics fall within the abnormal parameter ranges as grids with abnormal crowd behavior; calculating the location distance between the grids with high environmental risk and the grids with abnormal crowd behavior, and integrating the associated grids whose location distance is less than a preset length into an initial clustering area; identifying an initial clustering area whose total number of grids reaches a preset number as a high-risk clustering area; and, for an initial clustering area whose number of grids has not reached the preset number, also identifying it as a high-risk clustering area if the highest environmental risk index value of its grids or the most abnormal value of its crowd behavior characteristics exceeds the corresponding limit value.

4. The method according to claim 1, characterized in that acquiring three-dimensional point cloud data of crowd behavior by transmitting radar signals, and spatially registering the crowd density distribution, movement speed vector, and fall behavior characteristics in the three-dimensional point cloud data with the location coordinates of the gridded risk map to generate an overlaid dynamic risk heat map, comprises: deploying radar equipment to transmit signals and receive the reflected signals, and generating three-dimensional point cloud data covering the monitoring area based on the time difference and intensity changes of the reflected signals, the three-dimensional point cloud data containing spatial location information corresponding to the behavior of people in the monitoring area; extracting the crowd density distribution, movement speed vector, and fall behavior features from the three-dimensional point cloud data, and obtaining the position coordinates of each grid in the gridded risk map; performing spatial registration between the spatial locations corresponding to the crowd density distribution, the movement speed vector, and the fall behavior features and the position coordinates of each grid, to determine the grid to which each feature belongs, and storing the features in association with that grid; and, on the gridded risk map, assigning different color shades according to the magnitude of the crowd density within each grid, marking the direction of the movement speed vector with arrows, and adding special marks at the locations where fall behavior occurs, thereby generating an overlaid dynamic risk heat map.
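The rendering step at the end of claim 4 (a color shade by density, an arrow for the movement direction, a special mark for falls) can be illustrated with a text-only sketch. The density bin edges, the eight-direction arrow set, the "!" fall marker and the function names are hypothetical choices; a real implementation would draw onto a map layer rather than build strings.

```python
import math
from typing import Sequence, Tuple

# Eight compass arrows, counter-clockwise starting from east (x east, y north).
ARROWS = ["→", "↗", "↑", "↖", "←", "↙", "↓", "↘"]


def shade_level(density: float, edges: Sequence[float] = (2.0, 4.0, 6.0)) -> int:
    """0 = lightest shade; one level darker for each bin edge the density reaches."""
    return sum(density >= e for e in edges)


def arrow(vx: float, vy: float) -> str:
    """Quantise a velocity vector to the nearest of eight 45-degree directions."""
    angle = math.degrees(math.atan2(vy, vx)) % 360
    return ARROWS[int((angle + 22.5) // 45) % 8]


def render_cell(density: float, velocity: Tuple[float, float], has_fall: bool) -> str:
    """One heat-map cell as text: shade level, direction arrow, '!' if a fall occurred."""
    cell = f"{shade_level(density)}{arrow(*velocity)}"
    return cell + "!" if has_fall else cell
```

For the plaza in [0117] (7 people per 10 square meters, northwest movement, falls present), `render_cell(7.0, (-1.0, 1.0), True)` gives the darkest shade, a northwest arrow and the fall marker.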

5. The method according to claim 1, characterized in that performing spatial registration between the spatial locations corresponding to the crowd density distribution, the movement speed vector, and the fall behavior feature and the position coordinates of each grid, respectively, to determine the grid to which each feature belongs and storing the features in association, comprises: extracting the spatial coordinates of each density measurement point corresponding to the crowd density distribution, the spatial coordinates of the starting point of each vector corresponding to the movement speed vector, and the spatial coordinates of each behavior occurrence point corresponding to the fall behavior feature; determining the boundary range of each grid based on the position coordinates of each grid, the boundary range including the minimum and maximum coordinate values of the grid in the horizontal and vertical directions; for each density measurement point of the crowd density distribution, each vector starting point of the movement speed vector, and each behavior occurrence point of the fall behavior feature, selecting the spatial coordinates whose horizontal and vertical values both lie between the minimum and maximum coordinate values of a grid, and assigning the corresponding density measurement points, vector starting points, and behavior occurrence points to that grid; and assigning a unique identifier to each grid, associating the crowd density distribution, movement speed vector, and fall behavior characteristics belonging to that grid with the unique identifier, and storing them in the corresponding data record table.

6. The method according to claim 1, characterized in that implementing the handling instructions in the local virtual environment based on the risk data corresponding to the composite event identifier and recording the execution effect, adjusting the maintenance priority order and crowd route allocation according to the execution effect to generate a global optimization decision, and feeding the global optimization decision back to each grid to update its local strategy model, forming a dynamic decision-making closed-loop system, comprises: collecting the risk data corresponding to the composite event identifier, inputting the risk data into the local virtual environment, executing the handling instructions in the local virtual environment, and recording the execution process and execution effect of the handling instructions, the execution effect including the efficiency of urban component failure resolution, the degree of crowd congestion relief, and resource consumption; adjusting, based on the execution effect, the maintenance priority order for each urban component failure risk and the crowd route allocation for each crowd congestion risk, to obtain a revised maintenance priority order and crowd route allocation; and generating a global optimization decision based on the revised maintenance priority order and crowd route allocation and feeding it back to each grid, each grid updating its local strategy model according to the global optimization decision, forming a dynamic decision-making closed-loop system.

7. The method according to claim 1, characterized in that setting an environmental risk index in the gridded risk map and dynamically adjusting the weights of the environmental risk index through reinforcement learning based on the correlation between changes in the state of urban components and historical failures comprises: setting an environmental risk index in each grid of the gridded risk map, the environmental risk index including the number of failures of urban components within the grid, the duration of the most recent failure, and the basic impact value corresponding to the type of urban component; when the state of an urban component changes, extracting the grid position and the state information before and after the change, and generating, with a reinforcement learning framework, weight adjustment suggestions for each influencing factor by combining the failure occurrences that followed similar state changes in the historical failure records; correcting the weights of the corresponding influencing factors in the environmental risk index according to the weight adjustment suggestions, and recalculating the environmental risk index; and continuously monitoring the subsequent status changes and failure occurrences of urban components, and inputting the new feedback results into the reinforcement learning framework to dynamically adjust the weights of the environmental risk index.

8. A decision optimization system for urban risk governance based on reinforcement learning, characterized by comprising: a data acquisition module, used to collect the location coordinates and historical fault records of urban components and to construct a gridded risk map based on the relationship between the location coordinates of each urban component and the historical fault records, the urban components being manhole covers and fire hydrants; an adjustment module, used to set an environmental risk index in the gridded risk map and to dynamically adjust the weight of the environmental risk index through reinforcement learning according to the correlation between changes in the status of urban components and historical failures, the environmental risk index being a comprehensive indicator used to quantify the risk level of urban components within the grid and being determined by the number of failures of urban components within the grid, the duration of the most recent failure, and the basic impact value corresponding to the type of urban component; a registration module, used to acquire three-dimensional point cloud data of crowd behavior by transmitting radar signals, and to spatially register the crowd density distribution, movement speed vector and fall behavior characteristics in the three-dimensional point cloud data with the position coordinates of the gridded risk map to generate an overlaid dynamic risk heat map; a generation module, used to identify high-risk cluster areas with geographical location association based on the environmental risk index of the gridded risk map and the crowd behavior characteristics of the dynamic risk heat map, and to generate, according to the spatiotemporal evolution trend of the high-risk cluster areas, composite event identifiers and handling instructions that include urban component failure and crowd congestion characteristics, the composite event identifier being a comprehensive identifier that includes risk type, location of occurrence, and severity;
and a feedback module, used to implement the handling instructions in the local virtual environment based on the risk data corresponding to the composite event identifier and to record the execution effect, to adjust the maintenance priority order and the allocation of crowd routes based on the execution effect, to generate a global optimization decision, and to feed the global optimization decision back to each grid to update its local strategy model, forming a dynamic decision-making closed-loop system, the local strategy model being the model used by each grid for autonomous decision-making.

9. An electronic device, characterized by comprising: a memory, used to store a computer program; and a processor, configured to implement the steps of the reinforcement learning-based urban risk governance decision optimization method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the reinforcement learning-based urban risk governance decision optimization method according to any one of claims 1 to 7.