A scenic spot personnel flow direction counting method and system

By collecting multimodal data through a drone platform and performing adaptive fusion and geographic coordinate mapping, combined with target detection and trajectory analysis technologies, the problem of rigid perspective and insufficient environmental adaptability in scenic area personnel counting and flow monitoring has been solved, realizing high-precision flow analysis and real-time management in all weather and all areas.

CN122200545A · Pending · Publication Date: 2026-06-12 · CHINA UNITED NETWORK COMM GRP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA UNITED NETWORK COMM GRP CO LTD
Filing Date
2026-03-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies for counting and monitoring the flow of people in scenic areas suffer from rigid perspectives, severe obstruction, poor environmental adaptability, and a lack of geospatial trajectory tracking capabilities, making it impossible to achieve high-precision monitoring and analysis across all weather conditions and areas.

Method used

Multimodal monitoring data is collected using an unmanned aerial vehicle (UAV) platform. The data is then adaptively fused with visible light images and thermal imaging data. A target YOLO model is used for personnel target detection. Furthermore, motion state prediction and trajectory analysis are performed through geographic coordinate system mapping and trajectory matching, combined with Kalman filtering and the Hungarian algorithm, to achieve high-precision flow direction counting and abnormal behavior detection.

Benefits of technology

It achieves high-precision positioning and long-term trajectory tracking of personnel in complex scenarios, overcoming the limitations of traditional monitoring solutions in terms of viewing angle and poor environmental adaptability, and providing high-quality monitoring and real-time decision support in all weather and all areas.

Generated by Eureka AI based on patent content.


Abstract

The application discloses a method and system for counting the flow direction of people in a scenic area. The method includes: acquiring multimodal monitoring data and performing multimodal fusion to generate a fused image; inputting the fused image into a target YOLO model to obtain the position information of each personnel target in the image coordinate system; mapping that position to a geographic coordinate system to obtain the geographic coordinates of the personnel target; predicting the motion state of each personnel target in the geographic coordinate system and matching the detection results of the current frame with existing trajectories to obtain personnel motion trajectories carrying unique identifiers; and performing spatiotemporal correlation analysis on the personnel motion trajectories to calculate the moving speed and direction of each person, and performing density clustering on the geographic coordinate points in the trajectories to identify hotspot areas where people gather and the main channels of pedestrian flow, thereby counting the flow direction of people in the scenic area. The application enables high-precision positioning, long-term trajectory tracking, and accurate flow analysis of people in complex scenes.

Description

Technical Field

[0001] This invention relates to the field of information management technology, and in particular to a method and system for counting the flow of people in a scenic area.

Background Technology

[0002] With the rapid development of the tourism industry, scenic area visitor flow management and safety monitoring face enormous challenges. Accurate personnel flow counting is the core of achieving refined management. However, existing personnel counting technologies mainly rely on fixed-location video surveillance networks, which have significant limitations: First, fixed cameras have rigid perspectives and limited coverage, resulting in blind spots and an inability to flexibly cope with complex terrain changes in scenic areas. Furthermore, at ground level or upward angles, severe mutual obstruction between dense crowds leads to a significant decrease in counting accuracy, making it difficult to obtain global flow information. Second, existing monitoring methods lack environmental adaptability, generally relying on single visible light imaging. Image quality drops sharply at night, in foggy or inclement weather conditions, making all-weather monitoring impossible. Finally, the lack of effective multi-source data fusion and geospatial analysis mechanisms makes it difficult for existing technologies to accurately map image data to a geographic coordinate system for long-term trajectory tracking. This results in the inability to accurately identify personnel movement trajectories, flow hotspots, and main channel distribution in complex scenarios, failing to provide real-time and accurate decision support for scenic area management.

Summary of the Invention

[0003] The technical problem this invention aims to solve is to address the shortcomings of existing technologies for counting and monitoring the flow of people in scenic areas, such as rigid perspectives, severe obstruction, poor environmental adaptability, and a lack of geospatial trajectory tracking capabilities. This invention provides a method and system for counting the flow of people in scenic areas. This method can achieve high-precision positioning, long-term trajectory tracking, and accurate flow analysis of people in complex scenarios.

[0004] In a first aspect, the present invention provides a method for counting the flow of people in a scenic area, the method comprising:

[0005] Acquire multimodal monitoring data, which is data collected by the UAV platform within the scenic area according to a preset cruise path. The multimodal monitoring data includes visible light images, thermal imaging data, and the real-time location information of the UAV platform.

[0006] Visible light images and thermal imaging data are preprocessed, and multimodal fusion of the preprocessed visible light images and thermal imaging data is performed based on real-time ambient light intensity and weather conditions to generate a fused image.

[0007] The fused image is input into the target YOLO model to obtain the position information of the person target in the image coordinate system;

[0008] Based on the position information of the personnel target in the image coordinate system and the real-time position information of the UAV platform, the position of the personnel target is mapped to the geographic coordinate system to obtain the geographic coordinates of the personnel target;

[0009] Based on the geographic coordinates of the personnel targets, the motion state of each personnel target in the geographic coordinate system is predicted, and the detection results of the current frame are matched with existing trajectories to obtain the personnel motion trajectory carrying a unique identifier.

[0010] Spatiotemporal correlation analysis is performed on the movement trajectories of people to calculate the movement speed and direction of each person, and density clustering is performed on the geographical coordinate points in the movement trajectories to identify hot spots and main channels of people gathering, thereby realizing the counting of the flow of people in the scenic area.
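The speed and direction calculation in this step can be illustrated with a minimal Python sketch. The patent does not specify the formulas, so the haversine distance and initial bearing on a spherical Earth are illustrative choices, and the `(t_seconds, lat_deg, lon_deg)` point layout and the function name `speed_and_bearing` are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def speed_and_bearing(p1, p2):
    """Return (speed in m/s, bearing in degrees clockwise from north)
    between two trajectory points given as (t_seconds, lat_deg, lon_deg)."""
    t1, lat1, lon1 = p1
    t2, lat2, lon2 = p2
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("trajectory points must be in increasing time order")

    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    dist_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from p1 towards p2.
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing_deg = math.degrees(math.atan2(y, x)) % 360.0

    return dist_m / dt, bearing_deg
```

Applying the function to consecutive points of one trajectory yields a per-person speed and heading series, which could then be aggregated per area to describe flow direction.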

[0011] Furthermore, the target YOLO model is an improved YOLOv8 model that incorporates a convolutional block attention module and is trained using MixUp and Mosaic data augmentation techniques.

[0012] Furthermore, before acquiring multimodal monitoring data, the method also includes: path planning;

[0013] Path planning, specifically including:

[0014] Based on the scenic area's electronic map and historical pedestrian density distribution data, a genetic algorithm is used to generate a preset cruise path for the drone platform, so that the preset cruise path can maximize the coverage frequency of historically high pedestrian density areas while meeting flight energy consumption constraints.

[0015] Furthermore, the method also includes: overload warning;

[0016] Overload warnings include:

[0017] The density of people in hot spots, obtained by counting the flow of people in the scenic area, is compared with the preset carrying capacity thresholds for each area of the scenic area. If the density of people in any area continues to exceed its corresponding threshold, an overload warning is triggered and diversion guidance suggestions are generated.

[0018] Furthermore, the preprocessed visible light image and thermal imaging data are fused using multimodal methods, specifically including:

[0019] Calculate the image sharpness index of the visible light image and the thermal imaging signal-to-noise ratio of the thermal imaging data, respectively.

[0020] Dynamic fusion weights are determined based on image sharpness index and thermal imaging signal-to-noise ratio;

[0021] Visible light images and thermal imaging data are weighted and fused using dynamic fusion weights to generate a fused image.

[0022] Furthermore, mapping the location of personnel targets to a geographic coordinate system specifically includes:

[0023] Acquire the camera intrinsic parameters, shooting attitude angle, and relative flight altitude of the UAV platform at the time of data collection;

[0024] Based on camera intrinsic parameters, shooting attitude angle, and relative flight altitude, a perspective transformation model is used to establish the transformation relationship between the image coordinate system and the geographic coordinate system.

[0025] The positional information of personnel targets in the image coordinate system is converted into geographic coordinates of the personnel targets using transformation relationships.

[0026] Furthermore, the method also includes: abnormal behavior detection;

[0027] Abnormal behavior detection specifically includes:

[0028] Acquire visible light images;

[0029] Key points of human skeletons in the visible light images are extracted using the OpenPose algorithm.

[0030] Abnormal behavior is identified based on the motion characteristics of key points in a person's skeleton. Abnormal behavior includes running or pushing.

[0031] When abnormal behavior is detected, an abnormal warning signal is generated.

[0032] Furthermore, the method also includes: dynamic cruise adjustment;

[0033] Dynamic cruise adjustment includes:

[0034] Based on the hotspots of crowds and real-time crowd density, a genetic algorithm is used to optimize and update the preset cruise route, generating a real-time optimized route.

[0035] The real-time optimized path is sent to the drone platform to control the drone platform to cruise according to the real-time optimized path.

[0036] Furthermore, based on the geographic coordinates of the personnel targets, the motion state of each personnel target in the geographic coordinate system is predicted, and the detection results of the current frame are matched with existing trajectories to obtain the personnel motion trajectory carrying a unique identifier, specifically including:

[0037] Using the Kalman filter algorithm, based on the state of the personnel's movement trajectory at the previous moment, the position and movement state of each personnel target in the geographic coordinate system at the current moment are predicted;

[0038] Obtain the geographic coordinates of each person target detected in the current frame;

[0039] Calculate the matching cost between the geographic coordinates of each person target in the current frame and each predicted location. The matching cost is measured using Euclidean distance or intersection over union (IoU).

[0040] A cost matrix is constructed based on the matching cost, and the Hungarian algorithm is used to solve the cost matrix to obtain the detection result and trajectory matching pairs that minimize the total matching cost.

[0041] Based on the detection results and trajectory matching pairs, the corresponding personnel movement trajectory is updated using the geographical coordinates of the successfully matched personnel targets in the current frame; for personnel targets that are not successfully matched, the corresponding personnel movement trajectory is initialized.

[0042] The original unique identifier of the successfully matched personnel movement trajectory is retained, and a new unique identifier is assigned to the initialized personnel movement trajectory, thereby outputting the personnel movement trajectory carrying the unique identifier.
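The Kalman prediction step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes geographic coordinates have already been projected to local metric east/north values, treats the two axes as independent constant-velocity filters, and uses made-up noise values (`p0`, `q`, `r`). The class names `AxisKalman` and `Track` are hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis.

    State is [position, velocity]; only the position is measured.
    """

    def __init__(self, pos, vel=0.0, p0=10.0, q=0.01, r=1.0):
        self.pos, self.vel = pos, vel
        self.P = [[p0, 0.0], [0.0, p0]]  # state covariance
        self.q, self.r = q, r            # process / measurement noise (assumed)

    def predict(self, dt):
        (a, b), (c, d) = self.P
        self.pos += self.vel * dt
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.pos

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r            # innovation covariance (H = [1, 0])
        k0, k1 = a / s, c / s     # Kalman gain
        y = z - self.pos          # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class Track:
    """One person trajectory with a unique identifier."""

    def __init__(self, track_id, east_m, north_m):
        self.track_id = track_id
        self.kx, self.ky = AxisKalman(east_m), AxisKalman(north_m)

    def predict(self, dt):
        return self.kx.predict(dt), self.ky.predict(dt)

    def update(self, east_m, north_m):
        self.kx.update(east_m)
        self.ky.update(north_m)
```

In a tracking loop, `predict` would be called once per frame for every existing track, and `update` only for tracks that were matched to a detection in that frame.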

[0043] In a second aspect, the present invention provides a system for counting the flow of people in a scenic area, which is used to execute the method for counting the flow of people in a scenic area described in the first aspect.

[0044] The system includes a drone platform, a data processing center, and monitoring terminals;

[0045] The drone platform is equipped with a visible light camera, a thermal imager, a positioning module, and a wireless communication module. The visible light camera is used to acquire visible light images; the thermal imager is used to acquire thermal imaging data; the positioning module is used to obtain the real-time location information of the drone platform; and the wireless communication module is used to transmit the visible light images, thermal imaging data, and real-time location information as multimodal monitoring data to the data processing center in real time according to the preset cruise path.

[0046] The data processing center comprises a multimodal fusion module, a target YOLO detection module, a geographic mapping module, a trajectory tracking module, and a flow direction analysis module, connected in sequence. The multimodal fusion module receives visible light images and thermal imaging data, dynamically weights and fuses them based on current ambient light intensity and weather conditions to generate a fused image, and outputs the fused image to the target YOLO detection module. The target YOLO detection module performs personnel target detection on the fused image and outputs the position information of each personnel target in the image coordinate system to the geographic mapping module. The geographic mapping module combines the position information of personnel targets in the image coordinate system with the real-time position information of the UAV platform, maps the location of personnel targets to a geographic coordinate system through coordinate transformation to obtain the corresponding geographic coordinates, and inputs the geographic coordinates into the trajectory tracking module. The trajectory tracking module uses the Kalman filter algorithm to predict the movement state of each personnel target in the geographic coordinate system based on the geographic coordinates, uses the Hungarian algorithm to match the detection results of the current frame with existing trajectories, constructs and updates personnel movement trajectories carrying unique identifiers, and transmits the personnel movement trajectories to the flow direction analysis module. The flow direction analysis module performs spatiotemporal correlation analysis on the personnel movement trajectories, performs DBSCAN clustering on the geographic coordinate points contained in the trajectories, identifies hotspot areas where people gather and main channels of pedestrian flow, and generates personnel flow direction counting results.

[0047] The monitoring terminal is connected to the data processing center to receive and visualize the counting results of hotspot areas, main pedestrian channels, and pedestrian flow directions.

[0048] This invention effectively overcomes the inherent shortcomings of existing fixed monitoring schemes, such as limited field of view, poor environmental adaptability, and lack of spatial depth analysis, by introducing a mobile data acquisition platform using unmanned aerial vehicles (UAVs), an adaptive multimodal fusion mechanism for visible light and thermal imaging, and trajectory tracking and analysis technology based on geographic coordinate systems. Specific beneficial effects are as follows:

[0049] 1. To address the issues of rigid perspectives and severe obstruction caused by fixed cameras, this invention employs a drone platform to collect data along a preset cruise path. By acquiring global information about the scenic area from an aerial, overhead view, the drone can flexibly cover complex terrain areas such as mountains, waterways, and narrow trails. This effectively avoids the mutual obstruction caused by dense crowds when viewed from eye level or upward angles by fixed ground cameras, eliminating blind spots in traditional monitoring networks. It significantly improves the completeness and counting accuracy of personnel detection in dense scenes, providing high-quality raw data support for global flow analysis.

[0050] 2. To address the issue of insufficient environmental adaptability in existing technologies, this invention integrates visible light images and thermal imaging data, and performs adaptive multimodal fusion based on real-time light intensity and weather conditions. This mechanism fully utilizes the characteristics of thermal imaging—its immunity to light and its strong ability to penetrate smoke—to compensate for the drastic drop in image quality caused by a single visible light sensor in harsh environments such as nighttime, fog, and heavy rain. This ensures stable operation of the system under various weather conditions and significantly improves the robustness and all-weather monitoring capability of the scenic area personnel counting method.

[0051] 3. Addressing the lack of multi-source data fusion and geospatial analysis mechanisms, this invention combines real-time UAV location information with target detection results to achieve precise mapping of personnel targets from the image coordinate system to the geographic coordinate system. Based on this, it constructs personnel movement trajectories carrying unique identifiers by combining motion state prediction and trajectory matching technologies. Furthermore, through spatiotemporal correlation analysis and density clustering, it accurately identifies hotspots of crowd gathering and the distribution of main pedestrian channels. This not only solves the problem of long-term trajectory tracking but also represents a leap from simple counting to complex flow situational awareness, providing real-time and accurate data support for scenic area visitor flow early warning, traffic management, and refined management.

[0052] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0053] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof. The above and other features and advantages will become more apparent to those skilled in the art from the detailed description of exemplary embodiments with reference to the accompanying drawings, in which:

[0054] Figure 1 is a schematic diagram of a method for counting the flow of people in a scenic area, provided in an embodiment of the present invention;

[0055] Figure 2 is a schematic diagram of the overall framework for counting the flow of people in a scenic area, provided in an embodiment of the present invention;

[0056] Figure 3 is a flowchart of the implementation of counting the flow of people in a scenic area, provided in an embodiment of the present invention;

[0057] Figure 4 is a schematic diagram of a multimodal fusion detection structure for counting the flow of people in a scenic area, provided in an embodiment of the present invention.

Detailed Implementation

[0058] It is understood that the specific embodiments and accompanying drawings described herein are merely for explaining the invention and are not intended to limit the invention.

[0059] It is understood that, without conflict, the various embodiments and features in the embodiments of the present invention can be combined with each other.

[0060] It is understood that, for ease of description, only the parts related to the present invention are shown in the accompanying drawings, while the parts unrelated to the present invention are not shown in the drawings.

[0061] It is understood that each unit or module involved in the embodiments of the present invention may correspond to only one entity structure, or may be composed of multiple entity structures, or multiple units or modules may be integrated into one entity structure.

[0062] It is understood that, without conflict, the functions and steps marked in the flowcharts and block diagrams of this invention may occur in a different order than that marked in the accompanying drawings.

[0063] It is understood that the flowcharts and block diagrams of this invention illustrate the possible architecture, functions, and operations of systems, apparatuses, devices, and methods according to various embodiments of this invention. Each block in the flowchart or block diagram may represent a unit, module, program segment, or code, containing executable instructions for implementing the specified function. Furthermore, each block or combination of blocks in the block diagram and flowchart can be implemented using a hardware-based system to achieve the specified function, or using a combination of hardware and computer instructions.

[0064] It is understood that the units and modules involved in the embodiments of the present invention can be implemented by software or by hardware. For example, the units and modules can be located in a processor.

[0065] Embodiment 1:

[0066] This embodiment provides a method for counting the flow of people in a scenic area. The method relies on the technical solution of "UAV platform + multimodal data processing" and is applicable to three scenarios. First, dynamic monitoring of the entire area during the peak tourist season or large-scale events, such as National Day and music festivals: UAVs can cruise flexibly to break through the field-of-view limitations of fixed equipment, and multimodal fusion (visible light + thermal imaging) ensures that people can be stably detected even under strong light or in severe weather such as rain, enabling real-time statistics of the flow of people across the scenic area and identification of hot spots. Second, tracking the flow of people in scenic areas with complex terrain, such as mountains and water areas that fixed equipment can hardly cover: UAVs can fly flexibly to collect data and accurately identify the movement trajectories of people. Third, early warning and intervention for safety hazards: when excessive gathering or abnormal behavior occurs, early warnings are generated promptly through clustering and spatiotemporal correlation analysis to help managers make quick decisions. The method addresses the problems of "difficulty in full coverage, poor adaptability to harsh environments, and delayed early warning" described in the background technology, and provides technical support for real-time personnel management in smart scenic areas.

[0067] As shown in Figure 1, this embodiment provides a method for counting the flow of people in a scenic area, specifically including steps S1 to S6.

[0068] Step S1: Acquire multimodal monitoring data. The multimodal monitoring data is the data collected by the UAV platform in the scenic area according to the preset cruise path. The multimodal monitoring data includes visible light images, thermal imaging data, and the real-time location information of the UAV platform.

[0069] The drone platform used for data collection is a multi-rotor drone with long endurance (≥30 minutes) and wind resistance (≥ level 5). As shown in Figure 2, its hardware modules include a 4K high-definition camera (frame rate ≥30 fps), a thermal imager (resolution ≥640×480), a GPS positioning module (positioning accuracy ≤0.5 m), and a 5G/4G data transmission module. The implementation process is shown in Figure 3. During flight, the high-definition camera and the thermal imager simultaneously acquire images of people in the scenic area, while the GPS module obtains the drone's real-time location data. The acquired data is transmitted in real time to the data processing center via the 5G/4G module. After receiving the data, the data processing center first unifies the multimodal data format through the data receiving interface, then preprocesses it using median filtering or Gaussian filtering, and subsequently performs weighted fusion of the visible light and thermal imaging data. The fused data is then input into the YOLOv8 target detection model for personnel detection, and the DeepSORT multi-target tracking algorithm is used to track and store personnel trajectories. Next, the spatiotemporal correlation analysis module processes the data and, combined with the OpenPose abnormal behavior detection and the overload/abnormal warning modules, the heat map of pedestrian flow is calculated. Finally, the data is visualized and displayed on the monitoring terminal. At the same time, the system adjusts the drone's path and parameters through control command feedback, forming a closed-loop management system.

[0070] As a specific implementation method, before acquiring multimodal monitoring data, the method further includes: path planning;

[0071] Path planning, specifically including:

[0072] Based on the scenic area's electronic map and historical pedestrian density distribution data, a genetic algorithm is used to generate a preset cruise path for the drone platform, so that the preset cruise path can maximize the coverage frequency of historically high pedestrian density areas while meeting flight energy consumption constraints.
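The path planning step can be illustrated with a small genetic algorithm. This is only a sketch under stated assumptions: candidate waypoints are hypothetical `(x_m, y_m, density)` tuples, the energy constraint is approximated by a maximum path length `budget_m`, fitness is the summed historical density of the waypoints actually reached, and the operators (order crossover, swap mutation, tournament selection, elitism) are common textbook choices rather than anything stated in the patent.

```python
import math
import random


def decode(order, waypoints, budget_m, depot=(0.0, 0.0)):
    """Visit waypoints in chromosome order, skipping any that would exceed the budget."""
    route, used, pos = [], 0.0, depot
    for idx in order:
        x, y, _ = waypoints[idx]
        step = math.hypot(x - pos[0], y - pos[1])
        if used + step > budget_m:
            continue
        route.append(idx)
        used += step
        pos = (x, y)
    return route


def route_length(route, waypoints, depot=(0.0, 0.0)):
    """Total flown distance of a decoded route, starting at the depot."""
    total, pos = 0.0, depot
    for idx in route:
        x, y, _ = waypoints[idx]
        total += math.hypot(x - pos[0], y - pos[1])
        pos = (x, y)
    return total


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place and fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]


def plan_path(waypoints, budget_m, generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    n = len(waypoints)

    def score(chromosome):
        return sum(waypoints[i][2] for i in decode(chromosome, waypoints, budget_m))

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:], pop[1][:]]  # elitism: carry over the two best
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            nxt.append(child)
        pop = nxt
    return decode(max(pop, key=score), waypoints, budget_m)
```

A call such as `plan_path(waypoints, budget_m=40.0)` returns the indices of the waypoints to visit, in order. A production planner would also need the return leg to the depot and a real energy model, both omitted here.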

[0073] Step S2: Preprocess the visible light image and thermal imaging data, and perform multimodal fusion of the preprocessed visible light image and thermal imaging data according to the real-time ambient light intensity and weather conditions to generate a fused image.

[0074] As a specific implementation method, multimodal fusion is performed on the preprocessed visible light image and thermal imaging data, specifically including:

[0075] Calculate the image sharpness index of the visible light image and the thermal imaging signal-to-noise ratio of the thermal imaging data, respectively.

[0076] Dynamic fusion weights are determined based on image sharpness index and thermal imaging signal-to-noise ratio;

[0077] Visible light images and thermal imaging data are weighted and fused using dynamic fusion weights to generate a fused image.
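The three fusion sub-steps above can be sketched in Python as follows. The patent does not define the sharpness index, the signal-to-noise ratio, or how the weights are derived from them, so everything here is an assumption: sharpness is taken as the mean squared Laplacian response, both scores are assumed to be normalized to comparable ranges before weighting, and the weights are simply proportional to the scores. Images are plain nested lists of grayscale values that are already registered pixel to pixel.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, rows of intensities in [0, 1]


def sharpness_index(img: Image) -> float:
    """Mean squared 4-neighbour Laplacian response (an assumed sharpness metric)."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            total += lap * lap
            n += 1
    return total / n if n else 0.0


def fusion_weights(sharpness: float, snr: float) -> Tuple[float, float]:
    """Weights proportional to each modality's quality score (assumed rule).

    Both scores are assumed to be non-negative and on comparable scales.
    """
    total = sharpness + snr
    if total <= 0:
        return 0.5, 0.5  # no usable quality signal: fall back to equal weights
    w_visible = sharpness / total
    return w_visible, 1.0 - w_visible


def fuse(visible: Image, thermal: Image, sharpness: float, snr: float) -> Image:
    """Pixel-wise weighted sum of two registered images of equal size."""
    w_vis, w_th = fusion_weights(sharpness, snr)
    return [[w_vis * v + w_th * t for v, t in zip(row_v, row_t)]
            for row_v, row_t in zip(visible, thermal)]
```

With this rule, a blurred or dark visible frame (low sharpness) shifts the weight towards the thermal image, which is the behaviour the adaptive fusion is meant to achieve at night or in fog.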

[0078] As a more specific implementation method, mapping the location of personnel targets to a geographic coordinate system includes:

[0079] Acquire the camera intrinsic parameters, shooting attitude angle, and relative flight altitude of the UAV platform at the time of data collection;

[0080] Based on camera intrinsic parameters, shooting attitude angle, and relative flight altitude, a perspective transformation model is used to establish the transformation relationship between the image coordinate system and the geographic coordinate system.

[0081] The positional information of personnel targets in the image coordinate system is converted into geographic coordinates of the personnel targets using transformation relationships.
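The image-to-geographic conversion can be illustrated with a simplified pinhole model. This sketch makes strong assumptions that the patent does not state: the camera points straight down (nadir), the ground is flat, only the yaw angle matters, the top of the image points along the heading, and a local flat-Earth approximation converts metre offsets to latitude and longitude. The `CameraPose` fields and `pixel_to_geo` are hypothetical names.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius


@dataclass
class CameraPose:
    lat_deg: float      # UAV latitude
    lon_deg: float      # UAV longitude
    altitude_m: float   # relative flight altitude above ground
    yaw_deg: float      # heading, clockwise from north
    fx: float           # focal length in pixels (x)
    fy: float           # focal length in pixels (y)
    cx: float           # principal point (x)
    cy: float           # principal point (y)


def pixel_to_geo(u: float, v: float, pose: CameraPose):
    """Map an image pixel to (lat_deg, lon_deg) on flat ground below a nadir camera."""
    # Ground offsets in the camera frame: image x points right, image y points backwards.
    right_m = (u - pose.cx) * pose.altitude_m / pose.fx
    forward_m = -(v - pose.cy) * pose.altitude_m / pose.fy

    # Rotate by the heading into east/north offsets.
    yaw = math.radians(pose.yaw_deg)
    east_m = right_m * math.cos(yaw) + forward_m * math.sin(yaw)
    north_m = forward_m * math.cos(yaw) - right_m * math.sin(yaw)

    # Small-offset conversion from metres to degrees.
    lat = pose.lat_deg + math.degrees(north_m / EARTH_RADIUS_M)
    lon = pose.lon_deg + math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(pose.lat_deg))))
    return lat, lon
```

For an oblique camera, the pitch and roll angles would have to enter the rotation as well, typically through a full homography between the image plane and the ground plane.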

[0082] Step S3: Input the fused image into the target YOLO model to obtain the position information of the person target in the image coordinate system.

[0083] As a specific implementation method, the target YOLO model is an improved YOLOv8 model that incorporates a convolutional block attention module and is trained using MixUp and Mosaic data augmentation techniques.

[0084] Step S4: Based on the position information of the personnel target in the image coordinate system and the real-time position information of the UAV platform, map the position of the personnel target to the geographic coordinate system to obtain the geographic coordinates of the personnel target.

[0085] Step S5: Based on the geographic coordinates of the personnel targets, predict the motion state of each personnel target in the geographic coordinate system, and match the detection results of the current frame with the existing trajectories to obtain the personnel motion trajectory carrying a unique identifier.

[0086] As a specific implementation method, based on the geographic coordinates of personnel targets, the motion state of each personnel target in the geographic coordinate system is predicted, and the detection results of the current frame are matched with existing trajectories to obtain the personnel motion trajectory carrying a unique identifier, specifically including:

[0087] Using the Kalman filter algorithm, based on the state of the personnel's movement trajectory at the previous moment, the position and movement state of each personnel target in the geographic coordinate system at the current moment are predicted;

[0088] Obtain the geographic coordinates of each person target detected in the current frame;

[0089] Calculate the matching cost between the geographic coordinates of each person target in the current frame and each predicted location. The matching cost is measured using Euclidean distance or intersection over union (IoU).

[0090] A cost matrix is constructed based on the matching cost, and the Hungarian algorithm is used to solve the cost matrix to obtain the detection result and trajectory matching pairs that minimize the total matching cost.

[0091] Based on the detection results and trajectory matching pairs, the corresponding personnel movement trajectory is updated using the geographical coordinates of the successfully matched personnel targets in the current frame; for personnel targets that are not successfully matched, the corresponding personnel movement trajectory is initialized.

[0092] The original unique identifier of the successfully matched personnel movement trajectory is retained, and a new unique identifier is assigned to the initialized personnel movement trajectory, thereby outputting the personnel movement trajectory carrying the unique identifier.
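The matching and identifier assignment steps above can be sketched as follows. The `hungarian` function is a standard potentials-based implementation of the Hungarian algorithm; the Euclidean cost on local east/north metres, the gating distance `max_dist_m`, and the `associate` helper with its ID counter are assumptions added for illustration.

```python
import math
from itertools import count


def hungarian(cost):
    """Minimum-cost assignment for a rectangular cost matrix.

    Returns (row, col) pairs covering min(rows, cols) entries.
    """
    if not cost or not cost[0]:
        return []
    n, m = len(cost), len(cost[0])
    if n > m:  # the core routine needs rows <= cols: transpose, then swap back
        transposed = [list(col) for col in zip(*cost)]
        return sorted((r, c) for c, r in hungarian(transposed))
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


def associate(predicted, detections, max_dist_m, next_id):
    """Match predicted track positions to current-frame detections.

    predicted: {track_id: (east_m, north_m)}; detections: [(east_m, north_m), ...].
    Returns ({track_id: detection_index}, {new_track_id: detection_index}).
    """
    track_ids = list(predicted)
    cost = [[math.hypot(predicted[t][0] - d[0], predicted[t][1] - d[1])
             for d in detections] for t in track_ids]
    matches = {}
    for r, c in hungarian(cost):
        if cost[r][c] <= max_dist_m:  # gating: reject implausibly distant pairs
            matches[track_ids[r]] = c
    matched = set(matches.values())
    new_tracks = {next(next_id): i for i in range(len(detections)) if i not in matched}
    return matches, new_tracks
```

A caller would keep one counter for the whole session, for example `ids = count(1)`, so that matched tracks retain their identifiers and each unmatched detection receives a fresh one.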

[0093] Step S6: Perform spatiotemporal correlation analysis based on the movement trajectory of people, calculate the movement speed and direction of each person, and perform density clustering on the geographical coordinate points in the movement trajectory of people to identify hot spots and main channels of people gathering, thereby realizing the counting of the flow of people in the scenic area.
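The density clustering in step S6 can be illustrated with a compact DBSCAN written in plain Python. It is a sketch under assumptions: trajectory points are taken to be already projected to local east/north metres, `eps_m` and `min_pts` are tuning parameters the patent does not give, and the `hotspots` summary (centroid and point count per cluster) is a hypothetical helper.

```python
import math

NOISE = -1


def dbscan(points, eps_m, min_pts):
    """Label each (east_m, north_m) point with a cluster index, or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:     # j is a core point: keep expanding
                queue.extend(nb_j)
    return labels


def hotspots(points, labels):
    """Return [(centroid_east, centroid_north, size)] sorted by size, largest first."""
    groups = {}
    for (x, y), lab in zip(points, labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append((x, y))
    result = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g), len(g))
              for g in groups.values()]
    return sorted(result, key=lambda item: item[2], reverse=True)
```

The neighbour search here is quadratic in the number of points, which is acceptable for a sketch; a spatial index would be the usual choice for large trajectory sets.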

[0094] As a specific implementation method, the method further includes: overload warning;

[0095] Overload warnings include:

[0096] The density of people in hot spots, obtained by counting the flow of people in the scenic area, is compared with the preset carrying capacity thresholds for each area of the scenic area. If the density of people in any area continues to exceed its corresponding threshold, an overload warning is triggered and diversion guidance suggestions are generated.
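The overload check can be sketched as a small stateful monitor. The interpretation of "continues to exceed" as a fixed number of consecutive samples (`min_consecutive`), the area names, and the diversion rule (suggest the area with the lowest load ratio) are all assumptions for illustration.

```python
class OverloadMonitor:
    """Raise a warning when an area's density stays above its threshold."""

    def __init__(self, thresholds, min_consecutive=3):
        self.thresholds = dict(thresholds)      # area -> carrying capacity threshold
        self.min_consecutive = min_consecutive  # samples needed to count as sustained
        self._streak = {area: 0 for area in self.thresholds}

    def update(self, densities):
        """Feed one sample {area: density}; return the areas currently in overload."""
        alerts = []
        for area, limit in self.thresholds.items():
            if densities.get(area, 0.0) > limit:
                self._streak[area] += 1
            else:
                self._streak[area] = 0          # any dip resets the streak
            if self._streak[area] >= self.min_consecutive:
                alerts.append(area)
        return alerts

    def suggest_diversion(self, densities, overloaded):
        """Return the non-overloaded area with the lowest density/threshold ratio."""
        candidates = [a for a in self.thresholds if a not in overloaded]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda a: densities.get(a, 0.0) / self.thresholds[a])
```

Requiring several consecutive samples avoids warnings caused by a single noisy density estimate, at the cost of a short delay before the warning fires.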

[0097] As a specific implementation method, the method further includes: abnormal behavior detection;

[0098] Abnormal behavior detection specifically includes:

[0099] Acquire visible light images;

[0100] Key points of human skeletons in the visible light images are extracted using the OpenPose algorithm.

[0101] Abnormal behavior is identified based on the motion characteristics of key points in a person's skeleton. Abnormal behavior includes running or pushing.

[0102] When abnormal behavior is detected, an abnormal warning signal is generated.

[0103] As a specific implementation method, the method further includes: dynamic cruise adjustment;

[0104] Dynamic cruise adjustment includes:

[0105] Based on the hotspot areas where people gather and the real-time crowd density, a genetic algorithm is used to optimize and update the preset cruise path, generating a real-time optimized path.

[0106] The real-time optimized path is sent to the drone platform to control the drone platform to cruise according to the real-time optimized path.
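The path update can be pictured with a small genetic algorithm over the visiting order of hotspot waypoints. The sketch below is only one possible formulation: the fitness (crowd density multiplied by the distance flown before reaching each hotspot, so dense hotspots are reached sooner), the order crossover, the swap mutation, and all parameter values are assumptions, and flight energy constraints are not modeled.

```python
import math
import random

def route_cost(order, points, density, start=(0.0, 0.0)):
    """Density-weighted arrival distance: lower means dense hotspots are reached sooner."""
    pos, travelled, cost = start, 0.0, 0.0
    for i in order:
        travelled += math.dist(pos, points[i])
        cost += density[i] * travelled
        pos = points[i]
    return cost

def order_crossover(a, b, rng):
    """Copy a slice of parent a and fill the rest in the order it appears in parent b."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]

def optimize_route(preset, points, density, generations=200,
                   pop_size=30, mutation_rate=0.2, seed=0):
    """preset: list of waypoint indices (at least two). Returns an improved visiting order."""
    rng = random.Random(seed)

    def fitness(route):
        return route_cost(route, points, density)

    # Seed the population with the preset path plus random permutations of it.
    population = [list(preset)] + [rng.sample(preset, len(preset))
                                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[:pop_size // 2]          # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mutation_rate:          # swap mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return min(population, key=fitness)
```

Because the preset path is part of the initial population and the best half always survives, the returned order never scores worse than the preset under this fitness.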

[0107] The data processing center handles the following tasks: receiving the visible light and thermal imaging data transmitted from the UAV and performing preprocessing operations such as grayscale conversion and normalization; expanding the dataset using the MixUp and Mosaic techniques; inputting the preprocessed data into an improved YOLOv8 model that introduces the CBAM convolutional block attention module, whose channel and spatial attention mechanisms enhance the feature extraction capability for small targets; outputting information such as the location and category of personnel targets; acquiring the personnel detection results from multimodal fusion detection; predicting the movement state of personnel targets using the Kalman filter algorithm; establishing a personnel movement trajectory database by matching positions using the Hungarian algorithm; calculating personnel movement speed, direction, and dwell time using spatiotemporal correlation analysis; identifying hotspot areas using the DBSCAN clustering algorithm; analyzing pedestrian flow trends using a spatiotemporal cube model to identify main pedestrian channels and gathering areas; and analyzing personnel posture using the OpenPose algorithm to identify abnormal behaviors such as running and pushing. The data processing center, as the core processing layer, receives the visible light and thermal imaging data transmitted from the UAV and deploys an improved model based on YOLOv8 (its multimodal fusion detection structure diagram is shown in Figure 1). As shown in Figure 4, the model improves detection performance by introducing the CBAM convolutional block attention module and employing the MixUp and Mosaic data augmentation techniques. The specific process is as follows. After the visible light images and thermal imaging data collected by the UAV are transmitted to the data processing center, preprocessing operations such as grayscale conversion and normalization are performed to unify the data format and scale.
Simultaneously, the MixUp and Mosaic techniques are used to expand the dataset. The preprocessed data is then input into the improved YOLOv8 model. Starting from the original image or the multimodal fusion data, the model sequentially applies MixUp / Mosaic data augmentation, the C2f cross-stage module, and the SPPF spatial pyramid pooling module to optimize feature extraction. The features then pass through the CBAM convolutional block attention module, which uses channel and spatial attention mechanisms to enhance feature capture for small targets. After this feature weighting, multi-scale feature fusion is achieved through the FPN top-down feature pyramid and the PAN bottom-up path fusion. Finally, the classification head (person category prediction) and the regression head (bounding box prediction) output the detection results, including the location, category, and confidence level of each personnel target. In the personnel flow analysis phase, three modules work together: acquisition of the multimodal fusion detection results, target tracking based on Kalman filtering and the Hungarian algorithm, and data analysis. The process is as follows. After the personnel detection results are acquired from multimodal fusion detection, the Kalman filter algorithm is used to predict the movement state of personnel targets, and the Hungarian algorithm is used to match positions, establishing and updating a personnel movement trajectory database. Next, spatiotemporal correlation analysis is used to calculate personnel movement speed, direction, and dwell time; the DBSCAN clustering algorithm is used to identify hotspot areas where personnel gather; and a spatiotemporal cube model is used to analyze pedestrian flow trends, clarifying the main pedestrian channels and gathering areas within the scenic area.
Abnormal behavior detection consists of an image acquisition module (sharing a camera with multimodal detection), a behavior recognition module based on the OpenPose algorithm, an early warning module, and a scenic area carrying capacity model module. The image acquisition module collects images of people's activities through the scenic area cameras (sharing a data source with multimodal detection). The behavior recognition module uses the OpenPose algorithm to analyze people's postures, identifies abnormal behaviors such as running and pushing, and triggers the early warning module. At the same time, the scenic area carrying capacity model is used to determine whether the scenic area is overloaded. If the overload threshold is reached, an early warning is issued and diversion suggestions are generated based on the flow analysis results, enabling timely response to abnormal behaviors and dynamic management of scenic area traffic.
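The hotspot identification by DBSCAN mentioned above can be sketched in pure Python as follows. The sketch assumes trajectory points are planar coordinates in meters; the parameter values (`eps`, `min_pts`) and the helper `hotspot_centroids` are illustrative assumptions, and a production system would more likely rely on an existing clustering library.

```python
import math

def dbscan(points, eps, min_pts):
    """Density clustering. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)                 # None = not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                        # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster               # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:            # core point: keep expanding
                queue.extend(nearby)
    return labels

def hotspot_centroids(points, labels):
    """Returns {cluster: (mean_x, mean_y, point_count)}, ignoring noise."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == -1:
            continue
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n, n) for label, (sx, sy, n) in sums.items()}
```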

[0108] This embodiment focuses on three core directions: multimodal fusion and intelligent detection, dynamic analysis and intelligent scheduling, and intelligent early warning and system collaboration. Multimodal fusion and intelligent detection achieves high-precision personnel detection in complex scenes by fusing visible light and thermal imaging data and adjusting the fusion weights in real time according to the environment. Combined with an improved YOLO model incorporating data augmentation and the CBAM module, it particularly enhances the ability to identify small targets. Dynamic analysis and intelligent scheduling uses Kalman filtering and the Hungarian algorithm to track personnel trajectories, and analyzes flow direction using flow direction calculation and hotspot area identification algorithms based on spatiotemporal correlation analysis. Furthermore, it uses a genetic algorithm to plan the drone's cruise path and dynamically adjusts shooting parameters and computing resources based on personnel density to achieve efficient data acquisition and analysis. Intelligent early warning and system collaboration uses the OpenPose algorithm to identify abnormal behavior and, combined with the scenic area carrying capacity model, provides overload warnings and generates diversion suggestions. It constructs a collaborative architecture of drone platform, data processing center, and monitoring terminal, and realizes convenient cross-platform management through standardized interface communication.
Its technical details cover detection and fusion technology (multimodal data weighted fusion and dynamic weight adjustment method, improved YOLO model structure and training strategy), analysis and scheduling methods (trajectory establishment method based on Kalman filter and Hungarian algorithm, flow direction calculation and hot spot area identification algorithm based on spatiotemporal correlation analysis, genetic algorithm path planning method, personnel density driven parameter and resource dynamic adjustment strategy), and early warning and system architecture (OpenPose abnormal behavior identification method, early warning and diversion strategy combining carrying capacity and flow direction, system module architecture design, functional division, communication process and cross-platform monitoring terminal function implementation technical solution).
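As a rough illustration of weighted fusion with dynamic weights, the sketch below derives the two weights from a visible-light sharpness index and a thermal signal-to-noise ratio and blends the images pixel by pixel. The gradient-based sharpness measure, the assumption that both indices are already scaled to comparable ranges, the proportional weighting rule, and all function names are hypothetical.

```python
def sharpness_index(image):
    """Mean absolute horizontal gradient of a grayscale image (list of rows).
    A simple stand-in; the source does not specify the sharpness metric."""
    diffs = [abs(row[c + 1] - row[c]) for row in image for c in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def fusion_weights(sharpness, thermal_snr):
    """Weights proportional to each modality's quality index (assumed comparable scales)."""
    total = sharpness + thermal_snr
    if total <= 0:
        return 0.5, 0.5
    w_visible = sharpness / total
    return w_visible, 1.0 - w_visible

def fuse(visible, thermal, w_visible, w_thermal):
    """Pixel-wise weighted sum of two registered, equally sized grayscale images."""
    return [[w_visible * v + w_thermal * t for v, t in zip(v_row, t_row)]
            for v_row, t_row in zip(visible, thermal)]
```

Under this rule a blurred visible image (low sharpness, for example at night or in fog) shifts weight toward the thermal channel, which matches the qualitative behavior described for the embodiment.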

[0109] Example 2:

[0110] As shown in Figure 2, this embodiment provides a system for counting the flow of people in a scenic area. The system is used to execute the method for counting the flow of people in a scenic area described in Example 1.

[0111] The system includes a drone platform, a data processing center, and monitoring terminals;

[0112] The drone platform is equipped with a visible light camera, a thermal imager, a positioning module, and a wireless communication module. The visible light camera is used to acquire visible light images; the thermal imager is used to acquire thermal imaging data; the positioning module is used to obtain the real-time location information of the drone platform; and the wireless communication module is used to transmit the visible light images, thermal imaging data, and real-time location information as multimodal monitoring data to the data processing center in real time according to the preset cruise path.

[0113] The data processing center comprises a multimodal fusion module, a target YOLO detection module, a geographic mapping module, a trajectory tracking module, and a flow direction analysis module, connected in sequence. The multimodal fusion module receives the visible light images and thermal imaging data and dynamically weights and fuses them based on the current ambient light intensity and weather conditions to generate a fused image, which is then output to the target YOLO detection module. The target YOLO detection module performs personnel target detection on the fused image and outputs the position information of each personnel target in the image coordinate system to the geographic mapping module. The geographic mapping module combines the position information of the personnel targets in the image coordinate system with the real-time position information of the UAV platform, maps the location of the personnel targets to a geographic coordinate system through coordinate transformation to obtain the corresponding geographic coordinates, and inputs the geographic coordinates into the trajectory tracking module. The trajectory tracking module uses the Kalman filter algorithm to predict the movement state of each personnel target in the geographic coordinate system based on the geographic coordinates, uses the Hungarian algorithm to match the detection results of the current frame with existing trajectories, constructs and updates personnel movement trajectories carrying unique identifiers, and transmits the personnel movement trajectories to the flow direction analysis module. The flow direction analysis module performs spatiotemporal correlation analysis on the personnel movement trajectories, performs DBSCAN clustering on the geographic coordinate points contained in the trajectories, identifies hotspot areas where people gather and the main channels of pedestrian flow, and generates personnel flow direction counting results.
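The coordinate transformation performed by the geographic mapping module can be sketched under strong simplifying assumptions: a pinhole camera pointing straight down (nadir), flat ground, the image top aligned with the UAV heading, and a small-offset conversion from meters to latitude and longitude. Pitch and roll are ignored here, and the function name and parameters are hypothetical; a full perspective transformation would also account for the attitude angles.

```python
import math

EARTH_RADIUS_M = 6_378_137.0   # WGS-84 equatorial radius

def pixel_to_geo(u, v, fx, fy, cx, cy, altitude_m, yaw_deg, drone_lat, drone_lon):
    """Maps pixel (u, v) to (latitude, longitude) for a nadir-pointing camera.
    fx, fy, cx, cy: camera intrinsics in pixels; altitude_m: height above ground;
    yaw_deg: UAV heading, clockwise from north."""
    # Ground offsets relative to the point directly below the UAV.
    right_m = (u - cx) / fx * altitude_m
    forward_m = -(v - cy) / fy * altitude_m       # image rows grow downward
    yaw = math.radians(yaw_deg)
    north_m = forward_m * math.cos(yaw) - right_m * math.sin(yaw)
    east_m = forward_m * math.sin(yaw) + right_m * math.cos(yaw)
    # Small-offset approximation on a spherical Earth.
    lat = drone_lat + math.degrees(north_m / EARTH_RADIUS_M)
    lon = drone_lon + math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(drone_lat))))
    return lat, lon
```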

[0114] The monitoring terminal is connected to the data processing center to receive and visualize the counting results of hotspot areas, main pedestrian channels, and pedestrian flow. When it receives overload warning signals or abnormal behavior warning signals, it generates diversion guidance suggestions or issues warning prompts.

[0115] The scenic area visitor flow counting system in this embodiment uses a core architecture of "UAV platform + data processing center + monitoring terminal" to construct a complete process system from data collection to intelligent processing and application decision-making. The UAV platform, as the system's "front-end perception layer," is the core carrier for data collection. It is equipped with a high-definition camera (for collecting visible light images of the scenic area, recording personnel appearance and environmental details), a thermal imager (overcoming light limitations to collect thermal imaging data of personnel body temperature characteristics), a GPS positioning module (obtaining real-time UAV location information to provide a benchmark for geographic coordinate mapping), and a data transmission module (integrating visible light images, thermal imaging data, and real-time location information into "multimodal monitoring data," and transmitting it to the data processing center in real time according to a preset cruise path). Through hardware integration and data transmission, the UAV platform achieves dynamic monitoring of the entire scenic area, providing raw multimodal data support for subsequent processing and forming the basis for the system to perceive the status of personnel in the scenic area. The data processing center, as the system's "core processing layer," undertakes the key tasks of multimodal data fusion, personnel target identification, trajectory tracking, and flow analysis. Its internal modules are connected sequentially in a "data input-processing-output" manner. First, the multimodal fusion module receives visible light images and thermal imaging data. 
Based on the current ambient light intensity and weather conditions (e.g., strong light, overcast / rainy weather), it dynamically weights and fuses the two types of data (e.g., emphasizing visible light details in strong light and thermal imaging robustness in overcast weather), generating a fused image and outputting it to the target YOLO detection module. This module performs personnel target detection on the fused image based on the YOLO algorithm, efficiently identifying personnel targets in the image and outputting the position information of each person in the image coordinate system to the geographic mapping module. The geographic mapping module combines the personnel image position with the real-time position of the UAV and uses coordinate transformation algorithms (e.g., perspective transformation, geographic projection) to map the personnel locations from the image coordinate system to a geographic coordinate system, yielding the corresponding geographic coordinates, which are then input into the trajectory tracking module. The trajectory tracking module uses the Kalman filter algorithm to predict the motion state (such as speed and direction) of each personnel target in the geographic coordinate system. It then uses the Hungarian algorithm to match the current frame detection results with existing trajectories (addressing target occlusion and re-identification issues), constructing and updating personnel movement trajectories carrying unique identifiers, which are then transmitted to the flow analysis module. Finally, the flow analysis module performs spatiotemporal correlation analysis on the personnel movement trajectories, executes DBSCAN clustering (a density clustering algorithm) on the geographic coordinate points in the trajectories, identifies hotspot areas and main pedestrian channels, and generates personnel flow direction counting results.
The data processing center utilizes the YOLO target detection model, the DeepSORT tracking algorithm, and spatiotemporal correlation analysis / behavior recognition functions to achieve intelligent processing of multimodal data. The monitoring terminal, as the system's "application interaction layer," communicates with the data processing center and undertakes functions such as data visualization, early warning response, and closed-loop optimization. It receives the "hotspot areas, main pedestrian channels, and pedestrian flow direction counting results" from the data processing center and visually displays the distribution and flow of people in the scenic area through an interface (such as charts, heat maps, and trajectory animations), assisting managers in grasping the real-time situation. Simultaneously, when it receives "overload warning signals" (such as pedestrian flow exceeding a threshold in a certain area) or "abnormal behavior warning signals" (such as crowd gathering or deviation from normal paths), it generates diversion guidance suggestions (such as prompting tourists to move to low-density areas) or issues warning prompts (such as audible and visual alarms), supporting managers in making rapid decisions. Furthermore, the monitoring terminal can send "control commands" to the drone platform to adjust the patrol path, data collection frequency, and so on, achieving dynamic system optimization (such as more intensive monitoring of hotspot areas) and forming a closed-loop management system of "data collection-processing-application-feedback." Through functions such as generating pedestrian flow heat maps, flow direction analysis reports, anomaly warnings, and management personnel operations, the monitoring terminal completes data visualization and application decision-making.
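The motion-state prediction performed by the trajectory tracking module can be illustrated with a constant-velocity Kalman filter applied independently to each ground axis (east and north). The constant-velocity model, the simplified diagonal process noise, and the default noise values are assumptions chosen for brevity, and the class name `AxisKalman` is hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one axis; state is [position, velocity]."""

    def __init__(self, position, variance=10.0, q=0.1, r=1.0):
        self.p, self.v = position, 0.0
        self.P = [[variance, 0.0], [0.0, variance]]   # state covariance
        self.q, self.r = q, r                         # process / measurement noise

    def predict(self, dt):
        """Propagates the state by dt seconds and returns the predicted position."""
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        """Corrects the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                                # innovation variance
        k0, k1 = a / s, c / s                         # Kalman gain
        y = z - self.p                                # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

One filter per axis and per trajectory would be advanced with `predict` before matching and corrected with `update` using the matched detection.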

[0116] It is understood that the above embodiments are merely exemplary implementations used to illustrate the principles of the present invention, and the present invention is not limited thereto. For those skilled in the art, various modifications and improvements can be made without departing from the spirit and essence of the present invention, and these modifications and improvements are also considered to be within the scope of protection of the present invention.

Claims

1. A method for counting the flow of people in a scenic area, characterized in that, The method includes: Acquire multimodal monitoring data, which is data collected by the UAV platform within the scenic area according to a preset cruise path. The multimodal monitoring data includes visible light images, thermal imaging data, and the real-time location information of the UAV platform. The visible light image and the thermal imaging data are preprocessed, and the preprocessed visible light image and the thermal imaging data are fused in a multimodal manner according to the real-time ambient light intensity and weather conditions to generate a fused image. The fused image is input into the target YOLO model to obtain the position information of the person target in the image coordinate system; Based on the position information of the personnel target in the image coordinate system and the real-time position information of the UAV platform, the position of the personnel target is mapped to the geographic coordinate system to obtain the geographic coordinates of the personnel target; Based on the geographic coordinates of the personnel targets, the motion state of each personnel target in the geographic coordinate system is predicted, and the detection result of the current frame is matched with the existing trajectory to obtain the personnel motion trajectory carrying a unique identifier. Based on the movement trajectory of the people, spatiotemporal correlation analysis is performed to calculate the movement speed and direction of each person, and density clustering is performed on the geographical coordinate points in the movement trajectory to identify hot spots and main channels of people gathering, thereby realizing the counting of the flow of people in the scenic area.

2. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The target YOLO model is an improved YOLOv8 model that incorporates a convolutional block attention module and is trained using MixUp and Mosaic data augmentation techniques.

3. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, Before acquiring the multimodal monitoring data, the method further includes: path planning; The path planning specifically includes: Based on the scenic area's electronic map and historical pedestrian density distribution data, a genetic algorithm is used to generate a preset cruise path for the UAV platform, so that the preset cruise path maximizes the coverage frequency of historically high pedestrian density areas while meeting flight energy consumption constraints.

4. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The method also includes: overload warning; The overload warning specifically includes: The personnel density in each hotspot area, obtained from the flow direction counting of the scenic area, is compared with the preset carrying capacity threshold of the corresponding area of the scenic area. If the personnel density in any area continuously exceeds its corresponding threshold, an overload warning is triggered and diversion guidance suggestions are generated.

5. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The step of performing multimodal fusion of the preprocessed visible light image and the thermal imaging data specifically includes: Calculate the image sharpness index of the visible light image and the thermal imaging signal-to-noise ratio of the thermal imaging data, respectively. The dynamic fusion weights are determined based on the image sharpness index and the thermal imaging signal-to-noise ratio. The visible light image and the thermal imaging data are weighted and fused using the dynamic fusion weights to generate the fused image.

6. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The step of mapping the location of the personnel target to a geographic coordinate system specifically includes: Acquire the camera intrinsic parameters, shooting attitude angle, and relative flight altitude of the UAV platform at the time of data acquisition; Based on the camera intrinsic parameters, the shooting attitude angle, and the relative flight altitude, a perspective transformation model is used to establish the transformation relationship between the image coordinate system and the geographic coordinate system; The location information of the person target in the image coordinate system is converted into the geographic coordinates of the person target using the transformation relationship.

7. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The method further includes: abnormal behavior detection; The abnormal behavior detection specifically includes: Acquire the visible light image; The OpenPose algorithm is used to extract key points of the human skeleton in the visible light image; Abnormal behavior, including running or pushing, is identified based on the motion characteristics of the key points of the human skeleton; When the abnormal behavior is detected, an abnormal warning signal is generated.

8. The method for counting the flow of people in a scenic area according to claim 1, characterized in that, The method also includes: dynamic cruise adjustment; The dynamic cruise adjustment specifically includes: Based on the hotspot areas where people gather and the real-time crowd density, the preset cruise path is optimized and updated using a genetic algorithm to generate a real-time optimized path. The real-time optimized path is sent to the drone platform to control the drone platform to cruise according to the real-time optimized path.

9. The method for counting the flow of people in a scenic area according to any one of claims 1 to 8, characterized in that, Based on the geographic coordinates of the personnel targets, the motion state of each personnel target in the geographic coordinate system is predicted, and the detection results of the current frame are matched with existing trajectories to obtain the personnel motion trajectory carrying a unique identifier, specifically including: Using the Kalman filter algorithm, based on the state of the personnel's movement trajectory at the previous moment, predict the position and movement state of each personnel target in the geographic coordinate system at the current moment; Obtain the geographic coordinates of each of the detected personnel targets in the current frame; Calculate the matching cost between the geographic coordinates of each of the personnel targets in the current frame and each of the predicted locations, wherein the matching cost is measured using Euclidean distance or intersection over union (IoU); A cost matrix is constructed based on the matching cost, and the Hungarian algorithm is used to solve the cost matrix to obtain the detection-to-trajectory matching pairs that minimize the total matching cost. Based on the matching pairs, the corresponding personnel movement trajectory is updated using the geographic coordinates of the successfully matched personnel targets in the current frame; for personnel targets that are not successfully matched, a new personnel movement trajectory is initialized. The original unique identifier of the matched personnel movement trajectory is retained, and a new unique identifier is assigned to the initialized personnel movement trajectory, thereby outputting the personnel movement trajectory carrying the unique identifier.

10. A system for counting the flow of people in a scenic area, characterized in that, The system is used to execute the method for counting the flow of people in a scenic area as described in any one of claims 1 to 9. The system includes a drone platform, a data processing center, and a monitoring terminal; The drone platform is equipped with a visible light camera, a thermal imager, a positioning module, and a wireless communication module; wherein, the visible light camera is used to acquire visible light images; the thermal imager is used to acquire thermal imaging data; the positioning module is used to obtain the real-time location information of the drone platform; and the wireless communication module is used to transmit the visible light images, the thermal imaging data, and the real-time location information as multimodal monitoring data to the data processing center in real time according to a preset cruise path. The data processing center includes a multimodal fusion module, a target YOLO detection module, a geographic mapping module, a trajectory tracking module, and a flow direction analysis module connected in sequence. The multimodal fusion module receives the visible light image and the thermal imaging data, and dynamically weights and fuses them according to the current ambient light intensity and weather conditions to generate a fused image, which is then output to the target YOLO detection module. The target YOLO detection module performs personnel target detection on the fused image and outputs the position information of each personnel target in the image coordinate system to the geographic mapping module. The geographic mapping module combines the position information of the personnel targets in the image coordinate system with the real-time position information of the UAV platform, maps the location of the personnel targets to a geographic coordinate system through coordinate transformation to obtain the corresponding geographic coordinates, and inputs the geographic coordinates into the trajectory tracking module.
The trajectory tracking module uses the Kalman filter algorithm to predict the motion state of each personnel target in the geographic coordinate system based on the geographic coordinates, uses the Hungarian algorithm to match the detection results of the current frame with existing trajectories, constructs and updates personnel movement trajectories carrying unique identifiers, and transmits the personnel movement trajectories to the flow direction analysis module. The flow direction analysis module performs spatiotemporal correlation analysis on the personnel movement trajectories, performs DBSCAN clustering on the geographic coordinate points contained in the trajectories, identifies hotspot areas where people gather and main pedestrian channels, and generates personnel flow direction counting results. The monitoring terminal is communicatively connected to the data processing center and is used to receive and visualize the hotspot areas, the main pedestrian channels, and the personnel flow direction counting results.