A method for built environment spatial orientation positioning and visualization based on video images
By building a pedestrian recognition material library from surveillance video, and using OpenCV and artificial intelligence algorithms to establish a coordinate mapping and generate a heat map, the method solves the problem of crowd recognition and visualization in the built environment and achieves low-cost, efficient analysis of space use.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2022-12-07
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to efficiently and cost-effectively identify and visualize population distribution patterns and trends in built environments, especially in large public buildings. Traditional methods are inefficient and inaccurate, smartphone positioning has large errors, sensors are expensive, and data processing and visual presentation are inadequate.
By acquiring surveillance video, a pedestrian recognition material library is built. Pedestrians are detected using OpenCV and artificial intelligence algorithms. A mapping relationship between image coordinates and actual coordinates is established, and a pedestrian distribution heat map is generated and dynamically displayed.
It enables convenient and low-cost location tracking of pedestrians and dynamic visualization of thermal distribution, helping architects understand space usage patterns and pedestrian flow dynamics, and optimize spatial layout.
Figure CN115830504B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of architectural design, and in particular to an assessment and feedback mechanism for the built environment and public spaces.
Background Technology
[0002] When architects design buildings, their primary goal is to ensure that the space meets human needs, allowing people to experience happiness and fulfillment within the constructed environment. Therefore, the interaction between people and the built environment—that is, people's perception of space and their various behavioral expressions within it—becomes a crucial factor in evaluating the success or failure of a built environment. Examining the behavioral patterns of people within the built environment is also an important means for architects to explore the principles of architectural design, providing insights and guidance for future projects. Among these, the distribution and variation patterns of people within the built environment are a key entry point for studying these behavioral patterns. To obtain data on the distribution and variation patterns of people, it is first necessary to locate and identify people within the environment, conduct statistical analysis, and then present this data through visualization.
[0003] Several technologies already exist for identifying, locating, and counting people. A traditional method involves drawing a floor plan of the built environment, dividing it into grids, and visually observing and counting the number of people in each grid at a given time to produce a chart. However, this method is inefficient and inaccurate, producing static and localized data that fails to reflect the dynamic, diverse, and complex distribution of people in real space. This is particularly problematic for large public buildings with complex spatial relationships and numerous users. Later, with the widespread adoption of smartphones, a large amount of mobile phone location data containing temporal and spatial characteristics of human activity was collected and applied. While this data can record and reflect the distribution of people in the built environment, the development of mobile phone signal detection technology in recent years has rendered this method inapplicable. Other methods use sensors in conjunction with tags worn by people to monitor their location. However, this approach is costly and requires human cooperation, making it difficult to implement in large public buildings with high population density and mobility. Other systems for locating people include Wi-Fi, Bluetooth, and ultra-wideband (UWB). These systems differ in their principles, costs, and accuracy. Wi-Fi and Bluetooth systems have significant positioning errors: radio signals are blocked and reflected in indoor spaces, so signal attenuation with distance is highly uneven. For small to medium-sized buildings, this accuracy is insufficient for studying indoor environmental behavior. UWB technology offers centimeter- to decimeter-level accuracy, but its high cost and inability to penetrate building walls restrict it to smaller-scale residential and office spaces. Furthermore, how to process and visually represent the location data collected by these systems, so that architects can intuitively see the thermal distribution of crowds and its changes and carry out further exploration and analysis, remains a research topic.
[0004] Therefore, there is an urgent need for a new method for recording, analyzing, and presenting human usage patterns and behavioral trajectories in space, one that offers a universal, simple, and low-cost way to determine the location of people in the built environment and visualize their thermal distribution. To this end, the following four challenges must be addressed:
[0005] First, acquire basic imagery. Too many images will result in excessive computational workload, impacting efficiency. Conversely, too few images will fail to capture the dynamic distribution characteristics of pedestrians in space. Therefore, it is necessary to determine an appropriate number of frames for video image extraction.
[0006] Secondly, intelligent pedestrian detection in blurred images. Video footage from public spaces often has a large field of view, resulting in low accuracy in pedestrian capture. Therefore, when performing intelligent pedestrian feature detection, manual verification is necessary to further improve the accuracy of intelligent detection.
[0007] Next, spatial orientation is calculated. Determining the spatial orientation of pedestrians from the image on a plane requires establishing a mathematical model for spatial orientation calculation based on spatial coordinate mapping relationships.
[0008] Finally, thermal distribution visualization calculation. Based on image frames, pedestrian detection, and spatial coordinate mapping, a thermal distribution calculation model needs to be established according to the kernel density calculation principle.
Summary of the Invention
[0009] This invention aims to at least solve one of the technical problems existing in the prior art. To this end, this invention provides a method for spatial positioning and visualization of built environments based on video imagery, the method comprising the following implementation steps:
[0010] S1: Obtain surveillance video of the built environment from a certain perspective within a certain time period and construct a pedestrian recognition material library;
[0011] S2: Perform pedestrian recognition and detection on the pedestrian recognition material library, and obtain the image coordinates of all pedestrians in the pedestrian recognition material library;
[0012] S3: Obtain the actual floor plan of the built environment corresponding to the video, and construct the actual rectangular coordinate system of the built environment;
[0013] S4: Establish the mapping relationship between the image coordinates of the pedestrian in the surveillance video and the actual coordinates in the plan view of the built environment;
[0014] S5: Obtain the actual coordinates of all pedestrians in the pedestrian recognition material library;
[0015] S6: Based on the actual coordinates of the pedestrians, represent all pedestrians as points on the plan view of the built environment;
[0016] S7: Draw a heat map of pedestrian distribution in the built environment and display it dynamically.
[0017] Furthermore, step S1 includes the following sub-steps:
[0018] S11: Acquire the surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n×(T-1)×v≤√s, where T is the expected number of times the trajectory of the same pedestrian is included in the pedestrian recognition material library, v is the pedestrian's walking speed, and s is the area of the actual area reflected by the video.
[0019] S12: Extract frame images from the surveillance video according to the time interval, and the collection of frame images forms the pedestrian recognition material library; wherein, the method of extracting frame images is to process the video using the VideoCapture class in OpenCV, and save the extracted frame images as image files or directly use them as input values for the next step of pedestrian detection.
[0020] Alternatively, step S1 includes the following sub-steps:
[0021] S11a: Acquire surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n = deltaL / v, where deltaL is the expected distance of the change in the landing point position of the same pedestrian's trajectory in adjacent frames, and v is the pedestrian's walking speed.
[0022] S12: Extract frame images from the surveillance video according to the time interval, and the collection of frame images forms the pedestrian recognition material library; wherein, the method of extracting frame images is to process the video using the VideoCapture class in OpenCV, and save the extracted frame images as image files or directly use them as input values for the next step of pedestrian detection.
[0023] Furthermore, step S2 includes the following sub-steps:
[0024] S21: Construct the image pixel coordinate system in the frame image;
[0025] S22: Use a pedestrian detection model to detect people in the frame image and outline them with a bounding box. Take the midpoint of the bottom edge of the bounding box as the point where the pedestrian's feet are located, and obtain the image coordinates of this point in the image pixel coordinate system, thereby obtaining the image coordinates (xm, ym) of the pedestrian based on the image pixel coordinate system.
[0026] S23: Repeat step S22 above until the image coordinates of pedestrians in all images in the pedestrian recognition material library are obtained;
[0027] S24: Check the detection results of the pedestrian detection model to determine if there are any omissions in pedestrian recognition. If there are pedestrians in some frames that have not been recognized, manually record and supplement them, and feed them back to the pedestrian detection model for further learning and algorithm optimization. If there are no omissions, you can directly proceed to the next step.
[0028] Furthermore, the pedestrian detection model's algorithm is based on the Region Proposal R-CNN family of algorithms or the YOLO and SSD algorithms; the R-CNN family of algorithms includes the R-CNN algorithm, the Fast R-CNN algorithm, or the Faster R-CNN algorithm. The algorithm first generates pedestrian target candidate boxes, and then classifies and regresses the pedestrian target candidate boxes; the YOLO and SSD algorithms use a convolutional neural network (CNN) to directly predict the category and location of different pedestrian targets.
[0029] Furthermore, step S3 includes the following sub-steps:
[0030] S31: Determine an origin point on the floor plan of the built environment reflected in the surveillance video, and establish an actual plane rectangular coordinate system by selecting a certain unit of size;
[0031] S32: If there are surveillance videos from multiple adjacent locations, a unified origin must be selected, and a certain size unit must be selected to establish a single actual plane rectangular coordinate system. The plan views reflected in the other surveillance videos must be mapped to this coordinate system, and there is no need to establish a separate coordinate system for each.
[0032] Furthermore, step S4 includes the following sub-steps:
[0033] S41: Find an image with minimal pedestrian occlusion in the pedestrian recognition material library, select at least four easily identifiable spatial reference points on the same horizontal plane in the image such that no three of them are collinear, and record the image coordinates (xi, yi) of all spatial reference points based on the image pixel coordinate system in sequence.
[0034] S42: Locate the planar position points corresponding to the spatial reference points marked in the pedestrian recognition material library, and record the actual coordinates (xj, yj) of all planar position points in the actual plane rectangular coordinate system in sequence;
[0035] S43: Using the image coordinates and actual coordinates of the four or more sets of spatial reference points obtained in steps S41 and S42 as known conditions, the mapping relationship between the image coordinates and actual coordinates of the built environment is obtained by solving the homography matrix H.
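The requirement in S41 that no three reference points be collinear can be checked before solving for H. The following is a minimal sketch, not part of the patent text: the function name `no_three_collinear`, the tolerance `tol`, and the sample pixel coordinates are all hypothetical.

```python
from itertools import combinations


def no_three_collinear(points, tol=1e-9):
    """Return True if no three of the given (x, y) points lie on one line."""
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        # Twice the signed area of the triangle; zero means the points are collinear.
        area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
        if abs(area2) <= tol:
            return False
    return True


# Hypothetical pixel coordinates of four floor-level reference points.
good = [(120, 400), (860, 410), (700, 220), (240, 215)]
# Hypothetical set in which the first three points lie on the line y = x.
bad = [(0, 0), (10, 10), (20, 20), (5, 30)]

print(no_three_collinear(good), no_three_collinear(bad))
```

With real pixel data a larger tolerance may be appropriate, since points that are nearly collinear also make the homography poorly conditioned.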
[0036] Furthermore, the specific solution process for the homography matrix H in step S43 is as follows:
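[0037] S431: Let the image coordinates of the spatial reference point be q2 and the actual coordinates be q1, both in homogeneous form. The image coordinates and actual coordinates of the spatial reference point have the following correspondence, up to a non-zero scale factor $\lambda$: $\lambda q_1 = H q_2$, where H represents the homography matrix;
[0038] S432: Let the coordinates of q1 be (u1, v1, 1) and the coordinates of q2 be (u2, v2, 1). Expanding the expression in S431, we get:
[0039] $$\lambda \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{bmatrix} \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix}, \qquad u_1 = \frac{H_{11} u_2 + H_{12} v_2 + H_{13}}{H_{31} u_2 + H_{32} v_2 + H_{33}}, \quad v_1 = \frac{H_{21} u_2 + H_{22} v_2 + H_{23}}{H_{31} u_2 + H_{32} v_2 + H_{33}}$$
[0040] S433: By normalizing using H33=1, we can obtain:
[0041] $$H_{11} u_2 + H_{12} v_2 + H_{13} - H_{31} u_2 u_1 - H_{32} v_2 u_1 = u_1, \qquad H_{21} u_2 + H_{22} v_2 + H_{23} - H_{31} u_2 v_1 - H_{32} v_2 v_1 = v_1$$
[0042] S434: Eight equations are provided using the image coordinates and actual coordinates of the four spatial reference points (k = 1, ..., 4):
[0043] $$\begin{bmatrix} u_2^{(k)} & v_2^{(k)} & 1 & 0 & 0 & 0 & -u_2^{(k)} u_1^{(k)} & -v_2^{(k)} u_1^{(k)} \\ 0 & 0 & 0 & u_2^{(k)} & v_2^{(k)} & 1 & -u_2^{(k)} v_1^{(k)} & -v_2^{(k)} v_1^{(k)} \end{bmatrix} \begin{bmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \end{bmatrix} = \begin{bmatrix} u_1^{(k)} \\ v_1^{(k)} \end{bmatrix}, \qquad k = 1, \dots, 4$$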
[0044] S435: Solve the system of linear equations in S434 to obtain the homography matrix H.
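Steps S431 to S435 can be illustrated with a short sketch. It assumes that H maps image coordinates to actual coordinates (as step S51 implies), that H33 = 1, and that exactly four reference points are used, so the system is 8×8. The function names `solve_linear` and `solve_homography` and the sample coordinates are hypothetical, and the plain Gaussian elimination stands in for whatever solver an implementation would actually use.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: are three reference points collinear?")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def solve_homography(image_pts, actual_pts):
    """Return the 3x3 matrix H (with H33 = 1) mapping image points to actual points."""
    if len(image_pts) != 4 or len(actual_pts) != 4:
        raise ValueError("this sketch handles exactly four point pairs")
    A, b = [], []
    for (u2, v2), (u1, v1) in zip(image_pts, actual_pts):
        # Two equations per point pair, as in S433.
        A.append([u2, v2, 1, 0, 0, 0, -u2 * u1, -v2 * u1])
        b.append(u1)
        A.append([0, 0, 0, u2, v2, 1, -u2 * v1, -v2 * v1])
        b.append(v1)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


# Hypothetical reference points: pixel coordinates and plan coordinates in meters.
image_pts = [(120, 400), (860, 410), (700, 220), (240, 215)]
actual_pts = [(0, 0), (10, 0), (10, 8), (0, 8)]
H = solve_homography(image_pts, actual_pts)
for row in H:
    print(row)
```

When more than four reference points are available, the system is overdetermined and a least-squares estimate is the usual choice; OpenCV's `cv2.findHomography` is one existing routine for that case.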
[0045] Furthermore, step S5 includes the following sub-steps:
[0046] S51: Substitute the pedestrian image coordinates obtained in step S2 into the homography matrix H obtained in step S4 to obtain the actual coordinates of all pedestrians detected in the pedestrian recognition material library.
[0047] S52: Output the actual coordinates (xn, yn) of all pedestrians and generate a text document containing all (xn, yn) coordinate information.
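Steps S51 and S52 amount to a projective transform followed by a text export. The sketch below assumes H maps image coordinates to actual coordinates and writes one `(xn, yn)` pair per line; the function names, the three-decimal output format, and the sample matrix are hypothetical.

```python
import io


def image_to_actual(H, xm, ym):
    """Map image coordinates (xm, ym) to actual coordinates (xn, yn) using H."""
    w = H[2][0] * xm + H[2][1] * ym + H[2][2]  # homogeneous scale factor
    xn = (H[0][0] * xm + H[0][1] * ym + H[0][2]) / w
    yn = (H[1][0] * xm + H[1][1] * ym + H[1][2]) / w
    return xn, yn


def write_coordinates(H, image_points, out):
    """Write the actual coordinates of all pedestrians, one '(xn, yn)' per line."""
    for xm, ym in image_points:
        xn, yn = image_to_actual(H, xm, ym)
        out.write(f"({xn:.3f}, {yn:.3f})\n")


# Hypothetical homography: a pure scaling of 0.01 plan units per pixel.
H_demo = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
buffer = io.StringIO()  # stands in for the text document of S52
write_coordinates(H_demo, [(100, 200), (350, 425)], buffer)
print(buffer.getvalue())
```

Passing an open file object instead of `io.StringIO()` would write the same lines to disk.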
[0048] Furthermore, step S6 includes the following sub-steps:
[0049] S61: Import the CAD file of the plan view of the built environment into ArcGIS, and adjust the coordinates of the plan view in ArcGIS according to the actual Cartesian coordinate system established in step S32 to ensure that the coordinates in ArcGIS are consistent with the established actual Cartesian coordinate system.
[0050] S62: Combine the text files obtained in step S5 into a table, modify the coordinates in the form of (xn, yn) into two columns of xm and ym by splitting the data into columns, indicate the horizontal (x) and vertical (y) coordinates in the table header, and save it as a csv or xls file;
[0051] S63: Import the table into ArcGIS, use the xy-to-line tool to convert the x field in the table to the x-axis and the y field to the y-axis, and generate point features at the corresponding positions on the plan view. Each point represents the position of a pedestrian in the pedestrian recognition material library on the plan view.
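The table preparation in step S62 can also be scripted rather than done by hand. The sketch below assumes each line of the text document contains one or more pairs written as `(x, y)`; the function name `text_to_csv`, the regular expression, and the sample lines are hypothetical, and the header names `x` and `y` follow the description in S62.

```python
import csv
import io
import re

# Matches "(x, y)" with optional sign and decimals; an assumed text layout.
PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def text_to_csv(text_lines, out):
    """Split '(x, y)' pairs into two columns and write them as CSV with a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x", "y"])
    count = 0
    for line in text_lines:
        for x, y in PAIR.findall(line):
            writer.writerow([x, y])
            count += 1
    return count


# Hypothetical content of the text document produced in step S5.
lines = ["(1.000, 2.000)", "(-3.5, 4)"]
table = io.StringIO()
rows = text_to_csv(lines, table)
print(rows)
print(table.getvalue())
```

The resulting file has one row per pedestrian position and can be imported as a table in the same way as a manually prepared csv file.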
[0052] Furthermore, step S7 includes the following sub-steps:
[0053] S71: The kernel function is used to calculate the value of each unit area based on the points representing the pedestrian positions to fit each point to a smooth conical surface, and the calculation range is set to the range of the planar map to calculate the kernel density value.
[0054] S72: Generate a kernel density map with varying color intensity based on the kernel density values calculated in step S71, thereby forming a pedestrian distribution heat map. The darker the color, the higher the pedestrian density within that spatial range.
[0055] S73: Dynamically display pedestrian distribution heatmaps to achieve better visual effects.
[0056] Furthermore, the method for calculating the kernel density value in step S71 is as follows:
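[0057] $$\text{Density}(x, y) = \frac{1}{\text{radius}^2} \sum_{i=1}^{n} \left[ \frac{3}{\pi} \cdot pop_i \left( 1 - \left( \frac{dist_i}{\text{radius}} \right)^2 \right)^2 \right]$$
[0058] for $dist_i < \text{radius}$,
[0059] where:
[0060] i = 1, ..., n are the input points; only points within the radius of the position (x, y) are included in the sum;
[0061] $pop_i$ is the value of the population field at point i, and is an optional parameter;
[0062] $dist_i$ is the distance between point i and the position (x, y).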
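The kernel density calculation of step S71 can be sketched outside ArcGIS as follows. The sketch assumes the quartic kernel used by the ArcGIS Kernel Density tool, a population value of 1 for every point when none is given, and evaluation at the center of each grid cell; the function names and sample points are hypothetical.

```python
import math


def kernel_density(x, y, points, radius, pops=None):
    """Kernel density at (x, y) from points within `radius`, using a quartic kernel."""
    if pops is None:
        pops = [1.0] * len(points)  # population field is optional
    total = 0.0
    for (px, py), pop in zip(points, pops):
        dist = math.hypot(px - x, py - y)
        if dist < radius:  # only points inside the search radius contribute
            total += (3.0 / math.pi) * pop * (1.0 - (dist / radius) ** 2) ** 2
    return total / radius ** 2


def density_grid(points, radius, nx, ny, cell):
    """Evaluate the density at the center of each cell of an nx-by-ny grid."""
    return [
        [kernel_density((i + 0.5) * cell, (j + 0.5) * cell, points, radius)
         for i in range(nx)]
        for j in range(ny)
    ]


# Hypothetical pedestrian positions in plan coordinates (meters).
pedestrians = [(1.0, 1.0), (1.5, 1.2), (3.0, 2.5)]
grid = density_grid(pedestrians, radius=2.0, nx=4, ny=3, cell=1.0)
for row in grid:
    print([round(value, 3) for value in row])
```

Each grid value can then be mapped to a color, with higher values drawn darker, to obtain the heat map described in step S72.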
[0063] The present invention can achieve the following technical effects:
[0064] 1. By using artificial intelligence algorithms to analyze the images captured by surveillance video frame by frame and identify pedestrians, they are marked as rectangles. Then, the midpoint of the bottom edge of the pedestrian rectangle is used to represent the coordinates of the pedestrian, thus realizing the localization of pedestrians in a simple, low-cost and easy-to-implement way.
[0065] 2. By establishing a mapping relationship between the image pixel coordinates and Cartesian coordinates of spatial reference points, pedestrian information appearing in videos can be transformed into pedestrian heat maps on floor plans and dynamically displayed. Through the overlay of architectural floor plans and heat maps, information such as where pedestrians congregate, which spaces are used most frequently, pedestrian flow, and tidal changes can be intuitively displayed. This helps urban designers, planners, and architects better discover the interaction patterns between space and user behavior, assess the usage characteristics of public spaces, especially densely populated areas, and provide insights for optimizing spatial layout and improving site design.
Attached Figure Description
[0066] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
[0067] Figure 1 is a schematic diagram of a frame image extracted in step S1, at a certain moment, from a surveillance video taken by a camera from a certain angle in a certain built environment.
[0068] Figure 2 is a schematic diagram of the frame image for which the image pixel coordinate system has been constructed in step S21.
[0069] Figure 3 is a schematic diagram of detecting pedestrians in the frame image and obtaining image coordinates in step S22.
[0070] Figure 4 is a schematic diagram of step S31, in which an origin is determined on the plan view of the built environment reflected in the surveillance video, and a certain size unit is selected to establish an actual plane rectangular coordinate system.
[0071] Figure 5 is a schematic diagram of step S32, in which, for surveillance videos of multiple adjacent locations, a unified origin is selected, a certain size unit is selected to establish an actual plane rectangular coordinate system, and the plan views reflected in the other surveillance videos are mapped to this coordinate system.
[0072] Figure 6 is a schematic diagram of step S41, in which an image with minimal pedestrian occlusion is found in the pedestrian recognition material library, at least four easily identifiable spatial reference points are selected on the same horizontal plane in the image, and the image coordinates (xi, yi) of all spatial reference points based on the image pixel coordinate system are recorded in sequence.
[0073] Figure 7 is a schematic diagram of step S42, in which the planar position points corresponding to the spatial reference points marked in the pedestrian recognition material library are found, and the actual coordinates (xj, yj) of all planar position points in the actual plane rectangular coordinate system are recorded in sequence.
[0074] Figure 8 is a schematic diagram of step S52, which outputs the actual coordinates (xn, yn) of all pedestrians and generates a text document containing all (xn, yn) coordinate information.
[0075] Figure 9 is a schematic diagram of step S62, in which the text files obtained in step S5 are combined into a table, the coordinates in the form of (xn, yn) are modified into two columns, xm and ym, by splitting the data into columns, the horizontal (x) and vertical (y) coordinates are marked in the table header, and the table is saved as a csv or xls file.
[0076] Figure 10 is a schematic diagram of step S63, in which the table is imported into ArcGIS and the xy-to-line tool is used, with the x field in the table as the horizontal axis and the y field as the vertical axis, to generate point features at the corresponding positions on the plan view, each point representing the position of a pedestrian in the pedestrian recognition material library on the plan view.
[0077] Figure 11 is a schematic diagram illustrating the dynamic display of the pedestrian distribution heat map in step S73.
[0078] Figure 12 is a flowchart of a method for spatial positioning and visualization of built environment based on video images, as disclosed in an embodiment of the present invention.
Detailed Implementation
[0079] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0080] In the description of this invention, the terms "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and do not require that this invention must be constructed and operated in a specific orientation. Therefore, they should not be construed as limiting this invention.
[0081] Furthermore, in the description of this invention, unless otherwise specified and limited, it should be noted that the terms "installation", "connection" and "linking" should be interpreted broadly. For example, they can refer to mechanical or electrical connections, or internal connections between two components. They can be direct connections or indirect connections through an intermediate medium. Those skilled in the art can understand the specific meaning of the above terms according to the specific circumstances.
[0082] Furthermore, in any of the descriptions of the methods described below, any process or method described in the flowcharts or otherwise herein may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order according to the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.
[0083] It should be understood that various parts of the present invention can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system.
[0084] With reference to Figure 12, this embodiment discloses a method for spatial positioning and visualization of built environment based on video imagery, the method comprising the following implementation steps:
[0085] S1: Obtain surveillance video of the built environment from a certain perspective within a certain time period and construct a pedestrian recognition material library;
[0086] S2: Perform pedestrian recognition and detection on the pedestrian recognition material library, and obtain the image coordinates of all pedestrians in the pedestrian recognition material library;
[0087] S3: Obtain the actual floor plan of the built environment corresponding to the video, and construct the actual rectangular coordinate system of the built environment;
[0088] S4: Establish the mapping relationship between the image coordinates of the pedestrian in the surveillance video and the actual coordinates in the plan view of the built environment;
[0089] S5: Obtain the actual coordinates of all pedestrians in the pedestrian recognition material library;
[0090] S6: Based on the actual coordinates of the pedestrians, represent all pedestrians as points on the plan view of the built environment;
[0091] S7: Draw a heat map of pedestrian distribution in the built environment and display it dynamically.
[0092] The specific implementation steps of this method can be understood with reference to the flowchart shown in Figure 12.
[0093] Specifically, step S1 is implemented in the following manner:
[0094] S11: Acquire the surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n×(T-1)×v≤√s, where T is the expected number of times the trajectory of the same pedestrian is included in the pedestrian recognition material library, v is the pedestrian's walking speed, and s is the area of the actual area reflected by the video.
[0095] S12: Extract the frame image of the moment to be identified from the surveillance video or extract the frame image at regular intervals within the time segment to be identified. The collection of the frame images forms the pedestrian recognition material library. Specifically, the method of extracting the frame image is to process the video using the VideoCapture class in OpenCV, and save the extracted frame image as an image file or use it directly as the input value for the next step of pedestrian detection.
[0096] This embodiment is more suitable for situations where the actual area reflected in the video is large. For example, suppose that in step S11 the actual area s reflected in the video is about 900 square meters, the average walking speed v of a pedestrian is about 1.2 meters / second, and the trajectory of the same pedestrian is expected to be included in the pedestrian recognition material library at least 5 times. According to the constraint n×(5-1)×1.2≤√900=30, n≤6.25 can be calculated. Rounding down gives n=6, so the frame image extraction time interval n can be determined to be 6 seconds.
[0097] In another embodiment, step S1 can also be implemented in the following way:
[0098] S11a: Acquire surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n = deltaL / v, where deltaL is the expected distance of the change in the landing point position of the same pedestrian's trajectory in adjacent frames, and v is the pedestrian's walking speed.
[0099] S12: Extract frame images from the surveillance video according to the time interval, and the collection of frame images forms the pedestrian recognition material library; wherein, the method of extracting frame images is to process the video using the VideoCapture class in OpenCV, and save the extracted frame images as image files or directly use them as input values for the next step of pedestrian detection.
[0100] This embodiment is more suitable for situations where the actual area reflected in the video is relatively small. Specifically, in step S11a, for example, if a pedestrian's walking speed is approximately 1-1.5 meters per second, the expected distance deltaL for the change in the landing point position of the same pedestrian's trajectory in adjacent frames should be 2-3 meters. This allows for clear identification of the pedestrian's physical spatial behavior through the monitoring video. Therefore, by using the constraint n = deltaL / v, n can be calculated to be 2 seconds, and the frame image extraction time interval n can be determined to be 2 seconds. It is understandable that if the actual area reflected in the video is larger, the value of deltaL can be set accordingly larger, for example, around 6 meters. Then, the value of n calculated according to n = deltaL / v would be approximately 6 seconds, which would be close to the value calculated in the previous embodiment. Therefore, both calculation methods can achieve the purpose of calculating the frame image extraction time interval n, and different preset values can be set according to specific circumstances.
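The two ways of choosing the frame extraction interval described in the embodiments above can be compared with a short calculation. This is a minimal sketch; the function names are hypothetical, and the input values are the example figures given in the text.

```python
import math


def interval_from_area(T, v, s):
    """Largest interval n (seconds) satisfying n * (T - 1) * v <= sqrt(s)."""
    return math.sqrt(s) / ((T - 1) * v)


def interval_from_step(delta_l, v):
    """Interval n (seconds) from n = deltaL / v."""
    return delta_l / v


# Large area: s = 900 square meters, v = 1.2 m/s, T = 5 appearances.
n_max = interval_from_area(T=5, v=1.2, s=900)
n_large = math.floor(n_max)  # round down to whole seconds

# Small area: deltaL = 3 m at v = 1.5 m/s.
n_small = interval_from_step(delta_l=3.0, v=1.5)

print(n_max, n_large, n_small)
```

Rounding down rather than to the nearest second keeps the constraint n×(T-1)×v≤√s satisfied.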
[0101] As shown in Figure 1, this image is a frame extracted from a surveillance video taken by a camera from a specific angle within a built environment, capturing a snapshot at a particular moment. This extraction can be repeated periodically, for example, every 2 seconds. The extracted frames collectively form the pedestrian recognition material library. It should be noted that images within the same pedestrian recognition material library should be taken by a camera from the same angle, so that the subsequent coordinate transformation gives consistent results.
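Periodic frame extraction with the OpenCV `VideoCapture` class might look like the sketch below. The function names, the output file pattern, and the choice to seek by frame index are assumptions; OpenCV (`cv2`) is imported inside `extract_frames` so that the index helper can be used without it.

```python
def frame_indices(fps, frame_count, interval_s):
    """Indices of the frames to extract, one every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))  # at least one frame per step
    return list(range(0, frame_count, step))


def extract_frames(video_path, interval_s, out_pattern="frame_{:06d}.jpg"):
    """Save one frame every `interval_s` seconds and return the file names."""
    import cv2  # requires the opencv-python package

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    saved = []
    for index in frame_indices(fps, frame_count, interval_s):
        cap.set(cv2.CAP_PROP_POS_FRAMES, index)  # seek to the wanted frame
        ok, frame = cap.read()
        if not ok:
            break
        name = out_pattern.format(index)
        cv2.imwrite(name, frame)
        saved.append(name)
    cap.release()
    return saved


# A 25 fps clip with 151 frames, sampled every 2 seconds.
print(frame_indices(25, 151, 2))
```

A call such as `extract_frames("camera01.mp4", 2)` (hypothetical file name) would write the selected frames as image files; the frames could equally be passed straight to the detection step instead of being saved.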
[0102] Specifically, step S2 above includes the following sub-steps:
[0103] S21: Construct the image pixel coordinate system in the frame image. For example, the top-left corner of the frame image can be taken as the origin, with each pixel as one unit; the x-axis runs horizontally to the right, with pixel values increasing toward the right, and the y-axis runs vertically downward, with pixel values increasing toward the bottom.
[0104] S22: Use a pedestrian detection model to detect people in the frame image and outline them with a bounding box. Take the midpoint of the bottom edge of the bounding box as the point where the pedestrian's feet are located, and obtain the image coordinates of this point in the image pixel coordinate system, thereby obtaining the image coordinates (xm, ym) of the pedestrian based on the image pixel coordinate system.
[0105] S23: Repeat step S22 above until the image coordinates of pedestrians in all images in the pedestrian recognition material library are obtained;
[0106] S24: Check the detection results of the pedestrian detection model to determine if there are any omissions in pedestrian recognition. If there are pedestrians in some frames that have not been recognized, manually record and supplement them, and feed them back to the pedestrian detection model for further learning and algorithm optimization. If there are no omissions, you can directly proceed to the next step.
[0107] As shown in Figure 2, an image pixel coordinate system has been constructed in the frame image: the upper-left corner of the frame image is the origin, each pixel is one unit, the x-axis runs horizontally to the right with pixel values increasing toward the right, and the y-axis runs vertically downward with pixel values increasing toward the bottom.
[0108] As shown in Figure 3, people in this frame image have been detected by the pedestrian detection model and enclosed in bounding boxes. The midpoint of the bottom edge of each bounding box is taken as the location of the pedestrian's feet, and the image coordinates of this point in the image pixel coordinate system are obtained. Thus, the image coordinates (xm, ym) of all pedestrians in this frame image based on the image pixel coordinate system can be obtained. A total of 3 pedestrians were detected in this image, and their image coordinates (xm1, ym1), (xm2, ym2), and (xm3, ym3) were obtained respectively.
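The foot point rule can be written down in a few lines. The sketch assumes each bounding box is given as (x_min, y_min, x_max, y_max) in the image pixel coordinate system with the y-axis pointing downward, so the bottom edge is at y_max; the function name and the three sample boxes are hypothetical.

```python
def foot_point(box):
    """Midpoint of the bottom edge of a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    # y grows downward in pixel coordinates, so the bottom edge is at y_max.
    return ((x_min + x_max) / 2.0, float(y_max))


# Hypothetical bounding boxes for three detected pedestrians.
boxes = [(100, 50, 140, 170), (300, 80, 330, 160), (512, 40, 560, 200)]
image_coords = [foot_point(box) for box in boxes]
print(image_coords)
```

The detector's own box format should be checked first, since some models return (x, y, width, height) instead of corner coordinates.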
[0109] Specifically, the pedestrian detection model described above uses either a region-proposal-based algorithm of the R-CNN family or a single-stage algorithm such as YOLO or SSD. The region-proposal algorithms (R-CNN, Fast R-CNN, and Faster R-CNN) first generate candidate bounding boxes for target locations and then classify and regress these boxes. Algorithms such as YOLO and SSD use a single convolutional neural network (CNN) to predict the category and location of targets directly. Pedestrian detection is an application of object detection algorithms; the specific implementation of these algorithms belongs to the existing technology and is not elaborated here.
[0110] Specifically, step S3 above includes the following sub-steps:
[0111] S31: On the floor plan of the built environment reflected in the surveillance video, determine an origin point and establish an actual plane Cartesian coordinate system using a specific unit of measurement; the result is shown in Figure 3.
[0112] S32: If there are videos from multiple adjacent locations, select a unified origin and establish a single Cartesian coordinate system using a specific unit of measurement. The floor plans reflected in the other videos are mapped into this coordinate system; no separate coordinate system is required. The result is shown in Figure 4.
[0113] Specifically, step S4 above includes the following sub-steps:
[0114] S41: Find an image with minimal pedestrian occlusion in the pedestrian recognition material library, select at least four easily identifiable spatial reference points on the same horizontal plane in the image, and ensure that no three of the four spatial reference points are collinear, and record the image coordinates (xi, yi) of all spatial reference points based on the image pixel coordinate system in sequence; the reference points can be selected from locations such as wall corners, bottom of doors, road corners, bottom of streetlights, etc.
[0115] S42: Find the planar position points corresponding to the spatial reference points marked in the pedestrian recognition material library, and record the actual coordinates (xj, yj) of all planar position points in the actual plane rectangular coordinate system in sequence;
[0116] S43: Using the image coordinates and actual coordinates of the four or more sets of spatial reference points obtained in step S42 as known conditions, the mapping relationship between the image coordinates and actual coordinates of the built environment is obtained by solving the homography matrix H.
[0117] As shown in Figure 6, in step S41 four easily identifiable spatial reference points on the same horizontal plane are selected in the image, such that no three of the four points are collinear. The image coordinates (xi1, yi1), (xi2, yi2), (xi3, yi3), and (xi4, yi4) of all spatial reference points in the image pixel coordinate system are recorded in sequence.
[0118] As shown in Figure 7, in step S42 the planar position points corresponding to the spatial reference points marked in the pedestrian recognition material library are found, and the actual coordinates (xj1, yj1), (xj2, yj2), (xj3, yj3), and (xj4, yj4) of all planar position points in the actual plane rectangular coordinate system are recorded in sequence. In this way, the image coordinates (xi, yi) and the actual coordinates (xj, yj) of four or more sets of spatial reference points are obtained.
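The requirement in step S41 that no three reference points be collinear can be checked before solving for H. The sketch below is an illustration only; the function name `no_three_collinear` and the absolute tolerance `tol` are assumptions, not part of the specification, and for real pixel data a looser tolerance may be more appropriate.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def no_three_collinear(points: Sequence[Point], tol: float = 1e-9) -> bool:
    """Return True if no three of the given points lie on one straight line."""
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        # Twice the signed area of the triangle; (near) zero means collinear.
        area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
        if abs(area2) <= tol:
            return False
    return True


# Hypothetical reference points: corners of a quadrilateral pass the check,
# while a set containing three points on the line y = x fails it.
ok = no_three_collinear([(0, 0), (1, 0), (1, 1), (0, 1)])
bad = no_three_collinear([(0, 0), (1, 1), (2, 2), (0, 3)])
```

Running the same check on both the image coordinates and the actual coordinates helps catch a degenerate selection on either plane.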
[0119] Subsequently, in step S43, the image coordinates and actual coordinates of the four or more sets of spatial reference points obtained in step S42 are used as known conditions to solve for the homography matrix H. The homography matrix H relates the 2D homogeneous coordinates of the same spatial point on two planes, here the image plane and the floor plan. In practice, the `findHomography` function in OpenCV can be used to solve for the matrix: passing the image coordinates and actual coordinates of the four or more sets of spatial reference points from S42 as parameters yields the mapping relationship between image coordinates and actual coordinates at that viewpoint. That is, the actual coordinates in the actual Cartesian coordinate system can be determined for every point in the pedestrian recognition material library that lies on this horizontal plane.
[0120] The specific solution process for the homography matrix H in step S43 is as follows:
[0121] S431: Let the image coordinates of the spatial reference point be q2 and the actual coordinates be q1. The image coordinates and actual coordinates of the spatial reference point have the following coordinate correspondence: Where H represents the homography matrix;
[0122] S432: Let the coordinates of q1 be (u1, v1, 1) and the coordinates of q2 be (u2, v2, 1). Expanding the expression in S431, we get:
[0123]
[0124] S433: By normalizing using H33=1, we can obtain:
[0125]
[0126] S434: Eight equations are provided using the image coordinates and actual coordinates of the four spatial reference points:
[0127]
[0128] S435: Solve the system of linear equations in S434 to obtain the homography matrix H.
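The equations of steps S431 to S434 are not reproduced in this text. The following is a sketch of the standard derivation they appear to follow, under the assumption that H maps image coordinates q2 to actual coordinates q1 up to scale (consistent with step S51, where image coordinates are substituted into H). The entry names H11 … H33 and the point index k are notation chosen for this sketch.

```latex
% S431 (assumed direction): actual coordinates q_1 from image coordinates q_2, up to scale
\[
q_1 \simeq H q_2, \qquad
\begin{pmatrix} u_1 \\ v_1 \\ 1 \end{pmatrix} \simeq
\begin{pmatrix}
H_{11} & H_{12} & H_{13} \\
H_{21} & H_{22} & H_{23} \\
H_{31} & H_{32} & H_{33}
\end{pmatrix}
\begin{pmatrix} u_2 \\ v_2 \\ 1 \end{pmatrix}
\]

% S432: expanding and eliminating the scale factor
\[
u_1 = \frac{H_{11} u_2 + H_{12} v_2 + H_{13}}{H_{31} u_2 + H_{32} v_2 + H_{33}}, \qquad
v_1 = \frac{H_{21} u_2 + H_{22} v_2 + H_{23}}{H_{31} u_2 + H_{32} v_2 + H_{33}}
\]

% S433: normalizing with H_{33} = 1 gives two linear equations per point
\[
\begin{aligned}
H_{11} u_2 + H_{12} v_2 + H_{13} - H_{31} u_2 u_1 - H_{32} v_2 u_1 &= u_1 \\
H_{21} u_2 + H_{22} v_2 + H_{23} - H_{31} u_2 v_1 - H_{32} v_2 v_1 &= v_1
\end{aligned}
\]

% S434: writing these for the four reference points k = 1, ..., 4
% yields eight linear equations in the eight unknowns H_{11}, ..., H_{32}
\[
\begin{aligned}
H_{11} u_2^{(k)} + H_{12} v_2^{(k)} + H_{13} - H_{31} u_2^{(k)} u_1^{(k)} - H_{32} v_2^{(k)} u_1^{(k)} &= u_1^{(k)} \\
H_{21} u_2^{(k)} + H_{22} v_2^{(k)} + H_{23} - H_{31} u_2^{(k)} v_1^{(k)} - H_{32} v_2^{(k)} v_1^{(k)} &= v_1^{(k)}
\end{aligned}
\qquad k = 1, \dots, 4
\]
```

With four reference points in general position (no three collinear), this 8 × 8 system has a unique solution, which is the matrix obtained in S435.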
[0129] Specifically, step S5 in the method disclosed in this invention includes the following sub-steps:
[0130] S51: Substitute the pedestrian image coordinates obtained in step S2 into the homography matrix H obtained in step S4 to obtain the actual coordinates of all pedestrians detected in the pedestrian recognition material library.
[0131] S52: Output the actual coordinates (xn, yn) of all pedestrians and generate a text document containing the coordinate information of all (xn, yn). The text document is as follows: Figure 8 As shown.
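Steps S43, S51, and S52 can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the invention: H is assumed to map image coordinates to actual coordinates with H33 = 1, exactly four reference points are used, the linear system is solved by ordinary Gaussian elimination instead of OpenCV, and all coordinates and the `(x, y)` line format are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate reference points (singular system)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4(image_pts: Sequence[Point], actual_pts: Sequence[Point]) -> Matrix:
    """Build the eight linear equations (H33 = 1) and solve them for H."""
    a: Matrix = []
    b: List[float] = []
    for (u2, v2), (u1, v1) in zip(image_pts, actual_pts):
        a.append([u2, v2, 1.0, 0.0, 0.0, 0.0, -u2 * u1, -v2 * u1])
        b.append(u1)
        a.append([0.0, 0.0, 0.0, u2, v2, 1.0, -u2 * v1, -v2 * v1])
        b.append(v1)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_actual(h: Matrix, pt: Point) -> Point:
    """Map one image point to actual plan coordinates through H."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical reference points: a trapezoid in the image, a rectangle on the plan.
image_pts = [(100.0, 400.0), (540.0, 400.0), (440.0, 200.0), (200.0, 200.0)]
actual_pts = [(0.0, 0.0), (8.0, 0.0), (8.0, 12.0), (0.0, 12.0)]
H = homography_from_4(image_pts, actual_pts)

# Hypothetical pedestrian foot points (xm, ym) from step S2.
pedestrian_feet = [(320.0, 400.0), (250.0, 300.0)]
actual = [to_actual(H, p) for p in pedestrian_feet]

# S52: one "(xn, yn)" line per pedestrian; the line format is an assumption.
document = "\n".join(f"({x:.2f}, {y:.2f})" for x, y in actual)
```

When more than four reference points are available, a least-squares or robust solver is the usual choice, which is the role `findHomography` plays in the OpenCV workflow described above.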
[0132] Specifically, step S6 in the method disclosed in this invention includes the following sub-steps:
[0133] S61: Import the CAD file of the plan view of the built environment into ArcGIS, and adjust the coordinates of the plan view in ArcGIS according to the actual Cartesian coordinate system established in step S32 to ensure that the coordinates in ArcGIS are consistent with the established actual Cartesian coordinate system.
[0134] S62: Combine the text files obtained in step S5 into a table, split each coordinate pair of the form (xn, yn) into two columns, label the horizontal (x) and vertical (y) coordinates in the table header, and save the table as a csv or xls file;
[0135] S63: Import the table into ArcGIS and use the xy-to-line tool, setting the x field as the x-axis and the y field as the y-axis (as shown in Figure 9), to generate point features at the corresponding locations on the plan view. Each point represents the position of a pedestrian in the pedestrian recognition material library on that plan view (as shown in Figure 10).
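The table preparation in step S62 can be sketched with the Python standard library. The input line format `(x, y)` is an assumption, since the layout of the text document in Figure 8 is not reproduced here; the function names and the column headers `x` and `y` are likewise illustrative.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Matches one "(x, y)" pair with optional sign and decimals (assumed format).
PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_pairs(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Extract every (xn, yn) pair found in the given text lines."""
    pairs: List[Tuple[float, float]] = []
    for line in lines:
        for xs, ys in PAIR.findall(line):
            pairs.append((float(xs), float(ys)))
    return pairs


def to_csv(pairs: Iterable[Tuple[float, float]]) -> str:
    """Write the pairs as a two-column table with an x/y header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y"])
    writer.writerows(pairs)
    return buf.getvalue()


# Hypothetical text-document content from step S52.
text = ["(4.00, 0.00)", "(2.50, 6.25)"]
table = to_csv(parse_pairs(text))
```

Saving `table` to a `.csv` file gives a two-column table of the kind step S63 imports.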
[0136] Specifically, step S7 in the method disclosed in this invention includes the following sub-steps:
[0137] S71: A kernel function is used to calculate a value per unit area from the points representing the pedestrian positions, fitting a smoothly tapered (cone-like) surface to each point; the calculation extent is set to the extent of the plan view, and the kernel density value is calculated.
[0138] S72: Generate a kernel density map with varying color intensity based on the kernel density values calculated in step S71, thereby forming a pedestrian distribution heat map. The darker the color, the higher the pedestrian density within that spatial range.
[0139] S73: Dynamically display the pedestrian distribution heat maps to achieve a better visual effect, as shown in Figure 11.
[0140] The method for calculating the kernel density value in step S71 is as follows:
[0141]
[0142] for dist_i < radius,
[0143] where:
[0144] i = 1, ..., n are the input points; only points within the radius distance of the (x, y) position are included in the sum.
[0145] pop_i is the value of the population field at point i, and is an optional parameter.
[0146] dist_i is the distance between point i and the position (x, y).
[0147] The calculated density is then multiplied by the number of points, or by the sum of the population field if one is provided. This correction makes the spatial integral equal to the number of points (or the sum of the population field), rather than always equal to 1. This implementation uses a quartic kernel (Silverman, 1986). The formula must be evaluated separately for each location where the density is to be estimated; since a raster is being created, the calculation is applied to the center of each cell in the output raster.
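The kernel density calculation can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes the quartic-kernel form published for the ArcGIS Kernel Density tool, density = (1 / radius²) · Σ (3/π) · pop_i · (1 − (dist_i / radius)²)², summed over points with dist_i < radius. The function names, the raster parameters, and the sample point are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def kernel_density(x: float, y: float, points: Sequence[Point], radius: float,
                   pop: Optional[Sequence[float]] = None) -> float:
    """Quartic-kernel density at (x, y); pop holds optional per-point weights."""
    total = 0.0
    for i, (px, py) in enumerate(points):
        dist = math.hypot(px - x, py - y)
        if dist < radius:  # only points inside the search radius contribute
            weight = 1.0 if pop is None else pop[i]
            total += (3.0 / math.pi) * weight * (1.0 - (dist / radius) ** 2) ** 2
    return total / radius ** 2


def density_raster(points: Sequence[Point], radius: float, x_min: float,
                   y_min: float, cell: float, n_cols: int,
                   n_rows: int) -> List[List[float]]:
    """Evaluate the density at the center of every cell of the output raster."""
    return [[kernel_density(x_min + (c + 0.5) * cell, y_min + (r + 0.5) * cell,
                            points, radius)
             for c in range(n_cols)]
            for r in range(n_rows)]


# One hypothetical pedestrian at (0.5, 0.5) on a 2 x 1 raster of unit cells.
grid = density_raster([(0.5, 0.5)], radius=1.0, x_min=0.0, y_min=0.0,
                      cell=1.0, n_cols=2, n_rows=1)
```

Each point contributes a kernel whose integral over the plane is its weight, so the summed surface integrates to the number of points (or the sum of the population field), in line with the correction described above.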
[0148] The present invention can achieve the following technical effects:
[0149] 1. Artificial intelligence algorithms analyze the images captured from surveillance video frame by frame, identify pedestrians, and mark them with rectangles. The midpoint of the bottom edge of each pedestrian rectangle is then used to represent the pedestrian's coordinates, so that pedestrians are localized in a simple, low-cost, and easy-to-implement way.
[0150] 2. By establishing a mapping relationship between the image pixel coordinates of spatial reference points and the Cartesian coordinates of the plane, pedestrian information appearing in the video can be transformed into a pedestrian heat map on the floor plan and displayed dynamically. By overlaying the building floor plan and the heat map, information such as which spaces pedestrians gather in, which spaces are used more frequently, and the flow and tidal changes of pedestrians can be displayed intuitively. This helps urban designers, planners and architects to better discover the interaction patterns between space and user behavior, and to evaluate the usage characteristics of public spaces, especially densely populated places, so as to provide ideas for optimizing spatial layout and improving site design.
[0151] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims
1. A method for spatial positioning and visualization of built environment based on video imagery, characterized in that: The method includes the following implementation steps: S1: Obtain surveillance video of the built environment from a certain perspective within a certain time period and construct a pedestrian recognition material library; S2: Perform pedestrian recognition and detection on the pedestrian recognition material library, and obtain the image coordinates of all pedestrians in the pedestrian recognition material library; S3: Obtain the actual floor plan of the built environment corresponding to the video, and construct the actual rectangular coordinate system of the built environment; S4: Establish the mapping relationship between the image coordinates of the pedestrian in the surveillance video and the actual coordinates in the plan view of the built environment; S5: Obtain the actual coordinates of all pedestrians in the pedestrian recognition material library; S6: Based on the actual coordinates of the pedestrians, represent all pedestrians as points on the plan view of the built environment; S7: Draw a heat map of pedestrian distribution in the built environment and display it dynamically; Step S1 includes the following sub-steps: S11: Acquire the surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n×(T-1)×v≤ T is the expected number of times the trajectory of the same pedestrian is included in the pedestrian recognition material library, v is the pedestrian's walking speed, and s is the area of the actual area reflected by the video; or acquire the surveillance video and determine the frame image extraction time interval n based on the video range, where the determination of n satisfies the following constraint: n=deltaL / v, deltaL is the expected distance of the change in the landing point position of the same pedestrian's 
trajectory in adjacent frames, and v is the pedestrian's walking speed; S12: Extract frame images from the surveillance video according to the time interval, and the collection of frame images forms the pedestrian recognition material library; wherein, the method of extracting frame images is to process the video using the VideoCapture class in OpenCV, and save the extracted frame images as image files or directly use them as input values for the next step of pedestrian detection; Step S3 includes the following sub-steps: S31: Determine an origin point on the floor plan of the built environment reflected in the surveillance video, and establish an actual plane rectangular coordinate system by selecting a certain unit of size; S32: For videos from multiple adjacent locations, select a unified original origin and establish an actual Cartesian coordinate system using a specific unit of measurement. The planar views reflected in other surveillance videos can be mapped to this coordinate system without the need to establish a separate coordinate system. 
Step S4 includes the following sub-steps: S41: Find an image with minimal pedestrian occlusion in the pedestrian recognition material library, select at least four easily identifiable spatial reference points on the same horizontal plane in the image, and ensure that no three of the four spatial reference points are collinear, and record the image coordinates (xi, yi) of all spatial reference points based on the image pixel coordinate system in sequence; S42: Locate the planar position points corresponding to the spatial reference points marked in the pedestrian recognition material library, and record the actual coordinates (xj, yj) of all planar position points in the actual plane rectangular coordinate system in sequence; S43: Using the image coordinates and actual coordinates of the four or more sets of spatial reference points obtained in step S42 as known conditions, the mapping relationship between the image coordinates and actual coordinates of the built environment is obtained by solving the homography matrix H. The specific solution process for the homography matrix H in step S43 is as follows: S431: Let the image coordinates of the spatial reference point be q2 and the actual coordinates be q1. The image coordinates and actual coordinates of the spatial reference point have the following coordinate correspondence: , where H represents the homography matrix; S432: Let the coordinates of q1 be (u1, v1, 1) and the coordinates of q2 be (u2, v2, 1). Expanding the expression in S431, we get: S433: By normalizing using H33=1, we can obtain: S434: Eight equations are provided using the image coordinates and actual coordinates of the four spatial reference points: S435: Solve the system of linear equations in S434 to obtain the homography matrix H; The videos monitored in adjacent locations are processed by repeating steps S41 to S43 to obtain the corresponding homography matrix H.
2. The method according to claim 1, characterized in that: Step S2 includes the following sub-steps: S21: Construct the image pixel coordinate system in the frame image; S22: Use a pedestrian detection model to detect people in the frame image and outline them with a rectangle. Take the midpoint of the bottom edge of the rectangle as the point where the pedestrian's feet are located, and obtain the image coordinates of this point in the image pixel coordinate system, thereby obtaining the image coordinates (xm, ym) of the pedestrian based on the image pixel coordinate system. S23: Repeat step S22 above until the image coordinates of pedestrians in all images in the pedestrian recognition material library are obtained; S24: Check the detection results of the pedestrian detection model to determine if there are any omissions in pedestrian recognition. If there are pedestrians in some frames that have not been recognized, manually record and supplement them, and feed them back to the pedestrian detection model for further learning and algorithm optimization. If there are no omissions, you can directly proceed to the next step.
3. The method according to claim 2, characterized in that: The pedestrian detection model uses an algorithm based on Region Proposal R-CNN or YOLO / SSD algorithms. The R-CNN algorithm includes R-CNN, Fast R-CNN, or Faster R-CNN. The algorithm first generates pedestrian target candidate boxes, and then classifies and regresses the pedestrian target candidate boxes. The YOLO / SSD algorithm uses a convolutional neural network (CNN) to directly predict the category and location of different pedestrian targets.
4. The method according to claim 1, characterized in that: Step S5 includes the following sub-steps: S51: Substitute the pedestrian image coordinates obtained in step S2 into the homography matrix H obtained in step S4 to obtain the actual coordinates of all pedestrians detected in the pedestrian recognition material library; S52: Output the actual coordinates (xn, yn) of all pedestrians and generate a text document containing the coordinate information of all (xn, yn).
5. The method according to claim 4, characterized in that: Step S6 includes the following sub-steps: S61: Import the CAD file of the plan view of the built environment into ArcGIS, and adjust the coordinates of the plan view in ArcGIS according to the actual Cartesian coordinate system established in step S32 to ensure that the coordinates in ArcGIS are consistent with the established actual Cartesian coordinate system. S62: Combine the text files obtained in step S5 into a table, modify the coordinates in the form of (xn, yn) into two columns of xm and ym by splitting the data into columns, indicate the horizontal (x) and vertical (y) coordinates in the table header, and save it as a csv or xls file; S63: Import the table into ArcGIS, use the xy-to-line tool to convert the x field in the table to the x-axis and the y field to the y-axis, and generate point features at the corresponding positions on the plan view. Each point represents the position of a pedestrian in the pedestrian recognition material library on the plan view.
6. The method according to claim 5, characterized in that: Step S7 includes the following sub-steps: S71: The kernel function is used to calculate the value of each unit area based on the points representing the pedestrian positions to fit each point to a smooth conical surface, and the calculation range is set to the range of the planar map to calculate the kernel density value. S72: Generate a kernel density map with varying color depth based on the kernel density value calculated in step S71, thereby forming a pedestrian distribution heat map. The darker the color, the higher the pedestrian density within the spatial range corresponding to the area of the map. S73: Dynamically display pedestrian distribution heatmaps to achieve better visual effects.
7. The method according to claim 6, characterized in that: The method for calculating the kernel density value in step S71 is as follows: where: i = 1, …, n are the input points, and only points within the radius distance of the (x, y) position are included in the sum; pop_i is the value of the population field at point i, and is an optional parameter; dist_i is the distance between point i and the position (x, y).