Gate passing intention recognition method and device, electronic equipment and storage medium

Combining spatiotemporal matching of wireless positioning and visual monitoring data with behavioral feature analysis addresses misjudgment by contactless gate systems in complex scenarios and improves the accuracy of gate passage intention recognition and gate control.

CN121963345B · Active · Publication Date: 2026-06-26 · 深圳市深圳通有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
深圳市深圳通有限公司
Filing Date
2026-04-01
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing contactless gate systems cannot accurately identify the true intentions of pedestrians in complex scenarios, leading to problems such as incorrect charges and gates opening erroneously.

Method used

By acquiring wireless positioning data and visual monitoring data within the target area, spatiotemporal matching is performed to extract visual behavioral features. Combined with body orientation angle, walking trajectory, and dwell time, the predicted value of the intention to pass through the gate is determined, and corresponding gate control operations are executed.

Benefits of technology

It improves the accuracy of gate intention recognition, effectively avoids gates being opened incorrectly and charges being deducted incorrectly, and enhances the system's recognition capabilities in complex crowd scenarios.

✦ Generated by Eureka AI based on patent content.

[Figure: CN121963345B_ABST]

Abstract

The application provides a gate passing intention recognition method and a related device thereof, and belongs to the field of data processing, and is applied to a gate control device. The method comprises the following steps: acquiring wireless positioning position data and visual monitoring position data in a target area; performing space-time matching on the wireless positioning position data and the visual monitoring position data, and if the matching is successful, it is determined that the wireless positioning position data and the visual monitoring position data belong to the same target object; acquiring visual behavior characteristics of the target object; determining a gate passing intention prediction value of the target object according to the visual behavior characteristics; and determining and executing corresponding gate control operations based on the gate passing intention prediction value. The application can improve the accuracy of gate passing intention recognition.

Description

Technical Field

[0001] This invention relates to the field of data processing, and in particular to a method and apparatus for identifying gate passage intentions, an electronic device, and a storage medium.

Background Technology

[0002] Currently, contactless turnstiles have become a new application hotspot in areas such as subway turnstiles, building and park access control. When users carry handheld sensing devices to pass through turnstiles, they no longer need to actively present a pass certificate to swipe a card or scan a code; authentication and door opening are automatically completed, greatly improving passage efficiency and experience.

[0003] In related technologies, a single spatial distance sensor is typically used as the trigger condition. That is, when a pedestrian enters a specific area directly in front of the turnstile, it is assumed that the pedestrian intends to pass through the turnstile, and the payment and door opening instructions are executed. This makes the system prone to misjudgment in complex scenarios, such as when a pedestrian is simply crossing the passage in front of the turnstile, pacing back and forth in front of the turnstile, or only intending to stop briefly in front of the turnstile. This single distance-based judgment mechanism cannot accurately identify the pedestrian's true behavioral intention, thus frequently leading to problems such as incorrect payment and incorrect turnstile opening.

Summary of the Invention

[0004] The main objective of this application is to provide a method, device, electronic device, and storage medium for identifying gate passage intentions, which can improve the accuracy of gate passage intention identification.

[0005] To achieve the above objectives, a first aspect of this application proposes a method for identifying gate passage intentions, applied to gate control equipment, the method comprising:

[0006] Acquire wireless positioning data and visual monitoring location data within the target area;

[0007] The wireless positioning data and the visual monitoring data are spatiotemporally matched. If the match is successful, it is determined that the wireless positioning data and the visual monitoring data belong to the same target object.

[0008] Obtain the visual behavioral features of the target object;

[0009] Based on the visual behavioral characteristics, determine the predicted value of the target object's intention to pass through the gate;

[0010] Based on the predicted value of the gate passage intention, the corresponding gate control operation is determined and executed.

[0011] In some embodiments, the spatiotemporal matching of the wireless positioning location data and the visual monitoring location data includes:

[0012] Obtain the spatial mapping relationship between the positioning module coordinate system and the visual coordinate system;

[0013] Extract the wireless positioning location data and the visual monitoring location data within the same time window;

[0014] Based on the spatial mapping relationship, the wireless positioning data is converted to the visual coordinate system to obtain the corresponding converted positioning coordinates;

[0015] Calculate the spatial distance between the transformed positioning coordinates and the visual coordinates indicated by the visual monitoring location data;

[0016] If the spatial distance is less than a preset distance threshold, the match is considered successful.

[0017] In some embodiments, before extracting the wireless positioning location data and the visual monitoring location data within the same time window, the method further includes:

[0018] The basic system times of the wireless positioning module that collects the wireless positioning location data and the camera device that generates the visual monitoring location data are respectively acquired;

[0019] Using a preset time base as a reference, calculate the time offset between the wireless positioning module and the camera device respectively;

[0020] The timestamps of the collected wireless positioning data and visual monitoring data are aligned using the time offset.

[0021] In some embodiments, obtaining the visual behavioral features of the target object includes:

[0022] Extract the skeletal key points and bounding box coordinates belonging to the target object from the visual monitoring screen;

[0023] Based on the distribution of the skeletal key points, calculate the body orientation angle of the target object relative to the gate;

[0024] Based on the displacement change of the center point of the bounding box coordinates in continuous video frames, the trajectory of the target object is determined, and the dwell time of the target object within a preset range from the gate is calculated.

[0025] The body orientation angle, the movement trajectory, and the dwell time are combined as the visual behavioral features.
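The three features above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes key points and bounding boxes have already been projected onto a ground-plane frame, it estimates the facing direction as the perpendicular to the shoulder line (the source only says the angle is derived from the distribution of skeletal key points), and all function names are hypothetical.

```python
import math

def body_orientation_angle(left_shoulder, right_shoulder, gate_direction):
    """Angle in degrees between the facing direction and the gate reference direction.

    Assumption: the facing direction is the shoulder vector (left -> right)
    rotated 90 degrees counter-clockwise in a right-handed ground plane.
    """
    sx = right_shoulder[0] - left_shoulder[0]
    sy = right_shoulder[1] - left_shoulder[1]
    fx, fy = -sy, sx  # perpendicular to the shoulder line
    dot = fx * gate_direction[0] + fy * gate_direction[1]
    norm = math.hypot(fx, fy) * math.hypot(gate_direction[0], gate_direction[1])
    if norm == 0.0:
        raise ValueError("degenerate shoulder line or gate direction")
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def box_center(box):
    """Center point of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def trajectory(boxes):
    """Sequence of bounding-box centers over consecutive frames."""
    return [box_center(b) for b in boxes]

def dwell_time(timestamps, centers, gate_point, radius):
    """Total time spent within `radius` of `gate_point`.

    An interval between two consecutive frames is counted only when
    both of its end points lie inside the preset range.
    """
    total = 0.0
    for i in range(1, len(centers)):
        prev_inside = math.dist(centers[i - 1], gate_point) <= radius
        cur_inside = math.dist(centers[i], gate_point) <= radius
        if prev_inside and cur_inside:
            total += timestamps[i] - timestamps[i - 1]
    return total
```

The resulting angle, center sequence, and dwell time together play the role of the visual behavioral features; how the trajectory is then labelled (straight, crossing, turning back) is not specified here.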

[0026] In some embodiments, determining the predicted value of the target object's gate-crossing intention based on the visual behavioral characteristics includes:

[0027] If the angle between the body's orientation and the gate's reference direction is less than a preset angle, and the travel trajectory is a straight line trajectory toward the gate, then the predicted value of the intention to pass through the gate is determined to be the first prediction level.

[0028] If the travel trajectory is across the front of the gate and the dwell time reaches the set first duration threshold, then the predicted value of the intention to pass through the gate is determined to be the second prediction level.

[0029] If the travel trajectory is to cross in front of the gate or repeatedly turn back, and the dwell time does not reach the first duration threshold, then the gate passage intention prediction value is determined to be the third prediction level; wherein, the first prediction level and the second prediction level represent the intention to allow passage, and the third prediction level represents the intention to refuse passage.
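The three rules above can be expressed as a small decision function. The sketch below is illustrative only: the trajectory labels, the default threshold values, and the `None` result for feature combinations the rules do not cover are assumptions, since the text leaves the preset angle and the first duration threshold unspecified.

```python
from enum import Enum

class IntentLevel(Enum):
    FIRST = 1   # intention to allow passage
    SECOND = 2  # intention to allow passage
    THIRD = 3   # intention to refuse passage

def predict_intent(orientation_deg, trajectory_label, dwell_s,
                   preset_angle_deg=30.0, first_duration_s=2.0):
    """Map visual behavioral features to a prediction level.

    trajectory_label is a hypothetical label produced upstream:
    "straight_toward_gate", "crossing", or "turning_back".
    The default thresholds are placeholders, not values from the source.
    """
    if orientation_deg < preset_angle_deg and trajectory_label == "straight_toward_gate":
        return IntentLevel.FIRST
    if trajectory_label == "crossing" and dwell_s >= first_duration_s:
        return IntentLevel.SECOND
    if trajectory_label in ("crossing", "turning_back") and dwell_s < first_duration_s:
        return IntentLevel.THIRD
    return None  # combination not covered by the three stated rules
```

For example, `predict_intent(80.0, "turning_back", 0.5)` falls under the third rule, whereas a pedestrian who turns back but lingers past the threshold is left undecided by the rules as written.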

[0030] In some embodiments, determining and executing the corresponding gate control operation based on the predicted gate passage intention value includes:

[0031] When the predicted value of the intention to pass through the gate is the first prediction level or the second prediction level, the gate control operation is determined to be a release operation, an opening command is generated and sent, and authorization data allowing contactless passage is sent to the mobile terminal held by the target object.

[0032] When the predicted value of the intention to pass through the gate is the third prediction level, the gate control operation is determined to be an interception operation, the gate is kept closed, and instruction data refusing contactless passage is sent to the mobile terminal.
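A minimal sketch of this mapping from prediction level to control operation follows. The dictionary keys and message strings are hypothetical placeholders; the source does not define a command format for the gate or for the mobile terminal.

```python
def decide_gate_operation(prediction_level):
    """Return the control operation for a prediction level (1, 2, or 3).

    Levels 1 and 2 lead to a release operation; level 3 leads to an
    interception operation. All field names and values are illustrative.
    """
    if prediction_level in (1, 2):
        return {
            "operation": "release",
            "gate_command": "open",
            "terminal_message": "authorize_contactless_passage",
        }
    if prediction_level == 3:
        return {
            "operation": "intercept",
            "gate_command": "keep_closed",
            "terminal_message": "refuse_contactless_passage",
        }
    raise ValueError(f"unknown prediction level: {prediction_level!r}")
```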

[0033] In some embodiments, after determining that the wireless positioning location data and the visual monitoring location data belong to the same target object, the method further includes:

[0034] Extract the appearance and kinematic features of the target object in the current video frame;

[0035] Based on the appearance and kinematic features, predict the candidate bounding box of the target object in the next video frame;

[0036] Based on the correlation and matching between the candidate bounding boxes and the actual detection results, the visual behavioral features of the target object are continuously updated in consecutive video frames until the target object is identified as passing through the gate channel.
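One common way to realize this predict-then-associate loop is a constant-velocity prediction followed by intersection-over-union (IoU) matching. The sketch below uses that swapped-in technique as an assumption; the source does not name a specific tracker, and the appearance features it mentions are omitted here for brevity. Function names and the `min_iou` default are hypothetical.

```python
def predict_box(box, velocity, dt):
    """Shift a box (x1, y1, x2, y2) by a constant velocity (vx, vy) over dt."""
    x1, y1, x2, y2 = box
    vx, vy = velocity
    return (x1 + vx * dt, y1 + vy * dt, x2 + vx * dt, y2 + vy * dt)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(candidate_box, detections, min_iou=0.3):
    """Index of the detection that best overlaps the candidate box.

    Returns None when no detection reaches min_iou, in which case the
    target is treated as unmatched in this frame.
    """
    best_index, best_score = None, min_iou
    for index, detection in enumerate(detections):
        score = iou(candidate_box, detection)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index
```

When a detection is associated, its bounding box feeds the next update of the target's trajectory and dwell time, so the features keep following the same person across frames.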

[0037] To achieve the above objectives, a second aspect of this application provides a gate passage intention recognition device, applied to gate control equipment, the device comprising:

[0038] The first acquisition module is used to acquire wireless positioning location data and visual monitoring location data within the target area;

[0039] The spatiotemporal matching module is used to perform spatiotemporal matching between the wireless positioning location data and the visual monitoring location data. If the matching is successful, it is determined that the wireless positioning location data and the visual monitoring location data belong to the same target object.

[0040] The second acquisition module is used to acquire the visual behavioral features of the target object;

[0041] The determination module is used to determine the predicted value of the target object's gate passage intention based on the visual behavioral characteristics;

[0042] The execution module is used to determine and execute the corresponding gate control operation based on the predicted gate intention value.

[0043] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect of the embodiment.

[0044] To achieve the above objectives, a fourth aspect of the present application provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect of the present application.

[0045] The gate passage intention recognition method, device, electronic equipment, and storage medium proposed in this application have the following beneficial effects. First, the method acquires wireless positioning location data and visual monitoring location data within the target area, and performs spatiotemporal matching between the two to determine that they belong to the same target object. This accurately associates high-precision wireless spatial signals with physical visual targets, providing reliable multimodal data support for subsequent intention analysis. Second, it acquires the visual behavioral characteristics of the target object. This step directly captures the dynamic posture and trajectory changes of pedestrians from the monitoring picture, making up for the inability of simple distance sensing to detect what pedestrians are actually doing. Third, it determines the gate passage intention prediction value of the target object based on the visual behavioral characteristics. This step classifies and quantifies intention at a fine granularity in complex pedestrian flow scenarios such as crossing, turning back, and short-term stopping, filtering out behaviors that carry no intention to pass through the gate. Finally, it determines and executes the corresponding gate control operation based on the gate passage intention prediction value, ensuring that the physical action of the gate matches the pedestrian's real passage intention and avoiding erroneous gate opening and erroneous fare deduction. In summary, by fusing heterogeneous sensor data to anchor targets in space and time and by combining multidimensional visual behavioral semantics to make a quantitative decision on intent, this invention overcomes the logical defect of traditional solutions that rely on a single spatial distance trigger, thereby improving the accuracy of gate passage intention recognition.

Attached Figure Description

[0046] Figure 1 is a flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0047] Figure 2 is another flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0048] Figure 3 is another flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0049] Figure 4 is another flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0050] Figure 5 is another flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0051] Figure 6 is another flowchart of the gate crossing intention recognition method provided in the embodiments of this application;

[0052] Figure 7 is a schematic diagram of the gate system structure provided in the embodiments of this application;

[0053] Figure 8 is a schematic diagram of the gate passage intention recognition device provided in the embodiments of this application;

[0054] Figure 9 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0055] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0056] It should be noted that although functional modules are divided in the device schematic diagram and the logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart.

[0057] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0058] Currently, contactless turnstiles have become a new application hotspot in areas such as subway turnstiles, building and park access control. When users carry handheld sensing devices to pass through turnstiles, they no longer need to actively present a pass certificate to swipe a card or scan a code; authentication and door opening are automatically completed, greatly improving passage efficiency and experience.

[0059] In related technologies, a single spatial distance sensor is typically used as the trigger condition. That is, when a pedestrian enters a specific area directly in front of the turnstile, it is assumed that the pedestrian intends to pass through the turnstile, and the payment and door opening instructions are executed. This makes the system prone to misjudgment in complex scenarios, such as when a pedestrian is simply crossing the passage in front of the turnstile, pacing back and forth in front of the turnstile, or only intending to stop briefly in front of the turnstile. This single distance-based judgment mechanism cannot accurately identify the pedestrian's true behavioral intention, thus frequently leading to problems such as incorrect payment and incorrect turnstile opening.

[0060] Based on this, embodiments of this application provide a method for identifying gate passage intentions and related equipment, which can improve the accuracy of gate passage intention identification.

[0061] The gate crossing intention recognition method and related equipment provided in this application are specifically described through the following embodiments. First, the gate crossing intention recognition method in this application embodiment is described.

[0062] The gate crossing intent recognition method in this application embodiment can be illustrated through the following embodiments.

[0063] It should be noted that in all specific embodiments of this application, when processing data related to user identity or characteristics, such as user information, user behavior data, user historical data, and user location information, user permission or consent is obtained first. For example, when obtaining user-stored data and user cached data access requests, user permission or consent is obtained first. Furthermore, the collection, use, and processing of this data comply with relevant laws, regulations, and standards of the relevant countries and regions. In addition, when embodiments of this application need to obtain sensitive personal information of users, separate permission or consent from the user is obtained through pop-ups or redirection to a confirmation page. Only after obtaining the user's separate permission or consent is the necessary user-related data for the normal operation of the embodiments of this application obtained.

[0064] Figure 1 is an optional flowchart of the gate crossing intent recognition method provided in the embodiments of this application. The method in Figure 1 may include, but is not limited to, steps 101 to 105. It is also understood that this embodiment does not specifically limit the order of steps 101 to 105 in Figure 1; the order of steps can be adjusted, or some steps can be removed or added, according to actual needs.

[0065] Step 101: Obtain wireless positioning data and visual monitoring location data within the target area.

[0066] Step 102: Perform spatiotemporal matching between the wireless positioning data and the visual monitoring data. If the matching is successful, it is determined that the wireless positioning data and the visual monitoring data belong to the same target object.

[0067] Step 103: Obtain the visual behavioral features of the target object.

[0068] Step 104: Determine the predicted value of the target object's intention to pass through the gate based on visual behavioral characteristics.

[0069] Step 105: Determine and execute the corresponding gate control operation based on the predicted gate intention value.

[0070] In step 101 of some embodiments, the target area typically refers to a specific physical space covered by the front end of the contactless access gate. Within this area, multimodal data can be collected separately by independently deployed sensing hardware of different types. The wireless positioning location data refers to the physical coordinates of the handheld sensing terminal carried by the pedestrian, calculated after wireless signal communication between the wireless positioning module and the handheld sensing terminal. In the coordinate system of the positioning module (e.g., coordinate system A), the wireless positioning location data can be represented by an identifier ID1 and three-dimensional position coordinates. Meanwhile, the visual monitoring location data refers to the physical location information that the camera equipment extracts by using machine vision technology to detect human targets within its field of view in real time. In the camera equipment's coordinate system (e.g., coordinate system B), the visual monitoring location data can be represented by another identifier ID2 and the corresponding three-dimensional position coordinates. In the initial data acquisition phase, the location data of these two modalities are collected independently by the heterogeneous systems and are not yet directly associated with each other.

[0071] In step 102 of some embodiments, the wireless positioning location data and the visual monitoring location data are spatiotemporally matched. If the match is successful, it is determined that the wireless positioning location data and the visual monitoring location data belong to the same target object. Since the two types of location data obtained above originate from independent heterogeneous systems, each with its own time reference and spatial reference system, spatiotemporal matching means cross-validating how closely the two coincide in physical space at similar time points. In mathematical terms, when wireless positioning coordinates and visual position coordinates are both obtained near a given time point, the spatial distance between the two points is calculated and evaluated under a unified spatial reference system. When the spatial distance between the two coordinate points is less than the set allowable range, the spatiotemporal matching is considered successful. Through this matching verification mechanism, the virtual wireless signal source can be logically bound to the physically existing visual entity, thereby confirming that the acquired wireless positioning location data and visual monitoring location data actually belong to the same target object and solving the problem of data fragmentation between heterogeneous sensors.

[0072] Referring to Figure 2, in some embodiments, step 102 may include, but is not limited to, steps 201 to 205.

[0073] Step 201: Obtain the spatial mapping relationship between the positioning module coordinate system and the visual coordinate system.

[0074] Step 202: Extract wireless positioning location data and visual monitoring location data within the same time window.

[0075] Step 203: Based on the spatial mapping relationship, the wireless positioning data is converted to the visual coordinate system to obtain the corresponding converted positioning coordinates.

[0076] Step 204: Calculate the spatial distance between the converted positioning coordinates and the visual coordinates indicated by the visual monitoring location data.

[0077] Step 205: If the spatial distance is less than the preset distance threshold, the match is considered successful.

[0078] In step 201 of some embodiments, the spatial mapping relationship between the positioning module coordinate system and the visual coordinate system is obtained. Specifically, since the wireless positioning module responsible for collecting wireless positioning location data and the camera device responsible for collecting visual monitoring location data are two completely different heterogeneous hardware, and their physical installation positions and orientation angles at the contactless access gate are different, the underlying three-dimensional reference spaces constructed by each are independent of each other. The positioning module coordinate system refers to the three-dimensional spatial coordinate system established with the sensor center or designated position of the wireless positioning module as the origin (e.g., labeled as coordinate system A); while the visual coordinate system refers to the independent three-dimensional spatial coordinate system established with the optical center of the camera device as the origin (e.g., labeled as coordinate system B). The spatial mapping relationship is essentially a mathematical transformation model reflecting the relative position and attitude differences between these two independent coordinate systems, usually composed of translation vectors and rotation matrices. During factory calibration or on-site installation and debugging, this mapping relationship is calculated and stored through a preset algorithm, thereby providing a basic mathematical transformation basis for subsequent cross-modal spatial alignment.
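A mapping composed of a rotation matrix and a translation vector is a rigid transformation. In the notation below, which is illustrative and not taken from the source, a point expressed in coordinate system A is carried into coordinate system B as follows:

```latex
\mathbf{p}_B = R_{BA}\,\mathbf{p}_A + \mathbf{t}_{BA},
\qquad
R_{BA} \in \mathbb{R}^{3 \times 3},\quad
R_{BA}^{\top} R_{BA} = I,\quad
\det R_{BA} = 1,\quad
\mathbf{t}_{BA} \in \mathbb{R}^{3}
```

Here $R_{BA}$ captures the difference in orientation between the two devices and $\mathbf{t}_{BA}$ the position of the origin of A expressed in B; both would be fixed once at calibration time and reused for every subsequent sample.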

[0079] In step 202 of some embodiments, the same time window refers to an extremely short time period used to control observation timing errors. Since pedestrians crossing the gate are dynamic targets in constant motion, comparing wireless and visual coordinates taken at significantly different times will result in large comparison errors due to the pedestrian's displacement. Therefore, it is necessary to extract from the continuous data stream the observations that are close to a specific time point (e.g., time point t1 and its allowable time error range). Within this time window, the three-dimensional wireless positioning location data detected by the wireless positioning module and bearing the first identifier ID1 is extracted and its coordinates are recorded. Simultaneously, the three-dimensional visual monitoring location data detected by the camera equipment and bearing the second identifier ID2 is extracted and its coordinates are recorded. This step ensures that the data to be spatially compared later lie on the same time plane, eliminating temporal misalignment caused by the dynamic movement of the target.
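The window extraction can be sketched as picking, from each stream, the sample nearest to the reference time point, provided it falls inside the allowed error range. The sample layout `(timestamp, identifier, (x, y, z))`, the function name, and the nearest-sample policy are assumptions made for illustration.

```python
def extract_in_window(wireless_samples, visual_samples, t_ref, half_window_s):
    """Return the wireless and visual samples closest to t_ref.

    Each sample is a tuple (timestamp, identifier, (x, y, z)).
    A stream with no sample within +/- half_window_s of t_ref yields None.
    """
    def nearest(samples):
        in_window = [s for s in samples if abs(s[0] - t_ref) <= half_window_s]
        if not in_window:
            return None
        return min(in_window, key=lambda s: abs(s[0] - t_ref))

    return nearest(wireless_samples), nearest(visual_samples)
```

Matching proceeds only when both returned values are present; otherwise the comparison is deferred to a later time point.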

[0080] Referring to Figure 3, in some embodiments, step 202 may include, but is not limited to, steps 301 to 303.

[0081] Step 301: Obtain the basic system time of the wireless positioning module that collects wireless positioning location data and the camera device that generates visual monitoring location data, respectively.

[0082] Step 302: Using a preset time base as a reference, calculate the time offset between the wireless positioning module and the camera device respectively.

[0083] Step 303: Use time offset to align the timestamps of the collected wireless positioning data and visual monitoring data.

[0084] In step 301 of some embodiments, the base system time of the wireless positioning module that collects wireless positioning location data and the camera device that generates visual monitoring location data are obtained respectively. Specifically, the wireless positioning module and the camera device, as independent hardware modules, typically maintain their own internal local clock system. In actual operation, due to factors such as different device power-on times, slight differences in hardware crystal oscillator frequencies, and network environment, the current base system time of these two hardware devices often cannot be kept absolutely consistent, resulting in a certain time deviation. In order to accurately compare data at the same physical moment in subsequent data processing, it is first necessary to read the current raw time values of the wireless positioning module and the camera device respectively. This step is a prerequisite for time synchronization, as it determines the current real clock state of each heterogeneous sensor.

[0085] In step 302 of some embodiments, the time offset between the wireless positioning module and the camera device is calculated with reference to a preset time base. The preset time base refers to a pre-defined, unified standard reference time, such as the local absolute time maintained by the control host or the standard network time obtained through an NTP (Network Time Protocol) server. After reading the basic system time of each hardware component, the difference between the system time of the wireless positioning module and the preset time base is calculated to obtain the time offset of the wireless positioning module. Similarly, the difference between the system time of the camera device and the preset time base is calculated to obtain the time offset of the camera device. These calculations quantify how far each independent device is ahead of or behind the unified standard time, providing a numerical basis for subsequent time compensation of the data.

[0086] In step 303 of some embodiments, the timestamps of the collected wireless positioning location data and visual monitoring location data are aligned using the time offsets. In actual data acquisition, both the wireless positioning module and the camera device append an original timestamp based on their own underlying system time when generating their respective location data. Directly comparing these original timestamps would lead to a misalignment in the time dimension. Therefore, the time offsets obtained in step 302 are used to compensate the original timestamps of the collected wireless positioning location data and visual monitoring location data by addition or subtraction, converting them all to the aforementioned preset time base. After this compensation, the time tags of the two types of data lie on the same time axis, which allows the two sets of data generated at the same physical moment in the real world to be filtered out accurately.

[0087] Through steps 301 to 303 described above, this embodiment of the application obtains the basic system time of the device, calculates its offset relative to a unified time reference, and uses this offset to numerically compensate for the timestamps of the underlying data, effectively eliminating timing errors caused by clock asynchrony between different hardware. This timestamp alignment mechanism ensures that the multimodal location data subsequently involved in spatial mapping and distance calculation are synchronized in the time dimension, effectively eliminating timing error interference caused by pedestrian dynamic walking, thereby ensuring the accuracy of multimodal data matching and intent recognition.
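Steps 301 to 303 reduce to two subtractions per device. The sketch below assumes that each device's clock reading and the reference time are sampled at the same instant and are expressed in seconds; the function names and the sign convention (a positive offset means the device clock is ahead) are illustrative choices.

```python
def clock_offset(device_time, reference_time):
    """Offset of a device clock relative to the preset time base.

    Positive when the device clock runs ahead of the reference.
    """
    return device_time - reference_time

def align_timestamp(raw_timestamp, offset):
    """Convert a device-local timestamp to the preset time base."""
    return raw_timestamp - offset

# Example: both clocks are read while the reference time is 1000.0 s.
wireless_offset = clock_offset(1000.25, 1000.0)  # wireless module is ahead
camera_offset = clock_offset(999.9, 1000.0)      # camera is behind

# Two samples stamped locally by each device are brought onto one time axis.
wireless_aligned = align_timestamp(1005.25, wireless_offset)
camera_aligned = align_timestamp(1004.9, camera_offset)
```

In this example the two locally stamped samples differ by 0.35 s on paper, yet after compensation they land on the same instant of the reference axis, which is the situation the alignment is meant to expose.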

[0088] In step 203 of some embodiments, the wireless positioning location data is transformed to the visual coordinate system based on the spatial mapping relationship to obtain the corresponding transformed positioning coordinates. After acquiring the original data that lie in the same time dimension but in two independent spatial coordinate systems, the spatial mapping relationship constructed in step 201 is applied as a matrix transformation to the extracted three-dimensional wireless positioning coordinates. Through this three-dimensional projection and conversion, the wireless signal location point, originally belonging to the positioning module's coordinate system A, is mapped to the camera device's visual coordinate system B, thereby generating the transformed positioning coordinates of the wireless signal in the visual coordinate system. The core purpose of this step is to unify the reference system, so that the wireless coordinates and the visual coordinates, which could not be compared directly while expressed in different spatial coordinate systems, are placed in the same mathematical space and the physical deviation between the two can be calculated.
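A minimal sketch of this conversion, assuming the mapping is stored as a 3x3 rotation matrix (nested lists) and a 3-element translation vector; the function name and the example calibration values are hypothetical.

```python
def transform_to_visual(point_a, rotation, translation):
    """Map a point from coordinate system A to coordinate system B.

    Computes rotation @ point_a + translation using plain Python lists,
    where rotation is a 3x3 matrix and translation a 3-vector.
    """
    return tuple(
        sum(rotation[i][j] * point_a[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Example calibration: A is rotated 90 degrees about the vertical axis
# relative to B and shifted by (1.0, 0.0, 0.5).
ROTATION = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
TRANSLATION = (1.0, 0.0, 0.5)

converted = transform_to_visual((2.0, 0.0, 1.0), ROTATION, TRANSLATION)
```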

[0089] In step 204 of some embodiments, the spatial distance between the transformed positioning coordinates and the visual coordinates indicated by the visual monitoring position data is calculated. After the three-dimensional coordinate system has been unified, the visual position coordinates carrying the ID2 identifier are extracted, and the spatial distance formula is applied to these visual position coordinates and the transformed positioning coordinates derived in step 203. The calculated spatial distance is the absolute three-dimensional physical distance between the two points. It directly reflects the degree of deviation between the pedestrian's physical location captured by the camera and the mobile terminal's location sensed by the wireless positioning module in real three-dimensional space. If the pedestrian is in fact the target user holding the mobile terminal, then in an ideal, error-free environment these two coordinate points should closely coincide in the same visual coordinate system.

[0090] In step 205 of some embodiments, if the spatial distance is less than a preset distance threshold, a successful match is determined. The preset distance threshold is a pre-set tolerance value, typically set based on the positioning accuracy error of the wireless positioning module itself and the visual error of the camera's depth measurement. In practical applications, due to the detection noise of the hardware device, even if the same pedestrian is carrying the terminal device, the calculated spatial distance will not be absolutely zero. Therefore, the actual spatial distance obtained in step 204 is compared with the preset distance threshold. If the calculated spatial distance is less than the preset distance threshold, it indicates that the location of the wireless radio frequency signal source and the location of the human body captured by the visual image are within a reasonable error range in physical space. At this point, a clear conclusion of a successful match is reached, thus confirming that the wireless positioning data and the visual monitoring data belong to the same target object.
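Steps 204 and 205 can be illustrated with the minimal sketch below, which assumes that the spatial distance formula is the Euclidean distance between two three-dimensional points. The function names and the threshold values are hypothetical; the application gives no concrete value for the preset distance threshold.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def spatial_distance(transformed: Point3, visual: Point3) -> float:
    """Euclidean distance between the transformed positioning point and the visual point."""
    return math.dist(transformed, visual)


def is_same_target(transformed: Point3, visual: Point3, threshold_m: float) -> bool:
    """A match succeeds only when the distance is strictly below the preset threshold."""
    return spatial_distance(transformed, visual) < threshold_m


# Hypothetical coordinates in metres, both expressed in the visual coordinate system.
wireless_point = (1.0, 3.0, 0.5)
camera_point = (1.3, 3.4, 0.5)
matched = is_same_target(wireless_point, camera_point, threshold_m=0.6)
```

In practice the threshold would be chosen from the combined positioning and depth-measurement error of the deployed hardware, as the paragraph above describes.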

[0091] In some embodiments, after determining that the wireless location data and the visual monitoring location data belong to the same target object, the method may further include, but is not limited to, the following steps:

[0092] Extract the appearance and kinematic features of the target object in the current video frame;

[0093] Predict candidate bounding boxes for target objects in the next video frame based on appearance and kinematic features;

[0094] Based on the correlation and matching between candidate bounding boxes and actual detection results, the visual behavioral features of the target object are continuously updated in consecutive video frames until the target object is identified as passing through the gate channel.

[0095] After determining that the wireless positioning data and visual monitoring data belong to the same target object, the first step is to extract the appearance and kinematic features of the target object in the current video frame. Specifically, after completing the spatiotemporal matching of multimodal data and initially locking the target object, in order to continuously track the object in subsequent video streams, it is necessary to extract feature description parameters from the current video frame containing the object that can be used to maintain unique identity. Appearance features mainly refer to the visual information of the target object at the image pixel level, usually covering static visual semantic features such as the color distribution, texture details, and local contour edges of pedestrian clothing; kinematic features refer to the dynamic parameters describing the physical motion state of the target object, usually including displacement vector information such as the motion speed, acceleration, and motion direction exhibited by the target object in adjacent historical frames. By extracting these two types of features, a multi-dimensional feature description model can be established for the target object in both static pixel texture and dynamic physical motion dimensions, providing a basic data source for subsequent cross-frame target tracking. After obtaining the comprehensive feature model of the target object, it is necessary to predict the candidate bounding box of the target object in the next video frame based on the appearance and kinematic features. Based on motion patterns, the target object's position and trajectory are inferred in a very short time. Using extracted kinematic features and a motion state estimation model, the possible center coordinates of the target object in the next time step can be calculated. 
Simultaneously, combining the target scale and aspect ratio defined by appearance features, a geometric rectangular region representing the expected location of the target object is generated at the predicted center coordinates—a candidate bounding box. This prediction operation is equivalent to position extrapolation on the timeline, significantly narrowing the search range for the target object in the next frame, effectively improving the algorithm's matching efficiency and reducing the false detection rate. After the next video frame is actually input and basic detection is completed, association matching is performed based on the candidate bounding boxes and the actual detection results. The visual behavioral features of the target object are continuously updated in consecutive video frames until the target object is identified as passing through the turnstile. Specifically, in the new frame, the actual detection boxes (i.e., actual detection results) of all currently existing pedestrians are output. The candidate bounding boxes derived from the previous frame are compared with these actual detection results in the current frame using spatial overlap calculation and appearance feature similarity comparison, thus completing the association matching operation. If the comparison metric reaches a set threshold, the actual detection result in the current frame is confirmed as the target object originally being tracked. Subsequently, the latest pose and position coordinates of the target object in the current frame are extracted and overwritten or appended to the historical data to achieve continuous updates of the target object's visual behavioral features. 
The above process of extraction, prediction, matching, and updating iterates over consecutive video frames, forming an uninterrupted target tracking sequence. The tracking and updating process continues until the coordinate logic determines that the target object has completely crossed the physical boundary of the sensorless channel and left the monitoring field of view. The tracking task then terminates, ensuring the continuity of the target object's identity and the integrity of its trajectory throughout the entire gate passage cycle. This provides a stable and reliable continuous visual data stream for the intent prediction logic, further enhancing the anti-interference capability of the gate passage intent recognition method in practical applications.
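One possible reading of the predict-and-associate loop described above is sketched below. It assumes a constant-velocity motion model and uses intersection-over-union (IoU) as the spatial overlap measure; the appearance similarity comparison mentioned in the text is omitted for brevity. All names and the `min_iou` value are hypothetical and are not taken from the application.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def predict_box(box: Box, velocity: Tuple[float, float]) -> Box:
    """Shift the box by the per-frame velocity (constant-velocity assumption)."""
    vx, vy = velocity
    x1, y1, x2, y2 = box
    return (x1 + vx, y1 + vy, x2 + vx, y2 + vy)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(candidate: Box, detections: List[Box], min_iou: float = 0.3) -> Optional[int]:
    """Return the index of the detection that best overlaps the candidate, or None."""
    best_idx, best_score = None, min_iou
    for idx, det in enumerate(detections):
        score = iou(candidate, det)
        if score >= best_score:
            best_idx, best_score = idx, score
    return best_idx


# Hypothetical frame: the tracked box moves 10 px to the right per frame.
candidate = predict_box((100.0, 100.0, 150.0, 200.0), velocity=(10.0, 0.0))
detections = [(300.0, 100.0, 350.0, 200.0), (112.0, 101.0, 162.0, 201.0)]
match_index = associate(candidate, detections)
```

When `associate` returns an index, the matched detection would supply the latest position for the tracked target; when it returns None, the sketch leaves the handling of a lost target open, as the application does not describe it.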

[0096] Through steps 201 to 205 described above, this embodiment of the application overcomes the physical isolation between heterogeneous sensors by employing a complete mathematical processing flow from obtaining mapping relationships, unifying time windows, performing spatial coordinate transformations, to calculating and comparing spatial distances. This matching verification based on spatial distance calculation not only improves the reliability of heterogeneous data fusion but also accurately links invisible wireless terminal identifiers with visible physical pedestrian entities, effectively avoiding identity misjudgment problems caused by sensor data mismatch in complex scenarios with large crowds.

[0097] In step 103 of some embodiments, after the identification and data attribution of the same target object are completed, dynamic monitoring of that specific target object will continue. Visual behavioral characteristics here refer to a collection of various external visual expressions that reflect a pedestrian's true movement state and passage intentions. Unlike isolated static location points, behavioral characteristics are dynamic process parameters that change continuously over time. Image processing technology can be used to extract the overall body movements, spatial trajectory relative to the gate, and dynamic posture changes of the target object from continuous visual monitoring images. By acquiring this multi-dimensional visual information, the specific movement patterns of pedestrians can be analyzed in depth, providing necessary and highly valuable basic data support for accurately determining whether they truly intend to pass through the gate.

[0098] Please see Figure 4. In some embodiments, step 103 may include, but is not limited to, steps 401 to 404.

[0099] Step 401: Extract the skeletal key points and bounding box coordinates belonging to the target object from the visual monitoring screen.

[0100] Step 402: Calculate the body orientation angle of the target object relative to the gate based on the distribution of the skeletal key points.

[0101] Step 403: Determine the trajectory of the target object based on the displacement change of the center point of the bounding box coordinates in the continuous video frames, and calculate the dwell time of the target object within the preset range of the gate.

[0102] Step 404: Combine the body orientation angle, movement trajectory, and dwell time as visual behavioral features.

[0103] In step 401 of some embodiments, skeletal key points and bounding box coordinates belonging to the target object are extracted from the visual monitoring screen. Specifically, the visual monitoring screen is a sequence of two-dimensional or three-dimensional images acquired in real time by a camera device. To transform the unstructured pixel information in the screen into structured data that can be computed, human pose estimation algorithms and object detection algorithms can be used to analyze consecutive video frames one by one. In this step, skeletal key points refer to the set of coordinates of the main joints and feature parts of the human body (such as the head, shoulders, elbows, and hips) in the image space. These coordinate points, when connected topologically, can accurately represent the current physical posture and torso extension state of the human body. Bounding box coordinates refer to the geometric rectangle in the image that completely encloses the pixel area of the target object. Typically, the pixel coordinates of the upper left and lower right corners are used to define the overall position and area occupied by the target object in the current video frame. Based on the target attribution relationship determined in the previous steps, only the pixel features belonging to the specific target object are extracted.

[0104] In step 402 of some embodiments, the body orientation angle of the target object relative to the gate is calculated based on the distribution of skeletal key points. Body orientation is an important spatial posture feature for determining whether a pedestrian intends to approach and enter the gate passage. The set of skeletal key points obtained in step 401 is extracted, especially core key points such as the left and right shoulders, chest, or face. By analyzing the relative geometric distribution of these key points in three-dimensional space or two-dimensional projection plane, a normal vector representing the frontal orientation of the human body is constructed. Subsequently, the pre-calibrated gate passage reference direction (i.e., the standard straight line direction when normally passing through the gate) is read, and the spatial angle between the human orientation normal vector and the gate passage reference direction is calculated using the vector angle formula. This calculated angle is the body orientation angle, and its value reflects whether the target object's current frontal gaze is aligned with the gate entrance.
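The angle calculation of step 402 can be sketched as below, assuming the shoulder key points have already been projected onto a two-dimensional ground plane in which a counter-clockwise rotation of the left-to-right shoulder vector gives the frontal direction. The function names and sample coordinates are hypothetical; the application does not prescribe which key points or which plane are used.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def facing_vector(left_shoulder: Vec2, right_shoulder: Vec2) -> Vec2:
    """Rotate the left-to-right shoulder vector by +90 degrees to get the frontal normal."""
    sx = right_shoulder[0] - left_shoulder[0]
    sy = right_shoulder[1] - left_shoulder[1]
    return (-sy, sx)


def orientation_angle_deg(facing: Vec2, gate_direction: Vec2) -> float:
    """Angle between the facing normal and the gate reference direction (vector angle formula)."""
    dot = facing[0] * gate_direction[0] + facing[1] * gate_direction[1]
    norm = math.hypot(*facing) * math.hypot(*gate_direction)
    if norm == 0:
        raise ValueError("zero-length vector")
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


# Hypothetical case: the gate reference direction is +y and the pedestrian faces +y.
GATE_DIRECTION = (0.0, 1.0)
angle_facing_gate = orientation_angle_deg(
    facing_vector((-0.2, 0.0), (0.2, 0.0)), GATE_DIRECTION
)
```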

[0105] In step 403 of some embodiments, the trajectory of the target object is determined based on the displacement change of the center point of the bounding box coordinates in consecutive video frames, and the dwell time of the target object within a preset range from the gate is calculated. The movement state of a pedestrian needs to be quantified using sequential data on the time axis. First, the center point of the bounding box coordinates of the target object in each video frame (e.g., center coordinates) is calculated. Then, in a continuous video frame sequence, a series of center points representing the same target object are connected and smoothed in chronological order to fit a curve or broken line reflecting the overall movement route of the target object, i.e., the trajectory. At the same time, a specific monitoring area is defined in the physical space as a preset range from the gate (e.g., an area 0.3 meters to 1 meter from the gate entrance). When the center point of the target object is determined to have entered the preset range through coordinate conversion, and the displacement change of its center point is lower than the set static determination threshold in subsequent consecutive frames, an internal timer is triggered. By accumulating the number of consecutive frames or time differences in which the target object is in this relatively static state, the dwell time of the target object in this key area can be calculated, thereby objectively quantifying its dwelling or waiting behavior.
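The dwell-time accumulation of step 403 might be realized along the lines of the sketch below. It assumes the bounding box centers have already been converted to ground-plane coordinates in metres, uses the 0.3 m to 1 m range given above as the example preset range, and reports the longest uninterrupted static run; the static threshold, frame interval, and all names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vec2 = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_center(box: Box) -> Vec2:
    """Center point of a bounding box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def dwell_time_s(
    centers: List[Vec2],
    frame_dt_s: float,
    in_range: Callable[[Vec2], bool],
    static_threshold_m: float,
) -> float:
    """Longest run of consecutive frames that are inside the range and nearly static."""
    longest = current = 0.0
    for prev, cur in zip(centers, centers[1:]):
        if in_range(cur) and math.dist(prev, cur) < static_threshold_m:
            current += frame_dt_s
            longest = max(longest, current)
        else:
            current = 0.0  # movement or leaving the range resets the timer
    return longest


def near_gate(point: Vec2, gate: Vec2 = (0.0, 0.0)) -> bool:
    """Example preset range: 0.3 m to 1 m from the gate entrance."""
    return 0.3 <= math.dist(point, gate) <= 1.0


# Hypothetical trajectory sampled every 0.5 s: approach, pause, then move on.
trajectory = [(0.0, 2.0), (0.0, 0.8), (0.0, 0.79), (0.0, 0.79), (0.0, 0.2)]
dwell = dwell_time_s(trajectory, 0.5, near_gate, static_threshold_m=0.05)
```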

[0106] In step 404 of some embodiments, the body orientation angle, movement trajectory, and dwell time are combined as visual behavioral features. Since single-dimensional visual parameters are prone to ambiguity in complex pedestrian scenarios and cannot fully describe the pedestrian's true behavioral intentions, the instantaneous spatial posture parameters calculated in step 402 (i.e., body orientation angle), the macroscopic spatial route parameters determined in step 403 (i.e., movement trajectory), and the time-dimensional quantified parameters (i.e., dwell time) are structurally encapsulated and concatenated. These three different-dimensional parameters constitute a multi-dimensional feature vector or feature set at the underlying data structure, serving as visual behavioral features. This combined feature serves as standardized input data for downstream intention decision-making logic, comprehensively and three-dimensionally describing the dynamic behavior of the target object.

[0107] Through steps 401 to 404 above, this embodiment of the application uses target detection and pose estimation algorithms to progressively parse and transform unstructured visual monitoring images into structured multidimensional objective data containing human skeletal posture, continuous spatial motion trajectory, and dwell time in specific areas. This multidimensional feature extraction and combination method overcomes the limitations of traditional technologies that rely solely on single location coordinates for simple distance threshold judgments. It can reconstruct the real action details of pedestrians in complex scenes, providing comprehensive and highly reliable basic data for subsequent accurate quantification of gate crossing intention prediction values.

[0108] In step 104 of some embodiments, a predicted value of the target object's intention to pass through the gate is determined based on visual behavioral characteristics. The predicted intention value is a quantitative evaluation index given regarding whether the target object has a genuine subjective intention to pass through the current gate. A pre-configured intention decision logic engine logically compares and comprehensively evaluates the visual behavioral characteristics obtained in the aforementioned steps, analyzing whether the target object's current actions conform to typical gate-passing preparation or are merely unintentional behaviors such as passing by, crossing, or lingering, and assigns a corresponding predicted value to the target object accordingly. This step successfully transforms complex, unstructured visual behavioral data into structured intention indicators that can be directly read and processed by a computer, enabling the identification of different passage requests in complex pedestrian flow scenarios.

[0109] Please see Figure 5. In some embodiments, step 104 may include, but is not limited to, steps 501 to 503.

[0110] Step 501: If the angle between the body's orientation and the gate's reference direction is less than a preset angle, and the movement trajectory is a straight line trajectory toward the gate, then the gate intention prediction value is determined to be the first prediction level.

[0111] Step 502: If the travel trajectory is to cross in front of the gate and the dwell time reaches the set first duration threshold, then the gate passage intention prediction value is determined to be the second prediction level.

[0112] Step 503: If the travel trajectory is to cross in front of the gate or repeatedly turn back and forth, and the dwell time does not reach the first duration threshold, then the gate intention prediction value is determined to be the third prediction level.

[0113] In step 501 of some embodiments, if the angle between the body's orientation and the gate's reference direction is less than a preset angle, and the trajectory is a straight line approaching the gate, then the gate-crossing intention prediction value is determined to be the first prediction level. Specifically, after obtaining the visual behavioral characteristics of the target object, its spatial posture and motion vector need to be quantitatively evaluated. Here, the angle between the body's orientation and the gate's reference direction is used to measure the geometric deviation between the pedestrian's frontal line of sight and the gate's standard entrance direction; the preset angle (e.g., 30 degrees) is a set reference threshold for determining whether the target is "facing" the gate. When the calculated actual angle is less than the preset angle, it indicates that the target object's body posture has aligned with the sensorless passage. At the same time, its trajectory is further evaluated. If the tangent direction of the trajectory in the two-dimensional or three-dimensional coordinate system continuously points to the gate and appears as a straight line (i.e., a "straight line approaching the gate"), it indicates that the target object is in a state of clear purpose and unwavering preparation for passage. In this scenario that conforms to typical forward gate-crossing characteristics, its gate-crossing intention prediction value is directly assigned the highest level of quantitative index, namely the first prediction level.

[0114] In step 502 of some embodiments, if the travel trajectory is across the front of the turnstile and the dwell time reaches a set first duration threshold, the predicted value of the intention to pass through the turnstile is determined to be the second prediction level. In actual pedestrian flow scenarios, pedestrian behavior is often complex. Crossing the front of the turnstile means that the target object's travel trajectory vector is parallel to the turnstile entrance plane and does not directly point to the turnstile channel. Under conventional logic, crossing is usually regarded as an unintentional passing behavior. However, the determination logic of this application introduces time-dimensional features for comprehensive consideration. By reading the dwell time of the target object in a specific area away from the turnstile, it is compared with a set first duration threshold (e.g., 2 seconds). If the target object is on a crossing trajectory, but its dwell time in the area in front of the turnstile is greater than or equal to the first duration threshold, this usually corresponds to a special behavioral state of a pedestrian waiting for companions, adjusting their passage posture, or waiting for the turnstile to respond. For situations where the trajectory does not directly point to the gate but shows a clear desire to pass through in terms of time characteristics, the gate-passing intention prediction value will be set to the second prediction level, thereby effectively distinguishing such complex behaviors with potential gate-passing intentions from simple passing behaviors.

[0115] In step 503 of some embodiments, if the movement trajectory is crossing in front of the turnstile or repeatedly turning back, and the dwell time does not reach the first duration threshold, the predicted value of the intention to pass through the turnstile is determined to be the third prediction level. Here, the first and second prediction levels represent the intention to allow passage, and the third prediction level represents the intention to refuse passage. This step is mainly used to filter and eliminate interfering targets that do not have a genuine intention to pass through the turnstile. When the movement trajectory of a target object is detected to be crossing in front of the turnstile (i.e., walking straight past the passage) or repeatedly turning back (i.e., the coordinate trajectory in front of the passage shows a back-and-forth state), and the dwell time of the object in this area is less than the aforementioned first duration threshold, it indicates that the target object neither aligns with the turnstile nor effectively lingers in front of the turnstile to wait for passage. Based on these behavioral characteristics, it can be determined that the object is merely a passing pedestrian or a person loitering nearby, and its predicted value of the intention to pass through the turnstile is output as the lowest third prediction level. Meanwhile, the control semantics of the three prediction levels are clearly defined in the underlying decision logic: the first and second prediction levels are both mapped to represent the intention to allow passage, serving as a prerequisite for triggering the door opening action; while the third prediction level is mapped to represent the intention to refuse passage, serving as a decision condition for maintaining the gate's blocking state.

[0116] Through steps 501 to 503 above, this embodiment of the application introduces objective physical parameters such as continuous body orientation angle, travel trajectory, and dwell time into conditional branch judgment. This not only accurately identifies purposeful straight-line passage behavior (first prediction level) but also effectively identifies complex gate-passing behavior accompanied by crossing but with dwelling and waiting characteristics (second prediction level), and accurately filters out unintentional passing or turning-back behavior (third prediction level). This multi-condition intention prediction mechanism overcomes the logical limitations of traditional single-sensor triggering based solely on spatial distance, achieving discretization and structured hierarchical classification of continuous dynamic behavior, providing objective and reliable data decision-making basis for subsequent precise release or interception operations.
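The conditional branches of steps 501 to 503 can be expressed as the following sketch. The 30-degree preset angle and 2-second first duration threshold are the example values given above; the enumeration names, the `VisualBehavior` container, and the choice to return None for combinations that the three steps do not cover are assumptions of this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trajectory(Enum):
    STRAIGHT_TOWARD_GATE = "straight_toward_gate"
    CROSSING = "crossing"
    TURNING_BACK = "turning_back"


class IntentLevel(Enum):
    FIRST = 1
    SECOND = 2
    THIRD = 3


@dataclass(frozen=True)
class VisualBehavior:
    """Combined visual behavioral features (orientation angle, trajectory, dwell time)."""
    orientation_angle_deg: float
    trajectory: Trajectory
    dwell_time_s: float


def predict_intent(
    features: VisualBehavior,
    preset_angle_deg: float = 30.0,
    first_duration_s: float = 2.0,
) -> Optional[IntentLevel]:
    """Apply the conditions of steps 501, 502, and 503 in order."""
    if (
        features.orientation_angle_deg < preset_angle_deg
        and features.trajectory is Trajectory.STRAIGHT_TOWARD_GATE
    ):
        return IntentLevel.FIRST
    if (
        features.trajectory is Trajectory.CROSSING
        and features.dwell_time_s >= first_duration_s
    ):
        return IntentLevel.SECOND
    if (
        features.trajectory in (Trajectory.CROSSING, Trajectory.TURNING_BACK)
        and features.dwell_time_s < first_duration_s
    ):
        return IntentLevel.THIRD
    return None  # combination not covered by steps 501 to 503


level = predict_intent(VisualBehavior(12.0, Trajectory.STRAIGHT_TOWARD_GATE, 0.0))
```

A deployment would need to decide how uncovered combinations are handled, for example by treating them as the third prediction level; the application leaves this open.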

[0117] In step 105 of some embodiments, after calculating the specific intent prediction result, the underlying hardware actuators can be linked to complete the final physical control closed loop. The gate control operation encompasses all control response actions taken for different intent prediction values. Based on the strength of the passage intention indicated by the gate intent prediction value, the system automatically decides the control strategy to be adopted, such as triggering the corresponding release mechanism or adopting an interception and denial mechanism. Simultaneously, based on the determined control strategy, corresponding control commands are generated to drive the gate body to move, and corresponding data interaction feedback is executed. This step ensures that the physical operation and control logic of the gate can strictly match the pedestrian's true intent, realizing the final implementation of intelligent gate control.

[0117] Please see Figure 6. In some embodiments, step 105 may include, but is not limited to, steps 601 to 602.

[0119] Step 601: When the gate passage intention prediction value is the first prediction level or the second prediction level, determine that the gate control operation is a release operation, generate and send the gate opening command, and send authorization data allowing contactless gate passage to the mobile terminal held by the target object.

[0120] Step 602: When the gate passage intention prediction value is the third prediction level, determine that the gate control operation is an interception operation, keep the gate closed, and send the indication data of refusing contactless gate passage to the mobile terminal.

[0121] In step 601 of some embodiments, when the gate passage intention prediction value is the first prediction level or the second prediction level, the gate control operation is determined to be a release operation. An opening command is generated and sent, and authorization data allowing contactless gate passage is sent to the mobile terminal held by the target object. Specifically, based on the determination of the preceding logic, both the first and second prediction levels indicate that the target object has a clear or potential genuine need for passage. When either of these two prediction values is obtained, the control logic triggers the corresponding release operation, which covers processing actions at both the physical execution layer and the data interaction layer. At the physical execution layer, a control level or data message is generated as an opening command, and this command is sent to the motor drive module to drive the gate to open automatically, providing the target object with a barrier-free passage. At the data interaction layer, authorization data allowing contactless gate passage is transmitted to the mobile terminal carried by the target object via a wireless communication link. This authorization data typically includes confirmation information of successful identity verification, electronic credentials for successful back-end payment deduction, or encryption keys allowing passage. Through this physical control and data communication processing mechanism, the passage process can be completed automatically and seamlessly after confirming the pedestrian's intention to pass.

[0122] In step 602 of some embodiments, when the predicted value of the intention to pass through the gate is the third prediction level, the gate control operation is determined to be an interception operation. The gate remains closed, and indication data refusing contactless passage is sent to the mobile terminal. Corresponding to the above-described release logic, the third prediction level indicates that the target object's behavioral trajectory does not conform to the passage characteristics. When a prediction value of this level is received, the control logic triggers a specific "interception operation." In this operation no opening command is generated, so the gate maintains its default closed state, thereby physically preventing unauthorized personnel from entering. Simultaneously, at the data interaction layer, "indication data for refusing contactless passage" is sent to the mobile terminal corresponding to the target object via a wireless communication link. This indication data informs the mobile terminal that the conditions for contactless automatic passage are not currently met. In actual hardware and software collaboration, after receiving this indication data, the mobile terminal can stop the current contactless authentication process, or, when the user does need to pass through the gate, use it as an underlying trigger condition to guide the user to actively adopt a physical proximity interaction method, such as Near Field Communication (NFC) card tapping, for downgraded verification. This design preserves fault tolerance and safety degradation pathways when the target's intention to pass through the gate cannot be confirmed.

[0123] In summary, steps 601 to 602 transform the pre-prediction results into physical mechanical actions and data status feedback. This not only achieves efficient and smooth seamless passage and automatic payment when the target object has genuine intent, but also proactively maintains physical obstruction and interrupts the seamless payment process when the target object's intent is unclear or unintentional, while providing clear data feedback to support security degradation of the interactive method. This effectively overcomes the shortcomings of existing technologies that frequently lead to gate erroneous opening and payment due to relying solely on spatial distance triggering. While ensuring the security and accuracy of channel management, it also considers the reliability of passage under various complex business scenarios.
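The mapping from prediction level to control operation in steps 601 and 602 can be illustrated as follows. The `GateAction` structure and the message strings are hypothetical placeholders for the opening command and the data sent to the mobile terminal; the application does not define their format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateAction:
    open_gate: bool          # True: generate and send an opening command
    terminal_message: str    # data sent to the mobile terminal over the wireless link


def decide_gate_action(prediction_level: int) -> GateAction:
    """First and second levels release the gate; the third level intercepts."""
    if prediction_level in (1, 2):
        return GateAction(open_gate=True, terminal_message="AUTHORIZE_CONTACTLESS_PASSAGE")
    if prediction_level == 3:
        return GateAction(open_gate=False, terminal_message="REFUSE_CONTACTLESS_PASSAGE")
    raise ValueError(f"unknown prediction level: {prediction_level}")


release = decide_gate_action(2)
intercept = decide_gate_action(3)
```

On receiving the refusal message, a terminal could fall back to NFC card tapping, which corresponds to the downgraded verification path described above.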

[0124] In summary, the gate intention recognition method provided in this application first introduces dual-modal data from wireless positioning and visual monitoring during the data acquisition stage, and then uses a spatiotemporal matching mechanism at the logical level to accurately associate the originally isolated heterogeneous data with the same target object. This breaks through the technical bottleneck of traditional single-distance sensors being susceptible to external environmental interference and unable to recognize human bodies. Subsequently, based on the multi-dimensional visual behavioral characteristics of the anchored object, in-depth analysis and intention prediction are performed, and finally, precise gate control operations are executed in a coordinated manner. This effectively filters out interference from complex unintentional behaviors such as crossing and turning back in the flow of people, improves the accuracy of gate intention recognition, and significantly reduces the risk of gates opening accidentally and deducting fees incorrectly while ensuring the intelligent management and control capabilities of the system.

[0125] Please see Figure 7, which is a schematic diagram of the gate system structure provided in this embodiment of the application. A gait monitoring camera is deployed directly above the gate channel (such as at the gantry or ceiling position). This camera is responsible for capturing the visual image of the target object in front of the channel in real time and extracting its spatial position and behavioral characteristics. On both sides of the front end of the physical gate device, the main module and the sub-module of the Star Flash positioning module are symmetrically installed. The main module, as the core calculation node, is mainly responsible for high-precision positioning calculation and data processing; the sub-module is responsible for auxiliary positioning. Together, they construct a wireless radio frequency sensing area covering a specific range in front of the gate, in order to capture in real time the wireless signal emitted by the Star Flash terminal carried by the target object and to calculate the spatial physical coordinates of that terminal. In terms of data link communication, a high-speed physical communication channel is established between the gait monitoring camera and the main module of the Star Flash positioning module through a dedicated connection line, ensuring that the visual monitoring position data and the wireless positioning position data can be converged and clock-aligned at the underlying level with extremely low latency.

[0126] Please refer to Figure 8. This application also provides a gate passage intention recognition device, which can implement the above-mentioned gate passage intention recognition method. The device includes:

[0127] The first acquisition module is used to acquire wireless positioning location data and visual monitoring location data within the target area;

[0128] The spatiotemporal matching module is used to perform spatiotemporal matching between the wireless positioning location data and the visual monitoring location data and, if the matching is successful, to determine that the wireless positioning location data and the visual monitoring location data belong to the same target object;

[0129] The second acquisition module is used to acquire the visual behavioral features of the target object;

[0130] The determination module is used to determine the predicted value of the target object's gate passage intention based on the visual behavioral characteristics;

[0131] The execution module is used to determine and execute the corresponding gate control operation based on the predicted value of the gate passage intention.
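The matching performed by the spatiotemporal matching module can be illustrated with a minimal sketch. It assumes, purely for illustration, that the spatial mapping relationship is a 2D similarity transform (scale, rotation, translation) and that timestamps are already clock-aligned. The names `Fix`, `to_visual`, and `is_same_target`, as well as the default window and distance threshold, are hypothetical and do not come from this application.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    """One position sample: timestamp in seconds plus planar coordinates."""
    t: float
    x: float
    y: float


def to_visual(fix, scale, theta, tx, ty):
    """Convert a positioning-module coordinate into the visual coordinate system.

    Assumption: the spatial mapping is a 2D similarity transform.
    """
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * fix.x - s * fix.y) + tx,
            scale * (s * fix.x + c * fix.y) + ty)


def is_same_target(wireless, visual, mapping, window_s=0.1, max_dist=0.5):
    """Return True when both samples share a time window and lie close together.

    window_s and max_dist are illustrative values, not values from the application.
    """
    # Temporal check: both samples must fall within the same time window.
    if abs(wireless.t - visual.t) > window_s:
        return False
    # Spatial check: compare the converted positioning coordinates
    # with the visual coordinates against the distance threshold.
    wx, wy = to_visual(wireless, *mapping)
    return math.hypot(wx - visual.x, wy - visual.y) < max_dist


mapping = (1.0, 0.0, 2.0, 0.0)  # scale, rotation (rad), x shift, y shift
tag = Fix(t=10.00, x=1.0, y=1.0)
person = Fix(t=10.05, x=3.2, y=1.1)
print(is_same_target(tag, person, mapping))
```

In a real deployment the mapping would be obtained by calibration, and a perspective (homography) model may be more appropriate than the similarity transform assumed here.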

[0132] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described gate passage intention recognition method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0133] Please refer to Figure 9, which illustrates the hardware structure of an electronic device according to another embodiment. The electronic device includes:

[0134] The processor 901 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0135] The memory 902 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 902 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this application are implemented through software or firmware, the relevant program code is stored in the memory 902 and is called and executed by the processor 901 to perform the gate passage intention recognition method of the embodiments of this application.

[0136] The input / output interface 903 is used to implement information input and output;

[0137] The communication interface 904 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

[0138] Bus 905 transmits information between various components of the device (e.g., processor 901, memory 902, input / output interface 903, and communication interface 904);

[0139] The processor 901, memory 902, input / output interface 903, and communication interface 904 are connected to each other within the device via bus 905.

[0140] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described gate passage intention recognition method.

[0141] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0142] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0143] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, which may include more or fewer steps than shown, combine certain steps, or use different steps.

[0144] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0145] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0146] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0147] It should be understood that in this application, "at least one" and "several" refer to one or more, and "multiple" refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0148] In the embodiments provided in this application, it should be understood that the disclosed systems and methods can be implemented in other ways. For example, the system embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. The couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0149] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0150] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0151] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0152] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A method for recognizing gate passage intention, characterized in that the method is applied to gate control equipment and includes:
acquiring wireless positioning data and visual monitoring location data within the target area;
performing spatiotemporal matching between the wireless positioning data and the visual monitoring data, including: obtaining the spatial mapping relationship between the positioning module coordinate system and the visual coordinate system; extracting the wireless positioning data and the visual monitoring data within the same time window; based on the spatial mapping relationship, converting the wireless positioning data to the visual coordinate system to obtain the corresponding converted positioning coordinates; calculating the spatial distance between the converted positioning coordinates and the visual coordinates indicated by the visual monitoring data; if the spatial distance is less than a preset distance threshold, determining that the matching is successful; and, if the matching is successful, determining that the wireless positioning data and the visual monitoring data belong to the same target object;
obtaining the visual behavior features of the target object, including: extracting skeletal key points and bounding box coordinates belonging to the target object from the visual monitoring screen; calculating the body orientation angle of the target object relative to the gate based on the distribution state of the skeletal key points; determining the movement trajectory of the target object based on the displacement change of the center point of the bounding box coordinates in consecutive video frames, and calculating the dwell time of the target object within a preset range from the gate; and combining the body orientation angle, the movement trajectory, and the dwell time as the visual behavior features;
determining, based on the visual behavior features, the predicted value of the target object's intention to pass through the gate, including: if the angle between the body's orientation and the gate's reference direction is less than a preset angle, and the movement trajectory is a straight line approaching the gate, determining the predicted value of the intention to pass through the gate to be a first prediction level; if the movement trajectory is across the front of the gate, and the dwell time reaches a set first duration threshold, determining the predicted value of the intention to pass through the gate to be a second prediction level; if the movement trajectory is across the front of the gate or repeatedly turns back, and the dwell time does not reach the first duration threshold, determining the predicted value of the intention to pass through the gate to be a third prediction level; wherein the first prediction level and the second prediction level represent an intention to allow passage, and the third prediction level represents an intention to refuse passage; and
determining and executing the corresponding gate control operation based on the predicted value of the gate passage intention.
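As an illustration only, the three-level decision rule recited in claim 1 can be sketched as follows. The enum names, the `classify` function, the default thresholds (30 degrees, 2 seconds), and the `UNDETERMINED` fallback for feature combinations the claim does not enumerate are all assumptions made for this sketch; the claim itself specifies only "preset" values.

```python
from enum import Enum


class Trajectory(Enum):
    STRAIGHT_APPROACH = "straight_approach"  # straight line approaching the gate
    CROSSING = "crossing"                    # moving across the front of the gate
    TURNING_BACK = "turning_back"            # repeatedly turning back


class Intent(Enum):
    FIRST = 1         # allow passage
    SECOND = 2        # allow passage
    THIRD = 3         # refuse passage
    UNDETERMINED = 0  # hypothetical fallback, not part of the claim


def classify(angle_deg, trajectory, dwell_s, max_angle_deg=30.0, min_dwell_s=2.0):
    """Map the visual behavior features to a prediction level.

    max_angle_deg and min_dwell_s stand in for the preset angle and the
    first duration threshold; their values here are illustrative.
    """
    if angle_deg < max_angle_deg and trajectory is Trajectory.STRAIGHT_APPROACH:
        return Intent.FIRST
    if trajectory is Trajectory.CROSSING and dwell_s >= min_dwell_s:
        return Intent.SECOND
    if trajectory in (Trajectory.CROSSING, Trajectory.TURNING_BACK) and dwell_s < min_dwell_s:
        return Intent.THIRD
    return Intent.UNDETERMINED


def allows_passage(intent):
    """The first and second prediction levels represent an intention to allow passage."""
    return intent in (Intent.FIRST, Intent.SECOND)


print(classify(10.0, Trajectory.STRAIGHT_APPROACH, 0.5))
print(classify(80.0, Trajectory.CROSSING, 3.0))
print(classify(80.0, Trajectory.TURNING_BACK, 0.5))
```

The rules are evaluated in the order in which the claim lists them, so a target that faces the gate and approaches in a straight line is assigned the first level regardless of dwell time.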

2. The gate passage intention recognition method according to claim 1, characterized in that, before extracting the wireless positioning location data and the visual monitoring location data within the same time window, the method further includes:
respectively acquiring the basic system times of the wireless positioning module that collects the wireless positioning location data and of the camera device that generates the visual monitoring location data;
using a preset time base as a reference, respectively calculating the time offsets of the wireless positioning module and of the camera device; and
aligning the timestamps of the collected wireless positioning data and visual monitoring data using the time offsets.
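The offset-based alignment of claim 2 can be sketched as below. The sketch assumes that each device's system time is sampled at the same instant as the preset time base and that the offset stays constant afterwards (no clock drift); the function names and sample values are hypothetical.

```python
def clock_offset(device_time, reference_time):
    """Offset of a device clock relative to the preset time base (seconds)."""
    return device_time - reference_time


def align(timestamps, offset):
    """Shift device timestamps onto the reference time base."""
    return [t - offset for t in timestamps]


reference_now = 1000.0  # preset time base, read at one instant
positioning_offset = clock_offset(1000.25, reference_now)  # module runs 0.25 s ahead
camera_offset = clock_offset(999.5, reference_now)         # camera runs 0.5 s behind

wireless_ts = align([1000.25, 1000.75], positioning_offset)
visual_ts = align([999.5, 1000.0], camera_offset)
print(wireless_ts, visual_ts)
```

Once both streams are expressed on the same time base, samples can be grouped into a common time window for matching.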

3. The gate passage intention recognition method according to claim 1, characterized in that the step of determining and executing the corresponding gate control operation based on the predicted value of the gate passage intention includes:
when the predicted value of the intention to pass through the gate is the first prediction level or the second prediction level, determining the gate control operation to be a release operation, generating and sending an opening command, and sending authorization data allowing contactless passage to the mobile terminal held by the target object; and
when the predicted value of the intention to pass through the gate is the third prediction level, determining the gate control operation to be an interception operation, keeping the gate closed, and sending instruction data refusing contactless passage to the mobile terminal.

4. The gate passage intention recognition method according to claim 1, characterized in that, after determining that the wireless positioning data and the visual monitoring location data belong to the same target object, the method further includes:
extracting the appearance and kinematic features of the target object in the current video frame;
based on the appearance and kinematic features, predicting the candidate bounding box of the target object in the next video frame; and
based on association matching between the candidate bounding box and the actual detection results, continuously updating the visual behavior features of the target object in consecutive video frames until the target object is identified as having passed through the gate channel.
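A simplified sketch of the frame-to-frame association in claim 4 follows. It substitutes a constant-velocity prediction and intersection-over-union (IoU) matching for the full appearance and kinematic model, which the claim does not detail; appearance features are omitted entirely. All function names and the `min_iou` threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_box(box, velocity):
    """Candidate box in the next frame under a constant-velocity assumption."""
    vx, vy = velocity
    return (box[0] + vx, box[1] + vy, box[2] + vx, box[3] + vy)


def associate(candidate, detections, min_iou=0.3):
    """Pick the detection that best overlaps the candidate box, or None."""
    best = max(detections, key=lambda d: iou(candidate, d), default=None)
    if best is None or iou(candidate, best) < min_iou:
        return None
    return best


current = (0.0, 0.0, 10.0, 20.0)
candidate = predict_box(current, velocity=(5.0, 0.0))
detections = [(6.0, 0.0, 16.0, 20.0), (50.0, 50.0, 60.0, 70.0)]
print(associate(candidate, detections))
```

When `associate` returns a detection, its bounding box center would extend the movement trajectory; when it returns `None`, a tracker would typically keep the prediction for a few frames before dropping the target.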

5. A gate passage intention recognition device, characterized in that the device is used in gate control equipment and includes:
a first acquisition module, used to acquire wireless positioning location data and visual monitoring location data within the target area;
a spatiotemporal matching module, used to perform spatiotemporal matching between the wireless positioning location data and the visual monitoring location data, including: obtaining the spatial mapping relationship between the positioning module coordinate system and the visual coordinate system; extracting the wireless positioning location data and the visual monitoring location data within the same time window; based on the spatial mapping relationship, converting the wireless positioning location data to the visual coordinate system to obtain the corresponding converted positioning coordinates; calculating the spatial distance between the converted positioning coordinates and the visual coordinates indicated by the visual monitoring location data; if the spatial distance is less than a preset distance threshold, determining that the matching is successful; and, if the matching is successful, determining that the wireless positioning location data and the visual monitoring location data belong to the same target object;
a second acquisition module, used to acquire the visual behavior features of the target object, including: extracting skeletal key points and bounding box coordinates belonging to the target object from the visual monitoring screen; calculating the body orientation angle of the target object relative to the gate based on the distribution state of the skeletal key points; determining the movement trajectory of the target object based on the displacement change of the center point of the bounding box coordinates in consecutive video frames, and calculating the dwell time of the target object within a preset range from the gate; and combining the body orientation angle, the movement trajectory, and the dwell time as the visual behavior features;
a determination module, used to determine the predicted value of the target object's intention to pass through the gate based on the visual behavior features, including: if the angle between the body's orientation and the gate's reference direction is less than a preset angle, and the movement trajectory is a straight line approaching the gate, determining the predicted value of the intention to pass through the gate to be a first prediction level; if the movement trajectory is across the front of the gate, and the dwell time reaches a set first duration threshold, determining the predicted value of the intention to pass through the gate to be a second prediction level; if the movement trajectory is across the front of the gate or repeatedly turns back, and the dwell time does not reach the first duration threshold, determining the predicted value of the intention to pass through the gate to be a third prediction level; wherein the first prediction level and the second prediction level represent an intention to allow passage, and the third prediction level represents an intention to refuse passage; and
an execution module, used to determine and execute the corresponding gate control operation based on the predicted value of the gate passage intention.

6. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a computer program, and the processor, when executing the computer program, implements the gate passage intention recognition method according to any one of claims 1 to 4.

7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the gate passage intention recognition method according to any one of claims 1 to 4.