Method and system for identifying gas station stay behavior trajectories based on MaxCompute

By using MaxCompute-based methods for identifying driver behavior trajectories at gas stations and employing techniques such as data preprocessing and the Haversine formula, the problem of misjudgment in existing driver behavior identification technologies has been solved. This approach enables accurate behavior pattern recognition and personalized marketing strategies, thereby improving data processing and operational efficiency.

CN119167013B · Active · Publication Date: 2026-06-23 · BEIJING BAILONG MAYUN TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING BAILONG MAYUN TECH CO LTD
Filing Date: 2024-08-29
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

In existing technologies, the method of determining whether a driver has passed through a gas station by the straight-line distance between the driver's location and the POI point of the gas station is prone to misjudgment, lacks accuracy and personalization, and leads to a crude marketing strategy.

Method used

A MaxCompute-based method for identifying drivers' behavior trajectories at gas stations is adopted. Through data preprocessing, trajectory smoothing and filling, distance calculation and filtering, and behavior pattern recognition and optimization, the method identifies drivers' entry, stay and departure behaviors at gas stations. The Haversine formula is used to calculate geographical distances, and the driver trajectory data is mapped to the H3 grid index system to filter out drivers near the POI.

Benefits of technology

It improves data processing efficiency and accuracy, enables refined behavioral pattern recognition, provides gas station operators with detailed driver behavior data, supports personalized services and marketing strategies, reduces operating costs, and improves operational efficiency.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN119167013B_ABST

Abstract

The application discloses a MaxCompute-based method and system for identifying the trajectories of drivers who stay at gas stations, and relates to the technical field of vehicle trajectory identification. Each day's driver location data is read from a database to form a dataset D, in which each point includes a timestamp t, a longitude l and a latitude d. Abnormal points in dataset D are filtered out using an acceleration threshold. For each POI, the driver trajectory points d' within the POI's own grid cell and the adjacent cells on each side are queried; the distance between each queried point d' and the POI is calculated, and the drivers located near the POI are selected. Through data thinning and abnormal-point filtering, the application effectively reduces redundant and erroneous data in the original dataset, lowers the complexity and computational load of subsequent processing, and improves overall data processing efficiency. The trajectory smoothing and filling technique makes each driver's trajectory more continuous and smooth, providing a basis for accurately identifying the driver's behavior pattern.

Description

Technical Field

[0001] This invention relates to the field of vehicle trajectory recognition technology, specifically to intelligent direction recommendation for refueling and charging POI locations, and particularly to a MaxCompute-based method and system for recognizing gas station stay behavior trajectories.

Background Technology

[0002] In the context of the rapid development of the automotive industry and new energy technologies, gas stations and charging stations, as key nodes for energy replenishment, directly impact the travel experience of car owners through their operational efficiency and service quality. While traditional marketing methods, such as advertising and promotional activities, can boost sales to some extent, these methods often lack precision and personalization.

[0003] To overcome this challenge, applications based on massive amounts of trajectory data have emerged as an innovative solution. Intelligent recommendations are achieved by precisely calculating whether drivers actually passed through points of interest (POIs) for refueling or charging. Operators can then develop more personalized marketing strategies by deeply analyzing drivers' driving routes and refueling habits. For example, they can recommend nearby gas stations to drivers who frequently travel in specific areas, or provide tailored services to drivers who prefer specific brands or types of fuel.

[0004] However, the traditional method determines whether a driver has passed through a gas station solely from the straight-line distance between the driver's location and the gas station's POI (Point of Interest). Because this strategy is crude (for example, a vehicle driving past on a nearby road can fall within the distance threshold without ever entering the station), it often produces a large number of false positives.

[0005] Therefore, this invention proposes a MaxCompute-based method and system for identifying drivers' stay behavior trajectories at gas stations.

Summary of the Invention

[0006] In view of this, the present invention aims to provide a MaxCompute-based method and system for identifying gas station stay behavior trajectories, in order to solve or alleviate the technical problems in the prior art, namely how to analyze drivers' behavior patterns and then efficiently filter each POI and the driver points around it so as to reduce the amount of data processed, or at least to provide a useful alternative. The technical solution of the present invention is implemented as follows:

[0007] First, a MaxCompute-based method for recognizing the trajectories of drivers who stay at gas stations:

[0008] (I) Overview:

[0009] This invention aims to achieve efficient identification and screening of driver behavior patterns through data processing and analysis technologies. First, data preprocessing and trajectory smoothing and filling techniques are used to clean and optimize the raw driver location data, improving the accuracy and reliability of subsequent analysis. Distance calculation and screening methods are then used to further filter drivers located near specific POIs (such as gas stations), providing foundational data for behavior pattern recognition. Behavior pattern recognition and optimization technologies accurately identify drivers' entry and exit behaviors at gas stations, breaking down continuous trajectories into independent entry, stay, and departure events. By aggregating and analyzing these events, indicators such as the daily number of drivers passing through each gas station, the daily number of drivers staying at each station, and the stay time for each driver are calculated, providing gas station operators with precise data support and personalized service and marketing strategies.

[0010] (II) Technical Solution:

[0011] 2.1 Step S1, Data Preprocessing:

[0012] The daily driver location data is read from the database to form a dataset D, which includes timestamp t, longitude l, and latitude d;

[0013] Anomalies in dataset D are filtered out using an acceleration threshold to ensure data accuracy and reliability.

[0014] 2.1.1 Step S100, Read data:

[0015] Daily driver location data is read from the database to form dataset D:

[0016] $D=\{(t_i,l_i,d_i)\mid i=1,2,\dots,n\}$;

[0017] where $t_i$ is the i-th timestamp, $l_i$ is the i-th longitude, and $d_i$ is the i-th latitude.

[0018] 2.1.2 Step S101, Data thinning:

[0019] A data thinning algorithm is applied to reduce the amount of data in dataset D. The data thinning algorithm is a pre-defined filtering strategy used to remove redundant or unnecessary data points.

[0020] 2.1.3 Step S102, Acceleration threshold filtering:

[0021] Calculate the velocity change between adjacent data points, then estimate the acceleration a and compare it with a threshold T. If $|a|>T$, the point is treated as an outlier and filtered out.
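Step S102 leaves open how velocity and acceleration are derived and which point of an offending pair is discarded. A minimal sketch, assuming each point is a (t, l, d) tuple with t in seconds, speeds come from the Haversine distance between points, and each candidate is compared with the last point that was kept (our own greedy choice, so that a single GPS spike removes only that spike and not its successor). The names `haversine_m` and `filter_by_acceleration` and the threshold value are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # r = 6371 km, as in step S300

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def filter_by_acceleration(points, threshold):
    """Drop points whose implied acceleration exceeds `threshold` (m/s^2).

    points: list of (t_seconds, lon, lat) for one driver, sorted by t.
    Each candidate is compared with the last *kept* point.
    """
    if not points:
        return []
    kept = [points[0]]
    prev_speed = None  # speed (m/s) on the segment ending at kept[-1]
    for t, lon, lat in points[1:]:
        t0, lon0, lat0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(lon0, lat0, lon, lat) / dt
        # The first segment only establishes a baseline speed.
        if prev_speed is not None and abs(speed - prev_speed) / dt > threshold:
            continue  # |a| > T: treat as an outlier
        kept.append((t, lon, lat))
        prev_speed = speed
    return kept

track = [(0, 120.0, 30.0), (10, 120.0, 30.0009), (20, 120.0, 30.1),
         (30, 120.0, 30.0027), (40, 120.0, 30.0036)]
print([p[0] for p in filter_by_acceleration(track, threshold=5.0)])  # [0, 10, 30, 40]
```

The point at t = 20 jumps about 11 km in 10 s and is dropped; the point at t = 30 is then compared with t = 10, giving a normal speed of roughly 10 m/s.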

[0022] 2.2 Step S2, Trajectory Smoothing and Filling:

[0023] Windowing is applied to dataset D; for missing trajectory points within the time window, spline interpolation is used to fill in the missing coordinates.

[0024] 2.2.1 Step S200, window opening process:

[0025] For dataset $D=\{(t_i,l_i,d_i)\mid i=1,2,\dots,n\}$ and a set time window size Δt, the part of dataset D that falls within the time window corresponding to each data point $(t_i,l_i,d_i)$, denoted $D_i$, is represented as:

[0026] ;

[0027] 2.2.2 Step S201: Fill missing coordinates using spline interpolation:

[0028] Missing trajectory points within time window $D_i$ are filled using spline interpolation.

[0029] Suppose there are m known data points within a time window, and they are sorted by time as follows:

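[0030] $(t_{i1},l_{i1},d_{i1}),\ (t_{i2},l_{i2},d_{i2}),\ \dots,\ (t_{im},l_{im},d_{im})$, with $t_{i1}<t_{i2}<\dots<t_{im}$;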

[0031] For a missing time point $t_{ij}'$ (where $t_{ij}<t_{ij}'<t_{i,j+1}$), the corresponding longitude $l_{ij}'$ and latitude $d_{ij}'$ are calculated using cubic spline interpolation, so that:

[0032] $(l_{ij}',d_{ij}')=S(t_{ij}')$;

[0033] where $S(t)$ is a polynomial on each subinterval $[t_{ik},t_{i,k+1}]$ and satisfies smoothness conditions (e.g., continuity or differentiability) over the entire interval.
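Step S201 does not state the spline's boundary conditions or whether longitude and latitude are interpolated together. A minimal sketch, assuming a natural cubic spline (second derivative zero at both ends) fitted separately to longitude and latitude as functions of time over the known points of one window, and filling only times strictly inside the window. The names `natural_cubic_spline` and `fill_missing` are hypothetical; in practice `scipy.interpolate.CubicSpline(ts, values, bc_type='natural')` yields the same interpolant, and the pure-Python version here only makes the mechanics visible.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return S(t) interpolating (xs, ys) with a natural cubic spline.

    xs must be strictly increasing with at least 2 points; S'' = 0 at both ends.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives S''(xs[i]); m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1]
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):  # Thomas algorithm: forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def s(t):
        # Cubic polynomial on the subinterval [xs[k], xs[k+1]] containing t
        k = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        x0, x1, hk = xs[k], xs[k + 1], h[k]
        return (m[k] * (x1 - t) ** 3 / (6 * hk) + m[k + 1] * (t - x0) ** 3 / (6 * hk)
                + (ys[k] / hk - m[k] * hk / 6) * (x1 - t)
                + (ys[k + 1] / hk - m[k + 1] * hk / 6) * (t - x0))
    return s

def fill_missing(known, missing_times):
    """known: (t, lon, lat) points of one window, sorted by t.

    Returns interpolated (t, lon, lat) for missing times strictly inside the window.
    """
    ts = [p[0] for p in known]
    s_lon = natural_cubic_spline(ts, [p[1] for p in known])
    s_lat = natural_cubic_spline(ts, [p[2] for p in known])
    return [(t, s_lon(t), s_lat(t)) for t in missing_times if ts[0] < t < ts[-1]]

known = [(0, 120.0, 30.0), (60, 120.001, 30.0005), (180, 120.003, 30.0015)]
print(fill_missing(known, [120, 200]))  # t = 200 lies outside the window and is skipped
```

Restricting filling to the window's interior avoids extrapolating the spline, which tends to diverge quickly beyond the last known point.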

[0034] 2.3 Step S3, Distance Calculation and Filtering:

[0035] The Haversine formula is used to calculate the actual distance d between any two geographic coordinate points in dataset D and the distance d' between each driver trajectory point and the POI. Then, based on the drivers' dispatch records, trajectory data of drivers who did not drive that day are filtered out.

[0036] 2.3.1 Step S300, calculate distance d:

[0037] For any two geographic coordinate points $(l_i,d_i)$ and $(l_j,d_j)$ in dataset D, the Haversine formula is used to calculate the actual distance d between them:

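[0038] $d=2r\arcsin\left(\sqrt{\sin^2\left(\frac{d_j-d_i}{2}\right)+\cos d_i\cos d_j\sin^2\left(\frac{l_j-l_i}{2}\right)}\right)$, with longitudes and latitudes in radians;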

[0039] Where r is the Earth's radius, which is 6371 kilometers.

[0040] The distance d' between each driver trajectory point (t', l', d') and the POI is calculated with the same Haversine formula.
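A minimal Python rendering of the Haversine formula in step S300, assuming coordinates in degrees in the document's (l, d) = (longitude, latitude) order and r = 6371 km; the function name `haversine_km` is ours.

```python
import math

EARTH_RADIUS_KM = 6371.0  # r in step S300

def haversine_km(l_i, d_i, l_j, d_j):
    """Distance in km between (l_i, d_i) and (l_j, d_j), given as longitude, latitude in degrees."""
    lam_i, lam_j = math.radians(l_i), math.radians(l_j)
    phi_i, phi_j = math.radians(d_i), math.radians(d_j)
    a = (math.sin((phi_j - phi_i) / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin((lam_j - lam_i) / 2) ** 2)
    # min() guards asin against rounding that pushes sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

print(round(haversine_km(120.0, 30.0, 120.0, 31.0), 2))  # 111.19
```

For two points one degree of latitude apart on the same meridian the formula reduces to r·π/180, about 111.19 km, which is a convenient sanity check.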

[0041] 2.3.2 Step S301: Filter the trajectory data of drivers who did not drive:

[0042] Let R be the set of driver dispatch records, where each element r is a time range representing the driver's dispatch time. For each driver trajectory point (t', l', d'), check whether there exists an r ∈ R whose time range contains t'. If no such r exists, the trajectory point is filtered out; otherwise, it is retained. The filtered dataset is denoted D' and contains only the trajectory data of drivers who drove on that day.
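A minimal sketch of step S301, assuming dispatch records are held per driver as (start, end) pairs with inclusive bounds, and that each trajectory point carries a driver identifier (the patent's (t', l', d') tuple does not include one). The function name `filter_by_dispatch` and the data layout are hypothetical.

```python
def filter_by_dispatch(points, dispatch):
    """Keep only points whose timestamp lies inside one of the driver's dispatch ranges.

    points:   iterable of (driver_id, t, lon, lat)
    dispatch: dict mapping driver_id -> list of (start, end) time ranges (the set R)
    """
    return [
        p for p in points
        if any(start <= p[1] <= end for start, end in dispatch.get(p[0], ()))
    ]

dispatch = {1: [(100, 200)], 2: []}
points = [(1, 150, 120.5, 30.5), (1, 250, 120.5, 30.5),
          (2, 150, 120.6, 30.6), (3, 150, 120.6, 30.6)]
print(filter_by_dispatch(points, dispatch))  # [(1, 150, 120.5, 30.5)]
```

Drivers with no records at all (driver 3) or an empty record list (driver 2) lose every point, which matches the rule that a point survives only if some r contains t'.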

[0043] 2.4 Step S4, Matching POI with Driver Location:

[0044] For each POI (Point of Interest), query the driver trajectory points d' within its own grid and the adjacent grids on each side. Calculate the distance between each queried driver trajectory point d' and the POI, and select the drivers located near the POI.

[0045] 2.4.1 Step S400, mapping to the H3 grid index system:

[0046] The POI point $(l_{POI},d_{POI})$ and the driver trajectory points $(l',d')$ are mapped to the H3 level 7 grid indexing system to obtain the corresponding grid indices $h_{POI}$ and $h'$.

[0047] 2.4.2 Step S401: Query driver trajectory points within adjacent grids:

[0048] For each POI location, query the driver trajectory points $(l',d')$ within its own grid $h_{POI}$ and the adjacent grids on each side. With the set of adjacent grids denoted $N(h_{POI})$, the query condition is:

[0049] $h'\in\{h_{POI}\}\cup N(h_{POI})$;
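The method uses H3 level 7 cells. To keep the sketch free of third-party dependencies, it substitutes a plain square longitude/latitude grid (a swapped-in technique, not H3): each point is bucketed by `floor(coordinate / cell size)`, and the query reads the POI's own cell plus its 8 surrounding cells. With the h3-py library (v4), the analogous calls would be `h3.latlng_to_cell(lat, lng, 7)` to index a point and `h3.grid_disk(h_poi, 1)` to obtain the cell together with its 6 hexagonal neighbours. The cell size and all function names below are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size in degrees; stand-in for H3 level 7 cells

def cell_of(lon, lat, size=CELL_DEG):
    """Square-grid cell index of a (longitude, latitude) point."""
    return (math.floor(lon / size), math.floor(lat / size))

def cell_and_neighbours(cell):
    """The cell itself plus its 8 surrounding cells: the analogue of {h_POI} ∪ N(h_POI)."""
    cx, cy = cell
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def build_index(points):
    """Bucket (driver_id, lon, lat) points by their cell h'."""
    index = defaultdict(list)
    for point in points:
        index[cell_of(point[1], point[2])].append(point)
    return index

def candidates_near(index, poi_lon, poi_lat):
    """All indexed points whose cell h' lies in the POI's cell or one of its neighbours."""
    found = []
    for cell in cell_and_neighbours(cell_of(poi_lon, poi_lat)):
        found.extend(index.get(cell, []))
    return found

points = [("A", 120.506, 30.506), ("B", 120.515, 30.505), ("C", 120.6, 30.6)]
index = build_index(points)
print(sorted(p[0] for p in candidates_near(index, 120.505, 30.505)))  # ['A', 'B']
```

Because the index is built once, each POI query touches only a handful of buckets instead of scanning every trajectory point, which is the data-reduction effect the method attributes to spatial indexing.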

[0050] 2.4.3 Step S402: Calculate distance and filter drivers:

[0051] For each retrieved driver trajectory point $(l',d')$, calculate its distance $d_{POI}'$ to the POI point $(l_{POI},d_{POI})$ using the Haversine formula described in step S3.

[0052] Drivers located near the POI are selected using a distance threshold θ, i.e., those satisfying $d_{POI}'\le\theta$;

[0053] The distance threshold θ is a distance range set according to actual needs and defines what counts as "nearby" for the specific business scenario.
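A minimal sketch of step S402, assuming the candidates from step S401 arrive as (driver_id, lon, lat) tuples and θ is given in metres; a driver counts as "nearby" if any one of their points satisfies $d_{POI}'\le\theta$ (our reading of the selection rule). The names `haversine_m` and `drivers_near_poi` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # r = 6371 km

def haversine_m(lon1, lat1, lon2, lat2):
    """Haversine distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def drivers_near_poi(candidates, poi_lon, poi_lat, theta_m):
    """IDs of drivers with at least one candidate point satisfying d'_POI <= theta."""
    return {
        driver_id
        for driver_id, lon, lat in candidates
        if haversine_m(lon, lat, poi_lon, poi_lat) <= theta_m
    }

candidates = [("A", 120.0, 30.001), ("B", 120.0, 30.01)]
print(drivers_near_poi(candidates, 120.0, 30.0, theta_m=200.0))  # {'A'}
```

Driver A is about 111 m from the POI and driver B about 1.1 km, so only A passes a 200 m threshold.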

[0054] 2.5 Step S5, Behavioral Pattern Recognition and Optimization:

[0055] Identify trajectory patterns in which a driver enters the POI range from outside, stops within it, and leaves it. Then break down continuous trajectories into independent entry, stop, and departure events, mark them, and record the relevant timestamps and location information.

[0056] 2.5.1 Step S500: Define the driver's trajectory data as a series of timestamps and location points:

[0057] $\{(t_1,loc_1),(t_2,loc_2),\dots,(t_n,loc_n)\}$

[0058] where $t_i$ is the i-th timestamp and $loc_i$ is the i-th location point (including longitude and latitude).

[0059] A Point of Interest (POI) can be defined as an area represented by a center point and a radius, or as a polygon, such as:

[0060] POI = (center, radius);

[0061] Here, center and radius are the center point and radius, respectively.

[0062] 2.5.2 Step S501, execute the judgment mechanism:

[0063] (1) Entry Event: When the driver's location enters the POI area for the first time from outside the POI area, an entry event is recorded, and the condition statement is as follows:

[0064] if distance(loc_i, center) ≤ radius and distance(loc_{i-1}, center) > radius then Event_enter = (t_i, loc_i);

[0065] (2) Dwell Event: When the driver's location remains within the POI range for a period of time, a dwell event is recorded. A time threshold can be set to determine dwell time, as shown in the following conditional statement:

[0066] if distance(loc_i, center) ≤ radius for Δt ≥ threshold_stay then Event_stay = [(t_start, loc_start), (t_end, loc_end)];

[0067] where Δt is the time the driver stays within the POI range, and threshold_stay is the time threshold for determining a stay.

[0068] (3) Departure Event: When the driver's location leaves the POI range, a departure event is recorded, with the following conditional statement:

[0069] if distance(loc_i, center) > radius and distance(loc_{i-1}, center) ≤ radius then Event_leave = (t_i, loc_i);

[0070] Finally, these events are tagged and their timestamps and location information are recorded. This allows continuous trajectory data to be broken down into independent entry, stay, and departure events, enabling subsequent analysis and optimization.
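The three conditional statements above can be combined into a single pass over one driver's track. A minimal sketch, assuming a circular POI (center, radius) with the radius in metres, timestamps in seconds, and a stay duration measured from the first to the last inside point of a visit. Following the entry rule literally, a track that starts inside the POI produces no entry event; a track that ends inside still produces a stay event if the visit was long enough (our assumption). The names `split_events` and `haversine_m` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # r = 6371 km

def haversine_m(lon1, lat1, lon2, lat2):
    """Haversine distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def split_events(track, center, radius_m, stay_threshold_s):
    """Split one driver's track into enter / stay / leave events for a circular POI.

    track: list of (t_seconds, lon, lat) sorted by t; center: (lon, lat).
    Returns a time-ordered list of (kind, payload) tuples.
    """
    events = []
    run = []            # consecutive inside points of the current visit
    prev_inside = None  # unknown before the first point

    def close_run():
        # Stay event if the visit lasted at least threshold_stay
        if run and run[-1][0] - run[0][0] >= stay_threshold_s:
            events.append(("stay", (run[0], run[-1])))
        run.clear()

    for t, lon, lat in track:
        inside = haversine_m(lon, lat, center[0], center[1]) <= radius_m
        if inside:
            if prev_inside is False:  # outside -> inside
                events.append(("enter", (t, lon, lat)))
            run.append((t, lon, lat))
        elif prev_inside:  # inside -> outside
            close_run()
            events.append(("leave", (t, lon, lat)))
        prev_inside = inside
    close_run()  # track ended while still inside the POI
    return events

track = [(0, 120.0, 30.002), (60, 120.0, 30.0), (120, 120.0, 30.0002),
         (400, 120.0, 30.0001), (460, 120.0, 30.002)]
for kind, payload in split_events(track, (120.0, 30.0), radius_m=100.0, stay_threshold_s=300):
    print(kind, payload)
```

Here the driver enters at t = 60, remains inside until t = 400 (340 s, above the 300 s threshold), and leaves at t = 460, so the output is an entry, a stay spanning t = 60 to t = 400, and a departure.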

[0071] 2.6 Step S6, Gas Station Indicator Aggregation and Output:

[0072] Based on the split and tagged events, calculate the number of drivers passing through each gas station daily, the number of drivers stopping at each gas station daily, and the time each driver stays at the gas station.

[0073] 2.6.1 Step S600, calculate the number of drivers passing through each day:

[0074] For each gas station j, the daily number of drivers passing through, $N_{pass,j}$, is obtained by counting the independent events in which a driver passes through that gas station's area. Both entry and departure events count, but each driver must be counted only once.

[0075] This can be achieved by iterating through the event list and counting the unique identifiers of each driver.

[0076] 2.6.2 Step S601, calculate the number of drivers staying per day:

[0077] The number of drivers staying per day, $N_{stay,j}$, is the number of distinct drivers who had a stop event within the vicinity of the gas station.

[0078] This can be achieved by iterating through the event list and counting the unique identifiers of drivers with stop events.
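Steps S600 and S601 share the same mechanism: collect the distinct driver identifiers per station among events of the relevant kinds. A minimal sketch, assuming one day's events have already been reduced to (driver_id, station_id, kind) tuples with kind in {"enter", "stay", "leave"}; the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

def unique_driver_counts(events, kinds):
    """Count distinct drivers per station among events whose kind is in `kinds`.

    events: iterable of (driver_id, station_id, kind) for one day.
    """
    drivers = defaultdict(set)
    for driver_id, station_id, kind in events:
        if kind in kinds:
            drivers[station_id].add(driver_id)  # a set ignores repeat visits
    return {station_id: len(ids) for station_id, ids in drivers.items()}

events = [
    (7, 1, "enter"), (7, 1, "stay"), (7, 1, "leave"),
    (8, 1, "enter"), (8, 1, "leave"),
    (9, 2, "enter"),
]
n_pass = unique_driver_counts(events, {"enter", "leave"})  # {1: 2, 2: 1}
n_stay = unique_driver_counts(events, {"stay"})            # {1: 1}
print(n_pass, n_stay)
```

Stations with no matching events are simply absent from the result; a caller that needs explicit zeros can merge the dictionary with the full station list.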

[0079] 2.6.3 Step S602: Calculate the time each driver spends at the gas station:

[0080] For each driver i with a stop event, the time $T_{stay,i,j}$ spent at gas station j is obtained as the difference between the end and start times of the driver's stop event:

[0081] $T_{stay,i,j}=t_{end,i,j}-t_{start,i,j}$;

[0082] where $t_{start,i,j}$ and $t_{end,i,j}$ are the start and end times of the event in which driver i stops at gas station j.
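A minimal sketch of step S602, assuming each stop event has been reduced to (driver_id, station_id, t_start, t_end) with `datetime` timestamps. The method defines $T_{stay,i,j}$ per event; summing several stops by the same driver at the same station on one day is our own assumption, as is the name `stay_durations`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def stay_durations(stay_events):
    """Total stay time per (driver_id, station_id).

    stay_events: iterable of (driver_id, station_id, t_start, t_end) datetimes.
    """
    totals = defaultdict(timedelta)  # timedelta() == zero duration
    for driver_id, station_id, t_start, t_end in stay_events:
        totals[(driver_id, station_id)] += t_end - t_start  # T_stay,i,j = t_end - t_start
    return dict(totals)

stays = [
    (7, 1, datetime(2023, 1, 1, 8, 0), datetime(2023, 1, 1, 8, 12)),
    (7, 1, datetime(2023, 1, 1, 18, 0), datetime(2023, 1, 1, 18, 5)),
]
print(stay_durations(stays))  # 17 minutes in total for driver 7 at station 1
```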

[0083] (III) Mechanisms for solving technical problems:

[0084] By mapping Points of Interest (POIs) and driver locations to grid indexing systems such as H3, spatial indexing techniques are used to quickly locate driver locations around each POI. By calculating the distance between driver locations and POIs, the drivers located near POIs are selected and all other points are discarded, reducing unnecessary data processing.

[0085] Continuous trajectory data is broken down into independent entry, dwell, and departure events, and each event is labeled, with relevant timestamps and location information recorded. Statistical analysis of the split and labeled events improves the accuracy and efficiency of identification.

[0086] Second, a MaxCompute-based system for recognizing the trajectories of drivers who stay at gas stations:

[0087] The system includes a processor and a memory connected to the processor. The memory stores program instructions which, when executed by the processor, cause the processor to perform the behavior trajectory recognition method described above. The processor is connected to the following modules:

[0088] (1) A data preprocessing module responsible for reading raw driver location data from the database, including key information such as timestamps, longitude, and latitude. Simultaneously, the data is thinned to reduce the data volume and lower the complexity of subsequent processing. Furthermore, methods such as acceleration thresholds are used to filter out outlier locations, ensuring the accuracy and reliability of the data.

[0089] The output is a cleaned and optimized dataset of driver locations.

[0090] (2) Trajectory smoothing and filling module for further processing of preprocessed driver trajectory data: including windowing of trajectory data, and then spline interpolation to fill missing coordinates for missing trajectory points within the time window, making the trajectory more continuous and smooth.

[0091] The output is smoothed and padded driver trajectory data.

[0092] (3) Distance calculation and spatial indexing module, for calculating the actual distance between driver trajectory points: it calculates distances with geographic methods such as the Haversine formula, maps the POI points and driver locations to grid indexing systems such as H3, uses spatial indexing to quickly find the driver points around each POI, and selects the drivers located near the POI.

[0093] The output is a list of driver locations near each POI and their distances from the POI.

[0094] (4) Behavioral pattern recognition and optimization module for mining driver trajectory data: Identify the driver's behavioral patterns of entering from outside the POI range, staying within the POI range, and leaving the POI range. At the same time, the continuous trajectory data is split into independent entry events, stay events, and departure events, and each event is marked and recorded.

[0095] The output is a split and labeled dataset of driver behavior events.

[0096] (5) Gas station indicator aggregation and output module: Based on the split and labeled events, calculate key indicators such as the number of drivers passing through each gas station daily, the number of drivers staying at each station daily, and the dwell time of each driver. Then, output the calculation results to the database or reports for gas station operators to analyze and make decisions.

[0097] The output is a dataset of gas station operation metrics and a visualization report.

[0098] Compared with the prior art, the beneficial effects of the present invention are:

[0099] I. Improving Data Processing Efficiency and Accuracy: This invention effectively reduces redundant and erroneous data in the original dataset through data thinning and outlier filtering, thereby reducing the complexity and computational load of subsequent processing and improving overall data processing efficiency. Trajectory smoothing and filling techniques make the driver's driving trajectory more continuous and smooth, providing a fundamental guarantee for accurately identifying driver behavior patterns.

[0100] II. Achieving Refined Behavioral Pattern Recognition: This invention can accurately identify driver behavior patterns when entering from outside the POI area, staying within the POI area, and leaving the POI area, providing gas station operators with detailed driver behavior data. Continuous trajectories are broken down into independent entry, stay, and departure events, and each event is marked and recorded, making the analysis of driver behavior more refined and in-depth.

[0101] III. Providing Personalized Service and Marketing Strategy Support: This invention provides comprehensive data support to gas station operators by calculating indicators such as the daily number of drivers passing through each gas station, the daily number of drivers stopping at each station, and the duration of each driver's stay. This enables them to make data-driven decisions and optimizations. Based on driver behavior patterns and customer traffic at gas stations, gas station operators can develop personalized service and marketing strategies to improve customer satisfaction and the competitiveness of their gas stations.

[0102] IV. Improved Operational Efficiency and Reduced Costs: This invention automates data processing and analysis, reducing manual intervention and errors, and improving operational efficiency. By optimizing data processing workflows and improving analytical accuracy, this invention helps gas station operators reduce operating costs and increase profitability.

Attached Figure Description

[0103] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0104] Figure 1 is a schematic diagram of the method flow of the present invention;

[0105] Figure 2 is a schematic diagram of the "driver trajectory" described in Embodiment 1 of the present invention;

[0106] Figure 3 is a schematic diagram of the system composition of the present invention.

Detailed Implementation

[0107] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can be practiced in many other ways different from those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed below;

[0108] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to the method section.

[0109] Example 1: As shown in Figures 1-2, this embodiment describes the application of the MaxCompute-based gas station behavior trajectory recognition method in a ride-hailing platform, comprising the following process:

[0110] In this embodiment, regarding step S1, data preprocessing involves reading daily driver location data from the database to form a dataset D, including timestamp t, longitude l, and latitude d; and using an acceleration threshold to filter out abnormal locations in dataset D to ensure the accuracy and reliability of the data.

[0111] Specifically, in step S100, data is read: daily driver location data is read from the ride-hailing platform's database to form a dataset D, where each data point contains a timestamp t, longitude l, and latitude d.

[0112] $D=\{(t_i,l_i,d_i)\mid i=1,2,\dots,n\}$;

[0113] where $t_i$ is the i-th timestamp, $l_i$ is the i-th longitude, and $d_i$ is the i-th latitude.

[0114] Specifically, in step S101, data thinning: a data thinning algorithm is applied to reduce the amount of data in dataset D. The data thinning algorithm is a preset filtering strategy to remove redundant or unnecessary data points.

[0115] For example, time interval sampling or road network matching algorithms can be applied to reduce the amount of data in dataset D and remove redundant or unnecessary data points.
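Time-interval sampling, one of the thinning options named above, can be sketched in a few lines. This assumes one driver's points are (t, l, d) tuples sorted by t in seconds, and keeps a point only if at least `min_interval_s` seconds have passed since the last kept point; the function name and interval value are hypothetical. Road-network matching, the other option mentioned, is not shown.

```python
def thin_by_interval(points, min_interval_s):
    """Keep the first point, then only points at least min_interval_s after the last kept one.

    points: list of (t_seconds, lon, lat) for one driver, sorted by t.
    """
    kept = []
    for point in points:
        if not kept or point[0] - kept[-1][0] >= min_interval_s:
            kept.append(point)
    return kept

track = [(0, 120.0, 30.0), (5, 120.0, 30.0001), (10, 120.0, 30.0002),
         (12, 120.0, 30.0003), (30, 120.0, 30.0006)]
print([p[0] for p in thin_by_interval(track, 10)])  # [0, 10, 30]
```

Measuring the gap from the last kept point rather than from the previous raw point guarantees a minimum spacing in the output even when the raw sampling rate is irregular.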

[0116] Specifically, in step S102, acceleration threshold filtering: the velocity change between adjacent data points is calculated, and then the acceleration is estimated. If the absolute value of the acceleration 'a' is greater than a preset threshold T, the data point is filtered out to ensure the accuracy and reliability of the data.

[0117] In this embodiment, regarding step S2, trajectory smoothing and filling: the dataset D is windowed; for missing trajectory points within the time window, spline interpolation is used to fill the missing coordinates.

[0118] Specifically, in step S200, windowing: a time window size Δt is set, and for each data point in dataset D the dataset $D_i$ within its time window is constructed. For dataset $D=\{(t_i,l_i,d_i)\mid i=1,2,\dots,n\}$ and the set time window size Δt, the part of D that falls within the time window corresponding to each data point $(t_i,l_i,d_i)$, denoted $D_i$, is represented as:

[0119] ;

[0120] Specifically, in step S201, spline interpolation is used to fill in missing coordinates: missing trajectory points within time window $D_i$ are filled using spline interpolation.

[0121] For example, suppose there are m known data points within a time window, and they are sorted by time as follows:

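[0122] $(t_{i1},l_{i1},d_{i1}),\ (t_{i2},l_{i2},d_{i2}),\ \dots,\ (t_{im},l_{im},d_{im})$, with $t_{i1}<t_{i2}<\dots<t_{im}$;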

[0123] For a missing time point $t_{ij}'$ (where $t_{ij}<t_{ij}'<t_{i,j+1}$), the corresponding longitude $l_{ij}'$ and latitude $d_{ij}'$ are calculated using cubic spline interpolation, so that:

[0124] $(l_{ij}',d_{ij}')=S(t_{ij}')$;

[0125] where $S(t)$ is a polynomial on each subinterval $[t_{ik},t_{i,k+1}]$ and satisfies smoothness conditions (e.g., continuity or differentiability) over the entire interval.

[0126] In this embodiment, regarding step S3, distance calculation and filtering: the actual distance d between any two geographic coordinate points in dataset D and the distance d' between each driver trajectory point and the gas station are calculated with the Haversine formula. Then, based on the drivers' dispatch records, trajectory data of drivers who did not drive that day are filtered out.

[0127] Specifically, in step S300, the distance d is calculated: the actual distance d between any two geographic coordinate points in dataset D is calculated using the Haversine formula, as well as the distance d' between the driver's trajectory point and the gas station.

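[0128] $d=2r\arcsin\left(\sqrt{\sin^2\left(\frac{d_j-d_i}{2}\right)+\cos d_i\cos d_j\sin^2\left(\frac{l_j-l_i}{2}\right)}\right)$, with longitudes and latitudes in radians;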

[0129] Where r is the Earth's radius, which is 6371 kilometers.

[0130] The distance d' between each driver trajectory point (t', l', d') and the gas station is calculated with the same Haversine formula.

[0131] Specifically, in step S301, the trajectory data of drivers who did not drive are filtered out based on the drivers' driving records, and only the trajectory data of drivers who did drive are retained.

[0132] For example, suppose the driver's dispatch records form a set R, where each element r is a time range representing the driver's dispatch time. For each driver trajectory point (t', l', d'), check whether there exists an r whose time range contains t'. If no such r exists, the trajectory point is filtered out; otherwise, it is retained. The filtered dataset is denoted D' and contains only the trajectory data of drivers who drove on that day.

[0133] In this embodiment, regarding step S4, matching POIs with driver locations: for each POI point, query the driver trajectory points d' within its own grid and the adjacent grids on each side. Calculate the distance between each queried driver trajectory point d' and the POI, and select the drivers located near the POI.

[0134] Specifically, in step S400, mapping to the H3 grid index system: the gas station locations and driver trajectory points are mapped to the H3 level 7 grid index system to obtain the corresponding grid indices $h_{POI}$ and $h'$.

[0135] Specifically, in step S401, query the driver trajectory points within adjacent grids: for each gas station location, query the driver trajectory points within its own grid and the adjacent grids on each side. With the set of adjacent grids denoted $N(h_{POI})$, the query condition is:

[0136] $h'\in\{h_{POI}\}\cup N(h_{POI})$;

[0137] Specifically, in step S402, calculate the distance and select drivers: calculate the distance between each queried driver trajectory point and the gas station location, and use the distance threshold θ to select the drivers located near the gas station.

[0138] For example, for each retrieved driver trajectory point $(l',d')$, calculate its distance $d_{POI}'$ to the POI point $(l_{POI},d_{POI})$ using the Haversine formula described in step S3.

[0139] Drivers located near the POI are selected using a distance threshold θ, i.e., those satisfying $d_{POI}'\le\theta$;

[0140] It should be noted that the distance threshold θ is a distance range set according to actual needs and defines what counts as "nearby" for the specific business scenario.

[0141] In this embodiment, regarding step S5, behavior pattern recognition and optimization: identify trajectory patterns in which a driver enters the POI range from outside, stops within it, and leaves it. The continuous trajectory is then broken down into independent entry, stop, and departure events, which are marked, with the relevant timestamps and location information recorded.

[0142] Specifically, in step S500, the driver's trajectory data is defined as a series of timestamps and location points:

[0143] $\{(t_1,loc_1),(t_2,loc_2),\dots,(t_n,loc_n)\}$;

[0144] where $t_i$ is the i-th timestamp and $loc_i$ is the i-th location point (including longitude and latitude).

[0145] Furthermore, the scope of a POI (Point of Interest) is defined as an area represented by a center point and a radius, or by a polygon, such as POI=(center, radius); where center and radius are the center point and the radius, respectively.

[0146] Specifically, in step S501, the judgment mechanism is executed:

[0147] (1) Entry Event: When the driver's location enters the POI area for the first time from outside the POI area, an entry event is recorded, and the condition statement is as follows:

[0148] if distance(loc_i, center) ≤ radius and distance(loc_{i-1}, center) > radius then Event_enter = (t_i, loc_i);

[0149] (2) Dwell Event: When the driver's location remains within the POI range for a period of time, a dwell event is recorded. A time threshold can be set to determine dwell time, as shown in the following conditional statement:

[0150] if distance(loc_i, center) ≤ radius for Δt ≥ threshold_stay then Event_stay = [(t_start, loc_start), (t_end, loc_end)];

[0151] where Δt is the time the driver stays within the POI range, and threshold_stay is the time threshold for determining a stay.

[0152] (3) Departure Event: When the driver's location leaves the POI range, a departure event is recorded, with the following conditional statement:

[0153] if distance(loc_i, center) > radius and distance(loc_{i-1}, center) ≤ radius then Event_leave = (t_i, loc_i);

[0154] In this embodiment, regarding step S6, gas station indicator aggregation and output: based on the split and labeled events, calculate the number of drivers passing through each gas station daily, the number of drivers staying at each gas station daily, and the time each driver stays at the gas station.

[0155] Specifically, in step S600, the number of drivers passing through each gas station is calculated: for each gas station j, the daily number of drivers passing through, $N_{pass,j}$, is obtained by counting the independent events in which a driver passes through that gas station's area. Both entry and departure events count, but each driver must be counted only once.

[0156] This can be achieved by iterating through the event list and counting the unique identifiers of each driver.

[0157] Specifically, in step S601, the number of drivers staying per day is calculated: the daily number of staying drivers $N_{\text{stay},j}$ is the number of distinct drivers who had a stay event within the gas station's area. This can be achieved by iterating through the event list and counting the distinct identifiers of the drivers with stay events.

[0158] Specifically, in step S602, the time each driver spends at the gas station is calculated: for each driver $i$ with a stay event, the time $T_{\text{stay},i,j}$ spent at gas station $j$ is the difference between the end and start times of that stay event:

$T_{\text{stay},i,j} = t_{\text{end},i,j} - t_{\text{start},i,j}$

[0159] where $t_{\text{start},i,j}$ and $t_{\text{end},i,j}$ are the start and end times of the event in which driver $i$ stays at gas station $j$.
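Steps S600 to S602 map naturally onto grouped counts of distinct drivers. The following is a minimal pandas sketch. It assumes a hypothetical event table with columns `station_id`, `driver_id`, `event`, `t_start` and `t_end` (for entry and departure events `t_end` equals `t_start`), and it groups by calendar day.

```python
import pandas as pd

def daily_station_metrics(events):
    """Return (N_pass, N_stay, stay_times) per gas station j and calendar day.

    events (assumed schema): station_id, driver_id, event in {'enter', 'stay', 'leave'},
    t_start, t_end; for enter/leave events t_end equals t_start.
    """
    events = events.assign(date=events['t_start'].dt.normalize())
    passing = events[events['event'].isin(['enter', 'leave'])]
    stays = events[events['event'] == 'stay']
    # N_pass,j: distinct drivers with an entry or departure event (each counted once)
    n_pass = passing.groupby(['station_id', 'date'])['driver_id'].nunique()
    # N_stay,j: distinct drivers with at least one stay event
    n_stay = stays.groupby(['station_id', 'date'])['driver_id'].nunique()
    # T_stay,i,j = t_end,i,j - t_start,i,j for every stay event, in minutes
    stay_times = stays[['station_id', 'driver_id', 'date']].assign(
        stay_minutes=(stays['t_end'] - stays['t_start']).dt.total_seconds() / 60)
    return n_pass, n_stay, stay_times

events = pd.DataFrame({
    'station_id': [1, 1, 1, 1, 1, 2, 2, 2],
    'driver_id': [7, 7, 7, 9, 9, 7, 7, 7],
    'event': ['enter', 'stay', 'leave', 'enter', 'leave', 'enter', 'stay', 'leave'],
    't_start': pd.to_datetime(['2023-01-01 08:00', '2023-01-01 08:00', '2023-01-01 08:13',
                               '2023-01-01 09:30', '2023-01-01 09:32', '2023-01-01 10:00',
                               '2023-01-01 10:00', '2023-01-01 10:06']),
    't_end': pd.to_datetime(['2023-01-01 08:00', '2023-01-01 08:12', '2023-01-01 08:13',
                             '2023-01-01 09:30', '2023-01-01 09:32', '2023-01-01 10:00',
                             '2023-01-01 10:05', '2023-01-01 10:06']),
})
n_pass, n_stay, stay_times = daily_station_metrics(events)
print(n_pass)
```

Using `nunique` rather than a row count is what enforces the "without duplication" requirement: driver 7 produces two events at station 1 but is counted once.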

[0160] In this embodiment, the Python program implementing steps S1 to S6 is as follows:

```python
import numpy as np
import pandas as pd
from haversine import haversine  # third-party "haversine" package; returns kilometers by default

# Step S1: input data (randomly generated sample, one location fix per minute)
drivers_data = {
    'timestamp': pd.date_range(start='2023-01-01', periods=1000, freq='min'),
    'driver_id': np.random.randint(1, 101, 1000),
    'latitude': np.random.uniform(30.0, 31.0, 1000),
    'longitude': np.random.uniform(120.0, 121.0, 1000),
}
df = pd.DataFrame(drivers_data)

# Gas station locations
gas_stations = pd.DataFrame({
    'station_id': [1, 2],
    'latitude': [30.5, 30.6],
    'longitude': [120.5, 120.6],
    'radius': [0.1, 0.1],  # radius of each gas station area, in kilometers
})

# Step S3: distance calculation and filtering
def calculate_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (Haversine formula)."""
    return haversine((lat1, lon1), (lat2, lon2))

df['distance_to_station1'] = df.apply(
    lambda row: calculate_distance(row['latitude'], row['longitude'],
                                   gas_stations.loc[0, 'latitude'], gas_stations.loc[0, 'longitude']),
    axis=1)
df['distance_to_station2'] = df.apply(
    lambda row: calculate_distance(row['latitude'], row['longitude'],
                                   gas_stations.loc[1, 'latitude'], gas_stations.loc[1, 'longitude']),
    axis=1)

# The driver only works during working hours (08:00-20:00); drop points outside them
df['is_working'] = df['timestamp'].dt.hour.between(8, 19)
df = df[df['is_working']].copy()

# Step S4: match POIs with driver locations (within each station's radius)
df['near_station1'] = df['distance_to_station1'] <= gas_stations.loc[0, 'radius']
df['near_station2'] = df['distance_to_station2'] <= gas_stations.loc[1, 'radius']

# Step S5: behavioral pattern recognition and optimization
# Define enter, stay, and leave events
def detect_events(df, station_id, threshold_stay=10):
    """Return enter/stay/leave events for one station.

    Each driver's points are processed separately in time order;
    threshold_stay is the minimum dwell time, in seconds, for a stay event.
    """
    events = []
    near_col = f'near_station{station_id}'
    for driver_id, track in df.sort_values('timestamp').groupby('driver_id'):
        prev_near = False
        start_time = None
        for _, row in track.iterrows():
            near = row[near_col]
            if near and not prev_near:
                # Enter event
                events.append(('enter', row['timestamp'], driver_id))
                start_time = row['timestamp']
            elif not near and prev_near:
                # Leave event, preceded by a stay event if the dwell time is long enough
                if (row['timestamp'] - start_time).total_seconds() >= threshold_stay:
                    events.append(('stay', start_time, row['timestamp'], driver_id))
                events.append(('leave', row['timestamp'], driver_id))
            prev_near = near
    return events

events_station1 = detect_events(df, 1)
events_station2 = detect_events(df, 2)

# Step S6: aggregation and output of gas station metrics
def aggregate_events(events):
    passing_drivers = set()
    staying_drivers = set()
    stay_times = []
    for event_type, *args in events:
        if event_type in ('enter', 'leave'):
            passing_drivers.add(args[1])  # args = (timestamp, driver_id)
        elif event_type == 'stay':
            staying_drivers.add(args[2])  # args = (start_time, end_time, driver_id)
            stay_times.append((args[2], (args[1] - args[0]).total_seconds()))
    return len(passing_drivers), len(staying_drivers), stay_times

passing_drivers_station1, staying_drivers_station1, stay_times_station1 = aggregate_events(events_station1)
passing_drivers_station2, staying_drivers_station2, stay_times_station2 = aggregate_events(events_station2)
print(f"Gas station 1: {passing_drivers_station1} drivers passed by, {staying_drivers_station1} drivers stayed")
print(f"Gas station 2: {passing_drivers_station2} drivers passed by, {staying_drivers_station2} drivers stayed")
```

[0230] Embodiment 2: As shown in Figure 2, this embodiment, based on Embodiment 1, further discloses a MaxCompute-based system for recognizing the behavior trajectories of drivers stopping at gas stations:

[0231] The system includes a processor and a memory connected to the processor. The memory stores program instructions, which, when executed by the processor, cause the processor to perform the behavior trajectory recognition method as described above. The processor is connected to:

[0232] (1) A data preprocessing module responsible for reading raw driver location data from the database, including key information such as timestamps, longitude, and latitude. Simultaneously, the data is thinned to reduce the data volume and lower the complexity of subsequent processing. Furthermore, methods such as acceleration thresholds are used to filter out outlier locations, ensuring the accuracy and reliability of the data.

[0233] The output is a cleaned and optimized dataset of driver locations.

[0234] (2) A trajectory smoothing and filling module for further processing of the preprocessed driver trajectory data: the trajectory data is divided into time windows, and spline interpolation is then used to fill in the coordinates of missing trajectory points within each window, making the trajectory more continuous and smooth.

[0235] The output is smoothed and gap-filled driver trajectory data.

[0236] (3) A distance calculation and spatial indexing module for calculating the actual distance between driver trajectory points: distances are calculated with geographic methods such as the Haversine formula, and the POI points and driver locations are mapped to a grid indexing system such as H3. Spatial indexing is then used to quickly find the driver points around each POI and filter out the drivers located near it.

[0237] The output is a list of driver locations near each POI and their distances from the POI.

[0238] (4) Behavioral pattern recognition and optimization module for mining driver trajectory data: Identify the driver's behavioral patterns of entering from outside the POI range, staying within the POI range, and leaving the POI range. At the same time, the continuous trajectory data is split into independent entry events, stay events, and departure events, and each event is marked and recorded.

[0239] The output is a split and labeled dataset of driver behavior events.

[0240] (5) Gas station indicator aggregation and output module: Based on the split and labeled events, calculate key indicators such as the number of drivers passing through each gas station daily, the number of drivers staying at each station daily, and the dwell time of each driver. Then, output the calculation results to the database or reports for gas station operators to analyze and make decisions.

[0241] The output is a dataset of gas station operation metrics and a visualization report.
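Module (1) names a thinning step and an acceleration threshold but does not fix either algorithm. The sketch below assumes time-based thinning, which keeps a point only if it is at least `min_interval_s` seconds after the last kept one. It also assumes a greedy acceleration check against the last accepted point, with the vehicle assumed to start from rest. All function names and parameter values are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def thin_by_time(points, min_interval_s):
    """Keep a point only if it is at least min_interval_s after the last kept point."""
    kept = []
    for p in points:
        if not kept or p[0] - kept[-1][0] >= min_interval_s:
            kept.append(p)
    return kept

def filter_by_acceleration(points, max_accel):
    """Drop points whose implied acceleration |a| exceeds max_accel (m/s^2).

    points: time-sorted (t_seconds, lat, lon). Each candidate is compared with the
    last accepted point; the vehicle is assumed to start from rest.
    """
    if not points:
        return []
    kept = [points[0]]
    prev_speed = 0.0
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(lat0, lon0, lat, lon) / dt
        if abs(speed - prev_speed) / dt > max_accel:
            continue  # outlier: |a| > T
        kept.append((t, lat, lon))
        prev_speed = speed
    return kept

# Usage: a vehicle moving north at about 10 m/s, with one fix at t = 30 s displaced ~1.1 km
points = [(10 * k, 30.0 + 0.0009 * k, 120.0) for k in range(6)]
points[3] = (30, 30.0127, 120.0)
cleaned = filter_by_acceleration(thin_by_time(points, 10), max_accel=4.0)
print([p[0] for p in cleaned])  # [0, 10, 20, 40, 50]
```

Comparing against the last accepted point, rather than the immediately preceding raw point, keeps a single spike from also causing its well-behaved neighbors to be rejected.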
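For module (2), one possible reading of "windowing plus cubic spline filling" fits SciPy's `CubicSpline` separately in each window. Assumptions: non-overlapping windows of `window_s` seconds, a known sampling interval `step_s`, and integer timestamps so that missing sample times can be matched exactly. Interpolation happens only strictly between the first and last known point of a window. The function name and sample data are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gaps_cubic(t, lon, lat, window_s, step_s):
    """Window a trajectory and fill missing samples with cubic splines.

    t: sorted integer sample times (s); lon/lat: coordinates at those times.
    window_s: window size (delta t); step_s: expected sampling interval.
    Returns time-sorted arrays (t, lon, lat) including the interpolated points.
    """
    t, lon, lat = (np.asarray(a) for a in (t, lon, lat))
    out_t, out_lon, out_lat = [t], [lon], [lat]
    window_ids = t // window_s
    for w in np.unique(window_ids):
        mask = window_ids == w
        tw = t[mask]
        if tw.size < 2:
            continue  # too few known points in this window to interpolate
        # Expected sample times strictly between the first and last known point
        expected = np.arange(tw[0] + step_s, tw[-1], step_s)
        missing = expected[~np.isin(expected, tw)]
        if missing.size == 0:
            continue
        out_t.append(missing)
        out_lon.append(CubicSpline(tw, lon[mask])(missing))
        out_lat.append(CubicSpline(tw, lat[mask])(missing))
    order = np.argsort(np.concatenate(out_t), kind='stable')
    return (np.concatenate(out_t)[order],
            np.concatenate(out_lon)[order],
            np.concatenate(out_lat)[order])

t = [0, 10, 20, 40, 50]  # the fix at t = 30 s is missing
lon = [120.000, 120.001, 120.002, 120.004, 120.005]
lat = [30.000, 30.0005, 30.001, 30.002, 30.0025]
t_f, lon_f, lat_f = fill_gaps_cubic(t, lon, lat, window_s=60, step_s=10)
print(t_f, lon_f[3])  # t_f now includes 30; lon_f[3] is close to 120.003
```

Because the sample track is linear, the spline reproduces the straight line and the filled point lies on it; on real trajectories the spline bends smoothly through the known fixes instead.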
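Module (3) can be sketched without the H3 library by substituting a square latitude/longitude grid. The approach buckets driver points by cell, scans the POI's cell and its eight neighbors, and keeps points whose Haversine distance is within θ. The square grid is a stand-in and not H3. With the h3 Python package (v4), the cell lookup would be `h3.latlng_to_cell(lat, lng, 7)` and the one-ring neighborhood `h3.grid_disk(cell, 1)`. The function names, cell size and threshold below are assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def cell_of(lat, lon, cell_deg):
    """Square lat/lon grid cell; a stand-in for an H3 cell index."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def build_index(points, cell_deg):
    """Bucket driver points (driver_id, lat, lon) by grid cell."""
    index = defaultdict(list)
    for driver_id, lat, lon in points:
        index[cell_of(lat, lon, cell_deg)].append((driver_id, lat, lon))
    return index

def drivers_near_poi(index, poi_lat, poi_lon, theta_km, cell_deg):
    """Return (driver_id, distance_km) for indexed points within theta_km of the POI.

    Only the POI's cell and its 8 neighbors are scanned, so each cell side
    (about 111 km * cell_deg north-south, less east-west) must be >= theta_km.
    """
    ci, cj = cell_of(poi_lat, poi_lon, cell_deg)
    hits = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for driver_id, lat, lon in index.get((ci + di, cj + dj), []):
                dist = haversine_km(poi_lat, poi_lon, lat, lon)
                if dist <= theta_km:  # d'_POI <= theta
                    hits.append((driver_id, dist))
    return hits

points = [(1, 30.5003, 120.5003), (2, 30.5, 120.53), (3, 30.4995, 120.4998)]
index = build_index(points, cell_deg=0.01)  # roughly 1.1 km x 0.96 km cells near 30 degrees N
hits = drivers_near_poi(index, 30.5, 120.5, theta_km=0.5, cell_deg=0.01)
```

Driver 2, about 2.9 km to the east, is never examined because its cell lies outside the POI's neighborhood; this is where the index saves work compared with computing distances to every point.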

[0242] All the above embodiments merely illustrate implementation methods for relevant practical applications of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent should be determined by the appended claims.

[0243] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0244] Furthermore, those skilled in the art will understand that implementing all or part of the processes in all the above-described embodiments can be accomplished by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in this application and in the embodiments can include non-volatile and / or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Claims

1. A MaxCompute-based method for recognizing the behavior trajectory of gas station stops, characterized in that it includes the following steps:
S1, reading the daily driver location data from the database to form a dataset $D$ containing timestamp $t$, longitude $l$ and latitude $d$;
S100, the dataset is $D = \{(t_i, l_i, d_i) \mid i = 1, 2, \ldots, n\}$, where $t_i$ is the $i$-th timestamp, $l_i$ is the $i$-th longitude and $d_i$ is the $i$-th latitude;
S101, applying a data thinning algorithm to reduce the amount of data in dataset $D$;
S102, calculating the velocity change between adjacent data points, estimating the acceleration $a$ and comparing it with a threshold $T$; points for which $|a| > T$ are filtered out as outliers;
S2, performing windowing on dataset $D$ and filling in the missing coordinates of missing trajectory points within each time window;
S3, calculating the actual distance $d$ between any two geographic coordinate points in dataset $D$ and the distance $d'$ between driver trajectory points, and filtering out, based on the drivers' driving records, the trajectory data of drivers who did not drive that day;
S4, for each POI point, querying the driver trajectory points within its grid and the adjacent grids on each side, and calculating the distance between each queried driver trajectory point and the POI;
S5, identifying the patterns of driver trajectories entering the POI range from outside, stopping within it and leaving it; then breaking the continuous trajectories down into independent entry, stop and departure events, marking them, and recording the relevant timestamps and location information. The implementation method of S5 includes: mapping the POI point $(l_{\mathrm{POI}}, d_{\mathrm{POI}})$ and the driver trajectory points $(l', d')$ to the H3 resolution-7 grid index system to obtain the corresponding grids $h_{\mathrm{POI}}$ and $h'$; for each POI location, querying the driver trajectory points $(l', d')$ within its grid $h_{\mathrm{POI}}$ and the adjacent grids on each side; for each driver trajectory point $(l', d')$ found, calculating the distance $d'_{\mathrm{POI}}$ between it and the POI point $(l_{\mathrm{POI}}, d_{\mathrm{POI}})$; and, based on a distance threshold $\theta$, selecting the drivers located near the POI, i.e., those satisfying $d'_{\mathrm{POI}} \le \theta$;
S6, based on the split and tagged events, calculating the number of drivers passing through each gas station daily, the number of drivers stopping at each gas station daily, and the time each driver stays at the gas station.

2. The behavior trajectory recognition method according to claim 1, characterized in that the execution method of S2 includes: S200, for dataset $D = \{(t_i, l_i, d_i) \mid i = 1, 2, \ldots, n\}$ and a set time-window size $\Delta t$, the data of $D$ falling within the time window corresponding to each data point $(t_i, l_i, d_i)$ is denoted $D_i$; S201, missing trajectory points within time window $D_i$ are filled using spline interpolation.

3. The behavior trajectory recognition method according to claim 2, characterized in that the spline interpolation method is cubic spline interpolation: suppose there are $m$ known data points within a time window, sorted by time so that $t_{i1} < t_{i2} < \cdots < t_{im}$. For a missing time point $t'_{ij}$ with $t_{ij} < t'_{ij} < t_{i,j+1}$, the corresponding longitude $l'_{ij}$ and latitude $d'_{ij}$ are calculated by cubic spline interpolation, where on each subinterval $[t_{ik}, t_{i,k+1}]$ the spline function $S(t)$ is a polynomial and $S(t)$ satisfies the smoothness conditions over the entire interval.

4. The behavior trajectory recognition method according to claim 1, characterized in that the implementation method of S3 includes: for any two geographic coordinate points $(l_i, d_i)$ and $(l_j, d_j)$ in dataset $D$, the Haversine formula is used to calculate the actual distance $d$ between them:

$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{d_j - d_i}{2}\right) + \cos d_i \cos d_j \sin^2\left(\frac{l_j - l_i}{2}\right)}\right)$

where $r$ is the Earth's radius and the coordinates are expressed in radians; the distance $d'$ is calculated with the same Haversine formula for each driver trajectory point $(t', l', d')$.

5. The behavior trajectory recognition method according to claim 4, characterized in that: in S3, the driver's dispatch records form a set $R$, each element $r$ of which is a time range. For each driver trajectory point $(t', l', d')$, it is checked whether there exists an $r \in R$ such that $t'$ lies within the time range of $r$; if so, the trajectory point is filtered out; otherwise, it is retained.

6. The behavior trajectory recognition method according to claim 1, characterized in that: The implementation method of S5 includes: When the driver's location first enters the POI area from outside the POI area, an entry event is recorded; Record a dwell event when the driver's location remains within the POI range for a period of time. When the driver's location leaves the POI range, a departure event is recorded.

7. A MaxCompute-based system for recognizing the behavior trajectory of people staying at gas stations, characterized in that: The system includes a processor and a memory connected to the processor. The memory stores program instructions, which, when executed by the processor, cause the processor to perform the behavior trajectory recognition method as described in any one of claims 1-6.

8. The behavior trajectory recognition system according to claim 7, characterized in that the processor is connected to: a data preprocessing module responsible for reading raw driver location data from the database; a trajectory smoothing and filling module for processing the preprocessed driver trajectory data; a distance calculation and spatial indexing module for calculating the actual distance between driver trajectory points; and a behavior pattern recognition and optimization module that mines driver trajectory data.
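Claims 4 and 5 together describe two parts of step S3: Haversine distances and the removal of points recorded during dispatch periods. The following vectorized NumPy sketch rests on several assumptions. Inputs are in degrees, $r = 6371$ km, and dispatch records are closed (start, end) intervals. The function names, the example POI and the sample points are all hypothetical.

```python
from datetime import datetime

import numpy as np

EARTH_RADIUS_KM = 6371.0  # r in claim 4 (mean Earth radius; assumed value)

def haversine_distance(l_i, d_i, l_j, d_j, r=EARTH_RADIUS_KM):
    """Haversine distance with claim 4's notation: l = longitude, d = latitude (degrees).

    Works element-wise on scalars or NumPy arrays; the result is in the units of r.
    """
    l_i, d_i, l_j, d_j = map(np.radians, (l_i, d_i, l_j, d_j))
    h = np.sin((d_j - d_i) / 2) ** 2 + np.cos(d_i) * np.cos(d_j) * np.sin((l_j - l_i) / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(h))

def drop_dispatch_points(points, dispatch_records):
    """Claim 5 rule: discard a point (t', l', d') if t' lies in any time range r in R.

    dispatch_records: list of (start, end) datetimes, treated as closed intervals.
    """
    return [p for p in points
            if not any(start <= p[0] <= end for start, end in dispatch_records)]

points = [(datetime(2023, 1, 1, 8, 50), 120.50, 30.50),
          (datetime(2023, 1, 1, 9, 10), 120.51, 30.50),
          (datetime(2023, 1, 1, 9, 45), 120.52, 30.51)]
R = [(datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 9, 30))]
kept = drop_dispatch_points(points, R)  # the 09:10 point is removed
lon = np.array([p[1] for p in kept])
lat = np.array([p[2] for p in kept])
d_prime = haversine_distance(lon, lat, 120.5, 30.5)  # distances (km) to an example POI
```

Because the distance function broadcasts, a single call computes $d'$ for every retained trajectory point, which is the shape of work a MaxCompute job would also perform column-wise.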