A bus passenger alighting station inference method based on trip chain and ensemble learning

By combining travel chain assumptions with ensemble learning in a two-stage model, the method addresses the problem of irregular travel records in bus passenger alighting station inference and improves the accuracy and efficiency of that inference. It is applicable to bus network optimization and timetable arrangement in smart bus systems.

CN116823475B · Active · Publication Date: 2026-06-26 · TONGJI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TONGJI UNIV
Filing Date
2023-06-30
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for inferring bus passenger alighting points based on travel chains struggle to effectively handle irregular travel records, and machine learning models suffer from limitations in the transferability of inference methods and feature selection, resulting in low accuracy and efficiency in alighting point inference.

Method used

A two-stage model based on travel chain and ensemble learning is adopted. First, a preliminary inference is made through rule-based algorithms, combined with travel chain assumptions and passenger travel history. Then, a two-layer stacked framework multi-classification model based on ensemble learning is constructed to make accurate inferences using trip features, user features, built environment features and public transport route features.

Benefits of technology

It enables the inference of alighting points from all passenger travel records, improving the accuracy and efficiency of alighting point inference. It can fully utilize public transport transaction data to mine passenger travel characteristics and provide a reference for public transport network optimization and timetable arrangement.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116823475B_ABST

Abstract

The application discloses a bus passenger alighting station inference method based on travel chains and ensemble learning, comprising the following steps: collecting bus multi-source data; cleaning and preprocessing the multi-source data; fusing the multi-source data and determining each passenger's boarding station according to station arrival time windows; performing preliminary inference of the alighting station with a deterministic algorithm established from travel chain assumptions and passenger travel history; and performing supplementary inference based on ensemble learning, in which built environment and bus line external information is introduced for trips that the deterministic algorithm cannot infer, and a multi-classification model based on a two-layer stacking framework further infers the passenger alighting station. The application realizes disaggregate, station-level inference of the alighting station for each individual fare transaction based on bus multi-source big data, can obtain accurate spatio-temporal distribution characteristics of bus passenger flow, and provides a reference for optimizing bus operation organization.

Description

Technical Field

[0001] This invention relates to the field of public transportation data processing and analysis, and in particular to a method for inferring bus passenger alighting stations based on a two-stage model of travel chain and ensemble learning.

Background Technology

[0002] Developing smart public transportation is one of the main directions of modern urban transportation development, and it is of great significance for improving the service level of public transportation systems, achieving the strategic goal of public transportation priority, and promoting the sustainable development of urban transportation. In smart public transportation systems, passenger flow characteristic analysis provides the demand data foundation for application scenarios such as service design, operation management, and service evaluation, making it a key link in the entire system. The data foundation for passenger flow characteristic analysis mainly consists of Automatic Fare Collection (AFC) systems, represented by public transportation IC cards and QR codes. Based on the method of passenger card (code) transactions, AFC systems can be divided into two categories: open and closed. In the former, passengers only swipe their card (code) when boarding, and the system only records the passenger's boarding record; this is commonly used in single-fare public transportation systems. In the latter, passengers need to swipe their card (code) both when boarding and alighting to complete the transaction, and the system records both passenger boarding and alighting records; this is commonly used in segmented fare public transportation systems. Due to factors such as system cost and travel efficiency, many public transportation systems still use open AFC systems, which cannot directly obtain passenger alighting information from transaction data. In this context, accurately inferring passengers' alighting points based on their boarding records and other relevant information is both a key challenge and a difficult aspect of obtaining bus passenger flow characteristics.

[0003] To address the problem of predicting bus passenger alighting points, scholars both domestically and internationally have developed inference methods at different levels. Aggregate inference primarily obtains the distribution of alighting points and the total passenger flow for all passengers on a bus route by assuming the probability of passengers alighting at each stop. However, it ignores the heterogeneity of individual travel characteristics and cannot predict the actual alighting point for each passenger. Disaggregate inference, on the other hand, focuses on individual passengers, primarily establishing connections between different travel records based on travel chain theory, and employing rule-based or statistical algorithms to predict passenger alighting points. It offers better accuracy than aggregate inference, and therefore, disaggregate methods for predicting bus passenger alighting points have gradually replaced aggregate methods as the current mainstream approach.

[0004] A typical method of disaggregate inference is passenger drop-off station inference based on travel chains. This involves making assumptions about the relationships between different travel records based on the travel chain patterns of public transportation users, and establishing a deterministic algorithm to infer passenger drop-off stations. The main drawback of travel chain-based passenger drop-off station inference methods is their heavy reliance on the regularity of passenger travel patterns. Although some scholars have proposed improvement strategies such as supplementing travel chain assumptions and introducing historical travel information to assist in inference, the inherent mechanism of travel chain methods makes it difficult to infer drop-off stations from irregular travel records.

[0005] In recent years, the development of big data and machine learning technologies has provided new solutions for inferring alighting points on irregular trips. Some studies treat the alighting point inference problem as a sequence labeling problem, using models such as Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Fields (CRF) to achieve inference. Although some studies have shown that machine learning models can effectively supplement trip chain-based methods for inferring the alighting points of public transport passengers, there is still no consensus on the transferability of inference methods, the differences in alighting point inference across different bus routes, and the rationality of model feature selection, indicating room for further development.

Summary of the Invention

[0006] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a method for inferring bus passenger alighting points based on travel chains and ensemble learning.

[0007] The objective of this invention can be achieved through the following technical solutions:

[0008] A method for inferring bus passenger alighting points based on travel chains and ensemble learning includes the following steps:

[0009] Collect basic data for inferring bus passenger alighting points, including bus AFC system transaction data, bus GPS data, and route and station GIS data.

[0010] Cleaning and preprocessing of the basic data for inferring alighting stations;

[0011] Integrate transaction data with vehicle GPS data to match bus passenger boarding points;

[0012] Based on the travel chain assumptions and the travel history of public transport passengers, a rule-based deterministic algorithm is constructed to perform the first stage of bus passenger alighting point inference.

[0013] A multi-classification model based on a two-layer stacking framework of ensemble learning is constructed. For trips whose alighting station cannot be inferred by the deterministic algorithm, a second stage of bus passenger alighting station inference is performed. The final result of bus passenger alighting station inference is obtained by combining the outputs of the two stages.

[0014] Furthermore, the collected and inferred basic data specifically includes:

[0015] Collect bus transaction data, including boarding time, transaction POS machine number, transaction card number, and transaction amount;

[0016] Collect GPS data from buses, including time, longitude, latitude, and instantaneous speed fields;

[0017] Collect GIS data on bus routes and stops. The route GIS data includes route name, route direction, stop sequence and stop name, and the stop GIS data includes stop name, longitude and latitude.

[0018] Data is collected from route sheets and vehicle-mounted POS machine-vehicle matching tables. The route sheet is a vehicle operation information table, and each route sheet records all the trips of a vehicle in a day, including the origin and destination stations and departure and arrival times for each trip. The vehicle-mounted POS machine-vehicle matching table records the matching relationship between the vehicle-mounted POS machine and the bus, including the license plate number, route name, vehicle-mounted POS machine number, and vehicle self-number.

[0019] Furthermore, the specific steps of the data cleaning and preprocessing include:

[0020] Based on the departure and arrival times of each trip recorded on the route sheets, the GPS data of the buses are assigned to the corresponding trips.

[0021] Based on preset distance and time thresholds, identify and remove abnormal GPS trajectory points in the GPS data of public transport vehicles that include large-scale drift and repetition, and perform linear interpolation to complete the GPS trajectory.

[0022] The bus transaction data is matched to the corresponding vehicle based on the vehicle-mounted POS machine-vehicle matching table, and then the transaction data of the matched vehicle is matched to the corresponding route and trip based on the route sheet.

[0023] Furthermore, the specific steps for matching bus passenger boarding stops include:

[0024] Based on GIS data of bus stops, the cumulative distance from each stop to the corresponding starting station is calculated for each direction of each bus route;

[0025] For each bus route and each stop in each direction, the positions of the vehicle entering and exiting the intermediate stop are specified by a distance threshold Δd before and after the stop; for the starting station, only the exit position after the stop is specified, and the entry and exit positions are not specified for the terminal station.

[0026] The GPS track points of the vehicles with assigned shifts are matched to road segments on the map according to the principle of proximity, and the cumulative distance from each track point to the starting station is calculated based on the road segment matching results;

[0027] Based on the cumulative distance from the GPS track point to the starting station, the time when each bus arrives at the entry and exit positions of each intermediate station is determined, and these are respectively defined as the entry time and exit time of the bus at that intermediate station. The interval between the entry time and exit time is defined as the arrival time window of the bus at that intermediate station.

[0028] Iterate through each bus transaction record. If the transaction time falls within the arrival time window of an intermediate station for the corresponding route, direction, and trip, the boarding station of the transaction record is determined to be that intermediate station; if the transaction time falls before the departure time of the trip from the starting station, the boarding station of the transaction record is determined to be the starting station.

[0029] Furthermore, the specific steps for inferring the alighting station based on the deterministic algorithm of the aforementioned rules include:

[0030] The drop-off station is inferred based on the travel chain assumption applicable to the transaction record. The travel chain assumptions include: the continuous travel assumption, the same-day symmetrical travel assumption, and the homecoming travel assumption.

[0031] If the transaction record does not apply to any travel chain assumption, the drop-off station is inferred based on the corresponding passenger's travel history;

[0032] If the drop-off station cannot be determined based on the corresponding passenger's travel history, then the transaction record is a trip that cannot be determined by a deterministic algorithm.

[0033] Furthermore, the specific steps for inferring the disembarkation station based on the travel chain assumption using transaction records include:

[0034] Step 1: Retrieve one transaction record T_i from the public transport transaction data and get the route, direction, boarding station, and boarding time of the passenger in the record; determine whether the record is the last trip of the day for the passenger corresponding to the transaction card number. If yes, proceed to step 4; otherwise, proceed to step 2.

[0035] Step 2: Read the passenger's next-trip transaction record T_{i+1} for the same day and get the boarding station s_{i+1} of the next trip. Among the subsequent stations on the route and direction of transaction record T_i, find the station s_n closest to the next-trip boarding station s_{i+1};

[0036] Step 3: Calculate the walking distance d1 between station s_n and s_{i+1}, and determine whether d1 is less than or equal to a preset threshold d_max1. If so, infer that the alighting station of the current transaction record T_i is s_n; otherwise, the alighting station will be inferred based on the corresponding passenger's travel history;

[0037] Step 4: Determine whether the current transaction record T_i is the passenger's first trip of the day. If so, that is, the record is the passenger's only trip of the day, proceed to step 7; otherwise, proceed to step 5.

[0038] Step 5: Read the passenger's first-trip transaction record T1 for the day and obtain the boarding station s1 of the first trip. Among the subsequent stations on the route and direction of transaction record T_i, find the station s_f with the smallest distance from s1;

[0039] Step 6: Calculate the walking distance d2 between station s_f and s1, and determine whether d2 is less than or equal to a preset threshold d_max2. If so, infer that the alighting station of the current transaction record T_i is s_f; otherwise, proceed to step 7;

[0040] Step 7: Determine if the passenger has a travel record for the next day. If yes, proceed to Step 8; otherwise, infer the drop-off station based on the corresponding passenger's travel history.

[0041] Step 8: Read the transaction record T1′ of the passenger's first trip the following day and obtain the boarding station s1′ of that trip. Among the subsequent stations on the route and direction of transaction record T_i, find the station s_f′ closest to s1′;

[0042] Step 9: Calculate the walking distance d3 between station s_f′ and s1′, and determine whether d3 is less than or equal to a preset threshold d_max3. If so, infer that the alighting station of the current transaction record T_i is s_f′; otherwise, the alighting station will be inferred based on the corresponding passenger's travel history.
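The decision flow of Steps 1 to 9 can be sketched as a small rule cascade. This is a minimal illustration rather than the patented implementation: the `Trip` class, the `walk_dist` callback, and the function names are hypothetical, and the default values for `d_max2` and `d_max3` are placeholders, since only a value for d_max1 is given in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Trip:
    boarding_stop: str
    downstream_stops: List[str]  # stops after boarding on the same route and direction

WalkDist = Callable[[str, str], float]  # hypothetical walking-distance lookup (meters)

def nearest_stop(candidates: List[str], target: str,
                 walk_dist: WalkDist) -> Tuple[Optional[str], float]:
    """Return the candidate stop closest to target and its walking distance."""
    best, best_d = None, float("inf")
    for stop in candidates:
        d = walk_dist(stop, target)
        if d < best_d:
            best, best_d = stop, d
    return best, best_d

def infer_by_trip_chain(trip: Trip,
                        next_trip_same_day: Optional[Trip],
                        first_trip_same_day: Optional[Trip],
                        first_trip_next_day: Optional[Trip],
                        walk_dist: WalkDist,
                        d_max1: float = 500.0,
                        d_max2: float = 500.0,   # placeholder value
                        d_max3: float = 500.0    # placeholder value
                        ) -> Tuple[Optional[str], str]:
    """Return (alighting_stop, rule); (None, "history") defers to travel history."""
    # Steps 1-3: not the last trip of the day -> continuous travel assumption.
    if next_trip_same_day is not None:
        s_n, d1 = nearest_stop(trip.downstream_stops,
                               next_trip_same_day.boarding_stop, walk_dist)
        if s_n is not None and d1 <= d_max1:
            return s_n, "continuous"
        return None, "history"
    # Steps 4-6: last trip but not the only one -> same-day symmetrical travel.
    # Pass first_trip_same_day=None when the record is the only trip that day.
    if first_trip_same_day is not None:
        s_f, d2 = nearest_stop(trip.downstream_stops,
                               first_trip_same_day.boarding_stop, walk_dist)
        if s_f is not None and d2 <= d_max2:
            return s_f, "symmetric"
    # Steps 7-9: use the first trip of the following day (homecoming assumption).
    if first_trip_next_day is not None:
        s_f2, d3 = nearest_stop(trip.downstream_stops,
                                first_trip_next_day.boarding_stop, walk_dist)
        if s_f2 is not None and d3 <= d_max3:
            return s_f2, "homecoming"
    return None, "history"
```

A record that returns `(None, "history")` would then be handed to the travel-history rules of Steps 10 to 12.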

[0043] Furthermore, the specific steps for inferring the alighting station based on the corresponding passenger's travel history include:

[0044] Step 10: Find all transaction records of the passenger within the set time range, denoted as set S, and record the boarding time, boarding station, route and alighting station for each transaction record;

[0045] Step 11: Determine whether the transaction record set S contains two or more travel records that have the same boarding and alighting stations. If so, record these travel records as S_h and proceed to step 12; otherwise, the alighting station cannot be determined.

[0046] Step 12: For each transaction record in S_h, calculate the absolute value Δt of the difference between its boarding time and the boarding time t_i of the current record. If Δt of all transaction records is less than or equal to the preset threshold t_c, infer that the alighting station of the current transaction record T_i is s_h, the alighting station shared by the records in S_h; otherwise, the alighting station cannot be inferred.
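Steps 10 to 12 can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: history records are filtered to the same route and boarding stop as the current record, boarding times are compared as minutes after midnight, candidate groups are tried from largest to smallest, and the default `t_c` of 30 minutes is a placeholder. The `HistoryRecord` class and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class HistoryRecord:
    route: str
    boarding_stop: str
    boarding_minute: int            # minutes after midnight (assumed encoding)
    alighting_stop: Optional[str]   # None when the alighting stop is unknown

def infer_from_history(route: str, boarding_stop: str, boarding_minute: int,
                       history: List[HistoryRecord],
                       t_c: int = 30) -> Optional[str]:
    """Return the inferred alighting stop s_h, or None if it cannot be inferred."""
    # Step 10: collect historical records with a known alighting stop that
    # share the current record's route and boarding stop (assumption).
    groups: Dict[str, List[HistoryRecord]] = defaultdict(list)
    for rec in history:
        if rec.alighting_stop is None:
            continue
        if rec.route == route and rec.boarding_stop == boarding_stop:
            groups[rec.alighting_stop].append(rec)
    # Step 11: a candidate set S_h needs at least two matching records.
    for stop, recs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        if len(recs) < 2:
            continue
        # Step 12: every record in S_h must board within t_c of the current time.
        if all(abs(r.boarding_minute - boarding_minute) <= t_c for r in recs):
            return stop
    return None
```

Records for which this returns `None` are the trips that the first stage leaves to the ensemble learning model.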

[0047] Furthermore, the specific steps for determining the bus passenger alighting point in the second phase include:

[0048] Constructing a multi-classification model framework: the inference of the alighting station is treated as a multi-classification problem. The label is the number of stops traveled, obtained by subtracting the sequence number of the boarding station from that of the alighting station; the features are internal and external information related to the passenger's choice of alighting station; and a two-layer stacked ensemble learning framework is adopted for modeling;

[0049] Extract the model input features, including trip features, user features, built environment features, and public transport route features;

[0050] Construct a multi-classification model dataset: Select a reference bus system that has similar passenger travel characteristics to the bus system under study, but adopts a closed AFC system. Collect bus transactions at known drop-off points and corresponding vehicle GPS and station GIS data to create a bus passenger drop-off point inference ensemble learning model dataset that includes travel features, user features, built environment features and bus route features.

[0051] Multi-classification model training and testing are conducted. For transaction records where the alighting point cannot be inferred in the first stage, the trained two-layer stacked model is used to predict the number of stops taken, and the alighting point is calculated based on the matched boarding point, thus completing the inference of the alighting point for all transaction records.
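The label construction and the final conversion back to a station can be illustrated with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, stops are indexed by their order along one direction of a route, and clamping an over-long prediction to the terminal stop is an illustrative choice that the text does not specify.

```python
from typing import List

def stops_travelled(boarding_seq: int, alighting_seq: int) -> int:
    """Class label: number of stops between boarding and alighting."""
    if alighting_seq <= boarding_seq:
        raise ValueError("alighting stop must come after the boarding stop")
    return alighting_seq - boarding_seq

def alighting_from_prediction(stop_sequence: List[str], boarding_index: int,
                              predicted_stops: int) -> str:
    """Map a predicted stop count back to a stop on the boarded route."""
    index = boarding_index + predicted_stops
    # Assumption: predictions that overshoot the route end at the terminal stop.
    index = min(index, len(stop_sequence) - 1)
    return stop_sequence[index]
```

For a route `["A", "B", "C", "D"]`, a passenger boarding at index 1 with a predicted label of 2 would be assigned stop `"D"`.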

[0052] Furthermore, the extraction of model input features includes the following specific steps:

[0053] Extract trip features from a passenger's single transaction record, including boarding time, route, boarding station, and transaction amount.

[0054] Extract user characteristics from all passenger transaction records within a week: including travel frequency and activity radius. The travel frequency is defined as the average number of transaction records per day within a week, and the activity radius is defined as the maximum distance from the boarding station to the centroid of all passenger transaction records within a week.

[0055] Based on the GIS data of bus stops, the built environment characteristics are extracted: taking the bus stop as the center, the density and entropy of points of interest within a defined radius are calculated using the following formula:

[0056]

[0057]

[0058] Here, DPOI_i is the density of points of interest (POIs) at boarding station i, given by formula [0056]; formula [0057] gives the POI entropy at boarding station i; N_i is the total number of POIs within the plots surrounding boarding station i; n is the number of POI categories; and p_j is the probability that a POI within a plot belongs to category j.

[0059] Calculate bus route characteristics, including average station spacing, daily number of departures, and passenger flow per trip.
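The built environment features in [0055] to [0058] can be illustrated as follows. The two formulas are not reproduced in this text, so the sketch rests on assumptions: density is taken as the POI count divided by the area of a circle of the chosen radius, and entropy is taken as Shannon entropy with the natural logarithm over the POI category shares. The function names and the input format (one category string per POI found near the stop) are hypothetical.

```python
import math
from collections import Counter
from typing import List

def poi_density(categories: List[str], radius_m: float) -> float:
    """POIs per square kilometer within radius_m of a stop (assumed definition)."""
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    return len(categories) / area_km2

def poi_entropy(categories: List[str]) -> float:
    """Shannon entropy of POI categories around a stop (assumed definition)."""
    total = len(categories)
    if total == 0:
        return 0.0
    counts = Counter(categories)
    # p_j is estimated as the share of POIs that fall in category j.
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Under these assumptions a stop surrounded by a single POI category has entropy 0, and a stop with n equally represented categories has entropy ln n.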

[0060] Furthermore, the specific steps for model training include:

[0061] Training and testing of primary learners: In the first layer of the two-layer stacked model, the dataset of the bus passenger alighting station inference ensemble learning model is divided into training and testing sets according to a certain ratio. Cross-validation is used to obtain the prediction results of the primary learners on the training and testing sets. For each primary learner, the prediction results of each fold of the training set are stacked and merged row by row.

[0062] Secondary learner training and testing: The second layer of the two-layer stacked model combines the prediction results of the training and testing sets of multiple primary learners into a column-wise stacked and merged set to construct a new training and testing set for training and testing the secondary learner.
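The two-layer stacking procedure in [0061] and [0062] can be sketched with toy learners. This is an illustration only: the primary learners used here (nearest centroid and nearest neighbour) stand in for whatever models the method actually uses, the interleaved fold split and the majority vote over fold models for the test set are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]
Learner = Callable[[List[Sequence[float]], List[int]], Model]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbour_learner(X, y) -> Model:
    """Toy learner: predict the label of the closest training sample."""
    return lambda x: y[min(range(len(X)), key=lambda i: sq_dist(X[i], x))]

def nearest_centroid_learner(X, y) -> Model:
    """Toy learner: predict the class whose mean feature vector is closest."""
    sums = {}
    for x, label in zip(X, y):
        total, count = sums.get(label, ([0.0] * len(x), 0))
        sums[label] = ([a + b for a, b in zip(total, x)], count + 1)
    centroids = {c: [a / n for a in total] for c, (total, n) in sums.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(centroids[c], x))

def out_of_fold(learner: Learner, X, y, X_test, k: int = 3) -> Tuple[List[int], List[int]]:
    """First layer: k-fold predictions stacked row by row for the training set."""
    oof = [0] * len(X)
    votes = [[] for _ in X_test]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = learner([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):      # held-out rows of this fold
            oof[i] = model(X[i])
        for j, x in enumerate(X_test):
            votes[j].append(model(x))
    # Assumption: test predictions are the majority vote of the k fold models.
    test_pred = [Counter(v).most_common(1)[0][0] for v in votes]
    return oof, test_pred

def stack_features(learners: List[Learner], X, y, X_test, k: int = 3):
    """Second-layer inputs: one column of predictions per primary learner."""
    cols = [out_of_fold(learner, X, y, X_test, k) for learner in learners]
    meta_train = [list(row) for row in zip(*(c[0] for c in cols))]
    meta_test = [list(row) for row in zip(*(c[1] for c in cols))]
    return meta_train, meta_test
```

The secondary learner is then fitted on `meta_train` with the original labels and applied to `meta_test`, for example `nearest_neighbour_learner(meta_train, y)`. Using out-of-fold predictions keeps the secondary learner from seeing predictions made on data the primary learners were trained on.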

[0063] Compared with the prior art, the present invention has the following beneficial effects:

[0064] 1) By introducing an ensemble learning multi-classification model, this invention solves the problem that traditional methods based on travel chains and travel history cannot infer the drop-off point when the travel chain is broken. It realizes the inference of drop-off points for all passenger travel records, which helps to make full use of public transport transaction data to mine passenger travel characteristics and behavioral patterns, and provides a reference for smart public transport application scenarios such as public transport network optimization and timetable arrangement.

[0065] 2) When establishing the ensemble learning multi-classification model, this invention examines the intrinsic relationship between drop-off station selection and travel characteristics, built environment characteristics, user characteristics, and route characteristics. By introducing internal and external information closely related to passengers' drop-off station selection behavior, the accuracy and efficiency of drop-off station inference are improved.

[0066] 3) By introducing a reference bus system with similar passenger travel characteristics but using a closed AFC system, this invention solves the problem of lack of labeled training sets in ensemble learning multi-classification models, making the proposed bus passenger alighting station inference method practically feasible.

Attached Figure Description

[0067] Figure 1 is a flowchart of the method for inferring passenger alighting stations based on basic data of a public transportation system according to the present invention.

[0068] Figure 2 is a flowchart of the first stage of the method, which infers passenger drop-off stations based on travel chains and travel history according to the present invention.

[0069] Figure 3 is a flowchart of the second stage of the method, which infers passenger alighting stations based on a two-layer stacked ensemble learning model, as described in this invention.

Detailed Implementation

[0070] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. These embodiments are based on the technical solution of the present invention and provide detailed implementation methods and specific operating procedures. However, the scope of protection of the present invention is not limited to the following embodiments.

[0071] Example 1

[0072] As shown in Figure 1, this embodiment presents a method for inferring bus passenger alighting points based on travel chains and ensemble learning, which specifically includes the following steps:

[0073] Step 1: Collect basic data for inferring bus passenger alighting points;

[0074] Step 2: Clean and preprocess the basic data for inferring alighting stations;

[0075] Step 3: Integrate transaction data with vehicle GPS data to match bus passenger boarding stops;

[0076] Step 4: Based on the classic travel chain assumptions and the travel history of public transport passengers, construct a rule-based deterministic model to perform the first stage of bus passenger alighting point inference.

[0077] Step 5: Considering the limited selection of drop-off points, construct a multi-classification model based on an ensemble learning two-layer stacking framework to perform the second stage of bus passenger drop-off point inference. Combine the outputs of the two-stage models to obtain the final result of bus passenger drop-off point inference.

[0078] Step 1 involves acquiring basic data for determining alighting points. The data collected in this step includes transaction data from the public transport AFC system, vehicle GPS data, and route and station GIS data. The specific process is as follows:

[0079] Step 1.1: Use the bus-mounted POS machine to collect bus IC card and QR code transaction data, including six valid fields: transaction POS machine number, transaction card number, transaction date, transaction time, transaction amount, and route name. The transaction time is recorded in the format "yyyy-mm-dd hh-mm-ss".

[0080] Step 1.2: Use the vehicle-mounted satellite positioning device to collect GPS data from the bus, including seven valid fields: time, longitude, latitude, heading angle, instantaneous speed, direction of travel, and road segment name. The GPS data collection interval is 0.1 seconds; the time is recorded in 13-digit millisecond timestamp format; longitude, latitude, and heading angle are all in degrees, with longitude and latitude accurate to 6 decimal places and heading angle accurate to the units place; instantaneous speed is in meters per second, accurate to the units place; the direction of travel is distinguished between up and down, recorded as "0" and "1" respectively.

[0081] Step 1.3: Use web crawling technology to access the electronic map API to collect GIS data of bus routes and stops, including the names, station order and latitude and longitude coordinates of all stops on the specified bus routes.

[0082] Step 1.4: Obtain the route slip and the vehicle POS machine-vehicle matching table from the bus operator. The route slip records all the trips of a vehicle in a day, including the origin and destination stations and departure and arrival times for each trip; the vehicle POS machine-vehicle matching table records the matching relationship between the vehicle POS machine and the bus, and its fields include license plate number, route name, vehicle POS machine number, and vehicle identification number.

[0083] Step 2 involves cleaning and preprocessing the basic data for inferring alighting stations, mainly involving the handling of outlier data and the matching and fusion of multi-source data. The specific process is as follows:

[0084] Step 2.1 involves classifying vehicle GPS data into shifts: Iterating through the vehicle's route data to obtain the departure and arrival times for each shift, and assigning records in the vehicle's GPS data whose timestamps fall between the departure and arrival times to the corresponding shifts. In practice, the validity of the classification result can be checked by comparing the distance between the vehicle's location at the shift classification point and the route's starting or ending station. If this distance is less than or equal to 200 meters, the shift classification is considered valid; otherwise, the route record is considered incorrect, and the actual departure and arrival times of the vehicles are extracted from the GPS data, and the shifts are reclassified.

[0085] Step 2.2: Based on the preset distance threshold, identify and remove abnormal GPS trajectory points with large-scale drift in the GPS data of buses: if a trajectory point is 500 meters or more away from the normal route, it is removed as a drift point; then, assuming that the vehicle moves at a constant speed between the two points, perform linear interpolation on the GPS trajectory to make the time interval between two adjacent GPS trajectory points less than or equal to 10 seconds, so as to meet the matching accuracy requirements of the boarding station.

[0086] Step 2.3: Based on the one-to-one correspondence between the POS machine number and the vehicle number in the vehicle POS machine-vehicle matching table, match the bus transaction records to the corresponding vehicles. Then, based on the one-to-one correspondence between vehicles, routes, and shifts in the route slip, determine the shift corresponding to each transaction record. The matched bus transaction data are arranged in ascending order by transaction card number.
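The GPS cleaning in Step 2.2 can be sketched as below, using the 500 meter drift threshold and the 10 second maximum gap from this embodiment. The `GpsPoint` class and the `off_route_m` field (a precomputed distance from the point to the normal route) are hypothetical, and treating repeated timestamps as duplicates is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class GpsPoint:
    t: float                   # timestamp in seconds
    lon: float
    lat: float
    off_route_m: float = 0.0   # hypothetical precomputed distance to the route

def clean_track(points: List[GpsPoint],
                max_off_route_m: float = 500.0) -> List[GpsPoint]:
    """Drop drift points (500 m or more off the route) and repeated timestamps."""
    kept, seen = [], set()
    for p in sorted(points, key=lambda p: p.t):
        if p.off_route_m >= max_off_route_m or p.t in seen:
            continue
        seen.add(p.t)
        kept.append(p)
    return kept

def densify(points: List[GpsPoint], max_gap_s: float = 10.0) -> List[GpsPoint]:
    """Linearly interpolate so adjacent points are at most max_gap_s apart."""
    if not points:
        return []
    out = [points[0]]
    for a, b in zip(points, points[1:]):
        gap = b.t - a.t
        n = math.ceil(gap / max_gap_s) if gap > max_gap_s else 1
        for k in range(1, n):
            # Constant speed between a and b, as assumed in Step 2.2.
            out.append(GpsPoint(a.t + gap * k / n,
                                a.lon + (b.lon - a.lon) * k / n,
                                a.lat + (b.lat - a.lat) * k / n))
        out.append(b)
    return out
```

Interpolating longitude and latitude linearly is a reasonable approximation over gaps of a few seconds; interpolating along the matched road segment would be more faithful on curved sections.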

[0087] Step 3 is the matching of bus passenger boarding points, which mainly involves the integration of bus transaction data and vehicle GPS data. The specific process is as follows:

[0088] Step 3.1: For each direction of all designated bus routes, calculate the cumulative distance from each stop to the corresponding starting station based on the electronic map of the route. Record the cumulative distance from stop s on bus route R in direction D to the starting station as d_{R,D,s}.

[0089] Step 3.2: For all stops in each direction of all designated bus routes, set the entry / exit identification distance threshold Δd = 25 meters. For stop s on route R in direction D, define the entry position as d_{R,D,s} - Δd and the exit position as d_{R,D,s} + Δd. The practical significance of entry and exit positions is that when the cumulative distance traveled by the vehicle from the starting station equals the entry position of the station, the vehicle is determined to have entered the station; when the cumulative distance equals the exit position of the station, the vehicle is determined to have exited the station. In particular, there is no concept of entry for the starting station, so only the exit position is defined; for the terminal station, since passengers do not board at the terminal station, there is no need to match the boarding station, and therefore no need to define entry and exit positions.

[0090] Step 3.3: For the vehicle GPS data after the shift division is completed, match each GPS trajectory point to the road segment on the electronic map according to the principle of proximity, and calculate the cumulative distance from each trajectory point to the starting station in the corresponding direction.

[0091] Step 3.4: For each stop in each direction of all designated bus routes, use the cumulative distances from the trajectory points to the starting station calculated in Step 3.3 to find the times at which each bus reaches the stop's entry and exit positions. Define these as the entry time and exit time, and the interval between them as the arrival time window of the bus at that stop.

[0092] Step 3.5: Iterate through the bus transaction records and group them according to the arrival time window calculated in Step 3.4. If the transaction time falls within the arrival time window of the corresponding bus at a certain intermediate station, the passenger's boarding station in the transaction record can be determined as that intermediate station. In particular, if the transaction time falls before the departure time of the corresponding bus at the origin station, the passenger's boarding station in the transaction record can be determined as the origin station.
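Steps 3.2 to 3.5 can be sketched for a single trip as follows, with Δd = 25 meters as in this embodiment. The sketch assumes the trajectory is already expressed as (time, cumulative distance) pairs with non-decreasing distance, interpolates linearly between trajectory points, and leaves transactions that fall outside every window unmatched; the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Track = List[Tuple[float, float]]   # (time_s, cumulative_distance_m), sorted by time

def time_at_distance(track: Track, target: float) -> Optional[float]:
    """First time the vehicle reaches the target cumulative distance."""
    dists = [d for _, d in track]
    i = bisect_left(dists, target)
    if i == len(track):
        return None                  # the vehicle never got that far
    if i == 0:
        return track[0][0]
    (t0, d0), (t1, d1) = track[i - 1], track[i]
    return t0 + (t1 - t0) * (target - d0) / (d1 - d0)

def arrival_windows(track: Track, stops: List[Tuple[str, float]],
                    delta_d: float = 25.0):
    """Windows as (stop, entry_time, exit_time); stops are in route order."""
    origin, d_origin = stops[0]
    # The starting station only has an exit position.
    windows = [(origin, None, time_at_distance(track, d_origin + delta_d))]
    # The terminal station (last entry) gets no window: nobody boards there.
    for name, d in stops[1:-1]:
        windows.append((name,
                        time_at_distance(track, d - delta_d),
                        time_at_distance(track, d + delta_d)))
    return windows

def match_boarding_stop(tx_time: float, windows) -> Optional[str]:
    """Return the stop whose arrival time window contains the transaction time."""
    for name, t_in, t_out in windows:
        if t_out is None:
            continue
        if t_in is None:             # starting station: boarding before departure
            if tx_time <= t_out:
                return name
        elif t_in <= tx_time <= t_out:
            return name
    return None
```

For a bus moving at 10 m/s past a stop 200 meters from the origin, the window runs from 17.5 s to 22.5 s, so a transaction at 20 s is matched to that stop.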

[0093] As shown in Figure 2, Step 4 is the first stage of the two-stage model for inferring bus passenger alighting points. Based on classic travel chain assumptions such as the "continuous travel assumption," the "same-day symmetrical travel assumption," and the "homecoming travel assumption," together with passenger travel history, a rule-based algorithm is constructed to make preliminary inferences about alighting points. The specific process is as follows:

[0094] Step 4.1: First, determine which travel chain assumption applies to the transaction record. Take one transaction record $T_i$ from all the bus transaction records to be inferred, and retrieve from it the transaction card number nid, the route $R_i$, the direction $D_i$, the boarding station $s_i$, and the boarding time $t_i$. Determine whether the record is the last trip of the day for the passenger corresponding to nid. If yes, proceed to Step 4.4; otherwise, proceed to Steps 4.2-4.3.

[0095] Steps 4.2 and 4.3 correspond to the "continuous travel assumption," which states that for two consecutive trips made by a bus passenger in one day, the alighting point of the first trip is the station closest to the boarding point of the second trip, subject to a maximum walking distance threshold $d_{max1}$ between them.

[0096] In Step 4.2, the transaction record $T_{i+1}$ of the next trip on the same day for the passenger corresponding to nid is retrieved to obtain the boarding station $s_{i+1}$ of the next trip. The stations following the boarding station of transaction record $T_i$ on route $R_i$ in direction $D_i$ are traversed, and the walking distance between each of these stations and $s_{i+1}$ is calculated based on the electronic map. The station $s_n$ with the shortest walking distance to $s_{i+1}$ and the walking distance d1 between $s_n$ and $s_{i+1}$ are obtained, and the process proceeds to Step 4.3.

[0097] Step 4.3: Determine whether d1 is less than or equal to the preset threshold $d_{max1}$. Based on the average station spacing of the bus network and the results of a passenger travel preference survey, this embodiment sets $d_{max1}$ = 500 meters. If d1 ≤ 500 meters for the current transaction record, the alighting station of the current transaction record $T_i$ is inferred to be $s_n$; otherwise, proceed to Step 4.10.
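Steps 4.2 and 4.3 amount to a nearest-stop search with a distance cutoff, which can be sketched as below. The embodiment computes walking distance from the electronic map; the great-circle (haversine) distance used here is only a stand-in, and the function names and coordinates are hypothetical.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees.
    Assumption: used here in place of the map-based walking distance."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def infer_by_continuity(downstream_stops, next_boarding, d_max1=500.0):
    """Return the downstream stop nearest to the next boarding station if it
    lies within d_max1 meters, otherwise None (fall through to Step 4.10).
    `downstream_stops` is a list of (stop_id, (lat, lon))."""
    if not downstream_stops:
        return None
    stop_id, coord = min(downstream_stops,
                         key=lambda s: haversine_m(s[1], next_boarding))
    d1 = haversine_m(coord, next_boarding)
    return stop_id if d1 <= d_max1 else None

# Hypothetical stops after the boarding station, heading north.
stops = [("A", (31.280, 121.500)), ("B", (31.285, 121.500)), ("C", (31.290, 121.500))]
hit = infer_by_continuity(stops, (31.2895, 121.500))  # about 56 m from stop C
miss = infer_by_continuity(stops, (31.400, 121.500))  # more than 10 km away
print(hit, miss)
```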

[0098] Step 4.4: Determine whether the current transaction record $T_i$ is the first trip of the day for the passenger corresponding to nid. If so, the record is that passenger's only trip of the day, and the process proceeds to Step 4.7; otherwise, proceed to Step 4.5.

[0099] Steps 4.5 and 4.6 correspond to the "same-day symmetrical travel assumption," which states that for a bus passenger's last trip of the day, the alighting point is the station closest to the boarding point of the first trip of the day, subject to a maximum walking distance threshold $d_{max2}$ between them.

[0100] In Step 4.5, the transaction record T1 of the first trip of the day for the passenger corresponding to nid is retrieved to obtain the boarding station s1 of the first trip of the day. The stations following the boarding station of transaction record $T_i$ on route $R_i$ in direction $D_i$ are traversed, and the walking distance between each of these stations and s1 is calculated based on the electronic map. The station $s_f$ with the shortest walking distance to s1 and the walking distance d2 between $s_f$ and s1 are obtained, and the process proceeds to Step 4.6.

[0101] Step 4.6: Determine whether d2 is less than or equal to the preset threshold $d_{max2}$. Based on the actual situation, this embodiment sets $d_{max2}$ = 1000 meters. If d2 ≤ 1000 meters for the current transaction record, the alighting station of the current transaction record $T_i$ is inferred to be $s_f$; otherwise, proceed to Step 4.7.

[0102] Step 4.7: Determine whether the passenger of the current transaction record $T_i$ has a travel record on the following day. If so, proceed to Step 4.8; otherwise, proceed to Step 4.10.

[0103] Steps 4.8 and 4.9 correspond to the "homecoming travel assumption," which states that for a bus passenger's last trip of the day, if the same-day symmetrical travel assumption cannot determine the alighting point, the alighting point of that trip is the station closest to the boarding point of the first trip on the following day, subject to a maximum walking distance threshold $d_{max3}$ between them.

[0104] In Step 4.8, the transaction record T′1 of the first trip on the following day for the passenger corresponding to nid is retrieved to obtain the boarding station s′1 of that trip. The stations following the boarding station of transaction record $T_i$ on route $R_i$ in direction $D_i$ are traversed, and the walking distance between each of these stations and s′1 is calculated based on the electronic map. The station $s'_f$ with the shortest walking distance to s′1 and the walking distance d3 between $s'_f$ and s′1 are obtained, and the process proceeds to Step 4.9.

[0105] Step 4.9: Determine whether d3 is less than or equal to the preset threshold $d_{max3}$. Based on the actual situation, this embodiment sets $d_{max3}$ = 1000 meters. If d3 ≤ 1000 meters for the current transaction record, the alighting station of the current transaction record $T_i$ is inferred to be $s'_f$; otherwise, proceed to Step 4.10.
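The branching among Steps 4.1 to 4.9 can be summarized in one dispatcher. This is a sketch under simplifying assumptions: stops are placed on a toy one-dimensional axis in meters so that distance is an absolute difference, and the names `nearest_stop`, `trip_chain_rule`, and the dictionary keys are hypothetical. A result of `(None, "history")` means the record falls through to Step 4.10.

```python
def nearest_stop(downstream, target, threshold):
    """Return the downstream stop closest to `target` if within `threshold`.
    Stops and targets are (stop_id, position_m) on a toy 1-D axis (assumption)."""
    if not downstream:
        return None
    stop_id, pos = min(downstream, key=lambda s: abs(s[1] - target[1]))
    return stop_id if abs(pos - target[1]) <= threshold else None

def trip_chain_rule(day_trips, i, next_day_first=None,
                    d_max1=500.0, d_max2=1000.0, d_max3=1000.0):
    """Apply Steps 4.1-4.9 to trip i of one passenger's day.
    Returns (alighting stop or None, name of the rule that decided)."""
    trip = day_trips[i]
    if i < len(day_trips) - 1:  # not the last trip: Steps 4.2-4.3
        stop = nearest_stop(trip["downstream"], day_trips[i + 1]["board"], d_max1)
        return (stop, "continuous") if stop is not None else (None, "history")
    if i > 0:  # last trip, but not the only one: Steps 4.5-4.6
        stop = nearest_stop(trip["downstream"], day_trips[0]["board"], d_max2)
        if stop is not None:
            return stop, "symmetric"
    if next_day_first is not None:  # Steps 4.7-4.9
        stop = nearest_stop(trip["downstream"], next_day_first["board"], d_max3)
        if stop is not None:
            return stop, "homecoming"
    return None, "history"

day = [
    {"board": ("A", 0), "downstream": [("B", 2000), ("C", 4000)]},
    {"board": ("C2", 4100), "downstream": [("D", 2100), ("E", 300)]},
]
first = trip_chain_rule(day, 0)
last = trip_chain_rule(day, 1)
only = trip_chain_rule(day[:1], 0, next_day_first={"board": ("X", 3900)})
unknown = trip_chain_rule(day[:1], 0)
print(first, last, only, unknown)
```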

[0106] Steps 4.10 to 4.11 correspond to the method for inferring the drop-off station based on the passenger's travel history. Its main principle is to find out if the same user has similar travel records in time and space within a certain period of time, and infer the passenger's drop-off station based on these similar travel records.

[0107] In step 4.10, all transaction records of the passenger corresponding to nid within one week are obtained, and these transaction records are denoted as set S. The boarding time, boarding station, route and alighting station of each transaction record are obtained.

[0108] Step 4.11: Determine whether S contains two or more travel records that share the same boarding station, namely $s_i$, and the same alighting station, denoted $s_h$. If so, denote these travel records as the set $S_h$ and proceed to Step 4.12; otherwise, the alighting station of the current transaction record $T_i$ cannot be determined, and the process proceeds to Step 4.13.

[0109] Step 4.12: For each transaction record in $S_h$, calculate the absolute value Δt of the difference between its boarding time and $t_i$. If Δt is less than or equal to the preset threshold $t_c$ for all of these transaction records, the alighting station of the current transaction record $T_i$ is inferred to be $s_h$; otherwise, the alighting station of the current transaction record $T_i$ cannot be determined, and the process proceeds to Step 4.13.
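Steps 4.10 to 4.12 can be sketched as a grouping followed by a time check. Several details are assumptions rather than statements of the text: boarding times are compared as minutes after midnight, the most frequent alighting station is chosen when more than one station qualifies, and the threshold of 30 minutes in the example is purely illustrative since no value of $t_c$ is given. The function name and record keys are hypothetical.

```python
from collections import defaultdict

def infer_from_history(history, s_i, t_i, t_c):
    """Infer the alighting station from one week of the passenger's records.
    `history` holds dicts with 'board', 'alight' (None if unknown) and
    'time' (minutes after midnight, an assumption)."""
    groups = defaultdict(list)
    for rec in history:
        if rec["board"] == s_i and rec["alight"] is not None:
            groups[rec["alight"]].append(rec["time"])
    # Step 4.11: need two or more records with the same boarding and alighting station.
    candidates = {s: ts for s, ts in groups.items() if len(ts) >= 2}
    if not candidates:
        return None
    # Assumption: prefer the most frequent alighting station among candidates.
    s_h, times = max(candidates.items(), key=lambda kv: len(kv[1]))
    # Step 4.12: every similar record must be within t_c of the current boarding time.
    if all(abs(t - t_i) <= t_c for t in times):
        return s_h
    return None

history = [
    {"board": "A", "alight": "F", "time": 485},
    {"board": "A", "alight": "F", "time": 492},
    {"board": "B", "alight": "A", "time": 1080},
]
similar = infer_from_history(history, "A", 488, t_c=30)
too_late = infer_from_history(history, "A", 900, t_c=30)
too_few = infer_from_history(history, "B", 1080, t_c=30)
print(similar, too_late, too_few)
```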

[0110] Step 4.13, let i = i + 1, and return to step 4.1.

[0111] Step 5 is the second stage of the two-stage model for predicting bus passenger alighting points. It uses a data-driven machine learning model to analyze the relationship between alighting point selection and various relevant internal and external information to predict passenger alighting points. The specific process is as follows:

[0112] Step 5.1 involves constructing the model framework. Considering the limited choice of passenger drop-off stations, the drop-off station inference is treated as a multi-classification problem. The number of stops a passenger takes during a trip is used as the label, and internal and external information related to the passenger's drop-off station choice is used as the feature. A two-layer stacking framework of ensemble learning is used for modeling.

[0113] Step 5.2 involves extracting the model input features, including four categories: trip features, user features, built environment features, and public transport route features. The specific process is as follows:

[0114] Step 5.2.1: Extract trip features from a passenger's single transaction record, including boarding time, route, boarding station, and transaction amount. The boarding time is discretized by hour, the route and transaction amount are encoded using one-hot encoding, and the boarding station is converted into the station sequence of the corresponding route and direction and encoded using label encoding.
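The encoding in Step 5.2.1 can be illustrated with plain Python. The record layout, the function names, and the category lists are hypothetical; the label encoding of the boarding station is taken here to be its zero-based index in the stop sequence of the route and direction, which is an assumption.

```python
from datetime import datetime

def one_hot(value, categories):
    """One-hot encode `value` against a fixed, ordered list of categories."""
    return [1 if value == c else 0 for c in categories]

def encode_trip(record, routes, fares, stop_sequences):
    """Build the trip feature vector:
    [hour of boarding] + one-hot(route) + one-hot(fare) + [stop sequence index]."""
    hour = record["boarding_time"].hour  # discretize boarding time by hour
    sequence = stop_sequences[(record["route"], record["direction"])]
    stop_index = sequence.index(record["boarding_stop"])  # label encoding
    return ([hour] + one_hot(record["route"], routes)
            + one_hot(record["fare"], fares) + [stop_index])

record = {"boarding_time": datetime(2023, 6, 30, 8, 17), "route": "R55",
          "direction": 0, "boarding_stop": "S3", "fare": 2.0}
features = encode_trip(record,
                       routes=["R12", "R55"],
                       fares=[1.0, 2.0],
                       stop_sequences={("R55", 0): ["S1", "S2", "S3", "S4"]})
print(features)
```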

[0115] Step 5.2.2: Extract user features from all transaction records of passengers within a week: extract travel frequency, i.e., the average number of transaction records of passengers per day within a week; extract activity radius, i.e., the maximum distance from the boarding station to the centroid of all transaction records of passengers within a week; perform Z-Score standardization on the calculated travel frequency and activity radius.
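The user features of Step 5.2.2 can be sketched as follows. Assumptions: boarding stations are given as planar (x, y) coordinates in meters, the centroid is the mean of the boarding coordinates of all records in the week, and the Z-Score uses the population standard deviation. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def user_features(boardings, days=7):
    """Return (travel frequency, activity radius) for one passenger.
    `boardings` lists the (x, y) boarding coordinates of the week's records."""
    cx = mean(p[0] for p in boardings)
    cy = mean(p[1] for p in boardings)
    radius = max(math.hypot(x - cx, y - cy) for x, y in boardings)
    return len(boardings) / days, radius

def z_score(values):
    """Standardize a list of values; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

passenger_a = [(0, 0), (0, 0), (600, 800), (600, 800)]  # commutes between two areas
passenger_b = [(0, 0), (0, 0)]                          # always boards at one stop
freq_a, radius_a = user_features(passenger_a)
freq_b, radius_b = user_features(passenger_b)
radius_z = z_score([radius_a, radius_b])
print(freq_a, radius_a, freq_b, radius_b, radius_z)
```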

[0116] Step 5.2.3: Extract the built environment features around the bus stop based on the bus stop GIS data: Using the bus stop as the center, crawl the point of interest (POI) information within a 500-meter radius of the surrounding area using the electronic map API, and calculate the POI density and POI entropy. The calculation formula is as follows:

[0117]

[0118]

[0119] where $D_{POI_i}$ is the point-of-interest density of boarding station i;

[0120] $E_{POI_i}$ is the point-of-interest entropy of boarding station i;

[0121] $N_i$ is the total number of points of interest within the plots surrounding boarding station i;

[0122] n is the number of categories of points of interest;

[0123] $p_j$ is the probability that a point of interest within a plot belongs to category j.
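The formulas of paragraphs [0117] and [0118] are not reproduced in this text. One common formulation that is consistent with the symbol definitions above is given here as a reading aid only. The area symbol $A_i$, its value as a circle of radius r = 500 meters, and the use of the natural logarithm are assumptions, not part of the source.

```latex
% Assumed form: POI count divided by the area of the 500 m buffer around station i
D_{POI_i} = \frac{N_i}{A_i}, \qquad A_i = \pi r^{2}, \quad r = 500\ \text{m}

% Assumed form: Shannon entropy over the n POI categories
E_{POI_i} = -\sum_{j=1}^{n} p_j \ln p_j
```

Under this reading, a station surrounded by points of interest of a single category has entropy 0, and the entropy reaches its maximum $\ln n$ when all n categories are equally represented.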

[0124] Step 5.2.4, calculate the bus route characteristics, including average station spacing, daily departure frequency, and passenger flow per trip; the daily departure frequency is obtained from the route record; the formulas for calculating average station spacing and passenger flow per trip are as follows:

[0125]

[0126]

[0127] All three features were Z-Score standardized.
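The formulas of paragraphs [0125] and [0126] are not reproduced in this text, so the sketch below uses plausible definitions that are assumptions: average station spacing as route length divided by the number of inter-stop gaps, and passenger flow per trip as daily transactions divided by daily trips. The function name and inputs are hypothetical.

```python
def route_features(stop_cum_dist, daily_transactions, daily_trips):
    """Return (average station spacing in meters, passengers per trip).
    `stop_cum_dist` lists the cumulative distance of each stop from the
    starting station, in route order. Both definitions are assumptions."""
    route_length = stop_cum_dist[-1] - stop_cum_dist[0]
    spacing = route_length / (len(stop_cum_dist) - 1)
    flow_per_trip = daily_transactions / daily_trips
    return spacing, flow_per_trip

spacing, flow = route_features([0.0, 450.0, 1000.0, 1500.0],
                               daily_transactions=1200, daily_trips=60)
print(spacing, flow)
```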

[0128] Step 5.3 involves constructing the model dataset. Since the two-layer stacked framework of ensemble learning is supervised learning and relies on existing labels for model training, a reference bus system using a closed AFC (Automatic Fare Collection) system is needed to directly obtain passenger alighting information. Bus transaction data for known alighting points, along with corresponding vehicle GPS and station GIS data, are collected to create a bus passenger alighting point inference ensemble learning model dataset containing trip features, user features, built environment features, and bus route features. To ensure model transferability, the reference bus system needs to have similar operational and passenger travel characteristics to the research bus system.

[0129] Step 5.4 is model training and testing, which includes two sub-steps: primary learner training and testing, and secondary learner training and testing. The specific process is as follows:

[0130] Step 5.4.1, training and testing of primary learners: The bus passenger alighting station inference dataset constructed in Step 5.3 is randomly divided into a training set and a testing set in an 8:2 ratio. Three typical machine learning models are selected as primary learners: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Adaptive Base Class Boosting (ABCBoost). The parameters are tuned using ten-fold cross-validation to obtain the optimal parameter combination for each of the three primary learners, and the optimal combinations are saved to the model file. The prediction results on the testing set are then obtained using the tuned primary learners.

[0131] Step 5.4.2, Secondary learner training and testing: The prediction results of the three primary learners in step 5.4.1 are stacked column-wise and merged into a new feature matrix as the input feature. The true labels of the training dataset are used as the output target to build a logistic regression model as the secondary learner. The parameters are tuned using the ten-fold cross-validation method to obtain the optimal parameter combination of the secondary learner and save it to the model file.
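The way the primary learners' predictions become the secondary learner's input can be sketched with out-of-fold predictions, following the row-wise and column-wise stacking described in the claims. This is an illustration only: two toy learners stand in for RF, XGBoost, and ABCBoost, folds are assigned by index modulo k rather than at random, and all function names are hypothetical.

```python
from collections import Counter

def majority_learner(X_train, y_train, X_test):
    """Toy primary learner: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label for _ in X_test]

def nearest_learner(X_train, y_train, X_test):
    """Toy primary learner: 1-nearest-neighbour on the first feature."""
    return [y_train[min(range(len(X_train)),
                        key=lambda j: abs(X_train[j][0] - x[0]))]
            for x in X_test]

def out_of_fold(learner, X, y, k):
    """Predict each training row with a model fitted on the other k-1 folds,
    then merge the fold predictions back into row order (row-wise stacking)."""
    preds = [None] * len(X)
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        fold_preds = learner([X[i] for i in train_idx],
                             [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return preds

def meta_features(learners, X, y, k):
    """Column-stack the out-of-fold predictions of all primary learners."""
    columns = [out_of_fold(learner, X, y, k) for learner in learners]
    return [list(row) for row in zip(*columns)]

X = [[1], [2], [3], [10], [11], [12]]
y = [2, 2, 2, 5, 5, 5]  # label: number of stops ridden
Z = meta_features([majority_learner, nearest_learner], X, y, k=3)
print(Z)
```

The matrix `Z`, paired with the true labels `y`, is what the secondary learner (logistic regression in the embodiment) would be trained on. Using out-of-fold predictions keeps the secondary learner from seeing predictions made on rows the primary learners were fitted to.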

[0132] Step 5.5 is to obtain the final result of the bus passenger alighting station inference. For transaction records where the alighting station cannot be inferred in step 4, the trained two-layer stacked model is used to predict the number of stops taken, and the alighting station is calculated based on the boarding station matched in step 3. This completes the inference of the alighting station for all transaction records.
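Step 5.5 can be sketched as a simple merge of the two stages. The function name is hypothetical, and clamping the result to the terminal station when the predicted number of stops overshoots the route is an assumption that the text does not address.

```python
def final_alighting(rule_based, stop_sequence, boarding_stop, predicted_stops):
    """Combine both stages for one transaction record.
    `rule_based` is the Step 4 result (None if it could not be inferred);
    `predicted_stops` is the number of stops ridden predicted in Step 5."""
    if rule_based is not None:
        return rule_based  # stage 1 already determined the alighting station
    idx = stop_sequence.index(boarding_stop) + predicted_stops
    # Assumption: never run past the terminal station.
    return stop_sequence[min(idx, len(stop_sequence) - 1)]

sequence = ["S1", "S2", "S3", "S4", "S5"]
from_rules = final_alighting("S4", sequence, "S2", predicted_stops=1)
from_model = final_alighting(None, sequence, "S2", predicted_stops=2)
clamped = final_alighting(None, sequence, "S4", predicted_stops=3)
print(from_rules, from_model, clamped)
```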

[0133] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. A method for inferring bus passenger alighting points based on travel chains and ensemble learning, characterized in that it includes the following steps:
collecting basic data for inferring bus passenger alighting stations, including bus AFC system transaction data, bus GPS data, and route and station GIS data;
cleaning and preprocessing the basic data for inferring alighting stations;
integrating the transaction data with the vehicle GPS data to match bus passenger boarding stations;
constructing, based on travel chain assumptions and the travel history of bus passengers, a rule-based deterministic algorithm to perform the first stage of bus passenger alighting station inference, the specific steps including: inferring the alighting station according to the travel chain assumption applicable to the transaction record, the travel chain assumptions including the continuous travel assumption, the same-day symmetrical travel assumption, and the homecoming travel assumption; if no travel chain assumption applies to the transaction record, inferring the alighting station based on the corresponding passenger's travel history; and if the alighting station cannot be determined from the corresponding passenger's travel history, treating the transaction record as a trip that the deterministic algorithm cannot infer;
constructing a multi-classification model based on a two-layer stacked ensemble learning framework to perform the second stage of bus passenger alighting station inference for the trips that the deterministic algorithm cannot infer, the specific steps including:
constructing the multi-classification model framework: treating the inference of the alighting station as a multi-classification problem, using as the label the number of stops ridden, obtained by subtracting the sequence number of the boarding station from that of the alighting station, using internal and external information related to the passenger's choice of alighting station as the features, and adopting a two-layer stacked model framework of ensemble learning for modeling;
extracting the model input features, including trip features, user features, built environment features, and bus route features;
constructing the multi-classification model dataset: selecting a reference bus system that has passenger travel characteristics similar to those of the bus system under study but adopts a closed AFC system, and collecting bus transaction data with known alighting points together with the corresponding vehicle GPS and station GIS data to create a bus passenger alighting point inference ensemble learning model dataset that includes trip features, user features, built environment features, and bus route features;
training and testing the multi-classification model;
for transaction records whose alighting point could not be inferred in the first stage, using the trained two-layer stacked model to predict the number of stops ridden and calculating the alighting point from the matched boarding point, thereby completing the inference of the alighting point for all transaction records;
obtaining the final result of the bus passenger alighting point inference by combining the outputs of the two stages of the model.

2. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 1, characterized in that collecting the basic data for inferring alighting stations specifically includes:
collecting bus transaction data, including boarding time, transaction POS machine number, transaction card number, and transaction amount;
collecting bus GPS data, including time, longitude, latitude, and instantaneous speed fields;
collecting GIS data on bus routes and stops, where the route GIS data includes route name, route direction, stop sequence, and stop name, and the stop GIS data includes stop name, longitude, and latitude;
collecting route sheets and the vehicle-mounted POS machine-vehicle matching table, where the route sheet is a vehicle operation information table, each route sheet recording all the trips of one vehicle in a day, including the origin and destination stations and the departure and arrival times of each trip, and the vehicle-mounted POS machine-vehicle matching table records the matching relationship between vehicle-mounted POS machines and buses, including the license plate number, route name, vehicle-mounted POS machine number, and vehicle self-assigned number.

3. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 2, characterized in that the specific steps of the data cleaning and preprocessing include:
assigning the bus GPS data to the corresponding trips based on the departure and arrival times of each trip recorded on the route sheets;
identifying and removing, based on preset distance and time thresholds, abnormal GPS trajectory points in the bus GPS data, including points with large-scale drift and repeated points, and completing the GPS trajectory by linear interpolation;
matching the bus transaction data to the corresponding vehicle based on the vehicle-mounted POS machine-vehicle matching table, and then matching the transaction data of the matched vehicle to the corresponding route and trip based on the route sheets.

4. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 1, characterized in that the specific steps for matching bus passenger boarding stations include:
calculating, based on the bus stop GIS data, the cumulative distance from each stop to the corresponding starting station for each direction of each bus route;
for each stop in each direction of each bus route, defining the positions at a specified distance threshold before and after the stop as the vehicle's entry position and exit position at an intermediate station; for the starting station, only the exit position after the station is defined, and for the terminal station, neither an entry position nor an exit position is defined;
matching the GPS trajectory points of the vehicles, after trip assignment, to road segments on the map according to the principle of proximity, and calculating the cumulative distance from each trajectory point to the starting station based on the road segment matching results;
determining, based on the cumulative distance from the GPS trajectory points to the starting station, the times at which each bus reaches the entry position and the exit position of each intermediate station, defining these respectively as the entry time and exit time of the bus at that intermediate station, and defining the interval between the entry time and the exit time as the arrival time window of the bus at that intermediate station;
iterating through each bus transaction record: for intermediate stations, if the transaction time falls within the arrival time window of the corresponding route, direction, and trip, the boarding station of the transaction record is determined to be that intermediate station; if the transaction time falls before the departure time of that trip from the starting station, the boarding station of the transaction record is determined to be the starting station.

5. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 1, characterized in that the steps for inferring the alighting station according to the travel chain assumption applicable to the transaction record include:
Step 1: Take one transaction record $T_i$ from the bus transaction data and obtain the passenger's route $R_i$, direction $D_i$, boarding station $s_i$, and boarding time $t_i$ from the record; determine whether the record is the last trip of the day for the passenger corresponding to the transaction card number. If yes, proceed to Step 4; otherwise, proceed to Step 2.
Step 2: Read the transaction record $T_{i+1}$ of the passenger's next trip that day and obtain the boarding station $s_{i+1}$ of the next trip; among the subsequent stations of transaction record $T_i$ along its route and direction, find the station $s_n$ with the shortest distance to $s_{i+1}$.
Step 3: Calculate the walking distance d1 between station $s_n$ and $s_{i+1}$, and determine whether d1 is less than or equal to the preset threshold $d_{max1}$. If so, infer that the alighting station of the current transaction record $T_i$ is $s_n$; otherwise, infer the alighting station based on the corresponding passenger's travel history.
Step 4: Determine whether the current transaction record $T_i$ is the passenger's first trip of the day. If so, that is, the record is the passenger's only trip of the day, proceed to Step 7; otherwise, proceed to Step 5.
Step 5: Read the transaction record T1 of the passenger's first trip of the day and obtain the boarding station s1 of the first trip; among the subsequent stations of transaction record $T_i$ along its route and direction, find the station $s_f$ with the shortest distance to s1.
Step 6: Calculate the walking distance d2 between station $s_f$ and s1, and determine whether d2 is less than or equal to the preset threshold $d_{max2}$. If so, infer that the alighting station of the current transaction record $T_i$ is $s_f$; otherwise, proceed to Step 7.
Step 7: Determine whether the passenger has a travel record for the following day. If yes, proceed to Step 8; otherwise, infer the alighting station based on the corresponding passenger's travel history.
Step 8: Read the transaction record T′1 of the passenger's first trip on the following day and obtain the boarding station s′1 of that trip; among the subsequent stations of transaction record $T_i$ along its route and direction, find the station $s'_f$ with the shortest distance to s′1.
Step 9: Calculate the walking distance d3 between station $s'_f$ and s′1, and determine whether d3 is less than or equal to the preset threshold $d_{max3}$. If so, infer that the alighting station of the current transaction record $T_i$ is $s'_f$; otherwise, infer the alighting station based on the corresponding passenger's travel history.

6. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 5, characterized in that the specific steps for inferring the alighting station based on the corresponding passenger's travel history include:
Step 10: Find all transaction records of the passenger within the specified time range, denote them as the set S, and obtain the boarding time, boarding station, route, and alighting station of each transaction record.
Step 11: Determine whether the set S contains two or more travel records that share the same boarding station $s_i$ and the same alighting station $s_h$. If so, denote these travel records as $S_h$ and proceed to Step 12; otherwise, the alighting station cannot be determined.
Step 12: For each transaction record in $S_h$, calculate the absolute value Δt of the difference between its boarding time and $t_i$. If Δt is less than or equal to the preset threshold $t_c$ for all of these transaction records, infer that the alighting station of the current transaction record $T_i$ is $s_h$; otherwise, the alighting station cannot be determined.

7. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 1, characterized in that the specific steps for extracting the model input features include:
extracting trip features from a passenger's single transaction record, including boarding time, route, boarding station, and transaction amount;
extracting user features from all of the passenger's transaction records within a week, including travel frequency and activity radius, where the travel frequency is the average number of transaction records per day within the week, and the activity radius is the maximum distance from a boarding station to the centroid of all of the passenger's transaction records within the week;
extracting built environment features based on the bus stop GIS data: taking the bus stop as the center, the point-of-interest density and point-of-interest entropy within a defined radius are calculated using formulas in which $D_{POI_i}$ is the point-of-interest density of boarding station i; $E_{POI_i}$ is the point-of-interest entropy of boarding station i; $N_i$ is the total number of points of interest within the plots surrounding boarding station i; n is the number of categories of points of interest; and $p_j$ is the probability that a point of interest within a plot belongs to category j;
calculating bus route features, including average station spacing, daily departure frequency, and passenger flow per trip.

8. The method for inferring bus passenger alighting points based on travel chains and ensemble learning according to claim 1, characterized in that the specific steps of the multi-classification model training and testing include:
primary learner training and testing: in the first layer of the two-layer stacked model, the dataset of the bus passenger alighting station inference ensemble learning model is divided into a training set and a testing set according to a certain ratio; cross-validation is used to obtain the prediction results of each primary learner on the training set and the testing set; for each primary learner, the K-fold prediction results on the training set are stacked and merged row by row;
secondary learner training and testing: in the second layer of the two-layer stacked model, the training-set and testing-set prediction results of the multiple primary learners are stacked and merged column by column to construct a new training set and a new testing set, which are used to train and test the secondary learner.