Rail transit station passenger flow prediction method based on big data analysis

By analyzing big data to identify the causes of train stop delays and combining signal interference probability and upstream capacity impact, accurate prediction of passenger flow at rail transit stations has been achieved. This solves the problem of insufficient prediction accuracy in existing technologies and enhances the support capabilities for capacity scheduling and safety management.

CN122243092A · Pending · Publication Date: 2026-06-19 · SHANDONG TRAFFIC CONTROL TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANDONG TRAFFIC CONTROL TECH CO LTD
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing passenger flow forecasting methods cannot effectively distinguish the causes of train stop delays, resulting in insufficient forecast accuracy and an inability to support capacity scheduling and safety management.

Method used

Train operation data and ticket sales and inspection data are obtained through big data analysis. Platform dwell time and signal interference probability are calculated, and the net congestion time is separated from the dwell time. The upstream capacity impact coefficient and the passenger load completion rate are then combined to determine the actual departure passenger volume, achieving continuous prediction of passenger flow status.

Benefits of technology

It improves the predictive robustness under abnormal driving conditions, reduces the false alarm rate, accurately reflects the capacity scheduling needs in congested scenarios, and supports dynamic and continuous tracking of passenger flow status throughout the entire operating period.



Abstract

This invention relates to the field of passenger flow prediction technology, specifically to a method for predicting passenger flow at rail transit stations based on big data analysis. The method determines the signal interference probability based on the difference between the train tracking interval and a preset safety interval threshold; separates the net congestion time from the platform dwell time based on the signal interference probability; determines the upstream capacity impact coefficient based on the upstream delay duration; determines the passenger load completion rate based on the net congestion time; determines the available capacity value in conjunction with the preset train design capacity; determines the actual departure passenger volume of the current train based on the total number of people waiting on the platform and the available capacity value; determines the number of passengers remaining on the platform after the current train departs based on the total number of people waiting on the platform and the actual departure passenger volume; and uses that number as the initial number of passengers remaining on the platform before the next train service begins. Executing the method cyclically achieves continuous prediction of passenger flow status and improves the accuracy of passenger flow prediction.

Description

Technical Field

[0001] This invention relates to the field of passenger flow prediction technology, specifically to a method for predicting passenger flow at rail transit stations based on big data analysis.

Background Technology

[0002] During peak hours of urban rail transit operation (such as morning and evening rush hours), the actual dwell time of trains at platforms often exceeds the planned duration. This dwell time delay is usually caused by two types of reasons: one is the delay caused by passenger congestion due to excessive number of people waiting on the platform; the other is the delay caused by the signal system holding trains due to excessively close train tracking intervals or dispatch instructions.

[0003] Currently, most existing passenger flow forecasting methods rely on entry and exit data from automatic fare collection systems and estimate the number of passengers on the platform by simple differencing. In actual operation, however, if the two causes of dwell delay described above cannot be distinguished, the basic operation time and the congestion time within the platform dwell time cannot be told apart. Time increases caused by non-passenger-flow factors such as signal interference are then mistakenly attributed to passenger flow, so forecast accuracy is insufficient and the forecast results diverge from the actual operating scenario. This makes it difficult to support downstream applications such as capacity scheduling and safety management.

Summary of the Invention

[0004] To address the problem of insufficient accuracy in passenger flow prediction caused by estimating platform passenger numbers through simple differencing in existing technologies, this invention provides a method for predicting passenger flow at rail transit stations based on big data analysis. The specific technical solution adopted is as follows. The method includes:
Obtain train operation data and ticket sales and inspection data for the target station.
Based on the arrival and departure times and planned arrival times in the train operation data, calculate the platform dwell time, train tracking interval, and upstream delay duration for the current train service; based on the preset station hall passage time, perform time shift processing on the ticket sales and inspection data to determine the number of newly added waiting passengers for the current train service, and obtain the initial number of passengers remaining on the platform before the current train service begins; add the number of newly added waiting passengers to the initial number of passengers remaining on the platform to determine the total number of people waiting on the platform.
Determine the signal interference probability based on the difference between the train tracking interval and the preset safety interval threshold; separate the net congestion time from the platform dwell time based on the signal interference probability.
Determine the upstream capacity impact coefficient based on the upstream delay duration; determine the passenger load completion rate based on the net congestion time; determine the available capacity value in conjunction with the preset train design capacity; determine the actual departure passenger volume of the current train based on the total number of people waiting on the platform and the available capacity value.
Based on the total number of people waiting on the platform and the actual departure passenger volume, determine the number of people remaining on the platform after the current train departs; use that number as the initial number of people remaining on the platform before the next train service begins, and repeat this process to achieve continuous prediction of passenger flow status.

[0005] Furthermore, calculating the platform dwell time, train tracking interval, and upstream delay duration for the current train service based on the arrival and departure times and planned arrival times in the train operation data includes: Extract the actual arrival time, actual departure time, and planned arrival time of the current train service from the train operation data. Take the time difference between the actual departure time and the actual arrival time as the platform dwell time. Take the time difference between the actual departure time of the current train service and the actual departure time of the previous train service as the train tracking interval. Clamp the difference between the actual arrival time and the planned arrival time to be non-negative to obtain the upstream delay duration; that is, if the difference is non-positive, the upstream delay duration is set to zero.

[0006] Furthermore, the process for determining the number of newly added waiting passengers includes: Obtain the actual departure times of the current train service and the previous train service. Subtract the preset station hall passage time from the departure time of the previous train service to obtain the start time, and subtract the preset station hall passage time from the departure time of the current train service to obtain the end time. Construct a time interval from the start time to the end time and use it as the effective entry time window, where the interval is left-closed and right-open. From the ticket sales and inspection data, count the passenger records whose entry timestamps fall within the effective entry time window, and use that count as the number of newly added waiting passengers.

[0007] Furthermore, the process of determining the total number of people waiting on the platform includes: If the current train service is the first after the start of operations, the initial number of people remaining on the platform is set to zero. If the current train service is not the first, the number of people remaining on the platform after the previous train departed is read and used as the initial number of people remaining on the platform for the current train service. If that number cannot be read, then starting from the initial state estimate, the net congestion time, actual departure passenger volume, and number of passengers remaining on the platform are iteratively calculated for at least one historical train service immediately preceding the current one, using the train operation data and ticket sales and inspection data of the target station in the same historical period; the number of passengers remaining on the platform obtained by the iterative calculation is used as the initial number required for the current train service. The number of newly added waiting passengers is added to the initial number of people remaining on the platform to obtain the total number of people waiting on the platform.

[0008] Furthermore, the signal interference probability determination process includes: Obtain the preset safety interval threshold, preset system response delay, and preset signal sensitivity coefficient. Subtract the train tracking interval from the sum of the safety interval threshold and the system response delay to obtain the time difference. Calculate the first exponential value as the exponential function with the natural constant as the base and the negative of the product of the time difference and the signal sensitivity coefficient as the exponent. Take the reciprocal of the sum of 1 and the first exponential value as the signal interference probability.
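The steps in [0008] amount to a logistic function of the time difference. The following is a minimal sketch of that calculation; the function name, parameter names, and the numeric values (150 s threshold, 15 s response delay, sensitivity 0.05) are illustrative assumptions, not values given in the patent.

```python
import math

def signal_interference_probability(tracking_interval_s, safety_threshold_s,
                                    response_delay_s, sensitivity):
    """Logistic mapping from train tracking interval to interference probability."""
    # Time difference: (safety interval threshold + response delay) - tracking interval.
    time_diff = (safety_threshold_s + response_delay_s) - tracking_interval_s
    # First exponential value: e^(-(sensitivity * time difference)).
    first_exp = math.exp(-sensitivity * time_diff)
    # Reciprocal of (1 + first exponential value).
    return 1.0 / (1.0 + first_exp)

# Assumed parameters for illustration only.
for interval in (100.0, 165.0, 300.0):
    p = signal_interference_probability(interval, 150.0, 15.0, 0.05)
    print(interval, round(p, 4))
```

With these assumed parameters, a tracking interval equal to the threshold plus the response delay (165 s) gives a probability of exactly 0.5; shorter intervals push the probability toward 1 and longer intervals toward 0.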

[0009] Furthermore, the step of separating the net congestion time from the platform dwell time based on the signal interference probability includes: Obtain the preset basic operation time. Clamp the difference between the platform dwell time and the preset basic operation time to be non-negative to obtain the total extra dwell time; that is, if the difference is non-positive, the total extra dwell time is set to zero. Calculate the difference between 1 and the signal interference probability and use it as the complementary weight. Calculate the product of the total extra dwell time and the complementary weight as the net congestion time, which represents the actual intensity of passenger congestion.
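The separation in [0009] can be sketched as a clamp followed by a weighting. The names and the example values below (55 s dwell, 30 s basic operation time) are assumptions for illustration.

```python
def net_congestion_time(dwell_s, base_operation_s, p_signal):
    """Portion of the extra dwell time attributed to passenger congestion."""
    # Total extra dwell time, clamped so that a short dwell yields zero.
    extra_s = max(0.0, dwell_s - base_operation_s)
    # Complementary weight: the share not explained by signal interference.
    complementary_weight = 1.0 - p_signal
    return extra_s * complementary_weight

print(net_congestion_time(55.0, 30.0, 0.5))  # 12.5
print(net_congestion_time(25.0, 30.0, 0.5))  # 0.0
```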

[0010] Furthermore, the determination of the upstream capacity impact coefficient based on the upstream delay duration includes: Obtain the preset minimum guarantee coefficient and the preset attenuation coefficient. Calculate the second exponential value as the exponential function with the natural constant as the base and the negative of the product of the upstream delay duration and the attenuation coefficient as the exponent. Multiply the difference between 1 and the minimum guarantee coefficient by the second exponential value, and add the product to the minimum guarantee coefficient to obtain the upstream capacity impact coefficient.
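As described in [0010], the coefficient decays exponentially from 1 (no upstream delay) toward the minimum guarantee coefficient. A minimal sketch follows; the parameter names and the values 0.5 and 0.01 are assumed for illustration.

```python
import math

def upstream_capacity_coefficient(upstream_delay_s, min_guarantee, attenuation):
    """Exponential decay from 1.0 toward the minimum guarantee coefficient."""
    # Second exponential value: e^(-(upstream delay * attenuation coefficient)).
    second_exp = math.exp(-attenuation * upstream_delay_s)
    return min_guarantee + (1.0 - min_guarantee) * second_exp

# Assumed parameters: minimum guarantee 0.5, attenuation 0.01 per second.
for delay in (0.0, 60.0, 120.0, 600.0):
    print(delay, round(upstream_capacity_coefficient(delay, 0.5, 0.01), 4))
```

The minimum guarantee coefficient keeps the result from falling to zero, so even a heavily delayed train is assumed to retain some usable capacity.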

[0011] Furthermore, the determination of the passenger load completion rate based on the net congestion time includes: Obtain the preset saturation growth coefficient and the preset basic completion rate. Calculate the third exponential value as the exponential function with the natural constant as the base and the negative of the product of the net congestion time and the saturation growth coefficient as the exponent. Multiply the difference between 1 and the preset basic completion rate by the third exponential value, and add the product to the preset basic completion rate to obtain the passenger load completion rate.
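Written out, the calculation in [0011] takes the following form. The symbols are introduced here for illustration only and do not appear in the patent text, and the numeric values in the worked example are assumed.

```latex
% Illustrative symbols (assumptions, not from the source):
%   \eta_0 : preset basic completion rate
%   \gamma : preset saturation growth coefficient (assumed positive)
%   T_net  : net congestion time
\[
  \eta(T_{\mathrm{net}}) = \eta_0 + (1 - \eta_0)\, e^{-\gamma T_{\mathrm{net}}}
\]
% Limiting behaviour of the expression as literally described:
\[
  \eta(0) = 1, \qquad \lim_{T_{\mathrm{net}} \to \infty} \eta(T_{\mathrm{net}}) = \eta_0
\]
% Worked example with assumed values \eta_0 = 0.6, \gamma = 0.05, T_net = 12.5 s:
\[
  \eta = 0.6 + 0.4\, e^{-0.625} \approx 0.6 + 0.4 \times 0.535 \approx 0.814
\]
```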

[0012] Furthermore, the process for determining the actual departure passenger volume of the current train includes: Calculate the product of the train design capacity, the upstream capacity impact coefficient, and the passenger load completion rate as the available capacity value. Compare the available capacity value with the total number of people waiting on the platform to obtain a comparison result. If the comparison result indicates that the total number of people waiting on the platform is not greater than the available capacity value, the total number of people waiting on the platform is used as the actual departure passenger volume. If the comparison result indicates that the total number of people waiting on the platform is greater than the available capacity value, the available capacity value is used as the actual departure passenger volume.
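The comparison in [0012], together with the remaining-passenger update described in [0004], can be sketched as below. The function name and example values are assumptions; the patent text does not state whether results are rounded to whole passengers, so this sketch leaves them as floats.

```python
def departure_and_remaining(total_waiting, design_capacity,
                            upstream_coeff, completion_rate):
    """Return (available capacity, actual departure volume, remaining on platform)."""
    available = design_capacity * upstream_coeff * completion_rate
    # Everyone boards if they fit; otherwise only the available capacity boards.
    departing = total_waiting if total_waiting <= available else available
    remaining = total_waiting - departing
    return available, departing, remaining

print(departure_and_remaining(300, 1000, 0.5, 0.5))  # (250.0, 250.0, 50.0)
print(departure_and_remaining(200, 1000, 0.5, 0.5))  # (250.0, 200, 0)
```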

[0013] Furthermore, after determining the number of passengers remaining on the platform after the current train departs based on the total number of people waiting on the platform and the actual departure passenger volume, the method further includes: Obtain historical ticket sales and inspection data for the same period corresponding to the planned arrival time of the next train service; based on the entry records in the historical ticket sales and inspection data, calculate the average entry rate of passengers. Obtain the planned departure interval between the next train service and the current train service. Calculate the product of the average entry rate and the planned departure interval as the predicted number of additional waiting passengers; add the number of people remaining on the platform after the current train departs to the predicted number of additional waiting passengers to obtain the predicted total number of waiting passengers. If the predicted total number of waiting passengers exceeds the train design capacity, a warning message about insufficient capacity is generated and output.
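A minimal sketch of the look-ahead check in [0013] follows. How the average entry rate is derived from historical records is not specified in the text; here it is assumed to be the mean entry count over same-period windows of fixed length, divided by the window length. All names and values are illustrative.

```python
def capacity_warning(historical_entry_counts, window_s, planned_interval_s,
                     remaining_on_platform, design_capacity):
    """Return (predicted total waiting, whether a capacity warning is raised)."""
    # Assumed rate estimate: mean count per historical window, per second.
    mean_count = sum(historical_entry_counts) / len(historical_entry_counts)
    avg_entry_rate = mean_count / window_s
    predicted_new = avg_entry_rate * planned_interval_s
    predicted_total = remaining_on_platform + predicted_new
    return predicted_total, predicted_total > design_capacity

# Three historical 300-second windows, 180-second planned interval.
total, warn = capacity_warning([600, 660, 540], 300.0, 180.0, 700, 1000)
print(total, warn)  # 1060.0 True
```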

[0014] The present invention has the following beneficial effects: The invention uses the signal interference probability to distinguish passenger flow factors from scheduling factors in dwell delays. Even if the current train is held for an extended period because the preceding train obstructs it, for example during off-peak hours or train schedule adjustments, no false passenger congestion alarm is raised. This reduces the false alarm rate and improves predictive robustness under abnormal operating conditions. The passenger load completion rate incorporates the physical fact that "the longer passengers keep packing in, the fuller the train gets, but it approaches a limit," making the determined passenger load more consistent with actual congestion scenarios. The upstream capacity impact coefficient quantifies how the initial load of the carriages on arrival constrains the capacity available at this station. By combining the passenger load completion rate and the upstream capacity impact coefficient, the calculated actual departure passenger volume reflects the state of the train in congested scenarios. The number of passengers remaining on the platform after the current train departs is used as the initial number of passengers remaining for the next train service, forming a cyclic iterative prediction mechanism. This accounts for the cumulative effect of passenger flow carried over between successive trains, enabling dynamic and continuous tracking of passenger flow status throughout the entire operating period and providing forward-looking support for capacity scheduling and passenger flow management of subsequent trains.

Attached Figure Description

[0015] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0016] Figure 1 is a flowchart of a method for predicting passenger flow in rail transit stations based on big data analysis, provided as an embodiment of the present invention; Figure 2 is an example diagram illustrating the signal interference probability determination process provided in one embodiment of the present invention.

Detailed Implementation

[0017] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a big data analysis-based method for predicting passenger flow in rail transit stations according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.

[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0019] The following description, in conjunction with the accompanying drawings, details a specific scheme for a big data analysis-based method for predicting passenger flow in rail transit stations provided by this invention.

[0020] Please see Figure 1, which illustrates a flowchart of a method for predicting passenger flow in rail transit stations based on big data analysis, according to an embodiment of the present invention. The method includes:
S101: Obtain train operation data and ticket sales and inspection data for the target station.

[0021] The target station refers to a specific subway or railway station where passenger flow prediction needs to be implemented.

[0022] In this context, a train service can be understood as a service event in which a train completes a full arrival, stop, and departure operation at the target station.

[0023] It should be noted that train operation data is primarily obtained from the Automatic Train Supervision (ATS) system or a functionally equivalent train dispatching information system. Key information with clear timestamps directly related to platform operations at the target station can be extracted through the ATS or train dispatching information system, including but not limited to actual arrival time, actual departure time, and planned arrival time.

[0024] It should be noted that the ticket sales and inspection data can be obtained from the Automatic Fare Collection (AFC) system. Specific information that can be extracted from the AFC system includes the entry transaction timestamp and the entry station number.

[0025] It can be understood that the station number is used to select the entry records belonging to the target station.

[0026] In preparation for the subsequent per-train-service passenger flow forecasting, the train operation data and the ticket sales and inspection data can be preliminarily correlated and organized. For example, ATS data can be organized by train number, operating date, and arrival order; AFC entry data can be filtered and sorted by timestamp and station; and a unified time base (such as Beijing time) can be established so that data from different sources are comparable in time.

[0027] S102: Based on the arrival and departure times and planned arrival times in the train operation data, calculate the platform dwell time, train tracking interval, and upstream delay duration of the current train service; based on the preset station hall passage time, perform time shift processing on the ticket sales and inspection data to determine the number of newly added waiting passengers for the current train service, and obtain the initial number of passengers remaining on the platform before the current train service begins; add the number of newly added waiting passengers to the initial number of passengers remaining on the platform to determine the total number of people waiting on the platform.

[0028] In this embodiment, the actual arrival time, actual departure time, and planned arrival time of the current train service are extracted from the train operation data. The actual arrival time is subtracted from the actual departure time, and the resulting time difference is used as the platform dwell time. The actual departure time of the previous train service is subtracted from the actual departure time of the current train service, and the resulting time difference is used as the train tracking interval. The difference between the actual arrival time and the planned arrival time is clamped to be non-negative to obtain the upstream delay duration; that is, if the difference is non-positive, the upstream delay duration is set to zero.

[0029] The platform dwell time quantifies the entire process of the current train service, from coming to a complete stop and opening the doors to closing the doors and starting again.

[0030] The train tracking interval represents the time gap between the current train and its predecessor, measured at departure. A shorter train tracking interval indicates higher train density and a greater probability that the signal system holds the train.

[0031] It can be understood that the previous train service refers to the most recent train service before the current one that completed its stop, passenger boarding and alighting, and departure at the target station.

[0032] Upstream delay duration refers to the amount of time by which a train arrives at the target station later than its planned arrival time. A longer upstream delay duration indicates that the train was held up longer before reaching the target station. During peak hours, this typically means that the train likely experienced severe passenger congestion and long boarding / alighting times at the previous station, so it arrives near full capacity with less space left for passengers at the target station.
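The three timing quantities described in [0028] to [0032] can be sketched as follows. The function and parameter names and the sample timestamps are assumptions for illustration; real ATS records would supply the four timestamps.

```python
from datetime import datetime

def timing_features(actual_arrival, actual_departure,
                    planned_arrival, prev_actual_departure):
    """Return (dwell, tracking interval, upstream delay), all in seconds."""
    dwell_s = (actual_departure - actual_arrival).total_seconds()
    tracking_interval_s = (actual_departure - prev_actual_departure).total_seconds()
    # Early or on-time arrivals give a non-positive difference, clamped to zero.
    upstream_delay_s = max(0.0, (actual_arrival - planned_arrival).total_seconds())
    return dwell_s, tracking_interval_s, upstream_delay_s

features = timing_features(
    actual_arrival=datetime(2026, 3, 24, 8, 15, 40),
    actual_departure=datetime(2026, 3, 24, 8, 16, 35),
    planned_arrival=datetime(2026, 3, 24, 8, 15, 0),
    prev_actual_departure=datetime(2026, 3, 24, 8, 13, 50),
)
print(features)  # (55.0, 165.0, 40.0)
```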

[0033] In this embodiment, the actual departure times of the current train service and the previous train service are obtained. The preset station hall passage time is subtracted from the departure time of the previous train service to obtain the start time, and the preset station hall passage time is subtracted from the departure time of the current train service to obtain the end time. A time interval from the start time to the end time is constructed and used as the effective entry time window, where the interval is left-closed and right-open. From the ticket sales and inspection data, the passenger records whose entry timestamps fall within the effective entry time window are counted, and that count is used as the number of newly added waiting passengers.

[0034] The preset station hall passage time represents the average time required for a passenger, after passing through the station entry gate, to walk through the station hall, go through security, take the escalators or stairs, and finally reach the waiting area of the target platform.

[0035] It should be noted that the specific value of the preset station hall passage time can be obtained based on the physical layout and passenger flow analysis of the target station, and this embodiment does not impose a specific limitation. For example, for a small to medium-sized subway station with a compact structure and the turnstiles being close to the platform, the station hall passage time may be set to 2 minutes; while for a large hub station with a large transfer hall and requiring long escalators or stairs to reach the platform, the station hall passage time may be set to 4 minutes or longer.

[0036] It is important to understand that the "entry timestamp" recorded by the ticketing system is not the passenger's actual "arrival time at the platform waiting area." The two differ by at least a fixed station hall passage time. Directly counting the people who enter the station within the train arrival interval would therefore wrongly include passengers who enter too late to reach the platform, and would miss those who entered earlier and are actually waiting for the train. Instead, each entry timestamp can be shifted later by one station hall passage time to estimate when the passenger reaches the platform, and this estimate determines which passengers truly belong to the valid waiting group for the current train service.

[0037] It should be noted that the closed left boundary includes passengers who enter the station precisely at the start time. These passengers theoretically reach the platform just as the previous train departs, and should therefore be included in the waiting crowd for the current train service. The open right boundary, on the other hand, excludes passengers who enter the station precisely at the end time. These passengers theoretically reach the platform at the exact moment the current train departs, making it virtually impossible for them to board in practice. They are therefore not counted for the current train service and belong to the next one.
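The left-closed, right-open window described in [0033] to [0037] can be sketched as below. The names, the 2-minute passage time, and the sample entry timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta

def count_new_waiting(entry_times, prev_departure, cur_departure, hall_passage):
    """Count entries inside [prev_departure - passage, cur_departure - passage)."""
    start = prev_departure - hall_passage
    end = cur_departure - hall_passage
    # Left-closed (start <= t), right-open (t < end).
    return sum(1 for t in entry_times if start <= t < end)

prev_dep = datetime(2026, 3, 24, 8, 13, 50)
cur_dep = datetime(2026, 3, 24, 8, 16, 35)
hall = timedelta(minutes=2)
entries = [
    datetime(2026, 3, 24, 8, 11, 50),  # exactly at window start: counted
    datetime(2026, 3, 24, 8, 13, 0),   # inside the window: counted
    datetime(2026, 3, 24, 8, 14, 35),  # exactly at window end: next train
    datetime(2026, 3, 24, 8, 10, 0),   # before the window: previous train
]
new_waiting = count_new_waiting(entries, prev_dep, cur_dep, hall)
print(new_waiting)  # 2
```

Because consecutive windows share a boundary and each is half-open, every entry record is assigned to exactly one train service.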

[0038] It is important to understand that since passengers who were not transported on the previous trip will continue to remain on the platform, adding pressure to the next train along with the new passengers, it is necessary to know how many people were already on the platform before the current trip started, i.e., the initial number of people remaining on the platform, in order to accurately assess the true total load faced by the current trip.

[0039] In this embodiment, if the current train service is the first after the start of operations, the initial number of passengers remaining on the platform is set to zero. If the current train service is not the first, the number of passengers remaining on the platform after the departure of the previous train is read and used as the initial number of passengers remaining on the platform for the current train service. If that number cannot be read, then starting from the initial state estimate and using the train operation data and ticket sales and inspection data of the target station in the same historical period, the net congestion time, actual departure passenger volume, and number of passengers remaining on the platform are iteratively calculated for at least one historical train service immediately preceding the current one; the number of passengers remaining on the platform obtained by the iterative calculation is used as the initial number required for the current train service. The total number of people waiting on the platform is obtained by adding the number of newly added waiting passengers to the initial number of passengers remaining on the platform.

[0040] The initial state estimate is a preset virtual starting value for the number of people remaining on the platform, chosen to be as close as possible to the actual situation. It is used when a train service is not the first of the day but the system starts cold, i.e., the number of people remaining on the platform after the departure of the previous train cannot be read. Because no real state record of the previous train is available, this value is set in advance so that the subsequent passenger flow prediction process can start.

[0041] It should be noted that the specific value of the initial state estimate can be determined based on historical data from the same period, and this embodiment does not impose a specific limitation. The determination process includes: identifying, in the historical dates, the same operating period and the same train sequence position corresponding to the current train service; querying the historical values of the number of people remaining on the platform for that operating period and train service over multiple past working days (e.g., 5-10 working days), as calculated and stored by the method of this invention; and finally taking the arithmetic mean (or the median) of these historical values and setting the result as the initial state estimate for a cold start.

[0042] For example, suppose that a non-first train service with a cold start occurs at 14:05 on a weekday afternoon. If the historical average number of people remaining on the platform at the target station around this time (for example, around 14:05 on the past 5 weekdays) is about 15, the initial state estimate can be set to 15.
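The averaging step in [0041] and [0042] can be sketched with the standard library. The function name and the five sample values are assumptions; the fallback to zero when no history exists is also an assumption, since the text does not cover that case.

```python
from statistics import mean, median

def initial_state_estimate(historical_remaining, use_median=False):
    """Cold-start estimate from same-period historical remaining counts."""
    if not historical_remaining:
        return 0  # assumed fallback; not specified in the source
    return median(historical_remaining) if use_median else mean(historical_remaining)

# Assumed remaining counts around 14:05 on the past 5 weekdays.
past_values = [12, 18, 14, 16, 15]
print(initial_state_estimate(past_values))                   # 15
print(initial_state_estimate(past_values, use_median=True))  # 15
```

The median is less sensitive than the mean to a single unusual day, such as one with a service disruption, which is one reason to prefer it when the history is short.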

[0043] In this context, "historical same period" can be understood as meeting three conditions. The same operating day type: usually the same weekday (Monday to Friday) or the same weekend / holiday type. The same time period: usually the clock time period corresponding to the planned arrival time of the current train service; for example, if the current train service is scheduled to arrive at 08:15, the historical same period data refers to the data around 08:15. The same train sequence position: usually the historical train services that have the same sequence position or the same train number as the current train service in the daily operating plan.

[0044] For example, assuming the current train service is train k, the initial state estimate is used as the "virtual" number of passengers remaining on the platform before the earliest historical train service considered (e.g., train k-3). The stored train operation data and ticket sales and inspection data for these historical train services in the same historical period are then retrieved in sequence and processed strictly following the steps of this invention: first, the net congestion time and actual departure passenger volume of the historical train service are calculated, and then the number of passengers remaining on the platform after that train service is updated. This newly calculated number is used as the initial value for the next, more recent historical train service (e.g., train k-2), and the calculation is repeated. The cycle continues until the calculation for the historical train service closest to the current one (e.g., train k-1) is completed; the resulting number of passengers remaining on the platform is the initial number required for the current train service.
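The warm-up iteration in [0044] can be sketched as a simple state carry-over loop. To keep the sketch short, each historical train service is assumed to arrive with its newly added waiting count and its available capacity value already computed by the earlier steps; the names and numbers are illustrative.

```python
def warm_start_remaining(initial_estimate, historical_services):
    """Roll the remaining-passenger state forward over historical train services.

    historical_services: list of (new_waiting, available_capacity) tuples,
    ordered from oldest (e.g. k-3) to most recent (k-1).
    """
    remaining = initial_estimate
    for new_waiting, available_capacity in historical_services:
        total_waiting = remaining + new_waiting
        departing = min(total_waiting, available_capacity)
        # Passengers left behind become the initial state of the next service.
        remaining = total_waiting - departing
    return remaining

# Assumed values for services k-3, k-2, k-1.
services = [(200, 180.0), (150, 200.0), (220, 190.0)]
print(warm_start_remaining(15, services))  # 30.0
```

In this example the state passes through 35 and 0 before ending at 30, which shows that an error in the initial estimate is absorbed as soon as one historical service clears the platform.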

[0045] S103: Determine the signal interference probability based on the difference between the train tracking interval and the preset safety interval threshold; separate the net congestion time from the platform dwell time based on the signal interference probability.

[0046] A long train stop at a platform can have two completely different causes: actual passenger congestion and safety intervention by the signal system. To avoid misjudgments, such as misinterpreting signal intervention as severe congestion, and to extract from the observed platform dwell time the net congestion time that characterizes the intensity of passenger congestion, the probability that the signal system is forced to intervene because the route ahead is occupied or the interval is too short, i.e., the signal interference probability, is quantified first. With this probability, the waiting time caused by non-passenger-flow factors can subsequently be separated from the platform dwell time, much like "filtering out noise", leaving the net congestion time that reflects only the actual passenger flow pressure.

[0047] The process of determining the signal interference probability, as shown in Figure 2, includes: S103-1: Obtain the preset safety interval threshold, preset system response delay, and preset signal sensitivity coefficient.

[0048] The preset safety interval threshold refers to the minimum safe tracking interval allowed by the design of the line signaling system (such as ATP or ATO). This is a physical constant determined by the line design, vehicle performance, and signaling system, and is usually measured in seconds. For example, for traditional fixed block or quasi-moving block lines, the typical safety interval threshold is 120 to 180 seconds.

[0049] The preset system response delay is used to compensate for the identification, communication and execution delays in the process from "perceiving the risk of the interval" to "executing the braking command".

[0050] It should be noted that the specific value of the preset system response delay can be determined based on engineering experience and data fitting, and this embodiment does not impose a specific limitation. One possible determination process includes: retrieving historical data recorded by ATS and selecting the records that clearly show delays caused by an occupied route ahead or by train holding; analyzing the train tracking intervals at which these train holding events occurred; and statistically analyzing the typical distribution of those intervals. The system response delay is then chosen so that the safety interval threshold plus the system response delay lies near the upper edge of this typical distribution.

[0051] For example, if the vast majority of train holding events are found to occur when the train tracking interval is less than 130 seconds, and the safety interval threshold is 120 seconds, the system response delay can be calibrated to about 10 seconds.
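The calibration described in [0050] and [0051] can be sketched in Python as follows. This is only an illustration: the function name, the choice of a nearest-rank 95th percentile as the "upper edge" of the distribution, and the sample intervals are assumptions and are not specified by this embodiment.

```python
def calibrate_response_delay(holding_intervals_s, safety_threshold_s, upper_percent=95):
    """Return an illustrative system response delay in seconds.

    upper_percent is an assumption: it defines the "upper edge" of the
    distribution as a nearest-rank percentile of the tracking intervals
    observed when trains were held.
    """
    if not holding_intervals_s:
        raise ValueError("at least one holding event is required")
    ordered = sorted(holding_intervals_s)
    # Nearest-rank percentile with integer arithmetic: ceil(percent * n / 100).
    rank = max(1, (upper_percent * len(ordered) + 99) // 100)
    upper_edge = ordered[rank - 1]
    # The delay is what must be added to the threshold to reach that edge.
    return max(0, upper_edge - safety_threshold_s)


# Hypothetical tracking intervals (seconds) recorded at train holding events.
intervals = [112, 115, 118, 120, 121, 122, 123, 124, 125, 125,
             126, 126, 127, 127, 128, 128, 129, 129, 130, 150]
delay = calibrate_response_delay(intervals, safety_threshold_s=120)
print(delay)
```

With these sample values the upper edge is 130 seconds, so the sketch reproduces the 10-second delay of the example in [0051]; the single 150-second outlier is ignored by the percentile.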

[0052] The preset signal sensitivity coefficient controls the steepness of the rise and fall of the signal interference probability as the interval changes. A larger preset signal sensitivity coefficient indicates that the signal interference probability is more sensitive and decisive in responding to the approaching interval.

[0053] It should be noted that the specific value of the preset signal sensitivity coefficient can be determined based on logistic regression of historical data. This embodiment does not impose a specific limitation. The specific determination process includes: collecting a large number of historical train sample data, each sample containing a feature X (i.e., train tracking interval) and a label Y (e.g., Y=1 indicates that the corresponding train has experienced a signal delay; Y=0 indicates that no signal delay has occurred); using a logistic regression algorithm to fit the above dataset (X, Y), and using the absolute value of the slope obtained from the fitting as the preset signal sensitivity coefficient.

[0054] For example, a line with flexible train operation and interval control may take a value of 0.35; a line with extremely saturated capacity and extremely strict safety control may take a value of 0.8.

[0055] S103-2: Subtract the train tracking interval from the sum of the safety interval threshold and the system response delay to obtain the time difference.

[0056] The time difference quantifies how far the current train interval has fallen below the safety boundary. A larger time difference for a particular train means that the actual train tracking interval is well below the safety warning line (i.e., the sum of the safety interval threshold and the system response delay). The closer two trains are in time, the more likely the signaling system is to detect a collision risk, and therefore the greater the probability of intervention and train holding.

[0057] S103-3: Calculate the value of the exponential function with the natural constant as the base and the negative of the product of the time difference and the signal sensitivity coefficient as the exponent, and use it as the first exponent value.

[0058] S103-4: The reciprocal of the sum of the positive integer 1 and the first exponent value is used as the signal interference probability.

[0059] A larger positive time difference indicates a more dangerous actual interval and greater susceptibility to signal interference. In that case the negative of the product of the time difference and the signal sensitivity coefficient is a negative number of large magnitude, so the first exponent value approaches 0 and the signal interference probability approaches 1. In other words, the shorter the current train's tracking interval, the further it has crossed the safety warning line, reflecting a blocked state under high-density tracking: the main reason for the current train's long stay at the platform is that the route ahead is occupied or the safety interval is insufficient, so the signal system forces the current train to wait. Therefore, the signal interference probability can be expressed by the following formula:

$$W = \frac{1}{1 + \exp(-\lambda T)}$$

Where W represents the signal interference probability; exp() represents an exponential function with the natural constant as the base; $\lambda$ represents the preset signal sensitivity coefficient; T represents the time difference.
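Steps S103-2 to S103-4 can be sketched in Python as follows. The function and parameter names are illustrative assumptions; the branch for negative exponents is an implementation detail added here to avoid floating-point overflow for very large tracking intervals and is mathematically equivalent to the reciprocal described in S103-4.

```python
import math


def signal_interference_probability(tracking_interval_s, safety_threshold_s,
                                    response_delay_s, sensitivity):
    """Logistic mapping from the time difference to a probability in [0, 1]."""
    # S103-2: time difference = (threshold + response delay) - tracking interval.
    time_diff = (safety_threshold_s + response_delay_s) - tracking_interval_s
    z = sensitivity * time_diff
    if z >= 0:
        first_exp = math.exp(-z)          # S103-3: first exponent value
        return 1.0 / (1.0 + first_exp)    # S103-4: reciprocal of (1 + value)
    # Equivalent form for negative z; exp(z) cannot overflow here.
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative values: threshold 120 s, response delay 10 s, sensitivity 0.35.
print(signal_interference_probability(100, 120, 10, 0.35))  # close interval
print(signal_interference_probability(130, 120, 10, 0.35))  # on the warning line
print(signal_interference_probability(300, 120, 10, 0.35))  # relaxed interval
```

An interval exactly on the safety warning line yields a probability of 0.5; shorter intervals push the probability toward 1 and longer intervals toward 0.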

[0060] To accurately separate the net congestion time, as an example, the preset basic operation time is obtained; the difference between the platform dwell time and the preset basic operation time is clamped to be non-negative to obtain the total additional dwell time, that is, if the difference is non-positive, the total additional dwell time is set to zero; the difference between 1 and the signal interference probability is calculated as a complementary weight; and the product of the total additional dwell time and the complementary weight is calculated as the net congestion time, which represents the actual passenger flow congestion intensity.
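The separation in [0060] amounts to two arithmetic steps, sketched below in Python. The function name and the sample values are illustrative assumptions.

```python
def net_congestion_time(dwell_s, basic_operation_s, interference_prob):
    """Split the dwell time beyond basic operations by the complementary weight."""
    # Total additional dwell time, clamped at zero for skipped or very short stops.
    total_additional = max(0.0, dwell_s - basic_operation_s)
    # Complementary weight: share of the additional time attributed to passengers.
    complementary_weight = 1.0 - interference_prob
    return total_additional * complementary_weight


# Hypothetical stop: 90 s dwell, 30 s basic operation, interference probability 0.25.
print(net_congestion_time(90.0, 30.0, 0.25))  # 60 s additional, 75% attributed
# Dwell shorter than the basic operation time gives zero net congestion time.
print(net_congestion_time(20.0, 30.0, 0.10))
```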

[0061] The preset basic operation time represents the shortest time a train service needs to complete a standard platform operation under ideal, empty and interference-free conditions. It typically includes: door opening time, driver lookout and confirmation time, nominal passenger boarding and alighting time, and door closing and locking inspection time.

[0062] It should be noted that the specific value of the preset basic operation time can be determined based on the statistical analysis of historical data of the target station, and this embodiment does not impose a specific limitation. One possible method includes selecting from historical ATS data all train stop records of the target station that fall in off-peak hours (when passenger flow is sparse and the impact of congestion can be ignored) and that have a large train tracking interval (e.g., more than 5 minutes); and calculating the arithmetic mean of the platform dwell time of these records, which is taken as the basic operation time for the train type that regularly stops at the target station.
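The selection and averaging in [0062] can be sketched in Python as follows. The record layout (dwell time, tracking interval, off-peak flag) and the function name are assumptions made for illustration; real ATS records would need to be mapped to this form first.

```python
from statistics import mean


def estimate_basic_operation_time(records, min_interval_s=300.0):
    """records: iterable of (dwell_s, tracking_interval_s, is_off_peak) tuples."""
    # Keep only off-peak stops with a large tracking interval, where neither
    # congestion nor signal intervention should lengthen the dwell time.
    selected = [dwell for dwell, interval, off_peak in records
                if off_peak and interval > min_interval_s]
    if not selected:
        raise ValueError("no records satisfy the selection conditions")
    return mean(selected)


# Hypothetical stop records for one station and train type.
records = [
    (28, 400, True),    # kept
    (32, 360, True),    # kept
    (75, 120, False),   # peak-hour stop, excluded
    (30, 200, True),    # tracking interval too short, excluded
]
basic_time = estimate_basic_operation_time(records)
print(basic_time)
```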

[0063] For example, for modern subway trains equipped with spacious double doors and efficient operation, the typical time for basic operations is about 25 to 35 seconds; for trains with narrower doors or more conservative operating procedures, the typical time for basic operations may be 35 to 45 seconds.

[0064] It should be noted that if the difference between the platform dwell time and the preset basic operation time is not positive, the platform dwell time of the current train service has not exceeded the basic operation requirement, and the train may have skipped the stop, missed the stop, or passed through at high speed. In this case, setting the total additional dwell time to zero is a protective rule that is consistent with the physical meaning and keeps subsequent calculations robust.

[0065] The complementary weight quantifies the proportion of the total additional dwell time beyond basic operations that should be attributed to actual passenger congestion. Specifically, a larger complementary weight for a particular train (closer to 1) corresponds to a signal interference probability closer to 0. This means that the train tracking interval for that train is more relaxed and signal delays are less likely. The total additional dwell time of that train is therefore almost entirely caused by passenger congestion on the platform, slow boarding and alighting, and repeated door opening and closing.

[0066] Net congestion time is a quantitative indicator that measures the actual passenger flow pressure on the platform, after eliminating the influence of all non-passenger flow factors (basic fixed operation time, signal system waiting time). A larger net congestion time for a particular train indicates that the train likely experienced prolonged and intense real passenger flow congestion on the platform. This usually means a larger number of people waiting on the platform, greater difficulty for passengers to board, or more severe overlap between alighting and boarding passenger flows.

[0067] S104: Determine the upstream capacity impact coefficient based on the upstream delay duration; determine the passenger load completion rate based on the net congestion time; determine the available capacity value in conjunction with the preset train design capacity; and determine the actual departure passenger volume of the current train based on the total number of people waiting on the platform and the available capacity value.

[0068] The remaining space (capacity) available for passengers in a train's carriages when it arrives at the target station is not constant, but is strongly influenced by the passenger load taken on at all upstream stations. To accurately estimate how many more passengers the current train can carry, note that a delay at an upstream station is usually a direct signal that the train has already taken on a large number of passengers and that its carriages are close to full. The observable upstream delay duration can therefore be converted into an estimate of the proportion of capacity remaining when the train arrives at the target station. This estimate is the upstream capacity impact coefficient.

[0069] In this embodiment, a preset minimum guarantee coefficient and a preset attenuation coefficient are obtained; an exponential function value with the natural constant as the base and the negative of the product of the upstream delay duration and the attenuation coefficient as the exponent is calculated as the second exponential value; the difference between the positive integer 1 and the minimum guarantee coefficient is multiplied by the second exponential value, and the product result is added to the minimum guarantee coefficient to obtain the upstream capacity impact coefficient.

[0070] The upstream capacity impact coefficient quantifies the proportion of the remaining capacity available for passengers at the target station in the carriages of the current train when it arrives at the target station, relative to the total designed capacity of the train.

[0071] It should be noted that if the upstream delay duration exceeds the preset timeout threshold (e.g., 15 minutes), the upstream capacity impact coefficient is directly set to the preset minimum guarantee coefficient, and the exponential calculation is skipped.

[0072] Here, capacity is characterized by the number of passengers that can be carried.

[0073] A larger upstream capacity impact coefficient for a particular train service (closer to 1) corresponds to a shorter upstream delay duration, that is, shorter stops at the preceding stations. This reflects smoother operation upstream, more spacious carriages and lower passenger density, so a larger portion of the design capacity can be used to carry passengers at this station. Thus, the upstream capacity impact coefficient can be expressed by the following formula:

$$P = P_{\min} + (1 - P_{\min})\exp(-r D)$$

Where P represents the upstream capacity impact coefficient; $P_{\min}$ represents the preset minimum guarantee coefficient; r represents the preset attenuation coefficient; D represents the upstream delay duration; exp() represents an exponential function with the natural constant as the base.

[0074] The longer the upstream delay duration of a particular train service, the longer the train was held up at the preceding stations. The exponent, which is the negative of the product of the upstream delay duration and the attenuation coefficient, then becomes a negative number of larger magnitude, and the second exponential value moves closer to 0. The upstream capacity impact coefficient therefore approaches the minimum guarantee coefficient, indicating that capacity is extremely strained.
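The calculation in [0069], together with the timeout rule in [0071], can be sketched in Python as follows. The function name is an assumption, and so is the unit: the sketch treats the delay and the timeout as minutes, on the assumption that the attenuation coefficient is calibrated per minute, which this embodiment does not state.

```python
import math


def upstream_capacity_coefficient(upstream_delay, min_coeff, attenuation, timeout):
    """Exponential decay from 1 toward the minimum guarantee coefficient."""
    # Timeout rule: beyond the threshold the floor value is used directly.
    if upstream_delay > timeout:
        return min_coeff
    # Second exponential value: exp(-(delay * attenuation)).
    second_exp = math.exp(-attenuation * max(0.0, upstream_delay))
    return min_coeff + (1.0 - min_coeff) * second_exp


# Illustrative parameters: floor 0.15, attenuation 0.1 per minute, timeout 15 min.
for delay_min in (0.0, 5.0, 10.0, 20.0):
    print(delay_min, round(upstream_capacity_coefficient(delay_min, 0.15, 0.1, 15.0), 4))
```

A punctual train keeps the full coefficient of 1, and the value decays smoothly toward the floor until the timeout threshold replaces the calculation altogether.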

[0075] The preset minimum guarantee coefficient represents the lower limit of available capacity under extreme congestion. It reflects that even if a train has been delayed for a long time by severe congestion (at which point the carriages are theoretically fully loaded), the train still has a minimum additional passenger capacity, because passengers can in practice "squeeze in".

[0076] It should be noted that the specific value of the preset minimum guarantee coefficient can be determined based on engineering experience and historical data statistical analysis, and this embodiment does not impose specific limitations. The specific analysis process may include retrieving historical data (such as historical train operation data and historical ticket sales and inspection data) and selecting the train records known to have experienced serious delays (such as upstream delays exceeding 5 minutes); calculating, for each of these records, the ratio (actual passenger volume carried ÷ train design capacity) at the target station, which reflects the proportion of passengers who can still squeeze onto the train even when the carriages are theoretically full; and performing statistical analysis on these ratios (such as taking the minimum value or lower quartile) and using the result as the preset minimum guarantee coefficient.

[0077] For example, if it is found that even under the most crowded conditions the ratio of actual passenger volume carried to the train's design capacity is rarely lower than 0.15, the preset minimum guarantee coefficient can be set to 0.15, so that the train's minimum passenger capacity under the worst conditions is not underestimated.

[0078] The preset attenuation coefficient controls the rate at which the upstream capacity impact coefficient decreases with increasing delay time. A larger preset attenuation coefficient indicates a stronger and faster erosion effect of the delay on the remaining capacity of the current trip (quantified by the upstream capacity impact coefficient); a smaller preset attenuation coefficient indicates a relatively mild impact of the delay.

[0079] It should be noted that the specific value of the preset attenuation coefficient can be determined through nonlinear regression analysis of historical data, and this embodiment does not impose specific limitations. The specific analysis process may include collecting historical train sample data. Each sample should include the upstream delay time and the proportion of actual remaining capacity when the train arrives at the target station, i.e., (train design capacity - actual number of passengers in the carriage when the train arrives at this station) ÷ train design capacity. The actual number of passengers can be indirectly obtained through onboard weighing systems, high-precision video analysis, or cross-sectional survey data, which are common technical means. Finally, a nonlinear regression algorithm is used to fit the above historical train sample data into the calculation formula of the upstream capacity influence coefficient. In this fitting process, the preset minimum guarantee coefficient can be used as a preset constant (determined by the above method). The fitting process will directly obtain the optimal attenuation coefficient value.

[0080] For example, on a line with a low degree of unevenness in passenger flow and a slow erosion of the target station's capacity by onboard passengers, the value might be 0.07; on a line with extremely high passenger flow pressure during peak hours and where trains are quickly filled, the value might be 0.15.
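As one simple way to carry out the fitting described in [0079], the formula for the upstream capacity impact coefficient can be linearized by taking logarithms once the minimum guarantee coefficient is fixed, after which the attenuation coefficient follows from a least-squares line through the origin. The Python sketch below uses this log-linear least squares as a stand-in for a general nonlinear regression; the function name, the handling of out-of-range samples and the synthetic data are assumptions.

```python
import math


def fit_attenuation(samples, min_coeff):
    """samples: iterable of (upstream_delay, remaining_capacity_ratio)."""
    num = 0.0
    den = 0.0
    for delay, ratio in samples:
        # Samples at or below the floor (or above 1) cannot be log-transformed.
        if ratio <= min_coeff or ratio > 1.0:
            continue
        # From ratio = m + (1 - m) * exp(-r * delay):
        #   -ln((ratio - m) / (1 - m)) = r * delay
        y = -math.log((ratio - min_coeff) / (1.0 - min_coeff))
        num += delay * y
        den += delay * delay
    if den == 0.0:
        raise ValueError("no usable samples with a non-zero delay")
    return num / den


# Synthetic, noise-free samples generated with attenuation 0.1 and floor 0.15.
true_r, floor = 0.1, 0.15
samples = [(d, floor + (1.0 - floor) * math.exp(-true_r * d)) for d in (1, 2, 5, 8)]
fitted = fit_attenuation(samples, floor)
print(round(fitted, 6))
```

On noisy field data the log transform weights samples near the floor more heavily, so an iterative nonlinear fit of the original formula may be preferable there.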

[0081] The net congestion time of a particular train at the platform reflects the extra time passengers spend working through the crowd and struggling to board, so its length is directly related to the extent to which the train's remaining capacity is used. To quantify how much of the available space is eventually filled during this net congestion time, the physical time is mapped to an estimate of capacity utilization. This estimate is the passenger load completion rate.

[0082] In this embodiment, a preset saturation growth coefficient and a preset basic completion rate are obtained; an exponential function value with the natural constant as the base and the negative of the product of the net congestion time and the saturation growth coefficient as the exponent is calculated as the third exponential value; the difference between 1 and the preset basic completion rate is multiplied by the third exponential value, and the product is added to the preset basic completion rate to obtain the passenger load completion rate.

[0083] Net congestion time refers to the increase in train dwell time caused solely by friction, crowding, delays, and reduced operational efficiency during passenger boarding and alighting. A higher net congestion time for a particular train indicates that it experienced more intense and prolonged real-world passenger congestion at the platform, reflecting a larger number of waiting passengers and more intense competition for boarding.

[0084] It's important to understand that the longer the net congestion time for a particular train, the more time passengers on the platform have to repeatedly try, slowly squeeze in, and move around in the crowded carriages. During this process, the train can create more boarding opportunities for as many passengers as possible by opening and closing doors multiple times. Therefore, the longer the effort, i.e., the longer the net congestion time, the more every possible space inside the train will be filled. In other words, the longer the time, the more sufficient it is to complete the near-limit loading, and the closer the utilization rate of the remaining capacity is to 100%, i.e., the closer the passenger load completion rate is to 1.

[0085] The preset saturation growth coefficient is a number greater than 0, which controls the rate at which passenger completion approaches saturation as net congestion time increases.

[0086] The preset basic completion rate indicates the passenger capacity that the train can achieve within the time limit of basic operations.

[0087] It should be noted that the specific value of the preset basic completion rate can be determined based on engineering experience and historical data statistical analysis, and this embodiment does not impose a specific limitation. For example, a basic completion rate of 0.8 means that the net congestion time is only used to contribute the remaining (0.2) passenger increase.
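A minimal Python sketch of the mapping from net congestion time to passenger load completion rate follows. Paragraphs [0084] and [0087] describe a rate that equals the basic completion rate when the net congestion time is zero and rises toward 1 as that time grows, whereas a literal reading of the sum in [0082] starts at 1 and falls toward the basic completion rate. The sketch assumes the rising, saturating behavior and therefore subtracts the scaled third exponential value from 1; this form, the function name and the sample coefficients are assumptions, not values given by this embodiment.

```python
import math


def passenger_load_completion(net_congestion_s, base_completion, growth):
    """Saturating rise from base_completion (at 0 s) toward 1 (assumed form)."""
    # Third exponential value: exp(-(net congestion time * growth coefficient)).
    third_exp = math.exp(-growth * max(0.0, net_congestion_s))
    # The share (1 - base) still to be filled shrinks as congestion time grows.
    return 1.0 - (1.0 - base_completion) * third_exp


# Illustrative parameters: basic completion rate 0.8, growth coefficient 0.05 per second.
for t in (0.0, 10.0, 30.0, 60.0):
    print(t, round(passenger_load_completion(t, 0.8, 0.05), 4))
```

With a basic completion rate of 0.8, the net congestion time only contributes the remaining 0.2, which matches the example in [0087].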

[0088] It should be noted that the specific value of the preset saturation growth coefficient can be determined based on historical data through nonlinear regression calibration, and this embodiment does not impose specific limitations. The specific determination process includes the following steps: Step 1: Collect historical data of the target station and filter out multiple train samples that clearly experienced passenger congestion (e.g., the stop time is significantly longer than the basic operation time, and the probability of signal interference is low). For each sample, there is a net congestion time of the historical train (which can be calculated through the relevant explanation in S103 above) and the actual utilization ratio of the remaining capacity of the historical train. The formula for calculating the actual utilization ratio of the remaining capacity of the train can be equal to the estimated value of the actual departure passenger volume of the historical train ÷ the train's designed capacity ÷ the upstream capacity influence coefficient of the train. Step 2: Fit the above dataset {net congestion time of historical trains, actual utilization ratio of the remaining capacity of historical trains}. Step 3: Use the optimal saturation growth coefficient obtained from the fitting as the preset saturation growth coefficient.

[0089] It should be noted that the actual passenger volume of a historical train departure can be determined by data from cross-sectional surveys, onboard weighing, or high-precision video analysis. Common technical methods will not be elaborated here. The upstream capacity influence coefficient of a train can be obtained by analyzing historical data, for example, by calculating it using the relevant explanations for determining the upstream capacity influence coefficient mentioned above.

[0090] Since the actual utilization ratio of the remaining capacity of a historical train is approximately equal to that train's passenger load completion rate, the calculation model used in the fitting is the same as the passenger load completion rate calculation described above. Only the inputs differ, namely {net congestion time of the historical train, actual utilization ratio of the remaining capacity of the historical train}.

[0091] In this embodiment, the product of the train's designed passenger capacity, the upstream capacity influence coefficient, and the passenger load completion rate is calculated as the available capacity value. The available capacity value is compared with the total number of people waiting on the platform to obtain a comparison result. If the comparison result indicates that the total number of people waiting on the platform is not greater than the available capacity value, the total number of people waiting on the platform is taken as the actual departure passenger volume. If the comparison result indicates that the total number of people waiting on the platform is greater than the available capacity value, the available capacity value is taken as the actual departure passenger volume.

[0092] Available capacity refers to the maximum number of passengers that a train can theoretically transport at a target station when the current train stops there, taking into account the train's inherent design (i.e., the train's design capacity), the initial full load of the carriages (i.e., the upstream capacity influence coefficient), and the operational intensity of the target station (i.e., the passenger load completion rate). A higher available capacity value indicates a more sufficient capacity supply potential at the target station.

[0093] It's important to understand that available capacity represents the upper limit of the supply side, determining the maximum number of passengers a train can carry. Conversely, the total number of passengers waiting on the platform represents the total demand side, determining the total number of people waiting for a train. A train cannot carry more people than its supply-side upper limit, nor can it carry more people than the actual number on the platform. Therefore, the available capacity must be compared with the total number of passengers waiting on the platform. If demand (the total number of passengers waiting on the platform) is less than supply (available capacity), it means the train is waiting for passengers, and there is surplus train capacity; the actual passenger capacity equals the total number of people on the demand side. If demand exceeds supply, it means passengers are waiting for the train, and the train capacity is insufficient; the actual passenger capacity can only reach the supply-side limit, i.e., the available capacity.
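The supply and demand comparison in [0091] to [0093] reduces to taking the smaller of two numbers, as the following Python sketch shows. The function name and sample values are illustrative assumptions.

```python
def actual_departure_volume(total_waiting, design_capacity, upstream_coeff, completion):
    """Return (actual departure passenger volume, available capacity value)."""
    # Supply side: design capacity scaled by the two coefficients.
    available = design_capacity * upstream_coeff * completion
    # A train carries neither more than it can hold nor more than are waiting.
    departed = min(total_waiting, available)
    return departed, available


# Hypothetical train: design capacity 1000, coefficient 0.5, completion rate 0.9.
print(actual_departure_volume(200, 1000, 0.5, 0.9))  # demand below supply
print(actual_departure_volume(600, 1000, 0.5, 0.9))  # demand above supply
```

In the first case all 200 waiting passengers depart; in the second case only the available capacity of 450 departs and the rest remain on the platform.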

[0094] S105: Based on the total number of people waiting on the platform and the actual departure passenger volume, determine the number of people remaining on the platform after the current train departs; use the number of people remaining on the platform as the initial number of people remaining on the platform before the start of the next train, and execute the method cyclically to achieve continuous prediction of passenger flow status.

[0095] In this embodiment, the number of passengers remaining on the platform after the current train departs is obtained by subtracting the actual number of passengers departing from the total number of passengers waiting on the platform.

[0096] Cyclic execution refers to automatically running the complete calculation process of S101-S105 in this invention repeatedly, with the departure event of each train as the trigger point and rhythm, and the number of people stranded on the platform calculated this time as the known starting point.
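The cyclic update of S105 can be sketched in Python as a loop that carries the residual from one train to the next. To keep the sketch short, each train is represented by two precomputed numbers, the new waiting passengers and the available capacity value from S104; this input layout, the function name and the sample values are assumptions.

```python
def simulate_platform(initial_residual, trains):
    """trains: iterable of (new_waiting, available_capacity), one pair per departure.

    Returns the number of passengers remaining on the platform after each train.
    """
    residual = initial_residual
    history = []
    for new_waiting, available_capacity in trains:
        # Total waiting = carried-over residual + newly arrived passengers.
        total_waiting = residual + new_waiting
        departed = min(total_waiting, available_capacity)
        # S105: the residual becomes the starting point for the next train.
        residual = total_waiting - departed
        history.append(residual)
    return history


# Hypothetical sequence of three trains starting from 15 waiting passengers.
print(simulate_platform(15, [(40, 50), (60, 50), (30, 40)]))
```

In this example the backlog grows from 5 to 15 when the second train cannot absorb the demand and falls back to 5 after the third train.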

[0097] It's important to understand that after obtaining the precise number of passengers remaining on the platform after the current train departs, it's also crucial to predict the pressure on the upcoming train to determine if early intervention is necessary. Therefore, it's possible to forecast the total passenger load when the next train arrives. This forecasted total load, i.e., the predicted total number of waiting passengers, consists of two parts: the known current backlog (number of passengers remaining on the platform) plus the predicted additional waiting passengers (the number of passengers who will enter the station before the next train arrives, which can be estimated based on historical patterns). This yields the predicted total number of waiting passengers, allowing for an assessment of supply and demand, determining whether the capacity of the next train is sufficient, and providing crucial decision-making basis for implementing proactive scheduling measures such as passenger flow control and adding extra trains.

[0098] In this embodiment, historical ticket sales and inspection data corresponding to the planned arrival time of the next train are obtained within the same historical period; based on the entry records in the historical ticket sales and inspection data, the average entry rate of passengers is calculated; the planned departure interval between the next train and the current train is obtained; the product of the average entry rate and the planned departure interval is calculated as the predicted number of new waiting passengers; the number of passengers remaining on the platform after the current train departs is added to the predicted number of new waiting passengers to obtain the predicted total number of waiting passengers; if the predicted total number of waiting passengers exceeds the train's design capacity, a capacity shortage warning is generated and output.

[0099] For an explanation of the historical same period, please refer to the relevant explanation in step S102; further details will not be repeated here.

[0100] For example, this can be calculated by analyzing historical data from the same period of the automatic fare collection system. Specifically, for the planned arrival time of the next train, extract all passenger records entering the station within the same or similar time periods (e.g., 5-10 minutes before or after the planned arrival time) from historical data over several past working days (e.g., 5-10 days). Calculate the total number of passengers entering the station within the same or similar time periods, and then divide it by the total duration of that time period. The resulting quotient is the historical average entry rate. This average entry rate represents the typical average intensity of passenger arrivals at the station within that time period.
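The forecast in [0098] and the rate calculation in [0100] can be sketched in Python as follows. The sketch assumes that the "total duration" is the window length multiplied by the number of historical days pooled, and that rates are expressed per second; both assumptions, like the function names and sample counts, are illustrative only.

```python
def average_entry_rate(entries_per_day, window_s):
    """entries_per_day: entry counts in the same time window on several past days."""
    if not entries_per_day or window_s <= 0:
        raise ValueError("need at least one day and a positive window length")
    # Total entries divided by total observed duration (assumed: days * window).
    return sum(entries_per_day) / (window_s * len(entries_per_day))


def forecast_next_train(residual, entry_rate_per_s, planned_interval_s, design_capacity):
    """Return (predicted total waiting passengers, capacity shortage warning flag)."""
    predicted_new = entry_rate_per_s * planned_interval_s
    predicted_total = residual + predicted_new
    return predicted_total, predicted_total > design_capacity


# Hypothetical counts in a 10-minute window on 5 past working days.
rate = average_entry_rate([120, 130, 110, 125, 115], window_s=600)
print(rate)
print(forecast_next_train(50, rate, planned_interval_s=180, design_capacity=1000))
print(forecast_next_train(980, rate, planned_interval_s=180, design_capacity=1000))
```

The warning flag follows the rule in [0098] and is raised only when the predicted total exceeds the train's design capacity.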

[0101] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0102] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.

Claims

1. A method for predicting passenger flow in rail transit stations based on big data analysis, characterized in that the method includes: obtaining train operation data and ticket sales and inspection data for the target station; calculating, based on the arrival and departure times and planned arrival times in the train operation data, the platform dwell time, train tracking interval, and upstream delay duration for the current train service; performing, based on the preset station hall passage time, time shift processing on the ticket sales and inspection data to determine the number of new waiting passengers for the current train, and obtaining the initial number of passengers waiting on the platform before the start of the current train; aggregating the number of new waiting passengers and the initial number of passengers waiting on the platform to determine the total number of passengers waiting on the platform; determining the signal interference probability based on the difference between the train tracking interval and the preset safety interval threshold; separating the net congestion time from the platform dwell time based on the signal interference probability; determining the upstream capacity impact coefficient based on the upstream delay duration; determining the passenger load completion rate based on the net congestion time; determining the available capacity value in conjunction with the preset train design capacity; determining the actual departure passenger volume of the current train based on the total number of people waiting on the platform and the available capacity value;
determining, based on the total number of people waiting on the platform and the actual departure passenger volume, the number of people remaining on the platform after the current train departs; and using the number of people remaining on the platform as the initial number of people remaining on the platform before the start of the next train, the method being executed cyclically to achieve continuous prediction of passenger flow status.

2. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that calculating the platform dwell time, the train tracking interval, and the upstream delay duration of the current train based on the arrival and departure times and the planned arrival time in the train operation data comprises: extracting the actual arrival time, the actual departure time, and the planned arrival time of the current train from the train operation data; taking the time difference between the actual departure time and the actual arrival time as the platform dwell time; taking the time difference between the actual departure time of the current train and the actual departure time of the previous train as the train tracking interval; and clamping the difference between the actual arrival time and the planned arrival time to be non-negative to obtain the upstream delay duration, wherein if that difference is non-positive, the upstream delay duration is set to zero.
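
As a rough illustration of the three quantities in claim 2, the following Python sketch computes them from timestamps. The function name `timing_metrics`, its parameter names, and the choice of seconds as the unit are assumptions made for the example.

```python
from datetime import datetime
from typing import Tuple


def timing_metrics(
    actual_arrival: datetime,
    actual_departure: datetime,
    planned_arrival: datetime,
    prev_actual_departure: datetime,
) -> Tuple[float, float, float]:
    """Return (dwell, tracking_interval, upstream_delay) in seconds."""
    # Platform dwell time: actual departure minus actual arrival.
    dwell = (actual_departure - actual_arrival).total_seconds()
    # Tracking interval: gap between consecutive actual departures.
    tracking_interval = (actual_departure - prev_actual_departure).total_seconds()
    # Upstream delay: lateness on arrival, clamped at zero for early or
    # on-time arrivals.
    upstream_delay = max(0.0, (actual_arrival - planned_arrival).total_seconds())
    return dwell, tracking_interval, upstream_delay
```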

3. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that determining the number of newly added waiting passengers comprises: obtaining the actual departure times of the current train and of the previous train; subtracting the preset station hall passage time from the actual departure time of the previous train to obtain a start time; subtracting the preset station hall passage time from the actual departure time of the current train to obtain an end time; constructing a time interval from the start time to the end time and using it as the effective entry time window, wherein the time interval is left-closed and right-open; and counting, in the ticket sales and inspection data, the passenger records whose entry timestamps fall within the effective entry time window, and using that count as the number of newly added waiting passengers.
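
The time-shifted window of claim 3 can be sketched in Python as follows. The function name `count_new_waiting` and the representation of entry records as a plain iterable of timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def count_new_waiting(
    entry_times: Iterable[datetime],
    prev_departure: datetime,
    curr_departure: datetime,
    hall_passage: timedelta,
) -> int:
    """Count gate entries that reach the platform between two departures."""
    # Shift both departure times back by the hall passage time.
    start = prev_departure - hall_passage
    end = curr_departure - hall_passage
    # Left-closed, right-open window: start <= t < end.
    return sum(1 for t in entry_times if start <= t < end)
```

Because the window is left-closed and right-open, an entry that falls exactly on a boundary is counted for exactly one train, so consecutive windows neither overlap nor leave gaps.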

4. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 3, characterized in that determining the total number of passengers waiting on the platform comprises: if the current train is the first train after the start of operations, setting the initial number of passengers remaining on the platform to zero; if the current train is not the first train, reading the number of passengers remaining on the platform after the previous train departed and using it as the initial number of passengers remaining on the platform for the current train; if the number of passengers remaining on the platform after the departure of the previous train cannot be retrieved, iteratively calculating, starting from an initial state estimate and using the train operation data and ticket sales and inspection data of the target station for the same historical period, the net congestion time, the actual departure passenger volume, and the number of passengers remaining on the platform for at least one historical train immediately preceding the current train, and using the number of passengers remaining on the platform obtained by the iterative calculation as the initial number of passengers remaining on the platform for the current train; and adding the number of newly added waiting passengers to the initial number of passengers remaining on the platform to obtain the total number of passengers waiting on the platform.
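
The three-way selection of the initial state in claim 4 might look like the Python sketch below. It is only an outline: the per-train update is passed in as a hypothetical callable `step`, standing in for the full calculation of net congestion time, actual departure passenger volume, and platform residual, and all names are assumptions.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def initial_residual(
    is_first_train: bool,
    stored_residual: Optional[float],
    history: Sequence[T],
    step: Callable[[float, T], float],
    seed: float = 0.0,
) -> float:
    """Pick the initial platform residual for the current train."""
    if is_first_train:
        # First train after the start of operations: empty platform.
        return 0.0
    if stored_residual is not None:
        # Normal case: reuse the residual left by the previous train.
        return stored_residual
    # Fallback: replay preceding historical trains from an initial
    # state estimate (`seed`) to rebuild the missing residual.
    residual = seed
    for record in history:
        residual = step(residual, record)
    return residual
```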

5. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that determining the signal interference probability comprises: obtaining the preset safety interval threshold, a preset system response delay, and a preset signal sensitivity coefficient; subtracting the train tracking interval from the sum of the safety interval threshold and the system response delay to obtain a time difference; calculating a first exponential value as the exponential function with the natural constant as the base and the negative of the product of the time difference and the signal sensitivity coefficient as the exponent; and taking the reciprocal of the sum of 1 and the first exponential value as the signal interference probability.
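
Read literally, claim 5 describes a logistic (sigmoid) function of the time difference. A minimal Python sketch follows; the function and parameter names are assumptions, and the two-branch evaluation is an implementation choice added here to avoid floating-point overflow for very long tracking intervals, not something stated in the claim.

```python
import math


def signal_interference_probability(
    tracking_interval: float,
    safety_threshold: float,
    response_delay: float,
    sensitivity: float,
) -> float:
    """Logistic probability that a stop was caused by signal interference."""
    # Time difference: (safety threshold + response delay) - tracking interval.
    time_diff = (safety_threshold + response_delay) - tracking_interval
    x = sensitivity * time_diff
    if x >= 0:
        # Direct form from the claim: 1 / (1 + e^(-x)).
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically equivalent form that avoids overflow when x << 0.
    z = math.exp(x)
    return z / (1.0 + z)
```

With a positive sensitivity coefficient, the probability is 0.5 when the tracking interval equals the threshold plus the response delay, rises toward 1 as trains run closer together, and falls toward 0 as the interval grows.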

6. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that separating the net congestion time from the platform dwell time based on the signal interference probability comprises: obtaining a preset basic operation time; clamping the difference between the platform dwell time and the preset basic operation time to be non-negative to obtain the total extra dwell time, wherein if that difference is non-positive, the total extra dwell time is set to zero; calculating the difference between 1 and the signal interference probability and using it as a complementary weight; and calculating the product of the total extra dwell time and the complementary weight as the net congestion time, which represents the actual intensity of passenger congestion.
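
The separation step of claim 6 reduces to a clamp followed by a weighting. A small Python sketch, with illustrative names and seconds as the assumed unit:

```python
def net_congestion_time(
    dwell: float,
    base_operation_time: float,
    interference_prob: float,
) -> float:
    """Portion of the extra dwell time attributed to passenger congestion."""
    # Extra dwell beyond the basic operation time, never negative.
    extra_dwell = max(0.0, dwell - base_operation_time)
    # Complementary weight: share of the extra dwell NOT explained by
    # signal interference.
    complementary_weight = 1.0 - interference_prob
    return extra_dwell * complementary_weight
```

For example, a 50 s dwell with a 30 s basic operation time and an interference probability of 0.25 gives 20 s of extra dwell, of which 15 s is treated as net congestion time.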

7. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that determining the upstream capacity impact coefficient based on the upstream delay duration comprises: obtaining a preset minimum guarantee coefficient and a preset attenuation coefficient; calculating a second exponential value as the exponential function with the natural constant as the base and the negative of the product of the upstream delay duration and the attenuation coefficient as the exponent; and multiplying the difference between 1 and the minimum guarantee coefficient by the second exponential value, and adding the product to the minimum guarantee coefficient to obtain the upstream capacity impact coefficient.
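
The coefficient in claim 7 is an exponential decay from 1 toward a floor. The Python sketch below follows the wording of the claim; the function and parameter names are assumptions.

```python
import math


def upstream_capacity_coefficient(
    upstream_delay: float,
    min_guarantee: float,
    attenuation: float,
) -> float:
    """Decay from 1.0 (no delay) toward `min_guarantee` (long delay)."""
    # Second exponential value: e^(-attenuation * upstream_delay).
    decay = math.exp(-attenuation * upstream_delay)
    # Interpolate between the floor and 1 using the decay term.
    return min_guarantee + (1.0 - min_guarantee) * decay
```

With no upstream delay the coefficient is 1, so the full design capacity is available; as the delay grows the coefficient approaches, but for a positive attenuation coefficient never drops below, the minimum guarantee coefficient.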

8. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 7, characterized in that determining the passenger load completion rate based on the net congestion time comprises: obtaining a preset saturation growth coefficient and a preset basic completion rate; calculating a third exponential value as the exponential function with the natural constant as the base and the negative of the product of the net congestion time and the saturation growth coefficient as the exponent; and multiplying the difference between 1 and the preset basic completion rate by the third exponential value, and adding the product to the preset basic completion rate to obtain the passenger load completion rate.
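
Written as a formula, the wording of claim 8 corresponds to the expression below. The symbols are assumptions introduced here: $t_c$ for the net congestion time, $\gamma$ for the saturation growth coefficient, $\eta_0$ for the basic completion rate, and $\eta$ for the passenger load completion rate.

```latex
\eta = \eta_0 + (1 - \eta_0)\, e^{-\gamma t_c}

% Limiting behaviour for \gamma > 0 and 0 \le \eta_0 \le 1:
%   t_c = 0        \Rightarrow \eta = 1
%   t_c \to \infty \Rightarrow \eta \to \eta_0

% Illustrative values (not taken from the patent):
%   \eta_0 = 0.7,\ \gamma = 0.05\,\mathrm{s}^{-1},\ t_c = 20\,\mathrm{s}
%   \eta = 0.7 + 0.3\, e^{-1} \approx 0.810
```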

9. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 8, characterized in that determining the actual departure passenger volume of the current train comprises: calculating the product of the train design capacity, the upstream capacity impact coefficient, and the passenger load completion rate as the available capacity value; comparing the available capacity value with the total number of passengers waiting on the platform to obtain a comparison result; if the comparison result indicates that the total number of passengers waiting on the platform is not greater than the available capacity value, using the total number of passengers waiting on the platform as the actual departure passenger volume; and if the comparison result indicates that the total number of passengers waiting on the platform is greater than the available capacity value, using the available capacity value as the actual departure passenger volume.
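
A compact Python sketch of claim 9, assuming the two coefficients have already been computed; the function name and the choice to return both values are illustrative.

```python
from typing import Tuple


def actual_departure_volume(
    total_waiting: float,
    design_capacity: float,
    upstream_coeff: float,
    completion_rate: float,
) -> Tuple[float, float]:
    """Return (departed, available_capacity) for the current train."""
    # Available capacity: design capacity scaled by both coefficients.
    available = design_capacity * upstream_coeff * completion_rate
    if total_waiting <= available:
        # Everyone waiting can board.
        departed = total_waiting
    else:
        # Demand exceeds capacity: only `available` passengers leave.
        departed = available
    return departed, available
```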

10. The method for predicting passenger flow in rail transit stations based on big data analysis according to claim 1, characterized in that, after determining the number of passengers remaining on the platform after the current train departs based on the total number of passengers waiting on the platform and the actual departure passenger volume, the method further comprises: obtaining historical ticket sales and inspection data for the same period corresponding to the planned arrival time of the next train; calculating the average entry rate of passengers based on the entry records in the historical ticket sales and inspection data; obtaining the planned departure interval between the next train and the current train; calculating the product of the average entry rate and the planned departure interval as the predicted number of additional waiting passengers; adding the number of passengers remaining on the platform after the current train departs to the predicted number of additional waiting passengers to obtain the predicted total number of waiting passengers; and if the predicted total number of waiting passengers exceeds the train design capacity, generating and outputting a warning message indicating insufficient capacity.
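
The look-ahead check of claim 10 can be sketched in Python as follows. The claim does not specify how the average entry rate is computed, so this sketch assumes it is the number of historical entry records divided by the length of the historical window in seconds; the function names and the message text are likewise illustrative.

```python
from typing import Optional, Tuple


def average_entry_rate(entry_count: int, window_seconds: float) -> float:
    """Historical entries per second over a same-period window (assumed)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return entry_count / window_seconds


def capacity_warning(
    residual: float,
    entry_rate: float,
    planned_interval_s: float,
    design_capacity: float,
) -> Tuple[float, Optional[str]]:
    """Return (predicted_total_waiting, warning message or None)."""
    # Expected new arrivals before the next train departs.
    predicted_new = entry_rate * planned_interval_s
    predicted_total = residual + predicted_new
    if predicted_total > design_capacity:
        message = (
            f"Insufficient capacity: {predicted_total:.0f} predicted waiting "
            f"vs. design capacity {design_capacity:.0f}"
        )
        return predicted_total, message
    return predicted_total, None
```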