Railway ticket price estimation method and system based on multi-dimensional feature fusion
By using multi-dimensional feature fusion and hybrid machine learning models, the shortcomings of existing railway ticket price decision-making technologies have been addressed, enabling accurate, flexible, and efficient prediction of railway ticket prices, thus adapting to market changes and differentiated user needs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SUZHOU CHUANGLUTIANXIA INFORMATION TECH CO LTD
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-12
AI Technical Summary
Existing railway fare decision-making technology cannot respond to real-time changes in market supply and demand, makes insufficient use of available data, cannot accurately match actual operating conditions, struggles to process large-scale data, lacks adaptability, and cannot accommodate differentiated user needs.
Input features are constructed through multi-dimensional feature fusion processing, including spatiotemporal coupling features and a three-dimensional ticket price feature space. A hybrid machine learning model is then used to generate a benchmark ticket price and a dynamic discount rate. The model parameters are optimized to achieve multi-factor linkage prediction of ticket prices.
It improves the accuracy and flexibility of fare forecasting, enabling it to respond to changes in market supply and demand, adapt to diversified travel needs, reduce model deployment costs, improve operational efficiency, and support differentiated user fare strategies.
Figure CN122199088A_ABST
Description
Technical Field
[0001] This application relates to the field of intelligent transportation technology, specifically to a railway fare prediction method and system based on multi-dimensional feature fusion.
Background Technology
[0002] The high-speed railway network continues to improve and the coverage of the conventional train network keeps expanding, so passenger volume and operational complexity are rising together. With dynamic fluctuations in market supply and demand, increasingly diverse and demanding passenger travel needs, and explosive growth in operational data, traditional railway fare decision-making models can no longer meet the industry's development needs. Meanwhile, the wide application of machine learning and big data processing in the transportation sector provides technical support for more accurate and efficient railway fare decisions. The industry urgently needs fare decision-making technologies that integrate multi-dimensional features, handle large-scale data, and respond to dynamic market changes, so as to overcome the limitations of traditional technologies and build an intelligent fare system that meets the operational needs of railways in the new era.
[0003] Existing railway fare decision-making technologies rely primarily on distance-based tapering, in which the per-kilometre rate decreases as mileage increases. Constrained by these fixed tapering rules, they suffer from several drawbacks. First, they cannot respond to real-time changes in market supply and demand, making it difficult to adapt to diverse travel needs and operational scenarios. Second, data utilization is insufficient: key dynamic features such as line speed and geographic coordinates are not integrated, which biases fare predictions and prevents them from accurately matching actual railway operations. Third, traditional data storage methods are constrained by row-count limits and struggle with fare datasets exceeding one million records, and existing models tend to grow large, resulting in high deployment costs and low operational efficiency. Fourth, they do not adapt to different ticket types and lack dynamic discount strategies for special tickets such as student and child tickets, so they cannot balance differentiated user needs against railway operational efficiency.
Summary of the Invention
[0004] This application provides a railway fare prediction method, system, device, and medium based on multi-dimensional feature fusion, which is used to solve the problems of existing railway fare prediction methods.
[0005] In a first aspect, this application provides a railway fare prediction method based on multi-dimensional feature fusion, the method comprising: acquiring multi-source heterogeneous data that affect ticket prices, and constructing input features for fare prediction through multi-dimensional feature fusion processing, the input features including at least spatiotemporal coupling features; feeding the input features into a hybrid machine learning model, which includes at least a first sub-model for generating a benchmark (base) fare and a second sub-model for predicting a dynamic discount rate; and combining the base fare with the dynamic discount rate to generate an estimated fare according to preset fare estimation rules.
[0006] By adopting the above technical solution, multi-factor linked prediction of railway fares is achieved: the diverse information affecting fares is effectively integrated, the overall accuracy of fare prediction improves, and the prediction results can respond to dynamic changes in market supply and demand.
[0007] In one specific feasible implementation, the spatiotemporal coupling features are constructed as follows: the operating mileage of the target train within a predetermined operating section and the corresponding average operating speed are obtained from the multi-source heterogeneous data, and spatiotemporal coupling features characterizing the comprehensive spatiotemporal cost of the trip are generated from them.
[0008] By adopting the above technical solutions, key feature information that characterizes the comprehensive time and space cost of the trip can be supplemented, avoiding the one-sidedness of the impact of single-dimensional features on ticket prices and reducing the deviation in ticket price prediction caused by missing features.
[0009] In one specific feasible implementation, the multi-dimensional feature fusion processing of the multi-source heterogeneous data includes: constructing a three-dimensional fare feature space comprising at least a service attribute dimension, a spatial dimension, and a temporal dimension, where the service attribute dimension includes the train type code, the seat class code, and the sub-seat position code, and the spatial and temporal dimensions correspond to the spatiotemporal coupling features; and mapping the multi-source heterogeneous data into the three-dimensional fare feature space, extracting specific feature values from each dimension, and combining them to generate the input features.
[0010] By adopting the above technical solutions, the system integrates the factors influencing ticket prices in terms of service attributes, space, and time, making the input features more comprehensive and providing more complete feature support for subsequent ticket price prediction models.
[0011] In one specific feasible implementation, the input features are generated as follows: the feature values and the spatiotemporal coupling features are normalized to a common scale, enhanced input features are generated according to preset feature interaction rules, and the enhanced input features are fed into the hybrid machine learning model.
[0012] By adopting the above technical solutions, the scale differences between features of different dimensions are eliminated, the potential correlations between features are explored, the expressive power of input features is improved, and a high-quality data foundation is provided for the training and prediction of hybrid machine learning models.
[0013] In a specific feasible implementation, the construction and optimization methods of the first sub-model include: By using a pre-defined hyperparameter search strategy, at least one key structural parameter of the first sub-model is determined and optimized. By combining historical benchmark ticket price data with the input features of the corresponding time period, a first mapping graph between benchmark ticket price and input features is constructed through the first sub-model.
[0014] By adopting the above technical solution, the structural parameters of the first sub-model are optimized, enabling the model to accurately capture the mapping relationship between the benchmark ticket price and the input features, thereby improving the accuracy and stability of the benchmark ticket price generation.
[0015] In a specific feasible implementation, the construction and training methods of the second sub-model include: Generate external time-series feature factors, which include at least periodic holiday identifiers and seasonal fluctuation identifiers; By combining historical discount rate data with external time-series characteristic factors for the corresponding time periods, a second mapping spectrum between dynamic discount rates and external time-series characteristic factors is constructed through a second sub-model.
[0016] By adopting the above technical solutions, the time-series influencing factors such as periodic holidays and seasonal fluctuations are fully incorporated, enabling dynamic discount rate prediction to adapt to the characteristics of different time periods and improving the timeliness and adaptability of discount rate prediction.
[0017] In one specific feasible implementation, the fare estimation rules include: generating a time-dependent dynamic weight coefficient based on the dynamic discount rate; constructing, based on the dynamic weight coefficient, a fusion function that combines the base fare and the dynamic discount rate; and substituting the base fare and the dynamic discount rate into the fusion function to output the estimated fare.
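The application does not disclose the form of the dynamic weight coefficient or the fusion function. Purely as a hypothetical illustration of the three steps above, the sketch below assumes that `discount_rate` is the fraction taken off the base fare and that a weight tapering linearly with days to departure scales that discount; the names `time_weight`, `estimate_fare`, and `horizon_days` are invented for this example.

```python
# Hypothetical fusion function: the application does not disclose the form of
# the weight coefficient or the fusion function; both are illustrative assumptions.


def time_weight(days_to_departure: int, horizon_days: int = 30) -> float:
    """Hypothetical time-dependent weight in [0, 1].

    Assumption: the discount applies in full when booking at least
    `horizon_days` ahead and tapers linearly to zero at departure.
    """
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    clipped = max(0, min(days_to_departure, horizon_days))
    return clipped / horizon_days


def estimate_fare(base_fare: float, discount_rate: float, days_to_departure: int) -> float:
    """Fuse a base fare and a dynamic discount rate into an estimated fare.

    Assumption: `discount_rate` is the fraction taken off the base fare
    (0.2 means 20% off), not the fraction of the fare that is paid.
    """
    if base_fare < 0:
        raise ValueError("base_fare must be non-negative")
    if not 0.0 <= discount_rate < 1.0:
        raise ValueError("discount_rate must be in [0, 1)")
    # Dynamic weight derived from the discount rate and the time to departure.
    effective_discount = time_weight(days_to_departure) * discount_rate
    return round(base_fare * (1.0 - effective_discount), 2)


print(estimate_fare(100.0, 0.2, 30))  # 80.0 (full weight)
print(estimate_fare(100.0, 0.2, 15))  # 90.0 (half weight)
```

Any real deployment would replace both functions with whatever rule the operator actually fits; the point here is only the structure: weight from time and discount rate, then a single fusion step.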
[0018] By adopting the above technical solutions, the benchmark fare and dynamic discount rate can be reasonably integrated, so that the estimated fare can match the travel demand characteristics of the time period, thereby improving the rationality and practical application value of the fare estimation results.
[0019] A second aspect of this application provides a railway fare prediction system based on multi-dimensional feature fusion, the system comprising: The data acquisition and fusion module is used to acquire multi-source heterogeneous data and verify and clean abnormal mileage data; The feature engineering module is used to construct spatiotemporally coupled features and a three-dimensional ticket price feature space, generating enhanced input features; The hybrid model training module is used to train the first sub-model and the second sub-model. The fare estimation module is used to output the estimated fare by combining the base fare and the dynamic discount rate through a fusion function.
[0020] A third aspect of this application provides an electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to execute the above-described method steps.
[0021] A fourth aspect of this application provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the method steps described above.
Attached Figure Description
[0022] Figure 1 is a flowchart of a railway fare prediction method based on multi-dimensional feature fusion according to an embodiment of this application.
Detailed Implementation
[0023] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0024] In the description of the embodiments of this application, the words "for example" or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design that is described as "for example" or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design options. Rather, the use of the words "for example" or "for instance" is intended to present the relevant concepts in a specific manner.
[0025] In the description of the embodiments of this application, the term "multiple" means two or more. For example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.
[0026] Referring to Figure 1, which is a flowchart of a railway fare prediction method based on multi-dimensional feature fusion, the method can be implemented by a computer program or a microcontroller, or run on a railway fare prediction system based on multi-dimensional feature fusion. The computer program can be integrated into a computer device or run as a standalone application. Specifically, the method includes steps S100 to S300, as follows:
S100. Obtain multi-source heterogeneous data that affect ticket prices, and construct input features for fare prediction through multi-dimensional feature fusion processing, the input features including at least spatiotemporal coupling features.
In this embodiment of the application, multi-source heterogeneous data refers to data related to railway fares that come from different channels and have different data types; multi-dimensional feature fusion refers to integrating and processing the features that affect fares from different dimensions so that they more comprehensively reflect the factors affecting fares; and spatiotemporal coupling features are features that combine operating mileage and average operating speed to jointly characterize the spatial distance and time efficiency of the journey.
[0027] In some embodiments, station data, including departure and arrival station information, is first synchronized from the existing railway operations database. Real-time operating data is then obtained with legal authorization; it includes train number information, seat information, published fare information, and GPS mileage data. The GPS mileage data is calculated from the latitude and longitude of the departure station and the latitude and longitude of the arrival station.
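The text does not say how GPS mileage is derived from the two coordinate pairs. A common choice, assumed here, is the haversine great-circle distance on a spherical Earth of mean radius about 6371 km. Note that a straight-line distance will normally be shorter than the actual track distance; the function name `gps_mileage_km` is hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius ~6371 km.
EARTH_RADIUS_KM = 6371.0


def gps_mileage_km(dep_lat: float, dep_lon: float, arr_lat: float, arr_lon: float) -> float:
    """Great-circle (haversine) distance between two stations, in kilometres."""
    phi1, phi2 = math.radians(dep_lat), math.radians(arr_lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(arr_lon - dep_lon)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical example: approximate city-centre coordinates for Beijing and Shanghai.
print(round(gps_mileage_km(39.9042, 116.4074, 31.2304, 121.4737)))
```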
[0028] The acquired multi-source heterogeneous data is verified and cleaned by comparing the mileage data stored in the existing database with the mileage data calculated by GPS, setting a mileage data threshold to remove abnormal mileage data, and then storing the verified and cleaned data in a specified format to support large-scale data processing.
[0029] After the data processing is completed, the input features for fare prediction are constructed through multi-dimensional feature fusion. These input features include at least spatiotemporal coupling features, which can simultaneously reflect the spatial and temporal attributes of the trip, providing more comprehensive feature support for fare prediction.
[0030] In some embodiments, the specified storage format is the Parquet columnar storage format, which supports processing tens of millions of records. The mileage data threshold is set to a ratio of 1.5: when the ratio of the mileage stored in the existing database to the GPS-calculated mileage exceeds 1.5, the record is judged abnormal and removed.
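The 1.5 ratio check can be sketched as a simple filter. The record layout and field names below are hypothetical, and treating non-positive mileages as abnormal is an added assumption; the embodiment states only the database-to-GPS direction of the ratio, so only that direction is checked.

```python
from dataclasses import dataclass

# Ratio threshold from the embodiment: database mileage / GPS mileage > 1.5 is abnormal.
MILEAGE_RATIO_THRESHOLD = 1.5


@dataclass(frozen=True)
class MileageRecord:
    train_no: str          # hypothetical field names
    db_mileage_km: float   # mileage stored in the existing database
    gps_mileage_km: float  # mileage calculated from station coordinates


def is_abnormal(rec: MileageRecord, threshold: float = MILEAGE_RATIO_THRESHOLD) -> bool:
    # Assumption: non-positive mileages cannot be checked and are treated as abnormal.
    if rec.db_mileage_km <= 0 or rec.gps_mileage_km <= 0:
        return True
    return rec.db_mileage_km / rec.gps_mileage_km > threshold


def clean(records: list[MileageRecord]) -> list[MileageRecord]:
    """Keep only records whose mileage ratio does not exceed the threshold."""
    return [r for r in records if not is_abnormal(r)]
```

A ratio of exactly 1.5 is kept, since the rule removes records only when the ratio *exceeds* the threshold.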
[0031] S101. Based on the above embodiments, in another optional embodiment, the spatiotemporal coupling features are constructed as follows: the operating mileage of the target train within the predetermined operating section and the corresponding average operating speed are obtained from the multi-source heterogeneous data, and spatiotemporal coupling features characterizing the comprehensive spatiotemporal cost of the trip are generated.
[0032] In this embodiment of the application, the predetermined operating section refers to the specific route section covered by the target train from the departure station to the arrival station; the operating mileage refers to the actual distance traveled by the target train within the predetermined operating section; the average operating speed refers to the ratio of the total mileage traveled by the target train within the predetermined operating section to the total travel time; and the comprehensive time and space cost of the trip refers to a comprehensive indicator that can comprehensively reflect the spatial distance cost and time efficiency cost of the target train within the predetermined operating section.
[0033] In some embodiments, the predetermined operating section information corresponding to the target train is first extracted from multi-source heterogeneous data, and the operating mileage within the predetermined operating section is extracted. The operating mileage is determined by verifying and cleaning the mileage data stored in the existing database and the mileage data calculated by GPS.
[0034] At the same time, the average operating speed of the target train within the predetermined operating section is extracted. It is calculated from the operating mileage and the total travel time of the target train within that section, where the total travel time is the actual time taken for the target train to travel from the departure station to the arrival station.
[0035] The extracted operating mileage and average operating speed are combined to generate a spatiotemporal coupling feature that characterizes the comprehensive spatiotemporal cost of the trip. This combined processing enables the spatiotemporal coupling feature to include both spatial distance and time efficiency, solving the problem that a single feature cannot fully reflect the comprehensive cost of the trip and improving the accuracy of subsequent ticket price prediction.
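The embodiment computes average speed as section mileage divided by total travel time, but does not give the exact form of the combination. A minimal sketch, assuming the coupling feature simply keeps mileage and speed together as a pair (the product interaction described in [0050] can then be derived from it); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTemporalFeature:
    mileage_km: float     # spatial component: operating mileage in the section
    avg_speed_kmh: float  # temporal component: average operating speed

    def as_vector(self) -> tuple[float, float]:
        return (self.mileage_km, self.avg_speed_kmh)


def build_coupling_feature(mileage_km: float, travel_time_h: float) -> SpatioTemporalFeature:
    """Average speed = section mileage / total travel time (departure to arrival)."""
    if mileage_km <= 0 or travel_time_h <= 0:
        raise ValueError("mileage and travel time must be positive")
    return SpatioTemporalFeature(mileage_km, mileage_km / travel_time_h)
```

For example, a 300 km section covered in 1.5 h yields the pair (300.0, 200.0).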
[0036] S102. Based on the above embodiments, in another optional embodiment, the multi-dimensional feature fusion processing of the multi-source heterogeneous data includes: constructing a three-dimensional fare feature space comprising at least a service attribute dimension, a spatial dimension, and a temporal dimension, where the service attribute dimension includes the train type code, the seat class code, and the sub-seat position code, and the spatial and temporal dimensions correspond to the spatiotemporal coupling features; and mapping the multi-source heterogeneous data into the three-dimensional fare feature space, extracting specific feature values from each dimension, and combining them to generate the input features.
[0037] In this embodiment of the application, the three-dimensional fare feature space is a feature set spanning three dimensions that systematically integrates the different types of features affecting ticket prices. The train type code is a numeric identifier for the type of train, the seat class code is a numeric identifier for the seat class, the sub-seat position code is a numeric identifier for the position of a sub-seat (such as a berth) within the same seat class, and a feature value is the specific numeric value or identifier of a feature in a given dimension for a specific application scenario.
[0038] In some embodiments, when the three-dimensional fare feature space is constructed, the train type code is determined by the type of train operating the service, the seat class code by seat comfort level and service standard, and the sub-seat position code by the specific position of the sub-seat within the seat unit.
[0039] After completing the construction of the three-dimensional ticket price feature space, the multi-source heterogeneous data that has been verified and cleaned is mapped into the three-dimensional ticket price feature space. The corresponding specific feature values are extracted from the service attribute dimension, spatial dimension and time dimension respectively. Then, the feature values of each dimension are combined to generate input features for ticket price prediction.
[0040] In some embodiments, the train type coding rules assign 1 to ordinary trains and 2 to high-speed trains; the seat class coding rules assign 1 to hard seats, 2 to hard sleepers, and 3 to soft sleepers; and the sub-seat position coding rules assign 0.8 to upper berths, 0.9 to middle berths, and 1.0 to lower berths.
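The coding rules above translate directly into lookup tables. In this sketch the English dictionary keys are hypothetical labels, and giving seats that have no berth position the neutral code 1.0 is an assumption the embodiment does not state.

```python
from typing import Optional

# Coding rules from the embodiment; the English keys are hypothetical labels.
TRAIN_TYPE_CODE = {"ordinary": 1, "high_speed": 2}
SEAT_CLASS_CODE = {"hard_seat": 1, "hard_sleeper": 2, "soft_sleeper": 3}
BERTH_POSITION_CODE = {"upper": 0.8, "middle": 0.9, "lower": 1.0}

# Assumption: seats without a berth position get the neutral code 1.0,
# which the embodiment does not specify.
NO_BERTH_CODE = 1.0


def encode_service_attributes(train_type: str, seat_class: str,
                              berth: Optional[str] = None) -> tuple[int, int, float]:
    """Return (train type code, seat class code, sub-seat position code).

    Unknown labels raise KeyError rather than being silently encoded.
    """
    position = NO_BERTH_CODE if berth is None else BERTH_POSITION_CODE[berth]
    return TRAIN_TYPE_CODE[train_type], SEAT_CLASS_CODE[seat_class], position
```

For instance, an upper hard-sleeper berth on an ordinary train encodes as (1, 2, 0.8).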
[0041] S103. Based on the above embodiments, in another optional embodiment, the input features are generated as follows: the feature values and the spatiotemporal coupling features are normalized to a common scale; enhanced input features are generated according to preset feature interaction rules; and the enhanced input features are fed into the hybrid machine learning model.
[0042] In this embodiment of the application, feature value unification refers to the process of adjusting the feature values of different dimensions to the same data scale or distribution range so that each feature plays a balanced role in subsequent model training and prediction.
[0043] Feature interaction rules refer to pre-defined rules used to discover the relationships between different features; enhanced input features refer to input features whose expressive power and association mining capabilities are improved after feature value unification and feature interaction processing.
[0044] Hybrid machine learning models refer to machine learning models that contain multiple sub-models, and the sub-models work together to complete the task of fare prediction.
[0045] In some embodiments, when generating input features, the feature values extracted from each dimension of the three-dimensional ticket price feature space and the spatiotemporally coupled features are uniformly processed, and the data scale differences between different features are eliminated through standardization methods. Standardization processing can ensure that the values of each feature are of the same order of magnitude, avoiding the model's overemphasis on or neglect of some features due to differences in feature scale.
[0046] After the feature values are unified, the unified features are processed based on the preset feature interaction rules. The feature interaction rules can explore the non-linear relationships between different features, so that the features can more accurately reflect the complex factors affecting ticket prices.
[0047] Enhanced input features are generated through feature interaction processing. These enhanced input features not only retain the effective information of each individual feature but also add the correlation information between features. Finally, the enhanced input features are fed into a hybrid machine learning model to provide high-quality feature support for subsequent benchmark fare generation and dynamic discount rate prediction.
[0048] In some embodiments, feature value unification can be achieved using Z-score standardization, calculated as:
$$X' = \frac{X - \mu}{\sigma}$$
[0049] Where X is the original feature value, μ is the mean of all values of the feature, σ is the standard deviation of all values of the feature, and X' is the standardized feature value.
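Applied to one feature column, the formula can be sketched with the standard library. Using the population standard deviation matches "the standard deviation of all values of the feature"; mapping a constant column (σ = 0) to zeros is an added assumption, since the formula is undefined there.

```python
import statistics


def z_score(values: list[float]) -> list[float]:
    """Z-score standardization X' = (X - mu) / sigma over one feature column."""
    mu = statistics.fmean(values)
    # Population standard deviation over all values of the feature.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Assumption: a constant feature carries no scale information; map it to 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In a training and prediction pipeline, μ and σ would normally be computed on the training data and reused unchanged when standardizing new inputs.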
[0050] In some embodiments, the preset feature interaction rules include the product interaction of train type code and seat class code, and the product interaction of operating mileage and average operating speed.
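The two product interactions can be sketched as new columns appended to each feature row. The field names are hypothetical, and the embodiment does not say whether the products are taken on raw or standardized values, so the function simply multiplies whatever values it receives.

```python
def add_interactions(row: dict[str, float]) -> dict[str, float]:
    """Append the two product interactions named in the embodiment.

    Field names are hypothetical; the embodiment does not say whether the
    products use raw or standardized values.
    """
    enhanced = dict(row)  # keep the original features unchanged
    enhanced["train_type_x_seat_class"] = row["train_type_code"] * row["seat_class_code"]
    enhanced["mileage_x_speed"] = row["mileage_km"] * row["avg_speed_kmh"]
    return enhanced
```

The returned row keeps every original feature and adds the interaction terms, which matches the description of enhanced input features as retaining individual-feature information while adding correlation information.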
[0051] S200. Input the input features into a hybrid machine learning model, which includes at least a first sub-model for generating a benchmark fare and a second sub-model for predicting a dynamic discount rate.
In this embodiment of the application, the first sub-model is the sub-model of the hybrid machine learning model dedicated to generating the benchmark fare; the benchmark fare is the fare determined by the basic pricing logic, without considering dynamic discount factors.
[0052] The second sub-model is the sub-model of the hybrid machine learning model dedicated to predicting the dynamic discount rate; the dynamic discount rate is the discount percentage applied to the base fare, which varies with factors such as time of travel, period, and holidays.
[0053] In some embodiments, enhanced input features are fed into a hybrid machine learning model. The first sub-model, upon receiving the enhanced input features, generates a base fare based on the correlation between the features and the base fare. The base fare is the fundamental value for fare estimation and reflects the fare level corresponding to the basic cost of the trip. The second sub-model also receives relevant feature data and, combined with time-series changes, predicts a dynamic discount rate. The dynamic discount rate reflects the fare adjustment range under different time periods and scenarios.
[0054] In some embodiments, the first sub-model and the second sub-model work independently yet collaboratively, providing core data support for the generation of the final estimated ticket price through their respective functional implementations. The architecture design of the hybrid machine learning model can give full play to the advantages of different sub-models, solve the technical problems of generating the benchmark ticket price and predicting the dynamic discount rate respectively, and improve the accuracy and flexibility of the overall ticket price prediction.
[0055] In some embodiments, the first sub-model may be a random forest regression model, and the second sub-model may be a Prophet time-series forecasting model.
[0056] S201. Based on the above embodiments, as another optional embodiment, the method for constructing and optimizing the first sub-model includes: By using a pre-defined hyperparameter search strategy, at least one key structural parameter of the first sub-model is determined and optimized; by combining historical benchmark ticket price data with the input features of the corresponding time period, the first sub-model is used to construct a first mapping graph between the benchmark ticket price and the input features.
[0057] In this embodiment of the application, the hyperparameter search strategy refers to a pre-defined method for screening and determining the optimal hyperparameters of the model; key structural parameters refer to the core parameters that affect the training effect and prediction performance of the first sub-model.
[0058] Historical benchmark ticket price data refers to the benchmark fare records actually published during railway operations over a past period; the first mapping graph is the model mapping that reflects the correspondence between benchmark fares and input features.
[0059] In some embodiments, when constructing and optimizing the first sub-model, a preset hyperparameter search strategy is determined. This strategy is used to search and filter multiple potential key structural parameters of the first sub-model to determine the key structural parameters that enable the first sub-model to achieve optimal performance. The hyperparameter search strategy can avoid blindly setting key structural parameters and improve the efficiency and effectiveness of model optimization.
[0060] After determining the key structural parameters, the first sub-model is optimized to better suit the task requirements of generating benchmark ticket prices. Historical benchmark ticket price data and corresponding time period input features are collected. The historical benchmark ticket price data must be complete and accurate, and the time dimension of the input features for the corresponding time period must be consistent with that of the historical benchmark ticket price data.
[0061] Historical benchmark ticket price data is used as the label for model training, and the input features of the corresponding time period are used as the independent variables for model training. Both are input into the optimized first sub-model for training. Through training, the first sub-model constructs a first mapping graph between the benchmark ticket price and the input features. This first mapping graph can accurately capture the influence of changes in input features on the benchmark ticket price, providing reliable model support for the subsequent generation of benchmark ticket prices.
[0062] Meanwhile, model compression is applied during training of the first sub-model to reduce the model size and improve deployment efficiency and running speed. This addresses the growth in model size caused by training on large-scale data.
[0063] In some embodiments, the hyperparameter search strategy may employ a grid search strategy; the key structural parameters of the first sub-model include the maximum depth of the decision tree and the number of decision trees, wherein the maximum depth of the decision tree is set to 15 and the number of decision trees is set to 200.
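Grid search exhaustively scores every combination of candidate values and keeps the best. The sketch below assumes hypothetical candidate lists (the embodiment reports only the selected values, a maximum depth of 15 and 200 trees) and uses a toy scoring function in place of a real cross-validated model score.

```python
from itertools import product
from typing import Callable

# Candidate values are hypothetical; the embodiment reports only the chosen
# values (max depth 15, 200 trees).
PARAM_GRID = {"max_depth": [5, 10, 15, 20], "n_estimators": [50, 100, 200]}


def grid_search(score_fn: Callable[..., float],
                grid: dict[str, list[int]]) -> tuple[dict[str, int], float]:
    """Exhaustively evaluate every parameter combination; higher score is better."""
    names = list(grid)
    best_params: dict[str, int] = {}
    best_score = float("-inf")
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(max_depth: int, n_estimators: int) -> float:
    # Stand-in for a cross-validated score of a random forest; peaks at (15, 200).
    return -((max_depth - 15) ** 2) - ((n_estimators - 200) ** 2) / 1000


print(grid_search(toy_score, PARAM_GRID))  # ({'max_depth': 15, 'n_estimators': 200}, 0.0)
```

With scikit-learn, the same search is usually expressed as `GridSearchCV` over a `RandomForestRegressor` with a `param_grid` on `max_depth` and `n_estimators`, which adds cross-validation to each evaluation.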
[0064] In some embodiments, model compression can be achieved by setting the compression parameter compress=3, which reduces the size of the first sub-model from 1 GB to 81 MB. The historical benchmark ticket price data includes 100,000 benchmark fare records for a given high-speed railway line over the past year, and the input features for the corresponding period include feature data such as operating mileage, average operating speed, and train type code for each record.
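The parameter name `compress=3` resembles the `compress` argument of `joblib.dump`, which is commonly used to persist scikit-learn models; the source does not name the library, so that is an assumption. As a standard-library stand-in, the sketch pickles a model and zlib-compresses it at level 3. This is lossless serialization compression: it shrinks the stored file, not the model's in-memory footprint.

```python
import pickle
import zlib


def serialize_compressed(model: object, level: int = 3) -> bytes:
    """Pickle a model and zlib-compress it at the given level (0-9)."""
    return zlib.compress(pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL), level)


def deserialize_compressed(blob: bytes) -> object:
    """Inverse of serialize_compressed; only load blobs from trusted sources."""
    return pickle.loads(zlib.decompress(blob))


# Hypothetical stand-in for a fitted model: highly repetitive numeric arrays.
fake_model = {"trees": [[0.5] * 1000 for _ in range(50)]}
raw = pickle.dumps(fake_model, protocol=pickle.HIGHEST_PROTOCOL)
blob = serialize_compressed(fake_model)
print(len(raw), len(blob))
```

The actual reduction depends on how repetitive the serialized model is; the 1 GB to 81 MB figure above is the application's reported result, not something this sketch reproduces.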
[0065] S202. Based on the above embodiments, as another optional embodiment, the method for constructing and training the second sub-model includes: Generate external time-series feature factors, which include at least periodic holiday identifiers and seasonal fluctuation identifiers; combine historical discount rate data with the corresponding external time-series feature factors, and construct a second mapping spectrum between dynamic discount rates and external time-series feature factors through a second sub-model.
[0066] In this embodiment of the application, external time-series characteristic factors refer to external time-series related characteristics that affect changes in dynamic discount rates. Among them, periodic holiday identifiers refer to identifier information used to mark different holiday periods, and seasonal fluctuation identifiers refer to identifier information used to mark different seasonal periods.
[0067] Historical discount rate data refers to the discount rate records actually applied during railway operations over a past period. The second mapping spectrum is the model mapping that reflects the correspondence between dynamic discount rates and external time-series feature factors.
[0068] In some embodiments, external time-series feature factors are generated when constructing and training the second sub-model.
- The periodic holiday identifier is determined from the Chinese statutory holiday schedule. Each holiday period, including the travel peaks before and after the holiday, is assigned a unique identifier.
- The seasonal fluctuation identifier is determined from the division of the year into seasons, with separate identifiers for spring, summer, autumn, and winter. It takes into account how travel demand fluctuates from season to season.
[0069] Historical discount rate data and the corresponding external time-series feature factors are then collected. The historical discount rate data includes discount rate records for different time periods, holidays, and seasons. The external time-series feature factors are aligned with the historical discount rate data along the time dimension.
[0070] Historical discount rate data is used as the label for model training, and external time-series feature factors for the corresponding time periods are used as the independent variables for model training. Both are input into the second sub-model for training. Through training, the second sub-model constructs a second mapping spectrum between the dynamic discount rate and the external time-series feature factors. The second mapping spectrum can accurately capture the influence of changes in external time-series feature factors on the dynamic discount rate, enabling the second sub-model to predict the dynamic discount rate for different time periods.
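The second mapping spectrum links external time-series feature factors to discount rates. The sketch below shows the simplest possible version of such a mapping: it associates each (holiday identifier, seasonal identifier) pair with the average historical discount rate. This is a lookup-table baseline for illustration only, not the Prophet-based second sub-model given in [0083]. The sample records and the fallback value of 1.0 (no discount) for unseen pairs are assumptions.

```python
from collections import defaultdict
from statistics import mean


def fit_discount_map(records):
    """records: iterable of (holiday_id, season_id, discount_rate) tuples.

    Returns a dict mapping (holiday_id, season_id) -> mean historical discount rate.
    """
    groups = defaultdict(list)
    for holiday_id, season_id, rate in records:
        groups[(holiday_id, season_id)].append(rate)
    return {key: mean(rates) for key, rates in groups.items()}


def predict_discount(mapping, holiday_id, season_id, default=1.0):
    """Look up the learned rate; fall back to `default` for pairs never seen in training."""
    return mapping.get((holiday_id, season_id), default)


# Hypothetical training records: (holiday_id, season_id, discount_rate).
history = [(0, 1, 0.75), (0, 1, 0.875), (2, 3, 1.0), (2, 3, 1.0), (0, 4, 0.7)]
mapping = fit_discount_map(history)
print(predict_discount(mapping, 0, 1))  # 0.8125
print(predict_discount(mapping, 2, 3))  # 1.0
```

Unlike this table, a time-series model such as Prophet also captures trend and changepoints. That is the main reason to prefer it over a plain grouped average.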
[0071] In some embodiments, the rules for identifying periodic holidays are as follows: Spring Festival holiday is identified as 1, National Day holiday as 2, Qingming Festival holiday as 3, Labor Day holiday as 4, Dragon Boat Festival holiday as 5, Mid-Autumn Festival holiday as 6, and non-holiday periods as 0; the rules for identifying seasonal fluctuations are as follows: spring is identified as 1, summer as 2, autumn as 3, and winter as 4; the historical discount rate data includes 200,000 discount rate records from different periods over the past two years, and the external time series feature factors for the corresponding periods include the periodic holiday identifier and seasonal fluctuation identifier corresponding to each record.
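The encoding rules in [0071] can be sketched as two small functions. The holiday identifiers follow the list above. Several details are assumptions, not taken from the source:
- the month-to-season split (meteorological seasons);
- the two-day widening used to cover pre- and post-holiday peaks (see [0068]);
- the example holiday date range, which should come from the official schedule in practice.

```python
from datetime import date, timedelta

# Holiday identifiers as listed in [0071]; non-holiday periods are 0.
HOLIDAY_IDS = {
    "spring_festival": 1, "national_day": 2, "qingming": 3,
    "labor_day": 4, "dragon_boat": 5, "mid_autumn": 6,
}


def season_id(d: date) -> int:
    """Spring=1, summer=2, autumn=3, winter=4 (meteorological month split assumed)."""
    return {12: 4, 1: 4, 2: 4, 3: 1, 4: 1, 5: 1,
            6: 2, 7: 2, 8: 2, 9: 3, 10: 3, 11: 3}[d.month]


def holiday_id(d: date, periods, peak_days: int = 2) -> int:
    """Return the holiday identifier for `d`, or 0 for non-holiday days.

    `periods` is a list of (name, first_day, last_day) tuples. Each range is widened
    by `peak_days` on both sides to cover pre- and post-holiday travel peaks
    (assumed width). The first matching period wins if ranges overlap.
    """
    pad = timedelta(days=peak_days)
    for name, start, end in periods:
        if start - pad <= d <= end + pad:
            return HOLIDAY_IDS[name]
    return 0


# Illustrative schedule entry; real dates come from the official holiday notice.
schedule = [("national_day", date(2024, 10, 1), date(2024, 10, 7))]
print(holiday_id(date(2024, 10, 3), schedule), season_id(date(2024, 10, 3)))  # 2 3
print(holiday_id(date(2024, 9, 29), schedule))  # 2 (pre-holiday peak)
print(holiday_id(date(2024, 7, 15), schedule), season_id(date(2024, 7, 15)))  # 0 2
```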
[0072] S300. Combine the base fare with the dynamic discount rate and generate the estimated fare using preset fare estimation rules.
[0073] In this embodiment of the application, the fare estimation rule is a preset rule for calculating the estimated fare from the benchmark fare and the dynamic discount rate. The estimated fare is the railway fare value that the hybrid machine learning model outputs for reference.
[0074] When generating the estimated ticket price, the base ticket price output by the first sub-model and the dynamic discount rate output by the second sub-model are obtained first. The base ticket price reflects the ticket price level corresponding to the basic cost of the trip, and the dynamic discount rate reflects the ticket price adjustment range for the current period.
[0075] The base fare and the dynamic discount rate are then integrated according to the preset fare estimation rules. These rules treat the base fare as the underlying price level and the dynamic discount rate as the adjustable component, so the two are combined consistently.
[0076] Applying the fare estimation rules combines the effects of the base fare and the dynamic discount rate to produce the final estimated fare. The estimated fare reflects both the basic cost of the journey and the supply-demand and temporal characteristics of the current period. It serves as a reference for intelligent railway fare decision-making.
[0077] In addition, during integration, the dynamic discount rate is adjusted for special ticket types such as student tickets and child tickets. This ensures that the estimated fare complies with the relevant policy regulations and can also be used for these ticket types.
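One way to realize the special-ticket adjustment is to scale the dynamic discount rate by a policy multiplier for each special ticket type. The sketch below assumes this multiplicative form, which the source does not specify. The caller supplies the multipliers from the applicable fare policy. The values in `example_policy` are placeholders, not actual railway policy figures.

```python
def adjust_for_ticket_type(discount_rate: float, ticket_type: str,
                           policy_factors: dict) -> float:
    """Scale the dynamic discount rate for special ticket types.

    `policy_factors` maps special ticket types (e.g. "student", "child") to
    multipliers in (0, 1] taken from the applicable policy. Ticket types not
    listed (e.g. "adult") keep the unadjusted rate.
    """
    factor = policy_factors.get(ticket_type, 1.0)
    if not 0.0 < factor <= 1.0:
        raise ValueError(f"policy factor for {ticket_type!r} must be in (0, 1]")
    return discount_rate * factor


# Placeholder multipliers for illustration only; not actual policy values.
example_policy = {"student": 0.75, "child": 0.5}
print(adjust_for_ticket_type(0.9, "adult", example_policy))  # 0.9
print(adjust_for_ticket_type(0.8, "child", example_policy))  # 0.4
```

Keeping the policy table outside the function means a policy change only requires updating data, not code.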
[0078] S301. Based on the above embodiments, as another optional embodiment, the fare estimation rule includes the following steps:
- generating a time-dependent dynamic weighting coefficient based on the dynamic discount rate;
- constructing, based on the dynamic weighting coefficient, a fusion function that integrates the base fare and the dynamic discount rate; and
- substituting the base fare and the dynamic discount rate into the fusion function to output the estimated fare.
[0079] In some embodiments, a time-dependent dynamic weighting coefficient is generated based on the dynamic discount rate. The higher the dynamic discount rate, the smaller the price adjustment for the current time period, and the closer the corresponding dynamic weighting coefficient is to 1. The lower the dynamic discount rate, the larger the price adjustment for the current time period, and the corresponding dynamic weighting coefficient is adaptively adjusted according to the travel demand characteristics of the time period.
[0080] Based on the generated dynamic weight coefficients, a fusion function is constructed to integrate the base fare and the dynamic discount rate. This fusion function is used to reasonably allocate the weights of the base fare and the dynamic discount rate in the estimated fare calculation. The base fare output by the first sub-model and the dynamic discount rate output by the second sub-model are substituted into the constructed fusion function, and the estimated fare is calculated using this function.
[0081] In some embodiments, the fusion function is expressed as: Y = a × P × D.
[0082] Where Y is the estimated ticket price, a is the time-dependent dynamic weighting coefficient, P is the base ticket price, and D is the dynamic discount rate.
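The fusion function Y = a × P × D can be sketched directly. The source does not give a formula for the weighting coefficient a, so `dynamic_weight` below is a hypothetical form chosen to match the behavior described in [0079]:
- a equals 1 when D = 1 (no adjustment);
- a moves further from 1 as the discount deepens;
- the direction follows a demand factor (above 1 for peak periods, below 1 for off-peak periods).

The `sensitivity` parameter is likewise an assumption.

```python
def dynamic_weight(discount_rate: float, demand_factor: float = 1.0,
                   sensitivity: float = 0.1) -> float:
    """Hypothetical time-dependent weighting coefficient a.

    |a - 1| = sensitivity * (1 - D) * |demand_factor - 1|, so a approaches 1 as the
    discount rate D approaches 1, as described in [0079].
    """
    return 1.0 + sensitivity * (1.0 - discount_rate) * (demand_factor - 1.0)


def estimated_fare(base_fare: float, discount_rate: float, weight: float) -> float:
    """Fusion function from [0081]: Y = a * P * D."""
    return weight * base_fare * discount_rate


a = dynamic_weight(0.8, demand_factor=1.5)       # 1 + 0.1 * 0.2 * 0.5 = 1.01
print(round(estimated_fare(500.0, 0.8, a), 2))  # 404.0
```

Here a peak-period demand factor slightly offsets the 20% discount (404.0 instead of 400.0). An off-peak factor below 1 would deepen the discount instead.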
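[0083] In some embodiments, the data acquisition process may be expressed in the following pseudocode (helper names are illustrative):

```
def data_collection():
    # Station data (departure station, arrival station) synchronized from the legacy database
    station_data = sync_from_legacy_database(departure_station, arrival_station)
    # Real-time data obtained by crawling: train number, seat number, published ticket price, GPS mileage
    realtime_data = crawl(train_number, seat_number, published_price, gps_mileage)
    # Fused data: verify database mileage against mileage computed from GPS coordinates
    fused_data = verify_mileage(
        database_mileage,
        gps_mileage(longitude1, latitude1, longitude2, latitude2),
        threshold=1.5,  # records deviating by more than 1.5 times are removed as outliers
    )
    save_as_parquet(fused_data)
```

In some embodiments, the model training steps may be expressed as follows. `feature_matrix`, `published_price`, `chinese_holidays`, and `discount_history` stand for the prepared data.

```python
import joblib
from prophet import Prophet
from sklearn.ensemble import RandomForestRegressor

# Benchmark price model training (first sub-model)
# Feature columns: train type, operating mileage, average speed,
#   seat class code (e.g. hard seat = 1, soft sleeper = 2),
#   sub-seat code (e.g. upper bunk = 0.8, lower bunk = 1.0)
# Label: published ticket price
model = RandomForestRegressor(max_depth=15, n_estimators=200)
model.fit(feature_matrix, published_price)
# Model compression is applied when saving the model, not in the constructor
joblib.dump(model, "benchmark_price_model.joblib", compress=3)

# Discount model training (second sub-model)
# chinese_holidays: DataFrame with "holiday" and "ds" columns (Chinese holiday dataset)
prophet = Prophet(holidays=chinese_holidays, changepoint_prior_scale=0.15)
# discount_history: DataFrame with "ds" (date) and "y" (historical discount rate) columns
prophet.fit(discount_history)
```

In some embodiments, comparative tests were conducted to measure how this scheme affects fare prediction accuracy on conventional train lines (KTZ lines) and high-speed train lines (GDC lines). The results are quantified as prediction accuracy. Table 1 compares fare prediction accuracy before and after optimization.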
[0084] Table 1 Comparison of Prediction Accuracy Before and After Optimization

| Model type | KTZ line | GDC line | Overall accuracy |
| --- | --- | --- | --- |
| Before optimization | - | 0.8979 | - |
| After optimization | 0.9845 | 0.8981 | 0.94 |

Before optimization, the traditional scheme had no complete feature system and no dedicated prediction model adapted to the KTZ line, so it could not make effective fare predictions for that line. Its KTZ accuracy is therefore shown as "-" (no data). For the GDC line, the traditional scheme achieved a prediction accuracy of 0.8979 before optimization.
[0085] The technical solution of this invention improves prediction accuracy in three ways:
- it constructs a three-dimensional ticket price feature space from spatiotemporal coupling features, integrating key influencing factors such as operating mileage, average speed, and seat type;
- it combines the parameter-optimized random forest regression model with a Prophet discount model that incorporates holiday factors; and
- it applies model compression and optimized data storage.
[0086] The prediction accuracy for the KTZ line reached 0.9845. The prediction accuracy for the GDC line rose slightly, from 0.8979 to 0.8981, so the prediction results remained stable. The overall prediction accuracy across both line types reached 0.94.
[0087] These test results indicate that the invention addresses the insufficient data utilization and limited adaptability of existing technologies. It does so through its feature engineering, its hybrid machine learning model, and its engineering optimizations. It achieves high-accuracy prediction for conventional train lines, maintains the accuracy previously achieved for high-speed train lines, and keeps the overall prediction reliable. This provides data support for intelligent decision-making on railway ticket prices.
[0088] Based on the above embodiments, as another optional embodiment, this application also provides a railway ticket price prediction system based on multi-dimensional feature fusion. The system comprises:
- a data acquisition and fusion module, configured to acquire multi-source heterogeneous data and to verify and clean abnormal mileage data;
- a feature engineering module, configured to construct spatiotemporal coupling features and a three-dimensional fare feature space to generate enhanced input features;
- a hybrid model training module, configured to train the first sub-model and the second sub-model; and
- a fare prediction module, configured to output the predicted fare by combining the benchmark fare and the dynamic discount rate through a fusion function.
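The mileage verification performed by the data acquisition and fusion module (the 1.5-times threshold in the [0083] pseudocode) can be sketched as follows. Each record's database mileage is compared with the great-circle distance between its station coordinates, and the record is dropped when one value exceeds the other by more than 1.5 times. The source does not specify either of the following, so both are assumptions:
- that GPS mileage is computed with the haversine formula;
- that the threshold is applied as a symmetric ratio.

Track mileage naturally exceeds straight-line distance, so a check like this only catches gross errors. The coordinates in the example are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def gps_mileage_km(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def is_mileage_outlier(db_km, gps_km, threshold=1.5):
    """Flag a record when the larger mileage exceeds the smaller by more than `threshold` times."""
    lo, hi = sorted((db_km, gps_km))
    return lo <= 0 or hi / lo > threshold


def clean_records(records, threshold=1.5):
    """Keep records whose database mileage agrees with the GPS-derived mileage."""
    return [r for r in records
            if not is_mileage_outlier(r["db_km"], gps_mileage_km(*r["coords"]), threshold)]


# Illustrative records; coords are (lon1, lat1, lon2, lat2), about 112 km apart.
records = [
    {"db_km": 120.0, "coords": (116.4, 39.9, 117.2, 39.1)},  # consistent: kept
    {"db_km": 400.0, "coords": (116.4, 39.9, 117.2, 39.1)},  # > 1.5x GPS distance: dropped
]
print([r["db_km"] for r in clean_records(records)])  # [120.0]
```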
[0089] In some embodiments, the system also includes a special ticket type adaptation module for adjusting the dynamic discount rate for student tickets and child tickets in accordance with existing policies.
[0090] It should be noted that the system provided in the above embodiments is only illustrated by the division of the above functional modules. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the system and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0091] Based on the above embodiments, as another optional embodiment, the embodiments of this application may further include a computer storage medium. The medium may store a plurality of instructions adapted to be loaded by a processor to execute the method of the above embodiments. For the specific execution process, please refer to the detailed description of the above embodiments, which will not be repeated here.
[0092] Based on the above embodiments, as another optional embodiment, this application embodiment may further include an electronic device. The electronic device may include: at least one processor, at least one communication bus, a user interface, at least one network interface, and a memory.
[0093] The communication bus is used to enable communication between these components.
[0094] The user interface may include a display screen and a camera. Optionally, the user interface may also include a standard wired interface and a wireless interface.
[0095] The network interface may include standard wired interfaces and wireless interfaces (such as Wi-Fi interfaces).
[0096] The processor may include one or more processing cores. It connects to various parts of the server via various interfaces and lines, executing instructions, programs, code sets, or instruction sets stored in memory, and accessing data stored in memory to perform various server functions and process data. Optionally, the processor may be implemented using at least one of the following hardware forms: Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor may integrate one or more of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content displayed on the screen; and the modem handles wireless communication. It is understood that the modem may also be implemented as a separate chip without being integrated into the processor.
[0097] The memory may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory may include a non-transitory computer-readable storage medium. The memory can be used to store instructions, programs, code, code sets, or instruction sets. The memory may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, sound playback function, or image playback function), instructions for implementing the above method embodiments, and so on. The data storage area may store the data involved in the above method embodiments. Optionally, the memory may also be at least one storage device located remotely from the processor. As a computer storage medium, the memory may include an operating system, a network communication module, a user interface module, and an application program implementing the method.
[0098] In the electronic device, the user interface is mainly used to provide an input interface for the user and to acquire user input data. The processor may be used to call the application program, stored in the memory, that implements the method. When executed by one or more processors, the application program causes the electronic device to perform one or more of the methods described in the above embodiments. It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps can be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.
[0099] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0100] In the various embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some service interface; the indirect coupling or communication connection between apparatuses or units may be electrical or other forms.
[0101] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0102] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0103] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the technical solution. This computer software product is stored in a memory and includes several instructions that cause a computer device (such as a personal computer, server, or network device) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, portable hard drives, magnetic disks, or optical disks.
[0104] The above are merely exemplary embodiments of this disclosure and should not be construed as limiting the scope of this disclosure. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within the scope of this disclosure. Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure.
[0105] This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described in this disclosure. The specification and embodiments are to be considered exemplary only, and the scope and spirit of this disclosure are defined by the claims.
Claims
1. A railway ticket price prediction method based on multi-dimensional feature fusion, characterized in that the method includes: acquiring multi-source heterogeneous data that affect ticket prices, and constructing input features for ticket price prediction through multi-dimensional feature fusion processing, the input features including at least spatiotemporal coupling features; feeding the input features into a hybrid machine learning model, which includes at least a first sub-model for generating a benchmark ticket price and a second sub-model for predicting a dynamic discount rate; and combining the base fare with the dynamic discount rate to generate an estimated fare using preset fare estimation rules.
2. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 1, characterized in that the method for constructing the spatiotemporal coupling feature includes: obtaining, from the multi-source heterogeneous data, the operating mileage of the target train over the predetermined operating section and the corresponding average operating speed, and generating a spatiotemporal coupling feature that characterizes the comprehensive spatiotemporal cost of the trip.
3. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 2, characterized in that the multi-dimensional feature fusion processing method for the multi-source heterogeneous data includes: constructing a three-dimensional ticket price feature space that includes at least a service attribute dimension, a spatial dimension, and a time dimension, wherein the service attribute dimension includes a train type code, a seat class code, and a sub-seat location code, and the spatial and time dimensions correspond to the spatiotemporal coupling feature; and mapping the multi-source heterogeneous data into the three-dimensional ticket price feature space, extracting specific feature values from each dimension, and combining them to generate the input features.
4. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 3, characterized in that the method for generating the input features includes: unifying the feature values with the spatiotemporal coupling features and generating enhanced input features based on preset feature interaction rules; and feeding the enhanced input features into the hybrid machine learning model.
5. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 1, characterized in that the construction and optimization method of the first sub-model includes: determining and optimizing at least one key structural parameter of the first sub-model using a preset hyperparameter search strategy; and combining historical benchmark ticket price data with the input features of the corresponding time period to construct, through the first sub-model, a first mapping graph between the benchmark ticket price and the input features.
6. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 1, characterized in that the construction and training method of the second sub-model includes: generating external time-series feature factors, which include at least periodic holiday identifiers and seasonal fluctuation identifiers; and combining historical discount rate data with the external time-series feature factors of the corresponding time period to construct, through the second sub-model, a second mapping spectrum between the dynamic discount rate and the external time-series feature factors.
7. The railway ticket price prediction method based on multi-dimensional feature fusion according to claim 1, characterized in that the fare estimation rules include: generating a time-dependent dynamic weighting coefficient based on the dynamic discount rate; constructing, based on the dynamic weighting coefficient, a fusion function that integrates the base fare and the dynamic discount rate; and substituting the base fare and the dynamic discount rate into the fusion function to output the estimated fare.
8. A railway ticket price prediction system based on multi-dimensional feature fusion, characterized in that the system includes: a data acquisition and fusion module, configured to acquire multi-source heterogeneous data and to verify and clean abnormal mileage data; a feature engineering module, configured to construct spatiotemporal coupling features and a three-dimensional ticket price feature space and to generate enhanced input features; a hybrid model training module, configured to train the first sub-model and the second sub-model; and a fare estimation module, configured to output the estimated fare by combining the base fare and the dynamic discount rate through a fusion function.
9. An electronic device, characterized in that it includes a processor, a memory, a user interface, and a network interface, wherein the memory is used to store instructions, the user interface and the network interface are used to communicate with other devices, and the processor is used to execute the instructions stored in the memory to cause the electronic device to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions adapted to be loaded by a processor to execute the method according to any one of claims 1-7.