Operation plan change calculation device and operation plan change calculation method
The system addresses the inefficiencies in existing train operation plan change systems by using a combination of supervised and reinforcement learning to calculate schedule changes that account for specific operational circumstances and multiple reward elements, enhancing the responsiveness and efficiency of schedule adjustments.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- HITACHI LTD
- Filing Date
- 2024-06-21
- Publication Date
- 2026-06-25
AI Technical Summary
Existing train operation plan change systems require significant manpower for preparing training schedule data and do not adequately consider multiple reward elements or specific operational circumstances at affected routes and stations during disruptions.
A system that includes a timetable simulator, reward evaluation unit, supervised learning execution unit, reinforcement learning execution unit, situation determination unit, and operation plan change proposal unit to calculate operation plan changes that account for specific operational circumstances and multiple reward elements.
Enables the calculation of operation plan changes that respond to various disruption situations and consider the specific operational circumstances of target lines and stations, reducing the need for manual input and improving the efficiency of schedule adjustments.
Abstract
Description
Technical Field
[0001] The present invention relates to an operation plan change proposal calculation device and an operation plan change proposal calculation method.
Background Art
[0002] In train operations, accidents involving persons, vehicle trouble, and similar events can cause delays from the original operation schedule or operational disruptions such as the suspension of service between specific stations. The series of schedule changes made to return train operation to normal after such a disruption is referred to as an operation plan change. An operation plan change involves actions such as changing departure times, platforms, stopping stations, destination stations, train types, or the order of operation, and suspending some trains. Operation plan changes are generally made by a dispatcher in the operation command center. The dispatcher must therefore be familiar with the characteristics of each station on the line under his or her responsibility and must know which operation plan changes to implement depending on where and how a disruption occurs. The dispatcher must also be able to restore the original operation schedule as quickly as possible after a disruption. However, because of the recent shortage of dispatchers and the demand for faster and more stable execution of operation plan changes, technologies are known in which a computer assists in creating operation plan change proposals or creates them automatically.
[0003] For example, Patent Document 1 discloses a technology that "a train operation management system 1001 comprises a train operation change proposal creation system 1101, a learning data generation unit 1102 that transmits learning data to the train operation change proposal creation system 1101, and a train operation management unit 1109 that transmits train operation change timetable data to the train operation change proposal creation system 1101 and receives the train operation change proposal. A train operation change action learning unit 1103 comprises a simulation execution unit 1210 that receives learning timetable data and outputs the degree of suitability of the timetable change action, and a search rule update unit 1215 that creates search rules from the degree of suitability. A train operation change proposal search unit 1105 searches for train operation change proposals according to the updated search rules and outputs them to the train operation management unit 1109."
[0004] Furthermore, Patent Document 2 discloses "a system for changing the operation plans of priority and regular operating transportation services when delays occur in transportation services operating according to a planned schedule, comprising: a schedule prediction simulator unit that creates a predicted schedule that takes into account delays in transportation services in addition to the planned schedule, and a revised schedule that takes into account operation plan change items in addition to the predicted schedule; and an operation plan change determination unit that uses the planned schedule and the predicted schedule to select transportation services that should have their operation plans changed, determines operation plan change items for the transportation services that should have their operation plans changed using the relative driving position relationship of each transportation service, information in front of these transportation services, and information on the history of other operation plan changes, and provides the determined operation plan change items to the schedule prediction simulator unit."
Prior Art Documents
Patent Documents
[0005] [Patent Document 1] Japanese Patent Publication No. 2019-209797 [Patent Document 2] Japanese Patent Publication No. 2022-124694
Summary of the Invention
Problem to Be Solved by the Invention
[0006] However, while the invention described in Patent Document 1 makes it possible to teach a computer the know-how and application rules of dispatchers in changing train schedules, thereby automating the calculation of proposed changes to train schedules, there is a problem in that preparing the training schedule data, especially data that reflects the knowledge and know-how of skilled dispatchers, requires a lot of manpower, such as input work by dispatchers. In addition, a method for preparing ideal training schedule data for all operational disruptions on the routes subject to the change in train schedules has not been considered.
[0007] Furthermore, the invention described in Patent Document 2 performs reinforcement learning under various operational disruption conditions and can therefore learn rules for applying operation plan changes automatically, provided that some timetable data is prepared. However, it does not take into account cases that require multiple different reward elements, such as prioritizing passenger transport volume over delay elimination in some situations. In addition, because reinforcement learning optimizes schedules solely on the basis of the set rewards, it does not adequately handle situations in which operation plan changes must take into account circumstances specific to the affected routes and stations.
[0008] The present invention aims to enable the calculation of revised train operation plans that can respond to various operational disruption situations and take into account the specific operational circumstances of the target lines and stations.
Means for Solving the Problem
[0009] To solve the above problems, one typical operation plan change calculation device includes: a timetable simulator that predicts a predicted timetable in which an operational disruption occurs and a revised timetable that reflects the proposed operation plan change in the predicted timetable; a reward evaluation unit that calculates and evaluates the reward calculated based on specific criteria in the revised timetable; a supervised learning execution unit that generates a supervised learning model using data of operational disruption situations that have occurred in the past as training data; a reinforcement learning execution unit that generates a reinforcement learning model that predicts an operation plan change that gives the highest possible reward for the revised timetable; a situation determination unit that determines whether to calculate an operation plan change using the supervised learning model or the reinforcement learning model depending on the operational disruption situation; and an operation plan change proposal design unit that designs the operation plan change based on the operational disruption situation and the learning model determined by the situation determination unit.
Effects of the Invention
[0010] According to the present invention, it is possible to calculate proposed changes to the operation plan that can respond to various operational disruption situations and also take into account the specific operational circumstances of the target line and station. Other issues, configurations, and effects not mentioned above will be clarified by the description of the embodiments for carrying out the invention below.
Brief Description of the Drawings
[0011]
[Figure 1] Figure 1 shows an example of a system for calculating proposed changes to the operation plan.
[Figure 2] Figure 2 shows an example of the hardware and software configuration of the operation plan change calculation device.
[Figure 3] Figure 3 shows an example of the data managed by the operation plan change performance data.
[Figure 4] Figure 4 shows an example of the combinations of reward weight coefficients managed by the reward weight table.
[Figure 5] Figure 5 shows an example of the supervised learning calculation results table.
[Figure 6] Figure 6 shows an example of the reinforcement learning calculation results table.
[Figure 7] Figure 7 shows an example of the list of cases requiring retraining.
[Figure 8] Figure 8 shows an example of the screen display when an operation plan change by supervised learning is proposed.
[Figure 9] Figure 9 shows an example of the screen display when an operation plan change by reinforcement learning is proposed.
[Figure 10] Figure 10 shows an example of the screen presented after the commander chooses, in Figure 9, to execute an operation plan change by reinforcement learning.
[Figure 11] Figure 11 is a flowchart showing a process of determining a calculation method for an operation plan change proposal.
[Figure 12] Figure 12 is a flowchart showing a process of determining an update timing of the situation determination unit.
[Figure 13] Figure 13 is a flowchart showing a process of determining a calculation method for an operation plan change proposal.
MODE FOR CARRYING OUT THE INVENTION
[0012] Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that the present invention is not limited by this embodiment. Also, in the description of the drawings, the same parts are denoted by the same reference numerals. When there are a plurality of components having the same or similar functions, they may be described with the same reference numeral and different subscripts. Also, when it is not necessary to distinguish these plurality of components, the subscripts may be omitted in the description. Also, terms such as "first," "second," "third," etc. may be used in this disclosure to describe various elements or components, but it will be understood that these elements or components should not be limited by these terms. These terms are only used to distinguish one element or component from another. Thus, the first element or component discussed below could also be referred to as the second element or component without departing from the teachings of the inventive concept. The position, size, shape, range, etc. of each component shown in the drawings may not represent the actual position, size, shape, range, etc. in order to facilitate understanding of the invention. For this reason, the present invention is not necessarily limited to the position, size, shape, range, etc. disclosed in the drawings.
[0013] [First Embodiment] First, the operation plan change proposal calculation system 1 of the first embodiment will be described with reference to FIGS. 1 to 10. FIG. 1 shows an example of the operation plan change proposal calculation system 1. The operation plan change proposal calculation device 101 is connected to the operation management system 201. It receives from the operation management system 201 the information needed to execute an operation plan change, such as the operation status, the occurrence of operational disruptions, and weather information, and calculates operation plan change proposals and the like.
[0014] Conversely, the operation management system 201 transmits this information to the operation plan change proposal calculation device 101 and receives the operation plan change proposals that the device calculates.
[0015] Next, referring to FIG. 2, the operation plan change proposal calculation device 101 of the embodiment will be described. FIG. 2 shows an example of the hardware configuration and software configuration of the operation plan change proposal calculation device 101. The operation plan change calculation device 101 includes a processing unit 1100, a storage device 1200, an arithmetic unit 1300, an input / output interface 1400, and a network interface 1500 for communication with the operation management system 201.
[0016] <Processing Unit> The processing unit 1100 performs the process of calculating proposed changes to the operation plan. In the embodiments of this disclosure, the processing unit 1100 is implemented as software that functions as a data processing execution module. The processing unit 1100 is loaded into the storage device 1200 when the operation plan change calculation device 101 starts up and is then executed by the arithmetic unit 1300. The processing unit 1100 mainly includes a control unit 1101, a timetable simulator 1102, a reward evaluation unit 1103, a supervised learning execution unit 1104, a reinforcement learning execution unit 1105, a reward weight estimation unit 1106, a situation determination unit 1107, and an operation plan change proposal design unit 1108.
[0017] <<Control Unit>> The control unit 1101 is a control program that controls the operation of the data processing execution module of the processing unit 1100.
[0018] <<Timetable Simulator>> The timetable simulator 1102 predicts a timetable (hereinafter referred to as the predicted timetable) that adds to the normal timetable the impact of operational disruptions that cause delays between specific stations or make it impossible to operate trains between specific stations. Furthermore, the timetable simulator 1102 takes a proposed operation plan change as input and predicts the timetable that results when the proposal is reflected in the predicted timetable (hereinafter referred to as the revised timetable).
[0019] <<Reward Evaluation Unit>> The reward evaluation unit 1103 calculates and evaluates the target value of the operation plan change (hereinafter referred to as the "reward"), which is calculated based on specific criteria in the revised timetable that reflects the proposed operation plan change in the predicted timetable. The criteria for calculating the reward are not limited to a single factor and may involve multiple factors. Furthermore, the reward may be of one type or of multiple types.
[0020] <<Supervised Learning Execution Unit>> The supervised learning execution unit 1104 performs machine learning using the past performance data accumulated in the operation plan change performance data 1201 (described later) as training data for calculating proposed operation plan changes. The trained model is stored in the supervised learning model 1202 (described later).
[0021] <<Reinforcement Learning Execution Unit>> The reinforcement learning execution unit 1105 performs reinforcement learning to calculate an operation plan change proposal that maximizes the reward value of the revised timetable, that is, the predicted timetable generated by the timetable simulator 1102 with the proposal reflected in it. The reinforcement learning execution unit 1105 stores the model acquired through learning in the reinforcement learning model 1203, which will be described later. Furthermore, instead of using a single reward value, the reinforcement learning execution unit 1105 may perform reinforcement learning to calculate an operation plan change proposal that maximizes the sum of multiple types of reward values.
[0022] <<Reward Weight Estimation Unit>> The reward weight estimation unit 1106 estimates the weights of the rewards that were likely considered when actually making changes to the operation plan, for example, the weighted sum of multiple elements, from the past operation plan change performance data included in the operation plan change performance data 1201. Furthermore, the estimated reward weights are stored in the reward weight table 1204.
[0023] <<Situation Determination Unit>> The situation determination unit 1107 evaluates information about the operational disruption situation that is subject to an operation plan change. Specifically, the situation determination unit 1107 refers to the supervised learning calculation results table 1206 and the reinforcement learning calculation results table 1207 to determine what kind of operational disruption it is, or which operation plan change calculation policy should be used to address it.
[0024] <<Operation Plan Change Proposal Design Unit>> The operation plan change proposal design unit 1108 uses the supervised learning model or the reinforcement learning model to devise the optimal operation plan change proposal for the target operational disruption. Alternatively, the unit may be configured to devise proposals with both the supervised learning model and the reinforcement learning model and to select one of them based on the determination result of the situation determination unit 1107.
[0025] <<User Action Collection Unit>> The user action collection unit 1109 collects the operations that the commander performs via the input / output interface 1400 in response to the operation plan change proposals devised and presented by the operation plan change proposal design unit 1108. The collected operations are stored in the calculation change operation history 1208.
[0026] <<Situation Determination Update Unit>> The situation determination update unit 1110 updates the supervised learning calculation results table 1206 and the reinforcement learning calculation results table 1207 and adds items to the list of cases requiring retraining 1209. Specifically, the situation determination update unit 1110 updates the two tables based on the calculation change operation history 1208 and adds an item to the list of cases requiring retraining 1209.
[0027] <Arithmetic Unit> The arithmetic unit 1300 executes the program of the processing unit 1100, which is stored in the storage device 1200.
[0028] <Input / Output Interface> The input / output interface 1400 inputs operations from the commander to the operational plan change calculation device 101 and displays the information output from the operational plan change calculation device 101 to the commander. The input / output interface 1400 may be, for example, a keyboard, mouse, or display, but it may also be of other types of devices.
[0029] <Network Interface> The network interface 1500 is a device that provides a physical connection to a network. In the embodiments of this disclosure, the network interface 1500 is used by the operation plan change calculation device 101 to send and receive information with the operation management system 201.
[0030] <Storage Device> The storage device 1200 stores the information that the arithmetic unit 1300 needs to execute the program of the processing unit 1100. This information mainly includes the operation plan change performance data 1201, supervised learning model 1202, reinforcement learning model 1203, reward weight table 1204, constraint table 1205, supervised learning calculation results table 1206, reinforcement learning calculation results table 1207, calculation change operation history 1208, and list of cases requiring retraining 1209.
[0031] <<Operation Plan Change Performance Data>> Next, the operation plan change performance data 1201 will be described with reference to Figure 3. Figure 3 shows an example of the data managed by the operation plan change performance data 1201. The operation plan change performance data 1201 records the feature quantities of operational disruption situations that have occurred in the past, and a record is added each time an operational disruption occurs and an operation plan change is made.
[0032] The operational plan change data 1201 mainly includes, for example, the relevant performance ID column 12011, the operational disruption status column 12012, and the feature group ID column 12013. Here, the relevant performance ID is a unique ID corresponding to the operational disruption that occurred, and one is issued for each operational disruption. Furthermore, a disruption in train operations refers to a situation where trains are running on a schedule different from the normal schedule (a predetermined schedule), such as a situation where trains are running despite delays.
[0033] Furthermore, the "operational disruption status" is an abstract representation of the circumstances of the operational disruption that occurred. It includes information on items such as the "time of occurrence" (the period during which the operational disruption occurred), the "expected duration of disruption" (the period during which the operational disruption affects the train), and the "disrupted section" (the section of the train affected by the disruption). This information is stored in the operational disruption status column 12012. Furthermore, the feature group ID is a label determined based on information regarding the operational disruption, allowing us to determine which group the operational disruption belongs to.
[0034] One method of grouping using feature group IDs is to treat the values of operational disruption information from all accumulated historical data as features and perform clustering based on multidimensional features. In this case, when comparing multidimensional features, records with similar features can be considered to belong to the same group, assigned the same feature group ID, and treated as similar operational disruption situations. This allows us to use the operational disruption situations of feature groups assigned the same feature group ID as training data for those operational disruption situations in the supervised learning model described later, even if there are no operational disruption situations with the same feature.
[0035] The number of feature groups may be predetermined, or it may be determined from the distribution of the accumulated operational plan change data 1201 in the multidimensional feature space. Alternatively, operational disruption conditions may be randomly generated independently of the operational plan change data 1201, and these may also be included in the clustering process and label determination.
[0036] As a result, supervised learning of operation plan changes can be performed even for an operational disruption situation that is not identical to any stored in the operation plan change performance data 1201, that is, one that has not occurred before, by using similar operational disruption situations as training data.
[0037] Furthermore, the method of grouping using feature group IDs is not limited to this method; grouping may be performed using other methods as well.
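The clustering-based grouping described above can be illustrated with a minimal sketch. It assumes each past record is reduced to a three-element feature vector (hour of occurrence, expected disruption duration in minutes, index of the disrupted section) and uses a plain k-means loop; the feature choice, the function name `assign_feature_groups`, and the sample records are hypothetical, and the disclosure does not specify which clustering algorithm is used.

```python
from math import dist


def assign_feature_groups(features, k, iterations=20):
    """Cluster feature vectors with a simple k-means loop.

    Returns one integer group label per record; records with the same
    label would share a feature group ID.
    """
    # Deterministic initialisation: the first k records are the starting centroids.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(f, centroids[c])) for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: (hour of occurrence, expected disruption minutes, section index)
records = [(8, 30, 2), (9, 35, 2), (22, 120, 7), (23, 110, 7)]
group_ids = assign_feature_groups(records, k=2)
```

In this sketch the two morning disruptions end up in one group and the two late-evening disruptions in the other. In practice the features would likely need scaling so that no single dimension dominates the distance.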
[0038] <<Supervised Learning Model>> The supervised learning model 1202 is a learning model created by the supervised learning execution unit 1104, which performs supervised learning using the operational plan change data 1201 as training data for calculating proposed operational plan changes.
[0039] <<Reinforcement Learning Model>> The reinforcement learning model 1203 is a learning model created by the reinforcement learning execution unit 1105 by performing reinforcement learning to determine a strategy for calculating a revised train schedule that maximizes the reward value set from the predicted schedule generated by the schedule simulator 1102.
[0040] <<Reward Weight Table>> Next, the reward weight table 1204 will be described with reference to Figure 4. Figure 4 shows an example of the combinations of reward weight coefficients managed by the reward weight table 1204. Figure 4(A) shows an example of the weight table 401 included in the reward weight table 1204. Figure 4(B) shows an example of the sort table 402 included in the reward weight table 1204. The reward weight table 1204 mainly consists of the weight table 401 and the sort table 402 and stores the reward weights estimated by the reward weight estimation unit 1106. The reward here represents what was likely considered when an operation plan change was actually made, as estimated from the past records in the operation plan change performance data 1201. It is expressed, for example, as a weighted sum of multiple factors, although it may be calculated by other methods.
[0041] The reinforcement learning execution unit 1105 generates trained reinforcement learning models, one for each reward weight combination stored in the reward weight table 1204. In the embodiments of this disclosure, the reward is calculated as a weighted sum based on equation (1): each of several elements is multiplied by its weight coefficient α, β, or γ, and the products are added together.
[Math 1] Reward = α × delay reduction amount + β × passenger satisfaction + γ × transport volume … (1)
[0042] Here, "delay reduction amount" is an indicator that shows, for example, the amount of delay that is improved by the revised timetable by applying the proposed change in the operational plan, compared to the amount of delay due to operational disruptions predicted in the predicted timetable. Furthermore, "passenger satisfaction" is an indicator that shows, for example, the low number of cancellations in the overall timetable subject to the operational plan change. "Transportation volume" is another indicator that shows how many passengers can be transported by the entire timetable.
[0043] In this way, by changing the weight coefficient values for each term of the reward value, it is possible to reflect the policies that should be prioritized when changing the operation plan (for example, calculating an operation plan that prioritizes passenger transport volume because it is the morning rush hour) and flexibly calculate the operation plan change proposal based on the reward that should be prioritized.
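As an illustration of equation (1) and of how changing the weight coefficients shifts the preferred proposal, the following is a minimal sketch. The indicator values are assumed to be normalized to the range 0 to 1, and the names `RewardWeights`, `reward`, and the two sample proposals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    alpha: float  # weight on delay reduction amount
    beta: float   # weight on passenger satisfaction
    gamma: float  # weight on transport volume


def reward(weights, delay_reduction, passenger_satisfaction, transport_volume):
    """Weighted sum of the three reward elements, following equation (1)."""
    return (
        weights.alpha * delay_reduction
        + weights.beta * passenger_satisfaction
        + weights.gamma * transport_volume
    )


# Hypothetical, normalized indicator values for two candidate proposals.
proposal_a = dict(delay_reduction=0.8, passenger_satisfaction=0.5, transport_volume=0.25)
proposal_b = dict(delay_reduction=0.4, passenger_satisfaction=0.5, transport_volume=0.75)

# Hypothetical weight combinations: rush hour favours transport volume,
# off-peak favours delay reduction.
rush_hour = RewardWeights(alpha=0.2, beta=0.2, gamma=0.6)
off_peak = RewardWeights(alpha=0.6, beta=0.2, gamma=0.2)

best_in_rush_hour = max((proposal_a, proposal_b), key=lambda p: reward(rush_hour, **p))
best_off_peak = max((proposal_a, proposal_b), key=lambda p: reward(off_peak, **p))
```

With these assumed values, proposal B scores higher under the rush-hour weights and proposal A scores higher under the off-peak weights, which is the kind of policy switch the paragraph above describes.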
[0044] Weight table 401 assigns a combination of reward weights and the corresponding performance ID belonging to each feature group ID. Furthermore, for each reward weight combination, the commander either inputs weight coefficients that are deemed appropriate for the performance data set belonging to that feature group, or the weight coefficients are automatically estimated.
[0045] One method of automatic estimation assumes that the operation plan changes for all performance records within the same feature group were calculated with the same reward weights. Under this assumption, candidate values are set for the coefficients α, β, and γ, and a search is made for the combination that minimizes the variation in reward values across those records.
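The search described above can be sketched as a small grid search. This is only an illustration under stated assumptions: the weights are constrained to be non-negative and to sum to 1 (an added assumption, needed to rule out the trivial all-zero solution), variation is measured as population variance, and the function name `estimate_weights`, the grid step, and the sample records are hypothetical.

```python
from itertools import product
from statistics import pvariance


def estimate_weights(records, step=0.25):
    """Search (alpha, beta, gamma) on a grid for the lowest reward variance.

    records: list of (delay_reduction, passenger_satisfaction, transport_volume)
    tuples for the performance records of one feature group.
    Assumption: weights are non-negative and sum to 1.
    """
    n = round(1 / step)
    best, best_var = None, float("inf")
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # gamma would be negative
        alpha, beta, gamma = i * step, j * step, (n - i - j) * step
        rewards = [alpha * d + beta * s + gamma * v for d, s, v in records]
        var = pvariance(rewards)
        if var < best_var:
            best, best_var = (alpha, beta, gamma), var
    return best


# Hypothetical group in which transport volume was held steady across records
# while the other two indicators varied.
group_records = [(0.2, 0.9, 0.7), (0.6, 0.4, 0.7), (0.9, 0.1, 0.7)]
estimated = estimate_weights(group_records)
```

For this sample group the search puts all the weight on transport volume, since that is the only element that stays constant across the records.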
[0046] Sort table 402 displays the reward weight combinations sorted based on their relationship to the reward weight table. In this embodiment, the sort table 402 is presented to the commander sorted, for example, based on the coefficient α in ascending order, but other coefficients may also be used as the sorting criterion. Furthermore, the system may be configured to allow editing of weight coefficients and deletion of unnecessary combinations. Reinforcement learning can be performed for each combination of reward weight coefficients present in the reward weight table.
[0047] <<Constraint Table>> Constraint table 1205 is a table that stores conditions that should be explicitly avoided for operational plan modification tasks that are tested as proposed operational plan changes during reinforcement learning execution.
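One way to picture the role of the constraint table is as a filter on the candidate change operations that reinforcement learning is allowed to try. The sketch below assumes a constraint is an (action type, station) pair; this record layout, the action names, and the function `is_allowed` are hypothetical, since the disclosure does not describe the table's format.

```python
# Hypothetical constraint table: each entry names an action type and the
# station at which that action must not be tried.
CONSTRAINT_TABLE = [
    {"action": "change_platform", "station": "A"},
    {"action": "cancel_train", "station": "C"},
]


def is_allowed(candidate, constraints=CONSTRAINT_TABLE):
    """Return True if the candidate operation matches no constraint entry."""
    return not any(
        candidate["action"] == c["action"] and candidate["station"] == c["station"]
        for c in constraints
    )


# Hypothetical candidate operations generated during a learning episode.
candidates = [
    {"action": "change_platform", "station": "A", "train": "101"},
    {"action": "change_departure_time", "station": "A", "train": "101"},
    {"action": "cancel_train", "station": "B", "train": "205"},
]
allowed = [c for c in candidates if is_allowed(c)]
```

Here the platform change at station A is excluded before it is ever evaluated, while the other two candidates remain available.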
[0048] <<Supervised Learning Calculation Results Table>> Next, with reference to Figure 5, we will explain the supervised learning calculation results table 1206. Figure 5 shows an example of the supervised learning calculation results table 1206. The supervised learning calculation results table 1206 is data that the situation judgment unit 1107 references to determine what kind of operational disruption it is, and whether or not to use a specific operational plan change calculation policy, when dealing with information about operational disruptions that may lead to changes in the operational plan. Table 1206, which shows the results of supervised learning, primarily stores data calculated through supervised learning in a database table format.
[0049] The supervised learning calculation results table 1206 records, for each feature group ID, whether performance data for supervised learning exists (supervised experience), the number of times proposed changes were calculated using the supervised learning model (number of proposed changes calculated), the number of times the calculated proposed changes were adopted (number of proposed changes adopted), the number of times they were rejected (number of proposed changes rejected), and the performance reliability. Here, the performance reliability is an indicator of how reliably the supervised learning model can be used for operational disruptions belonging to the feature group ID in question; a higher value suggests greater reliability. In the embodiments of this disclosure, the performance reliability is calculated based on equation (2) as the percentage of calculated proposals that were adopted.
[Math 2] Performance reliability = Number of times proposed changes were adopted ÷ Number of times proposed changes were calculated × 100 … (2)
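Equation (2) can be written out as a short sketch. The function name `performance_reliability` and the sample counts are hypothetical, and returning `None` when no proposal has been calculated yet is an assumption; the disclosure does not say how that case is handled.

```python
def performance_reliability(adopted, calculated):
    """Percentage of calculated proposals that were adopted, per equation (2).

    Assumption: returns None when no proposal has been calculated yet,
    because the ratio is undefined in that case.
    """
    if calculated == 0:
        return None
    return 100 * adopted / calculated


# Hypothetical row for one feature group: 4 proposals calculated, 3 adopted.
row = {"calculated": 4, "adopted": 3, "rejected": 1}
reliability = performance_reliability(row["adopted"], row["calculated"])
```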
[0050] <<Reinforcement Learning Calculation Results Table>> Next, the reinforcement learning calculation results table 1207 will be described with reference to Figure 6. Figure 6 shows an example of the reinforcement learning calculation results table 1207. Like the supervised learning calculation results table 1206, the reinforcement learning calculation results table 1207 is data that the situation determination unit 1107 references to determine the type of operational disruption and whether to use a given operation plan change calculation policy when handling information about an operational disruption that is subject to an operation plan change. It mainly stores, in a database table format, the results of calculations performed with reinforcement learning. Its items are the same as those of the supervised learning calculation results table 1206, except that it has no item indicating whether supervised experience exists. The performance reliability is likewise calculated based on equation (2), as the percentage of calculated proposals that were adopted.
[0051] In the embodiments of this disclosure, the supervised learning calculation results table 1206 and the reinforcement learning calculation results table 1207 are stored separately, but they may be stored together.
[0052] <<Calculation Change Proposal Operation History>> The calculation change proposal operation history 1208 is data that records the dispatcher's operation actions, collected by the user action collection unit 1109 via the input / output interface 1400 and stored.
[0053] <<List of Cases Requiring Retraining>> Next, we will explain the list of cases requiring retraining 1209 with reference to Figure 7. Figure 7 shows an example of the list of cases requiring retraining 1209. The list of cases requiring retraining 1209 is a list to which the situation determination update unit 1110 adds items when it updates the supervised learning calculation results table 1206 and the reinforcement learning calculation results table 1207 based on the calculation change proposal operation history 1208. Maintaining this list makes it possible, when the learning models are reconstructed, to identify operational disruptions whose feature group IDs yield proposals that the dispatcher does not adopt under the current supervised learning model or reinforcement learning model, and to add operational disruptions that the current record data or learning models cannot handle.
[0054] <<Screen display when an operation plan change is proposed>> Next, with reference to Figures 8 to 10, we will describe examples of the screens shown on the input / output interface 1400 when an operation plan change is proposed. Figure 8 shows an example of the screen display when an operation plan change using supervised learning is proposed.
[0055] Figure 8 presents the performance-based AI, that is, the supervised learning method, as the recommended method for changing the operation plan. The reasons for this recommendation are shown at the bottom of the screen. For example, Figure 8 shows that training record data exists for the feature group ID to which the operational disruption subject to the change belongs, and that the history of proposed changes calculated using the supervised learning model for that feature group ID indicates that the performance reliability of the supervised learning method is high.
[0056] Figure 9 shows an example of the screen display when an operation plan change using reinforcement learning is proposed. Figure 9 presents the reward-based AI, that is, the reinforcement learning method, as the recommended method for changing the operation plan, and, as in Figure 8, the reasons for this recommendation are shown at the bottom of the screen. For example, Figure 9 shows that no training record data exists for the feature group ID to which the operational disruption subject to the change belongs, and that the history of past calculations using the reinforcement learning model for that feature group ID indicates high performance reliability.
[0057] Figure 10 shows an example of the screen presented after the dispatcher chooses, in Figure 9, to change the operation plan using reinforcement learning. On this screen, the dispatcher selects the reward weight combination used to calculate the proposed operation plan change. Setting the weights used in reinforcement learning makes it possible to reflect the objectives to be emphasized, such as delay reduction, customer satisfaction, and transport volume.
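The effect of a reward weight combination can be illustrated with a small sketch. It assumes that the reward is a weighted sum of normalized reward elements; the element names, the candidate timetables, and all numeric values are hypothetical.

```python
from __future__ import annotations


def weighted_reward(elements: dict[str, float], weights: dict[str, float]) -> float:
    """Combine reward elements into one value using a reward weight combination."""
    return sum(weights[name] * elements[name] for name in weights)


def best_candidate(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    """Return the name of the candidate timetable with the highest weighted reward."""
    return max(candidates, key=lambda name: weighted_reward(candidates[name], weights))


# Hypothetical reward elements (0 to 1) for two candidate revised timetables.
candidates = {
    "A": {"delay_reduction": 0.9, "customer_satisfaction": 0.4, "transport_volume": 0.5},
    "B": {"delay_reduction": 0.5, "customer_satisfaction": 0.9, "transport_volume": 0.6},
}
delay_focused = {"delay_reduction": 0.7, "customer_satisfaction": 0.2, "transport_volume": 0.1}
satisfaction_focused = {"delay_reduction": 0.2, "customer_satisfaction": 0.7, "transport_volume": 0.1}

print(best_candidate(candidates, delay_focused))         # A
print(best_candidate(candidates, satisfaction_focused))  # B
```

The same two candidates are ranked differently depending on the selected weights, which is why the choice is left to the dispatcher on this screen.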
[0058] <Process for generating learning models in supervised learning> Next, we will explain the process of generating a supervised learning model by performing supervised learning training in advance as a preprocessing step before calculating proposed changes to the operation plan. The supervised learning execution unit 1104 learns a supervised learning model that outputs proposed changes to the operation plan using the operational disruption status stored in the operational plan change record data 1201 as training data.
[0059] The supervised learning execution unit 1104 can use conventional methods for the specific explanatory variables and output format. For example, given the operational disruption situation and the predicted timetable subject to the operation plan change, the output can be the operation plan change operation best selected for each specific station or train, and the explanatory variables can be the time of day at which the operational disruption occurs, the operational disruption conditions around the target station or train, and the approach status of following trains.
[0060] Thus, a supervised learning model that has learned rules for reflecting proposed changes to train schedules based on past train schedule changes can reproduce train schedule changes made by humans, taking into account the specific operational circumstances of the lines and stations being changed.
[0061] <Reinforcement learning model generation process> Next, we will explain the process of generating a reinforcement learning model by performing training using reinforcement learning as a pre-processing step before calculating proposed changes to the operation plan. The reinforcement learning execution unit 1105 trains as many reinforcement learning models as there are reward weight combinations stored in the reward weight table 1204. As with supervised learning, the reinforcement learning execution unit 1105 can use conventional methods for the specific explanatory variables, the output format, and the learning method itself.
[0062] The reinforcement learning performed by the reinforcement learning execution unit 1105 involves randomly changing, for example, the section (between stations) where the operational disruption occurs and the duration of the disruption, generating a large number of operational disruption situations, and learning to maximize the reward value for the set reward weight in each operational disruption situation.
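The random generation of operational disruption situations described above might be sketched as follows. The class name, the station list, and the duration range are illustrative assumptions, and the training loop that maximizes the reward for each situation is not shown.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DisruptionScenario:
    section: tuple[str, str]  # adjacent stations between which operation is disrupted
    duration_min: int         # duration of the disruption in minutes


def generate_scenarios(
    stations: list[str],
    n: int,
    min_duration: int = 10,
    max_duration: int = 120,
    seed: int | None = None,
) -> list[DisruptionScenario]:
    """Randomly vary the disrupted section and the duration to create n scenarios."""
    rng = random.Random(seed)
    sections = list(zip(stations, stations[1:]))  # consecutive station pairs on the line
    return [
        DisruptionScenario(rng.choice(sections), rng.randint(min_duration, max_duration))
        for _ in range(n)
    ]


for scenario in generate_scenarios(["A", "B", "C", "D"], n=3, seed=0):
    print(scenario)
```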
[0063] Furthermore, the reinforcement learning execution unit 1105 may take into consideration the constraints listed in the constraint table 1205 and refrain from attempting operation plan change operations that violate these constraints during the learning process. The constraints are created in advance by the dispatcher, and examples include "Turnarounds will only be performed at stations A and D" and "Trains passing through station C will be cancelled as little as possible."
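One way to keep the learning process from attempting prohibited operations is to filter the candidate operations before one is selected. The sketch below encodes only the hard constraint from the first example (turnarounds at stations A and D only); the data structure and operation names are hypothetical, and the soft preference in the second example would more naturally be handled as a reward penalty, which is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeOperation:
    kind: str     # hypothetical operation type, e.g. "turnaround", "cancel", "retime"
    station: str  # station at which the operation would be applied


# Hypothetical encoding of "Turnarounds will only be performed at stations A and D".
TURNAROUND_STATIONS = {"A", "D"}


def is_allowed(op: ChangeOperation) -> bool:
    """Return False for operations that violate the hard constraint."""
    if op.kind == "turnaround":
        return op.station in TURNAROUND_STATIONS
    return True


def allowed_operations(candidates: list[ChangeOperation]) -> list[ChangeOperation]:
    """Keep only the operations the learner is permitted to try."""
    return [op for op in candidates if is_allowed(op)]


candidates = [
    ChangeOperation("turnaround", "A"),
    ChangeOperation("turnaround", "B"),
    ChangeOperation("retime", "B"),
]
print(allowed_operations(candidates))
```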
[0064] The reinforcement learning model, trained by the reinforcement learning execution unit 1105, learns a strategy to take information such as predicted timetables for which the operation plan will be changed as input, and output the operation plan change operation corresponding to that information, just as in the case of supervised learning. Furthermore, in reinforcement learning, if an actual operational disruption occurs and the calculated reward value for the proposed operational plan change is unsatisfactory, the system may additionally search for operational plan changes that yield a higher reward value.
[0065] This makes it possible to calculate proposed operation plan changes for an operational disruption subject to such changes by using a pre-trained reinforcement learning model. A reinforcement learning model trained on operation plan changes that maximize the reward can thus be expected to calculate proposed changes that turn a predicted timetable in which an operational disruption occurs into a revised timetable less affected by that disruption.
[0066] <Decision on the calculation method for proposed changes to the operation plan> Next, referring to Figure 11, we will explain the process by which the situation determination unit 1107 appropriately uses the learning model according to various operational disruption situations to determine the calculation method for the proposed changes to the operation plan. Figure 11 is a flowchart showing the process for determining the calculation method for the proposed changes to the train schedule. The process shown in the flowchart of Figure 11 allows for the determination of a method for calculating proposed changes to the operation plan, taking into account factors such as the presence or absence of training experience, the reliability of the supervised learning method, and the reliability of the reinforcement learning method, for the feature group ID to which the operational disruption subject to the operation plan change belongs.
[0067] After this process and the dispatcher's operation determine the method and policy for calculating proposed changes for the operational disruption subject to the change, the operation plan change proposal unit 1108 executes the calculation of the proposed operation plan change according to the determined content.
[0068] Alternatively, the reinforcement learning calculation results table 1207 may be managed individually for each reward weight combination.
[0069] (Step S101) In step S101, the features of the input operational disruption situation are calculated, and the feature group ID to which the operational disruption situation belongs is determined.
[0070] (Step S102) In step S102, it is determined whether or not the operation plan change record data 1201, which is used as training data when supervised learning is executed, exists for the determined feature group ID. This determination is made by referring to the supervised learning calculation results table 1206. If training data exists for the feature group ID to which the operational disruption belongs, the process proceeds to S103; otherwise, the process proceeds to S105.
[0071] (Step S103) In step S103, it is determined whether the performance reliability of proposed operation plan changes calculated using the supervised learning model for the corresponding feature group ID is above a threshold. If the performance reliability of the supervised learning method is above the threshold, the process proceeds to S104; otherwise, the process proceeds to S105. Furthermore, if the number of proposed changes calculated is below a certain number, the performance reliability is treated as below the threshold and the process proceeds to S105.
[0072] (Step S104) In step S104, calculation of the proposed operation plan change using the supervised learning model is proposed to the dispatcher.
[0073] (Step S105) In step S105, it is determined whether the performance reliability of proposed operation plan changes calculated using the reinforcement learning model for the corresponding feature group ID is above a threshold. If it is above the threshold, the process proceeds to S106; otherwise, the process proceeds to S108.
[0074] (Step S106) In step S106, it is determined whether the estimated time required to calculate the proposed operation plan change using reinforcement learning is below a threshold. If the estimated calculation time is less than the threshold, the process proceeds to S107; if it is greater than or equal to the threshold, the process proceeds to S108. This allows the dispatcher to change the operation plan manually if calculating the proposed change using reinforcement learning would take too long.
[0075] (Step S107) In step S107, calculation of the proposed operation plan change using the reinforcement learning model is proposed to the dispatcher. When proposing an operation plan change using the reinforcement learning model, the system may be configured to ask the dispatcher for a combination of reward weight coefficients suited to the operational disruption subject to the change. For example, the system could be configured to recommend, by referencing the weight table 401, the reward weight combination corresponding to the feature group ID to which the operational disruption belongs. Alternatively, reward weight combinations corresponding to other feature group IDs whose features are close, that is, similar, may be recommended in order.
[0076] The proposed changes to the operation plan are calculated using a reinforcement learning model that corresponds to the selected reward weight combination. This ensures that the appropriate reinforcement learning model is selected and applied according to the operational disruption being addressed, and also reflects the reward judgment results based on the situation of the dispatcher making the selection, thereby enabling the calculation of a more optimal proposed change to the operation plan.
[0077] (Step S108) In step S108, a manual change to the operation plan is proposed to the dispatcher.
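Steps S102 to S108 can be summarized in a short decision function. This is only a sketch: the record fields, the threshold values, and the use of "greater than or equal" for the reliability comparison are assumptions, and the sketch assumes that a reinforcement learning reliability below the threshold in S105 leads to the manual proposal of S108.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    SUPERVISED = "supervised"        # proposed in S104
    REINFORCEMENT = "reinforcement"  # proposed in S107
    MANUAL = "manual"                # proposed in S108


@dataclass
class GroupRecord:
    """Hypothetical per-feature-group values read from tables 1206 and 1207."""
    has_training_data: bool      # checked in S102
    sl_reliability: float        # percent, checked in S103
    sl_calculations: int         # number of proposed changes calculated
    rl_reliability: float        # percent, checked in S105
    rl_estimated_seconds: float  # checked in S106


def choose_method(
    rec: GroupRecord,
    reliability_threshold: float = 70.0,  # assumed value
    min_calculations: int = 5,            # assumed value
    time_limit_seconds: float = 60.0,     # assumed value
) -> Method:
    # S102 and S103: supervised learning needs training data, enough past
    # calculations, and a sufficient performance reliability.
    if (
        rec.has_training_data
        and rec.sl_calculations >= min_calculations
        and rec.sl_reliability >= reliability_threshold
    ):
        return Method.SUPERVISED
    # S105: reinforcement learning must itself be reliable enough (assumption:
    # otherwise the manual change is proposed).
    if rec.rl_reliability < reliability_threshold:
        return Method.MANUAL
    # S106: fall back to a manual change if the calculation would take too long.
    if rec.rl_estimated_seconds < time_limit_seconds:
        return Method.REINFORCEMENT
    return Method.MANUAL


print(choose_method(GroupRecord(True, 80.0, 10, 90.0, 5.0)))  # Method.SUPERVISED
```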
[0078] <Process for determining the update timing of the situation determination unit> Next, referring to Figure 12, we will explain the process for determining the update timing of the situation determination unit 1107. Figure 12 is a flowchart showing the process for determining the update timing of the situation determination unit 1107. As mentioned above, the process in this flowchart begins after a proposed change has been calculated and the history of the dispatcher's actions in response to it has been added.
[0079] (Step S201) In step S201, it is determined whether the number of entries added to the operation history since the last update exceeds a threshold. If it does, the process proceeds to S202. If it does not, the process terminates without performing the update.
[0080] (Step S202) In step S202, the supervised learning calculation results table 1206 is updated. Specifically, each item in the supervised learning calculation results table 1206 is updated according to the accumulated proposed change calculations and the operation history. In addition, the performance reliability is recalculated and updated.
[0081] (Step S203) In step S203, the reinforcement learning calculation results table 1207 is updated. Similarly, updating the reinforcement learning calculation results table 1207 involves updating the values of each item and recalculating the performance reliability.
[0082] (Step S204) In step S204, the list of cases requiring retraining 1209 is updated. At this time, the list records the feature group IDs determined in S107, together with the incident IDs of the operational disruptions for which proposed changes calculated using supervised learning or reinforcement learning were rejected.
[0083] (Step S205) In step S205, it is determined whether the number of items in the list of cases requiring retraining 1209 is at or above a threshold. If the number of items is greater than or equal to the threshold, the process proceeds to S206. If it is less than the threshold, the process proceeds to S207.
[0084] (Step S206) Step S206 proposes the reconstruction of supervised learning and reinforcement learning models. This allows us to add and reconstruct the learning model with historical data on operational disruptions that were not covered by previous learning models. This enables us to obtain a learning model that reflects this historical data and can address the relevant operational disruptions when calculating changes to the next operational plan.
[0085] Alternatively, the supervised learning calculation results table 1206 and the reinforcement learning calculation results table 1207 may be reset. This makes it possible to build a supervised learning calculation results table 1206 and a reinforcement learning calculation results table 1207 that suit each newly generated learning model.
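The update flow of steps S201 to S206 might be sketched as follows. For brevity the sketch keeps a single statistics table instead of the separate tables 1206 and 1207, and it records only rejected proposals in the retraining list; the class names, return values, and thresholds are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroupStats:
    calculated: int = 0
    adopted: int = 0
    rejected: int = 0

    @property
    def reliability(self) -> float:
        # Equation (2); 0.0 when nothing has been calculated yet (assumption).
        return self.adopted / self.calculated * 100 if self.calculated else 0.0


@dataclass(frozen=True)
class HistoryEntry:
    group_id: str     # feature group ID of the operational disruption
    incident_id: str  # incident ID of the operational disruption
    adopted: bool     # whether the dispatcher adopted the proposed change


def run_update(
    history: list[HistoryEntry],
    stats: dict[str, GroupStats],
    retraining_list: list[tuple[str, str]],
    history_threshold: int = 3,     # assumed value
    retraining_threshold: int = 2,  # assumed value
) -> str:
    # S201: update only after enough new history entries have accumulated.
    if len(history) <= history_threshold:
        return "no_update"
    for entry in history:
        # S202 / S203: update the counts from which the reliability is recalculated.
        group = stats.setdefault(entry.group_id, GroupStats())
        group.calculated += 1
        if entry.adopted:
            group.adopted += 1
        else:
            group.rejected += 1
            # S204: remember rejected cases as candidates for retraining.
            retraining_list.append((entry.group_id, entry.incident_id))
    history.clear()
    # S205 / S206: propose reconstruction once enough cases have accumulated.
    if len(retraining_list) >= retraining_threshold:
        return "propose_reconstruction"
    return "updated"
```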
[0086] <Mechanism and Effects> The operation plan change calculation system 1 according to the embodiment has been described above. The operation plan change calculation system 1 of this disclosure mainly comprises an operation plan change calculation device 101 and an operation management system 201. In accordance with the determination of the situation determination unit 1107, the operation plan change calculation device 101 can calculate a proposed operation plan change in the operation plan change proposal unit 1108 based on a learning model trained by the supervised learning execution unit 1104 or the reinforcement learning execution unit 1105. This allows an appropriate calculation method to be selected according to the operational disruption situation subject to the operation plan change. Furthermore, if neither method can generate an appropriate proposed change, the models can be retrained by incorporating newly accumulated record data. This is expected to enable the models to handle a wider range of operational disruptions in subsequent calculations. In other words, it becomes possible to calculate proposed operation plan changes that can handle various operational disruption situations and take into account the specific operational circumstances of the target lines and stations.
[0087] [Second Embodiment] Next, a second embodiment will be described with reference to Figure 13. Figure 13 is a flowchart showing the process for determining the proposed change in the operation plan in the second embodiment. Note that the flowchart for the first embodiment in Figure 11 first determines whether or not a supervised learning method can be implemented, whereas the flowchart for the second embodiment in Figure 13 first determines whether or not a reinforcement learning method can be implemented, and then determines whether or not a supervised learning method can be implemented. This allows for prioritizing the calculation of appropriate operational plan changes for all routes and stations, rather than considering specific operational circumstances at the target routes and stations, when calculating proposed operational plan changes. In the following description, components that are the same as or equivalent to those in the first embodiment described above will be denoted by the same reference numerals, and their descriptions will be simplified or omitted.
[0088] (Step S301) Step S301 performs the same process as step S101 in Figure 11.
[0089] (Step S302) Step S302 performs the same process as step S105 in Figure 11. If the performance reliability of the results calculated using the reinforcement learning model is above the threshold, the process proceeds to S303; otherwise, the process proceeds to S305.
[0090] (Step S303) Step S303 performs the same process as step S106 in Figure 11. If the estimated calculation time is below the threshold, the process proceeds to S304; if the estimated time is equal to or greater than the threshold, the process proceeds to S305.
[0091] (Step S304) Step S304 performs the same process as step S107 in Figure 11.
[0092] (Step S305) Step S305 performs the same process as step S102 in Figure 11. If training data exists for the feature group ID to which the operational disruption belongs, the process proceeds to S306; otherwise, the process proceeds to S308.
[0093] (Step S306) Step S306 performs the same process as step S103 in Figure 11. If the performance reliability of the results calculated using the supervised learning method is above the threshold, the process proceeds to S307; otherwise, the process proceeds to S308. Furthermore, if the number of proposed changes calculated is below a certain number, the performance reliability is treated as below the threshold and the process proceeds to S308.
[0094] (Step S308) Step S308 performs the same process as step S108 in Figure 11.
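The reversed order of the second embodiment can be sketched in the same way. The parameter names and threshold values are assumptions, and the sketch assumes that S307, which the text does not describe, proposes the supervised learning method in the same way as S104.

```python
def choose_method_rl_first(
    *,
    rl_reliability: float,
    rl_estimated_seconds: float,
    has_training_data: bool,
    sl_reliability: float,
    sl_calculations: int,
    reliability_threshold: float = 70.0,  # assumed value
    min_calculations: int = 5,            # assumed value
    time_limit_seconds: float = 60.0,     # assumed value
) -> str:
    # S302 and S303: try reinforcement learning first (proposed in S304).
    if rl_reliability >= reliability_threshold and rl_estimated_seconds < time_limit_seconds:
        return "reinforcement"
    # S305 and S306: otherwise check supervised learning (assumed S307).
    if (
        has_training_data
        and sl_calculations >= min_calculations
        and sl_reliability >= reliability_threshold
    ):
        return "supervised"
    # S308: neither method qualifies, so a manual change is proposed.
    return "manual"


print(choose_method_rl_first(
    rl_reliability=50.0,
    rl_estimated_seconds=5.0,
    has_training_data=True,
    sl_reliability=85.0,
    sl_calculations=12,
))  # supervised
```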
[0095] Although embodiments of the present invention have been described above, the present invention is not limited to the embodiments described above, and various modifications are possible without departing from the spirit of the present invention.
[0096] Furthermore, the present invention can also take the following forms.
(Aspect 1) An operation plan change calculation device that calculates a proposed change to an operation plan, comprising: a timetable simulator that predicts a predicted timetable in which an operational disruption occurs and a revised timetable in which the proposed operation plan change is reflected in the predicted timetable; a reward evaluation unit that calculates and evaluates, for the revised timetable, a reward based on predetermined criteria; a supervised learning execution unit that generates a supervised learning model using data on past operational disruption situations as training data; a reinforcement learning execution unit that generates a reinforcement learning model that predicts the revised timetable offering the highest possible reward; a situation determination unit that determines, according to the operational disruption situation, whether to calculate the proposed operation plan change using the supervised learning model or the reinforcement learning model; and an operation plan change proposal unit that devises the proposed operation plan change based on the operational disruption situation and the learning model determined by the situation determination unit.
(Aspect 2) An operation plan change calculation device that calculates a proposed change to an operation plan, comprising: a timetable simulator that predicts a predicted timetable in which an operational disruption occurs and a revised timetable in which the proposed operation plan change is reflected in the predicted timetable; a reward evaluation unit that calculates and evaluates, for the revised timetable, a reward based on predetermined criteria; a supervised learning execution unit that generates a supervised learning model using data on past operational disruption situations as training data; a reinforcement learning execution unit that generates a reinforcement learning model that predicts the revised timetable offering the highest possible reward; an operation plan change proposal unit that devises proposed operation plan changes based on the operational disruption situation and the supervised learning model and / or the reinforcement learning model; and a situation determination unit that selects, according to the operational disruption situation, either the proposed operation plan change calculated using the supervised learning model or the proposed operation plan change calculated using the reinforcement learning model.
(Aspect 3) The operation plan change calculation device according to aspect 1 or 2, wherein, if the situation determination unit determines that the estimated time required to calculate the proposed operation plan change by the reinforcement learning execution unit is equal to or greater than a threshold, it proposes that the dispatcher change the operation plan manually.
(Aspect 4) The operation plan change calculation device according to any one of aspects 1 to 3, wherein the situation determination unit groups the operation plan change record data, which records operational disruption situations that have occurred in the past, into feature groups based on their features, and treats operational disruption situations belonging to the same feature group as similar operational disruption situations.
(Aspect 5) The operation plan change calculation device according to any one of aspects 1 to 4, further comprising: a user action collection unit that collects the operation actions taken by the dispatcher in response to the proposed operation plan changes devised by the operation plan change proposal unit; and a calculation change proposal operation history that stores the operation actions collected by the user action collection unit, wherein the supervised learning model or the reinforcement learning model is reconstructed if the number of operation actions stored in the calculation change proposal operation history is above a threshold.
(Aspect 6) The operation plan change calculation device according to aspect 4 or 5, wherein the operation plan change record data includes a performance reliability, and the situation determination unit has the proposed operation plan change calculated using the reinforcement learning model if the performance reliability is above a threshold.
(Aspect 7) An operation plan change calculation method for the operation plan change calculation device according to any one of aspects 1 to 6, comprising: a first step in which the situation determination unit determines whether the performance reliability of proposed operation plan changes calculated using the supervised learning model is above a threshold; a second step in which, if the performance reliability is above the threshold in the first step, calculation of the proposed operation plan change using the supervised learning model is proposed; a third step in which, if the performance reliability is below the threshold in the first step, the situation determination unit determines whether the performance reliability of proposed operation plan changes calculated using the reinforcement learning model is above a threshold; and a fourth step in which, if the performance reliability is above the threshold in the third step, the operation plan change proposal unit proposes calculation of the proposed operation plan change using the reinforcement learning model.
[Explanation of symbols]
[0097]
101 Operation plan change calculation device
1100 Processing unit
1101 Control unit
1102 Timetable simulator
1103 Reward evaluation unit
1104 Supervised learning execution unit
1105 Reinforcement learning execution unit
1106 Reward weight estimation unit
1107 Situation determination unit
1108 Operation plan change proposal unit
1109 User action collection unit
1110 Situation determination update unit
1200 Storage device
1201 Operation plan change record data
1202 Supervised learning model
1203 Reinforcement learning model
1204 Reward weight table
1205 Constraint table
1206 Supervised learning calculation results table
1207 Reinforcement learning calculation results table
1208 Calculation change proposal operation history
1209 List of cases requiring retraining
1300 Arithmetic device
1400 Input / output interface
1500 Network interface
201 Operation management system
Claims
1. An operation plan change calculation device that calculates a proposed change to an operation plan, comprising: a timetable simulator that predicts a predicted timetable in which an operational disruption occurs and a revised timetable in which the proposed operation plan change is reflected in the predicted timetable; a reward evaluation unit that calculates and evaluates, for the revised timetable, a reward based on predetermined criteria; a supervised learning execution unit that generates a supervised learning model using data on past operational disruption situations as training data; a reinforcement learning execution unit that generates a reinforcement learning model that predicts the revised timetable offering the highest possible reward; a situation determination unit that determines, according to the operational disruption situation, whether to calculate the proposed operation plan change using the supervised learning model or the reinforcement learning model; and an operation plan change proposal unit that devises the proposed operation plan change based on the operational disruption situation and the learning model determined by the situation determination unit.
2. An operation plan change calculation device that calculates a proposed change to an operation plan, comprising: a timetable simulator that predicts a predicted timetable in which an operational disruption occurs and a revised timetable in which the proposed operation plan change is reflected in the predicted timetable; a reward evaluation unit that calculates and evaluates, for the revised timetable, a reward based on predetermined criteria; a supervised learning execution unit that generates a supervised learning model using data on past operational disruption situations as training data; a reinforcement learning execution unit that generates a reinforcement learning model that predicts the revised timetable offering the highest possible reward; an operation plan change proposal unit that devises proposed operation plan changes based on the operational disruption situation and the supervised learning model and / or the reinforcement learning model; and a situation determination unit that selects, according to the operational disruption situation, either the proposed operation plan change calculated using the supervised learning model or the proposed operation plan change calculated using the reinforcement learning model.
3. The operation plan change calculation device according to claim 1 or 2, wherein, if the situation determination unit determines that the estimated time required to calculate the proposed operation plan change by the reinforcement learning execution unit is equal to or greater than a threshold, it proposes that the dispatcher change the operation plan manually.
4. The operation plan change calculation device according to claim 1 or 2, wherein the situation determination unit groups the operation plan change record data, which records operational disruption situations that have occurred in the past, into feature groups based on their features, and treats operational disruption situations belonging to the same feature group as similar operational disruption situations.
5. The operation plan change calculation device according to claim 1 or 2, further comprising: a user action collection unit that collects the operation actions taken by the dispatcher in response to the proposed operation plan changes devised by the operation plan change proposal unit; and a calculation change proposal operation history that stores the operation actions collected by the user action collection unit, wherein the supervised learning model or the reinforcement learning model is reconstructed if the number of operation actions stored in the calculation change proposal operation history is above a threshold.
6. The operation plan change calculation device according to claim 4, wherein the operation plan change record data includes a performance reliability, and the situation determination unit has the proposed operation plan change calculated using the reinforcement learning model if the performance reliability is above a threshold.
7. An operation plan change calculation method for the operation plan change calculation device according to claim 1 or 2, comprising: a first step in which the situation determination unit determines whether the performance reliability of proposed operation plan changes calculated using the supervised learning model is above a threshold; a second step in which, if the performance reliability is above the threshold in the first step, calculation of the proposed operation plan change using the supervised learning model is proposed; a third step in which, if the performance reliability is below the threshold in the first step, the situation determination unit determines whether the performance reliability of proposed operation plan changes calculated using the reinforcement learning model is above a threshold; and a fourth step in which, if the performance reliability is above the threshold in the third step, the operation plan change proposal unit proposes calculation of the proposed operation plan change using the reinforcement learning model.