Xgboost and large language model fusion-based ship arrival prediction agent system and method

By integrating XGBoost with a large language model, the problem of high-precision prediction in complex and dynamic environments for ship arrival time forecasting was solved, achieving interpretability and data security, and improving port operation efficiency and supply chain collaboration.

CN122242845APending Publication Date: 2026-06-19上海市气象服务中心

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
上海市气象服务中心
Filing Date
2026-03-12
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to cope with complex and dynamic environments in predicting ship arrival times. They rely on a single data dimension, resulting in limited prediction accuracy. Machine learning methods lack the ability to model high-dimensional heterogeneous features, and the missing value handling mechanism is inadequate. Existing prediction systems lack interpretability and human-computer interaction capabilities, and large language models lack the ability to accurately model time-series numerical data. Consequently, they cannot effectively integrate high-precision numerical prediction with semantic intelligence, failing to meet the high-level safety protection requirements of ports.

Method used

By integrating XGBoost with a large language model, real-time ship dynamic data and port information are collected, missing values ​​are processed, and time-series features are constructed to generate a structured feature set. The XGBoost regression model is used for prediction, and the large language model is combined to generate readable explanatory text, realizing the integration of prediction, explanation, and interaction, and ensuring data security.

🎯Benefits of technology

It achieves high-precision ship arrival time prediction, provides interpretable prediction data, improves port operation efficiency and supply chain collaboration, and meets high-level safety protection requirements.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122242845A_ABST
    Figure CN122242845A_ABST
Patent Text Reader

Abstract

This invention discloses a ship arrival prediction intelligent agent system and method based on the fusion of XGBoost and a large language model, belonging to the field of intelligent shipping technology. The system includes real-time collection of ship dynamic data, port information, marine meteorological data, and ship static attributes; inputting the feature set into a pre-trained XGBoost regression model to generate structured analytical results; retaining only semantic summary features; generating readable explanatory text through a large language model fine-tuned for port scheduling; and returning the generated explanatory text to the user via a web frontend. This invention achieves high-precision prediction of long-haul ETA through multi-source heterogeneous data fusion, enhanced missing value processing, and rich temporal feature engineering; it overcomes the limitations of existing "black box" models, enabling dispatchers to understand the prediction basis and attribute deviations; it achieves a leap from "information provision" to "decision support"; and it ensures that core shipping data does not leave the internal network, meeting high-level security protection requirements.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of intelligent shipping technology, and in particular to a ship arrival prediction intelligent agent system and method based on the fusion of XGBoost and large language models. Background Technology

[0002] Estimated Time of Arrival (ETA) is a core parameter for port operations scheduling, directly impacting the accuracy and efficiency of pilotage scheduling, berth allocation, yard resource allocation, and downstream logistics connections. For a long time, port ETA prediction has primarily relied on linear extrapolation or simple physical model calculations based on speed, heading, and historical trajectory data provided by the Automatic Identification System (AIS). However, ocean voyages are influenced by various dynamic factors such as sea state changes, weather conditions, traffic control, and mechanical failures. Traditional methods struggle to effectively capture these nonlinear, high-dimensional interference factors, resulting in limited prediction accuracy, especially with significantly increased errors in long-haul (over 7 days) scenarios. With the development of artificial intelligence, machine learning methods such as Support Vector Machines (SVMs) and Random Forests have been gradually introduced into ETA prediction. However, existing methods still suffer from insufficient modeling capabilities when handling multi-source heterogeneous features (such as weather data, port congestion status, and historical ship delay records), and struggle to effectively interpret the prediction results. Furthermore, Large Language Models (MLMs) are increasingly important in this area. In recent years, large language models (LLMs) have shown great potential in natural language understanding, knowledge reasoning and task planning, enabling human-computer intelligent interaction and decision assistance. However, large language models are essentially probabilistic text generation models, lacking the ability to accurately model time-series numerical data, and are difficult to apply directly to high-precision ETA prediction tasks.

[0003] However, current common solutions have many drawbacks, including: traditional linear extrapolation methods are difficult to cope with complex dynamic environments, data utilization is limited in dimension, and prediction accuracy is limited; although machine learning methods have improved, they are insufficient in modeling high-dimensional heterogeneous features, have imperfect missing value handling mechanisms, and do not fully utilize temporal dependencies; existing prediction systems are mostly "black box" models, lacking interpretability, users cannot understand the basis of prediction, and human-computer interaction capabilities are weak; although large language models perform well in natural language understanding, they lack the ability to accurately model time-series numerical data, and their reliance on public cloud API services poses data security risks, making them difficult to apply directly to ETA prediction; existing technologies have not yet achieved effective integration of high-precision numerical prediction and semantic intelligence, failing to form an integrated closed loop of "prediction-interpretation-decision-interaction," and lack a system architecture that meets the high-level security protection requirements of ports. Summary of the Invention

[0004] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.

[0005] In view of the problems existing in the ship arrival prediction intelligent agent system and method based on the fusion of XGBoost and large language model, the present invention is proposed.

[0006] Therefore, the purpose of this invention is to provide a ship arrival prediction intelligent agent system and method based on the fusion of XGBoost and large language models. It is applicable to solving the problems of traditional linear extrapolation methods being unable to cope with complex dynamic environments, having a single data utilization dimension, and limited prediction accuracy; although machine learning methods have improved, they are insufficient in modeling high-dimensional heterogeneous features, have imperfect missing value handling mechanisms, and do not fully utilize temporal dependencies; existing prediction systems are mostly "black box" models, lacking interpretability, users cannot understand the basis of prediction, and have weak human-computer interaction capabilities; although large language models perform well in natural language understanding, they lack the ability to accurately model time-series numerical data, and their reliance on public cloud API services poses data security risks, making them difficult to apply directly to ETA prediction; existing technologies have not yet achieved effective integration of high-precision numerical prediction and semantic intelligent agents, failing to form an integrated closed loop of "prediction-interpretation-decision-interaction," and also lacking a system architecture that meets the high-level safety protection requirements of ports.

[0007] To solve the above-mentioned technical problems, the present invention provides the following technical solution: Firstly, embodiments of the present invention provide a ship arrival prediction agent method based on the fusion of XGBoost and a large language model. This method includes real-time collection of ship dynamic data, port information, marine meteorological data, and ship static attributes; processing of missing values ​​and construction of temporal features in the collected data to form a structured feature set; inputting the feature set into a pre-trained XGBoost regression model to output the remaining arrival time of the ship, and extracting feature importance and prediction error signs to generate a structured analytical result; desensitizing the structured analytical result by removing sensitive fields and retaining only semantic summary features; injecting the desensitized summary features into a preset prompt template; generating readable explanatory text through a large language model fine-tuned in the port scheduling domain, including delay cause analysis and berthing suggestions; and returning the generated explanatory text to the user through a web front-end, achieving integrated prediction-explanation-interaction.

[0008] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and large language model described in this invention, the missing value processing includes: performing bidirectional linear interpolation on continuous variables grouped by voyage, and filling missing values ​​that still exist after interpolation with the global median.

[0009] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and large language model described in this invention, the construction of the time series features includes: generating lag features, dynamic change features, spatial distance, weather-navigation interaction terms, time period encoding, and navigation progress features.

[0010] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in this invention, the following features are included: the lag features include generating t-1, t-2, and t-3 step lag terms for key variables; the dynamic change features include velocity change, acceleration, and position offset; the spatial distance is calculated based on the Haversine formula to determine the spherical distance from the current point to the destination port; the weather-navigation interaction features include the product of wind speed and significant wave height, and the ratio of ship speed to wind speed; the time period encoding includes sine / cosine encoding of hours, days of the week, and months; and the navigation progress feature is the ratio of the current time step to the time already navigating.

[0011] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and large language model described in this invention, the XGBoost model includes a quality assurance step before prediction: intercepting infinite values / NaN in the feature matrix, replacing inf / -inf with NaN and filling with the median; training long-term samples (T-7 days or more) and short-term samples (less than 7 days) separately, and evaluating the prediction based on the sample time history.

[0012] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and large language model described in this invention, the structured parsing results include: feature importance ranking by semantic category, absolute error of each prediction, and error sign.

[0013] As a preferred embodiment of the ship arrival prediction agent method based on the fusion of XGBoost and large language model described in this invention, the security desensitization process ensures that the output semantic summary features do not contain sensitive fields such as ship precise coordinates, shipowner information, and commercial contracts.

[0014] Secondly, to further address the aforementioned technical problems, this invention provides a ship arrival prediction intelligent agent system based on the fusion of XGBoost and a large language model. This system includes: a data acquisition module for real-time acquisition of ship dynamic data, port information, marine meteorological data, and ship static attributes; processing of missing values ​​and construction of temporal features in the acquired data to form a structured feature set; a feature extraction module for inputting the feature set into a pre-trained XGBoost regression model, outputting the remaining arrival time of the ship, and extracting feature importance and prediction error signs to generate a structured analytical result; a feature retention module for desensitizing the structured analytical result, removing sensitive fields, and retaining only semantic summary features; a text generation module for injecting the desensitized summary features into a preset prompt template, generating readable explanatory text through a large language model fine-tuned in the port scheduling domain, including delay cause analysis and berthing suggestions; and a text feedback module for returning the generated explanatory text to the user via a web frontend, achieving integrated prediction-explanation-interaction.

[0015] Thirdly, embodiments of the present invention provide a computer device, including a memory and a processor, wherein the memory stores a computer program, wherein: when the computer program is executed by the processor, it implements any step of the ship arrival prediction agent method based on the fusion of XGBoost and large language model as described in the first aspect of the present invention.

[0016] Fourthly, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, wherein: when the computer program is executed by a processor, it implements any step of the ship arrival prediction agent method based on the fusion of XGBoost and large language model as described in the first aspect of the present invention.

[0017] The beneficial effects of this invention are as follows: By organically integrating XGBoost with a large language model, this invention constructs a ship arrival prediction intelligent agent system with complete capabilities of "high-precision prediction—structured parsing—secure desensitization—semantic generation—natural interaction." In terms of prediction accuracy, through multi-source heterogeneous data fusion, enhanced missing value processing, and rich temporal feature engineering, the XGBoost model can accurately capture complex navigation patterns, achieving high-precision prediction of long-range ETA. In terms of interpretability, by outputting feature importance ranking and prediction error signs, and combining this with a large language model to generate natural language explanatory text, it breaks through the existing "black box" approach. The limitations of the model enable dispatchers to understand the basis of predictions and attribute deviations. At the interactive experience level, the intelligent agent based on the large language model supports multi-turn dialogue and decision suggestion generation, reducing the system's usage threshold with readable text and achieving a leap from "information provision" to "decision assistance." At the data security level, sensitive fields are removed through pre-processing to remove sensitive data, retaining only semantic summary features for the large language model to generate interpretations. This ensures that core shipping data does not leave the internal network, meeting high-level security protection requirements. Ultimately, an integrated intelligent agent system is formed from raw data input to interactively interpretable output, significantly improving port operation efficiency and supply chain collaboration. Attached Figure Description

[0018] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein: Figure 1 This is a diagram of the overall system architecture of the present invention in Example 1.

[0019] Figure 2 This is a flowchart of the training and inference process of the XGBoost prediction model of the present invention in Example 1. Detailed Implementation

[0020] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0021] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0022] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0023] Example 1 Reference Figure 1 and Figure 2 This is the first embodiment of the present invention, which provides a ship arrival prediction agent method based on the fusion of XGBoost and a large language model, including the following steps: S1: Real-time collection of ship dynamic data, port information, marine meteorological data, and ship static attributes; processing of missing values ​​and construction of time-series features in the collected data to form a structured feature set.

[0024] Furthermore, missing value handling includes: performing bidirectional linear interpolation on continuous variables grouped by flight, and filling missing values ​​that still exist after interpolation with the global median.

[0025] Specifically, bidirectional linear interpolation of continuous variables grouped by voyage means that for time-series data of the same voyage, the data is sorted according to the chronological order of the timestamps, and the missing continuous variables (such as speed, longitude, and latitude) in the middle are filled by linear fitting using two consecutive non-missing observations. If there are consecutive missing values ​​at the beginning or end of the voyage, backward or forward filling is used. For isolated missing points that cannot be filled after interpolation, the median of the variable in the full historical dataset is used for global filling to avoid interference from extreme values ​​to model training.

[0026] Furthermore, the construction of time-series features includes: generating lag features, dynamic change features, spatial distance, meteorological-navigation interaction terms, time period coding, and navigation progress features.

[0027] Specifically, the lag characteristics include generating t-1, t-2, and t-3 lag terms for key variables; Specifically, key variables include at least ship speed, wind speed, and significant wave height; the t-1, t-2, and t-3 step lag terms represent the current observation time by 1, 2, and 3 sampling time points respectively (for example, if the AIS data reporting interval is 15 minutes, the lag terms represent the variable values ​​15 minutes, 30 minutes, and 45 minutes ago respectively), used to capture the historical continuity of the ship's motion state.

[0028] Dynamic change characteristics include velocity changes, acceleration, and position shift; Specifically, the speed change is the difference between the current speed and the speed at time t-1; Acceleration is the ratio of the change in velocity to the sampling time interval; Position offset is the spherical distance between the latitude and longitude at the current time and the latitude and longitude at time t-1, used to characterize the ship's maneuvering status.

[0029] Spatial distance is calculated based on the Haversine formula to determine the spherical distance from the current point to the destination port. The meteorological-navigation interaction terms include the product of wind speed and significant wave height, and the ratio of air speed to wind speed; Specifically, the product of wind speed and significant wave height is used to characterize the overall severity of sea conditions; The ratio of ship speed to wind speed is used to quantify the sensitivity of a ship's navigation to wind resistance. To avoid the denominator being zero, a minimum wind speed threshold (such as 0.1 m / s) is set during the calculation. Time period encoding includes sine / cosine encoding for hours, days of the week, and months; Specifically, to avoid treating time as a linear feature and causing the model to misunderstand the periodicity, trigonometric functions are used to encode time: for example, hours are encoded as sin(2π·hour / 24) and cos(2π·hour / 24), days of the week are encoded as sin(2π·week / 7) and cos(2π·week / 7), and months are encoded as sin(2π·month / 12) and cos(2π·month / 12), making the distance between 23:00 and 0:00 in the feature space relatively close, which conforms to the periodicity of time.

[0030] The sailing progress characteristic is the ratio of the current time step to the sailing time already traveled.

[0031] Specifically, the sailing progress = current sailing time / (current sailing time + remaining estimated sailing time). This feature is unknown in the early stages of model training, so it is only calculated iteratively by the model during the inference phase or the global average progress is used as an approximate feature to reflect the ship's relative position throughout the voyage.

[0032] S2: Input the feature set into the pre-trained XGBoost regression model, output the remaining arrival time of the ship, extract the feature importance and prediction error sign, and generate structured analytical results.

[0033] Preferably, the XGBoost model includes a quality assurance step before prediction: intercepting infinite values / NaN in the feature matrix, replacing inf / -inf with NaN and filling with the median; The system was trained on long-term samples (T-7 days or more) and short-term samples (T-7 days or less), and the predictions were evaluated based on the sample time periods.

[0034] Specifically, the pre-training process of the XGBoost regression model includes: constructing a training set based on historical AIS data, using the actual remaining arrival time of ships as labels, employing mean squared error as the loss function, controlling overfitting through the early stop method, and optimizing the model hyperparameters through grid search.

[0035] Feature importance is calculated based on the built-in Gain metric of XGBoost, which is the average reduction in impurity that each feature brings when splitting the tree model; The prediction error sign refers to the sign of the difference between the model's predicted value and the actual value. A negative value indicates that the prediction was too early (the actual arrival was later than the prediction), and a positive value indicates that the prediction was too late (the actual arrival was earlier than the prediction).

[0036] Preferably, the training process of the XGBoost regression model includes the following sub-steps: Training data preparation: Based on historical AIS data, meteorological data, and port data, a feature set is constructed according to the method described in S1, and the actual remaining time (in hours) of the ship's arrival at the destination port is used as the label value to form a training sample set.

[0037] Parameter initialization and configuration: Initialize the XGBoost model and set key hyperparameters: number of trees is 300~500, maximum depth is 5~8, learning rate is 0.01~0.1. Use early stopping to monitor the validation set error. Stop training when the validation set error no longer decreases for 50 consecutive rounds to prevent overfitting.

[0038] Iterative optimization: The model iterates using a gradient boosting framework, adding a new regression tree in each round and fitting the residual of the previous tree's prediction. The objective function consists of two parts: a loss function term and a regularization term. , where Ω(f_k)=γT+½λ||w||² is the regularization term used to control model complexity, T is the number of leaf nodes, w is the leaf weight, and γ and λ are penalty coefficients. By minimizing the objective function, the final strong prediction model is gradually integrated and generated.

[0039] Furthermore, the structured parsing result includes at least a JSON data structure containing the following fields: the number of remaining hours for prediction, a dictionary of feature importance ({feature name: contribution}), error sign, absolute error value, and feature contribution aggregated by semantic category (such as meteorological, navigation, and space categories) for subsequent module calls.

[0040] S3: Desensitize the structured parsing results, remove sensitive fields, and retain only semantic summary features.

[0041] Preferably, the structured parsing results include: feature importance ranking by semantic category, absolute error and error sign for each prediction.

[0042] Furthermore, the security desensitization process ensures that the output semantic summary features do not contain sensitive fields such as the ship's precise coordinates, shipowner information, and commercial contracts.

[0043] Specifically, removing sensitive fields includes deleting fields from the original data that can identify specific vessels or commercial entities, such as IMO number, MMSI, precise latitude and longitude coordinates, shipowner name, and cargo owner information.

[0044] Semantic summarization features refer to transforming specific numerical values ​​into fuzzy or hierarchical descriptions. For example, "wind speed 12.5 m / s" is transformed into "wind speed is relatively high", "remaining distance 345 nautical miles" is transformed into "remaining voyage is relatively long", and "historical on-time rate 92%" is transformed into "on-time performance is good".

[0045] Furthermore, the desensitization process also includes generalization rules: for example, coarsening the latitude and longitude coordinates into a grid and retaining them to one decimal place (approximately 10 kilometers in precision), thus ensuring semantic interpretation capabilities while completely avoiding the risk of leaking precise locations.

[0046] Specifically, the anonymized data structure still maintains the JSON format, but the field content has been replaced with the aforementioned semantic digest.

[0047] Furthermore, the desensitization rules can be configured according to the port information security level protection requirements, and support custom sensitive field lists and semantic mapping tables to adapt to different compliance scenarios.

[0048] S4: Inject the desensitized summary features into the preset prompt template, and generate readable explanatory text through a large language model fine-tuned in the port scheduling field, including delay cause analysis and berthing suggestions.

[0049] Preferably, the large language model fine-tuned in the field of port scheduling refers to a model that is based on a general large language model base (such as LLaMA, ChatGLM, etc.) and fine-tuned with instructions using vertical domain corpora such as port scheduling logs, ship delay reports, and pilotage operation manuals, so that it can master shipping terminology and scheduling logic.

[0050] Specifically, the preset prompt template is as follows: "Based on the following anonymized ship status summary: [Summary Features], please generate an ETA explanation text for the port dispatcher, including: 1. Predicted arrival time; 2. Analysis of major delay factors; 3. Berthing or resource scheduling suggestions for the current situation, requiring concise and professional language." The system will fill the template with the anonymized feature dictionary to form a complete input prompt.

[0051] Specifically, an example of readable explanatory text generated by the large language model is: "The vessel is expected to arrive at port at 14:00 on May 20th, a delay of approximately 6 hours from the scheduled arrival time. The main reason is that the effective wave height in the sea area is as high as 4 meters. To ensure safety, the vessel has proactively reduced its speed. It is recommended to dynamically adjust the original berthing plan and postpone the pilotage time at berth A to after 16:00." This text integrates numerical prediction (provided by XGBoost) and semantic understanding (provided by LLM).

[0052] S5: The generated explanatory text is returned to the user through the web front-end, realizing the integration of prediction, explanation and interaction.

[0053] Specifically, the web front-end provides an interactive dialogue interface, allowing users to input queries in natural language, such as "Why is MSC XXX ship late again?" or "How many ships are expected to be delayed in the next 24 hours?" The system back-end combines the user's question with desensitized contextual information and calls the large language model again for multi-turn dialogue to achieve continuous interaction.

[0054] Specifically, the web front-end interface includes at least: a ship list dashboard, an ETA forecast value display, an explanatory text display area, and a dialog input box. The front-end communicates with the business server deployed on the intranet via an API gateway to ensure secure data transmission.

[0055] In summary, this invention, through the organic integration of XGBoost and a large language model, constructs a ship arrival prediction intelligent agent system with complete capabilities including "high-precision prediction, structured parsing, secure desensitization, semantic generation, and natural interaction." Regarding prediction accuracy, through multi-source heterogeneous data fusion, enhanced missing value processing, and rich temporal feature engineering, the XGBoost model can accurately capture complex navigation patterns, achieving high-precision prediction of long-range ETA. Regarding interpretability, by outputting feature importance ranking and prediction error signs, and combining this with the large language model to generate natural language explanation text, it breaks through the limitations of existing "black box" models. Limitations are addressed by enabling dispatchers to understand the basis for forecasts and attribute deviations. At the interactive experience level, the intelligent agent based on a large language model supports multi-turn dialogues and decision suggestion generation, lowering the barrier to entry for system use with readable text and achieving a leap from "information provision" to "decision assistance." At the data security level, sensitive fields are removed through pre-processing to remove data stigmatization, retaining only semantic summary features for interpretation by the large language model. This ensures that core shipping data remains within the internal network, meeting high-level security protection requirements. Ultimately, this forms an integrated intelligent agent system from raw data input to interactively interpretable output, significantly improving port operational efficiency and supply chain collaboration.

[0056] Example 2, an embodiment of the present invention, provides a ship arrival prediction intelligent agent system based on the fusion of XGBoost and a large language model, comprising: a data acquisition module for real-time acquisition of ship dynamic data, port information, marine meteorological data, and ship static attributes; processing of missing values ​​and construction of temporal features in the acquired data to form a structured feature set; a feature extraction module for inputting the feature set into a pre-trained XGBoost regression model, outputting the remaining arrival time of the ship, and extracting feature importance and prediction error signs to generate a structured parsing result; a feature retention module for desensitizing the structured parsing result, removing sensitive fields, and retaining only semantic summary features; a text generation module for injecting the desensitized summary features into a preset prompt template, generating readable explanatory text through a large language model fine-tuned in the port scheduling domain, including delay cause analysis and berthing suggestions; and a text feedback module for returning the generated explanatory text to the user through a web front-end, realizing the integration of prediction, explanation, and interaction.

[0057] Example 3 is an embodiment of the present invention, which differs from the previous embodiment in that: If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0058] The logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a processor-including system, or other system that can fetch and execute instructions from, an instruction execution system, apparatus, or device). For the purposes of this specification, "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0059] More specific examples of computer-readable media (a non-exhaustive list) include: electrical connections (electronic devices) having one or more wires, portable computer disk drives (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable and editable read-only memory (EPROM or flash memory), fiber optic devices, and portable optical disc read-only memory (CDROM). Furthermore, computer-readable media can even be paper or other suitable media on which the program can be printed, because the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in computer memory.

[0060] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0061] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A ship arrival prediction agent method based on the fusion of XGBoost and a large language model, characterized by: include: Real-time collection of ship dynamic data, port information, marine meteorological data, and ship static attributes; processing of missing values ​​and construction of time-series features in the collected data to form a structured feature set; The feature set is input into the pre-trained XGBoost regression model, which outputs the remaining arrival time of the ship, and extracts the feature importance and prediction error sign to generate structured analytical results. The structured parsing results are anonymized by removing sensitive fields and retaining only semantic summary features; The desensitized summary features are injected into a preset prompt template, and a readable explanatory text is generated through a large language model finely tuned for the port scheduling field, which includes an analysis of the causes of delays and berthing suggestions. The generated explanatory text is returned to the user through a web front-end, realizing the integration of prediction, explanation, and interaction.

2. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 1, characterized in that: The missing value handling includes: performing bidirectional linear interpolation on continuous variables grouped by flight number, and filling missing values ​​that still exist after interpolation with the global median.

3. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 1, characterized in that: The construction of the time-series features includes: generating lag features, dynamic change features, spatial distance, meteorological-navigation interaction terms, time period encoding, and navigation progress features.

4. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 3, characterized in that: The lag features include generating t-1, t-2, and t-3 lag terms for key variables; The dynamic change characteristics include velocity change, acceleration, and position shift; The spatial distance is calculated based on the Haversine formula to determine the spherical distance from the current point to the destination port. The meteorological-navigation interaction terms include the product of wind speed and significant wave height, and the ratio of air speed to wind speed; The time period encoding includes sine / cosine encoding for hours, days of the week, and months; The navigation progress feature is the ratio of the current time step to the time already traveled.

5. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 1, characterized in that: The XGBoost model includes a quality assurance step before prediction: intercepting infinite values / NaN in the feature matrix, replacing inf / -inf with NaN and filling with the median; The system was trained on long-term samples (T-7 days or more) and short-term samples (T-7 days or less), and the predictions were evaluated based on the sample time periods.

6. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 1, characterized in that: The structured parsing results include: feature importance ranking by semantic category, absolute error and error sign for each prediction.

7. The ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in claim 1, characterized in that: The security desensitization process ensures that the output semantic summary features do not contain sensitive fields such as the ship's precise coordinates, shipowner information, and commercial contracts.

8. A ship arrival prediction agent system based on the fusion of XGBoost and a large language model, based on the ship arrival prediction agent method based on the fusion of XGBoost and a large language model as described in any one of claims 1 to 7, characterized in that: include, The data acquisition module is used to collect real-time ship dynamic data, port information, marine meteorological data, and ship static attributes. It performs missing value processing and time-series feature construction on the collected data to form a structured feature set. The feature extraction module is used to input the feature set into the pre-trained XGBoost regression model, output the remaining arrival time of the ship, extract the feature importance and prediction error sign, and generate structured analytical results. The feature preservation module is used to desensitize the structured parsing results, remove sensitive fields, and retain only semantic summary features; The text generation module is used to inject the desensitized summary features into the preset prompt template, and generate readable explanatory text through a large language model finely tuned for the port scheduling domain, which includes delay cause analysis and berthing suggestions; The text feedback module is used to return the generated explanatory text to the user through the web front end, realizing the integration of prediction, explanation and interaction.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that: When the processor executes the computer program, it implements the steps of the ship arrival prediction agent method based on the fusion of XGBoost and large language model as described in any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that: When the computer program is executed by the processor, it implements the steps of the ship arrival prediction agent method based on the fusion of XGBoost and large language model as described in any one of claims 1 to 7.