An AIS data-based method and system for acquiring a macroscopic basic map of busy water areas

By generating a macroscopic basic map of waterways from AIS data, the method addresses the difficulty of identifying water traffic flow states and enables accurate statistics and scientific management of water traffic flow.

CN121682102B · Active · Publication Date: 2026-06-26 · WUHAN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
WUHAN UNIV OF TECH
Filing Date
2025-12-10
Publication Date
2026-06-26

Figure CN121682102B_ABST

Abstract

The application provides an AIS data-based method and system for acquiring a macroscopic basic graph of busy water areas, together with an electronic device and a storage medium. By mining ship Automatic Identification System (AIS) data, a water-area macroscopic basic graph is constructed with the cumulative number of ships in the water area as the state index and the outflow across the water-area boundary as the performance index, and two time series reflecting the number of ships in the water area and the outflow efficiency are formed. The two series are paired to generate scatter data representing the macroscopic relationship. Cluster analysis then automatically divides the scatter points into three clusters corresponding to the typical states of free flow, saturated flow, and congested flow. The data in each state cluster are fitted separately by piecewise linear fitting to construct a trapezoidal macroscopic basic graph model with distinct rising, stable, and falling segments. Key characteristic parameters are then extracted, such as the critical inflection points indicating state transitions, the saturated flow representing the maximum traffic capacity, and the slopes reflecting the rate of change in efficiency, completing the transformation from raw data to a macroscopic cognitive model.

Description

Technical Field

[0001] This invention relates to the field of intelligent water traffic management, and in particular to a method and system for obtaining macroscopic basic maps of busy waterways based on AIS data.

Background Technology

[0002] The Macroscopic Fundamental Diagram (MFD) is a core tool for describing the overall operating state of a transportation network. It has been widely applied in road traffic for traffic control, congestion identification, and efficiency assessment, and related research has developed three main ways of acquiring an MFD: theoretical derivation, simulation data, and field data. With the surge in waterway traffic, bottlenecks in busy waterways have become increasingly prominent. However, MFD research on ship traffic flow is currently lacking, so there is no tool to support macroscopic assessment of waterway traffic conditions.

[0003] Automatic Identification System (AIS) data is a core data source for vessel traffic flow research, covering both static information (MMSI number, vessel type, etc.) and dynamic information (position, speed, etc.) about vessels. AIS data has been applied in navigation decision-making and flow analysis. However, it contains errors, duplicates, and gaps, and neither the outflow at water-area boundaries nor the vessel flow at key cross-sections can be extracted from it directly. Traditional observation stations, because of their fixed locations, also struggle to obtain these core indicators, which limits the construction of macroscopic models of waterborne traffic.

[0004] In the existing technology, methods for obtaining road traffic MFDs cannot be transferred directly to waterborne scenarios: the large size of ships and the complex navigation environment render traditional road-based detection methods ineffective. At the same time, existing AIS data processing focuses only on data cleaning and offers no scheme for extracting the core indicators of a waterborne MFD. As a result, the free-flow, maximum-flow, and congested states of traffic in busy waterways cannot be identified accurately, which makes it difficult to support scientific traffic control decisions.

Summary of the Invention

[0005] This invention provides a method, system, electronic device, and storage medium for acquiring macroscopic basic maps of busy waterways based on AIS data. It addresses the following technical problems in the current technology: research on macroscopic basic maps (MFDs) for ship traffic flow is lacking; traditional methods that convert ships into an equivalent number of standard ships by ship size cannot reflect the mutual influence between ships in inland waterways and are therefore limited; and existing ship traffic control measures are mostly experience-based, lack scientific justification, are difficult to apply as proactive preventive control, and have unstable effects.

[0006] In a first aspect, embodiments of the present invention provide a method for obtaining a macroscopic basic map of busy waterways based on AIS data, including:

[0007] S1. Acquire dynamic AIS (Automatic Identification System) data of vessels in the target waters within a preset historical time period;

[0008] S2. Based on the dynamic AIS data, count the number of ships leaving the target water area within multiple consecutive preset time windows to form a water area boundary outflow sequence;

[0009] S3. Based on the dynamic AIS data, count the number of vessels in the target waters within each preset time window to form a cumulative vessel count sequence for the waters.

[0010] S4. Generate water-area macroscopic basic map (WMFD) scatter data based on the cumulative vessel count sequence of the water area and the corresponding water-area boundary outflow sequence;

[0011] S5. Perform cluster analysis on the WMFD scatter data to divide the WMFD scatter data into three data clusters representing free flow state, saturated flow state and congested flow state respectively.

[0012] S6. Perform piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model that includes an ascending segment, a stationary segment, and a descending segment.

[0013] S7. Based on the WMFD fitting model, determine the characteristic parameters that characterize the macroscopic traffic flow state of the target water area. The characteristic parameters include the first inflection point between the rising segment and the steady segment, the second inflection point between the steady segment and the falling segment, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.

[0014] Preferably, in step S1, the dynamic AIS data includes information on the ship's position, speed, heading, and acquisition time. After acquiring the dynamic AIS data, the process further includes preprocessing the dynamic AIS data.

[0015] Based on the actual operating conditions of vessels in the target waters, a reasonable range of values for speed and heading is preset, and abnormal data outside this range is eliminated.

[0016] If the number of abnormal records in the dynamic AIS data of the same Maritime Mobile Service Identity (MMSI) number exceeds a preset threshold, all dynamic AIS data for that MMSI number is deleted.

[0017] For duplicate dynamic AIS data, only the first dynamic AIS data collected is retained, and the remaining duplicate data is deleted.
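The three cleaning rules in [0015] to [0017] can be sketched as follows. This is a minimal illustration only, assuming that records arrive as a list in collection order, that duplicates are identified by identical MMSI and timestamp, and that the speed range (0 to 30 knots), course range, and anomaly threshold are hypothetical placeholder values. The `AisRecord` type and `preprocess` function are likewise hypothetical names, not part of the described system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AisRecord:
    """One dynamic AIS report (hypothetical field layout)."""
    mmsi: int
    timestamp: float  # UTC collection time in seconds
    lat: float
    lon: float
    sog: float  # speed over ground, knots
    cog: float  # course over ground, degrees


def preprocess(records, speed_range=(0.0, 30.0), course_range=(0.0, 360.0),
               max_anomalies=10):
    """Range filter, per-MMSI exclusion, then de-duplication.

    The default ranges and threshold are placeholder assumptions; in practice
    they would be set from vessel behaviour in the target waters.
    """
    def in_range(r):
        # AIS uses 360 degrees to mean "course not available",
        # so the upper course bound is exclusive.
        return (speed_range[0] <= r.sog <= speed_range[1]
                and course_range[0] <= r.cog < course_range[1])

    # Rule 1: count out-of-range records per MMSI and drop them.
    anomalies = Counter(r.mmsi for r in records if not in_range(r))
    valid = [r for r in records if in_range(r)]

    # Rule 2: drop every record of a vessel whose anomaly count exceeds the threshold.
    valid = [r for r in valid if anomalies[r.mmsi] <= max_anomalies]

    # Rule 3: keep only the first-collected copy of each (MMSI, timestamp) pair,
    # assuming the input list is already in collection order.
    seen = set()
    cleaned = []
    for r in valid:
        key = (r.mmsi, r.timestamp)
        if key not in seen:
            seen.add(key)
            cleaned.append(r)
    return cleaned
```

Applying the per-MMSI rule after the range filter follows the order given in [0078]: anomalies are counted first, and only then is a vessel judged to be a persistent source of bad data.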

[0018] Preferably, S2 specifically includes:

[0019] A virtual detection section perpendicular to the waterway is set at the preset boundary of the target water area;

[0020] For each vessel, trajectory line segments are generated from the vessel's consecutive dynamic AIS position reports;

[0021] Determine whether each trajectory line segment and the virtual detection section line segment satisfy the spatial intersection condition; if they do, determine whether the ship has left the target water area from the direction of motion represented by the trajectory line segment. The spatial intersection condition is checked in two stages. First, determine whether the bounding rectangles of the trajectory line segment and the virtual detection section line segment intersect (the rapid rejection test). If they do, further determine whether the two line segments straddle each other (the straddle test), that is, whether the endpoints of each segment lie on opposite sides of the line containing the other segment.

[0022] The number of ships determined to have departed within each preset time window is accumulated to form the outflow sequence of the water area boundary.
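The rapid rejection test, straddle test, and per-window accumulation of [0019] to [0022] can be sketched with 2-D cross products. This is an illustrative sketch under stated assumptions, not the patented implementation: positions are assumed to be in planar (projected) coordinates, the section endpoints are assumed to be oriented so that the outside of the water area lies to the left of `q1 -> q2` (a hypothetical convention used to read the direction of motion), and each exit is timed by the end of its trajectory segment for simplicity. All function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign tells which side of line o->a point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def bboxes_overlap(p1, p2, q1, q2):
    """Rapid rejection test: do the bounding rectangles of p1p2 and q1q2 overlap?"""
    x_ok = max(min(p1[0], p2[0]), min(q1[0], q2[0])) <= min(max(p1[0], p2[0]), max(q1[0], q2[0]))
    y_ok = max(min(p1[1], p2[1]), min(q1[1], q2[1])) <= min(max(p1[1], p2[1]), max(q1[1], q2[1]))
    return x_ok and y_ok


def straddles(p1, p2, q1, q2):
    """Straddle test: each segment's endpoints lie strictly on opposite sides of the other's line."""
    return (cross(q1, q2, p1) * cross(q1, q2, p2) < 0
            and cross(p1, p2, q1) * cross(p1, p2, q2) < 0)


def count_outflow(tracks, section, t0, window, n_windows):
    """Count departures across a virtual detection section in each time window.

    tracks: {mmsi: [(t, x, y), ...]} sorted by t (planar coordinates assumed).
    section: (q1, q2), oriented so the outside of the water area is on the left.
    """
    q1, q2 = section
    counts = [0] * n_windows
    for track in tracks.values():
        for (_, x1, y1), (t_end, x2, y2) in zip(track, track[1:]):
            p1, p2 = (x1, y1), (x2, y2)
            if not bboxes_overlap(p1, p2, q1, q2):
                continue  # cheaply rejected before the cross-product test
            if not straddles(p1, p2, q1, q2):
                continue
            if cross(q1, q2, p2) > 0:  # segment ends on the outside: a departure
                k = int((t_end - t0) // window)
                if 0 <= k < n_windows:
                    counts[k] += 1
    return counts
```

Because the straddle test uses strict inequalities, a trajectory that only touches the section at an endpoint is not counted; how such boundary cases should be treated is not specified in the text and is left as a design choice.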

[0023] Preferably, S3 specifically includes:

[0024] Determine the geographical boundary of the target water area;

[0025] For each preset time window, based on the dynamic AIS data, the position point sequence belonging to the same ship is interpolated to obtain continuous trajectory data of the ship with uniform time intervals within the time window;

[0026] Based on the continuous trajectory data, the number of vessels located within the geographical boundary at any sampling time within the preset time window is counted, and this count is taken as the cumulative number of vessels in the water area for that time window.
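Steps [0024] to [0026] can be illustrated with a short sketch: each vessel's reports are resampled at a uniform interval inside the window, and a vessel is counted once if any resampled point falls inside the boundary polygon. For brevity this sketch uses linear interpolation rather than the cubic spline interpolation mentioned as an example in [0067], assumes planar coordinates and a simple (non-self-intersecting) boundary polygon, and uses a ray-casting point-in-polygon test. The sampling interval `step` and all function names are hypothetical.

```python
from bisect import bisect_left


def interpolate(track, times):
    """Linearly interpolate (t, x, y) reports at the given times, without extrapolation."""
    ts = [p[0] for p in track]
    points = []
    for t in times:
        if not ts or t < ts[0] or t > ts[-1]:
            continue  # no reports cover this instant
        i = bisect_left(ts, t)
        if ts[i] == t:
            points.append((track[i][1], track[i][2]))
            continue
        (ta, xa, ya), (tb, xb, yb) = track[i - 1], track[i]
        w = (t - ta) / (tb - ta)
        points.append((xa + w * (xb - xa), ya + w * (yb - ya)))
    return points


def inside(pt, polygon):
    """Ray-casting point-in-polygon test for a simple polygon [(x, y), ...]."""
    x, y = pt
    result = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray through pt
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                result = not result
    return result


def cumulative_counts(tracks, polygon, t0, window, n_windows, step):
    """Distinct vessels inside the boundary at any sampling instant of each window."""
    counts = []
    for w in range(n_windows):
        start = t0 + w * window
        times = [start + j * step for j in range(int(window // step))]
        counts.append(sum(
            any(inside(p, polygon) for p in interpolate(track, times))
            for track in tracks.values()))
    return counts
```

Swapping in a spline only changes `interpolate`; the window and boundary logic stays the same.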

[0027] Preferably, S4 specifically includes:

[0028] Each value of the cumulative vessel count sequence is used as the abscissa, and the outflow value of the boundary outflow sequence for the same time window as the ordinate; the resulting points form a scatter distribution in a two-dimensional coordinate system, which constitutes the WMFD scatter data.

[0029] The scatter distribution covers the macroscopic traffic flow changes of ships in the target waters from free flow to saturated flow to congested flow.

[0030] Preferably, S5 specifically includes:

[0031] Based on the WMFD scatter data, initialize the parameters of the mixture model containing three Gaussian components;

[0032] Iterative calculations are performed based on the expectation-maximization algorithm. In each iteration, the posterior probability of each scatter point belonging to each Gaussian component is calculated according to the current mixture model parameters, and the mixture model parameters of each Gaussian component are updated according to the posterior probability.

[0033] When the change in the mixture model parameters is less than a preset threshold or the maximum number of iterations is reached, the iteration stops, and each scatter point is assigned, according to the final posterior probabilities, to the data cluster of the Gaussian component with the highest posterior probability, completing the clustering of the WMFD scatter data. The three resulting data clusters correspond to the free flow, saturated flow, and congested flow states, respectively.
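The expectation-maximization procedure of [0031] to [0033] can be sketched for two-dimensional scatter points and three full-covariance Gaussian components. This is a minimal sketch under several assumptions not stated in the text: the initial means are the centroids of the accumulation (x) tertiles, the initial covariances equal the overall data variance, the initial mixing weights are equal, convergence is judged by the largest change in means or weights, and the final clusters are relabelled 0, 1, 2 in order of increasing mean accumulation (taken to be free, saturated, and congested flow). Function names are hypothetical.

```python
import math


def _pdf(p, mean, cov):
    """Density of a 2-D Gaussian; cov is (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def _e_step(points, weights, means, covs):
    """Posterior probability of each point under each component."""
    resp = []
    for p in points:
        dens = [w * _pdf(p, m, c) for w, m, c in zip(weights, means, covs)]
        total = sum(dens) or 1e-300  # guard against underflow
        resp.append([d / total for d in dens])
    return resp


def gmm_cluster(points, k=3, max_iter=200, tol=1e-6, reg=1e-6):
    n = len(points)
    if n < k:
        raise ValueError("need at least k points")
    order = sorted(points)  # initialise means from accumulation tertiles (assumption)
    means = []
    for j in range(k):
        chunk = order[j * n // k:(j + 1) * n // k]
        means.append((sum(p[0] for p in chunk) / len(chunk),
                      sum(p[1] for p in chunk) / len(chunk)))
    mx, my = sum(p[0] for p in points) / n, sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    vy = sum((p[1] - my) ** 2 for p in points) / n + reg
    covs = [(vx, 0.0, vy)] * k
    weights = [1.0 / k] * k  # equal initial mixing weights (assumption)

    for _ in range(max_iter):
        resp = _e_step(points, weights, means, covs)  # E-step
        new_w, new_m, new_c = [], [], []
        for j in range(k):  # M-step: re-estimate each component from its posteriors
            nj = sum(r[j] for r in resp) + 1e-12
            cx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            cy = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - cx) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - cx) * (p[1] - cy) for r, p in zip(resp, points)) / nj
            syy = sum(r[j] * (p[1] - cy) ** 2 for r, p in zip(resp, points)) / nj + reg
            new_w.append(nj / n)
            new_m.append((cx, cy))
            new_c.append((sxx, sxy, syy))
        change = max(abs(a - b) for m0, m1 in zip(means, new_m) for a, b in zip(m0, m1))
        change = max(change, max(abs(a - b) for a, b in zip(weights, new_w)))
        weights, means, covs = new_w, new_m, new_c
        if change < tol:
            break

    resp = _e_step(points, weights, means, covs)  # final posteriors
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    # Relabel so 0, 1, 2 follow increasing mean accumulation.
    rank = {j: i for i, j in enumerate(sorted(range(k), key=lambda j: means[j][0]))}
    return [rank[lab] for lab in labels]
```

In practice a library implementation such as scikit-learn's `GaussianMixture(n_components=3)` with `fit_predict` could replace this hand-written loop; the sketch is only meant to make the E-step and M-step explicit.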

[0034] Preferably, S6 specifically includes:

[0035] For the data points in the three data clusters corresponding to the free flow state, saturated flow state, and congested flow state, respectively, the least squares method is used to perform linear fitting.

[0036] The fitted straight line corresponding to the free-flow state data cluster is defined as the rising segment of the WMFD fitting model;

[0037] The fit for the saturated flow state data cluster is processed into a horizontal line segment representing the saturated flow and defined as the stationary segment of the WMFD fitting model;

[0038] The fitted straight line corresponding to the congestion flow state data cluster is defined as the descending segment of the WMFD fitting model;

[0039] Based on the rising segment, the stable segment, and the falling segment, a trapezoidal WMFD fitting model is constructed.
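The per-cluster least-squares fitting of [0035] to [0039] can be sketched as below. The text does not say exactly how the saturated-cluster fit is "processed into" a horizontal segment; this sketch assumes it is the least-squares line with zero slope, which is simply the mean outflow of that cluster. Function names and the returned dictionary layout are hypothetical.

```python
def least_squares_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; slope undefined")
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_trapezoid(free, saturated, congested):
    """Fit rising, stable and falling segments from the three clusters.

    The stable segment is assumed to be the least-squares horizontal line,
    i.e. the mean outflow of the saturated cluster.
    """
    k1, b1 = least_squares_line(free)        # rising segment (free flow)
    q_sat = sum(y for _, y in saturated) / len(saturated)  # stable segment
    k3, b3 = least_squares_line(congested)   # falling segment (congested flow)
    return {"rising": (k1, b1), "q_sat": q_sat, "falling": (k3, b3)}
```

For example, free-flow points (1, 2), (2, 4), (3, 6) give a rising line of slope 2 through the origin, and congested points (10, 8), (12, 6), (14, 4) give a falling line of slope -1 with intercept 18.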

[0040] Secondly, embodiments of the present invention provide a system for acquiring a macroscopic basic map of busy waterways based on AIS data, comprising:

[0041] The data acquisition module is used to acquire dynamic AIS (Automatic Identification System) data of vessels in the target waters within a preset historical time period;

[0042] The boundary outflow statistics module is used to count the number of ships leaving the target water area within multiple consecutive preset time windows based on the dynamic AIS data, forming a water area boundary outflow sequence.

[0043] The cumulative quantity statistics module is used to count the number of vessels in the target water area within each preset time window based on the dynamic AIS data, and form a cumulative quantity sequence of vessels in the water area.

[0044] The scatter data generation module is used to generate WMFD scatter data of the water area macroscopic basic map based on the cumulative ship volume sequence of the water area and the corresponding outflow volume sequence of the water area boundary.

[0045] The clustering analysis module is used to perform clustering analysis on the WMFD scatter data, dividing the WMFD scatter data into three data clusters representing free flow state, saturated flow state and congested flow state respectively.

[0046] The fitting modeling module is used to perform piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model that includes rising segments, stationary segments, and falling segments.

[0047] The parameter determination module is used to determine characteristic parameters for characterizing the macroscopic traffic flow state of the target water area based on the WMFD fitting model. The characteristic parameters include the first inflection point between the rising and steady segments, the second inflection point between the steady and falling segments, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.

[0048] Thirdly, embodiments of the present invention provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the method for obtaining a macroscopic basic map of busy waterways based on AIS data as described in the first aspect of the present invention.

[0049] Fourthly, embodiments of the present invention provide a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method for obtaining a macroscopic basic map of busy waterways based on AIS data as described in the first aspect of the present invention.

[0050] This invention provides a method, system, electronic device, and storage medium for acquiring a macroscopic basic map of busy waterways based on AIS data. To address the lack of tools for quantifying the macroscopic dynamic characteristics of vessel traffic flow, the invention applies macroscopic basic map theory from road traffic to busy waterways for the first time. By mining Automatic Identification System (AIS) data, it constructs a macroscopic basic map of the waterway with the cumulative vessel volume in the waterway as the state indicator and the outflow at the waterway boundary as the performance indicator. The technical implementation path is as follows. First, time series reflecting the number of vessels in the waterway and the outflow efficiency are acquired and statistically analyzed. These two series are then paired to generate scatter data representing the macroscopic relationship. Next, cluster analysis automatically divides the scatter points into clusters corresponding to three typical states: free flow, saturated flow, and congested flow. Piecewise linear fitting is then performed on the data in each state cluster to construct a trapezoidal macroscopic basic map model with clearly defined rising, stable, and falling segments. Finally, key feature parameters are extracted from this model, such as the critical inflection points marking state transitions, the saturated flow representing maximum capacity, and the slopes reflecting the rate of change in efficiency, completing the transformation from raw data to a macroscopic cognitive model. The invention thereby resolves the problems that the flow at key sections cannot be obtained directly from AIS data and that the overall operation of the water area is difficult to characterize. Through automatic algorithmic extraction and virtual detection sections, it achieves efficient and accurate statistics of the outflow at water-area boundaries.

Description of the Drawings

[0051] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0052] Figure 1 is a block diagram of a method for obtaining a macroscopic basic map of busy waterways based on AIS data according to an embodiment of the present invention;

[0053] Figure 2 is a detailed framework diagram of a method for obtaining a macroscopic basic map of busy waterways based on AIS data according to an embodiment of the present invention;

[0054] Figure 3 is a flowchart of the water-area boundary outflow statistics according to an embodiment of the present invention;

[0055] Figure 4 is a schematic diagram of the rapid rejection test according to an embodiment of the present invention;

[0056] Figure 5 is a schematic diagram of the straddle test according to an embodiment of the present invention;

[0057] Figure 6 is a WMFD trend chart according to an embodiment of the present invention;

[0058] Figure 7 is a structural block diagram of the system for obtaining macroscopic basic maps of busy waterways based on AIS data;

[0059] Figure 8 is a schematic diagram of the physical structure according to an embodiment of the present invention.

Detailed Description of the Embodiments

[0060] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0061] This invention provides a method for obtaining a macroscopic basic map of busy waterways based on AIS data. As shown in Figure 1 and Figure 2, the method includes:

[0062] S1. Obtain dynamic AIS (Automatic Identification System) data for the target waters within a preset historical time period.

[0063] The target water area is a specific study area with dense shipping traffic and intersecting routes, such as the Liuwei River estuary on the Yangtze River or Taicang Port. The preset historical time period is a period of historical data selected according to research needs, covering different hydrological periods (such as the dry season and the flood season) or traffic patterns. Dynamic AIS data is time-series information that includes real-time ship position (latitude and longitude), speed over ground, heading, and UTC collection timestamp. By clearly defining the spatial and temporal boundaries of the analysis, this step avoids unfocused, scattered data collection and provides a well-organized data foundation for the subsequent construction of a spatiotemporally consistent macroscopic model, ensuring that all later analyses rest on a clearly delimited, real traffic scenario.

[0064] S2. Based on the dynamic AIS data, count the number of ships leaving the target water area within multiple consecutive preset time windows to form a water area boundary outflow sequence.

[0065] The number of vessels departing the target waterway is the count of vessels crossing the preset boundary of the waterway within one statistical time window, and the water-area boundary outflow sequence is the time series of this count. This step identifies and counts departure events automatically, giving a high-precision measure of the waterway's output performance and supplying the ordinate data used to measure system efficiency.

[0066] S3. Based on the dynamic AIS data, count the number of vessels in the target waters within each preset time window to form a cumulative vessel count sequence for the waters.

[0067] The number of vessels in operation is the total number of vessels located within the geographical boundary of the target waterway and in a navigating (including maneuvering) state at the statistical time, and the cumulative vessel count sequence records how this number changes across time windows. This step introduces trajectory interpolation (such as cubic spline interpolation) to turn sparse raw position reports into continuous trajectories with uniform time intervals, so that whether a vessel is inside the area can be determined at any given time. This overcomes the unreliable estimation of system state density (i.e., the cumulative number of vessels) caused by the inherent discontinuity of AIS data, and provides reliable abscissa data for the macroscopic basic map that reflects the actual traffic load within the waterway.

[0068] S4. Generate WMFD scatter plot data of the water area based on the cumulative ship volume sequence of the water area and the corresponding outflow volume sequence of the water area boundary.

[0069] Generating Waterway Macroscopic Fundamental Diagram (WMFD) scatter data means mapping the cumulative amount (x-axis) and outflow (y-axis) of each time window as a data point on a two-dimensional plane. By aligning and pairing, window by window, the two sequences obtained in S2 and S3, this step constructs for the first time a raw dataset describing the intrinsic relationship of macroscopic traffic flow in waterways. It overcomes a basic limitation of existing technologies, which are restricted to microscopic or mesoscopic analysis and cannot reveal the relationship between the overall state and performance of the system, and yields a dataset from which macroscopic patterns of waterway traffic flow can be extracted.

[0070] S5. Perform cluster analysis on the WMFD scatter data to divide the WMFD scatter data into three data clusters representing free flow state, saturated flow state and congested flow state, respectively.

[0071] The cluster analysis employs a Gaussian mixture model with equal probability. The three data clusters correspond to the three macroscopic phases of traffic flow theory: free flow, saturated flow (maximum flow), and congested flow. This step uses an unsupervised clustering algorithm to classify the scatter points into these three categories automatically, based on their inherent distribution and without subjective thresholds. It resolves the subjectivity and inconsistency of manual traffic state classification, which also cannot handle complex distributions, and objectively identifies the traffic phases implicit in the macroscopic basic map. This provides a basis for the subsequent segmented modeling and makes state identification reproducible and generalizable.

[0072] S6. Perform piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model that includes rising segments, stationary segments, and falling segments.

[0073] Piecewise linear fitting means applying the least squares method to fit a straight line to each of the three data clusters. The WMFD fitting model is an approximately trapezoidal model consisting of an ascending segment (free flow), a stationary segment (saturated flow), and a descending segment (congested flow). Based on the clustering results, this step fits the data with this simple yet effective method, in particular fitting the saturated flow cluster as a horizontal line segment so that the maximum capacity is clearly defined. This resolves the inconsistent assumptions about the shape of the macroscopic basic map and the mathematical complexity of its description, which hinder engineering application. The result is a trapezoidal macroscopic basic map model with a clear structure and physically meaningful parameters that is easy to understand, compute, and apply, providing a concise tool for quantitative analysis.

[0074] S7. Based on the WMFD fitting model, determine the characteristic parameters that characterize the macroscopic traffic flow state of the target water area. The characteristic parameters include the first inflection point between the rising segment and the steady segment, the second inflection point between the steady segment and the falling segment, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.

[0075] The first and second inflection points correspond to the critical accumulation amounts of traffic flow transitioning from free flow to saturated flow and from saturated flow to congested flow, respectively. This step, through model analysis, precisely extracts these core guiding parameters. This addresses the previous problem of lacking quantitative data and relying mainly on experience in waterway traffic management, transforming the macroscopic traffic flow situation into a set of measurable and monitorable key indicators. This provides direct and scientific decision support for assessing waterway traffic efficiency, identifying bottlenecks, and implementing preventative boundary flow control based on critical accumulation amounts.
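One natural reading of [0074] and [0075] is that the two inflection points are where the horizontal steady segment meets the rising and falling lines. The sketch below assumes exactly that, with the rising segment written as q = k1·n + b1, the falling segment as q = k3·n + b3, and the steady segment as q = q_sat, where n is the cumulative vessel count and q the boundary outflow; the function name and dictionary keys are hypothetical.

```python
def characteristic_parameters(k1, b1, q_sat, k3, b3):
    """Inflection points as intersections of the steady segment with the sloped segments.

    Rising: q = k1 * n + b1 (k1 > 0); falling: q = k3 * n + b3 (k3 < 0);
    steady: q = q_sat. n is accumulation, q is boundary outflow.
    """
    if k1 <= 0 or k3 >= 0:
        raise ValueError("expected a rising slope > 0 and a falling slope < 0")
    n1 = (q_sat - b1) / k1  # critical accumulation: free flow -> saturated flow
    n2 = (q_sat - b3) / k3  # critical accumulation: saturated flow -> congested flow
    if n1 > n2:
        raise ValueError("segments do not form a trapezoid (n1 > n2)")
    return {"n1": n1, "n2": n2, "q_sat": q_sat, "k_rise": k1, "k_fall": k3}
```

As a worked example, a rising line q = 2n, a steady value of 10 vessels per window, and a falling line q = -n + 18 give n1 = 5 and n2 = 8 vessels: below 5 vessels the water area is in free flow, and above 8 it is congested.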

[0076] Based on the above embodiments, as a preferred implementation, in step S1, the dynamic AIS data includes ship position, speed, heading, and acquisition time information. After acquiring the dynamic AIS data, the method further includes preprocessing the dynamic AIS data:

[0077] Based on the actual operating conditions of vessels in the target waters, a reasonable range of values for speed and heading is preset, and abnormal data outside this range is eliminated; for the dynamic AIS data of the same Maritime Mobile Service Identity (MMSI) number, if the number of abnormal records exceeds a preset threshold, all dynamic AIS data for that MMSI number is deleted; for duplicate dynamic AIS data, only the first record collected is retained and the remaining duplicates are deleted.

[0078] The reasonable value range is the physically feasible interval of speed and course determined from typical ship maneuvering behavior (such as speed changes, turning, berthing, and unberthing) within the target waters (such as the island navigation area at the mouth of the Liuwei River or the deep-water channel of Taicang Port). It is used to filter out obviously unreasonable data caused by signal interference or equipment failure. The MMSI (Maritime Mobile Service Identity) is a unique numeric identifier of a ship, used to track and associate all data records of a single ship. The preset threshold for abnormal data is the upper tolerance limit set for each MMSI, used to judge whether the ship's AIS equipment is persistently malfunctioning or its data is generally unreliable. This embodiment thus forms a progressive, multi-level data cleaning mechanism: first, obvious anomalies are filtered using domain knowledge (the reasonable ranges of speed and course); next, ships with repeated anomalies are removed entirely to eliminate persistent sources of error; finally, duplicate records caused by communication retransmission are removed. This solves the key problem that errors, anomalies, and redundant information in raw AIS data, if used directly for macroscopic analysis, distort the model and make conclusions unreliable. Automated, rule-based preprocessing significantly improves the credibility and consistency of the input data and lays a clean foundation for accurate statistics and stable modeling of all subsequent macroscopic indicators, so that the final macroscopic basic map and its characteristic parameters truly reflect the operational nature of water traffic flow.

[0079] The concept of a macroscopic basic graph originates from the field of road traffic and is sometimes referred to as a network basic graph. It describes the relationship between the total number of traffic entities within a network and the outbound traffic flow. When the outbound traffic flow reaches its maximum value, if the inbound traffic flow continues to increase, the congestion in the network will worsen until the network collapses. Based on the data obtained from this model, managers can formulate reasonable control measures for traffic guidance.

[0080] The MFD has a degree of universality. For road networks with a uniform traffic density distribution (homogeneous road networks), the weighted traffic flow $q^{w}$ and weighted density $k^{w}$ can be obtained by observing the movement of vehicles within the network, and the parabolic curve relating them describes the shape of the MFD. Their expressions are given in equation (1).

[0081] $$q^{w}=\frac{\sum_{i} q_{i}\, l_{i}}{\sum_{i} l_{i}}, \qquad k^{w}=\frac{\sum_{i} k_{i}\, l_{i}}{\sum_{i} l_{i}} \tag{1}$$

[0082] In the formula, $q^{w}$ and $k^{w}$ are the weighted flow and weighted density of traffic entities in the network, respectively, and $q_{i}$, $k_{i}$, and $l_{i}$ are the cross-sectional flow, density, and length of link $i$ in the network. A growing body of research indicates that the MFD describes the relationship between the state of the entire transportation system and its performance, and that it can be characterized by several pairs of parameters: the output flow of the road network and the number of vehicles present in the network; the total traffic flow and the average density of the road network; and the total distance traveled by vehicles within the road network and their total travel time.
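The weighted flow and weighted density of a homogeneous road network are length-weighted averages of the link values. The sketch below assumes the standard definitions, weighted flow $= \sum_i q_i l_i / \sum_i l_i$ and weighted density $= \sum_i k_i l_i / \sum_i l_i$, where $q_i$, $k_i$, and $l_i$ are the flow, density, and length of link $i$; the function name is hypothetical and units are whatever the inputs use.

```python
def weighted_flow_density(links):
    """Length-weighted flow and density of a network.

    links: iterable of (q_i, k_i, l_i) = (link flow, link density, link length).
    """
    links = list(links)
    total_length = sum(length for _, _, length in links)
    if total_length <= 0:
        raise ValueError("total link length must be positive")
    q_w = sum(q * length for q, _, length in links) / total_length
    k_w = sum(k * length for _, k, length in links) / total_length
    return q_w, k_w
```

For instance, a 0.5 km link carrying 1200 veh/h at 20 veh/km and a 1.5 km link carrying 600 veh/h at 40 veh/km give a weighted flow of 750 veh/h and a weighted density of 35 veh/km, because the longer link dominates both averages.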

[0083] This invention draws on the concept of a road macro-basic map to define the descriptive indicators involved in WMFD, selects busy waterways as the research object, and draws their macro-basic maps.

[0084] Define the macroscopic basic map of the water area (WMFD) as the relationship between the number of vessels leaving the water-area boundary per unit time and the total number of vessels within the water area. The vertical axis is the water-area boundary outflow $O$, and the horizontal axis is the cumulative number of vessels in the water area $N$. Assume that the water area $A$ is a closed region bounded by several boundaries, of which the boundaries $B_1, \dots, B_m$ allow ships to enter and leave. $O$ is the number of ships that cross a boundary from inside the region to outside per unit time, and $N$ is the total number of vessels operating within the area, including vessels berthing, departing and anchoring there. With a suitable sampling interval, the set of time points is obtained as shown in equation (2).

[0085] $$T = \{t_0, t_1, \dots, t_K\}, \qquad t_k = t_0 + k\,\Delta t \qquad (2)$$

[0086] In the formula, $\Delta t$ is the sampling interval. The water area has $m$ outflow boundaries, the $j$-th being $B_j$, through which ships both enter and leave, and $O_j(t_k)$ is the number of vessels leaving through $B_j$ during the period $[t_k, t_{k+1})$. The number of vessels that leave the study waters through all boundaries during this period, $O(t_k)$, is given by equation (3); $N(t_k)$ is the number of vessels operating in the waters during the same period.

[0087] $$O(t_k) = \sum_{j=1}^{m} O_j(t_k) \qquad (3)$$
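Equations (2) and (3) amount to binning boundary-crossing events into windows of length $\Delta t$ and summing the per-boundary counts. A minimal sketch, assuming each exit event is already available as a (time in seconds, boundary id) pair; the function name and event format are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def boundary_outflow_series(
    exit_events: Iterable[Tuple[float, str]],  # (exit time in s, boundary id B_j)
    t0: float,                                  # start of the observation period
    dt: float,                                  # sampling interval Δt in s
    n_windows: int,                             # number of windows [t_k, t_{k+1})
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Return O(t_k) for each window together with the per-boundary counts O_j(t_k)."""
    per_boundary: Dict[str, List[int]] = defaultdict(lambda: [0] * n_windows)
    for t, boundary in exit_events:
        k = int((t - t0) // dt)  # index of the window that contains t
        if 0 <= k < n_windows:
            per_boundary[boundary][k] += 1
    # Equation (3): sum the outflow of all boundaries in each window.
    total = [sum(counts[k] for counts in per_boundary.values()) for k in range(n_windows)]
    return total, dict(per_boundary)
```

Events outside the observation period are simply ignored here; keeping the per-boundary series makes it possible to inspect which boundary dominates the outflow.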

[0088] A complete WMFD should describe the entire evolution of vessel traffic in the study waters, from free navigation to congestion. Because the macroscopic basic map is built from a large number of scatter points, a clear and complete map requires the sampled data points to cover the full range of traffic states.

[0089] Obtaining a macroscopic basic map typically involves four steps: First, selecting appropriate traffic system parameters as the horizontal and vertical axes of the macroscopic basic map. The horizontal axis usually describes the system state, and the vertical axis describes the system performance. Second, data acquisition: using a high sampling rate and omnidirectional sampling, real-time or historical traffic data is collected, and the collected data undergoes preprocessing, including noise reduction, anomaly detection, and interpolation. Third, the horizontal and vertical axis data are plotted on a two-dimensional plane to construct a scatter plot of the macroscopic basic map. Fourth, data analysis: through analysis of the macroscopic basic map and cluster fitting, information such as traffic state trends and congestion causes can be obtained.

[0090] After determining the parameters of WMFD, they need to be extracted from actual data. In the field of road traffic, photographic methods or detection equipment installed at intersections are commonly used to investigate traffic flow characteristics such as traffic volume and traffic density. However, ships are much larger in shape and size than cars, making it impossible to use this method to record ship behavior.

[0091] AIS (Automatic Identification System) is a technology used for ship traffic management and surveillance. AIS equipment on board a vessel broadcasts ship information to other nearby vessels and to shore-based stations; this information is called AIS data. AIS data can be relayed via satellite, radio or other means, giving navigators, ship management agencies and other relevant parties real-time information on the ship's operational status. AIS data are divided into static and dynamic data, as shown in Table 1. Dynamic data include the ship's position, speed and heading, and are collected by the shipborne AIS equipment. Static data include the MMSI (Maritime Mobile Service Identity) number, ship type, length, beam and draft, and come mainly from the ship registration files of the China Registry of Ships and of the various maritime administration departments.

[0092] Table 1 AIS Data Types

[0093]

[0094] This invention extracts the WMFD indicators from AIS data and analyzes the macroscopic trend of waterway traffic flow statistically. Because of factors such as shipborne equipment and the ship's operating environment, AIS data often contain significant errors and missing fields after acquisition, encapsulation, signal reception, digital-to-analog conversion and decoding. Using these data directly degrades the analysis results. AIS data must therefore be checked and repaired: problematic records are cleaned while the original characteristics of the data are preserved, so that the data meet the needs of subsequent data mining.

[0095] To calculate the macroscopic basic graph parameters, the raw AIS data needs to be cleaned and preprocessed, and duplicate, erroneous, and missing data needs to be removed based on prior knowledge.

[0096] (1) Data anomalies: During the AIS data parsing process, the parsing algorithm itself has shortcomings, which may result in information exceeding the actual range and not conforming to reality. The research object of this invention is a specific water area. Combining the actual operation of ships in this water area, reasonable ranges are set for various parameters such as speed and heading. Based on the range, judgments are made and out-of-range data is eliminated. At the same time, a maximum number of outliers is set for the data of ships with the same MMSI number, and ships with too many outliers are all eliminated.

[0097] (2) Data duplication: signal transmission failures of AIS equipment can cause a message to be sent several times, producing duplicate records. Because AIS data volumes are large, and to simplify and speed up cleaning, this invention keeps only the first extracted record of each set of duplicates and deletes the rest, regardless of whether the duplicates come from the same source message, so that the remaining data are normal and usable.
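The two cleaning rules above can be sketched as follows. This is a minimal illustration rather than the invention's implementation: the record fields, the speed limit `max_sog`, the outlier cap `max_outliers`, the bounding-box range check and the duplicate key (MMSI, time, position) are all assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AisRecord:
    mmsi: int
    t: float    # report time in s
    lon: float
    lat: float
    sog: float  # speed over ground, knots
    cog: float  # course over ground, degrees


def clean_ais(
    records: List[AisRecord],
    bbox: Tuple[float, float, float, float],  # (lon_min, lat_min, lon_max, lat_max), assumed range
    max_sog: float = 30.0,                     # hypothetical speed limit for the waterway
    max_outliers: int = 5,                     # hypothetical per-MMSI outlier cap
) -> List[AisRecord]:
    lon_min, lat_min, lon_max, lat_max = bbox

    def in_range(r: AisRecord) -> bool:
        return (lon_min <= r.lon <= lon_max and lat_min <= r.lat <= lat_max
                and 0.0 <= r.sog <= max_sog and 0.0 <= r.cog < 360.0)

    # Rule (1): count out-of-range reports per MMSI; drop vessels with too many.
    outliers = Counter(r.mmsi for r in records if not in_range(r))
    bad_mmsi = {m for m, c in outliers.items() if c > max_outliers}

    # Rule (2): keep only the first occurrence of each duplicated report.
    seen = set()
    cleaned = []
    for r in records:
        if r.mmsi in bad_mmsi or not in_range(r):
            continue
        key = (r.mmsi, r.t, r.lon, r.lat)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned
```

Individual out-of-range reports are always removed; the per-MMSI cap additionally discards every report of a vessel whose transponder is clearly unreliable.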

[0098] Based on the above embodiments, as a preferred implementation, step S2 specifically includes:

[0099] A virtual detection section perpendicular to the waterway is set at the preset boundary of the target water area.

[0100] For each vessel, a trajectory segment is generated based on the vessel's continuous dynamic AIS position data.

[0101] Determine whether the trajectory segment and the virtual detection section segment satisfy the spatial intersection condition. If they do, determine from the ship's direction of movement, represented by the trajectory segment, whether the ship has left the target water area. The spatial intersection condition is checked in two stages: first, whether the bounding rectangles of the trajectory segment and of the virtual detection section segment intersect; if they do, whether the two segments straddle each other, i.e. whether the endpoints of each segment lie on opposite sides of the line containing the other segment.

[0102] The number of ships determined to have departed within each preset time window is accumulated to form the outflow sequence of the water area boundary.

[0103] The water-area boundary outflow $O$, the vertical axis of the WMFD (water-area macroscopic basic map), reflects the operational performance of the shipping traffic system within the waterway. Traditional manual observation is inefficient and error-prone, so this invention proposes a statistical algorithm, based on ship AIS data, for the outflow at water-area boundaries. The algorithm automatically determines whether a ship has passed through the observed boundary section and counts the ships passing through it, which automates observation of the cross-sectional flow and yields the outflow in a timely manner. Compared with manual counting, it substantially reduces labour and observation time, lowers the counting error rate, and improves the accuracy and reliability of the observations. This invention sets up cross-sectional sampling points at the controlled water-area boundary; by monitoring changes in the outflow at the boundary section, channel flow can be controlled and adjusted to ensure ship safety and transport efficiency. The algorithm can also support traffic-flow early warning and emergency response, contributing to maritime safety and waterway environmental protection. In this way the boundary outflow algorithm supports the optimization and intelligent development of maritime management and waterway planning, and the sustainable development of the shipping industry.

[0104] The method for counting waterway outflow based on ship AIS data consists of the following four steps, as shown in Figure 3.

[0105] (1) Extract AIS data: draw the observation section perpendicular to the waterway and take a small rectangular water area centred on the section, extending outward on both sides. Extract all AIS data within this rectangle for the observation period required by the study.

[0106] (2) Process data by MMSI number: Use the uniqueness of MMSI number to classify AIS data, obtain the latitude and longitude of each ship and the corresponding collection time, and sort them according to the collection time to obtain continuous AIS data;

[0107] (3) Judge section-crossing events: traverse all MMSI numbers and check, for each ship, whether the line segment formed by two consecutive position reports intersects the observed section. If it does, the ship has passed through the section, and its direction (upbound or downbound) is determined from the direction of the displacement vector. If it does not, check the segment formed by the next pair of consecutive reports, and so on; if no segment intersects the section once all AIS data of the ship have been traversed, the ship is judged not to have passed through the section.

[0108] (4) Outflow statistics: Further screening and accumulation of ships passing through the cross section, and determination of whether ships have flowed out of the observed water area based on the direction of movement of the trajectory line, to obtain the outflow of the water area boundary.
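Steps (2) to (4) can be sketched as below, assuming positions have already been projected to planar coordinates (for example metres) and adopting the hypothetical convention that the study area lies to the left of the section direction A→B. A track segment counts as an outflow when its start lies inside, its end lies outside, and the section endpoints straddle it; taking the later report time as the exit time is also an assumption.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive when b lies left of the ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def exit_times(
    tracks: Dict[int, List[Tuple[float, float, float]]],  # MMSI -> [(t, x, y), ...]
    sec_a: Point,
    sec_b: Point,  # section A->B; study area assumed to lie on its left
) -> List[Tuple[int, float]]:
    events = []
    for mmsi, pts in tracks.items():
        pts = sorted(pts)  # step (2): order each vessel's reports by time
        for (t1, x1, y1), (t2, x2, y2) in zip(pts, pts[1:]):
            p1, p2 = (x1, y1), (x2, y2)
            s1, s2 = cross(sec_a, sec_b, p1), cross(sec_a, sec_b, p2)
            # Step (3)/(4): the track must go from inside (s1 > 0) to outside (s2 < 0) ...
            if not (s1 > 0 > s2):
                continue
            # ... and the section endpoints must straddle the track segment.
            if cross(p1, p2, sec_a) * cross(p1, p2, sec_b) <= 0:
                events.append((mmsi, t2))
    return sorted(events, key=lambda e: e[1])
```

The resulting event times can then be binned into the preset time windows to form the boundary outflow sequence; inbound crossings (s1 < 0 < s2) are skipped, which is how the movement direction separates outflow from inflow.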

[0109] Judging cross-sectional behavior is a key part of the algorithm, which will determine whether the trajectory line passes through the cross-section in two steps:

[0110] (1) Rapid rejection test

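[0111] As shown in Figure 4, let $R_1$ be the axis-aligned rectangle having the trajectory segment $P_1P_2$ as its diagonal, and $R_2$ the axis-aligned rectangle having the section segment $Q_1Q_2$ as its diagonal. If $R_1$ and $R_2$ do not intersect, the two segments cannot intersect.

[0112] (2) Straddle test

[0113] As shown in Figure 5, if segments $P_1P_2$ and $Q_1Q_2$ intersect, the endpoints of each segment lie on opposite sides of the other segment; this is called mutual straddling. Here $\times$ denotes the two-dimensional cross product $(a_x, a_y)\times(b_x, b_y) = a_x b_y - a_y b_x$. If $P_1P_2$ straddles $Q_1Q_2$, the vectors $(P_1 - Q_1)$ and $(P_2 - Q_1)$ lie on opposite sides of the vector $(Q_2 - Q_1)$, that is, $(P_1 - Q_1)\times(Q_2 - Q_1)\cdot(P_2 - Q_1)\times(Q_2 - Q_1) < 0$. This can be rewritten as $(P_1 - Q_1)\times(Q_2 - Q_1)\cdot(Q_2 - Q_1)\times(P_2 - Q_1) > 0$. When $(P_1 - Q_1)\times(Q_2 - Q_1) = 0$, the vectors $(P_1 - Q_1)$ and $(Q_2 - Q_1)$ are collinear; since the segments have already passed the rapid rejection test, $P_1$ must lie on segment $Q_1Q_2$. Similarly, $(Q_2 - Q_1)\times(P_2 - Q_1) = 0$ means that $P_2$ must lie on segment $Q_1Q_2$.

[0114] In conclusion, $P_1P_2$ straddles $Q_1Q_2$ if $(P_1 - Q_1)\times(Q_2 - Q_1)\cdot(Q_2 - Q_1)\times(P_2 - Q_1) \ge 0$. Similarly, $Q_1Q_2$ straddles $P_1P_2$ if $(Q_1 - P_1)\times(P_2 - P_1)\cdot(P_2 - P_1)\times(Q_2 - P_1) \ge 0$. The two segments intersect if they pass the rapid rejection test and straddle each other.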
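The two-stage test of rapid rejection followed by mutual straddling translates directly into code. A minimal Python sketch, where `cross(o, a, b)` is a hypothetical helper computing $(a - o)\times(b - o)$; the `>= 0` comparisons admit touching endpoints and collinear overlap, in line with the criterion in [0114]:

```python
from typing import Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Two-dimensional cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    # (1) Rapid rejection: the rectangles with the segments as diagonals must overlap.
    if (max(p1[0], p2[0]) < min(q1[0], q2[0]) or max(q1[0], q2[0]) < min(p1[0], p2[0])
            or max(p1[1], p2[1]) < min(q1[1], q2[1]) or max(q1[1], q2[1]) < min(p1[1], p2[1])):
        return False
    # (2) Straddle test: (P1-Q1)x(Q2-Q1) * (Q2-Q1)x(P2-Q1) >= 0 ...
    p_straddles_q = cross(q1, p1, q2) * cross(q1, q2, p2) >= 0
    # ... and (Q1-P1)x(P2-P1) * (P2-P1)x(Q2-P1) >= 0.
    q_straddles_p = cross(p1, q1, p2) * cross(p1, p2, q2) >= 0
    return p_straddles_q and q_straddles_p
```

The cheap rectangle check runs first because most consecutive AIS segments lie far from the section, so the cross products are only evaluated for nearby candidates.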

[0115] Based on the above embodiments, as a preferred implementation, step S3 specifically includes:

[0116] The geographical boundary of the target water area is determined; for each preset time window, based on the dynamic AIS data, the position point sequence belonging to the same vessel is interpolated to obtain continuous trajectory data of the vessel with uniform time intervals within the time window; based on the continuous trajectory data, the number of all vessels located within the geographical boundary at any sampling time within the preset time window is counted, and the counted number is taken as the cumulative number of vessels in the water area corresponding to that time window.

[0117] The cumulative number of vessels in the water area $N$, the horizontal axis of the macroscopic basic map, is the number of vessels operating within the study waters per unit time. Ship AIS equipment transmits at varying rates: static information is sent every 6 minutes, while the dynamic reporting rate depends on the vessel's state, from every 3 minutes when the vessel is at anchor to every 2 seconds when it is travelling at high speed. As a result, raw AIS data do not form continuous vessel trajectories, which prevents an accurate count of the cumulative number of vessels.

[0118] Continuous ship data can be obtained by fitting or by interpolation. Fitting seeks a curve that follows the overall trend of the original data points without having to pass through all of them, which can compromise the integrity of AIS trajectory data. Interpolation instead fills in data between each pair of consecutive original points at a set step, and the resulting line or curve passes through all original points; interpolation is therefore chosen to fill trajectory gaps. This invention uses cubic spline interpolation with a resampling step of 1 second; within the spline, the step $h_i$ of each segment is the difference between the later and earlier timestamps of that segment. This achieves smooth completion of the AIS data.

[0119] Spline interpolation is a common method for constructing a smooth curve through a given set of data points. Its advantage is that it requires no assumption about the form of the data; instead, it builds a set of low-order polynomials that together give a continuous, smooth result. In this embodiment, the interpolation function is represented piecewise by a set of polynomials whose coefficients are determined by constraints such as the interpolation, continuity and boundary conditions. Cubic spline interpolation uses cubic polynomials to construct a piecewise interpolation function with continuous second derivatives: it divides the data points into segments, fits each segment with a cubic polynomial, and ensures that the first and second derivatives of the interpolation function are continuous at each data point. Cubic spline interpolation has performed well in spatiotemporal trajectory synchronization and repair. Its steps are as follows:

[0120] Step 1: Given discrete data points $(x_i, y_i)$, $i = 0, 1, \dots, n$, with $x_0 < x_1 < \dots < x_n$, divide the interval $[x_0, x_n]$ into $n$ subintervals at the sample points. Treat each $[x_i, x_{i+1}]$ as a subinterval and compute its step $h_i = x_{i+1} - x_i$.

[0121] Step 2: On each subinterval, represent the piecewise spline function by a cubic polynomial of the form $S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3$, where $a_i$, $b_i$, $c_i$ and $d_i$ are the coefficients of the subinterval and $x_i$ is its left endpoint.

[0122] Step 3: Solve for the subinterval coefficients. In this embodiment, the coefficients of all subintervals are obtained from the interpolation conditions together with the continuity of the first and second derivatives of the piecewise function at the interior nodes.

[0123] Step 4: Impose boundary conditions on the piecewise function.

[0124] Step 5: Compute the estimate $S(x)$ at any point $x$ from the coefficients and boundary conditions.

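[0125] The key to these steps is solving for the coefficients $a_i$, $b_i$, $c_i$, $d_i$ and the boundary conditions of the piecewise function, derived as follows.

[0126] Let the spline curve $S(x)$ have $n + 1$ nodes $x_0 < x_1 < \dots < x_n$, which form $n$ intervals. On each interval $[x_i, x_{i+1}]$, the two adjacent points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ are interpolated by the polynomial $S_i(x)$.

[0127] Spline interpolation requires the first and second derivatives of $S(x)$ to be continuous at the interior nodes, as shown in equations (4) and (5).

[0128] $$S_i'(x_{i+1}) = S_{i+1}'(x_{i+1}) \qquad (4)$$

[0129] $$S_i''(x_{i+1}) = S_{i+1}''(x_{i+1}) \qquad (5)$$

[0130] In the formulas, $i = 0, 1, \dots, n-2$.

[0131] Cubic spline interpolation uses the cubic polynomial in equation (6). Its first and second derivatives, $S_i'(x)$ and $S_i''(x)$, are given in equations (7) and (8).

[0132] $$S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 \qquad (6)$$

[0133] $$S_i'(x) = b_i + 2c_i(x - x_i) + 3d_i(x - x_i)^2 \qquad (7)$$

[0134] $$S_i''(x) = 2c_i + 6d_i(x - x_i) \qquad (8)$$

[0135] In the formulas, $a_i$, $b_i$, $c_i$, $d_i$ ($i = 0, 1, \dots, n-1$) are the $4n$ unknown coefficients.

[0136] Let the step be $h_i = x_{i+1} - x_i$. Setting $x = x_i$ in equation (6) and using $S_i(x_i) = y_i$ gives equation (9); setting $x = x_{i+1}$ and using $S_i(x_{i+1}) = y_{i+1}$ gives equation (10).

[0137] $$a_i = y_i \qquad (9)$$

[0138] $$a_i + h_i b_i + h_i^2 c_i + h_i^3 d_i = y_{i+1} \qquad (10)$$

[0139] Because the first and second derivatives are continuous, substituting equations (7) and (8) into equations (4) and (5) gives equations (11) and (12).

[0140] $$b_i + 2h_i c_i + 3h_i^2 d_i = b_{i+1} \qquad (11)$$

[0141] $$2c_i + 6h_i d_i = 2c_{i+1} \qquad (12)$$

[0142] Let $m_i = S_i''(x_i) = 2c_i$. Equation (12) can then be written as $m_i + 6h_i d_i = m_{i+1}$; rearranging gives $d_i$, as shown in equation (13).

[0143] $$d_i = \frac{m_{i+1} - m_i}{6h_i} \qquad (13)$$

[0144] Substituting $c_i = m_i/2$ and $d_i$ into equation (10) gives $b_i$, as shown in equation (14). Substituting $b_i$, $c_i$ and $d_i$ into equation (11) expresses all coefficients in terms of $m_i$, as shown in equation (15).

[0145] $$b_i = \frac{y_{i+1} - y_i}{h_i} - \frac{h_i}{2}m_i - \frac{h_i}{6}(m_{i+1} - m_i) \qquad (14)$$

[0146] $$h_i m_i + 2(h_i + h_{i+1})m_{i+1} + h_{i+1}m_{i+2} = 6\left(\frac{y_{i+2} - y_{i+1}}{h_{i+1}} - \frac{y_{i+1} - y_i}{h_i}\right) \qquad (15)$$

[0147] There are $n + 1$ unknowns $m_0, m_1, \dots, m_n$ to solve for, but equation (15) for $i = 0, 1, \dots, n-2$ yields only $n - 1$ equations. Two further constraints on the derivatives at the end nodes $x_0$ and $x_n$ are therefore required. For the natural spline boundary condition, $S''(x_0) = S''(x_n) = 0$, which gives equation (16).

[0148] $$m_0 = 0, \qquad m_n = 0 \qquad (16)$$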
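The derivation reduces to solving the tridiagonal system (15) for $m_1, \dots, m_{n-1}$ and then reading off the coefficients from (9), (13) and (14). A pure-Python sketch, assuming the natural boundary condition (16) and the Thomas algorithm as the tridiagonal solver; `resample` is a hypothetical helper that evaluates the spline on the 1-second grid mentioned in [0118]. For AIS tracks, longitude and latitude (or projected x and y) would each be interpolated against time separately.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Cubic spline through (xs[i], ys[i]) built from equations (9)-(16), natural ends."""
    n = len(xs) - 1
    if n < 1 or len(ys) != n + 1:
        raise ValueError("need at least two nodes and matching xs/ys")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(hi <= 0 for hi in h):
        raise ValueError("xs must be strictly increasing")
    m = [0.0] * (n + 1)  # second derivatives m_i; equation (16): m_0 = m_n = 0
    size = n - 1         # unknowns m_1 .. m_{n-1}
    if size > 0:
        # Equation (15), i = 0..n-2, as a tridiagonal system in m_1..m_{n-1}.
        sub = [h[i] for i in range(size)]      # multiplies m_i
        diag = [2.0 * (h[i] + h[i + 1]) for i in range(size)]
        sup = [h[i + 1] for i in range(size)]  # multiplies m_{i+2}
        rhs = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
               for i in range(size)]
        for r in range(1, size):  # Thomas algorithm: forward elimination
            w = sub[r] / diag[r - 1]
            diag[r] -= w * sup[r - 1]
            rhs[r] -= w * rhs[r - 1]
        sol = [0.0] * size
        sol[-1] = rhs[-1] / diag[-1]
        for r in range(size - 2, -1, -1):  # back substitution
            sol[r] = (rhs[r] - sup[r] * sol[r + 1]) / diag[r]
        m[1:n] = sol
    # Coefficients from equations (9), (13), (14) and c_i = m_i / 2.
    coeffs: List[Tuple[float, float, float, float]] = []
    for i in range(n):
        a = ys[i]
        b = (ys[i + 1] - ys[i]) / h[i] - h[i] * m[i] / 2.0 - h[i] * (m[i + 1] - m[i]) / 6.0
        c = m[i] / 2.0
        d = (m[i + 1] - m[i]) / (6.0 * h[i])
        coeffs.append((a, b, c, d))

    def s(x: float) -> float:
        # Locate the segment [x_i, x_{i+1}] containing x (end segments extrapolate).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[i]
        a, b, c, d = coeffs[i]
        return a + b * dx + c * dx ** 2 + d * dx ** 3

    return s


def resample(ts: Sequence[float], values: Sequence[float],
             step: float = 1.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: evaluate the spline every `step` seconds from ts[0] to ts[-1]."""
    spline = natural_cubic_spline(ts, values)
    out = []
    k = 0
    while ts[0] + k * step <= ts[-1]:
        t = ts[0] + k * step
        out.append((t, spline(t)))
        k += 1
    return out
```

Solving only for the second derivatives keeps the system tridiagonal, so the cost grows linearly with the number of AIS reports in a track.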

[0149] After obtaining AIS data with a uniform time distribution, the cumulative number of vessels in the waterway is statistically analyzed in the following three steps.

[0150] Step 4-1: Determine the study area. Taking the observation cross-sections as the two end boundaries and extending the area laterally to cover all navigable waters gives the four boundaries of a quadrilateral study area. Extract all AIS data within this area for the required observation period.

[0151] Step 4-2: Determine the sampling time window. Considering the actual traffic intensity of the waterway, the statistical time interval $\Delta t$ in this invention is 1 hour.

[0152] Step 4-3: Cumulative count. The AIS data are grouped into slices of length $\Delta t$, and the number of distinct MMSI numbers in each slice gives the actual number of vessels operating in the waterway.
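Steps 4-1 to 4-3 can be sketched as below, assuming the study-area quadrilateral is given as a list of vertices in the same planar coordinates as the (preferably resampled) position reports. The even-odd ray-casting test and the function names are illustrative choices, not taken from the source.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting; poly lists the vertices, closed implicitly."""
    x, y = p
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y):  # edge spans the horizontal line through p
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def cumulative_vessels(
    reports: List[Tuple[int, float, float, float]],  # (mmsi, t, x, y)
    poly: Sequence[Point],                            # Step 4-1: study-area quadrilateral
    t0: float,
    dt: float = 3600.0,                               # Step 4-2: one-hour window
    n_windows: int = 1,
) -> List[int]:
    seen: List[Set[int]] = [set() for _ in range(n_windows)]
    for mmsi, t, x, y in reports:
        k = int((t - t0) // dt)
        if 0 <= k < n_windows and point_in_polygon((x, y), poly):
            seen[k].add(mmsi)  # Step 4-3: count distinct MMSI numbers per slice
    return [len(s) for s in seen]
```

Counting distinct MMSI numbers rather than reports prevents fast-reporting vessels from being counted many times within one window.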

[0153] Based on the above embodiments, as a preferred implementation, step S4 specifically includes:

[0154] Taking each value of the cumulative vessel-count sequence of the water area as the abscissa and the corresponding value of the water-area boundary outflow sequence as the ordinate forms a scatter distribution in a two-dimensional coordinate system, giving the WMFD scatter data. The scatter distribution covers the macroscopic evolution of vessel traffic in the target water area from the free-flow state, through the saturated-flow state, to the congested-flow state.

[0155] Based on the above embodiments, as a preferred implementation, step S5 specifically includes:

[0156] Based on the WMFD scatter data, initialize the parameters of a mixture model containing three Gaussian components; perform iterative calculations based on the expectation-maximization algorithm. In each iteration, calculate the posterior probability of each scatter point belonging to each Gaussian component based on the current mixture model parameters, and update the mixture model parameters of each Gaussian component based on the posterior probability; stop iterating when the change in the mixture model parameters is less than a preset threshold or the maximum number of iterations is reached, and divide each scatter point into the data cluster corresponding to the Gaussian component with the highest probability based on the final posterior probability, thereby completing the clustering of the WMFD scatter data. The three data clusters obtained correspond to the free flow state, saturated flow state, and congested flow state, respectively.

[0157] Regarding the shape of the MFD, a cubic curve can be used to simulate and fit the MFD more accurately, as shown in Figure 6(a). Its mathematical expression is given in equation (17).

[0158] $$O = a_3 N^3 + a_2 N^2 + a_1 N + a_0 \qquad (17)$$

[0159] In the formula, $O$ is the water-area boundary outflow, $N$ (veh) is the cumulative number of vessels in the water area, and $a_0, \dots, a_3$ are the fitting parameters of the curve. The MFD exhibits distinct rising, plateau and falling phases with a trapezoidal shape, which suggests that maximum flow is reached over a range of network accumulation rather than at a single value. The macroscopic basic map then corresponds to three stages of traffic flow: low-density free flow, maximum flow, and high-density congestion, as shown in Figure 6(b).

[0160] Its mathematical expression is given in equation (18):

[0161] $$O = \begin{cases} k_1 N + b_1, & 0 \le N < N_1 \\ b_2 = O_{\max}, & N_1 \le N \le N_2 \\ k_2 N + b_3, & N_2 < N \le N_{\mathrm{jam}} \end{cases} \qquad (18)$$

[0162] In the formula, $O$ is the water-area boundary outflow; $O_{\max}$ is the maximum boundary outflow of the system; $N$ is the cumulative number of vessels in the study waters; $N_1$ is the cumulative number of vessels at the first inflection point, between the rising and stable phases of the MFD; $N_2$ is the cumulative number of vessels at the second inflection point, between the stable and falling phases; $N_{\mathrm{jam}}$ is the cumulative number of vessels under severe congestion; $k_1$ and $k_2$ are the slopes of the rising and falling segments; and $b_1$, $b_2$ and $b_3$ are the constant terms of the fitting functions of the rising, stable and falling segments.

[0163] Based on the above embodiments, as a preferred implementation, step S6 specifically includes:

[0164] For the data points within the three data clusters corresponding to the free-flow state, saturated flow state, and congested flow state, respectively, a least squares method is used to perform linear fitting. The fitted line corresponding to the free-flow state data cluster is defined as the rising segment of the WMFD fitting model. The fitting result corresponding to the saturated flow state data cluster is processed into a horizontal line segment representing the saturated flow rate and defined as the stationary segment of the WMFD fitting model. The fitted line corresponding to the congested flow state data cluster is defined as the falling segment of the WMFD fitting model. Based on the rising segment, the stationary segment, and the falling segment, a trapezoidal WMFD fitting model is constructed.
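Once the three clusters have been fitted, the trapezoid and its characteristic parameters follow from line intersections. With the rising line $O = k_1 N + b_1$, the stationary level $O_{\max}$ and the falling line $O = k_2 N + b_3$, the inflection points are $N_1 = (O_{\max} - b_1)/k_1$ and $N_2 = (O_{\max} - b_3)/k_2$. A minimal sketch, assuming the stationary level is the mean outflow of the saturated cluster and that $N_{\mathrm{jam}}$ is where the falling line reaches zero outflow; both choices, and the class and function names, are assumptions:

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence, Tuple


@dataclass
class TrapezoidWMFD:
    k1: float     # slope of the rising segment
    b1: float     # intercept of the rising segment
    o_max: float  # saturated outflow of the stationary segment (b2)
    k2: float     # slope of the falling segment (negative)
    b3: float     # intercept of the falling segment

    @property
    def n1(self) -> float:  # rising -> stationary inflection point
        return (self.o_max - self.b1) / self.k1

    @property
    def n2(self) -> float:  # stationary -> falling inflection point
        return (self.o_max - self.b3) / self.k2

    @property
    def n_jam(self) -> float:  # assumption: falling line reaches zero outflow
        return -self.b3 / self.k2

    def outflow(self, n: float) -> float:
        if n < self.n1:
            return self.k1 * n + self.b1
        if n <= self.n2:
            return self.o_max
        return max(self.k2 * n + self.b3, 0.0)  # clamp beyond n_jam


def build_trapezoid(rising: Tuple[float, float], saturated_outflows: Sequence[float],
                    falling: Tuple[float, float]) -> TrapezoidWMFD:
    model = TrapezoidWMFD(rising[0], rising[1], fmean(saturated_outflows),
                          falling[0], falling[1])
    if not (model.k1 > 0 > model.k2 and model.n1 < model.n2):
        raise ValueError("fits do not form a trapezoid (check slopes and inflection order)")
    return model
```

The validity check guards against clusters that overlap so much that the fitted lines meet the saturation level in the wrong order.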

[0165] Because a trapezoidal shape depicts changes in waterway traffic flow more clearly and yields accurate values for the inflection points, which facilitates subsequent use in boundary flow control, this invention adopts a trapezoidal distribution as the fitting form of the water-area macroscopic basic map. The piecewise fitting process involves the following steps:

[0166] By plotting the horizontal and vertical coordinate data, namely the outflow from the water boundary and the cumulative number of ships, on a two-dimensional plane, a WMFD scatter plot can be obtained.

[0167] Gaussian mixture clustering was used to determine the intervals corresponding to the rising, stable, and falling segments of WMFD.

[0168] The WMFD analytical expression is solved using least squares fitting.

[0169] To obtain the inflection points from the rising segment to the stationary segment and from the stationary segment to the falling segment in WMFD, the data points extracted by the algorithm need to be segmented. Clustering is one of the important research methods in data mining. It is an unsupervised learning method that can group data without prior knowledge of the data.

[0170] Compared with the traditional K-means algorithm, the Gaussian Mixture Model (GMM) handles data with unequal cluster variances and high noise levels better. As a probabilistic method, GMM is also flexible and readily interpretable. Because the limited quality of AIS data introduces considerable noise into the scatter plots of the macroscopic basic map, GMM is better suited to clustering them.

[0171] Gaussian Mixture Models (GMMs) are statistical models that describe the distribution of data based on a weighted sum of probability density functions. In a GMM, each component is a Gaussian distribution, and by adjusting the parameters of each Gaussian distribution, complex data distributions can be simulated effectively. GMM clustering refers to an algorithm that uses Gaussian Mixture Models for cluster analysis. Specifically, it describes the probability density of observed data based on a set of Gaussian distributions, obtains a joint probability density function by summing the probability density functions, and then achieves clustering by classifying the data.

[0172] In GMM clustering, each Gaussian distribution corresponds to one group, and each data point is assigned to the group whose Gaussian distribution best explains it, which determines its category. Different Gaussian distributions represent different clusters, and their weights control the relative size of each cluster. The mixture density is given in equation (19).

[0173] $$p(x) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(x \mid \mu_k, \Sigma_k) \qquad (19)$$

[0174] In the formula, $x$ is the sample data; $K$ is the number of Gaussian models; $\pi_k$ is the weight of the $k$-th Gaussian model, with $\sum_k \pi_k = 1$; $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the probability density function of the $k$-th Gaussian model; and $\mu_k$ and $\Sigma_k$ are its mean and covariance.

[0175] Gaussian mixture clustering mainly consists of two parts: initialization and iterative update.

[0176] First, initialization determines the number of Gaussian distributions and the mean, covariance matrix and weight of each. Boundary flow control needs the cumulative vessel counts at the two inflection points that separate the rising, stable and falling phases of the WMFD; the number of Gaussian models is therefore set to $K = 3$.

[0177] Then, during the iterative update process, the Expectation-Maximization algorithm (EM algorithm) is used for parameter estimation.

[0178] The EM algorithm seeks the parameters that maximize the likelihood, i.e. the product of the probabilities $p(x_n)$ of all samples under the mixture model (where $N$ is the number of samples). Taking the logarithm turns this product into a sum, as shown in equation (20):

[0179] $$L(\theta) = \sum_{n=1}^{N} \ln\left(\sum_{k=1}^{K} \pi_k\,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right) \qquad (20)$$

[0180] The parameter estimate of the Gaussian mixture is obtained where $L(\theta)$ reaches its maximum. Each iteration has two steps. The E step computes the probability that each data point belongs to each Gaussian distribution, as shown in equation (21). The M step recomputes the mean, covariance matrix and weight of each Gaussian distribution from these probabilities, as shown in equation (22). The E and M steps are repeated until convergence.

[0181] $$\gamma_{nk} = \frac{\pi_k\,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \qquad (21)$$

[0182] Here $x_n$ is the $n$-th data point, $k$ indexes the models, $K$ is the total number of models (i.e. the number of clusters), and $\gamma_{nk}$ is the posterior probability that the $n$-th data point belongs to the $k$-th model, $k = 1, \dots, K$.

[0183] Using $\gamma_{nk}$ from the E step, compute the new Gaussian model parameters: mean, covariance and weight.

[0184] $$N_k = \sum_{n=1}^{N} \gamma_{nk}, \quad \mu_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk} x_n, \quad \Sigma_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk}(x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}, \quad \pi_k = \frac{N_k}{N} \qquad (22)$$
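The E and M steps can be sketched in pure Python for two-dimensional scatter points $(N, O)$. This is an illustration under stated assumptions, not the invention's code: components are initialized by splitting the points, sorted by $N$, into $K$ equal chunks; a small `reg` is added to the covariance diagonals to keep them invertible; and iteration stops when the component means move less than `tol`, one concrete form of the parameter-change criterion in [0156].

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def log_gaussian(p: Point, mu: Point, cov: Cov) -> float:
    """Log density of a 2-D Gaussian using the closed-form 2x2 inverse."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * q - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def gmm_em(points: Sequence[Point], k: int = 3, max_iter: int = 200,
           tol: float = 1e-6, reg: float = 1e-6):
    n = len(points)
    # Initialization (assumption): split the points, sorted by N, into k equal chunks.
    order = sorted(range(n), key=lambda i: points[i][0])
    resp = [[0.0] * k for _ in range(n)]
    for j in range(k):
        for i in order[j * n // k:(j + 1) * n // k]:
            resp[i][j] = 1.0
    means: List[Point] = []
    weights: List[float] = []
    covs: List[Cov] = []
    for _ in range(max_iter):
        # M step, equation (22): weights, means and covariances from responsibilities.
        weights, new_means, covs = [], [], []
        for j in range(k):
            nk = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nk
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nk
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nk + reg
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nk + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my) for r, p in zip(resp, points)) / nk
            weights.append(nk / n)
            new_means.append((mx, my))
            covs.append(((sxx, sxy), (sxy, syy)))
        shift = max((abs(a - b) for m0, m1 in zip(means, new_means)
                     for a, b in zip(m0, m1)), default=math.inf)
        means = new_means
        # E step, equation (21): posterior of each point, via log-sum-exp for stability.
        for i, p in enumerate(points):
            logs = [math.log(weights[j]) + log_gaussian(p, means[j], covs[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
        # Stop once the component means have stopped moving.
        if shift < tol:
            break
    # Assign each point to the component with the highest posterior probability.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, weights, means, covs
```

Because of the sorted initialization, component 0 tends to start at low accumulation, but the mapping of components to free, saturated and congested flow (for example by ordering the means by $N$) should be checked after convergence.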

[0185] Curve fitting is a model-solving problem for a given dataset, where a mathematical equation or graph is determined to best approximate or fit the original data. In curve fitting, functions are typically used to approximate the relationships between the data; these functions include linear, polynomial, exponential, logarithmic, and trigonometric functions. By finding the best function form and parameter values, a function model that can well interpret the original data can be obtained. Therefore, after studying the piecewise data of WMFD, the WMFD fitting function obtained through fitting is used to solve for key parameters analytically for boundary flow control.

[0186] Least squares is a commonly used fitting method. It solves the problem of minimizing the sum of squared errors between the actual data points and the fitting function. The model obtained can be judged by evaluating the error between the fitted curve and the actual data.

[0187] Suppose the observed data are $(x_i, y_i)$, $i = 1, \dots, n$, and the fitting function between $x$ and $y$ is $y = f(x; \theta)$. The sum of squared errors is $E(\theta) = \sum_{i=1}^{n} \left(y_i - f(x_i; \theta)\right)^2$. $E$ is minimized where its partial derivatives with respect to the parameters vanish, $\partial E / \partial \theta = 0$; solving these equations gives the parameters, and the resulting equation is the best match to the observed data.

[0188] This invention takes the fitting function to be linear, $y = kx + b$, where $k$ and $b$ are its parameters. The sum of squared differences between the observed values $y_i$ and the fitted values $\hat{y}_i = kx_i + b$ is $E(k, b) = \sum_{i=1}^{n}(y_i - kx_i - b)^2$. Taking the partial derivatives of $E$ with respect to $k$ and $b$ according to the above principle gives equation (23). Substituting the experimental data and solving for $k$ and $b$ yields the linear fitting equation of the data.

[0189] $$\frac{\partial E}{\partial k} = -2\sum_{i=1}^{n} x_i(y_i - kx_i - b) = 0, \qquad \frac{\partial E}{\partial b} = -2\sum_{i=1}^{n} (y_i - kx_i - b) = 0 \qquad (23)$$
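Solving the two normal equations of the linear least-squares problem gives the closed form $k = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$ and $b = \frac{\sum y_i - k\sum x_i}{n}$. A minimal sketch, with a hypothetical function name, of the per-cluster line fit:

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope k and intercept b of y = kx + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    k = (n * sxy - sx * sy) / denom
    b = (sy - k * sx) / n
    return k, b
```

Applied to the free-flow and congested clusters this yields $(k_1, b_1)$ and $(k_2, b_3)$; the zero-denominator check matters for a cluster whose points all share one cumulative vessel count.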

[0190] Secondly, embodiments of the present invention provide a system for obtaining a macroscopic basic map of busy waterways based on AIS data, and based on the methods for obtaining macroscopic basic maps of busy waterways based on AIS data in the above embodiments, the system includes:

[0191] The data acquisition module is used to acquire dynamic AIS (Automatic Identification System) data of vessels in the target waters within a preset historical time period;

[0192] The boundary outflow statistics module is used to count the number of ships leaving the target water area within multiple consecutive preset time windows based on the dynamic AIS data, forming a water area boundary outflow sequence.

[0193] The cumulative quantity statistics module is used to count the number of vessels in the target water area within each preset time window based on the dynamic AIS data, and form a cumulative quantity sequence of vessels in the water area.

[0194] The scatter data generation module is used to generate WMFD scatter data of the water area macroscopic basic map based on the cumulative ship volume sequence of the water area and the corresponding outflow volume sequence of the water area boundary.

[0195] The clustering analysis module is used to perform clustering analysis on the WMFD scatter data, dividing the WMFD scatter data into three data clusters representing free flow state, saturated flow state and congested flow state respectively.

[0196] The fitting modeling module is used to perform piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model that includes rising segments, stationary segments, and falling segments.

[0197] The parameter determination module is used to determine characteristic parameters for characterizing the macroscopic traffic flow state of the target water area based on the WMFD fitting model. The characteristic parameters include the first inflection point between the rising and steady segments, the second inflection point between the steady and falling segments, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.

[0198] Based on the same concept, this invention also provides a physical structure, shown schematically in Figure 8. The server may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840; the processor 810, communications interface 820 and memory 830 communicate with one another via the communication bus 840. The processor 810 can call logical instructions stored in the memory 830 to execute the steps of the AIS data-based method for acquiring the macroscopic basic map of busy waters described in the above embodiments.

[0199] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0200] Based on the same concept, embodiments of the present invention also provide a non-transitory computer-readable storage medium storing a computer program. The computer program contains at least one code segment that, when executed by a master control device, causes the master control device to implement the steps of the AIS data-based method for acquiring a macroscopic basic map of busy waterways described in the above embodiments.

[0201] Based on the same technical concept, this application also provides a computer program, which, when executed by a main control device, is used to implement the above-described method embodiments.

[0202] The program may be stored, in whole or in part, on a storage medium packaged with the processor, or in part or in whole on a memory not packaged with the processor.

[0203] Based on the same technical concept, this application also provides a processor for implementing the above-described method embodiments. The processor can be a chip.

[0204] The various embodiments of the present invention can be combined arbitrarily to achieve different technical effects.

[0205] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for obtaining a macroscopic basic map of busy waterways based on AIS data, characterized in that the method includes:
S1. Acquiring dynamic AIS (Automatic Identification System) data of vessels in the target water area within a preset historical time period;
S2. Based on the dynamic AIS data, counting the number of vessels leaving the target water area within each of multiple consecutive preset time windows to form a water area boundary outflow sequence;
S3. Based on the dynamic AIS data, counting the number of vessels in the target water area within each preset time window to form a water area cumulative vessel count sequence;
S4. Generating WMFD scatter data of the water area from the water area cumulative vessel count sequence and the corresponding water area boundary outflow sequence;
S5. Performing cluster analysis on the WMFD scatter data to divide it into three data clusters representing the free flow, saturated flow, and congested flow states, respectively. Specifically, this includes: initializing, based on the WMFD scatter data, the parameters of a mixture model containing three Gaussian components; performing iterative calculations based on the expectation-maximization algorithm, where in each iteration the posterior probability of each scatter point belonging to each Gaussian component is calculated from the current mixture model parameters, and the mixture model parameters of each Gaussian component are updated from these posterior probabilities; stopping the iteration when the change in the mixture model parameters is less than a preset threshold or the maximum number of iterations is reached; and, based on the final posterior probabilities, assigning each scatter point to the data cluster of the Gaussian component with the highest probability, thereby completing the clustering of the WMFD scatter data. The three resulting data clusters correspond to the free flow, saturated flow, and congested flow states, respectively;
S6. Performing piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model containing a rising segment, a steady segment, and a falling segment. Specifically, this includes: fitting a line by the least squares method to the data points within each of the three data clusters corresponding to the free flow, saturated flow, and congested flow states; defining the fitted line of the free flow cluster as the rising segment of the WMFD fitting model; processing the fitting result of the saturated flow cluster into a horizontal line segment representing the saturation flow and defining it as the steady segment of the WMFD fitting model; defining the fitted line of the congested flow cluster as the falling segment of the WMFD fitting model; and constructing a trapezoidal WMFD fitting model from the rising segment, the steady segment, and the falling segment;
S7. Based on the WMFD fitting model, determining the characteristic parameters used to characterize the macroscopic traffic flow state of the target water area. The characteristic parameters include the first inflection point between the rising segment and the steady segment, the second inflection point between the steady segment and the falling segment, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.
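Step S5 describes a standard expectation-maximization (EM) fit of a three-component Gaussian mixture. The sketch below implements that loop in plain Python for 2-D (N, Q) points with full 2x2 covariances. Several details are assumptions, because the claim does not fix them:

- The means are initialized at the 1/6, 1/2, and 5/6 quantiles of the points sorted by N, with a shared diagonal covariance taken from the whole data set and equal weights.
- Convergence is measured as the largest change in any weight or mean.
- A small `reg` term is added to the variances.
- Components are mapped to states by ascending mean N.

`gmm_cluster` and `_gauss2` are hypothetical names.

```python
import math

def _gauss2(p, mean, cov):
    """Density of a 2-D Gaussian with symmetric covariance [[a, b], [b, d]] at point p."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form via the closed-form inverse of a symmetric 2x2 matrix.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def gmm_cluster(points, k=3, max_iter=200, tol=1e-6, reg=1e-6):
    """EM for a k-component 2-D Gaussian mixture; returns one label per point.

    Labels are ordered by component mean N: 0 = free, 1 = saturated, 2 = congested
    (an assumed mapping, not specified by the claim).
    """
    n = len(points)
    srt = sorted(points)
    # Assumed initialization: means at evenly spaced quantiles of N, shared diagonal covariance.
    means = [list(srt[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    vy = sum((p[1] - my) ** 2 for p in points) / n + reg
    covs = [[[vx, 0.0], [0.0, vy]] for _ in range(k)]
    weights = [1.0 / k] * k

    def posteriors():
        out = []
        for p in points:
            dens = [weights[j] * _gauss2(p, means[j], covs[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against complete underflow
            out.append([d / total for d in dens])
        return out

    for _ in range(max_iter):
        resp = posteriors()                      # E-step: posterior per point and component
        new_w, new_m, new_c = [], [], []
        for j in range(k):                       # M-step: re-estimate weight, mean, covariance
            nj = sum(r[j] for r in resp) + 1e-12
            ux = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            uy = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - ux) ** 2 for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - uy) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - ux) * (p[1] - uy) for r, p in zip(resp, points)) / nj
            new_w.append(nj / n)
            new_m.append([ux, uy])
            new_c.append([[sxx, sxy], [sxy, syy]])
        change = max(
            max(abs(a - b) for a, b in zip(new_w, weights)),
            max(abs(new_m[j][i] - means[j][i]) for j in range(k) for i in range(2)),
        )
        weights, means, covs = new_w, new_m, new_c
        if change < tol:                         # parameter change below threshold: stop
            break

    # Hard assignment: each point goes to its highest-posterior component.
    labels = [r.index(max(r)) for r in posteriors()]
    order = sorted(range(k), key=lambda j: means[j][0])
    rank = {j: pos for pos, j in enumerate(order)}
    return [rank[lab] for lab in labels]
```

Ordering components by mean N reflects the usual shape of a fundamental diagram, in which free flow occupies the low-accumulation end. Real data with overlapping states may need a different mapping rule.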

2. The method for obtaining a macroscopic basic map of busy waterways based on AIS data according to claim 1, characterized in that, in step S1, the dynamic AIS data includes the ship's position, speed, heading, and acquisition time. After the dynamic AIS data is acquired, the method further includes preprocessing it: a reasonable range of values for speed and heading is preset based on the actual operating conditions of vessels in the target water area, and abnormal data outside this range is eliminated; if the amount of abnormal data in the dynamic AIS data of the same Maritime Mobile Service Identity (MMSI) number exceeds a preset threshold, all dynamic AIS data corresponding to that MMSI number is deleted; and for duplicate dynamic AIS data, only the first record collected is retained and the remaining duplicates are deleted.
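The three preprocessing rules of claim 2 can be sketched as one pass over the records, assuming the input list is in receive order. The default speed and heading ranges are placeholders, not values from the patent. A "duplicate" is assumed to mean two records with the same MMSI and timestamp. `AISRecord` and `preprocess` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple

class AISRecord(NamedTuple):
    mmsi: int
    t: float        # acquisition time (e.g. Unix seconds)
    lon: float
    lat: float
    speed: float    # knots
    heading: float  # degrees

def preprocess(records, speed_range=(0.0, 30.0), heading_range=(0.0, 360.0), max_abnormal=5):
    """Range filtering, per-MMSI purge, and de-duplication (records in receive order)."""
    def in_range(r):
        return (speed_range[0] <= r.speed <= speed_range[1]
                and heading_range[0] <= r.heading < heading_range[1])

    # Count out-of-range records per MMSI.
    abnormal = Counter(r.mmsi for r in records if not in_range(r))
    seen, cleaned = set(), []
    for r in records:
        if not in_range(r) or abnormal[r.mmsi] > max_abnormal:
            continue  # drop the abnormal record, or every record of a purged MMSI
        key = (r.mmsi, r.t)  # assumed duplicate key: same vessel, same timestamp
        if key in seen:
            continue  # keep only the first record collected
        seen.add(key)
        cleaned.append(r)
    return cleaned
```

Counting abnormal records before filtering means that a vessel is purged based on all of its reports, not only those seen before the threshold was reached.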

3. The method for obtaining a macroscopic basic map of busy waterways based on AIS data according to claim 1, characterized in that S2 specifically includes: setting a virtual detection section perpendicular to the waterway at a preset boundary of the target water area; for each vessel, generating trajectory line segments from the vessel's consecutive dynamic AIS position data; determining whether a trajectory line segment and the virtual detection section line segment satisfy the spatial intersection condition and, if so, determining whether the vessel has left the target water area from the direction of movement represented by the trajectory line segment; and accumulating the number of vessels determined to have departed within each preset time window to form the water area boundary outflow sequence. The spatial intersection condition is checked as follows: first determine whether the bounding rectangles of the trajectory line segment and the virtual detection section line segment intersect; if they do, further determine whether the two line segments cross each other, where the crossing condition is that the endpoints of each line segment lie on opposite sides of the line containing the other segment.
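The test in claim 3 is the classic bounding-box rejection followed by a straddle test using 2-D cross products. The sketch below assumes the following:

- Positions are already in a planar (projected) coordinate system.
- The detection section runs from `sec_a` to `sec_b`, oriented so that the target water area lies to its left.
- A crossing is attributed to the window containing the later trajectory point.
- Touching endpoints are not counted, because the claim requires endpoints on opposite sides.

All function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive if b lies left of the ray o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def boxes_overlap(p1, p2, q1, q2):
    """Axis-aligned bounding rectangles of segments p1p2 and q1q2 intersect."""
    return (min(p1[0], p2[0]) <= max(q1[0], q2[0]) and min(q1[0], q2[0]) <= max(p1[0], p2[0])
            and min(p1[1], p2[1]) <= max(q1[1], q2[1]) and min(q1[1], q2[1]) <= max(p1[1], p2[1]))

def segments_cross(p1, p2, q1, q2):
    if not boxes_overlap(p1, p2, q1, q2):
        return False  # fast rejection
    # Each segment's endpoints must lie strictly on opposite sides of the other's line.
    return (cross(q1, q2, p1) * cross(q1, q2, p2) < 0
            and cross(p1, p2, q1) * cross(p1, p2, q2) < 0)

def boundary_outflow(tracks, sec_a, sec_b, t0, window, n_windows):
    """tracks: {vessel_id: [(t, x, y), ...]}; returns the number of exits per time window."""
    counts = [0] * n_windows
    for pts in tracks.values():
        pts = sorted(pts)  # order each track by time
        for (ta, xa, ya), (tb, xb, yb) in zip(pts, pts[1:]):
            p1, p2 = (xa, ya), (xb, yb)
            if not segments_cross(p1, p2, sec_a, sec_b):
                continue
            # Exit = moving from the left (inside) to the right (outside) of sec_a -> sec_b.
            if cross(sec_a, sec_b, p1) > 0 > cross(sec_a, sec_b, p2):
                w = int((tb - t0) // window)
                if 0 <= w < n_windows:
                    counts[w] += 1
    return counts
```

The bounding-box check is cheap and discards most trajectory segments far from the section before the cross-product test runs.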

4. The method for obtaining a macroscopic basic map of busy waterways based on AIS data according to claim 1, characterized in that S3 specifically includes: determining the geographical boundary of the target water area; for each preset time window, interpolating the sequence of position points belonging to the same vessel in the dynamic AIS data to obtain continuous trajectory data of that vessel at uniform time intervals within the time window; and, based on the continuous trajectory data, counting all vessels located within the geographical boundary at any sampling time within the preset time window, the resulting count being taken as the cumulative number of vessels in the water area for that time window.
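Claim 4 combines time interpolation with a point-in-polygon test. The following is a minimal sketch under these assumptions:

- Coordinates are planar.
- Interpolation is linear and never extrapolates beyond a track's observed span.
- The boundary is a simple polygon tested by even-odd ray casting.
- "At any sampling time" is read as counting distinct vessels that are inside at one or more sampling instants of the window.

The function names are hypothetical.

```python
def interpolate(track, t):
    """Linearly interpolate (x, y) at time t from a time-sorted [(t, x, y), ...] track."""
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return (x0, y0)
            f = (t - t0) / (t1 - t0)
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return None  # no extrapolation outside the observed span (assumption)

def point_in_polygon(pt, poly):
    """Even-odd ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def accumulated_count(tracks, boundary, win_start, win_end, dt):
    """Distinct vessels inside `boundary` at one or more sampling times in [win_start, win_end)."""
    n_steps = int((win_end - win_start) // dt)
    present = set()
    for vessel_id, pts in tracks.items():
        track = sorted(pts)
        for k in range(n_steps):
            pos = interpolate(track, win_start + k * dt)
            if pos is not None and point_in_polygon(pos, boundary):
                present.add(vessel_id)
                break  # one hit is enough for this vessel in this window
    return len(present)
```

Calling `accumulated_count` once per preset time window yields the water area cumulative vessel count sequence, aligned index by index with the boundary outflow sequence.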

5. The method for obtaining a macroscopic basic map of busy waterways based on AIS data according to claim 1, characterized in that S4 specifically includes: taking each cumulative vessel count in the water area cumulative vessel count sequence as the abscissa and the corresponding outflow value in the water area boundary outflow sequence as the ordinate, and forming a scatter distribution in a two-dimensional coordinate system to obtain the WMFD scatter data. The scatter distribution covers the macroscopic traffic flow changes of vessels in the target water area from free flow through saturated flow to congested flow.

6. A system for acquiring a macroscopic basic map of busy waterways based on AIS data, used to implement the method for acquiring a macroscopic basic map of busy waterways based on AIS data as described in any one of claims 1 to 5, characterized in that the system includes:
a data acquisition module, used to acquire dynamic AIS (Automatic Identification System) data of vessels in the target water area within a preset historical time period;
a boundary outflow statistics module, used to count, based on the dynamic AIS data, the number of vessels leaving the target water area within multiple consecutive preset time windows to form a water area boundary outflow sequence;
a cumulative quantity statistics module, used to count, based on the dynamic AIS data, the number of vessels in the target water area within each preset time window to form a water area cumulative vessel count sequence;
a scatter data generation module, used to generate WMFD scatter data of the water area macroscopic basic map from the water area cumulative vessel count sequence and the corresponding water area boundary outflow sequence;
a clustering analysis module, used to perform cluster analysis on the WMFD scatter data and divide it into three data clusters representing the free flow, saturated flow, and congested flow states, respectively;
a fitting modeling module, used to perform piecewise linear fitting on the data points within each data cluster to obtain a WMFD fitting model containing a rising segment, a steady segment, and a falling segment; and
a parameter determination module, used to determine, based on the WMFD fitting model, characteristic parameters characterizing the macroscopic traffic flow state of the target water area. The characteristic parameters include the first inflection point between the rising and steady segments, the second inflection point between the steady and falling segments, the saturation flow value of the steady segment, and the slopes of the rising and falling segments.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the method for obtaining a macroscopic basic map of busy waterways based on AIS data as described in any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the method for obtaining a macroscopic basic map of busy waterways based on AIS data as described in any one of claims 1 to 5.