Railway train visual-inertial positioning method and system based on milepost information assistance

By combining visual inertial odometer with kilometer marker information, using YOLOv5s and OCR to identify kilometer marker digital information, and combining LSD straight line detection and global optimization, the problems of GPS failure and lack of loop closure detection in railway train positioning systems in tunnels were solved, achieving high-precision positioning results.

CN117782070B · Active · Publication Date: 2026-06-26 · HARBIN INST OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HARBIN INST OF TECH
Filing Date: 2023-12-14
Publication Date: 2026-06-26

Abstract

The application relates to a railway train visual inertial positioning method and system based on milepost information assistance, and relates to the technical field of railway train positioning. The application solves the problems that the "vehicle-ground cooperation" type train positioning technology in the prior art has high cost and is difficult to maintain, and that the existing railway train intelligent positioning cannot eliminate the accumulated error in the case of lacking loop detection. The positioning method comprises the following steps: acquiring the RGB image of the front of a railway train, the digital semantic information of a milepost and the multi-source information of an IMU by using a visual inertial odometer; constructing a milepost detection module based on a VINS Fusion framework to acquire the pixel position of the milepost; identifying the digital semantic information of the milepost in the pixel position area of the milepost based on an OCR character recognizer to obtain the position information of the milepost on an electronic map; extracting the vertex of the milepost and the vertex coordinates; establishing a global optimization objective function containing the position information of the milepost according to the vertex coordinates of the milepost to complete positioning. The application is also suitable for the field of rail transit.

Description

Technical Field

[0001] This invention relates to the field of railway train positioning technology.

Technical Background

[0002] Currently, train positioning technologies in the rail transit field can be divided into two categories: trackside positioning and on-board positioning. On-board positioning technology uses on-board sensors to provide relative position information; trackside positioning technology uses trackside equipment in a "vehicle-ground cooperation" manner to provide absolute position information for the train. By combining the two technologies, precise positioning of the train can be achieved.

[0003] Building on traditional railway train positioning technology, existing approaches fuse inertial navigation system and GPS data for positioning. However, when a train travels through tunnels for extended periods, GPS failure degrades positioning accuracy. With the rapid development of visual inertial odometry in fields such as autonomous driving and robotics, existing systems are being adapted directly to the rail transit sector. Experimental results show that monocular visual inertial odometry cannot keep positioning stable because of observation biases in the local inertial sensors, that stereo visual odometry significantly mitigates this problem and achieves better positioning results, and that stereo visual inertial odometry incorporating an IMU further improves positioning accuracy. In actual long-term train operation, however, loop closure detection often fails, so positioning accuracy gradually decreases. Researchers are therefore applying deep learning methods to train positioning to replace the loop closure detection module and improve accuracy. Even so, there is currently no complete, robust, and high-precision vision-based train positioning system.

Summary of the Invention

[0004] This invention addresses two problems: the high cost and difficult maintenance of existing "vehicle-ground cooperation" train positioning technology in practical applications, and the inability of current railway train intelligent positioning systems to eliminate accumulated errors when loop closure detection is unavailable.

[0005] The present invention proposes the following technical solution:

[0006] Option 1: A railway train visual inertial positioning method assisted by kilometer marker information, the positioning method comprising: using a visual inertial odometer to acquire RGB images of the scene ahead of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU;

[0007] A kilometer marker detection module is constructed based on the VINS Fusion framework, and the kilometer marker pixel position is obtained based on the kilometer marker detection module.

[0008] Using an OCR text recognizer, the digital semantic information of the kilometer marker is recognized within the pixel location area of the kilometer marker, and the location information of the kilometer marker on the electronic map is obtained.

[0009] Based on the LSD line detection algorithm and the location information of the electronic map, kilometer marker vertices and vertex coordinates are extracted.

[0010] Based on the coordinates of the kilometer marker vertices, a global optimization objective function containing kilometer marker position information constraints is established, and the positioning accuracy of railway trains is improved through factor graph optimization.

[0011] Furthermore, a preferred embodiment is provided, in which a kilometer marker detection module is constructed based on the VINS Fusion framework, and the method for obtaining the kilometer marker pixel position based on the kilometer marker detection module is as follows:

[0012] The kilometer marker detection module obtains the kilometer marker pixel position based on the YOLOv5s network model, which includes an input end, a backbone network, a neck network, and an output end.

[0013] The YOLOv5s network model takes RGB images as input and preprocesses the RGB images.

[0014] The backbone network is used to detect RGB images at different scales;

[0015] The neck network is used to fuse RGB images of different scales to achieve detection of RGB images of different scales;

[0016] The output end is used to output RGB images of different scales to detect objects of different sizes. Each RGB image contains 3 prediction boxes, and each prediction box contains object confidence and location information. The weighted NMS method is used to remove duplicate location information, that is, to find the object detection location and complete the target detection.

[0017] Furthermore, a preferred embodiment is provided, wherein the method for recognizing the semantic information of the kilometer marker numbers in the pixel location region based on an OCR text recognizer is as follows:

[0018] It is implemented based on the CRNN character recognition model, which includes a CNN convolutional neural network layer, an RNN recurrent neural network layer, and a CTC transcription layer;

[0019] The CNN convolutional neural network layer extracts features from the pixel location region of the kilometer marker to obtain the image feature vector of the kilometer marker;

[0020] The image feature vector of the kilometer marker is passed to the RNN recurrent neural network layer, which predicts the label distribution of the feature vector;

[0021] The CTC transcription layer transforms the image feature vector distribution of the predicted kilometer markers from the RNN recurrent neural network layer into sequence labels, outputting digital information to complete the recognition.

[0022] Furthermore, a preferred implementation method is provided, which extracts kilometer marker vertices based on the LSD line detection algorithm combined with the electronic map location information. Specifically, line types are distinguished by the slope of the line segments lying on the same edge. The LSD algorithm detects candidate line segments, and any candidate segment whose characteristics do not match the required line type is removed.

[0023] Initial edge extraction is performed to determine the number of edge segments to be detected. Four edge lines must be detected to determine the kilometer marker vertices. Four groups of highly consistent edge segments are selected according to the evaluation function, and the total least squares method is used for line fitting.

[0024] After the initial edge extraction, the edge segment extraction results are repeatedly judged based on the feature similarity of each edge line. Similar edge segments are merged and new edge segments are fitted. The new edge segment set is continuously supplemented. The above process is repeated until four dissimilar kilometer marker edge segments are determined.

[0025] The method for determining the vertex of the kilometer marker is as follows: the four dissimilar kilometer marker edge segments form a rectangle, and the intersection of the rectangular kilometer marker edge segments is the vertex of the kilometer marker.
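The edge fitting and vertex determination described above can be illustrated with a minimal sketch. It assumes the edge pixels have already been grouped into four sets (top, right, bottom, left); the function names `fit_line_tls`, `intersect`, and `marker_vertices` are hypothetical and do not come from the patent, and the evaluation function and segment merging steps are not modeled.

```python
import math


def fit_line_tls(points):
    """Total least squares line fit.

    Returns (a, b, c) with a*x + b*y = c and a^2 + b^2 = 1, where (a, b)
    is the unit normal of the line through the centroid of the points.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the direction of largest variance (principal axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a, b = -math.sin(theta), math.cos(theta)  # normal is perpendicular to it
    return a, b, a * mx + b * my


def intersect(line1, line2):
    """Intersection of two lines given as (a, b, c), using Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def marker_vertices(top, right, bottom, left):
    """Four vertices as intersections of adjacent fitted edges."""
    return [
        intersect(top, left),
        intersect(top, right),
        intersect(bottom, right),
        intersect(bottom, left),
    ]


# Illustrative edge pixels of an axis-aligned marker (made-up values).
top = fit_line_tls([(10, 20), (30, 20), (50, 20)])
bottom = fit_line_tls([(10, 80), (30, 80), (50, 80)])
left = fit_line_tls([(10, 20), (10, 50), (10, 80)])
right = fit_line_tls([(50, 20), (50, 50), (50, 80)])
vertices = marker_vertices(top, right, bottom, left)
```

Fitting each edge with total least squares minimizes perpendicular distances, so near-vertical edges are handled the same way as near-horizontal ones.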

[0026] Furthermore, a preferred embodiment is provided, wherein the method for extracting the coordinates of the kilometer marker vertex is as follows:

[0027] The coordinates of the kilometer marker vertices in the world coordinate system are obtained from the location information of the electronic map, and the coordinates of the kilometer marker vertices in the camera coordinate system at time i are:

[0028]

[0029] The reprojection error of the kilometer marker vertices, the reprojection error of the visual landmarks, and the error of the multi-source information of the IMU are calculated respectively to obtain the optimization objective function for relocalization; where N represents the kilometer marker number, and j = 1, 2, 3, 4 represent the four vertices of the rectangular kilometer marker.

[0030] Furthermore, a preferred embodiment is provided, wherein the reprojection error of the kilometer marker vertex, the reprojection error of the visual landmark point, and the error of the IMU's multi-source information are obtained by the following formulas, and the reprojection error of the kilometer marker vertex is:

[0031]

[0032] Here, (f_x, f_y, c_x, c_y) are the camera's intrinsic parameters, and (x_c, y_c, z_c) and (u, v) are the coordinates in the camera coordinate system and in the pixel coordinate system, respectively.
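As an illustration of how these quantities relate, the following minimal sketch computes a reprojection residual for one marker vertex. It assumes a standard pinhole camera model without distortion, and the sign convention (observed minus projected) is an assumption; the function names are hypothetical and the sketch is not the patent's own formula.

```python
def project(point_cam, fx, fy, cx, cy):
    """Project a camera-frame point (x_c, y_c, z_c) to pixel coordinates (u, v)."""
    x_c, y_c, z_c = point_cam
    if z_c <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x_c / z_c + cx, fy * y_c / z_c + cy)


def reprojection_residual(point_cam, observed_uv, intrinsics):
    """Residual between the detected vertex pixel and its predicted projection."""
    u, v = project(point_cam, *intrinsics)
    return (observed_uv[0] - u, observed_uv[1] - v)


# Made-up intrinsics (f_x, f_y, c_x, c_y) and a vertex 4 m in front of the camera.
intrinsics = (400.0, 400.0, 320.0, 240.0)
residual = reprojection_residual((1.0, 2.0, 4.0), (421.0, 438.0), intrinsics)
```

In an optimizer the camera-frame point would itself depend on the estimated pose, so driving this residual toward zero adjusts the pose.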

[0033] The reprojection error of visual landmarks is:

[0034]

[0035] The error of the IMU's multi-source information between any two image keyframes b_k and b_{k+1} is:

[0036]

[0037] Here, [·]_xyz denotes the operation that extracts the vector part of a quaternion, and [r_p, r_q, r_v, r_ba, r_bg] denote the position, rotation, velocity, accelerometer, and gyroscope errors, respectively. The accelerometer error represents the random walk of the acceleration in the IMU coordinate system, and the gyroscope error represents the random drift of the gyroscope in the IMU's own coordinate system.

[0038] Furthermore, a preferred embodiment is provided, in which the reprojection error of the kilometer marker vertex, the reprojection error of the visual landmark point, and the error of the multi-source information of the IMU are summed to obtain the optimization objective function for constructing the relocalization, specifically:

[0039]

[0040] Here, C is defined as the set of feature points matched between the initial frame of the positioning system and the last frame in which the first kilometer marker appears; B is defined as the set of all IMU multi-source information data; and L is defined as the set of all image frames in which kilometer marker N is detected. The term r_p - H_p χ represents the prior information obtained by marginalization over the sliding window. The remaining terms are the residuals of the IMU inertial constraints between every two frames, the reprojection errors of the visual landmarks, where (l, k) denotes the l-th image feature observed in the k-th image frame, and the reprojection errors of the kilometer marker vertices.
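Why adding an absolute kilometer marker term to the objective removes accumulated drift can be seen in a deliberately simplified one-dimensional sketch. It is not the patent's objective function: the visual and IMU residuals are collapsed into a single relative "odometry" term, the marker constraint is a direct position measurement, and the weights and data are made up.

```python
def solve_tridiagonal(diag, off, rhs):
    """Solve a symmetric tridiagonal linear system with the Thomas algorithm."""
    d, r = list(diag), list(rhs)
    n = len(d)
    for i in range(1, n):
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - off[i] * x[i + 1]) / d[i]
    return x


def fuse(odometry, markers, w_odo=1.0, w_marker=1e4):
    """Minimize sum w_odo*(x[i+1]-x[i]-d_i)^2 + sum w_marker*(x[k]-m_k)^2.

    The normal equations of this least squares problem are tridiagonal.
    """
    n = len(odometry) + 1
    diag, off, rhs = [0.0] * n, [0.0] * (n - 1), [0.0] * n
    for i, d in enumerate(odometry):      # relative (drifting) constraints
        diag[i] += w_odo
        diag[i + 1] += w_odo
        off[i] -= w_odo
        rhs[i] -= w_odo * d
        rhs[i + 1] += w_odo * d
    for k, position in markers.items():   # absolute marker constraints
        diag[k] += w_marker
        rhs[k] += w_marker * position
    return solve_tridiagonal(diag, off, rhs)


# True step is 10 m, but the odometry over-reads each step by 0.5 m.
odometry = [10.5] * 10
dead_reckoning = [0.0]
for step in odometry:
    dead_reckoning.append(dead_reckoning[-1] + step)

# Hypothetical markers with known map positions at poses 0, 5 and 10.
markers = {0: 0.0, 5: 50.0, 10: 100.0}
fused = fuse(odometry, markers)
```

In this toy setup dead reckoning ends 5 m beyond the true position, while the fused estimate is pulled back to the marker positions and the correction is spread over the intermediate poses.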

[0041] Option 2: A railway train visual inertial positioning system based on kilometer marker information, the system comprising:

[0042] The image information acquisition unit is used to acquire RGB images of the scene ahead of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU using a visual inertial odometer.

[0043] The kilometer marker detection unit constructs a kilometer marker detection module based on the VINS-Fusion framework and obtains the kilometer marker pixel position from that module.

[0044] The kilometer marker recognition unit recognizes the digital semantic information of the kilometer marker within its pixel location area using an OCR text recognizer, and obtains the location information of the kilometer marker on the electronic map.

[0045] The kilometer marker vertex coordinate extraction unit extracts the kilometer marker vertices and their coordinates based on the LSD line detection algorithm combined with the electronic map location information.

[0046] The global optimization unit establishes a global optimization objective function with kilometer marker position information constraints based on the kilometer marker vertex coordinates, and improves the positioning accuracy of railway trains through a factor graph optimization method with multi-source information.

[0047] Option 3: A computer device comprising a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the railway train visual inertial positioning method assisted by kilometer marker information as described in any implementation of Option 1.

[0048] Option 4: A computer-readable storage medium storing a computer program which, when executed by a processor, performs the railway train visual inertial positioning method assisted by kilometer marker information as described in any implementation of Option 1.

[0049] The advantages of this invention are:

[0050] The invention lies in identifying kilometer markers and extracting their edge segments and vertices based on the VINS-Fusion framework. It then combines electronic maps and kilometer marker digital information to establish a global optimization objective function constrained by kilometer marker location information in different scenarios, thus achieving the fusion of binocular visual inertial odometry and kilometer marker information. A hardware-in-the-loop simulation experimental platform was built, an electronic map was created, and positioning experiments were conducted on two sets of experimental data. The experimental results show that the kilometer marker fusion algorithm proposed in this invention can eliminate the cumulative error of the binocular visual inertial odometry system, reducing the average error of the positioning results in the two physical experiments by 57.0% and 62.6%, respectively.

[0051] The visual inertial odometer used in this invention has the function of detecting and identifying kilometer marker information, realizing high-precision train positioning based on multi-source fusion of electronic maps, visual information and IMU information.

[0052] The method for identifying kilometer marker character areas used in this invention mainly includes two parts: kilometer marker target detection and digital information recognition. By utilizing the digital information of trackside kilometer markers, the accurate position of the current train can be obtained, providing a basis for information fusion to eliminate cumulative errors.

[0053] This invention fuses binocular visual inertial odometer data with kilometer marker digital information by establishing a global optimization objective function constrained by kilometer marker location information. This allows accumulated errors to be eliminated even when the system cannot perform loop closure detection.

[0054] This invention is applicable to the field of rail transit.

Description of the Drawings

[0055] Figure 1 is a schematic diagram of the railway train visual inertial positioning method based on kilometer marker information as described in Implementation Method 1.

[0056] Figure 2 is a diagram of the YOLOv5s network structure described in Implementation Method 1.

[0057] Figure 3 is a network structure diagram of the CRNN text recognition model described in Implementation Method 2.

[0058] Figure 4 is a diagram of the line segment support region computed with the region growing method described in Implementation Method 3.

[0059] Figure 5 is a structural diagram of the Backbone network module described in Implementation Method 1.

[0060] Figure 6 is a structural diagram of the Neck network module described in Implementation Method 1.

[0061] Figure 7 is a network structure diagram of the DBNet text detector model described in Implementation Method 2.

[0062] Figure 8 is a schematic diagram of the line segment connection in the support region described in Implementation Method 3.

[0063] Figure 9 is a schematic diagram of the kilometer marker vertex determination method described in Implementation Method 4.

[0064] Figure 10 is a schematic diagram of kilometer marker data fusion as described in Implementation Method 7.

[0065] Figure 11 is a schematic diagram of the hardware-in-the-loop simulation experimental platform described in this invention.

[0066] Figure 12 is a schematic diagram of the overall analysis and statistics of the positioning error results described in this invention.

[0067] Figure 13 is a schematic diagram of the positioning trajectories of trajectory 1 and trajectory 2 obtained with two different algorithms, as described in this invention.

[0068] Figure 14 is a schematic diagram of the kilometer marker vertex detection results according to the present invention. Panel (a) shows the detection area used to determine the kilometer marker vertices: the left side shows the extraction of the kilometer marker vertices, and the right side is an enlarged view of the kilometer marker in the left image. Panel (b) shows the line segment detection results and the vertex determination results; from left to right, it shows the LSD line detection process in which interfering line segments are removed.

Detailed Implementation

[0069] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them.

[0070] Implementation Method 1: This implementation method provides a railway train visual inertial positioning method based on kilometer marker information. The positioning method includes:

[0071] A visual inertial odometer is used to acquire RGB images of the scene ahead of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU.

[0072] A kilometer marker detection module is constructed based on the VINS Fusion framework, and the kilometer marker pixel position is obtained based on the kilometer marker detection module.

[0073] Using an OCR text recognizer, the digital semantic information of the kilometer marker is recognized within the pixel location area of the kilometer marker, and the location information of the kilometer marker on the electronic map is obtained.

[0074] Based on the LSD line detection algorithm and the location information of the electronic map, kilometer marker vertices and vertex coordinates are extracted.

[0075] Based on the coordinates of the kilometer marker vertices, a global optimization objective function containing kilometer marker position information constraints is established, and the positioning accuracy of railway trains is improved through factor graph optimization.

[0076] As shown in Figure 1, this implementation addresses the high cost and maintenance difficulty in special operating environments caused by the reliance of traditional train positioning technologies on trackside electrical equipment. First, it employs the VINS-Fusion framework, using data from a binocular camera and an inertial measurement unit (IMU). Through repeated iterative optimization and updating, it fuses the pose estimates from the multiple data sources under tightly coupled constraints to obtain higher-precision pose information. In addition, a target detection thread is added to VINS-Fusion to detect and recognize trackside kilometer markers in the image sequences acquired by the camera. Finally, accurate train location information is obtained by combining the recognition results with a known electronic map, and a global optimization problem is constructed to eliminate the cumulative error of the visual-inertial odometer.
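The step that turns a recognized marker number into a map position can be sketched as a simple lookup. The map layout below (marker number mapped to the world coordinates of its four vertices), the coordinate values, and the function name `lookup_marker` are all hypothetical; the text does not specify how the electronic map is stored.

```python
# Hypothetical electronic map: kilometer marker number -> world coordinates
# (in metres) of the four marker vertices. All values are made up.
ELECTRONIC_MAP = {
    101: [(0.0, 2.0, 1.0), (0.4, 2.0, 1.0), (0.4, 2.0, 0.4), (0.0, 2.0, 0.4)],
    102: [(1000.0, 2.0, 1.0), (1000.4, 2.0, 1.0),
          (1000.4, 2.0, 0.4), (1000.0, 2.0, 0.4)],
}


def lookup_marker(ocr_text, electronic_map):
    """Return the world-frame vertices for an OCR result, or None.

    Non-digit characters in the OCR output are ignored; an unknown or
    empty marker number yields None so the caller can skip the constraint.
    """
    digits = "".join(ch for ch in ocr_text if ch.isdigit())
    if not digits:
        return None
    return electronic_map.get(int(digits))


vertices_world = lookup_marker(" 102 ", ELECTRONIC_MAP)
```

Returning `None` for unreadable or unknown numbers lets the positioning pipeline fall back to plain visual-inertial odometry for that frame.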

[0077] Implementation Method Two: This implementation method further defines the railway train visual inertial positioning method assisted by kilometer marker information described in Implementation Method One. A kilometer marker detection module is constructed based on the VINS-Fusion framework, and the kilometer marker pixel position is obtained from the kilometer marker detection module as follows:

[0078] The kilometer marker detection module obtains the kilometer marker pixel position based on the YOLOv5s network model, which includes an input end, a backbone network, a neck network, and an output end.

[0079] The YOLOv5s network model takes RGB images as input and preprocesses the RGB images.

[0080] The backbone network is used to detect RGB images at different scales;

[0081] The neck network is used to fuse RGB images of different scales to achieve detection of RGB images of different scales; the output end is used to output RGB images of different scales to detect objects of different sizes. Each RGB image contains 3 prediction boxes, and each prediction box contains object confidence and location information. The weighted NMS method is used to remove duplicate location information, that is, to find the object detection location and complete the target detection.

[0082] The method for identifying kilometer marker character regions described in this embodiment mainly includes two parts: kilometer marker target detection and digital information recognition. To meet the requirements on detection accuracy and speed during high-speed train operation, this embodiment uses the YOLOv5s network model to detect railway trackside kilometer markers; its network structure is shown in Figure 2. The digital information of the trackside kilometer markers is then used to obtain the precise location of the current train. YOLOv5s is a lightweight object detection network model based on the PyTorch deep learning framework. Its network structure mainly includes an input end, a backbone network, a neck network, and an output end. The input to the YOLOv5 network is an RGB image. The image resolution is chosen based on experimental results to improve computational speed while maintaining detection accuracy. The network also supports detection at different scales, allowing multi-scale testing to enhance detection accuracy. YOLOv5s preprocesses images at the input end, scaling the input image to a specified size and normalizing it so that pixel values lie between 0 and 1. To enhance the model's robustness, Mosaic data augmentation with random cropping, rotation, and flipping is applied to increase the model's adaptability to various input conditions. For anchors, YOLOv5s can cluster the targets in the training set to determine the optimal anchor sizes and aspect ratios. This dynamic adjustment allows the model to adapt better to targets in different datasets, thereby improving detection performance.

[0083] For the backbone network, YOLOv5s uses the Focus module to achieve fast downsampling. As shown in Figure 5(a), this module concentrates information into the channel dimension, which makes subsequent feature extraction more comprehensive and achieves rapid downsampling without information loss. As shown in Figure 5(b), the C3 module reduces the number of channels in the feature map, which significantly reduces computational cost, accelerates model training and inference, and improves the model's computational efficiency and accuracy. The SPP module is applied after the C3 module, as shown in Figure 5(c). The SPP module enlarges the receptive field of the detection network, so YOLOv5s handles targets of different scales better, improving detection accuracy and robustness.

[0084] The neck network combines FPN and PAN, as shown in Figure 6. FPN exchanges information between low-level and high-level features by establishing connections between feature maps from different network depths. Starting from the bottom feature map, each layer fuses its features into the previous layer's feature map through upsampling and a simple 2×2 convolution operation to obtain higher resolution and richer semantic information. Starting from the high-level feature map, each layer reduces its feature dimension through a 1×1 convolution operation and is upsampled into the next layer's feature map to obtain more global and abstract semantic information. PAN, in turn, transfers localization information from shallow feature maps to deep feature maps. The combined FPN and PAN structure fuses feature maps of different scales to detect targets of different sizes. Through cross-layer connections, it promotes feature transfer and reuse, improving the network's computational efficiency.

[0085] At the output end, this implementation uses the GIoU_Loss loss function for bounding box regression. This function is computed from the Generalized Intersection over Union (GIoU), as shown in equation (6). Compared with the traditional IoU calculation, GIoU can measure the relationship between any two bounding boxes, including non-overlapping ones. Here, A_C is the area of the smallest bounding box enclosing the two target boxes, U is the area of the union of the two target boxes, and A_C - U is the area of the smallest enclosing box that remains after the two target boxes are removed. The GIoU value is calculated from the ratios of these areas.

[0086]
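For reference, a minimal sketch of the GIoU computation follows. It assumes the commonly published definition GIoU = IoU - (A_C - U) / A_C, with U the union area and A_C the area of the smallest enclosing box, and boxes given as (x1, y1, x2, y2) with positive area. The function names are hypothetical, and this is an illustration, not the patent's equation (6).

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union area U.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area A_C of the smallest box enclosing both boxes.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Regression loss: 0 for identical boxes, approaching 2 for distant ones."""
    return 1.0 - giou(box_a, box_b)


overlapping = giou((0, 0, 2, 2), (1, 1, 3, 3))   # IoU 1/7, penalty 2/9
disjoint = giou((0, 0, 1, 1), (2, 0, 3, 1))      # IoU 0, penalty 1/3
```

Unlike plain IoU, the value for disjoint boxes is negative and still varies with their separation, which gives the regression a usable gradient when the predicted box does not yet overlap the target.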

[0087] In the process of object detection, since a large number of bounding boxes may point to the same object, non-maximum suppression (NMS) is usually required to eliminate redundant information.

[0088] In the YOLOv5 model, weighted Non-Maximum Suppression (NMS) is used for processing. This model detects objects of different sizes by outputting feature maps at different scales. Each feature map contains three bounding boxes, each containing object confidence and location information. The weighted NMS operation employs a weighting strategy based on intersecting scales to remove redundant information, thereby finding the optimal object detection location and achieving efficient object detection.
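One common formulation of weighted NMS is sketched below as an illustration. It assumes that boxes overlapping the highest-scoring box above an IoU threshold are merged by a confidence-weighted average of their coordinates; the exact weighting strategy used in this implementation is not spelled out in the text, and the function names and threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes, scores, iou_thresh=0.5):
    """Merge overlapping detections instead of simply discarding them.

    Returns a list of (merged_box, score) pairs, where each merged box is
    the confidence-weighted mean of a cluster of overlapping boxes and the
    score is that of the cluster's best box.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        best = order[0]
        cluster = [best] + [
            i for i in order[1:] if iou(boxes[best], boxes[i]) >= iou_thresh
        ]
        total = sum(scores[i] for i in cluster)
        merged = tuple(
            sum(scores[i] * boxes[i][k] for i in cluster) / total
            for k in range(4)
        )
        results.append((merged, scores[best]))
        order = [i for i in order if i not in cluster]
    return results


# Two overlapping detections of one marker plus one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.6, 0.8]
detections = weighted_nms(boxes, scores)
```

Compared with standard NMS, which keeps only the top box of each cluster, the weighted variant lets lower-confidence boxes refine the final location.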

[0089] Implementation Method 3: This implementation method further defines the railway train visual inertial positioning method assisted by kilometer marker information described in Implementation Method 1. The digital semantic information of the kilometer marker is recognized within the pixel location area using an OCR text recognizer as follows:

[0090] It is implemented based on the CRNN character recognition model, which includes a CNN convolutional neural network layer, an RNN recurrent neural network layer, and a CTC transcription layer;

[0091] The CNN convolutional neural network layer extracts features from the pixel location region of the kilometer marker to obtain the image feature vector of the kilometer marker;

[0092] The image feature vector of the kilometer marker is passed to the RNN recurrent neural network layer, which predicts the label distribution of the feature vector;

[0093] The CTC transcription layer transforms the image feature vector distribution of the predicted kilometer markers from the RNN recurrent neural network layer into sequence labels, outputting digital information to complete the recognition.

[0094] As shown in Figure 3, this embodiment recognizes the kilometer marker within the pixel position of the kilometer marker in the image, obtained as described in Embodiment 1. Optical Character Recognition (OCR) is used in document digitization, identity authentication, digital financial systems, and vehicle license plate recognition. This invention employs an ultra-lightweight OCR system, which mainly consists of three parts: a text detector, a text orientation classifier, and a text recognizer.

[0095] The CRNN network framework mainly consists of three parts. First, the CNN (convolutional neural network) layers, which form the bottom of the CRNN framework, extract the kilometer marker character features: they perform feature extraction on the image to obtain the kilometer marker image feature vectors (convolutional feature maps), which the feature sequence mapping layer passes to the network's recurrent layers. Second, the RNN (recurrent neural network) layers predict the label sequence, using a deep bidirectional LSTM network to make predictions over the feature sequence and form the label distribution of the feature vectors. Finally, the CTC transcription layer decodes the label distribution predicted by the recurrent layers into sequence labels and outputs the predicted sequence. Combining the above models, this implementation constructs an ultra-lightweight OCR system; after training the network on existing public datasets, the accuracy of English and digit text detection and recognition reaches 91.3%.
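The transcription step can be illustrated with a minimal greedy CTC decoder. It assumes a digit-only alphabet with class index 0 reserved for the CTC blank; the function name and the toy probabilities are hypothetical, and a trained system may use beam search instead of greedy decoding.

```python
def ctc_greedy_decode(label_distributions, alphabet="0123456789", blank=0):
    """Greedy CTC decoding.

    Takes one probability vector per time step, picks the most likely class
    at each step, collapses consecutive repeats, then drops blanks. Class
    index i > 0 maps to alphabet[i - 1].
    """
    best_path = [
        max(range(len(step)), key=step.__getitem__)
        for step in label_distributions
    ]
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


def one_hot_like(index, num_classes=11):
    """Toy per-step distribution that strongly favours one class."""
    return [0.9 if i == index else 0.01 for i in range(num_classes)]


# Best path 2,2,blank,2,3,blank,blank,4 decodes to the digits "1123":
# the blank between the two runs of class 2 keeps the repeated digit.
path = [2, 2, 0, 2, 3, 0, 0, 4]
text = ctc_greedy_decode([one_hot_like(i) for i in path])
```

The blank symbol is what allows a genuinely repeated digit on a marker to survive the collapsing of consecutive duplicates.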

[0096] A text detector is used to locate text within an image. Text detection methods can be categorized into regression-based and segmentation-based methods; this method employs the segmentation-based DBNet algorithm, shown in Figure 7. Existing segmentation algorithms first output a probability map of text segmentation through a network, then convert the probability map into a binary map using a fixed threshold, and finally obtain the detection result through post-processing. The drawback is that the choice of threshold is critical: an unsuitable threshold severely degrades the network's detection results. The DBNet algorithm used in this method introduces differentiable binarization, that is, adaptive binarization of each pixel with a binarization threshold learned during model training. This significantly improves the robustness of the detection model.
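The difference between fixed-threshold and differentiable binarization can be sketched as follows. The sketch assumes the approximate step function published with DBNet, B = 1 / (1 + exp(-k (P - T))), with amplification factor k = 50; the function names and the per-pixel values are made up for illustration.

```python
import math


def hard_binarize(prob, threshold=0.3):
    """Standard binarization: a fixed threshold, not differentiable."""
    return 1.0 if prob >= threshold else 0.0


def differentiable_binarize(prob, learned_threshold, k=50.0):
    """Approximate step function with a per-pixel learned threshold.

    The steep sigmoid behaves like a hard threshold but stays
    differentiable, so the threshold map can be trained with the network.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - learned_threshold)))


# Made-up probability map P and threshold map T for a 2x2 patch.
probability_map = [[0.9, 0.1], [0.3, 0.6]]
threshold_map = [[0.3, 0.3], [0.3, 0.5]]
binary_map = [
    [differentiable_binarize(p, t) for p, t in zip(p_row, t_row)]
    for p_row, t_row in zip(probability_map, threshold_map)
]
```

Pixels well above their threshold map to values near 1, pixels well below to values near 0, and a pixel exactly at its threshold maps to 0.5.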

[0097] The text orientation classifier rectifies the text boxes output by the text detector into rectangular boxes and determines whether the text is upright; if it is not, the box is adjusted. The task is relatively simple, being essentially image classification. To improve the algorithm's real-time performance and reduce storage space and resource consumption, this method uses MobileNetV3 as the classifier's backbone network. Through a series of network design strategies and innovative components, MobileNetV3 remains lightweight while improving accuracy and speed, making it suitable for resource-constrained mobile devices and embedded systems and providing an efficient and accurate solution for tasks such as image classification and object detection.

[0098] The purpose of text recognition is to identify the text content within a given text rectangle. The model primarily consists of two parts: a Deep Convolutional Neural Network (DCNN) and a Recurrent Neural Network (RNN). The DCNN is mainly responsible for extracting features from the input image, while the RNN transforms the feature sequence into an output character sequence. The underlying principle is to treat text recognition as a time-dependent sequence learning problem, which enables character recognition of text sequences of variable length.

[0099] Implementation Method 4: This implementation method further defines the railway train visual inertial positioning method based on kilometer marker information as described in Implementation Method 1. Specifically, the extraction of kilometer marker vertices based on the LSD line detection algorithm combined with the electronic map location information is as follows: the type of line is distinguished by the slope of the same edge line segment. The LSD algorithm detects the candidate line type. If the line does not meet the requirements of the line type, the candidate line segment that does not meet the line characteristics is removed.

[0100] Initial edge extraction is performed to determine the number of edge segments to be detected. Four edge lines are detected at the kilometer marker vertex. Four groups of edge segments with high consistency are selected according to the evaluation function. The overall least squares method is used for line fitting.

[0101] After the initial edge extraction, the edge segment extraction results are repeatedly judged based on the feature similarity of each edge line. Similar edge segments are merged and new edge segments are fitted. The new edge segment set is continuously supplemented. The above process is repeated until four dissimilar kilometer marker edge segments are determined.

[0102] The method for determining the vertex of the kilometer marker is as follows: the four dissimilar kilometer marker edge segments form a rectangle, and the intersection of the rectangular kilometer marker edge segments is the vertex of the kilometer marker.

[0103] Referring to Figure 4, this embodiment is based on the method of kilometer marker detection and information recognition in Embodiment 1, which determines the corresponding position of the kilometer marker on the electronic map. The purpose of this embodiment is to obtain the corresponding coordinates of the four vertices of the kilometer marker on the electronic map. It is mainly divided into extracting the edge line segments of the kilometer marker and detecting the corner points of the kilometer marker.

[0104] To obtain the coordinates of kilometer marker corner points, their edge segments must first be extracted. The Line Segment Detector (LSD) algorithm is a gradient-based algorithm that can efficiently detect straight line segments in an image. The advantage of the LSD algorithm lies in its robustness to noise and curved line segments, enabling it to accurately detect line segments in various complex scenarios, including broken, curved, and partially occluded lines.

[0105] The LSD algorithm first calculates the gradient of the input image, using pixel gradients as features to find lines. Filters such as Sobel, Prewitt, or Scharr are typically used to calculate the horizontal and vertical gradients. By applying non-maximum suppression (NMS) to the image, the algorithm identifies corner points as endpoints of potential line segments. NMS filters out responses from non-corner points, preserving stronger corner features. By connecting adjacent corner points, these corner points are combined into candidate line segments, thus determining the line segment support region, as shown in Figure 4. Finally, after merging the line segments, the LSD algorithm verifies the candidate line segments using geometric and statistical criteria, such as the length, curvature, and consistency of the line segments, to help eliminate candidate line segments that do not meet the characteristics of a straight line, thereby improving the accuracy of line detection.
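The gradient step described above can be sketched with a plain Sobel filter. This covers only the gradient computation, not a full LSD implementation; the function names are hypothetical and the choice of Sobel over Prewitt or Scharr is arbitrary.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Valid-mode 3x3 correlation; output is 2 pixels smaller per axis."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out

def gradient_magnitude_and_angle(image):
    """Horizontal and vertical gradients combined into magnitude and angle."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# Toy image with a vertical step edge between columns 2 and 3.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag, ang = gradient_magnitude_and_angle(img)
```

Pixels along the step edge get a large magnitude and a common gradient angle, which is the property a line detector relies on when grouping pixels into a line support region.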

[0106] LSD line detection is performed on the image region where the kilometer marker is located, resulting in a large number of line segments containing interference. This implementation combines the length and slope of the line segments to filter the detection results. During the kilometer marker edge detection process, different types of lines can be distinguished based on their slope; that is, line segments describing the same edge should have relatively similar slopes. The length of a line segment reflects its reliability; relatively short line segments are usually interference, but ignoring the slope information can lead to misjudgment. Let the set of line segment detection results be $\varphi = \{L_1, L_2, \ldots, L_N\}$. For the n-th line segment:

[0107]

[0108] where $(\mu_n, \nu_n)$ are the coordinates of the endpoints of the n-th line segment. The quasi-slope and length information of the line segment are calculated as shown in equation (8), where a small deviation $\delta$ is introduced to avoid the special case where the slope does not exist:

[0109]
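A minimal sketch of the quasi-slope and length computation follows. Equation (8) is not reproduced in this text, so the exact form used here (adding a signed deviation δ to the horizontal difference) is an assumption, as are the function name and the value of δ.

```python
import math

def quasi_slope_and_length(p, q, delta=1e-6):
    """Quasi-slope and length of the segment with endpoints p and q.

    delta keeps the denominator away from zero so that vertical
    segments get a large but finite quasi-slope.
    """
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    slope = dy / (dx + math.copysign(delta, dx))
    length = math.hypot(dx, dy)
    return slope, length

k_diag, len_diag = quasi_slope_and_length((0, 0), (3, 4))
k_vert, len_vert = quasi_slope_and_length((2, 0), (2, 10))
```

For the vertical segment the quasi-slope is very large rather than undefined, so it can still be compared with other near-vertical segments.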

[0110] Because the image size of kilometer markers varies greatly at different shooting distances, a fixed threshold cannot be used for judgment. Therefore, all line segments are classified according to the quasi-slope into M categories:

[0111]

[0112] where each category $G_m$, $m \in M$, contains several line segments L. Taking its feature information into account, a category evaluation function $\mathrm{score}_m$ is defined as shown in equation (10):

[0113]

[0114] In the above formula, cluster(·) represents the clustering function, and the total length of the line segments contained in each category serves as its evaluation function. Then, all categories are sorted in descending order of the evaluation function $\mathrm{score}_m$, and based on experimental experience, the first max(m / 3, 2) terms are taken to obtain a new set of line segments $\varphi_c$.
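The classification and scoring step can be sketched as below. The source does not specify the clustering function, so fixed-width binning of the slope angle is used as a stand-in; the bin width, the function name, and reading m in max(m / 3, 2) as the number of categories are all assumptions.

```python
import math
from collections import defaultdict

def select_categories(segments, bin_deg=10.0):
    """Group segments by slope angle, score groups by total length, keep the best.

    segments: list of (quasi_slope, length) pairs.
    Returns the kept groups, highest score first.
    """
    groups = defaultdict(list)
    for slope, length in segments:
        angle = math.degrees(math.atan(slope))  # in (-90, 90)
        groups[int((angle + 90.0) // bin_deg)].append((slope, length))
    # Score of a category = total length of the segments it contains.
    ranked = sorted(groups.values(),
                    key=lambda g: sum(length for _, length in g),
                    reverse=True)
    keep = max(len(ranked) // 3, 2)
    return ranked[:keep]

segments = [(0.0, 10.0), (0.02, 8.0),   # near-horizontal edge
            (50.0, 12.0), (60.0, 5.0),  # near-vertical edge
            (1.0, 1.0)]                 # short diagonal interference
kept = select_categories(segments)
scores = [sum(length for _, length in g) for g in kept]
```

Fixed bins split segments whose angles fall on opposite sides of a bin boundary (including near ±90°), so a real implementation would likely use a proper clustering method, as the text indicates.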

[0115] Since there are multiple line segment detection results for one edge of the kilometer marker, this implementation method will further connect and group the above line segment classification results. Under the conditions of satisfying the continuity and consistency assumptions, two continuous support line segments can be connected. Among them, the continuity condition requires that the two line segments have two endpoints that are close enough; the consistency condition requires that the slopes of the two line segments are close enough, and the calculation process is shown in formula (11).

[0116]
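The continuity and consistency conditions can be sketched as a simple predicate. Formula (11) is not reproduced in this text, so the thresholds, the use of an angle difference in place of a slope difference, and the function name are assumptions.

```python
import math

def can_connect(seg_a, seg_b, max_gap=5.0, max_angle_deg=5.0):
    """Check whether two segments may be connected.

    Each segment is ((x1, y1), (x2, y2)).
    Continuity: some pair of endpoints is closer than max_gap pixels.
    Consistency: the undirected line angles differ by at most max_angle_deg.
    """
    gap = min(math.dist(p, q) for p in seg_a for q in seg_b)

    def angle(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1) % math.pi  # direction-independent

    diff = abs(angle(seg_a) - angle(seg_b))
    diff = min(diff, math.pi - diff)  # wrap around at 180 degrees
    return gap <= max_gap and math.degrees(diff) <= max_angle_deg

seed = ((0, 0), (10, 0))
same_edge = can_connect(seed, ((12, 0.1), (20, 0.2)))
perpendicular = can_connect(seed, ((12, 0), (12, 10)))
far_away = can_connect(seed, ((50, 0), (60, 0)))
```

Comparing angles rather than raw slopes avoids the blow-up of slope values for near-vertical edges while expressing the same consistency requirement.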

[0117] As shown in Figure 8, for a given seed line segment, multiple line segments satisfying the above conditions may exist simultaneously in the support region. The left side of the figure is a schematic diagram of edge extraction by repeated feature-similarity judgment of each edge line, the middle is an enlarged view of multiple line segments satisfying the above conditions simultaneously in the support region, and the right side is... and pixel set comparison image. Based on the assumptions of continuity and consistency, the new line segment set $\varphi_c$ is clustered, and the evaluation result of each group of line segments is calculated using formula (10). All groups are then sorted in descending order. The evaluation function of a line segment group describes the characteristic consistency of the line segments in that group; a higher value indicates that the line segments contained in the group are strongly consistent. Since the surface of the kilometer marker may have worn edges, one edge may yield multiple line segment detection results. Therefore, this embodiment adopts a dynamic expansion of line segments when extracting the edges of the kilometer marker.

[0118] First, the LSD (Line Segment Detector) method is used to efficiently detect line segments in the image region containing the kilometer marker. The LSD algorithm first calculates the gradient of the input image, using pixel gradients as features for finding lines. Filters such as Sobel, Prewitt, or Scharr are typically used to calculate the horizontal and vertical gradients of the image. Second, non-maximum suppression is applied to the image, as shown in Figure 4, where the corner points in the left image serve as endpoints of potential line segments. Non-maximum suppression filters out responses from non-corner points, preserving stronger corner features. By connecting adjacent corner points (the two middle images of Figure 4), these corner points are combined into candidate line segments, thus determining the line segment support region (rightmost image of Figure 4). Finally, after merging line segments, the LSD algorithm verifies the candidate line segments using geometric and statistical criteria, such as line segment length, curvature, and consistency, to help eliminate candidate line segments that do not meet the characteristics of a straight line, thereby improving the accuracy of line detection.

[0119] Referring to Figure 5, this embodiment describes how the detection results, which contain a large number of straight line segments including interference items, are filtered by combining the length and slope of the segments. In the process of kilometer marker edge detection, different types of straight lines can be distinguished according to the slope of the edge lines; that is, the slopes of line segments describing the same edge should be relatively close.

[0120] In this implementation method, when extracting the edges of kilometer markers, the first step is to determine the number of edges to be detected. To determine the corner points of a rectangular kilometer marker, four edges need to be detected. Next, four groups of line segments with high consistency are selected based on an evaluation function, and a global least squares method is used for line fitting. After the initial edge extraction is completed, the edge extraction results are repeatedly evaluated based on the feature similarity of each edge line. Similar edges are merged, and new edge segments are fitted, continuously supplementing the set of line segments. This process is repeated until four dissimilar kilometer marker edge lines are identified.
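The line-fitting step can be sketched with a total least squares fit, which minimizes perpendicular distances and therefore handles vertical edges. Reading the "overall" or "global" least squares of the text as total least squares is an assumption, as is the function name.

```python
import numpy as np

def fit_line_tls(points):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) by total least squares.

    points: iterable of (x, y), e.g. the endpoints of all segments in one group.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the line normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    a, b = vt[-1]
    c = -(a * centroid[0] + b * centroid[1])
    return a, b, c

vertical_pts = [(3, 0), (3, 1), (3, 5)]
sloped_pts = [(0, 1), (1, 3), (2, 5), (3, 7)]  # y = 2x + 1
line_v = fit_line_tls(vertical_pts)
line_s = fit_line_tls(sloped_pts)
```

An ordinary least squares fit of y on x would fail on the vertical example, which is why a perpendicular-distance formulation suits marker edges of arbitrary orientation.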

[0121] Referring to Figure 9, this embodiment further defines the method for determining the vertices, specifically obtaining the intersection points of the four sides of the detected rectangular kilometer marker, where a1 and a2 are the horizontal edge lines, b1 and b2 are the vertical edge lines, and $A_{ij}$ represents the kilometer marker vertices obtained from the intersections of the four edges.
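The vertex computation can be sketched as pairwise line intersections. Lines are assumed to be given in the form a·x + b·y + c = 0, and indexing the vertices so that A_ij is the intersection of a_i with b_j is an assumption about the notation; the function names are hypothetical.

```python
import numpy as np

def intersect(line1, line2):
    """Intersection of two lines given as (a, b, c) with a*x + b*y + c = 0."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    A = np.array([[a1, b1], [a2, b2]], dtype=float)
    if abs(np.linalg.det(A)) < 1e-12:
        raise ValueError("lines are parallel")
    x, y = np.linalg.solve(A, np.array([-c1, -c2], dtype=float))
    return float(x), float(y)

def marker_vertices(a_lines, b_lines):
    """Map (i, j) to the intersection of horizontal edge a_i and vertical edge b_j."""
    return {(i + 1, j + 1): intersect(a, b)
            for i, a in enumerate(a_lines)
            for j, b in enumerate(b_lines)}

# Toy 10 x 8 marker: a1: y = 0, a2: y = 8, b1: x = 0, b2: x = 10.
vertices = marker_vertices([(0, 1, 0), (0, 1, -8)], [(1, 0, 0), (1, 0, -10)])
```

Because the vertices come from fitted lines rather than detected corner pixels, they remain well defined even when a physical corner is worn or occluded.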

[0122] Implementation Method 5: This implementation method further defines the railway train visual inertial positioning method based on kilometer marker information assisted by Implementation Method 1. The method for extracting the kilometer marker vertex coordinates is as follows:

[0123] Based on the digital information recognition results, and combined with the known electronic map, the coordinates of the kilometer marker vertices in the world coordinate system are obtained, where N represents the kilometer marker number and j = 1, 2, 3, 4 indexes the four vertices of the rectangular kilometer marker. In the image sequence, kilometer marker N is observed in M frames; in the i-th frame, i = 1, 2, ..., M, the coordinates of the detected kilometer marker vertices are... The coordinates of the kilometer marker vertices in the camera coordinate system are likewise obtained based on the digital information recognition results and the known electronic map, where x, y, z are the coordinates in the camera coordinate system. The coordinates of kilometer marker N in the camera coordinate system at the i-th frame are then:

[0124]

[0125] In this embodiment, during the operation of the visual inertial odometer, the proposed railway trackside kilometer marker positioning method is used to detect each frame of input image. If a kilometer marker exists in the current frame and its digital information is successfully identified, the system's global optimization function is activated after the kilometer marker disappears from the camera's field of view. The kilometer marker position information is added as a constraint term to the objective function of the nonlinear optimization to construct a larger-scale optimization problem.
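The triggering rule described above (collect frames while a marker is recognized, then start global optimization once it leaves the field of view) can be sketched as a small state holder. The class and method names are hypothetical, and treating a single frame without a recognized marker as "disappeared" is a simplifying assumption.

```python
class MarkerTrigger:
    """Collects frames that see a marker and reports when it has disappeared."""

    def __init__(self):
        self.active_marker = None  # number of the marker currently in view
        self.frames = []           # ids of frames in which it was recognized

    def update(self, frame_id, marker_number):
        """Feed one frame; marker_number is None when nothing was recognized.

        Returns (marker_number, frame_ids) when a marker has just left the
        view, which is the moment to start global optimization; else None.
        """
        if marker_number is not None and marker_number == self.active_marker:
            self.frames.append(frame_id)
            return None
        finished = None
        if self.active_marker is not None:
            finished = (self.active_marker, self.frames)
        if marker_number is None:
            self.active_marker, self.frames = None, []
        else:
            self.active_marker, self.frames = marker_number, [frame_id]
        return finished

trigger = MarkerTrigger()
detections = [None, 12, 12, None, None]
events = [trigger.update(i, m) for i, m in enumerate(detections)]
```

The returned frame list corresponds to the set of image frames that observed the marker and would feed the marker constraint term of the objective function.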

[0126] Implementation Method Six: This implementation method further defines the railway train visual inertial positioning method based on kilometer marker information assisted by Implementation Method Five. The optimization objective function for repositioning is constructed as follows:

[0127] The reprojection error of the kilometer marker vertices is specifically as follows:

[0128]

[0129] where $(f_x, f_y, c_x, c_y)$ are the camera's internal parameters, and $(x_c, y_c, z_c)$ and $(u, v)$ are the coordinates in the camera coordinate system and in the pixel coordinate system, respectively.
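A minimal sketch of a reprojection residual under the standard pinhole model is shown below, using the symbols defined above. The patent's own equation is not reproduced in this text, so the sign convention (observed minus projected) and the function names are assumptions, and lens distortion is ignored.

```python
def project(point_c, fx, fy, cx, cy):
    """Project a point (x_c, y_c, z_c) in the camera frame to pixel (u, v)."""
    x_c, y_c, z_c = point_c
    if z_c <= 0:
        raise ValueError("point must lie in front of the camera")
    return fx * x_c / z_c + cx, fy * y_c / z_c + cy

def reprojection_residual(point_c, observed_uv, intrinsics):
    """Pixel difference between the detected vertex and its predicted position."""
    u_hat, v_hat = project(point_c, *intrinsics)
    return observed_uv[0] - u_hat, observed_uv[1] - v_hat

intrinsics = (500.0, 500.0, 320.0, 240.0)  # hypothetical fx, fy, cx, cy
vertex_c = (0.2, -0.1, 2.0)                # marker vertex in the camera frame, meters
residual = reprojection_residual(vertex_c, (371.0, 214.0), intrinsics)
```

In the optimization, the camera-frame point depends on the estimated pose, so minimizing this residual over all four vertices and all observing frames pulls the trajectory toward the surveyed marker position.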

[0130] The reprojection error of visual landmarks is as follows:

[0131]

[0132] The IMU residual term between any two image keyframes $b_k$ and $b_{k+1}$ is defined as:

[0133]

[0134] where $[\cdot]_{xyz}$ denotes the operation of extracting the real part of a quaternion, and $[r_p, r_q, r_v, r_{ba}, r_{bg}]$ represent the position, rotation, velocity, accelerometer, and gyroscope error terms, respectively. The accelerometer error represents the random walk of the acceleration in the IMU coordinate system, and the gyroscope error represents the random drift of the gyroscope in the IMU's own coordinate system.

[0135] Implementation Method Seven: This implementation method further defines the railway train visual inertial positioning method based on kilometer marker information assisted by Implementation Method Six. It sums the reprojection error of the kilometer marker vertices, the reprojection error of the visual landmark points, and the error of the multi-source information from the IMU to obtain the optimized objective function for repositioning, specifically:

[0136]

[0137] Here, C is defined as the set of feature points matched between the initial frame of the positioning system and the last frame in which the first kilometer marker appears; B is defined as the set of multi-source information data from all IMUs; and L is defined as the set of all image frames in which kilometer marker N is detected. $r_p - H_p \chi$ represents the prior information after marginalizing all sliding windows. The remaining terms are the residuals of the IMU inertial constraints between every two consecutive frames; the reprojection error of the visual landmarks, where (l, k) denotes the l-th image feature observed in the k-th image frame; and the reprojection error of the kilometer marker vertices.

[0138] The case in which the kilometer marker is first detected is described with reference to Figure 10(a), which illustrates the global optimization from the initial frame to the last frame in which the first kilometer marker appears. Let C be the set of feature points matched between the initial frame and the last frame in which the first kilometer marker appears, B be the set of all IMU data, and L be the set of all image frames in which kilometer marker N is detected. The objective function for relocalization is constructed as follows:

[0139]

[0140] where $r_p - H_p \chi$ represents the prior information after marginalizing all sliding windows. The remaining terms are the residuals of the IMU inertial constraints between every two consecutive frames; the reprojection error of the visual landmarks, where (l, k) denotes the l-th image feature observed in the k-th image frame; and the reprojection error of the kilometer marker vertices.

[0141] When multiple kilometer markers are detected, an optimization problem is constructed only between the two most recent kilometer markers for global optimization, as shown in Figure 10(b), which illustrates the global optimization between the two most recent kilometer markers. The relocalization optimization objective function remains Equation (5). Here, set C represents the set of feature points matched between the last frame of the current kilometer marker and the first frame of the previous kilometer marker, set B represents all IMU data within the interval, and set L is the set of all image frames in which the current kilometer marker N and the previous kilometer marker N-1 are detected. The visual inertial odometry optimization method that fuses kilometer marker information turns the poses of all image frames in which kilometer markers are detected into constraints, constructing a large-scale optimization problem that can effectively eliminate the cumulative error of the positioning system.

[0142] Implementation Method Eight: This implementation method proposes a railway train visual inertial positioning system assisted by kilometer marker information. The positioning system includes:

[0143] The image information acquisition unit is used to acquire RGB images of the front of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU using a visual inertial odometer.

[0144] A kilometer marker detection edge unit is constructed based on the VINS Fusion framework, and the kilometer marker pixel position is obtained based on the kilometer marker detection module.

[0145] The kilometer marker recognition unit identifies the pixel location area of the kilometer marker based on an OCR text recognizer, and obtains the location information of the kilometer marker on the electronic map.

[0146] The kilometer marker vertex coordinate extraction unit extracts the kilometer marker vertices and their coordinates based on the LSD line detection algorithm combined with the electronic map location information.

[0147] The global optimization unit establishes a global optimization objective function with kilometer marker position information constraints based on the kilometer marker vertex coordinates, and improves the positioning accuracy of railway trains through a factor graph optimization method with multi-source information.

[0148] The positioning system described in this embodiment is based on the railway train visual inertial positioning method assisted by kilometer marker information as described in embodiments one to seven.

[0149] The following describes the verification of the above implementation methods, specifically including the following steps:

[0150] The hardware-in-the-loop simulation experimental platform shown in Figure 11 is set up. In this embodiment, a ZED2 binocular camera and its built-in IMU are used for data acquisition. The camera is fixed on an unmanned vehicle that can be remotely controlled by a transmitter, and small kilometer markers are deployed in the field. With six cameras of the external vision measurement system and the cooperative targets fixed on the unmanned vehicle, the entire experimental scene is fully covered. The system measures the movement of the unmanned vehicle and outputs the camera pose in real time; the IMU data rate is 100Hz and the image rate is 30Hz.

[0151] Using the external visual measurement system shown in Figure 11, together with a cooperative target fixed to the unmanned vehicle, the vehicle's trajectory was recorded, as were the small kilometer markers (10cm x 8cm) deployed on both sides of the trajectory. Two trajectories were recorded and mapped onto an electronic map. Trajectory 1 covered a total distance of 18.023m, and Trajectory 2 covered a total distance of 26.575m.

[0152] The localization results of the two experimental sequences were tested using VINS-Fusion and the localization method described in this paper, respectively. The standard deviation (Std) and root-mean-square error (RMSE) are shown in Table 1.

[0153]

[0154] Referring to Table 1 and Figure 12, a comparison of the positioning method described in this embodiment with the existing VINS-Fusion technology shows that the positioning errors of trajectory 1 and trajectory 2 obtained with the positioning method of this embodiment are smaller than those of VINS-Fusion.

[0155] As can be seen from Table 1 and Figure 12, the algorithm used in this invention is compared with the existing VINS-Fusion technology on two recorded segments of the unmanned vehicle's operating trajectory. In trajectory 1, the Std (standard deviation) of the algorithm used in this invention is 0.025m, while that of the existing technology is 0.040m; the RMSE (root-mean-square error) of the algorithm used in this invention is 0.061m, while that of the existing technology is 0.134m. This means that the cumulative error of the positioning system can be eliminated and the accuracy of the positioning system can be significantly improved. The average error of the positioning result of trajectory 1 is 0.055m, a reduction of 57.0% compared with that before fusing kilometer marker information. In trajectory 2, the Std of the algorithm used in this invention is 0.034m, while that of the existing technology is 0.055m; the RMSE of the algorithm used in this invention is 0.072m, while that of the existing technology is 0.180m. The average error of the positioning result of trajectory 2 is 0.064m, a reduction of 62.6%.
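The error statistics quoted above can be computed from per-frame position errors as in the following sketch. The function names and the sample error values are hypothetical; the population form of the standard deviation is assumed. Under that form RMSE² = mean² + Std², and the reported figures for the proposed method agree with this identity to within rounding (for trajectory 1, 0.055 and 0.025 combine to about 0.060 against a reported 0.061).

```python
import math

def error_stats(errors):
    """Return (mean, std, rmse) of a list of absolute position errors in meters."""
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean, std, rmse

def reduction_percent(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical per-frame errors, for illustration only.
mean, std, rmse = error_stats([0.03, 0.05, 0.07])

# Consistency check on the reported trajectory 1 figures (mean 0.055, Std 0.025).
rmse_from_reported = math.hypot(0.055, 0.025)
```

The same identity can be used to sanity-check the trajectory 2 figures (mean 0.064m, Std 0.034m, RMSE 0.072m).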

[0156] This embodiment is described with reference to Figure 13. Figure 13(a) shows the positioning trajectory of trajectory 1 obtained with the existing VINS-Fusion algorithm on the left, and the positioning trajectory obtained with the algorithm used in this invention on the right. As can be seen from the figure, this invention constructs a large-scale optimization problem and performs global optimization, so the positioning error distribution is relatively uniform. In contrast, the positioning error when using only binocular visual inertial odometry varies greatly across different intervals. The algorithm used in this invention has an average positioning error of 0.055m in trajectory 1, which is 57.0% lower than before fusing kilometer marker information. Figure 13(b) shows the positioning trajectory of trajectory 2 obtained with the existing VINS-Fusion algorithm on the left, and the positioning trajectory obtained with the algorithm used in this invention on the right. It can be seen from the figure that the algorithm used in this invention can eliminate the cumulative error of the positioning system and significantly improve its accuracy. Specifically, the average error of the positioning result of trajectory 2 is 0.064m, a reduction of 62.6%.

[0157] In addition, this implementation method conducted a separate vertex extraction experiment on physical kilometer markers to verify the accuracy of kilometer marker vertex detection. The experimental results are shown in Figure 14. Figure 14(a) is a schematic diagram of kilometer marker vertex extraction: kilometer marker target detection is performed on the original image to determine the corner detection area, shown on the right of Figure 14(a). Figure 14(b) illustrates the process of eliminating interfering line segments in LSD line detection: LSD line detection is performed on this area, and the aforementioned screening method is used to eliminate interfering line segments from the detection results, retaining the desired detection results. Finally, four edge lines are fitted, and their intersection points are taken as the kilometer marker vertices. Experimental results show that the kilometer marker fusion algorithm proposed in this embodiment can eliminate the cumulative error of the binocular visual inertial odometry system, reducing the average error of the positioning results in the two physical experiments by 57.0% and 62.6%, respectively.

[0158] Those skilled in the art will understand that the features described in the various embodiments and / or claims of this disclosure can be combined or integrated in various ways, even if such combinations or integrations are not explicitly described in this disclosure. In particular, the various embodiments and features of this disclosure can be combined in various ways without departing from the spirit and teachings of this disclosure. All such combinations fall within the scope of this disclosure.

[0159] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention. Clearly, those skilled in the art can make various alterations and modifications to the invention without departing from its spirit and scope. Thus, if these modifications and variations of the invention fall within the scope of the claims and their equivalents, the invention is also intended to include these modifications and variations.

Claims

1. A railway train visual inertial positioning method based on kilometer marker information, characterized in that the positioning method includes: using a visual inertial odometer to acquire RGB images of the front of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU; constructing a kilometer marker detection module based on the VINS Fusion framework, and obtaining the kilometer marker pixel position based on the kilometer marker detection module; recognizing, based on an OCR text recognizer, the digital semantic information of the kilometer marker in the pixel location area of the kilometer marker, and obtaining the location information of the kilometer marker on the electronic map; extracting kilometer marker vertices and vertex coordinates based on the LSD line detection algorithm combined with the location information of the electronic map; and establishing a global optimization objective function containing kilometer marker position information constraints based on the kilometer marker vertex coordinates, and improving the positioning accuracy of the railway train by using the factor graph optimization method. The method for extracting the coordinates of the kilometer marker vertices is as follows: the coordinates of the kilometer marker vertices in the world coordinate system, and the coordinates of the kilometer marker vertices in the camera coordinate system at time i, are obtained based on the location information of the electronic map: (1) where N represents the kilometer marker number and j = 1, 2, 3, 4 represents the four vertices of a rectangular kilometer marker. The reprojection error of the kilometer marker vertices, the reprojection error of the visual landmark points, and the error of the multi-source information from the IMU are calculated separately, using the following formulas, to obtain the optimization objective function for relocalization. The reprojection error of the kilometer marker vertices is: (2) where $(f_x, f_y, c_x, c_y)$ are the camera's internal parameters. The reprojection error of the visual landmarks is: (3) The error of the IMU's multi-source information between any two image keyframes $b_k$ and $b_{k+1}$ is: (4) where $[\cdot]_{xyz}$ represents the operation of extracting the real part of a quaternion, and $[r_p, r_q, r_v, r_{ba}, r_{bg}]$ represent the position, rotation, velocity, accelerometer, and gyroscope error terms, respectively; the accelerometer error represents the random walk of the acceleration in the IMU coordinate system, and the gyroscope error represents the random drift of the gyroscope in the IMU's own coordinate system. The reprojection errors of the kilometer marker vertices, the reprojection errors of the visual landmarks, and the errors of the multi-source information from the IMU are summed to obtain the optimization objective function for relocalization, which is as follows: (5) Here, C is defined as the set of feature points matched between the initial frame of the positioning system and the last frame in which the first kilometer marker appears; B is defined as the set of multi-source information data from all IMUs; and L is defined as the set of all image frames in which kilometer marker N is detected. $r_p - H_p \chi$ is the prior information after marginalizing all sliding windows. The remaining terms are the residuals of the IMU inertial constraints between every two consecutive frames; the reprojection error of the visual landmarks, where (l, k) denotes the l-th image feature observed in the k-th image frame; and the reprojection error of the kilometer marker vertices.

2. The railway train visual inertial positioning method based on kilometer marker information as described in claim 1, characterized in that, A kilometer marker detection module is constructed based on the VINS Fusion framework. The method for obtaining the kilometer marker pixel position based on the kilometer marker detection module is as follows: The kilometer marker detection module obtains the kilometer marker pixel position based on the YOLOv5s network model, which includes an input end, a backbone network, a neck network, and an output end. The YOLOv5s network model takes RGB images as input and preprocesses the RGB images. The backbone network is used to detect RGB images at different scales; The neck network fuses RGB images of different scales to enable detection of RGB images at various scales; the output network outputs RGB images of different scales to detect objects of different sizes, with each RGB image containing 3... The predicted bounding box contains the object's confidence score and location information. The weighted NMS method is used to remove duplicate location information, thus finding the object's detection location and completing the target detection.

3. The railway train visual inertial positioning method based on kilometer marker information as described in claim 1, characterized in that, The method for recognizing the semantic information of kilometer marker numbers in the pixel location region based on an OCR text recognition device is as follows: It is implemented based on the CRNN character recognition model, which includes a CNN convolutional neural network layer, an RNN recurrent neural network layer, and a CTC transcription layer; The CNN convolutional neural network layer extracts features from the pixel location region of the kilometer marker to obtain the image feature vector of the kilometer marker; The image feature vector of the kilometer marker is passed to an RNN recurrent neural network layer to complete the distribution of the feature vector; The CTC transcription layer transforms the image feature vector distribution of the predicted kilometer markers from the RNN recurrent neural network layer into sequence labels, outputting digital information to complete the recognition.

4. The railway train visual inertial positioning method based on kilometer marker information as described in claim 1, characterized in that the kilometer marker vertices are extracted based on the LSD line detection algorithm combined with the electronic map location information, specifically as follows: the type of a line is distinguished by the slope of the line segments on the same edge; the LSD algorithm detects the type of each candidate line and removes candidate line segments that do not meet the requirements of the line type; initial edge extraction is performed to determine the number of edge segments to be detected, the kilometer marker having four edge lines to detect; four groups of edge segments with high consistency are selected according to the evaluation function, and the total least squares method is used for line fitting; after the initial edge extraction, the edge segment extraction results are repeatedly evaluated based on the feature similarity of each edge line, similar edge segments are merged and new edge segments are fitted, and the set of new edge segments is continuously supplemented; the above process is repeated until 4 dissimilar kilometer marker edge segments are determined. The vertices of the kilometer marker are determined as follows: the four dissimilar kilometer marker edge segments form a rectangle, and the intersections of the edge segments of this rectangle are the vertices of the kilometer marker.
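
The last two steps of claim 4, total least squares line fitting and vertex extraction by intersecting the four edge lines, can be sketched as follows. This is an illustrative sketch under stated assumptions: the LSD detection, the evaluation function, and the merging of similar segments are not shown, and the function names and the edge ordering (top, right, bottom, left) are hypothetical.

```python
import numpy as np

def fit_line_tls(points):
    """Total least squares line fit through 2D points.

    Returns (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1. The line
    normal is the direction of least variance of the centered points.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return np.array([normal[0], normal[1], -normal @ centroid])

def intersect(l1, l2):
    """Intersection of two lines given in homogeneous form (a, b, c)."""
    p = np.cross(l1, l2)
    if abs(p[2]) < 1e-12:
        raise ValueError("lines are parallel")
    return p[:2] / p[2]

def rectangle_vertices(top, right, bottom, left):
    """Vertices of the quadrilateral bounded by four edge lines,
    ordered top-left, top-right, bottom-right, bottom-left."""
    return np.array([
        intersect(left, top),
        intersect(top, right),
        intersect(right, bottom),
        intersect(bottom, left),
    ])
```

Total least squares minimizes perpendicular distance to the line, so it behaves the same for near-vertical and near-horizontal edges, which an ordinary least squares fit of y on x does not.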

5. A railway train visual inertial positioning system based on kilometer marker information, characterized in that the system includes:
an image information acquisition unit, used to acquire RGB images of the scene ahead of the railway train, kilometer marker digital semantic information, and multi-source information from the IMU using a visual inertial odometer;
a kilometer marker detection unit, in which a kilometer marker detection module is constructed based on the VINS Fusion framework and the kilometer marker pixel position is obtained based on the kilometer marker detection module;
a kilometer marker recognition unit, which recognizes the digital semantic information in the pixel location area of the kilometer marker based on an OCR text recognizer and obtains the location information of the kilometer marker on the electronic map;
a kilometer marker vertex coordinate extraction unit, which extracts the kilometer marker vertices and their coordinates based on the LSD line detection algorithm combined with the electronic map location information;
a global optimization unit, which establishes a global optimization objective function with kilometer marker position information constraints based on the kilometer marker vertex coordinates, and improves the positioning accuracy of the railway train through a factor graph optimization method with multi-source information.
The coordinates of the kilometer marker vertices are extracted as follows: the coordinates of the kilometer marker vertices in the world coordinate system are obtained based on the location information of the electronic map, and the coordinates of the kilometer marker vertices in the camera coordinate system at time i are given by formula (1), where N represents the kilometer marker number and the accompanying index identifies the four vertices of the rectangular kilometer marker.
The reprojection error of the kilometer marker vertices, the reprojection error of the visual landmarks, and the error of the IMU multi-source information are calculated separately to obtain the optimization objective function for relocalization, using the following formulas. The reprojection error of the kilometer marker vertices is given by formula (2), where the remaining symbols are the camera's intrinsic parameters. The reprojection error of the visual landmarks is given by formula (3). For any two image keyframes b_k and b_{k+1}, the error of the IMU multi-source information between the two keyframes is given by formula (4), where one operator denotes extracting the real part of a quaternion, and the error terms represent position, rotation, velocity, accelerometer error, and gyroscope error, respectively. The accelerometer error represents the random walk of the acceleration in the IMU coordinate system, and the gyroscope error represents the random drift of the gyroscope in the IMU's own coordinate system.
The reprojection errors of the kilometer marker vertices, the reprojection errors of the visual landmarks, and the errors of the IMU multi-source information are summed to obtain the optimization objective function for relocalization, given by formula (5). Here, C is defined as the set of feature points matched between the initial frame of the positioning system and the last frame in which the first kilometer marker appears; B is defined as the set of all IMU multi-source information data; and L is defined as the set of all image frames in which kilometer marker N is detected. The objective function comprises the prior information obtained after marginalizing all sliding windows; the residuals of the IMU inertial constraints between every two frames; the reprojection error of the visual landmarks, evaluated for an image feature observed in the k-th image frame; and the reprojection error of the kilometer marker vertices.
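
The formulas referenced in claim 5 are not reproduced in this text, so the following is only a generic sketch of what a reprojection residual for the four kilometer marker vertices looks like under a standard pinhole camera model. It is not the patent's formula (2). The names `K`, `R_cw`, `t_cw`, and both function names are assumptions; `R_cw` and `t_cw` are taken to map world coordinates into the camera frame.

```python
import numpy as np

def project(K, R_cw, t_cw, points_w):
    """Project (N, 3) world points to (N, 2) pixel coordinates.

    Assumed convention: p_cam = R_cw @ p_world + t_cw, and K is the
    3x3 intrinsic matrix of a pinhole camera without distortion.
    """
    pts_c = np.asarray(points_w, dtype=float) @ R_cw.T + t_cw
    uvw = pts_c @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def vertex_reprojection_residual(K, R_cw, t_cw, vertices_w, vertices_px):
    """Stacked residual (predicted minus observed pixels) for the marker
    vertices. vertices_w would come from the electronic map, vertices_px
    from the vertex extraction step; both are (4, 2)- or (4, 3)-shaped."""
    predicted = project(K, R_cw, t_cw, vertices_w)
    return (predicted - np.asarray(vertices_px, dtype=float)).ravel()
```

In a factor graph optimizer, a residual of this shape would be one factor per frame in which the marker is detected, alongside the IMU and visual landmark factors. Because the vertex world coordinates are fixed by the map, such a factor anchors the pose globally in the way a loop closure otherwise would.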

6. A computer device, characterized in that it includes a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the railway train visual inertial positioning method based on kilometer marker information as described in any one of claims 1-4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, performs the railway train visual inertial positioning method based on kilometer marker information as described in any one of claims 1-4.