Railway light object identification method based on AI identification

By using AI multimodal large models and dynamic frame sampling technology, combined with three-dimensional trajectory calculation and multi-dimensional early warning, the problems of misjudgment and false alarm in railway light object identification have been solved, and high-precision light object identification and accurate early warning have been achieved.

CN122244837A · Pending · Publication Date: 2026-06-19 · CHINA TIESIJU CIVIL ENGINEERING GROUP CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA TIESIJU CIVIL ENGINEERING GROUP CO LTD
Filing Date
2026-05-21
Publication Date
2026-06-19


Abstract

This invention belongs to the field of railway safety monitoring and intelligent image recognition technology, and relates to an AI-based method for identifying lightweight objects on railways. This invention dynamically adjusts the frame sampling frequency and utilizes a pre-trained multimodal large model for cross-modal feature comparison and false alarm elimination, outputting the probability of the lightweight object category and its location coordinates. It iteratively calculates the drift trajectory envelope and determines the minimum safe distance. Simultaneously, it integrates image quality factors, recognition consistency factors, environmental interference factors, and historical data correction factors to obtain a weighted confidence score. Finally, it matches graded warnings based on category risk weights, minimum safe distances, and confidence scores. This invention solves the problems of high false alarm rates for dynamic interference objects, deviations in the mapping between planar projection and three-dimensional motion envelope space, and frequent invalid alarms due to the lack of comprehensive multi-dimensional factors in traditional two-dimensional image detection, thus improving the accuracy and effectiveness of lightweight object identification and warnings in complex railway environments.

Description

Technical Field

[0001] This invention belongs to the field of railway safety monitoring and intelligent image recognition technology, and relates to a method for identifying lightweight objects on railways based on AI recognition.

Background Technology

[0002] With the increasing density of railway networks, especially the large-scale construction of high-speed railways, the safety of railway operations along the lines faces extremely high demands. Among various external environmental hazards, lightweight floating objects such as agricultural plastic film, dust nets, and kite strings, blown up by the wind, are easily carried by airflow onto railway overhead contact lines or power supply lines, potentially causing short circuits, power outages, or even train stoppages. To achieve early detection of such hazards, current technologies generally employ video surveillance equipment deployed along railway lines combined with computer vision technology to achieve automated detection.

[0003] However, existing conventional visual detection schemes still have the following shortcomings in practical railway scenarios: First, traditional two-dimensional image detection models mainly classify targets based on the geometric and texture features of image pixels. In real open environments, dynamic elements such as birds flying across the scene, branches swaying in the wind, halos produced by locomotive headlights at night, and reflections from fixed ground structures have imaging characteristics highly similar to those of lightweight floating objects, leading to frequent misjudgments in the process of identifying lightweight objects on railways.

[0004] Secondly, existing early warning calculations are mostly based on single-frame two-dimensional images for planar projection distance estimation, using a static pixel scale to map the target's spatial position. Lightweight objects, driven by continuous airflow, exhibit non-linear motion characteristics, resulting in spatial mapping discrepancies between the two-dimensional instantaneous coordinates and the actual motion envelope of the three-dimensional power supply corridor. The geometric differences between the planar distance calculation logic and the three-dimensional physical environment cause a deviation in the quantitative assessment of the target's actual approach trend in the railway lightweight object identification process, thus limiting the ability to predict foreign object intrusion into electrified safety areas.

[0005] Finally, when outputting alarm signals, existing systems often fail to comprehensively consider factors such as the clarity of the current image, nighttime lighting conditions, obstruction of the field of view by severe weather such as heavy fog and rain, and the stability of target tracking over consecutive frames. Furthermore, they make only rudimentary use of the false-trigger history accumulated for similar target types over long-term operation. This leads to a large number of invalid alarms in harsh environments or when equipment is aging, reducing the overall effectiveness of railway lightweight object identification.

Summary of the Invention

[0006] In view of this, in order to solve the problems mentioned in the background technology, a method for identifying lightweight objects on railways based on AI recognition is proposed.

[0007] The objective of this invention can be achieved through the following technical solution: A method for identifying lightweight objects on railways based on AI recognition, comprising: acquiring real-time video streams along the railway line, dynamically adjusting the frame sampling frequency according to the current environmental wind speed, obtaining sampled frames, and generating image URLs.

[0008] The image URL is input into a pre-trained multimodal large model for light object recognition. The visual feature vector of the image is extracted and compared across modalities with the feature templates, and the category probability and location coordinates of the light object target are output.

[0009] The corresponding material density and windward area are matched according to the category probability. Combined with the location coordinates, ambient wind speed and wind direction, the drift trajectory envelope of the light object target is calculated. The spatial distance between the drift trajectory envelope and the three-dimensional spatial model of the overhead contact line is calculated, and the minimum safe distance is obtained.

[0010] Image quality factors are calculated based on sampled frames, recognition consistency factors are calculated based on the intersection-union ratio of position coordinates in multiple consecutive frames, environmental interference factors are calculated based on weather visibility, historical data correction factors are calculated based on historical records of similar targets, and confidence scores are obtained by weighted summation.

[0011] The category risk weight is determined based on the category probability of the light object target. The category risk weight, minimum safe distance and confidence score are matched and the corresponding graded warning is output.

[0012] Compared with the prior art, the beneficial effects of the present invention are as follows: (1) The present invention dynamically adjusts the frame sampling frequency, performs cross-modal feature comparison with the multimodal large model, and applies similarity interval judgments for the features of flying birds, tree branches, fixed structures and light reflections, thereby eliminating interference targets whose imaging features closely resemble those of light objects. This solves the problem of traditional two-dimensional image detection models frequently misjudging dynamic interference elements in open environments, and achieves accurate identification of light object targets and elimination of false alarms in complex railway scenarios.

[0013] (2) This invention matches the material density and windward area according to the category probability, calculates a drift trajectory envelope starting from the position coordinates, and spatially intersects the envelope with the three-dimensional spatial model of the overhead contact line to obtain the minimum safe distance. This compensates for the spatial mapping deviation between the two-dimensional instantaneous coordinates and the actual motion envelope in the three-dimensional power supply corridor, and enables three-dimensional quantitative evaluation of the nonlinear drift trajectory of light objects driven by airflow, thus improving the accuracy of predicting foreign object intrusion into the electrified safety area.

[0014] (3) This invention fuses multiple factors into a confidence score and inputs the confidence score, category risk weight and minimum safe distance into a multi-condition rule set for matching and graded early warning. This solves the problem of large numbers of invalid alarms caused by failing to comprehensively consider multi-dimensional reliability factors in harsh environments or when equipment is aging. By balancing the factors, it reduces the false alarm rate of railway light object identification and avoids missed alarms caused directly by a defect in a single dimension, thereby improving the overall effectiveness of alarms.

Attached Figure Description

[0015] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0016] Figure 1 is a flowchart of the railway lightweight object identification method based on AI recognition described in this invention.

[0017] Figure 2 is a flowchart of the method for obtaining the category probability and location coordinates in this invention.

[0018] Figure 3 is a flowchart of the method for obtaining the confidence score in this invention.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0020] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0021] The following description, in conjunction with the accompanying drawings, details a specific scheme for an AI-based method for identifying lightweight objects on railways provided by this invention.

[0022] As shown in Figure 1, the implementation of this invention includes steps S1 to S5. Lightweight objects along railway lines easily intrude into the overhead contact line under complex weather conditions, and traditional monitoring methods suffer from fixed sampling strategies, high false alarm rates, a lack of spatial trajectory prediction, and limited early warning dimensions. To address these technical problems, this invention obtains image URLs through a dynamic frame sampling method that adapts to the environmental wind speed. These URLs are then input into a pre-trained multimodal large model for lightweight object recognition. Cross-modal feature comparison between the visual encoder and the text encoder, together with similarity interval filtering against false alarm exclusion items, yields high-precision category probabilities and location coordinates.

[0023] Based on this, aerodynamic iterations are performed using the matched material density and windward area to generate a three-dimensional drift trajectory envelope, which is spatially intersected with the three-dimensional model of the overhead contact line to obtain the minimum safe distance. Simultaneously, the image quality, recognition consistency, environmental interference, and historical data correction factors are weighted and fused to generate a comprehensive confidence score. The category risk weight, minimum safe distance, and confidence score are then input into a multi-level threshold rule set for logical combination and matching to achieve red, orange, yellow, and blue graded early warning.

[0024] S1. Acquire real-time video streams along the railway line, dynamically adjust the frame sampling frequency according to the current environmental wind speed, obtain the sampled frames, and generate image URLs.

[0025] A fixed frame sampling frequency adapts poorly to changes in environmental wind speed. When the wind speed is high and light objects move quickly, a high sampling frame rate is required to capture clear and distinguishable images; when the wind speed is low, the sampling frame rate can be reduced to save computing and transmission resources. Therefore, the environmental wind speed of the monitoring area is acquired in real time and compared with a preset wind speed threshold, the frame sampling frequency is selected dynamically to sample the real-time video stream, and the sampled frames are uploaded to the server to generate image URLs.

[0026] In one specific embodiment, real-time video streams are acquired from surveillance cameras along the railway line, while the ambient wind speed in the current monitoring area is obtained via a wind speed sensor. The ambient wind speed is compared with a preset wind speed threshold. When the ambient wind speed is greater than or equal to the preset wind speed threshold, the real-time video stream is sampled at a first preset frequency; otherwise, it is sampled at a second preset frequency, resulting in sampled frames. The first preset frequency is higher than the second preset frequency. The obtained sampled frames are transmitted to a server to generate an image URL.

[0027] The preset wind speed threshold is set to 6.0 m/s based on the measured wind speed at which light objects become airborne; the first preset frequency and the second preset frequency are set to 10 frames/second and 2 frames/second, respectively.
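As an illustrative, non-limiting sketch, the frequency selection of step S1 could be expressed as follows. The function names `select_sampling_fps` and `sample_indices`, and the idea of decimating a fixed-rate camera stream by a frame stride, are assumptions introduced for illustration; only the 6.0 m/s threshold and the 10 and 2 frames/second values come from the embodiment.

```python
def select_sampling_fps(wind_speed_mps: float,
                        threshold_mps: float = 6.0,
                        high_fps: int = 10,
                        low_fps: int = 2) -> int:
    """Pick the first preset frequency at or above the threshold, else the second."""
    if wind_speed_mps < 0:
        raise ValueError("wind speed must be non-negative")
    return high_fps if wind_speed_mps >= threshold_mps else low_fps


def sample_indices(stream_fps: int, sampling_fps: int, n_frames: int) -> list:
    """Indices of the frames kept when a stream is decimated to sampling_fps.

    Assumption: the camera delivers frames at a constant stream_fps, so
    sampling reduces to keeping every stride-th frame.
    """
    stride = max(1, round(stream_fps / sampling_fps))
    return list(range(0, n_frames, stride))


# Example: a 30 frames/second stream observed at 7.5 m/s wind speed.
fps = select_sampling_fps(7.5)
kept = sample_indices(stream_fps=30, sampling_fps=fps, n_frames=10)
```

With these values the higher frequency is selected, because the wind speed is at or above the threshold, and every third frame of the stream is kept.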

[0028] S2. Input the image URL into the pre-trained multimodal large model for light object recognition, extract the image visual feature vector and perform cross-modal comparison with the feature template, and output the category probability and location coordinates of the light object target.

[0029] Single-vision detection models are susceptible to false alarms in open railway environments due to interference from birds, tree branches, fixed structures, or light reflections, and traditional models lack semantic generalization ability for railway-specific lightweight objects, making it difficult to accurately distinguish target categories and location boundaries. To address this, the image URL is input into a pre-trained multimodal large model for lightweight object recognition. A visual encoder and a text encoder extract the image visual feature vector and the feature template feature vectors, respectively. Similarity is calculated to eliminate false alarms, and a language decoder decodes the retained target features to output category probabilities and location coordinates.

[0030] In one specific embodiment, a visual language model from the Qwen2-VL series, built on the Qwen architecture, is used as the multimodal large model for light object recognition. This model includes a visual encoder, a text encoder, and a language decoder. The visual encoder divides the sampled frame corresponding to the image URL into fixed-size image blocks and maps them to a sequence of visual tokens. After multi-layer self-attention feature extraction, an image visual feature vector is generated.

[0031] Simultaneously, a pre-built railway industry-specific lightweight object feature template library and a preset structured query prompt text are loaded through the text encoder. After tokenization and self-attention feature extraction, feature template feature vectors and a query prompt text feature vector are generated.

[0032] The feature templates in the railway industry-specific lightweight object feature template library are text descriptions of the material and appearance attributes of typical lightweight objects along railway lines, such as dust nets and corrugated steel sheets. These templates are obtained by collecting historical images of potential lightweight object hazards and manually extracting the corresponding text tags. The preset query prompt text is a set of instruction texts that guides the model to produce its output in a specific format. These texts are written manually according to the requirements of the target detection task.

[0033] The training process of the multimodal large model for lightweight object recognition includes: using the open-source pre-trained Qwen2-VL model as the base weights and constructing a training set of lightweight object image-text pairs for railway scenes, which includes lightweight object images with bounding box annotations and category labels together with the corresponding structured descriptive text; applying a low-rank adaptation (LoRA) fine-tuning strategy that freezes the main parameters of the visual encoder and text encoder and updates only the parameters of the low-rank matrices inserted in the Transformer layers and the language decoder; and using the autoregressive cross-entropy loss function as the optimization objective with the AdamW optimizer, training iteratively through backpropagation until the loss function converges.

[0034] Further, as shown in Figure 2, the method for obtaining the category probability and location coordinates includes the following steps: S21. Calculate the cosine similarity between the image visual feature vector and each feature template feature vector (the templates include positive sample templates of light objects and negative sample templates of flying birds, tree branches, fixed structures, light reflections, etc.).

[0035] S22. Compare the calculated cosine similarity with the false alarm similarity threshold. The false alarm similarity threshold is the similarity boundary value that distinguishes light objects from various types of interference. It is determined by collecting various interference samples and inputting them into the model, calculating the similarity distribution between their image features and the corresponding interference text feature templates, taking the upper limit of the distribution and adding a preset safety margin such as 0.05.

[0036] If the cosine similarity is greater than the corresponding false alarm similarity threshold, the target is determined to fall within the similarity range of a false alarm exclusion item and is removed; otherwise, the target is retained.
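As an illustrative, non-limiting sketch, the comparison in steps S21 and S22 could be written as below. The feature vectors are shown as plain Python lists, and the template names and threshold values are hypothetical; in practice the vectors would come from the visual and text encoders and the thresholds from the calibration described above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_false_alarm(image_vec, negative_templates, thresholds):
    """Return True if the target falls in the range of any exclusion item.

    negative_templates maps an interference name to its template vector;
    thresholds maps the same name to its false alarm similarity threshold.
    """
    for name, template_vec in negative_templates.items():
        if cosine_similarity(image_vec, template_vec) > thresholds[name]:
            return True
    return False


# Hypothetical two-dimensional vectors, for illustration only.
templates = {"flying_bird": [1.0, 0.0], "tree_branch": [0.0, 1.0]}
limits = {"flying_bird": 0.8, "tree_branch": 0.8}
removed = is_false_alarm([0.9, 0.1], templates, limits)   # close to "flying_bird"
retained = not is_false_alarm([0.7, 0.7], templates, limits)
```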

[0037] S23. For each retained target, the image visual feature vector of the target and the query prompt text feature vector are concatenated end to end along the feature dimension and input into the language decoder. The autoregressive mechanism generates a text sequence in JSON format token by token. A JSON parsing tool extracts the category description and coordinate values from the text sequence. At the same time, the probability value calculated by the softmax function when the language decoder generates the category description word is extracted as the category probability. The extracted coordinate values are converted into position coordinates.
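As an illustrative, non-limiting sketch, the post-processing in step S23 could look like the following. The JSON keys `category` and `bbox`, the candidate category list, and the simplification that the category description is a single decoding step with one logit per candidate are all assumptions made for illustration; the embodiment does not specify the output schema.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parse_detection(decoded_text, category_logits, category_vocab):
    """Extract category, category probability and bounding box from decoder output.

    decoded_text    : JSON text generated by the language decoder (assumed schema)
    category_logits : decoder logits at the category step, one per candidate
    category_vocab  : candidate category names, aligned with category_logits
    """
    record = json.loads(decoded_text)
    category = record["category"]
    x1, y1, x2, y2 = (float(v) for v in record["bbox"])
    probability = softmax(category_logits)[category_vocab.index(category)]
    return {"category": category,
            "probability": probability,
            "bbox": (x1, y1, x2, y2)}


# Hypothetical decoder output and logits.
text = '{"category": "plastic_film", "bbox": [10, 20, 110, 220]}'
result = parse_detection(text, [2.0, 0.0], ["plastic_film", "dust_net"])
```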

[0038] S3. Match the corresponding material density and windward area according to the category probability, and calculate the drift trajectory envelope of the light object target by combining the location coordinates, ambient wind speed and wind direction. Calculate the spatial distance between the drift trajectory envelope and the three-dimensional spatial model of the overhead contact line to obtain the minimum safe distance.

[0039] The drift trajectory of lightweight objects in real three-dimensional space is constrained by the aerodynamic properties of their materials and by complex meteorological conditions, so two-dimensional image coordinates alone cannot accurately reflect the degree of physical threat they pose to railway overhead contact line equipment. Three-dimensional spatial evolution calculations and geometric relationship solutions are therefore necessary. Accordingly, the material density and windward area are matched according to the category probability, and a spatial coordinate sequence is generated through discrete-time step iterations from the initial coordinates and real-time meteorological parameters. A three-dimensional convex hull algorithm then generates the drift trajectory envelope, the Euclidean distances between the envelope and the three-dimensional spatial model of the overhead contact line are calculated, and their minimum is taken.

[0040] In one specific embodiment, the method for obtaining the drift trajectory envelope is as follows: First, based on the category probability, the target category with the highest probability value is selected as the current identification result of the lightweight object. Then, by querying a preset category-physical attribute mapping table, the corresponding material density and windward area are matched, and the target mass is determined by the material density and target volume. The category-physical attribute mapping table is constructed as follows: Common lightweight object samples along railway lines are collected; the material density is obtained by measuring the mass per unit area using a weighing method; the windward area under typical postures is obtained using a geometric measurement method; and the material density and windward area of each category are stored in the table as key-value pairs. For categories where samples cannot be directly obtained, values are assigned based on standard density values in material handbooks and empirically estimated areas.

[0041] Then, the position coordinates are transformed from the image pixel coordinate system to a three-dimensional spatial coordinate system, serving as the initial spatial coordinates for the drift trajectory. Specifically, assuming the bottom of the identified light, floating target is in contact with the ground, the pixel coordinates of the midpoint of the bottom edge of the target's bounding box are back-projected onto the ground plane using the intrinsic and extrinsic parameter matrices of the railway line monitoring camera. The coordinates of this point in the three-dimensional spatial coordinate system are then calculated using the known camera installation height and pitch angle. For targets determined to be in a suspended state through multi-frame tracking, the ground projection point is used as a reference, and its coordinates are corrected using multi-frame parallax estimation.
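As an illustrative, non-limiting sketch, the ground-plane back-projection for a target resting on the ground could be computed as below. It assumes an ideal pinhole camera with no roll, no yaw and no lens distortion, mounted at a known height and pitched downward by a known angle. The world frame (X to the right, Y forward along the ground, Z up, origin on the ground directly below the camera) and all parameter names are assumptions introduced here; the multi-frame parallax correction for suspended targets is not shown.

```python
import math


def bottom_center(bbox):
    """Pixel coordinates of the midpoint of the bounding box bottom edge."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y2)


def backproject_to_ground(u, v, fx, fy, cx, cy, height_m, pitch_rad):
    """Intersect the viewing ray of pixel (u, v) with the ground plane Z = 0.

    fx, fy, cx, cy : pinhole intrinsics in pixels
    height_m       : camera installation height above the ground
    pitch_rad      : downward pitch of the optical axis from horizontal
    """
    dx = (u - cx) / fx          # ray direction in the camera frame (z = 1)
    dy = (v - cy) / fy          # image y axis points downward
    # Downward component of the ray in the world frame; must be positive.
    down = dy * math.cos(pitch_rad) + math.sin(pitch_rad)
    if down <= 0.0:
        raise ValueError("the pixel ray does not intersect the ground plane")
    t = height_m / down
    forward = math.cos(pitch_rad) - dy * math.sin(pitch_rad)
    return (t * dx, t * forward, 0.0)


# Hypothetical camera: 1000 px focal length, 1920x1080 image, 10 m mast, 45 degree pitch.
u, v = bottom_center((900, 400, 1020, 540))
ground_point = backproject_to_ground(u, v, 1000.0, 1000.0, 960.0, 540.0,
                                     10.0, math.radians(45.0))
```

In this example the bottom-edge midpoint coincides with the principal point, so the ray follows the optical axis and meets the ground about 10 m in front of the mast.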

[0042] Then, using the initial spatial coordinates as the starting point for iteration, and combining ambient wind speed, wind direction, material density, windward area, and target mass, iterative calculations are performed according to a set iteration step size. In each iteration, using the wind direction unit vector as the direction, the ambient wind speed, windward area, air density, and aerodynamic drag coefficient of the light object category are multiplied by the iteration step size. The resulting product is then divided by the target mass to obtain the horizontal displacement increment. The air density is taken as the standard atmospheric density value. The aerodynamic drag coefficient is pre-calibrated according to the light object category; for example, it is 1.0-1.2 for plastic film, 0.8-1.0 for dust netting, and 0.5-0.8 for paper floating objects. The calibration method can be wind tunnel experiments or optimization based on historical drift trajectory data.

[0043] Simultaneously, the gravitational component is calculated based on the target mass. Combined with the vertical velocity, windward area, and aerodynamic drag coefficient of the current iteration step, the vertical air resistance is calculated. The resultant force in the vertical direction equals the gravitational component minus the vertical air resistance (when the object falls, air resistance is upward, opposite to the direction of gravity). The magnitude of the vertical air resistance is determined by the vertical velocity, windward area, air density, and aerodynamic drag coefficient, and its direction is opposite to the vertical velocity. The vertical acceleration is calculated using Newton's second law and then integrated to obtain the vertical displacement increment. The horizontal and vertical displacement increments are added to the spatial coordinates of the current iteration step to obtain the spatial coordinates of the next iteration step, which then become the new current spatial coordinates. This iterative process is repeated, generating a spatial coordinate sequence indexed by iteration number.

[0044] The iteration step size is set to 0.1 to 0.5 seconds. The iteration terminates when the spatial coordinate sequence leaves the railway monitoring attention area or when the preset maximum number of iterations, such as 200, is reached. To improve computational efficiency, the iteration can be terminated early when the wind direction is towards the overhead contact line and the current spatial coordinates are detected to have intruded into the energized safety area of the overhead contact line.
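As an illustrative, non-limiting sketch, the iteration described above could be organized as follows. The horizontal increment is transcribed directly from the product given in the description (wind speed, windward area, air density, drag coefficient and step size, divided by target mass) and is not independently validated here. The description does not give a formula for the vertical air resistance, so the quadratic drag law and the semi-implicit velocity update, chosen for numerical stability with light objects, are assumptions, as are the function name and the `in_region` and `intruded` callbacks.

```python
import math

G = 9.81         # gravitational acceleration, m/s^2
RHO_AIR = 1.225  # standard atmospheric density, kg/m^3


def simulate_drift(start, wind_dir, wind_speed, area, mass, drag_coeff,
                   dt=0.2, max_iters=200,
                   in_region=lambda p: True, intruded=lambda p: False):
    """Generate the spatial coordinate sequence of a drifting light object.

    start      : initial (x, y, z) coordinates, z measured upward
    wind_dir   : horizontal wind direction (x, y); normalized internally
    in_region  : returns False once a point leaves the monitored area
    intruded   : returns True once a point enters the energized safety area
    """
    norm = math.hypot(wind_dir[0], wind_dir[1])
    ux, uy = wind_dir[0] / norm, wind_dir[1] / norm
    k = 0.5 * RHO_AIR * drag_coeff * area / mass   # assumed quadratic drag, per unit mass
    x, y, z = start
    vz = 0.0
    points = [(x, y, z)]
    for _ in range(max_iters):
        # Horizontal displacement increment, as described in the text.
        step = wind_speed * area * RHO_AIR * drag_coeff * dt / mass
        # Vertical velocity: gravity minus drag opposing the motion
        # (semi-implicit in the drag term so the update cannot overshoot).
        vz = (vz - G * dt) / (1.0 + k * abs(vz) * dt)
        x, y, z = x + ux * step, y + uy * step, z + vz * dt
        if not in_region((x, y, z)):
            break                      # left the monitoring attention area
        points.append((x, y, z))
        if intruded((x, y, z)):
            break                      # early termination on intrusion
    return points


# Hypothetical object released 10 m above the ground in an 8 m/s wind.
track = simulate_drift((0.0, 0.0, 10.0), (1.0, 0.0), wind_speed=8.0,
                       area=0.5, mass=2.0, drag_coeff=1.0, dt=0.1,
                       in_region=lambda p: p[2] >= 0.0)
```

The resulting list of points is the discrete point set from which the convex hull envelope would then be built.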

[0045] Finally, the generated spatial coordinate sequence is used as a discrete set of spatial points. The 3D convex hull algorithm is then used to extract the outermost convex polyhedron boundary of this discrete point set in 3D space. The surface of this convex polyhedron boundary is used as the drift trajectory envelope. The spatial coordinate points included in the drift trajectory envelope not only include all the vertices that constitute the envelope convex polyhedron, but also the densely sampled point set obtained by interpolating each edge of the convex polyhedron.

[0046] Furthermore, to assess the risk of intrusion into the overhead contact line during the drift of lightweight objects, the safe distance between the drift trajectory envelope and the overhead contact line must be calculated. The minimum safe distance is obtained as follows: traverse all vertices of the convex polyhedron of the drift trajectory envelope, calculate the Euclidean distance between each vertex and each triangular facet of the overhead contact line's geometric surface in the three-dimensional spatial model, and take the minimum over all vertex-facet pairs as the minimum safe distance. The three-dimensional spatial model of the overhead contact line is pre-constructed using lidar scanning and includes the geometric surfaces of components such as the catenary wire, contact wire, and droppers.
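As an illustrative, non-limiting sketch, the vertex-to-facet traversal could be implemented as below. The closest-point test uses the standard region-based method for a point and a triangle, which is one possible way to obtain the Euclidean distance to a facet; the function names and the brute-force double loop are assumptions, and degenerate (zero-area) facets are not handled.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(a, d, s):
    """Point a + s * d."""
    return (a[0] + s * d[0], a[1] + s * d[1], a[2] + s * d[2])


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle (a, b, c), by vertex/edge/face regions."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                   # vertex region a
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                   # vertex region b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _along(a, ab, d1 / (d1 - d3))       # edge ab
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                   # vertex region c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _along(a, ac, d2 / (d2 - d6))       # edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return _along(b, _sub(c, b), w)            # edge bc
    denom = 1.0 / (va + vb + vc)
    return _along(_along(a, ab, vb * denom), ac, vc * denom)   # face interior


def minimum_safe_distance(envelope_vertices, facets):
    """Minimum distance over all envelope vertices and all triangular facets."""
    return min(math.dist(p, closest_point_on_triangle(p, *tri))
               for p in envelope_vertices for tri in facets)


# Hypothetical data: one facet of the contact line model and two envelope vertices.
facet = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
d_min = minimum_safe_distance([(0.25, 0.25, 2.0), (0.5, -1.0, 0.0)], [facet])
```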

[0047] S4. Calculate the image quality factor based on the sampled frames, calculate the recognition consistency factor based on the intersection-union ratio of the position coordinates in multiple consecutive frames, calculate the environmental interference factor based on weather visibility, calculate the historical data correction factor based on the historical records of similar targets, and obtain the confidence score by weighted summation.

[0048] Recognition results at a single moment are often susceptible to interference from factors such as image blurring, sudden changes in lighting, severe weather occlusion, and short-term model fluctuations. Directly using raw probability output for early warning lacks fault tolerance and is prone to false alarms or missed alarms. Therefore, by constructing a multi-dimensional evaluation system that encompasses image acquisition quality, temporal positioning stability, environmental and meteorological conditions, and prior historical data, various interference factors are normalized, quantified, and weighted and fused to output a confidence score that reflects the reliability of the current recognition result.

[0049] In one specific embodiment, as shown in Figure 3, the method for obtaining the confidence score includes the following steps: S41. Use the Sobel operator to calculate the pixel gradients of the sampled frame in the horizontal and vertical directions, obtain the mean gradient magnitude, and map it to the interval 0 to 1 using min-max normalization to obtain the image sharpness component; convert the sampled frame from the RGB color space to the HSV color space, extract the brightness (V) channel, calculate the variance of all pixel values in this channel, and map it to the interval 0 to 1 using min-max normalization to obtain the illumination condition component; multiply the image sharpness component by the illumination condition component and normalize the product again to generate an image quality factor with a value range of 0 to 1.
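As an illustrative, non-limiting sketch, step S41 could be computed as follows on a small frame held as nested lists of (R, G, B) tuples. The normalization bounds `grad_range` and `var_range` are hypothetical calibration constants, the final normalization is interpreted here as clipping the product to 0 to 1, and the HSV brightness channel is taken as the per-pixel maximum of R, G and B, which is how the V channel is defined. A production system would more likely use an image library; plain Python is used only to keep the sketch self-contained.

```python
import math


def to_gray(rgb):
    """Luma-weighted grayscale conversion of an RGB frame."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def mean_sobel_magnitude(gray):
    """Mean gradient magnitude over interior pixels using 3x3 Sobel kernels."""
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = (gray[i - 1][j + 1] + 2 * gray[i][j + 1] + gray[i + 1][j + 1]
                  - gray[i - 1][j - 1] - 2 * gray[i][j - 1] - gray[i + 1][j - 1])
            gy = (gray[i + 1][j - 1] + 2 * gray[i + 1][j] + gray[i + 1][j + 1]
                  - gray[i - 1][j - 1] - 2 * gray[i - 1][j] - gray[i - 1][j + 1])
            total += math.hypot(gx, gy)
            count += 1
    return total / count if count else 0.0


def value_variance(rgb):
    """Variance of the HSV V channel, where V = max(R, G, B) per pixel."""
    values = [max(px) for row in rgb for px in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def min_max(x, lo, hi):
    """Min-max normalization to [0, 1], clipped at both ends."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def image_quality_factor(rgb, grad_range=(0.0, 400.0), var_range=(0.0, 6000.0)):
    sharpness = min_max(mean_sobel_magnitude(to_gray(rgb)), *grad_range)
    illumination = min_max(value_variance(rgb), *var_range)
    return min_max(sharpness * illumination, 0.0, 1.0)


# Hypothetical 4x4 frames: a flat gray frame and a frame with a sharp vertical edge.
flat = [[(128, 128, 128)] * 4 for _ in range(4)]
edge = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)] for _ in range(4)]
q_flat = image_quality_factor(flat)
q_edge = image_quality_factor(edge)
```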

[0050] S42. Obtain the bounding box coordinates of the same target tracked in multiple consecutive sampled frames. Calculate the ratio of the overlap area to the union area of the bounding boxes in the i-th frame and the (i-1)-th frame as the single-frame intersection-over-union ratio (IoU). Calculate the arithmetic mean of all single-frame IoU values to generate a recognition consistency factor with a value ranging from 0 to 1. Here, i is the number of the current frame for which the IoU is calculated, i-1 is the number of the immediately preceding frame, and i runs from 2 to the total number of consecutive frames.
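As an illustrative, non-limiting sketch, the recognition consistency factor of step S42 could be computed as follows. Bounding boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, and the function names are hypothetical.

```python
def box_area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_union_ratio(a, b):
    """Overlap area divided by union area of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def recognition_consistency(boxes):
    """Arithmetic mean of the ratios between each frame and the one before it."""
    if len(boxes) < 2:
        raise ValueError("at least two consecutive frames are required")
    ratios = [overlap_union_ratio(boxes[i - 1], boxes[i]) for i in range(1, len(boxes))]
    return sum(ratios) / len(ratios)


# Hypothetical track: the box is stationary for one frame, then shifts by half its width.
track_boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (5, 0, 15, 10)]
consistency = recognition_consistency(track_boxes)
```

For this track the two frame-to-frame ratios are 1 and 1/3, so the factor is 2/3.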

[0051] S43. Obtain the current weather visibility value collected by meteorological monitoring equipment along the railway line, perform a reciprocal operation on the weather visibility value to reflect the degree of interference, and then perform linear normalization using the reciprocal of the extreme low visibility value of railway safety concern as the upper limit and the reciprocal of the ideal high visibility value as the lower limit to generate an environmental interference factor with a value range of 0 to 1.
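As an illustrative, non-limiting sketch, the reciprocal mapping of step S43 could be written as below. The bounds of 50 m for the extreme low visibility and 10000 m for the ideal high visibility are hypothetical placeholders, since the description does not state numerical values.

```python
def environmental_interference_factor(visibility_m,
                                      low_visibility_m=50.0,
                                      high_visibility_m=10000.0):
    """Linearly normalize 1/visibility between 1/high (lower limit) and 1/low (upper limit)."""
    if visibility_m <= 0:
        raise ValueError("visibility must be positive")
    lower = 1.0 / high_visibility_m
    upper = 1.0 / low_visibility_m
    factor = (1.0 / visibility_m - lower) / (upper - lower)
    return min(1.0, max(0.0, factor))   # clip readings outside the calibrated range


# Hypothetical reading of 100 m visibility in heavy fog.
fog_factor = environmental_interference_factor(100.0)
```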

[0052] S44. Obtain the total number of historical alarms and the total number of historical false alarms for the same type of target. When the total number of historical alarms is greater than 0, calculate the ratio of the total number of historical false alarms to the total number of historical alarms as the historical false alarm rate, and subtract the historical false alarm rate from 1 to generate the historical data correction factor. When the total number of historical alarms is equal to 0, there is no historical prior data, and the historical data correction factor is assigned a preset default initial value, such as 1.0. In either case the historical data correction factor has a value range of 0 to 1.

[0053] S45. Assign corresponding weight coefficients to the image quality factor, recognition consistency factor, environmental interference factor, and historical data correction factor, respectively, with the sum of all weight coefficients being 1. The method for determining the weight coefficients is as follows: collect a historical detection sample set containing real alarms and false alarms, use the manually labeled confidence level as the expected value, and employ linear regression to optimize the values of each weight coefficient by minimizing the error between the weighted fusion result and the manually labeled confidence level. For example, the weight coefficient of the image quality factor can be set to 0.2, the weight coefficient of the recognition consistency factor to 0.4, the weight coefficient of the environmental interference factor to 0.2, and the weight coefficient of the historical data correction factor to 0.2.

[0054] Each factor is multiplied by its corresponding weight coefficient and the products are summed to generate a confidence score ranging from 0 to 1. The magnitude of this confidence score directly represents the overall credibility of the currently identified light object target as a real threat target. The closer the score is to 1, the more credible the target is and the more attention it warrants. The closer the score is to 0, the greater the probability that it is a false alarm or invalid interference.
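The weighted fusion of step S45 can be illustrated with the example weight coefficients given above (0.2, 0.4, 0.2, 0.2). The dictionary keys and the function name are hypothetical labels chosen for this sketch, and the factor values in the usage note are made up for illustration.

```python
# Example weight coefficients from the embodiment; they sum to 1.
EXAMPLE_WEIGHTS = {
    "image_quality": 0.2,
    "recognition_consistency": 0.4,
    "environmental_interference": 0.2,
    "historical_correction": 0.2,
}


def confidence_score(factors, weights=EXAMPLE_WEIGHTS):
    """Weighted sum of the four factors, each expected to lie in [0, 1]."""
    if set(factors) != set(weights):
        raise ValueError("factors and weights must use the same names")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weight coefficients must sum to 1")
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name} is outside [0, 1]")
    return sum(weights[name] * factors[name] for name in weights)
```

With illustrative factor values of 0.9, 0.8, 0.3, and 0.75 (in the order listed above), the score is 0.18 + 0.32 + 0.06 + 0.15 = 0.71.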

[0055] S5. Determine the category risk weight based on the category probability of the light object target, match the category risk weight, minimum safe distance and confidence score and output the corresponding graded warning.

[0056] Lightweight objects of different materials pose varying degrees of short-circuit, entanglement, or mechanical damage hazards to overhead contact lines, and a single-dimensional distance or probability indicator cannot cover the comprehensive risk situation under complex operating conditions. A multi-dimensional coupled early warning and decision-making mechanism is therefore needed. The target category with the highest probability value is identified and assigned a category risk weight corresponding to the material's hazard level. The category risk weight, minimum safe distance, and confidence score are then input into a rule set composed of multi-condition judgment statements, compared with pre-defined multi-level thresholds, and matched to an early warning level based on logical combinations.

[0057] In one specific embodiment, firstly, the target category with the highest probability value is determined. Then, by querying a pre-built category risk weight mapping table, the corresponding value is directly assigned as the category risk weight based on the material hazard level of the target category.

[0058] The method for constructing the category risk weight mapping table is as follows: based on the statistical manual of railway catenary short circuit and mechanical damage accidents, the severity of the damage consequences of various materials is assessed and the hazard level is classified. For example, metal color steel tiles are classified as high hazard level, plastic dustproof nets are classified as medium hazard level, and paper floating objects are classified as low hazard level. A corresponding numerical range is assigned to each hazard level. For example, high hazard level is assigned a value of 0.8 to 1.0, medium hazard level is assigned a value of 0.4 to 0.7, and low hazard level is assigned a value of 0.1 to 0.3.
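A possible shape for the category risk weight mapping table and its lookup is sketched below. The hazard-level ranges come from the example above; the category keys and the specific weight chosen inside each range (0.9, 0.55, 0.2) are assumptions made only for illustration.

```python
# Numerical range assigned to each hazard level (from the example above).
HAZARD_RANGES = {"high": (0.8, 1.0), "medium": (0.4, 0.7), "low": (0.1, 0.3)}

# Hypothetical table: category -> (hazard level, assumed weight in that range).
CATEGORY_TABLE = {
    "metal_color_steel_tile": ("high", 0.9),
    "plastic_dustproof_net": ("medium", 0.55),
    "paper_floating_object": ("low", 0.2),
}


def category_risk_weight(category_probabilities):
    """Pick the most probable category and look up its risk weight."""
    top_category = max(category_probabilities, key=category_probabilities.get)
    level, weight = CATEGORY_TABLE[top_category]
    low, high = HAZARD_RANGES[level]
    # Sanity check that the table entry respects its hazard-level range.
    if not low <= weight <= high:
        raise ValueError(f"weight for {top_category} is outside the {level} range")
    return top_category, weight
```

A detection whose highest category probability belongs to the metal color steel tile entry would therefore receive the weight 0.9 under this assumed table.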

[0059] Then, the category risk weight, minimum safe distance, and confidence score are input into a rule set consisting of multi-condition judgment statements. This rule set comprises a series of IF-THEN logical statements, each rule corresponding to a logical combination of parameter conditions and pointing to one of the four warning levels: red, orange, yellow, and blue. For example, a typical IF-THEN rule statement can be expressed as: IF (Category Risk Weight > 0.8) AND (Minimum Safe Distance < 2.0 m) AND (Confidence Score > 0.9) THEN output red warning.

[0060] In the rule set, the first weight threshold is set to be greater than the second weight threshold, the first distance threshold is set to be less than the second distance threshold, and the first confidence threshold is set to be greater than the second confidence threshold.

[0061] The category risk weights are compared numerically with the first weight threshold and the second weight threshold in turn. The minimum safe distance is compared numerically with the first distance threshold and the second distance threshold in turn. The confidence score is compared numerically with the first confidence threshold and the second confidence threshold in turn.

[0062] Wherein, the first distance threshold is the sum of the minimum electrical safety distance of the overhead contact line and the possible drift distance of light objects within the emergency response time of the operation and maintenance personnel, and the second distance threshold is 2 to 3 times the first distance threshold; the first weight threshold and the first confidence threshold are respectively the lower quartile values of the category risk weight and confidence score in historical real intrusion events, and the second weight threshold and the second confidence threshold are respectively the upper quartile values of the corresponding parameters in historical false alarm events.
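The threshold derivation described above might be sketched as follows. Modelling the possible drift distance as drift speed multiplied by response time, the default multiplier of 2.5, and the choice of the inclusive quantile method are all assumptions of this sketch; the application does not specify them.

```python
import statistics


def distance_thresholds(electrical_clearance_m, drift_speed_mps,
                        response_time_s, multiplier=2.5):
    """Return (first, second) distance thresholds in metres."""
    if not 2.0 <= multiplier <= 3.0:
        raise ValueError("second threshold should be 2 to 3 times the first")
    # Assumption: drift distance is approximated as speed x response time.
    first = electrical_clearance_m + drift_speed_mps * response_time_s
    return first, multiplier * first


def lower_quartile(values):
    """First quartile, e.g. of weights or scores in real intrusion events."""
    return statistics.quantiles(values, n=4, method="inclusive")[0]


def upper_quartile(values):
    """Third quartile, e.g. of weights or scores in false alarm events."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]
```

In use, `lower_quartile` would be applied to the historical real intrusion records to obtain the first weight and confidence thresholds, and `upper_quartile` to the historical false alarm records to obtain the second ones.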

[0063] Based on the logical combination of all the above comparison results, corresponding graded warnings are matched and output from the four warning levels: red, orange, yellow, and blue. Among them, the red warning indicates a high-risk level that requires immediate action. Its triggering conditions are that the category risk weight is greater than the first weight threshold, the minimum safe distance is less than the first distance threshold, and the confidence score is greater than the first confidence threshold, indicating that highly hazardous light objects have invaded the safety-critical area of the overhead contact line and the detection results are highly reliable.

[0064] An orange alert indicates a relatively high level of risk, requiring close monitoring and preparedness for action. Its triggering conditions are that the category risk weight is greater than the second weight threshold, the minimum safe distance is less than the second distance threshold, and the confidence score is greater than the second confidence threshold, while the conditions for a red alert are not all met.

[0065] A yellow alert indicates a medium-risk level, requiring enhanced monitoring. It is triggered when one or two of the following three conditions are met: the category risk weight is greater than the second weight threshold, the minimum safe distance is less than the second distance threshold, or the confidence score is greater than the second confidence threshold. In this case the conditions for a red or orange alert are not met.

[0066] A blue alert indicates a low-risk level, requiring continued routine monitoring. It is triggered when the conditions for red, orange, or yellow alerts are not met, but light, airborne objects are detected. The specific outputs for each alert level can be linked to audible and visual alarm devices to push alert signals to relevant maintenance personnel in real time.
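The four-level matching logic described above can be sketched as a small rule function. The class and function names are hypothetical. The first thresholds in the example (0.8, 2.0 m, 0.9) reuse the values from the sample IF-THEN rule, while the second thresholds (0.4, 5.0 m, 0.6) are assumed purely for illustration. The function presumes a light object has already been detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    weight_1: float
    weight_2: float
    distance_1: float
    distance_2: float
    confidence_1: float
    confidence_2: float

    def __post_init__(self):
        # Ordering required by the rule set: the first thresholds are stricter.
        if not (self.weight_1 > self.weight_2
                and self.distance_1 < self.distance_2
                and self.confidence_1 > self.confidence_2):
            raise ValueError("threshold ordering violated")


# Second thresholds here are assumed example values.
EXAMPLE_THRESHOLDS = Thresholds(0.8, 0.4, 2.0, 5.0, 0.9, 0.6)


def grade_warning(risk_weight, min_distance, confidence, t=EXAMPLE_THRESHOLDS):
    """Return 'red', 'orange', 'yellow', or 'blue' for a detected target."""
    # Red: all three first-level conditions hold.
    if (risk_weight > t.weight_1 and min_distance < t.distance_1
            and confidence > t.confidence_1):
        return "red"
    second_level = [
        risk_weight > t.weight_2,
        min_distance < t.distance_2,
        confidence > t.confidence_2,
    ]
    met = sum(second_level)
    if met == 3:
        return "orange"   # all second-level conditions, but not red
    if met >= 1:
        return "yellow"   # one or two second-level conditions
    return "blue"         # detected, but no condition met
```

Checking red first and then counting the second-level conditions keeps the four levels mutually exclusive, which matches the requirement that each lower level applies only when the higher levels are not met.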

[0067] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product.

[0068] Those skilled in the art will recognize that the algorithmic steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

[0069] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.

[0070] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0071] Finally, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An AI recognition-based railway light object identification method, characterized in that it comprises: acquiring real-time video streams along the railway line, dynamically adjusting the frame sampling frequency based on the current ambient wind speed, obtaining sampled frames, and generating image URLs; inputting the image URL into a pre-trained multimodal large model for light object recognition, extracting the image visual feature vector, performing cross-modal comparison between the image visual feature vector and the feature template, and outputting the category probability and location coordinates of the light object target; matching the corresponding material density and windward area based on the category probability, and calculating the drift trajectory envelope of the light object target in combination with the location coordinates, ambient wind speed, and wind direction; calculating the spatial distance between the drift trajectory envelope and the three-dimensional spatial model of the overhead contact line to obtain the minimum safe distance; calculating an image quality factor based on the sampled frames, a recognition consistency factor based on the intersection over union of the location coordinates in multiple consecutive frames, an environmental interference factor based on weather visibility, and a historical data correction factor based on historical records of targets of the same type, and obtaining a confidence score by weighted summation; and determining the category risk weight based on the category probability of the light object target, matching the category risk weight, minimum safe distance, and confidence score, and outputting the corresponding graded warning.

2. The AI recognition-based railway light object identification method according to claim 1, wherein the method for obtaining the image URL is as follows: acquire real-time video streams from monitoring equipment along the railway line and the ambient wind speed in the current monitoring area; compare the ambient wind speed with a preset wind speed threshold; when the ambient wind speed is greater than or equal to the preset wind speed threshold, sample the real-time video stream at a first preset frequency; otherwise, sample the real-time video stream at a second preset frequency to obtain a sampled frame; and transmit the sampled frame to the server to generate an image URL.

3. The AI recognition-based railway light object identification method according to claim 1, wherein the method for obtaining the image visual feature vector is as follows: the image URL is input into the pre-trained multimodal large model for light object recognition, and the visual encoder of the multimodal large model for light object recognition performs image block processing and self-attention feature extraction on the sampled frames corresponding to the image URL to generate image visual feature vectors.

4. The AI recognition-based railway light object identification method according to claim 3, wherein, while generating the image visual feature vector, the method further comprises: obtaining feature templates and preset query prompt text from a railway industry-specific light object knowledge base through a text encoder; and performing text segmentation and self-attention feature extraction on the feature templates and query prompt text to generate feature vectors for the feature templates and query prompt text.

5. The AI recognition-based railway light object identification method according to claim 4, wherein the method for obtaining the category probability and location coordinates is as follows: calculate the feature similarity between the image visual feature vector and the feature vector of the feature template; determine whether the feature similarity falls within the similarity range of the false positive exclusion feature; if yes, remove the corresponding target; otherwise, retain the corresponding target; for the retained target, concatenate the image visual feature vector of the retained target with the feature vector of the query prompt text; perform autoregressive decoding by the language decoder of the multimodal large model for light object recognition to output a text sequence containing a category description and coordinate values; and parse the text sequence and extract the category probability and location coordinates output by the language decoder during the decoding process.

6. The AI recognition-based railway light object identification method according to claim 1, wherein the method for obtaining the drift trajectory envelope is as follows: match the corresponding material density and windward area based on the category probability; transform the location coordinates into a three-dimensional spatial coordinate system as the initial spatial coordinates of the drift trajectory; starting from the initial spatial coordinates, iterate according to the set iteration step size, taking into account the ambient wind speed, wind direction, material density, and windward area; in each iteration, calculate the horizontal displacement increment using the wind direction as the direction vector and the ambient wind speed and windward area as scalars, and calculate the vertical displacement increment based on the air resistance correction coefficient corresponding to the target mass and material density, as well as the gravitational acceleration; add the horizontal and vertical displacement increments to the current spatial coordinates to generate a spatial coordinate sequence that changes with the number of iterations; and extract the outer boundary of the spatial coordinate sequence in three-dimensional space, using the three-dimensional convex hull algorithm to generate the drift trajectory envelope.

7. The AI recognition-based railway light object identification method according to claim 1, wherein the method for obtaining the minimum safe distance is as follows: traverse the spatial coordinate points contained in the drift trajectory envelope, calculate the Euclidean distance between each spatial coordinate point and the geometric surface of the overhead contact line in the three-dimensional spatial model of the overhead contact line, and take the minimum value among all Euclidean distances as the minimum safe distance.

8. The AI recognition-based railway light object identification method according to claim 1, wherein the confidence score is obtained as follows: the gradient magnitude of adjacent pixels in the sampled frame is calculated and normalized to obtain the image sharpness component; the variance of the brightness channel of the sampled frame is calculated and normalized to obtain the illumination condition component; the image sharpness component and the illumination condition component are multiplied and normalized to generate the image quality factor; the location coordinates of the same target in multiple consecutive sampled frames are obtained, the intersection over union (IoU) of the bounding boxes corresponding to the location coordinates between adjacent frames is calculated, and the arithmetic mean of all IoU values is calculated to generate the recognition consistency factor; the weather visibility collected by meteorological monitoring equipment along the railway line is obtained, and the reciprocal of the weather visibility is taken and normalized to generate the environmental interference factor; the total number of historical alarms and the total number of historical false alarms for the same type of target are obtained, the ratio of the total number of historical false alarms to the total number of historical alarms is calculated as the historical false alarm rate, and the historical false alarm rate is subtracted from a constant to generate the historical data correction factor; and the image quality factor, recognition consistency factor, environmental interference factor, and historical data correction factor are each assigned a corresponding weight coefficient, and the confidence score is generated by multiplying each factor by its corresponding weight coefficient and summing the results.

9. The AI recognition-based railway light object identification method according to claim 1, wherein the method for determining the category risk weight is as follows: identify the target category with the highest probability value and assign the corresponding value as the category risk weight based on the material hazard level of the target category.

10. The AI recognition-based railway light object identification method according to claim 1, wherein the method for obtaining the graded warning is as follows: input the category risk weight, minimum safe distance, and confidence score into the rule set consisting of multi-condition judgment statements; in the rule set, compare the category risk weight with the first weight threshold and the second weight threshold in turn, compare the minimum safe distance with the first distance threshold and the second distance threshold in turn, and compare the confidence score with the first confidence threshold and the second confidence threshold in turn; and, based on the logical combination of all the above comparison results, match and output the corresponding graded warning from the four warning levels of red, orange, yellow, and blue.