Railway facility three-dimensional scene generation method, device and equipment and storage medium

By combining image, radar, and ultrasonic data to generate a 3D scene of the track facility, the problem of distorted reconstruction results in existing technologies has been solved, and accurate identification and high-quality rendering of the risk of internal voids in the track facility have been achieved.

CN122089968B · Active
Publication Date: 2026-06-30
CHINA RAILWAY LIUYUAN GRP CO LTD

2 Cites · 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA RAILWAY LIUYUAN GRP CO LTD
Filing Date: 2026-04-22
Publication Date: 2026-06-30

Application Information

IPC: G06T17/00; G06T19/20; G06T7/73; G06T15/20

AI Tagging

Technology Topics

Computer graphics (images); Trackway

AI Technical Summary

Technical Problem

Existing technologies cannot accurately identify the risk of internal voids when reconstructing three-dimensional scenes of track facilities, resulting in distorted reconstruction results, physical inconsistencies, and a lack of precise measurability, making it difficult to meet the needs of digital operation and maintenance.

Method used

By combining image data, radar data, and ultrasonic data, a pre-trained scene generation model is used to determine the rendering density and color, generate a 3D scene representation of the track facility, and render it, thus achieving deep fusion of image data, radar data, and ultrasonic data.

Benefits of technology

It improves the realism and visualization of 3D scene representation, can accurately determine whether there is a risk of voids inside the track facility, and generates high-quality rendered images.

Abstract

This disclosure provides a method, apparatus, device, and storage medium for generating a 3D scene of a railway facility. The method includes: acquiring image data, radar data, and ultrasonic data of the railway facility in a cross-sectional direction; determining an image sampling path from the image data; determining the position coordinates and density label of each sampling point in the railway facility based on the radar data and the ultrasonic data; using a pre-trained scene generation model, determining a rendering density and rendering color based on the image sampling path, the position coordinates, and the density label; generating a 3D scene representation of the railway facility based on the position coordinates, the rendering density, and the rendering color; and rendering the 3D scene representation to obtain a rendered image of the railway facility. This makes the determined rendering density and rendering color more accurate and closely reflects the actual situation of the railway facility, enabling precise judgment of the potential for voids within the railway facility based on the rendered image.

Description

Technical Field

[0001] This disclosure relates to the field of data processing technology, and in particular to a method, apparatus, device and storage medium for generating three-dimensional scenes of rail facilities.

Background Technology

[0002] In modern transportation systems, rail facilities (such as railway tracks and subway tunnels) are critical transportation infrastructure, and their safety and stability directly affect the normal operation of the entire transportation system. However, reconstructing a 3D scene from a single type of data about a rail facility, and then using that scene to determine whether there is a risk of voids inside the facility, leads to inaccurate risk identification because the reconstructed 3D scene is itself inaccurate.

[0003] In view of this, how to improve the accuracy of 3D scene reconstruction to accurately identify whether there is a risk of voids inside the track facility has become an urgent technical problem to be solved.

[0004] Based on the background description, the detection and quantification of hidden defects in rail transit infrastructure is a core challenge for operational safety. Related technical solutions mainly rely on non-destructive testing (NDT) and 3D visual reconstruction techniques. NDT techniques, such as ground-penetrating radar and ultrasonic testing, can acquire internal structural anomaly signals, but their output is only sparse, discrete detection point data that cannot directly form a complete 3D geometric model. 3D visual reconstruction techniques, such as photogrammetry, structured light scanning, and lidar, can reconstruct surface geometry with high precision, but cannot penetrate the surface to detect internal defects. Combining the two, using sparse internal detection data to guide or constrain the visual reconstruction process, is key to solving the problem, but related methods have significant shortcomings in this regard.

[0005] To address the aforementioned technical problems, two main approaches are used: traditional interpolation and geometric fitting methods, or fusion reconstruction methods based on classical neural radiance fields (NeRF).

[0006] Traditional interpolation and geometric fitting methods involve spatial interpolation (e.g., Kriging interpolation, radial basis function interpolation) of sparse anomaly points acquired by ground-penetrating radar to generate a continuous anomaly probability field or isosurface. However, the interpolation process lacks physical constraints, resulting in unrealistic geometric shapes that do not conform to material continuity (e.g., isolated bubble-like voids) in extremely sparse data regions. It also fails to integrate naturally with high-precision surface visual geometry, creating inconsistencies between the internal and external structures. Furthermore, the reconstructed result is typically a coarse mesh lacking fine geometric details.

[0007] A fusion reconstruction method based on classic NeRF involves inputting multi-view images of the surface into a standard NeRF model for training, and optimizing using only color and depth (if available) losses to obtain an implicit representation of the scene. However, standard NeRF relies entirely on optical consistency, resulting in poor reconstruction performance for textureless regions, repetitive structures, or occluded areas; it also cannot utilize non-visual physical detection data (e.g., radar reflection intensity, ultrasonic propagation time). These data provide crucial internal geometric cues, but the traditional NeRF framework lacks corresponding modeling and constraint methods; furthermore, it is prone to producing blurred or erroneous geometry (e.g., floating object artifacts) in sparse data regions.

[0008] The relevant technical solutions mainly have the following technical problems:

[0009] (1) Distortion of internal geometry reconstruction (the most significant drawback): With only a few detection points, the reconstructed internal defect shape (e.g., void) is seriously inconsistent with the actual situation, often being too idealized (e.g., spherical) or fragmented, and cannot be used for accurate engineering risk assessment and maintenance scheme design.

[0010] (2) Hard fusion of multi-source data with physical inconsistencies: Related methods attempt to stitch the visually reconstructed surface model with the interpolated internal anomalies using Boolean operations. This hard fusion often produces discontinuous geometry and morphologies that violate material mechanics at the interface (e.g., suspended surfaces appearing out of thin air).

[0011] (3) Inability to achieve precise and measurable reconstruction: The reconstruction results are only schematic models, lacking accurate boundary, volume and spatial location information, which makes it difficult to meet the needs of precise quantification and analysis of disease size and development trend in digital operation and maintenance.

[0012] For hidden defects inside rail transit infrastructure (e.g., tunnel linings, bridge components), how to accurately reconstruct a measurable and physically realistic three-dimensional geometric model of the defect, under the condition of only extremely sparse detection point data (from ground penetrating radar, ultrasound, etc.) and limited surface visual images, so as to achieve a precise digital representation of the defect's location, size, and morphology, has become an urgent technical problem to be solved.

Summary of the Invention

[0013] In view of this, the purpose of this disclosure is to provide a method, apparatus, device and storage medium for generating three-dimensional scenes of rail facilities to solve or partially solve the above-mentioned technical problems.

[0014] To achieve the above objectives, the first aspect of this disclosure proposes a method for generating a three-dimensional scene of a rail facility, the method comprising:

[0015] Acquire image data, radar data, and ultrasonic data of the track facilities in the cross-sectional direction;

[0016] The image sampling path is determined from the image data, and the position coordinates and density labels of each sampling point in the track facility are determined based on the radar data and the ultrasonic data;

[0017] Using a pre-trained scene generation model, the rendering density and rendering color are determined based on the image sampling path, the location coordinates, and the density label;

[0018] A three-dimensional scene representation of the track facility is generated based on the location coordinates, the rendering density, and the rendering color, and the three-dimensional scene representation is rendered to obtain a rendered image of the track facility.

[0019] Based on the same inventive concept, a second aspect of this disclosure proposes a three-dimensional scene generation device for rail facilities, comprising:

[0020] The acquisition module is configured to acquire image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction;

[0021] The first determining module is configured to determine the image sampling path from the image data and determine the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data.

[0022] The second determining module is configured to use a pre-trained scene generation model to determine the rendering density and rendering color based on the image sampling path, the position coordinates, and the density label.

[0023] The 3D scene generation module is configured to generate a 3D scene representation of the track facility based on the position coordinates, the rendering density, and the rendering color, and to render the 3D scene representation to obtain a rendered image of the track facility.

[0024] Based on the same inventive concept, a third aspect of this disclosure proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method described above when executing the computer program.

[0025] Based on the same inventive concept, a fourth aspect of this disclosure provides a non-transitory computer-readable storage medium that stores computer instructions for causing a computer to perform the methods described above.

[0026] As described above, this disclosure provides a method, apparatus, device, and storage medium for generating 3D scenes of track facilities. The process involves acquiring image data, radar data, and ultrasonic data of the track facility along its cross-sectional direction. An image sampling path is determined from the image data, and the position coordinates and density labels of each sampling point in the track facility are determined based on the radar and ultrasonic data. A pre-trained scene generation model is used to determine the rendering density and rendering color based on the image sampling path, position coordinates, and density labels. A 3D scene representation of the track facility is generated based on the position coordinates, rendering density, and rendering color, and this 3D scene representation is rendered to obtain a rendered image of the track facility. This approach enables deep fusion of image data, radar data, and ultrasonic data, resulting in more accurate and realistic rendering densities and colors that better reflect the actual conditions of the track facility. This effectively enhances the realism and visualization of the 3D scene representation, yielding high-quality rendered images of the track facility, which can then be used to accurately determine the potential for voids within the track facility.

Attached Figure Description

[0027] To more clearly illustrate the technical solutions in this disclosure or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0028] Figure 1 is a flowchart of a method for generating a three-dimensional scene of track facilities according to an embodiment of this disclosure;

[0029] Figure 2 is a schematic diagram of the structure of a three-dimensional geometric reconstruction system for hidden defects in rail transit facilities based on neural radiance fields and sparse sensor data, according to an embodiment of this disclosure;

[0030] Figure 3 is a flowchart illustrating multi-resolution progressive training in an embodiment of the present disclosure;

[0031] Figure 4 is a flowchart illustrating the training algorithm for the scene generation model in an embodiment of the present disclosure;

[0032] Figure 5 is a schematic diagram of the structure of a three-dimensional scene generation device for track facilities according to an embodiment of the present disclosure;

[0033] Figure 6 is a schematic diagram of the structure of an electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0034] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.

[0035] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this disclosure should have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms "first," "second," and similar terms used in the embodiments of this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0036] The terms used in this disclosure are explained as follows:

[0037] Neural Radiance Field (NeRF): A deep learning technique for 3D scene reconstruction and view synthesis. A multilayer perceptron (MLP) neural network is trained to map spatial coordinates and viewing direction to volume density and color values, thus implicitly representing a continuous 3D scene.

[0038] Hidden defects are structural damages that cannot be directly observed inside or beneath the surface of rail transit infrastructure. Examples include voids and cavities behind tunnel linings, honeycombing and pores inside concrete, and corrosion and expansion of reinforcing steel. These defects cannot be directly detected using conventional visual inspection methods.

[0039] Ground Penetrating Radar (GPR) is a non-destructive detection technology that uses high-frequency electromagnetic waves to detect anomalies underground or inside structures. By analyzing the reflected signals of electromagnetic waves, regions with abrupt changes in dielectric constant can be identified, thereby inferring the existence and approximate location of defects such as cavities and stratification.

[0040] Ultrasonic thickness measurement: A technique that uses the speed and reflection time of ultrasonic waves in a material to accurately measure its thickness or detect internal defects. For concrete structures, it can detect the thickness of the protective layer, the location of internal voids, etc.

[0041] Volume density: In NeRF, it represents the probability that a point in space is occupied or the optical density of the medium at that point. High density values typically correspond to solid surfaces or interiors, while low density values correspond to air or void regions.

[0042] Multi-resolution Progressive Training: A training strategy that uses low-resolution feature representations to learn the global geometry of the scene in the early stages of training, and gradually increases the resolution to capture finer details as training progresses.
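
The multi-resolution progressive training strategy defined above can be illustrated with a minimal scheduler sketch. The base resolution, the doubling interval, and the resolution cap are hypothetical values chosen only for illustration; the disclosure does not state them, and the function name `grid_resolution` is likewise an assumption.

```python
def grid_resolution(step: int, base_res: int = 16, max_res: int = 256,
                    steps_per_level: int = 1000) -> int:
    """Return the feature-grid resolution to use at a given training step.

    Hypothetical schedule (not from the disclosure): start coarse and
    double the resolution every `steps_per_level` steps until `max_res`.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    level = step // steps_per_level
    # Coarse levels learn global geometry; later levels add fine detail.
    return min(base_res * (2 ** level), max_res)


if __name__ == "__main__":
    for s in (0, 999, 1000, 2500, 10_000):
        print(s, grid_resolution(s))
```

A doubling schedule is only one possible choice; a linear ramp or a schedule triggered by loss plateaus would fit the same definition.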

[0043] Based on the above description, as shown in Figure 1, the method for generating a three-dimensional scene of a track facility in this embodiment includes:

[0044] Step 101: Acquire image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction.

[0045] In practical implementation, rail facilities refer to various civil engineering structures, electromechanical equipment, and auxiliary systems installed to ensure the normal and safe operation of the rail transit system. For example, rail facilities can include bridges, tunnels, tracks, roadbeds, stations, control centers, and vehicle depots. In this embodiment, the rail facility is preferably a tunnel.

[0046] An industrial-grade panoramic camera was used to capture cross-sectional images every 2 meters along the axis of the track facility (e.g., a tunnel). Each cross-section was captured from four directions (arch, invert, left wall, and right wall), with approximately 30% overlap between adjacent directions. A total of 140 high-resolution images were obtained, with a resolution of 4096×2048 pixels, and stored in JPEG format. LED supplemental lighting was used during acquisition to ensure uniform illumination.

[0047] Ground-penetrating radar (GPR) was used to collect radar data. The GPR used a 500 MHz shielded antenna (e.g., 512 sampling points, 50 ns time window). Ten longitudinal survey lines were deployed at equal intervals along the circumference of the track facility (e.g., tunnel). These lines were located at the arch crown (90°), left arch waist (135°), inner side of left arch waist (120°), left wall (180°), inner side of left wall (160°), invert (270°), right arch waist (225°), inner side of right arch waist (240°), right wall (0°), and inner side of right wall (20°). The survey lines were spaced 0.3 m apart, with a total length of 700 m. Additionally, a circumferential survey line was deployed every 5 rings (6 m) for cross-validation.

[0048] Using a 50kHz concrete ultrasonic instrument, test points were laid out in a 1.0m×1.0m grid in the suspected cavity area, and the density was increased to 0.5m×0.5m in the abnormal area detected by ground penetrating radar.

[0049] This disclosure describes the detection of hidden defects in a 70m section (K12+350~K12+420) of a subway tunnel's downline. This section is a circular tunnel constructed using the shield tunneling method, with an inner diameter of 5.5m, an outer diameter of 6.2m, a segment thickness of 350mm, and a ring width of 1.2m. Preliminary inspection revealed water seepage at the segment joints, suggesting the possible presence of voids behind the lining.

[0050] Step 102: Determine the image sampling path from the image data, and determine the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data.

[0051] In practice, the center point coordinates and viewing direction are determined from the image data, and the image sampling path is determined based on these coordinates. The lining depth of the track facility is determined based on radar data, and the first position coordinates of each radar sampling point are determined based on the lining depth. The first density value and first density label of each radar sampling point are determined based on the amplitude parameters in the radar data. The grid position of each ultrasonic sampling point is converted into corresponding second position coordinates. The lining thickness of the track facility is determined based on the ultrasonic data, and the second density value and second density label of each ultrasonic sampling point are determined based on the lining thickness.

[0052] Step 103: Using a pre-trained scene generation model, determine the rendering density and rendering color based on the image sampling path, the position coordinates, and the density label.

[0053] In practice, a pre-trained scene generation model is used to perform linear interpolation on the density labels to obtain multiple feature vectors. These feature vectors are then concatenated to obtain concatenated features, and the position coordinates are encoded to obtain position-encoded features. The concatenated features and position-encoded features are then concatenated to obtain intermediate features and volume density. Finally, the intermediate features and position-encoded features are concatenated to obtain color features. The rendering color is determined based on the volume density and color features of multiple sampling points along the image sampling path, and the rendering depth is determined based on the volume density of these sampling points.
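
The last sentence of the paragraph above, which turns per-sample volume density and color into a rendering color and depth along one image sampling path, can be sketched with the standard NeRF quadrature. This is a sketch under assumptions: the disclosure does not give the discretization, so the alpha-compositing form, the handling of the last interval, and the name `render_ray` are illustrative rather than the patented procedure.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(ts: Sequence[float], sigmas: Sequence[float],
               colors: Sequence[RGB]) -> Tuple[RGB, float, List[float]]:
    """Composite samples along one ray (standard NeRF quadrature).

    ts     : increasing sample distances along the ray
    sigmas : volume density at each sample
    colors : RGB color feature at each sample
    Returns (rendered color, rendered depth, per-sample weights).
    """
    n = len(ts)
    if n < 2 or len(sigmas) != n or len(colors) != n:
        raise ValueError("need at least two samples and equal-length inputs")

    transmittance = 1.0  # fraction of light that reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weights: List[float] = []
    for i in range(n):
        # Assumption: the last sample reuses the previous interval length.
        delta = ts[i + 1] - ts[i] if i + 1 < n else ts[i] - ts[i - 1]
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this interval
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            rgb[k] += w * colors[i][k]
        depth += w * ts[i]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2]), depth, weights
```

With zero density in front of a very dense sample, nearly all the weight lands on the dense sample, so the rendered depth approaches that sample's distance.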

[0054] Step 104: Generate a three-dimensional scene representation of the track facility based on the location coordinates, the rendering density, and the rendering color, and render the three-dimensional scene representation to obtain a rendered image of the track facility.

[0055] In practice, a three-dimensional scene representation of the track facility is generated based on the location coordinates, rendering density, and rendering color, and the three-dimensional scene representation is rendered to obtain a rendered image of the track facility.

[0056] Figure 2 is a schematic diagram of the structure of a three-dimensional geometric reconstruction system for hidden defects in rail transit facilities based on neural radiance fields and sparse sensor data, according to an embodiment of this disclosure. As shown in Figure 2, the system includes: an input layer, a physically constrained implicit neural radiance field, a training controller, loss calculation and parameter optimization, and an output layer. In the input layer, the following are acquired: multi-view surface images (image data), camera parameters P, sparse interior detection data (radar data), and the coordinates and attributes of detection points (ultrasonic data). In the physically constrained implicit neural radiance field, the image sampling path of the sampling points is determined by ray casting based on the camera parameters P: $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, where $\mathbf{o}$ is the camera center, $t$ is the observation distance, and $\mathbf{d}$ is the observation direction. A multilayer perceptron (MLP) within the neural radiance field (NeRF) network determines the volume density $\sigma$ and the color feature $\mathbf{c}$ along the image sampling path. The volume rendering integral over the volume density and color features determines the rendering color (composite color) and the rendering density (synthesized depth), and physical constraint operators are determined based on the ultrasonic data. In the training controller, the resolution scheduler invokes the corresponding feature grid G. In loss calculation and parameter optimization, a color loss function is determined from the rendering color, a depth supervision loss function from the rendering density, a sparsity loss function from the volume density, and a continuity loss function from the physical constraint operator. The color loss function, depth supervision loss function, sparsity loss function, and continuity loss function are summed to obtain the total loss function. Using the total loss function, the model parameters of the neural radiance field network are updated to obtain the scene generation model. In the output layer, the neural radiance field network outputs an implicit scene representation. Isosurface extraction is performed on the implicit scene representation to obtain a 3D mesh model, and novel view synthesis is performed on the implicit scene representation to obtain a rendered image / depth map.
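
The loss composition described for Figure 2 can be sketched as follows. The disclosure states only that four loss terms are summed; the mean-squared-error form of the color loss and the function names `mse` and `total_loss` are assumptions made for illustration, and the depth, sparsity, and continuity terms are treated here as already-computed scalars.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; an assumed form for the color loss."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def total_loss(color_loss: float, depth_loss: float,
               sparse_loss: float, continuity_loss: float) -> float:
    """Sum the four loss terms named in the text (no weights are stated)."""
    return color_loss + depth_loss + sparse_loss + continuity_loss


if __name__ == "__main__":
    # Hypothetical values for one training batch.
    c = mse([0.8, 0.2, 0.1], [1.0, 0.0, 0.0])
    print(total_loss(c, depth_loss=0.05, sparse_loss=0.01, continuity_loss=0.02))
```

In practice each term would usually carry its own weight; an unweighted sum is used here only because the text mentions no weights.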

[0057] Through the above embodiments, image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction are acquired. An image sampling path is determined from the image data, and the position coordinates and density labels of each sampling point in the track facility are determined based on the radar and ultrasonic data. Using a pre-trained scene generation model, the rendering density and rendering color are determined based on the image sampling path, position coordinates, and density labels. A 3D scene representation of the track facility is generated based on the position coordinates, rendering density, and rendering color, and the 3D scene representation is rendered to obtain a rendered image of the track facility. This enables deep fusion of image data, radar data, and ultrasonic data, making the determined rendering density and rendering color more accurate and consistent with the actual situation of the track facility. This effectively improves the realism and visualization effect of the 3D scene representation, resulting in a high-quality rendered image of the track facility, which can then be used to accurately determine whether there is a risk of voids inside the track facility.

[0058] In some embodiments, step 102 includes:

[0059] Step 1021: Determine the center point coordinates and the viewing direction from the image data, and determine the image sampling path based on the center point coordinates and the viewing direction.

[0060] In practice, a Structure from Motion (SfM) algorithm is used to estimate the intrinsic and extrinsic camera parameters of each image. Specific settings include: feature extraction using the Scale-Invariant Feature Transform (SIFT) with a peak threshold of 20 and an edge threshold of 10; and exhaustive feature matching with guided matching and cross-validation enabled. The output is the camera center coordinates $\mathbf{o}$ and rotation matrix $\mathbf{R}$ of each image, from which the observation direction $\mathbf{d}$ (the direction of the camera's principal optical axis) is obtained. All camera poses are unified to a world coordinate system with the tunnel design axis as the Z-axis (mileage direction) and the tunnel cross-section as the XY plane (X horizontal to the right, Y vertical upward).
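
A minimal sketch of how a camera center and observation direction can be derived from SfM extrinsics. It assumes the common world-to-camera convention `x_cam = R @ x_world + t` with the camera looking along its +Z axis; the disclosure does not state which convention its SfM tool uses, so both the convention and the helper names are assumptions.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = Sequence[Sequence[float]]


def transpose(m: Mat) -> List[List[float]]:
    return [list(row) for row in zip(*m)]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def camera_center_and_direction(R: Mat, t: Sequence[float]) -> Tuple[Vec, Vec]:
    """Return (camera center, principal-axis direction) in world coordinates.

    Assumes x_cam = R @ x_world + t, so the center is -R^T t and the
    viewing direction is R^T applied to the camera's +Z axis.
    """
    Rt = transpose(R)
    center = [-c for c in matvec(Rt, t)]
    direction = matvec(Rt, [0.0, 0.0, 1.0])
    return center, direction


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(camera_center_and_direction(identity, [0.0, 0.0, -5.0]))
```

If the SfM tool reports camera-to-world poses instead, the rotation is used directly rather than transposed.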

[0061] For each image in the image data, 1024 image sampling paths (rays) are randomly sampled, with each sampling path corresponding to one pixel in the image. Each image sampling path is $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, where $\mathbf{r}(t)$ is the image sampling path function, $\mathbf{o}$ is the camera center coordinates, $t$ is the sampling parameter (the distance along the path), and $\mathbf{d}$ is the observation direction. Sampling is performed between the near plane (at distance $t_n$ from the camera) and the far plane (at distance $t_f$ from the camera), that is, for $t \in [t_n, t_f]$. A coarse-to-fine strategy is adopted: first, 64 sampling points are uniformly sampled in $[t_n, t_f]$ and the volume density estimates of these sampling points are calculated; a weight distribution (i.e., cumulative transparency) is then calculated based on the volume density; finally, 64 more sampling points are finely sampled in regions with high weights, giving 128 sampling points in total. The coordinates of all sampling points are then determined and used for the subsequent volume rendering integral.
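
The coarse-to-fine sampling of one ray can be sketched as below: 64 uniform samples, then 64 more drawn in proportion to the coarse weights, for 128 in total. The `weight_fn` callback is a hypothetical stand-in for the coarse density pass, and the inverse-CDF resampling with jitter inside each coarse bin is an assumed implementation detail, not taken from the disclosure.

```python
import bisect
import random
from typing import Callable, List, Sequence


def uniform_samples(t_near: float, t_far: float, n: int) -> List[float]:
    """Midpoints of n equal bins between the near and far planes."""
    step = (t_far - t_near) / n
    return [t_near + (i + 0.5) * step for i in range(n)]


def importance_samples(ts: Sequence[float], weights: Sequence[float],
                       n: int, rng: random.Random) -> List[float]:
    """Draw n samples near coarse samples, proportionally to their weights."""
    bins = [(t, w) for t, w in zip(ts, weights) if w > 0.0]
    if not bins:  # no information from the coarse pass: stay uniform
        return [rng.uniform(ts[0], ts[-1]) for _ in range(n)]
    total = sum(w for _, w in bins)
    cdf, acc = [], 0.0
    for _, w in bins:
        acc += w / total
        cdf.append(acc)
    half = (ts[1] - ts[0]) / 2.0 if len(ts) > 1 else 0.0
    out = []
    for _ in range(n):
        i = min(bisect.bisect_right(cdf, rng.random()), len(bins) - 1)
        out.append(bins[i][0] + rng.uniform(-half, half))  # jitter inside bin
    return out


def coarse_to_fine(t_near: float, t_far: float,
                   weight_fn: Callable[[List[float]], List[float]],
                   n_coarse: int = 64, n_fine: int = 64,
                   seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    coarse = uniform_samples(t_near, t_far, n_coarse)
    fine = importance_samples(coarse, weight_fn(coarse), n_fine, rng)
    return sorted(coarse + fine)
```

The near and far distances passed to `coarse_to_fine` are left to the caller, since the source values are not available here.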

[0062] Step 1022: Determine the lining depth of the track facility based on the radar data, determine the first position coordinates of each radar sampling point based on the lining depth, and determine the first density value and first density label of each radar sampling point based on the amplitude parameters in the radar data.

[0063] In practice, the radar data is preprocessed to obtain preprocessed radar data. The preprocessing includes zero-time correction, background removal, gain adjustment, bandpass filtering, and time-depth conversion. Specifically, during zero-time correction, traces are aligned according to the direct-wave arrival time of the radar data; during background removal, the average trace is subtracted from each trace to eliminate horizontal layered interference; during gain adjustment, exponential gain is used to compensate for signal attenuation; during bandpass filtering, the effective frequency band around the 500 MHz center frequency (e.g., 300~700 MHz) is retained; and during time-depth conversion, the relative permittivity of concrete $\varepsilon_r$ (calibrated on site) is used to calculate the lining depth $h = \dfrac{c\,t}{2\sqrt{\varepsilon_r}}$, where $c$ is the speed of light and $t$ is the two-way travel time of the radar data.
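
Two of the preprocessing steps above, background removal and time-depth conversion, can be sketched briefly. The conversion uses the standard GPR relation (wave speed in the medium is the speed of light divided by the square root of the relative permittivity, and the depth is half the two-way path). The permittivity value in the usage line is hypothetical, since the site-calibrated value is not available here, and the function names are assumptions.

```python
import math
from typing import List, Sequence

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def two_way_time_to_depth(t_ns: float, eps_r: float) -> float:
    """Convert two-way travel time (ns) to depth (m) in a uniform medium."""
    if eps_r < 1.0:
        raise ValueError("relative permittivity must be >= 1")
    velocity = C_M_PER_NS / math.sqrt(eps_r)  # wave speed in the medium
    return velocity * t_ns / 2.0              # halve: the wave goes and returns


def remove_background(bscan: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the average trace from every trace of a B-scan.

    bscan is a list of traces, each a list of samples of equal length.
    """
    n = len(bscan)
    mean_trace = [sum(col) / n for col in zip(*bscan)]
    return [[s - m for s, m in zip(trace, mean_trace)] for trace in bscan]


if __name__ == "__main__":
    # Hypothetical eps_r = 9.0; the real value is calibrated on site.
    print(two_way_time_to_depth(50.0, 9.0))
```

With this assumed permittivity, the full 50 ns time window corresponds to roughly 2.5 m of depth.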

[0064] On the preprocessed B-scan image, anomalous reflection areas (strong amplitude, multiples, phase axis misalignment) are manually picked. The mileage (provided by the encoder), circumferential angle (determined by the survey line position), and depth (obtained from time-depth conversion) of each anomalous point are recorded. The (mileage, angle, depth) triple is then converted to Cartesian coordinates using the tunnel cross-section geometry model: $x = (R_0 + h)\cos\theta$, $y = (R_0 + h)\sin\theta$, $z = s$, where $s$ is the mileage, $R_0$ is the radius from the tunnel center to the inner surface of the lining, $h$ is the depth from the inner surface of the lining outward (a positive value indicates outward), and $\theta$ is the circumferential angle (in radians). For radar sampling points, $h$ is the detection depth.
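
The conversion from (mileage, angle, depth) to Cartesian coordinates can be sketched as a polar-to-Cartesian mapping in the cross-section plane. The sketch assumes the world frame described for the camera poses (Z along the mileage direction, X horizontal to the right, Y vertical upward) and the angle convention of the survey lines (0° at the right wall, 90° at the arch crown). The inner radius of 2.75 m follows from the 5.5 m inner diameter given for the example tunnel; the function name and the use of metres for mileage are assumptions.

```python
import math
from typing import Tuple

INNER_RADIUS_M = 2.75  # half of the 5.5 m inner diameter of the example tunnel


def tunnel_to_cartesian(mileage_m: float, angle_deg: float, depth_m: float,
                        inner_radius_m: float = INNER_RADIUS_M
                        ) -> Tuple[float, float, float]:
    """Map (mileage, circumferential angle, depth behind lining) to (x, y, z).

    depth_m is measured outward from the inner lining surface, so the
    radial distance from the tunnel center is inner_radius_m + depth_m.
    """
    theta = math.radians(angle_deg)
    r = inner_radius_m + depth_m
    return (r * math.cos(theta), r * math.sin(theta), mileage_m)


if __name__ == "__main__":
    # A hypothetical anomaly 0.30 m behind the lining at the arch crown.
    print(tunnel_to_cartesian(380.0, 90.0, 0.30))
```

A point at the arch crown therefore lands directly above the tunnel center, with a near-zero X coordinate.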

[0065] Based on the preliminary interpretation of the radar data collected by ground-penetrating radar, a key area of concern was identified: mileage K12+380~K12+400, circumferential angle 60°~120° (near the arch crown), and radial distance 2.6~3.2m (0~0.45m behind the lining, measured from the tunnel center). Within this cuboid area, a number of spatial points are sampled randomly and uniformly and used to calculate the material continuity loss. The sampling ensures that no two spatial points are repeated and that the entire area is covered.
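
A minimal sketch of drawing non-repeating random points inside the key area, using the ranges quoted above. The point count is hypothetical because the number used in the disclosure is not available here. Sampling is uniform in (mileage, angle, radius), which treats the area as a cuboid in those parameters rather than uniform in Cartesian volume; that reading, the metre-based mileage offsets, and the function name are assumptions.

```python
import math
import random
from typing import List, Tuple


def sample_key_region(n_points: int,
                      mileage_m: Tuple[float, float] = (380.0, 400.0),
                      angle_deg: Tuple[float, float] = (60.0, 120.0),
                      radius_m: Tuple[float, float] = (2.6, 3.2),
                      seed: int = 0) -> List[Tuple[float, float, float]]:
    """Draw distinct random points in the key region, returned as (x, y, z).

    Mileage is given as metres past K12+000 (assumed representation).
    """
    rng = random.Random(seed)
    seen = set()
    points: List[Tuple[float, float, float]] = []
    while len(points) < n_points:
        s = rng.uniform(*mileage_m)
        a = rng.uniform(*angle_deg)
        r = rng.uniform(*radius_m)
        if (s, a, r) in seen:  # enforce that no point is repeated
            continue
        seen.add((s, a, r))
        theta = math.radians(a)
        points.append((r * math.cos(theta), r * math.sin(theta), s))
    return points


if __name__ == "__main__":
    pts = sample_key_region(500)  # hypothetical count
    print(len(pts), pts[0])
```

Stratified or low-discrepancy sampling would cover the region more evenly for small point counts, at the cost of a slightly longer implementation.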

[0066] Step 1023: Convert the grid position of each ultrasonic sampling point into the corresponding second position coordinates, determine the lining thickness of the track facility based on the ultrasonic data, and determine the second density value and second density label of each ultrasonic sampling point based on the lining thickness.

[0067] In practice, the ultrasonic propagation time $t$ is recorded at each ultrasonic sampling point and combined with the concrete wave velocity $v$ (calibrated on site) to calculate the lining thickness $h = v\,t/2$. When the measured lining thickness is significantly greater than the design thickness (350 mm), it is inferred that there may be voids or loose areas behind the lining at that ultrasonic sampling point. The three-dimensional coordinates of each ultrasonic sampling point are obtained by total station measurement and converted to the world coordinate system.
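As a rough illustration of the thickness check, the sketch below assumes a pulse-echo measurement (the wave crosses the lining twice, so thickness = v·t/2). The wave speed of 4000 m/s and the 50 mm margin that stands in for "significantly greater" are hypothetical values, not taken from the text.

```python
def ultrasonic_thickness_mm(travel_time_us: float, wave_speed_m_per_s: float) -> float:
    """Lining thickness in millimetres from a round-trip travel time in microseconds."""
    one_way_m = wave_speed_m_per_s * travel_time_us * 1e-6 / 2.0
    return one_way_m * 1000.0

def is_suspect(thickness_mm: float, design_mm: float = 350.0, margin_mm: float = 50.0) -> bool:
    """Flag a point whose apparent thickness exceeds the design value by the margin."""
    return thickness_mm > design_mm + margin_mm

# Assumed wave speed of 4000 m/s, for illustration only.
thickness = ultrasonic_thickness_mm(travel_time_us=215.0, wave_speed_m_per_s=4000.0)
print(thickness, is_suspect(thickness))
```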

[0068] The above scheme determines the center point coordinates and viewing direction from the image data, and then determines the image sampling path based on these coordinates. This facilitates subsequent determination of rendering color and depth based on the image sampling path. The lining depth of the track facility is determined based on radar data. The first position coordinates of each radar sampling point are then determined based on the lining depth, and the first density value and first density label of each radar sampling point are determined based on the amplitude parameters in the radar data, ensuring accurate determination of the first density value and first density label for each radar sampling point. The grid position of each ultrasonic sampling point is converted into corresponding second position coordinates. The lining thickness of the track facility is determined based on the ultrasonic data, and the second density value and second density label of each ultrasonic sampling point are determined based on the lining thickness, ensuring accurate determination of the second density value and second density label for each ultrasonic sampling point.

[0069] In some embodiments, the method further includes:

[0070] Step 1024: Obtain borehole data from the borehole sampling points on the track facility.

[0071] In practice, five boreholes (28mm in diameter) were drilled at key anomaly locations determined by a combination of radar and ultrasonic data. The boreholes penetrated the lining and entered the surrounding rock to a depth of 0.5m. A borehole endoscope was used to observe the existence of cavities and record the depth of the cavity boundaries. The coordinates of the borehole sampling points were precisely measured using a total station and used as hard constraint anchor points.

[0072] Step 1025: In response to the borehole status of the borehole data being a void state, the third density value of the borehole sampling point is determined to be a first preset value, and the third density label of the borehole sampling point is determined to be a void label.

[0073] In practice, when the borehole status of the borehole data is a void state, the third density value of the borehole sampling point is determined to be the first preset value, and the third density label of the borehole sampling point is determined to be a void label. For example, the first preset value is 0.05.

[0074] Step 1026: In response to the borehole state of the borehole data being in a dense state, the third density value of the borehole sampling point is determined to be a second preset value, and the third density label of the borehole sampling point is determined to be a dense label.

[0075] In practice, when the borehole status of the borehole data is dense, the third density value of the borehole sampling point is determined to be the second preset value, and the third density label of the borehole sampling point is determined to be a dense label. For example, the second preset value is 10.0.
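Steps 1025 and 1026 amount to a two-way lookup, sketched below. The preset values 0.05 and 10.0 come from the examples above; the state strings `"void"` and `"dense"` and the function name `borehole_target` are hypothetical.

```python
VOID_DENSITY = 0.05    # first preset value (example from the text)
DENSE_DENSITY = 10.0   # second preset value (example from the text)

def borehole_target(state: str):
    """Map an observed borehole state to (third density value, third density label)."""
    if state == "void":
        return VOID_DENSITY, "void"
    if state == "dense":
        return DENSE_DENSITY, "dense"
    raise ValueError(f"unknown borehole state: {state!r}")

print(borehole_target("void"), borehole_target("dense"))
```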

[0076] The three-dimensional coordinates of all sampling points (including radar sampling points, ultrasonic sampling points, and borehole sampling points) were unified to the same world coordinate system. A total of M=187 valid sampling points were obtained, including 142 radar sampling points, 40 ultrasonic sampling points, and 5 borehole sampling points. Each sampling point is accompanied by original physical attribute values (e.g., radar data amplitude, ultrasonic thickness anomaly index, borehole observation category). The physical attribute values of each sampling point were mapped to density values, and the mapping rules were designed based on engineering experience. The mapping relationship between the physical attribute values and density values of each sampling point is shown in Table 1.

[0077] Table 1. Mapping relationship between physical attribute values and density values for each sampling point.

[0078]

[0079] In Table 1, the radar data amplitude is obtained after normalization (maximum amplitude is set to 1). In this embodiment of the disclosure, among the 187 sampling points, 81 points are marked as low density (target density < 1.0), and 106 sampling points are marked as high density (target density ≥ 1.0).

[0080] The density values obtained from the above mapping table are used as the supervisory signal of the sparse loss function and, together with the coordinates of the sampling points, are input into the subsequent neural network training process. Through the sparse loss function, the volume density predicted by the network at each sampling point is forced to approach the target value, thereby integrating the physical detection information into the optimization of the neural radiance field and precisely guiding the geometry of internal defects.

[0081] The above method acquires borehole data from the borehole sampling points on the track facility. When the borehole data indicates a void state, the third density value of the borehole sampling point is determined to be the first preset value, and the third density label of the borehole sampling point is set to a void label. When the borehole data indicates a dense state, the third density value of the borehole sampling point is determined to be the second preset value, and the third density label of the borehole sampling point is set to a dense label. This allows the third density value and third density label of each borehole sampling point to be determined accurately.

[0082] In some embodiments, step 103 includes:

[0083] Step 1031: Using a pre-trained scene generation model, perform linear interpolation on the density label to obtain multiple feature vectors, concatenate the multiple feature vectors to obtain concatenated features, and encode the position coordinates to obtain position encoded features.

[0084] In practice, the scene generation model includes a three-level learnable feature grid. The resolutions of the three levels are $16^3$, $64^3$, and $256^3$. Each grid cell stores a 32-dimensional feature vector, initialized from a uniform distribution. The feature grid covers the entire scene space in the X direction, the Y direction, and the Z direction (mileage K12+350~K12+420).

[0085] For the position coordinates $\mathbf{x}$ of the input spatial point, trilinear interpolation is first performed on each level of the feature grid to obtain three 32-dimensional feature vectors. These three 32-dimensional feature vectors are then concatenated to obtain a 96-dimensional concatenated feature vector $\mathbf{f}$.
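The multi-level lookup can be sketched in plain Python as follows. To keep the example small, it uses toy grids with 3, 5, and 9 vertices per axis and 4-dimensional features instead of the $16^3$, $64^3$, $256^3$ grids with 32-dimensional features described above; the helper names and the normalised coordinates in [0, 1] are assumptions.

```python
from itertools import product

def make_linear_grid(res):
    """Toy grid whose vertex features are linear in position (for checking only)."""
    s = res - 1
    return [[[[i / s, j / s, k / s, (i + j + k) / s] for k in range(res)]
             for j in range(res)] for i in range(res)]

def trilinear(grid, u, v, w):
    """Interpolate grid[i][j][k] (a feature list) at normalised coordinates in [0, 1]."""
    res = len(grid)
    base, frac = [], []
    for coord in (u, v, w):
        pos = min(max(coord, 0.0), 1.0) * (res - 1)
        lower = min(int(pos), res - 2)      # lower corner, kept inside the grid
        base.append(lower)
        frac.append(pos - lower)
    out = [0.0] * len(grid[0][0][0])
    for corner in product((0, 1), repeat=3):  # the 8 corners of the enclosing cell
        weight = 1.0
        for bit, f in zip(corner, frac):
            weight *= f if bit else 1.0 - f
        feat = grid[base[0] + corner[0]][base[1] + corner[1]][base[2] + corner[2]]
        for n, value in enumerate(feat):
            out[n] += weight * value
    return out

def multires_features(grids, u, v, w):
    """Concatenate the interpolated features of every grid level."""
    feats = []
    for grid in grids:
        feats.extend(trilinear(grid, u, v, w))
    return feats

grids = [make_linear_grid(res) for res in (3, 5, 9)]
feats = multires_features(grids, 0.3, 0.6, 0.9)
print(len(feats))
```

With three levels of 32-dimensional features, the same concatenation yields the 96-dimensional vector described in the text.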

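[0086] At the same time, position encoding is performed on the position coordinates $\mathbf{x}$ of the spatial point to obtain the position encoding features:

$$\gamma(\mathbf{x}) = \big(\sin(2^{0}\pi\mathbf{x}),\ \cos(2^{0}\pi\mathbf{x}),\ \ldots,\ \sin(2^{L-1}\pi\mathbf{x}),\ \cos(2^{L-1}\pi\mathbf{x})\big),$$

where $\gamma(\mathbf{x})$ is the position encoding feature, $\mathbf{x}$ is the position coordinates of the spatial point, and $L$ determines the feature dimension. In the above formula, $L = 10$, so the output dimension is $3 \times 2 \times 10 = 60$. The viewing direction $\mathbf{d}$ is position-encoded in the same way with $L = 4$, so the output dimension is $3 \times 2 \times 4 = 24$.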
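A minimal sketch of the position encoding, assuming the usual NeRF-style sine and cosine pairs at frequencies $2^k\pi$. The output ordering (grouped per coordinate) and the function name `positional_encoding` are assumptions; only the output dimensions 60 and 24 are given in the text.

```python
import math

def positional_encoding(p, num_freqs):
    """Encode each coordinate with sin/cos pairs at frequencies 2^k * pi."""
    out = []
    for coord in p:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * coord
            out.append(math.sin(angle))
            out.append(math.cos(angle))
    return out

position_code = positional_encoding((0.1, 0.2, 0.3), num_freqs=10)   # 3 * 2 * 10 values
direction_code = positional_encoding((0.0, 0.0, 1.0), num_freqs=4)   # 3 * 2 * 4 values
print(len(position_code), len(direction_code))
```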

[0087] Step 1032: Concatenate the concatenated features and the position encoding features, and process the result to obtain intermediate features and volume density.

[0088] In practical implementation, a multilayer perceptron (MLP) implements the mapping from the input to the volume density $\sigma$ and the color features $\mathbf{c}$. The concatenated features $\mathbf{f}$ and the position encoding features $\gamma(\mathbf{x})$ are concatenated and processed to obtain the intermediate features $\mathbf{h}$ and the volume density $\sigma$. The network structure of the scene generation model includes an input layer, multiple fully connected layers, and an output layer.

[0089] For example, there may be eight fully connected layers, each containing 256 neurons and a ReLU activation function. Specifically, the concatenated features $\mathbf{f}$ (96-dimensional) and the position encoding features $\gamma(\mathbf{x})$ (60-dimensional) are fed into the input layer of the scene generation model, for a total input of 156 dimensions. The eight fully connected layers process the concatenated input to obtain the intermediate features $\mathbf{h}$ (256-dimensional), and the softplus activation function is applied to the intermediate features $\mathbf{h}$ to obtain the volume density $\sigma$ (1-dimensional):

$$\sigma = \frac{1}{\beta}\ln\!\big(1 + e^{\beta\mathbf{h}}\big),$$

where $\sigma$ is the volume density, $\beta$ is the hyperparameter of the softplus activation function, and $\mathbf{h}$ is the intermediate features.
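The softplus activation keeps the predicted volume density non-negative. The sketch below assumes the common parameterised form softplus(x) = ln(1 + exp(βx)) / β and applies it to a scalar pre-activation; the rearrangement used to avoid overflow is an implementation choice, not something stated in the text.

```python
import math

def softplus(x: float, beta: float = 1.0) -> float:
    """Numerically stable softplus: ln(1 + exp(beta * x)) / beta."""
    z = beta * x
    # ln(1 + e^z) = max(z, 0) + ln(1 + e^(-|z|)), which never overflows.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta

for raw in (-5.0, 0.0, 5.0):
    print(raw, softplus(raw))
```

For large positive inputs the function approaches the identity, and for large negative inputs it approaches zero, so void regions can reach a density close to zero without ever going negative.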

[0090] Step 1033: The intermediate feature and the position encoding feature are concatenated to obtain the color feature.

[0091] In practical implementation, a multilayer perceptron (MLP) implements the mapping from the input to the volume density $\sigma$ and the color features $\mathbf{c}$. The intermediate features $\mathbf{h}$ and the position encoding features $\gamma(\mathbf{d})$ of the viewing direction are concatenated and processed to obtain the color features $\mathbf{c}$. The network structure of the scene generation model includes an input layer, multiple fully connected layers, and an output layer.

[0092] For example, there may be two fully connected layers: the first contains 256 neurons and a ReLU activation function, and the second contains 3 neurons. Specifically, the intermediate features $\mathbf{h}$ (256-dimensional) and the position encoding features $\gamma(\mathbf{d})$ (24-dimensional) are fed into the input layer, for a total input of 280 dimensions. The first fully connected layer processes the concatenated input to obtain 256-dimensional features, and the sigmoid activation function in the second fully connected layer is applied to these features to obtain the color features $\mathbf{c}$. The total number of network parameters is approximately 2.3M.

[0093] Step 1034: Determine the rendering color based on the volume density and color features of multiple sampling points on the image sampling path, and determine the rendering depth based on the volume density of multiple sampling points on the image sampling path.

[0094] In practice, the target sampling point is determined from the multiple sampling points along the image sampling path, and the opacity $\alpha_i$ and cumulative transmittance $T_i$ of the target sampling point are determined. The rendering color $\hat{C}(\mathbf{r})$ is determined based on the opacity, cumulative transmittance, and color features of the target sampling point. The rendering depth $\hat{D}(\mathbf{r})$ is determined based on the opacity, cumulative transmittance, and position of the target sampling point.

[0095] The above scheme utilizes a pre-trained scene generation model to perform linear interpolation on density labels to obtain multiple feature vectors. These feature vectors are then concatenated to obtain concatenated features, and the position coordinates are encoded to obtain position-encoded features. The concatenated features and position-encoded features are then concatenated to obtain intermediate features and volume density. Finally, the intermediate features and position-encoded features are concatenated to obtain color features. The rendering color is determined based on the volume density and color features of multiple sampling points along the image sampling path, and the rendering depth is determined based on the volume density of these sampling points. This enables deep fusion of image data, radar data, and ultrasonic data, resulting in more accurate rendering density and color that better reflects the actual conditions of the track infrastructure.

[0096] In some embodiments, step 1034 includes:

[0097] Step 10341: Determine the target sampling point from multiple sampling points on the image sampling path.

[0098] In practice, the image sampling path may be a ray cast through the image. For each image sampling path $\mathbf{r}$, the position $t_i$ of the target sampling point is determined from the multiple sampling points along the image sampling path.

[0099] Step 10342: Determine the sampling interval between the target sampling point and the previous sampling point based on the image sampling path, and determine the opacity of the target sampling point based on the sampling interval and volume density of the target sampling point.

[0100] In practice, the sampling interval between the target sampling point and the previous sampling point is determined based on the image sampling path:

$$\delta_i = t_i - t_{i-1},$$

where $\delta_i$ is the sampling interval (distance interval) between the target sampling point and the previous sampling point, $t_{i-1}$ is the position of the previous sampling point, and $t_i$ is the position of the target sampling point.

[0101] The opacity of the target sampling point is determined based on the sampling interval and volume density of the target sampling point:

$$\alpha_i = 1 - \exp(-\sigma_i\,\delta_i),$$

where $\alpha_i$ is the opacity of the target sampling point, $\sigma_i$ is the volume density of the target sampling point, and $\delta_i$ is the sampling interval between the target sampling point and the previous sampling point.

[0102] Step 10343: Determine multiple historical sampling points preceding the target sampling point, and determine the cumulative transmittance of the target sampling point based on the sampling interval and volume density of the multiple historical sampling points.

[0103] In practice, multiple historical sampling points preceding the target sampling point are identified, and the cumulative transmittance of the target sampling point is determined based on the sampling intervals and volume densities of these historical sampling points:

$$T_i = \exp\!\Big(-\sum_{j=1}^{i-1}\sigma_j\,\delta_j\Big),$$

where $T_i$ is the cumulative transmittance of the target sampling point, $\sigma_j$ is the volume density of the $j$-th historical sampling point, and $\delta_j$ is the sampling interval of the $j$-th historical sampling point. In the above formula, the cumulative transmittance $T_i$ is calculated iteratively to ensure numerical stability.

[0104] Step 10344: Multiply the opacity, the cumulative transmittance, and the color feature to obtain the first product result of the target sampling point, and sum the first product results of multiple sampling points on the image sampling path to obtain the rendering color.

[0105] In practice, the opacity, cumulative transmittance, and color features are multiplied to obtain the first product result of the target sampling point, and the first product results of the multiple sampling points along the image sampling path are summed to obtain the rendering color. The rendering color is calculated as

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\alpha_i\,\mathbf{c}_i,$$

where $\hat{C}(\mathbf{r})$ is the rendering color, $N$ is the total number of sampling points, $T_i$ is the cumulative transmittance of the target sampling point, $\alpha_i$ is the opacity of the target sampling point, and $\mathbf{c}_i$ is the color features of the target sampling point.
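Steps 10342 to 10344 can be sketched as a single pass along one ray. The sketch assumes that the interval of the first sample is measured from a near-plane distance `t_near` and that the transmittance starts at 1; both are assumptions, as are the function names.

```python
import math

def render_weights(sigmas, ts, t_near):
    """Per-sample weights T_i * alpha_i along one image sampling path."""
    weights = []
    transmittance = 1.0            # nothing blocks the first sample
    prev_t = t_near
    for sigma, t in zip(sigmas, ts):
        delta = t - prev_t                          # sampling interval
        alpha = 1.0 - math.exp(-sigma * delta)      # opacity of this sample
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha                # equals exp(-sum of sigma_j * delta_j)
        prev_t = t
    return weights

def render_color(sigmas, colors, ts, t_near):
    """Sum of weight * color over the samples, one value per RGB channel."""
    weights = render_weights(sigmas, ts, t_near)
    return tuple(sum(w * c[ch] for w, c in zip(weights, colors)) for ch in range(3))

# Empty space followed by an opaque green sample: the ray should come out green.
color = render_color([0.0, 1e6], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 2.0], 0.0)
print(color)
```

Updating the transmittance by multiplying with (1 − α) at each step is one way to compute it iteratively, since (1 − α_j) = exp(−σ_j δ_j).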

[0106] For example, the image sampling path $\mathbf{r}$ has 128 sampling points, that is, $N = 128$ in the above formula. The volume density $\sigma_i$ and color features $\mathbf{c}_i$ of each target sampling point are calculated in sequence, and the rendering color is determined based on the volume densities and color features of the multiple sampling points on the image sampling path $\mathbf{r}$.

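[0107] In addition, the formula for calculating the rendering color can also be expressed as

$$\hat{C}(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(t)\,\mathbf{c}(t)\,dt,$$

where $\hat{C}(\mathbf{r})$ is the rendering color, $t_n$ is the near-plane distance of the image sampling path $\mathbf{r}$, $t_f$ is the far-plane distance of the image sampling path $\mathbf{r}$, $T(t)$ is the cumulative transmittance up to the target sampling point $t$, $\sigma(t)$ is the volume density at the target sampling point $t$, and $\mathbf{c}(t)$ is the color features at the target sampling point $t$.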

[0108] Step 10345: Multiply the opacity, the cumulative transmittance, and the target sampling point to obtain the second product result of the target sampling point, and sum the second product results of multiple sampling points on the image sampling path to obtain the rendering depth.

[0109] In practice, the opacity, cumulative transmittance, and position of the target sampling point are multiplied to obtain the second product result of the target sampling point, and the second product results of the multiple sampling points along the image sampling path are summed to obtain the rendering depth. The rendering depth is calculated as

$$\hat{D}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\alpha_i\,t_i,$$

where $\hat{D}(\mathbf{r})$ is the rendering depth, $N$ is the total number of sampling points, $T_i$ is the cumulative transmittance of the target sampling point, $\alpha_i$ is the opacity of the target sampling point, and $t_i$ is the position of the target sampling point.
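The depth sum can be checked on a synthetic ray: empty space up to a wall, then dense material. The sketch below assumes 128 evenly spaced samples between 0 and 4 m and a density of 200 inside the wall; these numbers and the function name `render_depth` are hypothetical.

```python
import math

def render_depth(sigmas, ts, t_near):
    """Expected depth along one ray: sum of T_i * alpha_i * t_i."""
    depth, transmittance, prev_t = 0.0, 1.0, t_near
    for sigma, t in zip(sigmas, ts):
        alpha = 1.0 - math.exp(-sigma * (t - prev_t))  # opacity over this interval
        depth += transmittance * alpha * t
        transmittance *= 1.0 - alpha                   # light remaining after the sample
        prev_t = t
    return depth

ts = [(i + 1) * 4.0 / 128 for i in range(128)]        # sample positions in metres
sigmas = [0.0 if t < 2.6 else 200.0 for t in ts]      # wall starts at 2.6 m
depth = render_depth(sigmas, ts, t_near=0.0)
print(round(depth, 3))
```

The rendered depth lands on the first sample inside the wall, so its accuracy is limited by the sample spacing along the ray.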

[0110] For example, the image sampling path $\mathbf{r}$ has 128 sampling points, that is, $N = 128$ in the above formula. The volume density $\sigma_i$ of each target sampling point is calculated in sequence, and the rendering depth is determined based on the volume densities of the multiple sampling points on the image sampling path $\mathbf{r}$.

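[0111] In addition, the formula for calculating the rendering depth can also be expressed as

$$\hat{D}(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(t)\,t\,dt,$$

where $\hat{D}(\mathbf{r})$ is the rendering depth, $t_n$ is the near-plane distance of the image sampling path $\mathbf{r}$, $t_f$ is the far-plane distance of the image sampling path $\mathbf{r}$, $T(t)$ is the cumulative transmittance up to the target sampling point $t$, $\sigma(t)$ is the volume density at the target sampling point $t$, and $t$ is the position of the target sampling point.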

[0112] The above scheme determines the target sampling point from multiple sampling points along the image sampling path. The sampling interval between the target sampling point and the previous sampling point is determined based on the image sampling path. The opacity of the target sampling point is determined based on the sampling interval and volume density. Multiple historical sampling points preceding the target sampling point are identified, and the cumulative transmittance of the target sampling point is determined based on the sampling interval and volume density of these historical sampling points. The opacity, cumulative transmittance, and color features are multiplied to obtain the first product result for the target sampling point. The first product results of multiple sampling points along the image sampling path are then summed to obtain the rendering color. The opacity, cumulative transmittance, and target sampling point are multiplied to obtain the second product result for the target sampling point. The second product results of multiple sampling points along the image sampling path are then summed to obtain the rendering depth. This enables deep fusion of image data, radar data, and ultrasonic data, resulting in more accurate and realistic rendering density and color that better reflects the actual conditions of the track infrastructure.

[0113] In some embodiments, the pre-training process of the scene generation model includes:

[0114] Step 103A: Obtain mesh features at different resolutions; wherein the mesh features include: first resolution mesh features, second resolution mesh features and third resolution mesh features.

[0115] In practice, three grid features are acquired: a first-resolution grid feature, a second-resolution grid feature, and a third-resolution grid feature. The first-resolution grid feature is a low-resolution grid feature, the second-resolution grid feature is a medium-resolution grid feature, and the third-resolution grid feature is a high-resolution grid feature.

[0116] To improve training efficiency and reconstruction quality, embodiments of this disclosure employ multi-resolution progressive training. Each training phase corresponds to different resolution grid features and loss weights. After each training phase, the current feature grid is upsampled to the next resolution (trilinear interpolation initialization), and training continues.

[0117] Step 103B: Train the initial simulation model using the first resolution grid features to obtain a first total loss function; update the model parameters of the initial simulation model based on the first total loss function to obtain a first updated simulation model; and record the first iteration number of the initial simulation model.

[0118] In practice, Figure 3 is a flowchart illustrating multi-resolution progressive training in an embodiment of this disclosure. As shown in Figure 3, during the initialization phase, a learnable feature grid at the lowest resolution (e.g., $16^3$) is constructed (the first-resolution grid feature G0). In the first training phase, the initial simulation model is trained using the first-resolution grid feature G0. During this phase, the model primarily learns the global geometry of the scene and the approximate spatial distribution of defects. The physical constraint loss plays a dominant role in this phase, guiding the model to form preliminary defect areas near the sampling points.

[0119] Step 103C: In response to the first iteration number being greater than the preset iteration number, the first update simulation model is trained using the second resolution grid features to obtain a second total loss function. The model parameters of the first update simulation model are updated based on the second total loss function to obtain a second update simulation model, and the second iteration number of the first update simulation model is recorded.

[0120] In specific implementation, as shown in Figure 3, when the first iteration number exceeds the preset iteration number (reaching iteration number N1), the second training phase begins. In the second training phase, the first-resolution grid feature G0 is upsampled to obtain the second-resolution grid feature G1 (e.g., $64^3$). The first updated simulation model is trained using both the first-resolution grid feature G0 and the second-resolution grid feature G1. In the second training phase, the model begins to refine the geometric boundaries, and the influence of the color loss function gradually increases to better fit surface textures.

[0121] Step 103D: In response to the second iteration number being greater than the preset iteration number, the second update simulation model is trained using the third resolution grid features to obtain a third total loss function. The model parameters of the second update simulation model are updated based on the third total loss function to obtain a third update simulation model, and the third iteration number of the second update simulation model is recorded.

[0122] In specific implementation, as shown in Figure 3, when the second iteration number exceeds the preset iteration number (reaching iteration number N2), the third training phase begins. In the third training phase, the second-resolution grid feature G1 is upsampled to obtain the third-resolution grid feature G2 (e.g., $256^3$). The second updated simulation model is trained primarily using the third-resolution grid feature G2. In the third training phase, the model focuses on recovering geometric details and high-frequency textures to generate the final high-fidelity result.

[0123] Step 103E: In response to the third iteration number being greater than the preset iteration number, the third update simulation model is used as the scene generation model.

[0124] In practice, when the third iteration number is greater than the preset iteration number (reaching iteration number N3), the training ends, and the third updated simulation model obtained by multi-resolution progressive training is used as the scene generation model.
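The phase switching in steps 103B to 103E can be sketched as a lookup on the iteration counter. The sketch assumes a single global counter compared against cumulative thresholds N1 < N2 < N3; the text does not say whether the counts are cumulative or per phase, and the threshold values used below are hypothetical.

```python
def training_phase(iteration: int, n1: int, n2: int, n3: int):
    """Return (phase index, grid resolution per axis), or None once training is done."""
    if iteration <= n1:
        return 1, 16     # first-resolution grid feature G0
    if iteration <= n2:
        return 2, 64     # second-resolution grid feature G1
    if iteration <= n3:
        return 3, 256    # third-resolution grid feature G2
    return None          # the third updated model becomes the scene generation model

# Hypothetical thresholds, for illustration only.
for it in (1, 5001, 15001, 30001):
    print(it, training_phase(it, n1=5000, n2=15000, n3=30000))
```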

[0125] In the three training phases described above, the Adam optimizer is used to update the model parameters, with an initial learning rate of $5\times10^{-4}$ and an exponential decay strategy (multiply by 0.5 every 10,000 iterations). The gradient clipping threshold is set to 1.0.
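The optimizer settings can be illustrated with two small helpers. The sketch assumes a staircase decay (the rate halves at every multiple of 10,000 iterations) and clipping by global gradient norm; the text does not specify either detail, and the default initial rate of 5e-4 is an assumption, so it is exposed as a parameter.

```python
import math

def learning_rate(iteration: int, initial_lr: float = 5e-4,
                  decay: float = 0.5, every: int = 10_000) -> float:
    """Staircase exponential decay: multiply by `decay` once per `every` iterations."""
    return initial_lr * decay ** (iteration // every)

def clip_gradient(grads, max_norm: float = 1.0):
    """Rescale the gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

print(learning_rate(0), learning_rate(10_000), learning_rate(25_000))
print(clip_gradient([3.0, 4.0]))
```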

[0126] Figure 4 is a flowchart illustrating the training algorithm for the scene generation model in an embodiment of this disclosure. As shown in Figure 4, the image data, the camera parameters, the sampling points, and the actual density values corresponding to the sampling points are input into the initial simulation model (NeRF network); the multilayer perceptron (MLP) of the NeRF network is initialized, and the lowest-resolution grid feature G0 is obtained. The training stage S, the resolution, and the loss weights are set. The NeRF network is trained in stage S, starting with the lowest-resolution grid feature G0. Specifically, pixel rays are randomly sampled, the grid feature $G_n$ is sampled to obtain point features, and the predicted density and predicted color are determined through forward propagation. The total loss function is then determined based on the predicted density and predicted color, and the model parameters are updated through backpropagation to obtain the updated simulation model. After the training of stage S is completed, the next-resolution grid feature $G_{n+1}$ is obtained through upsampling, and training proceeds to stage S+1.

[0127] The above scheme obtains grid features at different resolutions; these grid features include: first-resolution grid features, second-resolution grid features, and third-resolution grid features. The first-resolution grid features are used to train the initial simulation model to obtain a first total loss function. Based on the first total loss function, the model parameters of the initial simulation model are updated to obtain a first updated simulation model, and the first iteration number of the initial simulation model is recorded. When the first iteration number exceeds a preset iteration number, the second-resolution grid features are used to train the first updated simulation model to obtain a second total loss function. Based on the second total loss function, the model parameters of the first updated simulation model are updated to obtain a second updated simulation model, and the second iteration number of the first updated simulation model is recorded. When the second iteration number exceeds a preset iteration number, the third-resolution grid features are used to train the second updated simulation model to obtain a third total loss function. Based on the third total loss function, the model parameters of the second updated simulation model are updated to obtain a third updated simulation model, and the third iteration number of the second updated simulation model is recorded. When the third iteration number exceeds a preset iteration number, the third updated simulation model is used as the scene generation model. In this way, through progressive training, the model can be trained from establishing its geometric foundation to refining its boundaries, achieving fine-grained training and enabling the trained scene generation model to accurately predict rendering colors and rendering depth.

[0128] In some embodiments, step 103B includes:

[0129] Step 103Ba: Determine the first predicted density and the first predicted color based on the first resolution grid features using the initial simulation model.

[0130] In practice, the first resolution grid features are input into the initial simulation model, and the initial simulation model is used to determine the first prediction density and the first prediction color based on the first resolution grid features.

[0131] Step 103Bb: Determine the first color loss function based on the first predicted color and the actual color corresponding to the first resolution grid feature.

[0132] In practice, the first color loss function is determined based on the first predicted color and the actual color corresponding to the first resolution grid feature.

[0133]

$$L_{color} = \sum_{\mathbf{r}\in\mathcal{R}} \big\|\hat{C}(\mathbf{r}) - C(\mathbf{r})\big\|_2^2,$$

[0134] where $L_{color}$ is the first color loss function, $\mathcal{R}$ is the set of rays along the image sampling paths, $\hat{C}(\mathbf{r})$ is the first predicted color, $C(\mathbf{r})$ is the actual color corresponding to the first-resolution grid feature, and $\|\cdot\|_2^2$ is the squared L2 norm.

[0135] For each batch of sampled rays (in this embodiment, the batch size is 1024 rays), the color loss function (L2 loss function) between the first predicted color and the actual color is calculated, where $\hat{C}(\mathbf{r})$ can be the three-primary-color (Red, Green, Blue; RGB) value of the pixel corresponding to the first predicted color, and $C(\mathbf{r})$ can be the RGB value of the pixel corresponding to the actual color.

[0136] Step 103Bc: Determine the first sparse loss function based on the first predicted density and the actual density value corresponding to the first resolution grid feature.

[0137] In practice, the first sparse loss function is determined based on the first predicted density and the actual density values corresponding to the first resolution grid features.

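[0138]

$$L_{sparse} = \sum_{i=1}^{M} \big\|\hat{\sigma}(\mathbf{x}_i) - \sigma_i\big\|_2^2,$$

[0139] where $L_{sparse}$ is the first sparse loss function, $M$ is the total number of sampling points, $\hat{\sigma}(\mathbf{x}_i)$ is the first predicted density at sampling point $\mathbf{x}_i$, $\sigma_i$ is the actual density value of the sampling point corresponding to the first-resolution grid feature, and $\|\cdot\|_2^2$ is the squared L2 norm.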

[0140] There are $M$ sampling points from ground-penetrating radar or ultrasound. Each sampling point is associated with an actual density value, which is mapped from the physical properties of the detection signal (e.g., radar reflection intensity, ultrasonic propagation time anomaly). A lower actual density value indicates a suspected defect (void) center, while a higher actual density value indicates healthy material.

[0141] The NeRF model is forced to predict volume density close to the actual density value at the sampling point coordinates. This is equivalent to providing sparse but precise anchor points for the neural network in three-dimensional space, guiding the model to generate high-density (solid) or low-density (void) regions in the correct locations.

[0142] For the 187 sampling points, the sparse loss function (L2 loss function) between the first predicted density of the network and the actual density value is calculated; that is, $M = 187$ in the above formula. The sparse loss function forces the density values predicted by the network at the sampling points to approximate the actual density values obtained from the physical measurements.
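A minimal sketch of the sparse loss, assuming an unnormalised sum of squared differences between the predicted and target densities (whether the original formula divides by the number of points is not recoverable from the text). The function name `sparse_loss` and the example densities are hypothetical.

```python
def sparse_loss(predicted, target):
    """Sum of squared differences between predicted and target densities."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# Two anchor points: a void (target 0.05) and dense concrete (target 10.0).
loss = sparse_loss(predicted=[0.05, 9.0], target=[0.05, 10.0])
print(loss)
```

Only the second point contributes here, because the first prediction already matches its target; this is how the measured points act as anchors for the density field.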

[0143] Step 103Bd: Determine the gradient parameters of the first predicted density with respect to spatial coordinates, and determine the first continuity loss function based on the gradient parameters.

[0144] In practice, the gradient parameters of the first prediction density with respect to spatial coordinates are determined, and the first continuity loss function is determined based on the gradient parameters.

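[0145]

$$L_{cont} = \sum_{\mathbf{x}\in\Omega} \big\|\nabla_{\mathbf{x}}\hat{\sigma}(\mathbf{x})\big\|_2^2,$$

[0146] where $L_{cont}$ is the first continuity loss function, $\Omega$ is the set of multiple spatial points, $\nabla_{\mathbf{x}}\hat{\sigma}(\mathbf{x})$ is the gradient parameter of the first predicted density with respect to the spatial coordinates, and $\|\cdot\|_2^2$ is the squared L2 norm.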

[0147] Based on the physical properties of engineering materials such as concrete, defects (e.g., voids) are typically continuous regions with relatively smooth boundaries, and do not produce isolated, highly irregular density variations within the material. A gradient smoothing constraint is applied to the density field of the scene space to penalize unreasonable and drastic density changes that occur in regions lacking data support.

[0148] As a strong prior, the continuity loss function effectively suppresses the tendency of neural networks to produce unrealistic geometry in sparse data regions, ensuring that the reconstructed hole shapes are natural, continuous, and in line with physical common sense.

[0149] For the 5000 pre-sampled spatial points, the gradient of the volume density with respect to the spatial coordinates is calculated at each spatial point (obtained through automatic differentiation), and the squared L2 norm of the gradient is taken to obtain the continuity loss function. The gradient calculation uses PyTorch's `torch.autograd.grad` function with `create_graph=True`, which preserves the computational graph for backpropagation.
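The continuity loss can be illustrated without a deep-learning framework. The sketch below replaces the automatic differentiation described above with central finite differences, so it only shows what quantity is penalised and cannot be backpropagated through; the density function, step size, and function names are hypothetical.

```python
def density_gradient(density_fn, point, eps=1e-4):
    """Central finite-difference gradient of density_fn(x, y, z) at a point."""
    grad = []
    for axis in range(3):
        hi = list(point)
        lo = list(point)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((density_fn(*hi) - density_fn(*lo)) / (2.0 * eps))
    return grad

def continuity_loss(density_fn, points, eps=1e-4):
    """Sum over points of the squared L2 norm of the density gradient."""
    total = 0.0
    for point in points:
        total += sum(g * g for g in density_gradient(density_fn, point, eps))
    return total

# A linear density field has a constant gradient (2, 3, -1), squared norm 14.
field = lambda x, y, z: 2.0 * x + 3.0 * y - z
print(continuity_loss(field, [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]))
```

A field that changes sharply between neighbouring points produces a large gradient norm and therefore a large penalty, which is what discourages isolated density spikes in regions without measurements.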

[0150] Step 103Be: Summing the color loss function, the sparse loss function, and the continuity loss function to obtain the total loss function.

[0151] In practice, the first total loss function is obtained by weighted summation of the first color loss function, the first sparse loss function, and the first continuity loss function.

[0152] $\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{color} + \lambda_2 \mathcal{L}_{sparse} + \lambda_3 \mathcal{L}_{cont}$,

[0153] where $\mathcal{L}_{total}$ is the first total loss function; $\lambda_1$ is the first weight, corresponding to the first color loss function $\mathcal{L}_{color}$; $\lambda_2$ is the second weight, corresponding to the first sparse loss function $\mathcal{L}_{sparse}$; and $\lambda_3$ is the third weight, corresponding to the first continuity loss function $\mathcal{L}_{cont}$.

[0154] In some scenarios, when the collected data includes actual sparse depth information provided by sensors such as LiDAR, a first depth monitoring loss function is determined based on the first predicted density and the actual sparse depth. The first total loss function is then obtained by weighted summation of the first color loss function, the first depth monitoring loss function, the first sparse loss function, and the first continuity loss function, where a fourth weight corresponds to the first depth monitoring loss function. This embodiment does not use depth monitoring, so the depth monitoring term is not included in the first total loss function.
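
A minimal sketch of the weighted summation, including the optional depth term, is given below. The function name `total_loss`, the dictionary keys, and the numeric weights in the usage lines are hypothetical and are not taken from the disclosure.

```python
def total_loss(color, sparse, continuity, weights, depth=None):
    """Weighted sum of the individual loss values.

    `weights` maps "color", "sparse", "continuity" (and optionally "depth")
    to their weights. The depth term is added only when a depth loss is given.
    """
    total = (
        weights["color"] * color
        + weights["sparse"] * sparse
        + weights["continuity"] * continuity
    )
    if depth is not None:
        total += weights.get("depth", 0.0) * depth
    return total


# Hypothetical weights and loss values, for illustration only.
example_weights = {"color": 1.0, "sparse": 0.5, "continuity": 0.1, "depth": 0.25}
without_depth = total_loss(0.2, 0.4, 1.0, example_weights)
with_depth = total_loss(0.2, 0.4, 1.0, example_weights, depth=2.0)
```

Leaving `depth` as `None` corresponds to the case described above in which depth monitoring is not used.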

[0155] In some scenarios, the weights corresponding to each loss function can be dynamically adjusted at different training stages. For example, when training the model using low-resolution grid features in the early stages, the weights of the sparse loss function and the continuity loss function need to be larger to establish the geometric foundation; when training the model using high-resolution grid features in the later stages, the weight of the color loss function should be increased to optimize the appearance. The weights are dynamically adjusted during training, as shown in Table 2.

[0156] Table 2. Weights corresponding to each loss function in each training phase.

[0157]
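
The stage-dependent weighting can be sketched as a lookup keyed by iteration number. All numbers below (stage boundaries and weights) are hypothetical placeholders chosen only to show the early-geometry, late-appearance pattern; they are not the values of Table 2.

```python
# Hypothetical schedule: (first iteration of the stage, weights in that stage).
HYPOTHETICAL_SCHEDULE = [
    (0, {"color": 0.5, "sparse": 1.0, "continuity": 0.5}),
    (10_000, {"color": 1.0, "sparse": 0.5, "continuity": 0.2}),
    (30_000, {"color": 2.0, "sparse": 0.2, "continuity": 0.1}),
]


def weights_for_iteration(iteration, schedule=HYPOTHETICAL_SCHEDULE):
    """Return the weights of the latest stage whose start is <= iteration."""
    current = schedule[0][1]
    for start, weights in schedule:
        if iteration >= start:
            current = weights
    return current


early = weights_for_iteration(0)
late = weights_for_iteration(49_999)
```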

[0158] The above scheme utilizes an initial simulation model to determine the first predicted density and the first predicted color based on the first resolution grid features. A color loss function is determined based on the first predicted color and the actual color corresponding to the first resolution grid features. A sparse loss function is determined based on the first predicted density and the actual density value corresponding to the first resolution grid features. The gradient parameters of the first predicted density with respect to spatial coordinates are determined, and a continuity loss function is determined based on these gradient parameters. The color loss function, sparse loss function, and continuity loss function are summed to obtain the total loss function. Thus, the total loss function encompasses the color loss function, sparse loss function, and continuity loss function, making the obtained total loss function more comprehensive and accurate, allowing for more precise adjustment of model parameters.

[0159] Through the above embodiments, image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction are acquired. An image sampling path is determined from the image data, and the position coordinates and density labels of each sampling point in the track facility are determined based on the radar and ultrasonic data. Using a pre-trained scene generation model, the rendering density and rendering color are determined based on the image sampling path, position coordinates, and density labels. A 3D scene representation of the track facility is generated based on the position coordinates, rendering density, and rendering color, and the 3D scene representation is rendered to obtain a rendered image of the track facility. This enables deep fusion of image data, radar data, and ultrasonic data, making the determined rendering density and rendering color more accurate and consistent with the actual situation of the track facility. This effectively improves the realism and visualization effect of the 3D scene representation, resulting in a high-quality rendered image of the track facility, which can then be used to accurately determine whether there is a risk of voids inside the track facility.

[0160] Every 1000 iterations, the Peak Signal-to-Noise Ratio (PSNR) and the average density of the hole regions are calculated on the validation set (randomly selected 10% of the images, i.e., 14 images). PSNR is used to monitor image reconstruction quality, and the average density of the hole regions is used to monitor whether physical constraints are met. The total number of training iterations is 50,000, which takes approximately 8 hours on a Graphics Processing Unit (GPU).
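
For reference, PSNR can be computed from the mean squared error between a validation image and its rendering. The sketch below assumes pixel values flattened into a list and normalized to the range [0, 1]; the function name `psnr` and the `max_value` parameter are choices made for this example.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Three hypothetical pixel values and their rendered counterparts.
example_psnr = psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
```

Higher values indicate a rendering closer to the reference image.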

[0161] After training, the curve of each loss function is plotted to confirm that the scene generation model has converged. Typical convergence values of the loss functions are ... .

[0162] In the key monitoring area (K12+380~K12+400), within the corresponding mileage, the density field is uniformly sampled at intervals of 0.02 m to obtain a three-dimensional density grid. The Marching Cubes algorithm is used to extract isosurfaces, with the isosurface threshold set to ... The isosurface threshold, falling between the density of low-density voids (0.05) and that of high-density solids (10.0), effectively separates voids from the lining. The extracted mesh model contains the voids and part of the lining; it is then spatially clipped (retaining only the region satisfying ... and density ...) to obtain an independent mesh of the void.
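
The sampling and thresholding idea can be sketched without a mesh library. This example does not implement Marching Cubes; as a simpler stand-in it samples the density field at the cell centres of a uniform grid and counts the cells below a threshold to estimate the void volume. The function names, the cell-centre convention, and the threshold passed in the usage lines are assumptions; the threshold value used in the embodiment is not reproduced here.

```python
def sample_density_grid(density_fn, lower, upper, spacing):
    """Sample density_fn at the cell centres of a uniform grid over a box."""
    counts = [max(1, round((hi - lo) / spacing)) for lo, hi in zip(lower, upper)]
    samples = []
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                centre = (
                    lower[0] + (i + 0.5) * spacing,
                    lower[1] + (j + 0.5) * spacing,
                    lower[2] + (k + 0.5) * spacing,
                )
                samples.append((centre, density_fn(centre)))
    return samples


def void_volume_from_samples(samples, spacing, threshold):
    """Estimate void volume as (number of cells below threshold) * cell volume."""
    void_cells = sum(1 for _, density in samples if density < threshold)
    return void_cells * spacing ** 3


# Hypothetical field: void-like density for x < 0.5, dense elsewhere.
def toy_field(p):
    return 0.05 if p[0] < 0.5 else 10.0


toy_samples = sample_density_grid(toy_field, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.25)
toy_volume = void_volume_from_samples(toy_samples, 0.25, threshold=1.0)
```

A voxel count of this kind gives only a coarse estimate; isosurface extraction yields the explicit surface needed for the mesh-based measurements.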

[0163] The following post-processing steps are performed on the void mesh: hole filling, smoothing, and simplification. Hole filling uses the Close Holes function of the mesh editing tool to fill small holes caused by insufficient sampling. Smoothing applies Laplacian smoothing (5 iterations, smoothing coefficient ...) to remove staircase noise. Simplification keeps the number of mesh faces below 100,000 for easier engineering application.
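
Laplacian smoothing moves each vertex part of the way toward the average of its neighbours. The sketch below is a generic version for a triangle mesh given as vertex and face lists. The default of 5 iterations follows the text, while the default `coefficient=0.5` is a hypothetical placeholder because the coefficient used in the embodiment is not reproduced here.

```python
def laplacian_smooth(vertices, faces, iterations=5, coefficient=0.5):
    """Move every vertex toward the mean of its edge-connected neighbours."""
    neighbours = [set() for _ in vertices]
    for a, b, c in faces:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))

    current = [tuple(v) for v in vertices]
    for _ in range(iterations):
        updated = []
        for index, vertex in enumerate(current):
            linked = neighbours[index]
            if not linked:
                updated.append(vertex)  # isolated vertex: leave unchanged
                continue
            mean = tuple(
                sum(current[j][d] for j in linked) / len(linked) for d in range(3)
            )
            updated.append(
                tuple(vertex[d] + coefficient * (mean[d] - vertex[d]) for d in range(3))
            )
        current = updated  # all vertices are updated simultaneously
    return current


# Hypothetical fan of four triangles with a raised centre vertex (a spike).
fan_vertices = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
fan_faces = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
smoothed_once = laplacian_smooth(fan_vertices, fan_faces, iterations=1)
```

After one iteration the spike at the centre vertex is halved, which is the kind of staircase noise reduction the smoothing step is meant to provide.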

[0164] The geometric parameters of the void are calculated from the processed mesh, including volume, bounding box, center location, and minimum protective layer thickness. Specifically, the first, second, and third vertex coordinates of each triangle in the cavity region are determined from the multiple sampling points, and the volume of the cavity is calculated by mesh integration (divergence theorem): $V = \frac{1}{6}\left|\sum_{i} \mathbf{v}_{i,1} \cdot (\mathbf{v}_{i,2} \times \mathbf{v}_{i,3})\right|$, where $\mathbf{v}_{i,1}$, $\mathbf{v}_{i,2}$, and $\mathbf{v}_{i,3}$ are the vertex coordinates of the $i$-th triangle in the cavity region. The calculated cavity volume is 1.23 m³. The bounding box is obtained from the minimum and maximum coordinates of all vertices, yielding a longitudinal length of 2.85 m, a circumferential width of 1.92 m, and a radial height of 0.28 m. Using the average coordinates of the bounding box vertices as the center position, the mileage is K12+388.3, the circumferential angle is approximately 89° (arch top), and the radial distance is 2.75+0.32=3.07 m. The minimum distance from the lower boundary point of the cavity to the outer surface of the lining (radius 3.075 m) is calculated as the minimum protective layer thickness, which is 0.12 m.
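
The mesh-based volume and bounding box computations can be sketched as follows. The volume routine sums signed tetrahedron volumes over the triangles, which is the standard divergence-theorem result for a closed, consistently oriented mesh. The function names and the tetrahedron used as test geometry are choices made for this example.

```python
def mesh_volume(vertices, faces):
    """Volume enclosed by a closed, consistently oriented triangle mesh."""
    total = 0.0
    for a, b, c in faces:
        x1, y1, z1 = vertices[a]
        x2, y2, z2 = vertices[b]
        x3, y3, z3 = vertices[c]
        # Scalar triple product v1 . (v2 x v3) = 6 * signed tetrahedron volume.
        total += (
            x1 * (y2 * z3 - z2 * y3)
            - y1 * (x2 * z3 - z2 * x3)
            + z1 * (x2 * y3 - y2 * x3)
        )
    return abs(total) / 6.0


def bounding_box(vertices):
    """Axis-aligned bounding box as (minimum corner, maximum corner)."""
    mins = tuple(min(v[d] for v in vertices) for d in range(3))
    maxs = tuple(max(v[d] for v in vertices) for d in range(3))
    return mins, maxs


# Tetrahedron with outward-facing triangles, shifted away from the origin.
tetra_vertices = [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_volume = mesh_volume(tetra_vertices, tetra_faces)
tetra_box = bounding_box(tetra_vertices)
```

Because the result depends on the mesh being closed, the hole filling step has to precede the volume calculation.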

[0165] The above-described embodiment was validated, and the results showed that of the five boreholes, four were located inside the reconstructed cavity, and one was located at the boundary. Endoscopic observations closely matched the reconstructed morphology: the cavity actually existed, and the boundary location deviated from the model prediction by less than 0.1m.

[0166] A 0.5m × 0.5m window was excavated near K12+388 to directly measure the radial height and circumferential width of the cavity. The measured values were: height 0.25~0.30m, width approximately 2.1m, consistent with the reconstructed results (height 0.28m, width 1.92m). The width error was mainly due to the limited excavation area, which did not reveal the complete boundary. The reconstructed geometric parameters were compared with the verification values, and the results are shown in Table 3.

[0167] Table 3 Comparison of Reconstructed and Verified Geometric Parameters

[0168]

[0169] In the same section, a cavity model was reconstructed using traditional Kriging interpolation based on the same 187 probe points (using only probe points, not images). The reconstruction results showed that the cavity morphology was a regular ellipsoid with a volume of 1.62 m³, an error of 32%, and significant discontinuity when stitched with the surface model. The method of this invention significantly outperforms traditional methods in terms of geometric accuracy, morphological realism, and integration with the surface model.

[0170] Based on the reconstructed 3D model of the cavity, the grouting volume was accurately calculated (cavity volume 1.23m³, considering 10% loss, 1.35m³ of grout is required). The grouting hole layout was optimized: the original plan required 8 grouting holes (grid layout), but now, based on the cavity morphology, only 5 holes are needed (critical locations), saving 3 drilling holes and resulting in a direct cost saving of 24,000 yuan. Post-grouting retesting verified that the cavity was completely filled.

[0171] Since the minimum protective layer thickness of 0.12m is greater than the standard requirement of 0.10m, the void is determined not to affect structural safety and no additional reinforcement is required. This avoids unnecessary reinforcement measures and saves approximately 150,000 yuan in costs.
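
The two engineering checks above reduce to simple arithmetic, sketched here with the figures quoted in the text. The function names are hypothetical, and treating the 10% loss as a multiplicative allowance on the void volume is an assumption that reproduces the stated 1.35 m³.

```python
def grout_volume(void_volume_m3, loss_fraction):
    """Grout demand: void volume increased by a fractional loss allowance."""
    return void_volume_m3 * (1.0 + loss_fraction)


def needs_reinforcement(min_cover_m, required_cover_m):
    """True when the minimum protective layer is thinner than required."""
    return min_cover_m < required_cover_m


required_grout = grout_volume(1.23, 0.10)          # about 1.353 m^3
reinforce = needs_reinforcement(0.12, 0.10)        # cover exceeds the requirement
```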

[0172] The cavity mesh model is registered with the inner surface model of the tunnel obtained by laser scanning and imported into a 3D platform, and a digital twin model carrying defect information is generated for long-term health monitoring and operation and maintenance management.

[0173] This disclosure successfully reconstructed a three-dimensional geometric model of the cavity behind the tunnel lining, achieving high accuracy, physical reliability, and measurability. The data processing strictly followed the technical solution of this disclosure, verifying the effectiveness and superiority of the method.

[0174] The main inventive points of this disclosure are as follows:

[0175] (1) Fusion mechanism of sparse physical detection data and the neural radiance field: non-image sensing data such as ground penetrating radar and ultrasound are fused through target density mapping and the sparse loss function. This specific method of embedding such data as hard constraints into the NeRF optimization process is the key to moving from reconstruction of pure appearance to perception of the interior.

[0176] (2) Physical regularization constraints based on material continuity: the continuity loss function based on the density field gradient serves as prior knowledge, ensuring that the reconstructed internal defect geometry (e.g., voids) in data-scarce areas is continuous, natural, and consistent with the physical properties of engineering materials, avoiding unrealistic abrupt geometric changes.

[0177] (3) Multi-resolution progressive training framework for reconstruction of hidden defects: the staged training strategy and the structure and usage of the learnable multi-resolution feature grid G. Through coarse-to-fine learning, the staged training strategy significantly improves the convergence speed, stability, and final reconstruction accuracy of the model under sparse data conditions.

[0178] (4) Overall architecture and loss function of the physically constrained implicit neural radiance field: the complete design in which the color loss function of standard NeRF, the sparse loss function, and the continuity loss function (and the optional depth monitoring loss function) are combined into the total loss function, together with the overall technical solution for three-dimensional reconstruction of hidden defects in rail transit facilities.

[0179] (5) Transformation and application methods from implicit field to explicit defect model: the specific process of generating an explicit three-dimensional mesh model from a trained NeRF model through isosurface extraction algorithms (e.g., Marching Cubes), and the application methods of using the model for defect volume calculation, spatial positioning, and morphological analysis.

[0180] The technical effects achieved by the embodiments of this disclosure are as follows:

[0181] (1) High reliability of reconstructed geometry and physics: through the continuity loss function, the morphology of the defect body generated between sparse detection points is natural and smooth, avoiding unrealistic geometry such as isolated fragments or spikes produced by traditional interpolation methods. The results are more in line with engineering practice and can be directly used for mechanical analysis and risk assessment.

[0182] (2) Achieving seamless soft fusion of multi-source data: unlike hard fusion, which reconstructs each data source separately and then stitches the results together using Boolean operations, this embodiment of the present disclosure unifies visual signals and physical detection signals under an implicit representation framework through a differentiable loss function during the optimization process. The reconstructed model is a geometrically continuous and consistent whole from the inside out, without any seams or contradictions.

[0183] (3) High accuracy and measurability of the output model: the continuous representation based on the neural radiance field can capture sub-voxel level details. The extracted mesh model has accurate vertex coordinates, from which key engineering indicators such as the volume of the defect (e.g., the number of cubic meters of voids), surface area, and minimum protective layer thickness can be accurately calculated, providing reliable input for digital maintenance.

[0184] (4) Low data requirements and strong applicability: It only requires a small number (a few to dozens) of internal detection points and some surface images to work, which greatly reduces the reliance on expensive and comprehensive CT scans or dense borehole sampling. It has extremely high practical value and operability in field operation and maintenance.

[0185] In a simulated tunnel lining void detection scenario, using only 5 radar detection points and 20 surface images, it is expected to reconstruct a 3D model with a void volume error of less than 10% and a boundary positioning accuracy better than 2 cm, which is significantly better than traditional interpolation methods (volume error is usually >30%).

[0186] Alternative solutions that can achieve the objectives of the embodiments of this disclosure are also within the scope of this disclosure. For example, alternative solutions include:

[0187] (1) Replacement of physical constraint form: in addition to the L2 smoothing constraint on the density field gradient, total variation regularization can be used to promote piecewise smoothness, or more complex priors based on material damage mechanics models can be introduced.

[0188] (2) Alternatives to neural network representations: Implicit scene representations are not limited to standard MLPs and can adopt more efficient architectures based on tensor decomposition (e.g., TensoRF) or hash grids (e.g., InstantNGP), as long as they are compatible with physical constraint loss.

[0189] (3) Alternatives to data fusion methods: In addition to converting sparse detection data into target density values for constraint, it can also be converted into virtual, extremely sparse depth points or virtual, extremely low-resolution voxel labels, and supervised by the corresponding loss terms.

[0190] (4) Alternative training strategies: multi-resolution progressive training can be replaced by a curriculum learning strategy, such as gradually increasing the resolution of the training images or gradually relaxing the weights of the physical constraints.

[0191] While these alternatives differ in their specific implementations, they all embody the core inventive concept of fusing visual and physical sensing data using neural implicit representations and solving the ill-conditioned reconstruction problem under sparse data by introducing physical prior constraints.

[0192] The embodiments disclosed herein can be directly applied to the digital detection and archiving of internal hidden defects in concrete structures such as railway tunnels, bridge piers, and retaining walls, providing a precise three-dimensional geometric basis for condition assessment, maintenance and reinforcement design, and engineering quantity calculation.

[0193] It should be noted that the method of this disclosure embodiment can be executed by a single device, such as a computer or server. The method of this embodiment can also be applied to a distributed scenario, where multiple devices cooperate to complete the task. In such a distributed scenario, one of these devices may execute only one or more steps of the method of this disclosure embodiment, and the multiple devices will interact with each other to complete the method described.

[0194] It should be noted that the above description describes some embodiments of this disclosure. Other embodiments are within the scope of the appended claims of this disclosure. In some cases, the actions or steps described in the claims of this disclosure may be performed in a different order than that shown in the above embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0195] Based on the same inventive concept, corresponding to any of the above embodiments, this disclosure also provides a three-dimensional scene generation device for track facilities.

[0196] Referring to Figure 5, the three-dimensional scene generation device for the track facility includes:

[0197] The acquisition module 301 is configured to acquire image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction;

[0198] The first determining module 302 is configured to determine an image sampling path from the image data and to determine the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data.

[0199] The second determining module 303 is configured to use a pre-trained scene generation model to determine the rendering density and rendering color based on the image sampling path, the position coordinates and the density label.

[0200] The 3D scene generation module 304 is configured to generate a 3D scene representation of the track facility based on the position coordinates, the rendering density, and the rendering color, and to render the 3D scene representation to obtain a rendered image of the track facility.

[0201] In some embodiments, the first determining module 302 includes:

[0202] The image sampling path determination unit is configured to determine the center point coordinates and the viewing direction from the image data, and to determine the image sampling path based on the center point coordinates and the viewing direction;

[0203] The first density parameter determination unit is configured to determine the lining depth of the track facility based on the radar data, determine the first position coordinates of each radar sampling point based on the lining depth, and determine the first density value and first density label of each radar sampling point based on the amplitude parameter in the radar data.

[0204] The second density parameter determination unit is configured to convert the grid position of each ultrasonic sampling point into corresponding second position coordinates, determine the lining thickness of the track facility based on the ultrasonic data, and determine the second density value and second density label of each ultrasonic sampling point based on the lining thickness.

[0205] In some embodiments, the apparatus further includes: a density parameter determination module; the density parameter determination module includes:

[0206] The borehole data acquisition unit is configured to acquire borehole data from borehole sampling points on the track facility.

[0207] The third density parameter determination unit is configured to determine the third density value of the borehole sampling point as a first preset value in response to the borehole state of the borehole data being a void state, and to determine the third density label of the borehole sampling point as a void label.

[0208] The third density parameter determination unit is configured to determine the third density value of the borehole sampling point as a second preset value in response to the borehole state of the borehole data being a dense state, and to determine the third density label of the borehole sampling point as a dense label.
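
The mapping from borehole state to density value and label can be sketched as a small lookup. The numeric presets below are assumptions: they reuse the void and solid densities quoted for the isosurface threshold (0.05 and 10.0), which may or may not be the first and second preset values of the embodiment. The state strings and function name are likewise hypothetical.

```python
# Assumed presets (see lead-in): void-like and dense-material densities.
ASSUMED_VOID_DENSITY = 0.05
ASSUMED_DENSE_DENSITY = 10.0


def borehole_density(state):
    """Return (density value, density label) for a borehole state."""
    if state == "void":
        return ASSUMED_VOID_DENSITY, "void"
    if state == "dense":
        return ASSUMED_DENSE_DENSITY, "dense"
    raise ValueError(f"unknown borehole state: {state!r}")


void_value, void_label = borehole_density("void")
```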

[0209] In some embodiments, the second determining module 303 includes:

[0210] The encoding processing unit is configured to use a pre-trained scene generation model to perform linear interpolation on the density label to obtain multiple feature vectors, concatenate the multiple feature vectors to obtain concatenated features, and encode the position coordinates to obtain position encoded features.

[0211] The first splicing processing unit is configured to splice the splicing features and the position encoding features to obtain intermediate features and volume density;

[0212] The second splicing processing unit is configured to splice the intermediate feature and the position encoding feature to obtain the color feature;

[0213] The rendering parameter determination unit is configured to determine the rendering color based on the volume density and color features of multiple sampling points on the image sampling path, and to determine the rendering depth based on the volume density of multiple sampling points on the image sampling path.
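
The text above states only that the position coordinates are encoded. One common choice in NeRF-style models is a sinusoidal positional encoding, sketched below as an assumption rather than as the encoding used by the disclosed model. The function name and `num_frequencies=4` are hypothetical.

```python
import math


def positional_encoding(coords, num_frequencies=4):
    """Sinusoidal encoding: raw coordinates followed by sin/cos at doubling frequencies."""
    features = list(coords)
    for level in range(num_frequencies):
        frequency = (2.0 ** level) * math.pi
        for value in coords:
            features.append(math.sin(frequency * value))
            features.append(math.cos(frequency * value))
    return features


# A 3D point yields 3 + 3 * 2 * num_frequencies features.
encoded = positional_encoding((0.0, 0.0, 0.0))
```

The encoded vector would then be concatenated with the interpolated grid features before being passed to the density and color branches.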

[0214] In some embodiments, the rendering parameter determination unit includes:

[0215] The target sampling point determination subunit is configured to determine a target sampling point from a plurality of sampling points on the image sampling path;

[0216] An opacity determination subunit is configured to determine the sampling interval between the target sampling point and the previous sampling point based on the image sampling path, and to determine the opacity of the target sampling point based on the sampling interval and volume density of the target sampling point.

[0217] The cumulative transmittance determination subunit is configured to determine multiple historical sampling points prior to the target sampling point, and to determine the cumulative transmittance of the target sampling point based on the sampling interval and volume density of the multiple historical sampling points.

[0218] The rendering color determination subunit is configured to multiply the opacity, the cumulative transmittance, and the color features to obtain a first product result of the target sampling point, and to sum the first product results of multiple sampling points on the image sampling path to obtain the rendering color.

[0219] The rendering depth determination subunit is configured to multiply the opacity, the cumulative transmittance, and the target sampling point to obtain a second product result of the target sampling point, and to sum the second product results of multiple sampling points on the image sampling path to obtain the rendering depth.
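
The subunits above correspond to the usual discrete volume rendering sums. The sketch below assumes the standard NeRF quadrature, in which the opacity of a sample is 1 - exp(-density * interval) and the cumulative transmittance is the product of exp(-density * interval) over the preceding samples. It also assumes that the quantity multiplied in for the rendering depth is the distance of the sample along the path. The function name is hypothetical.

```python
import math


def render_ray(densities, colors, depths, intervals):
    """Accumulate rendered color and depth along one sampling path."""
    transmittance = 1.0  # fraction of light reaching the current sample
    rendered_color = [0.0, 0.0, 0.0]
    rendered_depth = 0.0
    for sigma, color, depth, delta in zip(densities, colors, depths, intervals):
        opacity = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * opacity
        for channel in range(3):
            rendered_color[channel] += weight * color[channel]
        rendered_depth += weight * depth
        transmittance *= math.exp(-sigma * delta)  # attenuate for later samples
    return rendered_color, rendered_depth


# Hypothetical path: an empty sample followed by an effectively opaque one.
ray_color, ray_depth = render_ray(
    densities=[0.0, 1e6],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    depths=[1.0, 2.0],
    intervals=[1.0, 1.0],
)
```

In this usage the first sample contributes nothing and the second dominates, so the rendered color and depth are those of the opaque sample.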

[0220] In some embodiments, the apparatus further includes: a model training module, the model training module comprising:

[0221] The mesh feature acquisition unit is configured to acquire mesh features at different resolutions; wherein the mesh features include: a first resolution mesh feature, a second resolution mesh feature, and a third resolution mesh feature;

[0222] The first training unit is configured to train the initial simulation model using the first resolution grid features to obtain a first total loss function, update the model parameters of the initial simulation model based on the first total loss function to obtain a first updated simulation model, and record the first iteration number of the initial simulation model.

[0223] The second training unit is configured to, in response to the first iteration number being greater than a preset iteration number, train the first update simulation model using the second resolution grid features to obtain a second total loss function, update the model parameters of the first update simulation model based on the second total loss function to obtain a second update simulation model, and record the second iteration number of the first update simulation model.

[0224] The third training unit is configured to, in response to the second iteration number being greater than a preset iteration number, train the second update simulation model using the third resolution grid features to obtain a third total loss function, update the model parameters of the second update simulation model based on the third total loss function to obtain a third update simulation model, and record the third iteration number of the second update simulation model;

[0225] The scene generation model determination unit is configured to use the third update simulation model as the scene generation model in response to the third iteration number being greater than a preset iteration number.
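
The three training units follow one pattern: train at a given grid resolution until the iteration count exceeds the preset number, then hand the updated model to the next resolution. A minimal control-flow sketch is given below. The resolutions, the `dummy_train_step` stand-in, and the dictionary used as model state are all hypothetical.

```python
def progressive_training(resolutions, train_step, preset_iterations):
    """Train stage by stage, advancing once a stage's count exceeds the preset."""
    state = {"updates": 0}
    iterations_per_stage = []
    for resolution in resolutions:
        iteration = 0
        while iteration <= preset_iterations:
            train_step(state, resolution)  # one loss evaluation and parameter update
            iteration += 1
        iterations_per_stage.append(iteration)
    return state, iterations_per_stage


def dummy_train_step(state, resolution):
    """Hypothetical stand-in for one optimization step at the given resolution."""
    state["updates"] += 1
    state["last_resolution"] = resolution


final_state, stage_counts = progressive_training([16, 64, 256], dummy_train_step, 3)
```

The state returned after the last stage plays the role of the scene generation model.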

[0226] In some embodiments, the first training unit includes:

[0227] The first prediction subunit is configured to determine a first prediction density and a first prediction color based on the first resolution grid features using the initial simulation model.

[0228] The color loss function determination sub-unit is configured to determine the first color loss function based on the first predicted color and the actual color corresponding to the first resolution grid feature.

[0229] The sparse loss function determination subunit is configured to determine the first sparse loss function based on the first predicted density and the actual density value corresponding to the first resolution grid feature.

[0230] A continuity loss function determination subunit is configured to determine the gradient parameters of the first prediction density with respect to spatial coordinates, and to determine the first continuity loss function based on the gradient parameters;

[0231] The total loss function determination sub-unit is configured to sum the first color loss function, the first sparse loss function, and the first continuous loss function to obtain the first total loss function.

[0232] For ease of description, the above apparatus is described in terms of its functions, divided into various modules. Of course, in implementing this disclosure, the functions of each module can be implemented in one or more pieces of software and/or hardware.

[0233] The apparatus described above is used to implement the three-dimensional scene generation method for the corresponding track facilities in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0234] Based on the same inventive concept, corresponding to the methods of any of the above embodiments, this disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for generating a three-dimensional scene of the track facility as described in any of the above embodiments.

[0235] Figure 6 illustrates a more specific hardware structure of an electronic device provided by this embodiment, which may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.

[0236] The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.

[0237] The memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 1020 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.

[0238] The input / output interface 1030 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components within the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touchscreens, microphones, various sensors, etc., while output devices may include displays, speakers, vibrators, indicator lights, etc.

[0239] The communication interface 1040 is used to connect the communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate via wired means (such as USB (Universal Serial Bus), network cable, etc.) or wireless means (such as mobile network, WIFI (Wireless Fidelity), Bluetooth, etc.).

[0240] Bus 1050 includes a pathway for transmitting information between various components of the device, such as processor 1010, memory 1020, input / output interface 1030, and communication interface 1040.

[0241] It should be noted that although the above-described device only shows the processor 1010, memory 1020, input / output interface 1030, communication interface 1040, and bus 1050, in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the embodiments of this specification, and not necessarily all the components shown in the figures.

[0242] The electronic devices described above are used to implement the three-dimensional scene generation method for the corresponding track facilities in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0243] Based on the same inventive concept, corresponding to the methods of any of the above embodiments, this disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the three-dimensional scene generation method for track facilities as described in any of the above embodiments.

[0244] The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0245] The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the three-dimensional scene generation method of the track facility as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0246] Based on the same inventive concept, corresponding to any of the above embodiments, this application also provides a computer program product, including computer program instructions. When the computer program instructions are run on a computer, the computer executes the three-dimensional scene generation method for track facilities as described in any of the above embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0247] It is understood that before using the technical solutions of the various embodiments in this disclosure, users will be informed of the type, scope of use, and usage scenarios of the personal information involved in an appropriate manner, and user authorization will be obtained.

[0248] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose, based on the prompt message, whether to provide personal information to the software or hardware such as electronic devices, applications, servers, or storage media performing the operations of this disclosed technical solution.

[0249] As an optional, non-limiting implementation, the prompt message sent in response to a user's active request can be delivered via a pop-up window, in which the prompt message is presented as text. The pop-up window can also include a selection control that allows the user to choose "agree" or "disagree" to providing personal information to the electronic device.

[0250] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0251] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of this disclosure is limited to these examples; within the framework of this disclosure, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of the embodiments of this disclosure as described above, which are not provided in detail for the sake of brevity.

[0252] Additionally, to simplify the description and discussion, and to avoid obscuring the embodiments of this disclosure, the provided drawings may or may not show well-known power / ground connections to integrated circuit (IC) chips and other components. Furthermore, the apparatus may be shown in block diagram form to avoid obscuring the embodiments of this disclosure, and this also takes into account the fact that the details of implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of this disclosure will be implemented (i.e., these details should be fully understood by those skilled in the art). While specific details (e.g., circuits) have been set forth to describe exemplary embodiments of this disclosure, it will be apparent to those skilled in the art that the embodiments of this disclosure can be implemented without these specific details or with variations thereof. Therefore, these descriptions should be considered illustrative rather than restrictive.

[0253] Although this disclosure has been described in conjunction with specific embodiments thereof, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.

[0254] This disclosure is intended to cover all such substitutions, modifications, and variations that fall within the broad scope of this disclosure. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the protection scope of this disclosure.

Claims

1. A method for generating a three-dimensional scene of a track facility, characterized in that the method comprises:
acquiring image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction;
determining an image sampling path from the image data, and determining the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data;
using a pre-trained scene generation model, determining a rendering depth and a rendering color based on the image sampling path, the position coordinates, and the density label;
generating a three-dimensional scene representation of the track facility based on the position coordinates, the rendering depth, and the rendering color, and rendering the three-dimensional scene representation to obtain a rendered image of the track facility;
wherein the pre-training process of the scene generation model comprises:
obtaining grid features at different resolutions, wherein the grid features include a first resolution grid feature, a second resolution grid feature, and a third resolution grid feature, the first resolution grid feature is constructed, the first resolution grid feature is upsampled to obtain the second resolution grid feature, and the second resolution grid feature is upsampled to obtain the third resolution grid feature;
training an initial simulation model using the first resolution grid feature to obtain a first total loss function, updating the model parameters of the initial simulation model based on the first total loss function to obtain a first updated simulation model, and recording a first iteration number of the initial simulation model, wherein the initial simulation model is a NeRF network;
in response to the first iteration number being greater than a preset iteration number, training the first updated simulation model using the second resolution grid feature to obtain a second total loss function, updating the model parameters of the first updated simulation model based on the second total loss function to obtain a second updated simulation model, and recording a second iteration number of the first updated simulation model;
in response to the second iteration number being greater than the preset iteration number, training the second updated simulation model using the third resolution grid feature to obtain a third total loss function, updating the model parameters of the second updated simulation model based on the third total loss function to obtain a third updated simulation model, and recording a third iteration number of the second updated simulation model;
in response to the third iteration number being greater than the preset iteration number, using the third updated simulation model as the scene generation model;
wherein the step of training the initial simulation model using the first resolution grid feature to obtain the first total loss function comprises:
using the initial simulation model to determine a first predicted density and a first predicted color based on the first resolution grid feature;
determining a first color loss function based on the first predicted color and the actual color corresponding to the first resolution grid feature;
determining a first sparse loss function based on the first predicted density and the actual density value corresponding to the first resolution grid feature, wherein the actual density value is obtained by mapping the physical properties of the probe signal;
determining gradient parameters of the first predicted density with respect to spatial coordinates, and determining a first continuity loss function based on the gradient parameters; and
summing the first color loss function, the first sparse loss function, and the first continuity loss function to obtain the first total loss function.
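As an illustration of how the total loss in claim 1 is assembled, the following is a minimal sketch in plain Python. The claim states only that the three losses are summed; the specific forms used here (mean squared error for color, mean absolute error for density, and a squared finite-difference gradient for continuity) and all function names are assumptions made for the example, not the patented formulas.

```python
from typing import Sequence


def color_loss(pred_rgb: Sequence[Sequence[float]],
               true_rgb: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and actual colors (assumed form)."""
    n = sum(len(p) for p in pred_rgb)
    total = sum((p - t) ** 2
                for pc, tc in zip(pred_rgb, true_rgb)
                for p, t in zip(pc, tc))
    return total / n


def sparse_loss(pred_density: Sequence[float],
                true_density: Sequence[float]) -> float:
    """Mean absolute error against probe-derived density values (assumed form)."""
    return sum(abs(p - t) for p, t in zip(pred_density, true_density)) / len(pred_density)


def continuity_loss(pred_density: Sequence[float], spacing: float) -> float:
    """Mean squared finite-difference gradient of density along one axis (assumed form)."""
    grads = [(b - a) / spacing for a, b in zip(pred_density, pred_density[1:])]
    return sum(g * g for g in grads) / len(grads)


def total_loss(pred_rgb, true_rgb, pred_density, true_density, spacing: float) -> float:
    """Unweighted sum of the three terms, as the claim describes."""
    return (color_loss(pred_rgb, true_rgb)
            + sparse_loss(pred_density, true_density)
            + continuity_loss(pred_density, spacing))
```

In this sketch the continuity term penalizes abrupt density changes between neighboring samples, which is one plausible reading of a loss based on the gradient of the predicted density with respect to spatial coordinates.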

2. The method according to claim 1, characterized in that the step of determining the image sampling path from the image data and determining the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data comprises:
determining center point coordinates and a viewing direction from the image data, and determining the image sampling path based on the center point coordinates and the viewing direction;
determining the lining depth of the track facility based on the radar data, determining the first position coordinates of each radar sampling point based on the lining depth, and determining the first density value and first density label of each radar sampling point based on the amplitude parameters in the radar data; and
converting the grid position of each ultrasonic sampling point into corresponding second position coordinates, determining the lining thickness of the track facility based on the ultrasonic data, and determining the second density value and second density label of each ultrasonic sampling point based on the lining thickness.
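The first step of claim 2 (deriving a sampling path from a center point and a viewing direction) can be sketched as sampling points along a ray. This is a minimal illustration that assumes uniform spacing between a near and a far bound; the function name `sample_path` and the parameters `near`, `far`, and `n_samples` are hypothetical and do not appear in the claims.

```python
import math
from typing import List, Sequence, Tuple


def sample_path(center: Sequence[float],
                view_dir: Sequence[float],
                near: float,
                far: float,
                n_samples: int) -> List[Tuple[float, ...]]:
    """Return n_samples points on the ray center + t * unit(view_dir), t in [near, far]."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0.0:
        raise ValueError("view_dir must be non-zero")
    unit = [c / norm for c in view_dir]          # normalize the viewing direction
    step = (far - near) / (n_samples - 1)        # uniform spacing (assumption)
    points = []
    for i in range(n_samples):
        t = near + i * step
        points.append(tuple(o + t * d for o, d in zip(center, unit)))
    return points
```

For example, `sample_path((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 1.0, 3.0, 3)` yields three points along the z-axis at distances 1, 2, and 3 from the center point.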

3. The method according to claim 1, characterized in that the method further comprises:
obtaining borehole data from borehole sampling points on the track facility, wherein the borehole data includes the third position coordinates of the borehole sampling points;
in response to the borehole status in the borehole data being a void state, determining the third density value of the borehole sampling point to be a first preset value, and determining the third density label of the borehole sampling point to be a void label; and
in response to the borehole status in the borehole data being a dense state, determining the third density value of the borehole sampling point to be a second preset value, and determining the third density label of the borehole sampling point to be a dense label.
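The borehole labeling rule of claim 3 amounts to a two-way mapping from borehole status to a preset density value and a label. The sketch below illustrates it; the claim does not give the preset values, so `VOID_DENSITY = 0.0` and `DENSE_DENSITY = 1.0`, along with the type and function names, are placeholders assumed for the example.

```python
from dataclasses import dataclass
from typing import Tuple

VOID_DENSITY = 0.0   # stand-in for the "first preset value" (assumed)
DENSE_DENSITY = 1.0  # stand-in for the "second preset value" (assumed)


@dataclass(frozen=True)
class BoreholeSample:
    position: Tuple[float, float, float]  # third position coordinates
    density: float                        # third density value
    label: str                            # third density label


def label_borehole(position: Tuple[float, float, float], status: str) -> BoreholeSample:
    """Map a borehole status ('void' or 'dense') to a density value and label."""
    if status == "void":
        return BoreholeSample(position, VOID_DENSITY, "void")
    if status == "dense":
        return BoreholeSample(position, DENSE_DENSITY, "dense")
    raise ValueError(f"unknown borehole status: {status!r}")
```

Rejecting unknown statuses explicitly is a design choice of this sketch; the claim itself only covers the void and dense cases.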

4. The method according to claim 1, characterized in that the step of using a pre-trained scene generation model to determine the rendering depth and the rendering color based on the image sampling path, the position coordinates, and the density label comprises:
using the pre-trained scene generation model, linearly interpolating the density labels to obtain multiple feature vectors, concatenating the multiple feature vectors to obtain concatenated features, and encoding the position coordinates to obtain position encoding features;
concatenating the concatenated features and the position encoding features to obtain intermediate features and a volume density;
concatenating the intermediate features and the position encoding features to obtain color features; and
determining the rendering color based on the volume density and color features of multiple sampling points on the image sampling path, and determining the rendering depth based on the volume density of the multiple sampling points on the image sampling path.
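The feature-building steps of claim 4 (linear interpolation, concatenation, and position encoding) can be sketched as follows. This is a simplified one-dimensional illustration: the claim does not specify the encoding, so the sine/cosine frequency encoding commonly used with NeRF networks is assumed, and `lerp_grid`, `positional_encoding`, and `build_input` are hypothetical names. The network layers that map these inputs to volume density and color are omitted.

```python
import math
from typing import List, Sequence


def lerp_grid(grid: Sequence[Sequence[float]], x: float) -> List[float]:
    """Linearly interpolate a 1-D grid of feature vectors at x in [0, 1]."""
    if len(grid) < 2:
        raise ValueError("grid needs at least two nodes")
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(grid) - 1)
    i0 = min(int(math.floor(pos)), len(grid) - 2)  # left neighbor index
    w = pos - i0                                   # weight of the right neighbor
    return [(1.0 - w) * a + w * b for a, b in zip(grid[i0], grid[i0 + 1])]


def positional_encoding(coords: Sequence[float], n_freqs: int) -> List[float]:
    """Sine/cosine encoding at frequencies 2^k * pi (assumed NeRF-style form)."""
    out: List[float] = []
    for c in coords:
        for k in range(n_freqs):
            out.append(math.sin((2 ** k) * math.pi * c))
            out.append(math.cos((2 ** k) * math.pi * c))
    return out


def build_input(grids: Sequence[Sequence[Sequence[float]]],
                x: float,
                coords: Sequence[float],
                n_freqs: int) -> List[float]:
    """Concatenate interpolated features from every grid, then append the encoding."""
    concatenated: List[float] = []
    for grid in grids:
        concatenated.extend(lerp_grid(grid, x))
    return concatenated + positional_encoding(coords, n_freqs)
```

Each grid in `grids` stands for one resolution level, so the concatenated vector carries one interpolated feature vector per level followed by the encoded coordinates.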

5. The method according to claim 4, characterized in that the step of determining the rendering color based on the volume density and color features of multiple sampling points on the image sampling path, and determining the rendering depth based on the volume density of the multiple sampling points on the image sampling path, comprises:
determining a target sampling point from the multiple sampling points on the image sampling path;
determining the sampling interval between the target sampling point and the previous sampling point based on the image sampling path, and determining the opacity of the target sampling point based on the sampling interval and volume density of the target sampling point;
determining multiple historical sampling points preceding the target sampling point, and determining the cumulative transmittance of the target sampling point based on the sampling intervals and volume densities of the multiple historical sampling points;
multiplying the opacity, the cumulative transmittance, and the color feature to obtain a first product result of the target sampling point, and summing the first product results of the multiple sampling points on the image sampling path to obtain the rendering color; and
multiplying the opacity, the cumulative transmittance, and the position corresponding to the target sampling point to obtain a second product result of the target sampling point, and summing the second product results of the multiple sampling points on the image sampling path to obtain the rendering depth.
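The accumulation described in claim 5 matches the standard volume rendering quadrature used with NeRF networks. The sketch below assumes the usual exponential forms, opacity 1 - exp(-density * interval) and cumulative transmittance exp(-sum of density * interval over preceding samples), which the claim does not spell out; the function name `render_ray` and its argument layout are also assumptions.

```python
import math
from typing import List, Sequence, Tuple


def render_ray(densities: Sequence[float],
               deltas: Sequence[float],
               colors: Sequence[Sequence[float]],
               positions: Sequence[float]) -> Tuple[List[float], float]:
    """Accumulate color and depth along one sampling path.

    densities: volume density at each sampling point
    deltas:    sampling interval associated with each point
    colors:    RGB color feature at each point
    positions: distance of each point along the path
    """
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    optical_depth = 0.0  # sum of density * interval over preceding (historical) points
    for sigma, delta, color, t in zip(densities, deltas, colors, positions):
        opacity = 1.0 - math.exp(-sigma * delta)
        transmittance = math.exp(-optical_depth)
        weight = opacity * transmittance
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]  # first product, summed
        depth += weight * t                                      # second product, summed
        optical_depth += sigma * delta
    return rgb, depth
```

With an empty first sample followed by a nearly opaque second sample, all of the weight lands on the second sample, so the rendered color and depth are those of that sample.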

6. A three-dimensional scene generation device for a track facility, characterized in that the device comprises:
an acquisition module configured to acquire image data, radar data, and ultrasonic data of the track facility in the cross-sectional direction;
a first determining module configured to determine an image sampling path from the image data and determine the position coordinates and density label of each sampling point in the track facility based on the radar data and the ultrasonic data;
a second determining module configured to use a pre-trained scene generation model to determine a rendering depth and a rendering color based on the image sampling path, the position coordinates, and the density label;
a three-dimensional scene generation module configured to generate a three-dimensional scene representation of the track facility based on the position coordinates, the rendering depth, and the rendering color, and to render the three-dimensional scene representation to obtain a rendered image of the track facility;
wherein the device further comprises a model training module, the model training module comprising:
a grid feature acquisition unit configured to acquire grid features at different resolutions, wherein the grid features include a first resolution grid feature, a second resolution grid feature, and a third resolution grid feature, the first resolution grid feature is constructed, the first resolution grid feature is upsampled to obtain the second resolution grid feature, and the second resolution grid feature is upsampled to obtain the third resolution grid feature;
a first training unit configured to train an initial simulation model using the first resolution grid feature to obtain a first total loss function, update the model parameters of the initial simulation model based on the first total loss function to obtain a first updated simulation model, and record a first iteration number of the initial simulation model, wherein the initial simulation model is a NeRF network;
a second training unit configured to, in response to the first iteration number being greater than a preset iteration number, train the first updated simulation model using the second resolution grid feature to obtain a second total loss function, update the model parameters of the first updated simulation model based on the second total loss function to obtain a second updated simulation model, and record a second iteration number of the first updated simulation model;
a third training unit configured to, in response to the second iteration number being greater than the preset iteration number, train the second updated simulation model using the third resolution grid feature to obtain a third total loss function, update the model parameters of the second updated simulation model based on the third total loss function to obtain a third updated simulation model, and record a third iteration number of the second updated simulation model;
a scene generation model determination unit configured to use the third updated simulation model as the scene generation model in response to the third iteration number being greater than the preset iteration number;
wherein the first training unit comprises:
a first prediction subunit configured to determine a first predicted density and a first predicted color based on the first resolution grid feature using the initial simulation model;
a color loss function determination subunit configured to determine a first color loss function based on the first predicted color and the actual color corresponding to the first resolution grid feature;
a sparse loss function determination subunit configured to determine a first sparse loss function based on the first predicted density and the actual density value corresponding to the first resolution grid feature, wherein the actual density value is obtained by mapping the physical properties of the probe signal;
a continuity loss function determination subunit configured to determine gradient parameters of the first predicted density with respect to spatial coordinates, and to determine a first continuity loss function based on the gradient parameters; and
a total loss function determination subunit configured to sum the first color loss function, the first sparse loss function, and the first continuity loss function to obtain the first total loss function.
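The three training units of claim 6 describe a coarse-to-fine schedule: train on the coarsest grid until the iteration count exceeds a preset number, then continue on each upsampled grid in turn. The sketch below shows only that control flow. The one-dimensional midpoint upsampling, the `train_step` callable (standing in for one parameter update of the NeRF network), and the function names are all assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def upsample_1d(grid: Sequence[float]) -> List[float]:
    """Insert a midpoint between neighboring nodes (assumed upsampling scheme)."""
    out: List[float] = []
    for a, b in zip(grid, grid[1:]):
        out.append(a)
        out.append(0.5 * (a + b))
    out.append(grid[-1])
    return out


def train_coarse_to_fine(model: Model,
                         coarse_grid: Sequence[float],
                         train_step: Callable[[Model, Sequence[float]], Model],
                         preset_iters: int) -> Tuple[Model, List[Tuple[int, int]]]:
    """Train on three resolutions in turn; switch once iterations exceed preset_iters."""
    first = list(coarse_grid)
    second = upsample_1d(first)
    third = upsample_1d(second)
    history: List[Tuple[int, int]] = []  # (grid size, recorded iteration number) per stage
    for grid in (first, second, third):
        iterations = 0
        while iterations <= preset_iters:   # stop once the count is greater than the preset
            model = train_step(model, grid)
            iterations += 1
        history.append((len(grid), iterations))
    return model, history
```

The model returned after the third stage plays the role of the scene generation model; `history` records the iteration number reached in each stage.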

7. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method as claimed in any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as claimed in any one of claims 1 to 5.