A method for inverse calculation of road structure modulus based on large model and reinforcement learning
By combining large language models with reinforcement learning, the contradiction between computational efficiency and global convergence capability in road modulus inverse calculation is resolved, achieving high-precision and reliable modulus inverse calculation, which is applicable to modulus inverse calculation of structures such as asphalt pavement and semi-rigid base pavement.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TONGJI UNIV
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for back-calculating pavement modulus present a contradiction between computational efficiency and global convergence capability. They lack intelligent initial value generation and physical constraint execution mechanisms for the optimization process, resulting in insufficient back-calculation accuracy and adaptability.
By combining large language model guidance with reinforcement learning, the method builds a closed technical loop of dual-mode input, initial estimation, partial differential equation forward mechanical modeling, PPO reinforcement learning iterative optimization, and periodic large language model guidance, thereby realizing intelligent back-calculation of pavement structure modulus.
The method significantly improves inversion accuracy and convergence reliability, adapts to various structural types, reduces computational complexity, enhances adaptability to non-standard operating conditions, and supports efficient modulus inversion from natural language descriptions and for multi-layer systems.
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of road engineering technology and artificial intelligence, specifically involving a method for back-calculating the modulus of road structures based on large models and reinforcement learning, which is applicable to the back-calculation of the modulus of structures such as asphalt pavement and semi-rigid base pavement.
Background Technology
[0002] The elastic modulus of each structural layer of a pavement is a core parameter characterizing the pavement's load-bearing capacity, assessing its structural health, and predicting its service life. Accurately obtaining the modulus of each layer is crucial for pavement structure evaluation and maintenance decisions. While traditional indoor testing methods (core sampling, laboratory testing) can directly obtain material parameters, they are destructive, costly, and time-consuming, making them unsuitable for large-scale road network evaluations.
[0003] The falling weight deflectometer (FWD) has become the industry-standard technology for non-destructive testing of pavements. An FWD applies a pulse load to the pavement to simulate a moving vehicle load and measures the resulting deflection basin, which provides valuable information about the overall structural condition of the pavement. Modulus back-calculation based on FWD deflection data is essentially an inverse optimization problem: finding the material modulus parameters of each layer that minimize the difference between the theoretically calculated and the measured deflections.
[0004] Existing methods for back-calculating pavement modulus each have their limitations:
[0005] (1) Gradient-type methods (such as Newton-Raphson combined with Levenberg-Marquardt damping) converge quickly, but are prone to getting trapped in local optima due to the inherent non-convexity of the deflection-modulus relationship, and the convergence result is highly dependent on the initial parameter estimation.
[0006] (2) Database search methods are efficient for pre-calculated working conditions, but they require a large amount of storage space and cannot handle working conditions that are absent from the database.
[0007] (3) Heuristic optimization algorithms (genetic algorithm GA, particle swarm optimization PSO, etc.) can explore the global solution space, but rely on a large number of repeated forward analysis calculations, which are computationally expensive, and the inverse calculation accuracy of multi-layer systems (more than 3 layers) is significantly reduced.
[0008] (4) Neural network methods provide near real-time predictions, but require a large amount of high-quality training data, lack physical interpretability, and have insufficient generalization ability under limited training data conditions.
[0009] The core contradiction facing existing technologies lies in the fundamental trade-off between computational efficiency and global convergence capability, and the lack of intelligent initial value generation and physical constraint enforcement mechanisms for the optimization process. Currently, there is no method that can simultaneously achieve a harmonious balance between global optimization capability, computational efficiency, intelligent initial estimation, and physical constraint guarantees.
Summary of the Invention
[0010] To address the shortcomings of existing technologies, this invention proposes a method for back-calculating pavement structure modulus based on large language model guidance and reinforcement learning. By constructing a technical closed loop of "dual-mode input—large language model analysis and initial estimation—partial differential equation forward mechanical modeling—PPO reinforcement learning iterative optimization—large language model periodic guidance—convergence judgment and output", the method achieves an intelligent upgrade of pavement structure modulus back-calculation, significantly improving inversion accuracy, convergence reliability, and adaptability to multiple structure types.
[0011] Technical solution
[0012] A method for inverse calculation of road structure modulus based on large models and reinforcement learning includes the following steps:
[0013] Step 101: Obtain pavement structure information and measured deflection basin data, and convert them into a unified structured parameter vector through a dual-mode input interface. The dual-mode input interface includes natural language description mode A and structured numerical input mode B.
[0014] Step 102: Send the structured parameters obtained in Step 101 to the pre-trained large language model. The large language model will then perform road surface type recognition, structural layer parameter extraction, and engineering knowledge-driven initial modulus estimation to generate an initial modulus vector. When the large language model fails, the system automatically reverts to an empirical formula based on the characteristic parameters of the deflection basin.
[0015] Step 103: Based on the initial modulus vector generated in Step 102, construct a two-dimensional axisymmetric finite element forward mechanical model, and use a partial differential equation (PDE) solver to calculate the theoretical deflection value at each measuring point, forming the theoretical deflection basin $D^{calc}$.
[0016] Step 104: Use the deviations between the theoretical deflection basin $D^{calc}$ and the measured deflection basin $D^{target}$ to construct the state vector of the reinforcement learning agent. The proximal policy optimization (PPO) algorithm drives the agent to output a four-dimensional relative modulus adjustment action, and the modulus of each structural layer is adjusted iteratively.
[0017] Step 105: Define for the reinforcement learning agent a composite reward function that fuses a center deflection matching reward, a deflection basin shape reward, a shape index reward, and a progress reward. This function guides the agent toward effective modulus adjustment actions.
[0018] Step 106: During PPO optimization, call the large language model periodically, every 5 training rounds, with a prompt containing the current modulus values, the optimization history, and the convergence status, and obtain the optimization direction guidance it provides. After verification, this guidance modulates the PPO action distribution to prevent the optimization from getting stuck in local optima.
[0019] Step 107: Based on the action suggestions from Steps 104 to 106, update the modulus parameters of each layer according to the modulus update rule, re-execute the finite element forward calculation, feed the results to the reinforcement learning agent for the next round of training, and iterate until the convergence condition is met.
[0020] Step 108: After convergence, output the optimal pavement structure layer modulus inversion results, the comparison results of the corresponding predicted deflection values and measured values, error evaluation indicators, and the layer sensitivity coefficient and modulus variation coefficient used for engineering decision support.
[0021] The processing method for the dual-mode input interface mentioned in step 101 is as follows:
[0022] Natural Language Description Mode A: The pavement structure description entered by the user in natural language is parsed by a large language model to extract the pavement type, the thickness of each structural layer ($h_{AC}$, $h_{BC}$, $h_{SB}$, corresponding to the asphalt concrete layer, base course, and subbase course, respectively), the measured deflection basin $D^{target}$, and the load pressure $P$, which are mapped to the standard structured parameter set. The measured deflection basin is as follows:
[0023] $$D^{target} = [D_0, D_{200}, D_{300}, D_{600}, D_{900}, D_{1200}, D_{1500}]$$
[0024] where the entries are the deflection values at the center point and at offsets of 200 mm, 300 mm, 600 mm, 900 mm, 1200 mm, and 1500 mm, respectively, in μm.
[0025] Structured numerical input mode B: Users directly input measured deflection data, layer thickness and load parameters in a standardized format, and the system directly generates parameter vectors after data verification;
[0026] Both input modes ultimately converge into a unified set of parameters, including the thickness vector of the structural layers, the load pressure, the radius of the loading plate, Poisson's ratio, and the measured deflection basin.
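The unified parameter set can be pictured as a small validated record. The following is a minimal sketch only: the class name, field names, and validation rules are illustrative assumptions, and storing one Poisson's ratio per layer (three layers plus subgrade) is also an assumption rather than something the text specifies.

```python
from dataclasses import dataclass

# Sensor offsets in metres: center point plus the six offsets listed in the text.
OFFSETS_M = (0.0, 0.2, 0.3, 0.6, 0.9, 1.2, 1.5)


@dataclass(frozen=True)
class PavementInput:
    """Unified parameter set shared by input modes A and B (hypothetical layout)."""
    thickness_cm: tuple    # (h_AC, h_BC, h_SB)
    pressure_mpa: float    # load pressure P
    plate_radius_m: float  # radius of the loading plate
    poisson: tuple         # assumed: one value per layer, subgrade included
    deflections_um: tuple  # measured basin, one value per entry of OFFSETS_M


def validate(p: PavementInput) -> PavementInput:
    """Raise ValueError on structurally invalid input, otherwise return p."""
    if len(p.thickness_cm) != 3 or any(h <= 0 for h in p.thickness_cm):
        raise ValueError("expected three positive layer thicknesses")
    if p.pressure_mpa <= 0 or p.plate_radius_m <= 0:
        raise ValueError("load pressure and plate radius must be positive")
    if len(p.poisson) != 4 or any(not 0.0 < v < 0.5 for v in p.poisson):
        raise ValueError("expected four Poisson's ratios in (0, 0.5)")
    if len(p.deflections_um) != len(OFFSETS_M):
        raise ValueError("expected one deflection per sensor offset")
    if any(d <= 0 for d in p.deflections_um):
        raise ValueError("deflections must be positive")
    return p
```

Under this sketch, Mode B would call `validate` directly on user-supplied numbers, while Mode A would first have the large language model extract the same fields from text.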
[0027] In step 102, the large language model generates initial modulus estimates. The specific methods are as follows:
[0028] Based on the range of the central deflection D0, the pavement is divided into five levels: ultra-flexible, flexible, medium stiffness, high stiffness, and ultra-high stiffness. Level-specific constraints are applied to the modulus of each structural layer according to the engineering feasible domain: lower and upper modulus limits for the corresponding level are set for the asphalt concrete layer (AC), base course (BC), subbase course (SB), and subgrade (SG).
[0029] The parameters are filled into the prompt word template and sent to the pre-trained large language model, which then performs road surface type recognition, structural layer parameter extraction, and engineering knowledge-driven initial modulus estimation.
[0030] When the large language model estimation fails or the output is invalid, the system reverts to calculating the initial modulus value based on the empirical formula of the deflection basin characteristic parameters to ensure that the system can be robustly initialized under various conditions.
[0031] The prompt word template is as follows:
[0032] "You are a pavement structure engineering expert specializing in pavement modulus inversion. Please provide an engineering-based initial modulus estimate based on the following FWD deflection data and structural information. Pavement type: {pavement_type}; Layer thickness (cm): asphalt concrete layer {h_AC} cm, base course {h_BC} cm, subbase course {h_SB} cm, with a semi-infinite subgrade below; FWD deflection basin (μm): D0={D0}, D200={D200}, D300={D300}, D600={D600}, D900={D900}, D1200={D1200}, D1500={D1500}; Please estimate the elastic modulus (MPa) of each layer. Please return in JSON format: {"E_AC": value, "E_BC": value, "E_SB": value, "E_SG": value, "reasoning": "brief explanation"}. Adjusted values must meet the engineering feasible region constraints." Here, "{parameter name}" is a placeholder that is replaced with the actual parameter value before the large language model is called.
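Filling the template and checking the reply can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names are hypothetical, the empirical fallback formula is not given in the text and is therefore passed in as a caller-supplied function, and plain string replacement is used because the template contains literal JSON braces that `str.format` would misread.

```python
import json

REQUIRED_KEYS = ("E_AC", "E_BC", "E_SB", "E_SG")


def fill_template(template, values):
    """Replace each {name} placeholder; literal JSON braces stay untouched."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", str(value))
    return template


def parse_initial_modulus(reply, bounds, fallback):
    """Return the four moduli from the reply, or fallback() if it is unusable.

    bounds maps each key to (lower, upper) in MPa; fallback stands in for the
    empirical formula based on deflection basin characteristics.
    """
    try:
        data = json.loads(reply)
        moduli = {k: float(data[k]) for k in REQUIRED_KEYS}
    except (ValueError, KeyError, TypeError):
        return fallback()  # malformed JSON or missing/non-numeric field
    for key in REQUIRED_KEYS:
        lower, upper = bounds[key]
        if not lower <= moduli[key] <= upper:
            return fallback()  # estimate violates the feasible region
    return moduli
```

Rejecting the whole reply when any one layer is out of bounds is a design choice of this sketch; clamping the offending value would be an equally plausible reading of "output is invalid".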
[0033] The forward mechanical model described in step 103 is constructed as follows:
[0034] A two-dimensional axisymmetric finite element method was used to establish a forward calculation model for pavement mechanics. The calculation domain was extended 8.0m radially and 10.0m in depth to eliminate boundary effects.
[0035] Mesh generation strategy: The mesh size is 0.015m in the load application area and 0.08m in the far field area;
[0036] Boundary condition settings: a symmetry axis constraint ($u_r = 0$ at $r = 0$); a uniformly distributed load $P = 0.707$ MPa applied to the pavement surface over a contact radius of 0.15 m (corresponding to a standard 50 kN drop-weight load); a fixed support at the bottom ($u_r = u_z = 0$ at $z = 10.0$ m); and a lateral radial constraint ($u_r = 0$ at $r = 8.0$ m);
[0037] where $r$ is the radial coordinate and $z$ is the coordinate in the depth direction;
[0038] $u_r$ denotes radial displacement;
[0039] $u_z$ denotes axial (vertical) displacement;
[0040] Extract the surface deflection at the specified offset positions. The default offsets are [0, 0.2, 0.3, 0.6, 0.9, 1.2, 1.5] m; the extracted values are compared with the measured deflection basin to form the optimization feedback signal;
[0041] where each extracted value is the vertical deflection of the pavement surface at the corresponding radial distance.
[0042] The forward mechanical model serves as a mechanical simulator that maps the modulus parameters of each layer to the theoretical deflection basin $D^{calc}$, providing the evaluation basis for reinforcement learning optimization.
[0043] In step 104, the PPO policy update uses a clipped surrogate objective function to maintain training stability, as follows:
[0044] $$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$
[0045] The function takes the minimum of two terms: the first term, $r_t(\theta)\,\hat{A}_t$, is the unconstrained policy gradient objective; the second term uses the $\mathrm{clip}$ operation to limit the importance sampling ratio to the range $[1-\epsilon, 1+\epsilon]$, which prevents excessively large single-step policy updates. Taking the minimum of the two makes the objective a lower bound on the unclipped objective, thus maintaining training stability.
[0046] where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$ is the importance sampling ratio, which measures the relative difference in preference between the new and old policies for the same action;
[0047] $\pi_\theta(a_t \mid s_t)$ is the probability density with which the current policy network (with parameters $\theta$) outputs action $a_t$ in state $s_t$ (the Gaussian density value in the continuous action space);
[0048] $\pi_{\theta_{old}}(a_t \mid s_t)$ is the probability density of the old policy in the same state before the latest parameter update, used as the importance sampling benchmark;
[0049] $\epsilon$ is the clipping parameter, with a value of 0.2;
[0050] $\hat{A}_t$ is the advantage function value calculated by generalized advantage estimation (GAE);
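The clipped surrogate objective described above can be illustrated with a few lines of plain Python. This is a minimal sketch of the standard PPO clipping rule evaluated on a batch of samples; the function name and the use of log-probabilities as inputs are assumptions made for illustration, and only the clipping parameter 0.2 is taken from the text.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance sampling ratio
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic choice between the unclipped and the clipped term.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

For a sample with ratio 1.5 and advantage +1, the clipped term 1.2 is selected, so the policy gains nothing from pushing the ratio beyond $1+\epsilon$; for ratio 0.5 and advantage -1, the clipped term -0.8 is selected.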
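[0051] The advantage function is calculated using GAE, with the GAE smoothing parameter $\lambda$ set to ... :
[0052] $$\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^l\, \delta_{t+l}$$
[0053] where $$\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$$ with $R_t$ the reward at step $t$ and $V$ the state value estimated by the Critic network;
[0054] $\hat{A}_t$ is the advantage function value at step $t$, representing the excess return of action $a_t$ in state $s_t$ compared to the current average level of the policy; $\hat{A}_t > 0$ indicates that the action is better than average and its probability should be increased; $\hat{A}_t < 0$ indicates the converse;
[0055] $\delta_t$ is the TD error (temporal-difference residual);
[0056] $\gamma$ is the discount factor, which controls the decay weight of future rewards; this invention takes $\gamma$ = ... ;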
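Generalized advantage estimation is usually computed with a backward recursion over one episode. The sketch below shows that standard recursion; the function name is hypothetical, and the default values `gamma=0.99` and `lam=0.95` are common choices assumed here for illustration, not the values used by the invention.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Advantages for one episode.

    rewards has T entries; values has T + 1 entries (V(s_0) ... V(s_T)),
    where the last entry is the bootstrap value of the final state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

The recursion is equivalent to the discounted sum of TD errors truncated at the end of the episode.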
[0057] The optimization process uses an episode-based structure. Each episode (training round) contains multiple interaction steps, which are as follows:
[0058] (1) Observe the current state: substitute the current modulus parameters into the finite element model, calculate the theoretical deflection basin $D^{calc}$, and construct the state vector $s_t$;
[0059] (2) Sample an action from the policy distribution: the Actor network outputs a Gaussian policy distribution, from which the four-dimensional action $a_t$ is sampled;
[0060] (3) Adjust the modulus and calculate the deflection through partial differential equation forward analysis: update the modulus according to the modulus update rule, and rerun the finite element model (step 103) to obtain the new $D^{calc}$;
[0061] (4) Calculate the reward: compute the composite reward from the deviation between the new $D^{calc}$ and $D^{target}$;
[0062] (5) Store the state transition sample in the experience buffer;
[0063] The Actor and Critic networks are updated after each episode.
[0064] The state space $S$ and action space $A$ of the reinforcement learning agent described in step 105 are structured as follows:
[0065] The reinforcement learning state $s_t$ is a ten-dimensional vector comprising the normalized modulus of each layer, the current center deflection error, the deflection proportion direction, and shape indices reflecting the geometric characteristics of the deflection basin. The shape indices are the surface curvature index (SCI), the base damage index (BDI), the base curvature index (BCI), and the deflection attenuation rate (DR), as shown in the following formula:
[0066] $$s_t = [\hat{E}_{AC}, \hat{E}_{BC}, \hat{E}_{SB}, \hat{E}_{SG}, e_t, d_t, \mathrm{SCI}, \mathrm{BDI}, \mathrm{BCI}, \mathrm{DR}]$$
[0067] where
[0068] $\hat{E}_{AC}, \hat{E}_{BC}, \hat{E}_{SB}, \hat{E}_{SG}$ are the normalized moduli of each layer: the modulus of each layer is linearly normalized to the [0,1] interval, corresponding to the asphalt concrete layer, base course, subbase course, and subgrade, respectively;
[0069] $$e_t = \min\left(\frac{\left|D_0^{calc} - D_0^{target}\right|}{D_0^{target}},\ 1\right)$$ is the center deflection error;
[0070] where $e_t$ is the relative error of the center deflection at step t, with a value range of [0, 1];
[0071] $D_0^{calc}$ is the theoretical deflection value at the center point (r=0) calculated by the finite element forward model under the current modulus parameters, in μm;
[0072] $D_0^{target}$ is the center point deflection (target value) measured by FWD, in μm;
[0073] $\min(\cdot, 1)$ is the truncation operation, which prevents the value from becoming too large when the error exceeds 100% and keeps this state dimension normalized to [0,1];
[0074] $d_t$ is the deviation between the calculated deflection value and the target deflection value at the center point (the deflection proportion direction);
[0075] ,
[0076] ,
[0077] ,
[0078]
[0079] SCI, BDI, BCI, and DR are standardized measures of the geometric features of the deflection basin. Among them,
[0080] SCI reflects the flexural stiffness of the asphalt surface layer: the greater the difference between $D_0^{target}$ and $D_{300}^{target}$, the greater the surface curvature;
[0081] BDI reflects the condition of the base course: the difference between $D_{300}^{target}$ and $D_{600}^{target}$ represents the contribution of base deflection;
[0082] BCI reflects the condition of the subbase course: the difference between $D_{600}^{target}$ and $D_{900}^{target}$ represents the bearing capacity contribution of the subbase;
[0083] DR characterizes the far-field attenuation of the deflection basin, which is mainly determined by the stiffness of the subgrade soil;
[0084] $D_0^{target}$ is the target deflection at the center point (r=0 mm) measured by FWD, in μm;
[0085] $D_{300}^{target}$ is the deflection measured by FWD at an offset of 300 mm, in μm;
[0086] $D_{600}^{target}$ is the deflection measured by FWD at an offset of 600 mm, in μm;
[0087] $D_{900}^{target}$ is the deflection measured by FWD at an offset of 900 mm, in μm;
[0088] $D_{1500}^{target}$ is the deflection measured by FWD at an offset of 1500 mm, in μm;
[0089] The superscript "target" indicates the measured FWD value (optimization target), as distinct from the superscript "calc" (finite element calculation value).
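Several of the state features above can be computed directly from the quantities already defined. The following is a rough sketch under stated assumptions: the function names are hypothetical, clamping the normalized modulus to [0, 1] is an assumption, and the shape quantities are returned as raw deflection differences because the exact standardization used for SCI, BDI, BCI, and DR is not reproduced here.

```python
def normalize_modulus(e_mpa, lower, upper):
    """Linearly map a modulus onto [0, 1] given its feasible bounds (clamped)."""
    return min(max((e_mpa - lower) / (upper - lower), 0.0), 1.0)


def center_error(d0_calc, d0_target):
    """Relative center deflection error, truncated at 1 (that is, 100 %)."""
    return min(abs(d0_calc - d0_target) / d0_target, 1.0)


def basin_differences(basin_um):
    """Raw differences D0-D300, D300-D600, D600-D900 from a 7-point basin.

    basin_um is ordered by offset: 0, 200, 300, 600, 900, 1200, 1500 mm.
    """
    d0, _d200, d300, d600, d900, _d1200, _d1500 = basin_um
    return d0 - d300, d300 - d600, d600 - d900
```

For example, a basin of (300, 250, 220, 150, 100, 70, 50) μm gives differences of 80, 70, and 50 μm, which relate to the surface layer, base course, and subbase course in turn.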
[0090] The reinforcement learning action $a_t$ is a four-dimensional vector, as shown in the following equation:
[0091] $$a_t = [a_1, a_2, a_3, a_4]$$
[0092] where $a_i$ represents the relative adjustment of the modulus of the i-th layer, with $i = 1, \dots, 4$ corresponding to the asphalt concrete layer, base course, subbase course, and subgrade; the modulus update rule is as follows:
[0093]
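A relative adjustment can be applied to the moduli as sketched below. Both the multiplicative form, in which each modulus is scaled by one plus its adjustment, and the clipping of the result to the feasible bounds are assumptions made for this illustration; the function name is also hypothetical.

```python
def update_moduli(moduli, action, bounds):
    """Apply relative adjustments and keep each modulus inside its bounds.

    moduli and action are ordered AC, BC, SB, SG; bounds holds one
    (lower, upper) pair in MPa per layer.
    """
    updated = []
    for e, a, (lower, upper) in zip(moduli, action, bounds):
        candidate = e * (1.0 + a)  # assumed multiplicative update
        updated.append(min(max(candidate, lower), upper))
    return updated
```

With moduli [2000, 500, 200, 40] MPa and adjustments [0.05, 0.10, -0.10, 0.05], this sketch yields approximately [2100, 550, 180, 42] MPa, provided those values lie inside the bounds.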
[0094] The reward function for reinforcement learning is set as follows:
[0095]
[0096] where:
[0097] The center deflection matching reward adopts a piecewise structure and is scored according to the relative error of the center deflection:
[0098]
[0099] The deflection basin shape reward is calculated as follows:
[0100]
[0101] where the distance-weighted error is used to evaluate the overall fitting quality of the deflection basin:
[0102]
[0103] Here, the weight vector gives higher weight to the far-field sensors;
[0104] The shape index reward integrates the four standard deflection basin shape indices (SCI, BDI, BCI, DR) to assess shape errors comprehensively and to evaluate the mechanical contribution of each layer:
[0105]
[0106]
[0107] The progress reward is scored according to the change in center deflection error between adjacent steps:
[0108]
[0109] Reinforcement learning network structure: The Actor-Critic architecture is adopted; the Actor network parameterizes the Gaussian policy distribution for action sampling; the Critic network estimates the state value for advantage function calculation; both networks adopt a fully connected architecture, containing two hidden layers with 128 neurons each, and the activation function is ReLU.
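The network layout described above (two hidden layers of 128 ReLU units, a Gaussian policy head, and a scalar value head) can be sketched with the standard library alone. This is an untrained forward pass for illustration only: the He-style initialization, the state-independent log standard deviation of -0.5, and all names are assumptions, and a real implementation would normally use a deep learning framework.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Weight matrix (n_out rows) and zero bias; He-style scale is assumed."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """One fully connected layer, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def make_mlp(sizes, rng):
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(mlp, x):
    for layer in mlp[:-1]:
        x = dense(x, layer, relu=True)
    return dense(x, mlp[-1], relu=False)  # linear output layer


rng = random.Random(0)
actor = make_mlp([10, 128, 128, 4], rng)   # 10-dim state -> mean of 4-dim action
critic = make_mlp([10, 128, 128, 1], rng)  # 10-dim state -> state value
log_std = [-0.5] * 4                       # assumed state-independent log std


def sample_action(state, rng):
    """Draw one action from the Gaussian policy N(mean(state), exp(log_std)^2)."""
    mean = forward(actor, state)
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]
```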
[0110] The specific implementation of the periodic optimization guidance of the large language model in step 106 is as follows:
[0111] In the first training round and every 5 rounds thereafter, a prompt containing the current modulus values, optimization history, and convergence state is constructed and sent to the large language model;
[0112] The optimization guidance prompt template for the large language model is as follows:
[0113] "You are a pavement structure engineering expert. The current modulus inversion optimization progress is as follows: Current episode: {episode}; Current deflection error: {D0_error:.2f}%; Current deflection basin error: {basin_error:.2f}%; Current modulus (MPa): asphalt concrete layer E_AC={E_AC}, base course E_BC={E_BC}, subbase course E_SB={E_SB}, subgrade E_SG={E_SG}; Recent optimization history: {optimization_history}. Please analyze the current convergence status and provide adjustment suggestions. Please return in JSON format: {"modulus_adjustments": {"E_AC": relative adjustment amount, "E_BC": relative adjustment amount, "E_SB": relative adjustment amount, "E_SG": relative adjustment amount}, "engineering_analysis": {"optimization_direction": "optimization direction suggestions", "convergence_issue": "convergence problem analysis", "key_layer": "key layer to adjust"}}. The adjustment range is -0.3 to 0.3."
[0114] Here, "{parameter name}" is a placeholder that is replaced with the actual parameter value before the large language model is called. The recent optimization history (optimization_history) includes the sequence of modulus values of each structural layer over the previous preset period and the corresponding trend of the center deflection error, which the large language model uses to identify whether convergence has stalled or is oscillating. For example: [Round 5: E=[2000, 500, 200, 40], Error=15.2%; Round 10: E=[2100, 550, 180, 42], Error=8.5%]
[0115] After the adjustment suggestions output by the large language model are verified by the system, they are applied to the PPO action distribution to guide the agent's exploration direction to the region suggested by the large language model.
[0116] The specific modulation method is as follows: using a mixing weight $w$, the adjustment vector $a^{LLM}$ output by the large language model
[0117] is linearly fused with the mean $\mu_\theta$ output by the Actor network, that is, $\mu' = (1 - w)\,\mu_\theta + w\,a^{LLM}$. This operation applies only to the action sampling phase and does not modify the PPO network parameters $\theta$, the clipped objective function $L^{CLIP}$, or the clipping parameter $\epsilon$, which ensures that training stability is not affected.
[0118] If the output of the large language model is invalid or the verification fails, ignore the result of this guidance and continue to execute according to the original PPO strategy to ensure the robustness of the system.
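The verify-then-blend logic of the guidance step can be sketched as follows. Everything named here is illustrative: the verification rules (all four keys present, finite values, magnitude at most 0.3), the convex-combination form of the blend, and the schedule "round 1 and every multiple of 5" are assumptions chosen to match the description, and the mixing weight is left as a parameter because its value is not stated.

```python
import json
import math

LAYERS = ("E_AC", "E_BC", "E_SB", "E_SG")


def parse_guidance(reply, limit=0.3):
    """Return the four relative adjustments, or None if verification fails."""
    try:
        adjustments = json.loads(reply)["modulus_adjustments"]
        vec = [float(adjustments[k]) for k in LAYERS]
    except (ValueError, KeyError, TypeError):
        return None
    if any(not math.isfinite(v) or abs(v) > limit for v in vec):
        return None
    return vec


def modulate_mean(actor_mean, guidance, weight):
    """Blend the Actor mean with the guidance; without guidance, keep the mean."""
    if guidance is None:
        return list(actor_mean)
    return [(1.0 - weight) * m + weight * g for m, g in zip(actor_mean, guidance)]


def should_query_llm(episode, period=5):
    """Assumed schedule: the first round, then every `period` rounds."""
    return episode == 1 or episode % period == 0
```

Because only the sampling mean is shifted, the policy parameters and the PPO loss are untouched, which matches the stated intent of leaving training stability unaffected.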
[0119] The convergence condition described in step 107 is that both of the following conditions must be met simultaneously:
[0120] (1) Relative error of center deflection ;
[0121] (2) Average error of the deflection basin ;
[0122] The above convergence conditions ensure that the inversion results not only match the central deflection, but also reproduce the overall deflection basin morphology, thus guaranteeing the reliability of the output results.
[0123] If convergence is not achieved after reaching the maximum number of training rounds, the modulus combination with the smallest error during the optimization process is output as the final result.
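The dual convergence test and the best-so-far fallback can be sketched in a few lines. The names are hypothetical, the two tolerances are left as parameters rather than fixed values, and ranking candidates by a single caller-supplied error score is an assumption of this sketch.

```python
def is_converged(center_err, basin_err, center_tol, basin_tol):
    """Both the center deflection error and the basin error must be within tolerance."""
    return center_err <= center_tol and basin_err <= basin_tol


class BestTracker:
    """Remember the modulus combination with the smallest error seen so far."""

    def __init__(self):
        self.best_error = float("inf")
        self.best_moduli = None

    def update(self, moduli, error):
        if error < self.best_error:
            self.best_error = error
            self.best_moduli = list(moduli)
```

If the maximum number of training rounds is reached without `is_converged` returning true, `best_moduli` would be reported as the final result.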
[0124] The output content mentioned in step 108 includes:
[0125] The optimal inverted modulus of each pavement structural layer, in MPa;
[0126] The comparison between the corresponding predicted theoretical deflection basin and the measured deflection basin;
[0127] A quantitative evaluation of the center deflection error and the average deflection basin error;
[0128] The sensitivity coefficient $S_i$ of each layer modulus, which is used to assess how strongly each layer influences the surface deflection and to guide engineering decisions.
[0129] The layer sensitivity coefficient $S_i$ is calculated as follows: starting from the layer moduli obtained by inversion, a small perturbation (typically 10%) is applied to the modulus of the i-th layer, the resulting change in surface center deflection is calculated using the aforementioned forward mechanical model, and $S_i$ is computed from that change. The larger the value, the more significant the influence of that structural layer on the deflection basin response, and the higher the reliability of its inversion result.
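A perturbation-based sensitivity can be sketched as below. The normalization used here, the relative change in center deflection divided by the relative perturbation, is an assumption standing in for the formula of the invention; `forward_d0` is a hypothetical callable that returns the center deflection for a given list of layer moduli.

```python
def layer_sensitivity(forward_d0, moduli, rel_step=0.10):
    """One-at-a-time sensitivity of the center deflection to each layer modulus."""
    base = forward_d0(moduli)
    sensitivities = []
    for i, e in enumerate(moduli):
        perturbed = list(moduli)
        perturbed[i] = e * (1.0 + rel_step)  # perturb only the i-th layer
        change = abs(forward_d0(perturbed) - base)
        sensitivities.append(change / base / rel_step)
    return sensitivities
```

Each coefficient costs one extra forward analysis, so a four-layer system needs five forward runs in total.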
[0130] Beneficial effects
[0131] This invention constructs an intelligent input parsing engine through a pre-trained large language model, enabling engineers to describe road structure and FWD measurement data in natural language. The large language model automatically extracts structured parameters, overcoming the limitations of traditional inversion methods that require professional software operation skills, significantly reducing the application threshold, and improving adaptability to non-standard working conditions.
[0132] This invention utilizes the material property knowledge accumulated by large language models in engineering corpus pre-training, and adaptively generates engineering-oriented initial modulus estimation under hierarchical constraints based on the characteristics of the deflection basin. It replaces the traditional method that relies on engineer experience or fixed initial values, and fundamentally solves the problem of initial value dependence in gradient-based methods and heuristic methods.
[0133] This invention uses the PPO algorithm to drive the iterative modulus adjustment process, and its clipped surrogate objective function keeps policy updates stable. Compared with genetic algorithms and particle swarm optimization, the computational complexity is reduced from "population size × number of iterations × number of forward analyses" to "number of rounds × number of steps × number of forward analyses", which significantly reduces the amount of computation.
[0134] This invention periodically introduces a large language model optimization guidance mechanism, which organically integrates the domain knowledge reasoning ability of the large language model with the global search ability of the PPO algorithm, achieving a 100% convergence rate on atypical pavement structures (thick asphalt layers, thin surface semi-rigid structures, etc.), effectively overcoming the failure problem of existing methods on complex structures.
[0135] This invention features model-independent large language model compatibility, supports flexible switching between commercial APIs (DeepSeek, GPT-4o, etc.) and open-source models (such as Qwen2.5-7B), and provides an open interface architecture to adapt to different computing environments and cost constraints.
Attached Figure Description
[0136] Figure 1 is a flowchart of a road structure modulus inverse calculation method based on large model and reinforcement learning according to the present invention;
[0137] Figure 2 is a schematic diagram illustrating the data acquisition and structural parameter sources for road surface deflection detection in an embodiment of the present invention;
[0138] Figure 3 is a schematic diagram of the dual-mode input interface workflow according to an embodiment of the present invention;
[0139] Figure 4 is a flowchart illustrating the iterative optimization of PPO reinforcement learning and the periodic guidance of a large language model in an embodiment of the present invention;
[0140] Figure 5 is a schematic diagram of the overall framework of the road surface modulus inverse calculation method based on large language model and reinforcement learning in an embodiment of the present invention.
Detailed Implementation
[0141] The technical solution provided in this application will be further described below with reference to specific embodiments and accompanying drawings. The advantages and features of this application will become clearer from the following description.
[0142] A method for inverse calculation of road structure modulus based on large models and reinforcement learning includes the following steps (see Figure 1):
[0143] Step 1: Large language model-assisted input parsing and initial estimation. The data sources required for this invention are shown in Figure 2. The data include the deflection basin vector Dm obtained from field tests using a falling weight deflectometer (FWD), the thickness vector h of each layer provided by the pavement structure design data, and the load pressure P, contact radius r, and Poisson's ratio μ recorded at the test site. The contact radius is a fixed parameter of the FWD equipment (typically 0.15 m), and Poisson's ratio is obtained from actual measurements or determined from the soil properties according to specifications. Users provide the above pavement information through natural language description (Mode A) or structured numerical input (Mode B). The dual-mode input interface workflow is shown in Figure 3. After both input modes converge to a unified parameter set, a structured prompt based on the features of the deflection basin is sent to the large language model to obtain an initial modulus estimate driven by engineering knowledge.
[0144] The specific graded constraints for the initial modulus estimation are as follows: Based on the range of the central deflection D0, the pavement is divided into five stiffness levels, and upper and lower limits are set for the modulus of each structural layer according to the following engineering feasible domain:
[0145] When D0>0.50mm, the road surface is an ultra-flexible road surface with AC modulus constraints of [500, 4000] MPa, BC modulus constraints of [100, 1000] MPa, SB modulus constraints of [50, 400] MPa, and SG modulus constraints of [20, 80] MPa.
[0146] When D0∈[0.35, 0.50]mm, the road surface is a flexible road surface with AC modulus constraints of [800, 5000]MPa, BC modulus constraints of [150, 1500]MPa, SB modulus constraints of [60, 500]MPa, and SG modulus constraints of [25, 100]MPa.
[0147] When D0∈[0.20, 0.35]mm, the road surface is a medium stiffness road surface, with AC modulus constraints of [1200, 6500]MPa, BC modulus constraints of [300, 2000]MPa, SB modulus constraints of [100, 700]MPa, and SG modulus constraints of [50, 180]MPa;
[0148] When D0∈[0.10, 0.20]mm, the road surface is a high-stiffness road surface, with AC modulus constraints of [2000, 8000]MPa, BC modulus constraints of [500, 3000]MPa, SB modulus constraints of [200, 1000]MPa, and SG modulus constraints of [80, 250]MPa;
[0149] When D0 < 0.10 mm, the road surface is an ultra-high-stiffness road surface with AC modulus constraints of [3000, 15000] MPa, BC modulus constraints of [3000, 35000] MPa, SB modulus constraints of [1000, 8000] MPa, and SG modulus constraints of [60, 300] MPa.
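The graded constraints above amount to a lookup from the central deflection D0 to a set of modulus bounds. The sketch below encodes the five levels as listed; the function name and the numbering of levels from 1 (most flexible) to 5 (stiffest) are illustrative, and assigning the shared interval endpoints 0.35 mm and 0.20 mm to the more flexible level is an assumption, since the stated intervals overlap at those points.

```python
# Bounds in MPa per level, ordered from most flexible (1) to stiffest (5).
LEVEL_BOUNDS = {
    1: {"AC": (500, 4000), "BC": (100, 1000), "SB": (50, 400), "SG": (20, 80)},
    2: {"AC": (800, 5000), "BC": (150, 1500), "SB": (60, 500), "SG": (25, 100)},
    3: {"AC": (1200, 6500), "BC": (300, 2000), "SB": (100, 700), "SG": (50, 180)},
    4: {"AC": (2000, 8000), "BC": (500, 3000), "SB": (200, 1000), "SG": (80, 250)},
    5: {"AC": (3000, 15000), "BC": (3000, 35000), "SB": (1000, 8000), "SG": (60, 300)},
}


def stiffness_level(d0_mm):
    """Map the central deflection D0 (mm) to a level from 1 to 5."""
    if d0_mm > 0.50:
        return 1
    if d0_mm >= 0.35:  # assumed: 0.35 mm belongs to the more flexible level
        return 2
    if d0_mm >= 0.20:  # assumed: 0.20 mm belongs to the more flexible level
        return 3
    if d0_mm >= 0.10:
        return 4
    return 5


def modulus_bounds(d0_mm):
    """Feasible (lower, upper) modulus bounds for each layer at this D0."""
    return LEVEL_BOUNDS[stiffness_level(d0_mm)]
```

For instance, `modulus_bounds(0.25)` returns the medium-stiffness bounds, with the AC modulus limited to 1200 to 6500 MPa.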
[0150] Step 2: Forward mechanical modeling using partial differential equations. Based on the initial modulus combination, the forward modeling module uses two-dimensional axisymmetric finite element analysis to calculate the theoretical deflection basin, serving as a mechanical simulator connecting the modulus parameters and the deflection response, and providing feedback signals for optimization.
[0151] Step 3: PPO reinforcement learning iterative optimization and periodic large language model guidance. The PPO agent evaluates the difference between the calculated deflection and the measured deflection through a composite reward function, outputs a modulus adjustment action, and feeds the adjusted modulus back to the forward model for the next iteration. Every 5 training rounds, the large language model guidance module provides optimization direction suggestions based on the current convergence state to prevent the optimization from becoming trapped in local optima. The complete iterative process is shown in Figure 4.
[0152] Step 4: Convergence judgment and output. Convergence is declared when the center deflection error and the average error of the deflection basin both satisfy their convergence thresholds; the optimal modulus combination and related evaluation indicators are then output.
[0153] Figure 5 shows the overall framework implemented by the method of the present invention, which comprises three core modules: the large language model initial estimation module, the finite element forward mechanics modeling module, and the PPO reinforcement learning agent.
[0154] To verify the performance of the method of the present invention, systematic verification was carried out on the synthetic dataset (ABAQUS high-fidelity finite element simulation) and the measured data of the RIOHTRACK full-scale ring road pavement test site, and the results are as follows:
[0155] (1) Numerical verification of the synthetic dataset
[0156] The synthetic dataset contains 15 flexible pavement structures. After excluding the full-depth asphalt structure (AC≥25 cm), 12 structures were finally used for verification. They are classified by asphalt layer thickness as thin-layer AC (AC≤10 cm), standard structure (10 cm<AC<15 cm), thick-layer AC (AC≥15 cm), and thin surface layer (AC≤5 cm, typical semi-rigid base covering layer).
[0157] As shown in Table 1, the method of the present invention achieves a 100% convergence rate (ε_D0 error threshold of 3%), with an average deflection error of 1.69%±0.69%. The layer-by-layer accuracy shows a clear pattern: the subgrade is inverted most accurately (error 11.3%) because it dominates the far-field deflection, whereas the surface layer shows larger dispersion (error 21.0%), reflecting the known equivalent-solution phenomenon in which different modulus combinations reproduce nearly the same deflection basin. 58.3% of the cases converged within a single round (within 60 seconds), which reflects the effectiveness of the large language model-guided initialization.
[0158] Table 1 Summary of verification performance indicators for the synthetic dataset
[0159]
[0160] (2) Verification of the RIOHTRACK measured data
[0161] The method of the present invention was deployed and verified on 5 test sections (typical semi-rigid base asphalt pavement structures) of the full-scale ring road pavement test site. The FWD test was carried out under controlled conditions, with a peak load of 50 kN, a contact pressure of 0.707 MPa, and a loading radius of 0.15 m. The raw deflection basins were screened according to the following quality control criteria: a near-field criterion (to ensure high fidelity of the near-field response); a strictly monotonically decreasing deflection pattern (to exclude sensor anomalies); and a conversion ratio <20% (to exclude abnormal structural discontinuities).
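The stated load parameters are mutually consistent: a 50 kN peak load spread uniformly over a circular plate of radius 0.15 m gives a contact pressure of about 0.707 MPa. The sketch below checks that arithmetic and illustrates the monotonicity screen. The function names are hypothetical, and the other two quality control criteria are not sketched because their definitions are not given in the text.

```python
import math
from typing import Sequence


def contact_pressure_mpa(load_kn: float, radius_m: float) -> float:
    """Uniform contact pressure (MPa) of a load on a circular plate."""
    area_m2 = math.pi * radius_m ** 2
    return load_kn * 1e3 / area_m2 / 1e6  # kN -> N, then Pa -> MPa


def is_strictly_decreasing(deflections: Sequence[float]) -> bool:
    """True if every sensor reads less than the sensor closer to the load."""
    return all(a > b for a, b in zip(deflections, deflections[1:]))
```

For example, `contact_pressure_mpa(50, 0.15)` evaluates to roughly 0.7074, matching the 0.707 MPa quoted above.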
[0162] As shown in Table 2, the method of this invention exhibits strong reproducibility of deflection basins for all five test sections. The central deflection error ε_D0 ranges from 1.13% to 7.54%, with an average of 4.58%; the average error of the deflection basin at all sensor locations is 6.52%, verifying the robustness of the method on semi-rigid pavement structures. The inversion modulus results show good physical consistency: the AC modulus (3000~6900 MPa) matches the viscoelastic properties at the test temperatures (8~14°C); the cement-stabilized crushed stone base modulus (3000~3800 MPa) reflects the operational deterioration pattern of in-service pavements; and the subgrade modulus falls within the regional standard range.
[0163] Table 2. Structural configuration and inversion results of full-scale ring road sections
[0164]
[0165] The above description is merely a description of preferred embodiments of this application and is not intended to limit the scope of this application in any way. Any changes or modifications made by those skilled in the art based on the above-disclosed technical content should be considered as equivalent and valid embodiments, and all fall within the scope of protection of the technical solution of this application.
Claims
1. A method for inverse calculation of road structure modulus based on a large model and reinforcement learning, characterized in that it includes the following steps:
Step 101: Obtain pavement structure information and measured deflection basin data, and convert them into a unified structured parameter vector through a dual-mode input interface, the dual-mode input interface including natural language description mode A and structured numerical input mode B;
Step 102: Send the structured parameters obtained in Step 101 to a pre-trained large language model, which performs road surface type recognition, structural layer parameter extraction, and engineering-knowledge-driven initial modulus estimation to generate an initial modulus vector; when the large language model fails, the system automatically reverts to an empirical formula based on the characteristic parameters of the deflection basin;
Step 103: Based on the initial modulus vector generated in Step 102, construct a two-dimensional axisymmetric finite element forward mechanical model, and use a partial differential equation solver to calculate the theoretical deflection value at each measuring point to form a theoretical deflection basin;
Step 104: Use the deviation between the theoretical deflection basin and the measured deflection basin to construct the state vector of the reinforcement learning agent; the proximal policy optimization (PPO) algorithm drives the agent to output a four-dimensional relative modulus adjustment action, and the modulus of each structural layer is iteratively adjusted;
Step 105: Set up in the reinforcement learning agent a composite reward function that fuses a center deflection matching reward, a deflection basin shape reward, a shape index reward, and a progress reward, which is used to guide the agent in generating effective modulus adjustment actions;
Step 106: During the PPO optimization process, call the large language model periodically, every 5 training rounds, with a prompt containing the current modulus values, the optimization history, and the convergence status, and obtain the optimization direction guidance provided by the large language model; after verification, the guidance modulates the PPO action distribution to prevent the optimization from becoming stuck in local optima;
Step 107: Based on the action suggestions from Steps 104 to 106, update the modulus parameters of each layer, re-execute the finite element forward calculation, and input the results into the reinforcement learning agent for the next round of training, iterating until the convergence condition is met; the update takes the current modulus value of the i-th structural layer and the modulus update amount of the i-th structural layer and yields the updated modulus value of the i-th structural layer;
Step 108: After convergence, output the optimal pavement structure layer modulus inversion results, the comparison between the corresponding predicted deflection values and the measured values, the error evaluation indicators, and the layer sensitivity coefficients and modulus variation coefficients used for engineering decision support.
2. The method according to claim 1, characterized in that the dual-mode input interface in step 101 is processed as follows:
Natural language description mode A: the pavement structure description entered by the user in natural language is parsed by the large language model to extract the pavement type, the thickness of each structural layer, the measured deflection basin, and the load pressure P, which are mapped to the standard structured parameter set; the measured deflection basin consists of the deflection values at the center point and at offsets of 200 mm, 300 mm, 600 mm, 900 mm, 1200 mm, and 1500 mm, in μm;
Structured numerical input mode B: the user directly enters the measured deflection data, layer thicknesses, and load parameters in a standardized format, and the system generates the parameter vector after data verification;
Both input modes converge into a unified parameter set comprising the thickness vector of the structural layers, the load pressure, the radius of the bearing plate, Poisson's ratio, and the measured deflection basin.
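The unified parameter set that both input modes converge to can be pictured as a small validated record. The sketch below is illustrative only: the class name `InversionInput`, the field names, the choice of three finite layers above a semi-infinite subgrade, the per-layer Poisson's ratio tuple, and the specific validation rules are all assumptions rather than details taken from the claim.

```python
from dataclasses import dataclass
from typing import Tuple

# Sensor offsets named in the claim, in mm from the load center.
OFFSETS_MM = (0, 200, 300, 600, 900, 1200, 1500)


@dataclass(frozen=True)
class InversionInput:
    """Hypothetical container for the unified structured parameter set."""

    thickness_cm: Tuple[float, float, float]  # AC, base, subbase (assumed order)
    load_pressure_mpa: float
    plate_radius_m: float
    poisson_ratio: Tuple[float, float, float, float]  # one per layer (assumed)
    deflections_um: Tuple[float, ...]  # one value per entry of OFFSETS_MM

    def __post_init__(self) -> None:
        # Data verification step: reject obviously malformed input early.
        if len(self.deflections_um) != len(OFFSETS_MM):
            raise ValueError("expected one deflection per sensor offset")
        if any(d <= 0 for d in self.deflections_um):
            raise ValueError("deflections must be positive")
        if any(h <= 0 for h in self.thickness_cm):
            raise ValueError("layer thicknesses must be positive")
        if self.load_pressure_mpa <= 0 or self.plate_radius_m <= 0:
            raise ValueError("load pressure and plate radius must be positive")
        if any(not 0 < nu < 0.5 for nu in self.poisson_ratio):
            raise ValueError("Poisson's ratio must lie in (0, 0.5)")
```

Under this sketch, mode A would populate the record from values parsed by the large language model, while mode B would populate it directly from user-supplied numbers; both paths pass through the same checks.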
3. The method according to claim 2, characterized in that, in step 102, the large language model generates the initial modulus estimate as follows:
Based on the range of the central deflection D0, the pavement is divided into five levels: ultra-flexible, flexible, medium stiffness, relatively stiff, and high stiffness; the modulus of each structural layer, namely the asphalt concrete layer AC, base course BC, subbase course SB, and subgrade SG, is subject to adaptive graded constraints according to the engineering feasible domain, and the lower and upper modulus limits of the corresponding level are set;
The parameters are filled into the prompt template and sent to the pre-trained large language model, which then performs road surface type recognition, structural layer parameter extraction, and engineering-knowledge-driven initial modulus estimation; when the large language model estimation fails or its output is invalid, the system reverts to calculating the initial modulus values from the empirical formula based on the characteristic parameters of the deflection basin, so that the system can be robustly initialized under various conditions;
The prompt template is as follows: "You are a pavement structure engineering expert specializing in pavement modulus inversion; please provide an engineering-based initial modulus estimate based on the following FWD deflection data and structural information; pavement type: {pavement_type}; layer thickness (cm): asphalt concrete layer {h_AC} cm, base course {h_BC} cm, subbase course {h_SB} cm, with a semi-infinite soil base course below; FWD deflection basin (μm): D0={D0}, D200={D200}, D300={D300}, D600={D600}, D900={D900}, D1200={D1200}, D1500={D1500}; Please estimate the elastic modulus (MPa) of each layer; Please return the following JSON format: {"E_AC": value, "E_BC": value, "E_SB": value, "E_SG": value, "reasoning": "brief explanation"}; Adjusted values must meet the engineering feasible domain constraints";
Here, "{parameter name}" denotes a placeholder that is replaced with the actual parameter value before the large language model is called.
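The placeholder filling and the fallback on invalid output can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `fill_template`, `parse_initial_modulus`, and `geometric_midpoint` are hypothetical names, and `geometric_midpoint` is only a stand-in for the empirical formula, whose actual form is not given in the text. Plain string replacement is used instead of `str.format` because the template itself contains literal JSON braces.

```python
import json
import math
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]
KEY_TO_LAYER = {"E_AC": "AC", "E_BC": "BC", "E_SB": "SB", "E_SG": "SG"}


def fill_template(template: str, params: Dict[str, object]) -> str:
    """Replace each {name} placeholder, leaving other braces untouched."""
    for name, value in params.items():
        template = template.replace("{" + name + "}", str(value))
    return template


def geometric_midpoint(bounds: Bounds) -> Dict[str, float]:
    """Stand-in fallback (assumption): geometric mean of each interval."""
    return {layer: math.sqrt(lo * hi) for layer, (lo, hi) in bounds.items()}


def parse_initial_modulus(
    reply: str,
    bounds: Bounds,
    fallback: Callable[[Bounds], Dict[str, float]] = geometric_midpoint,
) -> Dict[str, float]:
    """Parse the model's JSON reply; fall back if it is malformed or infeasible."""
    try:
        data = json.loads(reply)
        estimate = {layer: float(data[key]) for key, layer in KEY_TO_LAYER.items()}
    except (ValueError, KeyError, TypeError):
        return fallback(bounds)
    for layer, value in estimate.items():
        lo, hi = bounds[layer]
        if not (math.isfinite(value) and lo <= value <= hi):
            return fallback(bounds)
    return estimate
```

Rejecting the whole reply when any single layer violates its bounds is one possible design; clamping the offending value instead would be an equally plausible reading of the claim.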
4. The method according to claim 2, characterized in that the forward mechanical model described in step 103 is constructed as follows:
A two-dimensional axisymmetric finite element method is used to establish a forward calculation model for pavement mechanics; the calculation domain extends 8.0 m radially and 10.0 m in depth to eliminate boundary effects;
Mesh generation strategy: the mesh size is 0.015 m in the load application area and 0.08 m in the far-field area;
Boundary condition settings: a symmetry axis constraint $u_r = 0$ at $r = 0$; a uniformly distributed load P = 0.707 MPa applied to the road surface over the contact radius; a fixed support at the bottom; and a lateral radial constraint $u_r = 0$ at r = 8.0 m;
The surface deflection is extracted at the specified offset positions, the default offsets being [0, 0.2, 0.3, 0.6, 0.9, 1.2, 1.5] m, for comparison with the measured deflection basin to form the optimization feedback signal;
where $r$ is the radial coordinate, $z$ is the coordinate in the depth direction, $u_r$ denotes the radial displacement, and $u_z$ denotes the vertical displacement;
The forward mechanical model serves as a mechanical simulator that maps the modulus parameters of each layer to the theoretical deflection basin, providing the evaluation basis for reinforcement learning optimization.
5. The method according to claim 2, characterized in that the PPO algorithm described in step 104 employs a clipped surrogate objective function to maintain training stability, as follows:
$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$
The function takes the minimum of two terms: the first term, $r_t(\theta)\,\hat{A}_t$, is the unconstrained policy gradient objective; the second term uses the $\operatorname{clip}$ operation to limit the importance sampling ratio to the range $[1-\epsilon,\, 1+\epsilon]$, preventing excessively large single-step policy updates; taking the minimum of the two makes the objective a lower bound, thereby maintaining training stability;
where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the importance sampling ratio, which measures the relative difference in preference between the new and old policies for the same action; $\pi_\theta(a_t \mid s_t)$ is the probability density with which the current policy network with parameters $\theta$ outputs action $a_t$ in state $s_t$; $\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the probability density of the old policy in the same state before the latest parameter update, used as the importance sampling benchmark; $\epsilon$ is the clipping parameter; $\hat{A}_t$ is the advantage function value calculated by generalized advantage estimation (GAE);
The advantage function is calculated using GAE with smoothing parameter $\lambda$:
$$\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^l\, \delta_{t+l}$$
where $\hat{A}_t$ is the advantage function value at step $t$, representing the excess return of action $a_t$ in state $s_t$ over the current average level of the policy; $\hat{A}_t > 0$ indicates that the action is better than average and its probability should be increased, and conversely for $\hat{A}_t < 0$; $\delta_t$ is the TD error; $\gamma$ is the discount factor, which controls the decay weighting of future rewards;
The optimization process uses an episode-based structure, with each episode containing multiple interaction steps, as follows:
(1) Observe the current state: substitute the current modulus parameters into the finite element model, calculate the theoretical deflection basin, and construct the state vector $s_t$;
(2) Sample an action from the policy distribution: the Actor network parameterizes a Gaussian policy distribution, from which the four-dimensional action $a_t$ is sampled;
(3) Adjust the modulus and calculate the deflection through forward analysis using partial differential equations: update the modulus according to the update rule and recalculate the new theoretical deflection basin with the finite element model;
(4) Calculate the reward: compute the composite reward from the deviation between the new theoretical deflection basin and the measured deflection basin;
(5) Store the state transition sample in the experience buffer;
The Actor and Critic networks are updated after each episode.
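The clipped surrogate objective and the GAE recursion referred to in this claim are the standard PPO formulations and can be written out in a few lines. The sketch below uses plain Python for clarity; the function names are hypothetical, and the claim's own values of the clipping parameter, discount factor, and GAE smoothing parameter are not available in the text, so they are left as arguments.

```python
import math
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float], values: Sequence[float], gamma: float, lam: float
) -> List[float]:
    """Generalized advantage estimation over one episode.

    `values` holds the critic estimate for every visited state plus one
    bootstrap value for the state after the last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float,
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # importance sampling ratio
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage the objective stops rewarding ratios above `1 + eps`, and with a negative advantage the minimum keeps the unclipped, more pessimistic term, which is what makes the objective a lower bound.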
6. The method according to claim 5, characterized in that the state space, action space, reward function, and network structure of the reinforcement learning agent described in step 105 are as follows:
State space: the state $s_t$ is a ten-dimensional vector comprising the normalized modulus of each layer, the current central deflection error, the deflection ratio direction, and shape indices reflecting the geometric characteristics of the deflection basin; the shape indices include the surface curvature index, the base damage index, the base curvature index, and the deflection attenuation rate; wherein:
the normalized modulus of each layer is obtained by linearly normalizing the layer modulus to the [0, 1] interval, the four components corresponding to the asphalt concrete layer, base course, subbase course, and subgrade, respectively;
the center deflection error is $\varepsilon_{D_0,t} = \operatorname{clip}\left(\left|D_0^{\mathrm{calc}} - D_0^{\mathrm{target}}\right| / D_0^{\mathrm{target}},\ 0,\ 1\right)$, where $\varepsilon_{D_0,t}$ is the relative error of the center deflection at step $t$, with a value in the range [0, 1]; $D_0^{\mathrm{calc}}$ is the theoretical deflection value at the center point calculated by the finite element forward model under the current modulus parameters, in μm; $D_0^{\mathrm{target}}$ is the center point deflection measured by FWD, in μm; $\operatorname{clip}$ is the truncation operation, which prevents the value from becoming too large when the error exceeds 100% and keeps this state component within [0, 1]; and $\left|D_0^{\mathrm{calc}} - D_0^{\mathrm{target}}\right|$ is the deviation between the calculated and target deflection values at the center point;
the four shape indices are standardized measures of the geometric features of the deflection basin: the surface curvature index reflects the flexural stiffness of the asphalt surface layer, and the greater the difference between $D_0$ and $D_{300}$, the greater the surface curvature; the base damage index reflects the state of the base course, the difference between $D_{300}$ and $D_{600}$ representing the contribution of the base course to the deflection; the base curvature index reflects the condition of the subbase course, the difference between $D_{600}$ and $D_{900}$ representing the bearing capacity contribution of the subbase; the deflection attenuation rate describes the far-field attenuation characteristics of the deflection basin, which are mainly determined by the subgrade soil stiffness;
$D_0$ is the center point deflection measured by FWD, and $D_{300}$, $D_{600}$, $D_{900}$, and $D_{1500}$ are the deflections measured by FWD at offsets of 300 mm, 600 mm, 900 mm, and 1500 mm, respectively, all in μm; the superscript "target" indicates a measured FWD value, as distinct from the superscript "calc", which indicates a finite element calculation value;
Action space: the action $a_t$ is a four-dimensional vector whose i-th component represents the relative adjustment of the modulus of the i-th layer, the components corresponding to the asphalt concrete layer, base course, subbase course, and subgrade; the modulus of each layer is updated by applying the corresponding relative adjustment;
Reward function: the composite reward combines the following terms: the center deflection matching reward adopts a piecewise structure and is scored according to the relative error of the center deflection; the deflection basin shape reward uses a distance-weighted error to evaluate the overall fitting quality of the deflection basin, with a weight vector that gives higher weight to the far-field sensors; the shape index reward integrates the four standard deflection basin shape indices to comprehensively assess the shape error and evaluate the mechanical contribution of each layer; the progress reward is scored according to the change in the center deflection error between adjacent steps;
Reinforcement learning network structure: an Actor-Critic architecture is adopted; the Actor network parameterizes the Gaussian policy distribution for action sampling; the Critic network estimates the state value for advantage function calculation.
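Several of the state components described in this claim are simple functions of the deflection basin and the layer moduli. The sketch below computes them under explicit assumptions, because the original formulas did not survive in the text: the shape indices are taken as the conventional FWD differences (D0 - D300, D300 - D600, D600 - D900), the attenuation rate is assumed to be the ratio D1500 / D0, and the relative action is assumed to act multiplicatively, E_new = E * (1 + a), clamped to the feasible interval. All function names are hypothetical.

```python
from typing import Dict, Tuple


def clipped_center_error(d0_calc: float, d0_target: float) -> float:
    """Relative center deflection error, truncated to [0, 1]."""
    return min(abs(d0_calc - d0_target) / d0_target, 1.0)


def normalize_modulus(value: float, lower: float, upper: float) -> float:
    """Linear map of a modulus onto [0, 1] within its feasible interval."""
    return min(max((value - lower) / (upper - lower), 0.0), 1.0)


def shape_indices(basin_um: Dict[int, float]) -> Dict[str, float]:
    """Shape indices of a basin given as {offset in mm: deflection in um}."""
    return {
        "SCI": basin_um[0] - basin_um[300],     # surface curvature index
        "BDI": basin_um[300] - basin_um[600],   # base damage index
        "BCI": basin_um[600] - basin_um[900],   # base curvature index
        "decay": basin_um[1500] / basin_um[0],  # assumed attenuation measure
    }


def apply_relative_action(
    modulus: Dict[str, float],
    action: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Assumed update rule: scale each modulus by (1 + a_i), then clamp."""
    updated = {}
    for layer, value in modulus.items():
        lower, upper = bounds[layer]
        updated[layer] = min(max(value * (1.0 + action[layer]), lower), upper)
    return updated
```

The claim also calls the shape indices standardized; any further normalization of the raw differences is not recoverable from the text and is therefore omitted here.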
7. The method according to claim 6, characterized in that the periodic optimization guidance of the large language model in step 106 is implemented as follows:
In the first training round and every 5 rounds thereafter, a prompt containing the current modulus values, the optimization history, and the convergence state is constructed and sent to the large language model;
The optimization prompt template for the large language model is as follows: "You are a pavement structure engineering expert; the current modulus inversion optimization progress is as follows: Current episode: {episode}; Current deflection error: {D0_error:.2f}%; Current deflection basin error: {basin_error:.2f}%; Current modulus (MPa): Asphalt concrete layer E_AC={E_AC}, base course E_BC={E_BC}, subbase course E_SB={E_SB}, subgrade E_SG={E_SG}; Recent optimization history: {optimization_history}; Please analyze the current convergence status and provide adjustment suggestions; Please return in JSON format: {"modulus_adjustments": {"E_AC": relative adjustment amount, "E_BC": relative adjustment amount, "E_SB": relative adjustment amount, "E_SG": relative adjustment amount}, "engineering_analysis": {"optimization_direction": "optimization direction suggestions", "convergence_issue": "convergence problem analysis", "key_layer":"key adjustment layer"}}; the adjustment range is -0.3 to 0.3";
wherein the recent optimization history includes the sequence of modulus values of each structural layer over the previous preset period and the corresponding trend of the central deflection error, which helps the large language model identify whether the current convergence state has fallen into stagnation or oscillation;
After the adjustment suggestions output by the large language model are verified by the system, they are applied to the PPO action distribution to guide the agent's exploration toward the region suggested by the large language model;
The specific modulation method is as follows: using a mixing weight, the adjustment vector output by the large language model is linearly fused with the mean output by the Actor network; this operation applies only to the action sampling phase and does not modify the PPO network parameters θ, the clipped objective function, or the clipping parameter, which ensures that training stability is not affected;
If the output of the large language model is invalid or the verification fails, the result of this guidance is ignored and execution continues according to the original PPO policy, ensuring the robustness of the system.
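The guidance schedule, the verification of the model's reply, and the linear fusion with the Actor mean can be sketched as below. This is an illustration under assumptions rather than the claimed implementation: the function names are hypothetical, the schedule assumes 1-indexed episodes with calls at episodes 1, 6, 11, and so on, out-of-range suggestions are assumed to be rejected rather than clamped, and the fusion is assumed to be the convex combination (1 - w) * mean + w * suggestion, since the exact formula is not available in the text.

```python
import json
import math
from typing import List, Optional, Sequence

LAYER_KEYS = ("E_AC", "E_BC", "E_SB", "E_SG")


def should_call_llm(episode: int, period: int = 5) -> bool:
    """First episode and every `period` episodes thereafter (1-indexed)."""
    return episode >= 1 and (episode - 1) % period == 0


def parse_guidance(reply: str, limit: float = 0.3) -> Optional[List[float]]:
    """Return the four relative adjustments, or None if the reply is unusable."""
    try:
        adjustments = json.loads(reply)["modulus_adjustments"]
        values = [float(adjustments[key]) for key in LAYER_KEYS]
    except (ValueError, KeyError, TypeError):
        return None
    if any(not math.isfinite(v) or abs(v) > limit for v in values):
        return None
    return values


def blend_mean(
    actor_mean: Sequence[float], guidance: Optional[Sequence[float]], weight: float
) -> List[float]:
    """Shift the sampling mean toward the suggestion; policy weights are untouched."""
    if guidance is None:
        return list(actor_mean)  # fall back to the unmodified PPO policy
    return [(1.0 - weight) * m + weight * g for m, g in zip(actor_mean, guidance)]
```

Because only the mean used for sampling is shifted, the stored log-probabilities and the clipped objective are computed exactly as in ordinary PPO, which is consistent with the claim's statement that the network parameters and clipping parameter are not modified.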
8. The method according to claim 7, characterized in that the convergence condition described in step 107 requires both of the following conditions to be met simultaneously: (1) the relative error of the center deflection is below its threshold; (2) the average error of the deflection basin is below its threshold; these convergence conditions ensure that the inversion results not only match the central deflection but also reproduce the overall shape of the deflection basin, thus guaranteeing the reliability of the output results. If convergence is not achieved after the maximum number of training rounds is reached, the modulus combination with the smallest error during the optimization process is output as the final result.
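The two-part convergence test and the best-so-far fallback can be sketched as follows. The function names are hypothetical and the thresholds are passed in as arguments: the description quotes a 3% threshold for the center deflection error, but the basin threshold is not available in the text. Ranking candidates by center error first and basin error second is also an assumption, since the claim only refers to the smallest error.

```python
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[float, float, Dict[str, float]]  # (center err, basin err, modulus)


def mean_basin_error(calc: Sequence[float], target: Sequence[float]) -> float:
    """Mean relative error over all sensor positions."""
    if len(calc) != len(target) or not target:
        raise ValueError("basins must be non-empty and of equal length")
    return sum(abs(c - t) / t for c, t in zip(calc, target)) / len(target)


def is_converged(
    center_error: float, basin_error: float, center_tol: float, basin_tol: float
) -> bool:
    """Both criteria must hold at the same time."""
    return center_error < center_tol and basin_error < basin_tol


def best_candidate(history: List[Candidate]) -> Dict[str, float]:
    """Fallback when the episode budget runs out: smallest recorded error."""
    return min(history, key=lambda item: (item[0], item[1]))[2]
```

In a training loop, each evaluated modulus set would be appended to `history`, and `best_candidate` would be consulted only if `is_converged` never returned true before the maximum number of rounds.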
9. The method according to claim 8, characterized in that the output content described in step 108 includes: the inversion results of the modulus of each structural layer of the optimal pavement, in MPa; the comparison between the corresponding theoretical deflection basin prediction and the measured deflection basin; a quantitative evaluation of the central deflection error and the average error of the deflection basin; and the sensitivity coefficient $S_i$ of each layer modulus, which is used to assess the strength of each layer's influence on the surface deflection and to guide engineering decisions.