A network communication construction wiring dynamic adjustment system based on reinforcement learning
The network communication construction cabling dynamic adjustment system based on reinforcement learning solves the problem of hidden network damage caused by fiber micro-bending deformation and mechanical tension during optical cable construction. It realizes real-time monitoring and dynamic routing adjustment, improving network stability and fault response capabilities.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 西安广科微基通信科技有限公司
- Filing Date
- 2026-04-28
- Publication Date
- 2026-06-19
AI Technical Summary
The existing communication network architecture cannot monitor the micro-bending deformation of optical fibers and physical media damage caused by mechanical tension in real time during the construction of optical cables. This leads to hidden packet loss and a drop in network throughput during the dynamic construction process, and it is difficult to accurately locate the fault point, resulting in high troubleshooting costs.
A network communication construction cabling dynamic adjustment system based on reinforcement learning is adopted, including a physical extraction module, a state fusion module, a backoff inference module, and a closed-loop execution module. By reading the received power and temperature, the temperature difference and difference quotient are calculated to obtain the perturbation vector. Combined with the bit error rate, the number of lost frames, and the queue depth, a state tensor is generated to perform policy network inference and route adjustment, thereby realizing cross-layer state awareness and dynamic route avoidance.
This technology enables early detection of communication attenuation risks before optical cables are damaged by external machinery, avoiding network-wide routing table oscillations and traffic congestion caused by traditional passive response mechanisms. It improves the network's real-time adjustment capabilities and fault location accuracy, and reduces troubleshooting costs.
Figure CN122247880A_ABST
Description
Technical Field
[0001] This application relates to the field of network communication technology, and in particular to a dynamic adjustment system for network communication construction cabling based on reinforcement learning.
Background Technology
[0002] With the continuous expansion of modern data exchange networks, the installation of optical cables in underground pipelines and the installation of wiring in the low-voltage wells of smart buildings have become increasingly frequent. In these complex incremental wiring or dynamic expansion scenarios, network equipment is usually in a powered-on state to meet the needs of phased service delivery and real-time verification of network connectivity. Due to the narrow space and tortuous routing of the wiring ducts, construction workers inevitably apply varying degrees of mechanical tension when pulling optical cables. When the tension exceeds the standard limit, or when the bending angle of the optical cable is lower than the safe bending radius of the fiber core due to the dead angle of the cable tray, the physical medium inside the optical fiber will undergo micro-bending deformation, and the optical module connectors at the ends will also loosen slightly due to the force. This change in physical form accompanying the construction work directly disrupts the total internal reflection condition for optical signal transmission inside the fiber core, thereby causing dynamic fluctuations in the physical characteristics of the underlying communication medium.
[0003] Regarding the physical media damage during the aforementioned construction process, the existing communication network architecture suffers from a core technical deficiency: a disconnect in cross-layer state perception. It cannot provide real-time feedback on the damage caused by physical construction actions to the network layer for dynamic routing avoidance. In current engineering practice, the quality inspection of optical cable construction generally relies on static illumination testing using an optical time-domain reflectometer after the overall cabling is completed, which is insufficient to meet the real-time monitoring needs during dynamic pulling processes. Furthermore, at the data transmission level, traditional Layer 3 routing devices lack fine-grained perception capabilities for the hidden degradation of the underlying physical layer. Micro-bending deformation of the optical fiber does not immediately trigger a complete interruption of the underlying optical signal; the physical ports of network devices still maintain the appearance of normal connectivity. However, beneath this appearance, the optical signal-to-noise ratio of the link has decreased sharply due to external forces, and the underlying bit error rate has secretly surged within a very short time. Because the existing route discovery mechanism mechanically uses the physical connectivity status of the port as the basis for path allocation, the router will continue to schedule massive amounts of data packets to this potentially damaged link. The system only triggers a passive congestion reduction response when a large number of underlying errors accumulate and eventually cause massive packet loss in the upper-layer transmission control protocol. This extremely delayed response mechanism directly leads to hidden packet loss and blind retransmission in core services, causing a precipitous drop in network throughput. 
Furthermore, because the source of degradation is dynamically changing construction stress, conventional monitoring methods struggle to locate the fault point accurately, resulting in high troubleshooting costs.
Summary of the Invention
[0004] This application proposes a network communication construction cabling dynamic adjustment system based on reinforcement learning to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, this application adopts the following technical solution: a network communication construction cabling dynamic adjustment system based on reinforcement learning, comprising a physical extraction module, a state fusion module, a backoff inference module, and a closed-loop execution module, wherein:
The physical extraction module, configured in the first device, is used to read the received power and temperature, calculate the temperature difference between the temperature and the preset reference temperature, subtract the product of the temperature difference and the preset coefficient from the received power to generate the effective power, and extract the difference quotient and variance of the effective power and concatenate them to generate a perturbation vector.
The state fusion module, configured in the second device, is used to acquire the perturbation vector, collect the bit error rate, the number of lost frames, and the queue depth, calculate the difference between the effective power and the preset noise floor and replace it with a truncation constant when the difference is non-positive, calculate an exponential term based on the replaced difference, multiply the exponential term by the difference quotient to generate an optical component, multiply the bit error rate by a preset weight to generate an electrical component, accumulate the optical component and the electrical component to generate a degradation index, and concatenate the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth to generate a state tensor.
The backoff inference module, configured in the second device, is used to input the state tensor into the preset policy network, generate a historical penalty term based on the number of lost frames, generate a look-ahead penalty term based on the difference between the degradation index and the preset threshold, accumulate the historical penalty term and the look-ahead penalty term, and output the routing increment by rounding at the output layer of the preset policy network.
The closed-loop execution module, configured in the first device, is used to add the routing increment to the preset baseline cost to generate a cost value, write the cost value to the routing library of the first device and trigger a forwarding table refresh, monitor the subsequent changes in the effective power and the queue depth, generate a reward signal when the effective power stops decaying and the queue depth is emptied, and use the reward signal to update the preset policy network.
[0006] Furthermore, the specific operation of the physical extraction module in reading the received power and temperature, calculating the temperature difference between the temperature and the preset reference temperature, and subtracting the product of the temperature difference and the preset coefficient from the received power to generate the effective power is as follows: the physical extraction module uses a field-programmable gate array (FPGA) to poll the digital diagnostic monitoring register of the optical transceiver module via an out-of-band bus at a fixed discrete sampling period, reads the total energy of photons actually received on the photodetector target surface as the received power, and reads the real-time temperature of the optical transceiver module as the temperature; obtains the thermally induced emission power drift coefficient of the optical module as the preset coefficient; calculates the temperature difference between the temperature and the preset reference temperature; multiplies the temperature difference by the preset coefficient to obtain the product characterizing optical thermal drift; and subtracts the product from the received power to remove the background noise caused by changes in the room temperature, generating the denoised effective power.
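The thermal compensation step in [0006] can be illustrated with a minimal Python sketch. The function name, parameter names, and default values are hypothetical placeholders chosen for illustration; the source only specifies that the product of the temperature difference and the preset coefficient is subtracted from the received power.

```python
def effective_power(rx_power_dbm: float,
                    temp_c: float,
                    ref_temp_c: float = 25.0,           # assumed reference temperature
                    drift_coeff_db_per_c: float = 0.02  # assumed drift coefficient
                    ) -> float:
    """Subtract the thermal-drift term from a received-power reading."""
    # Temperature difference between the module temperature and the reference.
    temp_diff = temp_c - ref_temp_c
    # Product characterizing optical thermal drift.
    thermal_drift = temp_diff * drift_coeff_db_per_c
    # Effective (denoised) power: received power minus the drift product.
    return rx_power_dbm - thermal_drift


if __name__ == "__main__":
    print(effective_power(-12.0, 35.0))
```

At the reference temperature the correction vanishes, so the effective power equals the raw reading.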
[0007] Furthermore, the specific operation of the physical extraction module in extracting the difference quotient and variance of the effective power and concatenating them to generate a perturbation vector is as follows: the physical extraction module obtains the effective power at the current time step, and subtracts the effective power of the adjacent previous time step from the effective power at the current time step to obtain the power change difference; the power change difference is divided by the sampling period to obtain the difference quotient of the effective power; within a set sliding time window, the fluctuation variance of the effective power is calculated as the variance; the difference quotient of the effective power and the variance are concatenated into a one-dimensional matrix to quantize the frequency of the transient alternating force generated by the construction machinery pulling into the perturbation vector.
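As a hedged illustration of [0007], the sketch below keeps a sliding window of effective-power samples and returns the two-element perturbation vector. The class name and the choice of population variance are assumptions; the source does not say which variance estimator is used or how the first sample (which has no predecessor) is handled.

```python
from collections import deque
from statistics import pvariance


class PerturbationExtractor:
    """Builds [difference quotient, variance] from effective-power samples."""

    def __init__(self, sample_period_s: float, window_size: int):
        self.sample_period_s = sample_period_s
        self.window = deque(maxlen=window_size)  # sliding time window

    def update(self, eff_power: float) -> list:
        # For the very first sample there is no previous step; assume zero change.
        prev = self.window[-1] if self.window else eff_power
        self.window.append(eff_power)
        # Power change divided by the sampling period.
        diff_quotient = (eff_power - prev) / self.sample_period_s
        # Fluctuation variance over the sliding window (population variance assumed).
        variance = pvariance(list(self.window))
        # Concatenate into a one-dimensional perturbation vector.
        return [diff_quotient, variance]


if __name__ == "__main__":
    ext = PerturbationExtractor(sample_period_s=0.001, window_size=4)
    for p in (-12.0, -12.5, -12.4):
        print(ext.update(p))
```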
[0008] Further, the specific operations of the state fusion module in acquiring the perturbation vector, collecting the bit error rate, the number of lost frames, and the queue depth, and calculating the difference between the effective power and the preset noise floor and replacing it with a truncation constant when the difference is non-positive are as follows: the state fusion module acquires the perturbation vector; collects the bit error rate increment of the forward error correction protocol as the bit error rate, collects the actual number of lost frames as the number of lost frames, collects the queue depth in the outgoing direction as the queue depth, and takes, as the preset noise floor, the sensitivity limit at which the avalanche photodiode is completely submerged by thermal noise and shot noise; subtracts the preset noise floor from the effective power to obtain the difference, and determines whether the difference is less than or equal to zero; when the difference is less than or equal to zero, a preset small constant is acquired as the truncation constant, and the difference is replaced by the truncation constant.
[0009] Furthermore, the specific operations of the state fusion module in calculating the exponent term based on the replaced difference, multiplying the exponent term by the difference quotient to generate the optical component, and multiplying the bit error rate by a preset weight to generate the electrical component are as follows: the state fusion module obtains a preset safety margin, divides the safety margin by the replaced difference to obtain the division calculation result; obtains the base of the natural logarithm, uses the base of the natural logarithm as the base, and uses the division calculation result as the exponent to obtain the exponent term; obtains the absolute value of the difference quotient, multiplies the exponent term by the absolute value of the difference quotient to generate the optical component; obtains a preset electrical layer adaptation factor as the preset weight; and multiplies the bit error rate by the preset weight to generate the electrical component.
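The truncation and component calculations in [0008] and [0009] can be sketched as follows. All names and numeric values are illustrative assumptions. One practical point the sketch makes visible: the truncation constant caps the exponent at `safety_margin / trunc_const`, so it has to be chosen large enough that this ratio stays inside floating-point range (Python's `math.exp` raises `OverflowError` for arguments above roughly 709).

```python
import math


def optical_component(eff_power: float, noise_floor: float,
                      safety_margin: float, diff_quotient: float,
                      trunc_const: float = 0.1) -> float:
    """exp(safety_margin / headroom) * |difference quotient|."""
    # Difference between the effective power and the preset noise floor.
    headroom = eff_power - noise_floor
    # Non-positive differences are replaced by the truncation constant,
    # which avoids division by zero and a negative exponent.
    if headroom <= 0:
        headroom = trunc_const
    # Natural exponential of (safety margin / replaced difference).
    exp_term = math.exp(safety_margin / headroom)
    return exp_term * abs(diff_quotient)


def electrical_component(bit_error_rate: float, weight: float) -> float:
    """Bit error rate scaled by the preset electrical-layer weight."""
    return bit_error_rate * weight


if __name__ == "__main__":
    print(optical_component(-20.0, -28.0, 3.0, -50.0))
    print(electrical_component(1e-6, 1e4))
```

As the effective power approaches the noise floor the headroom shrinks and the exponential term grows quickly, so the same rate of power change is weighted far more heavily near the sensitivity limit.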
[0010] Furthermore, the specific operation of the state fusion module in generating a degradation index by accumulating the optical component and the electrical component, and concatenating the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth to generate a state tensor is as follows: The state fusion module obtains a pre-set sliding time window, adds the optical component and the electrical component within the sliding time window, and performs discrete integration on the sum to generate a discrete cross-layer degradation correlation index, which is used as the degradation index; extracts the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth, and performs normalization scaling on the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth; while preserving causal characteristics, the data after normalization scaling is merged into a global state data set of a Markov decision process, and the global state data set of the Markov decision process is merged into the state tensor.
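A minimal sketch of [0010], assuming a rectangle-rule sum for the discrete integration and min-max scaling for the normalization; the source names neither method, and the class, function, and bounds below are hypothetical.

```python
from collections import deque


class DegradationIntegrator:
    """Discrete integration of (optical + electrical) over a sliding window."""

    def __init__(self, window_size: int, sample_period_s: float):
        self.samples = deque(maxlen=window_size)
        self.dt = sample_period_s

    def update(self, optical: float, electrical: float) -> float:
        self.samples.append(optical + electrical)
        # Rectangle rule (assumed): sum of samples times the sampling period.
        return sum(self.samples) * self.dt


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale into [0, 1], clamping out-of-range values (assumed scheme)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def build_state(perturbation, degradation_index, bit_error_rate,
                lost_frames, queue_depth, bounds):
    """Concatenate the features in a fixed order and normalize each one."""
    raw = [*perturbation, degradation_index, bit_error_rate,
           lost_frames, queue_depth]
    return [normalize(v, lo, hi) for v, (lo, hi) in zip(raw, bounds)]


if __name__ == "__main__":
    integ = DegradationIntegrator(window_size=3, sample_period_s=0.5)
    index = integ.update(1.0, 1.0)
    bounds = [(-10, 10), (0, 1), (0, 4), (0, 1e-3), (0, 100), (0, 100)]
    print(build_state([5.0, 0.5], index, 1e-6, 10, 50, bounds))
```

Keeping the feature order fixed is what lets a downstream policy network treat each position in the state tensor consistently across time steps.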
[0011] Furthermore, the specific operations of the backoff inference module in inputting the state tensor into the preset policy network, generating a historical penalty term based on the number of lost frames, and generating a look-ahead penalty term based on the difference between the degradation index and the preset threshold are as follows: input the state tensor into the preset policy network; obtain a preset service level agreement (SLA) breach penalty factor; multiply the number of lost frames by the SLA breach penalty factor to obtain a breach penalty value, and use it as the historical penalty term; subtract the preset threshold from the degradation index to obtain the difference; determine whether the difference is greater than zero; if the difference is less than or equal to zero, replace it with zero; if the difference is greater than zero, retain it, thereby obtaining a non-negative overflow difference; obtain a preset potential physical degradation risk penalty factor; multiply the non-negative overflow difference by that factor to obtain a degradation prevention value; and use the degradation prevention value as the look-ahead penalty term.
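The two penalty terms in [0011] reduce to a product and a clamped product. The sketch below uses hypothetical function and parameter names, and the factor values in the usage lines are placeholders rather than values from the source.

```python
def historical_penalty(lost_frames: int, sla_penalty_factor: float) -> float:
    """Number of lost frames multiplied by the SLA penalty factor."""
    return lost_frames * sla_penalty_factor


def lookahead_penalty(degradation_index: float, threshold: float,
                      risk_penalty_factor: float) -> float:
    """Penalize only the part of the degradation index above the threshold."""
    # Differences at or below zero are replaced by zero (non-negative overflow).
    overflow = max(degradation_index - threshold, 0.0)
    return overflow * risk_penalty_factor


if __name__ == "__main__":
    print(historical_penalty(12, 0.5))
    print(lookahead_penalty(3.5, 2.0, 4.0))
    print(lookahead_penalty(1.0, 2.0, 4.0))
```

The clamp means the look-ahead term is silent while the link is healthy and grows linearly once the degradation index crosses the threshold, even if no frames have been lost yet.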
[0012] Furthermore, the specific operations of the backoff inference module in accumulating the historical penalty term and the look-ahead penalty term, and outputting the routing increment by rounding at the output layer of the preset policy network, are as follows: obtain the continuous action instruction generated by the preset policy network through inference on the state tensor; square the continuous action instruction to obtain the action square value; obtain the preset control plane oscillation penalty factor; multiply the action square value by the control plane oscillation penalty factor to generate a stability penalty term; negate the historical penalty term, the look-ahead penalty term, and the stability penalty term; add the three negated terms, which accumulates the historical penalty term and the look-ahead penalty term, to generate a multi-objective game constraint reward function; perform iterative constraint updates on the preset policy network based on this reward function; extract the continuous action instruction output by the preset policy network at the output layer; round the continuous action instruction down to obtain an integer conforming to the network protocol metric format; and output that integer as the routing increment.
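A sketch of the reward composition and rounding described in [0012], under the assumption that the continuous action is a single scalar. The policy network itself and its training loop are out of scope here; function names and the sample values are hypothetical.

```python
import math


def constraint_reward(historical_penalty: float, lookahead_penalty: float,
                      action: float, oscillation_factor: float) -> float:
    """Sum of the three negated penalty terms."""
    # Stability penalty: squared action times the oscillation penalty factor,
    # which discourages large swings in the routing cost.
    stability_penalty = oscillation_factor * action ** 2
    return -historical_penalty - lookahead_penalty - stability_penalty


def routing_increment(action: float) -> int:
    """Round the continuous action down to an integer metric increment."""
    return math.floor(action)


if __name__ == "__main__":
    print(constraint_reward(2.0, 1.5, 3.0, 0.1))
    print(routing_increment(7.9))
```

Note that `math.floor` rounds toward negative infinity, so a slightly negative action such as -0.2 would yield -1; whether negative increments are permitted is not stated in the source.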
[0013] Furthermore, the specific operation of the closed-loop execution module in adding the routing increment to the preset baseline cost to generate a cost value, and then overwriting the cost value to the routing library of the first device and triggering a forwarding table refresh is as follows: Constructing an asynchronous lock-free queue; obtaining the routing increment through the asynchronous lock-free queue; obtaining a preset network link baseline metric value and using the network link baseline metric value as the preset baseline cost; adding the routing increment to the preset baseline cost to obtain a dynamic link cost value and using the dynamic link cost value as the cost value; extracting the routing library of the first device; writing the cost value into the routing library to complete a seamless overwrite operation of the routing library, thereby overwriting the cost value to the routing library of the first device; synchronizing the underlying hardware forwarding information library of the first device based on the routing library after the seamless overwrite operation; and after synchronizing the underlying hardware forwarding information library of the first device, performing a refresh action on the forwarding table, using the refresh action on the forwarding table as the trigger for refreshing the forwarding table.
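The control flow of [0013] can be sketched in software as below. This is only a stand-in: a `collections.deque` (whose `append` and `popleft` are thread-safe in CPython) substitutes for the asynchronous lock-free queue, and plain dictionaries substitute for the routing library and the hardware forwarding information library. All names are hypothetical.

```python
from collections import deque


class RouteUpdater:
    """Applies queued routing increments to a routing table, then syncs the FIB."""

    def __init__(self, baseline_cost: int):
        self.baseline_cost = baseline_cost  # preset network link baseline metric
        self.pending = deque()              # stand-in for the lock-free queue
        self.rib = {}                       # stand-in routing library
        self.fib = {}                       # stand-in forwarding information library
        self.fib_refreshes = 0

    def submit(self, link_id: str, increment: int) -> None:
        """Producer side: enqueue a routing increment for a link."""
        self.pending.append((link_id, increment))

    def drain(self) -> int:
        """Consumer side: overwrite costs, then refresh the forwarding table."""
        applied = 0
        while self.pending:
            link_id, increment = self.pending.popleft()
            # Dynamic link cost = baseline cost + routing increment.
            self.rib[link_id] = self.baseline_cost + increment
            applied += 1
        if applied:
            # Synchronize the forwarding table from the updated routing table.
            self.fib = dict(self.rib)
            self.fib_refreshes += 1
        return applied


if __name__ == "__main__":
    updater = RouteUpdater(baseline_cost=10)
    updater.submit("port-1", 7)
    updater.drain()
    print(updater.fib)
```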
[0014] Furthermore, the specific operations of the closed-loop execution module in monitoring subsequent changes in the effective power and the queue depth, generating a reward signal when the effective power stops decaying and the queue depth is emptied, and updating the preset policy network using the reward signal are as follows: after triggering the forwarding table refresh, proceed to the next discrete time step; within that time step, continuously acquire the effective power and the queue depth, this continuous acquisition constituting the monitoring of their subsequent changes; determine whether the effective power remains stable or shows an increasing trend, and if so, confirm that the effective power has stopped decaying; check the value of the queue depth, and if it is zero, confirm that the queue has been emptied; when the effective power has stopped decaying and the queue has been emptied, it is confirmed that external construction stress exists on the physical link, and a positive excitation signal characterizing the successful verification of the physical disturbance is generated and used as the reward signal; extract the policy gradient parameters of the preset policy network; and pass the reward signal to the preset policy network and perform backpropagation on the policy gradient parameters using the reward signal, which constitutes updating the preset policy network using the reward signal.
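The verification condition in [0014] can be expressed as a small predicate. The sketch covers only the generation of the reward signal; the backpropagation step depends on the policy network implementation and is not shown. The tolerance parameter and the reward magnitude are assumptions added for illustration.

```python
def reward_signal(prev_eff_power: float, curr_eff_power: float,
                  queue_depth: int,
                  tolerance: float = 0.0,       # assumed measurement tolerance
                  positive_reward: float = 1.0  # assumed reward magnitude
                  ) -> float:
    """Return a positive reward when power stopped decaying and the queue is empty."""
    # "Stable or increasing": the current power is not below the previous one,
    # within the tolerance.
    power_stopped_decaying = curr_eff_power >= prev_eff_power - tolerance
    queue_emptied = queue_depth == 0
    if power_stopped_decaying and queue_emptied:
        return positive_reward
    return 0.0


if __name__ == "__main__":
    print(reward_signal(-14.0, -14.0, 0))
    print(reward_signal(-14.0, -14.6, 0))
    print(reward_signal(-14.0, -13.8, 12))
```

Requiring both conditions ties the logical check (queue drained) to the physical one (power no longer falling), which is the cross-verification the description relies on.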
[0015] The beneficial effects of this invention are as follows: this application extracts purely physical perturbation features directly from the underlying hardware, with ambient temperature interference stripped away, and fuses these features across layers with the bit error rate, frame loss, and queue congestion status of higher-layer protocols. This allows the network control center to detect the risk of attenuation approaching the communication limit during the hidden degradation stage, before the optical cable is completely physically damaged by external construction machinery. At the same time, this solution introduces a noise floor difference truncation mechanism and a comprehensive penalty system, covering historical correction and look-ahead prevention, into the underlying state fusion and game-theoretic algorithm. This prevents the model from collapsing due to computational overflow under extreme network outage conditions, and it also forces the deep reinforcement learning agent to output smooth, continuous routing interventions with an avoidance ("repulsive") effect, effectively avoiding the network-wide routing table oscillations and repeated traffic congestion easily caused by traditional passive switching mechanisms. Furthermore, this invention constructs a strict cross-verification mechanism between physical state and logical queue after the underlying forwarding table is refreshed, removing false-positive environmental noise interference caused by the aging and attenuation of the optical transceiver components themselves.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort: Figure 1 is a flowchart of system data perception and fusion simulation; Figure 2 is a flowchart of the routing execution and closed-loop verification phases.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
Example
[0018] As shown in Figure 1 and Figure 2, this invention discloses a network communication construction cabling dynamic adjustment system based on reinforcement learning, comprising a physical extraction module, a state fusion module, a backoff inference module, and a closed-loop execution module, wherein:
The physical extraction module, configured in the first device, is used to read the received power and temperature, calculate the temperature difference between the temperature and the preset reference temperature, subtract the product of the temperature difference and the preset coefficient from the received power to generate the effective power, and extract the difference quotient and variance of the effective power and concatenate them to generate a perturbation vector.
The state fusion module, configured in the second device, is used to acquire the perturbation vector, collect the bit error rate, the number of lost frames, and the queue depth, calculate the difference between the effective power and the preset noise floor and replace it with a truncation constant when the difference is non-positive, calculate the exponential term based on the replaced difference, multiply the exponential term by the difference quotient to generate the optical component, multiply the bit error rate by the preset weight to generate the electrical component, accumulate the optical component and the electrical component to generate the degradation index, and concatenate the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth to generate the state tensor.
The backoff inference module, configured in the second device, is used to input the state tensor into the preset policy network, generate a historical penalty term based on the number of lost frames, generate a look-ahead penalty term based on the difference between the degradation index and the preset threshold, accumulate the historical penalty term and the look-ahead penalty term, and output the routing increment at the output layer of the preset policy network.
The closed-loop execution module, configured in the first device, is used to add the routing increment to the preset baseline cost to generate a cost value, write the cost value to the routing library of the first device and trigger the forwarding table refresh, monitor the subsequent changes in the effective power and the queue depth, generate a reward signal when the effective power stops decaying and the queue depth is emptied, and use the reward signal to update the preset policy network.
[0019] As can be seen from the above process, this application extracts purely physical perturbation features directly from the underlying hardware, with ambient temperature interference stripped away, and performs cross-layer state fusion with the bit error rate, frame loss, and queue congestion of higher-layer protocols. This allows the network control center to detect the risk of attenuation approaching the communication limit in advance, during the hidden degradation stage before the optical cable is completely physically damaged by external construction machinery. At the same time, this solution introduces a noise floor difference truncation mechanism and a comprehensive penalty system, covering historical correction and look-ahead prevention, into the underlying state fusion and game-theoretic algorithm. This prevents the model from collapsing due to computational overflow under extreme network outage conditions, and it also forces the deep reinforcement learning agent to output smooth, continuous routing interventions with an avoidance ("repulsive") effect, effectively avoiding the network-wide routing table oscillation and repeated traffic congestion that are easily caused by traditional passive switching mechanisms. Furthermore, this invention constructs a strict cross-verification mechanism between physical state and logical queue after the underlying forwarding table is refreshed, removing the false-positive environmental noise interference caused by the aging and attenuation of the optical transceiver components themselves.
[0020] The following describes in detail each module in the above process and the effects that can be produced, with reference to the embodiments.
[0021] First, the physical extraction module described above, namely "configured in the first device, used to read the received power and temperature, calculate the temperature difference between the temperature and the preset reference temperature, subtract the product of the temperature difference and the preset coefficient from the received power to generate the effective power, extract the difference quotient and variance of the effective power and splice them to generate a perturbation vector", will be described in detail with reference to the embodiment.
[0022] The first device is specifically the underlying communication node hardware in the network, such as a network switch, core router, or more specifically, its internal physical port interface board. The physical extraction module is configured within this first device. Its logic is to isolate physical environmental noise from the complex hardware operating environment and accurately capture instantaneous changes in mechanical stress. Specifically, in actual powered operation, the optical transceiver module's received power fluctuations are not only affected by the physical morphology of the optical fiber medium but also by thermal drift (i.e., slow temperature background noise) caused by natural temperature changes in the equipment room environment. Therefore, this module reads the received power and temperature in real time, and uses the product of the temperature difference and a preset coefficient (such as a thermally induced emission power drift parameter) to accurately subtract the power deviation caused by thermal fluctuations, thereby purifying the effective power that truly reflects the physical stress state of the optical fiber. Subsequently, by extracting the difference quotient (quantifying the rate of power transients) and variance (quantifying the severity of alternating jitter caused by mechanical pulling) of this effective power in the time dimension and performing matrix splicing, a perturbation vector is generated. 
This module transforms otherwise invisible physical damage, such as mechanical pulling and excessive bending at the construction site, into machine-readable feature vectors. It also addresses a pain point of traditional network management systems, which rely solely on absolute optical power thresholds for simple alarms and therefore cannot distinguish slow fluctuations in ambient temperature from sudden, forceful construction activity, resulting in frequent false alarms, difficult troubleshooting, and seriously delayed routing adjustments.
[0023] As a feasible approach, the physical extraction module reads the received power and temperature, calculates the temperature difference between the temperature and the preset reference temperature, and subtracts the product of the temperature difference and the preset coefficient from the received power to generate the effective power. Specifically, the physical extraction module uses a field-programmable gate array (FPGA) to poll the digital diagnostic monitoring register of the optical transceiver module via an out-of-band bus at a fixed discrete sampling period. It reads the total energy of photons actually received on the photodetector target surface as the received power and the real-time temperature of the optical transceiver module as the temperature. It obtains the thermally induced emission power drift coefficient of the optical module as the preset coefficient; calculates the temperature difference between the temperature and the preset reference temperature; multiplies the temperature difference by the preset coefficient to obtain a product characterizing optical thermal drift; and subtracts the product from the received power to remove the background noise caused by changes in the room's ambient temperature, generating the denoised effective power.
[0024] Specifically, in practical communication network engineering, such as online cutovers in large data center server rooms or fiber optic cable installation in underground low-voltage networks, the pulling of physical cables by construction workers often occurs instantaneously. To capture this instantaneous mechanical stress and avoid the second-level latency and task scheduling jitter caused by the main control board's central processing unit polling via conventional network protocols, in this embodiment the physical extraction module uses a field-programmable gate array (FPGA) to poll the digital diagnostic monitoring register of the optical transceiver module via an out-of-band bus at a fixed discrete sampling period. The discrete sampling period is set between 1 millisecond and 10 milliseconds. Sampling directly from the underlying hardware in this way provides physical timing detection accuracy at the nanosecond-to-millisecond level and completely bypasses software-layer interference from the operating system.
[0025] During each polling cycle, the field-programmable gate array (FPGA) reads the total photon energy actually received on the photodetector target surface as the received power. This received power directly characterizes, at the physical level, whether energy is leaking from the optical fiber because micro-bending caused by physical forces violates the condition for total internal reflection. Based on the hardware sensitivity characteristics of the photodetector, its normal value range is typically between -30 dBm and 0 dBm. Simultaneously, the FPGA reads the real-time temperature feedback from the sensors inside the optical transceiver module as the temperature.
[0026] After acquiring the aforementioned basic hardware data, the system strips out environmental background noise. This is because in a real data center environment, the start / stop of precision air conditioning or changes in the power consumption of peripheral equipment cause natural fluctuations in ambient temperature. The core light-emitting / receiving devices inside the optical module, such as avalanche photodiodes, are extremely sensitive to temperature, and these thermal fluctuations inevitably cause a slow physical drift in received power. Without this distinction, the system could easily misinterpret temperature changes as physical pulling. Therefore, the system pre-configures a standard data center constant temperature as a preset reference temperature in the storage medium. This parameter is typically set to 25 degrees Celsius according to industry standards. Furthermore, the system obtains the thermoluminescence power drift coefficient of the corresponding optical module (such as a single-mode or multi-mode optical module) as a preset coefficient based on that optical module's factory hardware manual. This preset coefficient is typically calibrated between 0.01 and 0.05 dB per degree Celsius to accurately quantify the power attenuation caused by a unit temperature change.
[0027] In the specific stripping calculation logic, the system first calculates the temperature difference between the current temperature and the preset reference temperature to quantify the thermodynamic degree of the current hardware deviating from standard operating conditions. Then, the system multiplies the temperature difference by a preset coefficient to obtain a product characterizing optical thermal drift. This product is essentially the optical power deviation caused solely by thermal fluctuations in the data center environment. Finally, the system subtracts the product from the received power to strip away the background noise caused by changes in the data center ambient temperature, generating the denoised effective power.
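For illustration only, the stripping calculation described above can be sketched in Python as follows. The function name, parameter names, and the default drift coefficient of 0.02 are assumptions chosen from within the ranges stated in this description; they are not values fixed by the embodiment.

```python
def effective_power(received_power_dbm: float,
                    temperature_c: float,
                    reference_temperature_c: float = 25.0,
                    drift_coefficient: float = 0.02) -> float:
    """Return the received power with the estimated thermal drift removed.

    drift_coefficient is an assumed value within the 0.01 to 0.05 range
    mentioned in the description; a real value would come from the module's
    hardware manual.
    """
    # Deviation of the module temperature from the preset reference temperature.
    temperature_difference = temperature_c - reference_temperature_c
    # Product characterizing optical thermal drift.
    thermal_drift = temperature_difference * drift_coefficient
    # Subtract the drift from the raw reading to obtain the effective power.
    return received_power_dbm - thermal_drift


# Example: a module running 10 degrees Celsius above the reference temperature.
print(effective_power(-10.0, 35.0))
```

When the module temperature equals the reference temperature, the product is zero and the effective power equals the raw received power.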
[0028] As a feasible approach, the physical extraction module extracts the difference quotient and variance of the effective power and concatenates them to generate a perturbation vector. The specific operation is as follows: the physical extraction module obtains the effective power of the current time step and subtracts the effective power of the adjacent previous time step from it to obtain the power change difference; the power change difference is divided by the sampling period to obtain the effective power difference quotient; within the set sliding time window, the fluctuation variance of the effective power is calculated as the variance; the effective power difference quotient and variance are concatenated into a one-dimensional matrix, so that the frequency of the transient alternating force generated by the pulling of construction machinery is quantified as a perturbation vector.
[0029] After obtaining the effective power stripped of environmental temperature interference, the physical extraction module extracts the difference quotient and variance of the effective power and concatenates them to generate a perturbation vector. The core purpose of this operation is to accurately characterize the complex physical and mechanical actions at the construction site into structured data that can be analyzed by deep learning models. The specific explanation is as follows: In real-world scenarios involving the installation of conduits in low-voltage wells or the traction of optical cables, the physical damage to optical fibers caused by the violent pulling of cables by construction workers or the unstable operation of heavy traction machinery is not only reflected in the reduction of absolute luminous power, but more importantly, in the transient rate and intensity of power attenuation. Traditional network monitoring systems often only focus on whether the optical power drops below a certain fixed threshold; this static monitoring method completely fails to capture the stress changes during dynamic construction processes.
[0030] To compensate for this technical deficiency, in this embodiment, the physical extraction module first obtains the effective power of the current time step and subtracts the effective power of the adjacent previous time step from it, thereby obtaining the power change difference. The sign and magnitude of this power change difference directly reflect the deepening or release of the fiber bending within the current sampling interval. Subsequently, the system divides the power change difference by the sampling period fixed by the underlying hardware (such as the aforementioned setting of 1 millisecond to 10 milliseconds) to obtain the effective power difference quotient. This difference quotient physically represents the first derivative (i.e., rate of change) of optical signal attenuation, and it quantifies the transient rate of physical medium deterioration. For example, the difference quotient of optical attenuation caused by equipment aging is extremely small, while the difference quotient of micro-bending distortion caused by a construction worker's sudden pull shows a pronounced spike, based on which the system can accurately identify sudden external force damage.
[0031] Meanwhile, considering that actual physical construction is far from uniform motion, cables inside conduits often experience friction and jamming, hand tremors from construction workers, and periodic vibrations from machinery tracks. To quantify this physical vibration, the system calculates the variance of effective power fluctuations within a set sliding time window. The sliding time window is typically set between 50 and 200 milliseconds, chosen because this time span aligns with the alternating frequency period of macroscopic muscle exertion in humans and the mechanical vibrations of common construction machinery. The calculated variance is greater than or equal to zero; a larger value indicates more intense alternating stress and greater physical instability of the optical fiber within that short window.
[0032] After extracting the difference quotient representing the rate of degradation and the variance representing the degree of force fluctuation, the physical extraction module concatenates the difference quotient and variance of the effective power into a one-dimensional matrix. Through this data processing method of dimensionality reduction and reconstruction, the system quantifies the frequency of the transient alternating force generated by the pulling of construction machinery as a perturbation vector.
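As a minimal sketch of the extraction described above, the following Python class keeps the previous effective-power sample and a sliding window of recent samples. The class name, the 5 millisecond period, the 20-sample window (about 100 milliseconds), and the use of the population variance are assumptions; the description does not state which variance estimator is used or how the first sample is handled.

```python
from collections import deque
from statistics import pvariance


class PerturbationExtractor:
    """Builds [difference quotient, variance] from effective-power samples."""

    def __init__(self, sampling_period_s: float = 0.005, window_samples: int = 20):
        self.sampling_period_s = sampling_period_s
        # Sliding time window: only the most recent samples are retained.
        self.window = deque(maxlen=window_samples)
        self.previous = None

    def update(self, effective_power: float) -> list:
        # Difference quotient: change since the previous step per unit time.
        # Assumption: zero is reported for the very first sample.
        if self.previous is None:
            quotient = 0.0
        else:
            quotient = (effective_power - self.previous) / self.sampling_period_s
        self.previous = effective_power

        # Fluctuation variance of the effective power inside the window.
        self.window.append(effective_power)
        variance = pvariance(self.window)

        # Concatenate both features into a one-dimensional perturbation vector.
        return [quotient, variance]


extractor = PerturbationExtractor()
for sample in (-10.0, -10.0, -10.5, -12.0):
    print(extractor.update(sample))
```

A sudden pull shows up as a large negative difference quotient, while sustained shaking raises the variance even when consecutive differences partly cancel.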
[0033] The following describes in detail the state fusion module described above, which is "configured on the second device to acquire a perturbation vector, collect bit error rate, number of lost frames and queue depth, calculate the difference between effective power and preset noise floor and replace it with a truncation constant when it is non-positive, calculate an exponential term based on the replaced difference, multiply the exponential term by the difference quotient to generate an optical component, multiply the bit error rate by a preset weight to generate an electrical component, accumulate the optical component and the electrical component to generate a degradation index, and concatenate the perturbation vector, degradation index, bit error rate, number of lost frames and queue depth to generate a state tensor;"
[0034] In a specific embodiment of the present invention, the second device serves as the centralized control and decision-making hub in the network architecture, such as an SDN controller, a core network management server, or an integrated network operation and maintenance cloud platform. The state fusion module is configured within this second device, acting as a core data hub connecting the underlying physical sensing with the upper-layer AI decision-making.
[0035] This module performs the aforementioned cross-layer data collection and fusion because, in real-world construction damage scenarios, network degradation caused by physical damage such as fiber micro-bending propagates across layers and is concealed. Neither physical-layer optical attenuation alarms nor upper-layer packet loss statistics alone can accurately assess the true health of the link before a complete service interruption. Therefore, this module aggregates the underlying physical perturbation vector with the upper-layer protocol stack's state data (bit error rate, number of lost frames, queue depth), and quantifies the degradation avalanche effect that arises as the channel approaches its physical limits (i.e., the degradation index) by calculating the difference between the effective power and a preset noise floor and mapping it exponentially. Finally, these quantities are concatenated into a global state tensor for a Markov decision process. This addresses the serious lag of existing network routing, which passively triggers route switching only after the physical port of the link is completely disconnected or a large number of service packets have been lost. By structuring and tensorizing multi-dimensional, fragmented cross-layer information, the system can proactively identify concealed degraded links before substantial network congestion or paralysis occurs, laying the data foundation for millisecond-level proactive traffic avoidance.
[0036] As a feasible approach, the state fusion module acquires the perturbation vector, collects the bit error rate, number of lost frames, and queue depth, and calculates the difference between the effective power and the preset noise floor, replacing it with a truncation constant when the difference is non-positive. The specific operations are as follows: The state fusion module acquires the perturbation vector; collects the bit error rate increment of the forward error correction protocol as the bit error rate, collects the actual number of lost frames as the number of lost frames, collects the queue depth in the outgoing direction as the queue depth, and extracts the sensitivity noise floor limit value at which the avalanche photodiode is completely submerged by thermal noise and shot noise as the preset noise floor; subtracts the preset noise floor from the effective power to obtain the difference, and determines whether the difference is less than or equal to zero. When the difference is less than or equal to zero, a pre-set small constant is acquired as the truncation constant, and the difference is replaced with the truncation constant.
[0037] Specifically, in the dynamic adjustment of network communication cabling, monitoring at a single physical layer is insufficient to reflect the true damage to upper-layer services. Physical strain will inevitably generate a cascading destructive effect from the bottom up along the communication protocol stack. Based on this, in this embodiment, the state fusion module first obtains the perturbation vector output by the physical extraction module as a reference input to characterize the underlying pure mechanical stress.
[0038] Subsequently, the system performs cross-layer data acquisition: First, at the data link layer, the pre-correction bit error rate increment of the forward error correction protocol is collected as the bit error rate. This selection is based on the fact that modern optical communication commonly uses forward error correction (FEC) technology. When the link experiences slight physical deformation, the post-correction bit error rate often appears to be zero (appearing normal), but the pre-correction bit error rate can very sensitively reflect the hidden trend of deterioration in the underlying physical medium, thus providing the system with early warning time. Second, at the media access control layer, the system collects the microsecond-level actual number of lost frames as the frame loss count. This value, greater than or equal to zero, directly reflects the number of damaged packets that cannot be successfully verified and recovered by the MAC layer due to physical layer signal distortion. Finally, at the network layer, the system collects the outbound queue depth as the queue depth. Its value directly reflects the degree of internal logical congestion caused by the degradation of the underlying link quality and packet backlog and retransmission.
[0039] After acquiring the aforementioned multi-dimensional state parameters across layers, the system needs to further quantify the remaining margin before complete physical link failure. The system extracts the sensitivity noise floor limit of the avalanche photodiode (or PIN photodiode) inside the optical transceiver module, at which it is completely overwhelmed by thermal noise and shot noise, and uses this as the preset noise floor. The value of this parameter is entirely determined by the hardware physical characteristics of the specific optical transceiver module (for example, for a 10 Gigabit single-mode module, this limit is usually around -14.4 dBm). This preset noise floor physically marks the critical point of complete channel collapse; that is, when the received optical power is lower than this value, the optical signal is completely swallowed by noise, indicating that the physical medium has suffered substantial breakage or extreme attenuation.
[0040] Next, the system subtracts the preset noise floor from the aforementioned denoised effective power to obtain the difference. This difference represents the remaining safe optical power margin of the current link. The system determines in real time whether the difference is less than or equal to zero. In extremely harsh construction scenarios (such as an excavator instantly severing an underground optical cable, or the optical fiber being completely broken due to traction during conduit installation), the effective power may instantly drop below the noise floor limit, causing the difference to be zero or negative.
[0041] When the system determines that the difference is less than or equal to zero, directly substituting this non-positive value into the subsequent nonlinear calculation model can easily trigger a divide-by-zero or overflow error, causing the entire control center system to crash or to output an infinitely large invalid strategy. In order to represent this extreme physical condition in a mathematically safe way, the system obtains a pre-set small constant (e.g., a constant on the order of 10 to the power of -6 or 10 to the power of -8) as the truncation constant and replaces the non-positive difference with the truncation constant.
[0042] As a feasible approach, the state fusion module calculates the exponential term based on the difference after replacement, multiplies the exponential term by the difference quotient to generate the optical component, and multiplies the bit error rate by a preset weight to generate the electrical component. The specific operations are as follows: The state fusion module obtains a preset safety margin and divides the safety margin by the difference after replacement to obtain the division result; obtains the base of the natural logarithm and raises it to the power of the division result to obtain the exponential term; obtains the absolute value of the difference quotient and multiplies the exponential term by it to generate the optical component; obtains a preset electrical layer adaptation factor as the preset weight; and multiplies the bit error rate by the preset weight to generate the electrical component.
[0043] In communication network engineering practice, the degradation process of optical signals in optical fiber media exhibits nonlinear characteristics. As the received optical power gradually decreases and approaches the sensitivity noise floor limit of the optical transceiver module, the decrease in signal-to-noise ratio leads to an exponential increase in the packet loss rate of the communication link. To enable the system to accurately fit this physical attenuation law at the algorithm level, in this embodiment, the state fusion module first obtains a preset safety margin. This safety margin sets a warning buffer between the optical power attenuation and the noise floor limit, and its value is typically set to 2 to 5 dB depending on the engineering tolerance of the optical module.
[0044] Subsequently, the system divides the safety margin by the difference after substitution in the previous processing step to obtain the division result. Here, since the aforementioned steps have used a truncation constant to replace non-positive differences, even if the optical fiber breaks instantly due to violent tension (i.e., the difference approaches zero), the division result will converge within a preset maximum effective value range and will not trigger a system operation overflow error when dividing by zero.
[0045] Next, the system obtains the base of the natural logarithm, uses it as the base, and uses the result of the division as the power to perform an exponential operation to obtain the exponential term. This exponential operation constitutes a nonlinear mapping mechanism of the physical state: when the physical link is in the normal operating range (with a large difference), the result of the division calculation approaches zero, and the value of the exponential term is stable and approaches one; when external construction stress causes the optical power to approach the noise floor limit (with the difference approaching zero), the value of the exponential term increases exponentially, objectively representing the critical state where the physical link is about to disconnect.
[0046] After calculating the exponential term, the system obtains the absolute value of the difference quotient output by the physical extraction module. This absolute value of the difference quotient represents the transient rate of optical power drop per unit time, directly reflecting the severity of the applied external mechanical stress. The system multiplies the exponential term by the absolute value of the difference quotient to generate the optical component. The physical significance of this multiplication operation lies in constructing a dynamic degradation weight assessment model: when the link is already in a high-risk critical state (the exponential term value is extremely large), even if the transient optical decay rate is low at this time (the absolute value of the difference quotient is small), the system will output a significantly amplified optical component value; conversely, if the link is in a healthy state, the value of the optical component is mainly determined by the current optical decay rate.
[0047] After completing the above quantization, the system further incorporates the protocol state of the data link layer. The system obtains a pre-set electrical layer adaptation factor as a preset weight. The value of this preset weight is typically greater than zero and less than one, used to adjust the system's sensitivity to logical layer data verification errors under different network service scenarios. Finally, the system multiplies the pre-correction bit error rate increment (i.e., bit error rate) of the forward error correction protocol collected earlier by the preset weight to generate an electrical component characterizing the degree of logical impairment at the protocol layer.
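The optical and electrical components described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the function names, the default safety margin of 3.0, the truncation constant of 1e-6, and the weight of 0.5 are chosen from within the ranges given in this description. The cap `max_ratio` is a hypothetical stand-in for the "preset maximum effective value range" mentioned above, whose value the description does not give; without some cap, Python's `math.exp` raises `OverflowError` for arguments this large.

```python
import math


def optical_component(effective_power: float,
                      noise_floor: float,
                      difference_quotient: float,
                      safety_margin: float = 3.0,
                      truncation_constant: float = 1e-6,
                      max_ratio: float = 50.0) -> float:
    """Exponential term times the absolute difference quotient."""
    # Remaining optical power margin above the preset noise floor.
    difference = effective_power - noise_floor
    # Replace a non-positive difference with the truncation constant.
    if difference <= 0:
        difference = truncation_constant
    # Division result, capped (assumption) so that exp() stays finite.
    ratio = min(safety_margin / difference, max_ratio)
    exponential_term = math.exp(ratio)
    return exponential_term * abs(difference_quotient)


def electrical_component(bit_error_rate: float, preset_weight: float = 0.5) -> float:
    """Pre-correction bit error rate increment scaled by the preset weight."""
    return bit_error_rate * preset_weight


# Healthy link versus a link that has dropped below the noise floor.
print(optical_component(-5.0, -14.4, -2.0))
print(optical_component(-20.0, -14.4, -2.0))
print(electrical_component(1e-5))
```

With a large margin the exponential term stays close to one, so the optical component tracks the decay rate; as the margin shrinks, the same decay rate is amplified sharply.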
[0048] As a feasible approach, the state fusion module accumulates the optical and electrical components to generate a degradation index, and concatenates the perturbation vector, degradation index, bit error rate, number of lost frames, and queue depth to generate a state tensor. The specific operations are as follows: The state fusion module obtains a pre-set sliding time window. Within the sliding time window, the optical and electrical components are added, and the sum is discretely integrated and accumulated to generate a discrete cross-layer degradation correlation index, which is then used as the degradation index. The perturbation vector, degradation index, bit error rate, number of lost frames, and queue depth are extracted, normalized, and scaled. While preserving causal characteristics, the normalized and scaled data are merged into a global state data set for a Markov decision process, and this global state data set is then converted into a state tensor.
[0049] Specifically, in actual construction interference of communication networks, a single, extremely brief transient optical attenuation or sporadic bit errors may be caused by occasional internal clock jitter in the system. If this frequently triggers routing switches, it will cause severe oscillations in the network control plane. In order to accurately filter high-frequency transient noise and objectively assess the cumulative destructive force of persistent physical damage, in this embodiment, the state fusion module first obtains a pre-set sliding time window. The length of this sliding time window is usually set between 100 milliseconds and 500 milliseconds, and its value is based on the fact that this time span can cover the complete round-trip cycle of data packet transmission and effectively smooth out occasional jitter from external mechanical stress.
[0050] Within the sliding time window, the system adds the optical component (representing the degree to which the physical layer approaches its limit) and the electrical component (representing the degree of damage to the link layer's logic verification) obtained from the previous processing. Then, the system performs discrete integration on the sum. The physical meaning of this discrete integration is to calculate the area under the channel degradation curve within the sliding time window, thereby converting the transient attenuation rate into the total destructive energy caused by the accumulation of physical stress over a period of time. The system then generates a discrete cross-layer degradation correlation index, which is used as a degradation index to comprehensively characterize the current link health trend.
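One way to read the discrete integration above is as a rectangle-rule sum over the samples held in the sliding window. The Python sketch below follows that reading; the class name, the 5 millisecond sampling period, the 40-sample window (about 200 milliseconds), and the rectangle rule itself are assumptions, since the description does not name a specific integration scheme.

```python
from collections import deque


class DegradationIndex:
    """Accumulates optical + electrical components over a sliding window."""

    def __init__(self, sampling_period_s: float = 0.005, window_samples: int = 40):
        self.sampling_period_s = sampling_period_s
        # Only the most recent window_samples sums are kept.
        self.sums = deque(maxlen=window_samples)

    def update(self, optical: float, electrical: float) -> float:
        # Add the optical and electrical components for this time step.
        self.sums.append(optical + electrical)
        # Rectangle-rule integration: area under the degradation curve
        # across the samples currently inside the window.
        return sum(self.sums) * self.sampling_period_s


index = DegradationIndex()
for optical, electrical in ((0.1, 0.0), (2.5, 0.01), (3.0, 0.02)):
    print(index.update(optical, electrical))
```

Because old samples fall out of the window, a single brief spike raises the index only temporarily, while sustained degradation keeps it high.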
[0051] After completing the in-depth extraction of a single indicator, the system needs to construct a global observation field for the subsequent reinforcement learning agent. The system sequentially extracts the perturbation vector representing the underlying pure mechanical stress, the degradation index representing the cross-layer comprehensive degradation trend, the bit error rate representing the data link layer verification status, the number of lost frames representing the media access control layer message loss status, and the queue depth representing the network layer congestion level.
[0052] Because the five types of parameters mentioned above differ significantly in their physical dimensions and numerical distributions (for example, optical power-derived parameters are floating-point decimals, while the number of lost frames and queue depth are large integers), directly inputting them into a neural network would lead to an imbalance in the model's gradient updates. Therefore, the system performs normalization and scaling on the extracted perturbation vector, degradation index, bit error rate, number of lost frames, and queue depth. Specifically, the system uses preset baseline maximum and minimum values to linearly or nonlinearly map the data with various dimensions to a dimensionless standard range of zero to one or negative one to positive one.
[0053] During the above splicing and normalization process, the system strictly adheres to the principle of preserving causal characteristics during the reorganization. Specifically, the causal characteristic preservation mechanism involves the system sequentially arranging the data after normalization in a one-dimensional space according to the fault propagation order from the bottom up of the network protocol stack (from the physical layer, data link layer to the network layer). Finally, the system merges the permuted and combined data into a global state data set for the Markov decision process, and then converts this global state data set into a state tensor.
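The normalization and ordering steps above can be sketched in Python as follows, using linear min-max scaling to the range zero to one. All bounds in `BOUNDS` are hypothetical placeholders for the "preset baseline maximum and minimum values", and a plain list stands in for the state tensor; a real system would presumably convert it to the tensor type of its learning framework.

```python
# Hypothetical baseline (minimum, maximum) pairs for each feature.
BOUNDS = {
    "difference_quotient": (-1000.0, 1000.0),
    "variance": (0.0, 10.0),
    "degradation_index": (0.0, 100.0),
    "bit_error_rate": (0.0, 1e-2),
    "lost_frames": (0.0, 10000.0),
    "queue_depth": (0.0, 65536.0),
}


def min_max(value: float, low: float, high: float) -> float:
    """Linearly map value into [0, 1], clipping values outside the bounds."""
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


def build_state(perturbation: list, degradation_index: float,
                bit_error_rate: float, lost_frames: int, queue_depth: int) -> list:
    """Arrange normalized features bottom-up along the protocol stack."""
    quotient, variance = perturbation
    ordered = [
        ("difference_quotient", quotient),          # physical layer
        ("variance", variance),                     # physical layer
        ("degradation_index", degradation_index),   # cross-layer trend
        ("bit_error_rate", bit_error_rate),         # data link layer
        ("lost_frames", lost_frames),               # media access control layer
        ("queue_depth", queue_depth),               # network layer
    ]
    return [min_max(value, *BOUNDS[name]) for name, value in ordered]


print(build_state([-250.0, 1.2], 12.0, 2e-4, 37, 4096))
```

Keeping the order fixed means each position in the vector always carries the same layer's signal, which is what lets the policy network learn the bottom-up propagation pattern.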
[0054] The following describes in detail the backoff inference module described above, which is "configured on the second device, used to input the state tensor into the preset policy network, generate a historical penalty term based on the number of lost frames, generate a look-ahead penalty term based on the difference between the degradation index and the preset threshold, accumulate the historical penalty term and the look-ahead penalty term, and output the routing increment at the output layer of the preset policy network".
[0055] In a specific embodiment of this invention, the backoff inference module is configured in the second device, serving as the intelligent decision-making brain of the entire system. It is responsible for converting the multi-dimensional link states perceived by the front end into specific routing control actions. This module introduces a preset policy network and constructs a penalty mechanism with both historical and look-ahead aspects because traditional networks, when faced with construction damage, often only passively trigger route switching after the link is completely interrupted (i.e., the physical layer port is completely disconnected). This easily leads to a large number of irreversible packet losses for services, and may even cause routing oscillations across the entire network. Therefore, this module combines the number of lost frames, which reflects the substantial damage already done to services, with the degradation index, which reflects the risk of a complete link failure in the future. Through joint game-theoretic inference using reinforcement learning, it outputs a dynamic routing intervention quantity that can be executed by the underlying protocol. This translates the complex underlying physical degradation alarm into a routing cost increment that traditional network routing protocols can directly identify and execute. This addresses the sluggish response of existing technologies, which do not switch routes until the network is completely cut off. By smoothly increasing the routing cost of dangerous links during the hidden deterioration window before the excavator completely cuts the optical cable, the system can guide data packets around the construction-damaged area in advance and without interruption.
[0056] As a feasible approach, the backoff inference module inputs the state tensor into a preset policy network, generates the historical penalty term based on the number of lost frames, and generates the look-ahead penalty term based on the difference between the degradation index and a preset threshold. The specific operations are as follows: input the state tensor into the preset policy network; obtain a pre-set service level agreement (SLA) breach penalty factor; multiply the number of lost frames by the SLA breach penalty factor to obtain a breach penalty value; use the breach penalty value as the historical penalty term; subtract the preset threshold from the degradation index to obtain a calculated difference; use the calculated difference as the difference; determine whether the difference is greater than zero; if the difference is less than or equal to zero, forcibly replace the difference with zero; if the difference is greater than zero, retain the difference to obtain a non-negative overflow difference; obtain a pre-set potential physical degradation risk penalty factor; multiply the non-negative overflow difference by the potential physical degradation risk penalty factor to obtain a degradation prevention value; use the degradation prevention value as the look-ahead penalty term.
[0057] Specifically, in the dynamic adjustment of network communication construction cabling, in order to transform the underlying physical degradation and upper-layer protocol state into dynamic routing intervention instructions, the backoff inference module in this embodiment adopts a deep reinforcement learning architecture. The preset policy network is the actor network branch of a conventional deep deterministic policy gradient network in this field. It mainly consists of an input layer corresponding to the input feature dimension, a fully connected hidden layer responsible for nonlinear feature extraction, and a continuous action output layer, serving as the algorithm carrier for processing multi-dimensional cross-layer data.
[0058] Based on the aforementioned network architecture, the backoff inference module first inputs a state tensor containing both physical and logical perception information into the preset policy network, using this as the global environmental baseline for the network model's decision-making inference. Regarding the constraint mechanism for the inference, the system abandons traditional static rules and instead constructs a dynamic penalty mechanism with dual perspectives.
[0059] For any substantial damage already occurring at the business logic layer, the system retrieves a pre-set service level agreement (SLA) breach penalty factor. This SLA breach penalty factor is a dimensionless weighting parameter whose value is determined by quantifying the breach cost of actual lost packets to core business operations. The system multiplies the number of lost frames detected at the lower layer by the SLA breach penalty factor to obtain a breach penalty value; subsequently, the system uses this breach penalty value as the historical penalty term. The engineering significance of this historical penalty term lies in providing after-the-fact feedback and correction for current communication congestion, and in forcing the preset policy network to prioritize intervention on degraded links that are experiencing large-scale packet loss.
[0060] Meanwhile, to enable the system to proactively avoid physical damage, a second layer of penalty is constructed based on the degradation trend of the physical layer. The system subtracts a preset threshold from the degradation index generated by the previous processing to obtain a calculated difference, which is then used as the difference. The preset threshold represents the highest level of physical degradation of the optical medium that the system can tolerate. Subsequently, the system executes nonlinear physical perturbation shielding logic: it checks whether the difference is greater than zero. When the difference is less than or equal to zero, the optical degradation of the current link has not yet exceeded the safety threshold (e.g., only normal thermal drift in the equipment room). In this case, the system forcibly replaces the difference with zero to shield normal environmental noise and prevent unwarranted oscillations in the control plane. When the difference is greater than zero, external construction tension has caused severe micro-bending of the optical fiber and the safety threshold has been crossed. In this case, the system retains the difference and obtains a non-negative overflow difference.
[0061] After determining the degree of degradation exceeding the red line, the system obtains a pre-set potential physical degradation risk penalty factor. To emphasize the importance of proactive prevention, the potential physical degradation risk penalty factor is typically set to be significantly higher than the SLA breach penalty factor. The system multiplies the non-negative overflow differential value by the potential physical degradation risk penalty factor to obtain a degradation prevention value, which is then used as the forward penalty item.
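The two penalty items described in paragraphs [0059] to [0061] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names and all numeric factors are hypothetical and chosen only to show the thresholding behavior.

```python
def historical_penalty(lost_frames: int, sla_breach_factor: float) -> float:
    """Breach penalty value: lost frames weighted by the SLA breach penalty factor."""
    return lost_frames * sla_breach_factor


def forward_penalty(degradation_index: float, threshold: float,
                    risk_factor: float) -> float:
    """Degradation prevention value: only the part above the threshold counts."""
    # Differential values at or below zero are replaced with zero, which
    # shields normal environmental noise such as thermal drift.
    overflow = max(0.0, degradation_index - threshold)
    return overflow * risk_factor


# Hypothetical values for illustration only.
print(historical_penalty(lost_frames=12, sla_breach_factor=0.5))  # 6.0
print(forward_penalty(0.8, threshold=1.0, risk_factor=10.0))      # 0.0
print(forward_penalty(1.5, threshold=1.0, risk_factor=10.0))      # 5.0
```

In this sketch the risk factor (10.0) is set well above the breach factor (0.5), reflecting the statement that proactive prevention is weighted more heavily than post-event correction.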
[0062] As a feasible approach, the backoff inference module accumulates the historical and prospective (forward) penalty terms and rounds and outputs the route increment at the output layer of the preset policy network. The specific operations are as follows: obtain the continuous action instructions generated by the preset policy network through inference on the state tensor; square the continuous action instructions to obtain the squared action value; obtain the pre-set control plane oscillation penalty factor; multiply the squared action value by the control plane oscillation penalty factor to generate a stability penalty term; extract the historical penalty term, the prospective penalty term, and the stability penalty term after sign inversion; add the sign-inverted historical penalty term, prospective penalty term, and stability penalty term to complete the accumulation of the historical and prospective penalty terms, generating a multi-objective game constraint reward function; perform iterative constraint updates on the preset policy network based on the multi-objective game constraint reward function; extract the continuous action instructions output by the preset policy network at the output layer; round the continuous action instructions down to obtain an integer conforming to the network protocol metric format; and output the integer conforming to the network protocol metric format as the route increment.
[0063] Specifically, in practical engineering applications where artificial intelligence takes over the dynamic adjustment of network routing, control plane oscillations caused by algorithmic probing must be absolutely avoided. Therefore, in this embodiment, after establishing dual penalties for service disruption and physical risks, the system introduces a third constraint on the agent's own adjustment actions.
[0064] The system first obtains continuous action commands generated by the preset strategy network based on the state tensor. These continuous action commands essentially represent the magnitude of the repulsive force the system attempts to exert on the damaged physical medium. To severely punish drastic, abrupt route adjustment attempts by the algorithm, the system squares the continuous action commands to obtain the squared action value. The physical meaning of this squared operation is similar to the drag model in fluid mechanics: the destructive force of the resulting oscillations grows nonlinearly (here, quadratically) with the amplitude of the action, so larger actions require more stringent suppression.
[0065] Subsequently, the system obtains a pre-set control plane oscillation penalty factor. The control plane oscillation penalty factor is a dimensionless weighting parameter, and its value range can be manually calibrated based on the network topology size and the computing power limit of the core router. The system multiplies the squared action value by the control plane oscillation penalty factor to generate a stability penalty term used to penalize ineffective or violent attempts.
[0066] After collecting the three penalty dimensions, the system must integrate them into a unified evaluation standard. Since the underlying optimization logic of reinforcement learning aims to maximize the reward function, the system extracts the historical penalty term (after negation), the prospective penalty term (after negation), and the stability penalty term (after negation). Then, the system adds these three terms together. This addition operation, which transforms each penalty into a negative reward, completes the accumulation of the historical and prospective penalty terms and generates a multi-objective game-constrained reward function.
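The accumulation described in paragraphs [0064] to [0066] can be written as a small function. This is a hedged sketch: the function name, argument names, and numeric values are hypothetical, and a scalar action is assumed for simplicity.

```python
def reward(historical: float, forward: float, action: float,
           oscillation_factor: float) -> float:
    """Sum of the three penalty terms, each negated into a negative reward."""
    # Stability penalty: squared action weighted by the oscillation factor.
    stability = oscillation_factor * action ** 2
    return -(historical + forward + stability)


# Hypothetical values for illustration only.
print(reward(6.0, 5.0, action=2.0, oscillation_factor=0.25))   # -12.0
print(reward(6.0, 5.0, action=-2.0, oscillation_factor=0.25))  # -12.0
```

Because the action is squared, adjustments of equal magnitude are penalized equally regardless of sign, and doubling the magnitude quadruples the stability penalty.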
[0067] Based on the multi-objective game constraint reward function, the system performs iterative constraint updates on the preset strategy network. In this iterative optimization process, the system forces the preset strategy network to seek the optimal balance (i.e., Pareto optimality) among the three objectives of "solving current service packet loss", "preventing complete disconnection in the next second", and "avoiding excessive route modification that leads to control plane collapse".
[0068] After thorough iterative simulation, the system extracts the continuous action instructions ultimately output by the preset policy network at the output layer. Since traditional low-level network routing protocols (such as Open Shortest Path First or Intermediate System to Intermediate System protocols) can only recognize integer link metrics (e.g., integers from zero to 65,535), they cannot directly execute floating-point instructions output by artificial intelligence. Therefore, the system rounds down the continuous action instructions to obtain integers conforming to the network protocol metric format. Finally, the system outputs these integers as routing increments, serving as a dynamic penalty added to the original damaged link's overhead.
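The rounding step can be sketched as below. The text only specifies rounding down; clamping the result to the range from zero to 65,535 is an added assumption in this sketch, as are the function and constant names.

```python
import math

MAX_METRIC = 65535  # upper bound of the integer metric range quoted in the text


def to_route_increment(action: float) -> int:
    """Round a continuous action down to an integer route increment."""
    increment = math.floor(action)
    # Assumption: keep the increment inside the quoted metric range.
    return max(0, min(increment, MAX_METRIC))


print(to_route_increment(49.9))  # 49
print(to_route_increment(-3.2))  # 0
print(to_route_increment(1e6))   # 65535
```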
[0069] The following describes in detail the closed-loop execution module described above, which is "configured on the first device, used to add the route increment to the preset baseline cost to generate a cost value, write the cost value to the routing library of the first device and trigger the forwarding table refresh, monitor the subsequent changes in effective power and queue depth, generate a reward signal when the effective power stops decaying and the queue depth is emptied, and use the reward signal to update the preset policy network".
[0070] In a specific embodiment of the present invention, a closed-loop execution module is configured in the first device. This module performs the closed-loop operations of cost overwriting and continuous monitoring because most existing intelligent routing control systems are open-loop architectures: after the control plane issues a routing adjustment command, there is no secondary verification of the subsequent physical state of the damaged link. If the earlier link degradation was merely occasional component aging and attenuation or transient thermal noise of the optical transceiver module itself (i.e., a false positive alarm), rather than actual external construction damage, an open-loop system will blindly adjust the routing cost, leading to a serious waste of network resources and ineffective oscillation of the control plane. Therefore, this module, at the physical execution level, not only completes seamless flexible diversion of data traffic through incremental overlay and forwarding table refresh, but also constructs a reverse verification mechanism of "action execution - effect verification - model update". By verifying whether the optical power of the link stops attenuating and whether the congestion queue is emptied after the traffic is diverted, the system can confirm that the previous network warning was indeed caused by real external construction physical stress. This closes the logical loophole of misjudging spurious underlying hardware noise as a risk of link breakage, and addresses the technical pain point in existing intelligent network technologies where AI algorithms are easily deceived by occasional noise and thus learn incorrectly.
[0071] As an implementable approach, the closed-loop execution module adds the route increment to the preset baseline cost to generate a cost value, and then overwrites the cost value to the routing database of the first device and triggers a forwarding table refresh. The specific operations are as follows: Construct an asynchronous lock-free queue; obtain the route increment through the asynchronous lock-free queue; obtain a preset network link baseline metric value and use it as the preset baseline cost; add the route increment to the preset baseline cost to obtain a dynamic link cost value and use it as the cost value; extract the routing database of the first device; write the cost value into the routing database, completing a seamless overwrite operation of the routing database, thus overwriting the cost value to the routing database of the first device; synchronize the underlying hardware forwarding information database of the first device based on the routing database after the seamless overwrite operation; after completing the synchronization of the underlying hardware forwarding information database of the first device, perform a refresh action on the forwarding table, using this refresh action as the trigger for the forwarding table refresh.
[0072] Specifically, in real communication network equipment (such as core routers or optical transmission nodes), the line-rate forwarding of lower-layer data packets and the calculation of upper-layer routing protocols are handled independently by the data plane and the control plane, respectively. To ensure that the dynamic instructions frequently issued by the artificial intelligence algorithm do not block the normal scheduling of the router's central processing unit, in this embodiment, the system first constructs an asynchronous lock-free queue in the operating system's kernel mode, and then obtains the routing increment calculated and output by the intelligent control center (i.e., the backoff inference module) through this asynchronous lock-free queue. The asynchronous lock-free queue provides high-concurrency throughput and non-blocking reading of instructions, eliminating the risk of system deadlock caused by high-frequency routing intervention.
[0073] Upon receiving an intervention command, the system acquires a pre-set network link baseline metric. This baseline metric is typically an inherent cost statically determined by the network device's physical bandwidth (e.g., the default Open Shortest Path First (OSPF) cost for a 10 Gigabit Ethernet link). The system uses this baseline metric as the preset baseline cost. Subsequently, the system adds the routing increment to the preset baseline cost to obtain a dynamic link cost value, which is then used as the cost value. The physical meaning of this addition operation is that, without changing the network's inherent physical topology, artificially increasing the logical cost of a specific damaged link is equivalent to gradually closing the valve in a fluid dynamics model, thereby generating a flexible routing repulsion force.
[0074] After completing the above logical calculations, the system enters the configuration distribution phase of the underlying hardware. The system extracts the routing library of the first device (i.e., the routing information base at the software level) and writes the cost value into the routing library, completing the seamless overwrite operation of the routing library, thereby overwriting the cost value into the routing library of the first device. To ensure that the original normal communication data flow is not interrupted during the overwrite process, the system synchronizes the underlying hardware forwarding information library of the first device based on the routing library after the seamless overwrite operation, using smooth synchronization mechanisms such as non-stop routing. This underlying hardware forwarding information library usually resides in the cache of an application-specific integrated circuit or network processor, directly guiding the physical forwarding of photoelectric signals.
[0075] Finally, after synchronizing the underlying hardware forwarding information database of the first device, the system performs a refresh operation on the forwarding table in the data plane, and this refresh operation constitutes the triggering of the forwarding table refresh. At this point, the routing cost of the damaged link is broadcast across the entire network. Upstream devices, when looking up and forwarding data, will automatically detect the increased resistance of the link and, according to the new optimal path, smoothly and proportionally redirect the IP packets flowing towards the construction-damaged area to a safe bypass network.
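The sequence in paragraphs [0072] to [0075] (non-blocking read of the increment, cost calculation, routing library overwrite, forwarding library synchronization) can be illustrated with a user-space simulation. Everything here is a stand-in: a `collections.deque` replaces the kernel-mode lock-free queue, plain dictionaries replace the routing and forwarding libraries, and the link name and baseline cost are hypothetical.

```python
from collections import deque

BASELINE_COST = 10  # hypothetical preset baseline cost of the link

increments = deque()             # stand-in for the asynchronous lock-free queue
rib = {"link-A": BASELINE_COST}  # stand-in for the software routing library
fib = dict(rib)                  # stand-in for the hardware forwarding library


def apply_next_increment(link: str) -> bool:
    """Consume one route increment and refresh the forwarding state.

    Returns False immediately, without blocking, when no increment is waiting.
    """
    try:
        increment = increments.popleft()
    except IndexError:
        return False
    rib[link] = BASELINE_COST + increment  # cost value = baseline + increment
    fib[link] = rib[link]                  # synchronize forwarding library
    return True


increments.append(50)  # the inference side enqueues a route increment
apply_next_increment("link-A")
print(fib["link-A"])  # 60
```

Note that the increment is added to the static baseline each time rather than to the previous cost value, which matches the description of the baseline as an inherent, statically determined cost.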
[0076] As a feasible approach, the closed-loop execution module monitors subsequent changes in effective power and queue depth, generates a reward signal when the effective power stops decaying and the queue depth is emptied, and updates the preset policy network using this reward signal. The specific operations are as follows: after triggering a forwarding table refresh, proceed to the next discrete time step; within the next discrete time step, continuously acquire the effective power and queue depth, using this continuous acquisition process as the monitoring of subsequent changes in effective power and queue depth; determine whether the effective power remains stable or shows an increasing trend; when it is determined that the effective power remains stable or shows an increasing trend, confirm that the effective power has stopped decaying; determine whether the value of the queue depth is equal to zero; when the value of the queue depth is equal to zero, confirm that the queue depth is emptied; when it is confirmed that the effective power has stopped decaying and the queue depth is emptied, confirm that there is external construction stress on the physical link and generate a positive excitation signal indicating that the physical disturbance has been successfully verified; use the positive excitation signal as the reward signal, completing the operation of generating a reward signal when the effective power has stopped decaying and the queue depth is emptied; extract the policy gradient parameters inside the preset policy network; transmit the reward signal to the preset policy network and perform backpropagation on the policy gradient parameters using the reward signal; and use the process of performing backpropagation as the updating of the preset policy network using the reward signal.
[0077] Specifically, closed-loop feedback is the core mechanism to ensure that the control algorithm converges to the true physical laws. To verify whether the routing intervention commands output in the preceding steps truly resolved the underlying physical congestion crisis, in this embodiment, after triggering a forwarding table refresh, the system enters the next discrete-time step. The next discrete-time step represents a new observation period after the underlying network hardware has fully executed the traffic diversion commands.
[0078] In the next discrete time step, the system continuously acquires effective power and queue depth, using this acquisition process to monitor changes in subsequent effective power and queue depth. Subsequently, the system performs a logical judgment on the underlying physical state: determining whether the effective power remains stable or shows an increasing trend. When the effective power is determined to remain stable or show an increasing trend, it confirms that the effective power has stopped decaying. The physical basis for this judgment is that if the decrease in optical power in the previous time step was caused by irreversible aging or continuous thermal decay of the internal components of the optical transceiver module, then regardless of how the upper-layer network data traffic is scheduled, the physical received power on the target surface of the optical transceiver module will inevitably continue to fall irreversibly. Conversely, if the effective power stops deteriorating, it indirectly confirms that the previous degradation was most likely caused by mechanical micro-bending due to transient pulling of the optical cable by external construction personnel, and that the stress state has stabilized.
[0079] Simultaneously, the system performs secondary cross-validation based on the network logic layer status. The system checks if the queue depth is equal to zero; if it is, the queue depth is confirmed to be empty. Empty queue depth directly proves that the previously issued dynamic routing costs have successfully taken effect, and the data packets that were originally congested on the damaged physical link port have been completely diverted to a safe bypass topology.
[0080] Combining the above two layers of logical judgment, when it is confirmed that the effective power has stopped decaying and the queue depth is empty, the system is certain that there is external construction stress on the physical link (completely eliminating the possibility of false alarms from internal noise), and confirms that the preceding actions were executed successfully. Based on the above confirmation results, the system generates a positive excitation signal characterizing the successful confirmation of the physical disturbance; the system uses the positive excitation signal as a reward signal, thus strictly completing the operation of generating a reward signal when the effective power has stopped decaying and the queue depth is empty.
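The two-layer judgment in paragraphs [0078] to [0080] reduces to a conjunction of two checks. The sketch below assumes a positive excitation signal of 1.0 and returns 0.0 otherwise; both values, the function name, and the optional tolerance parameter are assumptions, since the text does not specify them.

```python
def reward_signal(power_before: float, power_after: float,
                  queue_depth: int, tolerance: float = 0.0) -> float:
    """Return the positive excitation signal only when both checks pass."""
    # Check 1: effective power is stable or increasing (has stopped decaying).
    power_stopped_decaying = power_after >= power_before - tolerance
    # Check 2: the queue on the affected port has been emptied.
    queue_emptied = queue_depth == 0
    return 1.0 if power_stopped_decaying and queue_emptied else 0.0


# Hypothetical effective power readings in dBm.
print(reward_signal(-9.0, -9.0, queue_depth=0))  # 1.0
print(reward_signal(-9.0, -9.5, queue_depth=0))  # 0.0
print(reward_signal(-9.0, -8.8, queue_depth=3))  # 0.0
```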
[0081] After receiving definite positive physical feedback, the system enters the iterative evolution phase of the agent model. The system extracts the policy gradient parameters from the preset policy network; the system transmits the reward signal to the preset policy network, uses the reward signal to perform backpropagation on the policy gradient parameters, and uses the backpropagation process as an update to the preset policy network using the reward signal.
[0082] To further illustrate the technical effects achievable by the proposed solution, a specific implementation method is given below, along with simulation verification comparison results between this method and existing technologies.
[0083] In this embodiment, the control system of the present invention is applied to a 100G backbone network scenario for interconnecting city-level data centers. The network topology includes five core routing nodes and multiple directly connected single-mode fiber optic links, with an average link length of approximately twenty kilometers. The network control plane is implemented based on a software-defined network backbone controller, and the underlying devices employ high-speed optical transceiver modules with digital diagnostic and monitoring functions. The discrete sampling period for system data flow and strategy deduction is set to five milliseconds.
[0084] The implementation steps are as follows: First, the physical extraction module polls the underlying hardware at high frequency via an out-of-band bus. After removing the product of the thermoluminescence power drift coefficient and the temperature difference relative to the 25°C data center reference temperature, it obtains the pure effective power and extracts the difference quotient and fluctuation variance characterizing the physical attenuation rate. Then, the state fusion module collects, across layers, the forward error correction bit error rate increment from the data link layer, the actual frame loss count from the media access control layer, and the queue depth from the network layer. After replacing the non-positive optical power differences at the underlying layer with a small truncation constant, it constructs an overflow-prevention degradation index based on the natural logarithm base and concatenates these quantities to generate a multi-dimensional state tensor. Finally, the backoff inference module inputs the state tensor into the preset policy network, generates historical penalty terms based on the service frame loss penalty factor, and generates forward penalty terms based on the overflow amount of the degradation index exceeding the safety red line. After multi-objective game constraints and rounding, it outputs the routing increment. The closed-loop execution module asynchronously overwrites this routing increment to the underlying hardware's forwarding information database, implementing flexible traffic diversion.
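The first two steps above (physical extraction and the degradation index) can be sketched numerically as follows. All function names, default values, and units (power in dBm, drift coefficient in dB per °C) are assumptions made for illustration; the formulas follow the operations recited in claims 2 to 6.

```python
import math
from statistics import pvariance


def effective_power(rx_power, temp_c, ref_temp_c=25.0, drift_coeff=0.01):
    """Received power minus the thermal drift term (temperature difference x coefficient)."""
    return rx_power - (temp_c - ref_temp_c) * drift_coeff


def perturbation_vector(powers, period_s):
    """[difference quotient, variance] over a window of effective power samples."""
    diff_quotient = (powers[-1] - powers[-2]) / period_s
    return [diff_quotient, pvariance(powers)]


def degradation_sample(p_eff, diff_quotient, ber, noise_floor,
                       margin, ber_weight, eps=0.01):
    """Optical plus electrical component for one time step."""
    diff = p_eff - noise_floor
    if diff <= 0:
        diff = eps  # truncation constant: avoids division by zero
    # eps must be large enough that margin / eps stays below about 709,
    # otherwise math.exp raises OverflowError.
    optical = math.exp(margin / diff) * abs(diff_quotient)
    electrical = ber * ber_weight
    return optical + electrical


def degradation_index(samples, period_s):
    """Discrete integration of the per-step samples over the sliding window."""
    return sum(samples) * period_s


# Hypothetical readings sampled every five milliseconds.
window = [effective_power(p, 30.0) for p in (-5.0, -5.0, -6.0)]
dq, var = perturbation_vector(window, period_s=0.005)
sample = degradation_sample(window[-1], dq, ber=1e-6, noise_floor=-15.0,
                            margin=3.0, ber_weight=1e3)
print(dq, var, degradation_index([sample], period_s=0.005))
```

The truncation constant keeps the index finite even when the effective power falls to or below the noise floor, which corresponds to the cut-fiber case.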
[0085] To verify the effectiveness of this method, an equivalent digital twin model was constructed in a network discrete event simulation platform, and comparative simulation tests were conducted in the time domain and at the service layer. The baseline scenario was set as follows: at the second second, the primary fiber optic link was subjected to continuous pulling by external excavating machinery (simulating the optical power dropping at a nonlinear rate from -5 dBm to the physical noise floor limit of -15 dBm).
[0086] Traditional comparison scenario: Under traditional fixed link overhead and a passive switching mechanism based on Bidirectional Forwarding Detection (BFD), the system is completely unaware of the physical degradation from the second to the fourth second. At the fourth second, the optical fiber completely breaks through the total reflection limit, resulting in a physical link break. At this point, the traditional routing protocol triggers passive reconvergence, leading to a routing black hole lasting up to fifty milliseconds, instantly dropping over 300,000 service packets. A large number of Transmission Control Protocol (TCP) connections trigger timeout retransmissions, causing a severe network congestion storm, with the overall network throughput plummeting by 40%, and the routing oscillation lasting for more than three seconds.
[0087] Application scenario of this application: After adopting the method of this invention, the system can keenly perceive the concealed deterioration of the physical medium at the second 1 second through the exponential leap of the underlying perturbation vector and the cross-layer degradation index. Based on the rapidly increasing look-ahead penalty term, the preset strategy network smoothly increases the routing cost of the primary link at a rate of fifty metric units every ten milliseconds without causing control plane oscillations. By the third 1.5 seconds, 95% of the core service traffic has been flexibly diverted to the backup link without any packet loss. When the optical fiber experiences a substantial break at the fourth 1 second, the system loses fewer than fifty unavoidable residual packets, and the service transitions smoothly without any perceptible impact. The overall network throughput remains above 99%, and the routing convergence and stabilization time is shortened to 0.2 seconds (a reduction of approximately 93%).
[0088] Furthermore, in extreme operating conditions (simulating the fiber optic cable being instantly cut at the fifth second, causing the difference to become instantaneously non-positive), this method, relying on the underlying truncation constant mechanism, successfully prevented the computational model from crashing due to division by zero or overflow. The system instantly outputs the maximum routing increment and completes causal closed-loop verification by relying on the queue emptying characteristics of subsequent time steps. No control plane downtime or deadlock occurred during the entire process. Statistical analysis of multiple Monte Carlo random simulations (including one hundred random construction disturbances with different stress frequencies and attenuation rates) shows that after adopting this scheme, the average lead time for concealed-degradation warning across the entire network reaches 1.5 to 2.2 seconds, and the service zero-packet-loss escape success rate under extreme operating conditions remains stable at over 98.5%. These simulation data support the robustness and industrial applicability of this invention in the field of cross-layer physical paralysis prevention.
[0089] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A network communication construction cabling dynamic adjustment system based on reinforcement learning, characterized in that it comprises a physical extraction module, a state fusion module, a backoff inference module, and a closed-loop execution module, wherein: The physical extraction module, configured in the first device, is used to read the received power and temperature, calculate the temperature difference between the temperature and the preset reference temperature, subtract the product of the temperature difference and the preset coefficient from the received power to generate the effective power, and extract the difference quotient and variance of the effective power and concatenate them to generate a perturbation vector; The state fusion module, configured in the second device, is used to acquire the perturbation vector, collect the bit error rate, the number of lost frames, and the queue depth, calculate the difference between the effective power and the preset noise floor and replace it with a truncation constant when the difference is non-positive, calculate an exponential term based on the replaced difference, multiply the exponential term by the difference quotient to generate an optical component, multiply the bit error rate by a preset weight to generate an electrical component, accumulate the optical component and the electrical component to generate a degradation index, and concatenate the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth to generate a state tensor;
The backoff inference module, configured in the second device, is used to input the state tensor into the preset policy network, generate a historical penalty item based on the number of lost frames, generate a look-ahead penalty item based on the difference between the degradation index and the preset threshold, accumulate the historical penalty item and the look-ahead penalty item, and output the routing increment by rounding at the output layer of the preset policy network. A closed-loop execution module, configured in the first device, is used to add the routing increment to the preset baseline cost to generate a cost value, overwrite the cost value to the routing library of the first device and trigger a forwarding table refresh, monitor the subsequent changes in the effective power and the queue depth, generate a reward signal when the effective power stops decaying and the queue depth is emptied, and use the reward signal to update the preset policy network.
2. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 1, characterized in that, The physical extraction module reads the received power and temperature, calculates the temperature difference between the temperature and the preset reference temperature, and subtracts the product of the temperature difference and a preset coefficient from the received power to generate the effective power. The specific operation is as follows: The physical extraction module uses a field-programmable gate array (FPGA) via an out-of-band bus to poll the digital diagnostic monitoring register of the optical transceiver module at a fixed discrete sampling period, reads the total energy of photons actually received on the target surface of the photodetector as the received power, and reads the real-time temperature of the optical transceiver module as the temperature; The thermoluminescent power drift coefficient of the optical module is obtained as the preset coefficient; The temperature difference between the temperature and the preset reference temperature is calculated; The temperature difference is multiplied by the preset coefficient to obtain the product characterizing the optical thermal drift; The product is subtracted from the received power to remove the background noise caused by changes in the ambient temperature of the equipment room, thereby generating the noise-reduced effective power.
3. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 2, characterized in that, The specific operation of the physical extraction module in extracting the difference quotient and variance of the effective power and concatenating them to generate a perturbation vector is as follows: The physical extraction module obtains the effective power at the current time step and subtracts the effective power of the adjacent previous time step from the effective power at the current time step to obtain the power change difference. Divide the power change difference by the sampling period to obtain the difference quotient of the effective power; Within a set sliding time window, the fluctuation variance of the effective power is calculated as the variance; The difference quotient of the effective power and the variance are concatenated into a one-dimensional matrix to quantize the frequency of the transient alternating force generated by the pulling of construction machinery into the perturbation vector.
4. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 3, characterized in that, The specific operations of the state fusion module to obtain the perturbation vector, collect the bit error rate, the number of lost frames, and the queue depth, calculate the difference between the effective power and the preset noise floor, and replace it with a truncation constant when the difference is non-positive are as follows: The state fusion module obtains the perturbation vector; The bit error rate increment of the forward error correction protocol is collected as the bit error rate, the actual number of lost frames is collected as the number of lost frames, the queue depth in the outgoing direction is collected as the queue depth, and the sensitivity noise floor limit value at which the avalanche photodiode is completely submerged by thermal noise and shot noise is extracted as the preset noise floor; The preset noise floor is subtracted from the effective power to obtain the difference; It is then determined whether the difference is less than or equal to zero; When the difference is less than or equal to zero, a preset small constant is obtained as the truncation constant, and the difference is replaced by the truncation constant.
5. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 4, characterized in that, The specific operation of the state fusion module to calculate the exponential term based on the replaced difference, multiply the exponential term by the difference quotient to generate the optical component, and multiply the bit error rate by a preset weight to generate the electrical component is as follows: The state fusion module obtains a preset safety margin, divides the safety margin by the replaced difference, and obtains the division calculation result; Obtain the base of the natural logarithm, use it as the base, and use the division calculation result as the exponent to obtain the exponential term; Obtain the absolute value of the difference quotient, multiply the exponential term by the absolute value of the difference quotient, and generate the optical component; Obtain a pre-set electrical layer adaptation factor as the preset weight; The electrical component is generated by multiplying the bit error rate by the preset weight.
6. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 5, characterized in that the specific operations by which the state fusion module accumulates the optical component and the electrical component to generate a degradation index, and concatenates the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth to generate a state tensor are as follows: the state fusion module obtains a preset sliding time window, adds the optical component and the electrical component within the sliding time window, and performs discrete integration on the sum to generate a discrete cross-layer degradation correlation index, which is used as the degradation index; the perturbation vector, the degradation index, the bit error rate, the number of lost frames, and the queue depth are extracted and normalized to a unified scale; and, while preserving causal properties, the normalized data are merged into a global state data set of the Markov decision process, and the global state data set is used as the state tensor.
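One way to picture the sliding-window integration and the state assembly of claim 6 is the sketch below. It assumes a fixed-length window of equally spaced samples, a rectangle-rule sum as the discrete integration, and normalization by per-feature scale values clipped to [-1, 1]; none of these specifics are stated in the claim, and the class and function names are hypothetical.

```python
from collections import deque


class DegradationIndex:
    """Discrete integral of (optical + electrical) over a sliding window."""

    def __init__(self, window_size, dt=1.0):
        self.samples = deque(maxlen=window_size)  # oldest sample drops out automatically
        self.dt = dt                              # assumed sampling interval

    def update(self, optical, electrical):
        self.samples.append(optical + electrical)
        return sum(self.samples) * self.dt        # rectangle-rule integration


def build_state(perturbation, degradation, ber, lost_frames, queue_depth, scales):
    """Concatenate the features in a fixed order and scale each into [-1, 1]."""
    raw = list(perturbation) + [degradation, ber, lost_frames, queue_depth]
    if len(raw) != len(scales):
        raise ValueError("one scale value is required per feature")
    return [max(-1.0, min(1.0, value / scale)) for value, scale in zip(raw, scales)]
```

Keeping the feature order fixed from one time step to the next is what lets a policy network interpret each position of the state consistently.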
7. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 6, characterized in that the specific operations by which the backoff inference module inputs the state tensor into a preset policy network, generates a historical penalty term based on the number of lost frames, and generates a look-ahead penalty term based on the difference between the degradation index and a preset threshold are as follows: the state tensor is input into the preset policy network; a preset service level agreement (SLA) breach penalty factor is obtained, and the number of lost frames is multiplied by the SLA breach penalty factor to obtain a breach penalty value; the breach penalty value is used as the historical penalty term; the difference between the degradation index and the preset threshold is calculated and used as the difference value; it is determined whether the difference value is greater than zero; when the difference value is less than or equal to zero, the difference value is replaced with zero, and when the difference value is greater than zero, the difference value is retained, thereby obtaining a non-negative overflow difference; and a preset potential physical degradation risk penalty factor is obtained, the non-negative overflow difference is multiplied by the potential physical degradation risk penalty factor to obtain a degradation prevention value, and the degradation prevention value is used as the look-ahead penalty term.
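The two penalty terms of claim 7 reduce to a product and a clamped product. The sketch below assumes the overflow is measured as degradation index minus threshold, so that only an index above the threshold is penalized; the function and parameter names are illustrative.

```python
def historical_penalty(lost_frames, sla_breach_factor):
    """Penalty for frames already lost: count times the SLA breach factor."""
    return lost_frames * sla_breach_factor


def lookahead_penalty(degradation_index, threshold, risk_factor):
    """Penalty for predicted degradation beyond the preset threshold."""
    # Assumed sign convention: overflow = index - threshold, floored at zero.
    overflow = max(degradation_index - threshold, 0.0)
    return overflow * risk_factor
```

The clamp at zero means a link whose degradation index is below the threshold contributes no look-ahead penalty, however far below it sits.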
8. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 7, characterized in that the specific operations by which the backoff inference module accumulates the historical penalty term and the look-ahead penalty term, and then rounds down the route increment output at the output layer of the preset policy network are as follows: the continuous action instruction generated by the preset policy network based on the state tensor is obtained and squared to obtain an action square value; a preset control plane oscillation penalty factor is obtained, and the action square value is multiplied by the control plane oscillation penalty factor to generate a stability penalty term; the sign-inverted historical penalty term, the sign-inverted look-ahead penalty term, and the sign-inverted stability penalty term are extracted and added together to complete the accumulation of the historical penalty term and the look-ahead penalty term, thereby generating a multi-objective game-constrained reward function; iterative constraint updates are performed on the preset policy network based on the multi-objective game-constrained reward function; the continuous action instruction output by the preset policy network at the output layer is extracted; the continuous action instruction is rounded down to obtain an integer that conforms to the network protocol metric value format; and the integer is used as the route increment and output.
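The reward of claim 8 is the negated sum of three penalties, and the route increment is the floor of the continuous action. A minimal sketch follows, with hypothetical function and parameter names; the policy network itself and its iterative update are outside the scope of this example.

```python
import math


def reward(historical_penalty, lookahead_penalty, action, oscillation_factor):
    """Sum of the three sign-inverted penalty terms."""
    stability_penalty = oscillation_factor * action ** 2
    return -historical_penalty - lookahead_penalty - stability_penalty


def route_increment(action):
    """Round the continuous action down to an integer metric increment."""
    return math.floor(action)
```

Because rounding down is toward negative infinity, a slightly negative action such as -0.2 yields an increment of -1 rather than 0.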
9. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 8, characterized in that the specific operations by which the closed-loop execution module adds the route increment to the preset baseline cost to generate a cost value, writes the cost value to the routing database of the first device, and triggers a forwarding table refresh are as follows: an asynchronous lock-free queue is constructed, and the route increment is obtained through the asynchronous lock-free queue; a preset network link baseline metric value is obtained and used as the preset baseline cost; the route increment is added to the preset baseline cost to obtain a dynamic link cost value, and the dynamic link cost value is used as the cost value; the routing database is extracted from the first device; the cost value is written into the routing database as a seamless overwrite operation, thereby writing the cost value to the routing database of the first device; based on the routing database after the seamless overwrite operation is completed, the underlying hardware forwarding information base of the first device is synchronized; and after the underlying hardware forwarding information base of the first device is synchronized, a refresh action is performed on the forwarding table, and the refresh action is used as the trigger for the forwarding table refresh.
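The write path of claim 9 can be illustrated with in-memory stand-ins. Everything named here is hypothetical: `FirstDevice` models the routing database and hardware forwarding information base as dictionaries, and a `collections.deque` (whose appends and pops are thread-safe in CPython) stands in for the asynchronous lock-free queue. A real device would be driven through its own management interface, which the claim does not specify.

```python
from collections import deque


class FirstDevice:
    """Hypothetical stand-in for a router's routing database and hardware FIB."""

    def __init__(self):
        self.routing_db = {}
        self.fib = {}
        self.refresh_count = 0

    def write_cost(self, link, cost):
        self.routing_db[link] = cost   # overwrite the routing database entry
        self.fib[link] = cost          # synchronize the forwarding information base
        self.refresh_count += 1        # forwarding table refresh


def apply_increments(queue, device, baseline_costs):
    """Drain (link, increment) pairs and write baseline + increment to the device."""
    while queue:
        link, increment = queue.popleft()
        device.write_cost(link, baseline_costs[link] + increment)
```

In this sketch the producer (the inference side) would append `(link, increment)` pairs to the deque while the execution side drains it, so neither side blocks on the other.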
10. The network communication construction cabling dynamic adjustment system based on reinforcement learning according to claim 9, characterized in that the specific operations by which the closed-loop execution module monitors subsequent changes in the effective power and the queue depth, generates a reward signal when the effective power stops decaying and the queue depth is emptied, and updates the preset policy network using the reward signal are as follows: after the forwarding table refresh is triggered, the next discrete time step is entered; within the next discrete time step, the effective power and the queue depth are continuously acquired, and the continuous acquisition serves to monitor the subsequent changes in the effective power and the queue depth; it is determined whether the effective power remains stable or shows an increasing trend, and when it does, it is confirmed that the effective power has stopped decaying; it is determined whether the queue depth is equal to zero, and when it is, it is confirmed that the queue depth is emptied; when it is confirmed that the effective power has stopped decaying and the queue depth is emptied, it is confirmed that external construction stress was acting on the physical link, and a positive incentive signal characterizing the successful confirmation of the physical disturbance is generated; the positive incentive signal is used as the reward signal, completing the operation of generating a reward signal when the effective power stops decaying and the queue depth is emptied; the policy gradient parameters within the preset policy network are extracted; and the reward signal is transmitted to the preset policy network and backpropagated through the policy gradient parameters, the backpropagation constituting the update of the preset policy network using the reward signal.
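The reward condition of claim 10 can be sketched as a check over the samples gathered in the next time step. The function name, the representation of the power readings as a list, the tolerance parameter, and the reward values of 1.0 and 0.0 are assumptions for illustration; the policy-gradient update that consumes the signal is not shown.

```python
def reward_signal(power_samples, queue_depth, tolerance=0.0):
    """Return 1.0 when power has stopped decaying and the queue is empty, else 0.0.

    power_samples: effective power readings in acquisition order.
    tolerance: assumed allowance for measurement noise between readings.
    """
    # "Stable or increasing" is read as: no reading falls below its
    # predecessor by more than the tolerance.
    stopped_decaying = all(
        later >= earlier - tolerance
        for earlier, later in zip(power_samples, power_samples[1:])
    )
    queue_emptied = queue_depth == 0
    return 1.0 if stopped_decaying and queue_emptied else 0.0
```

At least two power readings are needed for the trend check to be meaningful; with fewer, the comparison is vacuously true and the result depends on the queue depth alone.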