A deep learning-based polarization multiplexed fiber channel modeling method

By using the BERT model based on Transformer for polarization multiplexing fiber channel modeling, the problems of high computational complexity and lack of consideration of polarization mode dispersion in traditional methods are solved, achieving efficient fiber channel modeling and improving fitting accuracy and speed.

CN118869118B · Active · Publication Date: 2026-06-23 · SOUTHWEST JIAOTONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SOUTHWEST JIAOTONG UNIV
Filing Date: 2024-06-28
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Traditional fiber optic channel modeling methods have high computational complexity when dealing with complex or nonlinear systems, and fail to effectively consider the randomness and time variation of polarization mode dispersion. The computational complexity increases sharply, especially in long-distance fiber optic channels, and the requirements for datasets are high.

Method used

A Transformer-based BERT model is used for polarization multiplexing fiber channel modeling. The model combines a bidirectional attention mechanism and a pre-training and fine-tuning architecture with a multi-head self-attention mechanism and the GELU activation function, and is optimized to reduce computational complexity. The Adam optimizer and the mean squared error loss function are used for training.

Benefits of technology

It significantly reduces computational complexity, improves the system's fitting accuracy and computation speed, and can effectively fit the characteristics of polarization multiplexed fiber channels, reducing reliance on mathematical models and expert experience.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a polarization multiplexing optical fiber channel modeling method based on deep learning, and particularly relates to the following steps: constructing a BERT model for polarization multiplexing optical fiber channel modeling; building a transmission link, collecting a training set for training the constructed BERT model; and using the trained BERT model to fit the QAM signal transmitted in the polarization multiplexing optical fiber channel, so as to obtain the output optical signal of the corresponding channel. The application can effectively fit the characteristics and response of the polarization multiplexing optical fiber channel, avoids the dependence on strict mathematical model calculation and expert experience modeling in the traditional modeling method, and does not need explicit physical models and prior knowledge, thereby significantly reducing the calculation complexity.

Description

Technical Field

[0001] This invention belongs to the field of optical communication, and in particular relates to a deep learning-based method for modeling polarization multiplexed optical fiber channels.

Background Technology

[0002] Optical fiber transmission technology, with its high bandwidth utilization and superior performance, occupies an important position in modern communication systems. To ensure the communication quality of the link, an efficient and reliable optical fiber channel modeling method is essential. Traditional optical fiber modeling methods are diverse, mainly including mathematical modeling, Fourier methods, and transmission matrix methods, among which the Fourier method is the most widely used. Optical fiber channel modeling based on the split-step Fourier method (SSFM) is shown in Figure 1. It uses the Fourier transform principle to analyze and design fiber optic systems. Its high frequency resolution enables precise analysis of the frequency components of optical signals, facilitating a better understanding and control of signal transmission within optical fibers. The split-step Fourier method improves the performance of fiber optic communication systems by precisely controlling the refractive index distribution of the fiber and optimizing mode selectivity.
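For orientation, the split-step Fourier method mentioned above can be sketched for a single polarization as follows. This is only an illustrative sketch, not the procedure of Figure 1: the function name `ssfm_propagate`, the scalar nonlinear Schrödinger form, and every parameter value below are assumptions chosen for the example.

```python
import numpy as np

def ssfm_propagate(field, dt, length, n_steps, beta2, gamma, alpha=0.0):
    """Propagate a sampled complex envelope with a basic split-step Fourier scheme.

    Each step applies dispersion and loss in the frequency domain,
    then the Kerr nonlinear phase rotation in the time domain.
    """
    h = length / n_steps                                    # step size in metres
    omega = 2 * np.pi * np.fft.fftfreq(field.size, d=dt)    # angular frequency grid
    linear = np.exp((0.5j * beta2 * omega**2 - 0.5 * alpha) * h)
    out = field.astype(complex)
    for _ in range(n_steps):
        out = np.fft.ifft(np.fft.fft(out) * linear)            # dispersion + attenuation
        out = out * np.exp(1j * gamma * np.abs(out) ** 2 * h)  # nonlinear phase
    return out

# Hypothetical example: a 2 mW Gaussian pulse over one lossless 80 km span.
t = np.arange(-512, 512) * 1e-12
pulse = np.sqrt(2e-3) * np.exp(-t**2 / (2 * (20e-12) ** 2))
received = ssfm_propagate(pulse, dt=1e-12, length=80e3, n_steps=100,
                          beta2=-21e-27, gamma=1.3e-3)
```

The cost grows with the number of steps and the FFT size, which is the complexity issue the following paragraph raises for long links.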

[0003] However, the split-step Fourier method for modeling fiber optic channels still has some limitations. It relies heavily on traditional mathematical and physical analysis, which limits its flexibility in handling complex or nonlinear systems. While it simplifies computation in some cases, the computational complexity remains high when dealing with large-scale data and long-distance fiber optic channels. Furthermore, the split-step Fourier method typically requires an explicit physical model and prior knowledge, which limits its effectiveness when data is insufficient or the system is unknown.

[0004] To address the computational challenges of traditional modeling methods, deep learning (DL)-based modeling approaches have been proposed. Currently, models based on Transformer, BiLSTM, and FNO perform well in single-mode fiber modeling. However, these methods do not consider the effects of polarization mode dispersion (PMD) caused by different polarization states, whereas this invention further considers the dual polarization states of the fiber. In traditional fiber modeling methods, the randomness and time-varying nature of PMD exacerbate the modeling complexity. At the same time, as the fiber length increases, the cumulative effect of random birefringence fluctuations grows significantly, leading to a sharp rise in the computational complexity of PMD. When deep learning methods are used to model fiber channels, polarization-multiplexed fiber channel modeling, unlike single-polarization modeling, must consider the transmission characteristics of two polarization states simultaneously. Signals in each polarization state are subject to different nonlinear effects and noise during transmission, which increases the modeling difficulty. Furthermore, dual polarization states mean that deep learning models need to process higher-dimensional data, which significantly increases computational complexity and necessitates model optimization to reduce computational costs. In addition, polarization-multiplexed fiber channel modeling places higher demands on the dataset, requiring more extensive and accurate datasets covering different polarization states and signal characteristics.

Summary of the Invention

[0005] To address the shortcomings of traditional split-step Fourier fiber modeling methods, this invention provides a deep learning-based polarization multiplexing fiber channel modeling method.

[0006] The present invention provides a deep learning-based method for modeling polarization multiplexed optical fiber channels, comprising the following steps:

[0007] Step 1: Construct a BERT (Bidirectional Encoder Representations from Transformers) model for polarization multiplexing fiber channel modeling.

[0008] Step 2: Establish a transmission link and collect and construct a training set for training the BERT model constructed in Step 1.

[0009] Step 3: Use the BERT model trained in Step 2 to fit the received optical signal of the polarization multiplexed fiber channel.

[0010] Furthermore, the transmission link uses an IQ modulator to modulate the continuous optical signal emitted by the laser. The modulator is driven by a quadrature amplitude modulation (QAM) electrical signal. The generated modulated optical signal is then transmitted through a deep learning network and received by a photodetector. The resulting electrical signal reaches the receiving module after polarization mode dispersion compensation and coherent reception.

[0011] Furthermore, the BERT model in step 1 is specifically as follows:

[0012] 1) The input data size of the model is (batch_size, 2*seq_len+1, d_model); where batch_size refers to the number of data samples processed by the network in each iteration; seq_len refers to the number of elements in the input data sequence; and d_model refers to the dimension of the input and output vectors of each layer in the model.

[0013] 2) The BERT model is based on the encoder structure of the Transformer model. Compared with the standard Transformer, the position encoding matrix of the input layer in the BERT model is learned and updated synchronously with the model weights, and the model has a bidirectional attention mechanism and a pre-training and fine-tuning architecture. Its structure includes batch normalization, Dropout, residual connections, a multi-head self-attention mechanism, and a feed-forward network. The data size remains unchanged in this structure.

[0014] 3) The input layer adds the result of position encoding of the input sequence to the input sequence to obtain the output of the input layer.

[0015] 4) The model employs multiple self-attention mechanisms to construct a multi-head self-attention mechanism structure. This multi-head attention mechanism executes multiple self-attention operations in parallel. The expression for each self-attention mechanism is as follows:

[0016] $$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

[0017] Where Q, K, and V are the query vector, key vector, and value vector, respectively, and d_k is the dimension of the key vector. The softmax function converts the calculated scores into probability form; these are multiplied by V and the results are summed to obtain the output vector.

[0018] 5) The forward layer consists of a fully connected layer, a GELU activation function, and another fully connected layer in that order, with the input and output data sizes remaining unchanged.

[0019] 6) BERT's output layer is implemented by a single fully connected network, and the size of the data in the output layer remains unchanged.

[0020] 7) The BERT model uses Adam as the model optimizer, and the loss function is the mean squared error loss function.

[0021] Furthermore, in step 2, the transmitted and received signals of the polarization-multiplexed optical fiber channel in the QAM transmission link are collected as a training set. The transmitted signal is used as the input signal of the BERT model, and the BERT model is used to fit the received signal. The BERT model is trained based on the training set. The data is converted into a size of (batch_size, 2*seq_len+1, d_model) as the input of the BERT model, and the model generates the fitted received signal. The Adam optimizer uses the mean squared error between the received signal generated by the model and the received signal collected from the link to guide the adjustment of the weights of the BERT model until the model converges.

[0022] Furthermore, batch_size is set to 64, seq_len to 20, and d_model to 64.

[0023] Furthermore, in step 3, a sequence is selected from the transmitted signal using a sliding sequence method and fed into the BERT model. The model generates the received signal of the polarization multiplexed fiber channel, and then the normalized mean square error of the model is calculated by comparing it with the received signal of the link.
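The sliding sequence selection can be sketched as below. The sketch assumes each window is centred on the symbol being predicted and takes seq_len neighbours on either side, which is one reading consistent with the 2*seq_len+1 input length; the helper name `sliding_windows` and the zero padding at the sequence edges are assumptions, not details stated in the text.

```python
def sliding_windows(symbols, seq_len, pad_value=0.0):
    """Return one window of length 2*seq_len+1 per symbol, centred on that symbol."""
    # Pad both ends so the first and last symbols also get full-length windows.
    padded = [pad_value] * seq_len + list(symbols) + [pad_value] * seq_len
    width = 2 * seq_len + 1
    return [padded[i:i + width] for i in range(len(symbols))]

# Hypothetical usage with seq_len = 20, the value given for this method.
tx = [float(k) for k in range(100)]
windows = sliding_windows(tx, seq_len=20)
```

Each window then supplies the 41 positions of one model input, and the model output for the window is compared with the link's received signal.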

[0024] The beneficial technical effects of this invention are as follows:

[0025] 1. This invention can effectively fit the characteristics and response of polarization multiplexed optical fiber channels, avoiding the reliance on rigorous mathematical model calculations and expert experience modeling in traditional modeling methods, and also eliminating the need for explicit physical models and prior knowledge, thus significantly reducing computational complexity.

[0026] 2. The BERT model of this invention has powerful context learning and parallel computing capabilities, which can effectively improve the fitting accuracy and computing speed of the system.

Attached Figure Description

[0027] Figure 1 is a flowchart of the split-step Fourier fiber modeling method.

[0028] Figure 2 is a structural diagram of the BERT model of the present invention.

[0029] Figure 3 is a structural diagram of the multi-head self-attention mechanism.

[0030] Figure 4 is a flowchart illustrating the implementation of the deep learning-based fiber optic modeling method of this invention.

[0031] Figure 5 shows the fitted waveforms of the received signals from different models at a transmit power of 3 dBm for a 1200 km polarization-multiplexed optical fiber channel transmission link with 16QAM (a is CNN; b is DNN; c is BERT).

[0032] Figure 6 shows the N_MSE performance of the received signals at different transmit powers for a 1200 km polarization-multiplexed optical fiber channel transmission link with 16QAM.

[0033] Figure 7 shows the N_MSE performance of the received signals at different transmit powers for a 1200 km polarization-multiplexed optical fiber channel transmission link with 64QAM.

Detailed Implementation

[0034] The present invention will be further described in detail below with reference to the accompanying drawings and specific implementation methods.

[0035] The present invention provides a deep learning-based method for modeling polarization multiplexing fiber optic channels, as shown in Figure 4. Specifically:

[0036] 1. First, construct a BERT network for polarization multiplexing channel modeling.

[0037] The BERT model structure is shown in Figure 2.

[0038] The first layer is the input layer 100. The input data size of the input layer is (batch_size, 2*seq_len+1, d_model). The result of position encoding of the input sequence is added to the input sequence to obtain the output of the input layer. In this invention, seq_len is 20, d_model is 64, and batch_size is 64. Therefore, the input size of the model is (64, 41, 64).
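A minimal NumPy sketch of this input layer follows, to make the shape arithmetic concrete. The random arrays stand in for the embedded input and for the position matrix, which in training would be a learned parameter; the variable names and the 0.02 initialisation scale are assumptions.

```python
import numpy as np

batch_size, seq_len, d_model = 64, 20, 64
window = 2 * seq_len + 1                       # 41 positions per sample

rng = np.random.default_rng(0)
x = rng.standard_normal((batch_size, window, d_model))          # stand-in input
pos_embedding = 0.02 * rng.standard_normal((window, d_model))   # would be trainable

# Broadcasting adds the same position matrix to every sample in the batch.
input_layer_out = x + pos_embedding
```

The output keeps the (64, 41, 64) shape, so the encoder blocks that follow see the same data size as the input.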

[0039] The second layer is the feature extraction layer 200, which consists of multiple Transformer encoder structures stacked in series, comprising six structural blocks in this invention. In each encoder structural block, data first passes through a multi-head self-attention mechanism structure 210, then undergoes a dropout operation with a probability of 0.1, followed by batch normalization and a residual structure 220. It then enters a feed-forward network 230, followed by another dropout operation with a probability of 0.1, and finally batch normalization and a residual structure 240. The residual structure alleviates the vanishing gradient problem, helps the network learn sequence features, improves network generality, and accelerates model training. The data feature size remains unchanged within each structural block.

[0040] In the multi-head self-attention mechanism structure (shown in Figure 3), this invention sets n_heads to 16, representing the number of heads in the multi-head self-attention mechanism. For each input vector, the self-attention mechanism structure uses three sets of weight matrices to generate three distinct vectors: the query vector Q, the key vector K, and the value vector V. For each element in the sequence, its attention score with all elements in the sequence (including itself) is calculated. This is obtained by taking the dot product of the query vector and each key vector and then dividing by a scaling factor, typically √d_k, where d_k is the dimension of the key vectors, which stabilizes the training process. A softmax function is applied to the scores, converting them into probabilistic form. Each value vector is multiplied by the output of the softmax function, and the results are summed to obtain the output vector at that position. This output vector is a weighted sum over all positions in the input sequence, with weights determined by the attention scores. The calculation formula is as follows:

[0041] $$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
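The single-head computation described in this paragraph (dot-product scores, scaling by the square root of the key dimension, softmax, weighted sum of values) can be sketched in NumPy as follows. The function names and the random test inputs are illustrative assumptions.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum first for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Return the attention output and the attention weights."""
    d_k = k.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)   # (len_q, len_k)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((41, 4))
k = rng.standard_normal((41, 4))
v = rng.standard_normal((41, 4))
out, weights = scaled_dot_product_attention(q, k, v)
```

Here 41 is the window length 2*seq_len+1 and 4 is the per-head dimension that results from d_model = 64 and n_heads = 16.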

[0042] Building upon self-attention, the multi-head attention mechanism executes multiple self-attention operations in parallel. The query, key, and value matrix 211 is divided into multiple smaller matrices, each corresponding to a "head." Each head independently performs steps to calculate the attention score 212, apply the softmax function, and generate a weighted value vector 213, producing an output vector for each position in the sequence. Finally, the output vectors of all heads are concatenated and subjected to another linear transformation to obtain the final output vector.
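The split, per-head attention, and concatenation steps can be sketched as below. This is a simplified illustration under stated assumptions: biases and dropout are omitted, the weight matrices are random placeholders, and the function name `multi_head_self_attention` is hypothetical.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    batch, length, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (batch, length, d_model) -> (batch, n_heads, length, d_head)
        return t.reshape(batch, length, n_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    scores = scores - scores.max(axis=-1, keepdims=True)   # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    heads = weights @ v                                    # per-head outputs
    # Concatenate the heads back to d_model, then apply the final linear map.
    concat = heads.transpose(0, 2, 1, 3).reshape(batch, length, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
d_model, n_heads = 64, 16
x = rng.standard_normal((2, 41, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
y = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
```

The output has the same shape as the input, which matches the statement that the data size is unchanged inside each encoder block.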

[0043] The feed-forward network consists of two fully connected layers that map the 64-dimensional input to a hidden layer with 512 neurons and back to an output with 64 neurons. The activation function between them is GELU. Its expression is:

[0044] GELU(x)=xΦ(x)

[0045] Where x is the input feature, and Φ(x) is the cumulative distribution function of the standard normal distribution, expressed as:

[0046] $$\Phi(x)=\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$

[0047] Here, erf is the error function, defined as:

[0048] $$\mathrm{erf}(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^{2}}\,dt$$
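As a small illustration, the exact GELU, x times the standard normal CDF written through the error function, and the 64 to 512 to 64 feed-forward network can be sketched as follows. The weights are random placeholders and the names `gelu` and `feed_forward` are assumptions for the example.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi expressed through erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

gelu_vec = np.vectorize(gelu)   # element-wise wrapper for arrays

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise network: 64 -> 512 -> 64 with GELU in between."""
    return gelu_vec(x @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
w1, b1 = 0.05 * rng.standard_normal((64, 512)), np.zeros(512)
w2, b2 = 0.05 * rng.standard_normal((512, 64)), np.zeros(64)
x = rng.standard_normal((2, 41, 64))     # small batch to keep the example fast
y = feed_forward(x, w1, b1, w2, b2)
```

Unlike ReLU, GELU passes small negative inputs with a reduced weight instead of cutting them to zero, which gives a smooth activation for regression-style fitting.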

[0049] The third layer is the output layer, which consists of only one fully connected network. The data size in the output layer remains unchanged, that is, the data size is still (64, 41, 64).

[0050] The mean squared error loss function of the model is:

[0051] $$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(x_i-y_i\right)^{2}$$

[0052] Where x_i is the i-th symbol obtained through the BERT model, y_i is the i-th real symbol at the link output, and n is the total number of input samples.

[0053] For further testing, the model uses normalized mean squared error to measure test performance, defined as:

[0054]
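The two metrics can be sketched in plain Python as below. The MSE follows the definition given for the loss; the N_MSE normalization is an assumption, since its formula is not reproduced in this text: here the squared error is divided by the mean power of the actual received symbols, which is one common convention.

```python
def mse(predicted, actual):
    """Mean squared error between model output x_i and link output y_i."""
    n = len(actual)
    return sum((x - y) ** 2 for x, y in zip(predicted, actual)) / n

def nmse(predicted, actual):
    """MSE divided by the mean power of the actual signal (assumed normalization)."""
    power = sum(y ** 2 for y in actual) / len(actual)
    return mse(predicted, actual) / power

# Hypothetical values for illustration only.
actual = [1.0, -1.0, 3.0, -3.0]
predicted = [1.1, -0.9, 2.9, -3.1]
loss = mse(predicted, actual)
score = nmse(predicted, actual)
```

Normalizing by signal power makes results at different transmit powers comparable, which is how the metric is used in the later figures.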

[0055] 2. Establish a transmission link, collect the dataset used for training, and train the network.

[0056] To construct the dataset, this invention establishes a QAM transmission link, as shown in Figure 4. A continuous-wave laser source generates a 1550 nm optical signal, with a sampling rate of 80 GHz. A polarization beam splitter produces two identical optical waves, which are input to an x-polarization IQ modulator and a y-polarization IQ modulator. A QAM signal is generated at the transmitter with a symbol rate of 10 GBaud and passes through a root-raised-cosine (low-pass) filter with a roll-off factor of 0.18. The signal is modulated by the IQ modulator, with the half-wave voltage and bias voltage both at 5 V and the extinction ratio at its default value of 35 dB. After the two polarization signals enter the polarization combiner, an amplifier controls the output power at the transmitter, which is set to 3 dBm. Each span in the channel is 80 km long and contains one EDFA and a section of optical fiber. The total length of the polarization-multiplexed fiber channel in this link is 1200 km, that is, 15 spans. The EDFA operates in power control mode with an output power of 3 mW and a noise figure of 4.5 dB. The fiber attenuation in the link is 0.2 dB/km, the first-order dispersion is 16e-6 s/m², and the second-order dispersion is 0.08e3 s/m³; the value of the polarization mode dispersion coefficient is missing from this text. After the channel output, polarization mode dispersion is compensated. The compensated optical signal first passes through a bandpass Gaussian filter, then undergoes coherent detection and is converted into an electrical signal by a photodetector (PD). The electrical signal is then low-pass filtered, and finally the DSP module compensates for the dispersion and nonlinear effects generated by the optical fiber.
The x-polarization and y-polarization components of the optical signal are acquired in the link and merged into a one-dimensional real-valued array in the order: real part of the x-polarization, imaginary part of the x-polarization, real part of the y-polarization, imaginary part of the y-polarization. The number of transmitted symbols is set to 2097152 and 131072 for the training and test sets, respectively. The input and output optical signals of the polarization-multiplexed fiber channel in the link are acquired, and the training and test datasets are constructed according to the above method. The BERT model is trained using the obtained training dataset and labels.
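The packing of the two polarization states into one real-valued array can be sketched as below. The text gives the order of the four components but not whether they are interleaved per symbol or stored block by block; this sketch assumes per-symbol interleaving, and the helper name `pack_dual_polarization` is hypothetical.

```python
def pack_dual_polarization(x_pol, y_pol):
    """Flatten complex x/y samples into [Re x, Im x, Re y, Im y] for each symbol."""
    packed = []
    for x, y in zip(x_pol, y_pol):
        packed.extend([x.real, x.imag, y.real, y.imag])
    return packed

# Two hypothetical 16QAM symbols per polarization.
x_pol = [1 + 3j, -1 - 1j]
y_pol = [3 - 1j, -3 + 3j]
flat = pack_dual_polarization(x_pol, y_pol)
```

Each complex dual-polarization symbol contributes four real values, so the model processes a wider input than in the single-polarization case.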

[0057] The model training process is as follows:

[0058] Step 1: Input the transmitted signal into the BERT model, which fits the channel and generates the corresponding received signal.

[0059] Step 2: Calculate the loss value between the received signal generated by the model and the actual received signal using the loss function, and then use the Adam optimizer to adjust the model weights based on the loss value.

[0060] Step 3: Use another set of unseen signals as a validation set and check whether the N_MSE of the validation results reaches the expected N_MSE.

[0061] Step 4: If the N_MSE of the received signal sequence generated by the model reaches the expected value, record the model's weight values and end the training process; if the N_MSE of the model does not reach the expected value, repeat steps 1-3 until the expected N_MSE is achieved.
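The four training steps can be sketched with a toy stand-in for the network. Everything below is an assumption made for illustration: a one-parameter model y = w * x replaces the BERT model, the "channel" is a fixed gain, and the learning rate and target N_MSE are chosen so the toy converges quickly. Only the loop structure (forward pass, MSE loss, Adam update, validation N_MSE check) mirrors the steps above.

```python
import math

def nmse(pred, actual):
    err = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return err / sum(a ** 2 for a in actual)

def train_until_target(train_x, train_y, val_x, val_y, target_nmse,
                       lr=0.01, max_steps=2000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x with Adam on the MSE loss; stop once validation N_MSE meets the target."""
    w, m, v = 0.0, 0.0, 0.0
    val_nmse, step = float("inf"), 0
    for step in range(1, max_steps + 1):
        # Steps 1-2: forward pass, gradient of the MSE loss, Adam weight update.
        grad = sum(2 * (w * x - y) * x for x, y in zip(train_x, train_y)) / len(train_x)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** step)
        v_hat = v / (1 - beta2 ** step)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        # Steps 3-4: validate on unseen data and stop when the target is reached.
        val_nmse = nmse([w * x for x in val_x], val_y)
        if val_nmse <= target_nmse:
            break
    return w, val_nmse, step

gain = 0.8                                   # hypothetical channel gain
train_x = [-3.0, -1.0, 1.0, 3.0]
val_x = [-2.0, 2.0]
w, val_nmse, steps = train_until_target(
    train_x, [gain * x for x in train_x],
    val_x, [gain * x for x in val_x], target_nmse=1e-4)
```

Keeping the validation set separate from the training data means the stopping criterion measures fitting quality on signals the model has not seen.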

[0062] 3. Use the BERT model trained in step 2 to fit the polarization multiplexing fiber channel of the QAM link.

[0063] The transmitted signal is fitted using a BERT model to generate the corresponding received signal. The fitted received signal is then compared with the actual received signal to calculate the N_MSE of the test signal. To compare the superiority of the BERT model in polarization-multiplexed fiber channel modeling, this invention compares it with other modeling methods.

[0064] Under 16QAM modulation and 3 dBm transmit power, the fitted received signals of the channel are shown in Figure 5: CNN performs poorly, DNN performs slightly better, and BERT achieves a good fit. Figure 6 shows the N_MSE performance of the received signal at different transmit powers under 16QAM modulation. The results show that the BERT model performs well across transmit powers and outperforms the DNN and CNN models in modeling polarization-multiplexed fiber channels. To further verify the generalization performance of the BERT model under higher-order modulation formats, this invention trains the BERT model using a 16QAM signal and tests the trained BERT model with a 64QAM signal. The N_MSE performance of the received signal at different transmit powers under 64QAM modulation is shown in Figure 7; the results indicate that BERT performs better and fluctuates less.

[0065] In summary, the BERT model proposed in this invention can accurately model polarization multiplexed fiber channels, achieving good accuracy in 16QAM and exhibiting good generalization performance in 64QAM transmission links.

[0066] This invention proposes for the first time a deep learning model for signal regression fitting, namely the BERT model. This model is based on the encoder structure of the Transformer model and is constructed by connecting an input layer, multiple encoder layers, and an output layer in series. Compared to the standard Transformer model, the BERT model has several distinguishing features. First, the position encoding matrix of the input layer in the BERT model is learned synchronously with the update of the model weights. Furthermore, the BERT model employs a bidirectional attention mechanism and a pre-training and fine-tuning architecture. The bidirectional attention mechanism means that when processing the input, the model considers not only the preceding information but the information both before and after each position. The input data size of the BERT model is (batch_size, 2*seq_len+1, d_model), where batch_size represents the number of data samples processed by the network in each iteration (each training step), seq_len represents the number of elements in the input data sequence (e.g., words, characters, or time steps), and d_model represents the dimension of the input and output vectors of each layer in the model. Validation shows that the model achieves better training results when seq_len is 20. Since the BERT model is used for a regression task, the Adam optimizer is employed and the mean squared error is used as the loss function of the model. The learning rate is 0.0004, and the number of iterations is 300.
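For reference, the hyperparameters stated in this description can be gathered into a single configuration object, sketched below. The values are taken from the text; the class name `BertChannelConfig`, the field names, and the derived properties are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BertChannelConfig:
    batch_size: int = 64
    seq_len: int = 20
    d_model: int = 64
    n_heads: int = 16
    n_encoder_blocks: int = 6
    ffn_hidden: int = 512
    dropout: float = 0.1
    learning_rate: float = 0.0004
    iterations: int = 300

    @property
    def window(self) -> int:
        # Input sequence length: 2*seq_len+1 positions.
        return 2 * self.seq_len + 1

    @property
    def head_dim(self) -> int:
        # Dimension handled by each attention head.
        return self.d_model // self.n_heads

    @property
    def input_shape(self) -> tuple:
        return (self.batch_size, self.window, self.d_model)

config = BertChannelConfig()
```

With these values each of the 16 heads works on 4 of the 64 model dimensions, and the input shape evaluates to (64, 41, 64).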

[0067] Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various changes or modifications within the scope of the claims, which do not affect the essence of the present invention. Unless otherwise specified, the embodiments and features described in this application can be arbitrarily combined with each other.

Claims

1. A deep learning-based method for modeling polarization multiplexing fiber optic channels, characterized in that it includes the following steps:
Step 1: Construct a BERT model for polarization multiplexing fiber channel modeling. The BERT model is as follows:
1) The input data size of the BERT model is (batch_size, 2*seq_len+1, d_model); where batch_size refers to the number of data samples processed by the BERT model in each iteration; seq_len refers to the number of elements in the input data sequence; and d_model refers to the dimension of the input and output vectors of each layer in the model;
2) The BERT model is based on the encoder structure of the Transformer model. Compared with the Transformer model, the position encoding matrix of the input layer in the BERT model is learned and updated synchronously with the weights of the BERT model, and it has a bidirectional attention mechanism and a pre-training and fine-tuning architecture. The BERT model structure includes batch normalization, Dropout, residual structure, multi-head self-attention mechanism and feed-forward network. The data size remains unchanged in this structure;
3) The input layer adds the result of positional encoding of the input sequence to the input sequence to obtain the output of the input layer;
4) The BERT model employs multiple self-attention mechanisms to form a multi-head self-attention mechanism structure. This multi-head self-attention mechanism executes multiple self-attention operations in parallel. The expression for each self-attention mechanism is as follows: $$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$ where Q, K, and V are the query vector, key vector, and value vector, respectively, and d_k represents the dimension of the key vector; the attention score is converted into probability form by the softmax function and multiplied by V to obtain the output vector;
5) The feed-forward layer consists of a fully connected layer, a GELU activation function, and another fully connected layer in that order, with the input and output data sizes remaining unchanged;
6) The output layer of the BERT model is implemented by only one fully connected network, and the size of the data in the output layer remains unchanged;
7) The BERT model uses Adam as the model optimizer, and the loss function is the mean squared error loss function;
Step 2: Build a QAM transmission link, collect the transmitted and received signals of the polarization multiplexed optical fiber channel in the QAM transmission link as a training set, and use it to train the BERT model constructed in Step 1. The transmitted signal is used as the input signal of the BERT model, and the BERT model is used to fit the received signal;
Step 3: Use the BERT model trained in Step 2 to fit the received optical signal of the polarization multiplexed fiber channel.

2. The deep learning-based polarization multiplexing optical fiber channel modeling method according to claim 1, characterized in that, The transmission link uses an IQ modulator to modulate the continuous optical signal emitted by the laser. The modulator is driven by a quadrature amplitude modulation (QAM) electrical signal. Then, the generated optical modulation signal is transmitted through a BERT model and received by a photodetector. The received electrical signal is then received by the receiving module after polarization mode dispersion compensation and coherent reception.

3. The deep learning-based polarization multiplexing optical fiber channel modeling method according to claim 1, characterized in that, in step 2, the BERT model is trained based on the training set. The data is converted to a size of (batch_size, 2*seq_len+1, d_model) as the input of the BERT model. The BERT model fits and generates the received signal. The Adam optimizer uses the mean squared error between the received signal fitted by the BERT model and the received signal collected by the link to guide the adjustment of the weights of the BERT model until the BERT model converges.

4. The deep learning-based polarization multiplexing optical fiber channel modeling method according to claim 1, characterized in that, The batch_size is set to 64, the seq_len to 20, and the d_model to 64.

5. The deep learning-based polarization multiplexing optical fiber channel modeling method according to claim 3, characterized in that, In step 3, a sequence is selected from the transmitted signal using a sliding sequence method and fed into the BERT model. The BERT model generates the received signal of the polarization multiplexed fiber channel, and then the normalized mean square error of the model is calculated by comparing it with the received signal of the link.