A method for feedback optimization of online fountain codes

By combining reinforcement learning and deep neural networks to optimize the feedback scheme of online fountain codes, the decoding overhead and error rate problems of online fountain codes under limited feedback are solved, achieving more efficient decoding performance and lower error rate.

CN116192332B · Active · Publication Date: 2026-06-12 · BEIJING INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING INST OF TECH
Filing Date
2022-12-27
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing online fountain code feedback schemes suffer from inaccurate decoding overhead calculations with a limited number of feedback cycles, and the accumulation of decoding state estimation errors leads to poor decoding performance.

Method used

By combining a reinforcement learning-based feedback degree determination method with the OFC decoding state prediction at the transmitter, the receiver is modeled as an agent using a deep neural network. The agent is trained using the DQN algorithm to optimize the distribution of feedback points and degree values, thereby improving the efficiency of feedback utilization.

Benefits of technology

With a limited number of feedback attempts, this method reduces decoding overhead, improves decoding performance, adapts to different channel erasure probabilities, and has a lower bit error rate than existing methods, outperforming HTLO and EBDS.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a feedback optimization method for online fountain codes, comprising the following steps: step S1, obtaining the overall degree value distribution of the online fountain code; step S2, modeling the receiving end as a reinforcement learning agent; step S3, modeling the encoder, the decoder and a binary erasure channel as a reinforcement learning environment; step S4, training the agent with the DQN algorithm; and step S5, letting the trained agent interact with the environment of step S3 and recording the optimal feedback degree values and the corresponding feedback points. When the proportion of recovered symbols at the decoding end reaches a feedback point, the receiving end sends feedback telling the sending end to switch the degree value, and the sending end then sends coded symbols according to the corresponding degree distribution. The application improves the utilization efficiency of feedback points, finds the feedback scheme with the lowest decoding overhead under a limited number of feedbacks, and more closely approaches the recovery performance of OFC.

Description

Technical Field

[0001] This invention relates to a feedback optimization method for online fountain codes.

Background Technology

[0002] Fountain codes are an efficient channel coding method in which the encoder can generate a potentially unlimited number of coded symbols from K source symbols, and the receiver can recover the source symbols from any sufficiently large set of coded symbols. Compared with traditional fixed-rate channel coding methods, the code rate of fountain codes adapts to channel conditions without requiring channel state information in advance, so fountain codes are widely used in multimedia communication, satellite communication, data distribution, and other scenarios.

[0003] To improve the real-time decoding capability of fountain codes, Cassuto et al. proposed Online Fountain Codes (OFC), whose online characteristic means that for any decoding state, the corresponding optimal coding strategy can be found.

[0004] However, the online property of online fountain codes relies on feedback. In practical communication systems the uplink and downlink are asymmetric: the uplink bandwidth is usually much smaller than the downlink bandwidth, which limits the number of feedbacks that can be supported. In broadcast scenarios, excessive feedback can also cause problems such as feedback storms. To reduce the number of feedbacks in online fountain codes, Cai et al. proposed the Heuristic Overhead Table Lookup (HTLO) method. This method models the number of overhead symbols required to switch between different coded symbol degree values as an overhead table and uses a heuristic table lookup to find the degree value feedback sequence that minimizes decoding overhead for a given number of feedbacks, achieving similar decoding performance with fewer feedbacks. Qin et al., in their paper "Improved HTLO Algorithm for On-Line Fountain Codes with Limited Feedback," improved the HTLO method by proposing the IO-FSS-HTLO algorithm, which further reduces the decoding overhead under limited feedback through more accurate decoding overhead calculations and more flexible feedback point selection. Huang et al. proposed a feedback reduction algorithm based on decoder state estimation (EBDS). This algorithm estimates the decoder's state in real time at the encoder through theoretical calculation, updates the coded symbol degree value accordingly, and corrects state estimation errors through feedback. However, both the HTLO and IO-FSS-HTLO methods calculate the decoding overhead inaccurately, so the decoding overhead of the resulting feedback scheme is suboptimal. The EBDS method, in turn, accumulates decoding state estimation errors, which results in high decoding overhead when the number of feedbacks is limited.

Summary of the Invention

[0005] The purpose of this invention is to propose a feedback optimization method for online fountain codes, which combines a reinforcement learning-based feedback degree value determination method with the prediction of OFC decoding state at the transmitter to improve the utilization efficiency of feedback points. Under the condition of limited feedback times, it seeks the feedback scheme with the lowest decoding overhead and better approaches the recovery performance of OFC.

[0006] This invention is achieved through the following technical solution:

[0007] A feedback optimization method for online fountain codes includes the following steps:

[0008] Step S1: For each proportion of decoded (recovered) symbols, determine the corresponding optimal coding degree value. Obtain the overall degree value distribution of the online fountain code from the probability with which each coding degree value appears as the optimal coding degree value;

[0009] Step S2: Model the receiver as a reinforcement learning agent. The agent is a deep neural network consisting of one input layer with 2 nodes, several fully connected layers of [number missing] nodes each, and one output layer of [number missing] nodes. The 2 input nodes correspond to a state space of size 2, whose components are the number of case-M coded symbols stored in the decoder after a given coded symbol has been received, and the coding degree value fed back by the decoder at the previous feedback. The number of source symbols satisfies [condition missing];

[0010] Step S3: Model the encoder, the decoder, and the binary erasure channel as a reinforcement learning environment. When the proportion of recovered symbols reaches the current feedback point, the state is refreshed: the stored count of case-M coded symbols is set to its current value, and the last fed-back coding degree value is set to the degree value most recently fed back by the decoder. The agent observes this state and takes an action; the decoder updates the current coding degree value to the value given by the action, and the sending end sends coded symbols according to the corresponding degree distribution until the proportion of recovered symbols reaches the next feedback point. If decoding has not been completed after the agent's action, the reward is [formula missing] and the decoding process is interrupted; if decoding has been completed, the reward is [formula missing], which depends on the number of coded symbols sent under each degree distribution within its degree value range. Each degree distribution is obtained by truncating the overall degree value distribution to a degree range, and a feedback point is the point at which two degree values have equal effective probability;

[0011] Step S4: Using the DQN algorithm and the reinforcement learning environment of step S3, train the agent. During training, each feedback is defined as a step, and each successful decoding is defined as a round.

[0012] Step S5: Let the trained agent interact with the reinforcement learning environment of step S3. Based on [symbol missing] and [symbol missing], record the optimal feedback degree value, and from the previous and the optimal feedback degree values calculate the corresponding feedback point. When the proportion of recovered symbols at the decoding end reaches that feedback point, the receiving end sends feedback to notify the sending end to switch the degree value, and the sending end then sends coded symbols according to the corresponding degree distribution. Here, [symbol missing] is the probability that, at a given proportion of recovered symbols, a coded symbol contains exactly one unrecovered symbol, and [symbol missing] is the probability that it contains exactly two unrecovered symbols. The calculation formula is: [formula missing].

[0013] Furthermore, [symbol missing] is calculated according to the formula [formula missing] and [symbol missing] according to the formula [formula missing], where [symbol missing] is the coding degree value. The optimal coding degree value is calculated according to the formula [formula missing]. The number of times each coding degree value appears as the optimal coding degree value is counted, and from these counts the probability of each coding degree value appearing as the optimal one is obtained. This probability distribution is the overall degree value distribution.
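To illustrate how an overall degree value distribution can be tallied, the following Python sketch assumes that the two probabilities take the hypergeometric form commonly used for online fountain codes (a degree-m coded symbol covering exactly one, or exactly two, of the unrecovered source symbols) and that the optimal degree maximizes their sum. Those formulas, the uniform sweep over recovery levels, the degree cap, and all function names are assumptions made for illustration, not the patent's stated procedure.

```python
from collections import Counter
from math import comb


def p_one(m, recovered, k):
    """Assumed form: probability that a degree-m symbol covers exactly one
    of the k - recovered unrecovered source symbols."""
    if m < 1 or m > k:
        return 0.0
    return comb(recovered, m - 1) * (k - recovered) / comb(k, m)


def p_two(m, recovered, k):
    """Assumed form: probability that a degree-m symbol covers exactly two
    unrecovered source symbols."""
    if m < 2 or m > k:
        return 0.0
    return comb(recovered, m - 2) * comb(k - recovered, 2) / comb(k, m)


def optimal_degree(recovered, k, max_degree):
    """Degree that maximizes p_one + p_two (ties go to the smaller degree)."""
    return max(
        range(1, max_degree + 1),
        key=lambda m: p_one(m, recovered, k) + p_two(m, recovered, k),
    )


def overall_degree_distribution(k, max_degree):
    """Tally how often each degree is optimal over recovery levels 0..k-1
    and normalize the counts into a probability distribution."""
    counts = Counter(optimal_degree(r, k, max_degree) for r in range(k))
    total = sum(counts.values())
    return {m: c / total for m, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sizes chosen only for the demonstration.
    for degree, prob in overall_degree_distribution(k=100, max_degree=50).items():
        print(degree, round(prob, 3))
```

The sweep visits every recovery level once; the patent instead advances the recovery level through an expected-value estimate after each coded symbol, which would weight the tally differently.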

[0014] Furthermore, the expectation of the proportion of decoded (recovered) symbols is [formula missing]. Once the current optimal coding degree value has been determined, the proportion of recovered symbols at the next decoding stage is estimated from this expectation in order to determine the next optimal coding degree value, where [symbol missing] is the average degree value of the unrecovered source symbols after a given coded symbol has been received and processed, and [formulas missing].

[0015] Furthermore, a case-M coded symbol stored in the decoder is a coded symbol formed as the modulo-2 sum of more than two unrecovered symbols and any number of recovered symbols.
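The case-M definition can be made concrete with a short Python sketch that sorts a received coded symbol by how many of its source symbols are still unrecovered. Only the "more than two unrecovered symbols" rule for case M comes from the text; the labels for the other cases and the function name are hypothetical.

```python
def classify_coded_symbol(neighbours, recovered):
    """Classify a coded symbol by its number of unrecovered source symbols.

    neighbours: indices of the source symbols XOR-ed into the coded symbol.
    recovered:  set of indices of source symbols already recovered.
    """
    unrecovered = [s for s in neighbours if s not in recovered]
    if len(unrecovered) == 0:
        return "case 0"   # carries no new information
    if len(unrecovered) == 1:
        return "case 1"   # the single unknown can be solved immediately
    if len(unrecovered) == 2:
        return "case 2"   # relates two unknowns to each other
    return "case M"       # more than two unknowns: stored for later use


if __name__ == "__main__":
    recovered = {0, 1, 2}
    print(classify_coded_symbol([0, 1, 5], recovered))
    print(classify_coded_symbol([3, 4, 5, 2], recovered))
```

Counting the symbols that fall into the last branch gives the stored case-M count that forms one component of the agent's state.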

[0016] Furthermore, the effective probability of degree value [symbol missing] is [formula missing].

[0017] Furthermore, the degree distribution [symbol missing] is specifically [formula missing].
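One plausible reading of a degree distribution "intercepted" from the overall distribution over a degree range is sketched below in Python: keep the probabilities inside the range and renormalize them. The renormalization step, the inclusive bounds, and the function name are assumptions, since the exact expression is not available here.

```python
def truncate_distribution(overall, low, high):
    """Restrict a degree distribution to degrees in [low, high] and
    rescale the kept probabilities so that they sum to 1 (assumed)."""
    kept = {d: p for d, p in overall.items() if low <= d <= high}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no probability mass inside the requested range")
    return {d: p / total for d, p in kept.items()}


if __name__ == "__main__":
    # Hypothetical overall distribution used only for the demonstration.
    overall = {1: 0.5, 2: 0.25, 3: 0.25}
    print(truncate_distribution(overall, 2, 3))
```

Under this reading, each feedback selects a new range, and the sending end samples degrees only from the corresponding truncated distribution until the next feedback.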

[0018] Furthermore, in step S3, the transmitting end first sends coded symbols with a degree value of 1 until the proportion of symbols recovered at the receiving end reaches a preset value.

[0019] The present invention has the following beneficial effects:

[0020] 1. This invention first obtains the overall degree value distribution of the online fountain code. It then models the receiver as a reinforcement learning agent whose state comprises the number of case-M coded symbols stored in the decoder and the coding degree value fed back by the decoder at the previous feedback. Next, the encoder, decoder, and channel are modeled as a reinforcement learning environment, and the DQN algorithm is used to train the agent in this environment. Finally, the trained agent interacts with the reinforcement learning environment, and the optimal feedback degree values and the corresponding feedback points are recorded. The transmitter sends new coded symbols according to the degree distribution determined by the optimal feedback degree value and feedback point. In this way, the reinforcement learning-based method for determining the feedback degree value is combined with the prediction of the OFC decoding state at the transmitting end to improve the utilization efficiency of feedback points. With a limited number of feedbacks, the feedback scheme with the lowest decoding overhead is found, better approaching the recovery performance of OFC. For the same number of feedbacks, the decoding overhead required by this invention is lower than that of HTLO, IO-FSS-HTLO and EBDS. For the same maximum number of coded symbols and the same number of feedbacks, the bit error rate of this invention is lower than that of HTLO and EBDS. This invention also adapts more flexibly to different channel erasure probabilities, and the agent's performance across different channels is better than that of the HTLO method.

Attached Figure Description

[0021] The present invention will now be described in further detail with reference to the accompanying drawings.

[0022] Figure 1 is a flowchart of the present invention.

[0023] Figure 2 is a schematic diagram of the overall degree value distribution of the present invention.

[0024] Figure 3 is a graph of decoding overhead versus channel erasure probability for the present invention and the comparative examples.

[0025] Figure 4 is a graph of bit error rate versus the upper limit on the number of coded symbols for the present invention and the comparative examples.

Detailed Implementation

[0026] Example 1:

[0027] In this embodiment, it is assumed that there are two reconnaissance drones, A and B, carrying mobile terminals [symbol missing] and [symbol missing] respectively. One terminal now needs to report target information to the other, that is, to send a message of length [number missing]. Considering that the enemy may hold electromagnetic dominance, online fountain codes, which have strong anti-jamming capability, are used as the channel coding method. Furthermore, because the computing power of the reconnaissance aircraft is limited, the encoding and decoding process should not consume excessive resources; therefore, a limited-feedback online fountain code is adopted, with the number of feedbacks limited to [number missing].

[0028] As shown in Figure 1, the feedback optimization method for online fountain codes is a reinforcement learning-based segmented degree distribution (RL-SD) method, which includes the following steps:

[0029] Step S1: For each proportion of decoded (recovered) symbols, determine the corresponding optimal coding degree value, and obtain the overall degree value distribution of the online fountain code from the probability with which each coding degree value appears as the optimal coding degree value. The overall degree value distribution of this embodiment is shown in Figure 2;

[0030] The OFC of this embodiment is a segmented degree distribution OFC without a build-up phase. In the prior art, the transmitted degree value increases gradually, whereas in this embodiment the transmitted degree value is drawn at random from the overall degree value distribution. In the prior art, OFC encoding requires a build-up phase, in which symbols with a degree value of 2 are sent continuously until a certain number of symbols have been recovered. This embodiment eliminates that phase and sends symbols according to a degree distribution throughout, making it an OFC with no build-up (OFCNB). This improves intermediate performance and facilitates the optimization of decoding overhead.

[0031] Specifically, [symbol missing] is calculated according to the formula [formula missing] and [symbol missing] according to the formula [formula missing], where [symbol missing] is the proportion of decoded (recovered) symbols and [symbol missing] is the coding degree value. The optimal coding degree value is calculated according to the formula [formula missing]. The optimal coding degree values at each stage are tallied at the sending end, and the probability of each coding degree value being optimal is obtained by dividing the number of times it occurs as the optimal coding degree value by the total number of coding degree values tallied. This probability distribution is the overall degree value distribution;

[0032] The expectation of the proportion of decoded (recovered) symbols is [formula missing]. Once the current optimal coding degree value has been determined, the proportion of recovered symbols at the next decoding stage is estimated from this expectation in order to determine the next optimal coding degree value, where [symbol missing] is the average degree value of the unrecovered source symbols after a given coded symbol has been received and processed, [formulas missing], and [symbol missing] is the number of coded symbols stored in the decoder after a given coded symbol has been received;

[0033] Step S2: Model the receiver as a reinforcement learning agent. The agent is a deep neural network consisting of one input layer with 2 nodes, 3 fully connected layers of [number missing] nodes each, and one output layer of [number missing] nodes. The 2 input nodes correspond to a state space of size 2, whose components are the number of case-M coded symbols stored in the decoder after a given coded symbol has been received, and the coding degree value fed back by the decoder at the previous feedback. The number of source symbols satisfies [condition missing]. A case-M coded symbol stored in the decoder is a coded symbol formed as the modulo-2 sum of more than 2 unrecovered symbols and any number of recovered symbols;

[0034] Step S3: Model the encoder, the decoder, and the binary erasure channel as a reinforcement learning environment. The transmitter first sends coded symbols with a degree value of 1 until the proportion of symbols recovered at the receiver reaches a preset value; sending degree-1 symbols in this stage gives better intermediate performance. The system then enters the learning state. When the proportion of recovered symbols reaches the current feedback point, the reinforcement learning environment refreshes the state: the stored count of case-M coded symbols is set to its current value, and the last fed-back coding degree value is set to the degree value most recently fed back by the decoder. The agent observes this state and takes an action; the decoder updates the current coding degree value to the value given by the action, and the sending end sends coded symbols according to the corresponding degree distribution until the proportion of recovered symbols reaches the next feedback point. If decoding has not been completed after the agent's action, the reward is [formula missing] and the decoding process is interrupted; if decoding has been completed, the reward is [formula missing], which depends on the number of coded symbols sent under each degree distribution within its degree value range. Each degree distribution is obtained by truncating the overall degree value distribution to a degree range, and a feedback point is the point at which two degree values have equal effective probability. The effective probability of degree value [symbol missing] is [formula missing]. [symbols missing] are preset parameters;

[0035] Suppose the previous feedback degree value is [symbol missing] and the action given by the agent is [symbol missing]. The reinforcement learning environment then performs the following operations based on the agent's action:

[0036] Update the state space: the stored-state symbol count m is updated to [symbol missing], and the current feedback degree value is updated to [symbol missing];

[0037] The sending end sends coded symbols according to the degree distribution [symbol missing] until the proportion of recovered symbols reaches [symbol missing], where the expression of [symbol missing] is [formula missing];

[0038] The reward that the reinforcement learning environment returns to the agent is [formula missing];

[0039] Now suppose the previous feedback degree value is [symbol missing] and that, once the agent gives the action [symbol missing], the number of feedbacks reaches its limit. The reinforcement learning environment then performs the following operations based on the agent's action:

[0040] Update the state space: the stored-state symbol count m is updated to [symbol missing], and the current feedback degree value is updated to [symbol missing];

[0041] The sending end sends coded symbols according to the degree distribution [symbol missing] until the proportion of recovered symbols reaches [symbol missing], where the expression of [symbol missing] is [formula missing];

[0042] Subsequently, the sending end sends coded symbols according to the degree distribution [symbol missing] until the proportion of recovered symbols reaches 1, where the expression of [symbol missing] is [formula missing];

[0043] Since the decoding process is then complete, the environment returns to the agent a reward equal to the negative of the number of coded symbols sent, [formula missing];
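The channel part of this environment can be sketched in a few lines of Python. The sketch models only the binary erasure channel (each coded symbol is independently lost with the erasure probability) and the resulting inflation in the number of symbols that must be sent; the function names and the example parameters are hypothetical.

```python
import random


def transmit_over_bec(symbols, erasure_prob, rng):
    """Return the symbols that survive a binary erasure channel.

    Each symbol is dropped independently with probability erasure_prob;
    surviving symbols arrive unchanged.
    """
    return [s for s in symbols if rng.random() >= erasure_prob]


def expected_transmissions(symbols_needed, erasure_prob):
    """Average number of symbols to send so that symbols_needed arrive."""
    if not 0.0 <= erasure_prob < 1.0:
        raise ValueError("erasure_prob must be in [0, 1)")
    return symbols_needed / (1.0 - erasure_prob)


if __name__ == "__main__":
    rng = random.Random(2022)           # fixed seed for repeatability
    sent = list(range(1000))
    received = transmit_over_bec(sent, erasure_prob=0.1, rng=rng)
    print(len(received), "of", len(sent), "symbols received")
    print(expected_transmissions(90, 0.1))
```

Because the terminal reward is the negative of the number of coded symbols sent, a higher erasure probability makes every feedback schedule more costly, which is why the trained agent's behavior is evaluated at several erasure probabilities.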

[0044] Step S4: Train the agent using the DQN algorithm together with the reinforcement learning environment of step S3. During training, each feedback is defined as a step, and each successful decoding is defined as a round. The training settings are: learning rate [value missing], number of training samples per update [value missing], number of training rounds [value missing], target network update period [value missing], discount factor [value missing], and maximum number of steps per round [value missing];

[0045] Step S5: Deploy the trained agent at the receiving end to interact with the environment of step S3. The next feedback degree value is determined from the number of feedbacks and the feedback degree value, [symbol missing] and [symbol missing]; the optimal feedback degree value is recorded, and the corresponding feedback point is calculated from [symbol missing] and the optimal feedback degree value. When the proportion of recovered symbols at the decoding end reaches that feedback point, the receiving end sends feedback to notify the sending end to switch the degree value, and the sending end then sends coded symbols according to the corresponding degree distribution. Here, [symbol missing] is the probability that, at a given proportion of recovered symbols, a coded symbol contains exactly one unrecovered symbol, and [symbol missing] is the probability that it contains exactly two unrecovered symbols. The calculation formula is: [formula missing];
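Training in step S4 and deployment in step S5 both rest on two standard DQN ingredients, sketched here in plain Python: epsilon-greedy action selection (greedy when epsilon is 0, as at deployment) and the bootstrapped learning target computed from a target network's Q-values. The function names, the example rewards, and the discount factor of 0.5 are hypothetical and are not the patent's training settings.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_targets(rewards, next_q_values, dones, gamma):
    """Standard DQN targets for a batch of transitions.

    Terminal transitions (done=True) use the reward alone; all others add
    the discounted maximum target-network Q-value of the next state.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        targets.append(reward if done else reward + gamma * max(q_next))
    return targets


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng))
    # One non-terminal feedback step and one terminal step whose reward is
    # the negative number of coded symbols sent (values are made up).
    print(dqn_targets([-10.0, -25.0], [[-5.0, -3.0], [0.0, 0.0]],
                      [False, True], gamma=0.5))
```

Each index in the Q-value list would correspond to one candidate feedback degree value, so the greedy choice at deployment is the recorded optimal feedback degree value for the observed state.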

[0046] Figure 3 shows, for [parameters missing] and a receiver using BP decoding, the relationship between decoding overhead and channel erasure probability for the RL-SD of this embodiment, HTLO, IO-FSS-HTLO, EBDS, and OFCNB without a feedback limit. It can be seen from Figure 3 that, across the different channel erasure probabilities, the decoding overhead of RL-SD in this embodiment is the closest to that of OFCNB;

[0047] Figure 4 shows, for [parameters missing] and channel erasure probability [value missing], the relationship between the bit error rate and the upper limit on the number of transmitted symbols. It can be seen from Figure 4 that when the allowed number of coded symbols is greater than 184, the bit error rate of RL-SD is closer to that of OFCNB than the other schemes are. More specifically, when the receiver also uses BP decoding, the number of coded symbols required for full recovery is 166.4 with RL-SD, 175.4 with the HTLO method, 172.8 with the IO-FSS-HTLO method, and 186.9 with the EBDS method. RL-SD in this embodiment is therefore the closest to the 164.6 coded symbols required for full recovery by OFCNB without a limit on the number of feedbacks.
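The comparison can be restated as each scheme's gap to the unlimited-feedback OFCNB reference. The Python snippet below uses only the symbol counts quoted above; the function name and the choice to report the gap both as a symbol count and as a percentage are illustrative.

```python
# Coded symbols required for full recovery, as reported in this embodiment.
SYMBOLS_NEEDED = {
    "RL-SD": 166.4,
    "HTLO": 175.4,
    "IO-FSS-HTLO": 172.8,
    "EBDS": 186.9,
}
OFCNB_UNLIMITED_FEEDBACK = 164.6


def gap_to_reference(needed, reference):
    """Map each scheme to (extra symbols, extra symbols in percent)."""
    return {
        name: (round(count - reference, 1),
               round(100.0 * (count - reference) / reference, 2))
        for name, count in needed.items()
    }


if __name__ == "__main__":
    gaps = gap_to_reference(SYMBOLS_NEEDED, OFCNB_UNLIMITED_FEEDBACK)
    for name, (extra, percent) in sorted(gaps.items(), key=lambda kv: kv[1]):
        print(f"{name}: +{extra} symbols ({percent}% above OFCNB)")
```

RL-SD needs 1.8 symbols more than the reference, whereas the next-best scheme, IO-FSS-HTLO, needs 8.2 more.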

[0048] Example 2:

[0049] Suppose there are two satellites, A and B, carrying mobile terminals [symbol missing] and [symbol missing] respectively. One terminal now needs to confirm whether the other is working properly, that is, it sends a paging message of length [number missing], and the other terminal responds after receiving it. Given the space communication scenario, online fountain codes, which have strong anti-interference capability, are used as the channel coding method between the satellites. In addition, because the current state of the paged terminal is unknown and the uplink between the two terminals may be severely blocked, an online fountain code with limited feedback is used.

[0050] Let [parameter values missing]. Because [condition missing], the size of the hidden layers can be reduced appropriately to simplify the system; for example, the number of nodes in each fully connected layer can be reduced to 64. The other steps are the same as in Example 1 and are not repeated here. When the channel erasure probability is [value missing] and the receiver also uses BP decoding, the number of coded symbols required for full recovery with the RL-SD method is 53.8, while the OFCNB-HTLO method requires 55.4. The proposed method is therefore closer to the 46.9 coded symbols required for full recovery by OFCNB without a limit on the number of feedbacks.

[0051] The above description is merely a preferred embodiment of the present invention and should not be construed as limiting the scope of the present invention. All equivalent changes and modifications made in accordance with the scope of the patent application and the contents of the specification of the present invention should still fall within the scope of the patent of the present invention.

Claims

1. A feedback optimization method for online fountain codes, characterized in that it includes the following steps: Step S1: For each proportion of decoded (recovered) symbols, determine the corresponding optimal coding degree value, and obtain the overall degree value distribution of the online fountain code from the probability with which each coding degree value appears as the optimal coding degree value; Step S2: Model the receiver as a reinforcement learning agent, the agent being a deep neural network consisting of one input layer with 2 nodes, several fully connected layers of [number missing] nodes each, and one output layer of [number missing] nodes, where the 2 input nodes correspond to a state space of size 2 whose components are the number of case-M coded symbols stored in the decoder after a given coded symbol has been received and the coding degree value fed back by the decoder at the previous feedback, and the number of source symbols satisfies [condition missing]; Step S3: Model the encoder, the decoder, and the binary erasure channel as a reinforcement learning environment. When the proportion of recovered symbols reaches the current feedback point, the stored count of case-M coded symbols is updated to its current value and the last fed-back coding degree value is updated to the degree value most recently fed back by the decoder. The agent observes this state and takes an action; the decoder updates the current coding degree value to the value given by the action, and the sending end sends coded symbols according to the corresponding degree distribution until the proportion of recovered symbols reaches the next feedback point. If decoding has not been completed after the agent's action, the reward is [formula missing] and the decoding process is interrupted; if decoding has been completed, the reward is [formula missing], which depends on the number of coded symbols sent under each degree distribution within its degree value range. Each degree distribution is obtained by truncating the overall degree value distribution to a degree range, and a feedback point is the point at which two degree values have equal effective probability; Step S4: Train the agent using the DQN algorithm and the reinforcement learning environment of step S3, where during training each feedback is defined as a step and each successful decoding is defined as a round; Step S5: Let the trained agent interact with the environment of step S3. Based on [symbol missing] and [symbol missing], record the optimal feedback degree value, and from the previous and the optimal feedback degree values calculate the corresponding feedback point. When the proportion of recovered symbols at the decoding end reaches that feedback point, the receiving end sends feedback to notify the sending end to switch the degree value, and the sending end then sends coded symbols according to the corresponding degree distribution, where [symbol missing] is the probability that, at a given proportion of recovered symbols, a coded symbol contains exactly one unrecovered symbol, and [symbol missing] is the probability that it contains exactly two unrecovered symbols. The calculation formula is [formula missing].

2. The feedback optimization method for online fountain codes according to claim 1, characterized in that: [symbol missing] is calculated according to the formula [formula missing] and [symbol missing] according to the formula [formula missing], where [symbol missing] is the coding degree value. The optimal coding degree value is calculated according to the formula [formula missing]. The number of times each coding degree value appears as the optimal coding degree value is counted, and from these counts the probability of each coding degree value appearing as the optimal one is obtained. This probability distribution is the overall degree value distribution.

3. The feedback optimization method for online fountain codes according to claim 2, characterized in that: the expectation of the proportion of decoded (recovered) symbols is [formula missing]. Once the current optimal coding degree value has been determined, the proportion of recovered symbols at the next decoding stage is estimated from this expectation in order to determine the next optimal coding degree value, where [symbol missing] is the average degree value of the unrecovered source symbols after a given coded symbol has been received and processed, and [formulas missing].

4. A feedback optimization method for online fountain codes according to claim 1, 2, or 3, characterized in that: a case-M coded symbol stored in the decoder is a coded symbol formed as the modulo-2 sum of more than 2 unrecovered symbols and any number of recovered symbols.

5. A feedback optimization method for online fountain codes according to claim 1, 2, or 3, characterized in that: the effective probability of degree value [symbol missing] is [formula missing].

6. A feedback optimization method for online fountain codes according to claim 1, 2, or 3, characterized in that: the degree distribution [symbol missing] is specifically [formula missing].

7. A feedback optimization method for online fountain codes according to claim 1, 2, or 3, characterized in that: in step S3, the transmitting end first sends coded symbols with a degree value of 1 until the proportion of symbols recovered at the receiving end reaches a preset value.