PCIe communication system based on FPGA and communication method thereof

By using an FPGA-based PCIe communication system, the system clock rate and data bit width are dynamically adjusted, which solves the problem of insufficient data bandwidth caused by the low PCIe clock rate of the FPGA platform, and enables normal communication and link training between the FPGA platform and other PCIe devices.

CN115935875B · Active · Publication Date: 2026-06-19 · S2C

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
S2C
Filing Date
2023-01-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Because the PCIe clock rate of the FPGA platform is slow, the data bandwidth cannot meet the requirements of the PCIe protocol, link training cannot proceed normally, and the PCIe devices at both ends of the FPGA cannot communicate normally.

Method used

An FPGA-based PCIe communication system was designed, including a clock module, a data receiving module, a data sending module, a data scrambling/descrambling module, and a link training state machine detection module. By identifying the current link PCIe communication protocol version and data bit width, the system clock rate and data bit width are dynamically adjusted to achieve asynchronous data synchronization across clock domains and detection of the link training state machine, thus solving the communication problem between the FPGA platform and other PCIe devices.

Benefits of technology

It enables normal communication between the FPGA platform and other PCIe devices, dynamically adapts data transmission bandwidth, solves the problem of insufficient data bandwidth caused by the low PCIe clock rate of the FPGA platform, and ensures successful link training.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN115935875B_ABST

Abstract

This disclosure relates to an FPGA-based PCIe communication system and its communication method. The FPGA-based PCIe communication system identifies the current link PCIe communication protocol version and data bit width through a clock module, and outputs the system clock corresponding to the current link PCIe clock rate. A data receiving module receives and synchronizes asynchronous data across clock domains as system clock domain data, and sends the system clock domain data to a data sending module and a data scrambling / descrambling module. The data scrambling / descrambling module descrambles the system clock domain data of PCIe communication protocol version 3.0 and above, and sends it to a link training state machine detection module. The link training state machine detection module detects the descrambled system clock domain data, analyzes the state of the link training state machine, and outputs the data packet in the state of the link training state machine to the data sending module. The data sending module outputs the system clock domain data and data packet after data bit width processing. This enables normal communication between PCIe devices at both ends of the FPGA.

Description

Technical Field

[0001] This invention belongs to the field of integrated circuit chip design technology, and specifically relates to an FPGA-based PCIe communication system and its communication method.

Background Technology

[0002] Peripheral Component Interconnect Express (PCIe) is a next-generation high-speed I/O interconnect standard proposed by Intel and is widely used in devices such as personal computers and servers. Driven by the growing demand for high-speed data transmission, the protocol has iterated to PCIe 5.0, which runs at 32 GT/s and provides roughly 4000 MB/s of data bandwidth per lane.

[0003] The PCIe protocol is divided into three layers: the transaction layer, the data link layer, and the physical layer. Data originates at the transaction layer, passes through the data link layer, and is finally sent to the other PCIe device by the physical layer. Before two PCIe devices can communicate, link training must first be performed at the physical layer, that is, initializing the link's physical layer, port configuration information, transmit and receive modules, and related link states, and discovering the topology at the other end of the link, so that the devices at both ends of the link can ultimately exchange data. Normal data communication can begin only after link training is complete. During link training, each PCIe device recognizes the incoming data to drive the state transitions of the Link Training and Status State Machine (LTSSM). Most LTSSM transitions are triggered by detecting specific sequences: as soon as the expected sequence is detected, the device moves to the corresponding state.

[0004] However, because of the limitations of the FPGA platform, PCIe implemented on an FPGA cannot reach the clock rate specified by the protocol, which leaves the data bandwidth insufficient. When the FPGA platform's PCIe communicates with another PCIe device, its slow clock means that a single data transfer from the FPGA side may be sampled, for example, 10 times by the connected device, so the peer sees the same data 10 times over. These repeated samples cause erroneous LTSSM state transitions and ultimately prevent link training from completing, and without completed link training the two PCIe devices cannot communicate.
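To see why repeated sampling breaks sequence-based state transitions, the short Python sketch below models a fast receiver that samples each slow-side symbol 10 times and then searches for a fixed symbol pattern. The symbol names and the four-symbol pattern are hypothetical stand-ins chosen for illustration, not the real layout of a PCIe training ordered set.

```python
# Illustrative only: PATTERN is a hypothetical stand-in for an ordered set.
PATTERN = ["COM", "LINK", "LANE", "TS1"]


def oversample(symbols, factor):
    """Model a fast receiver that samples each slow-side symbol `factor` times."""
    return [s for s in symbols for _ in range(factor)]


def count_pattern(stream, pattern):
    """Count positions where `pattern` appears as a contiguous run of symbols."""
    n = len(pattern)
    return sum(1 for i in range(len(stream) - n + 1) if stream[i:i + n] == pattern)


sent = PATTERN * 8                                   # eight back-to-back copies
print(count_pattern(sent, PATTERN))                  # 8
print(count_pattern(oversample(sent, 10), PATTERN))  # 0: duplicated symbols hide every match
```

The receiver never sees the expected contiguous sequence once each symbol is repeated, which is the failure mode the paragraph describes.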

[0005] The commonly used remedy is to widen the data path, which raises the data bandwidth at a given clock rate. This method, however, is no longer adequate. First, the PCIe protocol version can change during operation, and each version step doubles the data bandwidth and therefore doubles the required clock rate, making adaptation to the PCIe protocol even more difficult. For example, in PCIe 1.0 the per-lane data bandwidth is 250 MB/s: with an 8-bit data width the clock rate is 250 MHz, with a 16-bit data width it is 125 MHz, and with a 32-bit data width it is 62.5 MHz. When the link moves to PCIe 2.0, the data bandwidth becomes 500 MB/s, and the clock rate is 500 MHz at 8 bits, 250 MHz at 16 bits, and 125 MHz at 32 bits. Widening the data path can therefore bring the clock rate down to 62.5 MHz in PCIe 1.0, but only to 125 MHz in PCIe 2.0 at the same 32-bit width. Widening the data path also introduces two problems. First, a wider bus causes more severe EMI between parallel data lines, producing unexpected and hard-to-troubleshoot problems during data transmission. Second, the data width cannot be increased indefinitely: it increases placement and routing complexity and resource pressure, and the protocol itself caps the width. The PIPE protocol supports a maximum data width of 64 bits, so in theory widening the bus can reduce the clock rate by at most a factor of 8 relative to an 8-bit bus. Moreover, while widening the bus lowers the required clock rate, the clock rate the FPGA can actually achieve is bounded by the complexity of the current design and the performance of the FPGA. With the maximum 8× reduction, the lowest reachable clock rate (PCIe 2.0 at 64 bits) is 62.5 MHz, so this method requires the FPGA to run the PCIe logic at no less than 62.5 MHz. PCIe on an FPGA typically cannot reach 62.5 MHz, so the traditional method no longer applies.
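The clock rates quoted above follow from dividing the per-lane bandwidth by the number of bytes carried per clock. A minimal Python check of that arithmetic, using only the PCIe 1.0 and 2.0 bandwidth figures given in the text (`pipe_clock_mhz` is a hypothetical helper name):

```python
# Per-lane bandwidth figures from the paragraph above, in MB/s.
LANE_MBPS = {"PCIe 1.0": 250, "PCIe 2.0": 500}


def pipe_clock_mhz(lane_mbps, width_bits):
    """Clock (MHz) needed to carry lane_mbps through a width_bits-wide parallel bus."""
    return lane_mbps / (width_bits / 8)


for gen, bw in LANE_MBPS.items():
    print(gen, {w: pipe_clock_mhz(bw, w) for w in (8, 16, 32, 64)})
```

Going from 8 to 64 bits divides the clock by 8, so PCIe 2.0 bottoms out at 62.5 MHz, which is the floor the paragraph refers to.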

[0006] Second, the data bit width cannot be increased indefinitely. A wider data path leads to a shortage of routing resources and greater routing difficulty, and, most seriously, EMI noise between parallel data lines makes the data unstable. Finally, the PIPE protocol limits the data width to at most 64 bits, so in theory widening the data path can reduce the clock rate by at most a factor of 8.

Summary of the Invention

[0007] This invention overcomes at least one shortcoming of the prior art by providing an FPGA-based PCIe communication system and communication method. It addresses the problem that, because the PCIe devices at the two ends of the FPGA run at inconsistent speeds, the data bandwidth cannot meet the protocol requirements and the PCIe devices at both ends cannot communicate normally.

[0008] According to one aspect of this disclosure, an FPGA-based PCIe communication system is proposed, the system comprising: a clock module, a data receiving module, a data sending module, a data scrambling / descrambling module, and a link training state machine detection module;

[0009] The clock module is used to identify the current link PCIe communication protocol version and data bit width, and output a system clock that conforms to the clock rate of the current link PCIe communication protocol version to the data receiving module, data sending module, data scrambling / descrambling module and link training state machine detection module.

[0010] The data receiving module is used to receive asynchronous data across clock domains, synchronize the asynchronous data across clock domains into system clock domain data, and send the synchronized system clock domain data to the data sending module and the data scrambling / descrambling module.

[0011] The data scrambling / descrambling module is used to descramble the synchronized system clock domain data when the link runs PCIe communication protocol version 3.0 or above, and to send the descrambled data to the link training state machine detection module.

[0012] The link training state machine detection module is used to detect the descrambled system clock domain data, analyze the state of the link training state machine based on the descrambled system clock domain data, and output the data packets in the state of the link training state machine to the data sending module.

[0013] The data sending module is used to process the data bit width of the synchronized system clock domain data and of the data packets for the current link training state machine state, and to output the system clock domain data and the data packets after data bit width processing.

[0014] In one possible implementation, the data scrambling / descrambling module is further configured to scramble, for PCIe communication protocol version 3.0 and above, the data packets output by the link training state machine detection module, and to output the scrambled data packets to the data sending module.

[0015] In one possible implementation, the clock module includes a frequency divider circuit and a data selector;

[0016] The frequency divider circuit is used to divide the input reference clock into two divided clocks using an internal counter.

[0017] The data selector is used to identify the current link PCIe communication protocol version and data bit width and, based on them, to select one of the two divided clocks as the system clock, so that the output clock rate conforms to the current link PCIe communication protocol version.

[0018] In one possible implementation, the cross-clock domain asynchronous data includes: asynchronous data sent from the FPGA platform's PCIe to other PCIe devices, and asynchronous data sent from other PCIe devices to the FPGA platform's PCIe, wherein the clock rate of the FPGA platform's PCIe is lower than the clock rate of the other PCIe devices.

[0019] In one possible implementation, the data receiving module includes an asynchronous handshake circuit and an asynchronous FIFO circuit;

[0020] The asynchronous handshake circuit is used to synchronize asynchronous data sent from the FPGA platform's PCIe to other PCIe devices into system clock domain data.

[0021] The asynchronous FIFO circuit is used to synchronize asynchronous PCIe data sent from other PCIe devices to the FPGA platform into system clock domain data.

[0022] In one possible implementation, the data scrambling / descrambling module includes a linear feedback shift register, a first data selector, a second data selector, and a scrambling / descrambling module;

[0023] The linear feedback shift register is used to scramble, for PCIe communication protocol version 3.0 and above, the data packets output by the link training state machine detection module.

[0024] The first data selector is used to select the output of a linear feedback shift register with a corresponding data bit width according to the data bit width of the current link PCIe;

[0025] The second data selector is used to select the data packets to be scrambled according to the clock rate of the current link PCIe communication protocol version;

[0026] The scrambling / descrambling module is used to perform a logical operation between the output of the linear feedback shift register for the corresponding data bit width and the data packet to be scrambled.

[0027] In one possible implementation, system clock domain data of PCIe communication protocol version 3.0 and above is descrambled by passing the data scrambled by the linear feedback shift register through the same scrambling operation a second time.

[0028] In one possible implementation, the data sending module includes a data bit width conversion module, a counter, and a data selector;

[0029] The data bit width conversion module is used to convert the data bit width of the synchronized system clock domain data and the data packet in the state of the link training state machine.

[0030] The counter is used to count the system clock, and generates an SKP sequence when the count value reaches a preset value.

[0031] The data selector is used to output the SKP sequence, the system clock domain data after data bit width processing, and the data packets in the state of the link training state machine in sequence according to priority.
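The counter and priority selector of the data sending module can be sketched at cycle level as follows. This is a hedged Python model, not the patented circuit: the priority order (SKP sequence first, then processed system clock domain data, then LTSSM data packets) is read from the order in which the text lists them, and `run_sender` and the `skp_interval` preset are hypothetical names chosen for illustration.

```python
from collections import deque


def run_sender(data_words, ltssm_words, cycles, skp_interval):
    """Cycle-level sketch of the sending side: counter plus priority selector.

    Assumed priority (highest first): SKP sequence, processed system clock
    domain data, LTSSM data packets. Returns the item chosen on each cycle,
    or "IDLE" when nothing is waiting.
    """
    data, ltssm = deque(data_words), deque(ltssm_words)
    count, out = 0, []
    for _ in range(cycles):
        count += 1                      # counter advances on every sys_clk cycle
        if count == skp_interval:       # preset value reached: emit an SKP sequence
            count = 0
            out.append("SKP")
        elif data:
            out.append(data.popleft())
        elif ltssm:
            out.append(ltssm.popleft())
        else:
            out.append("IDLE")
    return out


print(run_sender(["D0", "D1", "D2", "D3", "D4"], ["TS1", "TS2"], cycles=10, skp_interval=4))
```

Because the SKP check comes first, a pending SKP preempts both data sources on the cycle the counter reaches the preset, and the LTSSM packets only go out when no processed data is waiting.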

[0032] In one possible implementation, processing the data bit width of the synchronized system clock domain data and of the data packets for the link training state machine state includes:

[0033] Detecting the data bit width of the synchronized system clock domain data and of the data packets for the link training state machine state;

[0034] If that data bit width is less than the output bit width of the data sending module, buffering the synchronized system clock domain data and the data packets first, and then outputting them;

[0035] If that data bit width is greater than or equal to the output bit width of the data sending module, outputting the system clock domain data and the data packets sequentially, in the order in which they were input.
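Both branches of the bit width processing can be captured by a single byte-accumulator loop. The Python sketch below is one possible reading, with words modeled as lists of bytes: narrower input words are buffered until a full output word exists, and wider input words are emitted as output-width slices in input order. Splitting wider words into slices is our interpretation of "output sequentially", and `convert_width` is a hypothetical name.

```python
def convert_width(words, in_bytes, out_bytes):
    """Sketch of data bit width processing; words are lists of bytes.

    Narrower input: bytes accumulate in `buf` until a full output word exists.
    Wider or equal input: each word is emitted as output-width slices, in order.
    """
    out, buf = [], []
    for w in words:
        if len(w) != in_bytes:
            raise ValueError("unexpected input word size")
        buf.extend(w)                    # buffer incoming bytes
        while len(buf) >= out_bytes:     # emit every complete output word
            out.append(buf[:out_bytes])
            buf = buf[out_bytes:]
    return out, buf                      # buf: partial word still awaiting input


# 8-bit input packed onto a 16-bit output
print(convert_width([[0x11], [0x22], [0x33]], in_bytes=1, out_bytes=2))
# 32-bit input emitted as two 16-bit words
print(convert_width([[0x11, 0x22, 0x33, 0x44]], in_bytes=4, out_bytes=2))
```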

[0036] According to another aspect of this disclosure, a PCIe communication method based on FPGA is proposed, the method comprising:

[0037] The data receiving module receives asynchronous data sent from the FPGA platform's PCIe to other PCIe devices or from other PCIe devices to the FPGA platform's PCIe, and synchronizes the asynchronous data into system clock domain data.

[0038] The system clock domain data of PCIe communication protocol version 3.0 and above is descrambled using a data scrambling / descrambling module to generate the original SKP sequence of the system clock domain data;

[0039] The original SKP sequence is detected and analyzed using the link training state machine detection module to obtain the state of the link training state machine and generate data packets under the state of the link training state machine.

[0040] The data packets output by the link training state machine detection module of PCIe communication protocol version 3.0 and above are scrambled using a data scrambling / descrambling module;

[0041] The data sending module outputs the scrambled data packets from the link training state machine detection module together with the system clock domain data.

[0042] This disclosure discloses an FPGA-based PCIe communication system and method. The clock module identifies the current link PCIe communication protocol version and data bit width, and outputs a system clock conforming to the clock rate of the current link PCIe communication protocol version to the data receiving module, data sending module, data scrambling / descrambling module, and link training state machine detection module. The data receiving module receives asynchronous data across clock domains, synchronizes the asynchronous data to system clock domain data, and sends the synchronized system clock domain data to the data sending module and the data scrambling / descrambling module. The data scrambling / descrambling module is used to scramble and descramble the data... The synchronized PCIe communication protocol version 3.0 and above system clock domain data is descrambled and sent to the link training state machine detection module. The link training state machine detection module detects the descrambled system clock domain data, analyzes the state of the link training state machine based on the descrambled system clock domain data, and outputs the data packet in the state of the link training state machine to the data sending module. The data sending module processes the data bit width of the synchronized system clock domain data and the data packet in the state of the link training state machine, and outputs the system clock domain data and the data packet after data bit width processing. This can solve the problem that the data bandwidth cannot meet the protocol requirements due to the speed inconsistency of the PCIe devices at both ends of the FPGA, resulting in the inability of the PCIe devices at both ends of the FPGA to communicate normally.

[0043] Other optional features and technical effects of the embodiments of the present invention are described in part below and will in part be apparent from reading this document.

Attached Figure Description

[0044] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The elements shown are not limited to the scale shown in the drawings, and the same or similar reference numerals in the drawings denote the same or similar elements, wherein:

[0045] Figure 1 shows a schematic block diagram of an FPGA-based PCIe communication system according to an embodiment of the present disclosure;

[0046] Figure 2 shows a schematic block diagram of an FPGA-based PCIe communication system according to another embodiment of the present disclosure;

[0047] Figure 3 shows a schematic block diagram of a clock module according to an embodiment of the present disclosure;

[0048] Figure 4 shows a schematic block diagram of a data receiving module according to an embodiment of the present disclosure;

[0049] Figure 5 shows a schematic block diagram of a data scrambling / descrambling module according to an embodiment of the present disclosure;

[0050] Figure 6 shows a schematic block diagram of a data sending module according to an embodiment of the present disclosure;

[0051] Figure 7 shows a schematic diagram illustrating the principle of link training state machine detection according to an embodiment of the present disclosure;

[0052] Figure 8 shows a flowchart of an FPGA-based PCIe communication method according to an embodiment of the present disclosure;

[0053] Figure 9 shows a schematic diagram of an application scenario of an FPGA-based PCIe communication system according to an embodiment of the present disclosure;

[0054] Figure 10 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0055] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.

[0056] The term "comprising" and its variations as used herein signify open inclusion, i.e., "including but not limited to". Unless otherwise stated, the term "or" means "and / or". The term "based on" means "at least partially based on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.

[0057] Furthermore, the steps illustrated in the flowcharts of the accompanying drawings can be executed in a computer, such as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than that presented here.

[0058] Figure 1 and Figure 2 show schematic block diagrams of an FPGA-based PCIe communication system according to embodiments of the present disclosure.

[0059] The PCIe communication system (PCIe Speed Bridge) identifies the data packets exchanged between FPGA platform PCIe devices and other high-speed interconnect PCIe devices, and dynamically adjusts the system clock and data bit width to meet the bandwidth requirements of data transmission. It can be used to complete link training between high-speed peripheral interconnect PCIe devices and PCIe devices on the FPGA, enabling the FPGA platform's PCIe to communicate with other high-speed interconnect PCIe devices.

[0060] As shown in Figure 1 and Figure 2, the system may include: a clock module, a data receiving module, a data sending module, a data scrambling / descrambling module, and a link training and status state machine (LTSSM) detection module.

[0061] The clock module is used to identify the current link PCIe communication protocol version and data bit width, and outputs a system clock that conforms to the clock rate of the current link PCIe communication protocol version to the data receiving module, data sending module, data scrambling / descrambling module and link training state machine detection module.

[0062] The PCIe communication system has two clock inputs: a reference clock ref_clk, which is the reference clock for frequency division by the internal clock module, and a MAC layer (Media Access Control, a sub-layer below the data link layer) clock mac_clk, which is the clock of the MAC layer of the PCIe device on the FPGA platform.

[0063] The clock module primarily works from the reference clock ref_clk and the MAC layer clock mac_clk. It identifies the current PCIe communication protocol version and data width and outputs the clock rate corresponding to that protocol version. For example, when the PCIe communication system is initialized, the clock rate is 125 MHz, and the system clock is then adjusted dynamically according to the PCIe protocol version and data width. If the link runs PCIe 1.0 with an 8-bit data width, the required clock rate is 250 MHz; however, the highest clock frequency the FPGA platform can achieve is 125 MHz, so the system clock is set to 125 MHz and the corresponding data width becomes 16 bits.

[0064] Figure 3 A schematic block diagram of a clock module according to an embodiment of the present disclosure is shown.

[0065] In one example, as shown in Figure 3, the clock module may include a frequency divider circuit (DIV) and a data selector (MUX).

[0066] A frequency divider circuit (DIV) is used to divide the input reference clock into two divided clocks using an internal counter.

[0067] The data selector (MUX) is used to identify the current link PCIe communication protocol version and data bit width, and selects the system clock from two divided clocks based on the current link PCIe communication protocol version and data bit width to output a clock rate that conforms to the current link PCIe communication protocol version.

[0068] As shown in Figure 3, when the frequency divider circuit DIV is triggered by the reference clock ref_clk, its internal counter counts and generates two divided clocks (for example, ref_clk_62p5m and ref_clk_125m), which are input to the data selector MUX.
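A counter-based divider of this kind can be modeled by sampling the bits of a free-running counter on each ref_clk cycle. The Python sketch below assumes a 250 MHz ref_clk, which the text does not state (REF_CLK_MHZ is hypothetical): counter bit 0 then toggles at ref/2 (125 MHz) and bit 1 at ref/4 (62.5 MHz), matching the two divided clock names in the text.

```python
REF_CLK_MHZ = 250.0  # hypothetical: the text does not give ref_clk's frequency


def divide(ref_cycles):
    """Sample the two divided clocks produced by a 2-bit counter on ref_clk.

    Counter bit 0 toggles every ref_clk cycle (ref/2); bit 1 toggles every
    second cycle (ref/4). With the assumed 250 MHz reference these model
    ref_clk_125m and ref_clk_62p5m.
    """
    cnt = 0
    ref_clk_125m, ref_clk_62p5m = [], []
    for _ in range(ref_cycles):
        cnt = (cnt + 1) & 0b11          # internal counter wraps every 4 cycles
        ref_clk_125m.append(cnt & 1)
        ref_clk_62p5m.append((cnt >> 1) & 1)
    return ref_clk_125m, ref_clk_62p5m


div2, div4 = divide(8)
print(div2)                              # [1, 0, 1, 0, 1, 0, 1, 0]: period of 2 ref cycles
print(div4)                              # [0, 1, 1, 0, 0, 1, 1, 0]: period of 4 ref cycles
print(REF_CLK_MHZ / 2, REF_CLK_MHZ / 4)  # 125.0 62.5
```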

[0069] The data selector MUX selects the clock rate that conforms to the current link PCIe communication protocol version, based on the rate signal (a PIPE, Physical Interface for PCI Express, protocol signal) and the width signal (the currently effective PIPE data width), and outputs it as the system clock sys_clk. For example, if the current link runs PCIe 1.0, the protocol version number (rate) is 2'b00, and the data width (width) is 2'b01, the MUX outputs the divided clock ref_clk_125m. If the link runs PCIe 1.0 (rate 2'b00) with a data width of 2'b00, the expected clock rate is 250 MHz; because the maximum clock rate this PCIe communication system can achieve is 125 MHz, the output is still the divided clock ref_clk_125m, and the data width must be changed from 2'b00 to 2'b01. This change can be made in the data receiving module. In this way, the clock module dynamically adjusts the clock according to the current PCIe protocol version and data width to meet the bandwidth requirements of data transmission between the FPGA platform's PCIe and other PCIe devices.
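The selection rule can be sketched as: compute the clock the current rate and width would require, widen the data path while that exceeds the 125 MHz ceiling, then pick the slowest divided clock that still suffices. This Python model covers only PCIe 1.0 and 2.0, and the code tables (rate 0b00 = PCIe 1.0, 0b01 = PCIe 2.0; width 0b00 to 0b11 = 8 to 64 bits) are assumed PIPE-style encodings, not values confirmed by the text beyond the two examples above. `select_clock` is a hypothetical name.

```python
# Assumed encodings; only PCIe 1.0 and 2.0 are modeled.
LANE_MBPS = {0b00: 250, 0b01: 500}                       # rate code -> per-lane MB/s
WIDTH_BITS = {0b00: 8, 0b01: 16, 0b10: 32, 0b11: 64}     # width code -> bus width
DIVIDED_CLOCKS = {"ref_clk_62p5m": 62.5, "ref_clk_125m": 125.0}
MAX_MHZ = 125.0  # highest system clock the text says this design can reach


def select_clock(rate, width):
    """Return (divided clock name, effective width code) for a rate/width pair."""
    bw = LANE_MBPS[rate]
    # Widen the data path until the required clock fits under MAX_MHZ.
    while bw / (WIDTH_BITS[width] / 8) > MAX_MHZ:
        if width == 0b11:
            raise ValueError("bandwidth unreachable even at 64 bits")
        width += 1                       # next width code doubles the bus
    need = bw / (WIDTH_BITS[width] / 8)
    # Slowest divided clock that still meets the requirement.
    name = min((n for n, f in DIVIDED_CLOCKS.items() if f >= need),
               key=DIVIDED_CLOCKS.get)
    return name, width


print(select_clock(0b00, 0b01))  # ('ref_clk_125m', 1): first example in the text
print(select_clock(0b00, 0b00))  # ('ref_clk_125m', 1): 250 MHz clamped, width 2'b00 -> 2'b01
```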

[0070] The data receiving module is used to receive asynchronous data across clock domains, synchronize the asynchronous data across clock domains into system clock domain data, and send the synchronized system clock domain data to the data sending module and the data scrambling / descrambling module.

[0071] The asynchronous data across clock domains includes: asynchronous data sent from the FPGA platform's PCIe to other PCIe devices, and asynchronous data sent from other PCIe devices to the FPGA platform's PCIe, where the clock rate of the FPGA platform's PCIe is lower than the clock rate of other PCIe devices.

[0072] For example, because of FPGA limitations, the PCIe clock rate of the FPGA platform may only reach 10 MHz or even lower. When the FPGA platform's PCIe sends data to other PCIe devices through the PCIe Speed Bridge system (the PCIe communication system), the data must cross from a slow clock domain to a fast one; a handshake protocol can be used to handle this slow-to-fast clock domain crossing.

[0073] When other PCIe devices send data to the FPGA platform's PCIe through the PCIe Speed Bridge system, the data must cross from a fast clock domain to a slow one. This can be addressed by buffering the data in an asynchronous FIFO. The asynchronous FIFO has a backpressure mechanism: when it detects that it is nearly full, it generates a credit_full signal indicating that the FIFO memory is about to fill and that the credit value (a term from the PCIe flow control mechanism, used here to represent the remaining FIFO memory space) is insufficient.

[0074] Figure 4 A schematic block diagram of a data receiving module according to an embodiment of the present disclosure is shown.

[0075] In one example, the data receiving module may include an asynchronous handshake circuit and an asynchronous FIFO circuit.

[0076] The asynchronous handshake circuit is used to synchronize asynchronous data sent from the FPGA platform's PCIe to other PCIe devices into system clock domain data. As shown in Figure 4, the asynchronous handshake circuit processes the asynchronous data sent from the FPGA platform's PCIe to other PCIe devices over the PIPE interface (Physical Interface for PCI Express, the PCIe physical layer interface). This data can include signals such as txdata, txdatak, and txdatavalid, all of which pass through the asynchronous handshake circuit via its input port mac_txdatai. The circuit synchronizes this cross-clock-domain data into the system clock domain as follows: the MAC clock mac_clk is passed through a three-stage synchronizer clocked by the system clock sys_clk, and the rising edge of mac_clk is then detected using XOR logic. When a rising edge of mac_clk is detected, the value on the input port mac_txdatai is transferred to the output port mac_txdatao under the control of sys_clk. At this point, the current link's PCIe protocol version number (rate) and data bit width (width) must also be checked. For example, for the PCIe 1.0 case described for the clock module, rate is 2'b00 and the effective data bit width is 2'b00, so the expected clock rate would be 250 MHz; since the maximum clock rate of the PCIe communication system is 125 MHz, the output effective data bit width is set to 2'b01.
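The handshake path can be simulated one sys_clk cycle at a time. In this Python sketch, mac_clk is a hypothetical square wave with a period of `ratio` sys_clk cycles that stops after the last word, each word is launched on a mac_clk rising edge, mac_clk is shifted through three synchronizer flops, and XOR of the last two stages, qualified by the newer stage being high, flags a rising edge. Qualifying the XOR to keep only rising edges, the ratio of 12 (roughly 125 MHz against 10 MHz), and `handshake_sync` itself are assumptions for illustration.

```python
def handshake_sync(words, ratio):
    """Cycle-level sketch of the slow-to-fast handshake path.

    Each loop iteration is one sys_clk cycle; mac_clk has a period of `ratio`
    sys_clk cycles (high for the first half) and stops after the last word.
    """
    if ratio < 2:
        raise ValueError("mac_clk must be at least 2x slower than sys_clk")
    sync = [0, 0, 0]                  # three flip-flops clocked by sys_clk
    captured, word_idx = [], -1
    active = ratio * len(words)       # sys_clk cycles during which mac_clk runs
    for t in range(active + 3 * ratio):
        running = t < active
        mac_clk = 1 if running and t % ratio < ratio // 2 else 0
        if running and t % ratio == 0:
            word_idx += 1             # MAC launches the next word on its rising edge
        mac_txdatai = words[word_idx] if word_idx >= 0 else None
        sync = [mac_clk, sync[0], sync[1]]    # shift mac_clk through the synchronizer
        if (sync[1] ^ sync[2]) and sync[1]:   # XOR flags a change; keep rising edges only
            captured.append(mac_txdatai)      # mac_txdatao takes each word once
    return captured


words = ["W0", "W1", "W2", "W3"]
print(handshake_sync(words, ratio=12))   # ['W0', 'W1', 'W2', 'W3']
```

Sampling mac_txdatai on every sys_clk cycle instead would repeat each word about 12 times, which is the oversampling problem from the background section; edge-qualified capture forwards each word exactly once.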

[0077] The asynchronous FIFO circuit is used to synchronize asynchronous data sent from other PCIe devices to the FPGA platform's PCIe into system clock domain data. As shown in Figure 4, the asynchronous FIFO circuit processes the asynchronous data sent from other high-speed interconnect PCIe devices to the FPGA platform's PCIe. As with the asynchronous handshake circuit, this data must be synchronized: it is connected to the input port phy_txdadai, written using the system clock sys_clk, and read using the MAC clock mac_clk. The asynchronous FIFO circuit also tracks its remaining capacity internally, and when the remaining capacity reaches a preset value it generates the credit_full signal; the preset value accounts for the data the peer PCIe device may still send while it responds to this signal. In other words, by writing the asynchronous data into the asynchronous FIFO, the data sent from other PCIe devices to the FPGA platform's PCIe is carried across the fast-to-slow clock domain boundary and synchronized with the clock domain of the PCIe communication system (PCIe Speed Bridge).
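The credit_full backpressure rule can be illustrated with a behavioral FIFO model. This Python sketch models only occupancy and the early almost-full threshold; a real dual-clock FIFO would also need dual-port RAM and Gray-coded pointers synchronized across the two clocks, which are not modeled. The class name and the `depth` and `headroom` parameters are hypothetical.

```python
from collections import deque


class AsyncFifoModel:
    """Behavioral sketch of the receive-side FIFO with a credit_full flag.

    Only occupancy and the almost-full threshold are modeled; depth and
    headroom are hypothetical parameters.
    """

    def __init__(self, depth, headroom):
        self.depth, self.headroom = depth, headroom
        self.q = deque()

    @property
    def credit_full(self):
        # Asserted while `headroom` slots remain, so words the peer already
        # has in flight can still be absorbed after it sees the signal.
        return self.depth - len(self.q) <= self.headroom

    def write(self, word):             # fast side (sys_clk)
        if len(self.q) >= self.depth:
            raise OverflowError("FIFO overflow: backpressure arrived too late")
        self.q.append(word)

    def read(self):                    # slow side (mac_clk)
        return self.q.popleft() if self.q else None


fifo = AsyncFifoModel(depth=8, headroom=2)
for i in range(6):
    fifo.write(i)
print(fifo.credit_full)   # True: only 2 free slots remain
fifo.read()
print(fifo.credit_full)   # False: 3 free slots after one read
```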

[0078] The data scrambling / descrambling module is used to descramble the system clock domain data of the synchronized PCIe communication protocol version 3.0 and above, and send it to the link training state machine detection module; and to scramble the data packets output by the link training state machine detection module of the PCIe communication protocol version 3.0 and above, and output the scrambled data packets to the data sending module.

[0079] The data scrambling / descrambling module mainly handles the descrambling of data for PCIe 3.0 and above protocols (this part of the logic is designed specifically for protocols 3.0 and above, and mainly detects control messages; control messages are not scrambled in 1.0 and 2.0, so descrambling is not required). In PCIe protocol versions 3.0 and above, the sequences in the LTSSM state machine are scrambled, so data packets need to be descrambled.

[0080] Communication protocols in PCIe 3.0 and later versions can scramble data packets (system clock domain data) using an LFSR (Linear Feedback Shift Register). The polynomial of the LFSR is G(X) = X. 23 +X 21 +X 16 +X 8 +X 5 +X 2 +1, where X represents a data packet.

[0081] The PCIe protocol scrambles data packets in parallel, so the descrambling logic is also parallel. Specifically, a data packet that was scrambled with the linear feedback shift register (LFSR) sequence is scrambled again with the same sequence, which descrambles the system clock domain data of PCIe communication protocol version 3.0 and above. The LFSR initializes itself when it detects the COM character in the training sequences (TS) of the current PCIe link's LTSSM (Link Training and Status State Machine). The initialization value differs from lane to lane.
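To illustrate why scrambling a second time descrambles, here is a minimal additive-scrambler sketch built on the G(X) polynomial above. It assumes a Galois-form LFSR clocked one bit at a time, least significant bit first, seeded with the Lane 0 value 0x1DBFBC quoted later in this description. Symbols are modelled as hypothetical (is_control, value) pairs so that COM can be recognized and used to re-initialize the LFSR. The real bit ordering, LFSR form, and choice of which symbols bypass scrambling are defined by the PCIe Base Specification, so this sketch is not bit-exact to it.

```python
# Feedback taps of G(X) = X^23 + X^21 + X^16 + X^8 + X^5 + X^2 + 1 (X^23 is implicit)
TAPS = (1 << 21) | (1 << 16) | (1 << 8) | (1 << 5) | (1 << 2) | 1
MASK23 = (1 << 23) - 1
COM = 0xBC             # control character that re-initializes the LFSR (per the text)
LANE0_SEED = 0x1DBFBC  # Lane 0 initial value quoted in this description


class Lfsr23:
    """Serial Galois-form LFSR for G(X); illustrative, not spec bit order."""

    def __init__(self, seed: int) -> None:
        self.seed = seed & MASK23
        self.state = self.seed

    def reset(self) -> None:
        """Return to the seed, as done when COM is detected."""
        self.state = self.seed

    def next_bit(self) -> int:
        out = (self.state >> 22) & 1            # output from the highest stage
        self.state = (self.state << 1) & MASK23
        if out:
            self.state ^= TAPS                  # reduce modulo G(X)
        return out

    def key_byte(self) -> int:
        """Eight keystream bits, least significant bit first."""
        value = 0
        for i in range(8):
            value |= self.next_bit() << i
        return value


def scramble_symbols(symbols, lfsr: Lfsr23):
    """XOR data symbols with the keystream; control symbols pass through.

    The same function scrambles and descrambles, because XOR with an
    identical keystream is its own inverse.
    """
    out = []
    for is_control, value in symbols:
        if is_control:
            if value == COM:
                lfsr.reset()                    # COM re-initializes the LFSR
            out.append((is_control, value))
        else:
            out.append((is_control, value ^ lfsr.key_byte()))
    return out
```

Because both ends restart from the same seed at every COM, the descrambler stays aligned with the scrambler without any extra signalling.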

[0082] Figure 5 is a schematic block diagram of a data scrambling / descrambling module according to an embodiment of the present disclosure.

[0083] In one example, as shown in Figure 5, the data scrambling / descrambling module may include a linear feedback shift register (LFSR), a first data selector (MUX), a second data selector (MUX), and a scrambling / descrambling module.

[0084] The linear feedback shift register (LFSR) is used to scramble the data packets output by the link training state machine detection module for PCIe communication protocol version 3.0 and above. Because scrambling parallel data requires a different LFSR logic function for each data bit width, the data scrambling / descrambling module, as shown in Figure 5, has four sets of linear feedback shift registers, namely LFSR_8, LFSR_16, LFSR_32, and LFSR_64, which can simultaneously scramble the data packets output by the link training state machine detection module for PCIe communication protocol version 3.0 and above.

[0085] The first data selector (MUX) selects the output of the linear feedback shift register (LFSR) that corresponds to the current PCIe data width (width). The second data selector (MUX) selects the data packet to be scrambled (input data datai) according to the clock rate of the current PCIe communication protocol version (rate). The scrambling / descrambling module performs a logical operation between the output of the LFSR selected for the current data width and the data packet to be scrambled (input data datai), and outputs the result to the output data port (datao).
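A hedged sketch of the Figure 5 datapath follows. Each LFSR_n is modelled by clocking the same G(X) register n times per cycle, which yields the same keystream word that a width-specific parallel XOR network would produce in one clock. The first MUX picks the keystream for the current width; the second MUX is modelled, as an assumption, as a bypass when rate is below 3'b010, matching the later statement that data below PCIe 3.0 is not processed. The logical operation is taken to be XOR, as in an additive scrambler, and the class and function names are hypothetical.

```python
TAPS = (1 << 21) | (1 << 16) | (1 << 8) | (1 << 5) | (1 << 2) | 1  # G(X) taps
MASK23 = (1 << 23) - 1
WIDTHS = (8, 16, 32, 64)   # LFSR_8, LFSR_16, LFSR_32, LFSR_64
RATE_GEN3 = 0b010          # rate code for PCIe 3.0, per this description


def advance(state: int, nbits: int):
    """Clock the G(X) LFSR nbits times; return (keystream word, new state).

    Hardware LFSR_n computes these nbits in one cycle with a dedicated XOR
    network; here the same computation is unrolled serially.
    """
    key = 0
    for i in range(nbits):
        out = (state >> 22) & 1
        key |= out << i
        state = (state << 1) & MASK23
        if out:
            state ^= TAPS
    return key, state


class ScramblerBank:
    """Four width-specific LFSRs plus the two selectors of Figure 5."""

    def __init__(self, seed: int) -> None:
        self.states = {w: seed & MASK23 for w in WIDTHS}

    def step(self, datai: int, width: int, rate: int) -> int:
        keys = {}
        for w in WIDTHS:                   # all four LFSRs advance every cycle
            keys[w], self.states[w] = advance(self.states[w], w)
        key = keys[width]                  # first MUX: select by data width
        if rate < RATE_GEN3:               # second MUX: bypass below PCIe 3.0 (assumed)
            return datai
        return datai ^ key                 # combine and drive datao
```

A 16-bit keystream word is simply two consecutive 8-bit words concatenated, which is why one polynomial can serve every width.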

[0086] The data transmission module is used to process the data bit width of the synchronized system clock domain data and of the data packets for the current link training state machine state, and to output the system clock domain data and data packets after data bit width processing.

[0087] The data transmission module is mainly responsible for sending data in both directions: from the FPGA platform's PCIe to other PCIe devices through the PCIe communication system (PCIe Speed Bridge), and from other PCIe devices to the FPGA platform's PCIe through the PCIe communication system (PCIe Speed Bridge).

[0088] Figure 6 is a schematic block diagram of a data transmission module according to an embodiment of the present disclosure.

[0089] In one example, as shown in Figure 6, the data transmission module may include a data bit width conversion module, a counter cnt, and a data selector MUX.

[0090] The data bit width conversion module is used to convert the data bit width of the synchronized system clock domain data and of the data packets for the current link training state machine state.

[0091] In one example, the process may include: detecting the data bit width of the synchronized system clock domain data and of the data packets for the current link training state machine state;

[0092] If this data bit width is less than the output bit width of the data sending module, buffering the synchronized system clock domain data and the link training state machine data packets before outputting them;

[0093] If this data bit width is greater than or equal to the output bit width of the data sending module, outputting the system clock domain data and the link training state machine data packets sequentially, in the order in which they were input.

[0094] For example, as shown in Figure 6, the synchronized system clock domain data mac_txdatao is output by the data receiving module, and the data packet ltssm_out, representing the current state of the link training state machine, is output by the link training state machine detection module (LTSSM state machine). The module first checks whether the data bit width of mac_txdatao and ltssm_out needs to be converted. If their input data bit width in_width is smaller than the output data bit width out_width, mac_txdatao and ltssm_out are buffered before being output. If in_width is greater than or equal to out_width, mac_txdatao and ltssm_out are output in sequence. This realizes the conversion of the effective data width of the system clock domain data mac_txdatao and the data packet ltssm_out.
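The buffering and sequential-output rules of paragraphs [0091] to [0094] amount to packing narrow words into wide ones, or slicing wide words into narrow ones. The function below is an illustrative sketch under two assumptions the text does not state: the two widths are integer multiples of each other, and the earliest word occupies the least significant bits.

```python
def convert_width(words, in_width: int, out_width: int):
    """Convert a stream of in_width-bit words into out_width-bit words."""
    if max(in_width, out_width) % min(in_width, out_width):
        raise ValueError("widths must be integer multiples of each other")
    if in_width < out_width:
        # Buffer `ratio` input words, then emit one wide word
        # (earliest input in the low bits). A trailing partial group
        # stays in the buffer and is not emitted.
        ratio = out_width // in_width
        usable = len(words) - len(words) % ratio
        out = []
        for i in range(0, usable, ratio):
            wide = 0
            for j in range(ratio):
                wide |= words[i + j] << (j * in_width)
            out.append(wide)
        return out
    # in_width >= out_width: emit each input word's slices in order.
    ratio = in_width // out_width
    mask = (1 << out_width) - 1
    return [(w >> (j * out_width)) & mask for w in words for j in range(ratio)]


print([hex(w) for w in convert_width([0x11, 0x22, 0x33, 0x44], 8, 16)])
# ['0x2211', '0x4433']
```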

[0095] The counter `cnt` counts cycles of the system clock `sys_clk`. When the count reaches a preset value, it generates an SKP sequence. The preset value can be, for example, 1180: as shown in Figure 6, when the counter cnt equals 1180, an SKP sequence is generated and input into the data selector MUX. The SKP sequence is a special sequence on the PCIe bus used to compensate for clock differences. It carries no valid data and serves only to reserve positions: the SKP sequence periodically occupies several positions in the data sequence on the PCIe bus, and the exact number of positions occupied differs between PCIe generations.
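The cnt behaviour can be sketched as a simple cycle counter. This hypothetical function returns the sys_clk cycles on which skp_en would be raised with the preset of 1180. For reference, the PCIe Base Specification schedules SKP ordered sets every 1180 to 1538 symbol times at the 8b/10b rates, so the preset appears to track that lower bound, although here the count is in sys_clk cycles rather than symbol times.

```python
SKP_PRESET = 1180  # preset value from the Figure 6 example


def skp_enable_cycles(num_cycles: int, preset: int = SKP_PRESET):
    """Return the sys_clk cycle indices on which skp_en is asserted.

    cnt increments every cycle; when it reaches the preset, skp_en is
    raised for that cycle and cnt restarts, so one SKP is requested
    every `preset` cycles.
    """
    cnt = 0
    fired = []
    for cycle in range(num_cycles):
        cnt += 1
        if cnt == preset:
            fired.append(cycle)
            cnt = 0
    return fired


print(skp_enable_cycles(2400))  # [1179, 2359]
```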

[0096] The data selector MUX outputs the SKP sequence, the width-converted system clock domain data, and the data packets for the current link training state machine state according to priority. The width-converted data and the generated SKP sequence are input to the data selector MUX, which determines priority as follows: when the SKP enable signal skp_en is high, the SKP sequence has the highest priority; otherwise, if the credit full signal credit_full is high, the data packet ltssm_out is output first; otherwise, when the mac_clk rising-edge detection signal mac_pos (generated by the data receiving module) is high, the system clock domain data mac_txdatao is output; finally, if none of these signals is high, the data packet ltssm_out (the output of the LTSSM state machine detection module) is output. The asynchronous output signal phy_txdatao is output directly after the same data width conversion.
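The priority rules above map onto a short if-chain. In this illustrative sketch, the three data sources are passed in as opaque byte strings, the function name tx_mux is hypothetical, and one call represents one sys_clk cycle.

```python
def tx_mux(skp_en: bool, credit_full: bool, mac_pos: bool,
           skp_seq: bytes, ltssm_out: bytes, mac_txdatao: bytes) -> bytes:
    """Select the word driven toward the other PCIe device this cycle."""
    if skp_en:          # 1. clock-compensation SKP always wins
        return skp_seq
    if credit_full:     # 2. FIFO nearly full: ltssm_out carries flow control
        return ltssm_out
    if mac_pos:         # 3. new word from the FPGA MAC (mac_clk rising edge)
        return mac_txdatao
    return ltssm_out    # 4. otherwise fill with the LTSSM-generated sequence


print(tx_mux(False, False, True, b"SKP", b"LTS", b"MAC"))  # b'MAC'
```

Putting SKP first guarantees clock compensation is never starved by bulk traffic, which is the point made in paragraph [0097].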

[0097] Actively sending SKP sequences from the data transmission module addresses the fact that the PCIe protocol does not carry a separate clock during transmission. The receiver recovers the clock from the data stream (the asynchronous data), so clock drift can leave the recovered clock misaligned with the local clock. The PCIe protocol corrects this misalignment by sending SKP sequences. However, because the PCIe of the FPGA platform itself runs very slowly, it cannot send SKP sequences in time, which can cause misalignment problems for other PCIe devices.

[0098] The PCIe communication system (PCIe Speed Bridge) needs to send three kinds of data to other PCIe devices: the system clock domain data mac_txdatao sent by the FPGA platform's own PCIe, the data packet ltssm_out sent by the LTSSM state machine detection module, and the internally generated SKP sequence. The data transmission module therefore prioritizes them. The SKP sequence, generated by this module on behalf of the FPGA's PCIe, has the highest priority, followed by the flow control mechanism packet; the system clock domain data mac_txdatao sent by the FPGA platform's PCIe comes next, and the data packet ltssm_out generated by the LTSSM state machine has the lowest priority. This ensures that the FPGA platform's PCIe data is reliably sent to other PCIe devices. In the opposite direction, sending data from other PCIe devices to the FPGA platform's PCIe through the PCIe communication system (PCIe Speed Bridge) only requires a data bit width check.

[0099] The link training state machine detection module is used to detect the descrambled system clock domain data, analyze the state of the link training state machine based on that data, and output the data packets for the current link training state machine state to the data sending module.

[0100] The Link Training State Machine (LTSSM) detection module mainly sends the data packet ltssm_out corresponding to the currently detected LTSSM state, and monitors the credit full signal credit_full of the asynchronous FIFO. When credit_full is detected together with the data packet ltssm_out, it sends a flow control mechanism message. Each LTSSM state in the PCIe protocol is defined by a specific number and type of packets, so the LTSSM state can be detected by identifying the corresponding packets and counting them.
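Detection by counting can be sketched as a counter of back-to-back ordered sets that satisfy a condition. The OrderedSet type, the "PAD" marker, and the detector class below are hypothetical simplifications. A real implementation would decode the TS1/TS2 fields from the received symbols and would also have to handle SKP ordered sets interleaved with the training sequences, which this sketch does not do.

```python
from typing import Callable, NamedTuple

PAD = "PAD"  # hypothetical marker for a PAD link or lane number


class OrderedSet(NamedTuple):
    kind: str   # "TS1", "TS2", ...
    link: str   # link number, or PAD
    lane: str   # lane number, or PAD


class ConsecutiveDetector:
    """Counts consecutive ordered sets that satisfy a predicate."""

    def __init__(self, predicate: Callable[[OrderedSet], bool], threshold: int) -> None:
        self.predicate = predicate
        self.threshold = threshold
        self.count = 0

    def feed(self, os_: OrderedSet) -> bool:
        """Return True once `threshold` matching sets arrived back to back."""
        self.count = self.count + 1 if self.predicate(os_) else 0
        return self.count >= self.threshold


# Example condition from the text: 8 consecutive TS1 with PAD link and lane numbers
ts1_pad_detector = ConsecutiveDetector(
    lambda s: s.kind == "TS1" and s.link == PAD and s.lane == PAD, threshold=8)
```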

[0101] Figure 7 is a schematic diagram illustrating the principle of link training state machine detection according to an embodiment of the present disclosure.

[0102] As shown in Figure 7, the function of each state in the LTSSM state machine detection module is as follows:

[0103] Polling_Active: In this state, TS1 and TS2 sequences are sent. Their purpose is to keep other PCIe devices from recognizing a continuous run of TS1 sequences, so that the LTSSM state machine of the other PCIe devices does not advance to the next state. When the system receives 8 consecutive TS1 sequences sent by the PCIe on the FPGA, it then sends continuous TS1 sequences to the other PCIe devices and transitions to Polling_Configuration.

[0104] Polling_Configuration: In this state, TS1 and TS2 sequences are sent. Their purpose is to keep other PCIe devices from recognizing a continuous run of TS2 sequences; in other words, the other PCIe devices must wait for the LTSSM state machine of the PCIe device on the FPGA to reach the corresponding state. When the number and type of TS2 sequences sent by the PCIe on the FPGA are detected to satisfy the transition condition, the state transitions to Config_LinkWidthStart.

[0105] Config_LinkWidthStart: In this state, TS2 sequences are sent. Because the correct sequence for this state is TS1, TS2 sequences are sent until the complete TS1 sequence is detected from the FPGA PCIe; once it is detected, the complete TS1 sequence is sent out and the state transitions to Config_LinkWidthAccept.

[0106] Config_LinkWidthAccept: In this state, TS2 sequences are sent, and the module checks the number and type of TS1 sequences sent by other PCIe devices and by the PCIe on the FPGA. When both transition conditions are met, the state transitions to Config_LanenumWait.

[0107] Config_LanenumWait: In this state, TS1 sequences are sent. They serve only as filler while the module counts the number and type of consecutive TS1 sequences sent by other PCIe devices. When consecutive TS1 sequences that satisfy the transition condition are detected, the state transitions to Config_LanenumAccept.

[0108] Config_LanenumAccept: In this state, TS2 sequences are sent, and the module checks the number and type of TS1 sequences sent by other PCIe devices and by the PCIe on the FPGA. When both transition conditions are met, the state transitions to Config_Complete.

[0109] Config_Complete: In this state, TS1 sequences are sent, and the module checks the number and type of TS2 sequences sent by other PCIe devices and by the PCIe on the FPGA. When both transition conditions are met, the state transitions to Config_Idle.

[0110] Config_Idle: In this state, TS1 sequences are sent, and the module checks the number of idle sequences sent by other PCIe devices and by the PCIe on the FPGA. When both transition conditions are met, the state transitions to L0.

[0111] The L0 state is the normal data transmission state of the PCIe protocol. If the link speed does not need to change, data transmission proceeds normally. If the link speed must change, the module looks for TS1, TS2, EIEOS, or EIOS sequences; when a sequence that satisfies the transition condition is detected, the state jumps to Recovery_RcvrLock.

[0112] Similarly, in the remaining states, TS1 or TS2 sequences are sent until a sequence matching the state transition is detected. Finally, after the link speed switch is complete, it will return to the L0 state for data transmission.
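The state walk in paragraphs [0103] to [0112] can be summarized as a table. The sketch below records, for each state, the sequence the bridge sends while holding the peer and the state it moves to once its detection conditions are met. It assumes every condition is eventually satisfied and models only the first entry into L0 plus the L0 exit on a speed change; state names follow the text, while the function names are hypothetical.

```python
# state -> (sequence sent while holding the peer, next state once conditions are met)
STATE_TABLE = {
    "Polling_Active":         ("TS1/TS2", "Polling_Configuration"),
    "Polling_Configuration":  ("TS1/TS2", "Config_LinkWidthStart"),
    "Config_LinkWidthStart":  ("TS2",     "Config_LinkWidthAccept"),
    "Config_LinkWidthAccept": ("TS2",     "Config_LanenumWait"),
    "Config_LanenumWait":     ("TS1",     "Config_LanenumAccept"),
    "Config_LanenumAccept":   ("TS2",     "Config_Complete"),
    "Config_Complete":        ("TS1",     "Config_Idle"),
    "Config_Idle":            ("TS1",     "L0"),
}
SPEED_CHANGE_TRIGGERS = {"TS1", "TS2", "EIEOS", "EIOS"}


def training_path(start: str = "Polling_Active") -> list:
    """Follow the table from `start` until L0 is reached."""
    path = [start]
    while path[-1] != "L0":
        path.append(STATE_TABLE[path[-1]][1])
    return path


def next_from_l0(speed_change_needed: bool, detected: str) -> str:
    """Leave L0 only when a speed change is pending and a trigger is seen."""
    if speed_change_needed and detected in SPEED_CHANGE_TRIGGERS:
        return "Recovery_RcvrLock"
    return "L0"


print(" -> ".join(training_path()))
```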

[0113] The PCIe communication system (PCIe Speed Bridge) disclosed herein identifies the data packets exchanged between the FPGA platform PCIe and other high-speed interconnect PCIe devices, and dynamically adjusts the system clock and data bit width to match the bandwidth required for data transmission. While adapting the bandwidth, the system detects the LTSSM state of the current FPGA platform PCIe and, based on the detected state, generates corresponding data packets and sends them to the other high-speed interconnect PCIe devices. This compensates for the misalignment of the LTSSM (Link Training and Status State Machine) caused by the slow packet transmission of the FPGA platform PCIe, which would otherwise lead other high-speed interconnect PCIe devices to reuse data repeatedly. The system can be used to complete link training between high-speed peripheral interconnect PCIe devices and the PCIe devices on the FPGA, enabling communication between the slow FPGA platform PCIe and other high-speed interconnect PCIe devices.

[0114] The following are embodiments of the method of this application, which can be applied to the PCIe communication system embodiments described above. For details not disclosed in the method embodiments of this application, please refer to the system embodiments of this application.

[0115] Figure 8 is a flowchart of an FPGA-based PCIe communication method according to an embodiment of this disclosure. As shown in Figure 8, the method may include:

[0116] Step S1: Use the data receiving module to receive asynchronous data sent from the FPGA platform's PCIe to other PCIe devices or from other PCIe devices to the FPGA platform's PCIe, and synchronize the asynchronous data into system clock domain data.

[0117] Step S2: Use the data scrambling / descrambling module to descramble the system clock domain data of PCIe communication protocol version 3.0 and above, and generate the original SKP sequence of the system clock domain data;

[0118] Step S3: Use the link training state machine detection module to detect and analyze the original SKP sequence, obtain the state of the link training state machine, and generate data packets under the state of the link training state machine.

[0119] Step S4: Use the data scrambling / descrambling module to scramble the data packets output by the link training state machine detection module of PCIe communication protocol version 3.0 and above;

[0120] Step S5: Use the data sending module to output the scrambled data packets from the link training state machine detection module together with the system clock domain data.

[0121] Figure 9 is a schematic diagram of an application scenario of an FPGA-based PCIe communication system according to an embodiment of the present disclosure.

[0122] As shown in Figure 9, on the FPGA platform, the PCIe communication system (PCIe Speed Bridge) connects to the PCIe MAC layer on the left through a PIPE interface (endpoint here is a general term for a PCIe device, which may be an RC, an EP, or a switch), and to the PCIe PHY on the right through a PIPE interface. At power-up, the system initially operates on a default 125MHz clock. When the PCIe on the right sends a data packet, the PCIe communication system (PCIe Speed Bridge) uses the PCIe communication protocol version number (rate) and data bit width to determine the current link's PCIe protocol version and effective data bit width, and generates a system clock (sys_clk) that matches the current link's PCIe communication protocol version. The system clock (sys_clk) serves as the master clock, and the other modules run on it.

[0123] The following example illustrates the FPGA-based PCIe communication method for the case where PCIe data is sent from the FPGA platform to other PCIe devices through the PCIe communication system (PCIe Speed Bridge).

[0124] As shown in Figure 9, the data packets output by the PCIe device of the FPGA platform on the left enter the PCIe communication system (PCIe Speed Bridge). As shown in Figure 2, this output data first enters the data receiving module, where it is synchronized across clock domains into system clock data. The synchronized data packet (system clock data) is then input simultaneously to the data output module and the scrambling / descrambling module. The scrambling / descrambling module checks the PCIe communication protocol version number (rate). If rate is 3'b010 or higher, the current link runs PCIe 3.0 or higher, and the scrambling / descrambling module starts working. It distinguishes control packets from data packets using the syncheader signal (a PIPE protocol signal used in PCIe 3.0 and above that carries the 128b/130b synchronization header, indicating a control or data block). A syncheader value of 2'b01 indicates a control packet, and the start block signal (start) then marks the start of the control block. The scrambling / descrambling module then identifies the control character COM (8'hbc) and initializes the linear feedback shift register LFSR; the initial value for Lane0 is 48'h1DBFBC. After initialization, the control message is descrambled, and the descrambled original control message is input to the LTSSM state machine detection module, which determines its current state by identifying the type and quantity of control messages.
For example, the transition from Polling.Configuration to Configuration.Linkwidth.Start is detected from the data messages sent by other high-speed PCIe devices to the FPGA platform PCIe device. The LTSSM state machine detection module detects this state transition when it sees 8 consecutive TS1 sequences whose link number and lane number are both PAD and then, after one further TS1 sequence, sees the FPGA platform PCIe send 16 TS1 sequences whose link number and lane number are both PAD.
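The gating decisions described in paragraph [0124] can be summarized as a small classifier. This is a hedged sketch: the return labels are hypothetical, data blocks are only labelled rather than processed, and the `start` flag and `first_symbol` argument stand in for the PIPE start-block signal and the first received symbol of the block.

```python
RATE_GEN3 = 0b010     # rate code at or above which PCIe 3.0+ handling applies
SYNC_CONTROL = 0b01   # syncheader value that marks a control block
COM = 0xBC            # control character that re-initializes the LFSR


def classify_block(rate: int, syncheader: int, start: bool, first_symbol: int) -> str:
    """Decide how the scrambling / descrambling module treats a received block."""
    if rate < RATE_GEN3:
        return "bypass"                 # PCIe 1.0 / 2.0: no descrambling needed
    if syncheader != SYNC_CONTROL:
        return "data"                   # data block: not a control message
    if start and first_symbol == COM:
        return "reseed_and_descramble"  # initialize the LFSR, then descramble
    return "descramble"                 # remainder of a control block
```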

[0125] After `Configuration.Linkwidth.Start`, a TS2 sequence whose link number is not PAD is generated and sent. When other high-speed interconnect PCIe devices receive this TS2 sequence, they determine that it does not satisfy the condition for the next state transition, so their state does not change and there is no power loss. When the LTSSM state machine detection module detects the credit_full signal of the asynchronous FIFO, it generates a flow control message indicating that the credit value is full. The messages generated by the LTSSM state machine detection module then pass through the scrambling / descrambling module: if the current PCIe protocol is 3.0 or higher, they are scrambled; if it is below PCIe 3.0, they are not processed and go directly to the data output module. The data output module counts the internal clock; when the count reaches the SKP preset value, it generates an SKP sequence and selects it for output. When the rising edge of the MAC clock (mac_clk) arrives, it outputs the data sent by the data receiving module. At all other times, it outputs the sequence generated by the LTSSM state machine.

[0126] The following example illustrates the FPGA-based PCIe communication method for the case where data from other PCIe devices is transmitted to the FPGA platform PCIe through the PCIe communication system (PCIe Speed Bridge).

[0127] As shown in Figure 9, other high-speed interconnect PCIe devices on the right send data to the PCIe communication system (PCIe Speed Bridge). As shown in Figure 2, after the data enters the PCIe communication system (PCIe Speed Bridge), the data receiving module first uses an asynchronous FIFO circuit to synchronize the data from the other high-speed interconnect PCIe devices. The depth of the asynchronous FIFO circuit is determined from the size of the complete LTSSM data exchange (the total number of data packets required for the whole link training). Calculation shows that the data from the Polling state to the L0 state amounts to 246 TS sequences of 128 bits each; with a read data bit width of 8 bits, the read time is therefore $3936 \times T_{\mathrm{mac\_clk}}$. The data written during this time period is …, and the FIFO depth is …. When the asynchronous FIFO circuit is about to fill, it generates the credit full signal (credit_full) and outputs it to the LTSSM state machine detection module. The asynchronous FIFO circuit reads data using the MAC clock (mac_clk). The read data first undergoes asynchronous handshake processing in the same way as in the data receiving module: mac_clk is passed through three synchronization stages, its rising edge is detected with an XOR operation, and once the rising edge is detected the data is captured in a register. The synchronized message is input simultaneously to the scrambling / descrambling module and the data output module. The data output module checks whether the data bit width matches; if it does, the data is output directly, otherwise its bit width is converted before output. The scrambling / descrambling module operates as described above and outputs the message directly to the LTSSM state machine.
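Two parts of this paragraph are easy to check in code: the mac_clk rising-edge detector and the 3936 read-cycle figure (246 × 128 / 8). The sketch below samples mac_clk once per sys_clk cycle through three register stages and XORs the last two stages, ANDed with the newer one, to keep only rising edges. Treating the third stage as the edge-detection delay is an assumption about how the three synchronizations are arranged, and both function names are hypothetical.

```python
def mac_pos_pulses(mac_clk_samples):
    """Three-stage synchronizer plus XOR edge detect; one output per sys_clk cycle."""
    s1 = s2 = s3 = 0
    pulses = []
    for level in mac_clk_samples:
        s3, s2, s1 = s2, s1, level     # shift register clocked by sys_clk
        pulses.append((s2 ^ s3) & s2)  # XOR finds an edge; AND with s2 keeps rising edges
    return pulses


def ltssm_read_cycles(num_ts: int = 246, ts_bits: int = 128, read_width: int = 8) -> int:
    """mac_clk read cycles needed to drain one full link-training exchange."""
    return num_ts * ts_bits // read_width


print(mac_pos_pulses([0, 0, 1, 1, 0, 0, 1, 1, 0, 0]))  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(ltssm_read_cycles())                              # 3936
```

The two-cycle latency between a mac_clk edge and its mac_pos pulse is the price of the synchronizer stages that guard against metastability.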

[0128] Those skilled in the art will understand that by implementing the above steps, communication between the FPGA platform PCIe and other high-speed interconnect PCIe devices can be achieved.

[0129] Beneficial effects:

[0130] Compared with the two solutions currently on the market, this invention enables normal communication between PCIe devices on an FPGA and other PCIe devices.

[0131] First, this communication method has a wide range of applications. It requires no modification to the PCIe design or to the PCIe communication system (PCIe Speed Bridge) design. For example, to verify a PCIe 1.0 design with a data bandwidth of 250MB / s, the clock can be reduced to 125MHz with a 16-bit data width, or to 62.5MHz with a 32-bit data width. Furthermore, the data width can be changed dynamically during PCIe protocol communication, which makes the method relatively easy to port between different FPGA platforms and different PCIe protocol versions.
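The arithmetic behind the 125MHz / 62.5MHz example is the per-lane payload bit rate divided by the data width. The sketch below is an illustration rather than the Speed Bridge's actual clock logic. The mapping of rate codes to line rates and encodings (0 for 2.5GT/s, 1 for 5GT/s, 2 for 8GT/s, 3 for 16GT/s) is an assumption consistent with this text's use of 3'b010 for PCIe 3.0, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumed rate code -> (line rate in Mb/s per lane, encoding numerator, denominator)
LINE_RATES = {
    0b000: (2500, 8, 10),      # PCIe 1.x, 8b/10b
    0b001: (5000, 8, 10),      # PCIe 2.x, 8b/10b
    0b010: (8000, 128, 130),   # PCIe 3.x, 128b/130b
    0b011: (16000, 128, 130),  # PCIe 4.0, 128b/130b
}


def payload_mbps(rate: int) -> Fraction:
    """Per-lane payload bit rate after line encoding, in Mb/s."""
    line_rate, num, den = LINE_RATES[rate]
    return Fraction(line_rate * num, den)


def min_sys_clk_mhz(rate: int, width_bits: int) -> Fraction:
    """Lowest clock that sustains the payload rate at the given data width."""
    return payload_mbps(rate) / width_bits


for width in (8, 16, 32):
    print(width, float(min_sys_clk_mhz(0b000, width)))
# 8 250.0
# 16 125.0
# 32 62.5
```

Doubling the width halves the required clock, which is the lever the bridge uses to keep a slow FPGA clock within the protocol's bandwidth.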

[0132] Second, this communication method is simple to implement yet fully functional. The PCIe communication system (PCIe Speed Bridge) detects the LTSSM state and then sends the corresponding sequences, which allows the LTSSM state machines of other high-speed interconnect PCIe devices to transition correctly. At the same time, actively sending SKP sequences solves the problem of clock misalignment. Finally, sending flow control mechanism messages prevents data loss, ensuring data integrity.

[0133] Third, it is low-cost and does not require the purchase of separate equipment. Communication between the FPGA and other PCIe devices can be achieved simply by using the existing PCIe communication method. There is no need to set complex parameters; just connect the PIPE interface correctly to realize communication between the FPGA and other PCIe devices.

[0134] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0135] In some embodiments, the FPGA-based PCIe communication system may incorporate the features of the FPGA-based PCIe communication method of any embodiment, and vice versa, which will not be elaborated here.

[0136] In an embodiment of the present invention, an electronic device is provided, comprising a processor and a memory storing a computer program, wherein the processor is configured to perform the FPGA-based PCIe communication method according to any embodiment of the present invention when running the computer program.

[0137] Figure 10 illustrates an electronic device 1000 for implementing the method of embodiments of the present invention. In some embodiments, more or fewer electronic devices than illustrated may be included. In some embodiments, the method may be implemented using a single electronic device or multiple electronic devices. In some embodiments, it may be implemented using cloud-based or distributed electronic devices.

[0138] Figure 10 is a schematic structural diagram of the electronic device 1000 provided in an embodiment of this application. As shown in Figure 10, the electronic device 1000 includes a processor 1001, which can perform various appropriate operations and processes based on programs and / or data stored in read-only memory (ROM) 1002 or loaded from storage section 1008 into random access memory (RAM) 1003. The processor 1001 may be a multi-core processor or may contain multiple processors. In some embodiments, the processor 1001 may include a general-purpose main processor and one or more dedicated coprocessors, such as a central processing unit (CPU), graphics processing unit (GPU), neural network processor (NPU), or digital signal processor (DSP). Various programs and data required for the operation of the electronic device 1000 are also stored in RAM 1003. The processor 1001, ROM 1002, and RAM 1003 are interconnected via bus 1004. An input / output (I / O) interface 1005 is also connected to bus 1004.

[0139] The processor and memory described above are used together to execute programs stored in the memory. When the program is executed by a computer, it can implement the methods, steps, or functions described in the above embodiments.

[0140] The following components are connected to I / O interface 1005: an input section 1006 including a keyboard, mouse, touchscreen, etc.; an output section 1007 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a LAN card or modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to I / O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on drive 1010 as needed so that computer programs read from it can be installed into storage section 1008. Figure 10 shows only some of the components and does not imply that the computer system 1000 includes only the components shown in Figure 10.

[0141] The systems, devices, modules, or units described in the above embodiments can be implemented by a computer or its associated components. The computer may be, for example, a mobile terminal, smartphone, personal computer, laptop computer, in-vehicle human-machine interface device, personal digital assistant, media player, navigation device, game console, tablet computer, wearable device, smart TV, Internet of Things system, smart home, industrial computer, server, or a combination thereof.

[0142] Although not shown, an embodiment of the invention provides a storage medium storing a computer program configured to execute, when run, the FPGA-based PCIe communication method of any embodiment of the invention.

[0143] Storage media in embodiments of the present invention include permanent and non-permanent, removable and non-removable articles capable of storing information by any method or technology. Examples of storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0144] The methods, programs, systems, apparatuses, etc., in embodiments of the present invention can be executed or implemented on one or more networked computers, or practiced in a distributed computing environment. In such distributed computing environments, tasks can be performed by remote processing devices connected via a communication network.

[0145] Those skilled in the art will understand that the embodiments described in this specification can be provided as methods, systems, or computer program products. Therefore, those skilled in the art will realize that the functional modules / units or controllers and related method steps described in the above embodiments can be implemented in software, hardware, or a combination of both.

[0146] Unless explicitly stated otherwise, the actions or steps of the methods and procedures described in the embodiments of the present invention do not necessarily have to be performed in a specific order and can still achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0147] This document describes several embodiments of the present invention; however, for the sake of brevity, the descriptions of the embodiments are not exhaustive, and identical or similar features or parts between the embodiments may be omitted. In this document, "one embodiment," "some embodiments," "example," "specific example," or "some examples" means that the description applies to at least one embodiment or example of the present invention, but not necessarily to all of them. These terms do not necessarily refer to the same embodiments or examples. Where there is no contradiction, those skilled in the art can combine and integrate the different embodiments or examples described herein, as well as their features.

[0148] The exemplary systems and methods of the present invention have been specifically shown and described with reference to the above embodiments, which are merely examples of the best mode for implementing them. Those skilled in the art will understand that various changes may be made to the embodiments described herein when implementing the systems and / or methods, without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An FPGA-based PCIe communication system, characterized in that the system includes a clock module, a data receiving module, a data sending module, a data scrambling / descrambling module, and a link training state machine detection module; the clock module is used to identify the PCIe communication protocol version and data bit width of the current link, and to output a system clock matching the clock rate of the current link's PCIe communication protocol version to the data receiving module, the data sending module, the data scrambling / descrambling module, and the link training state machine detection module; the data receiving module is used to receive cross-clock-domain asynchronous data, synchronize it into system clock domain data, and send the synchronized system clock domain data to the data sending module and the data scrambling / descrambling module; the data scrambling / descrambling module is used to descramble the synchronized system clock domain data of PCIe communication protocol version 3.0 and above and send it to the link training state machine detection module; the link training state machine detection module is used to detect the descrambled system clock domain data, analyze the state of the link training state machine based on that data, and output the data packets for the current link training state machine state to the data sending module; and the data sending module is used to perform data bit width processing on the synchronized system clock domain data and on the data packets for the link training state machine state, and to output the processed system clock domain data and data packets.

2. The FPGA-based PCIe communication system of claim 1, wherein the data scrambling / descrambling module is also used to scramble the data packets output by the link training state machine detection module for PCIe communication protocol version 3.0 and above, and to output the scrambled data packets to the data sending module.

3. The FPGA-based PCIe communication system of claim 1, wherein the clock module includes a frequency divider circuit and a data selector; the frequency divider circuit is used to divide the input reference clock into two divided clocks using an internal counter; and the data selector is used to identify the current link's PCIe communication protocol version and data bit width and, based on them, select from the two divided clocks a system clock whose rate conforms to the current link's PCIe communication protocol version.
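Claim 3 does not state the division ratios or how the selector maps a link to a clock. The Python sketch below is a behavioral model under stated assumptions: a free-running 2-bit counter yields divide-by-2 and divide-by-4 clocks from an assumed 500 MHz reference, and `REQUIRED_MHZ` is a hypothetical table from (protocol version, data bit width) to the nominal clock the data selector should pick.

```python
from typing import Dict, List, Tuple

# Hypothetical table: nominal per-lane system clock (MHz) for each
# (protocol version, data width in bits), computed as payload rate / width.
# Illustrative values only; the patent does not give them.
REQUIRED_MHZ: Dict[Tuple[str, int], float] = {
    ("1.0", 16): 125.0,   # 2.5 GT/s, 8b/10b -> 2.0 Gb/s over 16 bits
    ("2.0", 16): 250.0,   # 5.0 GT/s, 8b/10b -> 4.0 Gb/s over 16 bits
    ("2.0", 32): 125.0,
    ("3.0", 32): 250.0,   # 8.0 GT/s, 128b/130b, nominal 250 MHz at 32 bits
}


def divided_clocks(ref_cycles: int) -> Tuple[List[int], List[int]]:
    """Sample a free-running 2-bit counter once per reference clock cycle.

    Counter bit 0 toggles every cycle (ref / 2); bit 1 toggles every
    second cycle (ref / 4).
    """
    div2: List[int] = []
    div4: List[int] = []
    count = 0
    for _ in range(ref_cycles):
        div2.append(count & 1)
        div4.append((count >> 1) & 1)
        count = (count + 1) & 0b11  # 2-bit counter wraps 3 -> 0
    return div2, div4


def select_clock(version: str, width: int, ref_mhz: float = 500.0) -> str:
    """Data selector: pick the divided clock that matches the current link."""
    target = REQUIRED_MHZ[(version, width)]
    for name, mhz in (("div2", ref_mhz / 2), ("div4", ref_mhz / 4)):
        if mhz == target:
            return name
    raise ValueError(f"no divided clock matches {target} MHz")
```

For example, `select_clock("3.0", 32)` returns `"div2"` and `select_clock("2.0", 32)` returns `"div4"`. Keeping both divided clocks running and switching only the selector is one way to change the system clock without reprogramming the divider.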

4. The FPGA-based PCIe communication system of claim 1, wherein the cross-clock-domain asynchronous data includes asynchronous data sent from the FPGA platform's PCIe to other PCIe devices and asynchronous data sent from other PCIe devices to the FPGA platform's PCIe, and the clock rate of the FPGA platform's PCIe is lower than that of the other PCIe devices.
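The bandwidth gap in claim 4 is simple arithmetic: the payload a lane must carry per second equals the system clock rate times the data bit width, so a slower FPGA clock must be offset by a wider data path. The sketch below uses the standard PCIe line rates and encodings (2.5 and 5.0 GT/s with 8b/10b, 8.0 GT/s with 128b/130b). The candidate widths and the 62.5 MHz example clock are assumptions for illustration.

```python
from typing import Dict, Sequence, Tuple

# Per-lane line rate (GT/s) and line encoding (payload bits, line bits)
LINE_RATE_GTPS: Dict[str, float] = {"1.0": 2.5, "2.0": 5.0, "3.0": 8.0}
ENCODING: Dict[str, Tuple[int, int]] = {
    "1.0": (8, 10),
    "2.0": (8, 10),
    "3.0": (128, 130),
}


def payload_mbps(version: str) -> float:
    """Per-lane payload rate in Mb/s after removing encoding overhead."""
    num, den = ENCODING[version]
    return LINE_RATE_GTPS[version] * 1000 * num / den


def min_width(version: str, clock_mhz: float,
              widths: Sequence[int] = (8, 16, 32, 64, 128)) -> int:
    """Smallest candidate data width for which clock * width covers the payload."""
    need = payload_mbps(version)
    for w in widths:
        if clock_mhz * w >= need:
            return w
    raise ValueError(f"{clock_mhz} MHz is too slow for PCIe {version}")
```

At an assumed 62.5 MHz FPGA clock, `min_width` gives 32 bits for version 1.0, 64 bits for 2.0, and 128 bits for 3.0, whereas a 250 MHz clock needs only 32 bits for 3.0.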

5. The FPGA-based PCIe communication system of claim 4, wherein the data receiving module includes an asynchronous handshake circuit and an asynchronous FIFO circuit; the asynchronous handshake circuit is used to synchronize asynchronous data sent from the FPGA platform's PCIe to other PCIe devices into system clock domain data; and the asynchronous FIFO circuit is used to synchronize asynchronous data sent from other PCIe devices to the FPGA platform's PCIe into system clock domain data.
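Claim 5 names an asynchronous FIFO circuit but not its internals. A common construction, assumed here rather than taken from the patent, keeps read and write pointers with one extra wrap bit and compares them in Gray code, because only one bit changes per increment and a pointer can therefore cross clock domains through a synchronizer safely. The Python model below is behavioral only. It omits the two-flop synchronizers and treats the other domain's pointer as already synchronized.

```python
from typing import Any, List, Optional


class AsyncFifoModel:
    """Behavioral model of an asynchronous FIFO with Gray-coded pointers."""

    def __init__(self, addr_bits: int = 2) -> None:
        if addr_bits < 1:
            raise ValueError("addr_bits must be at least 1")
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1  # one extra wrap bit
        self.mem: List[Any] = [None] * self.depth
        self.wr_bin = 0  # owned by the write (source) clock domain
        self.rd_bin = 0  # owned by the read (system) clock domain

    @staticmethod
    def to_gray(b: int) -> int:
        return b ^ (b >> 1)

    def full(self) -> bool:
        # Full: top two Gray bits inverted, all remaining bits equal
        flip = 0b11 << (self.addr_bits - 1)
        return self.to_gray(self.wr_bin) == self.to_gray(self.rd_bin) ^ flip

    def empty(self) -> bool:
        return self.to_gray(self.wr_bin) == self.to_gray(self.rd_bin)

    def write(self, word: Any) -> bool:
        """Write-domain side: store one word unless the FIFO is full."""
        if self.full():
            return False
        self.mem[self.wr_bin & (self.depth - 1)] = word
        self.wr_bin = (self.wr_bin + 1) & self.ptr_mask
        return True

    def read(self) -> Optional[Any]:
        """System-clock side: return the oldest word, or None if empty."""
        if self.empty():
            return None
        word = self.mem[self.rd_bin & (self.depth - 1)]
        self.rd_bin = (self.rd_bin + 1) & self.ptr_mask
        return word
```

With `addr_bits=2` the FIFO holds four words; a fifth write is refused until the system-clock side reads.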

6. The FPGA-based PCIe communication system of claim 2, wherein the data scrambling / descrambling module includes a linear feedback shift register, a first data selector, a second data selector, and a scrambling / descrambling module; the linear feedback shift register is used to scramble the data packets output by the link training state machine detection module for PCIe communication protocol version 3.0 and above; the first data selector is used to select the linear feedback shift register output of the corresponding data bit width according to the data bit width of the current link PCIe; the second data selector is used to select the data packets to be scrambled according to the clock rate of the current link's PCIe communication protocol version; and the scrambling / descrambling module is used to perform a logical operation on the output of the linear feedback shift register of the corresponding data bit width and the data packets to be scrambled.

7. The FPGA-based PCIe communication system according to claim 6, characterized in that the system clock domain data of PCIe communication protocol version 3.0 and above is descrambled by scrambling the data that was scrambled by the linear feedback shift register a second time.
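Claims 6 and 7 describe an LFSR whose output is selected by data width and combined with the packet by a logical operation. If that operation is XOR, as in the usual scrambler design (an assumption, since the claims say only "logical operations"), applying the same keystream a second time restores the data, which is how re-scrambling descrambles in claim 7. The sketch takes its taps from the PCIe 3.0 scrambler polynomial x^23 + x^21 + x^16 + x^8 + x^5 + x^2 + 1, but simplifies the bit ordering, the seed, and the specification's per-lane and per-symbol rules, so its keystream should not be expected to match real PCIe 3.0 traffic bit for bit. The `gen3_or_above` flag is a hypothetical stand-in for the second data selector.

```python
from typing import Tuple

# Taps from the PCIe 3.0 scrambler polynomial x^23+x^21+x^16+x^8+x^5+x^2+1.
# Bit ordering and seeding are simplified (not specification-exact).
TAPS: Tuple[int, ...] = (23, 21, 16, 8, 5, 2)


class Lfsr:
    """Fibonacci-style LFSR that emits one keystream bit per step."""

    def __init__(self, seed: int, taps: Tuple[int, ...] = TAPS) -> None:
        self.n = max(taps)
        self.taps = taps
        self.mask = (1 << self.n) - 1
        self.state = seed & self.mask
        if self.state == 0:
            raise ValueError("an all-zero seed would lock the LFSR")

    def next_bit(self) -> int:
        out = (self.state >> (self.n - 1)) & 1  # oldest bit leaves the register
        fb = 0
        for t in self.taps:
            fb ^= (self.state >> (t - 1)) & 1   # XOR of the tapped bits
        self.state = ((self.state << 1) | fb) & self.mask
        return out

    def keystream(self, width: int) -> int:
        """First data selector: produce a keystream word of the link width."""
        word = 0
        for i in range(width):
            word |= self.next_bit() << i        # LSB-first packing (assumed)
        return word


def scramble_word(lfsr: Lfsr, word: int, width: int, gen3_or_above: bool) -> int:
    """XOR the word with the keystream; lower protocol versions bypass this path."""
    if not gen3_or_above:                       # second data selector (hypothetical flag)
        return word
    return (word ^ lfsr.keystream(width)) & ((1 << width) - 1)
```

On the receive side, calling `scramble_word` with an LFSR seeded and stepped identically (for instance a hypothetical seed `0x1ABCDE` on both ends) returns the original word, which is the re-scrambling descrambler of claim 7.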

8. The FPGA-based PCIe communication system according to claim 1, characterized in that the data sending module includes a data bit width conversion module, a counter, and a data selector; the data bit width conversion module is used to convert the data bit width of the synchronized system clock domain data and of the data packets for the link training state machine state; the counter is used to count system clock cycles and generates an SKP sequence when the count value reaches a preset value; and the data selector is used to output the SKP sequence, the bit-width-processed system clock domain data, and the data packets for the link training state machine state in sequence according to priority.
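The counter and priority selector of claim 8 can be modeled cycle by cycle. In this hypothetical sketch, `skp_interval` stands for the preset value, the priority order (SKP first, then converted data, then link training packets) is assumed from the order in which the claim lists the three sources, and `"IDLE"` is a placeholder for cycles with nothing to send. Real SKP scheduling intervals are set by the PCIe specification and are not modeled here.

```python
from collections import deque
from typing import Deque


class TxArbiter:
    """Hypothetical model of the data sending module's counter and selector."""

    def __init__(self, skp_interval: int) -> None:
        self.skp_interval = skp_interval    # preset counter value (hypothetical)
        self.count = 0
        self.data_q: Deque[str] = deque()   # bit-width-converted system clock domain data
        self.ltssm_q: Deque[str] = deque()  # link training state machine packets

    def tick(self) -> str:
        """Advance one system clock cycle and return what is transmitted."""
        self.count += 1
        if self.count == self.skp_interval:  # preset value reached: send SKP
            self.count = 0
            return "SKP"
        if self.data_q:
            return self.data_q.popleft()
        if self.ltssm_q:
            return self.ltssm_q.popleft()
        return "IDLE"
```

With `skp_interval=3` and queued items `D0`, `D1`, and `TS1`, six cycles produce `D0, D1, SKP, TS1, IDLE, SKP`; the SKP slot simply delays whatever is waiting.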

9. The FPGA-based PCIe communication system of claim 8, wherein converting the data bit width of the synchronized system clock domain data and of the data packets for the link training state machine state includes: detecting the data bit width of the synchronized system clock domain data and of the data packets for the link training state machine state; if that data bit width is less than the output bit width of the data sending module, buffering the synchronized system clock domain data and the data packets for the link training state machine state and then outputting them; and if that data bit width is greater than or equal to the output bit width of the data sending module, outputting the system clock domain data and the data packets for the link training state machine state sequentially, in the order in which they were input.
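The two branches of claim 9 amount to packing narrow words into a buffer until an output word is full, or slicing wide words into output-width pieces in input order. The sketch assumes the widths are integer multiples of each other and that the first input word (or the lowest slice) occupies the low-order bits; the claim states neither detail.

```python
from typing import Iterable, List


def convert_width(words: Iterable[int], in_width: int, out_width: int) -> List[int]:
    """Pack narrow words or split wide words to the output bit width."""
    if in_width % out_width and out_width % in_width:
        raise ValueError("widths must be integer multiples of each other")
    out: List[int] = []
    if in_width < out_width:
        # Narrower input: buffer words until one output word is full
        ratio = out_width // in_width
        buf, n = 0, 0
        for w in words:
            buf |= (w & ((1 << in_width) - 1)) << (n * in_width)  # first word low (assumed)
            n += 1
            if n == ratio:
                out.append(buf)
                buf, n = 0, 0
        # A partially filled buffer is not emitted here; a hardware
        # converter would hold it until more input arrives.
    else:
        # Equal or wider input: emit output-width slices in input order
        ratio = in_width // out_width
        mask = (1 << out_width) - 1
        for w in words:
            for i in range(ratio):
                out.append((w >> (i * out_width)) & mask)  # lowest slice first
    return out
```

`convert_width([0x12, 0x34], 8, 16)` yields `[0x3412]`, and converting that result with `(16, 8)` yields `[0x12, 0x34]` again.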

10. An FPGA-based PCIe communication method, characterized in that the method includes: receiving, by the data receiving module, asynchronous data sent from the FPGA platform's PCIe to other PCIe devices or from other PCIe devices to the FPGA platform's PCIe, and synchronizing the asynchronous data into system clock domain data; descrambling the system clock domain data of PCIe communication protocol version 3.0 and above using the data scrambling / descrambling module to generate the original SKP sequence of the system clock domain data; detecting and analyzing the original SKP sequence using the link training state machine detection module to obtain the state of the link training state machine and generate data packets for that state; scrambling, using the data scrambling / descrambling module, the data packets output by the link training state machine detection module for PCIe communication protocol version 3.0 and above; and outputting, by the data sending module, the scrambled data packets from the link training state machine detection module together with the system clock domain data.
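Claim 10 leaves open how the detection module recognizes link training traffic. One plausible approach, assumed here, is to look for training sequence ordered sets. The sketch uses the 16-symbol 8b/10b TS1/TS2 layout of PCIe 1.x/2.0 (symbol 0 is COM, K28.5; identifier symbols 6 to 15 are D10.2 for TS1 or D5.2 for TS2) and ignores the K/D control flag. PCIe 3.0 ordered sets use a different 128b/130b format, and the mapping from a run of ordered sets to a specific LTSSM state is left out because the patent does not describe it.

```python
from typing import List, Optional, Sequence, Tuple

COM = 0xBC     # K28.5 (control flag not modeled)
TS1_ID = 0x4A  # D10.2
TS2_ID = 0x45  # D5.2


def classify_ts(symbols: Sequence[int]) -> Optional[str]:
    """Return "TS1", "TS2", or None for one 16-symbol ordered set."""
    if len(symbols) != 16 or symbols[0] != COM:
        return None
    ident = symbols[6:16]  # TS identifier symbols 6..15
    if all(s == TS1_ID for s in ident):
        return "TS1"
    if all(s == TS2_ID for s in ident):
        return "TS2"
    return None


def current_run(sets: List[Sequence[int]]) -> Tuple[Optional[str], int]:
    """Type and length of the run of identical TS ending at the last set."""
    kind: Optional[str] = None
    run = 0
    for s in sets:
        k = classify_ts(s)
        if k is not None and k == kind:
            run += 1
        else:
            kind, run = k, (1 if k is not None else 0)
    return kind, run
```

A detection module built this way would feed the run type and length into its own state-inference rules; any thresholds for that step would have to come from the PCIe specification.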
