High speed low latency interconnect interface for silicon die to die interconnect
By designing a high-speed, low-latency interconnect interface for silicon dielectric layer interconnects, the data transmission efficiency and power consumption problems of traditional interface designs in heterogeneous computing and high-bandwidth memory are solved, achieving high-speed, low-latency data transmission and high power efficiency, which is suitable for heterogeneous integration of heterogeneous computing and high-bandwidth memory.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 58TH RES INST OF CETC
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-19
AI Technical Summary
At the deep subnanometer technology node, traditional silicon dielectric layer interconnect interface designs are difficult to meet the needs of heterogeneous computing and high-bandwidth memory, cannot achieve high-speed, low-latency data transmission, and have low power efficiency.
A high-speed, low-latency interconnect interface for silicon dielectric layer interconnect is designed, including a physical layer and a link layer. The link layer is used to receive and control signals inside the chip, and the physical layer is used for signal transmission and reception. It supports parallel data transmission of multiple channels, is compatible with DDR and SDR modes, and includes multiple modules to realize data conversion, verification, training and control functions.
It enables protocol-free high-speed data transmission on the silicon dielectric layer, meets the requirements of high efficiency and high performance-to-power ratio, supports interconnection of multiple chips, and is suitable for heterogeneous integration of heterogeneous computing and high-bandwidth memory.
Figure CN116050307B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of high-speed physical interface design technology, and in particular to a high-speed, low-latency interconnect interface for silicon dielectric layer interconnects.
Background Technology
[0002] Following the breakdown of Dennard scaling, semiconductor technology roadmaps have proposed extending Moore's Law while focusing on diversified packaging. The upgrade from monolithic integration to System-on-Chip (SoC) was a milestone in the semiconductor industry. However, as technology nodes entered the deep sub-nanometer range, design became more difficult and design costs became prohibitively high, making it hard to recoup the investment in a market of limited size.
[0003] Even more challenging is the fact that traditional homogeneous processors are struggling to meet the computational demands of applications such as big data, whose demand for computing power is growing explosively. Dedicated accelerators are needed for heterogeneous computing (HC), which requires heterogeneous integration of different memory chips. At the same time, high-bandwidth memory (HBM), which suits data-intensive applications, also requires heterogeneous integration. Therefore, there is an urgent need to design a high-speed, low-latency interconnect interface (HLII) for silicon dielectric layer interconnects.
[0004] However, unlike interface designs for traditional PCB (Printed Circuit Board) layer interconnects or SIP (System In a Package) integration, silicon dielectric layer interconnects require high-speed interconnects for large-scale I/O between heterogeneous chips. Therefore, high-speed, low-latency interconnect interface architectures for silicon dielectric layer interconnects cannot follow the design of traditional high-speed interfaces, and their data transmission efficiency and power consumption also face challenges.
Summary of the Invention
[0005] Therefore, it is necessary to provide a high-speed, low-latency interconnect interface for silicon dielectric layer interconnection to address the aforementioned technical problems.
[0006] In a first aspect, this application provides a high-speed, low-latency interconnect interface for silicon dielectric layer interconnection, including a physical layer and a link layer between the internal logic of the chip and the physical layer;
[0007] The link layer is used to receive signals from inside the core, transmit the signals from inside the core to the physical layer, and control the physical layer according to the signals from inside the core. The signals from inside the core include data signals, and transmitting the signals from inside the core to the physical layer includes converting the data signals and then sending them to the physical layer.
[0008] The physical layer is used to receive signals transmitted via the link layer, transmit the signals through the silicon dielectric to the physical layer of another high-speed low-latency interconnect interface, and receive signals transmitted by the physical layer of another high-speed low-latency interconnect interface, transmit the signals to the link layer, and transmit them to the core after being received by the link layer.
[0009] In one embodiment, the signals inside the core also include configuration signals and control signals, and the control of the physical layer includes performing data conversion, parity checking, training, channel repair, and instruction stream generation on the physical layer.
[0010] In one embodiment, the physical layer includes at least one transmission channel, and the link layer includes at least one logical control channel, wherein the number of transmission channels and the number of logical control channels are the same;
[0011] The transmission channel is used to transmit data signals, and the modes of transmitting data signals include DDR transmission mode and SDR transmission mode.
[0012] The logic control channel is used to control and schedule the data stream of the transmission channel.
[0013] In one embodiment, each of the transmission channels includes a plurality of transmission sub-channels, each of the transmission sub-channels being responsible for transmitting at least 32 bits of data signal;
[0014] Each of the logical control channels includes multiple logical control sub-channels, and the logical control sub-channels correspond one-to-one with the transmission sub-channels. The logical control sub-channels are used to control and schedule the data streams transmitted by the corresponding transmission sub-channels.
[0015] In one embodiment, the transmission subchannel includes multiple DWORD bit slices, a transmit clock generation module, a receive clock generation module, a DWORD FIFO controller, a delay line tester, and an Rx clock driver;
[0016] Each DWORD bit slice includes one transmit data FIFO, one receive data FIFO, one transmit I / O and one receive I / O;
[0017] The transmit clock generation module is used to generate a high-speed clock;
[0018] The receiving clock generation module is used to generate a high-speed clock and a clock for capturing read data;
[0019] The DWORD FIFO controller is used to control the transmit data FIFO and receive data FIFO in the DWORD bit slice;
[0020] The delay line tester is used to fine-tune the delay on the transmit clock to focus the clock on the data eye;
[0021] The Rx clock driver is used to increase the clock drive strength.
[0022] In one embodiment, the logic control subchannel includes a control module, a delay line controller, a DWORD loopback BIST, a data generation module, and a data checking module;
[0023] The control module is used to control the data path and carry data signals;
[0024] The delay line controller is used for the control, calibration, and VT compensation of the DWORD delay line;
[0025] The DWORD loopback BIST provides the BIST logic for loopback and delay line testing.
[0026] The data generation module and data inspection module are used to generate training and testing data.
[0027] In one embodiment, the physical layer further includes a physical layer Master and an interface testing module;
[0028] The physical layer Master is used to provide the physical layer with a global clock, reset signal and reference voltage;
[0029] The interface testing module is used to perform functional testing on high-speed, low-latency interconnect interfaces.
[0030] In one embodiment, the link layer further includes a link layer MASTER module, which includes a configuration module, a Master status register, an initialization engine, a training controller, a reset and test controller, a P1500 controller, an instruction stream generator, and an instruction unit.
[0031] The configuration module is used to interact with APB interface, TDR interface, and JTAG interface transactions to perform CSR reading and writing;
[0032] The Master status register includes all logical status registers that can be shared by the entire interface;
[0033] The initialization engine is used to implement the initialization process at the hardware level and, in conjunction with the status register, to perform the initialization operation of the high-speed, low-latency interconnect interface.
[0034] The training controller is used to automatically perform training on read delay, read data eye, write data eye, and reference voltage.
[0035] The reset and test controller is used for reset generation, impedance calibration of the I/O drivers, providing a global reference voltage for the data-receiving I/O, controlling the test output I/O ports used to monitor internal test signals of the high-speed low-latency interconnect interface, and controlling the I/O ports of the interface test module.
[0036] The P1500 controller is used to generate P1500 commands for testing.
[0037] The instruction stream generator is the engine used to execute commands within the high-speed, low-latency interconnect interface and the P1500.
[0038] The instruction unit is used to complete the decoding and distribution of internal instructions of the high-speed, low-latency interconnect interface.
[0039] In a second aspect, this application also provides a high-speed, low-latency interconnect topology for silicon dielectric layer interconnection, including multiple cores stacked on a silicon dielectric layer and at least one interconnect interface corresponding to each core.
[0040] The interconnection interface is the high-speed, low-latency interconnection interface described in the first aspect of this application.
[0041] In one embodiment, each of the interconnect interfaces includes at least one transmission channel; the transmission channels of the plurality of interconnect interfaces are symmetrical and identical to support interfacing between the plurality of interconnect interfaces.
[0042] The aforementioned high-speed, low-latency interconnect interface for silicon dielectric layer interconnect includes a physical layer and a link layer. The link layer is located between the physical layer and the internal logic of the chip. The link layer is used to receive signals from inside the chip and can perform control functions for the physical layer. The physical layer receives signals transmitted through the link layer, such as data signals converted by the link layer, and performs the transmission and reception of the data signals. For example, it can transmit the data signals through the silicon dielectric to the physical layer of another high-speed, low-latency interconnect interface, and receive signals transmitted by the physical layer of another high-speed, low-latency interconnect interface. The physical layer then transmits the signals to the link layer, which receives them and transmits them to the inside of the chip. This completes the transmission of data streams between high-speed, low-latency interconnect interfaces for silicon dielectric layer interconnect, providing protocol-free high-speed data transmission on the silicon dielectric layer and meeting requirements such as high-efficiency data transmission and high performance-to-power ratio.
[0043] In some embodiments, the aforementioned high-speed, low-latency interconnect interface for silicon dielectric layer interconnection can support multiple channels, each channel supporting parallel data transmission and compatible with DDR and SDR transmission modes. Each channel contains multiple transmission sub-channels, and each transmission sub-channel can provide at least 32 bits of data transmission. The high-speed, low-latency interconnect interface of this application can be configured with 1, 2, 4, 8, or more channels to meet the design requirements of different cases. All channels of the high-speed, low-latency interconnect interface are symmetrical and identical. A multi-channel high-speed interface physical layer can support interconnection with multiple computing chips.
Attached Figure Description
[0044] Figure 1 is a schematic diagram of the interconnection of different chips (CPU / GPU / SoC / FPGA / memory chips, etc.) on a silicon dielectric layer in one embodiment;
[0045] Figure 2 is a top-level architecture block diagram of a high-speed, low-latency interconnect interface in one embodiment;
[0046] Figure 3 is a schematic diagram of the hierarchical relationship of a high-speed, low-latency interconnect interface in one embodiment;
[0047] Figure 4 is a top-level design structure diagram of a high-speed, low-latency interconnect interface in one embodiment;
[0048] Figure 5 is a top-level structural block diagram of the link layer MASTER module in one embodiment;
[0049] Figure 6 is a top-level structural block diagram of the logic control sub-channel in one embodiment;
[0050] Figure 7 is a top-level structural block diagram of a transmission sub-channel in one embodiment;
[0051] Figure 8 is a detailed logic structure diagram of the transmission sub-channel in one embodiment;
[0052] Figure 9 is a diagram of the write data transmission path from the link layer to the physical layer output (side) in one embodiment;
[0053] Figure 10 is a diagram of the read data reception path from the physical layer input (side) to the link layer in one embodiment;
[0054] Figure 11 is a schematic diagram of the interconnection between chips with different numbers of channels in one embodiment.
Detailed Implementation
[0055] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0056] The high-speed, low-latency interconnect interface for silicon dielectric layer interconnects provided in the embodiments of this application can be applied in the application environment shown in Figure 1. Chip 1 and Chip 2 are stacked on a silicon dielectric layer. Each chip can correspond to one or more High-Speed Low-Latency Interconnect Interfaces (HLIIs). HLIIs provide protocol-free high-speed data transmission for the chips on the silicon dielectric layer. These chips can be CPUs, GPUs, SoCs, FPGAs, memory chips, and so on. This HLII design provides logic support for all HLII-compatible chips.
[0057] In one embodiment, as shown in Figure 2, a high-speed, low-latency interconnect interface for silicon dielectric layer interconnects is provided. The following description takes as an example its application to the interconnection of different chips on a silicon dielectric layer shown in Figure 1. This high-speed, low-latency interconnect interface includes a physical layer and a link layer, with the link layer located between the internal logic of the chip and the physical layer.
[0058] The link layer is used to receive signals from inside the core, transmit the signals from inside the core to the physical layer, and control the physical layer based on the signals from inside the core. The signals from inside the core include data signals, and transmitting the signals from inside the core to the physical layer includes converting the data signals and then sending them to the physical layer.
[0059] The physical layer is used to receive signals transmitted via the link layer, transmit the signals through the silicon dielectric to the physical layer of another high-speed low-latency interconnect interface, and receive signals transmitted by the physical layer of another high-speed low-latency interconnect interface, transmit the signals to the link layer, and transmit them to the core after being received by the link layer.
[0060] Specifically, data transmission in the high-speed low-latency interconnect interface is mainly achieved through the link layer and the physical layer. The link layer is located between the HLII physical layer and the internal logic of the chip. Signal transmission between the internal logic of the chip and the link layer, between the link layer and the physical layer, and between the physical layer and the physical layer of another high-speed low-latency interconnect interface through the silicon medium are all bidirectional.
[0061] For example, the internal logic resources of chip 1 send data signals to the link layer of the corresponding high-speed low-latency interconnect interface. After receiving the data signals, the link layer performs data conversion and sends them to the physical layer. The physical layer receives the data signals converted by the link layer and transmits the data signals through the silicon medium to the physical layer of another high-speed low-latency interconnect interface. Then, the physical layer of the other high-speed low-latency interconnect interface transmits the data signals to the link layer of the corresponding high-speed low-latency interconnect interface. Finally, the link layer transmits the data signals to the corresponding chip 2 to complete the data transmission from chip 1 to chip 2.
[0062] The aforementioned high-speed, low-latency interconnect interface for silicon dielectric layer interconnect includes a physical layer and a link layer. The link layer is located between the physical layer and the internal logic of the chip. The link layer is used to receive signals from inside the chip and can perform control functions for the physical layer. The physical layer receives signals transmitted through the link layer, such as data signals converted by the link layer, and performs the transmission and reception of the data signals. For example, it can transmit the data signals through the silicon dielectric to the physical layer of another high-speed, low-latency interconnect interface, and receive signals transmitted by the physical layer of another high-speed, low-latency interconnect interface. The physical layer then transmits the signals to the link layer, which receives them and transmits them to the inside of the chip. This completes the transmission of data streams between high-speed, low-latency interconnect interfaces for silicon dielectric layer interconnect, providing protocol-free high-speed data transmission on the silicon dielectric layer and meeting requirements such as high-efficiency data transmission and high performance-to-power ratio.
[0063] In one embodiment, as shown in Figure 3, data transmission in the high-speed low-latency interconnect interface (HLII) is primarily implemented through the link layer and the physical layer. The link layer, situated between the HLII physical layer and the internal logic of the core, receives signals from within the core. These signals include data, configuration, and control signals, and the link layer performs functions such as data conversion, parity checking, training, channel repair, and instruction stream generation for the physical layer. The link layer mainly provides control functions to facilitate initialization, delay line calibration, and VT compensation of the HLII physical layer within the core. It can also be programmed and configured through the core's internal registers. The link layer has built-in self-test features for functional testing of the physical layer. The link layer and physical layer are connected via a dedicated data interface. In addition, the link layer includes configuration status registers accessible through a configuration port. These registers are accessible via the APB interface, and a separate optional TDR interface makes the test functions easier to access. The physical layer receives data signals converted by the link layer and completes the transmission and reception of data signals. The physical layer mainly includes high-speed I/O ports, FIFOs, and related control logic. The high-speed I/O ports of the physical layer are compatible with both DDR mode and SDR mode.
[0064] This high-speed, low-latency interconnect interface provides protocol-free, high-speed data transmission for the chips on the silicon dielectric layer. Figure 4 shows the top-level design architecture of this high-speed, low-latency interconnect interface. As shown, HLII includes a physical layer (PL) and a link layer (LL). The physical layer includes at least one transmission channel, a physical layer master (PL master), and an interface test module. The link layer includes at least one logical control channel and a link layer master module (LL master).
[0065] The number of transmission channels and logic control channels is the same. The transmission channels are used to transmit data signals, and the modes of data signal transmission include DDR transmission mode and SDR transmission mode. The logic control channels are used to control and schedule the data flow of the transmission channels, and to perform functions such as timing calibration, impedance calibration, BIST process control, and channel repair.
[0066] The physical layer MASTER provides a global clock, reset signal, and Vref reference voltage for the entire physical layer. The interface test module is used to perform functional testing on the high-speed, low-latency interconnect interface.
[0067] The link layer MASTER module implements the control logic and can be shared by all channels. As shown in Figure 5, the link layer MASTER module includes a configuration module, a Master status register, an initialization engine, a training controller, a reset and test controller, a P1500 controller, an instruction stream generator, and an instruction unit. The Master control register contains all logic control registers shared by the entire interface; these CSRs are separate from the control registers already implemented in each channel. The configuration module interacts with APB, TDR, and JTAG transactions to read and write the control registers. Because the APB interface, the JTAG interface, and the MASTER control register operate in different clock domains, the configuration module must not only convert the configuration information from the APB and JTAG interfaces into internal control register data but also handle the clock domain crossing of that data. The initialization engine implements the initialization process at the hardware level and, in conjunction with the control registers, initializes the HLII. The initialization engine can also perform frequency switching, allowing the HLII to operate in different power consumption states. The training controller can automatically perform read latency, read data eye, write data eye, and reference voltage training. The instruction stream generator is the engine used to execute internal HLII commands and P1500 commands. The instruction unit centrally decodes and distributes internal HLII commands. The reset and test controller provides several control functions: reset generation, impedance calibration of the I/O drivers, the global reference voltage supplied to the I/O receivers, the test output I/O ports used to monitor internal HLII test signals, and the I/O ports of the interface test module.
[0068] In one embodiment, each transmission channel includes multiple transmission sub-channels (PHY Data WORD, PHY DWORD), each responsible for transmitting at least 32 bits of data signals. Each logic control channel includes multiple logic control sub-channels (Control Data WORD, Control DWORD), each corresponding one-to-one with a transmission sub-channel. The logic control sub-channel is used to control and schedule the data stream transmitted by the corresponding transmission sub-channel. For example, logic control channel 0 corresponds to transmission channel 0, and Control DWORD 0 in logic control channel 0 corresponds to PHY DWORD 0 in transmission channel 0. Control DWORD 0 controls and schedules the data stream transmitted by PHY DWORD 0.
[0069] In one embodiment, the physical layer is responsible for transmitting and receiving data. The entire physical layer can be configured with up to eight channels, and each channel consists of four PHY DWORDs forming a hierarchical relationship. Each PHY DWORD is responsible for transmitting 32-bit data signals, and each transmission channel supports 128-bit parallel data transmission, compatible with DDR and SDR transmission modes.
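The width arithmetic in this embodiment can be checked with a short sketch. The figures of 32 bits per PHY DWORD, four PHY DWORDs per channel, and 1, 2, 4, or 8 channels come from the text; the class name `HliiConfig`, the method names, and the clock frequency passed to `peak_gbit_per_s` are illustrative assumptions, since the source gives no clock rate.

```python
from dataclasses import dataclass

BITS_PER_PHY_DWORD = 32      # from the text: each PHY DWORD carries 32 data bits
PHY_DWORDS_PER_CHANNEL = 4   # from the text: four PHY DWORDs per channel


@dataclass(frozen=True)
class HliiConfig:
    """Hypothetical description of one HLII configuration."""
    channels: int   # 1, 2, 4 or 8 in the embodiments described
    ddr: bool       # True for DDR mode, False for SDR mode

    def channel_width_bits(self) -> int:
        # 4 PHY DWORDs x 32 bits = 128 bits per channel
        return BITS_PER_PHY_DWORD * PHY_DWORDS_PER_CHANNEL

    def total_width_bits(self) -> int:
        return self.channels * self.channel_width_bits()

    def peak_gbit_per_s(self, clock_mhz: float) -> float:
        # DDR moves data on both clock edges, SDR on one edge only.
        # clock_mhz is an assumed input; the source states no frequency.
        transfers_per_cycle = 2 if self.ddr else 1
        return self.total_width_bits() * transfers_per_cycle * clock_mhz / 1000.0


full = HliiConfig(channels=8, ddr=True)
print(full.channel_width_bits(), full.total_width_bits())
```

Under these assumptions, the 8-channel configuration exposes 1024 data bits in parallel, and switching from SDR to DDR doubles the peak rate at the same clock.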
[0070] In addition to the full-speed 8-channel mode, in another embodiment, the high-speed low-latency interconnect interface can also be configured in 1, 2, or 4-channel modes to meet the design requirements of different cases. All high-speed low-latency interconnect interfaces have symmetrical and identical channels, and a multi-channel high-speed interface physical layer can support interconnection with multiple computing chips.
[0071] Referring to Figure 6, the logic control sub-channel (Control DWORD) includes a control module, a delay line controller, a DWORD loopback BIST, a data generation module, and a Read Status module. The control module controls the data path, carrying data signals that pass between the FPGA and the silicon dielectric through the high-speed, low-latency interface. It also includes data path remapping logic for interconnect redundancy and repair. The delay line controller controls, calibrates, and performs VT compensation on the four DWORD delay lines (WDQS_t / c delay line, DQ delay line, RDQS_t delay line, and RDQS_c delay line). The DWORD loopback BIST provides the BIST logic for loopback and delay line testing. The data generation module generates data for training and testing.
[0072] Referring to Figure 7, the transmission sub-channel (PHY DWORD) mainly performs the transmission and reception of data and signals. In one embodiment, the PHY DWORD includes multiple DWORD bit slices, a transmit clock generation module, a receive clock generation module, a DWORD FIFO controller, a delay line tester, and an Rx clock buffer. Each PHY DWORD can process 48 signal bits, that is, it contains 48 DWORD bit slices. Each bit slice consists of one transmit data FIFO, one receive data FIFO, one transmit I/O, and one receive I/O. The transmit clock generation module generates a high-speed clock for all data bit slices except the four DWORD bit slices used for the WDQS_t, WDQS_c, RDQS_t, and RDQS_c signals; it consists of one delay line and some glue logic. The receive clock generation module generates a high-speed clock for the four DWORD bit slices used for the WDQS_t, WDQS_c, RDQS_t, and RDQS_c signals, using one delay line and some glue logic; the DWORD bit slices for RDQS_t and RDQS_c are used only for loopback. The receive clock generation module also generates the clock for capturing read data, using two delay lines (one for RDQS_t and one for RDQS_c) and some glue logic. The transmit FIFO carries command and data signals and transfers them from the HLII internal clock domain to the I/O clock domain. The receive FIFO receives data signals and transfers them from the RDQS domain to the HLII internal clock domain. The delay line tester fine-tunes the delay on the transmit clock to center the clock in the data eye, which is also the basis of HLII training. The PHY DWORD also includes a delay line test module for testing the delay line ring oscillator. The DWORD FIFO controller is used to control the transmit data FIFO and receive data FIFO in the DWORD bit slice. The Rx clock buffer is used to increase clock drive.
[0073] In HLII, high-speed die-to-die data signal transmission is accomplished by the PHY DWORD in the physical layer. One PHY DWORD contains logic for 4 bytes. Each byte has a dedicated data mask (DM) signal and a dedicated data bus inversion (DBI) signal, but all 4 bytes share the same data strobe pair. Figure 8 shows the detailed logical structure of the PHY DWORD and the corresponding block diagram of the Control DWORD in the link layer.
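As an illustration of per-byte data bus inversion over a 4-byte DWORD, the sketch below encodes and decodes a 32-bit word with a 4-bit DBI flag, one flag per byte. The source does not state which inversion rule HLII uses, so the threshold here (invert a byte when more than four of its bits are 1) is an assumption chosen only to show the mechanism, and the function names are hypothetical.

```python
from typing import Tuple


def dbi_encode(dword: int) -> Tuple[int, int]:
    """Return (encoded_dword, dbi_flags) for a 32-bit word.

    Assumed rule: a byte is inverted when it contains more than four 1 bits.
    Bit i of dbi_flags corresponds to byte i (DBI[i]).
    """
    encoded = 0
    dbi_flags = 0
    for byte_idx in range(4):
        byte = (dword >> (8 * byte_idx)) & 0xFF
        if bin(byte).count("1") > 4:
            byte ^= 0xFF                  # invert all eight bits of this byte
            dbi_flags |= 1 << byte_idx    # tell the receiver it was inverted
        encoded |= byte << (8 * byte_idx)
    return encoded, dbi_flags


def dbi_decode(encoded: int, dbi_flags: int) -> int:
    """Undo dbi_encode using the per-byte DBI flags."""
    dword = 0
    for byte_idx in range(4):
        byte = (encoded >> (8 * byte_idx)) & 0xFF
        if (dbi_flags >> byte_idx) & 1:
            byte ^= 0xFF
        dword |= byte << (8 * byte_idx)
    return dword


word = 0xFF00F00F
enc, flags = dbi_encode(word)
assert dbi_decode(enc, flags) == word
```

With this rule, no transmitted byte ever carries more than four 1 bits, which is the usual motivation for bus inversion; the receiver needs only the DBI flags to recover the original data.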
[0074] The PHY DWORD contains the transmission and loopback paths for the aforementioned 48 data signals. Specifically, these signals include:
[0075] Write data strobe pairs (WDQS_t and WDQS_c);
[0076] Read data strobe pairs (RDQS_t and RDQS_c);
[0077] Data input / output (DQ[31:0]);
[0078] Data bus inversion (DBI[3:0]);
[0079] Data mask (DM[3:0]);
[0080] Data parity check (PAR);
[0081] Data parity error (DERR);
[0082] Redundant data (RD[1:0]);
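The signal list above can be tallied against the 48 bit slices per PHY DWORD stated earlier. The signal names and widths are taken from the list; the dictionary and function names are only illustrative.

```python
# Width of each signal group in one PHY DWORD, as listed in the text.
PHY_DWORD_SIGNALS = {
    "WDQS_t/WDQS_c": 2,   # write data strobe pair
    "RDQS_t/RDQS_c": 2,   # read data strobe pair
    "DQ[31:0]": 32,       # data input / output
    "DBI[3:0]": 4,        # data bus inversion, one per byte
    "DM[3:0]": 4,         # data mask, one per byte
    "PAR": 1,             # data parity check
    "DERR": 1,            # data parity error
    "RD[1:0]": 2,         # redundant data
}


def total_signals(signals: dict) -> int:
    """Sum the widths of all signal groups."""
    return sum(signals.values())


print(total_signals(PHY_DWORD_SIGNALS))
```

The total is 48, which matches the statement that each PHY DWORD contains 48 DWORD bit slices.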
[0083] Although the data strobes are unidirectional in normal read and write operation, both a receiver and a driver are implemented for the write strobe and for the read strobe so that loopback test mode is supported.
[0084] The PHY DWORD uses write data interfaces (wrdata and wrdata_en) to exchange data signals with the HLII internal link layer. The link layer uses the wrdata_en signal to execute write transactions. Each PHY DWORD has its own independent wrdata_en signal, which allows HLII to operate in pseudo-channel mode or legacy mode. Figure 9 shows the details of the write data transmission path from the DFI controller to the PHY output (side).
[0085] The timing and control information for sending write data signals is written from the link layer to the transmit command FIFO. This information includes transmit enable, enabling the transmit data FIFO read clock, and updating the delay value (TxPhaseUpd) of the delay line on the transmit data FIFO read clock. TxEn and TxClkEn are wrdata_en signals from within the link layer, which are enabled only when valid write data is sent by the controller (that is, when wrdata_en is valid).
[0086] Write data from the link layer passes through a remapping module within the link layer before entering the physical layer. This covers the case where some data paths in HLII need to be remapped as a result of interconnect redundancy repair. Each write data signal also passes through an optional coarse delay pipeline used to delay signals within the link layer.
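One way to picture the remapping step is a table that steers each logical data lane onto a healthy physical lane. The source does not describe the repair scheme, so the "skip the faulty lanes and shift the rest" policy below is an assumption, as are the function name and the choice of 32 logical DQ lanes backed by 34 physical lanes (32 DQ plus the 2 redundant RD lanes).

```python
from typing import List, Set


def build_lane_map(num_logical: int, num_physical: int, faulty: Set[int]) -> List[int]:
    """Return lane_map where lane_map[logical] is the physical lane to use.

    Assumed policy: physical lanes marked faulty are skipped and every later
    logical lane shifts up by one, consuming a spare lane per fault.
    """
    healthy = [lane for lane in range(num_physical) if lane not in faulty]
    if len(healthy) < num_logical:
        raise ValueError("not enough healthy lanes to repair the link")
    return healthy[:num_logical]


# Hypothetical example: physical lane 5 failed interconnect test.
lane_map = build_lane_map(num_logical=32, num_physical=34, faulty={5})
print(lane_map[4], lane_map[5], lane_map[31])
```

With no faults the map is the identity, so the remapping module is transparent; with two redundant lanes this policy tolerates at most two faulty lanes per DWORD.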
[0087] Data sent on the data (DQ) signal is transmitted through the transmit data FIFO in the PHY DWORD. The FIFO is only written to when valid write data is sent by the controller (that is, when wrdata_en is valid). The transmit data FIFO is read using a clock delayed by a delay line, and the output of the transmit data FIFO passes through a clock-controlled transmit circuit.
[0088] For both the data transmit and receive FIFOs, the ratio of the write clock rate to the read clock rate is 1:2. Therefore, each input of the FIFO is 2 bits wide, and the output is 1 bit wide. Consequently, the depth ratio of the FIFOs on the read and write sides is also 2:1. If the depth of the FIFO on the write side is 6, then the depth on the read side is 12.
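The 1:2 clock ratio and the resulting 2:1 width and depth relationship can be modeled behaviorally. This is a software sketch rather than RTL: the class name `RatioFifo`, the bit ordering (the first bit of each written pair is read out first), and the overflow and underflow behavior are assumptions not stated in the source.

```python
from collections import deque
from typing import Deque, Tuple


class RatioFifo:
    """FIFO with a 2-bit-wide write port and a 1-bit-wide read port.

    The write side runs at half the read clock rate, so each write entry
    holds two bits and the read side sees twice as many (1-bit) entries.
    """

    def __init__(self, write_depth: int = 6) -> None:
        self.write_depth = write_depth
        self.read_depth = 2 * write_depth   # e.g. 6 write entries -> 12 read entries
        self._bits: Deque[int] = deque()

    def write(self, pair: Tuple[int, int]) -> None:
        """Slow-clock side: push one 2-bit entry."""
        if len(self._bits) + 2 > self.read_depth:
            raise OverflowError("FIFO full")
        first, second = pair
        self._bits.append(first & 1)
        self._bits.append(second & 1)

    def read(self) -> int:
        """Fast-clock side: pop one bit."""
        if not self._bits:
            raise IndexError("FIFO empty")
        return self._bits.popleft()


fifo = RatioFifo(write_depth=6)
fifo.write((1, 0))
fifo.write((1, 1))
print([fifo.read() for _ in range(4)])
```

Two reads drain what one write supplies, which is why the average data rate balances when the read clock runs twice as fast as the write clock.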
[0089] The output enable of the transmit circuit does not pass through the transmit data FIFO, but it does pass through the transmit command FIFO, thus being identical for all data channels within a DWORD. Furthermore, to reduce circuit area, the output within the transmit circuit is not clock-controlled. Because the timing requirements around the coordinator are less stringent, the timing at the transmit output enable is relatively relaxed. To provide more flexibility in the timing requirements of the same TxEn signal relative to TxDat in any channel, the timing of the transmit enable (TxEn) can be adjusted using the state control registers within the link layer, thereby providing more margin for setup and / or hold relative to the transmit data (TxDat).
[0090] The PHY DWORD interacts with the link layer through the read data interface (rddata, rddata_en, and rddata_valid). The link layer enables the read transaction using the rddata_en signal; each PHY DWORD has its own independent rddata_en signal, which allows the controller to operate the PHY in pseudo-channel mode or legacy mode. Data is returned to the link layer using the rddata signal, and the rddata_valid signal confirms data reception. Figure 10 shows the details of the read data reception path from the physical layer input side to the link layer.
[0091] The timing and control information for sending read data signals is written from the link layer to the transmit command FIFO. This information includes updating the delay line (RxPhaseUpd) on the read clock of the receive data FIFO. RxPhaseUpd and other general FIFO controls (such as pointer initialization) are generated by the channel initialization module during the initialization or VT update process.
[0092] Read data (DQ) from the external die is sampled through the DQ receive I/O port and written to the receive data FIFO using the read data strobes (RDQS_t / RDQS_c). The read data strobe is delayed by a delay line so that the strobe is aligned to the center of the read data eye. Because the external die drives RDQS_t and RDQS_c separately, both LOW and HIGH levels are valid at the same time. The only time the strobe needs to be masked is during the pre-initialization state, before the memory reset is deasserted. Therefore, by default, the strobe signal is not enabled until the reset signal is deasserted. The masking behavior of the read strobe can be modified using the HLII's internal control status register.
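Centering the strobe in the data eye is commonly done by sweeping the delay line and picking the middle of the widest passing window. The sketch below assumes that approach; the application does not state the search procedure, and `center_of_eye` and its pass/fail input are hypothetical.

```python
def center_of_eye(passes):
    """Return the delay tap at the center of the widest passing window.

    passes: one boolean per delay-line tap, True if read data was captured
    correctly at that tap (assumed to come from a training pattern check).
    """
    best_start, best_len = None, 0
    start = None
    # A trailing False closes a window that runs to the last tap.
    for tap, ok in enumerate(list(passes) + [False]):
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            length = tap - start
            if length > best_len:
                best_start, best_len = start, length
            start = None
    if best_start is None:
        raise ValueError("no passing delay setting found")
    return best_start + (best_len - 1) // 2
```

For a sweep in which taps 2 through 6 pass and all others fail, the function selects tap 4, leaving equal margin on both sides of the sampling point.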
[0093] The link layer uses a read data enable signal to enable the readout of the receive data FIFO. Data from the receive data FIFO passes through a remapping module in the link layer, which covers the case where some data paths need to be remapped as a result of interconnect redundancy repair in the HLII. After calibration, the read data enable signal is used to generate a read data valid signal, which discards invalid data latched into the FIFO during the preamble or postamble at the rising edge of the data strobe. The read data latency is typically compensated for by the number of latency cycles obtained during training.
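A simple way to model the valid generation is to delay rddata_en by the trained latency and use the result to filter the FIFO output. This is a cycle-level sketch under that assumption; the function names and the list-per-cycle representation are illustrative only.

```python
def make_rddata_valid(rddata_en, latency):
    """Delay rddata_en by `latency` cycles to produce rddata_valid."""
    if latency < 0:
        raise ValueError("latency must be non-negative")
    n = len(rddata_en)
    shifted = [0] * latency + list(rddata_en)
    return shifted[:n]


def filter_read_data(fifo_out, rddata_valid):
    """Keep only the FIFO outputs that coincide with rddata_valid."""
    return [data for data, valid in zip(fifo_out, rddata_valid) if valid]
```

With a trained latency of 2 cycles, an enable pattern of 1, 1, 0, 0, 0, 0 becomes a valid pattern of 0, 0, 1, 1, 0, 0, so entries captured before and after the burst are dropped.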
[0094] The write clock to read clock ratio of the receive data FIFO is 1:1, and the write side operates at double data rate. Therefore, each FIFO has a 2-bit input and a 2-bit output.
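The 2-bit input follows from capturing one bit on each strobe edge. A minimal sketch, assuming samples arrive in rising-edge, falling-edge order (the function name `ddr_capture` is hypothetical):

```python
def ddr_capture(edge_samples):
    """Group per-edge DQ samples into 2-bit words, one word per strobe cycle."""
    samples = list(edge_samples)
    if len(samples) % 2 != 0:
        raise ValueError("expected one rising and one falling sample per cycle")
    return [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]
```

Six edge samples therefore produce three 2-bit FIFO writes, and because the read side also moves 2 bits per cycle at the same clock rate, the read and write depths are equal.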
[0095] In a second aspect, embodiments of this application also provide a high-speed, low-latency interconnect topology for silicon dielectric layer interconnection, including multiple chips stacked on a silicon dielectric layer and at least one interconnect interface corresponding to each chip, wherein the interconnect interface is the high-speed, low-latency interconnect interface described in the first aspect of embodiments of this application.
[0096] In one embodiment, the number of transmission channels of the interconnect interface is 1, 2, 4, 8 or more; the transmission channels of the multiple interconnect interfaces are symmetrical and identical to support the interfacing between multiple interconnect interfaces.
[0097] Figure 11 illustrates the interconnection between chips with different numbers of high-speed, low-latency interconnect interfaces. As shown in the figure, chip 1 has 8 HLII channels, while chips 2 and 3 each have 4 HLII channels. Because the high-speed, low-latency interconnect interfaces are symmetrical and identical, a single multi-channel high-speed, low-latency interconnect interface can interface with multiple high-speed, low-latency interconnect interfaces. Figure 11 thus demonstrates the interconnection between an 8-channel FPGA die and two 4-channel FPGA dies.
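Because every channel is identical, assigning channels between chips reduces to a counting problem. The sketch below pairs the channels of one chip with those of its peers in order; the function `pair_channels`, the chip names, and the in-order assignment are assumptions, since the figure's actual wiring is not given in the text.

```python
def pair_channels(host_channels, peers):
    """Assign host channels to peer channels in order.

    peers: dict mapping a peer chip name to its channel count.
    Returns a dict: host channel index -> (peer name, peer channel index).
    """
    needed = sum(peers.values())
    if needed > host_channels:
        raise ValueError("host has too few channels for the requested peers")
    links = {}
    host = 0
    for name, count in peers.items():
        for ch in range(count):
            links[host] = (name, ch)
            host += 1
    return links
```

Calling `pair_channels(8, {"chip2": 4, "chip3": 4})` uses all eight channels of chip 1: channels 0 to 3 connect to chip 2 and channels 4 to 7 connect to chip 3.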
[0098] The high-speed, low-latency interconnect interface for silicon dielectric layer interconnection provided in this application embodiment is used for large-scale I/O interconnection on the silicon dielectric layer. It includes a physical layer and a link layer. The link layer is located between the physical layer and the internal logic of the chip. The link layer receives signals from inside the chip and can perform control functions for the physical layer. The physical layer receives signals transmitted through the link layer, such as data signals converted by the link layer, and completes the transmission and reception of the data signals. For example, it transmits the data signals through the silicon dielectric to the physical layer of another high-speed, low-latency interconnect interface. It also receives signals transmitted by the physical layer of another high-speed, low-latency interconnect interface and passes them to the link layer. After being received by the link layer, the signals are passed to the inside of the chip, thus completing the transmission of data streams between high-speed, low-latency interconnect interfaces for silicon dielectric layer interconnection. This provides protocol-free high-speed data transmission on the silicon dielectric layer, meeting requirements such as high-efficiency data transmission and a high performance-to-power ratio.
[0099] Furthermore, the aforementioned high-speed, low-latency interconnect interface for silicon dielectric layer interconnection can support multiple channels, each supporting parallel data transmission and compatible with DDR and SDR transmission modes. Each channel contains multiple transmission sub-channels, and each transmission sub-channel can provide at least 32 bits of data transmission. The high-speed, low-latency interconnect interface of this application can be configured with 1, 2, 4, 8, or more channels to meet the design requirements of different cases. All channels of the high-speed, low-latency interconnect interface are symmetrical and identical. For a multi-channel high-speed interface physical layer, it can support interconnection with multiple computing chips.
[0100] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0101] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A high-speed, low-latency interconnect interface for silicon dielectric layer interconnects, characterized in that it includes a physical layer and a link layer located between the internal logic of the chip and the physical layer; the link layer is used to receive signals from inside the core, transmit the signals from inside the core to the physical layer, and control the physical layer based on the signals from inside the core. The signals from inside the core include data signals, and transmitting the signals from inside the core to the physical layer includes data conversion before sending the data signals to the physical layer. The signals from inside the core also include configuration signals and control signals, and controlling the physical layer includes data conversion, parity checking, training, channel repair, and instruction stream generation. The physical layer is used to receive signals transmitted via the link layer, transmit the signals through the silicon dielectric to the physical layer of another high-speed, low-latency interconnect interface, receive signals transmitted by the physical layer of another high-speed, low-latency interconnect interface, and transmit those signals to the link layer, which passes them to the core after receiving them. The physical layer includes at least one transmission channel, and the link layer includes at least one logical control channel, wherein the number of transmission channels and logical control channels is the same; the transmission channel is used to transmit data signals, and the modes of transmitting data signals include DDR transmission mode and SDR transmission mode. Each transmission channel includes multiple transmission sub-channels, and each transmission sub-channel is responsible for transmitting at least 32 bits of data signals; the logical control channel is used to control and schedule the data flow of the transmission channel.
Each logical control channel includes multiple logical control sub-channels, and the logical control sub-channels correspond one-to-one with the transmission sub-channels. The logical control sub-channels are used to control and schedule the data flow transmitted by the corresponding transmission sub-channel.
2. The high-speed, low-latency interconnect interface according to claim 1, characterized in that the transmission sub-channel includes multiple DWORD bit slices, a transmit clock generation module, a receive clock generation module, a DWORD FIFO controller, a delay line tester, and an Rx clock driver; each of the DWORD bit slices includes a transmit data FIFO, a receive data FIFO, transmit I/O, and receive I/O; the transmit clock generation module is used to generate a high-speed clock; the receive clock generation module is used to generate a high-speed clock and a clock for capturing read data; the DWORD FIFO controller is used to control the transmit data FIFO and receive data FIFO in the DWORD bit slice; the delay line tester is used to fine-tune the delay on the transmit clock to center the clock in the data eye; the Rx clock driver is used to add clock drivers.
3. The high-speed, low-latency interconnect interface according to claim 1, characterized in that the logical control sub-channel includes a control module, a delay line controller, a DWORD loopback BIST, a data generation module, and a data checking module; the control module is used to control the data path and carry data signals; the delay line controller is used for the control, calibration, and VT compensation of the DWORD delay line; the DWORD loopback BIST is used for loopback and delay line testing of the BIST logic; the data generation module and the data checking module are used to generate training and testing data.
4. The high-speed, low-latency interconnect interface according to claim 1, characterized in that the physical layer also includes a physical layer Master and an interface testing module; the physical layer Master is used to provide the physical layer with a global clock, reset signal, and reference voltage; the interface testing module is used to perform functional testing on the high-speed, low-latency interconnect interface.
5. The high-speed, low-latency interconnect interface according to claim 1, characterized in that the link layer also includes a link layer MASTER module, which includes a configuration module, a Master status register, an initialization engine, a training controller, a reset and test controller, a P1500 controller, an instruction stream generator, and an instruction unit; the configuration module is used to handle transactions on the APB interface, TDR interface, and JTAG interface to perform CSR reads and writes; the Master status register includes all logical status registers that can be shared by the entire interface; the initialization engine is used to implement the initialization process at the hardware level and, in conjunction with the status register, to perform the initialization operation of the high-speed, low-latency interconnect interface; the training controller is used to automatically perform training on read delay, read data eye, write data eye, and reference voltage; the reset and test controller is used for reset generation, impedance calibration of the I/O drivers, providing a global reference voltage for the data receive I/O, and monitoring the test output I/O port for internal test signals of the high-speed, low-latency interconnect interface and the I/O port of the interface testing module; the P1500 controller is used to generate P1500 commands for testing; the instruction stream generator is the engine used to execute commands within the high-speed, low-latency interconnect interface and the P1500; the instruction unit is used to complete the decoding and distribution of internal instructions of the high-speed, low-latency interconnect interface.
6. A high-speed, low-latency interconnect topology for silicon dielectric layer interconnects, characterized in that it includes multiple chips stacked on a silicon dielectric layer and at least one interconnect interface corresponding to each chip; the interconnect interface is the high-speed, low-latency interconnect interface as described in any one of claims 1 to 5.
7. The high-speed, low-latency interconnect topology according to claim 6, characterized in that each of the interconnect interfaces includes at least one transmission channel; the transmission channels of the multiple interconnect interfaces are symmetrical and identical to support the interfacing between the multiple interconnect interfaces.
Citation Information
Patent Citations
Inter-chip high-speed interconnection link layer design method and system
CN104767828A
Inter-chip interconnecting interface based on layered structure and writing and reading operation method thereof
CN106502932A