A logic structure partitioning and netlist optimization collaborative method for multi-core FPGA

By introducing macroscopic physical partitioning constraints into the logic synthesis of multi-core FPGAs, the problem of excessive cross-core interconnects is solved, the system frequency and data throughput are improved, and the performance requirements of high-speed interfaces are met.

CN122287508A · Pending · Publication Date: 2026-06-26 · 58TH RES INST OF CETC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
58TH RES INST OF CETC
Filing Date
2026-03-31
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing multi-chip FPGA systems suffer from excessive cross-chip interconnects due to the separation of logic synthesis and physical partitioning, resulting in low system frequency and inability to meet the timing and performance requirements of high-speed interfaces such as DDR and PCIe.

Method used

In the micro-optimization process of logic synthesis, macro-physical partitioning constraints are introduced. The logic node regions are pre-partitioned using a hypergraph partitioning algorithm. The evaluation criteria of the logic synthesis optimization operator are redefined. Interconnection costs are introduced. An active structural reorganization strategy and a dynamic interconnection penalty mechanism are adopted. The total number of cross-core interconnections is monitored in real time, and the penalty coefficient is dynamically adjusted to prevent constraint failure and achieve closed-loop iteration of partitioning and optimization.

Benefits of technology

It significantly reduces the reliance of multi-core FPGA systems on time-division multiplexing technology, improves operating frequency and data throughput, optimizes resource utilization, and meets the timing and performance requirements of high-speed interfaces.

✦ Generated by Eureka AI based on patent content.

Figure CN122287508A_ABST

Abstract

This invention discloses a collaborative method for logic structure partitioning and netlist optimization in multi-core FPGAs, belonging to the fields of electronic design automation and integrated circuits. The invention introduces physical partitioning constraints during the logic synthesis stage and establishes an interconnect-aware synthesis cost function that integrates area, timing, and cross-core interconnect costs. The method includes: pre-partitioning the netlist with a hypergraph partitioning algorithm and anchoring node locations; selecting optimization operators in real time during logic transformation based on a dynamically adjusted interconnect penalty coefficient; suppressing operations with high interconnect cost; and performing active logic cloning and topology reconstruction for long-distance cross-core paths. The invention breaks the traditional boundary between logic synthesis and physical partitioning. Through closed-loop iteration of micro-optimization and macro-constraints, it reduces the number of cross-core interconnects at the source, significantly reduces the risk of routing congestion, and improves the operating frequency and performance of multi-core FPGA systems.

Description

Technical Field

[0001] This invention relates to the fields of electronic design automation (EDA) and integrated circuit design technology, and in particular to a collaborative method for logic structure partitioning and netlist optimization for multi-chip FPGAs.

Background Technology

[0002] With the exponential growth in integrated circuit design scale, a single FPGA chip often cannot accommodate large-scale circuits. The industry therefore widely adopts multi-chip FPGA systems based on advanced packaging technologies to expand logic capacity. However, the I/O pin resources between chips are limited, and inter-chip interconnect bandwidth is a core bottleneck restricting overall system performance.

[0003] Current multi-core FPGA EDA flows fall into two main categories. The first is register-transfer level (RTL) manual partitioning, which typically partitions the code along the design's functional module boundaries (such as CPU cores and DDR controllers). However, RTL code describes only logic functions and lacks concrete gate-area and timing information, which often leads to highly uneven resource utilization across cores. Furthermore, RTL partitioning is too coarse-grained to handle the large amount of fine-grained "glue logic" and cross-module control signals, and it hinders in-depth optimization of the circuit structure within each module.

[0004] The second category is the mainstream sequential flow of "logic synthesis first, then netlist partitioning." First, in the logic synthesis stage, synthesis tools (such as ABC) focus on optimizing the logic depth and area of the netlist's intermediate representation (AIG/MIG). Their core optimization operators (such as Rewrite, Resubstitution, and Refactor) typically perform greedy optimization within local windows and are completely unaware of subsequent physical placement information. As a result, tools often forcibly reuse logic nodes that end up physically far apart merely to save a few logic gates, introducing a large number of long-distance interconnect signals. Second, after the gate-level netlist is generated, the back-end partitioning tool typically uses graph partitioning algorithms (such as the KL, FM, or hMetis algorithms) to cut the netlist into parts assigned to different chiplets. However, because the netlist topology is fixed at the synthesis stage, once the synthesis tool has produced a large amount of "entangled" logic spanning potential physical boundaries in the course of micro-optimization, the back-end partitioning tool can only passively cut these connections.

[0005] These technical shortcomings lead to an excessive number of cross-chip interconnects in existing multi-chip FPGA systems, forcing them to rely heavily on time-division multiplexing (TDM) for signal transmission. This not only significantly increases transmission latency but also severely reduces the operating frequency of multi-chip FPGA systems, so they cannot meet the timing and performance requirements of high-speed interfaces such as DDR and PCIe. A collaborative optimization method that can perceive physical boundaries during the logic synthesis stage is therefore urgently needed.

Summary of the Invention

[0006] The purpose of this invention is to provide a collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs, in order to solve the problems of excessive cross-core interconnects and low system frequency caused by the separation of logic synthesis and physical partitioning in the prior art.

[0007] To address the aforementioned technical problems, this invention provides a collaborative method for logic structure partitioning and netlist optimization in multi-core FPGAs. Macro-level physical partitioning constraints are introduced into the micro-optimization process of logic synthesis, and collaborative optimization is achieved through the following steps: Step 1: Before performing logic optimization, read the initial netlist and the architecture description file of the multi-core FPGA, pre-partition the netlist using a hypergraph partitioning algorithm, assign an initial region identifier to each logic node, and establish location anchor points for optimization; Step 2: Redefine the evaluation criteria of the logic synthesis optimization operators by introducing an interconnection cost in addition to the traditional area and timing costs; the cost function Cost is expressed as:

$$\mathrm{Cost} = \Delta A + \alpha \cdot \Delta D + \lambda \cdot C_{\mathrm{inter}}$$

[0008] where $\Delta A$ denotes the change in the number of logic gates caused by a logic transformation; $\Delta D$ denotes the change in logic depth caused by the transformation; $\alpha$ is the timing weighting coefficient; $C_{\mathrm{inter}}$ indicates whether the candidate transformation introduces a new connection between nodes with different region identifiers; and $\lambda$ is the interconnection penalty coefficient. Step 3: Calculate the cost of candidate transformations in real time during logic rewriting and resubstitution. Step 4: For cross-core signals on the critical path, adopt an active structural reorganization strategy. Step 5: Because logic optimization changes the topology of the netlist, incremental partitioning is triggered periodically, and the region identifiers of the nodes are recalibrated according to the current connections, preventing constraint failure and realizing closed-loop iteration of partitioning and optimization.
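A minimal Python sketch of the Step 2 evaluation, assuming the additive form $\mathrm{Cost} = \Delta A + \alpha \cdot \Delta D + \lambda \cdot C_{\mathrm{inter}}$, which is consistent with the worked example in paragraphs [0023] to [0025]. The `CandidateTransform` container, the function names, and `alpha=1.0` are hypothetical; the patent gives no value for the timing weight. The acceptance rule (apply only when the cost is negative) follows Step 3 and the rule that the netlist is updated only when the cost decreases.

```python
from dataclasses import dataclass


@dataclass
class CandidateTransform:
    """Effect of one candidate rewrite/resubstitution (hypothetical container)."""
    delta_gates: int      # Delta A: change in gate count (negative = fewer gates)
    delta_depth: int      # Delta D: change in logic depth
    new_cross_links: int  # C_inter: 1 if a new cross-region connection appears, else 0


def interconnect_aware_cost(t: CandidateTransform, alpha: float, lam: float) -> float:
    """Cost = Delta A + alpha * Delta D + lambda * C_inter (assumed additive form)."""
    return t.delta_gates + alpha * t.delta_depth + lam * t.new_cross_links


def accept(t: CandidateTransform, alpha: float, lam: float) -> bool:
    # Only transformations that strictly decrease the total cost are applied.
    return interconnect_aware_cost(t, alpha, lam) < 0


# Mechanism (a): a resubstitution saves 2 gates but adds one cross-core connection.
resub = CandidateTransform(delta_gates=-2, delta_depth=0, new_cross_links=1)
print(interconnect_aware_cost(resub, alpha=1.0, lam=1.0))   # -1.0
print(interconnect_aware_cost(resub, alpha=1.0, lam=10.0))  # 8.0
```

With $\lambda = 1$ the two-gate saving outweighs the new cross-core link; with $\lambda = 10$ the same resubstitution is rejected.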

[0009] In one implementation, step 3 introduces a dynamic adjustment strategy that monitors the current total number of cross-core interconnects in real time and compares it with a preset congestion threshold. When the total number of cross-core interconnects is below the threshold, the interconnection penalty coefficient $\lambda$ is set to a baseline constant, allowing necessary logic reuse; when the total is greater than or equal to the threshold, $\lambda$ increases exponentially as the total number of cross-core interconnects grows, so that operations with high interconnection cost are rejected and cross-boundary connections are suppressed at the source.
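The text states only that $\lambda$ stays at a baseline constant below the congestion threshold and grows exponentially at or above it; the exact curve is not given. The sketch below assumes $\lambda = \lambda_0 e^{k(N - N_{th})}$ for $N \ge N_{th}$, where $N$ is the total number of cross-core interconnects. The function name, the growth rate `growth` ($k$), and the example threshold of 400 are hypothetical.

```python
import math


def penalty_coefficient(total_cross: int, threshold: int,
                        base: float = 1.0, growth: float = 0.05) -> float:
    """Return the interconnect penalty coefficient (lambda).

    Assumed schedule (not specified in the patent):
      total_cross <  threshold -> base (baseline constant)
      total_cross >= threshold -> base * exp(growth * (total_cross - threshold))
    """
    if total_cross < threshold:
        return base
    # Exponential growth with the excess over the congestion threshold.
    return base * math.exp(growth * (total_cross - threshold))


# Hypothetical threshold: 400 of the 500 I/Os available on a core pair.
for n in (100, 400, 420, 450):
    print(n, round(penalty_coefficient(n, threshold=400), 2))
```

Under this assumed form $\lambda$ is continuous at the threshold ($e^0 = 1$), so decisions do not jump abruptly when the count first reaches it; a step change would also be consistent with the text.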

[0010] In one implementation, the active structural reorganization strategy in step 4 includes: when the source node and the load node are detected to be located in different cores and timing is tight, the algorithm backtracks the fan-in logic cone of the source node; if the number of logic nodes contained in the logic cone is less than the number of logic gate changes, or the logic cone already exists in the target core, then the source node is copied in the target core, and the original cross-core signal is converted into a local signal inside the core.
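A minimal sketch of the eligibility test in Step 4, assuming the netlist is held as a fan-in adjacency dictionary and each node carries an integer region identifier. All names are hypothetical. `max_clone_gates` stands in for the bound the text compares the cone size against, and "the logic cone already exists in the target core" is simplified to "every node in the cone other than the source is already assigned to the target core". A real tool would bound the backtracking depth and check functional equivalence rather than region membership.

```python
from typing import Dict, List, Set


def fanin_cone(node: str, fanins: Dict[str, List[str]]) -> Set[str]:
    """Return the transitive fan-in cone of `node`, including `node` itself."""
    cone: Set[str] = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n in cone:
            continue
        cone.add(n)
        stack.extend(fanins.get(n, []))  # walk back towards the inputs
    return cone


def should_clone(source: str, load: str, fanins: Dict[str, List[str]],
                 region: Dict[str, int], timing_critical: bool,
                 max_clone_gates: int) -> bool:
    """Decide whether to clone `source` into the core that holds `load`."""
    src_core, dst_core = region[source], region[load]
    if src_core == dst_core or not timing_critical:
        return False  # only cross-core signals with tight timing qualify
    if not fanins.get(source):
        return False  # assumption: primary inputs/constants are not cloned
    cone = fanin_cone(source, fanins)
    small_cone = len(cone) < max_clone_gates  # hypothetical size bound
    # Simplification: the cone "already exists" in the target core when every
    # node feeding the source is already assigned to that core.
    cone_in_target = all(region[n] == dst_core for n in cone - {source})
    return small_cone or cone_in_target


fanins = {"K": ["a", "b"], "a": [], "b": []}
region = {"K": 0, "a": 2, "b": 2, "L1": 2}  # K in core A (0), its inputs in core C (2)
print(should_clone("K", "L1", fanins, region, timing_critical=True, max_clone_gates=2))  # True
```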

[0011] In one embodiment, the method is implemented on an EDA computer system that includes a processor, RAM, a storage medium, and a bus interface connecting these components. The processor includes a general-purpose central processing unit, an application-specific integrated circuit, or a field-programmable gate array, and executes computer program instructions stored in the storage medium; the RAM provides the temporary data storage space required for processor operation; and the storage medium is a non-volatile computer-readable storage medium, such as a hard disk, optical disk, or flash memory, on which an operating system and a computer program module for performing the method are stored.

[0012] In one embodiment, the storage medium specifically includes the following functional modules:
The design and constraint loading module is configured to receive user-input netlist files and architecture description files for multi-core FPGAs; it resolves the number of cores, the interconnect topology, and the maximum available I/O bandwidth between each pair of cores, providing the physical constraint basis for subsequent optimization.
The pre-partitioned hypergraph segmentation module is configured to perform coarse-grained physical partitioning of the initial netlist using a hypergraph partitioning algorithm before logic optimization is executed; specifically, it assigns an initial region identifier to each logic node in the netlist, thereby establishing the location anchor points for logic synthesis.
The interconnect cost calculation engine is the core decision-making unit, configured to calculate the synthesis cost of candidate transformation operators in real time during logic synthesis. It integrates an interconnect-aware cost function and also executes a dynamic penalty strategy, monitoring the current total number of cross-core interconnects in real time and adjusting the interconnect penalty coefficient according to how close that total is to the I/O bandwidth limit.
The logic optimization engine is configured to perform specific netlist structure transformations based on the evaluation results of the interconnect cost calculation engine. Its specific functions include: ① executing regular optimization operators: performing logic rewriting and resubstitution on the netlist, and updating the netlist only when the calculated overall cost decreases; ② executing proactive structural reorganization: for cross-core signals on the critical path, when tight timing and excessive interconnect cost are detected, performing proactive logic cloning to copy the logic structure into the target core region, converting the original cross-core signals into local signals within the core and eliminating long-distance interconnects.
The incremental update module is configured to periodically trigger incremental partitioning and calibration after the logic optimization engine changes the topology of the netlist. It updates the region identifiers of the nodes according to the changed connections, prevents the physical constraints from becoming invalid due to topology changes, and realizes closed-loop iteration of partitioning and optimization.

[0013] This invention provides a collaborative method for logic structure partitioning and netlist optimization in multi-core FPGAs. By introducing macroscopic physical partitioning constraints during the micro-optimization process of logic synthesis, it breaks the boundary between logic optimization and physical partitioning in the traditional flow. Exploiting the structural plasticity of the logic netlist, it actively performs cross-boundary logic cloning and structural reorganization, thereby eliminating a large number of unnecessary cross-core interconnects at the source. The invention overcomes the shortcoming of RTL-level partitioning, which is too coarse to reach optimization inside modules, and avoids the passive cutting problem of traditional post-synthesis partitioning, in which the netlist topology is already fixed. It can effectively decouple complex control logic and glue logic, significantly reducing the dependence of multi-core FPGA systems on TDM technology and thus greatly improving the operating frequency and data throughput of high-capacity FPGA systems.

Description of the Drawings

[0014] Figure 1 is a schematic diagram of the overall process of the method of the present invention.

[0015] Figure 2 is a schematic diagram illustrating the calculation logic of the interconnect-aware optimization cost function in this invention.

[0016] Figure 3 is a schematic diagram of the two differentiated processing logics of the interconnect-aware optimization mechanism in this invention.

[0017] Figure 4 is a schematic diagram illustrating the dynamic adjustment trend of the interconnection penalty coefficient in this invention.

[0018] Figure 5 is a block diagram of an electronic design automation system capable of implementing the method of the present invention.

Detailed Description

[0019] The following detailed description, in conjunction with the accompanying drawings and specific embodiments, provides a further detailed explanation of the collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs proposed in this invention. The advantages and features of this invention will become clearer from the following description. It should be noted that the accompanying drawings are all in a very simplified form and use non-precise proportions, and are only used to facilitate and clarify the illustration of the embodiments of this invention.

[0020] This embodiment provides a collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs. The overall algorithm flow is shown in Figure 1 and mainly includes the following steps:
Step 1: Read the initial AIG netlist and load the architecture description file of the multi-chip FPGA to obtain the number of chips (e.g., 4) and the maximum number of available I/Os between each pair of chips (e.g., 500 interconnects).
Step 2: Perform k-way hypergraph partitioning on the AIG netlist using the hMetis partitioning algorithm. At the data-structure level, the AIG netlist is a directed acyclic graph whose nodes (AIG nodes) represent logic operations and whose edges represent their interconnections. Assume the AIG netlist is partitioned into four regions, $P_0$, $P_1$, $P_2$, and $P_3$; each AIG node $n$ then receives an initial region identifier indicating the region to which it belongs.
Step 3: Traverse the logic nodes (AIG nodes) in the netlist, take each as a target node to be optimized, and attempt structural transformations such as Rewrite and Resubstitution on it.
Step 4: Each time a candidate transformation is attempted with these operators, call the interconnect-aware cost function to evaluate it. As shown in Figure 2, the cost function first determines whether the source node and the load node involved in the transformation are located in the same core region. If they are not, the interconnection penalty term is computed and weighted by the dynamic coefficient $\lambda$ to obtain the total cost. The algorithm judges the benefit from this total cost and performs the transformation, updating the netlist topology, only if the total cost decreases.
Step 5: After each complete traversal, because logic optimization significantly changes the topology and connectivity of the netlist, incremental partitioning is triggered again for calibration, preventing the physical constraints from becoming invalid.
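The closed loop of Steps 3 to 5 can be sketched as a hypothetical skeleton, not the patent's implementation. Candidate transformations are supplied pre-evaluated for each traversal pass (their gate, depth, and cross-core deltas), $\lambda$ follows an assumed exponential schedule, and the incremental repartitioning of Step 5 is a caller-supplied hook because the incremental partitioner is not specified. `delta_cross` is treated as the net change in cross-core connections, an assumption that generalizes the 0/1 indicator so that cloning, which removes a connection, can also be scored. `Candidate`, `penalty`, `optimize`, and `repartition` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    delta_gates: int   # change in gate count (Delta A)
    delta_depth: int   # change in logic depth (Delta D)
    delta_cross: int   # assumed: net change in cross-core connections


def penalty(total_cross: int, threshold: int, base: float = 1.0,
            growth: float = 0.05) -> float:
    # Assumed schedule: constant below the threshold, exponential at or above it.
    if total_cross < threshold:
        return base
    return base * math.exp(growth * (total_cross - threshold))


def optimize(passes: List[List[Candidate]], total_cross: int, threshold: int,
             alpha: float = 1.0,
             repartition: Optional[Callable[[int], int]] = None
             ) -> Tuple[List[str], int]:
    """Apply candidates whose interconnect-aware cost is negative.

    Returns the names of applied candidates and the final cross-core count.
    """
    applied: List[str] = []
    for candidates in passes:                       # one list per full traversal
        for c in candidates:
            lam = penalty(total_cross, threshold)   # re-read live congestion
            cost = c.delta_gates + alpha * c.delta_depth + lam * c.delta_cross
            if cost < 0:                            # only strictly improving moves
                applied.append(c.name)
                total_cross += c.delta_cross
        if repartition is not None:                 # Step 5 hook (hypothetical)
            total_cross = repartition(total_cross)
    return applied, total_cross


resub = Candidate("resub", -2, 0, +1)
clone = Candidate("clone", +1, 0, -1)
print(optimize([[resub, clone]], total_cross=10, threshold=100))   # (['resub'], 11)
print(optimize([[resub, clone]], total_cross=150, threshold=100))  # (['clone'], 149)
```

Recomputing $\lambda$ before every candidate, rather than once per pass, is what lets the loop react as soon as accepted transformations push the interconnect count toward congestion.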

[0021] With reference to Figure 3 and Figure 4, the following further elaborates the adaptive optimization logic based on the dynamic interconnection penalty mechanism in the logic synthesis stage of the method, as well as its differentiated handling of different physical topology constraints.

[0022] As shown in Figure 3, assume a multi-core FPGA system contains four core regions arranged side by side on the same interposer, labeled core A, core B, core C, and core D, with a maximum of 500 available I/O pairs between each pair of cores. Because the physical distances between cores differ, long-distance interconnects between non-adjacent cores (such as cores A and D) incur very high latency and routing-resource costs. To show clearly how strongly the dynamic interconnect penalty mechanism steers logic structure optimization, this embodiment assumes in its quantitative calculations that the change in logic depth caused by a transformation is negligible (i.e., $\Delta D \approx 0$). The interconnect penalty coefficient $\lambda$ is determined from the dynamic curve shown in Figure 4, which illustrates how the algorithm's decisions differ under different I/O loads.

[0023] Optimization Mechanism 1: rejecting high-cost substitutions. As shown in mechanism (a) of Figure 3, when the algorithm traverses core region A, it identifies a potential resubstitution opportunity at node $N$: replacing $N$ with the output signal of node $S$ in core D would reduce the number of logic gates by 2 (i.e., $\Delta A = -2$), but would introduce a new cross-core connection (i.e., $C_{\mathrm{inter}} = 1$). The algorithm makes different decisions depending on the current I/O load, as follows.

[0024] Scenario 1: The system is in the region with ample I/O resources, where the total number of inter-chip interconnects is far below the physical bandwidth limit. According to the curve in Figure 4, the interconnection penalty coefficient $\lambda$ remains at its low baseline value of 1, so $\mathrm{Cost} = -2 + 1 \times 1 = -1 < 0$ and the substitution is allowed.

[0025] Scenario 2: The system is in the I/O congestion region, where the total number of inter-core interconnects is approaching the physical bandwidth limit, and $\lambda$ enters its region of nonlinear exponential growth. Assuming $\lambda = 10$, $\mathrm{Cost} = -2 + 10 \times 1 = 8 > 0$. The algorithm determines that the cost outweighs the benefit and refuses the substitution, avoiding an extra cross-core interconnect merely to save the area of two logic gates and preserving the system's routability.

[0026] Optimization Mechanism 2: proactive logic cloning. As shown in mechanism (b) of Figure 3, the algorithm detects that driver node $K$ in core A must drive a group of load nodes (L1 to L3) in core C, which creates a long-distance connection. The candidate transformation replicates the logic function of $K$ inside the target region, core C, generating a clone node $K'$ that drives this group of loads over local short connections. Assume that core C already contains the fan-in (precondition) logic of $K'$.

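[0027] Scenario 1: The system is in the region with ample I/O resources ($\lambda = 1$). The interconnect saving, weighted by $\lambda$, does not outweigh the area added by the clone, so the total cost does not decrease and cloning is rejected.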

[0028] Scenario 2: The system is in the I/O congestion region (e.g., $\lambda = 10$), and the calculated total cost is significantly negative. The algorithm therefore performs the cloning operation, sacrificing a small amount of logic area to significantly relieve the inter-chip interconnect bottleneck and reduce signal transmission delay.
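The cloning decision can be illustrated with hypothetical numbers, since the specific values for this case are not recoverable from the text. Assume the clone $K'$ adds one gate ($\Delta A = +1$, because its fan-in logic already exists in core C), leaves depth unchanged ($\Delta D = 0$), and removes one cross-core connection, counted here as an interconnect term of $-1$:

```latex
\begin{aligned}
\lambda = 1:\quad  & \mathrm{Cost} = (+1) + \alpha \cdot 0 + 1 \cdot (-1) = 0
  \;\Rightarrow\; \text{no decrease, clone rejected} \\
\lambda = 10:\quad & \mathrm{Cost} = (+1) + \alpha \cdot 0 + 10 \cdot (-1) = -9
  \;\Rightarrow\; \text{clone performed}
\end{aligned}
```

Under these assumptions the same transformation flips from rejected to accepted purely because $\lambda$ has grown.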

[0029] This embodiment provides an EDA system capable of implementing the aforementioned collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs. Figure 5 shows the hardware structure and functional-module block diagram of the EDA computer system.

[0030] As shown in Figure 5, the EDA computer system includes a processor, RAM, a storage medium, and bus interfaces connecting these components. The processor can be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), and executes computer program instructions stored in the storage medium. The RAM provides the temporary data storage space required for processor operation. The storage medium can be a non-volatile computer-readable storage medium such as a hard disk, optical disk, or flash memory, on which an operating system and computer program modules for executing the method of this invention are stored.

[0031] According to the technical solution of the present invention, the storage medium specifically includes the following functional modules: 1. Design and Constraint Loading Module: This module is configured to receive user-input netlist files (such as RTL code or gate-level netlists) and architecture description files for multi-core FPGAs; its functions include resolving the number of cores, the interconnect topology, and the maximum available I/O bandwidth between each pair of cores, providing the physical constraint basis for subsequent optimization.
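The format of the architecture description file is not defined. Purely as an illustration, the sketch below assumes a small JSON file listing the number of cores and the maximum usable I/O count for each core pair, and parses it into a lookup table keyed by unordered core pair. The field names `num_cores`, `links`, `a`, `b`, and `max_io`, and the `Architecture`/`load_architecture` names, are hypothetical. The example mirrors the embodiment: four cores with 500 I/Os between each pair.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Architecture:
    num_cores: int
    io_limit: Dict[Tuple[int, int], int]  # (lower core, higher core) -> max I/O


def load_architecture(text: str) -> Architecture:
    """Parse a hypothetical JSON architecture description."""
    data = json.loads(text)
    n = data["num_cores"]
    limits: Dict[Tuple[int, int], int] = {}
    for link in data["links"]:
        a, b = sorted((link["a"], link["b"]))
        if not 0 <= a < b < n:
            raise ValueError(f"invalid core pair ({link['a']}, {link['b']})")
        limits[(a, b)] = link["max_io"]  # key is order-independent
    return Architecture(n, limits)


EXAMPLE = """
{"num_cores": 4,
 "links": [{"a": 0, "b": 1, "max_io": 500}, {"a": 0, "b": 2, "max_io": 500},
           {"a": 0, "b": 3, "max_io": 500}, {"a": 1, "b": 2, "max_io": 500},
           {"a": 1, "b": 3, "max_io": 500}, {"a": 2, "b": 3, "max_io": 500}]}
"""

arch = load_architecture(EXAMPLE)
print(arch.num_cores, arch.io_limit[(0, 3)])  # 4 500
```

Keying limits by a sorted pair means a lookup for cores A and D gives the same budget regardless of signal direction, which matches the per-pair limit described in the text.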

[0032] 2. Pre-partitioned Hypergraph Segmentation Module: This module is configured to perform coarse-grained physical partitioning of the initial netlist using a hypergraph partitioning algorithm before logic optimization is executed. Specifically, it assigns an initial region identifier to each logical node in the netlist, thereby establishing the location anchor point for logic synthesis.

[0033] 3. Interconnect Cost Calculation Engine: This module is the core decision-making unit of the system, configured to calculate the synthesis cost of candidate transform operators in real time during logic synthesis. The engine integrates the interconnect-aware cost function proposed in this invention. It also executes a dynamic penalty strategy, monitoring the current total number of cross-core interconnects in real time and adjusting the interconnect penalty coefficient according to how close that total is to the I/O bandwidth limit.
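One way the engine could count the current total number of cross-core interconnects is sketched below; the counting convention is an assumption. Each net (a driver and its sinks) is taken to consume one I/O on a core pair for every distinct sink core that differs from the driver's core, so a driver in core A feeding three loads in core C (as in mechanism (b)) uses one A-C connection. Per-pair usage is then compared with per-pair limits to measure how close the design is to the I/O bandwidth limit. Function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Net = Tuple[str, List[str]]  # (driver node, sink nodes)


def cross_core_usage(nets: List[Net], region: Dict[str, int]) -> Counter:
    """Count cross-core connections per unordered core pair (assumed convention)."""
    usage: Counter = Counter()
    for driver, sinks in nets:
        src = region[driver]
        # One connection per distinct destination core other than the driver's own.
        for dst in {region[s] for s in sinks} - {src}:
            usage[(min(src, dst), max(src, dst))] += 1
    return usage


def max_utilization(usage: Counter, limits: Dict[Tuple[int, int], int]) -> float:
    """Highest used/available ratio over all core pairs (0.0 if none used)."""
    return max((usage[p] / limits[p] for p in usage), default=0.0)


# Cores A..D are numbered 0..3.
region = {"K": 0, "L1": 2, "L2": 2, "L3": 2, "S": 3, "N": 0, "x": 1, "y": 1}
nets = [("K", ["L1", "L2", "L3"]), ("S", ["N"]), ("x", ["y"])]
limits = {(i, j): 500 for i in range(4) for j in range(i + 1, 4)}

usage = cross_core_usage(nets, region)
print(sum(usage.values()), max_utilization(usage, limits))  # 2 0.002
```

The total, `sum(usage.values())`, is the quantity compared against the congestion threshold, while `max_utilization` shows how close the busiest core pair is to its own I/O limit.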

[0034] 4. Logic Optimization Engine: This module is configured to perform specific netlist structure transformation operations based on the evaluation results of the interconnect cost calculation engine. Its specific functions include: ① Executing regular optimization operators: Performing operations such as Rewrite and Resubstitution on the netlist, and updating the netlist only when the calculated overall cost decreases; ② Executing proactive structural reorganization: For cross-core signals on the critical path, when timing constraints and excessive interconnect costs are detected, proactive logic cloning operations are performed to copy the logic structure in the target core region, converting the original cross-core signals into local signals within the core to eliminate long-distance interconnects.

[0035] 5. Incremental Update Module: This module is configured to periodically trigger incremental partitioning and calibration after the logic optimization engine changes the topology of the netlist. Its function is to update the region identifier of the node according to the changed connection relationship, to prevent the failure of physical constraints due to topology changes, and to realize closed-loop iteration of partitioning and optimization.

[0036] System workflow: The user provides a netlist file to the EDA computer system. Through the processor, the system invokes the above modules held in RAM and the storage medium, which work together to perform physically aware logic synthesis and optimization of the netlist, and finally outputs an optimized netlist that balances area, timing, and interconnect resources. This output file can be used directly in the subsequent placement and routing flow.

[0037] The above description is merely a description of preferred embodiments of the present invention and is not intended to limit the scope of the present invention in any way. Any changes or modifications made by those skilled in the art based on the above disclosure shall fall within the protection scope of the claims.

Claims

1. A collaborative method for logic structure partitioning and netlist optimization in multi-core FPGAs, characterized in that macro-level physical partitioning constraints are introduced into the micro-optimization process of logic synthesis, and collaborative optimization is achieved through the following steps: Step 1: Before performing logic optimization, read the initial netlist and the architecture description file of the multi-core FPGA, pre-partition the netlist using a hypergraph partitioning algorithm, assign an initial region identifier to each logic node, and establish location anchor points for optimization; Step 2: Redefine the evaluation criteria of the logic synthesis optimization algorithm by introducing an interconnection cost in addition to the traditional area and timing costs; the cost function Cost is expressed as $\mathrm{Cost} = \Delta A + \alpha \cdot \Delta D + \lambda \cdot C_{\mathrm{inter}}$, where $\Delta A$ denotes the change in the number of logic gates caused by a logic transformation, $\Delta D$ denotes the change in logic depth caused by the transformation, $\alpha$ is the timing weighting coefficient, $C_{\mathrm{inter}}$ indicates whether the candidate transformation introduces a new connection between nodes with different region identifiers, and $\lambda$ is the interconnection penalty coefficient; Step 3: Calculate the cost of candidate transformations in real time during logic rewriting and resubstitution; Step 4: For cross-core signals on the critical path, adopt an active structural reorganization strategy; Step 5: Because logic optimization changes the topology of the netlist, trigger incremental partitioning periodically and recalibrate the region identifiers of the nodes according to the current connections, so as to prevent constraint failure and realize closed-loop iteration of partitioning and optimization.

2. The collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs as described in claim 1, characterized in that step 3 introduces a dynamic adjustment strategy that monitors the current total number of cross-core interconnects in real time and compares it with a preset congestion threshold; when the total number of cross-core interconnects is below the preset congestion threshold, the interconnection penalty coefficient $\lambda$ is set to a baseline constant to allow necessary logic reuse; when the total is greater than or equal to the preset congestion threshold, $\lambda$ increases exponentially as the total number of cross-core interconnects grows, so that operations with high interconnection cost are rejected and cross-boundary connections are suppressed at the source.

3. The collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs as described in claim 1, characterized in that, The active structural reorganization strategy in step 4 includes: when the source node and the load node are detected to be located in different cores and timing is tight, the algorithm backtracks the fan-in logic cone of the source node; if the number of logic nodes contained in the logic cone is less than the number of logic gate changes, or if the logic cone already exists in the target core, then the source node is copied in the target core, and the original cross-core signal is converted into a local signal inside the core.

4. The collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs as described in claim 1, characterized in that the method is implemented on an EDA computer system that includes a processor, RAM, a storage medium, and bus interfaces connecting these components; wherein the processor includes a general-purpose central processing unit, an application-specific integrated circuit, or a field-programmable gate array, and is used to execute computer program instructions stored in the storage medium; the RAM is used to provide the temporary data storage space required for processor operation; and the storage medium is a non-volatile computer-readable storage medium, such as a hard disk, optical disk, or flash memory, on which an operating system and a computer program module for performing the method are stored.

5. The collaborative method for logic structure partitioning and netlist optimization for multi-core FPGAs as described in claim 4, characterized in that the storage medium specifically includes the following functional modules: the design and constraint loading module is configured to receive user-input netlist files and architecture description files for multi-core FPGAs, and its functions include resolving the number of cores, the interconnect topology, and the maximum available I/O bandwidth between each pair of cores, providing the physical constraint basis for subsequent optimization; the pre-partitioned hypergraph segmentation module is configured to perform coarse-grained physical partitioning of the initial netlist using a hypergraph partitioning algorithm before logic optimization is executed, specifically by assigning an initial region identifier to each logic node in the netlist, thereby establishing the location anchor points for logic synthesis; the interconnect cost calculation engine is the core decision-making unit, configured to calculate the synthesis cost of candidate transformation operators in real time during logic synthesis, integrates an interconnect-aware cost function, and also executes a dynamic penalty strategy, monitoring the current total number of cross-core interconnects in real time and adjusting the interconnect penalty coefficient according to how close that total is to the I/O bandwidth limit; the logic optimization engine is configured to perform specific netlist structure transformations based on the evaluation results of the interconnect cost calculation engine, and its specific functions include: ① executing regular optimization operators: performing logic rewriting and resubstitution on the netlist, and updating the netlist only when the calculated overall cost decreases; ② executing proactive structural reorganization: for cross-core signals on the critical path, when tight timing and excessive interconnect cost are detected, performing proactive logic cloning to copy the logic structure into the target core region, converting the original cross-core signals into local signals within the core and eliminating long-distance interconnects; the incremental update module is configured to periodically trigger incremental partitioning and calibration after the logic optimization engine changes the topology of the netlist, updating the region identifiers of the nodes according to the changed connections, preventing the physical constraints from becoming invalid due to topology changes, and realizing closed-loop iteration of partitioning and optimization.