A two-stage 2.5D chiplet placement optimization method based on Q-learning and particle swarm optimization
By combining an improved Q-learning algorithm and an improved particle swarm optimization algorithm with non-dominated sorting, the method performs global and then local optimization of 2.5D chiplet layout. It addresses insufficient exploration of the global solution space and entrapment in local optima, improves the comprehensiveness and efficiency of layout optimization, and reduces the manufacturing cost of high-end chips.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANTONG UNIV
- Filing Date
- 2026-03-06
- Publication Date
- 2026-06-12
Figure CN122197810A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of semiconductor integrated circuit design and manufacturing technology, and specifically relates to a two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization.
Background Technology
[0002] As Moore's Law slows, monolithic single-chip integration is becoming less feasible because of physical limitations and rising manufacturing costs. 2.5D integration technology achieves heterogeneous multi-chip integration through a silicon interposer, decomposing a single-chip system into functionally independent chiplets. The chiplets are interconnected using redistribution layers (RDLs) and through-silicon vias (TSVs), which reduces manufacturing complexity and cost while preserving overall functionality, making 2.5D integration a key way to overcome performance bottlenecks. However, 2.5D chiplet layout must balance communication efficiency and thermal management within a limited interposer area: interconnect paths between heavily communicating chiplets should be shortened to reduce losses, while heat dissipation space must be reserved for chiplets that generate a lot of heat. This balance is the core challenge in ensuring system performance and reliability. Existing multi-objective optimization methods attempt to consider key indicators such as communication efficiency and temperature, but they generally explore the global solution space insufficiently and search too narrowly. They fail to cover the potentially optimal regions of multi-chiplet layouts and therefore miss high-quality layout schemes that balance multiple performance objectives. Traditional reinforcement learning algorithms, when applied to multi-objective optimization, are prone to getting trapped in local optima because of their search strategies. They struggle to balance exploration and exploitation under complex constraints and do not adequately approximate the trade-off solutions of the bi-objective optimization. A more efficient placement strategy is therefore needed to balance communication efficiency and thermal management.
Summary of the Invention
[0003] To address the aforementioned technical problems, this invention provides a two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization. The method covers the solution space of 2.5D chiplet layout more fully, searches more broadly to discover more high-quality layout schemes that balance total interconnect length and thermal control, and further improves the comprehensiveness and effectiveness of layout optimization.
[0004] The present invention discloses a two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization, characterized by comprising the following steps:
S1. Initialize the chiplet information and the interposer parameters;
S2. Use the improved Q-learning algorithm to explore the global solution space of chiplet positions on the interposer, and use non-dominated sorting to select the globally optimal solutions;
S3. Use the improved particle swarm optimization algorithm to locally optimize the selected globally optimal solutions and to repair illegal chiplet layouts, obtaining the optimized layout scheme.
[0005] Furthermore, in S1, the chiplet information includes the width and height of each chiplet, its power parameters, and the interconnection information between chiplets; the interposer parameters are the length and width L×W of the interposer, on which the chiplets are laid out as an L×W grid plane.
[0006] Furthermore, in S2, the improved Q-learning algorithm is used to explore globally optimal placements of the chiplets on the interposer, specifically:
1) Initialization: construct a Q-table over the layout states and the action space. The layout state is defined as the set of coordinates of the placed chiplets {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} together with the set of rotation angles {θ_1, θ_2, ..., θ_i}, where (x_i, y_i) and θ_i are the center coordinates and rotation angle of chiplet i. The action is defined as the target center coordinates (x_j, y_j) and rotation angle θ_j of the chiplet j to be placed. At the same time, initialize the layout environment and mark all areas to be laid out as unoccupied.
2) Iterative layout: select an action based on the current state. With probability ε an action is selected at random, where the exploration rate ε is given by formula (1) in terms of the initial exploration rate, the preset minimum exploration rate, the total number of training rounds E, and the current training round e. With probability 1-ε, the action with the largest Q value for the current state in the Q-table is selected. After the action is selected, the chiplet is placed and a temporary layout scheme sol_t is generated.
3) Legality check, verifying two core constraints:
3-1) No-overlap constraint: the center coordinates (x_i, y_i) and (x_j, y_j) of any two chiplets i and j must satisfy formula (2) in the horizontal direction or formula (3) in the vertical direction, which are expressed in terms of the widths of chiplets i and j and the heights of chiplets i and j, respectively.
3-2) Boundary constraint: the center coordinates of all chiplets must satisfy formula (4), which ensures that every chiplet lies within the physical boundary of the interposer. Formula (4) uses the minimum spacing between chiplets; the distance between a chiplet and the interposer boundary must also be greater than or equal to this minimum spacing.
If sol_t violates either constraint, a penalty value is assigned to the solution and the process returns to the iterative layout step to select a new action. If the constraints are satisfied, the total interconnect length C_total of the current legal layout sol_t is calculated, and a thermal model is established to evaluate the total temperature index T_total of the layout.
4) Set the reward function, which is the negative of the normalized combination of the total interconnect length and the temperature index, as shown in formula (5). In formula (5), reward is the reward function; a weighting factor adjusts the relative weight of the two objectives; and the minimum and maximum interconnect length and the lowest and highest temperature index serve as the normalization bounds. Update the Q-table according to formula (6):
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[\mathrm{reward} + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \qquad (6)$$
where Q(s,a) is the value obtained by performing action a in state s; α is the learning rate, which controls how strongly new experience corrects the original Q value; γ is the discount factor, which measures the importance of future rewards in the current decision; and max_{a'} Q(s',a') is the maximum Q value over all available actions a' in the next state s', reflecting the best potential payoff of subsequent decisions. At the same time, the legal layout sol_t is stored in the solution list sol_list. When the preset number of iterations is reached, the legal layouts in sol_list are filtered to obtain the globally optimal layouts.
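The action selection and Q-table update of steps 2) and 4) can be illustrated with a minimal sketch. The linear form of `exploration_rate`, the default values, and all function names are assumptions made for illustration; the source only states that the exploration rate decays linearly between an initial and a minimum value.

```python
import random
from collections import defaultdict

def exploration_rate(e, total_rounds, eps_start=0.9, eps_min=0.05):
    """Assumed linear decay of epsilon from eps_start to eps_min over total_rounds."""
    frac = min(max(e / total_rounds, 0.0), 1.0)
    return eps_start - (eps_start - eps_min) * frac

def select_action(q_table, state, actions, eps, rng=random):
    """Epsilon-greedy: explore with probability eps, otherwise take the max-Q action."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q value."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

# States must be hashable, e.g. a tuple of (x, y, theta) placements.
q = defaultdict(float)
```

Keying the table by `(state, action)` tuples keeps the sketch independent of grid size; a dense array indexed by grid cell and rotation would be an equally valid choice.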
[0007] Furthermore, the total interconnect length of the current legal layout sol_t is calculated as follows. Let C be the set of chiplets in the system, P the set of pin clusters of a single chiplet, and N the set of interconnection nets. For a single net n∈N, the source chiplet is s_n and the target chiplet is t_n, and the number of connections required between chiplets i and j is R_ij. A flow variable is defined as the number of connections from pin cluster l of chiplet i to pin cluster k of chiplet j in net n, and a distance term is defined as the Manhattan distance from pin cluster l of chiplet i to pin cluster k of chiplet j. The objective function for the total interconnect length is given by formula (7). The Manhattan distance is calculated from the chiplet center coordinates and the pin-cluster offsets, as shown in formula (8), using the offset of pin cluster l of chiplet i relative to its center and the offset of pin cluster k of chiplet j relative to its center.
An optimization model is constructed using mixed-integer linear programming (MILP) with the objective of minimizing the total interconnect length, subject to the constraints of formulas (9) to (13):
Formula (9) is the non-negativity constraint on the flow variables, i.e. on the number of connections between pin cluster l of chiplet i and pin cluster k of chiplet j in net n, which ensures that all interconnect flows meet the basic requirements of physical implementation.
Formula (10) is a flow conservation constraint obtained by summing the flows over all chiplets and pin clusters. When chiplet j is the source chiplet of net n, the difference between its outflow and inflow equals the required number of connections between the source and target chiplets; when j is the target chiplet, the difference takes the corresponding value specified for the target; for chiplets that are neither source nor target, the difference is 0.
Formula (11) uses two sets of summation constraints to require that the pin clusters of the source chiplet have no inflow and the pin clusters of the target chiplet have no outflow, which avoids invalid flow loops and redundant interconnections and improves the computational efficiency of route optimization.
Formula (12) limits the total bidirectional flow of pin cluster l of chiplet i to within its maximum carrying capacity, preventing any single pin cluster from being overloaded.
Formula (13) constrains the total flow between pin clusters of chiplets that are neither the source nor the target in net n so that it does not exceed the required number of connections between the source and target chiplets, which ensures that routing resources are allocated according to communication needs.
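Once the flow variables are known, evaluating the total interconnect length reduces to a weighted sum of pin-to-pin Manhattan distances. The sketch below shows only that evaluation step, not the MILP itself. The dictionary layouts, the function names, and the omission of rotation (pin offsets are taken as already rotated) are assumptions for illustration.

```python
def pin_position(center, offset):
    """Absolute pin-cluster position: chiplet centre plus the cluster's offset."""
    return (center[0] + offset[0], center[1] + offset[1])

def manhattan(p, q):
    """Manhattan distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_interconnect_length(centers, pin_offsets, flows):
    """
    centers:     {chiplet: (x, y)}
    pin_offsets: {(chiplet, pin_cluster): (dx, dy)}  (assumed already rotated)
    flows:       {(net, i, l, j, k): number_of_connections}
    Returns the sum of connections times pin-to-pin Manhattan distance.
    """
    total = 0.0
    for (_net, i, l, j, k), count in flows.items():
        p = pin_position(centers[i], pin_offsets[(i, l)])
        q = pin_position(centers[j], pin_offsets[(j, k)])
        total += count * manhattan(p, q)
    return total

# Hypothetical two-chiplet example: 4 wires from cpu pin cluster 0 to dram pin cluster 0.
centers = {"cpu": (0, 0), "dram": (10, 0)}
pin_offsets = {("cpu", 0): (2, 0), ("dram", 0): (-2, 1)}
flows = {("n0", "cpu", 0, "dram", 0): 4}
print(total_interconnect_length(centers, pin_offsets, flows))  # 28.0
```

In the full method the flows would come from the MILP solver; here they are supplied by hand so that the distance bookkeeping can be checked in isolation.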
[0008] Furthermore, a thermal model is established to evaluate the total temperature index T_total of the chiplet layout. T_total is defined by formulas (14) to (17) in terms of the following quantities: the temperature of the chiplet i to be placed; the power of each chiplet j that has already been placed on the grid; the Euclidean distance between the center nodes of chiplet i and chiplet j; and a distance-based normalized weighting coefficient that determines how strongly each other chiplet j influences chiplet i, with u=1,…,j and u≠i.
[0009] Furthermore, non-dominated sorting is used to perform bi-objective selection over the layout schemes in the solution list sol_list. All layout schemes in the list are traversed and the dominance relationship of each scheme is marked. For two schemes A and B, if A is better than B on both objectives (total interconnect length and total temperature index T_total), or if A is equal to B on one objective and better on the other, then A is said to dominate B. The solutions that are not dominated by any other solution are selected to form the globally optimal solution set.
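The dominance rule above can be sketched as a simple pairwise filter. Each layout is represented here only by its objective pair (total interconnect length, temperature index), both to be minimized; the tuple representation and the function names are assumptions for illustration.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates (O(n^2) pairwise check)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical (interconnect_length, temperature_index) pairs.
sol_list = [(10, 95), (12, 90), (11, 96), (15, 99)]
print(pareto_front(sol_list))  # [(10, 95), (12, 90)]
```

The quadratic scan is adequate for a solution list of modest size; faster sorting-based variants exist but are not needed to illustrate the rule.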
[0010] Furthermore, S3 specifically refers to:
1) Particle initialization: encode each layout scheme on the Pareto front as a particle, with the encoding dimensions being the center coordinates (x_i, y_i) and rotation angle θ_i of all chiplets. A small random perturbation (Δx_i, Δy_i, Δθ_i) is applied to each encoding dimension of the initial particle, where Δx_i, Δy_i ∈ [-1,1] and Δθ_i ∈ {0°, 90°, 180°, 270°}. The encoding is expressed as (x_1+Δx_1, y_1+Δy_1, θ_1+Δθ_1, x_2+Δx_2, y_2+Δy_2, θ_2+Δθ_2, …, x_m+Δx_m, y_m+Δy_m, θ_m+Δθ_m), generating an initial particle swarm P_s containing multiple particles.
2) Legality filtering: for each particle in the swarm, repeat the no-overlap and boundary constraint checks of step S2. Layouts that violate the constraints are repaired by a two-step repair method:
2-1) Boundary trimming: each chiplet i that exceeds the boundary is clipped according to formula (18), which gives the center coordinates of the clipped chiplet, so that no chiplet extends beyond the interposer and the boundary constraint is met.
2-2) Overlap separation: first, formulas (19) and (20) are used to calculate the overlap lengths of chiplet i relative to chiplet j in the horizontal and vertical directions, and the horizontal and vertical position of chiplet i relative to chiplet j, which determines the direction of movement. The chiplets are then displaced according to formula (21), which gives the new coordinates of chiplets i and j after the move.
3) Fitness calculation: for each legal particle in P_s, the composite objective function F is calculated using formula (22) and used as the fitness of the particle. The smaller the fitness, the better the layout scheme.
4) Optimal solution update: initialize the personal best p_pb of each particle and the global best p_gb of the swarm. In subsequent iterations, if the new fitness of a particle after the position update is lower than the fitness of its current p_pb, p_pb is updated to the particle's new position; if the new fitness is lower than the fitness of the current p_gb, p_gb is updated to the particle's new position. Particle movement is driven by the velocity update of formula (23), which involves the velocity vector of particle i at the t-th iteration, two random numbers in the range [0,1] that add randomness and help avoid premature local convergence, and an inertia weight. The inertia weight follows the linearly decreasing strategy of formula (24), defined by the initial inertia weight, the final weight after decay, and the maximum number of iterations; this decay makes the algorithm focus more on local refinement in the later iterations. The position update is given by formula (25). After each position update, the legality filtering and fitness calculation are executed again, and the results are fed back to the optimal solution update stage to form a closed iterative loop. When the maximum number of iterations is reached, p_gb is output as the optimal solution.
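The two-step repair of step 2) can be sketched as follows. Because formulas (18) to (21) are not reproduced in the text, the clamping bounds, the choice to separate along the axis with the smaller overlap, and the equal split of the shift between the two chiplets are all assumptions, as are the function names and the dictionary representation. The sketch also assumes x runs along the interposer length L and y along its width W, and ignores rotation.

```python
def clamp_to_interposer(x, y, w, h, L, W, d_min=0.0):
    """Step 1 (boundary trimming): pull a chiplet centre back inside an L x W interposer."""
    x = min(max(x, w / 2 + d_min), L - w / 2 - d_min)
    y = min(max(y, h / 2 + d_min), W - h / 2 - d_min)
    return x, y

def separate_pair(a, b, d_min=0.0):
    """
    Step 2 (overlap separation): a and b are dicts with keys x, y, w, h.
    If they overlap, push them apart along the axis with the smaller overlap,
    moving each chiplet by half of the overlap length (assumed policy).
    """
    ox = (a["w"] + b["w"]) / 2 + d_min - abs(a["x"] - b["x"])
    oy = (a["h"] + b["h"]) / 2 + d_min - abs(a["y"] - b["y"])
    if ox <= 0 or oy <= 0:
        return a, b  # already separated on at least one axis
    a, b = dict(a), dict(b)  # work on copies, leave the inputs untouched
    if ox <= oy:
        sign = 1.0 if a["x"] >= b["x"] else -1.0
        a["x"] += sign * ox / 2
        b["x"] -= sign * ox / 2
    else:
        sign = 1.0 if a["y"] >= b["y"] else -1.0
        a["y"] += sign * oy / 2
        b["y"] -= sign * oy / 2
    return a, b
```

A separation move can push a chiplet back over the boundary, so a practical repair loop would alternate the two steps and re-run the legality check until the layout is legal or an iteration cap is hit.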
[0011] Furthermore, the output includes: physical layout information, i.e. the center coordinates (x_i, y_i) and rotation angle θ_i of each chiplet on the silicon interposer; and performance metric information, i.e. the total interconnect length and the temperature index T_total of the layout scheme.
[0012] The beneficial effects of this invention are as follows. The method constructs, in Q-learning, a state-action space containing chiplet coordinates and rotation angles. It uses the exploration rate to balance exploration and exploitation during placement, and balances the two objectives of chiplet placement by calculating the interconnect length and temperature indices. It then updates the Q-table using a normalized composite loss to avoid getting trapped in local optima. Next, non-dominated sorting screens out the high-quality solutions that are not dominated by any other, forming a bi-objective Pareto front. Finally, particle swarm optimization performs local optimization: the high-quality solutions are encoded as particles, a small random perturbation is applied to generate the initial swarm, and after legality filtering the composite objective function is used as the particle fitness to iteratively update the personal and global best solutions, after which the chiplet placement information and performance indicators are output.
This invention improves the Q-learning algorithm by linearly decaying the exploration rate. A large exploration rate is kept in the early stage for global exploration, and the rate is lowered in the later stage so that the algorithm relies more on the experience stored in the Q-table. In each training iteration of the Q-learning algorithm, the possible positions of the next chiplet to be placed are explored, which reduces the chance of generating invalid layouts during iteration and prevents a training round from being wasted because of a single invalid placement. This invention also dynamically adjusts the inertia weight of the particle swarm optimization (PSO) algorithm and repairs overlapping and out-of-bounds particle layouts within the PSO algorithm, so that legal particles are continuously generated.
The invention also uses a thermal evaluation model for thermal estimation during the iteration process. In this model, the thermal impact on chiplet i is the combined effect of all other chiplets in the system; the thermal impact of chiplet j on chiplet i is related to the power consumption of j and decreases as its distance from chiplet i increases. The temperature index obtained from the thermal model reflects how dispersed the chiplets are in the layout, and thus replaces a traditional thermal simulation model for temperature evaluation, with shorter simulation time and faster computation. This invention overcomes the layout optimization bottleneck of 2.5D chiplet integration, supports the heterogeneous integration requirements of high-end chips, facilitates the mass production of advanced packaging technologies, and reduces the manufacturing cost of high-end chips.
Attached Figure Description
[0013] Figure 1 is a flowchart of the method described in this invention; Figure 2 is a diagram of the chiplet interconnect topology of a CPU-DRAM system; Figure 3 shows heatmaps of the chiplet layout obtained using other methods for the CPU-DRAM system; Figure 4 is a heatmap of the chiplet layout obtained by the method of the present invention for the CPU-DRAM system; Figure 5 is a bar chart comparing the layout results of different methods for the CPU-DRAM system.
Detailed Implementation
[0014] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings.
[0015] As shown in Figure 1, the two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization of the present invention includes the following steps:
S1. Initialize the chiplet information and the interposer parameters;
S2. Use the improved Q-learning algorithm to explore the global solution space of chiplet positions on the interposer, and use non-dominated sorting to select the globally optimal solutions;
S3. Use the improved particle swarm optimization algorithm to locally optimize the selected globally optimal solutions and to repair illegal chiplet layouts, obtaining the optimized layout scheme.
[0016] In S1, the chiplet information includes the width and height of each chiplet, its power parameters, and the interconnection information between chiplets; the interposer parameters are the length and width L×W of the interposer, on which the chiplets are laid out as an L×W grid plane.
[0017] Table 1 shows the chiplet size and power information for the CPU-DRAM system. Figure 2 shows the interconnect topology of the CPU-DRAM system; the values between chiplets are the numbers of connections that need to be routed.
[0018] Table 1. Chip size and power information for CPU-DRAM systems
[0019] S2 specifically refers to:
1) Initialization: construct a Q-table over the layout states and the action space. The layout state is defined as the set of coordinates of the placed chiplets {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)} together with the set of rotation angles {θ_1, θ_2, ..., θ_i}. The rotation angle θ_i can be 0°, 90°, 180°, or 270°, which models rotation during chiplet placement and allows more flexible use of the interposer area. The action is defined as the target center coordinates (x_j, y_j) and rotation angle θ_j of the chiplet j to be placed. At the same time, initialize the silicon interposer grid environment and mark all areas to be laid out as unoccupied.
2) Iterative layout: based on the current state, an action is selected at random with probability ε (the exploration rate) in order to escape local optima and explore new layout possibilities. With probability 1-ε, the action with the largest Q value for the current state in the Q-table is selected, which exploits the best known layout strategy. After the action is selected, the chiplet is placed and a temporary layout scheme sol_t is generated.
3) Legality check, verifying two core constraints. No-overlap constraint: the center coordinates (x_i, y_i) and (x_j, y_j) of any two chiplets i and j must satisfy formula (1) in the horizontal direction or formula (2) in the vertical direction. Boundary constraint: the center coordinates of all chiplets must satisfy formula (3), which ensures that every chiplet lies within the physical boundary of the interposer. Formula (3) uses the minimum spacing between chiplets; the distance between a chiplet and the interposer boundary must also be greater than or equal to this minimum spacing. If sol_t violates either constraint, a penalty value is assigned to the solution and the process returns to the iterative layout step to select a new action. If the constraints are satisfied, the process proceeds to the reward and Q-table update stage, where the total interconnect length of the current legal layout sol_t is calculated.
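The legality check of step 3) can be sketched as below. The exact inequalities of formulas (1) to (3) are not reproduced in the text, so the forms used here (half-width sums plus a minimum spacing `d_min`, x along L and y along W, and a width/height swap for 90° and 270° rotations) are assumptions, as are the tuple representation and function names.

```python
def footprint(w, h, theta):
    """Return (width, height) after rotation; 90 and 270 degrees swap the sides."""
    return (h, w) if theta % 180 == 90 else (w, h)

def no_overlap(a, b, d_min):
    """a and b are (x, y, w, h, theta). Legal if separated horizontally OR vertically."""
    aw, ah = footprint(a[2], a[3], a[4])
    bw, bh = footprint(b[2], b[3], b[4])
    sep_x = abs(a[0] - b[0]) >= (aw + bw) / 2 + d_min
    sep_y = abs(a[1] - b[1]) >= (ah + bh) / 2 + d_min
    return sep_x or sep_y

def in_bounds(c, L, W, d_min):
    """The chiplet, plus d_min of clearance, must fit inside the L x W interposer."""
    cw, ch = footprint(c[2], c[3], c[4])
    return (cw / 2 + d_min <= c[0] <= L - cw / 2 - d_min
            and ch / 2 + d_min <= c[1] <= W - ch / 2 - d_min)

def is_legal(layout, L, W, d_min=0.0):
    """Check the boundary constraint for every chiplet and no-overlap for every pair."""
    if not all(in_bounds(c, L, W, d_min) for c in layout):
        return False
    return all(no_overlap(a, b, d_min)
               for idx, a in enumerate(layout) for b in layout[idx + 1:])
```

For example, `is_legal([(5, 5, 4, 4, 0), (10, 5, 4, 4, 0)], 20, 20, d_min=1)` holds because the two centers are exactly 5 units apart horizontally, which equals the half-width sum of 4 plus the spacing of 1.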
[0020] Let C be the set of chiplets in the system, P the set of pin clusters of a single chiplet, and N the set of interconnection nets. For a net n∈N, the source chiplet is s_n and the target chiplet is t_n, and the number of connections required between chiplets i and j is R_ij. A flow variable is defined as the number of connections from pin cluster l of chiplet i to pin cluster k of chiplet j in net n, and a distance term is defined as the Manhattan distance between pin cluster l of chiplet i and pin cluster k of chiplet j. The objective function for the total interconnect length is given by formula (4). The Manhattan distance is calculated from the chiplet center coordinates and the pin-cluster offsets, as shown in formula (5), using the offset of pin cluster l of chiplet i relative to its center and the offset of pin cluster k of chiplet j relative to its center.
[0021] Planning the connections between chiplets is essentially an NP-hard multi-commodity flow problem, and precise mathematical modeling is required to minimize the total interconnect length. This embodiment therefore builds the model using mixed-integer linear programming (MILP), subject to the constraints of formulas (6) to (10). Formula (6) is the non-negativity constraint on the flow variables, i.e. on the number of connections between pin cluster l of chiplet i and pin cluster k of chiplet j in net n, which ensures that all interconnect flows meet the basic requirements of physical implementation. Formula (7) is a flow conservation constraint obtained by summing the flows over all chiplets and pin clusters: when chiplet j is the source chiplet of net n, the difference between its outflow and inflow equals the required number of connections between the source and target chiplets; when j is the target chiplet, the difference takes the corresponding value specified for the target; for chiplets that are neither source nor target, the difference is 0. Formula (8) uses two sets of summation constraints to require that the pin clusters of the source chiplet have no inflow and the pin clusters of the target chiplet have no outflow, which avoids invalid flow loops and redundant interconnections and improves the computational efficiency of route optimization. Formula (9) limits the total bidirectional flow of pin cluster l of chiplet i to within its maximum carrying capacity, preventing any single pin cluster from being overloaded. Formula (10) constrains the total flow between pin clusters of chiplets that are neither the source nor the target in net n so that it does not exceed the required number of connections between the source and target chiplets, which ensures that routing resources are allocated according to communication needs.
[0022] 4) Establish a thermal model to evaluate the temperature of the chiplet layout. The total temperature index T_total of the layout sol_t is defined by formulas (11) to (14) in terms of the following quantities: the temperature of the chiplet i to be placed; the power of each chiplet j that has already been placed on the grid; the Euclidean distance between the center nodes of chiplet i and chiplet j; and a distance-based normalized weighting coefficient that determines how strongly the other chiplets influence chiplet i.
[0023] 5) The reward function of the Q-learning algorithm is the negative of the normalized combination of the total interconnect length and the temperature index, as shown in formula (15). The algorithm then updates the Q-table following the core formula of Q-learning:
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[\mathrm{reward} + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \qquad (16)$$
where α is the learning rate, γ is the discount factor, and s' and a' are the next state and the actions available in it. At the same time, the legal layout sol_t is stored in the solution list sol_list. Finally, when the preset number of iterations is reached, non-dominated sorting is used to perform bi-objective selection over the layout schemes in sol_list. All layout schemes in the list are traversed and the dominance relationship of each scheme is marked. For two schemes A and B, if A is better than B on both objectives (C_total and T_total), or if A is equal to B on one objective and better on the other, then A is said to dominate B. The solutions that are not dominated by any other solution are selected to form the globally optimal solution set.
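The reward described in step 5) can be illustrated with a minimal sketch. Formula (15) is not reproduced in the text, so the min-max normalization, the convex combination with a single `weight`, and the function names are assumptions for illustration.

```python
def normalize(value, lo, hi):
    """Min-max normalization to [0, 1]; returns 0.0 when the range is degenerate."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def reward(c_total, t_total, c_range, t_range, weight=0.5):
    """
    Negative weighted sum of the normalized objectives (assumed form).
    c_range and t_range are (min, max) bounds for interconnect length and temperature.
    weight is applied to interconnect length and (1 - weight) to temperature.
    """
    c_n = normalize(c_total, *c_range)
    t_n = normalize(t_total, *t_range)
    return -(weight * c_n + (1.0 - weight) * t_n)

# Hypothetical values: a layout midway between the bounds on both objectives.
print(reward(150, 95, (100, 200), (90, 100)))  # -0.5
```

Under this form the reward lies in [-1, 0], with 0 reached only by a layout that attains the minimum of both objectives, so a larger reward always corresponds to a better trade-off for the chosen weight.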
[0024] Furthermore, S3 specifically refers to:
1) Particle initialization: encode each layout scheme on the Pareto front as a particle, with the encoding dimensions being the center coordinates (x_i, y_i) and rotation angle θ_i of all chiplets. A small random perturbation (Δx_i, Δy_i, Δθ_i) is applied to each encoding dimension of the initial particle, and the encoding is expressed as (x_1+Δx_1, y_1+Δy_1, θ_1+Δθ_1, x_2+Δx_2, y_2+Δy_2, θ_2+Δθ_2, …, x_m+Δx_m, y_m+Δy_m, θ_m+Δθ_m), generating an initial particle swarm P_s containing multiple particles.
2) Legality filtering: for each particle in the swarm, repeat the no-overlap and boundary constraint checks of step S2 to remove invalid particles and retain the set of legal particles.
3) Fitness calculation: for each legal particle in P_s, the composite objective function F is calculated again using formula (17) and used as the fitness of the particle. The smaller the fitness, the better the layout scheme.
4) Optimal solution update: initialize the personal best p_pb of each particle (initially the particle itself) and the global best p_gb of the swarm (the particle in P_s with the lowest fitness). In subsequent iterations, if the new fitness of a particle after the position update is lower than the fitness of its current p_pb, p_pb is updated to the particle's new position; if the new fitness is lower than the fitness of the current p_gb, p_gb is updated to the particle's new position. Particle movement is driven by the velocity update of formula (18), which involves the velocity vector of particle i at the t-th iteration, two random numbers in the range [0,1] that add randomness and help avoid premature local convergence, and an inertia weight. The inertia weight follows a linearly decreasing strategy, which makes the algorithm focus more on local refinement in the later iterations. The position update is given by formula (19). After each position update, the legality filtering and fitness calculation are executed again, and the results are fed back to the optimal solution update stage to form a closed iterative loop. When the maximum number of iterations is reached, p_gb is output.
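One iteration of the particle movement in step 4) can be sketched with the textbook PSO update. The acceleration coefficients `c1` and `c2`, the inertia bounds, and the function names are assumptions that do not appear in the text; the sketch treats a particle as a flat list of numbers and leaves out the snapping of rotation angles to multiples of 90°.

```python
import random

def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight from w_start (t = 0) to w_end (t = t_max)."""
    return w_start - (w_start - w_end) * t / t_max

def pso_step(x, v, p_best, g_best, t, t_max, c1=2.0, c2=2.0, rng=random):
    """
    Standard PSO velocity and position update for one particle.
    x, v, p_best, g_best are equal-length lists of floats.
    Returns (new_position, new_velocity).
    """
    w = inertia(t, t_max)
    new_x, new_v = [], []
    for xi, vi, pb, gb in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()  # fresh random factors per dimension
        vi = w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        new_v.append(vi)
        new_x.append(xi + vi)
    return new_x, new_v
```

A high inertia weight early on lets particles overshoot and explore around the Pareto-front seeds, while the lower weight later damps the velocity so that the swarm settles near the best positions found. After each call, the legality filtering and fitness evaluation described above would be applied to the new position.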
[0025] The output includes: physical layout information, i.e. the center coordinates (x_i, y_i) and rotation angle θ_i of each chiplet on the silicon interposer; and performance metric information, i.e. the total interconnect length and the temperature index T_total of the layout.
[0026] Figure 3 shows the thermal map, obtained with the thermal simulation tool HotSpot, of the CPU-DRAM system layout produced by a prior-art heuristic chiplet layout optimization method; the highest temperature reached is 97.19℃. Figure 4 shows the thermal map of the layout obtained by this invention, with a maximum temperature of 94.37℃. Figure 5 is a bar chart comparing the layout results of different methods; it shows that the layout obtained by this method has the lowest temperature and the shortest total interconnect length.
[0027] The above description is merely a preferred embodiment of the present invention and is not intended to further limit the present invention. All equivalent changes made based on the description and drawings of the present invention are within the protection scope of the present invention.
Claims
1. A two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization, characterized in that it includes the following steps: S1. Initialize the chiplet information and the interposer parameters; S2. Use the improved Q-learning algorithm to explore the global solution space of chiplet positions on the interposer, and use non-dominated sorting to select the globally optimal solutions; S3. Use the improved particle swarm optimization algorithm to locally optimize the selected globally optimal solutions and to repair illegal chiplet layouts, obtaining the optimized layout scheme.
2. The two-stage 2.5D chiplet layout optimization method based on Q-learning and particle swarm optimization according to claim 1, characterized in that, in S1, the chiplet information includes the width and height of each chiplet, its power parameters, and the interconnection information between chiplets; the interposer parameters are the length and width L×W of the interposer, on which the chiplets are laid out as an L×W grid plane.
3. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 2, characterized in that, in S2, an improved Q-learning algorithm is used to explore globally optimal placement schemes for the cores on the interposer, specifically:
1) Initialization: construct a Q-table over the layout states and the action space, where the layout state is defined as the set of coordinates of the placed cores $\{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i)\}$ and the set of rotation angles $\{\theta_1, \theta_2, \ldots, \theta_i\}$, where $(x_i, y_i)$ and $\theta_i$ are the coordinates and rotation angle of core i, respectively; the action is defined as the target center coordinates $(x_j, y_j)$ and rotation angle $\theta_j$ of the core j to be placed. At the same time, initialize the layout environment and mark all areas to be laid out as unoccupied;
2) Iterative layout: select an action based on the current state. With probability ε, a random action is selected, where the exploration rate ε is given by formula (1) in terms of the initial exploration rate, the preset minimum exploration rate, the total number of training rounds E, and the current training round e. With probability 1-ε, the action that maximizes the Q value for the current state in the Q-table is selected. After the action is selected, the core is placed and a temporary layout scheme $sol_t$ is generated;
3) Legality check: verify two key constraints:
3-1) No-overlap constraint: the center coordinates $(x_i, y_i)$ and $(x_j, y_j)$ of any two cores i and j must satisfy formula (2) in the horizontal direction or formula (3) in the vertical direction, where formula (2) is expressed in terms of the widths of cores i and j and formula (3) in terms of their heights.
3-2) Boundary constraint: the center coordinates of all cores must satisfy formula (4), which ensures that every core lies within the physical boundary of the interposer. Formula (4) uses the minimum spacing between cores, and the distance between a core and the interposer boundary must also be greater than or equal to this minimum spacing. If $sol_t$ violates any constraint, a penalty value is assigned to the solution and the method returns to the iterative layout step to select a new action; if the constraints are satisfied, the total interconnect length of the current legal layout $sol_t$ is calculated, and a thermal model is established to evaluate the total temperature index $T_{total}$ of the core layout;
4) Set the reward function, which is the negative of the normalized total interconnect length and temperature index, as shown in formula (5), where reward is the reward function and the remaining quantities are a weighting factor that adjusts the weights in the bi-objective optimization, the minimum and maximum interconnect lengths, and the lowest and highest temperature indices. Update the Q-table according to formula (6):
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[\mathrm{reward} + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \tag{6}$$
where $Q(s,a)$ is the value obtained by performing action a in state s; the learning rate $\alpha$ controls the extent to which new experience corrects the original Q value; $\gamma$ is a discount factor that measures the importance of future rewards in the current decision; and $\max_{a'} Q(s',a')$ is the maximum Q value over all available actions $a'$ in the next state $s'$, reflecting the potential optimal payoff of subsequent decisions. At the same time, the legal layout $sol_t$ is stored in the solution list sol_list. When the preset number of iterations is reached, the legal layouts $sol_t$ in sol_list are screened to obtain the globally optimal layouts.
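The steps of claim 3 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the linear decay in `epsilon_at`, the geometric forms of the checks in `is_legal`, the `(1 - lam)` weighting in `reward`, and all function names and default values are hypothetical stand-ins for formulas (1) to (5), whose exact forms are not reproduced in the text. Only `q_update` follows a formula that the claim describes term by term, formula (6).

```python
import random
from collections import defaultdict

def epsilon_at(e, total_rounds, eps_init=0.9, eps_min=0.05):
    """ASSUMPTION: linear decay from eps_init to eps_min; formula (1) may differ."""
    frac = min(max(e / total_rounds, 0.0), 1.0)
    return max(eps_min, eps_init - (eps_init - eps_min) * frac)

def select_action(q_table, state, actions, eps, rng=random):
    """Epsilon-greedy: explore with probability eps, otherwise take argmax Q."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def footprint(size, theta):
    """Width and height after rotation; 90 and 270 degrees swap the two."""
    w, h = size
    return (h, w) if theta % 180 == 90 else (w, h)

def is_legal(layout, sizes, L, W, d_min=0.0):
    """ASSUMED forms of the no-overlap and boundary checks.

    layout: {core: (x, y, theta)} with center coordinates; sizes: {core: (w, h)}.
    The x axis is assumed to span L and the y axis to span W.
    """
    boxes = {}
    for core, (x, y, theta) in layout.items():
        w, h = footprint(sizes[core], theta)
        if x - w / 2 < d_min or x + w / 2 > L - d_min:
            return False
        if y - h / 2 < d_min or y + h / 2 > W - d_min:
            return False
        boxes[core] = (x, y, w, h)
    names = list(boxes)
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            xi, yi, wi, hi = boxes[names[a]]
            xj, yj, wj, hj = boxes[names[b]]
            apart_x = abs(xi - xj) >= (wi + wj) / 2 + d_min
            apart_y = abs(yi - yj) >= (hi + hj) / 2 + d_min
            if not (apart_x or apart_y):  # separated on neither axis: overlap
                return False
    return True

def reward(wl, temp, wl_min, wl_max, t_min, t_max, lam=0.5):
    """ASSUMED form: negative weighted sum of min-max normalized objectives."""
    wl_n = (wl - wl_min) / (wl_max - wl_min)
    t_n = (temp - t_min) / (t_max - t_min)
    return -(lam * wl_n + (1 - lam) * t_n)

def q_update(q_table, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """Tabular update of formula (6); q_table is a defaultdict(float)."""
    best_next = max((q_table[(s_next, a2)] for a2 in next_actions), default=0.0)
    q_table[(s, a)] += alpha * (r + gamma * best_next - q_table[(s, a)])
```

States and actions only need to be hashable here, for example a tuple of placed `(x, y, theta)` triples as the state and a single `(x, y, theta)` triple as the action.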
4. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 3, characterized in that the total interconnect length of the current legal layout $sol_t$ is calculated as follows: let C be the set of cores in the system, P the set of pin clusters of a single core, and N the set of interconnection networks; for a single network n∈N, the starting core is $s_n$ and the target core is $t_n$; the number of connections required between cores i and j is $R_{ij}$. A flow variable is defined as the number of connections from pin cluster l of core i to pin cluster k of core j in network n, and a distance term is defined as the Manhattan distance from pin cluster l of core i to pin cluster k of core j. The total interconnect length is the objective function given by formulas (7) and (8), where the offset of pin cluster l in core i and the offset of pin cluster k in core j are each measured relative to the center of the respective core.
An optimization model is constructed using mixed-integer linear programming (MILP), with minimization of the total interconnect length as the objective; the MILP solution must satisfy the constraints of formulas (9) to (13):
Formula (9) is the non-negativity constraint on the flow variable, i.e. the number of connections between pin cluster l of core i and pin cluster k of core j in network n, and ensures that all interconnect flow meets the basic requirements of physical implementation;
Formula (10) is the flow balance over all pin clusters of a core: when core j is the starting core $s_n$ of network n, the difference between its total outflow and total inflow equals the required number of connections between the starting core and the target core; when core j is the target core $t_n$, the difference takes the corresponding value specified in formula (10); for cores that are neither the starting core nor the target core, the difference is 0;
Formula (11) uses two sets of summation constraints to ensure that the pin clusters of the starting core $s_n$ have no input flow and the pin clusters of the target core $t_n$ have no output flow;
Formula (12) limits the total bidirectional flow of pin cluster l of core i to its maximum carrying capacity, to prevent flow overload of a single pin cluster;
Formula (13) constrains the total flow between pin clusters of cores that are neither the starting core nor the target core of network n so that it does not exceed the required number of connections between the starting core and the target core.
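To make the objective of claim 4 concrete, the sketch below evaluates a total interconnect length for a given set of flow values. It does not solve the MILP. It assumes that the objective is the sum over all flow variables of flow times the Manhattan distance between the two pin clusters, and that a pin cluster position is the core center plus its offset, with rotation of the offsets ignored. The function names, the dictionary layouts, and the sign convention in `net_balance` are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def pin_position(center, offset):
    """ASSUMPTION: pin cluster position = core center + offset (no rotation)."""
    return (center[0] + offset[0], center[1] + offset[1])

def total_interconnect_length(centers, pin_offsets, flows):
    """Sum of flow * Manhattan distance over all flow variables.

    centers:     {core: (x, y)}
    pin_offsets: {(core, cluster): (dx, dy)}
    flows:       {(net, core_i, l, core_j, k): number_of_connections}
    """
    total = 0.0
    for (net, i, l, j, k), f in flows.items():
        if f < 0:
            raise ValueError("flow must be non-negative, as in formula (9)")
        a = pin_position(centers[i], pin_offsets[(i, l)])
        b = pin_position(centers[j], pin_offsets[(j, k)])
        total += f * manhattan(a, b)
    return total

def net_balance(flows, net, core):
    """Outflow minus inflow of one core within one network."""
    out_flow = sum(f for (n, i, _, _, _), f in flows.items() if n == net and i == core)
    in_flow = sum(f for (n, _, _, j, _), f in flows.items() if n == net and j == core)
    return out_flow - in_flow
```

For a network that routes 2 connections from core `A` to core `B`, `net_balance` returns 2 for `A` and -2 for `B` under this sign convention; the exact right-hand sides of formula (10) are not reproduced in the text, so that convention is an assumption.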
5. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 4, characterized in that establishing a thermal model to evaluate the total temperature index $T_{total}$ of the core layout includes: the total temperature index of the core layout is given by formula (14), with its component terms defined by formulas (15) to (17). The quantities in these formulas are: the temperature index of the core i to be placed; the power of a core j that has already been placed in the grid; the Euclidean distance between the center nodes of core i and core j; and a distance-based normalized weighting coefficient that determines the degree of influence of each other core j on core i, with u = 1, …, j and u ≠ i.
6. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 3, characterized in that the non-dominated sorting method is used to perform bi-objective screening of the layout schemes in the solution list sol_list: iterate through all layout schemes in the solution list and mark the dominance relationship of each scheme. For two schemes A and B compared on total interconnect length and total temperature index $T_{total}$, if A is better than B on both objectives, or A equals B on one objective and is better on the other, then A is said to dominate B. The schemes that are not dominated by any other scheme are selected to form the globally optimal solution set.
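The dominance rule of claim 6 can be sketched as a simple pairwise filter. This is a minimal sketch that assumes both objectives are minimized and that each scheme can be mapped to a pair (total interconnect length, $T_{total}$); the names `dominates`, `non_dominated` and the `objectives` callback are hypothetical.

```python
def dominates(a, b):
    """True if objective tuple a dominates b (all objectives minimized).

    a is no worse than b on every objective and strictly better on at least
    one, which covers both cases named in the claim.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(solutions, objectives):
    """Return the solutions that no other solution dominates.

    objectives: callable mapping a solution to (interconnect_length, t_total).
    """
    front = []
    for s in solutions:
        fs = objectives(s)
        if not any(dominates(objectives(o), fs) for o in solutions if o is not s):
            front.append(s)
    return front
```

The pairwise scan costs O(n²) comparisons, which is usually acceptable for a solution list of moderate size; schemes with identical objective values do not dominate each other, so both are kept.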
7. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 3, characterized in that S4 specifically comprises:
1) Particle initialization: encode each layout scheme on the non-dominated front as a particle, with the encoding dimensions being the center coordinates $(x_i, y_i)$ and rotation angle $\theta_i$ of all cores. A small random perturbation $(\Delta x_i, \Delta y_i, \Delta\theta_i)$ is applied to each encoding dimension of the initial particle, where $\Delta x_i, \Delta y_i \in [-1, 1]$ and $\Delta\theta_i \in \{0°, 90°, 180°, 270°\}$, so that the encoding is expressed as $(x_1+\Delta x_1, y_1+\Delta y_1, \theta_1+\Delta\theta_1, x_2+\Delta x_2, y_2+\Delta y_2, \theta_2+\Delta\theta_2, \ldots, x_m+\Delta x_m, y_m+\Delta y_m, \theta_m+\Delta\theta_m)$, generating an initial particle swarm $P_s$ containing multiple particles;
2) Legality filtering: for each particle in the swarm, repeat the no-overlap and boundary constraint checks of step S2; layouts that violate a constraint are repaired with a two-step repair method, including:
2-1) Boundary trimming: cores i that exceed the boundary are trimmed according to formula (18), which yields the trimmed center coordinates of the core and ensures that no core extends beyond the interposer, so that the boundary constraint is met;
2-2) Overlap separation: first, the overlap lengths of two overlapping cores in the x and y directions and their respective directions of movement are calculated using formulas (19) and (20), which give the overlap lengths of core i relative to core j in the horizontal and vertical directions and the horizontal and vertical position of core i relative to core j. Then displacement repair is performed on the cores according to formula (21), which gives the new coordinates of cores i and j after they have moved;
3) Fitness calculation: for each legal particle in $P_s$, the comprehensive objective function F is calculated using formula (22) and used as the fitness of the particle; the smaller the fitness, the better the layout scheme;
4) Optimal solution update: initialize the individual optimal solution $p_{pb}$ of each particle and the global optimal solution $p_{gb}$ of the swarm. In subsequent iterations, if the new fitness of a particle after the position update is less than the fitness of its current $p_{pb}$, $p_{pb}$ is updated to the particle's new position; if the new fitness is also less than the fitness of the current $p_{gb}$, $p_{gb}$ is updated to the particle's new position. During iterative optimization, particle movement is achieved through the velocity update of formula (23), which is expressed in terms of the velocity vector of particle i in the t-th iteration, two random numbers in the range [0, 1] that increase randomness and help avoid local convergence, and an inertia weight. The inertia weight follows the linearly decreasing strategy of formula (24), defined by the initial inertia weight, the final weight after decay, and the maximum number of iterations. The position is updated according to formula (25). After each position update, the legality filtering and fitness calculation are executed again and the results are fed back to the optimal solution update stage, forming an iterative closed loop; when the maximum number of iterations is reached, $p_{gb}$ is output as the optimal solution.
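The iterative loop of step 4) can be sketched with a textbook particle swarm update. This is an assumption-laden sketch rather than the claimed formulas: the velocity rule with acceleration coefficients `c1` and `c2`, the linear inertia schedule in `inertia`, and the default values are the standard forms, used here because formulas (23) to (25) are not reproduced in the text. The sketch also treats every dimension as continuous, uses simple clipping in place of the boundary trimming of formula (18), and omits the overlap separation step.

```python
import random

def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """ASSUMED linear decrease of the inertia weight from w_start to w_end."""
    return w_start - (w_start - w_end) * t / t_max

def clip(pos, lower, upper):
    """Clamp each coordinate into [lower, upper]; stand-in for boundary trimming."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(pos, lower, upper)]

def pso_minimize(fitness, swarm, lower, upper, t_max=50, c1=2.0, c2=2.0, seed=0):
    """Minimize fitness starting from the given particles (lists of floats)."""
    rng = random.Random(seed)
    pos = [list(p) for p in swarm]
    vel = [[0.0] * len(p) for p in pos]
    pbest = [list(p) for p in pos]              # individual best positions
    pbest_f = [fitness(p) for p in pos]
    g = min(range(len(pos)), key=lambda k: pbest_f[k])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]  # global best position
    for t in range(t_max):
        w = inertia(t, t_max)
        for k in range(len(pos)):
            for d in range(len(pos[k])):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            pos[k] = clip(pos[k], lower, upper)
            f = fitness(pos[k])
            if f < pbest_f[k]:                   # update individual best
                pbest[k], pbest_f[k] = list(pos[k]), f
                if f < gbest_f:                  # update global best
                    gbest, gbest_f = list(pos[k]), f
    return gbest, gbest_f
```

In a layout setting, `fitness` would decode the particle into core centers and rotation angles, apply the repair steps, and return the comprehensive objective F; the discrete rotation dimension would need rounding to a multiple of 90°, which this sketch does not do.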
8. The two-stage 2.5D core layout optimization method based on Q-learning and particle swarm optimization according to claim 7, characterized in that the output includes: physical layout information, i.e. the center coordinates $(x_i, y_i)$ and rotation angle $\theta_i$ of each core on the silicon interposer; and performance metric information, i.e. the total interconnect length and the temperature index $T_{total}$ of the layout scheme.