A microprocessor architecture

By setting up independent communication links and memory spaces for the central processing unit and coprocessor, the problem of mutual interference when accessing memory in the microprocessor architecture is solved, memory access efficiency and performance are improved, and data copying operations and power consumption are reduced.

CN224383696U · Active · Publication Date: 2026-06-19 · PHYTIUM TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Utility model (China)
Current Assignee / Owner
PHYTIUM TECH CO LTD
Filing Date
2025-06-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing microprocessor architectures, the central processing unit (CPU) and coprocessor exhibit poor performance when performing memory-intensive tasks, especially when they interfere with each other or compete for memory access bandwidth, leading to an overall performance degradation.

Method used

In a microprocessor architecture, separate communication links are set up for the central processing unit and coprocessors, allowing them to access different memory spaces, thus avoiding mutual interference and improving memory access efficiency.

Benefits of technology

By designing independent communication links and memory spaces, data copying operations are reduced, power consumption is lowered, and the performance of the microprocessor architecture is improved, especially in memory-intensive and heavy-load task scenarios.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides a microprocessor architecture, comprising a central processor, a coprocessor, a memory and a memory controller corresponding to the memory; the central processor is connected with the memory controller through a first communication link to realize read and write operations on the memory through the first communication link; the coprocessor is connected with the memory controller through a second communication link to realize read and write operations on the memory through the second communication link. The architecture has better energy efficiency, especially when the central processor and the coprocessor execute memory-intensive tasks or the coprocessor alone executes heavy-load tasks, the memory access efficiency of the central processor or the coprocessor can be improved, the central processor and the coprocessor realize data interaction through zero-copy, the communication and power consumption overheads are reduced, and the energy efficiency performance is improved.

Description

Technical Field

[0001] This application relates to the field of computer technology, and more particularly to a microprocessor architecture.

Background Technology

[0002] In current microprocessor architecture design, various coprocessors, such as graphics processors, neural network processors, deep learning processors, and video processors, are used in microprocessor architectures to improve the computing performance of microprocessors in different aspects.

[0003] Coprocessors are typically integrated within the microprocessor architecture and share memory with the CPU, meaning both the CPU and coprocessor access the same memory via the system bus, allowing for closer collaboration between them. However, in practical applications, this architecture performs poorly when the CPU and coprocessor are performing memory-intensive tasks.

Utility Model Content

[0004] To address the aforementioned technical issues, this application provides a microprocessor architecture with superior performance, particularly when the CPU and coprocessor are performing memory-intensive tasks or when the coprocessor is performing heavy-load tasks alone. This architecture can improve the memory access efficiency of the CPU and / or coprocessor, thereby enhancing performance.

[0005] This application provides a microprocessor architecture, including: a central processing unit (CPU), a coprocessor, memory, and a memory controller corresponding to the memory; the CPU is connected to the memory controller via a first communication link to perform read and write operations on the memory via the first communication link; the coprocessor is connected to the memory controller via a second communication link to perform read and write operations on the memory via the second communication link.

[0006] This application provides a microprocessor architecture integrating a central processing unit (CPU) and a coprocessor. In this architecture, a first communication link is established between the CPU and memory, and a second communication link is established between the coprocessor and memory. This allows the CPU and coprocessor to access memory through their respective dedicated communication links, while still sharing memory. This microprocessor architecture avoids mutual interference or competition for memory access bandwidth between the CPU and coprocessor, especially when the CPU and coprocessor are performing memory-intensive tasks. It improves the memory access efficiency of both the CPU and coprocessor, thereby enhancing the performance of the microprocessor architecture.

[0007] In some implementations, the memory includes a first memory space and a second memory space; the central processing unit (CPU) performs read and write operations on the memory through the first communication link, including: the CPU performs read and write operations on the first memory space through the first communication link; the coprocessor performs read and write operations on the memory through the second communication link, including: the coprocessor performs read and write operations on the second memory space through the second communication link.

[0008] Based on this implementation, different memory spaces are allocated for the central processing unit (CPU) and coprocessor in memory, and the CPU and coprocessor access different memory spaces through different links. This can further avoid conflicts when the CPU and coprocessor access memory, thereby improving memory access performance.

[0009] In some implementations, the first communication link and the second communication link are connected, enabling the central processing unit (CPU) to perform read and write operations on the first memory space and / or the second memory space through the first and second communication links, and / or enabling the coprocessor to perform read and write operations on the first memory space and / or the second memory space through the second and first communication links. Based on this implementation, the CPU and coprocessor can achieve zero-copy data interaction while maintaining a high degree of independence in memory access channels, thereby reducing the power consumption of data copying and improving the performance of the microprocessor architecture.

[0010] In some implementations, the memory includes a first memory space and a second memory space. The central processing unit (CPU) performs read and write operations on the memory through the first communication link, including reading and writing operations on both the first and second memory spaces through the first communication link. The coprocessor performs read and write operations on the memory through the second communication link, including reading and writing operations on the second memory space through the second communication link. Based on this implementation, the CPU and coprocessor can share data through the second memory space, thereby reducing data copying, reducing power consumption, and improving the performance of the microprocessor architecture. Limiting the coprocessor to accessing only the second memory space through the second communication link ensures memory security even when the security level of the second communication link is relatively low.
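The access rule in this implementation can be illustrated with a minimal sketch. The address ranges, the names `FIRST_SPACE`, `SECOND_SPACE`, `LINK_REACH`, and `is_allowed` are hypothetical and chosen only for illustration; the application does not specify concrete addresses or a checking mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySpace:
    name: str
    base: int  # first byte address (inclusive)
    size: int  # length in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical layout: two adjacent 1 GiB regions of one physical memory.
FIRST_SPACE = MemorySpace("first", 0x0000_0000, 0x4000_0000)
SECOND_SPACE = MemorySpace("second", 0x4000_0000, 0x4000_0000)

# Assumed reach of each link in this implementation: the first link (CPU)
# reaches both spaces, the second link (coprocessor) only the second space.
LINK_REACH = {
    "first_link": (FIRST_SPACE, SECOND_SPACE),
    "second_link": (SECOND_SPACE,),
}


def is_allowed(link: str, addr: int) -> bool:
    """Return True if a request on `link` may target address `addr`."""
    return any(space.contains(addr) for space in LINK_REACH[link])
```

Under these assumptions, data that the CPU and coprocessor exchange is placed in the second memory space, which both links can reach, so no copy between spaces is needed.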

[0011] In some implementations, the coprocessor is also connected to the first communication link, enabling the coprocessor to perform read and write operations on the first memory space via the first communication link. Based on this implementation, the coprocessor can access the first memory space through the first communication link, thereby expanding the coprocessor's memory access range while ensuring memory safety. This implementation also allows the central processing unit and the coprocessor to share data through the cache or the first memory space in the first communication link, thereby reducing data copying, reducing power consumption, and improving the performance of the microprocessor architecture.

[0012] In some implementations, a link switch is provided on the second communication link to control its on / off state. Based on this implementation, the link switch on the second communication link allows the coprocessor to access memory via either the first or second communication link. When the second communication link is closed, both the CPU and coprocessor access memory via the first communication link, thus achieving data consistency between the CPU and coprocessor. This data exchange method can achieve zero-copy, thereby reducing power consumption.
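The effect of the link switch can be sketched as a simple selection function. This is only an illustrative model: the names `LinkSwitch`, `second_link_on`, and `select_link` are assumptions, and the actual switch is a hardware element whose control interface is not described here.

```python
from dataclasses import dataclass


@dataclass
class LinkSwitch:
    # True: the second communication link is usable.
    # False: the second communication link is shut off.
    second_link_on: bool = True


def select_link(requester: str, switch: LinkSwitch) -> str:
    """Pick the link a memory request travels on (hypothetical model)."""
    if requester == "cpu":
        return "first_link"
    if requester == "coprocessor":
        # With the second link shut off, the coprocessor falls back to the
        # first link and therefore shares the CPU's path to memory.
        return "second_link" if switch.second_link_on else "first_link"
    raise ValueError(f"unknown requester: {requester}")
```

With the switch off, both requesters resolve to the first link, which corresponds to the case described above in which data consistency between the CPU and coprocessor is maintained on a single link.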

[0013] In some implementations, the microprocessor architecture further includes a mapping module; the input of the mapping module is connected to the coprocessor, and the output of the mapping module is connected to the first communication link and the second communication link, respectively; the mapping module is used to send memory access requests sent by the coprocessor to the first communication link or to the second communication link. Based on this implementation, the mapping module can automatically send memory access requests sent by the coprocessor to the appropriate communication link, thereby making the coprocessor's memory access more flexible and efficient.

[0014] In some implementations, the mapping module includes a configuration register, a space matching module, and a link routing switch. The configuration register stores address information for the first memory space and the second memory space. The space matching module receives memory access requests sent by the coprocessor, compares the destination address of the memory access request with the address information stored in the configuration register, and determines the target memory space corresponding to the memory access request. The link routing switch routes the memory access request to the first or second communication link connected to the memory controller of the target memory space, based on the target memory space. Based on this implementation, the mapping module can achieve accurate routing of memory access requests through address comparison.
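The three parts of the mapping module can be modeled in software as follows. This is a sketch under stated assumptions, not the circuit itself: the class and field names (`SpaceEntry`, `MappingModule`, `match_space`, `route`) and the example address ranges are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceEntry:
    space: str  # name of the memory space
    base: int   # start address (inclusive)
    limit: int  # end address (exclusive)
    link: str   # link connected to that space's memory controller


class MappingModule:
    def __init__(self, config_register):
        # Configuration register: address information for each memory space.
        self.config_register = list(config_register)

    def match_space(self, addr: int) -> SpaceEntry:
        # Space matching: compare the destination address with the
        # configured ranges to find the target memory space.
        for entry in self.config_register:
            if entry.base <= addr < entry.limit:
                return entry
        raise ValueError(f"address {addr:#x} is outside all memory spaces")

    def route(self, addr: int) -> str:
        # Link routing switch: forward the request to the link that leads
        # to the memory controller of the target memory space.
        return self.match_space(addr).link


# Example configuration with assumed address ranges.
mapper = MappingModule([
    SpaceEntry("first", 0x0000_0000, 0x4000_0000, "first_link"),
    SpaceEntry("second", 0x4000_0000, 0x8000_0000, "second_link"),
])
```

In this model, `mapper.route(addr)` returns the link for a coprocessor request; an address that matches no configured range is rejected, which is one possible policy and is not taken from the application.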

[0015] In some implementations, the first communication link is a coherent interconnect network or a non-coherent interconnect network, and the second communication link is a non-coherent interconnect network or a coherent interconnect network. Based on this implementation, the specific types of communication links in the provided microprocessor architecture are more flexible.

[0016] In some implementations, the coprocessor includes at least one of a graphics processing unit (GPU), a general-purpose computing GPU, a neural network processor, a deep learning processor, and a video processor. Based on this implementation, different types of coprocessors in the microprocessor architecture can achieve more efficient memory access.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of this utility model or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this utility model. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0018] Figures 1-7 are schematic diagrams of several different microprocessor architectures provided in the embodiments of this application.

Detailed Implementation

[0019] In current microprocessor architecture design, various coprocessors are used in the microprocessor architecture. These coprocessors can act as accelerators for the microprocessor architecture to speed up the processing speed of specific types of tasks and improve the computing performance of the microprocessor architecture in different aspects.

[0020] For example, the aforementioned coprocessor can be a graphics processing unit (GPU), a general-purpose computing on graphics processing unit (GPGPU), a neural processing unit (NPU), a deep learning processing unit (DPU), a video processing unit (VPU), etc. In the embodiments of this application, xPU represents any of the aforementioned coprocessors, and the coprocessor involved in any embodiment of this specification can be any one or more of the various coprocessors described above.

[0021] For microprocessor architectures that include coprocessors, the central processing unit and the coprocessor need to interact, including exchanging data. Traditional architectures exchange data by copying it between separate memories.

[0022] Unified memory is a design that uses virtualization technology or specific hardware architecture to allow the Central Processing Unit (CPU) and the xPU to share the same memory space. This avoids the overhead of copying data between different memory spaces in traditional architectures, thereby reducing power consumption and improving the overall system efficiency and performance. This technology is widely used in heterogeneous computing systems, enabling the CPU and xPU to work more closely and efficiently, accelerating the operation of various applications such as scientific computing, graphics rendering, and artificial intelligence.

[0023] Figure 1 shows a multi-core heterogeneous microprocessor architecture.

[0024] This microprocessor architecture is a typical architecture that connects an independent xPU via the PCIe bus. In this architecture, the CPU core connects to the memory controller through a Network on Chip (NOC), which allows for read and write operations on the memory.

[0025] The xPU can include GPU cores, NPU cores, VPU cores, and a memory controller. These cores are connected via an internal interconnect network and to the video memory via the memory controller. This allows the xPU cores to read and write to the video memory through the internal interconnect network. The coherent interconnect network within the microprocessor architecture is connected to the xPU's internal interconnect network via a PCIe bus, enabling communication between the microprocessor architecture and the xPU. Data exchange between the two requires the CPU or DMA (Direct Memory Access) to copy data from main memory to video memory or vice versa.

[0026] This architecture has several drawbacks. For example, the CPU and xPU are on separate chips, and the video memory and system memory are separate, requiring data to be copied between them, resulting in high latency and power consumption. The CPU and xPU are connected via PCIe or other buses, which limits bandwidth and also leads to significant latency. Furthermore, the video memory capacity is relatively small, leading to insufficient video memory in large and complex applications, such as when processing high-resolution images or large-scale deep learning scenarios. To fully utilize the performance of the CPU and xPU, sufficient capacity and high-performance system memory and video memory are required, resulting in high hardware costs.

[0027] Figure 2 shows another multi-core heterogeneous microprocessor architecture.

[0028] In this architecture, the xPU is embedded within the microprocessor architecture and connected to the memory controller via a coherent interconnect network. This allows the CPU cores in the architecture, as well as the GPU cores, NPU cores, and VPU cores within the xPU, to perform read and write operations on memory through the coherent interconnect network and the memory controller.

[0029] In this architecture, the CPU cores and xPU cores share memory through a coherent interconnect network, which shortens the interaction path between them and improves interaction efficiency. However, the coherent interconnect network is usually limited by the cache line size and bandwidth, so the xPU performs poorly when issuing high-burst memory reads and writes. In practical applications, this architecture also performs poorly when the central processing unit and coprocessor perform memory-intensive tasks.

[0030] The researchers of this application discovered that while the above architecture achieves unified memory for the CPU and xPU, it also causes the CPU and xPU to share resources to a high degree. The memory access bandwidth and latency of the xPU are easily limited by the coherent interconnect network, so it is difficult to fully utilize the performance of the xPU when it performs heavy-load tasks. When the CPU and xPU process tasks at the same time, they interfere with each other. Especially when both are performing high-bandwidth or intensive memory access tasks, resource competition is strong, making it difficult for either to achieve optimal performance and degrading the overall performance of the architecture.

[0031] To address the aforementioned technical issues, the researchers of this application have proposed a new microprocessor architecture through research and experimentation. This architecture can improve architecture performance while enabling the CPU and xPU to share unified memory, especially improving performance in scenarios where the xPU performs heavy-load tasks and in scenarios where the CPU and xPU perform memory-intensive tasks.

[0032] The technical solutions of the embodiments of this application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0033] This application provides a microprocessor architecture. As shown in Figure 3, the microprocessor architecture includes a central processing unit 101, a coprocessor 102, a memory 103, and a memory controller 104 corresponding to the memory 103.

[0034] There may be one or more central processing units (CPUs) 101, and each central processing unit 101 may include one or more processor cores.

[0035] The coprocessor 102 (xPU) may include one or more processors of the following types: Graphics Processing Unit (GPU), General-Purpose Computing on Graphics Processing Units (GPGPU), Neural Processing Unit (NPU), Deep Learning Processing Unit (DPU), and Video Processing Unit (VPU). Alternatively, the coprocessor 102 may include various processor cores, such as GPU cores, NPU cores, VPU cores, etc.

[0036] Memory 103 can be any type of physical memory, such as any one or more combinations of memory types such as DDR (Double Data Rate Synchronous Dynamic Random Access Memory), GDDR (Graphics Double Data Rate Synchronous Dynamic Random Access Memory), HBM (High Bandwidth Memory), and CXL.mem (Compute Express Link Memory).

[0037] The memory controller 104 is used to receive memory read / write requests sent by the processor core and perform read / write operations on the memory. For the functions and specific implementation methods of the memory controller 104, please refer to the descriptions of memory controller functions and implementation methods in the prior art. This embodiment does not improve its functions and implementation methods, and therefore will not describe them in detail.

[0038] In some embodiments, the memory 103 in the provided microprocessor architecture may include multiple memory blocks, and a single memory controller 104 may be provided for all of the memory blocks, so that the central processing unit 101 and the coprocessor 102 both perform read and write operations on each memory block through that single memory controller 104.

[0039] Alternatively, a separate memory controller 104 can be set for each memory block, so that when the central processing unit 101 and the coprocessor 102 access different memory blocks, they can do so through the memory controller 104 corresponding to that memory block.

[0040] In some embodiments, the memory controller 104 may be integrated with the processor or processor core, such as with the central processing unit 101 and coprocessor 102, and then connected to the external memory 103 via a communication line. Alternatively, in other embodiments, the memory controller 104 may not be integrated with the processor or processor core. Or, in other embodiments, the memory 103, the memory controller 104, the central processing unit 101, and the coprocessor 102 may all be integrated together. This embodiment mainly describes the structure of the microprocessor architecture proposed in this application and the connection relationships between the structures, without limiting the specific positional relationships between the memory controller, memory, central processing unit, and coprocessor.

[0041] Referring again to Figure 3, in the microprocessor architecture provided in this application embodiment, the central processing unit 101 is connected to the memory controller 104 through the first communication link 105, so that the central processing unit 101 can perform read and write operations on the memory 103 through the first communication link 105 and the memory controller 104.

[0042] Furthermore, in the microprocessor architecture provided in this application embodiment, the coprocessor 102 is connected to the memory controller 104 through the second communication link 106, so that the coprocessor 102 can perform read and write operations on the memory 103 through the second communication link 106 and the memory controller 104.

[0043] In some embodiments, the first communication link 105 in the provided microprocessor architecture can be a coherent interconnect network or a non-coherent interconnect network, and the second communication link 106 can be a non-coherent interconnect network or a coherent interconnect network.

[0044] When the first communication link 105 or the second communication link 106 is a non-coherent interconnect network, it can be a PCIe bus (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus), an AXI (Advanced eXtensible Interface) bus, a Wishbone bus (an open-source bus protocol maintained by the OpenCores organization), or the like.

[0045] When the first communication link 105 or the second communication link 106 is a coherent interconnect network, its specific structure is similar to that of a conventional coherent interconnect network. For example, the link may include an address interleaving mapping module and a cache. The specific processing procedure when the coprocessor 102 accesses memory 103 through the second communication link 106 is consistent with the specific processing procedure when the central processing unit 101 accesses memory 103 through the first communication link 105. Both follow the cache coherence protocol. For the specific communication processing procedure, please refer to the memory read / write request processing procedure of the coherent interconnect network in the prior art.

[0046] For example, when the first communication link 105 is a coherent interconnect network, the specific processing procedure when the central processing unit 101 performs read and write operations on the memory 103 through the first communication link 105 and the memory controller 104 is similar to the processing procedure when the CPU accesses memory through the cache coherent network in the existing microprocessor architecture.

[0047] The first communication link 105 uses a coherent interconnect network to ensure data consistency among the cores of the central processing unit 101. In some scenarios, such as when the central processing unit 101 has only one processor core, or when multiple processor cores in the central processing unit 101 maintain data consistency internally through other means, it is not necessary to maintain data consistency through the system interconnect network. In those scenarios, the first communication link 105 can use a non-coherent interconnect network.

[0048] For the coprocessor 102 in the microprocessor architecture provided in this embodiment, when the second communication link 106 is a coherent interconnect network, the specific processing procedure of the coprocessor 102 when performing read and write operations on the memory 103 through the second communication link 106 and the memory controller 104 is consistent with the specific processing procedure of the central processing unit 101 when performing read and write operations on the memory 103 through the first communication link 105 and the memory controller 104, and both follow the cache coherence protocol of the coherent interconnect network.

[0049] For example, when the coprocessor 102 performs a write operation on memory 103 through the second communication link 106 and the memory controller 104, the implementation process is as follows:

[0050] 1. The coprocessor 102 sends a write request to memory 103 to the second communication link 106.

[0051] 2. The second communication link 106 maps the address of the write request (for example, by using an address hash algorithm or address bit interleaving to determine which network node's cache or which memory controller the request goes to) and parses the request type. If the write request is a coherent memory access request, proceed to step 3; if it is a non-coherent memory access request, proceed to step 7.

[0052] 3. If the write request is cached in the second communication link 106 (the content of the memory unit corresponding to the write request is already in the cache of the second communication link 106), then proceed to step 6; otherwise, proceed to step 4.

[0053] 4. If the write request supports the write allocation strategy, allocate a cache line. Proceed to step 5; otherwise, proceed to step 7.

[0054] 5. Determine whether the data being written is a complete cache line. If it is, proceed to step 6. If it is not a complete cache line, read the corresponding cache line from memory 103 and store it in the newly allocated cache line, and then proceed to step 6.

[0055] 6. Determine if the write strategy is write_back. If so, update the data to the cache line (either entirely or partially), and only write it to memory when the cache line is evicted. If it is write-through (not a write_back strategy), continue with step 7 while updating the cache line.

[0056] 7. The write request is sent to the corresponding DDR controller and then written into memory 103.
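The seven write steps above can be sketched as a small decision function that returns the actions taken for a request. This is a simplified model for illustration only: the `WriteRequest` fields, the action names in the trace, and the 64-byte line size are assumptions, and address mapping, eviction, and the coherence protocol itself are not modeled.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    addr: int
    coherent: bool        # step 2: coherent or non-coherent request
    write_allocate: bool  # step 4: allocate a line on a write miss
    full_line: bool       # step 5: the write covers a whole cache line
    write_back: bool      # step 6: write-back (True) or write-through


def handle_write(req: WriteRequest, cached_lines: set, line_size: int = 64) -> list:
    """Return the ordered list of actions the link performs for `req`."""
    trace = []
    line = req.addr // line_size
    if not req.coherent:                # step 2 -> step 7
        trace.append("write_memory")
        return trace
    if line not in cached_lines:        # step 3: cache miss
        if not req.write_allocate:      # step 4 -> step 7
            trace.append("write_memory")
            return trace
        cached_lines.add(line)          # step 4: allocate a cache line
        trace.append("allocate_line")
        if not req.full_line:           # step 5: fetch the rest of the line
            trace.append("fill_line_from_memory")
    trace.append("update_cache_line")   # step 6
    if not req.write_back:              # write-through also goes to step 7
        trace.append("write_memory")
    return trace
```

In this model a write-back hit produces only a cache update, with the memory write deferred until eviction, while a write-through request always ends with a memory write.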

[0057] When the coprocessor 102 performs a read operation on the memory 103 through the second communication link 106 and the memory controller 104, the process is as follows:

[0058] 1. The coprocessor 102 sends a read request to the memory 103 to the second communication link 106.

[0059] 2. The second communication link 106 maps the address of the read request (for example, by using an address hash algorithm or address bit interleaving to determine which network node's cache or which memory controller the read request goes to) and parses the request type. If the read request is a coherent memory access request, proceed to step 3; if it is a non-coherent memory access request, proceed to step 4.

[0060] 3. If the read request is cached in the second communication link 106 (the content of the memory unit corresponding to the read request is already in the cache of the second communication link 106), the data is returned directly to the coprocessor 102. If the cache is not hit, proceed to step 4.

[0061] 4. The second communication link 106 sends the read request to the corresponding DDR controller to read the data from memory 103 and return it to the second communication link 106.

[0062] 5. If the read request is a non-coherent memory access request, return the data directly to the coprocessor 102; if it is a coherent memory access request, proceed to step 6.

[0063] 6. If the read request is a read-allocate request, the data read from memory 103 is stored in a newly allocated cache line and returned to the coprocessor 102. If it is a read non-allocate request, the data read back is returned directly to the coprocessor 102.
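The read steps above can be sketched in the same style. The function below is a simplified model under assumptions: the cache and memory are plain dictionaries keyed by cache-line index, the 64-byte line size is hypothetical, and the second return value (`"cache"` or `"memory"`) is added only to show where the data came from.

```python
def handle_read(addr: int, coherent: bool, read_allocate: bool,
                cache: dict, memory: dict, line_size: int = 64):
    """Return (data, source) for a read request on the link."""
    line = addr // line_size
    # Steps 2-3: a coherent request that hits the link's cache is served
    # directly from the cache.
    if coherent and line in cache:
        return cache[line], "cache"
    # Step 4: otherwise the request goes to the memory controller.
    data = memory[line]
    # Steps 5-6: only a coherent read-allocate request fills a cache line;
    # non-coherent and non-allocate reads return the data directly.
    if coherent and read_allocate:
        cache[line] = data
    return data, "memory"
```

A repeated coherent read-allocate request to the same line is served from memory the first time and from the cache afterwards, while non-coherent reads never populate the cache in this model.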

[0064] For the specific processing procedure of the coprocessor 102 performing read and write operations on memory 103 through the second communication link 106 and memory controller 104, please refer to the processing procedure of performing read and write operations on memory through a coherent interconnect network in the prior art. This embodiment will not describe it in detail.

[0065] In another embodiment, as shown in Figure 4, the memory 103 of the provided microprocessor architecture includes a first memory space 1031 and a second memory space 1032.

[0066] The first memory space 1031 and the second memory space 1032 are respectively portions of the storage space in memory 103. Their sizes can be flexibly set or dynamically adjusted during the operation of the architecture.

[0067] In this embodiment, the first memory space 1031 is configured as a dedicated memory space for the central processing unit 101, while the second memory space 1032 is configured as a dedicated memory space for the coprocessor 102. Based on the above architecture, when the central processing unit 101 performs read and write operations on the memory 103 through the first communication link 105 and the memory controller 104, specifically, the central processing unit 101 performs read and write operations on the first memory space 1031 through the first communication link 105 and the memory controller 104; when the coprocessor 102 performs read and write operations on the memory 103 through the second communication link 106 and the memory controller 104, specifically, the coprocessor 102 performs read and write operations on the second memory space 1032 through the second communication link 106 and the memory controller 104.

[0068] In another embodiment, the first memory space 1031 and the second memory space 1032 may correspond to the same memory controller 104. In this case, the central processing unit 101 is connected to the memory controller 104 via a first communication link 105, and the coprocessor 102 is connected to the memory controller 104 via a second communication link 106. Based on this microprocessor architecture, when the central processing unit 101 needs to perform read / write operations on the memory 103, it sends a memory read / write request for the first memory space 1031 to the memory controller 104 via the first communication link 105. The memory controller 104 then performs read / write operations on the first memory space 1031 based on the memory read / write request sent by the central processing unit 101. When the coprocessor 102 needs to perform read / write operations on the memory 103, it sends a memory read / write request for the second memory space 1032 to the memory controller 104 via the second communication link 106. The memory controller 104 then performs read / write operations on the second memory space 1032 based on the memory read / write request sent by the coprocessor 102. In this embodiment, the specific processing procedure for the memory controller 104 to perform read and write operations on the first memory space 1031 or the second memory space 1032 based on the received memory read and write requests can be found in the prior art.

[0069] In another embodiment, the first memory space 1031 and the second memory space 1032 can each correspond to different memory controllers 104. That is, in the microprocessor architecture provided in this embodiment, dedicated memory controllers 104 are configured for the first memory space 1031 and the second memory space 1032 respectively. In this case, the central processing unit 101 is connected to the memory controller 104 corresponding to the first memory space 1031 through the first communication link 105, and the coprocessor 102 is connected to the memory controller 104 corresponding to the second memory space 1032 through the second communication link 106. Based on this microprocessor architecture, when the central processing unit 101 needs to perform read / write operations on memory 103, it sends a memory read / write request for the first memory space 1031 to the memory controller 104 corresponding to the first memory space 1031 via the first communication link 105. The memory controller 104 corresponding to the first memory space 1031 then performs read / write operations on the first memory space 1031 based on the memory read / write request sent by the central processing unit 101. When the coprocessor 102 needs to perform read / write operations on memory 103, it sends a memory read / write request for the second memory space 1032 to the memory controller 104 corresponding to the second memory space 1032 via the second communication link 106. The memory controller 104 corresponding to the second memory space 1032 then performs read / write operations on the second memory space 1032 based on the memory read / write request sent by the coprocessor 102. In this embodiment, the specific processing procedure for the memory controller 104 to perform read / write operations on the first memory space 1031 or the second memory space 1032 based on the received memory read / write request can be found in the prior art.
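As a rough illustration of the dedicated-controller arrangement described above, the following Python sketch models each memory space with its own controller and each processor with its own link. The class names, the address ranges, and the dictionary-backed storage are all assumptions made for illustration; the application does not specify an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryController:
    """Hypothetical stand-in for one memory controller 104 serving one address range."""
    base: int
    size: int
    cells: dict = field(default_factory=dict)

    def owns(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

    def access(self, op: str, addr: int, value: int = 0) -> int:
        if not self.owns(addr):
            raise ValueError(f"address {addr:#x} is outside this controller's space")
        if op == "write":
            self.cells[addr] = value
        return self.cells.get(addr, 0)


@dataclass
class Link:
    """Hypothetical communication link wired to exactly one controller."""
    name: str
    controller: MemoryController

    def request(self, op: str, addr: int, value: int = 0) -> int:
        # The link only forwards the request; the controller performs the access.
        return self.controller.access(op, addr, value)


# Assumed layout: first memory space 1031 starts at 0x0000_0000,
# second memory space 1032 starts at 0x4000_0000.
mc_first = MemoryController(base=0x0000_0000, size=0x4000_0000)
mc_second = MemoryController(base=0x4000_0000, size=0x4000_0000)
link_105 = Link("first communication link 105", mc_first)    # used by CPU 101
link_106 = Link("second communication link 106", mc_second)  # used by coprocessor 102

link_105.request("write", 0x1000, 11)
link_106.request("write", 0x4000_1000, 22)
```

In this sketch a request for an address outside the controller reached by a link raises an error, which mirrors the strict separation of the two memory spaces in this embodiment.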

[0070] In another embodiment, as shown in Figure 5, the first communication link 105 and the second communication link 106 in the provided microprocessor architecture are joined by a connection path, so that the first communication link 105 and the second communication link 106 are interconnected.

[0071] Based on this architecture, when the first memory space 1031 and the second memory space 1032 correspond to the same memory controller 104, the first communication link 105 and the second communication link 106 are connected through a connection path. This allows a request issued on one link to reach the specified memory space by way of the other link, thereby improving data interaction efficiency. For example, when the central processing unit 101 performs a write operation on the first memory space 1031 through the first communication link 105, a command control queue for the coprocessor 102 is generated. This queue can be stored in the cache of the first communication link 105 or in the first memory space 1031. A memory read request sent by the coprocessor 102 is transferred through the second communication link 106 to the first communication link 105, so that the coprocessor 102 can quickly read its command control queue. Alternatively, when the coprocessor 102 performs a write operation on the second memory space 1032 through the second communication link 106, it generates task processing result data. This result data can be stored in the cache of the second communication link 106 (when the second communication link 106 is a coherent interconnect network) or in the second memory space 1032. A memory read request sent by the central processing unit 101 is transferred through the first communication link 105 to the second communication link 106, so that the central processing unit 101 can quickly read the task processing result data generated by the coprocessor 102.
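The command-queue example above can be sketched in Python as follows. This is only an illustrative model: the classes `CoherentLink` and `ForwardingLink`, the queue address, and the command names are hypothetical, and the cache is reduced to a dictionary.

```python
class CoherentLink:
    """Hypothetical model of first communication link 105 with a cache in front of memory."""

    def __init__(self, memory: dict):
        self.cache = {}
        self.memory = memory

    def write(self, addr: int, value, write_through: bool = False) -> None:
        # Newly written data stays in the link cache; optionally it also goes to memory.
        self.cache[addr] = value
        if write_through:
            self.memory[addr] = value

    def read(self, addr: int):
        # Serve from the link cache when possible, otherwise from the memory space.
        if addr in self.cache:
            return self.cache[addr], "cache"
        return self.memory[addr], "memory"


class ForwardingLink:
    """Hypothetical model of second communication link 106 with a connection path to link 105."""

    def __init__(self, peer: CoherentLink):
        self.peer = peer

    def read_via_peer(self, addr: int):
        # The coprocessor's read request is handed over to link 105.
        return self.peer.read(addr)


first_space = {}                        # stands in for first memory space 1031
link_105 = CoherentLink(first_space)
link_106 = ForwardingLink(link_105)

QUEUE_ADDR = 0x100                      # assumed address of the command control queue
link_105.write(QUEUE_ADDR, ["load", "matmul", "store"])      # CPU 101 enqueues commands
commands, served_from = link_106.read_via_peer(QUEUE_ADDR)   # coprocessor 102 reads them
```

The queue is read in place, either from the link cache or from the first memory space, without being copied into a separate coprocessor buffer first.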

[0072] In the case where the first memory space 1031 and the second memory space 1032 are respectively associated with dedicated memory controllers 104, and the central processing unit 101 is connected to the memory controller 104 corresponding to the first memory space 1031 via the first communication link 105, and the coprocessor 102 is connected to the memory controller 104 corresponding to the second memory space 1032 via the second communication link 106, the first communication link 105 and the second communication link 106 are connected through a connection path. This allows the central processing unit 101 of the microprocessor architecture to perform read and write operations on the second memory space 1032 via the first communication link 105 and the second communication link 106, or allows the coprocessor 102 to perform read and write operations on the first memory space 1031 via the first communication link 105 and the second communication link 106. This expands the memory access range of the central processing unit 101 and the coprocessor 102 to achieve data interaction when necessary.

[0073] Based on the above embodiments, the microprocessor architecture shown in Figure 5 allows for flexible configuration or adjustment of the memory access paths of the central processing unit 101 and the coprocessor 102. For example, in some cases, the central processing unit 101 can be configured to access the first memory space 1031 only through the first communication link 105, and the coprocessor 102 can access the second memory space 1032 only through the second communication link 106.

[0074] Alternatively, in certain situations, the central processing unit 101 can be configured to access the first memory space 1031 and / or the second memory space 1032 either via the first communication link 105 alone or by switching from the first communication link 105 to the second communication link 106, thereby allowing the central processing unit 101 to adaptively select the memory access path. Likewise, the coprocessor 102 can be configured to access the first memory space 1031 and / or the second memory space 1032 either via the second communication link 106 alone or by switching from the second communication link 106 to the first communication link 105.

[0075] It can be seen that the microprocessor architecture shown in Figure 5 enables more flexible memory access path selection and control.
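The path choices available in the Figure 5 architecture can be sketched as a small routing function. The sketch assumes the dedicated-controller case, in which each memory space is reached directly from one link; the names in `HOME_LINK` and `SPACE_LINK` and the `connection_path` flag are hypothetical labels, not terms from the application.

```python
# Hypothetical names for the two links and the memory space each one reaches directly.
HOME_LINK = {"cpu_101": "link_105", "coprocessor_102": "link_106"}
SPACE_LINK = {"space_1031": "link_105", "space_1032": "link_106"}


def access_path(requester: str, target_space: str, connection_path: bool = True) -> list:
    """Return the hops a request takes from the requester to the target memory space."""
    home = HOME_LINK[requester]
    remote = SPACE_LINK[target_space]
    if home == remote:
        return [requester, home, target_space]
    if not connection_path:
        raise PermissionError(
            f"{requester} cannot reach {target_space} without the connection path"
        )
    # Cross from the requester's own link to the link that reaches the target space.
    return [requester, home, remote, target_space]
```

Calling `access_path("coprocessor_102", "space_1031")` yields a path that crosses from link 106 to link 105, while passing `connection_path=False` models the restricted configuration in which each processor reaches only its own memory space.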

[0076] In another embodiment, the memory 103 of the provided microprocessor architecture includes a first memory space 1031 and a second memory space 1032.

[0077] The first memory space 1031 and the second memory space 1032 are respectively portions of the storage space in memory 103. Their sizes can be flexibly set or dynamically adjusted during the operation of the architecture.

[0078] In this architecture, the first memory space 1031 and the second memory space 1032 are both configured as memory spaces accessible to the central processing unit 101 via the first communication link 105, and the second memory space 1032 is configured as a memory space accessible to the coprocessor 102 via the second communication link 106. That is, when the central processing unit 101 performs read and write operations on the memory 103 via the first communication link 105, it can perform read and write operations on both the first memory space 1031 and the second memory space 1032 via the first communication link 105. The coprocessor 102 performs read and write operations on the second memory space 1032 through the second communication link 106. Alternatively, in other embodiments, the coprocessor 102 performs read and write operations on the first memory space 1031 by switching from the second communication link 106 to the first communication link 105. The coprocessor 102 is not allowed to directly access the first memory space 1031 through the second communication link 106.

[0079] Based on this architecture, a second memory space 1032 accessible via the second communication link 106 is allocated to the coprocessor 102 within memory 103. This means that a priority memory region accessible via a dedicated second communication link 106 is allocated to the coprocessor 102 within memory 103, reducing memory access latency and increasing memory access bandwidth, thereby improving the performance of the coprocessor 102. To access the first memory space 1031, the coprocessor 102 passes from the second communication link 106 to the first communication link 105. The memory access mechanism of the coprocessor 102 via the first communication link 105 is consistent with the memory access mechanism of the central processing unit 101, ensuring hardware data consistency and memory safety.

[0080] Based on this architecture, in some embodiments, the first memory space 1031 and the second memory space 1032 correspond to the same memory controller 104. The central processing unit 101 is connected to the memory controller 104 via a first communication link 105, and the coprocessor 102 is connected to the memory controller 104 via a second communication link 106. When the central processing unit 101 sends a memory read / write request to the memory controller 104 via the first communication link 105, it can send a memory read / write request for either the first memory space 1031 or the second memory space 1032; that is, the memory controller 104 can respond to read / write requests for the entire memory space sent via the first communication link 105. When the coprocessor 102 sends a memory read / write request to the memory controller 104 via the second communication link, it can only send a memory read / write request for the second memory space 1032, and the memory controller 104 only responds to memory read / write requests for the second memory space 1032 sent via the second communication link 106. When the coprocessor 102 needs to access the first memory space 1031, the read / write request can only be responded to by going through the second communication link 106 to the first communication link 105 and then to the memory controller 104. For details on the specific processing procedure of the memory controller 104 in responding to the memory access request, please refer to the prior art.
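The asymmetric response rule of the shared memory controller described above can be sketched as a pair of predicates. The address ranges and the function names are assumptions chosen for illustration only.

```python
SECOND_SPACE = range(0x4000_0000, 0x8000_0000)   # assumed range of second memory space 1032
WHOLE_MEMORY = range(0x0000_0000, 0x8000_0000)   # assumed range of memory 103


def controller_accepts(arrival_link: str, addr: int) -> bool:
    """Decide whether the shared memory controller 104 responds to a request."""
    if arrival_link == "link_105":
        return addr in WHOLE_MEMORY      # the whole memory is visible over the first link
    if arrival_link == "link_106":
        return addr in SECOND_SPACE      # only the second space is visible over the second link
    return False


def coprocessor_arrival_link(addr: int) -> str:
    """Pick the link on which a coprocessor 102 request reaches the controller."""
    # Requests for the first space cross from link 106 to link 105 before reaching the controller.
    return "link_106" if addr in SECOND_SPACE else "link_105"
```

Under these assumptions, a coprocessor request for an address in the first memory space is refused if it arrives on link 106 and accepted once it has been passed to link 105.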

[0081] In other embodiments, the first memory space 1031 and the second memory space 1032 correspond to different memory controllers 104, the first communication link 105 is connected to the memory controller 104 corresponding to the first memory space 1031 and the memory controller 104 corresponding to the second memory space 1032, and the second communication link 106 is connected to the memory controller 104 corresponding to the second memory space 1032. Based on the above architecture, the central processing unit 101 can send read / write requests for the first memory space 1031 to the memory controller 104 corresponding to the first memory space 1031 via the first communication link 105, and can also send read / write requests for the second memory space 1032 to the memory controller 104 corresponding to the second memory space 1032. Alternatively, in other embodiments, the central processing unit 101 can have read / write requests for the second memory space 1032 forwarded from the first communication link 105 to the second communication link 106, which delivers them to the memory controller 104 corresponding to the second memory space 1032. Meanwhile, the coprocessor 102 can send read / write requests for the second memory space 1032 to the memory controller 104 corresponding to the second memory space 1032 via the second communication link 106. In other embodiments, the coprocessor 102 can also have read / write requests for the first memory space 1031 forwarded from the second communication link 106 to the first communication link 105, which delivers them to the memory controller 104 corresponding to the first memory space 1031. For the specific processing procedure of the memory controller 104 in responding to memory access requests, please refer to the prior art.

[0082] In the aforementioned microprocessor architecture, a priority memory region is allocated to the coprocessor 102, accessible via the second communication link 106. By establishing communication paths for the central processing unit 101 and the coprocessor 102 to access different memory spaces, read and write access to these spaces by both the central processing unit 101 and the coprocessor 102 is achieved, enabling rapid data exchange between them. This architecture improves the memory access bandwidth and reduces the memory access latency of the coprocessor 102, largely avoiding competition for memory access resources between the central processing unit 101 and the coprocessor 102, reducing data copying operations, and ensuring memory safety.

[0083] In another embodiment, as shown in Figure 6, the coprocessor 102 in the provided microprocessor architecture is also connected to the first communication link 105, which is connected to the memory controller 104. Transaction requests by which the coprocessor 102 performs read and write operations on the first memory space 1031 can therefore be carried over the first communication link 105.

[0084] The second communication link 106 in this architecture is connected to the memory controller 104. Transaction requests by which the coprocessor 102 performs read and write operations on the second memory space 1032 can therefore be carried over the second communication link 106.

[0085] Based on this architecture, when the coprocessor 102 accesses the first memory space 1031, it can send read / write requests to the first memory space 1031 through the first communication link 105; when the coprocessor 102 accesses the second memory space 1032, it can send read / write requests to the second memory space 1032 through the second communication link 106. The first communication link 105 is typically a hardware-coherent interconnect network. Therefore, this method enables the coprocessor 102 and the central processing unit 101 to share data through the cache in the first communication link 105 or the first memory space 1031, thereby reducing data copying and shortening read / write access latency, which is significant for small batches of data that require frequent interaction. The security mechanism for the coprocessor 102 accessing the first memory space 1031 through the first communication link 105 is consistent with the security mechanism for the central processing unit 101 accessing the first memory space 1031, ensuring the security of data stored by the central processing unit 101 in the first memory space 1031 and avoiding security threats from read / write operations by the coprocessor 102.

[0086] In another embodiment, a link switch can be added between the coprocessor 102 and the first communication link 105 and the second communication link 106 to control the on / off state of the first communication link 105 and / or the second communication link 106. In this architecture, the memory access path of the coprocessor 102 can be flexibly controlled through the link switch. For example, when the link switch turns the second communication link 106 off, access requests are not routed to the second communication link 106, and the coprocessor 102 can only access the first memory space 1031 and the second memory space 1032 through the first communication link 105 (in this scenario, memory space partitioning is not required), ensuring that the data shared between the coprocessor 102 and the central processing unit 101 is kept consistent by hardware. When the link switch turns the first communication link 105 off, access requests issued by the coprocessor 102 are not routed to the first communication link 105, and the coprocessor 102 can only access the memory 103 through the second communication link 106, giving the coprocessor 102 an independent communication channel when reading and writing memory, thereby improving memory access bandwidth and shortening memory access latency. This implementation can expand the flexibility of the coprocessor 102 in accessing memory. In scenarios where data security and hardware data consistency are important, the coprocessor 102 can access memory through the first communication link 105. In scenarios where memory access bandwidth and computing performance are important, the coprocessor 102 can access memory through the second communication link 106.
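The effect of the link switch on the coprocessor's access path can be sketched as follows. The class `LinkSwitch`, its two flags, and the string labels for links and memory spaces are hypothetical; the behavior when both links are on follows the Figure 6 arrangement, in which the first memory space is reached over link 105 and the second over link 106.

```python
from dataclasses import dataclass


@dataclass
class LinkSwitch:
    """Hypothetical on/off control for the two links used by coprocessor 102."""
    link_105_on: bool = True
    link_106_on: bool = True

    def route(self, target_space: str) -> str:
        """Return the link that carries a coprocessor request to the target space."""
        if self.link_106_on and target_space == "space_1032":
            return "link_106"    # dedicated path to the second memory space
        if self.link_105_on:
            return "link_105"    # hardware-coherent path, reaches both spaces
        if self.link_106_on:
            return "link_106"    # link 105 is off, so all memory 103 traffic uses link 106
        raise RuntimeError("both links are switched off")
```

For instance, `LinkSwitch(link_106_on=False).route("space_1032")` selects link 105, which corresponds to the configuration that favors hardware-maintained consistency over bandwidth.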

[0087] In another embodiment, as shown in Figure 7, the provided microprocessor architecture also includes a mapping module 107. The input of the mapping module 107 is connected to the output of the coprocessor 102, and the output of the mapping module 107 is connected to the first communication link 105 and the second communication link 106, respectively. Based on the above architecture, the mapping module 107 can receive memory access requests sent by the coprocessor 102, match each request to a memory space according to preset space configuration values, and then send the request to the first communication link 105 or the second communication link 106 according to the matched memory space.

[0088] For example, if a memory access request sent by coprocessor 102 matches a read / write request for the second memory space 1032, the mapping module 107 sends the memory access request to the second communication link 106, and the memory access request is then sent to the second memory space 1032 via the second communication link 106. If a memory access request sent by coprocessor 102 matches an access request for the first memory space 1031, the mapping module 107 sends the memory access request to the first communication link 105, and the memory access request is then sent to the first memory space 1031 via the first communication link 105. This implementation greatly expands the flexibility of coprocessor 102 in accessing memory. Depending on application requirements, the data that the coprocessor 102 needs to process in batches can be stored in the second memory space 1032 and accessed through the second communication link 106 to improve memory access bandwidth and reduce memory access latency, so as to fully utilize the energy efficiency of the coprocessor 102. The private data of the central processing unit 101 and the data that the coprocessor 102 frequently interacts with the central processing unit 101 are stored in the first memory space 1031, so as to achieve secure management of memory data and ensure cache consistency of frequently interacting data between the coprocessor 102 and the central processing unit 101. This implementation can significantly improve the overall energy efficiency of the processor system, that is, improve performance while reducing power consumption, while the cache consistency and security features on the CPU side are not affected.

[0089] In another embodiment, the mapping module in the microprocessor architecture includes a configuration register, a space matching module, and a link routing switch.

[0090] The configuration register stores address information for the first memory space 1031 and the second memory space 1032. Specifically, it may include the memory address range, start address, and / or end address of the first memory space 1031 and the second memory space 1032. The address information in the configuration register can be configured by the central processing unit 101. That is, the central processing unit 101 divides the first memory space 1031 and the second memory space 1032 from the memory 103 and writes the address information of the divided first memory space 1031 and the second memory space 1032 into the configuration register.

[0091] The input of the space matching module is connected to the output of the coprocessor 102, the output of the space matching module is connected to the link routing switch, and the space matching module is also connected to the configuration register, enabling it to read data from the configuration register.

[0092] When the space matching module receives a memory access request sent by the coprocessor 102, it compares the destination address of the memory access request with the addresses of the first memory space 1031 and the second memory space 1032 stored in the configuration register to determine whether the target memory space requested by the memory access request is in the first memory space 1031 or the second memory space 1032.

[0093] In some embodiments, after the space matching module determines the target memory space corresponding to the memory access request, it adds a routing identifier to the bus of the memory access request. For example, if the target memory space corresponding to the memory access request is the first memory space 1031, the space matching module adds a first identifier to the bus of the memory access request; if the target memory space corresponding to the memory access request is the second memory space 1032, the space matching module adds a second identifier to the bus of the memory access request. Then, the space matching module sends the memory access request with the added identifier to the link routing switch.

[0094] The link routing switch is connected to the first communication link 105 and the second communication link 106 respectively. It can control which of the first communication link 105 and the second communication link 106 is turned on, thereby controlling whether the request signal is sent to the first communication link 105 or the second communication link 106.

[0095] When the space matching module sends the memory access request with an added identifier to the link routing switch, the link routing switch determines the target memory space through the identifier in the memory access request, then controls the first communication link 105 or the second communication link 106 connected to the memory controller of the target memory space to be turned on, and after removing the identifier in the memory access request, sends the memory access request to the turned-on first communication link 105 or second communication link 106.
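The cooperation of the configuration register, the space matching module, and the link routing switch can be sketched in Python. The identifier encodings, the address ranges, the `Request` type, and the function names are assumptions for illustration; the application describes hardware modules, not software.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_ID, SECOND_ID = 1, 2   # assumed encodings of the first and second routing identifiers


@dataclass
class Request:
    addr: int
    op: str
    route_id: Optional[int] = None   # routing identifier carried alongside the request


# Configuration register: assumed (start, end) address pairs, end exclusive.
CONFIG_REGISTER = {
    FIRST_ID: (0x0000_0000, 0x4000_0000),    # first memory space 1031
    SECOND_ID: (0x4000_0000, 0x8000_0000),   # second memory space 1032
}

LINK_FOR_ID = {FIRST_ID: "link_105", SECOND_ID: "link_106"}


def space_match(req: Request) -> Request:
    """Space matching module: tag the request with the identifier of its target space."""
    for route_id, (start, end) in CONFIG_REGISTER.items():
        if start <= req.addr < end:
            return Request(req.addr, req.op, route_id)
    raise ValueError(f"address {req.addr:#x} matches no configured memory space")


def link_routing_switch(req: Request):
    """Link routing switch: choose the link, then strip the identifier before forwarding."""
    link = LINK_FOR_ID[req.route_id]
    return link, Request(req.addr, req.op)


link, forwarded = link_routing_switch(space_match(Request(0x4000_2000, "read")))
```

With the assumed ranges, the example request targets the second memory space, so it is forwarded on link 106 with its routing identifier removed.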

[0096] In another embodiment, the mapping module in the provided microprocessor architecture further includes a space configuration module for dividing a first memory space 1031 and a second memory space 1032 from memory 103, and storing the address information of the first memory space 1031 and the second memory space 1032 in a configuration register. In other embodiments, the central processing unit 101 can update the memory space requirements of the coprocessor 102 to the space configuration module in real time, and flexibly allocate the first memory space 1031 and the second memory space 1032 to both based on balancing their memory space requirements. This allows the memory access range of the central processing unit 101 and the coprocessor 102 to be dynamically adjusted according to their business needs, thereby meeting their real-time memory requirements and improving the overall system performance.
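One way the space configuration module might rebalance the two memory spaces is sketched below. The proportional policy, the `granule` alignment, and the function name `partition_memory` are purely assumptions; the application only states that the spaces are allocated by balancing the two processors' memory requirements.

```python
def partition_memory(total_size: int, cpu_demand: int, coproc_demand: int,
                     granule: int = 0x1000) -> dict:
    """Split memory 103 into two contiguous spaces in proportion to the stated demands."""
    if total_size <= 0 or total_size % granule:
        raise ValueError("total_size must be a positive multiple of granule")
    if cpu_demand < 0 or coproc_demand < 0 or cpu_demand + coproc_demand == 0:
        raise ValueError("demands must be non-negative and not both zero")
    # Proportional share for the first space, rounded down to the granule size.
    first_size = total_size * cpu_demand // (cpu_demand + coproc_demand)
    first_size -= first_size % granule
    return {
        "space_1031": (0, first_size),            # (start, end), end exclusive
        "space_1032": (first_size, total_size),
    }
```

The returned address pairs correspond to the values that would be written to the configuration register whenever the demands change.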

[0097] Based on the above embodiments, as a more specific example, in a microprocessor architecture, the first communication link 105 adopts a coherent (consistent) interconnect network, the second communication link 106 adopts a non-coherent (non-consistent) interconnect network, a first memory space 1031 and a second memory space 1032 are configured in the memory 103, the central processing unit 101 communicates with the first memory space 1031 and the second memory space 1032 through the first communication link 105, the coprocessor 102 communicates with the second memory space 1032 through the second communication link 106, and the coprocessor 102 communicates with the first memory space 1031 through the first communication link 105.

[0098] In this microprocessor architecture, the central processing unit 101 can access the entire memory space of the memory 103 through the first communication link 105. For the specific memory access process, refer to a conventional CPU's memory access process through a cache coherence network.

[0099] In this microprocessor architecture, the coprocessor 102 can also access the first memory space 1031 in the memory 103 through the first communication link 105. Its specific memory access process is basically the same as the conventional CPU's memory access process through the cache coherence network.

[0100] The coprocessor 102 accesses the second memory space 1032 in the memory 103 through the second communication link 106. In this case, the second communication link 106 passes the coprocessor 102's access request for the second memory space 1032 directly to that space through the memory controller 104. This path is not affected by factors such as the bandwidth of the first communication link 105, the size of the cache line, and the cache coherence maintenance delay, so that faster access to the second memory space 1032 is achieved.

[0101] In the above embodiments, the structure of the provided microprocessor architecture and the connection relationships between its various components have been described and explained. The processing methods or procedures involved in the above embodiments are intended to exemplarily illustrate the functions that each part of the provided microprocessor architecture can achieve, so that those skilled in the art can understand the functions and beneficial effects that these structures and connections can achieve while understanding the structure and connection relationships of the microprocessor architecture provided in each embodiment. Furthermore, these functions are not improved solutions; the specific processing procedures for implementing these functions can be found in relevant processing procedures in the prior art.

[0102] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. The modules and sub-modules in the devices and terminals of the various embodiments of this application can be merged, divided, and deleted according to actual needs, and the features described in each embodiment can be replaced or combined.

[0103] In the embodiments provided in this application, it should be understood that the disclosed terminals and devices can be implemented in other ways. For example, the division of modules or sub-modules is merely a logical functional division; in actual implementation, there may be other division methods. For instance, multiple sub-modules or modules may be combined or integrated into another module, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, devices, or modules, and may be electrical, mechanical, or other forms.

[0104] The modules or submodules described as separate components may or may not be physically separate. The components that constitute a module or submodule may or may not be physical modules or submodules; that is, they may be located in one place or distributed across multiple network modules or submodules. Some or all of the modules or submodules can be selected to achieve the purpose of this embodiment's solution, depending on actual needs.

[0105] In addition, the functional modules or sub-modules in the various embodiments of this application can be integrated into one processing module, or each module or sub-module can exist physically separately, or two or more modules or sub-modules can be integrated into one module.

[0106] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that an article or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an article or device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device that includes said element.

[0107] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A microprocessor architecture, characterized by comprising: a central processing unit, a coprocessor, a memory, and a memory controller corresponding to the memory; the central processing unit is connected to the memory controller via a first communication link to perform read and write operations on the memory through the first communication link; the coprocessor is connected to the memory controller via a second communication link to perform read and write operations on the memory through the second communication link.

2. The microprocessor architecture of claim 1, wherein, The memory includes a first memory space and a second memory space; The central processing unit performs read and write operations on the memory through the first communication link, including: the central processing unit performs read and write operations on the first memory space through the first communication link; The coprocessor performs read and write operations on the memory through the second communication link, including: the coprocessor performs read and write operations on the second memory space through the second communication link.

3. The microprocessor architecture of claim 2, wherein, The first communication link and the second communication link have a connection path to enable the central processing unit to perform read and write operations on the first memory space and / or the second memory space through the first communication link and the second communication link, and / or enable the coprocessor to perform read and write operations on the first memory space and / or the second memory space through the second communication link and the first communication link.

4. The microprocessor architecture of claim 1, wherein, The memory includes a first memory space and a second memory space; The central processing unit performs read and write operations on the memory through the first communication link, including: the central processing unit performs read and write operations on the first memory space and the second memory space through the first communication link; The coprocessor performs read and write operations on the memory through the second communication link, including: the coprocessor performs read and write operations on the second memory space through the second communication link.

5. The microprocessor architecture of claim 4, wherein, The coprocessor is also connected to the first communication link to enable the coprocessor to perform read and write operations on the first memory space through the first communication link.

6. The microprocessor architecture of claim 5, wherein, A link switch is provided on the second communication link to control the connection and disconnection of the second communication link.

7. The microprocessor architecture of claim 5, wherein, The microprocessor architecture also includes a mapping module; The input terminal of the mapping module is connected to the coprocessor, and the output terminal of the mapping module is connected to the first communication link and the second communication link respectively. The mapping module is used to send the memory access request sent by the coprocessor to the first communication link or to the second communication link.

8. The microprocessor architecture of claim 7, wherein, The mapping module includes a configuration register, a space matching module, and a link routing switch; The configuration register is used to store the address information of the first memory space and the second memory space; The space matching module is used to receive the memory access request sent by the coprocessor, compare the destination address of the memory access request with the address information stored in the configuration register, and determine the target memory space corresponding to the memory access request. The link routing switch is used to route the memory access request to the first communication link or the second communication link connected to the memory controller of the target memory space, according to the target memory space corresponding to the memory access request.

9. The microprocessor architecture of any of claims 1 to 8, wherein, The first communication link is a consistent interconnect network or a non-consistent interconnect network, and the second communication link is a non-consistent interconnect network or a consistent interconnect network.

10. The microprocessor architecture of any one of claims 1 to 7, wherein, The coprocessor includes at least one of a graphics processor, a general-purpose computing graphics processor, a neural network processor, a deep learning processor, and a video processor.