Quad-core processor system built in quad-core structure and data switching method thereof

A multi-core processor and data exchange technology, applied in the field of machine execution devices and concurrent instruction execution, that addresses the high power consumption of existing designs and their failure to combine high performance with low power consumption. It improves data exchange efficiency and execution efficiency by exploiting parallelism.

Active Publication Date: 2014-04-23
SHANGHAI JIAO TONG UNIV
2 Cites 6 Cited by

AI-Extracted Technical Summary

Problems solved by technology

The problems of a homogeneous design are: as the number of cores continues to increase, how to keep the data of each core consistent; how to meet the storage access and input/output access requirements of the cores; Lower processors; how to balance the load and coordinate tasks across several processors, etc.
[0018] Although the cores on a quad-core chip each execute their own code, data sharing and synchronization between cores may be required, so the performance of the on-chip communication structure directly affects the performance of the processor.
[0024] A bottleneck of tra...

Method used

In sum, the present invention exploits the parallelism of the algorithm and improves its execution efficiency. A quad-core processor is built with a quad-core structure, and each microprocessor core uses a reduced instruction set architecture microprocessor as a prototype, with corresponding improvements: shared registers are introduced, configuration registers and configuration instructions are added, a left-shift operation is added to the arithmetic logic unit, and the position of branch instructions is modified. Two data exchange methods, shared registers and a multi-layer bus built between the microprocessor cores and the external data memory, establish the data paths between the cores of the quad-core processor, improve its performance when processing data in parallel, and improve the efficiency of data exchange.
Preferably, each microprocessor core also includes a configuration register, which configures the connection mode of the shared register file of the core it belongs to and thereby improves the flexibility of the structure. In addition, configuration instructions are added to the instruction set used by each microprocessor core to support configuration.
The pipeline control module is used to control the pipeline, that is, according to the input signal from the execution module, a corresponding pause signal is pr...

Abstract

The invention provides a quad-core processor system built in a quad-core structure and a data switching method thereof. The system processing data in a single-program-segment multi-data manner comprises four microprocessor cores; each microprocessor core comprises an instruction memory, an intra-core data memory, and a central processing unit; the instruction memory is used for storing an instruction; the intra-core data memory is used for storing data; the central processing unit is used for executing corresponding operations according to input instructions and data and updating a register file in the central processing unit and an external data memory. Execution efficiency of an algorithm is improved by making use of parallelism in the algorithm; in addition, inter-core data paths are established in a quad-core processor by two data switching manners, namely sharing a register and setting multiple layers of buses between the microprocessor cores and the external data memory, the performance of the quad-core processor in parallel data processing is improved, and data switching efficiency is improved.

Application Domain

Concurrent instruction execution

Technology Topic

Multi-core processor, Instruction memory, +11


Examples

  • Experimental program(2)

Example Embodiment

[0063] Example one
[0064] As shown in Figure 1, the present invention provides a quad-core processor system built with a quad-core structure, which processes data in single-program-segment, multiple-data mode; that is, all microprocessor cores execute the same program segment at the same time and process multi-dimensional data in parallel. The system includes 4 microprocessor cores with a reduced instruction set architecture, in which:
[0065] Each microprocessor core includes:
[0066] Instruction memory, used to store instructions;
[0067] In-core data memory for storing data;
[0068] The central processing unit, used to perform the corresponding operations according to the input instructions and data, and to update its internal register file and the external data memory.
[0069] Preferably, the central processing unit includes:
[0070] The instruction fetch module is used to fetch instructions from the instruction memory in the current cycle according to the current pointer value, and calculate the pointer value in the next cycle;
[0071] The decoding module is used to decode the instruction from the instruction fetching module and generate all the control signals required by the arithmetic logic unit, comparator and register file module;
[0072] Arithmetic logic unit, used for computation; it receives data from the register file and the in-core data memory, and sends write enable signals and the data to be written to the register file and the in-core data memory;
[0073] Comparator, used to receive the output of the register file and determine from it whether a jump is taken; if the jump is taken, the jump target address is calculated by the arithmetic logic unit and sent to the instruction fetch module;
[0074] The register file, used to receive data from the in-core data memory, the arithmetic logic unit, and the comparator; it can send data to the arithmetic logic unit and the comparator at the same time;
[0075] The pipeline control module, used to control the pipeline; that is, according to the input signal from the execution module, it provides the corresponding stall signals to the instruction fetch module, the decoding module, the arithmetic logic unit, and the comparator to keep the pipeline running smoothly.
[0076] Preferably, each register file includes a local register file and a shared register file, wherein,
[0077] The local register file is used for the closed operation of the data in the core. During the operation, there is no interaction with the data outside the core. The local microprocessor core has full read and write permissions to its local register file;
[0078] The shared register file is interconnected with the shared registers of the other microprocessor cores to realize data interaction between the cores. The local microprocessor core has read permission on its shared register file, while write permissions are allocated to the local microprocessor core or to other microprocessor cores according to application needs. Specifically, the modification to the register file in this embodiment is shown in Figure 2, and the internal structure of the shared register file is shown in Figure 3: the register file of the original Opfal processor is divided into two parts, the local register file and the shared register file.
[0079] Preferably, each local register file is divided into two groups, each with one read port and one write port. The two groups receive different read address signals and return the corresponding read values, but receive the same write address and data input signals, which keeps their contents consistent. Specifically, the local register file includes 16 registers of 32 bits each, numbered register 0 to register 15. It is used for operations on data that stay inside the core, with no interaction with data outside the core, and the local core has full read and write permissions on it. The shared register file includes 4 registers of 32 bits each, numbered register 16 to register 19 (also called shared register 0 to shared register 3). A dedicated interconnection between the shared register file and the shared registers of the other cores realizes data interaction between cores. The local core has read permission on the shared register file, and write permissions are allocated to the local core or to other cores according to application needs. The shared register file has two read ports and four write ports, and can accept write signals from at most four different cores.
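The register file organization described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented hardware: the class names, the `grant_write` method, and the choice that the local core initially holds write permission are all hypothetical. Only the register counts, the 32-bit width, the register numbering, and the mirrored two-group layout come from the text.

```python
class LocalRegisterFile:
    """16 x 32-bit local registers stored in two mirrored groups.

    Each group has one read port and one write port. Every write goes to
    both groups, so two different addresses can be read in the same cycle.
    """
    NUM_REGS = 16
    MASK = 0xFFFFFFFF  # registers are 32 bits wide

    def __init__(self):
        self.group_a = [0] * self.NUM_REGS
        self.group_b = [0] * self.NUM_REGS

    def write(self, addr, value):
        # The same write address and data reach both groups,
        # which keeps their contents identical.
        self.group_a[addr] = value & self.MASK
        self.group_b[addr] = value & self.MASK

    def read2(self, addr_a, addr_b):
        # Each group serves one of the two read addresses.
        return self.group_a[addr_a], self.group_b[addr_b]


class SharedRegisterFile:
    """4 x 32-bit shared registers, numbered 16 to 19."""
    BASE = 16
    NUM_REGS = 4
    MASK = 0xFFFFFFFF

    def __init__(self, owner_core):
        self.regs = [0] * self.NUM_REGS
        # Assumption: write permission starts with the local core.
        self.writer = [owner_core] * self.NUM_REGS

    def grant_write(self, reg_no, core_id):
        # Hypothetical stand-in for the effect of a configuration instruction.
        self.writer[reg_no - self.BASE] = core_id

    def write(self, reg_no, value, core_id):
        idx = reg_no - self.BASE
        if self.writer[idx] != core_id:
            return False  # write ignored: this core lacks write permission
        self.regs[idx] = value & self.MASK
        return True

    def read(self, reg_no):
        return self.regs[reg_no - self.BASE]
```

For example, after `grant_write(16, 1)` on core 0's shared register file, core 1 can write register 16 and core 0 can read the value back, which is the small-data exchange path the text describes.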
[0080] Preferably, each microprocessor core also includes a configuration register, which configures the connection mode of the shared register file of that microprocessor core and thereby improves the flexibility of the structure. In addition, configuration instructions are added to the instruction set used by each microprocessor core to support configuration.
[0081] Preferably, the data exchange paths between the 4 microprocessor cores are shown in Figure 4. Each microprocessor core exchanges data in the following two ways:
[0082] One way is for each microprocessor core to access external data memory through a multi-layer bus structure;
[0083] The other way is to exchange data between cores through the shared register file of each microprocessor core. Specifically, the shared register files establish direct data paths between the four cores, and the connection mode of each path can be defined flexibly through the configuration register, which allows small amounts of data to be exchanged between the cores.
[0084] Preferably, the multi-layer bus is a crossbar switch placed between the microprocessor cores and the external data memories. The four microprocessor cores select external data memories through different buses. If the selected external data memories are all different, the 4 microprocessor cores transfer data simultaneously; if the same external data memory is selected by more than one core, the core that transfers first is chosen according to the preset priority rule. More generally, the multi-layer bus places a crossbar switch between master devices and slave devices. Multiple master devices select slave devices through different buses. If the selected slave devices are all different, the master devices can transfer simultaneously; if the selected slave devices are the same, the master device that transfers first is chosen according to the priority rule specified in the design.
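The conflict behavior of the crossbar can be sketched as a one-cycle scheduling function. This is an illustrative model only: the function name `crossbar_cycle`, the dictionary encoding of requests, and the default rule of letting the lowest-numbered core win are assumptions, since the text says only that a preset priority rule decides.

```python
def crossbar_cycle(requests, pick_winner=min):
    """Schedule one cycle of core-to-memory requests.

    requests maps core_id -> memory_id for every core that issues a
    request this cycle. pick_winner chooses one core from the list of
    cores competing for the same memory (assumed default: lowest ID).

    Returns (grants, stalled): grants maps memory_id -> winning core_id,
    stalled lists the cores that must retry.
    """
    by_memory = {}
    for core, mem in requests.items():
        by_memory.setdefault(mem, []).append(core)

    # Each memory serves exactly one core per cycle.
    grants = {mem: pick_winner(cores) for mem, cores in by_memory.items()}

    # A core stalls when another core won the memory it asked for.
    stalled = sorted(c for c in requests if grants[requests[c]] != c)
    return grants, stalled
```

When all four cores target different memories, `stalled` is empty and the four transfers proceed in parallel; when cores 0 and 1 both target memory 2, only one of them is granted in that cycle.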
[0085] In detail, as shown in Figure 1, the structure of the multi-layer bus can include an input module 11, a decoder module 12, an arbiter module 13, and a steering module 14, which together connect 4 master devices (microprocessor cores) to 4 slave devices (external data memories), where:
[0086] The input module 11 latches the read/write control and data signals from the microprocessor core. It takes the upper two bits of the address signal as the selection signal for the external data memory; it takes the lower 12 bits of the address signal, shifts them right by 2 bits, and outputs the result as a new address signal that matches the address input port of the external data memory. It outputs the write data and write enable signals at the same time;
[0087] The decoder module 12 receives the external data memory selection signal from the input module, determines which external data memory the microprocessor core's read or write operation targets, and sets the corresponding selection output to 1. In addition, it receives the read data of the 4 external data memories and, according to the decoded selection signal, selects the correct read data and sends it to the microprocessor core;
[0088] The arbiter module 13 arbitrates access to the bus. When multiple master modules (microprocessor cores) simultaneously request the shared bus for data communication, the arbitration algorithm allocates the bus resources and determines which requester may use them. Arbitration algorithms include polling, fixed priority, time division multiplexing, lottery, random contention, etc. In this design, polling is chosen as the arbitration scheme because the algorithm is simple and its cost is relatively low. The module receives three channels (0, 1, and 2) of read/write data, control signals, and external data memory selection signals, arbitrates among them according to the polling rule, selects one set of read/write control signals and outputs it to the corresponding external data memory, and returns hold signals to the other requesters to inform them that their bus request failed.
[0089] The steering module 14 selects either the through control and data signal group or the arbitrated control and data signal group for output to the external data memory, according to the through selection signal. If the through selection signal is 1, the through signal group is output; otherwise, the arbitrated signal group is output.
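The input, arbiter, and steering behavior described above can be sketched in software as follows. This is a behavioral illustration, not the actual circuit. The 14-bit address width (`addr_width`), the reading of the 2-bit right shift as a byte-to-word address conversion, the starting point of the polling order, and all function and class names are assumptions; the text specifies only the upper-two-bit select, the lower-12-bit shift, polling arbitration over three channels, and the through-select rule.

```python
def split_address(addr, addr_width=14):
    """Input module: split an address into (memory_select, memory_address).

    addr_width is an assumed total address width; the upper two bits
    select one of the 4 external data memories, and the lower 12 bits
    are shifted right by 2 to form the address sent to that memory.
    """
    if not 0 <= addr < (1 << addr_width):
        raise ValueError("address out of range")
    select = addr >> (addr_width - 2)
    memory_address = (addr & 0xFFF) >> 2
    return select, memory_address


class RoundRobinArbiter:
    """Arbiter module: polling (round-robin) over the competing channels."""

    def __init__(self, num_requesters=3):
        self.n = num_requesters
        # Assumption: channel 0 has first priority after reset.
        self.last = self.n - 1

    def arbitrate(self, requests):
        """requests is a list of booleans, one per channel.

        Returns (granted, hold): the granted channel index or None, and
        a list marking the requesting channels that must hold and retry.
        """
        if len(requests) != self.n:
            raise ValueError("unexpected number of request lines")
        granted = None
        # Search starts just after the most recently granted channel.
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                granted = candidate
                self.last = candidate
                break
        hold = [req and i != granted for i, req in enumerate(requests)]
        return granted, hold


def steer(through_select, through_group, arbitrated_group):
    """Steering module: forward the through group when the select is 1."""
    return through_group if through_select == 1 else arbitrated_group
```

With two channels requesting continuously, successive calls to `arbitrate` alternate the grant between them, so neither channel is starved; that fairness is the usual reason to prefer polling over fixed priority.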
[0090] Preferably, the instruction set used by each microprocessor core includes arithmetic operation instructions, logic operation instructions, branch instructions, and access instructions.
[0091] For other details of the first embodiment, please refer to the corresponding part of the first embodiment, which will not be repeated here.
[0092] This embodiment exploits the parallelism of the algorithm and improves its execution efficiency. A quad-core structure is used to build a quad-core processor, and each microprocessor core uses a reduced instruction set architecture microprocessor as a prototype. The corresponding improvements include the introduction of shared registers, the addition of configuration registers and configuration instructions, the addition of a left-shift operation in the arithmetic logic unit, and the modification of the position of branch instructions. Two data exchange methods, shared registers and a multi-layer bus built between the microprocessor cores and the external data memory, establish the data paths between the cores of the quad-core processor, improve its performance when processing data in parallel, and increase the efficiency of data exchange.

Example Embodiment

[0093] Example two
[0094] As shown in Figure 5, the present invention also provides a data exchange method that uses the quad-core processor system described in Embodiment 1. The method includes:
[0095] Step S1: initialize the configuration register of each microprocessor core according to the parallel code of the specific application; that is, write configuration information into the configuration registers inside the four cores through configuration instructions;
[0096] Step S2: exchange data between the external data memory and the microprocessor cores. The initial exchange is the external data memory writing data to the register files of the microprocessor cores; during operation, data may then be exchanged repeatedly between the external data memory and the cores;
[0097] Step S3: exchange data between cores through the shared register file of each microprocessor core. The shared registers between the microprocessor cores provide a second path for data exchange during core operation; the algorithm should be analyzed so that this path is used as much as possible, which improves operating efficiency. Depending on the application, step S2 and step S3 may be repeated.
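Steps S1 to S3 can be walked through with a toy simulation. It is a sketch under assumptions rather than the method as claimed: the ring-shaped configuration (core i writes the shared register of core i+1), the one-word-per-core memory layout, the addition performed after the exchange, and the name `run_exchange` are all hypothetical and chosen only to make the three steps concrete.

```python
NUM_CORES = 4


def run_exchange(external_memory):
    """Simulate one pass of steps S1 to S3 for four cores.

    external_memory is a list of four lists; core i reads word 0 of
    external_memory[i]. Returns, per core, the sum of its own operand
    and the operand received through its shared register.
    """
    # S1: configuration. Assumed ring: core i may write the shared
    # register of core (i + 1) % 4.
    write_target = {core: (core + 1) % NUM_CORES for core in range(NUM_CORES)}

    # S2: initial load from external data memory into each core's
    # register file (cores target different memories, so no conflicts).
    local = [external_memory[core][0] for core in range(NUM_CORES)]

    # S3: inter-core exchange through the shared registers.
    shared = [0] * NUM_CORES
    for core in range(NUM_CORES):
        shared[write_target[core]] = local[core]

    # Each core combines its own operand with the one it received.
    return [local[core] + shared[core] for core in range(NUM_CORES)]
```

Calling `run_exchange([[1], [2], [3], [4]])` returns `[5, 3, 5, 7]`: each core adds the value passed on by its ring predecessor without touching external memory a second time.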
[0098] To sum up, the present invention exploits the parallelism of the algorithm and improves its execution efficiency. A quad-core structure is used to build a quad-core processor, and each microprocessor core uses a reduced instruction set architecture microprocessor as a prototype, with corresponding improvements: shared registers are introduced, configuration registers and configuration instructions are added, a left-shift operation is added to the arithmetic logic unit, and the position of branch instructions is modified. Two data exchange methods, shared registers and a multi-layer bus built between the microprocessor cores and the external data memory, establish the data paths between the cores of the quad-core processor, improve its performance when processing data in parallel, and increase the efficiency of data exchange.
