A data processing method and apparatus

By directly managing memory access through the processor core, the problem of low resource utilization in many-core processors is solved, and more efficient memory resource utilization is achieved.

CN122309119A · Pending · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2024-12-31
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In many-core processors, the time required for operating system management results in low resource utilization of processor cores, making it impossible to fully utilize computing resources.

Method used

Memory access is managed directly by a processor core rather than by a third-party operating system. Physical addresses are used for memory access management, so the managing core explicitly controls the memory access of the other processor cores.

Benefits of technology

It improves the resource utilization of many-core processors, saves memory access management time, and achieves more efficient memory resource utilization.



Abstract

This application provides a data processing method and apparatus, relating to the field of communications, to improve the resource utilization of many-core processors. In this method, a first device obtains a first instruction from a first processor core, the first instruction indicating a switch to a first mode in which the first processor core manages memory access; the first device switches to the first mode; the first device obtains first information configured by the first processor core, the first information including the physical address of a first memory, the physical address of the first memory being a local address; the first device accesses the first memory according to the first information.

Description

Technical Field

[0001] This application relates to the field of communications, and more particularly to a data processing method and apparatus.

Background Technology

[0002] As chip manufacturing processes have evolved, more and more transistors can be accommodated in the same size chip. Therefore, general-purpose processors have introduced more and more processor cores to increase processing power. For example, there are now processors with more than 100 central processing unit (CPU) cores, which can be configured in servers. Processors containing many simpler, independent processor cores (ranging from dozens to thousands or more) are called many-core processors. Many-core processors are a special type of multi-core processor designed for highly parallel processing and are widely used in embedded computers and high-performance computing.

[0003] Many-core processors typically contain at least dozens of processor cores, all managed by an operating system (OS). Synchronization between processor cores and shared memory access require OS intervention. Because OS management takes time, the computing power and other resources of the processor cores cannot be utilized during this period, resulting in low resource utilization for many-core processors.

[0004] Therefore, how to improve the resource utilization of many-core processors is an urgent technical problem to be solved.

Summary of the Invention

[0005] This application provides a data processing method and apparatus in which one processor core manages the memory access of other processor cores in a many-core processor. The managed cores can access memory directly without relying on a third-party OS, which saves the time spent on memory access management, shortens the period during which the computing power and other resources of the processor cores cannot be utilized, and improves the resource utilization of the many-core processor. In addition, this application can explicitly control each processor core's access to memory, which makes more efficient use of memory resources and improves the memory resource utilization of the many-core processor.

[0006] In a first aspect, this application provides a data processing method. The method is executed by a first device, or by a component of the first device (e.g., a processor, chip, or chip system), or by a logic module or software capable of implementing all or part of the functions of the first device. In the first aspect and its possible implementations, the method is described as being executed by the first device as an example. The first device obtains a first instruction from a first processor core, the first instruction indicating a switch to a first mode in which the first processor core manages memory access; the first device switches to the first mode; the first device obtains first information configured by the first processor core, the first information including the physical address of a first memory, the physical address of the first memory being a local address; the first device accesses the first memory according to the first information.

[0007] In the first aspect, firstly, the first processor core can manage memory access of the first device. For example, the first device can be configured with a many-core processor, and the first processor core can be a processor core in that many-core processor. The first processor core can manage memory access of other processor cores in the first device. In the first aspect, there is no need to rely on a third-party OS system to manage memory access; instead, memory access is managed directly through the processor core, saving the time spent managing memory access and thus shortening the time when the processor core's computing power and other resources are unusable, improving the resource utilization rate of the many-core processor. Furthermore, the first processor core is configured with first information, which includes the physical address of the first memory. That is, in the first aspect, the physical address of the memory is visible to the processor cores in the first device. The first processor core can explicitly control memory access by other processor cores, enabling more efficient use of memory resources and improving the utilization rate of memory resources by the many-core processor. From these two aspects, it can be seen that this application can improve the resource utilization rate of the many-core processor.
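The four steps of the first aspect can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class names, the `first_memory_size` field, and the dictionary standing in for local memory are hypothetical and do not come from the application, which describes hardware behavior rather than a software API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    SECOND = "second"  # memory addresses are global
    FIRST = "first"    # memory access is managed by the first processor core


@dataclass(frozen=True)
class FirstInformation:
    """Hypothetical container for the 'first information'."""
    first_memory_base: int  # local physical address of the first memory
    first_memory_size: int  # assumed field, used only for bounds checking


class ManagedCore:
    """Toy model of the entity that executes the first-aspect method."""

    def __init__(self) -> None:
        self.mode = Mode.SECOND
        self.info: Optional[FirstInformation] = None
        self._local_memory: Dict[int, int] = {}

    def on_first_instruction(self) -> None:
        # Steps 1 and 2: obtain the first instruction, switch to the first mode.
        self.mode = Mode.FIRST

    def on_first_information(self, info: FirstInformation) -> None:
        # Step 3: obtain the first information configured by the managing core.
        if self.mode is not Mode.FIRST:
            raise RuntimeError("first information is only accepted in the first mode")
        self.info = info

    def _check(self, addr: int) -> None:
        if self.info is None:
            raise RuntimeError("no first information has been configured")
        base = self.info.first_memory_base
        if not base <= addr < base + self.info.first_memory_size:
            raise ValueError(f"address {addr:#x} is outside the first memory")

    def store(self, addr: int, value: int) -> None:
        # Step 4: access the first memory by its local physical address.
        self._check(addr)
        self._local_memory[addr] = value

    def load(self, addr: int) -> int:
        self._check(addr)
        return self._local_memory.get(addr, 0)
```

In this model no operating system call sits between the managed core and its memory: once the first information has arrived, `store` and `load` work directly on the physical address range that the managing core handed over.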

[0008] Optionally, the first device in this application can be a server, computer, or other similar device. For example, the first device can be an embedded computer or a high-performance computer. This first device is equipped with a many-core processor.

[0009] Optionally, the first processor core may be a processor core in the first device.

[0010] Optionally, the first device may also include other processor cores besides the first processor core. In the following description, the processor core in the first device that switches to the first mode is referred to as the second processor core.

[0011] It should be noted that the first aspect is described using the first device as the execution subject as an example. More specifically, method 100 and its related implementation can be executed by the second processor core.

[0012] In one alternative implementation of the first aspect, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0013] Based on the above implementation, the physical addresses of the L1 cache, L2 cache, and L3 cache in this application are all visible to the processor cores in the first device. The processor cores in the first device can explicitly use the L1 cache, L2 cache, and L3 cache, which avoids resource usage restrictions such as those caused by the opacity of cache resources.

[0014] In one alternative implementation of the first aspect, the physical address of the first-level cache and the physical address of the second-level cache correspond to local memory spaces, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0015] Based on the above implementation, the first device can directly access the local memory space through the physical addresses of the first-level cache and the second-level cache. That is, in the first aspect, the physical addresses of the first-level cache and the second-level cache are locally visible, while the physical address of the third-level cache is a clustered address. This memory address setting in the first aspect can save the overhead of the cache coherence protocol.
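The addressing scheme described above can be sketched as a small resolver. This is an illustrative model only: the address windows, the class name `ClusterAddressMap`, and the merging of L1 and L2 into one core-local window are assumptions made for brevity, not details from the application.

```python
from typing import Dict, Tuple


class ClusterAddressMap:
    """Toy model: L1/L2 addresses are core-local, L3 addresses are per cluster."""

    # Hypothetical address windows, chosen only for illustration.
    L1_L2_WINDOW = range(0x0000, 0x4000)
    L3_WINDOW = range(0x4000, 0x10000)

    def __init__(self, cluster_of: Dict[int, int]) -> None:
        self.cluster_of = dict(cluster_of)  # core id -> cluster id
        self._cells: Dict[Tuple[str, int, int], int] = {}

    def _resolve(self, core_id: int, addr: int) -> Tuple[str, int, int]:
        if addr in self.L1_L2_WINDOW:
            # The same numeric address names different storage on each core.
            return ("core", core_id, addr)
        if addr in self.L3_WINDOW:
            # The same numeric address names one location per cluster.
            return ("cluster", self.cluster_of[core_id], addr)
        raise ValueError(f"address {addr:#x} is not mapped")

    def write(self, core_id: int, addr: int, value: int) -> None:
        self._cells[self._resolve(core_id, addr)] = value

    def read(self, core_id: int, addr: int) -> int:
        return self._cells.get(self._resolve(core_id, addr), 0)
```

Because an L1 or L2 address never refers to another core's storage in this model, no other core can hold a stale copy of it, which is one way to read the statement that the scheme saves cache coherence overhead.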

[0016] In one alternative implementation of the first aspect, the first device performs an atomic operation based on the physical address of the third-level cache to synchronize task information.

[0017] In the above implementation, the processor cores of the first device can achieve inter-core synchronization, which can realize the synchronization of task information more efficiently.
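As a rough illustration of inter-core synchronization through an atomic operation on a shared L3 address, the sketch below lets several worker threads claim task indices from one shared counter. The application does not name the atomic operation; fetch-and-add is an assumption here, threads stand in for processor cores, and a `threading.Lock` stands in for hardware atomicity.

```python
import threading
from typing import List


class L3Word:
    """Emulates one word at a cluster-shared L3 physical address."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # stands in for a hardware atomic

    def fetch_add(self, delta: int) -> int:
        # Atomically return the old value and add delta to the word.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old


def run_cluster(num_cores: int, num_tasks: int) -> List[List[int]]:
    """Each simulated core claims task indices until none are left."""
    next_task = L3Word(0)
    claimed: List[List[int]] = [[] for _ in range(num_cores)]

    def core_main(core_id: int) -> None:
        while True:
            task = next_task.fetch_add(1)
            if task >= num_tasks:
                return
            claimed[core_id].append(task)

    threads = [threading.Thread(target=core_main, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Every task index is handed out exactly once regardless of how the threads interleave, which is the property the cores need in order to agree on task ownership without OS intervention.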

[0018] In one optional implementation of the first aspect, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0019] In the above implementation, the "identifier of the processor core that switched to the first mode" in the first information can notify the first device which processor cores should enter the first mode. The "identifier of the processor cores in the same cluster" can help the first device to perform clustering. The information such as the capacity and latency of the first-level cache, second-level cache, and third-level cache in the first information can help the processor cores in the first device determine the appropriate cache space and improve the caching performance of the data in the processor cores of the first device.
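One possible way to represent the first information and use it to pick a cache level is sketched below. The field names, the use of bytes and cycles as units, and the selection rule (lowest latency among the levels that fit) are assumptions for illustration; the application lists the kinds of information but does not prescribe a layout or a policy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CacheLevel:
    name: str
    capacity_bytes: int
    latency_cycles: int


@dataclass(frozen=True)
class FirstInformation:
    """Hypothetical layout of the optional fields of the first information."""
    first_mode_core_ids: Tuple[int, ...]  # cores switched to the first mode
    cluster_core_ids: Tuple[int, ...]     # cores in the same cluster
    l3_sharing_core_ids: Tuple[int, ...]  # cores sharing the L3 cache
    l1: CacheLevel
    l2: CacheLevel
    l3: CacheLevel


def pick_cache_level(info: FirstInformation, working_set_bytes: int) -> Optional[CacheLevel]:
    """Return the lowest-latency level that can hold the working set, if any."""
    candidates = [
        level
        for level in (info.l1, info.l2, info.l3)
        if level.capacity_bytes >= working_set_bytes
    ]
    return min(candidates, key=lambda level: level.latency_cycles, default=None)
```

A core that knows these figures can place a small, hot buffer in the L1 cache and fall back to the L2 or L3 cache only when the buffer no longer fits.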

[0020] In one alternative implementation of the first aspect, the first device sends second information to the first processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0021] In the above implementation, the information used to indicate whether the first mode is enabled can inform the first processor core that some processor cores in the first device have the ability to enter the first mode. The physical topology information can help the first processor core to better manage memory access.

[0022] In one alternative implementation of the first aspect, the first device obtains a second instruction from the first processor core, the second instruction indicating a first code segment, the first code segment instructing an exit from the first mode and entry into a second mode, the physical address of memory in the second mode being a global address; the first device executes the first code segment.

[0023] The above implementation provides an exit mechanism for the first mode, whereby the first processor core, via the second instruction, controls the processor core in the first device to exit the first mode and enter the second mode, which is the normal mode. In the first aspect, the first processor core can control the mode selection of the first device without relying on a third-party OS, allowing more direct and efficient management of the processor core's memory access.

[0024] Optionally, the first device can switch from the second mode to the first mode. The difference between the second mode and the first mode is that the memory access address is globally visible in the second mode, while the memory access address is locally visible in the first mode.

[0025] Optionally, after the second processor core of the first device enters the first mode, the second processor core will enter a waitfor_setup state, that is, wait for the first processor core to build the accelerator hardware and software environment. In the first mode, the first device will flush the cache of each level of the second processor core to system memory.
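The mode transitions described in paragraphs [0022] to [0025] can be sketched as a small state machine. Several points are assumptions made only for this illustration: the flush is modeled as happening on entry to the first mode, the `setup_done` notification and the `FIRST_MODE_READY` state name are hypothetical, and the first code segment is modeled as a Python callable.

```python
from enum import Enum, auto
from typing import Callable, Dict


class State(Enum):
    SECOND_MODE = auto()       # memory addresses are global
    WAITFOR_SETUP = auto()     # first mode entered, environment not built yet
    FIRST_MODE_READY = auto()  # first mode, environment built (assumed name)


class SecondCore:
    """Toy model of a core moving between the second mode and the first mode."""

    def __init__(self) -> None:
        self.state = State.SECOND_MODE
        self.cached_lines: Dict[int, int] = {}   # address -> value in the caches
        self.system_memory: Dict[int, int] = {}

    def _flush_caches(self) -> None:
        # Write every cached line back to system memory, then empty the caches.
        self.system_memory.update(self.cached_lines)
        self.cached_lines.clear()

    def first_instruction(self) -> None:
        if self.state is not State.SECOND_MODE:
            raise RuntimeError("already in the first mode")
        self._flush_caches()
        self.state = State.WAITFOR_SETUP

    def setup_done(self) -> None:
        # Hypothetical notification that the managing core finished the setup.
        if self.state is not State.WAITFOR_SETUP:
            raise RuntimeError("not waiting for setup")
        self.state = State.FIRST_MODE_READY

    def second_instruction(self, first_code_segment: Callable[["SecondCore"], None]) -> None:
        # The second instruction indicates a code segment; the core runs it.
        if self.state is State.SECOND_MODE:
            raise RuntimeError("not in the first mode")
        first_code_segment(self)


def exit_first_mode(core: SecondCore) -> None:
    """Example first code segment: leave the first mode for the second mode."""
    core.state = State.SECOND_MODE
```

A typical sequence in this model is `first_instruction()`, then `setup_done()`, then `second_instruction(exit_first_mode)`, which returns the core to the second mode.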

[0026] The data processing method of the first aspect is executed by a first device, or by a component of the first device (e.g., a processor, chip, or chip system), or by a logic module or software capable of implementing all or part of the functions of the first device. In the first aspect and its possible implementations, the data processing method is described as being executed by the first device as an example: the first device obtains a first instruction from a first processor core, the first instruction indicating a switch to a first mode in which the first processor core manages memory access; the first device switches to the first mode; the first device obtains first information configured by the first processor core, the first information including the physical address of the first memory, the physical address of the first memory being a local address; the first device accesses the first memory according to the first information.

[0027] In the first aspect, firstly, the first processor core can manage memory access of the first device. For example, the first device can be configured with a many-core processor, and the first processor core can be a processor core in that many-core processor. The first processor core can manage memory access of other processor cores in the first device. In the first aspect, there is no need to rely on a third-party OS system to manage memory access; instead, memory access is managed directly through the processor core, saving the time spent managing memory access and thus shortening the time when the processor core's computing power and other resources are unusable, improving the resource utilization rate of the many-core processor. Furthermore, the first processor core is configured with first information, which includes the physical address of the first memory. That is, in the first aspect, the physical address of the memory is visible to the processor cores in the first device. The first processor core can explicitly control memory access by other processor cores, enabling more efficient use of memory resources and improving the utilization rate of memory resources by the many-core processor. From these two aspects, it can be seen that this application can improve the resource utilization rate of the many-core processor.

[0028] A second aspect of this application provides a data processing method, which is executed by a second device, or by some components of the second device (e.g., a processor, chip, or chip system), or by a logic module or software capable of implementing all or part of the functions of the second device. In this second aspect and its possible implementations, the data processing method is described as being executed by a second device. The second device sends a first instruction to a second processor core, which instructs the second processor core to switch to a first mode, in which the second device manages access to the memory of the second processor core; the second device configures first information, which includes the physical address of a first memory location, wherein the physical address of the first memory location is a local address; the second device sends the first information to the second processor core.

[0029] In the second aspect, firstly, the second device can manage memory access of the second processor core. In this second aspect, memory access management does not rely on a third-party OS; instead, memory access is managed directly through the processor core, saving time spent on memory access management and thus reducing the time when the processor core's computing power and other resources are unusable, improving the resource utilization of the many-core processor. Furthermore, the second device configures first information, which includes the physical address of the first memory. That is, in this second aspect, the physical address of the memory is visible to the processor cores in the second device. The second device can explicitly control memory access by other processor cores, enabling more efficient use of memory resources and improving the utilization rate of memory resources by the many-core processor. From these two aspects, it is clear that this application can improve the resource utilization rate of the many-core processor.
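Seen from the managing side, the second aspect amounts to two messages sent in a fixed order. The sketch below models that order with an in-memory outbox. The message classes, the `take_over` method, and the use of a `deque` as a transport are all hypothetical; the application does not describe how the instruction and the first information are delivered.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Union


@dataclass(frozen=True)
class FirstInstruction:
    target_core: int  # core asked to switch to the first mode


@dataclass(frozen=True)
class FirstInformation:
    target_core: int
    first_memory_base: int  # local physical address of the first memory


Message = Union[FirstInstruction, FirstInformation]


class ManagingDevice:
    """Toy model of the second device that manages other cores' memory access."""

    def __init__(self, local_base: int) -> None:
        self.local_base = local_base
        self.outbox: Deque[Message] = deque()

    def take_over(self, core_id: int) -> FirstInformation:
        # 1. Send the first instruction so the core switches to the first mode.
        self.outbox.append(FirstInstruction(core_id))
        # 2. Configure the first information for that core.
        info = FirstInformation(core_id, self.local_base)
        # 3. Send the first information to the core.
        self.outbox.append(info)
        return info
```

Because the address is local, this model hands the same numeric base address to every managed core; each core interprets it against its own local memory.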

[0030] In one alternative implementation of the second aspect, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0031] Based on the above implementation, the physical addresses of the L1 cache, L2 cache, and L3 cache in this application are all visible to the second processor core. The second processor core can explicitly use the L1 cache, L2 cache, and L3 cache, which avoids resource usage restrictions such as those caused by the opacity of cache resources.

[0032] In one alternative implementation of the second aspect, the physical address of the first-level cache and the physical address of the second-level cache correspond to local memory spaces, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0033] Based on the above implementation, the second processor core can directly access its local memory space through the physical addresses of the L1 and L2 caches. That is, in the second aspect, the physical addresses of the L1 and L2 caches are locally visible, while the physical address of the L3 cache is a clustered address. This memory address configuration in the second aspect can save on the overhead of the cache coherence protocol.

[0034] In one alternative implementation of the second aspect, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0035] In the above implementation, the "identifier of the processor core that switched to the first mode" in the first information can notify the second processor core which processor cores should enter the first mode. The "identifier of the processor cores in the same cluster" can help the second processor core to perform clustering. The information such as the capacity and latency of the first-level cache, second-level cache, and third-level cache in the first information can help the second processor core determine the appropriate cache space and improve its data caching performance.

[0036] In one alternative implementation of the second aspect, the second device receives second information sent by the second processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0037] In the above implementation, the information used to indicate whether the first mode is enabled can inform the second device that the second processor core has the ability to enter the first mode. The physical topology information can help the second device better manage memory access.

[0038] In an alternative implementation of the second aspect, the second device sends a second instruction, the second instruction indicating a first code segment, the first code segment instructing an exit from the first mode and entry into a second mode, wherein the physical address of memory in the second mode is a global address.

[0039] The above implementation provides an exit mechanism for the first mode, whereby the second device, via the second instruction, controls the second processor core to exit the first mode and enter the second mode, which is the normal mode. In the second aspect, the second device can control the mode selection of the second processor core without relying on a third-party OS, allowing more direct and efficient management of the processor core's memory access.

[0040] A third aspect of this application provides a communication device including a transceiver unit and a processing unit for performing all or part of the operations described in the first or second aspect. The communication device may be a server, or a component within a server for performing related operations, such as a line card or interface board, or a chip system for performing related operations, wherein the chip system may include one or more chips. When the communication device is a chip system, the transceiver unit may be, for example, the interface circuit of the chip, and the processing unit may be, for example, the processing circuit of the chip.

[0041] For example, when the communication device is the first device in the first aspect, the transceiver unit is configured to acquire a first instruction from the first processor core, the first instruction being used to instruct switching to a first mode, in which the first processor core manages memory access; the processing unit is configured to switch to the first mode; the transceiver unit is further configured to acquire first information configured by the first processor core, the first information including the physical address of the first memory, the physical address of the first memory being a local address; the processing unit is further configured to access the first memory according to the first information.

[0042] In one alternative implementation of the third aspect, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0043] In one alternative implementation of the third aspect, the physical address of the first-level cache and the physical address of the second-level cache correspond to the local memory space, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0044] In an alternative implementation of the third aspect, the processing unit is further configured to: perform atomic operations based on the physical address of the third-level cache to synchronize task information.

[0045] In an optional implementation of the third aspect, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0046] In an alternative implementation of the third aspect, the transceiver unit is further configured to: send second information to the first processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0047] In an optional implementation of the third aspect, the transceiver unit is further configured to: obtain a second instruction from the first processor core, the second instruction indicating a first code segment, the first code segment instructing an exit from the first mode and entry into a second mode, the physical address of memory in the second mode being a global address; and the processing unit is further configured to run the first code segment.

[0048] For example, when the communication device is the second device in the second aspect, the transceiver unit is used to send a first instruction to the second processor core, the first instruction being used to instruct the second processor core to switch to a first mode, in which the second device manages access to the memory of the second processor core; the processing unit is used to configure first information, the first information including the physical address of the first memory, the physical address of the first memory being a local address; the transceiver unit is also used to send the first information to the second processor core.

[0049] In one alternative implementation of the third aspect, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0050] In one alternative implementation of the third aspect, the physical address of the first-level cache and the physical address of the second-level cache correspond to the local memory space, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0051] In an optional implementation of the third aspect, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0052] In an alternative implementation of the third aspect, the transceiver unit is further configured to: receive second information sent by the second processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0053] In an alternative implementation of the third aspect, the transceiver unit is further configured to: send a second instruction, the second instruction indicating a first code segment, the first code segment instructing an exit from the first mode and entry into a second mode, wherein the physical address of memory in the second mode is a global address.

[0054] In a fourth aspect, embodiments of this application provide a communication device, including: a processor coupled to a memory for storing instructions, wherein when the instructions are executed by the processor, the processor implements the method described in the first aspect or any possible implementation of the first aspect, or the processor implements the method described in the second aspect or any possible implementation of the second aspect.

[0055] In a fifth aspect, embodiments of this application provide a computer-readable storage medium having instructions stored thereon, which, when executed, cause a computer to perform the method described in the first aspect or any possible implementation of the first aspect, or cause a computer to perform the method described in the second aspect or any possible implementation of the second aspect.

[0056] In a sixth aspect, embodiments of this application provide a computer program product, which includes computer program code. When the computer program code is run on a computer, it causes the computer to perform the method described in the first aspect or any possible implementation of the first aspect, or causes the computer to perform the method described in the second aspect or any possible implementation of the second aspect.

[0057] In a seventh aspect, embodiments of this application provide a chip, including: a processor coupled to a memory for storing instructions, wherein when the instructions are executed by the processor, the chip is caused to implement the method described in the first aspect or any possible implementation of the first aspect, or the chip is caused to implement the method described in the second aspect or any possible implementation of the second aspect.

[0058] For the technical effects of any implementation of the third to seventh aspects, refer to the technical effects of the first or second aspect and their implementations described above; they are not repeated here.

Attached Figure Description

[0059] Figure 1 is a system architecture diagram provided in an embodiment of this application;

[0060] Figure 2 is another system architecture diagram provided in an embodiment of this application;

[0061] Figure 3 is a flowchart of method 100;

[0062] Figure 4 is a schematic diagram of the architecture of a many-core processor provided in an embodiment of this application;

[0063] Figure 5 is a schematic diagram of the structure of the second processor core provided in an embodiment of this application;

[0064] Figure 6 is a schematic diagram of the accelerator core memory access path provided in an embodiment of this application;

[0065] Figure 7 is a schematic diagram of a memory access interaction path provided in an embodiment of this application;

[0066] Figure 8 is a schematic diagram of a global lock and a cluster lock provided in an embodiment of this application;

[0067] Figure 9 is a schematic diagram of one embodiment provided in this application;

[0068] Figure 10 is a schematic diagram of a communication device provided in an embodiment of this application;

[0069] Figure 11 is another structural schematic diagram of a communication device provided in an embodiment of this application.

Detailed Implementation

[0070] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.

[0071] References to "one embodiment" or "some embodiments" in this application mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Different embodiments in this application can be reasonably combined to a certain extent. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized.

[0072] The terms "system" and "network" in the embodiments of this application are used interchangeably. "Including" means "including but not limited to." When A includes multiple elements or situations, A can be one or more of those elements or situations. For example, if A includes B or C, then A can be B, A can be C, and A can also be B and C. "At least one" means one or more, and "multiple" means two or more. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, "at least one of A, B, and C" includes A, B, C, AB, AC, BC, or ABC. Furthermore, unless otherwise specified, the ordinal numbers such as "first" and "second" mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the order, sequence, priority or importance of multiple objects.
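The reading of "at least one of A, B, and C" given above corresponds to enumerating every non-empty combination of the listed items. A minimal sketch, with the helper name `at_least_one_of` chosen only for this illustration:

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """List every non-empty combination of items, smallest combinations first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(at_least_one_of(["A", "B", "C"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```

For n items this yields 2^n - 1 combinations, which is 7 for the three-item example in the text.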

[0073] It is understood that in this application, "instruction" can include direct instruction, indirect instruction, explicit instruction, and implicit instruction. When a piece of instruction information is described as indicating A, it can be understood that the instruction information carries A, directly indicates A, or indirectly indicates A.

[0074] The following explanations are provided for some of the terms used in the embodiments of this application.

[0075] 1. Many-core processor

[0076] A many-core processor is a computing device that integrates a large number of processing cores on a single chip. These processors are designed to improve performance through parallelization, making them particularly suitable for highly parallel tasks. Many-core processors typically have dozens or even hundreds of cores, giving them powerful concurrent execution capabilities. Each core can independently run different programs or different parts of the same program. To effectively manage and schedule the communication needs between so many cores, the concept of on-chip networking is introduced in many-core architectures. This is an interconnection method based on message passing, which allows data packets to be transmitted between network nodes, thereby achieving efficient data exchange and resource sharing. Since power consumption is one of the important factors restricting the development of high-performance computers, many-core processors often adopt a low-voltage, high-frequency operating mode and have the ability to dynamically adjust their operating state to reduce energy consumption. In addition, they may be equipped with a dedicated energy management unit for global energy efficiency control.

[0077] Many-core processors can be applied in high-performance computing fields, such as scientific simulations and weather forecasting. High-performance computing relies on a large amount of floating-point computing resources, and many-core processors provide the hardware foundation required for such intensive tasks. For example, supercomputers often use GPUs as coprocessors to accelerate specific types of workloads. Many-core processors can also be applied to data centers and servers. With the development of internet services, data centers face increasingly higher request throughput requirements. Many-core CPU / GPU combinations can significantly reduce the number of servers without affecting service quality, thereby saving costs and improving environmental impact. Many-core processors can also be applied to graphics and image processing and machine learning. Graphics rendering requires the rapid completion of complex geometric transformations and pixel shading operations, which are precisely where many-core processors excel. Similarly, training neural network models involves highly parallel operations such as matrix multiplication, so using a many-core platform can greatly shorten the iteration cycle.

[0078] 2. Cache

[0079] Caches are typically divided into three levels: Level 1 Cache (L1 Cache), Level 2 Cache (L2 Cache), and Level 3 Cache (L3 Cache). These three levels of cache are designed to optimize data access speed and processor performance through hierarchical storage. Their main functions and characteristics are as follows:

[0081] ①L1 Cache

[0082] Location: Directly integrated within the processor core.

[0083] Size: Usually small (typically 16KB to 128KB).

[0084] Speed: Fastest speed, lowest latency.

[0085] Function: Stores the most frequently used data and instructions to improve processor access speed.

[0086] ②L2 Cache

[0087] Location: Usually also within the processor; it may be private to one core or shared among several cores.

[0088] Size: Larger than L1 (typically 256KB to 2MB).

[0089] Speed: Slower than L1, but still faster than main memory.

[0090] Function: Stores data and instructions that are used less often than those in L1 but are still accessed frequently, serving as a supplement to L1.

[0091] ③L3 Cache

[0092] Location: Typically a cache shared by multiple cores, located on the processor chip.

[0093] Size: Larger (typically 2MB to 64MB or more).

[0094] Speed: Slower than L2, but still faster than main memory.

[0095] Function: Stores data that is accessed less frequently, reducing access to main memory and improving the efficiency of multi-core processors.

[0096] The system architecture on which the embodiments of this application are based is illustrated below.

[0097] To facilitate understanding of the embodiments of this application, Figure 1 shows a possible, non-limiting system schematic diagram. Figure 1 depicts a many-core processor system architecture, which consists of hundreds of independent CPU cores.

[0098] Figure 2 illustrates another possible, non-limiting system diagram. It describes the detailed composition of a core, including but not limited to the L1 / L2 caches and the interrupt controller used for inter-core communication; the CPU bus network additionally connects the L3 cache, the controller, and Double Data Rate (DDR) memory modules.

[0099] It can be understood that Figure 1 is a general framework diagram of a many-core processor system, and Figure 2 describes the composition of one or more of the cores shown in Figure 1.

[0100] It should be noted that Figure 1 describes the processor cores in a many-core processor as CPU cores, but this application does not limit the processor cores in a many-core processor to CPU cores; they can also be other processor cores. Furthermore, with the continuous evolution of chip technology, this application can also be applied to future many-core processors.

[0101] As chip manufacturing processes have evolved, more and more transistors can be accommodated in the same size chip. Therefore, general-purpose processors have introduced more and more processor cores to increase processing power. For example, there are now processors with more than 100 CPU cores, which can be configured in servers. Processors containing many simpler, independent processor cores (ranging from dozens to thousands or more) are called many-core processors. Many-core processors are a special type of multi-core processor designed for highly parallel processing and are widely used in embedded computers and high-performance computing.

[0102] Many-core processors typically contain at least dozens of processor cores, all managed by the operating system. Synchronization between processor cores and shared memory access require OS intervention. Because OS management takes time, the computing power and other resources of the processor cores cannot be utilized during this period, resulting in low resource utilization for many-core processors.

[0103] Therefore, how to improve the resource utilization of many-core processors is an urgent technical problem to be solved.

[0104] To address the aforementioned technical problem, this application proposes a data processing method 100. In method 100, a first processor core can manage memory access of a first device. For example, the first device may be configured with a many-core processor, and the first processor core can be one of the processor cores in the many-core processor. The first processor core can manage memory access of other processor cores within the first device. This application eliminates the need for a third-party OS to manage memory access; instead, memory access is managed directly by a processor core. This saves the time spent on memory access management, reduces the time during which the computing power and other resources of the processor cores are unavailable, and thus improves the resource utilization of the many-core processor. Furthermore, the first processor core configures first information, which includes the physical address of the first memory. That is, the physical address of the memory is visible to the processor cores in the first device. The first processor core can therefore explicitly control memory access by other processor cores, enabling more efficient use of memory resources and improving the utilization of memory resources by the many-core processor. In both respects, this application improves the resource utilization of the many-core processor.

[0105] The method 100 provided in the embodiments of this application is described in detail below with reference to Figure 3. Optionally, when the method shown in Figure 3 is applied to the system shown in Figure 1, the first device in Figure 3 can be a device equipped with a many-core processor; when the method shown in Figure 3 is applied to the system shown in Figure 2, the first device in Figure 3 can be a device configured with a processor core, a CPU bus interconnect network, an L3 cache, a controller, and DDR memory. Figure 3 illustrates the method using the first device as the execution subject of the interaction, but this application does not limit the execution subject. For example, the first device in S301-S304 and related implementations can also be replaced by a chip, chip system, or processor that supports the first device in implementing the method, or it can be replaced by a logic module or software that can implement all or part of the functions of the first device.

[0106] As shown in Figure 3, the method 100 provided in this embodiment of this application includes the following steps:

[0107] S301, the first device obtains a first instruction from the first processor core, and accordingly, the first processor core sends the first instruction.

[0108] The first instruction is used to instruct a switch to a first mode, in which the first processor core manages memory access.

[0109] Optionally, the first device in this application can be a server, computer, or other similar device. For example, the first device can be an embedded computer or a high-performance computer. This first device is equipped with a many-core processor.

[0110] Optionally, the first processor core may be a processor core in the first device.

[0111] Optionally, the first device may also include other processor cores besides the first processor core. In the following description, the processor core in the first device that switches to the first mode is referred to as the second processor core.

[0112] It should be noted that method 100 is described using the first device as the execution subject as an example. More specifically, method 100 and its related implementations can be executed by the second processor core.

[0113] It should be noted that the first mode is a new mode introduced in this application. In this mode, the first processor core manages memory access. Because this memory access management method does not rely on a third-party OS, the second processor core accesses memory faster. Therefore, the first mode can also be called the accelerator mode.

[0114] Optionally, a typical application scenario of this application is that the processor core (e.g., the first processor core and the second processor core) is a CPU core. This application does not limit the processor core in the many-core processor to be a CPU core, but can also be other processor cores. Furthermore, with the continuous evolution of chip technology, this application can also be applied to future many-core processors.

[0115] Please refer to Figure 4, which provides an architecture diagram of the many-core processor in this application. Figure 4 takes a many-core CPU as an example, in which the processor cores are divided into two categories. The first category is the processor cores of the host operating system; the first processor core can be one or more of these cores. The second category is the processor cores in the first-mode core cluster. When the second processor core enters the first mode, it belongs to the first-mode core cluster.

[0116] Optionally, the host operating system still runs on the processor core in normal mode (i.e., the second mode). The processor core in the host operating system (e.g., the first processor core) will have the runtime software stack of the first mode core cluster added to it. This software stack includes drivers, system programming environment, task orchestration, scheduling, distribution and execution, etc.

[0117] In one example, the first processor core is programmed in this application, meaning that hardware is programmed. By programming the first processor core, it gains the ability to manage the second processor core; this can be understood as the first processor core containing software that manages the second processor core.

[0118] Based on the above example, this application programs the hardware so that the hardware core and the software code are bound together and not decoupled, allowing for direct scheduling and making the management of the second processor core more efficient.

[0119] Optionally, in addition to managing the memory access of the second processor core in the first device, the first processor core can also manage the memory sharing, memory access, and communication between the second processor core and other processor cores.

[0120] Prior to S301, the second processor core in the first device needs to send some reporting information to the first processor core so that the first processor core can manage the second processor core. For details, please refer to the following implementation:

[0121] In one alternative implementation, the first device sends second information to the first processor core, the second information including: information indicating whether the first mode is enabled or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0122] In the above implementation, the information used to indicate whether the first mode is enabled can inform the first processor core that some processor cores in the first device have the ability to enter the first mode. The physical topology information can help the first processor core to better manage memory access.

[0123] Optionally, referring to Figure 5, the second processor core can be configured with an accelerator mode status register, which is used to report whether the first mode is enabled and / or the physical topology information.

[0124] Optionally, the second processor core in the first device may send the second information to the first processor core.

[0125] Understandably, through the second information, the second processor core in the first device can negotiate with the first processor core whether to enable the first mode. When the second processor core enables the first mode, the first processor core can send the first instruction to the first device.
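
As a non-limiting illustration of how the second information might be reported through an accelerator mode status register, the following Python sketch packs an enable flag and two physical topology fields into a single 32-bit word. The bit layout, field widths, units, and all names (SecondInfo, pack_status, unpack_status) are assumptions made for this example; this application does not specify a register format, and latency is used here only as a stand-in for "access speed".

```python
from dataclasses import dataclass

# Hypothetical 32-bit register layout (an assumption, not defined by the application):
#   bit 0      : whether the first mode (accelerator mode) is enabled
#   bits 1-15  : capacity of the first memory, in KB
#   bits 16-31 : access latency of the first memory, in ns
ENABLE_BIT = 0x1
CAP_SHIFT, CAP_MASK = 1, 0x7FFF
LAT_SHIFT, LAT_MASK = 16, 0xFFFF


@dataclass(frozen=True)
class SecondInfo:
    acc_mode_enabled: bool
    capacity_kb: int
    latency_ns: int


def pack_status(info: SecondInfo) -> int:
    """Encode the second information into one register word."""
    if not 0 <= info.capacity_kb <= CAP_MASK:
        raise ValueError("capacity does not fit in the capacity field")
    if not 0 <= info.latency_ns <= LAT_MASK:
        raise ValueError("latency does not fit in the latency field")
    return (
        int(info.acc_mode_enabled)
        | (info.capacity_kb << CAP_SHIFT)
        | (info.latency_ns << LAT_SHIFT)
    )


def unpack_status(word: int) -> SecondInfo:
    """Decode a register word, as the first processor core might do."""
    return SecondInfo(
        acc_mode_enabled=bool(word & ENABLE_BIT),
        capacity_kb=(word >> CAP_SHIFT) & CAP_MASK,
        latency_ns=(word >> LAT_SHIFT) & LAT_MASK,
    )
```

In this sketch the second processor core would call pack_status and the first processor core would call unpack_status before deciding whether to send the first instruction.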

[0126] During the execution of step S301, this application provides several optional implementation methods, as follows:

[0127] Optionally, in S301, the first instruction of the first processor core can be obtained by the second processor core in the first device.

[0128] Optionally, the first processor core can send the enter_acc code to the second processor core, and then send a first instruction to the second processor core. This first instruction indicates a specific code segment in the enter_acc code. By running that code segment, the second processor core can enter the first mode.

[0129] S302, the first device switches to the first mode.

[0130] Optionally, the first device can switch from the second mode to the first mode. The difference between the second mode and the first mode is that the memory access address is globally visible in the second mode, while the memory access address is locally visible in the first mode.

[0131] Optionally, when executing S302, as previously described, the second processor core can enter the first mode by running a specific code segment in the enter_acc code.

[0132] Optionally, after the second processor core of the first device enters the first mode, it will enter the waitfor_setup state, i.e., wait for the first processor core to build the accelerator hardware and software environment. In the first mode, the first device will flush the cache of each level of the second processor core to system memory. After the enter_acc code finishes running, the second processor core is in a static state, waiting for the first processor core to send an interrupt to wake it up and run the accelerator.
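
As a non-limiting illustration, the entry sequence described above can be modeled as a small state machine. The following Python sketch is a simulation only: the class and method names, the RUNNING state name, and the dirty-line counters are assumptions introduced for this example, while the waitfor_setup state, the cache flush, and the wake-up interrupt follow the description.

```python
from enum import Enum, auto


class CoreState(Enum):
    NORMAL = auto()         # second mode, managed by the host operating system
    WAITFOR_SETUP = auto()  # first mode entered, environment not yet built
    RUNNING = auto()        # woken by interrupt, running accelerator code


class SecondCore:
    def __init__(self):
        self.state = CoreState.NORMAL
        # Hypothetical number of dirty cache lines per private cache level.
        self.dirty_lines = {"L1": 3, "L2": 5}
        self.lines_flushed_to_memory = 0

    def run_enter_acc(self):
        """Model of running the enter_acc code segment."""
        if self.state is not CoreState.NORMAL:
            raise RuntimeError("enter_acc is only valid in the second mode")
        # Flush every cache level of this core to system memory.
        for level, count in self.dirty_lines.items():
            self.lines_flushed_to_memory += count
            self.dirty_lines[level] = 0
        self.state = CoreState.WAITFOR_SETUP

    def on_wakeup_interrupt(self, environment_ready: bool):
        """Model of the interrupt sent by the first processor core."""
        if self.state is not CoreState.WAITFOR_SETUP:
            raise RuntimeError("core is not waiting for setup")
        if environment_ready:
            self.state = CoreState.RUNNING
```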

[0133] Optionally, the first processor core can build a hardware and software environment based on Basic Input Output System (BIOS) or firmware read-only memory (ROM) table information and user interaction input information.

[0134] Optionally, when building the accelerator hardware and software environment, the first processor core needs to undergo a series of configuration tasks, as shown in the following example:

[0135] For example, the first processor core needs to set up pathways for each path.

[0136] For example, the first processor core can configure the routing table of the second processor core according to the intent requirements.

[0137] For example, the first processor core needs to configure information about the CPU bus interconnect network.

[0138] For example, the first processor core needs to configure the first information.

[0139] The first information in this application is described below:

[0140] ① In one optional implementation, the first information includes the physical address of the first memory, which is a local address.

[0141] It should be noted that the first memory refers to the memory in the first device.

[0142] Optionally, the physical address in this application is an exemplary term and can be replaced with any possible term, such as mapped address, address mapping, memory-mapped address, etc.

[0143] It should be noted that "the physical address of the first memory is a local address" means that the physical address of the first memory is locally visible. For example, the first memory is the memory of the second processor core, and the second memory is the memory of other processor cores. The first memory and the second memory share the same physical address 0x0000. The second processor core will access the first memory through the physical address 0x0000, but will not access the second memory.
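
As a non-limiting illustration of a locally visible address, the following Python sketch gives each simulated core its own private memory behind the same address. The Core class and its load and store methods are hypothetical names used only for this example.

```python
class Core:
    """Simulated processor core whose local addresses resolve to its own memory."""

    def __init__(self, core_id: int, local_size: int):
        self.core_id = core_id
        # Private memory reached through locally visible physical addresses.
        self.local_mem = bytearray(local_size)

    def store(self, addr: int, value: int) -> None:
        self.local_mem[addr] = value

    def load(self, addr: int) -> int:
        return self.local_mem[addr]


# Two cores use the same physical address 0x0000 without interfering.
second_core = Core(core_id=0, local_size=16)
other_core = Core(core_id=1, local_size=16)
second_core.store(0x0000, 0xAA)
other_core.store(0x0000, 0xBB)
```

Because address 0x0000 is resolved inside each core, the write by one core never reaches the memory of the other.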

[0144] In one alternative implementation of the first aspect, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0145] Based on the above implementation, the physical address of the L1 cache, L2 cache, or L3 cache in this application are all visible to the processor core in the first device. The processor core in the first device can explicitly use the L1 cache, L2 cache, and L3 cache to avoid resource usage restrictions, such as the problem of cache resource usage restrictions caused by the opacity of cache resources.

[0146] In one alternative implementation, the physical address of the first-level cache and the physical address of the second-level cache correspond to local memory spaces, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0147] Based on the above implementation, the first device can directly access the local memory space through the physical addresses of the first-level cache and the second-level cache. That is, the physical addresses of the first-level cache and the second-level cache are locally visible, while the physical address of the third-level cache is a cluster-level address that is visible within the cluster. This memory address arrangement can save the overhead of the cache coherence protocol.

[0148] Optionally, the first information may also include: the physical address of High Bandwidth Memory (HBM), the physical address of DDR Memory, the physical address of Communication Memory, and the physical address of Code Memory.

[0149] Optionally, the physical addresses of HBM Memory, DDR Memory, Communication Memory, and Code Memory are all globally visible.

[0150] Optionally, the first information may also include: memory size and memory characteristics, such as the memory size of L1 Cache Memory, the memory size of L2 Cache Memory, the characteristics of L1 Cache Memory, and so on.

[0151] Please refer to Table 1, which provides an example of some of the information included in the first information:

[0152] Table 1

[0153]

[0154] ② In an optional implementation, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0155] In the above implementation, the "identifier of the processor core that switched to the first mode" in the first information can notify the first device which processor cores should enter the first mode. The "identifier of the processor cores in the same cluster" can help the first device to perform clustering. The information such as the capacity and latency of the first-level cache, second-level cache, and third-level cache in the first information can help the processor cores in the first device determine the appropriate cache space and improve the caching performance of the data in the processor cores of the first device.
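
As a non-limiting illustration of how capacity and latency in the first information might help a processor core determine an appropriate cache space, the following Python sketch selects the lowest-latency level that can hold a buffer of a given size. The capacities, latencies, and names (CacheLevel, pick_level, LEVELS) are assumed values for this example and are not taken from this application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CacheLevel:
    name: str
    capacity_bytes: int
    latency_cycles: int


def pick_level(levels: List[CacheLevel], size_bytes: int) -> Optional[CacheLevel]:
    """Return the lowest-latency level whose capacity can hold the buffer."""
    fitting = [lv for lv in levels if lv.capacity_bytes >= size_bytes]
    if not fitting:
        return None  # the buffer must go to memory beyond the caches
    return min(fitting, key=lambda lv: lv.latency_cycles)


# Hypothetical values standing in for the contents of the first information.
LEVELS = [
    CacheLevel("L1", 64 * 1024, 4),
    CacheLevel("L2", 1024 * 1024, 14),
    CacheLevel("L3", 32 * 1024 * 1024, 40),
]
```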

[0156] S303, the first device obtains the first information configured by the first processor core; accordingly, the first processor core configures the first information, and the first processor core sends the first information to the first device.

[0157] The first information includes the physical address of the first memory, which is a local address.

[0158] Optionally, after the first processor core has configured the first information, it can send an instruction to the second processor core. This instruction is used to instruct a specific code segment, and by executing the code segment, the first device can obtain the first information.

[0159] S304. The first device accesses the first memory based on the first information.

[0160] Optionally, the first processor core can send a memory access instruction to the second processor core. This instruction instructs the first device to access the first memory based on its physical address. For example, if the physical address of the L1 Cache Memory is in the range of 0x0000-0x1ffff, the first device can directly access the L1 Cache based on this physical address, and the access attribute is determined by the accessed address.
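
As a non-limiting illustration of the access attribute being determined by the accessed address, the following Python sketch decodes an address against a small address map. Only the L1 range comes from the example above; the L2, L3, and DDR ranges and the names Region, ADDRESS_MAP, and decode are assumptions made for this sketch.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int
    end: int         # inclusive upper bound
    visibility: str  # "local", "cluster", or "global"


ADDRESS_MAP = [
    Region("L1", 0x00000, 0x1FFFF, "local"),          # range from the example in the text
    Region("L2", 0x20000, 0xFFFFF, "local"),          # hypothetical range
    Region("L3", 0x100000, 0x8FFFFF, "cluster"),      # hypothetical range
    Region("DDR", 0x10000000, 0x7FFFFFFF, "global"),  # hypothetical range
]


def decode(addr: int) -> Optional[Region]:
    """Return the region an address falls in, or None if it is unmapped."""
    for region in ADDRESS_MAP:
        if region.start <= addr <= region.end:
            return region
    return None
```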

[0161] Optionally, for the accelerator runtime environment, this application will provide management interfaces for memory access at various levels, such as application programming interfaces (APIs) like malloc.l1 / free.l1, while the underlying specific addresses are managed by the runtime.
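
As a non-limiting illustration of a runtime that hides the underlying addresses behind interfaces such as malloc.l1 / free.l1, the following Python sketch implements a simple first-fit allocator over an L1 address window. The window (base 0x0000, size 0x20000), the first-fit policy, and the names L1Allocator, malloc_l1, and free_l1 are assumptions for this example; this application does not describe the allocation algorithm.

```python
class L1Allocator:
    """First-fit allocator over a hypothetical L1 address window."""

    def __init__(self, base: int = 0x0000, size: int = 0x20000):
        self.free_blocks = [(base, size)]  # sorted list of (start, length)
        self.allocated = {}                # start address -> length

    def malloc_l1(self, nbytes: int):
        """Return the start address of a free block, or None if none fits."""
        if nbytes <= 0:
            raise ValueError("allocation size must be positive")
        for i, (start, length) in enumerate(self.free_blocks):
            if length >= nbytes:
                if length == nbytes:
                    self.free_blocks.pop(i)
                else:
                    self.free_blocks[i] = (start + nbytes, length - nbytes)
                self.allocated[start] = nbytes
                return start
        return None

    def free_l1(self, addr: int) -> None:
        """Release a block and merge it with adjacent free blocks."""
        length = self.allocated.pop(addr)  # raises KeyError on an invalid free
        self.free_blocks.append((addr, length))
        self.free_blocks.sort()
        merged = []
        for start, ln in self.free_blocks:
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + ln)
            else:
                merged.append((start, ln))
        self.free_blocks = merged
```

Application code in this sketch only sees the returned addresses; the bookkeeping of which ranges are in use stays inside the runtime object.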

[0162] Optionally, please refer to Figure 6, which is a schematic diagram of the accelerator core memory access path. Core0, Core1, and Core2 in Figure 6 are second processor cores. The second processor cores access code segments in Code Memory and can access L1 Cache Memory, L2 Cache Memory, and other memory.

[0163] Optionally, please refer to Figure 7, which shows the memory access interaction path between the processor core of the host operating system and the processor core in the first mode. The first processor core is the processor core of the host operating system, and the code is managed and allocated to the second processor core by the accelerator runtime of the first processor core. The second processor core accesses this code in Code Memory.

[0164] In one alternative implementation, the first device performs an atomic operation based on the physical address of the third-level cache to synchronize task information.

[0165] In the above implementation, the processor cores of the first device can achieve inter-core synchronization, which can realize the synchronization of task information more efficiently.

[0166] Optionally, when the second processor core of the first device accesses the first memory, the second processor core also performs inter-core task information synchronization.

[0167] In one example, the second processor core can implement a synchronization barrier function based on the L3 Cache Memory. That is, the second processor core implements the lock function based on the physical address of the L3 Cache Memory. This method provides a low-latency synchronization mechanism within the cluster and can accelerate the barrier function between cores.

[0168] In another example, the second processor core can implement a synchronization barrier function based on HBM Memory, DDR Memory, or Communication Memory, etc. That is, the second processor core implements the locking function based on the physical address of HBM Memory, DDR Memory, or Communication Memory, etc. This method will have higher latency than communication within the cluster.

[0169] Optionally, the two examples above can together constitute the synchronization lock functionality in this application; please refer to Figure 8. Local cluster locking can be implemented through L3 Cache Memory, while global locking can be implemented through HBM Memory, DDR Memory, or Communication Memory. Together, they form a hierarchical locking mechanism.
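
As a non-limiting illustration of the hierarchical locking mechanism, the following Python sketch emulates it with threads. A core first takes its cluster lock, whose lock word is assumed to reside at an L3 Cache Memory address, and only then takes the global lock, whose lock word is assumed to reside in HBM Memory, DDR Memory, or Communication Memory. The compare-and-swap primitive is emulated with a Python lock standing in for a hardware atomic operation, and all class names are hypothetical.

```python
import threading
import time


class AtomicWord:
    """Emulates a word at a fixed physical address that supports atomic updates."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class SpinLock:
    """Spin lock built on one atomic word (0 = free, 1 = held)."""

    def __init__(self):
        self.word = AtomicWord()

    def acquire(self) -> None:
        while not self.word.compare_and_swap(0, 1):
            time.sleep(0)  # yield so the current holder can make progress

    def release(self) -> None:
        self.word.compare_and_swap(1, 0)


class HierarchicalLock:
    """Cluster lock first (low latency), then the shared global lock."""

    def __init__(self, global_lock: SpinLock):
        self.cluster_lock = SpinLock()  # lock word assumed to live in L3
        self.global_lock = global_lock  # lock word assumed to live in global memory

    def __enter__(self):
        self.cluster_lock.acquire()
        self.global_lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.global_lock.release()
        self.cluster_lock.release()
        return False
```

In this sketch at most one core per cluster contends for the global lock at any time, so most contention is absorbed by the lower-latency cluster lock, and the fixed acquisition order (cluster, then global) avoids deadlock.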

[0170] In addition to the mechanism for entering the first mode, this application also provides a mechanism for exiting the first mode, the specific implementation of which can be found in the following method:

[0171] In one alternative implementation, the first device obtains a second instruction from the first processor core, the second instruction being used to instruct a first code segment, the first code segment instructing an exit from the first mode and entry into a second mode, the physical address of memory in the second mode being a global address; the first device executes the first code segment.

[0172] The above implementation provides an exit mechanism for the first mode, whereby the first processor core, via a second instruction, controls the processor core in the first device to exit the first mode and enter the second mode, which is the normal mode. The first processor core can thus control the mode selection of the first device without relying on a third-party OS, allowing for more direct and efficient management of the processor core's memory access process.

[0173] Optionally, the aforementioned second instruction is used to indicate exiting the first mode, i.e., the accelerator mode. Therefore, the second instruction can be called the `exit_acc` instruction, which is executed on the accelerator-mode CPU during accelerator mode runtime. This instruction indicates exiting the first mode and returning to the second mode. After executing the instruction, the second processor core will be in the `waitfor_exit` state. In this state, the second processor core waits for the host CPU to tear down the hardware and software environment of the first mode; the first processor core then sends an interrupt to wake up the second processor core, which starts executing code from the specified interrupt vector.
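
As a non-limiting illustration of the exit handshake, the following Python sketch models the second processor core leaving the first mode. The class names, the env_built flag, and the numeric interrupt vector are assumptions for this example; the waitfor_exit state, the teardown by the host, and the wake-up interrupt follow the description above.

```python
from enum import Enum, auto


class CoreState(Enum):
    ACCELERATOR = auto()   # first mode
    WAITFOR_EXIT = auto()  # exit_acc executed, waiting for the host
    NORMAL = auto()        # second mode


class HostCore:
    """Stands in for the first processor core (host CPU)."""

    def __init__(self):
        self.env_built = True  # first-mode environment currently exists

    def teardown_environment(self) -> None:
        self.env_built = False


class AcceleratorCore:
    """Stands in for the second processor core."""

    def __init__(self):
        self.state = CoreState.ACCELERATOR
        self.pc = None  # where execution resumes after wake-up

    def run_exit_acc(self) -> None:
        if self.state is not CoreState.ACCELERATOR:
            raise RuntimeError("exit_acc is only valid in the first mode")
        self.state = CoreState.WAITFOR_EXIT

    def on_interrupt(self, host: HostCore, vector: int) -> bool:
        """Resume in the second mode only after the host has torn down the environment."""
        if self.state is not CoreState.WAITFOR_EXIT or host.env_built:
            return False
        self.state = CoreState.NORMAL
        self.pc = vector
        return True
```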

[0174] The above method 100 will be illustrated with specific examples below:

[0175] In one example, taking the execution of method 100 by the second processor core in the first device as an example, please refer to Figure 9, which is a flowchart of one example provided in this application. The specific process in Figure 9 includes:

[0176] S901, the second processor core sends second information to the first processor core;

[0177] The second information includes: information indicating whether the first mode is enabled, or physical topology information, wherein the physical topology information includes: the access speed of the first memory or the capacity of the first memory.

[0178] Optionally, the second processor core can be configured with an accelerator status register, which is used to send second information to the first processor core.

[0179] S902, the second processor core fetches the first instruction;

[0180] The first instruction is used to instruct a switch to a first mode, in which the first processor core manages memory access.

[0181] Optionally, the first instruction can be a code segment sent from the first processor core to the second processor core. By executing this code segment, the second processor core will switch to the first mode.

[0182] S903, the second processor core executes the first instruction;

[0183] Optionally, as mentioned earlier, after the second processor core executes the first instruction, it switches to the first mode. Once in first mode, the second processor core enters the waitfor_setup state, waiting for the first processor core to build the accelerator hardware and software environment. In first mode, the first device flushes the caches at each level of the second processor core to system memory. After the enter_acc code finishes running, the second processor core is in a static state, waiting for the first processor core to send an interrupt to wake it up and resume accelerator operation.

[0184] S904, the first processor core configures the hardware and software environment of the first mode;

[0185] Configuring the hardware and software environment of the first mode by the first processor core includes: configuring the first information, configuring the routing table of the second processor core, setting up the pathways for each path, or configuring the information of the CPU bus interconnection network.

[0186] The first information includes the physical address of the first memory, which is a local address.

[0187] Optionally, the physical address of the first memory may include: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0188] Optionally, the physical address of the first-level cache and the physical address of the second-level cache correspond to the local memory space, and the physical address of the third-level cache corresponding to the same cluster is the same.

[0189] S905, the second processor core obtains the first information;

[0190] S906, the second processor core accesses the first memory based on the first information;

[0191] Optionally, the first processor core can send a memory access instruction to the second processor core. This instruction instructs the first device to access the first memory based on its physical address. For example, if the physical address of the L1 Cache Memory is in the range of 0x0000-0x1ffff, the first device can directly access the L1 Cache based on this physical address, and the access attribute is determined by the accessed address.

[0192] S907, the second processor core performs atomic operations based on the physical address of the L3 cache to synchronize task information.

[0193] Optionally, S906 can be performed synchronously with S907. The first device can synchronize task information during the process of accessing the first memory, that is, implement the synchronization barrier function.

[0194] Optionally, the second processor core can also implement synchronization barrier functionality based on HBM Memory, DDR Memory, or Communication Memory.

[0195] S908, the second processor core receives the second instruction sent by the first processor core.

[0196] The second instruction indicates a first code segment, and the first code segment instructs exiting the first mode and entering the second mode, where the physical address of memory in the second mode is a global address.

[0197] S909, the second processor core runs the first code segment.

[0198] It can be understood that, by running the first code segment, the second processor core exits the first mode and enters the second mode.
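The mode switching described above (the first instruction, S905, S908 and S909) can be summarized as a small state machine. The sketch below is illustrative only: the class `SecondCore`, its method names, and the function `exit_first_mode` are hypothetical, and the first information is represented by an arbitrary dictionary.

```python
from enum import Enum


class Mode(Enum):
    SECOND = "second mode (global addresses)"
    FIRST = "first mode (local addresses)"


class SecondCore:
    """Toy model of the core that is switched between the two modes."""

    def __init__(self):
        self.mode = Mode.SECOND
        self.first_info = None

    def on_first_instruction(self):
        # The first instruction indicates a switch to the first mode.
        self.mode = Mode.FIRST

    def on_first_info(self, first_info):
        # S905: obtain the first information configured by the first core.
        if self.mode is not Mode.FIRST:
            raise RuntimeError("first information is only used in the first mode")
        self.first_info = first_info

    def on_second_instruction(self, code_segment):
        # S908/S909: the second instruction indicates a code segment; running
        # it makes the core leave the first mode and enter the second mode.
        code_segment(self)


def exit_first_mode(core):
    """Hypothetical 'first code segment'."""
    core.first_info = None
    core.mode = Mode.SECOND
```

A typical sequence is `on_first_instruction()`, `on_first_info(...)`, memory accesses in the first mode, and finally `on_second_instruction(exit_first_mode)`.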

[0199] It can therefore be seen that the memory address of the L1 cache is locally visible. Based on this, this application proposes a method of interoperation between a processor core running the host operating system and a processor core in the first mode. This application also proposes an address mapping structure for the different memories of the processor core in the first mode. Specifically, in this application, the processor core in the first mode adds a mapping of the L1/L2 cache as a memory access space, adds a mapping of the L3 cache as a shared memory access space, and adds an inter-core synchronization function.

[0200] This application supports an accelerator mode, namely the first mode, on many-core processors, which allows users to explicitly use the cores in the first mode, including their L1/L2 cache, the system L3 cache, and the network-on-chip (NoC) resources of the CPU bus. This avoids the limitations on resource use in the normal CPU mode, such as the opacity of cache resources and the uncontrollable NoC traffic.

[0201] This application also proposes how the processor core in the first mode implements cache access, shared memory, synchronization communication, and interrupt functions. In this application, the processor core can explicitly access the L1/L2/L3 cache and can flexibly switch between the first mode and the second mode, improving resource efficiency and utilization. This application can explicitly control the memory access of many-core processors, making efficient use of the L1/L2/L3 cache. The processor core in the first mode can implement customized, efficient synchronization mechanisms between near cores, far cores, and clusters.

[0202] Accordingly, embodiments of this application also provide related apparatus for implementing the above-described solutions. For details, refer to Figure 10, which is a schematic diagram of a communication device provided in an embodiment of this application. The communication device 300 can be a chip, chip system, or processor used to support the communication device in implementing the method; alternatively, the communication device can also be a logical configuration item, logical module, or software used to implement all or part of the functions of the communication device. As shown in Figure 10, the communication device 300 includes a transceiver unit 301 and a processing unit 302.

[0203] For example, when the communication device is the first device in method 100, the transceiver unit 301 is used to obtain a first instruction from the first processor core, the first instruction being used to instruct switching to a first mode, in which the first processor core manages memory access; the processing unit 302 is used to switch to the first mode; the transceiver unit 301 is also used to obtain first information configured by the first processor core, the first information including the physical address of the first memory, the physical address of the first memory being a local address; the processing unit 302 is also used to access the first memory according to the first information.

[0204] In one alternative implementation, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0205] In one alternative implementation, the physical address of the L1 cache and the physical address of the L2 cache correspond to local memory spaces, and the physical addresses of the L3 cache corresponding to the same cluster are the same.

[0206] In an alternative implementation, the processing unit 302 is further configured to: perform atomic operations based on the physical address of the L3 cache to synchronize task information.

[0207] In one optional implementation, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0208] In one optional implementation, the transceiver unit 301 is further configured to: send second information to the first processor core, the second information including: information for indicating whether the first mode is enabled or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.
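The exchange of second information and first information described in [0207] and [0208] can be sketched with two record types. This is only an illustration under assumptions: the field names, the subset of fields shown, the dictionary keys "L1" and "L3", and the function `configure_first_information` are hypothetical, and the application does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SecondInformation:
    """Report sent to the first processor core before the mode switch."""
    first_mode_enabled: bool
    capacity_bytes: Dict[str, int]  # physical topology: capacity per memory


@dataclass
class FirstInformation:
    """Configuration sent back by the first processor core (subset of fields)."""
    core_id: int
    cluster_core_ids: Tuple[int, ...]
    l1_base: int
    l1_capacity_bytes: int
    shared_l3_capacity_bytes: int


def configure_first_information(
    core_id, cluster_core_ids, report
) -> Optional[FirstInformation]:
    """Build first information only if the reporting core enables the mode."""
    if not report.first_mode_enabled:
        return None
    return FirstInformation(
        core_id=core_id,
        cluster_core_ids=tuple(cluster_core_ids),
        l1_base=0x0,  # assumed local base address
        l1_capacity_bytes=report.capacity_bytes["L1"],
        shared_l3_capacity_bytes=report.capacity_bytes["L3"],
    )
```

In this sketch, the first processor core only configures first information for a core that has reported the first mode as enabled, and it reuses the reported capacities when filling in the configuration.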

[0209] In one optional implementation, the transceiver unit 301 is further configured to: obtain a second instruction from the first processor core, the second instruction being used to instruct a first code segment, the first code segment indicating exiting the first mode and entering a second mode, the physical address of memory in the second mode being a global address; and call the processing unit 302 to run the first code segment.

[0210] For example, when the communication device is the first processor core in method 100, the transceiver unit 301 is used to send a first instruction to the second processor core, the first instruction being used to instruct the second processor core to switch to a first mode, in which the first processor core manages the memory access of the second processor core; the processing unit 302 is used to configure first information, the first information including the physical address of the first memory, the physical address of the first memory being a local address; the transceiver unit 301 is also used to send the first information to the second processor core.

[0211] In one alternative implementation, the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

[0212] In one alternative implementation, the physical address of the L1 cache and the physical address of the L2 cache correspond to local memory spaces, and the physical addresses of the L3 cache corresponding to the same cluster are the same.

[0213] In one optional implementation, the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

[0214] In one optional implementation, the transceiver unit 301 is further configured to: receive second information sent by the second processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

[0215] In one alternative implementation, the transceiver unit 301 is further configured to: send a second instruction, the second instruction being used to instruct a first code segment, the first code segment indicating exiting the first mode and entering a second mode, wherein the physical address of memory in the second mode is a global address.

[0216] It should be noted that the information interaction and execution processes between the modules/units in the communication device are based on the same concept as the method embodiments corresponding to Figure 3 of this application. For details, refer to the descriptions of the method embodiments shown above in this application, which will not be repeated here.

[0217] Please see Figure 11, which is a schematic diagram of the logical structure of a communication device 40 provided in an embodiment of this application. The communication device described in the embodiment corresponding to Figure 3 can be deployed on the communication device 40 in Figure 11 to implement the functions performed by the communication device in the embodiment corresponding to Figure 3. The communication device 40 includes a memory 401, a processor 402, a communication interface 403, and a bus 404. The memory 401, processor 402, and communication interface 403 are interconnected via the bus 404.

[0218] The memory 401 can be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 401 can store programs. When the program stored in the memory 401 is executed by the processor 402, the processor 402 and the communication interface 403 are used to execute S301-S304 of the above-described data processing method embodiment.

[0219] The processor 402 may be a central processing unit (CPU), microprocessor, application-specific integrated circuit (ASIC), graphics processing unit (GPU), digital signal processor (DSP), field-programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or any combination thereof, for executing related programs to implement one or more steps in S301-S304 of the data processing method embodiments of this application. The steps of the data processing method disclosed in conjunction with the embodiments of this application can be executed by a compiler and an executor, wherein the compiler and executor can be executed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well established in the art. The storage medium is located in the memory 401. The processor 402 reads information from the memory 401 and, in conjunction with its hardware, performs one or more steps in S301-S304 of the data processing method embodiment in this application.

[0220] The communication interface 403 uses a transceiver apparatus, such as but not limited to a transceiver, to enable communication between the communication device 40 and other devices or communication networks.

[0221] The bus 404 enables the transmission of information between the components of the communication device 40 (e.g., the memory 401, processor 402, and communication interface 403). The bus 404 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus is represented by a single thick line in Figure 11, but this does not mean that there is only one bus or one type of bus.

[0222] It should be noted that the information interaction and execution processes between the various modules/units in the controller are based on the same concept as the method embodiments corresponding to Figure 3 of this application. For details, refer to the descriptions of the method embodiments shown above in this application, which will not be repeated here.

[0223] This application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions, capable of running on a computing device or stored on any usable medium. When the computer program product is run on at least one computer device, it causes the at least one computer device to perform the method described in the embodiment shown in Figure 3.

[0224] This application also provides a computer-readable storage medium. The computer-readable storage medium can be any usable medium that a computing device can store, or a data storage device such as a data center containing one or more usable media. The usable medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that instruct the computing device to perform the method described in the embodiment shown in Figure 3.

[0225] The communication device provided in this application embodiment can specifically be a chip, which includes a processing unit and a communication unit. The processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, pins, or circuits. The processing unit can execute computer-executable instructions stored in the storage unit to cause the chip to perform the method described in the embodiment shown in Figure 3. Optionally, the storage unit is a storage unit within the chip, such as a register, cache, etc. The storage unit can also be a storage unit located outside the chip within the wireless access device, such as read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.

[0226] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the accompanying drawings of the device embodiments provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0227] Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of this application can be implemented by means of software plus necessary general-purpose hardware, or by special-purpose hardware including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for the embodiments of this application, software program implementation is more often a better implementation method. Based on this understanding, the technical solution of the embodiments of this application, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods of the various embodiments of this application.

[0228] In the above embodiments, the implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, in the form of a computer program product.

[0229] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).

[0230] In the various embodiments of this application, unless otherwise specified or in case of logical conflict, the terminology and / or descriptions between different embodiments are consistent and can be referenced by each other. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships.

Claims

1. A data processing method, characterized by comprising: obtaining a first instruction from a first processor core, the first instruction being used to instruct switching to a first mode, in which the first processor core manages memory access; switching to the first mode; obtaining first information configured by the first processor core, the first information including a physical address of a first memory, the physical address of the first memory being a local address; and accessing the first memory based on the first information.

2. The method of claim 1, wherein the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

3. The method of claim 2, wherein the physical addresses of the L1 cache and the L2 cache correspond to local memory spaces, and the physical addresses of the L3 cache corresponding to the same cluster are the same.

4. The method according to claim 2 or 3, characterized in that, after obtaining the first information configured by the first processor core, the method further comprises: performing atomic operations based on the physical address of the L3 cache to synchronize task information.

5. The method according to any one of claims 1 to 4, characterized in that the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

6. The method according to any one of claims 1 to 5, characterized in that, before obtaining the first instruction from the first processor core, the method further comprises: sending second information to the first processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

7. The method according to any one of claims 1 to 6, characterized in that, after accessing the first memory based on the first information, the method further comprises: obtaining a second instruction from the first processor core, the second instruction indicating a first code segment, the first code segment instructing exiting the first mode and entering a second mode, the physical address of memory in the second mode being a global address; and running the first code segment.

8. A data processing method, characterized by comprising: sending a first instruction to a second processor core, the first instruction being used to instruct the second processor core to switch to a first mode, in which a first processor core manages the memory access of the second processor core; configuring first information, the first information including a physical address of a first memory, the physical address of the first memory being a local address; and sending the first information to the second processor core.

9. The method of claim 8, wherein the physical address of the first memory includes: the physical address of the L1 cache, the physical address of the L2 cache, or the physical address of the L3 cache.

10. The method of claim 9, wherein the physical addresses of the L1 cache and the L2 cache correspond to local memory spaces, and the physical addresses of the L3 cache corresponding to the same cluster are the same.

11. The method according to any one of claims 8 to 10, characterized in that the first information further includes: the identifier of the processor core switched to the first mode, the identifier of the processor cores in the same cluster, the identifier of the processor cores sharing the L3 cache, the capacity of the shared L3 cache, the capacity of the L1 cache, the latency of the L1 cache, and the capacity or latency of the L2 cache.

12. The method according to any one of claims 8 to 11, characterized in that, before sending the first instruction to the second processor core, the method further comprises: receiving second information sent by the second processor core, the second information including: information indicating whether the first mode is enabled, or physical topology information, the physical topology information including: the access speed of the first memory or the capacity of the first memory.

13. The method according to any one of claims 8 to 12, characterized in that, after sending the first information to the second processor core, the method further comprises: sending a second instruction, the second instruction indicating a first code segment, the first code segment instructing exiting the first mode and entering a second mode, the physical address of memory in the second mode being a global address.

14. A communication device, characterized by comprising: a communication interface and a processor, the communication interface and the processor being configured to perform the method according to any one of claims 1 to 13.

15. A communication device, characterized by comprising: a transceiver unit, configured to perform the transceiver operations in the method according to any one of claims 1 to 13; and a processing unit, configured to perform the operations other than the transceiver operations in the method according to any one of claims 1 to 13.

16. A computer-readable storage medium, characterized in that the medium stores instructions that, when executed by a processor, implement the method according to any one of claims 1 to 13.

17. A computer program product, characterized in that it includes instructions that, when executed on a processor, perform the method according to any one of claims 1 to 13.

18. A chip, characterized in that it includes at least one processing unit and an interface circuit, the interface circuit being used to provide program instructions or data to the at least one processing unit, and the at least one processing unit being used to execute the program instructions to implement the method according to any one of claims 1 to 13.