Data processing method, apparatus, system, and readable medium

By comparing the data before snooping other cache units in an SMP system, the method avoids the processing-core performance degradation caused by ATCCOMPARE failures, resulting in more efficient data processing and lower power consumption.

CN119248529B · Active · Publication Date: 2026-06-19 · SANECHIPS TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SANECHIPS TECH CO LTD
Filing Date
2023-06-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In an SMP system, when ATCCOMPARE fails, the data in the cache unit is invalidated, causing the processing core to need to reread the data from memory, which reduces processing performance.

Method used

When a data exchange request is received, the data is compared first. If the comparison fails, the system does not initiate data listening (snooping) to other cache units, so their cached copies stay valid and the corresponding processing cores can keep reading the data directly from those cache units. This reduces bus bandwidth overhead and chip power consumption.

Benefits of technology

It improves the performance of the processing core, reduces bus bandwidth overhead and chip power consumption, and avoids unnecessary data read operations.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a data processing method. Upon receiving a data exchange request message from a first cache unit, the method determines the corresponding first target data in memory based on the address information carried therein, and sends a notification message to the first cache unit to request the first cache unit to send second target data and third data. The third data is the data exchanged with the first target data, and the second target data is the data that the second cache unit has requested to read. If the second target data is not equal to the first target data, the method returns a data exchange response message carrying the third data to the first cache unit. Embodiments of this disclosure can reduce bus bandwidth overhead and lower chip power consumption. This disclosure also provides a data processing apparatus, a data processing system, and a readable medium.

Description

Technical Field

[0001] This disclosure relates to the field of computer processing technology, and specifically to a data processing method, apparatus, system, and readable medium.

Background Technology

[0002] SMP (Symmetric Multiprocessing) systems refer to server systems where multiple CPUs operate symmetrically, with no master-slave relationship between them. All CPUs share the same physical memory, and the time required for each CPU to access any address in memory is the same; therefore, SMP is also known as Uniform Memory Access (UMA).

[0003] In SMP systems, if ATCCOMPARE (Atomic Compare) fails, the HN (Home Node) does not perform data exchange operations, but the data in the cache unit is still invalidated. If the cache unit needs to access this data again later, it must re-send a request to the HN to read the data, which reduces the processing performance of the processing core corresponding to the cache unit.

Summary of the Invention

[0004] This disclosure provides a data processing method, apparatus, system, and readable medium.

[0005] In a first aspect, embodiments of this disclosure provide a data processing method, including:

[0006] Receive a data exchange request message sent by the first cache unit, obtain the address information carried in the data exchange request, and determine the first target data corresponding to the address information in memory;

[0007] Send a notification message to the first cache unit;

[0008] Receive second target data and third data sent by the first cache unit, wherein the third data is data exchanged with the first target data, and the second target data is data that the second cache unit has requested to read;

[0009] If the second target data is not equal to the first target data, a data exchange response message carrying the third data is returned to the first cache unit.

[0010] In another aspect, embodiments of this disclosure also provide a data processing apparatus, including: one or more processors; a storage device storing one or more programs thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method as described above.

[0011] In another aspect, embodiments of this disclosure also provide a data processing system, including at least one processing chip, the processing chip including the data processing apparatus as described above.

[0012] In another aspect, embodiments of this disclosure also provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed, implements the data processing method as described above.

[0013] The data processing method provided in this disclosure, upon receiving a data exchange request message sent by a first cache unit, determines the corresponding first target data in memory based on the address information carried therein, and sends a notification message to the first cache unit to allow the first cache unit to send second target data and third data. The third data is the data exchanged with the first target data, and the second target data is the data already requested by the second cache unit. If the second target data is not equal to the first target data, a data exchange response message carrying the third data is returned to the first cache unit. This disclosure first performs data comparison and then decides whether to initiate data monitoring of other cache units. If the data comparison fails, data monitoring of other cache units is not initiated, and other cache units no longer invalidate the corresponding data. This ensures that other processing cores can use the data, thereby reducing bus bandwidth overhead and chip power consumption. Furthermore, when other processing cores access the data, they can directly read the data from the other cache unit without re-accessing memory, further reducing bus bandwidth overhead and chip power consumption.

Attached Figure Description

[0014] Figure 1 is a flowchart of the data processing workflow in the related art;

[0015] Figure 2 is a first schematic diagram of the data processing flow provided in the embodiments of this disclosure;

[0016] Figure 3 is a second schematic diagram of the data processing flow provided in the embodiments of this disclosure;

[0017] Figure 4 is a schematic diagram of the ATCCOMPARE failure handling process provided in the embodiments of this disclosure;

[0018] Figure 5 is a schematic diagram of the ATCCOMPARE success handling process provided in the embodiments of this disclosure;

[0019] Figure 6 is a first schematic structural diagram of the data processing system (SMP) provided in the embodiments of this disclosure;

[0020] Figure 7 is a second schematic structural diagram of the data processing system provided in the embodiments of this disclosure;

[0021] Figure 8 is a third schematic structural diagram of the data processing system (CC-NUMA) provided in the embodiments of this disclosure.

Detailed Implementation

[0022] Exemplary embodiments will be described more fully below with reference to the accompanying drawings; however, these exemplary embodiments may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will enable those skilled in the art to fully understand the scope of this disclosure.

[0023] As used herein, the term “and/or” includes any and all combinations of one or more related enumerated entries.

[0024] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used herein, the singular forms “a” and “the” are also intended to include the plural forms unless the context clearly indicates otherwise. It will also be understood that when the terms “comprising” and/or “made of” are used in this specification, the presence of the said feature, integer, step, operation, element, and/or component is specified, but the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof is not excluded.

[0025] The embodiments described herein can be described with reference to plan views and/or cross-sectional views using the ideal schematic diagrams of this disclosure. Therefore, the example illustrations can be modified according to manufacturing techniques and/or tolerances. Therefore, the embodiments are not limited to those shown in the drawings, but include modifications to configurations formed based on manufacturing processes. Therefore, the areas illustrated in the drawings are schematic in nature, and the shapes of the areas shown in the figures illustrate specific shapes of areas of an element, but are not intended to be limiting.

[0026] Unless otherwise specified, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and this disclosure, and will not be interpreted as having an idealized or overly formal meaning, unless expressly so defined herein.

[0027] The data processing method and system according to embodiments of this disclosure are applicable to any data processing system involving atomic operations, such as SMP systems, cache-coherent non-uniform memory access (CC-NUMA) systems, etc. In the following examples, an SMP system is used as an example, but embodiments of this disclosure are not limited thereto.

[0028] Figure 6 is a schematic diagram of an SMP system architecture. As shown in Figure 6, the SMP system includes: a processing core, a level 2 cache (L2), a network-on-chip (NOC), a home node (HN), and memory. Both the L2 and HN are connected to the NOC, and the HN is connected to the memory. The L2 may include one or more cache units, and each cache unit may be connected to one or more processing cores. In the SMP system shown in Figure 6, L2 comprises three cache units: L2-0, L2-1, and L2-2. L2-0 is connected to processing core Core0, L2-1 to processing core Core1, and L2-2 to processing core Core2. In the SMP system, HN interacts with each cache unit of L2 to achieve cache coherence. HN manages the entire memory space in units of cache lines and tracks memory access for each L2 cache unit.

[0029] MESI denotes the four states of cache line data (Modified, Exclusive, Shared, Invalid). Cached data is managed by switching between these four states. The MESI states and their descriptions are shown in Table 1:

[0030] Table 1

[0031]
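The contents of Table 1 are not reproduced in this text. As a reference point only, the four MESI states can be sketched in Python as they are conventionally defined. The comment wording is the textbook description and is an assumption about what Table 1 contains; the names `Mesi` and `can_read_locally` are illustrative.

```python
from enum import Enum


class Mesi(Enum):
    """Conventional MESI cache-line states (textbook wording, not the patent's Table 1)."""
    M = "Modified"   # line is dirty and held by exactly one cache
    E = "Exclusive"  # line is clean and held by exactly one cache
    S = "Shared"     # line is clean and may be held by several caches
    I = "Invalid"    # line holds no usable data


def can_read_locally(state: Mesi) -> bool:
    """A cache unit can serve a read from any state except Invalid."""
    return state is not Mesi.I
```

Under this model, a cache unit whose copy has been moved to `Mesi.I` has to go back to HN before its processing core can read the data again, which is the cost the disclosure aims to avoid.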

[0032] HN can track the access status of all cached line data by maintaining a directory table (dir table). The data structure of the directory table (dir table) is shown in Table 2:

[0033] Table 2

[0034]

State          Bit width (VEC)
state[1:0]     vec[n-1:0]

[0035] In Table 2, each entry corresponds to a cache line of data, and the location of the cache line data can be determined based on the address information. An entry includes a state field and a bit width field (VEC). The state field records the state of the data read by the cache unit. The width of the VEC field equals the number n of cache units in L2: each bit corresponds to one L2 cache unit and indicates whether that cache unit stores the data.

[0036] The status field in Table 2 records only three states: E, S, and I. This is because after an L2 cache unit receives data in state E, the modification of that data is not notified to HN. Therefore, HN does not know whether the data stored in the L2 cache unit is in state E or state M. For example, dir.state = E, VEC = 'b001' indicates that the data in this cache line in L2-0 may be in state M, state E, state S (downgraded from state E to state S without notifying HN), or state I (data replaced from the cache unit without notifying HN); dir.state = S, VEC = 'b011' indicates that the data in this cache line in L2-0 and L2-1 may be in state S or state I (data replaced from the cache unit without notifying HN); dir.state = I indicates that no L2 cache unit stores this cache line data.
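The directory entry and the reasoning in paragraph [0036] can be illustrated with a minimal Python sketch. The class name `DirEntry` and its methods are hypothetical. Treating a cache unit whose VEC bit is clear as holding the line in state I is an assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    """One dir-table entry: state[1:0] plus an n-bit VEC field."""
    state: str  # "E", "S" or "I", as recorded by HN
    vec: int    # bit i set => HN believes cache unit L2-i stores the line

    def holders(self, n: int) -> list[int]:
        """Indices of the L2 cache units whose VEC bit is set."""
        return [i for i in range(n) if (self.vec >> i) & 1]

    def possible_states(self, unit: int) -> set[str]:
        """States the line may actually have in L2-<unit>, following [0036]."""
        if self.state == "I" or not (self.vec >> unit) & 1:
            return {"I"}  # assumption: an untracked unit holds nothing
        if self.state == "E":
            # HN is not told about E->M, E->S or silent replacement.
            return {"M", "E", "S", "I"}
        return {"S", "I"}  # a shared line may have been silently replaced
```

For the two examples in the text, `DirEntry("E", 0b001).possible_states(0)` gives all four states, and `DirEntry("S", 0b011).holders(3)` gives `[0, 1]`.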

[0037] Figure 1 is a flowchart of the data processing workflow in the related art, specifically the atomic lock operation flow of AtomicCompare in an SMP system. As shown in Figure 1, the process includes the following steps:

[0038] Step 1: L2-0 sends an RDS (Read Data S, Shared State Data Read Request) message to HN to request to read data. After receiving the RDS message, HN returns a DATS (Data S, Shared Data Read Response) message and records the requested data as S state. After receiving the DATS message, L2-0 stores the requested data and changes the state of the data from I state to S state. Initially, the data state of each cache unit is I state.

[0039] Step 2: L2-1 sends an ATCCOMPARE (atomic comparison) message to HN to perform an AtomicCompare atomic operation. The ATCCOMPARE message carries address information, and HN queries memory to obtain the data corresponding to that address. On one hand, HN returns a DBIDRESP (Data Buffer ID Response) message to L2-1; on the other hand, HN sends a SNPtoI (Snoop to Invalid) message to L2-0 to listen to L2-0.

[0040] Step 3: After receiving the DBIDRESP message, L2-1 sends ATCCOMPARE_DAT (Atomic Compare Data) to HN. ATCCOMPARE_DAT includes exchange data and comparison data.

[0041] Step 4: After HN receives ATCCOMPARE_DAT, the comparison between the data corresponding to the address information and the comparison data fails.

[0042] Step 5: After receiving the SNPtoI message, L2-0 invalidates the data corresponding to the address information and returns an SNPRSPI (Snoop Response Invalid) message to HN.

[0043] Step 6: After HN receives the SNPRSPI message, it records the data corresponding to the address information as state I and returns a DATI (Data I, Invalid State Data Response) message to L2-1.

[0044] If the processing core Core0 corresponding to L2-0 still needs the data corresponding to this address information, it needs to resend the RDS message to HN to read the data, that is, it needs to execute step 7.

[0045] Step 7: L2-0 sends an RDS message to HN to request to read data. After receiving the RDS message, HN returns a DATS message and records the requested data as S state. After receiving the DATS message, L2-0 stores the requested data and changes the state of the data from I state to S state.

[0046] It should be noted that in step 4, if the comparison data in ATCCOMPARE_DAT successfully compares with the data corresponding to the address information, HN modifies the data corresponding to the address information in memory to the swapped data in ATCCOMPARE_DAT and continues to execute subsequent steps. If the processing core Core0 corresponding to L2-0 still needs to access the data corresponding to the address information, the data corresponding to the address information in memory has been modified. L2-0 needs to resend the RDS message to HN to obtain the modified data (i.e., the swapped data), which requires executing step 7.

[0047] The drawback of the relevant technical solution is that if the data comparison fails, the data corresponding to the address information in memory is not actually modified, that is, no data swapping operation is performed, but the data in L2-0 is still invalidated. If the subsequent processing core Core0 needs to access this data, it must re-initiate RDS to read the data, which will reduce the processing performance of the processing core.
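The related-art ordering in steps 2 to 6 can be modeled with a short Python sketch. The function and parameter names are hypothetical, and the two dictionaries are a simplified stand-in for memory and for the state held by L2-0; the sketch only illustrates that L2-0 loses its copy even when the comparison fails.

```python
def related_art_atomic_compare(memory, l2_0_state, addr, compare_data, swap_data):
    """Related-art ordering: HN snoops L2-0 without waiting for the comparison.

    memory      dict addr -> value (stands in for memory behind HN)
    l2_0_state  dict addr -> MESI letter held by L2-0 (simplified model)
    Returns (matched, trace), where trace lists the messages exchanged.
    """
    # Step 2: ATCCOMPARE arrives; HN answers DBIDRESP and snoops L2-0 at once.
    trace = ["ATCCOMPARE", "DBIDRESP", "SNPtoI"]
    # Step 3: L2-1 sends the comparison data and the exchange data.
    trace.append("ATCCOMPARE_DAT")
    # Step 4: HN compares; memory changes only on a match.
    matched = memory[addr] == compare_data
    if matched:
        memory[addr] = swap_data
    # Step 5: L2-0 invalidates its copy whatever the comparison result was.
    l2_0_state[addr] = "I"
    trace.append("SNPRSPI")
    # Step 6: HN answers L2-1.
    trace.append("DATI")
    return matched, trace


memory = {0x40: 5}
l2_0 = {0x40: "S"}
matched, trace = related_art_atomic_compare(memory, l2_0, 0x40, compare_data=7, swap_data=9)
print(matched, memory[0x40], l2_0[0x40])  # False 5 I
```

Memory is unchanged after the failed comparison, yet L2-0 ends in state I and would need step 7 to read the same value again.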

[0048] To address the aforementioned technical problems, this disclosure provides a data processing method applied to a data processing system where lock performance is particularly important. The data processing system can be an SMP system or a Cache-Coherent Non-Uniform Memory Access (CC-NUMA) system. The following explanation uses the data processing system (SMP) shown in Figure 6 as an example, where L2-1 is the first cache unit, L2-0 is the second cache unit, and the data processing device that interacts with L2-1 and L2-0 is HN. As shown in Figure 2, the data processing method can be executed by a data processing device in a data processing system, and includes the following steps:

[0049] Step S11: Receive the data exchange request message sent by the first cache unit, obtain the address information carried in the data exchange request, and determine the first target data corresponding to the address information in memory.

[0050] In this step, L2-1 sends an ATCCOMPARE message carrying address information to HN, requesting HN to modify the data corresponding to that address in memory. HN then queries memory based on the address information carried in the ATCCOMPARE message to obtain the first target data corresponding to that address.

[0051] Step S12: Send a notification message to the first cache unit.

[0052] In this step, HN allocates a local data buffer to store data subsequently sent from L2-1. The notification message is a Data Buffer Identifier Response (DBIDRESP) message. After allocating the data buffer, HN returns a DBIDRESP message to L2-1, indicating that the data buffer allocation is complete and that exchange and comparison data can be sent to HN.

[0053] Step S13: Receive the atomic comparison data notification message sent by the first cache unit, and obtain the second target data and the third data carried in the atomic comparison data notification message. The third data is the data exchanged with the first target data (i.e., the exchanged data), and the second target data is the data that the second cache unit has requested to read.

[0054] The second target data is the comparison data, which is the data that the second cache unit has previously requested to read from HN and recorded as shared data. The second cache unit and the first cache unit are cache units at the same level. In this embodiment, the second cache unit L2-0 and the first cache unit L2-1 are both cache units of the second-level cache L2.

[0055] In this step, L2-1 sends ATCCOMPARE_DAT to HN. ATCCOMPARE_DAT includes second target data and third data, which can be stored in the cache space allocated in step S12. The third data is the data exchanged with the first target data, which is the modified data at the corresponding address in memory (i.e., the exchanged data). The second target data is the data that other cache units have previously requested to read from HN, i.e., the data that L2-0 has accessed. The state of this data recorded in L2-0 is S state.

[0056] Step S14: If the second target data is not equal to the first target data, return a data exchange response message carrying the third data to the first cache unit.

[0057] In this step, HN compares the second target data with the first target data. If they are not equal, it means that the data to be modified by L2-1 is different from the data previously requested to be read by L2-0. HN then returns a DATI message to L2-1, which is a response to the ATCCOMPARE message. It should be noted that if the second target data is not equal to the first target data, the DATI message carries third data, indicating that HN has not written the third data into memory.

[0058] The data processing method provided in this disclosure, upon receiving a data exchange request message sent by a first cache unit, determines the corresponding first target data in memory based on the address information carried therein, and sends a notification message to the first cache unit to allow the first cache unit to send second target data and third data. The third data is the data exchanged with the first target data, and the second target data is the data already requested by the second cache unit. If the second target data is not equal to the first target data, a data exchange response message carrying the third data is returned to the first cache unit. This disclosure first performs data comparison and then decides whether to initiate data monitoring of other cache units. If the data comparison fails, data monitoring of other cache units is not initiated, and other cache units no longer invalidate the corresponding data. This ensures that other processing cores can use the data, thereby reducing bus bandwidth overhead and chip power consumption. Furthermore, when other processing cores access the data, they can directly read the data from the other cache unit without re-accessing memory, further reducing bus bandwidth overhead and chip power consumption.
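Steps S11 to S14 can be sketched from HN's point of view as follows. This is a minimal Python model under stated assumptions: the function name, the dictionary standing in for memory, and the returned `sent` list are all hypothetical, and only the mismatch branch is modeled.

```python
def handle_atomic_compare_mismatch(memory, addr, compare_data, swap_data):
    """HN-side sketch of steps S11-S14 (comparison-failure branch only).

    Returns (response, sent): response is the DATI reply on a mismatch or
    None when the data match; sent lists the messages HN has sent so far.
    """
    first_target = memory[addr]      # S11: first target data for the address
    sent = ["DBIDRESP"]              # S12: notification to the first cache unit
    second_target = compare_data     # S13: comparison data from ATCCOMPARE_DAT
    third_data = swap_data           # S13: exchange data from ATCCOMPARE_DAT
    if second_target != first_target:
        # S14: no snoop is issued and memory is left untouched.
        sent.append("DATI")
        return {"msg": "DATI", "data": third_data}, sent
    return None, sent                # match: handled by the success branch
```

For example, with `memory = {0x80: 1}`, calling `handle_atomic_compare_mismatch(memory, 0x80, 2, 3)` returns a DATI response carrying `3`, the `sent` list contains no `SNPtoI`, and `memory[0x80]` is still `1`.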

[0059] In some embodiments, as shown in Figure 3, after receiving the second target data and the third data sent by the first cache unit (i.e., step S13), the data processing method may further include the following steps:

[0060] Step S21: If the second target data is equal to the first target data, a data listening message is sent to the second cache unit so that the second cache unit can modify the state of the second target data to invalid state.

[0061] In this step, HN compares the second target data with the first target data. If the second target data is equal to the first target data, it means that the data that L2-1 needs to modify is the same as the data that L2-0 previously requested to read. In this case, HN initiates data listening to L2-0 by sending an SNPtoI message to L2-0 so that L2-0 invalidates the second target data, that is, L2-0 changes the state of the second target data from state S to state I.

[0062] Step S22: Receive the data listening response message sent by the second cache unit.

[0063] In this step, after L2-0 changes the state of the second target data from S state to I state, it returns an SNPRSPI message to HN. The SNPRSPI message is a response message to the SNPtoI message, used to inform HN that the second target data has changed to I state.

[0064] Step S23: Write the third data into memory according to the address information, and change the status of the data corresponding to the address information to invalid.

[0065] In this step, HN writes the third data into the memory at the location corresponding to the address information, that is, updates the first target data with the third data, and modifies the status of the corresponding data to the I state in Table 2.

[0066] Step S24: Return a data exchange response message carrying the first target data to the first cache unit.

[0067] In this step, HN returns a DATI message to L2-1. The DATI message is a response to the ATCCOMPARE message. It's important to note that if the second target data is equal to the first target data, the DATI message still carries the first target data, indicating that HN has modified the first target data in memory to the third data, effectively invalidating the first target data. Thus, L2-1 uses the third data to exchange data with the first target data in memory, thereby modifying the data in memory that has already been accessed by L2-0.
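The success branch (steps S21 to S24) can be sketched in the same style. The names below are hypothetical, and the snoop exchange with L2-0 is collapsed into a direct state update, which is a simplification of the SNPtoI / SNPRSPI handshake described above.

```python
def handle_atomic_compare_match(memory, dir_state, l2_0_state, addr,
                                compare_data, swap_data):
    """HN-side sketch of steps S21-S24; returns None if the data do not match.

    memory      dict addr -> value
    dir_state   dict addr -> state recorded by HN (Table 2 state field)
    l2_0_state  dict addr -> state held by the second cache unit L2-0
    """
    first_target = memory[addr]
    if compare_data != first_target:
        return None                   # failure branch, not modeled here
    l2_0_state[addr] = "I"            # S21: SNPtoI makes L2-0 go from S to I
    # S22: HN receives SNPRSPI from L2-0 (implicit in this model).
    memory[addr] = swap_data          # S23: write the third data to memory
    dir_state[addr] = "I"             # S23: record the line as invalid
    return {"msg": "DATI", "data": first_target}  # S24: old value goes back


memory, dir_state, l2_0 = {0x80: 1}, {0x80: "S"}, {0x80: "S"}
reply = handle_atomic_compare_match(memory, dir_state, l2_0, 0x80, 1, 3)
print(reply["data"], memory[0x80], l2_0[0x80])  # 1 3 I
```

The reply carries the first target data (the value before the exchange), while memory now holds the third data.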

[0068] In some embodiments, before receiving the data exchange request message sent by the first cache unit (i.e., step S11), the data processing method further includes the following steps:

[0069] Step S31: Receive the data read request message sent by the second cache unit, and modify the status of the requested second target data in the second cache unit to the shared status.

[0070] In this step, L2-0 sends an RDS message to HN to request the reading of the second target data. HN then changes the status of the second target data to the S status in Table 2.

[0071] Step S32: Return the second target data to the second cache unit so that the second cache unit can store the second target data and modify the state of the second target data to a shared state.

[0072] In this step, HN sends a DATS message to L2-0. The DATS message is a response message to the RDS message, which carries the second target data. L2-0 stores the second target data and changes the status of the second target data from the I state to the S state.
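Steps S31 and S32 can be sketched as a small HN-side read handler. The names are hypothetical. Setting the requester's VEC bit on a read is an assumption of this sketch, based on the description of Table 2; the text itself only says that HN changes the state to S.

```python
def handle_read_shared(memory, dir_table, requester, addr):
    """HN-side sketch of steps S31-S32 for an RDS request.

    dir_table maps addr -> (state, vec), mirroring the two fields of Table 2.
    requester is the index i of the requesting cache unit L2-i.
    """
    _, vec = dir_table.get(addr, ("I", 0))
    # S31: record the line as shared (assumption: also mark the requester in VEC).
    dir_table[addr] = ("S", vec | (1 << requester))
    # S32: the DATS response carries the requested data back to the cache unit.
    return {"msg": "DATS", "data": memory[addr]}
```

With an empty directory, `handle_read_shared({0x10: 42}, dir_table, 0, 0x10)` returns a DATS response carrying `42` and leaves `dir_table[0x10] == ("S", 0b001)`.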

[0073] It should be noted that after step S24 is executed, if the processing core Core0 corresponding to L2-0 still needs to access the data corresponding to the address information, and the data corresponding to the address information in memory has been modified to the third data, then L2-0 needs to resend the RDS message to HN to read the modified data (i.e. the third data), that is, it needs to execute steps S31-S32 again.

[0074] To clearly illustrate the technical solutions of the embodiments of this disclosure, the handling procedures for ATCCOMPARE failure and success are explained below in conjunction with Figure 4 and Figure 5. As shown in Figure 4, the ATCCOMPARE failure handling process includes the following steps:

[0075] In step S101, L2-0 needs to access the second target data. It sends an RDS message to HN and receives a DATS message to obtain the second target data. L2-0 caches the second target data and records its status as S state. After receiving the RDS message, HN modifies the status of the corresponding data to S state in Table 2.

[0076] In step S102, L2-1 sends an ATCCOMPARE message carrying address information to HN. After allocating data buffer space, HN returns a DBIDRESP message to L2-1 to notify L2-1 that data can be sent.

[0077] In step S103, after receiving the DBIDRESP message, L2-1 sends ATCCOMPARE_DAT to HN. ATCCOMPARE_DAT includes the second target data and the third data.

[0078] In step S104, after HN receives ATCCOMPARE_DAT, it compares the second target data with the first target data corresponding to the address information. Since the two are not equal, the data comparison fails.

[0079] In step S105, HN directly returns a DATI message to L2-1, which carries third data.

[0080] As can be seen from the above steps S101-S105, this embodiment of the present disclosure does not initiate data listening to L2-0 before performing data comparison. Accordingly, L2-0 does not modify the state of the second target data (it remains in the S state). In this way, the failure of data comparison does not affect the data cached in L2-0. The processing core Core0 corresponding to L2-0 can continue to access the data from L2-0 without having to re-initiate an RDS request to HN as in the existing scheme.

[0081] In this embodiment of the disclosure, when the Atomic Compare data comparison fails, no listening message is sent to other cache units (i.e., L2-0). In this way, other cache units will not invalidate the data, so other processing cores (Core0) are not prevented from using it. Other processing cores (Core0) do not need to re-access memory when accessing the data, thereby reducing bus bandwidth overhead and reducing chip power consumption.
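The bandwidth saving on a failed comparison can be made concrete by counting messages. The Python sketch below lists the messages named in the related-art steps 1 to 7 (assuming Core0 reads the data again, so step 7 is needed) and in steps S101 to S105, then takes the difference. The list and function names are hypothetical, and the count ignores message sizes.

```python
from collections import Counter

# Related art, failed comparison, Core0 reads the data again (steps 1-7).
RELATED_ART_ON_FAILURE = [
    "RDS", "DATS",                 # step 1: L2-0 reads the data
    "ATCCOMPARE", "DBIDRESP",      # step 2: request and buffer response
    "SNPtoI",                      # step 2: snoop sent to L2-0
    "ATCCOMPARE_DAT",              # step 3: comparison and exchange data
    "SNPRSPI",                     # step 5: snoop response
    "DATI",                        # step 6: response to L2-1
    "RDS", "DATS",                 # step 7: L2-0 has to read again
]

# This disclosure, failed comparison (steps S101-S105).
PROPOSED_ON_FAILURE = [
    "RDS", "DATS",                 # S101
    "ATCCOMPARE", "DBIDRESP",      # S102
    "ATCCOMPARE_DAT",              # S103
    "DATI",                        # S105
]


def avoided_messages(baseline, proposed):
    """Messages present in the baseline trace but not needed in the proposed one."""
    return Counter(baseline) - Counter(proposed)


avoided = avoided_messages(RELATED_ART_ON_FAILURE, PROPOSED_ON_FAILURE)
print(sorted(avoided.items()))
# [('DATS', 1), ('RDS', 1), ('SNPRSPI', 1), ('SNPtoI', 1)]
```

Under these assumptions, four of the ten messages disappear on a failed comparison: the snoop pair and the repeated read pair.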

[0082] As shown in Figure 5, the ATCCOMPARE success handling process includes the following steps:

[0083] In step S201, L2-0 needs to access the second target data. It sends an RDS message to HN and receives a DATS message to obtain the second target data. L2-0 caches the second target data and records its status as S. After receiving the RDS message, HN modifies the status of the corresponding data to S in Table 2.

[0084] In step S202, L2-1 sends an ATCCOMPARE message carrying address information to HN. After allocating data buffer space, HN returns a DBIDRESP message to L2-1.

[0085] In step S203, after receiving the DBIDRESP message, L2-1 sends ATCCOMPARE_DAT to HN. ATCCOMPARE_DAT includes the second target data and the third data.

[0086] In step S204, after HN receives ATCCOMPARE_DAT, it compares the second target data with the first target data corresponding to the address information. Since the two are equal, the data comparison is successful.

[0087] In step S205, HN sends an SNPtoI message to L2-0 to listen to L2-0.

[0088] In step S206, L2-0 receives the SNPtoI message, changes the state of the data corresponding to the address information from the S state to the I state, invalidates the data, and returns an SNPRSPI message to HN to inform HN that the data has been invalidated and can be modified.

[0089] In step S207, after HN receives the SNPRSPI message, it writes the third data into memory according to the address information, modifies the state of the data corresponding to the address information to the I state, and returns a DATI message to L2-1. The DATI message carries the first target data, and the Atomic operation is successful.

[0090] As can be seen from steps S201-S207, only a successful data comparison will affect the data cached in L2-0. L2-0 needs to invalidate the accessed data. Thus, if the processing core Core0 corresponding to L2-0 needs to access the data again later, since the data cached in L2-0 is invalid, L2-0 needs to re-send the RDS request to HN.
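The two outcomes can be put side by side in one small end-to-end model. This Python sketch is illustrative only: the class `HomeNode`, its methods, and the representation of L2-0 as a dictionary of `(state, value)` pairs are assumptions, and all message exchanges are reduced to direct method calls.

```python
class HomeNode:
    """Toy HN that compares first and snoops only on a successful comparison."""

    def __init__(self, memory):
        self.memory = memory      # addr -> value
        self.dir_state = {}       # addr -> state recorded by HN

    def read_shared(self, cache, addr):
        """RDS / DATS: the cache unit stores the data in S state."""
        self.dir_state[addr] = "S"
        cache[addr] = ("S", self.memory[addr])

    def atomic_compare(self, sharer, addr, compare_data, swap_data):
        """Return the data carried by DATI for the requesting cache unit."""
        first_target = self.memory[addr]
        if compare_data != first_target:
            return swap_data              # failure: no snoop, sharer untouched
        sharer[addr] = ("I", None)        # success: SNPtoI invalidates the sharer
        self.memory[addr] = swap_data
        self.dir_state[addr] = "I"
        return first_target


hn = HomeNode({0x40: 5})
l2_0 = {}
hn.read_shared(l2_0, 0x40)                 # S201
hn.atomic_compare(l2_0, 0x40, 7, 9)        # failed comparison
print(l2_0[0x40])                          # ('S', 5)
hn.atomic_compare(l2_0, 0x40, 5, 9)        # successful comparison (S204-S207)
print(l2_0[0x40])                          # ('I', None)
hn.read_shared(l2_0, 0x40)                 # L2-0 must send RDS again
print(l2_0[0x40])                          # ('S', 9)
```

Only the successful comparison forces L2-0 back to HN, and the value it then reads is the third data written during the exchange.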

[0091] This disclosure improves the atomic lock operation process of AtomicCompare, enabling its application in SMP, CC-NUMA, and other systems where atomic lock performance is particularly important. This disclosure improves AtomicCompare performance while reducing bandwidth overhead and chip power consumption.

[0092] In this embodiment, in the event of a data comparison failure, it is not necessary to monitor other cache units storing S-state data. This avoids invalidating the data stored in other cache units after a data comparison failure, ensuring that the corresponding processing core can continue to read data from that cache unit. Monitoring of other cache units storing S-state data is only initiated when the data comparison is successful, thereby improving the performance of the processing core. Simultaneously, it reduces the additional bandwidth and power consumption caused by cache unit monitoring and the processing core re-accessing memory.

[0093] This disclosure also provides a data processing apparatus, which includes one or more processors and a storage device; wherein the storage device stores one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing methods provided in the foregoing embodiments.

[0094] This disclosure also provides a data processing system, which includes at least one processing chip, and the processing chip includes the data processing device as described above.

[0095] In some embodiments, the processing system further includes memory and at least two levels of cache, each level including at least one cache unit. The data processing device may be a home node (HN), or it may be one of the cache units at a different level from the first and second cache units. That is, the embodiments of this disclosure are not limited to cache units accessing memory data through interaction with the HN, but may also include cache units accessing memory data through interaction with cache units at other levels.

[0096] It should be noted that the processing system may also include a network-on-chip (NOC), through which different levels of cache units and the home node can communicate.

[0097] As shown in Figure 7, the processing system includes a level 2 cache (L2), a level 3 cache (L3), and a level 4 cache (L4). L2 comprises three cache units: L2-0, L2-1, and L2-2. L2-0 is connected to processing core Core0, L2-1 is connected to processing core Core1, and L2-2 is connected to processing core Core2. L3 comprises two cache units: L3-1 and L3-2. The data processing device can be the home node HN or a cache unit of L4.

[0098] The data processing system can be a symmetric multiprocessor (SMP) system or a cache-coherent non-uniform memory access (CC-NUMA) system.

[0099] In some embodiments, there may be at least two processing chips, each processing chip being connected to at least one other processing chip, and the processing chip to which the data processing device belongs may be the same as or different from the processing chip to which the first cache unit and the second cache unit belong.

[0100] In some embodiments, the processing chip includes, but is not limited to: a central processing unit (CPU), a graphics processing unit (GPU), an embedded neural network processor (NPU), and a data processing unit (DPU).

[0101] Data processing systems that include multiple processing chips, such as CC-NUMA systems, allow cache units to access memory data on the same processing chip as well as memory data on other processing chips, thus enabling cross-CPU access.

[0102] In a CC-NUMA system, distributed memory is interconnected to form a single memory. There is no page copying or data copying between memory locations, nor is there software message passing. A processor chip accesses its local memory relatively quickly, whereas accessing remote memory belonging to another processor chip is slower because of the additional latency introduced by the interconnect network. CC-NUMA uses a single memory image, with the memory of each processor chip physically connected via copper cables and certain intelligent hardware. Cache coherence means that no software is needed to maintain the consistency of multiple data copies, nor is software required to implement data transfer between the operating system and application systems. As in SMP systems, a CC-NUMA system manages a single operating system and multiple processors entirely at the hardware level.

[0103] As shown in Figure 8, the CC-NUMA system includes four processing chips, with every pair of processing chips connected to each other. A cache unit within one of the processing chips can access memory data across chips. It should be noted that this disclosure does not limit the connection method of the processing chips in the CC-NUMA system; for example, the processing chips can also be connected in series.
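
The difference between the two connection methods mentioned here can be sketched in Python by counting the chip-to-chip links that a cross-chip access has to cross. The chip names and the functions `pairwise`, `series`, and `hops` are hypothetical, and the pairwise layout is an assumed reading of Figure 8; the disclosure does not specify link counts or latencies.

```python
from collections import deque
from itertools import combinations


def pairwise(chips):
    """Every pair of chips is directly linked (assumed reading of Figure 8)."""
    links = {c: set() for c in chips}
    for a, b in combinations(chips, 2):
        links[a].add(b)
        links[b].add(a)
    return links


def series(chips):
    """Chips linked in a chain: chip0 - chip1 - chip2 - ..."""
    links = {c: set() for c in chips}
    for a, b in zip(chips, chips[1:]):
        links[a].add(b)
        links[b].add(a)
    return links


def hops(links, src, dst):
    """Breadth-first search for the number of chip-to-chip links crossed."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError("chips are not connected")
```

With four chips, any cross-chip access in the pairwise layout crosses a single link, whereas in the series layout an access from the first chip to the last crosses three links.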

[0104] This disclosure also provides a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed, implements the data processing methods provided in the foregoing embodiments.

[0105] It will be understood by those skilled in the art that all or some of the steps in the methods disclosed above, and the functional modules / units in the apparatus, can be implemented as software, firmware, hardware, and suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. 
Furthermore, it is well known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

[0106] Example embodiments have been disclosed herein, and while specific terminology has been used, it is for illustrative purposes only and should be construed as such, and is not intended to be limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and / or elements described in conjunction with particular embodiments may be used alone, or in combination with features, characteristics, and / or elements described in conjunction with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention as set forth in the appended claims.

Claims

1. A data processing method, characterized in that the method is applied to a data processing device in a data processing system, the method comprising:
receiving a data exchange request message sent by a first cache unit, obtaining the address information carried in the data exchange request message, and determining first target data corresponding to the address information in memory, wherein the data exchange request message is used to request the data processing device to modify the first target data corresponding to the address information in memory;
sending a notification message to the first cache unit;
receiving second target data and third data sent by the first cache unit, wherein the third data is data to be exchanged with the first target data, and the second target data is data that a second cache unit has requested to read and that is recorded as shared data;
if the second target data is not equal to the first target data, returning a data exchange response message carrying the third data to the first cache unit, to inform the first cache unit that the data processing device has not written the third data into memory;
if the second target data is equal to the first target data, sending a data monitoring message to the second cache unit so that the second cache unit modifies the state of the second target data to an invalid state;
receiving a data monitoring response message sent by the second cache unit;
writing the third data into the memory according to the address information, and modifying the state of the data corresponding to the address information to an invalid state; and
returning a data exchange response message carrying the first target data to the first cache unit, to inform the first cache unit that the data processing device has modified the first target data to the third data in memory and that the state of the first target data has been modified to invalid.

2. The method as described in claim 1, characterized in that the data monitoring message is an invalid monitoring message, and the data monitoring response message is an invalid status monitoring response message.

3. The method as described in claim 1, characterized in that, before receiving the data exchange request message sent by the first cache unit, the method further comprises: receiving a shared state data read request message sent by the second cache unit, and modifying the state of the second target data requested by the second cache unit to the shared state; and returning, to the second cache unit, a shared state data read response message carrying the second target data, so that the second cache unit stores the second target data and modifies the state of the second target data to the shared state.

4. The method as described in claim 1, characterized in that the data exchange request message is an atomic comparison message, the notification message is a data cache identifier response message, and the data exchange response message is an invalid state data response message.

5. The method according to any one of claims 1-4, characterized in that the data processing system includes a processing chip, memory, and at least two levels of cache; the processing chip includes the data processing device, and each level of cache includes at least one cache unit; and the data processing device is a master node, or the data processing device is one of the cache units at a different level from the first cache unit and the second cache unit.

6. A data processing apparatus, characterized in that it comprises: one or more processors; and a storage device on which one or more programs are stored; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method as described in any one of claims 1-5.

7. A data processing system, characterized in that it includes at least one processing chip, said processing chip comprising the data processing apparatus as described in claim 6.

8. The data processing system as described in claim 7, characterized in that the data processing system further includes memory and at least two levels of cache, each level of cache including at least one cache unit, the data processing device being a master node, or the data processing device being one of the cache units at a different level from the first cache unit and the second cache unit.

9. The data processing system as described in claim 7, characterized in that the data processing system is a symmetric multiprocessor system or a cache-coherent non-uniform memory access system.

10. The data processing system as described in claim 9, characterized in that there are at least two processing chips, and each processing chip is connected to at least one other processing chip; the processing chip to which the data processing device belongs may be the same as or different from the processing chips to which the first cache unit and the second cache unit belong.

11. The data processing system according to any one of claims 7-10, characterized in that the processing chip is one of the following: a central processing unit, a graphics processing unit, an embedded neural network processor, or a data processing unit.

12. A computer-readable medium having a computer program stored thereon, wherein, when the computer program is executed, it implements the data processing method as described in any one of claims 1-5.