Memory data access method and apparatus, electronic device, and storage medium
By aligning the starting address of memory data to a target access granularity, the method addresses memory access delays in AI chips, enabling high-performance optimizations and enhancing data processing efficiency.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Applications
- Current Assignee / Owner: SHANGHAI BIREN TECH CO LTD
- Filing Date: 2025-03-10
- Publication Date: 2026-06-22
AI Technical Summary
Artificial intelligence chips have limited memory bandwidth, so calling a large number of operators delays the memory access process. High-performance optimization techniques are effective only when the starting address of memory is aligned to a specific access granularity; otherwise, memory access performance deteriorates.
A memory data access method performs an offset operation on the first starting address to obtain a target starting address aligned according to a target access granularity, which allows high-performance optimization techniques to be applied and thereby improves memory access efficiency.
The method enhances memory access performance by enabling the use of high-performance optimization techniques, such as specific layout schemes and burst transfer optimizations, leading to improved data processing efficiency.
Abstract
Description
Technical Field
[0001] This application claims the benefit of priority of Chinese Patent Application No. 202410289823.4 filed on March 14, 2024, and incorporates by reference in its entirety the content disclosed in the above-mentioned Chinese Patent Application.
[0002] Embodiments of the present disclosure relate to a memory data access method, a memory data access device, an electronic device, and a storage medium.
Background Art
[0003] In the field of artificial intelligence (AI), as deep learning is applied to an ever wider range of scenarios, AI computing tasks also need to be deployed on different computing devices and hardware architectures. For example, optimizing the computing performance of large-scale online deployment models translates directly into savings in computing cost. At the same time, for AI computing tasks, performance optimization gives algorithms more room, enabling algorithm developers to try larger or more complex models within a reasonable time. As models are continuously optimized and algorithms improved, the actual deployment and training scenarios of artificial intelligence models are also placing higher demands on hardware performance.
Summary of the Invention
[0004] At least one embodiment of the present disclosure provides a memory data access method. A first starting address of memory data is aligned according to a first access granularity, and the memory data access method includes: performing an offset operation on the first starting address to obtain a target starting address, where the target starting address is aligned according to a target access granularity; and performing a target memory access operation on target memory data according to the target access granularity, where the target memory data is the data after the target starting address in the memory data.
[0005] For example, in a memory data access method provided in at least one embodiment of the present disclosure, the target access granularity is greater than the first access granularity.
[0006] For example, in a memory data access method provided in at least one embodiment of the present disclosure, a first starting address is stored in a first pointer, and performing the offset operation on the first starting address to obtain a target starting address includes adding a target offset amount to the first pointer to obtain a target pointer, the target starting address being stored in the target pointer.
[0007] For example, a memory data access method provided in at least one embodiment of the present disclosure further includes performing a first memory access operation on first memory data from a first starting address to a target starting address in the memory data, according to a first access granularity.
[0008] For example, in a memory data access method provided in at least one embodiment of the present disclosure, the memory data further includes second memory data to be accessed or stored, and the second starting address of the second memory data is aligned according to a second access granularity. The memory data access method further includes performing a second memory access operation on the second memory data according to the second access granularity.
[0009] For example, in a memory data access method provided in at least one embodiment of the present disclosure, the second access granularity is the same as the first access granularity, or the second access granularity is smaller than the target access granularity and larger than the first access granularity.
[0010] For example, in a memory data access method provided in at least one embodiment of the present disclosure, a plurality of data elements in memory data are arranged according to a first layout scheme, and performing a target memory access operation on target memory data according to a target access granularity includes setting the arrangement order of data elements in the target layout scheme as a target memory access order based on a target starting address and target access granularity, and performing a target memory access operation on the target memory data according to the target memory access order and target access granularity.
[0011] For example, in a memory data access method provided in at least one embodiment of the present disclosure, a plurality of data elements in memory data are arranged according to a first layout scheme, and a first memory access operation is performed on the first memory data according to a first access granularity, which includes setting the arrangement order of the data elements in the first layout scheme as the first memory access order, and performing the first memory access operation on the first memory data according to the first memory access order and according to the first access granularity.
[0012] For example, in a memory data access method provided in at least one embodiment of the present disclosure, the target memory data includes M × N data elements, where the size of each of the M × N data elements equals the target access granularity, and M and N are positive integers. Performing a target memory access operation on the target memory data according to the target access granularity includes performing M sub-memory access operations on the target memory data based on the target starting address and the target access granularity, where each of the M sub-memory access operations accesses or stores N data elements.
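The M sub-memory access operations can be sketched as a list of byte ranges. This is a minimal sketch that assumes the M × N data elements are stored contiguously and visited in address order, which the disclosure does not state; the function name `sub_access_ranges` and the sample values are hypothetical.

```python
from typing import List, Tuple


def sub_access_ranges(target_start: int, granularity: int,
                      m: int, n: int) -> List[Tuple[int, int]]:
    """Byte range [start, stop) covered by each of the M sub-operations.

    Assumption: elements are contiguous, and each sub-operation
    covers N consecutive elements of `granularity` bytes each.
    """
    span = n * granularity  # bytes handled by one sub-operation
    return [(target_start + i * span, target_start + (i + 1) * span)
            for i in range(m)]


# Hypothetical values: 128-byte granularity, M = 3, N = 2.
for start, stop in sub_access_ranges(0x1080, 128, m=3, n=2):
    print(hex(start), hex(stop))
```

With these values, three sub-operations cover 3 × 2 × 128 = 768 bytes starting at the aligned address.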
[0013] For example, in a memory data access method provided in at least one embodiment of the present disclosure, the first memory data includes Y data elements, where the size of each of the Y data elements equals the first access granularity and Y is a positive integer. Performing a first memory access operation on the first memory data according to the first access granularity includes sequentially accessing or storing the Y data elements in the first memory data based on the first starting address and the first access granularity.
[0014] At least one embodiment of the present disclosure further provides a memory data access device in which a first starting address of memory data is aligned according to a first access granularity, comprising: an offset module configured to perform an offset operation on the first starting address to obtain a target starting address, the target starting address being aligned according to a target access granularity; and a first execution module configured to perform a target memory access operation on target memory data after the target starting address in the memory data, according to the target access granularity.
[0015] For example, in a memory data access device provided in at least one embodiment of the present disclosure, a first starting address is stored in a first pointer, and the offset module is configured to further obtain a target pointer by adding a target offset amount to the first pointer, and the target starting address is stored in the target pointer.
[0016] For example, a memory data access device provided in at least one embodiment of the present disclosure further includes a second execution module configured to perform a first memory access operation on first memory data from a first starting address to a target starting address in the memory data, according to a first access granularity.
[0017] For example, in a memory data access device provided in at least one embodiment of the present disclosure, the memory data further includes second memory data to be accessed or stored, wherein a second starting address of the second memory data is aligned according to a second access granularity, and the memory data access device further includes a third execution module configured to perform a second memory access operation on the second memory data according to the second access granularity.
[0018] At least one embodiment of the present disclosure further provides an electronic device, the electronic device comprising a processor and a memory containing one or more computer program modules, the one or more computer program modules being stored in the memory and configured to be executed by the processor, and the one or more computer program modules being used to implement a memory data access method provided in any embodiment of the present disclosure.
[0019] At least one embodiment of the present disclosure further provides a storage medium. The storage medium stores non-transitory computer-readable instructions, and a memory data access method provided in any embodiment of the present disclosure is implemented when the non-transitory computer-readable instructions are executed by a computer.
Brief Description of the Drawings
[0020] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings of the embodiments are briefly introduced below. Clearly, the accompanying drawings in the following description relate only to some embodiments of this disclosure and do not limit this disclosure.
[Figure 1] A schematic diagram of the memory data access process.
[Figure 2] An exemplary flowchart of a memory data access method provided in at least one embodiment of the present disclosure.
[Figure 3] A schematic diagram of an example of a memory data access method provided in at least one embodiment of the present disclosure.
[Figure 4A] A schematic diagram of another example of a memory data access method provided in at least one embodiment of the present disclosure.
[Figure 4B] A schematic diagram of yet another example of a memory data access method provided in at least one embodiment of the present disclosure.
[Figure 5] A schematic diagram of still another example of a memory data access method provided in at least one embodiment of the present disclosure.
[Figure 6] A schematic block diagram of a memory data access device provided in at least one embodiment of the present disclosure.
[Figure 7] A schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure.
[Figure 8] A schematic block diagram of another electronic device provided in at least one embodiment of the present disclosure.
[Figure 9] A schematic diagram of a storage medium provided in at least one embodiment of the present disclosure.
Embodiments for Carrying Out the Invention
[0021] To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, hereinafter, in conjunction with the accompanying drawings of the embodiments of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art without creative efforts based on the described embodiments of the present disclosure shall fall within the protection scope of the present disclosure.
[0022] Unless otherwise defined, technical terms or scientific terms used in this disclosure shall have the ordinary meanings understood by those of ordinary skill in the field to which this disclosure pertains. The terms "first," "second," and the like used in this disclosure do not represent any order, quantity, or importance, but are merely used to distinguish different components. Similarly, words such as "one," "a," or "the" do not represent a quantity limitation, but indicate that at least one exists. Words such as "comprising" or "including" mean that the elements or items appearing before the word include the elements or items listed after the word and their equivalents, and do not exclude other elements or items. Words such as "connected" or "associated" are not limited to physical or mechanical connections, and can include electrical connections whether direct or indirect. "Above," "below," "left," "right," etc. are merely used to represent relative positional relationships, and after the absolute position of the object being described changes, the relative positional relationships may also change accordingly.
[0023] Hereinafter, the present disclosure will be described by way of several specific examples. In order to keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components can be omitted. If any component of the embodiments of the present disclosure appears in one or more of the accompanying drawings, the component is represented by the same or similar reference numerals in each of the accompanying drawings.
[0024] In the training process of an artificial intelligence model, it is necessary to call a plurality of operators in the AI model. Since the number of operators in the AI model is large and the memory bandwidth of the artificial intelligence chip is limited, the process of calling a large number of operators causes delays in the memory access process, and it is necessary to adopt specific optimization means to optimize the memory access process.
[0025] For example, in the dynamic graph computation mode of Eager Execution, each operator is executed immediately rather than being placed first in the static computation graph. Compared to the static computation graph mode, the dynamic graph computation mode allows for the individual calling of single operators, and if an error occurs in the output result, the computation process at each step can be traced to identify the problem area, thus facilitating debugging by the user and making it widely used in the AI model training process. However, the single-operator calling mode is limited by memory bandwidth, so specific optimization measures must be employed to improve the memory access performance of the artificial intelligence chip.
[0026] For high-performance optimization techniques to be used, the starting address of memory must meet certain conditions. For example, the starting address of memory must be aligned to a relatively large access granularity value (128 bytes, 512 bytes, 1K bytes, or 2K bytes, etc.). For instance, if a specific memory access (load / store) instruction is used in a high-performance optimization technique, and the starting address of the memory data is aligned to the access granularity required by the high-performance optimization technique, the processor can complete the access and storage of data of that access granularity with only one corresponding memory access instruction. This improves bandwidth performance. However, if the starting address of the memory pointer does not meet the alignment condition, the processor can only access or store data in memory at the basic access granularity and cannot use the corresponding high-performance optimization technique. As a result, the memory access performance of the chip deteriorates.
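The alignment condition described above amounts to a modulo test on the starting address. The following is a minimal sketch; the function name `is_aligned` and the sample address are illustrative assumptions, not taken from the disclosure.

```python
def is_aligned(address: int, granularity: int) -> bool:
    """Return True if `address` is a multiple of `granularity` (in bytes)."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return address % granularity == 0


# Hypothetical starting address, chosen only for illustration.
base = 0x1004
print(is_aligned(base, 4))    # True: aligned to a 4-byte basic granularity
print(is_aligned(base, 128))  # False: not aligned to a 128-byte granularity
```

An address such as this one can be served only at the basic granularity unless it is first moved to an aligned position.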
[0027] Figure 1 is a schematic diagram of the memory data access process. For example, as shown in Figure 1, the starting address pointer points to the starting address of the memory data to be accessed or stored. Because the starting address pointer is not aligned to the access granularity required by the high-performance optimization means, the execution unit in the processor can only access or store memory data at the basic access granularity. For example, by using only general-purpose memory access instructions, only a small portion of the data (e.g., data at the basic access granularity) can be accessed or stored each time. Therefore, memory access efficiency decreases, and the memory access performance of the chip is significantly reduced.
[0028] At least one embodiment of the present disclosure provides a method for accessing memory data. The first starting address of the memory data is aligned according to a first access granularity. The memory data access method includes: performing an offset operation on the first starting address to obtain a target starting address, the target starting address being aligned according to a target access granularity; and performing a target memory access operation on the target memory data after the target starting address in the memory data, according to the target access granularity.
[0029] At least one embodiment of the present disclosure further provides a memory data access device, an electronic device, and a storage medium, which are used to implement the memory data access method of the above-described embodiment.
[0030] A method, apparatus, electronic device, and storage medium provided by at least one embodiment of the present disclosure obtain a target starting address aligned according to the target access granularity by offsetting a first starting address. Since the target starting address can satisfy the alignment conditions required by the high-performance optimization means, the memory data access method can use the high-performance optimization means to access or store target memory data at the target access granularity, thereby improving the memory access performance of the chip.
[0031] Hereinafter, at least one embodiment of the present disclosure will be described in detail with reference to the attached drawings. Note that the same reference numerals in different attached drawings refer to the same elements that have already been described.
[0032] Figure 2 is an illustrative flowchart of a memory data access method provided by at least one embodiment of the present disclosure. For example, as shown in Figure 2, at least one embodiment of the present disclosure provides a memory data access method in which the first starting address of the memory data is aligned according to a first access granularity. For example, the memory data access method includes the following steps S110 to S120.
[0033] Step S110: Perform an offset operation on the first starting address to obtain the target starting address. Step S120: Execute the target memory access operation on the target memory data after the target starting address in the memory data, according to the target access granularity.
[0034] For example, the first access granularity is the basic access granularity of memory. The basic access granularity can be equal to the size of the data type: for example, a double-precision floating-point (double) type is 8 bytes, a single-precision floating-point (float) type is 4 bytes, a BF16 type is 2 bytes, and a character (char) type is 1 byte, but the embodiments of this disclosure do not limit this. For example, if the first starting address of memory data is aligned according to the first access granularity and no processing is performed on the first starting address, the processor can only access and store the memory data according to the first access granularity (e.g., the basic access granularity).
[0035] For example, in step S110, a target starting address is obtained by performing an offset operation on the first starting address, and the target starting address is aligned according to the target access granularity. For example, the target access granularity can be the access granularity required by the high-performance optimization means. In some examples, the target access granularity is larger than the first access granularity. Based on the alignment conditions required by different optimization means, the target access granularity can be 128 bytes, 512 bytes, 1K bytes, or 2K bytes. The specific value can be selected as needed, and the embodiments of this disclosure are not limited thereto.
[0036] For example, in step S120, the target starting address serves as the starting address of the target memory data, that is, the portion of the memory data located after the target starting address. Since the target starting address can satisfy the alignment conditions required by the high-performance optimization means, the processor can use the high-performance optimization means to access or store (i.e., perform target memory access operations on) the target memory data at the target access granularity, thereby improving the memory access efficiency for the memory data.
[0037] For example, high-performance optimization means may include optimization means applied to a specific layout scheme and burst transfer optimization means, and the embodiments of this disclosure are not limited thereto. For example, all of these high-performance optimization means require that the starting address of the memory data to be accessed satisfies required alignment conditions. For example, when the target starting address of the target memory data satisfies certain alignment conditions (e.g., aligned according to the target access granularity), one or more of the above-mentioned optimization means may be applied depending on the actual situation, and the embodiments of this disclosure are not limited thereto.
[0038] In some examples, the first starting address is stored in the first pointer. For example, step S110 in Figure 2 may further include step S111. Step S111: The target pointer is obtained by adding the target offset amount to the first pointer. For example, the target starting address is stored in the target pointer.
[0039] For example, in step S111, in order to obtain the target starting address by offsetting the first starting address, the target pointer can be obtained by adding the target offset amount to the first pointer. Specifically, the target starting address of the target pointer can be obtained by adding the value obtained by multiplying the target offset amount by the size of the data type to the first starting address of the first pointer.
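The pointer arithmetic in step S111 can be sketched as follows. The disclosure states only that the target starting address equals the first starting address plus the target offset amount multiplied by the size of the data type. How the offset amount is chosen here (the smallest value that reaches the next aligned address) is an assumption, as are the function name and sample values.

```python
def target_offset_elements(first_address: int, element_size: int,
                           target_granularity: int) -> int:
    """Smallest number of elements to skip so the address becomes aligned.

    Assumption: the granularity is a multiple of the element size and
    the first address is aligned to the element size.
    """
    if first_address % element_size != 0:
        raise ValueError("first address must be aligned to the element size")
    if target_granularity % element_size != 0:
        raise ValueError("granularity must be a multiple of the element size")
    misalignment = first_address % target_granularity
    byte_gap = (target_granularity - misalignment) % target_granularity
    return byte_gap // element_size


first_address = 0x1004   # hypothetical first starting address
element_size = 4         # e.g. a 4-byte float
offset = target_offset_elements(first_address, element_size, 128)
# Target address = first address + offset amount * size of the data type.
target_address = first_address + offset * element_size
print(offset, hex(target_address))  # 31 0x1080
```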
[0040] In some examples, the memory data access method provided by at least one embodiment of the present disclosure further includes step S130. Step S130: Perform a first memory access operation on first memory data from a first starting address to a target starting address, according to a first access granularity.
[0041] For example, in step S130, the first starting address serves as the starting address of the first memory data, that is, the data between the first starting address and the target starting address. Since the first starting address is aligned only to the first access granularity (e.g., the basic access granularity), it does not satisfy the alignment conditions required by the high-performance optimization means, so the processor accesses and stores the first memory data at the first access granularity (i.e., performs the first memory access operation).
[0042] In some examples, the memory data further includes second memory data to be accessed or stored, and the second starting address of the second memory data is aligned according to a second access granularity. For example, a memory data access method provided by at least one embodiment of the present disclosure further includes step S140: Step S140: Perform a second memory access operation on the second memory data according to a second access granularity.
[0043] For example, the second memory data can be the remaining data after removing the first memory data and the target memory data from the memory data. That is, the memory data is divided into the first memory data, the target memory data, and the second memory data by the first starting address, the target starting address, and the second starting address.
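The three-way division can be sketched by rounding the start of the region up and its end down to multiples of the target access granularity. This is one possible way to place the boundaries, not necessarily the one used in the disclosure; the `Partition` type, the `partition` function, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    first_start: int   # start of the first memory data
    target_start: int  # start of the target memory data (aligned)
    second_start: int  # start of the second memory data
    end: int           # one past the last byte of the memory data


def partition(first_start: int, size: int, granularity: int) -> Partition:
    end = first_start + size
    # Round the start up and the end down to multiples of the granularity.
    target_start = -(-first_start // granularity) * granularity
    second_start = (end // granularity) * granularity
    if target_start >= second_start:
        # No full aligned block fits: treat everything as first memory data.
        target_start = second_start = end
    return Partition(first_start, target_start, second_start, end)


p = partition(0x1004, 1000, 128)
print(p.target_start - p.first_start)   # 124 bytes of first memory data
print(p.second_start - p.target_start)  # 768 bytes of target memory data
print(p.end - p.second_start)           # 108 bytes of second memory data
```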
[0044] For example, if the size of the remaining second memory data is small and insufficient to access or store it according to the target access granularity, in step S140, a second memory access operation is performed on the second memory data according to a smaller second access granularity.
[0045] In some cases, the second access granularity is the same as the first access granularity. For example, if both are basic access granularities, the processor can access and store the second memory data (i.e., perform the second memory access operation) according to the basic access granularity.
[0046] In some other examples, the second access granularity is smaller than the target access granularity but larger than the first access granularity. For example, even if the second access granularity does not satisfy the alignment conditions required by the high-performance optimization means used for the target memory data, if the second access granularity can satisfy the alignment conditions required by other optimization means, and the memory access performance using said optimization means is better than the memory access method that follows the basic memory access granularity, then it is possible to access or store the second memory data at the second access granularity (i.e., perform a second memory access operation) using said optimization means, thereby improving the memory access efficiency for the memory data.
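Choosing an intermediate second access granularity can be sketched as picking the largest supported granularity that the second starting address is aligned to and that still fits in the remaining data. The candidate list below (4, 32, 64, and 128 bytes) is purely hypothetical; the disclosure does not enumerate which intermediate granularities other optimization means might require.

```python
from typing import Sequence


def pick_granularity(address: int, remaining: int,
                     candidates: Sequence[int]) -> int:
    """Largest candidate that `address` is aligned to and that fits."""
    usable = [g for g in candidates
              if address % g == 0 and g <= remaining]
    if not usable:
        raise ValueError("no candidate granularity is usable")
    return max(usable)


# Hypothetical tail: 108 bytes left, starting at address 4992.
print(pick_granularity(4992, 108, [4, 32, 64, 128]))  # 64
```

Here 128 bytes does not fit in the remaining 108 bytes, so the sketch falls back to 64 bytes, which is still larger than a 4-byte basic granularity.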
[0047] For example, the second starting address can be stored in the second pointer. That is, in order to access or store memory data more efficiently, the starting address pointer of the memory data is divided into three pointers: a first pointer, a target pointer, and a second pointer. The first pointer points to the first starting address of the first memory data, the target pointer points to the target starting address of the target memory data, and the second pointer points to the second starting address of the second memory data. The specific division method is set according to the required access granularity, and the embodiments of this disclosure do not limit this.
[0048] In at least one embodiment of this disclosure, the first memory access operation, the target memory access operation, and the second memory access operation may be performed in the order of steps S130, S120, and S140 after step S110. Alternatively, steps S120, S130, and S140 may be performed simultaneously, or other execution orders may be selected as needed, and the embodiments of this disclosure are not limited thereto.
[0049] Figure 3 is a schematic diagram of an example of a memory data access method provided by at least one embodiment of the present disclosure. For example, Figure 3 is a specific example of the memory data access method shown in Figure 2. For example, as shown in Figure 3, before performing an offset operation on the starting address, the first pointer points to the first starting address of the memory data 110 to be accessed or stored. The first starting address is aligned according to the first access granularity, which is, for example, the basic access granularity of memory. For example, if no operation is performed on the first starting address, the processor's execution unit can only access and store the memory data 110 at the first access granularity. For example, only general-purpose memory access instructions can be used, and only a small portion of the data (for example, data at the basic access granularity) can be accessed or stored each time, resulting in low memory access efficiency.
[0050] For example, as shown in Figure 3, step S110 is performed to obtain the target starting address by performing an offset operation on the first starting address. Specifically, in step S111, the target offset amount is added to the first pointer to obtain the target pointer. For example, the target starting address is aligned to the target access granularity, and the target access granularity can be the access granularity required by the high-performance optimization means.
[0051] For example, as shown in Figure 3, after step S110, the starting address pointer of the updated memory data 120 is divided into three pointers: a first pointer, a target pointer, and a second pointer. The first pointer points to the first starting address of the first memory data, the target pointer points to the target starting address of the target memory data, and the second pointer points to the second starting address of the second memory data.
[0052] For example, as shown in Figure 3, the memory data 120 is divided into three parts: the first memory data from the first starting address to the target starting address, the target memory data from the target starting address to the second starting address, and the second memory data after the second starting address. For example, because the access granularity at which the starting addresses of these three parts are aligned differs for the first memory data, target memory data, and second memory data, the processor's execution unit accesses or stores the memory data of these three parts at their respective access granularities.
[0053] For example, as shown in Figure 3, since the target starting address can satisfy the alignment conditions required by the high-performance optimization means, step S120 can be performed on the target memory data, and the execution unit can use the high-performance optimization means to access or store the target memory data at the target access granularity (i.e., perform a target memory access operation).
[0054] For example, as shown in Figure 3, the first starting address of the first pointer is aligned according to the first access granularity (e.g., basic access granularity), so the alignment conditions required by the high-performance optimization means are not met, and step S130 is executed on the first memory data, allowing the execution unit to access and store the first memory data at the first access granularity (i.e., perform the first memory access operation).
[0055] For example, as shown in Figure 3, the second starting address in the second pointer is aligned according to the second access granularity, which is smaller than the target access granularity, so the alignment conditions required by the high-performance optimization means used for the target memory data are not met. Step S140 is therefore executed on the second memory data, and the execution unit accesses and stores the second memory data at the second access granularity (i.e., performs the second memory access operation).
[0056] For example, the second access granularity may be the same as the first access granularity. If both are basic access granularities, the execution unit can access and store the second memory data according to the basic access granularity.
[0057] For example, the second access granularity is smaller than the target access granularity but larger than the first access granularity. If the second access granularity can satisfy the alignment conditions required by other optimization means, and the memory access performance using said optimization means is superior to the memory access method that follows the basic memory access granularity, then it is also possible to use said optimization means to access or store second memory data at the second access granularity.
[0058] The example in Figure 3 logically divides the memory data into three parts: the first memory data, the target memory data, and the second memory data. During this division process, no actual data movement occurs; the operator simply accesses or stores data internally according to the logical division. After data processing is complete, the data provided to the user is still in the format indicated by the original memory data 110.
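The logical division described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: addresses are plain integers in bytes, the target access granularity is a power of two, and the second memory data is taken to be the remainder that no longer fills a whole target-granularity unit. The names `align_up` and `split_regions` are hypothetical. The function returns only address ranges, which mirrors the point that no data is moved.

```python
def align_up(addr: int, granularity: int) -> int:
    """Round addr up to the nearest multiple of granularity (assumed power of two)."""
    return (addr + granularity - 1) & ~(granularity - 1)


def split_regions(first_start: int, size: int, target_gran: int):
    """Logically divide [first_start, first_start + size) into three half-open ranges."""
    end = first_start + size
    # Offset operation: move the start forward to the next aligned address.
    target_start = min(align_up(first_start, target_gran), end)
    # The aligned body ends at the last whole target-granularity unit.
    body_end = end - (end - target_start) % target_gran
    first = (first_start, target_start)   # first memory data (unaligned head)
    target = (target_start, body_end)     # target memory data (aligned body)
    second = (body_end, end)              # second memory data (assumed tail)
    return first, target, second


first, target, second = split_regions(0x1004, 0x100, 64)
print([hex(a) for a in first], [hex(a) for a in target], [hex(a) for a in second])
```

With these hypothetical inputs, the head covers 0x1004 to 0x1040, the aligned body covers 0x1040 to 0x1100 (three 64-byte units), and the tail covers the last 4 bytes.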
[0059] The implementation method for memory data access shown in Figure 3 is merely an example. The division of the memory data into three parts, the setting method for the first pointer, target pointer, and second pointer, and the acquired values for the first access granularity, target access granularity, and second access granularity in Figure 3 are all illustrative examples. The specific implementation method can be selected according to actual needs, and the embodiments of this disclosure are not limited thereto.
[0060] A memory data access method provided by at least one embodiment of the present disclosure obtains a target starting address aligned according to the target access granularity by offsetting a first starting address. Since the target starting address can satisfy the alignment conditions required by the high-performance optimization means, the memory data access method can use the high-performance optimization means to access or store target memory data at the target access granularity, thereby improving the memory access performance of the chip.
[0061] In some examples, high-performance optimization means include optimization means applied to a specific layout scheme. For example, memory data contains multiple data elements, and the layout scheme of the multiple data elements in memory can include the data arrangement order (e.g., according to a linear arrangement order, according to a column-major arrangement order, according to a row-major arrangement order) and the data dimensions to be stored. For example, data with a layout scheme of [N,C,H,W] may contain N samples, and the data corresponding to each sample contains multiple channels, where C represents the number of channels. In each channel, multiple data elements are arranged in a two-dimensional array, where the height of the vertical dimension of the data in each channel is H, where H represents the number of data elements along the column direction of the array, and the width of the horizontal dimension of the data in each channel is W, where W represents the number of data elements along the row direction of the array. For example, the layout scheme may include [N,H,W,C], [C,N,H,W], [N,C4,H,W4], etc., or YUV (where Y represents the luminance component, and U and V represent the chromaticity component), RGB (where R represents the red component, G represents the green component, and B represents the blue component), etc., and the embodiments of this disclosure are not limited thereto.
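To make the effect of a layout scheme on addressing concrete, the following Python sketch computes the linear element offset of the same logical element under [N,C,H,W] and [N,H,W,C]. It assumes the conventional reading in which the last-listed dimension varies fastest in memory; the function names are hypothetical and are not taken from the disclosure.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Linear element offset of (n, c, h, w) when W varies fastest ([N,C,H,W])."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Linear element offset of (n, c, h, w) when C varies fastest ([N,H,W,C])."""
    return ((n * H + h) * W + w) * C + c


C, H, W = 4, 4, 4
# Moving to the next channel skips a whole H*W plane in [N,C,H,W] ...
print(offset_nchw(0, 1, 0, 0, C, H, W))  # 16
# ... but only one element in [N,H,W,C].
print(offset_nhwc(0, 1, 0, 0, C, H, W))  # 1
```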
[0062] For example, if the multiple data elements in the memory data are arranged according to a first layout scheme [N,C,H,W], the addresses to which the data elements are mapped in memory are contiguous in that order. In this case, the processor can only access or store the memory data in a linear arrangement order. However, when processing data in an artificial intelligence chip, the processing order of data elements is not necessarily linear, so when the processor accesses or stores memory data, it is necessary to perform additional transformation operations to ensure that no errors occur in the data processing result, which reduces data processing efficiency.
[0063] For example, in an artificial intelligence chip, when converting a first layout scheme (e.g., [N,C,H,W]) to a specific target layout scheme (e.g., [N,C4,H,W4]), the arrangement order of data elements in the target layout scheme matches the processing order of data elements in the processor. This allows for direct access to or storage of memory data using hardware-specific high-performance memory access instructions (i.e., high-performance optimization means) according to the arrangement order of data elements in the target layout scheme, thereby significantly improving data processing efficiency.
[0064] For example, the target layout scheme imposes an alignment requirement on the starting address of the memory data to be accessed. After step S110 in Figure 2, the target starting address of the target memory data obtained by the offset satisfies this alignment requirement (for example, alignment according to the target access granularity), thereby enabling the use of the high-performance optimization means described above. For example, step S120 in Figure 2 may further include steps S121 to S122.
[0065] Step S121: Based on the target starting address and target access granularity, set the arrangement order of data elements in the target layout scheme as the target memory access order. Step S122: Perform the target memory access operation on the target memory data according to the target memory access order and target access granularity.
[0066] For example, in step S121, multiple data elements in the memory data are arranged according to a first layout scheme (e.g., [N,C,H,W]), and the target starting address of the target memory data satisfies the alignment conditions required by the target layout scheme (e.g., [N,C4,H,W4]) (i.e., alignment according to the target access granularity). Therefore, the arrangement order of the data elements in the target layout scheme can be set as the target memory access order.
[0067] For example, in step S122, since the arrangement order of data elements in the target layout scheme and the processing order of data elements in the processor are the same, when performing a target memory access operation on target memory data according to the target memory access order, the target memory data can be accessed or stored using hardware high-performance memory access instructions (i.e., high-performance optimization means) dedicated to the target layout scheme. This allows access to or storage of a larger amount of data each time at the target access granularity (e.g., the amount of data at the target access granularity), thereby significantly improving data processing efficiency.
[0068] For example, with respect to the first memory data, step S130 provided by at least one embodiment of the present disclosure may further include steps S131 to S132.
[0069] Step S131: Set the arrangement order of data elements in the first layout scheme as the first memory access order. Step S132: Perform a first memory access operation on the first memory data according to the first memory access order and the first access granularity.
[0070] For example, in step S131, the first starting address of the first memory data does not satisfy the alignment conditions required by the target layout scheme (e.g., [N,C4,H,W4]), so for the first memory data, the only option is to use the arrangement order of the data elements in the first layout scheme (e.g., [N,C,H,W]) as the first memory access order.
[0071] For example, in step S132, when performing a first memory access operation on the first memory data, the processor accesses or stores the first memory data only according to the first memory access order (i.e., a linear array order).
[0072] In the memory data access method provided by at least one embodiment of the present disclosure, the memory access efficiency of the first memory data is the same as the memory access efficiency of the memory data without using the memory data access method provided by at least one embodiment of the present disclosure, but the overall memory access efficiency of the memory data is improved because high-performance optimization means can be applied to a specific layout scheme for the target memory data.
[0073] Regarding the second memory data, if the second access granularity is the same as the first access granularity, the method for performing the second memory access operation on the second memory data is basically the same as for the first memory data. If the second access granularity is smaller than the target access granularity but larger than the first access granularity, the layout method of the second memory data and the high-performance optimization means employed may differ from those for the target memory data. Therefore, for the second memory data, the specific execution process of the second memory access operation can refer to the first memory access operation, or it can be specifically set based on the optimization means employed, and will not be described in detail here.
[0074] Figure 4A is a schematic diagram of another example of a memory data access method provided by at least one embodiment of the present disclosure, and Figure 4B is a schematic diagram of yet another example of a memory data access method provided by at least one embodiment of the present disclosure. For example, Figure 4A is one specific example of the first memory access order in steps S131 to S132, and Figure 4B is one specific example of the target memory access order in steps S121 to S122.
[0075] For example, multiple data elements in memory data are arranged in a first layout scheme [N,C,H,W]. For example, as shown in Figure 4A, a portion of the first memory data is selected, with a sample count N=1, channel count C=4, height H=4, and width W=4, meaning that each channel contains H×W=16 data elements. For example, in step S131, the first starting address of the first memory data does not satisfy the alignment conditions required by the target layout scheme [N,C4,H,W4], so the arrangement order of data elements in the first layout scheme is set as the first memory access order, i.e., a linear arrangement order (1,2,…,15,16,1,2,…,15,16,1,2,…,15,16,1,2,…,15,16). In step S132, when performing a first memory access operation on the first memory data, the processor accesses or stores the first memory data only according to the first memory access order.
[0076] For example, as shown in Figure 4B, a portion of the target memory data is selected, with a sample count N=1, channel count C=4, height H=4, and width W=4, meaning that each channel contains H×W=16 data elements. For example, in step S121, since the target starting address of the target memory data satisfies the alignment conditions required by the target layout scheme [N,C4,H,W4] (i.e., alignment at the target access granularity), the arrangement order of data elements in the target layout scheme [N,C4,H,W4] can be set as the target memory access order (1,1,1,1,2,2,2,2,…,16,16,16,16).
[0077] For example, as shown in Figure 4B, in step S122, when performing a target memory access operation on target memory data in the target memory access order, the target memory data can be accessed or stored using hardware high-performance memory access instructions (i.e., high-performance optimization means) dedicated to the target layout scheme [N,C4,H,W4] directly. This allows access to or storage of at least four data items each time according to the target access granularity. For example, the target memory access operation can be performed each time according to a granularity of (1,1,1,1), (2,2,2,2)...(16,16,16,16), thereby significantly improving data processing efficiency.
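The two access orders of Figures 4A and 4B can be reproduced with a short Python sketch. It is an illustration under assumptions rather than the disclosed hardware behavior: each element is labeled by (channel, position), positions are numbered 1 to H×W inside a channel, and "C4" is read as interleaving four channels per position. The function names and the `pack` parameter are hypothetical.

```python
def first_access_order(C, H, W):
    """(channel, position) pairs in linear [N,C,H,W] order for one sample."""
    return [(c, p) for c in range(C) for p in range(1, H * W + 1)]


def target_access_order(C, H, W, pack=4):
    """(channel, position) pairs with `pack` channels interleaved per position."""
    order = []
    for block in range(0, C, pack):              # one block of `pack` channels
        for p in range(1, H * W + 1):            # position inside a channel
            for c in range(block, min(block + pack, C)):
                order.append((c, p))
    return order


def chunks(seq, size):
    """Split seq into consecutive groups of `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


C, H, W = 4, 4, 4
print([p for _, p in first_access_order(C, H, W)][:18])   # 1..16, then 1, 2
print([p for _, p in target_access_order(C, H, W)][:8])   # 1,1,1,1,2,2,2,2
print(len(chunks(target_access_order(C, H, W), 4)))       # 16 grouped accesses
```

Each group of four in the target order holds the same position across the four channels, which corresponds to the (1,1,1,1), (2,2,2,2), ... units described for step S122.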
[0078] Regarding the second memory data, the specific execution process of the second memory access operation can follow the first memory access operation shown in Figure 4A, or can be specifically configured based on the optimization means employed; therefore, it will not be described in detail here.
[0079] Note that the first layout scheme [N,C,H,W] and the target layout scheme [N,C4,H,W4] in Figures 4A and 4B are merely illustrative examples. Other layout schemes can be selected as needed, and the embodiments of this disclosure are not limited thereto.
[0080] A memory data access method provided by at least one embodiment of the present disclosure obtains a target starting address aligned according to the target access granularity by offsetting a first starting address. Since the target starting address can satisfy the alignment conditions required by a particular layout scheme, the memory data access method can access or store target memory data using high-performance optimization means applicable to the particular layout scheme, thereby significantly improving the overall memory access efficiency of the memory data and, in turn, the memory access performance of the chip.
[0081] In some examples, high-performance optimization means further include burst transfer optimization means. For example, in burst transfer optimization means, by simply specifying a starting address and burst lengths that satisfy certain alignment conditions, the processor can sequentially access or store adjacent data elements within the same row, eliminating the need for the controller to sequentially provide column addresses and associated instruction information. This reduces the occupation of control resources and improves the memory access efficiency of memory data.
[0082] For example, the target memory data contains M × N data elements, and every N data elements among the M × N data elements are treated as one unit of the target access granularity, where M and N are positive integers. For example, step S120 in Figure 2 may further include step S123. Step S123: Based on the target starting address and target access granularity, M sub-memory access operations are performed on the target memory data. For example, in each sub-memory access operation, N data elements are accessed or stored.
[0083] For example, if the target starting address of the target memory data satisfies the alignment condition required by this optimization means called burst transfer (i.e., being aligned at the target access granularity), the burst transfer optimization means can be used directly to access or store M × N data elements in the target memory data. For example, in step S123, M sub-memory access operations can be performed on the target memory data, and in each sub-memory access operation, N adjacent data elements in the same row can be accessed or stored at the target access granularity. This eliminates the need for the controller to continuously provide column addresses and associated instruction information, reduces the occupation of control resources, and improves the memory access efficiency of the memory data.
[0084] Furthermore, in each sub-memory access operation, access and storage of N adjacent data elements within the same row at the target access granularity can be completed with only one memory access instruction corresponding to a burst transfer optimization means. In a single-threaded data processing method, the data amount of these N data elements is equal to the burst length, and in a multi-threaded data processing method, the data amount of these N data elements is equal to the burst length multiplied by the number of threads.
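The relationship between N, the burst length, and the thread count can be sketched as follows. This Python snippet is only an illustration under assumptions: the burst length is expressed in data elements, rows of N elements are stored back to back, and the element size of 2 bytes is an arbitrary example. The function names are hypothetical.

```python
def burst_plan(target_start, m, n, elem_size):
    """One (start_address, element_count) entry per sub-memory access operation."""
    row_bytes = n * elem_size
    return [(target_start + i * row_bytes, n) for i in range(m)]


def elements_per_sub_access(burst_length, num_threads=1):
    """N for one sub-access: the burst length, scaled by the thread count."""
    return burst_length * num_threads


n = elements_per_sub_access(burst_length=8, num_threads=4)   # 32 elements
plan = burst_plan(target_start=0x1040, m=3, n=n, elem_size=2)
print(n, [(hex(addr), count) for addr, count in plan])
```

Each entry of the plan corresponds to a single memory access instruction that covers N adjacent elements, so the three rows in this example need three instructions.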
[0085] For example, the first memory data contains Y data elements, and each of the Y data elements is treated as a first access granularity. For example, step S130 provided by at least one embodiment of the present disclosure may further include step S133: sequentially access or store the Y data elements in the first memory data based on a first starting address and a first access granularity.
[0086] For example, in step S133, because the first starting address of the first memory data does not satisfy the alignment conditions required by the optimization means called burst transfer, Y data elements in the first memory data are accessed or stored sequentially only according to the first access granularity, in accordance with a general-purpose memory access method.
[0087] In the memory data access method provided by at least one embodiment of the present disclosure, the memory access efficiency of the first memory data is the same as the memory access efficiency of the memory data when the memory data access method provided by at least one embodiment of the present disclosure is not used, but the target memory data can use burst transfer optimization means, thereby improving the overall memory access efficiency of the memory data.
[0088] Furthermore, regarding the second memory data, if the second access granularity is the same as the first access granularity, the method for performing the second memory access operation on the second memory data is basically the same as for the first memory data. Also, if the second access granularity is smaller than the target access granularity but larger than the first access granularity, the high-performance optimization means employed by the second memory data may differ from that of the target memory data. Therefore, the specific execution process of the second memory access operation for the second memory data can refer to the first memory access operation, or it can be specifically set based on the optimization means employed, and will not be described in detail here.
[0089] Figure 5 is a schematic diagram of a further example of a memory data access method provided by at least one embodiment of the present disclosure. For example, Figure 5 is a specific example of the multiple data elements accessed in step S123 or S133.
[0090] For example, as shown in Figure 5, the target memory data contains M × N data elements (a11, a12, ..., a1N; a21, a22, ..., a2N; ...; aM1, aM2, ..., aMN). For example, every N data elements (ai1, ai2, ..., aiN, where i = 1, 2, ..., M) among the M × N data elements are defined as one unit of the target access granularity. For example, since the target starting address of the target memory data satisfies the alignment condition required by the burst transfer optimization means (i.e., alignment at the target access granularity), the burst transfer optimization means can be used directly to access or store the M × N data elements in the target memory data.
[0091] For example, as shown in Figure 5, in step S123, M sub-memory access operations are performed on the target memory data, and in each sub-memory access operation, N adjacent data elements in the same row can be accessed or stored at the target access granularity (ai1, ai2, ..., aiN). This eliminates the need for the controller to continuously provide column addresses and associated instruction information, thereby reducing the occupation of control resources and improving the memory access efficiency of memory data.
[0092] For example, in the case where the data in Figure 5 is the first memory data, the first memory data contains Y data elements (for example, Y = M × N), and each of the Y data elements (aij, where i = 1, 2, ..., M, j = 1, 2, ..., N) is defined as the first access granularity. For example, in step S133, because the first starting address of the first memory data does not satisfy the alignment conditions required by the optimization means called burst transfer, a general-purpose memory access method is used to sequentially access or store the Y data elements in the first memory data only according to the first access granularity (aij).
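The difference between the two cases of Figure 5 comes down to how many access operations cover the same data. The Python sketch below models memory as a flat list of labels a11 to aMN and counts the operations; the values M = 3 and N = 4 and the function names are assumptions chosen for illustration only.

```python
def burst_access(data, m, n):
    """Read M rows of N adjacent elements; one access per row (step S123)."""
    return [data[i * n:(i + 1) * n] for i in range(m)]


def sequential_access(data):
    """Read one element per access, in storage order (step S133)."""
    return [[x] for x in data]


M, N = 3, 4
data = [f"a{i}{j}" for i in range(1, M + 1) for j in range(1, N + 1)]

rows = burst_access(data, M, N)
singles = sequential_access(data)
print(len(rows), rows[0])     # 3 accesses, the first covers a11..a14
print(len(singles))           # 12 accesses for the same data
```

Both traversals visit every element exactly once and in the same order; only the number of access operations differs (M versus Y = M × N).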
[0093] Regarding the second memory data, the specific execution process of the second memory access operation can follow the first memory access operation shown in Figure 5, or can be specifically configured based on the optimization means employed; therefore, it will not be described in detail here.
[0094] Note that the M×N data element arrangement and memory access method shown in Figure 5 are merely illustrative examples, and the target memory data and first memory data can be configured with other data element arrangement and memory access methods as needed, and the embodiments of this disclosure are not limited to these.
[0095] A memory data access method provided by at least one embodiment of the present disclosure obtains a target starting address aligned to the target access granularity by offsetting a first starting address. Since the target starting address satisfies the alignment conditions required by the burst transfer optimization means, the memory data access method can use the burst transfer optimization means to access or store the target memory data, thereby significantly improving the overall memory access efficiency of the memory data and, in turn, the memory access performance of the chip.
[0096] Figure 6 is a schematic block diagram of a memory data access device provided by at least one embodiment of the present disclosure. For example, as shown in Figure 6, at least one embodiment of the present disclosure provides a memory data access device 200 in which the first starting address of memory data is aligned according to a first access granularity. For example, this memory data access device 200 includes an offset module 210 and a first execution module 220.
[0097] For example, the offset module 210 is configured to perform an offset operation on the first starting address to obtain the target starting address. For example, the target starting address is aligned according to the target access granularity. That is, the offset module 210 can be configured to perform step S110, for example, as shown in Figure 2.
[0098] For example, the first execution module 220 is configured to perform a target memory access operation on the target memory data after the target starting address in the memory data, according to the target access granularity. That is, the first execution module 220 can be configured to perform, for example, step S120 shown in Figure 2.
[0099] In some cases, the target access granularity is larger than the first access granularity.
[0100] In some examples, the first starting address is stored in the first pointer. For example, the offset module 210 is further configured to obtain the target pointer by adding the target offset amount to the first pointer. For example, the target pointer stores the target starting address.
[0101] In some examples, as shown in Figure 6, the memory data access device 200 further includes a second execution module 230. For example, the second execution module 230 is configured to perform a first memory access operation on first memory data from a first starting address to a target starting address in the memory data at a first access granularity. That is, the second execution module 230 can be configured to perform step S130 provided by at least one embodiment of the present disclosure.
[0102] In some examples, the memory data further includes second memory data to be accessed or stored, and the second starting address of the second memory data is aligned at a second access granularity. For example, the memory data access device 200 further includes a third execution module 240. For example, the third execution module 240 is configured to perform a second memory access operation on the second memory data at a second access granularity. That is, the third execution module 240 can be configured to perform step S140 provided by at least one embodiment of the present disclosure.
[0103] For example, the second access granularity is the same as the first access granularity, or the second access granularity is smaller than the target access granularity and larger than the first access granularity.
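How the modules of the memory data access device 200 might cooperate can be sketched as a small Python class. This is a toy model under assumptions, not the disclosed design: memory is a Python list, pointers are element indices, granularities are counted in elements, the first access granularity is one element, and the third execution module is assumed to fall back to that same granularity. The class and method names are hypothetical.

```python
class MemoryDataAccessDevice:
    """Toy model of device 200; granularities are counted in elements."""

    def __init__(self, memory, target_gran):
        self.memory = memory
        self.target_gran = target_gran

    def offset_module(self, first_ptr):
        """Module 210: add the target offset to the first pointer."""
        target_offset = -first_ptr % self.target_gran
        return first_ptr + target_offset

    def first_execution_module(self, target_ptr, end):
        """Module 220: access whole target-granularity units."""
        g = self.target_gran
        stop = end - (end - target_ptr) % g
        return [self.memory[i:i + g] for i in range(target_ptr, stop, g)], stop

    def second_execution_module(self, first_ptr, target_ptr):
        """Module 230: access the unaligned head element by element."""
        return [self.memory[i:i + 1] for i in range(first_ptr, target_ptr)]

    def third_execution_module(self, second_ptr, end):
        """Module 240: access the tail, here at the first access granularity."""
        return [self.memory[i:i + 1] for i in range(second_ptr, end)]

    def access(self, first_ptr, end):
        target_ptr = min(self.offset_module(first_ptr), end)
        head = self.second_execution_module(first_ptr, target_ptr)
        body, second_ptr = self.first_execution_module(target_ptr, end)
        tail = self.third_execution_module(second_ptr, end)
        return head + body + tail


device = MemoryDataAccessDevice(memory=list(range(32)), target_gran=8)
accesses = device.access(first_ptr=5, end=27)
print([len(a) for a in accesses])   # [1, 1, 1, 8, 8, 1, 1, 1]
```

In this example the pointer 5 is offset by 3 to reach the aligned index 8, two units of eight elements are accessed at the target access granularity, and the three leading and three trailing elements are accessed one at a time.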
[0104] In some examples, multiple data elements in memory data are arranged in a first layout scheme. For example, the first execution module 220 sets the arrangement order of data elements in the target layout scheme as the target memory access order based on the target starting address and target access granularity, and is further configured to perform target memory access operations on the target memory data according to the target memory access order and target access granularity.
[0105] For example, the second execution module 230 is further configured to use the arrangement order of data elements in the first layout scheme as the first memory access order, and to perform the first memory access operation on the first memory data according to the first memory access order and at the first access granularity.
[0106] In some examples, the target memory data contains M × N data elements, and the target access granularity is defined as every N data elements within the M × N data elements, where M and N are positive integers. For example, the first execution module 220 is further configured to perform M sub-memory access operations on the target memory data based on the target starting address and target access granularity. In each sub-memory access operation, N data elements are accessed or stored.
[0107] For example, the first memory data contains Y data elements, and each of the Y data elements is defined as the first access granularity. Here, Y is a positive integer. For example, the second execution module 230 is further configured to sequentially access or store the Y data elements in the first memory data based on the first starting address and the first access granularity.
[0108] The operation of the memory data access device 200 has already been described in detail in connection with the memory data access method shown in Figures 2 to 5. For brevity, redundant explanations are omitted here, and relevant details can be found in the explanations of Figures 2 to 5 above.
[0109] Furthermore, each of the modules in the memory data access device 200 shown in Figure 6 may be configured as software, hardware, firmware, or any combination of these that performs a specific function. For example, these modules may correspond to dedicated integrated circuits, to pure software code, or to modules that combine software and hardware. As an example, the device described with reference to Figure 6 may be, but is not limited to, a personal computer, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing program instructions.
[0110] Furthermore, while the memory data access device 200 was described above by dividing it into modules for executing corresponding processes, as will be apparent to those skilled in the art, the processes executed by each module can be performed even when the device is not specifically divided into modules or when there are no clear boundaries between modules. Moreover, the memory data access device 200 described with reference to Figure 6 is not limited to including the modules described above, and other modules (e.g., read modules, control modules, etc.) can be added or combined as needed.
[0111] At least one embodiment of the present disclosure further provides an electronic device comprising a processor and a memory. The memory stores one or more computer program modules configured to be executed by the processor, and the one or more computer program modules comprise instructions for implementing the memory data access method provided by the embodiments of the present disclosure described above.
[0112] Figure 7 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure. For example, as shown in Figure 7, the electronic device 300 includes a processor 310 and a memory 320. For example, the memory 320 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 310 is used to execute the non-transitory computer-readable instructions, and when the non-transitory computer-readable instructions are executed by the processor 310, one or more steps of the memory data access method described above can be performed. The memory 320 and the processor 310 can be interconnected via a bus system and / or other forms of connection mechanisms (not shown).
[0113] For example, the processor 310 can be a central processing unit (CPU), a graphics processing unit (GPU), a general-purpose graphics processing unit (GPGPU), a digital signal processing unit (DSP), or other forms of processing units having performance prediction capabilities and / or program execution capabilities, such as a field-programmable gate array (FPGA). For example, the central processing unit (CPU) may be an X86, RISC-V, or ARM architecture. The processor 310 can be a general-purpose processor or a dedicated processor and can control other components in the electronic device 300 to perform desired functions.
[0114] For example, memory 320 may include any combination of one or more computer program products, and computer program products may include various types of computer-readable storage media, such as volatile memory and / or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and / or cache memory (cache). Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable optical disc read-only memory (CD-ROM), USB storage device, flash memory, etc. One or more computer program modules can be stored on the computer-readable storage media, and the processor 310 can execute one or more computer program modules to realize various functions of the electronic device 300. The computer-readable storage media may also store various application programs and various data, as well as various data used and / or generated by the application programs.
[0115] In the embodiments of this disclosure, the specific functions and technical effects of the electronic device 300 can be found by referring to the description of the memory data access method provided in at least one embodiment of this disclosure above, and will not be repeated here.
[0116] Figure 8 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure. For example, as shown in Figure 8, the electronic device 400 is suitable for implementing, for example, the memory data access method provided by the embodiments of this disclosure. Note that the electronic device 400 shown in Figure 8 is merely illustrative and does not impose any limitations on the functionality and scope of use of the embodiments of this disclosure.
[0117] For example, as shown in Figure 8, the electronic device 400 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 41, which performs various appropriate operations and processes according to a program stored in a read-only memory (ROM) 42 or a program loaded from a storage device 48 into a random access memory (RAM) 43. The RAM 43 further stores various programs and data necessary for the operation of the electronic device 400. The processing unit 41, ROM 42, and RAM 43 are connected to each other via a bus 44. An input / output (I / O) interface 45 is also connected to the bus 44. Typically, the following devices may be connected to the I / O interface 45: an input device 46 including, for example, a touchscreen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 47 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 48 including, for example, magnetic tape, hard disk, etc.; and a communication device 49. The communication device 49 enables the electronic device 400 to exchange data with other electronic devices via wireless or wired communication.
[0118] Figure 8 shows an electronic device 400 with various devices, but it should be understood that it is not required to implement or have all of the devices shown, and the electronic device 400 may implement or have more or fewer devices instead.
[0119] For a detailed description of the electronic device 400 and its technical effects, please refer to the relevant explanation regarding the memory data access method in the above text; it will not be repeated here.
[0120] Figure 9 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure. For example, as shown in Figure 9, the storage medium 500 stores non-transitory computer-readable instructions 510. For example, when the non-transitory computer-readable instructions 510 are executed by a computer, one or more steps in the memory data access method described above are performed.
[0121] For example, the storage medium 500 can be applied to the electronic device 300 shown in Figure 7. For example, the storage medium 500 may be the memory 320 in the electronic device 300. For a related explanation of the storage medium 500, refer to the corresponding explanation of the memory 320 in the electronic device 300 shown in Figure 7; it will not be repeated here.
[0122] The following points require explanation regarding this disclosure: (1) In the accompanying drawings of the embodiments of the present disclosure, only structures relating to the embodiments of the present disclosure are shown, and for other structures, conventional designs may be referenced. (2) Where there is no inconsistency, features in the same embodiment and different embodiments of the present disclosure can be combined with each other.
[0123] The above describes only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art could easily conceive within the scope of the technology disclosed herein should be included in the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure should be determined by the scope of protection of the claims.
Explanation of Symbols
[0124]
200: Memory data access device
210: Offset module
220: First execution module
230: Second execution module
240: Third execution module
300: Electronic device
310: Processor
320: Memory
400: Electronic device
41: Processing unit
42: Read-only memory (ROM)
43: Random access memory (RAM)
44: Bus
45: Input/output (I/O) interface
46: Input device
47: Output device
48: Storage device
49: Communication device
500: Storage medium
510: Non-transitory computer-readable instructions
S110, S120: Steps
Claims
1. A memory data access method, wherein a first starting address of memory data is aligned according to a first access granularity, the memory data access method comprising: performing an offset operation on the first starting address to obtain a target starting address, wherein the target starting address is aligned according to a target access granularity; and performing a target memory access operation, according to the target access granularity, on target memory data located after the target starting address in the memory data.
2. The memory data access method according to claim 1, wherein the target access granularity is greater than the first access granularity.
3. The memory data access method according to claim 1 or 2, wherein the first starting address is stored in a first pointer, and performing the offset operation on the first starting address to obtain the target starting address comprises: adding a target offset amount to the first pointer to obtain a target pointer, wherein the target starting address is stored in the target pointer.
4. The memory data access method according to any one of claims 1 to 3, further comprising: performing a first memory access operation, according to the first access granularity, on first memory data located between the first starting address and the target starting address in the memory data.
5. The memory data access method according to any one of claims 1 to 4, wherein the memory data further includes second memory data to be accessed or stored, a second starting address of the second memory data is aligned according to a second access granularity, and the memory data access method further comprises: performing a second memory access operation on the second memory data according to the second access granularity.
6. The memory data access method according to claim 5, wherein the second access granularity is the same as the first access granularity, or the second access granularity is smaller than the target access granularity and larger than the first access granularity.
7. The memory data access method according to any one of claims 1 to 6, wherein a plurality of data elements in the memory data are arranged according to a first layout scheme, and performing the target memory access operation on the target memory data according to the target access granularity comprises: setting, based on the target starting address and the target access granularity, an arrangement order of data elements in a target layout scheme as a target memory access order; and performing the target memory access operation on the target memory data in accordance with the target memory access order and the target access granularity.
8. The memory data access method according to any one of claims 4 to 7, wherein a plurality of data elements in the memory data are arranged according to a first layout scheme, and performing the first memory access operation on the first memory data according to the first access granularity comprises: setting an arrangement order of data elements in the first layout scheme as a first memory access order; and performing the first memory access operation on the first memory data in accordance with the first memory access order and the first access granularity.
9. The memory data access method according to any one of claims 1 to 8, wherein the target memory data includes M × N data elements, every N data elements among the M × N data elements correspond to one target access granularity, M and N are positive integers, and performing the target memory access operation on the target memory data according to the target access granularity comprises: performing M sub-memory access operations on the target memory data based on the target starting address and the target access granularity, wherein N data elements are accessed or stored in each of the M sub-memory access operations.
10. The memory data access method according to any one of claims 4 to 9, wherein the first memory data includes Y data elements, each of the Y data elements corresponds to one first access granularity, Y is a positive integer, and performing the first memory access operation on the first memory data according to the first access granularity comprises: sequentially accessing or storing the Y data elements in the first memory data based on the first starting address and the first access granularity.
11. A memory data access device, wherein a first starting address of memory data is aligned according to a first access granularity, the memory data access device comprising: an offset module configured to perform an offset operation on the first starting address to obtain a target starting address, wherein the target starting address is aligned according to a target access granularity; and a first execution module configured to perform a target memory access operation, according to the target access granularity, on target memory data located after the target starting address in the memory data.
12. The memory data access device according to claim 11, wherein the first starting address is stored in a first pointer, and the offset module is further configured to obtain a target pointer by adding a target offset amount to the first pointer, wherein the target starting address is stored in the target pointer.
13. The memory data access device according to claim 11 or 12, further comprising a second execution module, wherein the second execution module is configured to perform a first memory access operation, according to the first access granularity, on first memory data located between the first starting address and the target starting address in the memory data.
14. The memory data access device according to any one of claims 11 to 13, wherein the memory data further includes second memory data to be accessed or stored, a second starting address of the second memory data is aligned according to a second access granularity, and the memory data access device further comprises a third execution module configured to perform a second memory access operation on the second memory data according to the second access granularity.
15. An electronic device comprising: a processor; and a memory containing one or more computer program modules, wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules are used to implement the memory data access method according to any one of claims 1 to 10.
16. A storage medium storing non-transitory computer-readable instructions, wherein the memory data access method according to any one of claims 1 to 10 is implemented when the non-transitory computer-readable instructions are executed by a computer.
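Illustrative sketch (not part of the claims). The address arithmetic recited in claims 1, 3, 4, 9, and 10 can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `plan_accesses`, byte-based addressing, the requirement that the target access granularity be a multiple of the first access granularity, and the handling of any leftover data at the first access granularity are all hypothetical choices made for illustration.

```python
def plan_accesses(first_start, length, first_gran, target_gran):
    """Plan accesses for `length` bytes beginning at `first_start`.

    Returns (offset, head, body, tail), where each of head/body/tail is a
    list of (address, size_in_bytes) tuples. All names and the tail handling
    are illustrative assumptions, not taken from the disclosure.
    """
    if first_gran <= 0 or target_gran <= 0:
        raise ValueError("granularities must be positive")
    if target_gran % first_gran != 0:
        raise ValueError("assumed: target granularity is a multiple of the first")
    if first_start % first_gran != 0 or length % first_gran != 0:
        raise ValueError("start address and length must be aligned to first_gran")

    end = first_start + length

    # Offset operation: distance to the next boundary of the target
    # granularity, capped so it never runs past the end of the data.
    offset = min((-first_start) % target_gran, length)
    target_start = first_start + offset

    # First memory data: Y elements accessed one at a time at the first
    # granularity, from the first starting address up to the target start.
    head = [(addr, first_gran)
            for addr in range(first_start, target_start, first_gran)]

    # Target memory data: M sub-accesses, each covering one unit of the
    # target granularity.
    m = (end - target_start) // target_gran
    body = [(target_start + i * target_gran, target_gran) for i in range(m)]

    # Assumption: data left over after the last full unit is accessed at
    # the first granularity.
    tail_start = target_start + m * target_gran
    tail = [(addr, first_gran) for addr in range(tail_start, end, first_gran)]

    return offset, head, body, tail


if __name__ == "__main__":
    offset, head, body, tail = plan_accesses(4100, 200, 4, 64)
    print(offset, len(head), body, len(tail))
```

With a first starting address of 4100, 200 bytes of data, a first access granularity of 4 bytes, and a target access granularity of 64 bytes, the sketch yields an offset of 60, fifteen 4-byte accesses before the aligned address 4160, two 64-byte accesses, and three trailing 4-byte accesses.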