Source code conversion program and source code conversion method

The source code conversion program addresses compiler optimization challenges by detecting array reference patterns and inserting temporary variable assignments, enhancing execution performance through reduced load instructions and pipeline stalls.

JP7883123B2 · Active · Publication Date: 2026-07-01 · エフサステクノロジーズ株式会社

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
エフサステクノロジーズ株式会社
Filing Date
2022-08-25
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Compilers struggle to perform optimization on source code that involves array data with indices defined by variables, leading to potential RAW (Read After Write) cases and resulting in suboptimal execution performance due to difficulty in determining the identity of updated and referenced elements.

Method used

A source code conversion program detects specific code patterns involving array references and updates, and inserts additional code that assigns the referenced elements to temporary variables. This lets the compiler recognize that there is no dependency between an update and a subsequent reference, so that compiler optimizations can be performed.

Benefits of technology

Improves the execution performance of compiled programs by reducing unnecessary load instructions and pipeline stalls, enabling more efficient compiler optimizations.


Patent Text Reader

Abstract

To improve execution performance of a program after compilation.

SOLUTION: An information processor 10 detects, from a source code 13, a code 15 for referring to an element designated using a first index including a variable n in array data, a code 16 for updating an element designated using a second index in the array data after the code 15, and a code 17 for referring to the element designated using the first index in the array data after the code 16. The information processor 10 inserts, before the code 16, a code 18 for substituting the element designated using the first index in the array data into a variable var, and replaces the code 17 with a code 19 for referring to the variable var.

SELECTED DRAWING: Figure 1

Description

Technical Field

[0001] The present invention relates to a source code conversion program and a source code conversion method.

Background Art

[0002] A compiler generates object code described in a low-level language such as machine language from source code described in a high-level language such as the C language. At this time, the compiler may perform compiler optimization to optimize instructions so that the execution time is shortened within a range where the meaning of the processing defined in the source code does not change.

[0003] A typical compiler executes compiler optimization on intermediate code that is lower level than source code in order to define an optimization algorithm that does not depend on minute differences in the description of the source code. For example, the compiler performs lexical analysis and syntax analysis on the source code to generate intermediate code used inside the compiler. The compiler executes an optimization algorithm on the intermediate code to rewrite the intermediate code. The compiler converts the rewritten intermediate code into object code.

[0004] In addition, a compiler has been proposed that detects a partial program including instructions that match a specific pattern and corrects the dependency relationships of other instructions included in the detected partial program so as to match the pattern. Also, a compiler has been proposed that detects array references from intermediate code and converts memory accesses to buffer accesses for arrays that are referenced two or more times. Also, a design device has been proposed that analyzes the dependency relationships of multiple accesses to an array and replaces the array access with an access to a shift register.

Prior Art Documents

Patent Documents

[0005]

Patent Document 1

[0006] Source code sometimes handles array data, which consists of multiple elements. In source code, referencing and updating elements within array data may be described using the array name and an index indicating the element's position. Some source code might define a process that references an element in array data, then updates an element in the same array data, and then references the first element again. In this case, if the compiler can determine that the updated element and the element referenced a second time are not the same, it can perform compiler optimizations, such as reducing unnecessary load instructions.

[0007] However, indices are sometimes defined using variables. For example, an index may be defined as an expression containing a numeric variable. In such cases, it can be difficult for the compiler to determine the identity of the updated element and the element referenced a second time based solely on information at the intermediate code level. As a result, the compiler may determine that there is a dependency between the update and the subsequent reference, which falls under a RAW (Read After Write) case, and may abandon compiler optimization.

[0008] For example, intermediate code defines low-level processing such as calculating a specific index value as an offset from the value of a variable, adding the offset to the starting address of the array data to calculate the address of an element, and then loading the data from memory using that address. Therefore, at the intermediate code level, it can be difficult for the compiler to perform a comprehensive analysis of multiple array accesses using indices. As a result, the compiler may output a program with poor execution performance. Thus, in one aspect, the present invention aims to improve the execution performance of the compiled program.

Means for Solving the Problem

[0009] In one embodiment, a source code conversion program is provided that causes a computer to perform the following operations: it detects from the source code a first code that references an element specified using a first index containing a first variable in the array data, a second code that updates an element specified using a second index different from the first index in the array data after the first code, and a third code that references an element specified using the first index in the array data after the second code. It then inserts a fourth code before the second code that assigns the element specified using the first index in the array data to the second variable, and replaces the third code with a fifth code that references the second variable.

[0010] In one embodiment, a method for converting source code to be executed by a computer is provided.

Effects of the Invention

[0011] In one aspect, the execution performance of the compiled program improves.

Brief Description of the Drawings

[0012]
[Figure 1] A diagram illustrating the information processing device of the first embodiment.
[Figure 2] A diagram showing a hardware example of the information processing device according to the second embodiment.
[Figure 3] A block diagram showing a structural example of the CPU.
[Figure 4] A block diagram showing a functional example of the information processing device.
[Figure 5] A diagram showing an example of the original source code.
[Figure 6] A diagram showing an example of the intermediate code.
[Figure 7] A diagram showing an example of the schedule table.
[Figure 8] A diagram showing an example of the source code after conversion.
[Figure 9] A diagram showing an example of the optimized schedule table.
[Figure 10] A diagram showing another example of the original source code.
[Figure 11] A diagram showing an example of the array variable table.
[Figure 12] A diagram showing another example of the source code after conversion.
[Figure 13] A flowchart showing an example of the compilation procedure.
[Figure 14] A flowchart (continuation 1) showing an example of the compilation procedure.
[Figure 15] A flowchart (continuation 2) showing an example of the compilation procedure.

Mode for Carrying Out the Invention

[0013] Hereinafter, embodiments will be described with reference to the drawings.

First Embodiment

The first embodiment will be described.

[0014] FIG. 1 is a diagram for explaining the information processing apparatus according to the first embodiment. In the first embodiment, the information processing device 10 transforms the source code 13 before compilation so that compiler optimization can be performed appropriately. The hardware or software that transforms the source code 13 may be called a preprocessor or precompiler. Compilation may be performed by the information processing device 10 or by another information processing device. The information processing device 10 may transform the source code 13 into source code 14 and input the source code 14 into a compiler. The information processing device 10 does not have to explicitly output the source code 14 and may proceed to intermediate code generation and compiler optimization after the following code transformation process. The information processing device 10 may be a client device or a server device. The information processing device 10 may be called a computer or a source code transformation device.

[0015] The information processing device 10 has a storage unit 11 and a processing unit 12. The storage unit 11 may be a volatile semiconductor memory such as RAM (Random Access Memory), or a non-volatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is a processor such as a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or DSP (Digital Signal Processor). However, the processing unit 12 may also include electronic circuits such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). The processor executes a program stored in memory such as RAM (which may also be the storage unit 11). The collection of processors may be called a multiprocessor or simply a "processor".

[0016] The storage unit 11 stores the source code 13. The source code 13 is a program written in a high-level language such as C. The source code 13 defines a process that includes referencing and updating array data consisting of multiple elements. An element may also be called a record, and multiple elements may be data of the same data type.

[0017] Source code 13 includes codes 15, 16, and 17. Code 16 is executed after code 15, and code 17 is executed after code 16. Codes 15, 16, and 17 may also be called instructions, subprograms, strings, statements, or expressions. Code 15 refers to an element in array data specified by a first index containing the variable n. The variable n is a numeric variable, such as an integer variable. Code 15 includes, for example, an array name A and an index expression containing the variable n (e.g., n+1). The array name corresponds to, for example, a pointer that points to the starting address of the array data. The index corresponds to, for example, an offset that indicates the relative position from the beginning of the array data. A reference may also be called a read. An element reference is, for example, written on the right-hand side of an equals sign.

[0018] Code 16 updates an element within the same array data as Code 15, but using a second index that differs from that used in Code 15. The second index may or may not include the variable n. Code 16, for example, includes the array name A and an index expression containing the variable n (e.g., n+0). The update may also be called a write. The element update is, for example, written on the left-hand side of an equals sign.

[0019] Code 17 refers to an element specified using the same first index as Code 15, within the same array data as Codes 15 and 16. Code 17 includes, for example, the array name A and an index expression containing the variable n (e.g., n+1). It is preferable that the value of the variable n is not updated between Code 15 and Code 17. It is also preferable that no updates using the first index are performed between Code 15 and Code 17.
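The conditions on codes 15, 16, and 17 can be sketched as a scan over a flattened list of array accesses. This is only an illustrative sketch, not the detection procedure of the embodiment: the `Access` record, the `name[var + offset]` index form, and the function `find_pattern` are hypothetical, and the sketch assumes that the index variable is not updated anywhere in the scanned range. It is deliberately conservative and gives up when a write to the same array uses a different index variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one array access of the form array[var + offset]. */
typedef enum { ACCESS_READ, ACCESS_WRITE } AccessKind;

typedef struct {
    AccessKind kind;
    const char *array;  /* array name, e.g. "A" */
    const char *var;    /* index variable, e.g. "n" */
    int offset;         /* constant added to the index variable */
} Access;

static bool same_array(const Access *a, const Access *b) {
    return strcmp(a->array, b->array) == 0;
}

static bool same_var(const Access *a, const Access *b) {
    return strcmp(a->var, b->var) == 0;
}

/*
 * Looks for a read (code 15), a later write to the same array with a
 * provably different index (code 16), and a later read with the first
 * index again (code 17). On success, stores the three positions and
 * returns true.
 */
bool find_pattern(const Access *acc, size_t count,
                  size_t *first, size_t *update, size_t *second) {
    assert(acc != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (acc[i].kind != ACCESS_READ) continue;
        size_t upd = count;  /* sentinel: no qualifying update seen yet */
        for (size_t j = i + 1; j < count; j++) {
            if (!same_array(&acc[j], &acc[i])) continue;
            if (acc[j].kind == ACCESS_WRITE) {
                /* A write with the first index, or with an index that cannot
                   be compared, ends the search for this candidate. */
                if (!same_var(&acc[j], &acc[i]) ||
                    acc[j].offset == acc[i].offset) break;
                if (upd == count) upd = j;
            } else if (upd != count && same_var(&acc[j], &acc[i]) &&
                       acc[j].offset == acc[i].offset) {
                *first = i;
                *update = upd;
                *second = j;
                return true;
            }
        }
    }
    return false;
}
```

For the sequence read `A[n+1]`, write `A[n+0]`, read `A[n+1]`, the sketch reports positions 0, 1, and 2; for read `A[n+1]`, write `A[n+1]`, read `A[n+1]`, it reports no match because the first index itself is updated.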

[0020] The processing unit 12 analyzes and rewrites the source code 13. The processing unit 12 may perform syntactic analysis on the source code 13 to generate an abstract syntax tree (AST), and may perform the following detection and rewriting processes on the abstract syntax tree. The processing unit 12 may also generate source code 14 from the source code 13. The processing unit 12 may also generate source code 14 from the rewritten abstract syntax tree. The generated source code 14 is stored, for example, in the storage unit 11.

[0021] The processing unit 12 detects codes 15, 16, and 17 from the source code 13 that satisfy the above conditions. Then, the processing unit 12 inserts code 18 so that its execution order precedes code 16. The processing unit 12 may also insert code 18 so that its execution order precedes code 15. Code 18 assigns to the variable var the element specified using the same first index as codes 15 and 17, within the same array data as codes 15, 16, and 17. For example, the variable var is written on the left side of the equals sign, and the specified element is written on the right side of the equals sign. The variable var is, for example, a new temporary variable that does not appear in source code 13. The data type of the variable var is, for example, the same as the data type of each element of the array data.

[0022] Furthermore, the processing unit 12 replaces the code 17 containing the first index with code 19 that references the variable var. For example, the variable var is written on the right-hand side of the equals sign. The processing unit 12 may also further replace the code 15 containing the first index with code that references the variable var. This converts source code 13 to source code 14.

[0023] Source code 14 includes codes 16, 18, and 19. Source code 14 also includes code 15 or code converted from code 15. Intermediate code generation and compiler optimization are performed on source code 14 instead of source code 13. Processing unit 12 may output source code 14. Processing unit 12 may display source code 14 on a display device or transmit it to another information processing device.

[0024] As described above, the information processing device 10 of the first embodiment detects from the source code 13 a code 15 that references an element specified by a first index that includes the variable n. The information processing device 10 also detects from the source code 13 a code 16 that updates an element specified by a second index and a code 17 that references an element specified by a first index. The information processing device 10 inserts at least a code 18 before code 16 that assigns the element specified using the first index to the variable var, and replaces code 17 with a code 19 that references the variable var.
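The conversion summarized above can be illustrated with a small before/after pair in C. This is a minimal sketch under assumptions: the functions `before`, `after`, and `check_equivalent`, the element type `int`, and the arithmetic around the accesses are all hypothetical and chosen only to make codes 15 to 19 visible; only the variable names `n` and `var` and the indices `n+1` and `n+0` follow the description.

```c
#include <assert.h>

/* Before conversion (corresponds to source code 13). */
int before(int *A, int n) {
    int x = A[n + 1];   /* code 15: reference with the first index */
    A[n + 0] = x * 2;   /* code 16: update with the second index */
    int y = A[n + 1];   /* code 17: second reference with the first index */
    return x + y;
}

/* After conversion (corresponds to source code 14). */
int after(int *A, int n) {
    int x = A[n + 1];   /* code 15, left as is in this sketch */
    int var = A[n + 1]; /* code 18: inserted before code 16 */
    A[n + 0] = x * 2;   /* code 16: unchanged */
    int y = var;        /* code 19: replaces code 17 */
    return x + y;
}

/* Runs both versions on identical inputs and checks that they agree. */
void check_equivalent(void) {
    int a1[3] = {5, 7, 9};
    int a2[3] = {5, 7, 9};
    int r1 = before(a1, 0);
    int r2 = after(a2, 0);
    assert(r1 == r2);
    assert(a1[0] == a2[0]);
}
```

In `after`, the value read in code 19 comes from a local variable, so the store in code 16 cannot affect it regardless of the value of `n`.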

[0025] Intermediate code defines lower-level processing than source code and may contain less information about the identity of indices specifying elements in array data than the source code. Furthermore, there is a limit to the range of code that can be optimized at one time. Therefore, in compiler optimization of intermediate code, it can be difficult for the compiler to accurately determine the dependency between references and updates by comprehensively analyzing multiple array accesses.

[0026] In this regard, when compiling source code 13, the compiler may find it difficult to determine from intermediate code level information alone that the element updated in code 16 and the element referenced in code 17 are not identical, because the specified element depends on the value of variable n. For this reason, the compiler may determine that there is a possibility of a RAW case where there is a dependency between the update and the subsequent reference, and therefore may abandon compiler optimization because it could change the meaning of the process defined in source code 13. The absence of a dependency between the update and the subsequent reference can also be called uncorrelated.

[0027] As a result, the compiler may output object code that reloads the same data from memory into registers in code 17 as it did in code 15, or object code that does not parallelize code 16 and code 17. Therefore, the compiler may output object code that does not have high execution performance.

[0028] In contrast, when compiling source code 14, the compiler can easily verify, even from only the information at the intermediate code level, that the value assigned to the variable var in code 18 is the same as the value of the variable var referenced in code 19. Furthermore, the elements of the array data updated in code 16 and the value of the variable var referenced in code 19 are clearly different data. Therefore, the compiler does not need to consider the possibility of a RAW case and can perform compiler optimizations.

[0029] As a result, the compiler can output object code with fewer load instructions than the object code generated from source code 13, object code with less waiting time due to pipeline stalls, and object code that parallelizes code 16 and code 17. Therefore, the compiler outputs object code with high execution performance.

[0030] Furthermore, the information processing device 10 may insert code 18 before code 15, or replace code 15 with code that references the variable var. This allows the compiler to output object code with fewer load instructions. The information processing device 10 may also generate source code 14 converted from source code 13, or compile source code 14 using a compiler. This allows for the smooth generation of object code corresponding to source code 13 using an existing compiler.

Second Embodiment

[0031] Next, a second embodiment will be described. The information processing device 100 of the second embodiment compiles source code written in a high-level language such as the C language to generate machine-readable executable code. However, the preprocessor, compiler, and linker, which will be described later, may be executed by different information processing devices. The information processing device 100 may be a client device or a server device. The information processing device 100 may also be called a computer or a compilation device. Note that the information processing device 100 corresponds to the information processing device 10 of the first embodiment.

[0032] Figure 2 shows an example of the hardware of the information processing device according to the second embodiment. The information processing device 100 includes a CPU 101, RAM 102, HDD 103, GPU 104, input interface 105, media reader 106, and communication interface 107 connected to a bus. The CPU 101 corresponds to the processing unit 12 of the first embodiment. The RAM 102 or HDD 103 corresponds to the storage unit 11 of the first embodiment.

[0033] The CPU 101 is a processor that executes program instructions. The CPU 101 loads the program and data stored in the HDD 103 into the RAM 102 and executes the program. The information processing device 100 may have multiple processors.

[0034] RAM 102 is a volatile semiconductor memory that temporarily stores programs executed by the CPU 101 and data used for calculations by the CPU 101. The information processing device 100 may have a volatile memory of a type other than RAM. RAM 102 may be inserted into a RAM interface connected to a bus. Alternatively, a DMA (Direct Memory Access) controller connected to the bus may directly transfer data between RAM 102 and peripheral devices without going through the CPU 101.

[0035] The HDD 103 is a non-volatile storage device that stores software programs such as the operating system (OS), middleware, and application software, as well as data. The information processing device 100 may have other types of non-volatile storage, such as flash memory or an SSD (Solid State Drive).

[0036] The GPU 104 works in conjunction with the CPU 101 to perform image processing and outputs the image to a display device 111 connected to the information processing device 100. The display device 111 is, for example, a CRT (Cathode Ray Tube) display, a liquid crystal display, an organic EL (Electro Luminescence) display, or a projector. Other types of output devices, such as a printer, may be connected to the information processing device 100. The GPU 104 may also be used as a GPGPU (General Purpose Computing on Graphics Processing Unit). The GPU 104 can execute programs in response to instructions from the CPU 101. The information processing device 100 may have volatile semiconductor memory other than RAM 102 as GPU memory.

[0037] The input interface 105 receives input signals from an input device 112 connected to the information processing device 100. The input device 112 is, for example, a mouse, a touch panel, or a keyboard. Multiple input devices may be connected to the information processing device 100.

[0038] The media reader 106 is a reading device that reads programs and data recorded on the recording medium 113. The recording medium 113 is, for example, a magnetic disk, an optical disk, or semiconductor memory. Magnetic disks include flexible disks (FD) and HDDs. Optical disks include CDs (Compact Discs) and DVDs (Digital Versatile Discs). The media reader 106 copies the programs and data read from the recording medium 113 to other recording media such as RAM 102 or HDD 103. The read programs may be executed by the CPU 101.

[0039] The recording medium 113 may be a portable recording medium. The recording medium 113 may be used for distributing programs and data. The recording medium 113 and HDD 103 may also be referred to as computer-readable recording media.

[0040] The communication interface 107 communicates with other information processing devices via the network 114. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or router, or a wireless communication interface connected to a wireless communication device such as a base station or access point.

[0041] Figure 3 is a block diagram showing an example of a CPU structure. The CPU targeted by the compiler, that is, the CPU that executes the executable code generated by the information processing device 100, has CPU cores 121 and 122 and L2 cache memory 123. The target CPU may also be CPU 101 of the information processing device 100.

[0042] CPU core 121 has multiple load / store units, including load / store units 124 and 125; multiple integer units, including integer unit 126; multiple floating-point units, including floating-point unit 127; and L1 cache memory 128. CPU core 122 has hardware similar to CPU core 121. The target CPU may have three or more CPU cores.

[0043] CPU cores 121 and 122 execute machine language instructions in parallel. Load / store units 124 and 125 are arithmetic circuits that execute load instructions to read data from RAM into registers and store instructions to write data from registers to RAM. Load / store units 124 and 125 can execute instructions in parallel with each other. In the following description, load / store unit 124 may be referred to as LSU (Load Store Unit) 0, and load / store unit 125 may be referred to as LSU 1. Execution of a load instruction requires 3 cycles, and execution of a store instruction requires 1 cycle.

[0044] The integer unit 126 is an arithmetic circuit that executes integer arithmetic instructions, such as addition and subtraction instructions, on integer data. The integer unit 126 can execute instructions in parallel with the load / store units 124 and 125. In the following description, the integer unit 126 may be referred to as the ALU (Arithmetic and Logic Unit). Executing an integer arithmetic instruction requires one cycle.

[0045] The floating-point unit 127 is an arithmetic circuit that executes floating-point arithmetic instructions, such as addition and subtraction instructions, on floating-point data. The floating-point unit 127 can execute instructions in parallel with the load / store units 124 and 125 and the integer unit 126. The floating-point unit 127 is sometimes called the FPU (Floating Point Unit). Executing a floating-point arithmetic instruction requires 3 cycles.

[0046] The CPU core 121 may have an instruction pipeline. The instruction pipeline includes multiple stages such as instruction fetch, instruction decode, execution, memory access, and writeback. Each instruction progresses through these multiple stages in a specific order. Circuits in different stages can process different instructions in parallel. When a circuit in one stage is processing an instruction, a circuit in the previous stage can process the next instruction.

[0047] However, instructions with dependencies cannot be fed into the instruction pipeline consecutively, which can lead to a pipeline hazard where some stages of the instruction pipeline become idle. Pipeline hazards are sometimes called stalls. If stalls occur frequently, the execution efficiency of the executable code decreases. One type of dependency between instructions is data dependency, where the result of an operation performed by one instruction is used by a subsequent instruction. Pipeline hazards caused by data dependencies are sometimes called data hazards.
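The cost of a data hazard can be estimated with a very small timing model. The sketch below is a simplification and not the target CPU described here: it assumes a hypothetical in-order machine that issues at most one instruction per cycle and delays an instruction until the result it depends on is available. The `Instr` record and the function `finish_cycle` are illustrative names; the latencies used in the usage note (load 3 cycles, integer operation 1 cycle, store 1 cycle) are the ones stated in the description.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSTR 16

/* Hypothetical instruction record for the timing sketch. */
typedef struct {
    int latency;  /* cycles from issue until the result is available */
    int dep;      /* index of the producing instruction, or -1 if none */
} Instr;

/* Returns the cycle at which the last result becomes available. */
int finish_cycle(const Instr *prog, size_t count) {
    int done[MAX_INSTR];
    int next_issue = 0;
    int last = 0;
    assert(count <= MAX_INSTR);
    for (size_t i = 0; i < count; i++) {
        int issue = next_issue;
        assert(prog[i].dep < (int)i);  /* producers must come earlier */
        if (prog[i].dep >= 0 && done[prog[i].dep] > issue) {
            issue = done[prog[i].dep];  /* stall until the operand is ready */
        }
        done[i] = issue + prog[i].latency;
        next_issue = issue + 1;
        if (done[i] > last) last = done[i];
    }
    return last;
}
```

Under this model, a dependent chain of a load, an integer add, and a store (`{3,-1}, {1,0}, {1,1}`) finishes at cycle 5, while the same three instructions without dependencies finish at cycle 3. The difference is the stall time caused by the data dependencies.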

[0048] The L1 cache memory 128 is a volatile memory used by multiple arithmetic circuits, including load / store units 124 and 125, integer unit 126, and floating-point unit 127. The L1 cache memory 128 is the level 1 cache memory closest to the arithmetic circuits. The L1 cache memory 128 reads instructions and data requested by the arithmetic circuits from the L2 cache memory 123 and stores them temporarily.

[0049] The L2 cache memory 123 is volatile memory used by the CPU cores 121 and 122. The L2 cache memory 123 is a level 2 cache memory that is further from the arithmetic circuit than the L1 cache memory 128. However, the cache memory equivalent to the L2 cache memory 123 is sometimes called the L3 cache memory or LLC (Last Level Cache). The L2 cache memory 123 reads instructions and data requested by the CPU cores 121 and 122 from RAM and stores them temporarily.

[0050] Figure 4 is a block diagram showing an example of the functions of an information processing device. The information processing device 100 includes source code storage units 131 and 132, an executable code storage unit 133, a preprocessor 134, a compiler 137, and a linker 138. The source code storage units 131 and 132 and the executable code storage unit 133 are implemented using, for example, RAM 102 or HDD 103. The preprocessor 134, compiler 137, and linker 138 are implemented using, for example, the CPU 101 and a program.

[0051] The source code storage unit 131 stores the original source code created by the user. The source code is written in, for example, the C language. The source code storage unit 132 stores the source code converted by the preprocessor 134. The converted source code is written in the same programming language as the original source code. The executable code storage unit 133 stores executable code that can be executed on the target CPU. The executable code is written in, for example, machine code. However, if the executable code is executed via middleware, the executable code may be written in a language higher than machine code.

[0052] The preprocessor 134 transforms the source code into an expression suitable for compiler optimization, to the extent that the meaning of the processes defined in the source code does not change, before compiling the source code. The preprocessor 134 is sometimes called a precompiler. The preprocessor 134 has an analysis unit 135 and a rewriting unit 136.

[0053] The analysis unit 135 performs lexical and syntactic analysis on the original source code stored in the source code storage unit 131 and generates an abstract syntax tree. The analysis unit 135 analyzes the abstract syntax tree and detects the rewrite range that satisfies certain conditions. However, the analysis unit 135 may also directly analyze the source code without generating an abstract syntax tree.

[0054] The rewriting unit 136 applies certain rewriting rules to the rewriting range detected by the analysis unit 135 and rewrites at least a portion of the abstract syntax tree. The rewriting unit 136 converts the rewritten abstract syntax tree into source code and stores the converted source code in the source code storage unit 132. However, the rewriting unit 136 may directly rewrite the source code without rewriting the abstract syntax tree. The preprocessor 134 may display the converted source code on the display device 111 or transmit it to another information processing device.

[0055] The compiler 137 compiles the converted source code stored in the source code storage unit 132. The compiler 137 performs lexical analysis, syntactic analysis, and semantic analysis on the source code to generate intermediate code. As compiler optimization, the compiler 137 applies an optimization algorithm to the intermediate code to rewrite it. The compiler 137 converts the intermediate code into object code and outputs it. The object code is written in machine code, for example.

[0056] The linker 138 links the object code output by the compiler 137 with the object code and library programs of other modules to generate executable code. The linker 138 stores the generated executable code in the executable code storage unit 133.

[0057] Next, we will discuss compiler optimizations related to array access. Figure 5 shows an example of the original source code. Source code 141 is stored in source code storage unit 131. Source code 141 contains the function ex1. Function ex1 accepts two arguments, represented by variables n and A. Variable n is an integer used as an index. Variable A is a pointer that indicates the starting address of a character array. Variable A corresponds to the array name.

[0058] A pair of array name and index represents an array access operation, where the element specified by the index is accessed from among the multiple elements contained in the array. Array access is equivalent to calculating the element address by adding the offset indicated by the index containing variable n to the starting address indicated by variable A, and then accessing the data pointed to by the element address. The array access on the left side of the equals sign represents a write operation, where an element is updated. The array access on the right side of the equals sign represents a read operation, where an element is referenced.

[0059] The third line of source code 141 specifies the process of reading the (n+1)th element and the (n-1)th element of array A, and writing the sum of the two elements to the (n+0)th element of array A. The fourth line of source code 141 specifies the process of reading the (n+1)th element and the (n-1)th element of array A, and writing the sum of the two elements to the (n+1)th element of array A.
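Based only on the description of Figure 5 above, function ex1 might look roughly like the following. This is a reconstruction, not the figure itself: the return type `void`, the argument order, and the exact layout are assumptions, and the comments mark which statements correspond to the third and fourth lines of source code 141 as described.

```c
/* Reconstruction of function ex1 from the description (assumed signature). */
void ex1(int n, char *A)
{
    /* third line as described: read A[n+1] and A[n-1], write the sum to A[n+0] */
    A[n + 0] = A[n + 1] + A[n - 1];
    /* fourth line as described: read A[n+1] and A[n-1], write the sum to A[n+1] */
    A[n + 1] = A[n + 1] + A[n - 1];
}
```

Calling `ex1(1, A)` on a three-element array therefore reads `A[2]` and `A[0]` twice, with a write to `A[1]` between the two pairs of reads.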

[0060] Figure 6 shows an example of intermediate code. When compiler 137 compiles source code 141 as is, compiler 137 generates intermediate code 142. In intermediate code 142, array access in the third and fourth lines is defined as low-level operation, as shown in code 142a.

[0061] In source code 141, an expression containing the variable n is used as the index for array access. Therefore, intermediate code 142 specifies that the index value is calculated as an offset from the value of variable n, the element address is calculated by adding the offset to the starting address of array A, and the memory is accessed using the element address. Because array access is expressed as low-level register operations and memory access, intermediate code 142 may contain less information about array access than source code 141.
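The loss of index information can be pictured by writing one array read in the lowered, step-by-step form that the paragraph describes. The sketch below is an illustration in C rather than actual intermediate code 142; the function name `load_element` and the local variable names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical lowered form of the read A[n + 1]. */
char load_element(const char *A, int n)
{
    int64_t offset = (int64_t)n + 1;   /* index value computed from variable n */
    const char *address = A + offset;  /* starting address of A plus the offset */
    return *address;                   /* memory access using the element address */
}
```

Once the access is split into these three steps, the fact that two loads use the same index expression `n + 1` is no longer stated directly and has to be rediscovered by comparing address computations.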

[0062] Looking at source code 141, the right-hand side of the third line specifies the reading of elements A[n+1] and A[n-1]. The left-hand side of the third line specifies the writing of element A[n+0]. The right-hand side of the fourth line specifies the reading of elements A[n+1] and A[n-1]. Between the two readings of elements A[n+1] and A[n-1], no writing to elements A[n+1] and A[n-1] occurs, and the value of variable n is not updated. Also, the writing of element A[n+0] that occurs between the two readings does not affect the values of elements A[n+1] and A[n-1]. Therefore, the values read in the two readings are the same.

[0063] Therefore, it might seem that compiler 137 could retain the elements A[n+1] and A[n-1] read in the third line and generate object code that omits reading the elements A[n+1] and A[n-1] in the fourth line. However, unlike source code 141, intermediate code 142 lacks the index information expressed as an expression containing the variable n. Furthermore, compiler optimization applies its optimization algorithms to groups of instructions that fall within a window of fixed size.

[0064] Therefore, compiler 137, which performs compiler optimization at the intermediate code level, finds it difficult to analyze and optimize multiple array accesses over a wide range of code, as described above. Compiler 137 cannot definitively determine that the element written on the left side of the third line and the element read on the right side of the fourth line are different elements, and therefore judges that they may constitute a RAW dependency. As a result, compiler 137 may abandon compiler optimization in order to preserve the meaning of the process.

[0065] Figure 7 shows an example of a schedule table. If source code 141 is compiled as is, compiler 137 may generate object code as shown in schedule table 143. In schedule table 143, w0, w2, w4, and w5 are 32-bit registers, and x1, x2, and x3 are 64-bit registers. At the time function ex1 is called, the pointer to array A is stored in register x1, and the value of variable n is stored in register w0. sxtw is an instruction that converts the bit width by sign-extending a 32-bit value to 64 bits. ldrb is an 8-bit load instruction. strb is an 8-bit store instruction. The instructions ldrb and strb specify the memory address in the form [base address, offset].

[0066] In the first cycle, the ALU performs a bit-width conversion of the value of variable n. In the second cycle, the ALU calculates n+1. In the third cycle, the ALU calculates n-1. In the fourth cycle, LSU0 reads A[n+1] from memory, and LSU1 reads A[n-1] from memory. The fifth and sixth cycles are spent waiting for the completion of the load instructions for LSU0 and LSU1, and correspond to a stall.

[0067] In the 7th cycle, the ALU calculates A[n+1] + A[n-1]. In the 8th cycle, LSU0 reads A[n+1] from memory, and LSU1 writes A[n+0] = A[n+1] + A[n-1] to memory. In the 9th cycle, LSU1 reads A[n-1] from memory. Cycles 10 and 11 are spent waiting for the load instructions of LSU0 and LSU1 to complete, and correspond to a stall.

[0068] In the 12th cycle, the ALU calculates A[n+1]+A[n-1]. In the 13th cycle, LSU1 writes A[n+1]=A[n+1]+A[n-1] to memory. In the 14th cycle, the ALU returns to the caller of function ex1.

[0069] Thus, the compiler 137 is unable to determine that the writing of A[n+0] and the subsequent reading of A[n+1] and A[n-1] are independent of each other, and, to be safe, reads A[n+1] and A[n-1] again. As a result, the number of load instructions increases, and stalls increase. Therefore, the preprocessor 134 transforms the source code 141 before compilation so that the compiler 137 does not have to consider the possibility of RAW.

[0070] Figure 8 shows an example of the converted source code. Source code 144 is converted from source code 141 and stored in source code storage unit 132. The third line of source code 144 declares two character variables, temp_1 and temp_2, which have the same data type as the elements of array A. Variables temp_1 and temp_2 are new temporary variables that are not included in source code 141.

[0071] Line 4 of source code 144 specifies the process of reading the (n+1)th element of array A and assigning it to the variable temp_1. Line 5 of source code 144 specifies the process of reading the (n-1)th element of array A and assigning it to the variable temp_2. Line 6 of source code 144 specifies the process of writing the sum of the values of variables temp_1 and temp_2 to the (n+0)th element of array A. Line 7 of source code 144 specifies the process of writing the sum of the values of variables temp_1 and temp_2 to the (n+1)th element of array A.
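Because Figure 8 is not reproduced here, the following is an assumed reconstruction of source code 144, placed next to an assumed reconstruction of source code 141 so that the two can be compared. The names ex1_original, ex1_converted, and same_result are hypothetical, as are the return types and the sample data.

```c
#include <assert.h>
#include <string.h>

/* Assumed reconstruction of source code 141 (original). */
void ex1_original(int n, char *A)
{
    A[n + 0] = A[n + 1] + A[n - 1];
    A[n + 1] = A[n + 1] + A[n - 1];
}

/* Assumed reconstruction of source code 144 (converted). */
void ex1_converted(int n, char *A)
{
    char temp_1, temp_2;          /* line 3 */
    temp_1 = A[n + 1];            /* line 4 */
    temp_2 = A[n - 1];            /* line 5 */
    A[n + 0] = temp_1 + temp_2;   /* line 6 */
    A[n + 1] = temp_1 + temp_2;   /* line 7 */
}

/* Returns 1 when both versions leave a sample buffer in the same state. */
int same_result(int n)
{
    char a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(n >= 1 && n <= 6);     /* keeps n-1 and n+1 inside the buffer */
    ex1_original(n, a);
    ex1_converted(n, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The comparison succeeds because the write to A[n+0] never changes A[n+1] or A[n-1], which is exactly the property the conversion relies on.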

[0072] In source code 144, array accesses that required consideration of whether or not they were RAW have been eliminated. As a result, compiler 137 can perform compiler optimizations that reduce stalls by omitting the second read of the same element.

[0073] Figure 9 shows an example of an optimized schedule table. When source code 144 is compiled, compiler 137 may generate object code as shown in schedule table 145. In the first cycle, the ALU performs a bit-width conversion of the value of variable n. In the second cycle, the ALU calculates n+1. In the third cycle, the ALU calculates the address of A[n+0].

[0074] In the fourth cycle, LSU0 reads A[n+1] from memory, and LSU1 reads A[n-1] from memory. The fifth and sixth cycles are spent waiting for the load instructions for LSU0 and LSU1 to complete, and correspond to a stall. In the seventh cycle, the ALU calculates A[n+1] + A[n-1]. In the eighth cycle, the ALU extracts the lower 8 bits of A[n+1] + A[n-1].

[0075] In the ninth cycle, LSU0 writes A[n+1]=A[n+1]+A[n-1] to memory, and LSU1 writes A[n+0]=A[n+1]+A[n-1] to memory. In the tenth cycle, the ALU returns to the caller of function ex1. Thus, the object code that compiler 137 generates from source code 144 takes 4 fewer cycles than the object code generated from source code 141 (10 cycles instead of 14). Stall cycles are also reduced by 2 (from 4 to 2).

[0076] Next, we will explain the source code conversion method performed by the preprocessor 134. The preprocessor 134 extracts array accesses from the source code as pairs of array names and indices. The preprocessor 134 distinguishes between reading and writing array elements, recording reads in a read list and writes in a write list. At this time, the preprocessor 134 records the read and write positions in the source code separately for each pair of array name and index. The preprocessor 134 also records updates to the values of variables included in the index in the write list. Variables included in the index may be referred to as index variables below.

[0077] The preprocessor 134 determines one or more rewrite range candidates in the source code for each array name and index pair included in the read list. The beginning of a rewrite range candidate is the first read position. The end of a rewrite range candidate is the last read position. However, if a write to the same index or an update of the index variable occurs between the first and last read positions, the end of the rewrite range candidate is the position of the first such write. In that case, the beginning of the next rewrite range candidate is the first read position following the write position that became the end.
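As a minimal sketch of how rewrite range candidates might be determined for one array name and index pair, the following C code splits the read positions at each intervening write. The type Candidate and the function split_candidates are hypothetical names. The sketch also assumes that positions are line numbers and that a read on the same line as a write is evaluated before that write, since the right-hand side is computed before the store.

```c
#include <assert.h>
#include <stddef.h>

/* One rewrite range candidate, expressed in source line numbers. */
typedef struct {
    int begin;       /* line of the first read in the candidate */
    int end;         /* line of the last read, or of the intervening write */
    int read_count;  /* number of reads that fall inside the candidate */
} Candidate;

/*
 * Split the reads of one (array name, index) pair into candidates.
 * reads:    ascending line numbers where the element is read
 * blockers: ascending line numbers where the same element is written
 *           or the index variable is updated
 * Returns the number of candidates stored in out (at most max_out).
 */
size_t split_candidates(const int *reads, size_t n_reads,
                        const int *blockers, size_t n_blockers,
                        Candidate *out, size_t max_out)
{
    size_t count = 0, i = 0;
    assert(out != NULL);
    while (i < n_reads && count < max_out) {
        int begin = reads[i];
        int end = reads[n_reads - 1];            /* default: the last read */
        for (size_t j = 0; j < n_blockers; j++) {
            if (blockers[j] >= begin && blockers[j] < end) {
                end = blockers[j];               /* cut at the first write */
                break;
            }
        }
        Candidate c = { begin, end, 0 };
        while (i < n_reads && reads[i] <= end) { /* consume covered reads */
            c.read_count++;
            i++;
        }
        out[count++] = c;
    }
    return count;
}
```

For example, reads on lines 9 and 11 with writes on lines 10 and 11 yield two candidates, lines 9 to 10 and line 11 alone, each containing a single read.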

[0078] From the rewrite range candidates determined above, the preprocessor 134 selects as rewrite ranges those that contain multiple reads of the element and, between those reads, a write to the same array at a different index. A rewrite range is a code range in which the write and the subsequent read have no actual RAW dependency, but which the compiler 137 may mistakenly judge to be RAW.
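The selection condition above might be sketched in C as a single predicate. The function name is_rewrite_range and its parameters are hypothetical. As an assumption, positions are line numbers, and a write on the same line as a read is treated as occurring after that read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a rewrite range candidate is adopted as a rewrite range.
 * reads_in_range: ascending lines, inside the candidate, where the element
 *                 is read
 * other_writes:   lines where the same array is written at a different index
 */
bool is_rewrite_range(const int *reads_in_range, size_t n_reads,
                      const int *other_writes, size_t n_writes)
{
    if (n_reads < 2)
        return false;                 /* a single read leaves nothing to omit */
    assert(reads_in_range != NULL);
    int first = reads_in_range[0];
    int last = reads_in_range[n_reads - 1];
    for (size_t j = 0; j < n_writes; j++) {
        /* A store on the line of the first read follows that read;
         * a store on the line of the last read follows that read as well. */
        if (other_writes[j] >= first && other_writes[j] < last)
            return true;
    }
    return false;                     /* no write between the reads */
}
```

A candidate with reads on lines 3 and 4 and a different-index write on line 3 is adopted, while a candidate with two reads and no write to the same array is not.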

[0079] The preprocessor 134 rewrites the source code for each rewrite range. Immediately before the rewrite range, the preprocessor 134 inserts a declaration statement to declare a new temporary variable and an assignment statement to assign array elements that are read multiple times to the temporary variable. The preprocessor 134 replaces the reads of array elements within the rewrite range with references to the temporary variable. As a result, the preprocessor 134 outputs the transformed source code.

[0080] Figure 10 shows another example of the original source code. Here, the source code conversion method will be explained using source code 146. Source code 146 is the original source code stored in the source code storage unit 131. Lines 3 and 4 of source code 146 are the same as those of source code 141.

[0081] Line 6 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array B, and writing the sum of the two elements to the (n+0)th element of array B. Line 7 of source code 146 specifies the process of reading the (n+2)th element and the (n-1)th element of array B, and writing the sum of the two elements to the (n+1)th element of array B.

[0082] Line 9 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array C, and writing the sum of the two elements to the (n+0)th element of array C. Line 10 of source code 146 specifies the process of writing a constant to the (n+1)th element of array C. Line 11 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array C, and writing the sum of the two elements to the (n+1)th element of array C.

[0083] Line 13 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array D, and writing the sum of the two elements to the (n+0)th element of array D. Line 14 of source code 146 specifies the process of updating the value of variable n. Line 15 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array D, and writing the sum of the two elements to the (n+1)th element of array D.

[0084] Line 17 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array E, and writing the sum of the two elements to the (n+0)th element of array D. Line 18 of source code 146 specifies the process of reading the (n+1)th element and the (n-1)th element of array E, and writing the sum of the two elements to the (n+1)th element of array D.
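Because Figure 10 is not reproduced here, the following is an assumed reconstruction of source code 146 from the description above. The function name ex2, its signature, the constant written on line 10, and the form of the update on line 14 are all hypothetical. The trailing comments give the line numbers used in the text, and ex2_demo is a hypothetical usage check.

```c
#include <assert.h>

/* Assumed reconstruction of source code 146. */
void ex2(int n, char *A, char *B, char *C, char *D, char *E)
{
    A[n + 0] = A[n + 1] + A[n - 1];   /* line 3  */
    A[n + 1] = A[n + 1] + A[n - 1];   /* line 4  */

    B[n + 0] = B[n + 1] + B[n - 1];   /* line 6  */
    B[n + 1] = B[n + 2] + B[n - 1];   /* line 7  */

    C[n + 0] = C[n + 1] + C[n - 1];   /* line 9  */
    C[n + 1] = 0;                     /* line 10: constant value assumed */
    C[n + 1] = C[n + 1] + C[n - 1];   /* line 11 */

    D[n + 0] = D[n + 1] + D[n - 1];   /* line 13 */
    n = n + 1;                        /* line 14: update form assumed */
    D[n + 1] = D[n + 1] + D[n - 1];   /* line 15 */

    D[n + 0] = E[n + 1] + E[n - 1];   /* line 17 */
    D[n + 1] = E[n + 1] + E[n - 1];   /* line 18 */
}

/* Hypothetical usage check on array A only (lines 3 and 4). */
void ex2_demo(void)
{
    char A[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char B[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char C[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char D[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char E[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    ex2(2, A, B, C, D, E);
    assert(A[2] == 6);   /* A[3] + A[1] = 4 + 2 */
    assert(A[3] == 6);   /* A[3] + A[1] = 4 + 2 */
}
```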

[0085] Figure 11 shows an example of an array access table. The preprocessor 134 generates an array access table 147 by analyzing the source code 146. The array access table 147 combines the functions of the read list and write list described above. The array access table 147 includes entries for array elements, read positions, write positions, and rewrite flags.

[0086] Array elements are represented by a pair of array name and index. The read position is the line number of the line in the source code where the array name and index pair appears on the right-hand side of the equals sign. The write position is the line number of the line in the source code where the array name and index pair appears on the left-hand side of the equals sign. The rewrite flag indicates whether or not the rewrite range has been obtained.
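One possible in-memory layout for a table like array access table 147 is sketched below in C. The type AccessEntry, the capacity MAX_POSITIONS, and the helper functions are hypothetical; the drawings do not specify a concrete data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_POSITIONS 8   /* hypothetical capacity per entry */

/* One row of a table like array access table 147. */
typedef struct {
    const char *element;              /* array name and index, e.g. "A[n+1]" */
    int read_lines[MAX_POSITIONS];    /* lines with the pair on the right-hand side */
    size_t n_reads;
    int write_lines[MAX_POSITIONS];   /* lines with the pair on the left-hand side */
    size_t n_writes;
    bool rewrite;                     /* set once a rewrite range has been obtained */
} AccessEntry;

/* Record one read position in an entry. */
void add_read(AccessEntry *entry, int line)
{
    assert(entry->n_reads < MAX_POSITIONS);
    entry->read_lines[entry->n_reads++] = line;
}

/* Record one write position in an entry. */
void add_write(AccessEntry *entry, int line)
{
    assert(entry->n_writes < MAX_POSITIONS);
    entry->write_lines[entry->n_writes++] = line;
}

/* Look up an entry by its element text; returns NULL when it is absent. */
AccessEntry *find_entry(AccessEntry *table, size_t n_entries,
                        const char *element)
{
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(table[i].element, element) == 0)
            return &table[i];
    }
    return NULL;
}
```

Under this layout, the entry for A[n+1] in source code 146 would hold read positions 3 and 4 and write position 4.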

[0087] The array elements read in source code 146 are A[n+1], A[n-1], B[n+1], B[n-1], B[n+2], C[n+1], C[n-1], D[n+1], D[n-1], E[n+1], E[n-1].

[0088] The candidate rewrite range for A[n+1] is from the right-hand side of the third line to the right-hand side of the fourth line. This candidate rewrite range is valid because the write of A[n+0] occurs between the two reads of A[n+1]. The candidate rewrite range for A[n-1] is from the right-hand side of the third line to the right-hand side of the fourth line. This candidate rewrite range is valid because the write of A[n+0] occurs between the two reads of A[n-1].

[0089] The only possible rewrite range for B[n+1] is the right-hand side of the 6th line. This rewrite range does not include more than one read operation, so it does not qualify as a rewrite range. The only possible rewrite range for B[n-1] is from the right-hand side of the 6th line to the right-hand side of the 7th line. This rewrite range is valid because the write operation for B[n+0] occurs between the two read operations for B[n-1]. The only possible rewrite range for B[n+2] is the right-hand side of the 7th line. This rewrite range does not include more than one read operation, so it does not qualify as a rewrite range.

[0090] The possible rewrite ranges for C[n+1] are the range from the right-hand side of line 9 to the left-hand side of line 10, and the range of only the right-hand side of line 11. Neither range contains more than one read operation, so they do not qualify as rewrite ranges. The possible rewrite range for C[n-1] is from the right-hand side of line 9 to the right-hand side of line 11. This candidate rewrite range qualifies as a rewrite range because there are write operations on C[n+0] and C[n+1] between the two read operations on C[n-1].

[0091] The possible rewrite ranges for D[n+1] are the range from the right-hand side of the 13th line to the 14th line, and the range of only the right-hand side of the 15th line. Neither range contains more than one read operation, so they do not qualify as rewrite ranges. The possible rewrite ranges for D[n-1] are the range from the right-hand side of the 13th line to the 14th line, and the range of only the right-hand side of the 15th line. Neither range contains more than one read operation, so they do not qualify as rewrite ranges.

[0092] The candidate rewrite range for E[n+1] is from the right-hand side of the 17th line to the right-hand side of the 18th line. This candidate rewrite range does not apply because there is no writing to array E between the two reads of E[n+1]. The candidate rewrite range for E[n-1] is from the right-hand side of the 17th line to the right-hand side of the 18th line. This candidate rewrite range does not apply because there is no writing to array E between the two reads of E[n-1]. Therefore, the array elements that will be replaced by temporary variables are A[n+1], A[n-1], B[n-1], and C[n-1].

[0093] Figure 12 shows another example of the converted source code. The preprocessor 134 converts source code 146 into source code 148. Source code 148 is stored in source code storage unit 132. The third line of source code 148 declares the variables temp_1, temp_2, temp_3, and temp_4.

[0094] Line 4 of source code 148 specifies the process of reading the (n+1)th element of array A and assigning it to the variable temp_1. Line 5 of source code 148 specifies the process of reading the (n-1)th element of array A and assigning it to the variable temp_2. Line 6 of source code 148 specifies the process of writing the sum of the values of variables temp_1 and temp_2 to the (n+0)th element of array A. Line 7 of source code 148 specifies the process of writing the sum of the values of variables temp_1 and temp_2 to the (n+1)th element of array A.

[0095] Line 9 of source code 148 specifies the process of reading the (n-1)th element of array B and assigning it to the variable temp_3. Line 10 of source code 148 specifies the process of reading the (n+1)th element of array B, adding the value of variable temp_3 to it, and writing it to the (n+0)th element of array B. Line 11 of source code 148 specifies the process of reading the (n+2)th element of array B, adding the value of variable temp_3 to it, and writing it to the (n+1)th element of array B.

[0096] Line 13 of source code 148 specifies the process of reading the (n-1)th element of array C and assigning it to the variable temp_4. Line 14 of source code 148 specifies the process of reading the (n+1)th element of array C, adding the value of variable temp_4 to it, and writing it to the (n+0)th element of array C. Line 16 of source code 148 specifies the process of reading the (n+1)th element of array C, adding the value of variable temp_4 to it, and writing it to the (n+1)th element of array C.
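Because Figures 10 and 12 are not reproduced here, the following places assumed reconstructions of source code 146 and source code 148 side by side and compares their results. The function names, the signature, the constant written to C[n+1], and the form of the update of n are hypothetical. The D and E statements of source code 148 are not described in the text and are assumed to be carried over unchanged.

```c
#include <assert.h>
#include <string.h>

/* Assumed reconstruction of source code 146 (original). */
void ex2_original(int n, char *A, char *B, char *C, char *D, char *E)
{
    A[n + 0] = A[n + 1] + A[n - 1];
    A[n + 1] = A[n + 1] + A[n - 1];

    B[n + 0] = B[n + 1] + B[n - 1];
    B[n + 1] = B[n + 2] + B[n - 1];

    C[n + 0] = C[n + 1] + C[n - 1];
    C[n + 1] = 0;                          /* constant value assumed */
    C[n + 1] = C[n + 1] + C[n - 1];

    D[n + 0] = D[n + 1] + D[n - 1];
    n = n + 1;                             /* update form assumed */
    D[n + 1] = D[n + 1] + D[n - 1];

    D[n + 0] = E[n + 1] + E[n - 1];
    D[n + 1] = E[n + 1] + E[n - 1];
}

/* Assumed reconstruction of source code 148 (converted). */
void ex2_converted(int n, char *A, char *B, char *C, char *D, char *E)
{
    char temp_1, temp_2, temp_3, temp_4;   /* line 3  */
    temp_1 = A[n + 1];                     /* line 4  */
    temp_2 = A[n - 1];                     /* line 5  */
    A[n + 0] = temp_1 + temp_2;            /* line 6  */
    A[n + 1] = temp_1 + temp_2;            /* line 7  */

    temp_3 = B[n - 1];                     /* line 9  */
    B[n + 0] = B[n + 1] + temp_3;          /* line 10 */
    B[n + 1] = B[n + 2] + temp_3;          /* line 11 */

    temp_4 = C[n - 1];                     /* line 13 */
    C[n + 0] = C[n + 1] + temp_4;          /* line 14 */
    C[n + 1] = 0;                          /* line 15: constant value assumed */
    C[n + 1] = C[n + 1] + temp_4;          /* line 16 */

    /* D and E statements: assumed to be carried over unchanged. */
    D[n + 0] = D[n + 1] + D[n - 1];
    n = n + 1;
    D[n + 1] = D[n + 1] + D[n - 1];

    D[n + 0] = E[n + 1] + E[n - 1];
    D[n + 1] = E[n + 1] + E[n - 1];
}

/* Returns 1 when both versions leave five separate sample buffers identical. */
int same_result(int n)
{
    char a[5][8], b[5][8];
    assert(n >= 1 && n <= 5);              /* keeps every index within 0..7 */
    for (int k = 0; k < 5; k++)
        for (int i = 0; i < 8; i++)
            a[k][i] = b[k][i] = (char)(i + k);
    ex2_original(n, a[0], a[1], a[2], a[3], a[4]);
    ex2_converted(n, b[0], b[1], b[2], b[3], b[4]);
    return memcmp(a, b, sizeof a) == 0;
}
```

The comparison assumes that the five arrays occupy separate, non-overlapping buffers.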

[0097] Next, the processing procedure of the information processing device 100 will be described. Figure 13 is a flowchart showing an example of the compilation procedure. (S10) The analysis unit 135 performs syntactic analysis on the source code.

[0098] (S11) The analysis unit 135 determines whether the source code contains a next code block. A code block is a set of code segments separated based on control structures such as function definitions, if statements, while statements, and for statements. If a next code block exists, the process proceeds to step S12; otherwise, the process proceeds to step S28.

[0099] (S12) The analysis unit 135 reads one line of code contained in the code block. (S13) The analysis unit 135 determines whether the read code includes array access or updating of an index variable. If it includes array access or updating of an index variable, the process proceeds to step S14; otherwise, the process proceeds to step S17.

[0100] (S14) The analysis unit 135 determines whether the read code includes reading an array element. If it includes reading an array element, the process proceeds to step S15; if it includes writing an array element or updating an index variable, the process proceeds to step S16.

[0101] (S15) The analysis unit 135 associates the line number of the read code with the array name and index pair and records it in the read list. Then the process proceeds to step S17. (S16) The analysis unit 135 associates the line number of the read code with the pair of array name and index and records it in the write list. In the case of an update of an index variable, the analysis unit 135 identifies and records the array element that uses that index variable.

[0102] (S17) The analysis unit 135 determines whether the code block contains the next line. If there is a next line, the process returns to step S12; otherwise, the process proceeds to step S18. Figure 14 is a flowchart (continued 1) showing an example of the compilation procedure.
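Steps S12 to S17 might be sketched in C as a loop that routes each access to the read list or the write list. The types Access, Record, and RecordList, the fixed capacity, and the function classify_block are hypothetical. As a simplification, the sketch takes accesses that have already been extracted from the source lines, handles each access on a line independently, and records an index variable update under the variable name rather than resolving it to the array elements that use it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { READ_ELEMENT, WRITE_ELEMENT, UPDATE_INDEX_VAR } AccessKind;

/* One access found on a source line (hypothetical pre-parsed form). */
typedef struct {
    int line;
    AccessKind kind;
    const char *target;   /* "A[n+1]" for an element, "n" for an index variable */
} Access;

/* One record of the read list or the write list. */
typedef struct {
    int line;
    const char *target;
} Record;

typedef struct {
    Record items[32];     /* hypothetical fixed capacity */
    size_t count;
} RecordList;

static void append(RecordList *list, int line, const char *target)
{
    assert(list->count < sizeof list->items / sizeof list->items[0]);
    list->items[list->count].line = line;
    list->items[list->count].target = target;
    list->count++;
}

/* Reads go to the read list (S15); element writes and index variable
 * updates go to the write list (S16). */
void classify_block(const Access *accesses, size_t n,
                    RecordList *reads, RecordList *writes)
{
    for (size_t i = 0; i < n; i++) {
        if (accesses[i].kind == READ_ELEMENT)
            append(reads, accesses[i].line, accesses[i].target);
        else
            append(writes, accesses[i].line, accesses[i].target);
    }
}
```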

[0103] (S18) The analysis unit 135 selects one pair of array name and index from the read list generated through step S15. (S19) The analysis unit 135 detects the first read position for the selected array name and index pair. The read position is the line number recorded in the read list.

[0104] (S20) The analysis unit 135 detects the last read position for the selected array name and index pair. However, if there are one or more write positions before the last read position, the analysis unit 135 instead detects the first write position that follows the first read position. The write position is the line number recorded in the write list.

[0105] (S21) The analysis unit 135 determines that the range from the position in step S19 to the position in step S20 is a rewrite range candidate. The analysis unit 135 determines whether there are multiple reads of the array element within the rewrite range candidate. If there are multiple reads, the process proceeds to step S22; otherwise, the process proceeds to step S24.

[0106] (S22) The analysis unit 135 determines whether there are any writes of array elements with the same array name but a different index between the multiple reads. If there are any such writes, the process proceeds to step S23; otherwise, the process proceeds to step S24.

[0107] (S23) The analysis unit 135 adopts the rewrite range candidate determined in step S21 as a rewrite range and records it in association with the array name and index pair. (S24) The analysis unit 135 determines whether a next array name and index pair exists in the read list. If a next array name and index pair exists, the process returns to step S18; otherwise, the process proceeds to step S25.

[0108] (S25) The rewriting unit 136 inserts a declaration statement to declare a temporary variable for the array name and index pair that has been selected for rewriting. (S26) The rewriting unit 136 inserts an assignment statement immediately before each rewriting range, which reads an array element and assigns it to a temporary variable.

[0109] (S27) For each rewrite range, the rewriting unit 136 replaces the reads of array elements within the rewrite range with references to the temporary variable. Figure 15 is a flowchart (continued 2) showing an example of the compilation procedure.

[0110] (S28) The rewriting unit 136 outputs the converted source code. (S29) The compiler 137 compiles the converted source code. At this time, the compiler 137 generates intermediate code from the source code, performs compiler optimization on the intermediate code, and generates object code from the optimized intermediate code.

[0111] (S30) The linker 138 links the object code output by the compiler 137 with other object code and library programs to generate executable code. (S31) The linker 138 outputs the executable code.

[0112] As described above, the information processing device 100 of the second embodiment compiles source code written in a high-level language to generate machine-readable executable code. At this time, the information processing device 100 performs compiler optimization on the intermediate code generated from the source code. This may reduce redundant instructions, and instruction scheduling such as instruction parallelization and instruction execution order changes may be performed to reduce stalls. As a result, the execution efficiency of the program is improved and the execution time is shortened.

[0113] Furthermore, the information processing device 100 detects, at the source code level, code that the compiler might mistakenly identify as RAW in array accesses that use array names and indices. The information processing device 100 rewrites the source code using temporary variables so that reads of the same array after a write are not expressed as array names and indices. The information processing device 100 then compiles the rewritten source code. This allows proper compiler optimization of the intermediate code, improving program execution efficiency and reducing execution time.

[Explanation of symbols]

[0114]
10 Information processing device
11 Storage section
12 Processing unit
13, 14 Source code
15, 16, 17, 18, 19 Code

Claims

1. A source code conversion program that causes a computer to execute a process comprising: detecting, from source code, a first code that references an element specified in array data using a first index containing a first variable, a second code that, after the first code, updates an element specified in the array data using a second index different from the first index, and a third code that, after the second code, references the element specified in the array data using the first index; inserting, before the second code, a fourth code that assigns the element specified in the array data using the first index to a second variable; and replacing the third code with a fifth code that references the second variable.

2. The source code conversion program according to claim 1, wherein the fourth code is inserted before the first code, and the replacing of the third code further includes replacing the first code with a sixth code that references the second variable.

3. The source code conversion program according to claim 1, wherein the process further comprises generating other source code that is converted from the source code by the insertion of the fourth code and the replacement of the third code, and compiling the other source code using a compiler.

4. A source code conversion method in which a computer executes a process comprising: detecting, from source code, a first code that references an element specified in array data using a first index containing a first variable, a second code that, after the first code, updates an element specified in the array data using a second index different from the first index, and a third code that, after the second code, references the element specified in the array data using the first index; inserting, before the second code, a fourth code that assigns the element specified in the array data using the first index to a second variable; and replacing the third code with a fifth code that references the second variable.