Instruction processing method and apparatus, processor, electronic device, and storage medium
By identifying and backfilling branch instructions (excluding the end-branch instruction) in the loop body instructions and using a branch predictor to obtain prediction information, the problem of low instruction fetch efficiency in traditional processors when processing multi-branch loop body instructions is solved, achieving more efficient instruction fetching and reduced power consumption.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING ESWIN COMPUTING TECH CO LTD
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional processors suffer from low instruction fetch efficiency when processing loop instructions, especially when there are branch instructions other than the loop termination branch instruction within the loop body, which leads to complex instruction fetch logic and reduced efficiency.
By identifying the reverse jump branch instruction in the loop body instruction, recording its information, and using it as the end branch instruction of the loop body instruction when the conditions are met, a backfilling operation is performed to backfill the information of the branch instructions other than the end branch instruction to the instruction information buffer, and the prediction information is obtained by using the branch predictor, thus expanding the recognition range of the loop body instruction.
It improves the efficiency of the instruction fetch unit in accessing the I-cache, reduces power consumption, and maintains instruction fetch efficiency and accuracy when there are branch prediction errors, thus solving the problem of limited recognition range in traditional designs.
Abstract
Description
Technical Field
[0001] Embodiments of this disclosure relate to an instruction processing method and apparatus, a processor, an electronic device, and a storage medium.
Background Technology
[0002] A loop buffer (Lbuf) is a hardware structure introduced in modern processors to optimize loop execution efficiency. It is motivated by the fact that loop body instructions occur frequently during program execution. When processing a loop, a traditional processor must repeatedly fetch the same instructions from the instruction cache (I-Cache) on every iteration, which consumes power and can degrade performance because of cache access latency. An Lbuf instead caches the instructions of the loop body in a small, very fast, low-power dedicated memory at the front end of the processor. When another loop iteration is detected, the instruction fetch unit (IFU) can read instructions directly from this buffer without accessing the instruction cache, thereby reducing fetch power consumption and fetch latency.
Summary of the Invention
[0003] At least one embodiment of this disclosure provides an instruction processing method, including: in response to identifying a first conditional branch instruction with a back jump from an instruction stream, recording instruction information of the first conditional branch instruction; in response to identifying the first conditional branch instruction with a back jump at least once more, determining that the first conditional branch instruction is an end branch instruction in a loop body instruction and performing a backfilling operation on the loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the first conditional branch instruction; performing the backfilling operation on the loop body instruction includes: backfilling the instruction information of the second branch instruction into an instruction information buffer; and calling a branch predictor to obtain first prediction information of the second branch instruction, and backfilling the first prediction information into a branch instruction information buffer.
[0004] For example, at least one embodiment of the instruction processing method provided in this disclosure further includes: in the process of popping the second branch instruction from the instruction information buffer, performing branch prediction on the second branch instruction to obtain second prediction information; in response to the first prediction information and the second prediction information being consistent, popping the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exiting the loop of the loop body instruction.
[0005] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the second branch instruction includes a conditional branch instruction or a JALR instruction.
[0006] For example, in the instruction processing method provided in at least one embodiment of this disclosure, when the second branch instruction is a conditional branch instruction, the first prediction information and the second prediction information include the jump information of the second branch instruction. In response to the consistency between the first prediction information and the second prediction information, the next instruction recorded after the second branch instruction is popped from the instruction information buffer, including: in response to both the first prediction information and the second prediction information indicating no jump, the next instruction is obtained by summing the memory address and instruction width of the second branch instruction; or, in response to both the first prediction information and the second prediction information indicating a jump, the next instruction is the first target instruction to which the first prediction information or the second prediction information indicates a jump.
[0007] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the first prediction information and the second prediction information are inconsistent, including: the first prediction information indicates no jump and the second prediction information indicates jump; before exiting the loop body instruction, the instruction processing method further includes: in response to the first prediction information indicating no jump and the second prediction information indicating jump, prefetching the second target instruction that the second prediction information indicates jump and placing the second target instruction in the instruction stream.
[0008] For example, in the instruction processing method provided in at least one embodiment of this disclosure, when the second branch instruction is a JALR instruction, the first prediction information includes the memory address of the third target instruction to which the second branch instruction indicates a jump, and the second prediction information includes the memory address of the fourth target instruction to which the second branch instruction indicates a jump. In response to the first prediction information and the second prediction information being consistent, the next instruction recorded after the second branch instruction is popped from the instruction information buffer, including: in response to the memory address of the third target instruction and the memory address of the fourth target instruction being consistent, the next instruction is either the third target instruction or the fourth target instruction.
[0009] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the inconsistency between the first prediction information and the second prediction information includes: the memory address of the third target instruction and the memory address of the fourth target instruction are inconsistent; before exiting the loop of the loop body instruction, the instruction processing method further includes: in response to the memory address of the third target instruction and the memory address of the fourth target instruction being inconsistent, prefetching the fourth target instruction and placing the fourth target instruction in the instruction stream.
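The pop-time comparison described in paragraphs [0004] to [0009] can be sketched as follows. This is a minimal illustration, not the claimed circuit: the names `Prediction` and `resolve_pop` are hypothetical, addresses and widths are assumed to be in bytes, and a JALR instruction is modeled as a prediction that always jumps and differs only in its target. The case in which the first prediction indicates a jump and the second indicates no jump is not detailed in the text, so the sketch simply exits the loop without a prefetch target in that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record of one branch prediction."""
    taken: bool                   # predicted direction (always True for JALR)
    target: Optional[int] = None  # predicted target address when taken


def resolve_pop(pc: int, width: int, first: Prediction,
                second: Prediction) -> Tuple[str, Optional[int]]:
    """Compare the backfill-time (first) and pop-time (second) predictions.

    Returns ("continue", next_pc) when they agree, otherwise
    ("exit_loop", prefetch_target), where prefetch_target may be None.
    """
    same_direction = first.taken == second.taken
    same_target = (not first.taken) or first.target == second.target
    if same_direction and same_target:
        # Consistent: fall through to pc + width, or follow the agreed target.
        next_pc = first.target if first.taken else pc + width
        return ("continue", next_pc)
    # Inconsistent: leave the loop; prefetch the pop-time target if there is one.
    prefetch_target = second.target if second.taken else None
    return ("exit_loop", prefetch_target)
```

With a conditional branch at 0x1004 of width 4, two "no jump" predictions yield `("continue", 0x1008)`, while a backfill-time "no jump" followed by a pop-time jump to 0x100C yields `("exit_loop", 0x100C)`.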
[0010] For example, in the instruction processing method provided in at least one embodiment of this disclosure, performing a backfill operation on the loop body instruction includes: in response to performing a backfill operation on the end branch instruction, calling the branch predictor to predict and obtain third prediction information of the end branch instruction; in response to the third prediction information indicating a jump, keeping the instruction information buffer and the branch instruction information buffer in a working state; or, in response to the third prediction information indicating no jump, switching the instruction information buffer and the branch instruction information buffer to an idle state, and fetching an instruction from a first address in the instruction stream, wherein the first address is obtained by summing the current memory address of the end branch instruction and the instruction bit width.
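The handling of the end branch instruction in paragraph [0010] amounts to a two-way decision, sketched below. The names `LbufState` and `after_end_branch` are hypothetical, and the instruction width is assumed to be expressed in bytes so that it can be added directly to a byte address.

```python
from enum import Enum
from typing import Optional, Tuple


class LbufState(Enum):
    WORKING = "working"
    IDLE = "idle"


def after_end_branch(pc: int, width_bytes: int,
                     predicted_taken: bool) -> Tuple[LbufState, Optional[int]]:
    """Decide the buffer state from the third prediction information.

    Returns the state of both buffers and, when the loop is left,
    the first address to fetch from the instruction stream.
    """
    if predicted_taken:
        # The loop continues: both buffers stay in the working state.
        return LbufState.WORKING, None
    # The loop ends: go idle and resume fetching right after the end branch.
    return LbufState.IDLE, pc + width_bytes
```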
[0011] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the loop body instruction further includes at least one non-branch instruction, and the backfilling operation of the loop body instruction includes: backfilling the instruction information of the non-branch instruction into the instruction information buffer in sequence; the instruction processing method further includes: in response to the need to pop the non-branch instruction from the instruction information buffer, directly popping the non-branch instruction in sequence.
[0012] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the first conditional branch instruction for reverse jump includes at least a first sub-conditional branch instruction and a second sub-conditional branch instruction. The first sub-conditional branch instruction is different from the second sub-conditional branch instruction. In response to the first conditional branch instruction for reverse jump being identified from the instruction stream for the first time, the instruction information of the first conditional branch instruction is recorded, including: in response to the first sub-conditional branch instruction and the second sub-conditional branch instruction being identified from the instruction stream for the first time in sequence, the instruction information of the first sub-conditional branch instruction is recorded through a first table entry, and the instruction information of the second sub-conditional branch instruction is recorded through a second table entry.
[0013] For example, at least one embodiment of the instruction processing method provided in this disclosure further includes: popping non-branch instructions and second branch instructions from the loop body instructions in sequence based on a first pointer maintained by the instruction information cache and a second pointer maintained by the branch instruction information cache.
[0014] For example, in the instruction processing method provided in at least one embodiment of this disclosure, the instruction information of the second branch instruction includes the exception information and the original data information of the second branch instruction.
[0015] At least one embodiment of this disclosure also provides an instruction processing apparatus, including: an identification and recording circuit configured to, in response to identifying a first conditional branch instruction with a reverse jump from an instruction stream, record instruction information of the first conditional branch instruction; in response to identifying the first conditional branch instruction with a reverse jump at least once more, determine that the first conditional branch instruction is an end branch instruction in a loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the first conditional branch instruction; an instruction backfilling circuit configured to backfill the instruction information of the second branch instruction into an instruction information buffer; and to invoke a branch predictor to obtain first prediction information of the second branch instruction, and backfill the first prediction information into a branch instruction information buffer.
[0016] For example, at least one embodiment of the instruction processing apparatus provided in this disclosure further includes an instruction pop-out circuit, configured to perform branch prediction on the second branch instruction during the process of popping the second branch instruction from the instruction information buffer to obtain second prediction information; in response to the first prediction information and the second prediction information being consistent, pop the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exit the loop of the loop body instruction.
[0017] For example, at least one embodiment of the instruction processing apparatus provided in this disclosure further includes an instruction prefetch circuit, configured to, before exiting the loop of the loop body instruction, prefetch the target instruction to which the second prediction information indicates a jump and place the target instruction in the instruction stream.
[0018] At least one embodiment of this disclosure also provides a processor, including the instruction processing apparatus of any of the above embodiments.
[0019] At least one embodiment of this disclosure also provides an electronic device, including: a processor; a memory including one or more computer program modules; wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules are used to perform the instruction processing method according to the at least one embodiment described above.
[0020] At least one embodiment of this disclosure also provides a storage medium for non-transitorily storing computer-executable instructions, wherein when the computer-executable instructions are executed by a computer, the instruction processing method according to the at least one embodiment described above is performed.
Description of the Drawings
[0021] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings of the embodiments will be briefly described below. Obviously, the drawings described below only relate to some embodiments of this disclosure and are not intended to limit this disclosure.
[0022] Figure 1A is a schematic diagram of backfilling loop body instructions in an instruction stream;
[0023] Figure 1B is a schematic diagram of popping the loop body instructions backfilled into Lbuf in Figure 1A;
[0024] Figure 2 is a flowchart of an instruction processing method according to at least one embodiment of this disclosure;
[0025] Figure 3 is a schematic diagram illustrating the identification and recording of a first conditional branch instruction by an instruction processing method according to at least one embodiment of this disclosure;
[0026] Figure 4 is a schematic diagram showing the popping of the instructions of loop body (1) in Figure 3;
[0027] Figure 5 is an example flowchart of an instruction processing method according to at least one embodiment of this disclosure;
[0028] Figure 6 is a schematic diagram of an instruction processing apparatus provided in at least one embodiment of this disclosure;
[0029] Figure 7 is a schematic diagram of an electronic device provided in at least one embodiment of this disclosure;
[0030] Figure 8 is a schematic diagram of a storage medium provided in at least one embodiment of this disclosure; and
[0031] Figure 9 is a schematic diagram of another electronic device provided in at least one embodiment of this disclosure.
Detailed Description
[0032] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the described embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0033] Unless otherwise defined, the technical or scientific terms used in this disclosure shall have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms “first,” “second,” and similar terms used in this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as “comprising” or “including” mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as “connected” or “linked” are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as “upper,” “lower,” “left,” and “right” are used only to indicate relative positional relationships, and these relative positional relationships may change accordingly when the absolute position of the described objects changes.
[0034] In an example processor architecture, Lbuf is designed to cache loop body instructions in order to reduce the power consumption of the instruction fetch unit and improve front-end bandwidth. However, if the loop body contains branch instructions other than the loop-ending branch instruction, the backfilling and popping of instructions in Lbuf become considerably more complex, especially because branch prediction errors require additional processing logic. The following example illustrates this.
[0035] Figure 1A is a schematic diagram illustrating how loop body instructions in an instruction stream are backfilled. As shown in Figure 1A, instructions 2-6 in the instruction stream are identified as loop body instructions. Here, loop body instructions are the instructions executed repeatedly in a loop, for example in the order instruction 2 through instruction 6, and then instruction 2 through instruction 6 again. Instruction 6 is the end branch instruction of the loop body, instruction 3 is a conditional branch instruction with a forward jump (the jump offset is positive), and the remaining instructions 2, 4, and 5 are non-branch instructions. During backfilling, in the first backfilling cycle (a), instruction 2 is backfilled into Lbuf; in the second backfilling cycle (b), instruction 3 (the forward jump branch instruction) is backfilled into Lbuf, and in this cycle the branch predictor predicts that instruction 3 does not jump; in the third backfilling cycle (c), instruction 4, which follows instruction 3, is therefore backfilled into Lbuf directly; in the fourth backfilling cycle (d), instruction 5 is backfilled into Lbuf; finally, in the fifth backfilling cycle (e), instruction 6 (the end branch instruction) is backfilled into Lbuf, and in this cycle the branch predictor predicts that instruction 6 still jumps backward to instruction 2 (i.e., the jump offset is the negative number "-4"). At this point, the loop body instructions, namely instructions 2 through 6, have all been backfilled into Lbuf. In subsequent fetches, the instruction fetch unit no longer needs to fetch these instructions from the instruction cache (I-cache) and can fetch them directly from Lbuf.
[0036] Figure 1B is a schematic diagram illustrating how the loop body instructions backfilled into Lbuf in Figure 1A are popped. As shown in Figure 1B, in the first instruction pop cycle (e), instruction 2 is popped from Lbuf for use by the instruction fetch unit; in the second instruction pop cycle (f), instruction 3 (the forward jump branch instruction) is popped from Lbuf for use by the instruction fetch unit, and in this cycle the branch predictor predicts instruction 3 again and this time predicts a forward jump to instruction 5 (i.e., the jump offset is the positive number "2"); in the third instruction pop cycle (g), instruction 5 therefore needs to be popped from Lbuf (instruction 4 is skipped); finally, in the fourth instruction pop cycle (h), instruction 6 (the end branch instruction) is popped from Lbuf, and the prediction of the branch predictor determines whether to exit the loop. For example, if the branch predictor predicts that instruction 6 still jumps back to instruction 2, the above popping process is repeated, which is not elaborated here; if the branch predictor predicts that instruction 6 does not jump, execution continues along the instruction stream with the instruction immediately following instruction 6 (e.g., instruction 7).
[0037] Based on the exemplary backfilling and popping process described above, the inventors of this application have found through research that when the prediction result for a branch instruction in the loop body (other than the end branch instruction of the loop body) at backfill time is inconsistent with its prediction result when it is popped from Lbuf, complex instruction fetching logic must be designed to pop the correct instructions from Lbuf. For example, referring to Figure 1B, during the first complete pass of popping the loop body instructions, the branch predictor predicts that instruction 3 jumps, so the popping logic of Lbuf needs to jump from instruction 3 to instruction 5 (instruction 4 is skipped). If, during the next complete pass, the branch predictor predicts that instruction 3 does not jump, then instruction 4 should be popped next in sequence. In this case, the popping logic of Lbuf needs to be changed on the fly to pop instruction 3 and then instruction 4 (instead of instruction 5). Therefore, when the loop body contains branch instructions other than the end branch instruction (such as the forward jump branch instruction 3 above), the popping logic becomes harder to design and instruction fetch efficiency drops, which runs counter to the original purpose of Lbuf, namely improving instruction fetch efficiency.
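The two pop passes discussed for Figure 1B can be reproduced with a small model. This is only an illustrative sketch: instructions are identified by their numbers from the figure, and the hypothetical `taken_targets` mapping stands in for whatever the branch predictor reports for the inner branch instructions during a given pass.

```python
from typing import Dict, List


def pop_pass(loop_body: List[int], taken_targets: Dict[int, int]) -> List[int]:
    """Return the order in which one pass pops instructions from the buffer.

    loop_body lists instruction numbers in backfill order; taken_targets maps
    an inner branch instruction to its predicted target for this pass.
    """
    position = {instr: i for i, instr in enumerate(loop_body)}
    order = []
    i = 0
    while i < len(loop_body):
        instr = loop_body[i]
        order.append(instr)
        target = taken_targets.get(instr)
        if target is None:
            i += 1                      # not a taken branch: pop the next entry
        elif target in position and position[target] > i:
            i = position[target]        # forward jump inside the loop body
        else:
            break                       # target outside the body or backward: end the pass
    return order
```

For the loop body `[2, 3, 4, 5, 6]`, a pass in which instruction 3 is predicted to jump to instruction 5 pops `[2, 3, 5, 6]`, whereas a pass in which it is predicted not to jump pops `[2, 3, 4, 5, 6]`. The popping logic must therefore cope with a pop order that changes from pass to pass.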
[0038] Meanwhile, the inventors of this application have also discovered that: currently, if there is a Jump And Link (JAL) instruction or a Jump And Link Register (JALR) instruction in the loop body instruction, the loop body instruction will be directly judged as invalid, and instruction backfilling will not be performed on the loop body instruction. The instruction fetching unit will still fetch instructions from the I-cache.
[0039] To avoid problems such as the reduced instruction fetch efficiency caused by inconsistent predictions between backfilling into and popping from Lbuf, traditional processor designs simply do not allow the loop body to contain any branch instruction other than the loop-ending branch instruction. Such designs recognize only loop bodies whose sole branch instruction is the loop-ending branch instruction and do not recognize other kinds of loop bodies. This significantly limits both the range of loop bodies that can be recognized and the instruction fetch efficiency.
[0040] In view of at least one of the above-mentioned problems, at least one embodiment of the present disclosure provides an instruction processing method, including: in response to the initial identification of a first conditional branch instruction with a back jump from an instruction stream, recording instruction information of the first conditional branch instruction; in response to the identification of the first conditional branch instruction with a back jump at least once more, determining that the first conditional branch instruction is an end branch instruction in a loop body instruction and performing a backfilling operation on the loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the first conditional branch instruction; performing the backfilling operation on the loop body instruction includes: backfilling the instruction information of the second branch instruction into an instruction information buffer; and calling a branch predictor to obtain first prediction information of the second branch instruction, and backfilling the first prediction information into a branch instruction information buffer.
[0041] At least one embodiment of this disclosure also provides an instruction processing apparatus, processor, electronic device, and storage medium corresponding to the above-described instruction processing method.
[0042] The instruction processing method provided in at least one embodiment of this disclosure expands the recognition range of loop body instructions by allowing the loop body to contain branch instructions other than the end branch instruction, thereby reducing the power consumed by the instruction fetch unit in accessing the I-cache. In the instruction backfilling stage, the prediction information of the branch instructions in the loop body other than the end branch instruction is backfilled into a newly added branch instruction information cache. In this way, the recognition range of loop body instructions is expanded while the efficiency and accuracy of instruction fetching are maintained, overcoming the technical prejudice of design approaches such as those described above.
[0043] Furthermore, the inventors of this application have also noted the following. When a traditional Lbuf fails to recognize a branch instruction and the fetch bandwidth is 64 bits, a branch instruction made up of the last 16 bits of the current fetch block and the first 16 bits of the next fetch block (i.e., a branch instruction that straddles two fetch blocks) requires two fetches to obtain in full. If the fetch bandwidth is 128 bits, a branch instruction that crosses a 64-bit boundary but not a 128-bit boundary can be obtained in a single fetch; however, a branch instruction that crosses a 128-bit boundary still requires two fetches. Therefore, with a smaller fetch bandwidth, fetch efficiency decreases more often because branch instructions straddle fetch blocks.
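The boundary-crossing arithmetic in this paragraph can be checked with a short calculation. The function name `fetches_needed` is hypothetical, all quantities are expressed in bytes (64 bits = 8 bytes, 128 bits = 16 bytes), and fetch blocks are assumed to be aligned to their own size.

```python
def fetches_needed(start_byte: int, instr_bytes: int, fetch_bytes: int) -> int:
    """Number of aligned fetch blocks touched by one instruction."""
    first_block = start_byte // fetch_bytes
    last_block = (start_byte + instr_bytes - 1) // fetch_bytes
    return last_block - first_block + 1


# A 32-bit branch whose first 16 bits are the last 16 bits of a 64-bit block
# starts at byte offset 6 and occupies bytes 6..9.
print(fetches_needed(6, 4, 8))    # 64-bit fetch: bytes 6..9 span two blocks
print(fetches_needed(6, 4, 16))   # 128-bit fetch: bytes 6..9 fit in one block
print(fetches_needed(14, 4, 16))  # 128-bit fetch: bytes 14..17 span two blocks
```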
[0044] However, since the instruction processing method in at least one embodiment of this disclosure can backfill the loop body instruction including at least one branch instruction into Lbuf, it also solves the problem of reduced instruction fetch efficiency due to bandwidth during the instruction fetching process for loop body instructions including multiple branch instructions in the traditional method.
[0045] The instruction processing method of at least one embodiment of this disclosure will be described below with reference to specific examples. It should be noted that the instruction processing method of at least one embodiment of this disclosure can be applied to RISC-V, x86, or ARM instruction set architectures, and the embodiments of this disclosure are not limited thereto. Exemplarily, the following description is based on the RISC-V instruction set architecture.
[0046] Figure 2 is a flowchart illustrating an instruction processing method according to at least one embodiment of this disclosure. As shown in Figure 2, the instruction processing method includes the following steps S101-S102.
[0047] Step S101: In response to the identification of a first conditional branch instruction with a reverse jump from the instruction stream, record the instruction information of the first conditional branch instruction.
[0048] Step S102: In response to recognizing a first conditional branch instruction for a reverse jump at least once more, determine that the first conditional branch instruction is an end branch instruction in the loop body instruction and perform a backfilling operation on the loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the end branch instruction.
[0049] In some embodiments, performing the backfilling operation on the loop body instruction may include: backfilling the instruction information of the second branch instruction into the instruction information cache; and invoking the branch predictor to obtain the first prediction information of the second branch instruction, and backfilling the first prediction information into the branch instruction information cache.
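The backfilling operation described above can be sketched in software as two parallel buffers. This is a simplified model under stated assumptions: `Instr`, `backfill_loop_body`, and the `predict` callback are hypothetical names, the prediction is reduced to a taken / not-taken flag, and whether the end branch instruction's own prediction is also stored is not modeled.

```python
from typing import Callable, List, NamedTuple, Tuple


class Instr(NamedTuple):
    pc: int
    is_branch: bool = False
    is_end_branch: bool = False


def backfill_loop_body(
    body: List[Instr], predict: Callable[[int], bool]
) -> Tuple[List[Instr], List[Tuple[int, bool]]]:
    """Backfill a recognized loop body.

    Every instruction goes into the instruction information buffer; for each
    second branch instruction (a branch other than the end branch), the
    predictor is called and the result goes into the branch information buffer.
    """
    instr_info_buffer: List[Instr] = []
    branch_info_buffer: List[Tuple[int, bool]] = []
    for instr in body:
        instr_info_buffer.append(instr)
        if instr.is_branch and not instr.is_end_branch:
            branch_info_buffer.append((instr.pc, predict(instr.pc)))
    return instr_info_buffer, branch_info_buffer
```

Keeping the predictions in a separate buffer means that the entries for non-branch instructions carry no prediction fields at all, which is one plausible reason for the split described in the text.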
[0050] For step S101, the instruction stream here can refer to the instruction stream in the decoding stage, that is, the instruction stream that has been split into individual instructions with opcodes after instruction boundary identification (e.g., instructions 1-6 shown in Figure 1A). When a first conditional branch instruction with a reverse jump is identified from the instruction stream, the instruction information of that first conditional branch instruction can be recorded. For example, the first conditional branch instruction can be any conditional branch instruction identified in the instruction stream. In some embodiments, the first conditional branch instruction can be identified and its instruction information recorded when it first appears in the instruction stream.
[0051] For example, whether an instruction is a first conditional branch instruction with a reverse jump can be determined from the instruction type and the sign of the jump offset, and the instruction information of the first conditional branch instruction can then be recorded. For example, the instruction information of the first conditional branch instruction includes a valid flag, the memory address PC of the first conditional branch instruction, and its instruction width. For example, when the first conditional branch instruction with a reverse jump is identified, its valid flag is set to high level (1) to mark the record as valid, and its memory address PC1 (e.g., 0x1000) and instruction width (e.g., in the RISC-V instruction set architecture, the instruction width is usually 32 bits, or 16 bits for compressed instructions) are recorded.
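For the RISC-V case, the check on instruction type and offset sign can be sketched as follows. The B-type immediate layout and the branch opcode follow the RISC-V base ISA; everything else (`LoopEndCandidate`, `record_if_backward_branch`) is a hypothetical illustration. The sketch handles only 32-bit B-type encodings and deliberately ignores compressed branches such as `c.beqz` and `c.bnez`.

```python
from dataclasses import dataclass

BRANCH_OPCODE = 0b1100011  # base opcode shared by BEQ/BNE/BLT/BGE/BLTU/BGEU


def instr_width_bits(raw: int) -> int:
    """32-bit encodings have their two lowest bits set; otherwise assume 16-bit."""
    return 32 if (raw & 0b11) == 0b11 else 16


def b_type_offset(raw: int) -> int:
    """Decode the signed 13-bit branch offset of a 32-bit B-type instruction."""
    imm = (
        (((raw >> 31) & 0x1) << 12)    # imm[12]
        | (((raw >> 7) & 0x1) << 11)   # imm[11]
        | (((raw >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((raw >> 8) & 0xF) << 1)    # imm[4:1]
    )
    return imm - (1 << 13) if imm & (1 << 12) else imm


@dataclass
class LoopEndCandidate:
    valid: bool = False
    pc: int = 0
    width_bits: int = 0


def record_if_backward_branch(entry: LoopEndCandidate, pc: int, raw: int) -> bool:
    """Record valid flag, PC, and width when raw is a backward conditional branch."""
    is_branch = instr_width_bits(raw) == 32 and (raw & 0x7F) == BRANCH_OPCODE
    if is_branch and b_type_offset(raw) < 0:
        entry.valid, entry.pc, entry.width_bits = True, pc, 32
        return True
    return False
```

For instance, the encoding 0xFE0298E3 (`bne x5, x0, -16`) at PC 0x1000 decodes to an offset of -16 and is recorded, whereas 0x00000463 (`beq x0, x0, 8`) has a positive offset and is not.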
[0052] For step S102, in response to the first conditional branch instruction of the above-mentioned reverse jump being identified at least once more (e.g., a second or more times), the first conditional branch instruction can be determined to be the end branch instruction in the loop body instruction, and then the backfilling operation of the loop body instruction can be started.
[0053] Here, requiring the first conditional branch instruction with the reverse jump to be identified at least once more before determining that it is the end branch instruction in the loop body instruction improves the reliability of identifying the end branch instruction. For example, depending on the actual usage needs of the processor, the above "at least once more" can be set to "once more," "twice more," etc., and this disclosure is not limited in this respect. For example, in order to improve instruction fetch efficiency, when the first conditional branch instruction with the reverse jump initially identified in step S101 is identified for the second time, it can be determined to be the end branch instruction in the loop body instruction, so that the subsequent instruction backfilling operations can be performed.
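The configurable "at least once more" threshold can be modeled as a small counter. The class `LoopEndDetector` and its parameter `repeats_required` are hypothetical, and the sketch tracks a single candidate branch only: a different backward branch simply replaces the recorded one, which is an assumption and not something stated in the text.

```python
from typing import Optional


class LoopEndDetector:
    """Confirm a backward conditional branch as the loop end branch after it
    has been seen again repeats_required times."""

    def __init__(self, repeats_required: int = 1) -> None:
        self.repeats_required = repeats_required
        self.recorded_pc: Optional[int] = None
        self.repeat_count = 0

    def observe_backward_branch(self, pc: int) -> bool:
        """Return True once the branch at pc is confirmed as the end branch."""
        if self.recorded_pc != pc:
            # First sighting (step S101): record it and start counting.
            self.recorded_pc = pc
            self.repeat_count = 0
            return False
        # Seen again (step S102): confirm once the threshold is reached.
        self.repeat_count += 1
        return self.repeat_count >= self.repeats_required
```

With the default `repeats_required=1`, the second sighting of the same branch triggers backfilling, matching the "identified for the second time" example; a larger value trades later backfilling for higher confidence.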
[0054] It should be noted that the loop body instructions processed in at least one embodiment of this disclosure include at least one second branch instruction in addition to the end branch instruction. For example, the second branch instruction includes a conditional branch instruction or a JALR instruction, and the embodiments of this disclosure do not limit this. It should also be noted that the instruction processing method of at least one embodiment of this disclosure only requires the end branch instruction in the loop body instruction to jump backward (i.e., its jump offset is negative); the embodiments of this disclosure do not limit the jump direction of the second branch instructions (e.g., conditional branch instructions or JALR instructions) in the loop body instruction.
[0055] Unlike the design concept described above, in which the loop body contains only the end branch instruction, this embodiment extends the recognition of loop body instructions to loop bodies that include at least one branch instruction in addition to the end branch instruction (also referred to herein as a multi-branch loop body or multi-branch loop body instruction), thereby expanding the scope of loop body instruction recognition. In this way, once successfully recognized, this multi-branch loop body can be backfilled into Lbuf, allowing subsequent instruction fetching to retrieve the multi-branch loop body from Lbuf, thus reducing the power consumed by I-cache accesses.
[0056] Additionally, the specific number of at least one second-branch instruction can be set according to the actual situation, and this disclosure is not limited thereto.
[0057] In addition, performing the backfill operation on the loop body instructions can specifically include:
[0058] On one hand, the instruction information of the second branch instruction is backfilled into the instruction information cache. Here, the instruction information cache is the Lbuf mentioned above. The instruction information of the second branch instruction includes, for example, the original instruction data of the second branch instruction (e.g., including the type information of the branch instruction) and exception information. For example, the number of entries included in the instruction information cache can be 32, where one entry is used to record the instruction information of one instruction. The embodiments of this disclosure do not limit the number of entries in the instruction information cache, and those skilled in the art can determine the number based on actual instruction fetch performance requirements and / or the size of the Lbuf storage space. It should be noted that when backfilling non-branch instructions (e.g., regular calculation instructions) in the loop body instruction, since they are not branch instructions, their original instruction data does not include the type information of the branch instruction.
[0059] On the other hand, for example, a branch predictor can be invoked to predict the second branch instruction to obtain first prediction information, and this first prediction information can be backfilled into the branch instruction information cache. For example, the first prediction information may include the jump information of the second branch instruction (e.g., Taken or NotTaken) and the memory address PC of the target instruction to which the second branch instruction indicates a jump. For example, considering both performance and area, the number of entries included in the branch instruction information cache can be 5, where one entry is used to record the first prediction information of one second branch instruction (i.e., it can handle loop body instructions with up to 5 branch instructions). The embodiments of this disclosure do not limit the number of entries in the branch instruction information cache.
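The two-part backfill described above can be sketched as follows. This is a hedged illustration, not the disclosed hardware: the names `Instr`, `Prediction`, `LoopBuffers`, and the `predict` callback are hypothetical, the capacities 32 and 5 are the example values from the text, and returning False on overflow is an assumption about how a full buffer could be reported.

```python
from dataclasses import dataclass, field

LBUF_ENTRIES = 32        # example capacity of the instruction information cache
BRANCH_BUF_ENTRIES = 5   # example capacity of the branch instruction information cache


@dataclass
class Instr:
    pc: int
    raw: int                 # original instruction data
    is_branch: bool = False  # True for a second branch instruction


@dataclass
class Prediction:            # first prediction information
    taken: bool
    target_pc: int


@dataclass
class LoopBuffers:
    lbuf: list = field(default_factory=list)
    branch_buf: list = field(default_factory=list)

    def backfill(self, instr: Instr, predict) -> bool:
        """Backfill one instruction; return False if a buffer would overflow."""
        if len(self.lbuf) >= LBUF_ENTRIES:
            return False
        if instr.is_branch:
            if len(self.branch_buf) >= BRANCH_BUF_ENTRIES:
                return False
            # Branches additionally record what the predictor says right now.
            self.branch_buf.append(predict(instr))
        self.lbuf.append(instr)
        return True
```

Non-branch instructions occupy only an Lbuf entry, while each second branch instruction occupies one entry in each buffer, which is why the smaller branch buffer bounds the number of branches a recognizable loop body may contain.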
[0060] Thus, the instruction processing method provided in at least one embodiment of this disclosure expands the scope of loop body instruction recognition by increasing the number of branch instructions (excluding the end branch instruction) that a recognizable loop body may contain, thereby reducing the power consumption of the instruction fetch unit accessing the I-cache.
[0061] For example, in one possible implementation, the above instruction processing method further includes: in response to popping the second branch instruction from the instruction information buffer, calling the branch predictor again to obtain the second prediction information of the second branch instruction; in response to the first prediction information and the second prediction information being consistent, popping the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exiting the loop of the loop body instruction.
[0062] Here, during the instruction popping process, when the second branch instruction is popped from the instruction information buffer, the branch predictor during the instruction backfilling phase is called again to obtain the second prediction information of the second branch instruction at the current popping moment, and compare it with the first prediction information of the second branch instruction predicted and recorded during the instruction backfilling phase.
[0063] On the one hand, if the first prediction information and the second prediction information are consistent, it can be determined that the first (next) instruction recorded after the second branch instruction in the loop body instruction currently recorded in Lbuf still meets the requirements of the loop body instruction. In this case, the next instruction after the second branch instruction can be directly popped to continue popping instructions. Here, if the second prediction information obtained in the instruction popping stage is consistent with the first prediction information recorded in the instruction backfilling stage, then the instruction processing method of at least one embodiment of this disclosure expands the scope of loop body instruction recognition while ensuring the efficiency and accuracy of instruction fetching.
[0064] On the other hand, if the first prediction information and the second prediction information are inconsistent, it means that the requirements of the loop body instruction are no longer met at the current moment. In this case, the loop body instruction is exited to resume fetching instructions from the instruction stream or from another loop body instruction.
[0065] For example, in one possible implementation, when the second branch instruction is a conditional branch instruction, the first prediction information and the second prediction information include the jump information of the second branch instruction. In response to the consistency of the first prediction information and the second prediction information, the next instruction recorded after the second branch instruction is popped from the instruction information buffer, including: in response to both the first prediction information and the second prediction information indicating no jump, the next instruction is obtained by summing the memory address and instruction width of the second branch instruction; or, in response to both the first prediction information and the second prediction information indicating a jump, the next instruction is the first target instruction to which either the first prediction information or the second prediction information indicates a jump.
[0066] As described above, the second branch instruction includes a conditional branch instruction or a JALR instruction. When the second branch instruction is a conditional branch instruction, both the first prediction information obtained by the branch predictor during the instruction backfilling phase and the second prediction information obtained during the instruction popping phase include the jump information of the second branch instruction (e.g., Taken or NotTaken).
[0067] If both the first and second prediction information indicate no jump (NotTaken), then the next instruction recorded in Lbuf after the second branch instruction is the instruction whose memory address is obtained by summing the memory address PC of the second branch instruction and its instruction width. For example, if the memory address PC of the second branch instruction is 0x1000 and the instruction width is 32 bits (4 bytes), then the memory address PC of the next instruction is 0x1004.
[0068] If both the first and second prediction information indicate a jump (Taken), then the next instruction recorded in Lbuf after the second branch instruction is the first target instruction to which either the first or second prediction information indicates a jump. The memory address of this first target instruction is obtained by summing the memory address PC of the second branch instruction and its encoded jump offset. For example, if the memory address PC of the second branch instruction is 0x1000 and the jump offset is 0x10, then the memory address PC of the first target instruction is 0x1010.
[0069] It should be noted that for the same conditional branch instruction, since its memory address PC and jump offset are fixed, the address PC of its jump target is also fixed. It is therefore only necessary to use the branch predictor to predict whether the instruction will jump; the target address itself does not need to be compared.
[0070] For example, in one possible implementation, when the second branch instruction is a conditional branch instruction, the inconsistency between the first prediction information and the second prediction information may include: the first prediction information indicates no jump while the second prediction information indicates a jump; before exiting the loop body instruction, the above instruction processing method may further include: in response to the first prediction information indicating no jump while the second prediction information indicates a jump, prefetching the second target instruction that the second prediction information indicates a jump and placing the second target instruction in the instruction stream.
[0071] For example, if the first prediction information indicates no jump but the second prediction information indicates a jump when the second branch instruction is a conditional branch instruction, then before exiting the loop body instruction to fetch instructions from the instruction stream (for example, once the loop body instruction has been backfilled into the instruction information buffer Lbuf), the second target instruction to which the second prediction information indicates a jump can be fetched from the target address (for example, in the I-cache) and placed into the instruction stream. This allows the instruction fetching unit to quickly fetch the target instruction to be executed from the instruction stream after exiting the loop, thereby eliminating the bubble caused by the fetch interruption that an inconsistent prediction would otherwise produce.
[0072] For example, referring to Figure 1A and Figure 1B, instruction 3 in the loop body instruction is a conditional branch instruction. During the instruction backfilling phase, its corresponding first prediction information indicates no jump, so instruction 4 is backfilled into Lbuf in sequence. During the instruction popping phase, when instruction 3 is popped from Lbuf, its corresponding second prediction information indicates a jump to instruction 5. At this point, before stopping the popping of the loop body instruction from Lbuf, instruction 5 is pre-fetched and placed in the instruction stream so that the instruction fetching unit can achieve fast instruction fetching, avoiding the complex instruction fetching logic of traditionally popping from Lbuf, and improving instruction fetching efficiency.
[0073] For example, if the first prediction information indicates a jump but the second prediction information indicates no jump when the second branch instruction is a conditional branch instruction, then the next instruction corresponding to the second branch instruction (i.e., the memory address PC of the second branch instruction + instruction width) is placed in the instruction stream before exiting the loop body instruction to fetch instructions from the instruction stream.
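The four cases for a conditional second branch instruction (both not taken, both taken, and the two mismatches) can be summarized in a short sketch. The names `PopDecision` and `resolve_conditional` are hypothetical, and the sketch assumes byte addresses and a signed byte offset; it illustrates the decision rule only and does not model the prefetch itself.

```python
from typing import NamedTuple


class PopDecision(NamedTuple):
    stay_in_loop: bool  # True: keep popping from Lbuf; False: exit the loop body
    next_pc: int        # next instruction popped, or the address placed in the instruction stream


def resolve_conditional(pc: int, width_bytes: int, offset: int,
                        first_taken: bool, second_taken: bool) -> PopDecision:
    """Compare the backfill-time and pop-time predictions of one conditional branch."""
    fallthrough = pc + width_bytes  # address used when the branch does not jump
    target = pc + offset            # address used when the branch jumps
    # The pop-time (second) prediction decides where fetching continues.
    next_pc = target if second_taken else fallthrough
    return PopDecision(first_taken == second_taken, next_pc)
```

For a branch at 0x1000 with a 4-byte width and an offset of 0x10, two not-taken predictions continue in the loop at 0x1004 and two taken predictions continue at 0x1010; on a mismatch the loop is exited and the address implied by the second prediction is the one steered into the instruction stream.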
[0074] For example, in one possible implementation, when the second branch instruction is a JALR instruction, the first prediction information includes the memory address of the third target instruction to which the second branch instruction indicates a jump, and the second prediction information includes the memory address of the fourth target instruction to which the second branch instruction indicates a jump. In response to the first prediction information and the second prediction information being consistent, the next instruction recorded after the second branch instruction is popped from the instruction information buffer, including: in response to the memory address of the third target instruction and the memory address of the fourth target instruction being consistent, the next instruction is either the third target instruction or the fourth target instruction.
[0075] The JALR (Jump and Link Register) instruction is a RISC-V instruction used to implement register-indirect jumps with linking. Its operation is as follows: first, the address of the target instruction is calculated (obtained by adding a jump offset value to the value of the base address register RS1), then the jump is performed to that address, and simultaneously the address of the next instruction (PC+4) is saved to the destination register RD. Since the value in the base address register RS1 can change at any time (while the jump offset value is fixed), the address of the target instruction indicated by the JALR instruction can differ from one moment to another. Therefore, when the second branch instruction is a JALR instruction, during the instruction backfilling stage, the first prediction information includes the memory address of the third target instruction indicated by the second branch instruction; during the instruction popping stage, the second prediction information includes the memory address of the fourth target instruction indicated by the second branch instruction (the memory addresses of the third and fourth target instructions may be the same or different, depending on whether the values in the base address register RS1 at the backfilling and popping times are the same).
[0076] If the memory address of the third target instruction and the memory address of the fourth target instruction are the same, it means that the value in the base address register RS1 did not change during the backfilling and popping of the second branch instruction. The first (i.e., the next) instruction after the second branch instruction is recorded in Lbuf as either the third or fourth target instruction. At this time, the third or fourth target instruction can be popped directly to continue the instruction popping process.
[0077] For example, in one possible implementation, when the second branch instruction is a JALR instruction, the first prediction information and the second prediction information are inconsistent, including: the memory address of the third target instruction and the memory address of the fourth target instruction are inconsistent; before exiting the loop body instruction, the above instruction processing method further includes: in response to the inconsistency between the memory address of the third target instruction and the memory address of the fourth target instruction, prefetching the fourth target instruction and placing the fourth target instruction into the instruction stream.
[0078] Referring to the above description of the inconsistency between the first and second prediction information when the second branch instruction is a conditional branch instruction, the processing for the second branch instruction being a JALR instruction is similar and will not be repeated here.
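The JALR handling can be sketched in the same style. The helper `jalr_target` follows the RISC-V base ISA rule that the target is the sum of RS1 and the sign-extended 12-bit immediate with the least significant bit cleared; the function `resolve_jalr`, the `xlen` default of 64, and the tuple return shape are assumptions made for this illustration.

```python
def jalr_target(rs1_value: int, imm12: int, xlen: int = 64) -> int:
    """Architectural JALR target: (RS1 + sign-extended imm12) with bit 0 cleared.

    `imm12` is the raw 12-bit immediate field (0..0xFFF).
    """
    imm = imm12 - (1 << 12) if imm12 & 0x800 else imm12
    mask = (1 << xlen) - 1
    return ((rs1_value + imm) & mask) & ~1


def resolve_jalr(first_target: int, second_target: int):
    """Return (stay_in_loop, next_pc) for a JALR popped from Lbuf.

    first_target: target address recorded at backfill time (third target instruction).
    second_target: target address predicted at pop time (fourth target instruction).
    """
    # On a mismatch the pop-time target is the one to prefetch before exiting.
    return first_target == second_target, second_target
```

The fixed immediate with a changing RS1 is what makes the two targets diverge: `jalr_target(0x2000, 8)` and `jalr_target(0x3000, 8)` differ even though the instruction encoding is identical.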
[0079] For example, in one possible implementation, performing the backfill operation on the loop body instruction further includes: in response to performing the backfill operation on the end branch instruction, calling the branch predictor to predict the third prediction information of the end branch instruction; in response to the third prediction information indicating a jump, keeping the instruction information buffer and the branch instruction information buffer in an active state; or, in response to the third prediction information indicating no jump, switching the instruction information buffer and the branch instruction information buffer to an idle state, and fetching an instruction from a first address in the instruction stream, wherein the first address is obtained by summing the memory address of the end branch instruction and the instruction width.
[0080] Here, in addition to backfilling the second branch instruction in the loop body, there is also an operation to backfill the ending branch instruction. During the backfilling of the ending branch instruction, if the branch predictor predicts that the third prediction information of the ending branch instruction indicates a jump, it means that the loop body instruction will be executed at least once more, and the instruction information buffer Lbuf and the branch instruction information buffer remain in an active state. If the branch predictor predicts that the third prediction information of the ending branch instruction will not indicate a jump, it means that the loop body instruction will no longer iterate, and the instruction information buffer Lbuf and the branch instruction information buffer are switched to an idle state. Instruction is fetched from the first address in the instruction stream, where the first address is obtained by summing the memory address of the ending branch instruction and the instruction width. For example, if the memory address of the ending branch instruction is 0x20A8 and the instruction width is 32 bits (4 bytes), the first address would be 0x20AC.
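The handling of the end branch instruction during backfilling can be sketched as below. The names `BufState` and `on_end_branch` are hypothetical, and using a single state value for both buffers is a simplification assumed for this illustration.

```python
from enum import Enum


class BufState(Enum):
    ACTIVE = "active"
    IDLE = "idle"


def on_end_branch(pc: int, width_bytes: int, predicted_taken: bool):
    """Return (state of both buffers, first address to fetch from the stream or None)."""
    if predicted_taken:
        # The loop body is expected to run at least once more: keep the buffers active.
        return BufState.ACTIVE, None
    # The loop is predicted to finish: go idle and resume at PC + instruction width.
    return BufState.IDLE, pc + width_bytes
```

With the values from the text, `on_end_branch(0x20A8, 4, False)` yields the idle state together with the first address 0x20AC.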
[0081] For example, if all loop body instructions have been recorded in Lbuf and the entire process of popping instructions at least once (e.g., twice) has been iterated, in the third iteration, if the prediction information when popping the end branch instruction is that it will not jump, then refer to the case where the prediction of the end branch instruction does not jump during the backfilling process described above, which will not be repeated here.
[0082] For example, in one possible implementation, the loop body instruction further includes at least one non-branch instruction. Performing a backfill operation on the loop body instruction further includes: backfilling the instruction information of the non-branch instruction into the instruction information buffer in sequence; the instruction processing method further includes: in response to the need to pop the non-branch instruction from the instruction information buffer, directly popping the non-branch instruction in sequence.
[0083] For example, in addition to the second branch instruction and the end branch instruction, the loop body instruction also includes some non-branch instructions (e.g., regular calculation instructions). In the process of backfilling and popping these non-branch instructions, there is no need to call the branch predictor for prediction and comparison; the instructions can be backfilled and popped directly.
[0084] For example, in one possible implementation, the first conditional branch instruction for the reverse jump includes at least a first sub-conditional branch instruction and a second sub-conditional branch instruction. The first sub-conditional branch instruction is different from the second sub-conditional branch instruction. In response to the first conditional branch instruction for the reverse jump being identified for the first time from the instruction stream, the instruction information of the first conditional branch instruction is recorded, including: in response to the first sub-conditional branch instruction and the second sub-conditional branch instruction being identified for the first time from the instruction stream in sequence, the instruction information of the first sub-conditional branch instruction is recorded through a first table entry, and the instruction information of the second sub-conditional branch instruction is recorded through a second table entry.
[0085] Figure 3 is a schematic diagram illustrating the identification and recording of a first conditional branch instruction by an instruction processing method according to at least one embodiment of the present disclosure.
[0086] As shown in Figure 3, the instruction stream includes a first sub-conditional branch instruction (instruction 11) and a second sub-conditional branch instruction (instruction 21), each with a reverse jump. Instructions 2-11 form loop body (1), and instructions 13-21 form loop body (2). When instruction 11 is first detected during instruction execution, the instruction information of instruction 11 is recorded through the first entry. When instruction 11 is detected again, loop body (1) is backfilled into Lbuf and, after backfilling, iteratively popped as needed. After loop body (1) is exited, the instruction information of loop body (1) recorded in Lbuf is flushed. At this time, the instruction fetch unit fetches and executes instructions sequentially from instruction 12. When instruction 21 is first detected, the instruction information of instruction 21 is recorded through the second entry. When instruction 21 is detected again, loop body (2) is backfilled into Lbuf and, after backfilling, iteratively popped as needed. When loop body (2) is exited, the instruction information of loop body (2) recorded in Lbuf is flushed. At this time, the instruction fetch unit fetches and executes instructions sequentially from instruction 22. Instruction 22 is a branch instruction and jumps to instruction 1, so the instruction fetch unit starts fetching instructions again from instruction 1. In this case, since instruction 11 has already been recorded in the first entry, if instruction 11 is recognized again, the backfilling process of loop body (1) is executed directly, saving at least one iteration and thereby improving the recognition efficiency of the loop body instruction.
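The effect of keeping one table entry per end branch can be sketched with a small table. The class name `EndBranchTable`, the use of instruction numbers in place of memory addresses, and the round-robin replacement of entries are all assumptions made for this illustration; the disclosure does not specify a replacement policy.

```python
class EndBranchTable:
    """Records backward conditional branches, one per table entry."""

    def __init__(self, num_entries: int = 2):
        self.entries = [None] * num_entries
        self.next_slot = 0  # round-robin replacement (an assumption)

    def observe(self, pc: int) -> bool:
        """Return True if the loop ending at `pc` can be backfilled right away."""
        if pc in self.entries:
            return True
        self.entries[self.next_slot % len(self.entries)] = pc
        self.next_slot += 1
        return False


# Replaying the Figure 3 sequence with instruction numbers standing in for PCs.
table = EndBranchTable()
trace = [table.observe(n) for n in (11, 11, 21, 21, 11)]
```

In this replay the final sighting of instruction 11 (after instruction 22 jumps back to instruction 1) hits the first entry immediately, so no additional recording iteration is needed before backfilling loop body (1).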
[0087] For example, in one possible implementation, the instruction processing method further includes: popping non-branch instructions and second branch instructions from the loop body instructions in sequence based on a first pointer maintained by the instruction information buffer and a second pointer maintained by the branch instruction information buffer.
[0088] Figure 4 is a schematic diagram of instruction popping for loop body (1) in Figure 3.
[0089] As shown in Figure 4, the instruction information buffer Lbuf maintains a first pointer ptr1, and the branch instruction information buffer maintains a second pointer ptr2. In loop body (1), instructions 2 and 7 are branch instructions, instruction 11 is the end branch instruction, and the remaining instructions 5, 6, and 10 are regular instructions. Since instructions 2 and 7 are branch instructions, their first prediction information is sequentially backfilled into the branch instruction information buffer during the backfilling process. The instruction information of all instructions is sequentially backfilled into the instruction information buffer Lbuf.
[0090] During the instruction popping process, the pointer ptr1 of Lbuf is initially positioned on instruction 2, so instruction 2 is popped first and the branch predictor is called to obtain the second prediction information of instruction 2. The pointer ptr2 of the branch instruction information buffer is also initially positioned on instruction 2, so the first prediction information of instruction 2 is popped first and compared with the second prediction information. If the first and second prediction information of instruction 2 match, ptr1 is incremented by 1 and moves to the next instruction 5, which is popped directly. At the same time, ptr2 is also incremented by 1 and moves to the next branch instruction 7. Then ptr1 is incremented again and instruction 6 is popped directly, while ptr2 remains unchanged. Next, ptr1 pops instruction 7 and the branch predictor is called to obtain the second prediction information of instruction 7. At the same time, ptr2 pops the first prediction information of instruction 7 for comparison with the second prediction information. If the first and second prediction information of instruction 7 also match, ptr1 is incremented by 1 and moves to the next instruction 10, which is popped directly, while ptr2 returns to instruction 2 and waits for ptr1 to return to instruction 2. Then ptr1 is incremented by 1 to pop instruction 11 directly, and the branch predictor is called to predict whether instruction 11 will jump. If instruction 11 is predicted to jump, ptr1 returns to instruction 2 and, together with ptr2, begins the next round of popping. If instruction 11 is predicted not to jump, the instruction information of loop body (1) in Lbuf is flushed and instruction fetching continues from instruction 12.
[0091] It should be noted that if the predicted second prediction information and the recorded first prediction information are inconsistent when either instruction 2 or instruction 7 is popped up, the loop will be exited directly.
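One round of the two-pointer popping walk described for Figure 4 can be sketched as follows. The function `pop_one_iteration`, the string labels for instruction kinds and outcomes, and the `predict` callback are hypothetical; predictions are modeled as plain comparable values, and instruction numbers stand in for memory addresses.

```python
def pop_one_iteration(lbuf, branch_buf, predict):
    """Pop one pass over a backfilled loop body.

    lbuf: list of (name, kind) with kind in {"plain", "branch", "end"}.
    branch_buf: first prediction information, in the order branches appear in lbuf.
    predict(name): prediction obtained at pop time (second prediction information).
    Returns (popped names, outcome), outcome in {"iterate", "exit_mismatch", "exit_end"}.
    """
    ptr1 = ptr2 = 0
    popped = []
    while ptr1 < len(lbuf):
        name, kind = lbuf[ptr1]
        popped.append(name)
        if kind == "branch":
            if predict(name) != branch_buf[ptr2]:
                return popped, "exit_mismatch"   # predictions disagree: leave the loop
            ptr2 = (ptr2 + 1) % len(branch_buf)  # wraps back to the first branch
        elif kind == "end":
            return popped, ("iterate" if predict(name) else "exit_end")
        ptr1 += 1  # plain instructions are popped directly
    raise ValueError("Lbuf is expected to finish with the end branch instruction")


# Loop body (1): branches 2 and 7, regular instructions 5, 6, 10, end branch 11.
loop1 = [(2, "branch"), (5, "plain"), (6, "plain"),
         (7, "branch"), (10, "plain"), (11, "end")]
first_predictions = [True, True]  # recorded for instructions 2 and 7 at backfill time
```

Calling `pop_one_iteration(loop1, first_predictions, predict)` with a predictor that still agrees on instructions 2 and 7 pops all six instructions, and the prediction for instruction 11 then decides between another round and flushing Lbuf.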
[0092] Figure 5 is an example flowchart of an instruction processing method according to at least one embodiment of the present disclosure.
[0093] As shown in Figure 5, firstly, it is determined from the instruction stream whether the current instruction is a short loop body ending branch instruction (S103). If it is not a short loop body ending branch instruction, it is checked whether Lbuf is in a backfilling state (S104). If Lbuf is in a backfilling state, the judgment logic of the backfilling process is entered; if Lbuf is not in a backfilling state, the current judgment cycle is directly ended, and the process waits for the next instruction cycle (S105). If the current instruction is identified as a short loop body ending branch instruction, the subsequent processing direction is determined according to the current backfilling state of Lbuf, marking the establishment of the loop body boundary (S106).
[0094] Subsequently, during the backfilling process, if a non-loop body termination instruction is encountered, the Lbuf entry is first checked to see if it is full (S107). If it is full, it means that the number of loop body instructions exceeds the Lbuf capacity, so the short loop body is determined to be invalid and the process is terminated (S108); if it is not full, it is then determined whether the current instruction is a branch instruction (S109). If it is a branch instruction, the branch instruction cache entry is further checked to see if it is full (S110); if it is full, the process is terminated (S111), otherwise the first prediction information of the branch instruction is backfilled into the branch instruction information cache, and the instruction information of the branch instruction is backfilled into the Lbuf (S112). This design allows multiple branch instructions to coexist within a loop body, breaking through the traditional limitation on the number of branch instructions.
[0095] When a loop body termination instruction is encountered during the backfilling process, it is also necessary to check whether the Lbuf entry and branch information cache entry are full (S113). If either cache is full, the process terminates (S114); if neither is full, it means that all short loop body instructions have been successfully backfilled, and Lbuf enters the ready state (S115).
[0096] Finally, during the Lbuf instruction popping phase, it is first necessary to determine whether the non-loop instructions have been popped completely (S116). After the non-loop instructions have been popped, the short loop instructions are popped (S117). For each popped instruction, if it is a conditional branch or a JALR instruction, a request is sent to the branch predictor to obtain the current prediction result, which is then compared with the prediction result recorded during instruction backfilling. If the two prediction results match, the next instruction is popped (S118); if they do not match, the loop is determined to be broken, the current loop popping process is immediately exited, and the next processing cycle begins (S119). This mechanism ensures the accuracy and consistency of branch prediction during Lbuf usage, avoiding instruction fetching errors caused by inconsistent predictions.
[0097] At least one embodiment of this disclosure also provides an instruction processing apparatus corresponding to the above-described instruction processing method. Figure 6 is a schematic diagram of an instruction processing apparatus provided by at least one embodiment of the present disclosure.
[0098] As shown in Figure 6, the instruction processing apparatus 200 includes an identification and recording circuit 210 and an instruction backfilling circuit 220.
[0099] The identification and recording circuit 210 is configured to, in response to the initial identification of a first conditional branch instruction with a reverse jump from the instruction stream, record the instruction information of the first conditional branch instruction; and, in response to the identification of the first conditional branch instruction with a reverse jump at least once more, determine that the first conditional branch instruction is the end branch instruction in the loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the first conditional branch instruction. For specific operation steps, please refer to the description of step S101 and part of step S102 above, which will not be repeated here.
[0100] The instruction backfilling circuit 220 is configured to backfill the instruction information of the second branch instruction into the instruction information buffer; and to call the branch predictor to obtain the first prediction information of the second branch instruction, and backfill the first prediction information into the branch instruction information buffer. The specific operation steps are described above for step S102, and will not be repeated here.
[0101] For example, in one possible implementation, the instruction processing device 200 further includes an instruction pop-out circuit (not shown in the figure). This instruction pop-out circuit is configured to perform branch prediction on the second branch instruction during the process of popping the second branch instruction from the instruction information buffer to obtain second prediction information; in response to the first prediction information and the second prediction information being consistent, pop the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exit the loop of the loop body instruction. The specific operation steps are described above and will not be repeated here.
[0102] For example, in one possible implementation, the instruction processing apparatus 200 further includes an instruction prefetching circuit (not shown in the figure), which is configured to prefetch the target instruction indicating the jump before exiting the loop body instruction and place the target instruction in the instruction stream. The specific operation steps are described above and will not be repeated here.
[0103] The instruction processing apparatus provided in at least one embodiment of this disclosure expands the recognition range of loop body instructions by allowing a loop body instruction to contain more branch instructions (other than the end branch instruction), thereby reducing the power consumed by the instruction fetch unit in accessing the I-cache. Furthermore, during the instruction backfilling stage, the prediction information of the branch instructions (other than the end branch instruction) in the loop body instruction is backfilled into a newly added branch instruction information cache. This expands the recognition range of loop body instructions while preserving the efficiency and accuracy of instruction fetching, overcoming a technical prejudice. In addition, since the instruction processing apparatus in at least one embodiment of this disclosure can backfill loop body instructions including at least one branch instruction into Lbuf, it also solves the problem that, in traditional instruction fetching, bandwidth limitations reduce fetch efficiency for loop body instructions that include multiple branch instructions.
[0104] It should be noted that, for clarity and brevity, the embodiments of this disclosure do not show all the constituent units of the instruction processing apparatus 200 described above. To achieve the necessary functions of the instruction processing apparatus 200, those skilled in the art can provide and configure other constituent units (not shown) according to specific needs, and the embodiments of this disclosure do not impose any limitations on this.
[0105] At least one embodiment of this disclosure also provides a processor, which includes the instruction processing apparatus of any of the foregoing embodiments. For example, the processor includes any processor based on the RISC-V, x86, or ARM instruction architecture, and the embodiments of this disclosure are not limited thereto. For example, the processor may also include dedicated processors such as GPUs and AI accelerators.
[0106] At least one embodiment of this disclosure also provides an electronic device. Figure 7 is a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure.
[0107] For example, as shown in Figure 7, the electronic device 300 includes a processor 310 and a memory 320. The memory 320 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 310 is used to execute the computer-readable instructions, which, when executed by the processor 310, perform the instruction processing method provided in any embodiment of this disclosure. The memory 320 and the processor 310 can be interconnected via a bus system and/or other forms of connection mechanisms (not shown).
[0108] Processor 310 can be a central processing unit (CPU), tensor processor (TPU), network processor (NP), or graphics processing unit (GPU) with data processing and / or program execution capabilities. It can also be a digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. For example, the central processing unit (CPU) can be based on x86 or ARM architectures. Processor 310 can be a general-purpose processor or a special-purpose processor, capable of controlling other components in electronic device 300 to perform desired functions.
[0109] For example, memory 320 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, etc. One or more computer program modules may be stored on the computer-readable storage medium, and processor 310 may run the one or more computer program modules to implement various functions of electronic device 300. Various application programs, as well as various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
[0110] At least one embodiment of this disclosure also provides a storage medium for storing non-transitory computer program executable code (e.g., computer executable instructions). When executed by a computer (e.g., including one or more processors), the non-transitory computer program executable code can implement the instruction processing method of any embodiment of this disclosure.
[0111] Figure 8 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure. For example, as shown in Figure 8, the storage medium 400 stores computer-executable instructions 410 in a non-transitory manner.
[0112] For example, one or more computer instructions may be stored on the storage medium 400. Some of the computer instructions stored on the storage medium 400 may be, for example, instructions for implementing one or more steps in the instruction processing method described above.
[0113] For example, the storage medium may include the storage component of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), optical disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, or other suitable storage media.
[0114] Figure 9 is a schematic diagram of another electronic device provided by at least one embodiment of the present disclosure. The electronic device 500 illustrated in Figure 9 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments disclosed herein.
[0115] As shown in Figure 9, in some examples, electronic device 500 includes a processing unit (e.g., central processing unit, graphics processor, etc.) 501, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 502 or a program loaded from storage device 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the computer system. The processing unit 501, ROM 502, and RAM 503 are connected via bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
[0116] For example, the following components can be connected to I/O interface 505: input devices 506 including, for example, touch screens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 507 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 508 including, for example, magnetic tapes, hard disks, etc.; and communication devices 509, such as network interface cards (e.g., LAN cards) and modems. Communication device 509 allows electronic device 500 to communicate with other devices wirelessly or by wire to exchange data, and performs communication processing via networks such as the Internet. Drive 510 is also connected to I/O interface 505 as needed. Removable media 511, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are mounted on drive 510 as needed so that computer programs read from them can be installed into storage device 508 as needed.
[0117] Although Figure 9 shows an electronic device 500 including various devices, it should be understood that it is not required to implement or include all of the devices shown. More or fewer devices may alternatively be implemented or included.
[0118] For example, the electronic device 500 may further include a peripheral interface (not shown in the figure). This peripheral interface can be various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 509 can communicate wirelessly with a network and other devices, such as the Internet, an intranet, and / or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and / or a metropolitan area network (MAN). Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and / or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and / or Short Message Service (SMS), or any other suitable communication protocol.
[0119] For example, the electronic device 500 may be any device such as a mobile phone, tablet computer, laptop computer, e-book reader, game console, television, digital photo frame, navigation device, or server, or any combination of an instruction processing apparatus and hardware. The embodiments disclosed herein do not limit this.
[0120] Although the present disclosure has been described in detail above with general descriptions and specific embodiments, modifications or improvements can be made to the embodiments of the present disclosure, which will be obvious to those skilled in the art. Therefore, all such modifications or improvements made without departing from the spirit of the present disclosure are within the scope of protection claimed by the present disclosure.
[0121] The following points should be noted regarding this disclosure:
[0122] (1) The accompanying drawings of the embodiments of this disclosure relate only to the structures involved in the embodiments of this disclosure; for other structures, reference may be made to common designs.
[0123] (2) For clarity, the thickness of layers or regions in the drawings used to describe embodiments of the present disclosure is enlarged or reduced, i.e., these drawings are not drawn to actual scale.
[0124] (3) Where there is no conflict, the embodiments of this disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.
[0125] The above description is merely a specific embodiment of this disclosure, but the scope of protection of this disclosure is not limited thereto. The scope of protection of this disclosure should be determined by the scope of protection of the claims.
Claims
1. An instruction processing method, comprising: in response to identifying a first conditional branch instruction with a reverse jump from an instruction stream, recording instruction information of the first conditional branch instruction; and in response to identifying the first conditional branch instruction with the reverse jump at least once more, determining that the first conditional branch instruction is an end branch instruction in a loop body instruction, and performing a backfilling operation on the loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the end branch instruction; wherein performing the backfilling operation on the loop body instruction comprises: backfilling instruction information of the second branch instruction into an instruction information buffer; and performing branch prediction on the second branch instruction to obtain first prediction information, and backfilling the first prediction information into a branch instruction information cache.
2. The instruction processing method according to claim 1, further comprising: during the process of popping the second branch instruction from the instruction information buffer, performing branch prediction on the second branch instruction to obtain second prediction information; and in response to the first prediction information and the second prediction information being consistent, popping the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exiting the loop of the loop body instruction.
3. The instruction processing method according to claim 2, wherein the second branch instruction includes a conditional branch instruction or a JALR instruction.
4. The instruction processing method according to claim 3, wherein, when the second branch instruction is the conditional branch instruction, the first prediction information and the second prediction information each include jump information of the second branch instruction; and popping the next instruction recorded after the second branch instruction from the instruction information buffer in response to the first prediction information and the second prediction information being consistent comprises: in response to both the first prediction information and the second prediction information indicating no jump, taking as the next instruction the instruction at the address obtained by summing the memory address and the instruction width of the second branch instruction; or, in response to both the first prediction information and the second prediction information indicating a jump, taking as the next instruction a first target instruction of the jump indicated by either the first prediction information or the second prediction information.
5. The instruction processing method according to claim 4, wherein the inconsistency between the first prediction information and the second prediction information includes: the first prediction information indicating no jump while the second prediction information indicates a jump; and before exiting the loop of the loop body instruction, the instruction processing method further comprises: in response to the first prediction information indicating no jump and the second prediction information indicating a jump, prefetching a second target instruction of the jump indicated by the second prediction information, and placing the second target instruction in the instruction stream.
6. The instruction processing method according to claim 3, wherein, when the second branch instruction is the JALR instruction, the first prediction information includes the memory address of a third target instruction to which the second branch instruction indicates a jump, and the second prediction information includes the memory address of a fourth target instruction to which the second branch instruction indicates a jump; and popping the next instruction recorded after the second branch instruction from the instruction information buffer in response to the first prediction information and the second prediction information being consistent comprises: in response to the memory address of the third target instruction and the memory address of the fourth target instruction being the same, taking the third target instruction or the fourth target instruction as the next instruction.
7. The instruction processing method according to claim 6, wherein the inconsistency between the first prediction information and the second prediction information includes: the memory address of the third target instruction being different from the memory address of the fourth target instruction; and before exiting the loop of the loop body instruction, the instruction processing method further comprises: in response to the memory address of the third target instruction being different from the memory address of the fourth target instruction, prefetching the fourth target instruction and placing it into the instruction stream.
8. The instruction processing method according to claim 1, wherein performing the backfilling operation on the loop body instruction further comprises: in response to performing the backfilling operation on the end branch instruction, invoking the branch predictor to obtain third prediction information of the end branch instruction; and in response to the third prediction information indicating a jump, keeping the instruction information buffer and the branch instruction information cache in an active state; or, in response to the third prediction information indicating no jump, switching the instruction information buffer and the branch instruction information cache to an idle state, and fetching an instruction from a first address in the instruction stream, wherein the first address is obtained by summing the memory address of the end branch instruction and the instruction width.
9. The instruction processing method according to claim 2, wherein the loop body instruction further includes at least one non-branch instruction; performing the backfilling operation on the loop body instruction further comprises: sequentially backfilling instruction information of the non-branch instruction into the instruction information buffer; and the instruction processing method further comprises: in response to the non-branch instruction needing to be popped from the instruction information buffer, popping the non-branch instruction directly in sequence.
10. The instruction processing method according to claim 1, wherein the first conditional branch instruction with the reverse jump includes at least a first sub-conditional branch instruction and a second sub-conditional branch instruction, the first sub-conditional branch instruction being different from the second sub-conditional branch instruction; and recording the instruction information of the first conditional branch instruction in response to identifying the first conditional branch instruction with the reverse jump from the instruction stream comprises: in response to initially identifying the first sub-conditional branch instruction and the second sub-conditional branch instruction from the instruction stream in sequence, recording instruction information of the first sub-conditional branch instruction in a first table entry, and recording instruction information of the second sub-conditional branch instruction in a second table entry.
11. The instruction processing method according to claim 9, further comprising: popping the non-branch instruction and the second branch instruction in the loop body instruction in sequence, based on a first pointer maintained by the instruction information buffer and a second pointer maintained by the branch instruction information cache.
12. The instruction processing method according to any one of claims 1-11, wherein the instruction information of the second branch instruction includes exception information and original data information of the second branch instruction.
13. An instruction processing apparatus, comprising: an identification and recording circuit configured to: in response to identifying a first conditional branch instruction with a reverse jump from an instruction stream, record instruction information of the first conditional branch instruction; and in response to identifying the first conditional branch instruction with the reverse jump at least once more, determine that the first conditional branch instruction is an end branch instruction in a loop body instruction, wherein the loop body instruction includes at least one second branch instruction other than the first conditional branch instruction; and an instruction backfilling circuit configured to: backfill instruction information of the second branch instruction into an instruction information buffer; and backfill first prediction information, obtained by performing branch prediction on the second branch instruction, into a branch instruction information cache.
14. The instruction processing apparatus according to claim 13, further comprising: an instruction pop-out circuit configured to: during the process of popping the second branch instruction from the instruction information buffer, perform branch prediction on the second branch instruction to obtain second prediction information; and in response to the first prediction information and the second prediction information being consistent, pop the next instruction recorded after the second branch instruction from the instruction information buffer; or, in response to the first prediction information and the second prediction information being inconsistent, exit the loop of the loop body instruction.
15. The instruction processing apparatus according to claim 14, further comprising: an instruction prefetching circuit configured to, before exiting the loop of the loop body instruction, prefetch the target instruction of the jump indicated by the second prediction information and place the target instruction into the instruction stream.
16. A processor, comprising the instruction processing apparatus according to any one of claims 13-15.
17. An electronic device, comprising: a processor; and a memory including one or more computer program modules; wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules being used to perform the instruction processing method according to any one of claims 1-12.
18. A storage medium for non-transitory storage of computer-executable instructions, wherein, when the computer-executable instructions are executed by a computer, the instruction processing method according to any one of claims 1-12 is performed.