A dual-switching network-computing fusion heterogeneous computing method based on the CLOS architecture

A dual-switching network-computing fusion heterogeneous computing method based on the CLOS architecture addresses the lack of unified task organization for heterogeneous inputs and the lack of controlled scheduling of AI processing in heterogeneous computing platforms. It enables accurate feedback of AI processing output results and continuity of anomaly handling, thereby improving resource utilization efficiency.

CN122247955A · Pending · Publication Date: 2026-06-19 · HAOHAN DATA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HAOHAN DATA
Filing Date
2026-05-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing heterogeneous computing platforms suffer from several problems: lack of unified task organization for heterogeneous inputs, lack of controlled scheduling for AI processing, difficulty in accurately receiving AI processing outputs, and a tendency for the entire process to restart under abnormal conditions.

Method used

A dual-switching network-computing fusion heterogeneous computing method based on the CLOS architecture is adopted. Heterogeneous input data is acquired, and processing tasks are formed based on object identification fields and control requirements. Task execution relationships are established from preset processing templates and processing dependency rules, execution bearer units and exchange acceptance methods are determined, AI execution reservation information is generated, and execution continues along the backup execution path when a verification anomaly occurs.

Benefits of technology

It achieves unified organization of heterogeneous inputs, controlled access to AI processing and accurate feedback of results, reduces redundant processing in abnormal situations, and improves the continuity of the processing chain and the efficiency of resource utilization.


Figure CN122247955A_ABST

Abstract

This invention discloses a heterogeneous computing method based on a CLOS architecture with dual-switching network integration. The method acquires heterogeneous input data and forms processing tasks based on object identifier fields and control requirements. It establishes task execution relationships between pre-processing, AI processing, and subsequent processing parts based on preset processing templates and processing dependency rules. Based on processing attributes, resource status, and data reception requirements, it determines the execution unit and switching reception method, forming a main execution path and a backup execution path. Based on a first intermediate result, AI processing requirements, and entry time limit parameters, it generates AI execution reservation information and organizes AI execution tasks to enter the target AI execution resource. According to the task identifier, batch identifier, and result feedback entry identifier, it imports the AI processing output into the subsequent processing part. It performs consistency verification based on the execution context, and if verification fails, it continues execution along the backup execution path based on either the first or second intermediate result.

Description

Technical Field

[0001] This invention relates to the field of heterogeneous computing and network computing fusion scheduling technology, and in particular to a dual-switching network computing fusion heterogeneous computing method based on the CLOS architecture.

Background Technology

[0002] With the development of collaborative applications of high-speed network processing and intelligent computing, business processing often simultaneously receives network input data, memory access input data, and task control input data, and calls upon AI computing resources as needed in the processing chain to complete inference or augmentation processing. Although existing heterogeneous computing platforms can be configured with network processing units, general processing units, and AI processing units, input data from different sources usually enters their respective processing flows separately, lacking a unified task acceptance mechanism around the same object to be processed, resulting in a lack of tight connection between pre-processing, AI processing, and subsequent processing.

[0003] In the current process of accessing AI resources, most solutions simply send tasks to AI resources for execution, lacking a unified organization for the timing of AI processing entry, the method of acceptance, and the location of AI processing output results feedback. When multiple processing tasks need to share AI computing resources, there is usually a lack of batch processing judgment mechanism around task time limits and resource status, which can easily lead to uncontrolled task waiting, misaligned batch feedback, or output results being disconnected from the original task chain.

[0004] In cases of execution anomalies, existing solutions often employ either a complete retry or a restart from the original input, lacking a segmented takeover mechanism around the staged results. This can easily lead to redundant processing, affecting the continuity of the processing chain and the efficiency of resource utilization. Therefore, a heterogeneous computing method is needed that can achieve unified organization of heterogeneous inputs, controlled access to AI processing, accurate result recovery, and staged continuation of execution in case of anomalies under a dual-switching architecture.

Summary of the Invention

[0005] The purpose of this invention is to provide a heterogeneous computing method based on the CLOS architecture with dual-switching network computing integration, in order to solve the following problems of existing heterogeneous computing platforms: the lack of unified task organization for heterogeneous inputs, the lack of controlled scheduling in AI processing, the difficulty of accurately receiving AI processing output results, and the tendency to rerun the entire process in abnormal situations.

[0006] To solve the above-mentioned technical problems, the present invention is implemented using the following technical solution.

[0007] In one aspect, this invention provides a dual-switching network-computing fusion heterogeneous computing method based on the CLOS architecture, comprising: acquiring heterogeneous input data and forming processing tasks based on object identification fields and control requirements; establishing, based on preset processing templates and processing dependency rules, the task execution relationship between the pre-processing part, the AI processing part, and the subsequent processing part corresponding to the processing task; determining, based on the processing attributes of each processing part, the resource status of candidate execution bearer units, and the data transfer requirements between adjacent processing parts, the execution bearer unit and exchange acceptance method of each processing part, thereby forming the main execution path and the backup execution path; generating AI execution reservation information based on the first intermediate result, AI processing requirements, and entry time limit parameters, and organizing AI execution tasks to enter the target AI execution resource; importing the AI processing output result into the subsequent processing part based on the task identifier, batch identifier, and result feedback entry identifier; and performing consistency verification on the execution result based on the execution context and, if a verification anomaly occurs, continuing execution along the backup execution path based on the first intermediate result or the second intermediate result.

[0008] Furthermore, the heterogeneous input data includes at least two of the following: network input data, memory access input data, and task control input data. Forming the processing task based on the object identification field and control requirements includes: generating stream-level processing results from network input data, transaction-level processing results from memory access input data, and control-level processing results from task control input data; extracting the object identification field from the stream-level, transaction-level, and control-level processing results; and grouping the input results belonging to the same object to be processed into a candidate input set and generating a processing task based on the control requirements.

[0009] Furthermore, the step of grouping input results belonging to the same object to be processed into a candidate input set includes: When different input results contain the same object identifier field, the corresponding input results will be merged into the same candidate input set; When the object identification fields are not completely identical but meet the preset mapping relationship within the preset time window, the corresponding input results will be merged into the same candidate input set; The processing task is generated when the candidate input set meets the minimum input completeness condition corresponding to the preset task template.

[0010] Furthermore, the processing dependency rules include at least pre-processing completion rules, input source rules, bearer compatibility rules, and exchange acceptance rules. Establishing the task execution relationship between the pre-processing part, AI processing part, and subsequent processing part corresponding to the processing task includes: defining the executable order of each processing part according to the pre-processing completion rules; defining, according to the input source rules, the input of the AI processing part as derived from the first intermediate result formed by the pre-processing part; and defining the input of the subsequent processing part as derived from the AI processing output result or the second intermediate result formed by the AI processing part.

[0011] Furthermore, determining the execution bearer unit and switching method of each processing part includes: Based on the processing attributes of each processing part, select execution bearer units whose resource status meets the requirements from the candidate execution bearer units that meet the bearer compatibility rules; When adjacent processing units are located on the same node or the same board, and the latter processing unit can directly access the shared storage area where the output of the former processing unit is located, it is determined that the hardware memory access switching structure will take over the task. When adjacent processing units are located on different nodes or different boards, or when cross-node or cross-board forwarding needs to be completed through the CLOS switching layer, the Ethernet switching structure is determined to take over the task.

[0012] Furthermore, generating AI execution reservation information based on the first intermediate result, AI processing requirements, and entry time limit parameters, and organizing AI execution tasks to enter the target AI execution resource, includes: for multiple processing tasks waiting to enter the target AI execution resource, performing an AI processing requirement consistency determination and an entry time limit compatibility determination; when multiple processing tasks have the same model identifier, compatible input formats, consistent output types, and no conflicting accuracy levels, determining that the AI processing requirements are consistent; and when the allowed entry time intervals of multiple processing tasks overlap, and the waiting time added by aggregating those tasks into the same inference batch does not exceed the maximum allowed waiting time of any of the tasks, determining that the entry time limits are compatible.
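The two determinations in this paragraph can be illustrated with a minimal Python sketch. The `AiRequest` record, its field names, and the `ready_at` field are hypothetical and not part of the patent text. "Compatible" input formats and "non-conflicting" accuracy levels are simplified here to plain equality, and the added waiting time is assumed to be measured from the moment each task could have entered on its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AiRequest:
    task_id: str
    model_id: str
    input_format: str
    output_type: str
    precision: str
    earliest_entry: float  # start of the allowed entry time interval (s)
    latest_entry: float    # end of the allowed entry time interval (s)
    max_wait: float        # maximum allowed waiting time (s)
    ready_at: float        # assumed: time the first intermediate result is ready (s)

def requirements_consistent(reqs):
    """AI processing requirement consistency (simplified to equality)."""
    key = lambda r: (r.model_id, r.input_format, r.output_type, r.precision)
    return all(key(r) == key(reqs[0]) for r in reqs)

def entry_time_compatible(reqs):
    """Entry time limit compatibility for a prospective shared batch."""
    window_start = max(r.earliest_entry for r in reqs)
    window_end = min(r.latest_entry for r in reqs)
    # The batch cannot enter before every task is ready and every window is open.
    batch_entry = max(window_start, max(r.ready_at for r in reqs))
    if batch_entry > window_end:
        return False  # the allowed entry intervals do not overlap usefully
    # Added wait = batch entry time minus the time the task could enter alone.
    return all(batch_entry - max(r.ready_at, r.earliest_entry) <= r.max_wait
               for r in reqs)
```

Tasks would be aggregated into one inference batch only when both functions return `True`; otherwise they would fall back to single-task entry.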

[0013] Furthermore, organizing the AI execution tasks to enter the target AI execution resource includes: after determining that the AI processing requirements are consistent and the entry time limits are compatible, determining whether the target AI execution resource can accept the batch based on its expected access time, loaded model identifier, and available video memory or on-chip storage space; when the target AI execution resource can accommodate the current batch, organizing the first intermediate results corresponding to the multiple processing tasks into the same inference batch and generating a batch identifier and an in-batch task sequence number mapping table; and when the target AI execution resource cannot accommodate the current batch, organizing the corresponding processing tasks into AI execution tasks as single tasks.
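A minimal sketch of this admission decision follows. The `AiResource` record, the per-task memory estimate `mem_per_task_mb`, and the dictionary layout of the returned AI execution tasks are illustrative assumptions; the patent names the criteria but not a data format.

```python
from dataclasses import dataclass

@dataclass
class AiResource:
    loaded_model: str            # loaded model identifier
    free_memory_mb: int          # available video memory or on-chip storage
    expected_access_time: float  # seconds until the resource can be entered
    max_batch: int               # current batch processing capacity limit

def organize(task_ids, resource, model_id, mem_per_task_mb, deadline, batch_id):
    """Return AI execution tasks: one shared batch if it fits, else single tasks.

    task_ids are assumed to have already passed the consistency and
    entry time limit compatibility determinations.
    """
    fits = (resource.loaded_model == model_id
            and len(task_ids) <= resource.max_batch
            and len(task_ids) * mem_per_task_mb <= resource.free_memory_mb
            and resource.expected_access_time <= deadline)
    if fits:
        # In-batch task sequence number mapping table: sequence number -> task id.
        mapping = {seq: tid for seq, tid in enumerate(task_ids)}
        return [{"batch_id": batch_id, "mapping": mapping}]
    # Fallback: every task enters as its own single-task AI execution task.
    return [{"batch_id": None, "mapping": {0: tid}} for tid in task_ids]
```

The mapping table is kept alongside the batch identifier so that the output can later be split back to the original processing tasks.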

[0014] Furthermore, importing the AI processing output result into the subsequent processing part based on the task identifier, batch identifier, and result feedback entry identifier includes: receiving the execution completion receipt and the AI processing output result returned by the target AI execution resource; when the AI processing output result corresponds to a single task, locating the original processing task based on the task identifier; when the AI processing output result corresponds to a batch, splitting it into sub-output results corresponding to each original processing task according to the batch identifier and the in-batch task sequence number mapping table; and, based on the mapping relationship between the task identifier and the result feedback entry identifier, writing the corresponding AI processing output result or sub-output result into the subsequent processing entry corresponding to the original processing task.
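The splitting and feedback step can be sketched as below, assuming the batched output is a list indexed by in-batch sequence number. The function name `route_outputs` and the dictionary shapes are hypothetical; a single-task result is treated as a batch of size one.

```python
def route_outputs(batch_output, seq_to_task, task_to_entry):
    """Split a batched AI output and address each piece to its feedback entry.

    batch_output:  list of sub-outputs, indexed by in-batch sequence number
    seq_to_task:   in-batch task sequence number mapping table (seq -> task id)
    task_to_entry: task id -> result feedback entry identifier
    """
    if len(batch_output) != len(seq_to_task):
        raise ValueError("output size does not match the batch mapping table")
    routed = {}
    for seq, task_id in seq_to_task.items():
        entry_id = task_to_entry[task_id]  # KeyError means an unknown task
        routed[entry_id] = (task_id, batch_output[seq])
    return routed
```

For example, `route_outputs(["cat", "dog"], {0: "t1", 1: "t2"}, {"t1": "post-A", "t2": "post-B"})` addresses `"cat"` to entry `post-A` and `"dog"` to entry `post-B`.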

[0015] Furthermore, the execution context includes at least the main execution path identifier, the exchange acceptance method identifier, the AI execution reservation information, the execution completion receipt, the result feedback entry identifier, and the result output destination identifier. The consistency verification of the execution result based on the execution context includes an execution path consistency check, an acceptance method consistency check, a feedback entry consistency check, and an output destination consistency check.
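As a rough illustration, the four checks can be modelled as a field-by-field comparison between the planned execution context and what was actually observed at completion. The field names and the dictionary representation are assumptions made for this sketch only.

```python
# Hypothetical field names for the four checks named in the text.
CHECKED_FIELDS = ("path_id", "exchange_method", "feedback_entry", "output_destination")

def verify(expected_context, observed):
    """Return the list of fields that fail the consistency check (empty = pass)."""
    return [field for field in CHECKED_FIELDS
            if expected_context.get(field) != observed.get(field)]
```

A non-empty return value would correspond to a verification anomaly and trigger continuation along the backup execution path.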

[0016] Furthermore, continuing execution along the backup execution path based on the first intermediate result or the second intermediate result when a verification anomaly occurs includes: when the anomaly occurs before AI processing is completed and the first intermediate result is valid, determining the AI processing part as the recovery starting point, regenerating the AI execution reservation information based on the backup execution path, and reorganizing the first intermediate result into a new AI execution task to continue execution; and when the anomaly occurs after AI processing is completed but before subsequent processing is completed and the second intermediate result is valid, determining the subsequent processing part as the recovery starting point and importing the second intermediate result into the subsequent processing entry in the backup execution path to continue execution.
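The choice of recovery starting point reduces to a small decision function, sketched here. The `Stage` enum and the fallback to the pre-processing part when no valid intermediate result is retained are assumptions; the text only specifies the two cases in which a staged result can be taken over.

```python
from enum import Enum

class Stage(Enum):
    PRE = "pre-processing"
    AI = "ai-processing"
    POST = "subsequent-processing"

def recovery_start(ai_completed, first_valid, second_valid):
    """Pick where execution resumes on the backup execution path."""
    if ai_completed and second_valid:
        return Stage.POST  # import the second intermediate result
    if not ai_completed and first_valid:
        return Stage.AI    # regenerate the reservation, reuse the first result
    # Assumption: cases the text does not cover restart from pre-processing.
    return Stage.PRE
```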

[0017] Compared with the prior art, the beneficial effects achieved by the present invention are as follows: 1. By forming processing tasks around the same object to be processed from heterogeneous input data, and establishing the task execution relationship between the pre-processing part, the AI processing part and the post-processing part based on the preset processing template and processing dependency rules, inputs from different sources are uniformly received before entering the execution chain, avoiding the disconnect between pre-processing, AI processing and post-processing.

[0018] 2. By combining processing attributes, resource status, and data acceptance requirements, the execution carrier unit and exchange acceptance method are determined, and AI execution reservation information is generated before AI processing begins. This ensures that the timing, method, and result feedback basis of AI processing are determined before execution, reducing the risk of AI processing deviating from the predetermined execution path.

[0019] 3. By performing consistency checks on execution results based on the execution context, and continuing execution along the backup execution path based on the first or second intermediate result when an error occurs, exception handling takes over from staged results, which reduces the repeated processing caused by restarting from the original input.

Description of the Drawings

[0020] Figure 1 shows the overall architecture of a dual-switching network-computing fusion heterogeneous computing platform based on the CLOS architecture. Figure 2 shows a flowchart of the dual-switching network-computing fusion heterogeneous computing method based on the CLOS architecture. Figure 3 shows a flowchart of merging heterogeneous input data to generate a processing task. Figure 4 shows a flowchart of generating AI execution reservation information and organizing AI execution tasks. Figure 5 shows a flowchart of AI processing output result feedback and anomaly continuation control.

Detailed Implementation

[0021] This invention can be modified in many ways and has many embodiments, with specific embodiments shown in the accompanying drawings for detailed description. However, this does not mean that the invention is limited to a specific implementation; it should be understood that all modifications, equivalents, and even substitutions falling within the concept and technical scope of this invention are included in this invention. Similar reference numerals are used for similar constituent elements in the description of the drawings.

[0022] The terms “first,” “second,” “A,” “B,” etc., are used to describe a wide variety of constituent elements, but these constituent elements are not limited by these terms. These terms are used to distinguish one constituent element from others. For example, without departing from the scope of the invention, a first constituent element may be named a second constituent element, and similarly, a second constituent element may be named a first constituent element. The term “and / or” includes a combination of multiple associated descriptions, or one of multiple associated descriptions.

[0023] When it is mentioned that a certain constituent element is "connected" or "linked" to other constituent elements, it can mean not only that the element is directly connected or linked to the other constituent element, but also that there are other constituent elements in between. Conversely, when it is mentioned that a certain constituent element is "directly connected" or "directly linked" to other constituent elements, there are no other constituent elements in between.

[0024] The terminology used in this application is for illustrative purposes only and is not intended to limit the scope of the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "comprising" or "having" as used herein do not preclude the possibility of the presence or addition of features, numbers, stages, actions, constituent elements, components, or combinations thereof described in the specification.

[0025] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0026] Terms defined in common dictionaries should be interpreted as having the same meaning as in the context of the relevant technology, and should not be interpreted as having an ideal or overly formal meaning unless explicitly defined in this application.

[0027] This embodiment discloses a dual-switching heterogeneous computing method based on the CLOS architecture, the processing of which runs on a dual-switching heterogeneous computing platform based on the CLOS architecture, designed for high-speed network processing and AI computing power collaboration scenarios. As shown in Figure 1, the platform includes at least an access processing unit, a main control and scheduling unit, a heterogeneous computing unit, an Ethernet switching structure, a hardware memory access switching structure, a result control unit, and a reconfiguration control unit. The heterogeneous computing unit includes at least a reconfigurable processing unit, a general-purpose processing unit, and an AI processing unit. The access processing unit receives heterogeneous input data and generates input results. The main control and scheduling unit organizes the execution process around the processing task and determines the execution path. The heterogeneous computing unit undertakes pre-processing, AI processing, and post-processing. The Ethernet switching structure and the hardware memory access switching structure each undertake a different type of data exchange. The result control unit verifies the correspondence between the execution result and the execution process. The reconfiguration control unit switches the execution path based on the staged results and continues executing the remaining processing when an execution anomaly occurs. The Ethernet switching structure undertakes cross-node or cross-board data forwarding completed through the CLOS switching layer. The hardware memory access switching structure undertakes shared storage access or block data exchange within the same node or board.

[0028] As shown in Figure 2, in this embodiment the platform performs the following steps:
S101. Acquire heterogeneous input data and form processing tasks based on object identification fields and control requirements;
S102. Based on preset processing templates and processing dependency rules, establish the task execution relationship between the pre-processing part, AI processing part, and subsequent processing part corresponding to the processing task;
S103. Based on the processing attributes of each processing part, the resource status of the candidate execution bearer units, and the data transfer requirements between adjacent processing parts, determine the execution bearer unit and the exchange acceptance method of each processing part, and form the main execution path and the backup execution path;
S104. Based on the first intermediate result, AI processing requirements, and entry time limit parameters, generate AI execution reservation information and organize AI execution tasks to enter the target AI execution resource;
S105. Based on the task identifier, batch identifier, and result feedback entry identifier, import the AI processing output result into the subsequent processing part;
S106. Perform consistency verification on the execution result based on the execution context, and if the verification fails, continue execution along the backup execution path based on the first intermediate result or the second intermediate result.

[0029] Specifically, as shown in Figures 2 to 5, the access processing unit receives heterogeneous input data entering the heterogeneous computing platform. The heterogeneous input data includes at least two of the following: network input data, memory access input data, and task control input data. The access processing unit performs type identification on the heterogeneous input data according to the input source interface, field format, or protocol type, and forms corresponding input results according to the input type. It forms stream-level processing results for network input data, transaction-level processing results for memory access input data, and control-level processing results for task control input data.

[0030] The master control scheduling unit extracts an object identifier field from the stream-level processing results, transaction-level processing results, and control-level processing results to characterize the object to be processed. The object identifier field is at least one of the following: session identifier, transaction identifier, task identifier, request sequence number, and cache access identifier. The master control scheduling unit uses the object identifier field as the merging basis to perform association processing on the stream-level processing results, transaction-level processing results, and control-level processing results. When different input results contain the same object identifier field, the corresponding input results will be merged into the candidate input set corresponding to the same object to be processed. When the object identifier fields are not completely identical but satisfy the preset mapping relationship within the preset time window, the corresponding input results will be merged into the candidate input set corresponding to the same object to be processed. When the corresponding input result has neither the same object identifier field nor satisfies the mapping relationship within the preset time window, the corresponding input result is retained in the pending completion queue.
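The three merging rules in this paragraph can be sketched as follows. The `InputResult` record, the representation of the preset mapping relationship as a dictionary from an alias identifier to a target identifier, and the treatment of a result with no partner as "pending" are assumptions for illustration. Chained mappings are not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InputResult:
    object_id: str    # object identifier field
    timestamp: float  # arrival time in seconds
    kind: str         # "stream", "transaction", or "control"

def merge_inputs(results, id_mapping, window):
    """Group input results into candidate input sets plus a pending queue."""
    groups = {}
    # Rule 1: identical object identifier fields share a candidate input set.
    for r in results:
        groups.setdefault(r.object_id, []).append(r)
    # Rule 2: different identifiers that satisfy the preset mapping
    # relationship within the preset time window are merged.
    for alias, target in id_mapping.items():
        if alias == target or alias not in groups or target not in groups:
            continue
        target_times = [r.timestamp for r in groups[target]]
        movable = [r for r in groups[alias]
                   if any(abs(r.timestamp - t) <= window for t in target_times)]
        groups[target].extend(movable)
        groups[alias] = [r for r in groups[alias] if r not in movable]
    # Rule 3: results that matched nothing stay in the pending completion queue.
    candidate_sets = {k: v for k, v in groups.items() if len(v) > 1}
    pending = [r for v in groups.values() if len(v) == 1 for r in v]
    return candidate_sets, pending
```

In a streaming setting the pending queue would be re-examined as later input results arrive; this batch-style function only shows the grouping logic.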

[0031] Extract the control requirements corresponding to the current object to be processed from the control-level processing results. These control requirements include at least one of the following: result output destination, processing time limit level, model call requirements, and anomaly recovery level. Generate a task description item based on the candidate input set and the control requirements. When the candidate input set meets the minimum input completeness condition corresponding to the current task type, a processing task is generated with the task description item, and a task identifier, object identifier, input source set, control requirement set, and input completeness status are written for the processing task. When the candidate input set does not meet the minimum input completeness condition, the current task description item is retained and the missing input is waited for to be filled in, or a restricted processing task is generated according to the degradation formation rule. After the processing task is formed, the stream-level processing result, transaction-level processing result and control-level processing result are all constrained under the same task identifier.
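A minimal sketch of the completeness gate described above. The `MIN_INPUTS` table, the `"inference"` task type, and the returned dictionary layout are hypothetical; the patent does not specify which inputs a given task type requires.

```python
# Hypothetical minimum-input table: task type -> required input kinds.
MIN_INPUTS = {"inference": {"stream", "control"}}

def form_task(task_id, task_type, candidate_kinds, control_requirements,
              allow_degraded=False):
    """Form a processing task, a restricted task, or None (keep waiting)."""
    present = set(candidate_kinds)
    missing = MIN_INPUTS[task_type] - present
    if not missing:
        status = "complete"
    elif allow_degraded:
        status = "restricted"  # degradation formation rule applies
    else:
        return None            # keep the task description item and wait
    return {
        "task_id": task_id,
        "input_sources": sorted(present),
        "control_requirements": dict(control_requirements),
        "completeness": status,
        "missing": sorted(missing),
    }
```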

[0032] It should be noted that the platform first performs unified access to heterogeneous input data and forms processing tasks. Then, it establishes the task execution relationship between the pre-processing part, the AI processing part, and the subsequent processing part around the processing tasks. Under the dual-exchange architecture, it determines the execution carrying unit of each processing part and the exchange and acceptance method between adjacent processing parts, forming an execution path. Based on the AI processing part in the execution path, it generates AI execution reservation information and organizes AI execution tasks based on the first intermediate result. After the AI processing is completed, the AI processing output result is repositioned to the subsequent processing part corresponding to the original processing task. Before the result is output, the execution context is formed by combining the execution path, AI execution reservation information, execution completion receipt, and result output destination, and the consistency verification of the execution result is performed based on the execution context. When an anomaly occurs, depending on whether the first intermediate result or the second intermediate result is currently retained, the remaining processing is completed along the backup execution path, using either the AI processing part or the subsequent processing part as the starting point for continued execution.

[0033] The generated processing task is read, the task type corresponding to the processing task is identified, and a preset processing template corresponding to the task type is retrieved. The preset processing template includes at least a set of processing parts, the order of each processing part, the input type and output type of each processing part, the type of execution carrier unit that each processing part is allowed to enter, and the entry identifier and exit identifier corresponding to each processing part.

[0034] The system reads the processing dependency rules corresponding to the preset processing template and instantiates the preset processing template based on those rules to form the task execution relationship corresponding to the current processing task. The processing dependency rules include at least the pre-processing completion rules, input source rules, bearer compatibility rules, and exchange acceptance rules. The pre-processing completion rules require that a preceding processing part has formed its corresponding output result before the following processing part enters the executable state. The input source rules restrict the input of the AI processing part to the first intermediate result formed by the pre-processing part, and restrict the input of the subsequent processing part to the AI processing output result formed by the AI processing part or the second intermediate result imported when anomaly continuation occurs. The bearer compatibility rules restrict the types of execution bearer units that different processing parts are allowed to enter. The exchange acceptance rules restrict the selection of the exchange acceptance method between adjacent processing parts based on the data transmission location, access method, and data block form.

[0035] Based on the preset processing template and the processing dependency rules, the processing process corresponding to the current processing task is organized into a pre-processing part, an AI processing part, and a subsequent processing part. The output of the pre-processing part constitutes the input source of the AI processing part, and the output of the AI processing part constitutes the input source of the subsequent processing part. For task types that include additional processing steps, the main control scheduling unit can also insert additional processing parts between the pre-processing part and the AI processing part, or between the AI processing part and the subsequent processing part, according to the processing dependency rules, while keeping the input source relationship and the pre-processing completion relationship unchanged.

[0036] Read the task execution relationship and obtain the current resource status of each candidate execution bearer unit and the current switching status of the dual switching structure. The current resource status includes at least one of idle degree, queue length, available storage space, current load level, and expected waiting time; the current switching status includes at least one of the availability status of the hardware memory access switching structure, the availability status of the Ethernet switching structure, and the congestion status of the corresponding switching link.

[0037] The processing attributes corresponding to each processing part are identified. These attributes include at least one of the following: computation type, data block size, latency level, whether AI computing power is required, whether shared storage access is required, and output result type. For each processing part, candidate execution bearer units that meet the current resource status requirements are selected from the set of candidate execution bearer units that satisfy the bearer compatibility rules. When multiple candidate execution bearer units that meet the requirements exist, the main control scheduling unit performs a comparison based on the expected waiting time, the current load level, and the latency level corresponding to the processing part, and selects the execution bearer unit with the shorter expected waiting time and that meets the latency level requirements, so as to determine the pre-processing unit of the preceding processing part, the AI execution unit of the AI processing part, and the subsequent execution unit of the subsequent processing part.
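The selection rule can be sketched as a filter followed by a minimum. The `Unit` record, the use of a numeric waiting-time bound to stand in for the latency level, and the tie-break on load are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    kind: str             # e.g. "reconfigurable", "general", "ai"
    expected_wait: float  # expected waiting time in seconds
    load: float           # current load level, 0.0 to 1.0

def select_unit(units, allowed_kinds, max_wait):
    """Pick the compatible unit with the shortest expected wait, or None."""
    eligible = [u for u in units
                if u.kind in allowed_kinds          # bearer compatibility rule
                and u.expected_wait <= max_wait]    # latency level requirement
    # Ties on expected wait are broken by the lower current load (assumption).
    return min(eligible, key=lambda u: (u.expected_wait, u.load), default=None)
```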

[0038] The system identifies the data transfer requirements between adjacent processing units. These requirements include at least one of the following: whether the data is transferred within the same node or board, whether the subsequent processing unit can directly access the shared storage area where the output of the preceding processing unit is located, whether cross-domain forwarding via the CLOS switching layer is required, and whether the data transfer involves block-level switching or memory access switching. When adjacent processing units are located on the same node or board, and the subsequent processing unit can directly access the shared storage area where the output of the preceding processing unit is located, the main control scheduling unit determines that the hardware memory access switching structure will handle the transfer. When adjacent processing units are located on different nodes or boards, or when cross-node or cross-board forwarding via the CLOS switching layer is required, the main control scheduling unit determines that the Ethernet switching structure will handle the transfer.
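
As a non-limiting illustration, the choice between the two switching structures could be expressed as a small decision function. The names `Fabric` and `choose_fabric` are hypothetical. The description does not state which structure is used when the units share a node or board but the shared storage area is not directly accessible; the sketch assumes the Ethernet switching structure in that case.

```python
from enum import Enum


class Fabric(Enum):
    MEMORY_ACCESS = "hardware memory access switching structure"
    ETHERNET = "Ethernet switching structure"


def choose_fabric(same_node_or_board: bool,
                  shared_storage_reachable: bool,
                  needs_clos_forwarding: bool) -> Fabric:
    """Decide which switching structure handles one transfer."""
    # Different nodes/boards, or forwarding through the CLOS layer.
    if needs_clos_forwarding or not same_node_or_board:
        return Fabric.ETHERNET
    # Same node/board and the consumer can read the producer's output.
    if shared_storage_reachable:
        return Fabric.MEMORY_ACCESS
    # Not covered by the description; Ethernet is an assumed fallback.
    return Fabric.ETHERNET
```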

[0039] Based on the determined execution unit and switching method, a main execution path corresponding to the current processing task is generated, and a path identifier, a path node set, and a path switching method set are written for the main execution path. Without changing the current task execution relationship, candidate execution units and candidate switching links with high congestion or low reliability can be excluded to generate at least one backup execution path, and a backup path identifier is written for the backup execution path.

[0040] The main execution path is read, the target AI execution resource corresponding to the AI ​​processing part is identified, and the current resource status of the target AI execution resource is obtained. The current resource status includes at least one of the following: current queue length, estimated access time, loaded model identifier, available video memory or on-chip storage space, and current batch processing capacity limit. The main control scheduling unit extracts entry time limit parameters from the control requirements corresponding to the current processing task. The entry time limit parameters include at least one of the following: allowed entry time interval, maximum allowed waiting time, and timeout threshold.

[0041] For one or more processing tasks to be processed by the target AI execution resource, their corresponding first intermediate results and AI processing requirement descriptions are read respectively. The AI ​​processing requirement description includes at least one of the following: model identifier, input format, output type, and precision level. AI processing requirement consistency determination: When multiple processing tasks have the same model identifier, compatible input format, consistent output type, and no conflicting accuracy level, it is determined that the AI ​​processing requirements corresponding to the multiple processing tasks are consistent. If any of the multiple processing tasks has a different model identifier, incompatible input format, inconsistent output type, or conflicting accuracy level, it is determined that the AI ​​processing requirements corresponding to the multiple processing tasks are inconsistent.
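
As a non-limiting illustration, the AI processing requirement consistency determination could be sketched as below. `AIRequirement`, `COMPATIBLE_FORMATS`, and the format names are hypothetical. The description does not define when input formats are "compatible" or when precision levels "conflict"; the sketch assumes a lookup table for formats and plain equality for precision, and compares every task against the first one, which is a simplification if compatibility is not transitive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AIRequirement:
    model_id: str       # model identifier
    input_format: str   # input format
    output_type: str    # output type
    precision: str      # precision level


# Hypothetical table of input-format pairs that may share a batch.
COMPATIBLE_FORMATS = {frozenset({"fp16_nchw", "fp16_nhwc"})}


def formats_compatible(a: str, b: str) -> bool:
    return a == b or frozenset({a, b}) in COMPATIBLE_FORMATS


def requirements_consistent(reqs: Sequence[AIRequirement]) -> bool:
    """True when all tasks agree on model, format, output, precision."""
    if not reqs:
        return False
    first = reqs[0]
    return all(
        r.model_id == first.model_id
        and formats_compatible(r.input_format, first.input_format)
        and r.output_type == first.output_type
        and r.precision == first.precision  # equality assumed
        for r in reqs[1:]
    )
```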

[0042] If the AI ​​processing requirements are consistent, an entry time limit compatibility determination is performed. The allowed entry time intervals corresponding to the multiple processing tasks are read respectively, and it is determined whether there is an overlapping interval between the allowed entry time intervals. When there are overlapping intervals, and the waiting time added by aggregating the multiple processing tasks into the same inference batch does not exceed the maximum allowable waiting time for each processing task, it is determined that the multiple processing tasks are compatible in terms of entry time limit. When there is no overlapping interval, or when the aggregated waiting time exceeds the maximum allowed waiting time for any processing task, it is determined that the entry time limits of the multiple processing tasks are incompatible. The aggregated waiting time is jointly determined by the waiting time required for batch formation and the current queuing waiting time of the target AI execution resource.
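
As a non-limiting illustration, the entry time limit compatibility determination could be sketched as an interval intersection followed by a waiting-time check. `EntryLimit`, `overlap`, and `entry_compatible` are hypothetical names. The description states only that the aggregated waiting time is "jointly determined" by the batch formation wait and the current queuing wait; modelling it as their sum is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class EntryLimit:
    earliest: float   # start of the allowed entry time interval
    latest: float     # end of the allowed entry time interval
    max_wait: float   # maximum allowed waiting time


def overlap(limits: Sequence[EntryLimit]) -> Optional[Tuple[float, float]]:
    """Intersection of all allowed entry intervals, or None if empty."""
    start = max(l.earliest for l in limits)
    end = min(l.latest for l in limits)
    return (start, end) if start <= end else None


def entry_compatible(limits: Sequence[EntryLimit],
                     batch_forming_wait: float,
                     queue_wait: float) -> bool:
    """Tasks are compatible when their intervals overlap and the added
    wait stays within every task's maximum allowed waiting time."""
    if overlap(limits) is None:
        return False
    aggregated = batch_forming_wait + queue_wait  # assumed: simple sum
    return all(aggregated <= l.max_wait for l in limits)
```

For example, two tasks with allowed intervals [0, 10] and [5, 15] overlap on [5, 10]; they can share a batch only if the aggregated wait does not exceed the smaller of their two maximum waiting times.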

[0043] After the entry time limits are determined to be compatible, an acceptability determination is performed for the target AI execution resource. When the expected access time of the target AI execution resource falls within the overlapping interval, the currently available video memory or on-chip storage space meets the batch input requirements, and either the loaded model identifier is consistent with the model identifier corresponding to the multiple processing tasks or, if a model switch is needed, the corresponding model switching latency still meets the timeout threshold requirement, it is determined that the target AI execution resource can accept the current batch. If the expected access time of the target AI execution resource does not fall within the overlapping interval, or if the currently available video memory or on-chip storage space is insufficient, or if the model switching latency exceeds the timeout threshold requirement, it is determined that the target AI execution resource cannot accept the current batch.
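
As a non-limiting illustration, the acceptability determination for the target AI execution resource could be written as a single predicate. The function name `resource_can_accept` and all parameter names are hypothetical, and comparing the model switching latency directly against the timeout threshold is a simplifying assumption.

```python
from typing import Tuple


def resource_can_accept(expected_access_time: float,
                        overlap_window: Tuple[float, float],
                        free_memory: int,
                        batch_memory: int,
                        loaded_model: str,
                        required_model: str,
                        model_switch_latency: float,
                        timeout_threshold: float) -> bool:
    """Return True when the resource can accept the current batch."""
    start, end = overlap_window
    # The expected access time must fall inside the overlapping interval.
    if not (start <= expected_access_time <= end):
        return False
    # Available video memory / on-chip storage must fit the batch input.
    if free_memory < batch_memory:
        return False
    # Either the model is already loaded, or switching is fast enough.
    if loaded_model == required_model:
        return True
    return model_switch_latency <= timeout_threshold
```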

[0044] When all three conditions are met simultaneously—consistent AI processing requirements, compatible entry timeframes, and acceptable target AI execution resources—the first intermediate results corresponding to the multiple processing tasks are organized into the same inference batch. A batch identifier, a task sequence number mapping table within the batch, and a batch input encapsulation result are generated for the inference batch. AI execution reservation information is generated based on the batch identifier, target AI execution resource identifier, reservation time window, exchange acceptance method, result return entry identifier, and timeout threshold. When any of the above three conditions is not met, the corresponding processing tasks are organized into AI execution tasks as single tasks. Corresponding AI execution reservation information is generated based on the task identifier, target AI execution resource identifier, reservation time window, exchange acceptance method, result return entry identifier, and timeout threshold. The result return entry identifier is used to characterize the subsequent processing entry point that the AI ​​processing output result should enter after returning.
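
As a non-limiting illustration, the generation of AI execution reservation information for the batch case and the single-task case could be sketched as follows. `Reservation` and `build_reservations` are hypothetical names, and the field layout is one possible encoding of the items listed above, not a format defined by the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Reservation:
    work_id: str                    # batch identifier or task identifier
    is_batch: bool
    resource_id: str                # target AI execution resource identifier
    window: Tuple[float, float]     # reservation time window
    fabric: str                     # exchange acceptance method
    return_entries: Dict[str, str]  # task id -> result return entry id
    timeout: float                  # timeout threshold
    seq_map: Dict[int, str]         # sequence number in batch -> task id


def build_reservations(task_ids: List[str],
                       return_entries: Dict[str, str],
                       all_conditions_met: bool,
                       batch_id: str,
                       resource_id: str,
                       window: Tuple[float, float],
                       fabric: str,
                       timeout: float) -> List[Reservation]:
    """One batch reservation if all three conditions hold, otherwise
    one single-task reservation per processing task."""
    if all_conditions_met and len(task_ids) > 1:
        return [Reservation(batch_id, True, resource_id, window, fabric,
                            {t: return_entries[t] for t in task_ids},
                            timeout, dict(enumerate(task_ids)))]
    return [Reservation(t, False, resource_id, window, fabric,
                        {t: return_entries[t]}, timeout, {0: t})
            for t in task_ids]
```

Keeping the sequence number mapping table inside the reservation means the same record can later be used to split the batch output back into per-task results.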

[0045] According to the determined main execution path, the pre-processing part is driven to execute on the corresponding pre-processing unit, forming a first intermediate result for further processing by the AI processing part. This first intermediate result is stored in association with the task identifier, object identifier, main execution path identifier, and result callback entry identifier of the current processing task. Based on the generated AI execution reservation information, the first intermediate result is organized into a corresponding AI execution task, and the AI execution task is sent to the target AI execution resource for AI processing according to the determined exchange acceptance method.

[0046] After the target AI execution resource completes the corresponding AI processing, it returns an execution completion receipt and the AI ​​processing output result to the main control scheduling unit. The execution completion receipt includes at least one of the following: task identifier or batch identifier, target AI execution resource identifier, execution completion status, and execution completion time. The corresponding AI execution task is identified based on the task identifier or batch identifier in the execution completion receipt; when the AI ​​processing output result corresponds to a single task, the original processing task is located based on the task identifier; when the AI ​​processing output result corresponds to a batch processing, the AI ​​processing output result is split into sub-output results corresponding to each of the original processing tasks based on the batch identifier and the batch task sequence number mapping table.
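
As a non-limiting illustration, splitting a batch output into sub-output results by means of the batch task sequence number mapping table could be sketched as below. The function name `split_batch_output` is hypothetical, and the sketch assumes that the batch output is an ordered sequence whose position matches the sequence number within the batch.

```python
from typing import Any, Dict, Sequence


def split_batch_output(batch_output: Sequence[Any],
                       seq_map: Dict[int, str]) -> Dict[str, Any]:
    """Map each position of the batch output back to its original task.

    seq_map is the sequence-number -> task-identifier mapping table
    generated when the inference batch was formed.
    """
    if len(batch_output) != len(seq_map):
        raise ValueError("batch output size does not match the mapping table")
    return {seq_map[i]: out for i, out in enumerate(batch_output)}
```

Rejecting a size mismatch up front is a design choice of the sketch: it surfaces a misaligned batch as an error instead of silently attributing outputs to the wrong task.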

[0047] By mapping the task identifier to the result callback entry identifier, the corresponding AI processing output result or the split sub-output result is written to the subsequent processing entry corresponding to the original processing task, allowing the subsequent processing part to continue execution from the subsequent processing entry. The subsequent processing part performs result integration, control generation, or result output processing based on the AI ​​processing output result, forming a second intermediate result or the final execution result. For interim results formed when the preceding processing part is completed but the AI ​​processing part is not yet completed, they are retained as the first intermediate result; for interim results formed when the AI ​​processing part is completed but the subsequent processing part is not yet completed, they are retained as the second intermediate result. Both the first and second intermediate results are associated with the current task identifier, the completed processing part identifier, the pending processing part identifier, and the storage location identifier.

[0048] During the execution of the current task, the master control scheduling unit combines the main execution path identifier, the exchange acceptance method identifier, the AI ​​execution reservation information, the execution completion receipt, the result return entry identifier, the result output destination identifier, and the stage time information to form an execution context corresponding to the current execution result. This execution context is used to characterize the correspondence between the current execution result and the established execution chain.

[0049] The result control unit performs a consistency check on the current execution result based on the execution context. This consistency check includes at least path consistency check, connection method consistency check, callback entry consistency check, and output destination consistency check. The path consistency check is used to determine whether the resource execution record in the execution completion receipt is consistent with the main execution path identifier. When the resource execution record in the execution completion receipt is consistent with the main execution path identifier, the path is considered consistent; when the resource execution record in the execution completion receipt is inconsistent with the main execution path identifier, the path is considered deviated.

[0050] The consistency check of the receiving method is used to determine whether the receiving method identifier corresponding to the current result feedback is consistent with the receiving method recorded in the AI ​​execution reservation information; if they are consistent, the receiving method is determined to be consistent; if they are inconsistent, the receiving method is determined to be deviated. The consistency check of the callback entry is used to determine whether the subsequent processing entry identifier written to the current result is consistent with the result callback entry identifier corresponding to the original processing task; if they are consistent, the callback entry is determined to be consistent; if they are inconsistent, the callback entry is determined to be deviated. The consistency check of the output destination is used to determine whether the output destination identifier corresponding to the current execution result is consistent with the result output destination recorded in the control requirements; if they are consistent, the output destination is determined to be consistent; if they are inconsistent, the output destination is determined to be deviated.

[0051] The result control unit also performs time limit checks based on the correspondence between the execution completion time and the scheduled time window. If the execution completion time exceeds the scheduled time window or the timeout threshold corresponding to the control requirements, a time limit anomaly is determined in the current execution result. When all the aforementioned consistency checks pass, the result control unit outputs a consistency pass result, allowing the current execution result to enter the normal output process. If any of the aforementioned checks fails, the result control unit outputs an anomaly trigger result, writing the anomaly type, anomaly stage, and the corresponding task identifier, while simultaneously preventing the current execution result from entering the normal output process. The anomaly stage is used to characterize whether the anomaly occurs before the AI ​​processing is completed or after the AI ​​processing is completed but before subsequent processing is completed.
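
As a non-limiting illustration, the consistency checks and the time limit check performed by the result control unit could be sketched as a comparison between the recorded execution context and the observed result. `ExecutionContext`, `ObservedResult`, `check_result`, and the anomaly type strings are hypothetical; the description does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ExecutionContext:
    main_path_id: str
    fabric_id: str          # exchange acceptance method identifier
    return_entry_id: str    # result return entry identifier
    output_dest_id: str     # result output destination identifier
    window_end: float       # end of the reservation time window
    timeout_at: float       # absolute time derived from the timeout threshold


@dataclass
class ObservedResult:
    task_id: str
    path_id: str
    fabric_id: str
    return_entry_id: str
    output_dest_id: str
    completed_at: float
    ai_completed: bool


def check_result(ctx: ExecutionContext, obs: ObservedResult) -> Dict[str, Any]:
    """Return a pass result or an anomaly trigger result."""
    anomalies: List[str] = []
    if obs.path_id != ctx.main_path_id:
        anomalies.append("path_deviation")
    if obs.fabric_id != ctx.fabric_id:
        anomalies.append("acceptance_method_deviation")
    if obs.return_entry_id != ctx.return_entry_id:
        anomalies.append("callback_entry_deviation")
    if obs.output_dest_id != ctx.output_dest_id:
        anomalies.append("output_destination_deviation")
    if obs.completed_at > ctx.window_end or obs.completed_at > ctx.timeout_at:
        anomalies.append("time_limit")
    if not anomalies:
        return {"status": "pass", "task_id": obs.task_id}
    stage = "after_ai" if obs.ai_completed else "before_ai"
    return {"status": "anomaly", "task_id": obs.task_id,
            "types": anomalies, "stage": stage}
```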

[0052] After the result control unit outputs the abnormal trigger result, the reconstruction control unit reads the abnormal type, abnormal stage, main execution path identifier, alternative execution path candidate set, and the reserved records of the first and second intermediate results corresponding to the current task. The reconstruction control unit performs filtering on the alternative execution path candidate set, excluding candidate paths containing the current abnormal resources or abnormal exchange links, and selects the path that meets the current processing part's bearing compatibility requirements and exchange acceptance requirements as the recovery path from the remaining candidate paths.

[0053] When the anomaly phase indicates that the anomaly occurred before the AI ​​processing was completed, and the first intermediate result corresponding to the current task is validly available, the reconstruction control unit determines the AI ​​processing portion as the recovery starting point. The first intermediate result being validly available means that the task identifier, storage location identifier, and pending processing portion identifier corresponding to the first intermediate result are all complete and usable. Based on the recovery path, the reconstruction control unit regenerates the AI ​​execution reservation information and reorganizes the first intermediate result into a new AI execution task, sending it to the target AI execution resource in the recovery path, allowing the current processing task to continue execution from the AI ​​processing portion onwards.

[0054] When the anomaly stage indicates that the anomaly occurred after AI processing was completed but before subsequent processing was completed, and the second intermediate result corresponding to the current task is validly available, the reconstruction control unit determines the subsequent processing section as the recovery starting point. The second intermediate result being validly available means that the task identifier, storage location identifier, and pending processing section identifier corresponding to the second intermediate result are all complete and usable. The reconstruction control unit directly imports the second intermediate result into the subsequent processing entry point in the recovery path, allowing the current processing task to continue execution from the subsequent processing section, without re-executing the preceding processing section and the AI ​​processing section.

[0055] If no satisfactory recovery path exists in the candidate set of backup execution paths, or if neither the first nor the second intermediate result has a valid retention record, the reconstruction control unit outputs a recovery failure status and places the current task into the exception handling queue. If a valid recovery path exists and a valid intermediate result retention record exists, the backup execution path only takes over the remaining processing portion of the current task, avoiding restarting the processing flow from the original heterogeneous input data.
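
As a non-limiting illustration, the choice of recovery path and recovery starting point by the reconstruction control unit could be sketched as follows. `plan_recovery`, `is_valid`, the dictionary keys, and the stage labels are hypothetical. The sketch filters candidate paths only by faulty resources or links and omits the bearer compatibility and exchange acceptance checks; it also treats recovery as failed whenever the intermediate result relevant to the anomaly stage is unavailable, which is a simplification of the condition stated above.

```python
from typing import Any, Dict, List, Optional, Set

REQUIRED_FIELDS = ("task_id", "storage_location", "pending_part")


def is_valid(record: Optional[Dict[str, str]]) -> bool:
    """An intermediate result is validly available when its task id,
    storage location id and pending part id are all present."""
    return record is not None and all(record.get(f) for f in REQUIRED_FIELDS)


def plan_recovery(stage: str,
                  backup_paths: List[Dict[str, Any]],
                  faulty: Set[str],
                  first_result: Optional[Dict[str, str]],
                  second_result: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Choose a recovery path and the part from which to resume."""
    # Exclude candidate paths that contain a faulty resource or link.
    usable = [p for p in backup_paths if not (set(p["nodes"]) & faulty)]
    if not usable:
        return {"status": "recovery_failed"}
    if stage == "before_ai" and is_valid(first_result):
        return {"status": "resume", "start": "ai_part",
                "path": usable[0]["id"]}
    if stage == "after_ai" and is_valid(second_result):
        return {"status": "resume", "start": "subsequent_part",
                "path": usable[0]["id"]}
    return {"status": "recovery_failed"}
```

A `recovery_failed` status corresponds to placing the current task into the exception handling queue; a `resume` status means the backup execution path takes over only the remaining processing parts.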

[0056] In summary, the access processing unit of this invention performs unified access and object merging on heterogeneous input data, and forms processing tasks based on control requirements; the main control scheduling unit establishes the task execution relationship between the pre-processing part, the AI ​​processing part, and the subsequent processing part according to the preset processing template and processing dependency rules, determines the main execution path and the backup execution path according to the resource status of the execution unit and the exchange status of the dual exchange structure, generates AI execution reservation information and organizes AI execution tasks based on the consistency of AI processing requirements, the compatibility of entry time limits, and the acceptability of target AI execution resources; after the target AI execution resource returns the AI ​​processing output result, the main control scheduling unit re-aligns the AI ​​processing output result to the subsequent processing part corresponding to the original processing task according to the task identifier, batch identifier, and result return entry identifier; the result control unit forms an execution context by combining the execution path, AI execution reservation information, execution completion receipt, and result output destination, and performs consistency verification accordingly; in the event of an anomaly, the reconstruction control unit determines the recovery starting point according to the first intermediate result or the second intermediate result, and continues to execute the remaining processing along the backup execution path.

[0057] By merging object identifiers, instantiating processing templates, and constraining processing dependencies, a unified task chain is formed that runs through heterogeneous inputs, preprocessing, AI processing, and postprocessing, enabling network-side processing, compute-side processing, and control-side processing to execute collaboratively around the same task identifier. By incorporating resource status, entry time limit parameters, and AI processing requirements into the AI ​​execution reservation information generation process, batch processing organization and single-task entry diversion control around target AI execution resources are achieved, which helps improve the utilization efficiency of AI computing power resources while meeting time constraints.

[0058] Establishing a mapping relationship between AI processing output and the original processing task by using batch identifiers, a batch task sequence number mapping table, and a result callback entry identifier helps reduce the risk of output misalignment in batch processing scenarios and improves the accuracy of AI processing output callback. Recording path identifiers, acceptance method identifiers, execution completion receipts, and output destination information in the execution context, and using this to perform consistency checks and anomaly stage identification, helps improve the consistency control capability between execution results and the established execution chain.

[0059] By retaining the first and second intermediate results and combining them with the backup execution path to select the recovery starting point, the system can continue processing only the unfinished parts when an execution exception occurs. This helps maintain the continuity of the processing chain and reduce the overhead of repeated processing.

[0060] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0061] The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0062] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0063] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0064] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of the present invention without departing from the spirit and scope of the claims. All of these forms are within the protection scope of the present invention.

Claims

1. A dual-switching network-integrated heterogeneous computing method based on CLOS architecture, characterized in that the method includes: acquiring heterogeneous input data and forming processing tasks based on object identification fields and control requirements; establishing, based on preset processing templates and processing dependency rules, the task execution relationship between the pre-processing part, AI processing part, and subsequent processing part corresponding to the processing task; determining, based on the processing attributes of each processing part, the resource status of candidate execution units, and the data transfer requirements between adjacent processing units, the execution units and exchange transfer methods of each processing part, thereby forming the main execution path and the backup execution path; generating, based on the first intermediate result, AI processing requirements, and entry time limit parameters, AI execution reservation information, and organizing AI execution tasks to enter the target AI execution resources; importing, based on the task identifier, batch identifier, and result feedback entry identifier, the AI processing output into the subsequent processing section; and performing consistency verification of the execution results based on the execution context, and, when a verification error occurs, continuing execution along the backup execution path based on the first intermediate result or the second intermediate result.

2. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, The heterogeneous input data includes at least two of the following: network input data, memory access input data, and task control input data. The processing task is formed based on the object identification field and control requirements, including: It generates stream-level processing results from network input data, transaction-level processing results from memory access input data, and control-level processing results from task control input data. Extract the object identifier field from the stream-level processing results, transaction-level processing results, and control-level processing results. Group the input results belonging to the same object to be processed into a candidate input set, and generate a processing task based on the control requirements.

3. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 2, characterized in that, The step of grouping input results belonging to the same object to be processed into a candidate input set includes: When different input results contain the same object identifier field, the corresponding input results will be merged into the same candidate input set; When the object identification fields are not completely identical but meet the preset mapping relationship within the preset time window, the corresponding input results will be merged into the same candidate input set; The processing task is generated when the candidate input set meets the minimum input completeness condition corresponding to the preset task template.

4. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, The processing dependency rules include at least the pre-completion rules, input source rules, bearer compatibility rules, and exchange acceptance rules; Establishing the task execution relationship between the pre-processing part, AI processing part, and post-processing part corresponding to the processing task includes: The executable order of each processing part is defined according to the aforementioned pre-processing completion rule. The input source rule is defined as the input of the AI ​​processing part being derived from the first intermediate result formed by the preceding processing part. The input of the subsequent processing part is defined as being derived from the output result or the second intermediate result formed by the AI ​​processing part.

5. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 4, characterized in that, The determination of the execution unit and switching method for each processing part includes: Based on the processing attributes of each processing part, select execution bearer units whose resource status meets the requirements from the candidate execution bearer units that meet the bearer compatibility rules; When adjacent processing units are located on the same node or the same board, and the latter processing unit can directly access the shared storage area where the output of the former processing unit is located, it is determined that the hardware memory access switching structure will take over the task. When adjacent processing units are located on different nodes or different boards, or when cross-node or cross-board forwarding needs to be completed through the CLOS switching layer, the Ethernet switching structure is determined to take over the task.

6. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, the step of generating AI execution reservation information based on the first intermediate result, AI processing requirements, and entry time limit parameters, and organizing AI execution tasks to enter the target AI execution resources, includes: for multiple processing tasks that are to enter the target AI execution resource, performing an AI processing requirement consistency determination and an entry time limit compatibility determination; when the multiple processing tasks have the same model identifier, compatible input format, consistent output type, and no conflicting accuracy level, determining that the AI processing requirements are consistent; and when the allowed entry time intervals of the multiple processing tasks overlap, and the waiting time added by aggregating the multiple processing tasks into the same inference batch does not exceed the maximum allowed waiting time of each processing task, determining that the entry time limits are compatible.

7. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 6, characterized in that, the step of organizing AI execution tasks to enter the target AI execution resources includes: after determining that the AI processing requirements are consistent and the entry time limits are compatible, determining whether the target AI execution resource is acceptable based on the expected access time of the target AI execution resource, the loaded model identifier, and the available video memory or on-chip storage space; when the target AI execution resource can accept the current batch, organizing the first intermediate results corresponding to the multiple processing tasks into the same inference batch, and generating a batch identifier and a task sequence number mapping table within the batch; and when the target AI execution resource cannot accept the current batch, organizing the corresponding processing tasks into AI execution tasks as single tasks.

8. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, The step of importing the AI ​​processing output into the subsequent processing section based on the task identifier, batch identifier, and result feedback entry identifier includes: Receive the execution completion receipt and AI processing output results returned by the target AI execution resource; When the AI ​​processing output corresponds to a single task, the original processing task is located based on the task identifier. When the AI ​​processing output result corresponds to a batch processing, the AI ​​processing output result is split into sub-output results corresponding to each original processing task according to the batch identifier and the batch task sequence number mapping table; Based on the mapping relationship between the task identifier and the result feedback entry identifier, the corresponding AI processing output or sub-output result is written into the subsequent processing entry corresponding to the original processing task.

9. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, The execution context includes at least the main execution path identifier, the exchange acceptance method identifier, the AI ​​execution reservation information, the execution completion receipt, the result return entry identifier, and the result output destination identifier; The consistency verification of execution results based on the execution context includes: Consistency checks include execution path, acceptance method, return entry point, and output destination.

10. The heterogeneous computing method based on CLOS architecture with dual switching network as described in claim 1, characterized in that, The step of continuing execution along the backup execution path based on the first intermediate result or the second intermediate result when a verification error occurs includes: When an anomaly occurs before the AI ​​processing is completed, and the first intermediate result is valid, the AI ​​processing part is determined as the recovery starting point. Based on the backup execution path, the AI ​​execution reservation information is regenerated, and the first intermediate result is reorganized into a new AI execution task to continue execution. When an anomaly occurs after AI processing is completed but before subsequent processing is completed, and the second intermediate result is valid, the subsequent processing part is determined as the recovery starting point, and the second intermediate result is imported into the subsequent processing entry in the backup execution path to continue execution.