Execution graph data flow method, apparatus, storage medium and device
By orchestrating a three-stage queue and data transformation operators, no-code data processing is achieved, solving the problem of high programming skill requirements, improving data processing performance, and making it suitable for artificial intelligence training data processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING EYECOOL TECHNOLOGY CO LTD
- Filing Date
- 2024-12-30
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence, and in particular to a method, apparatus, storage medium and device for executing graph data flow.
Background Technology
[0002] Artificial Intelligence (AI) is a new branch of technology that studies and develops theories, methods, technologies, and application systems to simulate, extend, and expand human intelligence. It is widely used in machine vision, fingerprint recognition, facial recognition, retinal recognition, iris recognition, palmprint recognition, expert systems, automated planning, intelligent search, theorem proving, game theory, automatic programming, intelligent control, robotics, language and image understanding, genetic programming, and other fields.
[0003] When processing training data for AI models, methods typically include direct code processing or low-code processing. Low-code processing often focuses on data state transformations. An example is the open-source Spark technology shown in Figure 1.
[0004] The above methods require users to have very good programming skills, which means that algorithm engineers who do not have good programming skills cannot define their own data processing flow.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this application provides an execution graph data flow method, apparatus, storage medium, and device, which reduces programming skill requirements and eliminates the need for intermediate state storage, thus achieving better processing performance.
[0006] The technical solution provided in this application is as follows:
[0007] Firstly, this application provides a method for execution graph data flow, the method comprising:
[0008] Create the upstream task processing queue, the current stage task processing queue, and the downstream task processing queue;
[0009] Obtain tasks from the upstream task processing queue, construct context objects from the obtained tasks, and place them into the current stage task processing queue;
[0010] Retrieve tasks from the current stage task processing queue and find the corresponding processing operator definition based on the context object;
[0011] Based on the defined processing operator, determine the input, create an operator object, and begin data processing;
[0012] After data processing is completed, output data is obtained and submitted to the corresponding task processing queue.
[0013] Furthermore, the data processing includes:
[0014] Based on the output entries of the upstream operator, construct the input matrix, and use the input to create an operator object for multiple data processing operations;
[0015] The number of data processing operations is consistent with the number of output entries of the upstream operator;
[0016] And / or;
[0017] An output form is constructed based on the number of downstream operators, and an operator object is created using the input to perform a deep copy of the output data;
[0018] The number of copies of the output data after deep copying is the same as the number of downstream operators.
[0019] Furthermore, before the step of retrieving a task from the current stage task processing queue, the method further includes:
[0020] Determine whether to enter the main loop;
[0021] If the task processing queue for the current stage is empty or the number of processing attempts for the current stage has exceeded the set maximum number of attempts, the main loop will exit; otherwise, the main loop will enter and the step of obtaining tasks from the task processing queue for the current stage will be executed.
[0022] Furthermore, submitting the output to the corresponding task processing queue includes:
[0023] Determine whether the downstream operator belongs to the current stage. If so, submit the output to the current stage task processing queue and return to the step of determining whether to enter the main loop; otherwise, submit the output to the downstream task processing queue and return to the step of determining whether to enter the main loop.
[0024] Furthermore, before the step of determining the input, creating an operator object, and starting data processing based on the defined processing operator, the method further includes:
[0025] Based on the defined processing operator, determine whether the current context already meets the operator processing requirements. If so, execute the step of determining the input, creating an operator object, and starting data processing based on the defined processing operator; otherwise, return to the step of determining whether to enter the main loop.
[0026] Secondly, this application provides an execution graph data transfer device, the device comprising:
[0027] The queue creation module is used to create upstream task processing queues, current stage task processing queues, and downstream task processing queues;
[0028] The task acquisition module is used to acquire tasks from the upstream task processing queue, construct the acquired tasks into context objects, and put them into the current stage task processing queue.
[0029] The operator acquisition module is used to acquire tasks from the current stage task processing queue and find the corresponding processing operator definition based on the context object;
[0030] The data processing module is used to determine the input, create an operator object, and start data processing according to the defined processing operator.
[0031] The data output module is used to obtain output data after data processing is completed, and submit the output data to the corresponding task processing queue.
[0032] Furthermore, the data processing module includes:
[0033] The first processing unit is used to construct an input matrix based on the output entries of the upstream operator, and to create an operator object using the input for multiple data processing operations.
[0034] The number of data processing operations is consistent with the number of output entries of the upstream operator;
[0035] And / or;
[0036] The second processing unit is used to construct an output form based on the number of downstream operators, and to create an operator object using the input to perform a deep copy of the output data.
[0037] The number of copies of the output data after deep copying is the same as the number of downstream operators.
[0038] Furthermore, the device also includes:
[0039] The first judgment module is used to determine whether to enter the main loop;
[0040] If the task processing queue is empty or the number of processing times in the current stage has exceeded the set maximum number of times, the main loop will exit; otherwise, the main loop will be entered and the operator acquisition module will be executed.
[0041] Furthermore, the data output module includes:
[0042] The first judgment unit is used to determine whether the downstream operator belongs to the current stage. If it does, the output is submitted to the current stage task processing queue and returned to the first judgment module; otherwise, the output is submitted to the downstream task processing queue and returned to the first judgment module.
[0043] Furthermore, the device also includes:
[0044] The second judgment module is used to determine whether the current context has met the operator processing requirements based on the processing operator definition. If so, the data processing module is executed; otherwise, the process returns to the first judgment module.
[0045] Thirdly, this application provides a computer-readable storage medium for performing graph data transfer, including a memory for storing processor-executable instructions, which, when executed by the processor, implement the steps of the graph data transfer method described in the first aspect.
[0046] Fourthly, this application provides an apparatus for performing graph data transfer, characterized in that it includes at least one processor and a memory storing computer-executable instructions, wherein the processor executes the instructions to implement the steps of the graph data transfer method described in the first aspect.
[0047] This application has the following beneficial effects:
[0048] This application creates a three-stage queue for each stage, including an upstream task processing queue, a current stage task processing queue, and a downstream task processing queue, realizing a data flow and task flow scheme when data propagates between different operators. This application is a no-code data processing method that orchestrates pre-defined data transformation operators for data processing. It does not require users to have extensive programming skills, making it convenient for algorithm engineers without strong programming abilities to define their own data processing flows. Furthermore, during data flow and task flow when data propagates between different operators, all data processing does not require intermediate state storage, achieving better processing performance.
Attached Figure Description
[0049] Figure 1 is a schematic diagram of the existing open-source Spark technology;
[0050] Figure 2 is a flowchart of an example of the execution graph data flow method of this application;
[0051] Figure 3 is a diagram illustrating a deep copy;
[0052] Figure 4 is an example diagram of the specific data processing procedure in S400 of this application;
[0053] Figure 5 is a flowchart of another example of the execution graph data flow method of this application;
[0054] Figure 6 is a schematic diagram of an example of the execution graph data transfer device of this application.
Detailed Implementation
[0055] To make the technical problems, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0056] This application provides an execution graph data flow method, which is used for execution graph data flow in artificial intelligence (AI) training data processing. As shown in Figure 2, the method includes:
[0057] S100: Create the upstream task processing queue (INPUT), the current stage task processing queue (PROCESSING), and the downstream task processing queue (OUTPUT).
[0058] S200: Retrieve tasks from the upstream task processing queue, construct a context object (Context for short) from the retrieved tasks, and put it into the task processing queue of the current stage.
[0059] S300: Retrieve a task from the current stage task processing queue and find the corresponding processing operator definition (ProcessorDefine) based on the context object.
[0060] S400: Based on the processing operator definition, determine the input, create an operator object, and begin data processing.
[0061] S500: After data processing is completed, the output data is obtained and submitted to the corresponding task processing queue.
[0062] This application creates a three-stage queue for each stage, including an upstream task processing queue, a current stage task processing queue, and a downstream task processing queue, realizing a data flow and task flow scheme when data propagates between different operators. This application is a no-code data processing method that orchestrates pre-defined data transformation operators for data processing. It does not require users to have extensive programming skills, making it convenient for algorithm engineers without strong programming abilities to define their own data processing flows. Furthermore, during data flow and task flow when data propagates between different operators, all data processing does not require intermediate state storage, achieving better processing performance.
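Steps S100 to S500 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this application: the names `Context`, `ProcessorDefine`, `run_stage`, and `registry` are hypothetical, tasks are assumed to be `(operator name, payload)` pairs, and every output is sent to the downstream queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Context:
    """Hypothetical context object wrapping one task (S200)."""
    operator_name: str
    payload: Any


@dataclass
class ProcessorDefine:
    """Hypothetical operator definition: a name plus a factory for the operator object."""
    name: str
    factory: Callable[[], Callable[[Any], Any]]


def run_stage(tasks, registry: Dict[str, ProcessorDefine]):
    # S100: create the INPUT, PROCESSING and OUTPUT queues.
    input_q, processing_q, output_q = deque(tasks), deque(), deque()

    # S200: wrap each upstream task in a Context and move it to PROCESSING.
    while input_q:
        name, payload = input_q.popleft()
        processing_q.append(Context(name, payload))

    while processing_q:
        # S300: take a task and look up its operator definition.
        ctx = processing_q.popleft()
        definition = registry[ctx.operator_name]
        # S400: create the operator object and process the input.
        operator = definition.factory()
        result = operator(ctx.payload)
        # S500: submit the output (in this sketch, always to the downstream queue).
        output_q.append(result)
    return list(output_q)


registry = {
    "double": ProcessorDefine("double", lambda: (lambda x: x * 2)),
    "upper": ProcessorDefine("upper", lambda: str.upper),
}
results = run_stage([("double", 21), ("upper", "abc")], registry)
print(results)  # [42, 'ABC']
```

Because the queues hold the data in memory between operators, nothing in this sketch writes an intermediate state to storage.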
[0063] As an improvement to the embodiments of this application, as shown in Figure 4, the specific process of the aforementioned data processing includes:
[0064] S410: Construct the input matrix based on the output items of the upstream operator, and use the input to create operator objects for multiple data processing operations.
[0065] When preparing for data processing, the upstream operators of a multi-input ProcessorDefine may produce different numbers of output items. This application therefore constructs an input matrix and then uses the same operator object to process the data multiple times. The number of processing runs equals the number of output items of the upstream operator, so that the processing lines up with the output items of the multi-input upstream operators.
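One possible reading of S410 is sketched below. The source does not say how unequal output counts are aligned, so the alignment rule here is an assumption: an upstream operator with a single output item is reused for every run, and any other mismatch is rejected. The helper names `build_input_matrix` and `run_operator` are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def build_input_matrix(upstream_outputs: Sequence[Sequence[Any]]) -> List[tuple]:
    """Return one row per processing run; each row holds one item per upstream operator."""
    runs = max(len(items) for items in upstream_outputs)
    rows = []
    for i in range(runs):
        row = []
        for items in upstream_outputs:
            if len(items) == runs:
                row.append(items[i])
            elif len(items) == 1:
                # Assumption: a single-item output is reused for every run.
                row.append(items[0])
            else:
                raise ValueError("upstream output counts cannot be aligned")
        rows.append(tuple(row))
    return rows


def run_operator(operator: Callable[..., Any], upstream_outputs) -> List[Any]:
    # The same operator object is applied once per row of the input matrix.
    return [operator(*row) for row in build_input_matrix(upstream_outputs)]


images = ["img_a", "img_b", "img_c"]  # upstream operator 1: three output items
target_size = [(224, 224)]            # upstream operator 2: one output item
resized = run_operator(
    lambda img, size: f"{img}@{size[0]}x{size[1]}", [images, target_size]
)
print(resized)  # ['img_a@224x224', 'img_b@224x224', 'img_c@224x224']
```

The operator runs three times, matching the three output items of the first upstream operator.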
[0066] In existing technologies, when the data processing flow branches, automatic deep copying of the data is not possible. This necessitates reprocessing the data multiple times from the beginning, following different processing branches, resulting in redundant data processing. To address this issue, this application provides the following solution:
[0067] S420: Construct an output form based on the number of downstream operators, and use the input to create operator objects to perform a deep copy of the output data.
[0068] When preparing for data processing, the number of downstream operators may differ for each output of a single-output or multi-output ProcessorDefine. This application therefore constructs an output form (OutputCnt form), according to which the operator performs a deep copy of each output. The number of copies of the output data after the deep copy equals the number of downstream operators, so that every downstream operator receives its own copy.
[0069] The data in this application can be automatically deep-copied according to the orchestrated output, and the output data can be automatically deep-copied into multiple copies for use by multiple downstream operators. This can reduce redundant data processing and allow intermediate data to be directly processed in various ways.
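The deep-copy fan-out of S420 can be illustrated with Python's standard `copy.deepcopy`. This is a sketch under assumptions: the OutputCnt form is modeled as a plain dictionary from output name to downstream operator count, and `fan_out` is a hypothetical helper.

```python
import copy
from typing import Any, Dict, List


def fan_out(outputs: Dict[str, Any], output_cnt: Dict[str, int]) -> Dict[str, List[Any]]:
    """Deep-copy each named output once per downstream operator."""
    return {
        name: [copy.deepcopy(value) for _ in range(output_cnt.get(name, 0))]
        for name, value in outputs.items()
    }


image = {"pixels": [[0, 1], [2, 3]], "path": "cat.png"}
copies = fan_out({"image": image}, {"image": 2})["image"]

# Each branch can mutate its own copy without affecting the other branch
# or the original output.
copies[0]["pixels"][0][0] = 255
print(copies[1]["pixels"][0][0])  # 0
print(image["pixels"][0][0])      # 0
```

A shallow copy would share the nested `pixels` lists between branches, which is why a deep copy is needed when downstream operators modify the data in place.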
[0070] In a specific example, as shown in Figure 3, after the file is read and the image data is loaded, the loaded image data is automatically deep-copied into two copies according to the orchestration. In Figure 3, the two downstream operators "Adjust image size (fixed size)" and "Adjust image size (proportional)" each receive one copy, which ensures that their input image sizes are consistent and reduces redundant data processing. Finally, the image data is stored and the record is saved.
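The number of copies in this example can be derived from the orchestrated graph itself. The sketch below assumes the graph is given as a list of `(upstream, downstream)` edges. The operator names are hypothetical labels for the boxes in Figure 3, and the edges after the two resize operators are an assumption, since the text only says that the image data is stored and the record saved at the end.

```python
from collections import Counter

# Hypothetical edge list for the Figure 3 example.
edges = [
    ("read_file", "load_image"),
    ("load_image", "resize_fixed"),
    ("load_image", "resize_proportional"),
    ("resize_fixed", "store_image"),          # assumed edge
    ("resize_proportional", "store_image"),   # assumed edge
    ("store_image", "save_record"),
]

# Downstream operator count per operator: the number of deep copies
# that operator's output needs.
output_cnt = Counter(src for src, _ in edges)
print(output_cnt["load_image"])  # 2
print(output_cnt["read_file"])   # 1
```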
[0071] Existing data processing technologies are generally based on DAG (Directed Acyclic Graph) for arranging data processing flows, which cannot meet the needs of cyclic data processing, and users can only implement it themselves by using an outer loop.
[0072] To solve the above problems, as shown in Figure 5, after S200 and before S300, this application further includes the following steps:
[0073] S210: Determine whether to enter the main loop;
[0074] If the current stage task processing queue is empty, or the number of processing steps in the current stage has exceeded the set maximum (MAX_STEP), the main loop is exited; otherwise, the main loop is entered and step S300 is executed.
[0075] Specifically, if the current stage task processing queue is empty, the main loop is exited directly, and this step is repeated to keep checking whether the main loop should be entered. If the number of processing steps in the current stage has exceeded the set maximum, the main loop is exited and an error is reported.
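The S210 decision can be written as a small function with three outcomes. This is a sketch only: the source names MAX_STEP but gives no value, so the limit of 1000 is an assumption, and `LoopDecision` and `check_main_loop` are hypothetical names.

```python
from collections import deque
from enum import Enum

MAX_STEP = 1000  # assumed limit; the source names MAX_STEP but gives no value


class LoopDecision(Enum):
    ENTER = "enter"  # proceed to S300
    IDLE = "idle"    # queue empty: leave the loop and check again later
    ABORT = "abort"  # step budget exceeded: leave the loop and report an error


def check_main_loop(processing_q: deque, steps_done: int,
                    max_step: int = MAX_STEP) -> LoopDecision:
    if not processing_q:
        return LoopDecision.IDLE
    if steps_done > max_step:
        return LoopDecision.ABORT
    return LoopDecision.ENTER


print(check_main_loop(deque(), 0).name)             # IDLE
print(check_main_loop(deque(["task"]), 5).name)     # ENTER
print(check_main_loop(deque(["task"]), 1001).name)  # ABORT
```

Separating the two exit cases matters because only the second one is an error: an empty queue simply means there is nothing to do yet.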
[0076] Accordingly, the aforementioned S500 includes:
[0077] Determine whether the downstream operator belongs to the current stage. If it does, submit the output to the current stage task processing queue and return to step S210; otherwise, submit the output to the downstream task processing queue and return to step S210.
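The routing decision in S500 can be sketched as follows, assuming the current stage is described by a set of operator names. The function `submit_output` and its parameters are hypothetical.

```python
from collections import deque
from typing import Any, Set


def submit_output(output: Any, downstream_operator: str,
                  current_stage_operators: Set[str],
                  processing_q: deque, output_q: deque) -> str:
    """Route one output and return the name of the queue that received it."""
    if downstream_operator in current_stage_operators:
        # The downstream operator runs in this stage: keep the task here.
        processing_q.append((downstream_operator, output))
        return "PROCESSING"
    # Otherwise hand the task to the next stage.
    output_q.append((downstream_operator, output))
    return "OUTPUT"


processing_q, output_q = deque(), deque()
stage_ops = {"resize_fixed", "resize_proportional"}
first = submit_output("img", "resize_fixed", stage_ops, processing_q, output_q)
second = submit_output("img", "store_image", stage_ops, processing_q, output_q)
print(first, second)  # PROCESSING OUTPUT
```

Submitting back to the current stage queue is what lets an operator chain, or a cycle, run entirely inside one stage.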
[0078] Furthermore, after S300 and before S400, the method further includes:
[0079] S310: Based on the operator definition, determine whether the current context already meets the operator processing requirements. If so, execute step S400; otherwise, return to step S210.
[0080] This application processes data in a loop. First, it determines whether to enter the main loop based on a condition. If the condition is met, it enters the main loop and executes step S300. After S300 completes, it proceeds to S310 to determine if the current context satisfies the operator's processing requirements. If so, it executes step S400; otherwise, it returns to step S210 and repeats the process of determining whether to enter the main loop. This method internally arranges a loop-based data processing flow. Compared to existing directed acyclic graph (DAG) technologies, it eliminates the need for an external loop to achieve the required loop-based data processing, reducing development difficulty and workload.
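The loop described above (S210, S300, S310, S400, S500) can be put together in one sketch. Several things here are assumptions rather than part of this application: the value of MAX_STEP, the `required` and `next_operator` fields of the hypothetical `ProcessorDefine`, and the simplification that a context failing the S310 check is skipped instead of being held until its missing inputs arrive.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

MAX_STEP = 100  # assumed value; the source does not give one


@dataclass
class ProcessorDefine:
    func: Callable[..., Any]
    required: FrozenSet[str]             # input names the operator needs (S310)
    next_operator: Callable[[Any], str]  # assumed routing rule for the output


def run_stage(definitions: Dict[str, ProcessorDefine], first: str, value: Any):
    processing_q = deque([(first, {"value": value})])
    output_q = deque()
    steps = 0
    while True:
        # S210: decide whether to enter the main loop.
        if not processing_q:
            break
        if steps > MAX_STEP:
            raise RuntimeError("MAX_STEP exceeded in current stage")
        # S300: take a task and find its operator definition.
        name, inputs = processing_q.popleft()
        definition = definitions[name]
        # S310: go back to S210 if the context lacks a required input.
        # (Simplification: the incomplete context is dropped here.)
        if not definition.required.issubset(inputs):
            continue
        # S400: run the operator.
        result = definition.func(**inputs)
        steps += 1
        # S500: route to the current stage queue or the downstream queue.
        target = definition.next_operator(result)
        queue = processing_q if target in definitions else output_q
        queue.append((target, {"value": result}))
    return list(output_q), steps


# A one-operator cycle: "halve" feeds itself until the value is at most 1.
definitions = {
    "halve": ProcessorDefine(
        func=lambda value: value / 2,
        required=frozenset({"value"}),
        next_operator=lambda out: "halve" if out > 1 else "sink",
    )
}
outputs, steps = run_stage(definitions, "halve", 8)
print(outputs, steps)  # [('sink', {'value': 1.0})] 3
```

The cycle runs inside the stage loop itself, so the caller does not need to wrap the pipeline in an outer loop, and the step limit guards against a cycle that never terminates.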
[0081] This application also provides an execution graph data transfer device. As shown in Figure 6, the device includes:
[0082] The queue creation module 100 is used to create the upstream task processing queue, the current stage task processing queue, and the downstream task processing queue.
[0083] The task acquisition module 200 is used to acquire tasks from the upstream task processing queue, construct the acquired tasks into context objects, and put them into the current stage task processing queue.
[0084] The operator acquisition module 300 is used to acquire tasks from the current stage task processing queue and find the corresponding processing operator definition based on the context object.
[0085] The data processing module 400 is used to determine the input, create an operator object, and start data processing based on the processing operator definition.
[0086] The data output module 500 is used to obtain output data after data processing and submit the output data to the corresponding task processing queue.
[0087] This application creates a three-stage queue for each stage, including an upstream task processing queue, a current stage task processing queue, and a downstream task processing queue, realizing a data flow and task flow scheme when data propagates between different operators. This application is a no-code data processing method that orchestrates pre-defined data transformation operators for data processing. It does not require users to have extensive programming skills, making it convenient for algorithm engineers without strong programming abilities to define their own data processing flows. Furthermore, during data flow and task flow when data propagates between different operators, all data processing does not require intermediate state storage, achieving better processing performance.
[0088] The aforementioned data processing module includes:
[0089] The first processing unit is used to construct an input matrix based on the output entries of the upstream operator, and to create operator objects using the input for multiple data processing operations.
[0090] The number of data processing operations is consistent with the number of output entries of the upstream operator.
[0091] And / or.
[0092] The second processing unit is used to construct an output form based on the number of downstream operators and to create operator objects using the input to perform a deep copy of the output data.
[0093] The number of copies of the output data after deep copying is the same as the number of downstream operators.
[0094] As an improvement to the embodiments of this application, the device further includes:
[0095] The first judgment module is used to determine whether to enter the main loop.
[0096] If the task processing queue is empty or the number of processing times in the current stage has exceeded the set maximum number of times, the main loop will exit; otherwise, the main loop will be entered and the operator acquisition module will be executed.
[0097] Correspondingly, the data output module includes:
[0098] The first judgment unit is used to determine whether the downstream operator belongs to the current stage. If it does, the output is submitted to the current stage task processing queue and returned to the first judgment module; otherwise, the output is submitted to the downstream task processing queue and returned to the first judgment module.
[0099] Furthermore, the device also includes:
[0100] The second judgment module is used to determine whether the current context has met the operator processing requirements based on the operator definition. If so, the data processing module is executed; otherwise, the process returns to the first judgment module.
[0101] The apparatus provided in the above embodiments corresponds one-to-one with the embodiments of the aforementioned methods in terms of its implementation principle and the resulting technical effects. For the sake of brevity, any parts of the apparatus not mentioned in the embodiments can be referred to the corresponding content in the embodiments of the aforementioned methods. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the modules and units described in this apparatus can all be referred to the corresponding processes in the embodiments of the aforementioned methods, and will not be repeated here.
[0102] The execution graph data transfer method described in the above embodiments of this application can implement business logic through a computer program and record it on a storage medium. This storage medium can be read and executed by a computer, achieving the effects of the scheme described in the method embodiments of this specification. Therefore, embodiments of this application also provide a computer-readable storage medium for execution graph data transfer, including a memory for storing processor-executable instructions. When these instructions are executed by a processor, they implement the steps of the execution graph data transfer described in the foregoing embodiments.
[0103] The storage medium may include a physical device for storing information, typically digitizing the information and then storing it using electrical, magnetic, or optical methods. The storage medium may include: devices that store information using electrical energy, such as various types of memory, like RAM, ROM, and USB flash drives; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memory, and bubble memory; and devices that store information using optical methods, such as CDs or DVDs. Of course, there are other readable storage media, such as quantum memories and graphene memories.
[0104] The storage medium described above may also include other implementation methods according to the description of the method embodiments. The implementation principle and technical effects of this embodiment are the same as those of the foregoing method embodiments. For details, please refer to the description of the relevant method embodiments, which will not be repeated here.
[0105] This application also provides an apparatus for performing graph data transfer. The apparatus may be a standalone computer, or it may include an actual operating device that uses one or more of the methods or embodiments described in this specification. The apparatus for performing graph data transfer may include at least one processor and a memory storing computer-executable instructions. When the processor executes the instructions, it implements the steps of any one or more of the above-described graph data transfer methods.
[0106] The device described above may also include other implementation methods according to the method embodiments. The implementation principle and technical effects of this embodiment are the same as those of the foregoing method embodiments. For details, please refer to the description of the relevant method embodiments, which will not be repeated here.
[0107] Finally, it should be noted that the above embodiments are merely specific implementations of this application. They illustrate the technical solutions of this application and do not limit them, and the protection scope of this application is not limited to them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in this application, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and all of them should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be determined by the protection scope of the claims.
Claims
1. A method for executing graph data flow, characterized in that, The method includes: Create the upstream task processing queue, the current stage task processing queue, and the downstream task processing queue; Obtain tasks from the upstream task processing queue, construct context objects from the obtained tasks, and place them into the current stage task processing queue; Retrieve tasks from the current stage task processing queue and find the corresponding processing operator definition based on the context object; Based on the defined processing operator, determine the input, create an operator object, and begin data processing; After data processing is completed, output data is obtained and submitted to the corresponding task processing queue.
2. The execution graph data flow method according to claim 1, characterized in that, The data processing includes: Based on the output entries of the upstream operator, construct the input matrix, and use the input to create an operator object for multiple data processing operations; The number of data processing operations is consistent with the number of output entries of the upstream operator; And / or; An output form is constructed based on the number of downstream operators, and an operator object is created using the input to perform a deep copy of the output data; The number of copies of the output data after deep copying is the same as the number of downstream operators.
3. The execution graph data flow method according to claim 2, characterized in that, Prior to retrieving a task from the current stage task processing queue, the process also includes: Determine whether to enter the main loop; If the task processing queue for the current stage is empty or the number of processing attempts for the current stage has exceeded the set maximum number of attempts, the main loop will exit; otherwise, the main loop will enter and the step of obtaining tasks from the task processing queue for the current stage will be executed.
4. The execution graph data flow method according to claim 3, characterized in that, Submitting the output to the corresponding task processing queue includes: Determine whether the downstream operator belongs to the current stage. If so, submit the output to the current stage task processing queue and return to the step of determining whether to enter the main loop; otherwise, submit the output to the downstream task processing queue and return to the step of determining whether to enter the main loop.
5. The execution graph data flow method according to claim 3, characterized in that, before the step of determining the input, creating an operator object, and starting data processing based on the defined processing operator, the method further includes: Based on the defined processing operator, determine whether the current context already meets the operator processing requirements. If so, execute the step of determining the input, creating an operator object, and starting data processing based on the defined processing operator; otherwise, return to the step of determining whether to enter the main loop.
6. An execution graph data transfer device, characterized in that, The device includes: The queue creation module is used to create upstream task processing queues, current stage task processing queues, and downstream task processing queues; The task acquisition module is used to acquire tasks from the upstream task processing queue, construct the acquired tasks into context objects, and put them into the current stage task processing queue. The operator acquisition module is used to acquire tasks from the current stage task processing queue and find the corresponding processing operator definition based on the context object; The data processing module is used to determine the input, create an operator object, and start data processing according to the defined processing operator. The data output module is used to obtain output data after data processing is completed, and submit the output data to the corresponding task processing queue.
7. The execution graph data transfer device according to claim 6, characterized in that, The data processing module includes: The first processing unit is used to construct an input matrix based on the output entries of the upstream operator, and to create an operator object using the input for multiple data processing operations. The number of data processing operations is consistent with the number of output entries of the upstream operator; And / or; The second processing unit is used to construct an output form based on the number of downstream operators, and to create an operator object using the input to perform a deep copy of the output data. The number of copies of the output data after deep copying is the same as the number of downstream operators.
8. The execution graph data transfer device according to claim 7, characterized in that, The device further includes: The first judgment module is used to determine whether to enter the main loop; If the task processing queue is empty or the number of processing times in the current stage has exceeded the set maximum number of times, the main loop will exit; otherwise, the main loop will be entered and the operator acquisition module will be executed.
9. A computer-readable storage medium for performing graph data flow, characterized in that, It includes a memory for storing processor-executable instructions, which, when executed by the processor, implement the steps of the execution graph data flow method according to any one of claims 1-5.
10. A device for performing graph data transfer, characterized in that, It includes at least one processor and a memory storing computer-executable instructions, wherein the processor executes the instructions to implement the steps of the execution graph data flow method according to any one of claims 1-5.