Method for setting parallelism of operator level and device thereof

By parsing Flink SQL tasks to generate execution plans and providing visual data flow graphs, users can manually modify operator-level parallelism, solving the problem that operator-level parallelism cannot be set in existing technologies and achieving efficient processing of Flink SQL tasks.

CN115729552B · Active · Publication Date: 2026-06-12 · PETAL CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PETAL CLOUD TECH CO LTD
Filing Date
2021-08-27
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

The current Flink SQL does not support manual setting of operator-level parallelism, resulting in low data processing efficiency or task errors, which cannot meet business requirements.

Method used

The method parses Flink SQL tasks to generate execution plans, presents a visual data flow graph, lets users manually modify operator-level parallelism, and optimizes automatically based on parallelism reference values and task running status, enabling flexible parallelism settings.

Benefits of technology

It improves the processing efficiency of Flink SQL tasks, meets business needs, and ensures the efficiency and flexibility of data processing.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a method and device for setting operator-level parallelism, relates to the field of data processing, and is used for supporting manual setting of operator-level parallelism. The method comprises the following steps: sending a first Flink SQL task to a client; receiving a target execution plan sent by the client, wherein the target execution plan comprises an execution plan and a parallelism reference value, the target execution plan is in JSON format, the parallelism reference value is a parallelism reference value of a node in the execution plan, the execution plan is generated by parsing the first Flink SQL task, and the parallelism reference value of the node in the execution plan is determined according to script information obtained by parsing the first Flink SQL task; providing a first data flow graph by parsing the target execution plan; and sending a second Flink SQL task to the client, wherein the second Flink SQL task comprises a first parallelism, the written SQL script, and the written configuration parameters, and the first parallelism is obtained by modifying the editable parallelism of the first data flow graph.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to a method and apparatus for setting operator-level parallelism.

Background Technology

[0002] In the field of data processing, whether real-time or offline, using SQL to simplify development is the overall trend. Flink SQL, implemented on top of Flink, is a development language that complies with standard SQL semantics and is designed to simplify the computational model and lower the barrier to entry for users. Flink SQL has been widely used in batch and streaming data processing. When processing data with Flink SQL, it is usually necessary to set the degree of parallelism, which is a very important concept in Flink SQL. A reasonable degree of parallelism can improve data processing efficiency, while an unreasonable one can reduce efficiency or even cause task errors. A Flink SQL program consists of multiple tasks. A task is executed by multiple parallel instances, and the number of these parallel instances is the degree of parallelism in Flink SQL. Currently, however, the degree of parallelism at some levels (such as the operator level) cannot be set manually by the user.

Summary of the Invention

[0003] In view of the above, it is necessary to provide a method and apparatus for setting operator-level parallelism, which can support manual setting of operator-level parallelism.

[0004] In a first aspect, one embodiment of this application provides a method for setting operator-level parallelism, applied to an electronic device. The method includes: sending a first Flink SQL task to a client, wherein the first Flink SQL task includes a written SQL script; receiving a target execution plan sent by the client, including an execution plan and parallelism reference values, wherein the target execution plan is in JSON format; the parallelism reference values are the parallelism reference values of nodes in the execution plan; the execution plan is generated by parsing the first Flink SQL task; the parallelism reference values of nodes in the execution plan are determined based on script information obtained from parsing the first Flink SQL task; the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size; the script type includes at least one of stream processing script and batch processing script; providing a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes editable parallelism and node parallelism reference values; and sending a second Flink SQL task to the client, wherein the second Flink SQL task includes a first degree of parallelism, the written SQL script, and the written configuration parameters; the first degree of parallelism is obtained by modifying the editable degree of parallelism of the first data flow graph.

[0005] In the first aspect of this application, the electronic device receives an execution plan and parallelism reference values generated by parsing the Flink SQL task that it submitted. The target execution plan, which includes the execution plan and the parallelism reference values, is exchanged with the electronic device as a JSON file. The SQL execution flow is displayed as a visual data flow graph, so that users can manually modify the parallelism of operators in the graph and resubmit the task based on their changes. The parallelism reference values support automatic optimization, and the visual data flow graph allows flexible manual setting of operator-level parallelism in the Flink SQL task, thus improving the processing efficiency of the submitted task.

[0006] According to some embodiments of this application, the method further includes: providing a second data flow graph for operation, wherein the second data flow graph includes an editable degree of parallelism; the second data flow graph is obtained after operating the editable degree of parallelism of the first data flow graph; generating prompt information based on the running status of the second Flink SQL task; wherein the running status of the second Flink SQL task is obtained from the server running the second Flink SQL task; sending a third Flink SQL task to the client; wherein the third Flink SQL task includes a second degree of parallelism, the written SQL script, and the written configuration parameters; the second degree of parallelism is obtained by modifying the editable degree of parallelism of the second data flow graph. By generating prompt information based on the running status of the task processed by the server, more flexible manual setting of the operator-level parallelism of Flink SQL tasks is achieved, further improving task processing efficiency.

[0007] According to some embodiments of this application, the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement. The Explain statement performs a pre-parsing operation before the existing SQL-submit execution, so as to generate the target execution plan and achieve a visualized data flow graph.

[0008] According to some embodiments of this application, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the nodes in the execution plan is determined based on at least one of the partition size of the source end and the operator data flow. The parallelism reference value of the stream processing script can be determined by at least one of the partition size of the source end and the operator data flow.

[0009] According to some embodiments of this application, if the script type of the first Flink SQL task includes a batch script, the parallelism reference value of the nodes in the execution plan is determined based on at least one of the file size at the source end and the operator data flow. The parallelism reference value of the batch script can be determined by at least one of the file size at the source end and the operator data flow.

[0010] According to some embodiments of this application, the first degree of parallelism is concatenated to the written SQL script using the -op parameter to obtain the second Flink SQL task. The -op parameter can concatenate the first degree of parallelism and the written SQL script, thereby specifying the degree of parallelism for the operator.
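As a minimal sketch of this concatenation step: the source names the -op parameter but does not define the format of its value, so the "nodeId:parallelism" encoding, the function name `build_second_task`, and the placeholder script and property below are all assumptions made for illustration.

```python
def build_second_task(sql_script, config_params, operator_parallelism):
    """Combine the written script, the configuration parameters, and an -op argument."""
    # Hypothetical value format "nodeId:parallelism,..."; the source does not define it.
    op_value = ",".join(
        f"{node_id}:{parallelism}"
        for node_id, parallelism in sorted(operator_parallelism.items())
    )
    return {
        "sql_script": sql_script,
        "arguments": list(config_params) + ["-op", op_value],
    }


# Placeholder script and property; parallelism values edited by the user per node id.
task = build_second_task(
    "INSERT INTO sink_table SELECT * FROM source_table",
    ["-yD", "key=value"],
    {1: 15, 2: 5, 3: 5},
)
```

The original configuration parameters are kept unchanged and the operator-level values are appended last, so the script and its existing parameters travel together with the user's edits.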

[0011] In a second aspect, an embodiment of this application also provides a method for setting operator-level parallelism, applied on a client. The method includes: receiving a first Flink SQL task sent by an electronic device, wherein the first Flink SQL task includes a written SQL script; generating a target execution plan by setting a parallelism reference value for nodes in the execution plan based on script information obtained by parsing the first Flink SQL task; the target execution plan includes an execution plan and a parallelism reference value; the target execution plan is in JSON format; the parallelism reference value is a parallelism reference value for nodes in the execution plan; the execution plan is generated by parsing the first Flink SQL task; the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size; the script type includes at least one of stream processing script and batch processing script; sending the target execution plan to the electronic device; receiving a second Flink SQL task sent by the electronic device, wherein the second Flink SQL task includes a first degree of parallelism, the written SQL script, and the written configuration parameters; the first degree of parallelism is obtained by modifying the editable degree of parallelism of the first data flow graph; the first data flow graph is obtained by parsing the target execution plan in JSON format; the first data flow graph includes an editable degree of parallelism and reference values for the degree of parallelism of nodes; and regenerating the task according to the second Flink SQL task for submission to the server.

[0012] According to some embodiments of this application, the method further includes: receiving a third Flink SQL task sent by the electronic device; wherein the third Flink SQL task includes a second parallelism, the written SQL script, and configuration parameters; the second parallelism is obtained by modifying the editable parallelism of the second data flow graph according to the prompt information of the electronic device; the prompt information is generated according to the running status of the second Flink SQL task; the running status of the second Flink SQL task is obtained from the server running the task generated by the second Flink SQL task; the second data flow graph is obtained after operating the editable parallelism of the first data flow graph; and a task is regenerated according to the third Flink SQL task for submission to the server.

[0013] According to some embodiments of this application, the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement.

[0014] According to some embodiments of this application, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the nodes in the execution plan is set according to at least one of the partition size of the source end and the operator data traffic.

[0015] According to some embodiments of this application, if the script type of the first Flink SQL task includes a batch script, the parallelism reference value of the nodes in the execution plan is set according to at least one of the file size of the source end and the operator data flow.

[0016] According to some embodiments of this application, the first parallelism is concatenated into the written SQL script via the -op parameter to obtain the second Flink SQL task.

[0017] In a third aspect, an embodiment of this application also provides an electronic device, the electronic device comprising: a transceiver unit, configured to send a first Flink SQL task to a client, wherein the first Flink SQL task includes a written SQL script; the transceiver unit is further configured to receive a target execution plan sent by the client, including an execution plan and a parallelism reference value, the target execution plan being in JSON format; the parallelism reference value being a parallelism reference value of a node in the execution plan; the execution plan being generated by parsing the first Flink SQL task; the parallelism reference value of a node in the execution plan being determined based on script information obtained from parsing the first Flink SQL task; the script information including at least two of script type, source partition size, operator data flow, and source file size; the script type including at least one of stream processing script and batch processing script; a processing unit, configured to provide a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes an editable parallelism and a node parallelism reference value; the transceiver unit is further configured to send a second Flink SQL task to the client, wherein the second Flink SQL task includes a first degree of parallelism, the written SQL script, and the written configuration parameters; the first degree of parallelism is obtained by modifying the editable degree of parallelism of the first data flow graph.

[0018] In a fourth aspect, an embodiment of this application also provides a client, the client comprising: a transceiver unit, configured to receive a first Flink SQL task sent by an electronic device, wherein the first Flink SQL task includes a written SQL script; a processing unit, configured to generate a target execution plan by setting a parallelism reference value for nodes in the execution plan based on script information obtained by parsing the first Flink SQL task; the target execution plan includes an execution plan and a parallelism reference value; the target execution plan is in JSON format; the parallelism reference value is a parallelism reference value for nodes in the execution plan; the execution plan is generated by parsing the first Flink SQL task; the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size; the script type includes at least one of stream processing script and batch processing script; the transceiver unit is further configured to send the target execution plan to the electronic device; the transceiver unit is further configured to receive a second Flink SQL task sent by the electronic device, wherein the second Flink SQL task includes a first degree of parallelism, the written SQL script, and the written configuration parameters; the first degree of parallelism is obtained by modifying the editable degree of parallelism of the first data flow graph; the first data flow graph is obtained by parsing the target execution plan in JSON format; the first data flow graph includes an editable degree of parallelism and reference values for the degree of parallelism of nodes; the processing unit is also used to regenerate the task according to the second Flink SQL task for submission to the server.

[0019] In a fifth aspect, an embodiment of this application also provides an electronic device, the electronic device including at least one processor, a memory, and a communication interface; the at least one processor is coupled to the memory and the communication interface; the memory is used to store instructions, the processor is used to execute the instructions, and the communication interface is used to communicate with a client under the control of the at least one processor; when the instructions are executed by the at least one processor, the at least one processor performs the operator-level parallelism setting method as described in any possible implementation of the first aspect above.

[0020] In a sixth aspect, an embodiment of this application also provides a client, the client including at least one processor, a memory, and a communication interface; the at least one processor is coupled to the memory and the communication interface; the memory is used to store instructions, the processor is used to execute the instructions, and the communication interface is used to communicate with an electronic device and a server under the control of the at least one processor; when the instructions are executed by the at least one processor, the at least one processor performs the operator-level parallelism setting method as described in any possible implementation of the second aspect above.

[0021] In a sixth aspect, an embodiment of this application also provides an operator-level parallelism setting system, the operator-level parallelism setting system including an electronic device, a client, and a server; the electronic device is used to execute the operator-level parallelism setting method as described in any possible implementation of the first aspect above, and the client is used to execute the operator-level parallelism setting method as described in any possible implementation of the second aspect above.

[0022] In a seventh aspect, one embodiment of this application also provides a computer-readable storage medium storing a program that causes a computer device to execute the operator-level parallelism setting method as described in any possible implementation of the first or second aspect above.

[0023] Eighthly, an embodiment of this application also provides a computer program product including computer execution instructions stored in a computer-readable storage medium; at least one processor of the device can read the computer execution instructions from the computer-readable storage medium, and the at least one processor executes the computer execution instructions to cause the device to perform the operator-level parallelism setting method as described in any possible implementation of the first or second aspect above.

[0024] For a detailed description of aspects two through eight and their various implementations in this application, please refer to the detailed description in aspect one and its various implementations; and for a detailed description of the beneficial effects of aspects two through eight and their various implementations, please refer to the beneficial effect analysis in aspect one and its various implementations, which will not be repeated here.

Attached Figure Description

[0025] Figure 1 is a schematic diagram of the application environment of an embodiment of this application.

[0026] Figure 2 is a flowchart illustrating a method for setting operator-level parallelism according to an embodiment of this application.

[0027] Figure 3 is a schematic diagram illustrating the setting of operator-level parallelism according to an embodiment of this application.

[0028] Figure 4 is a schematic diagram illustrating the communication between and within a data platform, client, and server according to an embodiment of this application.

[0029] Figure 5 is a schematic diagram of two data flow graphs generated by the data platform of this application.

[0030] Figure 6 is a schematic diagram of the structure of an electronic device according to this application.

[0031] Figure 7 is a schematic diagram of the structure of a client according to this application.

[0032] Figure 8 is a schematic diagram of the hardware structure of an electronic device according to this application.

[0033] Figure 9 is a schematic diagram of the hardware structure of a client according to this application.

Detailed Implementation

[0034] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the stated features. In the description of embodiments of this application, words such as "for example" are used to indicate examples, illustrations, or descriptions. Any embodiment or design scheme described as "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of words such as "for example" is intended to present the relevant concepts in a concrete manner.

[0035] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in this application's specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It should be understood that, unless otherwise stated, "a plurality of" in this application means two or more.

[0036] In existing technologies, Flink supports setting parallelism at different levels: operator level, execution environment level, client level, and system level. Different levels have different priorities, in the order operator level > execution environment level > client level > system level, and a setting at a higher-priority level overrides a setting at a lower-priority level. If no other level of parallelism is set, Flink uses the default parallelism in the configuration file. At the system level, the parallelism in the configuration file can be set to specify the default parallelism of tasks. When submitting a task to the Flink client, the parallelism can be specified with the -p parameter, which sets client-level parallelism. Operator-level and execution environment-level parallelism can only be set in code and cannot be flexibly configured. In practical use, parallelism settings at the system level and client level often fail to meet business requirements or guarantee efficient data processing. In practice, different nodes such as the source, the sink, and intermediate operators may each need a different degree of parallelism to keep up with data read and write loads. The current Flink SQL does not let users set operator-level parallelism, which prevents data from being processed efficiently.
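The priority rule described above can be illustrated with a short sketch. This is not Flink code; the function name `effective_parallelism` and the dictionary keys are hypothetical, and the system-level value stands in for the default in the configuration file.

```python
# Levels ordered from highest to lowest priority, as described in the text.
PRIORITY = ("operator", "execution_environment", "client")


def effective_parallelism(settings, system_default=1):
    """Return the value from the highest-priority level that has been set.

    `settings` maps a level name to its parallelism (or omits the level);
    `system_default` stands in for the default in the configuration file.
    """
    for level in PRIORITY:
        value = settings.get(level)
        if value is not None:
            return value
    return system_default


# Client-level -p 4 is used when nothing with higher priority is set.
client_only = effective_parallelism({"client": 4}, system_default=2)
# An operator-level setting overrides the client-level one.
with_operator = effective_parallelism({"operator": 8, "client": 4}, system_default=2)
```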

[0037] Figure 1 is a schematic diagram illustrating the application environment of an embodiment of this application. As shown in Figure 1, the operator-level parallelism setting system 1 includes a data platform 10, a client 20, and a server 30. The data platform 10 is communicatively connected to the client 20 and the server 30. The client 20 is communicatively connected to the server 30. In this embodiment, the client 20 is a Flink client. The server 30 is a server cluster. A server cluster is a cluster that brings together multiple servers (e.g., thousands of servers) to perform the same service. Each server in the server cluster can be regarded as a node in the cluster, and the nodes are communicatively connected. The server cluster is a YARN cluster. A YARN cluster is a general resource management system that can provide unified resource management and scheduling capabilities for Flink SQL tasks. The data platform 10 is used to send a first Flink SQL task to the client 20, receive a target execution plan including an execution plan and parallelism reference values sent by the client 20, provide a first data flow graph for operation by parsing the target execution plan in JSON format, and send a second Flink SQL task to the client 20. The client 20 is used to receive a first Flink SQL task sent by an electronic device; generate a target execution plan by setting the parallelism reference value of the nodes in the execution plan according to the script information obtained by parsing the first Flink SQL task; send the target execution plan to the electronic device; receive a second Flink SQL task sent by the electronic device; and regenerate the task according to the second Flink SQL task for submission to the server 30. The server 30 is used to process the task.

[0038] Figure 2 is a flowchart illustrating a method for setting operator-level parallelism according to a first embodiment of this application. The method sets the parallelism of a node based on the node's parallelism reference value. The method includes:

[0039] S201: The data platform provides a human-computer interaction page for operation.

[0040] The data platform can generate interactive pages and control their display, providing a user-friendly interface. The data platform includes a front-end and a back-end; the back-end can be the server side of the data platform. The front-end and back-end can exchange data. The front-end generates interactive pages, such as web pages, and can also control the display of visual pages, meaning that the pages are shown on a device (such as a computer). Users can then interact with these pages.

[0041] S202: The data platform obtains the SQL script and configuration parameters written on the page, and generates the first Flink SQL task based on the written SQL script.

[0042] The configuration parameters can be, for example, the -yt parameter and the -yD parameter. The -yt parameter is used to transfer files in a specified directory. The -yD parameter is used to assign a value to a given property. To submit tasks to the server, users can write SQL scripts and configure parameters on the pages provided by the data platform, as shown in Figure 3. In Figure 3, the user performs step 1 on the front-end page: writing an SQL script and configuring parameters. Based on the user's actions, the data platform's front-end retrieves the written SQL script and configuration parameters and sends them to the data platform's back-end. It can be understood that the front-end can do so by sending a request that includes the written SQL script and configuration parameters, or it can send them to the back-end directly. For example, in Figure 3, the front-end of the data platform performs step 2: submitting a request, including the written SQL script and configuration parameters, to the back-end of the data platform.

[0043] In this embodiment, the data platform's back-end stores the written SQL script and configuration parameters. The back-end also generates a first Flink SQL task based on the written SQL script. For example, in Figure 3, the back-end of the data platform performs step 3: generating the first Flink SQL task. In this embodiment, the first Flink SQL task is an explain statement, such as explain flinksql_test.

[0044] S203: The data platform sends the first Flink SQL task to the client.

[0045] In this embodiment, the data platform's back-end also sends the explain statement to the client, thereby transmitting the first Flink SQL task to the client. For example, step 3 in Figure 3 also includes sending the first Flink SQL task, and in Figure 4 the data platform sends an Explain statement to the Flink client.

[0046] S204: The client generates an execution plan in JSON format by parsing the first Flink SQL task.

[0047] In this embodiment, the client is a Flink client. The client uses the `Explain` statement to parse the first Flink SQL task, as shown in Figure 4. In Figure 4, the Flink client uses `Explain` to parse the SQL for the first time and generates an execution plan after parsing it. The execution plan describes the execution flow of the first Flink SQL task. The execution plan consists of multiple nodes. Each node includes information such as the operator node number, the specific function script of the node operator, and the parallelism of the node. The node also includes node link information, which is used to identify its preceding node. It is understood that node link information can also be used to identify its following node, or to identify both its preceding and following nodes; this application does not impose any limitations on this.

[0048] The following two examples illustrate the execution plan of the first Flink SQL task. In the first example, the first Flink SQL task is a filtering script, and the generated execution plan includes three nodes: the source, an operator, and the sink. The information of the three nodes is as follows: operator node number (id) 1, node operator specific function script Source: HiveTableSource(xxx), parallelism 1; operator node number (id) 2, node operator specific function script Calc(select=[yyy]), parallelism 1; operator node number (id) 3, node operator specific function script Sink: Select table sink(zzz), parallelism 1. The code shown in Figure 4 covers only the first example and contains the information of the node with operator node number 1. This information could be, for example:

[0049]

[0050]

[0051] In the second example, the first Flink SQL task is an insert script. The generated execution plan includes two nodes, whose information is as follows: operator node number (id) 1, node operator specific function script Source: HiveTableSource(mmm), parallelism 3; operator node number (id) 2, node operator specific function script Sink: Select table sink(nnn), parallelism 3.
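To make the node information concrete, the three nodes of the first example could be serialized as JSON roughly as follows. This is only an illustrative sketch: the field names `id`, `contents`, `parallelism`, and `predecessors` are assumptions and are not taken from the patent's figures, and the source → operator → sink links are inferred from the description of the example.

```python
import json

# Hypothetical JSON layout for the first example (a filtering script).
plan = {
    "nodes": [
        {"id": 1, "contents": "Source: HiveTableSource(xxx)", "parallelism": 1},
        {"id": 2, "contents": "Calc(select=[yyy])", "parallelism": 1,
         "predecessors": [1]},  # node link information: preceding node
        {"id": 3, "contents": "Sink: Select table sink(zzz)", "parallelism": 1,
         "predecessors": [2]},
    ]
}

# Serialize for transfer, then parse again as a receiver would.
plan_json = json.dumps(plan, indent=2)
parsed = json.loads(plan_json)
node_ids = [node["id"] for node in parsed["nodes"]]
```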

[0052] S205: The client generates the target execution plan by setting the parallelism reference value of the nodes in the execution plan based on the script information obtained from parsing the first Flink SQL task. The script information includes at least two of the following: script type, source partition size, operator data flow, and source file size. The script type includes at least one of stream processing script and batch processing script. The target execution plan is in JSON format.

[0053] The script information is obtained by parsing the first Flink SQL task. The client sets the parallelism reference value for nodes in the execution plan based on this script information. If the script type of the first Flink SQL task includes a stream processing script, the client sets the parallelism reference value based on at least one of the partition size of the source and the operator data flow. Continuing with the first example above, if the first Flink SQL task is a stream processing script and the source has 15 partitions, the client sets the parallelism reference value of the source node in the execution plan to 15. If the operator's function is to filter the upstream data, and the filtered data flow is approximately 1/3 of the upstream data, the client sets the parallelism reference value of the operator node in the execution plan to 5.
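The arithmetic of the stream processing example can be sketched as below. The function name and the choice to round up (and to never go below 1) are assumptions; the source only gives the case where the numbers divide evenly.

```python
import math
from fractions import Fraction


def stream_reference_values(source_partitions, flow_ratio):
    """Derive reference values for the source node and a downstream operator.

    source_partitions: number of partitions at the source.
    flow_ratio: share of upstream data the operator passes on (exact Fraction).
    """
    source_ref = source_partitions
    # Scale by the data-flow ratio; rounding up is an assumption of this sketch.
    operator_ref = max(1, math.ceil(source_partitions * flow_ratio))
    return {"source": source_ref, "operator": operator_ref}


# The example from the text: 15 partitions, filter keeps about 1/3 of the data.
refs = stream_reference_values(15, Fraction(1, 3))
```

With these inputs the source reference value is 15 and the operator reference value is 5, matching the example in the text.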

[0054] The client's setting of the parallelism reference value for nodes in the execution plan based on the script information obtained from parsing the first Flink SQL task also includes: if the script type of the first Flink SQL task includes batch scripts, the client sets the parallelism reference value for nodes in the execution plan based on at least one of the source file size and operator data flow. For example, if the file size of the first Flink SQL task is 1024MB, and the data platform specifies that each block size is 128MB, the file of the first Flink SQL task will be divided into 8 subtasks, then the client sets the parallelism reference value for the source node in the execution plan to 8.
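The two worked examples above (15 source partitions with a filter passing about 1/3 of the data, and a 1024MB file split into 128MB blocks) can be sketched as follows. This is a minimal illustration, not the client's actual logic: the function names are hypothetical, and rounding up to a whole number of subtasks is an assumption that the text does not state.

```python
from fractions import Fraction


def stream_reference_parallelism(source_partitions: int,
                                 flow_ratio: Fraction = Fraction(1)) -> int:
    """Reference value for a node of a stream processing script.

    source_partitions: number of partitions at the source.
    flow_ratio: share of the source data that reaches this node
                (1 for the source node itself). Hypothetical parameter.
    """
    scaled = source_partitions * flow_ratio
    # Assumption: round up, and never go below a parallelism of 1.
    return max(1, -(-scaled.numerator // scaled.denominator))


def batch_reference_parallelism(file_size_mb: int, block_size_mb: int) -> int:
    """Reference value for the source node of a batch script:
    one subtask per block of the source file (rounded up, an assumption)."""
    return max(1, -(-file_size_mb // block_size_mb))


# Stream example: 15 partitions at the source, filter keeps about 1/3.
source_ref = stream_reference_parallelism(15)
filter_ref = stream_reference_parallelism(15, Fraction(1, 3))

# Batch example: 1024MB file, 128MB per block.
batch_ref = batch_reference_parallelism(1024, 128)
```

With the figures from the text, `source_ref` is 15, `filter_ref` is 5, and `batch_ref` is 8.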

[0055] Setting the parallelism reference value for nodes in the execution plan can be done by adding the parallelism reference value for newly added nodes to the execution plan. This application does not impose any restrictions on setting the parallelism reference value for nodes in the execution plan.

[0056] S206: The client sends the target execution plan to the data platform.

[0057] The client sends the target execution plan to the backend of the data platform. For example, in Figure 3, the client performs step 4: send the target execution plan in JSON format to the data platform. As another example, in Figure 4, the client returns an execution plan in JSON format, including parallelism reference values, to the data platform. This execution plan is the target execution plan.

[0058] S207: The data platform provides a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes editable parallelism and node parallelism reference values.

[0059] The first data flow graph graphically depicts the flow and processing of data within the system. The data platform can determine the operator execution order based on node link information and provide the corresponding first data flow graph.

[0060] In this embodiment, after receiving the target execution plan in JSON format, the backend of the data platform transmits it to the frontend of the data platform. The frontend generates the first data flow graph by parsing the JSON-formatted target execution plan and controls the display of the first data flow graph. Specifically, the frontend can generate the first data flow graph based on the node link information obtained by parsing the JSON-formatted target execution plan. For example, in Figure 3, the backend of the data platform performs step 5: send the target execution plan in JSON format to the frontend of the data platform, which displays the first data flow graph. Users can modify the first data flow graph.
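As a rough sketch of how a data flow graph could be derived from a JSON execution plan using node link information, the following orders the nodes so that each one follows its upstream nodes. The JSON layout is an assumption made for illustration: the field names `nodes`, `id`, `contents`, `parallelism`, and `predecessors` are hypothetical, as is `reference_parallelism` for the parallelism reference value. The patent does not specify the schema.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical target execution plan; nodes are deliberately out of order.
PLAN_JSON = """
{"nodes": [
  {"id": 3, "contents": "Sink: Select table sink(zzz)",
   "parallelism": 1, "reference_parallelism": 5,
   "predecessors": [{"id": 2}]},
  {"id": 1, "contents": "Source: HiveTableSource(xxx)",
   "parallelism": 1, "reference_parallelism": 15},
  {"id": 2, "contents": "Calc(select=[yyy])",
   "parallelism": 1, "reference_parallelism": 5,
   "predecessors": [{"id": 1}]}
]}
"""


def build_data_flow_graph(plan_json: str) -> list[dict]:
    """Return display nodes in operator execution order."""
    nodes = {n["id"]: n for n in json.loads(plan_json)["nodes"]}
    # Node link information: node id -> ids of its upstream nodes.
    links = {
        node_id: [p["id"] for p in node.get("predecessors", [])]
        for node_id, node in nodes.items()
    }
    order = TopologicalSorter(links).static_order()
    return [
        {
            "id": node_id,
            "label": nodes[node_id]["contents"],
            # Value the user may edit in the graph.
            "editable_parallelism": nodes[node_id]["parallelism"],
            # Value shown next to it as guidance.
            "reference_parallelism": nodes[node_id]["reference_parallelism"],
        }
        for node_id in order
    ]


graph = build_data_flow_graph(PLAN_JSON)
```

Here `graph` lists the source node first, then the operation operator node, then the receiver node, each carrying both an editable parallelism and a reference value.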

[0061] S208: The data platform obtains the first parallelism, and concatenates the first parallelism, the written SQL script, and the written configuration parameters to generate a second Flink SQL task; the first parallelism is obtained by modifying the editable parallelism of the first data flow graph.

[0062] To modify the parallelism of operators, users can adjust the parallelism of nodes in the first data flow graph based on the node parallelism reference values, as shown in Figure 3. In Figure 3, the user performs step 6 on the first data flow graph: modify the parallelism. The first parallelism may be equal to, greater than, or less than the reference value. Based on the user's modification, the frontend of the data platform obtains the first parallelism and transmits it to the backend of the data platform. It can be understood that the frontend may do so by sending the backend a request that includes the first parallelism, or by transmitting the first parallelism to the backend directly. For example, in Figure 3, the frontend of the data platform performs step 7: submit a request, including the first parallelism, to the backend of the data platform.

[0063] The backend of the data platform can concatenate the written SQL script and the written configuration parameters. The backend can use the `-op` parameter to concatenate the first parallelism with the written SQL script: the first parallelism is specified for the written SQL script via the `-op` parameter, so that a second Flink SQL task is generated by concatenating the first parallelism, the written SQL script, and the written configuration parameters. The second Flink SQL task is a submit statement. Continuing with the example of setting the parallelism reference value in the stream processing script, if the user, based on the parallelism reference values, sets the parallelism of operator node 1 to 15, operator node 2 to 5, and operator node 3 to 5, then the second Flink SQL task is: `sql-submit.sh -op '1:15,2:5,3:5' -d -f 'xxx.sql' -ynm 'flinksql_tes' -yt '/xxx/keystore' -yt '/xxx/config/' -yD 'yarn.tags=xxx'`. Here, 1:15 means that the parallelism of operator node 1 is 15, 2:5 means that the parallelism of operator node 2 is 5, and 3:5 means that the parallelism of operator node 3 is 5.
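The concatenation step can be sketched as below. Only the `sql-submit.sh` name, the `-op` parameter, and the `node:parallelism` pairs separated by commas come from the text. The function names are hypothetical, and passing the `-op` value as a separate argument after the flag is an assumption about the script's command-line syntax.

```python
import shlex


def format_op_value(parallelism_by_node: dict[int, int]) -> str:
    """Encode {node id: parallelism} as 'id:parallelism' pairs joined by commas."""
    return ",".join(
        f"{node_id}:{parallelism}"
        for node_id, parallelism in sorted(parallelism_by_node.items())
    )


def build_submit_statement(parallelism_by_node: dict[int, int],
                           sql_file: str,
                           config_args: list[str]) -> str:
    """Concatenate the first parallelism, the SQL script, and the
    configuration parameters into one submit statement."""
    args = [
        "sql-submit.sh",
        "-op", format_op_value(parallelism_by_node),  # operator-level parallelism
        "-f", sql_file,                               # written SQL script
        *config_args,                                 # written configuration parameters
    ]
    # shlex.join quotes any argument that needs shell quoting.
    return shlex.join(args)


statement = build_submit_statement(
    {1: 15, 2: 5, 3: 5},
    "xxx.sql",
    ["-ynm", "flinksql_tes", "-yD", "yarn.tags=xxx"],
)
```

Building the statement from an argument list rather than by string concatenation keeps quoting consistent when a configuration value contains spaces.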

[0064] S209: The data platform sends a second Flink SQL task to the client.

[0065] For example, in Figure 3, the backend of the data platform performs step 8: send the second Flink SQL task, including the node parallelism specified by the -op parameter, to the client. The data platform thus transmits the second Flink SQL task to the client. As another example, in Figure 4, the data platform sends a submit statement to the Flink client.

[0066] S210: The client parses the second Flink SQL task to obtain the first parallelism and execution plan, and regenerates the task by configuring the first parallelism into the node of the execution plan.

[0067] In this embodiment, the client executes the submit statement to parse the second Flink SQL task and obtain the first parallelism and the execution plan, as shown in Figure 4. In Figure 4, the Flink client executes the submit statement to parse the SQL a second time and then generates an execution plan. This execution plan is configured with the first parallelism.
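On the client side, configuring the first parallelism into the nodes of the execution plan could look like the sketch below. It assumes that the `-op` value has already been extracted from the submit statement and that the execution plan is available as a list of node dictionaries with hypothetical `id` and `parallelism` fields. Neither detail is specified in the text.

```python
def parse_op_value(op_value: str) -> dict[int, int]:
    """Decode 'id:parallelism' pairs joined by commas into {node id: parallelism}."""
    result: dict[int, int] = {}
    for item in op_value.split(","):
        node_id, parallelism = item.split(":")
        if int(parallelism) < 1:
            raise ValueError(f"parallelism must be positive: {item!r}")
        result[int(node_id)] = int(parallelism)
    return result


def apply_parallelism(plan_nodes: list[dict], op_value: str) -> list[dict]:
    """Return a copy of the plan nodes with the requested parallelism set.

    Nodes that the -op value does not mention keep their current parallelism.
    """
    overrides = parse_op_value(op_value)
    return [
        {**node, "parallelism": overrides.get(node["id"], node["parallelism"])}
        for node in plan_nodes
    ]


plan_nodes = [
    {"id": 1, "contents": "Source: HiveTableSource(xxx)", "parallelism": 1},
    {"id": 2, "contents": "Calc(select=[yyy])", "parallelism": 1},
    {"id": 3, "contents": "Sink: Select table sink(zzz)", "parallelism": 1},
]
configured = apply_parallelism(plan_nodes, "1:15,2:5,3:5")
```

The configured nodes carry parallelism 15, 5, and 5, while `plan_nodes` itself is left unchanged.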

[0068] S211: The client submits a task to the server.

[0069] For example, in Figure 3, the client performs step 9: submit the task to the server. As another example, in Figure 4, the Flink client publishes the execution plan configured with the first parallelism to the server. The server is a Yarn server.

[0070] S212: The server processes the task.

[0071] In this embodiment, when processing a task, the server can obtain the task's running status and transmit it to the data platform. The data platform can then correspondingly notify the user of the task's running status.

[0072] The operator-level parallelism setting method shown in Figure 2 can be applied not only to scenarios where node parallelism is set based on a node's parallelism reference value, but also to scenarios where node parallelism is set based on both the node's parallelism reference value and the running status of the tasks processed by the server. In the latter scenario, the method is similar to the one described above for Figure 2; it differs from the scenario where node parallelism is set based only on the reference value in the following respects:

[0073] The target execution plan includes a first execution plan and a second execution plan. The first execution plan is generated by parsing the first Flink SQL task, and the second execution plan is generated by changing the parallelism values of the nodes in the first execution plan to the parallelism reference values of those nodes. The client transmits the target execution plan to the data platform. The data platform provides two data flow graphs for operation by parsing the first execution plan in JSON format and the second execution plan in JSON format. The first data flow graph is the original parallelism data flow graph, and the second data flow graph is the parallelism reference value data flow graph. The data flow graphs include editable parallelism. In this embodiment, the first data flow graph includes editable parallelism. It is understood that this application is not limited to the first data flow graph including editable parallelism; the second data flow graph may also be set to include editable parallelism. This application does not impose any limitations on this. The parallelism of the first data flow graph differs from that of the second data flow graph, either completely or partially.

[0074] Continuing with the first example above, both data flow graphs include three nodes: a source node, an operation operator node, and a receiver node. The source node precedes the operation operator node, and the operation operator node precedes the receiver node, as shown in Figure 5. In Figure 5, the upper data flow graph shows the parallelism reference values, while the lower data flow graph shows the original parallelism values. In the upper data flow graph, the source node is numbered Data Source (id=1), its operator function script is Source:HiveTableSource(xxx), and its parallelism is Parallelism: 15; the operation operator node is numbered Operator (id=2), its operator function script is Calc(select=[yyy]), and its parallelism is Parallelism: 5; the receiver node is numbered Data Sink (id=3), its operator function script is Sink:Select table sink(zzz), and its parallelism is Parallelism: 5. In the lower data flow graph, the source node is numbered Data Source (id=1), its operator function script is Source:HiveTableSource(xxx), and its parallelism is Parallelism: 1; the operation operator node is numbered Operator (id=2), its operator function script is Calc(select=[yyy]), and its parallelism is Parallelism: 1; the receiver node is numbered Data Sink (id=3), its operator function script is Sink:Select table sink(zzz), and its parallelism is Parallelism: 1. It is understood that this application does not restrict the position of the two data flow graphs or the content and style of each data flow graph.

[0075] After transmitting the task, including the first parallelism and the execution plan, to the client, the data platform also saves the first parallelism and the execution plan, and provides a second data flow graph based on them. The second data flow graph is a modified version of the first data flow graph and includes editable parallelism. Therefore, after transmitting the second Flink SQL task to the client, the data platform saves the current first parallelism and execution plan, ensuring that each subsequent modification of the parallelism is made on a display of the already modified data flow graph.

[0076] The data platform also acquires the running status of the tasks processed by the server and generates prompts based on that status. The running status includes the processing efficiency and throughput of the operators of the tasks processed by the server. The prompts can provide information about the operators and their running status, for example by presenting the user with a webpage showing the processing efficiency and throughput of each operator. Users can modify the parallelism of nodes in the second data flow graph based on the prompts. Continuing with the example of modifying the parallelism based on the reference value, the parallelism of the source node, operation operator node, and receiver node in the second data flow graph is 15, 5, and 5, respectively. After the task is submitted to the server, the data platform prompts the user that operator node 3 is processing slowly, so the user can increase the parallelism of operator node 3 in the second data flow graph. The data platform then obtains the second parallelism, concatenates the second parallelism, the written SQL script, and the written configuration parameters to generate a third Flink SQL task, and sends the third Flink SQL task to the client; the second parallelism is obtained by modifying the editable parallelism of the second data flow graph. This process is similar to the data platform obtaining the first parallelism, generating the second Flink SQL task, and sending it to the client, and will not be described in detail here. The client receives the third Flink SQL task, parses it to obtain the second parallelism and the execution plan, configures the second parallelism into the nodes of the execution plan, regenerates the task, and submits it to the server. This process is similar to the client's handling of the second Flink SQL task and will not be described in detail here.
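How prompts might be derived from the running status can be sketched as follows. The text only says that the running status includes the processing efficiency and throughput of the operators. The `OperatorStatus` fields, the ratio test, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperatorStatus:
    """Hypothetical per-operator running status reported by the server."""
    node_id: int
    parallelism: int
    input_rate: float      # records per second arriving at the operator
    processed_rate: float  # records per second the operator actually handles


def build_prompts(statuses: list[OperatorStatus],
                  min_ratio: float = 0.9) -> list[str]:
    """Flag operators that process noticeably less than they receive.

    min_ratio is an assumed threshold, not a value from the text.
    """
    prompts = []
    for status in statuses:
        if status.input_rate <= 0:
            continue  # nothing arriving, nothing to judge
        ratio = status.processed_rate / status.input_rate
        if ratio < min_ratio:
            prompts.append(
                f"Operator node {status.node_id} is processing slowly "
                f"({status.processed_rate:.0f} of {status.input_rate:.0f} records/s); "
                f"consider increasing its parallelism (currently {status.parallelism})."
            )
    return prompts


prompts = build_prompts([
    OperatorStatus(node_id=1, parallelism=15, input_rate=1000, processed_rate=1000),
    OperatorStatus(node_id=2, parallelism=5, input_rate=1000, processed_rate=990),
    OperatorStatus(node_id=3, parallelism=5, input_rate=330, processed_rate=120),
])
```

With these made-up figures, only operator node 3 is flagged, matching the situation in which the user is prompted to raise the parallelism of that node.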

[0077] It can be understood that the data platform can also save the second parallelism and the execution plan, and provide a third data flow graph based on them; the third data flow graph includes editable parallelism. The data platform can also continue to obtain the running status of the tasks processed by the server and prompt the user according to that status, until a preset condition is met, for example that the running status of the tasks is normal. The end of the process may also be triggered by other factors, which this application does not limit.

[0078] Please refer to Figure 6, which is a schematic diagram of the structure of an electronic device according to this application. The electronic device 600 can perform the operations executed by the data platform in the above-described method. The electronic device can be a mobile phone, desktop computer, laptop, handheld computer, cloud server, etc. The electronic device 600 may include a transceiver unit 601 and a processing unit 602.

[0079] The transceiver unit 601 is used to send a first Flink SQL task to the client; wherein the first Flink SQL task includes a written SQL script.

[0080] The transceiver unit 601 is further configured to receive a target execution plan sent by the client, including an execution plan and a parallelism reference value. The target execution plan is in JSON format. The parallelism reference value is the parallelism reference value of the nodes in the execution plan. The execution plan is generated by parsing the first Flink SQL task. The parallelism reference value of the nodes in the execution plan is determined based on the script information obtained from parsing the first Flink SQL task. The script information includes at least two of the following: script type, source partition size, operator data flow, and source file size. The script type includes at least one of stream processing script and batch processing script.

[0081] The processing unit 602 is configured to provide a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes editable parallelism and reference values for the parallelism of nodes.

[0082] The transceiver unit 601 is further configured to send a second Flink SQL task to the client; wherein the second Flink SQL task includes a first parallelism, the written SQL script, and written configuration parameters; the first parallelism is obtained by modifying the editable parallelism of the first data flow graph.

[0083] Optionally, the processing unit 602 is further configured to provide a second data flow graph for operation, wherein the second data flow graph includes editable parallelism; the second data flow graph is obtained by operating the editable parallelism of the first data flow graph.

[0084] The processing unit 602 is further configured to generate prompt information based on the running status of the second Flink SQL task; wherein the running status of the second Flink SQL task is obtained from the server running the second Flink SQL task.

[0085] The transceiver unit 601 is further configured to send a third Flink SQL task to the client; wherein the third Flink SQL task includes a second degree of parallelism, the written SQL script, and the written configuration parameters; the second degree of parallelism is obtained by modifying the editable degree of parallelism of the second data flow graph.

[0086] Optionally, the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement.

[0087] Optionally, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the node in the execution plan is determined based on at least one of the partition size of the source end and the operator data flow.

[0088] Optionally, if the script type of the first Flink SQL task includes a batch script, the parallelism reference value of the node in the execution plan is determined based on at least one of the file size of the source end and the operator data flow.

[0089] Optionally, the first parallelism is concatenated into the written SQL script via the -op parameter to obtain the second Flink SQL task.

[0090] Please refer to Figure 7, which is a schematic diagram of the structure of a client according to this application. The client 700 can perform the operations performed by the client in the above-described method. The client can be a mobile phone, desktop computer, laptop, or handheld computer, etc. The client 700 may include a transceiver unit 701 and a processing unit 702.

[0091] The transceiver unit 701 is used to receive a first Flink SQL task sent by an electronic device; wherein the first Flink SQL task includes a written SQL script.

[0092] The processing unit 702 is configured to generate a target execution plan by setting the parallelism reference values of the nodes in the execution plan based on the script information obtained from parsing the first Flink SQL task; the target execution plan includes the execution plan and the parallelism reference values; the target execution plan is in JSON format; the parallelism reference values are the parallelism reference values of the nodes in the execution plan; the execution plan is generated by parsing the first Flink SQL task; the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size; the script type includes at least one of stream processing script and batch processing script.

[0093] The transceiver unit 701 is also used to send the target execution plan to the electronic device.

[0094] The transceiver unit 701 is further configured to receive a second Flink SQL task sent by the electronic device; wherein the second Flink SQL task includes a first parallelism, the written SQL script, and the written configuration parameters; the first parallelism is obtained by modifying the editable parallelism of the first data flow graph; the first data flow graph is obtained by parsing the target execution plan in JSON format; the first data flow graph includes editable parallelism and reference values for the parallelism of nodes.

[0095] The processing unit 702 is further configured to regenerate the task based on the second Flink SQL task for submission to the server.

[0096] Optionally, the transceiver unit 701 is further configured to receive a third Flink SQL task sent by the electronic device; wherein the third Flink SQL task includes a second degree of parallelism, the written SQL script, and configuration parameters; the second degree of parallelism is obtained by modifying the editable degree of parallelism of the second data flow graph according to the prompt information of the electronic device; the prompt information is generated according to the running status of the second Flink SQL task; the running status of the second Flink SQL task is obtained from the server that runs the task generated by the second Flink SQL task; the second data flow graph is obtained after operating the editable degree of parallelism of the first data flow graph.

[0097] The processing unit 702 is further configured to regenerate the task based on the third Flink SQL task for submission to the server.

[0098] Optionally, the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement.

[0099] Optionally, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the nodes in the execution plan is set according to at least one of the partition size of the source end and the operator data traffic.

[0100] Optionally, if the script type of the first Flink SQL task includes a batch script, the parallelism reference value of the nodes in the execution plan is set according to at least one of the file size of the source end and the operator data flow.

[0101] Optionally, the first parallelism is concatenated into the written SQL script via the -op parameter to obtain the second Flink SQL task.

[0102] Please see Figure 8, which is a schematic diagram of the hardware structure of an electronic device according to this application. The electronic device 800 shown in Figure 8 includes a memory 801, a processor 802, a communication interface 803, and a bus 804. The memory 801, processor 802, and communication interface 803 are interconnected via the bus 804.

[0103] The memory 801 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 801 may store a program. When the program stored in the memory 801 is executed by the processor 802, the processor 802 and the communication interface 803 are used to perform the operations performed by the data platform in the method of this application embodiment.

[0104] The processor 802 may be a general-purpose central processing unit (CPU), microprocessor, application specific integrated circuit (ASIC), graphics processing unit (GPU), or one or more integrated circuits, used to execute related programs to achieve the functions required by the units in the electronic device of this application embodiment, or to execute the operations performed by the data platform in the method of this application embodiment.

[0105] The processor 802 can also be an integrated circuit chip with signal processing capabilities. During implementation, the various operations performed by the data platform of the method of this application can be completed through the integrated logic circuits in the hardware of the processor 802 or through software instructions. The aforementioned processor 802 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the various methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The operation of the method disclosed in the embodiments of this application can be directly manifested as execution by a hardware decoding processor, or execution by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in the memory 801. The processor 802 reads the information in the memory 801 and, in conjunction with its hardware, performs the functions required by the units included in the electronic device of this application embodiment, or performs the operations performed by the data platform in the method of this application embodiment.

[0106] The communication interface 803 uses transceiver devices, such as, but not limited to, transceivers, to enable communication between the electronic device 800 and other devices or communication networks. For example, data can be obtained from a client through the communication interface 803.

[0107] Bus 804 may include a pathway for transmitting information between various components of electronic device 800 (e.g., memory 801, processor 802, communication interface 803).

[0108] It can be understood that the structure shown in Figure 8 does not constitute a limitation on the electronic device 800, which may include more or fewer components than shown, or combine some components, or split some components, or have different component arrangements.

[0109] Please see Figure 9, which is a schematic diagram of the hardware structure of a client according to this application. The client 900 shown in Figure 9 includes a memory 901, a processor 902, a communication interface 903, and a bus 904. The memory 901, processor 902, and communication interface 903 are interconnected via the bus 904.

[0110] The memory 901 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 901 may store a program. When the program stored in the memory 901 is executed by the processor 902, the processor 902 and the communication interface 903 are used to perform the operations executed by the client in the method of this application embodiment.

[0111] The processor 902 may be a general-purpose central processing unit (CPU), microprocessor, application specific integrated circuit (ASIC), graphics processing unit (GPU), or one or more integrated circuits, used to execute related programs to implement the functions required by the unit in the client of this application embodiment, or to execute the operations performed by the client in the method of this application embodiment.

[0112] The processor 902 can also be an integrated circuit chip with signal processing capabilities. During implementation, the various operations performed by the client of the method in this application can be completed through the integrated logic circuits in the hardware of the processor 902 or through software instructions. The processor 902 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The operation of the method disclosed in the embodiments of this application can be directly manifested as execution by a hardware decoding processor, or execution by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in the memory 901. The processor 902 reads the information in the memory 901 and, in conjunction with its hardware, completes the functions required by the units included in the client of this application embodiment, or executes the operations performed by the client in the method of this application embodiment.

[0113] Communication interface 903 uses transceiver devices, such as, but not limited to, transceivers, to enable communication between client 900 and other devices or communication networks. For example, data can be obtained from the client through communication interface 903.

[0114] Bus 904 may include a pathway for transmitting information between various components of client 900 (e.g., memory 901, processor 902, communication interface 903).

[0115] It can be understood that the structure shown in Figure 9 does not constitute a limitation on the client 900, which may include more or fewer components than shown, or combine some components, or split some components, or have different component arrangements.

[0116] In addition to the methods and devices described above, embodiments of this application also provide a computer-readable storage medium storing a program that causes a computer device to execute the method for setting operator-level parallelism shown in Figure 2.

[0117] A computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of the device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the device to perform the method for setting operator-level parallelism shown in Figure 2.

[0118] Before the SQL is parsed a second time for execution, this application pre-executes the first parse of the SQL using the `Explain` statement, generating a JSON-formatted execution plan. It then sets parallelism reference values for each node in the execution plan to generate a target execution plan, which is displayed visually for users to reference when configuring node parallelism. The application also resubmits the task based on user actions. This visual presentation of the task flow gives users a clear and intuitive understanding of the processing logic and data flow. The use of default and reference parallelism values lowers the barrier to setting operator parallelism. Setting parallelism at the operator level improves data processing efficiency and resource utilization. Furthermore, this application sets node parallelism based on the server's task execution status, allowing for more flexible operator-level parallelism settings.

[0119] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and are not intended to limit it. Although this application has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of this application without departing from the spirit and scope of the technical solutions of this application.

Claims

1. A method for setting operator-level parallelism, applied to an electronic device, the method comprising: sending a first Flink SQL task to a client, wherein the first Flink SQL task includes a written SQL script; receiving a target execution plan sent by the client, wherein the target execution plan includes an execution plan and a parallelism reference value; the target execution plan is in JSON format; the parallelism reference value is the parallelism reference value of the nodes in the execution plan; the execution plan is generated by parsing the first Flink SQL task; the parallelism reference value of the nodes in the execution plan is determined based on the script information obtained from parsing the first Flink SQL task; the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size; the script type includes at least one of stream processing script and batch processing script; providing a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes editable parallelism and parallelism reference values for nodes; and sending a second Flink SQL task to the client, wherein the second Flink SQL task includes a first parallelism, the written SQL script, and the written configuration parameters; the first parallelism is obtained by modifying the editable parallelism of the first data flow graph.

2. The method for setting operator-level parallelism as described in claim 1, characterized in that the method further comprises: providing a second data flow graph for operation, wherein the second data flow graph includes editable parallelism and is obtained by operating on the editable parallelism of the first data flow graph; generating prompt information based on the running status of the second Flink SQL task, wherein the running status of the second Flink SQL task is obtained from the server running the second Flink SQL task; and sending a third Flink SQL task to the client, wherein the third Flink SQL task includes a second parallelism, the written SQL script, and the written configuration parameters, and the second parallelism is obtained by modifying the editable parallelism of the second data flow graph.

3. The method for setting operator-level parallelism as described in claim 1, characterized in that the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement.

4. The method for setting operator-level parallelism as described in claim 1, characterized in that, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the node in the execution plan is determined based on at least one of the source partition size and the operator data flow.

5. The method for setting operator-level parallelism as described in claim 1, characterized in that, if the script type of the first Flink SQL task includes a batch processing script, the parallelism reference value of the node in the execution plan is determined based on at least one of the source file size and the operator data flow.

6. The method for setting operator-level parallelism as described in claim 1, characterized in that the first parallelism is concatenated into the written SQL script via the -op parameter to obtain the second Flink SQL task.

7. A method for setting operator-level parallelism, applied to a client, the method comprising: receiving a first Flink SQL task sent by an electronic device, wherein the first Flink SQL task includes a written SQL script; generating a target execution plan by setting parallelism reference values for nodes in an execution plan based on script information obtained by parsing the first Flink SQL task, wherein the target execution plan includes the execution plan and the parallelism reference values, the target execution plan is in JSON format, the parallelism reference values are the parallelism reference values of the nodes in the execution plan, the execution plan is generated by parsing the first Flink SQL task, the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size, and the script type includes at least one of a stream processing script and a batch processing script; sending the target execution plan to the electronic device; receiving a second Flink SQL task sent by the electronic device, wherein the second Flink SQL task includes a first parallelism, the written SQL script, and written configuration parameters, the first parallelism is obtained by modifying the editable parallelism of a first data flow graph, the first data flow graph is obtained by parsing the target execution plan in JSON format, and the first data flow graph includes editable parallelism and parallelism reference values for nodes; and regenerating a task based on the second Flink SQL task for submission to a server.

8. The method for setting operator-level parallelism as described in claim 7, characterized in that the method further comprises: receiving a third Flink SQL task sent by the electronic device, wherein the third Flink SQL task includes a second parallelism, the written SQL script, and the written configuration parameters, the second parallelism is obtained by modifying the editable parallelism of a second data flow graph according to prompt information of the electronic device, the prompt information is generated according to the running status of the second Flink SQL task, the running status of the second Flink SQL task is obtained from the server that runs the task generated from the second Flink SQL task, and the second data flow graph is obtained by operating on the editable parallelism of the first data flow graph; and regenerating a task based on the third Flink SQL task for submission to the server.

9. The method for setting operator-level parallelism as described in claim 7, characterized in that the first Flink SQL task is an Explain statement, and the execution plan is generated by executing the Explain statement.

10. The method for setting operator-level parallelism as described in claim 7, characterized in that, if the script type of the first Flink SQL task includes a stream processing script, the parallelism reference value of the node in the execution plan is set according to at least one of the source partition size and the operator data flow.

11. The method for setting operator-level parallelism as described in claim 7, characterized in that, if the script type of the first Flink SQL task includes a batch processing script, the parallelism reference value of the node in the execution plan is set according to at least one of the source file size and the operator data flow.

12. The method for setting operator-level parallelism as described in claim 7, characterized in that the first parallelism is concatenated into the written SQL script via the -op parameter to obtain the second Flink SQL task.

13. An electronic device, characterized in that the electronic device comprises: a transceiver unit configured to send a first Flink SQL task to a client, wherein the first Flink SQL task includes a written SQL script; the transceiver unit being further configured to receive a target execution plan sent by the client, wherein the target execution plan includes an execution plan and a parallelism reference value, the target execution plan is in JSON format, the parallelism reference value is the parallelism reference value of a node in the execution plan, the execution plan is generated by parsing the first Flink SQL task, the parallelism reference value of the node in the execution plan is determined based on script information obtained by parsing the first Flink SQL task, the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size, and the script type includes at least one of a stream processing script and a batch processing script; a processing unit configured to provide a first data flow graph for operation by parsing the target execution plan in JSON format, wherein the first data flow graph includes editable parallelism and parallelism reference values for nodes; and the transceiver unit being further configured to send a second Flink SQL task to the client, wherein the second Flink SQL task includes a first parallelism, the written SQL script, and written configuration parameters, and the first parallelism is obtained by modifying the editable parallelism of the first data flow graph.

14. A client, characterized in that the client comprises: a transceiver unit configured to receive a first Flink SQL task sent by an electronic device, wherein the first Flink SQL task includes a written SQL script; a processing unit configured to generate a target execution plan by setting parallelism reference values for nodes in an execution plan based on script information obtained by parsing the first Flink SQL task, wherein the target execution plan includes the execution plan and the parallelism reference values, the target execution plan is in JSON format, the parallelism reference values are the parallelism reference values of the nodes in the execution plan, the execution plan is generated by parsing the first Flink SQL task, the script information includes at least two of the following: script type, source partition size, operator data flow, and source file size, and the script type includes at least one of a stream processing script and a batch processing script; the transceiver unit being further configured to send the target execution plan to the electronic device; the transceiver unit being further configured to receive a second Flink SQL task sent by the electronic device, wherein the second Flink SQL task includes a first parallelism, the written SQL script, and written configuration parameters, the first parallelism is obtained by modifying the editable parallelism of a first data flow graph, the first data flow graph is obtained by parsing the target execution plan in JSON format, and the first data flow graph includes editable parallelism and parallelism reference values for nodes; and the processing unit being further configured to regenerate a task based on the second Flink SQL task for submission to a server.

15. An electronic device, characterized in that the electronic device includes at least one processor, a memory, and a communication interface; the at least one processor is coupled to the memory and the communication interface; the memory is used to store instructions, the at least one processor is used to execute the instructions, and the communication interface is used to communicate with a client under the control of the at least one processor; and when the instructions are executed by the at least one processor, the at least one processor is caused to perform the method for setting operator-level parallelism as described in any one of claims 1 to 6.

16. A client, characterized in that the client includes at least one processor, a memory, and a communication interface; the at least one processor is coupled to the memory and the communication interface; the memory is used to store instructions, the at least one processor is used to execute the instructions, and the communication interface is used to communicate with an electronic device and a server under the control of the at least one processor; and when the instructions are executed by the at least one processor, the at least one processor is caused to perform the method for setting operator-level parallelism as described in any one of claims 7 to 12.

17. A system for setting operator-level parallelism, characterized in that the system includes an electronic device, a client, and a server; the electronic device is used to execute the method for setting operator-level parallelism as described in any one of claims 1 to 6, and the client is used to execute the method for setting operator-level parallelism as described in any one of claims 7 to 12.

18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that causes a computer device to execute the method for setting operator-level parallelism as described in any one of claims 1 to 12.

19. A computer program product, characterized in that the computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the device to perform the method for setting operator-level parallelism as described in any one of claims 1 to 12.
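
By way of non-limiting illustration of claims 6 and 12, concatenating the first parallelism into the written SQL script via the -op parameter could be sketched as follows. The claims name the -op parameter but do not fix its value format, so the `nodeId:parallelism` pairs joined by commas, the single-space separator, and the function names `op_argument` and `build_second_task` are assumptions made for this sketch only. The written configuration parameters are omitted for brevity.

```python
from typing import Dict


def op_argument(parallelism_by_node: Dict[int, int]) -> str:
    """Encode per-node parallelism as a hypothetical '-op id:p,id:p' argument."""
    for node_id, parallelism in parallelism_by_node.items():
        if parallelism < 1:
            raise ValueError(f"node {node_id}: parallelism must be at least 1")
    # Sort by node id so the same settings always yield the same string.
    pairs = ",".join(
        f"{node_id}:{parallelism}"
        for node_id, parallelism in sorted(parallelism_by_node.items())
    )
    return f"-op {pairs}"


def build_second_task(sql_script: str, parallelism_by_node: Dict[int, int]) -> str:
    """Concatenate the edited parallelism onto the written SQL script."""
    return f"{sql_script.strip()} {op_argument(parallelism_by_node)}"


# Hypothetical script and user-edited parallelism for nodes 1 and 2.
second_task = build_second_task(
    "INSERT INTO sink_table SELECT * FROM source_table;",
    {2: 3, 1: 8},
)
```

Under these assumptions, the receiving side would split the task at the -op parameter to recover the written SQL script and the per-node parallelism before regenerating the task for submission.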