Data processing method, computing device, storage medium, and computer program product

By breaking the object generation task into subtasks and using a large language model to generate descriptive information, the method addresses the inaccurate generation of document-type target objects and achieves efficient, accurate target object generation, especially for API calls in code development.

WO2026124186A1 · PCT designated stage · Publication Date: 2026-06-18 · ALIBABA (CHINA) CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: ALIBABA (CHINA) CO LTD
Filing Date: 2025-11-21
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

Existing techniques for generating document-type target objects suffer from inaccurate generation. The problem is especially visible in code development scenarios, where users must consult documentation in order to call unknown APIs and the generated target objects have low accuracy.

Method used

The object generation task is divided into multiple subtasks by parsing, the description information of the target object is determined by a large language model, and an initial object is generated based on the subtasks and description information. Finally, the target object is synthesized, and a chain-based exploration method is used to handle unknown API calls.

Benefits of technology

It improves the accuracy of target object generation, reduces the time users spend consulting documentation, and enhances the efficiency and accuracy of code generation.



Abstract

Embodiments of the present disclosure provide a data processing method, a computing device, a storage medium and a computer program product. The data processing method comprises: determining an object generation task, wherein the object generation task is used for requesting to generate a target object of a document type; parsing the object generation task to obtain a plurality of object generation subtasks corresponding to the object generation task; on the basis of task information of the object generation subtasks, determining object description information of the target object, wherein the object description information is description information used for describing the target object; and performing object generation on the basis of the object generation subtasks and the object description information to obtain initial objects corresponding to the object generation subtasks, and on the basis of the plurality of initial objects, determining the target object corresponding to the object generation task.

Description

Data processing methods, computing devices, storage media and computer program products

[0001] This disclosure claims priority to Chinese Patent Application No. 202411837253.4, filed with the China Patent Office on December 12, 2024, entitled “Data Processing Method, Computing Device, Storage Medium and Computer Program Product”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of computer technology, and in particular to a data processing method. One or more embodiments of this disclosure also relate to two other data processing methods, a data processing apparatus, two other data processing apparatuses, a computing device, an electronic device, a computer-readable storage medium, and a computer program product.

Background Technology

[0003] With the continuous development of computer technology, in the process of using computers for data processing, corresponding target objects can be generated based on object generation tasks; however, in the process of generating document-type target objects, there may be problems with the accuracy of the generated target objects. Therefore, how to improve the accuracy of target objects has become an urgent problem to be solved.

Summary of the Invention

[0004] In view of the above, this disclosure provides a data processing method. One or more embodiments of this disclosure also relate to two other data processing methods, a data processing apparatus, two other data processing apparatuses, a computing device, an electronic device, a computer-readable storage medium, and a computer program product, to address the technical deficiencies existing in the prior art.

[0005] According to a first aspect of the present disclosure, a data processing method is provided, comprising:

[0006] Determine an object generation task, wherein the object generation task is used to request the generation of a target object of document type;

[0007] Parse the object generation task to obtain multiple object generation subtasks corresponding to the object generation task;

[0008] Based on the task information of each object generation subtask, determine the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0009] Perform object generation based on the object generation subtasks and the object description information to obtain the initial object corresponding to each object generation subtask, and determine the target object corresponding to the object generation task based on the multiple initial objects.

[0010] According to a second aspect of the present disclosure, a data processing apparatus is provided, comprising:

[0011] The task determination module is configured to determine an object generation task, wherein the object generation task is used to request the generation of a target object of document type;

[0012] The task parsing module is configured to parse the object generation task and obtain multiple object generation subtasks corresponding to the object generation task;

[0013] The document determination module is configured to determine, based on the task information of each object generation subtask, the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0014] The object generation module is configured to generate objects based on the object generation sub-tasks and the object description information, obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects.

[0015] According to a third aspect of the present disclosure, a data processing method is provided, comprising:

[0016] Determine the code generation task, wherein the code generation task is used to request the generation of target code;

[0017] Parse the code generation task to obtain multiple code generation subtasks corresponding to the code generation task;

[0018] Based on the task information of each code generation subtask, determine the code description document of the target code, wherein the code description document is a description document used to describe the target code;

[0019] Generate code based on the code generation subtasks and the code description document to obtain the initial code corresponding to each code generation subtask, and determine the target code corresponding to the code generation task based on the multiple initial codes.

[0020] According to a fourth aspect of the present disclosure, a data processing apparatus is provided, comprising:

[0021] The task determination module is configured to determine the code generation task, wherein the code generation task is used to request the generation of target code;

[0022] The task parsing module is configured to parse the code generation task and obtain multiple code generation subtasks corresponding to the code generation task;

[0023] The document determination module is configured to determine, based on the task information of each code generation subtask, the code description document of the target code, wherein the code description document is a description document used to describe the target code;

[0024] The code generation module is configured to generate code based on the code generation subtasks and the code description document, obtain the initial code corresponding to each code generation subtask, and determine the target code corresponding to the code generation task based on multiple initial codes.

[0025] According to a fifth aspect of the present disclosure, a data processing method is provided, applied to a cloud-side device, comprising:

[0026] Receive an object generation task sent by an end-side device, wherein the object generation task is used to request the generation of a target object of document type;

[0027] Parse the object generation task to obtain multiple object generation subtasks corresponding to the object generation task;

[0028] Based on the task information of each object generation subtask, determine the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0029] Perform object generation based on the object generation subtasks and the object description information to obtain the initial object corresponding to each object generation subtask, and determine the target object corresponding to the object generation task based on the multiple initial objects;

[0030] The target object is sent to the end-side device.

[0031] According to a sixth aspect of the present disclosure, a data processing apparatus is provided, applied to a cloud-side device, comprising:

[0032] The task receiving module is configured to receive an object generation task sent by the end-side device, wherein the object generation task is used to request the generation of a target object of document type;

[0033] The task parsing module is configured to parse the object generation task and obtain multiple object generation subtasks corresponding to the object generation task;

[0034] The document determination module is configured to determine, based on the task information of each object generation subtask, the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0035] The object generation module is configured to generate objects based on the object generation sub-tasks and the object description information, obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects.

[0036] An object sending module is configured to send the target object to the end-side device.

[0037] According to a seventh aspect of the present disclosure, a computing device is provided, comprising:

[0038] Memory and processor;

[0039] The memory is used to store computer programs / instructions, and the processor is used to execute the computer programs / instructions, which, when executed by the processor, implement the steps of any of the above methods.

[0040] According to an eighth aspect of the present disclosure, an electronic device is provided, comprising:

[0041] A memory and a processor, the memory and the processor being connected via a bus;

[0042] The memory is used to store computer programs / instructions, and the processor is used to execute the computer programs / instructions, which, when executed by the processor, implement the steps of any of the above methods.

[0043] According to a ninth aspect of the present disclosure, a computer-readable storage medium is provided that stores a computer program / instructions that, when executed by a processor, implement the steps of any of the above methods.

[0044] According to a tenth aspect of the present disclosure, a computer program product is provided, including a computer program / instructions that, when executed by a processor, implement the steps of any of the above-described methods.

[0045] The data processing method provided in one or more embodiments of this disclosure can, when generating a target object, first parse the object generation task and accurately divide the target object generation process into multiple object generation sub-tasks, thereby accurately identifying the execution steps for generating the target object; second, determine the object description information used to describe the target object, which provides accurate reference data for generating it; and finally, based on each object generation sub-task and the object description information, generate multiple initial objects in batches and accurately determine, from those initial objects, the target object corresponding to the object generation task. This avoids the inaccuracy that may arise when generating document-type target objects and improves the accuracy of the target object.

Attached Figure Description

[0046] Figure 1 is a schematic diagram illustrating the application of a data processing method provided in an embodiment of this disclosure;

[0047] Figure 2 is a flowchart of a data processing method provided in an embodiment of this disclosure;

[0048] Figure 3 is a flowchart of a data processing method provided in an embodiment of this disclosure;

[0049] Figure 4 is a flowchart of a second data processing method provided in an embodiment of this disclosure;

[0050] Figure 5 is a flowchart of a third data processing method provided in an embodiment of this disclosure;

[0051] Figure 6 is a schematic diagram of the structure of a data processing apparatus provided in an embodiment of the present disclosure;

[0052] Figure 7 is a schematic diagram of the structure of a second data processing apparatus provided in an embodiment of the present disclosure;

[0053] Figure 8 is a schematic diagram of the structure of a third data processing apparatus provided in an embodiment of the present disclosure;

[0054] Figure 9 is a structural block diagram of a computing device provided in an embodiment of this disclosure;

[0055] Figure 10 is a structural block diagram of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0056] Numerous specific details are set forth in the following description to provide a full understanding of this disclosure. However, this disclosure can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this disclosure. Therefore, this disclosure is not limited to the specific implementations disclosed below.

[0057] The terminology used in one or more embodiments of this disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of this disclosure. The singular forms “a,” “said,” and “the” as used in one or more embodiments of this disclosure and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in one or more embodiments of this disclosure refers to and includes any or all possible combinations of one or more associated listed items.

[0058] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this disclosure, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this disclosure, and similarly, second may also be referred to as first. Depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination”.

[0059] Furthermore, it should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in one or more embodiments of this disclosure are all information and data authorized by the user or fully authorized by all parties. Moreover, the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to choose to authorize or refuse.

[0060] In one or more embodiments of this disclosure, a large model refers to a deep learning model with a large number of model parameters, typically containing hundreds of millions, tens of billions, hundreds of billions, trillions, or even tens of trillions of model parameters. A large model can also be called a foundation model. It is pre-trained using large-scale unlabeled corpora to produce a pre-trained model with hundreds of millions of parameters. Such models can adapt to a wide range of downstream tasks and have good generalization ability. Examples include Large Language Models (LLMs) and multi-modal pre-training models.

[0061] In practical applications, large models only require a small number of samples to fine-tune the pre-trained model before they can be applied to different tasks. Large models can be widely used in fields such as Natural Language Processing (NLP) and Computer Vision. Specifically, they can be applied to computer vision tasks such as Visual Question Answering (VQA), Image Captioning (IC), and Image Generation, as well as NLP tasks such as text-based sentiment classification, text summarization, and machine translation. The main application scenarios for large models include digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design.

[0062] First, the terms and concepts involved in one or more embodiments of this disclosure will be explained.

[0063] Intelligent Coding Assistant: Utilizing large-scale language models and artificial intelligence technology, it provides functions such as code completion, code generation, code error detection and correction, and code optimization to help developers reduce repetitive work and achieve significant improvements in programming efficiency and code quality.

[0064] Code generation: Utilizing large-scale language models and artificial intelligence technology, code is created based on user input prompts to implement user functions.

[0065] API (Application Programming Interface): An API is a set of predefined functions designed to provide applications and developers with the ability to access a set of routines based on certain software or hardware without needing to access the source code or understand the details of the internal workings.

[0066] PyTorch is an open-source deep learning framework for machine learning and deep learning.

[0067] TensorFlow is an end-to-end open-source machine learning platform.

[0068] Java is an object-oriented programming language.

[0069] Python: A computer programming language.

[0070] Vector recall is an efficient and flexible information retrieval technique. Its basic idea is to represent the query terms and each entry in the database as vectors, and then find the most matching entry by calculating the similarity between these vectors.
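The idea of vector recall can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses small hand-written vectors; the entry names and the `vector_recall` helper are hypothetical and are not taken from this disclosure, and a practical system would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def vector_recall(query_vec, entries, top_k=1):
    """Return the top_k (entry_id, score) pairs ranked by similarity.

    `entries` maps a hypothetical entry id to its vector representation.
    """
    scored = [
        (entry_id, cosine_similarity(query_vec, vec))
        for entry_id, vec in entries.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, hand-written vectors (assumption: real vectors come from an embedding model).
database = {
    "doc_dataset_api": [0.9, 0.1, 0.0],
    "doc_loss_api": [0.1, 0.9, 0.2],
    "doc_optimizer_api": [0.0, 0.2, 0.9],
}
best = vector_recall([0.8, 0.2, 0.1], database, top_k=1)
```

Here the query vector lies closest to the first entry, so that entry is recalled first.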

[0071] With the continuous development of computer technology, corresponding target objects can be generated from object generation tasks when computers are used for data processing. However, document-type target objects may be generated inaccurately, so improving their accuracy has become an urgent problem. Take code generation as an example. In code development, more and more development frameworks and general-purpose APIs are in wide use, such as the neural network development frameworks PyTorch and TensorFlow, which have made code development much more convenient. These API tools iterate as programming languages such as Java and Python release new versions, and their calling conventions change accordingly. API calling has therefore long been a widespread problem in code development, especially for unknown APIs (updated or newly released APIs): users must consult the documentation to determine the correct calling code, including the API's input parameter types and output result types, before they can call the API correctly.
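As an illustration of the kind of information a user looks up for an unknown API (input parameter types and output result type), the following sketch reads it from a function's signature using Python's standard `inspect` module. The function `make_dataset` is a hypothetical stand-in for a newly released API, and this introspection approach is only an assumption for illustration; it is not the method of this disclosure.

```python
import inspect


def make_dataset(directory: str, window_size: int = 8, stride: int = 1) -> list:
    """Hypothetical API standing in for a newly released library function."""
    return []


def describe_api(func):
    """Collect parameter names, annotated types, and the return type."""
    sig = inspect.signature(func)
    params = []
    for name, p in sig.parameters.items():
        params.append({
            "name": name,
            # None when the parameter carries no type annotation
            "type": None if p.annotation is inspect.Parameter.empty else p.annotation,
            # a parameter without a default value must be supplied by the caller
            "required": p.default is inspect.Parameter.empty,
        })
    returns = sig.return_annotation
    if returns is inspect.Signature.empty:
        returns = None
    return {"name": func.__name__, "params": params, "returns": returns}


info = describe_api(make_dataset)
```

The resulting dictionary lists each input parameter with its type and whether it is required, together with the declared output type.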

[0072] Based on this, a data processing method is provided in this disclosure. One or more embodiments of this disclosure also involve two other data processing methods, a data processing apparatus, two other data processing apparatuses, a computing device, a computer-readable storage medium, and a computer program product, which will be described in detail in the following embodiments.

[0073] Considering the large number of model parameters in the large model and the limited computing resources of mobile terminals, the data processing method provided in this disclosure can be applied to the application scenario shown in Figure 1, but is not limited thereto. In the application scenario shown in Figure 1, the large model is deployed on server 10. Server 10 can connect to one or more client devices 20 via a local area network (LAN), wide area network (WAN), Internet, or other types of data networks. These client devices 20 may include, but are not limited to, smartphones, tablets, laptops, PDAs, personal computers, smart home devices, and in-vehicle devices. Client devices 20 can interact with users through a graphical user interface to invoke the large model, thereby implementing the method provided in this disclosure.

[0074] In this embodiment, the system comprising the client device and the server can operate as follows. The client device 20 sends a code generation task to the server 10, wherein the code generation task requests the generation of target code. The server 10 then performs three steps. First, in a task parsing step, it parses the code generation task and breaks it down into multiple sub-tasks, thereby accurately determining the execution steps for generating the target code. Second, in a document determination step, it determines a corresponding code description document for each sub-task based on that sub-task's task information, wherein the code description document describes relevant information about the target code. Finally, in a code generation step, it generates the initial code for each sub-task based on the code description document and merges the multiple initial codes to obtain the accurate target code corresponding to the code generation task. After obtaining the target code, the server 10 sends it to the client device 20, which displays the target code in a graphical user interface.
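The three server-side steps can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the function names are hypothetical, the three steps are passed in as callables, and simple stub functions replace the large model and the document store so that the example runs on its own. Merging by joining with newlines is likewise an assumption.

```python
from typing import Callable, List


def generate_target_code(
    requirement: str,
    parse_task: Callable[[str], List[str]],
    find_document: Callable[[str], str],
    generate_code: Callable[[str, str], str],
) -> str:
    """Run task parsing, document determination, and code generation, then merge."""
    subtasks = parse_task(requirement)            # task parsing step
    initial_codes = []
    for subtask in subtasks:
        document = find_document(subtask)         # document determination step
        initial_codes.append(generate_code(subtask, document))  # code generation step
    return "\n".join(initial_codes)               # merge initial codes into target code


# Stubs (assumptions) so the sketch runs without a model or a document store.
def stub_parse(requirement):
    return [part.strip() for part in requirement.split(";") if part.strip()]


def stub_find_document(subtask):
    return f"(documentation for: {subtask})"


def stub_generate(subtask, document):
    return f"# {subtask}\n# based on {document}\npass"


target_code = generate_target_code(
    "load data; build model; train model",
    stub_parse,
    stub_find_document,
    stub_generate,
)
```

Passing the steps in as callables keeps the control flow visible while leaving open how each step is actually implemented.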

[0075] It should be noted that, provided that the operating resources of the client device can meet the deployment and operating conditions of the large model, the embodiments disclosed herein can be performed on the client device.

[0076] Referring to Figure 2, Figure 2 shows a flowchart of a data processing method provided according to an embodiment of the present disclosure, which specifically includes the following steps.

[0077] Step 202: Determine the object generation task, wherein the object generation task is used to request the generation of a target object of document type.

[0078] The target object can be the target code to be generated, the game guide to be generated, the article to be generated, or other document-type objects; the object generation task can be the task that generates the target object; for example, the object generation task can be a code generation task, a game guide generation task, an article generation task, etc.

[0079] In one or more embodiments provided in this disclosure, this data processing method can be applied to a server and receive object generation tasks sent by users through a client. Specifically, determining the object generation task includes:

[0080] Receive an object generation task for the target object sent by the client, wherein the object generation task is sent by the client when the user performs an object generation operation based on the object generation interface;

[0081] The client can be a hardware device such as a computer or smart terminal, or software such as an application or script. For example, the client could be the client of an intelligent coding assistant. Such an assistant is based on artificial intelligence technology and, by deeply understanding the developer's programming intent and context information, can provide functions such as automatic code completion, unit test generation, and project-level code modification suggestions, significantly improving the developer's coding efficiency.

[0082] An object creation interface can be understood as an interface used by users to perform object creation operations. This interface can be an application interface or a web page. The object creation operation can be understood as an instruction to perform an action to create a target object; for example, a user clicking on an object creation control in the object creation interface, or a user entering text to create an object.

[0083] The object generation control can be understood as a control that performs an object generation operation upon triggering, such as a button or option box. The object generation text can be understood as the requirement text for the target object. For example, the object generation text could be text such as "Create a dataset from files in a given directory using a scrolling window method", "Generate a simple addition operation code", or "Generate an API call code".

[0084] Taking the application of the data processing method provided in this disclosure in the API call code generation scenario as an example, the data processing method is described below; the target object is the API call code, and the object generation task is the code generation task; based on this, the user inputs complex code requirement text (e.g., generating call code for a neural network model API) through the human-computer interaction interface (i.e., the object generation interface) of the intelligent coding assistant on the client; after receiving the code requirement text, the client generates a code generation task based on the code requirement text and sends the code generation task to the server, so that the server can generate the API call code required by the user based on the code generation task, thereby meeting the user's needs.
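The client-side part of this flow (turning the code requirement text into a code generation task that can be sent to the server) might look like the sketch below. The `CodeGenerationTask` class, its field names, and the JSON encoding are all assumptions made for illustration; the disclosure does not specify a payload format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CodeGenerationTask:
    """Hypothetical task payload; the field names are illustrative only."""
    requirement_text: str  # the user's code requirement text
    object_type: str       # kind of target object, e.g. "api_call_code"


def build_task(requirement_text: str) -> CodeGenerationTask:
    """Create a code generation task from the text the user entered."""
    text = requirement_text.strip()
    if not text:
        raise ValueError("requirement text must not be empty")
    return CodeGenerationTask(requirement_text=text, object_type="api_call_code")


def serialize_task(task: CodeGenerationTask) -> str:
    """Encode the task as JSON so it can be sent to the server."""
    return json.dumps(asdict(task), ensure_ascii=False)


payload = serialize_task(
    build_task("Generate call code for a neural network model API")
)
```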

[0085] Step 204: parse the object generation task to obtain multiple object generation subtasks corresponding to the object generation task.

[0086] An object generation subtask can be understood as one of the multiple subtasks used to generate the target object; each object generation subtask can correspond to one execution step in the target object generation operation. Where the object generation task is a code generation task, the object generation subtask can be a code generation subtask, which is used to generate a portion of the target code; together, the multiple code generation subtasks generate all the code of the target code, thereby achieving the generation of the target code.

[0087] In one or more embodiments provided in this disclosure, in order to more accurately understand the object generation task and precisely decompose the object generation task into multiple object generation subtasks during the parsing of the object generation task, it is necessary to be able to determine the template object related to the target object. The specific implementation method is as follows.

[0088] The process of parsing the object generation task to obtain multiple object generation sub-tasks corresponding to the object generation task includes:

[0089] Based on the object information carried in the object generation task, a template object corresponding to the target object is determined, wherein the template object is a document-type template object;

[0090] Based on the template object, the object generation task is parsed to obtain the multiple object generation subtasks.

[0091] Among them, object information can be understood as the attribute information of the target object, such as the object identifier of the target object, the object type of the target object, etc.

[0092] A template object can be understood as an object similar to a target object. For example, if the target object is target code, the template object can be sample code; if the target object is an article, the template object can be a sample article.

[0093] Following the previous example, this method can determine sample code (i.e., the template object) for an API library based on the code type (i.e., the object information) of the API call code, which helps to break down the task. The user's complex requirement (i.e., the object generation text in the object generation task) is then analyzed, decomposed, and planned according to the sample code to obtain multiple code generation sub-requirements (i.e., the object generation sub-texts in the object generation sub-tasks). From these sub-requirements, multiple code generation sub-tasks (i.e., the multiple object generation sub-tasks) are obtained. They can form a sub-task list [t1, t2, ..., tn], where t1 to tn denote the individual code generation sub-tasks.
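Selecting a template object from the object information and assembling the sub-task list [t1, t2, ..., tn] can be sketched as follows. The registry `SAMPLE_CODE_BY_TYPE`, its keys, and the dictionary layout of each sub-task are hypothetical; the disclosure does not state how templates are stored or how sub-tasks are represented.

```python
# Hypothetical registry: object type -> sample code or sample article (template object).
SAMPLE_CODE_BY_TYPE = {
    "pytorch_api_call": "import torch\nmodel = torch.nn.Linear(4, 2)\n",
    "article": "Title\n\nIntroduction\n\nBody\n",
}


def select_template(object_info: dict) -> str:
    """Pick the template object that matches the target object's type."""
    object_type = object_info.get("object_type")
    try:
        return SAMPLE_CODE_BY_TYPE[object_type]
    except KeyError:
        raise LookupError(f"no template registered for {object_type!r}") from None


def build_subtask_list(sub_requirements):
    """Wrap ordered sub-requirements as the sub-task list [t1, ..., tn]."""
    return [
        {"id": f"t{i}", "text": text}
        for i, text in enumerate(sub_requirements, start=1)
    ]


template = select_template({"object_type": "pytorch_api_call"})
subtasks = build_subtask_list(["define the model", "define the loss", "run training"])
```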

[0094] In one or more embodiments provided in this disclosure, the method can use a language generation model to parse the object generation task and obtain the multiple object generation sub-tasks; the language generation model can be a large language model, a large model, etc. For example, in a neural network development framework, user code requirements are often complex, and fulfilling a single requirement often involves calling several APIs in succession (such as APIs for neural network models, model training strategies, and loss calculation). Users then need to consult the API documentation, which is time-consuming and laborious. In the era of large language models (LLMs), much work has attempted to use LLMs to generate code automatically and save manual effort. Accordingly, after determining the code generation task, the method can use a large model to parse it and determine the corresponding code generation sub-tasks.

[0095] In one or more embodiments provided in this disclosure, when a language generation model is used to parse the object generation task into multiple object generation subtasks, API frameworks are continually updated and replaced, and updating an LLM's knowledge to match (for example, by retraining the model) is very costly. To address this, the method proposes a code generation approach based on chained exploration for multi-API call scenarios, which can effectively alleviate the difficulty LLMs have in correctly calling unknown (new / updated) APIs, thereby generating reasonable and effective code.

[0096] The object generation task includes generating text from objects;

[0097] The step of parsing the object generation task based on the template object to obtain multiple object generation sub-tasks corresponding to the object generation task includes:

[0098] The template object and the object-generated text are input into the language processing model, wherein the object-generated text is text used to describe the target object;

[0099] In the language processing model, the object-generated text is analyzed based on the template object to obtain multiple object-generated sub-texts for performing the object generation operation. The multiple object-generated sub-texts correspond to the execution steps of the object generation operation: they are obtained by dividing the object-generated text according to those execution steps, and they are ordered according to the execution steps.

[0100] Based on the multiple object-generated sub-texts, determine the multiple object generation sub-tasks corresponding to the object generation task.

[0101] Among them, object-generated text can be understood as text used to generate target objects, such as user-provided requirement text for generating code, or user-provided requirement text for generating articles.

[0102] Following the previous example, this method can determine a sample code for using an API library based on the code type of the API call code. Thus, when using LLM for task decomposition and planning, it can provide LLM with a sample code for using an API library in a one-shot manner, thereby helping LLM to better decompose tasks.

[0103] Based on this, LLM is used to analyze the user's complex requirements (i.e., the object-generated text in the object generation task), and the user's complex requirements are decomposed and planned according to the sample code to obtain multiple code generation sub-requirements (i.e., the object-generated sub-text in the object generation sub-task). Based on the multiple code generation sub-requirements, multiple code generation sub-tasks are obtained, and these multiple code generation sub-tasks can form a sub-task list [t1,t2,…,tn].
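
The one-shot decomposition described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the prompt wording, the function names, and the numbered-list reply format are assumptions, and `fake_llm` is a stand-in for a real model call.

```python
import re
from typing import Callable, List


def build_planning_prompt(requirement: str, sample_code: str) -> str:
    """Combine one usage example (one-shot) with the user's requirement text."""
    return (
        "Example usage of the API library:\n"
        f"{sample_code}\n\n"
        "Split the following requirement into ordered steps, one per line, "
        "numbered 1., 2., ...\n"
        f"Requirement: {requirement}\n"
    )


def parse_subtasks(llm_reply: str) -> List[str]:
    """Extract the ordered sub-requirement texts [t1, ..., tn] from a numbered reply."""
    subtasks = []
    for line in llm_reply.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            subtasks.append(match.group(1).strip())
    return subtasks


def plan(requirement: str, sample_code: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the model to decompose the requirement, then parse its reply."""
    return parse_subtasks(llm(build_planning_prompt(requirement, sample_code)))


def fake_llm(prompt: str) -> str:
    # Hypothetical canned reply, used only to make the sketch runnable.
    return "1. Load the dataset\n2. Build the model\n3. Train the model"


subtasks = plan("Train a classifier on my data", "model = lib.Model()", fake_llm)
```

Keeping prompt construction and reply parsing separate makes it possible to swap the model or the reply format independently.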

[0104] Step 206: Based on the task information of each object generation subtask, determine the object description information of the target object, wherein the object description information is descriptive information used to describe the target object.

[0105] This task information can be understood as a text vector or keywords for the object generation subtask.

[0106] Taking task information as a keyword as an example, this method can determine the keyword from the object generation subtext corresponding to the object generation subtask, and retrieve the object description information corresponding to the keyword from the data storage unit based on the keyword, and use the object description information as the object description information of the target object, thereby achieving fast and efficient determination of the object description information of the target object.
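
As a rough illustration of the keyword variant, the sketch below looks up description documents by keywords found in a subtask's sub-text. The in-memory `DOC_STORE`, its entries, and the simple word-matching rule are all hypothetical; the source does not specify how keywords are extracted or how the data storage unit is organized.

```python
from typing import Dict, Iterable, List

# Hypothetical data storage unit: keyword -> object description document.
DOC_STORE: Dict[str, str] = {
    "dataloader": "DataLoader(dataset, batch_size): iterates over batches.",
    "optimizer": "Optimizer(params, lr): updates model parameters.",
}


def extract_keywords(subtext: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the known keywords that occur as words in the sub-text."""
    words = {w.strip(".,;:()").lower() for w in subtext.split()}
    return [k for k in vocabulary if k in words]


def retrieve_by_keyword(subtext: str, store: Dict[str, str]) -> List[str]:
    """Fetch the description documents for every keyword found in the sub-text."""
    return [store[k] for k in extract_keywords(subtext, store)]


docs = retrieve_by_keyword("Create a DataLoader for the training set.", DOC_STORE)
```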

[0107] The object description information can be understood as a document or text used to describe the target object; if the object description information is a document, it can be an object description document. For example, the object description information can be a code documentation document.

[0108] In one or more embodiments provided in this disclosure, this method can determine the object description information of the target object through vector recall, and the specific implementation is as follows:

[0109] The task information is a text vector;

[0110] The step of determining the object description information of the target object based on the task information of each object generation subtask includes:

[0111] The object generation subtext included in the target object generation subtask is vectorized to obtain the text vector of the object generation subtext, wherein the target object generation subtask is any one of the plurality of object generation subtasks;

[0112] Determine the information retrieval vector corresponding to the description information of each candidate object in the data storage unit, and determine the similarity between the text vector and the information retrieval vector;

[0113] From multiple similarities, a target similarity greater than or equal to a preset similarity threshold is determined, and the candidate object description information corresponding to the target similarity is determined as the object description information.
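
The threshold-based recall in the steps above can be sketched as follows, assuming cosine similarity as the measure and plain lists of floats as vectors. How the vectors are produced (the embedding model) is outside the sketch, and the identifiers and threshold value are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(
    text_vector: Sequence[float],
    candidate_vectors: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Keep every candidate whose similarity reaches the preset threshold."""
    return [
        doc_id
        for doc_id, vector in candidate_vectors.items()
        if cosine_similarity(text_vector, vector) >= threshold
    ]


# Hypothetical information retrieval vectors for three candidate documents.
candidates = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
selected = recall([1.0, 0.0], candidates, threshold=0.7)
```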

[0114] The text vector can be understood as the vector obtained by vectorizing the subtext generated from the object. This method can determine the object description information of the target object through the text vector.

[0115] A data storage unit can be understood as a unit used to store object description information. For example, this data storage unit can be a database, a local disk, or a cloud storage service. When the object description information is API documentation, this data storage unit can be an API library. When developing or updating an API library, its creators typically release documentation to help users understand how to use the API. The collection of this documentation is referred to as the API Library.

[0116] An information retrieval vector can be understood as a vector corresponding to the descriptive information of a candidate object. This information retrieval vector is used to retrieve the descriptive information of the candidate object. It is obtained by converting each entry (such as a document, product, user, etc.) in the data storage unit into a numerical vector.

[0117] Following the previous example, this method first requires converting the query item (object-generated subtext) and each API document in the API library into numerical vectors (i.e., text vectors and information retrieval vectors).

[0118] Secondly, the similarity between the text vector and the information retrieval vector is calculated. The similarity measure includes, but is not limited to, cosine similarity, Euclidean distance, and Manhattan distance (for the two distance measures, a smaller value indicates greater similarity).

[0119] Finally, the candidate API documents are sorted according to the calculated similarity, and the most relevant API documents are returned as the final result.
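
When a distance measure is used instead of cosine similarity, the ranking direction flips: the closest documents are the most relevant. The sketch below ranks candidates by Euclidean or Manhattan distance and returns the top results; the `top_k` parameter and the document identifiers are assumptions, since the source only says that the most relevant documents are returned.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(
    query: Vector,
    documents: Dict[str, Vector],
    distance: Callable[[Vector, Vector], float],
    top_k: int,
) -> List[str]:
    """Sort document ids from nearest to farthest and keep the first top_k."""
    ranked = sorted(documents, key=lambda doc_id: distance(query, documents[doc_id]))
    return ranked[:top_k]


# Hypothetical document vectors.
api_docs = {"api_a": [0.0, 0.0], "api_b": [3.0, 4.0], "api_c": [1.0, 1.0]}
nearest = rank_by_distance([0.0, 0.0], api_docs, euclidean_distance, top_k=2)
```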

[0120] Step 208: Generate objects based on the object generation sub-tasks and the object description information to obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects.

[0121] The initial object can be understood as an object that can generate the target object. The initial object can be a part of the target object. For example, if the target object is the target code, the initial object can be the initial code, and the initial code is a part of the target code. The target code can be formed by multiple initial objects.

[0122] In one or more embodiments provided in this disclosure, the step of generating objects based on the object generation sub-tasks and the object description information to obtain the initial objects corresponding to the object generation sub-tasks, and determining the target object corresponding to the object generation task based on multiple initial objects, includes steps one and two:

[0123] Step 1: Using a language processing model, generate objects based on the object generation subtasks and object description information to obtain the initial objects corresponding to the object generation subtasks.

[0124] In one or more embodiments provided in this disclosure, the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation.

[0125] The process of using a language processing model to generate objects based on the object generation subtasks and the object description information, and obtaining the initial objects corresponding to the object generation subtasks, includes:

[0126] From the subtask sequence, a target code generation subtask is determined, wherein the subtask sequence contains multiple code generation subtasks ordered according to the execution steps of the code generation operation, and the target code generation subtask is the code generation subtask that is first in the subtask sequence.

[0127] From the code documentation, determine the target code documentation corresponding to the target code generation subtask, and use the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask;

[0128] From the subtask sequence, determine the candidate code generation subtask that is located after the target code generation subtask and adjacent to the target code generation subtask;

[0129] The candidate code generation subtask is identified as the target code generation subtask, and the operation of determining the target code documentation corresponding to the target code generation subtask from the code documentation continues until the candidate code generation subtask is empty.

[0130] Following the previous example, in the process of generating initial code using a large model (i.e., a language generation model), the subtasks that need to be generated can be determined sequentially from the subtask sequence. After a code generation subtask is completed, the next corresponding code generation subtask can be selected from the subtask sequence to generate code, until all code generation subtasks in the subtask sequence have been processed.

[0131] After retrieving the API documentation (i.e., code documentation) related to each subtask ti (i.e., multiple code generation subtasks), this method can explore the code generation scheme for each subtask step by step, starting from subtask t1 (i.e., the target code generation subtask), according to the order of the subtask list [t1, t2, ..., tn] (i.e., the subtask sequence). For each subtask ti, the exploration is carried out according to the following steps:

[0132] (a) Code generation.

[0133] The subtask ti (i.e., the target code generation subtask) and the relevant API documentation ai (i.e., the target code documentation) are input into the LLM (i.e., the language processing model), and the LLM is used to generate candidate code results (i.e., the initial code).

[0134] (b) Repeat step (a) above until all subtasks in the subtask list have completed the exploration process and obtained the corresponding code candidate results.

[0135] In one or more embodiments provided in this disclosure, the initial code may be multiple;

[0136] After generating code using the language processing model based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask, the process further includes:

[0137] Multiple initial codes are treated as multiple executable codes, and each executable code is executed separately to obtain the code execution result of each executable code;

[0138] From multiple code execution results, determine the target code execution result that meets the code detection criteria;

[0139] If there is exactly one target code execution result, the executable code corresponding to that target code execution result is used as the initial code corresponding to the target code generation subtask;

[0140] If there are multiple target code execution results, one executable code is selected from the executable codes corresponding to those target code execution results as the initial code corresponding to the target code generation subtask;

[0141] If there is no target code execution result, one executable code is selected from the multiple executable codes as the initial code corresponding to the target code generation subtask.
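
The three-way selection rule above can be sketched as follows. The `Execution` record and the use of a random pick in the multiple and zero cases are assumptions for illustration; this passage only says that one executable code is selected, and "success" stands in for whatever code detection condition is configured.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Execution:
    code: str         # one candidate (executable) code
    succeeded: bool   # whether its result meets the code detection condition
    output: str       # captured output or error text


def select_candidate(
    executions: List[Execution], rng: Optional[random.Random] = None
) -> Execution:
    """Pick the initial code for a subtask from its executed candidates."""
    if not executions:
        raise ValueError("at least one executed candidate is required")
    rng = rng or random.Random()
    passing = [e for e in executions if e.succeeded]
    if len(passing) == 1:
        return passing[0]          # exactly one passing result
    # Several passing results: pick among them; none: pick among all candidates.
    return rng.choice(passing if passing else executions)
```

Passing a seeded `random.Random` makes the choice reproducible, which is convenient when debugging a chain of subtasks.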

[0142] The code execution result can be understood as the execution result obtained by executing the initial code. The code execution result includes: execution success and execution failure (such as a failure caused by a syntax error).

[0143] The code detection conditions can be set according to the actual application scenario. For example, the code detection condition can be that the code execution result is successful.

[0144] Following the previous example, after retrieving the relevant API documentation (denoted as ai) for each subtask ti from the API Library based on the subtask list, this method can explore the code generation scheme for each subtask step by step, starting from t1, according to the order of the subtask list [t1, t2, ..., tn]. For each subtask ti, the exploration is carried out according to the following steps:

[0145] (a) Code generation: Based on the subtask ti and its related API documentation ai, as well as the chain result of the code and execution result of the preceding subtask, LLM is used to generate multiple code candidate results.

[0146] (b) Code execution: For each candidate code result generated in step (a), execute the code and view the execution result of each candidate result. The execution result includes: execution success and execution failure (e.g., a failure caused by a syntax error).

[0147] (c) Experience selection: Based on the execution result of step (b): If only one code snippet executes successfully, then select that code snippet and the execution result.

[0148] If multiple candidate codes execute successfully, then randomly select one successful code and its execution result.

[0149] If no candidate code executes successfully, then one candidate code and its execution result are randomly selected.

[0150] (d) Link integration: Integrate the code and execution results of all subtasks t1, t2, ..., ti, and use them as part of the input for the subsequent subtask ti+1.

[0151] (e) Repeat steps (a) to (d) above until all subtasks in the subtask list have been processed.
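
Steps (a) to (e) above amount to a loop that carries a growing chain of (subtask, code, result) links. The sketch below shows only that control flow, with the generator, executor, and selector injected as callables. All names are hypothetical, and the `demo_*` stubs exist solely to make the example runnable; in particular, `demo_select` takes the first passing candidate for determinism, whereas the text describes a random pick.

```python
from typing import Callable, List, Tuple

Result = Tuple[bool, str]        # (succeeded, output or error text)
Link = Tuple[str, str, Result]   # (subtask, selected code, its result)


def explore_chain(
    subtasks: List[str],
    documents: List[str],
    generate: Callable[[str, str, List[Link]], List[str]],
    execute: Callable[[str], Result],
    select: Callable[[List[Tuple[str, Result]]], Tuple[str, Result]],
) -> List[Link]:
    """Process subtasks in order, feeding earlier links into later generations."""
    chain: List[Link] = []
    for subtask, document in zip(subtasks, documents):
        candidates = generate(subtask, document, chain)            # step (a)
        executed = [(code, execute(code)) for code in candidates]  # step (b)
        code, result = select(executed)                            # step (c)
        chain.append((subtask, code, result))                      # step (d)
    return chain                                                   # loop is step (e)


def demo_generate(subtask: str, document: str, chain: List[Link]) -> List[str]:
    return [f"# {subtask} (after {len(chain)} earlier steps)", "broken("]


def demo_execute(code: str) -> Result:
    ok = not code.endswith("(")
    return ok, ("ok" if ok else "SyntaxError")


def demo_select(executed: List[Tuple[str, Result]]) -> Tuple[str, Result]:
    passing = [item for item in executed if item[1][0]]
    return (passing or executed)[0]


chain = explore_chain(
    ["load data", "train model"], ["doc 1", "doc 2"],
    demo_generate, demo_execute, demo_select,
)
```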

[0152] In one or more embodiments provided in this disclosure, the step of using the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask includes:

[0153] If it is determined that the target code generation subtask is not at the beginning of the subtask sequence, the preceding subtask corresponding to the target code generation subtask is determined from the subtask sequence, wherein the preceding subtask is the code generation subtask that precedes the target code generation subtask in the subtask sequence.

[0154] Determine the preceding initial code of the preceding subtask and the code execution result of the preceding initial code, wherein the code execution result is obtained by executing the preceding initial code;

[0155] The target code generation subtask, the target code documentation, the preceding initial code, and the execution result of the preceding initial code are input into the language processing model to generate code, thereby obtaining the initial code corresponding to the target code generation subtask.
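
A minimal sketch of how the model input might be assembled for a subtask, with and without a preceding subtask. The prompt layout and the function name are assumptions; the source only lists which pieces of information are input into the language processing model.

```python
from typing import Optional


def build_generation_prompt(
    subtask: str,
    documentation: str,
    previous_code: Optional[str] = None,
    previous_result: Optional[str] = None,
) -> str:
    """Assemble the generation input; the first subtask has no preceding context."""
    parts = [f"Task: {subtask}", f"API documentation:\n{documentation}"]
    if previous_code is not None:
        parts.append(f"Code from the preceding subtask:\n{previous_code}")
        parts.append(f"Its execution result:\n{previous_result}")
    parts.append("Write code for the task.")
    return "\n\n".join(parts)


first_prompt = build_generation_prompt("Load the dataset", "load(path) -> Dataset")
next_prompt = build_generation_prompt(
    "Build the model",
    "Model(input_dim) -> Model",
    previous_code="data = load('train.csv')",
    previous_result="execution success",
)
```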

[0156] Following the example above, for each generated candidate code result, execute the code and view the execution result of each candidate result. The execution result includes: execution success and execution failure (e.g., a failure caused by a syntax error).

[0157] If only one code snippet executes successfully, then select that code snippet along with the execution result.

[0158] If multiple candidate codes execute successfully, then randomly select one successful code and its execution result.

[0159] If no candidate code executes successfully, then randomly select one candidate code and its execution result.

[0160] The code and execution results of all subtasks t1, t2, ..., ti are integrated and used as part of the input for the subsequent subtask ti+1.

[0161] Step 2: Using the language processing model, perform object fusion on the multiple initial objects to obtain the target object.

[0162] In one or more embodiments provided in this disclosure, the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation.

[0163] The step of using the language processing model to perform object fusion on the multiple initial objects to obtain the target object corresponding to the object generation task includes:

[0164] Determine the execution result of each initial code;

[0165] Multiple code generation subtasks, the code documentation, multiple initial codes, and the code execution results corresponding to each initial code are input into a language processing model for code fusion to obtain the target code corresponding to the code generation task.

[0166] Following the previous example, the code, execution results, and corresponding API documentation for each subtask are all input into the LLM to generate the final complete code snippet (i.e., the target code), thereby fulfilling the user's complex requirements.
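
The fusion step can be pictured as collecting one record per subtask and handing all of them to the model in a single request. The record fields, the prompt wording, and the `llm` callable below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubtaskRecord:
    task: str
    documentation: str
    code: str
    execution_result: str


def build_fusion_prompt(records: List[SubtaskRecord]) -> str:
    """Lay out every subtask's documentation, code, and result, then ask for a merge."""
    sections = []
    for index, record in enumerate(records, start=1):
        sections.append(
            f"Subtask {index}: {record.task}\n"
            f"Documentation: {record.documentation}\n"
            f"Code:\n{record.code}\n"
            f"Execution result: {record.execution_result}"
        )
    sections.append("Merge the code above into one complete program.")
    return "\n\n".join(sections)


def fuse(records: List[SubtaskRecord], llm: Callable[[str], str]) -> str:
    """Return whatever the model produces for the assembled fusion prompt."""
    return llm(build_fusion_prompt(records))


records = [
    SubtaskRecord("load", "load(path)", "data = load('x.csv')", "success"),
    SubtaskRecord("train", "fit(data)", "model = fit(data)", "success"),
]
fusion_prompt = build_fusion_prompt(records)
```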

[0167] In one or more embodiments provided in this disclosure, after determining the target object corresponding to the object generation task based on multiple initial objects, the method further includes:

[0168] The target object is sent to the client, and then displayed to the user through the object generation interface in the client, thereby satisfying the user's object generation needs.

[0169] The data processing method provided in one or more embodiments of this disclosure can, during the generation of a target object, first parse the object generation task and accurately divide the generation of the target object into multiple object generation sub-tasks, thereby accurately knowing the multiple execution steps for generating the target object; secondly, determine the object description information used to describe the target object, providing accurate reference data for generating the target object; finally, based on each object generation sub-task and the object description information, generate multiple initial objects in batches, and based on the multiple initial objects, accurately determine the target object corresponding to the object generation task, thereby avoiding the problem of inaccurate target objects that may exist in the process of generating document-type target objects, and improving the accuracy of target objects.

[0170] The following description, in conjunction with Figure 3, uses the application of the data processing method provided in this disclosure in a multi-API call scenario as an example to further illustrate the data processing method. Figure 3 shows a flowchart of the processing procedure of a data processing method provided in an embodiment of this disclosure.

[0171] As shown in Figure 3, this method can be understood as a code generation method based on chained exploration in multi-API call scenarios. For complex requirement texts proposed by users, the code generation task can be achieved by following the chain of steps below, which specifically includes the following steps.

[0172] Step 302: Receive complex requirement tasks.

[0173] Specifically, users can input the requirement text for generating multi-API call code in the terminal's intelligent coding assistant interface; the terminal generates a complex requirement task based on the requirement text and sends the complex requirement task to the server.

[0174] On the server side, after receiving a complex task, the process of generating code for multiple API calls begins.

[0175] Step 304: Task planning.

[0176] Specifically, this method can determine a sample code for using an API library based on the code type of the API call code required by the user. Thus, when using LLM (i.e., subtask scheduler) for task decomposition and planning, it provides an example code for using an API library in a one-shot manner, thereby helping LLM to better decompose tasks.

[0177] Based on this, the requirement text and the sample code are input into the LLM. Using the LLM, the user's complex requirements are analyzed, decomposed, and planned based on the sample code to obtain multiple code generation sub-requirement texts.

[0178] Multiple code generation sub-tasks are obtained based on the multiple code generation sub-requirement texts, and these sub-tasks can form a sub-task list [t1,t2,…,tn].

[0179] Step 306: API Documentation Retrieval (API Recommendation).

[0180] Specifically, based on the subtask list returned in step 304, a vector recall method is used to retrieve the API documentation related to each subtask ti from the API Library, denoted as ai; the specific execution method is as follows:

[0181] First, the query items (code generation sub-requirement text) and each API document in the API library need to be converted into numeric vectors (i.e., text vectors and document vectors).

[0182] Secondly, calculate the similarity between text vectors and document vectors. The similarity metric includes, but is not limited to, cosine similarity, Euclidean distance, Manhattan distance, etc.

[0183] Finally, the candidate API documents are sorted according to the calculated similarity, and the most relevant API documents are returned as the final result.

[0184] Step 308: API exploration chain.

[0185] Specifically, after retrieving the API documentation related to each subtask ti, this method can explore the code generation scheme for each subtask step by step, starting from t1, according to the order of the subtask list [t1, t2, ..., tn]. For each subtask ti, the exploration is carried out according to the following steps:

[0186] (a) Code generation:

[0187] Determine the subtask ti and its related API documentation ai, and if the subtask ti has a preceding subtask, use the code result of the preceding subtask and the execution result obtained by executing the code result as the link result;

[0188] The subtask ti, the relevant API documentation ai, and the link results are input into the LLM (i.e., generator), which generates multiple code candidate results; the code candidate results are code data.

[0189] (b) Code execution:

[0190] Execute each candidate code result generated in step (a) using the executor, and examine the execution result of each candidate result, which is either execution success or execution failure (e.g., a failure caused by a syntax error);

[0191] (c) Experience selection:

[0192] Based on the execution result of step (b), multiple code candidate results are filtered to obtain the code snippet and execution result corresponding to subtask ti. The specific method is as follows:

[0193] If only one code snippet among multiple code candidate results executes successfully, then that code candidate result and its corresponding execution result are selected.

[0194] If multiple code snippets pass execution among multiple candidate code results, then randomly select one candidate code that passed execution and its corresponding execution result.

[0195] If none of the multiple candidate code results pass execution, then a candidate code result and its corresponding execution result are randomly selected.

[0196] (d) API exploration chain integration:

[0197] The code and execution results of all subtasks t1, t2, ..., ti are integrated and used as part of the input for the subsequent subtask ti+1.

[0198] (e) Repeat steps (a) to (d) above until all subtasks in the subtask list have completed the exploration process.
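
For Python candidates, the executor of step (b) could be sketched as below: each candidate runs in a separate interpreter process with a timeout, and a non-zero exit code counts as execution failure. This is an assumption about one possible executor, not the disclosed one; the timeout value is arbitrary, and a real deployment would presumably also need sandboxing, which the source does not describe.

```python
import subprocess
import sys
from typing import Tuple


def execute_candidate(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run one candidate in a fresh Python process and report (succeeded, text)."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    succeeded = completed.returncode == 0
    # On success return standard output; on failure return the error text.
    return succeeded, completed.stdout if succeeded else completed.stderr


ok, output = execute_candidate("print(1 + 1)")
failed, error_text = execute_candidate("print(")
```

The returned pair can feed the selection step directly, and the error text of a failing candidate is the kind of information a later self-debugging round could use.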

[0199] It should be noted that, in order to ensure the accuracy of API exploration, this method can also perform an extra round of self-debugging.

[0200] Step 310: Obtain the complete code snippet (final solution generation).

[0201] Specifically, after obtaining the code (i.e., pruned code, pruned experience) and execution results for each subtask, the final complete code snippet is determined using a large model. The specific method is as follows:

[0202] Input the code, execution results, and corresponding API documentation for each subtask into the LLM;

[0203] By using LLM to adjust and organize the code of each subtask, multiple code snippets can be merged to generate a final, complete code snippet that meets the user's complex needs.

[0204] Based on the above steps, the data processing method in this disclosure provides a code generation method based on chained exploration in multi-API call scenarios. This method can improve the code generation capability in multi-API call scenarios, especially when the API to be called was never seen by the model during training. It can effectively address the hallucination problem that LLMs may exhibit when facing unknown new knowledge, and improve the quality of code generation in this scenario.

[0205] Furthermore, this method takes into account that while an LLM can quickly generate code snippets based on user needs, the APIs required to meet those needs may not have been part of the LLM's training data. In that case, when the LLM generates code directly, it cannot use the APIs correctly, including their names, input parameter formats, and output parameter formats. As a result, the generated code is of poor quality and cannot meet user needs.

[0206] Based on this, to address the poor generation performance of an LLM on APIs absent from its training data, this method employs a recall approach: it retrieves the relevant documentation for the APIs to be used and provides it to the LLM as reference, so that the model uses the APIs correctly, including API names and calling methods.

[0207] Furthermore, this method takes into account that when user requirements are complex, retrieving relevant API documents from the API documentation library based on the requirements and having the LLM generate a code snippet directly from the retrieved documents and the requirements may cause some requirements to be missed, so that the LLM does not fully realize the user's requirements.

[0208] Based on this, to address the case where the LLM misses some requirements and fails to fully realize complex user requirements, the method breaks down the user requirements into sub-tasks and explores them step by step, ensuring that each sub-task is fully implemented and that context dependencies are formed between sub-tasks, thus ensuring the completeness and feasibility of the final solution.

[0209] In summary, this method provides a chain-based code generation approach for multi-API call scenarios. It breaks down complex requirements into steps, generating a list of subtasks. Combined with sample code demonstrating API library usage, this helps LLMs better decompose tasks, so that the final code fully implements the user's requirements. For the subtask list, a chain-based API exploration path is constructed. Each subtask includes code generation, execution, and selection steps, and the results of each subtask are chained together to ensure the final code fully considers the context. This effectively alleviates the difficulty LLMs face when encountering unknown (new / updated) APIs, enabling correct API calls and generating reasonable and effective code.

[0210] Referring to Figure 4, Figure 4 shows a flowchart of a second data processing method provided according to an embodiment of the present disclosure, which specifically includes the following steps.

[0211] Step 402: Determine the code generation task, wherein the code generation task is used to request the generation of target code.

[0212] Step 404: Parse the code generation task to obtain multiple code generation sub-tasks corresponding to the code generation task.

[0213] Step 406: Based on the task information of each code generation subtask, determine the code description document for the target code, wherein the code description document is a description document used to describe the target code.

[0214] Step 408: Generate code based on each code generation subtask and the code description document to obtain the initial code corresponding to each code generation subtask, and determine the target code corresponding to the code generation task based on multiple initial codes.
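
Steps 404 to 408 compose into a short pipeline. The sketch below wires four injected callables together in that order; the function and parameter names are hypothetical, and the lambdas in the usage lines are trivial stand-ins for the real parser, retriever, generator, and fusion model.

```python
from typing import Callable, List


def generate_target_code(
    task_text: str,
    parse: Callable[[str], List[str]],
    retrieve: Callable[[str], str],
    generate: Callable[[str, str], str],
    fuse: Callable[[List[str]], str],
) -> str:
    """Run the four stages in order and return the fused target code."""
    subtasks = parse(task_text)                               # Step 404
    documents = [retrieve(subtask) for subtask in subtasks]   # Step 406
    initial_codes = [                                         # Step 408: initial codes
        generate(subtask, document)
        for subtask, document in zip(subtasks, documents)
    ]
    return fuse(initial_codes)                                # Step 408: target code


target_code = generate_target_code(
    "load then train",
    parse=lambda text: text.split(" then "),
    retrieve=lambda subtask: f"doc for {subtask}",
    generate=lambda subtask, document: f"# {subtask} using {document}",
    fuse="\n".join,
)
```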

[0215] The second data processing method provided in one or more embodiments of this disclosure can, during the generation of target code, first parse the code generation task and accurately divide the generation of the target code into multiple code generation sub-tasks, thereby accurately knowing the multiple execution steps of generating the target code; secondly, it determines the code description document used to describe the target code, providing accurate reference data for generating the target code; finally, based on each code generation sub-task and the code description document, it generates multiple initial codes in batches, and based on the multiple initial codes, accurately determines the target code corresponding to the code generation task, thereby avoiding the problem of inaccurate target code that may exist in the process of generating document-type target code, and improving the accuracy of the target code.

[0216] The above is an illustrative scheme of the second data processing method in this embodiment. It should be noted that the technical solution of the second data processing method belongs to the same concept as the technical solution of the first data processing method described above. For details not described in detail in the technical solution of the second data processing method, please refer to the description of the technical solution of the first data processing method described above.

[0217] Referring to Figure 5, Figure 5 shows a flowchart of a third data processing method provided according to an embodiment of the present disclosure. This data processing method is applied to a cloud-side device and specifically includes the following steps.

[0218] Step 502: Receive an object generation task sent by the end-side device, wherein the object generation task is used to request the generation of a target object of document type;

[0219] Step 504: Parse the object generation task to obtain multiple object generation sub-tasks corresponding to the object generation task;

[0220] Step 506: Based on the task information of each object generation subtask, determine the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0221] Step 508: Generate objects based on the object generation sub-tasks and the object description information to obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects;

[0222] Step 510: Send the target object to the end-side device.

[0223] In one or more embodiments provided in this disclosure, the cloud-side device can be a central cloud device in a distributed architecture or an edge cloud device in a distributed architecture. The cloud-side device can be a cloud-side device with a cloud desktop system or cloud desktop software installed and deployed, such as a cloud server or cloud host. The end-side device can be understood as any terminal that interacts with the cloud-side device. This terminal can be a laptop, desktop computer, tablet computer, smart device, server, etc.

[0224] This disclosure provides a data processing method for cloud-side devices through one or more embodiments. During the generation of a target object, the method can first parse the object generation task, accurately divide the generation of the target object into multiple object generation sub-tasks, and thus precisely determine the multiple execution steps for generating the target object. Secondly, it determines object description information to describe the target object, providing accurate reference data for generating the target object. Finally, based on each object generation sub-task and the object description information, multiple initial objects are generated in batches, and based on these multiple initial objects, the target object corresponding to the object generation task is accurately determined. This avoids the potential problem of inaccurate target objects during the generation of document-type target objects, improving the accuracy of the target object.

[0225] The above is an illustrative scheme of the third data processing method in this embodiment. It should be noted that the technical solution of this third data processing method belongs to the same concept as the technical solution of the first data processing method described above. For details not described in detail in the technical solution of the third data processing method, please refer to the description of the technical solution of the first data processing method described above.

[0226] Corresponding to the above method embodiments, this disclosure also provides a data processing apparatus embodiment. Figure 6 shows a schematic diagram of the structure of a data processing apparatus provided in one embodiment of this disclosure. As shown in Figure 6, the apparatus includes:

[0227] The task determination module 602 is configured to determine an object generation task, wherein the object generation task is used to request the generation of a target object of document type;

[0228] Task parsing module 604 is configured to parse the object generation task and obtain multiple object generation subtasks corresponding to the object generation task;

[0229] The document determination module 606 is configured to determine, based on the task information of each object generation subtask, the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0230] The object generation module 608 is configured to generate objects based on the object generation sub-tasks and the object description information, obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects.
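The division of work among the modules above can be sketched as a small pipeline. This is only an illustrative wiring under assumptions: the class name `Pipeline`, its field names, and the toy stand-in functions are hypothetical and do not come from the disclosure, which leaves the implementation of each module open.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of modules 604, 606 and 608; all names are illustrative."""
    parse_task: Callable[[str], List[str]]        # role of task parsing module 604
    find_description: Callable[[str], str]        # role of document determination module 606
    generate_initial: Callable[[str, str], str]   # module 608: one initial object per subtask
    fuse: Callable[[List[str]], str]              # module 608: combine the initial objects

    def run(self, task: str) -> str:
        # The task itself is what the task determination module 602 would supply.
        subtasks = self.parse_task(task)
        initial_objects = []
        for subtask in subtasks:
            description = self.find_description(subtask)
            initial_objects.append(self.generate_initial(subtask, description))
        return self.fuse(initial_objects)


# Toy stand-ins so the sketch runs without a language model.
pipeline = Pipeline(
    parse_task=lambda task: [part.strip() for part in task.split(";") if part.strip()],
    find_description=lambda subtask: f"doc for {subtask}",
    generate_initial=lambda subtask, doc: f"[{subtask} | {doc}]",
    fuse=lambda objects: "\n".join(objects),
)
result = pipeline.run("load data; plot data")
print(result)
```

In a real system each callable would presumably be backed by a language processing model or a retrieval component; the lambdas here only make the data flow visible.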

[0231] Optionally, the task parsing module 604 is further configured to:

[0232] Based on the object information carried in the object generation task, a template object corresponding to the target object is determined, wherein the template object is a document-type template object;

[0233] Based on the template object, the object generation task is parsed to obtain the multiple object generation subtasks.

[0234] Optionally, the object generation task includes object generation text;

[0235] The task parsing module 604 is further configured to:

[0236] Input the template object and the object generation text into the language processing model, wherein the object generation text is text used to describe the target object;

[0237] In the language processing model, analyze the object generation text based on the template object to obtain multiple object generation sub-texts for performing the object generation operation, wherein the multiple object generation sub-texts correspond to the execution steps of the object generation operation;

[0238] Based on the multiple object generation sub-texts, generate the multiple object generation subtasks.
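The parsing steps above can be illustrated with a short sketch. It assumes, purely for illustration, that the language processing model is asked to reply with numbered steps, one per line. The `Subtask` class, the prompt wording, and `fake_model` are hypothetical and are not specified in the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    step: int       # position in the execution order
    sub_text: str   # object generation sub-text for this step


def parse_task(template: str, generation_text: str,
               model: Callable[[str], str]) -> List[Subtask]:
    """Ask the model for numbered steps and turn each step into a subtask."""
    prompt = (
        f"Template:\n{template}\n\n"
        f"Request:\n{generation_text}\n\n"
        "List the steps needed, one per line, as '1. ...'."
    )
    reply = model(prompt)
    subtasks = []
    for line in reply.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s*(.+)", line)
        if match:  # lines that are not numbered steps are ignored
            subtasks.append(Subtask(int(match.group(1)), match.group(2).strip()))
    return sorted(subtasks, key=lambda s: s.step)


def fake_model(prompt: str) -> str:
    # Stand-in for the language processing model; returns a fixed reply.
    return "2. Compute the mean of column A\n1. Read the CSV file\nnote: done"


subtasks = parse_task("Python script", "Average column A of data.csv", fake_model)
for s in subtasks:
    print(s.step, s.sub_text)
```

Sorting by step number keeps the subtasks aligned with the execution order even if the model lists them out of sequence.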

[0239] Optionally, the task information is a text vector;

[0240] The document determination module 606 is further configured to:

[0241] The object generation subtext included in the target object generation subtask is vectorized to obtain the text vector of the object generation subtext, wherein the target object generation subtask is any one of the plurality of object generation subtasks;

[0242] Determine the information retrieval vector corresponding to each piece of candidate object description information in the data storage unit, and determine the similarity between the text vector and each information retrieval vector;

[0243] From multiple similarities, a target similarity greater than or equal to a preset similarity threshold is determined, and the candidate object description information corresponding to the target similarity is determined as the object description information.
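The retrieval steps above can be sketched as follows, using cosine similarity as the similarity measure. The vectors, the candidate names, and the threshold value of 0.7 are made-up assumptions; in practice the vectors would presumably come from an embedding model, which this sketch does not include.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_descriptions(text_vector: Sequence[float],
                          candidates: Dict[str, Sequence[float]],
                          threshold: float) -> List[str]:
    """Return candidates whose similarity is >= threshold, best match first."""
    scored = [(cosine_similarity(text_vector, vec), name)
              for name, vec in candidates.items()]
    hits = [(score, name) for score, name in scored if score >= threshold]
    hits.sort(reverse=True)
    return [name for _, name in hits]


# Hypothetical two-dimensional vectors, chosen only to make the example readable.
candidates = {
    "read_csv doc": [0.9, 0.1],
    "plot doc": [0.0, 1.0],
    "to_csv doc": [0.6, 0.8],
}
matches = retrieve_descriptions([1.0, 0.0], candidates, threshold=0.7)
print(matches)
```

Only the first candidate clears the threshold here; the other two are discarded even though one of them is partially similar.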

[0244] Optionally, the object generation module 608 is further configured to:

[0245] Using a language processing model, objects are generated based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks.

[0246] Using the language processing model, the multiple initial objects are fused to obtain the target object.

[0247] Optionally, the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation.

[0248] The object generation module 608 is further configured to:

[0249] From the subtask sequence, a target code generation subtask is determined, wherein the subtask sequence contains multiple code generation subtasks ordered according to the execution steps of the code generation operation, and the target code generation subtask is the code generation subtask that is first in the subtask sequence.

[0250] From the code documentation, determine the target code documentation corresponding to the target code generation subtask, and use the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask;

[0251] From the subtask sequence, determine the candidate code generation subtask that is located after the target code generation subtask and adjacent to the target code generation subtask;

[0252] The candidate code generation subtask is identified as the target code generation subtask, and the operation of determining the target code documentation corresponding to the target code generation subtask from the code documentation continues until the candidate code generation subtask is empty.
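The iteration described above, in which each subtask in turn becomes the target until no following subtask remains, can be sketched as a simple loop. The function names and the stand-in callables are hypothetical; the disclosure does not prescribe how the documentation lookup or the model call is implemented.

```python
from typing import Callable, List, Optional, Tuple


def generate_chain(subtask_sequence: List[str],
                   docs_for: Callable[[str], str],
                   model: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Walk the ordered subtasks one by one and generate initial code for each."""
    results: List[Tuple[str, str]] = []
    index = 0
    # The first subtask in the sequence is the initial target.
    target: Optional[str] = subtask_sequence[0] if subtask_sequence else None
    while target is not None:
        target_doc = docs_for(target)                 # target code documentation
        results.append((target, model(target, target_doc)))
        index += 1
        # The adjacent following subtask becomes the new target; None ends the loop.
        target = subtask_sequence[index] if index < len(subtask_sequence) else None
    return results


# Stand-ins for the documentation lookup and the language processing model.
chain = generate_chain(
    ["read file", "sum values"],
    docs_for=lambda subtask: f"docs({subtask})",
    model=lambda subtask, doc: f"# {subtask} using {doc}",
)
for subtask, code in chain:
    print(subtask, "->", code)
```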

[0253] Optionally, there may be multiple initial codes;

[0254] The object generation module 608 is further configured to:

[0255] Treat the multiple initial codes as multiple codes to be executed, and execute each code to be executed separately to obtain the code execution result of each code to be executed;

[0256] From multiple code execution results, determine the target code execution result that meets the code detection criteria;

[0257] If there is exactly one target code execution result, use the code to be executed corresponding to that target code execution result as the initial code corresponding to the target code generation subtask;

[0258] If there are multiple target code execution results, select one code to be executed from the codes to be executed corresponding to those target code execution results as the initial code corresponding to the target code generation subtask;

[0259] If there is no target code execution result, select one code from the multiple codes to be executed as the initial code corresponding to the target code generation subtask.
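The three selection cases above can be sketched as follows. Several points are assumptions made for illustration only: each candidate is taken to store its output in a variable named `result`, the "select one" policy is taken to be "pick the first", and `exec` is used only to keep the example short. Generated code is untrusted, so a real system would presumably run it in a sandbox.

```python
from typing import Callable, List, Optional, Tuple


def run_candidate(code: str) -> Tuple[bool, Optional[object]]:
    """Execute one candidate and return (succeeded, value of `result`)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # illustration only; sandbox untrusted code in practice
    except Exception:
        return False, None
    return True, namespace.get("result")


def select_initial_code(candidates: List[str],
                        meets_criteria: Callable[[bool, Optional[object]], bool]) -> str:
    """Pick the initial code for a subtask from several candidates."""
    passing = [code for code in candidates if meets_criteria(*run_candidate(code))]
    if len(passing) == 1:
        return passing[0]      # exactly one candidate meets the criteria
    if len(passing) > 1:
        return passing[0]      # several meet the criteria: assumed policy, take the first
    return candidates[0]       # none meet the criteria: assumed fallback, take the first


candidates = ["result = 1 / 0", "result = sum([1, 2, 3])", "result = 6"]
chosen = select_initial_code(candidates, lambda ok, value: ok and value == 6)
print(chosen)
```

The first candidate raises an exception and is filtered out; the remaining two both meet the criterion, so the assumed tie-break returns the earlier one.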

[0260] Optionally, the object generation module 608 is further configured to:

[0261] If it is determined that the target code generation subtask is not at the beginning of the subtask sequence, the preceding subtask corresponding to the target code generation subtask is determined from the subtask sequence, wherein the preceding subtask is the code generation subtask that precedes the target code generation subtask in the subtask sequence.

[0262] Determine the preceding initial code of the preceding subtask and the code execution result of the preceding initial code, wherein the code execution result is obtained by executing the preceding initial code;

[0263] The target code generation subtask, the target code documentation, the preceding initial code, and the execution result of the preceding initial code are input into the language processing model to generate code, thereby obtaining the initial code corresponding to the target code generation subtask.
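One way to picture the model input described above is as a prompt assembled from the four items. The prompt layout and the function name `build_prompt` are hypothetical; the disclosure states which items are input to the language processing model but not how they are formatted.

```python
from typing import Optional


def build_prompt(subtask: str, doc: str,
                 prev_code: Optional[str] = None,
                 prev_result: Optional[str] = None) -> str:
    """Assemble the model input; preceding context is added only when it exists."""
    parts = [f"Subtask:\n{subtask}", f"Documentation:\n{doc}"]
    if prev_code is not None:
        # Not the first subtask: include the preceding code and what it produced.
        parts.append(f"Previous code:\n{prev_code}")
        parts.append(f"Previous execution result:\n{prev_result}")
    parts.append("Write code for the subtask.")
    return "\n\n".join(parts)


first = build_prompt("read file", "open(path)")
second = build_prompt("count lines", "len(seq)",
                      prev_code="lines = open('a.txt').readlines()",
                      prev_result="ok")
print(second)
```

Passing the preceding execution result along with the preceding code lets the model see what the earlier step actually produced rather than only what it was meant to produce.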

[0264] Optionally, the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation.

[0265] The object generation module 608 is further configured to:

[0266] Determine the execution result of each initial code;

[0267] Multiple code generation subtasks, the code documentation, multiple initial codes, and the code execution results corresponding to each initial code are input into a language processing model for code fusion to obtain the target code corresponding to the code generation task.

[0268] Optionally, the task determination module 602 is further configured to:

[0269] Receive an object generation task for the target object sent by the client, wherein the object generation task is sent by the client when the user performs an object generation operation through the object generation interface;

[0270] After determining the target object corresponding to the object generation task based on multiple initial objects, the method further includes:

[0271] The target object is sent to the client.

[0272] The data processing apparatus provided in one or more embodiments of this disclosure can, during the generation of a target object, parse the object generation task and accurately divide the target object generation process into multiple object generation subtasks, thereby determining the execution steps for generating the target object; secondly, determine the object description information used to describe the target object, providing accurate reference data for generating the target object; finally, based on each object generation subtask and the object description information, generate multiple initial objects in batches and, based on the multiple initial objects, accurately determine the target object corresponding to the object generation task, thereby avoiding the inaccuracy that may arise when generating document-type target objects and improving the accuracy of the target object.

[0273] The above is an illustrative scheme of a data processing apparatus according to this embodiment. It should be noted that the technical solution of this data processing apparatus and the technical solution of the data processing method described above belong to the same concept. For details not described in detail in the technical solution of the data processing apparatus, please refer to the description of the technical solution of the data processing method described above.

[0274] Corresponding to the above method embodiments, this disclosure also provides a second data processing apparatus embodiment. Figure 7 shows a schematic diagram of the structure of a second data processing apparatus provided in one embodiment of this disclosure. As shown in Figure 7, the apparatus includes:

[0275] The task determination module 702 is configured to determine a code generation task, wherein the code generation task is used to request the generation of target code;

[0276] The task parsing module 704 is configured to parse the code generation task and obtain multiple code generation subtasks corresponding to the code generation task.

[0277] The document determination module 706 is configured to determine, based on the task information of each code generation subtask, the code description document of the target code, wherein the code description document is a description document used to describe the target code;

[0278] The code generation module 708 is configured to generate code based on the code generation sub-tasks and the code documentation, obtain the initial code corresponding to each code generation sub-task, and determine the target code corresponding to the code generation task based on multiple initial codes.

[0279] The second data processing apparatus provided in one or more embodiments of this disclosure can, during the generation of target code, parse the code generation task and accurately divide the target code generation process into multiple code generation subtasks, thereby determining the execution steps for generating the target code; secondly, it determines a code description document used to describe the target code, providing accurate reference data for generating the target code; finally, based on each code generation subtask and the code description document, it generates multiple initial codes in batches and, based on the multiple initial codes, accurately determines the target code corresponding to the code generation task, thereby avoiding the inaccuracy that may arise when generating document-type target code and improving the accuracy of the target code.

[0280] The above is an illustrative scheme of the second data processing device in this embodiment. It should be noted that the technical solution of the second data processing device and the technical solution of the second data processing method described above belong to the same concept. For details not described in detail in the technical solution of the second data processing device, please refer to the description of the technical solution of the second data processing method described above.

[0281] Corresponding to the above method embodiments, this disclosure also provides a third data processing device embodiment. Figure 8 shows a schematic diagram of the structure of a third data processing device provided in one embodiment of this disclosure. As shown in Figure 8, this device is applied to cloud-side equipment and includes:

[0282] The task receiving module 802 is configured to receive an object generation task sent by the end-side device, wherein the object generation task is used to request the generation of a target object of document type;

[0283] The task parsing module 804 is configured to parse the object generation task and obtain multiple object generation subtasks corresponding to the object generation task.

[0284] The document determination module 806 is configured to determine, based on the task information of each object generation subtask, the object description information of the target object, wherein the object description information is descriptive information used to describe the target object;

[0285] The object generation module 808 is configured to generate objects based on the object generation sub-tasks and the object description information, obtain the initial objects corresponding to the object generation sub-tasks, and determine the target object corresponding to the object generation task based on multiple initial objects.

[0286] The object sending module 810 is configured to send the target object to the end-side device.

[0287] The data processing apparatus for cloud-side devices provided in one or more embodiments of this disclosure can, during the generation of a target object, parse the object generation task and accurately divide the target object generation process into multiple object generation subtasks, thereby determining the execution steps for generating the target object; secondly, determine the object description information used to describe the target object, providing accurate reference data for generating the target object; finally, based on each object generation subtask and the object description information, generate multiple initial objects in batches and, based on the multiple initial objects, accurately determine the target object corresponding to the object generation task, thereby avoiding the inaccuracy that may arise when generating document-type target objects and improving the accuracy of the target object.

[0288] The above is an illustrative scheme of the third data processing device in this embodiment. It should be noted that the technical solution of this third data processing device belongs to the same concept as the technical solution of the third data processing method described above. For details not described in detail in the technical solution of the third data processing device, please refer to the description of the technical solution of the third data processing method described above.

[0289] Figure 9 shows a structural block diagram of a computing device 900 provided according to an embodiment of the present disclosure.

[0290] The computing device 900 includes:

[0291] Memory 910 and processor 920;

[0292] The memory 910 is used to store computer programs / instructions, and the processor 920 is used to execute the computer programs / instructions, which, when executed by the processor 920, implement the steps of any of the above methods.

[0293] In one or more embodiments of this disclosure, the computing device can be understood as an integrated smart terminal, including but not limited to a server, desktop computer, PC (Personal Computer), all-in-one model machine, mobile phone, tablet computer or other portable smart terminal, etc., and the computing device may have the model described in the above embodiments of this disclosure pre-installed.

[0294] Specifically, this computing device can pre-install various types of models, including but not limited to models in natural language processing, visual processing, speech processing, code processing, and multimodal task processing, thus providing diverse model selection. In different product forms, this computing device can support one or more model usage methods, including but not limited to model training, model invocation, model fine-tuning, model deployment, model inference, and application. In some product forms, this computing device also supports model management, including but not limited to multi-type model management (supporting the management of discriminative, generative, and other model types), model version control (supporting the control of different model versions), and model evaluation (evaluating model performance and effectiveness based on model evaluation tools). In other product forms, this computing device can also create applications based on models, providing API (Application Programming Interface) calling capabilities. Users can call models into created applications through the API interface, and application management tools are also provided for application management and monitoring.

[0295] Furthermore, the computing device may also include data management (supporting the creation and management of model tuning datasets), a training center (providing abundant training resources to help users learn and master AI (Artificial Intelligence) technology), and basic control capabilities (providing enterprise-level basic control capabilities to ensure the security and efficient operation of the system). Through the above functions, it provides a comprehensive and integrated device for AI development, training, deployment, and application.

[0296] Furthermore, the components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 and the memory 910 can be connected via a bus.

[0297] The computing device 900 may also include an access device that enables the computing device 900 to communicate with a database storing data via one or more networks. Examples of such networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device may include one or more of any type of wired or wireless network interface (e.g., a network interface controller (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Wi-MAX (Worldwide Interoperability for Microwave Access) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.

[0298] In one embodiment of this disclosure, the aforementioned components of the computing device 900, as well as other components not shown in FIG. 9, may also be connected to each other, for example, via a bus. It should be understood that the computing device structural block diagram shown in FIG. 9 is merely for illustrative purposes and is not intended to limit the scope of this disclosure. Those skilled in the art can add or replace other components as needed.

[0299] The computing device 900 can be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or personal computers (PCs). The computing device 900 can also be a mobile or stationary server.

[0300] The above is an illustrative scheme of a computing device according to this embodiment. It should be noted that the technical solution of this computing device belongs to the same concept as the technical solution of any of the above methods, and any details not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of any of the above methods.

[0301] Figure 10 shows a structural block diagram of an electronic device 1000 provided according to an embodiment of the present disclosure.

[0302] The memory 1010 and the processor 1020 are connected via a bus 1030;

[0303] The memory 1010 is used to store computer programs / instructions, and the processor 1020 is used to execute the computer programs / instructions, which, when executed by the processor 1020, implement the steps of the method.

[0304] Specifically, the components of the electronic device 1000 include, but are not limited to, a memory 1010 and a processor 1020. The processor 1020 is connected to the memory 1010 via a bus 1030, and the database 1050 is used to store data.

[0305] Electronic device 1000 also includes access device 1040, which enables electronic device 1000 to communicate via one or more networks 1060. Examples of these networks include Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or combinations of communication networks such as the Internet. Access device 1040 may include one or more of any type of wired or wireless network interface (e.g., network interface controller (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Wi-MAX (Worldwide Interoperability for Microwave Access) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.

[0306] In one embodiment of this disclosure, the aforementioned components of the electronic device 1000, as well as other components not shown in FIG. 10, may also be connected to each other, for example, via a bus. It should be understood that the electronic device structural block diagram shown in FIG. 10 is merely for illustrative purposes and is not intended to limit the scope of this disclosure. Those skilled in the art can add or replace other components as needed.

[0307] Electronic device 1000 can be any type of stationary or mobile electronic device, including mobile computers or mobile electronic devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable electronic devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary electronic devices such as desktop computers or personal computers (PCs). Electronic device 1000 can also be a mobile or stationary server.

[0308] The above is an illustrative scheme of an electronic device according to this embodiment. It should be noted that the technical solution of this electronic device belongs to the same concept as the technical solution of any of the above methods, and any details not described in detail in the technical solution of the electronic device can be referred to the description of the technical solution of any of the above methods.

[0309] An embodiment of this disclosure also provides a computer-readable storage medium storing a computer program / instructions that, when executed by a processor, implement the steps of any of the above methods.

[0310] The various embodiments in this disclosure are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the computer-readable storage medium embodiments are basically similar to any of the above method embodiments, so the description is relatively simple; relevant parts can be referred to in the description of any of the above method embodiments.

[0311] An embodiment of this disclosure also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of any of the methods described above.

[0312] The above is an illustrative scheme of a computer program product according to this embodiment. It should be noted that the technical solution of this computer program product belongs to the same concept as the technical solution of any of the above methods, and any details not described in detail in the technical solution of the computer program product can be referred to the description of the technical solution of any of the above methods.

[0313] The foregoing has described specific embodiments of this disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0314] The computer instructions include computer program code, which may be in the form of source code, object code, executable file, or certain intermediate forms. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately added or removed according to the requirements of patent practice. For example, in some regions, according to patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0315] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments of this disclosure are not limited to the described order of actions, because according to the embodiments of this disclosure, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments of this disclosure.

[0316] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0317] The preferred embodiments disclosed above are merely illustrative of this disclosure. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments of this disclosure. These embodiments are selected and specifically described in this disclosure to better explain the principles and practical applications of the embodiments of this disclosure, thereby enabling those skilled in the art to better understand and utilize this disclosure. This disclosure is limited only by the claims and their full scope and equivalents.

Claims

1. A data processing method, comprising: determining an object generation task, wherein the object generation task is used to request the generation of a target object of document type; parsing the object generation task to obtain multiple object generation subtasks corresponding to the object generation task; determining, based on the task information of each object generation subtask, the object description information of the target object, wherein the object description information is descriptive information used to describe the target object; and performing object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks, and determining the target object corresponding to the object generation task based on multiple initial objects.

2. The data processing method according to claim 1, wherein parsing the object generation task to obtain multiple object generation sub-tasks corresponding to the object generation task includes: Based on the object information carried in the object generation task, a template object corresponding to the target object is determined, wherein the template object is a document-type template object; Based on the template object, the object generation task is parsed to obtain the multiple object generation subtasks.

3. The data processing method according to claim 2, wherein the object generation task includes object generation text; and parsing the object generation task based on the template object to obtain the multiple object generation subtasks includes: inputting the template object and the object generation text into the language processing model, wherein the object generation text is text used to describe the target object; in the language processing model, analyzing the object generation text based on the template object to obtain multiple object generation sub-texts for performing the object generation operation, wherein the multiple object generation sub-texts correspond to the execution steps of the object generation operation; and generating the multiple object generation subtasks based on the multiple object generation sub-texts.

4. The data processing method according to any one of claims 1 to 3, wherein the task information is a text vector; and determining, based on the task information of each object generation subtask, the object description information of the target object includes: vectorizing the object generation sub-text included in the target object generation subtask to obtain the text vector of the object generation sub-text, wherein the target object generation subtask is any one of the plurality of object generation subtasks; determining the information retrieval vector corresponding to each piece of candidate object description information in the data storage unit, and determining the similarity between the text vector and each information retrieval vector; and determining, from multiple similarities, a target similarity greater than or equal to a preset similarity threshold, and determining the candidate object description information corresponding to the target similarity as the object description information.

5. The data processing method according to claim 4, wherein determining the similarity between the text vector and the information retrieval vector comprises: The similarity between the text vector and the information retrieval vector is calculated using at least one of cosine similarity, Euclidean distance, or Manhattan distance.

6. The data processing method according to claim 4 or 5, wherein vectorizing the object generation sub-text included in the target object generation subtask comprises: converting the object generation sub-text into a numerical text vector; and determining the information retrieval vector corresponding to each piece of candidate object description information in the data storage unit comprises: converting the candidate object description information into a numerical information retrieval vector.
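The conversion of text into a numerical vector described in claim 6 can be pictured with a toy word-count vectorizer. This is a sketch under stated assumptions: the claims do not specify a vectorization technique, and the `vectorize` function and its fixed `vocabulary` parameter are hypothetical. A practical system would more likely use a learned embedding model.

```python
from collections import Counter
from typing import List, Sequence


def vectorize(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Count how often each vocabulary word occurs in the lower-cased text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that do not occur in the text.
    return [float(counts[word]) for word in vocabulary]
```

Applying the same function to both the sub-text and the candidate description information places both vectors in the same space, which is what makes the similarity comparison meaningful.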

7. The data processing method according to any one of claims 1 to 6, wherein the step of performing object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks, and determining the target object corresponding to the object generation task based on the plurality of initial objects, comprises: using a language processing model, performing object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks; and using the language processing model, performing object fusion on the plurality of initial objects to obtain the target object.

8. The data processing method according to claim 7, wherein the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation; the step of using a language processing model to perform object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks comprises: determining a target code generation subtask from a subtask sequence, wherein the subtask sequence contains a plurality of code generation subtasks ordered according to the execution steps of the code generation operation, and the target code generation subtask is the first code generation subtask in the subtask sequence; determining, from the code documentation, the target code documentation corresponding to the target code generation subtask, and using the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask; determining, from the subtask sequence, a candidate code generation subtask that is located after and adjacent to the target code generation subtask; and taking the candidate code generation subtask as the target code generation subtask and returning to the step of determining, from the code documentation, the target code documentation corresponding to the target code generation subtask, until no candidate code generation subtask remains.
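The iteration described in claim 8 can be sketched as a loop over the ordered subtask sequence. This is an illustrative sketch only: `generate_in_sequence`, `find_documentation`, and `generate_code` are hypothetical names, and the two callables stand in for the documentation lookup and the language processing model, neither of which is specified at this level in the claims.

```python
from typing import Callable, List, Tuple


def generate_in_sequence(
    subtask_sequence: List[str],
    find_documentation: Callable[[str], str],
    generate_code: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Walk the ordered subtasks and produce one initial code per subtask."""
    initial_codes: List[Tuple[str, str]] = []
    position = 0  # the first subtask in the sequence is the first target
    while position < len(subtask_sequence):
        target_subtask = subtask_sequence[position]
        # Pick the documentation that corresponds to the current target subtask.
        target_documentation = find_documentation(target_subtask)
        # Stand-in for the language processing model call.
        code = generate_code(target_subtask, target_documentation)
        initial_codes.append((target_subtask, code))
        # The adjacent following subtask becomes the next target; the loop
        # ends when no candidate subtask remains.
        position += 1
    return initial_codes
```

A usage example with stub callables: `generate_in_sequence(["load", "parse"], lambda t: "doc:" + t, lambda t, d: f"# {t} using {d}")` returns one `(subtask, code)` pair per subtask, in sequence order.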

9. The data processing method according to claim 8, wherein there are a plurality of initial codes; after using the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask, the method further comprises: taking the plurality of initial codes as a plurality of codes to be executed, and executing each code to be executed separately to obtain a code execution result of each code to be executed; determining, from the plurality of code execution results, a target code execution result that meets the code detection criteria; if there is one target code execution result, taking the code to be executed corresponding to the target code execution result as the initial code corresponding to the target code generation subtask; if there are a plurality of target code execution results, selecting one of the codes to be executed corresponding to the target code execution results as the initial code corresponding to the target code generation subtask; and if there is no target code execution result, selecting one of the plurality of codes to be executed as the initial code corresponding to the target code generation subtask.
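The three selection cases in claim 9 can be sketched as follows. This is a minimal sketch under assumptions: `choose_initial_code`, `execute`, and `meets_criteria` are hypothetical names, execution is injected as a callable (for example, a sandboxed runner) rather than performed here, and picking the first candidate when several or none qualify is an arbitrary choice, since the claim only requires that one be selected.

```python
from typing import Any, Callable, List


def choose_initial_code(
    codes_to_execute: List[str],
    execute: Callable[[str], Any],
    meets_criteria: Callable[[Any], bool],
) -> str:
    """Execute every candidate and keep one, preferring candidates that pass."""
    if not codes_to_execute:
        raise ValueError("at least one candidate code is required")
    # Execute each candidate separately and record its execution result.
    results = [execute(code) for code in codes_to_execute]
    passing = [
        code
        for code, result in zip(codes_to_execute, results)
        if meets_criteria(result)
    ]
    if len(passing) == 1:
        return passing[0]  # exactly one result meets the criteria
    if len(passing) > 1:
        return passing[0]  # several qualify: pick one (first, by assumption)
    return codes_to_execute[0]  # none qualify: still pick one (first, by assumption)
```

Falling back to an arbitrary candidate when nothing passes keeps the chain moving, so later subtasks and the final fusion step still receive an initial code for every subtask.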

10. The data processing method according to claim 8 or 9, wherein the step of using the language processing model to generate code based on the target code generation subtask and the target code documentation to obtain the initial code corresponding to the target code generation subtask comprises: if it is determined that the target code generation subtask is not the first subtask in the subtask sequence, determining, from the subtask sequence, a preceding subtask corresponding to the target code generation subtask, wherein the preceding subtask is the code generation subtask that precedes the target code generation subtask in the subtask sequence; determining the preceding initial code of the preceding subtask and the code execution result of the preceding initial code, wherein the code execution result is obtained by executing the preceding initial code; and inputting the target code generation subtask, the target code documentation, the preceding initial code, and the code execution result of the preceding initial code into the language processing model to generate code, thereby obtaining the initial code corresponding to the target code generation subtask.
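One way to picture the model input described in claim 10 is a prompt that gains two extra sections once a preceding subtask exists. This is a sketch under assumptions: the claims do not specify a prompt format, and `build_generation_prompt` and its section labels are hypothetical.

```python
from typing import Optional


def build_generation_prompt(
    subtask: str,
    documentation: str,
    preceding_code: Optional[str] = None,
    preceding_result: Optional[str] = None,
) -> str:
    """Assemble the model input for one subtask, adding prior context if present."""
    sections = [
        f"Subtask:\n{subtask}",
        f"Documentation:\n{documentation}",
    ]
    # Only subtasks that are not first in the sequence have a preceding subtask.
    if preceding_code is not None:
        sections.append(f"Preceding code:\n{preceding_code}")
        sections.append(f"Preceding execution result:\n{preceding_result}")
    return "\n\n".join(sections)
```

Passing the preceding execution result alongside the preceding code lets the model see what the earlier step actually produced, for example the shape of a value returned by an unfamiliar API, before it writes the next step.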

11. The data processing method according to any one of claims 7 to 10, wherein the initial object is initial code, the object generation task is a code generation task, the object generation subtask is a code generation subtask, and the object description information is code documentation; the step of using the language processing model to perform object fusion on the plurality of initial objects to obtain the target object corresponding to the object generation task comprises: determining the code execution result of each initial code; and inputting the plurality of code generation subtasks, the code documentation, the plurality of initial codes, and the code execution result corresponding to each initial code into the language processing model for code fusion to obtain the target code corresponding to the code generation task.

12. The data processing method according to any one of claims 1 to 11, wherein determining the object generation task comprises: receiving an object generation task for the target object sent by a client, wherein the object generation task is sent by the client when a user performs an object generation operation through an object generation interface; and after determining the target object corresponding to the object generation task based on the plurality of initial objects, the method further comprises: sending the target object to the client.

13. The data processing method according to any one of claims 3 to 12, wherein the step of performing object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks comprises: for each object generation subtask, generating a plurality of initial objects using the language processing model; executing each initial object separately to obtain an execution result of each initial object; and selecting, from the plurality of initial objects, one initial object based on the execution results as the initial object corresponding to the object generation subtask.

14. The data processing method according to any one of claims 7 to 11, wherein the step of using the language processing model to perform object fusion on the plurality of initial objects comprises: inputting the initial object, the execution result, and the object description information corresponding to each object generation subtask into the language processing model for object fusion.

15. A data processing method, comprising: determining a code generation task, wherein the code generation task is used to request the generation of target code; parsing the code generation task to obtain a plurality of code generation subtasks corresponding to the code generation task; determining, based on the task information of each code generation subtask, a code description document of the target code, wherein the code description document is a document used to describe the target code; and performing code generation based on the code generation subtasks and the code description document to obtain the initial code corresponding to each code generation subtask, and determining the target code corresponding to the code generation task based on the plurality of initial codes.

16. A data processing method, applied to a cloud-side device, comprising: receiving an object generation task sent by an end-side device, wherein the object generation task is used to request the generation of a target object of a document type; parsing the object generation task to obtain a plurality of object generation subtasks corresponding to the object generation task; determining, based on the task information of each object generation subtask, object description information of the target object, wherein the object description information is information used to describe the target object; performing object generation based on the object generation subtasks and the object description information to obtain the initial objects corresponding to the object generation subtasks, and determining the target object corresponding to the object generation task based on the plurality of initial objects; and sending the target object to the end-side device.

17. A computing device, comprising: a memory and a processor; the memory is configured to store a computer program / instructions, and the processor is configured to execute the computer program / instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 16.

18. An electronic device, comprising: a memory and a processor, the memory and the processor being connected via a bus; the memory is configured to store a computer program / instructions, and the processor is configured to execute the computer program / instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 16.

19. A computer-readable storage medium storing a computer program / instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 16.

20. A computer program product comprising a computer program / instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 16.