A method and system for automatically generating software development tasks based on multimodal requirements analysis

By using a multimodal requirements analysis method, various materials are acquired and preprocessed to generate a global fusion feature vector and a structured intent set. The particle swarm optimization algorithm is then used to automatically generate software development tasks, solving the problems of low efficiency and misunderstanding in traditional software requirements analysis. This achieves efficient and accurate multimodal information processing and task generation.

CN121785568B · Active · Publication Date: 2026-06-30 · HANGZHOU JIANXIN TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HANGZHOU JIANXIN TECHNOLOGY CO LTD
Filing Date: 2026-03-09
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Traditional software requirements analysis mainly relies on manual interpretation of text documents, which is inefficient and prone to misunderstandings. It is also difficult to handle multimodal information such as interface sketches, business process diagrams, and meeting recordings, resulting in missing information dimensions and incomplete semantic understanding.

Method used

A multimodal requirements analysis method is adopted. By acquiring and preprocessing text, image and audio materials, features of each modality are extracted and global fusion feature vectors are generated and mapped to a unified semantic space. Structured intent sets are extracted, and the optimal task generation strategy is calculated using the particle swarm optimization algorithm. This automatically generates API interface design, database schema design and test case framework.

Benefits of technology

It achieves efficient and comprehensive semantic understanding of multimodal information, reduces bias in understanding requirements, and improves the efficiency and accuracy of software development task generation.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of software development data processing technology, and in particular to a method and system for automatically generating software development tasks based on multimodal requirement analysis. The method includes: acquiring user-uploaded materials to be processed, and preprocessing the materials to obtain preprocessed materials; extracting modal features of each modality in the preprocessed materials, and generating a global fusion feature vector based on the modality number of the preprocessed materials, so as to map each modal feature to a unified semantic space; extracting element information of the global fusion feature vector, and organizing the element information according to a preset structured format to obtain a structured intent set; calculating the fitness value of each particle based on the structured intent set, preset information of the particles, and preset evaluation indicators using a particle swarm optimization algorithm; iteratively updating the position and velocity of the particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value, thereby achieving comprehensive and accurate requirement analysis of multimodal information.

Description

Technical Field

[0001] This application relates to the field of software development data processing technology, and in particular to a method and system for automatically generating software development tasks based on multimodal requirement analysis.

Background Technology

[0002] Software requirements analysis is a core preliminary step in the software development process. Essentially, it transforms users' vague business goals and functional expectations into clear and feasible technical requirements definitions, providing a unified benchmark for subsequent design, coding, and testing, and avoiding project rework or failure due to misunderstandings of requirements.

[0003] Traditional software requirements analysis relies heavily on manual interpretation of text documents, which is not only inefficient but also prone to omissions or errors due to misunderstandings. Statistics show that nearly 70% of software project problems can be traced back to the requirements definition phase. With technological advancements, automated requirements analysis tools based on a single modality (such as plain text) have emerged. These tools utilize natural language processing technology to parse requirements documents and generate preliminary requirement entries.

[0004] However, such methods struggle to handle the multimodal information that is widespread in practice, such as non-text materials like user-provided interface sketches, business process diagrams, and meeting recordings. They have inherent limitations such as missing information dimensions and incomplete semantic understanding.

Summary of the Invention

[0005] In order to conduct comprehensive and accurate requirements analysis of multimodal information, this application provides a method and system for automatically generating software development tasks based on multimodal requirements analysis.

[0006] Firstly, this application provides a method for automatically generating software development tasks based on multimodal requirements analysis, employing the following technical solution:

[0007] A method for automatically generating software development tasks based on multimodal requirements analysis, the method comprising:

[0008] The system obtains user-uploaded materials to be processed and preprocesses the materials to be processed to obtain preprocessed materials, wherein the materials to be processed are at least one of text modality, image modality, and audio modality;

[0009] Modal features of each modality in the preprocessed material are extracted and combined with the number of modalities in the preprocessed material to generate a global fusion feature vector, so as to map each modal feature to a unified semantic space;

[0010] Extract the element information from the global fusion feature vector, and organize the element information according to a preset structured format to obtain a structured intent set. The element information includes a first core intent, sub-intents, and constraints. The structured intent set includes a second core intent, a list of sub-intents, and a list of constraints.

[0011] Using the particle swarm optimization algorithm, based on the structured intent set, the preset information of the particles, and the preset evaluation index, the fitness value of each particle is calculated. The preset information of the particles includes the definition of dimension, position, and velocity. The dimension includes API interface design, database schema design, and test case scheme.

[0012] Iteratively update the position and velocity of the particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.
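
The structured intent set named in the steps above (a core intent, a list of sub-intents, and a list of constraints) can be pictured as a small data structure. The following is only a minimal sketch: the class name `StructuredIntentSet`, the `organize` helper, the dictionary keys, and the de-duplication step are all illustrative assumptions and are not defined by this application.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class StructuredIntentSet:
    """Hypothetical container for the preset structured format."""
    core_intent: str
    sub_intents: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)


def organize(elements: Dict) -> StructuredIntentSet:
    """Organize extracted element information into the structured format.

    Duplicate sub-intents and constraints are dropped while keeping their
    original order (an assumption made for this sketch).
    """
    def dedupe(items):
        return list(dict.fromkeys(items))

    return StructuredIntentSet(
        core_intent=elements["core_intent"].strip(),
        sub_intents=dedupe(elements.get("sub_intents", [])),
        constraints=dedupe(elements.get("constraints", [])),
    )


# Illustrative element information for a shopping cart requirement.
intent_set = organize({
    "core_intent": " shopping cart management ",
    "sub_intents": ["add product", "modify quantity", "delete product",
                    "checkout", "add product"],
    "constraints": ["products cannot be added when inventory is insufficient"],
})
print(json.dumps(asdict(intent_set), indent=2))
```

Keeping the sub-intents and constraints as ordered lists mirrors the "list of sub-intents" and "list of constraints" wording of the method; any richer schema would be a design choice outside the text.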

[0013] In one possible implementation, the step of extracting modal features of each modality in the preprocessed material and generating a global fusion feature vector by combining the number of modalities in the preprocessed material includes:

[0014] Modal features of each modality are extracted from the preprocessed material, and the number of modalities is determined according to the types of modal features. The modal features corresponding to the text modality, image modality, and audio modality are text feature vector, image feature vector, and audio feature vector, respectively.

[0015] Determine the number of modalities contained in the preprocessed material.

[0016] When the number of modalities is 2, the modal features corresponding to each modality are concatenated or element-wise operated to generate a global fusion feature vector;

[0017] When the number of modalities is 3, each modal feature is independently encoded to generate an independent feature vector for each modality. The dynamic weight of each modality is calculated based on the importance of each modal feature. The independent feature vectors are then weighted and averaged to generate a global fused feature vector.
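
The branching on the number of modalities described above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names, the use of a softmax to turn importance scores into dynamic weights, the single-modality pass-through, and the assumption that the independently encoded vectors share one dimension are all choices made for the example. The application does not specify how importance is measured.

```python
import math


def softmax(scores):
    """Turn raw importance scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, importance=None):
    """Fuse per-modality feature vectors into one global vector.

    features:   dict mapping modality name -> list of floats.
    importance: dict mapping modality name -> raw importance score
                (hypothetical input; defaults to equal importance).
    """
    names = sorted(features)
    if len(names) == 1:
        # Single modality: passed through unchanged (assumption).
        return list(features[names[0]])
    if len(names) == 2:
        # Two modalities: concatenation, one of the two options in the text.
        return list(features[names[0]]) + list(features[names[1]])
    # Three modalities: weighted average of same-length encoded vectors.
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("encoded vectors must share one dimension")
    importance = importance or {n: 0.0 for n in names}
    weights = softmax([importance[n] for n in names])
    return [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]


two = fuse({"text": [1.0, 2.0], "image": [3.0, 4.0]})
three = fuse({"text": [3.0, 0.0], "image": [0.0, 3.0], "audio": [3.0, 3.0]})
print(two, three)
```

With equal importance scores the three-modality branch reduces to a plain average; supplying different scores shifts the fused vector toward the modality judged more important.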

[0018] In one possible implementation, the step of calculating the fitness value of each particle using a particle swarm optimization algorithm, based on the structured intent set, preset information about the particles, and preset evaluation metrics, includes:

[0019] Based on the structured intent set, generate API interface documentation, data definition statements for creating database table structures, and test case framework;

[0020] Based on the API interface documentation, data definition statements for creating database table structures, and test case frameworks, define the preset information of particles;

[0021] Initialize the particle swarm and randomly generate the position and velocity of each particle, wherein the particle swarm includes a preset number of particles;

[0022] Initialize the individual optimal position of each particle and the globally optimal position randomly set in the particle swarm, and obtain the fitness value of each particle according to the preset evaluation index and the corresponding rule weight.

[0023] In one possible implementation, obtaining the fitness value for each particle includes:

[0024] For each particle, if the current fitness value is greater than the initial fitness value, update the current best position of the individual to the current position of the particle;

[0025] For a swarm of particles, if the highest current fitness value among all particles is greater than the initial fitness value of the particle at the initial global best position, then update the current global best position to the current position of the particle with the highest corresponding fitness value.

[0026] In one possible implementation, the iterative updating of particle positions and velocities to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value includes:

[0027] The iteration update velocity is obtained based on the inertia weight, acceleration constants, and random numbers, and the current iteration velocity of the particle is determined based on the iteration update velocity and the velocity boundary.

[0028] The iteration update position is obtained based on the current particle iteration velocity and the previous position of the corresponding particle, and the current particle iteration position is determined based on the iteration update position and the position boundary.

[0029] Based on the fitness values of all particles in each iteration round within the preset maximum number of iterations, update the individual optimal position and global optimal position of each particle;

[0030] The task generation strategy corresponding to the particle at the global optimal position is the optimal task generation strategy.
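
The iteration described above follows the general shape of a standard particle swarm update with velocity and position clamping. The sketch below is a generic textbook-style version under stated assumptions: the parameter values (`inertia`, `c1`, `c2`, `v_max`), the toy `score` function, and the three-dimensional search box are illustrative only, and the global best is seeded from the best initial particle rather than set randomly. The application's actual fitness is computed from preset evaluation indicators and rule weights that are not reproduced here.

```python
import random


def clamp(value, low, high):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def pso(fitness, dim, bounds, v_max, n_particles=20, max_iter=50,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over the box [bounds[0], bounds[1]]^dim."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward individual and global bests.
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = clamp(v, -v_max, v_max)                   # velocity boundary
                pos[i][d] = clamp(pos[i][d] + vel[i][d], low, high)  # position boundary
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def score(strategy):
    """Toy fitness: highest when every dimension equals 0.5."""
    return -sum((x - 0.5) ** 2 for x in strategy)


# Three dimensions stand in for API design, database schema, and test cases.
best, best_fit = pso(score, dim=3, bounds=(0.0, 1.0), v_max=0.2)
print(best, best_fit)
```

The particle at the returned global best position plays the role of the optimal task generation strategy; how a position vector is decoded into concrete API, schema, and test artifacts is left open by this sketch.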

[0031] In one possible implementation, after determining the task generation strategy corresponding to the particle at the global optimal position as the optimal task generation strategy, the process includes:

[0032] Obtain the feedback data for a preset period and label the feedback data with attribute tags, including problem type, associated module, associated parameter, and degree of impact;

[0033] Multiple feedback data items are merged based on problem type and associated parameters, and the frequency of the feedback data for each problem type is counted.

[0034] The comprehensive weight of the feedback data for each problem type is obtained based on the degree of impact and the proportion of frequency.

[0035] The unified adjustment value is calculated based on the parameters of the associated module;

[0036] Adjust the corresponding associated parameters according to the ranking based on the comprehensive weights;

[0037] Verify the results of the adjusted parameters based on the optimization rules.

[0038] In one possible implementation, the step of calculating the unified adjustment value of the parameters based on the associated module includes:

[0039] When the adjustment directions of the same parameter are opposite, calculate the sum of the comprehensive weights of the positive and negative directions, and determine the difference between the comprehensive weights of the positive and negative directions as the basis for adjustment.
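
The feedback weighting and the handling of opposite adjustment directions can be sketched as below. The application does not give the formula for the comprehensive weight, so multiplying the mean degree of impact by the frequency proportion is an assumption made purely for illustration, as are the field names, the `+1` / `-1` direction encoding, and the example parameter names.

```python
from collections import defaultdict


def comprehensive_weights(feedback):
    """Merge feedback by (problem type, parameter, direction) and weight it.

    Each feedback item is a dict with the hypothetical keys
    problem_type, parameter, direction (+1 or -1), and impact (0..1).
    """
    total = len(feedback)
    groups = defaultdict(list)
    for item in feedback:
        key = (item["problem_type"], item["parameter"], item["direction"])
        groups[key].append(item["impact"])
    weights = {}
    for key, impacts in groups.items():
        frequency_share = len(impacts) / total
        mean_impact = sum(impacts) / len(impacts)
        # Assumed combination rule: impact scaled by frequency proportion.
        weights[key] = mean_impact * frequency_share
    return weights


def net_adjustment(weights):
    """Sum positive and negative weights per parameter and keep the difference."""
    net = defaultdict(float)
    for (_problem_type, parameter, direction), weight in weights.items():
        net[parameter] += direction * weight
    return dict(net)


feedback = [
    {"problem_type": "missing test case", "parameter": "test_coverage_weight",
     "direction": +1, "impact": 0.8},
    {"problem_type": "missing test case", "parameter": "test_coverage_weight",
     "direction": +1, "impact": 0.6},
    {"problem_type": "redundant test case", "parameter": "test_coverage_weight",
     "direction": -1, "impact": 0.4},
    {"problem_type": "slow query", "parameter": "index_weight",
     "direction": +1, "impact": 0.5},
]
weights = comprehensive_weights(feedback)
net = net_adjustment(weights)
print(net)
```

Here the two opposing requests on the same parameter partially cancel, and the sign of the remaining difference gives the adjustment direction. Parameters would then be adjusted in order of their comprehensive weights.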

[0040] Secondly, this application provides an automatic software development task generation system based on multimodal requirements analysis, which adopts the following technical solution:

[0041] A software development task automatic generation system based on multimodal requirements analysis, the system comprising:

[0042] The receiving module is used to acquire user-uploaded materials to be processed and to preprocess the materials to be processed to obtain preprocessed materials, wherein the materials to be processed are at least one of text modality, image modality, and audio modality;

[0043] The fusion module is used to extract modal features of each modality in the preprocessed material and generate a global fusion feature vector by combining the number of modalities in the preprocessed material, so as to map each modal feature to a unified semantic space;

[0044] The processing module is used to extract the element information of the global fusion feature vector and organize the element information according to a preset structured format to obtain a structured intent set. The element information includes a first core intent, sub-intents, and constraints. The structured intent set includes a second core intent, a list of sub-intents, and a list of constraints.

[0045] The calculation module is used to calculate the fitness value of each particle using the particle swarm optimization algorithm, based on the structured intent set, the preset information of the particles, and the preset evaluation index. The preset information of the particles includes the definition of dimension, position, and velocity. The dimension includes API interface design, database schema design, and test case scheme.

[0046] The generation module is used to iteratively update the position and velocity of particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.

[0047] Thirdly, this application provides an electronic device that adopts the following technical solution:

[0048] An electronic device includes: a memory and a processor;

[0049] The memory stores computer-executed instructions;

[0050] The processor executes computer execution instructions stored in the memory, causing the processor to perform the first aspect and / or various possible implementations of the first aspect as described above.

[0051] Fourthly, this application provides a computer-readable storage medium, which adopts the following technical solution:

[0052] A computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the first aspect and / or various possible embodiments of the first aspect as described above.

[0053] This application provides a method and system for automatically generating software development tasks based on multimodal requirements analysis. By supporting the acquisition and preprocessing of multimodal materials such as text, images, and audio, it overcomes the limitations of traditional single-modal requirements analysis. It enables the processing of non-text materials such as user-provided interface sketches, business process diagrams, and meeting recordings, reducing the loss of information dimensions. Furthermore, by extracting features from each modality and combining them with the number of modalities to generate a global fusion feature vector, it maps different modal features into a unified semantic space, avoiding fragmentation in the semantic understanding of multimodal information and making that understanding more comprehensive. Then, by extracting the element information of the global fusion feature vector and organizing it into a structured intent set according to a preset structured format, it transforms ambiguous user requirements into a clear core intent, sub-intent list, and constraint list, reducing requirement comprehension bias. Finally, by calculating particle fitness values based on the structured intent set, the preset information of the particles, and the preset evaluation indicators, and by iteratively updating particle positions and velocities to determine the optimal task generation strategy, it achieves automated generation of API interface design, database schema design, and test case framework construction, replacing the traditional manual generation of software development tasks and improving the efficiency of requirements analysis and task generation.

Attached Figure Description

[0054] Figure 1 is a flowchart illustrating an automatic software development task generation method based on multimodal requirements analysis, provided as an embodiment of this application.

[0055] Figure 2 is a flowchart illustrating an automatic software development task generation method based on multimodal requirements analysis, provided as an embodiment of this application.

[0056] Figure 3 is a flowchart illustrating an automatic software development task generation method based on multimodal requirements analysis, provided as an embodiment of this application.

[0057] Figure 4 is a schematic diagram of the structure of an automatic software development task generation system based on multimodal requirements analysis, provided as an embodiment of this application.

[0058] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0059] To better understand the purpose, technical solutions, and advantages of this application, it is described and illustrated below with reference to the accompanying drawings and embodiments. However, those skilled in the art should understand that this application can be implemented without these details. In some cases, to avoid obscuring aspects of this application with unnecessary description, well-known methods, processes, systems, components, and / or circuits are described only at a high level and are not elaborated upon. It will be apparent to those skilled in the art that various modifications can be made to the embodiments disclosed in this application, and the general principles defined in this application can be applied to other embodiments and application scenarios without departing from the principles and scope of this application. Therefore, this application is not limited to the illustrated embodiments, but conforms to the broadest scope consistent with the scope of protection claimed in this application.

[0060] It should be noted that the descriptions of these embodiments are for the purpose of aiding understanding the present invention, but do not constitute a limitation thereof. Furthermore, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

[0061] It should be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For hardware implementations, the processor may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described herein, or combinations thereof.

[0062] When an embodiment is implemented as software, firmware, middleware, microcode, program code, or code segments, it may be stored in a machine-readable medium, such as a storage component. A code segment may represent a procedure, function, subprogram, program, routine, subroutine, module, software package, class, or any combination of instructions, data structures, or program statements. One code segment can be coupled to another code segment or hardware circuitry by passing and / or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc., can be passed, forwarded, or transmitted using any suitable means, including memory sharing, messaging, token passing, network transmission, etc.

[0063] For software implementations, the techniques described herein can be implemented using modules (e.g., programs, functions, etc.) that perform the functions described herein. The software code can be stored in memory units and executed by a processor. The memory units can be implemented within or outside the processor; in the latter case, the memory units can be communicatively coupled to the processor via various methods known in this art.

[0064] In the description of this application, "several" means one or more, and "multiple" means two or more. "Greater than," "less than," and "exceeding" are understood to exclude the stated number, while "above," "below," and "within" are understood to include the stated number. The use of "first" and "second" in the description is merely for distinguishing technical features and should not be construed as indicating or implying relative importance, or implicitly indicating the number of indicated technical features, or implicitly indicating the order of the indicated technical features.

[0065] In the description of this application, the terms "one embodiment," "some embodiments," "illustrative embodiment," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any one or more embodiments or examples.

[0066] Current approaches, whether manual analysis of software requirements from text documents or automated requirements analysis tools that rely solely on plain text, are inefficient and suffer from missing information dimensions and incomplete semantic understanding.

[0067] To address the aforementioned technical issues, this application provides a method and system for automatically generating software development tasks based on multimodal requirements analysis, which efficiently, comprehensively, and accurately analyzes multimodal information for requirements.

[0068] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.

[0069] Figure 1 is a flowchart of an automatic software development task generation method based on multimodal requirements analysis, provided as an embodiment of this application. As shown in Figure 1, this application discloses a method for automatically generating software development tasks based on multimodal requirements analysis. The method includes:

[0070] S101. Obtain the user-uploaded material to be processed, and preprocess the material to be processed to obtain preprocessed material.

[0071] Users upload materials to be processed through a visual interactive interface or a file upload interface. The materials to be processed can be at least one of the following: text modality, image modality, and audio modality.

[0072] Users can upload materials to be processed through web-based upload, API interface integration, or local import.

[0073] Text-based materials typically include requirements documents, user stories, and feature description documents, and can be in formats such as txt, word, excel, ppt, and pdf.

[0074] For example, a user uploads a Word document titled "E-commerce APP Shopping Cart Product Requirements Document". The document contains descriptions and constraints of functions such as adding products, modifying quantities, deleting products, and checkout in the shopping cart, such as "Products cannot be added to the shopping cart when inventory is insufficient".

[0075] Image-based materials include UI sketches, business process diagrams, prototypes, etc., and can be in formats such as jpg, png, and gif.

[0076] For example, a user uploads a PNG file of a "UI sketch of an e-commerce app shopping cart page". The sketch includes the layout and style annotations of the product list area, quantity modification button, delete button, and checkout button.

[0077] Audio materials include recordings of requirement discussion meetings and audio recordings of users stating their requirements, and can be in formats such as wav and mp3.

[0078] For example, a user uploaded an MP3 recording of a discussion about the requirements of the shopping cart function in an e-commerce app. The recording mentioned that "it is necessary to add a function to calculate the shipping cost when shopping cart items are settled across stores".

[0079] Furthermore, based on modal classification, the system can automatically identify the types of materials uploaded by users for processing, for example:

[0080] Text modality is determined by file header features (such as the %PDF-1.7 header of a PDF file) and file extension (txt / docx / xlsx);

[0081] Image modality is determined by pixel features (such as RGB encoding of JPG) and file extension (jpg / png / svg);

[0082] Audio modality is determined by audio sampling characteristics (such as the frame structure of MP3) and file extensions (wav / mp3 / flac).
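
The combination of header features and file extensions described above can be sketched with a signature lookup followed by an extension fallback. The byte signatures used here (`%PDF-`, the JPEG `FF D8 FF` marker, the PNG signature, `GIF8`, `RIFF`...`WAVE`, `ID3`, and `fLaC`) are the well-known ones for those formats. The function name, the extension table, and the rule that the header takes priority over the extension are assumptions for illustration.

```python
from pathlib import Path

# Leading bytes of common formats, mapped to the modality they indicate.
SIGNATURES = [
    (b"%PDF-", "text"),
    (b"\xff\xd8\xff", "image"),        # JPEG start-of-image marker
    (b"\x89PNG\r\n\x1a\n", "image"),   # PNG signature
    (b"GIF8", "image"),                # GIF87a / GIF89a
    (b"ID3", "audio"),                 # MP3 file carrying an ID3v2 tag
    (b"fLaC", "audio"),                # FLAC stream marker
]

EXTENSIONS = {
    "txt": "text", "docx": "text", "xlsx": "text", "pdf": "text",
    "jpg": "image", "png": "image", "svg": "image", "gif": "image",
    "wav": "audio", "mp3": "audio", "flac": "audio",
}


def detect_modality(filename, head):
    """Classify an upload from its first bytes, falling back to its extension."""
    # WAV is a RIFF container whose form type at offset 8 is "WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio"
    for signature, modality in SIGNATURES:
        if head.startswith(signature):
            return modality
    # Formats without a distinctive header here (for example docx, which is a
    # generic ZIP container) are resolved by extension.
    suffix = Path(filename).suffix.lower().lstrip(".")
    return EXTENSIONS.get(suffix, "unknown")


print(detect_modality("requirements.pdf", b"%PDF-1.7\n"))
print(detect_modality("meeting.wav", b"RIFF\x00\x00\x00\x00WAVEfmt "))
```

Checking the header before the extension means a mislabeled upload (for instance a PNG saved with a `.bin` suffix) is still routed to the right preprocessing path.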

[0083] After receiving the materials uploaded by the user, the first step is to preprocess the materials, and different processing methods are used for materials of different modalities.

[0084] For text-based materials, text content can be extracted using document parsing tools. For example, Apache POI can be used to extract text content from Word or Excel format materials, and PDFBox can be used to extract text content from PDF format materials. After extracting the text content, formatting marks, including fonts, colors, and typographical symbols, are removed. Then, the text is converted into a plain text string, and finally, it is split according to preset text segmentation rules and stored as a JSON format text dataset.

[0085] Taking the aforementioned "E-commerce APP Shopping Cart Product Requirements Document" as an example, in the generated JSON dataset after conversion, each function point corresponds to one data point. For example, the "Add Item to Shopping Cart" function point includes fields such as "Function Name", "Function Description", and "Constraints".
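
A record-per-function-point dataset of the kind described above could be produced along the following lines. The segmentation rule is not given in this application, so the one used here is purely hypothetical: function points are separated by blank lines, the first line is the function name, and lines starting with `Constraint:` are constraints. Only the three field names come from the text.

```python
import json


def split_function_points(plain_text):
    """Split plain text into one record per function point (assumed rule)."""
    records = []
    for block in plain_text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        name, rest = lines[0], lines[1:]
        prefix = "Constraint:"
        constraints = [line[len(prefix):].strip()
                       for line in rest if line.startswith(prefix)]
        description = " ".join(line for line in rest
                               if not line.startswith(prefix))
        records.append({
            "Function Name": name,
            "Function Description": description,
            "Constraints": constraints,
        })
    return records


sample = """Add Item to Shopping Cart
The user adds a selected product to the shopping cart.
Constraint: Products cannot be added to the shopping cart when inventory is insufficient

Modify Quantity
The user increases or decreases the quantity of a product in the cart.
"""
dataset = split_function_points(sample)
print(json.dumps(dataset, indent=2, ensure_ascii=False))
```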

[0086] For image-based materials, image preprocessing tools such as OpenCV can be used to uniformly scale the images to a preset resolution, then convert them to RGB color mode, remove image noise, convert the image data to tensor format, and record metadata such as the original size and format of the image, storing it in a structured database.

[0087] For the aforementioned "e-commerce APP shopping cart page UI sketch", the tensor data generated after processing can be directly read by the subsequent visual analysis model, while the metadata is used for subsequent result backtracking.

[0088] For audio materials, audio processing tools such as FFmpeg can be used to convert the audio into a uniform 16kHz sampling rate, 16-bit depth mono WAV format, remove background noise from the audio, and extract metadata such as audio duration and bit rate, and store it in an object storage service.
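
The format conversion step can be illustrated by building an FFmpeg command line from Python. This is a sketch under assumptions: the helper names and file paths are hypothetical, noise removal and metadata extraction are not shown, and the `convert` function requires an `ffmpeg` binary on the system path. The options used are standard FFmpeg ones: `-ar` sets the sampling rate, `-ac` the channel count, and `-c:a pcm_s16le` selects 16-bit little-endian PCM, the usual WAV encoding.

```python
import shlex
import subprocess


def build_ffmpeg_command(src, dst):
    """Return the argument list for a 16 kHz, 16-bit, mono WAV conversion."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", src,             # input file in any supported format
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sampling rate
        "-c:a", "pcm_s16le",   # 16-bit PCM samples
        dst,
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)


command = build_ffmpeg_command("meeting.mp3", "meeting_16k.wav")
print(shlex.join(command))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when uploaded file names contain spaces.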

[0089] The audio recording of the discussion on the shopping cart function requirements of the e-commerce APP mentioned above, after being converted, can be further processed by a subsequent audio processor.

[0090] The materials to be processed are preprocessed to obtain preprocessed materials, which are JSON format text datasets, tensor format image data, and WAV format audio.

[0091] S102. Extract the modal features of each modality in the preprocessed material, and generate a global fusion feature vector by combining the number of modalities in the preprocessed material, so as to map each modal feature to a unified semantic space.

[0092] Each modality has a dedicated analytical model, which is used to extract modal features from the preprocessed material of that modality. For example, key features can be entity information in text, interface elements in images, and semantic content in audio.

[0093] For the text modality, a pre-trained Transformer model is used to process the text dataset in the pre-processed material. Specifically, the Transformer model can be the BERT model, which is a natural language processing model based on bidirectional Transformer. The BERT model can better understand the semantic context.

[0094] The Transformer model first analyzes the text, breaking down continuous text into meaningful words or sub-words. For example, "cross-store settlement" can be broken down into "cross", "store", and "settlement".

[0095] The word segmentation results are then converted into word vectors through an embedding layer, and the semantic information of words is represented by these numerical vectors.

[0096] Finally, the Transformer encoder captures the contextual dependencies in the text, extracts key entities (such as the function "add items to shopping cart" and the constraint "insufficient inventory") and semantic relationships (such as "adding items" requires "sufficient inventory"), and generates text feature vectors.

[0097] For example, for the text "Adding items to shopping cart, unable to add when inventory is insufficient", the extracted entities include "Adding items to shopping cart" (function point) and "Insufficient inventory" (constraint condition), and the semantic relationship is "Insufficient inventory → Prevent adding items". The corresponding text feature vector can quantify this semantic information.

[0098] For image modalities, a model architecture combining Convolutional Neural Networks (CNNs, which excel at extracting local features such as edges and textures) and Visual Transformers (ViTs, which excel at capturing global dependencies such as the overall layout of image interface elements) is used to process the standardized image tensors. First, local features of the image are extracted through the convolutional and pooling layers of the CNN, such as the boundaries of the "product list area" and the shape of the "checkout button" in the UI sketch. Then, the local feature maps extracted by the CNN are segmented into multiple image patches and input into the ViT model. A self-attention mechanism captures the relationships between different image patches, such as the positional adjacency between the "product list area" and the "quantity modification button," identifying interface elements such as buttons, input boxes, and lists; layout structures such as top-bottom and left-right layouts; and interactive controls such as clickable buttons and input text boxes, generating image feature vectors.

[0099] For example, in the "UI sketch of the shopping cart page of an e-commerce APP", the extracted interface elements include "product list area", "quantity modification button", and "checkout button". The layout structure is "product list area at the top, quantity modification button and checkout button on the right side of the product item". The interactive control attributes are "quantity modification button is clickable (to increase / decrease quantity) and checkout button is clickable (to trigger the checkout process)". The corresponding image feature vector quantifies this visual information.

[0100] For the audio modality, the standardized WAV format speech is converted into Mel spectrograms by an audio processor. Mel spectrograms can better reflect the semantic information in speech. Then, the audio Transformer model is used to process the Mel spectrograms, and the temporal dependencies in speech are captured through a self-attention mechanism, such as the semantic relationship between "cross-store settlement" and "shipping fee calculation". Combined with ASR (automatic speech recognition, which converts speech to text) technology, the speech is converted into text. Then, based on the converted text, key semantic information is extracted, such as "add cross-store settlement shipping fee calculation function", and a speech feature vector is generated.

[0101] For example, the speech "I need to add shipping cost calculation function when checking out items in shopping cart across stores" contains key semantic information such as "cross-store checkout", "shipping cost calculation" and "add function" in both the converted text and speech feature vectors.

[0102] After extracting modal features from each modality, it is necessary to perform feature fusion and semantic alignment because the information dimensions carried by the modal features of different modalities differ. Text focuses on logical description, images focus on visual layout, and audio focuses on speech semantics. Feature fusion and semantic alignment can integrate fragmented information scattered in each modality and make up for the lack of information in a single modality. For example, the logical description of "shopping cart checkout function" in the text, the visual position of the checkout button in the image, and the speech semantics of "cross-store checkout" mentioned in the audio can be fused to form a complete chain of demand information.

[0103] Furthermore, the semantic representations of modal features differ across modalities, and direct use can lead to semantic gaps. By fusing and aligning all modal features to a unified semantic space, consistent semantic measurement standards can be achieved for different modal features, avoiding misunderstandings of requirements due to differences in modal semantics. The fused global feature vector not only contains modal features from each modality but also reflects the relationships between modalities, reducing omissions in requirement analysis due to insufficient features.

[0104] S103. Extract the element information of the global fusion feature vector, and organize the element information according to the preset structured format to obtain the structured intent set.

[0105] Based on the global fusion feature vector after cross-modal fusion, the element information of the requirement is extracted by combining the preset requirement intent template and the semantic parsing module of the knowledge graph. The element information includes the first core intent, sub-intents and constraints.

[0106] The global fusion feature vector can be mapped into a semantic label sequence containing "functional keywords, relational labels, and conditional keywords" using the improved GPT-4V multimodal large model.

[0107] Example: E-commerce shopping cart requirements are generated after decoding:

[0108] [Function keywords: shopping cart checkout; Relationship tag: contain; Function keywords: cross-store shipping fee calculation; Condition keywords: insufficient stock; Relationship tag: block; Function keywords: add product].

[0109] Extraction of the first core intent:

[0110] Filter out "functional keywords" from semantic tags, such as "shopping cart checkout", "add items", and "shipping fee calculation";

[0111] The top-level keyword is determined by "relationship tags" (such as "contains" or "depends on") or "functional priority table" (terminal function > operation function): if "shopping cart checkout" contains "shipping fee calculation" and depends on "add products", then "shopping cart checkout function implementation" is the primary core intent.

[0112] Extraction of sub-intent:

[0113] Filter out "functional keywords belonging to the primary core intent" and categorize them according to "data manipulation / interface interaction / logical calculation";

[0114] Keywords irrelevant to the core intent are removed, such as "user registration" which is unrelated to "shopping cart checkout". This results in sub-intents such as "cross-store shipping fee calculation", "adding items to shopping cart", and "modifying product quantity".

[0115] Constraint extraction:

[0116] Filter out "condition keywords" from the semantic labels, such as "insufficient inventory" or "amount over 200 yuan";

[0117] Associate corresponding functions through "relationship tags" (such as "block" and "allow"), and organize constraint statements according to "condition + relationship + function", such as "insufficient inventory → block adding products" and "amount over 200 yuan → allow free shipping".
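The "condition + relationship + function" rule above can be illustrated with a minimal sketch. It assumes the semantic label sequence is available as a list of (kind, value) pairs; that encoding, the function name `extract_constraints`, and the set `CONSTRAINT_RELATIONS` are hypothetical and are not specified by the application.

```python
from typing import List, Tuple

# Label sequence from the shopping cart example, encoded as (kind, value)
# pairs. The tuple encoding is an assumption made for this sketch.
LABELS: List[Tuple[str, str]] = [
    ("function", "shopping cart checkout"),
    ("relation", "contain"),
    ("function", "cross-store shipping fee calculation"),
    ("condition", "insufficient stock"),
    ("relation", "block"),
    ("function", "add product"),
]

# Relationship tags that turn a condition into a constraint.
CONSTRAINT_RELATIONS = {"block", "allow"}


def extract_constraints(labels: List[Tuple[str, str]]) -> List[str]:
    """Collect 'condition -> relation function' statements from adjacent labels."""
    constraints = []
    for i in range(len(labels) - 2):
        (k1, v1), (k2, v2), (k3, v3) = labels[i], labels[i + 1], labels[i + 2]
        is_triple = (k1, k2, k3) == ("condition", "relation", "function")
        if is_triple and v2 in CONSTRAINT_RELATIONS:
            constraints.append(f"{v1} → {v2} {v3}")
    return constraints


print(extract_constraints(LABELS))
```

Only the triple that starts with a condition keyword is kept, so the "contain" relation between two function keywords is not mistaken for a constraint.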

[0118] Finally, the rule engine is used to verify the logical consistency of the elements, such as whether the sub-intents cover what is needed to implement the core intent and whether any constraints contradict each other. For example, if no sub-intent covers "shipping fee calculation", the system prompts for it to be added; if the constraints "allow addition when inventory is insufficient" and "prevent addition when inventory is insufficient" conflict, the latter, which matches the semantic tags, is retained.

[0119] After the element information is extracted, the verified element information is reorganized according to a preset structured format to generate a structured result containing a second core intent, a list of sub-intents, and a list of constraints.

[0120] Specifically, the preset structured format definition can be:

[0121] When the structure field is the second core intent, the data type is text (sentence), and the format requirement is to include "domain + first core intent + goal";

[0122] When the structure field is a list of sub-intents, the data type is an array (text), and each sub-intent contains "name + type + associated element";

[0123] When the structure field is a list of constraints, the data type is an array (text), and each constraint contains "constraint object + rule + associated sub-intent".

[0124] The second core intent generation supplements the first core intent with "domain information", which can usually be extracted from the material's metadata, such as "e-commerce APP".

[0125] The sub-intent list generation involves labeling each sub-intent with its "type" (e.g., "cross-store shipping cost calculation" is a logical calculation type) and "related elements" (e.g., "shipping cost calculation area" in a related image), and sorting them by priority. Example:

[0126] [{'Name':'Cross-store shipping fee calculation','Type':'Logical calculation class','Related element':'Shipping fee calculation UI area'},{'Name':'Adding items to shopping cart','Type':'Data operation class','Related element':'Adding items button'}].

[0127] The constraint list generation annotates each constraint with its "associated sub-intent" (e.g., "Cannot add when inventory is insufficient" is associated with "Add Product to Cart") and its "constraint type" (e.g., prohibition condition, restriction condition). Example:

[0128] [{'Constraint Object':'Product Addition','Rule':'Cannot Add When Inventory is Insufficient','Associated Sub-Intent':'Add Product to Cart','Type':'Prohibited Condition'}].

[0129] Finally, the structured intent set is converted into JSON format and stored in a MySQL database.
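As a rough illustration of the conversion to JSON, the sketch below assembles the example sub-intent and constraint lists into one object and serializes it. The key names, the helper `compose_core_intent`, and the goal text are assumptions for illustration; the database write is omitted.

```python
import json


def compose_core_intent(domain: str, first_core_intent: str, goal: str) -> str:
    """Build the second core intent as 'domain + first core intent + goal'."""
    return f"{domain}: {first_core_intent}, {goal}"


structured_intent_set = {
    # The goal text is hypothetical; the application does not give one.
    "core_intent": compose_core_intent(
        "e-commerce APP",
        "shopping cart checkout function implementation",
        "supporting cross-store checkout",
    ),
    "sub_intents": [
        {
            "name": "Cross-store shipping fee calculation",
            "type": "Logical calculation class",
            "related_element": "Shipping fee calculation UI area",
        },
        {
            "name": "Adding items to shopping cart",
            "type": "Data operation class",
            "related_element": "Adding items button",
        },
    ],
    "constraints": [
        {
            "constraint_object": "Product Addition",
            "rule": "Cannot Add When Inventory is Insufficient",
            "associated_sub_intent": "Add Product to Cart",
            "type": "Prohibited Condition",
        }
    ],
}


def to_json(intent_set: dict) -> str:
    """Serialize the intent set; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(intent_set, ensure_ascii=False, indent=2)


print(to_json(structured_intent_set))
```

The resulting string could then be stored in a JSON or text column; the table layout is left open here because the application does not describe it.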

[0130] S104. Using the particle swarm optimization algorithm, calculate the fitness value of each particle based on the structured intent set, the preset information of the particles, and the preset evaluation index.

[0131] Particle Swarm Optimization (PSO) is a branch of evolutionary computation. It is an optimization algorithm that simulates the foraging behavior of bird flocks. It finds the optimal solution by moving particles in the solution space, where particles represent different task generation strategies.

[0132] The preset information for a particle includes the definition of its dimension, position, and velocity.

[0133] The dimensions include API interface design, database schema design, and test case design. These three dimensions are key aspects of program development, and they collectively affect the quality, maintainability, and scalability of the entire developed system.

[0134] The API interface design adheres to RESTful principles, emphasizing resource orientation and a unified interface. Core practices include using plural nouns rather than verbs for URIs (e.g., /api/users), mapping business entities, and supporting standardized HTTP method operations (GET for data retrieval, POST for data creation, etc.). Furthermore, response data should include a status code, description, and data body, and version control and platform independence should be considered.

[0135] Database schema design needs to be decoupled from the API to improve flexibility. Entity relationships should be defined based on object modeling to ensure normalization and reduce redundancy, while avoiding excessive splitting that could degrade query performance. Data types, constraints, and indexing strategies should be clearly defined during the design phase, and the table structure should be planned with reference to the API's resource structure.

[0136] The test case plan should cover the functional verification of the interface, including normal processes, boundary conditions, and exception handling (such as parameter errors and insufficient permissions). Test cases should be designed based on the API documentation, executed automatically using tools (such as Apifox), and integrated with database validation to ensure coverage.

[0137] The collaboration of these three elements is achieved through full lifecycle management and can form a task generation strategy. For example, in a microservice architecture, API design drives the resource mapping of the database schema, while test cases verify the interaction between the two.

[0138] The PSO algorithm can be used to obtain different task generation strategies and quantify the matching degree of each particle with the structure requirements.

[0139] Definition of particle dimension:

[0140] Dimension 1: API interface design, including resource paths, HTTP methods, request / response parameters, and parameter validation rules, such as the "/api/cart/settle" path, the POST method, the productId parameter, and inventory validation rules;

[0141] Dimension 2: Database schema design, including table names, field names, field types, primary keys / foreign keys, and constraint rules, such as the "cart_item" table, the product_id field, the VARCHAR(32) type, and the foreign key relationship with the product table;

[0142] Dimension 3: Test case plan, including test scenarios, preconditions, test steps, and expected results, such as the "add product when inventory is insufficient" scenario, the precondition that the user is already logged in, the API call steps, and the expected return error code.

[0143] Definition of particle position:

[0144] The "coordinates" of a particle in parameter space correspond to a specific task generation scheme.

[0145] For example, the position is set to [API design: POST /api/cart/items (parameters: userId / productId / quantity); Schema design: cart_item table (cart_item_id / product_id / quantity); Test case: Add when inventory is insufficient → return 400 error], and the position is encoded as a vector (e.g., the parameter validation rule is represented by "1 = inventory validation / 0 = no validation").

[0146] Definition of particle velocity:

[0147] The particle's "direction of movement and step size" in the parameter space are used for subsequent iteration adjustments.

[0148] For example, if the API dimension matching degree of a certain particle is low, the step size of that dimension in the velocity vector is increased, such as adjusting the step size from 0.1 to 0.3, to guide it to move towards a better API solution.

[0149] Regarding the pre-set evaluation indicators:

[0150] Indicators can be set from three dimensions: demand matching, technical feasibility, and quality assurance, to ensure that the fitness value can fully reflect the quality of the particles.

[0151] For example, the specific indicators and calculation logic of the above three evaluation indicators are as follows:

[0152] 1. Demand matching degree

[0153] (1) Intent coverage: (Number of sub-intents covered by particles / Total number of sub-intents in the structured intent set) × 100%;

[0154] For example, if it covers two sub-intents, "Shipping Cost Calculation" and "Add Product," then the total coverage is 66.7% (out of 3).

[0155] (2) Constraint satisfaction rate: (Number of constraints satisfied by particles / Total number of constraints in the structured intent set) × 100%;

[0156] For example, if both constraints, "inventory verification" and "free shipping rule", are satisfied, the satisfaction rate is 100%.

[0157] 2. Technical feasibility

[0158] (1) Syntax compliance rate: (Number of APIs / Schemas / Test Cases conforming to syntax specifications / Total number) × 100%;

[0159] For example, the API path conforms to the RESTful specification, and the schema field types are correct;

[0160] (2) Resource consumption score: (1 - computational resource consumption of the particle scheme / maximum allowable system consumption) × 100%; the lower the consumption, the higher the score;

[0161] For example, points will be deducted if redundant database table fields lead to high resource consumption;

[0162] 3. Quality Assurance

[0163] (1) Test coverage: (Function points covered by test cases / Total function points in the structured intent set) × 100%;

[0164] For example, if the function covers three points: "add / modify / delete", then the coverage rate is 100%.

[0165] (2) Historical defect matching rate: (Number of similar historical defects avoided by the particle / Total number of historical defects) × 100%;

[0166] For example, avoiding the "overselling of inventory" defect would be a plus.

[0167] Furthermore, the weight vectors of the specific indicators in each evaluation index are determined according to the scenario.
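The ratio-style indicators above can be sketched as follows, using the sub-intent and constraint names from the shopping cart example. The function names `coverage_rate` and `resource_score` are hypothetical, and representing covered items as sets is an assumption of this sketch.

```python
def coverage_rate(covered: set, required: set) -> float:
    """Share of required items that the particle covers, in percent."""
    if not required:
        raise ValueError("required set must not be empty")
    return len(covered & required) / len(required) * 100


def resource_score(consumption: float, max_allowed: float) -> float:
    """(1 - consumption / maximum allowable consumption) x 100."""
    if max_allowed <= 0:
        raise ValueError("max_allowed must be positive")
    return (1 - consumption / max_allowed) * 100


required_sub_intents = {
    "shipping fee calculation",
    "add product",
    "modify product quantity",
}
covered_sub_intents = {"shipping fee calculation", "add product"}

required_constraints = {"inventory verification", "free shipping rule"}
satisfied_constraints = {"inventory verification", "free shipping rule"}

print(round(coverage_rate(covered_sub_intents, required_sub_intents), 1))
print(coverage_rate(satisfied_constraints, required_constraints))
print(round(resource_score(consumption=20, max_allowed=100), 1))
```

The same `coverage_rate` helper can serve intent coverage, constraint satisfaction, and test coverage, since all three are "covered items over total items".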

[0168] After defining the preset information of the particles, the particle swarm is initialized, which generates multiple random task generation strategies. Then, the task generation strategies are scored by calculating the fitness value of the particles.

[0169] When calculating the fitness value of each particle, the first step is to call a syntax checking tool to obtain the syntax compliance rate. The syntax checking tool can be Swagger to verify API compliance or MySQL syntax checker to verify schema. Then, the particle scheme is compared with the structured intent set to calculate the "intent coverage rate" and "constraint satisfaction rate". Finally, defect cases with similar requirements are matched from the historical defect library to calculate the "historical defect matching rate".

[0170] Then, the fitness value is obtained according to the formula, the specific formula is as follows:

[0171] Fitness value = Σ (indicator score × indicator weight);

[0172] For example, if a particle has "intent coverage of 80% (weight 0.2), constraint satisfaction of 100% (weight 0.3), syntax compliance of 90% (weight 0.1), resource consumption of 20% (weight 0.1, score 80), test coverage of 90% (weight 0.2), and historical defect matching rate of 80% (weight 0.1)," then its fitness value = 80 × 0.2 + 100 × 0.3 + 90 × 0.1 + 80 × 0.1 + 90 × 0.2 + 80 × 0.1 = 89.
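The weighted sum in this example can be checked with a short sketch. The dictionary keys are illustrative names, not identifiers from the application; the scores and weights are the ones listed in the example.

```python
# Indicator scores (0-100) and weights taken from the worked example.
scores = {
    "intent_coverage": 80,
    "constraint_satisfaction": 100,
    "syntax_compliance": 90,
    "resource_consumption": 80,  # score for 20% consumption
    "test_coverage": 90,
    "historical_defect_matching": 80,
}
weights = {
    "intent_coverage": 0.2,
    "constraint_satisfaction": 0.3,
    "syntax_compliance": 0.1,
    "resource_consumption": 0.1,
    "test_coverage": 0.2,
    "historical_defect_matching": 0.1,
}


def fitness(indicator_scores: dict, indicator_weights: dict) -> float:
    """Fitness value = sum of (indicator score x indicator weight)."""
    return sum(indicator_scores[name] * w for name, w in indicator_weights.items())


# Rounded to absorb floating-point error in the weighted sum.
print(round(fitness(scores, weights), 2))
```

Because the weights sum to 1, the fitness value stays on the same 0 to 100 scale as the individual indicator scores.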

[0173] Finally, the fitness value of each particle is associated with its position information and stored in the Redis cache.

[0174] S105. Iteratively update the position and velocity of the particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.

[0175] Through multiple iterations, multiple fitness values of the particles are obtained, and the particles move toward the global optimal position, ultimately outputting the development task strategy that best matches the structured requirements.

[0176] Convert the vector parameters of the globally optimal position into a strategy for actual development tasks:

[0177] API interface design: Generate Swagger-format interface documentation, including path, HTTP method, parameters, and error codes, such as "POST /api/cart/settle (parameters: userId / cartId; error code 400 = insufficient stock)";

[0178] Database schema design: Generate MySQL / PostgreSQL DDL statements, including table structure and constraints; such as "CREATE TABLE cart_item(cart_item_id VARCHAR(32)PRIMARY KEY,...)";

[0179] Test case scheme: Generate a test case framework in Excel format, including scenarios, steps, and expected results, such as "Add product when inventory is insufficient: Call POST /api/cart/items, expected to return 400 error".

[0180] The three types of task content can be packaged into a ZIP-format "development task package" according to the directory structure "/api documentation/", "/database/DDL/", and "/test/test cases/", which developers can download by clicking "download" on the task output page. Alternatively, the content can be synchronized to IntelliJ IDEA and Eclipse through a plugin, so that developers can directly execute the DDL and write test code.

[0181] This application provides an automatic software development task generation method based on multimodal requirement analysis. By supporting the uploading and preprocessing of multimodal materials (text, images, and audio), it overcomes the limitations of traditional text-only requirement analysis and solves the problem of missing non-textual material information. It then extracts features from each modality using specialized models, applies a fusion strategy that differs according to the number of modalities, and maps the multimodal features to a unified semantic space, eliminating semantic gaps and integrating fragmented information. Subsequently, based on the large multimodal model, it extracts core intents, sub-intents, and constraints and organizes them into a structured intent set according to a preset format, transforming ambiguous requirements into clear, standardized requirement definitions and reducing comprehension biases. Finally, a particle swarm optimization algorithm uses API interface design, database schema design, and test case schemes as particle dimensions, combines multi-dimensional evaluation indicators to calculate fitness values, and iteratively updates particle positions and velocities to obtain the optimal task strategy, generating directly deployable development documents and code templates. Through this end-to-end design of multimodal material processing, cross-modal feature fusion, structured intent transformation, and particle swarm optimization, the method achieves high efficiency and accuracy in the automatic generation of software development tasks.

[0182] Figure 2 is a flowchart of an automatic software development task generation method based on multimodal requirements analysis, as provided in an embodiment of this application. As shown in Figure 2, building on the above embodiments, this embodiment includes the following steps:

[0183] S201. Obtain the user-uploaded material to be processed, and preprocess the material to obtain preprocessed material.

[0184] For step S201, refer to the description of step S101.

[0185] S202. Extract the modal features of each modality in the preprocessed material.

[0186] Step S202 can be described with reference to step S102, wherein the modal features corresponding to the text modality, image modality, and audio modality are text feature vector, image feature vector, and audio feature vector, respectively.

[0187] S203. Determine the number of modalities contained in the preprocessed material based on the types of modal features.

[0188] The number of modalities can be obtained by counting which of the text feature vector, image feature vector, and audio feature vector are present for the preprocessed material.

[0189] S204. When the number of modalities is 2, the modal features corresponding to the two modalities are concatenated or combined by an element-wise operation to generate a global fusion feature vector.

[0190] When the required materials are a simple combination of two modalities, such as "UI sketch + brief text description", the image feature vector and the text feature vector can be directly concatenated at the feature layer or operated on at the element level, such as weighted summation, to achieve early association of cross-modal features.

[0191] For example, a user provides a "UI sketch of an e-commerce app shopping cart page" and the text description "Clicking the checkout button will redirect to the order confirmation page." The system performs a weighted summation of the image feature vector of the "checkout button" in the image and the text feature vector of "checkout button → redirect to order confirmation page," directly associating the visual elements in the image with the functional description in the text to generate a fused feature vector. During this process, a preset weighting rule (e.g., 60% weight for image visual elements and 40% weight for text functional descriptions) ensures that the fused result better reflects the core requirements.
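The two fusion options for the two-modality case can be sketched as below. The 60% / 40% weights come from the example above; the function names and the toy feature values are assumptions, and real feature vectors would be far longer.

```python
from typing import List, Sequence


def concatenate(image_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Feature-level concatenation: the fused vector keeps both inputs side by side."""
    return list(image_vec) + list(text_vec)


def weighted_sum(
    image_vec: Sequence[float],
    text_vec: Sequence[float],
    w_image: float = 0.6,
    w_text: float = 0.4,
) -> List[float]:
    """Element-wise weighted sum; both vectors must have the same length."""
    if len(image_vec) != len(text_vec):
        raise ValueError("feature vectors must have the same dimension")
    return [w_image * a + w_text * b for a, b in zip(image_vec, text_vec)]


# Toy 2-dimensional features for the "checkout button" (illustrative values).
image_feature = [1.0, 0.0]
text_feature = [0.0, 1.0]

print(concatenate(image_feature, text_feature))
print(weighted_sum(image_feature, text_feature))
```

Concatenation preserves each modality's values but doubles the dimension, while the weighted sum keeps the dimension fixed and requires both vectors to live in spaces of the same size.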

[0192] S205. When the number of modalities is 3, each modal feature is independently encoded to generate an independent feature vector for each modality. The dynamic weight of each modality is calculated based on the importance of each modal feature. The independent feature vectors are then weighted and averaged to generate a global fused feature vector.

[0193] When the required materials are a complex combination of three modalities, such as "video presentation + multiple meeting recordings + detailed requirements document", each modality is first encoded independently. That is, the video is decomposed into an image sequence, the recording is converted into text, and the text features of the document are extracted to generate independent feature vectors for each modality.

[0194] Then, the dynamic weight of each modality is calculated through an attention gating mechanism based on the variance of that modality's feature vector: the larger the variance, the more important the information is taken to be. The mechanism outputs the weights w_text, w_vision, and w_audio, which sum to 1.

[0195] For example, in the requirement "e-commerce APP shopping cart function", if the meeting recording contains key "shipping calculation logic" that is not mentioned in the document, the weight of the audio modality is increased, for example from 20% to 35%; the weight of the text modality is reduced accordingly, for example from 50% to 40%; and the weight of the image modality remains unchanged, for example at 25%.

[0196] Finally, the independent feature vectors of each modality are weighted and averaged according to the dynamic weights to generate a global fused feature vector. The feature vectors of different modalities are then mapped to a unified semantic space to ensure that the features of different modalities represent similar semantics in the same dimension. For example, the image features and text features of the "checkout button" are close to each other in the unified semantic space, thus achieving cross-modal semantic alignment.
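The variance-based weighting and weighted averaging described above can be approximated with the sketch below. It uses plain normalized variance as a stand-in for the attention gating mechanism, which the application does not specify in detail; the function names and toy feature values are assumptions.

```python
from statistics import pvariance
from typing import Dict, List


def dynamic_weights(features: Dict[str, List[float]]) -> Dict[str, float]:
    """Weight each modality by its share of the total feature variance."""
    variances = {name: pvariance(vec) for name, vec in features.items()}
    total = sum(variances.values())
    if total == 0:
        # All features are constant: fall back to equal weights.
        return {name: 1 / len(features) for name in features}
    return {name: var / total for name, var in variances.items()}


def fuse(features: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of the per-modality vectors (weights sum to 1)."""
    dim = len(next(iter(features.values())))
    if any(len(vec) != dim for vec in features.values()):
        raise ValueError("all modality vectors must share one dimension")
    return [sum(weights[m] * features[m][i] for m in features) for i in range(dim)]


# Toy features; the audio vector varies most, so it gets the largest weight.
features = {
    "text": [0.0, 2.0],
    "vision": [0.0, 2.0],
    "audio": [0.0, 4.0],
}
weights = dynamic_weights(features)
print(weights)
print(fuse(features, weights))
```

Because the weights are normalized to sum to 1, the weighted sum in `fuse` is already a weighted average and needs no further division.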

[0197] If the modality count is 1, analysis and processing can be performed directly without semantic alignment. This application mainly targets multiple modalities, and single modalities will not be described in detail here.

[0198] S206. Extract the element information of the global fusion feature vector, and organize the element information according to the preset structured format to obtain a structured intent set.

[0199] For step S206, refer to the description of step S103.

[0200] S207. Generate API interface documentation based on structured intent sets, data definition statements for creating database table structures, and test case framework.

[0201] First, it is necessary to define the requirements and technology mapping rules, that is, to establish the mapping relationship between "sub-intents / constraints" and "technical modules" in the structured intent set. Example:

[0202] Sub-intent "Cross-store shipping fee calculation" → Mapping API interface (calculates shipping fees), database table (stores shipping fee rules), and test cases (verifies free shipping logic);

[0203] Constraint "Insufficient inventory, cannot add" → Mapping API parameter validation (inventory field validation), database constraints (inventory field non-negative), test cases (insufficient inventory scenario).

[0204] Then, the following three outputs are generated:

[0205] Generate API interface documentation: Generate a RESTful interface framework based on the "function name + associated elements" of the sub-intent, including resource path, HTTP method, and core parameters (refer to constraints).

[0206] For example, the "cross-store shipping fee calculation" function corresponds to POST /api/cart/calculate-freight, with parameters including userId (user ID), cartId (cart ID), and totalAmount (settlement amount).

[0207] Generate data definition statements, i.e., database DDL: design table structure based on the "data operation type" of sub-intents;

[0208] For example, the "Add to Cart" function requires the cart_item table to store the relationship between products and the shopping cart, with fields including cart_item_id (primary key), cart_id (foreign key related to the shopping cart table), product_id (product ID), quantity (quantity), and stock (inventory).

[0209] Generate a test case framework and design test scenarios based on "constraints + sub-intent priorities"; for example, the abnormal scenario test corresponding to "adding products when inventory is insufficient" includes "preconditions (product inventory = 0), test steps (calling the add product API), and expected results (returning a 400 error, indicating insufficient inventory)".
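A minimal sketch of how the DDL and test case skeleton from these examples might be rendered follows. The field list is from the cart_item example; the column types other than VARCHAR(32), the referenced table name "cart", and the helper names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# (column name, column definition). Types beyond VARCHAR(32) are assumed.
CART_ITEM_FIELDS: List[Tuple[str, str]] = [
    ("cart_item_id", "VARCHAR(32) PRIMARY KEY"),
    ("cart_id", "VARCHAR(32) NOT NULL"),
    ("product_id", "VARCHAR(32) NOT NULL"),
    ("quantity", "INT NOT NULL"),
    # Non-negative inventory, per the "insufficient inventory" constraint mapping.
    ("stock", "INT NOT NULL CHECK (stock >= 0)"),
]


def build_ddl(
    table: str,
    fields: Iterable[Tuple[str, str]],
    foreign_keys: Iterable[Tuple[str, str, str]] = (),
) -> str:
    """Render a CREATE TABLE statement from field and foreign-key definitions."""
    lines = [f"  {name} {spec}" for name, spec in fields]
    lines += [
        f"  FOREIGN KEY ({col}) REFERENCES {ref_table}({ref_col})"
        for col, ref_table, ref_col in foreign_keys
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


def build_test_case() -> dict:
    """Skeleton for the 'add product when inventory is insufficient' scenario."""
    return {
        "scenario": "Add product when inventory is insufficient",
        "precondition": "product inventory = 0",
        "steps": ["Call POST /api/cart/items"],
        "expected": {"status": 400, "message": "insufficient inventory"},
    }


# "cart" as the parent table name is hypothetical.
print(build_ddl("cart_item", CART_ITEM_FIELDS, [("cart_id", "cart", "cart_id")]))
print(build_test_case())
```

Keeping the field list as data rather than as a hard-coded string makes it straightforward for a later optimization step to toggle individual constraints or indexes.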

[0210] S208. Based on the API interface documentation, the data definition statements for creating database table structures, and the test case framework, define the preset information of the particles.

[0211] For step S208, refer to the description of step S104.

[0212] The definitions of the particle's dimension and corresponding sub-parameters are as follows:

[0213] Dimension 1: API Interface Design

[0214] Sub-parameters: Resource path compliance (1 = compliant with RESTful / 0 = non-compliant), parameter validation rules (1 = includes inventory validation / 0 = none), response code integrity (1 = includes both 200 and 400 / 0 = missing);

[0215] Dimension 2: Database Schema Design

[0216] Sub-parameters: Field type correctness (1 = conforms to business requirements / 0 = does not conform), constraint integrity (1 = includes foreign key / 0 = none), index rationality (1 = includes product_id index / 0 = none);

[0217] Dimension 3: Test Case Design

[0218] Sub-parameters: Scenario coverage (1 = covers the insufficient-inventory scenario / 0 = none), expected result accuracy (1 = consistent with constraints / 0 = inconsistent), precondition completeness (1 = includes user login / 0 = none).

[0219] The definition of a particle's position involves encoding each dimension's sub-parameters into a vector, with the vector values corresponding to the technical selection of the sub-parameters.

[0220] For example, a particle's position vector is: Position Vector = [API Dimension: (1,1,1); Schema Dimension: (1,1,1); Test Case Dimension: (1,1,1)], which means that the API conforms to RESTful, includes inventory verification, has a complete response code, the schema fields are correct, and it contains foreign keys and indexes, and the test cases cover key scenarios.

[0221] Regarding the definition of particle velocity, the velocity vector has the same dimension as the position vector, with a value range of [0, 0.5]. The step size is controlled to avoid over-iteration. Velocity represents the adjustment range of the particle on each sub-parameter. For example, the initial velocity vector = [API dimension: (0.1, 0.1, 0.1); Schema dimension: (0.1, 0.1, 0.1); Test case dimension: (0.1, 0.1, 0.1)].

[0222] S209. Initialize the particle swarm and randomly generate the position and velocity of each particle.

[0223] The number of particles is preset according to the complexity of the requirements, usually between 50 and 100. For example, in an e-commerce shopping cart scenario, 50 particles can be set, with each particle corresponding to a combination of "API + Schema + test cases".

[0224] Based on the value range of the sub-parameters of each particle dimension, the position vector of each particle is randomly generated. A randomly generated position may, for example, lack inventory verification in the API dimension or lack an index in the Schema dimension. The velocity vector of every particle is uniformly initialized to [0.1, 0.1, 0.1] for each dimension.
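The initialization described above can be sketched as follows, assuming each position is a flat list of nine binary sub-parameters (three per dimension) and each velocity component starts at 0.1. The `Particle` class, the `init_swarm` function, and the flat-list layout are hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

DIMENSIONS = ("api", "schema", "test_case")
SUB_PARAMS_PER_DIMENSION = 3
VECTOR_LENGTH = len(DIMENSIONS) * SUB_PARAMS_PER_DIMENSION


@dataclass
class Particle:
    pid: str
    position: List[int]    # 1 = sub-parameter satisfied, 0 = not satisfied
    velocity: List[float]  # per-sub-parameter adjustment step


def init_swarm(size: int = 50, seed: Optional[int] = None) -> List[Particle]:
    """Create particles with random binary positions and a uniform 0.1 velocity."""
    rng = random.Random(seed)
    swarm = []
    for i in range(1, size + 1):
        position = [rng.randint(0, 1) for _ in range(VECTOR_LENGTH)]
        velocity = [0.1] * VECTOR_LENGTH
        swarm.append(Particle(pid=f"P{i:03d}", position=position, velocity=velocity))
    return swarm


swarm = init_swarm(size=50, seed=0)
print(swarm[0])
```

Passing a seed makes the random initialization reproducible, which is convenient when comparing runs; leaving it as `None` gives a different swarm each time.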

[0225] S210. Initialize the individual optimal position of each particle and the globally optimal position randomly set in the particle swarm, and obtain the fitness value of each particle according to the preset evaluation index and the corresponding rule weight.

[0226] Based on the initialized particle swarm, the individual optimal position and the global optimal position can be obtained, specifically:

[0227] Individual optimal position (pbest): Initially set to the initial position of each particle;

[0228] Global best position (gbest): Randomly select one of the 50 particles in the swarm as the initial global best, and set its initial fitness value to 0.

[0229] For example, particle swarm initialization fragments are shown in Table 1:

[0230] Table 1: Particle Swarm Initialization Fragment

[0231]

[0232] Based on the three dimensions of demand matching, technical feasibility, and quality assurance defined above, and using a shopping cart scenario as an example, the evaluation indicators and their corresponding rule weights are shown in Table 2:

[0233] Table 2: Evaluation Indicators and Corresponding Rule Weights

[0234]

[0235] When calculating each metric, Swagger is called to validate the API syntax, the MySQL syntax checker is called to validate the schema, and JUnit is called to validate the test cases. The sub-intents covered by each particle are counted (e.g., P001 covers 3), as are the constraints it satisfies (e.g., P001 satisfies 2). Finally, "shopping cart" related defects (such as "overselling inventory" and "incorrect shipping cost calculation") are matched from the defect library, and the number of defects each particle avoids is counted.

[0236] The fitness value is obtained from the formula, which is as follows:

[0237] Fitness value = Σ (indicator score × indicator weight);

[0238] For example, the metric score and fitness value of P001 are:

[0239] Intent coverage 100% (3 / 3), constraint satisfaction 100% (2 / 2), syntax compliance 100%, resource consumption 90% (low usage), test coverage 100%, historical defect matching 100% (avoiding 2 defects).

[0240] Fitness value:

[0241] =100×0.2+100×0.3+100×0.1+90×0.1+100×0.2+100×0.1=99.

[0242] S211. For each particle, if the current fitness value is greater than the initial fitness value, update the current optimal position of the individual to the current position of the particle.

[0243] Each particle is initialized with its individual best position and a randomly assigned global best position within the particle swarm. The initial individual best position (pbest) corresponds to an initial fitness value. Since the particle's task generation strategy (position vector) has not been evaluated at this stage, the initial fitness value has no practical business significance and serves only as a "blank benchmark" for subsequent comparisons. For example, the initial fitness value of particle P001 is set to 0, and the same applies to P002 and P003, conforming to the conventional design of "0 as an invalid benchmark".

[0244] For example, P002 has an initial fitness value of 0, and currently 83 > 0, so pbest is updated to P002's current position;

[0245] P001 has an initial fitness value of 99. Currently, 0 < 99, so pbest remains at its initial position.

[0246] S212. For a swarm of particles, if the highest current fitness value among all particles is greater than the initial fitness value of the particle at the initial global optimal position, update the current global optimal position to the current position of the particle with the highest corresponding fitness value.

[0247] Iterate through the current fitness values of all particles. If the maximum value is greater than the fitness value of the initial gbest (99, for P001), gbest is updated; otherwise, P001 is kept as gbest. For example:

[0248] Among the 50 particles, P001 has the highest fitness value of 99, while gbest is still P001.
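Steps S211 and S212 can be sketched as follows. This is a minimal illustration, not the implementation of this application: the `Particle` class, the `update_bests` function, and the choice of P002 as the randomly selected initial gbest are all hypothetical, while the fitness values 99 and 83 and the position vectors are those used in the examples of this description.

```python
from dataclasses import dataclass, field


@dataclass
class Particle:
    name: str
    position: list
    best_position: list = field(default_factory=list)
    best_fitness: float = 0.0  # "0 as an invalid benchmark"

    def __post_init__(self):
        # pbest starts at the particle's initial position.
        if not self.best_position:
            self.best_position = list(self.position)


def update_bests(particles, fitness_of, gbest_position, gbest_fitness):
    """Apply S211 to every particle, then S212 to the swarm."""
    for p in particles:
        current = fitness_of[p.name]
        if current > p.best_fitness:  # S211: individual best
            p.best_fitness = current
            p.best_position = list(p.position)
    leader = max(particles, key=lambda p: fitness_of[p.name])
    if fitness_of[leader.name] > gbest_fitness:  # S212: global best
        gbest_position = list(leader.position)
        gbest_fitness = fitness_of[leader.name]
    return gbest_position, gbest_fitness


swarm = [Particle("P001", [1, 1, 1]), Particle("P002", [1, 0, 1])]
current_fitness = {"P001": 99, "P002": 83}

# Assumption: P002 happened to be drawn as the initial gbest (fitness 0).
gbest_position, gbest_fitness = update_bests(
    swarm, current_fitness, gbest_position=[1, 0, 1], gbest_fitness=0
)
print(gbest_position, gbest_fitness)
```

In this sketch both particles improve on the blank benchmark of 0, and the global best moves to the position of P001 because it has the highest current fitness.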

[0249] S213. Obtain the iteration update velocity based on the inertia weight, acceleration constants, and random numbers, and determine the current particle iteration velocity based on the iteration update velocity and the velocity boundary.

[0250] The formula is as follows:

[0251] The iteration update velocity is v(t+1) = ω × v(t) + c1 × r1 × (pbest - x(t)) + c2 × r2 × (gbest - x(t));

[0252] For example, in the shopping cart scenario, the values of each parameter are as follows:

[0253] ω: Inertia weight, which is dynamically adjusted. It is 0.9 in the early stage of iteration (first 10 rounds) (mainly for exploration) and 0.4 in the later stage (mainly for convergence).

[0254] c1 and c2: learning factors, representing the ability to learn from pbest and gbest, both of which are 2;

[0255] r1, r2: Random numbers, randomly generated within the range [0,1] (e.g., r1=0.6, r2=0.4).

[0256] v(t): Current velocity vector;

[0257] x(t): Current position vector;

[0258] pbest / gbest: Optimal position vector.

[0259] The velocity boundary is the maximum velocity value. For example, if the maximum velocity value is set to 0.5, and the calculated value of v(t+1) is greater than 0.5, then 0.5 is used to avoid excessive particle movement.

[0260] Example: P002 velocity update (1st iteration, ω=0.9):

[0261] P002 current velocity vector (API dimension): [0.1, 0.1, 0.1];

[0262] P002 current position vector (API dimension): [1,0,1] (no inventory check; the corresponding sub-parameter 2 is 0);

[0263] P002 pbest position (API dimension): [1,0,1] (consistent with the current position);

[0264] gbest position (API dimension): [1,1,1] (sub-parameter 2 is 1, including inventory verification);

[0265] Velocity calculation (API dimension sub-parameter 2):

[0266] v2(t+1)=0.9×0.1+2×0.6×(0-0)+2×0.4×(1-0)=0.09+0+0.8=0.89

[0267] Since it exceeds the boundary value of 0.5, we take 0.5;

[0268] After P002 update, the velocity vector (API dimension) is: [0.1, 0.5, 0.1] (the step size of sub-parameter 2 is increased, moving closer to gbest's inventory verification).
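The velocity calculation for sub-parameter 2 can be checked with a small Python sketch. The function names are hypothetical. Two points are assumptions rather than statements of this description: the inertia weight is modelled as a simple step from 0.9 to 0.4 after round 10, and the velocity is also clamped at -0.5, whereas the text only describes the upper boundary.

```python
def inertia_weight(iteration, early_rounds=10, early=0.9, late=0.4):
    """Step schedule (assumption): 0.9 for the first 10 rounds, then 0.4."""
    return early if iteration <= early_rounds else late


def clamp(value, low, high):
    return max(low, min(high, value))


def update_velocity(v, x, pbest, gbest, w, r1, r2, c1=2.0, c2=2.0, v_max=0.5):
    """Return (raw, clamped) velocity for a single sub-parameter."""
    raw = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Assumption: the boundary is applied symmetrically in both directions.
    return raw, clamp(raw, -v_max, v_max)


# P002, API dimension, sub-parameter 2, 1st iteration (values from the text).
raw, clamped = update_velocity(
    v=0.1, x=0, pbest=0, gbest=1, w=inertia_weight(1), r1=0.6, r2=0.4
)
print(round(raw, 2), clamped)  # prints 0.89 0.5
```

The cognitive term vanishes because pbest equals the current position, so the whole step comes from the inertia term (0.09) and the pull toward gbest (0.8).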

[0269] S214. Obtain the iteration update position based on the current particle iteration velocity and the previous position of the corresponding particle, and determine the current particle iteration position based on the iteration update position and the position boundary.

[0270] Position update formula: x(t+1) = x(t) + v(t+1);

[0271] Position boundary control: The sub-parameter takes the value [0,1]. If x(t+1)>1, it takes the value 1; if x(t+1)<0, it takes the value 0, ensuring that the position conforms to the technical logic.

[0272] Example: P002 position update (API dimension sub-parameter 2):

[0273] P002 Current position (API dimension sub-parameter 2): 0 (no inventory verification);

[0274] P002 updated velocity (API dimension sub-parameter 2): 0.5;

[0275] Position calculation: x2(t+1)=0+0.5=0.5; because the sub-parameter must be an integer (0 or 1), the result is rounded up to 1;

[0276] P002 Updated Location Vector (API Dimension): [1,1,1] (Added inventory verification, consistent with gbest).
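The position update, boundary control, and rounding can be sketched in Python as below. The function name is hypothetical, and the rule "values of 0.5 or more become 1, smaller values become 0" is an assumption that generalizes the single case given in the text (0.5 is rounded up to 1). Python's built-in `round` is deliberately avoided because it rounds 0.5 to 0.

```python
def update_position(x, v, low=0.0, high=1.0):
    """Move one sub-parameter, clamp it to [low, high], then binarize it."""
    moved = x + v
    bounded = max(low, min(high, moved))
    # Assumption: 0.5 is the rounding threshold for a 0/1 sub-parameter.
    return 1 if bounded >= 0.5 else 0


# P002, API dimension: current position and updated velocity from the text.
position = [1, 0, 1]
velocity = [0.1, 0.5, 0.1]
new_position = [update_position(x, v) for x, v in zip(position, velocity)]
print(new_position)  # prints [1, 1, 1]
```

Sub-parameters 1 and 3 would overshoot to 1.1 and are pulled back to 1 by the boundary, while sub-parameter 2 flips from 0 to 1, which is the added inventory verification.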

[0277] S215. Based on the fitness values of all particles in each iteration round within the preset maximum number of iterations, update the individual optimal position and global optimal position of each particle.

[0278] The maximum number of iterations can be a fixed value, such as 50 rounds, or the maximum number of iterations can be set by setting a convergence condition, such as stopping when "the change rate of gbest fitness value is <1% for 5 consecutive rounds".

[0279] In each iteration, steps S210-S213 are repeated. In the early stages, the particle positions differ greatly, and the focus is on exploring various solutions. In the later stages, the solution gradually approaches gbest and converges to the optimal solution.
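The two stopping rules described for S215 can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the change rate is taken as the relative change between consecutive rounds, and combining the fixed limit with the convergence condition in one check is a design choice of the sketch, not something this description prescribes.

```python
def has_converged(history, threshold=0.01, patience=5):
    """True if gbest fitness changed by less than 1% for 5 consecutive rounds."""
    if len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous == 0:
            return False  # relative change is undefined against a zero baseline
        if abs(current - previous) / abs(previous) >= threshold:
            return False
    return True


def should_stop(iteration, history, max_iterations=50):
    """Stop at the fixed round limit or once the convergence condition holds."""
    return iteration >= max_iterations or has_converged(history)


# Hypothetical gbest fitness values recorded after each round.
settled = [60, 80, 95, 99, 99, 99, 99, 99, 99]
still_moving = [60, 80, 95, 99, 99, 99]
print(should_stop(9, settled), should_stop(6, still_moving))
```

With the first history the last five round-to-round changes are all zero, so the loop may stop at round 9. The second history still contains large early jumps inside the five-round window, so iteration continues.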

[0280] S216. The task generation strategy corresponding to the particle at the global optimal position is the optimal task generation strategy.

[0281] By converting the position vector of the globally optimal particle (such as P001) into a practical technical document, we can obtain:

[0282] API interface design: Based on the location vector [1,1,1], generate a complete Swagger document, including inventory verification and free shipping logic;

[0283] Database schema design: Generate DDL statements containing foreign keys and indexes based on the position vector [1,1,1].

[0284] Test case scheme: Generate Excel test cases covering normal / abnormal scenarios based on the position vector [1,1,1].

[0285] Finally, the documents are packaged into a ZIP file according to the directory structure "/api documentation/", "/database/DDL/", "/test/test cases/", which developers can download or synchronize to their IDE.
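Packaging the generated documents can be sketched with Python's standard `zipfile` module. The file names, folder spellings, and file contents below are hypothetical placeholders; only the three-folder layout (API documentation, database DDL, test cases) follows the description.

```python
import io
import zipfile


def package_documents(files):
    """Write a {path: text} mapping into an in-memory ZIP and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Placeholder documents; real content would come from the optimal strategy.
bundle = package_documents({
    "api_documentation/cart_api.yaml": "openapi: 3.0.0\n",
    "database/DDL/cart_schema.sql": "-- DDL statements go here\n",
    "test/test_cases/cart_cases.csv": "case_id,scenario\n",
})

with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    names = sorted(archive.namelist())
print(names)
```

Building the archive in memory keeps the sketch free of file-system side effects; the returned bytes could then be offered for download or written to disk.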

[0286] This application provides an automatic software development task generation method based on multimodal requirement analysis. By being compatible with multimodal materials including text, images, and audio, and combining modal preprocessing, it overcomes the limitations of traditional single-text processing and reduces the loss of information dimensions. Then, a dedicated model is used to extract features of each modality, and these features are fused differentially according to the number of modalities, mapping the multimodal features to a unified semantic space, eliminating semantic gaps, and integrating fragmented requirement information. Subsequently, core intents, sub-intents, and constraints are extracted and organized into a structured intent set according to a preset format, transforming ambiguous requirements into clear and standardized requirement definitions, reducing comprehension biases. Next, using API interfaces, database schemas, and test cases as particle dimensions, and combining multi-dimensional evaluation indicators, fitness values are calculated. Particle swarm optimization iteratively updates particle positions and velocities, dynamically approximating the optimal task strategy. Finally, a development documentation package that can be directly implemented is output, replacing the traditional manual requirement analysis and task generation mode, significantly improving the comprehensiveness of requirement analysis and the efficiency of task generation.

[0287] Figure 3 is a flowchart of an automatic software development task generation method based on multimodal requirements analysis, as provided in an embodiment of this application. As shown in Figure 3, based on the above embodiments, after determining the task generation strategy corresponding to the particle at the globally optimal position as the optimal task generation strategy, this embodiment includes:

[0288] S301. Obtain feedback data for a preset period and label the attribute tags of the feedback data. The attribute tags include problem type, associated module, associated parameters, and degree of impact.

[0289] Feedback data can be collected through the user end, the development end, and the operation and maintenance end.

[0290] Furthermore, the received feedback data is labeled with attribute tags, among which:

[0291] Problem types can include functional errors, performance issues, missing tests, syntax errors, and logical contradictions;

[0292] The associated modules can include API interface design (including sub-modules: inventory verification, freight calculation), database schema, and test case scheme;

[0293] The associated parameters can be API parameters (such as stock_check_rule), schema parameters (such as index configuration), and test case parameters (such as scene_coverage).

[0294] The impact levels are categorized as high (blocking services), medium (affecting some users), and low (no major impact).

[0295] For example, if the preset period is set to 1 month, the feedback data collected one month after the e-commerce APP shopping cart checkout function was launched is as follows:

[0296] Feedback 1: "When checking out across stores, the inventory of the same product in multiple stores is calculated incorrectly, resulting in insufficient inventory being displayed even though there is sufficient stock" (Related module: API interface - inventory verification; Impact level: high).

[0297] Feedback 2: "No extra charge was added for remote areas during freight calculation, which does not comply with business rules" (Related module: API interface - freight calculation; Impact level: Medium).

[0298] Feedback 3: "The test cases did not cover the 'shipping costs to remote areas' scenario, and the problem was only discovered after the deployment" (Related module: Test case plan; Impact level: Medium).

[0299] Feedback 4: "When querying shopping cart items in the database, the lack of indexes caused slow loading" (Related module: Database Schema; Impact level: High);

[0300] Feedback 5: "API response time is too long (more than 3 seconds), resulting in poor user experience" (Related module: API interface - performance; Impact level: high).

[0301] The tagged and organized feedback data is shown in Table 3:

[0302] Table 3: Feedback Data After Annotation and Editing

[0303]

[0304] S302. Merge multiple feedback data based on problem type and associated parameters, and count the frequency of feedback data for each problem type.

[0305] To improve processing efficiency, the tagged and organized feedback data can be grouped by problem type and associated parameters, and duplicate feedback can be merged. For example, multiple user feedbacks on "shipping costs in remote areas" can be merged into one.

[0306] Furthermore, the frequency of each feedback group is counted; a higher frequency indicates a more prevalent problem.
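The grouping and frequency count of S302 can be sketched with `collections.Counter`. The feedback records below are hypothetical sample data, not the contents of Table 3 or Table 4. The problem types and parameter names are taken from the vocabulary of this description, but the field names and the pairing of types with parameters are assumptions.

```python
from collections import Counter

# Hypothetical tagged feedback records (not the actual Table 3 data).
feedback = [
    {"problem_type": "functional error", "parameter": "stock_check_rule"},
    {"problem_type": "functional error", "parameter": "stock_check_rule"},
    {"problem_type": "performance issue", "parameter": "index_config"},
    {"problem_type": "missing test", "parameter": "scene_coverage"},
    {"problem_type": "performance issue", "parameter": "index_config"},
    {"problem_type": "functional error", "parameter": "stock_check_rule"},
]


def merge_feedback(records):
    """Group records by (problem type, associated parameter) and count them."""
    return Counter((r["problem_type"], r["parameter"]) for r in records)


groups = merge_feedback(feedback)
for key, frequency in groups.most_common():
    print(key, frequency)
```

Each distinct pair becomes one merged feedback group, and `most_common()` lists the groups from the most to the least frequent.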

[0307] For example, the results of merging feedback data and frequency statistics are shown in Table 4:

[0308] Table 4: Results of Feedback Data Merging and Frequency Statistics

[0309]

[0310] S303. Obtain the comprehensive weight of the feedback data for each problem type based on the degree of impact and the proportion of frequency.

[0311] Considering the priority of impact level, since high-impact issues need to be addressed first, the weighting of impact level and frequency can be set to 0.6 and 0.4 respectively.

[0312] Overall weight = Influence weight × 0.6 + Frequency percentage × 0.4;

[0313] The weighting of the degree of influence is as follows: High = 100, Medium = 60, Low = 30.

[0314] Frequency percentage = (frequency of this group / total frequency) × 100%. Taking Table 4 as an example, the total frequency = 8 + 12 + 3 + 5 + 7 = 35.
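The comprehensive weight formula can be sketched as follows. The function and dictionary names are hypothetical. The sample call uses a high-impact group with a frequency of 5 out of the total of 35; that pairing is an assumption, since Table 4 is not reproduced here, but it yields the 65.71 that this description quotes for G004.

```python
IMPACT_WEIGHT = {"high": 100, "medium": 60, "low": 30}


def comprehensive_weight(impact, frequency, total_frequency,
                         impact_share=0.6, frequency_share=0.4):
    """Overall weight = impact weight x 0.6 + frequency percentage x 0.4."""
    if total_frequency <= 0:
        raise ValueError("total_frequency must be positive")
    frequency_percentage = frequency / total_frequency * 100
    return (IMPACT_WEIGHT[impact] * impact_share
            + frequency_percentage * frequency_share)


total = 8 + 12 + 3 + 5 + 7  # total frequency from the text
weight = comprehensive_weight("high", 5, total)
print(total, round(weight, 2))  # prints 35 65.71
```

Because the impact term alone contributes up to 60 points, a high-impact group outranks a medium-impact one unless the latter is far more frequent.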

[0315] Therefore, referring to Table 4, the overall weighting results of the merged feedback data are shown in Table 5:

[0316] Table 5: Overall Weighting Results of the Merged Feedback Data

[0317]

[0318] S304. Calculate the unified adjustment value of the parameters based on the associated module.

[0319] When determining the direction of adjustment, the parameters are adjusted positively or negatively based on the feedback. Positive adjustment indicates the addition of new logic, while negative adjustment indicates the deletion of incorrect logic.

[0320] Example,

[0321] G001 (stock_check_rule): Requires the addition of "cross-store inventory merging calculation logic" → positive adjustment;

[0322] G004 (index_config): A new "product_id+shop_id composite index" needs to be added → positive adjustment;

[0323] When the adjustment directions of the same parameter are opposite, such as adding an index to some feedback and deleting an index from some feedback, the sum of the comprehensive weights of the positive and negative directions is calculated, and the difference between the comprehensive weights of the positive and negative directions is used as the basis for adjustment.

[0324] Furthermore, the adjustment value can be expressed as the product of the comprehensive weight and the adjustment coefficient, representing the priority and magnitude of the adjustment. For example, the high impact coefficient is 1.2, the medium impact coefficient is 1.0, and the low impact coefficient is 0.8.

[0325] Suppose that G004 has conflict feedback: "Some feedback needs to add a composite index (overall weight 65.71, positive), some feedback needs to delete redundant indexes (overall weight 20, negative)";

[0326] The combined weight difference between positive and negative factors = 65.71 - 20 = 45.71 (positive factor is dominant);

[0327] Unified adjustment value = 45.71 × 1.2 (high impact) ≈ 54.85.
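The conflict resolution and the unified adjustment value can be sketched in Python. The function name and return shape are hypothetical. How to treat a dominant negative direction or an exact tie is not spelled out in this description, so reporting the direction separately and using the absolute difference is an assumption of the sketch.

```python
IMPACT_COEFFICIENT = {"high": 1.2, "medium": 1.0, "low": 0.8}


def unified_adjustment(positive_weights, negative_weights, impact):
    """Return (direction, value) for one parameter with conflicting feedback."""
    difference = sum(positive_weights) - sum(negative_weights)
    if difference > 0:
        direction = "positive"
    elif difference < 0:
        direction = "negative"
    else:
        direction = "none"
    # Assumption: the magnitude uses the absolute weight difference.
    return direction, abs(difference) * IMPACT_COEFFICIENT[impact]


# G004 example from the text: add composite index vs. delete redundant index.
direction, value = unified_adjustment([65.71], [20], "high")
print(direction, round(value, 2))  # prints positive 54.85
```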

[0328] S305. Adjust the corresponding associated parameters according to the sorting of the comprehensive weights.

[0329] You can sort them from highest to lowest according to their overall weight, and prioritize adjusting parameters with higher weights.

[0330] When adjusting parameters, specifically:

[0331] API parameter (stock_check_rule): Add "cross-store inventory merging logic" to the shipping calculation interface; when sum(shop_stock) ≥ quantity, the item can be added.

[0332] Database parameters (index_config): Execute ALTER TABLE cart_item ADD INDEX idx_product_shop (product_id, shop_id); to add a composite index;

[0333] API performance parameters (response_time_threshold): Optimize SQL queries (using new indexes), add caching (Redis caches shopping cart data), and set the response time threshold to "≤1.5 seconds".

[0334] S306. Verify the running results of the adjusted parameters according to the optimization rules.

[0335] After parameter adjustments are completed, functional verification, performance verification, and user verification are required. The specific verification rules are defined as follows:

[0336] Functional verification: Execute new test cases (such as "cross-store inventory merging scenario" and "remote area shipping cost scenario"), and the pass rate must be ≥99%;

[0337] Performance verification: Through JMeter load testing, API response time ≤ 1.5 seconds, database query time ≤ 500ms;

[0338] User verification: A phased rollout to 10% of users will be conducted to collect feedback, with a problem rate of ≤0.1%.

[0339] If all validations pass, the adjusted strategy will be officially released; if they fail, such as if performance does not meet the requirements, the process will return to S304 to recalculate the adjusted value.

[0340] The verification results after parameter adjustment are shown in Table 6:

[0341] Table 6: Verification results after parameter adjustment

[0342]

[0343] This application provides an automatic software development task generation method based on multimodal requirement analysis. It acquires feedback data at preset intervals and labels it with attribute tags such as problem type and associated modules. The method combines problem type and associated parameters, merges the data, and calculates the frequency. Then, it calculates a comprehensive weight based on the degree of impact and frequency proportion. The parameters are then sorted by weight, and their unified adjustment values are calculated to optimize the associated parameters. Finally, the adjustment results are verified, achieving closed-loop optimization of the optimal task generation strategy. This process accurately identifies defects such as functional errors and performance issues after the strategy is implemented, prioritizes resolving high-impact, high-frequency problems, dynamically adapts to changes in actual business needs, compensates for potential insufficient coverage of edge scenarios in the initial strategy, further improves the accuracy and applicability of software development task generation, and ensures long-term adaptability to business needs.

[0344] Figure 4 is a schematic diagram of the structure of an automatic software development task generation system based on multimodal requirements analysis, provided in an embodiment of this application. As shown in Figure 4, the automatic software development task generation system 40 based on multimodal requirement analysis provided in this embodiment includes:

[0345] The receiving module 401 is used to acquire the material to be processed uploaded by the user and preprocess the material to be processed to obtain preprocessed material, wherein the material to be processed is at least one of text modality, image modality, and audio modality;

[0346] The fusion module 402 is used to extract the modal features of each modality in the preprocessed material and generate a global fusion feature vector by combining the number of modalities in the preprocessed material, so as to map each modal feature to a unified semantic space;

[0347] The processing module 403 is used to extract the element information of the global fusion feature vector and organize the element information according to a preset structured format to obtain a structured intent set. The element information includes a first core intent, sub-intents, and constraints. The structured intent set includes a second core intent, a list of sub-intents, and a list of constraints.

[0348] The calculation module 404 is used to calculate the fitness value of each particle using the particle swarm optimization algorithm, based on the structured intent set, the preset information of the particles, and the preset evaluation index. The preset information of the particles includes the definition of dimension, position, and velocity. The dimension includes API interface design, database schema design, and test case scheme.

[0349] The generation module 405 is used to iteratively update the position and velocity of particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.

[0350] This embodiment provides an automatic software development task generation system based on multimodal requirement analysis, which can execute the methods provided in the above-described method embodiments. Its implementation principle and technical effects are similar, and will not be described in detail here.

[0351] Figure 5 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application. As shown in Figure 5, the electronic device 50 provided in this embodiment includes:

[0352] The device 50 includes at least one processor 501 and a memory 502. Optionally, the device 50 also includes a communication component 503. The processor 501, memory 502, and communication component 503 are connected via a bus 504.

[0353] In a specific implementation, the at least one processor 501 executes computer-executable instructions stored in the memory 502, causing the at least one processor 501 to perform the above-described method.

[0354] The specific implementation process of processor 501 can be found in the above method embodiments, and its implementation principle and technical effect are similar. It will not be repeated here.

[0355] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.

[0356] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.

[0357] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.

[0358] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.

[0359] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above-described method.

[0360] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.

[0361] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in the device.

[0362] The division of units is merely a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0363] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0364] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0365] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0366] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0367] Finally, it should be noted that other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein, and is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. A method for automatically generating software development tasks based on multimodal requirements analysis, characterized in that, The method includes: The system obtains user-uploaded materials to be processed and preprocesses the materials to be processed to obtain preprocessed materials, wherein the materials to be processed are at least one of text modality, image modality, and audio modality; Modal features of each modality in the preprocessed material are extracted and combined with the number of modalities in the preprocessed material to generate a global fusion feature vector, so as to map each modal feature to a unified semantic space; The global fusion feature vector is mapped to a semantic tag sequence containing functional keywords, relational tags, and conditional keywords. Element information is extracted from the global fusion feature vector and organized according to a preset structured format to obtain a structured intent set. The element information includes a first core intent, sub-intents, and constraints. Specifically, functional keywords in the semantic tags are filtered, and the top-level keyword is determined as the first core intent through relational tags. Functional keywords belonging to the first core intent are filtered, and keywords irrelevant to the core intent are removed to obtain sub-intents. Conditional keywords are filtered, and corresponding functions are associated through relational tags. Constraint statements are organized according to conditions, relationships, and functions to obtain constraints. The structured intent set includes a second core intent, a list of sub-intents, and a list of constraints. The second core intent is based on the first core intent and supplements it with domain information. The list of sub-intents labels each sub-intent with its type, associated elements, and is sorted by priority. The list of constraints labels each constraint with its associated sub-intent and constraint type. 
Using the particle swarm optimization algorithm, based on the structured intent set, the preset information of the particles, and the preset evaluation metrics, the fitness value of each particle is calculated. This includes generating API interface documents, data definition statements for creating database table structures, and test case frameworks based on the structured intent set; defining the preset information of the particles based on the API interface documents, data definition statements for creating database table structures, and test case frameworks. The preset information of the particles includes the definition of dimensions, positions, and velocities. The particle dimensions include API interface design, database schema design, and test case schemes. The particle positions correspond to specific task generation schemes. The particle velocities are the direction and step size of the particle's movement in the parameter space. Iteratively update the position and velocity of the particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.

2. The method according to claim 1, characterized in that, The process of extracting modal features for each modality in the preprocessed material and generating a global fusion feature vector by combining the number of modalities in the preprocessed material includes: Modal features of each modality are extracted from the preprocessed material, and the modal features corresponding to the text modality, image modality, and audio modality are respectively text feature vector, image feature vector, and audio feature vector; The number of modalities contained in the preprocessed material is determined based on the types of modal features. When the number of modalities is 2, the modal features corresponding to each modality are concatenated or element-wise operated to generate a global fusion feature vector; When the number of modalities is 3, each modal feature is independently encoded to generate an independent feature vector for each modality. The dynamic weight of each modality is calculated based on the importance of each modal feature. The independent feature vectors are then weighted and averaged to generate a global fused feature vector.

3. The method according to claim 1, characterized in that, The step of calculating the fitness value of each particle using a particle swarm optimization algorithm, based on the structured intent set, preset information of the particles, and preset evaluation metrics, further includes: Initialize the particle swarm and randomly generate the position and velocity of each particle, wherein the particle swarm includes a preset number of particles; Initialize the individual optimal position of each particle and the globally optimal position randomly set in the particle swarm, and obtain the fitness value of each particle according to the preset evaluation index and the corresponding rule weight.

4. The method according to claim 3, characterized in that, After obtaining the fitness value of each particle, the process includes: For each particle, if the current fitness value is greater than the initial fitness value, update the current best position of the individual to the current position of the particle; For a swarm of particles, if the highest current fitness value among all particles is greater than the initial fitness value of the particle at the initial global best position, then update the current global best position to the current position of the particle with the highest corresponding fitness value.

5. The method according to claim 4, characterized in that, The iterative update of particle positions and velocities to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value includes: The iteration update velocity is obtained based on the inertia weight, acceleration constants, and random numbers, and the current particle iteration velocity is determined based on the iteration update velocity and the velocity boundary. The iteration update position is obtained based on the current particle iteration velocity and the previous position of the corresponding particle, and the current particle iteration position is determined based on the iteration update position and the position boundary. Based on the fitness values of all particles in each iteration round within the preset maximum number of iterations, update the individual optimal position and global optimal position of each particle; The task generation strategy corresponding to the particle at the global optimal position is the optimal task generation strategy.

6. The method according to claim 5, characterized in that, after determining the task generation strategy corresponding to the particle at the global optimal position as the optimal task generation strategy, the method further includes: obtaining feedback data for a preset period and labeling the feedback data with attribute tags, the attribute tags including problem type, associated module, associated parameters, and degree of impact; merging multiple items of feedback data based on problem type and associated parameters, and counting the frequency of the feedback data for each problem type; obtaining a comprehensive weight of the feedback data for each problem type based on the degree of impact and the proportion of the frequency; calculating a unified adjustment value for the parameters based on the associated modules; adjusting the corresponding associated parameters in the order given by ranking the comprehensive weights; and verifying the results of the adjusted parameters based on the optimization rules.
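
The feedback aggregation in claim 6 can be sketched as follows. The field names, the sample records, and the formula (mean degree of impact multiplied by frequency share) are assumptions; the claim only says that the comprehensive weight is based on the degree of impact and the proportion of the frequency.

```python
from collections import defaultdict


def comprehensive_weights(feedback):
    """Merge feedback by (problem type, associated parameter) and rank by comprehensive weight."""
    groups = defaultdict(list)
    for item in feedback:
        groups[(item["problem_type"], item["parameter"])].append(item["impact"])

    total = len(feedback)
    weights = {}
    for key, impacts in groups.items():
        frequency_share = len(impacts) / total
        mean_impact = sum(impacts) / len(impacts)
        # Assumed combination rule: impact scaled by how often the problem is reported.
        weights[key] = mean_impact * frequency_share

    # Highest comprehensive weight first, giving the order in which parameters are adjusted.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical labeled feedback records.
feedback = [
    {"problem_type": "missing_field", "module": "api", "parameter": "w_api", "impact": 0.8},
    {"problem_type": "missing_field", "module": "api", "parameter": "w_api", "impact": 0.6},
    {"problem_type": "slow_query", "module": "db", "parameter": "w_db", "impact": 0.9},
    {"problem_type": "weak_test", "module": "test", "parameter": "w_test", "impact": 0.2},
]
ranked = comprehensive_weights(feedback)
```

Here `missing_field` ranks first because it is reported twice, even though a single `slow_query` report has a higher degree of impact.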

7. The method according to claim 6, characterized in that the step of calculating the unified adjustment value for the parameters based on the associated modules includes: when the adjustment directions of the same parameter are opposite, calculating the sum of the comprehensive weights for the positive direction and for the negative direction respectively, and determining the difference between the comprehensive weights of the positive and negative directions as the basis for adjustment.
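
The conflict resolution in claim 7 amounts to a signed sum, sketched below. The `(direction, weight)` tuple layout is hypothetical, and reading the sign of the result as the winning direction is an assumption; the claim only states that the difference serves as the basis for adjustment.

```python
def net_adjustment(votes):
    """Difference between positive-direction and negative-direction comprehensive weights.

    votes: iterable of (direction, comprehensive_weight), direction being +1 or -1.
    """
    positive = sum(weight for direction, weight in votes if direction > 0)
    negative = sum(weight for direction, weight in votes if direction < 0)
    return positive - negative


# Two feedback groups ask to raise a parameter, one asks to lower it.
basis = net_adjustment([(+1, 0.35), (-1, 0.225), (+1, 0.05)])
```

A positive result indicates that the requests to increase the parameter outweigh the requests to decrease it.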

8. A system for automatically generating software development tasks based on multimodal requirements analysis, characterized in that the system includes: a receiving module, used to acquire user-uploaded materials to be processed and to preprocess them to obtain preprocessed materials, wherein the materials to be processed are of at least one of a text modality, an image modality, and an audio modality; a fusion module, used to extract the modal features of each modality in the preprocessed materials and to generate a global fusion feature vector in combination with the number of modalities in the preprocessed materials, so as to map each modal feature to a unified semantic space; a processing module, used to map the global fusion feature vector into a semantic tag sequence containing functional keywords, relational tags, and conditional keywords, to extract element information from the global fusion feature vector, and to organize the element information according to a preset structured format to obtain a structured intent set, wherein the element information includes a first core intent, sub-intents, and constraints, and wherein extracting the element information includes: filtering the functional keywords from the semantic tag sequence and determining the top-level keyword as the first core intent by means of the relational tags; filtering the functional keywords that belong to the first core intent and removing keywords irrelevant to the first core intent to obtain the sub-intents; and filtering the conditional keywords, associating them with the corresponding functions by means of the relational tags, and organizing constraint statements according to condition, relationship, and function to obtain the constraints; wherein the structured intent set includes a second core intent, a sub-intent list, and a constraint list, the second core intent supplements the first core intent with domain information, the sub-intent list labels each sub-intent with its type and associated elements and sorts the sub-intents by priority, and the constraint list labels each constraint with its associated sub-intents and constraint type; a calculation module, used to calculate the fitness value of each particle using a particle swarm optimization algorithm, based on the structured intent set, preset information of the particles, and preset evaluation metrics, including: generating API interface documentation, data definition statements for creating database table structures, and a test case framework based on the structured intent set; and defining the preset information of the particles based on the API interface documentation, the data definition statements, and the test case framework, wherein the preset information of the particles includes definitions of dimension, position, and velocity, the particle dimensions include the API interface design, the database schema design, and the test case scheme, the particle position corresponds to a specific task generation scheme, and the particle velocity is the particle's movement direction and step size in the parameter space; and a generation module, used to iteratively update the positions and velocities of the particles to obtain the optimal task generation strategy corresponding to the particle with the highest fitness value.
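
The structured intent set handled by the processing module in claim 8 can be pictured as a small data structure. The class names, field names, sample values, the "domain: intent" string format, and the convention that a lower priority number sorts first are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SubIntent:
    name: str
    intent_type: str
    associated_elements: list
    priority: int  # Assumption: 1 is the highest priority.


@dataclass
class Constraint:
    statement: str  # Organized as condition, relationship, function.
    constraint_type: str
    associated_sub_intents: list


@dataclass
class StructuredIntentSet:
    core_intent: str  # Second core intent: first core intent plus domain information.
    sub_intents: list = field(default_factory=list)
    constraints: list = field(default_factory=list)


def build_intent_set(first_core_intent, domain, sub_intents, constraints):
    """Supplement the core intent with domain information and sort sub-intents by priority."""
    ordered = sorted(sub_intents, key=lambda s: s.priority)
    return StructuredIntentSet(
        core_intent=f"{domain}: {first_core_intent}",
        sub_intents=ordered,
        constraints=list(constraints),
    )


intent_set = build_intent_set(
    "user login",
    "e-commerce",
    [
        SubIntent("reset password", "function", ["email"], 2),
        SubIntent("verify credentials", "function", ["username", "password"], 1),
    ],
    [Constraint("if login fails 5 times, lock account", "security", ["verify credentials"])],
)
```

A structure of this kind would be the input from which the calculation module derives the API documentation, data definition statements, and test case framework.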

9. An electronic device, characterized in that it includes a memory and a processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-7.