GUI layout automatic generation method and system fusing retrieval enhancement and constraint reasoning
By integrating retrieval enhancement and constrained reasoning methods, and combining multidimensional similarity retrieval with a large-scale language model trained in three stages, a high-fidelity GUI layout that conforms to industrial aesthetic standards is generated, solving the problem that the generated results do not conform to the design intent in existing technologies.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to generate high-fidelity GUI layouts that meet industrial aesthetic standards, and large language models lack precise control and logic in two-dimensional spatial layout tasks.
We employ a method that integrates retrieval enhancement and constraint reasoning. By analyzing the designer's intent through constraint parsing and scene recognition modules, and combining layout generation scheduling and RAG modules for multi-dimensional similarity retrieval, we generate a GUI layout that conforms to the constraints using a large-scale language model trained in three stages.
It achieves high-fidelity, logical, and aesthetically aligned GUI layout generation, ensuring that the generated results meet the designer's intent and industrial aesthetic standards.
Figure CN122240108A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer-aided design technology, specifically relating to a method and system for automatically generating GUI layouts that integrates retrieval enhancement and constraint reasoning.
Background Technology
[0002] In modern software engineering and internet product development, the design of the graphical user interface (GUI) is a crucial element in determining the user experience. Traditional GUI design processes, especially for the layout design of standardized components such as cards, lists, and forms, require designers to spend a significant amount of time on pixel-level alignment, spacing adjustments, and multi-screen adaptation. While some rule-based or template-based automatic layout tools exist, they typically lack flexibility, struggle to understand the designer's ambiguous intentions, and cannot handle complex semantic constraints.
[0003] In recent years, deep learning-based generative methods (such as GANs and VAEs) have attempted to address this problem. However, most of them follow an image generation paradigm: they produce uneditable images with limited control over component placement, which makes them unsuitable for direct integration into existing industrial development processes. Large Language Models (LLMs) have brought major breakthroughs to code generation, but general-purpose LLMs are primarily adept at handling one-dimensional natural language or logic code. For GUI layout tasks that require strong two-dimensional spatial awareness and strict constraints, they often exhibit "spatial hallucinations": the generated layouts may be syntactically correct, yet the elements visually overlap, are misaligned, or fail to conform to design specifications.
[0004] Therefore, there is an urgent need for an intelligent code generation solution that can deeply understand GUI design languages, accurately respond to designers' multi-granular constraints, and generate high-fidelity code that meets industry aesthetic standards.
Summary of the Invention
[0005] To address the aforementioned technical problems, this invention provides a method and system for automatically generating GUI layouts that integrates retrieval enhancement and constraint reasoning. The technical solution adopted by this invention is as follows. An automatic GUI layout generation system that integrates retrieval enhancement and constraint reasoning includes:
The constraint parsing and scene recognition module, which receives GUI design sketches input by users, parses their multi-scene intentions on a two-dimensional plane, and converts the visual elements in the GUI design sketches into layout description language text that includes multi-condition mixed constraints.
The layout generation scheduling and RAG module, which takes the layout description language text with multi-condition mixed constraints as input, calls retrieval-augmented generation to obtain semantically and structurally relevant GUI layout examples from the knowledge base, constructs dynamic prompts that include chain-of-thought reasoning instructions, and calls a large language model to generate the complete layout description language text of the target GUI.
The high-fidelity design source file decompilation module, which receives the layout description language text of the target GUI, performs syntax parsing and structural verification, and decompiles it into a GUI design source file that can be rendered by the front end or edited by design software.
[0006] Furthermore, the large-scale language model is a model that has undergone progressive training and optimization in three stages: syntax, logic, and aesthetics, for the GUI layout task.
[0007] Furthermore, when constructing multi-condition hybrid constraints, the constraint parsing and scene recognition module parses the element constraints in the GUI design sketch into three different granularities of layout description language text, including: Strong constraints: For GUI elements whose positions or sizes have been locked by the designer, they are resolved into bounding box coordinates with definite values; Weak constraints: These correspond to GUI elements that the designer allows the algorithm to fine-tune and optimize, and are resolved into bounding box coordinates with adjustable markers; Constraints to be inferred: These correspond to blank areas where the designer only specifies the type but not the location, and are resolved into placeholders to be generated.
[0008] Furthermore, when performing retrieval enhancement generation, the layout generation scheduling and RAG module employs a multi-dimensional similarity scoring strategy to recall the Top-K standardized layout examples from the UI design knowledge base; the weighted components of the multi-dimensional similarity scoring strategy include: Component quantity similarity: The similarity is calculated based on the difference in the total number of GUI elements between the input GUI design sketch and the samples in the UI design knowledge base; Text semantic similarity: Using a pre-trained text embedding model, calculate the semantic cosine similarity between the text content in the input GUI design sketch and the sample content in the UI design knowledge base; Visual layout similarity: Using a pre-trained visual encoder, feature vectors are extracted from the input GUI design sketch and the wireframes rendered from the UI design knowledge base, and the cosine similarity of their visual features is calculated.
[0009] Furthermore, when constructing dynamic prompts, the layout generation scheduling and RAG module combines thought chain instructions with retrieval enhancement to generate retrieved GUI layout examples, and generates them by simulating the designer's reasoning path using a large language model; the reasoning path includes: Analyze the top-K standard layout examples recalled to summarize the design patterns and typography rules in the current scenario; Analyze the strong constraints, weak constraints, and constraints to be inferred in the layout description language text in the GUI design sketch; Based on design patterns and multi-condition hybrid constraints, coordinates are planned and assigned to GUI elements to generate the final GUI layout DSL.
[0010] A three-stage progressive training method for large language models used for GUI layout generation includes the following steps: S1, Supervised Fine-tuning Phase: Using a dataset that includes GUI design sketch information, samples, and real layout description language text, train the basic model to master the syntax format of GUI layout description language text and the basic layout paradigms in multiple scenarios. S2, SFT knowledge distillation stage: Using triplet data including input, reasoning path generated by teacher model and output, inject GUI design logic into the model after the supervised fine-tuning stage, so that it has the ability to output thought chain; S3, GRPO reinforcement learning stage: Using unlabeled sketch data and a pre-defined composite reward function, the model after the SFT knowledge distillation stage is optimized for preference alignment so that the generated results conform to GUI design aesthetic standards.
[0011] Furthermore, the composite reward function used in S3 includes: Formatting correctness reward: a binary gating reward used to determine whether the generated layout description language text conforms to the parser specification; Element overlap penalty: a negative reward that penalizes illegal overlap between the bounding boxes of any two GUI elements in the layout; Nine-square grid matching reward: a reward used to evaluate whether the positional distribution of the generated GUI elements in the canvas grid is consistent with the actual design draft; Layout description language text alignment reward: a reward based on Jaccard similarity, used to evaluate the degree of matching between the set of alignment rules declared in the generated layout description language text and the set of alignment rules in the actual design.
[0012] Furthermore, in S3, a policy gradient optimization method based on within-group normalization is adopted: the advantage value is obtained by applying within-group Z-score normalization to the total rewards of the candidate GUI layouts generated from the same input sketch, which eliminates the influence of differences in the numerical ranges of the individual reward items.
[0013] This invention offers the following advantages: It solves the problem of general-purpose LLMs lacking precise control, logical consistency, and aesthetic alignment when handling high-dimensional, multi-constraint, and multi-scene 2D GUI spatial layout tasks; it achieves deep alignment between designer intent and algorithm generation logic through the precise parsing of multi-granular constraints by the constraint parsing and scene recognition module; the multi-dimensional similarity retrieval strategy of the layout generation scheduling and RAG module ensures a high degree of matching between recalled examples and input scenes; the three-stage progressive training method optimizes model capabilities layer by layer, from syntactic foundations and logical reasoning to aesthetic alignment; and the composite reward function and within-group normalization strategy further guarantee the format correctness, layout rationality, and aesthetic quality of the generated results.
Attached Figure Description
[0014] Figure 1 is an overall architecture diagram of the system of the present invention; Figure 2 is a schematic diagram of the parsing of the multi-granularity hybrid constraint DSL in this invention; Figure 3 is a schematic diagram of the dynamic inference link that integrates RAG and CoT in this invention; Figure 4 is a schematic diagram of the three-stage training process and the GRPO reward function for GUI aesthetics in this invention; Figure 5 is a schematic diagram of the overall process and training procedure of the present invention.
Detailed Implementation
[0015] The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with Figures 1-5. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Unless otherwise specified, the technical means used in the embodiments are conventional means well known to those skilled in the art.
[0016] This invention proposes an automatic GUI layout generation system that integrates retrieval enhancement and constraint reasoning, mainly comprising three core processing modules: The Constraint Parsing and Scene Recognition Module receives GUI design sketches input by the user, parses their multi-scene intentions on a two-dimensional plane, and converts the visual elements in the sketches into Layout Description Language (DSL) text containing mixed constraints. As the system's perception entry point, this module is responsible for understanding the designer's sketches. It not only identifies component types in the sketches, such as buttons and text boxes, but more importantly, it uses algorithms to parse the designer's operational intentions into three levels of constraints: strong constraints, weak constraints, and constraints to be inferred. This mixed constraint expression method, for the first time in machine generation, retains the control of human designers. Specifically, the Constraint Parsing and Scene Recognition Module consists of a rule-based scene classification algorithm and a constraint syntax conversion program. It determines the scene category by statistically analyzing the type combinations of UI elements in the sketches and encodes the coordinate information of the elements into DSL text based on their constraint states.
[0017] The Layout Generation Scheduling and RAG module takes the DSL text containing multi-condition mixed constraints as input, invokes Retrieval-Augmented Generation (RAG) to obtain semantically and structurally relevant GUI layout examples from the knowledge base, constructs dynamic prompts containing Chain-of-Thought (CoT) reasoning instructions, and calls a large language model to generate the complete target GUI layout DSL. This module serves as the system's reasoning hub. Through the RAG mechanism, it calculates the similarity between the sketch and historical high-quality design drafts in the knowledge base across structural, semantic, and visual dimensions, and recalls the most relevant design paradigms as references. Prompts are then constructed dynamically with CoT instructions to guide the large model to simulate a senior designer's path of analysis, planning, and execution, so that it draws on the layout patterns of the reference examples while meeting user constraints. Specifically, the module is composed of a vector retrieval engine, a prompt construction program, a large language model inference service, and a deterministic rule verification engine. The vector retrieval engine performs similarity retrieval over both text and visual channels based on the FAISS vector index library and the SentenceTransformers text encoding model. The knowledge base is stored in the Neo4j graph database. The large language model inference service is deployed through the vLLM framework on a GPU-equipped server. The rule verification engine performs overlap detection, boundary verification, and alignment verification on the CPU using deterministic algorithms.
[0018] The high-fidelity design source file decompilation module receives the target GUI layout DSL, performs syntax parsing and structural verification, and decompiles it into a high-fidelity GUI design source file that can be rendered by the front end or edited by design software. This module serves as the system's output. It maps the intermediate DSL code generated from the large model into HTML / CSS code directly usable by front-end development or a JSON file recognizable by design software (such as Figma) through rigorous syntax tree parsing. Specifically, the high-fidelity design source file decompilation module consists of an HTML / CSS code generator and a JSON format transpiler for design software, mapping the intermediate DSL representation into renderable HTML / CSS code or editable JSON source files for design software (such as Figma) through syntax tree parsing.
[0019] Specifically, when parsing the multi-condition hybrid constraints, the constraint parsing and scene recognition module parses the element constraints in the GUI sketch into three different granularity DSL representations: (1) Strong constraints: For GUI elements whose positions or sizes have been locked by the designer, they are resolved into bounding box coordinates with precise values; (2) Weak constraints: GUI elements that correspond to the designer’s general placement and allow the algorithm to make fine adjustments and optimizations are parsed into bounding box coordinates with fine-tunable markers; (3) Constraints to be inferred: The blank areas that correspond to the designer only specifying the type but not the location are parsed as placeholders to be generated.
[0020] Specifically, when performing RAG retrieval, the layout generation scheduling and RAG module employs a GUI-specific multidimensional similarity scoring strategy to recall the Top-K standardized layout examples from the vertical UI design knowledge base. The weighted components of the scoring strategy include: Component quantity similarity: The similarity is calculated based on the difference in the total number of GUI elements between the input sketch and the samples in the library; Text semantic similarity: Using a pre-trained text embedding model, calculate the semantic cosine similarity between the text content in the input sketch and the sample content in the library; Visual layout similarity: Using a pre-trained visual encoder, feature vectors are extracted from the input sketch and the rendered wireframes of the samples in the library, and the cosine similarity of their visual features is calculated.
[0021] Specifically, when constructing dynamic prompts, the layout generation scheduling and RAG module combines Chain-of-Thought (CoT) instructions with the GUI layout examples retrieved by RAG, so that the large language model follows the reasoning path of a simulated designer during generation. This reasoning path includes: S1. Analyze the Top-K GUI layout examples recalled by RAG and summarize the design patterns and layout rules in the current scenario; S2. Analyze the strong constraints, weak constraints, and constraints to be inferred in the user sketch DSL; S3. Based on the design patterns and constraints described above, plan and assign coordinates to GUI elements to generate the final GUI layout DSL.
[0022] This invention also proposes a three-stage progressive training method for large-scale language models used for GUI layout generation, comprising the following steps: S1, Supervised Fine-tuning (SFT) phase: Using data pairs containing sketches, examples, and real layout DSLs, train the base model to master the syntax format of GUI layout DSLs and basic layout paradigms in multiple scenarios; S2, SFT knowledge distillation stage: Using triplet data containing input, reasoning path generated by teacher model and output, inject GUI design logic into the SFT model to enable it to output thought chain; S3, GRPO reinforcement learning stage: Using unlabeled sketch data and a pre-defined composite reward function, the model after SFT distillation is optimized for preference alignment so that the generated result conforms to the GUI design aesthetic standards.
[0023] Specifically, the composite reward function used in the S3 stage is designed for quantifying GUI layout quality, and its components include: Formatting correctness reward: a binary gated reward used to determine whether the generated DSL text conforms to the parser specification; Element overlap penalty: a negative reward that penalizes illegal overlap between the bounding boxes of any two GUI elements in the layout; Nine-square grid matching reward: a reward used to evaluate whether the macroscopic positional distribution of the generated GUI elements under the canvas grid division is consistent with the actual design draft; DSL alignment reward: a reward based on Jaccard similarity, used to evaluate the degree of matching between the set of alignment rules declared in the generated DSL and the set of alignment rules in the actual design.
[0024] Specifically, the GRPO reinforcement learning in the S3 stage employs a policy gradient optimization method based on within-group normalization: the advantage value is obtained by applying within-group Z-score normalization to the total rewards of the candidate GUI layouts generated from the same input sketch, which eliminates the influence of differences in the numerical ranges of the individual reward items.
[0025] Specifically, the high-fidelity design source file decompilation module is used to: convert the generated target GUI layout DSL into an in-memory abstract syntax tree (AST) using a strict DSL parser; traverse the abstract syntax tree and, according to the target platform specification, map the attributes of GUI elements into JSON description files that can be recognized by the front-end code or design software plugins.
[0026] As shown in Figure 1, the GUI layout automatic generation system integrating retrieval enhancement and constraint reasoning of this invention includes, in its specific implementation, three core processing stages: input parsing, reasoning generation, and output decompilation. 1. Constraint Parsing and Scene Recognition Phase: This phase is executed by the constraint parsing and scene recognition module. Assume the user has drawn a rough sketch of a "login page" on the front-end drawing board, which includes a logo placeholder at the top, two input boxes (username and password) in the middle, and a button at the bottom.
[0027] Scene recognition: The module first extracts features from the elements in the input sketch. By detecting the combination of the two key element types "TextInput" and "Button", and the characteristic of a small number of page elements (usually less than 10), the algorithm automatically calculates the scene confidence and determines that the current design intent is a "Form" scene.
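The rule-based scene check described above can be illustrated with a minimal sketch. The element-type names "TextInput" and "Button" and the threshold of 10 come from the text; the function name `classify_scene`, the equal-weight confidence formula, and the "Unknown" fallback label are assumptions made only for illustration and are not taken from the patent.

```python
from collections import Counter


def classify_scene(element_types, max_form_elements=10):
    """Return (scene_label, confidence) for a list of element type names.

    Hypothetical rule: a page is a "Form" when it contains at least one
    TextInput, at least one Button, and fewer than `max_form_elements`
    elements. Confidence is the fraction of these three rules satisfied.
    """
    counts = Counter(element_types)
    rules = [
        counts["TextInput"] > 0,                  # has an input box
        counts["Button"] > 0,                     # has a button
        len(element_types) < max_form_elements,   # small number of elements
    ]
    confidence = sum(rules) / len(rules)
    label = "Form" if all(rules) else "Unknown"
    return label, confidence


# The login-page sketch from the example: logo, two inputs, one button.
print(classify_scene(["Icon", "TextInput", "TextInput", "Button"]))
```

A real classifier would presumably cover more scene categories (cards, lists, dashboards); this sketch only shows the shape of a type-combination rule.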
[0028] Multi-condition mixed constraint resolution: The module traverses all primitives in the sketch and transforms their geometric properties into constraint expressions in the Layout Description Language (DSL). Strong constraint parsing: The module detects that the user has set the "locked" attribute for the top logo element. It extracts its precise coordinates (e.g., x=20, y=20, w=50, h=50) and generates a strong constraint instruction in the DSL: Icon: -id: <logo>-boundingbox: from (20,20) to (70,70). This instructs the model not to change these coordinates during generation.
[0029] Weak constraint parsing: The module detects that the user has dragged and placed two input boxes roughly in the center of the screen without locking them. It extracts their approximate range and generates a weak constraint instruction in the DSL: TextInput: -id: <user>-boundingbox: ~from(50,100) to (300,140). The symbol "~" indicates that the model can be fine-tuned in terms of coordinates to achieve alignment while maintaining relative position.
[0030] Constraint resolution for inference: The user drew a box at the bottom without specifying its exact location, only marking it as "Login". The module generates the following instruction for inference in the DSL: Button: -id: <login>-boundingbox: (Generate). This instructs the model to automatically calculate the optimal position for the button based on the layout logic.
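The three constraint granularities above can be sketched as a small encoder that turns sketch elements into DSL lines. The DSL surface syntax here follows the examples in the text as closely as the extracted text allows; the `Element` class, its field names, and the exact spacing of the emitted line are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Element:
    kind: str                                   # e.g. "Icon", "TextInput", "Button"
    elem_id: str                                # e.g. "logo"
    box: Optional[Tuple[int, int, int, int]]    # (x, y, w, h), or None if unplaced
    locked: bool = False                        # True when the designer locked it


def to_dsl(e: Element) -> str:
    """Encode one element as a DSL line according to its constraint state."""
    prefix = f"{e.kind}: -id: <{e.elem_id}> -boundingbox: "
    if e.box is None:
        # Constraint to be inferred: the model must generate the position.
        return prefix + "(Generate)"
    x, y, w, h = e.box
    span = f"from ({x},{y}) to ({x + w},{y + h})"
    # Strong constraint: exact coordinates. Weak constraint: "~" marker.
    return prefix + (span if e.locked else "~" + span)


sketch = [
    Element("Icon", "logo", (20, 20, 50, 50), locked=True),
    Element("TextInput", "user", (50, 100, 250, 40)),
    Element("Button", "login", None),
]
for element in sketch:
    print(to_dsl(element))
```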
[0031] 2. Layout Generation Scheduling and RAG Phase: This phase is executed by the Layout Generation Scheduling and RAG module and is the core of inference.
[0032] (1) Multidimensional Similarity RAG Retrieval: The system uses the parsed DSL text and the wireframe rendering of the sketch as query conditions to perform a retrieval in a pre-set vertical UI design knowledge base. The retrieval algorithm adopts a multidimensional weighted strategy: Calculate the similarity of the number of components and quickly filter out samples with too large a difference in the number of elements (such as complex dashboards); The BERT model is used to calculate the semantic similarity of texts and match samples with the same semantic meaning, such as "Login" and "Password". The CLIP visual encoder is used to extract visual feature vectors from the sketches, and visual layout similarity is calculated. The system ultimately recalls the top-3 best-matching canonical login page layouts (DSLs) as reference examples.
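As a rough illustration of the weighted multidimensional score, the sketch below combines the three similarity terms and recalls the top-k samples. The embeddings are assumed to be precomputed (for example by a text encoder and a visual encoder); the dictionary keys, the count-similarity formula, and the weights (0.2, 0.4, 0.4) are hypothetical, since the text does not state them.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_similarity(n_query, n_sample):
    """Assumed formula: 1 minus the relative difference in element counts."""
    return 1.0 - abs(n_query - n_sample) / max(n_query, n_sample, 1)


def score(query, sample, weights=(0.2, 0.4, 0.4)):
    """Weighted sum of count, text-semantic, and visual-layout similarity."""
    w_count, w_text, w_visual = weights
    return (
        w_count * count_similarity(query["n"], sample["n"])
        + w_text * cosine(query["text"], sample["text"])
        + w_visual * cosine(query["visual"], sample["visual"])
    )


def top_k(query, samples, k=3):
    """Return the k knowledge-base samples with the highest combined score."""
    return sorted(samples, key=lambda s: score(query, s), reverse=True)[:k]
```

In a deployment like the one described, the cosine terms would more likely be served by a vector index than by a linear scan; the scan here only keeps the example self-contained.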
[0033] (2) Dynamic Chain-of-Thought (CoT) Prompt Construction: The module dynamically assembles the prompt that is fed to the large model. The prompt structure includes: System command: Define the model identity as "Senior UI Designer" and provide the standard syntax definition for the DSL.
[0034] Reference Context: Fill in the three high-quality login page DSLs recalled above as material for In-Context Learning.
[0035] User Tasks and CoT Instructions: Enter the user's constrained DSL and attach specific inference instructions: "First, analyze the alignment patterns in the reference sample (such as vertical centering, left alignment); then analyze the strong / weak constraints in the user sketch; finally, synthesize and plan to generate the target layout."
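The three-part prompt structure above could be assembled roughly as follows. This is a sketch under stated assumptions: the function name `build_prompt`, the section labels, and the separators are hypothetical, while the role string and the CoT instruction are quoted from the text.

```python
COT_INSTRUCTION = (
    "First, analyze the alignment patterns in the reference sample "
    "(such as vertical centering, left alignment); then analyze the "
    "strong / weak constraints in the user sketch; finally, synthesize "
    "and plan to generate the target layout."
)


def build_prompt(dsl_syntax, examples, user_dsl):
    """Assemble system command, reference context, and user task into one prompt."""
    # System command: model identity plus the DSL syntax definition.
    system = "You are a Senior UI Designer.\nDSL syntax:\n" + dsl_syntax
    # Reference context: the recalled layouts, used for in-context learning.
    context = "\n\n".join(
        f"Reference example {i}:\n{example}"
        for i, example in enumerate(examples, start=1)
    )
    # User task: the constrained sketch DSL followed by the CoT instruction.
    task = "User sketch DSL:\n" + user_dsl + "\n\n" + COT_INSTRUCTION
    return "\n\n".join([system, context, task])
```

Placing the CoT instruction last keeps the reasoning request adjacent to the point where generation begins, which is one plausible ordering rather than a requirement stated in the text.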
[0036] (3) LLM Inference: The above prompt is fed into a large language model that has been specifically trained. The model first outputs a piece of reasoning text, such as: "The reference sample shows that login forms usually use a vertical flow layout. The user has locked the logo position and requires the input boxes to be in the middle. Therefore, I will keep the logo still, left-align the two input boxes and distribute them vertically, and finally place the login button 20 pixels below the input boxes." Then, the model outputs the final DSL code.
[0037] 3. High-fidelity design source file decompilation stage: This stage is executed by the high-fidelity design source file decompilation module.
[0038] Syntax validation and AST construction: The module has a built-in strict DSL parser that performs validity checks on the DSL output by the model (such as bracket matching and coordinate value validity). After the validation passes, the DSL text is parsed into an abstract syntax tree (AST) in memory.
[0039] Code generation: Traverse the AST nodes and transform them according to the requirements of the target platform. For example, if the target is web development, the module generates HTML / CSS code, converting the boundingbox in the DSL into CSS styles such as `position: absolute; left:...; top: ...`; if the target is design software (such as Figma), it generates a JSON description file that conforms to its plugin API specification, enabling automatic drawing on the design software canvas.
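The mapping from a DSL bounding box to absolute-positioned CSS can be sketched as below. The regular expression is an assumption about the DSL's surface form inferred from the examples in the text, and the `px` unit and the `width`/`height` properties are illustrative choices; a full implementation would walk the AST rather than match lines.

```python
import re

# Matches "from (x1,y1) to (x2,y2)", with an optional leading "~" marker.
BOX_RE = re.compile(r"~?from\s*\((\d+),\s*(\d+)\)\s*to\s*\((\d+),\s*(\d+)\)")


def parse_box(dsl_line):
    """Extract (x1, y1, x2, y2) from a DSL line, or raise ValueError."""
    match = BOX_RE.search(dsl_line)
    if match is None:
        raise ValueError(f"no bounding box in: {dsl_line!r}")
    return tuple(int(group) for group in match.groups())


def box_to_css(x1, y1, x2, y2):
    """Convert corner coordinates into an absolute-positioning CSS declaration."""
    return (
        f"position: absolute; left: {x1}px; top: {y1}px; "
        f"width: {x2 - x1}px; height: {y2 - y1}px;"
    )


line = "Icon: -id: <logo>-boundingbox: from (20,20) to (70,70)"
print(box_to_css(*parse_box(line)))
```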
[0040] As shown in Figure 3, the three-stage progressive training method for large-scale language models used in GUI layout generation, as described in this invention, proceeds as follows: Phase 1, DSL Syntax Adaptation (Supervised Fine-tuning, SFT): (1) Data preparation: Collect approximately 2,000 high-quality UI layout samples. Using automated scripts, convert each actual design draft into a "Ground-Truth DSL" and apply random masking to it (such as randomly replacing some coordinates with (Generate) or adding ~ markers) to construct "sketch input / complete output" data pairs.
[0041] (2) Training process: Use the above data to fine-tune all parameters of the base model (such as the Qwen series).
[0042] (3) Objective: This stage does not require the model to understand deep aesthetics, but only requires the model to master the special syntactic structure of DSL (such as key-value pair format, indentation rules) and the basic structural paradigms of different scenarios, so as to solve the problem of high error rate of general model generation format.
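The random masking used to build Phase 1 training pairs might look like the sketch below. The masking probabilities, the function name `mask_dsl_line`, and the assumption that each DSL line contains the literal `-boundingbox: ` separator are all hypothetical; the text only states that some coordinates are replaced with (Generate) and some receive a ~ marker.

```python
import random


def mask_dsl_line(line, rng, p_generate=0.3, p_weak=0.3):
    """Randomly degrade one ground-truth DSL line into a sketch-style line.

    With probability p_generate the box becomes "(Generate)"; with
    probability p_weak it gets a "~" marker; otherwise it is left intact.
    """
    head, sep, box = line.partition("-boundingbox: ")
    if not sep:
        return line  # no bounding box on this line, nothing to mask
    r = rng.random()
    if r < p_generate:
        return head + sep + "(Generate)"
    if r < p_generate + p_weak:
        return head + sep + "~" + box
    return line


def make_pair(ground_truth_lines, seed=0):
    """Build one (sketch input, complete output) training pair."""
    rng = random.Random(seed)
    masked = [mask_dsl_line(line, rng) for line in ground_truth_lines]
    return "\n".join(masked), "\n".join(ground_truth_lines)
```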
[0043] Phase Two, Design Logic Injection (SFT Knowledge Distillation): (1) Data preparation: Construct triplet data using a teacher model with stronger reasoning ability (such as GPT-4o). Specifically, feed the "sketching input" and "real layout output" to the teacher model at the same time, and ask it to generate a "reasoning path" to explain the derivation process from input to output (e.g., explain why it should be left aligned and why the spacing should be set to 8px).
[0044] (2) Training process: The model in stage one is retrained using a dataset containing "input-inference path-output".
[0045] (3) Objective: Through knowledge distillation, the "design thinking" of the teacher model is injected into the small model, so that it can learn to make logical plans before generating the layout, which significantly improves the structural rationality of the layout.
[0046] Phase 3, Aesthetic Preference Alignment (GRPO Reinforcement Learning): (1) Data preparation: Collect approximately 5,000 unlabeled GUI sketches (DSLs) as prompt input. Composite reward function design: This is the core innovation of this invention. To quantify "aesthetics," this invention constructs a composite reward function (see Figure 4) containing four components: Formatting correctness reward: If the generated DSL cannot be parsed by the parser, 0 points are given; otherwise, the base score is given.
[0047] Element overlap penalty: Calculates the intersection over union (IoU) between all pairs of elements in the layout. If illegal overlaps that are not parent-child relationships are found (such as text obscuring an image), a high negative score penalty is imposed.
[0048] Nine-square grid macro matching reward: Divide the canvas into a 3x3 grid. Evaluate whether the generated elements fall within a reasonable macro area (e.g., the title should typically be in the top three grid cells).
[0049] DSL alignment reward: Analyze whether the generated DSL explicitly declares rules such as Align Left and Distribute Vertical, and calculate the Jaccard similarity between these rules and the rule set of the actual design draft.
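The four reward components can be sketched together as follows. This is a simplified illustration, not the patent's formula: the unweighted sum, the base score of 1.0, gating the whole reward to 0 on a parse failure, and pairing predicted and reference elements by index are assumptions, and the parent-child exemption for overlaps is omitted for brevity. Boxes are (x1, y1, x2, y2) tuples.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap_penalty(boxes):
    """Negative reward: minus the summed IoU over all pairs of boxes."""
    return -sum(iou(a, b) for a, b in combinations(boxes, 2))


def grid_cell(box, width, height):
    """Index (0..8) of the 3x3 canvas cell containing the box center."""
    col = min(int(3 * (box[0] + box[2]) / 2 / width), 2)
    row = min(int(3 * (box[1] + box[3]) / 2 / height), 2)
    return row * 3 + col


def grid_match(pred, truth, width, height):
    """Fraction of elements whose center lands in the same cell as the reference."""
    if not truth:
        return 0.0
    hits = sum(
        grid_cell(p, width, height) == grid_cell(t, width, height)
        for p, t in zip(pred, truth)
    )
    return hits / len(truth)


def jaccard(a, b):
    """Jaccard similarity of two rule sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def composite_reward(parsed_ok, pred, truth, pred_rules, truth_rules, width, height):
    """Assumed combination: format gate, then an unweighted sum of the terms."""
    if not parsed_ok:
        return 0.0
    return (
        1.0
        + overlap_penalty(pred)
        + grid_match(pred, truth, width, height)
        + jaccard(pred_rules, truth_rules)
    )
```

Because the individual terms live on different numerical scales, a weighted sum would be a natural refinement; the text itself relies on within-group normalization of the total reward during training.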
[0050] The training process employs the GRPO (Group Relative Policy Optimization) algorithm. For each input sketch, the model samples a group of different layout candidates (e.g., 4). These candidates are scored using the reward function described above. The advantage value of each token in a candidate is calculated by group-wide Z-score normalization (i.e., the candidate's score minus the group mean score, divided by the group standard deviation). The model parameters are updated using this advantage value.
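The group-wide Z-score normalization can be written in a few lines. In this sketch the use of the population standard deviation and the small `eps` term that guards against a zero-variance group are assumptions; the text only specifies subtracting the group mean and dividing by the group standard deviation.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """Z-score each candidate's total reward within its sampling group.

    `rewards` holds the total rewards of the candidates generated from one
    input sketch. Every token of a candidate shares that candidate's advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (assumed)
    return [(r - mean) / (std + eps) for r in rewards]


# Four candidates sampled for the same sketch.
print(group_advantages([1.0, 2.0, 3.0, 4.0]))
```

Candidates scoring above the group mean receive a positive advantage and those below it a negative one, so the update depends on relative quality within the group rather than on the absolute reward scale.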
[0051] By using reinforcement learning, the model is forced to learn the "implicit preferences" of human designers through exploration, thus solving the aesthetic defects that SFT models are prone to, such as minor misalignments and loose layouts, and ultimately achieving high-fidelity layout generation.
[0052] The above embodiments are merely descriptions of preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Any modifications, alterations, or substitutions made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.
Claims
1. A GUI layout automatic generation system that fuses retrieval enhancement and constraint reasoning, characterized in that it comprises: a constraint parsing and scene recognition module, used to receive GUI design sketches input by users, parse their multi-scene intentions on a two-dimensional plane, and convert the visual elements in the GUI design sketches into layout description language text that includes multi-condition mixed constraints; a layout generation scheduling and RAG module, used to take the layout description language text with multi-condition mixed constraints as input, call retrieval-augmented generation to obtain semantically and structurally relevant GUI layout examples from the knowledge base, construct dynamic prompts including chain-of-thought reasoning instructions, and call a large language model to generate the complete layout description language text of the target GUI; and a high-fidelity design source file decompilation module, used to receive the layout description language text of the target GUI, perform syntax parsing and structural verification, and decompile it into a GUI design source file that can be rendered by the front end or edited by design software.
2. The GUI layout automatic generation system integrating enhanced retrieval and constrained reasoning as described in claim 1, characterized in that the large language model is a model that has been progressively trained and optimized for GUI layout tasks in three stages: syntax, logic, and aesthetics.
3. The GUI layout automatic generation system integrating enhanced retrieval and constrained reasoning according to claim 1, characterized in that, when constructing the multi-condition mixed constraints, the constraint parsing and scene recognition module parses the element constraints in the GUI design sketch into layout description language text at three different granularities, including: strong constraints, which correspond to GUI elements whose positions or sizes have been locked by the designer and are resolved into bounding box coordinates with definite values; weak constraints, which correspond to GUI elements that the designer allows the algorithm to fine-tune and optimize and are resolved into bounding box coordinates with adjustable markers; and constraints to be inferred, which correspond to blank areas where the designer specifies only the type but not the location and are resolved into placeholders to be generated.
4. The GUI layout automatic generation system integrating enhanced retrieval and constrained reasoning according to claim 3, characterized in that, when performing retrieval-augmented generation, the layout generation scheduling and RAG module employs a multi-dimensional similarity scoring strategy to recall the Top-K standardized layout examples from the UI design knowledge base, and the weighted components of the multi-dimensional similarity scoring strategy include: component quantity similarity, calculated based on the difference in the total number of GUI elements between the input GUI design sketch and the samples in the UI design knowledge base; text semantic similarity, calculated with a pre-trained text embedding model as the semantic cosine similarity between the text content in the input GUI design sketch and the sample content in the UI design knowledge base; and visual layout similarity, calculated by extracting feature vectors with a pre-trained visual encoder from the input GUI design sketch and from the wireframes rendered from the UI design knowledge base and computing the cosine similarity of their visual features.
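For illustration only, and not as part of the claims, the weighted multi-dimensional similarity scoring of claim 4 can be sketched as follows. The class `LayoutSample`, the count-similarity formula, the weights `(0.2, 0.4, 0.4)`, and the toy vectors are hypothetical assumptions; in practice the embedding vectors would come from the pre-trained text embedding model and visual encoder, which this sketch does not call.

```python
import math
from dataclasses import dataclass


@dataclass
class LayoutSample:
    name: str
    num_elements: int
    text_vec: list    # assumed output of a text embedding model
    visual_vec: list  # assumed output of a visual encoder


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_similarity(n_query, n_sample):
    """Assumed formula: 1 minus the relative difference in element counts."""
    larger = max(n_query, n_sample)
    return 1.0 - abs(n_query - n_sample) / larger if larger else 1.0


def score(query, sample, weights=(0.2, 0.4, 0.4)):
    """Weighted sum of the three components; the weights are illustrative."""
    w_count, w_text, w_visual = weights
    return (w_count * count_similarity(query.num_elements, sample.num_elements)
            + w_text * cosine(query.text_vec, sample.text_vec)
            + w_visual * cosine(query.visual_vec, sample.visual_vec))


def top_k(query, knowledge_base, k=2):
    """Recall the k highest-scoring samples from the knowledge base."""
    return sorted(knowledge_base, key=lambda s: score(query, s), reverse=True)[:k]


query = LayoutSample("sketch", 5, [1.0, 0.0], [0.0, 1.0])
knowledge_base = [
    LayoutSample("A", 5, [1.0, 0.0], [0.0, 1.0]),
    LayoutSample("B", 10, [0.0, 1.0], [1.0, 0.0]),
    LayoutSample("C", 4, [1.0, 1.0], [0.0, 1.0]),
]
recalled = [sample.name for sample in top_k(query, knowledge_base)]
print(recalled)
```

Keeping the three components as separate functions allows the weights to be tuned per scenario without changing how the components themselves are computed.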
5. The GUI layout automatic generation system integrating enhanced retrieval and constrained reasoning according to claim 4, characterized in that, when constructing the dynamic prompts, the layout generation scheduling and RAG module combines chain-of-thought instructions with the GUI layout examples retrieved through retrieval-augmented generation, and the large language model generates the layout by simulating the designer's reasoning path; the reasoning path includes: analyzing the recalled Top-K standardized layout examples to summarize the design patterns and typography rules in the current scenario; analyzing the strong constraints, weak constraints, and constraints to be inferred in the layout description language text of the GUI design sketch; and, based on the design patterns and the multi-condition mixed constraints, planning and assigning coordinates to GUI elements to generate the final GUI layout DSL.
6. A three-stage progressive training method for large language models used for GUI layout generation, characterized in that the method is applied to the large language model of the GUI layout automatic generation system fusing retrieval enhancement and constraint reasoning as described in claim 2, and includes the following steps: S1, supervised fine-tuning stage: using a dataset that includes GUI design sketch information, samples, and real layout description language text, the base model is trained to master the syntax format of GUI layout description language text and the basic layout paradigms in multiple scenarios; S2, SFT knowledge distillation stage: using triplet data consisting of the input, the reasoning path generated by the teacher model, and the output, GUI design logic is injected into the model obtained from the supervised fine-tuning stage, enabling it to output chains of thought; S3, GRPO reinforcement learning stage: using unlabeled sketch data and a pre-defined composite reward function, the model obtained from the SFT knowledge distillation stage is optimized for preference alignment so that the generated results conform to GUI design aesthetic standards.
7. The three-stage progressive training method for large language models for GUI layout generation according to claim 6, characterized in that, in S3, the composite reward function includes: a format correctness reward, which is a binary gating reward used to determine whether the generated layout description language text conforms to the parser specification; an element overlap penalty, which is a negative reward used to calculate and penalize illegal overlap between the bounding boxes of any two GUI elements in the layout; a nine-square grid matching reward, which is used to evaluate whether the positional distribution of the generated GUI elements in the canvas grid is consistent with the actual design draft; and a layout description language text alignment reward, which is based on Jaccard similarity and is used to evaluate the degree of matching between the set of alignment rules declared in the generated layout description language text and the set of alignment rules in the actual design.
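For illustration only, and not as part of the claims, the four reward components of claim 7 can be sketched as follows. How the components are weighted and combined is not stated in this claim, so the unweighted sum behind a format gate is an assumption. The function names, the `(x1, y1, x2, y2)` box format, the pairing of generated and reference boxes by index, the treatment of every overlap as illegal, and the rule strings are likewise hypothetical.

```python
from itertools import combinations


def overlap_penalty(boxes):
    """Negative fraction of box pairs that overlap (all overlap assumed illegal)."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0

    def overlaps(a, b):
        width = min(a[2], b[2]) - max(a[0], b[0])
        height = min(a[3], b[3]) - max(a[1], b[1])
        return width > 0 and height > 0

    hits = sum(1 for a, b in pairs if overlaps(a, b))
    return -hits / len(pairs)


def grid_cell(box, canvas):
    """Index (0..8) of the nine-square grid cell containing the box center."""
    canvas_w, canvas_h = canvas
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    col = min(int(cx / (canvas_w / 3)), 2)
    row = min(int(cy / (canvas_h / 3)), 2)
    return row * 3 + col


def grid_match_reward(boxes, ref_boxes, canvas):
    """Fraction of elements in the same grid cell as the reference (index-paired)."""
    if not ref_boxes:
        return 0.0
    same = sum(1 for b, r in zip(boxes, ref_boxes)
               if grid_cell(b, canvas) == grid_cell(r, canvas))
    return same / len(ref_boxes)


def jaccard(rules, ref_rules):
    """Jaccard similarity of two alignment-rule sets (1.0 if both are empty)."""
    union = rules | ref_rules
    return len(rules & ref_rules) / len(union) if union else 1.0


def composite_reward(format_ok, boxes, ref_boxes, rules, ref_rules, canvas):
    """Format gate first, then an assumed unweighted sum of the other terms."""
    if not format_ok:
        return 0.0  # binary gate: output that fails to parse earns nothing
    return (overlap_penalty(boxes)
            + grid_match_reward(boxes, ref_boxes, canvas)
            + jaccard(rules, ref_rules))


canvas = (300, 300)
boxes = [(0, 0, 100, 100), (50, 50, 150, 150), (200, 200, 300, 300)]
ref_boxes = [(0, 0, 90, 90), (110, 110, 190, 190), (210, 210, 290, 290)]
rules = {"left:a,b", "top:b,c"}
ref_rules = {"left:a,b", "center:a,c"}
total = composite_reward(True, boxes, ref_boxes, rules, ref_rules, canvas)
print(round(total, 3))
```

Placing the format check first means the geometric terms are only evaluated on output that can actually be parsed into bounding boxes.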
8. The three-stage progressive training method for large language models for GUI layout generation according to claim 6, characterized in that, in S3, a policy gradient optimization method based on intra-group normalization is used, and its advantage value is obtained by applying intra-group Z-Score normalization to the total rewards of the multiple candidate GUI layouts generated for the same input sketch, so as to eliminate the influence of differences in the numerical ranges of the different reward items.