Intelligent home style design method and system based on generative adversarial network

The smart home stylization design method using generative adversarial networks solves the problem that visual images cannot be converted into structured data in existing technologies. It achieves precise alignment between visual effects and structured design data, improves the practicality and feasibility of the design, and meets the diverse needs of industry and education.

CN122242274A · Pending · Publication Date: 2026-06-19 · GUANGZHOU INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU INST OF TECH
Filing Date
2026-04-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing smart home stylized design methods only generate simple visual images, lacking structured design data. They cannot be integrated with CAD detailed design, manufacturing, and teaching training. Furthermore, the generation process lacks consistency constraints between image generation and data generation, which makes it impossible to meet the needs of industrial applications and industry-education integration.

Method used

A smart home stylized design method based on generative adversarial networks is adopted. The method extracts text, image, and structural features through a multimodal fusion encoder, generates conditional guidance vectors, and uses a dual-channel generative adversarial network to simultaneously output home scene effect images and structured design data. Combined with differentiable rendering consistency verification and an interactive optimization interface, the method ensures accurate alignment between visual effects and structured data.

Benefits of technology

It enables the simultaneous generation of visual effects and structured design data, improving the practicality and feasibility of design outcomes. It can directly connect to downstream processes, reduce connection costs, and improve design accuracy and reliability, meeting the needs of industry and education.



Abstract

This invention discloses a method and system for stylized design of smart homes based on generative adversarial networks (GANs), relating to the field of computer-aided smart home design. It employs a multimodal fusion encoder to encode and fuse stylized design information, extracting text, image, and structural features respectively, and then deeply fusing them through an attention mechanism to generate conditional guidance vectors containing spatial layout and stylistic features. These conditional guidance vectors are then input into a pre-trained dual-channel GAN. This invention solves the problem that existing AI-powered smart home stylized designs can only generate uneditable visual images, achieving simultaneous generation of visual effects and structured design data, thus improving the practicality and feasibility of the design results. Through a dual-channel generation architecture and differentiable rendering consistency constraints, it ensures precise alignment between visual images and structured data, improving design accuracy and reliability.

Description

Technical Field

[0001] This invention relates to the field of smart home computer design technology, and in particular to a method and system for stylized smart home design based on generative adversarial networks.

Background Technology

[0002] In the field of smart home design, artificial intelligence technology is being applied more and more widely to improve design efficiency and meet diverse style requirements. Among them, stylized design methods based on generative adversarial networks have become one of the mainstream technologies. Currently, this type of design method mainly focuses on the generation of visual effects, often using techniques such as style transfer to combine preset styles with room structures and output renderings of smart home scenes to quickly present the visual presentation of the design scheme.

[0003] However, existing technologies have significant limitations, with the core issue being the insufficient practicality of the generated results. Current methods generate only simple pixel-level visual images; design elements are inextricably intertwined, preventing editing and adjustment of individual furniture or decorative components, and hindering subsequent engineering analysis. Due to the lack of structured design data, the generated renderings are difficult to integrate with downstream key processes such as CAD detailed design, manufacturing, cost accounting, and teaching training. This results in AI design outcomes remaining merely at the visual display level, unable to achieve real-world application. Furthermore, existing generative adversarial networks (GANs) do not effectively constrain the consistency between image generation and data generation during training. Even when some solutions attempt to output simple design data, mismatches between visual effects and design data occur, further restricting the industrial application of the technology and failing to meet the needs of actual industry and the requirements of industry-education integration scenarios.

[0004] Existing design methods generate only simple visual images, lacking structured design data support, and cannot be integrated into subsequent processes such as CAD detailed design, production and manufacturing, cost accounting, and teaching and training. Furthermore, existing generative adversarial networks lack consistency constraints between image generation and data generation during training, resulting in a mismatch between the generated visual effects and the design data. This further limits the practical application of AI design results, making it difficult to meet the actual needs of industry and the requirements of industry-education integration scenarios.

Summary of the Invention

[0005] To address the aforementioned technical problems, this invention provides a method and system for stylized design of smart homes based on generative adversarial networks. The technical solution adopted is as follows:

[0006] A stylized design method for smart homes based on generative adversarial networks includes the following steps:

[0007] Step 1: Obtain user-inputted smart home stylized design information, which includes room structure information and style preference information;

[0008] Step 2: A multimodal fusion encoder is used to encode and fuse the smart home stylized design information, extracting three types of features (text, image, and structure) and then deeply fusing them through an attention mechanism to generate a conditional guidance vector containing spatial layout and style features;

[0009] Step 3: Input the conditional guidance vector into the pre-trained dual-channel generative adversarial network; the generator of the dual-channel generative adversarial network simultaneously executes the first generation process and the second generation process:

[0010] The first generation process outputs a home scene effect image that meets the conditions from the image generation branch;

[0011] The second generation process involves the structured decoding branch outputting structured design data corresponding to the visual content in the home scene effect image. The structured design data includes parametric geometric descriptions and recommended material categories for the main furniture and decorative components in the scene. The parametric geometric descriptions include component type, three-dimensional size parameters, spatial position coordinates, and orientation angle. The recommended material categories are directly predicted and output by the structured decoding branch based on conditional guidance vectors.

[0012] Step 4: After verifying the consistency of the home scene effect image and the structured design data through differentiable rendering, associate and encapsulate them to generate a digital design draft.

[0013] Optionally, in step 2, the encoding fusion process employs a multimodal fusion encoder, which includes a text encoder, an image encoder, and a structural encoder. These encoders extract features from the text description, reference image, and room structure diagram, respectively, and achieve deep fusion of the three types of features through an attention mechanism to generate a unified conditional guidance vector.

[0014] Optionally, in step 3, during training, the dual-channel generative adversarial network uses a differentiable rendering layer with built-in vertex shading, patch rasterization, and material sampling to constrain the consistency of the outputs of the first and second generation processes. Specifically, the differentiable rendering layer takes the structured design data as input, performs vertex transformation, patch rasterization, and pixel-by-pixel material sampling, and outputs a simplified preview image with materials. The difference between the simplified preview image and the high-realism effect image output by the first generation process is calculated in the deep feature space, and this difference is used as the consistency loss, which is included in the total loss function for network training.

[0015] Optionally, the total training loss function $L_{total}$ of the dual-channel generative adversarial network is:

[0016] $L_{total} = L_{adv} + L_{cond} + L_{img} + L_{param} + L_{cons}$;

[0017] where $L_{adv}$ is the adversarial loss, used to ensure the realism of the generated images and the rationality of the structured data; $L_{cond}$ is the conditional loss, used to constrain the consistency between the generated results and the multimodal conditional information input by the user; $L_{img}$ is the image loss, used to optimize the pixel-level precision and visual effect of the high-realism effect image; $L_{param}$ is the parameter loss, used to supervise the accuracy of the parametric geometric descriptions and material category data in the structured design data; and $L_{cons}$ is the differentiable rendering consistency loss, used to ensure semantic alignment and geometric consistency between the outputs of the image generation branch and the structured decoding branch. The calculation formula is:

[0018] $L_{cons} = \lambda \cdot L_{perc}(R(D), I)$;

[0019] where $\lambda$ is the consistency loss weighting coefficient, $R(D)$ is the simplified preview image rendered by the differentiable rendering layer $R$ from the structured design data $D$ output by the structured decoding branch, $I$ is the home scene effect image output by the image generation branch, and $L_{perc}$ is the perceptual loss function used to calculate the difference between two images in the deep feature space.

[0020] Optionally, in step 3, the structured design data also includes component connection relationships and a material and process mapping list. The material and process mapping list recommends specific material specifications and processing standards for each parametric component. The material specifications include material sub-types, color codes, and specification parameters, while the processing standards include surface treatment methods and splicing process requirements.

[0021] Optionally, in step 4, the specific process for generating the digital design draft includes:

[0022] Based on the component types, recommended material categories, and material and process mapping lists in the structured design data, the system calls upon a pre-built material and process knowledge base to retrieve matching detailed material specifications, process descriptions, and unit cost data. Combining the component's three-dimensional dimensional parameters, the system calculates the component's material consumption and automatically generates a preliminary cost estimate report and a manufacturing feasibility assessment report. The system then encapsulates the home scene effect images, structured design data, cost estimate report, and manufacturing feasibility assessment report into a structured file package, i.e., a digital design draft.

[0023] Optionally, the training data samples used to train the dual-channel generative adversarial network are in the form of image, parameter, and material triples. Each sample includes a room structure diagram, style description text, a real smart home scene rendering, and labeled parametric geometric data, labeled material category labels, and a real bill of materials for each major furniture and decorative component.

[0024] Optionally, the labeling process for training data samples includes:

[0025] Object detection is performed on the main furniture and decorative components in the real rendering, and the bounding box and category of each component are labeled. Each component is then parametrically annotated to determine its 3D dimensions, spatial pose, and connection relationships. Finally, with reference to the historical bill of materials, the actual material category, material specifications, and processing technology of each component are labeled, forming complete triplet annotation data.

[0026] Optionally, step 5 is also included, which provides an interactive optimization interface to receive user instructions on modifying the parametric geometric description and recommended material category of any component in the digital design draft, and re-triggers the dual-channel generative adversarial network based on the modified parameters to make local or global adjustments, and synchronously updates all related information in the digital design draft until the user's needs are met.

[0027] Optionally, the method also includes a parameter constraint verification function, which automatically issues a reminder and provides reasonable parameter suggestions when a user's modification command exceeds the preset reasonable parameter range, while prohibiting the execution of invalid modification commands.

[0028] A smart home stylized design system based on generative adversarial networks is used to implement the smart home stylized design method described above. The system includes an input interface module, a conditional coding module, a dual-channel generation module, a material and process knowledge base module, a draft synthesis module, an interactive editing module, and an output module.

[0029] The input interface module is used to acquire and preprocess user multimodal input information;

[0030] The conditional coding module has a built-in multimodal fusion encoder that communicates with the input interface module to perform feature fusion on the preprocessed information and generate a conditional guidance vector.

[0031] The dual-channel generation module has a built-in dual-channel generative adversarial network that simultaneously generates home scene effect images and structured design data, and constrains the consistency of the two through a differentiable rendering layer.

[0032] The material and process knowledge base module is communicatively connected to the dual-channel generation module and stores relevant material, process, and cost data.

[0033] The draft synthesis module communicates with the dual-channel generation module, associates and verifies the home scene effect image and structured data, calls the material and process knowledge base module to generate relevant reports and encapsulates them into a digital design draft;

[0034] The interactive editing module is communicatively connected to the draft synthesis module, providing a visual operation interface, receiving user modification instructions and triggering network adjustments and draft updates;

[0035] The output module is communicatively connected to the interactive editing module and is used for exporting digital design drafts.

[0036] In summary, the present invention has at least one of the following beneficial technical effects:

[0037] This invention provides a method and system for stylized design of smart homes based on generative adversarial networks. It solves the problem that existing AI-powered smart home stylized designs can only generate uneditable visual images, achieving simultaneous generation of visual effects and structured design data, thus improving the practicality and feasibility of the design results. Through a dual-channel generation architecture and differentiable rendering consistency constraints, it ensures precise alignment between visual images and structured data, improving design accuracy and reliability.

[0038] The generated digital design draft can be directly integrated with multiple downstream processes, reducing the cost of connecting design with subsequent stages and improving design efficiency. Interactive optimization functions and parameter constraint verification further ensure the rationality and manufacturability of the design. At the same time, the application of triplet training data improves the model's generalization ability, taking into account both industrial application value and the needs of industry-education integration.

Description of the Drawings

[0039] Figure 1 is a flowchart of the smart home stylized design method based on generative adversarial networks of this invention.

[0040] Figure 2 is a schematic diagram of the dual-channel generative adversarial network structure according to a specific embodiment of the present invention.

[0041] Figure 3 shows the home scene effect images of a two-bedroom Nordic minimalist smart home according to a specific embodiment of the present invention.

[0042] Figure 4 is a schematic diagram of the first page of the digital design draft for a two-bedroom Nordic minimalist smart home according to a specific embodiment of the present invention.

[0043] Figure 5 is a schematic diagram of the second page of the digital design draft for a two-bedroom Nordic minimalist smart home according to a specific embodiment of the present invention.

[0044] Figure 6 is a schematic diagram of the third page of the digital design draft for a two-bedroom Nordic minimalist smart home according to a specific embodiment of the present invention.

[0045] Figure 7 is a schematic diagram of the fourth page of the digital design draft for a two-bedroom Nordic minimalist smart home according to a specific embodiment of the present invention.

Detailed Implementation

[0046] The present invention will be further described in detail below with reference to the accompanying drawings, Figures 1-7.

[0047] This invention discloses a method and system for stylized design of smart homes based on generative adversarial networks.

[0048] Example 1

[0049] A stylized design method for smart homes based on generative adversarial networks includes the following steps:

[0050] Step 1: Obtain user-inputted smart home stylized design information, which includes room structure information and style preference information;

[0051] Step 2: A multimodal fusion encoder is used to encode and fuse the smart home stylized design information, extracting three types of features (text, image, and structure) and then deeply fusing them through an attention mechanism to generate a conditional guidance vector containing spatial layout and style features;

[0052] Step 3: Input the conditional guidance vector into the pre-trained dual-channel generative adversarial network; the generator of the dual-channel generative adversarial network simultaneously executes the first generation process and the second generation process:

[0053] The first generation process outputs a home scene effect image that meets the conditions from the image generation branch;

[0054] The second generation process involves the structured decoding branch outputting structured design data corresponding to the visual content in the home scene effect image. The structured design data includes parametric geometric descriptions and recommended material categories for the main furniture and decorative components in the scene. The parametric geometric descriptions include component type, three-dimensional size parameters, spatial position coordinates, and orientation angle. The recommended material categories are directly predicted and output by the structured decoding branch based on conditional guidance vectors.

[0055] Step 4: After verifying the consistency of the home scene effect image and the structured design data through differentiable rendering, associate and encapsulate them to generate a digital design draft.

[0056] By employing the aforementioned technical solution, the system acquires user-inputted room structure and style preferences. A multimodal fusion encoder extracts text, image, and structural features respectively, and performs deep fusion using an attention mechanism to generate a conditional guidance vector that simultaneously incorporates spatial layout and style characteristics. This conditional guidance vector is then input into a pre-trained dual-channel generative adversarial network (GAN). The generator simultaneously executes two generation processes: the image generation branch directly outputs a home scene effect image that meets the requirements based on the conditions, while the structured decoding branch outputs structured design data corresponding to the visual content in the effect image. This data includes parametric geometric descriptions of major furniture and decorative components, as well as recommended material categories directly predicted based on the conditional guidance vector. Finally, a differentiable rendering consistency check is performed on the home scene effect image and the structured design data. After confirming semantic alignment, the data is encapsulated into a digital design draft.
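As an illustration of the structured design data described in step 3 and the association step in step 4, the following minimal sketch models one component record and packages it with an effect image reference. The class name `Component`, the function `build_design_draft`, the field names, the units, and the JSON layout are all assumptions made for this example; the patent text does not specify a file format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Component:
    """One furniture or decorative component, with the fields listed in step 3."""
    component_type: str                      # e.g. "sofa"
    size_m: tuple[float, float, float]       # width, depth, height (metres assumed)
    position_m: tuple[float, float, float]   # x, y, z of the component origin
    orientation_deg: float                   # rotation about the vertical axis
    material_category: str                   # recommended material category


def build_design_draft(image_path: str, components: list[Component]) -> str:
    """Associate the effect image with its structured data and serialise both."""
    draft = {
        "effect_image": image_path,
        "components": [asdict(c) for c in components],
    }
    return json.dumps(draft, indent=2)


# Hypothetical sample values, for illustration only.
sofa = Component("sofa", (2.1, 0.9, 0.8), (1.5, 0.6, 0.0), 90.0, "fabric")
draft_json = build_design_draft("living_room.png", [sofa])
print(draft_json)
```

Because every component is an explicit record rather than a group of pixels, a single field such as `orientation_deg` can be edited without touching the rest of the scene, which is the editability the method aims for.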

[0057] Example 2

[0058] In step 2, the encoding fusion process uses a multimodal fusion encoder, which includes a text encoder, an image encoder, and a structural encoder. These encoders extract features from the text description, reference image, and room structure diagram, respectively. The deep fusion of the three types of features is achieved through an attention mechanism to generate a unified conditional guidance vector.

[0059] By adopting the above technical solution, the multimodal fusion encoder consists of a text encoder, an image encoder, and a structural encoder. The text encoder is responsible for extracting semantic features from the style description text, the image encoder extracts visual style features from the reference image, and the structural encoder extracts spatial layout features from the room structure diagram. Subsequently, an attention mechanism is used to deeply fuse the three types of features. By calculating cross-modal association weights, information from different sources complements and strengthens each other, ultimately generating a unified conditional guidance vector to accurately express the user's multimodal design intent.
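The cross-modal weighting described above can be sketched with a very small attention-pooling example. This is not the encoder of the invention: the feature values are made up, the three vectors are assumed to already share one dimension, and using the mean of the modality vectors as the attention query is an assumption chosen to keep the sketch short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features):
    """Fuse equal-length modality vectors by attention pooling.

    The query is the mean of the modality vectors (an assumption); each
    modality is scored by its scaled dot product with that query, and the
    fused vector is the weighted sum of the modality vectors.
    """
    dim = len(features[0])
    query = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) for f in features]
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Hypothetical feature vectors standing in for the three encoder outputs.
text_feat = [0.2, 0.8, 0.1, 0.0]
image_feat = [0.5, 0.4, 0.3, 0.1]
struct_feat = [0.9, 0.1, 0.0, 0.6]
guidance_vector, weights = fuse([text_feat, image_feat, struct_feat])
print(weights, guidance_vector)
```

The weights sum to one, so the resulting guidance vector is a convex combination of the three modality features; a trained model would learn the query and projections instead of fixing them.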

[0060] Example 3

[0061] In step 3, during training, the dual-channel generative adversarial network uses a differentiable rendering layer with built-in vertex shading, patch rasterization, and material sampling to constrain the consistency of the outputs of the first and second generation processes. Specifically, the differentiable rendering layer takes the structured design data as input, performs vertex transformation, patch rasterization, and pixel-by-pixel material sampling, and outputs a simplified preview image with materials. The difference between the simplified preview image and the high-realism effect image output by the first generation process is calculated in the deep feature space, and this difference is used as the consistency loss, which is included in the total loss function for network training.

[0062] By employing the above technical solution, a built-in differentiable rendering layer is introduced during the network training phase to enforce consistency between the outputs of the two generation branches. Specifically, the differentiable rendering layer receives the structured design data output by the structured decoding branch as input, and sequentially performs vertex transformation, patch rasterization, and pixel-by-pixel material sampling to generate a simplified preview image with material information. Then, the difference between this simplified preview image and the highly realistic image output by the image generation branch is calculated in the deep feature space, and this difference is defined as the consistency loss and included in the total loss function for network training. This mechanism ensures that the geometric and material parameters in the structured design data can reproduce visual content that is semantically consistent with the generated image, thereby ensuring the alignment of the dual-channel outputs.
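The idea of rendering structured data into a preview and comparing it with the generated image can be illustrated with a toy example. The sketch below is a heavily simplified stand-in, not the rendering layer of the invention: it renders a single axis-aligned 2D footprint instead of 3D geometry, uses a product of sigmoids as soft coverage (so the output varies smoothly with the box parameters), and replaces the deep-feature comparison with a plain mean squared difference. The function names, grid size, and `sharpness` value are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_render(box, grid=16, sharpness=4.0):
    """Render an axis-aligned footprint (x0, y0, x1, y1) as soft coverage in [0, 1].

    Coverage is a product of sigmoids, so it changes smoothly when the box
    parameters change, which is the property a differentiable renderer needs.
    """
    x0, y0, x1, y1 = box
    image = []
    for row in range(grid):
        cy = row + 0.5  # pixel centre
        line = []
        for col in range(grid):
            cx = col + 0.5
            inside_x = sigmoid(sharpness * (cx - x0)) * sigmoid(sharpness * (x1 - cx))
            inside_y = sigmoid(sharpness * (cy - y0)) * sigmoid(sharpness * (y1 - cy))
            line.append(inside_x * inside_y)
        image.append(line)
    return image


def consistency_loss(preview, target):
    """Mean squared difference, standing in for the deep-feature comparison."""
    n = len(preview) * len(preview[0])
    return sum((p - t) ** 2 for pr, tr in zip(preview, target) for p, t in zip(pr, tr)) / n


# The target stands in for the image branch output; the other boxes stand in
# for structured data that matches it badly, closely, or exactly.
target = soft_render((4, 4, 10, 9))
far = consistency_loss(soft_render((8, 8, 14, 13)), target)
near = consistency_loss(soft_render((5, 4, 11, 9)), target)
same = consistency_loss(soft_render((4, 4, 10, 9)), target)
print(far, near, same)
```

The loss shrinks as the structured footprint approaches the one visible in the target, which is the signal that pushes the two branches toward agreement during training.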

[0063] Example 4

[0064] The total training loss function $L_{total}$ of the dual-channel generative adversarial network is:

[0065] $L_{total} = L_{adv} + L_{cond} + L_{img} + L_{param} + L_{cons}$;

[0066] where $L_{adv}$ is the adversarial loss, used to ensure the realism of the generated images and the rationality of the structured data; $L_{cond}$ is the conditional loss, used to constrain the consistency between the generated results and the multimodal conditional information input by the user; $L_{img}$ is the image loss, used to optimize the pixel-level precision and visual effect of the high-realism effect image; $L_{param}$ is the parameter loss, used to supervise the accuracy of the parametric geometric descriptions and material category data in the structured design data; and $L_{cons}$ is the differentiable rendering consistency loss, used to ensure semantic alignment and geometric consistency between the outputs of the image generation branch and the structured decoding branch. The calculation formula is:

[0067] $L_{cons} = \lambda \cdot L_{perc}(R(D), I)$;

[0068] where $\lambda$ is the consistency loss weighting coefficient, $R(D)$ is the simplified preview image rendered by the differentiable rendering layer $R$ from the structured design data $D$ output by the structured decoding branch, $I$ is the home scene effect image output by the image generation branch, and $L_{perc}$ is the perceptual loss function used to calculate the difference between two images in the deep feature space.

[0069] By employing the above technical solutions, a comprehensive multivariate loss function is used to train the dual-channel generative adversarial network. The adversarial loss enhances the realism of the generated images and the rationality of the structured data; the conditional loss ensures that the generated results remain consistent with the multimodal conditional information input by the user; the image loss directly optimizes the pixel-level accuracy and overall visual effect of the highly realistic images; and the parameter loss precisely supervises the accuracy of geometric parameters such as component size, position, and orientation, as well as material category data, in the structured design data. The differentiable rendering consistency loss transforms the structured design data into a simplified preview image through a differentiable rendering layer. Then, the perceptual loss function is used to calculate the feature differences between this preview image and the output image of the image generation branch in the deep feature space, multiplying them by weighting coefficients and incorporating them into the total loss, thereby forcing strict geometric and semantic alignment between the two branch outputs.
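The way the five terms combine can be shown with a short arithmetic sketch. It assumes the four non-consistency terms enter the total with unit weight and that only the consistency term carries a weighting coefficient, as the paragraph above describes; the function name, argument names, and the sample numbers are hypothetical.

```python
def total_loss(adv, cond, img, param, perceptual_distance, consistency_weight):
    """Combine the five loss terms described above into one scalar.

    The consistency term is the perceptual distance between the rendered
    preview and the generated image, scaled by a weighting coefficient.
    Unit weights on the other four terms are an assumption of this sketch.
    """
    consistency = consistency_weight * perceptual_distance
    breakdown = {
        "adversarial": adv,
        "conditional": cond,
        "image": img,
        "parameter": param,
        "consistency": consistency,
    }
    return sum(breakdown.values()), breakdown


# Made-up per-batch values, for illustration only.
loss, parts = total_loss(adv=1.0, cond=0.5, img=0.25, param=0.25,
                         perceptual_distance=2.0, consistency_weight=0.5)
print(loss, parts)
```

Raising `consistency_weight` makes mismatches between the two branches more expensive relative to image realism, so in practice the coefficient would be tuned rather than fixed.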

[0070] Example 5

[0071] In step 3, the structured design data also includes component connection relationships and a material and process mapping list. The material and process mapping list recommends specific material specifications and processing standards for each parametric component. The material specifications include material sub-types, color codes, and specification parameters, while the processing standards include surface treatment methods and splicing process requirements.

[0072] By adopting the above technical solution, the structured design data not only includes parametric geometric descriptions and recommended material categories for each component, but also further expands the connection relationships between components and a material and process mapping list for each parametric component. The material and process mapping list recommends specific executable material specifications and processing standards for each component. Material specifications can be refined to material sub-types, color codes, and specification parameters, while processing standards specify surface treatment methods and splicing process requirements. This ensures that the generated design results not only remain at the visual level but can also be directly used to guide material selection and manufacturing.

[0073] Example 6

[0074] In step 4, the specific process of generating the digital design draft includes:

[0075] Based on the component types, recommended material categories, and material and process mapping lists in the structured design data, the system calls upon a pre-built material and process knowledge base to retrieve matching detailed material specifications, process descriptions, and unit cost data. Combining the component's three-dimensional dimensional parameters, the system calculates the component's material consumption and automatically generates a preliminary cost estimate report and a manufacturing feasibility assessment report. The system then encapsulates the home scene effect images, structured design data, cost estimate report, and manufacturing feasibility assessment report into a structured file package, i.e., a digital design draft.

[0076] By employing the aforementioned technical solution, when generating a digital design draft, a pre-built material and process knowledge base is automatically invoked based on the component types, recommended material categories, and material and process mapping lists in the structured design data. The knowledge base precisely retrieves matching detailed material specifications, process descriptions, and unit cost data, and accurately calculates material consumption by combining the three-dimensional dimensional parameters of each component, thereby automatically generating a preliminary cost estimation report and a manufacturing feasibility assessment report. Finally, the home scene rendering images, complete structured design data, cost estimation report, and manufacturing feasibility assessment report are integrated and packaged into a structured file package, forming a digital design draft that can be directly used for communication and subsequent production.
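The lookup-and-estimate flow above can be sketched as follows. Everything concrete here is a placeholder: the `KNOWLEDGE_BASE` entries, prices, the panel-area consumption rule, and the `waste_factor` are assumptions for illustration, since the patent text does not give the knowledge base schema or the consumption formula.

```python
# Hypothetical knowledge base: (component type, material category) -> spec and unit cost.
KNOWLEDGE_BASE = {
    ("tabletop", "solid wood"): {"spec": "oak board, 25 mm", "unit_cost_per_m2": 80.0},
    ("cabinet_door", "laminate"): {"spec": "MDF laminate, 18 mm", "unit_cost_per_m2": 35.0},
}


def panel_area_m2(size_m):
    """Area of the largest face of a box-shaped panel (two longest edges multiplied)."""
    a, b, _ = sorted(size_m, reverse=True)
    return a * b


def estimate_cost(components, waste_factor=1.1):
    """Build one report row per component: matched spec, material area, and cost."""
    report = []
    for comp in components:
        entry = KNOWLEDGE_BASE.get((comp["type"], comp["material"]))
        if entry is None:
            # No matching material/process entry: flag it for the feasibility report.
            report.append({"type": comp["type"], "feasible": False, "cost": None})
            continue
        area = panel_area_m2(comp["size_m"]) * waste_factor
        report.append({
            "type": comp["type"],
            "feasible": True,
            "spec": entry["spec"],
            "area_m2": round(area, 3),
            "cost": round(area * entry["unit_cost_per_m2"], 2),
        })
    return report


components = [
    {"type": "tabletop", "material": "solid wood", "size_m": (1.2, 0.6, 0.025)},
    {"type": "lamp", "material": "glass", "size_m": (0.3, 0.3, 0.5)},
]
report = estimate_cost(components)
print(report)
```

The second component has no knowledge base entry, so it is reported as not feasible instead of receiving a guessed price; this mirrors how a feasibility assessment can fall out of the same lookup that produces the cost estimate.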

[0077] Example 7

[0078] The training data samples used to train the dual-channel generative adversarial network are in the form of triples of images, parameters, and materials. Each sample includes a room structure diagram, style description text, a real smart home scene rendering, and labeled parametric geometric data of each major furniture and decorative component, labeled material category labels, and a real bill of materials.

[0079] By employing the aforementioned technical solution, the training data for the dual-channel generative adversarial network is constructed in the form of triples of images, parameters, and materials. Each training sample contains a complete correspondence from input to output: the input side includes a room structure diagram, style description text, and realistic smart home scene renderings; the supervision side includes manually annotated parametric geometric data of each major furniture and decorative component, material category labels, and a complete bill of materials obtained from real projects. This triple data structure enables the network to simultaneously learn complex mappings from multimodal conditions to visual images, component geometric parameters, and material information.

[0080] Example 8

[0081] The annotation process for training data samples includes:

[0082] Object detection is performed on the main furniture and decorative components in the real rendering, and the bounding box and category of each component are labeled. Each component is then parametrically annotated to determine its 3D dimensions, spatial pose, and connection relationships. Finally, with reference to the historical bill of materials, the actual material category, material specifications, and processing technology of each component are labeled, forming complete triplet annotation data.

[0083] By employing the above technical solution, the annotation of training samples follows a step-by-step process. First, object detection is performed on the main furniture and decorative components in the real renderings, annotating the bounding box and semantic category of each component. Next, each component is parametrically annotated, determining its 3D dimensions, spatial pose, and inter-component connections through calculation or modeling. Finally, by comparison with historical bills of materials and other real production data, the actual material category, specific material specifications, and processing technology of each component are annotated. These three steps generate complete triplet annotation data for images, parameters, and materials, providing high-quality supervisory signals for network training.

[0084] Example 9

[0085] It also includes step 5, which provides an interactive optimization interface to receive user instructions on modifying the parametric geometric description and recommended material category of any component in the digital design draft. Based on the modified parameters, the dual-channel generative adversarial network is re-triggered to make local or global adjustments, and all related information in the digital design draft is updated synchronously until the user's needs are met.

[0086] It also includes a parameter constraint verification function, which automatically issues a reminder and provides reasonable parameter suggestions when the user's input modification command exceeds the preset reasonable parameter range, while prohibiting the execution of invalid modification commands.

[0087] By adopting the above technical solution and providing an interactive optimization interface, users can directly modify component parameters and material information in the digital design draft. After modification, the dual-channel generative adversarial network is re-triggered to make corresponding adjustments, synchronously updating all related content in the draft to ensure that the adjusted data is consistent with the visual effect. Simultaneously, a parameter constraint verification function is added to judge the rationality of user modification commands, promptly alerting users to invalid operations and providing suggestions, prohibiting unreasonable modifications, and ensuring the feasibility of the design deliverables.

[0088] By adopting the above technical solution, after generating the initial digital design draft, the system provides an interactive optimization interface. Users can directly modify the parametric geometric description values or recommended material categories of any component in the draft. After receiving the modification command, the system re-drives the dual-channel generative adversarial network to regenerate locally or globally based on the new parameters, and simultaneously updates the effect images, structured data, cost reports, and other related information to achieve WYSIWYG iterative optimization. At the same time, the interface has a built-in parameter constraint verification function that checks in real time whether the parameters input by the user are within the preset range of structural rationality and material feasibility. If they exceed the limits, it automatically triggers a reminder and provides reasonable parameter suggestions, while preventing invalid modifications from taking effect and avoiding the generation of unusable designs.
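
The parameter constraint verification can be illustrated with a short sketch. The preset ranges, the `ValidationResult` type, and the choice to suggest the nearest in-range value are assumptions made for illustration; the text states only that out-of-range modifications trigger a reminder with a suggestion and are not executed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset ranges in metres, keyed by (component type, parameter).
PRESET_RANGES = {
    ("sofa", "width"): (1.2, 3.0),
    ("sofa", "height"): (0.6, 1.1),
}


@dataclass
class ValidationResult:
    accepted: bool
    message: str
    suggestion: Optional[float] = None


def validate_modification(component_type: str, parameter: str, value: float) -> ValidationResult:
    """Accept in-range values; reject others and suggest the nearest valid value."""
    bounds = PRESET_RANGES.get((component_type, parameter))
    if bounds is None:
        # Assumption: parameters without a preset range are rejected rather than passed through.
        return ValidationResult(False, "no preset range for this parameter")
    low, high = bounds
    if low <= value <= high:
        return ValidationResult(True, "ok")
    suggestion = min(max(value, low), high)  # clamp to the allowed interval
    return ValidationResult(False, f"{parameter}={value} is outside [{low}, {high}]", suggestion)
```

Only an accepted result would go on to re-trigger the generation network; a rejected one is returned to the interface together with the suggested value.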

[0089] Example 10

[0090] The smart home stylized design system based on generative adversarial networks is used to implement the smart home stylized design method based on generative adversarial networks. The system includes an input interface module, a conditional encoding module, a dual-channel generation module, a material and process knowledge base module, a draft synthesis module, an interactive editing module, and an output module.

[0091] The input interface module is used to acquire and preprocess user multimodal input information;

[0092] The conditional encoding module has a built-in multimodal fusion encoder and is communicatively connected to the input interface module; it performs feature fusion on the preprocessed information and generates a conditional guidance vector.

[0093] The dual-channel generation module has a built-in dual-channel generative adversarial network that simultaneously generates home scene effect images and structured design data, and constrains the consistency of the two through a differentiable rendering layer.

[0094] The material and process knowledge base module is communicatively connected to the dual-channel generation module and stores relevant material, process, and cost data.

[0095] The draft synthesis module communicates with the dual-channel generation module, associates and verifies the home scene effect image and structured data, calls the material and process knowledge base module to generate relevant reports and encapsulates them into a digital design draft;

[0096] The interactive editing module is communicatively connected to the draft synthesis module, providing a visual operation interface, receiving user modification instructions and triggering network adjustments and draft updates;

[0097] The output module is communicatively connected to the interactive editing module and is used for exporting digital design drafts.

[0098] By adopting the above technical solution, the input interface module acquires and preprocesses the user's multimodal design information; the conditional encoding module embeds a multimodal fusion encoder to transform the preprocessed information into conditional guidance vectors; the dual-channel generation module carries a trained dual-channel generative adversarial network, simultaneously outputting home scene effect images and structured design data, and using an internal rendering consistency mechanism to ensure alignment between the two; the material and process knowledge base module stores and continuously maintains material, process, and cost data for other modules to access; the draft synthesis module performs correlation verification between the effect images and structured data, and combines the knowledge base to generate cost and manufacturing feasibility reports, ultimately encapsulating them into a digital design draft; the interactive editing module provides a visual operating environment, supporting users to modify component parameters and trigger incremental adjustments to the network, updating the entire draft content in a linked manner; the output module is responsible for exporting the finally confirmed digital design draft in a standard format. All modules collaborate to complete the entire process from design input and intelligent generation to interactive optimization and result output.

[0099] The following specific embodiments illustrate the implementation principle of the present invention:

[0100] This application is used in a two-bedroom smart home stylization design scenario, fully realizing a smart home stylization design method and corresponding system based on generative adversarial networks. The specific implementation process is as follows:

[0101] First, the system's input interface module acquires multimodal smart home stylistic design information from the user. This includes a floor plan of a two-bedroom apartment, showing the spatial division and dimensions of the living room, master bedroom, secondary bedroom, kitchen, and bathroom. The style preference is Nordic minimalism, with relevant text descriptions and reference images, specifying a predominantly light color scheme with natural wood elements and simple, elegant furniture designs. The input interface module preprocesses the acquired multimodal information, removing invalid data and standardizing the information format to provide qualified input for subsequent processing.

[0102] After preprocessing, the information is fed into the system's conditional encoding module. This module incorporates a multimodal fusion encoder, comprising three dedicated branches: a text encoder, an image encoder, and a structural encoder, each processing different types of input information. The text encoder parses text descriptions of a Nordic minimalist style, extracting abstract style features and functional requirements such as light color schemes, natural wood elements, and simple shapes. The image encoder processes style reference images, capturing visual features such as color schemes, furniture shapes, and decorative details. The structural encoder parses a two-bedroom floor plan, extracting spatial structural features such as spatial boundaries, dimensional parameters, and area divisions. After feature extraction by the three dedicated encoders, an attention mechanism is used to achieve deep fusion of the three types of features. Weights are automatically assigned to each type of feature, strengthening features highly relevant to the user's core needs and subsequent generation process, while weakening irrelevant secondary features to avoid interference between different types of features. Finally, a unified and coherent conditional guidance vector is generated, accurately conveying the user's spatial constraints and style preferences.
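
The attention-based fusion can be sketched in a few lines. This is a simplified illustration using scaled dot-product scores against a single query vector; the text does not specify the attention variant, and the `query` vector (which would be learned in practice) and the assumption that all three features already share one dimension are introduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse(features, query):
    """Weight each modality by its relevance to `query` and return (weights, fused vector).

    features: dict mapping modality name -> feature vector (all of equal length)
    query:    vector of the same length (hypothetical learned parameter)
    """
    names = list(features)
    scale = math.sqrt(len(query))
    weights = softmax([dot(features[n], query) / scale for n in names])
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused
```

Modalities whose features align with the query receive larger weights, which corresponds to strengthening relevant features and weakening secondary ones as described above.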

[0103] After the conditional guidance vector is generated, it is fed into the system's dual-channel generation module. This module has a built-in pre-trained dual-channel generative adversarial network. The network's generator contains two parallel and closely related output branches: an image generation branch and a structured decoding branch. The two generation processes are executed in parallel, independently, and with bidirectional constraints. The first generation process, the image generation branch, outputs a two-bedroom home scene effect image that conforms to the Nordic minimalist style based on the spatial layout planning and style guidance in the conditional guidance vector. It covers panoramic images of various areas such as the living room, master bedroom, and secondary bedroom, and the images clearly show the furniture placement, color matching, and decorative effects. The second generation process, the structured decoding branch, directly predicts and synchronously outputs the structured design data corresponding to the effect image based on the conditional guidance vector. In addition to parametric geometric descriptions and recommended material categories of the main furniture and decorative components in the scene, it also supplements the component connection relationships and material process mapping list. The parametric geometric description includes the type, three-dimensional dimensions, spatial coordinates, and orientation angles of each component, such as the geometric parameters of a three-seater sofa in the living room. The recommended material category is directly predicted and output by the structured decoding branch, eliminating the need for image region recognition; for example, linen is recommended for sofas, and solid wood for wardrobes. 
The component connection relationships clearly define the assembly logic and connection methods between various furniture and decorative components, such as the spacing between the sofa and side table in the living room, and the assembly relationship between the wardrobe and bedside table in the master bedroom. The material and process mapping list matches specific material specifications and processing standards for each parametric component. For example, the material specifications for a sofa include the specific type, color number, and specifications of the linen fabric, while the processing standards include surface stitching methods and edge treatment requirements. The material and process of a wardrobe corresponds to the specifications of solid wood, matte varnish treatment, and splicing process requirements, enabling the structured design data to directly connect with the manufacturing process.
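
To make the four parts of the structured design data concrete, the sketch below shows one possible serialization. The field names, identifiers, and every numeric value are illustrative assumptions; only the kinds of information (parametric geometry, recommended material, connection relationships, material and process mapping) come from the text.

```python
import json

# Illustrative structured design data; all values are made up.
structured_design_data = {
    "components": [
        {
            "id": "sofa_01",
            "type": "three_seater_sofa",
            "size_m": {"width": 2.1, "depth": 0.9, "height": 0.85},
            "position_m": {"x": 1.5, "y": 0.5, "z": 0.0},
            "orientation_deg": 90.0,
            "recommended_material": "linen",
        },
        {
            "id": "side_table_01",
            "type": "side_table",
            "size_m": {"width": 0.5, "depth": 0.5, "height": 0.55},
            "position_m": {"x": 2.8, "y": 0.5, "z": 0.0},
            "orientation_deg": 0.0,
            "recommended_material": "solid_wood",
        },
    ],
    "connections": [
        {"from": "sofa_01", "to": "side_table_01", "relation": "adjacent", "spacing_m": 0.1},
    ],
    "material_process_mapping": [
        {
            "component_id": "sofa_01",
            "material_spec": {"subtype": "hypothetical_linen_blend", "color_code": "hypothetical_code"},
            "processing": {"surface": "stitching", "edges": "edge_treatment"},
        },
    ],
}


def referenced_ids_exist(data) -> bool:
    """Check that connections and mappings refer only to declared component ids."""
    ids = {c["id"] for c in data["components"]}
    refs = {c["from"] for c in data["connections"]} | {c["to"] for c in data["connections"]}
    refs |= {m["component_id"] for m in data["material_process_mapping"]}
    return refs <= ids


serialized = json.dumps(structured_design_data, indent=2)
```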

[0104] During training, the dual-channel generative adversarial network uses a differentiable rendering layer with built-in vertex shading, patch rasterization, and material sampling to constrain the consistency of the outputs of the two generation processes, and a predefined total training loss function applies the overall constraint. The differentiable rendering layer takes the structured design data as input, performs vertex transformation, patch rasterization, and pixel-by-pixel material sampling, and outputs a simplified preview image with material details. This preview image reflects the furniture form, material texture, and spatial placement described by the structured data. The difference between this simplified preview image and the highly realistic image output by the image generation branch is then calculated in the deep feature space; this difference is defined as the consistency loss and is incorporated into the total training loss function. The total training loss function integrates five types of loss: the adversarial loss uses adversarial training to push the generated images toward real-world scenes and the structured data toward engineering specifications; the conditional loss constrains the generated results to remain consistent with the user's input requirements; the image loss optimizes the visual quality of the generated images, improving clarity and color harmony; the parametric loss supervises the accuracy of the structured design data; and the consistency loss keeps the outputs of the two generation branches semantically and geometrically consistent. Weight coefficients adjust the influence of the losses on the total, so that network training better matches actual design needs.
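
How the five losses combine can be sketched as follows. This is an illustration under stated assumptions: the perceptual difference is approximated by a mean squared difference between two already-extracted feature vectors, the per-term weights default to 1.0, and all function names are hypothetical. A real implementation would compute the features with a pretrained network and backpropagate through the differentiable renderer.

```python
def perceptual_distance(features_a, features_b):
    """Mean squared difference between two feature vectors (stand-in for a deep-feature loss)."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(features_a, features_b)) / len(features_a)


def consistency_loss(preview_features, image_features, weight=1.0):
    """Weighted difference between the rendered preview and the generated image."""
    return weight * perceptual_distance(preview_features, image_features)


def total_loss(adv, cond, img, param, cons, weights=None):
    """Sum of the five loss terms; `weights` optionally rescales individual terms."""
    weights = weights or {}
    terms = {"adv": adv, "cond": cond, "img": img, "param": param, "cons": cons}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())
```

If the consistency weight is already applied inside `consistency_loss`, the `"cons"` entry in `weights` should be left at its default to avoid scaling the term twice.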

[0105] The training data samples used to train this dual-channel generative adversarial network are in the form of image, parameter, and material triples. Each sample includes room structure diagrams for different apartment types, descriptive text in different styles, real smart home scene renderings corresponding to the apartment types and styles, and annotated parametric geometric data, labeled material category tags, and a real bill of materials for each major furniture and decorative component in the real rendering. This achieves a precise correspondence between input conditions, visual effects, and structured data, covering multiple apartment types and styles, enabling the network to learn the correlation between visual generation and data generation during training. The sample annotation process revolves around the real rendering. First, object detection is performed on the major furniture and decorative components in the real rendering, and the bounding boxes and categories of each component are labeled. Then, each component is parametrically annotated to determine its 3D dimensions, spatial pose, and connection relationships. Finally, with reference to historical bills of materials, the real material category, material specifications, and processing technology of each component are labeled, forming complete triplet annotation data that provides reliable supervision signals for network training.

[0106] After the structured design data and home scene effect images are generated, they undergo a consistency check using differentiable rendering and are then synchronously transmitted to the system's draft synthesis module. This module maintains communication connections with the dual-channel generation module and the material and process knowledge base module. The material and process knowledge base module stores detailed material specifications, processing standards, unit cost data, and other relevant information. The draft synthesis module first performs a correlation check between the effect images and the structured design data to ensure that each visual element corresponds to unique structured data. Then, it calls the material and process knowledge base module, which retrieves matching detailed data based on the component type, recommended material category, and material and process mapping list in the structured design data. Combining this with the three-dimensional dimensional parameters of each furniture component, it calculates the material consumption and automatically generates a preliminary cost estimate report and a manufacturing feasibility assessment report. The cost estimate report covers the material cost, processing cost, and total design cost of each component, while the manufacturing feasibility assessment report analyzes the processing difficulty and material availability of each component. Finally, the home scene effect images, structured design data, cost estimate report, and manufacturing feasibility assessment report are integrated and packaged to form a standardized digital design draft.
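
The correlation check, in which each visual element must correspond to exactly one structured record, might look like the sketch below. It assumes that visual elements and structured components share identifiers, which the text does not state; how elements are matched in practice is not specified.

```python
from collections import Counter


def check_correspondence(visual_element_ids, structured_components):
    """Report visual elements that have no structured record or more than one.

    visual_element_ids:    list of ids for elements shown in the effect image
    structured_components: list of dicts, each with an "id" key
    """
    counts = Counter(c["id"] for c in structured_components)
    missing = [e for e in visual_element_ids if counts[e] == 0]
    duplicated = [e for e in visual_element_ids if counts[e] > 1]
    return {
        "ok": not missing and not duplicated,
        "missing": missing,
        "duplicated": duplicated,
    }
```

A draft would be passed on to report generation only when `ok` is true; otherwise the listed elements point to where the two outputs disagree.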

[0107] After the digital design draft is generated, it can be optimized and adjusted through the system's interactive editing module with parameter constraint verification. This module communicates with the draft synthesis module, providing a visual operation interface that allows users to directly modify the parametric geometric description, recommended material category, and other information of any component in the digital design draft. When a user inputs a modification command, the interactive editing module receives and parses the command, first verifying the reasonableness of the parameters. If successful, it re-triggers the dual-channel generation module to perform local or global adjustments, synchronously updating all related information in the digital design draft until the user's needs are met. Simultaneously, the interactive editing module also has a parameter constraint verification function. When the user's input modification command exceeds the preset reasonable parameter range, it automatically issues a reminder and provides reasonable parameter suggestions, while prohibiting the execution of invalid modification commands, ensuring the feasibility of the design deliverables.

[0108] Once the digital design draft is finalized, users can export the draft through the system's output module. This module communicates with the interactive editing module and supports the export of the digital design draft in a standardized structured file format. This facilitates subsequent integration with downstream processes such as CAD detailed design, production and manufacturing, cost accounting, and teaching and training, truly realizing the transformation of AI design results from visual display to practical application.

[0109] Figure 3 shows effect images of the interior design of a two-bedroom Nordic minimalist smart home. Figures 4-7 show the corresponding digital design draft, which was generated by the dual-channel generative adversarial network and finalized through manual verification. The design targets a two-bedroom Nordic minimalist smart home scene: the overall color scheme is predominantly off-white or beige, complemented by light natural wood tones, creating a minimalist living and dining area with no high-saturation color clashes and ample natural light. The core output includes precise geometric parameters of the living room furniture and their spatial assembly relationships, matching linen fabrics, E0-grade environmentally friendly materials, and standardized processing techniques. The cost estimate, covering the core living room area and the extended costs for the rest of the home, totals 64,430 yuan (including 5% wastage). The assessment shows that all materials are commonly available and all processes are standard, giving a manufacturing feasibility of 100%. The draft also supports interactive parameter optimization and can be exported as standardized files in multiple formats, closing the loop from AI design to construction.

[0110] The above are all preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Therefore, all equivalent changes made in accordance with the structure, shape and principle of the present invention should be covered within the scope of protection of the present invention.

Claims

1. A method for stylized design of smart homes based on generative adversarial networks, characterized in that it includes the following steps: Step 1: Obtain user-inputted smart home stylized design information, which includes room structure information and style preference information; Step 2: Use a multimodal fusion encoder to encode and fuse the smart home stylized design information, extracting three types of features (text, image, and structure) and deeply fusing them through an attention mechanism to generate a conditional guidance vector containing spatial layout and style features; Step 3: Input the conditional guidance vector into the pre-trained dual-channel generative adversarial network, whose generator simultaneously executes a first generation process and a second generation process: in the first generation process, the image generation branch outputs a home scene effect image that meets the conditions; in the second generation process, the structured decoding branch outputs structured design data corresponding to the visual content in the home scene effect image. The structured design data includes parametric geometric descriptions and recommended material categories for the main furniture and decorative components in the scene. The parametric geometric descriptions include component type, three-dimensional size parameters, spatial position coordinates, and orientation angle. The recommended material categories are directly predicted and output by the structured decoding branch based on the conditional guidance vector; Step 4: After verifying the consistency of the home scene effect image and the structured design data through differentiable rendering, associate and encapsulate them to generate a digital design draft.

2. The smart home stylized design method based on generative adversarial networks according to claim 1, characterized in that, In step 2, the encoding fusion process uses a multimodal fusion encoder, which includes a text encoder, an image encoder, and a structural encoder. These encoders extract features from the text description, reference image, and room structure diagram, respectively. The deep fusion of the three types of features is achieved through an attention mechanism to generate a unified conditional guidance vector.

3. The smart home stylized design method based on generative adversarial networks according to claim 2, characterized in that, in step 3, during the training process, the dual-channel generative adversarial network uses a differentiable rendering layer with built-in vertex shading, patch rasterization, and material sampling to constrain the consistency of the output results of the first generation process and the second generation process. Specifically, the differentiable rendering layer takes the structured design data as input, performs vertex transformation, patch rasterization, and pixel-by-pixel material sampling, and outputs a simplified preview image with material; the difference between the simplified preview image and the high-realism effect image output by the first generation process is calculated in the deep feature space, and this difference is used as the consistency loss, which is included in the total loss function of the network training.

4. The smart home stylized design method based on generative adversarial networks according to claim 3, characterized in that the total training loss function of the dual-channel generative adversarial network is: $L_{total} = L_{adv} + L_{cond} + L_{img} + L_{param} + L_{cons}$; where $L_{adv}$ is the adversarial loss, used to ensure the realism of the generated images and the rationality of the structured data; $L_{cond}$ is the conditional loss, used to constrain the consistency between the generated results and the multimodal conditional information input by the user; $L_{img}$ is the image loss, used to optimize pixel-level precision and visual effects in highly realistic images; $L_{param}$ is the parameter loss, used to supervise the accuracy of the parametric geometric descriptions and material category data in the structured design data; and $L_{cons}$ is the differentiable rendering consistency loss, used to ensure semantic alignment and geometric consistency between the outputs of the image generation branch and the structured decoding branch, calculated as: $L_{cons} = \lambda \cdot L_{perc}(R(S), I)$; where $\lambda$ is the consistency loss weighting coefficient, $R(S)$ is the simplified preview image rendered by the differentiable rendering layer $R$ from the structured design data $S$ output by the structured decoding branch, $I$ is the home scene effect image output by the image generation branch, and $L_{perc}$ is the perceptual loss function used to calculate the difference between two images in the deep feature space.

5. The smart home stylized design method based on generative adversarial networks according to claim 4, characterized in that, In step 3, the structured design data also includes component connection relationships and a material and process mapping list. The material and process mapping list recommends specific material specifications and processing standards for each parametric component. The material specifications include material sub-types, color codes, and specification parameters, while the processing standards include surface treatment methods and splicing process requirements.

6. The smart home stylized design method based on generative adversarial networks according to claim 5, characterized in that, Step 4, the specific process of generating a digital design draft includes: Based on the component types, recommended material categories, and material and process mapping lists in the structured design data, the system calls upon a pre-built material and process knowledge base to retrieve matching detailed material specifications, process descriptions, and unit cost data. Combining the component's three-dimensional dimensional parameters, the system calculates the component's material consumption and automatically generates a preliminary cost estimate report and a manufacturing feasibility assessment report. The system then encapsulates the home scene effect images, structured design data, cost estimate report, and manufacturing feasibility assessment report into a structured file package, i.e., a digital design draft.

7. The smart home stylized design method based on generative adversarial networks according to claim 6, characterized in that, The training data samples used to train the dual-channel generative adversarial network are in the form of triples of images, parameters, and materials. Each sample includes a room structure diagram, style description text, a real smart home scene rendering, and labeled parametric geometric data of each major furniture and decorative component, labeled material category labels, and a real bill of materials.

8. The smart home stylized design method based on generative adversarial networks according to claim 7, characterized in that the annotation process for training data samples includes: object detection is performed on the main furniture and decorative components in the real rendering, and the bounding box and category of each component are labeled; each component is parametrically annotated to determine its 3D dimensions, spatial pose, and connection relationships; with reference to the historical bill of materials, the actual material category, material specifications, and processing technology of each component are labeled to form complete triplet annotation data.

9. The smart home stylized design method based on generative adversarial networks according to claim 8, characterized in that, It also includes step 5, which provides an interactive optimization interface to receive user instructions on modifying the parametric geometric description and recommended material category of any component in the digital design draft. Based on the modified parameters, the dual-channel generative adversarial network is re-triggered to make local or global adjustments, and all related information in the digital design draft is updated synchronously until the user's needs are met. It also includes a parameter constraint verification function, which automatically issues a reminder and provides reasonable parameter suggestions when the user's input modification command exceeds the preset reasonable parameter range, while prohibiting the execution of invalid modification commands.

10. A smart home stylized design system based on generative adversarial networks, characterized in that it is used to implement the smart home stylized design method based on generative adversarial networks as described in claim 9, and the system includes an input interface module, a conditional encoding module, a dual-channel generation module, a material and process knowledge base module, a draft synthesis module, an interactive editing module, and an output module; the input interface module is used to acquire and preprocess user multimodal input information; the conditional encoding module has a built-in multimodal fusion encoder and is communicatively connected to the input interface module to perform feature fusion on the preprocessed information and generate a conditional guidance vector; the dual-channel generation module has a built-in dual-channel generative adversarial network that simultaneously generates home scene effect images and structured design data, and constrains the consistency of the two through a differentiable rendering layer; the material and process knowledge base module is communicatively connected to the dual-channel generation module and stores relevant material, process, and cost data; the draft synthesis module is communicatively connected to the dual-channel generation module, associates and verifies the home scene effect image and structured data, and calls the material and process knowledge base module to generate relevant reports and encapsulate them into a digital design draft; the interactive editing module is communicatively connected to the draft synthesis module, provides a visual operation interface, receives user modification instructions, and triggers network adjustments and draft updates; the output module is communicatively connected to the interactive editing module and is used for exporting digital design drafts.