Method for standardized translation of building codes and knowledge graph construction based on a domain-specific language

By employing a domain-specific language-based method for the standardized translation of building codes and the construction of knowledge graphs, the flexibility and accuracy issues of building code digitization and compliance checks in existing technologies have been addressed. This has enabled the full-spectrum digitization and automated compliance checks of building codes, thereby improving the efficiency and accuracy of intelligent building design.

CN122240708A · Pending · Publication Date: 2026-06-19 · HARBIN INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HARBIN INST OF TECH
Filing Date
2026-03-17
Publication Date
2026-06-19


Abstract

This invention proposes a method for the standardized translation of building codes and knowledge graph construction based on a domain-specific language (DSL). The method includes: hierarchical parsing and atomic segmentation of building code documents; classifying clauses into four categories based on semantic features: numerical, logical, topological, and fuzzy; using prompt engineering to guide a large language model to convert natural language clauses into structured DSL JSON; and constructing a hybrid knowledge base storage architecture including a graph database, a relational database, and a rule engine. By using the DSL to bridge the semantic gap between natural language and executable logic, the invention automatically converts building codes from unstructured text into a knowledge graph that machines can interpret and reason over. It supports the precise expression of complex logical nesting and spatial relationships, improving the efficiency and accuracy of automated compliance checks in building design.

Description

Technical Field

[0001] This invention relates to the fields of Building Information Modeling (BIM) technology and automated compliance checking (ACC), specifically to the digitization of building codes, knowledge graph construction, and domain-specific language (DSL) design and application. It is applicable to the automatic verification and analysis of building code compliance based on BIM models throughout the entire lifecycle of design, construction, and operation and maintenance of residential buildings, public buildings, and industrial facilities. In particular, it relates to a method for the standardized translation of building codes and the construction of knowledge graphs based on a domain-specific language.

Background Technology

[0002] With the widespread adoption of BIM technology in the construction industry, automated compliance checks (ACC) have become a core component for achieving intelligent building design and efficient approval. Building codes, as the core basis for compliance checks, are currently primarily in the form of unstructured natural language text. Traditional methods relying on manual review suffer from low efficiency, large errors, and inconsistent implementation standards, seriously impacting building safety and project progress efficiency.

[0003] Existing technologies for digitizing building codes and checking compliance have several shortcomings: (1) traditional hard-coding methods adapt poorly to code updates and regional differences and lack flexibility; (2) relying solely on large language models (LLMs) for rule extraction is prone to "hallucination", and the accuracy and interpretability of the extracted rules are insufficient; (3) knowledge formalization methods such as ontologies are costly to construct and struggle to capture the implicit logic and complex nested relationships in the code; (4) differentiated processing schemes for different types of code clauses are lacking, the semantic gap is significant, and deep integration with BIM models is difficult to achieve; (5) topological constraints and fuzzy clauses are difficult to express formally, so existing technologies cannot verify them accurately.

[0004] The aforementioned technical deficiencies prevent existing ACC systems from meeting the needs of smart buildings for digitalized standards, precise rules, and automated inspections. There is an urgent need for an integrated solution that can balance automation, rule accuracy, and BIM compatibility.

Summary of the Invention

[0005] The purpose of this invention is to solve the problems in the prior art by proposing a method for the standardized translation of building specifications and the construction of knowledge graphs based on domain-specific languages.

[0006] This invention is achieved through the following technical solution. This invention proposes a method for the standardized translation of building codes and knowledge graph construction based on a domain-specific language, the method comprising:

Step S1: Document preprocessing and clause segmentation. Step S1 includes: first, performing text cleaning and standardization on the original building code document to remove format noise; then, identifying the chapter-section-clause hierarchical structure of the document based on regular expressions and constructing a nested tree data structure, while extracting metadata information; next, identifying the logical boundaries between clauses through syntactic parsing, segmenting complex long sentences into the smallest executable constraint units, and using a large language model for reference resolution and semantic completion to ensure that each atomic clause contains complete "object-attribute-constraint" semantics; finally, extracting keywords such as "should," "should be," and "may" based on modal verb recognition technology, labeling the constraint strength as mandatory or recommended, and forming a standardized and structured set of clauses.

Step S2: Multi-dimensional clause classification. The segmented clauses are classified in multiple dimensions based on semantic features.

Step S3: DSL semantic conversion. Corresponding DSL schemas are designed for the different types of clauses.

Step S4: Hybrid database storage. The DSL JSON of each clause type is stored in the database adapted to that type.

Step S5: Knowledge graph construction. Based on the classification and storage in S4, the scattered numerical, logical, topological, and fuzzy clause data are aggregated into the Neo4j graph database to construct a unified knowledge graph covering four core types of nodes: specifications, clauses, entities, and parameters.

[0007] Further, in step S2, quantitative index constraints containing comparison operators are identified as numerical clauses.

[0008] Further, in step S2, dependency constraints containing conditional patterns are captured as logical clauses.

[0009] Further, in step S2, geometric positional relation constraints containing spatial predicates are extracted as topological clauses.

[0010] Furthermore, in step S2, subjective guiding constraints containing qualitative descriptions are classified as fuzzy clauses.

[0011] Further, in step S3, the schema includes target entity, inspection attribute, operator, constraint value, priority, and applicable condition fields; a prompt template integrating role definitions, task instructions, output format examples, and domain knowledge constraints is constructed, and natural language clauses are parsed into structured DSL JSON objects using a large language model; the generated DSL is manually verified to ensure accurate numerical unit correspondence, complete capture of preconditions, and correct mapping of topological relationships, so as to form a high-quality intermediate representation of executable rules.

[0012] Furthermore, in step S4, numerical clauses are stored in a PostgreSQL relational database, using structured fields to store inspection targets, operators, and limits, and using JSONB type to store applicable conditions and creating a GIN index to support fast contextual retrieval; logical clauses are converted into Drools rule language, and a "violation pattern matching" strategy is used to convert specification requirements into negative capture patterns. When a non-compliant BIM component is matched, a verification result object containing clause traceability information is instantiated; topological clauses are built in a Neo4j graph database, using the "entity type-constraint edge-entity type" triplet pattern, defining specification objects with nodes and storing spatial relationship types and constraint levels through edges; and fuzzy clauses are built into a lightweight knowledge base based on local CSV files and semantic retrieval, achieving efficient storage and management of the entire spectrum of building code data.

[0013] Furthermore, in step S5, the dependencies and potential conflicts between clauses are defined, and the graph traversal algorithm is used to realize the multi-dimensional impact analysis of design changes and rule association reasoning.

[0014] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method for standardized translation of building codes and knowledge graph construction based on a domain-specific language.

[0015] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the method for standardized translation of building codes and knowledge graph construction based on a domain-specific language.

[0016] The beneficial effects of this invention are: 1. High-precision semantic preservation: By using the DSL as an intermediate representation layer and combining LLM semantic understanding with manual verification, the hallucination problem of purely automated methods is effectively avoided, ensuring that the semantics of the code are transmitted accurately.

[0017] 2. Full-type coverage: Differentiated processing procedures are designed for four types of constraints: numerical, logical, topological, and fuzzy, realizing the full spectrum of digital building codes and overcoming the limitation of traditional methods that only support simple numerical constraints.

[0018] 3. Semantic interoperability: The DSL Schema directly maps BIM data structures and attributes, eliminating the semantic gap between natural language specifications and machine models, and supporting plug-and-play compliance checks.

[0019] 4. High-efficiency storage and retrieval: It adopts a hybrid storage strategy of "on-demand allocation", with relational databases handling efficient numerical calculations, graph databases optimizing spatial relationship traversal, and rule engines supporting complex forward reasoning. Overall performance is significantly better than a single storage solution.

[0020] 5. Interpretability and traceability: Visualization of clause dependencies based on the knowledge graph supports multi-dimensional impact analysis of design decisions and detection of conflicts between clauses, providing interpretability support for intelligent drawing review.

Description of the Drawings

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0022] Figure 1 is a flowchart of the method for standardized translation of building codes and knowledge graph construction based on a domain-specific language, as described in this invention.

Detailed Implementation

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0024] This invention provides a method for the standardized translation of building codes and the construction of knowledge graphs based on domain-specific languages, aiming to achieve: (1) automated, high-precision parsing and formalization of unstructured building codes; (2) unified expression and storage of four types of constraints: numerical, logical, topological and fuzzy; (3) construction of a credible semantic bridge between natural language codes and BIM model data; and (4) automated compliance checks that support complex conditional reasoning.

[0025] Specifically, with reference to Figure 1, this invention proposes a method for the standardized translation of building codes and knowledge graph construction based on a domain-specific language, the method comprising:

Step S1: Document preprocessing and clause segmentation. Step S1 includes: first, performing text cleaning and standardization on the original building code document to remove format noise; then, identifying the chapter-section-clause hierarchy of the document based on regular expressions and constructing a nested tree data structure, while extracting metadata information such as the code name, version number, applicable region, and effective date; next, identifying the logical boundaries between clauses through syntactic parsing, segmenting complex long sentences into the smallest executable constraint units, and using a large language model for reference resolution and semantic completion to ensure that each atomic clause contains complete "object-attribute-constraint" semantics; finally, extracting keywords such as "should," "should be," and "may" based on modal verb recognition technology, labeling the constraint strength as mandatory or recommended, and forming a standardized and structured set of clauses.

Step S2: Multi-dimensional clause classification. The segmented clauses are classified in multiple dimensions based on semantic features.

Step S3: DSL semantic conversion. Corresponding DSL schemas are designed for the different types of clauses.

Step S4: Hybrid database storage. The DSL JSON of each clause type is stored in the database adapted to that type.

Step S5: Knowledge graph construction. Based on the classification and storage in S4, the scattered numerical, logical, topological, and fuzzy clause data are aggregated into the Neo4j graph database to construct a unified knowledge graph covering four core types of nodes: specifications, clauses, entities, and parameters.

[0026] Further, in step S2, quantitative index constraints containing comparison operators (≥, ≤, greater than, not less than) are identified as numerical clauses.

[0027] Further, in step S2, dependency constraints containing conditional patterns (when... should, if... then) are captured as logical clauses.

[0028] Further, in step S2, geometric positional relation constraints containing spatial predicates (adjacent, connected, contained in) are extracted as topological clauses.

[0029] Furthermore, in step S2, subjective guiding constraints containing qualitative descriptions (reasonable layout, coordination, aesthetics) are categorized as fuzzy clauses, thereby establishing a differentiated digital processing foundation.
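The four classification rules above can be approximated with keyword cues. The following is a minimal Python sketch under stated assumptions, not the classifier of the invention: the cue lists, the precedence order (conditional first, then spatial, numerical, qualitative), and the name classify_clause are all chosen for illustration, and the cues "above" and "below" are added beyond the spatial predicates listed in the text.

```python
import re

# Illustrative cue lists (assumptions); a real system would tune them per code document.
LOGICAL_CUES = re.compile(r"^\s*(when|if)\b", re.IGNORECASE)
TOPOLOGICAL_CUES = re.compile(
    r"\b(adjacent|connected|contained in|above|below)\b", re.IGNORECASE
)
NUMERIC_CUES = re.compile(
    r"≥|≤|>=|<=|\bgreater than\b|\bnot (?:be )?(?:less|more) than\b", re.IGNORECASE
)
FUZZY_CUES = re.compile(r"\b(reasonable|coordinat\w*|aesthetic\w*)\b", re.IGNORECASE)


def classify_clause(text: str) -> str:
    """Return one of: logical, topological, numerical, fuzzy, unclassified."""
    # Precedence is an assumption: an atomic clause opening with a condition
    # is treated as logical even if it also mentions a number.
    if LOGICAL_CUES.search(text):
        return "logical"
    if TOPOLOGICAL_CUES.search(text):
        return "topological"
    if NUMERIC_CUES.search(text):
        return "numerical"
    if FUZZY_CUES.search(text):
        return "fuzzy"
    return "unclassified"
```

Clauses that match no cue fall through to "unclassified", which gives a natural hand-off point for review by a person or a language model.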

[0030] Further, in step S3, the schema includes target entity, inspection attribute, operator, constraint value, priority, and applicable condition fields; a prompt template integrating role definitions, task instructions, output format examples, and domain knowledge constraints is constructed, and natural language clauses are parsed into structured DSL JSON objects using a large language model; the generated DSL is manually verified to ensure accurate numerical unit correspondence, complete capture of preconditions, and correct mapping of topological relationships, so as to form a high-quality intermediate representation of executable rules.
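The four-part prompt template described in step S3 (role definition, task instruction, output format example, domain knowledge constraints) might be assembled as follows. This is a sketch only: the exact wording of each part, the operator list, and the name build_prompt are assumptions, the example object reuses the bedroom-area fields given in the embodiment, and no call to a language model is made.

```python
import json

# Domain knowledge constraints (illustrative; the real lists would be larger).
ALLOWED_ENTITIES = ["Bedroom", "Kitchen"]
ALLOWED_OPERATORS = [">=", "<=", ">", "<", "=="]  # assumed enumeration

# One worked example shown to the model as the expected output format.
EXAMPLE_OUTPUT = {
    "target_entity": "Bedroom",
    "check_attribute": "Usable area",
    "operator": ">=",
    "value": "12",
    "unit": "m²",
    "mandatory": "true",
}


def build_prompt(clause: str) -> str:
    """Combine role, task, format example, and domain constraints into one prompt."""
    parts = [
        "Role: You are a digital expert of building codes.",
        "Task: Convert the clause into a JSON object with exactly the fields "
        "shown in the example. Output JSON only.",
        "Example output: " + json.dumps(EXAMPLE_OUTPUT, ensure_ascii=False),
        "Allowed target_entity values: " + ", ".join(ALLOWED_ENTITIES),
        "Allowed operator values: " + ", ".join(ALLOWED_OPERATORS),
        "Clause: " + clause,
    ]
    return "\n".join(parts)
```

Keeping the entity and operator enumerations in the prompt narrows what the model can emit, which makes the later manual verification step easier to carry out.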

[0031] Furthermore, in step S4, numerical clauses are stored in a PostgreSQL relational database, using structured fields to store inspection targets, operators, and limits, and using JSONB type to store applicable conditions and creating a GIN index to support fast contextual retrieval; logical clauses are converted into Drools rule language (.drl), and a "violation pattern matching" strategy is used to transform specification requirements into negative capture patterns. When a non-compliant BIM component is matched, a verification result object containing clause traceability information is instantiated; topological clauses are constructed in a Neo4j graph database, using the "entity type-constraint edge-entity type" triplet pattern, defining specification objects with nodes and storing spatial relationship types and constraint levels through edges; and fuzzy clauses are constructed into a lightweight knowledge base based on local CSV files and semantic retrieval, achieving efficient storage and management of the entire spectrum of building code data.
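The routing idea in step S4, where each clause type goes to the store suited to it, can be illustrated with a small dispatcher. In this sketch the four back ends are replaced by in-memory lists, and the field name clause_type, the identifiers, and the function route are hypothetical; a real implementation would call the PostgreSQL, Drools, Neo4j, and CSV writers instead.

```python
from typing import Dict, List

# In-memory stand-ins for the four storage back ends named in the text.
stores: Dict[str, List[dict]] = {
    "postgresql": [],  # numerical clauses
    "drools": [],      # logical clauses
    "neo4j": [],       # topological clauses
    "csv": [],         # fuzzy clauses
}

# Clause type -> back end, following the allocation described in step S4.
ROUTES = {
    "numerical": "postgresql",
    "logical": "drools",
    "topological": "neo4j",
    "fuzzy": "csv",
}


def route(record: dict) -> str:
    """Append a DSL record to the store for its clause type and return the store name."""
    clause_type = record["clause_type"]  # hypothetical field name
    if clause_type not in ROUTES:
        raise ValueError(f"unknown clause type: {clause_type!r}")
    backend = ROUTES[clause_type]
    stores[backend].append(record)
    return backend
```

Rejecting unknown types loudly, rather than defaulting to one store, keeps misclassified clauses from silently landing in the wrong back end.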

[0032] Furthermore, in step S5, dependencies (such as parameter references) and potential conflict edges between clauses are defined, and a graph traversal algorithm is used to realize multi-dimensional impact analysis of design changes and rule association reasoning.

[0033] This invention proposes a method for the standardized translation of building codes and knowledge graph construction based on a domain-specific language, belonging to the field of building information technology and automated compliance checking (ACC). The method includes: hierarchical parsing and atomic segmentation of building code documents; classifying clauses into four categories based on semantic features: numerical, logical, topological, and fuzzy; using prompt engineering to guide a large language model (LLM) to convert natural language clauses into structured DSL JSON; and constructing a hybrid knowledge base storage architecture including a graph database (Neo4j), a relational database (PostgreSQL), and a rule engine (Drools). By using the DSL to bridge the semantic gap between natural language and executable logic, the invention automatically converts building codes from unstructured text into a knowledge graph that machines can interpret and reason over. It supports the precise expression of complex logical nesting and spatial relationships, improving the efficiency and accuracy of automated compliance checking of building designs.

[0034] Example: The specific embodiments of the present invention are described in further detail below with reference to Figure 1. This embodiment takes the "Residential Design Code" (GB 50096-2011) as an example to illustrate the specific implementation process of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0035] As shown in Figure 1, this invention provides a method for the standardized translation of building codes and knowledge graph construction based on a domain-specific language, which includes five steps: document preprocessing and clause segmentation; multi-dimensional clause classification; DSL semantic conversion; hybrid database storage; and knowledge graph construction. In the document preprocessing and clause segmentation step, taking the "Residential Design Code" (GB 50096-2011) as an example, the original PDF document is first cleaned and standardized to remove headers, footers, and formatting noise. Then, the document's "chapter-section-clause" hierarchical structure is identified based on regular expressions (e.g., identifying "5.1.1"); the clause number and clause content (e.g., "The usable area of a bedroom should not be less than 12m²") are extracted, together with metadata such as the standard name, version number (2011), applicable scope (residential buildings nationwide), and effective date. Next, the logical boundaries between clauses are identified through syntactic parsing, and complex long sentences (such as "When a bedroom is located in a basement, ventilation facilities should be installed and the net height should not be less than 2.40m") are segmented into the smallest executable constraint units. A large language model is used for reference resolution and semantic completion (replacing referring expressions such as "its" and "the room" with specific entity names such as "bedroom") to ensure that each atomic clause contains complete "object-attribute-constraint" semantics. Finally, keywords such as "should not" and "should" are extracted based on modal verb recognition technology, marking "should not be less than 12m²" as mandatory and "should not be located in a basement" as recommended, ultimately forming a standardized and structured set of atomic clauses.
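The regular-expression step that recovers the "chapter-section-clause" hierarchy can be sketched as below. This is an illustration under assumptions, not the parser of the invention: it assumes each heading or clause starts a line with a dotted number such as "5.1.1", and the names CLAUSE_RE and build_tree, as well as the dictionary layout of a node, are hypothetical.

```python
import re

# A dotted clause number at the start of a line, followed by the text.
CLAUSE_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


def build_tree(lines):
    """Build a nested tree keyed by dotted numbers ("5" -> "5.1" -> "5.1.1")."""
    root = {"number": "", "text": "", "children": []}
    index = {"": root}  # dotted number -> node, for parent lookup
    for line in lines:
        match = CLAUSE_RE.match(line.strip())
        if not match:
            continue  # not a numbered line; skipped in this sketch
        number, text = match.groups()
        parent_number = number.rpartition(".")[0]  # "5.1.1" -> "5.1", "5" -> ""
        parent = index.get(parent_number, root)    # orphan numbers attach to the root
        node = {"number": number, "text": text, "children": []}
        parent["children"].append(node)
        index[number] = node
    return root
```

For instance, feeding it three lines numbered "5", "5.1", and "5.1.1" produces a root with one chapter node, which holds one section node, which holds one clause node. Unnumbered continuation lines are ignored here; a fuller parser would append them to the preceding clause.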

[0036] As shown in Figure 1, in the multi-dimensional clause classification step, after clause segmentation is complete, the 338 segmented clauses are divided into four categories based on semantic features: numerical, logical, topological, and fuzzy. Specifically: quantitative indicator constraints containing comparison operators (≥, ≤, greater than, not less than) are identified as numerical clauses (123 clauses, accounting for 36.4%, such as "the usable area of the bedroom should not be less than 12m²" and "the net height should not be less than 2.40m"); dependency constraints containing conditional patterns (when..., if..., then) are captured as logical clauses (159 clauses, accounting for 47.0%, such as "When the bedroom is located in the basement, a ventilation shaft should be installed"); geometric positional relationship constraints containing spatial predicates (adjacent, connected, contained in) are extracted as topological clauses (45 clauses, accounting for 13.3%, such as "The kitchen should be adjacent to the dining room" and "The bathroom should not be located above the bedroom of the lower-level resident"); and subjective guiding constraints containing qualitative descriptions (reasonable layout, coordination, aesthetics) are classified as fuzzy clauses (11 clauses, accounting for 3.3%, such as "The facade design should be aesthetically pleasing" and "The apartment layout should be reasonable"), thereby establishing a differentiated digital processing foundation.
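The reported shares follow directly from the clause counts, as this short Python check shows. The counts are those stated in the embodiment; only the variable names are introduced here.

```python
# Clause counts per category as reported for GB 50096-2011 in this embodiment.
counts = {"numerical": 123, "logical": 159, "topological": 45, "fuzzy": 11}

total = sum(counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} clauses, {share}%")
```

The four counts sum to the 338 atomic clauses, and the rounded shares reproduce the stated 36.4%, 47.0%, 13.3%, and 3.3%.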

[0037] As shown in Figure 1, in the DSL semantic conversion step, corresponding DSL schemas are designed for the four types of clauses mentioned above. Each schema includes fields for target entity (corresponding to a BIM component type), check attribute, operator, constraint value, priority, and applicable conditions. A prompt template is constructed that integrates role definitions (clearly defining the LLM as a "digital expert of building codes"), task instructions (requiring output in JSON format that strictly adheres to the schema), output format examples (providing correct conversion examples of similar clauses in GB 50096-2011), and domain knowledge constraints (embedding specific entity lists such as "bedroom" and "kitchen" and operator enumeration values). A large language model is used to parse the natural language clauses into structured DSL JSON objects. For example, "The usable area of a bedroom should not be less than 12m²" is converted to {target_entity: "Bedroom", check_attribute: "Usable area", operator: ">=", value: "12", unit: "m²", ..., mandatory: "true"}, and "When the bedroom is located in the basement, a ventilation shaft should be installed" is converted into a logical structure containing a conditions array and a conclusion object. The generated DSL is then manually validated to ensure that the numerical units are accurately matched (e.g., recognizing "m²" instead of "cm²"), that the preconditions are fully captured (e.g., applicable scenarios such as "residential buildings of six stories and below"), and that the topological relationships are correctly mapped (e.g., "adjacent_to" accurately corresponds to the "adjacent" semantics), in order to form a high-quality intermediate representation of executable rules.
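To make the numerical DSL object concrete, the sketch below parses the bedroom-area example into a typed rule, rejects unknown operators and unexpected units (a mechanical aid to the manual check on "m²" versus "cm²"), and evaluates a measured value against it. The class NumericRule, the functions parse_rule and complies, the unit whitelist, and the operator enumeration are assumptions for illustration; fields elided in the text's example are simply left out.

```python
import operator as op
from dataclasses import dataclass

# Assumed operator enumeration and unit whitelist.
OPERATORS = {">=": op.ge, "<=": op.le, ">": op.gt, "<": op.lt, "==": op.eq}
ALLOWED_UNITS = {"m²", "m"}


@dataclass(frozen=True)
class NumericRule:
    target_entity: str
    check_attribute: str
    operator: str
    value: float
    unit: str
    mandatory: bool

    def __post_init__(self):
        # Fail early on values a reviewer would otherwise have to catch by eye.
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {self.operator!r}")
        if self.unit not in ALLOWED_UNITS:
            raise ValueError(f"unexpected unit: {self.unit!r}")


def parse_rule(dsl: dict) -> NumericRule:
    """Convert a DSL JSON object (string-valued, as in the text) into a typed rule."""
    return NumericRule(
        target_entity=dsl["target_entity"],
        check_attribute=dsl["check_attribute"],
        operator=dsl["operator"],
        value=float(dsl["value"]),
        unit=dsl["unit"],
        mandatory=str(dsl["mandatory"]).lower() == "true",
    )


def complies(rule: NumericRule, measured: float) -> bool:
    """True when the measured attribute value satisfies the rule."""
    return OPERATORS[rule.operator](measured, rule.value)


EXAMPLE_DSL = {
    "target_entity": "Bedroom",
    "check_attribute": "Usable area",
    "operator": ">=",
    "value": "12",
    "unit": "m²",
    "mandatory": "true",
}
```

With this rule, a bedroom measured at 12.5 m² complies and one at 10 m² does not; a DSL object carrying the unit "cm²" is rejected at parse time.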

[0038] As shown in Figure 1, in the hybrid database storage step, a hybrid storage architecture of "on-demand allocation and division of responsibilities" is constructed to achieve efficient storage and management of the entire spectrum of building code data. The 123 numerical clauses are stored in a PostgreSQL relational database, using structured fields to store inspection targets, operators, and limit values, and using the JSONB type to store applicable conditions (such as "building_type": "residential", "stories": "<=6"); a GIN index is created to support fast contextual retrieval based on project attributes. The 159 logical clauses are encoded into Drools rule files (.drl). Each rule first states the violation condition and then configures the handling action: the "when" part defines the combination of BIM component attributes that constitutes a violation (e.g., the bedroom space type is "basement" and there is no "ventilation shaft" among the associated components), and the "then" part instantiates a ValidationResult object that encapsulates metadata such as the clause number, the original text of the code, and the constraint strength (mandatory / recommended); the rules are finally deployed to the Drools Business Central platform and exposed as a remote service interface. The 45 topological clauses are built in the Neo4j graph database using the "entity type-constraint edge-entity type" triple pattern, defining specification objects as nodes (such as "bedroom", "ventilation duct", "kitchen") and storing spatial relationship types and constraint levels on edges (e.g., creating the relationship (k:Kitchen)-[:ADJACENT_TO {clause_id:"5.3.1"}]->(d:DiningRoom)). The 11 fuzzy clauses are built into a lightweight knowledge base based on local CSV files and semantic retrieval, storing their original text, chapters, and associated entity information for semantic retrieval and prompting in subsequent design decision support.
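The "violation pattern matching" strategy can be illustrated outside Drools. The sketch below is a plain Python analogue, not Drools rule language: the condition plays the role of the "when" part and matches only the non-compliant case, and the returned object plays the role of the "then" part. The Space class, its fields, the clause identifier "EXAMPLE-1", and the "mandatory" strength are hypothetical placeholders, since the text does not give them for this clause.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Space:
    """Simplified stand-in for a BIM space and its associated components."""
    space_type: str
    location: str
    components: List[str] = field(default_factory=list)


@dataclass
class ValidationResult:
    """Violation record carrying clause traceability information."""
    clause_id: str
    original_text: str
    strength: str  # "mandatory" or "recommended"
    element: Space


# Hypothetical clause metadata for illustration only.
CLAUSE_ID = "EXAMPLE-1"
CLAUSE_TEXT = (
    "When the bedroom is located in the basement, "
    "a ventilation shaft should be installed"
)


def check_basement_bedroom(space: Space) -> Optional[ValidationResult]:
    """Match the violation pattern; return a result only when the rule is broken."""
    # "when": the negative pattern (basement bedroom without a ventilation shaft)
    violated = (
        space.space_type == "Bedroom"
        and space.location == "basement"
        and "ventilation shaft" not in space.components
    )
    if not violated:
        return None
    # "then": instantiate the traceable violation record
    return ValidationResult(CLAUSE_ID, CLAUSE_TEXT, "mandatory", space)
```

Encoding the violation rather than the requirement means compliant elements produce no output at all, so every emitted ValidationResult is a finding that points back to its clause.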

[0039] As shown in Figure 1, in the knowledge graph construction step, a knowledge graph containing four core node types (specification, clause, entity, and parameter) is built in Neo4j to manage complex relationship networks. First, Specification nodes are created to store the metadata of GB 50096-2011; 338 Clause nodes are created to store atomic clauses and their DSL content; Entity nodes are created to abstract building entities (such as "bedroom," "kitchen," and "balcony"); and Parameter nodes are created to store inspection parameters (such as "usable area," "net height," and "ventilation opening area"). Then, directed relationship edges are defined to express logic and dependencies, including (Clause)-[APPLIES_TO]->(Entity) indicating the object to which a clause applies, (Clause)-[DEPENDS_ON]->(Clause) indicating parameter dependencies between clauses (such as daylighting standards depending on the window-to-floor area ratio), (Clause)-[CONFLICTS_WITH]->(Clause) indicating potential conflicts, and (Entity)-[HAS_PARAMETER]->(Parameter) indicating entity attributes. Using graph algorithms for path searching, when a designer modifies the bedroom window size in the BIM environment, the system executes a Cypher traversal query to match nodes such as (Bedroom)-[:HAS_PARAMETER]->(Daylighting Coefficient), (Ventilation Opening Area), etc., and proactively surfaces fuzzy clauses related to "facade aesthetics" and "cost indicators" through path search, realizing multi-dimensional impact analysis and rule association reasoning for design changes.
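The impact analysis described above amounts to a graph traversal. The following sketch reproduces the idea in plain Python rather than Cypher: starting from a changed entity, it collects the clauses that apply to it and then follows DEPENDS_ON edges backwards to the clauses that depend on them. The edge list, node labels such as "clause:A", and the function impacted_clauses are hypothetical; only the relationship type names come from the text.

```python
from collections import deque

# (source, relationship, target) triples; node names are made up for illustration.
EDGES = [
    ("clause:A", "APPLIES_TO", "entity:Bedroom"),
    ("clause:B", "DEPENDS_ON", "clause:A"),
    ("clause:D", "DEPENDS_ON", "clause:B"),
    ("clause:C", "APPLIES_TO", "entity:Kitchen"),
    ("entity:Bedroom", "HAS_PARAMETER", "param:Usable area"),
]


def impacted_clauses(entity: str) -> list:
    """Clauses affected when `entity` changes, including transitive dependants."""
    # Clauses that apply directly to the changed entity.
    start = {s for s, rel, t in EDGES if rel == "APPLIES_TO" and t == entity}
    seen = set(start)
    queue = deque(sorted(start))
    # Breadth-first search along reversed DEPENDS_ON edges.
    while queue:
        current = queue.popleft()
        for s, rel, t in EDGES:
            if rel == "DEPENDS_ON" and t == current and s not in seen:
                seen.add(s)
                queue.append(s)
    return sorted(seen)
```

On this toy graph, a change to the bedroom reaches clause A directly and clauses B and D through the dependency chain, while the kitchen clause C is untouched. In Neo4j the same reachability question would typically be posed as a variable-length path match.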

[0040] The present invention also proposes an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method for standardized translation of building codes and knowledge graph construction based on a domain-specific language.

[0041] The present invention also proposes a computer-readable storage medium for storing computer instructions, which, when executed by a processor, implement the steps of the method for standardized translation of building codes and knowledge graph construction based on a domain-specific language.

[0042] The memory in this application embodiment can be volatile memory or non-volatile memory, or it can include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), Synchlink dynamic random access memory (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory used in the methods described in this invention is intended to include, but is not limited to, these and any other suitable types of memory.

[0043] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVDs)), or semiconductor media (e.g., solid-state disks (SSDs)).

[0044] In implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software. The steps of the method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.

[0045] It should be noted that the processor in the embodiments of this application can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method embodiments can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well established in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in conjunction with its hardware, completes the steps of the above methods.

[0046] The above provides a detailed description of the method for standardized translation and knowledge graph construction of building specifications based on a domain-specific language proposed in this invention. Specific examples have been used to illustrate the principles and implementation of this invention. The descriptions of the above embodiments are intended only to help in understanding the method and core ideas of this invention. Those skilled in the art may, based on the ideas of this invention, make changes to the specific implementation and the scope of application. Therefore, the content of this specification should not be construed as a limitation of this invention.

Claims

1. A method for standardized translation and knowledge graph construction of building specifications based on a domain-specific language (DSL), characterized in that the method includes:
Step S1: document preprocessing and clause segmentation, including: first, performing text cleaning and standardization on the original building code document to remove format noise; then, identifying the chapter-section-clause hierarchy of the document based on regular expressions and constructing a nested tree data structure, while extracting metadata information; next, identifying the logical boundaries between clauses through syntactic parsing, segmenting complex long sentences into the smallest executable constraint units, and using a large language model for reference resolution and semantic completion to ensure that each atomic clause contains complete "object-attribute-constraint" semantics; finally, extracting the keywords "shall," "should," and "may" based on modal verb recognition, labeling the constraint strength as mandatory or recommended, and forming a standardized, structured set of clauses;
Step S2: multi-dimensional clause classification, in which the segmented clauses are classified in multiple dimensions based on semantic features;
Step S3: DSL semantic conversion, in which corresponding DSL schemas are designed for the different types of clauses;
Step S4: hybrid database storage, in which the DSL JSON objects of each type are stored in the correspondingly adapted databases; and
Step S5: knowledge graph construction, in which, based on the classification and storage in step S4, the scattered numerical, logical, topological, and fuzzy text data are aggregated into a Neo4j graph database to construct a unified knowledge graph covering four core types of nodes: specifications, clauses, entities, and parameters.
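As a non-limiting illustration of the hierarchy identification and modal-verb labeling in step S1, the following Python sketch builds a nested tree from numbered headings and assigns a constraint strength to a clause. The heading pattern, the `Node` class, the function names, the English keyword-to-strength mapping, and the sample text in the usage note are all assumptions introduced for illustration; they are not taken from the embodiments, and the syntactic parsing and large-language-model steps are omitted.

```python
import re
from dataclasses import dataclass, field

# Assumed numbering convention: "5", "5.3", "5.3.1" at the start of a line.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")

# Assumed mapping from English modal keywords to constraint strength.
MODAL_STRENGTH = {
    "shall": "mandatory",
    "must": "mandatory",
    "should": "recommended",
    "may": "recommended",
}


@dataclass
class Node:
    number: str
    text: str
    children: list = field(default_factory=list)


def build_tree(lines):
    """Build a chapter-section-clause tree from numbered lines."""
    root = Node("", "ROOT")
    stack = [root]  # stack[d] holds the most recent node at depth d
    for line in lines:
        match = HEADING.match(line.strip())
        if not match:
            continue  # unnumbered lines are ignored in this sketch
        number, text = match.groups()
        depth = number.count(".") + 1
        node = Node(number, text)
        del stack[depth:]  # discard nodes at the same or a deeper level
        stack[-1].children.append(node)
        stack.append(node)
    return root


def constraint_strength(text):
    """Return the strength of the first modal keyword found, or None."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in MODAL_STRENGTH:
            return MODAL_STRENGTH[word]
    return None
```

For instance, calling `build_tree` on invented lines such as `"5 Fire safety"`, `"5.3 Evacuation"`, and `"5.3.1 Exit doors shall be at least 0.9 m wide"` nests clause 5.3.1 under section 5.3, and `constraint_strength` labels that clause as mandatory.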

2. The method of claim 1, wherein in step S2, quantitative index constraints containing comparison operators are identified as numerical clauses.

3. The method of claim 1, wherein in step S2, dependency constraints containing conditional patterns are identified as logical clauses.

4. The method of claim 1, wherein in step S2, geometric positional constraints containing spatial predicates are identified as topological clauses.

5. The method of claim 1, wherein in step S2, subjective guidance constraints containing qualitative descriptions are classified as fuzzy clauses.
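As a non-limiting illustration of the classification in step S2, the following Python sketch assigns categories from surface cues: comparison expressions for numerical clauses, conditional words for logical clauses, and spatial predicates for topological clauses, with fuzzy as the fallback. The cue lists, the function name `classify`, and the choice to return every matching category are assumptions made for illustration; the embodiments do not specify them.

```python
import re

# Assumed cue lists; a real system would need a much richer vocabulary.
COMPARISON = re.compile(
    r"(>=|<=|>|<|not less than|not more than|at least|at most)"
)
CONDITIONAL = re.compile(r"\b(if|when|where|unless)\b")
SPATIAL = re.compile(
    r"\b(adjacent to|connected to|above|below|inside|facing|between)\b"
)


def classify(clause):
    """Return the list of categories whose cues appear in the clause."""
    text = clause.lower()
    labels = []
    # Numerical: a comparison expression together with at least one digit.
    if COMPARISON.search(text) and re.search(r"\d", text):
        labels.append("numerical")
    # Logical: a conditional pattern such as "if ..." or "when ...".
    if CONDITIONAL.search(text):
        labels.append("logical")
    # Topological: a spatial predicate between building elements.
    if SPATIAL.search(text):
        labels.append("topological")
    # Fuzzy: qualitative guidance that matches none of the cues above.
    return labels or ["fuzzy"]
```

Returning a list rather than a single label lets a clause that is, for example, both conditional and numerical be handled in more than one dimension; a single-label variant could instead pick the first match by priority.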

6. The method of claim 1, wherein in step S3, the DSL schema includes fields for the target entity, inspected attribute, operator, constraint value, priority, and applicable conditions; a prompt template integrating a role definition, task instructions, output format examples, and domain knowledge constraints is constructed, and a large language model is used to parse natural language clauses into structured DSL JSON objects; and the generated DSL is manually verified to ensure accurate correspondence of numerical units, complete capture of preconditions, and correct mapping of topological relationships, so as to form a high-quality intermediate representation of executable rules.
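As a non-limiting illustration of step S3, the following Python sketch assembles a prompt from the four template parts named above and checks that a returned JSON object carries the six schema fields. The field names, the operator set, the template wording, and the example values are assumptions introduced for illustration; no call to a large language model is made, and the structural check does not replace the manual verification.

```python
import json

# Assumed JSON keys for the six schema fields.
REQUIRED_FIELDS = (
    "target_entity", "attribute", "operator",
    "value", "priority", "conditions",
)
# Assumed operator vocabulary.
OPERATORS = {">=", "<=", ">", "<", "==", "!="}

# Hypothetical output-format example with invented values.
EXAMPLE = {
    "target_entity": "Door", "attribute": "clear_width", "operator": ">=",
    "value": {"amount": 0.9, "unit": "m"}, "priority": "mandatory",
    "conditions": {"usage": "evacuation"},
}

PROMPT_TEMPLATE = (
    "Role: You are an expert in building codes.\n"
    "Task: Convert the clause below into exactly one DSL JSON object.\n"
    "Output format example: {example}\n"
    "Domain constraints: keep the original units; "
    "the operator must be one of {operators}.\n"
    "Clause: {clause}"
)


def build_prompt(clause):
    """Fill the role, task, example, and constraint parts of the template."""
    return PROMPT_TEMPLATE.format(
        example=json.dumps(EXAMPLE),
        operators=", ".join(sorted(OPERATORS)),
        clause=clause,
    )


def parse_dsl(raw):
    """Parse model output and reject objects that violate the schema."""
    rule = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in rule]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if rule["operator"] not in OPERATORS:
        raise ValueError(f"unknown operator: {rule['operator']}")
    return rule
```

A structural check of this kind could filter out malformed outputs before the manual review of units, preconditions, and topological mappings.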

7. The method of claim 1, wherein in step S4, numerical clauses are stored in a PostgreSQL relational database, using structured fields to store inspection targets, operators, and limit values, and using the JSONB type to store applicable conditions, with a GIN index created to support fast contextual retrieval; logical clauses are converted into the Drools rule language, using a "violation pattern matching" strategy that converts specification requirements into negative capture patterns, such that when a non-compliant BIM component is matched, a verification result object containing clause traceability information is instantiated; topological clauses are stored in a Neo4j graph database using an "entity type-constraint edge-entity type" triplet pattern, with nodes defining specification objects and edges storing spatial relationship types and constraint levels; and fuzzy clauses are stored in a lightweight knowledge base built on local CSV files and semantic retrieval, thereby achieving efficient storage and management of the full spectrum of building code data.
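As a non-limiting illustration of step S4, the following Python sketch routes a DSL object to a storage backend by category and mimics the "violation pattern matching" strategy in plain Python: the rule states what compliance requires, and only components that fail it produce a result object carrying the clause identifier. This is a stand-in for, not an example of, Drools, PostgreSQL, or Neo4j; the dictionary keys, backend labels, and function names are assumptions introduced for illustration.

```python
import operator

# Assumed routing from clause category to storage backend label.
BACKENDS = {
    "numerical": "postgresql",
    "logical": "drools",
    "topological": "neo4j",
    "fuzzy": "csv",
}

OPS = {
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt, "==": operator.eq,
}


def route(rule):
    """Pick the backend label for a DSL object from its category."""
    return BACKENDS[rule["category"]]


def to_triple(rule):
    """Express a topological rule as (entity type, edge, entity type)."""
    return (rule["source_type"], rule["relation"], rule["target_type"])


def find_violations(rule, components):
    """Capture only the components that fail the rule (negative pattern)."""
    complies = OPS[rule["operator"]]
    results = []
    for comp in components:
        if comp["type"] != rule["target_entity"]:
            continue  # the rule does not apply to this component type
        actual = comp.get(rule["attribute"])
        if actual is not None and not complies(actual, rule["value"]):
            results.append({
                "component_id": comp["id"],
                "clause_id": rule["clause_id"],  # traceability to the clause
                "actual": actual,
                "required": f"{rule['operator']} {rule['value']}",
            })
    return results
```

Keeping the clause identifier in each result object is what allows a reported violation to be traced back to the originating clause.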

8. The method of claim 1, wherein in step S5, the dependencies and potential conflicts between clauses are defined, and a graph traversal algorithm is used to perform multi-dimensional impact analysis of design changes and rule association reasoning.

9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein when the processor executes the computer program, it implements the steps of the method according to any one of claims 1-8.
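As a non-limiting illustration of the graph traversal recited for step S5 in claim 8, the following Python sketch performs a breadth-first search over clause dependencies to list everything affected when one node changes. The edge representation, the function name `impacted_clauses`, and the identifiers used in the usage note are assumptions introduced for illustration; an in-memory dictionary stands in for the Neo4j graph database, and conflict detection is not covered.

```python
from collections import deque


def impacted_clauses(edges, changed):
    """List nodes that directly or transitively depend on `changed`.

    `edges` is an iterable of (node, depends_on) pairs. The result is in
    breadth-first order and excludes `changed` itself.
    """
    # Invert the edges: for each node, which nodes depend on it?
    dependents = {}
    for node, depends_on in edges:
        dependents.setdefault(depends_on, []).append(node)

    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:  # the seen set also guards against cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For instance, with hypothetical edges `("C2", "C1")`, `("C3", "C2")`, and `("C4", "C1")`, a change to `C1` reaches `C2` and `C4` directly and `C3` through `C2`.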

10. A computer-readable storage medium storing computer instructions, characterized in that when the computer instructions are executed by a processor, they implement the steps of the method according to any one of claims 1-8.