Industrial exploded view knowledge graph construction method, apparatus, system and storage medium

By employing multimodal alignment and fusion, cascaded expert chains, and triple consistency checks, the problem of extracting assembly relationships from industrial exploded diagrams has been solved, enabling efficient and accurate structured knowledge construction and enhancing intelligent support for assembly planning and management.

CN122242684A · Pending · Publication Date: 2026-06-19 · BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently and accurately extract structured assembly knowledge from industrial exploded diagrams, especially in scenarios with complex layouts, dense leader lines, component occlusion, and small components. Component identification and assembly relationships are prone to mismatch, and omissions and conflicts are difficult to detect and correct. The lack of cross-modal alignment and global topological constraints leads to uncertainty in the directionality, hierarchy depth, and order of relationships.

Method used

By employing a multimodal alignment and fusion mechanism, combined with a scene-aware cascaded expert chain and triple consistency verification, a knowledge graph for industrial exploded diagrams is constructed. Specific steps include: parsing image and text data, standardizing scale and coordinates, generating unified feature representations, performing component detection and relation candidate generation, executing visual, topological, and assembly rule consistency verification, dynamically segmenting triples and resolving conflicts, and forming auditable structured results.

Benefits of technology

It improves the efficiency and accuracy of assembly knowledge extraction and modeling, enhances adaptability to complex scenarios, ensures the rationality of the directionality, hierarchical depth and sequence of relationships, achieves auditability and traceability of results, and reduces the cost of manual annotation and review.

✦ Generated by Eureka AI based on patent content.

Figure CN122242684A_ABST

Abstract

This invention discloses a method, apparatus, system, and storage medium for constructing an industrial exploded view knowledge graph. It utilizes collaborative image and text processing, cross-modal alignment and fusion, cascaded expert chain scene graph construction, and triple consistency verification to achieve efficient and auditable extraction of assembly knowledge. By employing the technical solution of this invention, the efficiency and accuracy of knowledge extraction and modeling are improved through effective parsing of multimodal inputs and consistency-driven optimization.

Description

Technical Field

[0001] This invention belongs to the field of information processing technology and specifically relates to a method, apparatus, system, and storage medium for constructing an industrial exploded view knowledge graph.

Background Technology

[0002] Exploded diagrams in industry serve as a crucial medium for expressing assembly relationships and communicating processes, and are widely used in design, manufacturing, and operation and maintenance. As digital transformation advances, efficiently and accurately extracting structured assembly knowledge from large numbers of heterogeneous exploded diagrams and their accompanying text / lists has become a key requirement for improving design reuse, assembly planning, and process management. Existing methods often rely on single-modal processing or rule-based scripts, which struggle to handle complex layouts, dense leader lines, component occlusion, and small components robustly. This leads to mismatches between identified components and assembly relationships, and makes omissions and conflicts difficult to detect and correct promptly.

[0003] Traditional single-modal workflows based on OCR or object detection typically process images and text separately, lacking cross-modal alignment and unified representation of "components-leaders-text entities." This makes them susceptible to factors such as image quality, line type interference, layout differences, and inconsistent naming conventions, leading to problems like unstable anchoring, entity ambiguity, and uncertain relationship directions. Relationship extraction relying on fixed rules suffers from high adaptation costs and poor transferability when dealing with multi-source document layouts, and is insufficient in depicting complex hierarchies and sequences.

[0004] On the other hand, existing relationship inference is mostly based on local evidence and lacks global topological constraints and assembly rule verification. This makes it difficult to guarantee the directionality, hierarchical depth, and order of relationships such as assembly (parent-child), subordination (primary-secondary), and adjacency (adjacent / contact), which can easily lead to loops, multiple edges, and semantic conflicts. The extracted results generally lack evidence chains and auditable indexes, so triples cannot be traceably associated with the corresponding image segments, connection trajectories, and rule entries, which hinders subsequent verification, review, and compliance documentation.

Summary of the Invention

[0005] To address the problems existing in the prior art, this invention provides a method, apparatus, system, and storage medium for constructing an industrial exploded view knowledge graph.

[0006] To achieve the above objectives, the present invention provides the following solution. A method for constructing an industrial exploded view knowledge graph comprises:
Step S1: parse the exploded image and the accompanying text data to obtain visual structural features and semantic descriptions;
Step S2: standardize the visual structural features and semantic descriptive features in scale, coordinates and units, and perform cross-modal alignment based on spatial proximity, connection topology and text similarity to generate a unified feature representation;
Step S3: based on the unified feature representation, use a scene-aware cascaded expert chain to sequentially complete component detection, relation candidate generation and local refinement, and construct an initial scene graph containing assembly relationships;
Step S4: perform a triple consistency check on each relationship / local structure of the initial scene graph and merge the results into a comprehensive relationship confidence score, the triple consistency check comprising a visual evidence consistency check, a structural topology consistency check and an assembly rule consistency check;
Step S5: perform dynamic triplet segmentation in the scene graph based on the comprehensive relationship confidence, perform weighted scoring and filtering of candidate paths, remove duplicate or contradictory relationships through constraint-driven conflict resolution, and output assembly triplets represented as "subject-verb-object"; associate each triplet with evidence indexes of image segments, line trajectories and rule entries to form an auditable structured result.

[0007] The present invention also provides an apparatus for constructing an industrial exploded view knowledge graph, comprising:
a first processing module, configured to parse the exploded image and the accompanying text data to obtain visual structural features and semantic descriptions;
a second processing module, configured to standardize the scale, coordinates and units of the visual structural features and semantic descriptive features, and to perform cross-modal alignment based on spatial proximity, connection topology and text similarity to generate a unified feature representation;
a third processing module, configured to construct an initial scene graph containing assembly relationships using a scene-aware cascaded expert chain based on the unified feature representation;
a fourth processing module, configured to perform a triple consistency check on each relationship / local structure in the initial scene graph and merge the results into a comprehensive relationship confidence score, the triple consistency check comprising a visual evidence consistency check, a structural topology consistency check and an assembly rule consistency check;
a fifth processing module, configured to perform dynamic triplet cutting in the scene graph based on the comprehensive relationship confidence, to perform weighted scoring and filtering of candidate paths, to remove duplicate or contradictory relationships through constraint-driven conflict resolution and output assembly triplets represented as "subject-verb-object", and to associate each triplet with evidence indexes of image segments, line trajectories and rule entries to form an auditable structured result.

[0008] The present invention also provides an industrial exploded diagram knowledge graph construction system, comprising: a memory and a processor, wherein the memory stores a computer program executed by the processor, and the computer program executes an industrial exploded diagram knowledge graph construction method when executed by the processor.

[0009] The present invention also provides a storage medium storing a computer program, which executes an industrial exploded diagram knowledge graph construction method when running.

[0010] Compared with the prior art, the beneficial effects of the present invention are as follows. By combining multimodal alignment and fusion, scene-aware cascaded expert chains, and triple consistency verification, this invention constructs knowledge graphs for industrial exploded diagrams. It effectively integrates image and accompanying text / list inputs, establishing stable "components-leaders-text entities" associations on the input side through cross-modal anchoring and unified feature representation, which lays the foundation for subsequent scene graph construction. Scene graph generation via cascaded expert chains improves adaptability to complex layouts, dense connections, component occlusion, and small components, enhancing the completeness and coherence of the modeled assembly / subordination / adjacency relationships. In the verification stage, a triple verification of visual evidence consistency, structural topology consistency, and assembly rule consistency is introduced together with confidence fusion, ensuring the rationality of the directionality, hierarchical depth, and sequence of relationships and suppressing the propagation of false positives. In the generation stage, dynamic triplet segmentation and conflict resolution are used to output assembly triples represented as "subject-verb-object", which are associated with evidence indexes of image fragments, connection trajectories, and rule entries so that the results are auditable and traceable. The rule set can be configured and maintained conditionally by component category, facilitating dynamic extension and updates as the business evolves. In summary, this invention improves the efficiency and accuracy of assembly knowledge extraction and modeling, reduces the cost of manual annotation and verification, and provides reliable intelligent support for assembly planning, design retrieval, and process management.

Attached Figure Description

[0011] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 is a flowchart of the industrial exploded view knowledge graph construction method according to an embodiment of the present invention; Figure 2 shows an example industrial exploded view and a schematic diagram of the "scene graph - verification - triple set" generation process according to an embodiment of the present invention.

Detailed Implementation

[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0014] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0015] Example 1
As shown in Figures 1 and 2, this invention provides a method for constructing an industrial exploded view knowledge graph. Combining image and text inputs, it uses multimodal alignment and fusion, cascaded-expert-chain scene graph construction, and joint visual-structural-rule verification to generate assembly triples efficiently and export them auditably. The specific steps are as follows.
Step S100: Multimodal input processing. The exploded image and the accompanying text / list data are processed to obtain visual structural features and semantic descriptions at the same time. On the image side, structural cues such as component outlines and leader lines are extracted; on the text side, component names, numbers, and assembly identifiers are parsed. After normalization, both inputs remain consistent throughout the subsequent multimodal fusion. This step comprises steps S101–S103:
Step S101: Image input. Features are extracted from the input exploded image, mainly to obtain geometric and connection information for subsequent fusion and scene graph generation, specifically:
Contour features: robustly extract the outer contour and boundary of each part through denoising, contrast enhancement and morphological operations to describe its geometry;
Connection features: trace the leader lines and retain information such as endpoints, inflection points and directions for subsequent component-text anchoring and relationship hints;
Fragment indexing: generate indexable image fragments by linking valid component regions with line trajectories; these serve as the basis for subsequent evidence-chain association.
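The connection-feature item of S101 can be illustrated with a minimal sketch that assumes an upstream tracer has already turned a leader line into an ordered polyline of pixel points. The function name `leader_line_features` and the 30° turn threshold are hypothetical choices, not values from the source. The sketch returns the endpoints, the inflection points where the line turns sharply, and the unit direction of the final segment.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def leader_line_features(polyline: List[Point], turn_deg: float = 30.0) -> Dict[str, object]:
    """Endpoints, inflection points and terminal direction of a traced leader line (sketch)."""
    if len(polyline) < 2:
        raise ValueError("a leader line needs at least two points")
    inflections: List[Point] = []
    for prev, cur, nxt in zip(polyline, polyline[1:], polyline[2:]):
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        # wrap the heading change into [-pi, pi) before taking its magnitude
        turn = abs(math.degrees((a2 - a1 + math.pi) % (2 * math.pi) - math.pi))
        if turn >= turn_deg:
            inflections.append(cur)
    (x0, y0), (x1, y1) = polyline[-2], polyline[-1]
    length = math.hypot(x1 - x0, y1 - y0) or 1.0  # avoid division by zero for repeated points
    return {
        "start": polyline[0],
        "end": polyline[-1],
        "inflections": inflections,
        "direction": ((x1 - x0) / length, (y1 - y0) / length),  # unit vector of the last segment
    }


if __name__ == "__main__":
    print(leader_line_features([(0, 0), (10, 0), (10, 10)]))
```

The terminal direction is what later steps would use as the "leader line direction" cue; the start and end points feed component-text anchoring.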

[0016] Step S102: Text Input; Semantic extraction is performed on the accompanying text / list, with the main goal of obtaining entities and identifiers related to the assembly relationship, specifically including: Layout and Recognition: Perform OCR and layout analysis to locate text blocks and numbered blocks; Standardization: Standardize the formats of synonyms, abbreviations, and numbering to establish a consistent field representation; Entity extraction: Extract key elements such as component names, numbers, and assembly identifiers to form a structured text entity set.
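The following is a minimal sketch of the standardization and entity-extraction items of S102. It assumes OCR has already produced one text line per list entry. The synonym table, the numbering pattern and the name `extract_entities` are hypothetical; a real deployment would load a maintained lexicon.

```python
import re
from typing import Dict, List

# Hypothetical synonym / abbreviation table (lower-case keys).
SYNONYMS: Dict[str, str] = {
    "brg housing": "bearing housing",
    "bearing seat": "bearing housing",
    "planet gear": "planetary gear",
    "planet wheel": "planetary gear",
}

# Optional "No." prefix, an item number, an optional separator, then a name starting with a letter.
ITEM = re.compile(r"^\s*(?:no\.?\s*)?(\d+)\s*[.):\-]?\s*([a-z].*?)\s*$", re.IGNORECASE)


def normalize_name(name: str) -> str:
    """Lower-case, collapse whitespace and map synonyms / abbreviations to a canonical name."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return SYNONYMS.get(key, key)


def extract_entities(lines: List[str]) -> List[Dict[str, object]]:
    """Turn OCR'd parts-list lines into {number, name} entities; lines without a number are skipped."""
    entities: List[Dict[str, object]] = []
    for line in lines:
        match = ITEM.match(line)
        if match:
            entities.append({"number": int(match.group(1)), "name": normalize_name(match.group(2))})
    return entities


if __name__ == "__main__":
    print(extract_entities(["No. 3  Planet Wheel", "12) Brg  Housing", "notes: see drawing"]))
```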

[0017] Step S103: Initial Image-Text Correspondence; This involves initially anchoring the image and text, specifically including: Candidate matching: Based on the leader line endpoint, layout, and spatial proximity, generate candidate correspondences for "component candidate - text entity - leader line endpoint"; Consistency screening: Eliminate obviously conflicting matches and retain the Top-k candidates; Output: Anchor pairs and their initial matching confidence scores are formed, providing input for subsequent fusion.
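The candidate matching and Top-k screening of S103 might look like the following sketch. It makes four assumptions: each traced leader line is given as a pair (text-side endpoint, part-side endpoint); component candidates are axis-aligned pixel boxes; confidence decays exponentially with distance; and the names `anchor_candidates`, `scale` and `min_conf` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def point_box_distance(p: Point, b: Box) -> float:
    """Euclidean distance from a point to an axis-aligned box (0 if the point is inside)."""
    dx = max(b[0] - p[0], 0.0, p[0] - b[2])
    dy = max(b[1] - p[1], 0.0, p[1] - b[3])
    return math.hypot(dx, dy)


def anchor_candidates(
    texts: Dict[str, Point],                # text entity id -> label centre
    lines: Dict[str, Tuple[Point, Point]],  # leader line id -> (text-side end, part-side end)
    boxes: Dict[str, Box],                  # component candidate id -> box
    k: int = 2,
    scale: float = 20.0,
    min_conf: float = 0.05,
) -> List[Tuple[str, str, str, float]]:
    """Return (component, text entity, leader line, confidence) anchors, Top-k per label."""
    anchors = []
    for ent, centre in texts.items():
        # leader line whose text-side endpoint lies closest to this label
        line_id, (t_end, p_end) = min(lines.items(), key=lambda kv: math.dist(kv[1][0], centre))
        label_conf = math.exp(-math.dist(t_end, centre) / scale)
        ranked = sorted(boxes.items(), key=lambda kv: point_box_distance(p_end, kv[1]))[:k]
        for comp, box in ranked:
            conf = label_conf * math.exp(-point_box_distance(p_end, box) / scale)
            if conf >= min_conf:  # screening: drop obviously implausible matches
                anchors.append((comp, ent, line_id, round(conf, 4)))
    return anchors
```

Multiplying the label-side and part-side terms means a candidate needs both ends of the leader line to fit; a far-away box is screened out even if it ranks inside the Top-k.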

[0018] Step S200: Multimodal fusion. Image features and text features are fused through cross-modal alignment and unified representation to generate a unified feature representation that captures the intent of the assembly scene. This specifically includes the following steps S201–S203: Step S201: Feature adaptation; standardize scale, coordinates and units to establish comparable confidence metrics; Step S202: Cross-modal alignment; based on spatial proximity, connection topology and text similarity, candidate anchor pairs are screened and conflicts are resolved; Step S203: Fusion representation generation; the geometric and semantic contexts are fused in the unified processing layer to output a unified feature representation of "components-leader lines-text entities" as input for scene graph construction.
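The sketch below illustrates S201–S202. It normalizes pixel coordinates by page size and scores a component-text pair by a weighted mix of spatial proximity, leader-line connectivity and string similarity, using Python's `difflib.SequenceMatcher` for the last term. Conflicts are then resolved greedily, so each component and each text entity is used at most once. The weights, the assumption that the detector emits a class label to compare against the text, and the greedy resolution are all hypothetical choices.

```python
import math
from difflib import SequenceMatcher
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_point(p: Point, width: float, height: float) -> Point:
    """Map pixel coordinates into [0, 1] x [0, 1] so different sheets are comparable."""
    return (p[0] / width, p[1] / height)


def alignment_score(part_centre: Point, text_centre: Point, linked: bool,
                    part_label: str, text_name: str,
                    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted mix of spatial proximity, connection topology and text similarity (normalized coords)."""
    w_space, w_topo, w_text = weights
    spatial = max(0.0, 1.0 - math.dist(part_centre, text_centre) / math.sqrt(2))  # sqrt(2) = max distance
    topo = 1.0 if linked else 0.0  # 1 if a leader line joins the pair
    text = SequenceMatcher(None, part_label.lower(), text_name.lower()).ratio()
    return w_space * spatial + w_topo * topo + w_text * text


def resolve(pairs: List[Tuple[float, str, str]]) -> List[Tuple[str, str, float]]:
    """Greedy one-to-one assignment over (score, part id, text id) candidates, best first."""
    used_parts, used_texts, kept = set(), set(), []
    for score, part, text in sorted(pairs, reverse=True):
        if part not in used_parts and text not in used_texts:
            kept.append((part, text, score))
            used_parts.add(part)
            used_texts.add(text)
    return kept
```

A greedy pass is simple and deterministic. An optimal assignment (e.g. Hungarian matching) would be a drop-in alternative if global consistency matters more than speed.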

[0019] Step S300: Scene graph generation; After multimodal fusion is completed, an initial scene graph containing assembly relationships is constructed based on a unified feature representation, specifically including the following steps S301–S303: Step S301: Node and basic feature generation; Generate a set of component nodes based on the component's outer contour and anchoring results, and establish a source index (image fragment ID, text entity ID, connection trajectory).

[0020] Step S302: Generate candidate relationships; combine relative orientation and line direction to generate three types of candidate relationships: assembly (parent-child), subordination (primary-secondary), and adjacency (adjacent / contact), and record the relationship direction, hierarchy depth, and assembly order identifier.
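The sketch below covers the candidate record of S302, together with the self-loop and duplicate removal described for S303. The field names are hypothetical. Treating adjacency as symmetric and keeping the highest-confidence copy of a duplicate are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

RELATION_TYPES = {"assembly", "subordination", "adjacency"}


@dataclass(frozen=True)
class CandidateRelation:
    subject: str
    relation: str      # "assembly" (parent-child), "subordination" (primary-secondary) or "adjacency"
    obj: str
    depth: int         # hierarchy depth
    order: int         # assembly order identifier
    confidence: float


def refine(cands: List[CandidateRelation]) -> List[CandidateRelation]:
    """Drop self-loops and unknown types; keep the most confident copy of each duplicate."""
    best: Dict[Tuple[str, str, str], CandidateRelation] = {}
    for c in cands:
        if c.subject == c.obj or c.relation not in RELATION_TYPES:
            continue
        if c.relation == "adjacency":  # assumed symmetric: (a, adj, b) duplicates (b, adj, a)
            key = (min(c.subject, c.obj), c.relation, max(c.subject, c.obj))
        else:  # directed relations keep their orientation
            key = (c.subject, c.relation, c.obj)
        if key not in best or c.confidence > best[key].confidence:
            best[key] = c
    return list(best.values())
```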

[0021] Step S303: Local refinement. Perform boundary correction and redundant-candidate removal for overlapping, occluded, and small components, eliminating self-loops, duplicates, and obvious boundary violations. For clarity, the cascade can be formalized level by level:
$$G^{(i)} = \mathcal{F}_i\big(I,\, G^{(i-1)}\big), \qquad G^{(i)} = \big(V^{(i)}, E^{(i)}\big)$$
where $I$ is the input exploded view; $\mathcal{F}_i$ is the level-$i$ operator that constructs the scene graph from the image; $G^{(i)}$ is the scene graph output at level $i$; $V^{(i)}$ is its node set; and $E^{(i)}$ is its edge set.

[0022] The relation confidence at each level is updated as
$$R^{(i)} = \sigma\Big(a_i\, \Phi_i\big(I, G^{(i-1)}\big) + b_i\, R^{(i-1)}\Big), \qquad a_i, b_i \ge 0, \quad a_i + b_i = 1$$
where $R^{(i)}$ is the level-$i$ relation confidence tensor / matrix; $\Phi_i$ is the level-$i$ evidence extraction and aggregation function, whose output is the new evidence at this level; $R^{(i-1)}$ is the relation state output by the previous level; $a_i$ and $b_i$ are the weights for fusing the two routes at this level; and $\sigma$ is a nonlinear mapping that keeps the output within [0, 1]. The two routes are weighted, summed and passed through $\sigma$ to obtain the final result of the level.
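The level-wise update above can be checked numerically with a small sketch. The sketch assumes three things: relation confidences are stored per (subject, relation, object) edge; an edge missing at one level contributes 0; and σ is a logistic centred at 0.5 with steepness `k`. The source only requires σ to keep outputs in [0, 1], so this choice of σ, like the function names, is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation type, object)


def squash(x: float, k: float = 10.0) -> float:
    """Hypothetical sigma: a logistic centred at 0.5 that keeps scores in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * (x - 0.5)))


def cascade_level(prev: Dict[Edge, float], evidence: Dict[Edge, float], a: float) -> Dict[Edge, float]:
    """One level: R_i = sigma(a * new_evidence + (1 - a) * R_{i-1}); missing edges count as 0."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("fusion weight a must lie in [0, 1]")
    b = 1.0 - a  # the two route weights sum to 1
    edges = set(prev) | set(evidence)
    return {e: squash(a * evidence.get(e, 0.0) + b * prev.get(e, 0.0)) for e in edges}


def run_cascade(levels: List[Tuple[Dict[Edge, float], float]]) -> Dict[Edge, float]:
    """Apply the levels in order; each entry is (new evidence at that level, weight a_i)."""
    state: Dict[Edge, float] = {}
    for evidence, a in levels:
        state = cascade_level(state, evidence, a)
    return state


if __name__ == "__main__":
    mesh = ("planetary gear", "external meshing", "ring gear")
    bad = ("needle roller", "contact", "gearbox housing")
    print(run_cascade([({mesh: 0.8, bad: 0.6}, 1.0), ({mesh: 0.9, bad: 0.2}, 0.6)]))
```

In the usage example, the meshing relation is reinforced at the second level. The needle-roller contact receives weak new evidence at that level and drops below 0.5.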

[0023] Step S400: Consistency verification. The verification process is illustrated using a two-stage reducer as an example. As shown in Figure 2, the diagram contains both reasonable candidates (such as the external meshing of the planetary gears with the ring gear, and the support / fit of the shaft in the bearing housing) and several incorrect connections (such as planetary gears wrongly connected as "support", needle rollers and the gearbox housing misjudged as "contact", and the crankshaft and needle rollers misjudged as "connection"). Each candidate relationship is then checked in three categories and the evidence is recorded, as detailed in steps S401–S404.

[0024] Step S401: Visual evidence consistency verification. The exploded view contains the sun gear, planetary gears (several), planet carrier, internal gear rings (first stage / second stage), needle rollers (groups), bearing housing, shaft, key, washers / spacers, housing, etc. First, the nodes of these components are identified in the exploded view. Candidate relationships are then generated from spatial proximity, leader-line direction and connection direction, and a visual consistency score is output. In Figure 2, the tooth profiles of the planetary gears are aligned and continuous with the ring gear, so the pair is visually consistent. The sun gear and the housing are inconsistently oriented, so visual support is weaker. The needle rollers have no direct contact with the gearbox housing, so that pair is visually inconsistent and receives a lower score. The visual consistency score is defined as
$$S_{\mathrm{vis}}(e_a, e_b, r) = \phi_{\mathrm{vis}}\big(I, B_a, B_b, r\big) \in [0, 1]$$
where $S_{\mathrm{vis}}$ is the credibility, under visual evidence, that entities $e_a$ and $e_b$ in the exploded view (discrete objects such as parts / sub-components / fasteners) stand in relation $r$; $B_a$ and $B_b$ are the bounding-box regions of $e_a$ and $e_b$ in image $I$, in pixel coordinates; $r$ is the relation type; and $\phi_{\mathrm{vis}}$ is the scoring function. $\phi_{\mathrm{vis}}$ assesses the image-level credibility of relation $r$ between $e_a$ and $e_b$ from the combined features of relative orientation, leader-line direction, boundary contact and occlusion.
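One possible shape for φ_vis is sketched below, assuming bounding boxes in pixel coordinates and a known unit vector for the explosion or leader-line axis. The score has three parts. Alignment is the absolute cosine between the centre-to-centre vector and that axis. Contact decays with the gap between the boxes. Heavy overlap is treated as an occlusion penalty. The weights and `scale` are hypothetical and, in practice, could be selected per relation type r.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def centre(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def gap(a: Box, b: Box) -> float:
    """Distance between box boundaries; 0 when the boxes touch or overlap."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return math.hypot(dx, dy)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area over the smaller box area, used as a crude occlusion cue."""
    w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    return w * h / (min(a.area(), b.area()) or 1.0)


def visual_score(a: Box, b: Box, axis: Tuple[float, float], scale: float = 25.0,
                 weights: Tuple[float, float, float] = (0.5, 0.4, 0.1)) -> float:
    """Hypothetical phi_vis: orientation alignment + boundary contact - occlusion, clipped to [0, 1]."""
    (ax, ay), (bx, by) = a.centre(), b.centre()
    d = math.hypot(bx - ax, by - ay) or 1.0
    alignment = abs(((bx - ax) * axis[0] + (by - ay) * axis[1]) / d)  # |cos| w.r.t. the axis
    contact = math.exp(-gap(a, b) / scale)
    occlusion = overlap_ratio(a, b)
    w_align, w_contact, w_occ = weights
    return min(1.0, max(0.0, w_align * alignment + w_contact * contact - w_occ * occlusion))
```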

[0025] Step S402: Structural topology consistency check. Combining the component hierarchy and assembly sequence, check whether each relation conforms to structural common sense. Examples include "planetary gears belong to the planet carrier assembly", "the gear ring and planetary gears are joined by meshing edges" and "needle rollers should be grouped with the corresponding retainer or journal"; a direct connection between the needle rollers and the crankshaft is therefore judged inconsistent. An adjacency matrix is constructed on the initial scene graph $G = (V, E)$, where $V$ is the set of component nodes and $E$ is the set of candidate assembly relation edges. Let the original adjacency matrix be $A$. To retain each node's own information, self-loops are added, giving $\tilde{A} = A + I_N$, where $I_N$ is the $N \times N$ identity matrix, with degree matrix $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. With node features $X$ and $H^{(0)} = X$, node embeddings are updated by a lightweight graph convolution:
$$H^{(l+1)} = \sigma\Big(\tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)}\, W^{(l)}\Big)$$
where $l$ is the layer index; $\tilde{D}$ is the degree matrix of $\tilde{A}$; $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is the symmetrically normalized adjacency, which stabilizes propagation; $H^{(l)}$ is the matrix of layer-$l$ node representations; $W^{(l)}$ is the learnable weight matrix of that layer; and $\sigma$ is a nonlinear function.

[0026] The components in the exploded view are embedded as entity nodes, and for each candidate relation $r$ between nodes $v_a$ and $v_b$ the structural consistency score is estimated as
$$S_{\mathrm{str}}(v_a, v_b, r) = \psi_{\mathrm{str}}\big(h_a^{(L)}, h_b^{(L)}, f_{ab}\big) \in [0, 1]$$
where a node specifically means an entity node (component / sub-assembly / fastener, etc.); $h_a^{(L)}$ and $h_b^{(L)}$ are the final representations of $v_a$ and $v_b$; $f_{ab}$ is the edge / relation feature; $\psi_{\mathrm{str}}$ is the structural consistency evaluation function; and $S_{\mathrm{str}}$ is the structural consistency score, normalized to [0, 1].
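The graph convolution and a simple ψ_str can be sketched in plain Python. The update follows the symmetric normalization given for S402, with ReLU as σ. Here ψ_str is assumed to be a logistic over the concatenated node embeddings and edge features, which is one hypothetical choice among many; the toy features and weights in the usage block are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D~^(-1/2) (A + I) D~^(-1/2), with self-loops so every degree is at least 1."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """H(l+1) = ReLU(A_hat H(l) W(l))."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def structural_score(h_a: List[float], h_b: List[float], edge_feat: List[float],
                     weights: List[float], bias: float = 0.0) -> float:
    """Hypothetical psi_str: logistic over [h_a; h_b; f_ab], so the score lies in (0, 1)."""
    x = h_a + h_b + edge_feat
    if len(weights) != len(x):
        raise ValueError("weights must match the concatenated feature length")
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # e.g. key - shaft - bearing housing
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node features
    w = [[0.5, -0.2], [0.3, 0.8]]             # toy layer weights
    h = gcn_layer(normalized_adjacency(adj), x, w)
    print(structural_score(h[0], h[1], [1.0], [0.2, 0.1, 0.2, 0.1, 0.5]))
```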

[0027] Step S403: Assembly rule consistency check. The assembly rule library is called to verify the legality of each relationship and its threshold conditions. For example, a key connection must occur between a shaft and a hub; external meshing is only allowed between planetary gears and internal gear rings / gear rings; and support / fit applies preferentially between a shaft and a bearing housing, not between needle rollers and the housing. Any non-compliance is marked as a conflict. The rule score is defined as
$$S_{\mathrm{rule}}(e_a, e_b, r) = \mathbb{1}\big[(e_a, r, e_b) \text{ is permitted by } \mathcal{R}_{\mathrm{gen}} \cup \mathcal{R}_{\mathrm{dom}}\big]$$
where $e_a$ and $e_b$ are the two component entities; $\mathcal{R}_{\mathrm{gen}}$ is the general assembly rule set; $\mathcal{R}_{\mathrm{dom}}$ is the domain assembly rule set; $\mathbb{1}[\cdot]$ is the indicator function; and $S_{\mathrm{rule}}$ is the rule consistency score (hit = 1, miss = 0).
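A minimal sketch of the indicator in S403, with rule tables assembled from the examples in the text. The table layout (relation type mapped to permitted (subject category, object category) pairs) and the names are assumptions. Direction is respected, so a pair is permitted only in the order listed.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical rule tables: relation type -> permitted (subject category, object category) pairs.
GENERAL_RULES: Dict[str, Set[Pair]] = {
    "key connection": {("shaft", "hub"), ("shaft", "key")},
}
DOMAIN_RULES: Dict[str, Set[Pair]] = {
    "external meshing": {("planetary gear", "internal gear ring"), ("planetary gear", "ring gear")},
    "support/fit": {("shaft", "bearing housing")},
}


def rule_score(subject_cat: str, relation: str, object_cat: str) -> int:
    """Indicator: 1 if a general or domain rule permits the relation, else 0 (a conflict)."""
    pair = (subject_cat, object_cat)
    for rules in (GENERAL_RULES, DOMAIN_RULES):
        if pair in rules.get(relation, set()):
            return 1
    return 0
```

Keeping the rules as plain per-category tables matches the idea of a conditionally configurable rule set that can be extended without code changes.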

[0028] Step S404: Confidence fusion. The three scores are combined by weight into a comprehensive confidence score; only high-confidence relationships are retained, and conflict resolution is then performed. The final output includes triples such as "(spindle, mounted on, bearing housing)" and "(gear, fitted on, spindle)", each associated with its evidence index. The fusion is
$$C(e_a, e_b, r) = w_{\mathrm{vis}}\, S_{\mathrm{vis}} + w_{\mathrm{str}}\, S_{\mathrm{str}} + w_{\mathrm{rule}}\, S_{\mathrm{rule}}, \qquad w_{\mathrm{vis}}, w_{\mathrm{str}}, w_{\mathrm{rule}} \ge 0, \quad w_{\mathrm{vis}} + w_{\mathrm{str}} + w_{\mathrm{rule}} = 1$$
where $S_{\mathrm{vis}}$, $S_{\mathrm{str}}$ and $S_{\mathrm{rule}}$ are the visual, structural and rule consistency scores; $w_{\mathrm{vis}}$, $w_{\mathrm{str}}$ and $w_{\mathrm{rule}}$ are the corresponding non-negative real weights; and $C$ is the comprehensive relationship confidence used for subsequent path selection and conflict resolution.
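The fusion and filtering of S404 reduce to a few lines. The default weights 0.4 / 0.3 / 0.3 and the threshold `tau` are hypothetical. The check enforces the non-negativity and sum-to-one constraints on the weights.

```python
import math
from typing import Dict, Tuple

Triple = Tuple[str, str, str]


def fuse(s_vis: float, s_str: float, s_rule: float,
         w_vis: float = 0.4, w_str: float = 0.3, w_rule: float = 0.3) -> float:
    """C = w_vis*S_vis + w_str*S_str + w_rule*S_rule with non-negative weights summing to 1."""
    weights = (w_vis, w_str, w_rule)
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return w_vis * s_vis + w_str * s_str + w_rule * s_rule


def keep_confident(scores: Dict[Triple, Tuple[float, float, float]], tau: float = 0.6) -> Dict[Triple, float]:
    """Fuse the three check scores per relation and keep those with C >= tau."""
    fused = {rel: fuse(*s) for rel, s in scores.items()}
    return {rel: c for rel, c in fused.items() if c >= tau}
```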

[0029] For engineering implementation, the rule set is preferably maintained as a categorized list and linked with the evidence index, as shown in Table 1.

[0030] Step S500: Triple generation. Based on the combined results of the three checks, conflicting relationships are deleted or rewritten and synonymous or duplicate relationships are merged. Triples are then generated from the comprehensive relationship confidence, and conflict resolution and evidence indexing are completed. This comprises steps S501–S503:
Step S501: Dynamic triplet cutting. A set of finite-length candidate paths is enumerated in the scene graph:
$$\mathcal{P} = \{P_1, P_2, \ldots, P_M\}, \qquad P_k = \big(v_1 \xrightarrow{r_1} v_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{n-1}} v_n\big)$$
where $\mathcal{P}$ is the set of candidate paths; $M$ is the number of candidate paths, whose upper bound is set by the search and pruning strategy; $P_k$ is the $k$-th candidate path; $v_i$ ($i = 1, 2, \ldots, n$) is the $i$-th node of the path, corresponding to a component / entity in the scene graph; $r_i$ is the relation type between nodes $v_i$ and $v_{i+1}$; and $n$ is the number of nodes in the path.

[0031] A comprehensive score is computed for each path:
$$S(P) = \alpha\, Q_{\mathrm{loc}}(P) + \beta\, Q_{\mathrm{ctx}}(P) + \gamma\, Q_{\mathrm{ord}}(P) - \lambda\, \Omega(P)$$
where $S(P)$ is the overall score of path $P$; $Q_{\mathrm{loc}}(P)$ is the local consistency term, aggregating the local confidence scores of the edges on the path; $Q_{\mathrm{ctx}}(P)$ is the context consistency term; and $Q_{\mathrm{ord}}(P)$ is the assembly sequence term, which accounts for deviations from the expected assembly order. $\alpha$, $\beta$ and $\gamma$ are the weights of these three terms, all non-negative with $\alpha + \beta + \gamma = 1$. $\Omega(P)$ is the path-length penalty, normalized to [0, 1] and introduced to suppress the propagation of false detections, and $\lambda \in [0, 1]$ is the length penalty coefficient. Finally, only high-confidence paths are retained and cut into triples.
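S501 can be sketched in three stages: a depth-first enumeration of loop-free paths with weak edges pruned, the scoring formula above, and cutting of the surviving paths into triples. The local term is taken as the mean edge confidence. The context and assembly-order terms default to hypothetical placeholders (weakest-edge confidence and a constant 1.0) and can be replaced by caller-supplied functions. Ω is taken as path length divided by the maximum length. All default weights are illustrative.

```python
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str, float]                # (v_i, r_i, v_{i+1}, edge confidence)
Graph = Dict[str, List[Tuple[str, str, float]]]   # node -> [(relation, neighbour, confidence)]
PathFn = Callable[[List[Edge]], float]


def enumerate_paths(g: Graph, max_edges: int, min_conf: float) -> List[List[Edge]]:
    """All loop-free paths of 1..max_edges edges whose edges all have confidence >= min_conf."""
    paths: List[List[Edge]] = []

    def dfs(node: str, path: List[Edge], visited: Set[str]) -> None:
        if len(path) == max_edges:
            return
        for rel, nxt, conf in g.get(node, []):
            if conf < min_conf or nxt in visited:  # prune weak edges and forbid cycles
                continue
            new_path = path + [(node, rel, nxt, conf)]
            paths.append(new_path)
            dfs(nxt, new_path, visited | {nxt})

    for start in g:
        dfs(start, [], {start})
    return paths


def path_score(p: List[Edge], alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2,
               lam: float = 0.1, max_edges: int = 3,
               context: Optional[PathFn] = None, order: Optional[PathFn] = None) -> float:
    """S(P) = alpha*Q_loc + beta*Q_ctx + gamma*Q_ord - lam*Omega."""
    if min(alpha, beta, gamma) < 0 or not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha, beta, gamma must be non-negative and sum to 1")
    confs = [e[3] for e in p]
    q_loc = sum(confs) / len(confs)                 # mean edge confidence
    q_ctx = context(p) if context else min(confs)   # placeholder: weakest edge
    q_ord = order(p) if order else 1.0              # placeholder: no order deviation known
    omega = len(p) / max_edges                      # length penalty in [0, 1]
    return alpha * q_loc + beta * q_ctx + gamma * q_ord - lam * omega


def cut_triples(paths: List[List[Edge]], tau: float, **score_kwargs) -> Set[Tuple[str, str, str]]:
    """Keep paths with S(P) >= tau and cut them into (subject, relation, object) triples."""
    triples: Set[Tuple[str, str, str]] = set()
    for p in paths:
        if path_score(p, **score_kwargs) >= tau:
            triples.update((s, r, o) for s, r, o, _ in p)
    return triples
```

With a graph in which "needle roller -connection-> crankshaft" has low confidence, that edge is pruned during enumeration and never reaches the triple set.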

[0032] Step S502: Conflict resolution. Threshold selection or constraint-driven disambiguation is applied to competing relationships pointing to the same target; duplicate or contradictory triples are removed, ensuring consistency of directionality, hierarchy and order. For example, as shown in Figure 2 for the two-stage reducer, unreasonable relations are removed, such as "support" for the planetary gears, "connection" for the needle rollers, "contact" with the gearbox housing and "connection" with the crankshaft. Relations such as the following are retained: "external meshing" between the planetary gears and ring gears (in several places), "support / fit" between the shaft and bearing housing, "key connection" between the shaft and key, and "connection / installation" between the planet carrier and related parts.
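One simple reading of the threshold-based disambiguation in S502 is sketched below. Triples under a threshold are dropped, and for each unordered component pair only the highest-confidence relation survives, which also removes reverse-direction duplicates. Allowing at most one relation per pair is an assumption of this sketch, not a rule stated in the source, and `tau` is a hypothetical default.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def resolve_conflicts(scored: Dict[Triple, float], tau: float = 0.5) -> List[Triple]:
    """Threshold, then keep the single most confident relation per unordered component pair."""
    best: Dict[FrozenSet[str], Tuple[float, Triple]] = {}
    for (s, r, o), conf in scored.items():
        if conf < tau:
            continue  # threshold selection
        key = frozenset((s, o))  # (a, ., b) and (b, ., a) compete for the same slot
        if key not in best or conf > best[key][0]:
            best[key] = (conf, (s, r, o))
    return sorted(t for _, t in best.values())
```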

[0033] Step S503: Evidence Indexing and Export; Associate each triple with its evidence (image fragments, line trajectories, and rule entry indexes) and export it in a structured form to achieve traceability and auditability.
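The evidence-indexed export of S503 could use a record like the one below. The field names and the identifier strings in the example are hypothetical; JSON is one possible structured form.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EvidencedTriple:
    subject: str
    predicate: str
    obj: str
    confidence: float
    image_fragments: List[str] = field(default_factory=list)    # image fragment IDs
    line_trajectories: List[str] = field(default_factory=list)  # leader-line trajectory IDs
    rule_entries: List[str] = field(default_factory=list)       # triggered rule entry IDs


def export_json(triples: List[EvidencedTriple]) -> str:
    """Serialize triples with their evidence index so each one can be traced back and audited."""
    return json.dumps([asdict(t) for t in triples], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    t = EvidencedTriple("shaft", "support/fit", "bearing housing", 0.88,
                        ["frag-07"], ["line-12"], ["R-support-01"])
    print(export_json([t]))
```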

[0034] Figure 2 is a schematic diagram of the processing flow for the two-stage reducer example. The upper part is the industrial exploded view, the middle part is the initial scene graph, and the lower part is the scene graph corrected after verification and fusion via S401–S404. The circled component nodes and connections correspond to the component names and candidate relationships given in the embodiment, and the arrows indicate the verification direction and path selection. The correspondence between the example and the correction process is given in Table 2.

[0035] The key points of this invention are as follows.
Multimodal input processing: One of the core aspects of this invention is the collaborative parsing of images and text on the input side. On the image side, denoising, enhancement and morphological processing are used to stably extract structural cues such as component outlines and leader lines from exploded views. On the text side, OCR and normalization are used to extract semantic elements such as component names, numbers and assembly identifiers. An initial "component candidate - text entity - leader line endpoint" correspondence is established through layout and proximity topology, so that appearance-level structural information and annotation semantics can be referenced consistently in later processing stages.

[0036] Multimodal feature alignment and fusion: Based on scale, coordinate, and unit standardization, candidate alignment and filtering of image and text elements are performed according to spatial proximity, connection topology, and text similarity. Geometric and semantic contexts are fused at a unified processing layer to generate a unified feature representation. This fusion method ensures the consistency and integrity of "component-leader-text entity" across different modalities, providing usable and traceable integrated features for subsequent determination of structural relationships.

[0037] Scene graph construction (cascaded expert chain): A scene-aware cascaded process is adopted. First, component detection and candidate merging are performed. Then, based on relative orientation and leader-line direction, candidate relationships such as assembly (parent-child), subordination (primary-secondary) and adjacency (adjacent / contact) are generated. Local refinement is then applied to occluded and small components. Finally, nodes and edges are jointly optimized using global and local context to form the initial scene graph. The relationship direction, hierarchy depth and assembly order identifier are recorded at the same time to improve structural coherence.

[0038] Consistency verification and confidence fusion: Relationships in the initial scene graph are verified for visual evidence consistency, structural topology consistency and assembly rule consistency. The first check examines local evidence such as visibility and connectivity, while the other two verify topological constraints and assembly rules (direction, order, allowed / prohibited adjacency, etc.). The three consistency scores are then fused using preset or adaptive weights into a comprehensive relationship confidence score, which guides subsequent path selection and scoring and suppresses false positives and semantic conflicts.

[0039] Triple generation and conflict resolution: Guided by the comprehensive relationship confidence, candidate paths of a preset length are enumerated in the scene graph and given weighted scores that consider node / edge confidence, contextual consistency and assembly order constraints. Dynamic triple cutting is performed, and only high-confidence paths are retained to generate assembly triples of the form ⟨subject-verb-object⟩. Conflict resolution is applied to competing relationships pointing to the same target, and duplicate or contradictory triples are deleted to ensure overall consistency of directionality and hierarchy.

[0040] Evidence Index and Auditable Export: An evidence index is created for each assembly triple, which is associated with the corresponding image fragments, line trajectories, and triggered assembly rule entries. The results are provided in a structured form for querying and exporting, enabling traceability and auditability. This facilitates rapid review and compliance documentation in applications such as assembly planning, design retrieval, and process management.

[0041] Example 2
The present invention also provides an apparatus for constructing an industrial exploded view knowledge graph, comprising:
a first processing module, configured to parse the exploded image and the accompanying text data to obtain visual structural features and semantic descriptions;
a second processing module, configured to standardize the scale, coordinates and units of the visual structural features and semantic descriptive features, and to perform cross-modal alignment based on spatial proximity, connection topology and text similarity to generate a unified feature representation;
a third processing module, configured to construct an initial scene graph containing assembly relationships using a scene-aware cascaded expert chain based on the unified feature representation;
a fourth processing module, configured to perform a triple consistency check on each relationship / local structure in the initial scene graph and merge the results into a comprehensive relationship confidence score, the triple consistency check comprising a visual evidence consistency check, a structural topology consistency check and an assembly rule consistency check;
a fifth processing module, configured to perform dynamic triplet cutting in the scene graph based on the comprehensive relationship confidence, to perform weighted scoring and filtering of candidate paths, to remove duplicate or contradictory relationships through constraint-driven conflict resolution and output assembly triplets represented as "subject-verb-object", and to associate each triplet with evidence indexes of image segments, line trajectories and rule entries to form an auditable structured result.

[0042] Example 3 The present invention also provides an industrial exploded diagram knowledge graph construction system, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, performs the industrial exploded diagram knowledge graph construction method described above.

[0043] Example 4 The present invention also provides a storage medium storing a computer program which, when run, performs the industrial exploded diagram knowledge graph construction method described above.

[0044] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. A method for constructing an industrial exploded diagram knowledge graph, characterized in that it comprises:
Step S1: parsing the exploded image and the accompanying text data to obtain visual structural features and semantic descriptions;
Step S2: standardizing the visual structural features and semantic descriptions in terms of scale, coordinates and units, and performing cross-modal alignment based on spatial proximity, connection topology and text similarity to generate a unified feature representation;
Step S3: based on the unified feature representation, using a scene-aware cascaded expert chain to sequentially perform component detection, relation candidate generation and local refinement, and constructing an initial scene graph containing assembly relationships;
Step S4: performing a triple consistency check on each relationship or local structure of the initial scene graph and fusing the results into a composite relationship confidence score, the triple consistency check comprising a visual evidence consistency check, a structural topology consistency check and an assembly rule consistency check;
Step S5: performing dynamic triplet segmentation in the scene graph based on the composite relationship confidence, performing weighted scoring and screening of candidate paths, and removing duplicate or contradictory relationships through constraint-driven conflict resolution to output assembly triplets in "subject-predicate-object" form; and associating each triplet with evidence indexes of image segments, line trajectories and rule entries to form an auditable structured result.

2. An industrial exploded diagram knowledge graph construction device, characterized in that it comprises:
a first processing module, configured to parse the exploded image and the accompanying text data to obtain visual structural features and semantic descriptions;
a second processing module, configured to standardize the scale, coordinates and units of the visual structural features and semantic descriptions, and to perform cross-modal alignment based on spatial proximity, connection topology and text similarity to generate a unified feature representation;
a third processing module, configured to construct, from the unified feature representation, an initial scene graph containing assembly relationships by using a scene-aware cascaded expert chain;
a fourth processing module, configured to perform a triple consistency check on each relationship or local structure in the initial scene graph and fuse the results into a composite relationship confidence score, the triple consistency check comprising a visual evidence consistency check, a structural topology consistency check and an assembly rule consistency check; and
a fifth processing module, configured to perform dynamic triplet segmentation in the scene graph based on the composite relationship confidence, to perform weighted scoring and filtering of candidate paths, and to remove duplicate or contradictory relationships through constraint-driven conflict resolution, thereby outputting assembly triplets in "subject-predicate-object" form, and further configured to associate each triplet with evidence indexes of image segments, line trajectories and rule entries to form an auditable structured result.

3. A system for constructing an industrial exploded diagram knowledge graph, characterized in that it comprises a memory and a processor, wherein the memory stores a computer program which, when run by the processor, performs the industrial exploded diagram knowledge graph construction method according to claim 1.

4. A storage medium, characterized in that the storage medium stores a computer program which, when run, performs the industrial exploded diagram knowledge graph construction method according to claim 1.