A multi-modal digital twin inference method and system for industrial scenarios

By constructing a multimodal twin state candidate set and a process flow model, dynamically selecting or reconstructing the inference structure, executing multi-path parallel inference and performing consistency verification, the problem of self-contradictory inference results under complex process stage changes in the existing technology is solved, and the adaptability and accuracy of inference in industrial scenarios are improved.

CN122242756A · Pending · Publication Date: 2026-06-19 · BEIJING JIAOTONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING JIAOTONG UNIV
Filing Date: 2026-03-23
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing digital twin reasoning technologies for industrial scenarios struggle to address the issue of contradictory reasoning results under complex process stage variations and multimodal data conditions.

Method used

Multimodal data from the industrial site is collected, features are extracted and mapped, a multimodal twin state candidate set is constructed, and a process stage judgment result is generated by combining the process flow model. Based on the judgment result, a reasoning structure matching the current process stage is selected or reconstructed from the reasoning structure library. Multi-path parallel reasoning is executed, and consistency verification is performed to generate the final reasoning result.

Benefits of technology

It enables the dynamic adaptation of digital twin systems to complex processes, improves the adaptability and accuracy of reasoning, generates logically consistent and more interpretable final reasoning results, and enhances reliability and decision support capabilities in industrial scenarios.


Abstract

This invention discloses a multimodal digital twin reasoning method and system for industrial scenarios, relating to the field of digital twin reasoning technology. The method includes: collecting multimodal data from the industrial site; preprocessing and extracting features; mapping the extracted features to the attribute space of the digital twin object; constructing a multimodal digital twin state candidate set; generating corresponding process stage judgment results by combining a pre-defined process flow model; selecting or reconstructing a reasoning structure matching the current process stage from a pre-built reasoning structure library based on the judgment results; merging and updating the multimodal digital twin state candidate set to generate a digital twin state consistent with the current process stage; and triggering the corresponding reasoning task based on the reasoning structure adapted to the digital twin state and stage. This invention improves the adaptability and reliability of the industrial digital twin reasoning process through a reasoning structure selection and reconstruction mechanism, combined with multi-path parallel reasoning and consistency verification methods.


Description

Technical Field

[0001] This invention relates to the field of digital twin reasoning technology, and in particular to a multimodal digital twin reasoning method and system for industrial scenarios.

Background Technology

[0002] With the deep integration of new-generation information technology and industrial manufacturing, digital twin technology is gradually becoming an important technical means to support industrial system modeling, monitoring, and decision-making. By digitally modeling physical entities, processes, and their operating status, and combining multi-source information such as sensor data, log data, and image data, digital twin systems can reflect the operating status of the industrial site in real time in virtual space, providing support for production optimization, equipment maintenance, and safety management.

[0003] However, existing digital twin inference technologies for industrial scenarios typically focus on the design of static inference models or fixed inference rules. Their inference structure is predetermined during the system deployment phase and is rarely dynamically adjusted according to changes in process stages during actual operation. When there are clear stages in the industrial process flow, and different process stages have significant differences in the inference object, inference rules, and inference objectives, existing technologies often struggle to adapt effectively.

Summary of the Invention

[0004] In view of the aforementioned existing problems, the present invention is proposed.

[0005] Therefore, this invention provides a multimodal digital twin reasoning method for industrial scenarios, which solves the problem that the reasoning results of existing technologies are prone to contradictions under complex process stage changes and multimodal data conditions.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution. Firstly, the present invention provides a multimodal digital twin reasoning method for industrial scenarios, comprising the following steps. Multimodal data from the industrial site is collected and preprocessed, and features are extracted. The extracted features are mapped to the attribute space of the digital twin object to construct a multimodal twin state candidate set. Combined with a preset process flow model, the corresponding process stage judgment result is generated. Based on the judgment result, a reasoning structure matching the current process stage is selected or reconstructed from the pre-built reasoning structure library, and the multimodal twin state candidate set is fused and updated to generate a digital twin state consistent with the current process stage. Based on the digital twin state and the stage-adapted reasoning structure, the corresponding reasoning task is triggered and multi-path parallel reasoning is executed to generate reasoning results. Consistency verification is then performed on the reasoning results, self-contradictory reasoning paths in the reasoning process are detected, and the final reasoning result is generated.

[0007] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of collecting multimodal data from the industrial site, preprocessing it, extracting features, mapping the extracted features to the attribute space of the digital twin object, and constructing a multimodal twin state candidate set includes the following steps. Operational status, image, and log data are collected from the industrial site, and the collected data is preprocessed to generate a preprocessed dataset. Feature extraction is performed on the preprocessed dataset to generate a set of feature vectors containing state features, visual features, and text features. The layout of the industrial site is determined, and a digital twin object of the industrial site is generated. A linear mapping is used to map the feature vector set to the digital twin object of the industrial site to generate a multimodal twin state candidate set. The multimodal twin state candidate set refers to the data set used in the digital twin to characterize the current operating status of physical objects at the industrial site.

[0008] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of generating corresponding process stage determination results by combining a preset process flow model includes the following steps: Historical process data from industrial sites is collected, preprocessed, and features related to process stages are extracted. Based on the labeled historical data, a classification algorithm is used to predict the process stage, and the parameters of the process flow model are continuously optimized through k-fold cross-validation. When the number of iterations reaches the preset upper limit, the trained process flow model is obtained. The multimodal twin state candidate set is input into the trained process flow model, and the output is a judgment result containing the features of the current process stage and the judgment confidence.

[0009] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of selecting or reconstructing a reasoning structure matching the current process stage from a pre-built reasoning structure library based on the determination result includes the following steps: Construct a library containing reasoning structures for different process stages. Each reasoning structure includes a process stage identifier, the attribute type of the reasoning object, the reasoning rules, and the corresponding reasoning path organization method. The reasoning structure refers to a structured rule model used to organize and constrain the reasoning process.

[0010] Based on the determination result of the current process stage, perform a search in the reasoning structure library to retrieve the reasoning structure corresponding to the current process stage; If a reasoning structure that is consistent with the current process stage exists in the reasoning structure library, then that reasoning structure will be loaded as the reasoning structure for the current stage.

[0011] If there is no inference structure in the inference structure library that is consistent with the current process stage, an inference structure reconstruction operation is performed. The K nearest neighbor algorithm is used to match the inference structure that best fits the current process stage among the existing inference structures, and the DQN algorithm is used to adjust it to obtain the reconstructed inference structure. The reconstructed inference structure is then loaded as the inference structure for the current process stage. The aforementioned reasoning structure reconstruction refers to adjusting the reasoning rules based on the reasoning structures corresponding to adjacent process stages to generate a reasoning structure that conforms to the current process stage.

[0012] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of fusing and updating the multimodal twin state candidate set to generate a digital twin state consistent with the current process stage includes the following steps. Using the process stage characteristics and the judgment confidence as constraints, the twin state candidates in the multimodal twin state candidate set that reflect the state of the same digital twin object are integrated. Candidate digital twin states that do not conform to the current process stage are eliminated, and the state representation of the digital twin object is updated based on the integrated candidate digital twin states to generate a digital twin state consistent with the current process stage.

[0013] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of triggering the corresponding reasoning task based on the digital twin state and the stage-adapted reasoning structure, and executing multi-path parallel reasoning to generate reasoning results, includes the following steps. The digital twin state consistent with the current process stage is read, including the attribute values, status identifiers, event information, and stage-related constraint markers of the digital twin object. The inference structure that matches the current process stage is read and parsed to determine its set of input variables, the types of inference tasks that can be triggered, and its inference branch rules. The inference branch rules refer to the set of rules used in the inference process to determine the conditions for choosing different inference paths.

[0014] The digital twin state is mapped to the input of the reasoning structure, generating a reasoning context that includes input variable assignments and missing markers. A missing marker is the identification information used to flag an input variable that is absent from the reasoning context.

[0015] Based on the reasoning structure, a set of reasoning tasks corresponding to the current process stage is generated. For each reasoning task, the dependencies of the reasoning task are extracted from the digital twin state and compared with the reasoning context to generate task triggering conditions. Based on the task triggering conditions, the triggering judgment of the reasoning task is executed. If the triggering conditions are met, the reasoning task is generated. If the triggering conditions are partially met, the completion and degradation reasoning branches of the reasoning task are triggered and the reasoning task is generated. If the triggering conditions are not met, the reasoning task is not generated and the reason for not triggering is recorded.

The completion of the inference task involves filling in the missing input variables required for the inference task based on the missing markers recorded in the inference context, the most recent valid values of the historical digital twin state, the default range values of the process stage constraints, and the estimated values derived from the inference rules defined in the inference structure. The completed input variables are then used together with the original input variables to form the updated inference context. When the completion result does not meet the task triggering conditions, the degradation inference branch selects an alternative inference path with weaker constraints to perform inference according to the preset degradation inference rules in the inference structure. The generated inference tasks are integrated to generate a set of triggered tasks, and a task execution description is generated for each inference task in the set. The task execution description includes the correspondence between task objectives, input variables, inference constraints, output format, and branching rules.

Based on the task execution description, the reasoning branch rules corresponding to the reasoning task in the reasoning structure are parsed, and the reasoning path types that the reasoning task can adopt are determined. For each triggered reasoning task, multiple initial reasoning paths are constructed based on the established reasoning branch rules and the reasoning context, an initial set of reasoning paths is generated, and a path description is generated for each path. The path description includes the path identifier, the inference link used by the path, the set of input variables, the path constraints, and the path termination conditions.

The path description of each path is read from the initial set of inference paths, and an independent execution unit is created for each path. On each inference path, the input variables and missing markers of the inference context are read, inference is performed according to the inference link specified by the path, intermediate inference results are generated, and path constraint and path termination checks are performed on the intermediate inference results. If the checks pass and the path termination condition is met, a path-level result is generated. If a check fails or the path termination condition is not met, the path is terminated and marked as invalid. The path-level results include the path identifier, path status, path output, key intermediate conclusions, and reasoning trajectory. For each inference task, the path-level results of all paths under that task are aggregated to generate task-level inference results.
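
The path execution and aggregation steps in this paragraph could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the path fields (`id`, `link`, `constraint`, `terminated`), the three example steps, and the use of a thread pool as the "independent execution unit" are hypothetical and are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_path(path: dict, context: Dict[str, float]) -> dict:
    """Execute one reasoning path and return its path-level result."""
    value = dict(context)                      # each path works on its own copy
    trajectory = []
    for step in path["link"]:
        value = step(value)
        trajectory.append(dict(value))
        if not path["constraint"](value):      # path constraint check
            return {"path_id": path["id"], "status": "invalid",
                    "output": None, "trajectory": trajectory}
    done = path["terminated"](value)           # path termination check
    return {"path_id": path["id"], "status": "valid" if done else "invalid",
            "output": value if done else None, "trajectory": trajectory}


def run_task(paths: List[dict], context: Dict[str, float]) -> dict:
    """Run all paths of one task in parallel and build the task-level result."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        results = list(pool.map(lambda p: run_path(p, context), paths))
    return {"paths": results,
            "valid": [r["path_id"] for r in results if r["status"] == "valid"]}


# Hypothetical reasoning steps; each returns a new dict instead of mutating.
def estimate_load(v): return {**v, "load": v["current"] * v["voltage"]}
def flag_overload(v): return {**v, "overload": v["load"] > 1000.0}
def flag_overheat(v): return {**v, "overheat": v["temperature"] > 90.0}

paths = [
    {"id": "electrical", "link": [estimate_load, flag_overload],
     "constraint": lambda v: v.get("load", 0.0) >= 0.0,
     "terminated": lambda v: "overload" in v},
    {"id": "thermal", "link": [flag_overheat],
     "constraint": lambda v: v["temperature"] < 500.0,
     "terminated": lambda v: "overheat" in v},
    {"id": "strict", "link": [estimate_load],
     "constraint": lambda v: v["load"] < 100.0,   # fails for this context
     "terminated": lambda v: True},
]
task_result = run_task(paths, {"current": 5.0, "voltage": 230.0, "temperature": 95.0})
```

Here the "strict" path violates its constraint and is marked invalid, while the other two paths contribute path-level results to the task-level result. Keeping the trajectory of intermediate values per path is one plausible way to retain the reasoning trajectory that the later consistency verification needs.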

[0016] As a preferred embodiment of the multimodal digital twin reasoning method for industrial scenarios described in this invention, the step of performing consistency verification on the reasoning results, detecting self-contradictory reasoning paths during the reasoning process, and generating the final reasoning result includes the following steps. Based on the task-level reasoning results, the path-level result list is extracted, together with the path output, key intermediate conclusions, and reasoning trajectory of each path. Based on the list of path-level results, all path-level results are standardized to generate a consistency verification set. Based on the reasoning structure and preset consistency rules, a set of consistency verification criteria is constructed, which includes cross-path conflict determination rules, mutual exclusion constraint rules, stage constraint consistency rules, and multi-value conflict rules for the same variable.

Consistency verification is performed on the consistency verification set according to the set of consistency verification criteria, detecting whether logical conflicts, mutual exclusion conflicts, or constraint conflicts exist between the path outputs and the key intermediate conclusions, and a set of conflict relationships is generated. Based on the set of conflict relationships, the self-contradictory reasoning paths are located and eliminated, the cause of each conflict is recorded, and the set of consistent paths is obtained. For each inference task, the path-level results in the consistent path set are aggregated to generate the final inference result.

Based on the final reasoning results, corresponding process adjustment suggestions and control decision instructions are generated to adjust the operating parameters, process paths, and resource allocation of industrial equipment. The reasoning process that generates the final reasoning result is encoded and compressed to generate a compressed representation of the reasoning path. The compressed representation is then converted to a unified structure and uploaded to the database for storage.
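
A minimal Python sketch of the consistency verification step is given below, covering only two of the listed rule types: mutual exclusion constraints and multi-value conflicts for the same variable. The patent does not say which path to eliminate when two paths disagree, so the strict-majority vote used here is an assumption, as are the field names and the example data.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def verify_consistency(path_results: List[dict],
                       exclusive_pairs: List[Tuple[str, str]]):
    """Return (ids of consistent paths, list of (path id, conflict cause))."""
    rejected, conflicts = set(), []

    # Mutual exclusion rule: a path may not assert both flags of a pair.
    for r in path_results:
        for a, b in exclusive_pairs:
            if r["output"].get(a) is True and r["output"].get(b) is True:
                rejected.add(r["path_id"])
                conflicts.append((r["path_id"], f"asserts both {a} and {b}"))

    # Multi-value rule: remaining paths must agree on every shared variable.
    survivors = [r for r in path_results if r["path_id"] not in rejected]
    votes = defaultdict(Counter)
    for r in survivors:
        for name, value in r["output"].items():
            votes[name][value] += 1
    for name, counter in votes.items():
        if len(counter) == 1:
            continue                      # all paths agree on this variable
        value, count = counter.most_common(1)[0]
        has_majority = count * 2 > sum(counter.values())
        for r in survivors:
            if name not in r["output"]:
                continue
            # Assumption: without a strict majority, every reporting path conflicts.
            if not has_majority or r["output"][name] != value:
                rejected.add(r["path_id"])
                conflicts.append((r["path_id"], f"conflicting value for {name}"))

    consistent = [r["path_id"] for r in path_results if r["path_id"] not in rejected]
    return consistent, conflicts


# Hypothetical standardized path-level results.
results = [
    {"path_id": "p1", "output": {"overheat": True, "action": "reduce_load"}},
    {"path_id": "p2", "output": {"overheat": True, "action": "reduce_load"}},
    {"path_id": "p3", "output": {"overheat": False, "action": "continue"}},
    {"path_id": "p4", "output": {"heating": True, "cooling": True}},
]
consistent, conflicts = verify_consistency(results, [("heating", "cooling")])
```

In this example p4 is eliminated by the mutual exclusion rule and p3 by the multi-value rule, leaving p1 and p2 as the consistent path set. The recorded conflict causes correspond to the conflict relationship set described above.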

[0017] Secondly, this invention provides a multimodal digital twin reasoning system for industrial scenarios, comprising the following modules. The multimodal data acquisition module collects multi-source heterogeneous multimodal data from the industrial site and performs unified preprocessing on the collected data. The twin attribute mapping module extracts key features from preprocessed multimodal data and maps these features to the attribute space of the digital twin object. The reasoning structure matching module, in conjunction with a preset process flow model, determines the current process stage of the industrial object and matches the corresponding reasoning structure. The inference task generation module determines the types of inference tasks to be executed based on the inference structure adapted to the current process stage and in combination with the digital twin state, and generates a set of inference tasks. The inference execution module parses the inference branch rules in the inference structure, constructs multiple inference paths and executes them in parallel, and generates path-level inference results based on the inference links defined by the paths. The inference result generation module performs consistency checks on the inference results of multiple paths and generates the final inference result based on the consistency check results.

[0018] Thirdly, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, wherein: when the computer program is executed by the processor, it implements any step of the multimodal digital twin reasoning method for industrial scenarios as described in the first aspect of the present invention.

[0019] Fourthly, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein: when the computer program is executed by a processor, it implements any step of the multimodal digital twin reasoning method for industrial scenarios as described in the first aspect of the present invention.

[0020] The beneficial effects of this invention are as follows. By constructing a multimodal twin state candidate set and combining it with a process flow model to generate process stage determination results, the digital twin system can accurately characterize the state features of industrial objects at different process stages within a unified semantic space. Based on the process stage determination results, a reasoning structure matching the current process stage is selected or reconstructed from the reasoning structure library, and the multimodal twin state candidate set is merged and updated. This realizes dynamic adaptation between the reasoning structure and the process stage, so that the reasoning process no longer depends on fixed rules or static models, which improves the adaptability and accuracy of reasoning when the process flow changes in complex ways. Based on the stage-adapted reasoning structure, the reasoning task is triggered and multi-path parallel reasoning is executed. By exploring multiple reasoning paths in parallel and verifying the consistency of their results, contradictory reasoning paths can be identified and eliminated when data is incomplete or reasoning conditions are uncertain. This yields logically consistent and more interpretable final reasoning results and improves the reliability, stability, and decision support capability of the digital twin reasoning process in industrial scenarios.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a flowchart of the multimodal digital twin reasoning method for industrial scenarios.

[0023] Figure 2 is a schematic diagram of the multimodal digital twin reasoning system for industrial scenarios.

[0024] Figure 3 is a flowchart for generating inference results through multi-path inference execution.

Detailed Implementation

[0025] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0026] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0027] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0028] Referring to Figures 1-3, one embodiment of the present invention provides a multimodal digital twin reasoning method for industrial scenarios, including the following steps. Multimodal data from the industrial site is collected and preprocessed, and features are extracted. The extracted features are mapped to the attribute space of the digital twin object to construct a multimodal twin state candidate set. Combined with a preset process flow model, the corresponding process stage judgment result is generated.

[0029] Specifically, the system collects operational status, images, and log data from the industrial site, preprocesses the collected data, and generates a preprocessed dataset. Feature extraction is performed on the preprocessed dataset to generate a set of feature vectors containing state features, visual features, and text features; The layout of the industrial site is determined, and a digital twin object of the industrial site is generated. A linear mapping is used to map the feature vector set to the digital twin object of the industrial site, generating a multimodal twin state candidate set.

[0030] By extracting features from the preprocessed data, a set of feature vectors containing state features, visual features, and text features is generated, realizing a structured expression of multimodal information in the industrial field, enabling different types of data to be collaboratively represented in the same feature space. A linear mapping method is used to map the set of feature vectors to the attribute space of the digital twin object, constructing a multimodal twin state candidate set, so that multimodal features and specific industrial objects form a clear correspondence.
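
The linear mapping from a feature vector to the attribute space of the digital twin object can be illustrated with a small Python sketch. The weight matrix, bias, attribute names, and feature values below are hypothetical; the patent does not specify how the mapping parameters are obtained.

```python
from typing import Dict, List, Sequence


def linear_map(features: Sequence[float],
               weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """Compute y = W x + b with plain Python lists."""
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the feature length")
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def build_candidate(modality: str,
                    features: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float],
                    attribute_names: Sequence[str]) -> Dict[str, object]:
    """Map one modality's feature vector to a named twin state candidate."""
    values = linear_map(features, weights, bias)
    return {"modality": modality,
            "attributes": dict(zip(attribute_names, values))}


# Hypothetical attribute space and mapping parameters for the state modality.
ATTRS = ["temperature", "vibration"]
W_STATE = [[1.0, 0.0, 0.0],
           [0.0, 0.5, 0.5]]
B_STATE = [0.0, 0.1]

candidate = build_candidate("state", [80.0, 0.2, 0.4], W_STATE, B_STATE, ATTRS)
candidate_set = [candidate]   # visual and text candidates would be appended likewise
```

Each modality would have its own mapping parameters, so the candidate set ends up holding one candidate per modality for the same twin object, which is what the later fusion step consumes.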

[0031] Furthermore, historical process data from industrial sites is collected, preprocessed, and features related to process stages are extracted. Based on the labeled historical data, a classification algorithm is used to predict process stages, and the parameters of the process flow model are continuously optimized through k-fold cross-validation. When the number of iterations reaches the preset upper limit, the trained process flow model is obtained. The multimodal twin state candidate set is input into the trained process flow model, and the output is a judgment result containing the features of the current process stage and the judgment confidence.

[0032] By extracting key features related to the process stage and modeling the process stage using a classification algorithm based on labeled historical data, the process stage determination process can fully combine historical experience and actual operating characteristics, improving the stability and adaptability of process stage identification. The cross-validation mechanism is used to continuously optimize the parameters of the process flow model, so that the model can maintain good generalization ability under different process scenarios and avoid the distortion of judgment results due to fluctuations in process data.
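
As a rough illustration of process stage classification evaluated with k-fold cross-validation, the following Python sketch uses a nearest-centroid classifier. The patent names no specific classification algorithm and does not define how the confidence is computed, so the classifier, the confidence formula, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]   # (feature vector, labeled process stage)


def fit_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Average the feature vectors of each labeled process stage."""
    groups = defaultdict(list)
    for x, label in samples:
        groups[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids: Dict[str, List[float]],
            x: Sequence[float]) -> Tuple[str, float]:
    """Return the nearest stage and a simple normalized confidence score."""
    dists = {label: sum((a - b) ** 2 for a, b in zip(c, x)) ** 0.5
             for label, c in centroids.items()}
    best = min(dists, key=dists.get)
    inv = {label: 1.0 / (1.0 + d) for label, d in dists.items()}
    return best, inv[best] / sum(inv.values())


def k_fold_accuracy(samples: List[Sample], k: int) -> float:
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, x)[0] == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / k


# Hypothetical labeled history: (temperature, normalized power) per sample.
history = [([100.0, 1.0], "heating"), ([20.0, 0.1], "cooling"),
           ([105.0, 1.1], "heating"), ([25.0, 0.2], "cooling"),
           ([98.0, 0.9], "heating"), ([22.0, 0.15], "cooling")]

cv_score = k_fold_accuracy(history, k=3)
stage, confidence = predict(fit_centroids(history), [101.0, 1.0])
```

In practice the cross-validation score would be compared across candidate model settings to choose parameters, and the final model would be refit on all labeled data, as done in the last line.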

[0033] Based on the judgment results, a reasoning structure matching the current process stage is selected or reconstructed from the pre-built reasoning structure library, and the multimodal twin state candidate set is fused and updated to generate a digital twin state consistent with the current process stage.

[0034] Specifically, a library containing reasoning structures for different process stages is constructed. Each reasoning structure includes a process stage identifier, the attribute type of the reasoning object, the reasoning rules, and the corresponding reasoning path organization method. Based on the determination result of the current process stage, a search is performed in the inference structure library to retrieve the inference structure corresponding to the current process stage.

[0035] If a reasoning structure that is consistent with the current process stage exists in the reasoning structure library, then that reasoning structure will be loaded as the reasoning structure for the current stage. If there is no inference structure in the inference structure library that is consistent with the current process stage, extract the feature vector of the existing inference structure, calculate the cosine similarity with the feature vector of the current process stage, and select the K inference structures with the highest cosine similarity in the inference structure library as candidate inference structures.
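
The lookup and the cosine-similarity fallback described above can be sketched as follows. The library layout, the stage names, and the feature vectors are hypothetical; how a structure or a process stage is turned into a feature vector is not specified in the patent.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_structures(library: Dict[str, dict], stage_id: str,
                      stage_vector: Sequence[float], k: int) -> List[dict]:
    """Return the exact match if present, else the k most similar structures."""
    if stage_id in library:
        return [library[stage_id]]
    ranked = sorted(library.values(),
                    key=lambda s: cosine_similarity(s["vector"], stage_vector),
                    reverse=True)
    return ranked[:k]


# Hypothetical inference structure library keyed by process stage identifier.
library = {
    "heating": {"stage": "heating", "vector": [1.0, 0.0, 0.0]},
    "holding": {"stage": "holding", "vector": [0.7, 0.7, 0.0]},
    "cooling": {"stage": "cooling", "vector": [0.0, 0.0, 1.0]},
}

exact = select_structures(library, "cooling", [0.0, 0.0, 1.0], k=2)
candidates = select_structures(library, "ramp_up", [0.9, 0.3, 0.0], k=2)
```

For the unseen stage "ramp_up", the two structures returned are the candidates that would then be passed on to reconstruction.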

[0036] For each candidate reasoning structure, the reasoning path, reasoning rules, and path execution order constraints of the reasoning structure are encoded into structural states to generate a state space; The actions that adjust the reasoning path, reasoning rules, and execution order of the reasoning structure are integrated into the action space; The DQN algorithm is used to iteratively reconstruct the candidate inference structure. Each time an adjustment is made, the Q value of the inference structure is obtained based on the feedback from the current process stage. The process involves continuously performing adjustments and updating the state space of the candidate inference structure. The iteration is complete when the number of iterations reaches a preset upper limit. Among the candidate inference structures, the state space with the largest historical Q value is selected, and its corresponding inference structure is used as the reconstructed inference structure. The reconstructed inference structure is then loaded as the inference structure for the current process stage.

[0037] By vectorizing the existing inference structures and performing similarity analysis based on the characteristics of the process stages, candidate inference structures with high relevance to the current process stage are selected, thus avoiding the loss of stage constraints in the inference process when a completely matching structure is lacking. The inference paths, inference rules, and path execution order constraints of the candidate inference structures are encoded as structural states, and corresponding action spaces are constructed. A reinforcement learning-based structural reconstruction mechanism is introduced, enabling the inference structure to be dynamically adjusted and optimized under the guidance of feedback from the current process stage. Ultimately, a more suitable inference structure for the current process stage is obtained, improving the adaptability of the inference structure in complex process change scenarios.
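
The reconstruction loop can be illustrated with a deliberately simplified Python sketch. It uses tabular Q-learning as a stand-in for the DQN named in the text (a DQN would approximate the Q function with a neural network instead of a table). The structure encoding as a tuple of small integers, the action set, the feedback function, and the way the "largest historical Q value" is tracked are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]   # encoded structure, e.g. discretized rule parameters
Action = Tuple[int, int]  # (index of the component to adjust, delta)


def reconstruct(initial: State, actions: List[Action],
                feedback: Callable[[State], float],
                max_iters: int = 200, levels: int = 5,
                alpha: float = 0.5, gamma: float = 0.9,
                epsilon: float = 0.2, seed: int = 0) -> State:
    """Adjust an encoded structure with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q: Dict[Tuple[State, Action], float] = {}
    state = initial
    best_state, best_q = initial, float("-inf")
    for _ in range(max_iters):                       # preset iteration limit
        if rng.random() < epsilon:
            action = rng.choice(actions)             # explore
        else:
            action = max(actions, key=lambda a: q.get((state, a), 0.0))
        idx, delta = action
        adjusted = list(state)
        adjusted[idx] = min(levels - 1, max(0, adjusted[idx] + delta))
        nxt = tuple(adjusted)
        reward = feedback(nxt)                       # feedback from current stage
        future = max(q.get((nxt, a), 0.0) for a in actions)
        old = q.get((state, action), 0.0)
        new = old + alpha * (reward + gamma * future - old)
        q[(state, action)] = new
        if new > best_q:                             # track largest historical Q
            best_state, best_q = nxt, new
        state = nxt
    return best_state


# Hypothetical feedback: closer to the stage's preferred encoding is better.
TARGET = (2, 1)
def stage_feedback(s: State) -> float:
    return -float(sum(abs(a - b) for a, b in zip(s, TARGET)))

ACTIONS = [(0, 1), (0, -1), (1, 1), (1, -1)]
reconstructed = reconstruct((0, 1), ACTIONS, stage_feedback)
```

In a fuller version this loop would run once per candidate structure, and the result with the largest Q value across all candidates would be loaded as the inference structure for the current process stage.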

[0038] Furthermore, the process stage characteristics and the determination confidence results are used as constraints to integrate multiple twin state candidates that reflect the state of the same digital twin object in a multimodal twin state candidate set. Candidate digital twin states that do not conform to the current process stage are eliminated, and the state representation of the digital twin object is updated based on the integrated candidate digital twin states to generate a digital twin state consistent with the current process stage.

[0039] Using the process stage features and the judgment confidence as constraints when integrating the multimodal twin state candidate set allows the digital twin state to reflect the specific needs of the current process stage and keeps the twin state consistent with the actual process stage. Eliminating twin state candidates that do not conform to the current process stage, and updating the state representation of the digital twin object from the integrated candidates, achieves dynamic adjustment of the digital twin state. This improves the adaptability of the inference model and the responsiveness of the digital twin system to changes in process stages in industrial scenarios.
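
One possible reading of this fusion step is sketched below in Python. The range-based stage constraints, the averaging of surviving candidates, and the `min_confidence` threshold below which elimination is skipped are assumptions chosen for illustration; the patent does not fix a particular fusion rule.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def fuse_candidates(candidates: List[dict],
                    stage_constraints: Dict[str, Range],
                    stage_confidence: float,
                    min_confidence: float = 0.6) -> Dict[str, float]:
    """Drop candidates that violate stage constraints, then average the rest."""
    # Assumption: with a low-confidence stage judgment, do not eliminate anything.
    enforce = stage_confidence >= min_confidence
    kept = []
    for cand in candidates:
        conforms = all(lo <= cand["attributes"][name] <= hi
                       for name, (lo, hi) in stage_constraints.items()
                       if name in cand["attributes"])
        if conforms or not enforce:
            kept.append(cand)
    fused: Dict[str, float] = {}
    for name in {n for c in kept for n in c["attributes"]}:
        values = [c["attributes"][name] for c in kept if name in c["attributes"]]
        fused[name] = sum(values) / len(values)
    return fused


# Hypothetical candidates for one twin object, one per modality.
candidates = [
    {"modality": "state", "attributes": {"temperature": 180.0, "pressure": 2.0}},
    {"modality": "visual", "attributes": {"temperature": 184.0}},
    {"modality": "log", "attributes": {"temperature": 25.0}},   # stale log entry
]
heating_constraints = {"temperature": (150.0, 250.0)}

twin_state = fuse_candidates(candidates, heating_constraints, stage_confidence=0.9)
```

The log-derived candidate falls outside the heating-stage temperature range and is eliminated, so the fused temperature comes from the state and visual candidates only.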

[0040] Based on the reasoning structure that adapts to the state and stage of the digital twin, the corresponding reasoning task is triggered, multi-path parallel reasoning is executed to generate reasoning results, consistency verification is performed on the reasoning results, self-contradictory reasoning paths are detected in the reasoning process, and the final reasoning result is generated.

[0041] Specifically, the digital twin state consistent with the current process stage is read, including the attribute values, status identifiers, event information, and stage-related constraint markers of the digital twin object. The inference structure that matches the current process stage is read and parsed to determine its set of input variables, the types of inference tasks that can be triggered, and its inference branch rules. The digital twin state is then mapped to the input of the inference structure, generating an inference context that includes input variable assignments and missing markers.
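A minimal sketch of this mapping, assuming the digital twin state and the inference context are both plain dictionaries, is given below. The function name `build_inference_context` and the field names `assignments` and `missing` are hypothetical.

```python
def build_inference_context(twin_state, input_variables):
    """Map the twin state onto the input variables and mark the missing ones."""
    # Variables absent from the twin state are assigned None.
    assignments = {var: twin_state.get(var) for var in input_variables}
    # The missing markers list every input variable without a value.
    missing = [var for var in input_variables if assignments[var] is None]
    return {"assignments": assignments, "missing": missing}

twin_state = {"temperature": 800.0, "valve_status": "open", "stage": "holding"}
context = build_inference_context(
    twin_state, ["temperature", "pressure", "valve_status"]
)
print(context["missing"])  # ['pressure']
```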

[0042] Based on the inference structure, a set of inference tasks corresponding to the current process stage is generated. For each inference task, the dependencies of the task are extracted from the digital twin state and compared with the inference context to generate task triggering conditions, and the triggering judgment is executed against those conditions. If the triggering conditions are met, the inference task is generated. If the triggering conditions are partially met, the completion branch and the degradation branch of the inference task are triggered and the inference task is generated. If the triggering conditions are not met, the inference task is not generated and the reason for not triggering is recorded. Completion fills in the missing input variables required by the inference task, using the missing markers recorded in the inference context, the most recent valid values of the historical digital twin state, the default range values of the process stage constraints, and the estimated values derived from the inference rules defined in the inference structure. The completed input variables, together with the original input variables, form the updated inference context. When the completion result still does not meet the task triggering conditions, the degradation branch selects an alternative inference path with weaker constraints and performs inference according to the degradation rules preset in the inference structure.
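The three-way triggering judgment can be illustrated with the following non-limiting sketch. It assumes that "partially met" means that some but not all required input variables are available, and it fills missing variables first from the most recent historical value and then from a stage default. Estimation from inference rules is omitted, and the status strings and the name `decide_trigger` are hypothetical.

```python
def decide_trigger(required, context, history, stage_defaults):
    """Return (status, updated_context, note) for one inference task."""
    missing = [var for var in required if context.get(var) is None]
    if not missing:
        return "triggered", dict(context), ""
    if len(missing) == len(required):
        # Conditions not met at all: no task is generated, the reason is recorded.
        return "not_triggered", dict(context), "no required input available: " + ", ".join(missing)
    # Conditions partially met: try to complete the missing inputs.
    updated = dict(context)
    for var in missing:
        if history.get(var) is not None:
            updated[var] = history[var]          # most recent valid value
        elif var in stage_defaults:
            updated[var] = stage_defaults[var]   # default value for the stage
    still_missing = [var for var in required if updated.get(var) is None]
    if not still_missing:
        return "triggered_after_completion", updated, "completed: " + ", ".join(missing)
    # Completion was not enough: fall back to a path with weaker constraints.
    return "degraded", updated, "still missing: " + ", ".join(still_missing)

context = {"temperature": 800.0, "pressure": None, "flow": None}
history = {"pressure": 1.2}
status_a, ctx_a, note_a = decide_trigger(["temperature", "pressure"], context, history, {})
status_b, ctx_b, note_b = decide_trigger(["temperature", "flow"], context, history, {})
status_c, ctx_c, note_c = decide_trigger(["flow"], context, history, {})
print(status_a, status_b, status_c)
```

The original context is never modified in this sketch; each call returns an updated copy, which corresponds to the updated inference context formed from the completed and original input variables.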

[0043] The generated inference tasks are integrated to generate a set of trigger tasks, and a task execution description is generated for each inference task in the set. The task execution description includes the correspondence between task objectives, input variables, inference constraints, output format, and branching rules.

[0044] Based on the task execution description, the inference branch rules corresponding to the inference task in the inference structure are parsed, and the inference path types that the task can adopt are determined. For each triggered inference task, multiple initial inference paths are constructed from the established inference branch rules and the inference context, forming an initial set of inference paths, and a path description is generated for each path. The path description includes the path identifier, the inference link used by the path, the set of input variables, the path constraints, and the path termination conditions.

[0045] The path description of each path is read from the initial set of inference paths, and an independent execution unit is created for each path. On each inference path, the input variables and missing markers of the inference context are read, inference is performed along the inference link specified by the path, intermediate inference results are generated, and path constraint checks and path termination checks are performed on the intermediate results. If the checks pass and the path termination condition is met, a path-level result is generated. If a check fails or the path termination condition is not met, the path is terminated and marked as invalid. The path-level result includes the path identifier, path status, path output, key intermediate conclusions, and reasoning trajectory.

[0046] For each inference task, the path-level results of all paths under that task are aggregated to generate task-level inference results.
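By way of a non-limiting example, the per-path execution units and the task-level aggregation can be sketched with a thread pool. The sketch assumes that an inference link is an ordered list of named step functions, that the path constraint is checked after every step, and that the termination condition is checked once the link is exhausted. The dictionary layout and the names `run_path` and `run_task` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_path(path, context):
    """Execute one inference path and return its path-level result."""
    trajectory = []
    value = None
    for step_name, step in path["link"]:
        value = step(context, value)          # intermediate inference result
        trajectory.append((step_name, value))
        if not path["constraint"](value):     # path constraint check
            return {"path_id": path["id"], "status": "invalid",
                    "output": None, "trajectory": trajectory}
    done = path["done"](value)                # path termination check
    return {"path_id": path["id"], "status": "valid" if done else "invalid",
            "output": value if done else None, "trajectory": trajectory}

def run_task(paths, context):
    """Run every path in its own worker and aggregate the path-level results."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        results = list(pool.map(run_path, paths, [context] * len(paths)))
    valid_outputs = [r["output"] for r in results if r["status"] == "valid"]
    return {"paths": results, "valid_outputs": valid_outputs}

context = {"temperature": 800.0, "pressure": 1.2}
paths = [
    {"id": "temperature_margin",
     "link": [("read", lambda ctx, _: ctx["temperature"]),
              ("margin", lambda ctx, v: 850.0 - v)],
     "constraint": lambda v: v >= 0,
     "done": lambda v: v is not None},
    {"id": "pressure_margin",
     "link": [("read", lambda ctx, _: ctx["pressure"]),
              ("margin", lambda ctx, v: v - 2.0)],
     "constraint": lambda v: v >= 0,
     "done": lambda v: v is not None},
]
task_result = run_task(paths, context)
print([(r["path_id"], r["status"]) for r in task_result["paths"]])
```

In this example the second path produces a negative margin, fails its constraint check, and is marked invalid, while its trajectory is still retained for later inspection.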

[0047] By reading the digital twin state consistent with the current process stage and combining it with an inference structure matching the process stage, the inference process can be dynamically adjusted based on real-time digital twin data and stage characteristics, improving the accuracy and adaptability of the inference results. Mapping the digital twin state to the input of the inference structure generates an inference context containing input variable assignments and missing value markers, providing a complete and consistent input basis for inference task generation. Extracting dependencies and combining them with the inference context to generate task triggering conditions ensures that tasks are triggered only when necessary conditions are met, avoiding the generation of invalid inference tasks and improving the efficiency of the inference system. For cases where task triggering conditions are partially met, by completing missing input variables or executing a downgraded inference branch, inference can continue even when data is incomplete or conditions are not fully met, ensuring the continuity and stability of the inference process. By generating a clear execution description for each inference task and parsing the inference branch rules in the inference structure, the diversity and accuracy of inference paths are ensured.

[0048] Furthermore, from the task-level reasoning results, the list of path-level results is extracted, together with the path output, key intermediate conclusions, and reasoning trajectory of each path. All path-level results in the list are standardized to generate a consistency verification set. Based on the reasoning structure and preset consistency rules, a set of consistency verification criteria is constructed. The set of consistency verification criteria includes cross-path conflict determination rules, mutual exclusion constraint rules, stage constraint consistency rules, and multi-value conflict rules for the same variable.

[0049] The consistency verification set is verified against the set of consistency verification criteria to detect whether logical conflicts, mutual exclusion conflicts, or constraint conflicts exist between the path outputs and the key intermediate conclusions, and a set of conflict relationships is generated. Based on the set of conflict relationships, the self-contradictory reasoning paths are located and eliminated, the cause of each conflict is recorded, and the set of consistent paths is obtained. For each inference task, the path-level results in the consistent path set are aggregated to generate the final inference result. The reasoning process that produced the final reasoning result is encoded and compressed to generate a compressed representation of the reasoning path, which is then converted to a unified structure and uploaded to the database for storage.
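The following non-limiting sketch illustrates one of the listed criteria, the multi-value conflict rule for the same variable, together with one possible elimination strategy. The embodiment does not state how the path to be eliminated is chosen; the sketch assumes a greedy rule that repeatedly removes the path involved in the most conflicts, with ties broken by path identifier. The names `find_conflicts` and `consistent_paths` and the `conclusions` field are hypothetical, and path identifiers are assumed to be unique.

```python
from itertools import combinations

def find_conflicts(results):
    """List (path_a, path_b, variable) where two paths disagree on a variable."""
    conflicts = []
    for a, b in combinations(results, 2):
        shared = sorted(set(a["conclusions"]) & set(b["conclusions"]))
        for var in shared:
            if a["conclusions"][var] != b["conclusions"][var]:
                conflicts.append((a["path_id"], b["path_id"], var))
    return conflicts

def consistent_paths(results):
    """Greedily eliminate conflicting paths; return (kept, eliminated_with_causes)."""
    remaining = list(results)
    eliminated = []
    while True:
        conflicts = find_conflicts(remaining)
        if not conflicts:
            return remaining, eliminated
        counts = {}
        for path_a, path_b, _ in conflicts:
            counts[path_a] = counts.get(path_a, 0) + 1
            counts[path_b] = counts.get(path_b, 0) + 1
        # Assumption: the path with the most conflicts is the self-contradictory one.
        worst = max(sorted(counts), key=lambda path_id: counts[path_id])
        causes = [c for c in conflicts if worst in c[:2]]
        eliminated.append((worst, causes))      # record the cause of the conflict
        remaining = [r for r in remaining if r["path_id"] != worst]

results = [
    {"path_id": "p1", "conclusions": {"valve": "open", "risk": "low"}},
    {"path_id": "p2", "conclusions": {"valve": "open"}},
    {"path_id": "p3", "conclusions": {"valve": "closed", "risk": "low"}},
]
kept, eliminated = consistent_paths(results)
print([r["path_id"] for r in kept], eliminated)
```

Mutual exclusion rules and stage constraint rules could be added to `find_conflicts` in the same style, each contributing further entries to the set of conflict relationships.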

[0050] By constructing a set of consistency verification criteria based on the reasoning structure and preset rules, including rules for cross-path conflict determination, mutual exclusion constraints, stage constraint consistency, and multi-value conflicts of the same variable, conflicts can be detected along multiple dimensions. Verifying the consistency verification set against the criteria set detects logical conflicts, mutual exclusion conflicts, and constraint conflicts between the path outputs and the key intermediate conclusions and generates a set of conflict relationships, so that contradictions inherent in the reasoning process are identified automatically and the reliability and logical consistency of the reasoning results are improved. Locating self-contradictory reasoning paths from the set of conflict relationships, eliminating them, and recording the causes of the conflicts yields a set of consistent paths, which filters out invalid or contradictory paths and improves the quality of the retained reasoning paths.

[0051] This embodiment also provides a multimodal digital twin reasoning system for industrial scenarios, including: a multimodal data acquisition module, which collects multi-source heterogeneous multimodal data from the industrial site and performs unified preprocessing on the collected data; a twin attribute mapping module, which extracts key features from the preprocessed multimodal data and maps these features to the attribute space of the digital twin object; a reasoning structure matching module, which, in conjunction with a preset process flow model, determines the current process stage of the industrial object and matches the corresponding reasoning structure; an inference task generation module, which determines the types of inference tasks to be executed based on the inference structure adapted to the current process stage and in combination with the digital twin state, and generates a set of inference tasks; an inference execution module, which parses the inference branch rules in the inference structure, constructs multiple inference paths and executes them in parallel, and generates path-level inference results based on the inference links defined by the paths; and an inference result generation module, which performs consistency checks on the inference results of multiple paths and generates the final inference result based on the consistency check results.

[0052] This embodiment also provides a computer device applicable to the multimodal digital twin reasoning method for industrial scenarios, including: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the multimodal digital twin reasoning method for industrial scenarios proposed in the above embodiment.

[0053] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.

[0054] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the multimodal digital twin reasoning method for industrial scenarios proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0055] In summary, this invention achieves the following: First, by constructing a multimodal twin state candidate set and combining it with a process flow model to generate process stage determination results, the digital twin system can accurately characterize the state features of industrial objects at different process stages within a unified semantic space. Second, based on the process stage determination results, it selects or reconstructs a reasoning structure matching the current process stage from a reasoning structure library and merges and updates the multimodal twin state candidate set, realizing dynamic adaptation between the reasoning structure and the process stage. This eliminates reliance on fixed rules or static models, improving the adaptability and accuracy of reasoning in complex process flow scenarios. Third, by triggering reasoning tasks and executing multi-path parallel reasoning based on the stage-adapted reasoning structure, and by exploring multiple reasoning paths in parallel and verifying result consistency, it can identify and eliminate contradictory reasoning paths even in cases of incomplete data or uncertain reasoning conditions, generating logically consistent and more interpretable final reasoning results. This enhances the reliability, stability, and decision support capabilities of the digital twin reasoning process in industrial scenarios.

[0056] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A multimodal digital twin reasoning method for industrial scenarios, characterized in that the method includes the following steps: Multimodal data from industrial sites is collected and preprocessed, and features are extracted from it. The extracted features are mapped to the attribute space of the digital twin object to construct a multimodal twin state candidate set. Combined with a preset process flow model, the corresponding process stage judgment result is generated. Based on the judgment result, a reasoning structure matching the current process stage is selected or reconstructed from the pre-built reasoning structure library, and the multimodal twin state candidate set is fused and updated to generate a digital twin state consistent with the current process stage. Based on the reasoning structure that adapts to the state and stage of the digital twin, the corresponding reasoning task is triggered, multi-path parallel reasoning is executed to generate reasoning results, consistency verification is performed on the reasoning results, self-contradictory reasoning paths are detected in the reasoning process, and the final reasoning result is generated.

2. The multimodal digital twin reasoning method for industrial scenarios as described in claim 1, characterized in that: The process of collecting multimodal data from industrial sites, preprocessing it, extracting features, and mapping the extracted features to the attribute space of the digital twin object to construct a multimodal twin state candidate set includes the following steps: Collect operational status, images, and log data from the industrial site, preprocess the collected data, and generate a preprocessed dataset. Feature extraction is performed on the preprocessed dataset to generate a set of feature vectors containing state features, visual features, and text features; The layout of the industrial site is determined, and a digital twin object of the industrial site is generated. A linear mapping is used to map the feature vector set to the digital twin object of the industrial site, generating a multimodal twin state candidate set.

3. The multimodal digital twin reasoning method for industrial scenarios as described in claim 2, characterized in that: The step of generating the corresponding process stage determination result by combining the preset process flow model includes the following steps: Historical process data from industrial sites is collected, preprocessed, and features related to process stages are extracted. Based on the labeled historical data, a classification algorithm is used to predict the process stage, and the parameters of the process flow model are continuously optimized through k-fold cross-validation. When the number of iterations reaches the preset upper limit, the trained process flow model is obtained. The multimodal twin state candidate set is input into the trained process flow model, and the output is a judgment result containing the features of the current process stage and the judgment confidence.

4. The multimodal digital twin reasoning method for industrial scenarios as described in claim 3, characterized in that: The step of selecting or reconstructing a reasoning structure that matches the current process stage from a pre-built reasoning structure library based on the determination result includes the following steps: Construct a library containing reasoning structures for different process stages. Each reasoning structure includes a process stage identifier, the attribute type of the reasoning object, the reasoning rules, and the organization of the corresponding reasoning path. Based on the determination result of the current process stage, perform a search in the reasoning structure library to retrieve the reasoning structure corresponding to the current process stage; If a reasoning structure that is consistent with the current process stage exists in the reasoning structure library, then that reasoning structure will be loaded as the reasoning structure for the current stage. If there is no inference structure in the inference structure library that is consistent with the current process stage, an inference structure reconstruction operation is performed. The K-nearest neighbor algorithm is used to match the inference structure that best fits the current process stage among the existing inference structures, and the DQN algorithm is used to adjust it to obtain the reconstructed inference structure. The reconstructed inference structure is then loaded as the inference structure for the current process stage.

5. The multimodal digital twin reasoning method for industrial scenarios as described in claim 4, characterized in that: The process of fusing and updating the multimodal twin state candidate set to generate a digital twin state consistent with the current process stage includes the following steps: Using the process stage characteristics and the judgment confidence results as constraints, multiple twin state candidates that reflect the state of the same digital twin object are integrated into a multimodal twin state candidate set. Candidate digital twin states that do not conform to the current process stage are eliminated, and the state representation of the digital twin object is updated based on the integrated candidate digital twin states to generate a digital twin state consistent with the current process stage.

6. The multimodal digital twin reasoning method for industrial scenarios as described in claim 5, characterized in that: The step of triggering the corresponding reasoning task based on the digital twin state and the stage-adapted reasoning structure, and executing multi-path parallel reasoning to generate reasoning results, includes the following steps: The digital twin state consistent with the current process stage is read, including the attribute values, status identifiers, event information, and stage-related constraint markers of the digital twin object. The inference structure that matches the current process stage is read and parsed to determine its set of input variables, the types of inference tasks that can be triggered, and its inference branch rules. The digital twin state is mapped to the input of the inference structure, generating an inference context that includes input variable assignments and missing markers. Based on the inference structure, a set of inference tasks corresponding to the current process stage is generated. For each inference task, the dependencies of the task are extracted from the digital twin state and compared with the inference context to generate task triggering conditions. Based on the task triggering conditions, the triggering judgment of the inference task is executed. If the triggering conditions are met, the inference task is generated. If the triggering conditions are partially met, the completion branch and the degradation branch of the inference task are triggered and the inference task is generated. If the triggering conditions are not met, the inference task is not generated and the reason for not triggering is recorded. The generated inference tasks are integrated to generate a set of trigger tasks, and a task execution description is generated for each inference task in the set.
Based on the task execution description, the reasoning branch rules corresponding to the reasoning task in the reasoning structure are parsed, and the various reasoning path types that the reasoning task can adopt are determined. For each triggered reasoning task, based on the established reasoning branch rules and the reasoning context, multiple initial reasoning paths are constructed, an initial set of reasoning paths is generated, and a path description is generated for each path. Read the path description of each path from the initial set of inference paths, and create an independent execution unit for each path; On each inference path, the input variables and missing flags of the inference context are read, inference is performed according to the inference link specified by the path, intermediate inference results are generated, and path constraints and path termination checks are performed on the intermediate inference results. If the check passes and the path termination condition is met, a path-level result is generated. If the check fails or the path termination condition is not met, the path is terminated and the path is marked as invalid. For each inference task, the path-level results of all paths under that task are aggregated to generate task-level inference results.

7. The multimodal digital twin reasoning method for industrial scenarios as described in claim 6, characterized in that: The process of performing consistency verification on the reasoning results, detecting self-contradictory reasoning paths during the reasoning process, and generating the final reasoning result includes the following steps: Based on the task-level reasoning results, extract the path-level result list, as well as the path output, key intermediate conclusions, and reasoning trajectory of the corresponding path. Based on the list of path-level results, all path-level results are standardized to generate a consistency check set. Based on the reasoning structure and preset consistency rules, a set of consistency verification criteria is constructed; For the consistency verification set, perform consistency verification according to the consistency verification criterion set, detect whether there are logical conflicts, mutual exclusion conflicts or constraint conflicts between the path output and the key intermediate conclusions, and generate a set of conflict relationships; Based on the set of conflicting relationships, locate the self-contradictory reasoning path, perform elimination and record the cause of conflict, and obtain the set of consistent paths; For each inference task, the path-level results in the consistent path set are aggregated to generate the final inference result; The reasoning process that generates the final reasoning result is encoded and compressed to generate a compressed representation of the reasoning path. The compressed representation of the reasoning path is then processed with a unified structure and uploaded to the database for storage.

8. A multimodal digital twin reasoning system for industrial scenarios, based on the multimodal digital twin reasoning method for industrial scenarios as described in any one of claims 1 to 7, characterized in that the system includes: a multimodal data acquisition module, which collects multi-source heterogeneous multimodal data from the industrial site and performs unified preprocessing on the collected data; a twin attribute mapping module, which extracts key features from the preprocessed multimodal data and maps these features to the attribute space of the digital twin object; a reasoning structure matching module, which, in conjunction with a preset process flow model, determines the current process stage of the industrial object and matches the corresponding reasoning structure; an inference task generation module, which determines the types of inference tasks to be executed based on the inference structure adapted to the current process stage and in combination with the digital twin state, and generates a set of inference tasks; an inference execution module, which parses the inference branch rules in the inference structure, constructs multiple inference paths and executes them in parallel, and generates path-level inference results based on the inference links defined by the paths; and an inference result generation module, which performs consistency checks on the inference results of multiple paths and generates the final inference result based on the consistency check results.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that: When the processor executes the computer program, it implements the steps of the multimodal digital twin reasoning method for industrial scenarios as described in any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that: When the computer program is executed by the processor, it implements the steps of the multimodal digital twin reasoning method for industrial scenarios as described in any one of claims 1 to 7.