Forklift fault diagnosis method using a large language model that fuses a topological graph with semantic embeddings

The method constructs a mechanism topology graph and a component semantic vector matrix and combines them with a large language model. Semantic and topological weights are adjusted adaptively, which addresses the lack of mechanism topology modeling and the handling of fuzzy descriptions in forklift fault diagnosis. The result is accurate and robust fault location with support for local deployment.

CN122242724A · Pending · Publication Date: 2026-06-19 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTH CHINA UNIV OF TECH
Filing Date
2026-03-04
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing forklift fault diagnosis technologies fail to effectively utilize the mechanism topology modeling of multi-physics coupling, struggle to handle long-range physical correlations in non-Euclidean space, and lack robustness when faced with fuzzy fault descriptions.

Method used

By constructing a topological map and component semantic vector matrix, and combining it with a large language model, the semantic and topological weights are adaptively adjusted to achieve hybrid kernel fault component sorting and diagnosis driven by the semantic clarity of fault description.

Benefits of technology

It improves the accuracy and robustness of forklift fault diagnosis, reduces the impact of noise, ensures the physical and logical consistency of diagnostic recommendations, and supports full-process localized deployment to avoid data privacy risks.



Abstract

This invention discloses a forklift fault diagnosis method using a large language model that integrates topological mapping and semantic embedding fusion. The method includes: acquiring a mechanistic topological map and component semantic vector matrices, and calculating a topological distance matrix; receiving natural language fault descriptions, extracting features from the natural language fault descriptions, and calculating a semantic clarity index; constructing a mapping relationship between the semantic clarity index and fusion weights; adaptively adjusting the relative weights of semantic space relevance and mechanistic topological space proximity in reasoning based on the clarity of the natural language fault descriptions; encoding the fault descriptions into fault semantic vectors, calculating the semantic similarity and topological affinity of each component, and linearly fusing them according to weights to obtain a comprehensive score; ranking the components to generate a sequence of suspected faulty components; assembling the fault descriptions, the sequence of suspected faulty components, and their topological relationships into a structured input to generate forklift fault diagnosis conclusions and troubleshooting suggestions. This invention can improve the accuracy and robustness of forklift fault diagnosis.

Description

Technical Field

[0001] This invention relates to a method for diagnosing forklift faults using a large language model that integrates topological graphs and semantic embedding, belonging to the field of intelligent diagnosis and maintenance technology for industrial vehicles.

Background Technology

[0002] Industrial forklifts are core material handling equipment in warehousing, logistics, port transshipment, and discrete manufacturing scenarios. Their operating conditions are complex and variable, and the system exhibits strong electromechanical-hydraulic coupling. During long-term service, faults occur frequently, such as insufficient power in the lifting mechanism, abnormal vibration in the travel system, and blockage in the steering hydraulic circuit, severely affecting production efficiency and operational safety. Existing fault diagnosis technologies for such complex electromechanical systems mainly follow three approaches, all of which have significant limitations in practical applications.

[0003] The first approach is diagnosis based on deterministic rules and expert knowledge, which relies mainly on fault tree analysis (FTA) or static expert systems. Maintenance personnel follow predefined "fault phenomenon - logic gate - root cause" branches and troubleshoot step by step using paper technical documents or fixed electronic manuals. This approach suffers from a significant semantic gap: front-line operators usually describe faults in unstructured, vague natural language, which is difficult to map directly to the precisely defined fault codes or logic nodes of an expert system. In addition, the knowledge base is built by manual input and is updated slowly, so it struggles to track how the equipment's condition evolves over its life cycle. The second approach is data-driven diagnosis based on multi-sensor signal processing, which deploys vibration, pressure, and current sensor arrays to collect time-series signals and builds statistical or deep learning models (such as CNNs and RNNs) for pattern recognition. Although it classifies specific typical faults with high accuracy, it is limited by its reliance on a single modality and on hardware. It cannot effectively use unstructured data rich in semantic information, such as maintenance work orders and log texts, and it places very high demands on sensor placement, signal-to-noise ratio, and data completeness. Under real working conditions where sensors are missing or signals drift, model robustness drops sharply. The third approach is generative diagnosis based on general-purpose large language models (LLMs); with the development of generative artificial intelligence, using LLMs directly for fault question answering has become a new trend. However, a general-purpose LLM is essentially a probabilistic text generator that lacks explicit modeling of the physical entities and mechanistic logic of industrial forklift equipment. Without the constraints of a domain knowledge graph, the model is prone to hallucination, that is, generating diagnostic suggestions that are semantically fluent but physically infeasible or that violate causal logic, which seriously undermines the credibility and safety of the diagnostic conclusions.

[0004] Furthermore, from a mathematical perspective, existing adaptive diagnostic algorithms suffer from theoretical flaws. Specifically, in algorithms such as Bayesian optimization or Gaussian Process Regression (GPR), the core kernel function is typically defined in Euclidean space (e.g., the radial basis function RBF), relying solely on the Euclidean distance or inner product between parameter vectors to measure similarity. This metric cannot characterize the complex non-Euclidean topology of industrial forklift equipment, i.e., the mechanistic relationships between physically connected but spatially distant components. Simultaneously, existing algorithms lack a mechanism for perceiving the information entropy of input information, failing to adaptively adjust the fusion weights of semantic and mechanistic priors based on the semantic clarity of the user's description, resulting in insufficient flexibility when processing multi-source heterogeneous information.

[0005] Therefore, existing forklift fault diagnosis technologies have several problems: they do not explicitly model the mechanism topology of multi-physics coupling, they have difficulty handling long-range physical correlations in non-Euclidean space, and their robustness is insufficient because they use static fusion strategies when facing fuzzy fault descriptions.

Summary of the Invention

[0006] This invention provides a method for diagnosing forklift faults using a large language model that integrates topological graphs and semantic embeddings, aiming to solve at least one of the technical problems existing in the prior art.

[0007] The technical solution of this invention relates to a forklift fault diagnosis method based on a large language model that integrates topological graphs and semantic embedding fusion. The method according to this invention includes the following steps:

[0008] S1. Obtain the offline constructed mechanism topology map and component semantic vector matrix;

[0009] Specifically, the mechanism topology map is constructed based on the physical connection relationships of the forklift components, and the topological distance matrix is calculated. At the same time, the component semantic vector matrix is obtained by encoding the enhanced text of each component.

[0010] S2. Receive the natural language fault description input by the user during online diagnosis, extract component features, system keyword features and fuzzy word features from the natural language fault description, and calculate the semantic clarity index;

[0011] S3. Construct a dynamic mapping relationship between semantic clarity index and fusion weight; based on the clarity of the natural language fault description, adaptively adjust the relative weights of semantic space relevance and mechanism topological space proximity in reasoning, so as to perform adaptive dynamic calculation of semantic and topological weights.

[0012] S4. Encode the fault description into a fault semantic vector, calculate the semantic similarity and topological affinity of each component, and linearly fuse them according to weights to obtain a comprehensive score. Sort the components to generate a sequence of suspected faulty components, so as to obtain a hybrid kernel faulty component ranking of topological and semantic components.

[0013] S5. Assemble the fault description, suspected fault component sequence and their topological relationship into a structured input, input it into the large language model to generate forklift fault diagnosis conclusions and troubleshooting suggestions, so as to obtain the large language model diagnosis results.

[0014] Furthermore, in step S1, the mechanism topology map is constructed based on the physical connection relationships of device components, wherein the component node set $V$ is represented as:

[0015] $$V = \{v_1, v_2, \dots, v_N\}$$

[0016] In the formula, $V$ represents the set of nodes in the graph; $v_i$ represents the node with index $i$ in the node set $V$, where $i$ is a positive integer and $1 \le i \le N$; $N$ represents the total number of nodes in the node set $V$.

[0017] The physical connection relationships between components are denoted as a weighted edge set $E$, and a weighted topological graph $G = (V, E)$ is built with $V$ as nodes and $E$ as edges. Based on this mechanism topology graph, the path length between any two nodes is calculated using a shortest path algorithm, resulting in the topological distance matrix, which is represented as follows:

[0018] $$D = [d_{ij}] \in \mathbb{R}^{N \times N}$$

[0019] In the formula, $D$ represents the topological distance matrix; the element $d_{ij}$ indicates the shortest path length between the $i$-th node $v_i$ and the $j$-th node $v_j$; $\mathbb{R}^{N \times N}$ represents the set of $N \times N$ real matrices.

[0020] Furthermore, in step S1, an enhanced text description $t_i$ is constructed for each component, including system information, functional description, relationships between upstream and downstream components, and typical fault phenomena. Using a pre-trained semantic embedding model $f_{\mathrm{emb}}$, each enhanced text is encoded to obtain a component semantic vector, which is represented as follows:

[0021] $$e_i = f_{\mathrm{emb}}(t_i) \in \mathbb{R}^{d}$$

[0022] In the formula, $e_i$ indicates the semantic vector of the $i$-th component;

[0023] All component semantic vectors are stacked to form the component semantic vector matrix, which is represented as follows:

[0024] $$M = [e_1, e_2, \dots, e_N]^{\top} \in \mathbb{R}^{N \times d}$$

[0025] In the formula, $M$ represents the component semantic vector matrix; $e_i$ represents the component semantic vector corresponding to node $v_i$, and the semantic vectors $e_i$ correspond one-to-one with the nodes $v_i$ in the node set $V$, so the number of rows $N$ of $M$ equals the total number of nodes in $V$; $\mathbb{R}^{N \times d}$ represents the set of $N \times d$ real matrices.

[0026] Furthermore, in step S2, the input forklift fault description text $q$ is received. The text is then segmented and matched against keywords to obtain:

[0027] The number of hits $n_h$, which is the total number of hit component names, system names, and condition trigger words.

[0028] The number of ambiguous words $n_f$ in the text and the total word count $n_w$, from which the proportion of fuzzy words $r_f$ is obtained as follows:

[0029] $$r_f = \frac{n_f}{n_w}$$

[0030] The text length feature $L$, from which the length normalization index $\ell$ is obtained using a predefined normalization function $\phi$ as follows:

[0031] $$\ell = \phi(L)$$

[0032] The number of hits $n_h$, the proportion of fuzzy words $r_f$, and the length normalization index $\ell$ are linearly weighted to obtain the semantic clarity score $C$, through the following steps:

[0033] First, the number of hits is normalized into a hit ratio $r_h$, represented as follows:

[0034] $$r_h = \min\!\left(\frac{n_h}{n_{\max}},\, 1\right)$$

[0035] In the formula, $n_{\max}$ is the preset maximum number of hits, used to normalize the number of hits to the $[0, 1]$ interval. The semantic clarity score is then calculated according to the following formula:

[0036] $$C = w_h\, r_h + w_f\,(1 - r_f) + w_\ell\, \ell$$

[0037] In the formula, $w_h$ indicates the feature weight of the keyword hit ratio for components, systems, and operating conditions; $w_f$ indicates the feature weight of the proportion of fuzzy expressions; $w_\ell$ indicates the feature weight of the normalized text length; $w_h, w_f, w_\ell \ge 0$ and $w_h + w_f + w_\ell = 1$.

[0038] Furthermore, in step S4, the semantic embedding model $f_{\mathrm{emb}}$ is used to encode the fault description text $q$ into a fault semantic vector $e_q$. The cosine similarity between the fault semantic vector and each component semantic vector $e_i$ is calculated, yielding the semantic similarity sequence $\{s_i\}$, represented as follows:

[0039] $$s_i = \frac{e_q^{\top} e_i}{\lVert e_q \rVert\, \lVert e_i \rVert}, \quad i = 1, 2, \dots, N$$

[0040] In the formula, $e_q = f_{\mathrm{emb}}(q)$ represents the fault semantic vector obtained by encoding the fault description text with the semantic embedding model; $i$ represents the index of a node component in the node set $V$, where $i$ is a positive integer and $1 \le i \le N$; $v_i$ represents the node with index $i$;

[0041] The component with the highest similarity in the semantic similarity sequence is selected as the semantic anchor component $v_{i^{*}}$, represented as follows:

[0042] $$i^{*} = \arg\max_{1 \le i \le N} s_i$$

[0043] The topological distances $d_{i^{*}i}$ from the anchor component $v_{i^{*}}$ to each component $v_i$ are extracted from the topological distance matrix, and the topological affinity sequence $\{a_i\}$ is calculated as an exponential decay function of topological distance with decay coefficient $\lambda$, represented as follows:

[0044] $$a_i = \exp\!\left(-\lambda\, d_{i^{*} i}\right), \quad \lambda > 0$$

[0045] Using the semantic kernel weight parameter $\alpha \in [0, 1]$, the hybrid topological and semantic kernel score $K_i$ is constructed as a convex combination of the semantic similarity $s_i$ and the topological affinity $a_i$ of each component, represented as follows:

[0046] $$K_i = \alpha\, s_i + (1 - \alpha)\, a_i$$

[0047] All components are sorted in descending order of $K_i$, and a preset number of components is selected from the top of the sorted results to form the set of suspected faulty components.

[0048] Furthermore, in step S3, the semantic kernel weight parameter is given by a mapping function $\alpha = g(C)$ of the semantic clarity score $C$. The parameters of the mapping function are obtained offline on a historical fault sample set by a Bayesian optimization algorithm based on Gaussian process regression, with the weighted sum of the Top-K hit rate and the mean reciprocal rank (MRR) as the optimization objective function.

[0049] Furthermore, in step S5, the fault description text, the set of suspected faulty components, the system to which each component belongs, the topological distance of each component relative to the semantic anchor component, and the corresponding hybrid kernel score $K_i$ are arranged into a hierarchical, structured prompt. This prompt is then input into a locally deployed large language model. Under the prior constraints provided by the topology graph and semantic embedding, the large language model generates a fault mechanism analysis for the fault description, a priority troubleshooting order based on the mechanism path, and specific maintenance suggestions, which are output as the diagnostic results.

[0050] The present invention also relates to a computer-readable storage medium having program instructions stored thereon, which, when executed by a processor, implement the above-described method.

[0051] The technical solution of the present invention also relates to a forklift fault diagnosis system based on a large language model of topological graph and semantic embedding fusion, the system including a computer device, the computer device including the aforementioned computer-readable storage medium.

[0052] The technical solution of the present invention also relates to a forklift fault diagnosis system based on a large language model using topological graphs and semantic embedding fusion, the system comprising:

[0053] A dual-space construction module is used to build the device's mechanistic topology map and component semantic vector matrix;

[0054] The semantic awareness module is used to quantify the semantic clarity score of the fault description text input by the user;

[0055] An adaptive inference engine is used to dynamically calculate semantic kernel weight parameters based on semantic clarity scores, and to score and rank the fault relevance of all parts in the map based on a hybrid topological and semantic kernel function, thereby filtering a set of suspected faulty parts.

[0056] The generative diagnostic module encapsulates fault descriptions and sets of suspected faulty components into structured prompt words, driving a large language model to generate diagnostic reports.

[0057] The beneficial effects of this invention are as follows:

[0058] This invention relates to a forklift fault diagnosis method and system based on a large language model that integrates topological graphs and semantic embedding. It utilizes semantic associations and mechanistic topology at the kernel function level and improves the accuracy and robustness of forklift fault diagnosis through a clarity-driven adaptive weighting mechanism.

[0059] This invention is robust and has excellent noise resistance. It integrates topological graphs and semantic embedding within a unified framework, and the weights are adaptively adjusted by clarity. Compared with methods that rely solely on semantics or fixed weights, it can still maintain a high recall rate for key components even when faced with low-quality input conditions such as vague descriptions from frontline personnel or missing component names.

[0060] This invention offers precise fault location and a rigorous metric framework. It achieves mathematical alignment between the non-Euclidean topological space and the Euclidean semantic space through heterogeneous hybrid kernel functions. On the forklift fault sample set, its Top-K hit rate and mean reciprocal rank (MRR) are superior to existing baseline methods, significantly improving the accuracy of fault location.

[0061] This invention is highly interpretable and physically and logically consistent. It significantly reduces LLM hallucination through topological constraint prompts, ensuring that diagnostic recommendations conform to the physical structure and causal transmission logic of the device.

[0062] This invention offers flexible engineering deployment and controllable data privacy. Its system supports full-process localized deployment, avoiding the security risks of sending industrial data to the cloud. Furthermore, it adopts a decoupled architecture design, which only requires replacing the underlying graph to achieve low-cost migration and reuse across devices.

Attached Figure Description

[0063] Figure 1 is a flowchart of a method according to an embodiment of the present invention.

[0064] Figure 2 is a block diagram illustrating the principle of heterogeneous dual space and topological and semantic hybrid kernel computation according to an embodiment of the present invention.

Detailed Implementation

[0065] The following will provide a clear and complete description of the concept, specific structure, and technical effects of the present invention in conjunction with the embodiments and accompanying drawings, so as to fully understand the purpose, solution, and effects of the present invention.

[0066] It should be noted that, unless otherwise specified, when a feature is referred to as "fixed" or "connected" to another feature, it can be directly fixed or connected to the other feature, or indirectly fixed or connected to the other feature. The singular forms "a," "said," and "the" used herein are also intended to include the plural forms, unless the context clearly indicates otherwise. Furthermore, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in this specification is for the purpose of describing particular embodiments only and not for limiting the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.

[0067] It should be understood that although the terms first, second, third, etc., may be used to describe various elements in this disclosure, these elements should not be limited to these terms. These terms are used only to distinguish elements of the same type from one another. For example, a first element may also be referred to as a second element without departing from the scope of this disclosure, and similarly, a second element may also be referred to as a first element. Any and all instances or exemplary language (“e.g.,” “such as,” etc.) provided herein are intended only to better illustrate embodiments of the invention and, unless otherwise required, do not impose a limitation on the scope of the invention.

[0068] Referring to Figures 1 and 2, in some embodiments, the forklift fault diagnosis method based on a large language model using topological graphs and semantic embedding fusion according to the present invention includes at least the following steps:

[0069] S1. Obtain the offline constructed mechanism topology map and component semantic vector matrix;

[0070] Specifically, the mechanism topology map is constructed based on the physical connection relationships of the forklift components, and the topological distance matrix is calculated. At the same time, the component semantic vector matrix is obtained by encoding the enhanced text of each component.

[0071] S2. Receive the natural language fault description input by the user during online diagnosis, extract component features, system keyword features and fuzzy word features from the natural language fault description, and calculate the semantic clarity index.

[0072] S3. Construct a dynamic mapping relationship between semantic clarity index and fusion weight; based on the clarity of natural language fault description, adaptively adjust the relative weight of semantic space relevance and mechanism topological space proximity in reasoning, so as to perform adaptive dynamic calculation of semantic and topological weights.

[0073] S4. Encode the fault description into a fault semantic vector, calculate the semantic similarity and topological affinity of each component, and linearly fuse them according to weights to obtain a comprehensive score. Sort the components to generate a sequence of suspected faulty components, so as to obtain a hybrid kernel faulty component ranking of topological and semantic components.

[0074] S5. Assemble the fault description, suspected fault component sequence and their topological relationship into a structured input, input it into the large language model to generate forklift fault diagnosis conclusions and troubleshooting suggestions, so as to obtain the large language model diagnosis results.

[0075] To address the limitations of existing fault diagnosis systems for complex electromechanical systems, this invention proposes a novel fault diagnosis method that deeply integrates the graph structure prior of mechanistic topology with the semantic embedding representation of a large language model, and achieves a unity of physical interpretability and semantic understanding through a formalized heterogeneous hybrid kernel function and adaptive weight adjustment mechanism.

[0076] The fault diagnosis method of this invention integrates topological graphs and semantic embeddings through a unified framework, and adaptively adjusts weights driven by clarity, thereby exhibiting excellent robustness and noise resistance, maintaining high recall even with low-quality input. This invention also achieves mathematical alignment between the topological and semantic spaces through heterogeneous hybrid kernel functions, resulting in accurate fault location and a rigorous measurement system, with evaluation metrics superior to existing methods. Furthermore, this invention introduces topological constraint prompts to enhance interpretability and ensure the physical and logical consistency of diagnostic suggestions. The system supports end-to-end local deployment to guarantee data privacy and security, and employs a decoupled architecture design, enabling low-cost migration and reuse across devices simply by replacing the underlying graph.

[0077] In some embodiments of this invention, a mechanistic topology map and a component semantic vector matrix are constructed. Specifically, a mechanistic topology map is constructed based on the physical connection relationships of the forklift's hydraulic, traveling, steering, and electrical systems, and a topological distance matrix representing physical reachability is generated using a shortest path algorithm. Simultaneously, deep encoding is performed on the enhanced text of components, including system affiliation, functional definitions, and typical failure modes, to construct a high-dimensional component semantic vector matrix, achieving a dual representation of the equipment's physical mechanism and semantic knowledge.

[0078] In some specific embodiments of the present invention, the mechanistic topology graph is constructed based on the physical connection relationships of device components, mapping the components to a set of graph nodes $V$, represented as follows:

[0079] $$V = \{v_1, v_2, \dots, v_N\}$$

[0080] In the formula, $v_i$ indicates the $i$-th graph node, and $N$ represents the total number of nodes in the node set $V$;

[0081] The physical connections between components are denoted as a weighted edge set $E$, and a weighted topological graph $G = (V, E)$ is built with $V$ as nodes and $E$ as edges. Based on this mechanism topology graph, the path length between any two nodes is calculated using a shortest path algorithm, resulting in a topological distance matrix, which is represented as follows:

[0082] $$D = [d_{ij}] \in \mathbb{R}^{N \times N}$$

[0083] In the formula, $D$ represents the topological distance matrix; the element $d_{ij}$ indicates the shortest path length between the $i$-th node $v_i$ and the $j$-th node $v_j$; $\mathbb{R}^{N \times N}$ represents the set of $N \times N$ real matrices.
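As a non-limiting illustration of how such a topological distance matrix could be computed, the following Python sketch runs the Floyd-Warshall all-pairs shortest path algorithm on a small weighted graph. The component names, the edge weights, the treatment of connections as bidirectional, and the choice of Floyd-Warshall are all assumptions made for the example; the description only requires some shortest path algorithm.

```python
import math

def topological_distance_matrix(nodes, weighted_edges):
    """All-pairs shortest path lengths (Floyd-Warshall) on an undirected weighted graph."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Start with infinity everywhere except zero on the diagonal.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        i, j = index[u], index[v]
        # Physical connections are treated as bidirectional (an assumption).
        dist[i][j] = min(dist[i][j], w)
        dist[j][i] = min(dist[j][i], w)
    # Relax every pair (i, j) through every intermediate node k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical hydraulic sub-graph; names and weights are illustrative only.
NODES = ["hydraulic_pump", "control_valve", "lift_cylinder", "steering_cylinder"]
EDGES = [
    ("hydraulic_pump", "control_valve", 1.0),
    ("control_valve", "lift_cylinder", 1.0),
    ("control_valve", "steering_cylinder", 2.0),
]
D = topological_distance_matrix(NODES, EDGES)
```

In this toy graph the lift cylinder and the steering cylinder are not directly connected, yet the matrix records a finite distance between them through the shared control valve, which is the kind of long-range mechanistic relation the topological space is meant to capture.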

[0084] Furthermore, an enhanced text description $t_i$ is constructed for each component, including system information, functional specifications, relationships with upstream and downstream components, and typical fault phenomena. Using a pre-trained semantic embedding model $f_{\mathrm{emb}}$, each enhanced text is encoded to obtain a component semantic vector, which is represented as follows:

[0085] $$e_i = f_{\mathrm{emb}}(t_i) \in \mathbb{R}^{d}$$

[0086] In the formula, $e_i$ indicates the semantic vector of the $i$-th component;

[0087] The component semantic vectors are stacked to form a component semantic vector matrix, which is represented as follows:

[0088] $$M = [e_1, e_2, \dots, e_N]^{\top} \in \mathbb{R}^{N \times d}$$

[0089] In the formula, $M$ represents the component semantic vector matrix; $e_i$ represents the component semantic vector corresponding to node $v_i$, and the semantic vectors $e_i$ correspond one-to-one with the nodes $v_i$ in the node set $V$, so the number of rows $N$ of $M$ equals the total number of nodes in $V$; $\mathbb{R}^{N \times d}$ represents the set of $N \times d$ real matrices.
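The construction of the enhanced texts and the stacked matrix can be sketched as follows. This is only an illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the pre-trained semantic embedding model, which the description does not name, and the component records and field names are invented for the example.

```python
import hashlib
import math

DIM = 64  # embedding dimension d (illustrative)

def toy_embed(text, dim=DIM):
    """Stand-in for a pre-trained embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def enhanced_text(component):
    """Concatenate system, function, neighbors, and typical faults into one text."""
    return (
        f"System: {component['system']}. "
        f"Function: {component['function']}. "
        f"Connected to: {', '.join(component['neighbors'])}. "
        f"Typical faults: {', '.join(component['faults'])}."
    )

# Hypothetical component records; the order must match the node order of the graph.
COMPONENTS = [
    {"name": "hydraulic_pump", "system": "hydraulic",
     "function": "supplies pressurized oil",
     "neighbors": ["control_valve"],
     "faults": ["low pressure", "abnormal noise"]},
    {"name": "lift_cylinder", "system": "hydraulic",
     "function": "raises and lowers the mast",
     "neighbors": ["control_valve"],
     "faults": ["slow lifting", "drifting down under load"]},
]

# Row i of M is the semantic vector of component i, giving an N x d matrix.
M = [toy_embed(enhanced_text(c)) for c in COMPONENTS]
```

In a real deployment the stand-in would be replaced by whichever locally hosted embedding model is chosen; the only property the later steps rely on is that row $i$ of the matrix corresponds to node $v_i$.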

[0090] In some embodiments, the present invention performs multi-dimensional quantification of the semantic clarity of the fault description. Specifically, after the user's natural language fault description is received, the hit density of domain entity keywords, the proportion of fuzzy modifiers, and the normalized text length are extracted in real time through feature engineering. A semantic clarity score that represents the uncertainty of the input information is then calculated, providing a quantitative basis for adjusting the subsequent reasoning strategy.

[0091] In some specific embodiments of the present invention, a forklift fault description text $q$ input by the user is received. The text is first segmented and matched against keywords to obtain:

[0092] ① The total number $n_h$ of hit component names, system names, and condition trigger words;

[0093] ② The number of vague terms $n_f$ in the text and the total word count $n_w$, from which the proportion of fuzzy words $r_f$ is obtained as follows:

[0094] $$r_f = \frac{n_f}{n_w}$$

[0095] ③ The text length feature $L$, from which the length normalization index $\ell$ is obtained using a normalization function $\phi$ as follows:

[0096] $$\ell = \phi(L)$$

[0097] Then, the number of hits $n_h$, the proportion of fuzzy words $r_f$, and the length index $\ell$ are linearly weighted to obtain the semantic clarity score $C$. Specifically, the number of hits is first normalized into a hit ratio $r_h$, represented as follows:

[0098] $$r_h = \min\!\left(\frac{n_h}{n_{\max}},\, 1\right)$$

[0099] In the formula, $n_{\max}$ is the preset maximum number of hits, used to normalize the number of hits to the $[0, 1]$ interval. The semantic clarity score is then calculated using the following formula:

[0100] $$C = w_h\, r_h + w_f\,(1 - r_f) + w_\ell\, \ell$$

[0101] In the formula, $w_h$ indicates the feature weight of the keyword hit ratio for components and systems; $w_f$ indicates the feature weight of the proportion of fuzzy expressions; $w_\ell$ indicates the feature weight of the normalized text length; $w_h, w_f, w_\ell \ge 0$ and $w_h + w_f + w_\ell = 1$. The resulting semantic clarity score $C$ comprehensively reflects the professionalism and clarity of the fault description.
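A minimal sketch of the clarity scoring is given below. Several details are assumptions rather than part of the description: the keyword and fuzzy-word lists, whitespace tokenization with substring keyword matching, the clipped linear length normalization `min(n_words / max_len, 1)`, the use of `1 - fuzzy_ratio` so that vaguer text lowers the score, and the example weights.

```python
def clarity_score(text, keywords, fuzzy_words, n_max=5, max_len=30,
                  w_hit=0.5, w_fuzzy=0.3, w_len=0.2):
    """Linear weighting of hit ratio, (1 - fuzzy ratio), and normalized length."""
    lowered = text.lower()
    tokens = lowered.split()
    n_words = len(tokens)
    if n_words == 0:
        return 0.0
    # Count domain keywords (component, system, condition terms) found in the text.
    n_hits = sum(1 for kw in keywords if kw in lowered)
    # Count vague words, ignoring trailing punctuation.
    n_fuzzy = sum(1 for t in tokens if t.strip(".,!?;:") in fuzzy_words)
    hit_ratio = min(n_hits / n_max, 1.0)        # clipped to [0, 1]
    fuzzy_ratio = n_fuzzy / n_words
    length_index = min(n_words / max_len, 1.0)  # assumed normalization function
    return w_hit * hit_ratio + w_fuzzy * (1.0 - fuzzy_ratio) + w_len * length_index

# Hypothetical vocabularies for illustration.
KEYWORDS = {"lift cylinder", "hydraulic", "under load", "mast"}
FUZZY = {"something", "somehow", "maybe", "weird"}

clear = clarity_score(
    "Lift cylinder drifts down under load, hydraulic oil leaking at the mast",
    KEYWORDS, FUZZY)
vague = clarity_score(
    "Something is weird, maybe it somehow feels off",
    KEYWORDS, FUZZY)
```

With these assumed settings the precise description scores well above the vague one, which is the ordering that the later weight mapping depends on.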

[0102] In some embodiments of this invention, the semantic and topological weights are calculated adaptively and dynamically. Specifically, a nonlinear monotonic mapping based on the clarity score is established to dynamically solve for the semantic kernel weight parameter. The semantic weight is automatically increased when the description is precise, to leverage the direct matching advantage of the vector space model (VSM), while the topological weight is automatically increased when the description is ambiguous or terms are missing, so that physical proximity compensates in reasoning. This achieves adaptive dynamic fusion of the two types of prior knowledge during the reasoning process.
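One possible shape for such a nonlinear monotonic mapping is a bounded logistic curve, sketched below. The logistic form and the parameter names `steepness`, `midpoint`, `alpha_min`, and `alpha_max` are assumptions for illustration; the description does not specify the functional form.

```python
import math

def semantic_weight(clarity, steepness=8.0, midpoint=0.5,
                    alpha_min=0.2, alpha_max=0.9):
    """Map a clarity score in [0, 1] to a semantic kernel weight.

    The output rises monotonically with clarity and stays strictly
    between alpha_min and alpha_max.
    """
    gate = 1.0 / (1.0 + math.exp(-steepness * (clarity - midpoint)))
    return alpha_min + (alpha_max - alpha_min) * gate

# Weight given to the semantic kernel at a few clarity levels;
# the topological kernel receives the remainder, 1 - weight.
LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]
WEIGHTS = [semantic_weight(c) for c in LEVELS]
```

Bounding the weight away from 0 and 1 is a design choice in this sketch: it keeps both the semantic and the topological terms active even for extreme clarity scores.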

[0103] In some specific embodiments of the present invention, the semantic kernel weight parameter is given by a mapping function $\alpha = g(C)$ of the semantic clarity score $C$. The parameters of the mapping function are obtained offline on a historical fault sample set by a Bayesian optimization algorithm based on Gaussian process regression, with the weighted sum of the Top-K hit rate and the mean reciprocal rank (MRR) as the optimization objective function.
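The optimization objective can be illustrated with the sketch below, which computes the Top-K hit rate, the mean reciprocal rank, and their weighted sum for a set of ranked outputs. The sample rankings, the value of `k`, and the equal weighting are assumptions; the Bayesian optimization loop itself is not shown and would treat `objective` as a black-box function of the mapping parameters.

```python
def top_k_hit_rate(rankings, truths, k):
    """Fraction of samples whose true faulty component is in the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)

def mean_reciprocal_rank(rankings, truths):
    """Average of 1 / rank of the true component (0 if it is absent)."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)

def objective(rankings, truths, k=3, weight=0.5):
    """Weighted sum of Top-K hit rate and MRR; larger is better."""
    return (weight * top_k_hit_rate(rankings, truths, k)
            + (1.0 - weight) * mean_reciprocal_rank(rankings, truths))

# Hypothetical ranked outputs for three historical fault samples.
RANKINGS = [
    ["pump", "valve", "cylinder"],
    ["valve", "pump", "cylinder"],
    ["cylinder", "valve", "pump"],
]
TRUTHS = ["pump", "pump", "pump"]
score = objective(RANKINGS, TRUTHS, k=2, weight=0.5)
```

Here the true component is ranked first, second, and third respectively, so the Top-2 hit rate is 2/3 and the MRR is 11/18, giving an objective of 23/36.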

[0104] In some embodiments of the present invention, the present invention generates a fault source ranking based on a heterogeneous hybrid kernel. Specifically, the fault description is mapped to a query vector to determine semantic anchors, and the topological affinity is calculated by combining the geodesic distance of the anchors in the topological graph. A topological-semantic heterogeneous hybrid kernel function is constructed through a convex combination form to calculate the comprehensive relevance score of all nodes in the graph, and a set of suspicious components containing potential fault sources and their propagation paths is selected accordingly.

[0105] In some specific embodiments of the present invention, the semantic embedding model $f_{\mathrm{emb}}$ described in step S1 is used to encode the fault description text $q$ into a fault semantic vector $e_q$. The cosine similarity between the fault semantic vector and each component semantic vector $e_i$ is calculated, yielding the semantic similarity sequence $\{s_i\}$, represented as follows:

[0106] $$s_i = \frac{e_q^{\top} e_i}{\lVert e_q \rVert\, \lVert e_i \rVert}, \quad i = 1, 2, \dots, N$$

[0107] In the formula, $e_q = f_{\mathrm{emb}}(q)$ represents the fault semantic vector obtained by encoding the fault description text with the semantic embedding model; $i$ represents the index of a node component in the node set $V$, where $i$ is a positive integer and $1 \le i \le N$; $v_i$ represents the node with index $i$;

[0108] The component with the highest similarity in the semantic similarity sequence is selected as the semantic anchor component $v_{i^{*}}$, represented as follows:

[0109] $$i^{*} = \arg\max_{1 \le i \le N} s_i$$

[0110] Extract the topological distances from the anchor point component to each component based on the topological distance matrix. And calculate the topological affinity sequence based on the exponential decay function of topological distance. It is represented as follows:

[0111]

[0112] Using the semantic kernel weight parameters from step S3 Constructing a hybrid topological and semantic kernel score using convex combination form. It is represented as follows:

[0113]

[0114] For all components according to Sort the components in descending order and select a preset number of components from the sorted results to form a set of suspected faulty components.
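The ranking procedure described in the preceding paragraphs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `decay` rate of the exponential function, and the toy vectors and distance matrix are all hypothetical, and the embedding step is replaced by hand-written two-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_components(q, vectors, dist, alpha, decay=1.0, top_k=3):
    """Rank components by a convex combination of two scores.

    q       -- fault semantic vector
    vectors -- one semantic vector per component
    dist    -- topological distance matrix (list of lists)
    alpha   -- semantic kernel weight in [0, 1]
    decay   -- assumed rate of the exponential decay (hypothetical)
    """
    n = len(vectors)
    sims = [cosine(q, e) for e in vectors]
    # Semantic anchor: the component most similar to the description.
    anchor = max(range(n), key=lambda i: sims[i])
    # Topological affinity decays exponentially with distance to the anchor.
    affinity = [math.exp(-decay * dist[anchor][i]) for i in range(n)]
    scores = [alpha * s + (1.0 - alpha) * t for s, t in zip(sims, affinity)]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return anchor, order[:top_k], scores


# Toy data: four components on a chain 0 - 1 - 2 - 3.
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
anchor, suspects, scores = rank_components([0.9, 0.1], vectors, dist, alpha=0.5)
```

With `alpha` close to 1 the ranking follows semantic similarity alone; with `alpha` close to 0 it follows proximity to the anchor in the graph.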

[0115] In some embodiments of the present invention, generative diagnosis is performed by a large language model under topological constraints. Specifically, the fault description, the selected set of suspicious components, and their corresponding topological paths are encapsulated into a structured prompt template and input into a locally deployed large language model. Using the generative reasoning capability of the model, under the dual constraints of physical topological reachability and semantic relevance, hallucinated content that violates the physical mechanism is eliminated, and a diagnostic report containing mechanism analysis, troubleshooting order, and maintenance suggestions is generated.

[0116] In some specific embodiments of the present invention, the fault description text, the set of suspected faulty components, the system to which each component belongs, each component's topological distance relative to the semantic anchor component, and its corresponding hybrid kernel score are arranged into a hierarchical, structured prompt. The prompt is then input into a locally deployed large language model. Under the prior constraints provided by the topology graph and semantic embedding, the large language model generates a fault mechanism analysis for the fault description, a priority troubleshooting order based on the mechanism path, and specific maintenance suggestions, which are then output as diagnostic results.
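As an illustration of how such a structured prompt might be assembled, the following Python sketch builds the text that would be passed to the locally deployed model. The section headings, field layout, function name, and sample components are assumptions made for this example; the source does not specify the template, and the call to the language model itself is omitted.

```python
def build_prompt(fault_text, anchor_name, suspects):
    """Assemble a hierarchical prompt from the ranked suspects.

    suspects -- list of dicts with keys 'name', 'system',
                'distance' (to the anchor) and 'score' (hybrid kernel score)
    """
    lines = [
        "## Fault description",
        fault_text,
        "",
        "## Semantic anchor component",
        anchor_name,
        "",
        "## Suspected faulty components (ranked)",
    ]
    for rank, comp in enumerate(suspects, start=1):
        lines.append(
            f"{rank}. {comp['name']} | system: {comp['system']} | "
            f"topological distance to anchor: {comp['distance']} | "
            f"hybrid kernel score: {comp['score']:.3f}"
        )
    lines += [
        "",
        "## Task",
        "Provide fault cause analysis and troubleshooting steps.",
    ]
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
prompt = build_prompt(
    "The fork seems to drop slowly under load.",
    "lift cylinder",
    [
        {"name": "lift cylinder", "system": "hydraulic", "distance": 0, "score": 0.912},
        {"name": "control valve", "system": "hydraulic", "distance": 1, "score": 0.655},
    ],
)
```

Keeping the distance and score next to each component name gives the model the topological context it is asked to respect when ordering the troubleshooting steps.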

[0117] Referring to Figures 1 and 2, in some embodiments, the forklift fault diagnosis system based on topological graph and semantic embedding fusion according to the present invention includes a dual-space construction module, a semantic perception module, an adaptive inference engine, and a generative diagnosis module. The dual-space construction module is used to establish the device's mechanistic topological graph and component semantic vector matrix; the semantic perception module is used to quantify the semantic clarity score of the fault description text input by the user; the adaptive inference engine is used to dynamically calculate the semantic kernel weight parameter based on the semantic clarity score, and to score and rank the fault relevance of all components in the graph based on the topological and semantic hybrid kernel function, thereby filtering a set of suspected faulty components; the generative diagnosis module is used to encapsulate the fault description and the set of suspected faulty components into a structured prompt, driving the large language model to generate a diagnostic report.

[0118] In some embodiments of the present invention, Figure 2 illustrates the architecture of the heterogeneous dual space and the computing principle of the topological and semantic hybrid kernel. Specifically, the heterogeneous dual-space and hybrid kernel computation of this invention includes at least the following core functional modules:

[0119] F1. Construct a semantic embedding space using the dual-space construction module, use a pre-trained semantic model to encode the fault description $x$ input by the user as a high-dimensional query vector $\mathbf{q}$, and calculate its cosine similarity $s_i$ with the vector $\mathbf{e}_i$ of each component in the component library, which provides a relevance metric in the semantic dimension;

[0120] F2. Run the semantic clarity assessment unit in the semantic perception module to quantitatively assess the ambiguity of the fault description and dynamically output the adaptive weight parameter $\alpha$ to adjust the inference strategy;

[0121] F3. Construct a mechanistic topology space using the dual-space construction module and, based on the shortest path matrix $D$ built offline, extract the physical topological distance $d_i$ from the semantic anchor to the target component and convert it into a topological affinity $t_i$ through an exponential decay function, which provides an accessibility metric in the physical dimension;

[0122] F4. Perform the topological and semantic hybrid kernel computation through the adaptive inference engine, which receives the above semantic similarity, topological affinity, and adaptive weight and completes the weighted fusion of heterogeneous features through the convex combination formula;

[0123] F5. Generate a comprehensive score sequence through the generative diagnosis module. Based on the scores calculated by the hybrid kernel, the components in the entire graph are sorted in descending order, and a set of suspected faulty components with high confidence is selected, providing structured input for subsequent large model inference.

[0124] In some embodiments of this invention, the experimental environment and parameters for the examples of this invention are as follows: an 80-core server-grade CPU with a main frequency of 2.10 GHz is used, equipped with four NVIDIA GeForce RTX 2080 Ti graphics cards, running a Linux operating system, and offline evaluation of topology map construction, semantic embedding calculation, and diagnostic algorithms is completed in a PyTorch-based machine learning environment (torch 2.5.1, torchaudio 2.5.1, torchvision 0.20.1). The forklift component information, fault samples, and evaluation datasets used in this invention are all collected, organized, and labeled manually in actual application scenarios.

[0125] This is illustrated with a specific embodiment. The offline construction of the topology map and semantic embedding of the present invention includes at least the following steps:

[0126] A1. Collect the complete vehicle drawings and maintenance manuals for the target forklift model, establish a standard parts list, and complete the system division.

[0127] A2. Establish the node set $V$ and the edge set $E$ based on the connection relationships of the hydraulic circuit, transmission chain, and signal harness. The edge weights can be set uniformly or adjusted according to differences in connection strength.

[0128] A3. Use Dijkstra, Floyd-Warshall, or another equivalent shortest-path algorithm to calculate the shortest paths for all node pairs in one pass and obtain the topological distance matrix $D$.

[0129] A4. Write an enhanced description for each component, including its functionality and fault characteristics, invoke the semantic embedding model BAAI/bge-small-zh-v1.5 to generate the feature vectors $\mathbf{e}_i$, and stack them to form the semantic matrix $\mathbf{M}$.
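Steps A2 and A3 can be illustrated with a short Floyd-Warshall sketch in Python. The example assumes an undirected graph with uniform edge weights of 1.0, and the four component names in the comment are hypothetical; a real component graph would be derived from the vehicle drawings mentioned in A1.

```python
import math


def topological_distance_matrix(n, edges):
    """All-pairs shortest path lengths via Floyd-Warshall.

    n     -- number of component nodes, indexed 0 .. n-1
    edges -- iterable of (i, j, weight); treated as undirected (assumption)
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        dist[i][j] = min(dist[i][j], w)
        dist[j][i] = min(dist[j][i], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical hydraulic circuit:
# 0 pump, 1 control valve, 2 lift cylinder, 3 tilt cylinder
edges = [(0, 1, 1.0), (1, 2, 1.0), (1, 3, 1.0)]
D = topological_distance_matrix(4, edges)
```

Because the matrix is computed once offline, online diagnosis only needs a row lookup for the anchor component. Unreachable pairs keep a distance of infinity, which an exponential decay maps to an affinity of zero.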

[0130] This is illustrated with a specific embodiment. The fault description semantic clarity and adaptive weighting of the present invention includes at least the following steps:

[0131] B1. Maintain a dictionary of component names, system names, and operating condition trigger words in the system, and build a dictionary of fuzzy expressions that include words like "seems," "feels," "somewhat," and "occasionally" to express uncertainty.

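[0132] B2. When the user inputs a fault description, word segmentation and keyword matching are performed to obtain the number of keyword hits, the number of fuzzy words, and the total word count, from which the hit ratio, the proportion of fuzzy words, and the length normalization index are calculated according to the aforementioned formulas.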

[0133] B3. Set the relevant parameters for the instance as shown in Table 1.

[0134] Table 1. Core parameter configuration of the adaptive-fusion large language model for forklift fault diagnosis

[0135]
| Parameter category | Parameter name | Parameter value | Function description |
| --- | --- | --- | --- |
| Adaptive fusion weight mapping parameters | $\alpha_{\min}$ | 0.2 | Lower bound of the semantic weight |
| Adaptive fusion weight mapping parameters | $\alpha_{\max}$ | 0.9 | Upper bound of the semantic weight |
| Semantic clarity feature weight parameters | $w_1$ | 0.5 | Feature weight of the component/system keyword hit rate |
| Semantic clarity feature weight parameters | $w_2$ | 0.3 | Feature weight of the proportion of fuzzy expressions |
| Semantic clarity feature weight parameters | $w_3$ | 0.2 | Feature weight of the normalized text length |

[0136] B4. Calculate the semantic clarity score. For different forklift models or different fault datasets, the parameters in the table can be automatically calibrated using methods such as Bayesian optimization to achieve better diagnostic performance.
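The clarity scoring and weight mapping of steps B1 to B3 can be sketched in Python using the parameter values from Table 1. Several details here are assumptions, because this section does not restate the formulas: the fuzzy-word proportion is taken to lower the score through the term `1 - fuzzy_ratio`, the length index is a capped linear ratio, the nonlinear monotonic mapping is a logistic curve, and `max_hits`, `ref_len`, `steepness`, and `midpoint` are hypothetical values. Tokens are passed in pre-segmented, so no word-segmentation library is used.

```python
import math

# Values from Table 1.
W_HIT, W_FUZZY, W_LEN = 0.5, 0.3, 0.2
ALPHA_MIN, ALPHA_MAX = 0.2, 0.9


def clarity_score(tokens, keywords, fuzzy_words, max_hits=3, ref_len=20):
    """Linear combination of three features, each in [0, 1] (assumed form)."""
    n_total = len(tokens)
    if n_total == 0:
        return 0.0
    n_hit = sum(1 for t in tokens if t in keywords)
    n_fuzzy = sum(1 for t in tokens if t in fuzzy_words)
    hit_ratio = min(n_hit / max_hits, 1.0)
    fuzzy_ratio = n_fuzzy / n_total
    length_index = min(n_total / ref_len, 1.0)
    return W_HIT * hit_ratio + W_FUZZY * (1.0 - fuzzy_ratio) + W_LEN * length_index


def semantic_weight(clarity, steepness=8.0, midpoint=0.5):
    """Monotonic map from clarity to a weight in (ALPHA_MIN, ALPHA_MAX).

    A logistic curve is used here purely as one possible nonlinear
    monotonic mapping; the source does not name the function.
    """
    s = 1.0 / (1.0 + math.exp(-steepness * (clarity - midpoint)))
    return ALPHA_MIN + (ALPHA_MAX - ALPHA_MIN) * s


keywords = {"cylinder", "hydraulic", "lifting"}
fuzzy_words = {"seems", "feels", "somewhat", "occasionally"}
clear = clarity_score(["lift", "cylinder", "leaks", "oil", "when", "lifting"],
                      keywords, fuzzy_words)
vague = clarity_score(["seems", "somewhat", "noisy", "occasionally"],
                      keywords, fuzzy_words)
```

A precise description yields a higher clarity score and therefore a larger semantic weight, while a vague one shifts weight toward the topological term.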

[0137] This is illustrated with a specific embodiment. The topological and semantic hybrid kernel reasoning and large language model diagnosis of the present invention includes at least the following steps:

[0138] C1. Call the embedding model to obtain the fault description vector $\mathbf{q}$ and the component vector matrix $\mathbf{M}$; calculate the semantic similarity $s_i$ for each component vector; select the semantic anchor component $a$; extract the distances $d_i$ from the topological distance matrix $D$; calculate the topological affinity $t_i$; and obtain the overall score $\kappa_i$ according to the adaptive weight $\alpha$.

[0139] C2. Sort all components by $\kappa_i$ and take the first $K$ components as the set of suspected faulty components; the mechanistic paths from the anchor to these components can optionally be extracted as explanatory subgraphs.

[0140] C3. Assemble the fault description, the anchor component, and the names, systems, topological distances, and scores of the Top-K suspicious components into a structured prompt, together with instructions such as "provide fault cause analysis and troubleshooting steps," and input it into a locally deployed large language model, such as Qwen2.5-7B-Instruct. The model generates a diagnostic report for maintenance personnel, which is presented to the user through a Streamlit visual interactive terminal.

[0141] Experimental testing and effectiveness verification were conducted on this invention. Specifically, to verify the effectiveness of the method, 30 typical forklift failure samples were selected and run on the system constructed in the above three specific embodiments. Each sample contains a natural language description and a corresponding set of target components. The evaluation metrics were the Top-5 hit rate and the mean reciprocal rank (MRR).
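For reference, the two evaluation metrics can be computed as in the Python sketch below. It assumes that a sample counts as a hit when any of its target components appears in the first K positions, and that the reciprocal rank uses the position of the first target component found; the source does not state how samples with several targets are scored. The sample rankings are invented for illustration and are not the experimental data.

```python
def top_k_hit_rate(rankings, targets, k=5):
    """Fraction of samples with at least one target in the top k."""
    hits = sum(
        1 for ranked, target in zip(rankings, targets)
        if any(c in target for c in ranked[:k])
    )
    return hits / len(rankings)


def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the first target found (0 if none is found)."""
    total = 0.0
    for ranked, target in zip(rankings, targets):
        for position, component in enumerate(ranked, start=1):
            if component in target:
                total += 1.0 / position
                break
    return total / len(rankings)


# Invented rankings and target sets, for illustration only.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
targets = [{"a"}, {"z"}, {"m"}]
hit_at_2 = top_k_hit_rate(rankings, targets, k=2)
mrr = mean_reciprocal_rank(rankings, targets)
```

The hit rate only asks whether a target appears within the cutoff, whereas MRR also rewards placing it near the top, which is why both are reported.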

[0142] Four methods are compared on the same dataset. Method 1, Semantic only: components are ranked solely by semantic similarity. Method 2, Topo only: components are ranked solely by topological affinity. Method 3, Fixed fusion: a fixed weight is used to compute the hybrid kernel score. Method 4, Clarity-adaptive fusion (this invention): the weight is calculated according to Example 2 and then used to compute the hybrid kernel score.

[0143] The results of the above four methods on 30 samples are shown in Table 2.

[0144] Table 2 Performance comparison of different methods on 30 samples (Top-5)

[0145]
| Method | Description | Top-5 accuracy | MRR |
| --- | --- | --- | --- |
| Semantic only | Semantic similarity only | 0.333 | 0.193 |
| Topo only | Topological affinity only | 0.367 | 0.178 |
| Fixed fusion | Fixed-weight fusion | 0.367 | 0.188 |
| Clarity-adaptive fusion | Adaptive fusion (this invention) | 0.933 | 0.798 |

[0146] As can be seen, the method of this invention improves the Top-5 hit rate from about 0.33–0.37 to 0.93 and the MRR from about 0.18 to about 0.80 on the same dataset, which is significantly better than semantic-only, topology-only, and fixed-weight fusion schemes. This shows that the adaptive topology-semantic fusion mechanism based on semantic clarity can effectively improve the accuracy and robustness of forklift fault diagnosis.

[0147] It should be understood that the method steps in the embodiments of the present invention can be implemented or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable storage medium. The method can use standard programming techniques. Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with the computer system. However, if necessary, the program can be implemented in assembly or machine language. In any case, the language can be a compiled or interpreted language. Furthermore, for this purpose, the program can run on a programmed application-specific integrated circuit (ASIC).

[0148] Furthermore, the procedures described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by the context. The procedures described herein (or variations and/or combinations thereof) may be executed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) that executes collectively on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.

[0149] Furthermore, the method can be implemented in any suitable type of computing platform, including but not limited to personal computers, minicomputers, mainframes, workstations, networked or distributed computing environments, standalone or integrated computer platforms, or in communication with charged particle tools or other imaging devices, etc. Aspects of the invention can be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it can be read by a programmable computer, and when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, can be transmitted via wired or wireless networks. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media comprise instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. When programmed according to the methods and techniques described in the invention, the invention may also include the computer itself.

[0150] A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data stored in non-volatile memory. The output information can also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects generated on the display.

[0151] The above description is merely a preferred embodiment of the present invention. The present invention is not limited to the above-described embodiments. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention, as long as they achieve the technical effects of the present invention by the same means, should be included within the scope of protection of the present invention. Within the scope of protection of the present invention, the technical solutions and / or implementation methods can have various modifications and variations.

Claims

1. A forklift fault diagnosis method based on a large language model fusing topological graphs and semantic embedding, characterized in that the method includes the following steps:
S1. Obtain the offline-constructed mechanism topology map and component semantic vector matrix; specifically, the mechanism topology map is constructed based on the physical connection relationships of forklift components and the topological distance matrix is calculated, and the component semantic vector matrix is obtained by encoding the enhanced text of the components;
S2. Receive the natural language fault description input by the user during online diagnosis, extract component features, system keyword features, and fuzzy word features from the natural language fault description, and calculate the semantic clarity index;
S3. Construct a dynamic mapping relationship between the semantic clarity index and the fusion weight; based on the clarity of the natural language fault description, adaptively adjust the relative weights of semantic space relevance and mechanism topological space proximity in reasoning, so as to perform adaptive dynamic calculation of the semantic and topological weights;
S4. Encode the fault description into a fault semantic vector, calculate the semantic similarity and topological affinity of each component, and linearly fuse them according to the weights to obtain a comprehensive score; sort the components to generate a sequence of suspected faulty components, so as to obtain a topological and semantic hybrid kernel ranking of faulty components;
S5. Assemble the fault description, the sequence of suspected faulty components, and their topological relationships into a structured input, and input it into the large language model to generate forklift fault diagnosis conclusions and troubleshooting suggestions, so as to obtain the large language model diagnosis results.

2. The method according to claim 1, characterized in that, in step S1, the mechanistic topology map is constructed based on the physical connection relationships of device components, wherein the component node set is represented as:
$$V = \{v_1, v_2, \dots, v_N\}$$
where $V$ represents the set of nodes in the graph; $v_i$ represents the node with index $i$ in the node set $V$, where $i$ is a positive integer and $1 \le i \le N$; and $N$ represents the total number of nodes in the node set $V$. The physical connection relationships between components are denoted as a weighted edge set $E$, and a weighted topological graph $G = (V, E)$ is built from $V$ and $E$. Based on the aforementioned mechanism topology graph, the path length between any two nodes is calculated using a shortest path algorithm, resulting in the topological distance matrix, which is represented as follows:
$$D = [D_{ij}] \in \mathbb{R}^{N \times N}$$
where $D$ represents the topological distance matrix; the element $D_{ij}$ indicates the shortest path length between the $i$-th node $v_i$ and the $j$-th node $v_j$; and $\mathbb{R}^{N \times N}$ represents an N×N dimensional real matrix.

3. The method according to claim 2, characterized in that, in step S1, an enhanced text description $c_i$ is constructed for each component, including system information, functional specifications, relationships with upstream and downstream components, and typical fault phenomena. Using a pre-trained semantic embedding model $f_{\mathrm{emb}}$, each enhanced text is encoded to obtain a component semantic vector, which is represented as follows:
$$\mathbf{e}_i = f_{\mathrm{emb}}(c_i) \in \mathbb{R}^{d}$$
where $\mathbf{e}_i$ indicates the semantic vector of the $i$-th component. All component semantic vectors are stacked to form the component semantic vector matrix, which is represented as follows:
$$\mathbf{M} = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N]^{\top} \in \mathbb{R}^{N \times d}$$
where $\mathbf{M}$ represents the component semantic vector matrix; $\mathbf{e}_i$ represents the component semantic vector corresponding to node $v_i$, and the semantic vectors correspond one-to-one with the nodes in the node set $V$, so the number of rows $N$ of $\mathbf{M}$ equals the total number of nodes in $V$; and $\mathbb{R}^{N \times d}$ represents an N×d dimensional real matrix.

4. The method according to claim 2, characterized in that, in step S2, after the forklift fault description text $x$ is received, the text is segmented and matched against keywords to obtain: the number of hits $n_{\mathrm{hit}}$, which is the total number of matched component names, system names, and condition trigger words; the number of fuzzy words $n_{\mathrm{fuzzy}}$ in the text; and the total word count $n_{\mathrm{total}}$. The proportion of fuzzy words is thus obtained as:
$$r_{\mathrm{fuzzy}} = \frac{n_{\mathrm{fuzzy}}}{n_{\mathrm{total}}}$$
The length normalization index is obtained from the text length feature $L$ using a predefined normalization function. The number of hits, the proportion of fuzzy words, and the length normalization index are linearly weighted to obtain the semantic clarity score, which includes the following steps: first, the number of hits is normalized into a hit ratio using the preset maximum number of hits $n_{\max}$, which maps the number of hits to the [0,1] interval:
$$r_{\mathrm{hit}} = \min\!\left(\frac{n_{\mathrm{hit}}}{n_{\max}},\, 1\right)$$
then the semantic clarity score is calculated as a linear weighting of these features with weights $w_1$, $w_2$, and $w_3$, where $w_1$ indicates the feature weight of the keyword hit rate for components, systems, and operating conditions; $w_2$ indicates the feature weight of the proportion of fuzzy expressions; and $w_3$ indicates the feature weight of the normalized text length.

5. The method according to claim 4, characterized in that, in step S4, the semantic embedding model encodes the fault description text $x$ as a fault semantic vector $\mathbf{q}$, and the cosine similarity between the fault semantic vector and the semantic vector $\mathbf{e}_i$ of each component is calculated to obtain the semantic similarity sequence, represented as follows:
$$s_i = \frac{\mathbf{q}^{\top}\mathbf{e}_i}{\lVert\mathbf{q}\rVert\,\lVert\mathbf{e}_i\rVert}, \quad i = 1, 2, \dots, N$$
where $\mathbf{q}$ represents the fault semantic vector obtained by encoding the fault description text using the semantic embedding model; $i$ represents the index of a node component in the node set $V$, where $i$ is a positive integer and $1 \le i \le N$; and $\mathbf{e}_i$ represents the semantic vector of the node $v_i$ with index $i$. The component with the highest similarity in the semantic similarity sequence is selected as the semantic anchor component, represented as follows:
$$a = \arg\max_{1 \le i \le N} s_i$$
The topological distances $d_i = D_{ai}$ from the anchor component to each component are extracted from the topological distance matrix $D$, and the topological affinity sequence is calculated using an exponential decay function of the topological distance with decay coefficient $\lambda > 0$, represented as follows:
$$t_i = \exp(-\lambda\, d_i)$$
Using the semantic kernel weight parameter $\alpha$, a hybrid topological and semantic kernel score is constructed in convex combination form, represented as follows:
$$\kappa_i = \alpha\, s_i + (1 - \alpha)\, t_i, \quad \alpha \in [0, 1]$$
All components are sorted in descending order of $\kappa_i$, and a preset number of components is selected from the sorted results to form the set of suspected faulty components.

6. The method according to claim 5, characterized in that, in step S3, the parameters of the semantic kernel weight mapping function are obtained offline on a historical fault sample set by a Bayesian optimization algorithm based on Gaussian process regression, with the weighted sum of the Top-K hit rate and the mean reciprocal rank (MRR) as the optimization objective function.

7. The method according to claim 5, characterized in that, in step S5, the fault description text, the set of suspected faulty components, the system to which each component belongs, each component's topological distance relative to the semantic anchor component, and its corresponding hybrid kernel score are arranged into a hierarchical, structured prompt. The prompt is then input into a locally deployed large language model. Under the prior constraints provided by the topology graph and semantic embedding, the large language model generates a fault mechanism analysis for the fault description, a priority troubleshooting order based on the mechanism path, and specific maintenance suggestions, which are then output as diagnostic results.

8. A computer-readable storage medium, characterized in that, It stores program instructions that, when executed by a processor, implement the method as described in any one of claims 1 to 7.

9. A forklift fault diagnosis system based on a large language model using topological graphs and semantic embedding fusion, characterized in that it includes a computer device, the computer device comprising the computer-readable storage medium according to claim 8.

10. The system according to claim 9, characterized in that it includes: a dual-space construction module, used to build the device's mechanistic topology map and component semantic vector matrix; a semantic perception module, used to quantify the semantic clarity score of the fault description text input by the user; an adaptive inference engine, used to dynamically calculate the semantic kernel weight parameter based on the semantic clarity score, and to score and rank the fault relevance of all components in the graph based on a hybrid topological and semantic kernel function, thereby filtering a set of suspected faulty components; and a generative diagnosis module, used to encapsulate the fault description and the set of suspected faulty components into a structured prompt, driving a large language model to generate diagnostic reports.