A vehicle diagnosis and repair knowledge base construction method, an electronic device, and a storage medium

By decoupling the document processing flow of the vehicle diagnosis and repair knowledge base through techniques such as template methods and bridging patterns, and by optimizing knowledge generation by combining large language models, the problems of rigid parsing processes and poor query results in existing technologies have been solved, and efficient and accurate knowledge base construction and querying have been achieved.

CN122309657A · Pending · Publication Date: 2026-06-30 · FAW JIEFANG AUTOMOTIVE CO


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: FAW JIEFANG AUTOMOTIVE CO
Filing Date: 2026-03-13
Publication Date: 2026-06-30

Application Information

Patent Timeline

13 Mar 2026: Application
30 Jun 2026: Publication (CN122309657A)

IPC: G06F16/3329; G06F16/334; G06F16/338; G06N5/022; G06N3/0455; G06N5/04

AI Tagging

Technology Topics

Linguistic model Engineering




AI Technical Summary

Technical Problem

In existing technologies, the parsing process of vehicle diagnosis and repair knowledge bases is rigid, has poor adaptability, cannot flexibly adjust parsing strategies, has high expansion costs, low knowledge parsing quality, poor query results, and low recall and accuracy.

Method used

It employs template method, bridge pattern, strategy pattern, chain of responsibility pattern and factory pattern to decouple document reading, cleaning, segmentation, knowledge generation and knowledge storage. It combines a large language model for knowledge generation and querying, and supports plug-in extensions for multiple document types and parsing strategies.

Benefits of technology

It achieves highly configurable and flexible expansion of parsing rules, improves the efficiency and quality of knowledge base construction, significantly improves the recall and accuracy of knowledge queries, and ensures the standardization and security of knowledge use.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a method for constructing a vehicle diagnosis and repair knowledge base, an electronic device, and a storage medium, relating to the field of vehicle diagnosis and repair. The method includes the following. Step S1, knowledge base creation: knowledge documents are uploaded to the knowledge base and then undergo standardized processing in sequence (document reading, cleaning, segmentation, knowledge generation, and knowledge storage), after which the knowledge is saved to a database. This process uses the template method to define a unified document parsing and processing flow and the bridge pattern to decouple the processing stages. The specific processing at each stage uses the strategy pattern to load different implementations, and the cleaning stage uses the chain of responsibility pattern to flexibly configure the rule execution order. Step S6, diagnosis and repair knowledge query: the text of a user's question is standardized, the knowledge base is searched, and a corresponding answer is generated with a large language model. The method uses plug-in technology to make the knowledge document parsing rules configurable.


Description

Technical Field

[0001] This application relates to the field of vehicle diagnosis and repair, and in particular to methods for constructing vehicle diagnosis and repair knowledge bases, electronic devices, storage media, and vehicle diagnosis and repair platforms.

Background Technology

[0002] Patent application number 202410521558.8 relates to the field of vehicle fault diagnosis and repair, and specifically discloses a construction method, a diagnosis and repair method, and a system for a commercial vehicle fault knowledge graph. The method categorizes the vehicle structure based on the vehicle's Bill of Materials (BOM) and modifies the classification structure based on expert knowledge, ensuring parent-child relationships at each level. Based on the vehicle structure classification, it collects fault information for all existing faults under each category, establishing a fault tree from the component levels down to specific faults. A vehicle fault ontology is established based on the structure and information contained in the fault tree. Using the vehicle fault ontology as the basic framework, knowledge extraction and triple filling are performed on the fault tree to construct the vehicle fault knowledge graph. That invention improves the comprehensiveness and accuracy of the fault model, and improves the accuracy of data acquisition by retrieving user descriptions through word segmentation combined with triples. However, that patent only extracts key information from the fault content into triples to build the knowledge graph; significant work is still needed for fault analysis and triple extraction. The present application instead implements a commercial vehicle diagnostic and repair knowledge base with highly configurable knowledge document parsing rules to improve the quality of knowledge parsing.

[0003] Patent application number 202411352724.2 relates to a vehicle fault detection and repair method, device, electronic equipment, and storage medium, belonging to the field of vehicle fault detection technology. The method includes: acquiring historical data from the OEM and basic vehicle operating data, where the basic operating data includes at least one item of basic parameter data and fault codes; constructing a knowledge graph based on the historical data; diagnosing each item of basic parameter data in the basic operating data one by one according to the knowledge graph to obtain a diagnostic result for each item; and, when a diagnostic result indicates a fault, determining the fault parameter data and correcting it. By constructing a knowledge graph and diagnosing the basic parameter data item by item, that invention improves the detection efficiency and accuracy of fault diagnosis, and by correcting the fault parameter data when a fault is detected, it enables real-time repair of vehicle faults. However, that patent uses general parsing methods for document knowledge parsing and understanding, which cannot read knowledge according to the characteristics of the document. This easily leads to incomplete knowledge and mismatches between knowledge titles and content.

[0004] This application proposes a commercial vehicle diagnostic and repair knowledge base with highly configurable knowledge document parsing rules to improve the quality of knowledge parsing.

Summary of the Invention

[0005] The purpose of this invention is to provide a method for constructing a vehicle diagnosis and repair knowledge base, an electronic device, a storage medium, and a vehicle diagnosis and repair platform, thereby solving at least one of a number of technical problems.

[0006] Core technical issues:

1. Rigid knowledge parsing process and poor adaptability: Existing technologies use a fixed document parsing process, which cannot flexibly adjust parsing strategies according to different types and characteristics of diagnostic and repair knowledge documents. Furthermore, the high coupling between processing stages makes expansion costs extremely high when adding new document types and parsing strategies, failing to meet the diverse parsing needs of commercial vehicle diagnostic and repair knowledge documents.

2. Low quality of knowledge parsing, insufficient completeness and accuracy: On the one hand, traditional general parsing methods are prone to incomplete knowledge extraction and discrepancies between titles and content; on the other hand, some technologies rely heavily on manual fault analysis and knowledge extraction, resulting in extremely low efficiency. Poor descriptions in the original documents directly lead to chaotic logic and poor usability of the generated knowledge.

3. Poor query results for diagnostic and repair knowledge, with low recall and precision: Existing knowledge bases have a single retrieval method, failing to consider both semantic similarity matching and precise keyword matching. They also lack an effective answer generation mechanism, making it difficult to extract comprehensive and accurate commercial vehicle diagnostic and repair conclusions from search results, thus failing to meet users' core need for efficient querying of diagnostic and repair knowledge.

[0007] This invention provides the following solution:

[0008] According to a first aspect of the present invention, a method for constructing a vehicle diagnostic and repair knowledge base is provided. Based on configurable knowledge document parsing rules, the method includes:

[0009] Step S1, knowledge base creation: after a knowledge document is uploaded to the knowledge base, it undergoes standardized processing in sequence (document reading, cleaning, segmentation, knowledge generation, and knowledge storage), and the resulting knowledge is saved to the database.

[0010] The process of saving knowledge to the database uses the template method to define a unified document parsing and processing flow, and uses the bridge pattern to decouple the processing stages of reading, cleaning, segmentation, knowledge generation, and knowledge storage.

[0011] The specific processing at each stage uses different implementations based on the strategy pattern;

[0012] The cleaning stage employs the chain of responsibility pattern to flexibly configure the order of rule execution.

[0013] In addition, the strategies for all stages are created uniformly using the factory pattern;

[0014] Step S6, diagnosis and repair knowledge query: the text of the user's question is standardized, the knowledge base is searched, and the corresponding answer to the user's question is generated in combination with the large language model.

[0015] The knowledge document parsing rules are configurable based on plug-in technology.

[0016] Furthermore, the specific implementations of the Template Method, Bridge pattern, Strategy pattern, Chain of Responsibility pattern, and Factory pattern are as follows:

[0017] The template method includes defining a fixed document parsing flow through the run method of the abstract class AbstractProcessor, while the processing details of each stage (reading, cleaning, segmentation, knowledge generation, and knowledge storage) are left to vary.

[0018] The bridge pattern includes injecting the implementation classes of the five stages of document reading, cleaning, chunking, knowledge generation, and knowledge storage into the fixed process of step S1 above, with the specific implementation classes of each stage specified by the subclass of AbstractProcessor.

[0019] The strategy pattern includes configuring multiple replaceable specific implementation strategies for each processing stage, and dynamically selecting them based on the document content and characteristics;

[0020] The chain of responsibility pattern involves building multiple cleaning components into a linked list, so that document content flows sequentially through each cleaning component to complete multi-level cleaning.

[0021] The factory pattern includes setting up a dedicated factory class to dynamically create corresponding strategy objects for each processing stage. These strategy objects are attached to a specified location in the parsing process as plugins.
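The interplay between the template method and the bridge pattern described above can be illustrated with a minimal sketch. Only the names AbstractProcessor and run come from the text; the build_stages hook, the DemoProcessor subclass, the simplified list-of-strings stage signature, and the sample repair notes are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

# Simplified stage signature (assumption): each stage maps a list of strings
# to a list of strings.
Stage = Callable[[List[str]], List[str]]


class AbstractProcessor(ABC):
    """Template method: run() fixes the order of the five stages."""

    def run(self, path: str) -> List[str]:
        reader, cleaner, splitter, extractor, saver = self.build_stages()
        items = reader([path])
        items = cleaner(items)
        items = splitter(items)
        items = extractor(items)
        return saver(items)

    @abstractmethod
    def build_stages(self) -> List[Stage]:
        """Bridge: the subclass decides which implementation fills each stage."""


class DemoProcessor(AbstractProcessor):
    """Hypothetical subclass wiring in trivial in-memory stages."""

    def __init__(self) -> None:
        self.saved: List[str] = []

    def build_stages(self) -> List[Stage]:
        def read(paths: List[str]) -> List[str]:
            # Stand-in for real file reading; returns made-up sample text.
            return ["  Check fuse F12.  ;  Replace relay K3. "]

        def clean(items: List[str]) -> List[str]:
            return [" ".join(s.split()) for s in items]

        def split(items: List[str]) -> List[str]:
            return [p.strip() for s in items for p in s.split(";") if p.strip()]

        def extract(items: List[str]) -> List[str]:
            return [f"knowledge: {s}" for s in items]

        def save(items: List[str]) -> List[str]:
            self.saved.extend(items)
            return items

        return [read, clean, split, extract, save]
```

Because run never names a concrete stage, swapping any of the five implementations only touches build_stages in a subclass, which is the decoupling the text attributes to the bridge pattern.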

[0022] Furthermore, including:

[0023] Document reading includes:

[0024] S11. Select the corresponding parsing library to open the document object according to the type of the document to be processed. Supported document types include xlsx, xls, docx, csv, pdf, markdown, and txt.

[0025] S12. Read the document content according to the preset granularity and save it as a list of document fragments. Excel and csv files are read by row, docx by paragraph, pdf by page, markdown by heading and section, and txt by line break.

[0026] S13. Encapsulate the document fragment content and source location information into a document_content object and add it to the read result list;

[0027] S14. Close the document object and return a list[document_content] of uniform structure as the read result;

[0028] The document reading strategy object is created by the DocumentParserFactory factory class;

[0029] When adding a new document type, implement a subclass of AbstractDocumentParser to complete the plug-in extension;

[0030] Wherein,

[0031] The structure of the document_content object includes:

[0032] page_content: A string type used to store sub-content in the document;

[0033] metadata: A dictionary type used to store source information of the content, including filenames and the location of page_content in the document;

[0034] document_id: String / numeric type, used to store the primary key information of the document in the database;

[0035] knowledge_list: A list type used to store a list of knowledge extracted from page_content.
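The four fields listed above map naturally onto a small data class. The following is a sketch only: the field names come from the text, while the Python class name DocumentContent, the default values, and the choice of strings as knowledge items are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class DocumentContent:
    # Fragment text read from the document.
    page_content: str
    # Source information such as the filename and the fragment's location.
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Primary key of the document in the database (string or number).
    document_id: Optional[Union[str, int]] = None
    # Knowledge extracted from page_content; element type str is an assumption.
    knowledge_list: List[str] = field(default_factory=list)
```

Using default_factory gives every object its own metadata dictionary and knowledge list, so later stages can update one fragment without affecting another.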

[0036] Furthermore, including:

[0037] Document cleaning includes:

[0038] S21. Dynamically generate cleaning components based on the cleaning component names specified by the user and save them to the component list;

[0039] The provided cleaning strategies include invalid character removal, email address removal, URL address removal, and extra whitespace removal;

[0040] S22. Traverse the list[document_content] returned by document reading and extract the page_content field of each object as the content to be cleaned;

[0041] S23. The content to be cleaned is sequentially passed to each cleaning component in the chain of responsibility pattern, with the output of the previous component serving as the input of the next component.

[0042] S24. Update the page_content field of the corresponding document_content object with the results processed by all cleaning components;

[0043] S25. After the traversal is complete, return the cleaned list[document_content];

[0044] The cleaning components are created using the factory pattern. The add_next method links the cleaning components into a linked list, so that cleaning components can be added or removed and their execution order adjusted.
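A minimal sketch of such a cleaning chain is given below. Only add_next is named in the text; the class names, the CLEANERS registry, the build_chain helper, and the regular expressions are illustrative assumptions, and the patterns are deliberately simple rather than complete URL or email matchers.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class AbstractCleaner(ABC):
    def __init__(self) -> None:
        self._next: Optional["AbstractCleaner"] = None

    def add_next(self, nxt: "AbstractCleaner") -> "AbstractCleaner":
        """Link the following component and return it so calls can be chained."""
        self._next = nxt
        return nxt

    def handle(self, text: str) -> str:
        # The output of this component becomes the input of the next one (S23).
        text = self.clean(text)
        return self._next.handle(text) if self._next else text

    @abstractmethod
    def clean(self, text: str) -> str: ...


class UrlRemover(AbstractCleaner):
    def clean(self, text: str) -> str:
        return re.sub(r"https?://\S+", "", text)


class EmailRemover(AbstractCleaner):
    def clean(self, text: str) -> str:
        return re.sub(r"\S+@\S+\.\S+", "", text)


class WhitespaceCollapser(AbstractCleaner):
    def clean(self, text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()


# Hypothetical name-to-class registry playing the role of the factory (S21).
CLEANERS: Dict[str, Type[AbstractCleaner]] = {
    "url": UrlRemover,
    "email": EmailRemover,
    "whitespace": WhitespaceCollapser,
}


def build_chain(names: List[str]) -> AbstractCleaner:
    """Build the linked list in the user-specified order; names must be non-empty."""
    head = CLEANERS[names[0]]()
    tail = head
    for name in names[1:]:
        tail = tail.add_next(CLEANERS[name]())
    return head
```

Reordering or dropping a rule is then a matter of changing the list of names passed to build_chain, without touching any cleaner class.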

[0045] Furthermore, including:

[0046] Document chunking includes:

[0047] S31. Iterate through the list[document_content] returned by the content cleaning sub-process and extract the page_content field of each object;

[0048] S32. Divide page_content into chunks according to the chunking strategy configured by the user, and return the chunk_list chunking result;

[0049] Chunking strategies include delimiter-based chunking, hierarchical chunking, and chunking by Markdown heading structure; the chunking results are adapted to the text length limits of large language models.

[0050] S33. Create a new document_content object for each element in chunk_list as page_content, and save it to the list of objects after chunking;

[0051] S34. Return the list[document_content] after the blocks are divided;

[0052] The splitting strategy object is created using the factory pattern. When adding a new splitting strategy, a subclass of AbstractSplitter is implemented to complete the plug-in extension.
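As an illustration of the delimiter-based strategy, the sketch below splits on a delimiter and then hard-wraps any piece that is still too long. AbstractSplitter is named in the text; DelimiterSplitter, its parameters, and the use of a character count as a stand-in for a model's length limit are assumptions.

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractSplitter(ABC):
    @abstractmethod
    def split(self, text: str) -> List[str]: ...


class DelimiterSplitter(AbstractSplitter):
    """Hypothetical delimiter-based strategy with a maximum chunk length."""

    def __init__(self, delimiter: str = "\n\n", max_chars: int = 500) -> None:
        self.delimiter = delimiter
        self.max_chars = max_chars

    def split(self, text: str) -> List[str]:
        chunks: List[str] = []
        for part in text.split(self.delimiter):
            part = part.strip()
            # Empty parts produce no chunks; long parts are cut every max_chars.
            for start in range(0, len(part), self.max_chars):
                chunks.append(part[start:start + self.max_chars])
        return chunks
```

A production splitter would more likely count model tokens and avoid cutting in the middle of a sentence; the character-based cut here only keeps the sketch short.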

[0053] Furthermore, including:

[0054] Knowledge generation includes:

[0055] S41. Iterate through the list[document_content] returned by the document chunking sub-process and extract the page_content field of each object;

[0056] S42. Extract knowledge from page_content according to the knowledge generation strategy configured by the user and return a knowledge list;

[0057] Knowledge generation strategies include: direct knowledge generation from document fragments, parent-child segmented knowledge generation, large language model-based knowledge generation, and question-and-answer pair knowledge generation.

[0058] S43. Update the knowledge list to the knowledge_list field of the corresponding document_content object;

[0059] S44. After the traversal is complete, return the updated list[document_content];

[0060] The knowledge generation strategy object is created by the KnowledgeExtractorFactory class. When adding a new knowledge generation strategy, a subclass of AbstractKnowledgeExtractor is implemented to complete the plug-in extension.
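The factory-created knowledge generation strategies might look like the following sketch. AbstractKnowledgeExtractor and KnowledgeExtractorFactory are named in the text; the two concrete strategies, the registry keys, the prompt wording, and the injected llm callable are assumptions, and no real model API is called.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class AbstractKnowledgeExtractor(ABC):
    @abstractmethod
    def extract(self, page_content: str) -> List[str]: ...


class DirectExtractor(AbstractKnowledgeExtractor):
    """Uses the fragment itself as a single knowledge item."""

    def extract(self, page_content: str) -> List[str]:
        return [page_content] if page_content.strip() else []


class LlmExtractor(AbstractKnowledgeExtractor):
    """Delegates rewriting to an injected callable (hypothetical model hook)."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm

    def extract(self, page_content: str) -> List[str]:
        prompt = "Rewrite the following repair note as one clear statement:\n" + page_content
        return [self.llm(prompt)]


class KnowledgeExtractorFactory:
    # Hypothetical registry; a plug-in would add its own subclass here.
    _registry: Dict[str, Type[AbstractKnowledgeExtractor]] = {
        "direct": DirectExtractor,
        "llm": LlmExtractor,
    }

    @classmethod
    def create(cls, name: str, **kwargs) -> AbstractKnowledgeExtractor:
        return cls._registry[name](**kwargs)
```

Passing the model as a plain callable keeps the strategy testable with a stub and leaves the choice of model provider outside the extractor.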

[0061] Furthermore, including:

[0062] Knowledge storage includes:

[0063] S51. Iterate through the list[document_content] returned by the knowledge generation sub-process and extract the knowledge_list field of each object;

[0064] S52. Save the knowledge in knowledge_list to the knowledge base according to the knowledge storage strategy configured by the user. The provided knowledge storage strategies include vector storage, keyword storage, and hybrid storage. Hybrid storage is completed by calling the implementation classes of vector storage and keyword storage respectively.

[0065] S53. After traversal is completed, the entire knowledge base creation process is finished.

[0066] The knowledge storage strategy object is created by the KnowledgeSaverFactory factory class. When adding a new storage strategy, a subclass of AbstractKnowledgeSaver is implemented to complete the plug-in extension.
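The statement that hybrid storage simply calls the vector and keyword implementations can be sketched as follows. AbstractKnowledgeSaver is named in the text; the three concrete classes, the in-memory containers, and the injected embed function are assumptions standing in for a real vector database, keyword index, and text embedding model.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class AbstractKnowledgeSaver(ABC):
    @abstractmethod
    def save(self, knowledge_list: List[str]) -> None: ...


class VectorSaver(AbstractKnowledgeSaver):
    """Stores (embedding, text) pairs in memory; embed is a hypothetical hook."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self.embed = embed
        self.rows: List[Tuple[List[float], str]] = []

    def save(self, knowledge_list: List[str]) -> None:
        self.rows.extend((self.embed(k), k) for k in knowledge_list)


class KeywordSaver(AbstractKnowledgeSaver):
    """Builds a tiny inverted index from lowercase words to knowledge items."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def save(self, knowledge_list: List[str]) -> None:
        for k in knowledge_list:
            for word in k.lower().split():
                self.index[word].add(k)


class HybridSaver(AbstractKnowledgeSaver):
    """Hybrid storage: delegates to the vector and keyword savers in turn."""

    def __init__(self, vector: VectorSaver, keyword: KeywordSaver) -> None:
        self.vector = vector
        self.keyword = keyword

    def save(self, knowledge_list: List[str]) -> None:
        self.vector.save(knowledge_list)
        self.keyword.save(knowledge_list)
```

Composing the two existing savers instead of re-implementing them means a fix to either store is picked up by hybrid storage automatically.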

[0067] Also includes:

[0068] AbstractProcessor has two subclasses: AutoProcessor and ProfessionalProcessor. AutoProcessor automatically bridges the default implementation classes of the five processing stages based on the file extension of the file to be processed, thus completing the automated parsing.

[0069] ProfessionalProcessor creates implementation classes for five processing stages based on the user's custom configuration, leaving the control details of each stage to the user, thus achieving a high degree of configurability of the parsing rules.
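The difference between the two subclasses can be shown by how each one arrives at its stage configuration. In this sketch only the class names AbstractProcessor, AutoProcessor, and ProfessionalProcessor come from the text; the stage_config method, the stage keys, and the default values per extension are hypothetical, since the source does not list which implementations are bridged by default.

```python
from pathlib import Path
from typing import Dict

STAGES = ("reader", "cleaners", "splitter", "extractor", "saver")

# Hypothetical defaults; not taken from the source.
COMMON: Dict[str, str] = {"cleaners": "whitespace", "extractor": "direct", "saver": "hybrid"}
BY_SUFFIX: Dict[str, Dict[str, str]] = {
    ".md": {"reader": "markdown", "splitter": "markdown_title"},
    ".txt": {"reader": "txt", "splitter": "delimiter"},
}


class AbstractProcessor:
    def stage_config(self, path: str) -> Dict[str, str]:
        raise NotImplementedError


class AutoProcessor(AbstractProcessor):
    """Chooses default stage implementations from the file extension."""

    def stage_config(self, path: str) -> Dict[str, str]:
        suffix = Path(path).suffix.lower()
        if suffix not in BY_SUFFIX:
            raise ValueError(f"no default stages for {suffix!r}")
        return {**COMMON, **BY_SUFFIX[suffix]}


class ProfessionalProcessor(AbstractProcessor):
    """Uses the user's configuration and requires all five stages to be set."""

    def __init__(self, user_config: Dict[str, str]) -> None:
        missing = [s for s in STAGES if s not in user_config]
        if missing:
            raise ValueError(f"missing stage configuration: {missing}")
        self.user_config = dict(user_config)

    def stage_config(self, path: str) -> Dict[str, str]:
        return dict(self.user_config)
```

In a fuller implementation the returned names would be handed to the stage factories to create the actual strategy objects.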

[0070] Furthermore, including:

[0071] Diagnosis and repair knowledge inquiry includes:

[0072] S61. Perform vectorization and keyword extraction on the user's question text;

[0073] S62. Use the question vector to perform vector similarity retrieval in the knowledge base and obtain the top 5 results with the highest similarity. At the same time, use the question keywords to perform keyword retrieval in the knowledge base and obtain the top 5 results with the highest similarity.

[0074] S63. Sort the two types of search results in a mixed order to obtain the basic answer materials for the knowledge base;

[0075] S64. Integrate user questions and basic answer materials into prompts in a preset format and input them into the large language model;

[0076] S65. The large language model summarizes and refines the answer based on the prompt and the prompt engineering template, and generates the final diagnosis and repair knowledge answer.

[0077] In the knowledge question answering module, when performing a search, the top 3 knowledge items with the highest matching degree are first obtained as basic materials, and then processed by a large language model to generate answers.
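Steps S63 and S64 can be sketched as a merge of the two result lists followed by prompt assembly. The text does not say how the mixed ordering is computed, so reciprocal rank fusion is used here purely as an assumed stand-in; the function names, the constant k, and the prompt wording are likewise illustrative.

```python
from typing import Dict, List


def fuse(vector_hits: List[str], keyword_hits: List[str], k: int = 60) -> List[str]:
    """Merge two ranked lists with reciprocal rank fusion (assumed method)."""
    scores: Dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        for rank, item in enumerate(hits, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; items found by both retrievers rise to the top.
    return sorted(scores, key=lambda item: -scores[item])


def build_prompt(question: str, materials: List[str], top_n: int = 3) -> str:
    """Put the best top_n materials and the question into a fixed template."""
    context = "\n".join(f"{i}. {m}" for i, m in enumerate(materials[:top_n], start=1))
    return (
        "Answer using only the materials below.\n"
        f"Materials:\n{context}\n"
        f"Question: {question}"
    )
```

With this scheme an item returned by both the vector search and the keyword search outranks items returned by only one of them, which is one way to balance semantic and exact matching.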

[0078] Furthermore, including:

[0079] Knowledge base creation and diagnostic knowledge retrieval are conducted based on a standardized knowledge base usage process;

[0080] The workflow includes five modules: user management, knowledge base management, document management, document parsing, and knowledge Q&A, and sets two levels of permissions for administrators and ordinary users.

[0081] Administrators have full permissions for knowledge base creation, user management, document management, document parsing, and knowledge Q&A, while ordinary users only have knowledge Q&A permissions.

[0082] The knowledge base management module is used to manage access information for text embedding models, access information for large language models, and prompt engineering templates for large language models.

[0083] The document management module is used to upload and download knowledge documents from the knowledge base.
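The two-level permission scheme can be expressed as a simple lookup table. This is a sketch under assumptions: the role and action identifiers are made up for illustration, and only the split between administrator and ordinary user permissions is taken from the text.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    ADMIN = "admin"
    USER = "user"


# Administrators hold every permission; ordinary users may only ask questions.
PERMISSIONS: Dict[Role, Set[str]] = {
    Role.ADMIN: {
        "knowledge_base_creation",
        "user_management",
        "document_management",
        "document_parsing",
        "knowledge_qa",
    },
    Role.USER: {"knowledge_qa"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role may perform the named action."""
    return action in PERMISSIONS[role]
```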

[0084] According to a second aspect of the present invention, an electronic device is provided, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus;

[0085] The memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method for constructing a vehicle diagnosis and repair knowledge base.

[0086] According to a third aspect of the present invention, a computer-readable storage medium is provided, storing a computer program executable by an electronic device, wherein when the computer program is run on the electronic device, the electronic device performs the steps of the method for constructing a vehicle diagnosis and repair knowledge base.

[0087] According to a fourth aspect of the present invention, a vehicle diagnostic and repair platform is provided, comprising:

[0088] An electronic device used to implement the steps of the method for constructing a vehicle diagnosis and repair knowledge base;

[0089] A processor that runs a program, wherein the program, when run, executes the steps of the method for constructing a vehicle diagnosis and repair knowledge base on data output by the electronic device;

[0090] A storage medium used to store a program, wherein the program, when run, executes the steps of the method for constructing a vehicle diagnosis and repair knowledge base on data output by the electronic device.

[0091] The above solution achieves the following beneficial technical effects:

[0092] This application utilizes a collaborative approach based on the template method, bridge pattern, strategy pattern, chain of responsibility pattern, and factory pattern to decouple the entire process of document reading, cleaning, segmentation, knowledge generation, and knowledge storage. Through a plug-in extension method, adding new document types and parsing strategies only requires implementing the corresponding abstract class subclass, significantly reducing the development cost of process extension. It achieves precise adaptation of parsing rules to various commercial vehicle diagnostic and repair knowledge documents, enabling highly configurable and flexible expansion of the parsing process.

[0093] This application achieves multi-level configurable content cleaning through the chain of responsibility pattern, effectively eliminating invalid information. It introduces a large language model into knowledge generation to optimize and summarize defective content in the original documents, solving the problems of incomplete knowledge extraction and logical confusion. At the same time, it replaces a large amount of manual parsing work, greatly improving the construction efficiency of the knowledge base and the practicality and completeness of the knowledge.

[0094] This application employs a hybrid retrieval method combining vectorization and keywords, balancing semantic similarity and keyword accuracy. By leveraging the summarizing and refining capabilities of a large language model, comprehensive and accurate diagnostic and repair answers are extracted from the hybrid retrieval results, significantly improving the recall and accuracy of knowledge queries. In addition, the hierarchical permission design enables refined management of the knowledge base, ensuring the standardization and security of knowledge use.

[0095] Brief description of the figures

[0096] Figure 1 is a flowchart of a method for constructing a vehicle diagnosis and repair knowledge base provided by one or more embodiments of the present invention.

[0097] Figure 2 is a schematic diagram of the knowledge base usage process provided in a specific embodiment of the present invention.

[0098] Figure 3 is a schematic diagram of a file parsing process provided in a specific embodiment of the present invention.

[0099] Figure 4 is a schematic diagram of a file parsing bridge class provided in a specific embodiment of the present invention.

[0100] Figure 5 is a schematic diagram of a document knowledge reading process provided in a specific embodiment of the present invention.

[0101] Figure 6 is a schematic diagram of a document knowledge reading class provided in a specific embodiment of the present invention.

[0102] Figure 7 is a schematic diagram of a content cleaning process provided in a specific embodiment of the present invention.

[0103] Figure 8 is a schematic diagram of a content cleaning class provided in a specific embodiment of the present invention.

[0104] Figure 9 is a schematic diagram of a document segmentation process provided in a specific embodiment of the present invention.

[0105] Figure 10 is a schematic diagram of a document segmentation class provided in a specific embodiment of the present invention.

[0106] Figure 11 is a schematic diagram of a knowledge generation process provided in a specific embodiment of the present invention.

[0107] Figure 12 is a schematic diagram of a knowledge generation class provided in a specific embodiment of the present invention.

[0108] Figure 13 is a schematic diagram of a knowledge storage process provided in a specific embodiment of the present invention.

[0109] Figure 14 is a schematic diagram of a knowledge storage class provided in a specific embodiment of the present invention.

[0110] Figure 15 is a structural diagram of an electronic device for the vehicle diagnosis and repair knowledge base construction method provided by one or more embodiments of the present invention.

Detailed Implementation

[0111] The technical solution of the present invention will now be clearly and completely described with reference to the figures. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0112] Figure 1 is a flowchart of a method for constructing a vehicle diagnosis and repair knowledge base provided by one or more embodiments of the present invention.

[0113] As shown in Figure 1, the method is a vehicle diagnosis and repair knowledge base construction method for commercial vehicles with highly configurable knowledge document parsing rules. It belongs to the field of medium and heavy-duty commercial vehicle technology and includes two core steps: knowledge base creation and diagnosis and repair knowledge query. The overall design is plug-in based, enabling flexible configuration of the knowledge document parsing process.

[0114] Based on configurable knowledge document parsing rules, the methods for constructing a vehicle diagnostic and repair knowledge base include:

[0115] Step S1, knowledge base creation: after a knowledge document is uploaded to the knowledge base, it undergoes standardized processing in sequence (document reading, cleaning, segmentation, knowledge generation, and knowledge storage), and the resulting knowledge is saved to the database.

[0116] The process of saving knowledge to the database uses the template method to define a unified document parsing and processing flow, and uses the bridge pattern to decouple the processing stages of reading, cleaning, segmentation, knowledge generation, and knowledge storage.

[0117] The specific processing at each stage uses different implementations based on the strategy pattern;

[0118] The cleaning stage employs the chain of responsibility pattern to flexibly configure the order of rule execution.

[0119] In addition, the strategies for all stages are created uniformly using the factory pattern;

[0120] Step S6, diagnosis and repair knowledge query: the text of the user's question is standardized, the knowledge base is searched, and the corresponding answer to the user's question is generated in combination with the large language model.

[0121] The knowledge document parsing rules are configurable based on plug-in technology.

[0122] In this embodiment, the specific implementations of the Template Method, Bridge Pattern, Strategy Pattern, Chain of Responsibility Pattern, and Factory Pattern are as follows:

[0123] The template method includes defining a fixed document parsing flow through the run method of the abstract class AbstractProcessor, while the processing details of each stage (reading, cleaning, segmentation, knowledge generation, and knowledge storage) are left to vary.

[0124] The bridge pattern includes injecting the implementation classes of the five stages of document reading, cleaning, chunking, knowledge generation, and knowledge storage into the fixed process of step S1 above, with the specific implementation classes of each stage specified by the subclass of AbstractProcessor.

[0125] The strategy pattern includes configuring multiple replaceable specific implementation strategies for each processing stage, and dynamically selecting them based on the document content and characteristics;

[0126] The chain of responsibility pattern involves building multiple cleaning components into a linked list, so that document content flows sequentially through each cleaning component to complete multi-level cleaning.

[0127] The factory pattern includes setting up a dedicated factory class to dynamically create corresponding strategy objects for each processing stage. These strategy objects are attached to a specified location in the parsing process as plugins.
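The idea of selecting a strategy dynamically from the document content, and then obtaining it from a factory, might be sketched as follows. Every name here (SplitterFactory, choose_strategy, the two split functions, the registry keys) is hypothetical, and the selection rule of checking for Markdown headings is an assumption chosen only to make the example concrete.

```python
import re
from typing import Callable, Dict, List

Splitter = Callable[[str], List[str]]


def split_by_heading(text: str) -> List[str]:
    # Split just before each line that starts with one to six '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]


def split_by_blank_line(text: str) -> List[str]:
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class SplitterFactory:
    # Hypothetical registry; a plug-in would register an additional strategy here.
    _strategies: Dict[str, Splitter] = {
        "markdown_title": split_by_heading,
        "delimiter": split_by_blank_line,
    }

    @classmethod
    def create(cls, name: str) -> Splitter:
        return cls._strategies[name]


def choose_strategy(text: str) -> str:
    """Pick a strategy name from the content: headings present or not."""
    return "markdown_title" if re.search(r"(?m)^#{1,6} ", text) else "delimiter"
```

The caller never refers to a concrete splitter, so a new strategy becomes available by adding one registry entry and, if needed, one more rule in choose_strategy.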

[0128] In this embodiment, it includes:

[0129] Document reading includes:

[0130] S11. Select the corresponding parsing library to open the document object according to the type of the document to be processed. Supported document types include xlsx, xls, docx, csv, pdf, markdown, and txt.

[0131] S12. Read the document content according to the preset granularity and save it as a list of document fragments. Excel and csv files are read by row, docx by paragraph, pdf by page, markdown by heading and section, and txt by line break.

[0132] S13. Encapsulate the document fragment content and source location information into a document_content object and add it to the read result list;

[0133] S14. Close the document object and return a list[document_content] of uniform structure as the read result;

[0134] The document reading strategy object is created by the DocumentParserFactory factory class;

[0135] When adding a new document type, implement a subclass of AbstractDocumentParser to complete the plug-in extension;

[0136] wherein,

[0137] The structure of the document_content object includes:

[0138] page_content: A string type used to store sub-content in the document;

[0139] metadata: A dictionary type used to store source information of the content, including filenames and the location of page_content in the document;

[0140] document_id: String / numeric type, used to store the primary key information of the document in the database;

[0141] knowledge_list: A list type used to store a list of knowledge extracted from page_content.
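The four fields listed above map naturally onto a Python dataclass. The sketch below is illustrative only: the class name DocumentContent, the metadata keys, the choice of strings as knowledge items, and the sample values are assumptions that do not appear in the text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class DocumentContent:
    """Illustrative rendering of the document_content structure."""
    page_content: str                                        # sub-content of the document
    metadata: Dict[str, Any] = field(default_factory=dict)   # filename and location
    document_id: Optional[Union[str, int]] = None            # primary key in the database
    knowledge_list: List[str] = field(default_factory=list)  # knowledge extracted later


# Made-up example fragment; the metadata keys are assumed names.
fragment = DocumentContent(
    page_content="Fault code P0101: check the mass air flow sensor.",
    metadata={"filename": "manual.pdf", "page": 12},
)
```

Using default_factory gives every object its own metadata dictionary and knowledge list, so later stages can update one fragment without affecting the others.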

[0142] In this embodiment, it includes:

[0143] Document cleaning includes:

[0144] S21. Dynamically generate cleaning components based on the cleaning component names specified by the user and save them to the component list;

[0145] The provided cleaning strategies include invalid character removal, email address removal, URL address removal, and extra whitespace removal;

[0146] S22. Traverse the list[document_content] returned by document reading, and extract the page_content field of each object as the content to be cleaned;

[0147] S23. The content to be cleaned is sequentially passed to each cleaning component in the chain of responsibility pattern, with the output of the previous component serving as the input of the next component.

[0148] S24. Update the page_content field of the corresponding document_content object with the results processed by all cleaning components;

[0149] S25. After the traversal is complete, return the cleaned list[document_content];

[0150] The cleaning components are created using the factory pattern. The add_next method is used to link the cleaning components into a linked list, so that cleaning components can be added or removed and their execution order adjusted.
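Steps S21 to S25 can be sketched as a chain of responsibility in Python. The names AbstractContentCleaner and add_next come from this description; the concrete cleaner classes, the build_chain helper, the registry keys, and the regular expressions are simplified assumptions for illustration.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class AbstractContentCleaner(ABC):
    def __init__(self) -> None:
        self._next: Optional["AbstractContentCleaner"] = None

    def add_next(self, cleaner: "AbstractContentCleaner") -> "AbstractContentCleaner":
        """Link the next cleaner and return it so calls can be chained."""
        self._next = cleaner
        return cleaner

    @abstractmethod
    def _clean(self, text: str) -> str:
        """Apply this component's own cleaning rule."""

    def clean(self, text: str) -> str:
        # The output of this component is the input of the next one.
        result = self._clean(text)
        return self._next.clean(result) if self._next else result


class UrlCleaner(AbstractContentCleaner):
    def _clean(self, text: str) -> str:
        return re.sub(r"https?://\S+", "", text)


class EmailCleaner(AbstractContentCleaner):
    def _clean(self, text: str) -> str:
        return re.sub(r"\S+@\S+\.\S+", "", text)


class WhitespaceCleaner(AbstractContentCleaner):
    def _clean(self, text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()


# Assumed registry keys; the text does not fix the component names.
_CLEANERS: Dict[str, Type[AbstractContentCleaner]] = {
    "url": UrlCleaner,
    "email": EmailCleaner,
    "whitespace": WhitespaceCleaner,
}


def build_chain(names: List[str]) -> AbstractContentCleaner:
    """Factory step: create cleaners by name and link them in the given order."""
    if not names:
        raise ValueError("at least one cleaning component is required")
    cleaners = [_CLEANERS[name]() for name in names]
    for current, following in zip(cleaners, cleaners[1:]):
        current.add_next(following)
    return cleaners[0]
```

Changing the order of the names passed to build_chain changes the execution order without touching any cleaner class.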

[0151] In this embodiment, it includes:

[0152] Document chunking includes:

[0153] S31. Iterate through the list[document_content] returned by the content cleaning sub-process and extract the page_content field of each object;

[0154] S32. Divide page_content into chunks according to the chunking strategy configured by the user, and return the chunk_list chunking result;

[0155] Chunking strategies include delimiter-based chunking, hierarchical chunking, and Markdown document title formatting chunking, where the chunking results are adapted to the text length limits of large language models.

[0156] S33. For each element in chunk_list, create a new document_content object with that element as its page_content, and save it to the list of chunked objects;

[0157] S34. Return the list[document_content] after the blocks are divided;

[0158] The splitting strategy object is created using the factory pattern. When adding a new splitting strategy, a subclass of AbstractSplitter is implemented to complete the plug-in extension.
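As an illustration of one chunking strategy, the following sketch implements delimiter-based chunking behind the AbstractSplitter interface named above. DelimiterSplitter, its parameters, and the use of a character count as the length limit are assumptions; a real limit for a large language model would more likely be measured in tokens.

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractSplitter(ABC):
    @abstractmethod
    def split(self, text: str) -> List[str]:
        """Return the chunk_list for one page_content string."""


class DelimiterSplitter(AbstractSplitter):
    """Split on a delimiter, then pack pieces into chunks of at most max_chars."""

    def __init__(self, delimiter: str = "\n", max_chars: int = 200) -> None:
        self.delimiter = delimiter
        self.max_chars = max_chars

    def split(self, text: str) -> List[str]:
        chunks: List[str] = []
        current = ""
        for piece in text.split(self.delimiter):
            piece = piece.strip()
            if not piece:
                continue
            candidate = piece if not current else current + self.delimiter + piece
            if len(candidate) <= self.max_chars:
                current = candidate  # the piece still fits into the open chunk
                continue
            if current:
                chunks.append(current)
            # A single piece longer than the limit is cut at fixed width.
            while len(piece) > self.max_chars:
                chunks.append(piece[: self.max_chars])
                piece = piece[self.max_chars:]
            current = piece
        if current:
            chunks.append(current)
        return chunks
```

A new strategy, such as hierarchical chunking, would be another subclass with its own split method; the calling code stays unchanged.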

[0159] In this embodiment, it includes:

[0160] Knowledge generation includes:

[0161] S41. Iterate through the list[document_content] returned by the document chunking sub-process and extract the page_content field of each object;

[0162] S42. Extract knowledge from page_content according to the knowledge generation strategy configured by the user and return a knowledge list;

[0163] Knowledge generation strategies include: direct knowledge generation from document fragments, parent-child segmented knowledge generation, large language model-based knowledge generation, and question-and-answer pair knowledge generation.

[0164] S43. Update the knowledge list to the knowledge_list field of the corresponding document_content object;

[0165] S44. After the traversal is complete, return the updated list[document_content];

[0166] The knowledge generation strategy object is created by the KnowledgeExtractorFactory class. When adding a new knowledge generation strategy, a subclass of AbstractKnowledgeExtractor is implemented to complete the plug-in extension.
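The plug-in extension of knowledge generation strategies can be sketched with a small registry-based factory. KnowledgeExtractorFactory and AbstractKnowledgeExtractor are named in this description; the register decorator, the strategy keys, the two example extractors, and the prompt wording are hypothetical. The large language model is represented by a caller-supplied function, since no model interface is specified.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class AbstractKnowledgeExtractor(ABC):
    @abstractmethod
    def extract(self, page_content: str) -> List[str]:
        """Return the knowledge list for one document chunk."""


class KnowledgeExtractorFactory:
    _registry: Dict[str, Type[AbstractKnowledgeExtractor]] = {}

    @classmethod
    def register(cls, name: str):
        """Assumed plug-in hook: register a strategy class under a name."""
        def decorator(extractor_cls: Type[AbstractKnowledgeExtractor]):
            cls._registry[name] = extractor_cls
            return extractor_cls
        return decorator

    @classmethod
    def create(cls, name: str, **kwargs) -> AbstractKnowledgeExtractor:
        if name not in cls._registry:
            raise ValueError(f"unknown knowledge generation strategy: {name}")
        return cls._registry[name](**kwargs)


@KnowledgeExtractorFactory.register("direct")
class DirectExtractor(AbstractKnowledgeExtractor):
    """Direct generation: the chunk itself is the knowledge item."""

    def extract(self, page_content: str) -> List[str]:
        return [page_content]


@KnowledgeExtractorFactory.register("llm")
class LlmExtractor(AbstractKnowledgeExtractor):
    """Model-based generation; `complete` wraps whatever model is configured."""

    def __init__(self, complete: Callable[[str], str]) -> None:
        self.complete = complete

    def extract(self, page_content: str) -> List[str]:
        prompt = (
            "Rewrite the following repair text as concise knowledge items, "
            "one per line:\n" + page_content
        )
        lines = self.complete(prompt).splitlines()
        return [line.strip() for line in lines if line.strip()]
```

Adding a strategy then amounts to writing one decorated subclass; the factory and the parsing flow need no changes.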

[0167] In this embodiment, it includes:

[0168] Knowledge storage includes:

[0169] S51. Iterate through the list[document_content] returned by the knowledge generation sub-process and extract the knowledge_list field of each object;

[0170] S52. Save the knowledge in knowledge_list to the knowledge base according to the knowledge storage strategy configured by the user. The provided knowledge storage strategies include vector storage, keyword storage, and hybrid storage. Hybrid storage is completed by calling the implementation classes of vector storage and keyword storage respectively.

[0171] S53. After traversal is completed, the entire knowledge base creation process is finished.

[0172] The knowledge storage strategy object is created by the KnowledgeSaverFactory factory class. When adding a new storage strategy, a subclass of AbstractKnowledgeSaver is implemented to complete the plug-in extension.
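The hybrid storage strategy, which delegates to the vector and keyword implementations, can be sketched as follows. AbstractKnowledgeSaver is named in this description; VectorSaver, KeywordSaver, HybridSaver, the in-memory stores, and the caller-supplied embed function are assumptions that stand in for a real vector database, full-text index, and text embedding model.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Set, Tuple


class AbstractKnowledgeSaver(ABC):
    @abstractmethod
    def save(self, knowledge_list: List[str]) -> None:
        """Persist the knowledge items of one document_content object."""


class VectorSaver(AbstractKnowledgeSaver):
    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self.embed = embed  # assumed wrapper around the text embedding model
        self.rows: List[Tuple[List[float], str]] = []

    def save(self, knowledge_list: List[str]) -> None:
        for item in knowledge_list:
            self.rows.append((self.embed(item), item))


class KeywordSaver(AbstractKnowledgeSaver):
    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}  # toy inverted index: word -> items

    def save(self, knowledge_list: List[str]) -> None:
        for item in knowledge_list:
            for word in set(item.lower().split()):
                self.index.setdefault(word, set()).add(item)


class HybridSaver(AbstractKnowledgeSaver):
    """Hybrid storage: call the vector and keyword implementations in turn."""

    def __init__(self, vector_saver: VectorSaver, keyword_saver: KeywordSaver) -> None:
        self.savers: List[AbstractKnowledgeSaver] = [vector_saver, keyword_saver]

    def save(self, knowledge_list: List[str]) -> None:
        for saver in self.savers:
            saver.save(knowledge_list)
```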

[0173] Also includes:

[0174] AbstractProcessor has two subclasses: AutoProcessor and ProfessionalProcessor. AutoProcessor automatically bridges the default implementation classes of the five processing stages based on the file extension of the file to be processed, thus completing the automated parsing.

[0175] ProfessionalProcessor creates implementation classes for five processing stages based on the user's custom configuration, leaving the control details of each stage to the user, thus achieving a high degree of configurability of the parsing rules.
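The difference between the two subclasses can be sketched as a difference in where the stage configuration comes from. In the sketch below, AbstractProcessor, AutoProcessor, and ProfessionalProcessor are the names used in this description, while StageConfig, stage_config, DEFAULT_CONFIGS, and every default value in the table are invented for illustration; the text does not state which defaults belong to which file extension.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical names of the implementation chosen for each stage."""
    parser: str
    cleaners: Tuple[str, ...]
    splitter: str
    extractor: str
    saver: str


# Assumed defaults, for illustration only.
DEFAULT_CONFIGS: Dict[str, StageConfig] = {
    ".pdf": StageConfig("pdf", ("whitespace",), "hierarchical", "llm", "hybrid"),
    ".md": StageConfig("markdown", ("whitespace",), "markdown_title", "direct", "hybrid"),
    ".csv": StageConfig("csv", (), "delimiter", "direct", "keyword"),
}


class AbstractProcessor(ABC):
    def __init__(self, path: str) -> None:
        self.path = path

    @abstractmethod
    def stage_config(self) -> StageConfig:
        """Decide which implementation each of the five stages uses."""


class AutoProcessor(AbstractProcessor):
    """Picks the default implementations from the file extension."""

    def stage_config(self) -> StageConfig:
        suffix = Path(self.path).suffix.lower()
        if suffix not in DEFAULT_CONFIGS:
            raise ValueError(f"no default configuration for {suffix!r}")
        return DEFAULT_CONFIGS[suffix]


class ProfessionalProcessor(AbstractProcessor):
    """Uses the configuration supplied by the user for every stage."""

    def __init__(self, path: str, config: StageConfig) -> None:
        super().__init__(path)
        self._config = config

    def stage_config(self) -> StageConfig:
        return self._config
```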

[0176] In this embodiment, it includes:

[0177] Diagnosis and repair knowledge inquiry includes:

[0178] S61. Perform vectorization and keyword extraction on the user's question text;

[0179] S62. Use the question vector to perform vector similarity retrieval in the knowledge base and obtain the top 5 results with the highest similarity. At the same time, use the question keywords to perform keyword retrieval in the knowledge base and obtain the top 5 results with the highest similarity.

[0180] S63. Sort the two types of search results in a mixed order to obtain the basic answer materials for the knowledge base;

[0181] S64. Integrate user questions and basic answer materials into prompts in a preset format and input them into the large language model;

[0182] S65. The large language model summarizes and refines the answer based on the prompt words and the prompt word engineering template, and generates the final diagnosis and repair knowledge answer.
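Steps S63 and S64 can be illustrated with a short sketch. The text does not specify how the two result lists are sorted together, so the example substitutes reciprocal rank fusion, a common way to merge ranked lists, purely as an assumption; the function names, the constant k=60, and the prompt template wording are likewise hypothetical.

```python
from typing import Dict, List

PROMPT_TEMPLATE = (
    "Answer the vehicle repair question using only the materials below.\n"
    "Materials:\n{materials}\n"
    "Question: {question}\n"
)


def fuse_rankings(vector_hits: List[str], keyword_hits: List[str], k: int = 60) -> List[str]:
    """Merge two ranked result lists with reciprocal rank fusion.

    Each item scores 1 / (k + rank) per list it appears in, so items found
    by both retrievers rise to the top of the merged order.
    """
    scores: Dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        for rank, item in enumerate(hits, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def build_prompt(question: str, materials: List[str]) -> str:
    """Fill the preset template with the question and the merged materials."""
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(materials, start=1))
    return PROMPT_TEMPLATE.format(materials=numbered, question=question)
```

In use, the top 5 vector results and the top 5 keyword results would be passed to fuse_rankings, and the resulting prompt would be sent to the large language model configured for the knowledge base.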

[0183] In the knowledge question answering module, when performing a search, the top 3 knowledge items with the highest matching degree are first obtained as basic materials, and then processed by a large language model to generate answers.

[0184] In this embodiment, it includes:

[0185] Knowledge base creation and diagnostic knowledge retrieval are conducted based on a standardized knowledge base usage process;

[0186] The workflow includes five modules: user management, knowledge base management, document management, document parsing, and knowledge Q&A, and sets two levels of permissions for administrators and ordinary users.

[0187] Administrators have full permissions for knowledge base creation, user management, document management, document parsing, and knowledge Q&A, while ordinary users only have knowledge Q&A permissions.
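The two permission levels can be expressed as a simple lookup table. The sketch below is one possible encoding; the enum names and the is_allowed helper are assumptions, and only the split between full permissions for administrators and question-only permissions for ordinary users is taken from the text.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    ADMIN = "administrator"
    USER = "ordinary_user"


class Action(Enum):
    CREATE_KNOWLEDGE_BASE = "create_knowledge_base"
    MANAGE_USERS = "manage_users"
    MANAGE_DOCUMENTS = "manage_documents"
    PARSE_DOCUMENTS = "parse_documents"
    ASK_QUESTION = "ask_question"


PERMISSIONS: Dict[Role, FrozenSet[Action]] = {
    Role.ADMIN: frozenset(Action),               # administrators may do everything
    Role.USER: frozenset({Action.ASK_QUESTION}),  # ordinary users may only ask
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the role is permitted to perform the action."""
    return action in PERMISSIONS[role]
```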

[0188] The knowledge base management module is used to manage access information for text embedding models, access information for large language models, and prompt word engineering templates for large language models.

[0189] The document management module is used to upload and download knowledge documents from the knowledge base.

[0190] When performing a search, the knowledge question answering module can first obtain the top 3 knowledge items with the highest matching degree as basic materials, and then process them through a large language model to generate answers.

[0191] It is worth noting that although this system / device only discloses the above-mentioned modules / units, it does not mean that this system / device is limited to the above-mentioned basic functional modules. On the contrary, what this invention intends to express is that, based on the above-mentioned basic functional modules, those skilled in the art can add one or more functional modules in combination with the prior art to form an infinite number of embodiments or technical solutions. That is to say, this system is open rather than closed. It cannot be assumed that the scope of protection of the claims of this invention is limited to the above-disclosed basic functional modules just because this embodiment only discloses a few basic functional modules.

[0192] In one specific embodiment, a method for constructing a commercial vehicle diagnostic and repair knowledge base with highly configurable knowledge document parsing rules is disclosed.

[0193] I. Creation of the knowledge base.

[0194] The creation process of a knowledge base involves uploading knowledge documents to the knowledge base, then processing them through a series of steps including document reading, cleaning, segmentation, knowledge generation, and knowledge storage, before finally saving the knowledge to the database.

[0195] Specifically as follows:

[0196] 1. Because the document processing flow is fixed, with only the processing details at each stage differing, this invention uses a template method to define a unified processing flow.

[0197] 2. The various processing stages in the process are decoupled using the bridge pattern to facilitate understanding and expansion.

[0198] 3. The specific processing at each stage is implemented using different methods in a strategy pattern.

[0199] 4. For the cleaning stage, the chain of responsibility pattern is used for flexible configuration and execution.

[0200] 5. The strategy objects for all stages are created uniformly using the factory pattern.

[0201] II. Diagnosis and Repair Knowledge Inquiry.

[0202] 1. Vectorize the question text and extract its keywords. Use the question vector to perform a vector similarity query in the knowledge base, obtaining the top 5 most similar results; use the question keywords to perform a keyword query in the knowledge base, obtaining the top 5 most similar results; combine and sort these results to obtain the answer materials for the question.

[0203] 2. Combine the question and the answer materials retrieved from the knowledge base into a prompt, and then have the large language model summarize the answer.

[0204] This application utilizes a plug-in design to implement a flexibly configurable knowledge document parsing process, addressing the unsatisfactory performance of traditional knowledge base implementations. A unified processing flow is defined using a template method; each processing stage within the flow is decoupled using a bridge pattern for ease of understanding and expansion; and different implementation methods are loaded for each stage's specific processing using a strategy pattern. Flexible rules, tailored to the document's content and characteristics, significantly improve the recall and accuracy of knowledge queries. A large language model is used for knowledge generation, resolving knowledge quality issues caused by poor document descriptions.

[0205] In this embodiment, the knowledge base usage process is shown in Figure 2:

[0206] 1. User Management: Manages user passwords and role information. Only administrators have permission to create knowledge bases, manage users, and manage documents; regular users only have the permission to ask questions.

[0207] 2. Knowledge Base Management: Used to manage access information for text embedding models used in the knowledge base; access information for large language models; and prompt word engineering templates used by large language models.

[0208] 3. Document Management: Used for uploading and downloading knowledge documents in the knowledge base.

[0209] 4. Document parsing: The core part of this application, a document processing function with highly configurable parsing rules.

[0210] 5. Knowledge Q&A: First, the question text is used to find the top 3 knowledge items that best match the question in the knowledge base; then, the question text and the retrieved knowledge items are passed to the predefined prompt word template. After the prompt words are generated, the large language model summarizes and polishes the answer.

[0211] In this embodiment, the unified document parsing process defined by the template method pattern and the bridge pattern is shown in Figure 3. The template method defines a unified processing flow, and the implementation of each step in the flow is injected into the processing flow by the bridge pattern. The processing flow is as follows:

[0212] 1. The document knowledge reading sub-process is executed to read different types of documents (xlsx, xls, docx, csv, pdf, markdown, txt) into a unified structure (list[document_content]) for subsequent processing. The specific reading process will be explained in later chapters.

[0213] The structure of document_content is as follows:

[0214] page_content: A string type used to store sub-content in a document.

[0215] metadata: A dictionary type used to store source information of the content, including (filename, page_content's location information in the document).

[0216] document_id: The primary key information of the document in the database.

[0217] knowledge_list: A list of knowledge items.

[0218] 2. Determine if content cleaning rules have been set. If so, execute the content cleaning sub-process; otherwise, directly execute the document chunking processing sub-process.

[0219] 3. The content cleaning sub-process cleanses each piece of content from step 1, including: removing invalid characters, URL information, email information, and redundant whitespace. The specific cleaning steps to be performed are specified by the user. The detailed content cleaning process will be explained in later chapters.

[0220] 4. The function of the document chunking sub-process is to divide the document content into segments whose length is supported by the large language model. Document chunking methods include: delimiter-based chunking; title-based chunking; and hierarchical chunking. The specific document chunking process will be explained in later chapters.

[0221] 5. The purpose of the knowledge generation sub-process is to extract knowledge from document blocks. Methods for knowledge extraction include: whole-block extraction; question-answer pair extraction; knowledge summarization using a large language model; and parent-child segment knowledge extraction. The specific knowledge generation process will be explained in later chapters.

[0222] 6. The purpose of the knowledge storage sub-process is to store knowledge using different methods. These methods include vector-based storage and full-text search-based storage. The specific knowledge storage process will be explained in later chapters.

[0223] In this embodiment, the UML class diagram of the bridge pattern is shown in Figure 4. The document parsing process is defined using the run method of the abstract class AbstractProcessor. The subclasses AutoProcessor and ProfessionalProcessor are responsible for specifying the specific implementation classes for the five stages (document reading: AbstractDocumentParser, content cleaning: AbstractContentCleaner, document splitting: AbstractSplitter, knowledge generation: AbstractKnowledgeExtractor, knowledge storage: AbstractKnowledgeSaver).

[0224] The subclass AutoProcessor selects the default implementation classes for the five stages based on the extension of the file to be processed.

[0225] The subclass ProfessionalProcessor creates the implementation classes for the five stages based on user-specified configuration, handing over the control details of the five stages to the user and thus providing highly configurable capabilities.

[0226] Documents are read differently depending on their type, but the results are returned in the same structure. The processing flow is shown in Figure 4: 1. Select the appropriate Python library based on the document type and open the document.

[0227] 2. Different types of documents require different levels of granularity in reading:

[0228] For Excel documents (xlsx, xls), read the contents of all sheets, treating each row of data as a document_content object;

[0229] For CSV documents, each row of data is treated as a document_content object;

[0230] For docx documents, each paragraph is treated as a document_content object;

[0231] For PDF documents, each page is treated as a document_content object;

[0232] For Markdown documents, each heading together with its section is treated as a single document_content object;

[0233] For a txt document, each line delimited by a newline character is treated as a document_content object.

[0234] 3. Encapsulate the document fragment content and the document fragment source location information into a document_content object.

[0235] 4. The final result returned is of type list[document_content].

[0236] In this embodiment, the specific document reading is implemented by dynamically injecting a strategy into the document parsing process. The strategy object is created using the factory pattern, as shown in Figure 6. The entire document parsing process does not need to concern itself with the specific implementation of the document reading strategy. The specific document reading strategy can be treated as a plugin, installed in a slot at a specified location within the parsing process. If a new type of document needs to be read later, only a new subclass of AbstractDocumentParser needs to be implemented.

[0237] The AbstractDocumentParser class generates specific strategy objects through the DocumentParserFactory factory class to complete the document reading process.
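The reading flow and its factory can be sketched with the Python standard library for two of the supported types (txt and csv). AbstractDocumentParser and DocumentParserFactory are named in this description; the method names, the reduced two-field DocumentContent, the metadata keys, and the sample file are assumptions. Formats such as docx, xlsx, and pdf would need third-party parsing libraries and are omitted here.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class DocumentContent:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class AbstractDocumentParser(ABC):
    def __init__(self, path: str) -> None:
        self.path = path

    @abstractmethod
    def read_fragments(self, handle) -> List[str]:
        """Return the fragments at the granularity of this document type."""

    def parse(self) -> List[DocumentContent]:
        name = Path(self.path).name
        # Open the document, read it, and close it when the block exits.
        with open(self.path, newline="", encoding="utf-8") as handle:
            fragments = self.read_fragments(handle)
        return [
            DocumentContent(text, {"filename": name, "position": index})
            for index, text in enumerate(fragments, start=1)
        ]


class TxtParser(AbstractDocumentParser):
    def read_fragments(self, handle) -> List[str]:
        # One fragment per non-empty line.
        return [line for line in handle.read().splitlines() if line.strip()]


class CsvParser(AbstractDocumentParser):
    def read_fragments(self, handle) -> List[str]:
        # One fragment per row.
        return [", ".join(row) for row in csv.reader(handle)]


class DocumentParserFactory:
    _parsers = {".txt": TxtParser, ".csv": CsvParser}

    @classmethod
    def create(cls, path: str) -> AbstractDocumentParser:
        suffix = Path(path).suffix.lower()
        if suffix not in cls._parsers:
            raise ValueError(f"unsupported document type: {suffix!r}")
        return cls._parsers[suffix](path)


# Usage: write a small made-up CSV to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "faults.csv"
    sample.write_text("code,fix\nP0101,Check MAF sensor\n", encoding="utf-8")
    fragments = DocumentParserFactory.create(str(sample)).parse()
```

Supporting a further document type would mean adding one parser subclass and one entry in the factory's table; parse and its callers stay the same.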

[0238] In this embodiment, after reading content fragments from a document, users can choose whether to perform content cleaning and which cleaning items to select. The processing flow is shown in Figure 7:

[0239] 1. Dynamically generate cleaning components based on the selected cleaning component name and save them to the cleaning component list.

[0240] 2. Iterate through the list[document_content] returned by the document reading process, and perform the following operation on each document_content object.

[0241] 3. Retrieve the content of the document fragment saved in the document_content object (page_content field).

[0242] 4. The content of the document fragment is taken as input and passed to the first cleaning component. Its output is used as the input parameter for the second cleaning component. After being processed by all cleaning components in sequence, the output of the last component is taken as the final processing result.

[0243] 5. Use the result from step 4 to update the page_content field of the document_content object from step 3.

[0244] 6. Once all elements of list[document_content] in step 2 have been processed, return the result of this list[document_content] type.

[0245] In this embodiment, the text cleaning process is designed using the chain of responsibility pattern, where text content flows sequentially through several designated cleaners. Which cleaners are used is configured by the strategy pattern, and their objects are created using the factory pattern, as shown in Figure 8. The entire document parsing process does not need to concern itself with the specific implementation of the content cleaning strategy. The content cleaning strategy can be treated as a plugin and installed in a slot at a specified location in the parsing process. This embodiment provides four cleaning strategies: invalid character cleaning, email address cleaning, URL address cleaning, and redundant whitespace cleaning. Multiple cleaners can be linked together using the add_next method of the cleaner object to perform sequential text content cleaning.

[0246] In a knowledge base, appropriately sized document blocks satisfy the text length limitations of large language models while ensuring the integrity of knowledge. Therefore, document segmentation is crucial for improving the effectiveness of the knowledge base. This embodiment provides multiple segmentation strategies and uses configuration to dynamically set segmentation rules, allowing for segmentation based on document characteristics. The processing flow is shown in Figure 9:

[0247] 1. Iterate through the list[document_content] result returned by the content cleaning subprocess, and perform the following operation on each document_content object.

[0248] 2. Retrieve the content of the document fragment saved in the document_content object (page_content field).

[0249] 3. For document fragments, the content is divided into chunks according to the configured strategy. The chunking strategies provided by this invention include: delimiter-based chunking; hierarchical chunking; and Markdown document title formatting. A chunk_list is returned after chunking.

[0250] 4. For each element of the chunk_list returned in step 3, create a new document_content object and set the element's value as the value of its page_content field. Save the newly created document_content objects to the list of chunked objects.

[0251] 5. Return the list of document_content objects after chunking, as the result of the document chunking process.

[0252] In this embodiment, the specific document segmentation process is implemented by dynamically injecting a strategy into the document parsing process. The strategy object is created using the factory pattern. The class diagram in Figure 10 illustrates the relationships:

[0253] The entire document parsing process does not need to concern itself with the specific implementation of the document segmentation strategy. The specific document segmentation strategy can be treated as a plugin, installed in a slot at a specified location within the parsing process. If a new document segmentation strategy is needed later, only a new subclass of AbstractSplitter needs to be implemented.

[0254] In this embodiment, generating reasonable knowledge content in the knowledge base helps improve the recall and accuracy of knowledge. This invention provides multiple knowledge generation strategies to address documents with different characteristics. The processing flow of the knowledge generation sub-process is shown in Figure 11:

[0255] 1. Iterate through the list of document_content objects returned by the document chunking sub-procedure, and perform the following operations on each document_content object.

[0256] 2. Retrieve the content of the document fragment saved in the document_content object (page_content field).

[0257] 3. For the content of document fragments, knowledge is generated according to the configured strategies. The knowledge generation strategies provided by this invention include: direct knowledge generation from document fragments; parent-child segmented knowledge generation; large language model-based knowledge generation; and question-and-answer pair knowledge generation.

[0258] 4. Use the knowledge generated in step 3 as a knowledge list and save it to the knowledge_list field of the document_content object.

[0259] 5. Return the updated list of document_content objects as the result of the knowledge generation process.

[0260] In this embodiment, the specific knowledge generation process is implemented by dynamically injecting a strategy into the document parsing process. The strategy object is created using the factory pattern. The class diagram in Figure 12 illustrates the relationships:

[0261] The entire document parsing process does not need to concern itself with the specific implementation of the knowledge generation strategy. The specific knowledge generation strategy can be treated as a plugin, installed in a slot at a designated location within the parsing process. If a new knowledge generation strategy is needed later, only a new subclass of AbstractKnowledgeExtractor needs to be implemented.

[0262] Generating reasonable knowledge content in a knowledge base helps improve the recall and accuracy of knowledge. This invention provides multiple knowledge generation strategies to address documents with different characteristics.

[0263] In this embodiment, the processing flow of the knowledge storage sub-process is shown in Figure 13:

[0264] 1. Iterate through the list of document_content objects returned by the knowledge generation sub-process, and perform the following operations on each document_content object.

[0265] 2. Retrieve the knowledge list information from the document_content object (knowledge_list field).

[0266] 3. Store the knowledge list (knowledge_list) according to the configured knowledge storage strategy. The knowledge storage strategies provided by this invention include: vector storage; keyword storage; and hybrid storage.

[0267] The specific knowledge storage process is implemented by dynamically injecting a strategy into the document parsing process. The strategy objects are created using the factory pattern. The class diagram in Figure 14 illustrates the relationships:

[0268] The entire document parsing process does not need to concern itself with the specific implementation of the knowledge storage strategy. The specific knowledge storage strategy can be treated as a plugin, installed in a slot at a designated location within the parsing process. Hybrid knowledge storage is achieved by separately calling two implementation classes: vector storage and keyword storage. If a new knowledge storage strategy is needed later, only a new subclass of AbstractKnowledgeSaver needs to be implemented.

[0269] Figure 15 is a structural diagram of an electronic device for the vehicle diagnosis and repair knowledge base construction method provided by one or more embodiments of the present invention.

[0270] As shown in Figure 15, this application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

[0271] The memory stores a computer program, which, when executed by a processor, causes the processor to perform steps of a method for constructing a vehicle diagnostic and repair knowledge base.

[0272] This application also provides a computer-readable storage medium storing a computer program executable by an electronic device, which, when run on the electronic device, causes the electronic device to perform the steps of a vehicle diagnostic and repair knowledge base construction method.

[0273] This application also provides a vehicle diagnostic and repair platform, including:

[0274] Electronic equipment used to implement the steps of constructing a vehicle diagnostic and repair knowledge base;

[0275] A processor used to run programs that, when running, execute the steps of the vehicle diagnostic knowledge base construction method based on data output from the electronic equipment;

[0276] Storage medium used to store programs that, when running, execute steps of a vehicle diagnostic knowledge base construction method based on data output from electronic devices.

[0277] The communication bus mentioned in the above electronic devices can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used to represent it in the diagram, but this does not indicate that there is only one bus or one type of bus.

[0278] The electronic device comprises a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on the operating system. The hardware layer includes hardware such as a central processing unit (CPU), a memory management unit (MMU), and memory. The operating system can be any one or more computer operating systems that control the electronic device through processes, such as Linux, Unix, Android, iOS, or Windows. Furthermore, in this embodiment of the invention, the electronic device can be a smartphone, tablet computer, or other handheld device, or a desktop computer, portable computer, or other electronic device; there is no particular limitation in this embodiment.

[0279] In this embodiment of the invention, the executing entity for electronic device control can be an electronic device itself, or a functional module within an electronic device capable of calling and executing a program. The electronic device can obtain the firmware corresponding to the storage medium. This firmware is provided by the supplier, and different storage media may have the same or different firmware; no limitation is made here. After obtaining the firmware corresponding to the storage medium, the electronic device can write this firmware into the storage medium; specifically, it burns the firmware corresponding to the storage medium into the storage medium. The process of burning the firmware into the storage medium can be implemented using existing technology, and will not be elaborated upon in this embodiment of the invention.

[0280] Electronic devices can also obtain reset commands corresponding to the storage media. The reset commands corresponding to the storage media are provided by the supplier. The reset commands corresponding to different storage media can be the same or different, and no restrictions are imposed here.

[0281] At this time, the storage medium of the electronic device is a storage medium on which the corresponding firmware has been written. The electronic device can respond to the reset command corresponding to the storage medium on which the corresponding firmware has been written, thereby resetting the storage medium on which the corresponding firmware has been written according to the reset command. The process of resetting the storage medium according to the reset command can be implemented by existing technology and will not be described in detail in this embodiment of the invention.

[0282] For ease of description, the above devices are described separately by function as various units and modules. Of course, in implementing this application, the functions of each unit and module can be implemented in one or more software and / or hardware.

[0283] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the meaning consistent with their meaning in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined.

[0284] For the sake of simplicity, the method embodiments are described as a series of actions. However, those skilled in the art should understand that the embodiments of the present invention are not limited to the described order of actions, because according to the embodiments of the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily essential to the embodiments of the present invention.

[0285] As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in various embodiments or some parts of the embodiments of this application.

[0286] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for constructing a vehicle diagnosis and repair knowledge base, characterized in that the method is based on configurable knowledge document parsing rules and includes:
Step S1, knowledge base creation: after a knowledge document is uploaded to the knowledge base, it undergoes standardized processing in sequence, including document reading, cleaning, segmentation, knowledge generation, and knowledge storage, after which the knowledge is saved to the database; wherein a template method is used to define a unified document parsing and processing flow, and the bridge pattern is used to decouple the processing stages of reading, cleaning, segmentation, knowledge generation, and knowledge storage; the specific processing at each stage uses different implementations based on the strategy pattern; the cleaning stage employs the chain-of-responsibility pattern so that the order of rule execution can be flexibly configured; and the strategies for all stages are created uniformly using the factory pattern;
Step S6, querying diagnosis and repair knowledge: after the text of the user's question is standardized, the knowledge base is searched and the corresponding answer is generated in combination with the large language model;
wherein the knowledge document parsing rules are configurable based on plug-in technology.

2. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that the template method, bridge pattern, strategy pattern, chain-of-responsibility pattern, and factory pattern are implemented as follows:
The template method defines a fixed process for document parsing through the run method of the abstract class AbstractProcessor, so that the processing details of each stage of reading, cleaning, segmentation, knowledge generation, and knowledge storage can be differentiated;
The bridge pattern injects the implementation classes of the five stages of document reading, cleaning, chunking, knowledge generation, and knowledge storage into the fixed process of step S1, with the specific implementation class of each stage specified by a subclass of AbstractProcessor;
The strategy pattern configures multiple replaceable implementation strategies for each processing stage and selects among them dynamically based on the document content and characteristics;
The chain-of-responsibility pattern builds multiple cleaning components into a linked list, so that document content flows sequentially through each cleaning component to complete multi-level cleaning;
The factory pattern sets up a dedicated factory class to dynamically create the corresponding strategy objects for each processing stage; these strategy objects are attached as plug-ins at specified locations in the parsing process.
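The interplay of the template method, bridge, and factory described in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: only AbstractProcessor and its run method are named in the claim, while make_stage, stage_names, DemoProcessor, the strategy names, and the simplified "list of strings in, list of strings out" stage signature are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Assumption: every stage maps a list of strings to a list of strings.
Stage = Callable[[List[str]], List[str]]


def make_stage(name: str) -> Stage:
    """Factory (hypothetical): build a stage strategy from its configured name."""
    registry: Dict[str, Stage] = {
        "read_lines": lambda items: [ln for text in items for ln in text.splitlines()],
        "strip_blank": lambda items: [s.strip() for s in items if s.strip()],
        "as_is": lambda items: list(items),
    }
    return registry[name]


class AbstractProcessor(ABC):
    """Template method: run() fixes the stage order; subclasses choose the stages."""

    STAGES = ("read", "clean", "split", "generate", "save")

    @abstractmethod
    def stage_names(self) -> Dict[str, str]:
        """Return the strategy name to use for each of the five stages."""

    def run(self, raw_documents: List[str]) -> List[str]:
        names = self.stage_names()
        data = raw_documents
        for stage in self.STAGES:  # fixed order, not overridden by subclasses
            # Bridge: the concrete implementation is injected at this point.
            data = make_stage(names[stage])(data)
        return data


class DemoProcessor(AbstractProcessor):
    """Hypothetical subclass that specifies the implementation of each stage."""

    def stage_names(self) -> Dict[str, str]:
        return {
            "read": "read_lines",
            "clean": "strip_blank",
            "split": "as_is",
            "generate": "as_is",
            "save": "as_is",  # a real save stage would write to a database
        }


result = DemoProcessor().run(["  Fault P0101 \n\n Check MAF sensor  "])
print(result)
```

Because run never changes, swapping a stage only requires a different name in stage_names or a new entry in the factory registry.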

3. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that document reading includes:
S11. Based on the type of the document to be processed, select the corresponding parsing library to open the document object; supported document types include xlsx, xls, docx, csv, pdf, markdown, and txt;
S12. Read the document content at a preset granularity and save it as a list of document fragments; Excel and csv files are read row by row, docx by paragraph, pdf by page, markdown by heading and section, and txt by line break;
S13. Encapsulate the document fragment content and its source location information into a document_content object and add it to the read result list;
S14. Close the document object and return a list[document_content] of uniform structure as the read result;
The document reading strategy object is created by the DocumentParserFactory factory class; when a new document type is added, a subclass of AbstractDocumentParser is implemented to complete the plug-in extension;
wherein the structure of the document_content object includes:
page_content: a string used to store a piece of content from the document;
metadata: a dictionary used to store source information for the content, including the filename and the location of page_content in the document;
document_id: a string or number used to store the primary key of the document in the database;
knowledge_list: a list used to store the knowledge extracted from page_content.
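A minimal sketch of the document_content structure and the reader factory is given below, assuming Python. The field names, DocumentParserFactory, and AbstractDocumentParser come from the claim; the class name DocumentContent, the parse, register, and create methods, and the metadata keys "line" and "row" are hypothetical. For self-containment the parsers take text already in memory, whereas S11 and S14 open and close a document object through a parsing library.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Type, Union


@dataclass
class DocumentContent:
    """Python rendering of the document_content object described in the claim."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    document_id: Optional[Union[str, int]] = None
    knowledge_list: List[str] = field(default_factory=list)


class AbstractDocumentParser(ABC):
    @abstractmethod
    def parse(self, filename: str, text: str) -> List[DocumentContent]:
        """Return one DocumentContent per fragment (hypothetical signature)."""


class TxtParser(AbstractDocumentParser):
    def parse(self, filename: str, text: str) -> List[DocumentContent]:
        # txt is read by line break
        return [DocumentContent(line, {"filename": filename, "line": i})
                for i, line in enumerate(text.splitlines(), start=1)]


class CsvParser(AbstractDocumentParser):
    def parse(self, filename: str, text: str) -> List[DocumentContent]:
        # csv is read row by row
        rows = csv.reader(io.StringIO(text))
        return [DocumentContent(", ".join(row), {"filename": filename, "row": i})
                for i, row in enumerate(rows, start=1)]


class DocumentParserFactory:
    _parsers: Dict[str, Type[AbstractDocumentParser]] = {"txt": TxtParser, "csv": CsvParser}

    @classmethod
    def register(cls, extension: str, parser_cls: Type[AbstractDocumentParser]) -> None:
        """Plug-in extension point for a new document type."""
        cls._parsers[extension] = parser_cls

    @classmethod
    def create(cls, filename: str) -> AbstractDocumentParser:
        extension = filename.rsplit(".", 1)[-1].lower()
        return cls._parsers[extension]()


parser = DocumentParserFactory.create("faults.csv")
fragments = parser.parse("faults.csv", "P0101,MAF sensor\nP0300,Misfire\n")
print(fragments[0])
```

Supporting another format then amounts to writing one more AbstractDocumentParser subclass and registering it under its file extension.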

4. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that document cleaning includes:
S21. Dynamically generate cleaning components based on the cleaning component names specified by the user and save them to the component list; the provided cleaning strategies include invalid character removal, email address removal, URL removal, and extra whitespace removal;
S22. Traverse the list[document_content] returned by the document reading sub-process and extract the page_content field of each object as the content to be cleaned;
S23. Pass the content to be cleaned to each cleaning component in sequence according to the chain-of-responsibility pattern, with the output of each component serving as the input of the next;
S24. Update the page_content field of the corresponding document_content object with the result produced after all cleaning components have run;
S25. After the traversal is complete, return the cleaned list[document_content];
The cleaning components are created using the factory pattern; the add_next method is used to build the linked list of cleaning components, which allows cleaning components to be added or removed and the execution order to be adjusted.
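The cleaning chain can be illustrated with a short sketch. Only add_next and the listed cleaning strategies come from the claim; the class names, the clean and clean_one methods, the CLEANERS registry, build_chain, and the regular expressions are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Optional, Type


class Cleaner:
    """One link in the chain; subclasses implement clean_one()."""

    def __init__(self) -> None:
        self._next: Optional["Cleaner"] = None

    def add_next(self, nxt: "Cleaner") -> "Cleaner":
        self._next = nxt
        return nxt  # return the new tail so the chain can keep growing

    def clean_one(self, text: str) -> str:
        raise NotImplementedError

    def clean(self, text: str) -> str:
        # The output of this component becomes the input of the next one.
        result = self.clean_one(text)
        return self._next.clean(result) if self._next else result


class UrlRemover(Cleaner):
    def clean_one(self, text: str) -> str:
        return re.sub(r"https?://\S+", "", text)


class EmailRemover(Cleaner):
    def clean_one(self, text: str) -> str:
        return re.sub(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", "", text)


class WhitespaceCollapser(Cleaner):
    def clean_one(self, text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()


# Hypothetical registry playing the role of the factory in S21.
CLEANERS: Dict[str, Type[Cleaner]] = {
    "url": UrlRemover,
    "email": EmailRemover,
    "whitespace": WhitespaceCollapser,
}


def build_chain(names: List[str]) -> Cleaner:
    """Create the components named by the user and link them in that order."""
    head = CLEANERS[names[0]]()
    tail = head
    for name in names[1:]:
        tail = tail.add_next(CLEANERS[name]())
    return head


chain = build_chain(["url", "email", "whitespace"])
cleaned = chain.clean(
    "Contact  service@example.com or see https://example.com/manual  for torque specs"
)
print(cleaned)
```

Reordering or dropping a rule only changes the list passed to build_chain; the components themselves stay untouched.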

5. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that document chunking includes:
S31. Iterate through the list[document_content] returned by the content cleaning sub-process and extract the page_content field of each object;
S32. Divide page_content into chunks according to the chunking strategy configured by the user and return the result as chunk_list; chunking strategies include delimiter-based chunking, hierarchical chunking, and Markdown heading-based chunking, and the chunking results are adapted to the text length limits of large language models;
S33. For each element in chunk_list, create a new document_content object with that element as page_content and save it to the list of chunked objects;
S34. Return the chunked list[document_content];
The chunking strategy object is created using the factory pattern; when a new chunking strategy is added, a subclass of AbstractSplitter is implemented to complete the plug-in extension.
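One way to read the delimiter-based strategy in S32 is sketched below. The function name split_by_delimiter, its parameters, and the use of a character count as a stand-in for a model's text length limit are assumptions; the claim does not specify how the limit is measured or enforced.

```python
from typing import List


def split_by_delimiter(text: str, delimiter: str = "\n\n", max_chars: int = 200) -> List[str]:
    """Split on a delimiter, then greedily pack pieces into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for piece in (p.strip() for p in text.split(delimiter)):
        if not piece:
            continue
        # A single piece longer than the limit is cut into fixed-size windows.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        candidate = f"{current}{delimiter}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # the piece still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "A" * 5 + "\n\n" + "B" * 5 + "\n\n" + "C" * 12
chunk_list = split_by_delimiter(sample, max_chars=12)
print(chunk_list)
```

Each element of chunk_list would then become the page_content of a new document_content object, as in S33.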

6. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that knowledge generation includes:
S41. Iterate through the list[document_content] returned by the document chunking sub-process and extract the page_content field of each object;
S42. Extract knowledge from page_content according to the knowledge generation strategy configured by the user and return a knowledge list; knowledge generation strategies include direct knowledge generation from document fragments, parent-child segmented knowledge generation, large language model-based knowledge generation, and question-and-answer pair knowledge generation;
S43. Write the knowledge list to the knowledge_list field of the corresponding document_content object;
S44. After the traversal is complete, return the updated list[document_content];
The knowledge generation strategy object is created by the KnowledgeExtractorFactory class; when a new knowledge generation strategy is added, a subclass of AbstractKnowledgeExtractor is implemented to complete the plug-in extension.
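The plug-in structure of knowledge generation might look like the following sketch. KnowledgeExtractorFactory and AbstractKnowledgeExtractor are named in the claim; the extract, register, and create methods, the two concrete extractors, the prompt wording, and fake_llm (a stand-in for a real model call) are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class AbstractKnowledgeExtractor(ABC):
    @abstractmethod
    def extract(self, page_content: str) -> List[str]:
        """Return the knowledge items found in one fragment."""


class DirectExtractor(AbstractKnowledgeExtractor):
    """Direct generation: the fragment itself is the single knowledge item."""

    def extract(self, page_content: str) -> List[str]:
        return [page_content]


class LlmExtractor(AbstractKnowledgeExtractor):
    """Model-based generation: one knowledge item per non-empty output line."""

    def __init__(self, complete: Callable[[str], str]) -> None:
        self._complete = complete  # assumption: any text-in, text-out callable

    def extract(self, page_content: str) -> List[str]:
        prompt = "List the repair facts in this text, one per line:\n" + page_content
        return [ln.strip() for ln in self._complete(prompt).splitlines() if ln.strip()]


class KnowledgeExtractorFactory:
    _registry: Dict[str, Type[AbstractKnowledgeExtractor]] = {
        "direct": DirectExtractor,
        "llm": LlmExtractor,
    }

    @classmethod
    def register(cls, name: str, extractor_cls: Type[AbstractKnowledgeExtractor]) -> None:
        """Plug-in extension point for a new generation strategy."""
        cls._registry[name] = extractor_cls

    @classmethod
    def create(cls, name: str, **kwargs) -> AbstractKnowledgeExtractor:
        return cls._registry[name](**kwargs)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real large language model call; returns canned text.
    return "Fault code P0101 indicates a MAF sensor problem.\n\nInspect the intake for leaks."


extractor = KnowledgeExtractorFactory.create("llm", complete=fake_llm)
knowledge = extractor.extract("P0101: MAF sensor. Check intake leaks.")
print(knowledge)
```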

7. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that knowledge storage includes:
S51. Iterate through the list[document_content] returned by the knowledge generation sub-process and extract the knowledge_list field of each object;
S52. Save the knowledge in knowledge_list to the knowledge base according to the knowledge storage strategy configured by the user; the provided knowledge storage strategies include vector storage, keyword storage, and hybrid storage, where hybrid storage is completed by calling the implementation classes of vector storage and keyword storage respectively;
S53. After the traversal is complete, the entire knowledge base creation process is finished;
The knowledge storage strategy object is created by the KnowledgeSaverFactory factory class; when a new storage strategy is added, a subclass of AbstractKnowledgeSaver is implemented to complete the plug-in extension;
The method also includes: AbstractProcessor has two subclasses, AutoProcessor and ProfessionalProcessor. AutoProcessor automatically bridges the default implementation classes of the five processing stages based on the file extension of the file to be processed, thus completing automated parsing. ProfessionalProcessor creates the implementation classes of the five processing stages based on the user's custom configuration, leaving the control details of each stage to the user and thus making the parsing rules highly configurable.
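Hybrid storage by delegation, as described in S52, can be sketched with in-memory stores. AbstractKnowledgeSaver is named in the claim; the save method, the concrete saver classes, the in-memory rows and index attributes, and toy_embed (a placeholder for a real text embedding model) are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Set, Tuple


class AbstractKnowledgeSaver(ABC):
    @abstractmethod
    def save(self, knowledge_list: List[str]) -> None:
        """Persist the knowledge items (hypothetical signature)."""


class VectorSaver(AbstractKnowledgeSaver):
    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self.rows: List[Tuple[List[float], str]] = []  # stands in for a vector store

    def save(self, knowledge_list: List[str]) -> None:
        for item in knowledge_list:
            self.rows.append((self._embed(item), item))


class KeywordSaver(AbstractKnowledgeSaver):
    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}  # stands in for an inverted index

    def save(self, knowledge_list: List[str]) -> None:
        for item in knowledge_list:
            for word in item.lower().split():
                self.index.setdefault(word.strip(".,;:"), set()).add(item)


class HybridSaver(AbstractKnowledgeSaver):
    """Hybrid storage: call the vector and keyword implementations in turn."""

    def __init__(self, vector_saver: VectorSaver, keyword_saver: KeywordSaver) -> None:
        self.vector_saver = vector_saver
        self.keyword_saver = keyword_saver

    def save(self, knowledge_list: List[str]) -> None:
        self.vector_saver.save(knowledge_list)
        self.keyword_saver.save(knowledge_list)


def toy_embed(text: str) -> List[float]:
    # Placeholder embedding; a real system would call a text embedding model.
    return [float(len(text)), float(text.count(" "))]


hybrid = HybridSaver(VectorSaver(toy_embed), KeywordSaver())
hybrid.save(["Replace the MAF sensor."])
print(len(hybrid.vector_saver.rows), sorted(hybrid.keyword_saver.index))
```

Composing the two existing savers keeps the hybrid strategy free of storage logic of its own, which matches the claim's wording that it calls the two implementation classes respectively.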

8. The method for constructing a vehicle diagnosis and repair knowledge base according to claim 1, characterized in that diagnosis and repair knowledge querying includes:
S61. Perform vectorization and keyword extraction on the user's question text;
S62. Use the question vector to perform vector similarity retrieval in the knowledge base and obtain the top 5 results with the highest similarity; at the same time, use the question keywords to perform keyword retrieval in the knowledge base and obtain the top 5 results with the highest similarity;
S63. Merge and rank the two sets of search results to obtain the basic answer materials from the knowledge base;
S64. Integrate the user's question and the basic answer materials into a prompt in a preset format and input it into the large language model;
S65. The large language model summarizes and refines the answer based on the prompt and the prompt engineering template, and generates the final diagnosis and repair knowledge answer;
In the knowledge question answering module, when a search is performed, the top 3 knowledge items with the highest matching degree are first obtained as basic materials and then processed by the large language model to generate the answer.
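Steps S63 and S64 can be illustrated as follows. The claim does not say how the two result lists are merged, so this sketch substitutes reciprocal rank fusion, a common technique for combining ranked lists; that choice, the constant k = 60, the function names, and the prompt wording are all assumptions rather than the claimed method.

```python
from typing import Dict, List


def fuse_rankings(vector_hits: List[str], keyword_hits: List[str],
                  k: int = 60, limit: int = 5) -> List[str]:
    """Merge two ranked lists with reciprocal rank fusion (assumed technique)."""
    scores: Dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        for rank, item in enumerate(hits, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:limit]


def build_prompt(question: str, materials: List[str]) -> str:
    """Place the question and the retrieved materials into a preset format."""
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(materials, start=1))
    return (
        "Answer the vehicle repair question using only the materials below.\n\n"
        f"Materials:\n{numbered}\n\n"
        f"Question: {question}"
    )


materials = fuse_rankings(["a", "b", "c"], ["b", "d"])
prompt = build_prompt("What does fault code P0101 mean?", materials)
print(materials)
print(prompt)
```

An item returned by both retrievers accumulates two contributions, so "b" outranks items that appear in only one list.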

9. The method for constructing a vehicle diagnosis and repair knowledge base according to any one of claims 1 to 8, characterized in that knowledge base creation and diagnosis and repair knowledge querying are carried out according to a standardized knowledge base usage process;
The process includes five modules (user management, knowledge base management, document management, document parsing, and knowledge Q&A) and defines two permission levels, administrator and ordinary user. Administrators have full permissions for knowledge base creation, user management, document management, document parsing, and knowledge Q&A, while ordinary users have only knowledge Q&A permissions;
The knowledge base management module is used to manage access information for text embedding models, access information for large language models, and prompt engineering templates for large language models;
The document management module is used to upload knowledge documents to the knowledge base and download them from it.
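The two permission levels can be expressed as a small lookup table. This is only a sketch of the access rule stated in the claim; the Module and Role enums, the PERMISSIONS mapping, and can_access are hypothetical names.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Module(Enum):
    USER_MANAGEMENT = "user management"
    KNOWLEDGE_BASE_MANAGEMENT = "knowledge base management"
    DOCUMENT_MANAGEMENT = "document management"
    DOCUMENT_PARSING = "document parsing"
    KNOWLEDGE_QA = "knowledge Q&A"


class Role(Enum):
    ADMINISTRATOR = "administrator"
    ORDINARY_USER = "ordinary user"


# Administrators may use every module; ordinary users only knowledge Q&A.
PERMISSIONS: Dict[Role, FrozenSet[Module]] = {
    Role.ADMINISTRATOR: frozenset(Module),
    Role.ORDINARY_USER: frozenset({Module.KNOWLEDGE_QA}),
}


def can_access(role: Role, module: Module) -> bool:
    return module in PERMISSIONS[role]


print(can_access(Role.ORDINARY_USER, Module.DOCUMENT_PARSING))
print(can_access(Role.ADMINISTRATOR, Module.DOCUMENT_PARSING))
```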

10. An electronic device, characterized in that it includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus; the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the vehicle diagnosis and repair knowledge base construction method according to any one of claims 1 to 9.

Citation Information

Patent Citations

CN118312627A: Commercial vehicle whole vehicle fault knowledge graph construction method, diagnosis and repair method and system
CN119417441A: Vehicle fault detection and maintenance method and device, electronic equipment and storage medium
