Object model generation method and apparatus, and knowledge base construction method and apparatus
By combining a large language model with an industrial knowledge base, and by applying hybrid dynamic prompt engineering and text embedding processing, the disclosure addresses large-model hallucination and limited access to professional knowledge in industrial object model construction, resulting in more accurate and efficient object model generation.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA UNITED NETWORK COMM GRP CO LTD
- Filing Date
- 2025-12-24
- Publication Date
- 2026-06-18
Description
Object model generation method, knowledge base construction method and device
[0001] This application claims priority to Chinese patent application No. 202510125476.6, filed on January 26, 2025, the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This disclosure relates to the field of model generation technology, and in particular to a method for generating object models, a method for constructing knowledge bases, and an apparatus.
Background Technology
[0003] An industrial object model is a way to digitally represent physical objects such as industrial equipment, production lines, and products in the context of the Industrial Internet and intelligent manufacturing. It is not merely a simple form of 3D modeling or visual simulation: an industrial object model also includes the behavioral characteristics, operating parameters, business logic, and related services and interfaces of these physical objects.
Summary of the Invention
[0004] Firstly, a method for generating an object model is provided, applied to an object model generation device. The object model generation device is communicatively connected to a knowledge base construction device, which pre-stores a knowledge base. The knowledge base includes a text database storing standard data and a vector database storing multiple first vectors. The multiple first vectors are obtained by performing text embedding processing on the standard data. The object model generation method includes: acquiring a user input instruction and performing text embedding processing on it to obtain a second vector, where the user input instruction is unstructured text data and the second vector is a sparse vector in a high-dimensional space; retrieving, based on the second vector, multiple third vectors from the multiple first vectors stored in the vector database, and obtaining retrieval results by filtering the standard data stored in the text database; and generating a target object model based on the user input instruction and the retrieval results through preset hybrid prompt engineering and a preset large language model.
[0005] In some embodiments, the hybrid prompt engineering includes static prompt engineering and dynamic prompt engineering. Generating a target object model based on the user input instruction and the retrieval results through preset hybrid prompt engineering and a preset large language model includes: obtaining the role, target question, target, and target constraints of the target object model based on the user input instruction; establishing the static prompt engineering based on the role, target question, target, and target constraints of the target object model; establishing the dynamic prompt engineering based on the retrieval results; and generating the target object model through the large language model based on the static and dynamic prompt engineering.
[0006] In some embodiments, establishing the dynamic prompt engineering based on the retrieval results includes: obtaining the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model based on the retrieval results; and establishing the dynamic prompt engineering based on the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model.
[0007] In some embodiments, generating the target object model through the large language model based on the static and dynamic prompt engineering includes: concatenating the user input instruction, the static prompt engineering, and the dynamic prompt engineering to obtain a target input instruction; and generating the target object model through the large language model based on the target input instruction.
[0008] In some embodiments, retrieving multiple third vectors from the multiple first vectors stored in the vector database based on the second vector includes: calculating the similarity between the second vector and each of the multiple first vectors according to a preset similarity algorithm; sorting the multiple first vectors in descending order of their similarity to the second vector to obtain multiple sorted first vectors; and extracting multiple third vectors from the sorted first vectors.
[0009] Secondly, a knowledge base construction method is provided, applied to a knowledge base construction apparatus. This method includes: acquiring standard data, performing text embedding processing on the standard data to obtain multiple first vectors; storing the standard data in a text database, and storing the multiple first vectors in a vector database to obtain a knowledge base. The knowledge base is used in the object model generation method of any embodiment of the first aspect described above.
[0010] In some embodiments, the standard data includes: labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data. Performing text embedding processing on the standard data to obtain multiple first vectors includes: classifying the labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data according to a preset classification standard to obtain multiple first text data; deduplicating the multiple first text data using a MinHash (minimum hash) algorithm to obtain multiple second text data; and performing text embedding processing on the multiple second text data to obtain multiple first vectors.
[0011] In some embodiments, performing text embedding processing on the multiple second text data to obtain multiple first vectors includes: performing text segmentation processing on the multiple second text data according to a preset text length to obtain multiple third text data; deduplicating the multiple third text data using a MinHash algorithm to obtain multiple fourth text data; and performing text embedding processing on the multiple fourth text data using a preset BERT-based general embedding model to obtain multiple first vectors.
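The segmentation and deduplication steps of [0011] can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: the function names, the character 3-gram shingling, the 64 seeded hash functions, and the 0.9 similarity threshold are all assumptions chosen for illustration, and the embedding step itself is omitted.

```python
import hashlib
from typing import List, Set


def split_text(text: str, max_len: int) -> List[str]:
    """Split text into consecutive chunks of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of the text (the whole text if it is shorter than k)."""
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed: int, gram: str) -> int:
    """Seeded 64-bit hash; each seed acts as one MinHash hash function."""
    data = f"{seed}:{gram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text: str, num_hashes: int = 64, k: int = 3) -> List[int]:
    """MinHash signature: the minimum hash value per seeded hash function."""
    grams = shingles(text, k)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(texts: List[str], threshold: float = 0.9) -> List[str]:
    """Keep a text only if it is not a near-duplicate of an already kept text."""
    kept: List[str] = []
    kept_sigs: List[List[int]] = []
    for text in texts:
        sig = minhash_signature(text)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept
```

In this sketch, `split_text` would be applied to each second text datum to produce third text data, and `deduplicate` to the resulting chunks to produce fourth text data. The pairwise comparison is quadratic in the number of texts; a production system would more likely use locality-sensitive hashing over the signatures.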
[0012] Thirdly, a device for generating an object model is provided. The object model generating device is configured to communicate with a knowledge base building device, which pre-stores a knowledge base. The knowledge base includes a text database storing standard data and a vector database storing multiple first vectors. The multiple first vectors are obtained by performing text embedding processing on the standard data. The object model generating device includes: an instruction acquisition module, a vector retrieval module, and a model generation module.
[0013] The instruction acquisition module is used to acquire user input instructions and perform text embedding processing on the user input instructions to obtain a second vector. User input instructions refer to unstructured text data, and the second vector refers to a sparse vector in a high-dimensional space.
[0014] The vector retrieval module is used to retrieve multiple third vectors from multiple first vectors stored in the vector database based on the second vector, and to filter the retrieval results from the standard data stored in the text database.
[0015] The model generation module is used to generate a target object model based on the user input instruction and the retrieval results, through preset hybrid prompt engineering and a preset large language model.
[0016] Fourthly, a knowledge base construction apparatus is provided. The knowledge base construction apparatus includes an acquisition module and a storage module.
[0017] The acquisition module is used to acquire standard data and perform text embedding processing on the standard data to obtain multiple first vectors.
[0018] The storage module stores the standard data in a text database and the multiple first vectors in a vector database to obtain a knowledge base. The knowledge base is used by the object model generation device of the third aspect.
[0019] Fifthly, an electronic device is provided. The electronic device includes a processor and a memory communicatively connected to the processor. The memory stores computer-executable instructions. The processor executes the computer-executable instructions stored in the memory to implement a method for generating an object model in any embodiment of the first aspect and a method for constructing a knowledge base in any embodiment of the second aspect.
[0020] Sixthly, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions. When executed by a processor, the computer-executable instructions are used to implement a method for generating an object model according to any embodiment of the first aspect and a method for constructing a knowledge base according to any embodiment of the second aspect.
[0021] In a seventh aspect, a computer program product is provided. The computer program product includes a computer program. When executed by a processor, the computer program is used to implement a method for generating an object model in any embodiment of the first aspect and a method for constructing a knowledge base in any embodiment of the second aspect.
Brief Description of the Drawings
[0022] To illustrate the technical solutions in the embodiments or related technologies of this disclosure more clearly, the accompanying drawings used in the description of the embodiments or related technologies are briefly introduced below. The drawings described below show only some embodiments of this disclosure; those skilled in the art can obtain other drawings based on these drawings without creative effort.
[0023] Figure 1 is a block diagram of an apparatus according to some embodiments;
[0024] Figure 2 is a flowchart of a method for generating an object model according to some embodiments;
[0025] Figure 3 is a flowchart of another object model generation method according to some embodiments;
[0026] Figure 4 is a flowchart of another object model generation method according to some embodiments;
[0027] Figure 5 is a flowchart of a knowledge base construction method according to some embodiments;
[0028] Figure 6 is a flowchart of another knowledge base construction method according to some embodiments;
[0029] Figure 7 is a flowchart of yet another knowledge base construction method according to some embodiments;
[0030] Figure 8 is a flowchart of another object model generation method according to some embodiments;
[0031] Figure 9 is a block diagram of a knowledge base according to some embodiments;
[0032] Figure 10 is a block diagram of an object model generation apparatus according to some embodiments;
[0033] Figure 11 is a block diagram of another object model generation apparatus according to some embodiments;
[0034] Figure 12 is a block diagram of a knowledge base construction apparatus according to some embodiments;
[0035] Figure 13 is a block diagram of another knowledge base construction apparatus according to some embodiments;
[0036] Figure 14 is a block diagram of an electronic device according to some embodiments.
[0037] Reference numerals: 100 - Object model generation device; 200 - Knowledge base construction device; 210 - Knowledge base; 211 - Text database; 212 - Vector database; 2111 - Topic classification library; 2112 - Industry classification library; 2113 - Application classification library; 2114 - Supplementary classification library; 810 - Instruction acquisition module; 820 - Vector retrieval module; 8201 - Similarity module; 8202 - Sorting module; 8203 - Extraction module; 830 - Model generation module; 8301 - Static engineering module; 8302 - Dynamic engineering module; 8303 - Target object model module; 8304 - Dynamic data module; 8305 - Dynamic creation module; 8306 - Target instruction module; 8307 - Generated model module; 910 - Acquisition module; 9101 - Classification module; 9102 - Deduplication module; 9103 - Embedding processing module; 9104 - Segmentation processing module; 9105 - Processing module; 9106 - Vector conversion module; 920 - Storage module; 1010 - Processor; 1020 - Memory; 1030 - Communication component; 1040 - Bus.
Detailed Description
[0038] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0039] In the embodiments of this disclosure, the terms "first" and "second" are used to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art will understand that the terms "first" and "second" do not limit the quantity or execution order, and that "first" and "second" do not necessarily imply difference. It should be noted that in the embodiments of this disclosure, words such as "exemplarily" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplarily" or "for example" in this disclosure should not be construed as being more preferred or advantageous than other embodiments or design solutions. Specifically, the use of words such as "exemplarily" or "for example" is intended to present related concepts by way of example. In the embodiments of this disclosure, "at least one" refers to one or more, and "more than one" refers to two or more.
[0040] It should be noted that the phrase "at the moment when..." in the embodiments of this disclosure can refer to the instant at which a certain situation occurs, or to a period of time after the occurrence of a certain situation; the embodiments of this disclosure do not limit this. Furthermore, the object model generation method and the knowledge base construction method provided in the embodiments of this disclosure are merely examples, and the object model generation method and the knowledge base construction method may also include more or less content.
[0041] To facilitate a clear description of the technical solutions in the embodiments of this disclosure, some terms and technologies involved in the embodiments of this disclosure will be briefly introduced below:
[0042] Industrial object modeling (IAM) typically refers to a method of digitally describing equipment, products, production lines, and even entire factories in the physical world, within scenarios such as the Industrial Internet, the Internet of Things (IoT), and smart manufacturing. This model serves as a bridge between physical assets and the digital world, helping engineers and developers understand, simulate, predict, and optimize industrial processes.
[0043] BGE (BAAI General Embedding) models are general embedding models based on Bidirectional Encoder Representations from Transformers (BERT). They aim to generate high-quality text embedding vectors that can be used for various natural language processing tasks, such as information retrieval, semantic similarity calculation, and question answering systems. BGE models capture deep semantic features of text by fine-tuning a pre-trained BERT model, and may in some cases be combined with other techniques to enhance their performance. The BGE model used in this disclosure can be the BGE-zh model, i.e., a BGE model optimized for Chinese.
[0044] BERT-based general embedding models are a technique that uses pre-trained language models to generate text embedding vectors. They capture contextual information in text through a bidirectional encoder, thus providing powerful semantic representations for natural language processing tasks. BERT-based general embedding models are widely used in various tasks, such as text classification, sentiment analysis, question answering systems, and named entity recognition.
[0045] An industrial knowledge base refers to a collection of knowledge established within a specific industrial field to support decision-making, optimize production processes, and improve product quality and efficiency. In this disclosure, the industrial knowledge base is a knowledge base that integrates various professional knowledge and model data, and is established according to certain classification standards.
[0046] Hybrid dynamic prompt engineering: In this disclosure, it refers to prompt engineering that combines static prompt engineering and dynamic prompt engineering.
[0047] Static prompt engineering is a subfield of prompt engineering. It focuses on designing fixed, predefined text prompts for large models that do not change dynamically based on context or user input. Static prompts are typically used in applications requiring consistency and predictability, such as automated question answering systems, text generation tasks, and classification tasks.
[0048] Dynamic prompt engineering is an advanced area of prompt engineering. It focuses on creating prompts that can adjust automatically based on context, user input, or other variables. Unlike static prompts, dynamic prompts can adapt their content to different conditions and scenarios, thus providing greater flexibility to meet diverse task requirements.
[0049] Large models: These typically refer to deep learning models in the fields of machine learning and artificial intelligence that have a very large number of parameters, huge training datasets, and high computational resource consumption.
[0050] Large language models: These typically refer to deep learning models in machine learning and artificial intelligence that have a very large number of parameters, huge training datasets, and high computational resource consumption. Due to their scale, these models have stronger expressive and generalization capabilities, can capture more complex patterns, and perform well on various tasks.
[0051] "Large-model hallucination" refers to the phenomenon where large language models, when generating text or providing information, may produce content that appears reasonable but is actually inaccurate, nonexistent, or misleading. This phenomenon typically occurs when the model makes inferences or extrapolations based on its internal parameters and training data, especially when dealing with content beyond its training scope or lacking sufficient contextual support. Large-model hallucination includes intrinsic and extrinsic hallucinations. An intrinsic hallucination is a conflict between the model's output and the input content, while an extrinsic hallucination is information generated by the model that cannot be verified against the input content.
[0052] In the industrial sector, from early manual production to mechanization and automation, and now to intelligent manufacturing, each transformation has greatly boosted productivity and promoted social progress.
[0053] Industrial object models, in the context of the Industrial Internet and intelligent manufacturing, refer to a method of digitally representing physical objects such as industrial equipment, production lines, and products in the physical world. By establishing such models, virtualized management, simulation, optimization, and intelligent operation of industrial assets can be achieved. Through industrial object models, combining human creativity with machine efficiency, new ways of working can be created, and the interconnectivity between devices makes the production process transparent and intelligent, thereby supporting remote monitoring and operation.
[0054] In related technologies, industrial object models are mainly constructed manually and then materialized and assembled to achieve a usable result; alternatively, large models combined with manually input knowledge are used to construct industrial object models. However, because of large-model hallucination, the model's output may conflict with the input content, or the information generated by the model may not be verifiable against the input content. Furthermore, the models acquire knowledge solely through manual input or publicly available knowledge on the internet, resulting in limited access to specialized knowledge and scarce data, which may lead to inaccurate model outputs.
[0055] To address the aforementioned issues, this disclosure provides a method for generating object models, a method for constructing knowledge bases, and an apparatus, which can be used in the field of model generation technology. The aim is to solve the problem that large models rely on a single source of professional knowledge, resulting in the inability to accurately generate object models.
[0056] Considering the widespread adoption of internet and artificial intelligence technologies, large-scale models are used to generate object models. In some embodiments of this disclosure, the object model generation approach is improved to achieve accurate object model generation.
[0057] Furthermore, the knowledge source for the object model has been optimized to an industry knowledge base to ensure that the object model can acquire comprehensive professional knowledge. This knowledge base not only covers various object model construction specifications and standards, but also includes professional knowledge from various fields and human-aided knowledge, thus providing a solid knowledge foundation for large models and helping to generate accurate and professional object models.
[0058] Furthermore, in some embodiments of the present disclosure, hybrid dynamic prompt engineering is introduced to improve the understanding of user input instructions, so that the large model can more readily understand the user's intent and the generated object model is more in line with actual needs and expectations, thereby further improving the accuracy of the object model.
[0059] The technical solutions of this disclosure and how they solve the aforementioned technical problems will be described in detail below. It is understood that the embodiments in this disclosure can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of this disclosure will now be described with reference to the accompanying drawings.
[0060] Figure 1 is a block diagram of an apparatus according to some embodiments. As shown in Figure 1, the object model generation apparatus 100 is communicatively connected to the knowledge base construction apparatus 200. The object model generation method is applied to the object model generation apparatus 100, and the knowledge base construction apparatus 200 pre-stores a knowledge base 210. The knowledge base 210 includes a text database 211 storing standard data and a vector database 212 storing multiple first vectors. The multiple first vectors are obtained by performing text embedding processing on the standard data.
[0061] Figure 2 is a flowchart of a method for generating an object model according to some embodiments. As shown in Figure 2, the method includes steps S201 to S203:
[0062] In S201, a user input instruction is acquired, and text embedding processing is performed on the user input instruction to obtain a second vector.
[0063] The user input instruction is unstructured text data, and the second vector is a sparse vector in a high-dimensional space.
[0064] For example, the user input instruction is the instruction entered by the user through the large model. The second vector is the question text vector obtained by converting the user input instruction into a vector with the BGE-zh model.
[0065] The system acquires the user input instruction and performs text embedding processing on it using the BGE model, encoding the user input instruction into the second vector.
[0066] In S202, based on the second vector, multiple third vectors are retrieved from multiple first vectors stored in the vector database, and the retrieval results are obtained by filtering from the standard data stored in the text database.
[0067] For example, the knowledge base contains a vector database that stores multiple first vectors. Furthermore, the first vectors in the vector database and the standard data in the text database have a mapping relationship between the standard data and the first vectors.
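The mapping relationship between first vectors and standard data can be pictured with a small sketch. This is only one possible layout, assumed for illustration: the class name `KnowledgeBase`, the use of a shared integer record id as the mapping key, and the in-memory dictionaries are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeBase:
    """Toy knowledge base: a text database and a vector database keyed by one id."""

    # Text database: record id -> standard data (assumed layout).
    text_db: Dict[int, str] = field(default_factory=dict)
    # Vector database: record id -> first vector (assumed layout).
    vector_db: Dict[int, List[float]] = field(default_factory=dict)

    def add(self, record_id: int, standard_text: str, first_vector: List[float]) -> None:
        """Store a standard datum and its first vector under the same record id."""
        self.text_db[record_id] = standard_text
        self.vector_db[record_id] = first_vector

    def text_for(self, record_id: int) -> str:
        """Follow the mapping from a vector's record id back to its standard data."""
        return self.text_db[record_id]
```

Because both stores share the record id, any vector selected during retrieval can be resolved to the standard data it was embedded from.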
[0068] The similarity between the second vector and each of the multiple first vectors in the vector database is calculated. Similarity calculation methods include cosine similarity and Euclidean distance.
[0069] The selected first vectors are sorted from highest to lowest similarity, and multiple elements from each first vector are taken to obtain multiple third vectors. An element refers to the value or other data structure of each first vector in a specific dimension.
[0070] Based on the mapping relationship between the third vector and the standard data, multiple standard data are obtained, which are the search results.
[0071] In S203, based on the user input instruction and the retrieval results, a target object model is generated through preset hybrid prompt engineering and a preset large language model.
[0072] For example, hybrid prompt engineering includes static prompt engineering and dynamic prompt engineering. Static prompt engineering includes multiple steps: role creation, target problem description, target setting, and target constraint addition. Dynamic prompt engineering includes multiple steps: acquiring equipment structure data, technical parameters, maintenance content, object model specifications, and other supplementary content. Other supplementary content can be the user manual content of the target object model input by the user, or other relevant content from the knowledge base.
[0073] The user input instruction and the preset hybrid prompts are used as input to the large language model, which generates the target object model.
[0074] In the object model generation method provided in some embodiments of this disclosure, a user input instruction is acquired and text embedding processing is performed on it to obtain a second vector. The user input instruction is unstructured text data, and the second vector is a sparse vector in a high-dimensional space. Based on the second vector, multiple third vectors are retrieved from the multiple first vectors stored in the vector database, and retrieval results are obtained by filtering the standard data stored in the text database. Based on the user input instruction and the retrieval results, a target object model is generated through preset hybrid prompt engineering and a preset large language model.
[0075] Converting the user input instruction into a vector supports complex calculations, allows even complex user input instructions to be processed, and facilitates subsequent retrieval. Furthermore, by searching the knowledge base and obtaining retrieval results, subsequent model generation can draw on more professional knowledge: precise information retrieval is performed on a vector database built from a professional knowledge base, and the useful information extracted helps generate the object model. In addition, through hybrid prompt engineering, the large model can more accurately understand the user's question and generate industrial object models that meet the user's requirements.
[0076] Figure 3 is a flowchart of another object model generation method according to some embodiments. As shown in Figure 3, step S202 includes steps S2021 to S2024:
[0077] In S2021, the similarity between the second vector and each of the first vectors is calculated according to the preset similarity algorithm.
[0078] For example, the similarity between the second vector and each of the first vectors can be calculated using a preset Euclidean distance algorithm or a cosine similarity algorithm.
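For reference, the two measures named here have the following standard definitions, written for a second vector q and a first vector v_i of dimension n. The symbols are chosen for illustration and do not appear in the disclosure. A larger cosine similarity means the vectors are more similar, whereas a smaller Euclidean distance means they are more similar, so a descending sort by similarity corresponds to an ascending sort by distance.

```latex
\[
\operatorname{sim}_{\cos}(q, v_i)
  = \frac{\sum_{j=1}^{n} q_j \, v_{i,j}}
         {\sqrt{\sum_{j=1}^{n} q_j^{2}} \; \sqrt{\sum_{j=1}^{n} v_{i,j}^{2}}},
\qquad
\operatorname{dist}_{\mathrm{euc}}(q, v_i)
  = \sqrt{\sum_{j=1}^{n} \left( q_j - v_{i,j} \right)^{2}}
\]
```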
[0079] In S2022, based on the similarity between the second vector and each first vector, the multiple first vectors are sorted in descending order to obtain the sorted multiple first vectors.
[0080] For example, the first vectors are sorted from highest to lowest similarity to the second vector to obtain a sorted set of first vectors.
[0081] In S2023, multiple third vectors are extracted from the sorted multiple first vectors.
[0082] For example, based on the second vector, multiple first vectors are retrieved from multiple first vectors stored in the vector database. From these sorted first vectors, multiple elements are extracted to form multiple third vectors. An element refers to the value or other data structure of each first vector in a specific dimension.
[0083] In S2024, search results are obtained by filtering from the standard data stored in the text database.
[0084] For example, the search results refer to the standard data corresponding to multiple third vectors.
[0085] Based on the mapping relationship between the standard data in the text database and the first vector, the retrieval results are obtained through multiple third vectors.
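Steps S2021 to S2024 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: it uses cosine similarity as the preset similarity algorithm, interprets the third vectors as the `top_k` highest-ranked first vectors, and models the mapping relationship as a shared record id. The function names and the `top_k` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    second_vector: Sequence[float],
    first_vectors: Dict[int, Sequence[float]],
    standard_data: Dict[int, str],
    top_k: int = 2,
) -> List[str]:
    """Return the standard data mapped to the top_k most similar first vectors."""
    # S2021: similarity between the second vector and every first vector.
    scored = [
        (cosine_similarity(second_vector, vec), record_id)
        for record_id, vec in first_vectors.items()
    ]
    # S2022: sort the first vectors in descending order of similarity.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # S2023: take the leading entries as the third vectors (assumed meaning).
    third_ids = [record_id for _, record_id in scored[:top_k]]
    # S2024: follow the mapping back to the standard data in the text database.
    return [standard_data[record_id] for record_id in third_ids]
```

A linear scan like this is adequate only for small collections; a vector database would normally answer the same query with an index.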
[0086] In some embodiments, as shown in Figure 3, step S203 includes steps S2031 to S2033.
[0087] In S2031, based on the user input instruction, the role, target problem, target, and target constraints of the target object model are obtained, and static prompt engineering is established based on the role, target problem, target, and target constraints of the target object model.
[0088] For example, based on the user input instruction and the retrieval results, a target object model is generated through preset hybrid prompt engineering and a preset large language model.
[0089] "Role creation" refers to the role the user expects the target object model to be built upon, such as an expert in the field. "Target problem" refers to the expected problem extracted based on user input instructions. "Target" refers to the target object model extracted based on user input instructions. "Target constraints" refers to the constraints imposed on the target object model extracted based on user input instructions.
[0090] The role, target problem, target, and target constraints of the target object model are extracted from the user input instruction, and static prompt engineering is then established based on them.
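As a rough sketch of what a static prompt built from these four items might look like, the class below renders them into a fixed template. The class name `StaticPrompt`, the field names, and the template wording are assumptions for illustration; the disclosure does not specify a template, and the extraction of the four items from the user input instruction is not shown.

```python
from dataclasses import dataclass


@dataclass
class StaticPrompt:
    """Fixed prompt skeleton filled with the four extracted items (assumed layout)."""

    role: str
    target_problem: str
    target: str
    target_constraints: str

    def render(self) -> str:
        """Render the static prompt as plain text, one item per line."""
        return "\n".join(
            [
                f"Role: {self.role}",
                f"Target problem: {self.target_problem}",
                f"Target: {self.target}",
                f"Target constraints: {self.target_constraints}",
            ]
        )
```

For example, `StaticPrompt("industrial equipment expert", "describe a centrifugal pump", "object model of the pump", "output JSON only").render()` yields a four-line prompt that stays the same regardless of what retrieval returns.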
[0091] In S2032, dynamic prompt engineering is established based on the retrieval results.
[0092] Figure 4 is a flowchart of another object model generation method according to some embodiments. In some embodiments, as shown in Figure 4, step S2032 includes steps S20321 and S20322.
[0093] In S20321, based on the search results, the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model are obtained.
[0094] For example, based on multiple standard data from the search results, we can obtain the equipment structure data, core parameter data, maintenance content data, object model specification data, and other supplementary content corresponding to the target object model.
[0095] In S20322, a dynamic prompting project is established based on the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model.
[0096] For example, a dynamic prompting project can be established based on the equipment structure data, core parameter data, maintenance content data, object model specification data, and other supplementary content corresponding to the target object model.
[0097] In S2033, the target object model is generated through the large language model based on the static prompting project and the dynamic prompting project.
[0098] In some embodiments, as shown in Figure 4, step S2033 includes steps S20331 and S20332.
[0099] In S20331, the user input command, static prompt project, and dynamic prompt project are concatenated to obtain the target input command.
[0100] For example, the target input command can be obtained by concatenating the user input command with the corresponding static prompt project and dynamic prompt project.
[0101] In S20332, the target object model is generated based on the target input instruction and through the large language model.
[0102] For example, the target input instruction can be used as input to a large language model, and the target object model can be generated through the large language model.
[0103] In the methods provided in some embodiments of this disclosure, by constructing both static and dynamic prompting projects, and by combining user input instructions, the knowledge acquired by the large language model can be more accurate, and the large language model can more accurately understand user questions and generate industrial object models that meet user requirements.
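Steps S2031 to S20331 can be illustrated with a minimal Python sketch. The class names, field names, and the plain-text rendering below are hypothetical; the disclosure only states which items each prompting project contains and that the three parts are concatenated, not how they are formatted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StaticPrompt:
    """Items extracted from the user input instruction (S2031)."""
    role: str
    target_problem: str
    target: str
    target_constraints: str

    def render(self) -> str:
        return "\n".join([
            f"Role: {self.role}",
            f"Target problem: {self.target_problem}",
            f"Target: {self.target}",
            f"Target constraints: {self.target_constraints}",
        ])


@dataclass
class DynamicPrompt:
    """Items extracted from the search results (S20321 and S20322)."""
    equipment_structure: str
    technical_parameters: str
    maintenance_content: str
    object_model_specifications: str
    supplementary: List[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Equipment structure: {self.equipment_structure}",
            f"Technical parameters: {self.technical_parameters}",
            f"Maintenance content: {self.maintenance_content}",
            f"Object model specifications: {self.object_model_specifications}",
        ]
        lines += [f"Supplementary: {item}" for item in self.supplementary]
        return "\n".join(lines)


def build_target_input(user_instruction: str,
                       static: StaticPrompt,
                       dynamic: DynamicPrompt) -> str:
    """Concatenate the three parts into the target input instruction (S20331)."""
    # The blank-line separator is an assumption made for readability.
    return "\n\n".join([user_instruction, static.render(), dynamic.render()])
```

The resulting string would then be passed to the large language model in S20332; the model call itself is omitted because the disclosure does not name a specific model or interface.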
[0104] Figure 5 is a flowchart of a knowledge base construction method according to some embodiments. As shown in Figure 5, the knowledge base construction method provided by some embodiments of this disclosure includes steps S401 and S402.
[0105] In S401, standard data is acquired and text embedding is performed on the standard data to obtain multiple first vectors.
[0106] For example, standard data includes industry object model library data, industry standard data, technical manual data, object model construction standard data, and other data.
[0107] The standard data is obtained, and text embedding processing is performed on it using the BGE model to obtain multiple first vectors.
[0108] In S402, standard data is stored in a text database, and multiple first vectors are stored in a vector database to obtain a knowledge base.
[0109] For example, standard data can be stored in a text database, and multiple first vectors can be stored in a vector database. The knowledge base then includes both a text database and a vector database, as shown in Figure 1. The content of the knowledge base can be continuously updated, either through manually inputting professional knowledge or by providing new professional knowledge published online.
[0110] In a knowledge base construction method provided in some embodiments of this disclosure, a professional industrial knowledge base is used to construct a vector database. Based on precise information retrieval, relevant information can be extracted to help generate industrial product models. The knowledge base is continuously updated, ensuring the timeliness and availability of the knowledge base data.
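As a rough illustration of S401 and S402, the sketch below keeps a text database and a vector database side by side under a shared record identifier. The `toy_embed` function is a deliberately simple stand-in (hashed bag of words) and is not the BGE model; the dictionary-based stores, the identifier scheme, and the vector dimension are likewise assumptions made only for this example.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Dict, List

DIM = 64  # assumed toy dimension, unrelated to any real embedding model


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for the embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class KnowledgeBase:
    text_db: Dict[int, str] = field(default_factory=dict)
    vector_db: Dict[int, List[float]] = field(default_factory=dict)

    def add(self, text: str) -> int:
        """Store one piece of standard data and its first vector (S401, S402)."""
        record_id = len(self.text_db)
        self.text_db[record_id] = text
        self.vector_db[record_id] = toy_embed(text)
        return record_id


def build_knowledge_base(standard_data: List[str]) -> KnowledgeBase:
    kb = KnowledgeBase()
    for text in standard_data:
        kb.add(text)
    return kb
```

Calling `add` again later corresponds to the continuous update described above: new text and its vector are appended without rebuilding the existing entries.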
[0111] Figure 6 is a flowchart of another knowledge base construction method according to some embodiments. As shown in Figure 6, step S401 includes steps S501 to S504.
[0112] In S501, standard data is obtained.
[0113] Standard data includes: labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data.
[0114] For example, standard data includes industry object model library data, industry standard data, technical manual data, object model construction standard data, and other data. All standard data has been manually annotated.
[0115] In S502, based on the labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data, classification is performed according to preset classification standards to obtain multiple first text data.
[0116] For example, standard data can be processed by text embedding to obtain multiple first vectors. The classification criteria may include subject, industry, and application.
[0117] The labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data are systematically organized and classified according to the classification standards to obtain multiple first text data, which facilitates retrieval augmentation for the large model and supports applications.
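The classification in S502 might be sketched as a simple grouping of labeled records. The `LabeledRecord` type, the category strings, and the fallback group name are hypothetical; the disclosure only says that the classification criteria may include subject, industry, and application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LabeledRecord(NamedTuple):
    text: str
    category: str  # assumed manual label, e.g. "topic", "industry", "application"


def classify(records: Iterable[LabeledRecord],
             allowed: Tuple[str, ...] = ("topic", "industry", "application"),
             fallback: str = "supplementary") -> Dict[str, List[str]]:
    """Group labeled standard data by category to obtain the first text data."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        # Records with an unknown label go to the assumed fallback group.
        key = record.category if record.category in allowed else fallback
        groups[key].append(record.text)
    return dict(groups)
```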
[0118] In S503, multiple first text data are deduplicated using the minimum hash (MinHash) algorithm to obtain multiple second text data.
[0119] For example, multiple first text data are deduplicated using a minimum hash algorithm to obtain multiple second text data. All second text data are then entered into the text database of the knowledge base in a uniform text format.
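A minimal sketch of minimum-hash deduplication is given below. The disclosure names only the algorithm; the word-shingle size, the number of hash functions, the use of seeded SHA-1 as the hash family, and the similarity threshold are all assumptions chosen for illustration.

```python
import hashlib
from typing import List, Set

NUM_HASHES = 64  # assumed signature length


def shingles(text: str, n: int = 3) -> Set[str]:
    """Split a text into overlapping word n-grams."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _seeded_hash(seed: int, item: str) -> int:
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text: str, num_hashes: int = NUM_HASHES) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(texts: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a text only if it is not a near-duplicate of an already kept text."""
    kept: List[str] = []
    kept_signatures: List[List[int]] = []
    for text in texts:
        sig = minhash_signature(text)
        if all(estimated_jaccard(sig, other) < threshold for other in kept_signatures):
            kept.append(text)
            kept_signatures.append(sig)
    return kept
```

This version compares each text against every kept signature, which is quadratic in the number of texts; a larger knowledge base would more likely pair the signatures with locality-sensitive hashing, which is outside the scope of this sketch.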
[0120] In S504, multiple second text data are processed by text embedding to obtain multiple first vectors.
[0121] Figure 7 is a flowchart of another knowledge base construction method according to some embodiments. In some embodiments, as shown in Figure 7, step S504 includes steps S5041 to S5043.
[0122] In S5041, multiple second text data are processed by text segmentation according to the preset text length to obtain multiple third text data.
[0123] For example, the preset text length refers to the maximum text length limit for BGE model encoding.
[0124] In this way, based on the preset text length, multiple second text data are processed by text segmentation to obtain multiple third text data.
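The segmentation in S5041 could look like the following sketch, which greedily packs words into segments no longer than a preset length. Measuring length in characters is an assumption made to keep the example self-contained; an embedding model's limit is normally expressed in tokens, and the function name is hypothetical.

```python
from typing import List


def segment_text(text: str, max_len: int) -> List[str]:
    """Split text into segments of at most max_len characters, on word boundaries."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    segments: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > max_len:
            if current:
                segments.append(current)
                current = ""
            segments.append(word[:max_len])
            word = word[max_len:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_len:
            current = candidate
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments
```

Each returned segment corresponds to one piece of third text data.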
[0125] In S5042, multiple third text data are deduplicated using the minimum hash algorithm to obtain multiple fourth text data.
[0126] For example, by using the minimum hash algorithm to remove duplicates from the multiple third text data, multiple fourth text data can be obtained.
[0127] In S5043, multiple fourth text data are processed by a pre-built BERT-based general embedding model to obtain multiple first vectors.
[0128] For example, the obtained fourth text data is processed by text embedding using a pre-built BERT general embedding model to obtain multiple first vectors. These multiple first vectors are stored in the vector database of the knowledge base.
[0129] In the methods provided in some embodiments of this disclosure, the acquired standard data is processed through various processing methods to obtain professional knowledge of the text database, multiple sets of vectors are obtained through the BGE model, and a vector database is constructed, which makes it easier for large models to perform accurate data retrieval.
[0130] The following example, a "25MN rapid forging equipment" in the industrial field, illustrates how to construct an industrial object model.
[0131] Figure 8 is a flowchart of another object model generation method according to some embodiments; Figure 9 is a block diagram of a knowledge base according to some embodiments. As shown in Figure 9, the text database 211 includes a topic classification database 2111, an industry classification database 2112, an application classification database 2113, and a supplementary classification database 2114. The topic classification database 2111 stores data by topic, including the high-end equipment, fire safety, and automobile manufacturing object model databases of the industry object model library. The industry classification database 2112 stores data by industry, including industry standards and technical manuals such as the automotive industry standard manual and the forging technology manual. The application classification database 2113 stores data by application, including object model construction standards such as the Open Platform Communications Unified Architecture (OPC UA) standard and the Asset Administration Shell (AAS) standard. The supplementary classification database 2114 stores supplementary information input by the user or other data in the knowledge base related to the target object model.
[0132] In this case, the object model generation method includes steps S601 to S606.
[0133] In S601, the user inputs the command "25MN rapid forging equipment" and determines the role.
[0134] A role refers to the role that the user expects to establish for the target object model, such as a factory worker at a 25MN rapid forging machine.
[0135] In S602, the user input command "25MN rapid forging equipment" is processed by text embedding through the BGE model to obtain the question text vector $E_{\text{query}}$.
[0136] In S603, the similarity between the question text vector $E_{\text{query}}$ and each of the multiple first vectors in the vector database is calculated.
[0137] Taking the cosine function algorithm as an example, the formula for calculating the similarity between the question text vector and the first vector is: $\text{Similarity}(E_{\text{query}}, E_{\text{candidate}}) = \langle E_{\text{query}}, E_{\text{candidate}} \rangle$ (1)
[0138] $E_{\text{candidate}}$ is the first candidate vector, $\text{Similarity}(E_{\text{query}}, E_{\text{candidate}})$ is the similarity between the question text vector and the first candidate vector, and $\langle E_{\text{query}}, E_{\text{candidate}} \rangle$ is the inner product of the question text vector and the first candidate vector.
[0139] Multiple first vectors can be filtered using formula (2) to obtain filtered first vectors: $\text{Similarity}(E_{\text{query}}, E_{\text{candidate}}) = \langle E_{\text{query}}, E_{\text{candidate}} \rangle \geq \theta$ (2)
[0140] $\theta$ is a preset similarity threshold, and its value is determined based on the actual situation. The filtered first vectors can form a vector set, i.e., $\{E_{\text{candidate}} \mid \langle E_{\text{query}}, E_{\text{candidate}} \rangle \geq \theta\}$.
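Formulas (1) and (2) can be sketched as follows. The example normalizes both vectors to unit length before taking the inner product, in which case the inner product equals the cosine similarity; whether the embedding model already outputs normalized vectors is an assumption, and the function names and dictionary layout are hypothetical.

```python
import math
from typing import Dict, List


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filter_by_threshold(e_query: List[float],
                        candidates: Dict[int, List[float]],
                        theta: float) -> Dict[int, float]:
    """Formula (1) for every candidate, then keep those satisfying formula (2)."""
    q = normalize(e_query)
    scores = {cid: inner_product(q, normalize(vec)) for cid, vec in candidates.items()}
    return {cid: score for cid, score in scores.items() if score >= theta}
```

The returned dictionary maps each surviving candidate identifier to its similarity, which is the input needed for the sorting in S604.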
[0141] In S604, multiple first vectors (such as the multiple first vectors obtained by filtering and screening as described above) are sorted in order of similarity from high to low to obtain multiple third vectors.
[0142] The multiple first vectors are sorted in descending order of similarity (i.e., in reverse order).
[0143] Then, the first k first vectors from the sorted first vectors are taken to obtain multiple third vectors. These multiple third vectors can form a final vector set, i.e., $\{E_{\text{top}1}, E_{\text{top}2}, \ldots, E_{\text{top}k}\}$, where $E_{\text{top}1}$ can be understood as the first of the sorted first vectors (i.e., the first vector with the highest similarity).
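The sorting and top-k selection of S604 might be written as below. The input is assumed to be a mapping from a first-vector identifier to its similarity with the question text vector; the function name and the identifier type are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k(similarities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Sort candidates by similarity from high to low and keep the first k."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

For example, `top_k({"a": 0.9, "b": 0.95, "c": 0.7}, 2)` returns `[("b", 0.95), ("a", 0.9)]`, so the first element plays the role of the highest-similarity vector.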
[0144] In S605, multiple text data are obtained based on the mapping relationship between text data (i.e., the aforementioned standard data) and the first vector within the knowledge base.
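One possible reading of the mapping in S605 is a shared identifier between the vector database and the text database. The sketch below assumes exactly that; the disclosure does not specify how the mapping is stored, and the function name is hypothetical.

```python
from typing import Dict, List


def fetch_texts(third_vector_ids: List[int], text_db: Dict[int, str]) -> List[str]:
    """Look up the text data mapped to each retrieved vector, keeping rank order."""
    seen = set()
    results: List[str] = []
    for vec_id in third_vector_ids:
        # Skip identifiers without a text entry and avoid returning a text twice.
        if vec_id in text_db and vec_id not in seen:
            seen.add(vec_id)
            results.append(text_db[vec_id])
    return results
```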
[0145] In S606, a target object model is generated based on the multiple text data and the user input instructions through the hybrid prompting engineering and the large model.
[0146] Based on the user input instructions, the relevant information of the static prompt project is obtained, for example: the role is a factory worker; the target problem is how to build an object model of the forging equipment; the target is the forging equipment; and the target constraint is 25MN.
[0147] Based on the multiple text data, the relevant information of the dynamic prompt project is obtained, for example: equipment structure data of the rapid forging equipment; technical parameter data of the rapid forging equipment; maintenance content data of the rapid forging equipment; object model specification data for the rapid forging equipment, drawn from the object model construction standards and industry standard data; and other supplementary content related to the rapid forging equipment, either from the knowledge base or from user input.
[0148] The target input instruction is obtained by concatenating the user input command, the static prompt project, and the dynamic prompt project. For example, the target input instruction is a data set composed of the user input command, the static prompt project, and the dynamic prompt project.
[0149] The target input command is used as input to the large model, and the object model of the 25MN rapid forging equipment is then generated through the large model.
[0150] In the methods provided in some embodiments of this disclosure, the generation time of a single object model is reduced and the overall development efficiency of device access is improved through the cooperation between knowledge bases and large models, thereby enabling the rapid generation of more accurate, professional, and user-relevant object models.
[0151] This disclosure embodiment can divide an electronic device or main control device into functional modules according to the above method examples. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing unit. The integrated unit can be implemented in hardware or as a software functional module. It should be noted that the module division in this disclosure embodiment is illustrative and only represents one logical functional division; other division methods may be used in actual implementation.
[0152] Some embodiments of this disclosure also provide an apparatus for generating object models.
[0153] As shown in Figure 1, the object model generation device is communicatively connected to the knowledge base construction device, which pre-stores a knowledge base. The knowledge base includes a text database storing standard data and a vector database storing multiple first vectors, which are obtained by embedding the standard data into text. Figure 10 is a block diagram of an object model generation device according to some embodiments. As shown in Figure 10, the object model generation device includes an instruction acquisition module 810, a vector retrieval module 820, and a model generation module 830.
[0154] The instruction acquisition module 810 is used to acquire user input instructions and perform text embedding processing on the user input instructions to obtain a second vector. The user input instructions refer to unstructured text data, and the second vector refers to a sparse vector in a high-dimensional space.
[0155] The vector retrieval module 820 is used to retrieve multiple third vectors from multiple first vectors stored in the vector database based on the second vector, and to filter the retrieval results from the standard data stored in the text database.
[0156] The model generation module 830 is used to generate a target object model based on user input instructions and search results, using a pre-set hybrid prompting engineering and a pre-set large language model.
[0157] In some embodiments, hybrid prompting engineering includes static prompting engineering and dynamic prompting engineering. In this case, Figure 11 is a block diagram of another object model generation apparatus according to some embodiments. As shown in Figure 11, the model generation module 830 includes: a static engineering module 8301, a dynamic engineering module 8302, and a target object model module 8303.
[0158] The static engineering module 8301 is used to obtain the role, target problem, target and target constraints of the target object model according to the user input instructions, and to create a static prompt project based on the role, target problem, target and target constraints of the target object model.
[0159] The dynamic engineering module 8302 is used to create a dynamic prompting project based on the search results.
[0160] The target object model module 8303 is used to generate a target object model based on the static prompting project and the dynamic prompting project, using a large language model.
[0161] In some embodiments, as shown in Figure 11, the dynamic engineering module 8302 includes a dynamic data module 8304 and a dynamic creation module 8305.
[0162] The dynamic data module 8304 is used to obtain the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model based on the search results.
[0163] The dynamic creation module 8305 is used to create dynamic prompting projects based on the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model.
[0164] In some embodiments, as shown in Figure 11, the target object model module 8303 includes a target instruction module 8306 and a model generation module 8307.
[0165] The target instruction module 8306 is used to concatenate the user input instructions, the static prompting project, and the dynamic prompting project to obtain the target input instruction.
[0166] The model generation module 8307 is used to generate a target object model based on the target input instructions and through a large language model.
[0167] In some embodiments, as shown in Figure 11, the vector retrieval module 820 includes: a similarity module 8201, a sorting module 8202, and an extraction module 8203.
[0168] The similarity module 8201 is used to calculate the similarity between the second vector and each of the first vectors according to a preset similarity algorithm.
[0169] The sorting module 8202 is used to sort multiple first vectors in reverse order based on the similarity between the second vector and each first vector, so as to obtain multiple sorted first vectors.
[0170] The extraction module 8203 is used to extract multiple third vectors from multiple sorted first vectors.
[0171] The object model generation apparatus provided in some embodiments of this disclosure can execute the object model generation method in the above embodiments. Its implementation principle and technical effect are similar, and will not be described again here.
[0172] Figure 12 is a block diagram of a knowledge base construction apparatus according to some embodiments. As shown in Figure 12, the knowledge base construction apparatus includes an acquisition module 910 and a storage module 920.
[0173] The acquisition module 910 is used to acquire standard data and perform text embedding processing on the standard data to obtain multiple first vectors.
[0174] Storage module 920 is used to store standard data into a text database and store multiple first vectors into a vector database to obtain a knowledge base.
[0175] In some embodiments, the standard data includes: labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data. In this case, Figure 13 is a block diagram of another knowledge base construction apparatus according to some embodiments. As shown in Figure 13, the acquisition module 910 includes: a classification module 9101, a deduplication module 9102, and an embedding processing module 9103.
[0176] The classification module 9101 is used to perform classification according to preset classification standards, based on the labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data, to obtain multiple first text data.
[0177] The deduplication module 9102 is used to deduplicate multiple first text data using a minimum hash algorithm to obtain multiple second text data.
[0178] The embedding processing module 9103 is used to perform text embedding processing on multiple second text data to obtain multiple first vectors.
[0179] In some embodiments, as shown in Figure 13, the embedding processing module 9103 includes: a segmentation processing module 9104, a processing module 9105, and a vector conversion module 9106.
[0180] The segmentation processing module 9104 is used to perform text segmentation processing on multiple second text data according to a preset text length to obtain multiple third text data.
[0181] The processing module 9105 is used to perform deduplication on multiple third text data using a minimum hash algorithm to obtain multiple fourth text data.
[0182] The vector transformation module 9106 is used to perform text embedding processing on multiple fourth text data through a pre-set BERT-based general embedding model to obtain multiple first vectors.
[0183] This disclosure provides a knowledge base construction apparatus in some embodiments, which can execute the knowledge base construction method of the above embodiments. Its implementation principle and technical effect are similar, and will not be described again here.
[0184] In the aforementioned embodiments of the object model generation method and the knowledge base construction method, each module can be implemented as a processor. The processor can execute computer execution instructions stored in the memory, thereby enabling the processor to execute the aforementioned object model generation method and the knowledge base construction method.
[0185] Figure 14 is a block diagram of an electronic device according to some embodiments. As shown in Figure 14, the electronic device 2000 includes at least one processor 1010 and a memory 1020. The electronic device also includes a communication component 1030. The processor 1010, the memory 1020, and the communication component 1030 are connected via a bus 1040.
[0186] At least one processor 1010 executes computer execution instructions stored in memory 1020, causing at least one processor 1010 to execute an object model generation method and a knowledge base construction method in any of the above embodiments.
[0187] The implementation process of processor 1010 can be found in the above method embodiments, and its implementation principle and technical effect are similar, so they will not be repeated here.
[0188] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this disclosure can be directly manifested as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.
[0189] The memory may include high-speed random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.
[0190] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.
[0191] The above description addresses the functions implemented by electronic devices and main control devices, and introduces the solutions provided by the embodiments of this disclosure. It is understood that, in order to achieve the above functions, the electronic device or main control device includes at least one of the hardware structures or software modules corresponding to each function. By combining the units and algorithm steps of the various examples described in the embodiments of this disclosure, the embodiments of this disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the technical solutions of the embodiments of this disclosure.
[0192] Some embodiments of this disclosure also provide a computer-readable storage medium (such as a non-transitory computer-readable storage medium) storing computer-executable instructions. When executed by a processor, the computer-executable instructions are used to implement a method for generating an object model and a method for constructing a knowledge base in any of the above embodiments.
[0193] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof.
[0194] Examples of readable storage media include Static Random Access Memory (SRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), Erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), magnetic storage, flash memory, magnetic disks, or optical disks. Readable storage media can be any available medium accessible to general-purpose or special-purpose computers.
[0195] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an application-specific integrated circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in an electronic device or a host device.
[0196] Some embodiments of this disclosure also provide a computer program product. The computer program product includes a computer program stored in a readable storage medium, and at least one processor of an electronic device can read the computer program from the readable storage medium. When executed by the processor, the computer program implements the object model generation method and the knowledge base construction method of any of the above embodiments.
[0197] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disk, or optical disk. The technical solutions of this disclosure have been described above with reference to the exemplary embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of this disclosure is obviously not limited to these specific embodiments. The above embodiments are only used to illustrate the technical solutions of this disclosure and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this disclosure.
Claims
1. A method for generating an object model, applied to an object model generating device, wherein the object model generating device is communicatively connected to a knowledge base building device, the knowledge base building device pre-stores a knowledge base; the knowledge base includes a text database storing standard data and a vector database storing multiple first vectors, wherein the multiple first vectors are obtained by embedding the standard data into text; wherein, The object model generation method includes: The user input instruction is obtained and the user input instruction is processed by text embedding to obtain a second vector; wherein the user input instruction refers to unstructured text data and the second vector refers to a sparse vector in a high-dimensional space. Based on the second vector, multiple third vectors are retrieved from the plurality of first vectors stored in the vector database, and retrieval results are obtained by filtering from the standard data stored in the text database; Based on the user input instructions and the search results, a target object model is generated using a preset hybrid prompting process and a preset large language model.
2. The object model generation method according to claim 1, wherein the hybrid prompting engineering includes static prompting engineering and dynamic prompting engineering; and the step of generating the target object model based on the user input instructions and the search results, using the preset hybrid prompting engineering and the preset large language model, includes: based on the user input instructions, obtaining the role, target problem, target, and target constraints of the target object model, and establishing the static prompting project based on the role, target problem, target, and target constraints of the target object model; based on the search results, establishing the dynamic prompting project; and based on the static prompting project and the dynamic prompting project, generating the target object model through the large language model.
3. The object model generation method according to claim 2, wherein the step of establishing the dynamic prompting project based on the search results includes: based on the search results, obtaining the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model; and based on the equipment structure, technical parameters, maintenance content, and object model specifications of the target object model, establishing the dynamic prompting project.
4. The object model generation method according to claim 2 or 3, wherein, The step of generating the target object model based on the static prompting project and the dynamic prompting project, through the large language model, includes: The user input command, the static prompt project, and the dynamic prompt project are concatenated to obtain the target input command. Based on the target input instruction, the target object model is generated using the large language model.
5. The object model generation method according to any one of claims 1 to 4, wherein, The step of retrieving the plurality of third vectors from the plurality of first vectors stored in the vector database based on the second vector includes: According to a preset similarity algorithm, the similarity between the second vector and each of the plurality of first vectors is calculated. Based on the similarity between the second vector and each of the first vectors, the plurality of first vectors are sorted in reverse order to obtain a plurality of sorted first vectors; Extract the plurality of third vectors from the sorted plurality of first vectors.
6. A knowledge base construction method, applied to a knowledge base construction apparatus, the knowledge base construction method comprising: Obtain standard data and perform text embedding processing on the standard data to obtain multiple first vectors; The standard data is stored in a text database, and the plurality of first vectors are stored in a vector database to obtain the knowledge base; wherein the knowledge base is used in the object model generation method according to any one of claims 1 to 5.
7. The knowledge base construction method according to claim 6, wherein, The standard data includes: labeled industry standard data, labeled industry object model library data, and labeled object model construction standard data. The standard data is then subjected to text embedding processing to obtain the plurality of first vectors, including: Based on the labeled industry standard data, the labeled industry object model library data, and the labeled object model construction standard data, multiple first text data are obtained by classifying them according to preset classification standards. The multiple first text data are deduplicated using the minimum hash algorithm to obtain multiple second text data. Text embedding processing is performed on the plurality of second text data to obtain the plurality of first vectors.
8. The knowledge base construction method according to claim 7, wherein the step of performing text embedding processing on the plurality of second text data to obtain the plurality of first vectors comprises: processing the plurality of second text data into a plurality of third text data based on a preset text length; deduplicating the plurality of third text data using the minimum hash algorithm to obtain a plurality of fourth text data; and performing text embedding processing on the plurality of fourth text data, using a preset BERT-based general embedding model, to obtain the plurality of first vectors.
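As a non-limiting illustration of processing text data based on a preset text length, as recited in claim 8, the following minimal sketch assumes that the processing is a fixed-size split by character count. The claim does not specify the unit of length or the splitting rule, so the `max_chars` parameter and both function names are hypothetical.

```python
def split_by_length(text, max_chars=200):
    """Split one text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def to_third_text_data(second_text_data, max_chars=200):
    """Flatten the chunks of every input text into one list, preserving order."""
    chunks = []
    for text in second_text_data:
        chunks.extend(split_by_length(text, max_chars))
    return chunks


if __name__ == "__main__":
    print(to_third_text_data(["abcdefgh", "xy"], max_chars=3))
```

Splitting can produce identical chunks from different source texts, which is consistent with the claim applying deduplication a second time after this step.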
9. An object model generation apparatus, wherein the object model generation apparatus is communicatively connected to a knowledge base construction apparatus that pre-stores a knowledge base; the knowledge base comprises a text database storing standard data and a vector database storing a plurality of first vectors, the plurality of first vectors being obtained by performing text embedding processing on the standard data; and the object model generation apparatus comprises: an instruction acquisition module configured to acquire user input instructions and perform text embedding processing on the user input instructions to obtain a second vector, wherein the user input instructions refer to unstructured text data and the second vector refers to a sparse vector in a high-dimensional space; a vector retrieval module configured to retrieve a plurality of third vectors from the plurality of first vectors stored in the vector database based on the second vector, and to filter retrieval results from the standard data stored in the text database; and a model generation module configured to generate a target object model based on the user input instructions and the retrieval results, through preset hybrid prompt engineering and a preset large language model.
10. A knowledge base construction apparatus, comprising: an acquisition module configured to acquire standard data and perform text embedding processing on the standard data to obtain a plurality of first vectors; and a storage module configured to store the standard data in a text database and the plurality of first vectors in a vector database to obtain a knowledge base; wherein the knowledge base is used in the object model generation apparatus according to claim 9.