An HPC knowledge question and answer method and system, device and storage medium based on retrieval enhancement generation and terminology system construction
By constructing an HPC terminology system and a vectorized knowledge base, combined with a large language model, the problem of insufficient professionalism and logical depth in answers in the field of high-performance computing is solved, achieving accurate synthesis and deep reasoning, and generating professional and reliable answers.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHENGZHOU UNIV
- Filing Date
- 2026-04-16
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies cannot effectively perform deep conceptual associations and reasoning in the field of high-performance computing, resulting in insufficient professionalism and logical depth in the answers. Furthermore, the granularity of the retrieval unit does not match the needs of precise question answering in the domain, increasing the risk of generating errors.
We construct an HPC terminology system, identify key terms in queries and their superordinate and equivalent concepts to generate semantically enhanced query representations, retrieve them in a vectorized knowledge base, and generate answers by combining them with a large language model.
It achieves a deep understanding and accurate retrieval of concepts in the field of high-performance computing, generating logically rigorous and comprehensive in-depth analysis answers, thus improving the professionalism and reliability of the answers.
Figure CN122196141A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of retrieval enhancement generation (retrieval-augmented generation, RAG) technology, and specifically relates to an HPC knowledge question answering method, system, device, and storage medium based on retrieval enhancement generation and terminology system construction.
Background Technology
[0002] High-performance computing (HPC) is a critical infrastructure supporting cutting-edge scientific research and major engineering innovations. Its technological system encompasses complex heterogeneous hardware architectures, diverse parallel programming models, and in-depth performance tuning, forming extremely high professional barriers. Within this field, developers and operations personnel have an urgent need for intelligent question-answering systems that can provide immediate, accurate, and reliable professional knowledge support.
[0003] While generative AI, exemplified by Large Language Models (LLMs), has demonstrated powerful capabilities in general natural language processing tasks, its inherent knowledge limitations and "hallucination" problems become particularly prominent when it is applied directly to highly specialized fields like HPC. First, there is a deficiency in the timeliness of knowledge: the static knowledge encapsulated in the model parameters is limited by the cutoff date of the pre-training data, making it difficult to cover the rapidly iterating hardware characteristics (such as new accelerators), software tools (such as CUDA and ROCm version updates), and scheduling systems (such as new features in Slurm) in the HPC field. Second, there is a deficiency in factual reliability: without sufficient domain knowledge guidance, the model is highly prone to generating seemingly reasonable but actually erroneous or fictitious specialized content, such as recommending invalid MPI communication functions, incorrect compiler optimization options, or non-existent system commands. Such errors can lead to serious computational errors or wasted resources in HPC production environments, which demand high reliability.
[0004] To improve the performance of large models in specific domains, existing technologies have mainly attempted two paths: domain knowledge-enhanced pre-training (such as continuing to train the model on specialized corpora) and supervised fine-tuning. However, both methods have significant drawbacks: 1) Knowledge updates are difficult and costly: Once the model is trained, the knowledge is fixed, and incorporating new knowledge requires time-consuming retraining or fine-tuning, making it difficult to adapt to the rapid evolution of HPC technology; 2) There is a risk of "catastrophic forgetting": While injecting domain knowledge, the original general capabilities of the model may be weakened; 3) Poor credibility and traceability of answers: The process of the model generating answers is like a "black box," unable to provide specific evidence to support the conclusions, reducing the trust of professionals.
[0005] Retrieval-Augmented Generation (RAG) technology provides a framework solution to the aforementioned problems by combining the general reasoning capabilities of a large model with the specialization and updatable nature of an external knowledge base. Its basic paradigm is as follows: first, relevant document fragments are retrieved from an external knowledge base; then, these fragments are used as evidentiary context and input into the large model along with the user's question to generate a verifiable answer. This technology stores knowledge in an external vector database; when knowledge is updated, only the database needs to be updated, without adjusting the model itself, thus decoupling knowledge acquisition from model capabilities. Existing research shows that RAG technology can effectively improve the factual accuracy of generated answers in tasks such as open-domain question answering.
[0006] Chinese invention patent application CN120910219A, published on November 7, 2025, discloses a text question-answering method and system based on hybrid retrieval enhancement generation. As shown in Figure 1, the system comprises a query encoder, a hybrid retrieval module (integrating a vector retrieval unit and a keyword retrieval unit), a result fusion module, and an answer generator. When a user inputs a query, the system concurrently performs semantic retrieval based on dense vectors and precise keyword retrieval, fuses and deduplicates the results, and generates a final answer based on the fused context. That application aims to improve the comprehensiveness and timeliness of information retrieval through a hybrid retrieval strategy, thereby enhancing the accuracy and reliability of the generated answer. However, its core retrieval unit still relies on direct segmentation and indexing of the original document, and it does not perform deep structural modeling of the complex, hierarchical knowledge systems of specific vertical fields (such as high-performance computing). As a result, when faced with highly specialized queries involving closely related concepts, the system struggles to understand the deep semantic connections between terms and cannot make the leap from keyword matching to concept understanding. For example, for complex problems in the field of high-performance computing involving "performance tuning of hybrid parallel programming models," the document fragments retrieved by this method may be fragmented and lack a systematic structure, failing to actively associate core concepts such as MPI, OpenMP, and CUDA and their hierarchical relationships, thus limiting the professional depth and accuracy of the generated answer. Furthermore, its document-block-level retrieval granularity may introduce irrelevant noise, increasing the risk of factual hallucinations by the large language model.
Therefore, when applied to fields such as high-performance computing with rigorous and rapidly iterating knowledge systems, this scheme suffers from insufficient question-answering accuracy and weak professional explanatory power because its retrieval units are still limited to flat document slices and cannot achieve accurate retrieval through deep conceptual connections between terms (such as syntagmatic substitution).
[0007] However, although hybrid retrieval enhancement generation schemes, represented by CN120910219A, have optimized the general retrieval level and attempted to improve the breadth of information retrieval by fusing multiple retrieval signals, the fundamental defects in their knowledge organization and understanding patterns remain unresolved when such general RAG frameworks or their improvements are directly applied to the field of high-performance computing (HPC). They still face the following technical shortcomings due to insufficient adaptability:
[0008] 1. The knowledge organization model is flat, making it difficult to support complex semantic understanding and associative reasoning in the domain. Existing solutions (including CN120910219A) often consist of fragmented document chunks or question-and-answer sessions focusing on single technical points. Their core is to index text fragments, lacking systematic modeling and hierarchical organization of the entire technology stack in the HPC domain, from hardware architecture and system software to application optimization. This results in fragmented, isolated knowledge fragments lacking a systematic context when facing complex professional problems requiring in-depth technical analysis or cross-domain knowledge association (e.g., analyzing the performance trade-offs and adaptation scenarios of different parallel programming models on specific heterogeneous hardware). Because the generative model lacks structured conceptual network support, it struggles to perform accurate synthesis, comparison, and deep reasoning, thus affecting the professionalism and logical depth of the answers.
[0009] 2. The mismatch between the granularity of the retrieval unit and the demand for precise question answering in the domain exacerbates the uncertainty and risk of generating false answers. General RAG systems typically use fixed-length original document paragraphs as retrieval units, and the hybrid retrieval method adopted in CN120910219A does not change this basic unit. This coarse-grained retrieval method inevitably introduces a large amount of text irrelevant to the core question as noise into the context. This not only increases the processing burden and interference on large language models but also significantly increases the risk of the model fabricating content from incomplete or irrelevant information (i.e., generating factual hallucinations). In domains like HPC, which require highly accurate answers and have extremely low tolerance for error, this risk makes it difficult to guarantee the accuracy, reliability, and consistency of the generated answers.
Summary of the Invention
[0010] The purpose of this invention is to provide an HPC knowledge question answering method, system, device, and storage medium based on retrieval enhancement generation and terminology system construction, in order to solve the problem in the prior art that the lack of structured concept network support makes it difficult to perform accurate synthesis, comparison, and deep reasoning, thus affecting the professionalism and logical depth of the answers.
[0011] To address the aforementioned technical problems, the first aspect of this invention provides an HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction, the method comprising:
[0012] Obtain the user's original query;
[0013] The HPC terms contained in the original query are identified, and the standard definitions, superordinate concepts, and equivalent concepts of the identified terms are extracted from the constructed HPC terminology system. A semantically enhanced query representation is constructed based on the original query, the standard definitions, the superordinate concepts, and the equivalent concepts. The semantically enhanced query representation is used to search the constructed vectorized knowledge base to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer.
[0014] The context and the original query are combined into a complete prompt, which is then input into the large language model to obtain a specialized answer to the original query.
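The query-enhancement and prompt-assembly steps above can be illustrated with a minimal sketch. It assumes a terminology system held in an in-memory dictionary and a naive whole-token match for term identification; the names `TermEntry`, `TERM_SYSTEM`, `identify_terms`, `build_enhanced_query`, and `build_prompt`, the sample entries, and the prompt layout are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One entry of a hypothetical HPC terminology system."""
    definition: str
    superordinate: str
    equivalents: list[str] = field(default_factory=list)


# Toy terminology system; the entries are illustrative assumptions.
TERM_SYSTEM = {
    "MPI": TermEntry(
        definition="a standard API for message passing between processes",
        superordinate="parallel programming model",
        equivalents=["Message Passing Interface"],
    ),
    "OpenMP": TermEntry(
        definition="a directive-based API for shared-memory parallelism",
        superordinate="parallel programming model",
        equivalents=["Open Multi-Processing"],
    ),
}


def identify_terms(query: str, term_system: dict[str, TermEntry]) -> list[str]:
    """Naive term identification: case-insensitive whole-token match."""
    tokens = {tok.strip(".,?!()").lower() for tok in query.split()}
    return [term for term in term_system if term.lower() in tokens]


def build_enhanced_query(query: str, term_system: dict[str, TermEntry]) -> str:
    """Append definition, superordinate and equivalent concepts to the query."""
    parts = [query]
    for term in identify_terms(query, term_system):
        entry = term_system[term]
        parts.append(
            f"[{term}] definition: {entry.definition}; "
            f"superordinate concept: {entry.superordinate}; "
            f"equivalent concepts: {', '.join(entry.equivalents)}"
        )
    return "\n".join(parts)


def build_prompt(query: str, qa_context: list[str]) -> str:
    """Combine retrieved question-answer pairs and the original query."""
    context = "\n".join(f"- {item}" for item in qa_context)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In this sketch the enhanced representation is plain text that would be embedded for retrieval, while the final prompt uses the original query, matching the separation between retrieval and generation described above.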
[0015] In one possible implementation, the HPC terminology system is constructed as follows:
[0016] HPC domain text is segmented to generate a candidate term set;
[0017] The large language model is invoked to perform domain relevance discrimination and semantic filtering on the candidate term set to identify terms with clear HPC domain meaning and remove generic words, thereby obtaining a standardized term library;
[0018] The terms in the standardized term library are converted into dense vectors to obtain term vectors, and then the term vectors are clustered using a density-based clustering algorithm;
[0019] For each cluster, invoke the large language model to perform at least two levels of semantic induction tasks to construct an HPC terminology system containing at least two levels.
[0020] In one possible implementation, the HPC terminology system comprises two levels, and the method for constructing the HPC terminology system by calling a large language model to perform a semantic induction task for a certain cluster is as follows: first, the large language model is prompted to perform a macroscopic abstraction of the cluster to generate a high-level classification label that summarizes the core domain of the terminology of the cluster; then, the large language model is prompted to identify the semantic nuances within the cluster to generate specific classification labels that describe the internal semantic differences.
[0021] In one possible implementation, the vectorized knowledge base is constructed as follows:
[0022] First, based on document slicing of HPC domain text, a large language model is used to generate question-answer pairs containing questions and corresponding answers, and metadata such as original text citations, literature sources and domain relevance are added to each question-answer pair;
[0023] Then, the sentence embedding model is used to calculate the semantic similarity between the answers in the generated question-answer pairs and the corresponding original document slices. Based on the preset semantic similarity threshold, question-answer pairs with low similarity are filtered out, thus initially selecting high-fidelity question-answer pairs. Next, the large language model is guided to perform secondary filtering on the initially selected question-answer pairs from three dimensions: technical accuracy, logical consistency, and practical effectiveness, so as to finally select high-quality question-answer pairs and store them in the vectorized knowledge base.
[0024] In one possible implementation, a regression tree algorithm is used to determine the semantic similarity threshold. Specifically, the preliminary semantic scores of the question-answer pair sample set to be screened are used as feature inputs. The question-answer pair sample set is divided into two subsets. The preliminary semantic scores of one subset are less than the split point, and the preliminary semantic scores of the other subset are greater than or equal to the split point. The optimal split point that minimizes the sum of the variances within the two subsets is calculated as the semantic similarity threshold.
[0025] In one possible implementation, the selected high-quality question-and-answer pairs are stored in the following manner:
[0026] The selected high-quality question-and-answer pairs are concatenated into text units according to the required format;
[0027] Based on the HPC terminology system, the key terms and their standard definitions contained in the text unit are retrieved, and the standard definitions of the terms are added to the text unit as prefixes or context to form semantically enhanced composite text;
[0028] By leveraging the self-attention mechanism in a domain-optimized dense retrieval model, semantically enhanced composite text is mapped into a dense vector of fixed dimensions.
[0029] The resulting fixed-dimension dense vector is stored in the vectorized knowledge base;
[0030] Establish a Hierarchical Navigable Small World (HNSW) graph index to support similarity retrieval over the vectorized knowledge base.
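The first two storage steps (forming a text unit and prefixing term definitions) can be sketched as follows. The "Question/Answer" layout, the "Term definitions" prefix, the whole-word regular-expression match, and the function names `make_text_unit` and `make_composite_text` are assumptions made for illustration; the embedding and HNSW indexing steps are omitted because they depend on the chosen retrieval model and vector store.

```python
import re


def make_text_unit(question: str, answer: str) -> str:
    """Concatenate a question-answer pair into one text unit (assumed format)."""
    return f"Question: {question}\nAnswer: {answer}"


def make_composite_text(text_unit: str, definitions: dict[str, str]) -> str:
    """Prefix the text unit with definitions of the terms it mentions.

    `definitions` maps a term to its standard definition. Matching is a
    naive case-insensitive whole-word search, used here only as a stand-in
    for real term identification.
    """
    found = [
        f"{term}: {definition}"
        for term, definition in definitions.items()
        if re.search(rf"\b{re.escape(term)}\b", text_unit, re.IGNORECASE)
    ]
    if not found:
        return text_unit
    return "Term definitions:\n" + "\n".join(found) + "\n\n" + text_unit
```

The composite text, rather than the bare question-answer pair, is what would then be passed to the dense retrieval model for encoding.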
[0031] In one possible implementation, the way to obtain N question-answer pairs that are relatively relevant to the original query is as follows:
[0032] The semantically enhanced query representation is initially retrieved in the vectorized knowledge base, recalling K candidate question-answer pairs, where K > N;
[0033] Using a reranking model based on a cross-encoder architecture, the original query and each candidate question-answer pair are concatenated into a single sequence, and a relevance score is computed through full self-attention over the concatenated sequence, which captures the deep semantic interaction between the two. The N question-answer pairs with the highest scores are then selected as the N question-answer pairs that are more relevant to the original query.
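The recall-then-rerank flow can be sketched under simplifying assumptions. Brute-force cosine similarity stands in for the HNSW index, and a token-overlap scorer (`toy_overlap_score`) stands in for the cross-encoder; both are illustrative substitutes, not the models described above, and all function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_top_k(query_vec, index, k):
    """Stage 1: recall K candidate ids. `index` is a list of (id, vector)."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def toy_overlap_score(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: Jaccard overlap of lowercase tokens."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def rerank_top_n(query, candidate_ids, texts, score_fn, n):
    """Stage 2: score each (query, candidate) pair jointly and keep the top N."""
    ranked = sorted(candidate_ids, key=lambda i: score_fn(query, texts[i]), reverse=True)
    return ranked[:n]
```

Note that the reranker is given the original query, as stated above, whereas the first stage would use the vector of the semantically enhanced query representation.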
[0034] To address the aforementioned technical problems, a second aspect of this invention provides an HPC knowledge question-answering system based on retrieval enhancement generation and terminology system construction, comprising:
[0035] The terminology system construction module is used to build the HPC terminology system.
[0036] A high-quality knowledge base building module for constructing vectorized knowledge bases;
[0037] The terminology enhancement retrieval module receives the user's original query, identifies the HPC terms contained in the original query, extracts the standard definition, superordinate concept, and equivalent concept of the identified terms from the constructed HPC terminology system, constructs a semantically enhanced query representation based on the original query, the standard definition, the superordinate concept, and the equivalent concept, and performs a retrieval in the constructed vectorized knowledge base based on the semantically enhanced query representation to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer;
[0038] The specialized answer generation module is used to combine the context and the original query into a complete prompt input to the large language model to obtain a specialized answer to the original query.
[0039] To address the aforementioned technical problems, a third aspect of the present invention provides an HPC knowledge question answering device based on retrieval enhancement generation and terminology system construction, comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the program to implement the steps of the method in any possible implementation of the first aspect of the present invention.
[0040] To address the aforementioned technical problems, a fourth aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method in any possible implementation of the first aspect of the present invention.
[0041] The beneficial effects of this invention are as follows: Original user queries often suffer from colloquial expressions, missing terminology, or conceptual ambiguity. This invention constructs an HPC terminology system, identifying key terms in the query and extracting their standard definitions, superordinate concepts, and related concepts. This enables the system to possess a deep understanding of domain concepts, providing a framework for conceptual association and reasoning. The system can obtain structured conceptual network support, transforming unstructured natural language queries into semantically accurate and clearly defined structured query representations. When faced with complex queries requiring cross-domain knowledge association and in-depth analysis, the system can perform semantic expansion and precise retrieval based on the HPC terminology system, achieving accurate synthesis, comparison, and deep reasoning.
[0042] Building upon this foundation, the HPC terminology system and the vectorized knowledge base form a complementary and synergistic relationship. When combined, the semantically enhanced query representation retains the precise conceptual boundaries provided by the HPC terminology system while inheriting the semantic generalization capability of vector retrieval, enabling the retrieval process to accurately target core concepts while broadly covering relevant semantics.
[0043] Finally, the original query and the retrieved question-answer pairs are combined and input into the large language model, achieving the integration of knowledge retrieval and generative reasoning. The HPC terminology system ensures the standardization of knowledge during the retrieval stage, the vectorized knowledge base ensures retrieval efficiency and semantic coverage, and the large language model utilizes its language understanding and generation capabilities to organize the fragmented retrieved knowledge into direct answers to the user's original question. This process ensures that the final answer possesses both the professional accuracy provided by the terminology base and the rich context provided by the vectorized knowledge base. Furthermore, through the induction and expression of the large language model, direct answers are generated, significantly improving the professionalism and logical depth of the generated answers.
Attached Figure Description
[0044] Figure 1 is a flowchart of the prior art;
[0045] Figure 2 is a flowchart of the HPC knowledge question answering system based on retrieval enhancement generation and terminology system construction of the present invention;
[0046] Figure 3 is a flowchart of the terminology system construction module of the present invention;
[0047] Figure 4 is a flowchart of the high-quality knowledge base construction module of the present invention;
[0048] Figure 5 is a flowchart of the terminology enhancement retrieval module and the professional answer generation module of the present invention;
[0049] Figure 6 is an example diagram of the terminology system of the present invention;
[0050] Figure 7 is a structural diagram of the HPC knowledge question answering device based on retrieval enhancement generation and terminology system construction of the present invention.
Detailed Implementation
[0051] This invention constructs an HPC terminology system, enabling the system to possess deep domain concept understanding capabilities. It utilizes this system to generate a semantically enhanced query representation that semantically augments the original query. Then, based on this semantically enhanced query representation, a retrieval is performed in a constructed vectorized knowledge base to obtain N question-answer pairs highly relevant to the original query, serving as context for generating the answer. Finally, the context and the original query are combined to form a complete prompt, which is then input into a large language model to obtain a specialized answer to the original query. This achieves semantic expansion and precise retrieval based on the terminology system, generating logically rigorous and comprehensive in-depth analytical answers. To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings.
[0052] An implementation method for an HPC knowledge question answering system based on retrieval enhancement generation and terminology system construction:
[0053] This invention provides an HPC knowledge question-answering system based on retrieval enhancement generation and terminology system construction, aiming to solve three major problems: lack of systematic organization of HPC knowledge, coarse granularity of retrieval units, and superficial understanding of domain terminology, thereby achieving more accurate, professional, and reliable intelligent question answering in the field of high-performance computing.
[0054] The system includes a terminology system construction module, a high-quality knowledge base construction module, a terminology enhanced retrieval module, and a professional answer generation module.
[0055] 1. Terminology system construction module.
[0056] As shown in Figure 3, the terminology system construction module is used to automatically extract domain terms from multi-source heterogeneous HPC domain texts, and to construct a hierarchical and structured HPC terminology system by integrating unsupervised clustering with large language model (LLM) semantic induction. Specifically, it includes a terminology extraction unit and a hierarchical induction unit.
[0057] 1) Terminology Extraction Unit. The terminology extraction unit performs word segmentation on the preprocessed HPC domain text, generates a candidate term set, and calls the large language model API. Using a pre-defined structured prompt template, it performs domain relevance discrimination and semantic filtering on the candidate terms to identify terms with clear HPC domain meaning and remove generic words, thus obtaining a standardized terminology library. For example, the system inputs the following prompt to the large language model: "You are a high-performance computing (HPC) expert. Given the candidate term set {Candidate_Terms}, please identify terms with clear domain technical meaning, remove generic words, and output a brief summary of their standard definitions," forming the initial standardized terminology library.
[0058] 2) Hierarchical Induction Unit. The hierarchical induction unit is used to: ① convert terms in the standardized terminology library into dense vectors to obtain term vectors, and use a clustering algorithm to automatically cluster the term vectors; ② for each cluster, call the large language model to perform a two-level semantic induction task to construct an HPC terminology system containing at least two levels.
[0059] Specifically, step ① employs a sentence embedding model, such as BERT, Sentence-BERT, or text2vec, to transform terms in the standardized terminology library into dense vectors, bringing semantically similar terms closer together in the vector space. Essentially, it transforms discrete symbolic terms into continuous, semantically rich numerical representations. This approach eliminates the need to design separate query syntax for each term and achieves end-to-end semantic matching. This not only improves the semantic accuracy of retrieval but also lays a unified foundation for building a large-scale, highly responsive, and scalable hierarchical terminology system, exploiting the performance advantages of vector retrieval and hierarchical indexing while reducing maintenance complexity.
[0060] Specifically, in step ①, density-based clustering algorithms, such as DBSCAN and HDBSCAN, are used to automatically cluster the term vectors. Compared to clustering algorithms such as k-means, which require the number of clusters to be set manually in advance, this method automatically reveals the semantic distribution structure of terms from the data. This structure can be directly used to construct a more reasonable hierarchical vector space, and can also improve retrieval accuracy and optimize indexing efficiency.
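As an illustration of density-based clustering of term vectors, the following is a minimal pure-Python DBSCAN sketch. It assumes Euclidean distance and small in-memory inputs; the parameters `eps` and `min_pts` and the toy 2-D points (standing in for high-dimensional term embeddings) are illustrative, and a production system would more likely rely on an existing library implementation.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, p in enumerate(points) if euclidean(points[i], p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: expand the cluster
                queue.extend(j_neighbors)
    return labels
```

The number of clusters is not an input: it emerges from the density of the data, and isolated terms are reported as noise instead of being forced into a cluster.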
[0061] Specifically, step ② involves the following process. First, the large language model is prompted to perform a macro-level abstraction of the entire cluster, generating a high-level classification label that summarizes the core domain of the terms in the cluster. For example, the system inputs the following instruction into the large language model: "Based on the following list of HPC terms {Term_Cluster}, please summarize a macro-level classification label that covers the common technical field to which they belong (e.g., 'parallel programming model' or 'interconnection network technology')". Then, the model is prompted to identify subtle semantic differences within the cluster, generating specific classification labels that describe those differences. For example, the system inputs the following instruction into the large language model: "Under the category of 'parallel programming model', for the subset {Sub_Cluster}, please provide a more specific sub-category name (e.g., 'message passing interface MPI' or 'shared memory model')". Thus, through the above two successive semantic abstractions, a multi-level HPC terminology system is constructed. The generated terminology system is shown in detail in Figure 6. While ensuring the efficiency of automated processing, the above approach constructs a "macro to micro" navigation path that conforms to human cognitive logic and can directly serve hierarchical vector retrieval. This reduces the subjectivity and cost of manually constructing a classification system and provides a structural foundation that is both interpretable and flexible for subsequent accurate retrieval, dynamic updates, and knowledge governance.
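The two-level induction can be sketched as a pair of prompt templates applied per cluster. The sketch assumes the language model is available as a plain callable from prompt text to reply text and that each cluster has already been divided into subsets; how the subsets are obtained is not specified here, and the names `MACRO_PROMPT`, `SUB_PROMPT`, and `induce_two_levels` are hypothetical. The prompt wording follows the examples above.

```python
from typing import Callable

MACRO_PROMPT = (
    "Based on the following list of HPC terms {term_cluster}, please summarize "
    "a macro-level classification label that covers the common technical field "
    "to which they belong."
)
SUB_PROMPT = (
    "Under the category of '{category}', for the subset {sub_cluster}, "
    "please provide a more specific sub-category name."
)


def induce_two_levels(
    cluster_subsets: list[list[str]], llm: Callable[[str], str]
) -> dict[str, dict[str, list[str]]]:
    """Build {high-level label: {specific label: terms}} for one cluster.

    `llm` is any callable that maps a prompt string to a reply string;
    it is an assumption standing in for a real model API.
    """
    all_terms = [term for subset in cluster_subsets for term in subset]
    category = llm(MACRO_PROMPT.format(term_cluster=", ".join(all_terms))).strip()
    tree: dict[str, dict[str, list[str]]] = {category: {}}
    for subset in cluster_subsets:
        prompt = SUB_PROMPT.format(category=category, sub_cluster=", ".join(subset))
        tree[category][llm(prompt).strip()] = list(subset)
    return tree
```

Passing the model as a callable keeps the sketch independent of any particular provider and makes the hierarchy construction easy to exercise with a stub.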
[0062] 2. High-quality knowledge base construction module.
[0063] As shown in Figure 4, the high-quality knowledge base construction module is used to generate high-quality question-answer pairs based on the terminology system, and to perform dual quality filtering that integrates fidelity assessment and three-dimensional semantic assessment. The filtered question-answer pairs are then used as basic units to construct a vectorized knowledge base. Specifically, it includes a question-answer pair generation unit, a dual filtering unit, and a vectorized storage unit.
[0064] 1) Question-Answer Pair Generation Unit. The question-answer pair generation unit is used to generate "question-answer" pairs in batches on document slices that have been filtered by the previous HPC relevance assessment using a large language model, and to add original text citations, literature sources and domain relevance metadata to each question-answer pair.
[0065] 2) Dual Filtering Unit. The dual filtering unit is used to perform strict quality control on the generated question-answer pairs. It consists of two screening processes: ① First screening: Calculate the semantic similarity between the answers in the generated question-answer pairs and the corresponding original document slices, and filter out question-answer pairs with low similarity based on a preset semantic similarity threshold, thereby initially screening out high-fidelity question-answer pairs; ② Second screening: Guide the large language model to perform secondary filtering on the initially screened question-answer pairs from different dimensions, so as to finally select high-quality question-answer pairs and store them in the vectorized knowledge base.
[0066] Specifically, in step ①, a sentence embedding model, such as BERT, Sentence-BERT, or text2vec, is used to calculate the semantic similarity between each sentence in the generated answer and the corresponding original text segment.
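A rough sketch of the sentence-level fidelity check follows. A bag-of-words vector is used purely as a stand-in for a sentence embedding model, and averaging the per-sentence similarities into one score is an assumption, since the aggregation rule is not stated; `embed`, `answer_fidelity`, and `is_high_fidelity` are hypothetical names. The default threshold 0.533 is the value given in this embodiment, although with a toy embedding it is only illustrative.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: token counts (stand-in for a sentence embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_fidelity(answer: str, source_slice: str) -> float:
    """Mean similarity between each answer sentence and the source slice."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    source_vec = embed(source_slice)
    sims = [cosine(embed(sentence), source_vec) for sentence in sentences]
    return sum(sims) / len(sims) if sims else 0.0


def is_high_fidelity(answer: str, source_slice: str, threshold: float = 0.533) -> bool:
    """First-stage filter: keep the pair only if fidelity reaches the threshold."""
    return answer_fidelity(answer, source_slice) >= threshold
```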
[0067] Specifically, in step ①, based on the optimal semantic similarity threshold determined by the CART regression tree algorithm, the core steps are: using the preliminary semantic scores of the question-answer pair sample set to be screened as feature input, dividing the sample set into two subsets, one subset with a preliminary semantic score less than the split point, and the other subset with a preliminary semantic score greater than or equal to the split point, and calculating the optimal split point that minimizes the sum of the variances within the two subsets as the optimal semantic similarity threshold. The specific formula is:
[0068] $$N(t) = \arg\min_{s} \left[ \sum_{x_i \in D_1(s)} \left(y_i - \bar{y}_1\right)^2 + \sum_{x_j \in D_2(s)} \left(y_j - \bar{y}_2\right)^2 \right]$$
[0069] $$\bar{y}_k = \frac{1}{|D_k(s)|} \sum_{x_i \in D_k(s)} y_i, \quad k = 1, 2$$
[0070] In the formula, $x_i$ and $x_j$ respectively represent the candidate question-answer pair entries in the first subset $D_1(s)$ and the second subset $D_2(s)$; $y_i$ (or $y_j$) is the corresponding quality score; $\bar{y}_1$ and $\bar{y}_2$ respectively represent the arithmetic means of the two subsets. Let the sample set of question-answer pairs to be screened be $D$, and use the preliminary semantic scores of the samples as feature inputs. The optimal cut-off point $s$ is then found by minimizing the squared error criterion: the set $D$ is divided into two subsets $D_1(s)$ (score < $s$) and $D_2(s)$ (score ≥ $s$), and the value of $s$ that minimizes the sum of the internal variances of the two subsets is taken as the optimal semantic similarity threshold $N(t)$, so as to retain high-fidelity question-answer pairs. The parameter $t$ is used to distinguish different semantic verification tasks. In this embodiment, when $t$ refers to "the semantic fidelity metric between the answer and the original text", the system traverses the candidate threshold space of $s$ and finds the optimal solution $N(t)$ that minimizes the sum of the variances of the subsets. In other embodiments, by adjusting the definition of $t$, this regression tree algorithm can also be used for dynamic threshold determination of technical accuracy or logical consistency. In one implementation, the semantic similarity threshold is $N(t) = 0.533$.
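The split-point search amounts to a single-feature, depth-one regression tree. The sketch below assumes each sample is a pair (preliminary semantic score, quality score) and that candidate thresholds are the midpoints between consecutive distinct semantic scores, a common convention that is not stated above; `best_split` is a hypothetical name and the origin of the quality scores is left open.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(samples):
    """Return the threshold s minimizing SSE(D1) + SSE(D2).

    `samples` is a list of (semantic_score, quality_score) pairs.
    D1 holds samples with semantic_score < s, D2 those with score >= s.
    Returns None when fewer than two distinct semantic scores exist.
    """
    distinct = sorted({score for score, _ in samples})
    best_s, best_cost = None, float("inf")
    for low, high in zip(distinct, distinct[1:]):
        s = (low + high) / 2  # candidate threshold: midpoint (assumed convention)
        left = [y for score, y in samples if score < s]
        right = [y for score, y in samples if score >= s]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s
```

For a handful of samples this exhaustive scan is sufficient; each candidate costs a pass over the data, so large sample sets would call for a sorted, incremental variant.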
[0071] Specifically, the different dimensions in step ② are the three dimensions of "technical accuracy", "logical consistency", and "practical effectiveness". Using these three dimensions can systematically filter low-quality data, ensure that the question-answer pairs in the library are accurate at the factual level, self-consistent at the question-answer logic level, and can effectively solve user problems at the practical application level, so as to provide a high-quality vectorized knowledge base for the hierarchical vector retrieval system.
[0072] Specifically, in step ②, an evaluation prompt is designed to guide the large language model to score the question-answer pairs along the three dimensions of "technical accuracy", "logical consistency", and "practical effectiveness" and to output the basis for its reasoning, and a comprehensive score threshold (such as P_all ≥ 0.6) is set for secondary filtering to obtain a high-quality question-answer pair dataset. The evaluation prompt is built as a structured evaluation instruction that requires the large language model to act as an "HPC domain expert" during the evaluation. The prompt guides the model to output not only scores but also its reasoning, which keeps the scoring transparent. For example: "Based on the provided original document fragment, evaluate whether the generated question-answer pair is technically rigorous, logically coherent, and has practical application value." Quantitative evaluation dimensions and scoring standards are then set, with scores ranging from 1 to 5 points in each of the three dimensions. Technical accuracy assesses whether the answer contains factual errors or hallucinations, and whether it accurately maps the HPC terminology and parameters in the original text. Logical consistency assesses whether there is causal inversion or semantic drift between the question and the answer, ensuring that the answer follows the logic of technical argumentation. Practical effectiveness assesses whether the question-answer pair solves a specific HPC scenario problem (such as performance tuning or job scheduling) rather than offering empty platitudes. Finally, a weighted average of the scores for each dimension is calculated.
Integrating the three dimensions by weighted summation quantifies the quality of a question-answer pair as a single comparable comprehensive score, and the weight of each dimension can be adjusted to suit different business scenarios and priorities.
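The weighted summation and threshold test can be illustrated as follows. This is only a sketch: the weights, the function names, and in particular the linear rescaling of the 1 to 5 scores onto [0, 1] (so that they can be compared with a threshold such as 0.6) are assumptions that the text does not specify.

```python
DIMENSIONS = ("technical_accuracy", "logical_consistency", "practical_effectiveness")

def composite_score(scores, weights):
    """Weighted average of the 1-5 dimension scores, rescaled to [0, 1]."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted = sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight
    # Assumption: map the 1..5 scale linearly onto 0..1.
    return (weighted - 1) / 4

def passes_secondary_filter(scores, weights, threshold=0.6):
    """Keep a question-answer pair only if its composite score reaches the threshold."""
    return composite_score(scores, weights) >= threshold

# Hypothetical weights; the text only says they are adjustable per scenario.
weights = {"technical_accuracy": 0.4, "logical_consistency": 0.3, "practical_effectiveness": 0.3}

good_pair = {"technical_accuracy": 5, "logical_consistency": 4, "practical_effectiveness": 3}
weak_pair = {"technical_accuracy": 3, "logical_consistency": 2, "practical_effectiveness": 3}

good_score = composite_score(good_pair, weights)
weak_score = composite_score(weak_pair, weights)
```

Raising the weight of one dimension, for example technical accuracy, makes the filter stricter about that dimension without changing the scoring prompt.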
[0073] 3) Vectorized storage unit. The vectorized storage unit is used for: ① concatenating the selected high-quality question-answer pairs into text units in the required format; ② retrieving the key terms and their standard definitions contained in the text unit based on the HPC terminology system, and supplementing the text unit with the standard definitions of the terms as prefixes or context to form semantically enhanced composite text; ③ storing the semantically enhanced composite text in the vectorized knowledge base after vectorization encoding.
[0074] Specifically, in step ①, the vectorized storage unit is used to concatenate the double-filtered question-answer pairs into text units in the format "Question: {Q} Answer: {A}".
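Steps ① and ② of the vectorized storage unit can be sketched as plain string operations. The example below is illustrative only: the `TERM_DEFINITIONS` dictionary stands in for the HPC terminology system, term detection is reduced to case-insensitive substring matching, and the bracketed prefix layout is an assumption.

```python
# Hypothetical excerpt of a terminology system: term -> standard definition.
TERM_DEFINITIONS = {
    "MPI": "Message Passing Interface, a standard for message-passing parallel programs.",
    "Slurm": "An open-source workload manager and job scheduler for Linux clusters.",
}

def build_text_unit(question, answer):
    """Step 1: concatenate a question-answer pair in the 'Question: {Q} Answer: {A}' format."""
    return f"Question: {question} Answer: {answer}"

def enrich_with_definitions(text_unit, definitions):
    """Step 2: prefix the text unit with the definitions of the terms it mentions."""
    lowered = text_unit.lower()
    found = [term for term in definitions if term.lower() in lowered]
    prefix = " ".join(f"[{term}: {definitions[term]}]" for term in found)
    return f"{prefix} {text_unit}" if prefix else text_unit

unit = build_text_unit("How do I submit a batch job with Slurm?", "Use the sbatch command.")
composite = enrich_with_definitions(unit, TERM_DEFINITIONS)
```

The composite text, rather than the bare question-answer pair, is what would then be passed to the encoder in step ③.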
[0075] Specifically, in step ③, a domain-optimized dense retrieval model (such as BAAI/bge-large-zh-v1.5) is used for vectorization encoding. The semantically enhanced composite text is input into the dense retrieval model, and the self-attention mechanism of the multi-layer Transformer architecture allows the model to take full account of the professional background knowledge provided by the terminology system when encoding question-answer pairs. The text is thereby mapped into fixed-dimensional dense vectors, which are stored in the high-quality vectorized knowledge base.
[0076] Specifically, step ③ establishes an HNSW (Hierarchical Navigable Small World) graph index to support similarity retrieval in the vectorized knowledge base. Regarding the construction of the hierarchical structure, the system divides the vector space into multiple logical layers. The bottom layer (Layer 0) contains all question-answer pair vectors in the knowledge base, while each upper layer is a sparse subset of the vectors in the layer below it. This structure is similar to a skip list and enables rapid location from coarse-grained to fine-grained. During incremental node insertion and connection establishment, three operations are performed for each question-answer pair vector to be added to the database: neighbor search (a multi-level search strategy finds the node closest to the current vector in each layer); connection establishment (in each layer, bidirectional connections are established between the node and its M nearest neighbors, e.g., M=16, to form a graph network with small-world properties); and candidate set exploration (during the index construction phase, the search range is controlled by an exploration factor such as ef_construction=200 to balance index construction speed and retrieval accuracy (recall)). Regarding vector and metadata mapping and binding, during the database entry process a globally unique identifier (UID) is assigned to each vector, and a mapping index is established between this UID and the original question-answer text ("Question: {Q} Answer: {A}") and metadata (such as literature sources and cited texts) to ensure the traceability of retrieval results. By constructing a multi-level graph structure, millisecond-level similarity retrieval over millions of knowledge items is achieved while retrieval accuracy (recall) is maintained.
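The UID binding between vectors, question-answer text, and metadata can be illustrated with a small in-memory store. This sketch deliberately replaces the HNSW graph with an exact cosine-similarity scan, which is only practical for tiny collections; the class name `KnowledgeStore`, its methods, and the sample records are hypothetical.

```python
import math
import uuid

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class KnowledgeStore:
    """Binds each vector to its text and metadata through a unique identifier."""

    def __init__(self):
        self.vectors = {}   # uid -> dense vector
        self.records = {}   # uid -> {"text": ..., "metadata": ...}

    def add(self, vector, text, metadata):
        uid = str(uuid.uuid4())  # globally unique identifier for this entry
        self.vectors[uid] = list(vector)
        self.records[uid] = {"text": text, "metadata": dict(metadata)}
        return uid

    def search(self, query_vector, k):
        # Stand-in for the HNSW index: exact scan over all stored vectors.
        ranked = sorted(
            self.vectors,
            key=lambda uid: cosine(query_vector, self.vectors[uid]),
            reverse=True,
        )
        return [(uid, self.records[uid]) for uid in ranked[:k]]

store = KnowledgeStore()
uid_a = store.add([1.0, 0.0], "Question: What is sbatch? Answer: A Slurm command.", {"source": "doc-a"})
uid_b = store.add([0.0, 1.0], "Question: What is RDMA? Answer: Remote direct memory access.", {"source": "doc-b"})
hits = store.search([0.9, 0.1], k=1)
```

Because every hit carries its UID and metadata, the source of a retrieved question-answer pair can be traced regardless of which index structure performs the search.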
[0077] 3. Terminology enhancement retrieval module.
[0078] As shown in Figure 5, the terminology enhancement retrieval module receives user queries, uses the constructed terminology system to perform term recognition and semantic expansion on the queries, conducts a preliminary retrieval in the vectorized knowledge base, and employs a re-ranking model to refine the ranking of candidate results. Specifically, it includes a query understanding unit and a hybrid retrieval and re-ranking unit.
[0079] 1) Query Understanding Unit. The query understanding unit is used to parse the natural language query (i.e., the original query) input by the user, identify the core HPC terms in the query using the constructed terminology system, and automatically expand the associated superordinate concepts and related same-level concepts according to the hierarchical relationships in the terminology system, forming a semantically enhanced query representation.
[0080] Specifically, the query understanding unit generates the semantically enhanced query as follows. First, precise term identification is performed: the user's natural language query is scanned using a pre-defined HPC terminology dictionary and word segmentation tools to identify core HPC terms (such as "MPI parallel computing" and "job scheduling"). Next, background knowledge retrieval is performed: the standard definitions of the identified terms are retrieved from the constructed HPC terminology system, and their direct superordinate concepts (domain / discipline) and related concepts (related technical routes at the same level, such as the PBS or LSF scheduling systems in relation to Slurm) are automatically extracted based on their position in the hierarchical tree. Finally, the semantically enhanced representation is synthesized: the original query text is concatenated, without modification, with the retrieved background knowledge according to a pre-defined enhancement template (e.g., "[original query] + [term 1: definition] + [superordinate concepts] + [related concepts]") to construct a semantically enhanced query representation containing deep domain context.
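The three stages of query enhancement can be sketched as follows. The `TERMINOLOGY` dictionary is a hypothetical fragment of a terminology system, term identification is simplified to case-insensitive substring matching instead of dictionary-plus-segmentation scanning, and the bracketed layout of the enhanced query is an assumed rendering of the enhancement template.

```python
# Hypothetical fragment of an HPC terminology system.
TERMINOLOGY = {
    "Slurm": {
        "definition": "An open-source workload manager and job scheduler for Linux clusters.",
        "superordinate": ["job scheduling"],
        "related": ["PBS", "LSF"],
    },
    "MPI": {
        "definition": "Message Passing Interface, a standard for message-passing parallel programs.",
        "superordinate": ["parallel programming"],
        "related": ["OpenMP"],
    },
}

def identify_terms(query, terminology):
    """Stage 1: find terminology entries mentioned in the query (simplified matching)."""
    lowered = query.lower()
    return [term for term in terminology if term.lower() in lowered]

def enhance_query(query, terminology):
    """Stages 2 and 3: look up background knowledge and append it to the unchanged query."""
    parts = [query]
    for term in identify_terms(query, terminology):
        entry = terminology[term]
        parts.append(f"[{term}: {entry['definition']}]")
        parts.append(f"[superordinate: {', '.join(entry['superordinate'])}]")
        parts.append(f"[related: {', '.join(entry['related'])}]")
    return " ".join(parts)

original_query = "Why is my Slurm job stuck in the pending state?"
enhanced_query = enhance_query(original_query, TERMINOLOGY)
```

The original query is kept verbatim at the front of the enhanced representation, so it remains available unchanged for the later re-ranking step.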
[0081] 2) Hybrid retrieval and re-ranking unit. The hybrid retrieval and re-ranking unit is used to retrieve data from the constructed vectorized knowledge base based on the semantically enhanced query representation, obtaining the N question-answer pairs most relevant to the original query as context for generating the answer.
[0082] Specifically, the method for obtaining N question-answer pairs that are more relevant to the original query is a two-step screening process: ① Perform preliminary vector retrieval on the semantically enhanced query representation in the vectorized knowledge base to recall K candidate question-answer pairs; ② Perform in-depth semantic interaction calculation on the candidate question-answer pairs and the original query to obtain a relevance score, and select the N question-answer pairs with higher relevance scores as the N question-answer pairs that are more relevant to the original query.
[0083] Specifically, in step ①, the semantically enhanced query representation is used to perform a preliminary dense vector retrieval in the vectorized knowledge base to recall Top-K (K>N) candidate question-answer pairs.
[0084] Specifically, step ② employs a re-ranking model based on a cross-encoder architecture, BAAI/bge-reranker-large, to perform deep semantic interaction calculations between the candidate question-answer pairs and the original query, yielding a refined relevance score. Unlike the dual-encoder approach, which computes the two vectors independently, this invention concatenates the original query and each candidate question-answer pair into a single sequence that is input to the re-ranking model. With the full self-attention mechanism of the Transformer architecture, each token in the query can attend directly to each token in the question-answer pair at every layer of the model. This form of interaction avoids the compression bottleneck of a single vector representation, allowing the model to capture subtle semantic matching relationships, logical dependencies, and contextual associations of HPC domain terms between the query and the background knowledge, and thereby to produce a more accurate relevance score than cosine similarity. The candidate set is then sorted in descending order of this refined score, and the Top-N (e.g., N=3) most relevant question-answer pairs are selected as the final contextual support for generating the answer.
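The two-step screening (recall Top-K with the enhanced query, then re-rank against the original query and keep Top-N) can be sketched with pluggable scoring functions. The scorers below are toy stand-ins chosen so the example runs without any model: token overlap plays the role of dense-vector similarity and Jaccard similarity plays the role of the cross-encoder score. All names and sample data are hypothetical.

```python
def token_overlap(query, text):
    """Toy recall scorer: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def jaccard(query, text):
    """Toy re-ranking scorer: Jaccard similarity of the token sets."""
    a, b = set(query.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_then_rerank(enhanced_query, original_query, candidates,
                         recall_score, rerank_score, k, n):
    """Recall K candidates with the enhanced query, then keep the N best for the original query."""
    if k <= n:
        raise ValueError("K must be greater than N")
    recalled = sorted(candidates, key=lambda c: recall_score(enhanced_query, c), reverse=True)[:k]
    reranked = sorted(recalled, key=lambda c: rerank_score(original_query, c), reverse=True)
    return reranked[:n]

candidates = [
    "Question: how to submit a slurm job Answer: use sbatch",
    "Question: what is mpi Answer: a message passing standard",
    "Question: how to cancel a slurm job Answer: use scancel",
    "Question: what is rdma Answer: remote direct memory access",
]
original_query = "how to submit a slurm job"
enhanced_query = "how to submit a slurm job slurm scheduler pbs lsf"

context = retrieve_then_rerank(enhanced_query, original_query, candidates,
                               token_overlap, jaccard, k=3, n=2)
```

Passing the scorers as arguments keeps the control flow independent of the models, so the toy functions could be replaced by real embedding and cross-encoder calls without changing `retrieve_then_rerank`.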
[0085] 4. Professional answer generation module.
[0086] The professional answer generation module combines the re-ranked, refined context with a prompt template that defines a professional role, inputs the result into the large language model, and generates the final professional answer. Specifically, it includes a prompt engineering unit and a controllable generation unit.
[0087] 1) Prompt Engineering Unit. The prompt engineering unit is used to design a structured system prompt template. This template explicitly defines the role of the large language model as a "high-performance computing domain expert" and instructs it to generate answers strictly based on the provided reference context. The answers are required to be professional, accurate, and avoid fabrication. When the context information is insufficient, it must be explicitly stated.
[0088] 2) Controllable Generation Unit. The controllable generation unit is used to fill the refined context output by the hybrid retrieval and re-ranking unit and the user's original query into the prompt template, combine them into a complete prompt, input it into the large language model, and finally output a professional and highly reliable answer.
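Prompt assembly by the two units can be sketched as a template fill. The wording of `SYSTEM_TEMPLATE`, the numbering of context entries, and the function name `build_prompt` are assumptions; the text only fixes the role ("high-performance computing domain expert") and the constraints (answer from the context, do not fabricate, state when the context is insufficient).

```python
# Hypothetical template wording; only the role and constraints come from the description.
SYSTEM_TEMPLATE = (
    "You are a high-performance computing domain expert. "
    "Answer strictly on the basis of the reference context below. "
    "Be professional and accurate, and do not fabricate. "
    "If the context is insufficient, say so explicitly.\n\n"
    "Reference context:\n{context}\n\n"
    "User question: {query}"
)

def build_prompt(query, contexts):
    """Fill the re-ranked question-answer pairs and the original query into the template."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return SYSTEM_TEMPLATE.format(context=numbered, query=query)

prompt = build_prompt(
    "How do I submit a batch job with Slurm?",
    [
        "Question: how to submit a slurm job Answer: use sbatch",
        "Question: how to cancel a slurm job Answer: use scancel",
    ],
)
```

Numbering the context entries is one simple way to let the generated answer point back to the question-answer pair it relied on.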
[0089] Specifically, a lower temperature parameter can be set for the large language model to control the randomness of the generation process. The temperature parameter is a hyperparameter that controls the randomness of the large language model's generation; the lower the value, the more likely the model is to select the lexical units with the highest probability, resulting in more stable and repeatable output. Setting the temperature parameter to a lower value effectively suppresses random fluctuations in the generation process, ensuring that the large model maintains high consistency and stability in its output when faced with the same term vector, avoiding label drift or semantic deviation caused by randomness.
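The effect of the temperature parameter can be seen on a toy next-token distribution. The sketch below applies the usual temperature-scaled softmax to three hypothetical logits; the specific values and the function name are illustrative only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
default_probs = softmax_with_temperature(logits, temperature=1.0)
low_temp_probs = softmax_with_temperature(logits, temperature=0.2)
```

At the lower temperature almost all probability mass moves to the highest-scoring token, which is why sampling becomes more stable and repeatable.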
[0090] An implementation method for HPC knowledge question answering based on retrieval enhancement generation and terminology system construction:
[0091] This invention discloses an HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction. As shown in Figure 2, the core process of this method is as follows:
[0092] Step 1: Obtain the user's original query.
[0093] Step 2: Identify HPC terms contained in the original query, extract the standard definition, superordinate concept and equivalent concept of the identified terms from the constructed HPC terminology system, and construct a semantically enhanced query representation based on the original query, standard definition, superordinate concept and equivalent concept; search the constructed vectorized knowledge base based on the semantically enhanced query representation to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer.
[0094] Step 3: Combine the context obtained in Step 2 and the original query obtained in Step 1 into a complete prompt input into the large language model to obtain a specialized answer to the original query.
[0095] Specifically, the HPC terminology system in step two is constructed as follows: 1) Segment the HPC domain text to generate a candidate term set; 2) Call a large language model to perform domain relevance discrimination and semantic filtering on the candidate term set to determine terms with clear HPC domain meaning and delete common words, thereby obtaining a standardized terminology library; 3) Convert the terms in the standardized terminology library into dense vectors to obtain term vectors, and cluster the term vectors based on a clustering algorithm; 4) For each cluster, call a large language model to perform at least two levels of semantic induction tasks to construct an HPC terminology system containing at least two levels.
[0096] Furthermore, in step 4) of the previous paragraph, the HPC terminology system contains two levels. For a given cluster, the large language model is called to perform the semantic induction tasks that construct the two-level HPC terminology system as follows: first, the large language model is prompted to perform a macroscopic abstraction of the cluster, generating a high-level classification label that summarizes the core domain of the cluster's terms; then, the large language model is prompted to identify the semantic nuances within the cluster, generating specific classification labels that describe its internal semantic differences.
[0097] Specifically, the construction of the vectorized knowledge base in step two is as follows: 1) Based on document slices of HPC domain text, a large language model is used to generate question-answer pairs containing questions and corresponding answers, and metadata such as original text citations, literature sources, and domain relevance are added to each question-answer pair; 2) The semantic similarity between the answers in the generated question-answer pairs and the corresponding original document slices is calculated, and question-answer pairs with low similarity are filtered out based on a preset semantic similarity threshold, thereby initially selecting high-fidelity question-answer pairs; then, the large language model is guided to perform secondary filtering on the initially selected question-answer pairs from different dimensions to finally select high-quality question-answer pairs, which are then stored in the vectorized knowledge base. The semantic similarity threshold can be determined using a regression tree algorithm; and the different dimensions include technical accuracy, logical consistency, and practical effectiveness.
[0098] Furthermore, in step 2) of the previous paragraph, the selected high-quality question-answer pairs are stored in the following way: the selected high-quality question-answer pairs are concatenated into text units in the required format; the key terms contained in each text unit and their standard definitions are retrieved based on the HPC terminology system, and the standard definitions of the terms are added to the text unit as prefixes or context to form semantically enhanced composite text; the semantically enhanced composite text is vectorized and then stored in the vectorized knowledge base.
[0099] Specifically, in step two, the N question-answer pairs that are more relevant to the original query are obtained in the following way: a preliminary retrieval is performed in the vectorized knowledge base using the semantically enhanced query representation to recall K candidate question-answer pairs, where K > N; then, in-depth semantic interaction calculation is performed on the candidate question-answer pairs and the original query to obtain relevance scores, and the N question-answer pairs with the highest relevance scores are selected as the context for generating the answer.
[0100] This method can be implemented based on the aforementioned "Implementation Method of an HPC Knowledge Question Answering System Based on Retrieval Enhancement Generation and Terminology System Construction". Each module in this system is essentially a computer implementation of each step in this method. The principles, more specific implementation process, and achievable effects of this method have been described in detail in the aforementioned system implementation method, and will not be repeated here.
[0101] An implementation method for an HPC knowledge question answering device based on retrieval enhancement generation and terminology system construction:
[0102] An embodiment of the HPC knowledge question answering device based on retrieval enhancement generation and terminology system construction according to the present invention is shown in Figure 7. The device includes a memory (specifically a non-volatile storage medium), a processor, a system bus, and a computer program stored in the memory. The processor and memory communicate and interact with each other via the system bus. The processor executes the computer program to implement the steps of the HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction of this invention.
[0103] This device can be installed on the management or login nodes of a supercomputing center as an embedded knowledge assistance system for HPC users (such as researchers, system administrators, and application engineers). It provides real-time terminology explanations, parameter suggestions, and troubleshooting Q&A when users submit jobs, debug parallel programs, or configure cluster environments. It can also be deployed on the management plane of an HPC cloud service platform to provide standardized knowledge service interfaces for large-scale elastic computing clusters. In addition, it can be integrated into a dedicated integrated development environment or visual operation and maintenance platform in the HPC field. In offline teaching and computing scenarios in domestically developed and controllable supercomputing environments, it provides localized, low-latency, and highly accurate domain knowledge Q&A capabilities in a lightweight all-in-one form.
[0104] The core process of this method is as follows:
[0105] Step 1: Obtain the user's original query.
[0106] Step 2: Identify HPC terms contained in the original query, extract the standard definition, superordinate concept and equivalent concept of the identified terms from the constructed HPC terminology system, and construct a semantically enhanced query representation based on the original query, standard definition, superordinate concept and equivalent concept; search the constructed vectorized knowledge base based on the semantically enhanced query representation to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer.
[0107] Step 3: Combine the context obtained in Step 2 and the original query obtained in Step 1 into a complete prompt input into the large language model to obtain a specialized answer to the original query.
[0108] This method can be implemented based on the aforementioned "Implementation Method of an HPC Knowledge Question Answering System Based on Retrieval Enhancement Generation and Terminology System Construction". Each module in this system is essentially a computer implementation of each step in this method. The principles, more specific implementation process, and achievable effects of this method have been described in detail in the aforementioned system implementation method, and will not be repeated here.
[0109] One embodiment of a computer-readable storage medium:
[0110] The present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of an HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction.
[0111] The core process of this method is as follows:
[0112] Step 1: Obtain the user's original query.
[0113] Step 2: Identify HPC terms contained in the original query, extract the standard definition, superordinate concept and equivalent concept of the identified terms from the constructed HPC terminology system, and construct a semantically enhanced query representation based on the original query, standard definition, superordinate concept and equivalent concept; search the constructed vectorized knowledge base based on the semantically enhanced query representation to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer.
[0114] Step 3: Combine the context obtained in Step 2 and the original query obtained in Step 1 into a complete prompt input into the large language model to obtain a specialized answer to the original query.
[0115] This method can be implemented based on the aforementioned "Implementation Method of an HPC Knowledge Question Answering System Based on Retrieval Enhancement Generation and Terminology System Construction". Each module in this system is essentially a computer implementation of each step in this method. The principles, more specific implementation process, and achievable effects of this method have been described in detail in the aforementioned system implementation method, and will not be repeated here.
[0116] In summary, the present invention has the following characteristics:
[0117] 1) Significantly improved professional accuracy and factual reliability of question answering: By constructing an HPC terminology system, the system gains a deep understanding of domain concepts; by using high-quality question-answer pairs as retrieval units and applying strict quality filtering, the accuracy and professionalism of the knowledge sources are ensured at the source. Experimental results show that its comprehensive evaluation score reaches 0.893, significantly better than several mainstream general-purpose large model baselines. Specifically, compared with the Qwen2-7B-Instruct model's score of 0.808 and the ChatGLM3-6B model's score of 0.784, the present invention's solution improves overall performance by approximately 10.5% and 13.9%, respectively. In terms of individual dimensions, the present invention achieves a technical accuracy of 0.915, an improvement of 12.7% over the baseline models (maximum 0.812); a logical consistency of 0.884, an improvement of 7.1% over the baseline; and a practical effectiveness of 0.880, an improvement of 11.7% over the baseline. This shows that "terminology system construction" and the "dual filtering mechanism" can effectively suppress the "factual hallucination" of large models and ensure that the output content meets the professional rigor required in the HPC field.
[0118] 2) Ablation experiments further confirmed the necessity of each key technical unit in this invention. Experimental data showed that when the "HPC terminology system" module was removed and only the "dual quality filtering" step was retained, the system's ability to understand technical terms (such as job scheduling strategies and parallel library parameters) decreased significantly, and the overall score dropped from 0.893 to 0.837. Comparison showed that the absence of this module causes serious semantic drift: for domain abbreviations (such as MPI, SLURM, RDMA) or polysemous words (such as "node" and "job"), the model can only rely on general corpora for fuzzy matching and cannot accurately associate them with the specific context of high-performance computing. This demonstrates that the terminology system plays a crucial semantic anchoring role in the query understanding stage, ensuring that the retrieval vector is positioned accurately in the professional high-dimensional space. Conversely, if the "dual quality filtering" step is removed and only the "HPC terminology system" module is retained, the noisy data in the knowledge base impairs the logical coherence of the generated answers, and the overall score drops to 0.820. Data analysis shows that approximately 15%-20% of the originally collected or generated question-answer pairs contain noise (including logical contradictions, context-irrelevant information, and factual errors). If this data is entered into the database directly, without threshold filtering by the regression tree algorithm, the large model is misled by the noisy background when generating answers and produces typical factual hallucinations, resulting in a significant decrease in the logical coherence score of the answers.
The above results show that the terminology-enhanced retrieval and quality control pipeline designed in this invention plays an irreplaceable role in building a high-reliability domain question-answering system.
[0119] 3) It achieves superior problem-solving capabilities for complex and in-depth technical issues: The hierarchical terminology system provides the framework for conceptual association and reasoning. When faced with complex queries that require cross-domain knowledge association and in-depth analysis, the system can perform semantic expansion and precise retrieval based on the terminology system, generating logically rigorous and comprehensive in-depth analytical answers, rather than simply listing facts.
[0120] 4) Significantly improves the efficiency and maintainability of knowledge base construction: The terminology system construction and question-answer pair quality filtering method proposed in this invention realizes an automated pipeline from unstructured documents to high-quality, structured knowledge bases. Compared with methods that require a large amount of manual annotation to build knowledge graphs or retrain models, this solution is lower in cost and faster in knowledge updates and expansions, and can respond quickly to the rapid iteration of HPC technology.
[0121] 5) Enhanced system practicality and user trust: The system-generated answers are not only professional and accurate, but also possess excellent traceability due to the inclusion of metadata such as original text citations in the question-and-answer pairs. This greatly enhances the trust of professional users such as researchers and engineers in the question-and-answer results, and has broad application prospects and significant practical value in HPC education, research assistance, engineering operation and maintenance, and other scenarios.
Claims
1. An HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction, characterized in that the method includes: obtaining the user's original query; identifying the HPC terms contained in the original query, extracting the standard definitions, superordinate concepts, and equivalent concepts of the identified terms from the constructed HPC terminology system, and constructing a semantically enhanced query representation based on the original query, the standard definitions, the superordinate concepts, and the equivalent concepts; searching the constructed vectorized knowledge base based on the semantically enhanced query representation to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer; and combining the context and the original query into a complete prompt, which is input into the large language model to obtain a specialized answer to the original query.
2. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 1, characterized in that, The HPC terminology system is constructed as follows: HPC domain text is segmented to generate a candidate term set; The large language model is invoked to perform domain relevance discrimination and semantic filtering on the candidate term set to identify terms with clear HPC domain meaning and remove generic words, thereby obtaining a standardized term library; The terms in the normalized terminology database are converted into dense vectors to obtain term vectors, and then the term vectors are clustered using a density-based clustering algorithm. For each cluster, invoke the large language model to perform at least two levels of semantic induction tasks to construct an HPC terminology system containing at least two levels.
3. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 1, characterized in that, The HPC terminology system comprises two levels. The method for constructing the HPC terminology system by calling a large language model to perform semantic induction tasks for a certain cluster is as follows: First, the large language model is prompted to perform macroscopic abstraction of the cluster to generate high-level classification labels that summarize the core domain of the terminology of the cluster; then, the large language model is prompted to identify the subtle semantic differences within the cluster to generate specific classification labels that describe the internal semantic differences.
4. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 1, characterized in that, The vectorized knowledge base is constructed as follows: First, based on document slicing of HPC domain text, a large language model is used to generate question-answer pairs containing questions and corresponding answers, and metadata such as original text citations, literature sources and domain relevance are added to each question-answer pair; Then, the sentence embedding model is used to calculate the semantic similarity between the answers in the generated question-answer pairs and the corresponding original document slices. Based on the preset semantic similarity threshold, question-answer pairs with low similarity are filtered out, thus initially selecting high-fidelity question-answer pairs. Next, the large language model is guided to perform secondary filtering on the initially selected question-answer pairs from three dimensions: technical accuracy, logical consistency, and practical effectiveness, so as to finally select high-quality question-answer pairs and store them in the vectorized knowledge base.
5. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 4, characterized in that, The semantic similarity threshold is determined using a regression tree algorithm. Specifically, the preliminary semantic scores of the question-answer pair sample set to be screened are used as feature inputs. The question-answer pair sample set is divided into two subsets. The preliminary semantic scores of one subset are less than the split point, and the preliminary semantic scores of the other subset are greater than or equal to the split point. The optimal split point that minimizes the sum of the variances within the two subsets is calculated as the semantic similarity threshold.
6. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 4, characterized in that the selected high-quality question-answer pairs are stored in the following manner: the selected high-quality question-answer pairs are concatenated into text units according to the required format; based on the HPC terminology system, the key terms contained in each text unit and their standard definitions are retrieved, and the standard definitions of the terms are added to the text unit as a prefix or as context to form a semantically enhanced composite text; using the self-attention mechanism of a domain-optimized dense retrieval model, the semantically enhanced composite text is mapped into a dense vector of fixed dimensions, and the resulting dense vector is stored in the vectorized knowledge base; and an HNSW (Hierarchical Navigable Small World) graph index is established to support similarity retrieval over the vectorized knowledge base.
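As a non-limiting illustration, the construction of the semantically enhanced composite text in claim 6 can be sketched in Python as follows. The function name, the `Question:`/`Answer:` layout, the bracketed prefix format, and the use of a plain dictionary as the terminology system are all assumptions of this sketch. The embedding and HNSW indexing steps are omitted because they depend on the chosen model and index library.

```python
import re


def build_composite_text(question, answer, terminology):
    """Prefix a question-answer text unit with the definitions of the terms it contains.

    `terminology` is a hypothetical mapping from a term to its standard definition.
    """
    unit = f"Question: {question}\nAnswer: {answer}"
    # Whole-word, case-insensitive match of each known term against the text unit.
    found = [
        term for term in terminology
        if re.search(rf"\b{re.escape(term)}\b", unit, re.IGNORECASE)
    ]
    prefix = "\n".join(f"[{term}] {terminology[term]}" for term in found)
    return f"{prefix}\n{unit}" if prefix else unit
```

The composite text returned here is what would then be passed to the dense retrieval model for encoding.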
7. The HPC knowledge question answering method based on retrieval enhancement generation and terminology system construction according to claim 1, characterized in that the N question-answer pairs that are highly relevant to the original query are obtained as follows: an initial retrieval is performed in the vectorized knowledge base using the semantically enhanced query representation, recalling K candidate question-answer pairs, where K > N; using a reranking model based on a cross-encoder architecture, the original query and each candidate question-answer pair are concatenated into a single sequence, and deep semantic interaction weights are calculated through a full self-attention mechanism to obtain a relevance score; and, based on the relevance scores, the N question-answer pairs with the highest scores are selected as the N question-answer pairs that are highly relevant to the original query.
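As a non-limiting illustration, the recall-then-rerank flow of claim 7 can be sketched in Python as follows. All names are hypothetical, the entries are assumed to be dictionaries with `text` and `vector` keys, and `token_overlap_score` is only a toy stand-in for the cross-encoder reranking model. A brute-force cosine scan also stands in for the HNSW index lookup.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def token_overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the passage."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    p = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return len(q & p) / len(q) if q else 0.0


def retrieve_then_rerank(enhanced_query_vec, original_query, entries, k, n,
                         score_pair=token_overlap_score):
    """Recall K candidates by vector similarity, then keep the N best after reranking."""
    if not k > n:
        raise ValueError("K must be greater than N")
    # Stage 1: recall with the embedding of the semantically enhanced query.
    recalled = sorted(
        entries,
        key=lambda e: cosine(enhanced_query_vec, e["vector"]),
        reverse=True,
    )[:k]
    # Stage 2: rerank by scoring the original query against each candidate text.
    reranked = sorted(
        recalled,
        key=lambda e: score_pair(original_query, e["text"]),
        reverse=True,
    )
    return reranked[:n]
```

The split between the two query forms mirrors the claim: the enhanced representation drives recall, while the original query is what the reranker compares against each candidate.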
8. An HPC knowledge question answering system based on retrieval enhancement generation and terminology system construction, characterized in that it comprises: a terminology system construction module, used to construct the HPC terminology system; a high-quality knowledge base construction module, used to construct the vectorized knowledge base; a terminology-enhanced retrieval module, used to receive the user's original query, identify the HPC terms contained in the original query, extract from the constructed HPC terminology system the standard definition, superordinate concept, and equivalent concept of each identified term, construct a semantically enhanced query representation based on the original query, the standard definition, the superordinate concept, and the equivalent concept, and, based on the semantically enhanced query representation, perform retrieval in the constructed vectorized knowledge base to obtain N question-answer pairs that are more relevant to the original query as the context for generating the answer; and a specialized answer generation module, used to combine the context and the original query into a complete prompt that is input to the large language model to obtain a specialized answer to the original query.
9. An HPC knowledge question answering device based on retrieval enhancement generation and terminology system construction, comprising a processor and a memory, wherein the memory stores a computer program, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.