A Smart Education Question-Answering System and Method Based on Knowledge Graph and AI
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING SANLIUJIE NETWORK INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2026-06-03
- Publication Date
- 2026-06-30
AI Technical Summary
In existing knowledge-enhanced question-answering methods, the path order, node sources, and evidence boundaries of structured knowledge are difficult to preserve during answer generation. Generated content can therefore exceed the scope of the evidence or incorporate weakly related knowledge, which affects the reliability and traceability of smart education question-answering results.
The original question is semantically encoded, educational entities are identified, and connected paths are retrieved in the educational knowledge graph to form an evidence path set. The graph text is mapped to a generative model vocabulary index sequence. During word-by-word decoding, the probability distribution of candidate words is determined based on the path positions of the already generated answer fragments, and the answer words and their corresponding path positions are output. Finally, the answer text is segmented according to the path positions.
The invention improves the source traceability and content reliability of smart education question-answering results, and reduces cases where generated content deviates from the scope of the evidence or is mixed with weakly relevant knowledge.
Figure CN122309526A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically to a smart education question-answering system and method based on knowledge graphs and AI.
Background Technology
[0002] As knowledge graph technology and generative language models are increasingly applied in smart education, educational question-answering methods are gradually shifting from simple text matching to knowledge-enhanced generation. These methods typically first retrieve relevant knowledge from the course knowledge base based on the user's question, then transform the retrieved content into contextual information readable by the model, guiding it to generate natural language answers. This approach leverages structured knowledge to address the issue of insufficient knowledge coverage in model parameters and improves the correlation between question-answering results and the sources of teaching knowledge.
[0003] However, existing knowledge-enhanced question answering methods typically use retrieved knowledge as input to the model, and the generation stage still relies primarily on the model's internal probability distribution for lexical selection. The path order, node sources, and evidence boundaries contained in structured knowledge are difficult to sustain in the answer generation process, leading to issues such as generated content exceeding the scope of evidence, incorporating weakly related knowledge, or being difficult to trace back to its source. Subsequent verification usually only allows for an overall assessment of the generated text, making it difficult to establish a continuous correspondence between answer fragments and original evidence, thus affecting the reliability and traceability of smart education question answering results.
Summary of the Invention
[0004] The purpose of this invention is to provide a smart education question-answering system and method based on knowledge graphs and AI to solve the problems mentioned in the background.
[0005] To achieve the above objectives, the present invention provides the following technical solution:
[0006] Firstly, this invention provides a smart education question-answering method based on knowledge graphs and AI, comprising:
[0007] The original question is semantically encoded, the educational entity corresponding to the original question is identified, the answer type is determined based on the question's intent, and connected paths are retrieved in the educational knowledge graph based on the educational entity and the answer type to form an evidence path set;
[0008] The graph text at each path position in the evidence path set is mapped to a generative model vocabulary index set, and the index sets are arranged in the order of succession of the path positions to form a generative model vocabulary index sequence;
[0009] During word-by-word decoding by the generative model, the generative model vocabulary index set at the current decoding position is determined from the adjacent successor state, in the generative model vocabulary index sequence, of the path position corresponding to the already generated answer fragment; this index set is applied to the candidate lexical unit probability distribution to form an evidence-constrained candidate lexical unit probability distribution; and the answer word and the path position corresponding to the answer word are output;
[0010] Based on the path positions corresponding to the answer words, the answer text assembled in output order is segmented into evidence segments, forming segment-by-segment correspondence results between the answer text and the evidence path set, and the smart education question-answering result is output.
[0011] Secondly, this invention provides a smart education question-answering system based on knowledge graphs and AI, implemented using the methods described above, including:
[0012] The evidence tracing module is used to semantically encode the original question, identify the educational entity corresponding to the original question, determine the answer type based on the question's intent, and retrieve connected paths in the educational knowledge graph based on the educational entity and the answer type to form an evidence path set;
[0013] The mapping module is used to map the graph text at each path position in the evidence path set to a generative model vocabulary index set, and to arrange the index sets in the order of succession of the path positions to form a generative model vocabulary index sequence;
[0014] The reduction module is used, during word-by-word decoding by the generative model, to determine the generative model vocabulary index set at the current decoding position from the adjacent successor state, in the generative model vocabulary index sequence, of the path position corresponding to the already generated answer fragment; to apply this index set to the candidate lexical unit probability distribution to form an evidence-constrained candidate lexical unit probability distribution; and to output the answer word and the path position corresponding to the answer word;
[0015] The question-and-answer module is used to segment the answer text assembled in output order into evidence segments based on the path positions corresponding to the answer words, forming segment-by-segment correspondence results between the answer text and the evidence path set, and to output the smart education question-answering result.
[0016] The technical effects and advantages provided by the present invention in the above technical solution are as follows:
[0017] This invention maps the graph text in the evidence path to a generative model vocabulary index sequence. During word-by-word decoding, it determines the vocabulary index set for the current decoding position based on the path position corresponding to the generated answer fragment. Then, it constrains the probability distribution of candidate words using index masks and probability normalization, ensuring that the generation process of answer words is controlled by the evidence path's continuity. Simultaneously, after the answer words are generated, they are written to the corresponding path positions, and the answer text is segmented according to the path positions. This establishes a segment-by-segment correspondence between answer fragments and evidence paths, reducing the likelihood of generated content deviating from the evidence scope or incorporating weakly related knowledge, thus enhancing the traceability and reliability of the smart education question-and-answer results.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.
[0019] Figure 1 is a flowchart of a smart education question-answering method based on knowledge graphs and AI, provided as an embodiment of the present invention.
[0020] Figure 2 is a schematic module diagram of a smart education question-answering system based on knowledge graphs and AI, provided as an embodiment of the present invention.
Detailed Implementation
[0021] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided to make the description of this application more complete and comprehensive, and to fully convey the concept of the exemplary embodiments to those skilled in the art. The drawings are merely schematic illustrations of this application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted.
[0022] Furthermore, the described features, structures, or characteristics can be combined in any suitable manner in one or more exemplary embodiments. Numerous specific details are provided in the following description to give a full understanding of the exemplary embodiments disclosed in this application. However, those skilled in the art will recognize that the technical solutions disclosed in this application can be practiced with one or more specific details omitted, or other methods, components, steps, etc., can be employed. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring various aspects of the disclosure of this application.
[0023] Example 1
[0024] As shown in Figure 1, this embodiment discloses a smart education question-answering method based on knowledge graphs and AI, the method including:
[0025] The method takes an original question in natural language form as its object of processing. It involves an educational knowledge graph, an entity alias table, a type lexicon, a text semantic encoder, a generative model vocabulary, a generative model word segmenter, and a generative model. The educational knowledge graph stores entity nodes, type nodes, attribute nodes, and edge relationships within the educational domain. The entity alias table stores the standard entity name, entity alias, and entity node identifier. The type lexicon stores the answer type name, type node identifier, and type name. The generative model vocabulary stores the vocabulary indexes that the generative model can output. The generative model word segmenter converts text into vocabulary indexes in the generative model vocabulary and restores text from vocabulary indexes.
[0026] Processing the original question produces the evidence path set, the path position sequence, the generative model vocabulary index sequence, the generative model vocabulary index set at the current decoding position, the candidate lexical unit probability distribution, the evidence-constrained candidate lexical unit probability distribution, and the set of path position text fragments. The smart education question-answering result includes the answer text and the segment-by-segment correspondence results between the answer text and the evidence path set.
[0027] S101: Semantically encode the original question, identify the educational entity corresponding to the original question, determine the answer type based on the question's intent, and retrieve connected paths in the educational knowledge graph based on the educational entity and the answer type to form an evidence path set.
[0028] In practice, the original question is standardized by unifying the encoding formats of Chinese punctuation, English punctuation, mathematical symbols, formula characters, and spaces, forming a standardized question text (referred to below as the question specification text). This text serves as the common input for semantic encoding, educational entity recognition, and answer type determination. Through this process, the same educational entity is mapped to a unified text basis regardless of input format.
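The standardization in [0028] can be illustrated with a minimal Python sketch. The patent does not specify the mapping rules; the function name `standardize_question` and the contents of `EXTRA_PUNCTUATION` are assumptions made for illustration. The sketch folds full-width ASCII variants to ASCII, maps a few punctuation marks, and collapses whitespace, while leaving formula characters such as superscripts untouched.

```python
# Full-width ASCII variants (U+FF01..U+FF5E) sit at a fixed offset of 0xFEE0
# from their ASCII counterparts (U+0021..U+007E).
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20  # ideographic space -> ASCII space

# Hypothetical extra mappings (assumption); a real table would be curated.
EXTRA_PUNCTUATION = {
    0x3002: ".",   # ideographic full stop
    0x3001: ",",   # ideographic comma
    0x2212: "-",   # minus sign
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}


def standardize_question(text: str) -> str:
    """Return the question text with unified punctuation and spacing."""
    table = {**FULLWIDTH_TO_ASCII, **EXTRA_PUNCTUATION}
    unified = text.translate(table)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(unified.split())
```

With this table, a full-width question mark and an ASCII question mark yield the same standardized text, so later alias matching does not depend on the input method used.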
[0029] The question specification text is input into the text semantic encoder. The text semantic encoder converts the question specification text into a sequence of lexical unit numbers $X = (x_1, x_2, \dots, x_n)$, where $x_i$ denotes the number of the $i$-th lexical unit and $n$ denotes the number of lexical units in the question specification text. The text semantic encoder performs context encoding on the sequence $X$ to form the hidden vector $h_i$ corresponding to each lexical unit. Mean pooling and normalization are then applied to the hidden vectors to form the semantic encoding of the original question. The calculation is as follows:
[0030] $$\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_i ;$$
[0031] $$q = \frac{\bar{h}}{\lVert \bar{h} \rVert_2} ;$$
[0032] In the formula, $\bar{h}$ denotes the pooling vector of the question specification text; $\lVert \bar{h} \rVert_2$ denotes the L2 norm of the pooling vector; $q$ denotes the semantic encoding of the original question. The text semantic encoder is trained using educational question-and-answer text and graph text from an educational knowledge graph. Training samples include positive and negative sample pairs. Positive sample pairs consist of question text and graph text belonging to the same question-and-answer evidence relationship. Negative sample pairs consist of graph text corresponding to different knowledge points or different answer types. The trained semantic encoder allows the semantic relevance between the original question and candidate evidence paths to be computed within the same vector space.
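The mean pooling and L2 normalization steps can be sketched in plain Python as follows. The hidden vectors would come from the text semantic encoder, which is not reproduced here; the function names are illustrative assumptions.

```python
import math
from typing import List


def mean_pool(hidden_vectors: List[List[float]]) -> List[float]:
    """Average the per-token hidden vectors dimension by dimension."""
    count = len(hidden_vectors)
    dim = len(hidden_vectors[0])
    return [sum(vec[d] for vec in hidden_vectors) / count for d in range(dim)]


def l2_normalize(vector: List[float]) -> List[float]:
    """Divide a vector by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def encode_question(hidden_vectors: List[List[float]]) -> List[float]:
    """Semantic encoding: mean pooling followed by L2 normalization."""
    return l2_normalize(mean_pool(hidden_vectors))
```

For two hidden vectors `[3.0, 0.0]` and `[3.0, 8.0]`, the pooled vector is `[3.0, 4.0]`, its L2 norm is 5, and the encoding is `[0.6, 0.8]`.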
[0033] Educational entity identification is accomplished using the question specification text and the entity alias table. Specifically, the standard names and aliases of entities in the educational knowledge graph are read to form an educational entity lexicon; the question specification text is scanned by word position, and continuous text segments in the question specification text are matched against the educational entity lexicon; successfully matched continuous text segments are used as candidate educational entity texts. If the text range of one candidate educational entity text contains that of another, the candidate that covers more words is retained, and the start and end positions of the candidate educational entity text in the question specification text are recorded. Subsequently, the educational entity corresponding to the original question is determined based on the matching record of the candidate educational entity text in the entity alias table. For example, when the original question is "What is the vertex form of a quadratic function?", the educational entity is "quadratic function"; when the original question is "Why does photosynthesis need chlorophyll?", the educational entities are "photosynthesis" and "chlorophyll".
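The lexicon matching with containment resolution can be sketched as below. This is a simplified illustration: it scans by character position rather than by word position, and the lexicon contents and the function name `find_candidate_entities` are assumptions.

```python
from typing import List, Set, Tuple

Match = Tuple[int, int, str]  # (start, end, matched text)


def find_candidate_entities(question: str, lexicon: Set[str]) -> List[Match]:
    """Match continuous segments against the lexicon, keeping the longer
    segment whenever one match lies entirely inside another."""
    matches: List[Match] = []
    for start in range(len(question)):
        for end in range(start + 1, len(question) + 1):
            segment = question[start:end]
            if segment in lexicon:
                matches.append((start, end, segment))
    # Drop every match whose text range is contained in a different match.
    return [
        m for m in matches
        if not any(o != m and o[0] <= m[0] and m[1] <= o[1] for o in matches)
    ]
```

Given a lexicon containing both "function" and "quadratic function", only "quadratic function" survives for the first example question, because the shorter match falls inside the longer one.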
[0034] The answer type is determined based on the question's indicative text. Specifically, it involves reading the question words, question phrases, and associated predicate phrases from the question's specification text to form the question-indicative text. This text is then input into an answer type mapping table. The answer type mapping table stores the question-indicative text, answer type names, and candidate type node names. The answer type is obtained based on the correspondence between the question-indicative text and the answer type mapping table. For example, if the question-indicative text is "What is it?" and the object being asked is a formula or expression name, the answer type is the expression form class; if the question-indicative text is "Why?" and the object being asked involves a relationship of action, the answer type is the explanation of cause class; and if the question-indicative text is "How to calculate", the answer type is the calculation process class.
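As a rough illustration of the answer type mapping table, the sketch below keys the table on the question-indicative text plus a coarse label for the object being asked about. The labels `formula_or_expression` and `action_relation`, and the table layout itself, are hypothetical; the source only states which kinds of entries the table stores.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (question-indicative text, object kind) -> answer type.
# An object kind of None means the indicative text alone decides the type.
ANSWER_TYPE_TABLE: Dict[Tuple[str, Optional[str]], str] = {
    ("what is", "formula_or_expression"): "expression form",
    ("why", "action_relation"): "cause explanation",
    ("how to calculate", None): "calculation process",
}


def determine_answer_type(indicative_text: str,
                          object_kind: Optional[str] = None) -> Optional[str]:
    """Look up the answer type, falling back to an entry that ignores the
    object kind when no exact entry exists."""
    exact = ANSWER_TYPE_TABLE.get((indicative_text, object_kind))
    if exact is not None:
        return exact
    return ANSWER_TYPE_TABLE.get((indicative_text, None))
```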
[0035] Specifically, retrieving connected paths in the educational knowledge graph based on the educational entity and the answer type to form the evidence path set includes:
[0036] This process uses the educational entity, the answer type, the educational knowledge graph, the entity alias table, and the type lexicon as data sources. The input data consists of the educational entity and the answer type. The output data is the evidence path set. Each evidence path in the evidence path set retains a path source identifier, node identifier, edge relationship identifier, edge direction field, endpoint type field, and graph text field. These fields provide the structural source for forming the subsequent path position sequence.
[0037] Link the educational entity to an entity node in the educational knowledge graph to form a query entity node record.
[0038] The text corresponding to the educational entity is entered into an entity alias table. The entity alias table returns the entity node identifiers that match the text corresponding to the educational entity. The entity node identifiers are read, and the nodes pointed to by these identifiers are used as the entity nodes of the educational entity in the educational knowledge graph. Entity nodes in the educational knowledge graph are used to represent course concepts, knowledge points, formulas, subject terms, and teaching objects.
[0039] If the text corresponding to an educational entity can match multiple entity nodes, the subject terms and adjacent concept terms in the question specification text are read and matched with the subject field and the superordinate and subordinate fields of the entity in the entity alias table to obtain the entity node consistent with the question specification text. The subject terms come from the subject name or course field in the original question; the adjacent concept terms come from the text fragments adjacent to the educational entity. Through the above processing, the educational entity is linked to the entity node in the educational knowledge graph that is consistent with the question context.
[0040] The query entity node record includes three fields: entity node identifier, entity standard name, and entity alias. The entity node identifier field records the node identifier of the educational entity in the educational knowledge graph; the entity standard name field records the standard name of the entity node in the educational knowledge graph; and the entity alias field records the educational entity text in the original question and the corresponding alias text in the entity alias table. The query entity node record provides the starting point for connected path retrieval.
[0041] The query entity node record includes an entity node identifier field, an entity standard name field, and an entity alias field. The type node includes a type node identifier field and a type name field. The entity node identifier field points to an entity node in the educational knowledge graph, and the type node identifier field points to a type node in the educational knowledge graph.
[0042] The entity node identifier field corresponds to the entity node in the educational knowledge graph. The entity standard name field is used to generate the graph text. The entity alias field is used to store the entity representation in the original question. The type node identifier field corresponds to the type node in the educational knowledge graph. The type name field is used to record the standard type name of the answer type in the educational knowledge graph. Through the above field settings, both educational entities and answer types are converted into searchable objects in the educational knowledge graph, enabling connected path retrieval to have both starting node and ending point type criteria.
[0043] Map the answer type to a type node in the educational knowledge graph and, starting from the query entity node record, read the connected paths in the educational knowledge graph whose endpoint type matches the type node, to form a candidate evidence path set.
[0044] The answer type is input into the type lexicon. The type lexicon stores the answer type name, type node identifier, and type name fields. The type node identifier corresponding to the answer type is read, and the node pointed to by that identifier is used as the type node in the educational knowledge graph. Type nodes are used to constrain the endpoint type of connected paths, ensuring that candidate evidence paths correspond to the expected answer category of the original question.
[0045] The entity node pointed to by the entity node identifier in the query entity node record serves as the starting point of the path. Connected paths are read along the edge directions in the educational knowledge graph. Each connected path consists of a starting node, edge relationships, an ending node, and edge directions. The ending type field of the ending node of each connected path is read and compared with the type node identifier. Connected paths whose ending type field corresponds to the type node identifier are written into the candidate evidence path set.
[0046] Each candidate evidence path in the candidate evidence path set includes a path source identifier, a starting node identifier, an edge relation identifier, an ending node identifier, an edge direction field, an ending type field, and a graph text field. The path source identifier distinguishes different candidate evidence paths. The starting node identifier, edge relation identifier, and ending node identifier preserve the structural origin of the connected path within the educational knowledge graph. The edge direction field records the reading direction of the connected path. The ending type field records the type node corresponding to the ending node. The graph text field is generated from the node names and edge relation names in the connected path according to the edge direction.
[0047] For example, the original question is "What is the vertex form of a quadratic function?" The educational entity is "quadratic function," and the answer type is the expression form class. Linking the educational entity yields the "quadratic function" entity node, and mapping the answer type yields the type node corresponding to the expression form class. Connected paths are read starting from the "quadratic function" entity node. The endpoint type of the connected path "quadratic function → has expression form → vertex form" corresponds to the expression form class; therefore, this connected path is added to the candidate evidence path set. This process makes the candidate evidence path set constrained by both the educational entity and the answer type, reducing the number of connected paths unrelated to the answer category that enter subsequent calculations.
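The endpoint-type-constrained retrieval can be sketched as a bounded depth-first walk along outgoing edges. The graph representation (a list of `(head, relation, tail)` triples plus a node-to-type mapping) and the `max_hops` limit are assumptions for illustration; the source does not state a hop limit.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (head node, edge relation, tail node)


def retrieve_candidate_paths(edges: List[Edge],
                             node_types: Dict[str, str],
                             start_node: str,
                             type_node: str,
                             max_hops: int = 2) -> List[List[Edge]]:
    """Follow edge directions from the start node and keep every path
    whose ending node has the required endpoint type."""
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in edges:
        outgoing.setdefault(head, []).append((relation, tail))

    candidates: List[List[Edge]] = []

    def walk(node: str, path: List[Edge], visited: frozenset) -> None:
        if path and node_types.get(node) == type_node:
            candidates.append(list(path))
        if len(path) == max_hops:
            return
        for relation, tail in outgoing.get(node, []):
            if tail not in visited:  # avoid cycles
                path.append((node, relation, tail))
                walk(tail, path, visited | {tail})
                path.pop()

    walk(start_node, [], frozenset({start_node}))
    return candidates
```

In the quadratic function example, a path ending at "parabola" would be discarded because its endpoint type is not the expression form class, while the path ending at "vertex form" is kept.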
[0048] Calculate the cosine of the angle between the semantic encoding of the original question and the graph text vector of each candidate evidence path in the candidate evidence path set, and form the evidence path set in the order of the cosine values.
[0049] The graph text field of each candidate evidence path in the candidate evidence path set is read and input into the text semantic encoder. Using the same encoding method as for the original question, the text semantic encoder outputs the graph text vector $g_k$ of each candidate evidence path, where $k$ denotes the sequence number of the candidate evidence path in the candidate evidence path set. The semantic encoding $q$ of the original question and the graph text vector $g_k$ reside in the same vector space.
[0050] The cosine of the angle between the semantic encoding $q$ and the graph text vector $g_k$ is calculated as follows:
[0051] $$c_k = \frac{q \cdot g_k}{\lVert q \rVert_2 \, \lVert g_k \rVert_2} ;$$
[0052] In the formula, $c_k$ denotes the cosine of the angle between the original question and the $k$-th candidate evidence path; $q \cdot g_k$ denotes the inner product of the semantic encoding $q$ and the graph text vector $g_k$; $\lVert q \rVert_2$ denotes the 2-norm of the semantic encoding $q$; $\lVert g_k \rVert_2$ denotes the 2-norm of the graph text vector $g_k$.
[0053] The candidate evidence paths are sorted by the cosine value of the included angle, and the sorted candidate evidence paths are written into the evidence path set. When multiple candidate evidence paths have the same cosine value, they are written into the evidence path set according to a fixed ordering rule on the path source identifiers, so that the arrangement of the evidence path set is reproducible. The evidence path set inherits from the candidate evidence path set the path source identifier, starting node identifier, edge relation identifier, ending node identifier, edge direction field, ending type field, and graph text field.
[0054] This step integrates answer type constraints, graph connectivity, and semantic relevance to the original question into the formation of the evidence path set. The evidence path set has both a graph-structure origin and a semantic ordering, providing an evidentiary basis for the subsequent vocabulary index constraints.
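The cosine ranking with a deterministic tie-break can be sketched as follows. Two assumptions are made here: the paths are sorted by descending cosine value, and ties are broken by ascending path source identifier. The source states only that the order follows the cosine values and that ties use a fixed rule on the identifiers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_evidence_paths(question_vec: Sequence[float],
                        candidates: List[Tuple[str, Sequence[float]]]) -> List[str]:
    """Return path source identifiers ordered by descending cosine value,
    with ties resolved by ascending identifier (assumed rule)."""
    scored = [(cosine(question_vec, vec), path_id) for path_id, vec in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path_id for _, path_id in scored]
```

Because the sort key includes the identifier, repeated runs over the same candidates always produce the same evidence path order.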
[0055] S102: Map the graph text at each path position in the evidence path set to a generative model vocabulary index set, and arrange the index sets in the order of succession of the path positions to form a generative model vocabulary index sequence.
[0056] The evidence path set is converted into vocabulary indexes that can be used in the word-by-word decoding stage of the generative model. The generative model uses tokens as output units, and each token in the generative model vocabulary has a corresponding vocabulary index. The graph text in the evidence path set is first converted into vocabulary index sets, which are then organized into the generative model vocabulary index sequence according to the order of succession of the path positions. The inputs for forming the generative model vocabulary index sequence are the evidence path set, the generative model vocabulary, and the generative model word segmenter; the output is a vocabulary index sequence that carries path position relationships.
[0057] Specifically, forming the generative model vocabulary index sequence according to the order of succession of the path positions includes:
[0058] This process converts the graph text in the evidence paths into vocabulary indexes while preserving the order of succession between path positions. Path positions represent the graph text units that appear sequentially along the edge directions in an evidence path. Once the generative model vocabulary index sequence is formed, the subsequent decoding process can read the vocabulary index set of the adjacent successor position based on the path position corresponding to the already generated answer fragment.
[0059] Mark path positions along the edge directions of the connected paths in the evidence path set to form a path position sequence.
[0060] The path source identifier, starting node identifier, edge relationship identifier, ending node identifier, and edge direction field of each evidence path in the evidence path set are read. The arrival order of nodes is determined based on the edge direction, and the starting node text, edge relationship text, and ending node text in the connected path are sequentially marked as path positions. If the connected path includes attribute node text connected to the ending node, the attribute node text is marked as a path position after the ending node according to the edge direction between the attribute node and the ending node. For the evidence path "quadratic function → has expression form → vertex form", "quadratic function", "has expression form", and "vertex form" are each marked as a path position.
[0061] Each path position is written as a sequence item of the path position sequence. The path position sequence records the order of succession between the path positions in each evidence path. This order originates from the edge directions of the connected paths in the evidence path set, not from free generation by the generative model. The path position sequence provides an ordering basis at the graph path level for subsequently determining the current decoding position.
[0062] Each sequence item of the path position sequence includes a path position number field, a predecessor path position number field, a successor path position number field, and a path source field. The path position index set includes a path position number field and a vocabulary index field, and the path position number field and the vocabulary index field correspond one-to-one.
[0063] The path position number field records the number of the path position within its evidence path. The predecessor path position number field records the position preceding the current path position. The successor path position number field records the position following the current path position. The path source field records the evidence path to which the current path position belongs. The one-to-one correspondence between the path position number field and the vocabulary index field in the path position index set allows each vocabulary index to be traced back to its path position. This field configuration establishes a fixed correspondence between path positions, graph sources, and vocabulary indexes, providing a basis for recording the path positions corresponding to subsequent answer words.
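The path position marking and the fields of each sequence item can be sketched with a small data class. The class name `PathPosition`, the zero-based numbering, and the use of `None` for a missing predecessor or successor are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[str, str, str]  # (head node text, edge relation text, tail node text)


@dataclass
class PathPosition:
    path_source: str            # evidence path the position belongs to
    number: int                 # position number within that path
    predecessor: Optional[int]  # preceding position number, None at the start
    successor: Optional[int]    # following position number, None at the end
    graph_text: str             # node, relation, or attribute text


def label_path_positions(path_source: str,
                         edges: Sequence[Edge],
                         attribute_texts: Sequence[str] = ()) -> List[PathPosition]:
    """Mark start node, relations, end nodes, and trailing attribute texts
    as consecutive path positions along the edge direction."""
    texts = [edges[0][0]]
    for _, relation, tail in edges:
        texts.extend([relation, tail])
    texts.extend(attribute_texts)
    last = len(texts) - 1
    return [
        PathPosition(path_source, i,
                     i - 1 if i > 0 else None,
                     i + 1 if i < last else None,
                     text)
        for i, text in enumerate(texts)
    ]
```

For the example path, the three positions carry the texts "quadratic function", "has expression form", and "vertex form", linked by predecessor and successor numbers.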
[0064] The graph text of each path position in the path position sequence is input into the generative model word segmenter to form the path position index set corresponding to each path position.
[0065] The graph text corresponding to each path position in the path position sequence is read and input into the generative model word segmenter. The generative model word segmenter uses the same vocabulary as the generative model that subsequently generates the answer. The word segmenter divides the graph text into a sequence of tokens and outputs the vocabulary index of each token in the generative model vocabulary. The vocabulary indexes corresponding to the same path position are written into the same path position index set.
[0066] The path position index set also includes a token order field. This field records the order of the vocabulary indexes within the same path position, ensuring that the graph text corresponding to the path position retains its original token order after conversion to vocabulary indexes. For graph text containing mathematical expressions, vocabulary indexes are formed based on the segmentation rules of the generative model word segmenter for mathematical symbols. For example, after a graph text containing a mathematical expression is segmented by the word segmenter, the letters, equal signs, parentheses, minus signs, exponent symbols, and plus signs each correspond to vocabulary indexes in the generative model vocabulary and are written into the same path position index set. This process allows text-based and formula-based graph content to enter the same vocabulary index system.
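A path position index set with a token order field can be sketched as below. The whitespace tokenizer and the six-entry vocabulary are toy stand-ins (assumptions); in the described method, the word segmenter and vocabulary of the actual generative model would be used.

```python
from typing import Callable, Dict, List

# Toy vocabulary (assumption) standing in for the generative model vocabulary.
TOY_VOCAB: Dict[str, int] = {
    "quadratic": 0, "function": 1, "has": 2,
    "expression": 3, "form": 4, "vertex": 5,
}


def toy_tokenize(text: str) -> List[int]:
    """Whitespace tokenizer returning vocabulary indexes."""
    return [TOY_VOCAB[word] for word in text.split()]


def build_position_index_set(position_number: int,
                             graph_text: str,
                             tokenize: Callable[[str], List[int]] = toy_tokenize
                             ) -> List[Dict[str, int]]:
    """Tokenize the graph text of one path position and record, for each
    vocabulary index, its path position number and its token order."""
    return [
        {"position": position_number, "token_order": order, "vocab_index": index}
        for order, index in enumerate(tokenize(graph_text))
    ]
```

Every entry keeps the path position number next to the vocabulary index, so a generated token can later be traced back to the path position it came from.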
[0067] According to the path location sequence The path position index set corresponding to each path position is arranged in the order of succession to form the generative model vocabulary index sequence. .
[0068] Read path location sequence Each sequence item in the dictionary contains a path source field, a path position number field, a predecessor path position number field, and a successor path position number field. Under the same path source field, the order of succession of each path position is determined based on the predecessor and successor path position number fields. Subsequently, the path position index sets corresponding to each path position are arranged according to this order, forming a generative model vocabulary index sequence. .
[0069] Each sequence item of the generative model vocabulary index sequence includes a path source field, a path position number field, a predecessor path position number field, a successor path position number field, and a path position index set. The path source field distinguishes the different evidence paths in the evidence path set. The path position number field is used to establish the correspondence between answer tokens and path positions in the subsequent decoding stage. The predecessor and successor path position number fields store the adjacent succession relationships. The path position index set provides the range of generative model vocabulary indices corresponding to the current path position.
[0070] Through the above processing, the evidence path set is converted into the generative model vocabulary index sequence. This transforms the discrete connected paths in the educational knowledge graph into an ordered index structure in the generative model's vocabulary space, so that subsequent word-by-word decoding can read the range of available tokens based on the path position. It provides the data structure through which graph evidence constrains the candidate token probability distribution.
[0071] S103: During word-by-word decoding in the generative model, the generative model vocabulary index set of the current decoding position is determined from the adjacent succession state, within the generative model vocabulary index sequence, of the path positions corresponding to the generated answer fragment. This index set is applied to the candidate token probability distribution to form the evidence-constrained candidate token probability distribution, and the answer token and the path position corresponding to the answer token are output.
[0072] Once the generative model vocabulary index sequence is formed, the original question, the graph text of the evidence path set, and the generated answer fragment are input into the generative model. At the $t$-th decoding position, the generative model outputs the candidate token probability distribution $P_t$, where $t$ indicates the output order of the current answer token and $P_t(i)$ is the candidate probability of token index $i$ in the generative model vocabulary $V$ at the current decoding position. The candidate token probability distribution is obtained by normalizing the unnormalized probability values $z_t(i)$ output by the generative model. The calculation formula is as follows:
[0073] $$P_t(i) = \frac{\exp\left(z_t(i)\right)}{\sum_{j \in V} \exp\left(z_t(j)\right)};$$
[0074] In the formula, $z_t(i)$ is the unnormalized probability value output by the generative model for token index $i$ at the $t$-th decoding position; $V$ is the generative model vocabulary; $j$ is any token index in the generative model vocabulary.
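As a small numeric illustration of this normalization step, the sketch below assumes the normalization function is the usual softmax over the model's unnormalized outputs. The function name `softmax` and the sample values are illustrative only.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Normalize unnormalized scores into a probability distribution."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One probability per vocabulary index; every entry is positive and they sum to 1.
probs = softmax([2.0, 1.0, 0.0])
print(probs)
```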
[0075] The generative model vocabulary index set at the current decoding position is determined from the generative model vocabulary index sequence and from the adjacent succession state, within that sequence, of the path positions corresponding to the generated answer fragment. The adjacent succession state is jointly represented by the path source field, the path position number field, the predecessor path position number field, and the successor path position number field. Through this state, the path positions already covered by the current answer fragment are identified, and the range of token indices available to the next decoding position is determined from the adjacent path positions. The constraint on the candidate token probability distribution is therefore based on the structural continuity of the evidence path rather than on an overall text comparison after the answer has been generated. Graph evidence thus enters the word-by-word decoding stage, which helps reduce the risk of the answer text bypassing the evidence path.
[0076] Specifically, determining the generative model vocabulary index set of the current decoding position from the adjacent succession state, within the generative model vocabulary index sequence, of the path positions corresponding to the generated answer fragment includes:
[0077] The inputs to this process are the generated answer fragment, the answer token output records, and the generative model vocabulary index sequence. The output is the generative model vocabulary index set at the current decoding position. The answer token output records are formed by the preceding decoding process and record the token index, output order, and corresponding path position of each output answer token. The generative model vocabulary index sequence is formed from the evidence path set after path position marking and word segmentation mapping. The two are linked by the path position field and the path source field.
[0078] Read the path positions corresponding to the answer tokens in the generated answer fragment, and form the generated path position sequence according to the output order of the answer tokens.
[0079] Read the output records of the first through the (t-1)-th answer tokens, where t is the current decoding position. Each output record includes the output order of the answer token, the index of the answer token, the path position, and the path source. Arrange the records by the output order of the answer tokens and extract the path position from each record to form the generated path position sequence. The sequence thus corresponds to the answer fragment generated before the t-th token is decoded.
[0080] The generated path position sequence indicates how far along the evidence path the current answer text has progressed. For example, if the original question is "What is the vertex form of a quadratic function?" and the generated answer fragment is "the vertex form of a quadratic function", where "quadratic function" corresponds to the entity path position in the evidence path and "vertex form" corresponds to the answer form path position in the evidence path, then the generated path position sequence records these two path positions and their output order. This sequence is then suffix-aligned with the generative model vocabulary index sequence to determine the path position that the current decoding position should follow.
[0081] Each sequence item of the generated path position sequence includes an answer token output order field, an answer token index field, a path position field, and a path source field. The terminal subsequence consists of the sequence items that are output consecutively under the same path source field.
[0082] The answer token output order field records the generation order of the answer token in the answer text. The answer token index field records the index of the answer token in the generative model vocabulary. The path position field records the evidence path position corresponding to the answer token. The path source field records the evidence path to which the answer token belongs. The terminal subsequence represents the path position records generated consecutively along the same evidence path at the end of the current answer fragment. Through these fields, the generated path position sequence preserves the correspondence among answer generation order, token index, and evidence path source.
[0083] The terminal subsequence of the generated path position sequence is suffix-aligned with the path position subsequence of the same path source in the generative model vocabulary index sequence to form the successor path position set.
[0084] Read the terminal subsequence of the generated path position sequence and extract its path source field. Then read, from the generative model vocabulary index sequence, the path position subsequence that has the same path source field. Each path position subsequence consists of the path position number fields, predecessor path position number fields, and successor path position number fields that belong to the same path source field.
[0085] The path position fields of the terminal subsequence are suffix-aligned, in output order, with the path position subsequence of the same path source. Suffix alignment is judged by the consistency of the path position numbers. If the path position numbers in the terminal subsequence match a run of consecutive path position numbers in the path position subsequence, the successor path position number field of the last matched path position is read, and the path position it points to is written into the successor path position set.
[0086] When the successor path position number field of the last matched path position is empty, the end token index of the generative model vocabulary is written into the generative model vocabulary index set of the current decoding position. When the subsequent decoding strategy selects the end token index, word-by-word decoding stops, and answer text synthesis and evidence segmentation begin. If multiple evidence paths share the same generated terminal subsequence, the successor path position of each such evidence path is read and written into the successor path position set.
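The suffix alignment described above might be sketched as follows. This is a simplified reading under stated assumptions: records are `(path_source, position_no)` pairs, repeated position numbers from multi-token positions are collapsed, only the single path source of the terminal subsequence is examined (the multi-path case is not covered), and the function names are hypothetical.

```python
from typing import Optional

Record = tuple[str, int]  # (path source, path position number)

def terminal_subsequence(records: list[Record]) -> list[Record]:
    """Trailing run of records that share the path source of the last record."""
    if not records:
        return []
    source = records[-1][0]
    tail = []
    for rec in reversed(records):
        if rec[0] != source:
            break
        tail.append(rec)
    return tail[::-1]

def successor_positions(
    records: list[Record],
    successors: dict[str, dict[int, Optional[int]]],
) -> tuple[set[Record], bool]:
    """Return (successor path position set, whether the end token is allowed)."""
    tail = terminal_subsequence(records)
    if not tail:
        return set(), False
    source = tail[0][0]
    chain = successors[source]
    # Collapse repeats: several answer tokens may belong to one path position.
    run = [no for i, (_, no) in enumerate(tail) if i == 0 or tail[i - 1][1] != no]
    # Alignment holds when each position in the run is the successor of the previous one.
    aligned = all(chain.get(a) == b for a, b in zip(run, run[1:]))
    if not aligned or run[-1] not in chain:
        return set(), False
    nxt = chain[run[-1]]
    if nxt is None:            # empty successor field: allow the end token
        return set(), True
    return {(source, nxt)}, False

successors = {"p1": {1: 2, 2: 3, 3: None}}
records = [("p1", 1), ("p1", 1), ("p1", 2)]
print(successor_positions(records, successors))
```

When the run reaches the last position of the chain, the function returns an empty set together with `True`, which corresponds to writing the end token index into the allowed index set.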
[0087] The successor path position set contains the evidence path positions that the current decoding position may continue to. It is derived from the end state of the generated answer fragment within the evidence path and from the generative model vocabulary index sequence. By linking the generated content to adjacent positions on the graph path, this process determines the graph range available to the next token, instead of generating the answer directly from static evidence text alone.
[0088] Read the first token index of each path position in the successor path position set and perform a word segmenter boundary concatenation check against the last token index of the generated answer fragment. The generative model vocabulary index sets corresponding to the path positions that pass the check are merged to determine the generative model vocabulary index set of the current decoding position.
[0089] For each path position in the successor path position set, read its path position index set and, from it, the first token index, that is, the token index that appears first after the word segmenter segments the graph text of that path position. At the same time, read the last token index of the generated answer fragment, which is taken from the answer token index field of the last sequence item of the generated path position sequence.
[0090] Concatenate the token text corresponding to the last token index of the generated answer fragment with the token text corresponding to the first token index of each path position in the successor path position set to obtain candidate concatenated texts. Input each candidate concatenated text into the word segmenter to obtain a concatenated token index sequence. Read the last two token indices of this sequence and compare them, position by position, with the two-element sequence consisting of the last token index of the generated answer fragment and the corresponding first token index. If they match at both positions, the path position passes the word segmenter boundary concatenation check. This comparison confirms that the last token of the generated answer fragment and the first token of the path position to be connected maintain continuity at the word segmenter's token boundary, that is, re-segmentation neither merges nor splits them.
[0091] The generative model vocabulary index sets corresponding to the path positions that pass the word segmenter boundary concatenation check are merged to form the generative model vocabulary index set of the current decoding position. If two or more path positions pass the check, this set is the union of their index sets. To keep the subsequent probability distribution calculation running, when no path position passes the check, the path positions in the successor path position set whose path source field matches that of the terminal subsequence are read, and their index sets are merged into the generative model vocabulary index set of the current decoding position. This supplementary processing still takes the successor path position set as its data source, so the token indices still derive from the evidence path set. Through the above processing, the range of token indices available at the current decoding position is constrained both by the adjacency relationships of the evidence path and by the continuity of the word segmenter boundary.
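The boundary concatenation check and the merge with fallback can be sketched as below. The vocabulary `VOCAB` and the longest-match `greedy_tokenize` are hypothetical stand-ins for the model's word segmenter, and the fallback assumes every candidate already shares the path source of the terminal subsequence.

```python
# Hypothetical toy vocabulary; index 0 is reserved for the end token.
VOCAB = ["<eos>", "ver", "tex", "vertex", "form", "is", " "]
TOKEN_ID = {t: i for i, t in enumerate(VOCAB)}

def greedy_tokenize(text: str) -> list[int]:
    """Longest-match segmentation over VOCAB (stand-in for the real word segmenter)."""
    ids, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB[1:] if text.startswith(t, i)), key=len, default=None)
        if match is None:
            raise ValueError(f"cannot segment text at offset {i}")
        ids.append(TOKEN_ID[match])
        i += len(match)
    return ids

def passes_boundary_check(last_id: int, first_id: int) -> bool:
    """Re-segment 'last token + first token' and compare the final two indices."""
    joined = VOCAB[last_id] + VOCAB[first_id]
    return greedy_tokenize(joined)[-2:] == [last_id, first_id]

def allowed_ids(last_id: int, candidates: dict[int, list[int]]) -> set[int]:
    """candidates maps successor path position numbers to their token index lists."""
    passed = [ids for ids in candidates.values() if passes_boundary_check(last_id, ids[0])]
    pool = passed if passed else list(candidates.values())  # fallback: all candidates
    return set().union(*pool)

# "ver" + "tex" re-segments to the single token "vertex", so that position fails.
print(allowed_ids(TOKEN_ID["ver"], {1: [TOKEN_ID["tex"]], 2: [TOKEN_ID["is"]]}))
```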
[0092] Specifically, applying the generative model vocabulary index set of the current decoding position to the candidate token probability distribution to form the evidence-constrained candidate token probability distribution, and outputting the answer token and the path position corresponding to the answer token, includes:
[0093] The inputs to this process are the generative model vocabulary index set at the current decoding position, the generative model vocabulary, and the candidate token probability distribution. The outputs are the evidence-constrained candidate token probability distribution, the answer token, and the path position corresponding to the answer token. This process narrows the candidate token probability distribution of the generative model from the full vocabulary to a distribution constrained by the evidence path.
[0094] Over the generative model vocabulary, an index mask is generated from the generative model vocabulary index set of the current decoding position, such that the mask value of each token index inside the set is 1 and the mask value of every remaining token index in the vocabulary is 0.
[0095] An index mask with the same dimension as the generative model vocabulary is established, each element of which corresponds to one token index of the vocabulary. The token indices of the vocabulary are traversed, and each is tested for membership in the generative model vocabulary index set of the current decoding position. Token indices that belong to the set receive a mask value of 1; the remaining token indices receive a mask value of 0.
[0096] The index mask turns the token indices corresponding to the current evidence path position into a constraint structure that can be applied to a probability distribution. Because the index mask has the same vocabulary dimension as the candidate token probability distribution, it can be applied to that distribution element by element. This allows the graph evidence corresponding to the current decoding position to constrain candidate token selection at the probability level.
[0097] The index mask is applied to the candidate token probability distribution, by element-wise multiplication followed by normalization, to form the evidence-constrained candidate token probability distribution.
[0098] The index mask and the candidate token probability distribution are multiplied element by element at identical token index positions to form the constrained unnormalized probability values. These values are then normalized to form the evidence-constrained candidate token probability distribution. The calculation formula is as follows:
[0099] $$\tilde{P}_t(i) = \frac{M_t(i)\,P_t(i)}{\sum_{j \in V} M_t(j)\,P_t(j)};$$
[0100] In the formula, $i$ is a token index in the generative model vocabulary $V$; $\tilde{P}_t(i)$ is the probability of token index $i$ in the evidence-constrained candidate token probability distribution; $P_t(i)$ is the probability of token index $i$ in the candidate token probability distribution; $M_t(i)$ is the mask value corresponding to token index $i$ in the index mask; $j$ is any token index in the generative model vocabulary.
[0101] Because the candidate token probability distribution is obtained from the normalization function, every token index in the generative model vocabulary has a finite, non-zero probability value. In addition, the generative model vocabulary index set of the current decoding position is formed either by merging path position index sets from the successor path position set or from the end token index, so the index mask contains at least one token index whose mask value is 1. The denominator of the formula is therefore non-zero, and the evidence-constrained candidate token probability distribution can be normalized.
[0102] After normalization, the evidence-constrained candidate token probability distribution is still a probability distribution, and its probability mass is concentrated on the tokens in the generative model vocabulary index set of the current decoding position. This turns the current position on the evidence path into a constraint on the generative model's decoding probability distribution, ensuring that answer tokens are selected within the token index range allowed by the evidence path.
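The masking and renormalization can be checked on a small example. The sketch below assumes a 0/1 mask applied by element-wise multiplication; the function name `apply_index_mask` and the numbers are illustrative.

```python
def apply_index_mask(probs: list[float], allowed: set[int]) -> list[float]:
    """Zero out tokens outside the allowed index set, then renormalize."""
    mask = [1.0 if i in allowed else 0.0 for i in range(len(probs))]
    weighted = [m * p for m, p in zip(mask, probs)]
    total = sum(weighted)
    if total == 0.0:
        # Happens only if the allowed set is empty or all allowed tokens have zero probability.
        raise ValueError("no probability mass inside the allowed index set")
    return [w / total for w in weighted]

# Vocabulary of three tokens; only indices 1 and 2 lie on the evidence path.
constrained = apply_index_mask([0.5, 0.3, 0.2], {1, 2})
print(constrained)
```

The excluded token gets probability 0, and the remaining mass is rescaled in proportion (0.3 and 0.2 become 0.6 and 0.4).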
[0103] According to the decoding strategy of the generative model, a token index is selected from the evidence-constrained candidate token probability distribution, the current decoding position is written into the path position record corresponding to that token index, and the answer token and its corresponding path position are output.
[0104] The decoding strategy of the generative model is a highest-probability-first selection strategy. The evidence-constrained candidate token probability distribution is read, and the token index with the highest probability value is taken as the answer token index of the current decoding position. The token text corresponding to this index is then read from the generative model vocabulary to form the answer token.
[0105] When the selected token index is the end token index, it indicates that answer token output is complete, and the end token index is not written into the answer text. For any other token index, the current decoding position is written into the path position record corresponding to that token index. The path position record includes the answer token output order field, the answer token index field, the path position field, and the path source field. The answer token output order field records the output order of the current token. The answer token index field records the token index of the current output. The path position field records the path position corresponding to the current decoding position. The path source field records the evidence path to which the current decoding position belongs. These path position records form the generated path position sequence, which serves as the data source for the next decoding position.
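One decoding step under the highest-probability-first strategy might look like the sketch below. The record layout follows the four fields named above; the `owner` mapping from token index to `(path source, path position number)` and the function `decode_step` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PathPositionRecord:
    output_order: int   # answer token output order field
    token_id: int       # answer token index field
    position_no: int    # path position field
    source: str         # path source field

def decode_step(
    constrained: list[float],
    owner: dict[int, tuple[str, int]],
    step: int,
    eos_id: int,
    records: list[PathPositionRecord],
) -> Optional[int]:
    """Pick the most probable token; record its path position unless it is the end token."""
    token_id = max(range(len(constrained)), key=constrained.__getitem__)
    if token_id == eos_id:
        return None                       # decoding stops; nothing is written
    source, position_no = owner[token_id]
    records.append(PathPositionRecord(step, token_id, position_no, source))
    return token_id

records: list[PathPositionRecord] = []
chosen = decode_step([0.0, 0.6, 0.4], {1: ("p1", 2), 2: ("p1", 2)}, 1, 0, records)
stopped = decode_step([1.0, 0.0, 0.0], {}, 2, 0, records)
print(chosen, stopped, records)
```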
[0106] The above processing outputs the answer tokens and their corresponding path positions. The answer tokens compose the answer text, and the path positions are used for subsequent evidence segmentation. For every answer token output by the generative model, a corresponding evidence path position record is created at the same time, providing the data for the subsequent segment-by-segment correspondence between the answer text and the evidence path set.
[0107] S104: According to the path positions corresponding to the answer tokens, the answer text synthesized in output order is segmented into evidence segments to form the segment-by-segment correspondence between the answer text and the evidence path set, and the smart education question-and-answer result is output.
[0108] After the generative model outputs the answer tokens, it reads all answer tokens and their corresponding path positions. The input source for the answer tokens is the sequence of answer tokens output from the aforementioned decoding process. The input source for the path positions corresponding to the answer tokens is the path position record generated by the aforementioned decoding process. The answer text is synthesized according to the output order of the answer tokens, and then segmented into text fragments corresponding to the evidence path positions based on the path positions of the answer tokens. The output of this process is the smart education question-and-answer result.
[0109] Specifically, the output of the smart education question-and-answer results includes:
[0110] The inputs to this process are the answer tokens, the path positions corresponding to the answer tokens, and the evidence path set. The output is a smart education question-and-answer result containing the answer text and the path position text fragments. The path position text fragments represent the evidence path position corresponding to each evidence fragment in the answer text.
[0111] The answer text is synthesized according to the output order of the answer tokens and then segmented, according to the path positions corresponding to the answer tokens, into the path position text fragment set.
[0112] All answer tokens are arranged by their output order field, and the word segmenter's reverse restoration rules are used to restore them to the answer text. During restoration, the output order field and path position field of each answer token are retained. The answer text is then segmented by the path positions of the answer tokens. When consecutive answer tokens have the same path source field and the same path position field, they are merged into one text fragment for that path position. When adjacent answer tokens correspond to different path positions, a text fragment boundary is formed between them.
[0113] The resulting path position text fragments are written into the path position text fragment set. Each path position text fragment includes a text fragment field, a path source field, a path position field, a fragment start output order field, and a fragment end output order field. The text fragment field records the fragment content within the answer text. The path source and path position fields record the evidence path position corresponding to the text fragment. The fragment start output order and fragment end output order fields record the range of the text fragment within the answer token sequence.
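The segmentation rule (merge consecutive tokens with the same path source and path position, cut where they differ) maps directly onto `itertools.groupby`. In this sketch the token tuples and dictionary keys are hypothetical, and token texts are assumed to carry their own leading spaces so that simple concatenation restores the answer text.

```python
from itertools import groupby

# Each answer token: (output order, token text, path source, path position number).
AnswerToken = tuple[int, str, str, int]

def segment_answer(tokens: list[AnswerToken]) -> list[dict]:
    """Merge consecutive tokens sharing (source, position) into one text fragment."""
    fragments = []
    ordered = sorted(tokens, key=lambda t: t[0])
    for (source, position_no), group in groupby(ordered, key=lambda t: (t[2], t[3])):
        items = list(group)
        fragments.append({
            "text": "".join(t[1] for t in items),
            "source": source,
            "position_no": position_no,
            "start_order": items[0][0],
            "end_order": items[-1][0],
        })
    return fragments

tokens = [(1, "vertex", "p1", 1), (2, " form", "p1", 1), (3, " is", "p1", 2)]
fragments = segment_answer(tokens)
print(fragments)
```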
[0114] For example, the original question is "What is the vertex form of a quadratic function?" and the answer text is "The vertex form of a quadratic function is...". The fragment "The vertex form of a quadratic function" corresponds to the expression path position in the evidence path, and the formula that follows corresponds to the formula path position. The answer text is segmented into text fragments by these path positions, and the fragments are written into the path position text fragment set.
[0115] Each path position text fragment in the path position text fragment set is written into the corresponding path position output field, and the answer text and the path position output fields are encapsulated into the smart education question-and-answer result.
[0116] Read each path position text fragment in the path position text fragment set and, based on its path source field and path position field, locate the corresponding path position in the evidence path set. Write the text fragment field into the output field of that path position. After writing is complete, each path position in the evidence path set is associated with its answer text fragment.
[0117] The answer text and the corresponding path location output fields are encapsulated into a smart education question-and-answer result. The smart education question-and-answer result includes an answer text field, an evidence path field, and a path location text fragment field. The answer text field presents the natural language answer output by the generative model. The evidence path field records the source of the evidence path corresponding to the answer text. The path location text fragment field records the correspondence between each text fragment in the answer text and the evidence path location.
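A possible shape for the encapsulated result is sketched below. The dictionary keys, the nested layout of `evidence_paths`, and the function `build_qa_result` are hypothetical; the source names the three fields but does not fix a concrete format.

```python
import copy

def build_qa_result(answer_text: str, evidence_paths: dict, fragments: list[dict]) -> dict:
    """Write each fragment to its path position, then wrap everything into one result."""
    paths = copy.deepcopy(evidence_paths)          # leave the caller's structure untouched
    for frag in fragments:
        slot = paths[frag["source"]][frag["position_no"]]
        slot.setdefault("answer_fragments", []).append(frag["text"])
    return {
        "answer_text": answer_text,                # answer text field
        "evidence_paths": paths,                   # evidence path field
        "path_position_fragments": fragments,      # path position text fragment field
    }

# Assumed layout: {path source: {path position number: {"graph_text": ...}}}
evidence_paths = {"p1": {1: {"graph_text": "vertex form"}, 2: {"graph_text": "y=a(x-h)^2+k"}}}
fragments = [
    {"text": "The vertex form is ", "source": "p1", "position_no": 1},
    {"text": "y=a(x-h)^2+k", "source": "p1", "position_no": 2},
]
result = build_qa_result("The vertex form is y=a(x-h)^2+k", evidence_paths, fragments)
print(result["evidence_paths"]["p1"][2])
```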
[0118] This processing enables the smart education question-and-answer result to contain both the natural language answer and the path position text fragments, preserving the segment-by-segment correspondence between the answer text and the evidence path set. This correspondence provides a structured basis for tracing answers, presenting evidence, and verifying results.
[0119] Example 2
[0120] As shown in Figure 2, this embodiment discloses a smart education question-answering system based on knowledge graphs and AI; the parts not detailed in this embodiment are as described in Example 1. The system includes:
[0121] The evidence tracing module 201, used to semantically encode the original question, identify the educational entity corresponding to the original question, determine the answer type based on the question's indicative meaning, and retrieve connected paths in the educational knowledge graph based on the educational entity and answer type to form the evidence path set;
[0122] The mapping module 202, used to map the graph text of each path position in the evidence path set to a generative model vocabulary index set and to form the generative model vocabulary index sequence according to the succession order of the path positions;
[0123] The resolution module 203, used, during word-by-word decoding in the generative model, to determine the generative model vocabulary index set of the current decoding position from the adjacent succession state, within the generative model vocabulary index sequence, of the path positions corresponding to the generated answer fragment, to apply this index set to the candidate token probability distribution to form the evidence-constrained candidate token probability distribution, and to output the answer token and the path position corresponding to the answer token;
[0124] The question-answering module 204, used to segment the answer text synthesized from the answer tokens in output order into evidence segments according to the path positions corresponding to the answer tokens, to form the segment-by-segment correspondence between the answer text and the evidence path set, and to output the smart education question-and-answer result.
[0125] The foregoing has only described certain exemplary embodiments of the present invention by way of illustration. Undoubtedly, those skilled in the art can modify the described embodiments in various ways without departing from the spirit and scope of the present invention. Therefore, the foregoing drawings and descriptions are illustrative in nature and should not be construed as limiting the scope of protection of the claims of the present invention.
Claims
1. A smart education question and answer method based on a knowledge graph and AI, characterized in that, include: The original question is semantically encoded to identify the corresponding educational entity. The answer type is determined based on the question's intent. Then, connectivity paths are retrieved in the educational knowledge graph based on the educational entity and answer type to form a set of evidence paths. ; Set of evidence paths The graph text mapping of each path position is converted into a generative model vocabulary index set, which forms a generative model vocabulary index sequence according to the order of the path positions. ; During word-by-word decoding in the generative model, the path position corresponding to the generated answer fragment is used in the generative model's vocabulary index sequence. The adjacent succession states in the generative model vocabulary index set determine the current decoding position. ,Will Acting on the probability distribution of candidate lexical units Forming an evidence-constrained probability distribution of candidate lexical units It outputs the answer term and the corresponding path position of the answer term; Based on the path positions corresponding to the answer words, the answer text synthesized in the output order is segmented into evidence segments, forming an answer text and an evidence path set. The results of each segment correspondence are used to output the smart education Q&A results.
2. The method according to claim 1, characterized in that, The process involves retrieving connectivity paths in the educational knowledge graph based on educational entities and answer types, forming a set of evidence paths. ,include: Link educational entities to entity nodes in the educational knowledge graph to form query entity node records. ; Map answer types to type nodes in the educational knowledge graph, based on the query entity node records. Read the connected paths of endpoint type matching type nodes in the educational knowledge graph to form a set of candidate evidence paths. ; Calculate the semantic encoding of the original question and the set of candidate evidence paths. The cosine values of the angles between the text vectors of the candidate evidence paths are used to form a set of evidence paths in the order of their cosine values. .
3. The method according to claim 2, characterized in that, The query entity node record It includes an entity node identifier field, an entity standard name field, and an entity alias field. The type node includes a type node identifier field and a type name field. The entity node identifier field points to an entity node in the educational knowledge graph, and the type node identifier field points to a type node in the educational knowledge graph.
4. The method according to claim 1, characterized in that, The generative model vocabulary index sequence is formed according to the order of succession based on path position. ,include: Based on the evidence path set The edge directions of the connected path are marked with path positions to form a path position sequence. ; path location sequence The graph text of each path location is input into the generative model word segmenter to form a set of path location indexes corresponding to each path location; According to the path location sequence The path position index set corresponding to each path position is arranged in the order of succession to form the generative model vocabulary index sequence. .
5. The method according to claim 4, characterized in that, The path location sequence Each sequence item includes a path location number field, a predecessor path location number field, a successor path location number field, and a path source field. The path location index set includes a path location number field and a word index field, and the path location number field and the word index field correspond one-to-one.
6. The method according to claim 1, characterized in that, The basis is the path position corresponding to the generated answer fragment in the generative model vocabulary index sequence. The adjacent succession states in the generative model vocabulary index set determine the current decoding position. ,include: Read the path positions corresponding to each answer word in the generated answer fragment, and form a sequence of generated path positions according to the output order of the answer words. ; The generated path location sequence Terminal subsequences and generative model vocabulary index sequences The path position subsequences with the same path origin are suffix aligned to form a set of successor path positions. ; Read the set of receiving path locations The first and last word indexes of each path position are compared with the last word indexes of the generated answer fragments using a word segmentation boundary concatenation check. The generative model word index sets corresponding to the path positions that pass the boundary concatenation check are merged to determine the generative model word index set for the current decoding position. .
7. The method according to claim 6, characterized in that each sequence item of the generated path position sequence includes an answer word output position field, an answer word index field, a path position field, and a path source field; the terminal subsequence consists of the sequence items that are output consecutively under the same path source field.
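The suffix alignment of claims 6 and 7 can be sketched as follows. This is a simplified reading, not the claimed procedure in full. The function name `allowed_word_indexes` and its argument layout are hypothetical. The word segmentation boundary concatenation check is reduced here to the assumption that a candidate word index is admissible only when the evidence entries immediately before it match the terminal subsequence exactly. When no answer word has been output yet, the sketch also assumes that the first word index of each path source is allowed.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, int]  # (path position number, word index)


def allowed_word_indexes(
    index_sequence: List[Entry],      # evidence entries in succession order
    position_source: Dict[int, str],  # path position number -> path source
    generated: List[Entry],           # one entry per answer word already output
) -> Set[int]:
    if not generated:
        # Assumption: decoding may start at the first entry of any path source.
        seen: Set[str] = set()
        allowed: Set[int] = set()
        for position, word_index in index_sequence:
            source = position_source[position]
            if source not in seen:
                seen.add(source)
                allowed.add(word_index)
        return allowed

    # Terminal subsequence: trailing items output consecutively under one path source.
    last_source = position_source[generated[-1][0]]
    terminal: List[Entry] = []
    for item in reversed(generated):
        if position_source[item[0]] != last_source:
            break
        terminal.append(item)
    terminal.reverse()

    # Evidence entries of the same path source, kept in succession order.
    evidence = [e for e in index_sequence if position_source[e[0]] == last_source]

    allowed = set()
    k = len(terminal)
    for end in range(k, len(evidence)):
        # Suffix alignment: the k evidence entries before `end` must equal the terminal subsequence.
        if evidence[end - k:end] == terminal:
            allowed.add(evidence[end][1])
    return allowed


seq = [(1, 0), (2, 1), (3, 2), (3, 3), (3, 4), (4, 5), (5, 6)]
src = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
next_after_two = allowed_word_indexes(seq, src, [(1, 0), (2, 1)])
```

An empty result means the aligned path has no successor entry, which a caller could treat as the signal to end the answer or to switch to another path source.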
8. The method according to claim 1, characterized in that applying the generative model vocabulary index set of the current decoding position to the probability distribution of candidate lexical units to form an evidence-constrained probability distribution of candidate lexical units, and outputting the answer word and the path position corresponding to the answer word, includes: generating an index mask over the generative model vocabulary based on the generative model vocabulary index set of the current decoding position, so that the word indexes inside the set and the remaining word indexes in the vocabulary receive different mask values; applying the index mask to the probability distribution of candidate lexical units to form the evidence-constrained probability distribution of candidate lexical units; and selecting a word index from the evidence-constrained probability distribution of candidate lexical units according to the decoding strategy of the generative model, writing the current decoding position into the path position record corresponding to the selected word index, and outputting the answer word and the path position corresponding to the answer word.
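The masking step of claim 8 can be sketched in plain Python. The mask values and the combining formula are not legible in this text, so the sketch assumes a multiplicative mask with value 1 for allowed word indexes and 0 for all others, followed by renormalization, and it assumes greedy selection as the decoding strategy. The names `constrain_distribution` and `decode_step` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def constrain_distribution(probs: List[float], allowed: Set[int]) -> List[float]:
    """Zero out disallowed word indexes and renormalize (assumed mask values 1 and 0)."""
    mask = [1.0 if i in allowed else 0.0 for i in range(len(probs))]
    masked = [p * m for p, m in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no allowed word index has non-zero probability")
    return [p / total for p in masked]


def decode_step(
    probs: List[float],
    allowed_by_position: Dict[int, Set[int]],  # path position -> allowed word indexes
) -> Tuple[int, int]:
    """Return (selected word index, path position recorded for it)."""
    allowed: Set[int] = set().union(*allowed_by_position.values())
    constrained = constrain_distribution(probs, allowed)
    # Greedy decoding is an assumption; sampling would work on the same distribution.
    word_index = max(range(len(constrained)), key=constrained.__getitem__)
    # Record the lowest-numbered path position that contributed the selected word index.
    position = next(
        p for p, indexes in sorted(allowed_by_position.items()) if word_index in indexes
    )
    return word_index, position


probs = [0.5, 0.2, 0.2, 0.1]
constrained = constrain_distribution(probs, {1, 3})
step = decode_step(probs, {3: {1}, 4: {3}})
```

In this toy case word index 0 has the highest unconstrained probability, but it lies outside the evidence set, so the step selects word index 1 and records path position 3 for it.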
9. The method according to claim 1, characterized in that outputting the smart education question-answering result includes: synthesizing the answer text according to the output order of the answer words, and segmenting the answer text into a set of path position text fragments based on the path positions corresponding to the answer words; and writing each path position text fragment in the set into the corresponding path position output field, and encapsulating the answer text and the corresponding path position output fields into the smart education question-answering result.
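The segmentation of claim 9 can be illustrated with a short Python sketch. The function name `segment_answer` and the result keys `answer_text`, `fragments`, `path_position`, and `text` are hypothetical. Joining answer words with a single space is also an assumption, since a real word segmenter would apply its own detokenization rules.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def segment_answer(emitted: List[Tuple[str, int]]) -> Dict[str, object]:
    """Build the answer text and split it into fragments by path position.

    `emitted` holds (answer word, path position) pairs in output order.
    """
    answer_text = " ".join(word for word, _ in emitted)
    # Consecutive answer words that share a path position form one text fragment.
    fragments = [
        {"path_position": position, "text": " ".join(word for word, _ in group)}
        for position, group in groupby(emitted, key=lambda item: item[1])
    ]
    return {"answer_text": answer_text, "fragments": fragments}


emitted = [
    ("photosynthesis", 1),
    ("produces", 2),
    ("glucose", 3),
    ("and", 3),
    ("oxygen", 3),
]
result = segment_answer(emitted)
```

Each fragment carries its path position, so a reader of the result can trace every part of the answer back to the evidence path position that supplied it.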
10. A smart education question-answering system based on knowledge graphs and AI, implementing the method of any one of claims 1-9, characterized in that it includes: an evidence tracing module, used to semantically encode the original question, identify the educational entity corresponding to the original question, determine the answer type based on the question's indicative meaning, and retrieve connected paths in the educational knowledge graph based on the educational entity and the answer type to form an evidence path set; a mapping module, used to map the graph text of each path position in the evidence path set into a generative model vocabulary index set, and to form a generative model vocabulary index sequence according to the succession order of the path positions; a reduction module, used, during word-by-word decoding, to determine the generative model vocabulary index set of the current decoding position based on the adjacent succession state of the path positions corresponding to the generated answer fragment in the generative model vocabulary index sequence, to apply that index set to the probability distribution of candidate lexical units to form an evidence-constrained probability distribution of candidate lexical units, and to output the answer word and the path position corresponding to the answer word; and a question-answering module, used to segment the answer text based on the path positions corresponding to the answer words in output order, to form a segment-by-segment correspondence between the answer text and the evidence path set, and to output the smart education question-answering result.