Intelligent question answering method and device, equipment and storage medium

By using entity position weight parameters for named entity recognition, followed by attribute extraction, an intelligent question-answering system obtains answers from the target domain knowledge graph. This addresses the low efficiency and accuracy of knowledge retrieval in existing technologies and enables efficient and accurate knowledge acquisition.

CN116719915B · Active · Publication Date: 2026-06-16 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date: 2023-05-15
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

In existing technologies, knowledge retrieval methods that rely primarily on human memory and search engines suffer from incomplete knowledge and inconvenient access, resulting in low efficiency and accuracy in knowledge retrieval.

Method used

A target question related to the target domain is acquired, and named entity recognition is performed on it using entity position weight parameters to obtain the entities in the target question. Triple sets for those entities are obtained from the target domain knowledge graph, each attribute value is concatenated with the target question to form an input sequence, and attribute extraction is performed on the input sequences to determine the relevant answer.

Benefits of technology

It improves the accuracy and efficiency of the intelligent question-answering system, and enhances the accuracy of entity recognition and user experience.

Generated by Eureka AI based on patent content.

Abstract

The present disclosure provides an intelligent question-answering method, apparatus, device, and storage medium, which can be applied to the fields of knowledge graph technology, artificial intelligence technology, and financial technology. The method comprises: obtaining a target question related to a target domain; performing a named entity recognition operation on the target question based on entity position weight parameters to obtain at least one entity in the target question; obtaining, according to the at least one entity, a set of triples corresponding to each entity from a target domain knowledge graph; for the attribute values in each set of triples, concatenating the target question with the attribute value to obtain an input sequence corresponding to the attribute value; performing an attribute extraction operation on the input sequence to obtain an attribute result; and determining, based on multiple attribute results, the relevant answer to the target question from the target domain knowledge graph.

Description

Technical Field

[0001] This disclosure relates to the fields of knowledge graph technology, artificial intelligence technology and financial technology, and in particular to an intelligent question answering method, apparatus, device, medium and program product.

Background Technology

[0002] With the explosive growth of information, relying solely on search engines for results is no longer sufficient to meet users' needs for accurate answers. Meanwhile, staff need to efficiently utilize knowledge from relevant fields to support their information searches. However, current knowledge retrieval methods, primarily based on human memory and search engines, suffer from incomplete knowledge and inconvenient access, leading to low efficiency and accuracy in knowledge retrieval.

[0003] Therefore, how to improve the accuracy and efficiency of intelligent question-answering systems that support natural language questioning is a technical problem that needs to be solved in related technologies.

Summary of the Invention

[0004] In view of the above problems, this disclosure provides an intelligent question-answering method, apparatus, device, medium and program product.

[0005] According to the first aspect of this disclosure, an intelligent question-answering method is provided, comprising:

[0006] Obtaining a target question related to a target domain;

[0007] Performing, based on entity position weight parameters, a named entity recognition operation on the target question to obtain at least one entity in the target question;

[0008] Obtaining, based on the at least one entity, a set of triples corresponding to each entity from a target domain knowledge graph;

[0009] For the attribute values in each set of triples, concatenating the target question with the attribute value to obtain an input sequence corresponding to the attribute value;

[0010] Performing an attribute extraction operation on the input sequence to obtain an attribute result;

[0011] Determining, based on multiple attribute results, the relevant answer to the target question from the target domain knowledge graph.

[0012] According to embodiments of this disclosure, obtaining the target question related to the target domain includes:

[0013] Performing word segmentation and part-of-speech tagging on a user question to obtain the nouns and verbs in the user question;

[0014] Matching the nouns and verbs in the user question against a target entity dictionary to obtain a matching result, where the target entity dictionary includes all entities in the target domain knowledge graph;

[0015] If the matching result indicates that the nouns and verbs in the user question match the target entity dictionary, classifying the user question as a target question related to the target domain.

[0016] According to embodiments of this disclosure, performing the named entity recognition operation on the target question based on the entity position weight parameters to obtain at least one entity in the target question includes:

[0017] Embedding the target question to obtain an initial input vector;

[0018] Processing the initial input vector using the entity position weight parameters to obtain a final input vector;

[0019] Performing feature extraction on the final input vector to obtain a first vector sequence;

[0020] Performing sequence feature extraction on the first vector sequence to obtain a second vector sequence;

[0021] Performing entity recognition on the second vector sequence to obtain an entity annotation sequence;

[0022] Identifying, based on the entity annotation sequence, the at least one entity in the target question.
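
The last step, reading entities out of an entity annotation sequence, can be illustrated with a short sketch. The disclosure does not state which tagging scheme the annotation sequence uses; the BIO scheme, the tag names, and the function name `decode_entities` below are assumptions made only for illustration.

```python
from typing import List, Tuple


def decode_entities(
    tokens: List[str], tags: List[str], sep: str = " "
) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from a BIO-tagged sequence.

    Assumption: tags look like "B-TYPE", "I-TYPE" or "O". Use sep="" for
    character-level tokens (for example, Chinese text).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = ""

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current, current_type
        if current:
            entities.append((sep.join(current), current_type))
        current, current_type = [], ""

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the current entity.
            flush()
    flush()
    return entities
```

For example, the tokens of "How to open a bank account" tagged with a hypothetical `PRODUCT` type on the last two tokens would yield the single entity "bank account".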

[0023] According to embodiments of this disclosure, the entity position weight parameters include an entity front segment position weight parameter, an entity middle segment position weight parameter, and an entity rear segment position weight parameter. The entity position weight parameters are calculated as follows:

[0024] Each question in the dataset related to the target domain is divided into three segments: a front segment, a middle segment, and a rear segment.

[0025] The entity front segment position weight parameter is obtained from the sum of the number of entities in the front segment of each question in the dataset and the sum of the number of entities in each question in the dataset.

[0026] The entity middle segment position weight parameter is obtained from the sum of the number of entities in the middle segment of each question in the dataset and the sum of the number of entities in each question in the dataset.

[0027] The entity rear segment position weight parameter is obtained from the sum of the number of entities in the rear segment of each question in the dataset and the sum of the number of entities in each question in the dataset.
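
A minimal sketch of this calculation follows. The text names the two sums involved but does not spell out how they are combined; the sketch assumes each parameter is the ratio of the per-segment entity count to the total entity count. It also assumes that each question is split into equal thirds by position and that an entity belongs to the segment containing its start position. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_index(position: int, length: int) -> int:
    """Return 0, 1 or 2 for the front, middle or rear third of a question."""
    return min(3 * position // length, 2)


def entity_position_weights(
    dataset: Sequence[Tuple[int, Sequence[int]]],
) -> Tuple[float, float, float]:
    """Compute (front, middle, rear) weights.

    Each dataset item is (question_length, entity_start_positions).
    Assumption: weight = entities in the segment / entities overall.
    """
    counts: List[int] = [0, 0, 0]
    for length, starts in dataset:
        for start in starts:
            counts[segment_index(start, length)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("dataset contains no entities")
    return counts[0] / total, counts[1] / total, counts[2] / total
```

Under these assumptions the three parameters sum to 1, so they can be read as the empirical probability that an entity starts in each segment.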

[0028] According to embodiments of this disclosure, processing the initial input vector using the entity position weight parameters to obtain the final input vector includes:

[0029] Based on a first probability range determined by the entity front segment position weight parameter, randomly generating, within the first probability range, a probability value for each position in the front segment of the target question, where the front segment of the target question is determined according to the entity front segment position weight parameter;

[0030] Based on a second probability range determined by the entity middle segment position weight parameter, randomly generating, within the second probability range, a probability value for each position in the middle segment of the target question, where the middle segment of the target question is determined according to the entity middle segment position weight parameter;

[0031] Based on a third probability range determined by the entity rear segment position weight parameter, randomly generating, within the third probability range, a probability value for each position in the rear segment of the target question, where the rear segment of the target question is determined according to the entity rear segment position weight parameter;

[0032] Obtaining a one-dimensional weight vector corresponding to the target question from the probability values for each position in the front, middle, and rear segments of the target question;

[0033] Obtaining the final input vector from the one-dimensional weight vector and the initial input vector.
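
The sketch below illustrates one way these steps could fit together. Several details are not specified in the text and are assumptions here: the probability range is taken as a band of fixed width around each weight parameter (the hypothetical `spread` argument), the question is split into equal thirds, and the weight vector is combined with the initial input vector by scaling each position's embedding. All function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def probability_range(weight: float, spread: float = 0.1) -> Tuple[float, float]:
    """Hypothetical rule: a band of +/- spread around the weight, clipped to [0, 1]."""
    return max(0.0, weight - spread), min(1.0, weight + spread)


def build_weight_vector(
    length: int,
    weights: Sequence[float],
    rng: random.Random,
    spread: float = 0.1,
) -> List[float]:
    """Draw one value per position from the range of that position's segment."""
    ranges = [probability_range(w, spread) for w in weights]
    vector: List[float] = []
    for position in range(length):
        # Equal thirds: 0 = front, 1 = middle, 2 = rear (an assumption).
        low, high = ranges[min(3 * position // length, 2)]
        vector.append(rng.uniform(low, high))
    return vector


def apply_weights(
    embeddings: Sequence[Sequence[float]], weight_vector: Sequence[float]
) -> List[List[float]]:
    """Scale each position's embedding by its weight (assumed combination rule)."""
    return [[w * x for x in row] for row, w in zip(embeddings, weight_vector)]
```

Passing an explicit `random.Random` instance keeps the random draws reproducible, which is convenient when comparing training runs.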

[0034] According to embodiments of this disclosure, performing the attribute extraction operation on the input sequence to obtain the attribute result includes:

[0035] Performing feature extraction on the input sequence to obtain output vectors;

[0036] Selecting a target vector from the output vectors as the feature vector of the input sequence;

[0037] Performing a binary classification operation on the feature vector to obtain the attribute result.
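
The selection and classification steps can be sketched as follows. The feature extractor itself is omitted; the sketch starts from its per-token output vectors. Taking the vector at index 0 (the `[CLS]` position) as the target vector and using a logistic classifier are assumptions for illustration, as are the function names.

```python
import math
from typing import Sequence


def select_target_vector(
    output_vectors: Sequence[Sequence[float]], index: int = 0
) -> Sequence[float]:
    """Pick one output vector as the feature of the whole input sequence.

    Assumption: index 0 corresponds to the [CLS] position.
    """
    return output_vectors[index]


def binary_classify(
    feature: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    threshold: float = 0.5,
) -> bool:
    """Logistic classifier: True means the question and attribute value match."""
    score = sum(w * x for w, x in zip(weights, feature)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= threshold
```

In a trained model the classifier weights would be learned jointly with the encoder; here they are plain arguments so the control flow stays visible.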

[0038] According to embodiments of this disclosure, determining the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results includes:

[0039] For each attribute result, if the attribute result indicates that the target question and the attribute value match, obtaining the triple corresponding to the attribute value from the corresponding set of triples;

[0040] Determining, based on the multiple triples thus obtained, the relevant answer to the target question from the target domain knowledge graph.
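
A minimal sketch of this selection step is given below. The `is_match` callable stands in for the attribute extraction model and is hypothetical, as are the type alias and function name; a triple is represented as `(entity, attribute, attribute value)`.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, attribute value)


def select_answer_triples(
    question: str,
    triple_sets: Dict[str, List[Triple]],
    is_match: Callable[[str, str], bool],
) -> List[Triple]:
    """Keep every triple whose attribute value is judged to match the question."""
    answers: List[Triple] = []
    for triples in triple_sets.values():
        for triple in triples:
            if is_match(question, triple[2]):
                answers.append(triple)
    return answers
```

How the selected triples are turned into the final answer text is left open in this sketch.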

[0041] A second aspect of this disclosure provides an intelligent question-answering device, comprising: a first acquisition module, an identification module, a second acquisition module, a splicing module, an obtaining module, and a determining module. The first acquisition module is used to acquire a target question related to a target domain. The identification module is used to perform named entity recognition on the target question based on entity position weight parameters to obtain at least one entity in the target question. The second acquisition module is used to obtain a set of triples corresponding to each entity from a target domain knowledge graph based on the at least one entity. The splicing module is used to splice the target question and the attribute values in each set of triples to obtain an input sequence corresponding to the attribute values. The obtaining module is used to perform attribute extraction on the input sequence to obtain attribute results. The determining module is used to determine the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results.

[0042] A third aspect of this disclosure provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors perform the methods described above.

[0043] A fourth aspect of this disclosure also provides a computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the methods described above.

[0044] The fifth aspect of this disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.

[0045] According to the intelligent question-answering method, apparatus, device, medium, and program product provided in this disclosure, named entity recognition (NER) can be performed on the acquired target question based on entity position weight parameters to obtain at least one entity in the target question. Based on the at least one entity, a set of triples corresponding to each entity can be obtained from the target domain knowledge graph. For the attribute values in each set of triples, the target question and the attribute value are concatenated to obtain an input sequence corresponding to the attribute value. Attribute extraction can then be performed on the input sequence to obtain attribute results. Finally, based on multiple attribute results, the relevant answer to the target question can be determined from the target domain knowledge graph, realizing intelligent question answering for questions in the target domain. By recognizing the entities in the target question, the relevant answer can be returned, which improves the efficiency of obtaining relevant answers and the user experience. Furthermore, introducing entity position weight parameters into the NER operation improves the accuracy with which entities in the target question are recognized.

Description of the Drawings

[0046] The foregoing contents, as well as other objects, features, and advantages of this disclosure, will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:

[0047] Figure 1 schematically shows an application scenario of the intelligent question-answering method according to an embodiment of the present disclosure;

[0048] Figure 2 schematically shows a flowchart of an intelligent question-answering method according to an embodiment of the present disclosure;

[0049] Figure 3 schematically shows a flowchart of a named entity recognition operation according to an embodiment of the present disclosure;

[0050] Figure 4 schematically shows a flowchart of obtaining the final input vector according to an embodiment of the present disclosure;

[0051] Figure 5 schematically shows a structural diagram of a named entity recognition model according to an embodiment of the present disclosure;

[0052] Figure 6 schematically shows a flowchart of obtaining attribute results according to an embodiment of the present disclosure;

[0053] Figure 7 schematically shows a structural diagram of an attribute extraction model according to an embodiment of the present disclosure;

[0054] Figure 8 schematically shows a diagram of an intelligent question-answering system according to an embodiment of the present disclosure;

[0055] Figure 9 schematically shows another diagram of an intelligent question-answering system according to an embodiment of the present disclosure;

[0056] Figure 10 schematically shows the structure of an intelligent question-answering device according to an embodiment of the present disclosure; and

[0057] Figure 11 schematically shows a block diagram of an electronic device suitable for implementing an intelligent question-answering method according to an embodiment of the present disclosure.

Detailed Description

[0058] The embodiments of the present disclosure will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the disclosure. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present disclosure for ease of explanation. However, it will be apparent that one or more embodiments may be practiced without these specific details. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts of the present disclosure.

[0059] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

[0060] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.

[0061] When using expressions such as "at least one of A, B, and C", they should generally be interpreted in accordance with the meaning that is commonly understood by a person skilled in the art (e.g., "a system having at least one of A, B, and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B, and C, etc.).

[0062] In the technical solutions disclosed herein, the collection, storage, use, processing, transmission, provision, disclosure, and application of data (including but not limited to user personal information) comply with the provisions of relevant laws and regulations, necessary confidentiality measures have been taken, and they do not violate public order and good morals.

[0063] In implementing this disclosure, it was discovered that, given the difficulty in meeting users' demands for precise answers, question-answering systems supporting natural language queries have become a key research focus. Question-answering systems possess the characteristics of accurately expressing users' knowledge needs and being user-friendly, thereby further improving the convenience of knowledge acquisition. However, knowledge retrieval methods primarily relying on human memory and search engines suffer from incomplete knowledge and inconvenient access, leading to low efficiency and accuracy in knowledge retrieval. Therefore, improving the accuracy and efficiency of intelligent question-answering systems supporting natural language queries is a technical problem that needs to be solved in related technologies.

[0064] To this end, embodiments of this disclosure provide an intelligent question-answering method, comprising: acquiring a target question related to a target domain; performing named entity recognition on the target question based on entity position weight parameters to obtain at least one entity in the target question; acquiring a set of triples corresponding to each entity from a target domain knowledge graph based on the at least one entity; for the attribute values in each set of triples, concatenating the target question with the attribute value to obtain an input sequence corresponding to the attribute value; performing attribute extraction on the input sequence to obtain attribute results; and determining the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results.

[0065] Figure 1 schematically shows an application scenario of the intelligent question-answering method according to an embodiment of the present disclosure.

[0066] As shown in Figure 1, application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

[0067] Users can interact with server 105 via network 104 using at least one of the first terminal device 101, second terminal device 102, and third terminal device 103 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, second terminal device 102, and third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social media platform software, etc. (for example only).

[0068] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be various electronic devices that have displays and support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers.

[0069] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.

[0070] For example, a target question related to the target domain can be obtained through server 105, and named entity recognition can be performed on the target question based on entity position weight parameters to obtain at least one entity in the target question. Based on the at least one entity, a set of triples corresponding to each entity can be obtained from the target domain knowledge graph. For the attribute values in each set of triples, the target question and the attribute value are concatenated to obtain the input sequence corresponding to the attribute value. Attribute extraction can then be performed on the input sequence to obtain attribute results. Finally, based on multiple attribute results, the relevant answer to the target question can be determined from the target domain knowledge graph.

[0071] It should be noted that the intelligent question-answering method provided in this embodiment can generally be executed by server 105. Correspondingly, the intelligent question-answering device provided in this embodiment can generally be located in server 105. The intelligent question-answering method provided in this embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105. Correspondingly, the intelligent question-answering device provided in this embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105.

[0072] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0073] Based on the scenario described in Figure 1, the intelligent question-answering method of the disclosed embodiments is described in detail below with reference to Figures 2 to 9.

[0074] Figure 2 schematically shows a flowchart of an intelligent question-answering method according to an embodiment of the present disclosure.

[0075] As shown in Figure 2, the method 200 includes operations S210 to S260.

[0076] In operation S210, a target question related to the target domain is obtained.

[0077] According to embodiments of this disclosure, user questions can be categorized based on the domain they pertain to. When the intelligent question-answering method of this application is applied to a target domain, it is necessary to obtain target questions related to that target domain. For example, when the intelligent question-answering method is applied to the banking sector, it is necessary to obtain user questions related to the banking sector; when the intelligent question-answering method is applied to the medical sector, it is necessary to obtain user questions related to the medical field.

[0078] In operation S220, based on the entity position weight parameters, a named entity recognition operation is performed on the target question to obtain at least one entity in the target question.

[0079] According to embodiments of this disclosure, the named entity recognition operation can be performed on the target question using an attention-based named entity recognition model, BERT-BiLSTM-CRF, to obtain at least one entity in the target question. Here BERT stands for Bidirectional Encoder Representations from Transformers, BiLSTM for Bidirectional Long Short-Term Memory, and CRF for Conditional Random Fields. To improve the accuracy of the named entity recognition model, the data input part of the BERT model is modified: a segment weight layer is added alongside the existing word vector layer, paragraph position encoding layer, and position encoding layer.

[0080] According to embodiments of this disclosure, the segment weight layer added to the BERT model is based on the entity position weight parameters. That is, by setting weight parameters that reflect the probability of an entity appearing at different positions in a sentence, the accuracy of the named entity recognition model is improved.

[0081] In operation S230, based on at least one entity, a set of triples corresponding to each entity is obtained from the target domain knowledge graph.

[0082] According to embodiments of this disclosure, the target domain knowledge graph may include triples corresponding to each entity in the target domain. Each entity obtained from the target question can be looked up in the target domain knowledge graph to obtain all triples containing that entity, i.e., the set of triples corresponding to that entity.

[0083] According to embodiments of this disclosure, for example, if the target question includes entity A and entity B, then entity A can be looked up in the target domain knowledge graph to obtain all triples containing entity A, i.e., the set of triples for entity A; and entity B can be looked up in the target domain knowledge graph to obtain all triples containing entity B, i.e., the set of triples for entity B.
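
The lookup can be sketched with an in-memory list standing in for the knowledge graph. This is only an illustration: the storage backend is not specified, triples are assumed to be `(entity, attribute, attribute value)` tuples indexed by their first element, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, attribute value)


def triple_sets_for(
    entities: Iterable[str], graph: Iterable[Triple]
) -> Dict[str, List[Triple]]:
    """Return, for each recognized entity, all triples whose entity matches it."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for triple in graph:
        index[triple[0]].append(triple)
    # Entities missing from the graph map to an empty set of triples.
    return {entity: list(index.get(entity, [])) for entity in entities}
```

With a graph holding two triples for entity A and one for entity B, looking up A, B and an unknown entity C returns sets of size two, one and zero respectively.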

[0084] In operation S240, for the attribute values in each set of triples, the target question and the attribute value are concatenated to obtain the input sequence corresponding to the attribute value.

[0085] According to embodiments of this disclosure, a triple may include an entity, an attribute, and an attribute value. Each set of triples may include at least one triple, and the entities in each triple within the same set are identical.

[0086] According to embodiments of this disclosure, for each attribute value in a set of triples, the target question can be concatenated with that attribute value to obtain an input sequence corresponding to that attribute value. The format of the input sequence can be represented as "[CLS]target question[SEP]attribute value[SEP]", where [CLS] (short for classification) serves as the start symbol and [SEP] (short for separator) serves as the separator between the two parts.

[0087] According to embodiments of this disclosure, for example, the target question may include entity A and entity B. The set of triples for entity A may include two triples, where one triple can be represented as entity A, attribute 1, and attribute 1 value, and the other triple can be represented as entity A, attribute 2, and attribute 2 value. The set of triples for entity B may include two triples, where one triple can be represented as entity B, attribute 3, and attribute 3 value, and the other triple can be represented as entity B, attribute 4, and attribute 4 value. The target question can be concatenated with attribute 1 value, attribute 2 value, attribute 3 value, and attribute 4 value respectively to obtain the input sequence corresponding to attribute 1 value, the input sequence corresponding to attribute 2 value, the input sequence corresponding to attribute 3 value, and the input sequence corresponding to attribute 4 value.
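
The concatenation in this example can be sketched as plain string building. The function names are hypothetical, and in practice a BERT tokenizer would typically insert the special tokens itself when given a text pair, so the explicit strings here only mirror the format described in the text.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, attribute value)


def build_input_sequence(question: str, attribute_value: str) -> str:
    """Format one pair as "[CLS]question[SEP]attribute value[SEP]"."""
    return f"[CLS]{question}[SEP]{attribute_value}[SEP]"


def build_input_sequences(
    question: str, triple_sets: Dict[str, List[Triple]]
) -> List[Tuple[Triple, str]]:
    """Pair every triple with the input sequence built from its attribute value."""
    return [
        (triple, build_input_sequence(question, triple[2]))
        for triples in triple_sets.values()
        for triple in triples
    ]
```

Keeping each sequence paired with its source triple makes it straightforward to recover the triple once the attribute result for that sequence is known.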

[0088] In operation S250, attribute extraction is performed on the input sequence to obtain the attribute results.

[0089] According to embodiments of this disclosure, attribute extraction operations can be performed on each input sequence to obtain the attribute result corresponding to the attribute value.

[0090] In operation S260, based on multiple attribute results, relevant answers to the target question are determined from the target domain knowledge graph.

[0091] According to embodiments of this disclosure, based on the obtained multiple attribute results, relevant answers to target questions can be determined from a target domain knowledge graph.

[0092] According to embodiments of this disclosure, named entity recognition (NER) can be performed on the acquired target question based on entity position weight parameters to identify at least one entity in the target question. Based on the at least one entity, a set of triples corresponding to each entity can be obtained from the target domain knowledge graph. For the attribute values in each set of triples, the target question and the attribute value are concatenated to obtain an input sequence corresponding to the attribute value. Attribute extraction can then be performed on the input sequence to obtain attribute results. Finally, based on multiple attribute results, the relevant answer to the target question can be determined from the target domain knowledge graph, enabling intelligent question answering for questions in the target domain. By identifying the entities in the target question, the relevant answer can be returned, which improves the efficiency of obtaining relevant answers and enhances the user experience. Furthermore, introducing entity position weight parameters into the NER operation improves the accuracy with which entities in the target question are identified.

[0093] According to embodiments of this disclosure, obtaining target questions related to a target domain includes: performing word segmentation and part-of-speech tagging on user questions to obtain nouns and verbs in user questions; matching the nouns and verbs in user questions with a target entity dictionary to obtain matching results, wherein the target entity dictionary includes all entities in the target domain knowledge graph; and classifying user questions into target questions related to the target domain when the matching results indicate that the nouns and verbs in user questions match the target entity dictionary.

[0094] According to embodiments of this disclosure, the jieba (“Jieba Chinese Segmentation”) tool can be used to perform word segmentation and part-of-speech tagging on user questions, and the nouns and verbs in the user questions are then filtered out and retained. For example, a user question might be “How to open a bank account”. Word segmentation yields “How / Open / Bank / Account”, and part-of-speech tagging yields “How pron”, “Open v”, “Bank n”, and “Account n”. Filtering retains the nouns and verbs, namely “Open”, “Bank”, and “Account”. The target entity dictionary can be constructed by deduplicating and saving the entities in the target domain knowledge graph, such as people, items, institutions, and system operations.

[0095] According to embodiments of this disclosure, matching the nouns and verbs in the user question with a target entity dictionary yields matching results. If the matching results indicate that the nouns and verbs in the user question match the target entity dictionary (i.e., the nouns and verbs in the user question can be found in the target entity dictionary), then the user question can be classified as a target question related to the target domain. If the matching results indicate that the nouns and verbs in the user question do not match the target entity dictionary (i.e., the nouns and verbs in the user question cannot be found in the target entity dictionary), then the user question is a question from another domain.

[0096] According to embodiments of this disclosure, for example, when the intelligent question-answering method is applied to the banking field, the nouns and verbs in the user's question are matched with a banking entity dictionary to obtain target questions related to the banking field, wherein the banking entity dictionary is obtained based on a banking field knowledge graph; when the intelligent question-answering method is applied to the medical field, the nouns and verbs in the user's question are matched with a medical entity dictionary to obtain target questions related to the medical field, wherein the medical entity dictionary is obtained based on a medical field knowledge graph.

[0097] According to embodiments of this disclosure, by performing word segmentation and part-of-speech tagging on user questions, nouns and verbs in the user questions can be obtained. The nouns and verbs in the user questions are then matched with a target entity dictionary to obtain matching results. Based on the matching results, user questions can be divided into target questions in the target domain and questions in other domains, thereby obtaining target questions related to the target domain.
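The classification step described in paragraphs [0093] to [0097] can be sketched as follows. This is a minimal illustration rather than the disclosed implementation. It assumes that the question has already been segmented and tagged by an upstream tool such as jieba, that noun tags start with "n" and verb tags start with "v", and that a single dictionary hit is enough to count as a match. The function name `is_target_question` and the sample dictionary are hypothetical.

```python
def is_target_question(tagged_tokens, entity_dictionary):
    """Return True if a noun or verb of the question is in the dictionary.

    tagged_tokens: list of (word, pos_tag) pairs from an upstream tagger.
    entity_dictionary: set of entity names taken from the knowledge graph.
    Assumption: noun tags start with "n" and verb tags start with "v".
    """
    content_words = [
        word for word, tag in tagged_tokens if tag.startswith(("n", "v"))
    ]
    # Assumption: one dictionary hit is enough to count as a match.
    return any(word in entity_dictionary for word in content_words)


# Hypothetical banking dictionary; a set removes duplicate entities.
banking_dictionary = {"Bank", "Account", "Open", "Loan"}

# Tagged form of "How to open a bank account" from the example above.
banking_question = [("How", "pron"), ("Open", "v"), ("Bank", "n"), ("Account", "n")]
other_question = [("How", "pron"), ("Cook", "v"), ("Rice", "n")]

print(is_target_question(banking_question, banking_dictionary))
print(is_target_question(other_question, banking_dictionary))
```

A stricter variant could require every retained noun and verb to appear in the dictionary by replacing `any` with `all`; the text does not state which rule is intended.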

[0098] Figure 3 shows a flowchart of a named entity recognition operation according to an embodiment of the present disclosure.

[0099] As shown in Figure 3, the method 300 includes operations S310 to S360.

[0100] In operation S310, an embedding operation is performed on the target question to obtain the initial input vector.

[0101] According to embodiments of this disclosure, for the modified BERT model, the target question can be sequentially input into the token embedding layer, segment embedding layer, and position embedding layer of the modified BERT model to perform embedding operations on the target question and obtain the initial input vector.

[0102] According to embodiments of this disclosure, the target question may consist of a single question, because a question must be input into the named entity recognition model each time named entity recognition is performed; that is, the named entity recognition model processes only one question at a time.

[0103] According to embodiments of this disclosure, by adding the [CLS] symbol at the beginning of the sentence of the target question and the [SEP] symbol at the end of the sentence, and then inputting the target question into a word vector layer, a character vector corresponding to each character in the target question can be obtained, i.e., a character embedding vector. The [CLS] and [SEP] symbols can also be converted into corresponding vectors. The word vector layer can be used to incorporate information about the characters in the target question, that is, to convert each character in the target question into a vector form.

[0104] According to embodiments of this disclosure, the paragraph position encoding layer is an existing layer in the BERT model that can be used to distinguish different sentences. For example, each word in one sentence can be represented by A, and each word in another sentence can be represented by B. The target question is input into the paragraph position encoding layer. If the target question consists of a single question, each word in the target question is represented by the same identifier, resulting in a sentence embedding vector.

[0105] According to embodiments of this disclosure, by inputting the target question into the position encoding layer, the positional representation of each character in the target question, i.e., the position embedding vector, can be obtained. By labeling each character in the target question with a different vector, each character in the target question can be distinguished. For example, if the target question includes 5 characters, plus the [CLS] symbol at the beginning of the sentence and the [SEP] symbol at the end, the [CLS] symbol can be represented by 1, the characters in the target question can be represented by 2 to 6 respectively, and the [SEP] symbol can be represented by 7. The position encoding layer can be used to distinguish the positional information of characters in a sentence, because the same character appearing at different positions in a sentence carries different semantic information. For example, the Chinese sentences meaning "I miss you" and "You miss me" are composed of the same three characters in a different order, but the two sentences have different meanings. They can be distinguished by labeling each character in each sentence with a different vector.

[0106] According to embodiments of this disclosure, the target question is sequentially input into the word vector layer, the paragraph position encoding layer, and the position encoding layer to obtain an initial input vector, which can represent the sum of the word embedding vector, the sentence embedding vector, and the position embedding vector.
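The summation described in paragraph [0106] can be sketched with toy lookup tables. This is only an illustration under stated assumptions: the embedding values are random placeholders rather than trained parameters, the hidden size of 4 is arbitrary, and the function name `build_initial_input` is hypothetical. Position identifiers start at 1, following the example in paragraph [0105].

```python
import random


def build_initial_input(characters, hidden_size=4, seed=0):
    """Sum token, segment, and position embeddings for one question."""
    rng = random.Random(seed)

    def new_vector():
        # Placeholder values; a trained model would learn these.
        return [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]

    sequence = ["[CLS]"] + list(characters) + ["[SEP]"]
    token_table = {tok: new_vector() for tok in dict.fromkeys(sequence)}
    segment_vector = new_vector()  # one question, so one shared identifier
    position_ids = list(range(1, len(sequence) + 1))
    position_table = {pos: new_vector() for pos in position_ids}

    initial = []
    for tok, pos in zip(sequence, position_ids):
        # Element-wise sum of the three embeddings for this position.
        initial.append([
            t + s + p
            for t, s, p in zip(token_table[tok], segment_vector, position_table[pos])
        ])
    return sequence, position_ids, initial


sequence, position_ids, initial = build_initial_input(list("abcde"))
print(position_ids)  # [1, 2, 3, 4, 5, 6, 7]
print(len(initial), len(initial[0]))
```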

[0107] In operation S320, the initial input vector is processed using entity position weight parameters to obtain the final input vector.

[0108] According to embodiments of this disclosure, the obtained initial input vector is input into the segmented weight layer of the modified BERT model. Different weight parameters can be assigned to the vector corresponding to each word in the initial input vector using entity position weight parameters to obtain the final input vector.

[0109] In operation S330, feature extraction is performed on the final input vector to obtain the first vector sequence.

[0110] According to embodiments of this disclosure, the final input vector can be input into subsequent layers of the modified BERT model, and feature extraction can be performed on the final input vector to obtain a first vector sequence. Since the BERT model has strong semantic representation capabilities, the modified BERT model can be used to obtain the semantic representation of the target question, resulting in the character vector corresponding to each character in the target question, i.e., the first vector sequence.

[0111] In operation S340, sequence features are extracted from the first vector sequence to obtain the second vector sequence.

[0112] According to embodiments of this disclosure, a first vector sequence can be input into a BiLSTM model to extract sequence features. Specifically, forward LSTM and backward LSTM are used to process the contextual feature information of each word vector in the first vector sequence, and the output information at the same time is merged to obtain a second vector sequence.

[0113] According to embodiments of this disclosure, the output of the modified BERT model, i.e., the first vector sequence, can be used as the input of the BiLSTM model at each time step. Based on the hidden layer state and calculated values ​​in the BiLSTM model at this time step, it can continuously update itself to finally obtain the second vector sequence. This can solve the problem of gradient vanishing in models such as RNN (Recurrent Neural Network) due to the excessive length of the sequence.

[0114] In operation S350, entity recognition is performed on the second vector sequence to obtain the entity annotation sequence.

[0115] According to embodiments of this disclosure, a second vector sequence can be input into a CRF model, and entity recognition can be performed on the second vector sequence to obtain an entity annotation sequence.

[0116] In operation S360, at least one entity in the target question is identified based on the entity annotation sequence.

[0117] According to embodiments of this disclosure, entities can be extracted according to the entity annotation sequence output by the CRF model, thereby identifying at least one entity in the target question.
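Operation S360 can be illustrated by decoding an annotation sequence into entity strings. The text does not specify the tagging scheme, so the sketch below assumes the common BIO convention (a "B" tag starts an entity, "I" tags continue it, and "O" marks non-entity tokens). The function name `extract_entities`, the `sep` parameter, and the "ITEM" label are hypothetical.

```python
def extract_entities(tokens, tags, sep=""):
    """Collect entity strings from a BIO-tagged token sequence."""
    entities = []
    current = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B"):
            # A new entity starts; close the previous one if it is open.
            if current:
                entities.append(sep.join(current))
            current = [token]
        elif tag.startswith("I") and current:
            current.append(token)
        else:
            # "O" tag, or an "I" tag with no open entity.
            if current:
                entities.append(sep.join(current))
            current = []
    if current:
        entities.append(sep.join(current))
    return entities


tokens = ["How", "to", "open", "a", "bank", "account"]
tags = ["O", "O", "O", "O", "B-ITEM", "I-ITEM"]
print(extract_entities(tokens, tags, sep=" "))  # ['bank account']
```

For character-level Chinese input, the default `sep=""` joins the characters of an entity directly.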

[0118] According to embodiments of this disclosure, the modified BERT model can use entity position weight parameters to assign different weight parameters to each character in the target question. Then, by sequentially inputting the target question into the modified BERT model, BiLSTM model, and CRF model, the accuracy of entity recognition in the target question can be improved.

[0119] According to embodiments of this disclosure, the entity position weight parameters include entity front-segment position weight parameters, entity middle-segment position weight parameters, and entity rear-segment position weight parameters. The calculation method of the entity position weight parameters includes: dividing each question in the dataset related to the target domain into three parts: front-segment, middle-segment, and rear-segment; obtaining the entity front-segment position weight parameter based on the sum of the number of entities in the front-segment of each question in the dataset and the sum of the number of entities in each question in the dataset; obtaining the entity middle-segment position weight parameter based on the sum of the number of entities in the middle-segment of each question in the dataset and the sum of the number of entities in each question in the dataset; and obtaining the entity rear-segment position weight parameter based on the sum of the number of entities in the rear-segment of each question in the dataset and the sum of the number of entities in each question in the dataset.

[0120] According to embodiments of this disclosure, each question in a dataset related to the target domain is segmented into three parts: the first part, the middle part, and the last part. The sum of the number of entities in each part of each question in the dataset, as well as the sum of the number of entities in each question in the dataset, can be calculated. Therefore, the calculated entity occurrence probabilities for each of the three parts (first part, middle part, and last part) can be used as entity position weight parameters.

[0121] According to the embodiments of this disclosure, the calculation method of the front segment position weight parameter of the entity can be expressed as the following formula (1).

[0122] $$W_{P(\mathrm{before})} = \frac{\mathrm{Count}(E_{\mathrm{before}})}{\mathrm{Count}(E_{\mathrm{all}})} \tag{1}$$

[0123] Here, $W_{P(\mathrm{before})}$ can represent the entity front-segment position weight parameter, $E_{\mathrm{before}}$ can represent the number of entities in the front segment of each question in the dataset, $E_{\mathrm{all}}$ can represent the number of entities in each question in the dataset, $\mathrm{Count}$ can represent the sum of the number of entities in the corresponding part, $\mathrm{Count}(E_{\mathrm{before}})$ can represent the sum of the number of entities in the front segment of each question in the dataset, and $\mathrm{Count}(E_{\mathrm{all}})$ can represent the sum of the number of entities in each question in the dataset.

[0124] According to the embodiments of this disclosure, the method for calculating the position weight parameter of the middle segment of the entity can be expressed as the following formula (2).

[0125] $$W_{P(\mathrm{among})} = \frac{\mathrm{Count}(E_{\mathrm{among}})}{\mathrm{Count}(E_{\mathrm{all}})} \tag{2}$$

[0126] Here, $W_{P(\mathrm{among})}$ can represent the entity middle-segment position weight parameter, $E_{\mathrm{among}}$ can represent the number of entities in the middle segment of each question in the dataset, and $\mathrm{Count}(E_{\mathrm{among}})$ can represent the sum of the number of entities in the middle segment of each question in the dataset.

[0127] According to the embodiments of this disclosure, the calculation method of the position weight parameter of the latter part of the entity can be expressed as the following formula (3).

[0128] $$W_{P(\mathrm{after})} = \frac{\mathrm{Count}(E_{\mathrm{after}})}{\mathrm{Count}(E_{\mathrm{all}})} \tag{3}$$

[0129] Here, $W_{P(\mathrm{after})}$ can represent the entity rear-segment position weight parameter, $E_{\mathrm{after}}$ can represent the number of entities in the rear segment of each question in the dataset, and $\mathrm{Count}(E_{\mathrm{after}})$ can represent the sum of the number of entities in the rear segment of each question in the dataset.

[0130] According to an embodiment of this disclosure, for example, a dataset contains 60 questions, of which there are a total of 100 entities. The sum of the number of entities in the first part of each question is 10, the sum of the number of entities in the middle part is 30, and the sum of the number of entities in the last part is 60. Then, the entity position weight parameter for the first part can be expressed as 10 / 100 = 0.1, the entity position weight parameter for the middle part can be expressed as 30 / 100 = 0.3, and the entity position weight parameter for the last part can be expressed as 60 / 100 = 0.6.
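The arithmetic of formulas (1) to (3) and the example above can be reproduced with a short sketch. It assumes that every entity falls into exactly one of the three segments, so that the total entity count equals the sum of the three segment counts. The function name `entity_position_weights` and the input layout are hypothetical.

```python
def entity_position_weights(segment_counts):
    """Compute the front, middle, and rear position weight parameters.

    segment_counts: list of (front, middle, rear) entity counts,
    one tuple per question in the dataset.
    """
    front = sum(counts[0] for counts in segment_counts)
    middle = sum(counts[1] for counts in segment_counts)
    rear = sum(counts[2] for counts in segment_counts)
    # Assumption: each entity belongs to exactly one segment.
    total = front + middle + rear
    if total == 0:
        raise ValueError("the dataset contains no annotated entities")
    return front / total, middle / total, rear / total


# Aggregated counts from the example: 10, 30, and 60 out of 100 entities.
w_front, w_middle, w_rear = entity_position_weights([(10, 30, 60)])
print(w_front, w_middle, w_rear)
```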

[0131] According to the embodiments of this disclosure, the entity front position weight parameter, entity middle position weight parameter and entity rear position weight parameter can be obtained according to the above formulas (1), (2) and (3), which are used in the segment weight layer of the modified BERT model. That is, the accuracy of the named entity recognition model in entity recognition can be improved by setting the weight of the probability of the entity appearing in the sentence.

[0132] Figure 4 shows a flowchart of the process of obtaining the final input vector according to an embodiment of the present disclosure.

[0133] As shown in Figure 4, the method 400 includes operations S410 to S450.

[0134] In operation S410, based on the first probability range determined by the entity front-segment position weight parameter, a probability value corresponding to each position in the front segment of the target question is randomly generated within the first probability range.

[0135] According to an embodiment of this disclosure, take as an example a target question with a sentence length of 100, an entity front-segment position weight parameter of 0.1, an entity middle-segment position weight parameter of 0.3, and an entity rear-segment position weight parameter of 0.6.

[0136] According to embodiments of this disclosure, based on the entity front-segment position weight parameter of 0.1, the first probability range can be determined to be 0-0.1 (the range includes its left endpoint but excludes its right endpoint). Since the front segment of the target question is determined based on the entity front-segment position weight parameter, and given a weight parameter of 0.1 and a sentence length of 100, the front segment of the target question can be determined to be the first 10 positions of the target question, i.e., positions 1-10.

[0137] According to embodiments of this disclosure, the probability value corresponding to each position in the front segment of the target question can be randomly generated within the first probability range of 0-0.1, resulting in 10 probability values within that range. For example, the probability values corresponding to the first 10 positions can be 0.01, 0.04, 0.02, 0.02, 0.06, 0.03, 0.01, 0.05, 0.04, and 0.08, respectively.

[0138] In operation S420, based on the second probability range determined by the entity middle-segment position weight parameter, a probability value corresponding to each position in the middle segment of the target question is randomly generated within the second probability range.

[0139] According to embodiments of this disclosure, based on the entity middle-segment position weight parameter of 0.3, the second probability range can be determined to be 0.1-0.4 (the range includes its left endpoint but excludes its right endpoint). Since the middle segment of the target question is determined based on the entity middle-segment position weight parameter, and given a weight parameter of 0.3 and a sentence length of 100, the middle segment of the target question can be determined to be the 30 positions following the first 10 positions of the target question, i.e., positions 11-40.

[0140] According to embodiments of this disclosure, the probability value corresponding to each position in the middle segment of the target question can be randomly generated within the second probability range of 0.1-0.4, resulting in 30 probability values within that range.

[0141] In operation S430, based on the third probability range determined by the entity rear-segment position weight parameter, a probability value corresponding to each position in the rear segment of the target question is randomly generated within the third probability range.

[0142] According to an embodiment of this disclosure, based on the entity rear-segment position weight parameter of 0.6, the third probability range can be determined to be 0.4-1 (the range includes its left endpoint but excludes its right endpoint). Since the rear segment of the target question is determined based on the entity rear-segment position weight parameter, and given a weight parameter of 0.6 and a sentence length of 100, the rear segment of the target question can be determined to be the last 60 positions of the target question, i.e., positions 41-100.

[0143] According to embodiments of this disclosure, the probability value corresponding to each position in the rear segment of the target question can be randomly generated within the third probability range of 0.4-1, resulting in 60 probability values within that range.

[0144] In operation S440, based on the probability value corresponding to each position in the front, middle, and rear segments of the target question, a one-dimensional weight vector corresponding to the target question is obtained.

[0145] According to embodiments of this disclosure, by concatenating the probability values corresponding to each position in the front, middle, and rear segments of the target question, a one-dimensional weight vector corresponding to the target question can be obtained.

[0146] According to an embodiment of this disclosure, for example, by concatenating the 10 probability values in the first probability range of 0-0.1 obtained in operation S410, the 30 probability values in the second probability range of 0.1-0.4 obtained in operation S420, and the 60 probability values in the third probability range of 0.4-1 obtained in operation S430, a one-dimensional weight vector corresponding to the target question can be generated, wherein the one-dimensional weight vector contains 100 values.

[0147] According to an embodiment of this disclosure, the calculation method of the one-dimensional weight vector can be expressed as the following formula (4).

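[0148]-[0149] $$R_{(max\_len,1)} = \mathrm{Cat}\big(\mathrm{Random}(W_{P(\mathrm{before})} \times max\_len),\ \mathrm{Random}(W_{P(\mathrm{among})} \times max\_len),\ \mathrm{Random}(W_{P(\mathrm{after})} \times max\_len)\big) \tag{4}$$

[0150] Here, $R_{(max\_len,1)}$ can represent the one-dimensional weight vector, $max\_len$ can represent the sentence length of the target question, 1 can represent the vector dimension, $\mathrm{Random}$ can represent random assignment within the probability range, and $\mathrm{Cat}$ can represent concatenating the randomly generated probability values to generate the one-dimensional weight vector. $\mathrm{Random}(W_{P(\mathrm{before})} \times max\_len)$ can be used to perform the above operation S410, $\mathrm{Random}(W_{P(\mathrm{among})} \times max\_len)$ can be used to perform the above operation S420, and $\mathrm{Random}(W_{P(\mathrm{after})} \times max\_len)$ can be used to perform the above operation S430.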
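Operations S410 to S440 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: segment lengths are obtained by rounding the weight times the sentence length, the rear segment takes the remaining positions, and the probability ranges are the cumulative intervals 0 to W_front, W_front to W_front + W_middle, and W_front + W_middle to 1, generalized from the worked example. The function name `segment_weight_vector` and the `seed` parameter are hypothetical.

```python
import random


def segment_weight_vector(max_len, w_front, w_middle, seed=None):
    """Build the one-dimensional weight vector for one question."""
    rng = random.Random(seed)
    # Assumption: round to the nearest position count; the rear segment
    # takes whatever is left so the lengths always add up to max_len.
    n_front = round(w_front * max_len)
    n_middle = round(w_middle * max_len)
    n_rear = max_len - n_front - n_middle
    segments = [
        (0.0, w_front, n_front),                  # operation S410
        (w_front, w_front + w_middle, n_middle),  # operation S420
        (w_front + w_middle, 1.0, n_rear),        # operation S430
    ]
    weights = []
    for low, high, count in segments:
        # rng.random() lies in [0, 1), so each sample lies in [low, high)
        # up to floating-point rounding.
        weights.extend(low + (high - low) * rng.random() for _ in range(count))
    return weights  # concatenation of the three segments (operation S440)


vector = segment_weight_vector(100, 0.1, 0.3, seed=0)
print(len(vector))  # 100
```

With a sentence length of 10, the same call yields 1 front position, 3 middle positions, and 6 rear positions.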

[0151] In operation S450, the final input vector is obtained based on the one-dimensional weight vector and the initial input vector.

[0152] According to embodiments of this disclosure, the final input vector can be obtained by multiplying the one-dimensional weight vector with the initial input vector.

[0153] According to an embodiment of this disclosure, the final input vector can be calculated as shown in the following formula (5).

[0154] $$E_{word} = R_{(max\_len,1)}\,(E_{token} + E_{seg} + E_{pos}) \tag{5}$$

[0155] Here, $E_{word}$ can represent the final input vector, $E_{token}$ can represent the word embedding vector, $E_{seg}$ can represent the sentence embedding vector, $E_{pos}$ can represent the position embedding vector, and $E_{token} + E_{seg} + E_{pos}$ can represent the initial input vector.

[0156] According to embodiments of this disclosure, the final input vector can be calculated using entity position weight parameters and formulas (4) and (5) above.
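Operation S450 can be illustrated with plain Python lists. The sketch assumes that the multiplication in formula (5) scales the embedding vector at each position by the weight for that position, which is the natural reading when a vector of shape (max_len, 1) is multiplied with a matrix of shape (max_len, hidden size). The function name `apply_segment_weights` and the sample values are hypothetical.

```python
def apply_segment_weights(weight_vector, initial_input):
    """Scale the embedding at each position by that position's weight."""
    if len(weight_vector) != len(initial_input):
        raise ValueError("one weight is required per position")
    return [
        [weight * value for value in embedding]
        for weight, embedding in zip(weight_vector, initial_input)
    ]


# Three positions with a hidden size of 2; values chosen for readability.
weights = [0.05, 0.2, 0.7]
initial_input = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
final_input = apply_segment_weights(weights, initial_input)
print(final_input)
```

Positions in the rear segment keep most of their magnitude, while positions in the front segment are strongly attenuated, which matches the stated goal of focusing the model on high-probability segments.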

[0157] Figure 5 shows a schematic diagram of the structure of a named entity recognition model according to an embodiment of the present disclosure.

[0158] As shown in Figure 5, the named entity recognition model 500 of this embodiment may include a modified BERT module 520, a BiLSTM module 530, and a CRF module 540. The modified BERT module 520 may include a word vector layer 521, a paragraph position encoding layer 522, a position encoding layer 523, and a segmentation weight layer 524.

[0159] According to an embodiment of this disclosure, the target question 510 can be input into the modified BERT module 520, and the above-described operations S310 to S330 can be performed to obtain a first vector sequence. Specifically, the word vector layer 521, paragraph position encoding layer 522, and position encoding layer 523 in the modified BERT module 520 can be used to perform the above-described operation S310 to obtain an initial input vector, which is the sum of the word embedding vector 521, the sentence embedding vector 522, and the position embedding vector 523. The segmentation weight layer 524 in the modified BERT module 520 can be used to perform the above-described operation S320 to obtain a final input vector 525. The final input vector 525 can be input into the subsequent layers of the modified BERT module 520 to perform the above-described operation S330 and obtain the first vector sequence.

[0160] According to an embodiment of this disclosure, the first vector sequence output by the modified BERT model can be processed by the BiLSTM module 530 to perform the above-described operation S340 to obtain a second vector sequence; the second vector sequence can be processed by the CRF module 540 to perform the above-described operations S350 and S360 to obtain at least one entity 550.

[0161] According to an embodiment of this disclosure, taking the target question 510 "How to open a bank account" as an example, the [CLS] symbol is added at the beginning of the sentence "How to open a bank account" and the [SEP] symbol at the end. The target question is then input into the word vector layer 521 of the modified BERT module 520 to obtain the character embedding vector 521 shown in Figure 5, which converts each character into a corresponding vector form. Inputting the target question into the paragraph position encoding layer 522 yields the sentence embedding vector 522, in which each character in the target question is represented by the same identifier EA. Inputting the target question into the position encoding layer 523 yields the position embedding vector 523, in which each character in the target question is represented by a different vector. For example, the position of each character in the target question can be represented by 10 different vectors E1 to E10.

[0162] According to an embodiment of this disclosure, taking an entity front-segment position weight parameter of 0.1, an entity middle-segment position weight parameter of 0.3, and an entity rear-segment position weight parameter of 0.6 as an example, if the sentence length of the target question is 10, then according to the above operations S410 to S430: the probability value corresponding to each position in the front segment of the target question can be randomly generated between 0 and 0.1, where the front segment can be the first position, i.e., the position indicated by $W_{P(\mathrm{before})}$; the probability value corresponding to each position in the middle segment of the target question can be randomly generated between 0.1 and 0.4, where the middle segment can be the second to fourth positions, i.e., the positions indicated by $W_{P(\mathrm{among})}$; and the probability value corresponding to each position in the rear segment of the target question can be randomly generated between 0.4 and 1, where the rear segment can be the fifth to tenth positions, i.e., the positions indicated by $W_{P(\mathrm{after})}$.

[0163] According to an embodiment of this disclosure, the above operation S440 can be performed based on the probability value corresponding to each character in the target problem to obtain a one-dimensional weight vector 524; the above operation S450 can be performed on the one-dimensional weight vector 524 to obtain the final input vector 525, wherein T1 to T10 can be used to represent the vector corresponding to each character in the final input vector.

[0164] According to embodiments of this disclosure, the training process of the named entity recognition model 500 of this disclosure uses a dataset related to the target domain as training samples. To accelerate training and improve the accuracy of the named entity recognition model, the data input part of the BERT model is modified by adding a segmentation weight layer to the existing word vector layer, position encoding layer, and paragraph position encoding layer. For this segmentation weight layer, entity position weight parameters can be added to the training parameters of the BERT model based on the labeled data text and language habits. That is, by setting weights for the probability of an entity appearing in a sentence, the accuracy of the named entity recognition model in entity recognition can be improved.

[0165] According to embodiments of this disclosure, the positional distribution of entities within sentences varies across different domains. When studying a specific domain, entities exhibit high similarity, often appearing within the same segment of a sentence. Therefore, the probability distribution of entity appearance in the beginning, middle, and end segments of a sentence can be statistically determined based on a dataset related to the target domain. This distribution, along with the position matrix PE, can be registered as a fixed value, remaining unchanged during model training. The input word vectors can then be adjusted based on this probability distribution, allowing the model to focus more on high-probability segments, thereby accelerating convergence and improving model accuracy.

[0166] According to embodiments of this disclosure, before training, each question in the dataset related to the target domain can be divided into three parts: a front segment, a middle segment, and a rear segment. The entity front segment position weight parameter is obtained by summing the number of entities in the front segment of each question in the dataset and summing the number of entities in all questions in the dataset. The entity middle segment position weight parameter is obtained by summing the number of entities in the middle segment of each question in the dataset and summing the number of entities in all questions in the dataset. The entity rear segment position weight parameter is obtained by summing the number of entities in the rear segment of each question in the dataset and summing the number of entities in all questions in the dataset. The entity position weight parameters may include entity front segment position weight parameters, entity middle segment position weight parameters, and entity rear segment position weight parameters.

[0167] According to embodiments of this disclosure, during training, each question in the dataset related to the target domain is padded so that all questions have the same length. Using the entity position weight parameters calculated from the dataset, the named entity recognition model is trained with the dataset related to the target domain as training samples. The CRF model can predict the corresponding state sequence based on the input, considering both the current state features of the input and the transition features between output categories. It primarily learns the constraint relationships between labels to improve the accuracy of label prediction and extracts the globally optimal label sequence. The BiLSTM-CRF model learns contextual information through the BiLSTM layer, uses the CRF layer to learn the correlation between labels, obtains annotation information, calculates scores, performs backpropagation, and finally obtains the optimal annotation information.
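The way a CRF layer combines per-position state scores with label transition scores to pick a globally optimal label sequence can be illustrated with Viterbi decoding. The text does not name the decoding algorithm, so Viterbi is an assumption here, chosen because it is the standard choice for linear-chain CRFs. The function name `viterbi_decode`, the label set, and all scores are hypothetical.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the label sequence with the highest total score.

    emissions: one {label: score} dict per position (state features).
    transitions: {(previous_label, label): score}; missing pairs score 0.
    """
    if not emissions:
        return []
    # best[i][label] is the best score of any path ending in `label` at i.
    best = [{label: emissions[0][label] for label in labels}]
    back_pointers = []
    for scores in emissions[1:]:
        current, pointers = {}, {}
        for label in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, label), 0.0),
            )
            current[label] = (
                best[-1][prev] + transitions.get((prev, label), 0.0) + scores[label]
            )
            pointers[label] = prev
        best.append(current)
        back_pointers.append(pointers)
    # Follow the back pointers from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B", "I"]
transitions = {("O", "I"): -10.0}  # an "I" tag should not follow "O"
emissions = [
    {"O": 2.0, "B": 1.0, "I": 0.0},
    {"O": 0.0, "B": 1.0, "I": 1.5},
    {"O": 0.2, "B": 0.0, "I": 1.0},
]
print(viterbi_decode(emissions, transitions, labels))  # ['O', 'B', 'I']
```

Picking the highest state score at each position independently would give O, I, I, which violates the label constraint; the transition score steers the decoder to O, B, I instead.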

[0168] Figure 6 schematically shows a flowchart of the process of obtaining attribute results according to an embodiment of the present disclosure.

[0169] As shown in Figure 6, the method 600 includes operations S610 to S630.

[0170] In operation S610, feature extraction is performed on the input sequence to obtain the output vector.

[0171] According to embodiments of this disclosure, for each input sequence, the input sequence can be input into the BERT model, and feature extraction can be performed on the input sequence to obtain an output vector. The format of the input sequence can be represented as "[CLS]Target Question[SEP]Attribute Value[SEP]". The data input part of the BERT model can include a word vector layer, a paragraph position encoding layer, and a position encoding layer.
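Constructing one input sequence per attribute value in the format given above can be sketched as follows. The layout of a triple as (entity, attribute, attribute value), the function name `build_input_sequences`, and the sample triples are assumptions made for illustration.

```python
def build_input_sequences(question, triple_set):
    """Pair the target question with the attribute value of each triple.

    triple_set: list of (entity, attribute, attribute_value) tuples.
    Returns a list of (triple, input_sequence) pairs so that a matching
    sequence can later be traced back to its triple.
    """
    pairs = []
    for entity, attribute, value in triple_set:
        sequence = f"[CLS]{question}[SEP]{value}[SEP]"
        pairs.append(((entity, attribute, value), sequence))
    return pairs


# Hypothetical triples for the entity "bank account".
triple_set = [
    ("bank account", "opening method", "visit a branch with ID"),
    ("bank account", "annual fee", "10 yuan"),
]
pairs = build_input_sequences("How to open a bank account", triple_set)
for triple, sequence in pairs:
    print(sequence)
```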

[0172] In operation S620, the target vector is selected from the output vector as the feature vector of the input sequence.

[0173] According to embodiments of this disclosure, the target vector can represent the vector in the output vector corresponding to the [CLS] position in the input sequence. The vector at the [CLS] position in the output vector is used as the feature vector of the input sequence, where the feature vector can be represented as a feature matrix.

[0174] In operation S630, a binary classification operation is performed on the feature vector to obtain the attribute results.

[0175] According to embodiments of this disclosure, the feature vector is sequentially passed through a fully connected layer and a softmax (normalized exponential function) layer to perform a binary classification operation. This yields the probability values ​​that the target problem matches the attribute value and the probability values ​​that the target problem does not match the attribute value. By comparing these two probability values, the attribute result is obtained. The sum of these two probability values ​​is 1. The fully connected layer reduces the dimensionality of the feature vector to a two-dimensional feature vector. When the probability value that the target problem matches the attribute value is greater than the probability value that the target problem does not match the attribute value, the attribute result can be labeled with "1", indicating that the target problem matches the attribute value. When the probability value that the target problem matches the attribute value is less than the probability value that the target problem does not match the attribute value, the attribute result can be labeled with "0", indicating that the target problem does not match the attribute value.

[0176] For example, if the probability that the target problem matches the attribute value is 0.8 and the probability that the target problem does not match the attribute value is 0.2, then the probability that the target problem matches the attribute value is greater than the probability that the target problem does not match the attribute value.

[0177] According to embodiments of this disclosure, by performing feature extraction and binary classification on the input sequence, it is possible to determine whether the target question matches the attribute value in the input sequence.

[0178] Figure 7 schematically illustrates the structure of an attribute extraction model according to an embodiment of the present disclosure.

[0179] As shown in Figure 7, the attribute extraction model 700 can be used to perform the above operations S610 to S630 to obtain the attribute results.

[0180] According to an embodiment of this disclosure, the input sequence 710 can be processed by the BERT model 720 to perform the above-described operations S610 and S620 to obtain a feature vector; the above-described operation S630 can be performed by the fully connected layer 730 and softmax 740 to obtain an attribute result 750.

[0181] According to an embodiment of this disclosure, the format of the input sequence 710 can be represented as "[CLS] target question [SEP] attribute value [SEP]". Tok_1, Tok_2, ..., Tok_M can represent the characters in the target question, and Tok_M+1, ..., Tok_n can represent the characters or digits in the attribute value, where M is an integer greater than or equal to 1 and can represent the length of the target question, n is an integer greater than or equal to M+1, and n-(M+1) can represent the number of characters or digits included in the attribute value.
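As an illustration of this input format, the following sketch builds the token list for one question and one attribute value, using one token per character as described above. The function name and the sample strings are hypothetical.

```python
def build_input_sequence(question, attribute_value):
    """Concatenate question and attribute value, one token per character."""
    question_tokens = list(question)          # Tok_1 ... Tok_M
    attribute_tokens = list(attribute_value)  # tokens after the first [SEP]
    return ["[CLS]", *question_tokens, "[SEP]", *attribute_tokens, "[SEP]"]

# Hypothetical question and attribute value, for illustration only.
tokens = build_input_sequence("deposit rate", "1.5%")
```

The [CLS] token always sits at index 0, which is the position whose output vector is later taken as the feature vector of the whole sequence.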

[0182] According to embodiments of this disclosure, determining the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results includes: for each attribute result, if the attribute result indicates that the target question matches the attribute value, obtaining the triple corresponding to the attribute value from the corresponding triple set; and determining the relevant answer to the target question from the target domain knowledge graph based on the multiple triples obtained.

[0183] According to embodiments of this disclosure, for simple attribute query questions in the target domain, when an attribute result indicates that the target question matches the attribute value, the triple corresponding to that attribute value is obtained from the corresponding triple set and can be used directly as the relevant answer to the target question.

[0184] According to embodiments of this disclosure, for non-attribute query questions in the target domain, for each attribute result, if the attribute result indicates that the target question matches the attribute value, the triple corresponding to the attribute value can be obtained from the corresponding triple set, and this triple can be used to represent the user's query intent. The triple can then be substituted into the target domain knowledge graph for retrieval to obtain the relevant answer to the target question.

[0185] According to embodiments of this disclosure, the relevant answers to the target question are converted into answers in natural language form and returned to the user.

[0186] According to embodiments of this disclosure, for each attribute result indicating that the target question matches the attribute value, the relevant answer to the target question can be determined from the target domain knowledge graph and returned to the user.
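For the simple attribute query case, selecting the matching triples can be sketched as below. The triple layout (entity, attribute, attribute value), the function name, and the sample data are assumptions made for illustration; retrieval for non-attribute queries and conversion into natural language are not shown.

```python
def select_matching_triples(triple_set, attribute_results):
    """Keep the triples whose attribute result is 1 (question matches value)."""
    return [
        triple
        for triple, result in zip(triple_set, attribute_results)
        if result == 1
    ]

# Hypothetical triples (entity, attribute, attribute value) for one entity.
triple_set = [
    ("Product A", "annual rate", "3.2%"),
    ("Product A", "minimum deposit", "10,000 yuan"),
    ("Product A", "term", "12 months"),
]
attribute_results = [1, 0, 0]  # one attribute result per attribute value
answers = select_matching_triples(triple_set, attribute_results)
```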

[0187] Figure 8 shows a schematic diagram of the modules of an intelligent question-answering system according to an embodiment of the present disclosure.

[0188] As shown in Figure 8, the intelligent question-answering system 800 may include three modules: a question understanding module 820, a question solving module 830, and an answer generation module 840.

[0189] According to embodiments of this disclosure, the role of question understanding is to parse the question input by the user and extract the semantic information needed to answer it. Knowledge graph-based intelligent question answering operates on top of a knowledge graph, and answering a question ultimately takes the form of a query on the knowledge graph. Therefore, the key to question understanding is mapping the user's question to entities in the knowledge graph and expressing the user's query intent.

[0190] According to an embodiment of this disclosure, a user question 810 is input into a question understanding module 820. The question classification submodule 821 in the question understanding module 820 can perform the above-described operation S210 to classify the user question and obtain a target question related to the target domain. The target question is then input into a named entity recognition model 822 in the question understanding module 820. The named entity recognition model 822 can perform the above-described operation S220 to obtain at least one entity in the target question.

[0191] According to embodiments of this disclosure, the question solving module 830 can convert the results of the question understanding module 820 into a knowledge graph query. The at least one entity can be input into the question solving module 830, and the knowledge graph query submodule 831 within it can perform the aforementioned operation S230 to obtain a set of triples corresponding to each entity. The attribute values in each set of triples are then input into the attribute extraction module 832, which can perform the aforementioned operations S240 and S250 to obtain attribute results.

[0192] According to embodiments of this disclosure, the answer generation module 840 can convert the query results into answers in natural language and return them to the user. The obtained attribute results can be input into the answer generation module 840, which can then perform the aforementioned operation S260 to obtain the relevant answers to the target question, i.e., the query results.

[0193] Figure 9 shows another schematic diagram of an intelligent question-answering system according to an embodiment of the present disclosure.

[0194] As shown in Figure 9, in the intelligent question-answering system 900, user questions 910 can be classified by performing the above operation S210 to obtain a target question 920; named entity recognition can be performed on the target question 920 by performing the above operation S220 to obtain at least one entity 930; for the at least one entity 930, an entity query can be performed against the target domain knowledge graph 940 by performing the above operation S230 to obtain a set of triples 950 corresponding to each entity; for each attribute value in the set of triples 950, the above operation S240 can be performed to obtain an input sequence 960; attribute extraction can be performed on the input sequence 960 by performing the above operation S250 to obtain attribute results 970; and, based on the attribute results 970, the above operation S260 can be performed to obtain the relevant answer 980 to the target question 920.
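The flow from the user question to the relevant answer can be sketched as a chain of operations S210 to S260. In this illustration every stage is a caller-supplied function, so nothing is assumed about how each stage is implemented; the function names, the toy knowledge graph, the stub stages, and the choice to return None for a question outside the target domain are all assumptions.

```python
def answer_question(user_question, classify, recognize, query_triples,
                    build_sequence, extract_attribute, determine_answer):
    """Chain operations S210 to S260; every stage is supplied by the caller."""
    target_question = classify(user_question)                      # S210
    if target_question is None:
        return None  # assumption: out-of-domain questions are not answered
    results = []
    for entity in recognize(target_question):                      # S220
        for triple in query_triples(entity):                       # S230
            sequence = build_sequence(target_question, triple[2])  # S240
            results.append((triple, extract_attribute(sequence)))  # S250
    return determine_answer(results)                               # S260

# Toy knowledge graph and stub stages standing in for the real models.
kg = {"Product A": [("Product A", "annual rate", "3.2%"),
                    ("Product A", "term", "12 months")]}
answer = answer_question(
    "What is the annual rate of Product A?",
    classify=lambda q: q if "Product A" in q else None,
    recognize=lambda q: [e for e in kg if e in q],
    query_triples=lambda e: kg[e],
    build_sequence=lambda q, v: f"[CLS] {q} [SEP] {v} [SEP]",
    extract_attribute=lambda s: 1 if "%" in s else 0,  # placeholder classifier
    determine_answer=lambda rs: [t for t, r in rs if r == 1],
)
```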

[0195] Based on the above-described intelligent question-answering method, this disclosure also provides an intelligent question-answering device. The device is described in detail below with reference to Figure 10.

[0196] Figure 10 shows a schematic block diagram of an intelligent question-answering device according to an embodiment of the present disclosure.

[0197] As shown in Figure 10, the intelligent question-answering device 1000 of this embodiment includes a first acquisition module 1010, an identification module 1020, a second acquisition module 1030, a splicing module 1040, an obtaining module 1050, and a determination module 1060.

[0198] The first acquisition module 1010 is used to acquire target questions related to the target domain. In one embodiment, the first acquisition module 1010 can be used to perform the operation S210 described above, which will not be repeated here.

[0199] The identification module 1020 is used to perform named entity recognition on the target problem based on entity position weight parameters to obtain at least one entity in the target problem. In one embodiment, the identification module 1020 can be used to perform the operation S220 described above, which will not be repeated here.

[0200] The second acquisition module 1030 is used to acquire a set of triples corresponding to each entity from the target domain knowledge graph based on at least one entity. In one embodiment, the second acquisition module 1030 can be used to perform the operation S230 described above, which will not be repeated here.

[0201] The splicing module 1040 is used to splice the target question and the attribute value, for the attribute values in each set of triples, to obtain the input sequence corresponding to the attribute value. In one embodiment, the splicing module 1040 can be used to perform the operation S240 described above, which will not be repeated here.

[0202] The obtaining module 1050 is used to perform attribute extraction operations on the input sequence to obtain attribute results. In one embodiment, the obtaining module 1050 can be used to perform the operation S250 described above, which will not be repeated here.

[0203] The determination module 1060 is used to determine the relevant answers to the target question from the target domain knowledge graph based on multiple attribute results. In one embodiment, the determination module 1060 can be used to perform the operation S260 described above, which will not be repeated here.

[0204] According to embodiments of this disclosure, the first acquisition module 1010 includes a first acquisition unit, a matching unit, and a division unit.

[0205] The first acquisition unit is used to perform word segmentation and part-of-speech tagging on the user question to obtain the nouns and verbs in the user question.

[0206] The matching unit is used to match the nouns and verbs in the user's question with the target entity dictionary to obtain the matching results. The target entity dictionary includes all entities in the target domain knowledge graph.

[0207] The division unit is used to classify the user question as a target question related to the target domain when the matching result indicates that the nouns and verbs in the user question match the target entity dictionary.
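The matching and classification performed by these units can be sketched as follows. The sketch assumes that word segmentation and part-of-speech tagging have already been done by some external tool, that nouns and verbs carry the hypothetical tags "n" and "v", and that a single dictionary hit is enough to treat the question as in-domain; the text does not state whether one match or all matches are required.

```python
def is_target_question(tagged_tokens, entity_dictionary):
    """Return True if any noun or verb of the question is in the dictionary."""
    candidates = [word for word, tag in tagged_tokens if tag in ("n", "v")]
    return any(word in entity_dictionary for word in candidates)

# Hypothetical dictionary and pre-tagged questions, for illustration only.
entity_dictionary = {"loan", "deposit", "credit card"}
tagged = [("how", "r"), ("apply", "v"), ("loan", "n")]
in_domain = is_target_question(tagged, entity_dictionary)
off_topic = is_target_question([("weather", "n"), ("rain", "v")],
                               entity_dictionary)
```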

[0208] According to embodiments of this disclosure, the identification module 1020 includes a second obtaining unit, a third obtaining unit, a first extraction unit, a second extraction unit, an identification unit, and a first determining unit.

[0209] The second obtaining unit is used to perform an embedding operation on the target question to obtain the initial input vector.

[0210] The third obtaining unit is used to process the initial input vector using the entity position weight parameters to obtain the final input vector.

[0211] The first extraction unit is used to extract features from the final input vector to obtain the first vector sequence.

[0212] The second extraction unit is used to extract sequence features from the first vector sequence to obtain the second vector sequence.

[0213] The identification unit is used to perform entity recognition on the second vector sequence to obtain the entity annotation sequence.

[0214] The first determining unit is used to determine at least one entity in the target problem based on the entity annotation sequence.
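Determining entities from an entity annotation sequence can be sketched as below. The text does not name the tagging scheme, so this sketch assumes a character-level B/I/O scheme (B begins an entity, I continues it, O is outside any entity); the function name and the sample data are hypothetical.

```python
def decode_entities(characters, tags):
    """Collect entity strings from a B/I/O-tagged character sequence."""
    entities, current = [], []
    for char, tag in zip(characters, tags):
        if tag == "B":
            if current:  # close the previous entity before starting a new one
                entities.append("".join(current))
            current = [char]
        elif tag == "I" and current:
            current.append(char)
        else:  # "O", or a stray "I" with no open entity
            if current:
                entities.append("".join(current))
            current = []
    if current:  # close an entity that runs to the end of the question
        entities.append("".join(current))
    return entities

characters = list("rate of loan")
tags = ["O"] * 8 + ["B", "I", "I", "I"]
entities = decode_entities(characters, tags)
```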

[0215] According to embodiments of this disclosure, the third obtaining unit includes a segmentation subunit, a first obtaining subunit, a second obtaining subunit, and a third obtaining subunit.

[0216] The segmentation subunit is used to divide each question in the dataset related to the target domain into three parts: a front segment, a middle segment, and a rear segment.

[0217] The first obtaining subunit is used to obtain the entity front-segment position weight parameter based on the sum of the number of entities in the front segment of each question in the dataset and the sum of the number of entities in each question in the dataset.

[0218] The second obtaining subunit is used to obtain the entity middle-segment position weight parameter based on the sum of the number of entities in the middle segment of each question in the dataset and the sum of the number of entities in each question in the dataset.

[0219] The third obtaining subunit is used to obtain the entity rear-segment position weight parameter based on the sum of the number of entities in the rear segment of each question in the dataset and the sum of the number of entities in each question in the dataset.
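One way to read the three subunits above is that each weight parameter is the share of all entities that fall in the corresponding segment. The sketch below takes that reading; the ratio form, the split into equal thirds, the use of an entity's start position to assign it to a segment, and all names and sample data are assumptions rather than details given in the text.

```python
def segment_of(position, length):
    """Map a 0-based character position to 'front', 'middle', or 'rear'."""
    third = length / 3  # assumption: three segments of equal length
    if position < third:
        return "front"
    if position < 2 * third:
        return "middle"
    return "rear"

def position_weights(dataset):
    """dataset: list of (question_length, [entity_start_positions])."""
    counts = {"front": 0, "middle": 0, "rear": 0}
    for length, entity_positions in dataset:
        for position in entity_positions:
            counts[segment_of(position, length)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset contains no entities")
    # Assumed form: entities in the segment divided by all entities.
    return {name: count / total for name, count in counts.items()}

# Hypothetical dataset: question lengths with entity start positions.
dataset = [(9, [0, 7]), (12, [1]), (6, [3])]
weights = position_weights(dataset)
```

Under these assumptions the three parameters always sum to 1.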

[0220] According to embodiments of this disclosure, the third obtaining unit further includes a first generating subunit, a second generating subunit, a third generating subunit, a fourth obtaining subunit, and a fifth obtaining subunit.

[0221] The first generation subunit is used to randomly generate, within the first probability range determined by the entity front-segment position weight parameter, a probability value corresponding to each position in the front segment of the target question. The front segment of the target question is determined according to the entity front-segment position weight parameter.

[0222] The second generation subunit is used to randomly generate, within the second probability range determined by the entity middle-segment position weight parameter, a probability value corresponding to each position in the middle segment of the target question. The middle segment of the target question is determined according to the entity middle-segment position weight parameter.

[0223] The third generation subunit is used to randomly generate, within the third probability range determined by the entity rear-segment position weight parameter, a probability value corresponding to each position in the rear segment of the target question. The rear segment of the target question is determined according to the entity rear-segment position weight parameter.

[0224] The fourth obtaining subunit is used to obtain a one-dimensional weight vector corresponding to the target question based on the probability value corresponding to each position in the front, middle, and rear segments of the target question.

[0225] The fifth obtaining subunit is used to obtain the final input vector based on the one-dimensional weight vector and the initial input vector.
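The generation of the one-dimensional weight vector and its combination with the initial input vector can be sketched as follows. How the probability ranges and the segment boundaries are derived from the weight parameters is not shown here, so both are passed in as arguments; scaling each position's embedding by its weight is likewise only one plausible way to combine the two vectors. All names and values are hypothetical.

```python
import random

def weight_vector(length, boundaries, ranges, rng):
    """Draw one probability per position from its segment's range."""
    front_end, middle_end = boundaries  # exclusive ends of front and middle
    vector = []
    for position in range(length):
        if position < front_end:
            low, high = ranges["front"]
        elif position < middle_end:
            low, high = ranges["middle"]
        else:
            low, high = ranges["rear"]
        vector.append(rng.uniform(low, high))
    return vector

def apply_weights(initial_vectors, weights):
    """Assumed combination: scale each position's embedding by its weight."""
    return [[w * x for x in row] for row, w in zip(initial_vectors, weights)]

# Hypothetical ranges and boundaries for a six-character question.
ranges = {"front": (0.5, 1.0), "middle": (0.2, 0.5), "rear": (0.0, 0.2)}
rng = random.Random(0)  # seeded only to make the sketch repeatable
weights = weight_vector(6, (2, 4), ranges, rng)
initial_vectors = [[1.0, 2.0] for _ in range(6)]  # toy 2-dimensional embeddings
final_vectors = apply_weights(initial_vectors, weights)
```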

[0226] According to embodiments of this disclosure, the obtaining module 1050 includes a third extraction unit, a selection unit, and a fourth obtaining unit.

[0227] The third extraction unit is used to extract features from the input sequence to obtain the output vector.

[0228] The selection unit is used to select the target vector from the output vector as the feature vector of the input sequence.

[0229] The fourth obtaining unit is used to perform binary classification on the feature vector to obtain the attribute results.

[0230] According to embodiments of this disclosure, the determination module 1060 includes a fifth obtaining unit and a second determining unit.

[0231] The fifth obtaining unit is used to obtain, for each attribute result, the triple corresponding to the attribute value from the corresponding triple set when the attribute result indicates that the target question matches the attribute value.

[0232] The second determining unit is used to determine the relevant answers to the target question from the target domain knowledge graph based on multiple triples.

[0233] According to embodiments of this disclosure, any number of the first acquisition module 1010, identification module 1020, second acquisition module 1030, splicing module 1040, obtaining module 1050, and determination module 1060 can be combined into one module, or any one of these modules can be split into multiple modules. Alternatively, at least some of the functions of one or more of these modules can be combined with at least some of the functions of other modules and implemented in one module. According to embodiments of this disclosure, at least one of the first acquisition module 1010, identification module 1020, second acquisition module 1030, splicing module 1040, obtaining module 1050, and determination module 1060 can be at least partially implemented as hardware circuitry, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-package, or an application-specific integrated circuit (ASIC), or implemented in hardware or firmware by any other reasonable means of integrating or packaging the circuitry, or implemented in any one of software, hardware, and firmware, or in any suitable combination of these three. Alternatively, at least one of the first acquisition module 1010, identification module 1020, second acquisition module 1030, splicing module 1040, obtaining module 1050, and determination module 1060 can be at least partially implemented as a computer program module, which can perform the corresponding functions when run.

[0234] Figure 11 schematically illustrates a block diagram of an electronic device suitable for implementing an intelligent question-answering method according to an embodiment of the present disclosure.

[0235] As shown in Figure 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The processor 1101 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 1101 may also include onboard memory for caching purposes. The processor 1101 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of the present disclosure.

[0236] RAM 1103 stores various programs and data required for the operation of electronic device 1100. Processor 1101, ROM 1102, and RAM 1103 are interconnected via bus 1104. Processor 1101 performs various operations of the method flow according to embodiments of the present disclosure by executing programs in ROM 1102 and / or RAM 1103. It should be noted that the programs may also be stored in one or more memories other than ROM 1102 and RAM 1103. Processor 1101 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in said one or more memories.

[0237] According to embodiments of this disclosure, the electronic device 1100 may further include an input / output (I / O) interface 1105, which is also connected to a bus 1104. The electronic device 1100 may also include one or more of the following components connected to the input / output (I / O) interface 1105: an input section 1106 including a keyboard, mouse, etc.; an output section 1107 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 1108 including a hard disk, etc.; and a communication section 1109 including a network interface card such as a LAN card, modem, etc. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the input / output (I / O) interface 1105 as needed. A removable medium 1111, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 1110 as needed so that computer programs read from it can be installed into the storage section 1108 as needed.

[0238] This disclosure also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs that, when executed, implement the method according to the embodiments of this disclosure.

[0239] According to embodiments of this disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as including, but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this disclosure, the computer-readable storage medium may include ROM 1102 and / or RAM 1103 and / or one or more memories other than ROM 1102 and RAM 1103 described above.

[0240] Embodiments of this disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. When the computer program product is run on a computer system, the program code causes the computer system to implement the intelligent question-answering method provided in the embodiments of this disclosure.

[0241] When the computer program is executed by the processor 1101, it performs the functions defined in the system / apparatus of this disclosure embodiments. According to embodiments of this disclosure, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0242] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and may be downloaded and installed via the communication section 1109, and / or installed from the removable medium 1111. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.

[0243] In such an embodiment, the computer program can be downloaded and installed from a network via communication section 1109, and / or installed from removable medium 1111. When the computer program is executed by processor 1101, it performs the functions defined in the system of this disclosure embodiment. According to embodiments of this disclosure, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0244] According to embodiments of this disclosure, program code for executing the computer programs provided in embodiments of this disclosure can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code can execute entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0245] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0246] Those skilled in the art will understand that the features described in the various embodiments and/or claims of this disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in this disclosure. In particular, the features described in the various embodiments and/or claims of this disclosure can be combined and/or integrated in various ways without departing from the spirit and teachings of this disclosure. All such combinations and/or integrations fall within the scope of this disclosure.

[0247] The embodiments of this disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. The scope of this disclosure is defined by the appended claims and their equivalents. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of this disclosure, and all such substitutions and modifications should fall within the scope of this disclosure.

Claims

1. An intelligent question-answering method, comprising:
acquiring a target question related to a target domain;
performing, based on entity position weight parameters, a named entity recognition operation on the target question to obtain at least one entity in the target question, including: embedding the target question to obtain an initial input vector; processing the initial input vector using the entity position weight parameters to obtain a final input vector; extracting features from the final input vector to obtain a first vector sequence; extracting sequence features from the first vector sequence to obtain a second vector sequence; performing entity recognition on the second vector sequence to obtain an entity annotation sequence; and determining the at least one entity in the target question based on the entity annotation sequence;
wherein the entity position weight parameters include an entity front-segment position weight parameter, an entity middle-segment position weight parameter, and an entity rear-segment position weight parameter, and the entity position weight parameters are calculated by: dividing each question in a dataset related to the target domain into three parts, namely a front segment, a middle segment, and a rear segment; obtaining the entity front-segment position weight parameter based on the sum of the number of entities in the front segment of each question in the dataset and the sum of the number of entities in each question in the dataset; obtaining the entity middle-segment position weight parameter based on the sum of the number of entities in the middle segment of each question in the dataset and the sum of the number of entities in each question in the dataset; and obtaining the entity rear-segment position weight parameter based on the sum of the number of entities in the rear segment of each question in the dataset and the sum of the number of entities in each question in the dataset;
obtaining, based on the at least one entity, a set of triples corresponding to each entity from a target domain knowledge graph;
for each attribute value in each set of triples, concatenating the target question and the attribute value to obtain an input sequence corresponding to the attribute value;
performing an attribute extraction operation on the input sequence to obtain an attribute result; and
determining, based on multiple attribute results, a relevant answer to the target question from the target domain knowledge graph.

2. The method according to claim 1, wherein acquiring the target question related to the target domain includes: performing word segmentation and part-of-speech tagging on a user question to obtain the nouns and verbs in the user question; matching the nouns and verbs in the user question against a target entity dictionary to obtain a matching result, wherein the target entity dictionary includes all entities in the target domain knowledge graph; and, if the matching result indicates that the nouns and verbs in the user question match the target entity dictionary, classifying the user question as the target question related to the target domain.

3. The method according to claim 1, wherein processing the initial input vector using the entity position weight parameters to obtain the final input vector includes: randomly generating, within a first probability range determined by the entity front-segment position weight parameter, a probability value corresponding to each position in the front segment of the target question, wherein the front segment of the target question is determined according to the entity front-segment position weight parameter; randomly generating, within a second probability range determined by the entity middle-segment position weight parameter, a probability value corresponding to each position in the middle segment of the target question, wherein the middle segment of the target question is determined according to the entity middle-segment position weight parameter; randomly generating, within a third probability range determined by the entity rear-segment position weight parameter, a probability value corresponding to each position in the rear segment of the target question, wherein the rear segment of the target question is determined according to the entity rear-segment position weight parameter; obtaining a one-dimensional weight vector corresponding to the target question based on the probability value corresponding to each position in the front, middle, and rear segments of the target question; and obtaining the final input vector based on the one-dimensional weight vector and the initial input vector.

4. The method according to claim 1, wherein performing the attribute extraction operation on the input sequence to obtain the attribute result includes: performing feature extraction on the input sequence to obtain an output vector; selecting a target vector from the output vector as the feature vector of the input sequence; and performing a binary classification operation on the feature vector to obtain the attribute result.

5. The method according to claim 1, wherein determining the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results includes: for each attribute result, if the attribute result indicates that the target question matches the attribute value, obtaining the triple corresponding to the attribute value from the corresponding set of triples; and determining, based on multiple triples, the relevant answer to the target question from the target domain knowledge graph.

6. An intelligent question-answering device, comprising:
a first acquisition module, used to acquire a target question related to a target domain;
an identification module, used to perform a named entity recognition operation on the target question based on entity position weight parameters to obtain at least one entity in the target question, wherein the identification module includes a second obtaining unit, a third obtaining unit, a first extraction unit, a second extraction unit, an identification unit, and a first determining unit; the second obtaining unit is used to perform an embedding operation on the target question to obtain an initial input vector; the third obtaining unit is used to process the initial input vector using the entity position weight parameters to obtain a final input vector; the first extraction unit is used to extract features from the final input vector to obtain a first vector sequence; the second extraction unit is used to extract sequence features from the first vector sequence to obtain a second vector sequence; the identification unit is used to perform entity recognition on the second vector sequence to obtain an entity annotation sequence; the first determining unit is used to determine the at least one entity in the target question based on the entity annotation sequence; the entity position weight parameters include an entity front segment position weight parameter, an entity middle segment position weight parameter, and an entity rear segment position weight parameter; the third obtaining unit includes a segmentation subunit, a first obtaining subunit, a second obtaining subunit, and a third obtaining subunit; the segmentation subunit is used to divide each question in a dataset related to the target domain into three parts: a front segment, a middle segment, and a rear segment; the first obtaining subunit is used to obtain the entity front segment position weight parameter based on the sum of the number of entities in the front segment of each question in the dataset and the sum of the number of entities in each question in the dataset; the second obtaining subunit is used to obtain the entity middle segment position weight parameter based on the sum of the number of entities in the middle segment of each question in the dataset and the sum of the number of entities in each question in the dataset; and the third obtaining subunit is used to obtain the entity rear segment position weight parameter based on the sum of the number of entities in the rear segment of each question in the dataset and the sum of the number of entities in each question in the dataset;
a second acquisition module, used to acquire a set of triples corresponding to each entity from the target domain knowledge graph based on the at least one entity;
a concatenation module, used to concatenate, for an attribute value in each set of triples, the target question and the attribute value to obtain an input sequence corresponding to the attribute value;
an acquisition module, used to perform an attribute extraction operation on the input sequence to obtain an attribute result; and
a determination module, used to determine the relevant answer to the target question from the target domain knowledge graph based on multiple attribute results.
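
By way of non-limiting illustration, the segmentation subunit and the three obtaining subunits of claim 6 might be sketched as follows. The claim states only that each weight parameter is based on two sums, so this sketch assumes the parameter is their ratio, assumes each question is divided into equal thirds, and assigns an entity to a segment by its starting position. The input format, a list of (question length, entity start positions) pairs, is hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical dataset record: (question length, start positions of its entities).
Question = Tuple[int, Sequence[int]]


def position_weights(questions: List[Question]) -> Tuple[float, float, float]:
    """Return (front, middle, rear) entity position weight parameters.

    Assumption: each weight is the number of entities found in that segment
    across the dataset divided by the total number of entities in the dataset.
    """
    counts = [0, 0, 0]
    for length, positions in questions:
        front_end = length // 3          # assumed equal-thirds segmentation
        middle_end = (2 * length) // 3
        for pos in positions:
            if pos < front_end:
                counts[0] += 1
            elif pos < middle_end:
                counts[1] += 1
            else:
                counts[2] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("dataset contains no entities")
    return counts[0] / total, counts[1] / total, counts[2] / total
```

Because the three counts partition all entities, the resulting weights sum to 1 under this assumption.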

7. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the intelligent question-answering method according to any one of claims 1 to 5.

8. A computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the intelligent question-answering method according to any one of claims 1 to 5.

9. A computer program product comprising a computer program that, when executed by a processor, implements the intelligent question-answering method according to any one of claims 1 to 5.