Method for determining reply content, storage medium and electronic device
By combining semantic matching and word segmentation filtering with keyword matching, the intelligent question-answering system improves the accuracy of responses in private domain question-answering, solves the problem of inaccurate responses in existing technologies, and provides high-quality response services.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HAIER YOUJIA INTELLIGENT TECH (BEIJING) CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
Figure CN122240752A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of smart home technology, and more specifically, to a method for determining the content of a response, a storage medium, and an electronic device.
Background Technology
[0002] Against the backdrop of rapid development in artificial intelligence technology, intelligent question-answering systems have become an important bridge connecting users and information, especially in scenarios such as smart homes, e-commerce, and customer service, where their value is increasingly prominent. However, the performance and user experience of intelligent question-answering systems directly depend on the accuracy of their answers.
[0003] This challenge is particularly pronounced in private domain Q&A, which refers to Q&A scenarios within specific fields (such as smart homes, smart furniture, etc.) or industries. This is because private domain Q&A often involves a large number of technical terms, specific entities, and complex contexts. This requires the Q&A system to not only have strong text understanding capabilities, but also to accurately identify and match user questions with questions stored in the database.
[0004] Currently, intelligent question-answering systems largely rely on rule-based matching and word2vec (Word to Vector) methods. However, these methods have significant limitations when handling private domain question answering. Rule-based matching struggles to cover all possible variations in questions, while word2vec, although capable of capturing word similarities, struggles to handle deep semantic relationships and understand entities within the context. Ultimately, this results in low accuracy of responses to user queries in question-answering scenarios.
[0005] Therefore, there is an urgent need for a new method to determine the content of responses to overcome the shortcomings of conventional techniques and improve the accuracy of responses.
Summary of the Invention
[0006] This application provides a method and apparatus, storage medium, and electronic device for determining the content of a response, in order to at least solve the problem of low accuracy of the response content.
[0007] According to one aspect of the embodiments of this application, a method for determining the content of a response is provided, comprising: obtaining a query statement sent by a user terminal; performing semantic matching on a question set based on the query statement to match multiple similar questions corresponding to the query statement; wherein the question set includes multiple standardized questions; if the multiple similar questions meet the word segmentation filtering conditions, performing word segmentation filtering on the multiple similar questions, and determining a target similar question matching the query statement from the multiple similar questions based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering; and determining the response content of the query statement based on the target similar question and the keywords in the query statement.
[0008] In an exemplary embodiment, the step of performing semantic matching on a set of questions based on the query statement to match multiple similar questions corresponding to the query statement includes: performing vectorization representation processing on the query statement to obtain a first vector representation of the query statement; performing vectorization representation processing on the multiple standardized questions to obtain second vector representations corresponding to the multiple standardized questions respectively; calculating the similarity between the first vector representation and each of the second vector representations respectively; and determining the standardized questions corresponding to the similarity that meet the preset similarity conditions as multiple similar questions matching the query statement.
[0009] In an exemplary embodiment, the step of performing word segmentation filtering on the multiple similar questions when the multiple similar questions meet the word segmentation filtering conditions, and determining the target similar question matching the query statement from the multiple similar questions based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering, includes: determining the similarity corresponding to the multiple similar questions; when the similarity of the multiple similar questions is within a preset similarity range, performing word matching between the candidate keywords corresponding to each of the multiple similar questions and the keywords in the query statement, and determining the number of identical words between the query statement and the multiple similar questions; and determining the target similar question matching the query statement from the multiple similar questions based on the similarity corresponding to the multiple similar questions and the number of identical words.
[0010] In an exemplary embodiment, the similarity range includes a first similarity range determined by a first similarity threshold and a second similarity threshold; the first similarity threshold is greater than the second similarity threshold; determining the target similar question matching the query statement from the plurality of similar questions based on the similarity corresponding to the plurality of similar questions and the number of identical words includes: if any one of the similarity values of the plurality of similar questions is less than the first similarity threshold and greater than or equal to the second similarity threshold, determining the statement length of the similar question with the most identical words; if the statement length meets a preset length condition, determining the similar question with the most identical words as the target similar question matching the query statement; if the statement length does not meet the preset length condition, determining the similar question corresponding to the maximum similarity as the target similar question matching the query statement.
[0011] In an exemplary embodiment, the similarity range includes a second similarity range determined by a second similarity threshold and a third similarity threshold; the second similarity threshold is greater than the third similarity threshold; determining the target similar question matching the query statement from the plurality of similar questions based on the similarity corresponding to the plurality of similar questions and the number of identical words includes: if any one of the similarity of the plurality of similar questions is less than the second similarity threshold and greater than or equal to the third similarity threshold, determining the similar question with the most identical words as the target similar question matching the query statement.
[0012] In an exemplary embodiment, the method further includes: when the plurality of similar questions do not meet the word segmentation filtering conditions, determining the similarity corresponding to the plurality of similar questions respectively, and determining the similar question with the highest similarity as the target similar question that matches the query statement.
[0013] In an exemplary embodiment, determining the response content of the query statement based on the target similarity question and the keywords in the query statement includes: performing vectorization representation processing on the target similarity question to obtain a third vector representation of the target similarity question, and performing semantic matching in a preset corpus based on the third vector representation to determine a first response content corresponding to the target similarity question; performing keyword matching in the preset corpus based on the keywords in the query statement to determine a second response content corresponding to the keywords in the query statement; and performing fusion processing on the first response content and the second response content to obtain the response content of the query statement.
[0014] In an exemplary embodiment, the step of performing keyword matching in a preset corpus based on keywords in the query statement to determine the second response content corresponding to the keywords in the query statement includes: performing keyword matching in the preset corpus based on keywords in the query statement; if no query result matching the keywords is found in the preset corpus, filtering the keywords in the query statement according to a set keyword category to obtain target keywords; performing keyword matching again in the preset corpus based on the target keywords to determine the query result corresponding to the target keywords; and determining the query result corresponding to the target keywords as the second response content corresponding to the keywords in the query statement.
[0015] According to another aspect of the embodiments of this application, a device for determining response content is also provided, comprising: an acquisition module, configured to acquire a query statement sent by a user terminal; a matching module, configured to perform semantic matching in a question set based on the query statement, and match multiple similar questions corresponding to the query statement; wherein, the question set includes multiple standardized questions; a filtering module, configured to perform word segmentation filtering on the multiple similar questions if the multiple similar questions meet the word segmentation filtering conditions, and determine a target similar question matching the query statement from the multiple similar questions based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering; and a determination module, configured to determine the response content of the query statement based on the target similar question and keywords in the query statement.
[0016] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to execute the above-described method for determining the response content when it is run.
[0017] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the method for determining the response content through the computer program.
[0018] According to another aspect of the embodiments of this application, a computer program product is also provided, including a computer program, wherein when the computer program is executed by a processor, the above-described method for determining the response content is implemented.
[0019] This application acquires query statements sent by the user and performs semantic matching on these statements within a question set. This allows for a deeper understanding of the query's semantics, identifying multiple similar questions. If word segmentation filtering conditions are met, further word segmentation filtering is applied to these similar questions, ensuring that the final identified target similar questions not only match the query semantically but also ensure word-level matching. Finally, the response content is determined based on the target similar questions and keywords in the query statement. This approach, which considers both the overall query statement and keyword dimensions, improves the reliability and accuracy of the response, thus addressing the issue of poor response accuracy.
Attached Figure Description
[0020] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0021] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0022] Figure 1 is a schematic diagram of the hardware environment for an interaction method of a smart device according to an embodiment of this application;
[0023] Figure 2 is a flowchart of a method for determining the content of a response according to an embodiment of this application;
[0024] Figure 3 is a structural block diagram of a response content determination device according to an embodiment of this application;
[0025] Figure 4 is a schematic diagram of the structure of an optional electronic device according to an embodiment of this application.
Detailed Implementation
[0026] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0027] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0028] According to one aspect of the embodiments of this application, a method for determining the content of a response is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as smart homes, smart home ecosystems, and intelligent house ecosystems. Optionally, in this embodiment, the method can be applied to the hardware environment shown in Figure 1, which consists of terminal device 102 and server 104. As shown in Figure 1, server 104 is connected to terminal device 102 via a network and can be used to provide services (such as application services) to the terminal or to clients installed on the terminal. A database can be set up on the server or independently of the server to provide data storage services for server 104. Cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data processing services for server 104.
[0029] The aforementioned network may include, but is not limited to, at least one of the following: wired network, wireless network. The aforementioned wired network may include, but is not limited to, at least one of the following: wide area network, metropolitan area network, local area network. The aforementioned wireless network may include, but is not limited to, at least one of the following: Wi-Fi (Wireless Fidelity), Bluetooth. The terminal device 102 may include, but is not limited to, a PC, mobile phone, tablet computer, smart air conditioner, smart range hood, smart refrigerator, smart oven, smart stove, smart washing machine, smart water heater, smart washing equipment, smart dishwasher, smart projector, smart TV, smart clothes rack, smart curtains, smart audio-visual equipment, smart socket, smart speaker, smart speaker box, smart fresh air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, smart robot vacuum cleaner, smart window cleaning robot, smart mopping robot, smart air purifier, smart steam oven, smart microwave oven, smart water dispenser, smart door lock, etc.
[0030] To address the aforementioned issues, this embodiment provides a method for determining the response content, which can be applied in, but is not limited to, whole-house smart digital control scenarios. Figure 2 is a flowchart of a method for determining the content of a response according to an embodiment of this application. The process includes the following steps S202-S208:
[0031] Step S202: Obtain the query statement sent by the user;
[0032] The query is sent from the user's client to the intelligent question-answering system. Users can input their query on the user's client-side interface (such as a chatbot, voice assistant, or search box) when interacting with the intelligent question-answering system. The query can be entered directly as text or converted from the user's voice input.
[0033] For text input, the intelligent question-answering system can receive the text data containing the user's query through a communication interface. For voice input, the system can first convert the voice signal into text using speech recognition technology and then receive the resulting text data.
[0034] In some embodiments, after receiving a query, the intelligent question-answering system performs preliminary parsing, which may include removing irrelevant punctuation, converting capitalization, and performing simple grammar and spell checks to ensure the accuracy of subsequent processing. Furthermore, the parsing process may also include identifying entities and intents in the query to prepare for subsequent semantic matching and keyword extraction.
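The preliminary parsing described above can be illustrated with a minimal sketch. The function name `normalize_query` and the particular set of punctuation marks treated as irrelevant are assumptions made for illustration; the application does not specify them, and grammar checking, spell checking, and entity or intent recognition are not covered here.

```python
import string


def normalize_query(text: str) -> str:
    """Lowercase the query, replace punctuation with spaces, collapse whitespace."""
    # Assumption: ASCII punctuation plus a few common full-width marks
    # are treated as "irrelevant punctuation".
    punctuation = string.punctuation + "，。？！、；：“”‘’（）"
    table = str.maketrans({ch: " " for ch in punctuation})
    return " ".join(text.lower().translate(table).split())
```

A real system would likely keep punctuation that carries meaning in the domain (for example, hyphens inside model numbers), so the punctuation set would need tuning.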
[0035] Step S204: Based on the query statement, perform semantic matching in the question set to match multiple similar questions corresponding to the query statement; wherein, the question set includes multiple standardized questions;
[0036] The question set is a pre-prepared database containing standardized questions widely encountered within a specific domain or private domain. These standardized questions are carefully designed and processed to cover common queries within the domain while minimizing variations in wording to facilitate system understanding and matching. Building the question set typically involves steps such as question collection, cleaning (removing redundant and irrelevant questions), classification, and standardization (formatting and synonym replacement). For example, for Q&A in the home appliance private domain, standardized questions in the question set can focus specifically on domain-specific keywords such as brand, model, and fault codes. By using word vectors or large model embeddings, not only can general semantics be understood and matched, but also private domain keywords can be specifically targeted, improving matching accuracy.
[0037] It should be noted that during semantic matching, the query sent by the user and each question in the question set can be transformed into comparable vector representations. This can be achieved, for example, using embedding models such as word2vec, BERT, or other large-scale model embedding techniques. These models capture the meaning of words and their contextual relationships within a sentence, thus transforming the entire sentence or statement into a vector that reflects its semantic features. After converting the query and each question in the question set into vectors, the system can perform matching within a vector space model. A vector space model is a data representation method that represents each text (in this case, a question) as a point in a multi-dimensional space, mapping similar text to nearby positions. By calculating the similarity between the query vector and the question vectors in the question set (e.g., using cosine similarity, Euclidean distance, etc.), the questions that are semantically most similar to the query can be identified.
[0038] In an exemplary embodiment, semantic matching is performed on a set of questions based on the query statement to match multiple similar questions corresponding to the query statement, including: performing vectorization representation processing on the query statement to obtain a first vector representation of the query statement; performing vectorization representation processing on the multiple standardized questions to obtain second vector representations corresponding to the multiple standardized questions respectively; calculating the similarity between the first vector representation and each of the second vector representations respectively; and determining the standardized questions corresponding to the similarity that meet the preset similarity conditions as multiple similar questions matching the query statement.
[0039] Specifically, vectorization is the process of converting text data into numerical vectors, enabling text information to be processed in machine learning models. For the query statement and the multiple standardized questions contained in the question set, vectorization processing is required to generate their respective first and second vector representations.
[0040] The query input by the user can first be preprocessed, including word segmentation, stop word removal, and stemming. Then, a pre-trained embedding model (such as word2vec, BERT, or a large-model embedding technique) can be used to convert each word in the sentence into a vector. Finally, a vector representation of the entire query, the first vector representation, can be generated through weighted averaging, concatenation, or a dedicated sentence model (such as Sentence-BERT). The preprocessing steps for the standardized questions in the question set are similar to those for the query. The same word embedding or sentence embedding model is then used to convert each standardized question into a vector representation, the second vector representation.
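As a minimal sketch of the weighted-averaging option mentioned above, the following pools per-word vectors into one sentence vector. The function name `sentence_vector`, the toy two-dimensional embeddings, and the per-word weights are hypothetical; in practice the word vectors would come from a pre-trained model, and a sentence model could replace this step entirely.

```python
def sentence_vector(tokens, embeddings, weights=None):
    """Weighted average of the word vectors of the tokens found in `embeddings`."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # assumption: out-of-vocabulary words are skipped
        weight = 1.0 if weights is None else weights.get(token, 1.0)
        for i, value in enumerate(vector):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return total  # no known words: return the zero vector
    return [value / weight_sum for value in total]
```

Giving domain keywords such as fault codes a larger weight is one way the pooled vector could be biased toward private-domain terms.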
[0041] In some embodiments, after obtaining the first vector representation of the query statement and the second vector representations of the standardized questions, the similarity between the first vector representation and each second vector representation can be calculated. For example, the cosine similarity can be calculated, that is, the cosine of the angle between two vectors, as a measure of their similarity. This value ranges from -1 to 1, where 1 indicates that the vectors point in the same direction (maximum similarity), 0 indicates no similarity, and -1 indicates opposite directions.
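The cosine similarity calculation can be sketched as follows. This is a plain-Python illustration under the assumption that vectors are lists of floats of equal length; returning 0.0 for a zero vector is a convention chosen here, not something the application specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as "no similarity"
    return dot / (norm_a * norm_b)
```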
[0042] Optionally, the Euclidean (straight-line) distance between the first vector representation and each second vector representation can also be calculated; the smaller the distance, the more similar the vectors. The Manhattan distance can also be calculated, which is the sum of the absolute differences between the two vectors in each dimension. As with the Euclidean distance, a smaller Manhattan distance indicates greater similarity, but it measures distance along the coordinate axes rather than along a straight line.
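The two distance measures can be sketched in a few lines. The function names are illustrative, and the vectors are assumed to be equal-length lists of numbers.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute per-dimension differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Unlike cosine similarity, both values are distances, so a ranking based on them would sort in ascending order.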
[0043] Understandably, based on the calculated similarity, a preset similarity condition, such as a similarity threshold, is set. Only when the similarity between the first vector representation of the query and the second vector representation of the standardized question is higher than or equal to this threshold is the standardized question considered a similar question matching the query. Alternatively, the top three standardized questions with the highest similarity can be identified as similar questions. This strategy ensures that the similar questions returned by the system are semantically close enough to the user's query, improving matching accuracy. All standardized questions that meet the preset similarity condition are collected to form a set of similar questions matching the query. This set contains multiple candidate questions, providing a foundation for subsequent steps such as word segmentation and keyword matching, ensuring that the system can understand and respond to the user's query from multiple perspectives.
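Both selection strategies described above (a similarity threshold, or keeping the few highest-scoring questions) can be sketched together. The function name `select_similar`, the `(question, similarity)` pair layout, and the example threshold are assumptions for illustration.

```python
def select_similar(scored, threshold=None, top_k=None):
    """Return (question, similarity) pairs ranked by similarity, filtered as requested.

    scored: list of (question, similarity) pairs.
    threshold: keep only pairs with similarity >= threshold, if given.
    top_k: keep only the first top_k ranked pairs, if given.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked
```

For instance, `select_similar(scored, top_k=3)` corresponds to the "top three" strategy, while `select_similar(scored, threshold=0.75)` corresponds to the threshold strategy with a hypothetical threshold value.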
[0044] In the above embodiments, through vectorized representation and similarity calculation, the intelligent question-answering system can identify multiple standardized questions that are semantically similar to the query statement. This process is not merely text matching, but also an application of semantic understanding and deep learning. In this way, the system can better understand user intent, thereby improving the accuracy of private domain question answering and the user experience.
[0045] Step S206: If the multiple similar questions meet the word segmentation filtering conditions, perform word segmentation filtering on the multiple similar questions. Based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering, determine the target similar question that matches the query statement from the multiple similar questions.
[0046] The word segmentation filtering conditions are the conditions used to determine whether word segmentation filtering should be performed. For example, a similarity range can be set, and the similarity of each of the multiple similar questions can be compared with this range. If the similarity of the multiple similar questions falls within the similarity range, it is determined that the word segmentation filtering conditions are met.
[0047] Similarity parameters are used to characterize the degree of similarity between similar questions and query statements. Each similar question has a corresponding similarity parameter. For example, for any given question, the similarity parameter can include the similarity between the similar question and the query statement, or it can include the number of identical words between the two statements. The higher the similarity between the similar question and the query statement, and the greater the number of identical words, the closer the similar question and the query statement are.
[0048] Optionally, the similarity of the multiple similar questions may not fall within the similarity range. For example, if the similarity of any one of the similar questions is greater than the maximum similarity in the similarity range, there is a similar question that is semantically very close to the query statement, and that similar question can be directly identified as the target similar question. If more than one similar question has a similarity greater than the maximum similarity in the similarity range, the similar question with the highest similarity can be selected as the target similar question.
[0049] Optionally, if the similarity of multiple similar questions does not fall within the similarity range, for example, if the similarity of all similar questions is less than the minimum similarity of the similarity range, indicating that the similar questions obtained by matching have little semantic relevance to the query statement, then it is not necessary to generate target similar questions that match the query statement, and the matching of subsequent response content can be directly carried out based on the query statement.
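The three outcomes described in the preceding paragraphs can be summarized in a small decision function. This is a sketch only: the function name, the returned labels, and the treatment of a similarity exactly equal to the upper bound are assumptions, and the bounds themselves would be set empirically.

```python
def filtering_decision(similarities, lower, upper):
    """Decide how to pick the target similar question from the best similarity.

    lower, upper: bounds of the preset similarity range (hypothetical values).
    """
    best = max(similarities)
    if best >= upper:
        # A candidate is semantically very close: take the most similar one directly.
        return "use_most_similar"
    if best < lower:
        # All candidates are weakly related: match the response on the query itself.
        return "no_target_question"
    # Otherwise the candidates fall within the range: apply word segmentation filtering.
    return "word_segmentation_filtering"
```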
[0050] In an exemplary embodiment, when the plurality of similar questions meet the word segmentation filtering conditions, word segmentation filtering is performed on the plurality of similar questions, and based on the similarity parameters corresponding to the plurality of similar questions obtained by word segmentation filtering, a target similar question matching the query statement is determined from the plurality of similar questions, including: determining the similarity corresponding to the plurality of similar questions; when the similarity of the plurality of similar questions is within a preset similarity range, the candidate keywords corresponding to each of the plurality of similar questions are matched with the keywords in the query statement to determine the number of identical words between the query statement and the plurality of similar questions; based on the similarity corresponding to the plurality of similar questions and the number of identical words, a target similar question matching the query statement is determined from the plurality of similar questions.
[0051] The preset similarity range serves as the basis for the next step of screening. The similarity range can be an interval, used to filter similar questions whose similarity falls within a specific range. For each similar question, candidate keywords can be extracted. Candidate keywords can include entity names (such as brand, model), technical terms, fault codes, categories, etc. Candidate keyword extraction can utilize natural language processing techniques, such as named entity recognition, part-of-speech tagging, and keyword extraction algorithms, such as Jieba word segmentation.
[0052] Specifically, the intelligent question-answering system matches the extracted candidate keywords with the keywords in the query statement one by one, determining the number of identical words between the query statement and each similar question. This ensures that the resulting target similar question is not only semantically similar but also contains the key information points from the user's query, improving matching accuracy. Furthermore, a comprehensive evaluation is performed based on the similarity of each similar question and the number of words it shares with the query statement. For example, a comprehensive score can be calculated for each similar question, such as a weighted average of similarity and keyword matching, or a more complex function that considers both factors. The intelligent question-answering system can select the similar question with the highest comprehensive score as the target similar question. This question closely matches the query statement both semantically and at the keyword level, and thus leads to the most relevant and accurate answer.
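The word matching and weighted-average scoring described above can be sketched as follows. The function names, the 0.7 weight, and the normalization of the shared-word count by the number of query keywords are assumptions for illustration; the application leaves the exact scoring function open.

```python
def shared_word_count(query_keywords, candidate_keywords):
    """Number of distinct keywords that appear in both keyword lists."""
    return len(set(query_keywords) & set(candidate_keywords))


def combined_score(similarity, shared, query_keyword_total, weight=0.7):
    """Weighted average of semantic similarity and keyword overlap ratio."""
    overlap = shared / query_keyword_total if query_keyword_total else 0.0
    return weight * similarity + (1 - weight) * overlap


def pick_target(query_keywords, candidates, weight=0.7):
    """candidates: list of (question, similarity, candidate_keywords) tuples."""
    total = len(set(query_keywords))

    def score(candidate):
        _, similarity, keywords = candidate
        shared = shared_word_count(query_keywords, keywords)
        return combined_score(similarity, shared, total, weight)

    return max(candidates, key=score)
```

With these hypothetical weights, a candidate with slightly lower semantic similarity can still be selected if it shares more keywords with the query.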
[0053] In some embodiments, when segmenting the query statement and the retrieved similar questions, Chinese word segmentation tools (such as Jieba) or corresponding tools for other languages can be used to segment words according to grammatical rules and dictionaries. For example, a custom dictionary can be imported. This dictionary can contain proper nouns in the home appliance field, brand names, product types, common fault descriptions, etc. Using a custom dictionary ensures that Jieba can identify private-domain terms during word segmentation and therefore segment the text more accurately. After word segmentation, the keyword extraction function provided by Jieba can be used to extract candidate keywords from the query statement or the standardized questions. Jieba's keyword extraction mechanism can automatically select key information points in the text based on information such as word frequency and part of speech. These keywords carry the core information of the question. The extracted keywords may still contain some irrelevant or low-information words, which can be filtered out by rules such as word frequency thresholds and part-of-speech conditions (e.g., retaining only nouns and adjectives), leaving the keywords that are truly relevant to the question. The keywords can also be sorted, giving priority to words that appear frequently in the query statement and have high information value.
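To illustrate why a custom dictionary matters, the sketch below uses simple forward maximum matching against a dictionary. This is a stand-in chosen to keep the example self-contained; it is not Jieba's actual algorithm or API, and the function name `segment` and the dictionary entries are hypothetical.

```python
def segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a dictionary that contains the product term "滚筒洗衣机", the text "滚筒洗衣机故障码E21" is kept as three meaningful tokens; without that entry, the product name is broken into smaller pieces, which would reduce the number of identical words found in later matching.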
[0054] In the above embodiments, the intelligent question-answering system can not only understand the semantics of the query statement, but also further refine the matching results through keyword matching, thereby providing high-quality question-answering services that are both semantically relevant and keyword-accurate in the home appliance private domain or any specific field.
[0055] In an exemplary embodiment, the similarity range includes a first similarity range determined by a first similarity threshold and a second similarity threshold; the first similarity threshold is greater than the second similarity threshold; determining the target similar question matching the query statement from the plurality of similar questions based on the similarity corresponding to the plurality of similar questions and the number of identical words includes: if any one of the similarity values of the plurality of similar questions is less than the first similarity threshold and greater than or equal to the second similarity threshold, determining the statement length of the similar question with the most identical words; if the statement length meets a preset length condition, determining the similar question with the most identical words as the target similar question matching the query statement; if the statement length does not meet the preset length condition, determining the similar question corresponding to the maximum similarity as the target similar question matching the query statement.
[0056] Understandably, if any of the similarity scores among the multiple similar questions is below the first similarity threshold but greater than or equal to the second similarity threshold, these questions match the query reasonably well, but further filtering is needed to determine the final answer. This further filtering considers the number of identical words, which ensures that the selected question both matches semantically and contains keywords or phrases from the query, improving the semantic and topical relevance of the match. Furthermore, the system can check whether the length of the similar question with the most identical words meets a preset length condition. Sentence length is considered in order to avoid questions that are too long and contain too much irrelevant information, or too short and omit key details. The preset length condition is an empirical value that can be adjusted based on the characteristics of the specific domain and experimental results. If the sentence length meets the preset length condition, the similar question with the most identical words is identified as the target similar question matching the query. This indicates that the question matches the query to the greatest extent semantically and topically, while also being of appropriate length, avoiding excessive redundancy or missing information.
[0057] Optionally, if the length of the similar question with the most identical words does not meet the preset length condition, the system adopts a fallback strategy and selects the question with the highest similarity as the target similar question. This is because, in some cases, even if the length condition is not met, a question with very high similarity is semantically very close to the query statement and is therefore more likely to provide an accurate and relevant answer.
[0058] In the above embodiments, in addition to initial screening based on similarity, the matching results are further refined by comprehensively considering the number of identical words and sentence length. This ensures that the intelligent question-answering system can provide high-quality answers that are semantically relevant, topically accurate, and of appropriate length within the home appliance private domain or other specific fields. This strategy improves the accuracy of question answering and the user experience, especially when dealing with semantically similar but differently worded questions, enabling a more flexible and accurate match to the user's actual needs.
[0059] In an exemplary embodiment, the similarity range includes a second similarity range determined by a second similarity threshold and a third similarity threshold; the second similarity threshold is greater than the third similarity threshold; determining the target similar question matching the query statement from the plurality of similar questions based on the similarity corresponding to the plurality of similar questions and the number of identical words includes: if any one of the similarity of the plurality of similar questions is less than the second similarity threshold and greater than or equal to the third similarity threshold, determining the similar question with the most identical words as the target similar question matching the query statement.
[0060] The intelligent question-answering system defines a second and a third similarity threshold, with the second threshold being higher than the third. These thresholds are set to define a broader matching range within the vector-represented similarity space, covering questions that may be semantically related to the query but whose similarity is insufficient to reach a higher threshold. The intelligent question-answering system can identify the most relevant questions at the keyword level by comparing the number of identical words between each similar question and the query, even if their similarity in vector representation is not the highest. By comparing the number of identical words, the system can overcome potential limitations in similarity calculation, more accurately identify the core content of the question, and ensure that users receive answers that are most relevant to their query intent and best solve their problem.
[0061] In some embodiments, the setting of the second and third similarity thresholds, as well as the comparison standard for the number of identical words, can be adjusted according to the specific scenario and experimental results of private domain question answering to find the optimal balance. Setting the second similarity threshold too high may cause valid questions to be missed, while setting the third similarity threshold too low may introduce too many irrelevant or low-quality questions. The comparison of the number of identical words can also be combined with the length and complexity of the question to determine a reasonable quantitative standard, avoiding mismatches of long and irrelevant questions due to a large number of identical words.
[0062] In the above embodiments, the intelligent question-answering system can more effectively filter and determine similar questions that best match the query statement within the home appliance private domain or other specific domains, improving the accuracy of question-answering and user experience. In particular, when dealing with private domain questions that are semantically similar but have diverse expressions, it can more flexibly and accurately identify the key information points of the question and provide high-quality responses.
[0063] In an exemplary embodiment, the method further includes: when the plurality of similar questions do not meet the word segmentation filtering conditions, determining the similarity corresponding to the plurality of similar questions respectively, and determining the similar question with the highest similarity as the target similar question that matches the query statement.
[0064] It is understandable that if the similarity of multiple similar questions does not fall within the similarity range, it means that the word segmentation filtering conditions are not met. For example, if the similarity of each similar question is greater than the maximum similarity in the similarity range, it indicates that there are similar questions that are semantically very close to the query statement. In this case, the similar question with the highest similarity can be selected as the target similar question.
[0065] Optionally, if the similarity of multiple similar questions does not fall within the similarity range, for example, if the similarity of all similar questions is less than the minimum similarity of the similarity range, indicating that the similar questions obtained by matching have little semantic relevance to the query statement, then it is not necessary to generate target similar questions that match the query statement, and the matching of subsequent response content can be directly carried out based on the query statement.
[0066] Step S208: Based on the target similar question and the keywords in the query statement, determine the response content of the query statement.
[0067] In an exemplary embodiment, determining the response content of the query statement based on the target similar question and the keywords in the query statement includes: performing vectorization representation processing on the target similar question to obtain a third vector representation of the target similar question, and performing semantic matching in a preset corpus based on the third vector representation to determine a first response content corresponding to the target similar question; performing keyword matching in the preset corpus based on the keywords in the query statement to determine a second response content corresponding to the keywords in the query statement; and performing fusion processing on the first response content and the second response content to obtain the response content of the query statement.
[0068] Understandably, the pre-set corpus is a database containing a large amount of text that has been processed and converted into vector representations. By calculating the similarity between the third vector representation and each text vector in the corpus, the system can find the texts that are semantically most relevant to the target similar question. These texts may contain answers to the target similar question or related information. Based on the results of the semantic matching above, the first response content corresponding to the target similar question is determined. The first response content refers to textual information that is highly semantically relevant to the target similar question; it can provide a direct answer to the question or useful information.
[0069] Furthermore, the intelligent question-answering system performs keyword matching within a pre-defined corpus based on keywords in the query. Keywords are words in the query that carry key information and may be directly related to the key points of the user's question. Through keyword matching, text related to the keywords in the query can be found in the corpus, thereby determining the content of the second response corresponding to the keywords in the query. This content may include explanations of the keywords, introductions to related concepts, or other information related to the keywords.
[0070] Finally, the first and second responses are merged to generate a response specifically for the query. This merging process may consider the semantic integrity of the first response and the keyword supplementary information of the second response to ensure that the final response comprehensively and accurately answers the user's question.
[0071] In some embodiments, when integrating response content, information from the first response can be given priority because it is obtained based on semantic matching. Simultaneously, supplementary information related to keywords from the second response can be incorporated into the final response to enhance its explanatory power and comprehensiveness. Furthermore, some optimization processes can be performed, such as removing redundant information and adjusting the expression structure to make the response more natural, fluent, and easy to understand.
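One simple way to realize the fusion described above is to keep the semantically matched passages first and append only those keyword-matched passages that add something new. The sketch below is an assumption about how such a merge could look; the function name and the whitespace- and case-insensitive duplicate check are illustrative choices, not details from this application.

```python
def fuse_responses(first, second):
    """Merge two lists of passages, giving priority to `first`.

    first:  passages found by semantic matching (first response content).
    second: passages found by keyword matching (second response content).
    Passages that differ only in case or spacing are treated as redundant.
    """
    fused, seen = [], set()
    for passage in list(first) + list(second):
        key = " ".join(passage.split()).lower()  # normalize for comparison
        if key and key not in seen:
            seen.add(key)
            fused.append(passage.strip())
    return fused
```

A production system would likely also rewrite the merged text for fluency, which this sketch does not attempt.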
[0072] In the above embodiments, the intelligent question-answering system can combine semantic understanding and keyword matching to provide more accurate and detailed answers, thereby improving the efficiency and user satisfaction of private domain question answering. This method shows unique advantages, especially when dealing with private domain questions that are semantically complex, have diverse keywords, or have unique expressions.
[0073] In an exemplary embodiment, the step of performing keyword matching in a preset corpus based on the keywords in the query statement to determine the second response content corresponding to the keywords in the query statement includes: performing keyword matching in the preset corpus based on the keywords in the query statement; if no query result matching the keywords is found in the preset corpus, filtering the keywords in the query statement according to a set keyword category to obtain target keywords; performing keyword matching again in the preset corpus based on the target keywords to determine the query result corresponding to the target keywords; and determining the query result corresponding to the target keywords as the second response content corresponding to the keywords in the query statement.
[0074] It should be noted that if no matching information is found in the preset corpus, it may be because the keywords are too specific, obscure, or the expression is inconsistent with the text in the corpus. In this case, the system will proceed with keyword filtering and re-matching to improve the success rate of the search and the accuracy of the answer.
[0075] Specifically, when no matching results are found, the intelligent question-answering system can perform keyword filtering according to predefined keyword categories. Keyword categories refer to classifying keywords into different categories based on their semantic role and importance in the query statement, such as main nouns, descriptive adjectives, operational verbs, and combinations of English letters and numbers. Keyword categories allow for a more intelligent determination of which keywords are core to the retrieval process and which may be redundant or secondary. Based on keyword filtering, target keywords can be selected from the query statement. Target keywords are those considered crucial to understanding the question; they more accurately represent the user's query intent, thereby improving the accuracy of subsequent matching. The intelligent question-answering system can use the identified target keywords to perform keyword matching in a pre-set corpus. This matching process will be more focused and precise because the target keywords have already removed potential interference or redundant information, retaining only the core vocabulary directly related to the question. Based on the re-matching results, the system determines the query results corresponding to the target keywords as the second response content. This content will be a more precise response to the query statement; it may directly answer the user's question, provide steps to solve the problem, explain the meaning of keywords, or provide relevant background information to help the user better understand the question and its answer.
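The filter-and-retry flow can be sketched as below. The category rules, the tag names, the choice to keep only codes and nouns, and plain substring matching are all assumptions made for illustration.

```python
import re

# Hypothetical rule: a token mixing English letters and digits, e.g. "E12".
CODE_PATTERN = re.compile(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]+")


def categorize(keyword, tag):
    """Map a keyword and its (assumed) part-of-speech tag to a category."""
    if CODE_PATTERN.fullmatch(keyword):
        return "code"
    return {"n": "noun", "a": "adjective", "v": "verb"}.get(tag, "other")


def match_all(corpus, keywords):
    """Return corpus entries containing every keyword (substring match)."""
    if not keywords:
        return []
    return [entry for entry in corpus if all(k in entry for k in keywords)]


def match_with_retry(corpus, tagged_keywords, kept_categories=("code", "noun")):
    """Match with all keywords; on failure, retry with target keywords only."""
    keywords = [word for word, _ in tagged_keywords]
    results = match_all(corpus, keywords)
    if results:
        return results
    targets = [
        word for word, tag in tagged_keywords
        if categorize(word, tag) in kept_categories
    ]
    return match_all(corpus, targets)
```

Because codes such as "E12" stay in the target set, a query about an unknown fault code still returns nothing rather than a near miss.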
[0076] In the above embodiments, the intelligent question-answering system can effectively handle complex queries that fail to find a direct match on the first attempt. Through intelligent keyword filtering and re-matching, it improves the retrieval efficiency and answer quality of private domain question-answering. This method is particularly suitable for queries containing a large number of technical terms, brand names, or other specific details, helping the system to more accurately understand and respond to user needs.
[0077] In steps S202-S208 above, the query statement sent by the user is obtained, and semantic matching is performed on the query statement within the question set. This allows for a deeper understanding of the query statement's semantics, identifying multiple similar questions corresponding to the query statement. If the word segmentation filtering conditions are met, further word segmentation filtering is performed on the similar questions, ensuring that the final identified target similar questions not only match the query statement semantically but also ensure word-level matching. Finally, the response content is determined based on the target similar questions and keywords in the query statement. This approach, which involves determining the response content from both the overall dimension of the query statement and the keyword dimension, improves the reliability and accuracy of the response content, thereby solving the problem of poor accuracy in response content.
[0078] Obviously, the embodiments described above are only some embodiments of this application, and not all embodiments. To better understand the above method, the following description, in conjunction with embodiments, illustrates the process, but is not intended to limit the technical solutions of the embodiments of this application:
[0079] In today's era of intelligent manufacturing and the Internet of Things, the demand for intelligent question answering is increasing, and the quality of the answers (including information quality and accuracy) is of paramount importance. High-quality answers can improve problem-solving efficiency and yield valuable information. Therefore, improving the accuracy of question answering within private domains is a significant challenge.
[0080] Table 1 shows the question-and-answer related data in a smart home scenario, formatted as follows.
[0081] Table 1
[0082]
[0083]
[0084] The method for determining the response content proposed in this application determines the response content for the query statement based on the results of matching the target similar question and the keywords in the query statement against a preset corpus. Specifically, it uses the posterior formula of Bayes' theorem:
[0085]
[0086] Here, $P(s_i \mid k_1, k_2, \ldots, k_d)$ is the posterior probability of matching a specific question $s_i$ given the keywords, where $k_1, \ldots, k_d$ are the keywords in the query. It can be seen that more keywords increase the likelihood of matching the specific question. Based on this, a hybrid matching approach is adopted that combines the target similar question with the keywords in the query: on the one hand, the target similar question is embedded for rapid semantic matching; on the other hand, keyword matching is performed to improve accuracy. This approach provides both semantic retrieval and keyword matching, but the number of keywords should be limited to avoid increasing matching time.
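The formula itself is not reproduced in this text. For reference, the standard form of Bayes' theorem written with the symbols used here (a stored question $s_i$ and query keywords $k_1, \ldots, k_d$) would read as follows. Whether the application uses exactly this form or a factorized variant is an assumption.

```latex
P(s_i \mid k_1, k_2, \ldots, k_d)
  = \frac{P(k_1, k_2, \ldots, k_d \mid s_i)\, P(s_i)}{P(k_1, k_2, \ldots, k_d)}
```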
[0087] When identifying similar questions, large model embedding is used. Large model embedding transforms questions into vector representations containing rich semantic information, enabling efficient question matching by calculating the distance or similarity between vectors. Questions that use different words but express the same meaning can still be identified, which significantly reduces the need to generalize questions into multiple phrasings. This deep learning-based method has a significant advantage in handling the complexity and diversity of language. In this application, cosine similarity is used to recall similar questions, as shown in the following formula:
[0088]
[0089] Here, cosineSimilarity denotes the calculated cosine similarity, $q$ denotes the user question, i.e., the query statement, and $Q_i$ denotes any standard or extended question, i.e., any standardized question in the question set. The cosine similarity between $Q_i$ and $q$ is calculated, and 1 is added to the result to prevent negative values.
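The shifted cosine similarity described here, and a simple top-k recall built on it, can be sketched in plain Python. The function names and the dictionary layout of the question vectors are assumptions; a real system would typically use a vector database rather than a linear scan.

```python
import math


def shifted_cosine_similarity(q, Q_i):
    """Cosine similarity of two vectors plus 1, so the result lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(q, Q_i))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_Q = math.sqrt(sum(b * b for b in Q_i))
    if norm_q == 0 or norm_Q == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_q * norm_Q) + 1.0


def recall(query_vec, question_vecs, k=3):
    """Return the k (score, question_text) pairs with the highest scores.

    question_vecs: dict mapping question text to its embedding vector.
    """
    scored = [
        (shifted_cosine_similarity(query_vec, vec), text)
        for text, vec in question_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

With the shift applied, identical directions score 2, orthogonal vectors score 1, and opposite directions score 0, which is why the thresholds used below are close to 2.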
[0090] The recalled result set is denoted as R. Similar questions in the result set can be ranked according to cosine similarity. The returned results are processed according to a threshold, and in this application, segmented processing is used:
[0091]
[0092] First, if any one of the similarity scores among the multiple similar questions is less than 1.95 but greater than or equal to 1.90, q is segmented into words (e.g., using Jieba), and the top three Q's in the recall set R (ranked by similarity score) are likewise segmented and counted. The words of q are compared with the words of each of the three Q's, and the Q with the most identical words is denoted $R_{max}$. If the text length of $R_{max}$ does not exceed 4/3 of that of q (a longer text contains more words and may therefore share many identical words while having a different meaning; the value 4/3 can be adjusted according to the actual situation), $R_{max}$ is determined as the target similar question. Otherwise, $R_0$, the most similar question among the recalled similar questions, is selected and determined as the target similar question.
[0093] Secondly, if any one of the similarity scores among the multiple similar questions is less than 1.90 but greater than or equal to 1.83, q is segmented into words (e.g., using Jieba), and the top three Q's in the recall set R (ranked by similarity score) are likewise segmented and counted. The words of q are compared with the words of each of the three Q's, and the Q with the most identical words, denoted $R_{max}$, is determined as the target similar question.
[0094] Finally, if the similarity scores of the similar questions are all less than 1.83, an empty result is returned and no target similar question is selected; if the similarity scores of the similar questions are greater than or equal to 1.95, the similar question with the highest similarity score is selected and determined as the target similar question.
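The segmented processing can be summarized in a short sketch. Two points are interpretations rather than statements from the text: the band is chosen by the highest similarity score in the recall set, and text length is measured in characters. The function and parameter names are hypothetical; the thresholds 1.95, 1.90, 1.83 and the ratio 4/3 are the values given above.

```python
def select_target(query, query_words, recalled,
                  t1=1.95, t2=1.90, t3=1.83, top_n=3, max_len_ratio=4 / 3):
    """Pick the target similar question, or None if nothing is close enough.

    recalled: list of (question_text, similarity, words) tuples, where
    similarity is the shifted cosine similarity in [0, 2].
    """
    if not recalled:
        return None
    ranked = sorted(recalled, key=lambda r: r[1], reverse=True)
    best_text, best_sim, _ = ranked[0]  # R_0, the most similar question

    if best_sim >= t1:   # very close match: trust similarity alone
        return best_text
    if best_sim < t3:    # too far: return empty
        return None

    # R_max: among the top_n questions, the one sharing the most words with q.
    query_set = set(query_words)
    r_max = max(ranked[:top_n], key=lambda r: len(query_set & set(r[2])))

    if best_sim >= t2:   # first band: also require a reasonable length
        if len(r_max[0]) <= max_len_ratio * len(query):
            return r_max[0]
        return best_text
    return r_max[0]      # second band: most identical words wins
```

Ties on the number of identical words resolve to the candidate with the higher similarity, because `max` keeps the first of equal items in the ranked list.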
[0095] In determining the response content based on similar questions and keywords, a flexible matching method is proposed for keyword matching. Flexible matching means appropriately reducing the number of keywords used for retrieval, or modifying the matching fields, based on the query results, so that a match can be found. For example, in the query "Why isn't my xx brand electric water heater producing hot water?", the keywords are "xx brand" and "electric water heater". If every keyword is required and no stored question containing "xx brand" is found in the database, an empty result is returned. However, in a private domain the data is generally curated, and general questions do not mention specific brands. Therefore, for this query, flexible matching requires "electric water heater" but not "xx brand" (the matching condition exists but is not mandatory). To further increase the likelihood of finding the correct answer, an additional search condition is added: the answer may contain (but is not required to contain) the keyword "xx brand". This hybrid retrieval significantly improves matching accuracy. Combinations of English letters and numbers are highly representative and specific, and cannot be omitted. For example, in "Gas display E12 fault", the keyword "E12" must always be present, because the question "Gas display E12 fault" does not exist in the database, whereas the question "Gas display E1 fault" does. If "E12" is not treated as a required keyword, the answer returned may be the one for "Gas display E1 fault".
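The required-versus-optional behaviour of flexible matching can be sketched as follows. Everything here is illustrative: the helper names, the whole-token matching rule (so that "E1" is not found inside "E12"), and ranking by the number of optional hits are assumptions, and the corpus is modelled as a list of (question, answer) pairs.

```python
import re


def is_code(keyword):
    """True for tokens mixing English letters and digits, such as "E12"."""
    return re.fullmatch(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]+", keyword) is not None


def contains_term(text, term):
    """Whole-token, case-insensitive search for `term` in `text`."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def split_keywords(keywords, core_terms):
    """Codes and core terms are required; every other keyword is optional."""
    required = [k for k in keywords if k in core_terms or is_code(k)]
    optional = [k for k in keywords if k not in required]
    return required, optional


def flexible_match(corpus, required, optional):
    """Keep entries whose question has all required terms; rank by optional hits."""
    hits = []
    for question, answer in corpus:
        if all(contains_term(question, k) for k in required):
            bonus = sum(
                contains_term(question, k) or contains_term(answer, k)
                for k in optional
            )
            hits.append((bonus, question, answer))
    hits.sort(key=lambda hit: -hit[0])  # stable: ties keep corpus order
    return [(question, answer) for _, question, answer in hits]
```

For the water heater query, `split_keywords` would make "electric water heater" required and "xx brand" optional, so brand-specific answers rank first without excluding general ones.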
[0096] The method for determining the response content proposed in this application was verified experimentally. The verification process is as follows: a large model was selected as the word embedding model, with a vector dimension of 1024; based on more than 20,000 data points, proper noun categories were extracted to form a vocabulary list, which serves two purposes: first, to provide a vocabulary for Jieba word segmentation, and second, to extract the nouns contained in questions during hybrid retrieval; the 20,000+ vectorized questions, along with the original questions and answers, were imported into a database, i.e., a predictive database; the elastic hybrid matching algorithm and the threshold segmentation algorithm were then tested. 1000 questions and answers were randomly selected, and 6 results (including matched questions and corresponding answers) were extracted, as shown in Table 2 below.
[0097] Table 2
[0098]
[0099]
[0100]
[0101]
[0102] The experimental results above show that large-scale model vectorization is very powerful in semantic understanding. However, relying solely on vector-based matching still has some accuracy issues. When the method presented in this application is used, answer quality improves. For example, for the question "The water heater temperature is not rising quickly," the matching result without the response content determination method provided in this application is "The water heater temperature is not rising," which is clearly a worse match than "The water heater temperature rises slowly," obtained with the method. The question "The heat pump reports F16 fault" is a non-existent question, and the result returned without the method is clearly incorrect; with the method, the result is "No answer." Other examples can be found in Table 2. 1000 questions and answers were randomly selected from 20,000 question-answer pairs and expanded into 3000 questions (different ways of asking the same question). The results of the tests are shown in Table 3 below.
[0103] Table 3
[0104]
[0105] Based on the experimental results and comparisons above, it was found that the elastic hybrid matching algorithm combined with the cosine similarity algorithm can capture more key information, thereby recalling more relevant and higher-quality answers and improving the overall accuracy of question answering. Furthermore, large-scale model vectorization has advantages over earlier word2vec methods, such as a larger vocabulary and richer semantics, enabling a better semantic understanding of the question's intent. As a result, questions that use different words but have the same meaning can still be matched. Answers can also be obtained through manual review; furthermore, in private domain question answering, manually compiling important vocabulary yields more relevant and practical results than algorithms. Overall, this method does improve question answering performance to some extent. However, its shortcomings include the need for manual review of vocabulary, and the need to further incorporate semantic-level ranking algorithms, or to use large models for ranking decisions, during the threshold segmentation stage, rather than relying solely on vocabulary.
[0106] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.
[0107] This embodiment also provides a device for determining the content of a response. This device is used to implement the above embodiments and preferred embodiments, and details already described will not be repeated. As used below, the term "module" can be a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.
[0108] Figure 3 is a structural block diagram of a response content determination device according to an embodiment of this application. The device includes:
[0109] The acquisition module 32 is used to acquire the query statement sent by the user.
[0110] The matching module 34 is used to perform semantic matching in the question set based on the query statement, and to match multiple similar questions corresponding to the query statement; wherein, the question set includes multiple standardized questions;
[0111] The filtering module 36 is used to perform word segmentation filtering on the multiple similar questions when the multiple similar questions meet the word segmentation filtering conditions, and to determine the target similar question that matches the query statement from the multiple similar questions based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering.
[0112] The determination module 38 is used to determine the response content of the query statement based on the target similar questions and the keywords in the query statement.
[0113] The aforementioned device acquires the query statement sent by the user and performs semantic matching based on the query statement within a question set. This allows for a deep understanding of the query statement's semantics, identifying multiple similar questions. If the word segmentation filtering conditions are met, the similar questions are further segmented and filtered, ensuring that the final identified target similar questions not only match the query statement semantically but also ensure word-level matching. Finally, the response content is determined based on the target similar questions and keywords in the query statement. This approach, which determines the response content from both the overall query statement and keyword perspectives, improves the reliability and accuracy of the response, thereby addressing the issue of poor response accuracy.
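The way the four modules hand data to one another can be pictured with a minimal sketch. The class and field names are hypothetical, and each module is reduced to a plain callable; the comments map them to the reference numerals used above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ResponseDeterminationDevice:
    acquisition: Callable[[], str]                          # acquisition module 32
    matching: Callable[[str], List[str]]                    # matching module 34
    filtering: Callable[[str, List[str]], Optional[str]]    # filtering module 36
    determination: Callable[[str, Optional[str]], str]      # determination module 38

    def run(self) -> str:
        """Pass the query through the four modules in order."""
        query = self.acquisition()
        similar_questions = self.matching(query)
        target = self.filtering(query, similar_questions)
        return self.determination(query, target)
```

Keeping each module behind a callable makes it straightforward to swap a software implementation for a hardware-backed one, in line with the remark that either is contemplated.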
[0114] In an exemplary embodiment, the matching module 34 is further configured to perform vectorization representation processing on the query statement to obtain a first vector representation of the query statement; perform vectorization representation processing on the plurality of standardized questions to obtain second vector representations corresponding to the plurality of standardized questions respectively; calculate the similarity between the first vector representation and each of the second vector representations respectively; and determine the standardized questions corresponding to the similarity that meet the preset similarity conditions as a plurality of similar questions that match the query statement.
[0115] In an exemplary embodiment, the filtering module 36 is further configured to determine the similarity corresponding to each of the plurality of similar questions; when the similarity of the plurality of similar questions is within a preset similarity range, the candidate keywords corresponding to each of the plurality of similar questions are matched with the keywords in the query statement to determine the number of identical words between the query statement and the plurality of similar questions; based on the similarity corresponding to the plurality of similar questions and the number of identical words, the target similar question matching the query statement is determined from the plurality of similar questions.
[0116] In an exemplary embodiment, the similarity range includes a first similarity range determined by a first similarity threshold and a second similarity threshold; the first similarity threshold is greater than the second similarity threshold; the filtering module 36 is further configured to, when any one of the similarities among the multiple similar questions is less than the first similarity threshold and greater than or equal to the second similarity threshold, determine the statement length of the similar question with the most identical words; when the statement length meets a preset length condition, determine the similar question with the most identical words as the target similar question matching the query statement; when the statement length does not meet the preset length condition, determine the similar question corresponding to the maximum similarity as the target similar question matching the query statement.
[0117] In an exemplary embodiment, the similarity range includes a second similarity range determined by a second similarity threshold and a third similarity threshold; the second similarity threshold is greater than the third similarity threshold; the filtering module 36 is further configured to determine the similar question with the most identical words as the target similar question matching the query statement if any one of the similarities of the multiple similar questions is less than the second similarity threshold and greater than or equal to the third similarity threshold.
[0118] In an exemplary embodiment, the above apparatus further includes: a processing module, configured to determine the similarity of the multiple similar questions respectively when the multiple similar questions do not meet the word segmentation filtering conditions, and to determine the similar question with the highest similarity as the target similar question that matches the query statement.
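The threshold logic of the three preceding embodiments can be summarized in one hedged sketch. Several points are assumptions rather than statements from the source: the band is chosen from the highest similarity among the candidates, the threshold values are placeholders, and the preset length condition is modeled as a caller-supplied predicate `length_ok` whose default (at most 12 tokens) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    question: str
    similarity: float
    identical_words: int


def select_target(candidates, t1=0.9, t2=0.8, t3=0.6,
                  length_ok=lambda q: len(q.split()) <= 12):
    """Pick the target similar question from scored candidates.

    t1 > t2 > t3 play the roles of the first, second, and third
    similarity thresholds; their values here are placeholders.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    by_similarity = max(candidates, key=lambda c: c.similarity)
    # Ties on word count are broken by similarity (an assumption).
    by_overlap = max(candidates, key=lambda c: (c.identical_words, c.similarity))
    top = by_similarity.similarity
    if t2 <= top < t1:
        # First range: trust word overlap only if the length condition holds.
        return by_overlap if length_ok(by_overlap.question) else by_similarity
    if t3 <= top < t2:
        # Second range: word overlap decides.
        return by_overlap
    # Filtering conditions not met: fall back to the highest similarity.
    return by_similarity
```

Passing the length condition as a predicate keeps the sketch neutral about whether the real condition is an upper bound, a lower bound, or something else.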
[0119] In an exemplary embodiment, the determining module 38 is further configured to: perform vectorization on the target similar question to obtain a third vector representation of the target similar question, and perform semantic matching in a preset corpus based on the third vector representation to determine a first response content corresponding to the target similar question; perform keyword matching in the preset corpus based on the keywords in the query statement to determine a second response content corresponding to the keywords in the query statement; and perform fusion processing on the first response content and the second response content to obtain the response content of the query statement.
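A minimal sketch of the two retrieval paths and their fusion follows. The source does not specify the fusion strategy, so order-preserving de-duplication is assumed here; token-set Jaccard overlap stands in for real vector search, and the corpus layout (a list of `question`/`answer` dictionaries) and all function names are hypothetical.

```python
def tokens(text: str) -> set:
    return set(text.lower().split())


def semantic_lookup(target_question: str, corpus) -> list:
    """Stand-in for vector search: answer of the entry most similar to the target."""
    target = tokens(target_question)

    def score(entry):
        other = tokens(entry["question"])
        union = target | other
        return len(target & other) / len(union) if union else 0.0

    return [max(corpus, key=score)["answer"]]


def keyword_lookup(query_keywords, corpus) -> list:
    """Answers of all entries whose question contains any query keyword."""
    wanted = set(query_keywords)
    return [e["answer"] for e in corpus if tokens(e["question"]) & wanted]


def fuse(first, second) -> list:
    """Assumed fusion: concatenate, dropping duplicates and keeping order."""
    seen, fused = set(), []
    for answer in [*first, *second]:
        if answer not in seen:
            seen.add(answer)
            fused.append(answer)
    return fused
```

Placing the semantic result first reflects one possible priority; a production system might instead rank or summarize the combined candidates.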
[0120] In an exemplary embodiment, the determining module 38 is further configured to perform keyword matching in a preset corpus based on the keywords in the query statement; if no query result matching the keywords is found in the preset corpus, the keywords in the query statement are filtered according to a set keyword category to obtain target keywords; keyword matching is then performed again in the preset corpus based on the target keywords to determine the query result corresponding to the target keywords; and the query result corresponding to the target keywords is determined as the second response content corresponding to the keywords in the query statement.
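The retry behavior described above can be sketched as follows, assuming that a keyword match requires every keyword to appear in a corpus entry and that keyword categories are available from a lookup table. The `KEYWORD_CATEGORIES` table, the `keep_categories` parameter, and the substring-based `search` helper are all hypothetical.

```python
# Hypothetical category table; the source only says keywords are
# filtered "according to a set keyword category".
KEYWORD_CATEGORIES = {
    "conditioner": "device",
    "filter": "component",
    "quickly": "modifier",
}


def search(keywords, corpus) -> list:
    """Return corpus entries that contain every keyword (substring match)."""
    if not keywords:
        return []
    return [text for text in corpus if all(k in text for k in keywords)]


def keyword_match_with_fallback(keywords, corpus,
                                keep_categories=("device", "component")):
    """Try all keywords first; on a miss, retry with category-filtered keywords."""
    results = search(keywords, corpus)
    if results:
        return results
    target_keywords = [k for k in keywords
                       if KEYWORD_CATEGORIES.get(k) in keep_categories]
    return search(target_keywords, corpus)
```

The fallback trades precision for recall: dropping low-value keywords lets the second search succeed where the stricter first search found nothing.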
[0121] Embodiments of this application also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when run.
[0122] Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
[0123] S1, retrieve the query statement sent by the user;
[0124] S2, based on the query statement, perform semantic matching in the question set to match multiple similar questions corresponding to the query statement; wherein, the question set includes multiple standardized questions;
[0125] S3, if the multiple similar questions meet the word segmentation filtering conditions, the multiple similar questions are segmented and filtered. Based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering, the target similar question that matches the query statement is determined from the multiple similar questions.
[0126] S4, based on the target similar question and the keywords in the query statement, determine the response content of the query statement.
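The control flow of steps S1 to S4 can be sketched as a single function whose stages are injected as callables. This only illustrates the ordering and the branch in S3; the parameter names and the `(question, similarity)` pair format are assumptions, and each callable stands for a component the source describes elsewhere.

```python
def determine_response(query, question_set, semantic_match,
                       needs_filtering, segment_filter, build_response):
    """Run S1-S4 for one query statement (S1: the query is passed in).

    semantic_match(query, question_set) -> list of (question, similarity)
    needs_filtering(candidates) -> bool, the word segmentation filtering conditions
    segment_filter(query, candidates) -> target similar question
    build_response(target, query) -> response content
    """
    candidates = semantic_match(query, question_set)          # S2
    if needs_filtering(candidates):                           # S3
        target = segment_filter(query, candidates)
    else:
        # Conditions not met: take the highest-similarity question.
        target = max(candidates, key=lambda pair: pair[1])[0]
    return build_response(target, query)                      # S4
```

Injecting the stages keeps the sketch independent of any particular embedding model, segmenter, or corpus.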
[0127] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.
[0128] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.
[0129] Embodiments of this application also provide an electronic device. As shown in Figure 4, the electronic device includes a memory 402 and a processor 404. The memory 402 stores a computer program, and the processor 404 is configured to execute the steps in any of the above method embodiments via the computer program.
[0130] Optionally, in this embodiment, the processor 404 can be configured to perform the following steps via a computer program:
[0131] S1, retrieve the query statement sent by the user;
[0132] S2, based on the query statement, perform semantic matching in the question set to match multiple similar questions corresponding to the query statement; wherein, the question set includes multiple standardized questions;
[0133] S3, if the multiple similar questions meet the word segmentation filtering conditions, the multiple similar questions are segmented and filtered. Based on the similarity parameters corresponding to the multiple similar questions obtained by word segmentation filtering, the target similar question that matches the query statement is determined from the multiple similar questions.
[0134] S4, based on the target similar question and the keywords in the query statement, determine the response content of the query statement.
[0135] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.
[0136] Optionally, as those skilled in the art will understand, the structure shown in Figure 4 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, the electronic device may include more or fewer components than those shown in Figure 4 (such as network interfaces), or have a configuration different from that shown in Figure 4.
[0137] The memory 402 can be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining the response content in this embodiment. The processor 404 executes various functional applications and data processing by running the software programs and modules stored in the memory 402, thereby implementing the aforementioned method for determining the response content. The memory 402 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 402 may further include memory remotely located relative to the processor 404, and such remote memory can be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof. Specifically, the memory 402 may be used to store, but is not limited to storing, information such as system configuration files. As an example, as shown in Figure 4, the memory 402 may include, but is not limited to, the acquisition module 32, matching module 34, filtering module 36, and determination module 38 of the above-described response content determination device. It may also include, but is not limited to, other module units of the above-described response content determination device (such as the first determination module and the second determination module), which are not elaborated further in this example.
[0138] Optionally, the transmission device 406 described above is used to receive or send data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 406 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable to communicate with the Internet or a local area network. In another example, the transmission device 406 is a Radio Frequency (RF) module, used for wireless communication with the Internet.
[0139] In addition, the above-mentioned electronic device also includes: a display 408; and a connection bus 410 for connecting the various module components in the above-mentioned electronic device.
[0140] Embodiments of this application also provide a computer program product, which includes a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.
[0141] Embodiments of this application also provide another computer program product, including a non-volatile computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in any of the above method embodiments.
[0142] The embodiments described herein also provide a computer program that includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the steps in any of the above method embodiments.
[0143] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.
[0144] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
[0145] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for determining the content of a response, characterized in that the method comprises: obtaining a query statement sent by a user; performing, based on the query statement, semantic matching in a question set to match multiple similar questions corresponding to the query statement, wherein the question set includes multiple standardized questions; when the multiple similar questions meet word segmentation filtering conditions, performing word segmentation filtering on the multiple similar questions, and determining, based on similarity parameters corresponding to the multiple similar questions obtained by the word segmentation filtering, a target similar question that matches the query statement from the multiple similar questions; and determining, based on the target similar question and keywords in the query statement, the response content of the query statement.
2. The method according to claim 1, characterized in that the step of performing semantic matching in the question set based on the query statement to match multiple similar questions corresponding to the query statement includes: vectorizing the query statement to obtain a first vector representation of the query statement; vectorizing the multiple standardized questions to obtain second vector representations corresponding to the respective standardized questions; calculating the similarity between the first vector representation and each of the second vector representations; and determining the standardized questions whose similarities meet the preset similarity condition as the multiple similar questions that match the query statement.
3. The method according to claim 1, characterized in that the step of performing word segmentation filtering on the multiple similar questions when the multiple similar questions meet the word segmentation filtering conditions, and determining, based on the similarity parameters corresponding to the multiple similar questions obtained by the word segmentation filtering, the target similar question that matches the query statement from the multiple similar questions includes: determining the similarity corresponding to each of the multiple similar questions; when the similarity of the multiple similar questions is within a preset similarity range, matching the candidate keywords corresponding to each of the multiple similar questions with the keywords in the query statement to determine the number of identical words between the query statement and the multiple similar questions; and determining, based on the similarity corresponding to the multiple similar questions and the number of identical words, the target similar question that matches the query statement from the multiple similar questions.
4. The method according to claim 3, characterized in that the similarity range includes a first similarity range determined by a first similarity threshold and a second similarity threshold; the first similarity threshold is greater than the second similarity threshold; the step of determining the target similar question matching the query statement from the multiple similar questions based on the similarity corresponding to the multiple similar questions and the number of identical words includes: if any one of the similarities of the multiple similar questions is less than the first similarity threshold and greater than or equal to the second similarity threshold, determining the statement length of the similar question with the most identical words; if the statement length meets a preset length condition, determining the similar question with the most identical words as the target similar question that matches the query statement; and if the statement length does not meet the preset length condition, determining the similar question corresponding to the maximum similarity as the target similar question that matches the query statement.
5. The method according to claim 3, characterized in that, The similarity range includes a second similarity range determined by a second similarity threshold and a third similarity threshold; the second similarity threshold is greater than the third similarity threshold; the step of determining the target similar question matching the query statement from the plurality of similar questions based on the similarity corresponding to the plurality of similar questions and the number of identical words includes: If any one of the similarity scores of the multiple similar questions is less than the second similarity threshold but greater than or equal to the third similarity threshold, the similar question with the most identical words is determined as the target similar question that matches the query statement.
6. The method according to claim 1, characterized in that, The method further includes: If the multiple similar questions do not meet the word segmentation filtering conditions, the similarity of each of the multiple similar questions is determined, and the similar question with the highest similarity is determined as the target similar question that matches the query statement.
7. The method according to claim 1, characterized in that the step of determining the response content of the query statement based on the target similar question and the keywords in the query statement includes: vectorizing the target similar question to obtain a third vector representation of the target similar question, and performing semantic matching in a preset corpus based on the third vector representation to determine a first response content corresponding to the target similar question; performing keyword matching in the preset corpus based on the keywords in the query statement to determine a second response content corresponding to the keywords in the query statement; and performing fusion processing on the first response content and the second response content to obtain the response content of the query statement.
8. The method according to claim 7, characterized in that, The step of performing keyword matching in a preset corpus based on the keywords in the query statement to determine the second response content corresponding to the keywords in the query statement includes: Based on the keywords in the query statement, keyword matching is performed in a preset corpus. If no query result matching the keyword is found in the preset corpus, the keywords in the query statement are filtered according to the set keyword categories to obtain the target keyword. Based on the target keyword, keyword matching is performed again in the preset corpus to determine the query result corresponding to the target keyword. The query results corresponding to the target keyword are determined as the second response content corresponding to the keyword in the query statement.
9. A computer-readable storage medium, characterized in that, The computer-readable storage medium includes a stored program, wherein the program, when executed, performs the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that, The memory stores a computer program, and the processor is configured to execute the method of any one of claims 1 to 7 through the computer program.