Method and apparatus for text correction

By using a Natural Language Understanding (NLU) model to identify text intent categories and match positive keywords for error correction, this technology solves the problem of low error correction accuracy in existing technologies and achieves high-precision text error correction and intent recognition in both general and specific domains.

CN114970538B · Active · Publication Date: 2026-06-16 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2021-02-25
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing text correction technologies have low accuracy in both general and specific domains. General domain correction relies on an overly broad open domain, resulting in low accuracy, while specific domain correction relies on domain knowledge construction, which has strong limitations.

Method used

A Natural Language Understanding (NLU) model identifies the intent category of the text, and words whose contribution values to that intent exceed a threshold are identified as negative keywords. Positive keywords are then matched from the intent obfuscation list based on edit distance and used as the corrections. Combining the intent recognition results narrows the error correction scope. This approach is applicable to both general and specific domains.

Benefits of technology

It improves the accuracy of text correction and model performance, is applicable to both general and specific domains, and enhances the accuracy of subsequent intent recognition.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to the technical field of text processing in the field of artificial intelligence, and provides a text correction method and device. The method comprises the following steps: identifying the intent category of a text T through a natural language understanding (NLU) model to obtain a predicted intent A; when the predicted intent A does not match the user's expected intent B, determining the contribution value of each word in the text T to the predicted intent A, and selecting words with a contribution value greater than a threshold as negative keywords; for each negative keyword, matching a corresponding positive keyword from an intent obfuscation list according to an edit distance, and using the positive keyword as the corrected word for the negative keyword. The intent obfuscation list records each keyword that obfuscates the expected intent B into the predicted intent A. The application is a task-based key-text correction scheme, that is, one aimed at identifiable intent categories, and can be applied to both general and specific domains.

Description

Technical Field

[0001] This application relates to the technical field of text processing in the field of artificial intelligence, and in particular to methods and apparatuses for text error correction, computing devices and computer-readable storage media.

Background Technology

[0002] Text error correction primarily aims to detect errors in the raw text input and correct them using natural language processing techniques. The raw text can be scanned and recognized content from books and newspapers, content from social networks such as Sina Weibo and WeChat Moments, or user-input speech recognized by an Automatic Speech Recognition (ASR) module. These texts inevitably contain errors (or non-standard terminology), which can lead to a decrease in the accuracy of subsequent processing (such as text translation, text entity recognition, and intent recognition).

[0003] Existing error correction technologies can be broadly categorized into two types based on their target: general-domain error correction and domain-specific error correction. General-domain error correction targets text without domain limitations, primarily using features from pronunciation, glyphs, grammar, knowledge bases, and language models for error detection and correction. However, due to the vast scope of the open domain, its accuracy is not high. Domain-specific text correction modules mainly construct domain dictionaries and utilize fuzzy matching algorithms to obtain the text to be corrected. However, they only perform error correction within a specific domain and rely heavily on domain knowledge.

Summary of the Invention

[0004] In view of the above problems in the prior art, this application provides a method and apparatus for text correction, a computing device and a computer-readable storage medium. This application is a task-based key-text correction scheme (i.e., one aimed at identifiable intent categories), which can be applied to both general and specific domains.

[0005] To achieve the above objectives, the first aspect of this application provides a method for text correction, comprising:

[0006] The intention category of text T is identified by the Natural Language Understanding (NLU) model to obtain the predicted intention A;

[0007] When the predicted intent A does not match the user's expected intent B, the contribution value of each word in the text T to the predicted intent A is determined, and words with contribution values greater than a threshold are selected as negative keywords.

[0008] For each negative keyword, the keyword corresponding to the negative keyword is matched from the intent obfuscation list based on the edit distance and used as a positive keyword. The positive keyword is used as the word after correcting the negative keyword. The intent obfuscation list records each keyword that obfuscates the expected intent B into the predicted intent A.

[0009] As described above, when an error in text T causes the predicted intent to be inconsistent with the user's expected intent, this method can automatically identify the erroneous keywords (i.e., negative keywords) in text T and adaptively correct these negative keywords. Furthermore, during the derivation of negative keywords, the intent recognition results of NLU are incorporated. The negative keywords corresponding to the predicted intent are derived in reverse from the predicted intent identified by NLU, thus limiting text correction to specific keywords related to the task (i.e., related to the intent category output by NLU), narrowing the scope of text correction, and therefore improving model performance and correction accuracy. Moreover, the text correction method of this application is applicable to both general and specific domains.
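The steps in [0006] to [0008] can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the function names, the threshold of 0.5, the contribution values, and the dictionary lookup standing in for the edit-distance match are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple


def select_negative_keywords(
    words: List[str], contributions: List[float], threshold: float
) -> List[int]:
    """Return indices of words whose contribution to intent A exceeds the threshold."""
    return [i for i, c in enumerate(contributions) if c > threshold]


def correct_text(
    words: List[str],
    contributions: List[float],
    threshold: float,
    match_positive: Callable[[str], Optional[str]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace negative keywords with matched positive keywords.

    Returns the corrected words and the (negative, positive) pairs applied.
    """
    corrected = list(words)
    pairs = []
    for i in select_negative_keywords(words, contributions, threshold):
        positive = match_positive(words[i])
        if positive is not None:  # leave the word unchanged when nothing matches
            corrected[i] = positive
            pairs.append((words[i], positive))
    return corrected, pairs


# Hypothetical lookup standing in for the edit-distance match of [0008].
lookup = {"Dial": "Play"}
corrected, pairs = correct_text(
    ["Dial", "Movie", "A"], [0.7, 0.2, 0.1], 0.5, lookup.get
)
```

In this sketch only "Dial" exceeds the threshold, so it alone is replaced; the returned pairs are the kind of (negative keyword, corrected word) data that could later serve as training corpus.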

[0010] As one possible implementation of the first aspect, the text T is generated by a text correction module that corrects a source text.

[0011] The method further includes: forming text pairs from the negative keywords and the corrected words, to be used as a corpus for training the text correction module.

[0012] Based on the above, this application can also automatically construct text pairs of negative keywords and corrected words according to the text correction results, so that the text correction module that generates the text T can be trained and updated according to the text pair.

[0013] As one possible implementation of the first aspect, the NLU includes at least one self-attention layer;

[0014] The step of determining the contribution value of each word in the text T to the prediction intention A includes:

[0015] For the last K self-attention layers of NLU, obtain the attention score matrix of CLS for each layer; CLS is the prefix character added to the text T; the attention score matrix of CLS includes the attention score of CLS relative to each word in the text T; K is an integer not less than 1;

[0016] The attention score matrices of the CLS in the K layers are summed, and the result is used as a matrix of the contribution values of each word to the predicted intent A.

[0017] The above describes one implementation of the NLU. This implementation uses the CLS attention score matrices of K layers to calculate the contribution value of each word to the predicted intent A. Because the last K self-attention layers are used, the attention information of these K layers, spanning both higher and lower layers, is fused, which makes the calculated contribution value of each word to the predicted intent A more accurate.
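As a hedged sketch of the summation in [0015] and [0016], the snippet below adds up the CLS row of the attention matrices from the last K layers. It assumes that CLS sits at index 0, that each matrix is indexed as [query][key], and that the CLS-to-CLS entry is dropped; none of these layout details are stated in the source.

```python
from typing import List

Matrix = List[List[float]]  # one attention matrix: rows are queries, columns are keys


def cls_contributions(layer_attentions: List[Matrix], k: int) -> List[float]:
    """Sum the CLS row over the last k layers and return one value per word."""
    if k < 1 or k > len(layer_attentions):
        raise ValueError("k must be between 1 and the number of layers")
    last_k = layer_attentions[-k:]
    summed = [0.0] * len(last_k[0][0])
    for attn in last_k:
        cls_row = attn[0]  # attention of CLS (query) over every token (key)
        for j, score in enumerate(cls_row):
            summed[j] += score
    return summed[1:]  # drop the CLS-to-CLS entry


# Two toy layers over three tokens: CLS plus two words.
layers = [
    [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]],
    [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],
]
contributions = cls_contributions(layers, k=2)
```

With these toy values the first word receives the larger summed score, so it would be the first candidate for a negative keyword once a threshold is applied.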

[0018] As one possible implementation of the first aspect, the NLU further includes a multi-channel attention layer stacked sequentially after the multiple self-attention layers, a linear layer corresponding to each channel of the multi-channel attention layer, and a logistic regression layer; the logistic regression layer includes an output node whose intent category is the predicted intent A;

[0019] The step of determining the contribution value of each word in the text T to the prediction intention A further includes:

[0020] The channel contribution value of each channel in the multi-channel attention layer is determined, and the sum of the CLS attention score matrices of the K layers is multiplied by the channel contribution values that are greater than 0. The result is used as the matrix of the contribution values of each word to the predicted intent A.

[0021] The channel contribution value for each channel is computed from two quantities: the output value of the k-th node of the linear layer corresponding to this channel, and the weight from the k-th node to the output node of the predicted intent A in the logistic regression layer.

[0022] As described above, further employing a multi-channel attention layer makes it possible to capture the different representations of each character in multiple channels, and thus the contribution value of each character to the predicted intent A in each channel. Furthermore, by using only the channels whose contribution values are greater than 0 (i.e., channels strongly correlated with the predicted intent) when calculating the contribution value of each character to the predicted intent A, the calculation can be made more accurate.
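The following sketch shows one possible reading of [0020] and [0021]. Two points are assumptions, since the source formula is not reproduced in this text: the channel contribution is taken to be the sum over nodes k of (node output) × (weight to output node A), and the summed CLS scores are scaled by the total of the positive channel contributions. All names are hypothetical.

```python
from typing import List, Tuple

Channel = Tuple[List[float], List[float]]  # (linear-layer outputs, weights to node A)


def channel_contribution(node_outputs: List[float], weights_to_a: List[float]) -> float:
    """Assumed form: sum over k of output_k * weight_k for the predicted intent A."""
    return sum(h * w for h, w in zip(node_outputs, weights_to_a))


def weighted_contributions(
    summed_cls_scores: List[float], channels: List[Channel]
) -> List[float]:
    """Scale the summed CLS scores by the total of the positive channel contributions."""
    positive_total = sum(
        c for c in (channel_contribution(h, w) for h, w in channels) if c > 0
    )
    return [score * positive_total for score in summed_cls_scores]


channels = [
    ([1.0, 2.0], [1.0, 0.5]),   # contribution 2.0, kept
    ([1.0, 1.0], [-1.0, 0.5]),  # contribution -0.5, ignored
]
weighted = weighted_contributions([1.2, 0.5], channels)
```

Other combinations are equally compatible with the wording, for example weighting per-channel attention scores separately before summing, so this should be treated as an illustration of the filtering on positive channels rather than a definitive formula.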

[0023] As one possible implementation of the first aspect, the method for calculating the attention score of CLS relative to each word in text T includes one of the following:

[0024] Attention scores are calculated based on the query vector of CLS and the key vectors of the other words.

[0025] Attention scores are calculated based on the query vector of each character and the key vector of the CLS.

[0026] The first attention score is calculated based on the query vector of CLS and the key vectors of other words; the second attention score is calculated based on the query vector of each word and the key vector of CLS; the first and second attention scores corresponding to the same word are summed.

[0027] Therefore, the method for calculating attention scores can be flexibly selected according to needs, such as the amount of computation required.
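The three options in [0024] to [0026] can be compared in a small sketch. It assumes CLS at index 0, uses a plain dot product as the scoring model, and omits any softmax normalization; the `mode` names are invented for the example.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cls_scores(queries: List[Vector], keys: List[Vector], mode: str) -> List[float]:
    """Score each word (indices 1..n) against CLS (index 0).

    "cls_query": CLS query against each word's key.
    "cls_key":   each word's query against the CLS key.
    "both":      sum of the two scores for the same word.
    """
    by_cls_query = [dot(queries[0], k) for k in keys[1:]]
    by_cls_key = [dot(q, keys[0]) for q in queries[1:]]
    if mode == "cls_query":
        return by_cls_query
    if mode == "cls_key":
        return by_cls_key
    if mode == "both":
        return [a + b for a, b in zip(by_cls_query, by_cls_key)]
    raise ValueError(f"unknown mode: {mode}")


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
```

The "both" mode needs roughly twice the dot products of the other two, which is the kind of computational trade-off [0027] refers to.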

[0028] As one possible implementation of the first aspect, the attention score is calculated using a computational model of query vector and key vector; the computational model includes one of the following:

[0029] Dot product model, scaled dot product model, additive model, bilinear model.

[0030] Therefore, the above calculation models can be flexibly selected according to needs.
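For reference, the four scoring models named in [0029] are sketched below in their commonly used textbook forms. The source does not give their formulas, so the parameter names (`w_q`, `w_k`, `v`, `w`) and the exact forms are assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def dot_product(q: Vector, k: Vector) -> float:
    return dot(q, k)


def scaled_dot_product(q: Vector, k: Vector) -> float:
    # Divide by sqrt(d) so scores do not grow with the vector dimension.
    return dot(q, k) / math.sqrt(len(k))


def additive(q: Vector, k: Vector, w_q: Matrix, w_k: Matrix, v: Vector) -> float:
    # v . tanh(W_q q + W_k k)
    hidden = [math.tanh(a + b) for a, b in zip(matvec(w_q, q), matvec(w_k, k))]
    return dot(v, hidden)


def bilinear(q: Vector, k: Vector, w: Matrix) -> float:
    # q . (W k)
    return dot(q, matvec(w, k))


q, k = [1.0, 2.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
```

The dot product and scaled dot product need no extra parameters, while the additive and bilinear models introduce learned matrices, which is one reason to choose between them according to need.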

[0031] As one possible implementation of the first aspect, the edit distance includes one of the following: Pinyin edit distance, input method edit distance, and character shape edit distance.

[0032] As shown above, the edit distance can be flexibly selected according to the needs of the application scenario. For example, the pinyin edit distance can be applied to the application scenario where text T comes from ASR recognition, the input method edit distance can be applied to the scenario where text T comes from user input using an input method, and the character shape edit distance (i.e., character shape similarity distance) can be applied to the application scenario where text T comes from OCR technology recognition.
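A minimal sketch of the matching step with a pluggable edit distance follows. It uses the standard Levenshtein distance over whatever encoding the `encode` function returns; the small pinyin table and the candidate words are illustrative assumptions, not data from the application, and a real system would obtain pinyin, input-method codes, or glyph features from a dedicated resource.

```python
from typing import Callable, List, Optional, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def match_positive(
    negative: str,
    obfuscation_list: List[str],
    encode: Callable[[str], Sequence] = lambda w: w,
) -> Optional[str]:
    """Return the list entry closest to `negative` after applying `encode`."""
    if not obfuscation_list:
        return None
    return min(
        obfuscation_list,
        key=lambda cand: edit_distance(encode(negative), encode(cand)),
    )


# Hypothetical toneless pinyin table for the example only.
PINYIN = {"拨打": "boda", "播放": "bofang", "关闭": "guanbi"}
best = match_positive("拨打", ["关闭", "播放"], encode=lambda w: PINYIN[w])
```

At the raw character level both candidates are two substitutions away from "拨打", so they cannot be separated; under the pinyin encoding "播放" is clearly closer, which illustrates why the pinyin edit distance suits ASR-sourced text.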

[0033] As one possible implementation of the first aspect, before matching the keyword corresponding to the negative keyword from the intent obfuscation list, the method further includes: determining that the negative keyword is in the keyword list of the predicted intent A.

[0034] Therefore, it is possible to first determine whether the negative keyword is a keyword through the keyword list, so that when it is not in the keyword list (i.e., when it is not a keyword), the judgment of the negative keyword can be skipped, thereby reducing the amount of data to be judged and improving the running efficiency of the error correction method of this application.

[0035] As one possible implementation of the first aspect, the keyword list of the predicted intent A is constructed in the following manner:

[0036] Obtain the corpus containing the predicted intent A, and calculate the term frequency (TF) value of each character in the corpus containing the predicted intent A using the following formula:

[0037]

[0038] The characters are sorted in descending order of TF value, and a preset number of the top-ranked characters are used as the keyword list of the predicted intent A.

[0039] As shown above, the keyword list in this application is task-related, that is, related to the intent category of the NLU output. Therefore, the constructed keyword list is adapted to the task, so the error correction capability is more accurate, and it can be applied to both general and specific domains.
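The keyword list construction of [0036] to [0038] can be sketched as below. Because the TF formula is not reproduced in this text, the sketch assumes the common definition (occurrences of a token divided by the total token count of the corpus). It also splits on whitespace for readability, whereas the application counts characters; the corpus and `top_n` are made up.

```python
from collections import Counter
from typing import Dict, List


def term_frequencies(corpus: List[str]) -> Dict[str, float]:
    """Assumed TF: token occurrences divided by the total token count."""
    counts = Counter(tok for sentence in corpus for tok in sentence.split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def keyword_list(corpus: List[str], top_n: int) -> List[str]:
    """Return the top_n tokens by TF, highest first."""
    tf = term_frequencies(corpus)
    # Break ties alphabetically so the ranking is deterministic.
    ranked = sorted(tf, key=lambda tok: (-tf[tok], tok))
    return ranked[:top_n]


# Toy corpus for a hypothetical "make a call" intent.
call_corpus = ["dial mom", "dial the office", "call mom"]
call_keywords = keyword_list(call_corpus, top_n=2)
```

A negative keyword that does not appear in such a list can then be skipped before the matching step, as described in [0033] and [0034].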

[0040] As one possible implementation of the first aspect, the keywords in the intent obfuscation list that obfuscate the expected intent B to the predicted intent A are constructed in the following manner:

[0041] Calculate and merge the keyword lists for the expected intent B and the predicted intent A. Calculate the TF-IDF value for each word in the keyword list of the expected intent B, where TF-IDF = TF * IDF. The IDF is calculated according to the following formula, and the number of intents containing that word in the formula is 2:

[0042]

[0043] The characters are sorted in descending order of TF-IDF value, and a preset number of the top-ranked characters are used as the keywords of the expected intent B that obfuscate it into the predicted intent A.

[0044] As shown above, the obfuscation keyword list in this application is task-related, that is, related to the intent category of the NLU output. Therefore, the constructed obfuscation keyword list is adapted to the task, so the error correction capability is more accurate, and it can be applied to both general and specific domains.
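A rough sketch of the ranking in [0041] to [0043] follows. The IDF formula is not reproduced in this text, so the sketch assumes the common form log(N / n) with N = 2 intents (A and B), where n is the number of the two keyword lists that contain the word. The TF values and word lists are invented for illustration.

```python
import math
from typing import Dict, List


def obfuscation_keywords(
    tf_b: Dict[str, float], keywords_a: List[str], top_n: int
) -> List[str]:
    """Rank intent B's keywords by TF * IDF computed over intents A and B."""
    in_a = set(keywords_a)
    scores = {}
    for word, tf in tf_b.items():
        n_containing = 2 if word in in_a else 1  # B always contains the word
        scores[word] = tf * math.log(2 / n_containing)
    # Highest TF-IDF first; ties broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


tf_play = {"play": 0.4, "movie": 0.3, "the": 0.3}
top = obfuscation_keywords(tf_play, keywords_a=["dial", "the"], top_n=2)
```

Under this assumed IDF, a word shared by both intents scores zero, so the ranking favors words that are frequent in intent B and absent from intent A's keyword list.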

[0045] A second aspect of this application provides a text correction apparatus, comprising:

[0046] The natural language understanding module is used to identify the intent category of text T and obtain the predicted intent A;

[0047] The key text detection module is used to determine the contribution value of each word in the text T to the predicted intent A when the predicted intent A does not match the user's expected intent B, and select words with contribution values greater than a threshold as negative keywords.

[0048] The key text mining module is used to match, for each negative keyword, the corresponding keyword from the intent obfuscation list based on the edit distance as a positive keyword, and the positive keyword is used as the corrected word for the negative keyword; the intent obfuscation list records each keyword that obfuscates the expected intent B into the predicted intent A.

[0049] As one possible implementation of the second aspect, the text T is generated by a text correction module that corrects a source text.

[0050] The apparatus is further configured to form text pairs from the negative keywords and the corrected words, to be used as a corpus for training the text correction module.

[0051] As one possible implementation of the second aspect, the natural language understanding module includes at least one self-attention layer;

[0052] The step of determining the contribution value of each word in the text T to the prediction intention A includes:

[0053] For the last K self-attention layers of NLU, obtain the attention score matrix of CLS for each layer; CLS is the prefix character added to the text T; the attention score matrix of CLS includes the attention score of CLS relative to each word in the text T; K is an integer not less than 1;

[0054] The attention score matrices of the CLS in the K layers are summed, and the result is used as a matrix of the contribution values of each word to the predicted intent A.

[0055] As a possible implementation of the second aspect, the natural language understanding module further includes a multi-channel attention layer stacked sequentially after the multiple self-attention layers, a linear layer corresponding to each channel of the multi-channel attention layer, and a logistic regression layer; the logistic regression layer includes an output node whose intent category is the predicted intent A;

[0056] The step of determining the contribution value of each word in the text T to the prediction intention A further includes:

[0057] The channel contribution value of each channel in the multi-channel attention layer is determined, and the sum of the CLS attention score matrices of the K layers is multiplied by the channel contribution values that are greater than 0. The result is used as the matrix of the contribution values of each word to the predicted intent A.

[0058] The channel contribution value for each channel is computed from two quantities: the output value of the k-th node of the linear layer corresponding to this channel, and the weight from the k-th node to the output node of the predicted intent A in the logistic regression layer.

[0059] As one possible implementation of the second aspect, the method for calculating the attention score of CLS relative to each word in text T includes one of the following:

[0060] Attention scores are calculated based on the query vector of CLS and the key vectors of the other words.

[0061] Attention scores are calculated based on the query vector of each character and the key vector of the CLS.

[0062] The first attention score is calculated based on the query vector of CLS and the key vectors of other words; the second attention score is calculated based on the query vector of each word and the key vector of CLS; the first and second attention scores corresponding to the same word are summed.

[0063] As one possible implementation of the second aspect, the attention score is calculated using a computational model of query vector and key vector; the computational model includes one of the following:

[0064] Dot product model, scaled dot product model, additive model, bilinear model.

[0065] As one possible implementation of the second aspect, the edit distance includes one of the following: Pinyin edit distance, input method edit distance, and character shape edit distance.

[0066] As a possible implementation of the second aspect, the key text mining module is also used to determine that the negative keyword is in the keyword list of the predicted intent A.

[0067] As one possible implementation of the second aspect, the keyword list of the predicted intent A is constructed in the following manner:

[0068] Obtain the corpus containing the predicted intent A, and calculate the term frequency (TF) value of each character in the corpus containing the predicted intent A using the following formula:

[0069]

[0070] The characters are sorted in descending order of TF value, and a preset number of the top-ranked characters are used as the keyword list of the predicted intent A.

[0071] As one possible implementation of the second aspect, the keywords in the intent obfuscation list that obfuscate the expected intent B to the predicted intent A are constructed in the following manner:

[0072] Calculate and merge the keyword lists for the expected intent B and the predicted intent A. Calculate the TF-IDF value for each word in the keyword list of the expected intent B, where TF-IDF = TF * IDF. The IDF is calculated according to the following formula, and the number of intents containing that word in the formula is 2:

[0073]

[0074] The characters are sorted in descending order of TF-IDF value, and a preset number of the top-ranked characters are used as the keywords of the expected intent B that obfuscate it into the predicted intent A.

[0075] A third aspect of this application provides a computing device, comprising:

[0076] Communication interface;

[0077] At least one processor connected to the communication interface; and

[0078] At least one memory, connected to the processor and storing program instructions, which, when executed by the at least one processor, cause the at least one processor to perform any of the methods described in the first aspect above.

[0079] A fourth aspect of this application provides a computer-readable storage medium having program instructions stored thereon, which, when executed by a computer, cause the computer to perform any of the methods described in the first aspect above.

[0080] These and other aspects of this application will become more apparent in the description of the following embodiment(s).

Attached Figure Description

[0081] The following description, with reference to the accompanying drawings, further illustrates the various features of this application and the relationships between them. The drawings are exemplary; some features are not shown to scale, and some drawings may omit conventional features in the field of this application that are not essential to it, or additional features that are not essential to this application may be shown. The combination of features shown in the drawings is not intended to limit this application. Furthermore, throughout this specification, the same reference numerals refer to the same things. Specific descriptions of the drawings are as follows:

[0082] Figure 1 is a schematic diagram of an application scenario of this application;

[0083] Figure 2 is a flowchart of one embodiment of the text correction method of this application;

[0084] Figure 3 is a flowchart illustrating a specific implementation of the text correction method of this application;

[0085] Figure 4 is a flowchart illustrating intent prediction via NLU provided in a specific embodiment of this application;

[0086] Figure 5 is a flowchart of negative keyword mining provided in a specific embodiment of this application;

[0087] Figure 6 is a flowchart illustrating the matching of positive keywords based on negative keywords in a specific embodiment of this application;

[0088] Figure 7 is a schematic diagram of the structure of the NLU module provided in a specific embodiment of this application;

[0089] Figure 8 is a schematic diagram of the attention score matrix provided in a specific embodiment of this application;

[0090] Figure 9 is a schematic diagram of the error correction device provided in the embodiments of this application;

[0091] Figure 10 is a schematic diagram of a computing device provided in this application.

Detailed Implementation

[0092] The terms "first, second, third, etc." or similar terms such as module A, module B, module C, etc., used in the specification and claims are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understood that a specific order or sequence may be interchanged where permitted so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.

[0093] In the following description, the labels of the steps, such as S110, S120, etc., do not necessarily mean that the steps will be executed in that order. Where permitted, the order of the steps can be interchanged, or the steps can be executed simultaneously.

[0094] The term "comprising" as used in the specification and claims should not be construed as limiting itself to what follows; it does not exclude other elements or steps. Therefore, it should be interpreted as specifying the presence of the mentioned feature, integer, step, or component, but does not exclude the presence or addition of one or more other features, integers, steps, or components, or groups thereof. Thus, the statement "a device comprising means A and B" should not be limited to a device consisting solely of components A and B.

[0095] The terms "one embodiment" or "an embodiment" as used in this specification mean that a particular feature, structure, or characteristic described in conjunction with that embodiment is included in at least one embodiment of this application. Therefore, the terms "in one embodiment" or "in an embodiment" appearing throughout this specification do not necessarily refer to the same embodiment, but may refer to the same embodiment. Furthermore, in one or more embodiments, the particular features, structures, or characteristics can be combined in any suitable manner, as will be apparent to those skilled in the art from this disclosure.

[0096] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. In case of any inconsistency, the meaning set forth in this specification or derived from the content described herein shall prevail. Furthermore, the terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application. To accurately describe the technical content of this application and to accurately understand this application, the following explanations or definitions of the terms used in this specification are provided before describing specific embodiments:

[0097] 1) Text mining technology: Text mining is a method for discovering and extracting key information from text, based on computational linguistics and statistical mathematical analysis, combined with machine learning and information retrieval techniques. This application's embodiments use text mining technology to match corresponding positive keywords for negative keywords.

[0098] 2) Intent Recognition Technology: Broadly speaking, intent recognition is Natural Language Understanding (NLU). Its main purpose is to understand the user's desired action corresponding to the natural language text input by the user, and to describe the action using intent (e.g., the action or domain corresponding to the action in the target system) and slots (parameters required to complete the action). This is then converted into interface calls or application execution actions on the corresponding system through a task execution model, achieving the effect of initiating an action through natural language. For example, if the user inputs the natural language text "What is the weather like in New York today?", the intent recognition module can understand the user's intent as "check the weather," with slots "location: New York" and "time: today." Subsequently, the system can call interfaces and execute actions based on the above intent and slots, such as a voice announcement "The weather in New York today is sunny, with a high of 23°C and a low of 16°C." Intent recognition capability relies on understanding the semantic information in the natural language sentences input by users. When the text contains typos, extra words, or missing words due to non-standard user language or errors in Automatic Speech Recognition (ASR) technology, the accuracy of intent recognition will be affected, thus failing to correctly process the operation that the user wants to perform.
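To make the intent and slot terminology concrete, the weather example above could be represented as in the sketch below. The `NluResult` class and the `describe` helper are hypothetical names introduced only for illustration; the application does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NluResult:
    """Hypothetical container for one NLU output: an intent plus its slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def describe(result: NluResult) -> str:
    """Render the intent and its slots as a single readable string."""
    slot_text = ", ".join(f"{name}: {value}" for name, value in result.slots.items())
    return f"{result.intent} ({slot_text})" if slot_text else result.intent


weather = NluResult("check the weather", {"location": "New York", "time": "today"})
```

A downstream task execution component would read the intent to choose an interface and the slots to fill its parameters.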

[0099] 3) Negative keywords, positive keywords: As mentioned above, based on NLU (or intention recognition), predicted intentions can be generated based on natural language text. The natural language text corresponds to the user's expected intention (or true intention), that is, what the user actually wants to express through natural language, the operations the user actually hopes to perform, etc. When the predicted intention generated based on NLU does not match the user's expected intention, this application refers to the keyword / keywords in the natural language text that are strongly related to the incorrect predicted intention as negative keywords. Correspondingly, this application refers to the keyword that can correctly describe the user's expected intention corresponding to the negative keyword as a positive keyword. For example, due to the incorrect generation of "Dial Movie A" by ASR, NLU outputs an incorrect predicted intention of "Make a call" based on the natural language text with misspelled words "Dial Movie A", while the user's expected intention is "Play a video", then in this example, "Dial" is the negative keyword, and "Play" is the positive keyword corresponding to "Dial".

[0100] 4) Term Frequency (TF): It is used to represent the frequency of occurrence of a character, word, or phrase in a text. Characters, words, or phrases with a high term frequency can be used as keywords reflecting the intention corresponding to the text.

[0101] 5) Inverse Document Frequency (IDF): Its main idea is that if a certain character, word, or phrase has a high TF in a certain type of text and rarely appears in other types of text, it is considered that this character, word, or phrase has good class discrimination ability and is suitable for classification. Through IDF, high-frequency characters, words, or phrases such as "de" and "le" with high term frequencies in all classes are not used as classification characters, words, or phrases.

[0102] 6) Classification (CLS) vector: For text classification tasks, a text classification model can insert a CLS symbol at the beginning of the text and use the output vector corresponding to this symbol as the semantic representation of the text for text classification. For example, as shown in Figure 5, the vector corresponding to the CLS output by the multi-channel attention layer is used as the output for downstream classification. The meaning of the output vector corresponding to CLS can be understood as follows: compared with the other characters / words already in the text, this symbol, which carries no obvious semantic information of its own, integrates the semantic information of each character / word in the text more "fairly".

[0103] 7) Transformer: A classic model in Natural Language Processing (NLP), the Transformer model uses a self-attention mechanism. The encoder of the Transformer model contains several Transformer Blocks (Trm), and each Trm contains a self-attention layer.

[0104] The existing methods will be introduced first, and then the technical solution of this application will be described in detail.

[0105] Prior art 1: Patent application CN107741928A provides a method for correcting text errors after speech recognition based on domain recognition. In the error correction process, this scheme first segments the sentence to be corrected according to predefined grammatical rules, dividing it into redundant parts and core parts; then, it uses a search engine to perform fuzzy string matching to determine a candidate proprietary word set for the core part of the sentence; it calculates a similarity score based on edit distance, and corrects the redundant parts and core parts respectively, wherein fuzzy matching is performed based on the proprietary word set when correcting the core part; then, the corrected redundant parts and core parts are fused, and the error correction result is output.

[0106] Prior art 1 has the following drawbacks: it requires the construction of complex grammatical rules to divide text into redundant and core parts. The definition of the core part depends on the rule setting and fails to fully utilize task-related high-level semantic information, resulting in low segmentation accuracy. Furthermore, this approach requires building a lexicon and filtering error-correcting text through fuzzy matching and edit distance, with error correction granularity at the string level. When multi-character errors occur, mismatches are likely.

[0107] Existing technology 2: Patent application CN107045496A discloses a method and device for correcting text after speech recognition. This method preprocesses the text after speech recognition, identifies the search intent, extracts attribute information, calculates the similarity between the attribute information and any candidate word in the candidate word library, and corrects the extracted attribute information according to the similarity value.

[0108] In this technical solution, rule templates in the search intent recognition template library are called sequentially. If the preprocessed text matches a template of a certain category (e.g., category C), the search intent is considered to be the current category C. Otherwise, effective features of the preprocessed text are extracted through word segmentation and then fed into a preset classifier for classification. The resulting category is taken as the search intent. Then, based on the attribute information to be extracted, the attribute fragments to be extracted are identified from the preprocessed text. The extraction template and context keywords corresponding to the attribute information to be extracted are obtained. Based on the weight of the extraction template and the weight of the context keywords, the score of each attribute fragment to be extracted is calculated.

[0109] Existing technology 2 has the following drawbacks: this solution is essentially a dictionary- and rule-based intent recognition method. When the text contains misspellings, extra words, or missing words that the rules do not cover, so that intent recognition goes wrong and the dictionary cannot match them, the solution is unable to correct the text with respect to the intent.

[0110] This application provides a text correction scheme that differs from the approaches described above. The text correction scheme of this application can be applied to the field of artificial intelligence, involving text processing technologies within this field. The application targets can be terminals, applications, or network services capable of recognizing intent and slots in user-input natural language text, such as smartphones, smart speakers, search engines, and translation machines.

[0111] One application scenario for this application is voice assistants, such as those on smart speakers and smartphones. Figure 1 illustrates this application scenario. A voice assistant typically includes an ASR module 12 for recognizing speech as text, a text correction module 13 for correcting errors in the recognized text, and an NLU module 111 for intent and slot recognition of the corrected text. As mentioned earlier, when the speech recognition result output by the ASR module 12 contains errors, it adversely affects the accuracy of intent and slot recognition by the NLU module 111. For example, the intent recognized by the NLU module 111 may not match the user's actual intent, so the user's intended operation is not processed correctly. The text correction method provided in this application uses the predicted intent output by the NLU module 111 and the user's expected intent, combined with key text detection and mining techniques, to automatically detect keyword errors in the text corrected by the text correction module and adaptively correct those keywords. Because the method focuses on the key text information and corrects keywords, it effectively improves the correction effect and thus the accuracy of subsequent recognition by the NLU module.

[0112] It should also be noted that the text correction scheme of this application can be implemented on the terminal side, on the network side (such as the server), or by a combination of the terminal side and the network side.

[0113] Based on the above introduction of relevant terms and application scenarios of this application, an embodiment of the text correction method provided in this application will be described in detail below with reference to the accompanying drawings.

[0114] As shown in Figure 2, an embodiment of the text correction method provided in this application comprises the following steps:

[0115] S210: Receive a text T, and use NLU to identify the intent category of the text T to obtain a predicted intent A, for example, the predicted intent A is "make a phone call".

[0116] Here, the text T is the text to be corrected by the method of this application, for example, the text output by the text correction module 13 in Figure 1 after error correction.

[0117] This NLU is a classification model that takes text as input and outputs intent categories. Each intent category corresponds to an intent to be predicted, such as "make a phone call," "listen to music," or "play a video." This NLU employs an attention mechanism, whose attention scores can be interpreted as the contribution of each word in the input text T to the predicted intent output by the NLU module. Several NLU structures using attention mechanisms are exemplified below, all of which can be used as the NLU in this application:

[0118] 1) The structure of NLU may include stacked multi-layer self-attention layers, with a logistic regression layer stacked downstream as a classification layer. Optionally, the stacked multi-layer self-attention layers may include stacked TRMs.

[0119] 2) The structure of the NLU may be as shown in Figure 7, which includes stacked TRM layers, a multi-channel attention layer, a linear layer corresponding to each channel, and a logistic regression layer as a classification layer. In the following detailed embodiments, this NLU structure will be used as an example to further illustrate this application.

[0120] 3) The structure of an NLU can be a recurrent neural network (RNN) employing an attention mechanism, with a logistic regression layer stacked downstream as a classification layer. The RNN can be a traditional RNN, a Long Short-Term Memory (LSTM) network, a gated recurrent unit (GRU), etc. Since a traditional attention-based RNN essentially uses an attention layer, it can also be categorized under the first type of NLU structure mentioned above.

[0121] It should also be noted that linear layers can be stacked between the above layers, such as between TRM layers, between TRM and the logistic regression layer, and between TRM and the multichannel attention layer, in order to improve the accuracy of the network by increasing the number of network layers.

[0122] S220: When the predicted intent A (e.g., making a phone call) does not match the user's expected intent B (e.g., playing a video), determine the contribution value of each word in the text T to the predicted intent A, and select words with contribution values greater than a threshold as negative keywords.

[0123] User's expected intent B can be obtained through direct user feedback or prediction based on contextual awareness information. An example is provided below:

[0124] When users provide direct feedback, the various intent categories of NLU can be listed through the human-computer interaction interface for users to select. Furthermore, to facilitate quick viewing and selection during the user's selection process, the categories can be sorted and displayed according to the confidence level of each intent category output by the logistic regression layer of NLU, i.e., the softmax value of each intent.

[0125] When context awareness is used, the context can be perceived from the category of the app currently running on the terminal, such as a video app or a phone app, or from historical statistics giving the probability of each type of app running in the current time period. The user's current expected intent B is then determined from the perceived context.

[0126] The method for determining the contribution value of each word in the text T to the predicted intent A can vary depending on the structure of the NLU. Several methods are illustrated below:

[0127] 1) For NLU structures that include stacked multi-layer self-attention layers:

[0128] For the last K self-attention layers of the NLU, the attention score matrix of CLS is obtained for each layer; then, the attention score matrices of CLS from the K layers are summed, and the result is used as a matrix of the contribution values of each character to the predicted intent A. Here, CLS refers to the prefix character added to the text T, the attention score matrix of CLS includes the attention score of CLS relative to each character in the text T, and K is an integer not less than 1.

[0129] Because the last K self-attention layers are used, the attention information of these K layers can be fused, that is, high-level and low-level attention information is combined, which makes the calculation of the contribution value of each word to the predicted intent A more accurate.
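The layer-summing step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `cls_contributions`, the nested-list matrix layout, the convention that row 0 and column 0 belong to CLS, and the choice to drop CLS's attention to itself are all assumptions made for the example.

```python
def cls_contributions(layer_attn, k):
    """Sum the CLS row of the attention score matrix over the last k layers.

    layer_attn: list of per-layer attention matrices (lists of rows).
    Assumption: row 0 holds the attention of CLS to every token, and
    column 0 is CLS itself.
    Returns one summed score per text token (the CLS column is dropped).
    """
    if k < 1 or k > len(layer_attn):
        raise ValueError("k must be between 1 and the number of layers")
    n_tokens = len(layer_attn[0][0])
    summed = [0.0] * n_tokens
    for attn in layer_attn[-k:]:
        for j, score in enumerate(attn[0]):  # attn[0] is the CLS row
            summed[j] += score
    return summed[1:]


# Two toy layers over the tokens [CLS, t1, t2] (made-up numbers).
layers = [
    [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.5, 0.25, 0.25]],
]
contrib = cls_contributions(layers, k=2)
```

With k=1 only the top layer is used; larger k mixes in lower-layer attention, which is the fusion effect the paragraph describes.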

[0130] 2) For an NLU structure that further includes a multi-channel attention layer, as shown in Figure 7:

[0131] The process can begin by calculating and summing the attention score matrices of CLS for the last K layers. It then determines the channel contribution value of each channel in the multi-channel attention layer and multiplies the summed attention score matrix of the K layers by the channel contribution values that are greater than 0. The result is used as a matrix representing the contribution of each word to the predicted intent A. The channel contribution value of each channel is calculated from two quantities: the output value of the k-th node of the linear layer corresponding to that channel, and the weight from that k-th node to the output node of the predicted intent A in the logistic regression layer. This calculation process is further detailed in the specific implementation described later.

[0132] By further employing a multi-channel attention layer, the different representations of each character in multiple channels can be captured, and with them the contribution values of each character to the predicted intent A in the different channels. Furthermore, by combining only the channels with a contribution value greater than 0 (i.e., channels strongly correlated with the predicted intent) when calculating the contribution value of each character to the predicted intent A, the calculation can be made more accurate.
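A rough sketch of the channel weighting is given below. The exact channel contribution formula is not legible in this text, so the sketch assumes it is the sum, over the nodes of a channel's linear layer, of node output times the weight to the predicted-intent node. It also assumes one summed CLS attention vector per channel. The names `channel_contribution` and `word_contributions` are hypothetical.

```python
def channel_contribution(node_outputs, node_weights):
    """Assumed form: sum over nodes of (node output * weight to intent A)."""
    return sum(h * w for h, w in zip(node_outputs, node_weights))


def word_contributions(cls_attn_per_channel, channel_contribs):
    """Weight each channel's summed CLS attention by its contribution value.

    Channels whose contribution is not greater than 0 are ignored, so only
    "positive" channels influence the per-word result.
    """
    total = [0.0] * len(cls_attn_per_channel[0])
    for attn, c in zip(cls_attn_per_channel, channel_contribs):
        if c <= 0:
            continue
        for j, a in enumerate(attn):
            total[j] += c * a
    return total


# Two toy channels with two linear-layer nodes each (made-up numbers).
c1 = channel_contribution([1.0, 2.0], [0.5, 0.25])   # positive channel
c2 = channel_contribution([1.0, 1.0], [-1.0, 0.5])   # negative channel
scores = word_contributions([[0.6, 0.4], [0.9, 0.1]], [c1, c2])
```

In this toy run the second channel is discarded because its contribution is negative, so the result depends only on the first channel's attention.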

[0133] 3) For NLUs with an RNN structure, the attention score of each word relative to the output intent category can be used directly as the contribution value.

[0134] The attention score of CLS relative to each word in the text T can be calculated in one of the following ways, which can be flexibly selected as needed:

[0135] 1) Calculate the attention score based on the query vector of CLS and the key vectors of the other words; this method is used in the specific implementation below.

[0136] 2) Calculate the attention score based on the query vector of each word and the key vector of CLS.

[0137] 3) Calculate the first attention score based on the query vector of CLS and the key vectors of other words; calculate the second attention score based on the query vector of each word and the key vector of CLS; sum the first and second attention scores corresponding to the same word.
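The three options above differ only in which query/key pairing is read. The sketch below shows this with raw (unnormalized) dot-product scores. The function name `cls_scores`, the `mode` labels, and the convention that index 0 is CLS are assumptions of the example, and softmax normalization is omitted for brevity.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cls_scores(queries, keys, mode="query"):
    """Raw scores between CLS (index 0) and every other token.

    mode="query": CLS query against each word's key   (option 1)
    mode="key":   each word's query against CLS's key (option 2)
    mode="both":  sum of the two scores per word      (option 3)
    """
    words = range(1, len(queries))
    if mode == "query":
        return [dot(queries[0], keys[j]) for j in words]
    if mode == "key":
        return [dot(queries[j], keys[0]) for j in words]
    if mode == "both":
        return [dot(queries[0], keys[j]) + dot(queries[j], keys[0])
                for j in words]
    raise ValueError("mode must be 'query', 'key', or 'both'")


# Toy vectors for [CLS, w1, w2] (made-up numbers).
queries = [[1, 0], [0, 1], [1, 1]]
keys = [[0, 2], [1, 0], [3, 1]]
option1 = cls_scores(queries, keys, "query")
option2 = cls_scores(queries, keys, "key")
option3 = cls_scores(queries, keys, "both")
```

In matrix terms, option 1 reads the CLS row of the score matrix, option 2 reads the CLS column, and option 3 adds them element-wise.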

[0138] The attention score is calculated using a computational model for the query vector and key vector. Specifically, for the query vector matrix Q, key vector matrix K, and value vector matrix V, one of the following computational models can be used to calculate the attention score matrix:

[0139] 1) Dot product model: $\mathrm{softmax}(QK^{T})$. The specific implementation described below uses this calculation model.

[0140] 2) Scaled dot product model, in which the dot product $QK^{T}$ is scaled using the factor $d_k$ before the softmax is applied. Here, $d_k$ is the scaling factor, which is a constant, such as 6, 8, or 9.

[0141] 3) Additive model: $\mathrm{softmax}(\tanh(WK+UQ))$, where W and U are learnable parameters.

[0142] 4) Bilinear model: $\mathrm{softmax}(K^{T}WQ)$, where W is a learnable parameter.
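To make the difference between the first two models concrete, the sketch below computes one attention row with an optional divisor. The exact scaled formula is not legible in this text, so the divisor is left as a plain `scale` parameter (an assumption). For reference, the widely used Transformer formulation divides by the square root of the key dimension.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention_row(query, keys, scale=1.0):
    """Attention of one query over all keys: softmax(q . k_j / scale).

    scale=1.0 gives the plain dot product model; any other value gives a
    scaled dot product model with that divisor.
    """
    scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
    return softmax(scores)


plain = attention_row([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
scaled = attention_row([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], scale=2.0)
```

A larger divisor flattens the distribution, so the scaled row is less peaked than the plain one for the same inputs.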

[0143] S230: For each of the negative keywords, match the corresponding positive keywords from the intent obfuscation list based on the edit distance, and use these positive keywords as the corrected words for the negative keywords. The intent obfuscation list records the keywords that obfuscate the expected intent B into the predicted intent A.

[0144] The edit distance includes one of the following:

[0145] 1) Pinyin edit distance, such as the distance of initials, finals, and tones, can be applied to the application scenario of text T coming from ASR recognition; the specific implementation method described below uses this edit distance.

[0146] 2) Input method edit distance, such as for Pinyin, Wubi, etc., can be applied to scenarios where the text T comes from the user's input using an input method.

[0147] 3) Character edit distance, also known as character similarity distance, can be applied to scenarios where text T is recognized by Optical Character Recognition (OCR) technology.
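All three variants can be built on the same sequence edit distance and differ only in how a word is encoded before comparison. The sketch below is a standard Levenshtein distance with unit costs. The (initial, final, tone) tuples are hypothetical encodings chosen for illustration, not data from this application.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1. Works on strings
    or on tuples of symbols (e.g. pinyin initial, final, tone).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical pinyin encodings as (initial, final, tone).
same_sound = edit_distance(("b", "o", "1"), ("b", "o", "1"))
one_off = edit_distance(("b", "o", "1"), ("b", "a", "1"))
```

For an input-method or glyph-based distance, only the encoding step would change, for example to key sequences or stroke codes.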

[0148] In some embodiments, before matching the positive keyword corresponding to the negative keyword from the intent obfuscation list, the method further includes: determining whether the negative keyword is in the keyword list of the predicted intent A. That is, it is first determined whether the negative keyword is a keyword of the predicted intent A; if it is not in the keyword list (i.e., not a keyword), matching for that negative keyword is skipped, thereby reducing the amount of data to be processed and improving the operating efficiency of the error correction method of this application.

[0149] The keyword list and the obfuscation list are constructed using Formula 1 and Formula 2, respectively, as detailed in Table 1 below in the specific implementation details. It should be noted that the keyword list and obfuscation keyword list in this application are task-related, i.e., related to the intent category of the NLU output. Therefore, the process of mining and correcting negative keywords is adapted to the task, resulting in more accurate error correction and applicability to both general and specific domains.

[0150] In some embodiments, the text T in step S210 is generated by a text correction module correcting a source text; then after step S230, it may further include: forming text pairs with the negative keywords and the corrected words, using them as corpus for training the text correction module.

[0151] The following describes a specific implementation of the above-described text correction method, still using the voice assistant scenario shown in Figure 1 as an example. As mentioned earlier, the ASR module 12 of the voice assistant outputs the text after speech recognition. Since the text after speech recognition is likely to contain errors, it is referred to as noisy text in this specific embodiment. The noisy text is corrected by the text correction module 13 to generate corrected text. The corrected text is then passed to the NLU module 111, which performs intent recognition and generates a predicted intent. Next, the update decision module 112 judges whether the predicted intent matches the user's expected intent. If they do not match, the key text detection module 113 is triggered to find, in the text input to the NLU module 111, the negative keywords that led to the predicted intent. Finally, the key text mining module 114 corrects these keywords to match the user's expected intent.

[0152] The following is a detailed description of a specific implementation of the text correction method of this application.

[0153] Before performing text correction, a confusion database needs to be constructed based on the various intents that NLU can recognize. This database includes a keyword list for each intent and a list of confused keywords for each intent relative to other intents. Since the keyword list and confused keyword list can be presented in a matrix (or two-dimensional table) format, this database can also be called a confusion matrix. See Table 1 below for details:

[0154] The diagonal elements of the confusion matrix store the list of intent keywords (such as the AA item and BB item in Table 1 below), and the non-diagonal elements store the list of confusion keywords (such as the AB item and BA item in Table 1 below). In the confusion matrix shown in Table 1 below, the title row and title column are the various intents recognizable by NLU, which are respectively listed as the true intent item and the confused intent item, and the data in the table items corresponding to the title row and title column are the specific data of the confusion matrix.

[0155]

[0156]

[0157] Table 1

[0158] Taking the two types of intents "make a call" and "play a video" shown in Table 1 above as an example, the composition of the keyword list and the confusion keyword list in the above confusion matrix will be specifically introduced as follows:

[0159] 1) For each intent, first obtain the corpus of that intent. Then, taking each character as a unit, calculate the term frequency (TF) value of each character in the corpus of that intent according to the following formula (1), and sort the characters in descending order of TF value. Take a leading portion of the sorted characters, such as a set percentage (e.g., the top 10%) or a set number (e.g., the top 20), as the keywords of that intent, and construct the keyword list of that intent from them.

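[0160] $\mathrm{TF}(c) = \dfrac{n_c}{N}$ (1), where $n_c$ is the number of times the character $c$ appears in the corpus of the intent and $N$ is the total number of characters in that corpus.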

[0161] For example, the corpus corresponding to the intent "make a call" includes the corpus "call XXX, dial the number of YY, call Z". Then, according to the above TF calculation formula, the TF of the character "call" = 2 / 16, where 2 is the number of times the character appears in the corpus character set, and 16 is the total number of characters in the corpus.
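The keyword-list construction in step 1) can be sketched as below. The TF definition (occurrences divided by total characters) follows the worked example above. The function names, the whitespace filtering, the alphabetical tie-breaking, and the toy Latin-letter corpus are assumptions made for illustration.

```python
from collections import Counter


def term_frequencies(corpus):
    """TF per character: occurrences / total characters in the corpus.

    corpus: list of utterance strings for one intent. Whitespace is
    ignored (an assumption of this sketch).
    """
    chars = [c for sentence in corpus for c in sentence if not c.isspace()]
    counts = Counter(chars)
    total = len(chars)
    return {c: n / total for c, n in counts.items()}


def top_keywords(tf, top_n):
    """Characters sorted by descending TF (ties broken alphabetically)."""
    ranked = sorted(tf.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:top_n]]


# Toy corpus: letters stand in for the characters of real utterances.
tf = term_frequencies(["abca", "bd"])
keywords = top_keywords(tf, top_n=2)
```

A percentage cut-off instead of a fixed count would only change how `top_n` is derived from the number of distinct characters.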

[0162] 2) For two easily confused intents (such as make a call and play a video), merge the calculated keyword lists of the two intents respectively, and calculate the TF-IDF value of each keyword in the merged list, where TF-IDF = TF * IDF. The calculation of IDF can adopt the following formula (2). Since it is calculating the confusion situation of two intents, the number of intents containing the character in the denominator is 2:

[0163]

[0164] Furthermore, the keywords are sorted in descending order of their TF-IDF values, and a leading portion of them, such as a set percentage (e.g., the top 10%) or a set number (e.g., the top 20), is designated as the obfuscated keywords for this intent, thus constructing the list of obfuscated keywords for this intent. The list of obfuscated keywords in Table 1 also shows the TF-IDF value of each keyword.

[0165] Once the confusion matrix is constructed as described above, it can be used in the text correction process. The flowchart in Figure 3 illustrates a specific implementation of the error correction method of this application in a voice assistant scenario, which includes the following steps:

[0166] S310: When the voice assistant receives the user's voice input "Play 'Train to Busan'", it first performs speech recognition through the ASR module. Assume that the recognized text in this specific embodiment is "Dial 'Train to Busan'".

[0167] S320: The text correction module receives the text recognized by ASR and performs error correction processing. Assuming that in this specific embodiment the text correction module fails to find the error in the text and does not make any correction, the text output by the text correction module is still "Dial 'Train to Busan'".

[0168] S330: The NLU module receives the text output by the text correction module and generates a predicted intent, that is, outputs the possible intents and slots corresponding to the text. It should be noted that the subsequent error correction in this specific embodiment mainly takes the predicted intent as an example, so for the sake of simplicity, the filling of slots is not described in this specific embodiment.

[0169] For example, if the NLU module predicts the intent of "Dial 'Train to Busan'" as <intent: make a phone call, confidence: 0.7> and <intent: play a video, confidence: 0.3>, then the output of the NLU module will be <intent: make a phone call>.

[0170] Figure 7 illustrates one implementation of the NLU module, which consists of multiple stacked TRMs (eight TRMs are used in this specific implementation) followed by a multi-channel attention layer, a linear layer, and a logistic regression layer. As shown in Figure 7, each TRM includes a self-attention layer and a feed-forward network layer. Each layer within a TRM, as well as the multi-channel attention layer, may further employ residual connections.

[0171] For an NLU module of this structure, the process of generating the predicted intent is shown in Figure 4 and includes the following sub-steps S331-S334:

[0172] S331: Split the input text "Dial 'Train to Busan'" character by character to obtain the token vector of each character. In addition, add a classification token vector at the beginning and an end token vector at the end, marked as "CLS" and "SEP" respectively, for a total of 8 token vectors, namely Ecls, E1, E2... E6, Esep, which serve as the input of the first Trm. Here, E1 to E6 correspond to the token vectors of "Dial 'Train to Busan'".

[0173] S332: The first Trm receives each token vector, and each other Trm receives the vectors output by the previous Trm. Each Trm performs self-attention encoding and then outputs the vectors after self-attention encoding. The output can still be 8 vectors.

[0174] Among them, during self-attention encoding, the query (query vector), key (key vector), and value (value vector) corresponding to each token vector are calculated through linear mapping respectively, and the attention scores between tokens are represented by the normalization (i.e., softmax) of the inner product of the query and the key. When specifically calculating the attention scores between tokens, it can be directly calculated in matrix form, that is, calculate the query vector matrix Q, key vector matrix K, and value vector matrix V composed of each token vector, and calculate the attention score matrix through the dot product model shown in the following formula (3):

[0175] $\mathrm{softmax}(QK^{T})$ (3)

[0176] In addition, it should be noted that the output of the self-attention layer, that is, the vectors output after self-attention encoding of the input vectors, is expressed in matrix as the following formula (4):

[0177] $Z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{T})V$ (4)
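Formulas (3) and (4) can be traced with a small pure-Python sketch. The function name `self_attention`, the nested-list matrices, and the toy values are assumptions. The linear mappings that produce Q, K, and V from the token vectors are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(Q, K, V):
    """Return (A, Z) with A = softmax(Q K^T) row-wise and Z = A V.

    Q, K, V are lists of row vectors, one row per token.
    """
    A = [softmax([sum(q * k for q, k in zip(qi, kj)) for kj in K])
         for qi in Q]
    n_cols = len(V[0])
    Z = [[sum(a * v[c] for a, v in zip(row, V)) for c in range(n_cols)]
         for row in A]
    return A, Z


# With all-zero queries every score is 0, so attention is uniform and
# each output row is the average of the value rows.
A, Z = self_attention(Q=[[0.0, 0.0], [0.0, 0.0]],
                      K=[[1.0, 0.0], [0.0, 1.0]],
                      V=[[2.0, 4.0], [6.0, 8.0]])
```

Row i of A holds the attention of token i to every token, so the CLS row of A is the vector that the keyword detection step reads later.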

[0178] S333: The output of the last Trm is provided to the multi-channel self-attention layer (Multi-Head Attention Layer, also known as the multi-head self-attention layer), which captures different representations of each token in multiple channels. The output of each channel (also known as each head) of the multi-channel self-attention layer includes the vector corresponding to "CLS".

[0179] Referring specifically to Figure 7, in this specific embodiment, the multi-channel self-attention layer includes 12 channels. The input of each channel is still 8 vectors. Since the downstream task is classification, only the vector corresponding to "CLS" is taken as the output of each channel, that is, the outputs of these 12 channels are: Zcls1, Zcls2,..., Zcls12.

[0180] S334: The outputs of each channel of the multi-channel self-attention layer are passed to the linear layers of each channel, and then from the linear layers of each channel to the output of the logistic regression layer. The logistic regression layer is a classification layer, and its N outputs correspond to N intent categories. The outputs of the logistic regression layer are used to predict the probability value of each output (i.e., intent) through Softmax as the confidence level. The intent category corresponding to the highest confidence level is the NLU predicted intent.

[0181] In this specific embodiment, each channel's linear layer consists of 64 nodes. As shown in Figure 7, the output vector Zcls of each channel is connected to the 64 nodes of the linear layer of that channel. The 64 nodes of the linear layer of each channel are then fully connected to the nodes L1 to LN of the logistic regression layer (to keep Figure 7 clear, the full connection is not shown; only the connections to the logistic regression layer node L1 are shown).

[0182] The N outputs of the N nodes L1 to LN of the logistic regression layer correspond to N intents, including "make a phone call", "play a video", ... "listen to music". In this specific implementation, the confidence score for the intent "make a phone call" is 0.7, and the confidence score for the intent "play a video" is 0.3. The category confidence score for the intent "make a phone call" is the highest among all intent categories, therefore the predicted intent is "make a phone call".
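The final classification step, softmax over the N outputs followed by picking the highest confidence, can be sketched as follows. The logit values are hypothetical and were chosen only so that the confidences reproduce the 0.7 / 0.3 example above. `rank_intents` is an illustrative name.

```python
import math


def rank_intents(logits):
    """Softmax the classification-layer outputs and rank by confidence.

    logits: dict mapping intent name -> raw output of its node.
    Returns a list of (intent, confidence) pairs, highest first.
    """
    m = max(logits.values())
    exps = {name: math.exp(v - m) for name, v in logits.items()}
    total = sum(exps.values())
    conf = {name: e / total for name, e in exps.items()}
    return sorted(conf.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical logits that yield confidences of 0.7 and 0.3.
logits = {"play a video": math.log(0.3), "make a phone call": math.log(0.7)}
ranking = rank_intents(logits)
predicted_intent, confidence = ranking[0]
```

The same ranked list could also drive the confidence-sorted display used when the user selects the expected intent directly.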

[0183] S340: The update decision module receives the NLU's predicted intent "make a phone call" and the user's expected intent. When the user's expected intent is "play video", the update decision module determines that the NLU's predicted intent is inconsistent with the user's expected intent, and then triggers the key text detection module to perform keyword detection.

[0184] S350: The key text detection module performs keyword detection to determine which words in the text "Dial 'Train to Busan'" input to the NLU are strongly correlated with the NLU module's predicted intent "make a phone call", i.e., to identify negative keywords in that text. As shown in Figure 5, this step includes the following sub-steps S351-S355:

[0185] S351: Obtain the intent label index based on the NLU prediction result. In this specific embodiment, the intent label index is the index (call label) corresponding to the intent "make a phone call".

[0186] S352: Obtain the weights of the linear layer corresponding to the intent label index (call label). Referring to Figure 7, in this specific embodiment, the weights of the linear layer can be described using the following formula (5):

[0187]

[0188] Where m is the channel index, corresponding to the 12 channels of the multi-channel self-attention layer. For each channel, the weight of the linear layer corresponding to the intent label index (call label) of that channel is:

[0189] In this application, this weight is referred to as the contribution value of each channel.

[0190] Where k is the index of the 64 nodes in the linear layer of the current channel. The weight term is the weight from the k-th node of the linear layer in the current channel to the node labeled "making a phone call" in the logistic regression layer; for the first channel, the corresponding weights are shown in Figure 7. The output term is the output of the k-th node of the linear layer in the current channel; for the first channel, the corresponding outputs are shown in Figure 7.

[0191] S353: Divide the weights into channels according to the multi-channel attention layer index, and use the following formula (6) to obtain the contribution value of each channel to the prediction result. In this specific implementation, the channels with a contribution value greater than 0 are selected:

[0192]

[0193] Among them, channels with a contribution value greater than 0 are called positive channels, indicating that the channel has a strong correlation with the prediction intention of "making a phone call".

[0194] S354: Obtain the attention score matrices of the last K layers for the positive channels, and extract the attention score vector corresponding to the classification token CLS. In this application, the attention score vector corresponding to CLS is called the attention score matrix of CLS.

[0195] Figure 8 shows the attention score matrix of one of the self-attention layers. The attention scores related to the classification token CLS are the terms involving the query vector q1 or the key vector k1, that is, the first row and the first column in Figure 8. In this implementation, the first row is used to calculate the attention of CLS to each word.

[0196] S355: Based on the attention score matrix and channel contribution value corresponding to the classification word CLS, calculate the contribution value of each Chinese character in the text to the predicted intent "make a phone call". Characters with scores greater than a certain threshold are marked as negative intent keywords.

[0197] Still referring to Figure 8, which is the attention score matrix of one layer: the dot product of the query vector q1 of CLS with the key vector k of each other word token can be used as the attention of CLS to that word. Taking q1k2 in Figure 8 as an example, q1k2 in this layer corresponds to the vector E1, whose word is "dial"; that is, it is the attention of CLS in this layer to the word "dial". When the attention score matrices of K layers are used, adding up q1k2 over the K layers integrates the attention of CLS to the word "dial" across multiple layers. Since K layers are used, the attention information of high and low layers can be integrated.

[0198] As above, the attention of CLS to each word in the text can be calculated. Then, by multiplying the attention for each word by the channel contribution value, the contribution value of each word to the predicted intention "make a call" is obtained. An example calculation result in this specific implementation is as follows: ("dial", 0.908), ("one", 0.878), ("time", 1.0), ("cauldron", 0.019), ("mountain", 0.0), ("line", 0.131). In this specific implementation, words with scores greater than the set threshold of 0.5 are marked as negative intention keywords; therefore, "dial", "one", and "time" are negative keywords.
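The thresholding step can be reproduced directly from the values listed above. Only the function name `negative_keywords` is introduced here for illustration. The scores and the 0.5 threshold are the ones given in this embodiment.

```python
def negative_keywords(contributions, threshold):
    """Keep the words whose contribution value exceeds the threshold."""
    return [word for word, score in contributions if score > threshold]


# Contribution values from the example in this embodiment.
contributions = [
    ("dial", 0.908), ("one", 0.878), ("time", 1.0),
    ("cauldron", 0.019), ("mountain", 0.0), ("line", 0.131),
]
negatives = negative_keywords(contributions, threshold=0.5)
```

Raising the threshold narrows the set of candidates passed to the key text mining module, while lowering it admits words that contribute only weakly to the predicted intent.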

[0199] S360: For each negative keyword, the key text mining module searches the intention confusion matrix shown in Table 1 to check whether the word is in the keyword list of the predicted intention, that is, the keyword list of the recognized predicted intention "make a call" (i.e., the BB item in Table 1). If it is, the confused keyword list is obtained from the table entry whose true intention is the user's expected intention and whose confused intention is the predicted intention (i.e., the BA item in Table 1), and the positive keyword corresponding to the negative keyword is then matched from the confused keyword list according to the edit distance. As shown in Figure 6, this step may specifically include the following sub-steps S361 - S363:

[0200] S361: Obtain the keyword list of the predicted intention "make a call" from the intention confusion matrix shown in Table 1. For each negative keyword, for example "dial", determine whether the current negative keyword exists in the keyword list of "make a call", such as the BB item in Table 1. If it exists, execute the next step; otherwise, skip the word "dial" and judge the next negative keyword.

[0201] S362: Obtain a list of confusing keywords that confuse the user's expected intention (i.e., the true intention) into the predicted intention (i.e., the confusing intention) from the intention confusion matrix shown in Table 1. In this specific embodiment, obtain the list of confusing keywords corresponding to the true intention of "play video" and the confusing intention of "make a call", such as the BA item in Table 1.

[0202] S363: After obtaining the list of confusing keywords, search for the positive keyword corresponding to the negative keyword in the table according to the pinyin edit distance. For example, for the negative keyword "dial", the corresponding positive keyword "play" can be found in the corresponding list of confusing keywords.
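Sub-steps S361 to S363 can be sketched as a lookup followed by a nearest-match search. Everything concrete in the example is hypothetical, because the contents of Table 1 are not reproduced in this text: the dictionary layout, the list entries, the pronunciation strings in `pron`, and the toy distance function are placeholders. Choosing the candidate with the smallest distance is also an assumption.

```python
def correct_keyword(neg, predicted, expected,
                    keyword_lists, confusion_lists, distance):
    """Return the positive keyword for `neg`, or None if it is skipped.

    keyword_lists:   intent -> keywords of that intent (diagonal items)
    confusion_lists: (true intent, confused intent) -> confused keywords
    distance:        function(word_a, word_b) -> edit distance
    """
    # S361: skip words that are not keywords of the predicted intent.
    if neg not in keyword_lists.get(predicted, ()):
        return None
    # S362: list for "expected intent confused into predicted intent".
    candidates = [c for c in confusion_lists.get((expected, predicted), ())
                  if c != neg]
    if not candidates:
        return None
    # S363: choose the candidate closest to the negative keyword.
    return min(candidates, key=lambda c: distance(neg, c))


# Hypothetical data standing in for Table 1 and a pinyin lookup.
keyword_lists = {"make a phone call": ["dial", "number"]}
confusion_lists = {("play a video", "make a phone call"): ["play", "watch"]}
pron = {"dial": "bo1", "play": "bo1", "watch": "kan4", "one": "yi1"}


def toy_distance(a, b):
    """Position-wise mismatches plus length difference of the encodings."""
    pa, pb = pron[a], pron[b]
    return sum(x != y for x, y in zip(pa, pb)) + abs(len(pa) - len(pb))


fixed = correct_keyword("dial", "make a phone call", "play a video",
                        keyword_lists, confusion_lists, toy_distance)
skipped = correct_keyword("one", "make a phone call", "play a video",
                          keyword_lists, confusion_lists, toy_distance)
```

In practice a maximum allowed distance could be added so that a distant candidate is never accepted; the text does not specify one.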

[0203] S370: The error correction process is completed as above. Further, when the user's expected intention is inconsistent with the predicted intention of the NLU, each negative keyword and its corresponding positive keyword can also be used to construct a training corpus pair (TKP), such as the corpus pair: "dial - play", and update the text error correction module described in step S320 through this corpus pair.

[0204] In some embodiments, the text including each negative keyword (i.e., the text output by the text error correction module in step S320) and the text after replacing the negative keyword with the corresponding positive keyword can also be used to construct a training corpus pair (TKP), such as the corpus pair: "dial once 'Train to Busan' - play once 'Train to Busan'", and update the text error correction module through this corpus pair.
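
The two kinds of training corpus pair described in S370 can be sketched as follows. The helper `build_corpus_pairs` is hypothetical, and plain substring replacement is a simplification chosen for the illustration.

```python
def build_corpus_pairs(text, corrections):
    """Build keyword-level and sentence-level training pairs.

    corrections maps each negative keyword to its positive keyword.
    """
    keyword_pairs = list(corrections.items())
    corrected = text
    for negative, positive in corrections.items():
        # Simplification: replaces every occurrence of the substring.
        corrected = corrected.replace(negative, positive)
    sentence_pair = (text, corrected)
    return keyword_pairs, sentence_pair


keyword_pairs, sentence_pair = build_corpus_pairs(
    "dial once 'Train to Busan'", {"dial": "play"})
print(keyword_pairs)
print(sentence_pair)
```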

[0205] This application also provides a corresponding embodiment of an error correction device. For the beneficial effects or technical problems solved by this device, reference can be made to the descriptions in the methods corresponding to each device, or to the description of the method in the summary of the invention. Details will not be repeated here.

[0206] As shown in Figure 1, in the embodiment of the text error correction device provided by this application, the device includes:

[0207] A natural language understanding module 111, that is, an NLU module 111, for identifying the intention category of the text T to obtain a predicted intention A;

[0208] A key text detection module 113, for determining the contribution value of each word in the text T to the predicted intention A when the predicted intention A does not match the user's expected intention B, and selecting the words with a contribution value greater than the threshold as negative keywords;

[0209] A key text mining module 114, for matching, for each negative keyword, the positive keyword corresponding to the negative keyword from the intention confusion list based on the edit distance, and using it as the corrected word for the negative keyword; the intention confusion list records each keyword that confuses the expected intention B into the predicted intention A.

[0210] The determination of whether the predicted intent A matches the user's expected intent B can be achieved by the update decision module 112.
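
How modules 111 to 114 cooperate can be sketched as a small pipeline. The class name, the callable signatures, and the stub modules are assumptions made for illustration; the application does not specify a programming interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextCorrectionDevice:
    nlu: Callable[[str], str]                       # module 111
    detect: Callable[[str, str], List[str]]         # module 113
    mine: Callable[[str, str, str], Optional[str]]  # module 114

    def correct(self, text: str, expected_intent: str) -> Dict[str, str]:
        predicted = self.nlu(text)
        # Role of update decision module 112: act only on a mismatch.
        if predicted == expected_intent:
            return {}
        corrections = {}
        for negative in self.detect(text, predicted):
            positive = self.mine(negative, expected_intent, predicted)
            if positive is not None:
                corrections[negative] = positive
        return corrections


# Stub modules standing in for trained components.
device = TextCorrectionDevice(
    nlu=lambda text: "make a call",
    detect=lambda text, predicted: ["dial", "one"],
    mine=lambda negative, expected, predicted: {"dial": "play"}.get(negative),
)
print(device.correct("dial once", "play video"))
```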

[0211] In some embodiments, the text T is generated by a text correction module correcting a source text; in this case, the text correction device of this application may further be configured to form text pairs from the negative keywords and the corrected words, as corpus for training the text correction module.

[0212] In some embodiments, the natural language understanding module includes at least one self-attention layer;

[0213] The step of determining the contribution value of each word in the text T to the prediction intention A includes:

[0214] For the last K self-attention layers of NLU, obtain the attention score matrix of CLS for each layer; CLS is the prefix character added to the text T; the attention score matrix of CLS includes the attention score of CLS relative to each word in the text T; K is an integer not less than 1;

[0215] The attention score matrices of the CLS in the K layers are summed, and the result is used as a matrix of the contribution values of each word to the predicted intent A.
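
The summation over the last K layers can be sketched in plain Python. The layout of `attention_layers` (one square matrix per layer, with CLS at index 0) and the function name are assumptions made for this illustration.

```python
def cls_contributions(attention_layers, k):
    """Sum the CLS attention rows of the last k self-attention layers.

    attention_layers[l][i][j] is the attention score of token i toward
    token j in layer l; index 0 is assumed to be the CLS token.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    last_layers = attention_layers[-k:]
    seq_len = len(last_layers[0][0])
    totals = [0.0] * seq_len
    for layer in last_layers:
        cls_row = layer[0]  # attention of CLS relative to every token
        for j, score in enumerate(cls_row):
            totals[j] += score
    return totals[1:]  # drop the score of CLS relative to itself


# Three toy layers over the tokens [CLS, w1, w2]; only row 0 matters here.
layers = [
    [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.25, 0.5, 0.25], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
print(cls_contributions(layers, k=2))
```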

[0216] In some embodiments, the natural language understanding module further includes a multi-channel attention layer stacked sequentially after the multiple self-attention layers, a linear layer corresponding to each channel of the multi-channel attention layer, and a logistic regression layer; the logistic regression layer includes an output node whose intent category is the predicted intent A;

[0217] The step of determining the contribution value of each word in the text T to the prediction intention A further includes:

[0218] The channel contribution value of each channel in the multi-channel attention layer is determined, and the sum of the attention score matrices of the CLS in the K layers is multiplied by the channel contribution values greater than 0. The result is used as the matrix of the contribution values of each word to the predicted intent A.

[0219] The channel contribution value of each channel is computed from two quantities: the output value of the k-th node of the linear layer corresponding to this channel, and the weight from the k-th node to the output node of the predicted intent A in the logistic regression layer.
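
A hedged sketch of the channel weighting follows. It assumes that the channel contribution is the sum over k of the k-th linear-layer output times the k-th weight to the output node of intent A, which is one natural reading of the two quantities named above, and it further assumes one summed CLS attention vector per channel. Both function names are hypothetical.

```python
def channel_contribution(linear_outputs, weights_to_intent):
    """Assumed form: sum over k of (output of node k) * (weight of node k)."""
    return sum(h * w for h, w in zip(linear_outputs, weights_to_intent))


def weighted_contributions(channel_attention, channel_values):
    """Scale each channel's summed CLS attention by its positive contribution.

    channel_attention[c][j] is the summed CLS attention toward word j in
    channel c (assumed layout); channels with contribution <= 0 are ignored.
    """
    n_words = len(channel_attention[0])
    totals = [0.0] * n_words
    for attention, value in zip(channel_attention, channel_values):
        if value <= 0:
            continue
        for j, score in enumerate(attention):
            totals[j] += score * value
    return totals


values = [channel_contribution([1.0, 2.0], [0.5, 0.25]),   # positive channel
          channel_contribution([1.0, 1.0], [-1.0, 0.5])]   # negative channel
print(weighted_contributions([[0.5, 0.25], [0.25, 0.75]], values))
```

Dropping channels whose contribution is not positive keeps only the evidence that pushed the model toward the predicted intent.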

[0220] In some embodiments, the method for calculating the attention score of CLS relative to each word in text T includes one of the following:

[0221] Attention scores are calculated based on the query vector of CLS and the key vectors of the other words.

[0222] Attention scores are calculated based on the query vector of each word and the key vector of CLS.

[0223] The first attention score is calculated based on the query vector of CLS and the key vectors of other words; the second attention score is calculated based on the query vector of each word and the key vector of CLS; the first and second attention scores corresponding to the same word are summed.
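
The three options above can be sketched as follows. The sketch uses a raw dot product without softmax normalization, and the `mode` names and function names are hypothetical; index 0 is assumed to be CLS.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cls_attention_scores(queries, keys, mode="cls_query"):
    """Score CLS against each word; queries[i] and keys[i] belong to token i."""
    # Option 1: query vector of CLS with the key vector of each word.
    forward = [dot(queries[0], keys[j]) for j in range(1, len(keys))]
    # Option 2: query vector of each word with the key vector of CLS.
    backward = [dot(queries[j], keys[0]) for j in range(1, len(queries))]
    if mode == "cls_query":
        return forward
    if mode == "cls_key":
        return backward
    if mode == "sum":
        # Option 3: add the two scores that belong to the same word.
        return [f + b for f, b in zip(forward, backward)]
    raise ValueError("unknown mode: " + mode)


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.5, 0.5], [2.0, 0.0], [0.0, 3.0]]
print(cls_attention_scores(queries, keys, mode="sum"))
```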

[0224] In some embodiments, the attention score is calculated using a computational model of query vector and key vector; the computational model includes one of the following:

[0225] Dot product model, scaled dot product model, additive model, bilinear model.
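
For reference, the four scoring models are commonly written as below. This is a sketch of the textbook forms; the parameter names (`w_q`, `w_k`, `v`, `w`) are assumptions, and the application does not state which parameterization it uses.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def dot_product_score(q, k):
    return dot(q, k)


def scaled_dot_product_score(q, k):
    # Divide by the square root of the key dimension.
    return dot(q, k) / math.sqrt(len(k))


def additive_score(q, k, w_q, w_k, v):
    # v . tanh(W_q q + W_k k)
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(w_q, q), matvec(w_k, k))]
    return dot(v, hidden)


def bilinear_score(q, k, w):
    # q . (W k)
    return dot(q, matvec(w, k))


q = [1.0, 2.0, 0.0, 0.0]
k = [3.0, 4.0, 0.0, 0.0]
print(dot_product_score(q, k), scaled_dot_product_score(q, k))
```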

[0226] In some embodiments, the edit distance includes one of the following: pinyin edit distance, input method edit distance, and character shape edit distance.

[0227] In some embodiments, the key text mining module is further configured to determine that the negative keyword is in the keyword list of the predicted intent A.

[0228] In some embodiments, the keyword list of the predicted intent A is constructed in the following manner:

[0229] Obtain the corpus containing the predicted intent A, and calculate the word frequency (TF) value of each character in the corpus containing the predicted intent A using the following formula:

[0230]

[0231] The characters are sorted in descending order of TF value, and a set number of the top-ranked characters are used as the keyword list of the predicted intent A.
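
The keyword list construction can be sketched as follows. It assumes the usual definition of TF as the count of a token divided by the total number of tokens in the corpus of the intent; the function name, the alphabetical tie-break, and the toy corpus are hypothetical. For Chinese text each token would be a single character.

```python
from collections import Counter


def keyword_list_by_tf(corpus, top_n):
    """Rank tokens of one intent's corpus by term frequency.

    corpus is a list of token lists. TF is assumed to be
    count(token) / total number of tokens.
    """
    counts = Counter(token for sentence in corpus for token in sentence)
    total = sum(counts.values())
    tf = {token: n / total for token, n in counts.items()}
    # Descending TF; ties are broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda token: (-tf[token], token))
    return ranked[:top_n], tf


corpus_a = [["dial", "mom"], ["dial", "the", "office"], ["call", "mom"]]
keywords_a, tf_a = keyword_list_by_tf(corpus_a, top_n=2)
print(keywords_a)
```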

[0232] In some embodiments, the keywords in the intention confusion list that confuse the expected intent B into the predicted intent A are constructed in the following manner:

[0233] Calculate and merge the keyword lists for the expected intent B and the predicted intent A. Calculate the TF-IDF value for each word in the keyword list of the expected intent B, where TF-IDF = TF * IDF. The IDF is calculated according to the following formula, and the number of intents containing that word in the formula is 2:

[0234]

[0235] The characters are sorted in descending order of TF-IDF value, and a set number of the top-ranked characters are used as the keywords that confuse the expected intent B into the predicted intent A.
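
A hedged sketch of the TF-IDF ranking follows. It assumes IDF = log(N / n), where N = 2 is the number of intents considered (B and A) and n is the number of those intents whose corpus contains the token; this is one reading of the description, not a confirmed formula. For brevity the sketch works directly from the two corpora rather than from pre-built keyword lists, and all names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(corpus):
    counts = Counter(token for sentence in corpus for token in sentence)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def confusion_keywords(expected_corpus, predicted_corpus, top_n):
    """Rank tokens of expected intent B by TF-IDF against predicted intent A."""
    tf_b = term_frequencies(expected_corpus)
    tf_a = term_frequencies(predicted_corpus)
    n_intents = 2  # assumed: only intents B and A are considered
    scores = {}
    for token, tf in tf_b.items():
        containing = 1 + (token in tf_a)  # B always contains the token
        scores[token] = tf * math.log(n_intents / containing)
    ranked = sorted(scores, key=lambda token: (-scores[token], token))
    return ranked[:top_n], scores


corpus_b = [["play", "the", "video"], ["play", "a", "movie"]]
corpus_a = [["dial", "the", "number"], ["call", "a", "friend"]]
top_b, scores_b = confusion_keywords(corpus_b, corpus_a, top_n=3)
print(top_b)
```

Under this assumption a token that occurs in both intents scores 0, so the list is dominated by tokens that are distinctive for the expected intent B.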

[0236] Figure 10 is a schematic structural diagram of a computing device 900 provided in an embodiment of this application. The computing device 900 includes: a processor 910, a memory 920, and a communication interface 930.

[0237] It should be understood that the communication interface 930 in the computing device 900 shown in Figure 10 can be used to communicate with other devices.

[0238] The processor 910 can be connected to the memory 920. The memory 920 can be used to store the program code and data. Therefore, the memory 920 can be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a component that includes both the storage unit inside the processor 910 and the external storage unit independent of the processor 910.

[0239] Optionally, the computing device 900 may also include a bus. The memory 920 and communication interface 930 can be connected to the processor 910 via the bus. The bus can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc.

[0240] It should be understood that in the embodiments of this application, the processor 910 may be a central processing unit (CPU). The processor may also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 910 may employ one or more integrated circuits to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0241] The memory 920 may include read-only memory and random access memory, and provides instructions and data to the processor 910. A portion of the processor 910 may also include non-volatile random access memory. For example, the processor 910 may also store device type information.

[0242] When the computing device 900 is running, the processor 910 executes the computer execution instructions in the memory 920 to perform the operation steps of the above method.

[0243] It should be understood that the computing device 900 according to the embodiments of this application can correspond to the corresponding subject in executing the methods according to the various embodiments of this application, and the above and other operations and / or functions of each module in the computing device 900 are respectively for implementing the corresponding processes of the methods of this embodiment. For the sake of brevity, they will not be described in detail here.

[0244] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0245] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0246] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0247] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0248] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0249] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0250] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, performs the text error correction method, including at least one of the schemes described in the above embodiments.

[0251] The computer storage medium in this application embodiment can be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. For example, a computer-readable storage medium can be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0252] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device.

[0253] The program code contained on a computer-readable medium may be transmitted using any suitable medium, including, but not limited to, wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0254] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0255] Note that the above are merely preferred embodiments and the technical principles employed in this application. Those skilled in the art will understand that this application is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of this application. Therefore, although this application has been described in detail through the above embodiments, this application is not limited to the above embodiments. Many other equivalent embodiments may be included without departing from the concept of this application, all of which fall within the scope of protection of this application.

Claims

1. A method for text error correction, characterized in that it comprises: identifying the intention category of a text T by a natural language understanding (NLU) model to obtain a predicted intent A; when the predicted intent A does not match the user's expected intent B, determining the contribution value of each word in the text T to the predicted intent A, and selecting words with contribution values greater than a threshold as negative keywords, wherein the contribution value of each word in the text T to the predicted intent A is calculated based on the attention score of each word in the text T relative to the predicted intent A; for each of the negative keywords, matching the positive keyword corresponding to the negative keyword from the intent confusion list of the expected intent B relative to the predicted intent A based on the edit distance, and using the positive keyword as the corrected word for the negative keyword, wherein the intent confusion list records each keyword that can correctly describe the expected intent B when the predicted intent A and the expected intent B do not match; before matching the positive keyword corresponding to the negative keyword from the intent confusion list, the method further comprises: determining that the negative keyword is in the keyword list of the predicted intent A, wherein the keyword list of the predicted intent A is constructed in the following manner: obtaining the corpus of the predicted intent A, and calculating the word frequency (TF) value of each word in the corpus of the predicted intent A using the following formula, on a word-by-word basis: ; sorting the characters in descending order of TF value, and using a set number of the top-ranked characters as the content of the keyword list of the predicted intent A.

2. The method according to claim 1, characterized in that: The text T is generated by a text correction module that corrects errors in a source text. It also includes: forming text pairs with the negative keywords and the corrected words, which are used as corpus for training the text correction module.

3. The method according to claim 1, characterized in that: the NLU includes at least one self-attention layer; the step of determining the contribution value of each word in the text T to the predicted intent A includes: for the last K self-attention layers of the NLU, obtaining the attention score matrix of CLS for each layer, where CLS is the prefix character added to the text T, the attention score matrix of CLS includes the attention score of CLS relative to each word in the text T, and K is an integer not less than 1; and summing the attention score matrices of the CLS in the K layers, and using the result as a matrix of the contribution values of each word to the predicted intent A.

4. The method according to claim 3, characterized in that: the NLU further includes a multi-channel attention layer stacked sequentially after the multiple self-attention layers, a linear layer corresponding to each channel of the multi-channel attention layer, and a logistic regression layer; the logistic regression layer includes an output node whose intent category is the predicted intent A; the step of determining the contribution value of each word in the text T to the predicted intent A further includes: determining the channel contribution value of each channel in the multi-channel attention layer, multiplying the sum of the attention score matrices of the CLS in the K layers by the channel contribution values greater than 0, and using the result as the matrix of the contribution values of each word to the predicted intent A; the channel contribution value of each channel is computed from the output value of the k-th node of the linear layer corresponding to this channel and the weight from the k-th node to the output node of the predicted intent A in the logistic regression layer.

5. The method according to claim 3 or 4, characterized in that: the method for calculating the attention score of CLS relative to each word in the text T includes one of the following: the attention score is calculated based on the query vector of CLS and the key vectors of the other words; the attention score is calculated based on the query vector of each word and the key vector of CLS; a first attention score is calculated based on the query vector of CLS and the key vectors of the other words, a second attention score is calculated based on the query vector of each word and the key vector of CLS, and the first and second attention scores corresponding to the same word are summed.

6. The method according to claim 5, characterized in that, The attention score is calculated using a computational model of query vector and key vector; the computational model includes one of the following: Dot product model, scaled dot product model, additive model, bilinear model.

7. The method according to claim 1, characterized in that: the edit distance includes one of the following: pinyin edit distance, input method edit distance, and character shape edit distance.

8. The method according to claim 1, characterized in that: the keywords in the intent confusion list that can correctly describe the expected intent B when the predicted intent A does not match the expected intent B are constructed in the following way: calculating and merging the keyword lists of the expected intent B and the predicted intent A; calculating the TF-IDF value of each word in the keyword list of the expected intent B, where TF-IDF = TF * IDF, the IDF is calculated according to the following formula, and the number of intents containing that word in the formula is 2: ; sorting the characters in descending order of TF-IDF value, and using a set number of the top-ranked characters as the keywords that can correctly describe the expected intent B when the predicted intent A does not match the expected intent B.

9. A text correction device, characterized in that it comprises: a natural language understanding module, used to identify the intent category of a text T and obtain a predicted intent A; a key text detection module, used to determine the contribution value of each word in the text T to the predicted intent A when the predicted intent A does not match the user's expected intent B, and to select words with contribution values greater than a threshold as negative keywords, wherein the contribution value of each word in the text T to the predicted intent A is calculated based on the attention score of each word in the text T relative to the predicted intent A; a key text mining module, used to match, for each negative keyword, the positive keyword corresponding to the negative keyword from the intent confusion list of the expected intent B relative to the predicted intent A based on the edit distance, and to use the positive keyword as the corrected word for the negative keyword, wherein the intent confusion list records each keyword that can correctly describe the expected intent B when the predicted intent A and the expected intent B do not match; the key text mining module is further configured to determine that the negative keyword is in the keyword list of the predicted intent A, wherein the keyword list of the predicted intent A is constructed in the following manner: obtaining the corpus of the predicted intent A, and calculating the word frequency (TF) value of each word in the corpus of the predicted intent A using the following formula, on a word-by-word basis: ; sorting the characters in descending order of TF value, and using a set number of the top-ranked characters as the content of the keyword list of the predicted intent A.

10. The apparatus according to claim 9, characterized in that: The text T is generated by a text correction module that corrects errors in a source text. It also includes: forming text pairs with the negative keywords and the corrected words, which are used as corpus for training the text correction module.

11. The apparatus according to claim 9, characterized in that: the natural language understanding module includes at least one self-attention layer; the step of determining the contribution value of each word in the text T to the predicted intent A includes: for the last K self-attention layers of the NLU, obtaining the attention score matrix of CLS for each layer, where CLS is the prefix character added to the text T, the attention score matrix of CLS includes the attention score of CLS relative to each word in the text T, and K is an integer not less than 1; and summing the attention score matrices of the CLS in the K layers, and using the result as a matrix of the contribution values of each word to the predicted intent A.

12. The apparatus according to claim 11, characterized in that: the natural language understanding module further includes a multi-channel attention layer stacked sequentially after the multiple self-attention layers, a linear layer corresponding to each channel of the multi-channel attention layer, and a logistic regression layer; the logistic regression layer includes an output node whose intent category is the predicted intent A; the step of determining the contribution value of each word in the text T to the predicted intent A further includes: determining the channel contribution value of each channel in the multi-channel attention layer, multiplying the sum of the attention score matrices of the CLS in the K layers by the channel contribution values greater than 0, and using the result as the matrix of the contribution values of each word to the predicted intent A; the channel contribution value of each channel is computed from the output value of the k-th node of the linear layer corresponding to this channel and the weight from the k-th node to the output node of the predicted intent A in the logistic regression layer.

13. The apparatus according to claim 11 or 12, characterized in that: the method for calculating the attention score of CLS relative to each word in the text T includes one of the following: the attention score is calculated based on the query vector of CLS and the key vectors of the other words; the attention score is calculated based on the query vector of each word and the key vector of CLS; a first attention score is calculated based on the query vector of CLS and the key vectors of the other words, a second attention score is calculated based on the query vector of each word and the key vector of CLS, and the first and second attention scores corresponding to the same word are summed.

14. The apparatus according to claim 13, characterized in that, The attention score is calculated using a computational model of query vector and key vector; the computational model includes one of the following: Dot product model, scaled dot product model, additive model, bilinear model.

15. The apparatus according to claim 9, characterized in that: the edit distance includes one of the following: pinyin edit distance, input method edit distance, and character shape edit distance.

16. The apparatus according to claim 9, characterized in that: the keywords in the intent confusion list that can correctly describe the expected intent B when the predicted intent A does not match the expected intent B are constructed in the following way: calculating and merging the keyword lists of the expected intent B and the predicted intent A; calculating the TF-IDF value of each word in the keyword list of the expected intent B, where TF-IDF = TF * IDF, the IDF is calculated according to the following formula, and the number of intents containing that word in the formula is 2: ; sorting the characters in descending order of TF-IDF value, and using a set number of the top-ranked characters as the keywords that can correctly describe the expected intent B when the predicted intent A does not match the expected intent B.

17. A computing device, characterized in that it comprises: a communication interface; at least one processor connected to the communication interface; and at least one memory connected to the processor and storing program instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1-8.

18. A computer-readable storage medium having program instructions stored thereon, characterized in that, When the program instructions are executed by a computer, the computer performs the method described in any one of claims 1-8.