Intention recognition model training method, intention recognition method and device
By deriving multiple first intent recognition models and combining them with a pre-trained second intent recognition model, and using language models, classifiers, and random forest models for training, the problem of low intent recognition accuracy in small sample scenarios is solved, achieving higher intent recognition accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA SOUTHERN POWER GRID BIG DATA SERVICE CO LTD
- Filing Date
- 2022-08-12
- Publication Date
- 2026-06-26
AI Technical Summary
In the early stages of developing human-computer dialogue systems, due to the limited number of training samples, the accuracy of intent recognition in existing technologies is low in small sample scenarios.
By deriving multiple first intent recognition models and combining them with a trained second intent recognition model, and using language models, classifiers, and random forest models for training, multiple probability distribution feature vectors are generated to improve the intent recognition accuracy of the model.
It improves the accuracy of the intent recognition model in small sample scenarios and enhances the robustness and adaptability of the model.
[Figure: abstract drawing, CN117668216B]
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to an intent recognition model training method, an intent recognition method, and an apparatus.
Background Technology
[0002] Intent recognition, a key task in human-computer dialogue systems, aims to determine a user's true intent based on the statements they exchange with the system. With the development of artificial intelligence technology, intent recognition has been widely applied in various aspects of life, such as intelligent voice assistants and intelligent customer service.
[0003] Currently, in the early stages of developing human-computer dialogue systems, the number of labeled samples available for training an intent recognition model is limited. To compensate for insufficient training samples, related technologies typically employ one of three approaches: optimization-based methods, which learn general initial model parameters so that the model reaches a relatively good level after a few updates; metric-based methods, which train the model using the distance distribution between samples so that it adapts better to samples of unknown categories; or data augmentation, which augments the target samples in a high-dimensional space or in the instance space.
[0004] However, since users' statements vary widely in style, models trained with the above methods still achieve low intent recognition accuracy in scenarios with small sample sizes.
Summary of the Invention
[0005] This application provides an intent recognition model training method, intent recognition method, and apparatus, which can perform intent recognition by deriving multiple first intent recognition models and combining the multiple first intent recognition models with a trained second intent recognition model, thereby improving the intent recognition accuracy of the model trained in scenarios with small sample sizes.
[0006] In a first aspect, embodiments of this application provide a method for training an intent recognition model, including:
[0007] Obtain k training sample sets, each of the k training sample sets including multiple corpus text samples and label information for each of the multiple corpus text samples; k is a positive integer;
[0008] For each of the k training sample sets, multiple corpus text samples in each training sample set are input into a pre-built language model to generate semantic features of the corpus text samples;
[0009] The semantic features corresponding to the text samples in the corpus are input into p classifiers respectively to generate p first intent recognition results for the text samples in the corpus; the p first intent recognition results correspond one-to-one with the p classifiers; p is a positive integer;
[0010] For each of the p first intent recognition results, a first intent recognition model is trained based on the first intent recognition result and the label information of the corpus text sample, so that N first intent recognition models are obtained across the k training sample sets; wherein each of the N first intent recognition models includes the language model and one of the p classifiers, and N = p*k;
[0011] Each text sample in the corpus is input into N pre-trained first intent recognition models to generate N probability distribution feature vectors for each text sample in the corpus; wherein, the probability distribution feature vectors include the probability values of the text sample in the corpus under various intents;
[0012] The random forest model is trained based on the N probability distribution feature vectors of the text samples and the label information of the text samples until the random forest model converges, thus obtaining the trained second intent recognition model.
[0013] In one possible implementation, the plurality of corpus text samples includes a plurality of first corpus text samples and a plurality of second corpus text samples obtained by data augmentation of the plurality of first corpus text samples;
[0014] The process of obtaining k training sample sets includes:
[0015] Obtain the plurality of first corpus text samples;
[0016] Perform at least one of the following on the plurality of first corpus text samples to obtain the plurality of second corpus text samples: random word deletion, homophone replacement, confusable word replacement, cloze-based data augmentation using a BERT model, and back-translation data augmentation;
[0017] The plurality of first corpus text samples and the plurality of second corpus text samples are divided into the k training sample sets.
[0018] In one possible implementation, the language model includes multiple hidden layers;
[0019] The step of inputting multiple corpus text samples from each training sample set into a pre-built language model to generate semantic features of the corpus text samples includes:
[0020] The corpus text samples are input into the pre-constructed language model, and the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer are extracted;
[0021] The first semantic features output by the multiple intermediate hidden layers and the second semantic features are fused to generate the semantic features of the corpus text samples.
[0022] In one possible implementation, the language model is the RoBERTa model.
[0023] In one possible implementation, obtaining the k training sample sets includes:
[0024] The plurality of corpus text samples are preprocessed, and the preprocessing includes at least one of the following: adjusting the encoding format, deleting illegal characters, converting the punctuation format, dividing the corpus into paragraphs, and converting the form of numbers.
[0025] In a second aspect, embodiments of this application provide an intent recognition method, including:
[0026] Obtain the corpus text;
[0027] The corpus text is input into N first intent recognition models as described in the first aspect or any possible implementation of the first aspect, and N probability distribution feature vectors of the corpus text are generated, wherein the probability distribution feature vectors are the probability values of the corpus text under various intents.
[0028] The N probability distribution feature vectors are input into the trained second intent recognition model described in the first aspect or any possible implementation of the first aspect to obtain the intent result corresponding to the corpus text.
[0029] In a third aspect, embodiments of this application provide an intent recognition model training apparatus, including:
[0030] The acquisition module is used to acquire k training sample sets, each of the k training sample sets including multiple corpus text samples and label information of each corpus text sample in the multiple corpus text samples; k is a positive integer;
[0031] The first generation module is used to input multiple corpus text samples from each of the k training sample sets into a pre-built language model to generate semantic features of the corpus text samples.
[0032] The second generation module is used to input the semantic features corresponding to the text samples of the corpus into p classifiers respectively, and generate p first intent recognition results of the text samples of the corpus; the p first intent recognition results correspond one-to-one with the p classifiers; p is a positive integer;
[0033] The first training module is used to train, for each of the p first intent recognition results, a first intent recognition model based on the first intent recognition result and the label information of the corpus text samples, so that N first intent recognition models are obtained across the k training sample sets; wherein each of the N first intent recognition models includes the language model and one of the p classifiers, and N = p*k;
[0034] The third generation module is used to input each of the corpus text samples into N pre-trained first intent recognition models to generate N probability distribution feature vectors for each corpus text sample; wherein, the probability distribution feature vectors include the probability values of the corpus text sample under various intents;
[0035] The second training module is used to train the random forest model based on the N probability distribution feature vectors of the text samples and the label information of the text samples, until the random forest model converges, thus obtaining the trained second intent recognition model.
[0036] In one possible implementation, the plurality of corpus text samples includes a plurality of first corpus text samples and a plurality of second corpus text samples obtained by data augmentation of the plurality of first corpus text samples; the acquisition module is used to:
[0037] Obtain the plurality of first corpus text samples;
[0038] Perform at least one of the following on the plurality of first corpus text samples to obtain the plurality of second corpus text samples: random word deletion, homophone replacement, confusable word replacement, cloze-based data augmentation using a BERT model, and back-translation data augmentation;
[0039] Divide the plurality of first corpus text samples and the plurality of second corpus text samples into the k training sample sets.
[0040] In one possible implementation, the language model includes multiple hidden layers; the first generation module is used for:
[0041] The corpus text samples are input into the pre-constructed language model, and the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer are extracted;
[0042] The first semantic features output by the multiple intermediate hidden layers and the second semantic features are fused to generate the semantic features of the corpus text samples.
[0043] In one possible implementation, the language model is the RoBERTa model.
[0044] In one possible implementation, the acquisition module is used for:
[0045] The plurality of corpus text samples are preprocessed, and the preprocessing includes at least one of the following: adjusting the encoding format, deleting illegal characters, converting the punctuation format, dividing the corpus into paragraphs, and converting the form of numbers.
[0046] In a fourth aspect, embodiments of this application provide an intent recognition device, including:
[0047] The acquisition module is used to acquire corpus text;
[0048] The generation module is used to input the corpus text into the N first intent recognition models described in the second aspect or any possible implementation of the second aspect, and generate N probability distribution feature vectors of the corpus text, wherein the probability distribution feature vectors are the probability values of the corpus text under various intents.
[0049] The determination module is used to input the N probability distribution feature vectors into the trained second intent recognition model described in the second aspect or any possible implementation of the second aspect, and determine the intent result corresponding to the corpus text.
[0050] In a fifth aspect, embodiments of this application provide a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When the computer program is executed by the processor, it implements the method provided in the first aspect or any possible implementation of the first aspect, or implements the method provided in the second aspect or any possible implementation of the second aspect.
[0051] In a sixth aspect, embodiments of this application provide a computer storage medium storing instructions that, when executed on a computer, cause the computer to perform the method provided in the first aspect or any possible implementation thereof, or to perform the method provided in the second aspect or any possible implementation thereof.
[0052] In the intent recognition model training method, intent recognition method, and apparatus provided in this application, a language model is connected to p classifiers, so that the language model extracts semantic features from a given corpus text sample. Using k training sample sets, p*k = N first intent recognition models can then be trained, each including the language model and one of the p classifiers. Corpus text samples from the k training sample sets are input into the N trained first intent recognition models, generating N probability distribution feature vectors for each corpus text sample; each probability distribution feature vector includes the probability values of the corpus text sample under various intents. A random forest model is then trained on these N probability distribution feature vectors to obtain the second intent recognition model. Thus, by setting multiple training sample sets and multiple classifiers, multiple first intent recognition models are derived, and their outputs provide the multiple probability distribution feature vectors on which the second intent recognition model is trained. Combining the multiple first intent recognition models with the trained second intent recognition model improves the intent recognition accuracy of models trained in scenarios with small sample sizes.
Attached Figure Description
[0053] Figure 1 is a flowchart of an intent recognition model training method provided in an embodiment of this application;
[0054] Figure 2 is a flowchart of an intent recognition method provided in an embodiment of this application;
[0055] Figure 3 is a schematic structural diagram of an intent recognition model training device provided in an embodiment of this application;
[0056] Figure 4 is a schematic structural diagram of an intent recognition device provided in an embodiment of this application;
[0057] Figure 5 is a schematic structural diagram of a computer device provided in an embodiment of this application.
Detailed Implementation
[0058] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application will be described below with reference to the accompanying drawings.
[0059] In the description of the embodiments of this application, the words "exemplary," "for example," or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplary," "for example," or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the words "exemplary," "for example," or "for instance" is intended to present the relevant concepts in a specific manner.
[0060] In the description of the embodiments in this application, the term "and / or" is merely a description of the association relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, B existing alone, and A and B existing simultaneously. Furthermore, unless otherwise stated, the term "multiple" means two or more. For example, multiple systems refer to two or more systems, and multiple screen terminals refer to two or more screen terminals.
[0061] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and their variations all mean "including but not limited to," unless otherwise specifically emphasized.
[0062] Intent recognition, a key task in human-computer dialogue systems, aims to determine a user's true intent based on the statements they exchange with the system. With the development of artificial intelligence technology, intent recognition has been widely applied in various aspects of life, such as intelligent voice assistants and intelligent customer service.
[0063] Currently, in the early stages of developing human-computer dialogue systems, labeled samples are limited. To compensate for insufficient training samples, related technologies typically employ one of three approaches: optimization-based methods, which learn general initial model parameters so that the model reaches a relatively good level after a few updates; metric-based methods, which train the model using the distance distribution between samples so that it adapts better to samples of unknown categories; or data augmentation, which augments the target samples in a high-dimensional space or in the instance space.
[0064] However, since users' statements are often varied in style, the above methods result in low intent recognition accuracy of models trained in scenarios with small sample sizes.
[0065] Based on this, embodiments of this application provide an intent recognition model training method, intent recognition method, and apparatus. By deriving multiple first intent recognition models and combining these multiple first intent recognition models with a trained second intent recognition model, intent recognition is performed, thereby improving the intent recognition accuracy of the model trained in scenarios with small sample sizes.
[0066] Figure 1 is a flowchart of an intent recognition model training method provided in an embodiment of this application. As shown in Figure 1, the intent recognition model training method provided in this embodiment may include S101 to S106.
[0067] S101: Obtain k training sample sets. Each training sample set includes multiple corpus text samples and label information for each corpus text sample; k is a positive integer.
[0068] The training sample set can be a pre-stored corpus of dialogues in a database. For example, it could be dialogues between customer service representatives and users in sectors such as electricity, telecommunications, and banking, or one-sided voice recordings from users. During the dialogue between customer service representatives and users, the voice recordings can be saved in the database, and the user's intent in each dialogue can be tagged. When constructing the training sample set, the voice recordings can be retrieved from the database and converted into text samples.
[0069] In some embodiments, to ensure the accuracy of the converted text samples, the text samples may be preprocessed. For example, preprocessing may include adjusting the encoding format, deleting illegal characters, converting the punctuation format, dividing the text into paragraphs, and converting the form of numbers.
[0070] For example, adjusting the encoding format may include uniformly converting the Chinese characters in the corpus text samples to UTF-8. Deleting illegal characters may include using regular expressions to remove characters other than Chinese characters, English words, numbers, and common punctuation marks. Converting numbers may include uniformly converting all Arabic numerals in the text to standard simplified Chinese characters. Punctuation format conversion may include uniformly converting half-width characters in the text to their corresponding full-width characters. Corpus segmentation may involve dividing the corpus into paragraphs and adding news numbers before each paragraph. In addition, noisy text in which the number of characters after punctuation is less than 6 may be deleted, and duplicate data may be removed from the corpus text samples by deduplication.
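As a rough illustration of the preprocessing described above, the following Python sketch applies a few of the steps: deleting illegal characters, converting punctuation to full width, dropping short noisy text, and deduplicating. The function names, the character whitelist, and the punctuation mapping are assumptions made for illustration, since the application does not specify them. The sketch also assumes that the length threshold of 6 counts characters once punctuation has been removed, which is only one possible reading of the paragraph above.

```python
import re

# Assumed whitelist: CJK ideographs, ASCII letters and digits, spaces, and a
# few common half-width and full-width punctuation marks.
_ILLEGAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9,.!?;:，。！？；：、 ]")

# Assumed mapping from half-width punctuation to full-width equivalents.
_HALF_TO_FULL = str.maketrans({",": "，", "!": "！", "?": "？", ";": "；", ":": "："})

# Punctuation and spaces, used only when counting the remaining characters.
_PUNCT = re.compile(r"[,.!?;:，。！？；：、 ]")


def clean_text(text):
    """Delete illegal characters, then convert punctuation to full width."""
    text = _ILLEGAL.sub("", text)
    return text.translate(_HALF_TO_FULL)


def preprocess(samples, min_chars=6):
    """Clean each sample, drop short noisy texts, and remove duplicates."""
    seen = set()
    result = []
    for raw in samples:
        cleaned = clean_text(raw)
        # Count characters with punctuation and spaces removed (assumption).
        if len(_PUNCT.sub("", cleaned)) < min_chars:
            continue
        if cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(cleaned)
    return result
```

Encoding adjustment and numeral conversion are left out of the sketch; in Python, text read with `encoding="utf-8"` is already held as Unicode strings, and numeral conversion would need a conversion table that the application does not give.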
[0071] In some embodiments, to ensure the robustness of the trained intent recognition model, after converting speech into corpus text, data augmentation can also be performed on the corpus text to obtain corpus text samples. For the convenience of description, the corpus text obtained by converting speech is referred to as the first corpus text sample. After obtaining multiple first corpus text samples, data augmentation can be performed on the first corpus text samples to obtain second corpus text samples. For example, randomly deleting words, replacing with homophonic words, replacing with confusable words, data augmentation based on cloze test of the BERT model, back translation data augmentation, and so on.
[0072] As an example, applying homophone replacement to the text "prescribe a pair of eyes" can yield the text "prescribe a pair of glasses" (in Chinese, the words for "eyes" and "glasses" are near-homophones). As another example, applying confusable word replacement to the text "stewed potatoes with browsing" can yield the text "stewed potatoes with beef brisket".
[0073] Exemplarily, back translation data augmentation can rewrite sentences without changing the original meaning of the sentences by translating Chinese to foreign languages and then back to Chinese. For example, a machine translation model can be used to construct a back translation dataset by translating the corpus text in the translation modes of Chinese-English-Chinese, Chinese-French-Chinese, Chinese-English-French-Chinese, and Chinese-French-English-Chinese.
[0074] In this way, the number of corpus text samples can be increased, thereby improving the robustness of the intent recognition model.
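The simpler augmentation operations above can be sketched in a few lines of Python. This is a minimal sketch, assuming the text has already been split into tokens and that a homophone or confusable-word table is available as a plain dictionary; the function names, the `drop_prob` parameter, and the `buy_glasses` label used in the usage example are hypothetical. Cloze-based and back-translation augmentation need external models and are not shown.

```python
import random


def random_delete(tokens, drop_prob, rng):
    """Drop each token with probability drop_prob, keeping at least one."""
    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(tokens)]


def replace_from_table(tokens, table):
    """Replace tokens that appear in a homophone or confusable-word table."""
    return [table.get(tok, tok) for tok in tokens]


def augment(first_samples, table, drop_prob=0.1, seed=0):
    """Build second corpus text samples from tokenized first samples.

    Each input is a (tokens, label) pair; the label is carried over unchanged.
    """
    rng = random.Random(seed)
    second_samples = []
    for tokens, label in first_samples:
        second_samples.append((random_delete(tokens, drop_prob, rng), label))
        replaced = replace_from_table(tokens, table)
        if replaced != tokens:
            second_samples.append((replaced, label))
    return second_samples
```

For instance, `augment([(["prescribe", "a", "pair", "of", "eyes"], "buy_glasses")], {"eyes": "glasses"})` produces one sample with random deletions and one with the table replacement applied, both keeping the original label.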
[0075] Take the first corpus text sample and the second corpus text sample as the corpus text samples for training the model, and divide the multiple corpus text samples into k training sample sets. For example, if there are 50 corpus text samples, the 50 corpus text samples can be divided into 5 training sample sets. Among them, each training sample set includes 10 corpus text samples.
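The division into k training sample sets can be sketched as follows. The shuffling step and the function name are assumptions; the application only states that the samples are divided into k sets.

```python
import random


def split_into_k_sets(samples, k, seed=0):
    """Shuffle the samples and deal them into k sets of near-equal size."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Set i receives every k-th sample, starting at position i.
    return [shuffled[i::k] for i in range(k)]
```

With 50 samples and k = 5, each of the five returned sets holds 10 samples, matching the example above.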
[0076] S102: For each of the k training sample sets, input the multiple corpus text samples in each training sample set into a pre-constructed language model respectively to generate semantic features of the corpus text samples.
[0077] Each of the k training sample sets is used to train a model. For example, if there are 5 training sample sets, namely sample set A, sample set B, sample set C, sample set D, and sample set E, then sample set A can train the pre-constructed model to obtain model A, sample set B can train the pre-constructed model to obtain model B, sample set C can train the pre-constructed model to obtain model C, sample set D can train the pre-constructed model to obtain model D, and sample set E can train the pre-constructed model to obtain model E.
[0078] Here, the pre-built model can be a language model, such as the RoBERTa model.
[0079] To further ensure the robustness of the trained intent recognition model, each text sample from each training sample set is input into the language model to generate semantic features corresponding to each text sample. For example, sample set A includes text samples a1, a2, and a3. Inputting text sample a1 into the language model generates semantic features corresponding to text sample a1; inputting text sample a2 into the language model generates semantic features corresponding to text sample a2.
[0080] In some embodiments, because intents can be expressed in diverse ways, feature fusion can be performed so that the extracted semantic features comprehensively represent the emotional information corresponding to different words in the corpus text samples. The language model includes multiple hidden layers; since each hidden layer contains its own self-attention structure, the semantic features output by different hidden layers represent semantic features of different depths. Corpus text samples can be input into the pre-constructed language model, and the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer are extracted; the first semantic features and the second semantic features are then fused to generate the semantic features of the corpus text samples. This allows the semantic features to comprehensively represent the emotional information corresponding to different words in the corpus text samples, ensuring the robustness of the trained intent recognition model.
[0081] It should be noted that, among the multiple hidden layers, the first hidden layer is the input hidden layer, the last hidden layer is the output hidden layer, and the remaining hidden layers are intermediate hidden layers. In this embodiment, the multiple intermediate hidden layers can be all of the intermediate hidden layers or only some of them.
[0082] As an example, the language model consists of five hidden layers: hidden layer 1, hidden layer 2, hidden layer 3, hidden layer 4, and hidden layer 5. These five hidden layers are interconnected. Text samples from a corpus can be input into the language model, and the semantic features output from hidden layer 2, hidden layer 3, and hidden layer 4, respectively, are fused with the semantic features output from hidden layer 5 to obtain the final semantic features.
[0083] As another example, the language model includes 12 hidden layers, namely hidden layer 1 to hidden layer 12. Some semantic features output by the middle hidden layers can be selected from the 12 hidden layers and fused with the semantic features output by the last hidden layer. For example, the semantic features output by hidden layer 5, hidden layer 7 and hidden layer 10 can be selected and fused with the semantic features output by hidden layer 12.
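The layer selection in the two examples above can be sketched as below. The application does not say which fusion operator is used, so the sketch offers concatenation and element-wise averaging purely as assumptions; the function name and the representation of each layer's output as a plain list of floats are also illustrative.

```python
def fuse_hidden_states(hidden_states, middle_layers, mode="concat"):
    """Fuse selected intermediate-layer features with the last layer's.

    hidden_states: list of per-layer feature vectors, index 0 = hidden layer 1.
    middle_layers: 1-based indices of the intermediate layers to use.
    """
    last = len(hidden_states)
    for layer in middle_layers:
        # The first and last hidden layers are not intermediate layers.
        if not 1 < layer < last:
            raise ValueError(f"layer {layer} is not an intermediate layer")
    selected = [hidden_states[layer - 1] for layer in middle_layers]
    selected.append(hidden_states[-1])
    if mode == "concat":
        return [value for vec in selected for value in vec]
    if mode == "mean":
        return [sum(values) / len(selected) for values in zip(*selected)]
    raise ValueError(f"unknown mode: {mode}")
```

For the 12-layer example, `fuse_hidden_states(states, [5, 7, 10])` combines the outputs of hidden layers 5, 7, and 10 with the output of hidden layer 12. Concatenation keeps every layer's features at the cost of a wider vector, while averaging keeps the original width.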
[0084] S103: Input the semantic features corresponding to the text samples in the corpus into p classifiers respectively to generate p first intent recognition results of the text samples in the corpus; the p first intent recognition results correspond one-to-one with the p classifiers; p is a positive integer.
[0085] To fully utilize the features expressed by semantic features and ensure the robustness of the trained intent recognition model, the semantic features can be input into multiple classifiers, resulting in multiple first intent recognition results. Each first intent recognition result represents the probability distribution score of the text sample in the corpus for different intents.
[0086] In some embodiments, multiple classifiers may include recurrent neural networks (RNNs), Transformer models, hierarchical attention networks (HANs), etc.
[0087] S104: For each of the p first intent recognition results, train a first intent recognition model based on the first intent recognition result and the label information of the corpus text samples to obtain N first intent recognition models; where each of the N first intent recognition models includes the language model and one of the p classifiers, and N = p * k.
[0088] Each of the p classifiers produces one first intent recognition result, so p classifiers output p first intent recognition results. A first intent recognition model can be trained based on each first intent recognition result and the label information of the corpus text samples. Thus, one training sample set can train p first intent recognition models, and with p classifiers and k training sample sets, p*k = N first intent recognition models can be obtained. p, k, and N are all positive integers.
[0089] As an example, there are three classifiers: Classifier 1, Classifier 2, and Classifier 3. There are five training sample sets: Training Sample Set 1 through Training Sample Set 5. The following training process is performed on each of these five training sample sets; Training Sample Set 1 is used here for illustration. Training Sample Set 1 includes text samples 1 through 5. The semantic features 1 of text sample 1 are input into the three classifiers, resulting in three first intent recognition results: Result 1, Result 2, and Result 3. First intent recognition model 1 is trained based on Result 1 and the corresponding label information of text sample 1. First intent recognition model 2 is trained based on Result 2 and the corresponding label information of text sample 1. First intent recognition model 3 is trained based on Result 3 and the corresponding label information of text sample 1. Next, first intent recognition models 1 through 3 continue to be trained on text samples 2 through 5 in the same way, resulting in trained first intent recognition models 1 through 3. Among them, first intent recognition model 1 includes the language model and Classifier 1, first intent recognition model 2 includes the language model and Classifier 2, and first intent recognition model 3 includes the language model and Classifier 3. Repeating this process for all five training sample sets yields a total of 3*5=15 first intent recognition models.
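The count N = p*k follows from pairing every training sample set with every classifier, which can be sketched as below. The `train_fn` callback is a hypothetical stand-in for the actual training of a language model plus one classifier head; it is not defined in the application.

```python
from itertools import product


def build_first_models(sample_sets, classifier_names, train_fn):
    """Train one first intent recognition model per (set, classifier) pair."""
    models = []
    pairs = product(enumerate(sample_sets, 1), classifier_names)
    for (set_idx, samples), clf_name in pairs:
        # train_fn is a placeholder for fine-tuning the language model
        # together with the named classifier on this sample set.
        model = train_fn(samples, clf_name)
        models.append({"sample_set": set_idx, "classifier": clf_name, "model": model})
    return models
```

With five sample sets and three classifier names, the returned list has 15 entries, one per combination, as in the example above.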
[0090] S105: Input each text sample from the corpus into N pre-trained first intent recognition models to generate N probability distribution feature vectors for each text sample from the corpus; wherein, the probability distribution feature vectors include the probability values of the text sample from the corpus under various intents.
[0091] In this embodiment, a text sample from the corpus is input into N first intent recognition models, resulting in N probability distribution feature vectors corresponding to the text sample. Each probability distribution feature vector represents the probability value of the text sample under various intents.
[0092] It should be noted that the text samples input into the first intent recognition model can be text samples from any training sample set.
[0093] In this way, even with a small sample size, it is possible to extract features from a corpus text sample more comprehensively through N first intent recognition models.
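One way to hand the N probability distribution feature vectors to a downstream model is to concatenate them into a single feature row per sample. The application does not state how the vectors are combined, so concatenation is an assumption here, as are the function name and the convention that each first model is a callable returning a list of probabilities.

```python
def stack_probability_vectors(text, first_models):
    """Concatenate the N probability distributions into one feature row."""
    features = []
    for predict_proba in first_models:
        probs = predict_proba(text)
        # Each vector should hold one probability per intent and sum to 1.
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each model must return a probability distribution")
        features.extend(probs)
    return features
```

With N first models and M intents, each sample becomes a row of N*M values.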
[0094] S106: Train the random forest model based on the N probability distribution feature vectors of the text samples and the label information of the text samples in the corpus until the random forest model converges, and obtain the trained second intent recognition model.
[0095] In this embodiment, a random forest model can be trained using N probability distribution feature vectors corresponding to each text sample in the corpus. This allows the trained random forest model to integrate features extracted by N different first intent recognition models, thereby obtaining more comprehensive features from the text samples in the corpus. For ease of description, the trained random forest model is referred to as the second intent recognition model.
[0096] Thus, the second intent recognition model can ultimately obtain more comprehensive features from the corpus text samples, and has better robustness.
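The idea behind S106 can be sketched as follows, under stated assumptions. The N probability distribution feature vectors of a sample are concatenated into one feature row (the embodiment does not state how they are combined), and the random forest is reduced to a bagged ensemble of depth-1 trees with random feature subsets, a deliberately simplified stand-in rather than a full random forest. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def flatten(vectors):
    """Concatenate N probability distribution feature vectors into one feature row."""
    return [p for vec in vectors for p in vec]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority label
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_sub = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(range(n_features), n_sub)    # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)

# Toy data: N = 2 first models, 2 intents, 6 labelled corpus text samples.
train_vectors = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.9, 0.1]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.2, 0.8], [0.1, 0.9]],
    [[0.3, 0.7], [0.3, 0.7]],
    [[0.1, 0.9], [0.2, 0.8]],
]
train_labels = [0, 0, 0, 1, 1, 1]
rows = [flatten(v) for v in train_vectors]
forest = fit_forest(rows, train_labels)
prediction = predict(forest, flatten([[0.85, 0.15], [0.75, 0.25]]))
```

Because every tree sees a different bootstrap sample and feature subset, the vote draws on the outputs of several first intent recognition models rather than any single one.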
[0097] The intent recognition model training method and device provided in this application embodiment set up multiple training sample sets and multiple classifiers to derive multiple first intent recognition models, and determine the probability distribution of the corpus text samples under various intents based on the first intent recognition models, obtaining multiple probability distribution feature vectors. A random forest is trained based on the multiple probability distribution feature vectors to obtain the second intent recognition model. The multiple first intent recognition models are combined with the trained second intent recognition model for intent recognition, improving the intent recognition accuracy of models trained in scenarios with small sample sizes.
[0098] Based on the first intent recognition model and the second intent recognition model in the above embodiments, this application also provides an intent recognition method. A detailed description follows.
[0099] Figure 2 is a flowchart of an intent recognition method provided in an embodiment of this application. As shown in Figure 2, the intent recognition method provided in this application embodiment may include S201 to S203.
[0100] S201: Obtain the corpus text.
[0101] In this embodiment, the speech generated during the dialogue can be acquired in real time and converted into corpus text.
[0102] S202: Input the corpus text into N first intent recognition models to generate N probability distribution feature vectors of the corpus text. The probability distribution feature vectors are the probability values of the corpus text under various intents.
[0103] S203: Input the N probability distribution feature vectors into the second intent recognition model to obtain the intent result corresponding to the corpus text.
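Steps S202 and S203 can be sketched as a two-stage pipeline. In this minimal illustration the models are plain callables, the N vectors are concatenated before being passed on (an assumption), and `toy_second_model` is a hypothetical placeholder that sums the probabilities per intent; it is not the trained random forest.

```python
def recognize_intent(text, first_models, second_model):
    """S202: collect N probability vectors; S203: feed them to the second model."""
    vectors = [model(text) for model in first_models]
    features = [p for vec in vectors for p in vec]  # concatenation is an assumption
    return second_model(features)

def toy_second_model(features, n_intents=3):
    """Hypothetical placeholder: sum each intent's probability across models, take the argmax."""
    totals = [sum(features[i::n_intents]) for i in range(n_intents)]
    return totals.index(max(totals))

# Hypothetical stand-ins for N = 2 trained first intent recognition models.
first_models = [
    lambda text: [0.7, 0.2, 0.1],
    lambda text: [0.6, 0.3, 0.1],
]
intent_index = recognize_intent("check my electricity bill", first_models, toy_second_model)
```

In a real system `second_model` would be the trained second intent recognition model, and `intent_index` would map to an intent label.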
[0104] In the intent recognition method provided in this application embodiment, the corpus text is input into the models trained in the embodiment corresponding to Figure 1 to perform intent recognition. Multiple training sample sets and multiple classifiers are used to derive multiple first intent recognition models. Based on these first intent recognition models, the probability distribution of the corpus text samples under various intents is determined, resulting in multiple probability distribution feature vectors. A random forest is trained based on these multiple probability distribution feature vectors to obtain the second intent recognition model. Combining the multiple first intent recognition models with the trained second intent recognition model improves the accuracy of intent recognition in scenarios with small sample sizes.
[0105] Based on the intent recognition model training method in the above embodiments, this application also provides an intent recognition model training device. Figure 3 is a schematic diagram of the structure of an intent recognition model training device 300 provided in an embodiment of this application. As shown in Figure 3, the intent recognition model training device 300 may include an acquisition module 301, a first generation module 302, a second generation module 303, a first training module 304, a third generation module 305, and a second training module 306.
[0106] The acquisition module 301 is used to acquire k training sample sets. Each of the k training sample sets includes multiple corpus text samples and label information for each corpus text sample in the multiple corpus text samples; k is a positive integer.
[0107] The first generation module 302 is used to input multiple corpus text samples from each of the k training sample sets into a pre-built language model to generate semantic features of the corpus text samples.
[0108] The second generation module 303 is used to input the semantic features corresponding to the text samples of the corpus into p classifiers respectively, and generate p first intent recognition results of the text samples of the corpus; the p first intent recognition results correspond one-to-one with the p classifiers; p is a positive integer;
[0109] The first training module 304 is used to train, for each of the p first intent recognition results, a first intent recognition model based on the first intent recognition result and the label information of the corpus text samples, so as to obtain N first intent recognition models; wherein each of the N first intent recognition models includes the language model and one of the p classifiers, and N = p * k;
[0110] The third generation module 305 is used to input each text sample of the corpus into N pre-trained first intent recognition models to generate N probability distribution feature vectors for each text sample of the corpus; wherein, the probability distribution feature vectors include the probability values of the text sample of the corpus under various intents.
[0111] The second training module 306 is used to train the random forest model based on the N probability distribution feature vectors of the text samples and the label information of the text samples in the corpus until the random forest model converges, thus obtaining the trained second intent recognition model.
[0112] In one possible implementation, the multiple corpus text samples include multiple first corpus text samples and multiple second corpus text samples obtained by data augmentation of the multiple first corpus text samples; the acquisition module 301 is used for:
[0113] Obtaining the multiple first corpus text samples;
[0114] Obtaining the multiple second corpus text samples by performing at least one of the following on the multiple first corpus text samples: random word deletion, homophone replacement, confusable word replacement, cloze-style data augmentation based on the BERT model, and back-translation data augmentation;
[0115] Dividing the multiple first corpus text samples and the multiple second corpus text samples into the k training sample sets.
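One of the listed augmentations, random word deletion, followed by the division into k training sample sets, might look like the sketch below. It assumes whitespace-separated tokens (Chinese text would need a word segmenter first) and a round-robin split; the deletion probability, the example sentences, and the label names are hypothetical.

```python
import random

def random_word_deletion(text, p_delete, rng):
    """Drop each whitespace-separated token with probability p_delete."""
    words = text.split()
    kept = [w for w in words if rng.random() >= p_delete]
    return " ".join(kept) if kept else text  # never return an empty sample

def split_into_k_sets(samples, k):
    """Round-robin split of (text, label) pairs into k training sample sets."""
    return [samples[i::k] for i in range(k)]

rng = random.Random(0)
first_samples = [
    ("please check my electricity bill", "query_bill"),
    ("the power is out in my street", "report_outage"),
    ("how do I change my payment method", "change_payment"),
]
# Second corpus text samples keep the label of the sample they were derived from.
second_samples = [(random_word_deletion(t, 0.2, rng), y) for t, y in first_samples]
training_sets = split_into_k_sets(first_samples + second_samples, k=3)
```

The other augmentations (homophone replacement, back-translation, and so on) would slot in alongside `random_word_deletion` in the same way.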
[0116] In one possible implementation, the language model includes multiple hidden layers; the first generation module 302 is used for:
[0117] Input the text samples into a pre-built language model and extract the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer.
[0118] The first semantic features output by the multiple intermediate hidden layers and the second semantic features output by the last hidden layer are fused to generate the semantic features of the corpus text samples.
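The fusion step can be illustrated with a minimal sketch. The fusion operator is not fixed in this implementation; here it is assumed that the intermediate layers are averaged element-wise and the result is concatenated with the last layer's output. `fuse_hidden_states` and `n_intermediate` are hypothetical names, and each layer is represented by a single pooled vector.

```python
def fuse_hidden_states(hidden_states, n_intermediate=3):
    """hidden_states: one pooled feature vector per hidden layer, ordered first to last."""
    last = hidden_states[-1]                               # second semantic features
    intermediate = hidden_states[-1 - n_intermediate:-1]   # first semantic features
    mean_inter = [sum(col) / len(intermediate) for col in zip(*intermediate)]
    return mean_inter + last                               # concatenate the two parts

# Toy example: four hidden layers, feature dimension 2.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = fuse_hidden_states(layers, n_intermediate=3)
```

Other fusion choices, such as weighted sums or attention over layers, would fit the same interface.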
[0119] In one possible implementation, the language model is the RoBERT model.
[0120] In one possible implementation, the acquisition module 301 is used for:
[0121] Multiple text samples from the corpus are preprocessed. The preprocessing includes at least one of the following: adjusting the encoding format, deleting illegal characters, converting punctuation format, dividing the corpus into paragraphs, and converting numbers to their original form.
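A possible shape for this preprocessing is sketched below, using only the Python standard library. The concrete choices are assumptions: UTF-8 as the target encoding, Unicode control and format characters treated as illegal characters, NFKC normalization used to convert full-width punctuation and digits to their half-width forms, and blank lines taken as paragraph boundaries.

```python
import re
import unicodedata

def preprocess(raw, encoding="utf-8"):
    """Decode, normalize, strip illegal characters, and split into paragraphs."""
    text = raw.decode(encoding, errors="replace")       # adjust the encoding format
    text = unicodedata.normalize("NFKC", text)          # full-width punctuation/digits -> half-width
    text = "".join(
        ch for ch in text
        if ch == "\n" or unicodedata.category(ch)[0] != "C"  # drop control/format characters
    )
    text = text.replace("\ufffd", "")                   # drop decoding replacement characters
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

# Full-width comma, exclamation mark, and digits are written as escapes.
raw = "Hello\x00\uff0cworld\uff01\n\n\uff11\uff12\uff13 kWh".encode("utf-8")
paragraphs = preprocess(raw)
print(paragraphs)
```

Note that NFKC leaves some CJK punctuation, such as the ideographic full stop, unchanged; a dedicated mapping table would be needed if those must be converted as well.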
[0122] The intent recognition model training device provided in this application embodiment can perform the steps of the method in the embodiment corresponding to Figure 1 and can achieve the same technical effect. To avoid repetition, they will not be described in detail here.
[0123] The intent recognition model training device provided in this application can derive multiple first intent recognition models by setting multiple training sample sets and multiple classifiers, and determine the probability distribution of the corpus text samples under various intents based on the first intent recognition models, thereby obtaining multiple probability distribution feature vectors. A random forest is trained based on these multiple probability distribution feature vectors to obtain the second intent recognition model. Combining the multiple first intent recognition models with the trained second intent recognition model improves the accuracy of intent recognition in scenarios with small sample sizes.
[0124] Based on the intent recognition method in the above embodiments, this application also provides an intent recognition device. Figure 4 is a schematic diagram of the structure of the intent recognition device 400 provided in an embodiment of this application. As shown in Figure 4, the intent recognition device 400 provided in this application embodiment may include an acquisition module 401, a generation module 402, and a determination module 403.
[0125] The acquisition module 401 is used to acquire the corpus text;
[0126] The generation module 402 is used to input the corpus text into N first intent recognition models in the second aspect or any possible implementation of the second aspect, and generate N probability distribution feature vectors of the corpus text, wherein the probability distribution feature vectors are the probability values of the corpus text under various intents.
[0127] The determination module 403 is used to input N probability distribution feature vectors into the trained second intent recognition model in the second aspect or any possible implementation of the second aspect, and determine the intent result corresponding to the corpus text.
[0128] The intent recognition device provided in this application embodiment can perform the steps of the method in the embodiment corresponding to Figure 2 and can achieve the same technical effect. To avoid repetition, they will not be described in detail here.
[0129] The intent recognition device provided in this application embodiment inputs the corpus text into the models trained in the embodiment corresponding to Figure 1 to perform intent recognition. Multiple training sample sets and multiple classifiers are used to derive multiple first intent recognition models. Based on these first intent recognition models, the probability distribution of the corpus text samples under various intents is determined, resulting in multiple probability distribution feature vectors. A random forest is trained based on these multiple probability distribution feature vectors to obtain the second intent recognition model. Combining the multiple first intent recognition models with the trained second intent recognition model improves the accuracy of intent recognition in scenarios with small sample sizes.
[0130] The following describes a computer device provided by an embodiment of this application.
[0131] Figure 5 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. As shown in Figure 5, the computer device provided in the embodiments of this application can be used to implement the intent recognition model training method or the intent recognition method described in the above method embodiments.
[0132] The computer device may include a processor 501 and a memory 502 storing computer program instructions.
[0133] Specifically, the processor 501 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.
[0134] Memory 502 may include mass storage for data or instructions. By way of example and not limitation, memory 502 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 502 may include removable or non-removable (or fixed) media. Where appropriate, memory 502 may be internal or external to the computer device. In a particular embodiment, memory 502 is non-volatile solid-state memory.
[0135] Memory may include read-only memory (ROM), random access memory (RAM), disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical / tangible memory storage devices. Therefore, typically, memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to this application.
[0136] The processor 501 reads and executes computer program instructions stored in the memory 502 to implement any of the intent recognition model training methods or intent recognition methods in the above embodiments.
[0137] In one example, the computer device may also include a communication interface 505 and a bus 510. As shown in Figure 5, the processor 501, the memory 502, and the communication interface 505 are connected through the bus 510 and communicate with each other through it.
[0138] The communication interface 505 is mainly used to realize communication between various modules, devices, units and / or equipment in the embodiments of this application.
[0139] Bus 510 includes hardware, software, or both, that couples components of the computer device together. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable buses, or combinations of two or more of these. Where appropriate, bus 510 may include one or more buses. Although specific buses are described and illustrated in embodiments of this application, this application contemplates any suitable bus or interconnect.
[0140] Furthermore, in conjunction with the above embodiments, this application embodiment can be implemented using a computer storage medium. This computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the intent recognition model training methods or intent recognition methods described in the above embodiments.
[0141] The functional blocks shown in the above-described structural diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.
[0142] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.
[0143] The aspects of this application have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by dedicated hardware performing the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.
[0144] The above description is merely a specific implementation of this application. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, modules, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the protection scope of this application.
Claims
1. A method for training an intent recognition model, characterized in that it comprises: obtaining k training sample sets, each of the k training sample sets including multiple corpus text samples and label information for each of the multiple corpus text samples, where k is a positive integer; for each of the k training sample sets, inputting the multiple corpus text samples of the training sample set into a pre-built language model to generate semantic features of the corpus text samples; inputting the semantic features corresponding to the corpus text samples into p classifiers respectively to generate p first intent recognition results for the corpus text samples, where the p first intent recognition results correspond one-to-one with the p classifiers and p is a positive integer; for each of the p first intent recognition results, training a first intent recognition model based on the first intent recognition result and the label information of the corpus text samples to obtain N first intent recognition models, wherein each of the N first intent recognition models includes the language model and one of the p classifiers, and N=p*k; inputting each corpus text sample into the N pre-trained first intent recognition models to generate N probability distribution feature vectors for each corpus text sample, wherein the probability distribution feature vectors include the probability values of the corpus text sample under various intents; and training a random forest model based on the corpus text samples, the N probability distribution feature vectors, and the label information of the corpus text samples until the random forest model converges, thereby obtaining a trained second intent recognition model.
2. The method according to claim 1, characterized in that the multiple corpus text samples include multiple first corpus text samples and multiple second corpus text samples obtained by data augmentation of the multiple first corpus text samples; and obtaining the k training sample sets includes: obtaining the multiple first corpus text samples; performing at least one of the following on the multiple first corpus text samples to obtain the multiple second corpus text samples: random word deletion, homophone replacement, confusable word replacement, cloze-style data augmentation based on the BERT model, and back-translation data augmentation; and dividing the multiple first corpus text samples and the multiple second corpus text samples into the k training sample sets.
3. The method according to claim 1, characterized in that the language model includes multiple hidden layers; and inputting the multiple corpus text samples of each training sample set into the pre-built language model to generate the semantic features of the corpus text samples includes: inputting the corpus text samples into the pre-built language model, and extracting the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer; and fusing the first semantic features output by the multiple intermediate hidden layers with the second semantic features to generate the semantic features of the corpus text samples.
4. The method according to any one of claims 1-3, characterized in that, The language model is the RoBERT model.
5. The method according to any one of claims 1-3, characterized in that obtaining the k training sample sets includes: preprocessing the multiple corpus text samples, the preprocessing including at least one of the following: adjusting the encoding format, deleting illegal characters, converting punctuation format, dividing the corpus into paragraphs, and converting numbers into their forms.
6. An intent recognition method, characterized in that it comprises: obtaining corpus text; inputting the corpus text into the N first intent recognition models as described in any one of claims 1-4 to generate N probability distribution feature vectors of the corpus text, wherein the probability distribution feature vectors are the probability values of the corpus text under various intents; and inputting the N probability distribution feature vectors into the trained second intent recognition model as described in any one of claims 1-4 to obtain the intent result corresponding to the corpus text.
7. An intent recognition model training device, characterized in that it comprises: an acquisition module, used to acquire k training sample sets, each of the k training sample sets including multiple corpus text samples and label information of each corpus text sample in the multiple corpus text samples, where k is a positive integer; a first generation module, used to input the multiple corpus text samples from each of the k training sample sets into a pre-built language model to generate semantic features of the corpus text samples; a second generation module, used to input the semantic features corresponding to the corpus text samples into p classifiers respectively and generate p first intent recognition results of the corpus text samples, where the p first intent recognition results correspond one-to-one with the p classifiers and p is a positive integer; a first training module, used to train, for each of the p first intent recognition results, a first intent recognition model based on the first intent recognition result and the label information of the corpus text samples, so as to obtain N first intent recognition models, wherein each of the N first intent recognition models includes the language model and one of the p classifiers; a third generation module, used to input the multiple corpus text samples into the N pre-trained first intent recognition models respectively to generate N probability distribution feature vectors, wherein the probability distribution feature vectors include the probability values of the corpus text samples under various intents, and N=p*k; and a second training module, used to train a random forest model based on the corpus text samples, the N probability distribution feature vectors, and the label information of the corpus text samples until the random forest model converges, thereby obtaining a trained second intent recognition model.
8. The device according to claim 7, characterized in that the multiple corpus text samples include multiple first corpus text samples and multiple second corpus text samples obtained by data augmentation of the multiple first corpus text samples; and the acquisition module is used to: acquire the multiple first corpus text samples; perform at least one of the following on the multiple first corpus text samples to obtain the multiple second corpus text samples: random word deletion, homophone replacement, confusable word replacement, cloze-style data augmentation based on the BERT model, and back-translation data augmentation; and divide the multiple first corpus text samples and the multiple second corpus text samples into the k training sample sets.
9. The device according to claim 7, characterized in that the language model includes multiple hidden layers; and the first generation module is used to: input the corpus text samples into the pre-built language model and extract the first semantic features output by multiple intermediate hidden layers and the second semantic features output by the last hidden layer; and fuse the first semantic features output by the multiple intermediate hidden layers with the second semantic features to generate the semantic features of the corpus text samples.
10. An intent recognition device, characterized in that it comprises: an acquisition module, used to acquire corpus text; a generation module, used to input the corpus text into the N first intent recognition models as described in any one of claims 1-4 and generate N probability distribution feature vectors of the corpus text, wherein the probability distribution feature vectors are the probability values of the corpus text under various intents; and a determination module, used to input the N probability distribution feature vectors into the trained second intent recognition model as described in any one of claims 1-4 and determine the intent result corresponding to the corpus text.