A method and apparatus for determining a new intent category
By combining a two-layer entity recognition model with a pre-trained language model, the method calculates the distance between the entity vector of the speech data and the existing intent category vectors, which removes the subjectivity of manually determining new intent categories and improves the accuracy of the results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 太保科技有限公司
- Filing Date
- 2022-09-30
- Publication Date
- 2026-06-19
AI Technical Summary
The existing technology for determining new intent categories is highly subjective, resulting in insufficient objectivity and low accuracy.
A two-layer entity recognition model is used to recognize the speech data to be recognized, obtain the combination of target entities, and input it into a pre-trained language model. The distance between the target entity vector and multiple existing intent category vectors is calculated, and whether it is a new intent category is determined based on the distance threshold.
By using objective distance calculation methods to avoid manual analysis, the accuracy of identifying new intent categories is improved, and the influence of subjectivity is reduced.
Figure CN115510191B_ABST
Description
Technical Field
[0001] This application relates to the field of natural language processing technology, and in particular to a method and apparatus for determining a new intent category.
Background Technology
[0002] With the rapid development of artificial intelligence, intelligent voice services are widely used in various scenarios such as smart homes, healthcare, voice payments, and insurance services. When customers have needs, they dial the relevant hotline and express their requests using voice. The intelligent voice service system will recognize the customer's intent based on the voice data and provide corresponding intelligent responses or guidance to meet the customer's needs.
[0003] In existing technologies, a text classification-based method is typically used for new intent recognition. This method involves: first, labeling a large amount of speech data, then extracting features to train a classification model to obtain an intent recognition model; then, using the intent recognition model to identify the speech data to be recognized; if the output result indicates that it cannot be classified into an existing intent category, it is manually analyzed to determine whether it needs to be identified as a new intent category; finally, if a new intent category is determined from the speech data to be recognized, the new intent category needs to be labeled and the intent recognition model retrained.
[0004] However, manually analyzing speech data that does not belong to an existing intent category makes the determination of a new intent category highly subjective. The determination result is therefore not objective enough, which leads to low accuracy.
Summary of the Invention
[0005] In view of this, embodiments of this application provide a method and apparatus for determining new intent categories, aiming to improve the accuracy of the determination results.
[0006] In a first aspect, embodiments of this application provide a method for determining a new intent category, the method comprising:
[0007] The speech data to be recognized is identified through a two-layer entity recognition model to obtain the target entity combination;
[0008] If the target entity combination does not belong to the entity library, the target entity combination is input into the pre-trained language model to obtain the target entity vector corresponding to the target entity combination.
[0009] The distance between the target entity vector and multiple existing intent category vectors is calculated to obtain multiple vector distances. The multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library.
[0010] If the multiple vector distances are all greater than a preset threshold, a new intent category is determined based on the speech data to be recognized.
[0011] Optionally, the step of identifying the speech data to be recognized through a two-layer entity recognition model to obtain the target entity combination includes:
[0012] The speech data to be recognized is identified by the first layer model based on multiple first preset entity categories in the two-layer entity recognition model to obtain the first target entity category and the entity corresponding to the first target entity category of the speech data to be recognized.
[0013] The speech data to be recognized is identified by the second layer model based on multiple second preset entity categories in the two-layer entity recognition model, to obtain the second target entity category corresponding to the speech data to be recognized, the entity corresponding to the second target entity category, and the unrecognized entity. The multiple second preset entity categories are subcategories of the multiple first preset entity categories.
[0014] A target entity combination is obtained based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity.
[0015] Optionally, obtaining the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity includes:
[0016] If the unrecognized entity is among the entities corresponding to the first target entity category, the unrecognized entity is determined as a new second preset entity category;
[0017] Based on the new second preset entity category, the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity are updated to obtain the updated second target entity category and the entity corresponding to the updated second target entity category;
[0018] The updated second target entity category is superimposed with the entity corresponding to the updated second target entity category to obtain the target entity combination.
[0019] Optionally, the training steps of the two-layer entity recognition model include:
[0020] Acquire multiple speech sample data and first and second annotation data for each speech sample data, wherein the first annotation data is used to annotate the multiple first preset entity categories and the second annotation data is used to annotate the multiple second preset entity categories;
[0021] The speech sample data is input into a two-layer recognition network for recognition to obtain first recognition data and second recognition data of the speech sample data. The first recognition data includes recognition data based on the plurality of first preset entity categories, and the second recognition data includes recognition data based on the plurality of second preset entity categories.
[0022] The model parameters of the two-layer recognition network are trained based on the first recognition data, the second recognition data, the first labeled data, the second labeled data, and the loss function of the two-layer recognition network.
[0023] The trained two-layer recognition network is determined as the two-layer entity recognition model.
[0024] Optionally, the steps for constructing the entity library include:
[0025] Based on the second recognition data, the multiple existing entity combinations are obtained;
[0026] The entity library is constructed based on the multiple existing entity combinations.
[0027] Optionally, constructing the entity library based on the multiple existing entity combinations includes:
[0028] The multiple existing entity combinations are mined using an association rule mining algorithm to obtain the target association rule;
[0029] The multiple existing entity combinations are processed according to the target association rule to obtain multiple processed existing entity combinations.
[0030] The entity library is constructed based on the multiple processed existing entity combinations.
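As an illustration of the mining step, the following is a minimal sketch only. The text names neither the association rule mining algorithm nor how the target association rule is applied, so the support/confidence scoring over entity pairs, the function name `mine_pair_rules`, and both thresholds are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(combinations, min_support=0.5, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence) over entity pairs.

    Hypothetical helper: thresholds and pair-level rules are assumptions.
    """
    n = len(combinations)
    item_counts = Counter()
    pair_counts = Counter()
    for combo in combinations:
        items = set(combo)
        item_counts.update(items)
        # Ordered pairs, so that A -> B and B -> A are scored separately.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of combinations containing both
        confidence = count / item_counts[a]  # share of a's combinations that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


existing_combinations = [
    ("renewal", "property insurance", "how"),
    ("renewal", "property insurance", "when"),
    ("complaint", "fees", "why"),
    ("renewal", "property insurance", "why"),
]
rules = mine_pair_rules(existing_combinations)
```

In this toy data, "renewal" and "property insurance" always co-occur, so both directions of that pair pass the thresholds. How such a rule would then be used to process the entity combinations is left open here.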
[0031] Optionally, the plurality of first preset entity categories include: action words, proper nouns, interrogative words, and question words to be processed.
[0032] Optionally, after determining that the target entity combination does not belong to the entity library, the method further includes:
[0033] The target entity combination is processed according to preset rules to obtain the processed target entity combination.
[0034] The step of inputting the target entity combination into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination includes:
[0035] The processed target entity combination is input into the pre-trained language model to obtain the target entity vector.
[0036] Optionally, the clustering step of the multiple existing intent category vectors includes:
[0037] Each existing entity vector is labeled with the similarity relationship between itself and the remaining entity vectors among the plurality of existing entity vectors, thereby obtaining a plurality of labeled entity vectors;
[0038] The multiple labeled entity vectors are sorted according to the number of labels on the multiple labeled entity vectors to obtain an entity vector sequence;
[0039] Based on the order of the entity vector sequence, existing entity vectors with similar labels are clustered sequentially to obtain multiple existing intent category vectors.
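The three clustering steps above can be sketched as follows. This is one possible reading, not the claimed implementation: the "similarity relationship" is assumed to be cosine similarity at or above a hypothetical `sim_threshold`, the sort is assumed to be descending, and each intent category vector is assumed to be the mean of its cluster members.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_entity_vectors(vectors, sim_threshold=0.9):
    # Step 1: label each vector with the indices of the other vectors it is similar to.
    labels = {
        i: {j for j in range(len(vectors))
            if j != i and cosine_similarity(vectors[i], vectors[j]) >= sim_threshold}
        for i in range(len(vectors))
    }
    # Step 2: sort by number of labels, most labels first (assumed order).
    order = sorted(labels, key=lambda i: (-len(labels[i]), i))
    # Step 3: walk the sequence, grouping each unassigned vector
    # with the still-unassigned vectors it was labeled as similar to.
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(labels[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    # Represent each cluster by the mean of its members (assumption).
    dim = len(vectors[0])
    centroids = [
        [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]
        for members in clusters
    ]
    return clusters, centroids


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
clusters, centroids = cluster_entity_vectors(vectors)
```

With these toy vectors, indices 0, 1, and 4 form one cluster and indices 2 and 3 form another. The sketch assumes no zero-length vectors, since cosine similarity is undefined for them.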
[0040] In a second aspect, embodiments of this application provide a device for determining a new intent category, the device comprising:
[0041] The recognition module is used to identify the speech data to be recognized through a two-layer entity recognition model to obtain the target entity combination;
[0042] The obtaining module is used to, if the target entity combination does not belong to the entity library, input the target entity combination into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination.
[0043] The calculation module is used to calculate the distance between the target entity vector and multiple existing intent category vectors respectively to obtain multiple vector distances. The multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library.
[0044] The determination module is used to determine a new intent category based on the speech data to be recognized if the multiple vector distances are all greater than a preset threshold.
[0045] In a third aspect, embodiments of this application provide a device for determining a new intent category, the device comprising:
[0046] Memory, used to store computer programs;
[0047] A processor for executing the computer program to cause the device to perform the method for determining the new intent category described in the first aspect above.
[0048] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, wherein when the computer program is run, a device running the computer program implements the method for determining a new intent category as described in the first aspect.
[0049] Compared with the prior art, the embodiments of this application have the following beneficial effects:
[0050] This application provides a method for determining a new intent category. The method involves identifying the speech data to be recognized using a two-layer entity recognition model to obtain target entity combinations. If the target entity combination does not belong to an entity library, it is input into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination. Distance calculations are performed between the target entity vector and multiple existing intent category vectors to obtain multiple vector distances. These multiple intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library. If all multiple vector distances are greater than a preset threshold, a new intent category is determined based on the speech data to be recognized. As can be seen, this method, when determining that the entity combination of the speech data to be recognized does not belong to the entity library, calculates the distances between the entity vectors of the speech data to be recognized and the existing intent category vectors to obtain multiple vector distances. Based on the comparison results of these multiple vector distances with a preset threshold, it determines whether a new intent category should be determined based on the speech data to be recognized. This avoids manual analysis, thus preventing subjectivity in the determination process and making the determination result more objective, thereby improving the accuracy of the new intent category determination.
Attached Figure Description
[0051] To more clearly illustrate the technical solutions in this embodiment or the prior art, the drawings used in the description of the embodiment or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0052] Figure 1 is a schematic diagram of an application scenario of the method for determining a new intent category provided in an embodiment of this application;
[0053] Figure 2 is a flowchart of a method for determining a new intent category provided in an embodiment of this application;
[0054] Figure 3 is a schematic diagram of a device for determining a new intent category provided in an embodiment of this application.
Detailed Implementation
[0055] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present application.
[0056] Currently, the existing methods for determining new intent categories involve several steps. First, a large amount of speech data is labeled, and then features are extracted to train a classification model, resulting in an intent recognition model. Next, the speech data to be recognized is processed by the intent recognition model. If the output cannot be classified into an existing intent category, it is manually analyzed to determine if it should be classified as a new intent category. Finally, if a new intent category is determined from the speech data to be recognized, the new intent category needs to be labeled, and the intent recognition model needs to be retrained. However, this method of manually analyzing speech data that does not belong to an existing intent category introduces a high degree of subjectivity in determining the new intent category, leading to insufficient objectivity and low accuracy in the determination results.
[0057] Based on this, to address the aforementioned issues and improve the accuracy of new intent category determination, this application provides a method and apparatus for determining new intent categories. In this method, the speech data to be recognized is identified using a two-layer entity recognition model to obtain target entity combinations. If the target entity combination does not belong to the entity library, it is input into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination. Distance calculations are performed between the target entity vector and multiple existing intent category vectors to obtain multiple vector distances. These multiple intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library. If all multiple vector distances are greater than a preset threshold, a new intent category is determined based on the speech data to be recognized. It can be seen that when determining that the entity combination of the speech data to be recognized does not belong to the entity library, this method calculates the distances between the entity vectors of the speech data to be recognized and the existing intent category vectors to obtain multiple vector distances. Based on the comparison results of these multiple vector distances with a preset threshold, it determines whether a new intent category should be determined based on the speech data to be recognized. This avoids manual analysis and subjective issues in the determination process, making the determination result more objective and thus improving the accuracy of the new intent category determination.
[0058] For example, the embodiments of this application can be applied to the scenario shown in Figure 1. The scenario includes an entity library 101 and a server 102. The entity library 101 includes multiple existing intent category vectors, and the server 102 uses the implementation method provided in this embodiment to obtain the multiple existing intent category vectors from the entity library 101.
[0059] First, in the above application scenarios, although the action descriptions of the implementation methods provided in this application are executed by the server 102, the implementation methods of this application are not limited in terms of the execution subject, as long as the actions disclosed in the implementation methods provided in this application are executed.
[0060] Secondly, the above scenario is only one example provided by the embodiments of this application, and the embodiments of this application are not limited to this scenario.
[0061] The specific implementation of the method and apparatus for determining a new intent category in the embodiments of this application is described in detail below with reference to the accompanying drawings.
[0062] See Figure 2, which is a flowchart of a method for determining a new intent category provided in an embodiment of this application. As shown in Figure 2, the method may specifically include:
[0063] S201: The speech data to be recognized is identified through a two-layer entity recognition model to obtain the target entity combination.
[0064] The speech data to be recognized is input into a two-layer entity recognition model. Multiple entity words within the speech data are recognized to obtain target entities corresponding to preset categories of the two-layer entity recognition model, resulting in target entity combinations. Preprocessing the speech data based on the two-layer entity recognition model can extract target entities corresponding to preset categories, or it can extract new entities that do not correspond to preset categories, thus obtaining target entity combinations. Using these target entity combinations to replace the speech data to be recognized in subsequent steps for determining new intent categories can reduce the length of the speech data to some extent, amplify its semantic features, and remove some non-essential text. This allows for the input of higher-quality text for subsequent new intent recognition steps, further improving the accuracy of the new intent category determination results.
[0065] The two-layer entity recognition model includes an input layer, an encoding layer, and a decoding layer. The input layer converts the speech data to be recognized into a distributed sequence that can be input to the encoding layer; for example, a Word2vec model can be used. The encoding layer encodes the distributed sequence converted from the speech data to obtain encoded features; for example, a pre-trained BERT model or a Transformer encoder can be used. The decoding layer predicts the entity boundaries and entity types. It generates target entity combinations through the abstract semantic representation of the entity context obtained by the encoding layer; an Efficient-GlobalPointer decoder or a Conditional Random Field decoder can be used. Of course, other methods can also be used, without affecting the implementation of the embodiments of this application.
[0066] S202: If the target entity combination does not belong to the entity library, input the target entity combination into the pre-trained language model to obtain the target entity vector corresponding to the target entity combination.
[0067] The target entity combination obtained through the two-layer entity recognition model is compared with the multiple existing entity combinations in the entity library. If one of the existing entity combinations matches the target entity combination, the intent category of the speech data to be recognized is one of the existing intent categories, and no new intent category needs to be determined. If none of the existing entity combinations matches the target entity combination, the target entity combination is input into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination. A pre-trained language model is a model whose parameters have been obtained in advance by training on pre-training tasks and are then used to initialize the model, rather than being randomly initialized. Such a model already encodes a large amount of semantic and syntactic knowledge, which can significantly improve performance on subsequent tasks. For example, the pre-trained language model can be a BERT model or a GPT model; of course, other pre-trained language models can also be used without affecting the implementation of this embodiment.
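The library check in S202 might look like the sketch below. Everything named here is hypothetical: `normalize` assumes that matching is order-insensitive over (category, entity) pairs, and `toy_embed` is a placeholder for a real pre-trained encoder such as BERT, not an actual language model.

```python
def normalize(combination):
    """Order-insensitive key for a combination of (category, entity) pairs (assumption)."""
    return frozenset(combination)


def embed_if_unseen(target_combination, entity_library, embed):
    """Return None when the combination is already in the library, else its vector."""
    if normalize(target_combination) in entity_library:
        # The intent falls under an existing category; no new category is needed.
        return None
    text = " ".join(f"{category}:{entity}" for category, entity in target_combination)
    return embed(text)


def toy_embed(text):
    # Hypothetical stand-in for a pre-trained language model encoder.
    return [float(len(text)), float(text.count(" ") + 1)]


entity_library = {
    normalize([("renewal", "renew"), ("property insurance name", "car insurance")]),
}
known = embed_if_unseen(
    [("property insurance name", "car insurance"), ("renewal", "renew")],
    entity_library,
    toy_embed,
)
unseen = embed_if_unseen(
    [("complaint", "complain"), ("fees", "handling fee")],
    entity_library,
    toy_embed,
)
```

Here `known` is `None` because the combination is already in the library, while `unseen` receives a vector and would proceed to the distance calculation in S203.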
[0068] S203: Calculate the distance between the target entity vector and multiple existing intent category vectors respectively to obtain multiple vector distances. The multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library.
[0069] Distances are calculated between the target entity vector obtained through the pre-trained language model and each of the multiple existing intent category vectors, in order to determine whether the intent category of the speech data to be recognized is one of the existing intent categories. The multiple existing intent category vectors are obtained by clustering the multiple existing entity vectors corresponding to the multiple existing entity combinations in the entity library. Distance calculation here means measuring how similar the target entity vector is to each existing intent category vector, with a smaller distance indicating greater similarity. For example, Euclidean distance, cosine similarity, or Mahalanobis distance can be used for distance calculation. Of course, other vector similarity calculation methods can also be used without affecting the implementation of this embodiment.
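Two of the measures named above can be sketched in plain Python as follows. The function names are illustrative. Because cosine similarity grows as vectors become more alike, this sketch converts it to a distance as 1 minus the similarity, so that a larger value consistently means less similar. That conversion is an assumption, and Mahalanobis distance is omitted because it also needs a covariance estimate.

```python
import math


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    # 1 - cosine similarity, so that identical directions give 0.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def vector_distances(target, category_vectors, metric=euclidean_distance):
    """Distance from the target entity vector to every existing intent category vector."""
    return [metric(target, c) for c in category_vectors]


target_entity_vector = [3.0, 4.0]
existing_category_vectors = [[0.0, 0.0], [3.0, 0.0]]
distances = vector_distances(target_entity_vector, existing_category_vectors)
```

For these toy vectors, `distances` holds one Euclidean distance per existing intent category vector. Passing `metric=cosine_distance` switches the measure without changing the calling code.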
[0070] S204: If the distances of multiple vectors are all greater than the preset threshold, determine the new intent category based on the speech data to be recognized.
[0071] The target entity vector is compared with multiple existing intent category vectors, and the vector distances calculated using vector similarity are compared with a preset threshold. If any of the vector distances is smaller than the preset threshold, it indicates that the speech data to be recognized belongs to the intent category corresponding to that vector distance. If all vector distances are greater than the preset threshold, it indicates that the speech data to be recognized does not belong to any of the existing intent categories, and a new intent category needs to be determined based on the speech data to be recognized. The preset threshold refers to a pre-set vector distance value. For example, it could be a threshold adjusted by researchers based on multiple experimental results and pre-set in the method for determining the new intent category. Of course, other methods can also be used to preset the threshold, which does not affect the implementation of the embodiments of this application.
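The threshold comparison in S204 can be illustrated with the short sketch below. The function name and return shape are hypothetical. The text does not say how a distance exactly equal to the threshold is handled, or which category wins when several distances fall below it. This sketch treats equality as "existing" and picks the nearest category, both as assumptions.

```python
def classify_by_distance(distances, threshold):
    """Return ("new", None) if every distance exceeds the threshold,
    otherwise ("existing", index_of_nearest_category)."""
    nearest = min(range(len(distances)), key=distances.__getitem__)
    if distances[nearest] > threshold:
        # All distances are greater than the threshold: no existing category fits.
        return ("new", None)
    return ("existing", nearest)


far_result = classify_by_distance([5.0, 4.0], threshold=3.0)
near_result = classify_by_distance([5.0, 1.2, 2.0], threshold=3.0)
```

Checking only the smallest distance is enough, because all distances exceed the threshold exactly when the smallest one does.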
[0072] Based on the above-mentioned S201-S204, in this embodiment, the speech data to be recognized is identified using a two-layer entity recognition model to obtain target entity combinations. If the target entity combination does not belong to the entity library, it is input into a pre-trained language model to obtain the target entity vector corresponding to the target entity combination. The target entity vector is then used to calculate the distance between itself and multiple existing intent category vectors to obtain multiple vector distances. These multiple intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library. If all multiple vector distances are greater than a preset threshold, a new intent category is determined based on the speech data to be recognized. It can be seen that when determining that the entity combination of the speech data to be recognized does not belong to the entity library, this method calculates the distance between the entity vector of the speech data to be recognized and the existing intent category vectors to obtain multiple vector distances. Based on the comparison results of these multiple vector distances with a preset threshold, it determines whether a new intent category should be determined based on the speech data to be recognized. This avoids manual analysis and subjective issues in the process of determining the new intent category, making the determination result more objective and improving the accuracy of the new intent category determination result.
[0073] In this embodiment of the application, S201 may specifically include the following S2021-S2023:
[0074] S2021: The speech data to be recognized is recognized by the first layer model based on multiple first preset entity categories in the two-layer entity recognition model to obtain the first target entity category of the speech data to be recognized and the entity corresponding to the first target entity category.
[0075] The speech data to be recognized is recognized through the first layer of the two-layer entity recognition model. The first layer model is constructed based on multiple first preset entity categories to obtain the first target entity category in the speech data to be recognized that corresponds to the multiple first preset entity categories, and the entity in the speech data to be recognized that corresponds to the first target entity category.
[0076] Here, the first preset entity category refers to a pre-set entity category. For example, four entity categories can be pre-set in the first layer of a two-layer entity recognition model, which may include: action words, proper nouns, interrogative words, and question words to be processed. Of course, other first preset entity categories can also be set without affecting the implementation of this application embodiment.
[0077] S2022: The speech data to be recognized is recognized by the second-layer model based on multiple second preset entity categories in the two-layer entity recognition model to obtain the second target entity category corresponding to the speech data to be recognized, the entity corresponding to the second target entity category, and the unrecognized entity. The multiple second preset entity categories are subcategories of multiple first preset entity categories.
[0078] The speech data to be recognized is recognized through the second layer of the two-layer entity recognition model. The second layer model is constructed based on multiple second preset entity categories to obtain the second target entity category in the speech data to be recognized that corresponds to the multiple second preset entity categories, the entity in the speech data to be recognized that corresponds to the second target entity category, and the unrecognized entity in the speech data that does not belong to the second preset entity category. The multiple second preset entity categories are subcategories of multiple first preset entity categories.
[0079] For example, if the method for determining the new intent category is applied to the insurance field, the first preset entity category specifically includes the four entity categories mentioned above. The second preset entity category can specifically include the following: subcategories of action words can include urging, renewal, complaint, and inquiry; subcategories of proper nouns can include property insurance names, liability insurance names, fees, and value-added services; subcategories of interrogative words can include how to do it, why, when, and what; and subcategories of question words to be processed can include operation failure, payment failure, and inability to contact. Of course, multiple other second preset entity categories can be set without affecting the implementation of this application's embodiments.
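The insurance example above amounts to a two-level category hierarchy, which could be held in a simple mapping such as the one below. The dictionary layout and the names `ENTITY_HIERARCHY`, `PARENT_OF`, and `first_layer_category` are illustrative assumptions. Only the category labels come from the text.

```python
# First preset entity categories mapped to their second preset subcategories.
ENTITY_HIERARCHY = {
    "action word": ["urging", "renewal", "complaint", "inquiry"],
    "proper noun": [
        "property insurance name",
        "liability insurance name",
        "fees",
        "value-added services",
    ],
    "interrogative word": ["how to do it", "why", "when", "what"],
    "question word to be processed": [
        "operation failure",
        "payment failure",
        "inability to contact",
    ],
}

# Reverse index: second-layer subcategory -> first-layer parent category.
PARENT_OF = {
    sub: parent for parent, subs in ENTITY_HIERARCHY.items() for sub in subs
}


def first_layer_category(second_layer_category):
    """Look up the first-layer parent of a second-layer category, or None if unknown."""
    return PARENT_OF.get(second_layer_category)
```

For instance, `first_layer_category("renewal")` returns `"action word"`, and a subcategory that is not in the hierarchy returns `None`.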
[0080] The execution order of steps S2021 and S2022 can be interchanged, or the two steps can be executed simultaneously. This application embodiment does not limit the execution order of steps S2021 and S2022.
[0081] S2023: Obtain the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity.
[0082] Based on the second target entity category obtained through the second layer model of the two-layer entity recognition model, the entity corresponding to the second target entity category, and the unrecognized entity, the target entity combination is obtained.
[0083] In this application, the process of obtaining the target entity combination is not specifically limited. For ease of understanding, a possible implementation method is described below.
[0084] In one possible implementation, it is first determined whether the unrecognized entity belongs to one of the multiple entities corresponding to the first target entity category. If the unrecognized entity is among the entities corresponding to the first target entity category, the unrecognized entity is identified as a new entity category and added to the second preset entity category to obtain a new second preset entity category. Finally, based on the updated second preset entity category, the updated second target entity category and the entities corresponding to the updated second target entity category are obtained and superimposed to obtain the target entity combination.
[0085] Therefore, S2023 may specifically include: if the unrecognized entity is among the entities corresponding to the first target entity category, the unrecognized entity is determined as a new second preset entity category; according to the new second preset entity category, the second target entity category, the entities corresponding to the second target entity category, and the unrecognized entity are updated to obtain the updated second target entity category and the entities corresponding to the updated second target entity category; the updated second target entity category and the entities corresponding to the updated second target entity category are superimposed to obtain a target entity combination.
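A rough sketch of S2023 follows, under stated assumptions: the data shapes, the function name `build_target_combination`, the choice to name the new category after the entity itself, and the reading of "superimposed" as joining each category with its entity into a `category:entity` string are all illustrative and are not specified in the text.

```python
def build_target_combination(first_layer, second_layer, unrecognized, second_categories):
    """first_layer: dict mapping first target entity category -> list of entities.
    second_layer: list of (second target entity category, entity) pairs.
    unrecognized: entities the second layer model could not categorize.
    second_categories: set of current second preset entity categories."""
    first_layer_entities = {e for entities in first_layer.values() for e in entities}
    updated = list(second_layer)
    updated_categories = set(second_categories)
    for entity in unrecognized:
        if entity in first_layer_entities:
            # The first layer recognized it, so promote it to a new second preset category.
            updated_categories.add(entity)
            updated.append((entity, entity))
    # "Superimpose" each category with its entity (assumed: simple concatenation).
    combination = [f"{category}:{entity}" for category, entity in updated]
    return combination, updated_categories


combination, categories = build_target_combination(
    first_layer={"action word": ["renew", "surrender"], "proper noun": ["car insurance"]},
    second_layer=[("renewal", "renew"), ("property insurance name", "car insurance")],
    unrecognized=["surrender", "um"],
    second_categories={"renewal", "property insurance name"},
)
```

In this toy input, "surrender" was found by the first layer and so becomes a new second-layer category, while "um" matches nothing in the first layer and is left out of the combination.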
[0086] In the embodiments of this application, the training process of the two-layer entity recognition model is not specifically limited. For ease of understanding, the following description is based on a possible implementation method.
[0087] In one possible implementation, multiple speech sample data are first acquired. Each speech sample data has been labeled with first labeled data corresponding to multiple first preset entity categories and with second labeled data corresponding to multiple second preset entity categories. Next, the labeled speech sample data are input into a two-layer recognition network to obtain, for each speech sample data, first recognition data corresponding to the multiple first preset entity categories and second recognition data corresponding to the multiple second preset entity categories. Then, the model parameters of the two-layer recognition network are trained based on the loss function value of the network and on the results of comparing the first and second labeled data with the first and second recognition data, respectively. Finally, the trained two-layer recognition network is determined as the two-layer entity recognition model.
[0088] Therefore, the two-layer entity recognition model can be trained through the following steps: First, acquire multiple speech sample data and first and second labeled data for each speech sample data. The first labeled data is used to label multiple first preset entity categories, and the second labeled data is used to label multiple second preset entity categories. Second, input the speech sample data into the two-layer recognition network for recognition, obtaining first and second recognition data for the speech sample data. The first recognition data includes recognition data based on multiple first preset entity categories, and the second recognition data includes recognition data based on multiple second preset entity categories. Third, train the model parameters of the two-layer recognition network based on the first recognition data, the second recognition data, the first labeled data, the second labeled data, and the loss function of the two-layer recognition network. Fourth, determine the trained two-layer recognition network as the two-layer entity recognition model.
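The application does not specify the loss function of the two-layer recognition network. As a minimal sketch, and purely as an assumption, the loss could be a weighted sum of two cross-entropy terms, one comparing the first recognition data with the first labeled data and one comparing the second recognition data with the second labeled data. The function names and the `second_weight` parameter are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the labeled class for each token."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(p[y] + eps) for p, y in zip(probs, labels))
    return -total / len(labels)


def two_layer_loss(
    first_probs: Sequence[Sequence[float]],
    first_labels: Sequence[int],
    second_probs: Sequence[Sequence[float]],
    second_labels: Sequence[int],
    second_weight: float = 1.0,
) -> float:
    """Assumed joint loss: first-layer term plus weighted second-layer term."""
    first_term = cross_entropy(first_probs, first_labels)
    second_term = cross_entropy(second_probs, second_labels)
    return first_term + second_weight * second_term
```

Here each row of `first_probs` or `second_probs` is a predicted probability distribution over the preset entity categories of that layer for one token, and the labels are category indices. Minimizing this value with any gradient-based optimizer would correspond to the third training step.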
[0089] In the embodiments of this application, the construction process of the entity library is not specifically limited. For ease of understanding, the following description is based on a possible implementation method.
[0090] In one possible implementation, multiple existing entity combinations in the multiple speech sample data are first obtained based on the second recognition data corresponding to multiple second preset entity categories in each speech sample data, and then the multiple existing entity combinations are constructed into an entity library. Therefore, the entity library can be constructed through the following steps: obtaining multiple existing entity combinations based on the second recognition data; constructing an entity library based on the multiple existing entity combinations.
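A minimal sketch of this construction is given below. It assumes that each existing entity combination is an unordered set of entity words, so the entity library can be held as a set of frozensets; the function names and sample words are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

EntityCombination = FrozenSet[str]


def build_entity_library(
    second_recognition_data: Iterable[Iterable[str]],
) -> Set[EntityCombination]:
    """Collect the distinct entity combinations seen in the speech samples."""
    library: Set[EntityCombination] = set()
    for entities in second_recognition_data:
        combination = frozenset(entities)
        if combination:  # skip samples in which no entity was recognized
            library.add(combination)
    return library


def in_library(library: Set[EntityCombination], entities: Iterable[str]) -> bool:
    """Check whether a target entity combination already exists in the library."""
    return frozenset(entities) in library
```

For example, `in_library(build_entity_library([["cancel", "auto policy"]]), ["auto policy", "cancel"])` evaluates to `True`, because word order is ignored under the stated assumption.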
[0091] The embodiments of this application also provide another construction process for the entity library.
[0092] In one possible implementation, the multiple existing entity combinations are first mined using an association rule mining algorithm to obtain target association rules for these combinations. Then, the existing entity combinations are processed according to the target association rules to obtain processed combinations. Finally, an entity library is constructed based on the processed combinations. Therefore, the entity library can also be constructed using the following steps: mining multiple existing entity combinations using an association rule mining algorithm to obtain target association rules; processing the existing entity combinations according to the target association rules to obtain processed combinations; and constructing an entity library based on the processed combinations. For example, the association rule mining algorithm can be the Apriori algorithm, the Eclat algorithm, or the FP-Tree algorithm. Of course, other association rule mining algorithms can also be used without affecting the implementation of this embodiment.
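For illustration, the mining step can be sketched with a small, unoptimized Apriori-style routine in plain Python. The support and confidence thresholds, the function names, and the rule representation are assumptions. How the existing entity combinations are then processed according to the target association rules is not detailed in the application, so that step is omitted from the sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(
    combos: Iterable[Iterable[str]], min_support: float
) -> Dict[Itemset, float]:
    """Level-wise Apriori search; returns each frequent itemset with its support."""
    transactions = [frozenset(c) for c in combos]
    n = len(transactions)
    frequent: Dict[Itemset, float] = {}
    candidates = {frozenset([e]) for t in transactions for e in t}
    k = 1
    while candidates:
        level: Dict[Itemset, float] = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(
    frequent: Dict[Itemset, float], min_confidence: float
) -> List[Tuple[Itemset, Itemset, float]]:
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules
```

Each existing entity combination is treated as one transaction, so a rule such as `{cancel} -> {policy}` states how often the second entity co-occurs with the first in the speech samples.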
[0093] Furthermore, before inputting the target entity combination into the pre-trained language model, the target entity combination can be processed according to preset rules to obtain a higher quality target entity combination, further improving the accuracy of the new intent category determination result. Therefore, in an optional embodiment of this application, the method may further include S1: processing the target entity combination according to preset rules to obtain a processed target entity combination. Accordingly, S202 may specifically include: inputting the processed target entity combination into the pre-trained language model to obtain a target entity vector.
[0094] For example, the preset rule may limit the number of entities corresponding to the second target entity category in the target entity combination and truncate target entity combinations that contain too many entities. The preset rule may also delete meaningless entities from the target entity combination, or select one of a pair of mutually exclusive entity words in the target entity combination for deletion. Of course, other preset rules are also possible and do not affect the implementation of the embodiments of this application.
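The three example rules can be sketched as a single clean-up function. The function name, its parameters, and the sample words are hypothetical. The choice to keep the earlier word of a mutually exclusive pair, and the order in which the rules are applied, are assumptions that the application leaves open.

```python
from typing import Iterable, List, Set, Tuple


def apply_preset_rules(
    combination: Iterable[str],
    meaningless: Set[str],
    exclusive_pairs: Iterable[Tuple[str, str]],
    max_entities: int,
) -> List[str]:
    """Apply the three example preset rules to a target entity combination."""
    # Rule: delete entities that carry no meaning.
    kept = [e for e in combination if e not in meaningless]

    # Rule: of two mutually exclusive entity words, delete one.
    # Assumption: the word that appears later in the combination is dropped.
    for a, b in exclusive_pairs:
        if a in kept and b in kept:
            later = a if kept.index(a) > kept.index(b) else b
            kept.remove(later)

    # Rule: limit the number of entities by truncating the combination.
    return kept[:max_entities]
```

For example, with `meaningless={"um"}`, `exclusive_pairs=[("open", "close")]`, and `max_entities=3`, the combination `["um", "open", "close", "account", "card"]` is reduced to `["open", "account", "card"]`.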
[0095] In the embodiments of this application, the clustering process of multiple existing entity vectors is not specifically limited. For ease of understanding, the following description is based on a possible implementation method.
[0096] In one possible implementation, the relationships among the multiple existing entity vectors corresponding to the multiple existing entity combinations in the entity library are labeled first. When two existing entity vectors are similar, both are given a label, and this continues until the similarity relationship between each existing entity vector and every remaining entity vector has been labeled. Then, the multiple existing entity vectors are sorted by the number of labels each has received to obtain an entity vector sequence. Finally, the sequence is scanned from the first existing entity vector to the last. When an existing entity vector is scanned, the existing entity vectors labeled as similar to it are classified into one category. Scanning continues until every existing entity vector in the sequence has been classified, resulting in multiple existing intent category vectors.
[0097] Therefore, multiple existing entity vectors can be clustered through the following steps: label the similarity relationships between each existing entity vector and the remaining entity vectors in the multiple existing entity vectors to obtain multiple labeled entity vectors; sort the multiple labeled entity vectors according to the number of labels to obtain an entity vector sequence; and cluster the existing entity vectors with similar labels according to the order of the entity vector sequence to obtain multiple existing intent category vectors.
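The three clustering steps can be sketched as follows. Several details are assumptions made for illustration: two vectors are treated as similar when their cosine similarity reaches a threshold, vectors are sorted in descending order of label count, and each resulting category is represented by the mean of its member vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_entity_vectors(
    vectors: Sequence[Sequence[float]], sim_threshold: float
) -> List[List[int]]:
    """Return clusters as lists of indices into `vectors`."""
    n = len(vectors)
    # Step 1: label every pair of similar vectors.
    similar: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) >= sim_threshold:
                similar[i].add(j)
                similar[j].add(i)

    # Step 2: sort by number of labels (assumed: most labels first).
    order = sorted(range(n), key=lambda i: len(similar[i]), reverse=True)

    # Step 3: scan the sequence; group each unclassified vector with the
    # still-unclassified vectors labeled as similar to it.
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(j for j in similar[i] if j not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def category_vector(
    vectors: Sequence[Sequence[float]], members: Sequence[int]
) -> List[float]:
    """Assumed representation of an intent category: mean of member vectors."""
    dim = len(vectors[0])
    return [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]
```

Scanning the most-connected vectors first means that a vector with many similar neighbours seeds a large category before its neighbours can seed smaller ones, which is one plausible reason for the sorting step.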
[0098] The above are some specific implementations of the method for determining new intent categories provided in the embodiments of this application. Based on this, this application also provides a corresponding apparatus. The apparatus provided in the embodiments of this application will be described below from the perspective of functional modularity.
[0099] Referring to Figure 3, the figure is a schematic diagram of a device 300 for determining a new intent category provided in an embodiment of this application. The device 300 includes:
[0100] The recognition module 301 is used to recognize the speech data to be recognized through a two-layer entity recognition model to obtain the target entity combination;
[0101] The acquisition module 302 is used to input the target entity combination into the pre-trained language model if the target entity combination does not belong to the entity library, and obtain the target entity vector corresponding to the target entity combination.
[0102] The calculation module 303 is used to calculate the distance between the target entity vector and each of multiple existing intent category vectors to obtain multiple vector distances. The multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library.
[0103] The determination module 304 is used to determine a new intent category based on the speech data to be recognized if the multiple vector distances are all greater than a preset threshold.
[0104] In this embodiment, the four modules (recognition module 301, acquisition module 302, calculation module 303, and determination module 304) cooperate so that, when the entity combination of the speech data to be recognized does not belong to the entity library, the distances between the entity vector of the speech data to be recognized and the existing intent category vectors are calculated to obtain multiple vector distances. Whether to determine a new intent category based on the speech data to be recognized is then decided by comparing these vector distances with a preset threshold. This avoids manual analysis and the subjectivity it introduces, which makes the determination result more objective and thereby improves the accuracy of the new intent category determination result.
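The cooperation of the acquisition, calculation, and determination modules can be sketched as one function. This is an illustration under stated assumptions: the `embed` callable is a hypothetical stand-in for the pre-trained language model, Euclidean distance is assumed because the application does not name a distance metric, and the recognition module is represented only by its output (the list of entities).

```python
import math
from typing import Callable, FrozenSet, Iterable, List, Sequence, Set


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_new_intent(
    entities: Iterable[str],
    entity_library: Set[FrozenSet[str]],
    embed: Callable[[FrozenSet[str]], Sequence[float]],
    intent_category_vectors: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """Decide whether the recognized entities point to a new intent category."""
    combination = frozenset(entities)

    # Acquisition module: only combinations absent from the library are embedded.
    if combination in entity_library:
        return False
    target_vector = embed(combination)

    # Calculation module: one distance per existing intent category vector.
    distances: List[float] = [
        euclidean_distance(target_vector, c) for c in intent_category_vectors
    ]

    # Determination module: new intent only if every distance exceeds the threshold.
    return all(d > threshold for d in distances)
```

Passing the language model in as a callable keeps the decision logic independent of any particular model, which also makes the threshold comparison easy to check in isolation.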
[0105] As one implementation method, the recognition module 301 may specifically include:
[0106] The first recognition unit is used to recognize the speech data to be recognized through the first layer model based on multiple first preset entity categories in the two-layer entity recognition model, and obtain the first target entity category of the speech data to be recognized and the entity corresponding to the first target entity category.
[0107] The second recognition unit is used to recognize the speech data to be recognized through the second layer model based on multiple second preset entity categories in the two-layer entity recognition model, and to obtain the second target entity category corresponding to the speech data to be recognized, the entity corresponding to the second target entity category, and the unrecognized entity. The multiple second preset entity categories are subcategories of multiple first preset entity categories.
[0108] The first obtaining unit is used to obtain the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity.
[0109] As one implementation method, the first obtaining unit can specifically be used for:
[0110] If the unrecognized entity is among the entities corresponding to the first target entity category, the unrecognized entity is determined as a new second preset entity category;
[0111] Based on the new second preset entity category, the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity are updated to obtain the updated second target entity category and the entity corresponding to the updated second target entity category;
[0112] The updated second target entity category is superimposed with the entity corresponding to the updated second target entity category to obtain the target entity combination.
[0113] As one implementation method, this two-layer entity recognition model can be trained using the following units:
[0114] The acquisition unit is used to acquire multiple speech sample data and first and second labeled data for each speech sample data. The first labeled data is used to label multiple first preset entity categories, and the second labeled data is used to label multiple second preset entity categories.
[0115] The third recognition unit is used to input the speech sample data into the two-layer recognition network for recognition, and obtain the first recognition data and the second recognition data of the speech sample data. The first recognition data includes recognition data based on multiple first preset entity categories, and the second recognition data includes recognition data based on multiple second preset entity categories.
[0116] The training unit is used to train the model parameters of the two-layer recognition network based on the first recognition data, the second recognition data, the first labeled data, the second labeled data, and the loss function of the two-layer recognition network.
[0117] The determination unit is used to determine the trained two-layer recognition network as a two-layer entity recognition model.
[0118] As one implementation method, this entity library can be constructed using the following units:
[0119] The second obtaining unit is used to obtain multiple existing entity combinations based on the second recognition data;
[0120] The construction unit is used to construct the entity library based on the multiple existing entity combinations.
[0121] As one implementation method, the construction unit can specifically be used for:
[0122] Mining the multiple existing entity combinations using an association rule mining algorithm to obtain target association rules;
[0123] Processing the multiple existing entity combinations according to the target association rules to obtain multiple processed existing entity combinations;
[0124] Constructing the entity library based on the multiple processed existing entity combinations.
[0125] In one implementation, the multiple first preset entity categories used by the first recognition unit may specifically include: action words, proper nouns, interrogative words, and question words to be processed.
[0126] As one implementation, the device 300 for determining the new intent category may further include:
[0127] The processing module is used to process the target entity combination according to preset rules to obtain the processed target entity combination;
[0128] Accordingly, the acquisition module 302 can be used specifically for:
[0129] Inputting the processed target entity combination into the pre-trained language model to obtain the target entity vector.
[0130] As one implementation method, the multiple existing intent category vectors can be obtained by clustering using the following units:
[0131] The labeling unit is used to label the similarity relationship between each existing entity vector and the remaining entity vectors among the multiple existing entity vectors, so as to obtain multiple labeled entity vectors;
[0132] The sorting unit is used to sort the multiple labeled entity vectors according to the number of labels on each of them to obtain an entity vector sequence;
[0133] The clustering unit is used to cluster the existing entity vectors with similar labels according to the order of the entity vector sequence, so as to obtain the multiple existing intent category vectors.
[0134] This application also provides corresponding devices and computer storage media for implementing the solutions provided in this application.
[0135] The device includes a memory and a processor. The memory stores instructions or code, and the processor executes the instructions or code to cause the device to perform the method for determining a new intent category as described in any embodiment of this application.
[0136] The computer storage medium stores code, and when the code is executed, the device running the code implements the method for determining a new intent category as described in any embodiment of this application.
[0137] In the embodiments of this application, the terms "first" and "second" (if they exist) are used only as name identifiers and do not represent the order of first and second.
[0138] As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by means of software plus a general-purpose hardware platform. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the methods described in the various embodiments, or in some parts of the embodiments, of this application.
[0139] It should be noted that the various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, for the device embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiments. The device embodiments described above are merely illustrative, and the units described as separate components may or may not be physically separate. The components indicated as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment solution according to actual needs. Those skilled in the art can understand and implement this without creative effort.
[0140] The above description is merely one specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for determining a new intent category, characterized in that the method includes: recognizing speech data to be recognized through a two-layer entity recognition model to obtain a target entity combination; if the target entity combination does not belong to an entity library, inputting the target entity combination into a pre-trained language model to obtain a target entity vector corresponding to the target entity combination; calculating the distance between the target entity vector and each of multiple existing intent category vectors to obtain multiple vector distances, wherein the multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library; and if the multiple vector distances are all greater than a preset threshold, determining a new intent category based on the speech data to be recognized; wherein the step of recognizing the speech data to be recognized through the two-layer entity recognition model to obtain the target entity combination includes: recognizing the speech data to be recognized through a first layer model based on multiple first preset entity categories in the two-layer entity recognition model to obtain a first target entity category of the speech data to be recognized and the entity corresponding to the first target entity category; recognizing the speech data to be recognized through a second layer model based on multiple second preset entity categories in the two-layer entity recognition model to obtain a second target entity category corresponding to the speech data to be recognized, the entity corresponding to the second target entity category, and an unrecognized entity, wherein the multiple second preset entity categories are subcategories of the multiple first preset entity categories; and obtaining the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity; and wherein the step of obtaining the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity includes: if the unrecognized entity is among the entities corresponding to the first target entity category, determining the unrecognized entity as a new second preset entity category; updating, based on the new second preset entity category, the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity to obtain an updated second target entity category and the entity corresponding to the updated second target entity category; and superimposing the updated second target entity category with the entity corresponding to the updated second target entity category to obtain the target entity combination.
2. The method of claim 1, wherein the training steps of the two-layer entity recognition model include: acquiring multiple speech sample data and first and second labeled data for each speech sample data, wherein the first labeled data is used to label the multiple first preset entity categories and the second labeled data is used to label the multiple second preset entity categories; inputting the speech sample data into a two-layer recognition network for recognition to obtain first recognition data and second recognition data of the speech sample data, wherein the first recognition data includes recognition data based on the multiple first preset entity categories and the second recognition data includes recognition data based on the multiple second preset entity categories; training the model parameters of the two-layer recognition network based on the first recognition data, the second recognition data, the first labeled data, the second labeled data, and the loss function of the two-layer recognition network; and determining the trained two-layer recognition network as the two-layer entity recognition model.
3. The method of claim 2, wherein the steps for constructing the entity library include: obtaining the multiple existing entity combinations based on the second recognition data; and constructing the entity library based on the multiple existing entity combinations.
4. The method of claim 3, wherein the step of constructing the entity library based on the multiple existing entity combinations includes: mining the multiple existing entity combinations using an association rule mining algorithm to obtain target association rules; processing the multiple existing entity combinations according to the target association rules to obtain multiple processed existing entity combinations; and constructing the entity library based on the multiple processed existing entity combinations.
5. The method according to claim 1, characterized in that the multiple first preset entity categories include: action words, proper nouns, interrogative words, and question words to be processed.
6. The method of claim 1, wherein, if the target entity combination does not belong to the entity library, the method further includes: processing the target entity combination according to preset rules to obtain a processed target entity combination; and the step of inputting the target entity combination into the pre-trained language model to obtain the target entity vector corresponding to the target entity combination includes: inputting the processed target entity combination into the pre-trained language model to obtain the target entity vector.
7. The method according to any one of claims 1 to 6, characterized in that the clustering steps for obtaining the multiple existing intent category vectors include: labeling the similarity relationship between each existing entity vector and the remaining entity vectors among the multiple existing entity vectors to obtain multiple labeled entity vectors; sorting the multiple labeled entity vectors according to the number of labels on each of them to obtain an entity vector sequence; and clustering, in the order of the entity vector sequence, the existing entity vectors with similar labels to obtain the multiple existing intent category vectors.
8. A device for determining a new intent category, characterized in that the device includes: a recognition module, used to recognize speech data to be recognized through a two-layer entity recognition model to obtain a target entity combination; an acquisition module, used to input the target entity combination into a pre-trained language model, if the target entity combination does not belong to an entity library, to obtain a target entity vector corresponding to the target entity combination; a calculation module, used to calculate the distance between the target entity vector and each of multiple existing intent category vectors to obtain multiple vector distances, wherein the multiple existing intent category vectors are obtained by clustering multiple existing entity vectors corresponding to multiple existing entity combinations in the entity library; and a determination module, used to determine a new intent category based on the speech data to be recognized if the multiple vector distances are all greater than a preset threshold; wherein recognizing the speech data to be recognized through the two-layer entity recognition model to obtain the target entity combination includes: recognizing the speech data to be recognized through a first layer model based on multiple first preset entity categories in the two-layer entity recognition model to obtain a first target entity category of the speech data to be recognized and the entity corresponding to the first target entity category; recognizing the speech data to be recognized through a second layer model based on multiple second preset entity categories in the two-layer entity recognition model to obtain a second target entity category corresponding to the speech data to be recognized, the entity corresponding to the second target entity category, and an unrecognized entity, wherein the multiple second preset entity categories are subcategories of the multiple first preset entity categories; and obtaining the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity; and wherein obtaining the target entity combination based on the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity includes: if the unrecognized entity is among the entities corresponding to the first target entity category, determining the unrecognized entity as a new second preset entity category; updating, based on the new second preset entity category, the second target entity category, the entity corresponding to the second target entity category, and the unrecognized entity to obtain an updated second target entity category and the entity corresponding to the updated second target entity category; and superimposing the updated second target entity category with the entity corresponding to the updated second target entity category to obtain the target entity combination.