Method and device for processing dialogue text, electronic equipment and storage medium
By setting dialogue role features and decoding predicted class labels in the dialogue text, the method addresses the inaccuracy of named entity recognition technology on dialogue text and improves both the accuracy of extracting important information and the ability to recognize semantics in dialogue text.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING SINOVOICE TECH CO LTD
- Filing Date
- 2022-09-21
- Publication Date
- 2026-06-30
AI Technical Summary
Existing named entity recognition technologies are not accurate enough in dialogue text, making it impossible to extract the correct information in complex scenarios.
By setting the dialogue role characteristics of the dialogue characters, the distinguishability between dialogue text content is enhanced, and the decoding of nested type tags is achieved by decoding the predicted class tags.
It improves the accuracy of extracting important information from dialogue text, avoids cross-role information extraction, and enhances the ability to recognize semantic information.
Description
Technical Field
[0001] This invention relates to the field of information processing technology, and in particular to a method for processing dialogue text, a device for processing dialogue text, an electronic device, and a computer-readable storage medium.
Background Technology
[0002] In real life, based on different text processing needs, we often encounter situations where we need to extract certain specific information from text, such as extracting names of people, places, and organizations from a sentence or paragraph. The entities identified by names, such as names of people, places, and organizations, are called named entities. For information extraction from text, named entity recognition technology is usually used, which uses deep neural network models to identify and extract named entities from text. In relatively complex named entity recognition scenarios, nested named entity recognition methods can also be used to extract named entities from text.
[0003] However, existing named entity recognition (NER) technologies are generally only suitable for extracting text based on sentences or paragraphs, and the extracted information is usually content with relatively obvious surface features, such as names of people and places. In real-world applications, the information to be extracted is much more complex. In such cases, NER technologies may fail to extract the correct information from the text due to inaccurate recognition.
Summary of the Invention
[0004] The present invention provides a method, apparatus, electronic device, and computer-readable storage medium for processing dialogue text, in order to solve or partially solve the problem in existing named entity recognition technology that the correct information in the text cannot be extracted due to insufficient accuracy in text content recognition.
[0005] This invention discloses a method for processing dialogue text, including:
[0006] In response to a text recognition operation on the dialogue text, obtain the dialogue text content to be recognized;
[0007] Obtain the dialogue text model corresponding to the dialogue text content, and obtain the target dialogue role feature corresponding to the dialogue text content from the dialogue text model. The target dialogue role feature is a feature used to point to the dialogue role in the dialogue text.
[0008] Use the target dialogue role feature as a text index to determine the storage location of the dialogue text content in the dialogue text model;
[0009] Obtain at least one predicted class label corresponding to the storage location, wherein the predicted class label is used to represent the hierarchical label corresponding to the dialogue text content;
[0010] The predicted class labels are decoded to obtain the text annotation objects corresponding to the predicted class labels, and the content corresponding to the text annotation objects is extracted.
[0011] Optionally, before obtaining the content of the dialogue text to be recognized in response to the text recognition operation on the dialogue text, the method further includes:
[0012] Acquire multi-turn dialogue content, which includes multiple dialogue roles and the dialogue text content corresponding to each of the dialogue roles;
[0013] Perform role identification processing on each of the dialogue roles to obtain the dialogue role feature corresponding to each of the dialogue roles;
[0014] Associate each dialogue role feature with the dialogue text content of the corresponding dialogue role;
[0015] Concatenate the associated dialogue text content and input it into the dialogue text model.
[0016] Optionally, the step of performing role identification processing on each of the dialogue roles to obtain the dialogue role characteristics corresponding to each of the dialogue roles includes:
[0017] Identify the dialogue text content corresponding to each of the dialogue roles, and determine the role attributes corresponding to each of the dialogue roles.
[0018] In response to the identification input operation for each of the said role attributes, a special symbol identifier corresponding to each of the said role attributes is generated, and each of the special symbol identifiers is used as the dialogue role feature corresponding to each of the said dialogue roles.
[0019] Optionally, before obtaining the content of the dialogue text to be recognized in response to the text recognition operation on the dialogue text, the method further includes:
[0020] Obtain each dialogue role feature from the dialogue text model;
[0021] Based on the characteristics of each dialogue role, determine the dialogue text content corresponding to each of the dialogue role characteristics;
[0022] The content of each dialogue text is identified and processed to determine the text annotation object corresponding to each dialogue text content.
[0023] In response to a label input operation for the text annotation object, a predicted class label corresponding to the text annotation object is generated.
[0024] Optionally, the step of decoding the predicted class label to obtain the text annotation object corresponding to the predicted class label, and extracting the content corresponding to the text annotation object, includes:
[0025] Determine the string length of the dialogue text content;
[0026] The dialogue text content is visualized based on the string length and at least one of the predicted class labels to generate a three-dimensional label matrix corresponding to the dialogue text content;
[0027] Analyze the three-dimensional label matrix to determine the annotation position of at least one of the predicted class labels in the three-dimensional label matrix;
[0028] Obtain the text annotation object at the specified annotation location, and extract the content corresponding to the text annotation object.
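The decoding steps above can be sketched as follows. This is a minimal illustration only, assuming the three-dimensional label matrix is indexed as [label, start character, end character]; the source does not specify the layout, and the names `build_label_matrix` and `decode_label_matrix` are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive character offsets (start, end)


def build_label_matrix(text: str, spans: Dict[str, List[Span]]):
    """Return (label order, matrix); matrix[k][i][j] == 1 marks text[i..j] with label k."""
    n = len(text)  # string length of the dialogue text content
    labels = sorted(spans)
    matrix = [[[0] * n for _ in range(n)] for _ in labels]
    for k, label in enumerate(labels):
        for start, end in spans[label]:
            matrix[k][start][end] = 1
    return labels, matrix


def decode_label_matrix(text: str, labels: List[str], matrix):
    """Collect (label, start, end, content) for every marked annotation position."""
    found = []
    for k, label in enumerate(labels):
        for i, row in enumerate(matrix[k]):
            for j, cell in enumerate(row):
                if cell and i <= j:
                    found.append((label, i, j, text[i:j + 1]))
    return found


text = "I want to complain about Company X"
start = text.index("Company X")
spans = {
    "organization name": [(start, len(text) - 1)],
    "demand": [(0, len(text) - 1)],  # nested: fully contains the span above
}
labels, matrix = build_label_matrix(text, spans)
results = decode_label_matrix(text, labels, matrix)
```

Because each label has its own start-by-end plane, two annotation objects that overlap or contain one another do not collide, which is one plausible way nested labels could be decoded.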
[0029] Optionally, parsing the three-dimensional label matrix to determine the annotation position of at least one predicted class label in the three-dimensional label matrix includes:
[0030] Analyze the three-dimensional label matrix to obtain the probability distribution corresponding to at least one of the predicted class labels;
[0031] The probability distribution is compared with a preset probability threshold.
[0032] When the probability value in the probability distribution is greater than or equal to the preset probability threshold, the predicted class label corresponding to the probability distribution is determined as the target predicted class label;
[0033] Based on the target predicted class label, determine the annotation position of the target predicted class label in the three-dimensional label matrix.
[0034] Optionally, it also includes:
[0035] When the probability value corresponding to the probability distribution is less than the preset probability threshold, the predicted class label corresponding to the probability distribution is skipped.
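The threshold comparison above can be illustrated with a short sketch. It assumes that each cell of the three-dimensional label matrix holds a probability for one (label, start, end) candidate, which is an interpretation rather than something the source states; `select_positions` and the threshold value 0.5 are hypothetical.

```python
from typing import List, Tuple


def select_positions(
    prob: List[List[List[float]]],
    labels: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, int, int]]:
    """Keep candidates whose probability is >= threshold; skip the rest."""
    selected = []
    for k, label in enumerate(labels):
        for i, row in enumerate(prob[k]):
            for j, p in enumerate(row):
                if j < i:
                    continue  # a span cannot end before it starts
                if p >= threshold:
                    selected.append((label, i, j))
                # candidates below the threshold are skipped
    return selected


labels = ["demand", "organization name"]
prob = [
    [[0.1, 0.2, 0.9], [0.0, 0.1, 0.3], [0.0, 0.0, 0.2]],
    [[0.0, 0.1, 0.1], [0.0, 0.2, 0.1], [0.0, 0.0, 0.5]],
]
positions = select_positions(prob, labels, threshold=0.5)
```

With these illustrative values, one position is kept for each label; a probability exactly equal to the threshold is retained, matching the "greater than or equal to" condition.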
[0036] This invention also discloses a dialogue text processing apparatus, the apparatus comprising:
[0037] The dialogue text content acquisition module is used to acquire the dialogue text content to be recognized in response to the text recognition operation on the dialogue text.
[0038] The target dialogue role feature acquisition module is used to acquire the dialogue text model corresponding to the dialogue text content, and to acquire the target dialogue role feature corresponding to the dialogue text content from the dialogue text model. The target dialogue role feature is a feature used to point to the dialogue role in the dialogue text.
[0039] The storage location determination module is used to determine the storage location of the dialogue text content in the dialogue text model by using the target dialogue role features as a text index.
[0040] The predicted class label acquisition module is used to acquire at least one predicted class label corresponding to the storage location, wherein the predicted class label is used to represent the hierarchical label corresponding to the dialogue text content;
[0041] The predicted class label decoding module is used to decode the predicted class label, obtain the text annotation object corresponding to the predicted class label, and extract the content corresponding to the text annotation object.
[0042] Optionally, the device further includes:
[0043] A multi-turn dialogue content acquisition module is used to acquire multi-turn dialogue content, which includes multiple dialogue roles and dialogue text content corresponding to the multiple dialogue roles;
[0044] The dialogue role feature generation module is used to perform role identification processing on each of the dialogue roles to obtain the dialogue role features corresponding to each of the dialogue roles.
[0045] The dialogue role feature association module is used to associate each of the dialogue role features with the dialogue text content corresponding to each of the dialogue roles;
[0046] The dialogue text content splicing module is used to splice together the associated dialogue text content and input it into the dialogue text model.
[0047] Optionally, the dialogue role feature generation module includes:
[0048] The role attribute determination module is used to identify the dialogue text content corresponding to each of the dialogue roles and determine the role attributes corresponding to each of the dialogue roles.
[0049] The special symbol identifier generation module for dialogue roles is used to generate special symbol identifiers corresponding to each of the said role attributes in response to the identifier input operation for each of the said role attributes, and to use each of the special symbol identifiers as the dialogue role features corresponding to each of the said dialogue roles.
[0050] Optionally, the device further includes:
[0051] The dialogue role feature acquisition module is used to acquire the features of each dialogue role from the dialogue text model;
[0052] The dialogue text content determination module is used to determine the dialogue text content corresponding to each of the dialogue role characteristics based on each of the dialogue role characteristics.
[0053] The text annotation object determination module is used to identify and process each of the dialogue text contents and determine the text annotation object corresponding to each of the dialogue text contents.
[0054] The predicted class label generation module is used to generate a predicted class label corresponding to the text annotation object in response to a label input operation for the text annotation object.
[0055] Optionally, the predicted class label decoding module includes:
[0056] A string length determination module is used to determine the string length of the dialogue text content;
[0057] A three-dimensional label matrix generation module is used to visualize the dialogue text content based on the string length and at least one of the predicted class labels, and generate a three-dimensional label matrix corresponding to the dialogue text content.
[0058] The annotation location determination module is used to parse the three-dimensional label matrix and determine the annotation location of at least one of the predicted class labels in the three-dimensional label matrix.
[0059] The text annotation object content extraction module is used to obtain the text annotation object at the annotation position and extract the content corresponding to the text annotation object.
[0060] Optionally, the annotation location determination module includes:
[0061] The probability distribution acquisition module is used to parse the three-dimensional label matrix and obtain the probability distribution corresponding to at least one of the predicted class labels;
[0062] The probability distribution comparison module is used to compare the probability distribution with a preset probability threshold.
[0063] The target prediction class label determination module is used to determine the prediction class label corresponding to the probability distribution as the target prediction class label when the probability value in the probability distribution is greater than or equal to the preset probability threshold.
[0064] The labeling position determination module is used to determine the labeling position of the target predicted class label in the three-dimensional label matrix based on the target predicted class label.
[0065] Optionally, the target prediction class label determination module is further specifically used for:
[0066] When the probability value corresponding to the probability distribution is less than the preset probability threshold, the predicted class label corresponding to the probability distribution is skipped.
[0067] This invention also discloses an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
[0068] The memory is used to store computer programs;
[0069] When the processor executes a program stored in the memory, it implements the method described in the embodiments of the present invention.
[0070] This invention also discloses a computer-readable storage medium storing instructions that, when executed by one or more processors, cause the processors to perform the methods described in this invention.
[0071] The embodiments of the present invention have the following advantages:
[0072] In this embodiment of the invention, for the extraction of important information from dialogue text, setting dialogue role features enhances the distinction between the dialogue text content of different dialogue roles. Compared with inputting each role's content separately or inputting the text directly, this adds semantic information while also avoiding cross-role information extraction. Furthermore, by decoding at least one predicted class label, the decoding of nested type labels is achieved, thereby improving the recognition of text information and further enhancing the accuracy of extracting important information from dialogue text.
Attached Figure Description
[0073] Figure 1 is a flowchart of the steps of a dialogue text processing method provided in an embodiment of the present invention;
[0074] Figure 2 is a schematic diagram of tag decoding for dialogue text provided in an embodiment of the present invention;
[0075] Figure 3 is a schematic diagram of tag decoding for dialogue text provided in an embodiment of the present invention;
[0076] Figure 4 is a structural block diagram of a dialogue text processing device provided in an embodiment of the present invention;
[0077] Figure 5 is a block diagram of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0078] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0079] As an example, in the current field of general text processing, it is often necessary to extract a specific element from text, such as named entities like names of people, places, and organizations. This is named entity recognition technology. Today, named entity recognition has become a relatively mature technology. Various deep neural network models are widely used in named entity recognition scenarios, including LSTM-CRF (a Long Short-Term Memory network with a Conditional Random Field layer), BERT-CRF (Bidirectional Encoder Representations from Transformers, a language model pre-trained on a large amount of unlabeled text, combined with a Conditional Random Field layer), and SPAN-BERT (SpanBERT, a BERT variant pre-trained to represent and predict spans of text). Furthermore, in relatively complex named entity recognition scenarios, nested named entity recognition methods can also be used for extraction.
[0080] However, existing named entity recognition (NER) technologies are generally only suitable for extracting text based on sentences or paragraphs, and the extracted information is usually content with relatively obvious surface features, such as names of people and places. In real-world applications, especially in conversations between two or more people, such as customer service, outbound calls, court hearings, or interrogations, the information to be extracted is much more complex. Taking customer service as an example, in extracting place names, it is not only necessary to extract the place name, but also to determine the specific role, whether the location corresponds to the destination or the origin, etc. In this case, using NER technology will result in problems such as inaccurate recognition, leading to the inability to extract the correct information from the text.
[0081] In this regard, one of the core inventive points of this invention is that, for the extraction of important information from dialogue text, setting dialogue role features for the dialogue roles enhances the distinguishability between the dialogue text content of different dialogue roles. Compared with inputting each role's content separately or inputting the text directly, this adds semantic information and avoids cross-role information extraction. At the same time, by decoding at least one predicted class label, the decoding of nested type labels can be realized, thereby improving the recognition of text information and further improving the accuracy of extracting important information from dialogue text.
[0082] Referring to Figure 1, a flowchart of the steps of a method for processing dialogue text according to an embodiment of the present invention is shown. The method may specifically include the following steps:
[0083] Step 101: In response to the text recognition operation on the dialogue text, obtain the dialogue text content to be recognized;
[0084] Before extracting certain important information from the dialogue text, the dialogue text needs to be identified first to obtain the dialogue text content to be further identified. Specifically, the dialogue text content to be identified can be obtained in response to the text recognition operation on the dialogue text.
[0085] In one optional embodiment, the multi-turn dialogue content can be processed first and input into a dialogue text model for subsequent identification and extraction of information from the dialogue text. Specifically, this process can be as follows: First, acquire the multi-turn dialogue content, which includes multiple dialogue roles and their corresponding dialogue text content. As an example, suppose there are three dialogue roles: Role A, Role B, and Role C. Role A's dialogue text content is "I am A," Role B's is "I am B," and Role C's is "I am C." Alternatively, Role A could also correspond to the dialogue text content "How can I help you?", Role B could correspond to the dialogue text content "I want to file a complaint against Company X," and Role C could correspond to the dialogue text content "I want to book a flight to Location Y," and so on.
[0086] To better distinguish the various dialogue roles, each dialogue role needs to be identified separately to obtain the dialogue role features corresponding to each dialogue role. These dialogue role features are used to point to the characteristics of the dialogue role in the dialogue text. Thus, each dialogue role can be identified and its corresponding dialogue role features can be obtained.
[0087] Optionally, role identification processing is performed on each dialogue role to obtain the dialogue role feature corresponding to each dialogue role. Specifically, the dialogue text content corresponding to each dialogue role is identified to determine the role attribute of each dialogue role; then, in response to an identifier input operation for each role attribute, a special symbol identifier corresponding to each role attribute is generated, and each special symbol identifier is used as the dialogue role feature of the corresponding dialogue role. As an example, suppose the dialogue text content of three dialogue roles has been obtained: role A's "I am A" and "How can I help you?", role B's "I am B" and "I want to complain about Company X", and role C's "I am C" and "I want to book a ticket to place Y". Based on the dialogue content, the role attribute of each role can be determined and the corresponding identifier can be input. For example, role A's role attribute can be determined as customer service, and special symbol identifier 0 can be input as role A's dialogue role feature; role B's role attribute can be determined as customer 1, and special symbol identifier 1 can be input as role B's dialogue role feature; role C's role attribute can be determined as customer 2, and special symbol identifier 2 can be input as role C's dialogue role feature, and so on.
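The mapping from role attributes to special symbol identifiers can be sketched as below. In the source the identifiers come from an identifier input operation; assigning them automatically in order of first appearance is an assumption made here for illustration, and `assign_role_identifiers` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def assign_role_identifiers(turns: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each role attribute to a symbol, numbered by order of first appearance."""
    identifiers: Dict[str, str] = {}
    for role_attribute, _utterance in turns:
        if role_attribute not in identifiers:
            identifiers[role_attribute] = str(len(identifiers))
    return identifiers


turns = [
    ("customer service", "I am A"),
    ("customer 1", "I am B"),
    ("customer 2", "I am C"),
    ("customer service", "How can I help you?"),
]
role_ids = assign_role_identifiers(turns)
```

Any other symbol set (letters, reserved tokens) would work equally well, as long as each role attribute maps to exactly one identifier.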
[0088] After obtaining the dialogue role feature corresponding to each dialogue role, each feature can be associated with the dialogue text content of the corresponding dialogue role. Then, the associated dialogue text content is concatenated and input into the dialogue text model. Continuing with the previous example, if the dialogue role feature corresponding to role A is 0, and the dialogue text content is "I am A" and "How can I help you?", then dialogue role feature 0 can be associated with that dialogue text content. This allows the dialogue role feature, in character form, to be input into the dialogue text model together with the dialogue text content. Specifically, if the input dialogue text content is "I am A", then the character form of the dialogue role feature associated with "I am A" can be "00 00 0" (one Chinese character corresponds to two characters). Similarly, the character form of the dialogue role feature associated with "How can I help you?" can be "00 00 00 00 00 00 00 00 00 00 00". Concatenating "I am A" and "How can I help you?" results in the concatenated text "I am A, how can I help you?", and the corresponding character form of the dialogue role feature can be "00 00 0 00 00 00 0000 00 00 00 00 00". Similarly, the dialogue role features of roles B and C can be associated with their corresponding dialogue text content, which is then concatenated and input into the dialogue text model. The concatenated text content of role B can be "I am B and I want to complain about company X", and the corresponding character form of the dialogue role feature can be "11 11 1 11 11 11 11 111 111 11". The concatenated text content of role C can be "I am C and I want to book a plane ticket to place Y", and the corresponding character form of the dialogue role feature can be "22 22 2 22 22 22 22 22 22 22 22 22 22 22 22", and so on.
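A minimal sketch of associating a role feature with text and concatenating it follows. It simplifies the example above by emitting one feature symbol per character of the English text (the source uses two symbols per Chinese character); `concatenate_with_role_features` and the ", " separator are assumptions.

```python
from typing import List, Tuple


def concatenate_with_role_features(
    utterances: List[str], role_symbol: str, sep: str = ", "
) -> Tuple[str, str]:
    """Join one role's utterances and build a feature string aligned character by character."""
    text = sep.join(utterances)
    features = role_symbol * len(text)  # one role symbol per character of text
    return text, features


text_a, feat_a = concatenate_with_role_features(["I am A", "how can I help you?"], "0")
text_b, feat_b = concatenate_with_role_features(
    ["I am B", "I want to complain about Company X"], "1"
)
# Both sequences would be passed to the model side by side.
model_input = [(text_a, feat_a), (text_b, feat_b)]
```

Keeping the feature string the same length as the text means every character position carries its speaker's identifier, which is what lets later steps tell the roles apart.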
[0089] In the above embodiments, when multi-turn dialogue content is input into the dialogue text model, role identification processing is applied to the dialogue roles in the dialogue content so that each dialogue role and its corresponding dialogue text content can be distinguished in subsequent processing, thereby achieving correct identification and extraction of important information in the dialogue text content.
[0090] It is worth noting that the examples listed above are merely simple illustrations. Those skilled in the art should understand that in practice the acquired multi-turn dialogue content varies with context, and the dialogue roles and their corresponding dialogue text content vary accordingly. Furthermore, when performing role identification processing on dialogue roles, the input symbols are not limited to 0 / 1 / 2 as shown in the examples; other symbols, such as letters or other special symbols, can also be used. Therefore, those skilled in the art can set special symbols according to actual needs to distinguish different dialogue roles. Moreover, when inputting dialogue text content into the dialogue text model, other concatenation and input methods can be adopted in addition to the forms shown in the examples, such as concatenating the dialogue text content corresponding to different dialogue roles together and then inputting it, or concatenating it directly in the order in which the dialogue roles speak and then inputting it. This invention does not impose any limitations on these methods.
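The speaking-order variant mentioned above might look like the following sketch. The neutral "-" feature symbol for separator characters is an assumption introduced here to keep text and features aligned, and `concatenate_in_speaking_order` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def concatenate_in_speaking_order(
    turns: List[Tuple[str, str]], role_ids: Dict[str, str], sep: str = " "
) -> Tuple[str, str]:
    """Concatenate turns in the order spoken, with a per-character role feature string."""
    pieces, features = [], []
    for role, utterance in turns:
        pieces.append(utterance)
        features.append(role_ids[role] * len(utterance))
    text = sep.join(pieces)
    # Separator characters get a neutral symbol so both strings stay the same length.
    feature_string = ("-" * len(sep)).join(features)
    return text, feature_string


turns = [("A", "I am A"), ("B", "I am B")]
text, feature_string = concatenate_in_speaking_order(turns, {"A": "0", "B": "1"})
```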
[0091] As can be seen from the foregoing, if only named entity recognition is used to identify the content of dialogue text, it is very likely to lead to inaccurate content recognition. Therefore, in addition to characterizing the dialogue roles in the dialogue content, hierarchical tags can also be used to simultaneously annotate the dialogue text content corresponding to the dialogue roles to further enhance the extraction and recognition accuracy of the dialogue text.
[0092] Hierarchical labeling refers to classifying entities level by level according to the attributes of their labels. The higher the level, the broader the category. For example, the first level might distinguish "animals" from "plants"; at the second level, "animals" could be further divided into "vertebrates" and "invertebrates"; and at the third level, "vertebrates" could be further divided into "mammals", "reptiles", and "fish". Taking "human" as an example, a hierarchically labeled "human" could simultaneously carry labels at successive levels, such as "vertebrates", "mammals", and "human", and so on.
[0093] Therefore, in this invention, different hierarchical labels can be set according to different scenarios. For example, in a complaint scenario, if the dialogue content is "I want to complain about Company X", then the first-level label for "Company X" can be set as "Organization Name" and the second-level label as "Company Being Complained Against". Similarly, for "Car Value 100 Yuan", it is known that "100 Yuan" is an amount label, so the first-level label for "100 Yuan" can be set as "Amount" and the second-level label as "Car Value", and so on.
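One simple way to represent such scenario-specific hierarchical labels is an ordered tuple of levels, sketched below. The `HierarchicalLabel` class and the `COMPLAINT_SCENARIO` mapping are hypothetical illustrations, not structures defined by the source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HierarchicalLabel:
    levels: Tuple[str, ...]  # ordered from the first level downward

    def level(self, n: int) -> str:
        """Return the n-th level label, counting from 1."""
        return self.levels[n - 1]


# Labels from the complaint-scenario example in the text.
COMPLAINT_SCENARIO = {
    "Company X": HierarchicalLabel(("Organization Name", "Company Being Complained Against")),
    "100 Yuan": HierarchicalLabel(("Amount", "Car Value")),
}
```

A different scenario would supply its own mapping while reusing the same structure.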
[0094] In the specific implementation, the hierarchical labeling process for dialogue text content can be as follows. First, obtain each dialogue role feature from the dialogue text model. Then, based on each dialogue role feature, determine the dialogue text content corresponding to that feature, and perform recognition processing on each dialogue text content to determine the text annotation object corresponding to it. One dialogue text content can correspond to at least one text annotation object. A dialogue text content may also carry no labels at all: when it does not need to be labeled, no text annotation object is set. For example, for the dialogue text content "I love Y City", if only persons need to be identified, no label needs to be set; if both persons and locations need to be identified, "Y City" can be given a location label. For ease of explanation, this invention uses dialogue text content with corresponding text annotation objects as examples. It is understood that those skilled in the art can set the text annotation objects corresponding to the dialogue text content according to the actual situation, and this invention does not impose any restrictions on this. Finally, in response to a label input operation for the text annotation object, a predicted class label corresponding to the text annotation object is generated. The predicted class label is used to represent the hierarchical label corresponding to the dialogue text content.
[0095] As an example, continuing with the examples listed above, assuming the obtained dialogue role feature is 1, the corresponding role B can be determined from the dialogue text model based on dialogue role feature 1. The corresponding dialogue text content is "I am B and I want to complain about Company X". The text annotation object for this dialogue text content can be "I want to complain about Company X" or "Company X". For "I want to complain about Company X", a corresponding predicted class label can be set, such as the first-level label "demand" and the second-level label "complaint". For "Company X", a corresponding predicted class label can also be set, such as the first-level label "organization name" and the second-level label "complained unit". Similarly, the same method can be used to determine the text annotation object of the dialogue text content of role C, such as "place Y", and to set a predicted class label for it, such as the first-level label "location" and the second-level label "destination". The labeling process for role C is similar to that for role B, so it is not elaborated here.
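The role B example involves one annotation object nested inside another, which can be sketched as follows. The `Annotation` class and the helpers `annotate` and `is_nested` are hypothetical, and character-offset spans are an assumed representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: Tuple[str, str]  # (first-level label, second-level label)


def annotate(text: str, target: str, label: Tuple[str, str]) -> Annotation:
    """Locate the first occurrence of target in text and attach a two-level label."""
    start = text.index(target)
    return Annotation(start, start + len(target), label)


def is_nested(inner: Annotation, outer: Annotation) -> bool:
    """True when inner lies entirely within outer and is a different annotation."""
    return outer.start <= inner.start and inner.end <= outer.end and inner != outer


text_b = "I am B and I want to complain about Company X"
demand = annotate(text_b, "I want to complain about Company X", ("demand", "complaint"))
company = annotate(text_b, "Company X", ("organization name", "complained unit"))
```

Here `company` is nested inside `demand`, which is the kind of overlap a flat, one-label-per-character tagging scheme cannot express.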
[0096] Furthermore, although in this example the dialogue text of role A does not contain any information that actually needs to be extracted, it provides contextual reference for recognizing and extracting the dialogue text of roles B and C. Therefore, during the dialogue content input stage, the content of role A is also input into the dialogue text model.
[0097] In the above embodiments, by using hierarchical tagging of the dialogue text content of each dialogue character, the recognizability of important information in each dialogue text content is further enhanced, thereby improving the accuracy of subsequent recognition and extraction.
[0098] It should be noted that the examples listed above are merely examples. Those skilled in the art can set the text annotation objects of the dialogue text content according to the actual situation, and can also set different hierarchical tags to adapt to the needs. This invention does not limit this.
[0099] Step 102: Obtain the dialogue text model corresponding to the dialogue text content, and obtain the target dialogue role feature corresponding to the dialogue text content from the dialogue text model. The target dialogue role feature is a feature used to point to the dialogue role in the dialogue text.
[0100] Regarding the recognition process, once the dialogue text content to be recognized is determined, a dialogue text model corresponding to the dialogue text content can be obtained, and the target dialogue role features corresponding to the dialogue text content can be extracted from the dialogue text model. The target dialogue role features are the features used to refer to the dialogue role in the dialogue text. As an example, assuming the dialogue text content to be recognized is "I want to complain about Company X", the target dialogue role feature corresponding to this dialogue text content can be obtained as 1 from the dialogue text model.
[0101] Step 103: Use the target dialogue character features as a text index to determine the storage location of the dialogue text content in the dialogue text model;
[0102] Specifically, the target dialogue character features are used as text indexes to determine the storage location of the dialogue text content in the dialogue text model. For example, the target dialogue character feature 1 is used as a text index to determine the storage location of the dialogue text content "I want to complain about Company X" of the corresponding character B in the dialogue text model.
[0103] Step 104: Obtain at least one prediction class label corresponding to the storage location, wherein the prediction class label is used to represent the hierarchical label corresponding to the dialogue text content;
[0104] In the specific implementation, based on the determined storage location, at least one predicted class label corresponding to that storage location is obtained. The predicted class label is used to represent the hierarchical label corresponding to the dialogue text content. For example, for the dialogue text content "I want to complain about Company X", there can be two predicted class labels: one predicted class label with the first-level label "demand" and the second-level label "complaint", and another predicted class label with the first-level label "organization name" and the second-level label "complained unit".
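Steps 102 to 104 can be sketched as a lookup keyed by the dialogue role feature. The description does not specify how the dialogue text model stores its content, so the dictionary layout and the function names below are assumptions used purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the dialogue text model:
# dialogue role feature -> stored content and its predicted class labels.
DIALOGUE_TEXT_MODEL: Dict[int, dict] = {
    1: {
        "role": "B",
        "content": "I want to complain about Company X",
        "predicted_labels": [
            ("demand", "complaint"),
            ("organization name", "complained unit"),
        ],
    },
}


def target_role_feature(content: str, model: Dict[int, dict]) -> Optional[int]:
    """Step 102: find the role feature whose stored content matches."""
    for feature, entry in model.items():
        if entry["content"] == content:
            return feature
    return None


def predicted_labels_at(feature: int, model: Dict[int, dict]) -> List[Tuple[str, ...]]:
    """Steps 103-104: use the role feature as the index, then read the labels."""
    return model[feature]["predicted_labels"]


feature = target_role_feature("I want to complain about Company X", DIALOGUE_TEXT_MODEL)
labels = predicted_labels_at(feature, DIALOGUE_TEXT_MODEL)
```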
[0105] Step 105: Decode the predicted class label to obtain the text annotation object corresponding to the predicted class label, and extract the content corresponding to the text annotation object.
[0106] The acquired predicted tags are decoded to obtain the corresponding text annotation objects, and the content corresponding to the text annotation objects is extracted. Specifically, for a dialogue text, if only one predicted tag is obtained, only that predicted tag is decoded; if two or more predicted tags are obtained, each predicted tag is decoded, and the corresponding text annotation object is obtained for each predicted tag. The content corresponding to the text annotation object is then extracted based on the decoding results.
[0107] In one optional embodiment, the obtained predicted class labels are decoded, specifically as follows: First, the string length of the dialogue text content is determined, and the dialogue text content is visualized based on the string length and at least one predicted class label to generate a three-dimensional label matrix corresponding to the dialogue text content. Then, the three-dimensional label matrix is parsed to determine the annotation position of at least one predicted class label in the three-dimensional label matrix. As an example, assuming the string length of the dialogue text content is determined to be m, and the number of obtained predicted class labels is n (m≥2, n≥1), a three-dimensional label matrix m*m*n corresponding to the dialogue text content of string length m can be generated. The three-dimensional label matrix m*m*n is parsed, and the annotation position of the predicted class label in the three-dimensional label matrix is determined based on the parsing result. Then, the text annotation object at the annotation position is obtained, and the content corresponding to the text annotation object is extracted as the target recognition content.
[0108] Further, determining the labeling position of the predicted class label in the 3D label matrix can be specifically done as follows: parse the 3D label matrix to obtain the probability distribution corresponding to at least one predicted class label, compare the probability distribution with a preset probability threshold, and when the probability value in the probability distribution is greater than or equal to the preset probability threshold, determine the predicted class label corresponding to the probability distribution as the target predicted class label; when the probability value corresponding to the probability distribution is less than the preset probability threshold, skip the predicted class label corresponding to the probability distribution, and determine the labeling position of the target predicted class label in the 3D label matrix based on the target predicted class label.
[0109] For example, a probability threshold s can be pre-defined (s can be a value between 0 and 1, set according to the actual situation). The three-dimensional label matrix m*m*n is analyzed. Assume the dialogue text content to be identified corresponds to three predicted labels. The first predicted label is a third-level label with a probability distribution of (a, b, c), where a represents the probability of hitting a first-level label in the current dialogue text content, b represents the probability of hitting a second-level label, and c represents the probability of hitting a third-level label. The second predicted label is a second-level label with a probability distribution of (d, e), and the third predicted label is a third-level label with a probability distribution of (f, g, h). The probability values for each probability distribution are all values between 0 and 1.
[0110] Next, the probability values of each probability distribution are compared with the preset probability threshold s. The result corresponding to the probability value greater than or equal to s can be marked as 1, and the result corresponding to the probability value less than s can be marked as 0. For example, for the first predicted label, the probability values a~c are compared with the preset probability threshold s respectively. Assuming that the comparison result is (1,0,1), it can be said that the first-level label and the third-level label are hit. Assuming that the comparison result is (1,1,1), it can be said that the three-level labels are hit at the same time. At this time, the first predicted label can be determined as the target predicted label, thereby determining the labeling position of the first predicted label in the three-dimensional label matrix. If the comparison result is (0,0,0), it means that no level label of the first predicted label is hit in the current dialogue text content. At this time, the first predicted label can be skipped.
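The threshold comparison described above can be sketched as follows. The function names and the numeric probabilities are illustrative assumptions; only the rule (mark 1 when the probability is greater than or equal to s, otherwise 0, and skip a label whose result is all zeros) is taken from the description.

```python
from typing import Sequence, Tuple


def hit_mask(probabilities: Sequence[float], threshold: float) -> Tuple[int, ...]:
    """Mark each level label 1 if its probability >= threshold, else 0."""
    return tuple(1 if p >= threshold else 0 for p in probabilities)


def is_target_label(mask: Sequence[int]) -> bool:
    """A predicted label is kept when at least one of its levels is hit."""
    return any(mask)


s = 0.5                              # preset probability threshold (illustrative)
mask = hit_mask((0.9, 0.2, 0.7), s)  # first- and third-level labels are hit
skipped = not is_target_label(hit_mask((0.1, 0.2, 0.3), s))
```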
[0111] In other words, during the decoding of predicted labels, the probability value of each level label in a predicted label is compared with the preset probability threshold to determine which level label or labels are actually hit. If only one level label is hit, that level label is used as the target label, and the content corresponding to the text annotation object pointed to by that target label is extracted. If two or more level labels are hit, the highest-numbered level label among them is used as the target label, and the content corresponding to the text annotation object pointed to by that target label is extracted. For example, if both the first-level label and the third-level label are hit, the third-level label is used as the target label of the text annotation object. Although the text annotation object ultimately corresponds to the third-level label, hitting several levels at the same time strengthens confidence in the text annotation object, thereby further improving the accuracy of recognition and extraction.
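The selection of the target label from the comparison result can be sketched as follows. The function name `select_target_level` and the placeholder level names are hypothetical; the rule shown is the one stated above, namely that the highest-numbered level that is hit becomes the target label.

```python
from typing import Optional, Sequence


def select_target_level(mask: Sequence[int], level_names: Sequence[str]) -> Optional[str]:
    """Return the name of the highest-numbered level label that is hit.

    mask[i] is 1 when level i + 1 passed the threshold comparison.
    Returns None when no level is hit, in which case the label is skipped.
    """
    hit_indices = [i for i, flag in enumerate(mask) if flag]
    if not hit_indices:
        return None
    return level_names[hit_indices[-1]]


levels = ("first-level", "second-level", "third-level")  # placeholder names
target = select_target_level((1, 0, 1), levels)
```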
[0112] Similarly, the comparison results of the second and third predicted class labels can be obtained in the same way. Based on the comparison results, the labeling positions of each label in the three-dimensional label matrix can be determined, and the text labeling objects at the labeling positions can be obtained. The content corresponding to the text labeling objects can be extracted. Since the method is similar, it will not be described in detail here.
[0113] As a specific example, if the identified dialogue text is "I want to complain about Company X", the string length of "I want to complain about Company X" can be determined to be 13, and the corresponding number of predicted class labels is 2. Therefore, a 13*13*2 three-dimensional label matrix can be generated corresponding to the dialogue text content with a string length of 13. Figure 2 illustrates the tag decoding process for this example. For ease of viewing, each Chinese character is treated as a point with corresponding coordinates, and the three-dimensional tag matrix is flattened into a two-dimensional table (shown in the figure as a 7*7 two-dimensional matrix). Here, the first predicted tag ID is set to 0, the second predicted tag ID is set to 1, and the position corresponding to the extracted element (i.e., the annotation position) is assigned the ID of the predicted tag. A position with no content is left empty. It can be seen that the position of 0 in the two-dimensional table is [0, 6], and the position of 1 is [4, 6]. Assuming the dialogue text content is T, then T[0:6+1] is tag 0, corresponding to the text annotation object "I want to complain about Company X" of the first predicted tag, and T[4:6+1] is tag 1, corresponding to the text annotation object "Company X" of the second predicted tag. As can be seen from the figure, this tag matrix is an upper triangular matrix, thus realizing the extraction of nested entities.
[0114] A probability threshold of 0.5 can be pre-set, and the three-dimensional tag matrix corresponding to "I want to complain about Company X" can be analyzed. Previously, it was determined that there are two predicted tags for "I want to complain about Company X". The first predicted tag is a second-level tag, with the hierarchical labels "demand" and "complaint". Assuming the algorithm calculates the corresponding probability distribution as (0.4, 0.8), 0.4 represents the probability of the first-level tag "demand" hitting in the current dialogue text, and 0.8 represents the probability of the second-level tag "complaint" hitting. The second predicted tag is also a second-level tag, with the hierarchical labels "organization name" and "complained unit". Assume the algorithm calculates its probability distribution as (0.7, 0.9). Each probability value is compared to the preset probability threshold of 0.5. For the first predicted tag, (0.4, 0.8) is compared to 0.5, resulting in (0, 1), which means that the second-level tag "complaint" is hit. Therefore, the first predicted tag can be identified as the target predicted tag, thus determining its annotation position in the three-dimensional tag matrix as [0, 6]. The content of the text annotation object corresponding to the annotation position [0, 6], "I want to complain about Company X", is extracted, and the second-level tag "complaint" is taken as the target tag corresponding to "I want to complain about Company X". For the second predicted tag, (0.7, 0.9) is compared to 0.5, and the comparison result is (1, 1), which means that the first-level tag "organization name" and the second-level tag "complained unit" are hit at the same time. At this time, the second predicted tag can be determined as the target predicted tag, thereby determining its annotation position in the three-dimensional tag matrix as [4, 6]. The content of the text annotation object corresponding to the annotation position [4, 6], "Company X", is extracted, and the second-level tag "complained unit" is taken as the target tag corresponding to "Company X".
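The span decoding in this example can be sketched as follows. This is a simplified illustration, not the claimed implementation: the matrix holds 0/1 marks rather than model probabilities, the helper names are hypothetical, and the 7-character string is an assumed stand-in chosen to be consistent with the 7*7 table and the positions [0, 6] and [4, 6]; it is not quoted from the source.

```python
from typing import List, Tuple

Matrix3D = List[List[List[int]]]


def empty_label_matrix(m: int, n: int) -> Matrix3D:
    """Create an m*m*n matrix: matrix[start][end][label_id], all zeros."""
    return [[[0] * n for _ in range(m)] for _ in range(m)]


def decode_spans(text: str, matrix: Matrix3D) -> List[Tuple[int, int, int, str]]:
    """Read (label_id, start, end, content) for every marked cell.

    Only the upper triangle (start <= end) is scanned, so one span may
    contain another, which is what allows nested entities to be extracted.
    """
    spans = []
    m = len(text)
    for start in range(m):
        for end in range(start, m):
            for label_id, flag in enumerate(matrix[start][end]):
                if flag:
                    spans.append((label_id, start, end, text[start:end + 1]))
    return spans


T = "我要投诉X公司"                # assumed 7-character stand-in for the example
M = empty_label_matrix(len(T), 2)  # two predicted tags: IDs 0 and 1
M[0][6][0] = 1                     # tag 0 marks position [0, 6]
M[4][6][1] = 1                     # tag 1 marks position [4, 6]
spans = decode_spans(T, M)
```

Here `text[start:end + 1]` corresponds to the slices T[0:6+1] and T[4:6+1] given in the description.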
[0115] Figure 3 illustrates a further enhancement of the annotation effect in this example. As previously discussed, since the annotation position [4, 6] corresponds to both the "organization name" and "complained unit" level labels, a new tag ID can be added at the annotation position [4, 6] on the basis of Figure 2. For example, if it is set to 2, it means that the content of the text annotation object at this annotation position corresponds to two levels of tags at the same time, while the tag ID of the annotation position [0, 6] remains unchanged.
[0116] In the above embodiments, for the tag decoding process, multiple nested hierarchical tags are decoded by setting a three-dimensional tag matrix and a threshold decoding method. Based on this, hierarchical tag classification is used to optimize the effect. Furthermore, the tag system is enhanced by using common named entities, so that the algorithm model learns the relationship between general named entities and special tags to enhance the effect.
[0117] It should be noted that the embodiments of the present invention include, but are not limited to, the examples described above. It is understood that those skilled in the art can make further settings according to actual needs under the guidance of the ideas in the embodiments of the present invention, and the present invention does not limit such settings.
[0118] In this embodiment of the invention, for the extraction of important information in dialogue text, the distinction between dialogue text content of different dialogue roles is enhanced by setting dialogue role features. This increases semantic information by allowing for separate or direct input, while also avoiding cross-role information extraction. Furthermore, by decoding at least one predicted class label, the decoding of nested type labels is achieved, thereby improving the recognition of text information and further enhancing the accuracy of extracting important information in dialogue text.
[0119] It should be noted that, for the sake of simplicity, the method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments of the present invention are not limited to the described order of actions, because according to the embodiments of the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily essential to the embodiments of the present invention.
[0120] Referring to Figure 4, a structural block diagram of a dialogue text processing device provided in an embodiment of the present invention is shown, which may specifically include the following modules:
[0121] The dialogue text content acquisition module 401 is used to acquire the dialogue text content to be recognized in response to the text recognition operation on the dialogue text.
[0122] The target dialogue role feature acquisition module 402 is used to acquire the dialogue text model corresponding to the dialogue text content, and to acquire the target dialogue role feature corresponding to the dialogue text content from the dialogue text model. The target dialogue role feature is a feature used to point to the dialogue role in the dialogue text.
[0123] The storage location determination module 403 is used to determine the storage location of the dialogue text content in the dialogue text model by using the target dialogue role features as a text index.
[0124] The prediction class label acquisition module 404 is used to acquire at least one prediction class label corresponding to the storage location, wherein the prediction class label is used to represent the hierarchical label corresponding to the dialogue text content;
[0125] The prediction class label decoding module 405 is used to decode the prediction class label, obtain the text annotation object corresponding to the prediction class label, and extract the content corresponding to the text annotation object.
[0126] In one alternative embodiment, the device further includes:
[0127] A multi-turn dialogue content acquisition module is used to acquire multi-turn dialogue content, which includes multiple dialogue roles and dialogue text content corresponding to the multiple dialogue roles;
[0128] The dialogue role feature generation module is used to perform role identification processing on each of the dialogue roles to obtain the dialogue role features corresponding to each of the dialogue roles.
[0129] The dialogue role feature association module is used to associate each of the dialogue role features with the dialogue text content corresponding to each of the dialogue roles;
[0130] The dialogue text content splicing module is used to splice together the associated dialogue text content and input it into the dialogue text model.
[0131] In one optional embodiment, the dialogue role feature generation module includes:
[0132] The role attribute determination module is used to identify the dialogue text content corresponding to each of the dialogue roles and determine the role attributes corresponding to each of the dialogue roles.
[0133] The special symbol identifier generation module for dialogue roles is used to generate special symbol identifiers corresponding to each of the said role attributes in response to the identifier input operation for each of the said role attributes, and to use each of the special symbol identifiers as the dialogue role features corresponding to each of the said dialogue roles.
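The association and splicing performed by these modules can be sketched as follows. The bracketed marker format, the function name `splice_dialogue`, the role-to-feature mapping, and the utterance attributed to role A are all hypothetical; only the utterance of role B comes from the example in the method embodiment.

```python
from typing import Dict, List, Tuple


def splice_dialogue(turns: List[Tuple[str, str]], role_to_feature: Dict[str, int]) -> str:
    """Prefix each utterance with its role's special symbol, then concatenate."""
    parts = []
    for role, content in turns:
        marker = f"[{role_to_feature[role]}]"  # assumed special symbol format
        parts.append(marker + content)
    return "".join(parts)


ROLE_TO_FEATURE = {"A": 0, "B": 1}              # assumed mapping
TURNS = [
    ("A", "Hello, how can I help you?"),        # hypothetical utterance
    ("B", "I am B and I want to complain about Company X"),
]
model_input = splice_dialogue(TURNS, ROLE_TO_FEATURE)
```

Keeping the turns of role A in the spliced input, even when they carry nothing to extract, matches the observation in the method embodiment that such turns still provide contextual reference value.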
[0134] In one alternative embodiment, the device further includes:
[0135] The dialogue role feature acquisition module is used to acquire the features of each dialogue role from the dialogue text model;
[0136] The dialogue text content determination module is used to determine the dialogue text content corresponding to each of the dialogue role characteristics based on each of the dialogue role characteristics.
[0137] The text annotation object determination module is used to identify and process each of the dialogue text contents and determine the text annotation object corresponding to each of the dialogue text contents.
[0138] The predictive label generation module is used to generate predictive labels corresponding to the text annotation object in response to a label input operation for the text annotation object.
[0139] In one optional embodiment, the predicted class label decoding module 405 includes:
[0140] A string length determination module is used to determine the string length of the dialogue text content;
[0141] A three-dimensional label matrix generation module is used to visualize the dialogue text content based on the string length and at least one of the predicted class labels, and generate a three-dimensional label matrix corresponding to the dialogue text content.
[0142] The annotation location determination module is used to parse the three-dimensional label matrix and determine the annotation location of at least one of the predicted class labels in the three-dimensional label matrix.
[0143] The text annotation object content extraction module is used to obtain the text annotation object at the annotation position and extract the content corresponding to the text annotation object.
[0144] In one optional embodiment, the annotation location determination module includes:
[0145] The probability distribution acquisition module is used to parse the three-dimensional label matrix and obtain the probability distribution corresponding to at least one of the predicted class labels;
[0146] The probability distribution comparison module is used to compare the probability distribution with a preset probability threshold.
[0147] The target prediction class label determination module is used to determine the prediction class label corresponding to the probability distribution as the target prediction class label when the probability value in the probability distribution is greater than or equal to the preset probability threshold.
[0148] The labeling position determination module is used to determine the labeling position of the target predicted class label in the three-dimensional label matrix based on the target predicted class label.
[0149] In an optional embodiment, the target prediction class label determination module is further specifically used for:
[0150] When the probability value corresponding to the probability distribution is less than the preset probability threshold, the predicted class label corresponding to the probability distribution is skipped.
[0151] As the device embodiment is basically similar to the method embodiment, the description is relatively simple, and relevant parts can be found in the description of the method embodiment.
[0152] In addition, this invention also provides an electronic device, including: a processor, a memory, and a computer program stored in the memory and executable on the processor. When the computer program is executed by the processor, it implements the various processes of the above-described dialog text processing method embodiments and achieves the same technical effect. To avoid repetition, it will not be described again here.
[0153] This invention also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the various processes of the above-described dialog text processing method embodiments and achieves the same technical effects. To avoid repetition, it will not be described again here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0154] Figure 5 is a schematic diagram of the hardware structure of an electronic device for implementing various embodiments of the present invention.
[0155] The electronic device 500 includes, but is not limited to, components such as: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will understand that the electronic device structure involved in the embodiments of the present invention does not constitute a limitation on the electronic device. An electronic device may include more or fewer components than illustrated, or combine certain components, or have different component arrangements. In the embodiments of the present invention, the electronic device includes, but is not limited to, mobile phones, tablet computers, laptops, PDAs, in-vehicle terminals, wearable devices, and pedometers.
[0156] It should be understood that, in this embodiment of the invention, the radio frequency unit 501 can be used for receiving and transmitting signals during information transmission or calls. Specifically, it receives downlink data from the base station and processes it with the processor 510; additionally, it transmits uplink data to the base station. Typically, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, etc. Furthermore, the radio frequency unit 501 can also communicate with networks and other devices through a wireless communication system.
[0157] The electronic device provides users with wireless broadband internet access through the network module 502, such as helping users send and receive emails, browse web pages, and access streaming media.
[0158] The audio output unit 503 can convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into audio signals and output them as sound. Furthermore, the audio output unit 503 can also provide audio output related to specific functions performed by the electronic device 500 (e.g., call signal reception sound, message reception sound, etc.). The audio output unit 503 includes a speaker, a buzzer, and a receiver, etc.
[0159] Input unit 504 is used to receive audio or video signals. Input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042. The GPU 5041 processes image data of still images or videos acquired by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames can be displayed on display unit 506. The image frames processed by GPU 5041 can be stored in memory 509 (or other storage media) or transmitted via radio frequency unit 501 or network module 502. Microphone 5042 can receive sound and process such sound into audio data. The processed audio data can be converted into a format that can be transmitted to a mobile communication base station via radio frequency unit 501 in telephone call mode.
[0160] The electronic device 500 also includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 5061 according to the ambient light level, and the proximity sensor can turn off the display panel 5061 and / or backlight when the electronic device 500 is moved to the ear. As a type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes). When stationary, it can detect the magnitude and direction of gravity and can be used to identify the posture of the electronic device (such as landscape / portrait switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer, tapping), etc. The sensor 505 may also include a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, infrared sensor, etc., which will not be described in detail here.
[0161] The display unit 506 is used to display information input by the user or information provided to the user. The display unit 506 may include a display panel 5061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
[0162] User input unit 507 can be used to receive input numerical or character information, and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, user input unit 507 includes a touch panel 5071 and other input devices 5072. Touch panel 5071, also known as a touch screen, can collect touch operations performed by the user on or near it (such as operations performed by the user using a finger, stylus, or any suitable object or accessory on or near touch panel 5071). Touch panel 5071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, touch panel 5071 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides touch panel 5071, user input unit 507 may also include other input devices 5072. Specifically, other input devices 5072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, power buttons, etc.), trackballs, mice, joysticks, etc., which will not be described in detail here.
[0163] Furthermore, the touch panel 5071 can cover the display panel 5061. When the touch panel 5071 detects a touch operation on or near it, it transmits the information to the processor 510 to determine the type of touch event. Subsequently, the processor 510 provides corresponding visual output on the display panel 5061 according to the type of touch event. It is understood that in one embodiment, the touch panel 5071 and the display panel 5061 are implemented as two independent components to realize the input and output functions of the electronic device. However, in some embodiments, the touch panel 5071 and the display panel 5061 can be integrated to realize the input and output functions of the electronic device. The specific implementation is not limited here.
[0164] Interface unit 508 serves as an interface for connecting external devices to electronic device 500. For example, external devices may include a wired or wireless headphone port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input / output (I / O) port, a video I / O port, a headphone port, and so on. Interface unit 508 can be used to receive input from external devices (e.g., data, power, etc.) and transmit the received input to one or more components within electronic device 500, or it can be used to transmit data between electronic device 500 and external devices.
[0165] The memory 509 can be used to store software programs and various data. The memory 509 may primarily include a program storage area and a data storage area. The program storage area may store the operating system, applications required for at least one function (such as sound playback, image playback, etc.), etc.; the data storage area may store data created based on the use of the mobile phone (such as audio data, phonebook, etc.). Furthermore, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device.
[0166] The processor 510 is the control center of the electronic device. It connects various parts of the electronic device via various interfaces and lines. By running or executing software programs and / or modules stored in the memory 509, and by calling data stored in the memory 509, it performs various functions and processes data, thereby providing overall monitoring of the electronic device. The processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, and applications, while the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processor 510.
[0167] The electronic device 500 may also include a power supply 511 (such as a battery) that supplies power to various components. Preferably, the power supply 511 can be logically connected to the processor 510 through a power management system, thereby enabling functions such as managing charging, discharging, and power consumption through the power management system.
[0168] In addition, the electronic device 500 includes some functional modules not shown, which will not be described in detail here.
[0169] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0170] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0171] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Guided by the present invention, those skilled in the art can devise many other forms without departing from the spirit of the invention and the scope of the claims, and all of these forms fall within the protection scope of the present invention.
[0172] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed in this invention can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0173] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above correspond to the processes in the foregoing method embodiments; reference may be made to those embodiments, and the details are not repeated here.
[0174] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling, direct coupling, or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.
[0175] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0176] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0177] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.
[0178] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for processing dialogue text, characterized in that the method includes: in response to a text recognition operation on the dialogue text, obtaining the dialogue text content to be recognized; obtaining the dialogue text model corresponding to the dialogue text content, and obtaining, from the dialogue text model, the target dialogue role feature corresponding to the dialogue text content, wherein the target dialogue role feature is a feature used to point to the dialogue role in the dialogue text; using the target dialogue role feature as a text index to determine the storage location of the dialogue text content in the dialogue text model; obtaining at least one predicted class label corresponding to the storage location, wherein the predicted class label is used to represent the hierarchical label corresponding to the dialogue text content; and decoding the predicted class label to obtain the text annotation object corresponding to the predicted class label, and extracting the content corresponding to the text annotation object; wherein, before obtaining the dialogue text content to be recognized in response to the text recognition operation on the dialogue text, the method further includes: acquiring multi-turn dialogue content, which includes multiple dialogue roles and the dialogue text content corresponding to the multiple dialogue roles; performing role identification processing on each of the dialogue roles to obtain the dialogue role feature corresponding to each of the dialogue roles; associating each dialogue role feature with the dialogue text content corresponding to each of the dialogue roles; and splicing the associated dialogue text content and inputting it into the dialogue text model.
2. The method according to claim 1, characterized in that the step of performing role identification processing on each of the dialogue roles to obtain the dialogue role feature corresponding to each of the dialogue roles includes: identifying the dialogue text content corresponding to each of the dialogue roles, and determining the role attribute corresponding to each of the dialogue roles; and, in response to an identification input operation for each of the role attributes, generating a special symbol identifier corresponding to each of the role attributes, and using each special symbol identifier as the dialogue role feature corresponding to each of the dialogue roles.
3. The method according to claim 1, characterized in that, before obtaining the dialogue text content to be recognized in response to the text recognition operation on the dialogue text, the method further includes: obtaining each dialogue role feature from the dialogue text model; determining, based on each dialogue role feature, the dialogue text content corresponding to each dialogue role feature; performing recognition processing on each dialogue text content to determine the text annotation object corresponding to each dialogue text content; and, in response to a label input operation for the text annotation object, generating a predicted class label corresponding to the text annotation object.
4. The method according to claim 1, characterized in that the step of decoding the predicted class label to obtain the text annotation object corresponding to the predicted class label, and extracting the content corresponding to the text annotation object, includes: determining the string length of the dialogue text content; visualizing the dialogue text content based on the string length and the at least one predicted class label to generate a three-dimensional label matrix corresponding to the dialogue text content; parsing the three-dimensional label matrix to determine the annotation position of the at least one predicted class label in the three-dimensional label matrix; and obtaining the text annotation object at the annotation position, and extracting the content corresponding to the text annotation object.
5. The method according to claim 4, characterized in that the step of parsing the three-dimensional label matrix to determine the annotation position of the at least one predicted class label in the three-dimensional label matrix includes: parsing the three-dimensional label matrix to obtain the probability distribution corresponding to the at least one predicted class label; comparing the probability distribution with a preset probability threshold, and, when a probability value in the probability distribution is greater than or equal to the preset probability threshold, determining the predicted class label corresponding to the probability distribution as the target predicted class label; and, based on the target predicted class label, determining the annotation position of the target predicted class label in the three-dimensional label matrix.
6. The method according to claim 5, characterized in that the method further includes: when the probability value corresponding to the probability distribution is less than the preset probability threshold, skipping the predicted class label corresponding to the probability distribution.
7. A device for processing dialogue text, characterized in that the device includes: a dialogue text content acquisition module, used to acquire the dialogue text content to be recognized in response to a text recognition operation on the dialogue text; a target dialogue role feature acquisition module, used to acquire the dialogue text model corresponding to the dialogue text content, and to acquire, from the dialogue text model, the target dialogue role feature corresponding to the dialogue text content, wherein the target dialogue role feature is a feature used to point to the dialogue role in the dialogue text; a storage location determination module, used to determine the storage location of the dialogue text content in the dialogue text model by using the target dialogue role feature as a text index; a prediction tag acquisition module, used to acquire at least one prediction tag corresponding to the storage location, wherein the prediction tag is used to represent the hierarchical tag corresponding to the dialogue text content; and a prediction tag decoding module, used to decode the prediction tag, obtain the text annotation object corresponding to the prediction tag, and extract the content corresponding to the text annotation object; wherein the device further includes: a multi-turn dialogue content acquisition module, used to acquire multi-turn dialogue content, which includes multiple dialogue roles and the dialogue text content corresponding to the multiple dialogue roles; a dialogue role feature generation module, used to perform role identification processing on each of the dialogue roles to obtain the dialogue role feature corresponding to each of the dialogue roles; a dialogue role feature association module, used to associate each dialogue role feature with the dialogue text content corresponding to each of the dialogue roles; and a dialogue text content splicing module, used to splice together the associated dialogue text content and input it into the dialogue text model.
8. An electronic device, characterized in that it includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor, when executing the program stored in the memory, implements the method as described in any one of claims 1-6.
9. A computer-readable storage medium having instructions stored thereon that, when executed by one or more processors, cause the processors to perform the method as described in any one of claims 1-6.
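As a non-limiting illustration, the role tagging and splicing of claims 1 and 2 and the threshold-based decoding of claims 4 to 6 might be sketched in Python as follows. The role markers, the label names, the (label, start, end) layout assumed for the three-dimensional label matrix, the threshold value, and all function names are assumptions made for this sketch; the claims do not specify them, and the probabilities are filled in by hand rather than produced by a trained dialogue text model.

```python
from typing import Dict, List, Tuple

# Hypothetical special symbol identifiers, one per role attribute (assumption).
ROLE_TOKENS: Dict[str, str] = {"agent": "[AGENT]", "customer": "[CUSTOMER]"}


def tag_and_splice(turns: List[Tuple[str, str]]) -> str:
    """Prefix each turn with its role marker and splice the turns together."""
    return "".join(ROLE_TOKENS[role] + text for role, text in turns)


def empty_label_matrix(num_labels: int, length: int) -> List[List[List[float]]]:
    """Build a num_labels x length x length matrix of zero probabilities."""
    return [[[0.0] * length for _ in range(length)] for _ in range(num_labels)]


def decode_label_matrix(
    text: str,
    matrix: List[List[List[float]]],
    labels: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, int, int, str]]:
    """Return (label, start, end, content) for every cell at or above threshold.

    matrix[k][i][j] is read as the probability that text[i:j + 1] carries
    labels[k] (an assumed layout). Cells below the threshold are skipped.
    """
    spans = []
    for k, label in enumerate(labels):
        for i in range(len(text)):
            for j in range(i, len(text)):
                if matrix[k][i][j] >= threshold:
                    spans.append((label, i, j, text[i:j + 1]))
    return spans


# Usage: two turns, with a nested entity in the customer's turn.
turns = [("agent", "Where do you work?"), ("customer", "Beijing University")]
text = tag_and_splice(turns)
labels = ["ORG", "LOC"]
matrix = empty_label_matrix(len(labels), len(text))

start = text.index("Beijing")
matrix[0][start][start + len("Beijing University") - 1] = 0.9  # outer ORG span
matrix[1][start][start + len("Beijing") - 1] = 0.8  # nested LOC span
matrix[1][0][3] = 0.2  # below the threshold, so it is skipped

spans = decode_label_matrix(text, matrix, labels, threshold=0.5)
for label, i, j, content in spans:
    print(label, i, j, content)
```

Because each label has its own start-by-end plane in this layout, an outer span and a span nested inside it can both be reported for the same stretch of text, which is one way the nested labels of claim 4 could be recovered.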