Dialogue summary generation method, apparatus, device, and storage medium
By combining Levi graph encoding and the BART model, the problem that traditional conference summary generation methods cannot distinguish between different role perspectives is solved, resulting in personalized conference summaries and improved summary accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PING AN TECH (SHENZHEN) CO LTD
- Filing Date
- 2024-09-11
- Publication Date
- 2026-06-23
Abstract
Description
Technical Field
[0001] This application relates to the field of natural language processing technology, and in particular to a method, apparatus, device, and storage medium for summarizing dialogues.
Background Technology
[0002] Currently, with the widespread use of electronic devices to record audio or video to document events, the methods of recording meetings have become increasingly diverse. In addition to text recording, audio and video are being used more and more. For example, large amounts of content can be transcribed into shorthand for easy live text broadcasts or press releases; or weekly meetings can be recorded with audio and video for easy retrieval later, and so on.
[0003] There is an existing method for generating meeting summaries, which uses artificial intelligence models to generate dialogue summaries, such as meeting summaries, by summarizing the key parts of the overall meeting content.
[0004] However, the applicant found that traditional meeting summary generation methods do not differentiate between speakers in a meeting, even though this type of dialogue data involves different speakers, each with a different focus on the content of the dialogue. Especially in multi-person dialogues, a given speaker tends to focus more on the context of their own speech and less on the interactions between others.
[0005] This demonstrates that traditional meeting summary generation methods cannot generate meeting summaries that correspond to different roles.
Summary of the Invention
[0006] The purpose of this application is to provide a method, apparatus, device, and storage medium for generating summaries for dialogues, addressing the problem that traditional meeting summary generation methods cannot generate meeting summaries corresponding to different roles.
[0007] To address the aforementioned technical problems, this application provides a method for generating summaries for dialogues, employing the following technical solution:
[0008] Obtain the raw dialogue data to be processed;
[0009] The original dialogue data is annotated to obtain annotated dialogue data;
[0010] A Levi graph is constructed based on the labeled dialogue data, and graph encoding is performed on the labeled dialogue data based on the Levi graph to obtain graph-encoded data;
[0011] The graph-encoded data is subjected to a first feature weighting operation to obtain a first weighted feature vector;
[0012] The labeled dialogue data is text-encoded according to the encoder of the BART model to obtain text-encoded data;
[0013] The text-encoded data is subjected to a second feature weighting operation to obtain a second weighted feature vector;
[0014] The first weighted feature vector and the second weighted feature vector are concatenated according to the OTK fusion model to obtain the concatenated feature vector;
[0015] The concatenated feature vectors are decoded using the BART model's decoder to generate the conference summary text.
[0016] Furthermore, the step of performing a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector specifically includes the following steps:
[0017] The graph-encoded data is input into the RGCN layer for max pooling to obtain max pooled data;
[0018] The max-pooled data is input into the first dense layer to perform a first linear transformation operation to obtain first linearly transformed data;
[0019] The first feature weighting operation is performed on the first linearly transformed data according to the self-attention mechanism to obtain the first weighted feature vector.
[0020] Furthermore, the step of performing a second feature weighting operation on the text encoded data to obtain a second weighted feature vector specifically includes the following steps:
[0021] The text-encoded data is input into the second dense layer to perform a second linear transformation operation to obtain second linearly transformed data;
[0022] The second feature weighting operation is performed on the second linearly transformed data according to the self-attention mechanism to obtain the second weighted feature vector.
[0023] Furthermore, after the step of concatenating the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain the concatenated feature vector, the following steps are also included:
[0024] The concatenated feature vectors are input into two multilayer perceptron layers respectively to obtain two multilayer perceptron features;
[0025] The two multilayer perceptron features are respectively mapped onto general features of the feature vector dimension to obtain an enhanced concatenated feature vector;
[0026] The step of decoding the concatenated feature vectors using the BART model decoder to obtain the conference summary text specifically includes the following steps:
[0027] The enhanced concatenated feature vector is decoded using the BART model's decoder to obtain the conference summary text.
[0028] Furthermore, the step of decoding the concatenated feature vector using the BART model decoder to obtain the conference summary text specifically includes the following steps:
[0029] The concatenated feature vector is input into the third dense layer to perform a third linear transformation operation to obtain third linearly transformed data;
[0030] A third feature weighting operation is performed on the third linearly transformed data according to the self-attention mechanism to obtain a third weighted feature vector;
[0031] The third weighted feature vector is input into the decoder of the BART model for decoding to obtain the conference summary generation result.
[0032] To address the aforementioned technical problems, this application also provides a summary generation device for dialogue, employing the following technical solution:
[0033] The raw data acquisition module is used to acquire the raw dialogue data to be processed.
[0034] The annotation module is used to perform annotation operations on the original dialogue data to obtain annotated dialogue data;
[0035] The graph encoding module is used to construct a Levi graph based on the labeled dialogue data, and to perform graph encoding operations on the labeled dialogue data based on the Levi graph to obtain graph encoded data.
[0036] The first feature weighting module is used to perform a first feature weighting operation on the graph-coded data to obtain a first weighted feature vector.
[0037] The text encoding module is used to perform text encoding operations on the labeled dialogue data according to the encoder of the BART model to obtain text encoded data;
[0038] The second feature weighting module is used to perform a second feature weighting operation on the text encoded data to obtain a second weighted feature vector.
[0039] The feature concatenation module is used to concatenate the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain the concatenated feature vector;
[0040] The decoding module is used to decode the concatenated feature vectors according to the decoder of the BART model to obtain the conference summary generated text.
[0041] Furthermore, the first feature weighting module includes:
[0042] The max pooling submodule is used to input the graph-encoded data into the RGCN layer for max pooling operation to obtain max pooled data.
[0043] The first linear transformation submodule is used to input the max-pooled data into the first dense layer to perform a first linear transformation operation to obtain first linearly transformed data.
[0044] The first feature weighting submodule is used to perform a first feature weighting operation on the first linearly transformed data according to the self-attention mechanism to obtain the first weighted feature vector.
[0045] Furthermore, the second feature weighting module includes:
[0046] The second linear transformation submodule is used to input the text-encoded data into the second dense layer to perform a second linear transformation operation to obtain second linearly transformed data.
[0047] The second feature weighting submodule is used to perform a second feature weighting operation on the second linearly transformed data according to the self-attention mechanism to obtain a second weighted feature vector.
[0048] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:
[0049] It includes a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the summary generation method for dialogue as described above.
[0050] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:
[0051] The computer-readable storage medium stores computer-readable instructions, which, when executed by a processor, implement the steps of the summary generation method for dialogue as described above.
[0052] This application provides a method for summarizing dialogues, comprising: acquiring raw dialogue data to be processed; annotating the raw dialogue data to obtain annotated dialogue data; constructing a Levi graph based on the annotated dialogue data, and performing graph encoding on the annotated dialogue data based on the Levi graph to obtain graph-encoded data; performing a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector; performing text encoding on the annotated dialogue data according to the encoder of a BART model to obtain text-encoded data; performing a second feature weighting operation on the text-encoded data to obtain a second weighted feature vector; concatenating the first weighted feature vector and the second weighted feature vector according to an OTK fusion model to obtain a concatenated feature vector; and decoding the concatenated feature vector according to the decoder of a BART model to obtain a meeting summary generated text. Compared with the prior art, this application performs Levi graph encoding on speakers from different perspectives, using the perspective-encoded features as input to control the decoder to generate summaries of the same meeting content viewed by different roles.
Attached Figure Description
[0053] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0054] Figure 1 is an exemplary system architecture diagram to which this application can be applied;
[0055] Figure 2 is a flowchart illustrating the implementation of the summary generation method for dialogue provided in Embodiment 1 of this application;
[0056] Figure 3 is a schematic diagram of the structure of the summary generation device for dialogue provided in Embodiment 2 of this application;
[0057] Figure 4 is a schematic diagram of the structure of one embodiment of the computer device according to this application.
Detailed Implementation
[0058] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0059] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0060] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0061] As shown in Figure 1, system architecture 100 may include terminal device 101, network 102, and server 103. Terminal device 101 may be a laptop 1011, tablet 1012, or mobile phone 1013. Network 102 serves as the medium that provides a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
[0062] Users can use terminal device 101 to interact with server 103 via network 102 to receive or send messages, etc. Various communication client applications can be installed on terminal device 101, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0063] Terminal device 101 can be various electronic devices with a display screen and support web browsing. In addition to laptops 1011, tablets 1012, or mobile phones 1013, terminal device 101 can also be an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, and a desktop computer, etc.
[0064] Server 103 can be a server that provides various services, such as a backend server that provides support for the pages displayed on terminal device 101.
[0065] It should be noted that the summary generation method for dialogue provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the summary generation device for dialogue is generally located in the server / terminal device.
[0066] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0067] Embodiment 1
[0068] With continued reference to Figure 2, a flowchart of an embodiment of the dialogue summary generation method according to this application is shown. The dialogue summary generation method includes steps S201, S202, S203, S204, S205, S206, S207, and S208.
[0069] In step S201, the raw dialogue data to be processed is obtained.
[0070] In step S202, the original dialogue data is annotated to obtain annotated dialogue data.
[0071] In this embodiment, the meeting data needs to be manually annotated. The dialogue data in the meeting is manually annotated according to speaker roles. For example, speakers S and T can each annotate the same meeting content from their own perspectives. Simultaneously, non-participants can also annotate the meeting content with their own text summaries. Thus, a single meeting dialogue has two types of participants but three summaries: S, T, and the overall summary (Gen). S and T are the meeting text summaries annotated from the perspectives of different participants, while Gen is the summary from non-participants. The Gen summary annotations will be used as the objective overall summary annotation data. The format of the annotated training set is as follows:
[0072] S0: XXXXX, XX.
[0073] T0: XXXX.
[0074] T1: XXXXX.
[0075] S1: XXX.
[0076] S2: XXX.
[0077] T2: XXX, XXX.
[0078] ...
[0079] Tn: XXXXXXX.
[0080] Sn: XXXXXXX.
[0081] SA:XXXXXXXX. TA:XXXXXX. GenA:XXXXXXXX.
[0082] In this context, content S and T represent the speeches given by participants S and T, respectively. S0 is the first sentence spoken by participant S, T0 is the first sentence spoken by participant T, and n represents the nth sentence spoken by each participant. SA is a summary of the meeting from the perspective of participant S, TA is a summary of the meeting from the perspective of participant T, and GenA is a summary of the meeting after non-participants have read the content from S and T.
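The annotated format above can be read with a small parser. The following is a minimal sketch under stated assumptions: the `Utterance` type, the regular expressions, and the function name `parse_sample` are hypothetical and do not come from the application; it also assumes each turn sits on its own line and that the three summaries share one line, as in the layout shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record type; field names are illustrative only.
    speaker: str  # "S" or "T"
    index: int    # n-th sentence spoken by that participant
    text: str


# A dialogue turn looks like "S0: ..." or "T12: ...".
TURN = re.compile(r"^([ST])(\d+):\s*(.*)$")
# A summary looks like "SA:...", "TA:..." or "GenA:..."; several may share a line.
SUMMARY = re.compile(r"(SA|TA|GenA):\s*(.*?)(?=\s*(?:SA|TA|GenA):|$)")


def parse_sample(lines):
    """Split one annotated sample into dialogue turns and its summaries."""
    turns, summaries = [], {}
    for raw in lines:
        line = raw.strip()
        match = TURN.match(line)
        if match:
            turns.append(Utterance(match.group(1), int(match.group(2)), match.group(3)))
            continue
        for key, value in SUMMARY.findall(line):
            summaries[key] = value.strip()
    return turns, summaries


sample = [
    "S0: Hello, I need help.",
    "T0: Sure.",
    "SA:S asked for help. TA:T offered help. GenA:S asked and T helped.",
]
turns, summaries = parse_sample(sample)
```

Keeping the three summaries keyed by `SA`, `TA`, and `GenA` makes it straightforward to pick the training target that matches a chosen perspective.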
[0083] In this embodiment, the dialogue graph is defined as $G = (V, E)$, where $V$ represents the content of the current speaker and $E$ represents the transition of the current content. $E$ captures the relationship between two sentences and is strongly correlated with the role of the current speaker; the edge from the $i$-th node to the $j$-th node is denoted $e_{ij}$, where node $i$ is the $i$-th sentence in the meeting dialogue. For each $e_{ij}$, a label category $l_{ij}$ is manually assigned. This category takes one of three types: Intention (In), Action (Ac), and Context (context).
[0084]
[0085] Ac, In, and context are the three label types of $l_{ij}$: Ac represents action, In represents intention, and context represents context (that is, a contextual relationship exists between the two sentences). utt.i denotes the current content node $i$, T and S denote the speaker roles, and $P$ denotes the perspective of the current role.
[0086] Suppose the current speaker is T (utt.i ∈ T) and the role's perspective is T or Gen (P ∈ {'T', 'Gen'}). Then, when the current speech content node $i$ connects to other speech content $j$, the label $l_{ij}$ is marked as Ac.
[0087] Suppose the current speaker is S (utt.i ∈ S) and the role's perspective is S or Gen (P ∈ {'S', 'Gen'}). Then, when the current speech content node $i$ connects to other speech content $j$, the label $l_{ij}$ is marked as In.
[0088] Suppose the current speaker is S (utt.i ∈ S) and the role's perspective is T (P ∈ {'T'}). Then, when the current speech content node $i$ connects to other speech content $j$, the label $l_{ij}$ is marked as context.
[0089] Suppose the current speaker is T (utt.i ∈ T) and the role's perspective is S (P ∈ {'S'}). Then, when the current speech content node $i$ connects to other speech content $j$, the label $l_{ij}$ is marked as context.
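The four labeling cases can be condensed into one small function. This is a sketch only: the name `edge_label` is hypothetical, and it reads the rules as "a speaker viewed from their own perspective or from Gen gets Ac (for T) or In (for S); a speaker viewed from the other participant's perspective gets context".

```python
def edge_label(speaker: str, perspective: str) -> str:
    """Return the label for an edge leaving an utterance spoken by `speaker`.

    speaker:     "S" or "T", the role of the current utterance node
    perspective: "S", "T" or "Gen", the perspective being encoded
    """
    if speaker not in ("S", "T"):
        raise ValueError(f"unknown speaker role: {speaker!r}")
    if perspective not in ("S", "T", "Gen"):
        raise ValueError(f"unknown perspective: {perspective!r}")

    # Own perspective or the overall (Gen) perspective: the label depends on the role.
    if perspective == "Gen" or perspective == speaker:
        return "Ac" if speaker == "T" else "In"
    # The other participant's perspective: the utterance is only context.
    return "context"


# The same utterance by T is labeled differently under each perspective.
labels_for_t = {p: edge_label("T", p) for p in ("T", "S", "Gen")}
```

Because the label depends on the pair (speaker, perspective), the same dialogue yields a different labeled graph for each of the three perspectives.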
[0090] In step S203, a Levi graph is constructed based on the labeled dialogue data, and graph encoding is performed on the labeled dialogue data based on the Levi graph to obtain graph-encoded data.
[0091] In this embodiment, after the data is labeled, Levi graph modeling is performed on the meeting dialogue data. Since the labeling of the same sentence will be different depending on the perspective of the actor, different perspectives will have different learning objectives when encoding the Levi graph.
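The application does not spell out how the Levi graph is built, so the following sketch uses the standard Levi transformation as an assumption: every labeled edge of the dialogue graph becomes a node of its own, connected to its source and target utterances by unlabeled edges. The function name `build_levi_graph` and the tuple-based node encoding are hypothetical.

```python
def build_levi_graph(num_utterances, labeled_edges):
    """Turn a labeled dialogue graph into a Levi graph.

    labeled_edges: iterable of (i, j, label) triples, meaning an edge from
    utterance i to utterance j carrying the given label.
    Returns (nodes, edges) where each labeled edge has become a relation node.
    """
    nodes = [("utt", i) for i in range(num_utterances)]
    edges = []
    for i, j, label in labeled_edges:
        relation = ("rel", i, j, label)           # one new node per labeled edge
        nodes.append(relation)
        edges.append((("utt", i), relation))      # source utterance -> relation
        edges.append((relation, ("utt", j)))      # relation -> target utterance
    return nodes, edges


# Three utterances linked in sequence, labeled from one chosen perspective.
nodes, edges = build_levi_graph(3, [(0, 1, "In"), (1, 2, "Ac")])
```

Turning labels into nodes lets a graph encoder learn representations for the relations themselves, which is where the perspective-dependent information lives.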
[0092] In step S204, the graph-coded data is subjected to a first feature weighting operation to obtain a first weighted feature vector.
[0093] In step S205, the labeled dialogue data is text encoded according to the encoder of the BART model to obtain text encoded data.
[0094] In step S206, a second feature weighting operation is performed on the text encoded data to obtain a second weighted feature vector.
[0095] In step S207, the first weighted feature vector and the second weighted feature vector are concatenated according to the OTK fusion model to obtain the concatenated feature vector.
[0096] Although the two parts of the input are transformed by dense layers with different parameters, the two branches are symmetrical, which ensures the consistency of the transformation for the same input and also ensures that the two input vectors have the same dimension.
[0097] In this embodiment of the application, the two vectors of the same dimension, $H_D$ and $H_I$, are concatenated:
[0098] $Z_G = \mathrm{OTKE}([H'_D : H'_I])$
[0099] In step S208, the concatenated feature vectors are decoded using the BART model's decoder to obtain the conference summary text.
[0100] In this embodiment of the application, a method for generating summaries of dialogues is provided, comprising: acquiring raw dialogue data to be processed; annotating the raw dialogue data to obtain annotated dialogue data; constructing a Levi graph based on the annotated dialogue data, and performing graph encoding on the annotated dialogue data based on the Levi graph to obtain graph-encoded data; performing a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector; performing text encoding on the annotated dialogue data according to the encoder of a BART model to obtain text-encoded data; performing a second feature weighting operation on the text-encoded data to obtain a second weighted feature vector; concatenating the first weighted feature vector and the second weighted feature vector according to an OTK fusion model to obtain a concatenated feature vector; and decoding the concatenated feature vector according to the decoder of a BART model to obtain the meeting summary generated text. Compared with the prior art, this application performs Levi graph encoding on speakers from different perspectives, and uses the perspective-encoded features as input to control the decoder to generate summaries of the same meeting content viewed by different roles.
[0101] In some optional implementations of the embodiments of this application, step S204 specifically includes the following steps:
[0102] The graph-encoded data is input into the RGCN layer for max pooling to obtain max pooled data.
[0103] The max-pooled data is input into the first dense layer to perform the first linear transformation operation, resulting in first linearly transformed data;
[0104] The first feature weighting operation is performed on the first linearly transformed data according to the self-attention mechanism to obtain the first weighted feature vector.
[0105] In this embodiment, the same dialogue data is divided into two parts. One part is encoded as a Levi graph, then input into an RGCN layer and subjected to max pooling. The pooled feature vector is input into a dense layer, which consists of a fully connected layer, and then self-attention is performed. The resulting self-attention weights are multiplied by the feature vector itself to obtain the vector $H_D$, which serves as one of the inputs to the OTK fusion model. The self-attention calculation process is as follows:
[0106]
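The formula for this step is not reproduced in the text, so the sketch below is only one plausible reading of the described pipeline (max pooling, dense layer, attention weights multiplied by the vector itself). The softmax-over-entries weighting, the function names, and the toy weights are all assumptions made for illustration.

```python
import math


def max_pool(rows):
    """Column-wise maximum over a list of equal-length node feature rows."""
    return [max(column) for column in zip(*rows)]


def dense(x, weight, bias):
    """Fully connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted(h):
    """Assumed weighting: softmax over the vector's own entries, then an
    element-wise product of those weights with the vector itself."""
    weights = softmax(h)
    return [w * v for w, v in zip(weights, h)]


# Toy graph-encoder output: two nodes with two features each.
node_features = [[1.0, 5.0], [3.0, 2.0]]
pooled = max_pool(node_features)
# Identity weights keep the example easy to follow; real weights are learned.
h = dense(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
h_d = attention_weighted(h)
```

The text-encoder branch would follow the same dense-then-attention pattern without the pooling step, which keeps the two branches symmetrical and their outputs the same size.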
[0107] In some optional implementations of the embodiments of this application, step S206 specifically includes the following steps:
[0108] The text-encoded data is input into the second dense layer to perform a second linear transformation operation, resulting in second linearly transformed data;
[0109] The second feature weighting operation is performed on the second linearly transformed data according to the self-attention mechanism to obtain the second weighted feature vector.
[0110] In this embodiment, the other part of the dialogue input is fed into a BART model for text encoding. The resulting encoded vector is then input into another dense (fully connected) layer, where self-attention is also performed to obtain $H_I$. This vector is also one of the inputs to the OTK fusion model.
[0111] In some optional implementations of the embodiments of this application, after step S207, the following steps are further included:
[0112] The concatenated feature vectors are input into two multilayer perceptron layers respectively to obtain two multilayer perceptron features;
[0113] The two multilayer perceptron features are respectively mapped onto general features of the feature vector dimension to obtain the enhanced concatenated feature vector;
[0114] The above step S208 specifically includes the following steps:
[0115] The enhanced concatenated feature vector is decoded using the BART model's decoder to generate the conference summary text.
[0116] In this embodiment, the concatenated vector $Z_G$ is input to two MLP layers. Each MLP consists of a fully connected layer and an activation function. Finally, the features are mapped to half of the concatenated dimension, which is consistent with the dimension of $H_D$ or $H_I$.
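The dimension bookkeeping in this step can be illustrated as follows. This sketch makes several assumptions that are not stated in the application: plain list concatenation stands in for the OTK fusion model (which is not implemented here), the two MLP layers are applied one after the other, and ReLU is used as the activation. All names are hypothetical.

```python
def relu(x):
    """Element-wise ReLU activation (an assumed choice of activation)."""
    return [max(0.0, v) for v in x]


def mlp_layer(x, weight, bias):
    """One MLP layer: a fully connected layer followed by an activation."""
    linear = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]
    return relu(linear)


def fuse_and_project(h_d, h_i, layer1, layer2):
    """Concatenate two equal-length vectors, then apply two MLP layers.

    Plain concatenation is a stand-in for the OTK fusion model.
    layer1 maps 2d -> 2d and layer2 maps 2d -> d, so the output matches
    the dimension of h_d and h_i.
    """
    if len(h_d) != len(h_i):
        raise ValueError("both branch outputs must have the same dimension")
    z_g = list(h_d) + list(h_i)
    hidden = mlp_layer(z_g, *layer1)
    return mlp_layer(hidden, *layer2)


# Toy weights chosen by hand so the arithmetic is easy to check.
identity4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
layer1 = (identity4, [0.0] * 4)
layer2 = ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], [0.0, 0.0])
enhanced = fuse_and_project([1.0, -2.0], [3.0, 4.0], layer1, layer2)
```

Projecting back to the size of a single branch output means the decoder sees a vector of the same width regardless of whether fusion is used.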
[0117] In some optional implementations of the embodiments of this application, step S208 specifically includes the following steps:
[0118] The concatenated feature vector is input into the third dense layer to perform the third linear transformation operation, resulting in third linearly transformed data;
[0119] The third feature weighting operation is performed on the third linearly transformed data according to the self-attention mechanism to obtain the third weighted feature vector;
[0120] The third weighted feature vector is input into the decoder of the BART model for decoding to obtain the conference summary generation result.
[0121] In this embodiment, the output of the OTK fusion model is input into a dense layer consisting of a fully connected layer, and self-attention is then calculated. Finally, the output of the self-attention is input into the BART decoder to generate conference summaries from different perspectives. The perspective of the generated summary depends on which perspective's Levi graph encoding was used as input, which can be T, S, or Gen.
[0122] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0123] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0124] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware with computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).
[0125] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0126] Embodiment 2
[0127] With further reference to Figure 3, as an implementation of the method shown in Figure 2, this application provides an embodiment of a dialogue summary generation apparatus. This apparatus embodiment corresponds to the method embodiment shown in Figure 2, and the apparatus can be applied to various electronic devices.
[0128] As shown in Figure 3, the summary generation apparatus 200 for dialogue in this embodiment of the application includes:
[0129] Raw data acquisition module 210 is used to acquire raw dialogue data to be processed;
[0130] The annotation module 220 is used to perform annotation operations on the original dialogue data to obtain annotated dialogue data;
[0131] The graph encoding module 230 is used to construct a Levi graph based on the labeled dialogue data, and to perform graph encoding operations on the labeled dialogue data based on the Levi graph to obtain graph encoded data.
[0132] The first feature weighting module 240 is used to perform a first feature weighting operation on the graph-coded data to obtain a first weighted feature vector.
[0133] The text encoding module 250 is used to perform text encoding operations on the labeled dialogue data according to the encoder of the BART model to obtain text encoded data.
[0134] The second feature weighting module 260 is used to perform a second feature weighting operation on the text encoded data to obtain a second weighted feature vector.
[0135] The feature concatenation module 270 is used to concatenate the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain the concatenated feature vector;
[0136] The decoding module 280 is used to decode the concatenated feature vectors according to the decoder of the BART model to obtain the conference summary text.
[0137] In this embodiment, a conversation summarization device 200 is provided, comprising: a raw data acquisition module 210 for acquiring raw conversation data to be processed; an annotation module 220 for annotating the raw conversation data to obtain annotated conversation data; a graph encoding module 230 for constructing a Levi graph based on the annotated conversation data and performing graph encoding on the annotated conversation data based on the Levi graph to obtain graph encoded data; a first feature weighting module 240 for performing a first feature weighting operation on the graph encoded data to obtain a first weighted feature vector; a text encoding module 250 for performing text encoding on the annotated conversation data according to the encoder of a BART model to obtain text encoded data; a second feature weighting module 260 for performing a second feature weighting operation on the text encoded data to obtain a second weighted feature vector; a feature concatenation module 270 for concatenating the first weighted feature vector and the second weighted feature vector according to an OTK fusion model to obtain a concatenated feature vector; and a decoding module 280 for decoding the concatenated feature vector according to the decoder of a BART model to obtain the conversation summary generated text. Compared with existing technologies, this application uses Levi graph encoding on speakers from different perspectives and uses the perspective encoding features as input to control the decoder to generate summaries of the same meeting content viewed by different roles.
[0138] In some optional implementations of the embodiments of this application, the first feature weighting module includes:
[0139] The max pooling submodule is used to input the graph-encoded data into the RGCN layer for a max pooling operation to obtain max-pooled data.
[0140] The first linear transformation submodule is used to input the max-pooled data into the first dense layer to perform a first linear transformation operation and obtain first linear transformation data.
[0141] The first feature weighting submodule is used to perform the first feature weighting operation on the first linear transformation data according to the self-attention mechanism to obtain the first weighted feature vector.
[0142] In some optional implementations of the embodiments of this application, the second feature weighting module includes:
[0143] The second linear transformation submodule is used to input the text-encoded data into the second dense layer to perform a second linear transformation operation and obtain second linear transformation data.
[0144] The second feature weighting submodule is used to perform the second feature weighting operation on the second linear transformation data according to the self-attention mechanism to obtain the second weighted feature vector.
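The two feature weighting modules described above can be illustrated with a minimal pure-Python sketch. Several points are assumptions made only for illustration and are not specified in this application: the graph-encoded data is taken to be a list of node-vector groups that are max-pooled element-wise, the dense layer is a plain affine map y = Wx + b, and the self-attention is unparameterized scaled dot-product attention with queries, keys, and values all equal to the input vectors. All function names are hypothetical.

```python
import math


def max_pool(group):
    """Element-wise maximum over a group of equal-length node vectors."""
    return [max(column) for column in zip(*group)]


def dense(x, weight, bias):
    """Linear transformation y = W x + b applied to a single vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors."""
    scale = math.sqrt(len(vectors[0]))
    weighted = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in vectors]
        weights = softmax(scores)
        weighted.append([sum(w * v[i] for w, v in zip(weights, vectors))
                         for i in range(len(query))])
    return weighted


def first_feature_weighting(groups, weight, bias):
    """Graph branch: max pooling, then dense layer, then self-attention."""
    pooled = [max_pool(group) for group in groups]
    transformed = [dense(p, weight, bias) for p in pooled]
    return self_attention(transformed)


def second_feature_weighting(token_vectors, weight, bias):
    """Text branch: dense layer, then self-attention (no pooling step)."""
    transformed = [dense(t, weight, bias) for t in token_vectors]
    return self_attention(transformed)
```

The only structural difference between the two branches in this sketch is the pooling step, which mirrors the submodule lists above: the first module has a max pooling submodule, while the second module begins directly with the linear transformation.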
[0145] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 4, which is a basic structural block diagram of a computer device according to an embodiment of this application.
[0146] The computer device 300 includes a memory 310, a processor 320, and a network interface 330 that are interconnected via a system bus. It should be noted that only the computer device 300 with components 310-330 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0147] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0148] The memory 310 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 310 may be an internal storage unit of the computer device 300, such as the hard disk or memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc. Of course, the memory 310 may also include both internal storage units and external storage devices of the computer device 300. In this embodiment, the memory 310 is typically used to store the operating system and various application software installed on the computer device 300, such as computer-readable instructions for a dialogue summary generation method. Furthermore, the memory 310 can also be used to temporarily store various types of data that have been output or will be output.
[0149] In some embodiments, the processor 320 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 320 is typically used to control the overall operation of the computer device 300. In this embodiment, the processor 320 is used to execute computer-readable instructions stored in the memory 310 or to process data, for example, to execute computer-readable instructions applied to the dialogue summary generation method.
[0150] The network interface 330 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 300 and other electronic devices.
[0151] The computer device provided in this application performs Levi graph encoding on speakers from different perspectives, and uses the perspective encoding features as input to control the decoder to generate summaries of the same meeting content viewed by different roles.
[0152] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the summary generation method applied to a dialogue as described above.
[0153] The computer-readable storage medium provided in this application uses Levi graph encoding on speakers from different perspectives and uses the perspective encoding features as input to control the decoder to generate summaries of the same meeting content viewed by different roles.
[0154] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0155] Obviously, the embodiments described above are only some embodiments of this application, not all embodiments. The accompanying drawings show preferred embodiments of this application, but do not limit the patent scope of this application. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application can be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structure made using the content of this application's specification and drawings, and applied directly or indirectly to other related technical fields, likewise falls within the scope of patent protection of this application.
Claims
1. A method for summarizing dialogues, characterized in that it includes the following steps: obtaining the raw dialogue data to be processed; annotating the raw dialogue data according to speaker roles to obtain annotated dialogue data; constructing a Levi graph based on the annotated dialogue data, and performing graph encoding on the annotated dialogue data based on the Levi graph to obtain graph-encoded data; performing a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector; performing text encoding on the annotated dialogue data according to the encoder of the BART model to obtain text-encoded data; performing a second feature weighting operation on the text-encoded data to obtain a second weighted feature vector; concatenating the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain a concatenated feature vector; and decoding the concatenated feature vector according to the decoder of the BART model to obtain the conference summary text; wherein the step of decoding the concatenated feature vector according to the decoder of the BART model to obtain the conference summary text specifically includes the following steps: inputting the concatenated feature vector into a third dense layer to perform a third linear transformation operation to obtain third linear transformation data; performing a third feature weighting operation on the third linear transformation data according to the self-attention mechanism to obtain a third weighted feature vector; and inputting the third weighted feature vector into the decoder of the BART model for decoding to obtain the conference summary text.
2. The method for summarizing dialogues according to claim 1, characterized in that the step of performing a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector specifically includes the following steps: inputting the graph-encoded data into the RGCN layer for max pooling to obtain max-pooled data; inputting the max-pooled data into a first dense layer to perform a first linear transformation operation to obtain first linear transformation data; and performing the first feature weighting operation on the first linear transformation data according to the self-attention mechanism to obtain the first weighted feature vector.
3. The method for summarizing dialogues according to claim 1, characterized in that the step of performing a second feature weighting operation on the text-encoded data to obtain a second weighted feature vector specifically includes the following steps: inputting the text-encoded data into a second dense layer to perform a second linear transformation operation to obtain second linear transformation data; and performing the second feature weighting operation on the second linear transformation data according to the self-attention mechanism to obtain the second weighted feature vector.
4. The method for summarizing dialogues according to claim 1, characterized in that, after the step of concatenating the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain the concatenated feature vector, the method further includes the following steps: inputting the concatenated feature vector into two multilayer perceptron layers respectively to obtain two multilayer perceptron features; and mapping the two multilayer perceptron features respectively onto general features of the feature vector dimension to obtain an enhanced concatenated feature vector; wherein the step of decoding the concatenated feature vector according to the decoder of the BART model to obtain the conference summary text specifically includes the following step: decoding the enhanced concatenated feature vector according to the decoder of the BART model to obtain the conference summary text.
5. A summary generation device for dialogue, characterized in that it includes: a raw data acquisition module, used to acquire the raw dialogue data to be processed; an annotation module, used to annotate the raw dialogue data according to speaker roles to obtain annotated dialogue data; a graph encoding module, used to construct a Levi graph based on the annotated dialogue data, and to perform graph encoding on the annotated dialogue data based on the Levi graph to obtain graph-encoded data; a first feature weighting module, used to perform a first feature weighting operation on the graph-encoded data to obtain a first weighted feature vector; a text encoding module, used to perform text encoding on the annotated dialogue data according to the encoder of the BART model to obtain text-encoded data; a second feature weighting module, used to perform a second feature weighting operation on the text-encoded data to obtain a second weighted feature vector; a feature concatenation module, used to concatenate the first weighted feature vector and the second weighted feature vector according to the OTK fusion model to obtain a concatenated feature vector; and a decoding module, used to decode the concatenated feature vector according to the decoder of the BART model to obtain the conference summary text; wherein the decoding module is also used to input the concatenated feature vector into a third dense layer to perform a third linear transformation operation to obtain third linear transformation data; the decoding module is further used to perform a third feature weighting operation on the third linear transformation data according to the self-attention mechanism to obtain a third weighted feature vector; and the decoding module is also used to input the third weighted feature vector into the decoder of the BART model for decoding to obtain the conference summary text.
6. The summary generation device for dialogue according to claim 5, characterized in that the first feature weighting module includes: a max pooling submodule, used to input the graph-encoded data into the RGCN layer for a max pooling operation to obtain max-pooled data; a first linear transformation submodule, used to input the max-pooled data into a first dense layer to perform a first linear transformation operation to obtain first linear transformation data; and a first feature weighting submodule, used to perform the first feature weighting operation on the first linear transformation data according to the self-attention mechanism to obtain the first weighted feature vector.
7. The summary generation device for dialogue according to claim 5, characterized in that the second feature weighting module includes: a second linear transformation submodule, used to input the text-encoded data into a second dense layer to perform a second linear transformation operation to obtain second linear transformation data; and a second feature weighting submodule, used to perform the second feature weighting operation on the second linear transformation data according to the self-attention mechanism to obtain the second weighted feature vector.
8. A computer device, comprising a memory and a processor, characterized in that the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the method for summarizing dialogues according to any one of claims 1 to 4.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions that, when executed by a processor, implement the steps of the method for summarizing dialogues according to any one of claims 1 to 4.