Summary extraction method and apparatus, electronic device, and storage medium
By training a key information extraction model and a summary generation model on a large language model, the problems of noise and errors in documents converted from voice dialogue are addressed, and accurate summaries of dialogue text content are generated.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SF TECH CO LTD
- Filing Date
- 2024-12-30
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, the conversion of voice dialogue into documents is noisy, and key information is easily segmented or transcribed incorrectly, making it difficult to accurately extract key information from the document and affecting the accuracy of summary generation.
A key information extraction model is trained based on a large language model to extract key information from the dialogue text, and then combined with a summary generation model to generate an accurate summary.
It improves the accuracy of extracting summaries from dialogue text, generating more accurate summaries and ensuring the completeness and accuracy of key information.
Description
Technical Field
[0001] This application relates to the field of computer technology, specifically to a summary extraction method, apparatus, electronic device, and storage medium.
Background Technology
[0002] In existing technologies, key information is usually extracted from documents by using features such as word frequency, phrase frequency, or short phrase frequency. Then, the extracted key information is combined to obtain a summary. Alternatively, a sequence-to-sequence (seq2seq) model in deep learning can be used to directly generate a corresponding summary based on the document content.
[0003] However, documents may contain significant noise, especially those converted from voice conversations. Due to customer expression habits, much key information (such as emails, phone numbers, and addresses) may be split across several utterances, or automatic speech recognition (ASR) may transcribe key information incorrectly (for example, by confusing homophones), making it difficult to extract key information from the document. It is therefore difficult to obtain an accurate summary, whether by combining extracted keywords or by working directly from the document.
Summary of the Invention
[0004] Based on the defects and shortcomings of the prior art, this application proposes a summary extraction method, apparatus, electronic device and storage medium, which can call a key information extraction model trained on a large language model for a key information extraction task based on the dialogue text content, accurately extract the key information in the dialogue text content, and thus obtain an accurate summary of the dialogue text content based on the dialogue text content and the key information.
[0005] According to a first aspect of the embodiments of this application, a summary extraction method is provided, comprising:
[0006] Obtain the dialogue text content to be extracted;
[0007] Based on the dialogue text content, a key information extraction model is invoked to extract key information from the dialogue text content. The key information extraction model is obtained by training a large language model for the key information extraction task.
[0008] A summary of the dialogue text content is obtained based on the dialogue text content and the key information;
[0009] Output a summary of the dialogue text.
[0010] According to a second aspect of the embodiments of this application, a summary extraction apparatus is provided, comprising:
[0011] The acquisition module is used to acquire the dialogue text content for which the summary is to be extracted;
[0012] The extraction module is used to call the key information extraction model based on the dialogue text content to extract key information from the dialogue text content. The key information extraction model is obtained by training a large language model for the key information extraction task.
[0013] A generation module is used to obtain a summary of the dialogue text content based on the dialogue text content and the key information;
[0014] The output module is used to output a summary of the dialogue text content.
[0015] According to a third aspect of the embodiments of this application, an electronic device is provided, including a memory and a processor;
[0016] The memory is connected to the processor and is used to store programs;
[0017] The processor is used to implement the summary extraction method as described in the first aspect by running a program in the memory.
[0018] According to a fourth aspect of the embodiments of this application, a storage medium is provided, on which a computer program is stored, and when the computer program is run by a processor, it implements the summary extraction method as described in the first aspect.
[0019] In the aforementioned summary extraction method, apparatus, electronic device, and storage medium, after receiving dialogue text content, a key information extraction model is invoked based on the dialogue text content to extract key information from the dialogue text content. This key information extraction model is trained on a large language model specifically for the key information extraction task. Then, based on the acquired dialogue text content and its key information, a summary of the dialogue text content is extracted and output. Thus, since the key information extraction model is trained on a large language model specifically for the key information extraction task, the reasoning ability of the large language model can be utilized to extract key information, resulting in accurate key information. The summary extraction of the dialogue text content is performed based on the dialogue text content itself and the key information. Due to the guidance of accurate key information, a more accurate summary of the dialogue text content can be extracted.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0021] Figure 1 is a flowchart illustrating a summary extraction method according to an embodiment of this application;
[0022] Figure 2 is a schematic diagram illustrating a summary extraction process according to an embodiment of this application;
[0023] Figure 3 is a schematic diagram of a summary extraction apparatus provided in an embodiment of this application;
[0024] Figure 4 is a schematic diagram of the structure of an electronic device proposed in an embodiment of this application.
Detailed Implementation
[0025] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0026] Overview
[0027] As described in the background section, in the prior art, key information is usually extracted from a document by using features such as word frequency, phrase frequency, or short phrase frequency. Then, the extracted key information is combined to obtain a summary, or a corresponding summary is directly generated based on the document content using seq2seq in deep learning.
[0028] The seq2seq neural network architecture transforms one sequence (such as a sentence or an audio clip) into another sequence (such as a translation into another language, a text summary, or the text output of speech recognition). This architecture is widely used in natural language processing and other sequence data processing tasks.
[0029] However, documents may contain significant noise, especially those converted from voice dialogue. Due to customer expression habits, much key information (such as emails, phone numbers, and addresses) may be split across several utterances, or ASR (Automatic Speech Recognition) may mistranscribe key information (for example, by confusing homophones), making it difficult to extract key information from the document. An accurate summary is therefore unlikely, whether it is built from extracted keywords or generated directly from the document. ASR refers to the conversion of human speech signals into machine-readable text or commands using computer programs and algorithms. It is widely used in mobile phones, smart homes, in-car navigation, automated customer service, and many other fields, and improves the convenience and efficiency of human-computer interaction.
[0030] In this embodiment, during the process of extracting a summary from the dialogue text, the key information extracted by the key information extraction model trained on a large language model, along with the dialogue text itself, can yield an accurate summary of the dialogue text, thus improving the accuracy of the summary extraction. Specifically, the key information extraction model trained on a large language model for the key information extraction task extracts key information from the obtained dialogue text to be summarized. Then, based on the key information, it obtains a summary of the dialogue text and outputs it. Thus, since the key information extraction model is trained on a large language model for the key information extraction task, it can utilize the powerful reasoning ability of the large language model to extract accurate key information. Because this key information can, to a certain extent, represent the general content of the dialogue text, combining the key information with the dialogue text content allows for the accurate extraction of a summary of the dialogue text.
[0031] Based on the above concept, this specification provides a summary extraction method, which will be described exemplarily below with reference to the accompanying drawings.
[0032] Referring to Figure 1, in one exemplary embodiment, a summary extraction method is provided, applicable to any electronic device. As shown in Figure 1, the summary extraction method includes steps S101-S104:
[0033] S101: Obtain the dialogue text content for which the summary is to be extracted.
[0034] The dialogue in the text can refer to customer service conversations, i.e., conversations between customer service representatives and customers. The customer service conversations are processed to obtain the corresponding text content. Of course, depending on the specific needs, the dialogue in the text can also refer to other conversations besides customer service conversations.
[0035] Customer service conversations can be conducted via voice or text.
[0036] Specifically, the process involves first obtaining the audio dialogue from which the summary is to be extracted, and then performing ASR transcription on the audio dialogue to obtain the corresponding dialogue text content.
[0037] For example, after obtaining the customer service voice dialogue to be extracted, the customer service voice dialogue is converted into text, and the text is processed to obtain the corresponding text dialogue content.
[0038] Alternatively, specifically, after directly obtaining the customer service text dialogue to be extracted, the text of the customer service text dialogue is processed to obtain the corresponding text dialogue content.
[0039] More specifically, the text processing here is text cleaning, which removes special symbols, duplicate text, and other unwanted content to obtain relatively clean text dialogue content.
[0040] Of course, after transcribing the voice dialogue, the transcribed text can also be processed, that is, cleaned, to remove special symbols, duplicate text, and other content to obtain clean dialogue text content.
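The cleaning step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `clean_dialogue_text`, the set of characters treated as "special symbols", and the choice to drop only consecutive duplicate lines are all assumptions made for the example.

```python
import re


def clean_dialogue_text(text: str) -> str:
    """Remove special symbols and consecutive duplicate lines (assumed rules)."""
    cleaned_lines = []
    previous = None
    for raw_line in text.splitlines():
        # Keep word characters, whitespace, and punctuation that commonly
        # appears in contact details; drop everything else.
        line = re.sub(r"[^\w\s.,:;?!@'\-+/]", "", raw_line)
        # Collapse runs of whitespace left behind by the removal.
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        # Drop a line that exactly repeats the previous kept line.
        if line == previous:
            continue
        cleaned_lines.append(line)
        previous = line
    return "\n".join(cleaned_lines)
```

A real system would likely tune the allowed character set to the language of the transcript and to the key information types it expects.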
[0041] S102: Based on the dialogue text content, a key information extraction model is called to extract key information from the dialogue text content.
[0042] The key information extracted may vary depending on the actual working conditions.
[0043] For example, key information may include an address, telephone number, email address, customer requirements, and so on.
[0044] For example, for the dialogue text "Mr. Ji's train will arrive at City A train station at 8:00 AM tomorrow. Please remember to pick him up then", the key information extraction model is invoked to extract the key information from the dialogue text, including the time "8:00 AM tomorrow", the location "City A train station", and the event "picking up Mr. Ji".
[0045] Furthermore, the key information extraction model is obtained by training a large language model specifically for the key information extraction task. The key information extraction model is used to extract key information from the content of dialogue text.
[0046] The large language model is either a general-purpose large language model or a specialized large language model for the logistics customer service field.
[0047] Based on the received dialogue text content, the key information in the dialogue text content is extracted by calling the key information extraction model from local storage or by calling the key information extraction model generated in real time.
[0048] S103: Obtain a summary of the dialogue text content based on the dialogue text content and key information.
[0049] The extracted key information is combined to determine a summary of the dialogue text.
[0050] For example, the dialogue text content is "The product a you purchased on November 18, 2024 will be delivered to its destination between 9:00 AM and 10:00 AM on November 20, 2024". In step S102, the key information is extracted through the key information extraction model, including product information "product a purchased on November 18, 2024" and delivery time "9:00 AM to 10:00 AM on November 20, 2024". By directly combining the key information, the summary of the dialogue text content can be obtained as "Product a purchased on November 18, 2024 will be delivered between 9:00 AM and 10:00 AM on November 20, 2024".
[0051] Alternatively, based on the extracted key information and the dialogue text content, a summary of the dialogue text content can be generated.
[0052] For example, taking the dialogue text content and its key information given in the previous example, the extracted key information, namely product information and delivery time, along with the dialogue text content, are input into the summary generation model. The summary generation model extracts a summary of the dialogue text content based on the key information, resulting in a summary of the dialogue text content, namely, "Product a purchased on November 18, 2024 will be delivered between 9:00 and 10:00 AM on November 20, 2024".
[0053] S104: Output a summary of the dialogue text.
[0054] In this embodiment, after receiving the dialogue text content, a key information extraction model is invoked based on the dialogue text content to extract key information from the dialogue text content. This key information extraction model is trained on a large language model specifically for key information extraction tasks. Then, a summary of the dialogue text content is obtained based on the key information, and this summary is output. Thus, since the key information extraction model is trained on a large language model specifically for key information extraction tasks, the reasoning ability of the large language model can be utilized to extract key information, resulting in more accurate key information. Based on this key information, a more accurate summary of the dialogue text can be obtained.
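Steps S101-S104 can be sketched as a small pipeline. The two models are passed in as plain callables because the application does not specify a model interface; `summarize_dialogue`, `fake_extractor`, and `fake_generator` are hypothetical names, and the stand-in functions only imitate what a fine-tuned model might return for the train station example.

```python
from typing import Callable, Dict

KeyInfo = Dict[str, str]


def summarize_dialogue(
    dialogue_text: str,
    extract_key_info: Callable[[str], KeyInfo],
    generate_summary: Callable[[str, KeyInfo], str],
) -> str:
    """Run S102-S104 on dialogue text that was obtained in S101."""
    # S102: call the key information extraction model.
    key_info = extract_key_info(dialogue_text)
    # S103: obtain the summary from the dialogue text and the key information.
    summary = generate_summary(dialogue_text, key_info)
    # S104: output the summary.
    return summary


# Stand-ins for the two models (assumptions for illustration only).
def fake_extractor(dialogue_text: str) -> KeyInfo:
    return {"time": "8:00 AM tomorrow", "location": "City A train station"}


def fake_generator(dialogue_text: str, key_info: KeyInfo) -> str:
    return "Pick-up at {location} at {time}.".format(**key_info)
```

Passing the models in as arguments keeps the sketch independent of any particular inference library.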
[0055] In some embodiments, when a key information extraction model is invoked based on the dialogue text content to extract key information from the text dialogue content, the key information is extracted by the key information extraction model based on the dialogue text content and the first instruction information to obtain the key information.
[0056] The first instruction information is used to instruct the key information extraction model to perform the key information extraction task, that is, to instruct the key information extraction model to extract key information from the dialogue text content.
[0057] Specifically, the dialogue text content is first combined with the first instruction information to generate a first prompt message, which prompts the key information extraction model to perform a key information extraction task on the dialogue text content, that is, to extract the key information in the dialogue text content. Then, the first prompt message and the dialogue text content are input into the key information extraction model to obtain the key information in the dialogue text content.
[0058] The first instruction information is the instruction in the example given in the previous embodiment. The dialogue text content is filled into the corresponding placeholder of the instruction to obtain the first prompt information.
[0059] In this embodiment, the dialogue text content and the first instruction information that instructs the key information extraction model to perform the key information extraction task are combined to generate a first prompt information. Then, the first prompt information and the dialogue text content are input into the key information extraction model to obtain the key information in the dialogue text content. Thus, without manual intervention, the key information in the dialogue text content can be automatically extracted through the key information extraction model.
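Filling the dialogue text into the placeholder of the first instruction information might look like the sketch below. The wording of `FIRST_INSTRUCTION`, the `{dialogue}` placeholder name, and the function name `build_first_prompt` are assumptions; the application only states that an instruction with a placeholder is used.

```python
# Hypothetical instruction template with one placeholder for the dialogue.
FIRST_INSTRUCTION = (
    "Extract the key information (for example address, phone number, "
    "email address, customer requirements) from the following dialogue.\n"
    "Dialogue:\n{dialogue}"
)


def build_first_prompt(dialogue_text: str, instruction: str = FIRST_INSTRUCTION) -> str:
    """Fill the dialogue text into the instruction's placeholder."""
    # str.format does not re-scan substituted values, so braces that occur
    # inside the dialogue text are inserted literally.
    return instruction.format(dialogue=dialogue_text)
```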
[0060] In some embodiments, when obtaining a summary of the dialogue text content based on key information, a summary generation model can also be invoked based on the dialogue text content and key information to generate a summary of the dialogue text content, thereby achieving automatic summary generation.
[0061] The summary generation model is obtained by training a large language model for the summary generation task.
[0062] Similarly, this large language model can be a general-purpose large language model or a specialized large language model for the logistics customer service field.
[0063] In this embodiment, a summary generation model is invoked based on the dialogue text content and extracted key information to generate a summary of the dialogue text content. Since the summary generation model is trained on a large language model specifically for the summary generation task, it leverages the reasoning capabilities of the large language model to generate a more accurate summary of the dialogue text content.
[0064] In some embodiments, when a summary generation model is invoked based on the dialogue text content and key information to generate a summary of the text dialogue content, a summary is generated by using the summary generation model based on the dialogue text content, key information, and second instruction information to obtain a summary of the dialogue text content.
[0065] The second instruction information is used to instruct the summary generation model to perform the summary generation task, that is, to instruct the summary generation model to extract a summary of the dialogue text content.
[0066] Specifically, the dialogue text content is first combined with the second instruction information to generate a second prompt message, which prompts the summary generation model to perform a summary generation task on the dialogue text content and the extracted key information, that is, to generate a summary of the dialogue text content. Then, the second prompt message and the dialogue text content are input into the summary generation model to obtain a summary of the dialogue text content.
[0067] The second instruction information is the instruction in the example given in the previous embodiment. The dialogue text content and the extracted key information are filled into the corresponding placeholders of the instruction to obtain the second prompt information.
[0068] In this embodiment, the dialogue text content, the extracted key information, and the second instruction information used to instruct the summary generation model to perform the summary generation task are combined to generate a second prompt information. Then, the second prompt information and the dialogue text content are input into the summary generation model to obtain a summary of the dialogue text content. Thus, without manual intervention, the summary generation model can automatically generate a summary of the dialogue text content.
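The second prompt information can be assembled in the same way, with two placeholders instead of one. As before, the template text, the placeholder names, and `build_second_prompt` are assumptions for illustration, and serializing the key information as JSON is one possible choice rather than something the application prescribes.

```python
import json
from typing import Dict

# Hypothetical instruction template with placeholders for both inputs.
SECOND_INSTRUCTION = (
    "Write a concise summary of the dialogue. Make sure every item of key "
    "information is reflected accurately.\n"
    "Key information:\n{key_info}\n"
    "Dialogue:\n{dialogue}"
)


def build_second_prompt(
    dialogue_text: str,
    key_info: Dict[str, str],
    instruction: str = SECOND_INSTRUCTION,
) -> str:
    """Fill the dialogue text and the extracted key information into the template."""
    # ensure_ascii=False keeps non-ASCII text (such as Chinese addresses) readable.
    key_info_text = json.dumps(key_info, ensure_ascii=False, indent=2)
    return instruction.format(key_info=key_info_text, dialogue=dialogue_text)
```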
[0069] Since some information may not be extracted during the extraction of key information, resulting in inaccurate summaries of the final generated dialogue text, in order to ensure the accuracy and completeness of the generated summaries, in some embodiments, after generating the summary of the dialogue text content, it is determined whether there is any missing information in the summary, and based on whether there is any missing information in the summary, corresponding operations are performed to regenerate the summary or directly output the summary.
[0070] A judgment model is used to determine whether there is missing information in the summary. This judgment model can be trained on any large language model.
[0071] If yes, meaning there is missing information in the summary, then regenerate the summary or generate a new summary until there is no missing information in the summary; if no, meaning there is no missing information in the summary, then perform the operation of outputting a summary of the dialogue text content.
[0072] For example, taking the generated summary of the dialogue text content as 'a', if there is missing information in 'a', the summary is regenerated to obtain the regenerated summary 'b'. If there is still missing information in 'b', the summary is regenerated again. If there is no longer any missing information in 'b', the operation of outputting 'b' is executed. If there is no missing information in 'a', the operation of outputting 'a' is executed.
[0073] When regenerating the summary, the key information extraction model and the summary generation model are invoked based on the already generated summary and the dialogue text content to regenerate the summary until there is no missing information in the generated summary.
[0074] For example, taking a summary of the generated dialogue text as 'a', if there is missing information in 'a', the key information extraction model and the summary generation model are called based on 'a' and the dialogue text content to regenerate the summary, and the regenerated summary is 'b'.
[0075] Specifically, if there is missing information in the summary, the key information extraction model is invoked based on the summary and dialogue text content to extract the missing information. Then, the summary generation model is invoked based on the missing information, dialogue text content, and summary to generate a new summary, until there is no missing information in the new summary.
[0076] Missing information refers to information that is present in the dialogue text content but absent from the summary.
[0077] For example, the dialogue text is "The goods you purchased on November 18, 2024 will be delivered to their destination between 9:00 AM and 10:00 AM on November 20, 2024". The extracted summary is "The goods you purchased on November 18, 2024 will be delivered on the 20th". This summary is missing the specific delivery time. The missing information is the specific delivery time. Based on this summary, the dialogue text content, and the description of the missing information, namely the "specific delivery time", the summary generation model is called to generate a new summary, namely "The goods you purchased on November 18, 2024 will be delivered between 9:00 AM and 10:00 AM on November 20, 2024".
[0078] Specifically, the summary, dialogue text, and first instruction information are combined to generate a third prompt message. This prompts the key information extraction model to perform the key information extraction task again on the dialogue text, that is, to extract new key information or missing information from the dialogue text in addition to the key information already extracted. Then, the third prompt message and the dialogue text are input into the key information extraction model to extract the missing information from the dialogue text.
[0079] Next, the missing information, dialogue text content, and summary are combined with the second instruction information to generate a fourth prompt. This prompts the summary generation model to perform the summary generation task again on the dialogue text content, that is, to generate a new summary based on the extracted missing information combined with the dialogue text content and the summary. Then, the fourth prompt and the dialogue text content are input into the summary generation model to generate a new, more complete, and accurate summary of the dialogue text content.
[0080] For a description of the first and second instruction information, please refer to the above content, which will not be repeated here.
[0081] Specifically, after generating a new summary, it is determined whether there is any missing information in the new summary. If so, the step of generating a new summary is repeated; otherwise, the new summary is output.
[0082] In this embodiment, after generating a summary of the dialogue text content, it is determined whether there is any missing information in the summary. If so, the key information extraction model and the summary generation model are invoked based on the summary and the dialogue text content to regenerate the summary of the dialogue text content. If not, the operation of outputting the dialogue text content summary is performed. In this way, the completeness of the information represented by the generated summary of the dialogue text content can be guaranteed as much as possible. When the information represented by the summary of the dialogue text content is incomplete, the summary is regenerated to ensure the completeness and accuracy of the obtained summary.
[0083] Furthermore, based on the summary and dialogue text content, a key information extraction model is invoked to extract any missing information from the summary. This missing information, along with the dialogue text content and the summary, is then used to invoke a summary generation model to generate a new summary. The new summary is then checked for missing information; if so, the process of generating a new summary is repeated until no missing information remains. This process of critical information evaluation determines whether any key information is missing. If so, the missing key information is extracted again based on the current summary and dialogue text content. Once the missing key information is obtained, the summary generation model is invoked again based on the dialogue content, the current summary, and the missing key information to generate a new summary. This iterative process of self-reflection continues until a complete summary is generated. Error correction is then performed on the generated summary to ensure its completeness and accuracy.
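The check-and-regenerate loop described above can be sketched as follows. The judgment model, the key information extraction model, and the summary generation model are represented by callables with assumed signatures, and `refine_summary` is a hypothetical name. The `max_rounds` cap is an addition for safety in this sketch; the application itself describes repeating until no information is missing.

```python
from typing import Callable, Dict


def refine_summary(
    dialogue_text: str,
    summary: str,
    has_missing_info: Callable[[str, str], bool],
    extract_missing_info: Callable[[str, str], Dict[str, str]],
    regenerate_summary: Callable[[str, str, Dict[str, str]], str],
    max_rounds: int = 3,
) -> str:
    """Regenerate the summary until the judge reports nothing missing."""
    for _ in range(max_rounds):
        # Judgment step: is anything from the dialogue missing in the summary?
        if not has_missing_info(dialogue_text, summary):
            break
        # Extract the missing information from the dialogue and current summary.
        missing = extract_missing_info(dialogue_text, summary)
        # Generate a new summary from dialogue, current summary, and missing info.
        summary = regenerate_summary(dialogue_text, summary, missing)
    return summary
```

If the cap is reached, this sketch returns the latest summary as is; a production system might instead flag it for review.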
[0084] When key information is extracted again from the whole dialogue text, some key information may still be missed. To extract the key information missing from the summary as quickly as possible, in some embodiments the extraction targets the specific information items that are missing.
[0085] First, identify the missing information items in the summary. Then, based on the missing information items, the (current) summary, and the dialogue text content, call the key information extraction model to extract the content corresponding to the missing information items, obtain the missing information in the summary, and thus quickly extract the missing key information in the summary, thereby quickly regenerating the summary of the dialogue text content.
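One simple way to identify missing information items is a rule-based comparison between the extracted key information and the summary. The sketch below is an assumption for illustration only: the application does not state how the missing items are identified, and `find_missing_items` with its case-insensitive substring rule is a hypothetical stand-in for whatever judgment model is actually used.

```python
from typing import Dict, List


def find_missing_items(key_info: Dict[str, str], summary: str) -> List[str]:
    """Return the names of key information items whose value is absent from the summary."""
    lowered_summary = summary.lower()
    return [
        name
        for name, value in key_info.items()
        # An item counts as missing when its value does not appear verbatim.
        if str(value).lower() not in lowered_summary
    ]
```

The returned item names could then be passed, together with the current summary and the dialogue text, to the key information extraction model so that it only looks for those items. A verbatim match is a crude test and would miss paraphrases, which is one reason a model-based judge may be preferable.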
[0086] To ensure the accuracy of the final output summary, in some embodiments, after generating the summary of the dialogue text content, or before outputting it, a security audit is performed on the summary to determine whether it contains any illegal information. If so, the summary is modified.
[0087] Optionally, the system can determine whether there is hallucination in the summary, that is, whether there is information in the summary that is inconsistent with the content of the dialogue text or inconsistent with the facts, and perform corresponding operations based on whether there is hallucination in the summary.
[0088] Whether the summary contains hallucinations can be determined by a judgment model, which can be any large language model or a model based on other judgment logic.
[0089] If yes, meaning the summary contains hallucinations, the summary is regenerated; if no, meaning the summary contains no hallucinations, the operation of outputting the summary of the dialogue text content is performed.
[0090] If hallucinations are present in the summary, when regenerating the summary, the summary generation model is invoked based on the dialogue text content and the current summary to regenerate the summary until the hallucinations are no longer present in the summary.
[0091] Specifically, the dialogue text content, the current summary, and the second instruction information are combined to obtain the fifth prompt information. This prompt information is used to instruct the summary generation model to regenerate the summary. The fifth prompt information and the dialogue text content are then input into the summary generation model to generate a new summary.
[0092] After each new summary is generated, it is determined whether the new summary contains hallucinations. If not, the new summary is taken as the final summary; if so, the summary is regenerated based on the new summary and the dialogue text content.
[0093] In this embodiment, after generating a summary of the dialogue text content, it is determined whether the summary contains hallucinations. If so, the summary generation model is invoked based on the summary and the dialogue text content to regenerate the summary until no hallucinations are found in the summary. If not, the operation of outputting a summary of the dialogue text content is performed. This ensures that the final generated summary is complete and accurate.
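The hallucination check and regeneration loop can be sketched as below. The judge here is a deliberately crude rule, flagging numbers in the summary that never occur in the dialogue, chosen only so the example is self-contained; it is an assumption, not the judgment model of the application. The names `unsupported_numbers` and `regenerate_until_grounded` and the `max_rounds` cap are likewise hypothetical.

```python
import re
from typing import Callable, List

# Matches integers and clock times such as "20" or "9:00".
_NUMBER = re.compile(r"\d+(?::\d+)?")


def unsupported_numbers(dialogue_text: str, summary: str) -> List[str]:
    """Return numbers that appear in the summary but not in the dialogue."""
    dialogue_numbers = set(_NUMBER.findall(dialogue_text))
    return [n for n in _NUMBER.findall(summary) if n not in dialogue_numbers]


def regenerate_until_grounded(
    dialogue_text: str,
    summary: str,
    regenerate_summary: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Regenerate the summary while the rule-based judge reports a hallucination."""
    for _ in range(max_rounds):
        if not unsupported_numbers(dialogue_text, summary):
            break
        # The summary generation model sees the dialogue and the current summary.
        summary = regenerate_summary(dialogue_text, summary)
    return summary
```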
[0094] In some embodiments, the summary of the dialogue text content is post-processed before being output. This post-processing includes removing invalid expressions, such as repetitions.
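Removing repetitions during post-processing could be done at the sentence level, as in the following sketch. Splitting on sentence-ending punctuation and comparing sentences case-insensitively are assumptions for this example; `remove_repeated_sentences` is a hypothetical helper, and text in languages without whitespace after punctuation would need a different splitter.

```python
import re


def remove_repeated_sentences(summary: str) -> str:
    """Keep the first occurrence of each sentence and drop later repeats."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence.strip())
    return " ".join(kept)
```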
[0095] In some embodiments, the two parts of determining whether there is missing information and / or whether there is hallucination in the summary can be constructed by any judgment model, such as a single large language model, a natural language understanding (NLU) deep learning model such as a bidirectional encoder representations from transformers (BERT) model, a rule system, or an intelligent agent.
[0096] Since the key information of the dialogue text content is obtained by calling the key information extraction model, in some embodiments, before calling the key information extraction model based on the dialogue text content to extract the key information in the dialogue text content, it is necessary to obtain the key information extraction model first.
[0097] For the key information extraction task, the large language model is trained in combination with the customer service dialogue to obtain the required key information extraction model.
[0098] The customer service dialogue is annotated to obtain the first training sample. Then, based on the training data set composed of the first training sample, the large language model is fine-tuned to obtain the key information extraction model.
[0099] The first training sample includes the first prompt word, the customer service dialogue text, and the dialogue key information. The first prompt word prompts the large language model to extract keywords, the dialogue key information is the key information in the customer service dialogue, and the customer service dialogue text is the text obtained by converting the customer service dialogue into text.
[0100] Exemplarily, training the key information extraction model includes Step 1 and Step 2:
[0101] Step 1, obtain training data:
[0102] The customer service dialogues from the hotline are annotated to obtain the training data. Taking training data whose key information includes the phone number 178… and the address Shenzhen… as an example, the format of the training data is, for example:
[0103] {"instruction": "You are an expert in extracting key information in the hotline field and are very good at extracting information from dialogue content. Although much information is spread across multiple sentences, you can still piece it together appropriately.",
[0104] "input": "Customer: ***.\nCS: ****",
[0105] "output": {"phone": "178....", "address": "Shenzhen...."}}
[0106] Here, instruction is the prompt, that is, the first prompt word mentioned above, which tells the model what task it is given; input is the dialogue content after ASR transcription; and output is the key information extracted from the dialogue content.
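A training sample with these three fields could be serialized as one JSON line per sample, as sketched below. The helper name `build_extraction_sample` and the one-line-per-sample storage format are assumptions; the source specifies only the instruction, input, and output fields.

```python
import json
from typing import Dict


def build_extraction_sample(instruction: str, dialogue_text: str, key_info: Dict[str, str]) -> str:
    """Serialize one annotated dialogue as a JSON line with the three fields above."""
    sample = {
        "instruction": instruction,  # the first prompt word
        "input": dialogue_text,      # dialogue content after ASR transcription
        "output": key_info,          # annotated key information
    }
    # ensure_ascii=False keeps non-ASCII dialogue text readable in the file.
    return json.dumps(sample, ensure_ascii=False)
```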
[0107] Step 2, train the model:
[0108] The large language model is fine-tuned on the supervised key information data, that is, the training dataset consisting of the first training samples.
[0109] The instruction and input are concatenated to form the model input, and output is used as the target label. Fine-tuning the large language model on these pairs yields a key information extraction model.
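The construction of one supervised example can be sketched as follows. Several details are assumptions rather than statements from the source: the whitespace tokenizer is a stand-in for a real LLM tokenizer, the output is serialized as JSON text, and prompt positions are masked with `None` so that only the output tokens would contribute to the loss, which is a common convention in instruction fine-tuning.

```python
import json
from typing import Dict, List, Optional


def toy_tokenize(text: str) -> List[str]:
    """Whitespace tokenizer standing in for a real LLM tokenizer (assumption)."""
    return text.split()


def build_supervised_example(sample: Dict) -> Dict[str, List[Optional[str]]]:
    """Concatenate instruction and input as the model input; use output as the label."""
    prompt = sample["instruction"] + "\n" + sample["input"]
    target = sample["output"]
    if not isinstance(target, str):
        # Structured key information is serialized to text before tokenizing.
        target = json.dumps(target, ensure_ascii=False)
    prompt_tokens = toy_tokenize(prompt)
    target_tokens = toy_tokenize(target)
    tokens = prompt_tokens + target_tokens
    # Assumption: prompt positions carry no label (None), target positions do.
    labels: List[Optional[str]] = [None] * len(prompt_tokens) + target_tokens
    return {"tokens": tokens, "labels": labels}
```

A real training pipeline would replace the toy tokenizer with the model's own tokenizer and map the masked positions to whatever ignore value its loss function expects.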
[0110] In this embodiment, the acquired customer service dialogue is labeled to obtain the first training sample, which includes the first prompt word for prompting the large language model to extract keywords, the customer service dialogue text, and the key information in the customer service dialogue. Then, based on the training dataset composed of the first training sample, the large language model is fine-tuned to achieve the effect of training the large language model for the key information extraction task, and the required key information extraction model for extracting key information in the customer service dialogue text is obtained, which facilitates the automatic extraction of key information in the subsequent process.
[0111] Since the summary of the dialogue text content is obtained by calling the summary generation model, in some embodiments, the summary generation model needs to be obtained before the summary of the dialogue text content is generated based on the dialogue text content and the key information.
[0112] For the summary generation task, the large language model is trained in combination with customer service dialogues to obtain the required summary generation model.
[0113] The customer service dialogues are labeled to obtain the second training samples. Then, based on the training dataset composed of the second training samples, the large language model is fine-tuned to obtain the summary generation model.
[0114] The second training sample includes a second prompt word, customer service dialogue text, key dialogue information, and a customer service dialogue summary. The second prompt word is used to prompt the large language model to generate the summary, and the key dialogue information is the key information in the customer service dialogue. Additionally, the customer service dialogue text is the text obtained by converting the customer service dialogue into text.
[0115] For example, training the summary generation model includes steps 1 and 2:
[0116] Step 1, Obtain training data:
[0117] The customer service dialogues from the hotline are annotated to obtain the training data. Taking as an example a dialogue in which a customer claims not to have received a package, customer service informs the customer that the package was placed at store number ** in **village**, and the customer accepts this, the format of the training data is as follows:
[0118] {"instruction": "#Your identity#: You are a text summarization system for conversations in the express delivery industry. Customers consult customer service about related issues arising from the express delivery industry...",
[0119] "input": "Customer: ***. Customer Service: ****",
[0120] "output": "The customer stated that they did not receive this item, and customer service informed them that the item was placed at store number ** in ** village ** bar ** and the customer accepted it."}
[0121] Here, instruction is the prompt, that is, the second prompt word mentioned above, which tells the model what task it is given; input is the dialogue content after ASR transcription; and output is a summary generated based on the dialogue content and key information.
[0122] Step 2, train the model:
[0123] The large language model is fine-tuned on the supervised training data, that is, the training dataset consisting of the second training samples.
[0124] The instruction and input are concatenated to form the model input, and output is used as the target label. Fine-tuning the large language model on these pairs yields a summary generation model.
[0125] In this embodiment, the acquired customer service dialogue is labeled to obtain a second training sample, which includes a second prompt word for prompting the large language model to generate a summary, the customer service dialogue text, key information in the customer service dialogue, and a summary of the customer service dialogue. Then, based on the training dataset composed of the second training sample, the large language model is fine-tuned to achieve the effect of training the large language model for the summary generation task, and the required summary generation model for generating the summary of the dialogue text content is obtained, which facilitates the automatic generation of summaries based on key information and dialogue text content in the future.
[0126] For example, the summary extraction process can be as shown in Figure 2:
[0127] Step 1: Input the hotline dialogue text and perform data preprocessing on the input data, including cleaning the input data, such as removing special symbols, repetitions, etc., to obtain a relatively clean dialogue text (i.e., the dialogue text content mentioned above).
[0128] Step 2: Extract key information from the dialogue text, such as address, phone number, email, etc.
[0129] Step 3: Combine the dialogue text obtained in Step 1 and the extracted key information into a prompt, that is, assemble the dialogue content and key information into a single prompt (i.e., the first prompt information mentioned above);
[0130] Step 4: Call the large language model (LLM) summary generation service (i.e., the summary generation model mentioned above) to generate the corresponding summary;
[0131] Step 5: Determine whether the current summary, i.e. the summary generated in step 4, is missing key information. The judgment model here can be any large language model.
[0132] Step 6: If Step 5 determines that information is missing, combine the dialogue text (from Step 1) and the summary (from Step 4) into a new prompt (i.e., the third prompt information mentioned above), that is, a refined prompt for extracting the missing information, and call the LLM key information extraction service (i.e., the key information extraction model) again to extract the missing information.
[0133] Step 7: Combine the dialogue text from Step 1, the summary from Step 4, and the missing information extracted from Step 6 into a complete prompt (i.e., the fourth prompt information mentioned above).
[0134] Step 8: Call the LLM summary generation service again to generate the corresponding summary;
[0135] Step 9: Determine whether the summary generated in Step 8 is missing information. If information is missing, repeat Steps 6-9; otherwise, proceed to Step 10.
[0136] Step 10: Determine whether the current summary contains hallucinations. The judgment model here can be any large model or other judgment logic.
[0137] If hallucinations are present, the dialogue text from Step 1 and the current summary are combined to form a new prompt (i.e., the fifth prompt information mentioned above), and the LLM summary generation service is called to regenerate the summary until a complete and accurate summary is obtained.
[0138] Step 11: Perform post-processing on the summary obtained in Step 10 to remove invalid expressions, such as duplicates;
[0139] Step 12: Output a complete and accurate summary.
[0140] Steps 2-4 are for generating the summary, steps 5-9 are for filling in the gaps, and step 10 is for eliminating hallucinations.
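The twelve steps above can be sketched end to end as follows. This is a sketch under stated assumptions, not the applicant's implementation: `extract`, `generate`, `find_missing`, and `has_hallucination` are hypothetical callables standing in for the key information extraction model, the summary generation model, and the two judgment models; the prompt wording and the symbol set removed in Step 1 are invented; and `max_rounds` is an added bound on both loops.

```python
import re
from typing import Callable, Dict, List


def summarize_dialogue(
    raw_text: str,
    extract: Callable[[str], Dict[str, str]],
    generate: Callable[[str], str],
    find_missing: Callable[[str, str], List[str]],
    has_hallucination: Callable[[str, str], bool],
    max_rounds: int = 3,  # assumption: bound on each correction loop
) -> str:
    """Run Steps 1-12 with caller-supplied stand-ins for the models."""
    # Step 1: clean the input (drop a few special symbols, collapse whitespace).
    dialogue = " ".join(re.sub(r"[#*~|]+", " ", raw_text).split())

    # Step 2: extract key information such as address, phone number, email.
    key_info = extract(f"Extract the key information.\nDialogue: {dialogue}")

    # Steps 3-4: build the prompt and generate the first summary.
    summary = generate(
        f"Summarize the dialogue.\nDialogue: {dialogue}\nKey information: {key_info}"
    )

    # Steps 5-9: fill gaps until the judge reports nothing missing.
    for _ in range(max_rounds):
        missing_items = find_missing(dialogue, summary)
        if not missing_items:
            break
        missing_info = extract(
            f"Extract only: {missing_items}\nDialogue: {dialogue}\nSummary: {summary}"
        )
        summary = generate(
            f"Complete the summary.\nDialogue: {dialogue}\nSummary: {summary}\n"
            f"Missing information: {missing_info}"
        )

    # Step 10: regenerate while the judge reports hallucinations.
    for _ in range(max_rounds):
        if not has_hallucination(dialogue, summary):
            break
        summary = generate(
            f"Remove unsupported statements.\nDialogue: {dialogue}\nSummary: {summary}"
        )

    # Step 11: post-process by dropping repeated sentences, keeping order.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    # Step 12: output the final summary.
    return " ".join(dict.fromkeys(sentences))
```

Passing the models in as callables keeps the control flow testable without a deployed LLM service; each callable can later be replaced by a call to the corresponding fine-tuned model.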
[0141] Through the above process, the reasoning ability of the large language model is used to piece fragmented key information together into complete information, and the generated summary is corrected through multiple rounds of self-reflection. The process can generate diverse expressions based on the dialogue text content while ensuring that the generated summary is complete and accurate.
[0142] Exemplary device
[0143] As shown in Figure 3, an embodiment of this application also provides a summary extraction device, including an acquisition module 301, an extraction module 302, a generation module 303, and an output module 304.
[0144] Wherein:
[0145] The acquisition module 301 is used to acquire the dialogue text content from which a summary is to be extracted;
[0146] Extraction module 302 is used to call the key information extraction model based on the dialogue text content to extract key information from the dialogue text content. The key information extraction model is obtained by training a large language model for the key information extraction task.
[0147] The generation module 303 is used to obtain a summary of the dialogue text content based on the dialogue text content and the key information;
[0148] Output module 304 is used to output a summary of the dialogue text content.
[0149] The summary extraction device provided in this embodiment belongs to the same concept as the summary extraction method provided in the above embodiments of this application. It can execute the method provided in any of the above embodiments of this application and has the corresponding functional modules and beneficial effects. Technical details not described in detail in this embodiment can be found in the specific processing content of the summary extraction method provided in the above embodiments of this application, and will not be repeated here.
[0150] The functions implemented by the above-mentioned acquisition module 301, extraction module 302, generation module 303 and output module 304 can be implemented by the same or different processors calling software, and this application embodiment does not limit them.
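The four-module structure can be mirrored in software roughly as follows. The class name `SummaryExtractionDevice` and the callable signatures are hypothetical; the source only states what each module is used for and that the modules may be implemented by one or more processors calling software.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SummaryExtractionDevice:
    acquire: Callable[[], str]                      # acquisition module 301
    extract: Callable[[str], Dict[str, str]]        # extraction module 302
    generate: Callable[[str, Dict[str, str]], str]  # generation module 303
    output: Callable[[str], None]                   # output module 304

    def run(self) -> str:
        """Chain the four modules in the order described above."""
        dialogue = self.acquire()
        key_info = self.extract(dialogue)
        summary = self.generate(dialogue, key_info)
        self.output(summary)
        return summary
```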
[0151] Exemplary electronic devices
[0152] Another embodiment of this application also provides an electronic device. As shown in Figure 4, the electronic device includes a memory 400 and a processor 410.
[0153] The memory 400 is connected to the processor 410 and is used to store programs;
[0154] The processor 410 is configured to implement the summary extraction method disclosed in any of the above embodiments by running the program stored in the memory 400.
[0155] Specifically, the electronic device may also include: a bus, a communication interface 420, an input device 430, and an output device 440.
[0156] The processor 410, memory 400, communication interface 420, input device 430, and output device 440 are interconnected via a bus. Among them:
[0157] A bus can include a pathway for transmitting information between various components of a computer system.
[0158] The processor 410 can be a general-purpose processor, such as a central processing unit (CPU) or a microprocessor, or an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the program of the present application. It can also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
[0159] Processor 410 may include a main processor, as well as a baseband chip, modem, etc.
[0160] The memory 400 stores a program for executing the technical solution of this application, and may also store an operating system and other critical business functions. Specifically, the program may include program code, which includes computer operation instructions. More specifically, the memory 400 may include read-only memory (ROM), other types of static storage devices capable of storing static information and instructions, random access memory (RAM), other types of dynamic storage devices capable of storing information and instructions, disk storage, flash memory, etc.
[0161] Input device 430 may include a device for receiving user input data and information, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor.
[0162] Output device 440 may include devices that allow information to be output to a user, such as a display screen, printer, speaker, etc.
[0163] The communication interface 420 may include a device that uses any transceiver to communicate with other devices or communication networks, such as Ethernet, Radio Access Network (RAN), Wireless Local Area Network (WLAN), etc.
[0164] The processor 410 executes the program stored in the memory 400 and calls other devices, which can be used to implement the various steps of any of the summary extraction methods provided in the above embodiments of this application.
[0165] Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the electronic device to which the present application is applied. The specific electronic device may include more or fewer components than shown in the figure, or combine certain components, or have different component arrangements.
[0166] This application also proposes a chip including a processor and a data interface. The processor reads and runs a program stored in a memory through the data interface to execute the summary extraction method described in any of the above embodiments. For details of the processing and its beneficial effects, please refer to the embodiments of the summary extraction method described above.
[0167] In addition to the methods and apparatus described above, embodiments of this application provide a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the summary extraction methods according to various embodiments of this application as described in the "Exemplary Methods" section of this specification.
[0168] The computer program product can be written in any combination of one or more programming languages to perform the operations of the embodiments of this application. The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.
[0169] Furthermore, embodiments of this application also propose a storage medium storing a computer program that, when executed by a processor, performs the steps of the summary extraction method according to the various embodiments of this application described in the "Exemplary Methods" section above.
[0170] The basic principles of the present invention have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in the present invention are merely examples and not limitations, and should not be considered as essential features of each embodiment of the present invention. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the present invention to the necessity of employing the aforementioned specific details.
[0171] The block diagrams of components, apparatuses, devices, and systems involved in this invention are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as “comprising,” “including,” “having,” etc., are open-ended terms meaning “including but not limited to,” and are used interchangeably with them. The terms “or” and “and” as used herein refer to the terms “and / or,” and are used interchangeably with them unless the context clearly indicates otherwise. The term “such as” as used herein refers to the phrase “such as but not limited to,” and is used interchangeably with it.
[0172] It should also be noted that in the apparatus, device, and method of the present invention, the components or steps can be disassembled and / or recombined. These disassemblies and / or recombinations should be considered as equivalent solutions of the present invention.
[0173] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the invention. Therefore, the invention is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.
[0174] It should be understood that the qualifying terms "first", "second", "third", "fourth", "fifth" and "sixth" used in the description of the embodiments of the present invention are only used to more clearly illustrate the technical solutions and are not intended to limit the scope of protection of the present invention.
[0175] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein.
Claims
1. An abstract extraction method, characterized in that the method includes: obtaining dialogue text content from which a summary is to be extracted; invoking, based on the dialogue text content, a key information extraction model to extract key information from the dialogue text content, wherein the key information extraction model is obtained by training a large language model for the key information extraction task; obtaining a summary of the dialogue text content based on the dialogue text content and the key information; and outputting the summary of the dialogue text content.
2. The abstract extraction method according to claim 1, characterized in that invoking, based on the dialogue text content, the key information extraction model to extract key information from the dialogue text content includes: combining the dialogue text content with first instruction information to generate first prompt information, wherein the first instruction information is used to instruct the key information extraction model to extract key information from the dialogue text content; and inputting the first prompt information and the dialogue text content into the key information extraction model to obtain the key information.
3. The abstract extraction method according to claim 1, characterized in that obtaining the summary of the dialogue text content based on the dialogue text content and the key information includes: invoking, based on the dialogue text content and the key information, a summary generation model to generate the summary of the dialogue text content, wherein the summary generation model is obtained by training a large language model for the summary generation task.
4. The abstract extraction method according to claim 3, characterized in that invoking, based on the dialogue text content and the key information, the summary generation model to generate the summary of the dialogue text content includes: combining the dialogue text content, the key information, and second instruction information to generate second prompt information, which is used to instruct the summary generation model to extract a summary of the dialogue text content; and inputting the second prompt information and the dialogue text content into the summary generation model to obtain the summary of the dialogue text content.
5. The abstract extraction method according to claim 1, characterized in that, after the summary of the dialogue text content is generated, the method further includes: determining whether there is any missing information in the summary; if so, invoking the key information extraction model based on the summary and the dialogue text content to extract the missing information, and invoking the summary generation model based on the missing information, the dialogue text content, and the summary to generate a new summary, until there is no missing information in the new summary, wherein the missing information is the information that is missing from the summary; and if not, performing the operation of outputting the summary of the dialogue text content.
6. The abstract extraction method according to claim 5, characterized in that invoking the key information extraction model based on the summary and the dialogue text content to extract the missing information includes: identifying the information items that are missing from the summary; and invoking, based on the information items, the summary, and the dialogue text content, the key information extraction model to extract the content corresponding to the information items in a targeted manner, thereby obtaining the missing information.
7. The abstract extraction method according to claim 3, characterized in that, after the summary of the dialogue text content is generated, the method further includes: determining whether the summary contains information that is inconsistent with the dialogue text content or with the facts; if so, invoking the summary generation model based on the summary and the dialogue text content to regenerate the summary, until the summary contains no information that is inconsistent with the dialogue text content or with the facts; and if not, performing the operation of outputting the summary of the dialogue text content.
8. The abstract extraction method according to any one of claims 1 to 7, characterized in that, before invoking the key information extraction model based on the dialogue text content to extract key information from the dialogue text content, the method further includes: labeling a customer service dialogue to obtain a first training sample, wherein the first training sample includes a first prompt word, customer service dialogue text, and dialogue key information, the first prompt word is used to prompt the large language model to extract keywords, and the dialogue key information is the key information in the customer service dialogue; and fine-tuning the large language model based on a training dataset composed of the first training samples to obtain the key information extraction model.
9. The abstract extraction method according to any one of claims 3 to 7, characterized in that the method further includes: labeling a customer service dialogue to obtain a second training sample, wherein the second training sample includes a second prompt word, customer service dialogue text, dialogue key information, and a customer service dialogue summary, and the second prompt word is used to prompt the large language model to extract the summary; and fine-tuning the large language model based on a training dataset composed of the second training samples to obtain the summary generation model.
10. An abstract extraction apparatus, characterized in that the apparatus includes: an acquisition module, used to acquire dialogue text content from which a summary is to be extracted; an extraction module, used to invoke a key information extraction model based on the dialogue text content to extract key information from the dialogue text content, wherein the key information extraction model is obtained by training a large language model for the key information extraction task; a generation module, used to obtain a summary of the dialogue text content based on the dialogue text content and the key information; and an output module, used to output the summary of the dialogue text content.
11. An electronic device, characterized by including a memory and a processor, wherein the memory is connected to the processor and is used to store a program, and the processor is configured to implement the abstract extraction method according to any one of claims 1 to 9 by running the program in the memory.
12. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the abstract extraction method according to any one of claims 1 to 9.