Intelligent customer service dialogue human-machine transfer method and system based on multi-granularity semantic fidelity
The method locks the dialogue record, performs deep semantic parsing, generates multi-granularity summaries, and verifies their semantic fidelity. This addresses the human-machine transfer problem in intelligent customer service systems, so that information is handed over efficiently and accurately and service quality improves.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INSPUR GENERSOFT CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing intelligent customer service systems suffer from information gaps, semantic distortion, and high understanding costs during human-machine transitions, failing to effectively guarantee the integrity and accuracy of the original semantics, resulting in low service efficiency and poor user experience.
The complete original dialogue record from the start of the conversation to the trigger moment is locked, deep semantic parsing extracts a structured core fact base from it, multi-granularity summaries are generated from that fact base, and semantic fidelity closed-loop verification checks the transfer summary against the fact base for accuracy and consistency.
It achieves an efficient and seamless transition from robots to human customer service, ensuring the complete transmission of information and the accuracy of semantics, thereby improving service efficiency and user satisfaction.
Figure CN122242519A_ABST
Description
Technical Field
[0001] This application belongs to the field of artificial intelligence and natural language processing, and specifically relates to an intelligent customer service dialogue human-machine transfer method and system based on multi-granularity semantic fidelity.
Background Technology
[0002] With the development of artificial intelligence, intelligent customer service robots have become the first point of contact between enterprises and users, handling a large volume of standardized, high-frequency inquiries and problem-solving tasks. However, when faced with complex issues, escalating user emotions, or business rule restrictions, these robots often cannot resolve the user's problem on their own, and the conversation must be transferred seamlessly from the robot to a human customer service agent. This human-machine transfer is crucial to the quality of subsequent service and to user satisfaction.
[0003] Existing intelligent customer service systems primarily employ the following methods when handling human-machine transfers: First, direct transfer, which only transmits the last sentence entered by the user to the human agent. This method completely loses historical dialogue information, requiring the human agent to retrieve background information from the user from the beginning, resulting in low service efficiency and a poor user experience. Second, full-text transfer, which directly pushes the entire original dialogue record between the user and the chatbot to the human agent. While comprehensive, the original dialogue is lengthy, containing a large amount of repetitive and irrelevant information. The human agent needs to spend considerable time sorting and extracting key points, severely impacting response speed and problem-solving efficiency. Finally, simple summary transfer, which generates a brief dialogue summary based on preset rules or keyword extraction technology. This method only involves superficial text splicing, lacking a deep understanding of the dialogue's semantics. It is highly susceptible to omissions, alterations, or logical inconsistencies in key business information (such as order numbers, amounts, and time points), leading to semantic distortion. Human agents providing services based on distorted summaries may cause greater user dissatisfaction or even complaints.
[0004] In summary, existing technologies generally suffer from information gaps, semantic distortion, and high understanding costs. There is an urgent need for a method designed specifically for the robot-to-human transfer scenario that condenses information while strictly preserving the integrity and accuracy of the original semantics, so as to achieve an efficient and seamless handover from robot to human customer service.
Summary of the Invention
[0005] In a first aspect, embodiments of this application provide an intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity, comprising the following steps:
S1. In response to a human-machine transfer trigger signal, lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment;
S2. Perform deep semantic parsing on the complete original dialogue record to extract a structured core fact base containing core demands, key entities, and user emotions;
S3. Using the structured core fact base as input, generate a transfer summary containing multiple information granularities;
S4. Perform semantic fidelity closed-loop verification on the transfer summary by comparing it with the structured core fact base; if the comparison results are consistent, the verification passes; otherwise, the verification fails;
S5. Push the verified transfer summary to the human customer service agent's terminal.
[0006] Furthermore, the specific steps of step S1 are as follows:
S11. Monitor the dialogue flow between the user and the intelligent robot in real time, and generate a human-machine transfer trigger signal when a preset transfer trigger condition is met;
S12. In response to the human-machine transfer trigger signal, lock in the dialogue buffer all original dialogue records from the start of the current session to the trigger moment, and prohibit any modification of those records;
S13. Read the complete, locked original dialogue record from the dialogue buffer and save it as the sole source of facts for subsequent processing.
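The locking described in steps S12 and S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the names `Turn`, `LockedTranscript`, `lock_transcript`, and `is_intact` are hypothetical, an in-memory list stands in for the dialogue buffer, and the SHA-256 digest is one possible way to detect later modification.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Turn:
    index: int
    speaker: str  # hypothetical labels: "user" or "robot"
    text: str


@dataclass(frozen=True)
class LockedTranscript:
    session_id: str
    turns: Tuple[Turn, ...]
    digest: str  # SHA-256 over the serialized turns (an assumption of this sketch)


def _digest(turns: Tuple[Turn, ...]) -> str:
    payload = json.dumps([[t.index, t.speaker, t.text] for t in turns], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def lock_transcript(session_id: str, buffer: List[Tuple[str, str]]) -> LockedTranscript:
    """Copy the live buffer into an immutable snapshot and fingerprint it."""
    turns = tuple(Turn(i, speaker, text) for i, (speaker, text) in enumerate(buffer, start=1))
    return LockedTranscript(session_id, turns, _digest(turns))


def is_intact(snapshot: LockedTranscript) -> bool:
    """Recompute the digest to confirm the snapshot still matches what was locked."""
    return _digest(snapshot.turns) == snapshot.digest


# Placeholder dialogue content, for illustration only.
buffer = [("user", "first user message"), ("robot", "first robot reply")]
snapshot = lock_transcript("session-001", buffer)
buffer.append(("user", "message sent after the trigger"))  # the live buffer keeps changing
```

Because the snapshot is built from frozen dataclasses and a tuple, later writes to the live buffer do not reach it, and attempts to assign to its fields raise an error.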
[0007] Furthermore, the specific steps of step S2 are as follows:
S21. Call a pre-trained language model to encode the complete original dialogue record into semantic vectors, preserving the deep semantics of the original dialogue;
S22. Based on the semantic vectors, extract the user's initial request, the evolved core request, and the currently unmet needs through an intent recognition model and slot filling, and generate core demand information;
S23. Based on the semantic vectors, extract key business entities, including but not limited to user identifier, order number, amount, time point, and device number, through a high-precision named entity recognition model, and generate key entity information;
S24. Perform emotion and tone analysis on the complete original dialogue record to determine the evolution trajectory of the user's emotional state and the key turning points of the dialogue, and generate user emotion information;
S25. Integrate the core demand information, key entity information, and user emotion information to build and output the structured core fact base.
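As an illustration of the output of step S25, the sketch below assembles a structured core fact base and serializes it to JSON. The class name `CoreFactBase` and all field names are hypothetical, since the application does not specify a schema; the sample values are taken from the credit card scenario described in the detailed embodiment.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CoreFactBase:
    # Field names are assumptions of this sketch, not a schema from the application.
    core_demands: Dict[str, List[str]] = field(default_factory=dict)
    key_entities: Dict[str, str] = field(default_factory=dict)
    user_emotion: Dict[str, object] = field(default_factory=dict)


def build_fact_base(demands, entities, emotion) -> CoreFactBase:
    """Merge the three analysis outputs (S22, S23, S24) into one record."""
    return CoreFactBase(core_demands=demands, key_entities=entities, user_emotion=emotion)


fact_base = build_fact_base(
    demands={
        "initial": ["credit card application conditions"],
        "unmet": ["credit limit", "handling of automatic repayment failure"],
    },
    entities={"user_id": "ZS001", "BANK_NAME": "Bank C", "CARD_TYPE": "credit card"},
    emotion={"current": "anger", "turning_points": [11]},
)

fact_base_json = json.dumps(asdict(fact_base), ensure_ascii=False, indent=2)
print(fact_base_json)
```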
[0008] Furthermore, the specific steps of step S3 are as follows:
S31. Using a template-driven approach, fill the business entities and status information in the structured core fact base into a preset summary template to generate a first-granularity summary, ensuring the accurate transmission of core facts;
S32. Call a conditional generation model, with the structured core fact base as conditional input, to generate a coherent narrative paragraph describing the problem background, evolution process, and current point of conflict, obtaining a second-granularity summary;
S33. Call a large language model to perform grammatical error correction and sentence structure optimization on the second-granularity summary, and visually emphasize key content according to information importance, generating a third-granularity summary;
S34. Based on the structured core fact base and in conjunction with the enterprise's internal knowledge base, recommend solutions from similar cases and generate a fourth-granularity summary.
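The template filling of step S31 can be sketched with plain string formatting. The template below is an abridged version of the one given in the detailed embodiment; the function name `render_first_granularity` and the dictionary layout of the fact base are assumptions made for illustration.

```python
# Abridged template; placeholders follow the embodiment's template fields.
SUMMARY_TEMPLATE = (
    "User Information: User ID: {user_id}, Current Emotion: {current_emotion}\n"
    "Unresolved issues: {unresolved_count} items\n"
    "{unresolved_list}\n"
    "Key Entities: {entity_list}"
)


def render_first_granularity(facts: dict) -> str:
    """Fill the template strictly from the fact base, with no free-text generation."""
    unresolved = facts["unresolved"]
    return SUMMARY_TEMPLATE.format(
        user_id=facts["user_id"],
        current_emotion=facts["current_emotion"],
        unresolved_count=len(unresolved),
        unresolved_list="\n".join(f"- {item}" for item in unresolved),
        entity_list=", ".join(f"{k}={v}" for k, v in facts["entities"].items()),
    )


facts = {
    "user_id": "ZS001",
    "current_emotion": "Anger",
    "unresolved": [
        "Inquiry about credit limit (with property ownership)",
        "Inquiry about handling of automatic repayment failure",
    ],
    "entities": {"BANK_NAME": "Bank C", "CARD_TYPE": "credit card"},
}

summary = render_first_granularity(facts)
print(summary)
```

Because every value is copied from the fact base rather than generated, this granularity cannot introduce facts that are absent from it, which is why it serves as the accuracy anchor among the four summaries.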
[0009] Furthermore, in step S21, a pre-trained language model based on the Transformer architecture is used to encode the complete original dialogue record; in step S23, the high-precision named entity recognition model uses a BiLSTM-CRF model to perform extraction on the semantic vectors; in step S32, the conditional generation model adopts a Transformer-based sequence-to-sequence architecture, specifically a BART or T5 model; in step S33, the large language model is a Transformer-based generative pre-trained model.
[0010] Furthermore, the specific steps of step S4 are as follows:
S41. Call an independent information restoration engine to perform reverse information extraction on the transfer summary, obtaining the information set to be verified;
S42. Perform a precise difference comparison between the information set to be verified and the structured core fact base, and calculate the fact consistency score;
S43. Determine whether the fact consistency score reaches a preset threshold; if so, the summary is considered a complete match and the verification passes; if not, semantic distortion is determined to exist and the verification fails.
[0011] Furthermore, the specific steps of step S42 are as follows:
S421. Define the information set to be verified as a set $A = \{a_1, a_2, \dots, a_m\}$ and the structured core fact base as a set $B = \{b_1, b_2, \dots, b_n\}$; each element $a_i$ and $b_j$ is a structured fact unit containing semantic information, including but not limited to entities, relations, and attribute-value pairs;
S422. For each element $a_i$ in set $A$, traverse each element $b_j$ in set $B$ and calculate the semantic similarity score $\mathrm{sim}(a_i, b_j)$ of $a_i$ and $b_j$ using a pre-trained semantic similarity model; its range is [0,1];
S423. Record the maximum similarity score between $a_i$ and all $b_j$, $s_i = \max_j \mathrm{sim}(a_i, b_j)$, and the corresponding best matching element $b^* = \arg\max_{b_j} \mathrm{sim}(a_i, b_j)$;
S424. Determine whether the maximum similarity score $s_i$ is greater than or equal to a preset semantic equivalence threshold $\tau$; if so, determine that $a_i$ and $b^*$ are semantically equivalent, include the pair $(a_i, b^*)$ in the intersection count, temporarily remove $b^*$ from set $B$, and accumulate the number of intersection elements, $N_\cap \leftarrow N_\cap + 1$; if not, determine that $a_i$ and $b^*$ are semantically inequivalent, and temporarily remove the unmatched element from its set;
S425. Determine whether all elements in set $A$ have been processed; if not, repeat steps S422-S424; if so, obtain the final number of intersection elements $N_\cap$;
S426. Calculate the number of elements in the union, $N_\cup = |A| + |B| - N_\cap$;
S427. Calculate the fact consistency score $F = N_\cap / N_\cup$ and output it for the verification decision in step S43.
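The matching loop of steps S421 to S427 can be sketched as a greedy one-to-one matching followed by an intersection-over-union ratio. This is a sketch under stated assumptions: `difflib.SequenceMatcher` is only a stand-in for the pre-trained semantic similarity model, the threshold value 0.8 is hypothetical, facts are represented as plain strings, and an unmatched element is simply skipped.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def similarity(a: str, b: str) -> float:
    """Stand-in for the semantic similarity model; returns a value in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fact_consistency_score(
    to_verify: Sequence[str],
    fact_base: Sequence[str],
    sim: Callable[[str, str], float] = similarity,
    tau: float = 0.8,  # hypothetical semantic equivalence threshold
) -> float:
    remaining = list(fact_base)  # working copy of the fact base
    n_intersection = 0
    for a in to_verify:
        if not remaining:
            break
        # Best remaining candidate for this element (S422-S423).
        best_score, best = max(((sim(a, b), b) for b in remaining), key=lambda p: p[0])
        if best_score >= tau:
            # Semantically equivalent: count the pair and prevent re-matching (S424).
            n_intersection += 1
            remaining.remove(best)
    n_union = len(to_verify) + len(fact_base) - n_intersection  # S426
    # Two empty sets are treated as fully consistent (an assumption of this sketch).
    return n_intersection / n_union if n_union else 1.0  # S427


facts = ["user_id=ZS001", "bank=Bank C", "emotion=anger"]
restored = ["user_id=ZS001", "bank=Bank C"]  # the summary dropped one fact
score = fact_consistency_score(restored, facts)
print(round(score, 3))
```

With two of three facts recovered, the score is 2 / (2 + 3 - 2), so a threshold close to 1 in step S43 would reject this summary for omitting the emotion fact.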
[0012] Furthermore, if the verification fails in step S43, any one of the following remedial measures is performed:
The first measure is to return the transfer summary to step S3 for correction;
The second measure is a downgraded push, in which only the structured core fact base is pushed to the human customer service agent in structured data format;
The third measure is to push the original dialogue record and highlight on the interface the risk in the generated summary and the specific differences found.
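One way to combine the three remedial measures is sketched below. The application only states that any one of the measures may be performed, so the ordering here (retry generation first, then fall back) is an assumption, as are the function name `deliver`, the threshold 0.9, the retry count, and the returned dictionary layout.

```python
from typing import Any, Callable, Dict, List


def deliver(
    fact_base: Dict[str, Any],
    transcript: List[str],
    generate: Callable[[Dict[str, Any]], str],
    verify: Callable[[str, Dict[str, Any]], float],
    threshold: float = 0.9,  # hypothetical consistency threshold
    max_retries: int = 1,  # hypothetical number of regeneration attempts
    fallback: str = "structured_facts",
) -> Dict[str, Any]:
    """Decide what to push to the human agent and in which mode."""
    score = 0.0
    # First measure: return to summary generation (step S3) and verify again.
    for _ in range(max_retries + 1):
        summary = generate(fact_base)
        score = verify(summary, fact_base)
        if score >= threshold:
            return {"mode": "verified_summary", "payload": summary, "score": score}
    # Second measure: downgraded push of the structured facts only.
    if fallback == "structured_facts":
        return {"mode": "structured_facts", "payload": fact_base, "score": score}
    # Third measure: push the original record together with a risk note.
    risk = f"summary consistency {score:.2f} is below threshold {threshold:.2f}"
    return {"mode": "original_record", "payload": transcript, "risk": risk, "score": score}


result = deliver(
    {"user_id": "ZS001"},
    ["locked turn 1", "locked turn 2"],
    generate=lambda fb: "draft summary",
    verify=lambda s, fb: 0.5,  # a verifier that always reports distortion
)
print(result["mode"])
```

Whichever branch is taken, the function always returns something to push, which mirrors the stated goal that a failed verification should not interrupt the transfer.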
[0013] Furthermore, the specific steps of step S5 are as follows:
S51. Render the verified transfer summary in rich text format and push it to the target human agent's workbench in real time;
S52. Send a transfer notification message to the user, informing them that their problem has been passed on in full;
S53. Provide an interactive entry point on the human agent's workbench for viewing the original dialogue record and the data details.
[0014] In a second aspect, embodiments of this application also provide an intelligent customer service dialogue human-machine transfer system based on multi-granularity semantic fidelity, including:
A human-machine transfer trigger module, used to respond to the human-machine transfer trigger signal and to lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment;
A semantic parsing module, used to perform deep semantic parsing on the complete original dialogue record and extract a structured core fact base containing core demands, key entities, and user emotions;
A summary generation module, used to generate a transfer summary containing multiple information granularities, taking the structured core fact base as input;
A verification module, used to perform semantic fidelity closed-loop verification on the transfer summary by comparing it with the structured core fact base; if the comparison results are consistent, the verification passes; otherwise, the verification fails;
A push module, used to send the verified transfer summary to the human customer service agent's terminal.
[0015] As can be seen from the above technical solutions, this application has the following advantages:
The intelligent customer service dialogue human-machine transfer method and system based on multi-granularity semantic fidelity provided in this application lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment and prohibit its modification. This protects the integrity of the transfer data source and establishes a single source of facts, avoiding context loss caused by data tampering or truncation. By using pre-trained models to perform deep semantic parsing of the dialogue and extracting a structured core fact base containing core demands, key entities, and emotional evolution, the application filters the noise of unstructured dialogue and accurately extracts deep business information, providing the data foundation for subsequent processing. By generating multi-granularity transfer summaries through four methods, namely template filling, conditional narrative generation, large-model polishing, and knowledge base association, it supplies information in layers, from strictly accurate factual data to coherent case descriptions and solutions that support decision-making, meeting the varied reading needs of customer service agents. By calling an independent engine to extract information back from the summary and comparing it with the original fact base using a semantic similarity metric, it automatically detects and intercepts hallucinations, omissions, or distortions that may arise during summary generation, ensuring the semantic fidelity of the output. Furthermore, by applying tiered remedial measures when verification fails (correcting the summary, downgrading to a push of structured data, or pushing the original text with risks highlighted), it keeps the human-machine transfer service process from being interrupted.
Description of the Drawings
[0016] To more clearly illustrate the technical solution of this application, the accompanying drawings used in the description will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 is a flowchart of the intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity of the present invention.
[0018] Figure 2 is a schematic diagram of the intelligent customer service dialogue human-machine transfer system based on multi-granularity semantic fidelity of the present invention.
Detailed Implementation
[0019] The various embodiments of this disclosure will be described more fully in the following detailed description of the specific steps of the intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity. This disclosure may have various embodiments, and adjustments and changes may be made therein. However, it should be understood that there is no intention to limit the various embodiments of this disclosure to the specific embodiments disclosed herein, but rather this disclosure should be understood to cover all adjustments, equivalents, and / or alternatives falling within the spirit and scope of the various embodiments of this disclosure.
[0020] This embodiment provides an intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity. Through deep parsing and multi-granularity generation, combined with closed-loop verification, it achieves lossless information transmission.
[0021] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0022] Referring to Figure 1, which shows a flowchart of the intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity in a specific embodiment, the method includes the following steps:
S1. In response to the human-machine transfer trigger signal, lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment;
It should be noted that this step establishes the baseline authenticity of the data, prevents historical information from being lost as the dialogue continues or the cache is overwritten, and ensures that all subsequent analysis is based on a fixed and complete data snapshot, which underpins semantic fidelity.
S2. Perform deep semantic parsing on the complete original dialogue record to extract a structured core fact base containing core demands, key entities, and user emotions;
It should be noted that this step purifies and structures the information, transforming lengthy and messy natural language dialogue into structured data that a computer can process precisely, removing noise while retaining the core content, and laying the groundwork for high-quality summaries.
S3. Using the structured core fact base as input, generate a transfer summary containing multiple information granularities;
It should be noted that this step presents information in layers and optimizes readability. It keeps key data accurate, provides a logical narrative that matches human reading habits, and adds decision support, meeting the information needs of human customer service agents in different scenarios.
S4. Perform semantic fidelity closed-loop verification on the transfer summary by comparing it with the structured core fact base; if the comparison results are consistent, the verification passes; otherwise, the verification fails;
It should be noted that this step, through a closed loop from generation to restoration to comparison, builds a safeguard that proactively detects and intercepts potential semantic distortion, so that the information pushed to the human agent is credible and not misleading.
S5. Push the verified transfer summary to the human customer service agent's terminal;
It should be noted that this step delivers the result efficiently. Only verified summaries are pushed, so that the information received by the human agent is concise and accurate, which translates directly into higher service efficiency and user satisfaction.
[0023] By locking the complete dialogue, building a structured fact base, generating multi-granularity summaries, and performing semantic fidelity closed-loop verification, this embodiment transmits the user's core demands and emotions without distortion during the human-machine transfer, thereby improving collaboration efficiency.
[0024] Furthermore, as a refinement and extension of the above embodiment, and in order to fully illustrate its implementation process, another intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity is provided, taking the following customer service dialogue scenario as an example:
Scenario setting: User A (User ID: ZS001) consulted a bank's intelligent customer service system about credit card issues. After multiple rounds of dialogue with the intelligent customer service "Xiao An", the problem remained unresolved, triggering a human-machine transfer. The method includes the following steps:
S1. In response to the human-machine transfer trigger signal, lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment; the specific steps of step S1 are as follows:
S11. Monitor the dialogue flow between the user and the intelligent robot in real time, and generate a human-machine transfer trigger signal when a preset transfer trigger condition is met. For example, in this scenario the system monitors the conversation between the user and the intelligent customer service "Xiao An" in real time; when the intelligent customer service fails to provide a valid answer for two consecutive rounds (such as replying "cannot provide a definite answer" or "please wait"), the system determines that the preset transfer trigger condition has been met and automatically generates a human-machine transfer trigger signal;
S12. In response to the human-machine transfer trigger signal, lock in the dialogue buffer all original dialogue records from the start of the current session to the trigger moment, and prohibit any modification of those records. For example, after receiving the transfer trigger signal, the system immediately locks all original dialogue records of the current session in the memory cache and sets write protection, so that no subsequent processing can modify the original data;
S13. Read the complete, locked original dialogue record from the dialogue buffer and save it as the sole source of facts for subsequent processing. For example, the system persistently stores the locked original dialogue record and marks it as the sole source of facts for this transfer. The complete locked original dialogue record in this scenario is shown in Table 1 below:
Table 1
[0025] S2. Perform deep semantic parsing on the complete original dialogue record to extract a structured core fact base containing core demands, key entities, and user emotions; the specific steps of step S2 are as follows:
S21. Call the pre-trained language model to encode the complete original dialogue record into semantic vectors, preserving the deep semantics of the original dialogue. In step S21, a pre-trained language model based on the Transformer architecture (such as BERT, RoBERTa, or their domain variants) is used to encode the complete original dialogue record. For a dialogue text sequence of length L, $X = (x_1, x_2, \dots, x_L)$, the model uses a multi-layer bidirectional Transformer encoder to map the tokens into a sequence of context-sensitive semantic vectors $H = (h_1, h_2, \dots, h_L)$, where each $h_i \in \mathbb{R}^{d}$ is the semantic vector of the i-th token and d is the hidden layer dimension of the model (e.g., d=768 in BERT-base). The pre-trained language model acquires basic language understanding through masked language modeling (MLM) and next sentence prediction (NSP) pre-training on a large-scale general corpus (e.g., encyclopedias and book corpora), and is then fine-tuned for domain adaptation on a customer service dialogue corpus. For example, each round of dialogue is concatenated into an input sequence with the special markers [CLS] and [SEP] added. For the third-round user input "I want to know the conditions for applying for a credit card from Bank B", BERT encoding yields a 768-dimensional vector representation for each token, where the vector $h_{[CLS]}$ corresponding to the [CLS] marker captures the overall semantics of this round of dialogue;
S22. Based on the semantic vectors, extract the user's initial request, the evolved core request, and the currently unmet needs through the intent recognition model and slot filling, and generate core demand information. Specifically, the intent recognition model is implemented as follows: from the semantic vector sequence $H$, the vector $h_{[CLS]}$ of the [CLS] marker, which represents the entire dialogue, is input to the intent classification layer. The intent classification layer is a single-layer feedforward neural network followed by a softmax function:
$$P(y \mid X) = \mathrm{softmax}(W h_{[CLS]} + b)$$
[0026] where $W \in \mathbb{R}^{K \times d}$ is the intent classification weight matrix, $b \in \mathbb{R}^{K}$ is the bias term, and K is the total number of predefined intent categories (such as "query order", "apply for refund", "complaint", etc.); the model outputs the probability distribution $P(y \mid X)$, and the intent category with the highest probability is taken as the user's initial request. In multi-turn dialogue scenarios, to capture the evolution of user requests, an attention-based contextual intent tracking method is adopted: for each round of user input, the attention weights between its semantic vector and the historical intent state are calculated and used to update the current intent state, with the embedding of the intent predicted in the previous round also taken into account; the final output is the evolution trajectory of the core demands over the course of the dialogue. Slot filling process: key information slots in the dialogue are identified by sequence labeling; the semantic vector $h_i$ of each token is input to a conditional random field (CRF) layer for joint decoding:
$$P(y \mid X) = \frac{\exp\left(\sum_{i=1}^{L} (W_s h_i)_{y_i} + \sum_{i=2}^{L} A_{y_{i-1},\, y_i}\right)}{\sum_{y'} \exp\left(\sum_{i=1}^{L} (W_s h_i)_{y'_i} + \sum_{i=2}^{L} A_{y'_{i-1},\, y'_i}\right)}$$
[0027] where $W_s$ is a linear transformation matrix, $A$ is the transition score matrix, and $y_i$ is the tag of the i-th token (e.g., "B-order_number", "I-order_number", "O", etc.); the optimal tag sequence is decoded with the Viterbi algorithm, and slot values such as order number, amount, and product name are extracted from it. Unmet needs identification: the user input in the last round of the current dialogue is compared with the robot's response. If the robot's response contains preset trigger words such as "cannot be processed", "please wait", or "transfer to human", or if the question-answer matching degree shows that the robot failed to solve the user's problem (judged by the intent classification confidence output in step S22 falling below a threshold), then the intent and slots identified in that round of user input are marked as "current unmet needs". For example, intent recognition is performed by inputting the [CLS] vector of each round of dialogue into the intent classification layer; the classification layer is a single-layer feedforward neural network followed by a softmax function, which outputs a probability distribution:
$$P(y \mid X) = \mathrm{softmax}(W h_{[CLS]} + b)$$
[0028] where $h_{[CLS]} \in \mathbb{R}^{d}$ is the semantic vector corresponding to the [CLS] marker, and d is the hidden layer dimension of the pre-trained language model (e.g., d=768 in BERT-base); $W \in \mathbb{R}^{K \times d}$ is the intent classification weight matrix, where K is the total number of predefined intent categories; $b \in \mathbb{R}^{K}$ is the bias term; the softmax function maps a K-dimensional real vector $z$ to a probability distribution, $\mathrm{softmax}(z)_k = e^{z_k} / \sum_{j=1}^{K} e^{z_j}$, satisfying $\sum_{k=1}^{K} \mathrm{softmax}(z)_k = 1$; $P(y \mid X)$ is the conditional probability that the intent category is y given the input dialogue text X. The recognition results in this scenario are shown in Table 2 below:
Table 2
[0029] Slot filling is performed, and key information slots in the dialogue are identified using sequence labeling. The semantic vector of each token is decoded using a BiLSTM-CRF model to obtain a tag sequence. For example, in the 7th sentence "I have a credit card from Bank C", "Bank C" is identified as BANK_NAME and "credit card" as CARD_TYPE; Unmet needs identification: The user input was compared with the robot's response; in the 9th round, the user asked about the credit limit, and the robot replied "a lot, a lot", without giving a specific value; in the 11th round, the user asked about repayment issues, and the robot replied "cannot provide a definite answer"; combined with the trigger word "cannot provide" appearing in the robot's response, it was determined that the user's requests in these two rounds were unmet needs.
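The unmet-needs rule described above (a trigger word in the robot's reply, or a low matching confidence) can be sketched as follows. The names `Round` and `find_unmet`, the confidence threshold 0.5, and the confidence values in the sample data are all hypothetical; only the trigger phrases and the round contents come from the scenario.

```python
from typing import List, NamedTuple

TRIGGER_WORDS = ("cannot be processed", "cannot provide", "please wait", "transfer to human")


class Round(NamedTuple):
    number: int
    user_request: str
    robot_reply: str
    match_confidence: float  # illustrative question-answer matching confidence


def find_unmet(rounds: List[Round], min_confidence: float = 0.5) -> List[int]:
    """Return the round numbers whose user request counts as an unmet need."""
    unmet = []
    for r in rounds:
        has_trigger = any(word in r.robot_reply.lower() for word in TRIGGER_WORDS)
        if has_trigger or r.match_confidence < min_confidence:
            unmet.append(r.number)
    return unmet


rounds = [
    Round(3, "conditions for applying for a credit card", "(a valid answer)", 0.9),
    Round(9, "credit limit", "a lot, a lot", 0.3),
    Round(11, "automatic repayment failure", "cannot provide a definite answer", 0.4),
]

unmet_rounds = find_unmet(rounds)
print(unmet_rounds)
```

In this sample, round 11 is flagged by the trigger phrase, while round 9 is flagged only by the assumed low confidence, since "a lot, a lot" contains no trigger word.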
[0030] The core requirements obtained from the integration are shown in Table 3 below: Table 3
[0031] S23. Based on the semantic vectors, extract key business entities, including but not limited to user identifier, order number, amount, time point, and device number, through a high-precision named entity recognition model, and generate key entity information. In step S23, the high-precision named entity recognition model uses a BiLSTM-CRF model to perform extraction on the semantic vectors. Specifically, the model architecture is a sequence labeling framework based on a pre-trained language model: on top of the semantic vectors $h_i$ output in step S21, a bidirectional long short-term memory network is stacked to capture sequence dependencies.
[0032] Then, the entity type label of each token is obtained by decoding through a linear layer and a CRF layer. Training method: the model is fine-tuned on a customer service dialogue dataset annotated with business entity labels; the training objective is to maximize the log-likelihood of the correct label sequence:
$$\mathcal{L} = \sum_{(X,\, y) \in D} \log P(y \mid X)$$
[0033] where D is the training dataset and $P(y \mid X)$ is calculated by the BiLSTM-CRF model in step S23; training uses the Adam optimizer with a learning rate of 2e-5, a batch size of 32, and 5-10 training epochs. Entity normalization: identified entity mentions (such as "my orders") are mapped to standardized entity identifiers using entity linking; an entity thesaurus and a vector-similarity-based entity matching model are constructed, and a Siamese network architecture is used to calculate the similarity between the mention vector and the entity vectors in the knowledge base, with the most similar entity selected as the standardized output. For example, a named entity recognition model based on BERT-BiLSTM-CRF is adopted: the model first uses BERT to generate semantic vectors, then captures sequence dependencies through the BiLSTM, and finally performs global decoding through the CRF layer. The key entities identified from the dialogue are shown in Table 4 below:
Table 4
[0034] S24. Perform emotion and tone analysis on the complete original dialogue record to determine the evolution trajectory of the user's emotional state and the key turning points of the dialogue, and generate user emotional information. Specifically, the sentiment analysis model is implemented as follows. A multimodal sentiment analysis framework is employed, using both dialogue text and tone features. For the text modality, the semantic vectors are input to the sentiment classification layer:
[0035] The layer outputs the probability distribution over sentiment categories, including "calm", "questioning", "anxious", "angry", etc. For voice customer service scenarios (if voice data exists), acoustic features (such as fundamental frequency, energy, and speech rate) are extracted, voice emotion features are modeled with a temporal convolutional network, and the result is fused with the emotional features of the text. Emotional trajectory modeling: for each round of user input, the sentiment score vector is calculated (M is the number of sentiment categories), and the sentiment time series is constructed. A gated recurrent unit (GRU) is used to model the evolution of emotions.
[0036] By analyzing the rate of change of the emotional trajectory, conversational turns with abrupt shifts in emotional intensity are identified as key turning points (such as the moment a user transitions from calm to anger); when the change in emotional score exceeds the threshold, the round is marked as an emotional turning point. For example, a multimodal sentiment analysis framework is used to classify the sentiment of each round of user input; the sentiment categories include calm, curiosity, anticipation, anxiety, and anger. The sentiment analysis results for each round of user input are as follows: Table 5
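The turning-point rule described above (mark a round when the change in emotional score exceeds a threshold) can be illustrated with a short sketch. The intensity values below are hypothetical, the function name is an assumption, and a plain difference between consecutive user rounds stands in for the GRU-based trajectory model; the threshold of 0.5 is the example value used in this embodiment.

```python
def find_turning_points(scores, threshold=0.5):
    """Return the round numbers whose intensity changed by more than `threshold`.

    `scores` is an ordered list of (round_number, intensity) pairs for the
    user's rounds; each round is compared with the previous user round.
    """
    turning_points = []
    for (_, previous), (round_no, current) in zip(scores, scores[1:]):
        if abs(current - previous) > threshold:
            turning_points.append(round_no)
    return turning_points


# Hypothetical emotional-intensity scores for the user's rounds 1 to 11.
user_scores = [(1, 0.1), (3, 0.1), (5, 0.2), (7, 0.3), (9, 0.3), (11, 0.9)]

turning_points = find_turning_points(user_scores, threshold=0.5)
print(turning_points)
```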
[0037] A gated recurrent unit is used to model the trajectory of emotional evolution, and the change in emotional score is calculated. The change in emotional score in round 11 compared with round 9 is analyzed; if it exceeds the threshold of 0.5, round 11 is marked as an emotional turning point.
S25. Integrate core demand information, key entity information, and user sentiment information to build and output a structured core fact base. For example, the structured core fact base can be in JSON format.
S3. Using the structured core fact base as input, generate a transfer summary containing multiple information granularities. The specific steps of step S3 are as follows:
S31. Using a template-driven approach, the business entities and status information in the structured core fact base are filled into a preset summary template to generate a first-granularity summary, so as to ensure the accurate transmission of core facts. For example, the constructed summary template is as follows:
User Information: User ID: {user_id}, Current Emotion: {current_emotion}
Resolved issues: {resolved_count} items {resolved_list}
Unresolved issues: {unresolved_count} items {unresolved_list}
Key Entities: {entity_list}
User emotional trajectory: {emotion_trajectory}
The template is populated with data from the structured core fact base, and the first-granularity summary is generated as follows:
User Information: User ID: ZS001, Current Emotion: Anger
Resolved issues: 3 items
- Inquire about credit card application requirements (answered)
- Inquire about the eligibility requirements for applying for a credit card using an existing card (answered)
- Check credit limit (ID card information only, answered)
Unresolved issues: 2 items
- Check credit limit (with property ownership): the bot did not provide a specific limit, only replying "a lot, a lot"
- Inquire about handling of automatic repayment failure: the chatbot stated it could not provide a definitive answer
Key Entities:
- Bank name: Bank C
- Holding period: Six months
- Credit history: No negative records
- Asset information: Three properties in City D
- Problem type: Insufficient debit card balance, installment bill
User emotional trajectory: Initially calm, turning to curiosity in round 5, turning to anticipation in rounds 7-9, turning to anger in round 11 due to unresolved issues.
S32. Call the conditional generation model, using the structured core fact base as conditional input, to generate a coherent narrative paragraph describing the problem background, evolution process, and current contradictions, and obtain a second-granularity summary.
Model architecture selection: In step S32, the conditional generation model adopts a Transformer-based sequence-to-sequence architecture, specifically the BART or T5 model, which performs well in natural language generation tasks. Taking BART as an example, its architecture includes:
Encoder: A bidirectional Transformer encoder used to understand the contextual information of the input sequence. The encoder consists of multiple self-attention layers and feedforward networks, with each layer outputting a context-sensitive vector representation.
Decoder: An autoregressive Transformer decoder. When generating each token, the decoder attends to the encoder's output through a cross-attention mechanism, while masked self-attention prevents it from seeing future tokens.
Input construction method: The structured core fact base is structured data in JSON format, which needs to be converted into a text sequence suitable for model input; this step uses a prompt template to linearize the structured data into a natural language description.
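The input construction described above (linearizing the JSON fact base into a prompt) can be sketched as follows. The field names (`user_id`, `demands`, `emotion_trajectory`) and the function name are assumptions made for illustration; the source does not specify the JSON schema. The demand texts are taken from this embodiment's example.

```python
def linearize(fact_base):
    """Turn a structured core fact base (dict) into a natural-language prompt."""
    lines = [
        f"User {fact_base['user_id']} raised the following core demands "
        "during the conversation:"
    ]
    # One numbered line per core demand, with its resolution status.
    for index, demand in enumerate(fact_base["demands"], start=1):
        lines.append(f"{index}. {demand['text']} ({demand['status']})")
    lines.append(
        "User emotional trajectory: " + " → ".join(fact_base["emotion_trajectory"])
    )
    lines.append(
        "Based on the above information, please generate a coherent summary "
        "describing the user's problem background, evolution process, and the "
        "core unresolved issues."
    )
    return "\n".join(lines)


# Hypothetical schema; values follow the example dialogue in this embodiment.
FACT_BASE = {
    "user_id": "ZS001",
    "demands": [
        {"text": "Inquire about credit card application requirements", "status": "answered"},
        {"text": "Check credit limit (ID information only)", "status": "answered"},
        {"text": "Check credit limit (with property ownership)", "status": "unresolved"},
        {"text": "Failed automatic credit card repayment", "status": "unresolved"},
    ],
    "emotion_trajectory": ["Calm", "Curiosity", "Expectation", "Anger"],
}

prompt = linearize(FACT_BASE)
print(prompt)
```

The same dictionary could also fill the first-granularity template of step S31, for example through `str.format`, so that both granularities draw on a single source of facts.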
Model fine-tuning training: The conditional generation model is not a general-purpose model, but a specialized model fine-tuned on a customer service dialogue summary dataset. The fine-tuning process is as follows:
Dataset construction: Collect a large number of real customer service dialogue records and their corresponding manually written summaries, convert the dialogue records into structured core fact bases (automatically generated using the method in step S2), and form training pairs of (structured core fact base, summary); the dataset contains no fewer than 100,000 dialogues.
Training objective: Minimize the cross-entropy loss between the generated and target summaries using maximum likelihood estimation; the loss function is:
[0038] $$\mathcal{L} = -\sum_{t=1}^{T} \log P\left(y_t \mid y_{<t}, x\right)$$ where $x$ is the input prompt, $y_t$ is the t-th token in the target summary, and $T$ is the length of the target summary.
Optimization parameters: Use the Adam optimizer with an initial learning rate of 3e-5, a batch size of 16, and 5 training epochs; gradient clipping and early stopping are employed to prevent overfitting.
Decoding strategy: Beam search is used during generation, with a beam width of 4. A length penalty and a repetition penalty are also set to avoid generating lengthy or repetitive content.
[0039] For example, a BART-large model fine-tuned on a customer service dialogue summary dataset is used as the conditional generation model; firstly, the structured core fact base is converted into prompt text: User ZS001 raised the following core demands during the conversation: 1. Inquire about credit card application requirements (answered) 2. Inquire about the eligibility requirements for applying for a credit card using a credit card (answered; the user has held a Bank C credit card for six months and has no negative records). 3. Check credit limit (ID information only, already answered: above 3000) 4. Check credit limit (There is a real estate situation, which is unresolved. The user stated that they own three properties in City D). 5. Issue with failed automatic credit card repayment (unresolved; insufficient debit card balance to repay installment bills) User emotional trajectory: Calm → Curiosity → Expectation → Anger (11th turning point) Based on the above information, please generate a coherent summary describing the user's problem background, evolution process, and the core unresolved issues. Input the above prompts into the BART model, and use beam search (beam size=4) to generate a summary; the second-granularity summary output by the model is as follows: User ZS001 initially inquired about the application requirements for a Bank B credit card and confirmed that they met the basic requirements. They then inquired about the requirements for applying for a card using an existing card, stating that they held a Bank C credit card with no negative records in the past six months, thus meeting the requirements. Next, the user inquired about the credit limit. After learning that the limit was over 3000 with only an ID card, they further disclosed that they owned three properties in City D, hoping for a more accurate credit limit assessment. However, the chatbot only replied "a lot, a lot," without providing a specific limit. 
The user became dissatisfied, and their expectations turned into anger. Finally, the user raised the issue of insufficient debit card balance for automatic credit card repayment and inability to repay installment bills, which the chatbot stated it could not answer. Currently, the user's two core issues (credit limit with property ownership and handling of failed automatic repayment) remain unresolved, and the user's emotions have escalated to anger.
S33. Call the large language model to perform grammatical error correction and sentence structure optimization on the second-granularity summary, and visually enhance key content according to information importance to generate the third-granularity summary. In step S33, the large language model employs a Transformer-based generative pre-trained model (such as GPT-3.5, GPT-4, ChatGLM, or Wenxin Yiyan). This model has strong language understanding and generation capabilities and can perform tasks such as text polishing and summary optimization. In actual deployment, the model service can be accessed via API calls or deployed locally. For example, the GPT-3.5 model is called for summary optimization with the following prompt:
You are a professional customer service summary optimization assistant. Please optimize the following summary:
1. Grammar correction: Correct any grammatical errors or unfluent sentences.
[0040] 2. Sentence structure optimization: Avoid long sentences and use appropriate sentence breaks to make the expression clearer.
[0041] 3. Visual reinforcement: Identify the most important information (such as the user's core needs, unresolved issues, and emotional shifts) and mark these key contents in the text with **bold**.
[0042] Original summary: User ZS001 initially inquired about the application requirements for a Bank B credit card and confirmed that they met the basic requirements. They then inquired about the conditions for applying for a card using an existing card, stating that they held a Bank C credit card with no negative records in the past six months, thus meeting the requirements. Next, the user inquired about the credit limit. After learning that the limit was over 3000 with only an ID card, they further disclosed that they owned three properties in City D, hoping for a more accurate credit limit assessment. However, the chatbot only replied "a lot, a lot," without providing a specific limit. The user became dissatisfied, and their emotions shifted from expectation to anger. Finally, the user raised the issue of insufficient debit card balance for automatic credit card repayment and inability to repay installment bills, which the chatbot stated it could not answer. Currently, the user's two core issues (credit limit with property ownership and handling of failed automatic repayments) remain unresolved, and the user's emotions have escalated to anger.
[0043] The optimized summary returned by the large language model is as follows: User ZS001 initially inquired about the application requirements for a Bank B credit card and confirmed that they met the basic requirements. They then inquired about the requirements for applying for a card using an existing card, stating that they held a Bank C credit card and had no negative records within the past six months, thus meeting the requirements.
[0044] The user then inquired about their **credit limit**. After learning that the limit was over 3000 with only an ID card, they further disclosed that they owned **three properties in City D**, hoping for a more accurate credit limit assessment. However, the chatbot only replied "a lot," **without providing a specific limit**. [Important] This caused the user dissatisfaction, and their emotions shifted from expectation to **anger**.
[0045] Finally, the user raised the issue of **credit card automatic repayment failure** (insufficient debit card balance to repay installment bills), to which the robot stated it could not provide an answer. [Important] Currently, the user's **two core issues** (credit limit when owning property and handling automatic repayment failures) remain unresolved, and the **user's emotions have escalated to anger**.
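The optimized summary above marks key content with `**` pairs, which have to be turned into real emphasis before display. A minimal sketch of that conversion is given below, assuming HTML is the rich text target on the agent workbench (the source does not name a format); the function names are hypothetical.

```python
import html
import re

# Non-greedy match of text enclosed in a pair of ** markers.
_BOLD = re.compile(r"\*\*(.+?)\*\*")


def has_unbalanced_markers(text):
    """True if the text contains an odd number of ** markers."""
    return text.count("**") % 2 == 1


def render_bold(text):
    """Escape HTML special characters, then convert **x** to <strong>x</strong>."""
    escaped = html.escape(text, quote=False)
    return _BOLD.sub(r"<strong>\1</strong>", escaped)


sample = "The user then inquired about their **credit limit**."
rendered = render_bold(sample)
print(rendered)
```

Checking `has_unbalanced_markers` before rendering gives a simple way to detect malformed model output, in which case the text could be shown without emphasis rather than with stray markers.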
[0046] Model invocation and post-processing: The constructed prompt is sent to the large language model with the temperature parameter set to 0.3 (to keep the output stable and accurate). The maximum output length is set according to the length of the original summary (e.g., 512 tokens). The model returns the optimized summary text, which is then post-processed to ensure the markup format is correct, for example by converting ** markers into bold formatting for display.
S34. Based on the structured core fact base, and in conjunction with the enterprise's internal knowledge base, recommend solutions for similar cases and generate a fourth-granularity summary. For example, key entities in the structured core fact base (such as "credit limit", "automatic repayment failure", "insufficient debit card balance") can be used as query conditions to search the enterprise's internal knowledge base and find solutions for similar cases. The search results are shown in Table 6 below: Table 6
[0047] The generated fourth-granularity summary is as follows:
Knowledge base related recommendations: Based on the user's question, the system recommends the following solutions for customer service reference:
1. Credit limit issue: The user owns three properties in City D; it is advisable to provide proof of ownership to apply for a higher credit limit.
[0048] Processing path: Go to the credit limit management module -> Select "Asset Certificate Credit Limit Increase" -> Upload property certificate -> Manual review
Reference case: #20240215-003
2. Automatic repayment failure issue: Insufficient debit card balance caused automatic repayment to fail.
[0049] Recommendations: a) Guide users to make manual repayments via mobile banking; b) Suggest users set up a backup repayment card.
Reference case: #20240122-017
Note: The above solutions are for reference only. Please adjust them according to the user's actual situation.
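The retrieval behind the fourth-granularity summary (key entities used as query conditions against the internal knowledge base) can be sketched as below. The knowledge base structure, the keyword sets, the third entry, and the function name are hypothetical; only the two reference case numbers come from the example above. Keyword overlap stands in for whatever search the enterprise knowledge base actually provides.

```python
def recommend(query_terms, knowledge_base, top_k=2):
    """Return up to `top_k` cases ranked by how many query terms they share."""
    query = set(query_terms)
    scored = []
    for case in knowledge_base:
        overlap = len(case["keywords"] & query)
        if overlap:  # ignore cases with no term in common
            scored.append((overlap, case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for _, case in scored[:top_k]]


# Hypothetical knowledge base entries.
KNOWLEDGE_BASE = [
    {"case_id": "#20240215-003",
     "keywords": {"credit limit", "property"},
     "solution": "Provide proof of property ownership to apply for a higher limit."},
    {"case_id": "#20240122-017",
     "keywords": {"automatic repayment failure", "insufficient debit card balance"},
     "solution": "Guide manual repayment; suggest a backup repayment card."},
    {"case_id": "#EXAMPLE-000",  # made-up unrelated case
     "keywords": {"lost card"},
     "solution": "Freeze the card and issue a replacement."},
]

query = ["credit limit", "automatic repayment failure", "insufficient debit card balance"]
recommended_ids = [case["case_id"] for case in recommend(query, KNOWLEDGE_BASE)]
print(recommended_ids)
```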
[0050] At this point, four summaries at different granularities have been generated:
First granularity: structured data filling, to ensure the accuracy of core facts;
Second granularity: coherent narrative paragraphs, describing the complete context;
Third granularity: optimized readable version with visual enhancement of key content;
Fourth granularity: knowledge base-related recommendations, providing solution references.
S4. Perform semantic fidelity closed-loop verification on the transfer summary and compare it with the structured core fact base. If the comparison results are consistent, the verification is deemed to have passed; otherwise, the verification is deemed to have failed. The specific steps of step S4 are as follows:
S41. Call the independent information restoration engine to perform reverse information extraction on the transfer summary to obtain the information set to be verified. For example, using a NER and intent recognition model with the same architecture as in step S2 but deployed independently, reverse information extraction is performed on the generated third-granularity summary; the extraction results are shown in Table 7 below: Table 7
[0051] The extracted results are used to construct the set of information to be verified, $A$.
S42. Perform a precise difference comparison between the information set to be verified and the structured core fact base, and calculate the fact consistency score. The specific steps of step S42 are as follows:
S421. Define the set of information to be verified as the set $A = \{a_1, \dots, a_m\}$ and the structured core fact base as the set $B = \{b_1, \dots, b_n\}$. Each element $a_i$ and $b_j$ is a structured fact unit containing semantic information, including but not limited to entities, relations, and attribute-value pairs. For example, $A$ contains 18 information units and $B$ contains 20 information units (including intent, entity, emotion, etc.).
S422. For each element $a_i$ in set $A$, traverse each element $b_j$ in set $B$ and use a pre-trained semantic similarity model (such as Sentence-BERT or SimCSE) to calculate the semantic similarity score $\mathrm{sim}(a_i, b_j)$ between $a_i$ and $b_j$, whose range is [0,1].
S423. Record the maximum similarity score between $a_i$ and all $b_j$, together with the corresponding best matching element $b_{j^*}$: $s_i = \max_j \mathrm{sim}(a_i, b_j)$, $j^* = \arg\max_j \mathrm{sim}(a_i, b_j)$.
S424. Determine whether the maximum similarity score $s_i$ is greater than or equal to a preset semantic equivalence threshold $\tau$ (for example, $\tau = 0.85$). If so, $a_i$ and $b_{j^*}$ are determined to be semantically equivalent: the pair $(a_i, b_{j^*})$ is included in the intersection count, $b_{j^*}$ is temporarily removed from set $B$ (to avoid duplicate matching), and the number of intersection elements is accumulated, $N_\cap \leftarrow N_\cap + 1$. If not, $a_i$ and $b_{j^*}$ are determined to be semantically inequivalent, and $a_i$ is temporarily removed from the elements of $A$ still to be processed (to avoid duplicate matching). For example, for each element $a_i$ in set $A$, a pre-trained Sentence-BERT model is used to compute its semantic similarity with each element $b_j$ in set $B$; some matching results are shown in Table 8 below: Table 8
[0052] S425. Determine whether all elements in set $A$ have been processed. If not, repeat steps S422-S424; if so, obtain the final number of intersection elements $N_\cap$.
S426. Calculate the number of elements in the union: $N_\cup = |A| + |B| - N_\cap$.
S427. Calculate the factual consistency score $S = N_\cap / N_\cup$ and output the result for the verification decision in step S43. For example, after iterating and matching: $N_\cap = 18$ (all information to be verified has been matched in the core fact base), $|A| = 18$, $|B| = 20$, $N_\cup = 18 + 20 - 18 = 20$, and the factual consistency score is $S = 18 / 20 = 0.90$.
S43. Determine whether the factual consistency score has reached the preset threshold. If so, it is considered a complete match, and the verification passes; if not, semantic distortion is determined to exist, and the verification fails. For example, the preset threshold is T = 0.85; the calculated value S = 0.90 > 0.85, therefore the verification is deemed successful.
If the verification fails in step S43, perform any of the following remedial measures:
The first measure is to return the transfer summary to step S3 for correction;
The second measure is to downgrade the push, pushing only the structured core fact base in structured data format to the human customer service agents;
The third measure is to push the original dialogue record and highlight on the interface the risks and specific differences in the summary generation.
S5. Push the verified transfer summary to the human customer service agent's terminal. The specific steps of step S5 are as follows:
S51. Render the verified transfer summary in rich text format and push it to the target human agent's workstation in real time.
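The matching and scoring procedure of steps S421 to S427 and the threshold decision of step S43 can be sketched as follows. This is a simplified illustration: the token-overlap similarity is a stand-in for a pre-trained model such as Sentence-BERT, and all function and variable names are hypothetical. The demo data are synthetic strings chosen only to reproduce the 18-versus-20 example above.

```python
def token_similarity(a, b):
    """Stand-in similarity in [0, 1]: Jaccard overlap of lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_score(to_verify, fact_base, sim=token_similarity, tau=0.85):
    """Greedy one-to-one matching, then intersection over union."""
    remaining = list(fact_base)  # working copy; matched facts are removed
    matched = 0
    for item in to_verify:
        if not remaining:
            break
        best = max(remaining, key=lambda fact: sim(item, fact))
        if sim(item, best) >= tau:   # semantically equivalent
            matched += 1
            remaining.remove(best)   # avoid matching the same fact twice
    union = len(to_verify) + len(fact_base) - matched
    return matched / union if union else 1.0


def passes_verification(score, threshold=0.85):
    """Step S43: the score must reach the preset threshold."""
    return score >= threshold


# Synthetic data: 18 extracted units, all present among 20 core facts.
extracted = [f"fact {i}" for i in range(18)]
core_facts = [f"fact {i}" for i in range(20)]

score = consistency_score(extracted, core_facts)
print(score, passes_verification(score))
```

Because the fact base contributes to the union even when the summary omits a fact, the score drops both when the summary invents information and when it leaves information out, which is what makes it usable as a fidelity check.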
For example, the system integrates the verified third-granularity summary and fourth-granularity summary, renders them in rich text format, and pushes them to human customer service; the push content is as follows:
Human-Machine Transfer Summary: User ID: ZS001 | Transfer Time: 09:31:40
Important Note: User's current emotion: Anger
Dialogue summary: User ZS001 initially inquired about the application requirements for a Bank B credit card and confirmed that they met the basic requirements. They then inquired about the requirements for applying for a card using an existing card, stating that they held a Bank C credit card and had no negative records within the past six months, thus meeting the requirements.
[0053] The user then inquired about their **credit limit**. After learning that the limit was over 3000 with only an ID card, they further disclosed that they owned **three properties in City D**, hoping for a more accurate credit limit assessment. However, the chatbot only replied "a lot," **without providing a specific limit**. [Important] This caused the user dissatisfaction, and their emotions shifted from expectation to **anger**.
[0054] Finally, the user raised the issue of **credit card automatic repayment failure** (insufficient debit card balance to repay installment bills), to which the robot stated it could not provide an answer. [Important] Currently, the user's **two core issues** (credit limit when owning property and handling automatic repayment failures) remain unresolved, and the **user's emotions have escalated to anger**.
[0055] Knowledge base recommendations: Based on the user's problem, the following solutions are recommended:
1. Credit limit issue: The user owns three properties in City D. It is recommended that the user provide proof of property ownership to apply for a higher credit limit.
[0056] Processing path: Go to the credit limit management module -> Select "Asset Certificate Credit Limit Increase" -> Upload property certificate -> Manual review
Reference case: #20240215-003
2. Automatic repayment failure: Insufficient debit card balance caused automatic repayment to fail.
[0057] Recommendations: a) Guide users to make manual repayments via mobile banking; b) Suggest users set up a backup repayment card.
Reference case: #20240122-017
Operation entrance: [View original conversation][View structured data details][Start service]
S52. Send a transfer notification message to the user to inform them that the problem has been fully transmitted. For example, the system sends a notification message to the user: System notification: You are being transferred to a dedicated customer service representative. Your question has been fully communicated; no further explanation is needed. Please wait...
S53. Provide interactive entry points on the human customer service agent workbench to view the original conversation records and view dataset details. For example, customer service representatives can click the "View Original Conversation" button to view the complete 12-round original conversation record, and click the "View Structured Data Details" button to view the JSON-formatted structured core fact base generated in step S2.
[0058] At this point, the entire human-machine transfer process is complete. Based on the above high-fidelity summary, the human customer service representative can immediately provide targeted service addressing the user's two unresolved issues and their anger without needing to repeatedly ask for background information.
[0059] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0060] As shown in Figure 2, the following are embodiments of the intelligent customer service dialogue human-machine transfer system based on multi-granularity semantic fidelity provided in this disclosure. This system and the intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity in the above embodiments belong to the same inventive concept. For details not described in the embodiments of the system, please refer to the embodiments of the method described above.
[0061] The system includes:
The human-machine interaction trigger module, used to respond to the human-machine interaction trigger signal, and to lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment;
The semantic parsing module, used to perform deep semantic parsing on the complete original dialogue record and extract a structured core fact base containing core demands, key entities, and user emotions;
The summary generation module, used to generate a transfer summary containing multiple information granularities, taking the structured core fact base as input;
The verification module, used to perform semantic fidelity closed-loop verification on the transfer summary and compare it with the structured core fact base; if the comparison results are consistent, the verification is deemed to have passed; otherwise, the verification is deemed to have failed;
The push module, used to send the verified transfer summary to the human customer service agent's terminal.
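How the five modules listed above might be wired together can be sketched as follows. This is only an illustration: the class name, the callable signatures, and the stub implementations are hypothetical, and the fallback shown on verification failure corresponds to the second remedial measure (pushing only the structured core fact base).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HandoffPipeline:
    """Hypothetical wiring of parsing, summary, verification, and push modules."""
    parse: Callable[[List[str]], dict]      # semantic parsing module
    summarize: Callable[[dict], str]        # summary generation module
    verify: Callable[[str, dict], bool]     # verification module
    push: Callable[[str], None]             # push module

    def run(self, dialogue):
        # Trigger module: freeze the record so later steps cannot modify it.
        locked = tuple(dialogue)
        facts = self.parse(list(locked))
        summary = self.summarize(facts)
        if self.verify(summary, facts):
            self.push(summary)
            return "summary"
        # Verification failed: downgrade and push the fact base only.
        self.push(repr(facts))
        return "fact_base"


# Stub modules for demonstration only.
pushed = []
pipeline = HandoffPipeline(
    parse=lambda turns: {"turns": len(turns)},
    summarize=lambda facts: f"{facts['turns']} turns recorded",
    verify=lambda summary, facts: str(facts["turns"]) in summary,
    push=pushed.append,
)
outcome = pipeline.run(["user: hello", "bot: hi, how can I help?"])
print(outcome, pushed)
```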
[0062] Through the interactive collaboration of the human-machine interaction trigger module, semantic parsing module, summary generation module, verification module, and push module, and through the end-to-end mechanism of semantic parsing, multi-granularity summary generation, and reverse comparison verification, this embodiment achieves lossless transmission of key information and allows human customer service to understand the case within seconds.
[0063] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for human-machine transfer in intelligent customer service dialogue based on multi-granularity semantic fidelity, characterized in that it includes the following steps: S1. In response to the human-machine interaction signal, lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment; S2. Perform deep semantic analysis on the complete original dialogue record to extract a structured core fact base containing core demands, key entities, and user emotions; S3. Using the structured core fact base as input, generate a transfer summary containing multiple information granularities; S4. Perform semantic fidelity closed-loop verification on the transfer summary and compare it with the structured core fact base. If the comparison results are consistent, the verification is deemed to have passed; otherwise, the verification is deemed to have failed; S5. Push the verified transfer summary to the human customer service agent's terminal.
2. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 1, characterized in that, The specific steps of step S1 are as follows: S11. Monitor the dialogue flow between the user and the intelligent robot in real time, and generate a human-machine transition trigger signal when the preset transition trigger condition is met; S12. In response to the human-machine transition signal, lock all original dialogue records from the start of the current session to the trigger time from the dialogue buffer, and prohibit any modification operations on all the original dialogue records; S13. Read and save the complete, locked original dialogue record from the dialogue buffer as the sole source of facts for subsequent processing.
3. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 1, characterized in that, The specific steps of step S2 are as follows: S21. Call the pre-trained language model to encode the complete original dialogue record into a semantic vector to preserve the deep semantics of the original dialogue; S22. Based on the semantic vector, the user's initial request, the evolved core request, and the currently unmet needs are extracted through the intent recognition model and slot filling method to generate core request information; S23. Based on the semantic vector, extract key business entities, including but not limited to user identifier, order number, amount, time point and device number, through a high-precision named entity recognition model, and generate key entity information; S24. Perform emotion and tone analysis on the complete original dialogue record to determine the evolution trajectory of the user's emotional state and the key turning points of the dialogue, and generate user emotional information; S25. Integrate core demand information, key entity information, and user sentiment information to build and output a structured core fact base.
4. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 3, characterized in that, The specific steps of step S3 are as follows: S31. Using a template-driven approach, the business entities and status information in the structured core fact base are filled into a preset summary template to generate a first-granularity summary, so as to ensure the accurate transmission of core facts; S32. Call the conditional generation model, using the structured core fact base as conditional input, to generate a coherent narrative paragraph describing the problem background, evolution process, and current contradictions, and obtain a second-granularity summary; S33. Call the large language model to perform grammatical error correction and sentence structure optimization on the second-granularity summary, and visually enhance key content according to information importance to generate the third-granularity summary; S34. Based on the structured core fact base, and in conjunction with the enterprise's internal knowledge base, recommend solutions for similar cases and generate a fourth-level summary.
5. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 4, characterized in that, In step S21, a pre-trained language model based on the Transformer architecture is used to encode the complete original dialogue record; In step S23, the high-precision named entity recognition model uses the BiLSTM-CRF model to extract semantic vectors; In step S32, the conditional generation model adopts a Transformer-based sequence-to-sequence architecture, specifically using the BART or T5 model. In step S33, the large language model adopts a Transformer-based generative pre-trained model.
6. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 1, characterized in that, The specific steps of step S4 are as follows: S41. Call the independent information restoration engine to perform reverse information extraction on the transfer summary to obtain the information set to be verified; S42. Perform a precise difference comparison between the information set to be verified and the structured core fact base, and calculate the fact consistency score; S43. Determine whether the factual consistency score has reached the preset threshold; If so, it is considered a complete match, and the verification passes; If not, then semantic distortion is determined to exist, and the verification fails.
7. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 6, characterized in that, The specific steps of step S42 are as follows: S421. Define the set of information to be verified as the set $A = \{a_1, \dots, a_m\}$ and the structured core fact base as the set $B = \{b_1, \dots, b_n\}$; each element $a_i$ and $b_j$ is a structured fact unit containing semantic information, including but not limited to entities, relations, and attribute-value pairs; S422. For each element $a_i$ in set $A$, traverse each element $b_j$ in set $B$ and use a pre-trained semantic similarity model to calculate the semantic similarity score $\mathrm{sim}(a_i, b_j)$ between $a_i$ and $b_j$, whose range is [0,1]; S423. Record the maximum similarity score between $a_i$ and all $b_j$, together with the corresponding best matching element $b_{j^*}$: $s_i = \max_j \mathrm{sim}(a_i, b_j)$, $j^* = \arg\max_j \mathrm{sim}(a_i, b_j)$; S424. Determine whether the maximum similarity score $s_i$ is greater than or equal to a preset semantic equivalence threshold $\tau$; if so, determine that $a_i$ and $b_{j^*}$ are semantically equivalent, include the pair $(a_i, b_{j^*})$ in the intersection count, temporarily remove $b_{j^*}$ from set $B$, and accumulate the number of intersection elements, $N_\cap \leftarrow N_\cap + 1$; if not, determine that $a_i$ and $b_{j^*}$ are semantically inequivalent, and temporarily remove $a_i$ from the elements of $A$ still to be processed; S425. Determine whether all elements in set $A$ have been processed; if not, repeat steps S422-S424; if so, obtain the final number of intersection elements $N_\cap$; S426. Calculate the number of elements in the union, $N_\cup = |A| + |B| - N_\cap$; S427. Calculate the factual consistency score $S = N_\cap / N_\cup$ and output it for the verification decision in step S43.
8. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 6, characterized in that, if the verification fails in step S43, any of the following remedial measures is performed: the first measure is to return the transfer summary to step S3 for correction; the second measure is to downgrade the push, pushing only the structured core fact base, in structured data format, to the human customer service agent; the third measure is to push the original dialogue record and highlight on the interface the risks and specific differences in the summary generation.
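The three remedial measures can be illustrated with a small Python sketch that builds a different handoff payload for each measure. All names here (`Remedy`, `HandoffPayload`, `build_fallback`, and the payload keys) are hypothetical, and the claim does not specify how a measure is selected, so the choice is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List


class Remedy(Enum):
    REGENERATE = auto()      # measure 1: return the summary to step S3
    PUSH_FACT_BASE = auto()  # measure 2: downgraded push, fact base only
    PUSH_ORIGINAL = auto()   # measure 3: original dialogue plus highlighted differences


@dataclass
class HandoffPayload:
    remedy: Remedy
    body: Dict[str, object]


def build_fallback(
    remedy: Remedy,
    fact_base: Dict[str, object],
    dialogue: List[str],
    differences: List[str],
) -> HandoffPayload:
    """Build the payload for the chosen remedial measure after a failed S43."""
    if remedy is Remedy.REGENERATE:
        # Hand the detected differences back so step S3 can correct the summary
        return HandoffPayload(remedy, {"action": "rerun_S3", "differences": differences})
    if remedy is Remedy.PUSH_FACT_BASE:
        # Skip the free-text summary and send only the structured data
        return HandoffPayload(remedy, {"fact_base": fact_base})
    # Send the raw record and flag the risk so the agent's interface can highlight it
    return HandoffPayload(
        remedy,
        {"dialogue": dialogue, "risk_flag": True, "differences": differences},
    )
```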
9. The intelligent customer service dialogue human-machine transfer method based on multi-granularity semantic fidelity according to claim 1, characterized in that the specific steps of step S5 are as follows: S51. Render the verified transfer summary in rich text format and push it to the target human agent's workstation in real time; S52. Send a transfer notification message to the user to inform them that the problem has been fully transmitted; S53. Provide an interactive entry point on the human customer service agent's workbench for viewing the original dialogue record and the dataset details.
10. An intelligent customer service dialogue human-machine transfer system based on multi-granularity semantic fidelity, characterized in that it comprises: a human-machine interaction trigger module, used to respond to the human-machine interaction trigger signal and to lock and acquire the complete original dialogue record from the start of the dialogue to the trigger moment; a semantic parsing module, used to perform deep semantic parsing on the complete original dialogue record and extract a structured core fact base containing core demands, key entities and user emotions; a summary generation module, used to generate a transfer summary containing multiple information granularities, taking the structured core fact base as input; a verification module, used to perform semantic fidelity closed-loop verification on the transfer summary by comparing it with the structured core fact base, wherein the verification is deemed to have passed if the comparison results are consistent and to have failed otherwise; and a push module, used to send the verified transfer summary to the human customer service agent's terminal.
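How the five modules fit together can be sketched in Python as a small pipeline object whose modules are injected as callables. The class name `HandoffSystem`, its field names, and the `on_trigger` method are hypothetical, and the actual parsing, summarization, and verification logic is left to the injected functions; this only illustrates the order of operations described in the claim.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HandoffSystem:
    parse: Callable[[List[str]], Dict[str, object]]   # semantic parsing module
    summarize: Callable[[Dict[str, object]], str]     # summary generation module
    verify: Callable[[str, Dict[str, object]], bool]  # verification module
    push: Callable[[str], None]                       # push module

    def on_trigger(self, dialogue: List[str]) -> bool:
        """Trigger module: lock the record, then run the remaining modules in order."""
        locked = tuple(dialogue)  # immutable snapshot up to the trigger moment
        facts = self.parse(list(locked))
        summary = self.summarize(facts)
        if not self.verify(summary, facts):
            return False  # verification failed, nothing is pushed
        self.push(summary)
        return True
```

A toy wiring, with lambdas standing in for the real modules:

```python
sent: List[str] = []
system = HandoffSystem(
    parse=lambda d: {"core_demand": d[-1]},
    summarize=lambda f: f"Demand: {f['core_demand']}",
    verify=lambda s, f: str(f["core_demand"]) in s,
    push=sent.append,
)
system.on_trigger(["hi", "refund please"])
```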