Agent script recommendation system, method, apparatus, electronic device, and storage medium

The agent front end and the server back end work together to dynamically extract key elements of a call and identify user emotions, then generate and render recommended scripts. This addresses the high token cost and slow recommendation speed of existing technologies, and improves agent response speed and user experience.

CN122309675A · Pending · Publication Date: 2026-06-30 · XIAMEN XINGZONG DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: XIAMEN XINGZONG DIGITAL TECH CO LTD
Filing Date: 2026-04-03
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, generating scripts for agents in call centers or customer service workstations suffers from high token costs, slow recommendation speed, and a lack of context awareness, which leads to slower response times and a poor user experience.

Method used

The agent front end maintains a dynamic sliding window and a long-term entity solidification area, extracts key elements from the real-time call, and identifies user emotions to generate a script request payload. The server back end performs similarity retrieval and uses a large language model to generate recommended scripts, which are then rendered and displayed at the agent front end.

Benefits of technology

Reduces token costs, improves the speed and accuracy of recommended scripts, enhances user experience, and lowers training costs for new agents.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides an agent script recommendation system, method, apparatus, electronic device, and storage medium, applicable to the field of artificial intelligence technology. The system extracts key elements from real-time call text using a sliding window and stores them in a short-term memory area (STM) and a long-term entity solidification area (LTS), so that the request payload can be generated from only the STM, the LTS, and the emotion tag. Compared with existing technologies that transmit the call text directly to a server back end, this significantly reduces text transmission volume and token costs while preserving contextual information and the related structured information, thereby increasing the speed at which recommended scripts are generated. Furthermore, by recognizing user emotions and setting a preset style, a large language model can generate conversational recommended scripts in the preset style, thereby calming user emotions and improving user experience.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to an agent script recommendation system, method, apparatus, electronic device, and storage medium.

Background Technology

[0002] In call centers, unified communications workstations, or omnichannel customer service workstations, agents typically need to perform multiple tasks simultaneously when talking to users, such as listening to questions, understanding intent, searching the knowledge base, and organizing responses.

[0003] In existing technologies, agents can obtain recommended responses by matching keywords or by inputting the entire STT (Speech-to-Text) transcript of the call into a large language model. However, this approach involves a large data transfer volume and high token costs, which slows the recommendation of suggested responses.

Summary of the Invention

[0004] The purpose of this application is to provide an agent script recommendation system, method, apparatus, electronic device, and storage medium that reduce token costs and improve the speed of script recommendation. The specific technical solution is as follows:
In a first aspect of this application, an agent script recommendation system is provided, the system comprising an agent front end and a server back end.
The agent front end is used to:
acquire real-time call recordings, and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text, the real-time call recordings including real-time calls between agents and users;
maintain a dynamic sliding window in memory, retaining the complete original dialogue text of the most recent preset number of rounds as a short-term memory area; extract key elements from the complete original dialogue text, and store structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides;
identify the user's emotions and, when the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag representing the user's current emotion category, and send it to the server back end.
The server back end is used to:
upon receiving the script request payload, perform a similarity search in a preset database based on the latest question in the complete original dialogue text, obtaining multiple search results;
construct a structured prompt based on the search results, the emotion tag, the long-term entity solidification area, the preset style, and the short-term memory area, and input the structured prompt into the large language model so that the large language model generates recommended scripts in the preset style.
The agent front end is also used to allocate, for each recommended script, an independent card and an independent buffer, and to render the recommended script; the card is used to display the recommended script.

[0005] In one possible implementation, the agent front end is further configured to determine that the script recommendation trigger condition has been met when any of the following conditions is satisfied:
a pause in the user's speech is determined by combining debounce (anti-shake) detection with semantic pause detection;
the user's emotions are identified and an emotional change is detected;
the user's real-time speech is analyzed and an interrogative sentence is identified.

[0006] In one possible implementation, the agent front end is also used to, for the complete original dialogue text of a preset number of rounds that has been processed by the dynamic sliding window:
calculate the time decay weight of the corresponding historical round, measured from the round at which the dynamic sliding window is currently positioned;
obtain its solidified entity tag and high-value tag, wherein the solidified entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type;
calculate the retention score of the complete original dialogue text based on the time decay weight, the solidified entity tag, and the high-value tag; and
if the retention score is greater than or equal to a preset value, add the summary corresponding to the complete original dialogue text to the script request payload.
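
The text names the three inputs of the retention score but does not give a formula. The following is a minimal sketch of one way they could be combined, assuming an exponential time decay and additive tag weights; `decay_rate`, `entity_weight`, `value_weight`, and the threshold of 0.8 are illustrative assumptions, not values from the application.

```python
import math


def retention_score(
    rounds_ago: int,
    has_solidified_entity: bool,
    is_high_value: bool,
    decay_rate: float = 0.3,     # assumed decay constant
    entity_weight: float = 0.4,  # assumed weight of the entity tag
    value_weight: float = 0.4,   # assumed weight of the high-value tag
) -> float:
    """Combine a time decay weight with the two binary tags into one score."""
    time_decay_weight = math.exp(-decay_rate * rounds_ago)
    return (
        time_decay_weight
        + entity_weight * int(has_solidified_entity)
        + value_weight * int(is_high_value)
    )


def should_add_summary(score: float, preset_value: float = 0.8) -> bool:
    """The summary joins the payload when the score reaches the preset value."""
    return score >= preset_value


# An old round that carries both tags versus an old round that carries none.
old_but_important = retention_score(10, True, True)
old_and_plain = retention_score(10, False, False)
```

Under these assumed weights, an old round is kept only if its tags make up for the decayed time weight, which matches the stated intent of preserving early key information.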

[0007] In one possible implementation, the agent front end is specifically used to:
for each recommended script, create an independent candidate card upon receiving a token of that recommended script, each candidate card having an independent buffer;
for each candidate card, determine whether it continues to receive tokens of the recommended script;
if not, display the operation buttons for that candidate card and complete the rendering of the recommended script;
if so, update the buffer corresponding to the candidate card and render the recommended script in the candidate card.
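
The per-card buffering described above can be sketched as follows. This is an illustrative model only: the class names, the `script_id` key, and the explicit end-of-stream callback are assumptions, and real rendering would update a UI rather than a dataclass.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CandidateCard:
    """One card per recommended script, with its own text buffer."""
    script_id: str
    buffer: list[str] = field(default_factory=list)
    show_buttons: bool = False

    @property
    def text(self) -> str:
        return "".join(self.buffer)


class CardRenderer:
    def __init__(self) -> None:
        self.cards: dict[str, CandidateCard] = {}

    def on_fragment(self, script_id: str, fragment: str) -> None:
        # Create the card on the first streamed fragment, then keep appending.
        if script_id not in self.cards:
            self.cards[script_id] = CandidateCard(script_id)
        self.cards[script_id].buffer.append(fragment)

    def on_stream_end(self, script_id: str) -> None:
        # No more fragments for this script: reveal the operation buttons.
        self.cards[script_id].show_buttons = True


renderer = CardRenderer()
events = [
    ("a", "Sorry "),
    ("b", "I can "),
    ("a", "for the wait."),
    ("b", "help with that."),
]
for script_id, fragment in events:
    renderer.on_fragment(script_id, fragment)
renderer.on_stream_end("a")
```

Because each card owns its buffer, interleaved fragments from two scripts never mix, and one card can finish while another is still streaming.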

[0008] In one possible implementation, the agent front end is also used to determine whether the preset script display interface is currently showing historical recommended scripts; if so, the currently rendered recommended script is not displayed in the preset script display interface, and a prompt message is displayed in the preset bottom interface instead; if not, the currently rendered recommended script is displayed in the preset script display interface with a smooth transition.

[0009] In one possible implementation, the agent front end is also used to check whether multiple recommended scripts are being displayed; if so, a separate rendering queue is used for each recommended script, and that queue performs the rendering operation for the recommended script.

[0010] In a second aspect of this application, an agent script recommendation method is provided. The method is applied to the agent front end of an agent script recommendation system, the system further including a server back end. The method includes:
acquiring real-time call recordings, and performing speech recognition and text conversion on the real-time call recordings to obtain real-time call text, the real-time call recordings including real-time calls between agents and users;
maintaining a dynamic sliding window in memory, retaining the complete original dialogue text of the most recent preset number of rounds as a short-term memory area; extracting key elements from the complete original dialogue text, and storing structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides;
identifying the user's emotions and, when the script recommendation trigger condition is met, generating a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag representing the user's current emotion category, and sending it to the server back end;
receiving the recommended scripts sent by the server back end; and
for each recommended script, allocating an independent card and an independent buffer and rendering the recommended script, the card being used to display the recommended script.

[0011] In one possible implementation, the method further includes determining that the script recommendation trigger condition has been met when any of the following conditions is satisfied:
a pause in the user's speech is determined by combining debounce (anti-shake) detection with semantic pause detection;
the user's emotions are identified and an emotional change is detected;
the user's real-time speech is analyzed and an interrogative sentence is identified.

[0012] In one possible implementation, the method further includes, for the complete original dialogue text of a preset number of rounds that has been processed by the dynamic sliding window:
calculating the time decay weight of the corresponding historical round, measured from the round at which the dynamic sliding window is currently positioned;
obtaining its solidified entity tag and high-value tag, wherein the solidified entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type;
calculating the retention score of the complete original dialogue text based on the time decay weight, the solidified entity tag, and the high-value tag; and
if the retention score is greater than or equal to a preset value, adding the summary corresponding to the complete original dialogue text to the script request payload.

[0013] In one possible implementation, allocating an independent card and an independent buffer for each recommended script and rendering the recommended script, the card being used to display the recommended script, includes:
for each recommended script, creating an independent candidate card upon receiving a token of that recommended script, each candidate card having an independent buffer;
for each candidate card, determining whether it continues to receive tokens of the recommended script;
if not, displaying the operation buttons for that candidate card and completing the rendering of the recommended script;
if so, updating the buffer corresponding to the candidate card and rendering the recommended script in the candidate card.

[0014] In one possible implementation, the method further includes determining whether the preset script display interface is currently showing historical recommended scripts; if so, the currently rendered recommended script is not displayed in the preset script display interface, and a prompt message is displayed in the preset bottom interface instead; if not, the currently rendered recommended script is displayed in the preset script display interface with a smooth transition.

[0015] In one possible implementation, the method further includes checking whether multiple recommended scripts are being displayed; if so, a separate rendering queue is used for each recommended script, and that queue performs the rendering operation for the recommended script.

[0016] In a third aspect of this application, an agent script recommendation device is provided. The device is applied to the agent front end of the agent script recommendation system, the system further comprising a server back end, and the device comprising:
a real-time call text acquisition module, used to acquire real-time call recordings and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text, the real-time call recordings including real-time calls between agents and users;
an element extraction module, used to maintain a dynamic sliding window in memory, retaining the complete original dialogue text of the most recent preset number of rounds as a short-term memory area, and to extract key elements from the complete original dialogue text and store structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides;
a script request payload generation module, used to identify the user's emotions and, when the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag representing the user's current emotion category, and send it to the server back end;
a recommended script receiving module, used to receive recommended scripts sent by the server back end; and
a recommended script rendering module, used to allocate an independent card and an independent buffer for each recommended script and to render the recommended script, the card being used to display the recommended script.

[0017] In one possible implementation, the device further includes a script recommendation trigger condition judgment module, used to determine that the script recommendation trigger condition has been met when any of the following conditions is satisfied:
a pause in the user's speech is determined by combining debounce (anti-shake) detection with semantic pause detection;
the user's emotions are identified and an emotional change is detected;
the user's real-time speech is analyzed and an interrogative sentence is identified.

[0018] In one possible implementation, the device further includes a retention score calculation module, used to, for the complete original dialogue text of a preset number of rounds that has been processed by the dynamic sliding window:
calculate the time decay weight of the corresponding historical round, measured from the round at which the dynamic sliding window is currently positioned;
obtain its solidified entity tag and high-value tag, wherein the solidified entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type;
calculate the retention score of the complete original dialogue text based on the time decay weight, the solidified entity tag, and the high-value tag; and
if the retention score is greater than or equal to a preset value, add the summary corresponding to the complete original dialogue text to the script request payload.

[0019] In one possible implementation, the recommended script rendering module includes:
a candidate card creation submodule, used to create an independent candidate card for each recommended script upon receiving a token of that recommended script, each candidate card having an independent buffer;
a token reception judgment submodule, used to determine, for each candidate card, whether it continues to receive tokens of the recommended script;
a recommended script rendering submodule, used to display the operation buttons for the candidate card if it does not, thereby completing the rendering of the recommended script; and
a buffer update submodule, used to continue updating the buffer corresponding to the candidate card if it does, and to render the recommended script in the candidate card.

[0020] In one possible implementation, the device further includes a historical recommended script judgment module, used to determine whether the preset script display interface is currently showing historical recommended scripts; if so, the currently rendered recommended script is not displayed in the preset script display interface, and a prompt message is displayed in the preset bottom interface instead; if not, the currently rendered recommended script is displayed in the preset script display interface with a smooth transition.

[0021] In one possible implementation, the device further includes a multi-path rendering detection module, used to detect whether multiple recommended scripts are being rendered; if so, a separate rendering queue is used for each recommended script, and that queue performs the rendering operation for the recommended script.

[0022] In a fourth aspect of this application, an electronic device is provided, comprising: a memory, used to store a computer program; and a processor, used to implement the method described in the second aspect of the embodiments of this application when executing the program stored in the memory.

[0023] In a fifth aspect of the present application, a computer-readable storage medium is provided, wherein a computer program is stored therein, and when the computer program is executed by a processor, it implements the method described in the second aspect of the present application.

[0024] Compared with the prior art, the embodiments of this application have at least the following technical effects:
This application provides an agent script recommendation system, method, apparatus, electronic device, and storage medium. The system includes an agent front end and a server back end.
The agent front end is used to: acquire real-time call recordings, and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text, the real-time call recordings including real-time calls between agents and users; maintain a dynamic sliding window in memory, retaining the complete original dialogue text of the most recent preset number of rounds as a short-term memory area; extract key elements from the complete original dialogue text and store structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides; and identify the user's emotions and, when the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag representing the user's current emotion category, and send it to the server back end.
The server back end is used to: upon receiving the script request payload, perform a similarity search in a preset database based on the latest question in the complete original dialogue text to obtain multiple search results; and construct a structured prompt based on the search results, the emotion tag, the long-term entity solidification area, the preset style, and the short-term memory area, and input it into a large language model so that the large language model generates recommended scripts in the preset style.
The agent front end is also used to allocate, for each recommended script, an independent card and an independent buffer, and to render the recommended script; the card is used to display the recommended script.

[0025] A system applying the embodiments of this application dynamically extracts key elements from real-time call text via a sliding window and stores them separately in a short-term memory area (STM) and a long-term entity solidification area (LTS), so that the request payload can be generated from only the STM, the LTS, and the emotion tag. Compared with existing technologies that transmit the call text directly to a server back end, this significantly reduces text transmission volume and token costs while preserving contextual information and the related structured information, thereby improving the speed of script recommendation. Furthermore, by recognizing user emotions and setting a preset style, the large language model can generate conversational recommended scripts in the preset style, thereby calming users and enhancing user experience. Because recommended scripts are generated when a script recommendation trigger condition is identified, candidate responses for the current context are produced automatically when the user pauses or completes a semantic unit, which reduces training costs for novice agents.

[0026] Of course, implementing any method of the embodiments of this application does not necessarily require achieving all of the advantages described above at the same time.

Description of the Drawings

[0027] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings.

[0028] Figure 1 is a schematic diagram of the structure of the agent script recommendation system provided in the embodiments of this application;
Figure 2 is another schematic diagram of the agent script recommendation system provided in the embodiments of this application;
Figure 3 is a timing diagram of script recommendation provided in an embodiment of this application;
Figure 4 is a flowchart of rendering recommended scripts in an embodiment of this application;
Figure 5 is a flowchart of an agent script recommendation method provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of the agent script recommendation device provided in the embodiments of this application;
Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0029] In call centers, unified communications workstations, or omnichannel customer service workstations, agents typically need to perform multiple tasks simultaneously while talking to users, including listening to questions, understanding intent, searching the knowledge base, and organizing responses. Existing solutions usually rely on static knowledge base searches, standard scripts, or keyword-matched FAQ (Frequently Asked Questions) recommendations. However, the applicant has found the following problems in their use:

1. Low search efficiency and interruption of call rhythm: Agents need to search manually by entering keywords while on the call, which can easily cause them to miss key user information, resulting in response delays and distraction.

[0030] 2. Lack of context awareness in recommendation results: Traditional knowledge base recommendations mainly rely on keyword hits, and the output is often a rigid policy text that cannot combine current sentiment, dialogue stage and confirmed entities to generate natural expressions.

[0031] 3. Key information is easily forgotten in long call scenarios: When a call lasts ten minutes or even tens of minutes, directly inputting all of the STT (Speech-to-Text) text into a large language model increases token costs, slows responses, and lets early key information be buried by later conversation.

[0032] 4. Existing AI (Artificial Intelligence) assisted interaction blocks front-end operations: Some solutions cause frequent shaking of the workbench sidebar, sudden changes in card height, or loss of scroll focus when generating responses, interfering with parallel actions such as agents entering work orders and viewing user information.

[0033] To address at least one of the aforementioned problems, a first aspect of this application provides an agent script recommendation system which, as shown in Figure 1, includes an agent front end 101 and a server back end 102. The agent front end 101 is used to acquire real-time call recordings and perform speech recognition and text conversion on them to obtain the real-time call text.

[0034] Real-time call recording includes real-time conversations between agents and users. When acquiring real-time call text, a real-time dual-track STT (Speech-to-Text) text stream can be received from the communication engine, recording engine, or third-party voice service via WebSocket (Web Socket Protocol, a network protocol for full-duplex communication) or an equivalent bidirectional channel. This stream can at least distinguish between the user's and agent's audio channels.

[0035] In this embodiment, an agent refers to a staff member of an enterprise or platform who provides consultation, after-sales, and other services to users in a certain business. A user refers to an individual or group who uses or enjoys the functions provided by the business. However, during a call, regardless of the number of people on both sides, the call audio is divided into two types based solely on the user's audio channel and the agent's audio channel: one is the user's audio collected by the calling device, and the other is the agent's audio collected by the calling device.

[0036] A dynamic sliding window is maintained in memory, retaining the complete dialogue text of the most recent preset number of rounds as a short-term memory area; key elements are extracted from the complete dialogue text, and the structured information of preset types is stored in the long-term entity solidification area; the data in the long-term entity solidification area is not deleted as the sliding window slides.

[0037] In practical applications, after the real-time call text is obtained, it can be stored in the memory of the agent front end. A dynamic sliding window is maintained in memory and slides over the real-time call text, which records the dialogue between the agent and the user. Each time, the sliding window processes the dialogue text of a preset number of rounds (i.e., the complete original dialogue text). A round can be defined, for example, as one exchange that ends when the questioner and answerer roles swap: if the agent asks a question and the user answers, and then the user asks a question and the agent answers, the first exchange (agent asks, user answers) is one round and the second (user asks, agent answers) is another. The preset number of rounds can be set according to actual business needs, for example to 3, 5, or 8.

[0038] The short-term memory area (STM) stores the complete original dialogue text of the preset number of rounds currently covered by the sliding window. As the sliding window moves, previously stored rounds are deleted and new rounds are stored, so that the STM reflects the current call context. For example, if the real-time call text contains six complete rounds and the sliding window processes three rounds at a time, the STM holds the first three rounds while the window is processing them; when the window moves forward to the last three rounds, the STM holds the last three rounds instead.
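
A minimal sketch of this behaviour, assuming the window is implemented as a bounded queue; the `Round` and `ShortTermMemory` names are illustrative, and Python's `collections.deque` with `maxlen` is used here only because it discards the oldest item automatically.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Round:
    """One dialogue round: one side speaks and the other side answers."""
    speaker_text: str
    reply_text: str


class ShortTermMemory:
    """Keeps only the most recent `max_rounds` complete dialogue rounds."""

    def __init__(self, max_rounds: int = 3) -> None:
        # A deque with maxlen drops the oldest round when a new one arrives.
        self._rounds = deque(maxlen=max_rounds)

    def add_round(self, dialogue_round: Round) -> None:
        self._rounds.append(dialogue_round)

    def snapshot(self) -> list[Round]:
        return list(self._rounds)


# Six rounds with a window of three, as in the example in the text.
stm = ShortTermMemory(max_rounds=3)
for i in range(1, 7):
    stm.add_round(Round(f"question {i}", f"answer {i}"))
window_texts = [r.speaker_text for r in stm.snapshot()]
```

After all six rounds have been fed in, only rounds four to six remain in the short-term memory area.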

[0039] When extracting key elements from a complete dialogue, semantic recognition and entity extraction can be performed to extract structured information of a preset type. This preset type can be information deemed to have business value based on actual business needs. For example, for after-sales service, preset types of information could include user name, order number, product model, user request type (e.g., refund request), confirmed product malfunction, and actions promised by the agent or user (e.g., agent promising repair, or user promising work order cancellation).

[0040] Information of a preset type is processed into structured information, which can be presented as a two-dimensional table. The structured information is stored in the long-term entity solidification area. In one example, it is stored as a meta field and is not deleted as the sliding window moves. Continuing the six-round example above, after key elements are extracted from the first three rounds of dialogue text in the sliding window, the structured information of preset types is saved in the long-term entity solidification area. When the sliding window moves to the last three rounds of dialogue text, the long-term entity solidification area still holds the structured information extracted from the first three rounds.
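
The contrast between the two areas can be sketched as follows. The regular expression and keyword check are deliberately naive stand-ins for the semantic recognition and entity extraction described in the text, and the order-number format, field names, and class name are assumptions made for illustration.

```python
import re
from collections import deque

# Hypothetical order-number format: two capital letters followed by six digits.
ORDER_PATTERN = re.compile(r"\border\s+(?:number\s+)?([A-Z]{2}\d{6})\b")


class LongTermEntityStore:
    """Holds structured fields that survive the sliding window moving on."""

    def __init__(self) -> None:
        self._fields = {}

    def update_from(self, text: str) -> None:
        match = ORDER_PATTERN.search(text)
        if match:
            self._fields["order_number"] = match.group(1)
        if "refund" in text.lower():
            self._fields["request_type"] = "refund"

    def as_dict(self) -> dict:
        return dict(self._fields)


rounds = [
    "My order number AB123456 has a problem",
    "I would like a refund",
    "Yes, that is correct",
    "Thanks for checking",
    "One more thing",
    "When will it arrive?",
]
window = deque(maxlen=3)  # short-term memory area
store = LongTermEntityStore()  # long-term entity solidification area
for text in rounds:
    window.append(text)
    store.update_from(text)
```

Once the window has moved to the last three rounds, the order number no longer appears in the window text but is still available from the store.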

[0041] The system identifies the user's emotions and, when the script recommendation trigger condition is met, generates a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag representing the user's current emotion category, and sends it to the server back end.

[0042] The system can incorporate a deep emotion recognition model in the agent's front end. This model combines the physical characteristics of the user's voice (e.g., intonation, pitch, speed, rhythm, volume), emotional keywords in the sentence (e.g., disappointment, complaint, great, thank you), syntactic structure (e.g., rhetorical questions, double negatives, negations, interrogative sentences), and the complete original dialogue to identify the user's emotions. Emotional labels can represent the user's current emotion category; for example, "1" represents calm; "2" represents happiness; "3" represents anger; and "4" represents sadness.

[0043] In one possible implementation, the script recommendation trigger condition can be determined to be met when any of the following conditions is satisfied:
a pause in the user's speech is determined by combining debounce (anti-shake) detection with semantic pause detection;
the user's emotions are identified and an emotional change is detected;
the user's real-time speech is analyzed and an interrogative sentence is identified.

[0044] Semantic pause detection identifies pauses in the user's speech by measuring the silence time in the corresponding audio stream. If the silence time exceeds a preset threshold, the user is considered to have completed a full semantic unit. The threshold can be set according to actual needs, for example to 0.6 s, 0.8 s, or 1 s. Anti-shake detection prevents a pause from being misidentified because of network jitter, breathing, or other factors.
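One way to combine a silence threshold with anti-shake handling is sketched below. The frame-based voice-activity input, the function name `detect_pause`, and the rule that voiced bursts shorter than `min_speech_ms` (for example a breath) do not reset the silence timer are assumptions for illustration; the application does not specify the debounce rule.

```python
def detect_pause(frames, frame_ms=100, silence_ms=800, min_speech_ms=200):
    """Return the index of the frame at which a pause is declared, or None.

    frames: iterable of booleans, True where voice activity was detected.
    """
    silence = 0  # accumulated silence, in milliseconds
    burst = 0    # length of the current run of voiced frames, in milliseconds
    for i, voiced in enumerate(frames):
        if voiced:
            burst += frame_ms
            if burst >= min_speech_ms:
                # Sustained speech: the user is talking again, so reset the timer.
                silence = 0
        else:
            burst = 0
            silence += frame_ms
            if silence >= silence_ms:
                return i
    return None


# A single voiced frame (e.g. a breath) inside the silence does not reset the timer.
with_blip = [True] * 5 + [False] * 4 + [True] + [False] * 4
# Three voiced frames count as resumed speech, so no pause is declared.
resumed = [True] * 5 + [False] * 4 + [True] * 3 + [False] * 4
```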

[0045] When identifying emotional changes and interrogative sentences, the aforementioned deep emotion recognition model can be used. Upon meeting the trigger conditions for script recommendation, a script request payload is generated and sent to the server backend. This payload includes an emotion tag representing the user's current emotional category, the complete original dialogue text in the short-term memory representing the current context, and structured information of preset types with business value in the long-term entity solidification area.
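A script request payload with the three parts named above might look like the following sketch. The field names (`emotion_tag`, `context_window`, `entities`) and the JSON encoding are assumptions; only the numeric emotion codes follow the example given earlier in the description.

```python
import json
from enum import IntEnum


class Emotion(IntEnum):
    # Codes follow the example in the description: 1 calm, 2 happy, 3 angry, 4 sad.
    CALM = 1
    HAPPY = 2
    ANGRY = 3
    SAD = 4


def build_payload(short_term, long_term, emotion):
    """Assemble the request payload from the two memory areas and the emotion tag."""
    return {
        "emotion_tag": int(emotion),          # current emotion category
        "context_window": list(short_term),   # complete dialogue text in short-term memory
        "entities": dict(long_term),          # structured information of preset types
    }


payload = build_payload(
    ["User: Your quality is terrible, I want a refund!"],
    {"user_name": "Lao Li", "product": "router"},
    Emotion.ANGRY,
)
body = json.dumps(payload, ensure_ascii=False)  # serialized form sent to the backend
```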

[0046] The server backend is used to: after receiving the script request payload, perform a similarity search in the preset database based on the latest question in the complete original dialogue, obtaining multiple search results.

[0047] The latest question is the question in the complete original dialogue that is closest in time to the current moment. The preset database is a pre-built, business-related knowledge base, which may include product descriptions, policy documents, SOPs (Standard Operating Procedures), and historically excellent scripts. Historically excellent scripts are recommended scripts from past replies that received positive user feedback. During retrieval, the latest question is first vectorized to obtain a question vector. The similarity between the question vector and each content vector in the preset database is then calculated, and the k entries with the highest similarity are taken as reference fragments, which form the search results.
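The top-k retrieval step can be illustrated with cosine similarity over a small in-memory store. Cosine similarity, the toy three-dimensional vectors, and the function names are assumptions for the example; the application does not name a similarity measure or an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, store, k=2):
    """store: list of (text, vector) pairs; returns the k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for embeddings of knowledge-base entries.
store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("router setup SOP", [0.1, 0.9, 0.0]),
    ("warranty terms", [0.7, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
```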

[0048] In practical applications, if a user asks a series of questions in a short period of time, subsequent requests can reuse the search results of the previous search, and the request payload only incrementally updates the context of the latest question.

[0049] A structured prompt is constructed from the search results, the emotion tag, the long-term entity solidification area, the preset style, and the short-term memory area. The structured prompt is then input into a large language model, so that the large language model generates recommended scripts in the preset style.
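As a rough sketch, the five inputs can be laid out as labeled sections of a single prompt string. The section titles, their order, and the function name `build_prompt` are assumptions; the application only states which inputs the structured prompt is built from.

```python
def build_prompt(results, emotion_tag, entities, style, context):
    """Lay out the five inputs as labeled sections (layout is illustrative)."""
    sections = [
        ("Style", style),
        ("User emotion tag", str(emotion_tag)),
        ("Known entities", "; ".join(f"{k}: {v}" for k, v in entities.items())),
        ("Reference fragments", "\n".join(f"- {r}" for r in results)),
        ("Recent dialogue", "\n".join(context)),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


prompt = build_prompt(
    results=["refund policy"],
    emotion_tag=3,
    entities={"user_name": "Lao Li", "product": "router"},
    style="soothing",
    context=["User: Your quality is terrible, I want a refund!"],
)
```

Keeping each input in its own section makes it straightforward to reuse earlier search results and update only the dialogue section on follow-up requests.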

[0050] The preset style is the language style, set according to actual business needs, in which the large language model generates the recommended script. For example, it can be a reassuring and guiding style, a direct handling style, or an information-confirming style. There can be one or more preset styles, and the large language model generates at least one recommended script for each preset style. The large language model can be any model capable of generating text from input content, such as Doubao, Qianwen, Wenxin Yiyan, or DeepSeek. After generating the recommended script, the server backend returns it to the agent front end as a token stream using SSE (Server-Sent Events, a web technology in which the server continuously pushes an event stream to the client, suitable for carrying the streaming output of a large language model) or WebSocket.

[0051] The agent front end is further used to: allocate, for each recommended script, an independent card and an independent buffer, and render the recommended script; the card is used to display the recommended script.

[0052] When rendering and displaying recommended phrases, each recommended phrase can be assigned an independent card and an independent buffer. The buffer is used to store the token stream of the recommended phrase, and the card is used to render and display the recommended phrase.

[0053] In one example, Figure 2 shows another architecture of the system provided in this embodiment. The agent front end is equipped with a WebSocket STT (speech-to-text) receiver, which converts call audio into real-time call text (a real-time dual-track text stream). A sliding window and entity solidification manager processes the real-time call text and stores the corresponding information in the short-term memory area and the long-term entity solidification area, respectively. A trigger determiner identifies whether the user pauses, asks a question, or undergoes a sudden change in emotion; if so, the script recommendation trigger condition is considered met. The context window (i.e., the complete original dialogue text in the short-term memory area), the solidified entities (i.e., the structured information of preset types in the long-term entity solidification area), and the emotion tag are then sent to the server backend.

[0054] The server backend is equipped with a request orchestrator, which sends the request payload to the Embedding vectorization module to vectorize the user's latest question. Using the vectorized question vector, a similarity search is performed in a preset vector database to obtain the top k similar reference texts as the search results.

[0055] A structured prompt is constructed from the search results, the emotion tag, the long-term entity solidification area, the preset style, and the short-term memory area. The structured prompt is input into a large language model, which integrates and refines the context and generates recommended scripts in the preset style. The recommended scripts are pushed to the agent front end as a token stream via SSE or WebSocket.

[0056] The agent front-end renders and displays recommended dialogue using multi-candidate streaming rendering, and collects feedback from agents on actions such as adoption, copying, or ignoring of recommended dialogue. This feedback is then sent back to the server back-end to enable the server to optimize retrieval from the vector database and training of the large language model.

[0057] In another example, Figure 3 shows the sequence of script recommendation in this embodiment. At the start of the call, the user introduces themselves and explains the product problem: "I'm Lao Li, the router I bought yesterday can't connect to the internet." At this point, the agent front end adds an STT (speech-to-text) segment and extracts key elements using the context manager. In one example, structured information such as "User Name: Lao Li," "Product: Router," "Fault: Unable to connect to the internet," and "Order Date: Yesterday" is obtained and stored in the long-term entity solidification area. As the call continues, the user says, "Your quality is terrible, I want a refund!" For this STT segment, the user's current emotion is identified as anger, and the short-term memory area stores the complete dialogue text of the most recent preset number of rounds. The script recommendation trigger condition is now met, so a request payload is generated from the complete dialogue text in the short-term memory area, the structured information of preset types, and the current emotion tag, and is sent to the server backend. The server backend retrieves the refund policy from the preset database, generates a conversational recommended script in a soothing style, and pushes it to the agent front end via an SSE stream, where it is rendered in a streaming manner: "Mr. Li, I'm very sorry for the bad experience..."

[0058] The system of this embodiment dynamically extracts key elements from the real-time call text via a sliding window and stores them separately in a short-term memory area (STM) and a long-term entity solidification area (LTS), so that the request payload is generated solely from the STM, the LTS, and the emotion tag. Compared with existing technologies that transmit the call text directly to a server backend, this reduces the volume of transmitted text and the token cost while preserving the contextual information and the related structured information. Furthermore, by recognizing user emotions and setting preset styles, the large language model can generate conversational recommended scripts in those styles, which helps soothe users and improves user experience. Because recommended scripts are generated when a trigger condition is recognized, candidate responses for the current context are produced automatically when the user pauses or completes a semantic unit, which reduces training costs for novice agents.

[0059] In one possible implementation, the agent front end of this embodiment is further used to: for each preset number of rounds of complete dialogue text that has been processed by the dynamic sliding window, calculate the time decay weight of the corresponding historical rounds relative to the rounds corresponding to the current dynamic sliding window; obtain its fixed entity tag and high-value tag, where the fixed entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type; calculate the retention score of the complete original dialogue text from the time decay weight, the fixed entity tag, and the high-value tag; and, if the retention score is greater than or equal to a preset value, add the summary corresponding to the complete original dialogue text to the script request payload.

[0060] The time decay weight corresponding to the historical rounds relative to the current round of the dynamic sliding window can be understood as the time decay weight of the sliding window relative to the current sliding window when processing the complete dialogue text of the historical rounds. In one example, if the real-time call text contains 9 rounds of complete dialogue text, and the sliding window processes 3 rounds of complete dialogue text per cycle, then when processing rounds 7-9, the time decay weights of rounds 1-3 and 4-6 relative to the current rounds 7-9 can be calculated. In one example, the time decay weight can be calculated using an exponential decay function.

[0061] In one possible implementation, the time decay weight can be calculated using an exponential decay formula of the following form:

$$w_i = e^{-\lambda \, \Delta t_i}$$

where $w_i$ denotes the time decay weight of the preset number of rounds of complete dialogue text processed by the sliding window in the i-th cycle; $\lambda$ denotes the preset attenuation coefficient, which can be set according to actual needs, for example to 0.5; and $\Delta t_i$ denotes the interval between the i-th cycle processed by the sliding window and the cycle currently being processed. Taking a real-time call text with 9 rounds of complete dialogue as an example, with the sliding window processing 3 rounds of complete dialogue text per cycle: if the sliding window is currently processing rounds 7 to 9, then for rounds 1 to 3, processed in the 1st cycle, the corresponding interval is 2; for rounds 4 to 6, processed in the 2nd cycle, the corresponding interval is 1.
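The 9-round example can be worked through numerically. The sketch assumes the exponential form weight = exp(-λ · interval), with λ = 0.5 taken from the example value in the text; the function name and the cycle numbering are illustrative.

```python
import math


def time_decay_weight(interval: int, decay: float = 0.5) -> float:
    """Exponential decay: older cycles (larger interval) get smaller weights."""
    return math.exp(-decay * interval)


# The window currently processes cycle 3 (rounds 7-9); cycles 1 and 2 are history.
current_cycle = 3
weights = {cycle: time_decay_weight(current_cycle - cycle) for cycle in (1, 2)}
```

Under these assumptions, cycle 1 (interval 2) receives a weight of about 0.368 and cycle 2 (interval 1) about 0.607, so more recent history counts for more.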

[0062] The fixed entity label indicates whether the complete dialogue text contains key elements stored in the long-term fixed entity area. In other words, when processing the complete dialogue text of this historical turn, the sliding window extracts structured information of a preset type already stored in the current long-term fixed entity area during the key element extraction process. For example, the fixed entity label can be represented by a binary value; for instance, "1" indicates that the complete dialogue text of this historical turn contains key elements stored in the long-term fixed entity area, and "0" indicates that the complete dialogue text of this historical turn does not contain key elements stored in the long-term fixed entity area.

[0063] A high-value tag indicates whether the complete original dialogue text contains semantic text representing a preset high-value type. This high-value semantic text is text set according to actual business needs, such as text related to refunds, complaints, or compensation. The high-value tag can be automatically assigned by detecting the complete original dialogue text using a pre-trained semantic recognition model to determine whether it contains keywords of the preset high-value type or semantically similar preset high-value semantic text. Similarly, the high-value tag can also be represented by a binary value, facilitating subsequent calculation of the retention score. In one example, "1" indicates that the complete original dialogue text contains semantic text of the preset high-value type (such as refunds, complaints, or compensation); "0" indicates that the complete original dialogue text does not contain semantic text of the preset high-value type.

[0064] When calculating the retention score, different weights can be assigned to the time decay weight, the fixed entity tag, and the high-value tag. In one example, the retention score can be calculated using the following formula:

$$S_i = \alpha \, w_i + \beta \, E_i + \gamma \, H_i$$

where i denotes the preset number of rounds of complete dialogue text processed by the sliding window in the i-th cycle; $S_i$ denotes the retention score; $w_i$ denotes the time decay weight; $E_i$ denotes the fixed entity tag; $H_i$ denotes the high-value tag; $\alpha$ denotes the preset weight of the time decay weight; $\beta$ denotes the preset weight of the fixed entity tag; and $\gamma$ denotes the preset weight of the high-value tag.

[0065] In practical applications, $\alpha$, $\beta$, and $\gamma$ can be configured according to specific business needs. For example, in an after-sales customer service scenario where the high-value tag is considered more important, $\gamma$ can be increased appropriately, for example by setting $\alpha = 0.3$, $\beta = 0.3$, and $\gamma = 0.4$. In another example, to keep the retention score within a normalized range, the constraints $\alpha + \beta + \gamma = 1$ and $\alpha, \beta, \gamma > 0$ can be imposed.
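A weighted-sum retention score with the example weights 0.3, 0.3, and 0.4 can be sketched as follows. The weighted-sum form, the parameter names, and the threshold value of 0.5 used in the usage lines are assumptions for illustration.

```python
import math


def retention_score(decay_weight, has_entity, high_value,
                    w_time=0.3, w_entity=0.3, w_value=0.4):
    """Weighted sum of the time decay weight and the two binary tags."""
    if not math.isclose(w_time + w_entity + w_value, 1.0):
        raise ValueError("weights are expected to sum to 1")
    return w_time * decay_weight + w_entity * int(has_entity) + w_value * int(high_value)


def should_retain(score, threshold=0.5):
    # The summary is added to the payload when the score reaches the preset value.
    return score >= threshold


# An older cycle that mentions a refund (high-value) and a stored entity.
old_but_important = retention_score(0.368, has_entity=True, high_value=True)
# A recent cycle with neither tag set.
recent_small_talk = retention_score(0.607, has_entity=False, high_value=False)
```

With these numbers the older but business-relevant cycle is retained, while the more recent small talk is not.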

[0066] The system using the embodiments of this application can retain text that may affect the subsequent generation of recommended dialogue by calculating the retention score of the processed complete dialogue text, thereby ensuring the accuracy of the generated recommended dialogue.

[0067] In one possible implementation, the agent front end is specifically used to: for each recommended script, create an independent candidate card upon receiving the first corresponding token, each candidate card having an independent buffer; for each candidate card, determine whether tokens of the recommended script are still being received; if not, display the operation buttons for that candidate card and complete the rendering of the recommended script; if so, update the buffer corresponding to the candidate card and render the recommended script in the candidate card.

[0068] In practical applications, each recommended phrase has its own independent rendering queue. This queue uses incremental parsing and frame-by-frame updates to render the recommended phrase within the candidate cards. For example, if multiple recommended phrases are being rendered, a separate rendering queue is maintained for each phrase, and the rendering of that phrase is completed through this queue. This independent rendering queue is used to execute the rendering operation for the recommended phrase. This avoids layout jitter caused by rearranging multiple recommended phrases at once, improving the user experience for agents viewing recommended phrases.

[0069] Since the recommended scripts are pushed as a token stream, they can be rendered in a streaming manner. For example, Figure 4 shows a flowchart for rendering recommended scripts. Upon receiving the first token, a blank, independent candidate card is created, and the independent buffer and incremental parser of the recommended script are initialized. It is then determined whether the token stream corresponding to the recommended script is still being received. If not, the recommended script is considered to have been pushed completely, and an action button is displayed alongside the rendered text. The action button corresponds to the agent's actions on the recommended script; in one example, the action can be "accept," "copy," "send," "like," "ignore," "delete," or "dislike."

[0070] If so, the token stream is continuously received, text rendering is appended to the candidate card, and updates are continuously performed in its corresponding buffer until the rendering of the recommended message is completed.
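The card-per-candidate flow described above can be modeled without any UI framework. In this sketch the class names, the `on_token` and `on_end` callbacks, and the explicit end-of-stream signal are assumptions; actual rendering in a browser would replace the in-memory `text` property.

```python
class CandidateCard:
    """One card per recommended script, with its own token buffer."""

    def __init__(self, candidate_id):
        self.candidate_id = candidate_id
        self.buffer = []           # independent buffer for this script's tokens
        self.show_buttons = False  # action buttons appear once the stream ends

    @property
    def text(self):
        return "".join(self.buffer)


class CardRenderer:
    def __init__(self):
        self.cards = {}

    def on_token(self, candidate_id, token):
        # The first token for a candidate creates its blank card.
        card = self.cards.get(candidate_id)
        if card is None:
            card = self.cards[candidate_id] = CandidateCard(candidate_id)
        card.buffer.append(token)  # append-only update keeps earlier text stable

    def on_end(self, candidate_id):
        # No more tokens: the script is complete, so show the action buttons.
        self.cards[candidate_id].show_buttons = True


renderer = CardRenderer()
stream = [("a", "Mr. Li, "), ("b", "I understand "),
          ("a", "I'm very sorry."), ("b", "your frustration.")]
for cid, tok in stream:  # tokens of two candidates arrive interleaved
    renderer.on_token(cid, tok)
renderer.on_end("a")
```

Because each candidate writes only to its own buffer, interleaved streams do not disturb one another.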

[0071] In practical applications, when there are many recommended scripts, the agent may be viewing historical recommended scripts. To avoid disrupting the agent's viewing, the agent front end in this embodiment is further used to: determine whether the preset script display interface is currently showing historical recommended scripts; if so, not display the currently rendering recommended script in the preset script display interface, and instead display a prompt message in the preset bottom interface; if not, display the currently rendering recommended script in the preset script display interface with a smooth transition.

[0072] The preset script display interface is the interface displayed on the agent's front end, where recommended scripts are rendered and displayed. This preset script display interface can be located anywhere on the display interface according to actual needs. In one example, the preset script display interface can be located on the left, right, middle, top, or bottom of the display interface. In another example, the preset script display interface can be a draggable active interface, which agents can drag to the target location by touching or operating the mouse.

[0073] If the preset script display shows historical recommended scripts, it means the agent is currently viewing historical recommended scripts. In this case, the currently rendering recommended script will be silently rendered at the bottom and will not be displayed in the preset script display interface to prevent it from obstructing the agent's view. The preset bottom interface can be the bottom of the preset script display interface or the bottom of the display interface. In practical applications, this interface can also be located in any other position that will not obstruct the currently displayed historical recommended scripts. The notification message is used to inform the agent that a new recommended script is being rendered, but it does not forcibly take over the scroll focus.

[0074] If not, the recommended script is automatically and smoothly scrolled into view at the bottom of the preset script display interface. A smooth transition can be achieved through methods such as anti-aliasing, abrupt transition elimination, and linear interpolation. In addition, as shown in the flowchart of Figure 4, it can be determined during rendering whether there are multiple concurrent candidate streams, i.e., multiple recommended scripts. If so, an independent queue is allocated to each to advance its rendering, so that the streams do not block one another.

[0075] In practical applications, rendering recommended phrases can also involve pre-estimating the height changes of candidate cards and controlling their expansion through smooth transitions, thereby reducing layout jumps. In another example, when a candidate has reached its full readable threshold within a short period, it can be highlighted so that agents can prioritize reading the rendered content.

[0076] The system using the embodiments of this application renders the recommended scripts in independent cards through independent queues, which can avoid layout jitter when there are multiple candidate recommended scripts and improve the experience of agents viewing recommended scripts.

[0077] In a second aspect of this application, an agent script recommendation method is provided. The method is applied to the agent front end of an agent script recommendation system, the system further including a server backend. As shown in Figure 5, the method includes the following steps: Step S501: Obtain real-time call recordings, and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text; the real-time call recordings include real-time calls between agents and users.

[0078] Step S502: Maintain a dynamic sliding window in memory, retain the complete dialogue text of the most recent preset number of rounds as a short-term memory area; extract key elements from the complete dialogue text, and store the structured information of the preset type in the long-term entity solidification area; the data in the long-term entity solidification area is not deleted as the sliding window slides.

[0079] Step S503: Identify the user's emotions. When the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, long-term entity solidification area, and emotion tag used to represent the user's current emotion category, and send it to the server backend; receive the recommended script sent by the server backend.

[0080] Step S504: For each recommended phrase, allocate an independent card and an independent buffer, and render the recommended phrase. The card is used to display the recommended phrase.

[0081] In one possible implementation, the method further includes: determining that the script recommendation trigger condition is met when any of the following conditions is satisfied: a pause in the user's speech is identified by combining anti-shake (debounce) detection with semantic pause detection; a change in the user's emotion is identified; or an interrogative sentence is identified in the user's real-time speech.

[0082] In one possible implementation, the method further includes: for each preset number of rounds of complete dialogue text that has been processed by the dynamic sliding window, calculating the time decay weight of the corresponding historical rounds relative to the rounds corresponding to the current dynamic sliding window; obtaining its fixed entity tag and high-value tag, where the fixed entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type; calculating the retention score of the complete original dialogue text from the time decay weight, the fixed entity tag, and the high-value tag; and, if the retention score is greater than or equal to a preset value, adding the summary corresponding to the complete original dialogue text to the script request payload.

[0083] In one possible implementation, allocating an independent card and an independent buffer for each recommended script and rendering the recommended script, the card being used to display the recommended script, includes: for each recommended script, creating an independent candidate card upon receiving the first corresponding token, each candidate card having an independent buffer; for each candidate card, determining whether tokens of the recommended script are still being received; if not, displaying the operation buttons for that candidate card and completing the rendering of the recommended script; if so, updating the buffer corresponding to the candidate card and rendering the recommended script in the candidate card.

[0084] In one possible implementation, the method further includes: determining whether the preset script display interface is currently showing historical recommended scripts; if so, not displaying the currently rendering recommended script in the preset script display interface, and instead displaying a prompt message in the preset bottom interface; if not, displaying the currently rendering recommended script in the preset script display interface with a smooth transition.

[0085] In one possible implementation, the method further includes: detecting whether multiple recommended scripts are being rendered; if so, rendering each recommended script through its own independent rendering queue, the independent rendering queue being used to perform the rendering operations for that recommended script.

[0086] In a third aspect of this application, an agent script recommendation device is provided. The device is applied to the agent front end of the agent script recommendation system, which further includes a server backend. As shown in Figure 6, the device includes: a real-time call text acquisition module 601, used to acquire real-time call recordings and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text; the real-time call recordings include real-time calls between agents and users.

[0087] The element extraction module 602 is used to maintain a dynamic sliding window in memory, retain the complete dialogue text of the most recent preset number of rounds as a short-term memory area; extract key elements from the complete dialogue text, and store the structured information of preset type in a long-term entity solidification area; the data in the long-term entity solidification area is not deleted as the sliding window slides.

[0088] The script request payload generation module 603 is used to identify the user's emotions. When the script recommendation triggering condition is met, it generates a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag used to represent the user's current emotion category, and sends it to the server backend.

[0089] The recommended script receiving module 604 is used to receive recommended scripts sent by the server backend.

[0090] The recommended dialogue rendering module 605 is used to allocate an independent card and an independent buffer for each recommended dialogue and to render the recommended dialogue. The card is used to display the recommended dialogue.

[0091] In one possible implementation, the device further includes: a script recommendation trigger condition judgment module, used to determine that the script recommendation trigger condition is met when any of the following conditions is satisfied: a pause in the user's speech is identified by combining anti-shake (debounce) detection with semantic pause detection; a change in the user's emotion is identified; or an interrogative sentence is identified in the user's real-time speech.

[0092] In one possible implementation, the device further includes: a retention score calculation module, used to, for each preset number of rounds of complete dialogue text that has been processed by the dynamic sliding window, calculate the time decay weight of the corresponding historical rounds relative to the rounds corresponding to the current dynamic sliding window; obtain its fixed entity tag and high-value tag, where the fixed entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type; calculate the retention score of the complete original dialogue text from the time decay weight, the fixed entity tag, and the high-value tag; and, if the retention score is greater than or equal to a preset value, add the summary corresponding to the complete original dialogue text to the script request payload.

[0093] In one possible implementation, the recommended script rendering module includes: a candidate card creation submodule, specifically used to create an independent candidate card for each recommended script upon receiving the first corresponding token, each candidate card having an independent buffer; a token reception judgment submodule, specifically used to determine, for each candidate card, whether tokens of the recommended script are still being received; a recommended script rendering submodule, specifically used to display, if not, the operation buttons for the candidate card and complete the rendering of the recommended script; and a buffer update submodule, specifically used to update, if so, the buffer corresponding to the candidate card and render the recommended script in the candidate card.

[0094] In one possible implementation, the device further includes: a historical recommended script judgment module, used to determine whether the preset script display interface is currently showing historical recommended scripts; if so, not display the currently rendering recommended script in the preset script display interface, and instead display a prompt message in the preset bottom interface; if not, display the currently rendering recommended script in the preset script display interface with a smooth transition.

[0095] In one possible implementation, the device further includes: a multi-path rendering detection module, used to detect whether multiple recommended scripts are being rendered; if so, each recommended script is rendered through its own independent rendering queue, the independent rendering queue being used to perform the rendering operations for that recommended script.

[0096] In another aspect of the embodiments of this application, an electronic device is also provided. Referring to Figure 7, it includes: a memory 701, used to store a computer program; and a processor 702, used to implement the following when executing the program stored in the memory: acquire real-time call recordings, and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text, the real-time call recordings including real-time calls between agents and users; maintain a dynamic sliding window in memory, retaining the complete dialogue text of the most recent preset number of rounds as a short-term memory area; extract key elements from the complete dialogue text, and store structured information of preset types in a long-term entity solidification area, the data in the long-term entity solidification area not being deleted as the sliding window slides; identify the user's emotions and, when the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, the long-term entity solidification area, and the emotion tag used to represent the user's current emotion category, and send it to the server backend; receive the recommended scripts sent by the server backend; and, for each recommended script, allocate an independent card and an independent buffer and render the recommended script, the card being used to display the recommended script.

[0097] The communication bus mentioned in the above electronic devices can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used to represent it in the diagram, but this does not mean that there is only one bus or one type of bus.

[0098] The communication interface is used for communication between the aforementioned electronic devices and other devices.

[0099] The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

[0100] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0101] In another embodiment provided in this application, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements any of the methods described above in the second aspect of the embodiments of this application.

[0102] In yet another embodiment provided in this application, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to implement any of the methods described above in the second aspect of the embodiments of this application.

[0103] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are carried out. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a solid-state drive (SSD), etc.

[0104] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0105] The various embodiments in this specification are described in a related manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. Related parts can be found in the descriptions of the system embodiments.

[0106] The above description is merely a preferred embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application are included within the scope of protection of this application.

Claims

1. An agent script recommendation system, characterized in that the system includes an agent front end and a server backend; the agent front end is used for: acquiring real-time call recordings, and performing speech recognition and text conversion on the real-time call recordings to obtain real-time call text, wherein the real-time call recordings include real-time calls between agents and users; maintaining a dynamic sliding window in memory, and retaining the complete original dialogue text of the most recent preset number of rounds as a short-term memory area; extracting key elements from the complete original dialogue text, and storing structured information of a preset type in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides; and identifying the user's emotions and, when the script recommendation trigger condition is met, generating a script request payload based on the short-term memory area, the long-term entity solidification area, and an emotion tag representing the user's current emotion category, and sending the script request payload to the server backend; the server backend is used for: upon receiving the script request payload, performing a similarity search in a preset database based on the latest question in the complete original dialogue text to obtain multiple search results; and constructing a structured prompt based on the search results, the emotion tag, the long-term entity solidification area, a preset style, and the short-term memory area, and inputting the structured prompt into a large language model so that the large language model generates recommended scripts according to the preset style; and the agent front end is further used for: for each recommended script, allocating an independent card and an independent buffer for rendering the recommended script, wherein the card is used to display the recommended script.
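As a non-limiting sketch of the server backend behavior recited in claim 1, the following Python example retrieves the entries most similar to the latest user question and assembles a sectioned prompt. This application does not specify the similarity measure or the prompt layout: word-overlap (Jaccard) similarity is used here purely as a stand-in, and the function names, the section labels such as `[STYLE]`, and the payload keys `stm`, `lts`, and `emotion` are hypothetical. The call to the large language model is deliberately omitted.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the unspecified similarity search."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def latest_user_text(payload):
    """Assumption: the 'latest question' is the last user round in the window."""
    for round_ in reversed(payload["stm"]):
        if round_["speaker"] == "user":
            return round_["text"]
    return ""


def retrieve(question, knowledge_base, top_k=2):
    """Return the top_k knowledge-base entries most similar to the question."""
    ranked = sorted(knowledge_base, key=lambda doc: jaccard(question, doc), reverse=True)
    return ranked[:top_k]


def build_prompt(payload, results, style):
    """Combine style, emotion tag, entities, search results, and recent dialogue."""
    history = "\n".join(f'{r["speaker"]}: {r["text"]}' for r in payload["stm"])
    entities = "\n".join(f"{k}: {v}" for k, v in payload["lts"].items())
    references = "\n".join(f"- {r}" for r in results)
    return (
        f"[STYLE]\n{style}\n"
        f"[EMOTION]\n{payload['emotion']}\n"
        f"[ENTITIES]\n{entities}\n"
        f"[REFERENCES]\n{references}\n"
        f"[RECENT DIALOGUE]\n{history}\n"
    )
```

A production system would more plausibly use vector embeddings for retrieval; the sketch only shows how the five inputs named in the claim could be combined into one structured prompt.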

2. The system according to claim 1, characterized in that the agent front end is further used to determine that the script recommendation trigger condition is met when any of the following conditions is satisfied: it is determined, by combining debounce detection and semantic pause detection, that the user has paused speaking; the user's emotions are identified and an emotional change of the user is detected; or the user's real-time speech is detected and an interrogative sentence is identified.
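The three alternative trigger conditions of claim 2 may be illustrated, under stated assumptions, by the Python sketch below. The anti-jitter (debounce) check is modelled as a minimum silence duration, and the semantic pause check as "the utterance does not end on a dangling conjunction"; both heuristics, the 600 ms default, the word lists, and the `Snapshot` structure are assumptions of this sketch rather than details disclosed in this application.

```python
from dataclasses import dataclass

# Hypothetical word lists used only by this sketch.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which", "can", "could"}
DANGLING_WORDS = {"and", "but", "because", "so", "or"}


@dataclass
class Snapshot:
    text: str              # transcript of the user's current utterance
    silence_ms: int        # time since the last detected speech
    emotion: str           # current emotion tag
    previous_emotion: str  # emotion tag at the previous check


def is_pause(s, debounce_ms=600):
    """Debounce: require enough silence. Semantic: the utterance looks finished."""
    if s.silence_ms < debounce_ms:
        return False
    words = s.text.lower().rstrip(".!?, ").split()
    return bool(words) and words[-1] not in DANGLING_WORDS


def is_question(text):
    """Crude interrogative check: trailing '?' or a leading question word."""
    words = text.strip().lower().split()
    return bool(words) and (words[-1].endswith("?") or words[0] in QUESTION_WORDS)


def should_trigger(s, debounce_ms=600):
    """Any one of the three conditions is sufficient."""
    return (
        is_pause(s, debounce_ms)
        or s.emotion != s.previous_emotion
        or is_question(s.text)
    )
```

Requiring both silence and a complete-looking utterance avoids firing a recommendation request while the user is merely drawing breath mid-sentence.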

3. The system according to claim 1, characterized in that the agent front end is further used for: for complete original dialogue text that has undergone a preset number of rounds of processing via the dynamic sliding window, calculating the time decay weight of the corresponding historical round relative to the round corresponding to the current dynamic sliding window; obtaining its fixed entity tag and high-value tag, wherein the fixed entity tag indicates whether the complete original dialogue text contains key elements stored in the long-term entity solidification area, and the high-value tag indicates whether the complete original dialogue text contains semantic text of a preset high-value type; calculating the retention score of the complete original dialogue text based on the time decay weight, the fixed entity tag, and the high-value tag; and, if the retention score is greater than or equal to a preset value, adding the summary corresponding to the complete original dialogue text to the script request payload.
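Claim 3 names the three inputs of the retention score but not the formula that combines them. The Python sketch below therefore assumes an exponential time decay and a weighted sum; the decay rate, the weights, the threshold of 0.5, and all function and field names are hypothetical values chosen only to make the example concrete. It further assumes that the rounds being scored are those that have already slid out of the window.

```python
import math


def time_decay_weight(current_round, past_round, decay_rate=0.3):
    """Assumed exponential decay: older rounds receive a smaller weight."""
    return math.exp(-decay_rate * (current_round - past_round))


def retention_score(current_round, past_round, has_fixed_entity, is_high_value,
                    w_time=0.4, w_entity=0.3, w_value=0.3):
    """Assumed weighted sum of the time decay weight and the two binary tags."""
    return (
        w_time * time_decay_weight(current_round, past_round)
        + w_entity * float(has_fixed_entity)
        + w_value * float(is_high_value)
    )


def select_summaries(evicted_rounds, current_round, threshold=0.5):
    """Keep the summaries of rounds whose score reaches the preset value."""
    return [
        r["summary"]
        for r in evicted_rounds
        if retention_score(
            current_round, r["round"], r["has_fixed_entity"], r["is_high_value"]
        ) >= threshold
    ]
```

With these assumed weights, an old round survives only if it carries both tags, whereas a recent round with a single tag may still fall below the threshold; other weightings would shift that balance.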

4. The system according to claim 1, characterized in that the agent front end is specifically used for: for each recommended script, creating an independent candidate card upon receiving a token corresponding to the recommended script, wherein each candidate card has an independent buffer; for each candidate card, determining whether the candidate card is still receiving tokens corresponding to the recommended script; if not, displaying the operation buttons on the candidate card and rendering the recommended script; and if so, updating the buffer corresponding to the candidate card and rendering the recommended script in the candidate card.
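The per-card buffering of claim 4 may be sketched in Python as follows, assuming that the server streams each recommended script as a sequence of text units (tokens) tagged with a script identifier, followed by an end-of-stream signal. The classes `CandidateCard` and `CardBoard`, the `handle_event` signature, and the `buttons_visible` flag are illustrative assumptions; actual rendering to a user interface is replaced here by returning the card text.

```python
class CandidateCard:
    """One card per recommended script, with its own independent buffer."""

    def __init__(self, script_id):
        self.script_id = script_id
        self.buffer = []              # tokens received so far for this script only
        self.buttons_visible = False  # operation buttons appear once streaming ends

    def render(self):
        return "".join(self.buffer)


class CardBoard:
    """Routes streamed events to the card that owns the script."""

    def __init__(self):
        self.cards = {}

    def handle_event(self, script_id, token=None, done=False):
        card = self.cards.get(script_id)
        if card is None:
            # The first token of a script creates its candidate card.
            card = self.cards[script_id] = CandidateCard(script_id)
        if done:
            # No more tokens: show the operation buttons on the finished card.
            card.buttons_visible = True
        elif token is not None:
            # Still streaming: update only this card's buffer.
            card.buffer.append(token)
        return card.render()
```

Because each card owns its buffer, tokens of different recommended scripts can arrive interleaved without corrupting one another's text.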

5. The system according to claim 4, characterized in that the agent front end is further used for: determining whether the preset script display interface is currently showing historical recommended scripts; if so, not displaying the currently rendered recommended script in the preset script display interface, and displaying a prompt message in the preset bottom interface; and if not, displaying the currently rendered recommended script in the preset script display interface with a smooth transition.

6. The system according to claim 5, characterized in that the agent front end is further used for: determining whether multiple recommended scripts are being displayed; and if so, for each recommended script, rendering the recommended script using a separate rendering queue, wherein the separate rendering queue is used to perform the rendering operations of the recommended script.
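One possible reading of the separate rendering queues in claim 6 is sketched below in Python: each recommended script has its own first-in, first-out queue of pending render operations, and draining one queue does not touch the others. The class `RenderScheduler`, its method names, and the `max_ops` parameter are assumptions of this sketch; the application does not describe how the queues are scheduled.

```python
from collections import defaultdict, deque


class RenderScheduler:
    """Keeps one independent render queue per recommended script."""

    def __init__(self):
        self.queues = defaultdict(deque)  # script_id -> pending text chunks
        self.rendered = defaultdict(str)  # script_id -> text already rendered

    def enqueue(self, script_id, chunk):
        self.queues[script_id].append(chunk)

    def flush(self, script_id, max_ops=None):
        """Apply up to max_ops pending operations for one script only."""
        queue = self.queues[script_id]
        count = len(queue) if max_ops is None else min(max_ops, len(queue))
        for _ in range(count):
            self.rendered[script_id] += queue.popleft()
        return self.rendered[script_id]
```

Limiting the number of operations per flush is one way a slow or long script could be prevented from delaying the display of the others.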

7. An agent script recommendation method, characterized in that the method is applied to the agent front end of an agent script recommendation system, the system further includes a server backend, and the method includes: acquiring real-time call recordings, and performing speech recognition and text conversion on the real-time call recordings to obtain real-time call text, wherein the real-time call recordings include real-time calls between agents and users; maintaining a dynamic sliding window in memory, and retaining the complete dialogue text of the most recent preset number of rounds as a short-term memory area; extracting key elements from the complete dialogue text, and storing structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides; identifying the user's emotions and, when the script recommendation trigger condition is met, generating a script request payload based on the short-term memory area, the long-term entity solidification area, and an emotion tag representing the user's current emotion category, and sending the script request payload to the server backend; receiving the recommended scripts sent by the server backend; and, for each recommended script, allocating an independent card and an independent buffer for rendering the recommended script, wherein the card is used to display the recommended script.

8. An agent script recommendation device, characterized in that the device is applied to the agent front end of an agent script recommendation system, the system further includes a server backend, and the device includes: a real-time call text acquisition module, used to acquire real-time call recordings and perform speech recognition and text conversion on the real-time call recordings to obtain real-time call text, wherein the real-time call recordings include real-time calls between agents and users; an element extraction module, used to maintain a dynamic sliding window in memory and retain the complete dialogue text of the most recent preset number of rounds as a short-term memory area, and to extract key elements from the complete dialogue text and store structured information of preset types in a long-term entity solidification area, wherein the data in the long-term entity solidification area is not deleted as the sliding window slides; a script request payload generation module, used to identify the user's emotions and, when the script recommendation trigger condition is met, generate a script request payload based on the short-term memory area, the long-term entity solidification area, and an emotion tag representing the user's current emotion category, and send the script request payload to the server backend; a recommended script receiving module, used to receive the recommended scripts sent by the server backend; and a recommended script rendering module, used to allocate an independent card and an independent buffer for each recommended script and to render the recommended script, wherein the card is used to display the recommended script.

9. An electronic device, characterized in that it includes: a memory, used to store computer programs; and a processor, which, when executing a program stored in the memory, implements the method of claim 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of claim 7.