A service quality inspection method, apparatus, equipment, and storage medium

By processing session text and feature vectors through a multi-encoding network, with a text encoding network built on a hierarchical attention network and a separate feature encoding network, the method addresses the low response rate and insufficient accuracy of existing service quality inspection methods and achieves a more accurate and comprehensive service quality assessment.

CN114840646B · Active · Publication Date: 2026-06-30 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2021-02-01
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing service quality inspection methods suffer from low response rates and insufficient accuracy: satisfaction-based methods depend on ratings that many users do not fill in, and methods based on the first-time resolution rate only consider whether the user contacts customer service again. They cannot comprehensively and systematically evaluate service quality, particularly for specific business sets such as account suspension and risk control issues.

Method used

A multi-encoding network processes the conversation text and feature vectors. A text encoding network built on a hierarchical attention network and a feature encoding network that uses a different network model extract user-side and customer service-side information from multi-turn conversations. The quality inspection result is determined by a combined classification layer.

Benefits of technology

It improves the accuracy and usability of service quality inspection, enables a more comprehensive assessment of service quality, reduces reliance on user feedback data, and enhances classification effectiveness.

Abstract

This application discloses a service quality inspection method, apparatus, device, and storage medium, relating to the field of intelligent customer service technology in artificial intelligence. The method includes: acquiring sentence vector groups for each round of conversation in M rounds of conversation related to a target service, the sentence vector groups including customer service-side sentence vector groups and user-side sentence vector groups; acquiring feature vector groups obtained by pre-extracting features from the conversation data of the M rounds of conversations; inputting the M customer service-side sentence vector groups and the M user-side sentence vector groups into a text encoding network to obtain a target conversation vector; inputting the M feature vector groups into a feature encoding network to obtain a target feature vector, the feature encoding network using a different network model than the text encoding network; and determining the service quality inspection result of the target service based on the target conversation vector and the target feature vector. This application is not limited by user feedback data and special business requirements, improving usability and accuracy.

Description

Technical Field

[0001] This application relates to the field of intelligent customer service technology, and in particular to a service quality inspection method, apparatus, equipment and storage medium.

Background Technology

[0002] With the development of artificial intelligence, the way services are provided to users is gradually evolving from the single channel of traditional human telephone to diversified channels, and more and more companies are starting to use intelligent customer service to provide services.

[0003] In related technologies, one approach to service quality inspection is based on satisfaction ratings. This approach requires users to fill in a satisfaction rating, but a large proportion of users of current online services do not do so, resulting in a low response rate. Another approach is based on the first-time resolution rate. This approach only considers whether users request customer service again, which is not systematic or comprehensive enough. Furthermore, it has low accuracy for specific business sets, such as account suspension or risk control issues, because users in those business sets frequently file complaints, leading to low usability.

Summary of the Invention

[0004] This application provides a service quality inspection method, apparatus, device, and storage medium that integrates multi-turn conversation text with pre-extracted conversation features using different network models. This allows for the extraction of more information about the conversation between customer service and users, without being limited by user feedback data or special business requirements, thus improving usability and accuracy.

[0005] In one aspect, this application provides a service quality inspection method, the method comprising:

[0006] obtaining the sentence vector group of each round of conversation in M rounds of conversation related to the target service, where the sentence vector group includes a customer service-side sentence vector group and a user-side sentence vector group, and M is a natural number greater than or equal to 1;

[0007] obtaining the feature vector group produced by performing feature extraction on the session data of the M rounds of sessions;

[0008] inputting the M customer service-side sentence vector groups and the M user-side sentence vector groups into a text encoding network to obtain the target session vector, where the text encoding network is pre-constructed based on a hierarchical attention network;

[0009] inputting the feature vector group into a feature encoding network to obtain the target feature vector, where the feature encoding network uses a different network model than the text encoding network; and

[0010] determining the quality inspection result of the target service based on the target session vector and the target feature vector.

[0011] In another aspect, a service quality inspection device is provided, the device comprising:

[0012] The text vector acquisition module is used to acquire the sentence vector group of each round in M rounds of sessions related to the target service. The sentence vector group includes the customer service side sentence vector group and the user side sentence vector group, where M is a natural number greater than or equal to 1.

[0013] The feature vector acquisition module is used to acquire the feature vector group obtained by extracting features from the session data of the M rounds of sessions;

[0014] A text encoding module is used to input M customer service-side sentence vector groups and M user-side sentence vector groups into a text encoding network, and obtain a target session vector through the text encoding network, wherein the text encoding network is pre-constructed based on a hierarchical attention network;

[0015] The feature encoding module is used to input the M feature vector groups into the feature encoding network and obtain the target feature vector through the feature encoding network. The feature encoding network uses a different network model than the text encoding network.

[0016] The service quality inspection module is used to determine the quality inspection result of the target service based on the target session vector and the target feature vector.

[0017] In another aspect, an electronic device is provided, the device including a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the service quality inspection method as described above.

[0018] In another aspect, a computer storage medium is provided, wherein at least one instruction or at least one program is stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the service quality inspection method as described above.

[0019] The embodiments of this application use a text encoding network to process sentence vector groups to obtain the target session vector, and a feature encoding network (different from the text encoding network) to process the feature vector group to obtain the target feature vector. The quality inspection result of the target service is determined from the target session vector and the target feature vector. A multi-encoding network processes the session text (sentence vector groups) and session features (feature vector groups), and the dialogue text is divided into a user side and a customer service side. With this multi-dimensional input, the service quality inspection model can learn more and finer features from the M rounds of conversation between the user and customer service, resulting in more precise classification results. Furthermore, the text encoding network is built on a hierarchical attention network, whose hierarchical architecture can better encode hierarchical data such as session text, resulting in better classification performance and improved quality inspection accuracy. In addition, both the sentence vector groups and the feature vector group are obtained from the multiple rounds of conversation related to the target service, so the method is not limited by subjective data such as user feedback or by specific business requirements, which improves usability.

Brief Description of the Drawings

[0020] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a schematic diagram of the structure of a service quality inspection system provided in an embodiment of this application.

[0022] Figure 2 is a schematic diagram of the structure of a service quality inspection model provided in an embodiment of this application.

[0023] Figure 3 is a schematic diagram of the structure of the text encoding network provided in the embodiments of this application.

[0024] Figure 4 is a schematic diagram of another service quality inspection model provided in the embodiments of this application.

[0025] Figure 5 is a flowchart illustrating a service quality inspection method provided in an embodiment of this application.

[0026] Figure 6 is a schematic diagram of the process of obtaining the target session vector through a text encoding network according to an embodiment of this application.

[0027] Figure 7 is a schematic diagram of the process for encoding and fusing customer service-side sentence vectors provided in the embodiments of this application.

[0028] Figure 8 is a schematic diagram of the process for encoding and fusing user-side sentence vectors provided in an embodiment of this application.

[0029] Figure 9 is a schematic diagram of the process of obtaining the target session vector through the session expression subnetwork provided in the embodiments of this application.

[0030] Figure 10 is a flowchart illustrating the process of determining the quality inspection results of the target service provided in an embodiment of this application.

[0031] Figure 11 is a flowchart illustrating another service quality inspection method provided in an embodiment of this application.

[0032] Figure 12 is a schematic diagram of the structure of a service quality inspection device provided in an embodiment of this application.

[0033] Figure 13 is a schematic diagram of another service quality inspection device provided in the embodiments of this application.

[0034] Figure 14 is a schematic diagram of the hardware structure of a device for implementing the method provided in the embodiments of this application.

Detailed Implementation

[0035] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0036] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, and machine learning / deep learning.

[0037] The technical solutions provided in this application relate to Machine Learning (ML) technology in artificial intelligence. Machine learning is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and many other disciplines. It specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and instructional learning.

[0038] The technical solution provided in this application uses artificial neural network technology from machine learning in its quality inspection model. With the research and advancement of artificial intelligence technology, it is being researched and applied in various fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that with technological development, artificial intelligence technology will be applied in more fields and play an increasingly important role.

[0039] The technical solutions provided in this application relate to the field of intelligent customer service applications using artificial intelligence. Service quality inspection, as the basis for daily assessment and service process optimization, plays a crucial role in the field of intelligent customer service. In related technologies, one approach to service quality inspection is based on satisfaction ratings, in which users evaluate the service when it ends. This approach requires users to fill in a satisfaction score, but in current online services a large proportion of users do not do so, resulting in a low response rate. Another approach is based on the first-time resolution rate, which measures whether users request customer service again within a certain period after the current service is completed. This approach only considers whether users request customer service again, which is not systematic or comprehensive enough. Its accuracy is also low for specific business sets, such as account suspension or risk control issues, because users in those business sets frequently file complaints, resulting in low usability.

[0040] In view of this, this application provides a service quality inspection method to solve the above-mentioned problems. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0041] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0042] Figure 1 shows a schematic diagram of a service quality inspection system. As shown in Figure 1, the system may include a client 10, a server 20, and a database 30. The server 20 communicates with the client 10 and the database 30 through a communication network. Optionally, the communication network may be a wired network or a wireless network.

[0043] Client 10 can be a smartphone, desktop computer, tablet, laptop, digital assistant, smart wearable device, monitoring device, or voice interaction device. Client 10 has an application 11 installed that supports conversational functionality. Users can interact with a smart customer service representative, either a human or an AI representative, by running application 11.

[0044] Server 20 can be a standalone server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. In one possible implementation, server 20 can be the backend server for application 11 in client 10.

[0045] As shown in Figure 1, users can run application 11 to display the user interface of the intelligent customer service. Users can enter conversational sentences or voice messages to engage in a conversation with the intelligent customer service for consultation or resolution of issues. After the conversation ends, client 10 sends the conversational sentences (which may include conversational sentences converted from voice messages) or voice messages from the M rounds of conversation between the user and the intelligent customer service to server 20. Server 20 generates a customer service ticket based on the conversational sentences or voice messages from the M rounds of conversation and stores it in database 30. It is understood that the service may be completed in only one round of conversation between the user and the intelligent customer service, therefore M is a natural number greater than or equal to 1.

[0046] When quality inspection of the target service is required, server 20 acquires the feature vector group related to the target service and the sentence vector group of each round of the M-round session. Then, based on the sentence vector group and the feature vector group, it performs quality inspection using a pre-built service quality inspection model. The sentence vector group represents the conversation text. The server divides the sentence vector group of each round of the session into customer service side A and user side U, that is, the sentence vector group includes customer service side sentence vector group and user side sentence vector group. The feature vector group is obtained by feature extraction based on the conversation data of each round of the M-round session. The conversation data represents conversation sentences and / or conversation speech. Here, the target service refers to the service provided by customer service to the target user. It can be understood that the target service can be all services involved in the entire customer service ticket, or it can be a part of the services in the customer service ticket.

[0047] To better perform quality inspection on the target service, this application embodiment constructs a service quality inspection model that combines multi-turn text features. This model extracts quality inspection information from two aspects: multi-turn conversation text and text features extracted from multi-turn conversation text.

[0048] Figure 2 is a schematic diagram of a service quality inspection model, based on which the server performs quality inspection of the target service. As shown in Figure 2, the model can include an input layer, a multi-turn conversation text encoding layer, a conversation feature encoding layer, and a combined classification layer. The input layer can include two parts: the conversation text of the target service (i.e., the sentence vector groups for each turn of the conversation) and the conversation features (i.e., the feature vector group). The multi-turn conversation text encoding layer uses M customer service-side sentence vector groups and M user-side sentence vector groups to obtain the target conversation vector. The conversation feature encoding layer uses the feature vector group to obtain the target feature vector. The combined classification layer determines the quality inspection result of the target service based on the target conversation vector and the target feature vector.
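
The data flow through these layers can be sketched as follows. This is a minimal illustration under stated assumptions, not the model described in this application: `mean_pool`, `encode_text`, `encode_features`, and `classify` are hypothetical stand-ins that use plain averaging and a fixed linear scorer where the real model would use learned networks.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; return a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def encode_text(agent_groups: List[List[Vector]],
                user_groups: List[List[Vector]], dim: int) -> Vector:
    """Stand-in for the multi-turn conversation text encoding layer.

    Pools each side over sentences, then over rounds, and concatenates
    the two sides into one target conversation vector.
    """
    agent = mean_pool([mean_pool(g, dim) for g in agent_groups], dim)
    user = mean_pool([mean_pool(g, dim) for g in user_groups], dim)
    return agent + user


def encode_features(feature_group: List[Vector], dim: int) -> Vector:
    """Stand-in for the conversation feature encoding layer."""
    return mean_pool(feature_group, dim)


def classify(session_emb: Vector, feature_emb: Vector,
             weights: Vector, bias: float = 0.0) -> int:
    """Stand-in for the combined classification layer (binary result)."""
    combined = session_emb + feature_emb  # concatenation of the two vectors
    score = sum(w * x for w, x in zip(weights, combined)) + bias
    return 1 if score > 0 else 0


# Toy input: M = 2 rounds, 2-dimensional sentence vectors (arbitrary values).
agent_groups = [[[1.0, 0.0]], [[0.0, 1.0]]]
user_groups = [[[1.0, 1.0]], []]  # the user is silent in round 2
feature_group = [[2.0, 4.0]]

session_emb = encode_text(agent_groups, user_groups, dim=2)
feature_emb = encode_features(feature_group, dim=2)
result = classify(session_emb, feature_emb, weights=[1.0] * 6)
```

The two encoders are kept as separate functions to mirror the point that text and features go through different networks before being combined.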

[0049] In this embodiment, the multi-turn conversation text encoding layer uses a text encoding network to encode, fuse, and interact with M customer service-side sentence vector groups and M user-side sentence vector groups respectively to obtain the target conversation vector. The text encoding network is pre-constructed based on Hierarchical Attention Networks (HAN). The conversation feature encoding layer uses a feature encoding network to process the feature vector group to obtain the target feature vector. The feature encoding network and the text encoding network use different network models.

[0050] In existing technical solutions, the classification of multi-turn conversation text typically involves classifying the single-turn text after concatenating multiple turns and using a single feature extraction and encoding method, resulting in poor classification performance. In contrast, HAN's hierarchical architecture can effectively encode hierarchical data such as conversation text, demonstrating excellent classification performance.

[0051] Figure 3 is a schematic diagram of the structure of a text encoding network. As shown in Figure 3, the text encoding network may include a first encoding subnetwork, a second encoding subnetwork, and a session expression subnetwork. The first encoding subnetwork encodes and fuses M customer service-side sentence vector groups input from the customer service side to obtain a first session vector (ASentEmb). The second encoding subnetwork encodes and fuses M user-side sentence vector groups input from the user side to obtain a second session vector (USentEmb). The session expression subnetwork fuses the first and second session vectors to obtain a target session vector (SessionEmb).

[0052] The encoding fusion process of both the first and second encoding subnetworks includes two parts: encoding and fusion. During encoding, the encoding layers in the first and second encoding subnetworks can use different encoders or the same encoder. During fusion, the fusion layers in the first and second encoding subnetworks can use different fusion operations or the same fusion operation. By employing a hybrid network with multiple encoders and different fusion operations, different features from the conversational text can be extracted. The encoder layer can support encoders such as GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), LSTM_Attention (an attention layer added to LSTM), Transformer, Cnn_Lstm (a combination of CNN and LSTM), Cnn_Topk (a combination of CNN and TOPK algorithm), SRU (Simple Recurrent Units), and SWEM (Simple word-embedding model), etc. The fusion layer can use fusion operations such as Self_Attention, Label_Attention, Routing_Attention, LEAM, and MEAN, etc.
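
To make the fusion step concrete, the sketch below contrasts the MEAN operation with one common form of attention pooling. It is an illustration only: the text does not specify how Self_Attention is computed, so `attention_fuse` and its `query` parameter (which would be learned in a real model) are assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(encoded: List[Vector], query: Vector) -> Vector:
    """Fuse a sequence of encoded vectors into one vector.

    Each vector is scored by its dot product with `query`, the scores are
    normalized with softmax, and the output is the weighted sum of the inputs.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in encoded]
    weights = softmax(scores)
    dim = len(encoded[0])
    return [sum(w * vec[k] for w, vec in zip(weights, encoded))
            for k in range(dim)]


def mean_fuse(encoded: List[Vector]) -> Vector:
    """The MEAN fusion operation: a plain average over the sequence."""
    dim = len(encoded[0])
    return [sum(vec[k] for vec in encoded) / len(encoded) for k in range(dim)]
```

With an all-zero `query` every input receives the same weight, so attention pooling reduces to the mean; a non-zero `query` lets the fusion layer emphasize some sentences over others.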

[0053] The fusion processing of the session expression subnetwork can be performed by directly fusing the first session vector and the second session vector, or by first interactively processing the first session vector and the second session vector and then fusing them. As shown in Figure 3, interaction processing can be achieved through an interaction layer, and fusion processing can be achieved through a third fusion layer. The interaction layer can include several interaction methods such as concat, Multi-Head Attention, and ESIM (Enhanced Sequential Inference Model, a model used in human-computer dialogue). Similar to the fusion operations used in the fusion layers of the first or second encoding subnetwork, the fusion operations used in the third fusion layer can also include Self-Attention, Label Attention, LEAM, MEAN, and Routing Attention, etc.
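
One possible reading of the interaction step is sketched below as single-head scaled dot-product attention followed by concatenation. This is a hedged illustration: it assumes that each side is available as a sequence of round-level vectors, which the text does not state, and `cross_attend` and `interact` are hypothetical names. A real Multi-Head Attention layer would also apply learned projections, which are omitted here.

```python
import math
from typing import List

Vector = List[float]


def cross_attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with keys reused as values.

    For each query vector, returns a softmax-weighted average of `keys`.
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attended.append([sum(e / total * k[d] for e, k in zip(exps, keys))
                         for d in range(dim)])
    return attended


def interact(agent_seq: List[Vector], user_seq: List[Vector]) -> List[Vector]:
    """Concatenate each agent vector with its user-side attention summary."""
    summaries = cross_attend(agent_seq, user_seq)
    return [a + s for a, s in zip(agent_seq, summaries)]
```

The output of `interact` would then be passed to the third fusion layer to produce a single vector.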

[0054] For the conversation feature encoding layer, since the feature vector group represents one or more low-order combined features, such as customer service features, voice features, etc., these combined features are meaningful and efficient, while the features learned by deep models are highly nonlinear high-order combined features. Therefore, the feature encoding network can use encoding operations such as DCN (Deep & Cross Network), MLP (Multi-Layer Perceptron), NFM (Neural Factorization Machine), and DeepFM (Deep Factorization Machine) so that the obtained target feature vector (FeatureEmb) can learn high-order combined features.
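
As an illustration of how such a network builds higher-order combinations from low-order features, the sketch below implements the cross layer of a Deep & Cross Network in its commonly published form, x_{l+1} = x_0 (w_l · x_l) + b_l + x_l. It is only a sketch: the text does not say which DCN variant is used, and the weights here are fixed toy values rather than learned parameters.

```python
from typing import List

Vector = List[float]


def cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One DCN cross layer: x_{l+1} = x0 * (w . xl) + b + xl."""
    scale = sum(wi * xi for wi, xi in zip(w, xl))  # scalar w . xl
    return [x0_i * scale + b_i + xl_i
            for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: Vector, weights: List[Vector],
                  biases: List[Vector]) -> Vector:
    """Stack cross layers; each layer raises the interaction order by one."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

Because every layer multiplies by the original input `x0` again, stacking layers produces explicit feature crosses of increasing degree while the `+ xl` term keeps the lower-order terms.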

[0055] After multiple rounds of session text encoding and session feature encoding, two types of vectors, SessionEmb and FeatureEmb, are obtained. The combined classification layer can concatenate or pool these two vectors to obtain the target classification vector. Then, the target classification vector is classified to determine the quality inspection result of the target service.

[0056] Figure 4 is a schematic diagram of another service quality inspection model. As shown in Figure 4, the combined classification layer can include a feature combination layer and a classification layer. The feature combination layer is used to concatenate or pool the SessionEmb and FeatureEmb vectors to obtain the target classification vector. The classification layer is used to classify the target classification vector to determine the quality inspection result of the customer service ticket. The classification process can include binary classification or multi-class classification, and correspondingly, the quality inspection result can be a binary classification result or a multi-class classification result, with different classification results representing different service levels. For example, a binary classification result could be "like" or "dislike," and a multi-class classification result could be "very good," "good," "average," "poor," or "very poor," etc. Understandably, the binary or multi-class classification result can also be any letter or number, and the service level can be determined based on the letter or number.
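
The feature combination layer and classification layer can be sketched as concatenation followed by a linear scorer and softmax. This is a minimal sketch under assumptions: the text does not specify the classifier, and `combine`, `class_probabilities`, `quality_level`, and the weight values are hypothetical.

```python
import math
from typing import List

Vector = List[float]

LEVELS = ["like", "dislike"]  # the binary example given in the text


def combine(session_emb: Vector, feature_emb: Vector) -> Vector:
    """Feature combination layer: concatenate SessionEmb and FeatureEmb."""
    return session_emb + feature_emb


def class_probabilities(target: Vector, weight_rows: List[Vector],
                        biases: Vector) -> List[float]:
    """Classification layer: one linear score per class, then softmax."""
    logits = [sum(w * x for w, x in zip(row, target)) + b
              for row, b in zip(weight_rows, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_level(probs: List[float], levels: List[str]) -> str:
    """Map the most probable class to its service level."""
    return levels[max(range(len(probs)), key=probs.__getitem__)]
```

Switching from binary to multi-class classification only changes the number of weight rows and the length of the label list.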

[0057] In the above technical solution, when performing quality inspection on the target service, the server uses different encoding networks based on the service quality inspection model to process the conversation text and conversation features. The conversation text is divided into user-side and customer service-side, allowing the model to learn more and finer features from the M rounds of conversation between the user and customer service, resulting in more precise classification results. Furthermore, HAN's hierarchical architecture can better encode hierarchical data like conversation text, achieving better classification performance and thus improving the accuracy of the quality inspection.

[0058] For ease of description, the service quality inspection methods in the following embodiments are illustrated using a server as the execution subject. This specification provides the operational steps of the methods described in the embodiments or flowcharts, but based on conventional or non-inventive labor, more or fewer operational steps may be included. The order of steps listed in the embodiments is merely one possible execution order among many and does not represent the only possible execution order. In actual system or server product execution, the methods can be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment) as shown in the embodiments or accompanying drawings.

[0059] Figure 5 shows a flowchart of a service quality inspection method provided in an embodiment of this application. As shown in Figure 5, the method may include:

[0060] S501, obtain the sentence vector group of each session in the M rounds of sessions related to the target service. The sentence vector group includes the customer service side sentence vector group and the user side sentence vector group, where M is a natural number greater than or equal to 1.

[0061] In this embodiment, the target service refers to the service provided by customer service to the target user. It is understood that the target service can be all services provided to the target user, that is, all services in the customer service ticket generated after the session between the customer service representative and the target user ends. In some implementations, the target service can also be a portion of the services provided by customer service to the target user in the customer service ticket, that is, a portion of the services in the customer service ticket generated after the session between the customer service representative and the target user ends.

[0062] Each round of conversation is represented by a sentence vector group, which represents the conversation text, i.e., the conversation sentence. The customer service-side sentence vector group includes one or more customer service-side sentence vectors, which represent customer service-side conversation sentences. These customer service-side conversation sentences can be customer service conversation sentences or conversation sentences converted from customer service conversation voice. The user-side sentence vector group includes one or more user-side sentence vectors, which represent user-side conversation sentences. These user-side conversation sentences can be user conversation sentences or conversation sentences converted from user conversation voice.

[0063] As shown in Figure 2, let $U_j$ ($j = 1, 2, \ldots, M$) denote the user-side sentence vector group of the $j$-th round of conversation, and let $u_{ji}$ denote the $i$-th user-side sentence vector in the $j$-th round. If the number of user-side sentence vectors in the $j$-th round is $n$, then $U_j$ can be written as $\{u_{j1}, u_{j2}, \ldots, u_{jn}\}$. Similarly, let $A_j$ ($j = 1, 2, \ldots, M$) denote the customer service-side sentence vector group of the $j$-th round of conversation, and let $a_{ji}$ denote the $i$-th customer service-side sentence vector in the $j$-th round. If the number of customer service-side sentence vectors in the $j$-th round is $n$, then $A_j$ can be written as $\{a_{j1}, a_{j2}, \ldots, a_{jn}\}$.

[0064] Understandably, when the target service has only one round of conversation, that round may contain only a single conversational sentence or voice message, meaning that the customer service-side sentence vector group and the user-side sentence vector group together contain only one sentence vector. For example, a customer service representative proactively contacts a user to inquire about their order information, but the representative receives no response for an extended period, and the conversation ends due to a timeout mechanism. The client generates a customer service ticket based on the conversational sentence or voice message in which the representative inquired about the user's order information. This customer service ticket contains only one service, i.e., one round of conversation. Therefore, the customer service-side sentence vector group contains exactly one customer service-side sentence vector, which represents the sentence inquiring about the user's order information. Since the user did not take part in the conversation, the number of user-side sentence vectors in the user-side sentence vector group is zero.
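
The grouping described above can be pictured with a small data structure. The following is a minimal sketch only: the class name `RoundVectors`, the helper `group_sizes`, and the three-dimensional example vector are illustrative assumptions and do not appear in this application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


@dataclass
class RoundVectors:
    """Sentence vector groups for one round of conversation (illustrative names)."""
    agent: List[Vector] = field(default_factory=list)  # A_j = {a_j1, ..., a_jn}
    user: List[Vector] = field(default_factory=list)   # U_j = {u_j1, ..., u_jn}


def group_sizes(rounds: List[RoundVectors]) -> List[Tuple[int, int]]:
    """Return (customer service-side count, user-side count) for each round."""
    return [(len(r.agent), len(r.user)) for r in rounds]


# Timeout case from the text: one round, one customer service sentence,
# and no user sentence at all. The vector values are made up.
timeout_ticket = [RoundVectors(agent=[[0.1, 0.3, -0.2]])]
print(group_sizes(timeout_ticket))  # [(1, 0)]
```

Keeping the two sides in separate lists, rather than one interleaved list, mirrors the later steps in which each side is encoded by its own sub-network.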

[0065] S503, obtain the feature vector group obtained by feature extraction from M rounds of sessions.

[0066] In this embodiment, the feature vector group can be obtained by extracting features from the M rounds of sessions in advance or in real time. The feature vector group can include feature vectors corresponding to various numerical features and to background (BGM) features. The server can use feature engineering techniques to extract features from the session data of each of the M rounds of sessions to obtain various numerical features, and then embed each type of numerical feature to obtain the corresponding feature vector. The server can also extract features from the session business, customer service information, and user information involved in the M rounds of sessions to obtain the feature vectors corresponding to the background features. The server then combines the feature vectors corresponding to each type of numerical feature and those corresponding to the background features into the feature vector group. In Figure 2, let $f_j$ ($j = 1, 2, \ldots, K$) denote the j-th feature vector, where K is the number of feature vectors; the feature vector group f can then be represented as $\{f_1, f_2, \ldots, f_K\}$.
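
One possible way to assemble such a feature vector group is sketched below. This application does not specify how a numerical feature is embedded, so the bucket-and-lookup scheme, the constants `EMBED_DIM` and `NUM_BUCKETS`, the value ranges, and all function and flag names are assumptions made purely for illustration. The embedding tables are randomly initialised here, whereas a trained model would learn them.

```python
import random
from typing import Dict, List

Vector = List[float]
EMBED_DIM = 4    # assumed embedding width
NUM_BUCKETS = 8  # assumed number of buckets per numerical feature


def make_embedding_table(seed: int) -> List[Vector]:
    """Randomly initialised table; a trained model would learn these rows."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(NUM_BUCKETS)]


def bucketize(value: float, low: float, high: float) -> int:
    """Map a value to a bucket index, clamping values outside [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    index = int((value - low) / (high - low) * NUM_BUCKETS)
    return max(0, min(NUM_BUCKETS - 1, index))


def embed_numeric(value: float, low: float, high: float,
                  table: List[Vector]) -> Vector:
    """Look up the embedding row for the bucket that contains the value."""
    return table[bucketize(value, low, high)]


def embed_background(flags: Dict[str, bool]) -> Vector:
    """Encode yes/no background attributes as 0/1, padded or cut to EMBED_DIM."""
    values = [1.0 if flags[name] else 0.0 for name in sorted(flags)]
    return (values + [0.0] * EMBED_DIM)[:EMBED_DIM]


avg_words_table = make_embedding_table(seed=0)
interactions_table = make_embedding_table(seed=1)

# f = {f_1, ..., f_K} with K = 3 in this toy example.
feature_group = [
    embed_numeric(12.5, 0.0, 50.0, avg_words_table),   # average words per sentence
    embed_numeric(6, 0, 20, interactions_table),       # number of interactions
    embed_background({"new_user": True, "new_agent": False,
                      "repeat_inquiry": True}),
]
print(len(feature_group), len(feature_group[0]))  # 3 4
```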

[0067] Background features represent the background information of customer service representatives, user background information, or attribute information of the conversation, such as whether the user is a new user, whether the customer service representative is a new employee, whether the conversation is a new business, customer service history records, and whether the user has made repeated inquiries.

[0068] Numerical features can include several categories such as text features, customer service features, user features, ticket features, and voice features. Text features represent the text information of M rounds of conversation, such as the average number of words per sentence by customer service representatives and the average number of words per sentence by users. Customer service features represent the conversational characteristics of customer service representatives, such as the probability of repeating phrases and the probability of customer service representatives becoming impatient. User features represent the conversational characteristics of users, such as the duration of user speech and the number of interactions between users and customer service representatives. Ticket features represent the attribute characteristics of customer service tickets related to the target service, such as the type of customer service ticket and whether it is a repeat order. Voice features represent the information of customer service or user conversational voices, such as the minimum and maximum zero-crossing rates. In practice, more numerical features can be extracted. It is understood that the finer the category or granularity of the numerical features, the higher the accuracy of information extraction obtained by the model.
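
Two of the numerical features named above can be computed as in the following sketch. It assumes whitespace-separated words (Chinese text would need a tokenizer instead) and audio given as a plain list of samples; the function names and the frame size are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_words_per_sentence(sentences: Sequence[str]) -> float:
    """Text feature: mean number of whitespace-separated words per sentence."""
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def min_max_zcr(samples: Sequence[float], frame_size: int) -> Tuple[float, float]:
    """Voice feature: minimum and maximum zero-crossing rate over fixed frames."""
    frames: List[Sequence[float]] = [samples[i:i + frame_size]
                                     for i in range(0, len(samples), frame_size)]
    rates = [zero_crossing_rate(f) for f in frames if len(f) >= 2]
    if not rates:
        return 0.0, 0.0
    return min(rates), max(rates)


agent_sentences = ["hello there", "how can I help you"]
print(average_words_per_sentence(agent_sentences))      # 3.5
print(min_max_zcr([1, -1, 1, -1, 1, 2, 3, 4], 4))       # (0.0, 1.0)
```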

[0069] S505: Input M customer service-side sentence vector groups and M user-side sentence vector groups into the text encoding network, and obtain the target session vector through the text encoding network. The text encoding network is pre-constructed based on the hierarchical attention network.

[0070] The hierarchical architecture of hierarchical attention networks can effectively encode hierarchical data such as conversational text, resulting in more accurate conversation vector representations. A text encoding network can be used to fuse M customer service-side sentence vector groups and M user-side sentence vector groups. Then, a hierarchical attention network can be used to process the fused vectors to obtain the target conversation vector.

[0071] To extract the different features of the customer service side and the user side, the text encoding network can encode the M customer service-side sentence vector groups and the M user-side sentence vector groups separately using different encoding sub-networks, and then fuse the two vectors obtained from the separate encoding processes to obtain the target session vector. The architecture of each encoding sub-network is similar to that of a hierarchical attention network. As shown in Figure 2, the text encoding network may include a first encoding sub-network, a second encoding sub-network, and a session expression sub-network. The first and second encoding sub-networks adopt a network model similar to that of the hierarchical attention network.

[0072] In view of this, in one possible implementation, as shown in Figure 6, step S505 may specifically include the following:

[0073] S5051, the first session vector is obtained by encoding and fusing the M customer service sentence vector groups using the first encoding sub-network.

[0074] The first conversation vector represents the customer service-side conversation information in the M rounds of conversation. In the simplest approach, the first encoding sub-network encodes each customer service-side sentence vector in each customer service-side sentence vector group to obtain a corresponding encoded vector, and then directly fuses all of the encoded vectors to obtain the first conversation vector. This encoding is a sentence-level process. However, each round of conversation has its own conversation-level representation, so directly fusing the sentence-level encoding results may discard conversation-level features.

[0075] Based on the above description, in one possible implementation, as shown in Figure 7, step S5051 may specifically include:

[0076] S50511, For each customer service sentence vector in each customer service sentence vector group, encode the customer service sentence vector according to the first preset encoding method to obtain the encoding vector corresponding to the customer service sentence vector;

[0077] S50512, For each customer service sentence vector group, perform the first fusion operation on the encoding vectors corresponding to each customer service sentence vector in the customer service sentence vector group to obtain the fusion vector corresponding to the customer service sentence vector group;

[0078] S50513, Encode the fusion vectors corresponding to the M customer service side sentence vector groups according to the second preset encoding method to obtain the first conversation vector.

[0079] As shown in Figure 3, the first encoding sub-network may include a first encoding layer, a first fusion layer, and a second encoding layer. The server can use the first encoding layer to perform the operation in step S50511, the first fusion layer to perform the operation in step S50512, and the second encoding layer to perform the operation in step S50513. The first encoding layer performs sentence-level encoding; the first fusion layer then fuses all sentence-level encoded vectors in each round of conversation to obtain the conversation-level vector representation ASentEmb; and the second encoding layer performs conversation-level encoding to obtain the first conversation vector.
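
The three-stage flow of steps S50511 to S50513 can be sketched as follows. This is a toy illustration under stated assumptions, not the network of this application: the sentence-level encoder is an element-wise tanh, the fusion is MEAN (one of the fusion operations this application lists), and the conversation-level encoder is a simple recurrent update standing in for an encoder such as GRU. All function names are hypothetical, and the output is taken here to be a single vector.

```python
import math
from typing import List

Vector = List[float]


def encode_sentence(vec: Vector) -> Vector:
    """Stand-in for the first encoding layer (sentence level, S50511)."""
    return [math.tanh(x) for x in vec]


def mean_fuse(vectors: List[Vector], dim: int) -> Vector:
    """Stand-in for the first fusion layer (MEAN fusion, S50512).

    An empty group (no sentence in this round) yields a zero vector.
    """
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode_rounds(round_vectors: List[Vector], dim: int) -> Vector:
    """Stand-in for the second encoding layer (conversation level, S50513)."""
    state = [0.0] * dim
    for vec in round_vectors:
        state = [math.tanh(0.5 * s + 0.5 * x) for s, x in zip(state, vec)]
    return state


def encode_side(groups: List[List[Vector]], dim: int) -> Vector:
    """Encode M sentence vector groups of one side into one conversation vector."""
    fused = []
    for group in groups:
        encoded = [encode_sentence(v) for v in group]  # sentence-level encoding
        fused.append(mean_fuse(encoded, dim))          # one vector per round
    return encode_rounds(fused, dim)                   # conversation-level encoding


# Two rounds on the customer service side; the second round has two sentences.
agent_groups = [[[0.2, -0.1]], [[0.5, 0.4], [0.1, 0.0]]]
first_conversation_vector = encode_side(agent_groups, dim=2)
print(len(first_conversation_vector))  # 2
```

The same function could be reused for the user side with differently parameterised layers, which is why the sketch treats an empty group explicitly.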

[0080] S5053, the second coding sub-network is used to encode and fuse the M user-side sentence vector groups to obtain the second session vector.

[0081] The second conversation vector represents the user-side conversation information in the M rounds of conversation. In the simplest approach, the second encoding sub-network encodes each user-side sentence vector in each user-side sentence vector group to obtain a corresponding encoded vector, and then directly fuses all of the encoded vectors to obtain the second conversation vector. This encoding is a sentence-level process. However, each round of conversation has its own conversation-level representation, so directly fusing the sentence-level encoding results may discard conversation-level features.

[0082] Based on the above description, in one possible implementation, as shown in Figure 8, step S5053 may specifically include:

[0083] S50531, For each user-side sentence vector in each user-side sentence vector group, the user-side sentence vector is encoded according to the third preset encoding method to obtain the encoded vector corresponding to the user-side sentence vector;

[0084] S50532, For each user-side sentence vector group, perform a second fusion operation on the encoding vectors corresponding to each user-side sentence vector in the user-side sentence vector group to obtain the fusion vector corresponding to the user-side sentence vector group;

[0085] S50533, the fusion vectors corresponding to the M user-side sentence vector groups are encoded according to the fourth preset encoding method to obtain the second session vector.

[0086] Continuing with Figure 3, the second encoding sub-network may include a third encoding layer, a second fusion layer, and a fourth encoding layer. The server can use the third encoding layer to perform the operation in step S50531, the second fusion layer to perform the operation in step S50532, and the fourth encoding layer to perform the operation in step S50533. The third encoding layer performs sentence-level encoding; the second fusion layer then fuses all sentence-level encoded vectors in each round of conversation to obtain the conversation-level vector representation USentEmb; finally, the fourth encoding layer performs conversation-level encoding to obtain the second conversation vector.

[0087] It should be noted that the encoder used in each of the first, second, third, and fourth encoding layers in the above embodiments can be selected from GRU, LSTM, CNN, LSTM_Attention, Transformer, CNN_LSTM, CNN_Topk, SRU, SWEM, etc., and the encoders used in these four layers can be the same or different. Furthermore, the first fusion operation used in the first fusion layer and the second fusion operation used in the second fusion layer can each be selected from Self_Attention, Label_Attention, LEAM, MEAN, Routing_Attention, etc., and the first and second fusion operations can be the same or different.
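
To make the difference between two of the listed fusion operations concrete, the sketch below contrasts MEAN fusion with a simplified attention-style fusion. The attention variant scores each vector against a context vector and takes a softmax-weighted sum; in a trained network the context vector and any projection weights would be learned, whereas here they are fixed by hand. The function names and the simplification are assumptions, not the exact Self_Attention operation of this application.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_fuse(vectors: List[Vector]) -> Vector:
    """MEAN fusion: every vector contributes equally."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def attention_fuse(vectors: List[Vector], context: Vector) -> Vector:
    """Attention-style fusion: vectors more aligned with `context` weigh more."""
    if not vectors:
        raise ValueError("need at least one vector")
    scores = [sum(math.tanh(x) * c for x, c in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(context))]


sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]
print(mean_fuse(sentence_vectors))                   # [0.5, 0.5]
print(attention_fuse(sentence_vectors, [0.0, 0.0]))  # [0.5, 0.5]
print(attention_fuse(sentence_vectors, [5.0, 0.0]))  # first vector dominates
```

With an all-zero context vector the attention weights are uniform, so the result coincides with MEAN fusion; a non-zero context shifts weight toward the vectors it matches.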

[0088] S5055, use the session expression sub-network to fuse the first session vector and the second session vector to obtain the target session vector.

[0089] The target session vector represents the session information of the M rounds of conversation. The session expression sub-network can fuse the first session vector and the second session vector directly to obtain the target session vector, or it can first let the two vectors interact and then fuse the result to obtain the target session vector.

[0090] As shown in Figure 9, in one possible implementation, step S5055 may include:

[0091] S50551, The first session vector and the second session vector are processed interactively according to the preset interaction method to obtain the third session vector;

[0092] S50552, perform a third fusion operation on the third session vector to obtain the target session vector.

[0093] Continuing with Figure 3, the session expression sub-network may include an interaction layer and a third fusion layer. The server can use the interaction layer to implement the operation in step S50551 and the third fusion layer to implement the operation in step S50552. The preset interaction method used in the interaction layer may include concat, Multi_Head_Attention, esim, etc., and the third fusion operation used in the third fusion layer may include Self_Attention, Label_Attention, LEAM, MEAN, Routing_Attention, etc.
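
Steps S50551 and S50552 can be sketched with the simplest listed options, concat for the interaction and MEAN for the third fusion. This application does not state the shape of the session vectors, so the sketch assumes that each side's session vector keeps one row per round; that assumption, along with the function names, is illustrative only.

```python
from typing import List

Vector = List[float]


def concat_interaction(agent_rounds: List[Vector],
                       user_rounds: List[Vector]) -> List[Vector]:
    """Interaction layer (S50551): join both sides round by round."""
    if len(agent_rounds) != len(user_rounds):
        raise ValueError("both sides must cover the same M rounds")
    return [a + u for a, u in zip(agent_rounds, user_rounds)]


def mean_fusion(rows: List[Vector]) -> Vector:
    """Third fusion layer (S50552): MEAN over the rounds."""
    if not rows:
        raise ValueError("need at least one round")
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


first_session = [[1.0, 2.0], [3.0, 4.0]]    # customer service side, M = 2
second_session = [[0.0, 1.0], [2.0, 3.0]]   # user side, M = 2

third_session = concat_interaction(first_session, second_session)
target_session_vector = mean_fusion(third_session)
print(target_session_vector)  # [2.0, 3.0, 1.0, 2.0]
```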

[0094] S507, the feature vector group is input into the feature encoding network, and the target feature vector is obtained through the feature encoding network. The feature encoding network and the text encoding network use different network models.

[0095] In this embodiment, since the feature vector group is composed of feature vectors corresponding to some numerical features and feature vectors corresponding to background features, the network model used by the feature encoding network is different from the network model used by the text encoding network. For example, the feature encoding network can use network models such as DCN, MLP, NFM, and DeepFM.
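
As an illustration of one of the named options, the sketch below implements the cross layers of a DCN (Deep & Cross Network), using the usual formulation $x_{l+1} = x_0 (x_l \cdot w_l) + b_l + x_l$. It covers only the cross branch; a full DCN also has a deep (MLP) branch, which is omitted. Flattening the feature vector group by concatenation, the weights, and the function names are assumptions for illustration, and the weights would normally be learned.

```python
from typing import List, Tuple

Vector = List[float]


def flatten(feature_group: List[Vector]) -> Vector:
    """Concatenate f_1, ..., f_K into a single input vector x0 (an assumption)."""
    return [x for vec in feature_group for x in vec]


def cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    if not (len(x0) == len(xl) == len(w) == len(b)):
        raise ValueError("all vectors must have the same length")
    scale = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: Vector, layers: List[Tuple[Vector, Vector]]) -> Vector:
    """Apply the cross layers in sequence; each layer is a (w, b) pair."""
    xl = x0
    for w, b in layers:
        xl = cross_layer(x0, xl, w, b)
    return xl


x0 = flatten([[1.0], [2.0]])
layer = ([0.5, 0.5], [0.0, 0.0])
print(cross_network(x0, [layer]))         # [2.5, 5.0]
print(cross_network(x0, [layer, layer]))  # [6.25, 12.5]
```

Each cross layer multiplies the original input by a scalar derived from the current state, which yields explicit feature crosses. This suits tabular numerical and background features better than the sequence encoders used for text.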

[0096] S509, determine the quality inspection result of the target service based on the target session vector and the target feature vector.

[0097] The server uses a combined classification layer to concatenate or pool the target session vector and the target feature vector to obtain the target classification vector. Then, the target classification vector is classified to determine the quality inspection result of the target service.

[0098] Therefore, in one possible implementation, as shown in Figure 10, step S509 may specifically include the following:

[0099] S5091, concatenate the target session vector and the target feature vector into a target classification vector;

[0100] S5092, call the target classification network to perform binary or multi-class classification on the target classification vector to obtain the quality inspection result of the target service.

[0101] As shown in Figure 4, the combined classification layer may include a feature combination layer and a classification layer. The feature combination layer can implement the operation of step S5091, and the classification layer can implement the operation of step S5092. The target classification network used in the classification layer can be a binary classification network or a multi-class classification network. Correspondingly, the quality inspection result can be a binary or multi-class classification result, with different classification results representing different service levels. For example, a binary classification result can be "like" or "dislike," and a multi-class classification result can be "very good," "good," "average," "poor," or "bad," etc. Understandably, the binary or multi-class classification result can also be any letter or number, from which the service level is then determined.
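
Steps S5091 and S5092 amount to a concatenation followed by a classifier. The sketch below uses a single linear layer with softmax as a stand-in for the target classification network; the weights, biases, and input values are hand-picked for illustration and are assumptions, not parameters from this application.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(session_vec: Vector, feature_vec: Vector,
             weights: List[Vector], bias: Vector,
             labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Concatenate the two vectors (S5091), then classify (S5092)."""
    x = session_vec + feature_vec  # target classification vector
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Binary case with the example labels from the text.
label, probs = classify(
    session_vec=[2.0, 0.0],
    feature_vec=[0.5],
    weights=[[1.0, 0.0, 0.0],   # row scoring "like"
             [0.0, 0.0, 1.0]],  # row scoring "dislike"
    bias=[0.0, 0.0],
    labels=["like", "dislike"],
)
print(label)  # like
```

Switching to the multi-class case only requires one weight row per label, for example five rows for "very good" through "bad".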

[0102] In practical applications, when users and customer service representatives engage in conversations, users can directly input conversational sentences or input voice messages. After the conversation ends, if each round of conversation includes voice messages, the client can convert the voice messages into conversational sentences and then send all conversational sentences from each round to the server. In some implementations, the client may not process the voice messages, and the server can convert the voice messages into conversational sentences.

[0103] Based on the above description, in one possible implementation, as shown in Figure 11, the following steps may also be performed before step S501:

[0104] S1101, Obtain session data for each session in the M rounds of sessions related to the target service;

[0105] S1102, For the conversation data of each round of conversation, the conversation speech in the conversation data is recognized by a preset speech recognition model in order to determine the sentence vector group of each round of conversation.

[0106] The session data of each round of the session may include session sentences and/or session voice. For the session data of each round, the server can determine the session sentence group corresponding to that round based on the session data, and then represent each session sentence in the group as a vector to obtain the sentence vector corresponding to that session sentence. The sentence vectors corresponding to all the session sentences constitute the sentence vector group of that round of the session.

[0107] To determine the conversation sentence group, the server first identifies whether the conversation data contains conversational speech. If it does, it uses a preset speech recognition model to convert the speech into conversational sentences. Then, it identifies whether the conversation data contains conversational sentences. If so, the converted conversational sentences and the conversational sentences in the conversation data are identified as the conversation sentence group corresponding to this round of conversation. If the conversation data does not contain conversational speech, meaning this round of dialogue only contains conversational sentences, then the conversational sentences in the conversation data are identified as the conversation sentence group corresponding to this round of conversation. Similarly, if the conversation data does not contain conversational sentences, meaning this round of dialogue only contains conversational speech, then the converted conversational sentences are identified as the conversation sentence group corresponding to this round of conversation.
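
The branching described above reduces to "convert whatever speech exists, then merge it with whatever text exists." A minimal sketch follows, in which `recognize` is a hypothetical callable standing in for the preset speech recognition model and the fake recognizer simply decodes bytes. The source does not state how converted and typed sentences are ordered within a round; placing converted sentences first is an assumption.

```python
from typing import Callable, List, Sequence


def build_sentence_group(sentences: Sequence[str],
                         voice_clips: Sequence[bytes],
                         recognize: Callable[[bytes], str]) -> List[str]:
    """Return the conversation sentence group for one round.

    Covers all three cases in the text: speech only, text only, or both.
    """
    converted = [recognize(clip) for clip in voice_clips]  # empty if no speech
    return converted + list(sentences)                     # empty if no text


def fake_recognizer(clip: bytes) -> str:
    """Placeholder for a real speech recognition model."""
    return clip.decode("utf-8")


print(build_sentence_group(["where is my order"], [b"hello"], fake_recognizer))
print(build_sentence_group(["where is my order"], [], fake_recognizer))
print(build_sentence_group([], [b"hello"], fake_recognizer))
```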

[0108] In the above embodiments, the service quality inspection model used by the server utilizes different encoding networks to fuse session text and session features, thereby obtaining the target service quality inspection result. To demonstrate that the quality inspection result obtained by fusing different encoding networks has higher accuracy, the performance of service quality inspection models constructed using different combinations of encoding networks was evaluated for the same business data. Table 1 shows the performance evaluation results of the service quality inspection models.

[0109] Table 1

[0110]

[0111] Here, Precision denotes the model's precision, Recall denotes the model's recall, and the F1 score is the harmonic mean of precision and recall, serving as a combined evaluation index of the two. As can be seen from Table 1, the service quality inspection model achieves higher accuracy by fusing multiple kinds of features, such as numerical features, multi-turn text features, and background features, and combining them with different encoding networks.
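
For reference, the three metrics can be computed from confusion counts as in the sketch below. The counts in the usage example are invented for illustration and are not taken from Table 1.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 8 correct flags, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.8 0.8
```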

[0112] As shown in Table 1, the F1 score is lowest when quality inspection relies solely on extracted features without the multi-turn text content, that is, when only the feature encoding network (DCN alone) is used. In contrast, the service quality inspection models that combine multi-turn text with features through multiple networks consistently achieve higher F1 scores than the feature-only model, with accuracy exceeding 80%. This is attributable not only to the effect of multi-model fusion but also to the fact that the layered architecture of the hierarchical attention network (HAN) is well suited to hierarchical data such as conversations.

[0113] Because this service quality inspection model has a high accuracy rate, it can greatly reduce the pressure of manual quality inspection when inspecting target services. In practice, the model can be used for initial screening, followed by re-inspection by quality inspectors to achieve 100% quality inspection coverage of the target service.

[0114] As can be seen from the technical solutions provided in the above embodiments, the embodiments of this application utilize a text encoding network to process sentence vector groups to obtain target session vectors, and utilize a feature encoding network, which is different from the text encoding network, to process feature vector groups to obtain target feature vectors. Based on the target session vectors and target feature vectors, the quality inspection results of the target service are determined. A multi-encoding network is used to process the session text (sentence vector groups) and session features (feature vector groups), and the dialogue text is divided into user-side and customer service-side. Through multi-dimensional information input, the service quality inspection model can learn more and finer features from the M rounds of conversation between the user and customer service, thereby obtaining a more explicit classification result. Furthermore, the text encoding network is built based on a hierarchical attention network. The hierarchical architecture of the hierarchical attention network can better encode hierarchical data such as session text, resulting in better classification performance and improved quality inspection accuracy. Simultaneously, both the sentence vector groups and feature vector groups are obtained from multiple rounds of conversation related to the target service, and are not limited by subjective data such as user feedback data or special business requirements, thus improving usability.

[0115] Based on the same inventive concept as the above-described method embodiments, this application also provides a service quality inspection device. As shown in Figure 12, the device 1200 may include:

[0116] The text vector acquisition module 1210 is used to acquire the sentence vector group of each session in the M rounds of sessions related to the target service. The sentence vector group includes the customer service side sentence vector group and the user side sentence vector group, where M is a natural number greater than or equal to 1.

[0117] The feature vector acquisition module 1220 is used to acquire the feature vector group obtained by extracting features from the session data of M rounds of sessions;

[0118] The text encoding module 1230 is used to input M customer service-side sentence vector groups and M user-side sentence vector groups into the text encoding network, and obtain the target session vector through the text encoding network. The text encoding network is pre-constructed based on a hierarchical attention network.

[0119] The feature encoding module 1240 is used to input M feature vector groups into the feature encoding network and obtain the target feature vector through the feature encoding network. The feature encoding network and the text encoding network use different network models.

[0120] The service quality inspection module 1250 is used to determine the quality inspection result of the target service based on the target session vector and the target feature vector.
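
The way the five modules hand data to one another can be sketched as a simple composition. The class name, the attribute names, and the stub callables below are hypothetical; each stub stands in for a module whose real behavior is described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]
Groups = List[List[Vector]]  # M sentence vector groups for one side


@dataclass
class QualityInspectionDevice:
    """Wires modules 1210 to 1250 together (illustrative only)."""
    get_sentence_vectors: Callable[[str], Tuple[Groups, Groups]]  # module 1210
    get_feature_vectors: Callable[[str], List[Vector]]            # module 1220
    encode_text: Callable[[Groups, Groups], Vector]               # module 1230
    encode_features: Callable[[List[Vector]], Vector]             # module 1240
    decide: Callable[[Vector, Vector], str]                       # module 1250

    def inspect(self, ticket_id: str) -> str:
        agent_groups, user_groups = self.get_sentence_vectors(ticket_id)
        feature_group = self.get_feature_vectors(ticket_id)
        session_vec = self.encode_text(agent_groups, user_groups)
        feature_vec = self.encode_features(feature_group)
        return self.decide(session_vec, feature_vec)


# Stub modules with made-up values, just to show the data flow.
device = QualityInspectionDevice(
    get_sentence_vectors=lambda ticket: ([[[1.0]]], [[[0.5]]]),
    get_feature_vectors=lambda ticket: [[0.2], [0.4]],
    encode_text=lambda agent, user: [agent[0][0][0] + user[0][0][0]],
    encode_features=lambda group: [sum(vec[0] for vec in group)],
    decide=lambda s, f: "like" if s[0] + f[0] > 1.0 else "dislike",
)
print(device.inspect("ticket-001"))  # like
```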

[0121] In one possible implementation, the text encoding network may include a first encoding subnetwork, a second encoding subnetwork, and a session representation subnetwork. Correspondingly, the text encoding module 1230 may include:

[0122] The customer service side encoding unit is used to perform encoding and fusion processing on M customer service side sentence vector groups using the first encoding sub-network to obtain the first conversation vector;

[0123] The user-side coding unit is used to perform coding fusion processing on M user-side sentence vector groups using the second coding sub-network to obtain the second session vector;

[0124] The target vector generation unit is used to fuse the first session vector and the second session vector through the session expression subnetwork to obtain the target session vector.

[0125] In one possible implementation, the customer service-side coding unit may include:

[0126] The first encoding unit is used to encode each customer service sentence vector in each customer service sentence vector group according to a first preset encoding method to obtain the encoding vector corresponding to the customer service sentence vector.

[0127] The first fusion unit is used to perform a first fusion operation on the encoding vectors corresponding to each customer service sentence vector in each customer service sentence vector group to obtain the fusion vector corresponding to the customer service sentence vector group.

[0128] The second encoding unit is used to encode the fusion vectors corresponding to the M customer service side sentence vector groups according to the second preset encoding method to obtain the first conversation vector.

[0129] In one possible implementation, the user-side encoding unit may include:

[0130] The third encoding unit is used to encode each user-side sentence vector in each user-side sentence vector group according to the third preset encoding method to obtain the encoding vector corresponding to the user-side sentence vector.

[0131] The second fusion unit is used to perform a second fusion operation on the encoding vectors corresponding to each user-side sentence vector in each user-side sentence vector group to obtain the fusion vector corresponding to the user-side sentence vector group.

[0132] The fourth encoding unit is used to encode the fusion vectors corresponding to the M user-side sentence vector groups according to the fourth preset encoding method to obtain the second session vector.

[0133] In one possible implementation, the target vector generation unit may include:

[0134] An interaction unit is used to process the first session vector and the second session vector in a preset interaction manner to obtain a third session vector.

[0135] The third fusion unit is used to perform a third fusion operation on the third session vector to obtain the target session vector.

[0136] In one possible implementation, the service quality inspection module 1250 may include:

[0137] The vector combination unit is used to concatenate the target session vector and the target feature vector into a target classification vector.

[0138] The classification unit is used to call the target classification network to perform binary or multi-class classification on the target classification vector to obtain the quality inspection result of the target service.

[0139] In one possible implementation, as shown in Figure 13, the device 1200 may further include:

[0140] The session data acquisition module 1260 is used to acquire session data for each round of sessions in the M rounds of sessions related to the target service;

[0141] The text vector generation module 1270 is used to recognize the conversational speech in the conversational data of each round of conversation using a preset speech recognition model, so as to determine the sentence vector group of each round of conversation.

[0142] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0143] This application also provides an electronic device, which includes a processor and a memory. The memory stores at least one instruction or at least one program, which is loaded by the processor and executed to perform the service quality inspection method provided in the above-described method embodiments.

[0144] Furthermore, Figure 14 shows a schematic diagram of the hardware structure of a device for implementing the method provided in the embodiments of this application. This device can participate in or include the apparatus or system provided in the embodiments of this application. As shown in Figure 14, device 14 may include one or more processors 1402 (shown as 1402a, 1402b, ..., 1402n in the figure; processor 1402 may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1404 for storing data, and a transmission device 1406 for communication functions. In addition, it may also include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 14 is for illustrative purposes only and does not limit the structure of the electronic device described above. For example, device 14 may include more or fewer components than those shown in Figure 14, or have a configuration different from that shown in Figure 14.

[0145] It should be noted that the aforementioned one or more processors 1402 and / or other data processing circuitry are generally referred to herein as "data processing circuitry". This data processing circuitry may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuitry may be a single, independent processing module, or may be integrated, in whole or in part, into any other element within device 14 (or mobile device). As involved in the embodiments of this application, this data processing circuitry serves as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).

[0146] The memory 1404 can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the method described in the embodiments of this application. The processor 1402 executes various functional applications and data processing by running the software programs and modules stored in the memory 1404, thereby realizing the above-mentioned service quality inspection method. The memory 1404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 1404 may further include memory remotely located relative to the processor 1402, and these remote memories can be connected to the device 14 via a network. Examples of the above-mentioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0147] The transmission device 1406 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of device 14. In one example, the transmission device 1406 includes a network interface controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 1406 may be a radio frequency (RF) module used for wireless communication with the Internet.

[0148] The display may be, for example, a touchscreen liquid crystal display (LCD) that allows a user to interact with the user interface of device 14 (or a mobile device).

[0149] This application also provides a computer storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement the service quality inspection method provided in the above method embodiments.

[0150] Optionally, in this embodiment, the aforementioned computer storage medium may be located on at least one of multiple network servers in a computer network. Optionally, in this embodiment, the aforementioned storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0151] This application also provides a computer program product or computer program, which includes computer instructions stored in a computer storage medium. The processor of an electronic device reads the computer instructions from the computer storage medium and executes the computer instructions, causing the electronic device to perform the service quality inspection method provided in the above-described method embodiments.

[0152] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, specific embodiments have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired result. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0153] The various embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another. Each embodiment focuses on its differences from the other embodiments. In particular, the apparatus and electronic device embodiments are basically similar to the method embodiments, so their descriptions are relatively brief; for the relevant parts, refer to the descriptions of the method embodiments.

[0154] The foregoing description has fully disclosed the specific embodiments of this application. It should be noted that any modifications made by those skilled in the art to the specific embodiments of this application do not depart from the scope of the claims. Accordingly, the scope of the claims of this application is not limited to the foregoing specific embodiments.

Claims

1. A service quality inspection method, characterized in that the method includes: obtaining a sentence vector group for each round of conversation in M rounds of conversation related to a target service, the sentence vector group including a customer service-side sentence vector group and a user-side sentence vector group, where M is a natural number greater than or equal to 1; obtaining a feature vector group obtained by performing feature extraction on the M rounds of conversation, the feature vector group including feature vectors composed of background features, the background features representing background information of the customer service representative, background information of the user, or attribute information of the conversation business; encoding and fusing the M customer service-side sentence vector groups using a first encoding subnetwork to obtain a first conversation vector; encoding and fusing the M user-side sentence vector groups using a second encoding subnetwork to obtain a second conversation vector; fusing the first conversation vector and the second conversation vector using a conversation expression subnetwork to obtain a target conversation vector that indicates conversation-level features, wherein the first encoding subnetwork, the second encoding subnetwork, and the conversation expression subnetwork belong to a text encoding network, and the text encoding network is pre-constructed based on a hierarchical attention network; inputting the feature vector group into a feature encoding network to obtain a target feature vector, the feature encoding network using a different network model than the text encoding network; and determining a quality inspection result of the target service based on the target conversation vector and the target feature vector.

2. The method according to claim 1, characterized in that encoding and fusing the M customer service-side sentence vector groups using the first encoding subnetwork to obtain the first conversation vector includes: for each customer service-side sentence vector in each customer service-side sentence vector group, encoding the customer service-side sentence vector according to a first preset encoding method to obtain an encoding vector corresponding to the customer service-side sentence vector; for each customer service-side sentence vector group, performing a first fusion operation on the encoding vectors corresponding to the customer service-side sentence vectors in the group to obtain a fusion vector corresponding to the group; and encoding the fusion vectors corresponding to the M customer service-side sentence vector groups according to a second preset encoding method to obtain the first conversation vector.

3. The method according to claim 1, characterized in that encoding and fusing the M user-side sentence vector groups using the second encoding subnetwork to obtain the second conversation vector includes: for each user-side sentence vector in each user-side sentence vector group, encoding the user-side sentence vector according to a third preset encoding method to obtain an encoding vector corresponding to the user-side sentence vector; for each user-side sentence vector group, performing a second fusion operation on the encoding vectors corresponding to the user-side sentence vectors in the group to obtain a fusion vector corresponding to the group; and encoding the fusion vectors corresponding to the M user-side sentence vector groups according to a fourth preset encoding method to obtain the second conversation vector.
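
Claims 2 and 3 recite the same three-stage pattern for each side of the conversation: encode each sentence vector, fuse the encodings within a round, then encode the M fused vectors. The following is a minimal illustrative sketch of that pattern, not the claimed implementation. It assumes a tanh linear projection as the "preset encoding method" and softmax attention pooling as the "fusion operation"; the final pooling over rounds, and all names (`linear_tanh`, `attention_pool`, `encode_side`, the weight matrices), are hypothetical choices made for illustration.

```python
import math
import random

def linear_tanh(vec, weights):
    """Encode one vector as tanh(W @ vec); an assumed stand-in for a preset encoding method."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]

def attention_pool(vectors, query):
    """Fuse vectors into one via softmax attention against a query (assumed fusion operation)."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(vectors[0])
    return [sum(e / total * vec[i] for e, vec in zip(exps, vectors)) for i in range(dim)]

def encode_side(rounds, w_sentence, q_sentence, w_round, q_round):
    """rounds: M groups, each a list of sentence vectors from one side of the conversation."""
    fused = []
    for group in rounds:
        encoded = [linear_tanh(s, w_sentence) for s in group]  # per-sentence encoding
        fused.append(attention_pool(encoded, q_sentence))       # fusion within one round
    round_encoded = [linear_tanh(f, w_round) for f in fused]    # encoding across rounds
    # Assumption: pool the M round encodings into a single conversation vector.
    return attention_pool(round_encoded, q_round)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
IN_DIM, HID_DIM = 4, 3
w_sentence = random_matrix(HID_DIM, IN_DIM, rng)
w_round = random_matrix(HID_DIM, HID_DIM, rng)
q_sentence = [rng.uniform(-1, 1) for _ in range(HID_DIM)]
q_round = [rng.uniform(-1, 1) for _ in range(HID_DIM)]

# M = 2 rounds: the first round has two sentences, the second has one.
agent_rounds = [
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]],
    [[0.9, 0.8, 0.7, 0.6]],
]
first_conversation_vector = encode_side(
    agent_rounds, w_sentence, q_sentence, w_round, q_round
)
```

The user-side vector would be produced by calling `encode_side` again with a separate set of weights, mirroring the separate second encoding subnetwork.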

4. The method according to claim 1, characterized in that determining the quality inspection result of the target service based on the target conversation vector and the target feature vector includes: concatenating the target conversation vector and the target feature vector to form a target classification vector; and invoking a target classification network to perform binary or multi-class classification on the target classification vector to obtain the quality inspection result of the target service.
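
The concatenate-then-classify step of claim 4 can be illustrated with a small sketch. It assumes a single linear layer followed by softmax as the classification network; the claim does not specify the network, and the weights, vector values, and function names below are hypothetical.

```python
import math

def concatenate(conversation_vec, feature_vec):
    """Join the conversation-level vector and the feature vector end to end."""
    return list(conversation_vec) + list(feature_vec)

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(classification_vec, weights, biases):
    """Assumed classifier: one linear layer plus softmax; returns (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, classification_vec)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Illustrative values only.
target_conversation_vector = [0.2, -0.1, 0.4]
target_feature_vector = [1.0, 0.0]
target_classification_vector = concatenate(
    target_conversation_vector, target_feature_vector
)

# Two output rows give binary classification; more rows would give multi-class.
weights = [
    [0.5, 0.1, -0.3, 0.2, 0.0],
    [-0.5, -0.1, 0.3, -0.2, 0.0],
]
biases = [0.0, 0.0]
label, probs = classify(target_classification_vector, weights, biases)
```

Adding rows to `weights` and `biases` extends the same sketch from binary to multi-class classification.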

5. The method according to claim 1, characterized in that, before obtaining the sentence vector group for each round of conversation in the M rounds of conversation related to the target service, the method further includes: obtaining conversation data for each round of conversation in the M rounds of conversation related to the target service; and, for the conversation data of each round of conversation, using a preset speech recognition model to recognize the conversation speech in the conversation data so as to determine the sentence vector group of that round of conversation.

6. A service quality inspection apparatus, characterized in that the apparatus includes: a text vector acquisition module, used to acquire a sentence vector group for each round of conversation in M rounds of conversation related to a target service, the sentence vector group including a customer service-side sentence vector group and a user-side sentence vector group, where M is a natural number greater than or equal to 1; a feature vector acquisition module, used to acquire a feature vector group obtained by performing feature extraction on the M rounds of conversation, the feature vector group including feature vectors composed of background features, the background features representing background information of the customer service representative, background information of the user, or attribute information of the conversation business; a text encoding module, which specifically includes: a customer service-side encoding unit, used to encode and fuse the M customer service-side sentence vector groups using a first encoding subnetwork to obtain a first conversation vector; a user-side encoding unit, used to encode and fuse the M user-side sentence vector groups using a second encoding subnetwork to obtain a second conversation vector; and a target vector generation unit, used to fuse the first conversation vector and the second conversation vector through a conversation expression subnetwork to obtain a target conversation vector that indicates conversation-level features, wherein the first encoding subnetwork, the second encoding subnetwork, and the conversation expression subnetwork belong to a text encoding network, and the text encoding network is pre-constructed based on a hierarchical attention network; a feature encoding module, used to input the feature vector group into a feature encoding network and obtain a target feature vector through the feature encoding network, the feature encoding network and the text encoding network using different network models; and a service quality inspection module, used to determine the quality inspection result of the target service based on the target conversation vector and the target feature vector.

7. The apparatus according to claim 6, characterized in that the customer service-side encoding unit includes: a first encoding unit, used to encode each customer service-side sentence vector in each customer service-side sentence vector group according to a first preset encoding method to obtain an encoding vector corresponding to the customer service-side sentence vector; a first fusion unit, used to perform a first fusion operation on the encoding vectors corresponding to the customer service-side sentence vectors in each customer service-side sentence vector group to obtain a fusion vector corresponding to the group; and a second encoding unit, used to encode the fusion vectors corresponding to the M customer service-side sentence vector groups according to a second preset encoding method to obtain the first conversation vector.

8. The apparatus according to claim 6, characterized in that the user-side encoding unit includes: a third encoding unit, used to encode each user-side sentence vector in each user-side sentence vector group according to a third preset encoding method to obtain an encoding vector corresponding to the user-side sentence vector; a second fusion unit, used to perform a second fusion operation on the encoding vectors corresponding to the user-side sentence vectors in each user-side sentence vector group to obtain a fusion vector corresponding to the group; and a fourth encoding unit, used to encode the fusion vectors corresponding to the M user-side sentence vector groups according to a fourth preset encoding method to obtain the second conversation vector.

9. The apparatus according to claim 6, characterized in that the target vector generation unit includes: an interaction unit, used to process the first conversation vector and the second conversation vector in a preset interaction manner to obtain a third conversation vector; and a third fusion unit, used to perform a third fusion operation on the third conversation vector to obtain the target conversation vector.
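
Claim 9 leaves both the "preset interaction manner" and the "third fusion operation" open. As one hedged illustration only, the sketch below assumes that the interaction stacks the two per-side vectors with their absolute difference and element-wise product, and that the fusion is an element-wise mean over that stack. Both choices and all names are hypothetical, not the claimed design.

```python
def interact(customer_service_vec, user_vec):
    """Assumed interaction: stack both vectors, their absolute difference, and their product."""
    diff = [abs(a - b) for a, b in zip(customer_service_vec, user_vec)]
    prod = [a * b for a, b in zip(customer_service_vec, user_vec)]
    return [list(customer_service_vec), list(user_vec), diff, prod]

def fuse_mean(stacked):
    """Assumed fusion: element-wise mean over the stacked interaction vectors."""
    count = len(stacked)
    dim = len(stacked[0])
    return [sum(vec[i] for vec in stacked) / count for i in range(dim)]

# Illustrative per-side vectors.
customer_service_vec = [1.0, 2.0]
user_vec = [3.0, 0.5]

third_vector = interact(customer_service_vec, user_vec)
target_vector = fuse_mean(third_vector)
print(target_vector)  # [2.25, 1.25]
```

A learned fusion, such as attention over the stacked vectors, could replace the mean without changing the overall two-step shape.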

10. The apparatus according to claim 6, characterized in that the service quality inspection module includes: a vector combination unit, used to concatenate the target conversation vector and the target feature vector into a target classification vector; and a classification unit, used to invoke a target classification network to perform binary or multi-class classification on the target classification vector to obtain the quality inspection result of the target service.

11. The apparatus according to claim 6, characterized in that the apparatus further includes: a conversation data acquisition module, used to acquire conversation data for each round of conversation in the M rounds of conversation related to the target service; and a text vector generation module, used to recognize the conversation speech in the conversation data of each round of conversation using a preset speech recognition model, so as to determine the sentence vector group of that round of conversation.

12. An electronic device, characterized in that the device includes a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the service quality inspection method as described in any one of claims 1-5.

13. A computer storage medium, characterized in that the computer storage medium stores at least one instruction or at least one program, which is loaded and executed by a processor to implement the service quality inspection method as described in any one of claims 1-5.

14. A computer program product, characterized in that the computer program product includes computer instructions stored in a computer storage medium, wherein a processor of an electronic device reads the computer instructions from the computer storage medium and executes the computer instructions, causing the electronic device to perform the service quality inspection method according to any one of claims 1-5.