Work order classification model training method and apparatus, electronic device, and storage medium
Sample screening and data augmentation address the long-tail problem of training data in the customer service field, so that the work order classification model can comprehensively learn the characteristics of various work orders, which avoids overfitting and improves classification accuracy.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2022-05-16
- Publication Date: 2026-06-26
AI Technical Summary
In the customer service field, existing technologies suffer from the long-tail problem of training data, resulting in poor model performance, inability to effectively learn the characteristics of various types of work orders, and a tendency to overfit, which affects the accuracy of the model.
Reference work order samples that meet certain conditions are selected, and their dialogue text information is augmented using preset data augmentation strategies (synonym replacement, paragraph crossing, and information mask reconstruction), which expands the number of work order samples and balances the training data.
This effectively addresses the long-tail problem in the training data, enables the model to fully learn the characteristics of each type of work order, avoids overfitting, and improves the accuracy of work order classification.
Figure CN115329068B_ABST: abstract drawing
Description
Technical Field
[0001] This application relates to the field of computer technology, and more particularly to the field of artificial intelligence technology, and provides a method, apparatus, electronic device, and storage medium for training a work order classification model.
Background Technology
[0002] In the customer service field, scenarios such as archiving, recommending similar work orders, and classifying multi-turn dialogue intent all require customer service work orders as raw data for model training.
[0003] Take the intelligent work order archiving model in a customer service system as an example. Related archiving systems mainly use a large number of manually archived historical service work orders as the original training data. This training data, such as customer service work orders, consists mainly of long dialogues with many rounds, a fixed context, and clear demands. A multi-turn dialogue model can be used as the original model for training on the multi-classification task.
[0004] Because the scenario involves complex business processes and the archive directory changes frequently, thousands of archive paths are generated during the application process, making the learning process too difficult. Furthermore, the usage rates of different archive paths vary greatly, resulting in a serious long-tail problem (meaning that a small number of categories occupy the vast majority of samples, while a large number of categories have only a small number of samples), which causes the model performance to fall short of expectations.
[0005] Therefore, how to address the impact of long-tail problems in training data on model training and improve model accuracy is an urgent issue to be addressed.
Summary of the Invention
[0006] This application provides a work order classification model training method, apparatus, electronic device, and storage medium to improve the accuracy of the model.
[0007] This application provides a method for training a work order classification model, including:
[0008] Obtain a sample set of work orders. Each work order sample includes: the category tag of the corresponding customer service work order, and the dialogue text information between the business processing object and the business service object of the corresponding customer service work order. The dialogue text information is obtained based on the customer service conversation recorded in the corresponding customer service work order.
[0009] Based on the category labels of each work order sample, at least one reference work order sample to be expanded is selected from the work order sample set.
[0010] Based on a preset data augmentation strategy, data augmentation is performed on the dialogue text information in at least one reference work order sample to obtain the corresponding extended work order sample. The preset data augmentation strategy is used to indicate that non-critical information is replaced in the dialogue text information. The non-critical information is information that does not change the semantics of the dialogue text information before and after the replacement.
[0011] The model is trained based on each work order sample and the obtained extended work order samples to obtain a trained work order classification model. The work order classification model is used to determine the work order category to which the customer service work order to be classified belongs.
[0012] This application provides a work order classification model training device, comprising:
[0013] The acquisition unit is used to acquire a work order sample set. Each work order sample includes: the category tag of the corresponding customer service work order, and the dialogue text information between the business processing object and the business service object of the corresponding customer service work order. The dialogue text information is obtained based on the customer service conversation recorded in the corresponding customer service work order.
[0014] A filtering unit is used to filter out at least one reference work order sample to be expanded from the work order sample set based on the category label of each work order sample.
[0015] An augmentation unit is used to augment the dialogue text information in at least one reference work order sample based on a preset data augmentation strategy to obtain corresponding extended work order samples. The preset data augmentation strategy is used to indicate that non-critical information in the dialogue text information is replaced. The non-critical information is information whose replacement does not change the semantics of the dialogue text information.
[0016] The training unit is used to train the model based on each work order sample and the obtained extended work order samples to obtain a trained work order classification model. The work order classification model is used to determine the work order category to which the customer service work order to be classified belongs.
[0017] Optionally, the preset data augmentation strategy includes at least one of the following:
[0018] A synonym replacement strategy for replacing non-critical information in the dialogue text information with synonyms;
[0019] A paragraph crossing strategy for crossing non-critical information in the dialogue text information;
[0020] An information mask reconstruction strategy for masking and reconstructing non-critical information in the dialogue text information.
[0021] Optionally, the preset data augmentation strategy includes a synonym replacement strategy;
[0022] The augmentation unit is specifically used to perform the following operations on some or all of the at least one reference work order sample:
[0023] For a reference work order sample, based on a preset thesaurus, at least one business-related word in the dialogue text information of the reference work order sample is replaced with a synonym to obtain the corresponding extended work order sample.
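The synonym replacement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the thesaurus contents, the function name `synonym_replace`, and the `max_replacements` parameter are hypothetical, and the dialogue text is assumed to be already segmented into words.

```python
import random

# Hypothetical preset thesaurus: business-related word -> interchangeable synonyms.
THESAURUS = {
    "refund": ["reimbursement", "repayment"],
    "account": ["profile"],
}

def synonym_replace(tokens, thesaurus, rng, max_replacements=1):
    """Return a copy of tokens with up to max_replacements thesaurus words swapped."""
    # Positions of words that have an entry in the thesaurus.
    candidates = [i for i, tok in enumerate(tokens) if tok in thesaurus]
    rng.shuffle(candidates)
    out = list(tokens)
    for i in candidates[:max_replacements]:
        out[i] = rng.choice(thesaurus[tokens[i]])
    return out

rng = random.Random(0)
original = ["I", "want", "a", "refund", "for", "my", "account"]
augmented = synonym_replace(original, THESAURUS, rng, max_replacements=2)
```

The extended sample keeps the category label of the reference sample; only the dialogue text changes.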
[0024] Optionally, the preset data augmentation strategy includes a paragraph crossing strategy;
[0025] The augmentation unit is specifically used to perform the following operations on some or all of the at least one reference work order sample:
[0026] For two reference work order samples with the same category label, the dialogue texts belonging to the same dialogue publisher in the dialogue text information of the two reference work order samples are crossed to obtain the corresponding extended work order sample; the dialogue publisher is the business processing object or the business service object.
[0027] Optionally, the dialogue text information includes: at least one round of dialogue text between the business processing object and the business service object;
[0028] The augmentation unit is specifically used to perform dialogue crossover in at least one of the following ways:
[0029] Exchange in parallel the dialogue texts published by the same dialogue publisher in the same dialogue round of the two reference work order samples.
[0030] Randomly swap the dialogue texts published by the same dialogue publisher in different dialogue rounds of the two reference work order samples.
[0031] Randomly insert dialogue text from one of the two reference work order samples into the dialogue text published by the same dialogue publisher in the other reference work order sample.
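The three dialogue crossing modes above can be sketched as follows. This is an illustrative sketch only: a dialogue is assumed to be a list of `(publisher, text)` turns, both samples are assumed to share a category label, and all function names are hypothetical. Placing the inserted text directly after one of the publisher's own turns is also an assumption.

```python
import random

def speaker_turns(dialogue, speaker):
    """Indices of the turns published by `speaker`."""
    return [i for i, (s, _) in enumerate(dialogue) if s == speaker]

def parallel_swap(a, b, speaker, k):
    """Exchange the k-th text of `speaker` between the two dialogues."""
    a, b = list(a), list(b)
    ia, ib = speaker_turns(a, speaker)[k], speaker_turns(b, speaker)[k]
    a[ia], b[ib] = b[ib], a[ia]
    return a, b

def random_swap(a, b, speaker, rng):
    """Exchange texts of `speaker` taken from different rounds of the two dialogues.

    Assumes `speaker` has at least two turns in `b`.
    """
    a, b = list(a), list(b)
    ta, tb = speaker_turns(a, speaker), speaker_turns(b, speaker)
    ka = rng.randrange(len(ta))
    kb = rng.choice([k for k in range(len(tb)) if k != ka])
    a[ta[ka]], b[tb[kb]] = b[tb[kb]], a[ta[ka]]
    return a, b

def random_insert(source, target, speaker, rng):
    """Insert one text of `speaker` from `source` after one of the same speaker's turns in `target`."""
    target = list(target)
    donor = source[rng.choice(speaker_turns(source, speaker))]
    pos = rng.choice(speaker_turns(target, speaker))
    target.insert(pos + 1, donor)
    return target

a = [("user", "a-u0"), ("agent", "a-g0"), ("user", "a-u1"), ("agent", "a-g1")]
b = [("user", "b-u0"), ("agent", "b-g0"), ("user", "b-u1"), ("agent", "b-g1")]
rng = random.Random(0)
pa, pb = parallel_swap(a, b, "user", 1)
ra, rb = random_swap(a, b, "user", rng)
ins = random_insert(a, b, "agent", rng)
```

Each function returns new lists, so the reference samples remain available alongside the extended ones.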
[0032] Optionally, the preset data augmentation strategy includes an information mask reconstruction strategy;
[0033] The augmentation unit is specifically used to perform the following operations on some or all of the at least one reference work order sample:
[0034] For a reference work order sample, the word vectors of each word segment in the dialogue text information of the reference work order sample are obtained through word vector mapping;
[0035] Based on the word vectors of each segmented word, determine the mask probability of each target information in the dialogue text information, where the target information is a segmented word or dialogue text;
[0036] Based on the mask probability of each target information, at least one target information in the dialogue text information is masked and reconstructed to obtain the corresponding extended work order sample.
[0037] Optionally, the augmentation unit is specifically used for:
[0038] Based on the word vectors of each word segment, the significance coefficient of each target information is determined. The significance coefficient is used to characterize the importance of the target information to the work order classification result.
[0039] Each mask probability is determined based on a significance coefficient, and the mask probability is inversely proportional to the corresponding significance coefficient.
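One way to read the inverse relationship between mask probability and significance coefficient is sketched below. The normalization, the `eps` guard, the fixed number of masked positions, and the `[MASK]` placeholder are assumptions for illustration; the reconstruction step, which would need a language model, is not shown.

```python
import numpy as np

def mask_probabilities(significance, eps=1e-8):
    """Probabilities proportional to 1 / significance, normalized to sum to 1."""
    s = np.asarray(significance, dtype=float)
    inv = 1.0 / (s + eps)  # eps avoids division by zero (assumption)
    return inv / inv.sum()

def choose_mask_positions(probs, n_mask, rng):
    """Sample n_mask distinct positions according to the mask probabilities."""
    picked = rng.choice(len(probs), size=n_mask, replace=False, p=probs)
    return sorted(picked.tolist())

tokens = ["please", "help", "me", "reset", "password"]
significance = [0.1, 0.2, 0.1, 0.9, 0.8]  # hypothetical coefficients
probs = mask_probabilities(significance)
rng = np.random.default_rng(0)
positions = choose_mask_positions(probs, n_mask=2, rng=rng)
masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]
```

Words with a high significance coefficient, such as "reset" here, receive a low mask probability and are therefore more likely to survive unchanged.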
[0040] Optionally, the augmentation unit is specifically used for:
[0041] Based on the classification probability of the reference work order sample and the information vector of each target information, the significance coefficient corresponding to each target information is determined respectively; the classification probability is obtained based on the prediction of the work order classification model.
[0042] If the target information is a segmented word, the information vector is a word vector; if the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of the segmented words in that dialogue text.
[0043] Optionally, the augmentation unit is further configured to:
[0044] After determining the significance coefficient of each target information based on the word vectors of each word segment, the corresponding significance coefficient covariance matrix is determined based on the significance coefficient of each target information.
[0045] Based on the significance coefficient covariance matrix, the updated significance coefficients corresponding to each target information are determined;
[0046] The augmentation unit is specifically used for:
[0047] Each mask probability is determined based on the updated significance coefficient, and the mask probability is inversely proportional to the corresponding updated significance coefficient.
[0048] Optionally, the augmentation unit is specifically used to perform the following operations for each piece of target information:
[0049] For a target information, the information vector of the target information is modified multiple times based on the significance coefficient covariance matrix;
[0050] Based on the classification probability of the reference work order sample and the corrected information vectors, the intermediate significance coefficients corresponding to the target information are determined respectively.
[0051] The mean of the intermediate significance coefficients is used as the updated significance coefficient corresponding to the target information; the classification probability is obtained based on the work order classification model prediction.
[0052] Optionally, the augmentation unit is specifically used for:
[0053] Multiple Gaussian noises corresponding to a single target information are obtained by using a Gaussian distribution determined based on the significance coefficient covariance matrix; the variance of the Gaussian distribution is the sum of the diagonal elements of the significance coefficient covariance matrix.
[0054] The information vector of the target information is corrected once based on each Gaussian noise.
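The noise-based update of a significance coefficient described above can be sketched as follows. Several points are assumptions rather than statements of this application: the correction is taken to be additive, `toy_saliency` is a hypothetical stand-in for a coefficient derived from a classifier's prediction, and the covariance matrix is supplied as a made-up input.

```python
import numpy as np

W = np.array([0.5, -1.0, 2.0])  # weights of a hypothetical toy classifier

def toy_saliency(vec):
    """Stand-in significance coefficient: gradient norm of sigmoid(W . vec)."""
    p = 1.0 / (1.0 + np.exp(-W @ vec))
    return float(p * (1.0 - p) * np.linalg.norm(W))

def updated_significance(vec, saliency_fn, cov, n_samples, rng):
    """Average the coefficient over several noise-corrected copies of vec."""
    # Variance of the Gaussian = sum of the diagonal elements of the covariance matrix.
    variance = float(np.trace(cov))
    noises = rng.normal(0.0, np.sqrt(variance), size=(n_samples, vec.shape[0]))
    # One corrected vector per noise draw, one intermediate coefficient each.
    intermediate = [saliency_fn(vec + noise) for noise in noises]
    return float(np.mean(intermediate))

rng = np.random.default_rng(0)
vec = np.array([0.2, 0.1, -0.3])
cov = np.array([[0.02, 0.01], [0.01, 0.03]])  # hypothetical covariance matrix
smoothed = updated_significance(vec, toy_saliency, cov, n_samples=8, rng=rng)
```

Averaging over noisy copies makes the coefficient less sensitive to small perturbations of a single information vector.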
[0055] Optionally, the augmentation unit is specifically used for:
[0056] If the target information is a dialogue text, the significance coefficient covariance matrix is generated based on the covariance between the significance coefficients of the dialogue texts in the dialogue text information.
[0057] If the target information is a segmented word, the significance coefficient covariance matrices correspond one-to-one with the dialogue texts in the dialogue text information. Each significance coefficient covariance matrix is generated based on the covariance between the significance coefficients of the segmented words in the corresponding dialogue text.
[0058] Optionally, the augmentation unit is further configured to determine the information vector of the target information in the following ways:
[0059] If the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of the segmented words in that dialogue text;
[0060] If the target information is a segmented word, the information vector is a word vector.
[0061] Optionally, the augmentation unit is further configured to:
[0062] After the word vectors of each segmented word in the dialogue text information of the reference work order sample are obtained through word vector mapping, the word vectors are attention-weighted through an attention mechanism to obtain the updated word vectors of each segmented word.
[0063] The step of determining the mask probability of each target information in the dialogue text information based on the word vectors of each segmented word includes:
[0064] Based on the updated word vectors of each segmented word, the mask probability of each target information in the dialogue text information is determined.
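The attention weighting of word vectors can be illustrated with scaled dot-product self-attention. This particular form, with no learned projections, is an assumption chosen for brevity; the text above does not specify which attention mechanism is used.

```python
import numpy as np

def self_attention(word_vectors):
    """Return attention-weighted word vectors and the attention weights."""
    x = np.asarray(word_vectors, dtype=float)
    # Pairwise similarity between word vectors, scaled by sqrt(dimension).
    scores = x @ x.T / np.sqrt(x.shape[1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each updated vector is a weighted mix of all word vectors.
    return weights @ x, weights

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
updated, weights = self_attention(x)
```

The updated vectors have the same shape as the input, so they can replace the original word vectors when the mask probabilities are computed.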
[0065] Optionally, the filtering unit is specifically used for:
[0066] Based on the category labels of each work order sample, determine the number of work order samples of each category in the work order sample set;
[0067] At least one work order sample corresponding to a category whose quantity is lower than a preset threshold is used as the reference work order sample.
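The screening step above amounts to counting samples per category and keeping those from under-represented categories. A minimal sketch, assuming each sample is a `(category_label, dialogue_text)` pair and using a hypothetical function name and threshold:

```python
from collections import Counter

def select_reference_samples(samples, threshold):
    """Return samples whose category has fewer than `threshold` samples."""
    counts = Counter(label for label, _ in samples)
    return [s for s in samples if counts[s[0]] < threshold]

samples = [
    ("billing", "dialogue 1"),
    ("billing", "dialogue 2"),
    ("billing", "dialogue 3"),
    ("login", "dialogue 4"),
    ("refund", "dialogue 5"),
    ("refund", "dialogue 6"),
]
reference = select_reference_samples(samples, threshold=3)
```

Here "billing" already has three samples and is skipped, while the "login" and "refund" samples become reference samples for augmentation.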
[0068] An electronic device provided in this application includes a processor and a memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps of any of the above-described work order classification model training methods.
[0069] This application provides a computer-readable storage medium including a computer program. When the computer program is run on an electronic device, the computer program is used to cause the electronic device to perform the steps of any of the above-described work order classification model training methods.
[0070] This application provides a computer program product, which includes a computer program stored in a computer-readable storage medium. When the processor of an electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program, causing the electronic device to perform the steps of any of the above-described work order classification model training methods.
[0071] The beneficial effects of this application are as follows:
[0072] This application provides a method, apparatus, electronic device, and storage medium for training a work order classification model. Because this application proposes an augmentation method applicable to customer service work orders, samples are selected based on the category to which each customer service work order belongs. Based on this method, reference work order samples whose categories meet certain conditions can be selected from the initial sample set. Then, based on a preset data augmentation strategy, the dialogue text information in the reference work order samples is augmented, ensuring that the dialogue text information of the extended work order samples obtained through data augmentation has the same semantics as the dialogue text information in the corresponding reference work order samples. This allows for the expansion of work order samples for some categories without changing the semantics of the customer service dialogue recorded in the work order. Based on this method, by expanding the work order samples, the number of work order samples of each category can be balanced. Furthermore, by training the model based on each work order sample and the obtained extended work order samples, the impact of the long-tail problem in the training data on model training can be effectively solved, allowing the model to fully learn the characteristics of various categories of work orders, thereby avoiding overfitting and improving the model's accuracy.
[0073] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims, and drawings.
Attached Figure Description
[0074] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0075] Figure 1 is a schematic diagram of an optional application scenario in an embodiment of this application;
[0076] Figure 2 is a flowchart of a work order classification model training method in an embodiment of this application;
[0077] Figure 3 is a schematic diagram of dialogue text information in an embodiment of this application;
[0078] Figure 4 is a schematic diagram of a sample screening method in an embodiment of this application;
[0079] Figure 5 is a schematic diagram of synonym replacement in an embodiment of this application;
[0080] Figure 6A is a schematic diagram of parallel paragraph swapping in an embodiment of this application;
[0081] Figure 6B is a schematic diagram of random paragraph swapping in an embodiment of this application;
[0082] Figure 6C is a schematic diagram of random paragraph insertion in an embodiment of this application;
[0083] Figure 7 is a schematic diagram of a method for superimposing parallel paragraph swapping and random paragraph replacement in an embodiment of this application;
[0084] Figure 8 is a flowchart of a data augmentation method in an embodiment of this application;
[0085] Figure 9 is a logical diagram of information mask reconstruction in an embodiment of this application;
[0086] Figure 10 is a flowchart of a method for calculating mask probability in an embodiment of this application;
[0087] Figure 11 is a schematic diagram of an overall model structure in an embodiment of this application;
[0088] Figure 12 is a timing flowchart of a work order classification method in an embodiment of this application;
[0089] Figure 13 is a logical schematic diagram of a work order classification method in an embodiment of this application;
[0090] Figure 14 is a schematic diagram of the composition structure of a work order classification model training device in an embodiment of this application;
[0091] Figure 15 is a schematic diagram of the hardware structure of an electronic device to which an embodiment of this application is applied;
[0092] Figure 16 is a schematic diagram of the hardware structure of another electronic device to which an embodiment of this application is applied.
Detailed Implementation
[0093] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this application. Obviously, the described embodiments are only some embodiments of the technical solutions of this application, and not all embodiments. Based on the embodiments recorded in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the technical solutions of this application.
[0094] The following describes some of the concepts involved in the embodiments of this application.
[0095] Archiving: Customer service representatives need to categorize each service work order according to its business issue. This process is called archiving: customer service work orders are classified into the corresponding directories based on user requests and business processing flows. This application provides an intelligent archiving method in which, instead of manual searching and selection, a model predicts the archive selection from the current content.
[0096] Work order: A work order is a document that records transactions between a customer service representative and the recipient of their service. A work order can be independent or part of a larger project, and sub-work orders can be defined for each work order. The embodiments of this application use the customer service work order as an example. A customer service work order mainly refers to the dialogue records between a customer service representative and the recipient they are serving.
[0097] Dialogue text information: refers to the dialogue text between a business processing object and the business service object it serves. For example, in a customer service conversation to resolve a user's problem, the customer service representative and the user may have one or even multiple rounds of dialogue. After these dialogue messages are converted into text format, the resulting text information can be used as dialogue text information.
[0098] Business processing object: One party in the session involved in the work order. In the embodiments of this application, it refers to the party that mainly resolves business problems during the session, such as the customer service representative.
[0099] Business service object: The other party in the session involved in the work order. In the embodiments of this application, it refers to the party that mainly provides feedback on business issues during the session, such as the recipient of customer service.
[0100] Data augmentation is a commonly used technique in deep learning, primarily used to increase the training dataset, making it as diverse as possible and enabling the trained model to have stronger generalization ability. In this embodiment, it mainly refers to: using a preset data augmentation strategy to augment the dialogue text information in the reference work order sample to be expanded, obtaining dialogue text information without changing its semantics, and generating new expanded work order samples based on the dialogue text information obtained through data augmentation, thereby expanding the training dataset.
[0101] Preset data augmentation strategy: This refers to the strategy preset in this application for data augmentation. The strategy indicates that non-critical information in the dialogue text information is replaced. Non-critical information is information whose replacement does not change the semantics of the dialogue text information. Depending on how the non-critical information is replaced, the strategy can be categorized as: a synonym replacement strategy for replacing non-critical information in the dialogue text information with synonyms; a paragraph crossing strategy for crossing non-critical information in the dialogue text information; and an information mask reconstruction strategy for masking and reconstructing non-critical information in the dialogue text information.
[0102] Significance coefficient: Significance refers to the level of risk incurred in rejecting the null hypothesis when it is true; it is also called the probability level. In the embodiments of this application, the significance coefficient is a coefficient used to characterize the importance of a target information to the work order classification result. The larger the significance coefficient, the more important the information is, and the less likely it is to be reconstructed by masking to ensure semantic invariance.
[0103] Covariance matrix: In statistics and probability theory, a covariance matrix is a matrix in which each element is the covariance between a pair of elements of a random vector. In the embodiments of this application, the significance coefficients corresponding to each target information can be regarded as vector elements, and the covariance matrix is constructed by calculating the covariance between the significance coefficients corresponding to the target information.
[0104] The embodiments of this application relate to artificial intelligence (AI), natural language processing (NLP), and machine learning (ML) technologies, and are designed based on computer vision technology and machine learning in artificial intelligence.
[0105] Artificial intelligence (AI) technology mainly includes computer vision, natural language processing, machine learning / deep learning, autonomous driving, and intelligent transportation. With the research and advancement of AI technology, it is being researched and applied in multiple fields, such as smart homes, intelligent customer service, virtual assistants, smart speakers, intelligent marketing, autonomous driving, robotics, and smart healthcare. It is believed that with technological development, AI will be applied in more fields and play an increasingly important role. The work order classification method in this application embodiment can be applied to the customer service field. By combining AI with customer service work order classification in the customer service field, efficient and accurate intelligent classification of customer service work orders can be achieved.
[0106] Furthermore, classifying customer service work orders requires processing information such as customer service conversations and customer service dialogue texts. The text in this information can be processed using natural language processing technology.
[0107] Furthermore, the work order classification model in this application embodiment is trained using machine learning or deep learning techniques. After training the work order classification model using the aforementioned techniques, it can be applied to achieve intelligent classification of customer service work orders, thereby effectively improving the accuracy of work order classification.
[0108] The design concept of this application is briefly introduced below:
[0109] As the business continues to grow, customer service capabilities are also rapidly improving, with the number of service orders handled daily reaching hundreds of thousands. In the relevant intelligent archiving implementation solution, the aforementioned large volume of manually archived historical service orders is primarily used as the original training data, and a multi-turn dialogue model is used as the original model for multi-category training tasks.
[0110] However, many problems arise during actual training, such as: too many archival items (approximately several thousand classification targets), excessive learning difficulty, unreliable historical data, and a certain error rate. In practice, using too much historical data can actually lower the overall training accuracy, causing the model to fail to converge on certain high-error-rate archival items. Furthermore, since archival items are adjusted according to business rules, older data can introduce training noise due to differing rules. Therefore, model training must be conducted on a limited, high-quality dataset.
[0111] Furthermore, high-quality data is limited. Due to the complex business system, there may be dozens or even hundreds of archived items under the same product, while only about 30% of the archived items are frequently used. This results in 40% of the archived items generating 80% of the work orders in the entire business, while the remaining 60% of the archived items have very few or even only a single-digit number of orders. This leads to an imbalance in the number of samples of various types and a very serious long-tail problem in the data. In the actual training process, it is very easy to overfit, resulting in extremely poor performance on the test set. Therefore, it is necessary to augment the training data.
[0112] Considering that customer service work orders often involve lengthy, multi-turn dialogues, traditional text augmentation methods are insufficient to effectively improve model performance. Furthermore, most industry methods are based on image and audio data, with limited application to long texts like customer service work orders. Therefore, this application proposes an augmentation method suitable for customer service work orders. Samples are selected based on the category of each work order. This method allows for the selection of reference work order samples that meet certain category criteria from the initial sample set. Then, based on a pre-defined data augmentation strategy, the dialogue text information in the reference work order samples is augmented. This expands the sample size for some work order categories without altering the semantics of the recorded customer service dialogue. By expanding the work order samples, this method balances the number of samples across different categories. Furthermore, training the model based on the individual work order samples and the expanded work order samples effectively addresses the impact of long-tail problems in the training data, enabling the model to fully learn the features of various work order categories, thus avoiding overfitting and improving model accuracy.
[0113] The preferred embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit this application. Furthermore, the embodiments and features in the embodiments of this application can be combined with each other without conflict.
[0114] Figure 1 is a schematic diagram of an application scenario of an embodiment of this application. The application scenario includes two terminal devices 110 and one server 120.
[0115] In this embodiment, the terminal device 110 includes, but is not limited to, mobile phones, tablets, laptops, desktop computers, e-book readers, smart voice interaction devices, smart home appliances, and in-vehicle terminals. The terminal device may have a work order-related client installed, which can be software (e.g., a browser, shopping software), a webpage, or a mini-program. The server 120 is the backend server corresponding to the software, webpage, or mini-program, or a server specifically used for work order classification or training of work order classification models; this application does not impose specific limitations. The server 120 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms.
[0116] It should be noted that the work order classification model training method in the various embodiments of this application can be executed by an electronic device, which can be a terminal device 110 or a server 120. That is, the method can be executed by the terminal device 110 or the server 120 alone, or by both the terminal device 110 and the server 120. For example, when executed by both the terminal device 110 and the server 120, the terminal device 110 can obtain a work order sample set and, based on the category labels of each work order sample, select one or more reference work order samples to be expanded from the work order sample set. Then, based on a preset data augmentation strategy, the terminal device 110 performs data augmentation on the dialogue text information in each reference work order sample to obtain the corresponding expanded work order samples. Finally, after the server 120 obtains each work order sample and the expanded work order sample from the terminal device 110, it trains the work order classification model based on the training sample set constructed from these samples to obtain the trained work order classification model.
[0117] After the model is trained, it can be deployed directly on server 120 or on terminal device 110. Generally, it is deployed on server 120. Subsequently, server 120 can be used to classify customer service work orders to be classified and obtain the work order category to which the customer service work orders belong. Server 120 can also feed back the classification results to terminal device 110 for display.
[0118] In one alternative implementation, the terminal device 110 and the server 120 can communicate via a communication network.
[0119] In one alternative implementation, the communication network is a wired network or a wireless network.
[0120] It should be noted that Figure 1 is merely illustrative; in practice, the number of terminal devices and servers is not limited, and the embodiments of this application impose no specific limitation on it.
[0121] In this embodiment of the application, when there are multiple servers, they can form a blockchain, with each server being a node on the blockchain. In the work order classification model training method disclosed in this embodiment, the work order sample data involved, such as the category labels of customer service work orders, the corresponding dialogue text information, and the corresponding classification probabilities, can be stored on the blockchain.
[0122] Furthermore, the embodiments of this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, and assisted driving.
[0123] The following describes the work order classification model training method provided by the exemplary embodiments of this application in conjunction with the application scenarios described above and with reference to the accompanying drawings. It should be noted that the above application scenarios are only shown to facilitate understanding of the spirit and principles of this application, and the embodiments of this application are not limited in any way in this respect.
[0124] Figure 2 is a flowchart of an implementation of a work order classification model training method according to an embodiment of this application. Taking the server as the execution subject as an example, the specific implementation process of this method is as follows (S21-S24):
[0125] S21: The server retrieves a sample set of work orders.
[0126] Each work order sample in the work order sample set includes: the category label of the corresponding customer service work order, and the dialogue text information between the business processing object and the business service object of the corresponding customer service work order. This dialogue text information is obtained based on the customer service conversation recorded in the corresponding customer service work order.
[0127] Specifically, dialogue text information refers to the text of conversations between a business processing object and the business service object it serves. For example, in a customer service conversation where the customer service representative resolves a user's problem, there may be one or even multiple rounds of dialogue between the representative and the user. This dialogue information, after being converted into text format, can be used as dialogue text information. If the dialogue between the business processing object and the business service object includes voice data, the corresponding text information can be obtained through speech recognition.
[0128] When generating dialogue text information based on customer service work order records, dialogue text information containing at least one round of dialogue text can be generated according to the number of dialogue rounds.
[0129] Figure 3 is a schematic diagram of dialogue text information in an embodiment of this application. Customer service representative A is the business processing object, and user B is the business service object. During this customer service conversation, A and B engage in four rounds of dialogue: dialogue text 1 and dialogue text 2; dialogue text 3 and dialogue text 4; dialogue text 5 and dialogue text 6; dialogue text 7 and dialogue text 8.
[0130] Specifically, the work order sample set is constructed from historical work order data. These work orders can involve many businesses, such as games, social networking, payment, and transactions. The work order samples can be manually archived historical customer service work orders. The category label of a work order sample indicates the actual work order category of the customer service work order; it can be determined by manual archiving or by other methods, and this application imposes no specific limitation.
[0131] S22: Based on the category labels of each work order sample, the server selects at least one reference work order sample to be expanded from the work order sample set.
[0132] In this embodiment of the application, in order to solve the long-tail problem of samples, at least one work order sample is selected from the work order sample set containing work order samples of each category based on the category label of each work order sample as a reference work order sample to be expanded.
[0133] One possible filtering method is as follows: First, based on the category label of each work order sample, determine the number of work order samples of each category in the work order sample set; then, take at least one work order sample corresponding to the category with a number lower than a preset threshold as a reference work order sample.
[0134] For example, the work order sample set is obtained by cleaning and filtering historical work order data from three consecutive months. Figure 4 is a schematic diagram of sample screening in an embodiment of this application. After the samples are divided according to four different archiving paths, the work order sample set contains four categories of work order samples: Category A with 240 work order samples, Category B with 70, Category C with 150, and Category D with 90. Assuming a preset threshold of 100, if an archiving path contains fewer than 100 work orders, the work order samples of that category are long-tail samples. Here, Category B and Category D fall below the threshold, so their work order samples are long-tail samples.
[0135] In the above implementation, the number of work orders in each category can be counted from the category labels of the work order samples. On this basis, the long-tail samples that may cause long-tail problems can be effectively screened out, so that the number of samples can subsequently be balanced accurately.
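The screening in S22 can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function name `select_reference_samples`, the dictionary layout of a work order sample, and the choice to keep every sample of a long-tail category (the text only requires at least one) are assumptions made for the example. The category counts are the ones from the Figure 4 example.

```python
from collections import Counter


def select_reference_samples(samples, threshold):
    """Return (reference_samples, counts) for categories below `threshold`."""
    # Count the work order samples of each category from the category labels.
    counts = Counter(sample["label"] for sample in samples)
    # Categories with fewer samples than the preset threshold are long-tail.
    long_tail = {label for label, n in counts.items() if n < threshold}
    # Assumption: keep every sample of a long-tail category as a reference sample.
    reference = [s for s in samples if s["label"] in long_tail]
    return reference, counts


# Toy sample set with the per-category counts from the example above.
category_sizes = {"A": 240, "B": 70, "C": 150, "D": 90}
samples = [
    {"label": label, "dialogue": []}  # dialogue text omitted in this toy data
    for label, size in category_sizes.items()
    for _ in range(size)
]

reference, counts = select_reference_samples(samples, threshold=100)
long_tail_labels = sorted({s["label"] for s in reference})
print(long_tail_labels)  # ['B', 'D']
```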
[0136] S23: The server performs data augmentation on the dialogue text information in at least one reference work order sample based on a preset data augmentation strategy to obtain the corresponding extended work order sample.
[0137] The preset data augmentation strategy is used to indicate that non-critical information in the dialogue text is replaced. Non-critical information is information that does not change the semantics of the dialogue text before and after the replacement.
[0138] Optionally, depending on the method of replacing non-critical information, the preset data augmentation strategies include, but are not limited to, at least one of the following:
[0139] Strategy 1: A synonym replacement strategy used to replace non-critical information in dialogue text.
[0140] In this embodiment of the application, strategy one specifically refers to replacing non-critical information in the dialogue text between customer service and user in a reference work order sample with synonyms to obtain new dialogue text information; then, an extended work order sample is constructed based on the new dialogue text information.
[0141] Strategy 2: A paragraph crossing strategy used to cross non-critical information in dialogue text.
[0142] In this embodiment of the application, strategy two specifically refers to exchanging or inserting the dialogue text between customer service and user in two reference work order samples to obtain new dialogue text information; then, an extended work order sample is constructed based on the new dialogue text information.
[0143] Strategy 3: An information mask reconstruction strategy used to mask and reconstruct non-critical information in dialogue text.
[0144] In this embodiment of the application, strategy three specifically refers to masking and reconstructing non-critical information in the dialogue text between customer service and user in a reference work order sample to obtain new dialogue text information; then, an extended work order sample is constructed based on the new dialogue text information.
[0145] Masking reconstruction refers to masking non-critical information in the dialogue text and then learning and recovering it by combining the context of the masked part.
[0146] In this embodiment of the application, the pre-trained language representation model BERT (Bidirectional Encoder Representations from Transformers) can be used as a masked language model (MLM). Its masked language modeling capability is used to reconstruct text for augmentation, for example by predicting and reconstructing masked words in a sentence, or by predicting and reconstructing masked dialogue texts in a multi-turn dialogue.
[0147] It should be noted that there are many ways to perform mask reconstruction of text. This application mainly uses BERT as an example for illustration. Any mask reconstruction method is applicable to the embodiments of this application, and no specific limitation is made here.
[0148] In addition, in this embodiment of the application, since the semantics of the dialogue text information are not changed before and after data augmentation, the newly added extended work order samples and the corresponding reference work order samples have the same semantics of dialogue text information and the same category labels. Based on this method, high-quality sample data can be effectively expanded and the number of high-quality samples of various types can be balanced.
[0149] S24: The server trains the model based on each work order sample and the obtained extended work order samples to obtain a trained work order classification model.
[0150] Specifically, by combining the original work order samples and the obtained extended work order samples, a training sample set is constructed, which balances the number of work orders of various types and enhances the long-tail data. Then, based on these training samples, the multi-turn dialogue classification model to be trained is trained to obtain a trained work order classification model. This can effectively solve the impact of the long-tail problem in the training data on model training, so that the model can fully learn the features of various types of work orders, thereby avoiding overfitting and improving the accuracy of the model.
[0151] For example, the multi-turn dialogue classification model to be trained can be a multi-dimensional hierarchical attention network (MHAN). MHAN is a hierarchical text classification model that analyzes the structure of the text and incorporates multi-dimensional information to strengthen learning. In this application, MHAN can be used as the basic model for intelligent archiving.
[0152] The work order classification model is used to determine the work order category to which a customer service work order belongs. Specifically, when using the trained work order classification model to classify customer service work orders, the model can combine the dialogue text information of the work order, as well as basic information of customer service personnel and users, business query information, etc., as input features. By performing multiple classifications through the model, the final determination of the work order category to which the customer service work order belongs is more accurate and reliable.
[0153] In addition, it should be noted that in the specific implementation of this application, data related to customer service work orders is involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0154] It should also be noted that, in this embodiment, when augmenting the dialogue text information of a reference work order sample, at least one of the preset data augmentation strategies listed above can be used. For example, strategy one can perform synonym replacement on dialogue text 1 in the dialogue text information, and strategy two can perform paragraph crossing on dialogue text 2 in the dialogue text information. Furthermore, the preset data augmentation strategies can be combined to augment the reference work order samples in different proportions. For example, 20% of the reference work order samples can be augmented using strategy one and 80% using strategy two. Alternatively, one-third of the reference work order samples can be augmented using strategy one, one-third using strategy two, and one-third using strategy three. Any method of data augmentation based on any single strategy or any combination of strategies is applicable to this embodiment, and no specific limitation is made here.
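The proportional combination of strategies described above can be sketched as follows. The helper `assign_strategies`, the strategy names, and the rule of shuffling the samples and cutting the shuffled list at cumulative proportions are assumptions made for illustration; the text does not prescribe how the proportions are realized.

```python
import math
import random


def assign_strategies(reference_samples, proportions, seed=0):
    """Split reference samples among strategies according to `proportions`."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("proportions must sum to 1")
    shuffled = list(reference_samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    names = list(proportions)
    assignment, start, cumulative = {}, 0, 0.0
    for index, name in enumerate(names):
        cumulative += proportions[name]
        # The last strategy takes the remainder so that no sample is dropped
        # when the cut points are rounded.
        is_last = index == len(names) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        assignment[name] = shuffled[start:end]
        start = end
    return assignment


# Ten hypothetical reference samples, split 20% / 80% as in the example above.
assignment = assign_strategies(
    list(range(10)),
    {"synonym_replacement": 0.2, "paragraph_crossing": 0.8},
)
sizes = {name: len(group) for name, group in assignment.items()}
print(sizes)  # {'synonym_replacement': 2, 'paragraph_crossing': 8}
```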
[0155] The following sections will provide a detailed introduction to the data augmentation process for each of these pre-defined data augmentation strategies:
[0156] Strategy 1: Synonym Replacement Strategy
[0157] In this embodiment, data augmentation can be performed on some or all of the reference work order samples selected in step S22 based on a synonym replacement strategy. Specifically, the process of data augmentation for a reference work order sample is as follows:
[0158] Based on a pre-defined thesaurus, at least one business-related word in the dialogue text of the reference work order sample is replaced with a synonym to obtain the replaced dialogue text. Based on the replaced dialogue text and the category label of the reference work order sample, an extended work order sample corresponding to the reference work order sample is generated.
[0159] In certain business archives, customer requests are often very specific to a particular product or service. This application can perform efficient data augmentation based on similar specialized terms for this business. Referring to Table 1, which is a thesaurus of synonyms listed in the embodiments of this application, Table 1 lists some business-specific terms (also called business-related terms), as follows:
[0160] Table 1: List of Specialized Terms for Some Business Functions
[0161]
[0162]
[0163] Figure 5 is a schematic diagram, in an embodiment of this application, of the synonym replacement process applied to the dialogue text information shown in Figure 3. The reference work order sample shown in Figure 5 (category label: payment) contains a total of 8 dialogue texts, of which dialogue text 1 and dialogue text 2 contain "payment code". Replacing "payment code" with "receive money code" in the dialogue text information yields new dialogue text information. The new dialogue text information is then used as the dialogue text information of the corresponding extended work order sample, and the category label of the reference work order sample is used as the category label of the extended work order sample, which gives the corresponding extended work order sample.
[0164] In the above implementation, unlike synonyms in a general sense, business or product names may have abbreviations, colloquialisms, English names, or pronunciation translation errors. The augmented data generated by replacing these words according to the business-specific vocabulary can ensure semantic invariance to the greatest extent and increase the robustness of the model in different scenarios.
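A minimal sketch of the synonym replacement strategy is given below. The function `synonym_replace`, the sample layout, and the two example sentences are hypothetical; only the pair "payment code" and "receive money code" and the category label "payment" come from the example above. The thesaurus here has a single entry; with a larger thesaurus, sequential replacement could chain if one synonym contains another term, so a real implementation would need to guard against that.

```python
def synonym_replace(sample, thesaurus):
    """Build an extended work order sample by replacing business-related terms."""
    new_dialogue = []
    for speaker, text in sample["dialogue"]:
        # Replace every business-related term found in the thesaurus.
        for term, synonym in thesaurus.items():
            text = text.replace(term, synonym)
        new_dialogue.append((speaker, text))
    # The extended sample keeps the category label of the reference sample.
    return {"label": sample["label"], "dialogue": new_dialogue}


thesaurus = {"payment code": "receive money code"}
reference = {
    "label": "payment",
    "dialogue": [  # hypothetical sentences, for illustration only
        ("user", "My payment code does not work."),
        ("customer service", "Please reopen the payment code page."),
    ],
}

extended = synonym_replace(reference, thesaurus)
print(extended["dialogue"][0][1])  # My receive money code does not work.
```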
[0165] Strategy 2: Paragraph Crossing Strategy
[0166] In this embodiment, data augmentation can be performed on some or all of the reference work order samples selected in step S22 based on a paragraph crossing strategy. Specifically, the process of data augmentation for the reference work order samples is as follows:
[0167] For two reference work order samples with the same category label, the dialogue texts belonging to the same dialogue publisher in the two reference work order samples are crossed to obtain the crossed dialogue text information. Based on the crossed dialogue text information and the category label of the reference work order sample, an extended work order sample corresponding to the reference work order sample is generated. Here, the dialogue publisher is the business processing object or the business service object, i.e., customer service or user.
[0168] For work orders in the same archive, the user's requests, the customer service scripts, and the guidance process are basically the same. By exchanging or inserting dialogue paragraphs of the same role between different work orders in the same archive, augmented work order data with unchanged semantics can be obtained.
[0169] Specifically, based on different exchange rules, the following three dialogue crossover methods can be distinguished:
[0170] Method 1: Exchange the dialogue texts published by the same dialogue publisher in the same number of dialogue rounds in two reference work order samples.
[0171] In this embodiment of the application, this method is also called parallel paragraph exchange: the dialogue texts of the same role in the same dialogue round of two work orders are swapped. Figure 6A is a schematic diagram of a parallel paragraph exchange in an embodiment of this application. Figure 6A shows the dialogue texts of two reference work order samples, each containing four rounds of dialogue and eight dialogue texts.
[0172] The two reference work order samples share the same category label, meaning they belong to the same category. The eight dialogue texts within archived work order 1 are as follows:
[0173] Round 1: (User dialogue text 1) My account has been frozen, how can I unfreeze it? - (Customer service dialogue text 2) Please provide your account and contact information so we can check for you.
[0174] Round 2: (User dialogue text 3) The account is 1234, and the mobile phone number is 133XXXXXXXX - (Customer service dialogue text 4) Your authorization is required to query personal information. Do you agree?
[0175] Round 3: (User dialogue text 5) Okay, okay, please check for me quickly. - (Customer service dialogue text 6) Has your account been shared with others recently?
[0176] Round 4: (User dialogue text 7) My account was recently hacked, and when I got it back, they said I was involved in a violation. [Screenshot] - (Customer service dialogue text 8) Your account has been noted and processed. Please pay attention to subsequent SMS notifications.
[0177] The eight dialogue texts in archived work order 2 are as follows:
[0178] Round 1: (User dialogue text 1) I can't log in to my account. It says I've violated the rules. Please check it for me; I need to use my account urgently. - (Customer service dialogue text 2) Querying personal information requires your authorization. Do you agree?
[0179] Round 2: (User dialogue text 3) Agreed. - (Customer service dialogue text 4) Is the abnormal account bound to the incoming phone number?
[0180] Round 3: (User dialogue text 5) This is another number, 133XXXXXXXX - (Customer service dialogue text 6) Your account is involved in illegal information and will be frozen for 72 hours. We suggest you wait patiently for the freeze to be lifted.
[0181] Round 4: (User dialogue text 7) When did I violate the rules? Do you have any evidence? Please unbind me now. - (Customer service dialogue text 8) We understand your feelings. We suggest you submit relevant materials through the appeal channel.
[0182] Archived work order 1 and archived work order 2 above belong to the same archive, that is, the same category: both involve unblocking a social media account, and the roles are user and customer service. The parallel paragraph exchange in Figure 6A exchanges the user dialogue texts in the fourth round of dialogue of archived work order 1 and archived work order 2. After the exchange, the fourth round of dialogue in archived work order 1 is as follows: (User dialogue text 7) When did I violate the rules? Do you have any evidence? Please unbind me now. - (Customer service dialogue text 8) Your account has been noted and processed. Please pay attention to subsequent SMS notifications.
[0183] The fourth round of dialogue in archived work order 2 is as follows: (User dialogue text 7) My account was recently hacked, and when I got it back, they said I was involved in a violation. [Screenshot] - (Customer service dialogue text 8) We understand your feelings. We suggest you submit relevant materials through the appeal channel.
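The parallel paragraph exchange can be sketched as follows. The function `parallel_swap` and the representation of a work order as a list of rounds with "user" and "agent" entries are assumptions made for illustration. Only the fourth-round texts are reproduced from the example above; the first three rounds are stood in for by placeholder strings, and the label "account unblocking" is a hypothetical name for the shared category.

```python
import copy


def parallel_swap(order_a, order_b, round_index, role):
    """Swap the `role` text of the same round between two same-category orders."""
    if order_a["label"] != order_b["label"]:
        raise ValueError("both work orders must share the same category label")
    if len(order_a["rounds"]) != len(order_b["rounds"]):
        raise ValueError("parallel exchange needs the same number of rounds")
    new_a, new_b = copy.deepcopy(order_a), copy.deepcopy(order_b)
    new_a["rounds"][round_index][role] = order_b["rounds"][round_index][role]
    new_b["rounds"][round_index][role] = order_a["rounds"][round_index][role]
    return new_a, new_b


def placeholder_rounds(count):
    # Stand-ins for the first rounds, which are not needed in this example.
    return [{"user": f"(user text, round {i + 1})",
             "agent": f"(customer service text, round {i + 1})"}
            for i in range(count)]


order_1 = {"label": "account unblocking", "rounds": placeholder_rounds(3) + [{
    "user": "My account was recently hacked, and when I got it back, "
            "they said I was involved in a violation. [Screenshot]",
    "agent": "Your account has been noted and processed. "
             "Please pay attention to subsequent SMS notifications.",
}]}
order_2 = {"label": "account unblocking", "rounds": placeholder_rounds(3) + [{
    "user": "When did I violate the rules? Do you have any evidence? "
            "Please unbind me now.",
    "agent": "We understand your feelings. We suggest you submit relevant "
             "materials through the appeal channel.",
}]}

# Exchange the user texts of the fourth round (index 3).
new_1, new_2 = parallel_swap(order_1, order_2, round_index=3, role="user")
print(new_1["rounds"][3]["user"])
```

Both returned work orders keep the original category label, so each can be used as an extended work order sample.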
[0184] Method 2: Randomly exchange the dialogue texts published by the same dialogue publisher in different dialogue rounds from two reference work order samples.
[0185] In this embodiment of the application, this method is also called random paragraph exchange: dialogue texts of the same role are randomly selected from different dialogue rounds of two work orders and swapped. Figure 6B is a schematic diagram of a random paragraph exchange in an embodiment of this application. Taking archived work order 1 and archived work order 2 above as examples, the random paragraph exchange in Figure 6B exchanges the user dialogue text in the third round of dialogue of archived work order 1 with the user dialogue text in the second round of dialogue of archived work order 2. After the exchange, the third round of dialogue in archived work order 1 is as follows:
[0186] Third round: (User dialogue text 5) Agreed. - (Customer service dialogue text 6) Has your account been shared with others recently?
[0187] The second round of dialogue in archived work order 2 is as follows:
[0188] Second round: (User dialogue text 3) Okay, okay, please check for me quickly. - (Customer service dialogue text 4) Is the abnormal account bound to the incoming phone number?
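The random paragraph exchange can be sketched in the same way. The names `cross_round_swap` and `random_swap`, the data layout, and the uniform random choice of rounds are assumptions made for illustration. Only the first three rounds of the two example work orders are reproduced here, and the label "account unblocking" is a hypothetical category name.

```python
import copy
import random


def cross_round_swap(order_a, order_b, round_a, round_b, role):
    """Swap the `role` text of round_a in order_a with round_b in order_b."""
    new_a, new_b = copy.deepcopy(order_a), copy.deepcopy(order_b)
    new_a["rounds"][round_a][role] = order_b["rounds"][round_b][role]
    new_b["rounds"][round_b][role] = order_a["rounds"][round_a][role]
    return new_a, new_b


def random_swap(order_a, order_b, role, rng):
    """Pick two different round numbers at random and swap the `role` texts."""
    round_a = rng.randrange(len(order_a["rounds"]))
    candidates = [r for r in range(len(order_b["rounds"])) if r != round_a]
    if not candidates:
        raise ValueError("no round with a different number is available")
    return cross_round_swap(order_a, order_b, round_a, rng.choice(candidates), role)


def make_order(user_texts, agent_texts):
    rounds = [{"user": u, "agent": a} for u, a in zip(user_texts, agent_texts)]
    return {"label": "account unblocking", "rounds": rounds}


order_1 = make_order(
    ["My account has been frozen, how can I unfreeze it?",
     "The account is 1234, and the mobile phone number is 133XXXXXXXX",
     "Okay, okay, please check for me quickly."],
    ["Please provide your account and contact information so we can check for you.",
     "Your authorization is required to query personal information. Do you agree?",
     "Has your account been shared with others recently?"],
)
order_2 = make_order(
    ["I can't log in to my account. It says I've violated the rules. "
     "Please check it for me; I need to use my account urgently.",
     "Agreed.",
     "This is another number, 133XXXXXXXX"],
    ["Querying personal information requires your authorization. Do you agree?",
     "Is the abnormal account bound to the incoming phone number?",
     "Your account is involved in illegal information and will be frozen for "
     "72 hours. We suggest you wait patiently for the freeze to be lifted."],
)

# The exchange from the example: round 3 of order 1 with round 2 of order 2.
new_1, new_2 = cross_round_swap(order_1, order_2, round_a=2, round_b=1, role="user")
print(new_1["rounds"][2]["user"])  # Agreed.

# A randomly chosen exchange, seeded for reproducibility.
rand_1, rand_2 = random_swap(order_1, order_2, "user", random.Random(0))
```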
[0189] Method 3: Randomly insert the dialogue text from one of the two reference work order samples into the other reference work order sample, between the dialogue texts published by the same dialogue publisher.
[0190] In this embodiment of the application, this method is also called random paragraph insertion: a dialogue text of the same role is randomly selected from another work order and inserted into the original work order. Figure 6C is a schematic diagram of random paragraph insertion in an embodiment of this application. Taking archived work order 1 and archived work order 2 above as examples, the random paragraph insertion in Figure 6C inserts the user dialogue text from the first round of dialogue in archived work order 2 (i.e., user dialogue text 1) between the user dialogue texts of the first and second rounds of dialogue in archived work order 1 (i.e., user dialogue text 1 and user dialogue text 3).
[0191] Additionally, a question and an answer typically form one round of dialogue, but an actual conversation may contain several questions followed by one answer, or one question followed by several answers. In such cases, the number of dialogue rounds can be aligned by methods such as supplementation or truncation. For example, in archived work order 1 after the new dialogue text is inserted, the original user dialogue text 1 (My account has been frozen, how can I unfreeze it?) and the newly inserted user dialogue text (I can't log in to my account. It says I've violated the rules. Please check it for me; I need to use my account urgently.) can be merged into one dialogue text, forming the merged dialogue text 1. The first round of dialogue in archived work order 1 after adjustment is then: (User dialogue text 1) My account has been frozen, how can I unfreeze it? I can't log in to my account. It says I've violated the rules. Please check it for me; I need to use my account urgently. - (Customer service dialogue text 2) Please provide your account and contact information so we can check for you. The semantics of the newly generated dialogue text information are not changed by this method.
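Random paragraph insertion followed by round alignment can be sketched as follows. The function `insert_and_merge`, the data layout, and the choice to align rounds by joining the inserted text to the preceding text of the same role with a space are assumptions made for illustration; the text also allows other alignment methods such as truncation. Only the first round of each example work order is reproduced.

```python
import copy


def insert_and_merge(target, donor, donor_round, target_round, role):
    """Insert a donor text after the target's text of the same role, then merge."""
    if target["label"] != donor["label"]:
        raise ValueError("both work orders must share the same category label")
    new_target = copy.deepcopy(target)
    original = target["rounds"][target_round][role]
    inserted = donor["rounds"][donor_round][role]
    # Merging keeps the number of dialogue rounds of the target unchanged.
    new_target["rounds"][target_round][role] = f"{original} {inserted}"
    return new_target


order_1 = {"label": "account unblocking", "rounds": [{
    "user": "My account has been frozen, how can I unfreeze it?",
    "agent": "Please provide your account and contact information "
             "so we can check for you.",
}]}
order_2 = {"label": "account unblocking", "rounds": [{
    "user": "I can't log in to my account. It says I've violated the rules. "
            "Please check it for me; I need to use my account urgently.",
    "agent": "Querying personal information requires your authorization. "
             "Do you agree?",
}]}

extended = insert_and_merge(order_1, order_2, donor_round=0, target_round=0,
                            role="user")
print(extended["rounds"][0]["user"])
```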
[0192] Specifically, when a paragraph crossing strategy is used to augment multiple reference work order samples, the three dialogue crossing methods listed above can be used individually, in pairs, or all together. Figure 7 is a schematic diagram of superimposing parallel paragraph exchange and random paragraph exchange in an embodiment of this application. The two work orders in this diagram are the same as those in Figure 6A and Figure 6B, that is, archived work order 1 and archived work order 2. With parallel paragraph exchange, the user dialogue texts in the fourth round of dialogue of archived work order 1 and archived work order 2 are exchanged; with random paragraph exchange, the user dialogue text in the third round of dialogue of archived work order 1 is exchanged with the user dialogue text in the second round of dialogue of archived work order 2.
[0193] After the exchange, the third round of dialogue in archived work order 1 is as follows: (User dialogue text 5) Agreed. - (Customer service dialogue text 6) Has your account been shared with others recently?
[0194] The fourth round of dialogue in archived work order 1 is as follows: (User dialogue text 7) When did I violate the rules? Do you have any evidence? Please unbind me now. - (Customer service dialogue text 8) Your account has been noted and processed. Please pay attention to subsequent SMS notifications.
[0195] The second round of dialogue in archived work order 2 is as follows: (User dialogue text 3) Okay, okay, please check for me quickly. - (Customer service dialogue text 4) Is the abnormal account bound to the incoming phone number?
[0196] The fourth round of dialogue in archived work order 2 is as follows: (User dialogue text 7) My account was recently hacked, and when I got it back, they said I was involved in a violation. [Screenshot] - (Customer service dialogue text 8) We understand your feelings. We suggest you submit relevant materials through the appeal channel.
[0197] Furthermore, when using Strategy 2 to augment data on multiple reference work order samples, in addition to using different exchange rules to superimpose on the same reference work order sample as listed above, data augmentation can also be performed on multiple different reference work order samples according to different proportions. For example: parallel paragraph exchange (30%), random paragraph exchange (30%), random paragraph insertion (40%), etc.
[0198] It should be noted that the above-mentioned exchange rules listed in the embodiments of this application are just simple examples. In fact, any exchange rule that does not change the semantics is applicable to the embodiments of this application, and no specific limitation is made here.
[0199] Furthermore, it should be noted that the data augmentation methods listed in Strategies 1 and 2 above involve paragraph crossing and synonym replacement in natural language, a process that requires manual intervention or annotation and is therefore inefficient. By contrast, a data augmentation method that introduces model features is more universal and general, requires no manual intervention, and transfers easily to other model structures. Its specific implementation is described in Strategy 3 below.
[0200] Strategy 3: Information Mask Reconstruction Strategy
[0201] In this embodiment, data augmentation can be performed on some or all of the reference work order samples selected in step S22 based on an information mask reconstruction strategy. Specifically, the process of data augmentation for the reference work order samples is as follows:
[0202] In one optional implementation, Figure 8 shows a flowchart for implementing S23. It is a schematic diagram of a data augmentation method in an embodiment of this application and includes the following steps (S81-S84):
[0203] S81: For a reference work order sample, the server obtains the word vectors of each word segment in the dialogue text information of the reference work order sample through word vector mapping.
[0204] In this embodiment of the application, the dialogue text information of the reference work order sample can be segmented by a word segmentation tool, and then the word vectors of each segmented word contained in each dialogue text (also known as dialogue statement) can be obtained through word vector mapping.
[0205] For example, for each dialogue statement S, word vector mapping can generate a matrix of word vectors, where X_i represents the word vector of the i-th segmented word in the dialogue statement S, i takes positive integer values from 1 to n, and n represents the number of segmented words in the dialogue statement S.
[0206] In addition, the embodiments of this application can further set the number of words in each dialogue text to be consistent. For example, if it is set to 60 words, after word segmentation, for dialogue text with less than 60 words, one or more meaningless words can be used for padding to ensure that each sentence obtained in the end has 60 words.
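The length alignment described above can be sketched as follows. The function `pad_or_truncate`, the padding token "[PAD]", and the truncation of texts longer than the target length are assumptions made for illustration; the text only describes padding texts that are shorter than the set length.

```python
PAD_TOKEN = "[PAD]"  # hypothetical meaningless word used for padding


def pad_or_truncate(tokens, length=60, pad_token=PAD_TOKEN):
    """Return exactly `length` tokens, padding short inputs with `pad_token`."""
    # Assumption: inputs longer than `length` are cut off at `length`.
    kept = tokens[:length]
    return kept + [pad_token] * (length - len(kept))


segmented = ["my", "account", "has", "been", "frozen"]
padded = pad_or_truncate(segmented)
print(len(padded), padded[:6])
```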
[0207] S82: The server determines the mask probability of each target information in the dialogue text information based on the word vectors of each segment.
[0208] Optionally, the target information can be a segmented word or a dialogue text. That is, for the multiple dialogue texts in a work order, one or more dialogue texts can be masked and reconstructed as a whole, or the segmented words at certain positions in one or more dialogue texts can be masked and reconstructed.
[0209] Considering that random masking methods may mask out some keywords in the dialogue, causing semantic loss and generating unpredictable noise, the method in this application is different from the commonly used random masking method. This application uses a mask probability matrix to represent the probability that each target information is replaced by a mask. The more important a target information is, the lower the probability that the target information is replaced, that is, the lower the mask probability corresponding to the target information.
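The inverse relation between importance and mask probability can be illustrated with a small sketch. The mapping used here, a softmax over negated importance scores, is purely an assumption chosen because it is monotonically decreasing in importance; it is not the formula of this application. The function name `mask_probabilities` and the example scores are likewise hypothetical.

```python
import math


def mask_probabilities(importance):
    """Map importance scores to mask probabilities that sum to 1."""
    # Assumed mapping: softmax over negated scores, so a more important
    # piece of target information gets a lower mask probability.
    negated = [-score for score in importance]
    shift = max(negated)  # subtract the maximum for numerical stability
    weights = [math.exp(value - shift) for value in negated]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical importance scores for three pieces of target information.
importance = [0.9, 0.1, 0.5]
probabilities = mask_probabilities(importance)
print([round(p, 3) for p in probabilities])
```

Target information could then be sampled for masking according to these probabilities, which makes key information less likely to be masked than under uniform random masking.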
[0210] Optionally, embodiments of this application provide a mask reconstruction enhancement based on saliency maps, which can calculate the mask probability of each target information based on the saliency map. Specifically, when determining the mask probability of each target information based on the word vectors of each word segment, it can be divided into the following sub-steps (S821-S822):
[0211] S821: The server determines the saliency coefficient of each target information based on the word vectors of each segmented word.
[0212] The significance coefficient is used to characterize the importance of the target information to the work order classification results.
[0213] In this embodiment of the application, considering that the target information can be dialogue text or word segmentation, the calculation process of step S821 can be further divided into the following two types:
[0214] If the target information is word segmentation, then the significance coefficient of each word segment is calculated directly based on the word vector of each word segment.
[0215] If the target information is dialogue text, the sentence vector of each dialogue text needs to be determined first based on the word vectors of the word segments in that text. Then, based on the sentence vectors of each dialogue text, the significance coefficient of each dialogue text is calculated.
[0216] Specifically, when determining the sentence vector of a dialogue text based on the word vectors of each segment of the text, a simple concatenation and combination method can be used, or a gated recurrent unit (GRU) + self-attention mechanism can be employed. For example, GRU and self-attention can be used to encode the input word vectors, and the word vectors of each sentence can be fed into a deep neural network to generate the sentence vector for each sentence, and so on. This application does not impose specific limitations on the method of generating a sentence vector based on multiple word vectors.
[0217] An optional implementation method is to perform step S821 in the following manner:
[0218] Based on the classification probability of a reference work order sample and the information vector of each target information, the significance coefficient corresponding to each target information is determined. The classification probability is obtained based on the prediction of the work order classification model.
[0219] In this embodiment of the application, if the target information is word segmentation, then the information vector is the word vector. That is, for each dialogue text in a reference work order sample, the dialogue text is taken as the unit: the significance coefficient of each word segment in a dialogue text is determined based on the classification probability of the reference work order sample and the word vector of each word segment in that dialogue text. The specific calculation formula is given in formula (3) below, and the detailed calculation process is described later; repeated details are omitted here.
[0220] If the target information is dialogue text, the information vector is the sentence vector determined based on the word vectors of each segment in the dialogue text. That is, for each dialogue text in a reference work order sample, the work order is taken as the unit: the significance coefficient of each dialogue text in the work order is determined based on the classification probability of the reference work order sample and the sentence vector of each dialogue text in the work order. The calculation process is similar to that for the significance coefficient of each word segment in a dialogue text described above, and the calculation formula is likewise formula (3) below; the difference is that the vector $X_i$ represents the sentence vector, while n represents the number of dialogue texts contained in a work order.
[0221] In the above implementation, the significance coefficient calculated by combining the contribution of the information vector of each target information to the classification probability can effectively characterize the importance of each target information to the classification result, and the accuracy is higher.
[0222] S822: The server determines the corresponding mask probability based on each significance coefficient.
[0223] In this embodiment, the significance coefficient represents the importance of the target information to the work order classification result: the larger the significance coefficient of a target information, the more important the target information is and the lower the probability of it being replaced. Therefore, the mask probability is inversely proportional to the corresponding significance coefficient. A specific implementation can be found in formula (6) below, and is not specifically limited here.
[0224] S83: The server performs mask reconstruction on at least one target information in the dialogue text information based on the mask probability of each target information, and obtains the masked dialogue text information.
[0225] Specifically, considering that the higher the mask probability of a target information, the greater the possibility that the target information is selected for mask reconstruction in the dialogue text information, in step S83, the mask probabilities of each target information can be sorted in descending order, and the target information in the top 15% of the sorted results can be selected for mask reconstruction.
[0226] For example, if the target information is dialogue text (i.e. dialogue statements), and the dialogue text information of a work order contains 10 rounds of dialogue and 20 dialogue statements, then the top three dialogue statements (20 × 15% = 3) can be masked, and BERT can be used as a masked language model (MLM) for text reconstruction and enhancement.
[0227] For example, if the target information is word segmentation, then for each dialogue text, the top 15% of the word segments in that dialogue text can be selected for mask reconstruction. Taking a certain dialogue text as an example, this dialogue text has 60 word segments. Based on the mask probability of each of these 60 word segments, after sorting them from largest to smallest, the top 9 word segments (60 × 15% = 9) are selected for masking, and BERT is used as MLM for text reconstruction enhancement.
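The selection rule in the two examples above (sort by mask probability, take the top 15%) can be sketched as follows. The function names are hypothetical, and rounding down when 15% of the count is not a whole number is an assumption; the examples in the text only cover counts of 20 and 60. The BERT reconstruction step itself is not shown.

```python
from typing import List, Sequence

MASK_TOKEN = "[MASK]"  # BERT's conventional mask token


def select_mask_positions(mask_probs: Sequence[float], ratio_percent: int = 15) -> List[int]:
    """Return the indices of the top ratio_percent% items by mask probability."""
    k = len(mask_probs) * ratio_percent // 100  # integer arithmetic: 20 -> 3, 60 -> 9
    ranked = sorted(range(len(mask_probs)), key=lambda i: mask_probs[i], reverse=True)
    return sorted(ranked[:k])


def apply_mask(items: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Replace the selected items (word segments or statements) with the mask token."""
    chosen = set(positions)
    return [MASK_TOKEN if i in chosen else item for i, item in enumerate(items)]


statements = [f"statement {i}" for i in range(20)]
probs = [i / 100 for i in range(20)]        # made-up mask probabilities
positions = select_mask_positions(probs)    # the three highest-probability statements
masked = apply_mask(statements, positions)
```

The masked sequence would then be passed to a masked language model for reconstruction.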
[0228] It should be noted that the above-described methods for masking and reconstructing at least one target information in the dialogue text information based on the mask probability of each target information are merely illustrative examples. In fact, any method for masking and reconstructing target information in the dialogue text information based on mask probability is applicable to the embodiments of this application, and is not specifically limited herein.
[0229] Furthermore, regarding the mask reconstruction of strategy three, the above examples illustrate mask reconstruction at the word or sentence level. In addition, low-information-interaction sentences can also be directly deleted. Low-information-interaction sentences are those that have little or no impact on the classification result of the entire conversation. When the target information is the dialogue text, these sentences can be determined based on the mask probability of each dialogue text. If the mask probability of a dialogue text is very high, it indicates that the dialogue text is unimportant and can be directly deleted.
[0230] S84: The server generates an extended work order sample corresponding to the reference work order sample based on the dialogue text information reconstructed by the mask and the category label of the reference work order sample.
[0231] Figure 9 is a logical schematic diagram of information mask reconstruction in an embodiment of this application.
[0232] Specifically, for a reference work order sample, its dialogue text information includes: dialogue 1, dialogue 2, dialogue 3, dialogue 4... First, it is necessary to calculate the corresponding mask probability for each word segment in the dialogue text information of the reference work order sample; then, select the top 15% of the word segments for masking; then, perform mask reconstruction based on BERT to obtain new dialogue text information, including: dialogue 1', dialogue 2', dialogue 3', dialogue 4'... Combine the new dialogue text information with the category label (Category A) of the reference work order sample to obtain the corresponding extended work order sample.
[0233] It should be noted that the information mask reconstruction methods listed above are merely illustrative examples. Any information mask reconstruction method is applicable to the embodiments of this application, and no specific limitation is made herein.
[0234] In the above implementation, model features are introduced for data augmentation, which does not require human intervention and is easy to transfer to other model structures. Furthermore, the mask probability is calculated by combining the saliency map, which, unlike random masking, can effectively ensure the semantic invariance of the generated text.
[0235] It should be noted that the main architecture network in Strategy 3 listed above can also achieve better performance by increasing the number of nodes and optimizing the coding method.
[0236] Furthermore, to prevent anomalous augmented data from affecting the final result, the augmented data can be weighted using an attention matrix to optimize the final outcome. One optional implementation involves obtaining the word vectors of each segment in the dialogue text information of a reference work order sample through word vector mapping. Then, an attention mechanism can be used to weight the word vectors of each segment to obtain updated word vectors. In this approach, the mask probability of each target information in the dialogue text information needs to be determined based on the updated word vectors of each segment.
[0237] Figure 10 is a flowchart illustrating a method for calculating mask probability in an embodiment of this application, specifically including the following processes (S101-S106):
[0238] S101: For a reference work order sample, the server obtains the word vectors of each word segment in the dialogue text information of the reference work order sample through word vector mapping.
[0239] S102: The server uses an attention mechanism to perform attention weighting on the word vectors of each word segment, and obtains the updated word vectors of each word segment.
[0240] Specifically, by combining attention mechanisms, the importance of each word segment can be learned. Based on this approach, more important words can be assigned higher weights, while less important words can be assigned lower weights. Furthermore, by combining the word vectors adjusted in this way with the calculated saliency coefficients, the importance of each target information can be represented more effectively.
[0241] S103: The server determines the saliency coefficient of each target information based on the word vectors of each segmented word after the update.
[0242] Specifically, the implementation of this step is similar to that of step S821, except that the word vectors used are different. The word vectors in this step are word vectors optimized through an attention mechanism.
[0243] In this embodiment, considering that the calculation of the saliency coefficient in this step depends on the currently trained network, and that the network gradient in the early stages of training may contain a lot of noise and fluctuate drastically in a local range, this application proposes a method for optimizing the saliency coefficient. After determining the saliency coefficient of each target information based on the word vectors of each segment (or the updated word vectors of each segment), the saliency coefficient of the target information can be further optimized.
[0244] In an optional implementation, taking as an example the case where the saliency coefficient of each target information is determined based on the updated word vectors of each word segment, the following steps may further be included after step S103:
[0245] S104: The server determines the corresponding significance coefficient covariance matrix based on the significance coefficients of each target information.
[0246] In statistics and probability theory, a covariance matrix is a matrix in which each element is the covariance between the elements of each vector. In the embodiments of this application, the significance coefficients corresponding to each target information can be regarded as vector elements, and the covariance matrix is constructed by calculating the covariance between the significance coefficients corresponding to the target information.
[0247] Specifically, the target information can be dialogue text or word segments within the dialogue text. The dialogue text information of a work order contains at least one round of dialogue (one question and one answer, i.e. two dialogue texts), and each dialogue text can contain one or more word segments. Therefore, depending on the target information, there can be one or more significance coefficient covariance matrices, as follows:
[0248] If the target information is dialogue text, then there is one significance coefficient covariance matrix, which is generated based on the covariance between the significance coefficients of each dialogue text in the dialogue text information.
[0249] If the target information is word segmentation, there can be one or more significance coefficient covariance matrices. If there is only one, it is generated based on the covariance between the significance coefficients of each word in the dialogue text information. If there are multiple, the matrices correspond one-to-one with the dialogue texts in the dialogue text information, and each is generated based on the covariance between the significance coefficients of each word in the corresponding dialogue text. Since the dialogue text information typically contains multiple rounds of dialogue and multiple dialogue texts, and each dialogue text can be further divided into multiple words, for ease of calculation this application mainly takes the dialogue text as the unit and illustrates the case where the significance coefficient covariance matrices correspond one-to-one with the dialogue texts. See below for details.
[0250] S105: The server determines the updated significance coefficients for each target information based on the significance coefficient covariance matrix.
[0251] In an optional implementation, step S105 can be further divided into the following sub-steps (S1051-S1053), and the following operations are performed for each target information:
[0252] S1051: For a given target information, the server performs multiple corrections on the information vector of that target information based on the corresponding significance coefficient covariance matrix.
[0253] Specifically, based on the type of target information, it can be divided into the following two cases:
[0254] Case 1: If the target information is dialogue text, then the information vector is a sentence vector determined based on the word vectors of each segment in the dialogue text.
[0255] That is, if the target information is a dialogue statement, the sentence vector of the dialogue statement can be modified multiple times based on the significance coefficient covariance matrix corresponding to the work order.
[0256] Case 2: If the target information is word segmentation, then the information vector is a word vector.
[0257] That is, if the target information is word segmentation, the word vector of the word segmentation can be modified multiple times based on the saliency coefficient covariance matrix of the dialogue statement to which the word segmentation belongs.
[0258] Optionally, step S1051 can be further divided into the following sub-steps (S10511-S10512, not shown in Figure 10):
[0259] S10511: The server obtains multiple Gaussian noises corresponding to the target information by using a Gaussian distribution determined based on the covariance matrix of the significance coefficient.
[0260] The variance of this Gaussian distribution is the sum of the diagonal elements of the significance coefficient covariance matrix; in addition, the mean of this Gaussian distribution is 0.
[0261] This application uses a zero-mean Gaussian distribution to obtain the desired perturbation (i.e. Gaussian noise), where ∑ is the significance coefficient covariance matrix. The distribution variance of the high-dimensional feature, which is also the variance of the Gaussian distribution, is obtained by calculating the sum of the diagonal elements of ∑.
[0262] Based on this Gaussian distribution, multiple Gaussian noises can be obtained, and the dimension of each Gaussian noise is the same as the dimension of the information vector of the target information.
[0263] S10512: The server corrects the information vector of the target information multiple times based on multiple Gaussian noises.
[0264] Specifically, the dimension of the Gaussian noise determined in step S10511 is the same as the dimension of the information vector of the target information. Therefore, correcting the information vector of the target information based on a Gaussian noise can be expressed as summing the Gaussian noise and the original information vector of the target information to obtain the corrected information vector.
[0265] In the above implementation, by adding Gaussian noise to the original gradient for a smooth transition, the drastic fluctuations of the network gradient in the local range during the early stage of training can be effectively reduced, thereby improving the accuracy of the calculation results.
[0266] S1052: Based on the classification probability of the reference work order sample and the corrected information vectors, the server determines the intermediate significance coefficients corresponding to the target information.
[0267] Specifically, the calculation method for this intermediate significance coefficient is similar to the calculation process for the initial significance coefficient listed in S82 above, and can also refer to formulas (4) and (5). The difference is that the vector $X_i$ here represents the corrected information vector.
[0268] S1053: The server uses the mean of all intermediate significance coefficients as the updated significance coefficient corresponding to the target information.
[0269] The classification probability is predicted based on the work order classification model.
[0270] For example, if the target information in step S1051 is a dialogue statement, then multiple Gaussian noises corresponding to the dialogue statement can be determined. Based on each Gaussian noise, the sentence vector of the dialogue statement is corrected. Based on the corrected sentence vectors, the significance coefficient (i.e., the intermediate significance coefficient) of each dialogue statement is recalculated. The mean of the recalculated intermediate significance coefficients can then be used as the updated significance coefficient of the dialogue statement.
[0271] For example, if the target information in step S1051 is a word segment, then multiple Gaussian noises corresponding to the word segment can be determined. Based on each Gaussian noise, the word vector of the word segment is corrected. Based on the corrected word vectors, the significance coefficient (i.e., the intermediate significance coefficient) of each word segment is recalculated. The mean of the recalculated intermediate significance coefficients can then be used as the updated significance coefficient of the word segment.
[0272] It should be noted that the specific calculation process in step S105 can be found in the following formulas (4) and (5), and the repeated parts will not be repeated.
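Steps S10511 to S1053 can be sketched as follows, in a hedged form. The names `smoothed_saliency` and `toy_saliency` are hypothetical; the covariance matrix is assumed to be given, and `saliency_fn` is only a stand-in for the gradient-based calculation of formula (3), which in practice depends on the work order classification model.

```python
import numpy as np


def smoothed_saliency(saliency_fn, x, cov, num_samples=50, seed=0):
    """Average a saliency coefficient over Gaussian-perturbed copies of one vector.

    saliency_fn : callable mapping a length-d vector to a scalar (stand-in for formula (3))
    x           : length-d word vector or sentence vector
    cov         : significance coefficient covariance matrix (assumed given)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    variance = float(np.trace(cov))  # sum of the diagonal elements (S10511)
    noise = rng.normal(0.0, np.sqrt(variance), size=(num_samples, x.shape[0]))
    intermediate = [saliency_fn(x + z) for z in noise]  # S10512 and S1052
    return float(np.mean(intermediate))                 # S1053


def toy_saliency(v):
    """Toy stand-in: sum of absolute values of the vector."""
    return float(np.abs(v).sum())


updated = smoothed_saliency(toy_saliency, [1.0, 2.0], 0.01 * np.eye(2))
```

With a zero covariance matrix the noise vanishes and the function returns the unsmoothed coefficient.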
[0273] S106: The server determines the corresponding mask probability based on each updated significance coefficient, and the mask probability is inversely proportional to the corresponding updated significance coefficient.
[0274] Specifically, this step is similar to S822 listed above, except that the significance coefficient is updated. The repeated parts will not be described again.
[0275] Specifically, BERT is used as an MLM for text reconstruction enhancement. A saliency map-based mask replacement method is introduced to ensure the semantic invariance of the generated text. Taking word segmentation as an example, the mask reconstruction process is described in detail below.
[0276] In this embodiment of the application, the following process can be performed for each dialogue statement in the dialogue text information, taking dialogue statements as the unit:
[0277] The method in this application differs from the commonly used random mask generation method. This application uses a mask probability matrix to represent the probability that each word in the sentence will be replaced by a mask. The more important a word is, the lower the probability that the word will be replaced. One way to represent the mask probability matrix is as follows: Formula (1):
[0278] $p = [p_1, p_2, \ldots, p_n]$  (1)
[0279] Where $p_n$ is the probability that the n-th word in the sentence will be replaced. For each input dialogue statement S, word vector mapping can be used to generate a matrix X, which is passed through the multi-turn dialogue classification model to obtain the classification score y of the current model. This application uses a saliency map to measure the importance of each word in the sentence to the result y, as shown in the following formulas (2) and (3):
[0280]
[0281]
[0282] Where m is a vector composed of the significance coefficients of each word in the sentence, containing a total of n elements; y is the classification result score (i.e., classification probability) obtained through the current multi-turn dialogue model; and $1^T$ is an indicator function used to indicate the zero-padding values introduced during data normalization. The obtained score is differentiated with respect to the embedding matrix, and for each word the differential gradients over all dimensions are summed to measure the importance of the i-th word to the classification result. $M(X)_i$ is the significance coefficient of the i-th word in the sentence, representing the importance of the i-th word to the classification result. The value of i is from 1 to n (positive integers), and n represents the number of words in the dialogue statement S.
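As an illustration of the gradient-based coefficient described above, the following sketch approximates the gradient of a score with respect to each word vector by finite differences and sums it over the embedding dimensions. It is not the application's formula (3): the names `saliency` and `toy_score` are hypothetical, the toy score stands in for the classification model, and taking absolute values before summing is an assumption.

```python
import numpy as np


def saliency(score_fn, X, pad_mask, eps=1e-5):
    """Approximate per-word saliency by central finite differences.

    score_fn : callable mapping an (n, d) matrix to a scalar score y
    X        : (n, d) matrix of word vectors
    pad_mask : length-n array, 1 for real words and 0 for padding
    """
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (score_fn(X + step) - score_fn(X - step)) / (2 * eps)
    # Sum over the d dimensions of each word, then zero out padded positions.
    return np.abs(grad).sum(axis=1) * np.asarray(pad_mask, dtype=float)


W = np.array([[2.0, 2.0], [0.5, 0.5], [0.0, 0.0]])  # toy weights; word 0 matters most


def toy_score(X):
    """Toy stand-in for the model: sigmoid of a weighted sum of the inputs."""
    return 1.0 / (1.0 + np.exp(-(W * X).sum()))


m = saliency(toy_score, np.ones((3, 2)), pad_mask=[1, 1, 0])
```

In a real training loop the gradient would come from automatic differentiation of the current model rather than from finite differences.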
[0283] Since this gradient calculation method depends on the currently trained network, the network gradient in the early stages of training may have a lot of noise and fluctuate wildly in a local range. Therefore, this application adds Gaussian noise to the original gradient to make the transition smooth.
[0284]
[0285]
[0286] This application uses a zero-mean Gaussian distribution to obtain the desired perturbation. $z_j$ is the j-th Gaussian noise corresponding to the i-th word in the dialogue statement S. This Gaussian noise has the same dimension as its corresponding word vector, which is 1×d, that is, a one-dimensional vector containing d elements. Here, j takes values from 1 to n, meaning that in formula (4) n intermediate significance coefficients can be calculated. $M(X_i + z_j)$ is the j-th intermediate significance coefficient corresponding to the i-th word segment, calculated in the same way as in formula (3) above.
[0287] In this embodiment, the distribution variance of the high-dimensional features is obtained by calculating the sum of the diagonal elements of the significance coefficient covariance matrix ∑. After obtaining Gaussian noise, the original gradient is summed with noise and the mean is calculated to obtain the Gaussian smoothed significance coefficient, i.e., the updated significance coefficient.
[0288] In practical applications, the significance coefficient obtained from the above formula is used as a measure of the importance of a word in the text. Therefore, the probability $p_i$ of a word being replaced during the masking process should be inversely proportional to its significance coefficient; that is, the more important the word, the lower the probability of it being replaced, so as to avoid semantic loss caused by keyword replacement.
[0289]
[0290] Where the hyperparameter β controls the smoothness of the probability, and the probabilities $p_i$ are normalized by their sum.
[0291] Based on the probability matrix p, this application replaces 15% of the words in each work order with a mask, and BERT is used to predict and reconstruct these masked parts to form an augmented dataset for training. Since the gradient changes with network training, the probability matrix p in this application will also change in each iteration.
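One way to turn significance coefficients into mask probabilities with the properties stated above (inverse relation, a smoothing hyperparameter β, normalization by the sum) is sketched below. The exact functional form of formula (6) is not reproduced in this text, so the inverse-power form used here, the function name `mask_probabilities`, and the small constant `eps` are all assumptions.

```python
import numpy as np


def mask_probabilities(m, beta=1.0, eps=1e-8):
    """Map significance coefficients to mask probabilities that sum to 1.

    A larger coefficient gives a smaller probability; beta = 0 gives a uniform
    distribution and larger beta sharpens the contrast. Padded positions are
    not handled here and would need to be excluded beforehand.
    """
    m = np.asarray(m, dtype=float)
    inv = (1.0 / (m + eps)) ** beta
    return inv / inv.sum()


p = mask_probabilities([4.0, 2.0, 1.0])
```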
[0292] It should be noted that the calculation process is similar when the target information is dialogue text, and the repetitive parts will not be repeated.
[0293] The following illustrates the combination of the three strategies. Figure 11 is a schematic diagram of an overall model structure in an embodiment of this application. This application uses MHAN as the base model to construct an intelligent archive classification network, augmenting the data by adding an additional data augmentation module. The augmentation methods consist of three parts: word-level replacement enhancement, cross-insertion between paragraphs in the same archive work order, and mask reconstruction based on saliency maps. The augmentation operations primarily target the large number of long-tail samples in the actual data.
[0294] In the embodiments of this application, comparative experiments show that after augmenting the long-tail data, the model's performance is significantly improved, and it also exhibits better robustness on the test set. Specific experiments are as follows:
[0295] This application uses three consecutive months of historical work order data, after cleaning and filtering, as training data. It defines a long-tail sample as data with fewer than 100 orders under an archive path. Simultaneously, it uses one week of live network data as the test set to evaluate the model's training effect and coverage. This application also compares the impact of no data augmentation, word / paragraph level data augmentation, and all augmentation methods on model performance.
[0296] The comparative experiments with different archive numbers clearly show that the original model's performance on the test set is significantly improved after data augmentation is introduced. Horizontally, the more diverse the augmentation methods, the better the model's learning effect on long-tail samples. Vertically, the more long-tail samples there are, the greater the improvement in model performance brought by data augmentation. This also proves that data augmentation has a significant improvement effect on long-tail problems.
[0297] Table 2: Comparison Experiment of Data Augmentation Effect
[0298]
[0299] This technical solution enhances the intelligent archiving model's learning ability on long-tailed samples by performing various types of data augmentation on the long-tailed work order samples. When data is limited and long-tailed samples are numerous, the model is prone to overfitting; the augmentation improves the model's performance on long-tailed samples in this situation, as shown in Table 2. This method is easy to implement and highly versatile, and can be applied to various work order application scenarios.
[0300] Figure 12 is a timing flowchart of a work order classification method in an embodiment of this application. Taking the server as the execution entity as an example, the specific implementation process of this method is as follows:
[0301] Step S121: The server obtains the initial work order sample set;
[0302] Step S122: The server determines the number of work order samples of each category in the work order sample set based on the category label of each work order sample.
[0303] Step S123: The server uses at least one work order sample corresponding to a category whose quantity is lower than a preset threshold as a reference work order sample.
[0304] Step S124: The server divides the selected reference work order samples into three parts: the first part of the reference work order samples, the second part of the reference work order samples, and the third part of the reference work order samples.
[0305] Step S125: The server augments the first part of the reference work order samples based on the synonym replacement strategy to obtain the corresponding extended work order samples;
[0306] Step S126: The server performs data augmentation on the second part of the reference work order sample based on the paragraph cross-cutting strategy to obtain the corresponding extended work order sample;
[0307] Step S127: The server performs data augmentation on the third part of the reference work order sample based on the information mask reconstruction strategy to obtain the corresponding extended work order sample.
[0308] Step S128: The server constructs a training sample set based on each work order sample and the obtained extended work order samples, trains the multi-turn dialogue classification model to be trained, and obtains the trained work order classification model.
[0309] Step S129: The server determines the category of the customer service work order to be classified based on the trained work order classification model.
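Steps S122 to S124 above can be sketched as follows. The function names and the tuple representation of a sample are hypothetical, and the round-robin split into three parts is an assumption; the text only states that the reference samples are divided into three parts.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (dialogue text information, category label)


def select_reference_samples(samples: List[Sample], threshold: int) -> List[Sample]:
    """Keep samples whose category has fewer than `threshold` samples."""
    counts = Counter(label for _, label in samples)          # S122
    return [s for s in samples if counts[s[1]] < threshold]  # S123


def split_three_ways(refs: List[Sample]) -> Dict[str, List[Sample]]:
    """Assign reference samples to the three augmentation strategies (S124)."""
    names = ["synonym_replacement", "paragraph_crossing", "mask_reconstruction"]
    return {name: refs[i::3] for i, name in enumerate(names)}


samples = [("dialogue", "A")] * 5 + [("dialogue", "B")] * 2 + [("dialogue", "C")]
refs = select_reference_samples(samples, threshold=3)  # categories B and C only
parts = split_three_ways(refs)
```

Each part would then be passed to its augmentation strategy (S125 to S127), and the resulting extended samples merged with the original set for training (S128).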
[0310] Figure 13 is a logical schematic diagram of a work order classification method according to an embodiment of this application. Specifically, the initial work order sample set contains work order samples of four categories: A, B, C, and D. Reference work order samples are selected using the screening method shown in Figure 4 and are then divided into three groups: Group A, Group B, and Group C. Three different strategies are then used to augment the reference work order samples, resulting in corresponding expanded work order samples. A training sample set is constructed based on the expanded work order samples and the initial work order sample set. Finally, MHAN can be trained using this sample set.
[0311] In summary, this application proposes a joint augmentation method utilizing business synonym generalization, cross-interchange of archived work order paragraphs, and mask reconstruction based on saliency maps to augment long-tail sample data in customer service work orders. Furthermore, this application uses the MHAN model as the basic intelligent archiving model and performs data augmentation operations on this model for comparative effect evaluation. Experimental data show that this data augmentation method achieves good results in the intelligent work order archiving model, and this method is also applicable to various task-oriented dialogue models or unsupervised clustering systems based on long dialogues.
[0312] Based on the same inventive concept, embodiments of this application also provide a work order classification model training device. Figure 14 is a schematic diagram of the structure of the work order classification model training device 1400, which may include:
[0313] The acquisition unit 1401 is used to acquire a work order sample set. Each work order sample includes: the category label of the corresponding customer service work order, and the dialogue text information between the business processing object and the business service object of the corresponding customer service work order. The dialogue text information is obtained based on the customer service conversation recorded in the corresponding customer service work order.
[0314] The filtering unit 1402 is used to filter out at least one reference work order sample to be expanded from the work order sample set based on the category label of each work order sample.
[0315] The augmentation unit 1403 is used to perform data augmentation on the dialogue text information in at least one reference work order sample based on a preset data augmentation strategy to obtain the corresponding extended work order sample. The preset data augmentation strategy is used to indicate that non-critical information is replaced in the dialogue text information, and the non-critical information is information that does not change the semantics of the dialogue text information before and after the replacement.
[0316] Training unit 1404 is used to train the model based on each work order sample and the obtained extended work order samples to obtain a trained work order classification model. The work order classification model is used to determine the work order category to which the customer service work order to be classified belongs.
[0317] Optionally, the preset data augmentation strategy includes at least one of the following:
[0318] A synonym replacement strategy, used to replace non-critical information in the dialogue text information with synonyms;
[0319] A paragraph crossing strategy, used to cross non-critical information in the dialogue text information;
[0320] An information mask reconstruction strategy, used to mask and reconstruct non-critical information in the dialogue text information.
[0321] Optionally, the preset data augmentation strategy includes a synonym replacement strategy;
[0322] The augmentation unit 1403 is specifically configured to perform the following operations for some or all of the at least one reference work order sample:
[0323] For a reference work order sample, based on a preset thesaurus, at least one business-related word in the dialogue text information of the reference work order sample is replaced with a synonym to obtain the corresponding extended work order sample.
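The synonym replacement described above can be illustrated with a minimal sketch. The thesaurus entries, the `(speaker, text)` representation of a dialogue, and whitespace tokenization are all assumptions made for illustration; the application does not specify them, and Chinese dialogue text would need a proper word segmenter.

```python
import random

def synonym_replace(turns, thesaurus, rng):
    """Replace every thesaurus hit in each turn with a randomly chosen synonym.

    turns: list of (speaker, text) pairs (hypothetical representation).
    thesaurus: dict mapping a business-related word to its synonyms.
    """
    augmented = []
    for speaker, text in turns:
        words = [rng.choice(thesaurus[w]) if w in thesaurus else w
                 for w in text.split()]
        augmented.append((speaker, " ".join(words)))
    return augmented

# Hypothetical business thesaurus and dialogue, for illustration only.
thesaurus = {"refund": ["reimbursement"], "recharge": ["top-up"]}
turns = [("customer", "I want a refund"), ("agent", "Which recharge failed")]
extended = synonym_replace(turns, thesaurus, random.Random(0))
print(extended)
```

The category label of the reference sample would be carried over unchanged to the extended sample, since the replacement is meant to preserve semantics.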
[0324] Optionally, the preset data augmentation strategy includes a paragraph crossing strategy;
[0325] The augmentation unit 1403 is specifically configured to perform the following operations for some or all of the at least one reference work order sample:
[0326] For two reference work order samples with the same category label, the dialogue text information published by the same dialogue publisher in the two reference work order samples is crossed to obtain the corresponding extended work order samples; the dialogue publisher is the business processing object or the business service object.
[0327] Optionally, the dialogue text information includes: at least one round of dialogue text between the business processing object and the business service object;
[0328] Augmentation unit 1403 is specifically used to perform dialogue crossover in at least one of the following ways:
[0329] The dialogue texts published by the same dialogue publisher in the same dialogue round of the two reference work order samples are exchanged in parallel.
[0330] The dialogue texts published by the same dialogue publisher in different dialogue rounds of the two reference work order samples are randomly exchanged.
[0331] A dialogue text from one of the two reference work order samples is randomly inserted into the dialogue text published by the same dialogue publisher in the other reference work order sample.
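The three crossing modes can be sketched as follows. This is one possible reading, not the application's implementation: a dialogue is modeled as a list of rounds, each round a dict keyed by the hypothetical speaker names `customer` and `agent`, and "insertion" is interpreted as appending the borrowed text to the other sample's text for the same speaker.

```python
import copy
import random

def parallel_exchange(a, b, speaker, round_idx):
    """Swap the speaker's text between a and b in the same round."""
    a2, b2 = copy.deepcopy(a), copy.deepcopy(b)
    a2[round_idx][speaker], b2[round_idx][speaker] = (
        b[round_idx][speaker], a[round_idx][speaker])
    return a2, b2

def random_swap(a, b, speaker, rng):
    """Swap the speaker's text between round i of a and a different round j of b."""
    i = rng.randrange(len(a))
    candidates = [k for k in range(len(b)) if k != i]
    if not candidates:
        raise ValueError("the second sample has no different round to swap with")
    j = rng.choice(candidates)
    a2, b2 = copy.deepcopy(a), copy.deepcopy(b)
    a2[i][speaker], b2[j][speaker] = b[j][speaker], a[i][speaker]
    return a2, b2

def random_insert(a, b, speaker, rng):
    """Append a random text of the speaker from a to a random round of b."""
    i, j = rng.randrange(len(a)), rng.randrange(len(b))
    b2 = copy.deepcopy(b)
    b2[j][speaker] = b[j][speaker] + " " + a[i][speaker]
    return b2

# Two toy samples that are assumed to share the same category label.
a = [{"customer": "My payment failed", "agent": "Which order is it"},
     {"customer": "The one from today", "agent": "I will check it"}]
b = [{"customer": "I was charged twice", "agent": "Sorry about that"},
     {"customer": "Please reverse it", "agent": "It has been submitted"}]

rng = random.Random(0)
a_par, b_par = parallel_exchange(a, b, "customer", 0)
a_swp, b_swp = random_swap(a, b, "customer", rng)
b_ins = random_insert(a, b, "customer", rng)
```

Because both inputs share a category label and only one speaker's text moves, each output can reuse that label as an extended work order sample.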
[0332] Optionally, the preset data augmentation strategy includes an information mask reconstruction strategy;
[0333] The augmentation unit 1403 is specifically configured to perform the following operations for some or all of the at least one reference work order sample:
[0334] For a reference work order sample, word vectors of each word segment in the dialogue text information of the reference work order sample are obtained through word vector mapping.
[0335] Based on the word vectors of each word segment, the mask probability of each target information in the dialogue text information is determined, where the target information is a word segment or a dialogue text.
[0336] Based on the mask probability of each target information, at least one target information in the dialogue text is masked and reconstructed to obtain the corresponding extended work order sample.
[0337] Optionally, augmentation unit 1403 is specifically used for:
[0338] Based on the word vectors of each word segment, the significance coefficient of each target information is determined. The significance coefficient is used to characterize the importance of the target information to the work order classification result.
[0339] Based on each significance coefficient, the corresponding mask probability is determined, and the mask probability is inversely proportional to the corresponding significance coefficient.
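A minimal sketch of turning significance coefficients into mask probabilities is given below. The exact mapping is not specified in this section; the sketch assumes non-negative coefficients, uses the normalized reciprocal as the probability, and scales it by a hypothetical `mask_rate` budget. Only the masking step is shown; how masked positions are reconstructed is left out.

```python
import random

def mask_probabilities(saliency, mask_rate=0.3, eps=1e-8):
    """Map significance coefficients to mask probabilities (higher -> less likely masked).

    Assumes non-negative coefficients. mask_rate and eps are illustrative parameters.
    """
    inverse = [1.0 / (s + eps) for s in saliency]
    total = sum(inverse)
    budget = mask_rate * len(saliency)  # expected number of masked items before capping
    return [min(1.0, budget * w / total) for w in inverse]

def mask_tokens(tokens, probs, rng, mask_token="[MASK]"):
    """Mask each token independently with its own probability."""
    return [mask_token if rng.random() < p else t for t, p in zip(tokens, probs)]

# Toy significance coefficients for three word segments.
probs = mask_probabilities([0.9, 0.1, 0.5])
masked = mask_tokens(["refund", "please", "today"], probs, random.Random(0))
```

With this mapping, the word segment with the lowest significance coefficient receives the highest mask probability, so information that matters most to the classification result tends to be kept.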
[0340] Optionally, augmentation unit 1403 is specifically used for:
[0341] Based on the classification probability of a reference work order sample and the information vector of each target information, the significance coefficient of each target information is determined respectively; the classification probability is predicted based on the work order classification model.
[0342] Wherein, if the target information is a word segment, the information vector is a word vector; if the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of each word segment in the dialogue text.
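The claims of this application state that the significance coefficient is based on the gradient of the classification probability with respect to each information vector. The sketch below illustrates that idea under several assumptions: the gradient is approximated by central finite differences rather than backpropagation, its L2 norm is taken as the coefficient, and `toy_predict_proba` is a made-up stand-in for the work order classification model.

```python
import math

def saliency_by_gradient(vectors, predict_proba, h=1e-5):
    """Return, for each information vector, the L2 norm of d(probability)/d(vector)."""
    scores = []
    for i, vec in enumerate(vectors):
        grad_sq = 0.0
        for d in range(len(vec)):
            plus = [list(v) for v in vectors]
            minus = [list(v) for v in vectors]
            plus[i][d] += h
            minus[i][d] -= h
            g = (predict_proba(plus) - predict_proba(minus)) / (2 * h)
            grad_sq += g * g
        scores.append(math.sqrt(grad_sq))
    return scores

def toy_predict_proba(vectors, w=(1.0, -2.0)):
    """Hypothetical classifier: sigmoid of summed tanh projections of each vector."""
    z = sum(math.tanh(sum(wi * xi for wi, xi in zip(w, v))) for v in vectors)
    return 1.0 / (1.0 + math.exp(-z))

# Two toy word vectors; the second one sits in the saturated region of tanh.
vectors = [[0.0, 0.0], [3.0, 0.0]]
scores = saliency_by_gradient(vectors, toy_predict_proba)
```

In this toy setup the first vector gets the larger coefficient, because a small change to it moves the predicted probability more than the same change to the saturated second vector.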
[0343] Optionally, augmentation unit 1403 is also used for:
[0344] After determining the significance coefficient of each target information based on the word vectors of each word segment, the corresponding significance coefficient covariance matrix is determined based on the significance coefficient of each target information.
[0345] Based on the significance coefficient covariance matrix, the updated significance coefficients corresponding to each target information are determined;
[0346] Augmentation unit 1403 is specifically used for:
[0347] Based on each updated significance coefficient, the corresponding mask probability is determined, and the mask probability is inversely proportional to the corresponding updated significance coefficient.
[0348] Optionally, augmentation unit 1403 is specifically used to perform the following operations for each target information:
[0349] For a given target information, the information vector of the target information is corrected multiple times based on the significance coefficient covariance matrix;
[0350] Based on the classification probability of a reference work order sample and the corrected information vectors, the intermediate significance coefficients corresponding to the target information are determined respectively.
[0351] The mean of each intermediate significance coefficient is used as the updated significance coefficient corresponding to the target information; the classification probability is predicted based on the work order classification model.
[0352] Optionally, augmentation unit 1403 is specifically used for:
[0353] Multiple Gaussian noises corresponding to a target information are obtained by using a Gaussian distribution determined based on the significance coefficient covariance matrix; the variance of the Gaussian distribution is the sum of the diagonal elements of the significance coefficient covariance matrix.
[0354] The information vector of the target information is corrected once based on each Gaussian noise.
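The noise-based update of a significance coefficient can be sketched as follows. The text fixes the variance of the Gaussian distribution as the sum of the diagonal elements of the covariance matrix; the zero mean, the element-wise addition of noise to the information vector, the number of samples, and the `saliency_fn` callable are assumptions made for illustration.

```python
import math
import random

def smoothed_saliency(vector, saliency_fn, cov, n_samples=20, rng=None):
    """Average the intermediate coefficients obtained from noise-corrected vectors.

    cov: significance coefficient covariance matrix as a list of lists.
    saliency_fn: hypothetical callable mapping an information vector to a coefficient.
    """
    rng = rng or random.Random(0)
    variance = sum(cov[k][k] for k in range(len(cov)))  # sum of diagonal elements
    sigma = math.sqrt(variance)
    intermediate = []
    for _ in range(n_samples):
        corrected = [x + rng.gauss(0.0, sigma) for x in vector]  # one correction per noise
        intermediate.append(saliency_fn(corrected))
    return sum(intermediate) / len(intermediate)

# Toy covariance matrix and a stand-in saliency function.
cov = [[0.01, 0.0], [0.0, 0.03]]
updated = smoothed_saliency([1.0, -2.0], lambda v: sum(abs(x) for x in v), cov)
```

Averaging over several noisy copies makes the updated coefficient less sensitive to small local fluctuations of the model around a single information vector.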
[0355] Optionally, augmentation unit 1403 is specifically used for:
[0356] If the target information is a dialogue text, the significance coefficient covariance matrix is generated based on the covariance between the significance coefficients of the dialogue texts in the dialogue text information.
[0357] If the target information is a word segment, the significance coefficient covariance matrices correspond one-to-one with the dialogue texts in the dialogue text information, and each significance coefficient covariance matrix is generated based on the covariance between the significance coefficients of the word segments in the corresponding dialogue text.
[0358] Optionally, the augmentation unit 1403 is further configured to determine the information vector of the target information in the following ways:
[0359] If the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of the word segments in the dialogue text;
[0360] If the target information is a word segment, the information vector is a word vector.
[0361] Optionally, augmentation unit 1403 is also used for:
[0362] After the word vectors of each word segment in the dialogue text information of a reference work order sample are obtained through word vector mapping, the word vectors of each word segment are weighted by an attention mechanism to obtain the updated word vectors of each word segment.
[0363] Determining the mask probability of each target information in the dialogue text information based on the word vectors of each word segment then includes:
[0364] Based on the updated word vectors of each word segment, the mask probability of each target information in the dialogue text information is determined.
[0365] Optionally, the filtering unit 1402 is specifically used for:
[0366] Based on the category labels of each work order sample, determine the number of work order samples in each category in the work order sample set;
[0367] At least one work order sample corresponding to a category whose quantity is lower than a preset threshold is used as a reference work order sample.
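The count-and-threshold screening performed by the filtering unit can be sketched as follows. The dict-based sample layout and the threshold value are assumptions for illustration, and the sketch selects every sample of each under-represented category, although the text only requires at least one.

```python
from collections import Counter

def select_reference_samples(samples, threshold):
    """Return the samples whose category has fewer than `threshold` members."""
    counts = Counter(sample["label"] for sample in samples)
    return [sample for sample in samples if counts[sample["label"]] < threshold]

# Toy sample set: category A is a head class, B and C are long-tail classes.
samples = (
    [{"label": "A", "dialogue": "refund request"}] * 5
    + [{"label": "B", "dialogue": "login failure"}] * 2
    + [{"label": "C", "dialogue": "account appeal"}]
)
reference = select_reference_samples(samples, threshold=3)
print(sorted(s["label"] for s in reference))
```

The selected samples are the ones that the augmentation strategies would then expand, so that the long-tail categories approach the size of the head categories.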
[0368] This application proposes an augmentation method applicable to customer service work orders, in which samples are selected based on the category to which each customer service work order belongs. With this method, reference work order samples that meet certain category conditions can be selected from the initial sample set. Then, based on a preset data augmentation strategy, the dialogue text information in the reference work order samples is augmented, such that the dialogue text information of each extended work order sample has the same semantics as the dialogue text information in the corresponding reference work order sample. This allows the work order samples of some categories to be expanded without changing the semantics of the customer service dialogue recorded in the work order, so that the number of work order samples in each category can be balanced. Furthermore, training the model on each work order sample together with the obtained extended work order samples mitigates the impact of the long-tail problem in the training data on model training, enabling the model to fully learn the characteristics of the various types of work orders, thus avoiding overfitting and improving the model's accuracy.
[0369] For ease of description, the above sections are divided into modules (or units) according to their functions and described separately. Of course, in implementing this application, the functions of each module (or unit) can be implemented in one or more software or hardware components.
[0370] After introducing the work order classification model training method and apparatus according to exemplary embodiments of this application, the electronic device according to another exemplary embodiment of this application will be introduced next.
[0371] Those skilled in the art will understand that various aspects of this application can be implemented as a system, method, or program product. Therefore, various aspects of this application can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, collectively referred to herein as a "circuit," "module," or "system."
[0372] Based on the same inventive concept as the above-described method embodiments, this application also provides an electronic device. In one embodiment, the electronic device may be a server, such as the server 120 shown in Figure 1. In this embodiment, the electronic device may be structured as shown in Figure 15, including a memory 1501, a communication module 1503, and one or more processors 1502.
[0373] The memory 1501 is used to store computer programs executed by the processor 1502. The memory 1501 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and programs required to run instant messaging functions, etc.; the data storage area may store various instant messaging information and operation instruction sets, etc.
[0374] Memory 1501 may be volatile memory, such as random-access memory (RAM); memory 1501 may also be non-volatile memory, such as read-only memory, flash memory, hard disk drive (HDD), or solid-state drive (SSD); or memory 1501 may be any other medium capable of carrying or storing a desired computer program having the form of instructions or data structures and accessible by a computer, but is not limited thereto. Memory 1501 may be a combination of the above-described memories.
[0375] Processor 1502 may include one or more central processing units (CPUs) or digital processing units, etc. Processor 1502 is used to implement the above-mentioned work order classification model training method when calling the computer program stored in memory 1501.
[0376] The communication module 1503 is used to communicate with terminal devices and other servers.
[0377] This application embodiment does not limit the specific connection medium between the memory 1501, the communication module 1503, and the processor 1502. In Figure 15 of this application embodiment, the memory 1501 and the processor 1502 are connected via a bus 1504, which is drawn as a thick line in Figure 15; the connections between other components are shown for illustrative purposes only and should not be considered limiting. The bus 1504 can be divided into an address bus, a data bus, a control bus, etc. For ease of description, Figure 15 uses only one thick line, but this does not indicate that there is only one bus or one type of bus.
[0378] The memory 1501 stores a computer storage medium, which stores computer-executable instructions. These instructions are used to implement the work order classification model training method of this application embodiment. The processor 1502 is used to execute the aforementioned work order classification model training method, such as the method shown in Figure 2.
[0379] In another embodiment, the electronic device may also be another electronic device, such as the terminal device 110 shown in Figure 1. In this embodiment, the electronic device may be structured as shown in Figure 16, including components such as: communication component 1610, memory 1620, display unit 1630, camera 1640, sensor 1650, audio circuit 1660, Bluetooth module 1670, processor 1680, etc.
[0380] The communication component 1610 is used to communicate with the server. In some embodiments, it may include a Wireless Fidelity (WiFi) module. WiFi is a short-range wireless transmission technology, and the electronic device can use the WiFi module to help users send and receive information.
[0381] The memory 1620 can be used to store software programs and data. The processor 1680 executes various functions of the terminal device 110 and performs data processing by running the software programs or data stored in the memory 1620. The memory 1620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. The memory 1620 stores an operating system that enables the terminal device 110 to run. In this application, the memory 1620 may store the operating system and various application programs, and may also store a computer program that executes the work order classification model training method of the embodiments of this application.
[0382] The display unit 1630 can also be used to display information input by the user or information provided to the user, as well as various menus of the terminal device 110, forming a graphical user interface (GUI). Specifically, the display unit 1630 may include a display screen 1632 disposed on the front of the terminal device 110. The display screen 1632 may be configured as a liquid crystal display, a light-emitting diode, or the like. The display unit 1630 can be used to display work order classifications, customer service conversations, and other related user interfaces as described in this embodiment.
[0383] The display unit 1630 can also be used to receive input digital or character information and generate signal inputs related to user settings and function control of the terminal device 110. Specifically, the display unit 1630 may include a touch screen 1631 disposed on the front of the terminal device 110, which can collect touch operations of the user on or near it, such as clicking buttons, dragging scroll boxes, etc.
[0384] The touchscreen 1631 can be placed on top of the display screen 1632, or the touchscreen 1631 and the display screen 1632 can be integrated to realize the input and output functions of the terminal device 110. After integration, it can be referred to as a touch display screen. In this application, the display unit 1630 can display the application and the corresponding operation steps.
[0385] Camera 1640 can be used to capture still images, which users can then share via an application. There can be one or multiple cameras 1640. An object is projected onto a photosensitive element through a lens, generating an optical image. This photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, which is then transmitted to the processor 1680 for conversion into a digital image signal.
[0386] The terminal device may also include at least one sensor 1650, such as an accelerometer 1651, a proximity sensor 1652, a fingerprint sensor 1653, and a temperature sensor 1654. The terminal device may also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, infrared sensor, light sensor, and motion sensor.
[0387] Audio circuitry 1660, speaker 1661, and microphone 1662 provide an audio interface between the user and terminal device 110. Audio circuitry 1660 converts received audio data into electrical signals, which are then transmitted to speaker 1661, where they are converted into sound signals for output. Terminal device 110 may also be equipped with volume buttons for adjusting the volume of the sound signal. On the other hand, microphone 1662 converts collected sound signals into electrical signals, which are received by audio circuitry 1660, converted into audio data, and then output to communication component 1610 for transmission to, for example, another terminal device 110, or to memory 1620 for further processing.
[0388] The Bluetooth module 1670 is used to interact with other Bluetooth devices that also have a Bluetooth module via the Bluetooth protocol. For example, a terminal device can establish a Bluetooth connection with a wearable electronic device (such as a smartwatch) that also has a Bluetooth module through the Bluetooth module 1670, thereby exchanging data.
[0389] The processor 1680 is the control center of the terminal device, connecting various parts of the terminal through various interfaces and lines. It executes various functions and processes data by running or executing software programs stored in the memory 1620 and calling data stored in the memory 1620. In some embodiments, the processor 1680 may include one or more processing units; the processor 1680 may also integrate an application processor and a baseband processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the baseband processor mainly handles wireless communication. It is understood that the baseband processor may not be integrated into the processor 1680. In this application, the processor 1680 can run the operating system, applications, user interface display and touch response, and the work order classification model training method of this application embodiment. Furthermore, the processor 1680 is coupled to the display unit 1630.
[0390] In some possible implementations, various aspects of the work order classification model training method provided in this application can also be implemented in the form of a program product, which includes a computer program. When the program product is run on an electronic device, the computer program is used to cause the electronic device to perform the steps in the work order classification model training method according to the various exemplary embodiments of this application described above. For example, the electronic device can perform the steps shown in Figure 2.
[0391] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0392] The program product of the embodiments of this application may employ a portable compact disc read-only memory (CD-ROM) and include a computer program, and may run on an electronic device. However, the program product of this application is not limited thereto. In this document, the readable storage medium may be any tangible medium that contains or stores a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
[0393] A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a readable computer program. This propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting a program for use by or in conjunction with an instruction execution system, apparatus, or device.
[0394] Computer programs contained on readable media may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.
[0395] Computer programs for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The computer program can execute entirely on the user's electronic device, partially on the user's electronic device, as a standalone software package, partially on the user's electronic device and partially on a remote electronic device, or entirely on a remote electronic device or server. In cases involving remote electronic devices, the remote electronic device can be connected to the user's electronic device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external electronic device.
[0396] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. This application can also take the form of a computer program product embodied on one or more computer-usable storage media containing a computer-usable computer program.
[0397] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0398] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for training a work order classification model, characterized in that the method includes: Obtaining a work order sample set. Each work order sample includes: the category label of the corresponding customer service work order, and the dialogue text information between the business processing object and the business service object of the corresponding customer service work order. The dialogue text information is obtained based on the customer service conversation recorded in the corresponding customer service work order. Based on the category label of each work order sample, at least one reference work order sample to be expanded is selected from the work order sample set. Based on a preset data augmentation strategy, data augmentation is performed on the dialogue text information in the at least one reference work order sample to obtain the corresponding extended work order sample. The preset data augmentation strategy is used to indicate that non-critical information is replaced in the dialogue text information. The non-critical information is information that does not change the semantics of the dialogue text information before and after the replacement.
When the preset data augmentation strategy includes a mask reconstruction strategy, the extended work order samples are obtained as follows: For each reference work order sample, word vectors of each word segment in the dialogue text information of the reference work order sample are obtained through word vector mapping; based on the gradient of the classification probability of the reference work order sample with respect to the information vector of each target information, the significance coefficients corresponding to each target information are determined respectively; the classification probability is predicted based on the work order classification model; the significance coefficient is used to characterize the importance of the target information to the work order classification result; the gradient reflects the degree of influence of the information vector of each target information on the classification probability of the reference work order sample when it changes; based on each significance coefficient, the corresponding mask probability is determined, the mask probability is inversely proportional to the corresponding significance coefficient, and the target information is word segmentation or dialogue text; based on the mask probability of each target information, at least one target information in the dialogue text information is masked and reconstructed to obtain the corresponding extended work order sample. The model is trained based on each work order sample and the obtained extended work order samples to obtain a trained work order classification model. The work order classification model is used to determine the work order category to which the customer service work order to be classified belongs.
2. The method as described in claim 1, characterized in that, The preset data augmentation strategy also includes at least one of the following: Synonym replacement strategy for replacing non-critical information in dialogue text; Paragraph crossing strategies for non-critical information crossing in dialogue text.
3. The method as described in claim 2, characterized in that, The preset data augmentation strategy includes a synonym replacement strategy; When performing data augmentation on the dialogue text information in at least one reference work order sample based on a preset data augmentation strategy to obtain corresponding extended work order samples, the following operations are performed on some or all of the at least one reference work order sample: For a reference work order sample, based on a preset thesaurus, at least one business-related word in the dialogue text information of the reference work order sample is replaced with a synonym to obtain the corresponding extended work order sample.
4. The method as described in claim 2, characterized in that, The preset data augmentation strategy includes a paragraph crossing strategy; Based on a preset data augmentation strategy, when data augmentation is performed on the dialogue text information in at least one reference work order sample to obtain the corresponding extended work order sample, the following operations are performed on some or all of the at least one reference work order sample: For two reference work order samples with the same category label, the dialogue text information published by the same dialogue publisher in the two reference work order samples is crossed to obtain the corresponding extended work order sample; the dialogue publisher is the business processing object or the business service object.
5. The method as described in claim 4, characterized in that, The dialogue text information includes: at least one round of dialogue text between the business processing object and the business service object; Crossing the dialogue text information published by the same dialogue publisher in the two reference work order samples includes at least one of the following methods: The dialogue texts published by the same dialogue publisher in the same dialogue round of the two reference work order samples are exchanged in parallel. The dialogue texts published by the same dialogue publisher in different dialogue rounds of the two reference work order samples are randomly exchanged. A dialogue text from one of the two reference work order samples is randomly inserted into the dialogue text published by the same dialogue publisher in the other reference work order sample.
6. The method as described in claim 1, characterized in that, If the target information is a word segment, the information vector is a word vector; if the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of each word segment in the dialogue text.
7. The method as described in claim 1, characterized in that, After determining the significance coefficient of each target information based on the word vectors of each word segment, the method further includes: Based on the significance coefficients of each target information, the corresponding significance coefficient covariance matrix is determined; Based on the significance coefficient covariance matrix, the updated significance coefficients corresponding to each target information are determined; The determination of the corresponding mask probability based on each significance coefficient includes: Each mask probability is determined based on the updated significance coefficient, and the mask probability is inversely proportional to the corresponding updated significance coefficient.
8. The method as described in claim 7, characterized in that, when the updated significance coefficient of each target information is determined based on the significance coefficient covariance matrix, the following operations are performed for each target information: correcting the information vector of the target information multiple times based on the significance coefficient covariance matrix; determining, based on the classification probability of the reference work order sample and each corrected information vector, a corresponding intermediate significance coefficient of the target information; and using the mean of the intermediate significance coefficients as the updated significance coefficient of the target information; wherein the classification probability is predicted by the work order classification model.
9. The method as described in claim 8, characterized in that correcting the information vector of a target information multiple times based on the significance coefficient covariance matrix includes: obtaining multiple Gaussian noises for the target information from a Gaussian distribution determined based on the significance coefficient covariance matrix, the variance of the Gaussian distribution being the sum of the diagonal elements of the significance coefficient covariance matrix; and correcting the information vector of the target information once based on each Gaussian noise.
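The noise-averaging procedure of claims 8 and 9 can be sketched as follows. Several details are assumptions, since the claims do not fix them: the noise is zero-mean and added component-wise to the information vector, the significance coefficient is taken as the L2 norm of the gradient, and `toy_grad` is a hypothetical stand-in for the gradient of the work order classification model's classification probability.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

GradFn = Callable[[Sequence[float]], List[float]]


def toy_grad(vec: Sequence[float], w: Sequence[float] = (0.5, -1.0)) -> List[float]:
    """Gradient of p = sigmoid(w . vec) with respect to vec (hypothetical stand-in model)."""
    z = sum(wi * xi for wi, xi in zip(w, vec))
    p = 1.0 / (1.0 + math.exp(-z))
    return [p * (1.0 - p) * wi for wi in w]


def grad_saliency(grad_fn: GradFn, vec: Sequence[float]) -> float:
    """Significance coefficient as the L2 norm of the gradient (an assumed reduction)."""
    return math.sqrt(sum(g * g for g in grad_fn(vec)))


def updated_saliency(grad_fn: GradFn, vec: Sequence[float],
                     cov: Sequence[Sequence[float]], n_samples: int = 20,
                     rng: Optional[random.Random] = None) -> float:
    """Mean of the intermediate significance coefficients over noise-corrected vectors."""
    rng = rng or random.Random()
    # Variance of the Gaussian = sum of the diagonal elements (trace) of the covariance matrix.
    variance = sum(cov[i][i] for i in range(len(cov)))
    std = math.sqrt(variance)
    intermediates = []
    for _ in range(n_samples):
        # One correction per noise draw; additive zero-mean noise is an assumption.
        corrected = [x + rng.gauss(0.0, std) for x in vec]
        intermediates.append(grad_saliency(grad_fn, corrected))
    return sum(intermediates) / len(intermediates)
```

With an all-zero covariance matrix the noise vanishes and the updated coefficient equals the plain gradient-based coefficient, which is a convenient sanity check.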
10. The method as described in claim 7, characterized in that the determination of the corresponding significance coefficient covariance matrix based on the significance coefficients of the target information includes: if the target information is a dialogue text, generating the significance coefficient covariance matrix based on the covariance between the significance coefficients of the dialogue texts in the dialogue text information; and if the target information is a word segment, generating one significance coefficient covariance matrix for each dialogue text in the dialogue text information, each significance coefficient covariance matrix being generated based on the covariance between the significance coefficients of the word segments in the corresponding dialogue text.
11. The method as described in claim 8, characterized in that the information vector of the target information is determined as follows: if the target information is a dialogue text, the information vector is a sentence vector determined based on the word vectors of the word segments in that dialogue text; and if the target information is a word segment, the information vector is a word vector.
12. The method according to any one of claims 1 to 11, characterized in that, after the word vectors of the word segments in the dialogue text information of the reference work order sample are obtained through word vector mapping, the method further includes: weighting the word vectors of the word segments by attention, using an attention mechanism, to obtain an updated word vector for each word segment; and the step of determining the mask probability of each target information in the dialogue text information based on the word vectors of the word segments includes: determining the mask probability of each target information in the dialogue text information based on the updated word vectors of the word segments.
13. The method according to any one of claims 1 to 11, characterized in that the step of selecting at least one reference work order sample to be expanded from the work order sample set based on the category labels of the work order samples includes: determining, based on the category label of each work order sample, the number of work order samples of each category in the work order sample set; and using at least one work order sample of a category whose number of samples is lower than a preset threshold as the reference work order sample.
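The selection step of claim 13 amounts to counting samples per category and keeping those from under-represented categories. A minimal sketch follows, assuming each work order sample is a dictionary with a `"label"` key; that layout and the function name are illustrative, not taken from the source.

```python
from collections import Counter
from typing import Dict, List


def select_reference_samples(samples: List[Dict[str, str]], threshold: int) -> List[Dict[str, str]]:
    """Return the samples whose category has fewer than `threshold` samples in the set."""
    # Number of work order samples per category label.
    counts = Counter(sample["label"] for sample in samples)
    return [sample for sample in samples if counts[sample["label"]] < threshold]
```

For example, with three "billing" samples, one "refund" sample and a threshold of 2, only the "refund" sample is returned as a reference work order sample to be expanded.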
14. A work order classification model training device, characterized in that it includes: an acquisition unit, used to acquire a work order sample set, wherein each work order sample includes the category label of the corresponding customer service work order and the dialogue text information between the business processing object and the business service object of that customer service work order, the dialogue text information being obtained based on the customer service conversation recorded in the corresponding customer service work order; a filtering unit, used to select at least one reference work order sample to be expanded from the work order sample set based on the category label of each work order sample; and an augmentation unit, used to augment the dialogue text information in the at least one reference work order sample based on a preset data augmentation strategy to obtain corresponding extended work order samples, wherein the preset data augmentation strategy indicates replacing non-critical information in the dialogue text information, the non-critical information being information whose replacement does not change the semantics of the dialogue text information.
When the preset data augmentation strategy includes a mask reconstruction strategy, the augmentation unit is specifically used to obtain the extended work order samples as follows: for each reference work order sample, obtaining, through word vector mapping, the word vector of each word segment in the dialogue text information of the reference work order sample; determining the significance coefficient of each target information based on the gradient of the classification probability of the reference work order sample with respect to the information vector of that target information, wherein the classification probability is predicted by the work order classification model, the significance coefficient characterizes the importance of the target information to the work order classification result, and the gradient reflects how strongly a change in the information vector of the target information influences the classification probability of the reference work order sample; determining the corresponding mask probability based on each significance coefficient, the mask probability being inversely proportional to the corresponding significance coefficient, and the target information being a word segment or a dialogue text; and masking and reconstructing at least one target information in the dialogue text information based on the mask probability of each target information to obtain the corresponding extended work order sample. The device further includes a training unit, used to train the model based on the work order samples and the obtained extended work order samples to obtain a trained work order classification model, the work order classification model being used to determine the work order category to which a customer service work order to be classified belongs.
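The mapping from significance coefficients to mask probabilities in the mask reconstruction strategy can be sketched as below. The claim only requires the mask probability to be inversely proportional to the significance coefficient; the scaling constant (`max_prob`), the `eps` guard, the `[MASK]` placeholder and the function names are assumptions for illustration. The reconstruction of masked positions, for example by a masked language model, is not shown.

```python
import random
from typing import List, Sequence


def mask_probabilities(saliency: Sequence[float], max_prob: float = 0.5,
                       eps: float = 1e-8) -> List[float]:
    """Mask probability p_i = c / (s_i + eps), scaled so the least significant target gets `max_prob`.

    Assumes non-negative significance coefficients.
    """
    inverse = [1.0 / (s + eps) for s in saliency]
    top = max(inverse)
    return [max_prob * v / top for v in inverse]


def mask_targets(targets: Sequence[str], probs: Sequence[float],
                 rng: random.Random, mask_token: str = "[MASK]") -> List[str]:
    """Replace each target (word segment or dialogue text) by `mask_token` with its mask probability."""
    return [mask_token if rng.random() < p else t for t, p in zip(targets, probs)]
```

Targets that matter most to the classification result therefore receive the lowest mask probability, so the information that determines the category label tends to survive augmentation.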
15. An electronic device, characterized in that it includes a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 13.
16. A computer-readable storage medium, characterized in that it includes a computer program that, when run on an electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1 to 13.
17. A computer program product, characterized in that it includes a computer program stored in a computer-readable storage medium; when a processor of an electronic device reads the computer program from the computer-readable storage medium and executes it, the electronic device is caused to perform the steps of the method according to any one of claims 1 to 13.