Model training method, long text multi-label classification method and related device

By splitting long texts into short texts and using an iterative training method, the multi-label classification of long texts is transformed into single-label classification of short texts, solving the accuracy and efficiency problems of multi-label classification of long texts and achieving efficient multi-label prediction.

CN116150369B · Active · Publication Date: 2026-06-12 · MASHANG CONSUMER FINANCE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MASHANG CONSUMER FINANCE CO LTD
Filing Date
2023-02-13
Publication Date
2026-06-12

Abstract

The application discloses a model training method, a long text multi-label classification method and related equipment, which are used for training a text classification model capable of being used for a long text multi-label task and accurately predicting a multi-label corresponding to a long text through the text classification model. The training method comprises the following steps: acquiring a long text corpus and a multi-label corresponding to the long text corpus; performing splitting processing on the long text corpus to obtain a plurality of short text corpora; iteratively training a first short text classification model based on the multi-label and the plurality of short text corpora, and determining a mapping relationship between the plurality of short text corpora and the multi-label based on classification information output by the first short text classification model in the iterative training process, so as to obtain a single label corresponding to each short text corpus; and training a second short text classification model based on the plurality of short text corpora and the single labels corresponding to the plurality of short text corpora respectively.

Description

Technical Field

[0001] This application relates to the field of natural language processing technology, and in particular to a model training method, a long text multi-label classification method, and related equipment.

Background Technology

[0002] With the digital transformation of various industries generating massive amounts of vertical domain text, implementing long text classification technology based on specific business needs can help organizations of all types manage, analyze, and develop unstructured text big data.

[0003] However, long texts, which sometimes reach thousands or even tens of thousands of words, present significant challenges to annotation and model training because of their inherent semantic nesting and conflicts. Current long text classification methods mainly include machine learning, deep learning, and pre-trained model-based approaches. Machine learning methods mainly use Term Frequency-Inverse Document Frequency (TF-IDF) to represent a long text as a feature vector and classify it based on this vector; however, such feature vectors lack semantic features, leading to inaccurate classification results. Deep learning methods mainly use Long Short-Term Memory (LSTM) networks or Text-CNN, but they cannot leverage the extensive prior knowledge embedded in pre-trained language models, which also results in inaccurate classification. Pre-trained model-based methods, such as Longformer, are complex to implement and time-consuming. Furthermore, long texts often have multiple labels, and multi-label classification is itself a challenging problem in natural language processing because it is difficult to determine the classification threshold for each label. When long text classification and multi-label classification are combined into one task, the resulting long text multi-label classification task becomes very challenging, and the methods mentioned above cannot accurately predict the multi-label corresponding to a long text.

Summary of the Invention

[0004] The purpose of this application is to provide a model training method, a long text multi-label classification method, and related equipment for training a text classification model that can be used for long text multi-label tasks and accurately predicting the multi-labels corresponding to long texts through the text classification model.

[0005] To achieve the above objectives, the embodiments of this application adopt the following technical solutions:

[0006] In a first aspect, embodiments of this application provide a method for training a text classification model, comprising:

[0007] A long text corpus and the multi-label corresponding to the long text corpus are obtained;

[0008] The long text corpus is split into multiple short text corpora;

[0009] The first short text classification model is iteratively trained based on the multi-label and the multiple short text corpora, and the mapping relationship between the multiple short text corpora and the multi-label is determined based on the classification information output by the first short text classification model during the iterative training process, so as to obtain the single label corresponding to each short text corpus.

[0010] The second short text classification model is trained based on the multiple short text corpora and the single labels corresponding to each of the multiple short text corpora.
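As a rough illustration of the label-assignment step in [0009], the sketch below assumes that, in a given iteration, the first short text classification model outputs a probability for each label, and that each short text corpus is assigned the highest-probability label among the labels annotated on its parent long text. The function name `assign_single_labels` and this argmax rule are hypothetical; this passage does not specify the exact mapping rule.

```python
from typing import Dict, List


def assign_single_labels(
    short_text_probs: List[Dict[str, float]],
    parent_multi_label: List[str],
) -> List[str]:
    """Map each short text corpus to one label from its long text's multi-label.

    `short_text_probs[i]` holds the (hypothetical) per-label probabilities the
    first short text classification model outputs for short text i.
    """
    assigned = []
    for probs in short_text_probs:
        # Only labels annotated on the parent long text are eligible; a label
        # missing from `probs` is treated as probability 0. Ties go to the
        # label listed first.
        best = max(parent_multi_label, key=lambda label: probs.get(label, 0.0))
        assigned.append(best)
    return assigned
```

Restricting the choice to the parent's labels is what makes this an alignment of an existing multi-label rather than a fresh prediction.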

[0011] The text classification model training method provided in this application splits a long text corpus into multiple short text corpora and, through iterative model training, maps the multi-label corresponding to the long text onto the multiple short text corpora to obtain a single label for each short text corpus, thus achieving automatic alignment between the multi-label and the short text corpora. A short text classification model is then trained using the multiple short text corpora and their respective single labels. The trained short text classification model can be used for long text multi-label classification tasks, thereby transforming a long text multi-label classification task into a short text single-label classification task. Since short text single-label classification algorithms are very mature and have high recognition accuracy, this can improve the accuracy of long text multi-label classification and eliminate the complex data processing involved in directly performing multi-label classification on long texts, which simplifies implementation and improves classification efficiency.

[0012] In a second aspect, embodiments of this application provide a long text multi-label classification method, comprising:

[0013] The target long text is split into multiple short texts;

[0014] The multiple short texts are classified by the second short text classification model to obtain the classification information corresponding to each of the multiple short texts. The second short text classification model is trained based on the training method of the text classification model described in the first aspect.

[0015] Based on the classification information corresponding to each of the multiple short texts, predict the multiple labels corresponding to the target long text.
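Paragraph [0015] leaves open how the per-short-text classification information is combined. One simple possibility, assumed here only for illustration, is to take each short text's most probable label, keep it if its probability reaches a threshold, and return the de-duplicated union as the predicted multi-label. `predict_multi_label` and `min_prob` are hypothetical names, and the 0.5 default is arbitrary.

```python
from typing import Dict, List


def predict_multi_label(
    short_text_probs: List[Dict[str, float]],
    min_prob: float = 0.5,
) -> List[str]:
    """Combine per-short-text predictions into a long-text multi-label."""
    predicted: List[str] = []
    for probs in short_text_probs:
        if not probs:
            continue
        # Most probable single label for this short text.
        label, p = max(probs.items(), key=lambda item: item[1])
        # Keep confident predictions only, without duplicates, in order of
        # first appearance in the long text.
        if p >= min_prob and label not in predicted:
            predicted.append(label)
    return predicted
```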

[0016] The long text multi-label classification method provided in this application splits the target long text to be classified into multiple short texts, classifies the multiple short texts using a pre-trained second short text classification model, and predicts the multi-label corresponding to the target long text from the classification information of each short text. This is equivalent to transforming the long text multi-label classification task into a short text single-label classification task. Since short text single-label classification algorithms are very mature and have high recognition accuracy, this can improve the accuracy of the long text multi-label classification task and eliminate the complex data processing required to perform multi-label classification directly on long texts. It is simple to implement and helps improve classification efficiency.

[0017] In a third aspect, embodiments of this application provide a training apparatus for a text classification model, comprising:

[0018] The acquisition unit is used to acquire a long text corpus and the multi-label corresponding to the long text corpus;

[0019] The first splitting unit is used to split the long text corpus into multiple short text corpora.

[0020] The mapping unit is used to iteratively train the first short text classification model based on the multi-label and the multiple short text corpora, and to determine the mapping relationship between the multiple short text corpora and the multi-label based on the classification information output by the first short text classification model during the iterative training process, so as to obtain the single label corresponding to each short text corpus.

[0021] The training unit is used to train the second short text classification model based on the multiple short text corpora and the single labels corresponding to each of the multiple short text corpora.

[0022] In a fourth aspect, embodiments of this application provide a long text multi-label classification device, including:

[0023] The second splitting unit is used to split the target long text into multiple short texts;

[0024] The classification unit is used to classify the multiple short texts respectively using the second short text classification model to obtain the classification information corresponding to each of the multiple short texts. The second short text classification model is trained based on the training method of the text classification model described in the first aspect.

[0025] The label prediction unit is used to predict multiple labels corresponding to the target long text based on the classification information corresponding to each of the multiple short texts.

[0026] In a fifth aspect, embodiments of this application provide an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method described in the first aspect, or to implement the method described in the second aspect.

[0027] In a sixth aspect, embodiments of this application provide a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method described in the first aspect or the method described in the second aspect.

Attached Figure Description

[0028] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0029] Figure 1 is a flowchart illustrating a training method for a text classification model provided in one embodiment of this application;

[0030] Figure 2 is a schematic diagram illustrating text compression of a long text corpus provided in one embodiment of this application;

[0031] Figure 3 is a flowchart illustrating a long text compression method provided in one embodiment of this application;

[0032] Figure 4 is a flowchart illustrating an alignment method between a multi-label and multiple short text corpora provided in one embodiment of this application;

[0033] Figure 5 is a flowchart illustrating an alignment method between a multi-label and multiple short text corpora provided in another embodiment of this application;

[0034] Figure 6 is a schematic diagram of the structure of a second short text classification model provided in one embodiment of this application;

[0035] Figure 7 is a flowchart illustrating a long text multi-label classification method provided in one embodiment of this application;

[0036] Figure 8 is a schematic diagram of the structure of a training device for a text classification model provided in one embodiment of this application;

[0037] Figure 9 is a schematic diagram of the structure of a long text multi-label classification device provided in one embodiment of this application;

[0038] Figure 10 is a schematic diagram of the structure of an electronic device provided in one embodiment of this application.

Detailed Implementation

[0039] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0040] The terms "first," "second," etc., used in this specification and claims are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of this application can be implemented in sequences other than those illustrated or described herein. Furthermore, in this specification and claims, "and/or" indicates at least one of the connected objects, and the character "/" generally indicates that the preceding and following objects are in an "or" relationship.

[0041] As described in the background section, existing long text classification methods suffer from inaccurate classification results and complex implementation. Furthermore, when long text classification and multi-labeling are combined into one task, existing long text classification methods are not applicable and thus cannot accurately predict the multi-labels corresponding to long texts.

[0042] In view of this, the embodiments of this application propose a training method for a text classification model. A long text corpus is split into multiple short text corpora, and iterative model training is used to map the multi-label corresponding to the long text onto the multiple short text corpora, obtaining a single label for each short text corpus and thus achieving automatic alignment between the multi-label and the short text corpora. A short text classification model is then trained using the multiple short text corpora and their respective single labels. The trained short text classification model can be used for long text multi-label classification tasks, thereby transforming the long text multi-label classification task into a short text single-label classification task. Since short text single-label classification algorithms are very mature and have high recognition accuracy, this can improve the accuracy of the long text multi-label classification task and eliminate the complex data processing required to perform multi-label classification directly on long texts. It is simple to implement and helps improve classification efficiency.

[0043] This application also proposes a long text multi-label classification method. The target long text to be classified is split into multiple short texts, a pre-trained short text classification model classifies each of the short texts, and the classification information of the short texts is used to predict the multi-label corresponding to the target long text. This is equivalent to transforming the long text multi-label classification task into a short text single-label classification task. Since short text single-label classification algorithms are very mature and have high recognition accuracy, this can improve the accuracy of the long text multi-label classification task and eliminate the complex data processing required to perform multi-label classification directly on long texts. It is simple to implement and helps improve classification efficiency.

[0044] It should be understood that the training method for the text classification model and the long text multi-label classification method proposed in the embodiments of this application can both be executed by electronic devices or software installed in electronic devices. The electronic devices referred to herein may include terminal devices, such as smartphones, tablets, laptops, desktop computers, smart voice interaction devices, smart home appliances, smartwatches, vehicle terminals, aircraft, etc.; or, the electronic devices may also include servers, such as independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers providing cloud computing services.

[0045] The technical solutions provided by the various embodiments of this application are described in detail below with reference to the accompanying drawings.

[0046] Please refer to Figure 1, which is a flowchart illustrating a training method for a text classification model provided in one embodiment of this application. The method may include the following steps:

[0047] S102: Obtain a long text corpus related to the target business and the multi-label corresponding to the long text corpus.

[0048] The multi-label of a long text corpus consists of multiple single labels. Each single label corresponds to some of the sentences in the long text corpus and represents the category to which those sentences belong. Combined, the single labels form the multi-label, which represents the categories to which the long text corpus belongs. The multi-label of a long text corpus varies with the classification task; correspondingly, each single label represents the category, under that classification task, to which the corresponding sentences belong. Each single label can take any appropriate form, such as a single-level label or a multi-level label.

[0049] For example, assume the classification task is identifying complaints and requests. The optional categories for this task can include three main categories: complaints, requests, and others. Each main category can further include multiple subcategories. For example, complaints can include fee-related, rule- and operation-related, negotiated-repayment-related, and credit-related subcategories, while requests can include exemption-related, negotiated-repayment-related, credit-related, and operation-related subcategories. Suppose the long text corpus is: "Contact: Oh, it's like this. I called before saying I wanted to settle this bill early, but now I'm having some cash flow problems and can't settle it early. I'll see if I can settle this bill first. He's already calculated it and told me to settle it all at once. The agent said, 'You've already booked early settlement, but you can't settle it now, right?'... The agent said, 'Okay, we've noted that down for you. Besides that, please wait patiently for our staff to contact you. Is there anything else we can help you with?'" Then the multi-label corresponding to this long text corpus is: "Complaint | Fee | Settlement Fee | Interest, ..., Request | Other Types | Other". In the above long text corpus, "contact" represents the user and "agent" represents customer service. In the multi-label, "Complaint | Fee | Settlement Fee | Interest" is a multi-level single label, and "Request | Other Types | Other" is also a multi-level single label.

[0050] As another example, suppose the classification task is intent classification. The possible categories under this task include three main categories: pre-sales consultation, after-sales consultation, and others. Each main category can further include multiple subcategories. For instance, pre-sales consultation can include subcategories such as product parameter consultation and product price consultation, while after-sales consultation can include subcategories such as returns and exchanges, refund applications, and logistics progress. Suppose the long text corpus is: "contact: I placed an order for a mobile phone a few days ago, but it shows as not shipped yet. agent: You mean you've placed an order but it hasn't been shipped yet? contact: Yes, yes, yes. agent: Okay, I've checked and it's been shipped, but the logistics information hasn't been updated yet. Please wait patiently... contact: Okay, no problem." Then the multi-label corresponding to this long text corpus is "after-sales|logistics progress, ..., other|common phrases." Here, "contact" represents the user and "agent" represents customer service. In the multi-label, "after-sales|logistics progress" is a multi-level single label, and "other|common phrases" is another multi-level single label.

[0051] It is worth noting that the long text corpus and the corresponding multi-label acquired in S102 are both related to the target business. For example, if the target business is analyzing user behavior for a certain product, the long text corpus may include dialogue text between users and customer service regarding the product, and the business labels corresponding to the long text corpus are labels indicating how users use the product.

[0052] S104: Split the long text corpus into multiple short text corpora.

[0053] In one optional implementation, the long text corpus can be directly split into multiple short text corpora. For example, a fixed-length sliding window can be pre-set, and then this window can be used to split the long text corpus into multiple short text corpora. The length of the sliding window can be set according to actual needs, and this embodiment does not limit this.
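A minimal sketch of the fixed-length sliding window from [0053], assuming the window is measured in characters (natural for Chinese text) and that an optional stride may produce overlapping windows. The function name and the default window size of 128 are illustrative assumptions, not values from the application.

```python
from typing import List, Optional


def sliding_window_split(text: str, window: int = 128, stride: Optional[int] = None) -> List[str]:
    """Split `text` into chunks of at most `window` characters.

    `stride` defaults to `window` (non-overlapping chunks); a smaller stride
    yields overlapping chunks.
    """
    step = window if stride is None else stride
    if window <= 0 or step <= 0:
        raise ValueError("window and stride must be positive")
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, `sliding_window_split("abcdefghij", 4)` gives `["abcd", "efgh", "ij"]`; overlapping windows (a stride smaller than the window) reduce the chance of cutting a key phrase in half at the cost of more short texts.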

[0054] In another optional implementation, long text corpora may contain a large amount of text unrelated to the business topic. Such irrelevant text can not only degrade the performance of the finally trained second short text classification model but also interfere with its classification process. Therefore, in S104 above, the irrelevant text can be removed first and the remaining text then split. Specifically, S104 may include the following steps: compressing the long text corpus based on keywords in the long text corpus to obtain a compressed text corpus; and splitting the compressed text corpus to obtain multiple short text corpora.

[0055] For example, suppose the long text corpus is: "Agent: Hello, how can I help you? Contact: Uh, hello, is this XX product? Agent: XX product is one of our products. So, how can I help you? Contact: Oh, it's like this. I called before to say I wanted to settle this bill in advance, but now I might be short of funds and can't settle it in advance. I'll just wait and see if I can settle this one later. At that time, he had already calculated it and told me to settle it all at once. Agent: You had already made an appointment to settle it in advance, and now you can't settle it in advance, right? Agent: Okay, I've made a note for you. Besides this, please wait patiently for our staff to contact you. Is there anything else we can help you with?" After the above compression, the following compressed text corpus is obtained: "Contact: Oh, it's like this. I said before that I called to say I wanted to settle this bill early, but now I'm having some cash flow problems and can't settle it early. I'll see if I can settle this one first. He had already calculated it and told me to settle it all at once. Agent: You had already booked early settlement and now you can't settle it temporarily, is that right?"

[0056] In S104 above, the keywords in a long text corpus are the keywords within it that relate to the target business. In practical applications, any unsupervised keyword extraction method commonly used in this field, such as TextRank or TF-IDF-based keyword extraction, can be employed to extract these keywords from the corpus; this will not be elaborated further.

[0057] In S104 above, text compression of a long text corpus refers to reducing its length. As shown in Figure 2, a long text corpus consists of multiple sentences. After the keywords in the long text corpus are identified, an iterative model training approach can be applied at the sentence level. First, a binary classification corpus is constructed based on the keywords in the long text corpus to preliminarily distinguish useful sentences from useless ones. Then, a sentence classification model is iteratively trained on the binary classification corpus; during the iterative training, the binary classification corpus is optimized, and finally the key sentences that reflect the semantics of the long text corpus under the target business are identified. Deleting all sentences other than the key sentences from the long text corpus completes the text compression and yields the compressed text corpus.

[0058] Specifically, as shown in Figure 3, the above-mentioned step of compressing the long text corpus based on its keywords to obtain a compressed text corpus may include the following steps:

[0059] Step A1: Determine the importance score of keywords based on the frequency of their occurrence in long text corpora.

[0060] The importance score of a keyword represents its significance to the long text corpus; a higher importance score indicates greater importance. For example, the importance score can be represented by TF-IDF, which is typically used to evaluate how important a word is to one document within a document set or corpus. A word's importance to a document increases in proportion to the number of times it occurs in that document, but decreases as its frequency across the corpus increases. Therefore, in this application, the TF-IDF value of a keyword can be determined from its frequency of occurrence in the long text corpus, and this TF-IDF value can be used as the keyword's importance score.
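A minimal TF-IDF sketch for Step A1, assuming each long text corpus has already been tokenized into a list of words and that the "corpus" is the collection of all tokenized long text corpora. TF is the keyword's count divided by the document length, and the IDF term uses the smoothed form log((1 + N) / (1 + df)) + 1 that scikit-learn applies by default; both choices are assumptions, since the application does not fix a formula.

```python
import math
from collections import Counter
from typing import Dict, List


def keyword_importance(
    doc: List[str],
    corpus: List[List[str]],
    keywords: List[str],
) -> Dict[str, float]:
    """TF-IDF importance score of each keyword for one tokenized long text."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores: Dict[str, float] = {}
    for kw in keywords:
        # Term frequency: share of the document's tokens that are `kw`.
        tf = counts[kw] / len(doc) if doc else 0.0
        # Document frequency: how many long texts contain `kw` at all.
        df = sum(1 for d in corpus if kw in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[kw] = tf * idf
    return scores
```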

[0061] Step A2: Based on the sentences containing keywords in the long text corpus and the importance scores of the keywords, determine the usefulness labels corresponding to each of the multiple sentences.

[0062] The usefulness label of a sentence indicates whether the sentence is relevant to the target business.

[0063] Specifically, if the target sentence contains a keyword whose importance score is greater than a preset importance threshold, it can be preliminarily determined that the target sentence is relevant to the target business, and a usefulness label indicating this relevance can be assigned to the target sentence. If the target sentence contains no keywords, or if the importance scores of the keywords in the target sentence are all less than or equal to the preset importance threshold, it can be preliminarily determined that the target sentence is not significantly relevant to the target business, and a usefulness label indicating that the target sentence is not relevant to the target business can be assigned to it. The target sentence is any sentence in the long text corpus.
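The rule in [0063] can be sketched as follows, assuming sentences are already tokenized, importance scores come from Step A1, and usefulness is encoded as 1 (relevant to the target business) or 0 (not relevant). The function name and the 1/0 encoding are illustrative assumptions.

```python
from typing import Dict, List


def usefulness_labels(
    sentences: List[List[str]],
    importance: Dict[str, float],
    threshold: float,
) -> List[int]:
    """Label 1 if a sentence contains a keyword scoring above `threshold`, else 0."""
    labels = []
    for tokens in sentences:
        # Non-keywords get score 0.0, so they never exceed a non-negative threshold.
        useful = any(importance.get(tok, 0.0) > threshold for tok in tokens)
        labels.append(1 if useful else 0)
    return labels
```

These labels are only preliminary; Step A3 refines them through iterative training.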

[0064] Step A3: Iteratively train a sentence classification model based on the multiple sentences and their respective usefulness labels, and identify key sentences from among the multiple sentences based on the classification information output by the sentence classification model during the iterative training process.

[0065] Here, the sentence classification model refers to a model that classifies text at sentence granularity.

[0066] In one optional implementation, the sentence classification model includes a logistic regression model and a deep learning model. Accordingly, key sentences can be determined as follows: Step A11, the logistic regression model is iteratively trained based on multiple sentences contained in the long text corpus and their corresponding usefulness labels, and multiple candidate sentences are selected from the multiple sentences based on the classification information output by the logistic regression model during the iterative training process; Step A12, the deep learning model is iteratively trained based on the multiple candidate sentences and their respective usefulness labels, and key sentences are selected from the multiple candidate sentences based on the classification information output by the deep learning model during the iterative training process.

[0067] For logistic regression models, the classification information output for an input sentence during iterative training can include the probability that the sentence is useful. The multiple candidate sentences obtained through iterative training of the logistic regression model refer to multiple sentences in a long text corpus that have been initially determined to be useful.

[0068] To identify multiple candidate sentences from a long text corpus quickly and accurately, step A11 may include the following. First, the multiple sentences are used as the sample sentences for the first training iteration: based on the usefulness label of each sentence, sentences related to the target business are used as positive samples and sentences unrelated to the target business are used as negative samples, where positive samples indicate useful sentences and negative samples indicate useless sentences. Then, the sample sentences are classified by the logistic regression model to obtain their classification information, which includes the probability that each sample sentence is a useful sentence. If the logistic regression model does not meet the second preset training stopping condition, the logistic regression model is optimized based on the classification information and usefulness labels of the sample sentences, and the sentences meeting the second preset screening condition are selected as the sample sentences for the next training iteration. The second preset screening condition is that the probability is greater than the second preset probability threshold or less than the third preset probability threshold, where the third preset probability threshold is less than the second preset probability threshold. This process is repeated until the logistic regression model meets the second preset training stopping condition. Further, the logistic regression model obtained in the last iteration is used to classify the multiple sentences to obtain the probability that each sentence is a useful sentence. Finally, the sentences whose probabilities are greater than the fourth preset probability threshold are selected as candidate sentences, thereby obtaining multiple candidate sentences.

[0069] The second preset training stopping condition can be set according to actual needs, such as the prediction accuracy of the logistic regression model exceeding a preset accuracy threshold (e.g., 98%), etc., which is not limited in this embodiment. The second, third, and fourth preset probability thresholds can be set according to actual needs, such as the second preset probability threshold being 0.6, the third preset probability threshold being 0.4, and the fourth preset probability threshold being 0.5, etc.

[0070] Understandably, in each iteration of training, sentences with probabilities between the second and third preset probability thresholds are filtered out from the sample sentences. These sentences are likely to be corpora that fall between useful and useless, making them difficult for the model to distinguish. By filtering out these sentences, the remaining sentences are used as sample sentences for the next iteration. For example, sentences with probabilities greater than the second preset probability threshold are used as positive samples, and sentences with probabilities less than the third preset probability threshold are used as negative samples. This optimizes the sample sentences, ensuring that the sample sentences used to train the logistic regression model are of better quality. By continuously iterating the logistic regression model with optimized sample sentences, the accuracy of the logistic regression model can be improved. Using a highly accurate logistic regression model (i.e., the logistic regression model after the last iteration) to classify multiple sentences, and then selecting sentences with probabilities greater than the fourth preset probability threshold as candidate sentences, ensures that the selected candidate sentences are useful for long text corpora and relevant to the target business.
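The iteration in [0068] to [0070] can be sketched as a model-agnostic loop. This is a sketch under several assumptions: the `Trainer` interface (fit on sentences and labels, return a function giving P(useful)) is hypothetical and would wrap a logistic regression in practice; the stopping condition is taken to be accuracy on the current sample sentences at a 0.5 decision boundary; and `max_rounds` is an added safety cap. The default thresholds follow the example values in [0069].

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a trainer fits a model on (sentences, labels) and
# returns a function mapping a sentence to P(sentence is useful).
Trainer = Callable[[List[str], List[int]], Callable[[str], float]]


def iterative_screening(
    sentences: Sequence[str],
    labels: Sequence[int],
    train: Trainer,
    upper: float = 0.6,        # "second preset probability threshold"
    lower: float = 0.4,        # "third preset probability threshold"
    select: float = 0.5,       # "fourth preset probability threshold"
    target_acc: float = 0.98,  # assumed stopping condition
    max_rounds: int = 10,      # safety cap (assumption)
) -> List[str]:
    """Return candidate sentences whose final P(useful) exceeds `select`."""
    samples, ys = list(sentences), list(labels)
    if not samples:
        return []
    predict = train(samples, ys)
    for _ in range(max_rounds):
        probs = [predict(s) for s in samples]
        correct = sum((p > 0.5) == (y == 1) for p, y in zip(probs, ys))
        if correct / len(ys) >= target_acc:
            break  # stopping condition met
        # Drop ambiguous sentences with lower <= p <= upper and relabel the
        # rest from the model's confident predictions.
        kept = [(s, 1 if p > upper else 0)
                for s, p in zip(samples, probs) if p > upper or p < lower]
        if not kept:
            break
        samples = [s for s, _ in kept]
        ys = [y for _, y in kept]
        predict = train(samples, ys)  # next training round
    # Score every original sentence with the last-round model.
    return [s for s in sentences if predict(s) > select]
```

Note that the final selection scores all original sentences, including those dropped as ambiguous during training, which matches the description of classifying "multiple sentences" with the last-round model.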

[0071] For deep learning models, the classification information output during iterative training of an input sentence can include the probability that the sentence is useful. Key sentences obtained through iterative training of a deep learning model refer to sentences in long text corpora that are further determined to be useful. In practical applications, deep learning models can include various higher-performing models suitable for text classification, such as LSTM, Convolutional Neural Networks (CNN), and BERT.

[0072] The specific implementation method of step A12 is similar to that of step A11 above, and will not be repeated here.

[0073] Understandably, by employing both logistic regression and deep learning models, the first step is to use iterative training of the computationally faster logistic regression model to initially filter useful sentences from multiple sentences, resulting in several candidate sentences. Then, the higher-performing deep learning model is used for a second filtering process to select key sentences from these candidate sentences. This ensures that the selected key sentences are useful for classifying and recognizing long text corpora and are relevant to the target business. Thus, while compressing long text corpora, the quality of the compressed text corpora is ensured, thereby improving the training effect of subsequent text classification models.

[0074] In another alternative implementation, the sentence classification model can include only one of a logistic regression model or a deep learning model. Accordingly, iterative training of that single sentence classification model alone is used to filter useful sentences from the multiple sentences and obtain the key sentences. Understandably, this implementation is faster than the previous one, but the quality of the key sentences obtained may be inferior to that of the key sentences selected by the previous method.

[0075] Step A4: Delete sentences from the long text corpus except for key sentences to obtain compressed text corpus.

[0076] For example, suppose a long text corpus includes sentences 1 to 10, of which sentences 1 to 8 are key sentences. Then the text corpus obtained by deleting sentences 9 and 10 from the long text corpus is the compressed text corpus.

[0077] S106, the first short text classification model is iteratively trained based on the multi-label and the multiple short text corpora, and the mapping relationship between the multiple short text corpora and the multi-label is determined based on the classification information output by the first short text classification model during the iterative training process, so as to obtain the single label corresponding to each short text corpus.

[0078] Long text multi-label classification is a very difficult task in Natural Language Processing (NLP); existing text classification methods perform poorly on it and demand extremely high computational resources. Through extensive research, the inventors found that short text single-label classification algorithms are very mature and effective, so the long text multi-label task can be completed by transforming it into a short text single-label task. The inventors also found that this transformation requires the model training process to know the single label corresponding to each short text corpus. However, the labels obtained through S102 are multi-labels, and it cannot be determined which short text corpus each single label corresponds to. Therefore, the global mapping from the multi-label to the multiple short text corpora must be solved.

[0079] Specifically, the model can be trained iteratively. Assuming that each short text corpus has only one corresponding single label, the information output by the model during the iterative training process can be used to automatically align the global multi-label with the multiple short text corpora, thus obtaining the single label corresponding to each of the multiple short text corpora.

[0080] In this application, the first short text classification model refers to a model that can classify short texts. It can be set according to actual needs, and this application embodiment does not limit it.

[0081] In one alternative implementation, as shown in Figure 4, S106 above may include the following steps:

[0082] S161, train a model based on the long text corpus and the multi-label to obtain a long text classification model.

[0083] Among them, the long text classification model refers to a model that can be used to classify long texts. Specifically, compressed text corpora can be used as training samples, and multi-labels can be used as supervision signals. The long text classification model to be trained is iteratively trained based on the training samples and their corresponding multi-labels, so that the long text classification model trained in the end can be used to perform preliminary classification of input text and determine the candidate labels corresponding to the input text.

[0084] In practical applications, the long text classification model to be trained can be any model commonly used in the field that has long text classification function. The specific settings can be made according to actual needs, and this application embodiment does not limit this.

[0085] In S106 above, a long text classification model can be obtained by directly training the model based on the long text corpus and multi-labels. Alternatively, the long text corpus can be compressed, and the model can be trained based on the resulting compressed text corpus and multi-labels to obtain the long text classification model.

[0086] S162, classify the multiple short text corpora using the long text classification model to obtain a candidate label for each short text corpus.

[0087] Since the long text classification model can perform preliminary classification of text, in S162 above, the multiple short text corpora are input into the long text classification model to obtain the classification information corresponding to each short text corpus. The classification information of each short text corpus can include the probability that it corresponds to each single label in the multi-label. Then, for each short text corpus, the single label with the highest probability can be selected from the multi-label as its candidate label, provided that this probability exceeds a preset value.
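The candidate-label choice in [0087] can be sketched as below, assuming the long text classification model's output for one short text corpus is available as a dict from label to probability. `min_prob` stands in for the unspecified preset value; its name and default are hypothetical.

```python
from typing import Dict, List, Optional


def candidate_label(probs: Dict[str, float], doc_labels: List[str],
                    min_prob: float = 0.3) -> Optional[str]:
    """Pick the candidate label of one short text corpus (hypothetical helper).

    probs: label -> probability output by the long text classification model.
    doc_labels: the multi-label of the long text the short text came from.
    """
    if not doc_labels:
        return None
    # Highest-probability single label within the document's own multi-label.
    best = max(doc_labels, key=lambda lbl: probs.get(lbl, 0.0))
    return best if probs.get(best, 0.0) > min_prob else None
```

Restricting the argmax to `doc_labels` reflects the text's requirement that the candidate label come from the long text's own multi-label.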

[0088] For example, suppose a short text corpus reads: "Contact: Oh, like this, I called before and said I wanted to settle this debt early, but now I'm having a cash flow problem and can't settle it early. I'll see how to settle this installment first. He had already calculated it and told me to settle it all at once. The agent said, 'You had already booked this early settlement and now you can't repay it temporarily, right?' Contact: Oh, it's not that I can't repay, I just... uh... I'll repay part of it first. Agent: Let me show you your contract first. It's also to protect customer information security and we need to verify your identity." Through the above steps, the candidate label corresponding to this short text can be obtained as "Request | Negotiated Repayment | Early Settlement".

[0089] S163, based on multiple short texts and their respective candidate labels, the first short text classification model is iteratively trained, and based on the classification information output by the first short text classification model during the iterative training process, the single label corresponding to each of the multiple short text corpora is determined from the multiple labels.

[0090] The classification information output by the first short text classification model for a given short text corpus may include the probability of the short text corpus corresponding to each single label in the multi-label system.

[0091] To ensure that the single label mapped to each short text corpus accurately represents its meaning, as shown in Figure 5, S163 above may include the following steps:

[0092] First, multiple short text corpora are used as sample corpora for the first round of iteration training.

[0093] Then, the sample corpus is classified using the first short text classification model to obtain the classification information corresponding to the sample corpus. The classification information includes the probability that the sample corpus matches each single label in the multi-label.

[0094] Next, if the first short text classification model does not meet the first preset training stopping condition, the first short text classification model is optimized based on the classification information and candidate labels corresponding to the sample corpora, and the sample corpora that meet the first preset screening condition are selected as the sample corpora for the next round of iteration training. The first preset screening condition includes: among all single labels, the selected sample corpus has the highest probability of matching its candidate label, and that probability is greater than the first preset probability threshold.

[0095] Furthermore, the steps from classifying the sample corpora with the first short text classification model through selecting the sample corpora that meet the first preset screening condition are repeated until the first short text classification model meets the first preset training stopping condition.

[0096] Finally, the first short text classification model trained in the last round of iterations classifies each short text corpus to obtain its classification information, and for each short text corpus, the single label in the multi-label whose matching probability is greater than the first preset probability threshold is selected as the single label corresponding to that short text corpus.

[0097] In practical applications, the first preset training stopping condition can be set according to actual needs, such as the prediction accuracy of the first short text classification model exceeding a preset accuracy threshold (e.g., 98%), etc., which is not limited in this embodiment. The first preset probability threshold can be set according to actual needs, such as the first preset probability threshold being 0.5, etc.
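The iterative alignment of [0092] to [0097] is sketched below. To keep the sketch self-contained, a small multinomial naive Bayes classifier over whitespace tokens stands in for the first short text classification model; this is a swapped-in technique, not the model the application uses, and each round simply refits it on the retained samples. The threshold 0.5 and target accuracy 98% are the example values from [0097]; returning `None` when no label clears the threshold, and all function names, are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def fit_nb(texts: Sequence[str], labels: Sequence[str], label_set: Sequence[str]):
    # Multinomial naive Bayes: label counts plus per-label token counts.
    word_counts = {lbl: Counter() for lbl in label_set}
    for text, lbl in zip(texts, labels):
        word_counts[lbl].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return Counter(labels), word_counts, max(len(vocab), 1), len(labels)


def predict_nb(model, text: str, label_set: Sequence[str]) -> Dict[str, float]:
    # Probability that `text` matches each single label (Laplace smoothing).
    prior, word_counts, vocab_size, n = model
    log_scores = {}
    for lbl in label_set:
        total = sum(word_counts[lbl].values())
        s = math.log((prior[lbl] + 1) / (n + len(label_set)))
        for w in text.split():
            s += math.log((word_counts[lbl][w] + 1) / (total + vocab_size))
        log_scores[lbl] = s
    top = max(log_scores.values())
    exps = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
    z = sum(exps.values())
    return {lbl: e / z for lbl, e in exps.items()}


def align_single_labels(texts, candidates, label_set, threshold=0.5,
                        target_acc=0.98, max_rounds=10) -> List[Optional[str]]:
    idx = list(range(len(texts)))  # first round: every short text corpus
    for _ in range(max_rounds):
        model = fit_nb([texts[i] for i in idx], [candidates[i] for i in idx], label_set)
        probs = {i: predict_nb(model, texts[i], label_set) for i in idx}
        # A sample "matches" if its candidate label has the highest probability.
        matched = [i for i in idx if max(probs[i], key=probs[i].get) == candidates[i]]
        if len(matched) / len(idx) >= target_acc:  # first preset stopping condition
            break
        # First preset screening condition: matched and above the threshold.
        kept = [i for i in matched if probs[i][candidates[i]] > threshold]
        if not kept:
            break
        idx = kept
    # The last-round model assigns each short text corpus its single label.
    result = []
    for text in texts:
        p = predict_nb(model, text, label_set)
        best = max(p, key=p.get)
        result.append(best if p[best] > threshold else None)
    return result
```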

[0098] Understandably, given the shortcomings of the long text classification model obtained from the above training, the long text classification model is first used to perform a preliminary mapping from the multi-label to the multiple short text corpora, obtaining a candidate label for each short text corpus. Then, iterative training of the first short text classification model is used to refine this preliminary mapping, thereby accurately mapping the multi-label to the multiple short text corpora and obtaining a single label for each short text corpus. This makes it easier to transform the long text multi-label classification task into the more mature and more accurate short text single-label classification task.

[0099] On this basis, in each iteration of training the first short text classification model, samples whose probability of matching their corresponding candidate labels is low are filtered out of the training data, since these samples are difficult for the first short text classification model to distinguish. The remaining samples are used for the next iteration, which optimizes the training data and yields higher-quality training data for the first short text classification model. Continuously iterating the training with the optimized data improves the accuracy of the first short text classification model. The highly accurate first short text classification model (i.e., the model after the last iteration) is then used to classify each short text corpus, and the single label with the highest probability of matching each short text corpus is selected from the multi-label. This ensures that the single label accurately represents the meaning of each short text corpus, achieving accurate alignment between the multi-label and the multiple short text corpora and thus improving the training effect of subsequent models.

[0100] S108, based on multiple short text corpora and their respective single labels, the second short text classification model is trained.

[0101] Specifically, a second short text classification model can be used to classify multiple short text corpora to obtain classification information corresponding to each short text corpus. Here, the classification information corresponding to each short text corpus may include the predicted category to which the short text corpus belongs. Then, based on the classification information and single label corresponding to each short text corpus, the prediction loss corresponding to the second short text classification model is determined. This prediction loss reflects the performance of the second short text classification model on multiple short text corpora. Further, based on the prediction loss corresponding to the second short text classification model, the model parameters of the second short text classification model are adjusted. The above process is repeated multiple times until a third preset training stopping condition is met. The third preset training stopping condition can be set according to actual needs, such as the prediction loss corresponding to the second short text classification model being less than a preset loss threshold, or the prediction accuracy of the second short text classification model being greater than a preset accuracy threshold, or the number of iterations of training the second short text classification model reaching a preset number, etc. This application embodiment does not limit this. In addition, the model parameters of the second short text classification model may include, but are not limited to, the number of nodes (such as neurons) in each network layer of the second short text classification model, the connection relationships and edge weights between nodes in different network layers, and the biases corresponding to nodes in each network layer.
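A minimal sketch of the [0101] training loop, assuming short text corpora are already encoded as numeric feature vectors and the second model is a plain softmax (multinomial logistic) regression rather than a neural network. It uses cross-entropy loss with batch gradient descent (not the Adam optimizer mentioned in [0113]) and stops on whichever of the three example conditions is met first: loss below a threshold, accuracy above a threshold, or a maximum number of iterations. The function name and threshold values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_second_model(X, y, n_classes, lr=0.5, loss_threshold=0.05,
                       acc_threshold=0.99, max_iters=500):
    """X: feature vectors of short text corpora; y: their single labels (0..n_classes-1)."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]  # edge weights
    b = [0.0] * n_classes                      # biases
    n = len(X)
    for it in range(1, max_iters + 1):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        loss, correct = 0.0, 0
        for x, t in zip(X, y):
            logits = [sum(wf * xf for wf, xf in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            loss -= math.log(p[t])              # cross-entropy against the single label
            correct += p.index(max(p)) == t     # predicted category vs single label
            for c in range(n_classes):
                g = p[c] - (1.0 if c == t else 0.0)
                for f in range(d):
                    gW[c][f] += g * x[f]
                gb[c] += g
        loss /= n
        acc = correct / n
        # Third preset training stopping condition: loss, accuracy, or iteration cap.
        if loss < loss_threshold or acc > acc_threshold:
            break
        for c in range(n_classes):
            for f in range(d):
                W[c][f] -= lr * gW[c][f] / n
            b[c] -= lr * gb[c] / n
    return W, b, it, loss, acc
```

The stop check runs before each parameter update, so the returned weights are exactly the ones whose loss and accuracy triggered the stop.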

[0102] In this embodiment, the second short text classification model can be any appropriate model that can be used for short text classification. The specific model can be set according to actual needs, and this embodiment does not limit it.

[0103] Optionally, if the context of a short text corpus is not considered when computing its semantic representation, the representation may drift, which ultimately affects the training effect of the second short text classification model. To address this, the training method of the text classification model provided in this embodiment can adopt a second short text classification model that includes a language representation network and a classification network, take into account the local information in the compressed text corpus, and introduce into the language representation network other short text corpora in the compressed text corpus that are related to the target short text corpus.

[0104] Specifically, the second short text classification model includes a language representation network and a classification network. Accordingly, S108 may include: selecting, from the long text corpus, the associated short text corpora of a target short text corpus, where the target short text corpus is any one of the multiple short text corpora; encoding the target short text corpus based on the target and associated short text corpora using the language representation network to obtain a semantic representation vector of the target short text corpus; classifying the target short text corpus based on its semantic representation vector using the classification network to obtain its predicted category; and finally, adjusting the model parameters of the second short text classification model based on the predicted categories of the multiple short text corpora and their corresponding single labels.

[0105] In practical applications, the associated short text corpora of the target short text corpus may include at least one short text corpus adjacent to the target short text corpus in the compressed text corpus. For example, let the target short text corpus be the k-th short text corpus $W_k$ in the compressed text corpus. The associated short text corpora can then include the (k-2)-th short text corpus $W_{k-2}$, the (k-1)-th short text corpus $W_{k-1}$, the (k+1)-th short text corpus $W_{k+1}$, and the (k+2)-th short text corpus $W_{k+2}$ in the compressed text corpus, and so on.
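Selecting the associated corpora in [0105] amounts to taking a window around the target index. A sketch, assuming zero-based indices and a hypothetical `radius` of 2 to match the $W_{k-2}$ to $W_{k+2}$ example; the window is clipped at the ends of the compressed text corpus, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def associated_corpora(corpora: Sequence[str], k: int, radius: int = 2) -> List[Tuple[int, str]]:
    """Return (index, text) of the short text corpora within `radius` of position k."""
    lo = max(0, k - radius)
    hi = min(len(corpora), k + radius + 1)
    return [(i, corpora[i]) for i in range(lo, hi) if i != k]
```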

[0106] More specifically, an attention mechanism can be introduced during the encoding of the target short text corpus using a language representation network. Specifically, the semantic representation vector of the target short text corpus can be encoded as follows: First, the target short text corpus and related short text corpora are embedded using a language representation network to obtain text vectors for the target and related short text corpora. Then, based on the attention mechanism and the distance between the target and related short text corpora, attention weights for the target and related short text corpora are determined. These attention weights represent the importance of the corresponding short text corpus to the classification process of the second short text classification model. Finally, based on the attention weights of the target and related short text corpora, the text vectors of the target and related short text corpora are weighted to obtain the semantic representation vector of the target short text corpus.

[0107] For example, the attention weight corresponding to the target short text corpus can be set to 1, and the initial weight corresponding to the associated short text corpus can be calculated using the following formula (1); further, based on the distance between the target short text corpus and the associated short text corpus, the attention score corresponding to the associated short text corpus can be calculated, wherein the calculation method of the attention score is a commonly used calculation method in this field, as shown in formula (2), and will not be elaborated further; finally, the product between the initial weight corresponding to the associated short text corpus and the attention score is determined as the attention weight corresponding to the associated short text.

[0108]

[0109] Here, $w_i$ represents the initial weight corresponding to the associated short text corpus $W_i$, $i$ represents the index of the associated short text corpus in the compressed text corpus, and $k$ represents the index of the target short text corpus in the compressed text corpus.

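[0110] $\alpha(q, k_i) = \mathrm{softmax}\big(a(q, k_i)\big) = \dfrac{\exp\big(a(q, k_i)\big)}{\sum_{j=1}^{m} \exp\big(a(q, k_j)\big)} \qquad (2)$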

[0111] Here, $\alpha(q, k_i)$ represents the attention score corresponding to the associated short text corpus $W_i$; $q$ represents the query vector sequence obtained after the language representation network processes the target short text corpus; $k_i$ represents the key vector sequence obtained after the language representation network processes the short text corpus $W_i$; softmax represents a normalization function; $a(q, k_i)$ represents the initial attention score corresponding to $W_i$; exp represents the exponential function; $m$ represents the number of short text corpora input into the language representation network; $j$ represents the index of a short text corpus input into the language representation network; $k_j$ represents the key vector sequence obtained after the language representation network processes the j-th input short text corpus; and $a(q, k_j)$ represents the initial attention score corresponding to the j-th short text corpus. In formula (2), $k_i$ and $k_j$ denote key vector sequences and are distinct from the target index $k$.
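The weighting described in [0107] to [0111] can be sketched as follows. Assumptions: each short text corpus is reduced to a single query, key, and text vector rather than a sequence; the initial score $a(q, k_j)$ is taken to be a scaled dot product (a common choice, not stated in the text); and because formula (1) is not reproduced in this text, `placeholder_initial_weight` is a hypothetical stand-in that merely decays with $|i - k|$. The target's weight is fixed at 1 and each associated corpus gets $w_i \cdot \alpha(q, k_i)$, as [0107] describes; the softmax runs over all $m$ inputs, as in formula (2).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def placeholder_initial_weight(i: int, k: int) -> float:
    # HYPOTHETICAL stand-in for formula (1), which is not reproduced here.
    return 1.0 / (1 + abs(i - k))


def target_representation(k: int, text_vecs: Dict[int, Vector], query: Vector,
                          keys: Dict[int, Vector],
                          initial_weight: Callable[[int, int], float] = placeholder_initial_weight
                          ) -> Tuple[List[float], Dict[int, float]]:
    """Weighted sum of text vectors for target k and its associated corpora."""
    idx = sorted(text_vecs)
    scale = math.sqrt(len(query))
    # Initial scores a(q, k_j): scaled dot product (an assumption).
    a = {j: sum(qf * kf for qf, kf in zip(query, keys[j])) / scale for j in idx}
    top = max(a.values())
    exps = {j: math.exp(s - top) for j, s in a.items()}
    z = sum(exps.values())
    alpha = {j: e / z for j, e in exps.items()}  # formula (2)
    weights = {j: 1.0 if j == k else initial_weight(j, k) * alpha[j] for j in idx}
    dim = len(text_vecs[k])
    h = [sum(weights[j] * text_vecs[j][f] for j in idx) for f in range(dim)]
    return h, weights
```

Passing a different `initial_weight` callable lets the real formula (1) be dropped in without touching the rest of the sketch.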

[0112] For example, as shown in Figure 6, after the target short text corpus $W_k$ and the associated short text corpora $W_{k-2}$, $W_{k-1}$, $W_{k+1}$, and $W_{k+2}$ are input into the language representation network, the network determines the attention weights of the target and associated short text corpora based on the attention mechanism. Then, the text vectors of the target and associated short text corpora are weighted by these attention weights to obtain the semantic representation vector $H_{W_k}$ of the target short text corpus. Furthermore, after pooling, the semantic representation vector $H_{W_k}$ is fed into the classification network to obtain the predicted category of the target short text corpus.

[0113] In practical applications, language representation networks can employ various language models with encoding capabilities commonly used in this field, such as the BERT model. Furthermore, adversarial training can be introduced during training to enhance the generalization performance of the second short text classification model, thus mitigating the impact of short text corpus quality on the model. Since the second short text classification model is used to perform single-label classification tasks, cross-entropy loss functions and the Adam optimizer can also be used during training.

[0114] The text classification model training method provided in one or more embodiments of this application splits long text corpora into multiple short text corpora, and uses an iterative model training method to map the multi-labels corresponding to the long text to the multiple short text corpora, obtaining a single label corresponding to each short text corpus, thereby achieving automatic alignment between the multi-labels and the short text corpora; furthermore, a short text classification model is trained using the multiple short text corpora and the single labels corresponding to the multiple short texts respectively. The trained short text classification model can be used for long text multi-label classification tasks, thereby transforming the long text multi-label classification task into a short text single-label classification task. Since the short text single-label classification algorithm is very mature and has a high recognition accuracy, it can improve the accuracy of the long text multi-label classification task, and saves the complex data processing in the process of directly performing multi-label classification on long texts, which is simple to implement and conducive to improving classification efficiency.

[0115] Based on the second short text classification model trained as described above, this application also provides a long text multi-label classification method. Referring to Figure 7, which is a flowchart of a long text multi-label classification method according to an embodiment of this application, the method may include the following steps:

[0116] S702, split the target long text into multiple short texts.

[0117] The specific implementation of S702 is similar to that of S102. For details, please refer to the detailed description of S102 above, which will not be repeated here.

[0118] S704: The second short text classification model is used to classify multiple short texts respectively, and the classification information corresponding to each short text is obtained.

[0119] The second short text classification model is trained using the training method of the text classification model described in one or more embodiments of this application.

[0120] In step S704 above, the multiple short texts are input into the second short text classification model to obtain the classification information corresponding to each short text, which can include the probability that the short text corresponds to each of multiple optional single labels.

[0121] S706, predict the multi-label corresponding to the target long text based on the classification information of the multiple short texts.

[0122] Specifically, for each short text, a predicted label can be determined based on the classification information corresponding to that short text. Furthermore, according to the order of each short text in the target long text, the predicted labels corresponding to each short text are combined and deduplicated to obtain the multi-label corresponding to the target long text.
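A sketch of [0122], assuming each short text's classification information is a dict of label probabilities and that the predicted label is simply the highest-probability one. Labels are merged in text order and duplicates removed; the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def long_text_multi_label(per_text_probs: Sequence[Dict[str, float]]) -> List[str]:
    """per_text_probs: classification information of the short texts, in text order."""
    predicted = [max(p, key=p.get) for p in per_text_probs]  # one label per short text
    return list(dict.fromkeys(predicted))                    # order-preserving de-duplication
```

`dict.fromkeys` keeps the first occurrence of each label, so the result follows the order in which labels first appear in the target long text.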

[0123] Optionally, the training phase assumes that each short text corpus corresponds to only one single label. In practical applications, however, a short text may correspond to two single labels. Therefore, to ensure that the labels assigned to each short text comprehensively and accurately represent its meaning, the co-occurrence probabilities among the multiple optional single labels can be taken into account when determining the label set corresponding to each short text, where the label set includes at least one single label. Specifically, S706 above may include the following steps: selecting, from the multiple optional single labels, the label set corresponding to each short text based on the classification information of that short text and the co-occurrence probabilities among the optional single labels; and then combining and de-duplicating the label sets of the multiple short texts to obtain the multi-label corresponding to the target long text.

[0124] More specifically, the label set corresponding to each short text can be obtained as follows. Based on the classification information of the target short text, the top N optional single labels with the highest probabilities are selected from the multiple optional single labels, where the target short text is any one of the multiple short texts and N is an integer greater than 2. If the probability difference among the top N optional single labels is less than a preset difference and the co-occurrence probability of the top N optional single labels is greater than a preset probability threshold, the top N optional single labels are determined to correspond to the target short text and are added to its label set as its single labels. If the probability difference among the top N optional single labels is greater than or equal to the preset difference, or their co-occurrence probability is less than or equal to the preset probability threshold, the optional single label with the highest probability for the target short text is selected from the multiple optional single labels and added to its label set as its single label.
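The selection rule of [0124] can be sketched as below. Two interpretations are assumptions: the "probability difference among the top N" is read as the gap between the largest and smallest of the N probabilities, and co-occurrence probabilities are supplied as a dict keyed by a frozenset of labels. The function name, parameter names, and default thresholds are hypothetical.

```python
from typing import Dict, FrozenSet, List


def label_set_for_text(probs: Dict[str, float], cooccur: Dict[FrozenSet[str], float],
                       n: int, max_gap: float = 0.1, min_cooccur: float = 0.3) -> List[str]:
    """Return the label set of one short text under the top-N rule."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    top = ranked[:n]
    gap = probs[top[0]] - probs[top[-1]]      # spread within the top N
    joint = cooccur.get(frozenset(top), 0.0)  # co-occurrence of the top N
    if len(top) == n and gap < max_gap and joint > min_cooccur:
        return top
    return [top[0]]                           # fall back to the single best label
```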

[0125] It is worth noting that the co-occurrence probability of two optional labels represents the likelihood that the two labels appear simultaneously. It can be calculated using various probabilistic and statistical algorithms commonly used in this field, which will not be elaborated further.
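As one simple example of such a statistic (a swapped-in choice, since the application leaves the algorithm open), the co-occurrence probability of a label combination can be estimated as the fraction of training long texts whose multi-label contains every label in the combination. The result is keyed by a frozenset of labels; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence


def cooccurrence_probabilities(label_sets: Sequence[Iterable[str]], size: int = 2
                               ) -> Dict[FrozenSet[str], float]:
    """Fraction of long texts whose multi-label contains each `size`-label combination."""
    counts: Counter = Counter()
    for labels in label_sets:
        for combo in combinations(sorted(set(labels)), size):
            counts[frozenset(combo)] += 1
    total = len(label_sets)
    return {combo: c / total for combo, c in counts.items()}
```

Combinations never seen together are simply absent from the result, so callers should treat a missing key as probability 0.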

[0126] The long text multi-label classification method provided in one or more embodiments of this application splits the target long text to be classified into multiple short texts, classifies the multiple short texts separately using a pre-trained second short text classification model, and predicts the multi-label corresponding to the target long text using the classification information corresponding to each of the multiple short texts. This is equivalent to transforming the long text multi-label classification task into a short text single-label classification task. Since the short text single-label classification algorithm is very mature and has a high recognition accuracy, it can improve the accuracy of the long text multi-label classification task, and saves the complex data processing in the process of directly performing multi-label classification on long texts. It is simple to implement and conducive to improving classification efficiency.

[0127] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0128] In addition, corresponding to the training method of the text classification model shown in Figure 1, an embodiment of this application also provides a training apparatus for a text classification model. Referring to Figure 8, which is a schematic structural diagram of a text classification model training apparatus 800 according to an embodiment of this application, the apparatus 800 may include:

[0129] The acquisition unit 810 is used to acquire the long text corpus and the multi-label corresponding to the long text corpus;

[0130] The first splitting unit 820 is used to split the long text corpus into multiple short text corpora.

[0131] The mapping unit 830 is used to iteratively train the first short text classification model based on the multi-label and the multiple short text corpora, and to determine the mapping relationship between the multiple short text corpora and the multi-label based on the classification information output by the first short text classification model during the iterative training process, so as to obtain the single label corresponding to each short text corpus.

[0132] Training unit 840 is used to train the second short text classification model based on the multiple short text corpora and the single labels corresponding to each of the multiple short text corpora.

[0133] Optionally, the mapping unit is specifically used for: training a model based on the long text corpus and the multi-label to obtain a long text classification model; classifying the multiple short text corpora using the long text classification model to obtain candidate labels corresponding to each of the multiple short text corpora; iteratively training the first short text classification model based on the multiple short texts and the candidate labels corresponding to each of the multiple short text corpora, and determining the single label corresponding to each of the multiple short text corpora from the multi-label based on the classification information output by the first short text classification model during the iterative training process.

[0134] Optionally, the mapping unit iteratively trains the first short text classification model based on the plurality of short texts and the candidate labels corresponding to each of the plurality of short text corpora, and determines the single label corresponding to each of the plurality of short text corpora from the multiple labels based on the classification information output by the first short text classification model during the iterative training process, including:

[0135] The aforementioned short text corpora are used as sample corpora for the first round of iteration training;

[0136] The sample corpus is classified by the first short text classification model to obtain the classification information corresponding to the sample corpus. The classification information includes the probability that the sample corpus matches each single label in the multi-label.

[0137] If the first short text classification model does not meet the first preset training stopping condition, the first short text classification model is optimized based on the classification information and candidate labels corresponding to the sample corpora, and the sample corpora that meet the first preset screening condition are selected as the sample corpora for the next round of iteration training. The first preset screening condition includes: among all single labels, the selected sample corpus has the highest probability of matching its candidate label, and that probability is greater than the first preset probability threshold.

[0138] The above steps, from classifying the sample corpora with the first short text classification model through selecting the sample corpora that meet the first preset screening condition as the sample corpora for the next round of iterative training, are repeated until the first short text classification model meets the first preset training stopping condition;

[0139] After the last round of iterative training, the first short text classification model classifies each short text corpus to obtain the classification information corresponding to each short text corpus;

[0140] For each short text corpus, a single label whose probability of matching that short text corpus is greater than the first preset probability threshold is selected from the multi-label and used as the single label corresponding to that short text corpus.
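The iterative label-mapping loop of [0135] to [0140] can be sketched in Python as below. This is a minimal, hypothetical sketch, not the applicant's implementation: the `fit` and `predict_proba` callables stand in for whatever first short text classification model is used, the first preset training stopping condition is assumed to be a round budget or a screened sample set that no longer changes, and each round is assumed to optimize the model first and then screen with its updated outputs. When more than one label clears the threshold, the sketch keeps the most probable one.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical model interface (not specified by the source):
#   fit(texts, labels)   -> optimizes the first short text classification model
#   predict_proba(texts) -> one {single_label: probability} dict per text
FitFn = Callable[[List[str], List[str]], None]
ProbaFn = Callable[[List[str]], List[Dict[str, float]]]


def passes_first_screening(probs: Dict[str, float], candidate: str, threshold: float) -> bool:
    """Candidate label is the most probable single label and exceeds the threshold."""
    best = max(probs, key=probs.get)
    return best == candidate and probs[candidate] > threshold


def iterative_label_mapping(
    corpora: Sequence[str],
    candidates: Sequence[str],
    fit: FitFn,
    predict_proba: ProbaFn,
    threshold: float,
    max_rounds: int = 5,
) -> List[Optional[str]]:
    """Return one single label per short text, or None if no label clears the threshold."""
    samples = list(range(len(corpora)))  # round 1: every short text corpus is a sample
    for _ in range(max_rounds):
        texts = [corpora[i] for i in samples]
        fit(texts, [candidates[i] for i in samples])  # optimize on this round's samples
        probs = predict_proba(texts)  # classification information
        kept = [
            i for i, p in zip(samples, probs)
            if passes_first_screening(p, candidates[i], threshold)
        ]
        # Assumed stopping condition: nothing survives or the sample set is stable
        if not kept or kept == samples:
            break
        samples = kept

    # The last-round model classifies every short text corpus
    labels: List[Optional[str]] = []
    for p in predict_proba(list(corpora)):
        best = max(p, key=p.get)
        labels.append(best if p[best] > threshold else None)
    return labels
```

In practice `fit` would run some gradient steps on a text classifier whose output is a softmax over the long text's multi-label; the screening keeps only short texts on which the model already agrees confidently with the candidate label, which is how noisy candidate labels are filtered out round by round.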

[0141] Optionally, the first splitting unit is specifically used to: compress the long text corpus based on keywords in the long text corpus to obtain compressed text corpus; and split the compressed text corpus to obtain multiple short text corpora.
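The description does not state how the compressed text corpus is split into short texts. One assumed approach, sketched below, splits after Chinese or English sentence-ending punctuation and packs consecutive sentences into short texts of at most a hypothetical `max_chars` characters. Splitting on a zero-width lookbehind requires Python 3.7 or later.

```python
import re
from typing import List

# Split after Chinese or English sentence-ending punctuation (an assumed boundary set).
_SENTENCE_END = re.compile(r"(?<=[。！？；!?;.])")


def split_sentences(text: str) -> List[str]:
    """Return non-blank sentence pieces, keeping their original spacing."""
    return [piece for piece in _SENTENCE_END.split(text) if piece.strip()]


def split_into_short_texts(text: str, max_chars: int = 128) -> List[str]:
    """Pack consecutive sentences into short texts of at most max_chars characters.

    A single sentence longer than max_chars becomes its own short text."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

This naive boundary rule also splits on decimal points such as "3.5", so a production splitter would need a more careful sentence segmenter.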

[0142] Optionally, the long text corpus consists of multiple sentences;

[0143] The step in which the first splitting unit compresses the long text corpus based on keywords in the long text corpus to obtain the compressed text corpus includes: determining an importance score for each keyword based on its frequency of occurrence in the long text corpus, wherein the keywords are related to the target business; determining a usefulness tag for each of the multiple sentences based on the sentences in the long text corpus that contain the keywords and the importance scores of the keywords, wherein the usefulness tag indicates whether the corresponding sentence is related to the target business; iteratively training a sentence classification model based on the multiple sentences and their respective usefulness tags, and determining the key sentences among the multiple sentences based on the classification information output by the sentence classification model during the iterative training process; and deleting sentences other than the key sentences from the long text corpus to obtain the compressed text corpus.

[0144] Optionally, the first splitting unit determines the usefulness tags corresponding to each of the plurality of sentences based on the sentences containing the keyword in the long text corpus and the importance scores of the keyword, including: if a keyword with an importance score greater than a preset importance threshold appears in the target sentence, then a usefulness tag is set for the target sentence to indicate that the target sentence is related to the target business, wherein the target sentence is any sentence in the long text corpus; or, if the target sentence does not contain the keyword or the importance score of the keyword appearing in the target sentence is less than or equal to the preset importance threshold, then a usefulness tag is set for the target sentence to indicate that the target sentence is not related to the target business.
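A minimal sketch of [0143] and [0144] follows. It assumes the importance score is each keyword's share of all keyword occurrences (the source only says the score is based on frequency) and that keywords are matched by plain substring search. The function names and the `importance_threshold` parameter are illustrative, and the key-sentence flags passed to `compress` are assumed to come from the sentence classification model.

```python
from typing import Dict, Iterable, List, Sequence


def keyword_importance(sentences: Sequence[str], keywords: Iterable[str]) -> Dict[str, float]:
    """Assumed score: a keyword's occurrences divided by all keyword occurrences."""
    counts = {kw: sum(s.count(kw) for s in sentences) for kw in keywords}
    total = sum(counts.values()) or 1  # avoid division by zero when no keyword occurs
    return {kw: c / total for kw, c in counts.items()}


def usefulness_tags(
    sentences: Sequence[str], scores: Dict[str, float], importance_threshold: float
) -> List[int]:
    """1 = related to the target business, 0 = not related (the rule of [0144])."""
    return [
        int(any(kw in s and score > importance_threshold for kw, score in scores.items()))
        for s in sentences
    ]


def compress(sentences: Sequence[str], is_key: Sequence[bool]) -> str:
    """Delete every sentence that was not determined to be a key sentence."""
    return "".join(s for s, keep in zip(sentences, is_key) if keep)
```

These tags are only weak labels; the iterative sentence classification of [0145] to [0153] is what decides which sentences are finally kept.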

[0145] Optionally, the sentence classification model includes a logistic regression model and a deep learning model;

[0146] The first splitting unit iteratively trains a sentence classification model based on the plurality of sentences and their respective usefulness labels, and determines key sentences among the plurality of sentences based on the classification information output by the sentence classification model during the iterative training process. This includes: iteratively training a logistic regression model based on the plurality of sentences and their respective usefulness labels, and selecting a plurality of candidate sentences from the plurality of sentences based on the classification information output by the logistic regression model during the iterative training process; iteratively training a deep learning model based on the plurality of candidate sentences and their respective usefulness labels, and selecting key sentences from the plurality of candidate sentences based on the classification information output by the deep learning model during the iterative training process.

[0147] Optionally, the first splitting unit iteratively trains the logistic regression model based on the plurality of sentences and their respective usefulness labels, and selects a plurality of candidate sentences from the plurality of sentences based on the classification information output by the logistic regression model during the iterative training process, including:

[0148] Each of the plurality of sentences is used as a sample sentence for the first round of iterative training;

[0149] The sample sentences are classified using the logistic regression model to obtain the classification information corresponding to the sample sentences, which includes the probability that the sample sentence is a useful sentence.

[0150] If the logistic regression model does not meet the second preset training stopping condition, then based on the classification information and usefulness label corresponding to the sample sentence, the logistic regression model is optimized, and the sample sentence that meets the second preset screening condition is selected as the sample sentence for the next round of iteration training. The second preset screening condition includes: the probability that the selected sample sentence is a useful sentence is greater than the second preset probability threshold or less than the third preset probability threshold, wherein the third preset probability threshold is less than the second preset probability threshold.

[0151] The above steps, from classifying the sample sentences with the logistic regression model through selecting the sample sentences that meet the second preset screening condition as the sample sentences for the next round of iterative training, are repeated until the logistic regression model meets the second preset training stopping condition;

[0152] The logistic regression model trained in the last iteration is used to classify the multiple sentences respectively, and the probability that each sentence is a useful sentence is obtained.

[0153] Each of the multiple sentences whose corresponding probability is greater than the fourth preset probability threshold is selected as a candidate sentence.

[0154] Optionally, the second short text classification model includes a language representation network and a classification network;

[0155] The training unit is specifically used for: selecting associated short text corpora of the target short text corpus from the long text corpus, wherein the target short text corpus is any one of the plurality of short text corpora; encoding the target short text corpus based on the target short text corpus and the associated short text corpus through the language representation network to obtain the semantic representation vector of the target short text corpus; classifying the target short text corpus based on the semantic representation vector of the target short text corpus through the classification network to obtain the predicted category to which the target short text corpus belongs; and adjusting the model parameters of the second short text classification model based on the predicted categories to which the plurality of short text corpora belong and the single labels corresponding to the plurality of short text corpora.

[0156] Optionally, the training unit encodes the target short text corpus based on the target short text corpus and the associated short text corpus using the language representation network to obtain a semantic representation vector of the target short text corpus. This includes: embedding the target short text corpus and the associated short text corpus using the language representation network to obtain text vectors for the target short text corpus and the associated short text corpus; determining attention weights for the target short text corpus and the associated short text corpus based on an attention mechanism and the distance between the target short text corpus and the associated short text corpus, whereby the attention weights represent the importance of the corresponding short text corpus to the classification process of the second short text classification model; and weighting the text vectors of the target short text corpus and the associated short text corpus based on their respective attention weights to obtain a semantic representation vector of the target short text corpus.
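Paragraph [0156] does not say how distance enters the attention weights. One plausible reading, sketched below purely as an assumption, subtracts a linear penalty `alpha * |position gap|` from a scaled dot-product score before the softmax, so nearer associated short texts receive more weight. Positions are assumed to be the short texts' indices within the long text, the vectors are assumed to come from the language representation network, and `alpha` is a hypothetical hyperparameter.

```python
import math
from typing import List, Sequence, Tuple


def distance_aware_attention(
    target_vec: Sequence[float],
    target_pos: int,
    context_vecs: Sequence[Sequence[float]],
    context_pos: Sequence[int],
    alpha: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, semantic representation vector) for the target short text."""
    vecs = [list(target_vec)] + [list(v) for v in context_vecs]
    dists = [0] + [abs(p - target_pos) for p in context_pos]  # the target itself is at distance 0
    d = len(target_vec)
    # Scaled dot-product relevance to the target minus a linear distance penalty (assumed form)
    scores = [
        sum(a * b for a, b in zip(target_vec, v)) / math.sqrt(d) - alpha * dist
        for v, dist in zip(vecs, dists)
    ]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the text vectors gives the semantic representation vector
    rep = [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(d)]
    return weights, rep
```

With identical vectors, the weights decrease strictly with distance, which is the behavior the distance term is meant to add on top of plain content-based attention.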

[0157] Obviously, the training device for the text classification model provided in this embodiment of the application can serve as the execution entity of the training method of the text classification model shown in Figure 1, and can therefore implement the functions of the training method of the text classification model in Figure 2. Details are not repeated here.

[0158] In addition, corresponding to the long text multi-label classification method shown in Figure 7 above, this application also provides a long text multi-label classification device. Figure 9 is a schematic structural diagram of a long text multi-label classification device 900 according to an embodiment of this application. The device 900 may include:

[0159] The second splitting unit 910 is used to split the target long text into multiple short texts;

[0160] The classification unit 920 is used to classify the plurality of short texts respectively by the second short text classification model to obtain the classification information corresponding to each of the plurality of short texts. The second short text classification model is trained based on the training method of the text classification model described in the first aspect.

[0161] The label prediction unit 930 is used to predict multiple labels corresponding to the target long text based on the classification information corresponding to each of the multiple short texts.

[0162] Optionally, the classification information corresponding to the short text includes the probability that the short text corresponds to multiple optional single labels;

[0163] The label prediction unit is specifically used to: select a label set corresponding to each short text from the multiple optional single labels based on the classification information corresponding to each short text and the co-occurrence probability among the multiple optional single labels, wherein the label set includes at least one single label; and combine and remove duplicates from the label sets corresponding to the multiple short texts to obtain the multiple labels corresponding to the target long text.

[0164] Optionally, the step in which the tag prediction unit selects a tag set corresponding to each short text from the plurality of optional single tags, based on the classification information corresponding to each short text and the co-occurrence probability among the plurality of optional single tags, includes: selecting, based on the classification information corresponding to a target short text, the top N optional single tags with the highest probabilities for the target short text, wherein the target short text is any one of the plurality of short texts and N is an integer greater than 2; if the probability difference among the top N optional single tags is less than a preset difference and the co-occurrence probability of the top N optional single tags is greater than a preset probability threshold, using the top N optional single tags as the single tags corresponding to the target short text and adding them to the tag set corresponding to the target short text; or, if the probability difference among the top N optional single tags is greater than or equal to the preset difference, or the co-occurrence probability of the top N optional single tags is less than or equal to the preset probability threshold, selecting from the plurality of optional single tags the optional single tag with the highest probability for the target short text, using it as the single tag corresponding to the target short text, and adding it to the tag set corresponding to the target short text.
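Paragraphs [0163] and [0164] can be illustrated with the sketch below. The source does not define how the "probability difference" or the "co-occurrence probability" of N tags is computed, so the sketch assumes the difference is between the first and N-th probabilities and the co-occurrence is the smallest pairwise value in a tag co-occurrence table (which might, for example, be estimated from the training multi-labels). Parameter names and default values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def select_label_set(
    probs: Dict[str, float],
    cooccur: Dict[FrozenSet[str], float],
    n: int,
    max_gap: float,
    min_cooccur: float,
) -> Set[str]:
    """Tag set for one short text, following the two branches of [0164]."""
    top = sorted(probs, key=probs.get, reverse=True)[:n]
    if len(top) < 2:
        return set(top)
    gap = probs[top[0]] - probs[top[-1]]  # assumed: spread between the best and N-th tag
    # Assumed: the weakest pairwise co-occurrence among the top N tags
    co = min(cooccur.get(frozenset(pair), 0.0) for pair in combinations(top, 2))
    if gap < max_gap and co > min_cooccur:
        return set(top)
    return {top[0]}


def predict_long_text_labels(
    per_short_text_probs: Iterable[Dict[str, float]],
    cooccur: Dict[FrozenSet[str], float],
    n: int = 3,
    max_gap: float = 0.1,
    min_cooccur: float = 0.3,
) -> Set[str]:
    """Merge the per-short-text tag sets; the set union also removes duplicates."""
    labels: Set[str] = set()
    for probs in per_short_text_probs:
        labels |= select_label_set(probs, cooccur, n, max_gap, min_cooccur)
    return labels
```

The co-occurrence check keeps several tags for one short text only when the classifier is genuinely undecided and those tags are known to appear together; otherwise the short text falls back to its single most probable tag.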

[0165] Obviously, the long text multi-label classification device provided in this embodiment of the application can serve as the execution entity of the long text multi-label classification method shown in Figure 7, and can therefore implement the functions of that method in Figure 7. Details are not repeated here.

[0166] Figure 10 is a schematic structural diagram of an electronic device according to an embodiment of this application. Referring to Figure 10, at the hardware level, the electronic device includes a processor and optionally also includes an internal bus, a network interface, and memory. The memory may include main memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. Of course, the electronic device may also include other hardware required for other business operations.

[0167] The processor, network interface, and memory can be interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown in Figure 10 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.

[0168] Memory is used to store programs. Specifically, programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.

[0169] The processor reads the corresponding computer program from non-volatile memory into memory and runs it, forming a training device for a text classification model at the logical level. The processor executes the program stored in memory and specifically performs the following operations: acquiring a long text corpus and its corresponding multi-label; splitting the long text corpus into multiple short text corpora; iteratively training a first short text classification model based on the multi-label and the multiple short text corpora, and determining the mapping relationship between the multiple short text corpora and the multi-label based on the classification information output by the first short text classification model during iterative training, to obtain a single label corresponding to each short text corpus; and training a second short text classification model based on the multiple short text corpora and their respective single labels.

[0170] Alternatively, the processor reads the corresponding computer program from non-volatile memory into memory and runs it, forming a long text multi-label classification device at the logical level. The processor executes the program stored in memory and specifically performs the following operations: splitting the target long text into multiple short texts; classifying each of the multiple short texts using a second short text classification model to obtain classification information corresponding to each of the multiple short texts, wherein the second short text classification model is trained based on the text classification model training method described in the embodiments of this application; and predicting the multi-label corresponding to the target long text based on the classification information corresponding to each of the multiple short texts.

[0171] The method executed by the training device for the text classification model disclosed in the embodiment shown in Figure 1 of this application, or by the long text multi-label classification device disclosed in the embodiment shown in Figure 7 of this application, may be applied to or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well established in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads information from the memory and completes the steps of the above method in combination with its hardware.

[0172] The electronic device can also perform the method of Figure 1 and implement the functions of the training device for the text classification model in the embodiments shown in Figures 1 to 6; alternatively, the electronic device can perform the method of Figure 7 and implement the functions of the long text multi-label classification device in the embodiment shown in Figure 6. Details are not repeated here.

[0173] Of course, in addition to software implementation, the electronic device of this application does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. In other words, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.

[0174] This application also proposes a computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by a portable electronic device including multiple applications, enable the portable electronic device to perform the method of the embodiment shown in Figure 1, and specifically to perform the following operations: obtaining a long text corpus and the multi-label corresponding to the long text corpus; splitting the long text corpus to obtain multiple short text corpora; iteratively training a first short text classification model based on the multi-label and the multiple short text corpora, and determining the mapping relationship between the multiple short text corpora and the multi-label based on the classification information output by the first short text classification model during the iterative training process, so as to obtain a single label corresponding to each short text corpus; and training a second short text classification model based on the multiple short text corpora and the single labels corresponding to each of the multiple short text corpora.

[0175] Alternatively, when executed by a portable electronic device including multiple applications, the instructions can enable the portable electronic device to perform the method of the embodiment shown in Figure 7, and specifically to perform the following operations: splitting the target long text into multiple short texts; classifying each of the multiple short texts using a second short text classification model to obtain the classification information corresponding to each of the multiple short texts, wherein the second short text classification model is trained based on the training method of the text classification model described in the embodiments of this application; and predicting the multi-label corresponding to the target long text based on the classification information corresponding to each of the multiple short texts.

[0176] In summary, the above description is merely a preferred embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

[0177] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0178] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0179] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0180] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to interchangeably. Each embodiment focuses on describing the differences from other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments.

Claims

1. A training method for a text classification model, characterized by comprising: obtaining a long text corpus related to a target business and the multi-label corresponding to the long text corpus; splitting the long text corpus into multiple short text corpora; training a model based on the long text corpus and the multi-label to obtain a long text classification model; classifying the multiple short text corpora using the long text classification model to obtain candidate labels corresponding to each of the multiple short text corpora; iteratively training a first short text classification model based on the multiple short text corpora and their respective candidate labels, and determining, from the multi-label, the single label corresponding to each of the multiple short text corpora based on the classification information output by the first short text classification model during the iterative training process; and training a second short text classification model based on the multiple short text corpora and their respective single labels.

2. The method according to claim 1, characterized in that the step of iteratively training the first short text classification model based on the multiple short text corpora and their respective candidate labels, and determining the single label corresponding to each of the multiple short text corpora from the multi-label based on the classification information output by the first short text classification model during the iterative training process, includes: using the multiple short text corpora as the sample corpora for the first round of iterative training; classifying the sample corpora by the first short text classification model to obtain the classification information corresponding to each sample corpus, the classification information including the probability that the sample corpus matches each single label in the multi-label; if the first short text classification model does not meet a first preset training stopping condition, optimizing the first short text classification model based on the classification information and the candidate labels corresponding to the sample corpora, and selecting the sample corpora that meet a first preset screening condition as the sample corpora for the next round of iterative training, wherein the first preset screening condition includes: among all single labels, the selected sample corpus has the highest probability of matching its corresponding candidate label, and that probability is greater than a first preset probability threshold; repeating the above steps, from classifying the sample corpora with the first short text classification model through selecting the sample corpora that meet the first preset screening condition, until the first short text classification model meets the first preset training stopping condition; classifying each short text corpus by the first short text classification model after the last round of iterative training to obtain the classification information corresponding to each short text corpus; and, for each short text corpus, selecting from the multi-label a single label whose probability of matching that short text corpus is greater than the first preset probability threshold as the single label corresponding to that short text corpus.

3. The method according to claim 1, characterized in that the step of splitting the long text corpus into multiple short text corpora includes: compressing the long text corpus based on keywords in the long text corpus to obtain a compressed text corpus; and splitting the compressed text corpus into multiple short text corpora.

4. The method according to claim 3, characterized in that the long text corpus consists of multiple sentences, and the step of compressing the long text corpus based on keywords in the long text corpus to obtain a compressed text corpus includes: determining an importance score for each keyword based on its frequency of occurrence in the long text corpus, the keywords being related to the target business; determining a usefulness tag for each of the multiple sentences based on the sentences in the long text corpus that contain the keywords and the importance scores of the keywords, the usefulness tag indicating whether the corresponding sentence is related to the target business; iteratively training a sentence classification model based on the multiple sentences and their respective usefulness tags, and determining the key sentences among the multiple sentences based on the classification information output by the sentence classification model during the iterative training process; and deleting sentences other than the key sentences from the long text corpus to obtain the compressed text corpus.

5. The method according to claim 4, characterized in that the step of determining a usefulness tag for each of the multiple sentences based on the sentences containing the keywords in the long text corpus and the importance scores of the keywords includes: if a keyword with an importance score greater than a preset importance threshold appears in a target sentence, setting a usefulness tag for the target sentence indicating that the target sentence is related to the target business, wherein the target sentence is any sentence in the long text corpus; or, if the target sentence contains no keyword, or the importance score of each keyword appearing in the target sentence is less than or equal to the preset importance threshold, setting a usefulness tag for the target sentence indicating that the target sentence is not related to the target business.

6. The method according to claim 4, characterized in that the sentence classification model includes a logistic regression model and a deep learning model, and the step of iteratively training the sentence classification model based on the plurality of sentences and their respective usefulness labels, and determining key sentences among the plurality of sentences based on the classification information output by the sentence classification model during the iterative training process, includes: iteratively training the logistic regression model based on the multiple sentences and their respective usefulness labels, and selecting multiple candidate sentences from the multiple sentences based on the classification information output by the logistic regression model during the iterative training process; and iteratively training the deep learning model based on the multiple candidate sentences and their respective usefulness labels, and selecting key sentences from the multiple candidate sentences based on the classification information output by the deep learning model during the iterative training process.

7. The method according to claim 6, characterized in that iteratively training the logistic regression model based on the plurality of sentences and their respective usefulness labels, and selecting a plurality of candidate sentences from the plurality of sentences based on the classification information output by the logistic regression model during the iterative training process, comprises: using each of the plurality of sentences as a sample sentence for the first round of iterative training; classifying the sample sentences using the logistic regression model to obtain the classification information corresponding to each sample sentence, the classification information including the probability that the sample sentence is a useful sentence; if the logistic regression model does not meet a second preset training stopping condition, optimizing the logistic regression model based on the classification information and the usefulness labels corresponding to the sample sentences, and selecting the sample sentences that meet a second preset screening condition as the sample sentences for the next round of iterative training, wherein the second preset screening condition includes: the probability that the sample sentence is a useful sentence is greater than a second preset probability threshold or less than a third preset probability threshold, the third preset probability threshold being less than the second preset probability threshold; repeating the above steps, from classifying the sample sentences using the logistic regression model through selecting the sample sentences that meet the second preset screening condition as the sample sentences for the next round of iterative training, until the logistic regression model meets the second preset training stopping condition; classifying each of the plurality of sentences using the logistic regression model obtained in the last round of iterative training, to obtain the probability that each sentence is a useful sentence; and selecting, as candidate sentences, those of the plurality of sentences whose probability is greater than a fourth preset probability threshold.
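The iterative screening in claim 7 keeps only confidently classified samples (probability above the second threshold or below the third) for each subsequent round, which tends to drop ambiguous sentences. The sketch below is a minimal pure-Python illustration, not the patented implementation. Its assumptions: sentences are already turned into feature vectors; each round fits the model before scoring, so that the first round does not score everything at 0.5 from the zero initialisation; the threshold values are invented; and stopping after `max_rounds`, or when the sample set is empty or no longer shrinks, stands in for the unspecified "second preset training stopping condition".

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_useful(w: List[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit(w: List[float], b: float, X: Sequence[Sequence[float]], y: Sequence[int],
        idx: List[int], lr: float, epochs: int) -> Tuple[List[float], float]:
    # Batch gradient descent on log loss over samples `idx`, warm-started from (w, b).
    n = len(idx)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for i in idx:
            err = prob_useful(w, b, X[i]) - y[i]
            for j, xij in enumerate(X[i]):
                gw[j] += err * xij
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def select_candidates(
    X: Sequence[Sequence[float]],  # sentence feature vectors (assumed precomputed)
    y: Sequence[int],              # usefulness labels, 1 = useful
    hi: float = 0.7,               # "second preset probability threshold" (assumed)
    lo: float = 0.3,               # "third preset probability threshold", lo < hi (assumed)
    keep: float = 0.5,             # "fourth preset probability threshold" (assumed)
    max_rounds: int = 20,          # assumed form of the stopping condition
    lr: float = 0.5,
    epochs: int = 50,
) -> List[int]:
    """Return indices of candidate sentences."""
    w, b = [0.0] * len(X[0]), 0.0
    sample = list(range(len(X)))  # round 1: every sentence is a sample sentence
    for _ in range(max_rounds):
        w, b = fit(w, b, X, y, sample, lr, epochs)
        # Keep samples with p > hi or p < lo, i.e. outside the ambiguous band.
        nxt = [i for i in sample if not lo <= prob_useful(w, b, X[i]) <= hi]
        if not nxt or len(nxt) == len(sample):
            break  # assumed stop: nothing left to train on, or the sample set is stable
        sample = nxt
    # The last-round model scores every sentence; keep the likely useful ones.
    return [i for i in range(len(X)) if prob_useful(w, b, X[i]) > keep]
```

In practice the logistic regression would more likely come from a library, with only the screening loop written by hand.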

8. The method according to claim 1, characterized in that the second short text classification model comprises a language representation network and a classification network; and training the second short text classification model based on the plurality of short text corpora and the single labels respectively corresponding to the plurality of short text corpora comprises: selecting, from the long text corpus, associated short text corpora of a target short text corpus, wherein the target short text corpus is any one of the plurality of short text corpora; encoding, by the language representation network, the target short text corpus based on the target short text corpus and the associated short text corpora, to obtain a semantic representation vector of the target short text corpus; classifying, by the classification network, the target short text corpus based on its semantic representation vector, to obtain a predicted category to which the target short text corpus belongs; and adjusting model parameters of the second short text classification model based on the predicted categories of the plurality of short text corpora and the single labels respectively corresponding to the plurality of short text corpora.
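A toy version of the claim 8 training step follows. It selects associated short texts from the same long text, encodes the target together with that context, classifies it, and updates parameters from the single labels. Everything concrete here is an assumption: "associated" is taken to mean neighbouring short texts within a window; a bag-of-words encoder (target tokens weighted 1.0, context tokens 0.5) stands in for the language representation network; and only the linear classification network is updated, whereas the claim adjusts the parameters of the whole second model. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def associated(texts: Sequence[str], i: int, window: int = 1) -> List[str]:
    # Assumption: "associated" = neighbouring short texts within `window` positions.
    lo, hi = max(0, i - window), min(len(texts), i + window + 1)
    return [texts[j] for j in range(lo, hi) if j != i]


def encode(target: str, context: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    # Toy stand-in for the language representation network: bag of words in which
    # target tokens count 1.0 and associated-text tokens count 0.5.
    v = [0.0] * len(vocab)
    for weight, text in [(1.0, target)] + [(0.5, c) for c in context]:
        for tok in text.lower().split():
            if tok in vocab:
                v[vocab[tok]] += weight
    return v


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def scores(W: List[List[float]], x: List[float]) -> List[float]:
    # Linear classification network: one weight row per class.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train(docs: Sequence[Sequence[str]], labels: Sequence[Sequence[str]],
          classes: List[str], lr: float = 0.5, epochs: int = 50):
    vocab: Dict[str, int] = {}
    for texts in docs:
        for t in texts:
            for tok in t.lower().split():
                vocab.setdefault(tok, len(vocab))
    W = [[0.0] * len(vocab) for _ in classes]
    for _ in range(epochs):
        for texts, ys in zip(docs, labels):
            for i, t in enumerate(texts):
                x = encode(t, associated(texts, i), vocab)
                p = softmax(scores(W, x))
                k_true = classes.index(ys[i])
                # Cross-entropy gradient step on the classifier weights only.
                for k, row in enumerate(W):
                    g = p[k] - (1.0 if k == k_true else 0.0)
                    for j, xj in enumerate(x):
                        row[j] -= lr * g * xj
    return W, vocab


def predict(W, vocab, classes, texts, i) -> str:
    s = scores(W, encode(texts[i], associated(texts, i), vocab))
    return classes[s.index(max(s))]
```

Here `docs` holds the short texts of each long text in order, and `labels` holds their single labels.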

9. The method according to claim 8, characterized in that encoding, by the language representation network, the target short text corpus based on the target short text corpus and the associated short text corpora to obtain the semantic representation vector of the target short text corpus comprises: embedding, by the language representation network, the target short text corpus and the associated short text corpora to obtain text vectors of the target short text corpus and of the associated short text corpora; determining attention weights of the target short text corpus and of the associated short text corpora based on an attention mechanism and on the distance between the target short text corpus and the associated short text corpora, wherein each attention weight represents the importance of the corresponding short text corpus to the classification performed by the second short text classification model; and weighting the text vectors of the target short text corpus and the associated short text corpora by their respective attention weights to obtain the semantic representation vector of the target short text corpus.
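One way to combine "attention mechanism" and "distance" as in claim 9 is to subtract a distance penalty from scaled dot-product attention logits before the softmax. The weighted sum of text vectors then gives the target's representation. This is a hypothetical sketch. The claim names neither the scoring function nor the distance, so positional distance (difference in sentence index) and the linear penalty strength `alpha` are both assumptions.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    vectors: Sequence[Sequence[float]],  # text vectors of the target and associated texts
    target: int,                         # index of the target short text in `vectors`
    alpha: float = 1.0,                  # hypothetical distance-penalty strength
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, pooled semantic representation vector)."""
    q = vectors[target]
    d = len(q)
    # Scaled dot-product relevance to the target, minus a penalty that grows with
    # positional distance (assumption: distance = difference in sentence index).
    logits = [
        sum(a * b for a, b in zip(q, v)) / math.sqrt(d) - alpha * abs(j - target)
        for j, v in enumerate(vectors)
    ]
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of text vectors gives the target's semantic representation.
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(d)]
    return weights, pooled


if __name__ == "__main__":
    w, p = attention_pool([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], target=1)
    print([round(x, 3) for x in w], [round(x, 3) for x in p])
```

With equal content, the target itself receives the largest weight and equidistant neighbours receive equal weights. Semantically unrelated neighbours are down-weighted further by the dot-product term.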

10. A long text multi-label classification method, characterized in that it comprises: splitting a target long text into a plurality of short texts; classifying the plurality of short texts respectively using a second short text classification model to obtain classification information corresponding to each of the plurality of short texts, wherein the second short text classification model is trained by the training method of the text classification model according to any one of claims 1 to 9; and predicting, based on the classification information corresponding to each of the plurality of short texts, the multiple labels corresponding to the target long text.
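The inference flow of claim 10 is: split the long text, classify each short text, then aggregate. The sketch below makes three assumptions. Splitting happens after Chinese or English sentence-ending punctuation. The trained model is represented by a hypothetical callable returning `{label: probability}`. Aggregation is the simplest possible one, keeping each short text's top label and taking the set union; claims 11 and 12 describe a richer, co-occurrence-aware selection.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical interface for the trained second short text classification model:
# maps a short text to {label: probability}.
Classifier = Callable[[str], Dict[str, float]]


def split_long_text(text: str) -> List[str]:
    # Assumption: split after Chinese or English sentence-ending punctuation.
    parts = re.split(r"(?<=[。！？!?.])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def predict_long_text(text: str, classify: Classifier) -> Set[str]:
    labels: Set[str] = set()
    for short in split_long_text(text):
        probs = classify(short)
        # Simplest aggregation: keep each short text's top label; the set union
        # deduplicates labels across short texts.
        labels.add(max(probs, key=lambda k: probs[k]))
    return labels


def toy_classifier(s: str) -> Dict[str, float]:
    # Keyword stand-in for a trained model, for illustration only.
    return {"billing": 0.9, "bug": 0.1} if "refund" in s else {"billing": 0.2, "bug": 0.8}


if __name__ == "__main__":
    print(predict_long_text("I want a refund. The app crashed!", toy_classifier))
```

The naive splitter also breaks on decimal points such as "3.5". A production splitter would need to handle that.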

11. The method according to claim 10, characterized in that the classification information corresponding to a short text includes the probabilities that the short text corresponds to each of a plurality of optional single labels; and predicting the multiple labels corresponding to the target long text based on the classification information corresponding to each of the plurality of short texts comprises: selecting, from the plurality of optional single labels, a label set corresponding to each short text based on the classification information corresponding to that short text and the co-occurrence probabilities among the plurality of optional single labels, the label set including at least one single label; and combining and deduplicating the label sets corresponding to the plurality of short texts to obtain the multiple labels corresponding to the target long text.
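Claim 11 relies on co-occurrence probabilities among the optional single labels but does not say how they are estimated. One simple assumption, shown below, is the fraction of training long texts whose multi-label contains both labels of a pair. The second helper performs the combine-and-deduplicate step as a set union. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def cooccurrence(multi_labels: Iterable[Set[str]]) -> Dict[FrozenSet[str], float]:
    """Fraction of training long texts whose label set contains each label pair.

    Assumption: the claims do not define the estimator; this is one simple choice.
    """
    docs = list(multi_labels)
    counts: Dict[FrozenSet[str], int] = {}
    for labels in docs:
        for a, b in combinations(sorted(labels), 2):
            key = frozenset((a, b))
            counts[key] = counts.get(key, 0) + 1
    return {k: c / len(docs) for k, c in counts.items()}


def merge_label_sets(per_short_text: Iterable[Set[str]]) -> Set[str]:
    # Combine the label sets of all short texts and deduplicate (set union).
    merged: Set[str] = set()
    for s in per_short_text:
        merged |= s
    return merged
```

Pairs that never co-occur are simply absent from the returned dictionary, so callers should treat a missing key as probability 0.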

12. The method according to claim 11, characterized in that selecting a label set corresponding to each short text from the plurality of optional single labels based on the classification information corresponding to that short text and the co-occurrence probabilities among the plurality of optional single labels comprises: selecting, based on the classification information corresponding to a target short text, the top N optional single labels with the highest probabilities for the target short text from the plurality of optional single labels, wherein the target short text is any one of the plurality of short texts and N is an integer greater than 2; if the probability difference among the top N optional single labels is less than a preset difference and the co-occurrence probability of the top N optional single labels is greater than a preset probability threshold, using the top N optional single labels as single labels corresponding to the target short text and adding them to the label set corresponding to the target short text; and if the probability difference among the top N optional single labels is greater than or equal to the preset difference, or the co-occurrence probability of the top N optional single labels is less than or equal to the preset probability threshold, selecting the optional single label with the highest probability for the target short text from the plurality of optional single labels, using it as the single label corresponding to the target short text, and adding it to the label set corresponding to the target short text.
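The selection rule of claim 12 reads naturally as a small function. It keeps all top-N labels when they are nearly tied and tend to co-occur, and otherwise keeps only the most probable label. This is a sketch under labelled assumptions. The "probability difference among the top N" is taken as the top-1 probability minus the top-N probability. The joint co-occurrence of N labels is taken as the weakest pairwise co-occurrence. The default `n=3` respects the claim's "N greater than 2", and the threshold values are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def select_label_set(
    probs: Dict[str, float],                # optional single label -> probability
    co_pairs: Dict[FrozenSet[str], float],  # pairwise co-occurrence probabilities
    n: int = 3,                             # N (the claim requires N > 2)
    max_gap: float = 0.1,                   # "preset difference" (assumed value)
    min_co: float = 0.3,                    # "preset probability threshold" (assumed)
) -> List[str]:
    top = sorted(probs, key=lambda k: probs[k], reverse=True)[:n]
    if len(top) < n:
        return top[:1]
    # Assumption: "probability difference among the top N" = top-1 minus top-N.
    gap = probs[top[0]] - probs[top[-1]]
    # Assumption: joint co-occurrence of the N labels = weakest pairwise value.
    joint = min(co_pairs.get(frozenset(pair), 0.0) for pair in combinations(top, 2))
    if gap < max_gap and joint > min_co:
        return top   # near-tied and frequently co-occurring: keep all N
    return top[:1]   # otherwise keep only the most probable label
```

The per-short-text sets returned here are what claim 11 then unions and deduplicates into the long text's multi-label.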

13. A training device for a text classification model, characterized in that it comprises: an acquisition unit, configured to acquire a long text corpus and a multi-label corresponding to the long text corpus; a first splitting unit, configured to split the long text corpus into a plurality of short text corpora; a mapping unit, configured to train a model based on the long text corpus and the multi-label to obtain a long text classification model, classify the plurality of short text corpora using the long text classification model to obtain candidate labels corresponding to each of the plurality of short text corpora, iteratively train a first short text classification model based on the plurality of short text corpora and their respective candidate labels, and determine, from the multi-label, the single label corresponding to each of the plurality of short text corpora based on the classification information output by the first short text classification model during the iterative training process; and a training unit, configured to train a second short text classification model based on the plurality of short text corpora and the single labels respectively corresponding to the plurality of short text corpora.
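The mapping unit's loop can be sketched as alternating re-labelling. Starting from candidate labels, a short text classifier is trained repeatedly, and each short text is re-mapped to the most probable label within its own long text's multi-label. This is a hypothetical illustration with several stand-ins. A tiny naive Bayes model plays the first short text classification model. The candidate labels are simply given, whereas in the claim they come from the long text classification model. Stopping when the mapping no longer changes is an assumed stop condition. `fit_nb` and `assign_single_labels` are invented names.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

Predict = Callable[[str], Dict[str, float]]


def fit_nb(texts: Sequence[str], labels: Sequence[str], classes: Sequence[str]) -> Predict:
    # Tiny multinomial naive Bayes standing in for the first short text classification model.
    counts: Dict[str, Counter] = {c: Counter() for c in classes}
    prior = Counter(labels)
    vocab: Set[str] = set()
    for t, y in zip(texts, labels):
        toks = t.lower().split()
        counts[y].update(toks)
        vocab.update(toks)

    def predict(text: str) -> Dict[str, float]:
        logp = {}
        for c in classes:
            total = sum(counts[c].values())
            lp = math.log(prior[c] + 1)  # add-one smoothed class prior
            for tok in text.lower().split():
                lp += math.log((counts[c][tok] + 1) / (total + len(vocab)))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}

    return predict


def assign_single_labels(
    texts: Sequence[str],          # short text corpora
    allowed: Sequence[Set[str]],   # multi-label of the long text each one came from
    candidates: Sequence[str],     # initial candidate labels (e.g. from a long text model)
    classes: Sequence[str],
    max_rounds: int = 10,
) -> List[str]:
    labels = list(candidates)
    for _ in range(max_rounds):
        predict = fit_nb(texts, labels, classes)
        new = []
        for t, options in zip(texts, allowed):
            p = predict(t)
            # Re-map to the most probable label within the long text's multi-label.
            new.append(max(sorted(options), key=lambda c: p[c]))
        if new == labels:
            break  # assumed stopping rule: the mapping no longer changes
        labels = new
    return labels
```

Restricting each choice to the originating long text's multi-label is what keeps the single labels consistent with the annotated data. The classifier only arbitrates among labels that the long text already carries.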

14. A long text multi-label classification device, characterized in that it comprises: a second splitting unit, configured to split a target long text into a plurality of short texts; a classification unit, configured to classify the plurality of short texts respectively using a second short text classification model to obtain classification information corresponding to each of the plurality of short texts, wherein the second short text classification model is trained by the training method of the text classification model according to any one of claims 1 to 9; and a label prediction unit, configured to predict the multiple labels corresponding to the target long text based on the classification information corresponding to each of the plurality of short texts.

15. An electronic device, characterized in that it comprises: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1 to 9, or to implement the method according to any one of claims 10 to 12.

16. A computer-readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 9, or the method according to any one of claims 10 to 12.